Re: [PATCH v2 1/2] KVM: arm64: Add PMU event filtering infrastructure

2020-08-19 Thread Alexander Graf

Hi Marc,

On 10.03.20 19:00, Marc Zyngier wrote:

On 2020-03-10 17:40, Auger Eric wrote:

Hi Marc,

On 3/10/20 12:03 PM, Marc Zyngier wrote:

Hi Eric,

On 2020-03-09 18:05, Auger Eric wrote:

Hi Marc,

On 3/9/20 1:48 PM, Marc Zyngier wrote:

It can be desirable to expose a PMU to a guest, and yet not want the
guest to be able to count some of the implemented events (because this
would give information on shared resources, for example.

For this, let's extend the PMUv3 device API, and offer a way to 
setup a

bitmap of the allowed events (the default being no bitmap, and thus no
filtering).

Userspace can thus allow/deny ranges of event. The default policy
depends on the "polarity" of the first filter setup (default deny 
if the

filter allows events, and default allow if the filter denies events).
This allows to setup exactly what is allowed for a given guest.

Note that although the ioctl is per-vcpu, the map of allowed events is
global to the VM (it can be setup from any vcpu until the vcpu PMU is
initialized).

Signed-off-by: Marc Zyngier 
---
 arch/arm64/include/asm/kvm_host.h |  6 +++
 arch/arm64/include/uapi/asm/kvm.h | 16 ++
 virt/kvm/arm/arm.c    |  2 +
 virt/kvm/arm/pmu.c    | 84 
+--

 4 files changed, 92 insertions(+), 16 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h
b/arch/arm64/include/asm/kvm_host.h
index 57fd46acd058..8e63c618688d 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -91,6 +91,12 @@ struct kvm_arch {
  * supported.
  */
 bool return_nisv_io_abort_to_user;
+
+    /*
+ * VM-wide PMU filter, implemented as a bitmap and big enough
+ * for up to 65536 events
+ */
+    unsigned long *pmu_filter;
 };

 #define KVM_NR_MEM_OBJS 40
diff --git a/arch/arm64/include/uapi/asm/kvm.h
b/arch/arm64/include/uapi/asm/kvm.h
index ba85bb23f060..7b1511d6ce44 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -159,6 +159,21 @@ struct kvm_sync_regs {
 struct kvm_arch_memory_slot {
 };

+/*
+ * PMU filter structure. Describe a range of events with a particular
+ * action. To be used with KVM_ARM_VCPU_PMU_V3_FILTER.
+ */
+struct kvm_pmu_event_filter {
+    __u16    base_event;
+    __u16    nevents;
+
+#define KVM_PMU_EVENT_ALLOW    0
+#define KVM_PMU_EVENT_DENY    1
+
+    __u8    action;
+    __u8    pad[3];
+};
+
 /* for KVM_GET/SET_VCPU_EVENTS */
 struct kvm_vcpu_events {
 struct {
@@ -329,6 +344,7 @@ struct kvm_vcpu_events {
 #define KVM_ARM_VCPU_PMU_V3_CTRL    0
 #define   KVM_ARM_VCPU_PMU_V3_IRQ    0
 #define   KVM_ARM_VCPU_PMU_V3_INIT    1
+#define   KVM_ARM_VCPU_PMU_V3_FILTER    2
 #define KVM_ARM_VCPU_TIMER_CTRL    1
 #define   KVM_ARM_VCPU_TIMER_IRQ_VTIMER    0
 #define   KVM_ARM_VCPU_TIMER_IRQ_PTIMER    1
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index eda7b624eab8..8d849ac88a44 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -164,6 +164,8 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
 free_percpu(kvm->arch.last_vcpu_ran);
 kvm->arch.last_vcpu_ran = NULL;

+    bitmap_free(kvm->arch.pmu_filter);
+
 for (i = 0; i < KVM_MAX_VCPUS; ++i) {
 if (kvm->vcpus[i]) {
 kvm_vcpu_destroy(kvm->vcpus[i]);
diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
index f0d0312c0a55..9f0fd0224d5b 100644
--- a/virt/kvm/arm/pmu.c
+++ b/virt/kvm/arm/pmu.c
@@ -579,10 +579,19 @@ static void kvm_pmu_create_perf_event(struct
kvm_vcpu *vcpu, u64 select_idx)

 kvm_pmu_stop_counter(vcpu, pmc);
 eventsel = data & ARMV8_PMU_EVTYPE_EVENT;
+    if (pmc->idx == ARMV8_PMU_CYCLE_IDX)
+    eventsel = ARMV8_PMUV3_PERFCTR_CPU_CYCLES;

nit:
if (pmc->idx == ARMV8_PMU_CYCLE_IDX)
    eventsel = ARMV8_PMUV3_PERFCTR_CPU_CYCLES;
else
    eventsel = data & ARMV8_PMU_EVTYPE_EVENT;


You don't like it? ;-)

? eventset set only once instead of 2 times


The compiler does the right thing, but sore, I'll change it.


I haven't seen a v3 follow-up after this. Do you happen to have that 
somewhere in a local branch and just need to send it out or would you 
prefer if I pick up v2 and address the comments?



Thanks,

Alex



Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879


___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 1/1] KVM: inject data abort if instruction cannot be decoded

2019-09-06 Thread Alexander Graf




On 06.09.19 15:50, Peter Maydell wrote:

On Fri, 6 Sep 2019 at 14:41, Alexander Graf  wrote:

On 06.09.19 15:31, Peter Maydell wrote:

(b) we try to reuse the code we already have that does TCG exception
injection, which might or might not be a design mistake, and


That's probably a design mistake, correct :)


Well, conceptually it's not necessarily a bad idea, because
in both cases what we're doing is "change the system register
state (PC, ESR_EL1, ELR_EL1 etc) so that the CPU looks like
it's just taken an exception"; but some of what the
TCG code needs to do isn't necessary for KVM and all of it
was not written with the idea of KVM in mind at all...


Yes, and it probably makes sense to model the QEMU internal API 
similarly, so that exception generating code does not have to distinguish.


However, it's much easier for KVM to ensure exception prioritization as 
well as internal state synchronization. Conceptually, as you've seen, it 
really doesn't make a lot of sense to pull register state from KVM, 
wiggle it and then push it down when KVM has awareness of the same 
transformation anyway.


So I guess the path is clear: Create a generic interface for exception 
injection and a separate patch similar to what Christoffer already 
posted with the new CAP to route illegal MMIO traps into user space.


Connecting the two dots in user space really should be trivial then.

(famous last words)


Alex



Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Ralf Herbrich
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879


___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 1/1] KVM: inject data abort if instruction cannot be decoded

2019-09-06 Thread Alexander Graf




On 06.09.19 15:12, Christoffer Dall wrote:

On Fri, Sep 06, 2019 at 02:08:15PM +0200, Alexander Graf wrote:



On 06.09.19 10:00, Christoffer Dall wrote:

On Thu, Sep 05, 2019 at 02:09:18PM +0100, Marc Zyngier wrote:

On 05/09/2019 10:22, Christoffer Dall wrote:

On Thu, Sep 05, 2019 at 09:56:44AM +0100, Peter Maydell wrote:

On Thu, 5 Sep 2019 at 09:52, Marc Zyngier  wrote:


On Thu, 05 Sep 2019 09:16:54 +0100,
Peter Maydell  wrote:

This is true, but the problem is that barfing out to userspace
makes it harder to debug the guest because it means that
the VM is immediately destroyed, whereas AIUI if we
inject some kind of exception then (assuming you're set up
to do kernel-debug via gdbstub) you can actually examine
the offending guest code with a debugger because at least
your VM is still around to inspect...


To Christoffer's point, I find the benefit a bit dubious. Yes, you get
an exception, but the instruction that caused it may be completely
legal (store with post-increment, for example), leading to an even
more puzzled developer (that exception should never have been
delivered the first place).


Right, but the combination of "host kernel prints a message
about an unsupported load/store insn" and "within-guest debug
dump/stack trace/etc" is much more useful than just having
"host kernel prints message" and "QEMU exits"; and it requires
about 3 lines of code change...


I'm far more in favour of dumping the state of the access in the run
structure (much like we do for a MMIO access) and let userspace do
something about it (such as dumping information on the console or
breaking). It could even inject an exception *if* the user has asked
for it.


...whereas this requires agreement on a kernel-userspace API,
larger changes in the kernel, somebody to implement the userspace
side of things, and the user to update both the kernel and QEMU.
It's hard for me to see that the benefit here over the 3-line
approach really outweighs the extra effort needed. In practice
saying "we should do this" is saying "we're going to do nothing",
based on the historical record.



How about something like the following (completely untested, liable for
ABI discussions etc. etc., but for illustration purposes).

I think it raises the question (and likely many other) of whether we can
break the existing 'ABI' and change behavior for missing ISV
retrospectively for legacy user space when the issue has occurred?
Someone might have written code that reacts to the -ENOSYS, so I've
taken the conservative approach for this for the time being.


diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 8a37c8e89777..19a92c49039c 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -76,6 +76,14 @@ struct kvm_arch {
/* Mandated version of PSCI */
u32 psci_version;
+
+   /*
+* If we encounter a data abort without valid instruction syndrome
+* information, report this to user space.  User space can (and
+* should) opt in to this feature if KVM_CAP_ARM_NISV_TO_USER is
+* supported.
+*/
+   bool return_nisv_io_abort_to_user;
   };
   #define KVM_NR_MEM_OBJS 40
diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index f656169db8c3..019bc560edc1 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -83,6 +83,14 @@ struct kvm_arch {
/* Mandated version of PSCI */
u32 psci_version;
+
+   /*
+* If we encounter a data abort without valid instruction syndrome
+* information, report this to user space.  User space can (and
+* should) opt in to this feature if KVM_CAP_ARM_NISV_TO_USER is
+* supported.
+*/
+   bool return_nisv_io_abort_to_user;
   };
   #define KVM_NR_MEM_OBJS 40
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 5e3f12d5359e..a4dd004d0db9 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -235,6 +235,7 @@ struct kvm_hyperv_exit {
   #define KVM_EXIT_S390_STSI25
   #define KVM_EXIT_IOAPIC_EOI   26
   #define KVM_EXIT_HYPERV   27
+#define KVM_EXIT_ARM_NISV 28
   /* For KVM_EXIT_INTERNAL_ERROR */
   /* Emulate instruction failed. */
@@ -996,6 +997,7 @@ struct kvm_ppc_resize_hpt {
   #define KVM_CAP_ARM_PTRAUTH_ADDRESS 171
   #define KVM_CAP_ARM_PTRAUTH_GENERIC 172
   #define KVM_CAP_PMU_EVENT_FILTER 173
+#define KVM_CAP_ARM_NISV_TO_USER 174
   #ifdef KVM_CAP_IRQ_ROUTING
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 35a069815baf..2ce94bd9d4a9 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -98,6 +98,26 @@ int kvm_arch_check_processor_compat(void)
return 0;
   }
+int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
+   struct kvm_enable_cap *cap)
+{
+   int r;
+
+   if (cap->fl

Re: [UNVERIFIED SENDER] Re: [PATCH 1/1] KVM: inject data abort if instruction cannot be decoded

2019-09-06 Thread Alexander Graf



On 06.09.19 14:34, Marc Zyngier wrote:

On 06/09/2019 13:08, Alexander Graf wrote:



On 06.09.19 10:00, Christoffer Dall wrote:

On Thu, Sep 05, 2019 at 02:09:18PM +0100, Marc Zyngier wrote:


[...]


@@ -673,6 +694,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct 
kvm_run *run)
ret = kvm_handle_mmio_return(vcpu, vcpu->run);
if (ret)
return ret;
+   } else if (run->exit_reason == KVM_EXIT_ARM_NISV) {
+   kvm_inject_undefined(vcpu);


Just to make sure I understand: Is the expectation here that userspace
could clear the exit reason if it managed to handle the exit? And
otherwise we'd inject an UNDEF on reentry?



Yes, but I think we should change that to an external abort.  I'll test
something and send a proper patch with more clear documentation.


Why not leave the injection to user space in any case? API wise there is
no need to be backwards compatible, as we require the CAP to be enabled,
right?

IMHO it should be 100% a policy decision in user space whether to
emulate and what type of exception to inject, if anything.


The exception has to be something that the trapped instruction can
actually generate. An UNDEF is definitely wrong, as the guest would have
otherwise UNDEF'd at EL1, and KVM would have never seen it. You cannot
deviate from the rule of architecture, and userspace feels like the
wrong place to enforce it.


There are multiple viable options user space has:

  1) Trigger an external abort
  2) Emulate the instruction in user space
  3) Inject a PV mechanism into the guest to emulate the insn inside 
guest space


Why should we treat 1) any different from 2) or 3)? Why is there a "fast 
path" for the external abort, on an exit that is not performance 
critical or has any other reason to get special attention from kernel 
space. All we're doing is add more code in a privileged layer for a case 
that realistically should never occur in the first place.



Alex



Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Ralf Herbrich
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879




Re: [PATCH 1/1] KVM: inject data abort if instruction cannot be decoded

2019-09-06 Thread Alexander Graf



On 06.09.19 10:00, Christoffer Dall wrote:

On Thu, Sep 05, 2019 at 02:09:18PM +0100, Marc Zyngier wrote:

On 05/09/2019 10:22, Christoffer Dall wrote:

On Thu, Sep 05, 2019 at 09:56:44AM +0100, Peter Maydell wrote:

On Thu, 5 Sep 2019 at 09:52, Marc Zyngier  wrote:


On Thu, 05 Sep 2019 09:16:54 +0100,
Peter Maydell  wrote:

This is true, but the problem is that barfing out to userspace
makes it harder to debug the guest because it means that
the VM is immediately destroyed, whereas AIUI if we
inject some kind of exception then (assuming you're set up
to do kernel-debug via gdbstub) you can actually examine
the offending guest code with a debugger because at least
your VM is still around to inspect...


To Christoffer's point, I find the benefit a bit dubious. Yes, you get
an exception, but the instruction that caused it may be completely
legal (store with post-increment, for example), leading to an even
more puzzled developer (that exception should never have been
delivered the first place).


Right, but the combination of "host kernel prints a message
about an unsupported load/store insn" and "within-guest debug
dump/stack trace/etc" is much more useful than just having
"host kernel prints message" and "QEMU exits"; and it requires
about 3 lines of code change...


I'm far more in favour of dumping the state of the access in the run
structure (much like we do for a MMIO access) and let userspace do
something about it (such as dumping information on the console or
breaking). It could even inject an exception *if* the user has asked
for it.


...whereas this requires agreement on a kernel-userspace API,
larger changes in the kernel, somebody to implement the userspace
side of things, and the user to update both the kernel and QEMU.
It's hard for me to see that the benefit here over the 3-line
approach really outweighs the extra effort needed. In practice
saying "we should do this" is saying "we're going to do nothing",
based on the historical record.



How about something like the following (completely untested, liable for
ABI discussions etc. etc., but for illustration purposes).

I think it raises the question (and likely many other) of whether we can
break the existing 'ABI' and change behavior for missing ISV
retrospectively for legacy user space when the issue has occurred?

Someone might have written code that reacts to the -ENOSYS, so I've

taken the conservative approach for this for the time being.


diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 8a37c8e89777..19a92c49039c 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -76,6 +76,14 @@ struct kvm_arch {
  
  	/* Mandated version of PSCI */

u32 psci_version;
+
+   /*
+* If we encounter a data abort without valid instruction syndrome
+* information, report this to user space.  User space can (and
+* should) opt in to this feature if KVM_CAP_ARM_NISV_TO_USER is
+* supported.
+*/
+   bool return_nisv_io_abort_to_user;
  };
  
  #define KVM_NR_MEM_OBJS 40

diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index f656169db8c3..019bc560edc1 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -83,6 +83,14 @@ struct kvm_arch {
  
  	/* Mandated version of PSCI */

u32 psci_version;
+
+   /*
+* If we encounter a data abort without valid instruction syndrome
+* information, report this to user space.  User space can (and
+* should) opt in to this feature if KVM_CAP_ARM_NISV_TO_USER is
+* supported.
+*/
+   bool return_nisv_io_abort_to_user;
  };
  
  #define KVM_NR_MEM_OBJS 40

diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 5e3f12d5359e..a4dd004d0db9 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -235,6 +235,7 @@ struct kvm_hyperv_exit {
  #define KVM_EXIT_S390_STSI25
  #define KVM_EXIT_IOAPIC_EOI   26
  #define KVM_EXIT_HYPERV   27
+#define KVM_EXIT_ARM_NISV 28
  
  /* For KVM_EXIT_INTERNAL_ERROR */

  /* Emulate instruction failed. */
@@ -996,6 +997,7 @@ struct kvm_ppc_resize_hpt {
  #define KVM_CAP_ARM_PTRAUTH_ADDRESS 171
  #define KVM_CAP_ARM_PTRAUTH_GENERIC 172
  #define KVM_CAP_PMU_EVENT_FILTER 173
+#define KVM_CAP_ARM_NISV_TO_USER 174
  
  #ifdef KVM_CAP_IRQ_ROUTING
  
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c

index 35a069815baf..2ce94bd9d4a9 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -98,6 +98,26 @@ int kvm_arch_check_processor_compat(void)
return 0;
  }
  
+int kvm_vm_ioctl_enable_cap(struct kvm *kvm,

+   struct kvm_enable_cap *cap)
+{
+   int r;
+
+   if (cap->flags)
+   return -EINVAL;
+
+   switch (cap->cap) {
+   case KVM_CAP_ARM_NISV_TO_USER:
+   r = 0;
+   

Re: [UNVERIFIED SENDER] Re: [PATCH 0/9] arm64: Stolen time support

2019-08-14 Thread Alexander Graf




On 14.08.19 16:19, Marc Zyngier wrote:

On Wed, 14 Aug 2019 14:02:25 +0100,
Alexander Graf  wrote:




On 05.08.19 15:06, Steven Price wrote:

On 03/08/2019 19:05, Marc Zyngier wrote:

On Fri,  2 Aug 2019 15:50:08 +0100
Steven Price  wrote:

Hi Steven,


This series add support for paravirtualized time for arm64 guests and
KVM hosts following the specification in Arm's document DEN 0057A:

https://developer.arm.com/docs/den0057/a

It implements support for stolen time, allowing the guest to
identify time when it is forcibly not executing.

It doesn't implement support for Live Physical Time (LPT) as there are
some concerns about the overheads and approach in the above
specification, and I expect an updated version of the specification to
be released soon with just the stolen time parts.


Thanks for posting this.

My current concern with this series is around the fact that we allocate
memory from the kernel on behalf of the guest. It is the first example
of such thing in the ARM port, and I can't really say I'm fond of it.

x86 seems to get away with it by having the memory allocated from
userspace, why I tend to like more. Yes, put_user is more
expensive than a straight store, but this isn't done too often either.

What is the rational for your current approach?


As I see it there are 3 approaches that can be taken here:

1. Hypervisor allocates memory and adds it to the virtual machine. This
means that everything to do with the 'device' is encapsulated behind the
KVM_CREATE_DEVICE / KVM_[GS]ET_DEVICE_ATTR ioctls. But since we want the
stolen time structure to be fast it cannot be a trapping region and has
to be backed by real memory - in this case allocated by the host kernel.

2. Host user space allocates memory. Similar to above, but this time
user space needs to manage the memory region as well as the usual
KVM_CREATE_DEVICE dance. I've no objection to this, but it means
kvmtool/QEMU needs to be much more aware of what is going on (e.g. how
to size the memory region).


You ideally want to get the host overhead for a VM to as little as you
can. I'm not terribly fond of the idea of reserving a full page just
because we're too afraid of having the guest donate memory.


Well, reduce the amount of memory you give to the guest by one page,
and allocate that page to the stolen time device. Problem solved!

Seriously, if you're worried about the allocation of a single page,
you should first look at how many holes we have in the vcpu structure,
for example (even better, with the 8.4 NV patches applied). Just
fixing that would give you that page back *per vcpu*.


I'm worried about additional memory slots, about fragmenting the 
cachable guest memory regions, about avoidable HV taxes.


I think we need to distinguish here between the KVM implementation and 
the hypervisor/guest interface. Just because in KVM we can save overhead 
today doesn't mean that the HV interface should be built around the 
assumption that "memory is free".





3. Guest kernel "donates" the memory to the hypervisor for the
structure. As far as I'm aware this is what x86 does. The problems I see
this approach are:

   a) kexec becomes much more tricky - there needs to be a disabling
mechanism for the guest to stop the hypervisor scribbling on memory
before starting the new kernel.


I wouldn't call "quiesce a device" much more tricky. We have to do
that for other devices as well today.


And since there is no standard way of doing it, we keep inventing
weird and wonderful ways of doing so -- cue the terrible GICv3 LPI
situation, and all the various hacks to keep existing IOMMU mappings
around across firmware/kernel handovers as well as kexec.


Well, the good news here is that we don't have to keep it around ;).






   b) If there is more than one entity that is interested in the
information (e.g. firmware and kernel) then this requires some form of
arbitration in the guest because the hypervisor doesn't want to have to
track an arbitrary number of regions to update.


Why would FW care?


Exactly. It doesn't care. Not caring means it doesn't know about the
page the guest has allocated for stolen time, and starts using it for
its own purposes. Hello, memory corruption. Same thing goes if you
reboot into a non stolen time aware kernel.


If you reboot, you go via the vcpu reset path which clears the map, no? 
Same goes for FW entry. If you enter firmware that does not set up the 
map, you never see it.







   c) Performance can suffer if the host kernel doesn't have a suitably
aligned/sized area to use. As you say - put_user() is more expensive.


Just define the interface to always require natural alignment when
donating a memory location?


The structure is updated on every return to the VM.


If you really do suffer from put_user(), there are alternatives. You
could just map the page on the registration hcall and then leave it
pinned until the vcpu gets destroyed again.


put_user() sho

Re: [PATCH 0/9] arm64: Stolen time support

2019-08-14 Thread Alexander Graf




On 05.08.19 15:06, Steven Price wrote:

On 03/08/2019 19:05, Marc Zyngier wrote:

On Fri,  2 Aug 2019 15:50:08 +0100
Steven Price  wrote:

Hi Steven,


This series add support for paravirtualized time for arm64 guests and
KVM hosts following the specification in Arm's document DEN 0057A:

https://developer.arm.com/docs/den0057/a

It implements support for stolen time, allowing the guest to
identify time when it is forcibly not executing.

It doesn't implement support for Live Physical Time (LPT) as there are
some concerns about the overheads and approach in the above
specification, and I expect an updated version of the specification to
be released soon with just the stolen time parts.


Thanks for posting this.

My current concern with this series is around the fact that we allocate
memory from the kernel on behalf of the guest. It is the first example
of such thing in the ARM port, and I can't really say I'm fond of it.

x86 seems to get away with it by having the memory allocated from
userspace, why I tend to like more. Yes, put_user is more
expensive than a straight store, but this isn't done too often either.

What is the rational for your current approach?


As I see it there are 3 approaches that can be taken here:

1. Hypervisor allocates memory and adds it to the virtual machine. This
means that everything to do with the 'device' is encapsulated behind the
KVM_CREATE_DEVICE / KVM_[GS]ET_DEVICE_ATTR ioctls. But since we want the
stolen time structure to be fast it cannot be a trapping region and has
to be backed by real memory - in this case allocated by the host kernel.

2. Host user space allocates memory. Similar to above, but this time
user space needs to manage the memory region as well as the usual
KVM_CREATE_DEVICE dance. I've no objection to this, but it means
kvmtool/QEMU needs to be much more aware of what is going on (e.g. how
to size the memory region).


You ideally want to get the host overhead for a VM to as little as you 
can. I'm not terribly fond of the idea of reserving a full page just 
because we're too afraid of having the guest donate memory.




3. Guest kernel "donates" the memory to the hypervisor for the
structure. As far as I'm aware this is what x86 does. The problems I see
this approach are:

  a) kexec becomes much more tricky - there needs to be a disabling
mechanism for the guest to stop the hypervisor scribbling on memory
before starting the new kernel.


I wouldn't call "quiesce a device" much more tricky. We have to do that 
for other devices as well today.



  b) If there is more than one entity that is interested in the
information (e.g. firmware and kernel) then this requires some form of
arbitration in the guest because the hypervisor doesn't want to have to
track an arbitrary number of regions to update.


Why would FW care?


  c) Performance can suffer if the host kernel doesn't have a suitably
aligned/sized area to use. As you say - put_user() is more expensive.


Just define the interface to always require natural alignment when 
donating a memory location?



The structure is updated on every return to the VM.


If you really do suffer from put_user(), there are alternatives. You 
could just map the page on the registration hcall and then leave it 
pinned until the vcpu gets destroyed again.



Of course x86 does prove the third approach can work, but I'm not sure
which is actually better. Avoid the kexec cancellation requirements was
the main driver of the current approach. Although many of the


I really don't understand the problem with kexec cancellation. Worst 
case, let guest FW set it up for you and propagate only the address down 
via ACPI/DT. That way you can mark the respective memory as reserved too.


But even with a Linux only mechanism, just take a look at 
arch/x86/kernel/kvmclock.c. All they do to remove the map is to hook 
into machine_crash_shutdown() and machine_shutdown().



Alex


conversations about this were also tied up with Live Physical Time which
adds its own complications.

Steve
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm



Re: [PATCH kvm-unit-tests v4] arm: Add PL031 test

2019-07-25 Thread Alexander Graf




On 25.07.19 15:25, Andrew Jones wrote:

On Thu, Jul 25, 2019 at 03:09:49PM +0200, Alexander Graf wrote:

This patch adds a unit test for the PL031 RTC that is used in the virt machine.
It just pokes basic functionality. I've mostly written it to familiarize myself
with the device, but I suppose having the test around does not hurt, as it also
exercises the GIC SPI interrupt path.

Signed-off-by: Alexander Graf 
Reviewed-by: Andrew Jones 

---

v1 -> v2:

   - Use FDT to find base, irq and existence
   - Put isb after timer read
   - Use dist_base for gicv3

v2 -> v3

   - Enable compilation on 32bit ARM target
   - Use ioremap

v3 -> v4:

   - Use dt_pbus_translate_node()
   - Make irq_triggered volatile
---
  arm/Makefile.common |   1 +
  arm/pl031.c | 260 
  lib/arm/asm/gic.h   |   1 +
  3 files changed, 262 insertions(+)
  create mode 100644 arm/pl031.c


Thanks for the new version. I have a new nit (below), but my r-b stands
with or without making another change.

[...]


+static int rtc_fdt_init(void)
+{
+   const struct fdt_property *prop;
+   const void *fdt = dt_fdt();
+   struct dt_pbus_reg base;
+   int node, len;
+   u32 *data;
+
+   node = fdt_node_offset_by_compatible(fdt, -1, "arm,pl031");
+   if (node < 0)
+   return -1;
+
+   prop = fdt_get_property(fdt, node, "interrupts", );
+   assert(prop && len == (3 * sizeof(u32)));
+   data = (u32 *)prop->data;
+   assert(data[0] == 0); /* SPI */
+   pl031_irq = SPI(fdt32_to_cpu(data[1]));
+
+   assert(!dt_pbus_translate_node(node, 0, ));


We prefer to do something like

  ret = dt_pbus_translate_node(node, 0, );
  assert(!ret);

than the above, just in case we ever compiled with assert() defined as a
no-op. But the probability of doing that is pretty close to zero.


Yeah, but before someone wastes an hour of debugging on this later, 
let's fix it right away. Thanks for catching it! :)



Alex
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH kvm-unit-tests v2] arm: Add PL031 test

2019-07-12 Thread Alexander Graf
This patch adds a unit test for the PL031 RTC that is used in the virt machine.
It just pokes basic functionality. I've mostly written it to familiarize myself
with the device, but I suppose having the test around does not hurt, as it also
exercises the GIC SPI interrupt path.

Signed-off-by: Alexander Graf 

---

v1 -> v2:

  - Use FDT to find base, irq and existence
  - Put isb after timer read
  - Use dist_base for gicv3
---
 arm/Makefile.common |   1 +
 arm/pl031.c | 265 
 lib/arm/asm/gic.h   |   1 +
 3 files changed, 267 insertions(+)
 create mode 100644 arm/pl031.c

diff --git a/arm/Makefile.common b/arm/Makefile.common
index f0c4b5d..b8988f2 100644
--- a/arm/Makefile.common
+++ b/arm/Makefile.common
@@ -11,6 +11,7 @@ tests-common += $(TEST_DIR)/pmu.flat
 tests-common += $(TEST_DIR)/gic.flat
 tests-common += $(TEST_DIR)/psci.flat
 tests-common += $(TEST_DIR)/sieve.flat
+tests-common += $(TEST_DIR)/pl031.flat
 
 tests-all = $(tests-common) $(tests)
 all: directories $(tests-all)
diff --git a/arm/pl031.c b/arm/pl031.c
new file mode 100644
index 000..d975937
--- /dev/null
+++ b/arm/pl031.c
@@ -0,0 +1,265 @@
+/*
+ * Verify PL031 functionality
+ *
+ * This test verifies whether the emulated PL031 behaves correctly.
+ *
+ * Copyright 2019 Amazon.com, Inc. or its affiliates.
+ * Author: Alexander Graf 
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct pl031_regs {
+   uint32_t dr;/* Data Register */
+   uint32_t mr;/* Match Register */
+   uint32_t lr;/* Load Register */
+   union {
+   uint8_t cr; /* Control Register */
+   uint32_t cr32;
+   };
+   union {
+   uint8_t imsc;   /* Interrupt Mask Set or Clear register */
+   uint32_t imsc32;
+   };
+   union {
+   uint8_t ris;/* Raw Interrupt Status */
+   uint32_t ris32;
+   };
+   union {
+   uint8_t mis;/* Masked Interrupt Status */
+   uint32_t mis32;
+   };
+   union {
+   uint8_t icr;/* Interrupt Clear Register */
+   uint32_t icr32;
+   };
+   uint32_t reserved[1008];
+   uint32_t periph_id[4];
+   uint32_t pcell_id[4];
+};
+
+static u32 cntfrq;
+static struct pl031_regs *pl031;
+static int pl031_irq;
+static void *gic_ispendr;
+static void *gic_isenabler;
+static bool irq_triggered;
+
+static uint64_t read_timer(void)
+{
+   uint64_t r = read_sysreg(cntpct_el0);
+   isb();
+
+   return r;
+}
+
+static int check_id(void)
+{
+   uint32_t id[] = { 0x31, 0x10, 0x14, 0x00, 0x0d, 0xf0, 0x05, 0xb1 };
+   int i;
+
+   for (i = 0; i < ARRAY_SIZE(id); i++)
+   if (id[i] != readl(>periph_id[i]))
+   return 1;
+
+   return 0;
+}
+
+static int check_ro(void)
+{
+   uint32_t offs[] = { offsetof(struct pl031_regs, ris),
+   offsetof(struct pl031_regs, mis),
+   offsetof(struct pl031_regs, periph_id[0]),
+   offsetof(struct pl031_regs, periph_id[1]),
+   offsetof(struct pl031_regs, periph_id[2]),
+   offsetof(struct pl031_regs, periph_id[3]),
+   offsetof(struct pl031_regs, pcell_id[0]),
+   offsetof(struct pl031_regs, pcell_id[1]),
+   offsetof(struct pl031_regs, pcell_id[2]),
+   offsetof(struct pl031_regs, pcell_id[3]) };
+   int i;
+
+   for (i = 0; i < ARRAY_SIZE(offs); i++) {
+   uint32_t before32;
+   uint16_t before16;
+   uint8_t before8;
+   void *addr = (void*)pl031 + offs[i];
+   uint32_t poison = 0xdeadbeefULL;
+
+   before8 = readb(addr);
+   before16 = readw(addr);
+   before32 = readl(addr);
+
+   writeb(poison, addr);
+   writew(poison, addr);
+   writel(poison, addr);
+
+   if (before8 != readb(addr))
+   return 1;
+   if (before16 != readw(addr))
+   return 1;
+   if (before32 != readl(addr))
+   return 1;
+   }
+
+   return 0;
+}
+
+static int check_rtc_freq(void)
+{
+   uint32_t seconds_to_wait = 2;
+   uint32_t before = readl(>dr);
+   uint64_t before_tick = read_timer();
+   uint64_t target_tick = before_tick + (cntfrq * seconds_to_wait);
+
+   /* Wait for 2 seconds */
+   while (read_timer() < target_tick) ;
+
+   if (readl(>dr) != before + seconds_to_wait)
+   return 1;
+
+   return 0;
+}
+
+static bool gic_irq_pending(void)
+{
+   uint32_t offset = (pl031_irq / 32) * 4;
+
+   return readl(gic_i

Re: [PATCH kvm-unit-tests] arm: Add PL031 test

2019-07-12 Thread Alexander Graf




On 10.07.19 16:25, Marc Zyngier wrote:

Hi Alex,

I don't know much about pl031, so my comments are pretty general...

On 10/07/2019 14:27, Alexander Graf wrote:

This patch adds a unit test for the PL031 RTC that is used in the virt machine.
It just pokes basic functionality. I've mostly written it to familiarize myself
with the device, but I suppose having the test around does not hurt, as it also
exercises the GIC SPI interrupt path.

Signed-off-by: Alexander Graf 
---
  arm/Makefile.common |   1 +
  arm/pl031.c | 227 
  lib/arm/asm/gic.h   |   1 +
  3 files changed, 229 insertions(+)
  create mode 100644 arm/pl031.c

diff --git a/arm/Makefile.common b/arm/Makefile.common
index f0c4b5d..b8988f2 100644
--- a/arm/Makefile.common
+++ b/arm/Makefile.common
@@ -11,6 +11,7 @@ tests-common += $(TEST_DIR)/pmu.flat
  tests-common += $(TEST_DIR)/gic.flat
  tests-common += $(TEST_DIR)/psci.flat
  tests-common += $(TEST_DIR)/sieve.flat
+tests-common += $(TEST_DIR)/pl031.flat
  
  tests-all = $(tests-common) $(tests)

  all: directories $(tests-all)
diff --git a/arm/pl031.c b/arm/pl031.c
new file mode 100644
index 000..a364a1a
--- /dev/null
+++ b/arm/pl031.c
@@ -0,0 +1,227 @@
+/*
+ * Verify PL031 functionality
+ *
+ * This test verifies whether the emulated PL031 behaves correctly.
+ *
+ * Copyright 2019 Amazon.com, Inc. or its affiliates.
+ * Author: Alexander Graf 
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2.
+ */
+#include 
+#include 
+#include 
+#include 
+
+static u32 cntfrq;
+
+#define PL031_BASE 0x0901
+#define PL031_IRQ 2


Does the unit test framework have a way to extract this from the firmware 
tables?
That'd be much nicer...


It does, I've moved it to that now :).




+
+struct pl031_regs {
+   uint32_t dr;/* Data Register */
+   uint32_t mr;/* Match Register */
+   uint32_t lr;/* Load Register */
+   union {
+   uint8_t cr; /* Control Register */
+   uint32_t cr32;
+   };
+   union {
+   uint8_t imsc;   /* Interrupt Mask Set or Clear register */
+   uint32_t imsc32;
+   };
+   union {
+   uint8_t ris;/* Raw Interrupt Status */
+   uint32_t ris32;
+   };
+   union {
+   uint8_t mis;/* Masked Interrupt Status */
+   uint32_t mis32;
+   };
+   union {
+   uint8_t icr;/* Interrupt Clear Register */
+   uint32_t icr32;
+   };
+   uint32_t reserved[1008];
+   uint32_t periph_id[4];
+   uint32_t pcell_id[4];
+};
+
+static struct pl031_regs *pl031 = (void*)PL031_BASE;
+static void *gic_ispendr;
+static void *gic_isenabler;
+static bool irq_triggered;
+
+static int check_id(void)
+{
+   uint32_t id[] = { 0x31, 0x10, 0x14, 0x00, 0x0d, 0xf0, 0x05, 0xb1 };
+   int i;
+
+   for (i = 0; i < ARRAY_SIZE(id); i++)
+   if (id[i] != readl(>periph_id[i]))
+   return 1;
+
+   return 0;
+}
+
+static int check_ro(void)
+{
+   uint32_t offs[] = { offsetof(struct pl031_regs, ris),
+   offsetof(struct pl031_regs, mis),
+   offsetof(struct pl031_regs, periph_id[0]),
+   offsetof(struct pl031_regs, periph_id[1]),
+   offsetof(struct pl031_regs, periph_id[2]),
+   offsetof(struct pl031_regs, periph_id[3]),
+   offsetof(struct pl031_regs, pcell_id[0]),
+   offsetof(struct pl031_regs, pcell_id[1]),
+   offsetof(struct pl031_regs, pcell_id[2]),
+   offsetof(struct pl031_regs, pcell_id[3]) };
+   int i;
+
+   for (i = 0; i < ARRAY_SIZE(offs); i++) {
+   uint32_t before32;
+   uint16_t before16;
+   uint8_t before8;
+   void *addr = (void*)pl031 + offs[i];
+   uint32_t poison = 0xdeadbeefULL;
+
+   before8 = readb(addr);
+   before16 = readw(addr);
+   before32 = readl(addr);
+
+   writeb(poison, addr);
+   writew(poison, addr);
+   writel(poison, addr);
+
+   if (before8 != readb(addr))
+   return 1;
+   if (before16 != readw(addr))
+   return 1;
+   if (before32 != readl(addr))
+   return 1;
+   }
+
+   return 0;
+}
+
+static int check_rtc_freq(void)
+{
+   uint32_t seconds_to_wait = 2;
+   uint32_t before = readl(>dr);
+   uint64_t before_tick = read_sysreg(cntpct_el0);


Hmmm. See below.


+   uint64_t target_tick = before_tick + (cntfrq * seconds_to_wait);
+
+   /* Wait for 2 seconds */
+   while (read_sysreg(cntpct_el0) < target_tick) ;


Careful here. The control dependency saves you, bu

Re: [PATCH kvm-unit-tests] arm: Add PL031 test

2019-07-11 Thread Alexander Graf




On 11.07.19 11:42, Andre Przywara wrote:

On Thu, 11 Jul 2019 09:52:42 +0200
Paolo Bonzini  wrote:

Hi,


On 11/07/19 07:49, Alexander Graf wrote:

I agree that it would belong more in qtest, but tests in not exactly the
right place is better than no tests.


The problem with qtest is that it tests QEMU device models from a QEMU
internal view.


Not really: fundamentally it tests QEMU device models with stimuli that
come from another process in the host, rather than code that runs in a
guest.  It does have hooks into QEMU's internal view (mostly to
intercept interrupts and advance the clocks), but the main feature of
the protocol is the ability to do memory reads and writes.


I am much more interested in the guest visible side of things. If
kvmtool wanted to implement a PL031, it should be able to execute the
same test that we run against QEMU, no?


One of the design goals of kvmtool is to get away with as little emulation
as possible, in favour of paravirtualisation (so it's just virtio and not
IDE/flash). So a hardware RTC emulation sounds dispensable in this context.


The main reason to have a PL031 exposed to a VM is to make OVMF happy, 
so that it can provide wall clock time runtime services. I suppose that 
sooner or later you may want to run OVMF in kvmtool as well, no?


The alternative to the PL031 here is to do a PV interface, yes. I'm not 
really convinced that that would be any easier though. The PL031 is a 
very trivial device. The only real downside is that it will wrap around 
in 2038.



Alex
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v3 7/9] KVM: arm/arm64: Only clean the dcache on translation fault

2018-09-01 Thread Alexander Graf


On 23.08.18 14:43, Marc Zyngier wrote:
> On 23/08/18 13:24, Alexander Graf wrote:
>> On 08/23/2018 01:16 PM, Marc Zyngier wrote:
>>> On 21/08/18 17:54, Alexander Graf wrote:
>>>> On 08/21/2018 05:08 PM, Marc Zyngier wrote:
>>>>> On 21/08/18 15:08, Alexander Graf wrote:
>>>>>> On 08/21/2018 03:57 PM, Marc Zyngier wrote:
>>>>>>> On 21/08/18 14:35, Alexander Graf wrote:
>>>>>>>> On 10/23/2017 06:11 PM, Marc Zyngier wrote:
>>>>>>>>> The only case where we actually need to perform a dcache maintenance
>>>>>>>>> is when we map the page for the first time, and subsequent permission
>>>>>>>>> faults do not require cache maintenance. Let's make it conditional
>>>>>>>>> on not being a permission fault (and thus a translation fault).
>>>>>>>>>
>>>>>>>>> Reviewed-by: Christoffer Dall 
>>>>>>>>> Signed-off-by: Marc Zyngier 
>>>>>>>> This patch unfortunately breaks something on Hi1616 SoCs when running
>>>>>>>> 32bit guests. With this patch applied (and thus with 4.18) I get random
>>>>>>>> illegal instruction warnings from 32bit code inside VMs. I do not know
>>>>>>>> at this point whether this affects other CPUs as well.
>>>>>>> Can you please give a few more details?
>>>>>>>
>>>>>>> - what are the CPUs on this Hi1616? At least a /proc/cpuinfo would help
>>>>>> These are A72s:
>>>>>>
>>>>>> processor    : 0
>>>>>> BogoMIPS    : 100.00
>>>>>> Features    : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
>>>>>> CPU implementer    : 0x41
>>>>>> CPU architecture: 8
>>>>>> CPU variant    : 0x0
>>>>>> CPU part    : 0xd08
>>>>>> CPU revision    : 2
>>>>>>
>>>>>>> - an example of the crash? Is it within the decompressor? After? This
>>>>>>> things do matter, given the number of crazy things the 32bit kernel does
>>>>>> They are always in user space. My current reproducer is this:
>>>>>>
>>>>>>  $ while rpm -qa > /dev/null; do :; done
>>>>>>
>>>>>> If I run this in parallel with something that just populates RAM (dd
>>>>>> if=/dev/nvme0n1 of=/dev/null bs=10G) I get an illegal instruction fault
>>>>>> within seconds:
>>>>>>
>>>>>> sh-4.4# while rpm -qa > /dev/null; do true; done
>>>>>> Illegal instruction (core dumped)
>>>>>>
>>>>>>
>>>>>>> - a host kernel configuration?
>>>>>> Host kernel configuration is just the normal openSUSE one:
>>>>>>
>>>>>> https://kernel.opensuse.org/cgit/kernel-source/plain/config/arm64/default?h=stable
>>>>>>
>>>>>>>> If anyone is interested in a reproducer, I have something handy. But 
>>>>>>>> for
>>>>>>>> now I believe we should just revert this patch.
>>>>>>> Before we revert anything, I'd like to understand what is happening.
>>>>>> Yeah, I didn't realize the commit is already in since 4.16, so I agree.
>>>>>> I'll bisect a bit, but it may take a while.
>>>>> Do you mind giving this a try?
>>>>>
>>>>> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
>>>>> index 1d90d79706bd..df8f3d5eaa22 100644
>>>>> --- a/virt/kvm/arm/mmu.c
>>>>> +++ b/virt/kvm/arm/mmu.c
>>>>> @@ -1531,7 +1536,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
>>>>> phys_addr_t fault_ipa,
>>>>>   kvm_set_pfn_dirty(pfn);
>>>>>   }
>>>>>
>>>>> - if (fault_status != FSC_PERM)
>>>>> + if (fault_status != FSC_PERM || write_fault)
>>>>>   clean_dcache_guest_page(pfn, PMD_SIZE);
>>>>>
>>>>>   if (exec_fault) {
>>>>> @@ -1553,7 +1558,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
>>>>> phys_addr_t fault_ipa,
>>>>>   mark_page_dirty(kvm, gfn);
>>>>>

Re: [PATCH v3 7/9] KVM: arm/arm64: Only clean the dcache on translation fault

2018-08-23 Thread Alexander Graf

On 08/23/2018 01:16 PM, Marc Zyngier wrote:

On 21/08/18 17:54, Alexander Graf wrote:

On 08/21/2018 05:08 PM, Marc Zyngier wrote:

On 21/08/18 15:08, Alexander Graf wrote:

On 08/21/2018 03:57 PM, Marc Zyngier wrote:

On 21/08/18 14:35, Alexander Graf wrote:

On 10/23/2017 06:11 PM, Marc Zyngier wrote:

The only case where we actually need to perform a dcache maintenance
is when we map the page for the first time, and subsequent permission
faults do not require cache maintenance. Let's make it conditional
on not being a permission fault (and thus a translation fault).

Reviewed-by: Christoffer Dall 
Signed-off-by: Marc Zyngier 

This patch unfortunately breaks something on Hi1616 SoCs when running
32bit guests. With this patch applied (and thus with 4.18) I get random
illegal instruction warnings from 32bit code inside VMs. I do not know
at this point whether this affects other CPUs as well.

Can you please give a few more details?

- what are the CPUs on this Hi1616? At least a /proc/cpuinfo would help

These are A72s:

processor    : 0
BogoMIPS    : 100.00
Features    : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
CPU implementer    : 0x41
CPU architecture: 8
CPU variant    : 0x0
CPU part    : 0xd08
CPU revision    : 2


- an example of the crash? Is it within the decompressor? After? This
things do matter, given the number of crazy things the 32bit kernel does

They are always in user space. My current reproducer is this:

     $ while rpm -qa > /dev/null; do :; done

If I run this in parallel with something that just populates RAM (dd
if=/dev/nvme0n1 of=/dev/null bs=10G) I get an illegal instruction fault
within seconds:

sh-4.4# while rpm -qa > /dev/null; do true; done
Illegal instruction (core dumped)



- a host kernel configuration?

Host kernel configuration is just the normal openSUSE one:

https://kernel.opensuse.org/cgit/kernel-source/plain/config/arm64/default?h=stable


If anyone is interested in a reproducer, I have something handy. But for
now I believe we should just revert this patch.

Before we revert anything, I'd like to understand what is happening.

Yeah, I didn't realize the commit is already in since 4.16, so I agree.
I'll bisect a bit, but it may take a while.

Do you mind giving this a try?

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 1d90d79706bd..df8f3d5eaa22 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1531,7 +1536,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
phys_addr_t fault_ipa,
kvm_set_pfn_dirty(pfn);
}
   
-		if (fault_status != FSC_PERM)

+   if (fault_status != FSC_PERM || write_fault)
clean_dcache_guest_page(pfn, PMD_SIZE);
   
   		if (exec_fault) {

@@ -1553,7 +1558,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
phys_addr_t fault_ipa,
mark_page_dirty(kvm, gfn);
}
   
-		if (fault_status != FSC_PERM)

+   if (fault_status != FSC_PERM || write_fault)
clean_dcache_guest_page(pfn, PAGE_SIZE);
   
   		if (exec_fault) {



The missing logic is that a write from the guest could have triggered
a CoW, meaning we definitely need to flush it in that case too. It
fixes a kvm-unit-test regression here.

This patch unfortunately does not fix the issue. I still see illegal
instructions.

[+Dave]

That's because what you're observing has nothing to do with caching, but
with FP/SIMD trapping instead. Thanks to the guest image you've provided,
I've been able to extract the following:

[3.944273] systemd-udevd (266): undefined instruction: pc=(ptrval)
[3.945396] CPU: 0 PID: 266 Comm: systemd-udevd Not tainted 4.17.14-2-lpae 
#1 openSUSE Tumbleweed (unreleased)
[3.947130] Hardware name: Generic DT based system
[3.947976] PC is at 0xb6b9397a
[3.948547] LR is at 0xb6e9a1b0
[3.958291] pc : []lr : []psr: 20070030
[3.959664] sp : bebe36a0  ip : b6f6ad50  fp : 005e8784
[3.960601] r10: bebe3814  r9 : bebe36c0  r8 : bebe36bc
[3.961522] r7 : bebe3814  r6 : 0074  r5 :   r4 : 
[3.962661] r3 : 005f2b60  r2 : 0074  r1 :   r0 : bebe3814
[3.963801] Flags: nzCv  IRQs on  FIQs on  Mode USER_32  ISA Thumb  Segment 
user
[4.000606] Control: 30c5383d  Table: 6a8e7400  DAC: fffd
[4.001659] Code: d1f9 f1a0 0001 4770 (eee0) 1b10

maz@flakes:~/armv7-build-fail$ (echo .thumb; echo .inst.w 0xeee01b10) > x.S &&  
arm-linux-gnueabihf-gcc -c x.S && arm-linux-gnueabihf-objdump -d x.o

x.o: file format elf32-littlearm


Disassembly of section .text:

 <.text>:
0:  eee0 1b10   vdup.8  q0, r1

A VFP instruction. Given that you've reported that things worked in
4.17 and broke in 4.18, I strongly suspect the new lazy FPSIMD code.
Upon inspection, the way we setup trapping for 32bit is a tiny bit
suspect.

Could you please give this patch a go? My Seattle has been run

Re: [PATCH v3 7/9] KVM: arm/arm64: Only clean the dcache on translation fault

2018-08-21 Thread Alexander Graf

On 08/21/2018 05:08 PM, Marc Zyngier wrote:

On 21/08/18 15:08, Alexander Graf wrote:

On 08/21/2018 03:57 PM, Marc Zyngier wrote:

On 21/08/18 14:35, Alexander Graf wrote:

On 10/23/2017 06:11 PM, Marc Zyngier wrote:

The only case where we actually need to perform a dcache maintenance
is when we map the page for the first time, and subsequent permission
faults do not require cache maintenance. Let's make it conditional
on not being a permission fault (and thus a translation fault).

Reviewed-by: Christoffer Dall 
Signed-off-by: Marc Zyngier 

This patch unfortunately breaks something on Hi1616 SoCs when running
32bit guests. With this patch applied (and thus with 4.18) I get random
illegal instruction warnings from 32bit code inside VMs. I do not know
at this point whether this affects other CPUs as well.

Can you please give a few more details?

- what are the CPUs on this Hi1616? At least a /proc/cpuinfo would help

These are A72s:

processor    : 0
BogoMIPS    : 100.00
Features    : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
CPU implementer    : 0x41
CPU architecture: 8
CPU variant    : 0x0
CPU part    : 0xd08
CPU revision    : 2


- an example of the crash? Is it within the decompressor? After? This
things do matter, given the number of crazy things the 32bit kernel does

They are always in user space. My current reproducer is this:

    $ while rpm -qa > /dev/null; do :; done

If I run this in parallel with something that just populates RAM (dd
if=/dev/nvme0n1 of=/dev/null bs=10G) I get an illegal instruction fault
within seconds:

sh-4.4# while rpm -qa > /dev/null; do true; done
Illegal instruction (core dumped)



- a host kernel configuration?

Host kernel configuration is just the normal openSUSE one:

https://kernel.opensuse.org/cgit/kernel-source/plain/config/arm64/default?h=stable


If anyone is interested in a reproducer, I have something handy. But for
now I believe we should just revert this patch.

Before we revert anything, I'd like to understand what is happening.

Yeah, I didn't realize the commit is already in since 4.16, so I agree.
I'll bisect a bit, but it may take a while.

Do you mind giving this a try?

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 1d90d79706bd..df8f3d5eaa22 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1531,7 +1536,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
phys_addr_t fault_ipa,
kvm_set_pfn_dirty(pfn);
}
  
-		if (fault_status != FSC_PERM)

+   if (fault_status != FSC_PERM || write_fault)
clean_dcache_guest_page(pfn, PMD_SIZE);
  
  		if (exec_fault) {

@@ -1553,7 +1558,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
phys_addr_t fault_ipa,
mark_page_dirty(kvm, gfn);
}
  
-		if (fault_status != FSC_PERM)

+   if (fault_status != FSC_PERM || write_fault)
clean_dcache_guest_page(pfn, PAGE_SIZE);
  
  		if (exec_fault) {



The missing logic is that a write from the guest could have triggered
a CoW, meaning we definitely need to flush it in that case too. It
fixes a kvm-unit-test regression here.


This patch unfortunately does not fix the issue. I still see illegal 
instructions.



Alex

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v3 7/9] KVM: arm/arm64: Only clean the dcache on translation fault

2018-08-21 Thread Alexander Graf

On 08/21/2018 04:08 PM, Alexander Graf wrote:

On 08/21/2018 03:57 PM, Marc Zyngier wrote:

On 21/08/18 14:35, Alexander Graf wrote:

On 10/23/2017 06:11 PM, Marc Zyngier wrote:

The only case where we actually need to perform a dcache maintenance
is when we map the page for the first time, and subsequent permission
faults do not require cache maintenance. Let's make it conditional
on not being a permission fault (and thus a translation fault).

Reviewed-by: Christoffer Dall 
Signed-off-by: Marc Zyngier 

This patch unfortunately breaks something on Hi1616 SoCs when running
32bit guests. With this patch applied (and thus with 4.18) I get random
illegal instruction warnings from 32bit code inside VMs. I do not know
at this point whether this affects other CPUs as well.

Can you please give a few more details?

- what are the CPUs on this Hi1616? At least a /proc/cpuinfo would help


These are A72s:

processor    : 0
BogoMIPS    : 100.00
Features    : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
CPU implementer    : 0x41
CPU architecture: 8
CPU variant    : 0x0
CPU part    : 0xd08
CPU revision    : 2



- an example of the crash? Is it within the decompressor? After? This
things do matter, given the number of crazy things the 32bit kernel does


They are always in user space. My current reproducer is this:

  $ while rpm -qa > /dev/null; do :; done

If I run this in parallel with something that just populates RAM (dd 
if=/dev/nvme0n1 of=/dev/null bs=10G) I get an illegal instruction 
fault within seconds:


sh-4.4# while rpm -qa > /dev/null; do true; done
Illegal instruction (core dumped)



- a host kernel configuration?


Host kernel configuration is just the normal openSUSE one:

https://kernel.opensuse.org/cgit/kernel-source/plain/config/arm64/default?h=stable 



If anyone is interested in a reproducer, I have something handy. But 
for

now I believe we should just revert this patch.

Before we revert anything, I'd like to understand what is happening.


Yeah, I didn't realize the commit is already in since 4.16, so I 
agree. I'll bisect a bit, but it may take a while.


I'm stuck on bisect. Somewhere in the merge window things broke so badly 
that udev just segfaults on boot. I got this far:


$ git bisect log
git bisect start
# good: [29dcea88779c856c7dc92040a0c01233263101d4] Linux 4.17
git bisect good 29dcea88779c856c7dc92040a0c01233263101d4
# bad: [529bea37411759c2b5b41a187b3020723c67c16d] Linux 4.18.1
git bisect bad 529bea37411759c2b5b41a187b3020723c67c16d
# good: [3036bc45364f98515a2c446d7fac2c34dcfbeff4] Merge tag 
'media/v4.18-2' of 
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media

git bisect good 3036bc45364f98515a2c446d7fac2c34dcfbeff4
# good: [721afaa2aeb860067decdddadc84ed16f42f2048] Merge tag 'armsoc-dt' 
of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

git bisect good 721afaa2aeb860067decdddadc84ed16f42f2048
# good: [e7441c9274a6a5453e06f4c2b8b5f72eca0a3f17] mac80211: disable 
BHs/preemption in ieee80211_tx_control_port()

git bisect good e7441c9274a6a5453e06f4c2b8b5f72eca0a3f17
# bad: [05df204549c510c7c56e58d25098c448998a0cd5] Merge tag 
'devicetree-fixes-for-4.18' of 
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

git bisect bad 05df204549c510c7c56e58d25098c448998a0cd5
# good: [b19b9282093588e73401f9d4981310a8de975f7d] Merge tag 
'riscv-for-linus-4.18-rc4' of 
git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux

git bisect good b19b9282093588e73401f9d4981310a8de975f7d
# good: [bf642e3a1996f1ed8f083c5ecd4b51270a9e11bc] Merge tag 
'drm-intel-fixes-2018-07-12' of 
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

git bisect good bf642e3a1996f1ed8f083c5ecd4b51270a9e11bc
# bad: [2a7e1211e130c51a2b5743ecf247645ac8e936ee] Merge tag 
'vfio-v4.18-rc5' of git://github.com/awilliam/linux-vfio

git bisect bad 2a7e1211e130c51a2b5743ecf247645ac8e936ee
# bad: [a74aa9676c0256eac05efb2c8e3127e0835e66e5] Merge tag 
'char-misc-4.18-rc5' of 
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

git bisect bad a74aa9676c0256eac05efb2c8e3127e0835e66e5
# bad: [f1454959ad89f9fe2b6862fa3c41070feaffeab9] Merge tag 
'mmc-v4.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc

git bisect bad f1454959ad89f9fe2b6862fa3c41070feaffeab9
# bad: [1e09177acae32a61586af26d83ca5ef591cdcaf5] Merge tag 
'mips_fixes_4.18_3' of 
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux

git bisect bad 1e09177acae32a61586af26d83ca5ef591cdcaf5
# good: [4f65245f2d178b9cba48350620d76faa4a098841] HID: hiddev: fix 
potential Spectre v1

git bisect good 4f65245f2d178b9cba48350620d76faa4a098841



Alex

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v3 7/9] KVM: arm/arm64: Only clean the dcache on translation fault

2018-08-21 Thread Alexander Graf

On 08/21/2018 03:57 PM, Marc Zyngier wrote:

On 21/08/18 14:35, Alexander Graf wrote:

On 10/23/2017 06:11 PM, Marc Zyngier wrote:

The only case where we actually need to perform a dcache maintenance
is when we map the page for the first time, and subsequent permission
faults do not require cache maintenance. Let's make it conditional
on not being a permission fault (and thus a translation fault).

Reviewed-by: Christoffer Dall 
Signed-off-by: Marc Zyngier 

This patch unfortunately breaks something on Hi1616 SoCs when running
32bit guests. With this patch applied (and thus with 4.18) I get random
illegal instruction warnings from 32bit code inside VMs. I do not know
at this point whether this affects other CPUs as well.

Can you please give a few more details?

- what are the CPUs on this Hi1616? At least a /proc/cpuinfo would help


These are A72s:

processor    : 0
BogoMIPS    : 100.00
Features    : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
CPU implementer    : 0x41
CPU architecture: 8
CPU variant    : 0x0
CPU part    : 0xd08
CPU revision    : 2



- an example of the crash? Is it within the decompressor? After? This
things do matter, given the number of crazy things the 32bit kernel does


They are always in user space. My current reproducer is this:

  $ while rpm -qa > /dev/null; do :; done

If I run this in parallel with something that just populates RAM (dd 
if=/dev/nvme0n1 of=/dev/null bs=10G) I get an illegal instruction fault 
within seconds:


sh-4.4# while rpm -qa > /dev/null; do true; done
Illegal instruction (core dumped)



- a host kernel configuration?


Host kernel configuration is just the normal openSUSE one:

https://kernel.opensuse.org/cgit/kernel-source/plain/config/arm64/default?h=stable


If anyone is interested in a reproducer, I have something handy. But for
now I believe we should just revert this patch.

Before we revert anything, I'd like to understand what is happening.


Yeah, I didn't realize the commit is already in since 4.16, so I agree. 
I'll bisect a bit, but it may take a while.



Alex

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v3 7/9] KVM: arm/arm64: Only clean the dcache on translation fault

2018-08-21 Thread Alexander Graf

On 08/21/2018 03:35 PM, Alexander Graf wrote:

On 10/23/2017 06:11 PM, Marc Zyngier wrote:

The only case where we actually need to perform a dcache maintenance
is when we map the page for the first time, and subsequent permission
faults do not require cache maintenance. Let's make it conditional
on not being a permission fault (and thus a translation fault).

Reviewed-by: Christoffer Dall 
Signed-off-by: Marc Zyngier 


This patch unfortunately breaks something on Hi1616 SoCs when running 
32bit guests. With this patch applied (and thus with 4.18) I get 
random illegal instruction warnings from 32bit code inside VMs. I do 
not know at this point whether this affects other CPUs as well.


If anyone is interested in a reproducer, I have something handy. But 
for now I believe we should just revert this patch.


Ok, I'm slightly confused. The patch in question is already upstream 
since 4.16, but the regression reportedly came with the switch from 4.17 
to 4.18. I'll try to bisect it down a bit further ...



Alex

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v3 7/9] KVM: arm/arm64: Only clean the dcache on translation fault

2018-08-21 Thread Alexander Graf

On 10/23/2017 06:11 PM, Marc Zyngier wrote:

The only case where we actually need to perform a dcache maintenance
is when we map the page for the first time, and subsequent permission
faults do not require cache maintenance. Let's make it conditional
on not being a permission fault (and thus a translation fault).

Reviewed-by: Christoffer Dall 
Signed-off-by: Marc Zyngier 


This patch unfortunately breaks something on Hi1616 SoCs when running 
32bit guests. With this patch applied (and thus with 4.18) I get random 
illegal instruction warnings from 32bit code inside VMs. I do not know 
at this point whether this affects other CPUs as well.


If anyone is interested in a reproducer, I have something handy. But for 
now I believe we should just revert this patch.



Alex

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v2] KVM: arm/arm64: Handle hva aging while destroying the vm

2017-07-06 Thread Alexander Graf



On 05.07.17 10:57, Suzuki K Poulose wrote:

Hi Alex,

On Wed, Jul 05, 2017 at 08:20:31AM +0200, Alexander Graf wrote:

The kvm_age_hva callback may be called all the way concurrently while
kvm_mmu_notifier_release() is running.

The release function sets kvm->arch.pgd = NULL which the aging function
however implicitly relies on in stage2_get_pud(). That means they can
race and the aging function may dereference a NULL pgd pointer.

This patch adds a check for that case, so that we leave the aging
function silently.

Cc: sta...@vger.kernel.org
Fixes: 293f29363 ("kvm-arm: Unmap shadow pagetables properly")
Signed-off-by: Alexander Graf <ag...@suse.de>

---

v1 -> v2:

   - Fix commit message
   - Add Fixes and stable tags
---
  virt/kvm/arm/mmu.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index f2d5b6c..227931f 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -861,6 +861,10 @@ static pud_t *stage2_get_pud(struct kvm *kvm, struct 
kvm_mmu_memory_cache *cache
pgd_t *pgd;
pud_t *pud;
  
+	/* Do we clash with kvm_free_stage2_pgd()? */

+   if (!kvm->arch.pgd)
+   return NULL;
+


I think this check should be moved up in the chain. We call kvm_age_hva(), with
the kvm->mmu_lock held and we don't release it till we reach here. So, ideally,
if we find the PGD is null when we reach kvm_age_hva(), we could simply return
there, like we do for other call backs from the KVM mmu_notifier.


That probably works too - I'm not sure which version is more consistent 
as well as more maintainable in the long run. I'll leave the call here 
to Christoffer.



Alex
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v2] KVM: arm/arm64: Handle hva aging while destroying the vm

2017-07-05 Thread Alexander Graf
The kvm_age_hva callback may be called all the way concurrently while
kvm_mmu_notifier_release() is running.

The release function sets kvm->arch.pgd = NULL which the aging function
however implicitly relies on in stage2_get_pud(). That means they can
race and the aging function may dereference a NULL pgd pointer.

This patch adds a check for that case, so that we leave the aging
function silently.

Cc: sta...@vger.kernel.org
Fixes: 293f29363 ("kvm-arm: Unmap shadow pagetables properly")
Signed-off-by: Alexander Graf <ag...@suse.de>

---

v1 -> v2:

  - Fix commit message
  - Add Fixes and stable tags
---
 virt/kvm/arm/mmu.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index f2d5b6c..227931f 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -861,6 +861,10 @@ static pud_t *stage2_get_pud(struct kvm *kvm, struct 
kvm_mmu_memory_cache *cache
pgd_t *pgd;
pud_t *pud;
 
+   /* Do we clash with kvm_free_stage2_pgd()? */
+   if (!kvm->arch.pgd)
+   return NULL;
+
pgd = kvm->arch.pgd + stage2_pgd_index(addr);
if (WARN_ON(stage2_pgd_none(*pgd))) {
if (!cache)
-- 
1.8.5.6

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH] KVM: arm/arm64: Handle hva aging while destroying the vm

2017-07-03 Thread Alexander Graf

On 07/03/2017 10:03 AM, Christoffer Dall wrote:

Hi Alex,

On Fri, Jun 23, 2017 at 05:21:59PM +0200, Alexander Graf wrote:

If we want to age an HVA while the VM is getting destroyed, we have a
tiny race window during which we may end up dereferencing an invalid
kvm->arch.pgd value.

CPU0   CPU1

kvm_age_hva()
   kvm_mmu_notifier_release()
   kvm_arch_flush_shadow_all()
   kvm_free_stage2_pgd()
   
stage2_get_pmd()

   set kvm->arch.pgd = 0
   

stage2_get_pud()
arch.pgd>


I don't think this sequence, can happen, but I think kvm_age_hva() can
be called with the mmu_lock held and kvm->pgd already being NULL.

Is that possible for the mmu notifiers to be calling clear(_flush)_young
while also calling notifier_release?


I *think* the aging happens completely orthogonally to release. But 
let's ask Andrea - I'm sure he knows :).



Alex



If so, the patch below looks good to me.

Thanks,
-Christoffer



This patch adds a check for that case.

Signed-off-by: Alexander Graf <ag...@suse.de>
---
  virt/kvm/arm/mmu.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index f2d5b6c..227931f 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -861,6 +861,10 @@ static pud_t *stage2_get_pud(struct kvm *kvm, struct 
kvm_mmu_memory_cache *cache
pgd_t *pgd;
pud_t *pud;
  
+	/* Do we clash with kvm_free_stage2_pgd()? */

+   if (!kvm->arch.pgd)
+   return NULL;
+
pgd = kvm->arch.pgd + stage2_pgd_index(addr);
if (WARN_ON(stage2_pgd_none(*pgd))) {
if (!cache)
--
1.8.5.6

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm



___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH] KVM: arm/arm64: Handle hva aging while destroying the vm

2017-06-23 Thread Alexander Graf
If we want to age an HVA while the VM is getting destroyed, we have a
tiny race window during which we may end up dereferencing an invalid
kvm->arch.pgd value.

   CPU0   CPU1

   kvm_age_hva()
  kvm_mmu_notifier_release()
  kvm_arch_flush_shadow_all()
  kvm_free_stage2_pgd()
  
   stage2_get_pmd()
   
  set kvm->arch.pgd = 0
  
   
   stage2_get_pud()
   arch.pgd>
   

This patch adds a check for that case.

Signed-off-by: Alexander Graf <ag...@suse.de>
---
 virt/kvm/arm/mmu.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index f2d5b6c..227931f 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -861,6 +861,10 @@ static pud_t *stage2_get_pud(struct kvm *kvm, struct 
kvm_mmu_memory_cache *cache
pgd_t *pgd;
pud_t *pud;
 
+   /* Do we clash with kvm_free_stage2_pgd()? */
+   if (!kvm->arch.pgd)
+   return NULL;
+
pgd = kvm->arch.pgd + stage2_pgd_index(addr);
if (WARN_ON(stage2_pgd_none(*pgd))) {
if (!cache)
-- 
1.8.5.6

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v2 00/25] arm64: KVM: Mediate access to GICv3 sysregs at EL2

2017-06-08 Thread Alexander Graf



On 01.06.17 12:20, Marc Zyngier wrote:

Some systems have less than perfect GICv3 implementations, leading to
all kind of ugly issues (guest hanging, host dying). In order to allow
some level of diagnostic, and in some cases implement workarounds,
this series enables the trapping of both Group-0, Group-1 and Common
sysregs. Mediating the access at EL2 allows some form of sanity
checking that the HW is sometimes sorely lacking.

Instead of fully emulating a GICv3 CPU interface, we still use the
existing HW (list registers, AP registers, VMCR...), which allows the
code to be independent from the rest of the KVM code, and to cope with
partial trapping.

Of course, trapping has a cost, which is why this must be either
enabled on the command line, or selected by another cpu capability
(see Cavium erratum 30115). A quick test on an A57-based platform
shows a 25% hit when repeatedly banging on the trapped registers,
while normal workloads do not seem to suffer noticeably from such
trapping (hackbench variance is in the usual noise, despite being very
IPI happy).

This has been tested on a dual socket Thundex-X and a Freescale LS-2085a.

I've taken the liberty to rebase David Daney's initial Cavium erratum
30115 workaround on top of this series, and included it here as a
typical use case.


I've run this patch set on an affected ThunderX system and indeed not 
seen any hangs. I have seen lost guest USB keyboard events which might 
point at interrupt problems or not, but let's assume it's a different 
issue for now.


Tested-by: Alexander Graf <ag...@suse.de>


Alex
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v3] kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd

2017-04-21 Thread Alexander Graf



On 04.04.17 12:35, Suzuki K Poulose wrote:

Hi Christoffer,

On 04/04/17 11:13, Christoffer Dall wrote:

Hi Suzuki,

On Mon, Apr 03, 2017 at 03:12:43PM +0100, Suzuki K Poulose wrote:

In kvm_free_stage2_pgd() we don't hold the kvm->mmu_lock while calling
unmap_stage2_range() on the entire memory range for the guest. This
could
cause problems with other callers (e.g, munmap on a memslot) trying to
unmap a range. And since we have to unmap the entire Guest memory range
holding a spinlock, make sure we yield the lock if necessary, after we
unmap each PUD range.

Fixes: commit d5d8184d35c9 ("KVM: ARM: Memory virtualization setup")
Cc: sta...@vger.kernel.org # v3.10+
Cc: Paolo Bonzini 
Cc: Marc Zyngier 
Cc: Christoffer Dall 
Cc: Mark Rutland 
Signed-off-by: Suzuki K Poulose 
[ Avoid vCPU starvation and lockup detector warnings ]
Signed-off-by: Marc Zyngier 
Signed-off-by: Suzuki K Poulose 



This unfortunately fails to build on 32-bit ARM, and I also think we
intended to check against S2_PGDIR_SIZE, not S2_PUD_SIZE.


Sorry about that, I didn't test the patch with arm32. I am fine the
patch below. And I agree that the name change does make things more
readable. See below for a hunk that I posted to the kbuild report.



How about adding this to your patch (which includes a rename of
S2_PGD_SIZE which is horribly confusing as it indicates the size of the
first level stage-2 table itself, where S2_PGDIR_SIZE indicates the size
of address space mapped by a single entry in the same table):

diff --git a/arch/arm/include/asm/stage2_pgtable.h
b/arch/arm/include/asm/stage2_pgtable.h
index 460d616..c997f2d 100644
--- a/arch/arm/include/asm/stage2_pgtable.h
+++ b/arch/arm/include/asm/stage2_pgtable.h
@@ -35,10 +35,13 @@

 #define stage2_pud_huge(pud)pud_huge(pud)

+#define S2_PGDIR_SIZEPGDIR_SIZE
+#define S2_PGDIR_MASKPGDIR_MASK
+
 /* Open coded p*d_addr_end that can deal with 64bit addresses */
 static inline phys_addr_t stage2_pgd_addr_end(phys_addr_t addr,
phys_addr_t end)
 {
-phys_addr_t boundary = (addr + PGDIR_SIZE) & PGDIR_MASK;
+phys_addr_t boundary = (addr + S2_PGDIR_SIZE) & S2_PGDIR_MASK;

 return (boundary - 1 < end - 1) ? boundary : end;
 }
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index db94f3a..6e79a4c 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -41,7 +41,7 @@ static unsigned long hyp_idmap_start;
 static unsigned long hyp_idmap_end;
 static phys_addr_t hyp_idmap_vector;

-#define S2_PGD_SIZE(PTRS_PER_S2_PGD * sizeof(pgd_t))
+#define S2_PGD_TABLE_SIZE(PTRS_PER_S2_PGD * sizeof(pgd_t))
 #define hyp_pgd_order get_order(PTRS_PER_PGD * sizeof(pgd_t))

 #define KVM_S2PTE_FLAG_IS_IOMAP(1UL << 0)
@@ -299,7 +299,7 @@ static void unmap_stage2_range(struct kvm *kvm,
phys_addr_t start, u64 size)
  * If the range is too large, release the kvm->mmu_lock
  * to prevent starvation and lockup detector warnings.
  */
-if (size > S2_PUD_SIZE)
+if (size > S2_PGDIR_SIZE)
 cond_resched_lock(>mmu_lock);
 next = stage2_pgd_addr_end(addr, end);
 if (!stage2_pgd_none(*pgd))
@@ -747,7 +747,7 @@ int kvm_alloc_stage2_pgd(struct kvm *kvm)
 }

 /* Allocate the HW PGD, making sure that each page gets its own
refcount */
-pgd = alloc_pages_exact(S2_PGD_SIZE, GFP_KERNEL | __GFP_ZERO);
+pgd = alloc_pages_exact(S2_PGD_TABLE_SIZE, GFP_KERNEL | __GFP_ZERO);
 if (!pgd)
 return -ENOMEM;

@@ -843,7 +843,7 @@ void kvm_free_stage2_pgd(struct kvm *kvm)
 spin_unlock(>mmu_lock);

 /* Free the HW pgd, one page at a time */
-free_pages_exact(kvm->arch.pgd, S2_PGD_SIZE);
+free_pages_exact(kvm->arch.pgd, S2_PGD_TABLE_SIZE);
 kvm->arch.pgd = NULL;
 }



Btw, I have a different hunk to solve the problem, posted to the kbuild
report. I will post it here for the sake of capturing the discussion in
one place. The following hunk on top of the patch, changes the lock
release after we process one PGDIR entry. As for the first time
we enter the loop we haven't done much with the lock held, hence it may
make
sense to do it after the first round and we have more work to do.

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index db94f3a..582a972 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -295,15 +295,15 @@ static void unmap_stage2_range(struct kvm *kvm,
phys_addr_t start, u64 size)
assert_spin_locked(>mmu_lock);
pgd = kvm->arch.pgd + stage2_pgd_index(addr);
do {
+   next = stage2_pgd_addr_end(addr, end);
+   if (!stage2_pgd_none(*pgd))


Just as heads up, I had this version applied to my tree by accident 
(commit 8b3405e345b5a098101b0c31b264c812bba045d9 from Christoffer's 
queue) and ran into a NULL pointer dereference:



Re: [PATCH v3 0/5] Support userspace irqchip with arch timers

2017-04-06 Thread Alexander Graf



On 05.04.17 11:28, Christoffer Dall wrote:

This series is the second version of the rework of the patches to support
architected timers with a userspace irqchip sent by Alexander Graf [1].

We first cleanup some of the timer code to make it easier to understand
what is being done in the later patches, and then define the ABI,
implement timers support, implement PMU support, and finally advertise
the features.

These patches are based on the recent work from Jintack to support the
physical timer in addition to the virtual timer.  This series including
its dependencies can be found here:

git://git.kernel.org/pub/scm/linux/kernel/git/cdall/linux.git irqs-to-user-v3

I tested this using Alex's QEMU patch with his fixes for SMP applied.  This
seems to be rock-solid.  The temporary-not-for-upstream-but-for-testing patch
can be found here (force-pushed and rebased since v2):

https://git.linaro.org/people/christoffer.dall/qemu-arm.git no-kvm-irqchip

I also tested it on 32-bit and it looks good there as well.


Reviewed-by: Alexander Graf <ag...@suse.de>


Alex
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v3 3/5] KVM: arm/arm64: Support arch timers with a userspace gic

2017-04-06 Thread Alexander Graf



On 06.04.17 10:25, Marc Zyngier wrote:

On 06/04/17 09:16, Alexander Graf wrote:



On 05.04.17 11:28, Christoffer Dall wrote:

From: Alexander Graf <ag...@suse.de>




@@ -559,6 +628,13 @@ int kvm_timer_enable(struct kvm_vcpu *vcpu)
if (timer->enabled)
return 0;

+   /* Without a VGIC we do not map virtual IRQs to physical IRQs */
+   if (!irqchip_in_kernel(vcpu->kvm))
+   goto no_vgic;
+
+   if (!vgic_initialized(vcpu->kvm))
+   return -ENODEV;
+
/*
 * Find the physical IRQ number corresponding to the host_vtimer_irq
 */
@@ -582,8 +658,8 @@ int kvm_timer_enable(struct kvm_vcpu *vcpu)
if (ret)
return ret;

+no_vgic:
timer->enabled = 1;


What happens if

   1) User space spawns a VM with user space irqchip
   2) Runs the VM
   3) Then adds a virtual gic device


As soon as a vcpu has run once, it is not possible to instantiate a vgic
(see virt/kvm/arm/vgic/vgic-init.c:kvm_vgic_create around line 101).


Ah, I was missing that part. Awesome, all problems solved :).


Alex
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v3 3/5] KVM: arm/arm64: Support arch timers with a userspace gic

2017-04-06 Thread Alexander Graf



On 05.04.17 11:28, Christoffer Dall wrote:

From: Alexander Graf <ag...@suse.de>

If you're running with a userspace gic or other interrupt constroller
(that is no vgic in the kernel), then you have so far not been able to
use the architected timers, because the output of the architected
timers, which are driven inside the kernel, was a kernel-only construct
between the arch timer code and the vgic.

This patch implements the new KVM_CAP_ARM_USER_IRQ feature, where we use a
side channel on the kvm_run structure, run->s.regs.device_irq_level, to
always notify userspace of the timer output levels when using a userspace
irqchip.

This works by ensureing that before we enter the guest, if the timer
output level has changed compared to what we last told userspace, we
don't enter the guest, but instead return to userspace to notify it of
the new level.  If we are exiting, because of an MMIO for example, and
the level changed at the same time, the value is also updated and
userspace can sample the line as it needs.  This is nicely achieved
simply always updating the timer_irq_level field after the main run
loop.

Note that the kvm_timer_update_irq trace event is changed to show the
host IRQ number for the timer instead of the guest IRQ number, because
the kernel no longer know which IRQ userspace wires up the timer signal
to.

Also note that this patch implements all required functionality but does
not yet advertise the capability.

Signed-off-by: Alexander Graf <ag...@suse.de>
Signed-off-by: Christoffer Dall <christoffer.d...@linaro.org>
---
 arch/arm/kvm/arm.c   |  18 +++
 include/kvm/arm_arch_timer.h |   2 +
 virt/kvm/arm/arch_timer.c| 122 +++
 3 files changed, 110 insertions(+), 32 deletions(-)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 7fa4898..efb16e5 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -515,13 +515,7 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
return ret;
}

-   /*
-* Enable the arch timers only if we have an in-kernel VGIC
-* and it has been properly initialized, since we cannot handle
-* interrupts from the virtual timer with a userspace gic.
-*/
-   if (irqchip_in_kernel(kvm) && vgic_initialized(kvm))
-   ret = kvm_timer_enable(vcpu);
+   ret = kvm_timer_enable(vcpu);

return ret;
 }
@@ -640,9 +634,12 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct 
kvm_run *run)
local_irq_disable();

/*
-* Re-check atomic conditions
+* If we have a singal pending, or need to notify a userspace
+* irqchip about timer level changes, then we exit (and update
+* the timer level state in kvm_timer_update_run below).
 */
-   if (signal_pending(current)) {
+   if (signal_pending(current) ||
+   kvm_timer_should_notify_user(vcpu)) {
ret = -EINTR;
run->exit_reason = KVM_EXIT_INTR;
}
@@ -714,6 +711,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct 
kvm_run *run)
ret = handle_exit(vcpu, run, ret);
}

+   /* Tell userspace about in-kernel device output levels */
+   kvm_timer_update_run(vcpu);
+
if (vcpu->sigset_active)
sigprocmask(SIG_SETMASK, , NULL);
return ret;
diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
index fe797d6..295584f 100644
--- a/include/kvm/arm_arch_timer.h
+++ b/include/kvm/arm_arch_timer.h
@@ -63,6 +63,8 @@ int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
 void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu);
 void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu);
 void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu);
+bool kvm_timer_should_notify_user(struct kvm_vcpu *vcpu);
+void kvm_timer_update_run(struct kvm_vcpu *vcpu);
 void kvm_timer_vcpu_terminate(struct kvm_vcpu *vcpu);

 u64 kvm_arm_timer_get_reg(struct kvm_vcpu *, u64 regid);
diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index 363f0d2..5dc2167 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -184,6 +184,27 @@ bool kvm_timer_should_fire(struct arch_timer_context 
*timer_ctx)
return cval <= now;
 }

+/*
+ * Reflect the timer output level into the kvm_run structure
+ */
+void kvm_timer_update_run(struct kvm_vcpu *vcpu)
+{
+   struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
+   struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
+   struct kvm_sync_regs *regs = >run->s.regs;
+
+   if (likely(irqchip_in_kernel(vcpu->kvm)))
+   return;
+
+   /* Populate the device bitmap with the timer states */
+   regs->device_irq_level &= ~(KVM_ARM_DEV_EL1_VTIMER |
+

Re: [PATCH v2 2/5] KVM: arm/arm64: Add ARM user space interrupt signaling ABI

2017-04-04 Thread Alexander Graf



On 21.02.17 12:41, Christoffer Dall wrote:

Hi Alex,

On Fri, Feb 03, 2017 at 05:51:18PM +, Peter Maydell wrote:

On 3 February 2017 at 14:56, Christoffer Dall <cd...@linaro.org> wrote:

From: Christoffer Dall <christoffer.d...@linaro.org>

We have 2 modes for dealing with interrupts in the ARM world. We can
either handle them all using hardware acceleration through the vgic or
we can emulate a gic in user space and only drive CPU IRQ pins from
there.

Unfortunately, when driving IRQs from user space, we never tell user
space about events from devices emulated inside the kernel, which may
result in interrupt line state changes, so we lose out on for example
timer and PMU events if we run with user space gic emulation.

Define an ABI to publish such device output levels to userspace.

Signed-off-by: Alexander Graf <ag...@suse.de>
Signed-off-by: Christoffer Dall <christoffer.d...@linaro.org>



Acked-by: Peter Maydell <peter.mayd...@linaro.org>

for the userspace ABI/docs part.



Given Peter's ack on this ABI, any chance you could take a look at
fixing the userspace SMP issue and respinning the userspace patches for
this series?


The problem with user space SMP was simply a missing lock:

diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index a66845f..1b33895 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -548,10 +548,14 @@ MemTxAttrs kvm_arch_post_run(CPUState *cs, struct 
kvm_run *run)


 /* Synchronize our internal vtimer irq line with the kvm one */
 if (run->s.regs.device_irq_level != cpu->device_irq_level) {
-qemu_set_irq(ARM_CPU(cs)->gt_timer_outputs[GTIMER_VIRT],
+qemu_mutex_lock_iothread();
+qemu_set_irq(cpu->gt_timer_outputs[GTIMER_VIRT],
  run->s.regs.device_irq_level & 
KVM_ARM_DEV_EL1_VTIMER);

+qemu_set_irq(cpu->gt_timer_outputs[GTIMER_PHYS],
+ run->s.regs.device_irq_level & 
KVM_ARM_DEV_EL1_PTIMER);

 /* TODO: Handle changes in PTIMER and PMU as well */
 cpu->device_irq_level = run->s.regs.device_irq_level;
+qemu_mutex_unlock_iothread();
 }

 return MEMTXATTRS_UNSPECIFIED;




I also wasn't able to use your series out of the box. There are several 
places in the code that check whether a timer is enabled (for register 
save/restore for example) and without vgic we never ended up setting 
that to true.


I don't know how safe it is to set it to true regardless. It feels to me 
like someone has to put more thought into how to switch from an 
initialized user space timer state to a vgic one. For reference, my test 
patch is below:


diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index 891a725..56a5d51 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -629,8 +629,11 @@ int kvm_timer_enable(struct kvm_vcpu *vcpu)
return 0;

/* Without a VGIC we do not map virtual IRQs to physical IRQs */
-   if (!irqchip_in_kernel(vcpu->kvm))
+   if (!irqchip_in_kernel(vcpu->kvm)) {
+   /* Enable timer for now, may be racy? */
+   timer->enabled = 1;
return 0;
+   }

if (!vgic_initialized(vcpu->kvm))
return -ENODEV;




Once we solved that piece, I'll happily respin my user space patch.


Alex
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH] arm: Add simple arch timer test

2016-11-16 Thread Alexander Graf



On 16/11/2016 16:54, Andrew Jones wrote:

On Mon, Sep 19, 2016 at 04:52:01PM +0200, Andrew Jones wrote:

On Mon, Sep 19, 2016 at 01:44:40PM +0200, Alexander Graf wrote:

All virtualization capable ARM cores support the ARM architected virtual timer.

This patch adds minimalistic checks whether we can fire a virtual timer event
using them. It does not actually trigger an interrupt yet, as infrastructure
for that is missing. However, it does check whether the timer pin is marked as
pending on the GIC.

Signed-off-by: Alexander Graf <ag...@suse.de>
---
 arm/Makefile.arm64 |   2 +-
 arm/vtimer-test.c  | 122 +
 2 files changed, 123 insertions(+), 1 deletion(-)
 create mode 100644 arm/vtimer-test.c

diff --git a/arm/Makefile.arm64 b/arm/Makefile.arm64
index 0b0761c..5b3fbe8 100644
--- a/arm/Makefile.arm64
+++ b/arm/Makefile.arm64
@@ -12,7 +12,7 @@ cflatobjs += lib/arm64/processor.o
 cflatobjs += lib/arm64/spinlock.o

 # arm64 specific tests
-tests =
+tests = $(TEST_DIR)/vtimer-test.flat

 include $(TEST_DIR)/Makefile.common

diff --git a/arm/vtimer-test.c b/arm/vtimer-test.c
new file mode 100644
index 000..a3e24ce
--- /dev/null
+++ b/arm/vtimer-test.c
@@ -0,0 +1,122 @@
+/*
+ * Unit tests for the virtual timer on the ARM virt machine
+ *
+ * Copyright (C) 2016, Alexander Graf <ag...@suse.de>
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2.
+ */
+#include 
+#include 
+
+#define VTIMER_CTL_ENABLE  (1 << 0)
+#define VTIMER_CTL_IMASK   (1 << 1)
+#define VTIMER_CTL_ISTATUS (1 << 2)
+
+#define VIRT_GIC_DIST_BASE 0x0800
+#define GIC_DIST_PENDING_SET0x200
+#define GIC_DIST_PENDING_CLEAR  0x280
+#define GIC_DIST_ACTIVE_SET 0x300
+#define GIC_DIST_ACTIVE_CLEAR   0x380
+
+#define VIRT_GIC_CPU_BASE  0x0801


Once I get my arm/gic series in we won't need to hard code
the gic base addresses and these gic register offsets will
also be provided.


+
+#define ARCH_TIMER_VIRT_IRQ   11
+#define ARCH_TIMER_S_EL1_IRQ  13
+#define ARCH_TIMER_NS_EL1_IRQ 14
+#define ARCH_TIMER_NS_EL2_IRQ 10


We can pull these out of the DT.


+
+static void write_vtimer_ctl(u64 val)
+{
+   asm volatile("msr cntv_ctl_el0, %0" : : "r" (val));
+}
+
+static void write_vtimer_cval(u64 val)
+{
+   asm volatile("msr cntv_cval_el0, %0" : : "r" (val));
+}
+
+static u64 read_vtimer_ctl(void)
+{
+   u64 val;
+   asm volatile("mrs %0, cntv_ctl_el0" : "=r" (val));
+   return val;
+}
+
+static u64 read_vtimer_cval(void)
+{
+   u64 val;
+   asm volatile("mrs %0, cntv_cval_el0" : "=r" (val));
+   return val;
+}
+
+static u64 read_vtimer_val(void)
+{
+   u64 val;
+   asm volatile("mrs %0, cntvct_el0" : "=r" (val));
+   return val;
+}
+
+static u64 read_vtimer_freq(void)
+{
+   u64 val;
+   asm volatile("mrs %0, cntfrq_el0" : "=r" (val));
+   return val;
+}
+
+static bool gic_vtimer_pending(void)
+{
+   u32 *pending = (u32*)(VIRT_GIC_DIST_BASE + GIC_DIST_PENDING_SET);
+   return (readl(pending) & (1 << (ARCH_TIMER_VIRT_IRQ + 16)));


nit: don't need the outer ()

I think I'd like to add the PPI(irq) macro we added to QEMU for
the +16's.


+}
+
+static bool test_cval_10msec(void)
+{
+   u64 time_10ms = read_vtimer_freq() / 100;
+   u64 time_1us = time_10ms / 1;
+   u64 before_timer = read_vtimer_val();
+   u64 after_timer, latency;
+
+   /* Program timer to fire in 10ms */
+   write_vtimer_cval(before_timer + time_10ms);
+
+   /* Wait for the timer to fire */
+   while (!(read_vtimer_ctl() & VTIMER_CTL_ISTATUS)) ;


If emulation fails to set the status bit then we'll spin forever.
That's OK if we set the test timeout (arm/unittests.cfg:timeout
for this test appropriately) Or, maybe can add a trial count here
that's sufficiently large?


+
+   /* It fired, check how long it took */
+   after_timer = read_vtimer_val();
+
+   latency = (after_timer - (before_timer + time_10ms));


nit: don't need the outer ()

Shouldn't latency be signed so we can see if the status bit was
set too soon? Or just compare time_10ms with after - before?


+   printf("After timer: 0x%lx\n", after_timer);
+   printf("Expected   : 0x%lx\n", before_timer + time_10ms);
+   printf("Latency: %ld us\n", latency / time_1us);
+
+   return latency < time_10ms;


Does this mean that the threshold for success is 10ms? If so,
then that's not too large?


+}
+
+static void test_vtimer(void)
+{
+   printf("\n=== vtimer with busy loop ===\n");


I guess this is a subtest header, i.e. this file will get
other subtests and headers like this one will divide up
the test output. That's fine, but new, at least for arm tests.
Currently we're div

Re: [RFC 1/2] linux-headers: update

2016-11-01 Thread Alexander Graf



On 01/11/2016 11:19, Peter Maydell wrote:

On 29 October 2016 at 22:10, Alexander Graf <ag...@suse.de> wrote:

This patch updates the Linux headers to include the in-progress user
space ARM timer patches. It is explicitly RFC, as the patches are not
merged yet.
---


Is there a cover letter email for this series ?


I figured that the set is so small that it didn't deserve one :).


Alex
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[RFC 1/2] linux-headers: update

2016-10-29 Thread Alexander Graf
This patch updates the Linux headers to include the in-progress user
space ARM timer patches. It is explicitly RFC, as the patches are not
merged yet.
---
 linux-headers/asm-arm/kvm.h   | 1 +
 linux-headers/asm-arm64/kvm.h | 1 +
 linux-headers/linux/kvm.h | 6 ++
 3 files changed, 8 insertions(+)

diff --git a/linux-headers/asm-arm/kvm.h b/linux-headers/asm-arm/kvm.h
index 541268c..5d58ec2 100644
--- a/linux-headers/asm-arm/kvm.h
+++ b/linux-headers/asm-arm/kvm.h
@@ -105,6 +105,7 @@ struct kvm_debug_exit_arch {
 };
 
 struct kvm_sync_regs {
+   __u8 timer_irq_level;
 };
 
 struct kvm_arch_memory_slot {
diff --git a/linux-headers/asm-arm64/kvm.h b/linux-headers/asm-arm64/kvm.h
index fd5a276..0e1cbd1 100644
--- a/linux-headers/asm-arm64/kvm.h
+++ b/linux-headers/asm-arm64/kvm.h
@@ -143,6 +143,7 @@ struct kvm_debug_exit_arch {
 #define KVM_GUESTDBG_USE_HW(1 << 17)
 
 struct kvm_sync_regs {
+   __u8 timer_irq_level;
 };
 
 struct kvm_arch_memory_slot {
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 4806e06..737113c 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -870,6 +870,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_S390_USER_INSTR0 130
 #define KVM_CAP_MSI_DEVID 131
 #define KVM_CAP_PPC_HTM 132
+#define KVM_CAP_ARM_TIMER 133
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1327,4 +1328,9 @@ struct kvm_assigned_msix_entry {
 #define KVM_X2APIC_API_USE_32BIT_IDS(1ULL << 0)
 #define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK  (1ULL << 1)
 
+/* Available with KVM_CAP_ARM_TIMER */
+
+/* Bits for run->arm_timer.timesource */
+#define KVM_ARM_TIMER_VTIMER   (1 << 0)
+
 #endif /* __LINUX_KVM_H */
-- 
1.8.5.6

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[RFC 2/2] ARM: KVM: Enable in-kernel timers with user space gic

2016-10-29 Thread Alexander Graf
When running with KVM enabled, you can choose between emulating the
gic in kernel or user space. If the kernel supports in-kernel virtualization
of the interrupt controller, it will default to that. If not, if will
default to user space emulation.

Unfortunately when running in user mode gic emulation, we miss out on
timer events which are only available from kernel space. This patch leverages
the new kernel/user space pending line synchronization for those timer events.

Signed-off-by: Alexander Graf <ag...@suse.de>
---
 hw/arm/virt.c| 10 ++
 target-arm/cpu.h |  3 +++
 target-arm/kvm.c | 19 +++
 3 files changed, 32 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 070bbf8..8ac81e3 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -622,6 +622,16 @@ static void create_gic(VirtBoardInfo *vbi, qemu_irq *pic, 
int type,
 } else if (type == 2) {
 create_v2m(vbi, pic);
 }
+
+#ifdef CONFIG_KVM
+if (kvm_enabled() && !kvm_irqchip_in_kernel()) {
+if (!kvm_check_extension(kvm_state, KVM_CAP_ARM_TIMER)) {
+error_report("KVM with user space irqchip only works when the "
+ "host kernel supports KVM_CAP_ARM_TIMER");
+exit(1);
+}
+}
+#endif
 }
 
 static void create_uart(const VirtBoardInfo *vbi, qemu_irq *pic, int uart,
diff --git a/target-arm/cpu.h b/target-arm/cpu.h
index 19d967b..7686082 100644
--- a/target-arm/cpu.h
+++ b/target-arm/cpu.h
@@ -659,6 +659,9 @@ struct ARMCPU {
 
 ARMELChangeHook *el_change_hook;
 void *el_change_hook_opaque;
+
+/* Used to synchronize KVM and QEMU timer levels */
+uint8_t timer_irq_level;
 };
 
 static inline ARMCPU *arm_env_get_cpu(CPUARMState *env)
diff --git a/target-arm/kvm.c b/target-arm/kvm.c
index c00b94e..0d8b642 100644
--- a/target-arm/kvm.c
+++ b/target-arm/kvm.c
@@ -527,6 +527,25 @@ void kvm_arch_pre_run(CPUState *cs, struct kvm_run *run)
 
 MemTxAttrs kvm_arch_post_run(CPUState *cs, struct kvm_run *run)
 {
+ARMCPU *cpu;
+
+if (kvm_irqchip_in_kernel()) {
+/*
+ * We only need to sync timer states with user-space interrupt
+ * controllers, so return early and save cycles if we don't.
+ */
+return MEMTXATTRS_UNSPECIFIED;
+}
+
+cpu = ARM_CPU(cs);
+
+/* Synchronize our internal vtimer irq line with the kvm one */
+if (run->s.regs.timer_irq_level != cpu->timer_irq_level) {
+qemu_set_irq(ARM_CPU(cs)->gt_timer_outputs[GTIMER_VIRT],
+ run->s.regs.timer_irq_level & KVM_ARM_TIMER_VTIMER);
+cpu->timer_irq_level = run->s.regs.timer_irq_level;
+}
+
 return MEMTXATTRS_UNSPECIFIED;
 }
 
-- 
1.8.5.6

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 0/3] Support userspace irqchip with arch timers

2016-10-29 Thread Alexander Graf

> On 29 Oct 2016, at 15:19, Paolo Bonzini  wrote:
> 
> What the status of userspace for this thing? Are QEMU patches being
> posted and reviewed?
 
 I didn't see a notification that the patches were merged. Are they in
 Linus' tree yet? Then I can post enablement to qemu-devel.
>>> 
>>> I think you got it backward. I have no intention of merging them until I
>>> see a vague consensus on the userspace API, and a set of patches ready
>>> to be merged in QEMU.
>> 
>> That's not how kvm apis are made.
> 
> Actually I think it's always been like this, depending on what Marc meant for
> "ready to be merged in QEMU".  It doesn't make sense to merge KVM APIs without
> having userspace patches at least posted as RFC to qemu-devel, and without
> having at least a positive response from the QEMU architecture maintainer.

I halfway agree. I do agree that there needs to be a reference implementation 
that proves the API to make sense. That bit is referenced in the cover letter 
of the patch set.

Peter as the QEMU architecture maintainer has been part of the review process 
and involved from the beginning. In fact, the current approach was his idea.

Do we need to fly the loop over qemu-devel? I doubt it, but if it makes you 
happy I can post an RFC there too.

> ARM does require a bit more care because there's no overlap between kernel
> and userspace maintainers, so perhaps that's the source of the confusion?
> 
> Now, of course merging the patches in QEMU may take a month or two depending
> on the timing (because you have to wait for the patches to be merged into
> Linus's tree and for the KVM headers to be updated in QEMU---which is not
> going to happen during freeze of course).  So of course the KVM patch thus
> can be committed even if QEMU is in freeze, as long as the QEMU architecture
> maintainer gives an overall green light.

Right. My plan was to make sure that people agree on the KVM interface. Then 
directly send non-RFC patches to qemu-devel, which can only happen when the KVM 
patches are merged. I didn’t see any reason why that approach would be 
controversial, since everyone who really cared was involved.

But again, I don’t care strongly enough to make a fuss. If people are happier 
with RFC patches on the ML rather than a github link to RFC patches, I’ll send 
them.


Alex

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 0/3] Support userspace irqchip with arch timers

2016-10-28 Thread Alexander Graf


> Am 28.10.2016 um 17:57 schrieb Marc Zyngier <marc.zyng...@arm.com>:
> 
>> On 28/10/16 16:52, Alexander Graf wrote:
>> 
>> 
>>> Am 28.10.2016 um 16:38 schrieb Marc Zyngier <marc.zyng...@arm.com>:
>>> 
>>> Alex,
>>> 
>>>> On 30/09/16 20:31, Alexander Graf wrote:
>>>> 
>>>> 
>>>>>> On 30.09.16 17:43, Christoffer Dall wrote:
>>>>>> On Fri, Sep 30, 2016 at 05:38:11PM +0200, Alexander Graf wrote:
>>>>>> 
>>>>>> 
>>>>>>> On 30.09.16 16:54, Alexander Graf wrote:
>>>>>>> 
>>>>>>> 
>>>>>>>> On 27.09.16 21:08, Christoffer Dall wrote:
>>>>>>>> Hi Alex,
>>>>>>>> 
>>>>>>>> Marc and I have been looking at this during Linaro connect and have
>>>>>>>> slightly reworked your patch into this small series.
>>>>>>>> 
>>>>>>>> It would be good if you could have a look at it and test it out.
>>>>>>>> 
>>>>>>>> I've tested it with your QEMU, and it works for UP, but secondary CPUs
>>>>>>>> fail to come up, and it looks like the kernel never gets an IPI for
>>>>>>>> those CPUs from userspace.  Any chance you're willing to take a look at
>>>>>>>> that?
>>>>>>> 
>>>>>>> I still need to see whether I can come up with a prettier solution, but
>>>>>>> for now this works:
>>>>>>> 
>>>>>>> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
>>>>>> 
>>>>>> Eh, no, not in i386 code :). But the problem seems to be a missing
>>>>>> mpstate sync.
>>>>>> 
>>>>> Yeah, that looked really dodgy.  Have you tested it? :)
>>>> 
>>>> This time around tested with the correct command line parameters I hope
>>>> :). I'll send a pretty patch later.
>>>> 
>>>> diff --git a/target-arm/kvm.c b/target-arm/kvm.c
>>>> index b4c8fe2..b549f00 100644
>>>> --- a/target-arm/kvm.c
>>>> +++ b/target-arm/kvm.c
>>>> @@ -173,6 +173,12 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>>>> */
>>>>kvm_async_interrupts_allowed = true;
>>>> 
>>>> +/*
>>>> + * PSCI wakes up secondary cores, so we always need to
>>>> + * have vCPUs waiting in kernel space
>>>> + */
>>>> +kvm_halt_in_kernel_allowed = true;
>>>> +
>>>>cap_has_mp_state = kvm_check_extension(s, KVM_CAP_MP_STATE);
>>>> 
>>>>type_register_static(_arm_cpu_type_info);
>>> 
>>> What the status of userspace for this thing? Are QEMU patches being
>>> posted and reviewed?
>> 
>> I didn't see a notification that the patches were merged. Are they in
>> Linus' tree yet? Then I can post enablement to qemu-devel.
> 
> I think you got it backward. I have no intention of merging them until I
> see a vague consensus on the userspace API, and a set of patches ready
> to be merged in QEMU.

That's not how kvm apis are made.


Alex


___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 0/3] Support userspace irqchip with arch timers

2016-10-28 Thread Alexander Graf


> Am 28.10.2016 um 16:38 schrieb Marc Zyngier <marc.zyng...@arm.com>:
> 
> Alex,
> 
>> On 30/09/16 20:31, Alexander Graf wrote:
>> 
>> 
>>> On 30.09.16 17:43, Christoffer Dall wrote:
>>>> On Fri, Sep 30, 2016 at 05:38:11PM +0200, Alexander Graf wrote:
>>>> 
>>>> 
>>>>> On 30.09.16 16:54, Alexander Graf wrote:
>>>>> 
>>>>> 
>>>>>> On 27.09.16 21:08, Christoffer Dall wrote:
>>>>>> Hi Alex,
>>>>>> 
>>>>>> Marc and I have been looking at this during Linaro connect and have
>>>>>> slightly reworked your patch into this small series.
>>>>>> 
>>>>>> It would be good if you could have a look at it and test it out.
>>>>>> 
>>>>>> I've tested it with your QEMU, and it works for UP, but secondary CPUs
>>>>>> fail to come up, and it looks like the kernel never gets an IPI for
>>>>>> those CPUs from userspace.  Any chance you're willing to take a look at
>>>>>> that?
>>>>> 
>>>>> I still need to see whether I can come up with a prettier solution, but
>>>>> for now this works:
>>>>> 
>>>>> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
>>>> 
>>>> Eh, no, not in i386 code :). But the problem seems to be a missing
>>>> mpstate sync.
>>>> 
>>> Yeah, that looked really dodgy.  Have you tested it? :)
>> 
>> This time around tested with the correct command line parameters I hope
>> :). I'll send a pretty patch later.
>> 
>> diff --git a/target-arm/kvm.c b/target-arm/kvm.c
>> index b4c8fe2..b549f00 100644
>> --- a/target-arm/kvm.c
>> +++ b/target-arm/kvm.c
>> @@ -173,6 +173,12 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>>  */
>> kvm_async_interrupts_allowed = true;
>> 
>> +/*
>> + * PSCI wakes up secondary cores, so we always need to
>> + * have vCPUs waiting in kernel space
>> + */
>> +kvm_halt_in_kernel_allowed = true;
>> +
>> cap_has_mp_state = kvm_check_extension(s, KVM_CAP_MP_STATE);
>> 
>> type_register_static(_arm_cpu_type_info);
> 
> What the status of userspace for this thing? Are QEMU patches being
> posted and reviewed?

I didn't see a notification that the patches were merged. Are they in Linus' 
tree yet? Then I can post enablement to qemu-devel.

Alex


___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: development guide for installing KVMARM kernel

2016-10-11 Thread Alexander Graf


On 10.10.16 15:50, Marc Zyngier wrote:
> On 2016-10-06 19:54, Eugene Bagdasaryan wrote:
>> Hey Marc
>>
>> I basically want to try and install OpenStack on RPi, which I guess
>> is possible.
> 
> Sounds pretty ambitious, but I guess someone has to try...
> 
>> (1) For the 64-bit arm OS, what would you suggest?
> 
> I'll defer that to the local RPi zealot (Alex, cc'd), because I've
> stopped caring about this platform long before it even existed.
> 
> Alex, would you mind letting Eugene know how to run a full 64bit
> mainline kernel + userspace on a RPi3?

Sure! If everything works (and if it doesn't, please report to the
mailing list), just follow the description below:


https://en.opensuse.org/HCL:Raspberry_Pi3#Installing_the_64-bit_non-upstream_openSUSE_Tumbleweed_image

Get the image, put it on an SD card, boot it up (it takes ~5 minutes for
the first boot), then install your own upstream kernel on it and run it.
4.8 should work just fine.

If you also want KVM, make sure you have the user space irqchip fixes in
your kernel.


Alex
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 0/3] Support userspace irqchip with arch timers

2016-09-30 Thread Alexander Graf


On 30.09.16 17:43, Christoffer Dall wrote:
> On Fri, Sep 30, 2016 at 05:38:11PM +0200, Alexander Graf wrote:
>>
>>
>> On 30.09.16 16:54, Alexander Graf wrote:
>>>
>>>
>>> On 27.09.16 21:08, Christoffer Dall wrote:
>>>> Hi Alex,
>>>>
>>>> Marc and I have been looking at this during Linaro connect and have
>>>> slightly reworked your patch into this small series.
>>>>
>>>> It would be good if you could have a look at it and test it out.
>>>>
>>>> I've tested it with your QEMU, and it works for UP, but secondary CPUs
>>>> fail to come up, and it looks like the kernel never gets an IPI for
>>>> those CPUs from userspace.  Any chance you're willing to take a look at
>>>> that?
>>>
>>> I still need to see whether I can come up with a prettier solution, but
>>> for now this works:
>>>
>>> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
>>
>> Eh, no, not in i386 code :). But the problem seems to be a missing
>> mpstate sync.
>>
> Yeah, that looked really dodgy.  Have you tested it? :)

This time around tested with the correct command line parameters I hope
:). I'll send a pretty patch later.

diff --git a/target-arm/kvm.c b/target-arm/kvm.c
index b4c8fe2..b549f00 100644
--- a/target-arm/kvm.c
+++ b/target-arm/kvm.c
@@ -173,6 +173,12 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
  */
 kvm_async_interrupts_allowed = true;

+/*
+ * PSCI wakes up secondary cores, so we always need to
+ * have vCPUs waiting in kernel space
+ */
+kvm_halt_in_kernel_allowed = true;
+
 cap_has_mp_state = kvm_check_extension(s, KVM_CAP_MP_STATE);

 type_register_static(_arm_cpu_type_info);


Alex
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 0/3] Support userspace irqchip with arch timers

2016-09-30 Thread Alexander Graf


On 30.09.16 17:43, Christoffer Dall wrote:
> On Fri, Sep 30, 2016 at 05:38:11PM +0200, Alexander Graf wrote:
>>
>>
>> On 30.09.16 16:54, Alexander Graf wrote:
>>>
>>>
>>> On 27.09.16 21:08, Christoffer Dall wrote:
>>>> Hi Alex,
>>>>
>>>> Marc and I have been looking at this during Linaro connect and have
>>>> slightly reworked your patch into this small series.
>>>>
>>>> It would be good if you could have a look at it and test it out.
>>>>
>>>> I've tested it with your QEMU, and it works for UP, but secondary CPUs
>>>> fail to come up, and it looks like the kernel never gets an IPI for
>>>> those CPUs from userspace.  Any chance you're willing to take a look at
>>>> that?
>>>
>>> I still need to see whether I can come up with a prettier solution, but
>>> for now this works:
>>>
>>> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
>>
>> Eh, no, not in i386 code :). But the problem seems to be a missing
>> mpstate sync.
>>
> Yeah, that looked really dodgy.  Have you tested it? :)

I have, but I ran the wrong command line and by accident used -M
...,kernel-irqchip=on :)


Alex
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v7] KVM: arm/arm64: Route vtimer events to user space

2016-09-26 Thread Alexander Graf
We have 2 modes for dealing with interrupts in the ARM world. We can either
handle them all using hardware acceleration through the vgic or we can emulate
a gic in user space and only drive CPU IRQ pins from there.

Unfortunately, when driving IRQs from user space, we never tell user space
about timer events that may result in interrupt line state changes, so we
lose out on timer events if we run with user space gic emulation.

This patch fixes that by exposing kvm's view of the vtimer irq line to user
space and giving it a chance to sync that when it changes.

With this patch I can successfully run edk2 and Linux with user space gic
emulation.

Signed-off-by: Alexander Graf <ag...@suse.de>

---

v1 -> v2:

  - Add back curly brace that got lost

v2 -> v3:

  - Split into patch set

v3 -> v4:

  - Improve documentation

v4 -> v5:

  - Rewrite to use pending state sync in sregs (marc)
  - Remove redundant checks of vgic_initialized()

v5 -> v6:

  - s/pending/asserted/
  - split kvm_timer_flush_hwstate function
  - remove user_timer_asserted
  - rename kernel_timer_asserted to timer_irq_level
  - exit only once per level change

v6 -> v7:

  - Move run struct sync out of the run loop
  - Only check for run sync in kvm_timer_flush_hwstate()
  - qemu tree to try this out: https://github.com/agraf/qemu.git 
no-kvm-irqchip-for-v6
---
 Documentation/virtual/kvm/api.txt |  27 +
 arch/arm/include/uapi/asm/kvm.h   |   2 +
 arch/arm/kvm/arm.c|  18 +++---
 arch/arm64/include/uapi/asm/kvm.h |   2 +
 include/kvm/arm_arch_timer.h  |   3 +-
 include/uapi/linux/kvm.h  |   6 ++
 virt/kvm/arm/arch_timer.c | 122 --
 7 files changed, 150 insertions(+), 30 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index 739db9a..20ab558 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -3928,3 +3928,30 @@ In order to use SynIC, it has to be activated by setting 
this
 capability via KVM_ENABLE_CAP ioctl on the vcpu fd. Note that this
 will disable the use of APIC hardware virtualization even if supported
 by the CPU, as it's incompatible with SynIC auto-EOI behavior.
+
+8.3 KVM_CAP_ARM_TIMER
+
+Architectures: arm, arm64
+This capability, if KVM_CHECK_EXTENSION indicates that it is available and no
+in-kernel interrupt controller is in use, means that that the kernel populates
+the vcpu's run->s.regs.timer_irq_level field with timers that are currently
+considered asserted by kvm.
+
+If active, kvm guarantees at least one exit happens after it detects an
+assertion change. This exit could either be a KVM_EXIT_INTR or any other exit
+event, like KVM_EXIT_MMIO. This way kvm gives user space a chance to update its
+own interrupt asserted status. This usually involves triggering an interrupt
+line on a user space emulated interrupt controller, so user space should always
+check the state of run->s.regs.timer_irq_level on every kvm exit.
+
+The field run->s.regs.timer_irq_level is available independent of
+run->kvm_valid_regs or run->kvm_dirty_regs bits. If no in-kernel interrupt
+controller is used and the capability exists, it will always be available and
+populated.
+
+Currently the following bits are defined for the timer_irq_level bitmap:
+
+KVM_ARM_TIMER_VTIMER  -  virtual timer
+
+Future versions of kvm may implement additional timer events. These will get
+indicated by additional KVM_CAP extensions.
diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
index a2b3eb3..ad61b4f 100644
--- a/arch/arm/include/uapi/asm/kvm.h
+++ b/arch/arm/include/uapi/asm/kvm.h
@@ -105,6 +105,8 @@ struct kvm_debug_exit_arch {
 };
 
 struct kvm_sync_regs {
+   /* Used with KVM_CAP_ARM_TIMER */
+   u8 timer_irq_level;
 };
 
 struct kvm_arch_memory_slot {
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 75f130e..c4c1734 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -187,6 +187,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_ARM_PSCI_0_2:
case KVM_CAP_READONLY_MEM:
case KVM_CAP_MP_STATE:
+   case KVM_CAP_ARM_TIMER:
r = 1;
break;
case KVM_CAP_COALESCED_MMIO:
@@ -474,13 +475,7 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
return ret;
}
 
-   /*
-* Enable the arch timers only if we have an in-kernel VGIC
-* and it has been properly initialized, since we cannot handle
-* interrupts from the virtual timer with a userspace gic.
-*/
-   if (irqchip_in_kernel(kvm) && vgic_initialized(kvm))
-   ret = kvm_timer_enable(vcpu);
+   ret = kvm_timer_enable(vcpu);
 
return ret;
 }
@@ -549,7 +544,7 @@ static int kvm_vcpu_initialized(struct kvm_vcpu *vcpu)
  */
 int kvm_arch_vcpu_ioctl

Re: [PATCH v4 2/2] KVM: arm/arm64: Route vtimer events to user space

2016-09-20 Thread Alexander Graf


On 20.09.16 11:21, Marc Zyngier wrote:
> On 19/09/16 18:39, Alexander Graf wrote:
>>
>>
>> On 19.09.16 16:48, Marc Zyngier wrote:
>>> On 19/09/16 12:14, Alexander Graf wrote:
>>>> We have 2 modes for dealing with interrupts in the ARM world. We can either
>>>> handle them all using hardware acceleration through the vgic or we can 
>>>> emulate
>>>> a gic in user space and only drive CPU IRQ pins from there.
>>>>
>>>> Unfortunately, when driving IRQs from user space, we never tell user space
>>>> about timer events that may result in interrupt line state changes, so we
>>>> lose out on timer events if we run with user space gic emulation.
>>>>
>>>> This patch fixes that by routing vtimer expiration events to user space.
>>>> With this patch I can successfully run edk2 and Linux with user space gic
>>>> emulation.
>>>>
>>>> Signed-off-by: Alexander Graf <ag...@suse.de>
>>>>
>>>> ---
>>>>
>>>> v1 -> v2:
>>>>
>>>>   - Add back curly brace that got lost
>>>>
>>>> v2 -> v3:
>>>>
>>>>   - Split into patch set
>>>>
>>>> v3 -> v4:
>>>>
>>>>   - Improve documentation
>>>> ---
>>>>  Documentation/virtual/kvm/api.txt |  30 -
>>>>  arch/arm/include/asm/kvm_host.h   |   3 +
>>>>  arch/arm/kvm/arm.c|  22 ---
>>>>  arch/arm64/include/asm/kvm_host.h |   3 +
>>>>  include/uapi/linux/kvm.h  |  14 +
>>>>  virt/kvm/arm/arch_timer.c | 125 
>>>> +++---
>>>>  6 files changed, 155 insertions(+), 42 deletions(-)
>>>>
>>>> diff --git a/Documentation/virtual/kvm/api.txt 
>>>> b/Documentation/virtual/kvm/api.txt
>>>> index 23937e0..1c0bd86 100644
>>>> --- a/Documentation/virtual/kvm/api.txt
>>>> +++ b/Documentation/virtual/kvm/api.txt
>>>> @@ -3202,9 +3202,14 @@ struct kvm_run {
>>>>/* in */
>>>>__u8 request_interrupt_window;
>>>>  
>>>> -Request that KVM_RUN return when it becomes possible to inject external
>>>> +[x86] Request that KVM_RUN return when it becomes possible to inject 
>>>> external
>>>>  interrupts into the guest.  Useful in conjunction with KVM_INTERRUPT.
>>>>  
>>>> +[arm*] Bits set to 1 in here mask IRQ lines that would otherwise 
>>>> potentially
>>>> +trigger forever. These lines are available:
>>>> +
>>>> +KVM_IRQWINDOW_VTIMER  -  Masks hw virtual timer irq while in guest
>>>> +
>>>>__u8 padding1[7];
>>>>  
>>>>/* out */
>>>> @@ -3519,6 +3524,18 @@ Hyper-V SynIC state change. Notification is used to 
>>>> remap SynIC
>>>>  event/message pages and to enable/disable SynIC messages/events processing
>>>>  in userspace.
>>>>  
>>>> +  /* KVM_EXIT_ARM_TIMER */
>>>> +  struct {
>>>> +  __u8 timesource;
>>>> +  } arm_timer;
>>>> +
>>>> +Indicates that a timer triggered that user space needs to handle and
>>>> +potentially mask with vcpu->run->request_interrupt_window to allow the
>>>> +guest to proceed. This only happens for timers that got enabled through
>>>> +KVM_CAP_ARM_TIMER. The following time sources are available:
>>>> +
>>>> +KVM_ARM_TIMER_VTIMER  - virtual cpu timer
>>>> +
>>>>/* Fix the size of the union. */
>>>>char padding[256];
>>>>};
>>>> @@ -3739,6 +3756,17 @@ Once this is done the KVM_REG_MIPS_VEC_* and 
>>>> KVM_REG_MIPS_MSA_* registers can be
>>>>  accessed, and the Config5.MSAEn bit is accessible via the KVM API and 
>>>> also from
>>>>  the guest.
>>>>  
>>>> +6.11 KVM_CAP_ARM_TIMER
>>>> +
>>>> +Architectures: arm, arm64
>>>> +Target: vcpu
>>>> +Parameters: args[0] contains a bitmap of timers to select (see 5.)
>>>> +
>>>> +This capability allows to route per-core timers into user space. When it's
>>>> +enabled and no in-kernel interrupt controller is in use, the timers 
>>>> selected
>>>> +by args[0] trigger KVM_EXIT_ARM_TIME

Re: [PATCH v4 2/2] KVM: arm/arm64: Route vtimer events to user space

2016-09-19 Thread Alexander Graf


On 19.09.16 16:48, Marc Zyngier wrote:
> On 19/09/16 12:14, Alexander Graf wrote:
>> We have 2 modes for dealing with interrupts in the ARM world. We can either
>> handle them all using hardware acceleration through the vgic or we can 
>> emulate
>> a gic in user space and only drive CPU IRQ pins from there.
>>
>> Unfortunately, when driving IRQs from user space, we never tell user space
>> about timer events that may result in interrupt line state changes, so we
>> lose out on timer events if we run with user space gic emulation.
>>
>> This patch fixes that by routing vtimer expiration events to user space.
>> With this patch I can successfully run edk2 and Linux with user space gic
>> emulation.
>>
>> Signed-off-by: Alexander Graf <ag...@suse.de>
>>
>> ---
>>
>> v1 -> v2:
>>
>>   - Add back curly brace that got lost
>>
>> v2 -> v3:
>>
>>   - Split into patch set
>>
>> v3 -> v4:
>>
>>   - Improve documentation
>> ---
>>  Documentation/virtual/kvm/api.txt |  30 -
>>  arch/arm/include/asm/kvm_host.h   |   3 +
>>  arch/arm/kvm/arm.c|  22 ---
>>  arch/arm64/include/asm/kvm_host.h |   3 +
>>  include/uapi/linux/kvm.h  |  14 +
>>  virt/kvm/arm/arch_timer.c | 125 
>> +++---
>>  6 files changed, 155 insertions(+), 42 deletions(-)
>>
>> diff --git a/Documentation/virtual/kvm/api.txt 
>> b/Documentation/virtual/kvm/api.txt
>> index 23937e0..1c0bd86 100644
>> --- a/Documentation/virtual/kvm/api.txt
>> +++ b/Documentation/virtual/kvm/api.txt
>> @@ -3202,9 +3202,14 @@ struct kvm_run {
>>  /* in */
>>  __u8 request_interrupt_window;
>>  
>> -Request that KVM_RUN return when it becomes possible to inject external
>> +[x86] Request that KVM_RUN return when it becomes possible to inject 
>> external
>>  interrupts into the guest.  Useful in conjunction with KVM_INTERRUPT.
>>  
>> +[arm*] Bits set to 1 in here mask IRQ lines that would otherwise potentially
>> +trigger forever. These lines are available:
>> +
>> +KVM_IRQWINDOW_VTIMER  -  Masks hw virtual timer irq while in guest
>> +
>>  __u8 padding1[7];
>>  
>>  /* out */
>> @@ -3519,6 +3524,18 @@ Hyper-V SynIC state change. Notification is used to 
>> remap SynIC
>>  event/message pages and to enable/disable SynIC messages/events processing
>>  in userspace.
>>  
>> +/* KVM_EXIT_ARM_TIMER */
>> +struct {
>> +__u8 timesource;
>> +} arm_timer;
>> +
>> +Indicates that a timer triggered that user space needs to handle and
>> +potentially mask with vcpu->run->request_interrupt_window to allow the
>> +guest to proceed. This only happens for timers that got enabled through
>> +KVM_CAP_ARM_TIMER. The following time sources are available:
>> +
>> +KVM_ARM_TIMER_VTIMER  - virtual cpu timer
>> +
>>  /* Fix the size of the union. */
>>  char padding[256];
>>  };
>> @@ -3739,6 +3756,17 @@ Once this is done the KVM_REG_MIPS_VEC_* and 
>> KVM_REG_MIPS_MSA_* registers can be
>>  accessed, and the Config5.MSAEn bit is accessible via the KVM API and also 
>> from
>>  the guest.
>>  
>> +6.11 KVM_CAP_ARM_TIMER
>> +
>> +Architectures: arm, arm64
>> +Target: vcpu
>> +Parameters: args[0] contains a bitmap of timers to select (see 5.)
>> +
>> +This capability allows to route per-core timers into user space. When it's
>> +enabled and no in-kernel interrupt controller is in use, the timers selected
>> +by args[0] trigger KVM_EXIT_ARM_TIMER guest exits when they are pending,
>> +unless masked by vcpu->run->request_interrupt_window (see 5.).
>> +
>>  7. Capabilities that can be enabled on VMs
>>  --
>>  
>> diff --git a/arch/arm/include/asm/kvm_host.h 
>> b/arch/arm/include/asm/kvm_host.h
>> index de338d9..77d1f73 100644
>> --- a/arch/arm/include/asm/kvm_host.h
>> +++ b/arch/arm/include/asm/kvm_host.h
>> @@ -180,6 +180,9 @@ struct kvm_vcpu_arch {
>>  
>>  /* Detect first run of a vcpu */
>>  bool has_run_once;
>> +
>> +/* User space wants timer notifications */
>> +bool user_space_arm_timers;
> 
> Please move this to the timer structure.

Sure.

> 
>>  };
>>  
>>  struct kvm_vm_stat {
>> diff --git a/arch/arm/kvm/arm.c b/arch/arm

[PATCH v4 2/2] KVM: arm/arm64: Route vtimer events to user space

2016-09-19 Thread Alexander Graf
We have 2 modes for dealing with interrupts in the ARM world. We can either
handle them all using hardware acceleration through the vgic or we can emulate
a gic in user space and only drive CPU IRQ pins from there.

Unfortunately, when driving IRQs from user space, we never tell user space
about timer events that may result in interrupt line state changes, so we
lose out on timer events if we run with user space gic emulation.

This patch fixes that by routing vtimer expiration events to user space.
With this patch I can successfully run edk2 and Linux with user space gic
emulation.

Signed-off-by: Alexander Graf <ag...@suse.de>

---

v1 -> v2:

  - Add back curly brace that got lost

v2 -> v3:

  - Split into patch set

v3 -> v4:

  - Improve documentation
---
 Documentation/virtual/kvm/api.txt |  30 -
 arch/arm/include/asm/kvm_host.h   |   3 +
 arch/arm/kvm/arm.c|  22 ---
 arch/arm64/include/asm/kvm_host.h |   3 +
 include/uapi/linux/kvm.h  |  14 +
 virt/kvm/arm/arch_timer.c | 125 +++---
 6 files changed, 155 insertions(+), 42 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index 23937e0..1c0bd86 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -3202,9 +3202,14 @@ struct kvm_run {
/* in */
__u8 request_interrupt_window;
 
-Request that KVM_RUN return when it becomes possible to inject external
+[x86] Request that KVM_RUN return when it becomes possible to inject external
 interrupts into the guest.  Useful in conjunction with KVM_INTERRUPT.
 
+[arm*] Bits set to 1 in here mask IRQ lines that would otherwise potentially
+trigger forever. These lines are available:
+
+KVM_IRQWINDOW_VTIMER  -  Masks hw virtual timer irq while in guest
+
__u8 padding1[7];
 
/* out */
@@ -3519,6 +3524,18 @@ Hyper-V SynIC state change. Notification is used to 
remap SynIC
 event/message pages and to enable/disable SynIC messages/events processing
 in userspace.
 
+   /* KVM_EXIT_ARM_TIMER */
+   struct {
+   __u8 timesource;
+   } arm_timer;
+
+Indicates that a timer triggered that user space needs to handle and
+potentially mask with vcpu->run->request_interrupt_window to allow the
+guest to proceed. This only happens for timers that got enabled through
+KVM_CAP_ARM_TIMER. The following time sources are available:
+
+KVM_ARM_TIMER_VTIMER  - virtual cpu timer
+
/* Fix the size of the union. */
char padding[256];
};
@@ -3739,6 +3756,17 @@ Once this is done the KVM_REG_MIPS_VEC_* and 
KVM_REG_MIPS_MSA_* registers can be
 accessed, and the Config5.MSAEn bit is accessible via the KVM API and also from
 the guest.
 
+6.11 KVM_CAP_ARM_TIMER
+
+Architectures: arm, arm64
+Target: vcpu
+Parameters: args[0] contains a bitmap of timers to select (see 5.)
+
+This capability allows to route per-core timers into user space. When it's
+enabled and no in-kernel interrupt controller is in use, the timers selected
+by args[0] trigger KVM_EXIT_ARM_TIMER guest exits when they are pending,
+unless masked by vcpu->run->request_interrupt_window (see 5.).
+
 7. Capabilities that can be enabled on VMs
 --
 
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index de338d9..77d1f73 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -180,6 +180,9 @@ struct kvm_vcpu_arch {
 
/* Detect first run of a vcpu */
bool has_run_once;
+
+   /* User space wants timer notifications */
+   bool user_space_arm_timers;
 };
 
 struct kvm_vm_stat {
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index c84b6ad..57bdb71 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -187,6 +187,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_ARM_PSCI_0_2:
case KVM_CAP_READONLY_MEM:
case KVM_CAP_MP_STATE:
+   case KVM_CAP_ARM_TIMER:
r = 1;
break;
case KVM_CAP_COALESCED_MMIO:
@@ -474,13 +475,7 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
return ret;
}
 
-   /*
-* Enable the arch timers only if we have an in-kernel VGIC
-* and it has been properly initialized, since we cannot handle
-* interrupts from the virtual timer with a userspace gic.
-*/
-   if (irqchip_in_kernel(kvm) && vgic_initialized(kvm))
-   ret = kvm_timer_enable(vcpu);
+   ret = kvm_timer_enable(vcpu);
 
return ret;
 }
@@ -601,6 +596,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct 
kvm_run *run)
run->exit_reason = KVM_EXIT_INTR;
}
 
+   if (kvm_che

[PATCH v4 0/2] KVM: ARM: Enable vtimers with user space gic

2016-09-19 Thread Alexander Graf
Some systems out there (well, one type in particular - the Raspberry Pi series)
do have virtualization capabilities in the core, but no ARM GIC interrupt
controller.

To run on these systems, the cleanest route is to just handle all
interrupt delivery in user space and only deal with IRQ pins in the core
side in KVM.

This works pretty well already, but breaks when the guest starts to use
architected timers, as these are handled straight inside kernel space today.

This patch set allows user space to receive vtimer events as well as mask
them, so that we can handle all vtimer related interrupt injection from user
space, enabling us to use architected timer with user space gic emulation.

I have successfully run edk2 as well as Linux using these patches on a
Raspberry Pi 3 system with acceptable speed.

A branch with WIP QEMU code can be found here:

  https://github.com/agraf/qemu.git no-kvm-irqchip

To use the user space irqchip, just run it with

  $ qemu-system-aarch64 -M virt ...

if you're on a non-vgic host system. Or -M virt,kernel-irqchip=off if your
host system has vgic support.

v1 -> v2:

  - Add back curly brace that got lost

v2 -> v3:

  - Fix "only only" in documentation
  - Split patches
  - Remove kvm_emulate.h include

v3 -> v4:

  - Improve documentation

Alexander Graf (2):
  KVM: arm/arm64: Add vcpu ENABLE_CAP functionality
  KVM: arm/arm64: Route vtimer events to user space

 Documentation/virtual/kvm/api.txt |  34 ++-
 arch/arm/include/asm/kvm_host.h   |   3 +
 arch/arm/kvm/arm.c|  47 +++---
 arch/arm64/include/asm/kvm_host.h |   3 +
 include/uapi/linux/kvm.h  |  14 +
 virt/kvm/arm/arch_timer.c | 125 +++---
 6 files changed, 183 insertions(+), 43 deletions(-)

-- 
1.8.5.6

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v3 0/2] KVM: ARM: Enable vtimers with user space gic

2016-09-19 Thread Alexander Graf


On 16.09.16 15:30, Christoffer Dall wrote:
> On Fri, Sep 16, 2016 at 02:31:42PM +0200, Paolo Bonzini wrote:
>>
>>
>> On 16/09/2016 14:30, Christoffer Dall wrote:
>> This patch set allows user space to receive vtimer events as well as mask
>> them, so that we can handle all vtimer related interrupt injection from 
>> user
>> space, enabling us to use architected timer with user space gic 
>> emulation.
>
> I have already voiced my concerns in the past, including face to face,
> and I'm going to repeat it: I not keen at all on adding a new userspace
> interface that is going to bitrot extremely quickly.

 You don't have automated tests set up?  It's not going to bitrot if you
 test it, either with kvm-unit-tests or just by smoke-testing Linux.
 It's _for_ the raspi, but it's not limited to it.
>>>
>>> Our automated testing situation is not great, no.  Something we're
>>> looking at, but have resource problems with.
>>
>> But it's not a good reason to hold back a feature...
>>
> 
> I didn't say that exactly, but choosing not to merge something we cannot
> maintain and which we're not paid to look after and where there's a
> minimal interest, is not entirely unreasonable.
> 
> That being said, I'm not categorically against these patches, but I
> share Marc's view that we've already seen that non-vgic support had been
> broken for multiple versions without anyone complaining, and without
> automated testing or substantial interest in the work, the patches
> really are likely to bit-rot.

I know that it's very hard to grasp from an upstream maintainer
perspective, but keep in mind where the bulk of execution of kernel code
lies. The average life cycle of a "stable" Linux distribution's kernel
is a few years.

So far all regressions in the user space gic code have been found within
less than 1y of the respective code release. I'd say that counts for
quite a well used feature.

> But I haven't even looked at the patches in detail, I was just replying
> to the comment about testing.

Also keep in mind that without the architected timer support (and/or
without qemu patches than enable user space timers) the user space gic
support is pretty unusable to most people, so you obviously get less
reports.


Alex
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v3 0/2] KVM: ARM: Enable vtimers with user space gic

2016-09-16 Thread Alexander Graf


> Am 16.09.2016 um 15:46 schrieb Andrew Jones :
> 
>> On Fri, Sep 16, 2016 at 03:30:27PM +0200, Christoffer Dall wrote:
>>> On Fri, Sep 16, 2016 at 02:31:42PM +0200, Paolo Bonzini wrote:
>>> 
>>> 
>>> On 16/09/2016 14:30, Christoffer Dall wrote:
>>> This patch set allows user space to receive vtimer events as well as 
>>> mask
>>> them, so that we can handle all vtimer related interrupt injection from 
>>> user
>>> space, enabling us to use architected timer with user space gic 
>>> emulation.
>> 
>> I have already voiced my concerns in the past, including face to face,
>> and I'm going to repeat it: I not keen at all on adding a new userspace
>> interface that is going to bitrot extremely quickly.
> 
> You don't have automated tests set up?  It's not going to bitrot if you
> test it, either with kvm-unit-tests or just by smoke-testing Linux.
> It's _for_ the raspi, but it's not limited to it.
 
 Our automated testing situation is not great, no.  Something we're
 looking at, but have resource problems with.
>>> 
>>> But it's not a good reason to hold back a feature...
>> 
>> I didn't say that exactly, but choosing not to merge something we cannot
>> maintain and which we're not paid to look after and where there's a
>> minimal interest, is not entirely unreasonable.
>> 
>> That being said, I'm not categorically against these patches, but I
>> share Marc's view that we've already seen that non-vgic support had been
>> broken for multiple versions without anyone complaining, and without
>> automated testing or substantial interest in the work, the patches
>> really are likely to bit-rot.
>> 
>> But I haven't even looked at the patches in detail, I was just replying
>> to the comment about testing.
> 
> This may be a great time to start encouraging feature writers to submit
> kvm-unit-tests patches at the same time as the feature (Hi Alex :-)

I actually started off implementing this with the help of kvm-unit-tests. It's 
awesome!

I'm lacking actual irq support to make the test reasonable though and wanted to 
get the kernel bits out first :). But I'll sit down on that again soon I hope.


Alex


___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v3 0/2] KVM: ARM: Enable vtimers with user space gic

2016-09-16 Thread Alexander Graf

> On 16 Sep 2016, at 14:29, Christoffer Dall <christoffer.d...@linaro.org> 
> wrote:
> 
> On Fri, Sep 16, 2016 at 02:25:01PM +0200, Alexander Graf wrote:
>> 
>>> On 16 Sep 2016, at 12:20, Marc Zyngier <marc.zyng...@arm.com> wrote:
>>> 
>>> Hi Alex,
>>> 
>>> On 16/09/16 07:26, Alexander Graf wrote:
>>>> Some systems out there (well, one type in particular - the Raspberry Pi 
>>>> series)
>>>> do have virtualization capabilities in the core, but no ARM GIC interrupt
>>>> controller.
>>>> 
>>>> To run on these systems, the cleanest route is to just handle all
>>>> interrupt delivery in user space and only deal with IRQ pins in the core
>>>> side in KVM.
>>>> 
>>>> This works pretty well already, but breaks when the guest starts to use
>>>> architected timers, as these are handled straight inside kernel space 
>>>> today.
>>>> 
>>>> This patch set allows user space to receive vtimer events as well as mask
>>>> them, so that we can handle all vtimer related interrupt injection from 
>>>> user
>>>> space, enabling us to use architected timer with user space gic emulation.
>>> 
>>> I have already voiced my concerns in the past, including face to face,
>>> and I'm going to repeat it: I not keen at all on adding a new userspace
>>> interface that is going to bitrot extremely quickly.
>>> 
>>> Let's face it, this new ABI will have a single user, with a limited
>>> shelf life. I understand that the RPi is a popular product, but it looks
>>> fairly obvious that this kind of sub-standard HW will eventually
>>> disappear. We'll then be left with a userspace ABI that will break at
>> 
>> I’m not 100% convinced that this is the case. Emulating the GIC in user 
>> space can have other interesting use cases. For example, it might come in 
>> handy for nesting. It may be useful for migrating a gicv2 VM to a gicv3 host 
>> without gicv2 emulation as well.
>> 
> 
> I don't see why you'd do this; the VGIC hardware can perfectly well be
> used for nesting as well, and this works rather well.

Mostly because it’s more. I like my problems self-contained, and simulating 
only nesting on the CPU level is less to worry about than simulating vgic 
switchover as well. Of course eventually with nesting you’d want a nested vgic 
;).

> 
>>> every single release, given that nobody in the RPi community actually
>>> uses a mainline kernel.
>> 
>> I actually verified all of this patch on 4.8-rc5 upstream, which is the only 
>> 64bit kernel you can find for the RPi. So I’d expect the situation to change 
>> with 64bit.
>> 
>>> And breaking this ABI will introduce userspace exploitable bugs, like
>>> the one you've already shown. If anything, I would have loved to
>>> completely kill the whole userspace GIC, because nobody cares. Yes, I
>>> understand it is fun to have KVM running on the RPi. But the maintenance
>>> costs far outweigh the fun aspect already.
>> 
>> Having CPU pins accessible is very useful for use cases of KVM that are 
>> beyond your traditional VM.
>> 
>>> You could still run KVM with an external emulated timer (not the arch
>>> timer). No need for a new ABI for that.
>> 
>> That’s what I thought too, but turns out that it’s not quite as simple as I 
>> hoped ;).
> 
> Why not?  I thought a few people had this running recently.  What were
> the issues?  Perhaps I can convince someone to submit the patches they
> used if useful.

Just give it a try - I didn’t get any timer events and couldn’t quite figure 
out why. Patch is attached below.


Alex

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index a193b5a..f118ab8 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -91,6 +91,7 @@ typedef struct {
 bool secure;
 bool highmem;
 int32_t gic_version;
+bool archtimer;
 } VirtMachineState;

 #define TYPE_VIRT_MACHINE   MACHINE_TYPE_NAME("virt")
@@ -177,6 +178,7 @@ static const MemMapEntry a15memmap[] = {
 [VIRT_FW_CFG] = { 0x0902, 0x0018 },
 [VIRT_GPIO] =   { 0x0903, 0x1000 },
 [VIRT_SECURE_UART] ={ 0x0904, 0x1000 },
+[VIRT_SP804] =  { 0x0905, 0x1000 },
 [VIRT_MMIO] =   { 0x0a00, 0x0200 },
 /* ...repeating for a total of NUM_VIRTIO_TRANSPORTS, each of that size */
 [VIRT_PLATFORM_BUS] =   { 0x0c00, 0x0200 },
@@ -195,6 +197,7 @@ static const int a15irqmap[] = {
 [VIRT_PCIE] 

Re: [PATCH v2] KVM: arm/arm64: Route vtimer events to user space

2016-09-16 Thread Alexander Graf

> On 16 Sep 2016, at 11:44, Christoffer Dall <christoffer.d...@linaro.org> 
> wrote:
> 
> Alex,
> 
> On Fri, Sep 16, 2016 at 07:09:13AM +0200, Alexander Graf wrote:
>> We have 2 modes for dealing with interrupts in the ARM world. We can either
>> handle them all using hardware acceleration through the vgic or we can 
>> emulate
>> a gic in user space and only drive CPU IRQ pins from there.
>> 
>> Unfortunately, when driving IRQs from user space, we never tell user space
>> about timer events that may result in interrupt line state changes, so we
>> lose out on timer events if we run with user space gic emulation.
>> 
>> This patch set fixes that by routing vtimer expiration events to user space.
>> With this patch I can successfully run edk2 and Linux with user space gic
>> emulation.
> 
> I have two versions of v2.  Are there any differences or did it just go
> out twice or got duplicated somehow on my end?

Just ignore v2. I shouldn’t send emails that early in the morning. v3 is much 
better ;)


Alex

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v3 0/2] KVM: ARM: Enable vtimers with user space gic

2016-09-16 Thread Alexander Graf

> On 16 Sep 2016, at 12:20, Marc Zyngier <marc.zyng...@arm.com> wrote:
> 
> Hi Alex,
> 
> On 16/09/16 07:26, Alexander Graf wrote:
>> Some systems out there (well, one type in particular - the Raspberry Pi 
>> series)
>> do have virtualization capabilities in the core, but no ARM GIC interrupt
>> controller.
>> 
>> To run on these systems, the cleanest route is to just handle all
>> interrupt delivery in user space and only deal with IRQ pins in the core
>> side in KVM.
>> 
>> This works pretty well already, but breaks when the guest starts to use
>> architected timers, as these are handled straight inside kernel space today.
>> 
>> This patch set allows user space to receive vtimer events as well as mask
>> them, so that we can handle all vtimer related interrupt injection from user
>> space, enabling us to use architected timer with user space gic emulation.
> 
> I have already voiced my concerns in the past, including face to face,
> and I'm going to repeat it: I not keen at all on adding a new userspace
> interface that is going to bitrot extremely quickly.
> 
> Let's face it, this new ABI will have a single user, with a limited
> shelf life. I understand that the RPi is a popular product, but it looks
> fairly obvious that this kind of sub-standard HW will eventually
> disappear. We'll then be left with a userspace ABI that will break at

I’m not 100% convinced that this is the case. Emulating the GIC in user space 
can have other interesting use cases. For example, it might come in handy for 
nesting. It may be useful for migrating a gicv2 VM to a gicv3 host without 
gicv2 emulation as well.

> every single release, given that nobody in the RPi community actually
> uses a mainline kernel.

I actually verified all of this patch on 4.8-rc5 upstream, which is the only 
64bit kernel you can find for the RPi. So I’d expect the situation to change 
with 64bit.

> And breaking this ABI will introduce userspace exploitable bugs, like
> the one you've already shown. If anything, I would have loved to
> completely kill the whole userspace GIC, because nobody cares. Yes, I
> understand it is fun to have KVM running on the RPi. But the maintenance
> costs far outweigh the fun aspect already.

Having CPU pins accessible is very useful for use cases of KVM that are beyond 
your traditional VM.

> You could still run KVM with an external emulated timer (not the arch
> timer). No need for a new ABI for that.

That’s what I thought too, but turns out that it’s not quite as simple as I 
hoped ;). Also, I much rather at least have a common notion of “arch timers are 
always available on arm64” than “kvm always uses the vgic”. The former has much 
more impact and much more reach.


Alex

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v3 2/2] KVM: arm/arm64: Route vtimer events to user space

2016-09-16 Thread Alexander Graf
We have 2 modes for dealing with interrupts in the ARM world. We can either
handle them all using hardware acceleration through the vgic or we can emulate
a gic in user space and only drive CPU IRQ pins from there.

Unfortunately, when driving IRQs from user space, we never tell user space
about timer events that may result in interrupt line state changes, so we
lose out on timer events if we run with user space gic emulation.

This patch fixes that by routing vtimer expiration events to user space.
With this patch I can successfully run edk2 and Linux with user space gic
emulation.

Signed-off-by: Alexander Graf <ag...@suse.de>

---

v1 -> v2:

  - Add back curly brace that got lost

v2 -> v3:

  - Split into patch set
---
 Documentation/virtual/kvm/api.txt |  24 +++-
 arch/arm/include/asm/kvm_host.h   |   3 +
 arch/arm/kvm/arm.c|  22 ---
 arch/arm64/include/asm/kvm_host.h |   3 +
 include/uapi/linux/kvm.h  |  14 +
 virt/kvm/arm/arch_timer.c | 125 +++---
 6 files changed, 149 insertions(+), 42 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index 23937e0..dec1a78 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -3202,8 +3202,10 @@ struct kvm_run {
/* in */
__u8 request_interrupt_window;
 
-Request that KVM_RUN return when it becomes possible to inject external
+[x86] Request that KVM_RUN return when it becomes possible to inject external
 interrupts into the guest.  Useful in conjunction with KVM_INTERRUPT.
+[arm*] Bits set to 1 in here mask IRQ lines that would otherwise potentially
+trigger forever. Useful with KVM_CAP_ARM_TIMER.
 
__u8 padding1[7];
 
@@ -3519,6 +3521,16 @@ Hyper-V SynIC state change. Notification is used to 
remap SynIC
 event/message pages and to enable/disable SynIC messages/events processing
 in userspace.
 
+   /* KVM_EXIT_ARM_TIMER */
+   struct {
+   __u8 timesource;
+   } arm_timer;
+
+Indicates that a timer triggered that user space needs to handle and
+potentially mask with vcpu->run->request_interrupt_window to allow the
+guest to proceed. This only happens for timers that got enabled through
+KVM_CAP_ARM_TIMER.
+
/* Fix the size of the union. */
char padding[256];
};
@@ -3739,6 +3751,16 @@ Once this is done the KVM_REG_MIPS_VEC_* and 
KVM_REG_MIPS_MSA_* registers can be
 accessed, and the Config5.MSAEn bit is accessible via the KVM API and also from
 the guest.
 
+6.11 KVM_CAP_ARM_TIMER
+
+Architectures: arm, arm64
+Target: vcpu
+Parameters: args[0] contains a bitmap of timers to enable
+
+This capability allows to route per-core timers into user space. When it's
+enabled, the enabled timers trigger KVM_EXIT_ARM_TIMER guest exits when they
+are pending, unless masked by vcpu->run->request_interrupt_window.
+
 7. Capabilities that can be enabled on VMs
 --
 
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index de338d9..77d1f73 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -180,6 +180,9 @@ struct kvm_vcpu_arch {
 
/* Detect first run of a vcpu */
bool has_run_once;
+
+   /* User space wants timer notifications */
+   bool user_space_arm_timers;
 };
 
 struct kvm_vm_stat {
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index c84b6ad..57bdb71 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -187,6 +187,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_ARM_PSCI_0_2:
case KVM_CAP_READONLY_MEM:
case KVM_CAP_MP_STATE:
+   case KVM_CAP_ARM_TIMER:
r = 1;
break;
case KVM_CAP_COALESCED_MMIO:
@@ -474,13 +475,7 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
return ret;
}
 
-   /*
-* Enable the arch timers only if we have an in-kernel VGIC
-* and it has been properly initialized, since we cannot handle
-* interrupts from the virtual timer with a userspace gic.
-*/
-   if (irqchip_in_kernel(kvm) && vgic_initialized(kvm))
-   ret = kvm_timer_enable(vcpu);
+   ret = kvm_timer_enable(vcpu);
 
return ret;
 }
@@ -601,6 +596,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct 
kvm_run *run)
run->exit_reason = KVM_EXIT_INTR;
}
 
+   if (kvm_check_request(KVM_REQ_PENDING_TIMER, vcpu)) {
+   /* Tell user space about the pending vtimer */
+   ret = 0;
+   run->exit_reason = KVM_EXIT_ARM_TIMER;
+   run->arm_timer.timesource = KVM_ARM_TIMER_VTIMER;
+   }
+
if

[PATCH v3 1/2] KVM: arm/arm64: Add vcpu ENABLE_CAP functionality

2016-09-16 Thread Alexander Graf
In a follow-up patch we will need to enable capabilities on demand for
backwards compatibility. This patch adds the generic framework to handle
vcpu cap enablement to the arm code base.

Signed-off-by: Alexander Graf <ag...@suse.de>
---
 Documentation/virtual/kvm/api.txt |  4 +++-
 arch/arm/kvm/arm.c| 25 +
 2 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index 739db9a..23937e0 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -997,7 +997,9 @@ documentation when it pops into existence).
 
 Capability: KVM_CAP_ENABLE_CAP, KVM_CAP_ENABLE_CAP_VM
 Architectures: x86 (only KVM_CAP_ENABLE_CAP_VM),
-  mips (only KVM_CAP_ENABLE_CAP), ppc, s390
+  mips (only KVM_CAP_ENABLE_CAP), ppc, s390,
+  arm (only KVM_CAP_ENABLE_CAP),
+  arm64 (only KVM_CAP_ENABLE_CAP)
 Type: vcpu ioctl, vm ioctl (with KVM_CAP_ENABLE_CAP_VM)
 Parameters: struct kvm_enable_cap (in)
 Returns: 0 on success; -1 on error
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 75f130e..c84b6ad 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -878,6 +878,23 @@ static int kvm_arm_vcpu_has_attr(struct kvm_vcpu *vcpu,
return ret;
 }
 
+static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
+struct kvm_enable_cap *cap)
+{
+   int r;
+
+   if (cap->flags)
+   return -EINVAL;
+
+   switch (cap->cap) {
+   default:
+   r = -EINVAL;
+   break;
+   }
+
+   return r;
+}
+
 long kvm_arch_vcpu_ioctl(struct file *filp,
 unsigned int ioctl, unsigned long arg)
 {
@@ -941,6 +958,14 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
return -EFAULT;
return kvm_arm_vcpu_has_attr(vcpu, );
}
+   case KVM_ENABLE_CAP:
+   {
+   struct kvm_enable_cap cap;
+
+   if (copy_from_user(, argp, sizeof(cap)))
+   return -EFAULT;
+   return kvm_vcpu_ioctl_enable_cap(vcpu, );
+   }
default:
return -EINVAL;
}
-- 
2.6.6

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH] KVM: arm/arm64: timer: Fix hw sync for user space irqchip path

2016-09-15 Thread Alexander Graf
While adding the new vgic implementation, apparently nobody tested
the non-vgic path where user space controls the vgic, so two functions
slipped through the cracks that get called in generic code but don't
check whether hardware support is enabled.

This patch guards them with proper checks to ensure we only try to
use vgic data structures if they are available. Without this, I get
a stack trace:

[   74.363037] Unable to handle kernel paging request at virtual address 
ffe8
[...]
[   74.929654] [] _raw_spin_lock+0x1c/0x58
[   74.935133] [] kvm_vgic_flush_hwstate+0x88/0x288
[   74.941406] [] kvm_arch_vcpu_ioctl_run+0xfc/0x630
[   74.947766] [] kvm_vcpu_ioctl+0x2f4/0x710
[   74.953420] [] do_vfs_ioctl+0xb0/0x728
[   74.958807] [] SyS_ioctl+0x94/0xa8
[   74.963844] [] el0_svc_naked+0x38/0x3c

Fixes: 0919e84c0
Cc: sta...@vger.kernel.org
Signed-off-by: Alexander Graf <ag...@suse.de>
---
 virt/kvm/arm/vgic/vgic.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
index e83b7fe..9f312ba 100644
--- a/virt/kvm/arm/vgic/vgic.c
+++ b/virt/kvm/arm/vgic/vgic.c
@@ -645,6 +645,9 @@ next:
 /* Sync back the hardware VGIC state into our emulation after a guest's run. */
 void kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
 {
+   if (!vcpu->kvm->arch.vgic.enabled)
+   return;
+
vgic_process_maintenance_interrupt(vcpu);
vgic_fold_lr_state(vcpu);
vgic_prune_ap_list(vcpu);
@@ -653,6 +656,9 @@ void kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
 /* Flush our emulation state into the GIC hardware before entering the guest. 
*/
 void kvm_vgic_flush_hwstate(struct kvm_vcpu *vcpu)
 {
+   if (!vcpu->kvm->arch.vgic.enabled)
+   return;
+
spin_lock(>arch.vgic_cpu.ap_list_lock);
vgic_flush_lr_state(vcpu);
spin_unlock(>arch.vgic_cpu.ap_list_lock);
-- 
2.6.6

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH] KVM: arm/arm64: Route vtimer events to user space

2016-09-15 Thread Alexander Graf
We have 2 modes for dealing with interrupts in the ARM world. We can either
handle them all using hardware acceleration through the vgic or we can emulate
a gic in user space and only drive CPU IRQ pins from there.

Unfortunately, when driving IRQs from user space, we never tell user space
about timer events that may result in interrupt line state changes, so we
lose out on timer events if we run with user space gic emulation.

This patch set fixes that by routing vtimer expiration events to user space.
With this patch I can successfully run edk2 and Linux with user space gic
emulation.

Signed-off-by: Alexander Graf <ag...@suse.de>

---

A branch with WIP QEMU code can be found here:

  https://github.com/agraf/qemu.git no-kvm-irqchip
---
 Documentation/virtual/kvm/api.txt |  28 -
 arch/arm/include/asm/kvm_host.h   |   3 +
 arch/arm/kvm/arm.c|  46 +++---
 arch/arm64/include/asm/kvm_host.h |   3 +
 include/uapi/linux/kvm.h  |  14 +
 virt/kvm/arm/arch_timer.c | 126 --
 6 files changed, 177 insertions(+), 43 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index 739db9a..6a64c53 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -997,7 +997,9 @@ documentation when it pops into existence).
 
 Capability: KVM_CAP_ENABLE_CAP, KVM_CAP_ENABLE_CAP_VM
 Architectures: x86 (only KVM_CAP_ENABLE_CAP_VM),
-  mips (only KVM_CAP_ENABLE_CAP), ppc, s390
+  mips (only KVM_CAP_ENABLE_CAP), ppc, s390,
+  arm (only KVM_CAP_ENABLE_CAP),
+  arm64 (only only KVM_CAP_ENABLE_CAP)
 Type: vcpu ioctl, vm ioctl (with KVM_CAP_ENABLE_CAP_VM)
 Parameters: struct kvm_enable_cap (in)
 Returns: 0 on success; -1 on error
@@ -3200,8 +3202,10 @@ struct kvm_run {
/* in */
__u8 request_interrupt_window;
 
-Request that KVM_RUN return when it becomes possible to inject external
+[x86] Request that KVM_RUN return when it becomes possible to inject external
 interrupts into the guest.  Useful in conjunction with KVM_INTERRUPT.
+[arm*] Bits set to 1 in here mask IRQ lines that would otherwise potentially
+trigger forever. Useful with KVM_CAP_ARM_TIMER.
 
__u8 padding1[7];
 
@@ -3517,6 +3521,16 @@ Hyper-V SynIC state change. Notification is used to 
remap SynIC
 event/message pages and to enable/disable SynIC messages/events processing
 in userspace.
 
+   /* KVM_EXIT_ARM_TIMER */
+   struct {
+   __u8 timesource;
+   } arm_timer;
+
+Indicates that a timer triggered that user space needs to handle and
+potentially mask with vcpu->run->request_interrupt_window to allow the
+guest to proceed. This only happens for timers that got enabled through
+KVM_CAP_ARM_TIMER.
+
/* Fix the size of the union. */
char padding[256];
};
@@ -3737,6 +3751,16 @@ Once this is done the KVM_REG_MIPS_VEC_* and 
KVM_REG_MIPS_MSA_* registers can be
 accessed, and the Config5.MSAEn bit is accessible via the KVM API and also from
 the guest.
 
+6.11 KVM_CAP_ARM_TIMER
+
+Architectures: arm, arm64
+Target: vcpu
+Parameters: args[0] contains a bitmap of timers to enable
+
+This capability allows to route per-core timers into user space. When it's
+enabled, the enabled timers trigger KVM_EXIT_ARM_TIMER guest exits when they
+are pending, unless masked by vcpu->run->request_interrupt_window.
+
 7. Capabilities that can be enabled on VMs
 --
 
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index de338d9..77d1f73 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -180,6 +180,9 @@ struct kvm_vcpu_arch {
 
/* Detect first run of a vcpu */
bool has_run_once;
+
+   /* User space wants timer notifications */
+   bool user_space_arm_timers;
 };
 
 struct kvm_vm_stat {
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 75f130e..1b4b9a6 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -187,6 +187,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_ARM_PSCI_0_2:
case KVM_CAP_READONLY_MEM:
case KVM_CAP_MP_STATE:
+   case KVM_CAP_ARM_TIMER:
r = 1;
break;
case KVM_CAP_COALESCED_MMIO:
@@ -474,13 +475,7 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
return ret;
}
 
-   /*
-* Enable the arch timers only if we have an in-kernel VGIC
-* and it has been properly initialized, since we cannot handle
-* interrupts from the virtual timer with a userspace gic.
-*/
-   if (irqchip_in_kernel(kvm) && vgic_initialized(kvm))
-   ret = kvm_timer_enable(vcpu);
+   ret = kvm_timer_enable(vcpu);
 
return

Re: issues with emulated PCI MMIO backed by host memory under KVM

2016-06-28 Thread Alexander Graf



On 06/28/2016 12:55 PM, Laszlo Ersek wrote:

On 06/27/16 12:34, Christoffer Dall wrote:

On Mon, Jun 27, 2016 at 11:47:18AM +0200, Ard Biesheuvel wrote:

So first of all, let me reiterate that I could only find a single
instance in QEMU where a PCI MMIO region is backed by host memory,
which is vga-pci.c. I wonder of there are any other occurrences, but
if there aren't any, it makes much more sense to prohibit PCI BARs
backed by host memory rather than spend a lot of effort working around
it.

Right, ok.  So Marc's point during his KVM Forum talk was basically,
don't use the legacy VGA adapter on ARM and use virtio graphics, right?

The EFI GOP (Graphics Output Protocol) abstraction provides two ways for
UEFI applications to access the display, and one way for a runtime OS to
inherit the display hardware from the firmware (without OS native drivers).

(a) For UEFI apps:
- direct framebuffer access
- Blt() (block transfer) member function

(b) For runtime OS:
- direct framebuffer access ("efifb" in Linux)

Virtio-gpu lacks a linear framebuffer by design. Therefore the above
methods are reduced to the following:

(c) UEFI apps can access virtio-gpu with:
- GOP.Blt() member function only

(d) The runtime guest OS can access the virtio-gpu device as-inherited
from the firmware (i.e., without native drivers) with:
- n/a.

Given that we expect all aarch64 OSes to include native virtio-gpu
drivers on their install media, (d) is actually not a problem. Whenever
the OS kernel runs, we except to have no need for "efifb", ever. So
that's good.

The problem is (c). UEFI boot loaders would have to be taught to call
GOP.Blt() manually, whenever they need to display something. I'm not
sure about grub2's current status, but it is free software, so in theory
it should be doable. However, UEFI windows boot loaders are proprietary


Yes, grub2 already ignores the frame buffer target address and instead 
uses Blt() operations only.



*and* they require direct framebuffer access (on x86 at least); they
don't work with Blt()-only. (I found some Microsoft presentations about
this earlier.)

So, virtio-gpu is an almost universal solution for the problem, but not
entirely. For any given GOP, offering Blt() *only* (i.e., not exposing a
linear framebuffer) conforms to the UEFI spec, but some boot loaders are
known to present further requirements (on x86 anyway).


Well, I'm perfectly happy in ignoring Windows on KVM for now, if that 
gets us working, smooth Linux guest support :).



Alex

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: issues with emulated PCI MMIO backed by host memory under KVM

2016-06-27 Thread Alexander Graf


> Am 27.06.2016 um 15:57 schrieb Ard Biesheuvel :
> 
>> On 27 June 2016 at 15:35, Christoffer Dall  
>> wrote:
>>> On Mon, Jun 27, 2016 at 02:30:46PM +0200, Ard Biesheuvel wrote:
 On 27 June 2016 at 12:34, Christoffer Dall  
 wrote:
> On Mon, Jun 27, 2016 at 11:47:18AM +0200, Ard Biesheuvel wrote:
>> On 27 June 2016 at 11:16, Christoffer Dall  
>> wrote:
>> Hi,
>> 
>> I'm going to ask some stupid questions here...
>> 
>>> On Fri, Jun 24, 2016 at 04:04:45PM +0200, Ard Biesheuvel wrote:
>>> Hi all,
>>> 
>>> This old subject came up again in a discussion related to PCIe support
>>> for QEMU/KVM under Tianocore. The fact that we need to map PCI MMIO
>>> regions as cacheable is preventing us from reusing a significant slice
>>> of the PCIe support infrastructure, and so I'd like to bring this up
>>> again, perhaps just to reiterate why we're simply out of luck.
>>> 
>>> To refresh your memories, the issue is that on ARM, PCI MMIO regions
>>> for emulated devices may be backed by memory that is mapped cacheable
>>> by the host. Note that this has nothing to do with the device being
>>> DMA coherent or not: in this case, we are dealing with regions that
>>> are not memory from the POV of the guest, and it is reasonable for the
>>> guest to assume that accesses to such a region are not visible to the
>>> device before they hit the actual PCI MMIO window and are translated
>>> into cycles on the PCI bus.
>> 
>> For the sake of completeness, why is this reasonable?
> 
> Because the whole point of accessing these regions is to communicate
> with the device. It is common to use write combining mappings for
> things like framebuffers to group writes before they hit the PCI bus,
> but any caching just makes it more difficult for the driver state and
> device state to remain synchronized.
> 
>> Is this how any real ARM system implementing PCI would actually work?
> 
> Yes.
> 
>>> That means that mapping such a region
>>> cacheable is a strange thing to do, in fact, and it is unlikely that
>>> patches implementing this against the generic PCI stack in Tianocore
>>> will be accepted by the maintainers.
>>> 
>>> Note that this issue not only affects framebuffers on PCI cards, it
>>> also affects emulated USB host controllers (perhaps Alex can remind us
>>> which one exactly?) and likely other emulated generic PCI devices as
>>> well.
>>> 
>>> Since the issue exists only for emulated PCI devices whose MMIO
>>> regions are backed by host memory, is there any way we can already
>>> distinguish such memslots from ordinary ones? If we can, is there
>>> anything we could do to treat these specially? Perhaps something like
>>> using read-only memslots so we can at least trap guest writes instead
>>> of having main memory going out of sync with the caches unnoticed? I
>>> am just brainstorming here ...
>> 
>> I think the only sensible solution is to make sure that the guest and
>> emulation mappings use the same memory type, either cached or
>> non-cached, and we 'simply' have to find the best way to implement this.
>> 
>> As Drew suggested, forcing some S2 mappings to be non-cacheable is the
>> one way.
>> 
>> The other way is to use something like what you once wrote that rewrites
>> stage-1 mappings to be cacheable, does that apply here ?
>> 
>> Do we have a clear picture of why we'd prefer one way over the other?
> 
> So first of all, let me reiterate that I could only find a single
> instance in QEMU where a PCI MMIO region is backed by host memory,
> which is vga-pci.c. I wonder of there are any other occurrences, but
> if there aren't any, it makes much more sense to prohibit PCI BARs
> backed by host memory rather than spend a lot of effort working around
> it.
 
 Right, ok.  So Marc's point during his KVM Forum talk was basically,
 don't use the legacy VGA adapter on ARM and use virtio graphics, right?
>>> 
>>> Yes. But nothing is preventing you currently from using that, and I
>>> think we should prefer crappy performance but correct operation over
>>> the current situation. So in general, we should either disallow PCI
>>> BARs backed by host memory, or emulate them, but never back them by a
>>> RAM memslot when running under ARM/KVM.
>> 
>> agreed, I just think that emulating accesses by trapping them is not
>> just slow, it's not really possible in practice and even if it is, it's
>> probably *unusably* slow.
> 
> Well, it would probably involve a lot of effort to implement emulation
> of instructions with multiple output registers, such as ldp/stp and
> register writeback. And indeed, trapping on 

Re: issues with emulated PCI MMIO backed by host memory under KVM

2016-06-27 Thread Alexander Graf


> Am 27.06.2016 um 12:34 schrieb Christoffer Dall :
> 
>> On Mon, Jun 27, 2016 at 11:47:18AM +0200, Ard Biesheuvel wrote:
>>> On 27 June 2016 at 11:16, Christoffer Dall  
>>> wrote:
>>> Hi,
>>> 
>>> I'm going to ask some stupid questions here...
>>> 
 On Fri, Jun 24, 2016 at 04:04:45PM +0200, Ard Biesheuvel wrote:
 Hi all,
 
 This old subject came up again in a discussion related to PCIe support
 for QEMU/KVM under Tianocore. The fact that we need to map PCI MMIO
 regions as cacheable is preventing us from reusing a significant slice
 of the PCIe support infrastructure, and so I'd like to bring this up
 again, perhaps just to reiterate why we're simply out of luck.
 
 To refresh your memories, the issue is that on ARM, PCI MMIO regions
 for emulated devices may be backed by memory that is mapped cacheable
 by the host. Note that this has nothing to do with the device being
 DMA coherent or not: in this case, we are dealing with regions that
 are not memory from the POV of the guest, and it is reasonable for the
 guest to assume that accesses to such a region are not visible to the
 device before they hit the actual PCI MMIO window and are translated
 into cycles on the PCI bus.
>>> 
>>> For the sake of completeness, why is this reasonable?
>> 
>> Because the whole point of accessing these regions is to communicate
>> with the device. It is common to use write combining mappings for
>> things like framebuffers to group writes before they hit the PCI bus,
>> but any caching just makes it more difficult for the driver state and
>> device state to remain synchronized.
>> 
>>> Is this how any real ARM system implementing PCI would actually work?
>> 
>> Yes.
>> 
 That means that mapping such a region
 cacheable is a strange thing to do, in fact, and it is unlikely that
 patches implementing this against the generic PCI stack in Tianocore
 will be accepted by the maintainers.
 
 Note that this issue not only affects framebuffers on PCI cards, it
 also affects emulated USB host controllers (perhaps Alex can remind us
 which one exactly?) and likely other emulated generic PCI devices as
 well.
 
 Since the issue exists only for emulated PCI devices whose MMIO
 regions are backed by host memory, is there any way we can already
 distinguish such memslots from ordinary ones? If we can, is there
 anything we could do to treat these specially? Perhaps something like
 using read-only memslots so we can at least trap guest writes instead
 of having main memory going out of sync with the caches unnoticed? I
 am just brainstorming here ...
>>> 
>>> I think the only sensible solution is to make sure that the guest and
>>> emulation mappings use the same memory type, either cached or
>>> non-cached, and we 'simply' have to find the best way to implement this.
>>> 
>>> As Drew suggested, forcing some S2 mappings to be non-cacheable is the
>>> one way.
>>> 
>>> The other way is to use something like what you once wrote that rewrites
>>> stage-1 mappings to be cacheable, does that apply here ?
>>> 
>>> Do we have a clear picture of why we'd prefer one way over the other?
>> 
>> So first of all, let me reiterate that I could only find a single
>> instance in QEMU where a PCI MMIO region is backed by host memory,
>> which is vga-pci.c. I wonder of there are any other occurrences, but
>> if there aren't any, it makes much more sense to prohibit PCI BARs
>> backed by host memory rather than spend a lot of effort working around
>> it.
> 
> Right, ok.  So Marc's point during his KVM Forum talk was basically,
> don't use the legacy VGA adapter on ARM and use virtio graphics, right?
> 
> What is the proposed solution for someone shipping an ARM server and
> wishing to provide a graphical output for that server?

Well, there is at least one server that I know of that has PCI VGA built in ;).

I think he was more concerned about VMs rather than real hardware.

> 
> It feels strange to work around supporting PCI VGA adapters in ARM VMs,
> if that's not a supported real hardware case.  However, I don't see what
> would prevent someone from plugging a VGA adapter into the PCI slot on
> an ARM server, and people selling ARM servers probably want this to
> happen, I'm guessing.
> 
>> 
>> If we do decide to fix this, the best way would be to use uncached
>> attributes for the QEMU userland mapping, and force it uncached in the
>> guest via a stage 2 override (as Drews suggests). The only problem I
>> see here is that the host's kernel direct mapping has a cached alias
>> that we need to get rid of.
> 
> Do we have a way to accomplish that?
> 
> Will we run into a bunch of other problems if we begin punching holes in
> the direct mapping for regular RAM?

Yeah, and how do you deal with aliases on that memory? You'd also need to stop 
ksm to run on it 

Re: issues with emulated PCI MMIO backed by host memory under KVM

2016-06-25 Thread Alexander Graf


> Am 24.06.2016 um 16:04 schrieb Ard Biesheuvel :
> 
> Hi all,
> 
> This old subject came up again in a discussion related to PCIe support
> for QEMU/KVM under Tianocore. The fact that we need to map PCI MMIO
> regions as cacheable is preventing us from reusing a significant slice
> of the PCIe support infrastructure, and so I'd like to bring this up
> again, perhaps just to reiterate why we're simply out of luck.
> 
> To refresh your memories, the issue is that on ARM, PCI MMIO regions
> for emulated devices may be backed by memory that is mapped cacheable
> by the host. Note that this has nothing to do with the device being
> DMA coherent or not: in this case, we are dealing with regions that
> are not memory from the POV of the guest, and it is reasonable for the
> guest to assume that accesses to such a region are not visible to the
> device before they hit the actual PCI MMIO window and are translated
> into cycles on the PCI bus. That means that mapping such a region
> cacheable is a strange thing to do, in fact, and it is unlikely that
> patches implementing this against the generic PCI stack in Tianocore
> will be accepted by the maintainers.
> 
> Note that this issue not only affects framebuffers on PCI cards, it
> also affects emulated USB host controllers (perhaps Alex can remind us
> which one exactly?) and likely other emulated generic PCI devices as
> well.
> 
> Since the issue exists only for emulated PCI devices whose MMIO
> regions are backed by host memory, is there any way we can already
> distinguish such memslots from ordinary ones? If we can, is there
> anything we could do to treat these specially? Perhaps something like
> using read-only memslots so we can at least trap guest writes instead
> of having main memory going out of sync with the caches unnoticed? I
> am just brainstorming here ...

The "easiest" first step would be to simply not map host memory into the guest 
when we're on arm. Unfortunately that would mean we trap on everything as mmio 
accesses, including user space access from Xorg for example. That in turn means 
we'd need to mmio emulate neon instructions and all other sorts of things that 
can trigger mmio exits without being emulated today.

Also, even with that working and maybe even coalesced mmio implemented, I'd 
guess it'd still be too slow for real world usage...


Alex


___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: issues with emulated PCI MMIO backed by host memory under KVM

2016-06-25 Thread Alexander Graf


> Am 24.06.2016 um 20:16 schrieb Ard Biesheuvel :
> 
>> On 24 June 2016 at 16:04, Ard Biesheuvel  wrote:
>> [...]
>> Note that this issue not only affects framebuffers on PCI cards, it
>> also affects emulated USB host controllers (perhaps Alex can remind us
>> which one exactly?)
> 
> Actually, looking at the QEMU source code, I am not able to spot the
> USB hcd emulation code that backs a PCI MMIO BAR using host memory,
> and in fact, the only instance I *can* find is vga-pci.c
> 
> @Alex: could you please explain which exact issue with USB emulation
> is suspected to be caused by this?

IIRC Linux put thhe usb rings into guest memory and mapped them as NC inside 
the guest. So the host will see stale data from the cache.


Alex


___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v5 1/2] arm/arm64: KVM: Detect vGIC presence at runtime

2016-04-21 Thread Alexander Graf


On 22.04.16 00:04, Peter Maydell wrote:
> On 21 April 2016 at 22:41, Alexander Graf <ag...@suse.de> wrote:
>> So effectively all we'd need is to set CNTHCTL_EL2.EL1PCEN to 0 for
>> guests that have no in-kernel irqchip, no? We should then trap on all
>> timer accesses and be able to emulate them in user space where we can
>> inject IRQs into an emulated GIC again.
> 
> You'd still need to define a new userspace ABI for "we stopped
> for a system register trap" and the userspace code to emulate
> the timers as part of KVM rather than as part of TCG, which seems
> like a lot of effort for a mode that you really don't want to be
> using...

It might make sense to have a generic "system register couldn't get
handled" exit code anyway. If nothing else, at least to display
unhandled registers in the qemu context where they appear rather than in
the kernel log (which a guest could deliberately clutter as it stands
today).

With such an interface in place, the rest would only be a few lines of glue.

> For GICv3 it gets trickier again because the interface between
> the interrupt controller and the CPU is no longer as simple
> as "an FIQ line and an IRQ line". (In particular the interrupt
> controller needs to know the CPU's current exception level and
> security state to know if it should raise IRQ or FIQ, which means
> that it needs to be told every time the CPU changes EL. I haven't
> yet figured out if I should model this in the QEMU emulated GICv3
> by having a backdoor callback function for this or by biting
> the bullet and putting the GICv3 cpu interface really in the
> CPU with a properly modelled equivalent of the stream protocol...)

We moved the lapic into target-i386 as well, no? Given that it really is
very close to the cpu these days it might not be a bad idea.


Alex
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH] KVM: arm/arm64: Handle forward time correction gracefully

2016-04-06 Thread Alexander Graf

On 04/06/2016 10:37 AM, Marc Zyngier wrote:

On a host that runs NTP, corrections can have a direct impact on
the background timer that we program on the behalf of a vcpu.

In particular, NTP performing a forward correction will result in
a timer expiring sooner than expected from a guest point of view.
Not a big deal, we kick the vcpu anyway.

But on wake-up, the vcpu thread is going to perform a check to
find out whether or not it should block. And at that point, the
timer check is going to say "timer has not expired yet, go back
to sleep". This results in the timer event being lost forever.

There are multiple ways to handle this. One would be record that
the timer has expired and let kvm_cpu_has_pending_timer return
true in that case, but that would be fairly invasive. Another is
to check for the "short sleep" condition in the hrtimer callback,
and restart the timer for the remaining time when the condition
is detected.

This patch implements the latter, with a bit of refactoring in
order to avoid too much code duplication.

Reported-by: Alexander Graf <ag...@suse.de>
Signed-off-by: Marc Zyngier <marc.zyng...@arm.com>


Oh - before I forget. This should go out with CC stable :)


Alex

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH] KVM: arm/arm64: Handle forward time correction gracefully

2016-04-06 Thread Alexander Graf

On 04/06/2016 10:37 AM, Marc Zyngier wrote:

On a host that runs NTP, corrections can have a direct impact on
the background timer that we program on the behalf of a vcpu.

In particular, NTP performing a forward correction will result in
a timer expiring sooner than expected from a guest point of view.
Not a big deal, we kick the vcpu anyway.

But on wake-up, the vcpu thread is going to perform a check to
find out whether or not it should block. And at that point, the
timer check is going to say "timer has not expired yet, go back
to sleep". This results in the timer event being lost forever.

There are multiple ways to handle this. One would be record that
the timer has expired and let kvm_cpu_has_pending_timer return
true in that case, but that would be fairly invasive. Another is
to check for the "short sleep" condition in the hrtimer callback,
and restart the timer for the remaining time when the condition
is detected.

This patch implements the latter, with a bit of refactoring in
order to avoid too much code duplication.

Reported-by: Alexander Graf <ag...@suse.de>
Signed-off-by: Marc Zyngier <marc.zyng...@arm.com>


Looks good, I can give it a test run later as well, but for now

Reviewed-by: Alexander Graf <ag...@suse.de>


Alex

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: the arm cache coherency cluster v2

2015-05-03 Thread Alexander Graf


On 18.03.15 20:18, Andrew Jones wrote:
 On Wed, Mar 18, 2015 at 03:08:20PM -0400, Andrew Jones wrote:
 In reply to this message I'll send two series' one for KVM and
 one for QEMU. The two series' are their respective component
 complements, and attempt to implement cache coherency for arm
 guests using emulated devices, where the emulator (qemu) uses
 cached memory for the device memory, but the guest uses
 uncached - as device memory is generally used. Right now I've
 just focused on VGA vram.

 This approach is the MADV_UNCACHED type that Paolo suggested.
 This type of approach could also be described as make userspace's
 memory access type match the expected access type of the guest,
 and Mario has suggested using a memory driver, which could have
 the same result.

 The coming series' is inspired by both Paolo's and Mario's
 suggestions, but it uses a kvm memslot flag, rather than an
 madvise flag, and thus for the memory driver, it's just KVM.

 See the thread

 https://lists.gnu.org/archive/html/qemu-devel/2015-03/msg01254.html

 for some more background.

 Thanks in advance for comments.

 drew
 
 I forgot to mention that I've done some light testing with this.
 It seems to work, and without (to eye) noticeable performance
 degradation.

Just for the record, I couldn't get it to work :). But I'm looking
forward to the next version with MMU notifiers!


Alex
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm