Re: [PATCH v8 11/15] ARM: KVM: World-switch implementation

2012-07-02 Thread Avi Kivity
On 06/21/2012 08:54 PM, Christoffer Dall wrote:
 @@ -504,6 +514,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu
 *vcpu, struct kvm_run *run)
*/
   preempt_disable();
   local_irq_disable();
 +
 + if (check_new_vmid_gen(kvm)) {
 + local_irq_enable();
 + preempt_enable();
 + continue;
 + }
 +

 I see the same race with signals.  Your signal_pending() check needs to
 be after the local_irq_disable(), otherwise we can enter a guest with a
 pending signal.

 
 that's not functionally incorrect though is it? It may simply increase
 the latency for the signal delivery as far as I can see, but I
 definitely don't mind changing this path in any case.

Nothing guarantees that there will be a next exit.  I think we still run
the timer tick on guest entry, so we'll exit after a few milliseconds,
but there are patches to disable the timer tick if only one task is queued.

-- 
error compiling committee.c: too many arguments to function




Re: [PATCH v8 11/15] ARM: KVM: World-switch implementation

2012-06-21 Thread Avi Kivity
On 06/20/2012 07:40 AM, Christoffer Dall wrote:
 

 cpu0                       cpu1
 (vmid=0, gen=1)            (gen=0)
 -- --
 gen == global_gen, return

                            gen != global_gen
                            increment global_gen
                            smp_call_function
 reset_vm_context
                            vmid=0

 enter with vmid=0          enter with vmid=0

 I can't see how the scenario above can happen. First, no-one can run
 with vmid 0 - it is reserved for the host. If we bump global_gen on
 cpuN, then since we do smp_call_function(x, x, wait=1) we are now sure
 that after this call, all cpus(N-1) potentially being inside a VM will
 have exited, called reset_vm_context, but before they can re-enter
 into the guest, they will call update_vttbr, and if their generation
 counter differs from global_gen, they will try to grab that spinlock
 and everything should happen in order.

 
 the whole vmid=0 confused me a bit. The point is since we moved the
 generation check outside the spin_lock we have to re-check, I see:

In fact I think the problem occurred with the original code as well.  The
problem is that the check is not atomic wrt guest entry, so

  spin_lock
  check/update
  spin_unlock

  enter guest

has a hole between spin_unlock and guest entry.
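To make the window concrete (illustrative sketch only, using the names from
the patch):

	spin_lock(&kvm_vmid_lock);
	/* kvm->arch.vmid / kvm->arch.vmid_gen brought up to date */
	spin_unlock(&kvm_vmid_lock);

	/* <-- another CPU can bump kvm_vmid_gen right here --> */

	local_irq_disable();
	/* guest entry proceeds with a stale VMID unless we recheck */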

 
  /**
 + * check_new_vmid_gen - check that the VMID is still valid
 + * @kvm: The VM's VMID to check
 + *
 + * return true if there is a new generation of VMIDs being used
 + *
 + * The hardware supports only 256 values with the value zero reserved for the
 + * host, so we check if an assigned value belongs to a previous generation,
 + * which requires us to assign a new value. If we're the first to use a
 + * VMID for the new generation, we must flush necessary caches and TLBs on all
 + * CPUs.
 + */
 +static bool check_new_vmid_gen(struct kvm *kvm)
 +{
 + if (likely(kvm->arch.vmid_gen == atomic64_read(&kvm_vmid_gen)))
 + return;
 +}
 +
 +/**
   * update_vttbr - Update the VTTBR with a valid VMID before the guest runs
   * @kvm  The guest that we are about to run
   *
 @@ -324,15 +342,7 @@ static void update_vttbr(struct kvm *kvm)
  {
   phys_addr_t pgd_phys;
 
 - /*
 -  *  Check that the VMID is still valid.
 -  *  (The hardware supports only 256 values with the value zero
 -  *   reserved for the host, so we check if an assigned value belongs to
 -  *   a previous generation, which requires us to assign a new
 -  *   value. If we're the first to use a VMID for the new generation,
 -  *   we must flush necessary caches and TLBs on all CPUs.)
 -  */
 - if (likely(kvm->arch.vmid_gen == atomic64_read(&kvm_vmid_gen)))
 + if (!check_new_vmid_gen(kvm))
   return;
 
   spin_lock(&kvm_vmid_lock);
 @@ -504,6 +514,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu
 *vcpu, struct kvm_run *run)
*/
   preempt_disable();
   local_irq_disable();
 +
 + if (check_new_vmid_gen(kvm)) {
 + local_irq_enable();
 + preempt_enable();
 + continue;
 + }
 +

I see the same race with signals.  Your signal_pending() check needs to
be after the local_irq_disable(), otherwise we can enter a guest with a
pending signal.

Better to place the signal check before the vmid_gen check, to avoid the
possibility of a signal being held up by other guests.
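I.e., roughly (a sketch of the suggested ordering based on the quoted diff,
with the unwind paths abbreviated):

	preempt_disable();
	local_irq_disable();

	/* signals first, so a VMID rollover on another VM can't delay them */
	if (signal_pending(current)) {
		ret = -EINTR;
		run->exit_reason = KVM_EXIT_INTR;
		local_irq_enable();
		preempt_enable();
		break;			/* leave the run loop */
	}

	if (check_new_vmid_gen(kvm)) {
		local_irq_enable();
		preempt_enable();
		continue;		/* retry update_vttbr() */
	}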

-- 
error compiling committee.c: too many arguments to function




Re: [PATCH v8 11/15] ARM: KVM: World-switch implementation

2012-06-21 Thread Christoffer Dall
On Thu, Jun 21, 2012 at 4:13 AM, Avi Kivity a...@redhat.com wrote:
 On 06/20/2012 07:40 AM, Christoffer Dall wrote:


 cpu0                       cpu1
 (vmid=0, gen=1)            (gen=0)
 -- --
 gen == global_gen, return

                           gen != global_gen
                           increment global_gen
                           smp_call_function
 reset_vm_context
                           vmid=0

 enter with vmid=0          enter with vmid=0

 I can't see how the scenario above can happen. First, no-one can run
 with vmid 0 - it is reserved for the host. If we bump global_gen on
 cpuN, then since we do smp_call_function(x, x, wait=1) we are now sure
 that after this call, all cpus(N-1) potentially being inside a VM will
 have exited, called reset_vm_context, but before they can re-enter
 into the guest, they will call update_vttbr, and if their generation
 counter differs from global_gen, they will try to grab that spinlock
 and everything should happen in order.


 the whole vmid=0 confused me a bit. The point is since we moved the
 generation check outside the spin_lock we have to re-check, I see:

 In fact I think the problem occurred with the original code as well.  The
 problem is that the check is not atomic wrt guest entry, so

  spin_lock
  check/update
  spin_unlock

  enter guest

 has a hole between spin_unlock and guest entry.


you are absolutely right.


  /**
 + * check_new_vmid_gen - check that the VMID is still valid
 + * @kvm: The VM's VMID to check
 + *
 + * return true if there is a new generation of VMIDs being used
 + *
 + * The hardware supports only 256 values with the value zero reserved for the
 + * host, so we check if an assigned value belongs to a previous generation,
 + * which requires us to assign a new value. If we're the first to use a
 + * VMID for the new generation, we must flush necessary caches and TLBs on all
 + * CPUs.
 + */
 +static bool check_new_vmid_gen(struct kvm *kvm)
 +{
 +     if (likely(kvm->arch.vmid_gen == atomic64_read(&kvm_vmid_gen)))
 +             return;
 +}
 +
 +/**
   * update_vttbr - Update the VTTBR with a valid VMID before the guest runs
   * @kvm      The guest that we are about to run
   *
 @@ -324,15 +342,7 @@ static void update_vttbr(struct kvm *kvm)
  {
       phys_addr_t pgd_phys;

 -     /*
 -      *  Check that the VMID is still valid.
 -      *  (The hardware supports only 256 values with the value zero
 -      *   reserved for the host, so we check if an assigned value belongs to
 -      *   a previous generation, which requires us to assign a new
 -      *   value. If we're the first to use a VMID for the new generation,
 -      *   we must flush necessary caches and TLBs on all CPUs.)
 -      */
 -     if (likely(kvm->arch.vmid_gen == atomic64_read(&kvm_vmid_gen)))
 +     if (!check_new_vmid_gen(kvm))
               return;

        spin_lock(&kvm_vmid_lock);
 @@ -504,6 +514,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu
 *vcpu, struct kvm_run *run)
                */
               preempt_disable();
               local_irq_disable();
 +
 +             if (check_new_vmid_gen(kvm)) {
 +                     local_irq_enable();
 +                     preempt_enable();
 +                     continue;
 +             }
 +

 I see the same race with signals.  Your signal_pending() check needs to
 be after the local_irq_disable(), otherwise we can enter a guest with a
 pending signal.


that's not functionally incorrect though is it? It may simply increase
the latency for the signal delivery as far as I can see, but I
definitely don't mind changing this path in any case.


 Better to place the signal check before the vmid_gen check, to avoid the
 possibility of a signal being held up by other guests.


nice ;)


Re: [PATCH v8 11/15] ARM: KVM: World-switch implementation

2012-06-19 Thread Avi Kivity
On 06/19/2012 01:05 AM, Christoffer Dall wrote:
 Premature, but this is sad.  I suggest you split vmid generation from
 next available vmid.  This allows you to make the generation counter
 atomic so it may be read outside the lock.

 You can do

if (likely(kvm->arch.vmid_gen == atomic_read(&kvm_vmid_gen)))
   return;

spin_lock(...

 
 I knew you were going to say something here :), please take a look at
 this and see if you agree:

It looks reasonable wrt locking.

 +
 + /* First user of a new VMID generation? */
 + if (unlikely(kvm_next_vmid == 0)) {
 + atomic64_inc(&kvm_vmid_gen);
 + kvm_next_vmid = 1;
 +
 + /* This does nothing on UP */
 + smp_call_function(reset_vm_context, NULL, 1);

Does this imply a memory barrier?  If not, smp_mb__after_atomic_inc().

 +
 + /*
 +  * On SMP we know no other CPUs can use this CPU's or
 +  * each other's VMID since the kvm_vmid_lock blocks
 +  * them from reentry to the guest.
 +  */
 +
 + reset_vm_context(NULL);

These two lines can be folded as on_each_cpu().
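That is, roughly (illustrative only, not from the patch):

	/* run reset_vm_context on every CPU, including this one, and wait */
	on_each_cpu(reset_vm_context, NULL, 1);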

Don't you have a race here where you can increment the generation just
before guest entry?

cpu0                       cpu1
(vmid=0, gen=1)            (gen=0)
-- --
gen == global_gen, return

                           gen != global_gen
                           increment global_gen
                           smp_call_function
reset_vm_context
                           vmid=0

enter with vmid=0          enter with vmid=0

You must recheck gen after disabling interrupts to ensure global_gen
didn't bump after update_vttbr but before entry.  x86 has a lot of this,
see vcpu_enter_guest() and vcpu->requests (doesn't apply directly to
your case but may come in useful later).
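The shape of the recheck, as a simplified sketch (check_new_vmid_gen() is the
helper name used in the follow-up diff later in this thread):

	while (run->exit_reason == KVM_EXIT_UNKNOWN) {
		update_vttbr(vcpu->kvm);

		preempt_disable();
		local_irq_disable();

		/*
		 * Recheck the generation with interrupts disabled, so it
		 * cannot change between update_vttbr() and guest entry
		 * without being noticed here.
		 */
		if (check_new_vmid_gen(vcpu->kvm)) {
			local_irq_enable();
			preempt_enable();
			continue;
		}

		/* ... world switch into the guest ... */
	}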

 
 +
 +/**
 + * kvm_arch_vcpu_ioctl_run - the main VCPU run function to execute guest code
 + * @vcpu:    The VCPU pointer
 + * @run:     The kvm_run structure pointer used for userspace state exchange
 + *
 + * This function is called through the VCPU_RUN ioctl called from user space. It
 + * will execute VM code in a loop until the time slice for the process is used
 + * or some emulation is needed from user space in which case the function will
 + * return with return value 0 and with the kvm_run structure filled in with the
 + * required data for the requested emulation.
 + */
  int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
  {
 - return -EINVAL;
 + int ret = 0;
 + sigset_t sigsaved;
 +
 + if (vcpu->sigset_active)
 + sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
 +
 + run->exit_reason = KVM_EXIT_UNKNOWN;
 + while (run->exit_reason == KVM_EXIT_UNKNOWN) {

 It's not a good idea to read stuff from run unless it's part of the ABI,
 since userspace can play with it while you're reading it.  It's harmless
 here but in other places it can lead to a vulnerability.

 
 ok, so in this case, userspace can 'suppress' an MMIO or interrupt
 window operation.
 
 I can change to keep some local variable to maintain the state and
 have some API convention for emulation functions, if you feel strongly
 about it, but otherwise it feels to me like the code will be more
 complicated. Do you have other ideas?

x86 uses:

 0 - return to userspace (run prepared)
 1 - return to guest (run untouched)
 -ESOMETHING - return to userspace

as return values from handlers and for locals (usually named 'r').
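For illustration, a hypothetical handler following that convention (the helper
name is made up, not from the patch):

static int handle_example_exit(struct kvm_vcpu *vcpu, struct kvm_run *run)
{
	if (can_emulate_in_kernel(vcpu)) {	/* assumed helper */
		/* handled here: return to guest, run untouched */
		return 1;
	}
	/* needs userspace: prepare run and return to it */
	run->exit_reason = KVM_EXIT_MMIO;
	return 0;
}

	/* caller loop */
	r = handle_example_exit(vcpu, run);
	if (r <= 0)		/* 0 or -ESOMETHING: exit to userspace */
		break;
	/* r == 1: loop and re-enter the guest */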


-- 
error compiling committee.c: too many arguments to function




Re: [PATCH v8 11/15] ARM: KVM: World-switch implementation

2012-06-19 Thread Christoffer Dall
On Tue, Jun 19, 2012 at 5:16 AM, Avi Kivity a...@redhat.com wrote:
 On 06/19/2012 01:05 AM, Christoffer Dall wrote:
 Premature, but this is sad.  I suggest you split vmid generation from
 next available vmid.  This allows you to make the generation counter
 atomic so it may be read outside the lock.

 You can do

    if (likely(kvm->arch.vmid_gen == atomic_read(&kvm_vmid_gen)))
           return;

    spin_lock(...


 I knew you were going to say something here :), please take a look at
 this and see if you agree:

 It looks reasonable wrt locking.

 +
 +     /* First user of a new VMID generation? */
 +     if (unlikely(kvm_next_vmid == 0)) {
 +             atomic64_inc(&kvm_vmid_gen);
 +             kvm_next_vmid = 1;
 +
 +             /* This does nothing on UP */
 +             smp_call_function(reset_vm_context, NULL, 1);

 Does this imply a memory barrier?  If not, smp_mb__after_atomic_inc().


yes, it implies a memory barrier.

 +
 +             /*
 +              * On SMP we know no other CPUs can use this CPU's or
 +              * each other's VMID since the kvm_vmid_lock blocks
 +              * them from reentry to the guest.
 +              */
 +
 +             reset_vm_context(NULL);

 These two lines can be folded as on_each_cpu().

 Don't you have a race here where you can increment the generation just
 before guest entry?

I don't think I do.


 cpu0                       cpu1
 (vmid=0, gen=1)            (gen=0)
 -- --
 gen == global_gen, return

                           gen != global_gen
                           increment global_gen
                           smp_call_function
 reset_vm_context
                           vmid=0

 enter with vmid=0          enter with vmid=0

I can't see how the scenario above can happen. First, no-one can run
with vmid 0 - it is reserved for the host. If we bump global_gen on
cpuN, then since we do smp_call_function(x, x, wait=1) we are now sure
that after this call, all cpus(N-1) potentially being inside a VM will
have exited, called reset_vm_context, but before they can re-enter
into the guest, they will call update_vttbr, and if their generation
counter differs from global_gen, they will try to grab that spinlock
and everything should happen in order.


 You must recheck gen after disabling interrupts to ensure global_gen
 didn't bump after update_vttbr but before entry.  x86 has a lot of this,
 see vcpu_enter_guest() and vcpu->requests (doesn't apply directly to
 your case but may come in useful later).


 +
 +/**
 + * kvm_arch_vcpu_ioctl_run - the main VCPU run function to execute guest code
 + * @vcpu:    The VCPU pointer
 + * @run:     The kvm_run structure pointer used for userspace state exchange
 + *
 + * This function is called through the VCPU_RUN ioctl called from user space. It
 + * will execute VM code in a loop until the time slice for the process is used
 + * or some emulation is needed from user space in which case the function will
 + * return with return value 0 and with the kvm_run structure filled in with the
 + * required data for the requested emulation.
 + */
  int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
  {
 -     return -EINVAL;
 +     int ret = 0;
 +     sigset_t sigsaved;
 +
 +     if (vcpu->sigset_active)
 +             sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
 +
 +     run->exit_reason = KVM_EXIT_UNKNOWN;
 +     while (run->exit_reason == KVM_EXIT_UNKNOWN) {

 It's not a good idea to read stuff from run unless it's part of the ABI,
 since userspace can play with it while you're reading it.  It's harmless
 here but in other places it can lead to a vulnerability.


 ok, so in this case, userspace can 'suppress' an MMIO or interrupt
 window operation.

 I can change to keep some local variable to maintain the state and
 have some API convention for emulation functions, if you feel strongly
 about it, but otherwise it feels to me like the code will be more
 complicated. Do you have other ideas?

 x86 uses:

  0 - return to userspace (run prepared)
  1 - return to guest (run untouched)
  -ESOMETHING - return to userspace


yeah, we can do that I guess, that's fair.

 as return values from handlers and for locals (usually named 'r').


 --
 error compiling committee.c: too many arguments to function




Re: [PATCH v8 11/15] ARM: KVM: World-switch implementation

2012-06-19 Thread Christoffer Dall
On Tue, Jun 19, 2012 at 11:27 PM, Christoffer Dall
c.d...@virtualopensystems.com wrote:
 On Tue, Jun 19, 2012 at 5:16 AM, Avi Kivity a...@redhat.com wrote:
 On 06/19/2012 01:05 AM, Christoffer Dall wrote:
 Premature, but this is sad.  I suggest you split vmid generation from
 next available vmid.  This allows you to make the generation counter
 atomic so it may be read outside the lock.

 You can do

    if (likely(kvm->arch.vmid_gen == atomic_read(&kvm_vmid_gen)))
           return;

    spin_lock(...


 I knew you were going to say something here :), please take a look at
 this and see if you agree:

 It looks reasonable wrt locking.

 +
 +     /* First user of a new VMID generation? */
 +     if (unlikely(kvm_next_vmid == 0)) {
 +             atomic64_inc(&kvm_vmid_gen);
 +             kvm_next_vmid = 1;
 +
 +             /* This does nothing on UP */
 +             smp_call_function(reset_vm_context, NULL, 1);

 Does this imply a memory barrier?  If not, smp_mb__after_atomic_inc().


 yes, it implies a memory barrier.

 +
 +             /*
 +              * On SMP we know no other CPUs can use this CPU's or
 +              * each other's VMID since the kvm_vmid_lock blocks
 +              * them from reentry to the guest.
 +              */
 +
 +             reset_vm_context(NULL);

 These two lines can be folded as on_each_cpu().

 Don't you have a race here where you can increment the generation just
 before guest entry?

 I don't think I do.


uh, strike that, I do.


 cpu0                       cpu1
 (vmid=0, gen=1)            (gen=0)
 -- --
 gen == global_gen, return

                           gen != global_gen
                           increment global_gen
                           smp_call_function
 reset_vm_context
                           vmid=0

 enter with vmid=0          enter with vmid=0

 I can't see how the scenario above can happen. First, no-one can run
 with vmid 0 - it is reserved for the host. If we bump global_gen on
 cpuN, then since we do smp_call_function(x, x, wait=1) we are now sure
 that after this call, all cpus(N-1) potentially being inside a VM will
 have exited, called reset_vm_context, but before they can re-enter
 into the guest, they will call update_vttbr, and if their generation
 counter differs from global_gen, they will try to grab that spinlock
 and everything should happen in order.


the whole vmid=0 confused me a bit. The point is since we moved the
generation check outside the spin_lock we have to re-check, I see:

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 19fe3b0..74760af 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -313,6 +313,24 @@ static void reset_vm_context(void *info)
 }

 /**
+ * check_new_vmid_gen - check that the VMID is still valid
+ * @kvm: The VM's VMID to check
+ *
+ * return true if there is a new generation of VMIDs being used
+ *
+ * The hardware supports only 256 values with the value zero reserved for the
+ * host, so we check if an assigned value belongs to a previous generation,
+ * which requires us to assign a new value. If we're the first to use a
+ * VMID for the new generation, we must flush necessary caches and TLBs on all
+ * CPUs.
+ */
+static bool check_new_vmid_gen(struct kvm *kvm)
+{
+   if (likely(kvm->arch.vmid_gen == atomic64_read(&kvm_vmid_gen)))
+   return;
+}
+
+/**
  * update_vttbr - Update the VTTBR with a valid VMID before the guest runs
  * @kvm The guest that we are about to run
  *
@@ -324,15 +342,7 @@ static void update_vttbr(struct kvm *kvm)
 {
phys_addr_t pgd_phys;

-   /*
-*  Check that the VMID is still valid.
-*  (The hardware supports only 256 values with the value zero
-*   reserved for the host, so we check if an assigned value belongs to
-*   a previous generation, which requires us to assign a new
-*   value. If we're the first to use a VMID for the new generation,
-*   we must flush necessary caches and TLBs on all CPUs.)
-*/
-   if (likely(kvm->arch.vmid_gen == atomic64_read(&kvm_vmid_gen)))
+   if (!check_new_vmid_gen(kvm))
return;

spin_lock(&kvm_vmid_lock);
@@ -504,6 +514,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu
*vcpu, struct kvm_run *run)
 */
preempt_disable();
local_irq_disable();
+
+   if (check_new_vmid_gen(kvm)) {
+   local_irq_enable();
+   preempt_enable();
+   continue;
+   }
+
kvm_guest_enter();
vcpu->mode = IN_GUEST_MODE;
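(Side note: as pasted, check_new_vmid_gen() is declared bool but falls through
without returning a value; a corrected sketch of what it presumably wants to be:)

static bool check_new_vmid_gen(struct kvm *kvm)
{
	/* true when the global generation has moved past this VM's VMID */
	return unlikely(kvm->arch.vmid_gen != atomic64_read(&kvm_vmid_gen));
}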


 You must recheck gen after disabling interrupts to ensure global_gen
 didn't bump after update_vttbr but before entry.  x86 has a lot of this,
 see vcpu_enter_guest() and vcpu->requests (doesn't apply directly to
 your case but may come in useful later).


 +
 +/**
 + * kvm_arch_vcpu_ioctl_run - 

Re: [PATCH v8 11/15] ARM: KVM: World-switch implementation

2012-06-18 Thread Avi Kivity
On 06/15/2012 10:08 PM, Christoffer Dall wrote:
 Provides complete world-switch implementation to switch to other guests
 running in non-secure modes. Includes Hyp exception handlers that
 capture necessary exception information and stores the information on
 the VCPU and KVM structures.
 
 Switching to Hyp mode is done through a simple HVC instruction. The
 exception vector code will check that the HVC comes from VMID==0 and if
 so will store the necessary state on the Hyp stack, which will look like
 this (growing downwards, see the hyp_hvc handler):
   ...
   Hyp_Sp + 4: spsr (Host-SVC cpsr)
   Hyp_Sp: lr_usr
 
 When returning from Hyp mode to SVC mode, another HVC instruction is
 executed from Hyp mode, which is taken in the Hyp_Svc handler. The Hyp
 stack pointer should be where it was left from the above initial call,
 since the values on the stack will be used to restore state (see
 hyp_svc).
 
 Otherwise, the world-switch is pretty straight-forward. All state that
 can be modified by the guest is first backed up on the Hyp stack and the
 VCPU values are loaded onto the hardware. State, which is not loaded, but
 theoretically modifiable by the guest is protected through the
 virtualization features to generate a trap and cause software emulation.
 Upon guest returns, all state is restored from hardware onto the VCPU
 struct and the original state is restored from the Hyp-stack onto the
 hardware.
 
 One controversy may be the back-door call to __irq_svc (the host
 kernel's own physical IRQ handler) which is called when a physical IRQ
 exception is taken in Hyp mode while running in the guest.
 
 SMP support using the VMPIDR calculated on the basis of the host MPIDR
 and overriding the low bits with KVM vcpu_id contributed by Marc Zyngier.
 
 Reuse of VMIDs has been implemented by Antonios Motakis and adapted from
 a separate patch into the appropriate patches introducing the
 functionality. Note that the VMIDs are stored per VM as required by the ARM
 architecture reference manual.
 
 +
 +/**
 + * update_vttbr - Update the VTTBR with a valid VMID before the guest runs
 + * @kvm  The guest that we are about to run
 + *
 + * Called from kvm_arch_vcpu_ioctl_run before entering the guest to ensure the
 + * VM has a valid VMID, otherwise assigns a new one and flushes corresponding
 + * caches and TLBs.
 + */
 +static void update_vttbr(struct kvm *kvm)
 +{
 + phys_addr_t pgd_phys;
 +
 + spin_lock(&kvm_vmid_lock);

Premature, but this is sad.  I suggest you split vmid generation from
next available vmid.  This allows you to make the generation counter
atomic so it may be read outside the lock.

You can do

if (likely(kvm->arch.vmid_gen == atomic_read(&kvm_vmid_gen)))
   return;

spin_lock(...

 +
 + /*
 +  *  Check that the VMID is still valid.
 +  *  (The hardware supports only 256 values with the value zero
 +  *   reserved for the host, so we check if an assigned value has rolled
 +  *   over a sequence, which requires us to assign a new value and flush
 +  *   necessary caches and TLBs on all CPUs.)
 +  */
 + if (unlikely((kvm->arch.vmid ^ next_vmid) >> VMID_BITS)) {
 + /* Check for a new VMID generation */
 + if (unlikely((next_vmid & VMID_MASK) == 0)) {
 + /* Check for the (very unlikely) 64-bit wrap around */

Unlikely?  it's impossible.

 +
 +/**
 + * kvm_arch_vcpu_ioctl_run - the main VCPU run function to execute guest code
 + * @vcpu:    The VCPU pointer
 + * @run: The kvm_run structure pointer used for userspace state exchange
 + *
 + * This function is called through the VCPU_RUN ioctl called from user space. It
 + * will execute VM code in a loop until the time slice for the process is used
 + * or some emulation is needed from user space in which case the function will
 + * return with return value 0 and with the kvm_run structure filled in with the
 + * required data for the requested emulation.
 + */
  int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
  {
 - return -EINVAL;
 + int ret = 0;
 + sigset_t sigsaved;
 +
 + if (vcpu->sigset_active)
 + sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
 +
 + run->exit_reason = KVM_EXIT_UNKNOWN;
 + while (run->exit_reason == KVM_EXIT_UNKNOWN) {

It's not a good idea to read stuff from run unless it's part of the ABI,
since userspace can play with it while you're reading it.  It's harmless
here but in other places it can lead to a vulnerability.

 + /*
 +  * Check conditions before entering the guest
 +  */
 + if (need_resched())
 + kvm_resched(vcpu);

I think cond_resched() is all that's needed these days (kvm_resched()
predates preempt notifiers).
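I.e., roughly:

	/* instead of: if (need_resched()) kvm_resched(vcpu); */
	cond_resched();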

 +
 + if (signal_pending(current)) {
 + ret = -EINTR;
 + run->exit_reason = KVM_EXIT_INTR;
 + 

Re: [PATCH v8 11/15] ARM: KVM: World-switch implementation

2012-06-18 Thread Christoffer Dall
On Mon, Jun 18, 2012 at 9:41 AM, Avi Kivity a...@redhat.com wrote:
 On 06/15/2012 10:08 PM, Christoffer Dall wrote:
 Provides complete world-switch implementation to switch to other guests
 running in non-secure modes. Includes Hyp exception handlers that
 capture necessary exception information and stores the information on
 the VCPU and KVM structures.

 Switching to Hyp mode is done through a simple HVC instruction. The
 exception vector code will check that the HVC comes from VMID==0 and if
 so will store the necessary state on the Hyp stack, which will look like
 this (growing downwards, see the hyp_hvc handler):
   ...
   Hyp_Sp + 4: spsr (Host-SVC cpsr)
   Hyp_Sp    : lr_usr

 When returning from Hyp mode to SVC mode, another HVC instruction is
 executed from Hyp mode, which is taken in the Hyp_Svc handler. The Hyp
 stack pointer should be where it was left from the above initial call,
 since the values on the stack will be used to restore state (see
 hyp_svc).

 Otherwise, the world-switch is pretty straight-forward. All state that
 can be modified by the guest is first backed up on the Hyp stack and the
 VCPU values are loaded onto the hardware. State, which is not loaded, but
 theoretically modifiable by the guest is protected through the
 virtualization features to generate a trap and cause software emulation.
 Upon guest returns, all state is restored from hardware onto the VCPU
 struct and the original state is restored from the Hyp-stack onto the
 hardware.

 One controversy may be the back-door call to __irq_svc (the host
 kernel's own physical IRQ handler) which is called when a physical IRQ
 exception is taken in Hyp mode while running in the guest.

 SMP support using the VMPIDR calculated on the basis of the host MPIDR
 and overriding the low bits with KVM vcpu_id contributed by Marc Zyngier.

 Reuse of VMIDs has been implemented by Antonios Motakis and adapted from
 a separate patch into the appropriate patches introducing the
 functionality. Note that the VMIDs are stored per VM as required by the ARM
 architecture reference manual.

 +
 +/**
 + * update_vttbr - Update the VTTBR with a valid VMID before the guest runs
 + * @kvm      The guest that we are about to run
 + *
 + * Called from kvm_arch_vcpu_ioctl_run before entering the guest to ensure the
 + * VM has a valid VMID, otherwise assigns a new one and flushes corresponding
 + * caches and TLBs.
 + */
 +static void update_vttbr(struct kvm *kvm)
 +{
 +     phys_addr_t pgd_phys;
 +
 +     spin_lock(&kvm_vmid_lock);

 Premature, but this is sad.  I suggest you split vmid generation from
 next available vmid.  This allows you to make the generation counter
 atomic so it may be read outside the lock.

 You can do

    if (likely(kvm->arch.vmid_gen == atomic_read(&kvm_vmid_gen)))
           return;

    spin_lock(...


I knew you were going to say something here :), please take a look at
this and see if you agree:

+struct kvm_arch {
+   /* The VMID generation used for the virt. memory system */
+   u64 vmid_gen;
+   u32 vmid;
+
+   /* 1-level 2nd stage table and lock */
+   struct mutex pgd_mutex;
+   pgd_t *pgd;
+
+   /* VTTBR value associated with above pgd and vmid */
+   u64 vttbr;
+};

[snip]

+/* The VMID used in the VTTBR */
+static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1);
+static u8 kvm_next_vmid;
+DEFINE_SPINLOCK(kvm_vmid_lock);

[snip]

+/**
+ * update_vttbr - Update the VTTBR with a valid VMID before the guest runs
+ * @kvm The guest that we are about to run
+ *
+ * Called from kvm_arch_vcpu_ioctl_run before entering the guest to ensure the
+ * VM has a valid VMID, otherwise assigns a new one and flushes corresponding
+ * caches and TLBs.
+ */
+static void update_vttbr(struct kvm *kvm)
+{
+   phys_addr_t pgd_phys;
+
+   /*
+*  Check that the VMID is still valid.
+*  (The hardware supports only 256 values with the value zero
+*   reserved for the host, so we check if an assigned value belongs to
+*   a previous generation, which requires us to assign a new
+*   value. If we're the first to use a VMID for the new generation,
+*   we must flush necessary caches and TLBs on all CPUs.)
+*/
+   if (likely(kvm->arch.vmid_gen == atomic64_read(&kvm_vmid_gen)))
+   return;
+
+   spin_lock(&kvm_vmid_lock);
+
+   /* First user of a new VMID generation? */
+   if (unlikely(kvm_next_vmid == 0)) {
+   atomic64_inc(&kvm_vmid_gen);
+   kvm_next_vmid = 1;
+
+   /* This does nothing on UP */
+   smp_call_function(reset_vm_context, NULL, 1);
+
+   /*
+* On SMP we know no other CPUs can use this CPU's or
+* each other's VMID since the kvm_vmid_lock blocks
+* them from reentry to the guest.
+*/
+
+   reset_vm_context(NULL);
+   }
+
+   kvm->arch.vmid =