Re: [PATCH v8 11/15] ARM: KVM: World-switch implementation
On 06/21/2012 08:54 PM, Christoffer Dall wrote:
>>> @@ -504,6 +514,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>>  	 */
>>>  	preempt_disable();
>>>  	local_irq_disable();
>>> +
>>> +	if (check_new_vmid_gen(kvm)) {
>>> +		local_irq_enable();
>>> +		preempt_enable();
>>> +		continue;
>>> +	}
>>> +
>>
>> I see the same race with signals.  Your signal_pending() check needs to
>> be after the local_irq_disable(), otherwise we can enter a guest with a
>> pending signal.
>
> that's not functionally incorrect though, is it? It may simply increase
> the latency for the signal delivery as far as I can see, but I
> definitely don't mind changing this path in any case.

Nothing guarantees that there will be a next exit.  I think we still run
the timer tick on guest entry, so we'll exit after a few milliseconds,
but there are patches to disable the timer tick if only one task is
queued.

-- 
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 11/15] ARM: KVM: World-switch implementation
On 06/20/2012 07:40 AM, Christoffer Dall wrote:
>>   cpu0                      cpu1
>>   (vmid=0, gen=1)           (gen=0)
>>   ----------------------    ----------------------
>>   gen == global_gen, return
>>                             gen != global_gen
>>                             increment global_gen
>>                             smp_call_function
>>                             reset_vm_context
>>                             vmid=0
>>   enter with vmid=0         enter with vmid=0
>
> I can't see how the scenario above can happen. First, no-one can run
> with vmid 0 - it is reserved for the host. If we bump global_gen on
> cpuN, then since we do smp_call_function(x, x, wait=1) we are now sure
> that after this call, all cpus(N-1) potentially being inside a VM will
> have exited, called reset_vm_context, but before they can re-enter into
> the guest, they will call update_vttbr, and if their generation counter
> differs from global_gen, they will try to grab that spinlock and
> everything should happen in order.
>
> the whole vmid=0 thing confused me a bit. The point is that since we
> moved the generation check outside the spin_lock we have to re-check,
> I see:

In fact I think the problem occurred with the original code as well.
The problem is that the check is not atomic wrt guest entry, so

  spin_lock
  check/update
  spin_unlock
  enter guest

has a hole between spin_unlock and guest entry.

>  /**
> + * check_new_vmid_gen - check that the VMID is still valid
> + * @kvm: The VM's VMID to check
> + *
> + * return true if there is a new generation of VMIDs being used
> + *
> + * The hardware supports only 256 values with the value zero reserved for the
> + * host, so we check if an assigned value belongs to a previous generation,
> + * which requires us to assign a new value. If we're the first to use a
> + * VMID for the new generation, we must flush necessary caches and TLBs on all
> + * CPUs.
> + */
> +static bool check_new_vmid_gen(struct kvm *kvm)
> +{
> +	return unlikely(kvm->arch.vmid_gen != atomic64_read(&kvm_vmid_gen));
> +}
> +
> +/**
>   * update_vttbr - Update the VTTBR with a valid VMID before the guest runs
>   * @kvm	The guest that we are about to run
>   *
> @@ -324,15 +342,7 @@ static void update_vttbr(struct kvm *kvm)
>  {
>  	phys_addr_t pgd_phys;
>  
> -	/*
> -	 * Check that the VMID is still valid.
> -	 * (The hardware supports only 256 values with the value zero
> -	 * reserved for the host, so we check if an assigned value belongs to
> -	 * a previous generation, which requires us to assign a new
> -	 * value. If we're the first to use a VMID for the new generation,
> -	 * we must flush necessary caches and TLBs on all CPUs.)
> -	 */
> -	if (likely(kvm->arch.vmid_gen == atomic64_read(&kvm_vmid_gen)))
> +	if (!check_new_vmid_gen(kvm))
>  		return;
>  
>  	spin_lock(&kvm_vmid_lock);
> @@ -504,6 +514,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  	 */
>  	preempt_disable();
>  	local_irq_disable();
> +
> +	if (check_new_vmid_gen(kvm)) {
> +		local_irq_enable();
> +		preempt_enable();
> +		continue;
> +	}
> +

I see the same race with signals.  Your signal_pending() check needs to
be after the local_irq_disable(), otherwise we can enter a guest with a
pending signal.

Better place the signal check before the vmid_gen check, to avoid the
possibility of a signal being held up by other guests.
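Avi's ordering suggestion can be sketched as a small userspace model. Everything below is illustrative, not the kernel's code: the point is only that, with interrupts already off, the pending-signal check comes first so a VMID-generation rollover triggered by some other guest can never delay signal delivery.

```c
#include <stdbool.h>

/* Toy model of the guest-entry gate discussed above (names invented).
 * Interrupts are notionally disabled when this runs. */
enum gate_action { GATE_EXIT_INTR, GATE_RETRY, GATE_ENTER_GUEST };

static enum gate_action entry_gate(bool signal_pending, bool vmid_gen_stale)
{
	if (signal_pending)		/* checked first: -EINTR to userspace */
		return GATE_EXIT_INTR;
	if (vmid_gen_stale)		/* re-enable irqs, redo update_vttbr(), retry */
		return GATE_RETRY;
	return GATE_ENTER_GUEST;	/* safe to world-switch into the guest */
}
```

With this ordering, a rollover caused by another VM costs at worst one extra trip around the run loop; it can no longer push a pending signal past another full guest entry.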
Re: [PATCH v8 11/15] ARM: KVM: World-switch implementation
On Thu, Jun 21, 2012 at 4:13 AM, Avi Kivity <a...@redhat.com> wrote:
> On 06/20/2012 07:40 AM, Christoffer Dall wrote:
>>>   cpu0                      cpu1
>>>   (vmid=0, gen=1)           (gen=0)
>>>   ----------------------    ----------------------
>>>   gen == global_gen, return
>>>                             gen != global_gen
>>>                             increment global_gen
>>>                             smp_call_function
>>>                             reset_vm_context
>>>                             vmid=0
>>>   enter with vmid=0         enter with vmid=0
>>
>> I can't see how the scenario above can happen. First, no-one can run
>> with vmid 0 - it is reserved for the host. If we bump global_gen on
>> cpuN, then since we do smp_call_function(x, x, wait=1) we are now sure
>> that after this call, all cpus(N-1) potentially being inside a VM will
>> have exited, called reset_vm_context, but before they can re-enter into
>> the guest, they will call update_vttbr, and if their generation counter
>> differs from global_gen, they will try to grab that spinlock and
>> everything should happen in order.
>>
>> the whole vmid=0 thing confused me a bit. The point is that since we
>> moved the generation check outside the spin_lock we have to re-check,
>> I see:
>
> In fact I think the problem occurred with the original code as well.
> The problem is that the check is not atomic wrt guest entry, so
>
>   spin_lock
>   check/update
>   spin_unlock
>   enter guest
>
> has a hole between spin_unlock and guest entry.

you are absolutely right.

>>  /**
>> + * check_new_vmid_gen - check that the VMID is still valid
>> + * @kvm: The VM's VMID to check
>> + *
>> + * return true if there is a new generation of VMIDs being used
>> + *
>> + * The hardware supports only 256 values with the value zero reserved for the
>> + * host, so we check if an assigned value belongs to a previous generation,
>> + * which requires us to assign a new value. If we're the first to use a
>> + * VMID for the new generation, we must flush necessary caches and TLBs on all
>> + * CPUs.
>> + */
>> +static bool check_new_vmid_gen(struct kvm *kvm)
>> +{
>> +	return unlikely(kvm->arch.vmid_gen != atomic64_read(&kvm_vmid_gen));
>> +}
>> +
>> +/**
>>   * update_vttbr - Update the VTTBR with a valid VMID before the guest runs
>>   * @kvm	The guest that we are about to run
>>   *
>> @@ -324,15 +342,7 @@ static void update_vttbr(struct kvm *kvm)
>>  {
>>  	phys_addr_t pgd_phys;
>>  
>> -	/*
>> -	 * Check that the VMID is still valid.
>> -	 * (The hardware supports only 256 values with the value zero
>> -	 * reserved for the host, so we check if an assigned value belongs to
>> -	 * a previous generation, which requires us to assign a new
>> -	 * value. If we're the first to use a VMID for the new generation,
>> -	 * we must flush necessary caches and TLBs on all CPUs.)
>> -	 */
>> -	if (likely(kvm->arch.vmid_gen == atomic64_read(&kvm_vmid_gen)))
>> +	if (!check_new_vmid_gen(kvm))
>>  		return;
>>  
>>  	spin_lock(&kvm_vmid_lock);
>> @@ -504,6 +514,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>  	 */
>>  	preempt_disable();
>>  	local_irq_disable();
>> +
>> +	if (check_new_vmid_gen(kvm)) {
>> +		local_irq_enable();
>> +		preempt_enable();
>> +		continue;
>> +	}
>> +
>
> I see the same race with signals.  Your signal_pending() check needs to
> be after the local_irq_disable(), otherwise we can enter a guest with a
> pending signal.

that's not functionally incorrect though, is it? It may simply increase
the latency for the signal delivery as far as I can see, but I
definitely don't mind changing this path in any case.

> Better place the signal check before the vmid_gen check, to avoid the
> possibility of a signal being held up by other guests.

nice ;)
Re: [PATCH v8 11/15] ARM: KVM: World-switch implementation
On 06/19/2012 01:05 AM, Christoffer Dall wrote:
>> Premature, but this is sad.  I suggest you split vmid generation from
>> next available vmid.  This allows you to make the generation counter
>> atomic so it may be read outside the lock.
>>
>> You can do
>>
>>   if (likely(kvm->arch.vmid_gen == atomic_read(&kvm_vmid_gen)))
>>       return;
>>
>>   spin_lock(...
>
> I knew you were going to say something here :), please take a look at
> this and see if you agree:

It looks reasonable wrt locking.

> +
> +	/* First user of a new VMID generation? */
> +	if (unlikely(kvm_next_vmid == 0)) {
> +		atomic64_inc(&kvm_vmid_gen);
> +		kvm_next_vmid = 1;
> +
> +		/* This does nothing on UP */
> +		smp_call_function(reset_vm_context, NULL, 1);

Does this imply a memory barrier?  If not, smp_mb__after_atomic_inc().

> +
> +		/*
> +		 * On SMP we know no other CPUs can use this CPU's or
> +		 * each other's VMID since the kvm_vmid_lock blocks
> +		 * them from reentry to the guest.
> +		 */
> +
> +		reset_vm_context(NULL);

These two lines can be folded as on_each_cpu().

Don't you have a race here where you can increment the generation just
before guest entry?

  cpu0                      cpu1
  (vmid=0, gen=1)           (gen=0)
  ----------------------    ----------------------
  gen == global_gen, return
                            gen != global_gen
                            increment global_gen
                            smp_call_function
                            reset_vm_context
                            vmid=0
  enter with vmid=0         enter with vmid=0

You must recheck gen after disabling interrupts to ensure global_gen
didn't bump after update_vttbr but before entry.  x86 has a lot of
this, see vcpu_enter_guest() and vcpu->requests (doesn't apply directly
to your case but may come in useful later).

> +
> +/**
> + * kvm_arch_vcpu_ioctl_run - the main VCPU run function to execute guest code
> + * @vcpu:	The VCPU pointer
> + * @run:	The kvm_run structure pointer used for userspace state exchange
> + *
> + * This function is called through the VCPU_RUN ioctl called from user space. It
> + * will execute VM code in a loop until the time slice for the process is used
> + * or some emulation is needed from user space in which case the function will
> + * return with return value 0 and with the kvm_run structure filled in with the
> + * required data for the requested emulation.
> + */
>  int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  {
> -	return -EINVAL;
> +	int ret = 0;
> +	sigset_t sigsaved;
> +
> +	if (vcpu->sigset_active)
> +		sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
> +
> +	run->exit_reason = KVM_EXIT_UNKNOWN;
> +	while (run->exit_reason == KVM_EXIT_UNKNOWN) {

It's not a good idea to read stuff from run unless it's part of the
ABI, since userspace can play with it while you're reading it.  It's
harmless here but in other places it can lead to a vulnerability.

> ok, so in this case, userspace can 'suppress' an MMIO or interrupt
> window operation. I can change to keep some local variable to maintain
> the state and have some API convention for emulation functions, if you
> feel strongly about it, but otherwise it feels to me like the code
> will be more complicated. Do you have other ideas?

x86 uses:

  0           - return to userspace (run prepared)
  1           - return to guest (run untouched)
  -ESOMETHING - return to userspace

as return values from handlers and for locals (usually named 'r').
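The convention Avi describes can be sketched as a toy model. The exit codes and handler name below are made up for illustration, not KVM's; the point is that the loop keys off a local return value rather than re-reading `run->exit_reason`, which userspace can change under the kernel's feet.

```c
#include <errno.h>

/* Illustrative exit codes (not KVM's). */
enum { DEMO_EXIT_SPURIOUS, DEMO_EXIT_MMIO, DEMO_EXIT_BAD };

/* x86-style handler convention:
 *   > 0  re-enter the guest (run struct untouched)
 *   == 0 return to userspace (run struct prepared)
 *   < 0  error, returned to userspace as -errno            */
static int handle_exit_demo(int exit_code)
{
	switch (exit_code) {
	case DEMO_EXIT_SPURIOUS:
		return 1;		/* nothing for userspace: back to guest */
	case DEMO_EXIT_MMIO:
		return 0;		/* run struct filled in: to userspace */
	default:
		return -EINVAL;		/* unknown exit reason */
	}
}
```

The run loop then becomes roughly `while ((ret = handle_exit_demo(code)) > 0) code = enter_guest();`, with `ret` a local that userspace cannot touch.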
Re: [PATCH v8 11/15] ARM: KVM: World-switch implementation
On Tue, Jun 19, 2012 at 5:16 AM, Avi Kivity <a...@redhat.com> wrote:
> On 06/19/2012 01:05 AM, Christoffer Dall wrote:
>>> Premature, but this is sad.  I suggest you split vmid generation from
>>> next available vmid.  This allows you to make the generation counter
>>> atomic so it may be read outside the lock.
>>>
>>> You can do
>>>
>>>   if (likely(kvm->arch.vmid_gen == atomic_read(&kvm_vmid_gen)))
>>>       return;
>>>
>>>   spin_lock(...
>>
>> I knew you were going to say something here :), please take a look at
>> this and see if you agree:
>
> It looks reasonable wrt locking.
>
>> +
>> +	/* First user of a new VMID generation? */
>> +	if (unlikely(kvm_next_vmid == 0)) {
>> +		atomic64_inc(&kvm_vmid_gen);
>> +		kvm_next_vmid = 1;
>> +
>> +		/* This does nothing on UP */
>> +		smp_call_function(reset_vm_context, NULL, 1);
>
> Does this imply a memory barrier?  If not, smp_mb__after_atomic_inc().

yes, it implies a memory barrier.

>> +
>> +		/*
>> +		 * On SMP we know no other CPUs can use this CPU's or
>> +		 * each other's VMID since the kvm_vmid_lock blocks
>> +		 * them from reentry to the guest.
>> +		 */
>> +
>> +		reset_vm_context(NULL);
>
> These two lines can be folded as on_each_cpu().
>
> Don't you have a race here where you can increment the generation just
> before guest entry?

I don't think I do.

>   cpu0                      cpu1
>   (vmid=0, gen=1)           (gen=0)
>   ----------------------    ----------------------
>   gen == global_gen, return
>                             gen != global_gen
>                             increment global_gen
>                             smp_call_function
>                             reset_vm_context
>                             vmid=0
>   enter with vmid=0         enter with vmid=0

I can't see how the scenario above can happen. First, no-one can run
with vmid 0 - it is reserved for the host. If we bump global_gen on
cpuN, then since we do smp_call_function(x, x, wait=1) we are now sure
that after this call, all cpus(N-1) potentially being inside a VM will
have exited, called reset_vm_context, but before they can re-enter into
the guest, they will call update_vttbr, and if their generation counter
differs from global_gen, they will try to grab that spinlock and
everything should happen in order.

> You must recheck gen after disabling interrupts to ensure global_gen
> didn't bump after update_vttbr but before entry.  x86 has a lot of
> this, see vcpu_enter_guest() and vcpu->requests (doesn't apply directly
> to your case but may come in useful later).
>
>> +
>> +/**
>> + * kvm_arch_vcpu_ioctl_run - the main VCPU run function to execute guest code
>> + * @vcpu:	The VCPU pointer
>> + * @run:	The kvm_run structure pointer used for userspace state exchange
>> + *
>> + * This function is called through the VCPU_RUN ioctl called from user space. It
>> + * will execute VM code in a loop until the time slice for the process is used
>> + * or some emulation is needed from user space in which case the function will
>> + * return with return value 0 and with the kvm_run structure filled in with the
>> + * required data for the requested emulation.
>> + */
>>  int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>  {
>> -	return -EINVAL;
>> +	int ret = 0;
>> +	sigset_t sigsaved;
>> +
>> +	if (vcpu->sigset_active)
>> +		sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
>> +
>> +	run->exit_reason = KVM_EXIT_UNKNOWN;
>> +	while (run->exit_reason == KVM_EXIT_UNKNOWN) {
>
> It's not a good idea to read stuff from run unless it's part of the
> ABI, since userspace can play with it while you're reading it.  It's
> harmless here but in other places it can lead to a vulnerability.
>
>> ok, so in this case, userspace can 'suppress' an MMIO or interrupt
>> window operation. I can change to keep some local variable to maintain
>> the state and have some API convention for emulation functions, if you
>> feel strongly about it, but otherwise it feels to me like the code
>> will be more complicated. Do you have other ideas?
>
> x86 uses:
>
>   0           - return to userspace (run prepared)
>   1           - return to guest (run untouched)
>   -ESOMETHING - return to userspace
>
> as return values from handlers and for locals (usually named 'r').

yeah, we can do that I guess, that's fair.
Re: [PATCH v8 11/15] ARM: KVM: World-switch implementation
On Tue, Jun 19, 2012 at 11:27 PM, Christoffer Dall
<c.d...@virtualopensystems.com> wrote:
> On Tue, Jun 19, 2012 at 5:16 AM, Avi Kivity <a...@redhat.com> wrote:
>> On 06/19/2012 01:05 AM, Christoffer Dall wrote:
>>>> Premature, but this is sad.  I suggest you split vmid generation from
>>>> next available vmid.  This allows you to make the generation counter
>>>> atomic so it may be read outside the lock.
>>>>
>>>> You can do
>>>>
>>>>   if (likely(kvm->arch.vmid_gen == atomic_read(&kvm_vmid_gen)))
>>>>       return;
>>>>
>>>>   spin_lock(...
>>>
>>> I knew you were going to say something here :), please take a look at
>>> this and see if you agree:
>>
>> It looks reasonable wrt locking.
>>
>>> +
>>> +	/* First user of a new VMID generation? */
>>> +	if (unlikely(kvm_next_vmid == 0)) {
>>> +		atomic64_inc(&kvm_vmid_gen);
>>> +		kvm_next_vmid = 1;
>>> +
>>> +		/* This does nothing on UP */
>>> +		smp_call_function(reset_vm_context, NULL, 1);
>>
>> Does this imply a memory barrier?  If not, smp_mb__after_atomic_inc().
>
> yes, it implies a memory barrier.
>
>>> +
>>> +		/*
>>> +		 * On SMP we know no other CPUs can use this CPU's or
>>> +		 * each other's VMID since the kvm_vmid_lock blocks
>>> +		 * them from reentry to the guest.
>>> +		 */
>>> +
>>> +		reset_vm_context(NULL);
>>
>> These two lines can be folded as on_each_cpu().
>>
>> Don't you have a race here where you can increment the generation just
>> before guest entry?
>
> I don't think I do.

uh, strike that, I do.

>>   cpu0                      cpu1
>>   (vmid=0, gen=1)           (gen=0)
>>   ----------------------    ----------------------
>>   gen == global_gen, return
>>                             gen != global_gen
>>                             increment global_gen
>>                             smp_call_function
>>                             reset_vm_context
>>                             vmid=0
>>   enter with vmid=0         enter with vmid=0
>
> I can't see how the scenario above can happen. First, no-one can run
> with vmid 0 - it is reserved for the host. If we bump global_gen on
> cpuN, then since we do smp_call_function(x, x, wait=1) we are now sure
> that after this call, all cpus(N-1) potentially being inside a VM will
> have exited, called reset_vm_context, but before they can re-enter into
> the guest, they will call update_vttbr, and if their generation counter
> differs from global_gen, they will try to grab that spinlock and
> everything should happen in order.

the whole vmid=0 thing confused me a bit. The point is that since we
moved the generation check outside the spin_lock we have to re-check,
I see:

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 19fe3b0..74760af 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -313,6 +313,24 @@ static void reset_vm_context(void *info)
 }
 
 /**
+ * check_new_vmid_gen - check that the VMID is still valid
+ * @kvm: The VM's VMID to check
+ *
+ * return true if there is a new generation of VMIDs being used
+ *
+ * The hardware supports only 256 values with the value zero reserved for the
+ * host, so we check if an assigned value belongs to a previous generation,
+ * which requires us to assign a new value. If we're the first to use a
+ * VMID for the new generation, we must flush necessary caches and TLBs on all
+ * CPUs.
+ */
+static bool check_new_vmid_gen(struct kvm *kvm)
+{
+	return unlikely(kvm->arch.vmid_gen != atomic64_read(&kvm_vmid_gen));
+}
+
+/**
  * update_vttbr - Update the VTTBR with a valid VMID before the guest runs
  * @kvm	The guest that we are about to run
  *
@@ -324,15 +342,7 @@ static void update_vttbr(struct kvm *kvm)
 {
 	phys_addr_t pgd_phys;
 
-	/*
-	 * Check that the VMID is still valid.
-	 * (The hardware supports only 256 values with the value zero
-	 * reserved for the host, so we check if an assigned value belongs to
-	 * a previous generation, which requires us to assign a new
-	 * value. If we're the first to use a VMID for the new generation,
-	 * we must flush necessary caches and TLBs on all CPUs.)
-	 */
-	if (likely(kvm->arch.vmid_gen == atomic64_read(&kvm_vmid_gen)))
+	if (!check_new_vmid_gen(kvm))
 		return;
 
 	spin_lock(&kvm_vmid_lock);
@@ -504,6 +514,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	 */
 	preempt_disable();
 	local_irq_disable();
+
+	if (check_new_vmid_gen(kvm)) {
+		local_irq_enable();
+		preempt_enable();
+		continue;
+	}
+
 	kvm_guest_enter();
 	vcpu->mode = IN_GUEST_MODE;

>> You must recheck gen after disabling interrupts to ensure global_gen
>> didn't bump after update_vttbr but before entry.  x86 has a lot of
>> this, see vcpu_enter_guest() and vcpu->requests (doesn't apply directly
>> to your case but may come in useful later).
>>
>>> +
>>> +/**
>>> + * kvm_arch_vcpu_ioctl_run -
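As a sanity check of why the re-check after local_irq_disable() closes the hole, here is a sequential userspace model of the interleaving (all names are illustrative; the `other_cpu_bumps_gen` flag stands in for a concurrent generation rollover on another CPU between the first check and guest entry):

```c
#include <stdbool.h>
#include <stdint.h>

static uint64_t global_gen = 1;		/* models atomic64_t kvm_vmid_gen */

struct vm { uint64_t gen; };

static bool vmid_gen_stale(const struct vm *vm)
{
	return vm->gen != global_gen;
}

static bool try_enter_guest(struct vm *vm, bool other_cpu_bumps_gen)
{
	if (vmid_gen_stale(vm))
		vm->gen = global_gen;	/* update_vttbr(): assign a fresh VMID */

	if (other_cpu_bumps_gen)	/* the race: rollover happens here, */
		global_gen++;		/* after the check, before entry    */

	/* interrupts now disabled: the re-check catches the bump */
	if (vmid_gen_stale(vm))
		return false;		/* bail out and retry the run loop */
	return true;			/* safe to enter the guest */
}
```

Without the second `vmid_gen_stale()` test, the `true` path would be taken even when the rollover fired, which is exactly the "enter with a recycled VMID" window in Avi's table.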
Re: [PATCH v8 11/15] ARM: KVM: World-switch implementation
On 06/15/2012 10:08 PM, Christoffer Dall wrote:
> Provides complete world-switch implementation to switch to other guests
> running in non-secure modes. Includes Hyp exception handlers that
> capture necessary exception information and stores the information on
> the VCPU and KVM structures.
>
> Switching to Hyp mode is done through a simple HVC instruction. The
> exception vector code will check that the HVC comes from VMID==0 and if
> so will store the necessary state on the Hyp stack, which will look like
> this (growing downwards, see the hyp_hvc handler):
>
>   ...
>   Hyp_Sp + 4:	spsr (Host-SVC cpsr)
>   Hyp_Sp:	lr_usr
>
> When returning from Hyp mode to SVC mode, another HVC instruction is
> executed from Hyp mode, which is taken in the hyp_svc handler. The Hyp
> stack pointer should be where it was left from the above initial call,
> since the values on the stack will be used to restore state (see
> hyp_svc).
>
> Otherwise, the world-switch is pretty straightforward. All state that
> can be modified by the guest is first backed up on the Hyp stack and the
> VCPU values are loaded onto the hardware. State which is not loaded, but
> theoretically modifiable by the guest, is protected through the
> virtualization features to generate a trap and cause software emulation.
> Upon guest return, all state is restored from hardware onto the VCPU
> struct and the original state is restored from the Hyp stack onto the
> hardware.
>
> One controversy may be the back-door call to __irq_svc (the host
> kernel's own physical IRQ handler) which is called when a physical IRQ
> exception is taken in Hyp mode while running in the guest.
>
> SMP support using the VMPIDR calculated on the basis of the host MPIDR
> and overriding the low bits with KVM vcpu_id contributed by Marc
> Zyngier.
>
> Reuse of VMIDs has been implemented by Antonios Motakis and adapted from
> a separate patch into the appropriate patches introducing the
> functionality. Note that the VMIDs are stored per VM as required by the
> ARM architecture reference manual.
>
> +
> +/**
> + * update_vttbr - Update the VTTBR with a valid VMID before the guest runs
> + * @kvm	The guest that we are about to run
> + *
> + * Called from kvm_arch_vcpu_ioctl_run before entering the guest to ensure the
> + * VM has a valid VMID, otherwise assigns a new one and flushes corresponding
> + * caches and TLBs.
> + */
> +static void update_vttbr(struct kvm *kvm)
> +{
> +	phys_addr_t pgd_phys;
> +
> +	spin_lock(&kvm_vmid_lock);

Premature, but this is sad.  I suggest you split vmid generation from
next available vmid.  This allows you to make the generation counter
atomic so it may be read outside the lock.

You can do

  if (likely(kvm->arch.vmid_gen == atomic_read(&kvm_vmid_gen)))
      return;

  spin_lock(...

> +
> +	/*
> +	 * Check that the VMID is still valid.
> +	 * (The hardware supports only 256 values with the value zero
> +	 * reserved for the host, so we check if an assigned value has rolled
> +	 * over a sequence, which requires us to assign a new value and flush
> +	 * necessary caches and TLBs on all CPUs.)
> +	 */
> +	if (unlikely((kvm->arch.vmid ^ next_vmid) >> VMID_BITS)) {
> +		/* Check for a new VMID generation */
> +		if (unlikely((next_vmid & VMID_MASK) == 0)) {
> +			/* Check for the (very unlikely) 64-bit wrap around */

Unlikely?  It's impossible.

> +
> +/**
> + * kvm_arch_vcpu_ioctl_run - the main VCPU run function to execute guest code
> + * @vcpu:	The VCPU pointer
> + * @run:	The kvm_run structure pointer used for userspace state exchange
> + *
> + * This function is called through the VCPU_RUN ioctl called from user space. It
> + * will execute VM code in a loop until the time slice for the process is used
> + * or some emulation is needed from user space in which case the function will
> + * return with return value 0 and with the kvm_run structure filled in with the
> + * required data for the requested emulation.
> + */
>  int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  {
> -	return -EINVAL;
> +	int ret = 0;
> +	sigset_t sigsaved;
> +
> +	if (vcpu->sigset_active)
> +		sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
> +
> +	run->exit_reason = KVM_EXIT_UNKNOWN;
> +	while (run->exit_reason == KVM_EXIT_UNKNOWN) {

It's not a good idea to read stuff from run unless it's part of the
ABI, since userspace can play with it while you're reading it.  It's
harmless here but in other places it can lead to a vulnerability.

> +		/*
> +		 * Check conditions before entering the guest
> +		 */
> +		if (need_resched())
> +			kvm_resched(vcpu);

I think cond_resched() is all that's needed these days (kvm_resched()
predates preempt notifiers).

> +
> +		if (signal_pending(current)) {
> +			ret = -EINTR;
> +			run->exit_reason = KVM_EXIT_INTR;
> +
Re: [PATCH v8 11/15] ARM: KVM: World-switch implementation
On Mon, Jun 18, 2012 at 9:41 AM, Avi Kivity <a...@redhat.com> wrote:
> On 06/15/2012 10:08 PM, Christoffer Dall wrote:
>> Provides complete world-switch implementation to switch to other guests
>> running in non-secure modes. Includes Hyp exception handlers that
>> capture necessary exception information and stores the information on
>> the VCPU and KVM structures.
>>
>> Switching to Hyp mode is done through a simple HVC instruction. The
>> exception vector code will check that the HVC comes from VMID==0 and if
>> so will store the necessary state on the Hyp stack, which will look like
>> this (growing downwards, see the hyp_hvc handler):
>>
>>   ...
>>   Hyp_Sp + 4:	spsr (Host-SVC cpsr)
>>   Hyp_Sp:	lr_usr
>>
>> When returning from Hyp mode to SVC mode, another HVC instruction is
>> executed from Hyp mode, which is taken in the hyp_svc handler. The Hyp
>> stack pointer should be where it was left from the above initial call,
>> since the values on the stack will be used to restore state (see
>> hyp_svc).
>>
>> Otherwise, the world-switch is pretty straightforward. All state that
>> can be modified by the guest is first backed up on the Hyp stack and the
>> VCPU values are loaded onto the hardware. State which is not loaded, but
>> theoretically modifiable by the guest, is protected through the
>> virtualization features to generate a trap and cause software emulation.
>> Upon guest return, all state is restored from hardware onto the VCPU
>> struct and the original state is restored from the Hyp stack onto the
>> hardware.
>>
>> One controversy may be the back-door call to __irq_svc (the host
>> kernel's own physical IRQ handler) which is called when a physical IRQ
>> exception is taken in Hyp mode while running in the guest.
>>
>> SMP support using the VMPIDR calculated on the basis of the host MPIDR
>> and overriding the low bits with KVM vcpu_id contributed by Marc
>> Zyngier.
>>
>> Reuse of VMIDs has been implemented by Antonios Motakis and adapted from
>> a separate patch into the appropriate patches introducing the
>> functionality. Note that the VMIDs are stored per VM as required by the
>> ARM architecture reference manual.
>>
>> +
>> +/**
>> + * update_vttbr - Update the VTTBR with a valid VMID before the guest runs
>> + * @kvm	The guest that we are about to run
>> + *
>> + * Called from kvm_arch_vcpu_ioctl_run before entering the guest to ensure the
>> + * VM has a valid VMID, otherwise assigns a new one and flushes corresponding
>> + * caches and TLBs.
>> + */
>> +static void update_vttbr(struct kvm *kvm)
>> +{
>> +	phys_addr_t pgd_phys;
>> +
>> +	spin_lock(&kvm_vmid_lock);
>
> Premature, but this is sad.  I suggest you split vmid generation from
> next available vmid.  This allows you to make the generation counter
> atomic so it may be read outside the lock.
>
> You can do
>
>   if (likely(kvm->arch.vmid_gen == atomic_read(&kvm_vmid_gen)))
>       return;
>
>   spin_lock(...

I knew you were going to say something here :), please take a look at
this and see if you agree:

+struct kvm_arch {
+	/* The VMID generation used for the virt. memory system */
+	u64	vmid_gen;
+	u32	vmid;
+
+	/* 1-level 2nd stage table and lock */
+	struct mutex pgd_mutex;
+	pgd_t	*pgd;
+
+	/* VTTBR value associated with above pgd and vmid */
+	u64	vttbr;
+};

[snip]

+/* The VMID used in the VTTBR */
+static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1);
+static u8 kvm_next_vmid;
+DEFINE_SPINLOCK(kvm_vmid_lock);

[snip]

+/**
+ * update_vttbr - Update the VTTBR with a valid VMID before the guest runs
+ * @kvm	The guest that we are about to run
+ *
+ * Called from kvm_arch_vcpu_ioctl_run before entering the guest to ensure the
+ * VM has a valid VMID, otherwise assigns a new one and flushes corresponding
+ * caches and TLBs.
+ */
+static void update_vttbr(struct kvm *kvm)
+{
+	phys_addr_t pgd_phys;
+
+	/*
+	 * Check that the VMID is still valid.
+	 * (The hardware supports only 256 values with the value zero
+	 * reserved for the host, so we check if an assigned value belongs to
+	 * a previous generation, which requires us to assign a new
+	 * value. If we're the first to use a VMID for the new generation,
+	 * we must flush necessary caches and TLBs on all CPUs.)
+	 */
+	if (likely(kvm->arch.vmid_gen == atomic64_read(&kvm_vmid_gen)))
+		return;
+
+	spin_lock(&kvm_vmid_lock);
+
+	/* First user of a new VMID generation? */
+	if (unlikely(kvm_next_vmid == 0)) {
+		atomic64_inc(&kvm_vmid_gen);
+		kvm_next_vmid = 1;
+
+		/* This does nothing on UP */
+		smp_call_function(reset_vm_context, NULL, 1);
+
+		/*
+		 * On SMP we know no other CPUs can use this CPU's or
+		 * each other's VMID since the kvm_vmid_lock blocks
+		 * them from reentry to the guest.
+		 */
+
+		reset_vm_context(NULL);
+	}
+
+	kvm->arch.vmid =
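For reference, the generation/next-vmid split under discussion can be modeled in plain userspace C. This is a single-threaded sketch with no locking or TLB flushing, and all names are illustrative: an 8-bit next-vmid cursor whose wrap to zero bumps a 64-bit generation counter, with VMID 0 implicitly reserved since allocation always restarts at 1.

```c
#include <stdint.h>

static uint64_t vmid_gen = 1;	/* stands in for atomic64_t kvm_vmid_gen */
static uint8_t next_vmid = 1;	/* stands in for kvm_next_vmid; 0 = host */

struct vm { uint64_t gen; uint8_t vmid; };

static void assign_vmid(struct vm *vm)
{
	if (vm->gen == vmid_gen)
		return;			/* VMID still valid this generation */

	if (next_vmid == 0) {		/* 8-bit wrap: start a new generation */
		vmid_gen++;		/* (the real code would flush TLBs here) */
		next_vmid = 1;		/* skip VMID 0, reserved for the host */
	}

	vm->vmid = next_vmid++;		/* uint8_t: 255 wraps to 0 naturally */
	vm->gen = vmid_gen;
}
```

After 255 VMs have been tagged in one generation, the cursor wraps, the generation advances, and every VM's cached `gen` goes stale at once, which forces reassignment and (in the real code) a global flush, exactly the rollover case the review is probing.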