Re: [PATCH v2 3/7] KVM: Add paravirt kvm_flush_tlb_others
On Tue, 3 Jul 2012 04:55:35 -0300, Marcelo Tosatti mtosa...@redhat.com wrote:
> On Mon, Jun 04, 2012 at 10:37:24AM +0530, Nikunj A. Dadhania wrote:
> > flush_tlb_others_ipi depends on a lot of statics in tlb.c. Replicated
> > flush_tlb_others_ipi as kvm_flush_tlb_others to further adapt it to
> > paravirtualization.
> >
> > Use the vcpu state information inside kvm_flush_tlb_others to avoid
> > sending an IPI to pre-empted vcpus.
> >
> > * Do not send IPIs to offline vcpus; set the flush_on_enter flag instead
> > * For online vcpus: wait for them to clear the flag
> >
> > The approach was discussed here: https://lkml.org/lkml/2012/2/20/157
> >
> > Suggested-by: Peter Zijlstra a.p.zijls...@chello.nl
> > Signed-off-by: Nikunj A. Dadhania nik...@linux.vnet.ibm.com
> >
> > --
> > Pseudo Algo:
> >
> >    Write()
> >    =======
> >    guest_exit()
> >        flush_on_enter[i] = 0;
> >        running[i] = 0;
> >
> >    guest_enter()
> >        running[i] = 1;
> >        smp_mb();
> >        if (flush_on_enter[i]) {
> >            tlb_flush()
> >            flush_on_enter[i] = 0;
> >        }
> >
> >    Read()
> >    ======
> >    GUEST                                   KVM-HV
> >
> >    f->flushcpumask = cpumask - me;
> >
> > again:
> >    for_each_cpu(i, f->flushmask) {
> >
> >        if (!running[i]) {
> >                                            case 1:
> >                                            running[n] = 1
> >                                            (cpuN does not see flush_on_enter
> >                                             set, guest later finds it running
> >                                             and sends ipi, we are fine here,
> >                                             need to clear the flag on
> >                                             guest_exit)
> >
> >            flush_on_enter[i] = 1;
> >                                            case 2:
> >                                            running[n] = 1
> >                                            (cpuN will see flush_on_enter and
> >                                             an IPI as well - addressed in
> >                                             patch-4)
> >
> >            if (!running[i])
> >                cpu_clear(f->flushmask);    All is well, vm_enter will do
> >                                            the fixup
> >        }
> >                                            case 3:
> >                                            running[n] = 0
> >                                            (cpuN went to sleep, we saw it as
> >                                             awake, ipi sent, but wait will
> >                                             break without zero_mask and goto
> >                                             again will take care)
> >    }
> >    send_ipi(f->flushmask)
> >
> >    wait_a_while_for_zero_mask();
> >
> >    if (!zero_mask)
> >        goto again;
>
> Can you please measure increased vmentry/vmexit overhead? x86/vmexit.c
> of git://git.kernel.org/pub/scm/virt/kvm/kvm-unit-tests.git should help.

Please find below the results (debug patch attached for enabling
registration of kvm_vcpu_state). I have taken results for 1 and 4 vcpus.

Used the following command for starting the tests:

    /usr/libexec/qemu-kvm -smp $i -device testdev,chardev=testlog \
        -chardev file,id=testlog,path=vmexit.out -serial stdio \
        -kernel ./x86/vmexit.flat

Machine: IBM xSeries with Intel(R) Xeon(R) X7560 2.27GHz CPU, 32 cores,
32 online cpus and 4*64GB RAM.

x base     - unpatched host kernel
+ wo_vs    - patched host kernel, vcpu_state not registered
* w_vs.txt - patched host kernel and vcpu_state registered

1 vcpu results:
---------------

cpuid
=====
     N      Avg     Stddev
x   10   2135.1    17.8975
+   10   2188      18.3666
*   10   2448.9    43.9910

vmcall
======
     N      Avg     Stddev
x   10   2025.5    38.1641
+   10   2047.5    24.8205
*   10   2306.2    40.3066

mov_from_cr8
============
     N      Avg     Stddev
x   10       12        0.
+   10       12        0.
*   10       12        0.

mov_to_cr8
==========
     N      Avg     Stddev
x   10     19.4     0.5164
+   10     19.1     0.3162
*   10     19.2     0.4216

inl_from_pmtimer
================
     N      Avg     Stddev
x   10  18093.2   462.0543
+   10  16579.7  1448.8892
*   10  18577.7   266.2676

ple-round-robin
===============
     N      Avg     Stddev
x   10     16.1     0.3162
+
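A rough reading of the 1-vcpu numbers above (deltas computed from the table;
approximate, and only for the two exits most affected):

    cpuid:  2448.9 - 2135.1 ~= 314 cycles (~15%) slower with vcpu_state registered
    vmcall: 2306.2 - 2025.5 ~= 281 cycles (~14%) slower with vcpu_state registered
    patched host without vcpu_state registered stays within ~1-2.5% of base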
Re: [PATCH v2 3/7] KVM: Add paravirt kvm_flush_tlb_others
On Tue, Jul 03, 2012 at 01:49:49PM +0530, Nikunj A Dadhania wrote:
> On Tue, 3 Jul 2012 04:55:35 -0300, Marcelo Tosatti mtosa...@redhat.com wrote:
> > >            if (!zero_mask)
> > >                    goto again;
> >
> > Can you please measure increased vmentry/vmexit overhead? x86/vmexit.c
> > of git://git.kernel.org/pub/scm/virt/kvm/kvm-unit-tests.git should help.
>
> Sure will get back with the result.
>
> > > +	/*
> > > +	 * Guest might have seen us offline and would have set
> > > +	 * flush_on_enter.
> > > +	 */
> > > +	kvm_read_guest_cached(vcpu->kvm, ghc, vs, 2*sizeof(__u32));
> > > +	if (vs->flush_on_enter)
> > > +		kvm_x86_ops->tlb_flush(vcpu);
> >
> > So flush_tlb_page, which was an invlpg, now flushes the entire TLB.
> > Did you take that into account?
>
> When the vcpu is sleeping/pre-empted out, multiple requests for
> flush_tlb could have happened. And now when we are here, it is cleaning
> up all the TLB.

Yes, cases where there are sufficient exits transforming one TLB entry
invalidation into full TLB invalidation should go unnoticed.

> One other approach would be to queue the addresses, but that brings up
> the question: how many requests to queue? This would require us adding
> more synchronization between guest and host for updating the area where
> these addresses are shared.

Sounds unnecessarily complicated.

> > > +again:
> > > +	for_each_cpu(cpu, to_cpumask(f->flush_cpumask)) {
> > > +		v_state = per_cpu(vcpu_state, cpu);
> > > +
> > > +		if (!v_state->state) {
> >
> > Should use ACCESS_ONCE to make sure the value is not register cached.
> >
> > > +			v_state->flush_on_enter = 1;
> > > +			smp_mb();
> > > +			if (!v_state->state)
> >
> > And here.
>
> Sure will add this check for both in my next version.
>
> > > +				cpumask_clear_cpu(cpu, to_cpumask(f->flush_cpumask));
> > > +		}
> > > +	}
> > > +
> > > +	if (cpumask_empty(to_cpumask(f->flush_cpumask)))
> > > +		goto out;
> > > +
> > > +	apic->send_IPI_mask(to_cpumask(f->flush_cpumask),
> > > +			    INVALIDATE_TLB_VECTOR_START + sender);
> > > +
> > > +	loop = 1000;
> > > +	while (!cpumask_empty(to_cpumask(f->flush_cpumask)) && --loop)
> > > +		cpu_relax();
> > > +
> > > +	if (!cpumask_empty(to_cpumask(f->flush_cpumask)))
> > > +		goto again;
> >
> > Is this necessary in addition to the in-guest-mode/out-guest-mode
> > detection? If so, why?
>
> This handles case 3: we initially saw the vcpu as running, so a flush
> IPI was sent to it. In the meantime the vcpu may have been pre-empted,
> so we come out of the loop=1000 wait with a non-empty flushmask. We then
> re-verify the flushmask against the currently running vcpus and make
> sure the vcpu that was pre-empted gets un-marked, so we can leave
> kvm_flush_tlb_others_ipi without waiting for sleeping/pre-empted vcpus.
>
> Regards
> Nikunj
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
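For reference, a minimal sketch of how the guest-side loop quoted above might
look once the suggested ACCESS_ONCE() reads are folded in. This is not the
patch itself: the function signature, the flush_state type and the out-of-line
context are placeholders; only the loop body follows the snippets quoted in
the thread.

    #include <linux/cpumask.h>
    #include <asm/apic.h>

    /*
     * Sketch only: retry loop of a kvm_flush_tlb_others()-style function,
     * with ACCESS_ONCE() so the shared 'state' field is re-read each time.
     * 'vcpu_state' is assumed to be a per-cpu pointer to the area shared
     * with the host; 'struct flush_state' stands in for the real flush
     * bookkeeping structure in tlb.c.
     */
    static void kvm_flush_tlb_others_sketch(struct flush_state *f, int sender)
    {
    	struct kvm_vcpu_state *v_state;
    	int cpu, loop;

    again:
    	for_each_cpu(cpu, to_cpumask(f->flush_cpumask)) {
    		v_state = per_cpu(vcpu_state, cpu);

    		if (!ACCESS_ONCE(v_state->state)) {
    			/* vcpu looks preempted: defer to flush_on_enter */
    			v_state->flush_on_enter = 1;
    			smp_mb();
    			/* re-check; only skip the IPI if it is still not running */
    			if (!ACCESS_ONCE(v_state->state))
    				cpumask_clear_cpu(cpu, to_cpumask(f->flush_cpumask));
    		}
    	}

    	if (cpumask_empty(to_cpumask(f->flush_cpumask)))
    		return;

    	apic->send_IPI_mask(to_cpumask(f->flush_cpumask),
    			    INVALIDATE_TLB_VECTOR_START + sender);

    	loop = 1000;
    	while (!cpumask_empty(to_cpumask(f->flush_cpumask)) && --loop)
    		cpu_relax();

    	/* case 3: a target got preempted after we sampled it as running */
    	if (!cpumask_empty(to_cpumask(f->flush_cpumask)))
    		goto again;
    }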
Re: [PATCH v2 3/7] KVM: Add paravirt kvm_flush_tlb_others
On Wed, 4 Jul 2012 23:09:10 -0300, Marcelo Tosatti mtosa...@redhat.com wrote:
> On Tue, Jul 03, 2012 at 01:49:49PM +0530, Nikunj A Dadhania wrote:
> > On Tue, 3 Jul 2012 04:55:35 -0300, Marcelo Tosatti mtosa...@redhat.com wrote:
> > > >            if (!zero_mask)
> > > >                    goto again;
> > >
> > > Can you please measure increased vmentry/vmexit overhead? x86/vmexit.c
> > > of git://git.kernel.org/pub/scm/virt/kvm/kvm-unit-tests.git should help.
> >
> > Sure will get back with the result.
> >
> > > > +	/*
> > > > +	 * Guest might have seen us offline and would have set
> > > > +	 * flush_on_enter.
> > > > +	 */
> > > > +	kvm_read_guest_cached(vcpu->kvm, ghc, vs, 2*sizeof(__u32));
> > > > +	if (vs->flush_on_enter)
> > > > +		kvm_x86_ops->tlb_flush(vcpu);
> > >
> > > So flush_tlb_page, which was an invlpg, now flushes the entire TLB.
> > > Did you take that into account?
> >
> > When the vcpu is sleeping/pre-empted out, multiple requests for
> > flush_tlb could have happened. And now when we are here, it is cleaning
> > up all the TLB.
>
> Yes, cases where there are sufficient exits transforming one TLB entry
> invalidation into full TLB invalidation should go unnoticed.
>
> > One other approach would be to queue the addresses, but that brings up
> > the question: how many requests to queue? This would require us adding
> > more synchronization between guest and host for updating the area where
> > these addresses are shared.
>
> Sounds unnecessarily complicated.

Yes, I did give this a try earlier, but did not see much improvement
with the amount of complexity that it was bringing in.

Regards
Nikunj
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 3/7] KVM: Add paravirt kvm_flush_tlb_others
On Mon, Jun 04, 2012 at 10:37:24AM +0530, Nikunj A. Dadhania wrote: flush_tlb_others_ipi depends on lot of statics in tlb.c. Replicated the flush_tlb_others_ipi as kvm_flush_tlb_others to further adapt to paravirtualization. Use the vcpu state information inside the kvm_flush_tlb_others to avoid sending ipi to pre-empted vcpus. * Do not send ipi's to offline vcpus and set flush_on_enter flag * For online vcpus: Wait for them to clear the flag The approach was discussed here: https://lkml.org/lkml/2012/2/20/157 Suggested-by: Peter Zijlstra a.p.zijls...@chello.nl Signed-off-by: Nikunj A. Dadhania nik...@linux.vnet.ibm.com -- Pseudo Algo: Write() == guest_exit() flush_on_enter[i]=0; running[i] = 0; guest_enter() running[i] = 1; smp_mb(); if(flush_on_enter[i]) { tlb_flush() flush_on_enter[i]=0; } Read() == GUESTKVM-HV f-flushcpumask = cpumask - me; again: for_each_cpu(i, f-flushmask) { if (!running[i]) { case 1: running[n]=1 (cpuN does not see flush_on_enter set, guest later finds it running and sends ipi, we are fine here, need to clear the flag on guest_exit) flush_on_enter[i] = 1; case2: running[n]=1 (cpuN - will see flush on enter and an IPI as well - addressed in patch-4) if (!running[i]) cpu_clear(f-flushmask); All is well, vm_enter will do the fixup } case 3: running[n] = 0; (cpuN went to sleep, we saw it as awake, ipi sent, but wait will break without zero_mask and goto again will take care) } send_ipi(f-flushmask) wait_a_while_for_zero_mask(); if (!zero_mask) goto again; Can you please measure increased vmentry/vmexit overhead? x86/vmexit.c of git://git.kernel.org/pub/scm/virt/kvm/kvm-unit-tests.git should help. arch/x86/include/asm/kvm_para.h |3 +- arch/x86/include/asm/tlbflush.h |9 ++ arch/x86/kernel/kvm.c |1 + arch/x86/kvm/x86.c | 14 - arch/x86/mm/tlb.c | 61 +++ 5 files changed, 86 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h index f57b5cc..684a285 100644 --- a/arch/x86/include/asm/kvm_para.h +++ b/arch/x86/include/asm/kvm_para.h @@ -55,7 +55,8 @@ struct kvm_steal_time { struct kvm_vcpu_state { __u32 state; - __u32 pad[15]; + __u32 flush_on_enter; + __u32 pad[14]; }; #define KVM_VCPU_STATE_ALIGN_BITS 5 diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h index c0e108e..29470bd 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -119,6 +119,12 @@ static inline void native_flush_tlb_others(const struct cpumask *cpumask, { } +static inline void kvm_flush_tlb_others(const struct cpumask *cpumask, + struct mm_struct *mm, + unsigned long va) +{ +} + static inline void reset_lazy_tlbstate(void) { } @@ -145,6 +151,9 @@ static inline void flush_tlb_range(struct vm_area_struct *vma, void native_flush_tlb_others(const struct cpumask *cpumask, struct mm_struct *mm, unsigned long va); +void kvm_flush_tlb_others(const struct cpumask *cpumask, + struct mm_struct *mm, unsigned long va); + #define TLBSTATE_OK 1 #define TLBSTATE_LAZY2 diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c index
Re: [PATCH v2 3/7] KVM: Add paravirt kvm_flush_tlb_others
On Mon, Jun 04, 2012 at 10:37:24AM +0530, Nikunj A. Dadhania wrote: flush_tlb_others_ipi depends on lot of statics in tlb.c. Replicated the flush_tlb_others_ipi as kvm_flush_tlb_others to further adapt to paravirtualization. Use the vcpu state information inside the kvm_flush_tlb_others to avoid sending ipi to pre-empted vcpus. * Do not send ipi's to offline vcpus and set flush_on_enter flag * For online vcpus: Wait for them to clear the flag The approach was discussed here: https://lkml.org/lkml/2012/2/20/157 Suggested-by: Peter Zijlstra a.p.zijls...@chello.nl Signed-off-by: Nikunj A. Dadhania nik...@linux.vnet.ibm.com -- Pseudo Algo: Write() == guest_exit() flush_on_enter[i]=0; running[i] = 0; guest_enter() running[i] = 1; smp_mb(); if(flush_on_enter[i]) { tlb_flush() flush_on_enter[i]=0; } Read() == GUESTKVM-HV f-flushcpumask = cpumask - me; again: for_each_cpu(i, f-flushmask) { if (!running[i]) { case 1: running[n]=1 (cpuN does not see flush_on_enter set, guest later finds it running and sends ipi, we are fine here, need to clear the flag on guest_exit) flush_on_enter[i] = 1; case2: running[n]=1 (cpuN - will see flush on enter and an IPI as well - addressed in patch-4) if (!running[i]) cpu_clear(f-flushmask); All is well, vm_enter will do the fixup } case 3: running[n] = 0; (cpuN went to sleep, we saw it as awake, ipi sent, but wait will break without zero_mask and goto again will take care) } send_ipi(f-flushmask) wait_a_while_for_zero_mask(); if (!zero_mask) goto again; --- arch/x86/include/asm/kvm_para.h |3 +- arch/x86/include/asm/tlbflush.h |9 ++ arch/x86/kernel/kvm.c |1 + arch/x86/kvm/x86.c | 14 - arch/x86/mm/tlb.c | 61 +++ 5 files changed, 86 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h index f57b5cc..684a285 100644 --- a/arch/x86/include/asm/kvm_para.h +++ b/arch/x86/include/asm/kvm_para.h @@ -55,7 +55,8 @@ struct kvm_steal_time { struct kvm_vcpu_state { __u32 state; - __u32 pad[15]; + __u32 flush_on_enter; + __u32 pad[14]; }; #define KVM_VCPU_STATE_ALIGN_BITS 5 diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h index c0e108e..29470bd 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -119,6 +119,12 @@ static inline void native_flush_tlb_others(const struct cpumask *cpumask, { } +static inline void kvm_flush_tlb_others(const struct cpumask *cpumask, + struct mm_struct *mm, + unsigned long va) +{ +} + static inline void reset_lazy_tlbstate(void) { } @@ -145,6 +151,9 @@ static inline void flush_tlb_range(struct vm_area_struct *vma, void native_flush_tlb_others(const struct cpumask *cpumask, struct mm_struct *mm, unsigned long va); +void kvm_flush_tlb_others(const struct cpumask *cpumask, + struct mm_struct *mm, unsigned long va); + #define TLBSTATE_OK 1 #define TLBSTATE_LAZY2 diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c index bb686a6..66db54e 100644 --- a/arch/x86/kernel/kvm.c +++ b/arch/x86/kernel/kvm.c @@ -465,6 +465,7 @@ void __init kvm_guest_init(void) }
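The arch/x86/kvm/x86.c hunk is cut off in the quote above. Going by the
pseudo-algo and the flush_on_enter snippet discussed elsewhere in the thread,
the host's entry-side handling presumably looks something like the sketch
below; the helper name, the ghc/vs variables and the write-back behaviour are
assumptions for illustration, not the patch's actual code.

    #include <linux/kvm_host.h>

    /*
     * Illustrative sketch of the vcpu-entry handling implied by the
     * pseudo-algo: mark the vcpu as running, then honour any deferred
     * flush request the guest left in the shared kvm_vcpu_state area.
     * 'ghc' caches the guest address of the registered kvm_vcpu_state,
     * 'vs' is a host-side copy of its first two words.
     */
    static void kvm_vcpu_state_enter_guest(struct kvm_vcpu *vcpu,
    				       struct gfn_to_hva_cache *ghc,
    				       struct kvm_vcpu_state *vs)
    {
    	vs->state = 1;					/* running[i] = 1 */
    	kvm_write_guest_cached(vcpu->kvm, ghc, vs, 2 * sizeof(__u32));
    	smp_mb();

    	/* Guest might have seen us offline and set flush_on_enter. */
    	kvm_read_guest_cached(vcpu->kvm, ghc, vs, 2 * sizeof(__u32));
    	if (vs->flush_on_enter) {
    		kvm_x86_ops->tlb_flush(vcpu);		/* full TLB flush */
    		vs->flush_on_enter = 0;
    		kvm_write_guest_cached(vcpu->kvm, ghc, vs, 2 * sizeof(__u32));
    	}
    }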
Re: [PATCH v2 3/7] KVM: Add paravirt kvm_flush_tlb_others
On Tue, 3 Jul 2012 04:55:35 -0300, Marcelo Tosatti mtosa...@redhat.com wrote: if (!zero_mask) goto again; Can you please measure increased vmentry/vmexit overhead? x86/vmexit.c of git://git.kernel.org/pub/scm/virt/kvm/kvm-unit-tests.git should help. Sure will get back with the result. + /* +* Guest might have seen us offline and would have set +* flush_on_enter. +*/ + kvm_read_guest_cached(vcpu-kvm, ghc, vs, 2*sizeof(__u32)); + if (vs-flush_on_enter) + kvm_x86_ops-tlb_flush(vcpu); So flush_tlb_page which was an invlpg now flushes the entire TLB. Did you take that into account? When the vcpu is sleeping/pre-empted out, multiple request for flush_tlb could have happened. And now when we are here, it is cleaning up all the TLB. One other approach would be to queue the addresses, that brings us with the question: how many request to queue? This would require us adding more syncronization between guest and host for updating the area where these addresses is shared. +again: + for_each_cpu(cpu, to_cpumask(f-flush_cpumask)) { + v_state = per_cpu(vcpu_state, cpu); + + if (!v_state-state) { Should use ACCESS_ONCE to make sure the value is not register cached. \ + v_state-flush_on_enter = 1; + smp_mb(); + if (!v_state-state) And here. Sure will add this check for both in my next version. + cpumask_clear_cpu(cpu, to_cpumask(f-flush_cpumask)); + } + } + + if (cpumask_empty(to_cpumask(f-flush_cpumask))) + goto out; + + apic-send_IPI_mask(to_cpumask(f-flush_cpumask), + INVALIDATE_TLB_VECTOR_START + sender); + + loop = 1000; + while (!cpumask_empty(to_cpumask(f-flush_cpumask)) --loop) + cpu_relax(); + + if (!cpumask_empty(to_cpumask(f-flush_cpumask))) + goto again; Is this necessary in addition to the in-guest-mode/out-guest-mode detection? If so, why? The case 3 where we initially saw the vcpu was running, and a flush ipi is send to the vcpu. During this time the vcpu might be pre-empted, so we come out of the loop=1000 with !empty flushmask. We then re-verify the flushmask against the current running vcpu and make sure that the vcpu that was pre-empted is un-marked and we can proceed out of the kvm_flush_tlb_others_ipi without waiting for sleeping/pre-empted vcpus. Regards Nikunj -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 3/7] KVM: Add paravirt kvm_flush_tlb_others
On Tue, 3 Jul 2012 05:11:35 -0300, Marcelo Tosatti mtosa...@redhat.com wrote:
> On Mon, Jun 04, 2012 at 10:37:24AM +0530, Nikunj A. Dadhania wrote:
> >  arch/x86/include/asm/kvm_para.h |    3 +-
> >  arch/x86/include/asm/tlbflush.h |    9 ++
> >  arch/x86/kernel/kvm.c           |    1 +
> >  arch/x86/kvm/x86.c              |   14 -
> >  arch/x86/mm/tlb.c               |   61 +++
> >  5 files changed, 86 insertions(+), 2 deletions(-)
>
> [...]
>
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 264f172..4714a7b 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
>
> Please split guest/host (arch/x86/kernel/kvm.c etc VS arch/x86/kvm/)
> patches.

Ok

> Please document guest/host interface
> (Documentation/virtual/kvm/paravirt-tlb-flush.txt, add a pointer to it
> from msr.txt).

Sure.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 3/7] KVM: Add paravirt kvm_flush_tlb_others
On Tue, Jun 19, 2012 at 11:41:54AM +0530, Nikunj A Dadhania wrote:
> Hi Marcelo,
>
> Thanks for the review.
>
> On Tue, 12 Jun 2012 20:02:18 -0300, Marcelo Tosatti mtosa...@redhat.com wrote:
> > On Mon, Jun 04, 2012 at 10:37:24AM +0530, Nikunj A. Dadhania wrote:
> > > flush_tlb_others_ipi depends on a lot of statics in tlb.c. Replicated
> > > flush_tlb_others_ipi as kvm_flush_tlb_others to further adapt it to
> > > paravirtualization.
> > >
> > > Use the vcpu state information inside kvm_flush_tlb_others to avoid
> > > sending an IPI to pre-empted vcpus.
> > >
> > > * Do not send IPIs to offline vcpus; set the flush_on_enter flag instead
> > > * For online vcpus: wait for them to clear the flag
> > >
> > > The approach was discussed here: https://lkml.org/lkml/2012/2/20/157
> > >
> > > Suggested-by: Peter Zijlstra a.p.zijls...@chello.nl
> > > Signed-off-by: Nikunj A. Dadhania nik...@linux.vnet.ibm.com
> >
> > Why not reintroduce the hypercall to flush TLBs?
>
> Sure, I will get a version with this.
>
> > No waiting, no entry/exit trickery.
>
> Are you also suggesting to get rid of the vcpu-running information?

Yes.

> We would at least need this to raise a flushTLB hypercall on the
> sleeping vcpu.

No, you always raise a flush TLB hypercall on flush_tlb_others, like Xen
does. Instead of an MMIO exit to APIC to IPI, it's a hypercall exit.

> > This is half-way-there paravirt with all the downsides.
>
> Some more details on what the downsides are would help us reach a
> better solution.

The downside I refer to is the overhead to write, test and maintain
separate code for running as a guest. If you are adding awareness about
the hypervisor anyway, a hypercall is the simplest way:

- It is simple to request a remote TLB flush via vcpu->requests in the
  hypervisor. Information about which vcpus are in/out of guest mode is
  already available.

- Maintenance of in/out guest mode information in guest memory adds more
  overhead to the entry/exit paths.

- No need to handle the interprocessor synchronization in the pseudo
  algo below (already handled by the vcpu->requests mechanism).

- The guest TLB IPI flow is (target vcpu):
  - exit-due-to-external-interrupt.
  - inject-ipi.
  - enter guest mode.
  - execute IPI handler, flush TLB.
  - ACK IPI.
  - source vcpu continues.

- The hypervisor TLB flush flow is:
  - exit-due-to-external-interrupt.
  - execute IPI handler (in host, from make_all_cpus_request).
  - ACK IPI.
  - source vcpu continues.

Unless there is an advantage in using an APIC IPI exit vs a hypercall exit?

> > Even though the guest running information might be useful in other
> > cases.
>
> Yes, that was one of the things on the mind.
>
> > > Pseudo Algo:
> > >
> > >    Write()
> > >    =======
> > >    guest_exit()
> > >        flush_on_enter[i] = 0;
> > >        running[i] = 0;
> > >
> > >    guest_enter()
> > >        running[i] = 1;
> > >        smp_mb();
> > >        if (flush_on_enter[i]) {
> > >            tlb_flush()
> > >            flush_on_enter[i] = 0;
> > >        }
> > >
> > >    Read()
> > >    ======
> > >    GUEST                                   KVM-HV
> > >
> > >    f->flushcpumask = cpumask - me;
> > >
> > > again:
> > >    for_each_cpu(i, f->flushmask) {
> > >
> > >        if (!running[i]) {
> > >                                            case 1:
> > >                                            running[n] = 1
> > >                                            (cpuN does not see flush_on_enter
> > >                                             set, guest later finds it running
> > >                                             and sends ipi, we are fine here,
> > >                                             need to clear the flag on
> > >                                             guest_exit)
> > >
> > >            flush_on_enter[i] = 1;
> > >                                            case 2:
> > >                                            running[n] = 1
> > >                                            (cpuN will see flush_on_enter and
> > >                                             an IPI as well - addressed in
> > >                                             patch-4)
> > >
> > >            if (!running[i])
> > >                cpu_clear(f->flushmask);    All is well, vm_enter will do
> > >                                            the fixup
> > >        }
> > >                                            case 3:
> > >                                            running[n] = 0
> > >                                            (cpuN went to sleep, we saw it as
> > >                                             awake, ipi sent, but wait will
> > >                                             break without zero_mask and goto
> > >                                             again will take care)
> > >    }
> > >    send_ipi(f->flushmask)
> > >
> > >    wait_a_while_for_zero_mask();
> > >
> > >    if (!zero_mask)
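For comparison, a rough sketch of the hypercall-based flow being argued for
here: the guest replaces the flush IPI with a single hypercall, and the host
reuses the existing request machinery, so no new guest-visible running state
is needed. KVM_HC_FLUSH_TLB and vcpu_is_targeted() are made up for this
sketch, and the cpumask hand-off is simplified to passing a physical address.

    #include <linux/kvm_host.h>
    #include <asm/kvm_para.h>

    /* Guest side (sketch): flush_tlb_others becomes a single hypercall. */
    static void kvm_flush_tlb_others_hc(const struct cpumask *cpumask,
    				    struct mm_struct *mm, unsigned long va)
    {
    	/* KVM_HC_FLUSH_TLB is hypothetical; real code would also have to
    	 * agree with the host on how the cpumask is passed (e.g. a GPA). */
    	kvm_hypercall1(KVM_HC_FLUSH_TLB, __pa(cpumask_bits(cpumask)));
    }

    /* Host side (sketch): called from the hypercall dispatcher. */
    static int handle_flush_tlb_hc(struct kvm *kvm, unsigned long mask_gpa)
    {
    	struct kvm_vcpu *vcpu;
    	int i;

    	/* Queue a TLB flush for each targeted vcpu: vcpus in guest mode
    	 * are kicked out, pre-empted vcpus see the request on their next
    	 * entry - the effect flush_on_enter was emulating by hand. */
    	kvm_for_each_vcpu(i, vcpu, kvm) {
    		if (vcpu_is_targeted(vcpu, mask_gpa)) {	/* placeholder check */
    			kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
    			kvm_vcpu_kick(vcpu);
    		}
    	}
    	return 0;
    }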
Re: [PATCH v2 3/7] KVM: Add paravirt kvm_flush_tlb_others
Hi Marcelo, Thanks for the review. On Tue, 12 Jun 2012 20:02:18 -0300, Marcelo Tosatti mtosa...@redhat.com wrote: On Mon, Jun 04, 2012 at 10:37:24AM +0530, Nikunj A. Dadhania wrote: flush_tlb_others_ipi depends on lot of statics in tlb.c. Replicated the flush_tlb_others_ipi as kvm_flush_tlb_others to further adapt to paravirtualization. Use the vcpu state information inside the kvm_flush_tlb_others to avoid sending ipi to pre-empted vcpus. * Do not send ipi's to offline vcpus and set flush_on_enter flag * For online vcpus: Wait for them to clear the flag The approach was discussed here: https://lkml.org/lkml/2012/2/20/157 Suggested-by: Peter Zijlstra a.p.zijls...@chello.nl Signed-off-by: Nikunj A. Dadhania nik...@linux.vnet.ibm.com Why not reintroduce the hypercall to flush TLBs? Sure, I will get a version with this. No waiting, no entry/exit trickery. Are you also suggesting to get rid of vcpu-running information? We would atleast need this to raise a flushTLB hypercall on the sleeping vcpu. This is half-way-there paravirt with all the downsides. Some more details on what are the downsides would help us reach to a better solution. Even though the guest running information might be useful in other cases. Yes, that was one of the things on the mind. Pseudo Algo: Write() == guest_exit() flush_on_enter[i]=0; running[i] = 0; guest_enter() running[i] = 1; smp_mb(); if(flush_on_enter[i]) { tlb_flush() flush_on_enter[i]=0; } Read() == GUESTKVM-HV f-flushcpumask = cpumask - me; again: for_each_cpu(i, f-flushmask) { if (!running[i]) { case 1: running[n]=1 (cpuN does not see flush_on_enter set, guest later finds it running and sends ipi, we are fine here, need to clear the flag on guest_exit) flush_on_enter[i] = 1; case2: running[n]=1 (cpuN - will see flush on enter and an IPI as well - addressed in patch-4) if (!running[i]) cpu_clear(f-flushmask); All is well, vm_enter will do the fixup } case 3: running[n] = 0; (cpuN went to sleep, we saw it as awake, ipi sent, but wait will break without zero_mask and goto again will take care) } send_ipi(f-flushmask) wait_a_while_for_zero_mask(); if (!zero_mask) goto again; --- arch/x86/include/asm/kvm_para.h |3 +- arch/x86/include/asm/tlbflush.h |9 ++ arch/x86/kernel/kvm.c |1 + arch/x86/kvm/x86.c | 14 - arch/x86/mm/tlb.c | 61 +++ 5 files changed, 86 insertions(+), 2 deletions(-) -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 3/7] KVM: Add paravirt kvm_flush_tlb_others
On Mon, Jun 04, 2012 at 10:37:24AM +0530, Nikunj A. Dadhania wrote: flush_tlb_others_ipi depends on lot of statics in tlb.c. Replicated the flush_tlb_others_ipi as kvm_flush_tlb_others to further adapt to paravirtualization. Use the vcpu state information inside the kvm_flush_tlb_others to avoid sending ipi to pre-empted vcpus. * Do not send ipi's to offline vcpus and set flush_on_enter flag * For online vcpus: Wait for them to clear the flag The approach was discussed here: https://lkml.org/lkml/2012/2/20/157 Suggested-by: Peter Zijlstra a.p.zijls...@chello.nl Signed-off-by: Nikunj A. Dadhania nik...@linux.vnet.ibm.com Why not reintroduce the hypercall to flush TLBs? No waiting, no entry/exit trickery. This is half-way-there paravirt with all the downsides. Even though the guest running information might be useful in other cases. Pseudo Algo: Write() == guest_exit() flush_on_enter[i]=0; running[i] = 0; guest_enter() running[i] = 1; smp_mb(); if(flush_on_enter[i]) { tlb_flush() flush_on_enter[i]=0; } Read() == GUESTKVM-HV f-flushcpumask = cpumask - me; again: for_each_cpu(i, f-flushmask) { if (!running[i]) { case 1: running[n]=1 (cpuN does not see flush_on_enter set, guest later finds it running and sends ipi, we are fine here, need to clear the flag on guest_exit) flush_on_enter[i] = 1; case2: running[n]=1 (cpuN - will see flush on enter and an IPI as well - addressed in patch-4) if (!running[i]) cpu_clear(f-flushmask); All is well, vm_enter will do the fixup } case 3: running[n] = 0; (cpuN went to sleep, we saw it as awake, ipi sent, but wait will break without zero_mask and goto again will take care) } send_ipi(f-flushmask) wait_a_while_for_zero_mask(); if (!zero_mask) goto again; --- arch/x86/include/asm/kvm_para.h |3 +- arch/x86/include/asm/tlbflush.h |9 ++ arch/x86/kernel/kvm.c |1 + arch/x86/kvm/x86.c | 14 - arch/x86/mm/tlb.c | 61 +++ 5 files changed, 86 insertions(+), 2 deletions(-) -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html