Re: [PATCH 4/9] x86/mm/tlb: Flush remote and local TLBs concurrently
On Tue, Jun 25, 2019 at 8:48 PM Nadav Amit wrote: > > > On Jun 25, 2019, at 8:36 PM, Andy Lutomirski wrote: > > > > On Wed, Jun 12, 2019 at 11:49 PM Nadav Amit wrote: > >> To improve TLB shootdown performance, flush the remote and local TLBs > >> concurrently. Introduce flush_tlb_multi() that does so. The current > >> flush_tlb_others() interface is kept, since paravirtual interfaces need > >> to be adapted first before it can be removed. This is left for future > >> work. In such PV environments, TLB flushes are not performed, at this > >> time, concurrently. > > > > Would it be straightforward to have a default PV flush_tlb_multi() > > that uses flush_tlb_others() under the hood? > > I prefer not to have a default PV implementation that should anyhow go away. > > I can create unoptimized untested versions for Xen and Hyper-V, if you want. > I think I prefer that approach. We should be able to get the maintainers to test it. I don't love having legacy paths in there, ahem, UV. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 4/9] x86/mm/tlb: Flush remote and local TLBs concurrently
> On Jun 25, 2019, at 8:36 PM, Andy Lutomirski wrote: > > On Wed, Jun 12, 2019 at 11:49 PM Nadav Amit wrote: >> To improve TLB shootdown performance, flush the remote and local TLBs >> concurrently. Introduce flush_tlb_multi() that does so. The current >> flush_tlb_others() interface is kept, since paravirtual interfaces need >> to be adapted first before it can be removed. This is left for future >> work. In such PV environments, TLB flushes are not performed, at this >> time, concurrently. > > Would it be straightforward to have a default PV flush_tlb_multi() > that uses flush_tlb_others() under the hood? I prefer not to have a default PV implementation that should anyhow go away. I can create unoptimized untested versions for Xen and Hyper-V, if you want. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 4/9] x86/mm/tlb: Flush remote and local TLBs concurrently
On Wed, Jun 12, 2019 at 11:49 PM Nadav Amit wrote: > > To improve TLB shootdown performance, flush the remote and local TLBs > concurrently. Introduce flush_tlb_multi() that does so. The current > flush_tlb_others() interface is kept, since paravirtual interfaces need > to be adapted first before it can be removed. This is left for future > work. In such PV environments, TLB flushes are not performed, at this > time, concurrently. Would it be straightforward to have a default PV flush_tlb_multi() that uses flush_tlb_others() under the hood? ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 4/9] x86/mm/tlb: Flush remote and local TLBs concurrently
> On Jun 25, 2019, at 8:00 PM, Dave Hansen wrote: > > On 6/25/19 7:35 PM, Nadav Amit wrote: const struct flush_tlb_info *f = info; + enum tlb_flush_reason reason; + + reason = (f->mm == NULL) ? TLB_LOCAL_SHOOTDOWN : TLB_LOCAL_MM_SHOOTDOWN; >>> >>> Should we just add the "reason" to flush_tlb_info? It's OK-ish to imply >>> it like this, but seems like it would be nicer and easier to track down >>> the origins of these things if we did this at the caller. >> >> I prefer not to. I want later to inline flush_tlb_info into the same >> cacheline that holds call_function_data. Increasing the size of >> flush_tlb_info for no good reason will not help… > > Well, flush_tlb_info is at 6/8ths of a cacheline at the moment. > call_function_data is 3/8ths. To me, that means we have some slack in > the size. I do not understand your math.. :( 6 + 3 > 8 so putting both flush_tlb_info and call_function_data does not leave us any slack (we can save one qword, so we can actually put them at the same cacheline). You can see my current implementation here: https://lore.kernel.org/lkml/20190531063645.4697-4-na...@vmware.com/T/#m0ab5fe0799ba9ff0d41197f1095679fe26aebd57 https://lore.kernel.org/lkml/20190531063645.4697-4-na...@vmware.com/T/#m7b35a93dffd23fbb7ca813c795a0777d4cdcb51b ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 4/9] x86/mm/tlb: Flush remote and local TLBs concurrently
On 6/25/19 7:35 PM, Nadav Amit wrote: >>> const struct flush_tlb_info *f = info; >>> + enum tlb_flush_reason reason; >>> + >>> + reason = (f->mm == NULL) ? TLB_LOCAL_SHOOTDOWN : TLB_LOCAL_MM_SHOOTDOWN; >> >> Should we just add the "reason" to flush_tlb_info? It's OK-ish to imply >> it like this, but seems like it would be nicer and easier to track down >> the origins of these things if we did this at the caller. > > I prefer not to. I want later to inline flush_tlb_info into the same > cacheline that holds call_function_data. Increasing the size of > flush_tlb_info for no good reason will not help… Well, flush_tlb_info is at 6/8ths of a cacheline at the moment. call_function_data is 3/8ths. To me, that means we have some slack in the size. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 4/9] x86/mm/tlb: Flush remote and local TLBs concurrently
> On Jun 25, 2019, at 2:29 PM, Dave Hansen wrote: > > On 6/12/19 11:48 PM, Nadav Amit wrote: >> To improve TLB shootdown performance, flush the remote and local TLBs >> concurrently. Introduce flush_tlb_multi() that does so. The current >> flush_tlb_others() interface is kept, since paravirtual interfaces need >> to be adapted first before it can be removed. This is left for future >> work. In such PV environments, TLB flushes are not performed, at this >> time, concurrently. >> >> Add a static key to tell whether this new interface is supported. >> >> Cc: "K. Y. Srinivasan" >> Cc: Haiyang Zhang >> Cc: Stephen Hemminger >> Cc: Sasha Levin >> Cc: Thomas Gleixner >> Cc: Ingo Molnar >> Cc: Borislav Petkov >> Cc: x...@kernel.org >> Cc: Juergen Gross >> Cc: Paolo Bonzini >> Cc: Dave Hansen >> Cc: Andy Lutomirski >> Cc: Peter Zijlstra >> Cc: Boris Ostrovsky >> Cc: linux-hyp...@vger.kernel.org >> Cc: linux-ker...@vger.kernel.org >> Cc: virtualization@lists.linux-foundation.org >> Cc: k...@vger.kernel.org >> Cc: xen-de...@lists.xenproject.org >> Signed-off-by: Nadav Amit >> --- >> arch/x86/hyperv/mmu.c | 2 + >> arch/x86/include/asm/paravirt.h | 8 +++ >> arch/x86/include/asm/paravirt_types.h | 6 +++ >> arch/x86/include/asm/tlbflush.h | 6 +++ >> arch/x86/kernel/kvm.c | 1 + >> arch/x86/kernel/paravirt.c| 3 ++ >> arch/x86/mm/tlb.c | 71 ++- >> arch/x86/xen/mmu_pv.c | 2 + >> 8 files changed, 87 insertions(+), 12 deletions(-) >> >> diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c >> index e65d7fe6489f..ca28b400c87c 100644 >> --- a/arch/x86/hyperv/mmu.c >> +++ b/arch/x86/hyperv/mmu.c >> @@ -233,4 +233,6 @@ void hyperv_setup_mmu_ops(void) >> pr_info("Using hypercall for remote TLB flush\n"); >> pv_ops.mmu.flush_tlb_others = hyperv_flush_tlb_others; >> pv_ops.mmu.tlb_remove_table = tlb_remove_table; >> + >> +static_key_disable(_tlb_multi_enabled.key); >> } >> diff --git a/arch/x86/include/asm/paravirt.h >> b/arch/x86/include/asm/paravirt.h >> index c25c38a05c1c..192be7254457 100644 >> --- a/arch/x86/include/asm/paravirt.h >> +++ b/arch/x86/include/asm/paravirt.h >> @@ -47,6 +47,8 @@ static inline void slow_down_io(void) >> #endif >> } >> >> +DECLARE_STATIC_KEY_TRUE(flush_tlb_multi_enabled); >> + >> static inline void __flush_tlb(void) >> { >> PVOP_VCALL0(mmu.flush_tlb_user); >> @@ -62,6 +64,12 @@ static inline void __flush_tlb_one_user(unsigned long >> addr) >> PVOP_VCALL1(mmu.flush_tlb_one_user, addr); >> } >> >> +static inline void flush_tlb_multi(const struct cpumask *cpumask, >> + const struct flush_tlb_info *info) >> +{ >> +PVOP_VCALL2(mmu.flush_tlb_multi, cpumask, info); >> +} >> + >> static inline void flush_tlb_others(const struct cpumask *cpumask, >> const struct flush_tlb_info *info) >> { >> diff --git a/arch/x86/include/asm/paravirt_types.h >> b/arch/x86/include/asm/paravirt_types.h >> index 946f8f1f1efc..b93b3d90729a 100644 >> --- a/arch/x86/include/asm/paravirt_types.h >> +++ b/arch/x86/include/asm/paravirt_types.h >> @@ -211,6 +211,12 @@ struct pv_mmu_ops { >> void (*flush_tlb_user)(void); >> void (*flush_tlb_kernel)(void); >> void (*flush_tlb_one_user)(unsigned long addr); >> +/* >> + * flush_tlb_multi() is the preferred interface, which is capable to >> + * flush both local and remote CPUs. >> + */ >> +void (*flush_tlb_multi)(const struct cpumask *cpus, >> +const struct flush_tlb_info *info); >> void (*flush_tlb_others)(const struct cpumask *cpus, >> const struct flush_tlb_info *info); >> >> diff --git a/arch/x86/include/asm/tlbflush.h >> b/arch/x86/include/asm/tlbflush.h >> index dee375831962..79272938cf79 100644 >> --- a/arch/x86/include/asm/tlbflush.h >> +++ b/arch/x86/include/asm/tlbflush.h >> @@ -569,6 +569,9 @@ static inline void flush_tlb_page(struct vm_area_struct >> *vma, unsigned long a) >> flush_tlb_mm_range(vma->vm_mm, a, a + PAGE_SIZE, PAGE_SHIFT, false); >> } >> >> +void native_flush_tlb_multi(const struct cpumask *cpumask, >> + const struct flush_tlb_info *info); >> + >> void native_flush_tlb_others(const struct cpumask *cpumask, >> const struct flush_tlb_info *info); >> >> @@ -593,6 +596,9 @@ static inline void arch_tlbbatch_add_mm(struct >> arch_tlbflush_unmap_batch *batch, >> extern void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch); >> >> #ifndef CONFIG_PARAVIRT >> +#define flush_tlb_multi(mask, info) \ >> +native_flush_tlb_multi(mask, info) >> + >> #define flush_tlb_others(mask, info) \ >> native_flush_tlb_others(mask, info) >> >> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c >> index 5169b8cc35bb..00d81e898717 100644 >> ---
Re: [PATCH 4/9] x86/mm/tlb: Flush remote and local TLBs concurrently
On 6/12/19 11:48 PM, Nadav Amit wrote: > To improve TLB shootdown performance, flush the remote and local TLBs > concurrently. Introduce flush_tlb_multi() that does so. The current > flush_tlb_others() interface is kept, since paravirtual interfaces need > to be adapted first before it can be removed. This is left for future > work. In such PV environments, TLB flushes are not performed, at this > time, concurrently. > > Add a static key to tell whether this new interface is supported. > > Cc: "K. Y. Srinivasan" > Cc: Haiyang Zhang > Cc: Stephen Hemminger > Cc: Sasha Levin > Cc: Thomas Gleixner > Cc: Ingo Molnar > Cc: Borislav Petkov > Cc: x...@kernel.org > Cc: Juergen Gross > Cc: Paolo Bonzini > Cc: Dave Hansen > Cc: Andy Lutomirski > Cc: Peter Zijlstra > Cc: Boris Ostrovsky > Cc: linux-hyp...@vger.kernel.org > Cc: linux-ker...@vger.kernel.org > Cc: virtualization@lists.linux-foundation.org > Cc: k...@vger.kernel.org > Cc: xen-de...@lists.xenproject.org > Signed-off-by: Nadav Amit > --- > arch/x86/hyperv/mmu.c | 2 + > arch/x86/include/asm/paravirt.h | 8 +++ > arch/x86/include/asm/paravirt_types.h | 6 +++ > arch/x86/include/asm/tlbflush.h | 6 +++ > arch/x86/kernel/kvm.c | 1 + > arch/x86/kernel/paravirt.c| 3 ++ > arch/x86/mm/tlb.c | 71 ++- > arch/x86/xen/mmu_pv.c | 2 + > 8 files changed, 87 insertions(+), 12 deletions(-) > > diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c > index e65d7fe6489f..ca28b400c87c 100644 > --- a/arch/x86/hyperv/mmu.c > +++ b/arch/x86/hyperv/mmu.c > @@ -233,4 +233,6 @@ void hyperv_setup_mmu_ops(void) > pr_info("Using hypercall for remote TLB flush\n"); > pv_ops.mmu.flush_tlb_others = hyperv_flush_tlb_others; > pv_ops.mmu.tlb_remove_table = tlb_remove_table; > + > + static_key_disable(_tlb_multi_enabled.key); > } > diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h > index c25c38a05c1c..192be7254457 100644 > --- a/arch/x86/include/asm/paravirt.h > +++ b/arch/x86/include/asm/paravirt.h > @@ -47,6 +47,8 @@ static inline void slow_down_io(void) > #endif > } > > +DECLARE_STATIC_KEY_TRUE(flush_tlb_multi_enabled); > + > static inline void __flush_tlb(void) > { > PVOP_VCALL0(mmu.flush_tlb_user); > @@ -62,6 +64,12 @@ static inline void __flush_tlb_one_user(unsigned long addr) > PVOP_VCALL1(mmu.flush_tlb_one_user, addr); > } > > +static inline void flush_tlb_multi(const struct cpumask *cpumask, > +const struct flush_tlb_info *info) > +{ > + PVOP_VCALL2(mmu.flush_tlb_multi, cpumask, info); > +} > + > static inline void flush_tlb_others(const struct cpumask *cpumask, > const struct flush_tlb_info *info) > { > diff --git a/arch/x86/include/asm/paravirt_types.h > b/arch/x86/include/asm/paravirt_types.h > index 946f8f1f1efc..b93b3d90729a 100644 > --- a/arch/x86/include/asm/paravirt_types.h > +++ b/arch/x86/include/asm/paravirt_types.h > @@ -211,6 +211,12 @@ struct pv_mmu_ops { > void (*flush_tlb_user)(void); > void (*flush_tlb_kernel)(void); > void (*flush_tlb_one_user)(unsigned long addr); > + /* > + * flush_tlb_multi() is the preferred interface, which is capable to > + * flush both local and remote CPUs. > + */ > + void (*flush_tlb_multi)(const struct cpumask *cpus, > + const struct flush_tlb_info *info); > void (*flush_tlb_others)(const struct cpumask *cpus, >const struct flush_tlb_info *info); > > diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h > index dee375831962..79272938cf79 100644 > --- a/arch/x86/include/asm/tlbflush.h > +++ b/arch/x86/include/asm/tlbflush.h > @@ -569,6 +569,9 @@ static inline void flush_tlb_page(struct vm_area_struct > *vma, unsigned long a) > flush_tlb_mm_range(vma->vm_mm, a, a + PAGE_SIZE, PAGE_SHIFT, false); > } > > +void native_flush_tlb_multi(const struct cpumask *cpumask, > + const struct flush_tlb_info *info); > + > void native_flush_tlb_others(const struct cpumask *cpumask, >const struct flush_tlb_info *info); > > @@ -593,6 +596,9 @@ static inline void arch_tlbbatch_add_mm(struct > arch_tlbflush_unmap_batch *batch, > extern void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch); > > #ifndef CONFIG_PARAVIRT > +#define flush_tlb_multi(mask, info) \ > + native_flush_tlb_multi(mask, info) > + > #define flush_tlb_others(mask, info) \ > native_flush_tlb_others(mask, info) > > diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c > index 5169b8cc35bb..00d81e898717 100644 > --- a/arch/x86/kernel/kvm.c > +++ b/arch/x86/kernel/kvm.c > @@ -630,6 +630,7 @@ static void __init kvm_guest_init(void) >
[PATCH 4/9] x86/mm/tlb: Flush remote and local TLBs concurrently
To improve TLB shootdown performance, flush the remote and local TLBs concurrently. Introduce flush_tlb_multi() that does so. The current flush_tlb_others() interface is kept, since paravirtual interfaces need to be adapted first before it can be removed. This is left for future work. In such PV environments, TLB flushes are not performed, at this time, concurrently. Add a static key to tell whether this new interface is supported. Cc: "K. Y. Srinivasan" Cc: Haiyang Zhang Cc: Stephen Hemminger Cc: Sasha Levin Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Borislav Petkov Cc: x...@kernel.org Cc: Juergen Gross Cc: Paolo Bonzini Cc: Dave Hansen Cc: Andy Lutomirski Cc: Peter Zijlstra Cc: Boris Ostrovsky Cc: linux-hyp...@vger.kernel.org Cc: linux-ker...@vger.kernel.org Cc: virtualization@lists.linux-foundation.org Cc: k...@vger.kernel.org Cc: xen-de...@lists.xenproject.org Signed-off-by: Nadav Amit --- arch/x86/hyperv/mmu.c | 2 + arch/x86/include/asm/paravirt.h | 8 +++ arch/x86/include/asm/paravirt_types.h | 6 +++ arch/x86/include/asm/tlbflush.h | 6 +++ arch/x86/kernel/kvm.c | 1 + arch/x86/kernel/paravirt.c| 3 ++ arch/x86/mm/tlb.c | 71 ++- arch/x86/xen/mmu_pv.c | 2 + 8 files changed, 87 insertions(+), 12 deletions(-) diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c index e65d7fe6489f..ca28b400c87c 100644 --- a/arch/x86/hyperv/mmu.c +++ b/arch/x86/hyperv/mmu.c @@ -233,4 +233,6 @@ void hyperv_setup_mmu_ops(void) pr_info("Using hypercall for remote TLB flush\n"); pv_ops.mmu.flush_tlb_others = hyperv_flush_tlb_others; pv_ops.mmu.tlb_remove_table = tlb_remove_table; + + static_key_disable(_tlb_multi_enabled.key); } diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h index c25c38a05c1c..192be7254457 100644 --- a/arch/x86/include/asm/paravirt.h +++ b/arch/x86/include/asm/paravirt.h @@ -47,6 +47,8 @@ static inline void slow_down_io(void) #endif } +DECLARE_STATIC_KEY_TRUE(flush_tlb_multi_enabled); + static inline void __flush_tlb(void) { PVOP_VCALL0(mmu.flush_tlb_user); @@ -62,6 +64,12 @@ static inline void __flush_tlb_one_user(unsigned long addr) PVOP_VCALL1(mmu.flush_tlb_one_user, addr); } +static inline void flush_tlb_multi(const struct cpumask *cpumask, + const struct flush_tlb_info *info) +{ + PVOP_VCALL2(mmu.flush_tlb_multi, cpumask, info); +} + static inline void flush_tlb_others(const struct cpumask *cpumask, const struct flush_tlb_info *info) { diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h index 946f8f1f1efc..b93b3d90729a 100644 --- a/arch/x86/include/asm/paravirt_types.h +++ b/arch/x86/include/asm/paravirt_types.h @@ -211,6 +211,12 @@ struct pv_mmu_ops { void (*flush_tlb_user)(void); void (*flush_tlb_kernel)(void); void (*flush_tlb_one_user)(unsigned long addr); + /* +* flush_tlb_multi() is the preferred interface, which is capable to +* flush both local and remote CPUs. +*/ + void (*flush_tlb_multi)(const struct cpumask *cpus, + const struct flush_tlb_info *info); void (*flush_tlb_others)(const struct cpumask *cpus, const struct flush_tlb_info *info); diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h index dee375831962..79272938cf79 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -569,6 +569,9 @@ static inline void flush_tlb_page(struct vm_area_struct *vma, unsigned long a) flush_tlb_mm_range(vma->vm_mm, a, a + PAGE_SIZE, PAGE_SHIFT, false); } +void native_flush_tlb_multi(const struct cpumask *cpumask, +const struct flush_tlb_info *info); + void native_flush_tlb_others(const struct cpumask *cpumask, const struct flush_tlb_info *info); @@ -593,6 +596,9 @@ static inline void arch_tlbbatch_add_mm(struct arch_tlbflush_unmap_batch *batch, extern void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch); #ifndef CONFIG_PARAVIRT +#define flush_tlb_multi(mask, info)\ + native_flush_tlb_multi(mask, info) + #define flush_tlb_others(mask, info) \ native_flush_tlb_others(mask, info) diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c index 5169b8cc35bb..00d81e898717 100644 --- a/arch/x86/kernel/kvm.c +++ b/arch/x86/kernel/kvm.c @@ -630,6 +630,7 @@ static void __init kvm_guest_init(void) kvm_para_has_feature(KVM_FEATURE_STEAL_TIME)) { pv_ops.mmu.flush_tlb_others = kvm_flush_tlb_others; pv_ops.mmu.tlb_remove_table = tlb_remove_table; + static_key_disable(_tlb_multi_enabled.key); } if