> On Jun 26, 2016, at 14:10, Boqun Feng <boqun.f...@gmail.com> wrote:
>
> On Sun, Jun 26, 2016 at 01:21:04PM +0800, panxinhui wrote:
>>
>>> On Jun 26, 2016, at 03:20, Peter Zijlstra <pet...@infradead.org> wrote:
>>>
>>> On Sun, Jun 26, 2016 at 01:27:56AM +0800, panxinhui wrote:
>>>>>> Would that not have issues where the owner cpu is kept running but the
>>>>>> spinner (ie. _this_ vcpu) gets preempted? I would think that in that
>>>>>> case we too want to stop spinning.
>>>>>>
>>>>>
>>>> Do you mean that the spinner detects it has yielded out during the
>>>> big spin loop?
>>>>
>>>> That is very possible to happen. BUT if the spinner (on this vcpu)
>>>> yields out, the next spinner will break out of the spin loop. AND if
>>>> the spinner detects it has yielded out once, it is very likely to get
>>>> the osq lock soon, as long as the owner vcpu is running.
>>>>
>>>> SO I think we just need to check the owner vcpu's yield_count.
>>>
>>> I had a quick look at KVM and it looks like it only has
>>> kvm_cpu::preempted, which would suggest the interface boqun proposed.
>>>
>>> We'll have to look at many of the other virt platforms as well to see
>>> what they can do.
>>>
>>> We could also provide _both_ interfaces and a platform can implement
>>> whichever variant (or both) it can.
>>>
>> The kvm code on ppc has implemented yield_count.
>> That makes me feel a little relaxed. :)
>>
>> Looks like we could introduce the interface like below:
>>
>> bool vcpu_is_preempted(int cpu)
>> {
>> 	return arch_vcpu_is_preempted(cpu);
>> }
>>
>> #ifdef arch_vcpu_has_yield_count
>> bool vcpu_has_preempted_once(int cpu, unsigned int yield_count)
>> {
>> 	return arch_get_vcpu_yield_count() != yield_count;
>> }
>>
>> #else
>> bool vcpu_has_preempted_once(int cpu, unsigned int yield_count)
>> {
>> 	/* just let the caller know it was preempted */
>> 	return vcpu_is_preempted(cpu);
>> }
>> #endif
>>
>
> Unfortunately, on PPC, we compile pseries code along with powernv code
> into one kernel binary, therefore we need to wire up the proper
> primitives at runtime.
>

No, we can use the same vcpu preempted check ops actually, just like
arch_spin_lock() does:
	if (SHARED_PROCESSOR)
		__spin_yield(lock);

So we can do:

u32 arch_get_vcpu_yield_count(int cpu)
{
	if (SHARED_PROCESSOR)
		return be32_to_cpu(lppaca_of(cpu).yield_count);
	return 0;
}

and all these things can be done inline then.

> I've cooked the following patch, please note I haven't tested this,
> this is just a POC.
>
> Regards,
> Boqun
>
> ---------------->8
> virt: Introduce vcpu preemption detection functions
>
> Lock holder preemption is a serious problem for locking in virtualized
> environments. Different archs and hypervisors employ different ways to
> try to solve the problem in different locking mechanisms. And recently
> Pan Xinhui found a significant performance drop in osq_lock(), which
> could be solved by providing a mechanism to detect the preemption of a
> vcpu.
>
> Therefore this patch introduces several vcpu preemption detection
> primitives, which allow locking primitives to save the time of
> unnecessary spinning. These primitives act as an abstraction layer
> between locking functions and arch or hypervisor related code.
>
> There are two sets of primitives:
>
> 1. vcpu_preempt_count() and vcpu_has_preempted(), which must be used
>    pairwise in the same preempt disable critical section, and which
>    detect whether a vcpu preemption happened between them.
>
> 2. vcpu_is_preempted(), used to check whether another cpu's vcpu is
>    preempted.
>
> This patch also implements those primitives on pseries and wires them
> up.
>
> Signed-off-by: Boqun Feng <boqun.f...@gmail.com>
> ---
>  arch/powerpc/platforms/pseries/Kconfig |  1 +
>  arch/powerpc/platforms/pseries/setup.c | 28 +++++++++++
>  include/linux/vcpu_preempt.h           | 90 ++++++++++++++++++++++++++++++++++
>  3 files changed, 119 insertions(+)
>  create mode 100644 include/linux/vcpu_preempt.h
>
> diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig
> index bec90fb30425..7d24c3e48878 100644
> --- a/arch/powerpc/platforms/pseries/Kconfig
> +++ b/arch/powerpc/platforms/pseries/Kconfig
> @@ -21,6 +21,7 @@ config PPC_PSERIES
>  	select HOTPLUG_CPU if SMP
>  	select ARCH_RANDOM
>  	select PPC_DOORBELL
> +	select HAS_VCPU_PREEMPTION_DETECTION
>  	default y
>

There are already too many configs there... As I commented above, we
could make these things inline, so it looks like we needn't add such a
config.
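Something like the below, maybe (an untested sketch, just to illustrate
the inline, config-free variant; the !SHARED_PROCESSOR path simply
reports "not preempted"):

static inline bool vcpu_is_preempted(int cpu)
{
	/*
	 * On shared-processor LPARs an odd yield_count means the vcpu
	 * is currently yielded/preempted, the same check as in
	 * pseries_vcpu_is_preempted() below.
	 */
	if (SHARED_PROCESSOR)
		return !!(be32_to_cpu(lppaca_of(cpu).yield_count) & 1);
	return false;
}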
>  config PPC_SPLPAR
> diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
> index 9883bc7ea007..5d4aed54e039 100644
> --- a/arch/powerpc/platforms/pseries/setup.c
> +++ b/arch/powerpc/platforms/pseries/setup.c
> @@ -42,6 +42,7 @@
>  #include <linux/of.h>
>  #include <linux/of_pci.h>
>  #include <linux/kexec.h>
> +#include <linux/vcpu_preempt.h>
>
>  #include <asm/mmu.h>
>  #include <asm/processor.h>
> @@ -501,6 +502,31 @@ static void __init find_and_init_phbs(void)
>  	of_pci_check_probe_only();
>  }
>
> +struct vcpu_preempt_ops vcpu_preempt_ops = DEFAULT_VCPU_PREEMPT_OPS;
> +EXPORT_SYMBOL(vcpu_preempt_ops);
> +
> +static long pseries_vcpu_preempt_count(void)
> +{
> +	return be32_to_cpu(get_lppaca()->yield_count);
> +}
> +
> +static bool pseries_vcpu_is_preempted(int cpu)
> +{
> +	return !!(be32_to_cpu(lppaca_of(cpu).yield_count) & 1);
> +}
> +
> +static bool pseries_vcpu_has_preempted(long vpc)
> +{
> +	return pseries_vcpu_preempt_count() != vpc;
> +}
> +
> +static void __init pseries_setup_vcpu_preempt_ops(void)
> +{
> +	vcpu_preempt_ops.preempt_count = pseries_vcpu_preempt_count;
> +	vcpu_preempt_ops.is_preempted = pseries_vcpu_is_preempted;
> +	vcpu_preempt_ops.has_preempted = pseries_vcpu_has_preempted;
> +}
> +
>  static void __init pSeries_setup_arch(void)
>  {
>  	set_arch_panic_timeout(10, ARCH_PANIC_TIMEOUT);
> @@ -549,6 +575,8 @@ static void __init pSeries_setup_arch(void)
>  				"%ld\n", rc);
>  		}
>  	}
> +
> +	pseries_setup_vcpu_preempt_ops();
>  }
>
>  static int __init pSeries_init_panel(void)
> diff --git a/include/linux/vcpu_preempt.h b/include/linux/vcpu_preempt.h
> new file mode 100644
> index 000000000000..4a88414bb83f
> --- /dev/null
> +++ b/include/linux/vcpu_preempt.h

How about adding them to sched.h? I just think they are similar.
Correct me if I made a mistake. :)

thanks
xinhui

> @@ -0,0 +1,90 @@
> +/*
> + * Primitives for checking vcpu preemption from the guest.
> + */
> +
> +static long __vcpu_preempt_count(void)
> +{
> +	return 0;
> +}
> +
> +static bool __vcpu_has_preempted(long vpc)
> +{
> +	return false;
> +}
> +
> +static bool __vcpu_is_preempted(int cpu)
> +{
> +	return false;
> +}
> +
> +struct vcpu_preempt_ops {
> +	/*
> +	 * Get the current vcpu's "preempt count", which is going to be
> +	 * used for checking whether the current vcpu has ever been
> +	 * preempted.
> +	 */
> +	long (*preempt_count)(void);
> +
> +	/*
> +	 * Return whether a vcpu is preempted.
> +	 */
> +	bool (*is_preempted)(int cpu);
> +
> +	/*
> +	 * Given a "vcpu preempt count", return whether a vcpu preemption
> +	 * ever happened after .preempt_count() was called.
> +	 */
> +	bool (*has_preempted)(long vpc);
> +};
> +
> +struct vcpu_preempt_ops vcpu_preempt_ops;
> +
> +/* Default boilerplate */
> +#define DEFAULT_VCPU_PREEMPT_OPS			\
> +	{						\
> +		.preempt_count = __vcpu_preempt_count,	\
> +		.is_preempted = __vcpu_is_preempted,	\
> +		.has_preempted = __vcpu_has_preempted	\
> +	}
> +
> +#ifdef CONFIG_HAS_VCPU_PREEMPTION_DETECTION
> +/*
> + * vcpu_preempt_count: Get the current cpu's "vcpu preempt count" (vpc).
> + *
> + * The vpc is used for checking whether the current vcpu has ever been
> + * preempted via vcpu_has_preempted().
> + *
> + * This function and vcpu_has_preempted() should be called in the same
> + * preemption disabled critical section.
> + */
> +static long vcpu_preempt_count(void)
> +{
> +	return vcpu_preempt_ops.preempt_count();
> +}
> +
> +/*
> + * vcpu_is_preempted: Check whether @cpu's vcpu is preempted.
> + */
> +static bool vcpu_is_preempted(int cpu)
> +{
> +	return vcpu_preempt_ops.is_preempted(cpu);
> +}
> +
> +/*
> + * vcpu_has_preempted: Check whether the current cpu's vcpu has ever
> + * been preempted.
> + *
> + * The checked duration is between the call to vcpu_preempt_count(),
> + * which returned @vpc, and the call to this function.
> + *
> + * This function and the corresponding vcpu_preempt_count() should be
> + * in the same preemption disabled critical section.
> + */
> +static bool vcpu_has_preempted(long vpc)
> +{
> +	return vcpu_preempt_ops.has_preempted(vpc);
> +}
> +#else /* CONFIG_HAS_VCPU_PREEMPTION_DETECTION */
> +#define vcpu_preempt_count()	__vcpu_preempt_count()
> +#define vcpu_is_preempted(cpu)	__vcpu_is_preempted(cpu)
> +#define vcpu_has_preempted(vpc)	__vcpu_has_preempted(vpc)
> +#endif /* CONFIG_HAS_VCPU_PREEMPTION_DETECTION */
> --
> 2.9.0
>
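One more thought: for the osq_lock() case, I imagine the pairwise
primitives being used roughly like below (an untested sketch of the
spin-wait loop; node_cpu() here is a made-up helper that maps the
previous node back to its cpu number):

	long vpc = vcpu_preempt_count();

	while (!READ_ONCE(node->locked)) {
		/*
		 * Stop spinning if the owner's vcpu is preempted, or if
		 * our own vcpu got preempted since we sampled @vpc; in
		 * either case further spinning likely wastes cycles.
		 */
		if (need_resched() ||
		    vcpu_is_preempted(node_cpu(node->prev)) ||
		    vcpu_has_preempted(vpc))
			goto unqueue;

		cpu_relax_lowlatency();
	}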