> On Jun 26, 2016, at 22:11, Boqun Feng <boqun.f...@gmail.com> wrote:
> 
> On Sun, Jun 26, 2016 at 02:58:00PM +0800, panxinhui wrote:
> [snip]
>>>> 
>>> 
>>> Unfortunately, on PPC, we compile pseries code along with powernv code
>>> in a kernel binary, therefore we need to wire the proper primitives at
>>> runtime.
>>> 
>> no, we can use the same vcpu preempted check ops actually.
>> just like how arch_spin_lock() does:
>> 
>>     if (SHARED_PROCESSOR)
>>         __spin_yield(lock);
>> 
>> so we can do
>> 
>>     u32 arch_get_vcpu_yield_count(int cpu)
>>     {
>>         if (SHARED_PROCESSOR)
> 
> SHARED_PROCESSOR is true only if a PPC_PSERIES kernel runs in shared
> processor mode. Using this here will rule out the situation where a
> PPC_PSERIES kernel (a ppc guest) runs in dedicated processor mode. I don't
> think we can rule out the dedicated processor mode, because even in the

well, copied from the code:
/*
 * We are using a non architected field to determine if a partition is
 * shared or dedicated. This currently works on both KVM and PHYP, but
 * we will have to transition to something better.
 */
#define LPPACA_OLD_SHARED_PROC  2

static inline bool lppaca_shared_proc(struct lppaca *l)
{
        return !!(l->__old_status & LPPACA_OLD_SHARED_PROC);
}

Okay, I have no idea about the difference between dedicated processor
mode and the so-called shared mode. Let me talk to you f2f on Monday…

> dedicated mode, the lock holder may still be preempted. So why? Why do
> we only want the yield_count in the shared processor mode?
> 
>>             return be32_to_cpu(lppaca_of(cpu).yield_count);
>>         return 0;
>>     }
>> 
>> and all these things can be done inline then.
>> 
>>> I've cooked the following patch, please note I haven't tested this,
>>> this is just a POC.
>>> 
>>> Regards,
>>> Boqun
>>> 
>>> ---------------->8
>>> virt: Introduce vcpu preemption detection functions
>>> 
>>> Lock holder preemption is a serious problem for locking in virtualized
>>> environments. Different archs and hypervisors employ different ways to
>>> try to solve the problem in different locking mechanisms. And recently
>>> Pan Xinhui found a significant performance drop in osq_lock(), which
>>> could be solved by providing a mechanism to detect the preemption of a
>>> vcpu.
>>> 
>>> Therefore this patch introduces several vcpu preemption detection
>>> primitives, which allow locking primitives to save the time of
>>> unnecessary spinning. These primitives act as an abstraction layer
>>> between locking functions and arch- or hypervisor-related code.
>>> 
>>> There are two sets of primitives:
>>> 
>>> 1. vcpu_preempt_count() and vcpu_has_preempted(): they must be used
>>>    as a pair in the same preempt-disable critical section, and they
>>>    detect whether a vcpu preemption happened between them.
>>> 
>>> 2. vcpu_is_preempted(): checks whether another cpu's vcpu is
>>>    currently preempted.
>>> 
>>> This patch also implements those primitives on pseries and wires them up.
>>> 
>>> Signed-off-by: Boqun Feng <boqun.f...@gmail.com>
>>> ---
>>> arch/powerpc/platforms/pseries/Kconfig |  1 +
>>> arch/powerpc/platforms/pseries/setup.c | 28 +++++++++++
>>> include/linux/vcpu_preempt.h           | 90 ++++++++++++++++++++++++++++++++++
>>> 3 files changed, 119 insertions(+)
>>> create mode 100644 include/linux/vcpu_preempt.h
>>> 
>>> diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig
>>> index bec90fb30425..7d24c3e48878 100644
>>> --- a/arch/powerpc/platforms/pseries/Kconfig
>>> +++ b/arch/powerpc/platforms/pseries/Kconfig
>>> @@ -21,6 +21,7 @@ config PPC_PSERIES
>>>     select HOTPLUG_CPU if SMP
>>>     select ARCH_RANDOM
>>>     select PPC_DOORBELL
>>> +   select HAS_VCPU_PREEMPTION_DETECTION
>>>     default y
>>> 
>> there are already too many configs in there…
> 
> Why is this a problem? It's a Kconfig file, there are no real
> readability issues here. It simply serves as a database for the
> relationships between different config options. menuconfig and its

it's just a personal preference… I also somewhat dislike callback ops
when they are not necessary to use, as they break my reading of the
code :- And for the same reason, I don't want to vim .config and then
/any config_

anyway, this is not a big problem…

> friends can do their job well. BTW, have you read config X86? ;-)
> 
>> as I commented above, we could make these things inline, so it looks
>> like we needn't add such a config.
>> 
> 
> Another reason I didn't make those primitives arch-specific inline
> functions is that they are not only arch-specific but also
> (virtual-)environment-specific, or hypervisor-specific; that is, their
> implementations differ when the kernel runs on different hypervisors
> (KVM and Xen, for example).
> 
> If you make the implementation part inline functions, you end up
> handling every possible virtual environment each time the functions
> are called, which is a waste of time.
> We are lucky because phyp and kvm

well, the indirect call is also expensive.

> share the same interface of yield count, but this might not be true for
> other archs and other hypervisors.
> 

fair enough.

>>> config PPC_SPLPAR
>>> diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
>>> index 9883bc7ea007..5d4aed54e039 100644
>>> --- a/arch/powerpc/platforms/pseries/setup.c
>>> +++ b/arch/powerpc/platforms/pseries/setup.c
>>> @@ -42,6 +42,7 @@
>>> #include <linux/of.h>
>>> #include <linux/of_pci.h>
>>> #include <linux/kexec.h>
>>> +#include <linux/vcpu_preempt.h>
>>> 
>>> #include <asm/mmu.h>
>>> #include <asm/processor.h>
>>> @@ -501,6 +502,31 @@ static void __init find_and_init_phbs(void)
>>>     of_pci_check_probe_only();
>>> }
>>> 
>>> +struct vcpu_preempt_ops vcpu_preempt_ops = DEFAULT_VCPU_PREEMPT_OPS;
>>> +EXPORT_SYMBOL(vcpu_preempt_ops);
>>> +
>>> +static long pseries_vcpu_preempt_count(void)
>>> +{
>>> +   return be32_to_cpu(get_lppaca()->yield_count);
>>> +}
>>> +
>>> +static bool pseries_vcpu_is_preempted(int cpu)
>>> +{
>>> +   return !!(be32_to_cpu(lppaca_of(cpu).yield_count) & 1);
>>> +}
>>> +
>>> +static bool pseries_vcpu_has_preempted(long vpc)
>>> +{
>>> +   return pseries_vcpu_preempt_count() != vpc;
>>> +}
>>> +
>>> +static void __init pseries_setup_vcpu_preempt_ops(void)
>>> +{
>>> +   vcpu_preempt_ops.preempt_count = pseries_vcpu_preempt_count;
>>> +   vcpu_preempt_ops.is_preempted = pseries_vcpu_is_preempted;
>>> +   vcpu_preempt_ops.has_preempted = pseries_vcpu_has_preempted;
>>> +}
>>> +
>>> static void __init pSeries_setup_arch(void)
>>> {
>>>     set_arch_panic_timeout(10, ARCH_PANIC_TIMEOUT);
>>> @@ -549,6 +575,8 @@ static void __init pSeries_setup_arch(void)
>>>                 "%ld\n", rc);
>>>         }
>>>     }
>>> +
>>> +   pseries_setup_vcpu_preempt_ops();
>>> }
>>> 
>>> static int __init pSeries_init_panel(void)
>>> diff --git a/include/linux/vcpu_preempt.h b/include/linux/vcpu_preempt.h
>>> new file mode 100644
>>> index 000000000000..4a88414bb83f
>>> --- /dev/null
>>> +++ b/include/linux/vcpu_preempt.h
>> 
>> How about adding them in sched.h?
> 
> I don't have a strong opinion on which header file to use. TBH, I was
> trying to find an existing header file for paravirt stuff, but failed
> to find one, so I added a new one, because I don't want people confused
> between preemption of vcpus and preemption of tasks. But once again, I
> don't have a strong opinion ;-)
> 
>> I just think they are similar. correct me if I made a mistake :)
>> 
>> thanks
>> xinhui
>> 
>>> @@ -0,0 +1,90 @@
>>> +/*
>>> + * Primitives for checking vcpu preemption from the guest.
>>> + */
>>> +
>>> +static long __vcpu_preempt_count(void)
>>> +{
>>> +   return 0;
>>> +}
>>> +
>>> +static bool __vcpu_has_preempted(long vpc)
>>> +{
>>> +   return false;
>>> +}
>>> +
>>> +static bool __vcpu_is_preempted(int cpu)
>>> +{
>>> +   return false;
>>> +}
>>> +
>>> +struct vcpu_preempt_ops {
>>> +   /*
>>> +    * Get the current vcpu's "preempt count", which is going to be
>>> +    * used for checking whether the current vcpu has ever been
>>> +    * preempted
>>> +    */
>>> +   long (*preempt_count)(void);
>>> +
>>> +   /*
>>> +    * Return whether a vcpu is preempted
>>> +    */
>>> +   bool (*is_preempted)(int cpu);
>>> +
>>> +   /*
>>> +    * Given a "vcpu preempt count", return whether a vcpu preemption
>>> +    * ever happened after the .preempt_count() was called.
>>> +    */
>>> +   bool (*has_preempted)(long vpc);
>>> +};
>>> +
>>> +struct vcpu_preempt_ops vcpu_preempt_ops;
>>> +
>>> +/* Default boilerplate */
>>> +#define DEFAULT_VCPU_PREEMPT_OPS \
>>> +   { \
>>> +           .preempt_count = __vcpu_preempt_count, \
>>> +           .is_preempted = __vcpu_is_preempted, \
>>> +           .has_preempted = __vcpu_has_preempted \
>>> +   }
>>> +
>>> +#ifdef CONFIG_HAS_VCPU_PREEMPTION_DETECTION
>>> +/*
>>> + * vcpu_preempt_count: Get the current cpu's "vcpu preempt count" (vpc).
>>> + *
>>> + * The vpc is used for checking whether the current vcpu has ever been
>>> + * preempted via vcpu_has_preempted().
>>> + *
>>> + * This function and vcpu_has_preempted() should be called in the same
>>> + * preemption-disabled critical section.
>>> + */
>>> +static long vcpu_preempt_count(void)
>>> +{
>>> +   return vcpu_preempt_ops.preempt_count();
>>> +}
>>> +
>>> +/*
>>> + * vcpu_is_preempted: Check whether @cpu's vcpu is preempted.
>>> + */
>>> +static bool vcpu_is_preempted(int cpu)
>>> +{
>>> +   return vcpu_preempt_ops.is_preempted(cpu);
>>> +}
>>> +
>>> +/*
>>> + * vcpu_has_preempted: Check whether the current cpu's vcpu has ever
>>> + * been preempted.
>>> + *
>>> + * The checked duration is between the call to vcpu_preempt_count()
>>> + * which returned @vpc and the call to this function.
>>> + *
>>> + * This function and the corresponding vcpu_preempt_count() should be
>>> + * in the same preemption-disabled critical section.
>>> + */
>>> +static bool vcpu_has_preempted(long vpc)
>>> +{
>>> +   return vcpu_preempt_ops.has_preempted(vpc);
>>> +}
>>> +#else /* CONFIG_HAS_VCPU_PREEMPTION_DETECTION */
>>> +#define vcpu_preempt_count()       __vcpu_preempt_count()
>>> +#define vcpu_is_preempted(cpu)     __vcpu_is_preempted(cpu)
>>> +#define vcpu_has_preempted(vpc)    __vcpu_has_preempted(vpc)
>>> +#endif /* CONFIG_HAS_VCPU_PREEMPTION_DETECTION */
>>> -- 
>>> 2.9.0
>>> 
>> 