Re: [PATCH v13 10/11] pvqspinlock, x86: Enable PV qspinlock for KVM
On Wed, 29 Oct 2014, Waiman Long wrote:

> AIM7 XFS Disk Test (no overcommit)
>   kernel          JPM      Real Time  Sys Time  Usr Time
>   ------          ---      ---------  --------  --------
>   PV ticketlock   2542373    7.08       98.95     5.44
>   PV qspinlock    2549575    7.06       98.63     5.40
>   unfairlock      2616279    6.91       97.05     5.42
>
> AIM7 XFS Disk Test (200% overcommit)
>   kernel          JPM      Real Time  Sys Time  Usr Time
>   ------          ---      ---------  --------  --------
>   PV ticketlock    644468   27.93      415.22     6.33
>   PV qspinlock     645624   27.88      419.84     0.39

That number is made up by what?

>   unfairlock       695518   25.88      377.40     4.09
>
> AIM7 EXT4 Disk Test (no overcommit)
>   kernel          JPM      Real Time  Sys Time  Usr Time
>   ------          ---      ---------  --------  --------
>   PV ticketlock   1995565    9.02      103.67     5.76
>   PV qspinlock    2011173    8.95      102.15     5.40
>   unfairlock      2066590    8.71       98.13     5.46
>
> AIM7 EXT4 Disk Test (200% overcommit)
>   kernel          JPM      Real Time  Sys Time  Usr Time
>   ------          ---      ---------  --------  --------
>   PV ticketlock    478341   37.63      495.81    30.78
>   PV qspinlock     474058   37.97      475.74    30.95
>   unfairlock       560224   32.13      398.43    26.27
>
> For the AIM7 disk workload, both PV ticketlock and qspinlock have
> about the same performance. The unfairlock performs slightly better
> than the PV locks.

Slightly? Taking the PV locks, which are basically the same for the
existing ticket locks and your new fangled qlocks, as the reference,
the so called 'unfair locks', which are just the native locks w/o the
PV nonsense, are fundamentally better: up to a whopping 18% in the
ext4/200% overcommit case. See below.
> EBIZZY-m Test (no overcommit)
>   kernel          Rec/s   Real Time  Sys Time  Usr Time
>   ------          -----   ---------  --------  --------
>   PV ticketlock   3255     10.00      60.65     3.62
>   PV qspinlock    3318     10.00      54.27     3.60
>   unfairlock      2833     10.00      26.66     3.09
>
> EBIZZY-m Test (200% overcommit)
>   kernel          Rec/s   Real Time  Sys Time  Usr Time
>   ------          -----   ---------  --------  --------
>   PV ticketlock    841     10.00      71.03     2.37
>   PV qspinlock     834     10.00      68.27     2.39
>   unfairlock       865     10.00      27.08     1.51
>
> futextest (no overcommit)
>   kernel          kops/s
>   ------          ------
>   PV ticketlock   11523
>   PV qspinlock    12328
>   unfairlock       9478
>
> futextest (200% overcommit)
>   kernel          kops/s
>   ------          ------
>   PV ticketlock    7276
>   PV qspinlock     7095
>   unfairlock       5614
>
> The ebizzy and futextest have much higher spinlock contention than
> the AIM7 disk workload. In this case, the unfairlock performs worse
> than both the PV ticketlock and qspinlock. The performance of the
> two PV locks is comparable.

While I can see that the PV lock stuff performs 13% better in the
ebizzy no-overcommit case, what about the very interesting numbers for
the same test with 200% overcommit? The regular lock has slightly
better performance, but significantly less sys/usr time. How do you
explain that?

'Lies, damned lies and statistics' comes to my mind.

Thanks,

	tglx

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v13 10/11] pvqspinlock, x86: Enable PV qspinlock for KVM
On Wed, Oct 29, 2014 at 04:19:10PM -0400, Waiman Long wrote:
> This patch adds the necessary KVM specific code to allow KVM to
> support the CPU halting and kicking operations needed by the queue
> spinlock PV code.
>
> Two KVM guests of 20 CPU cores (2 nodes) were created for performance
> testing in one of the following two configurations:
>  1) Only 1 VM is active
>  2) Both VMs are active and they share the same 20 physical CPUs
>     (200% overcommit)
>
> The tests run included the disk workload of the AIM7 benchmark on
> both ext4 and xfs RAM disks at 3000 users on a 3.17 based kernel. The
> "ebizzy -m" test and futextest were also run and their performance
> data were recorded. With two VMs running, the "idle=poll" kernel
> option was added to simulate a busy guest. If PV qspinlock is not
> enabled, the unfairlock will be used automatically in a guest.

What is the unfairlock? Isn't it just using a bytelock at this point?
[PATCH v13 10/11] pvqspinlock, x86: Enable PV qspinlock for KVM
This patch adds the necessary KVM specific code to allow KVM to support
the CPU halting and kicking operations needed by the queue spinlock PV
code.

Two KVM guests of 20 CPU cores (2 nodes) were created for performance
testing in one of the following two configurations:
 1) Only 1 VM is active
 2) Both VMs are active and they share the same 20 physical CPUs
    (200% overcommit)

The tests run included the disk workload of the AIM7 benchmark on both
ext4 and xfs RAM disks at 3000 users on a 3.17 based kernel. The
"ebizzy -m" test and futextest were also run and their performance data
were recorded. With two VMs running, the "idle=poll" kernel option was
added to simulate a busy guest. If PV qspinlock is not enabled, the
unfairlock will be used automatically in a guest.

AIM7 XFS Disk Test (no overcommit)
  kernel          JPM      Real Time  Sys Time  Usr Time
  ------          ---      ---------  --------  --------
  PV ticketlock   2542373    7.08       98.95     5.44
  PV qspinlock    2549575    7.06       98.63     5.40
  unfairlock      2616279    6.91       97.05     5.42

AIM7 XFS Disk Test (200% overcommit)
  kernel          JPM      Real Time  Sys Time  Usr Time
  ------          ---      ---------  --------  --------
  PV ticketlock    644468   27.93      415.22     6.33
  PV qspinlock     645624   27.88      419.84     0.39
  unfairlock       695518   25.88      377.40     4.09

AIM7 EXT4 Disk Test (no overcommit)
  kernel          JPM      Real Time  Sys Time  Usr Time
  ------          ---      ---------  --------  --------
  PV ticketlock   1995565    9.02      103.67     5.76
  PV qspinlock    2011173    8.95      102.15     5.40
  unfairlock      2066590    8.71       98.13     5.46

AIM7 EXT4 Disk Test (200% overcommit)
  kernel          JPM      Real Time  Sys Time  Usr Time
  ------          ---      ---------  --------  --------
  PV ticketlock    478341   37.63      495.81    30.78
  PV qspinlock     474058   37.97      475.74    30.95
  unfairlock       560224   32.13      398.43    26.27

For the AIM7 disk workload, both PV ticketlock and qspinlock have
about the same performance. The unfairlock performs slightly better
than the PV locks.
EBIZZY-m Test (no overcommit)
  kernel          Rec/s   Real Time  Sys Time  Usr Time
  ------          -----   ---------  --------  --------
  PV ticketlock   3255     10.00      60.65     3.62
  PV qspinlock    3318     10.00      54.27     3.60
  unfairlock      2833     10.00      26.66     3.09

EBIZZY-m Test (200% overcommit)
  kernel          Rec/s   Real Time  Sys Time  Usr Time
  ------          -----   ---------  --------  --------
  PV ticketlock    841     10.00      71.03     2.37
  PV qspinlock     834     10.00      68.27     2.39
  unfairlock       865     10.00      27.08     1.51

futextest (no overcommit)
  kernel          kops/s
  ------          ------
  PV ticketlock   11523
  PV qspinlock    12328
  unfairlock       9478

futextest (200% overcommit)
  kernel          kops/s
  ------          ------
  PV ticketlock    7276
  PV qspinlock     7095
  unfairlock       5614

The ebizzy and futextest have much higher spinlock contention than
the AIM7 disk workload. In this case, the unfairlock performs worse
than both the PV ticketlock and qspinlock. The performance of the
two PV locks is comparable.

Signed-off-by: Waiman Long
---
 arch/x86/kernel/kvm.c | 138 +++++++++++++++++++++++++++++++++++++++++++++++-
 kernel/Kconfig.locks  |   2 +-
 2 files changed, 138 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index eaa064c..1183645 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -568,7 +568,7 @@ arch_initcall(activate_jump_labels);
 #ifdef CONFIG_PARAVIRT_SPINLOCKS
 
 /* Kick a cpu by its apicid. Used to wake up a halted vcpu */
-static void kvm_kick_cpu(int cpu)
+void kvm_kick_cpu(int cpu)
 {
 	int apicid;
 	unsigned long flags = 0;
@@ -576,7 +576,9 @@ static void kvm_kick_cpu(int cpu)
 	apicid = per_cpu(x86_cpu_to_apicid, cpu);
 	kvm_hypercall2(KVM_HC_KICK_CPU, flags, apicid);
 }
+PV_CALLEE_SAVE_REGS_THUNK(kvm_kick_cpu);
 
+#ifndef CONFIG_QUEUE_SPINLOCK
 enum kvm_contention_stat {
 	TAKEN_SLOW,
 	TAKEN_SLOW_PICKUP,
@@ -804,6 +806,132 @@ static void kvm_unlock_kick(struct arch_spinlock *lock, __ticket_t ticket)
 		}
 	}
 }
+#else /* !CONFIG_QUEUE_SPINLOCK */
+
+#ifdef CONFIG_KVM_DEBUG_FS
+static struct dentry *d_spin_debug;
+static struct dentry *d_kvm_debug;
+static u32 kick_nohlt_stats;		/* Kick but not halt count