Re: [PATCH v13 10/11] pvqspinlock, x86: Enable PV qspinlock for KVM

2014-12-02 Thread Thomas Gleixner
On Wed, 29 Oct 2014, Waiman Long wrote:
> AIM7 XFS Disk Test (no overcommit)
>   kernel         JPM      Real Time   Sys Time   Usr Time
>   ------         ---      ---------   --------   --------
>   PV ticketlock  2542373    7.08        98.95      5.44
>   PV qspinlock   2549575    7.06        98.63      5.40
>   unfairlock     2616279    6.91        97.05      5.42
> 
> AIM7 XFS Disk Test (200% overcommit)
>   kernel         JPM      Real Time   Sys Time   Usr Time
>   ------         ---      ---------   --------   --------
>   PV ticketlock  644468    27.93       415.22      6.33
>   PV qspinlock   645624    27.88       419.84      0.39

That number is made up by what? 

>   unfairlock     695518    25.88       377.40      4.09
> 
> AIM7 EXT4 Disk Test (no overcommit)
>   kernel         JPM      Real Time   Sys Time   Usr Time
>   ------         ---      ---------   --------   --------
>   PV ticketlock  1995565    9.02       103.67      5.76
>   PV qspinlock   2011173    8.95       102.15      5.40
>   unfairlock     2066590    8.71        98.13      5.46
> 
> AIM7 EXT4 Disk Test (200% overcommit)
>   kernel         JPM      Real Time   Sys Time   Usr Time
>   ------         ---      ---------   --------   --------
>   PV ticketlock  478341    37.63       495.81     30.78
>   PV qspinlock   474058    37.97       475.74     30.95
>   unfairlock     560224    32.13       398.43     26.27
> 
> For the AIM7 disk workload, both PV ticketlock and qspinlock have
> about the same performance. The unfairlock performs slightly better
> than the PV lock.

Slightly?

Taking the PV locks, which are basically the same for the existing
ticket locks and your newfangled qlocks, as the reference, the so
called 'unfair locks', which are just the native locks w/o the PV
nonsense, are fundamentally better: up to a whopping 18% in the
ext4/200% overcommit case. See below.
 
> EBIZZY-m Test (no overcommit)
>   kernel         Rec/s   Real Time   Sys Time   Usr Time
>   ------         -----   ---------   --------   --------
>   PV ticketlock  3255      10.00       60.65      3.62
>   PV qspinlock   3318      10.00       54.27      3.60
>   unfairlock     2833      10.00       26.66      3.09
> 
> EBIZZY-m Test (200% overcommit)
>   kernel         Rec/s   Real Time   Sys Time   Usr Time
>   ------         -----   ---------   --------   --------
>   PV ticketlock   841      10.00       71.03      2.37
>   PV qspinlock    834      10.00       68.27      2.39
>   unfairlock      865      10.00       27.08      1.51
> 
>   futextest (no overcommit)
>   kernel         kops/s
>   ------         ------
>   PV ticketlock  11523
>   PV qspinlock   12328
>   unfairlock      9478
> 
>   futextest (200% overcommit)
>   kernel         kops/s
>   ------         ------
>   PV ticketlock   7276
>   PV qspinlock    7095
>   unfairlock      5614
> 
> The ebizzy and futextest workloads have much higher spinlock
> contention than the AIM7 disk workload. In this case, the unfairlock
> performs worse than both the PV ticketlock and qspinlock. The
> performance of the two PV locks is comparable.

While I can see that the PV lock stuff performs 13% better for the
ebizzy no overcommit case, what about the very interesting numbers
for the same test with 200% overcommit?

The regular lock has a slightly better performance, but significantly
less sys/usr time. How do you explain that?

'Lies, damned lies and statistics' comes to my mind.

Thanks,

tglx
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v13 10/11] pvqspinlock, x86: Enable PV qspinlock for KVM

2014-12-02 Thread Konrad Rzeszutek Wilk
On Wed, Oct 29, 2014 at 04:19:10PM -0400, Waiman Long wrote:
> This patch adds the necessary KVM specific code to allow KVM to
> support the CPU halting and kicking operations needed by the queue
> spinlock PV code.
> 
> Two KVM guests of 20 CPU cores (2 nodes) were created for performance
> testing in one of the following two configurations:
>  1) Only 1 VM is active
>  2) Both VMs are active and they share the same 20 physical CPUs
> (200% overcommit)
> 
> The tests run included the disk workload of the AIM7 benchmark on
> both ext4 and xfs RAM disks at 3000 users on a 3.17 based kernel.
> The "ebizzy -m" test and futextest were also run and their
> performance data recorded.  With two VMs running, the "idle=poll"
> kernel option was added to simulate a busy guest. If PV qspinlock is
> not enabled, the unfairlock will be used automatically in a guest.

What is the unfairlock? Isn't it just using a bytelock at this point?


[PATCH v13 10/11] pvqspinlock, x86: Enable PV qspinlock for KVM

2014-10-29 Thread Waiman Long
This patch adds the necessary KVM specific code to allow KVM to
support the CPU halting and kicking operations needed by the queue
spinlock PV code.

Two KVM guests of 20 CPU cores (2 nodes) were created for performance
testing in one of the following two configurations:
 1) Only 1 VM is active
 2) Both VMs are active and they share the same 20 physical CPUs
(200% overcommit)

The tests run included the disk workload of the AIM7 benchmark on
both ext4 and xfs RAM disks at 3000 users on a 3.17 based kernel.
The "ebizzy -m" test and futextest were also run and their
performance data recorded.  With two VMs running, the "idle=poll"
kernel option was added to simulate a busy guest. If PV qspinlock is
not enabled, the unfairlock will be used automatically in a guest.

AIM7 XFS Disk Test (no overcommit)
  kernel         JPM      Real Time   Sys Time   Usr Time
  ------         ---      ---------   --------   --------
  PV ticketlock  2542373    7.08        98.95      5.44
  PV qspinlock   2549575    7.06        98.63      5.40
  unfairlock     2616279    6.91        97.05      5.42

AIM7 XFS Disk Test (200% overcommit)
  kernel         JPM      Real Time   Sys Time   Usr Time
  ------         ---      ---------   --------   --------
  PV ticketlock  644468    27.93       415.22      6.33
  PV qspinlock   645624    27.88       419.84      0.39
  unfairlock     695518    25.88       377.40      4.09

AIM7 EXT4 Disk Test (no overcommit)
  kernel         JPM      Real Time   Sys Time   Usr Time
  ------         ---      ---------   --------   --------
  PV ticketlock  1995565    9.02       103.67      5.76
  PV qspinlock   2011173    8.95       102.15      5.40
  unfairlock     2066590    8.71        98.13      5.46

AIM7 EXT4 Disk Test (200% overcommit)
  kernel         JPM      Real Time   Sys Time   Usr Time
  ------         ---      ---------   --------   --------
  PV ticketlock  478341    37.63       495.81     30.78
  PV qspinlock   474058    37.97       475.74     30.95
  unfairlock     560224    32.13       398.43     26.27

For the AIM7 disk workload, both PV ticketlock and qspinlock have
about the same performance. The unfairlock performs slightly better
than the PV lock.

EBIZZY-m Test (no overcommit)
  kernel         Rec/s   Real Time   Sys Time   Usr Time
  ------         -----   ---------   --------   --------
  PV ticketlock  3255      10.00       60.65      3.62
  PV qspinlock   3318      10.00       54.27      3.60
  unfairlock     2833      10.00       26.66      3.09

EBIZZY-m Test (200% overcommit)
  kernel         Rec/s   Real Time   Sys Time   Usr Time
  ------         -----   ---------   --------   --------
  PV ticketlock   841      10.00       71.03      2.37
  PV qspinlock    834      10.00       68.27      2.39
  unfairlock      865      10.00       27.08      1.51

  futextest (no overcommit)
  kernel         kops/s
  ------         ------
  PV ticketlock  11523
  PV qspinlock   12328
  unfairlock      9478

  futextest (200% overcommit)
  kernel         kops/s
  ------         ------
  PV ticketlock   7276
  PV qspinlock    7095
  unfairlock      5614

The ebizzy and futextest workloads have much higher spinlock
contention than the AIM7 disk workload. In this case, the unfairlock
performs worse than both the PV ticketlock and qspinlock. The
performance of the two PV locks is comparable.

Signed-off-by: Waiman Long 
---
 arch/x86/kernel/kvm.c |  138 ++++++++++++++++++++++++++++++++++++++++++++++-
 kernel/Kconfig.locks  |    2 +-
 2 files changed, 138 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index eaa064c..1183645 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -568,7 +568,7 @@ arch_initcall(activate_jump_labels);
 #ifdef CONFIG_PARAVIRT_SPINLOCKS
 
 /* Kick a cpu by its apicid. Used to wake up a halted vcpu */
-static void kvm_kick_cpu(int cpu)
+void kvm_kick_cpu(int cpu)
 {
 	int apicid;
 	unsigned long flags = 0;
@@ -576,7 +576,9 @@ static void kvm_kick_cpu(int cpu)
 	apicid = per_cpu(x86_cpu_to_apicid, cpu);
 	kvm_hypercall2(KVM_HC_KICK_CPU, flags, apicid);
 }
+PV_CALLEE_SAVE_REGS_THUNK(kvm_kick_cpu);
 
+#ifndef CONFIG_QUEUE_SPINLOCK
 enum kvm_contention_stat {
 	TAKEN_SLOW,
 	TAKEN_SLOW_PICKUP,
@@ -804,6 +806,132 @@ static void kvm_unlock_kick(struct arch_spinlock *lock, __ticket_t ticket)
 		}
 	}
 }
+#else /* !CONFIG_QUEUE_SPINLOCK */
+
+#ifdef CONFIG_KVM_DEBUG_FS
+static struct dentry *d_spin_debug;
+static struct dentry *d_kvm_debug;
+static u32 kick_nohlt_stats;   /* Kick but not halt count