Re: [PATCH 0/9] qspinlock stuff -v15

2015-03-27 Thread Raghavendra K T

On 03/16/2015 06:46 PM, Peter Zijlstra wrote:

Hi Waiman,

As promised; here is the paravirt stuff I did during the trip to BOS last week.

All the !paravirt patches are more or less the same as before (the only real
change is the copyright lines in the first patch).

The paravirt stuff is 'simple' and KVM only -- the Xen code was a little more
convoluted and I've no real way to test that, but it should be straightforward to
make work.

I ran this using the virtme tool (thanks Andy) on my laptop with a 4x
overcommit on vcpus (16 vcpus as compared to the 4 my laptop actually has) and
it both booted and survived a hackbench run (perf bench sched messaging -g 20
-l 5000).

So while the paravirt code isn't the most optimal code ever conceived it does 
work.

Also, the paravirt patching includes replacing the call with movb $0, %arg1
for the native case, which should greatly reduce the cost of having
CONFIG_PARAVIRT_SPINLOCKS enabled on actual hardware.
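(Illustration only, not the series' actual code: a minimal userspace model of
what the native case boils down to after patching -- a single release store of
zero to the lock byte, which is what the movb above encodes. The struct and
function names below are invented for the sketch.)

#include <stdatomic.h>
#include <stdint.h>

struct qspinlock_model {                /* hypothetical stand-in for the real qspinlock */
        _Atomic uint8_t locked;
};

static inline void native_unlock_model(struct qspinlock_model *lock)
{
        /* the paravirt call site is rewritten to this single byte store */
        atomic_store_explicit(&lock->locked, 0, memory_order_release);
}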

I feel that if someone were to do a Xen patch we can go ahead and merge this
stuff (finally!).

These patches do not implement the paravirt spinlock debug stats currently
implemented (separately) by KVM and Xen, but that should not be too hard to do
on top and in the 'generic' code -- no reason to duplicate all that.

Of course; once this lands people can look at improving the paravirt nonsense.



Last time I had reported some hangs in the KVM case, and I can confirm that
the current set of patches works fine.

Feel free to add
Tested-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com #kvm pv

As far as performance is concerned (with my 16-core +HT machine running
16-vcpu guests [both with and without the LFSR hash patchset]), I have
nothing significant to report, though I understand that we could see much
more benefit with a large number of vcpus because of the possible reduction
in cache bouncing.








Re: [PATCH 9/9] qspinlock, x86, kvm: Implement KVM support for paravirt qspinlock

2015-03-20 Thread Raghavendra K T

On 03/20/2015 02:38 AM, Waiman Long wrote:

On 03/19/2015 06:01 AM, Peter Zijlstra wrote:

[...]

You are probably right. The initial apply_paravirt() was done before the
SMP boot. Subsequent ones were at kernel module load time. I put a
counter in the __native_queue_spin_unlock() and it registered 26949
unlock calls in a 16-cpu guest before it got patched out.


because even printks take a lock..



[PATCH for stable] x86/spinlocks/paravirt: Fix memory corruption on unlock

2015-02-24 Thread Raghavendra K T
Paravirt spinlock clears slowpath flag after doing unlock.
As explained by Linus currently it does:
prev = *lock;
add_smp(&lock->tickets.head, TICKET_LOCK_INC);

/* add_smp() is a full mb() */

if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
	__ticket_unlock_slowpath(lock, prev);

which is *exactly* the kind of things you cannot do with spinlocks,
because after you've done the add_smp() and released the spinlock
for the fast-path, you can't access the spinlock any more.  Exactly
because a fast-path lock might come in, and release the whole data
structure.

Linus suggested that we should not do any writes to lock after unlock(),
and we can move slowpath clearing to fastpath lock.

So this patch implements the fix with:
1. Moving slowpath flag to head (Oleg):
Unlocked locks don't care about the slowpath flag; therefore we can keep
it set after the last unlock, and clear it again on the first (try)lock.
-- this removes the write after unlock. note that keeping slowpath flag would
result in unnecessary kicks.
By moving the slowpath flag from the tail to the head ticket we also avoid
the need to access both the head and tail tickets on unlock.

2. use xadd to avoid read/write after unlock that checks the need for
unlock_kick (Linus):
We further avoid the need for a read-after-release by using xadd;
the prev head value will include the slowpath flag and indicate if we
need to do PV kicking of suspended spinners -- on modern chips xadd
isn't (much) more expensive than an add + load.
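(For illustration only, not part of the patch: a small self-contained userspace
model of the two points above. With the flag kept in .head, unlock is a single
fetch-and-add whose return value both releases the lock and says whether a PV
kick is needed, so nothing touches the lock after the releasing operation. The
type and constant names below are simplified assumptions.)

#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

typedef uint16_t ticket_t;
#define SLOWPATH_FLAG   ((ticket_t)1)   /* lives in .head in this model */
#define LOCK_INC        ((ticket_t)2)   /* ticket step when bit 0 is reserved */

struct tickets_model {
        _Atomic ticket_t head;          /* low bit doubles as the slowpath flag */
        _Atomic ticket_t tail;
};

static void unlock_model(struct tickets_model *lk)
{
        /* one atomic op: release the lock and learn the previous head */
        ticket_t prev = atomic_fetch_add(&lk->head, LOCK_INC);

        if (prev & SLOWPATH_FLAG)       /* waiters slept; kick the next ticket */
                printf("kick holder of ticket %u\n",
                       (unsigned)((prev & ~SLOWPATH_FLAG) + LOCK_INC));
}

int main(void)
{
        struct tickets_model lk = { .head = 4 | SLOWPATH_FLAG, .tail = 8 };

        unlock_model(&lk);              /* prints a kick for ticket 6 */
        return 0;
}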

Result:
 setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest)
 benchmark    overcommit    %improve
 kernbench        1x          -0.13
 kernbench        2x           0.02
 dbench           1x          -1.77
 dbench           2x          -0.63

[Jeremy: hinted missing TICKET_LOCK_INC for kick]
[Oleg: Moving slowpath flag to head, ticket_equals idea]
[PeterZ: Detailed changelog]

Reported-by: Sasha Levin sasha.le...@oracle.com
Suggested-by: Linus Torvalds torva...@linux-foundation.org
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
Reviewed-by: Oleg Nesterov o...@redhat.com
Acked-by: David Vrabel david.vra...@citrix.com
---
 arch/x86/include/asm/spinlock.h | 94 -
 arch/x86/kernel/kvm.c   |  7 ++-
 arch/x86/xen/spinlock.c |  7 ++-
 3 files changed, 58 insertions(+), 50 deletions(-)

Changes for stable:
  - Don't replace ACCESS_ONCE with READ_ONCE, which would cause horrendous
    compiler warnings (Linus, David Vrabel, PeterZ, Ingo)

Changes since V4:
  - one more usage of tickets_equal() (Oleg)
  - head > tail situation can lead to a false contended check (Oleg)
  - smp_mb__after_atomic() added after slowpath_enter (Oleg)

Changes since V3:
  - Detailed changelog (PeterZ)
  - Replace ACCESS_ONCE with READ_ONCE (oleg)
  - Add xen changes (Oleg)
  - Correct break logic in unlock_wait() (Oleg)

Changes since V2:
  - Move the slowpath flag to head; this enables xadd usage in the unlock code
    and in turn we can get rid of the read/write after unlock (Oleg)
  - usage of ticket_equals (Oleg)

Changes since V1:
  - Add missing TICKET_LOCK_INC before unlock kick (fixes hang in overcommit: 
Jeremy).
  - Remove SLOWPATH_FLAG clearing in fast lock. (Jeremy)
  - clear SLOWPATH_FLAG in arch_spin_value_unlocked during comparison.

 Result:
 setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest)
base = 3_19_rc7

3_19_rc7_spinfix_v3
+----------+-----------+----------+-----------+-----------+
        kernbench (time taken in sec, lower is better)
+----------+-----------+----------+-----------+-----------+
        base      %stdev     patched     %stdev    %improve
1x      54.2300    3.0652     54.3008     4.0366    -0.13056
2x      90.1883    5.5509     90.1650     6.4336     0.02583
+----------+-----------+----------+-----------+-----------+
        dbench (throughput, higher is better)
+----------+-----------+----------+-----------+-----------+
        base      %stdev     patched     %stdev    %improve
1x    7029.9188    2.5952   6905.0712     4.4737    -1.77595
2x    3254.2075   14.8291   3233.7137    26.8784    -0.62976
+----------+-----------+----------+-----------+-----------+

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 625660f..cf87de3 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -46,7 +46,7 @@ static __always_inline bool static_key_false(struct static_key *key);
 
 static inline void __ticket_enter_slowpath(arch_spinlock_t *lock)
 {
-   set_bit(0, (volatile unsigned long *)&lock->tickets.tail);
+   set_bit(0, (volatile unsigned long *)&lock->tickets.head);
 }
 
 #else  /* !CONFIG_PARAVIRT_SPINLOCKS */
@@ -60,10 +60,30 @@ static inline

Re: [PATCH for stable] x86/spinlocks/paravirt: Fix memory corruption on unlock

2015-02-24 Thread Raghavendra K T

On 02/24/2015 08:17 PM, Ingo Molnar wrote:


* Greg KH gre...@linuxfoundation.org wrote:


On Tue, Feb 24, 2015 at 02:54:59PM +0530, Raghavendra K T wrote:

Paravirt spinlock clears slowpath flag after doing unlock.
As explained by Linus currently it does:
 prev = *lock;
 add_smp(&lock->tickets.head, TICKET_LOCK_INC);

 /* add_smp() is a full mb() */

 if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
	__ticket_unlock_slowpath(lock, prev);

which is *exactly* the kind of things you cannot do with spinlocks,
because after you've done the add_smp() and released the spinlock
for the fast-path, you can't access the spinlock any more.  Exactly
because a fast-path lock might come in, and release the whole data
structure.

Linus suggested that we should not do any writes to lock after unlock(),
and we can move slowpath clearing to fastpath lock.

So this patch implements the fix with:
1. Moving slowpath flag to head (Oleg):
Unlocked locks don't care about the slowpath flag; therefore we can keep
it set after the last unlock, and clear it again on the first (try)lock.
-- this removes the write after unlock. note that keeping slowpath flag would
result in unnecessary kicks.
By moving the slowpath flag from the tail to the head ticket we also avoid
the need to access both the head and tail tickets on unlock.

2. use xadd to avoid read/write after unlock that checks the need for
unlock_kick (Linus):
We further avoid the need for a read-after-release by using xadd;
the prev head value will include the slowpath flag and indicate if we
need to do PV kicking of suspended spinners -- on modern chips xadd
isn't (much) more expensive than an add + load.

Result:
  setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest)
  benchmark    overcommit    %improve
  kernbench        1x          -0.13
  kernbench        2x           0.02
  dbench           1x          -1.77
  dbench           2x          -0.63

[Jeremy: hinted missing TICKET_LOCK_INC for kick]
[Oleg: Moving slowpath flag to head, ticket_equals idea]
[PeterZ: Detailed changelog]

Reported-by: Sasha Levin sasha.le...@oracle.com
Suggested-by: Linus Torvalds torva...@linux-foundation.org
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
Reviewed-by: Oleg Nesterov o...@redhat.com
Acked-by: David Vrabel david.vra...@citrix.com
---
  arch/x86/include/asm/spinlock.h | 94 -
  arch/x86/kernel/kvm.c   |  7 ++-
  arch/x86/xen/spinlock.c |  7 ++-
  3 files changed, 58 insertions(+), 50 deletions(-)

Changes for stable:
   - Don't replace ACCESS_ONCE with READ_ONCE, which would cause horrendous
     compiler warnings (Linus, David Vrabel, PeterZ, Ingo)


What is the git commit id of this in Linus's tree?  What
stable tree(s) do you want this applied to?


It's:

  d6abfdb20223 x86/spinlocks/paravirt: Fix memory corruption on unlock


Yes, this is the original patch. Please note I have taken out the
READ_ONCE changes from the original patch to avoid the build warnings
mentioned below.
(Those READ_ONCE changes were cosmetic and were not present in the
previous versions.)



You'll also need this fix from Linus to avoid (harmless)
build warnings:

  dd36929720f4 kernel: make READ_ONCE() valid on const arguments


So this may not be absolutely necessary with the current patch.



Re: [PATCH for stable] x86/spinlocks/paravirt: Fix memory corruption on unlock

2015-02-24 Thread Raghavendra K T

On 02/24/2015 08:50 PM, Greg KH wrote:

On Tue, Feb 24, 2015 at 03:47:37PM +0100, Ingo Molnar wrote:


* Greg KH gre...@linuxfoundation.org wrote:


On Tue, Feb 24, 2015 at 02:54:59PM +0530, Raghavendra K T wrote:

Paravirt spinlock clears slowpath flag after doing unlock.
As explained by Linus currently it does:
 prev = *lock;
 add_smp(&lock->tickets.head, TICKET_LOCK_INC);

 /* add_smp() is a full mb() */

 if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
	__ticket_unlock_slowpath(lock, prev);

which is *exactly* the kind of things you cannot do with spinlocks,
because after you've done the add_smp() and released the spinlock
for the fast-path, you can't access the spinlock any more.  Exactly
because a fast-path lock might come in, and release the whole data
structure.

Linus suggested that we should not do any writes to lock after unlock(),
and we can move slowpath clearing to fastpath lock.

So this patch implements the fix with:
1. Moving slowpath flag to head (Oleg):
Unlocked locks don't care about the slowpath flag; therefore we can keep
it set after the last unlock, and clear it again on the first (try)lock.
-- this removes the write after unlock. note that keeping slowpath flag would
result in unnecessary kicks.
By moving the slowpath flag from the tail to the head ticket we also avoid
the need to access both the head and tail tickets on unlock.

2. use xadd to avoid read/write after unlock that checks the need for
unlock_kick (Linus):
We further avoid the need for a read-after-release by using xadd;
the prev head value will include the slowpath flag and indicate if we
need to do PV kicking of suspended spinners -- on modern chips xadd
isn't (much) more expensive than an add + load.

Result:
  setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest)
  benchmark    overcommit    %improve
  kernbench        1x          -0.13
  kernbench        2x           0.02
  dbench           1x          -1.77
  dbench           2x          -0.63

[Jeremy: hinted missing TICKET_LOCK_INC for kick]
[Oleg: Moving slowpath flag to head, ticket_equals idea]
[PeterZ: Detailed changelog]

Reported-by: Sasha Levin sasha.le...@oracle.com
Suggested-by: Linus Torvalds torva...@linux-foundation.org
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
Reviewed-by: Oleg Nesterov o...@redhat.com
Acked-by: David Vrabel david.vra...@citrix.com
---
  arch/x86/include/asm/spinlock.h | 94 -
  arch/x86/kernel/kvm.c   |  7 ++-
  arch/x86/xen/spinlock.c |  7 ++-
  3 files changed, 58 insertions(+), 50 deletions(-)

Changes for stable:
   - Don't replace ACCESS_ONCE with READ_ONCE, which would cause horrendous
     compiler warnings (Linus, David Vrabel, PeterZ, Ingo)


What is the git commit id of this in Linus's tree?  What
stable tree(s) do you want this applied to?


It's:

  d6abfdb20223 x86/spinlocks/paravirt: Fix memory corruption on unlock

You'll also need this fix from Linus to avoid (harmless)
build warnings:

  dd36929720f4 kernel: make READ_ONCE() valid on const arguments


Great.  But what stable kernel trees should it be backported to?  Just
3.19?  Or anything older?


My patch was intended only for 3.19.

Though the paravirt changes went in with 3.12, the problem manifested
clearly only after some of the completion-related changes. But I leave that
decision to the experts here. (I'll send the necessary changes if the patch
is needed for older versions, because it may not apply cleanly.)




Re: [Xen-devel] [PATCH V5] x86 spinlock: Fix memory corruption on completing completions

2015-02-17 Thread Raghavendra K T

On 02/16/2015 10:17 PM, David Vrabel wrote:

On 15/02/15 17:30, Raghavendra K T wrote:

--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -41,7 +41,7 @@ static u8 zero_stats;
  static inline void check_zero(void)
  {
u8 ret;
-   u8 old = ACCESS_ONCE(zero_stats);
+   u8 old = READ_ONCE(zero_stats);
if (unlikely(old)) {
	ret = cmpxchg(&zero_stats, old, 0);
/* This ensures only one fellow resets the stat */
@@ -112,6 +112,7 @@ __visible void xen_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
 	struct xen_lock_waiting *w = this_cpu_ptr(&lock_waiting);
int cpu = smp_processor_id();
u64 start;
+   __ticket_t head;
unsigned long flags;

/* If kicker interrupts not initialized yet, just spin */
@@ -159,11 +160,15 @@ __visible void xen_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
 */
__ticket_enter_slowpath(lock);

+   /* make sure enter_slowpath, which is atomic does not cross the read */
+   smp_mb__after_atomic();
+
/*
 * check again make sure it didn't become free while
 * we weren't looking
 */
-   if (ACCESS_ONCE(lock->tickets.head) == want) {
+   head = READ_ONCE(lock->tickets.head);
+   if (__tickets_equal(head, want)) {
add_stats(TAKEN_SLOW_PICKUP, 1);
goto out;
}
@@ -204,8 +209,8 @@ static void xen_unlock_kick(struct arch_spinlock *lock, __ticket_t next)
 	const struct xen_lock_waiting *w = &per_cpu(lock_waiting, cpu);

/* Make sure we read lock before want */
-   if (ACCESS_ONCE(w->lock) == lock &&
-       ACCESS_ONCE(w->want) == next) {
+   if (READ_ONCE(w->lock) == lock &&
+       READ_ONCE(w->want) == next) {
add_stats(RELEASED_SLOW_KICKED, 1);
xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
break;


Acked-by: David Vrabel david.vra...@citrix.com

Although some of the ACCESS_ONCE to READ_ONCE changes are cosmetic and
are perhaps best left out of a patch destined for stable.



Thanks.
Yes, will send out a separate patch for -stable without the READ_ONCE
changes once this patch goes in.







Re: [PATCH V5] x86 spinlock: Fix memory corruption on completing completions

2015-02-15 Thread Raghavendra K T
* Raghavendra K T raghavendra...@linux.vnet.ibm.com [2015-02-15 11:25:44]:

Resending the V5 with smp_mb__after_atomic() change without bumping up
revision

---8<---

From 0b9ecde30e3bf5b5b24009fd2ac5fc7ac4b81158 Mon Sep 17 00:00:00 2001
From: Raghavendra K T raghavendra...@linux.vnet.ibm.com
Date: Fri, 6 Feb 2015 16:44:11 +0530
Subject: [PATCH RESEND V5] x86 spinlock: Fix memory corruption on completing
 completions

Paravirt spinlock clears slowpath flag after doing unlock.
As explained by Linus currently it does:
prev = *lock;
add_smp(&lock->tickets.head, TICKET_LOCK_INC);

/* add_smp() is a full mb() */

if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
	__ticket_unlock_slowpath(lock, prev);

which is *exactly* the kind of things you cannot do with spinlocks,
because after you've done the add_smp() and released the spinlock
for the fast-path, you can't access the spinlock any more.  Exactly
because a fast-path lock might come in, and release the whole data
structure.

Linus suggested that we should not do any writes to lock after unlock(),
and we can move slowpath clearing to fastpath lock.

So this patch implements the fix with:
1. Moving slowpath flag to head (Oleg):
Unlocked locks don't care about the slowpath flag; therefore we can keep
it set after the last unlock, and clear it again on the first (try)lock.
-- this removes the write after unlock. note that keeping slowpath flag would
result in unnecessary kicks.
By moving the slowpath flag from the tail to the head ticket we also avoid
the need to access both the head and tail tickets on unlock.

2. use xadd to avoid read/write after unlock that checks the need for
unlock_kick (Linus):
We further avoid the need for a read-after-release by using xadd;
the prev head value will include the slowpath flag and indicate if we
need to do PV kicking of suspended spinners -- on modern chips xadd
isn't (much) more expensive than an add + load.

Result:
 setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest)
 benchmark    overcommit    %improve
 kernbench        1x          -0.13
 kernbench        2x           0.02
 dbench           1x          -1.77
 dbench           2x          -0.63

[Jeremy: hinted missing TICKET_LOCK_INC for kick]
[Oleg: Moving slowpath flag to head, ticket_equals idea]
[PeterZ: Detailed changelog]

Reported-by: Sasha Levin sasha.le...@oracle.com
Suggested-by: Linus Torvalds torva...@linux-foundation.org
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---
 arch/x86/include/asm/spinlock.h | 95 -
 arch/x86/kernel/kvm.c   | 10 +++--
 arch/x86/xen/spinlock.c | 10 +++--
 3 files changed, 59 insertions(+), 56 deletions(-)

potential TODO:
 * The whole patch could be split into: 1. move slowpath flag,
   2. fix memory corruption in completion problem ??

Changes since V4:
  - one more usage of tickets_equal() (Oleg)
  - head > tail situation can lead to a false contended check (Oleg)
  - smp_mb__after_atomic() added after slowpath_enter (Oleg)

Changes since V3:
  - Detailed changelog (PeterZ)
  - Replace ACCESS_ONCE with READ_ONCE (oleg)
  - Add xen changes (Oleg)
  - Correct break logic in unlock_wait() (Oleg)

Changes since V2:
  - Move the slowpath flag to head; this enables xadd usage in the unlock code
    and in turn we can get rid of the read/write after unlock (Oleg)
  - usage of ticket_equals (Oleg)

Changes since V1:
  - Add missing TICKET_LOCK_INC before unlock kick (fixes hang in overcommit: 
Jeremy).
  - Remove SLOWPATH_FLAG clearing in fast lock. (Jeremy)
  - clear SLOWPATH_FLAG in arch_spin_value_unlocked during comparison.

 Result:
 setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest)
base = 3_19_rc7

3_19_rc7_spinfix_v3
+----------+-----------+----------+-----------+-----------+
        kernbench (time taken in sec, lower is better)
+----------+-----------+----------+-----------+-----------+
        base      %stdev     patched     %stdev    %improve
1x      54.2300    3.0652     54.3008     4.0366    -0.13056
2x      90.1883    5.5509     90.1650     6.4336     0.02583
+----------+-----------+----------+-----------+-----------+
        dbench (throughput, higher is better)
+----------+-----------+----------+-----------+-----------+
        base      %stdev     patched     %stdev    %improve
1x    7029.9188    2.5952   6905.0712     4.4737    -1.77595
2x    3254.2075   14.8291   3233.7137    26.8784    -0.62976
+----------+-----------+----------+-----------+-----------+

 (Here is the result I got from the patches. I believe there may
 be some small overhead from xadd etc., but overall it looks fine,
 though a thorough test may be needed.)


diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h

Re: [PATCH V5] x86 spinlock: Fix memory corruption on completing completions

2015-02-15 Thread Raghavendra K T

On 02/15/2015 09:47 PM, Oleg Nesterov wrote:

Well, I regret I mentioned the lack of barrier after enter_slowpath ;)

On 02/15, Raghavendra K T wrote:


@@ -46,7 +46,8 @@ static __always_inline bool static_key_false(struct static_key *key);

  static inline void __ticket_enter_slowpath(arch_spinlock_t *lock)
  {
-   set_bit(0, (volatile unsigned long *)&lock->tickets.tail);
+   set_bit(0, (volatile unsigned long *)&lock->tickets.head);
+   barrier();
  }


Because this barrier() looks really confusing.

Firstly, it is equally unneeded on x86. At the same time, it cannot help.
We need a memory barrier() between set_bit(SLOWPATH) and READ_ONCE(head)
to avoid the race with spin_unlock().

So I think you should replace it with smp_mb__after_atomic() or remove it.



I resent the patch with the above change.



Other than that I believe this version is correct. So I won't insist, this
is cosmetic after all.


Thanks Oleg.



Re: [PATCH V4] x86 spinlock: Fix memory corruption on completing completions

2015-02-14 Thread Raghavendra K T

On 02/13/2015 09:02 PM, Oleg Nesterov wrote:

On 02/13, Raghavendra K T wrote:


@@ -164,7 +161,7 @@ static inline int arch_spin_is_locked(arch_spinlock_t *lock)
  {
	struct __raw_tickets tmp = READ_ONCE(lock->tickets);

-   return tmp.tail != tmp.head;
+   return tmp.tail != (tmp.head & ~TICKET_SLOWPATH_FLAG);
  }


Well, this can probably use __tickets_equal() too. But this is cosmetic.


That looks good. Added.


It seems that arch_spin_is_contended() should be fixed with this change,

(__ticket_t)(tmp.tail - tmp.head) > TICKET_LOCK_INC

can be true because of TICKET_SLOWPATH_FLAG in .head, even if it is actually
unlocked.



Done.
Hmm! It was because I was still under the impression that the slowpath bit
is in the tail. You are right, the situation could lead to a positive value
and may report false contention.
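
For reference, a sketch of what the fixed check could look like once the flag
lives in .head (kernel-style pseudocode; READ_ONCE, struct __raw_tickets and
the TICKET_* constants are assumed from the existing headers, and this is a
paraphrase rather than the exact hunk):

static inline int arch_spin_is_contended(arch_spinlock_t *lock)
{
        struct __raw_tickets tmp = READ_ONCE(lock->tickets);

        /* ignore the slowpath bit, or an unlocked lock can look contended */
        tmp.head &= ~TICKET_SLOWPATH_FLAG;
        return (__ticket_t)(tmp.tail - tmp.head) > TICKET_LOCK_INC;
}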

And the (__ticket_t) typecast looks unnecessary, it only adds more
confusion, but this is cosmetic too.



Done.


@@ -772,7 +773,8 @@ __visible void kvm_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
 * check again make sure it didn't become free while
 * we weren't looking.
 */
-   if (ACCESS_ONCE(lock->tickets.head) == want) {
+   head = READ_ONCE(lock->tickets.head);
+   if (__tickets_equal(head, want)) {
add_stats(TAKEN_SLOW_PICKUP, 1);
goto out;


This is off-topic, but with or without this change perhaps it makes sense
to add smp_mb__after_atomic(). It is nop on x86, just to make this code
more understandable for those (for me ;) who can never remember even the
x86 rules.



Hope you meant it for add_stats. Yes, smp_mb__after_atomic() would be a
harmless barrier() on x86. I did not add it in V5 as you suggested, but this
made me look at the slowpath_enter code and add an explicit barrier()
there :).





Re: [PATCH V5] x86 spinlock: Fix memory corruption on completing completions

2015-02-14 Thread Raghavendra K T

On 02/15/2015 11:25 AM, Raghavendra K T wrote:

Paravirt spinlock clears slowpath flag after doing unlock.
As explained by Linus currently it does:
 prev = *lock;
 add_smp(&lock->tickets.head, TICKET_LOCK_INC);

 /* add_smp() is a full mb() */

 if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
	__ticket_unlock_slowpath(lock, prev);

which is *exactly* the kind of things you cannot do with spinlocks,
because after you've done the add_smp() and released the spinlock
for the fast-path, you can't access the spinlock any more.  Exactly
because a fast-path lock might come in, and release the whole data
structure.

Linus suggested that we should not do any writes to lock after unlock(),
and we can move slowpath clearing to fastpath lock.

So this patch implements the fix with:
1. Moving slowpath flag to head (Oleg):
Unlocked locks don't care about the slowpath flag; therefore we can keep
it set after the last unlock, and clear it again on the first (try)lock.
-- this removes the write after unlock. note that keeping slowpath flag would
result in unnecessary kicks.
By moving the slowpath flag from the tail to the head ticket we also avoid
the need to access both the head and tail tickets on unlock.

2. use xadd to avoid read/write after unlock that checks the need for
unlock_kick (Linus):
We further avoid the need for a read-after-release by using xadd;
the prev head value will include the slowpath flag and indicate if we
need to do PV kicking of suspended spinners -- on modern chips xadd
isn't (much) more expensive than an add + load.

Result:
  setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest)
  benchmark    overcommit    %improve
  kernbench        1x          -0.13
  kernbench        2x           0.02
  dbench           1x          -1.77
  dbench           2x          -0.63

[Jeremy: hinted missing TICKET_LOCK_INC for kick]
[Oleg: Moving slowpath flag to head, ticket_equals idea]
[PeterZ: Detailed changelog]

Reported-by: Sasha Levin sasha.le...@oracle.com
Suggested-by: Linus Torvalds torva...@linux-foundation.org
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---


Sasha, hope this addresses the invalid-read concern you had with the latest
xadd-based implementation.

(I think we need to test without Oleg's "[PATCH] sched/completion:
completion_done() should serialize with complete()" reported by Paul.)




[PATCH V5] x86 spinlock: Fix memory corruption on completing completions

2015-02-14 Thread Raghavendra K T
Paravirt spinlock clears slowpath flag after doing unlock.
As explained by Linus currently it does:
prev = *lock;
add_smp(&lock->tickets.head, TICKET_LOCK_INC);

/* add_smp() is a full mb() */

if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
	__ticket_unlock_slowpath(lock, prev);

which is *exactly* the kind of things you cannot do with spinlocks,
because after you've done the add_smp() and released the spinlock
for the fast-path, you can't access the spinlock any more.  Exactly
because a fast-path lock might come in, and release the whole data
structure.

Linus suggested that we should not do any writes to lock after unlock(),
and we can move slowpath clearing to fastpath lock.

So this patch implements the fix with:
1. Moving slowpath flag to head (Oleg):
Unlocked locks don't care about the slowpath flag; therefore we can keep
it set after the last unlock, and clear it again on the first (try)lock.
-- this removes the write after unlock. note that keeping slowpath flag would
result in unnecessary kicks.
By moving the slowpath flag from the tail to the head ticket we also avoid
the need to access both the head and tail tickets on unlock.

2. use xadd to avoid read/write after unlock that checks the need for
unlock_kick (Linus):
We further avoid the need for a read-after-release by using xadd;
the prev head value will include the slowpath flag and indicate if we
need to do PV kicking of suspended spinners -- on modern chips xadd
isn't (much) more expensive than an add + load.

Result:
 setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest)
 benchmark    overcommit    %improve
 kernbench        1x          -0.13
 kernbench        2x           0.02
 dbench           1x          -1.77
 dbench           2x          -0.63

[Jeremy: hinted missing TICKET_LOCK_INC for kick]
[Oleg: Moving slowpath flag to head, ticket_equals idea]
[PeterZ: Detailed changelog]

Reported-by: Sasha Levin sasha.le...@oracle.com
Suggested-by: Linus Torvalds torva...@linux-foundation.org
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---
 arch/x86/include/asm/spinlock.h | 95 -
 arch/x86/kernel/kvm.c   | 10 +++--
 arch/x86/xen/spinlock.c | 10 +++--
 3 files changed, 59 insertions(+), 56 deletions(-)

potential TODO:
 * The whole patch could be split into: 1. move slowpath flag,
   2. fix memory corruption in completion problem ??

Changes since V4:
  - one more usage of tickets_equal() (Oleg)
  - head > tail situation can lead to a false contended check (Oleg)

Changes since V3:
  - Detailed changelog (PeterZ)
  - Replace ACCESS_ONCE with READ_ONCE (oleg)
  - Add xen changes (Oleg)
  - Correct break logic in unlock_wait() (Oleg)

Changes since V2:
  - Move the slowpath flag to head; this enables xadd usage in the unlock code
    and in turn we can get rid of the read/write after unlock (Oleg)
  - usage of ticket_equals (Oleg)

Changes since V1:
  - Add missing TICKET_LOCK_INC before unlock kick (fixes hang in overcommit: 
Jeremy).
  - Remove SLOWPATH_FLAG clearing in fast lock. (Jeremy)
  - clear SLOWPATH_FLAG in arch_spin_value_unlocked during comparison.

 Result:
 setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest)
base = 3_19_rc7

3_19_rc7_spinfix_v3
+----------+-----------+----------+-----------+-----------+
        kernbench (time taken in sec, lower is better)
+----------+-----------+----------+-----------+-----------+
        base      %stdev     patched     %stdev    %improve
1x      54.2300    3.0652     54.3008     4.0366    -0.13056
2x      90.1883    5.5509     90.1650     6.4336     0.02583
+----------+-----------+----------+-----------+-----------+
        dbench (throughput, higher is better)
+----------+-----------+----------+-----------+-----------+
        base      %stdev     patched     %stdev    %improve
1x    7029.9188    2.5952   6905.0712     4.4737    -1.77595
2x    3254.2075   14.8291   3233.7137    26.8784    -0.62976
+----------+-----------+----------+-----------+-----------+

 (Here is the result I got from the patches. I believe there may
 be some small overhead from xadd etc., but overall it looks fine,
 though a thorough test may be needed.)


diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 625660f..4413315 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -46,7 +46,8 @@ static __always_inline bool static_key_false(struct static_key *key);
 
 static inline void __ticket_enter_slowpath(arch_spinlock_t *lock)
 {
-   set_bit(0, (volatile unsigned long *)&lock->tickets.tail);
+   set_bit(0, (volatile unsigned long *)&lock->tickets.head);
+   barrier();
 }
 
 #else  /* !CONFIG_PARAVIRT_SPINLOCKS */
@@ -60,10 +61,30

[PATCH V3] x86 spinlock: Fix memory corruption on completing completions

2015-02-12 Thread Raghavendra K T
Paravirt spinlock clears slowpath flag after doing unlock.
As explained by Linus currently it does:
prev = *lock;
add_smp(&lock->tickets.head, TICKET_LOCK_INC);

/* add_smp() is a full mb() */

if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
	__ticket_unlock_slowpath(lock, prev);


which is *exactly* the kind of things you cannot do with spinlocks,
because after you've done the add_smp() and released the spinlock
for the fast-path, you can't access the spinlock any more.  Exactly
because a fast-path lock might come in, and release the whole data
structure.

Linus suggested that we should not do any writes to lock after unlock(),
and we can move slowpath clearing to fastpath lock.

So this patch implements the fix with:
1. Moving slowpath flag to head (Oleg).
2. use xadd to avoid read/write after unlock that checks the need for
unlock_kick (Linus).

Result:
 setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest)
 benchmark    overcommit    %improve
 kernbench        1x          -0.13
 kernbench        2x           0.02
 dbench           1x          -1.77
 dbench           2x          -0.63

[Jeremy: hinted missing TICKET_LOCK_INC for kick]
[Oleg: Moving slowpath flag to head, ticket_equals idea]

Reported-by: Sasha Levin sasha.le...@oracle.com
Suggested-by: Linus Torvalds torva...@linux-foundation.org
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---
 arch/x86/include/asm/spinlock.h | 87 -
 arch/x86/kernel/kvm.c   |  4 +-
 2 files changed, 45 insertions(+), 46 deletions(-)

potential TODO:
 * The whole patch could be split into: 1. move slowpath flag,
   2. fix memory corruption in completion problem ??
 * Maybe we could directly pass inc to __ticket_check_and_clear_slowpath,
   but I hope the current code is more readable.

Changes since V2:
  - Move the slowpath flag to head; this enables xadd usage in the unlock code
    and in turn we can get rid of the read/write after unlock (Oleg)
  - usage of ticket_equals (Oleg)

Changes since V1:
  - Add missing TICKET_LOCK_INC before unlock kick (fixes hang in overcommit: 
Jeremy).
  - Remove SLOWPATH_FLAG clearing in fast lock. (Jeremy)
  - clear SLOWPATH_FLAG in arch_spin_value_unlocked during comparison.
 Note: The current implementation is still based on avoiding writes after unlock;
  we could still have a potential invalid memory read. (Sasha)

 Result: 
 setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest)

base = 3_19_rc7

3_19_rc7_spinfix_v3
+----------+-----------+----------+-----------+-----------+
        kernbench (time taken in sec, lower is better)
+----------+-----------+----------+-----------+-----------+
        base      %stdev     patched     %stdev    %improve
1x      54.2300    3.0652     54.3008     4.0366    -0.13056
2x      90.1883    5.5509     90.1650     6.4336     0.02583
+----------+-----------+----------+-----------+-----------+
        dbench (throughput, higher is better)
+----------+-----------+----------+-----------+-----------+
        base      %stdev     patched     %stdev    %improve
1x    7029.9188    2.5952   6905.0712     4.4737    -1.77595
2x    3254.2075   14.8291   3233.7137    26.8784    -0.62976
+----------+-----------+----------+-----------+-----------+

 (Here is the result I got from the patches. I believe there may
 be some small overhead from xadd etc., but overall it looks fine,
 though a thorough test may be needed.)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 625660f..9697b45 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -46,7 +46,7 @@ static __always_inline bool static_key_false(struct static_key *key);
 
 static inline void __ticket_enter_slowpath(arch_spinlock_t *lock)
 {
-   set_bit(0, (volatile unsigned long *)&lock->tickets.tail);
+   set_bit(0, (volatile unsigned long *)&lock->tickets.head);
 }
 
 #else  /* !CONFIG_PARAVIRT_SPINLOCKS */
@@ -60,10 +60,30 @@ static inline void __ticket_unlock_kick(arch_spinlock_t *lock,
 }
 
 #endif /* CONFIG_PARAVIRT_SPINLOCKS */
+static inline int  __tickets_equal(__ticket_t one, __ticket_t two)
+{
+   return !((one ^ two) & ~TICKET_SLOWPATH_FLAG);
+}
+
+static inline void __ticket_check_and_clear_slowpath(arch_spinlock_t *lock,
+   __ticket_t head)
+{
+   if (head & TICKET_SLOWPATH_FLAG) {
+   arch_spinlock_t old, new;
+
+   old.tickets.head = head;
+   new.tickets.head = head & ~TICKET_SLOWPATH_FLAG;
+   old.tickets.tail = new.tickets.head + TICKET_LOCK_INC;
+   new.tickets.tail = old.tickets.tail;
+
+   /* try to clear slowpath flag when

[PATCH V4] x86 spinlock: Fix memory corruption on completing completions

2015-02-12 Thread Raghavendra K T
Paravirt spinlock clears slowpath flag after doing unlock.
As explained by Linus currently it does:
prev = *lock;
add_smp(&lock->tickets.head, TICKET_LOCK_INC);

/* add_smp() is a full mb() */

if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
	__ticket_unlock_slowpath(lock, prev);

which is *exactly* the kind of things you cannot do with spinlocks,
because after you've done the add_smp() and released the spinlock
for the fast-path, you can't access the spinlock any more.  Exactly
because a fast-path lock might come in, and release the whole data
structure.

Linus suggested that we should not do any writes to lock after unlock(),
and we can move slowpath clearing to fastpath lock.

So this patch implements the fix with:
1. Moving slowpath flag to head (Oleg):
Unlocked locks don't care about the slowpath flag; therefore we can keep
it set after the last unlock, and clear it again on the first (try)lock.
-- this removes the write after unlock. note that keeping slowpath flag would
result in unnecessary kicks.
By moving the slowpath flag from the tail to the head ticket we also avoid
the need to access both the head and tail tickets on unlock.

2. use xadd to avoid read/write after unlock that checks the need for
unlock_kick (Linus):
We further avoid the need for a read-after-release by using xadd;
the prev head value will include the slowpath flag and indicate if we
need to do PV kicking of suspended spinners -- on modern chips xadd
isn't (much) more expensive than an add + load.

Result:
 setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest)
 benchmark    overcommit    %improve
 kernbench        1x          -0.13
 kernbench        2x           0.02
 dbench           1x          -1.77
 dbench           2x          -0.63

[Jeremy: hinted missing TICKET_LOCK_INC for kick]
[Oleg: Moving slowpath flag to head, ticket_equals idea]
[PeterZ: Detailed changelog]

Reported-by: Sasha Levin sasha.le...@oracle.com
Suggested-by: Linus Torvalds torva...@linux-foundation.org
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---
 arch/x86/include/asm/spinlock.h | 91 -
 arch/x86/kernel/kvm.c   | 10 +++--
 arch/x86/xen/spinlock.c | 10 +++--
 3 files changed, 56 insertions(+), 55 deletions(-)

potential TODO:
 * The whole patch could be split into: 1. move slowpath flag,
   2. fix memory corruption in completion problem ??

Changes since V3:
  - Detailed changelog (PeterZ)
  - Replace ACCESS_ONCE with READ_ONCE (oleg)
  - Add xen changes (Oleg)
  - Correct break logic in unlock_wait() (Oleg)

Changes since V2:
  - Move the slowpath flag to head; this enables xadd usage in the unlock code
    and in turn we can get rid of the read/write after unlock (Oleg)
  - usage of ticket_equals (Oleg)

Changes since V1:
  - Add missing TICKET_LOCK_INC before unlock kick (fixes hang in overcommit: 
Jeremy).
  - Remove SLOWPATH_FLAG clearing in fast lock. (Jeremy)
  - clear SLOWPATH_FLAG in arch_spin_value_unlocked during comparison.
 Note: The current implementation is still based on avoiding writes after unlock;
  we could still have a potential invalid memory read. (Sasha)

 Result:
 setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest)
base = 3_19_rc7

3_19_rc7_spinfix_v3
+----------+-----------+----------+-----------+-----------+
        kernbench (time taken in sec, lower is better)
+----------+-----------+----------+-----------+-----------+
        base      %stdev     patched     %stdev    %improve
1x      54.2300    3.0652     54.3008     4.0366    -0.13056
2x      90.1883    5.5509     90.1650     6.4336     0.02583
+----------+-----------+----------+-----------+-----------+
        dbench (throughput, higher is better)
+----------+-----------+----------+-----------+-----------+
        base      %stdev     patched     %stdev    %improve
1x    7029.9188    2.5952   6905.0712     4.4737    -1.77595
2x    3254.2075   14.8291   3233.7137    26.8784    -0.62976
+----------+-----------+----------+-----------+-----------+

 (Here is the result I got from the patches. I believe there may
 be some small overhead from xadd etc., but overall it looks fine,
 though a thorough test may be needed.)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 625660f..646a1a3 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -46,7 +46,7 @@ static __always_inline bool static_key_false(struct static_key *key);
 
 static inline void __ticket_enter_slowpath(arch_spinlock_t *lock)
 {
-   set_bit(0, (volatile unsigned long *)&lock->tickets.tail);
+   set_bit(0, (volatile unsigned long *)&lock->tickets.head);
 }
 
 #else  /* !CONFIG_PARAVIRT_SPINLOCKS */
@@ -60,10 +60,30 @@ static inline

Re: [PATCH V3] x86 spinlock: Fix memory corruption on completing completions

2015-02-12 Thread Raghavendra K T

On 02/12/2015 07:20 PM, Oleg Nesterov wrote:

On 02/12, Raghavendra K T wrote:


@@ -191,8 +189,7 @@ static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
 * We need to check unlocked in a loop, tmp.head == head
 * can be false positive because of overflow.
 */
-   if (tmp.head == (tmp.tail & ~TICKET_SLOWPATH_FLAG) ||
-       tmp.head != head)
+   if (__tickets_equal(tmp.head, tmp.tail) || tmp.head != head)
break;


Ah, it seems that tmp.head != head should be turned into
!__tickets_equal(), no?

Suppose that TICKET_SLOWPATH_FLAG is set after the first ACCESS_ONCE(head),
then tmp.head != head will be true before the first unlock we are waiting
for.


Good catch. Otherwise we would wrongly break out even when somebody
does a halt wait.



And perhaps you can turn these ACCESS_ONCE into READ_ONCE as well.


Yes again :)



Re: [PATCH V3] x86 spinlock: Fix memory corruption on completing completions

2015-02-12 Thread Raghavendra K T

On 02/12/2015 07:32 PM, Oleg Nesterov wrote:

Damn, sorry for noise, forgot to mention...

On 02/12, Raghavendra K T wrote:


+static inline void __ticket_check_and_clear_slowpath(arch_spinlock_t *lock,
+   __ticket_t head)
+{
+   if (head & TICKET_SLOWPATH_FLAG) {
+   arch_spinlock_t old, new;
+
+   old.tickets.head = head;
+   new.tickets.head = head & ~TICKET_SLOWPATH_FLAG;
+   old.tickets.tail = new.tickets.head + TICKET_LOCK_INC;
+   new.tickets.tail = old.tickets.tail;
+
+   /* try to clear slowpath flag when there are no contenders */
+   cmpxchg(&lock->head_tail, old.head_tail, new.head_tail);
+   }
+}


...


+clear_slowpath:
+   if (TICKET_SLOWPATH_FLAG)
+   __ticket_check_and_clear_slowpath(lock, inc.head);


I think you can remove this if (TICKET_SLOWPATH_FLAG) check. If it is
defined as zero, gcc should optimize out the code under if (head & 0).



Right, the above if () is unnecessary; though we would end up with the same
code, getting rid of it makes the code cleaner.



Re: [PATCH V3] x86 spinlock: Fix memory corruption on completing completions

2015-02-12 Thread Raghavendra K T

On 02/12/2015 07:07 PM, Oleg Nesterov wrote:

On 02/12, Raghavendra K T wrote:


@@ -772,7 +773,8 @@ __visible void kvm_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
 * check again make sure it didn't become free while
 * we weren't looking.
 */
-   if (ACCESS_ONCE(lock->tickets.head) == want) {
+   head = ACCESS_ONCE(lock->tickets.head);
+   if (__tickets_equal(head, want)) {
add_stats(TAKEN_SLOW_PICKUP, 1);


While at it, perhaps it makes sense to s/ACCESS_ONCE/READ_ONCE/ but this
is cosmetic.


yes.. will do.



We also need to change another user of enter_slow_path, xen_lock_spinning()
in arch/x86/xen/spinlock.c.



Had it in mind but forgot before sending. Will update and resend.


Other than that looks correct at first glance... but this is up to
maintainers.



Thanks



Re: [PATCH V3] x86 spinlock: Fix memory corruption on completing completions

2015-02-12 Thread Raghavendra K T

On 02/12/2015 08:30 PM, Peter Zijlstra wrote:

On Thu, Feb 12, 2015 at 05:17:27PM +0530, Raghavendra K T wrote:

[...]


Linus suggested that we should not do any writes to lock after unlock(),
and we can move slowpath clearing to fastpath lock.

So this patch implements the fix with:
1. Moving slowpath flag to head (Oleg).
2. use xadd to avoid read/write after unlock that checks the need for
unlock_kick (Linus).


Maybe spend a few more words explaining these things; something like:

Unlocked locks don't care about the slowpath flag; therefore we can keep
it set after the last unlock, as long as we clear it again on the first
(try)lock -- this removes the write after unlock.


Nit: I'll reword a bit here, since the slowpath flag would only result in an
unnecessary kick but is otherwise harmless IMO.
Something like:
Unlocked locks don't care about the slowpath flag; therefore we can keep
it set after the last unlock, and clear it again on the first (try)lock.
-- this removes the write after unlock. note that keeping slowpath flag 
would result in unnecessary kicks.



By moving the slowpath flag from the tail to the head ticket we avoid
the need to access both the head and tail tickets on unlock.

We can further avoid the need for a read-after-release by using xadd;
the prev head value will include the slowpath flag and indicate if we
need to do PV kicking of suspended spinners -- on modern chips xadd
isn't (much) more expensive than an add + load.

It's 'obvious' now, but maybe not so much after we've all not looked at
this for a few months.



Absolutely correct. Thanks Peter for the detailed and very helpful
writeup.



Re: [PATCH] x86 spinlock: Fix memory corruption on completing completions

2015-02-11 Thread Raghavendra K T

On 02/11/2015 11:08 PM, Oleg Nesterov wrote:

On 02/11, Raghavendra K T wrote:


On 02/10/2015 06:56 PM, Oleg Nesterov wrote:


In this case __ticket_check_and_clear_slowpath() really needs to cmpxchg
the whole .head_tail. Plus obviously more boring changes. This needs a
separate patch even _if_ this can work.


Correct, but apart from this, before doing the xadd in unlock we would have
to make sure the LSB is cleared so that we can live with a 1-bit overflow
into the tail, which is unused. Now either or both of the head and tail
LSBs may be set after unlock.


Sorry, can't understand... could you spell?

If TICKET_SLOWPATH_FLAG lives in .head arch_spin_unlock() could simply do

	head = xadd(&lock->tickets.head, TICKET_LOCK_INC);

	if (head & TICKET_SLOWPATH_FLAG)
__ticket_unlock_kick(head);

so it can't overflow to .tail?



You are right.
I totally forgot we can get rid of tail operations :)



And we we do this, probably it makes sense to add something like

bool tickets_equal(__ticket_t one, __ticket_t two)
{
return (one ^ two) & ~TICKET_SLOWPATH_FLAG;
}



Very nice idea. I was tired of the ~TICKET_SLOWPATH_FLAG usage all over in
the current (complex :)) implementation. These two suggestions help
a lot.


and change kvm_lock_spinning() to use tickets_equal(tickets.head, want), plus
it can have more users in asm/spinlock.h.
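
A tiny standalone check of the helper's behaviour (this is the negated form
that the later patches actually use; the ticket type and flag value below are
assumptions made for the demo):

#include <assert.h>
#include <stdint.h>

typedef uint16_t ticket_t;
#define SLOWPATH_FLAG ((ticket_t)1)

static int tickets_equal(ticket_t one, ticket_t two)
{
        /* equal once the slowpath bit is ignored on both sides */
        return !((one ^ two) & ~SLOWPATH_FLAG);
}

int main(void)
{
        assert(tickets_equal(0x42, 0x42 | SLOWPATH_FLAG));      /* flag ignored */
        assert(!tickets_equal(0x42, 0x44));                     /* different tickets */
        return 0;
}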




Re: [PATCH] x86 spinlock: Fix memory corruption on completing completions

2015-02-11 Thread Raghavendra K T

On 02/10/2015 06:56 PM, Oleg Nesterov wrote:

On 02/10, Raghavendra K T wrote:


On 02/10/2015 06:23 AM, Linus Torvalds wrote:


  add_smp(&lock->tickets.head, TICKET_LOCK_INC);
  if (READ_ONCE(lock->tickets.tail) & TICKET_SLOWPATH_FLAG) ..

into something like

  val = xadd(&lock->ticket.head_tail, TICKET_LOCK_INC << TICKET_SHIFT);
  if (unlikely(val & TICKET_SLOWPATH_FLAG)) ...

would be the right thing to do. Somebody should just check that I got
that shift right, and that the tail is in the high bytes (head really
needs to be high to work, if it's in the low byte(s) the xadd would
overflow from head into tail which would be wrong).


Unfortunately xadd could result in head overflow as tail is high.

The other option was repeated cmpxchg which is bad I believe.
Any suggestions?


Stupid question... what if we simply move SLOWPATH from .tail to .head?
In this case arch_spin_unlock() could do xadd(tickets.head) and check
the result


It is a good idea. Trying this now.


In this case __ticket_check_and_clear_slowpath() really needs to cmpxchg
the whole .head_tail. Plus obviously more boring changes. This needs a
separate patch even _if_ this can work.


Correct, but apart from this, before doing the xadd in unlock we would have
to make sure the LSB is cleared so that we can live with a 1-bit overflow
into the tail, which is unused. Now either or both of the head and tail
LSBs may be set after unlock.



Re: [PATCH] x86 spinlock: Fix memory corruption on completing completions

2015-02-10 Thread Raghavendra K T

On 02/10/2015 06:23 AM, Linus Torvalds wrote:

On Mon, Feb 9, 2015 at 4:02 AM, Peter Zijlstra pet...@infradead.org wrote:

On Mon, Feb 09, 2015 at 03:04:22PM +0530, Raghavendra K T wrote:

So we have 3 choices,
1. xadd
2. continue with current approach.
3. a read before unlock and also after that.


For the truly paranoid we have probe_kernel_address(), suppose the lock
was in module space and the module just got unloaded under us.


That's much too expensive.

The xadd shouldn't be noticeably more expensive than the current
add_smp(). Yes, lock xadd used to be several cycles slower than
just lock add on some early cores, but I think these days it's down
to a single-cycle difference, which is not really different from doing
a separate load after the add.

The real problem with xadd used to be that we always had to do magic
special-casing for i386, but that's one of the reasons we dropped
support for original 80386.

So I think Raghavendra's last version (which hopefully fixes the
lockup problem that Sasha reported) together with changing that


V2 did pass the stress, but getting confirmation Sasha would help.



 add_smp(&lock->tickets.head, TICKET_LOCK_INC);
 if (READ_ONCE(lock->tickets.tail) & TICKET_SLOWPATH_FLAG) ..

into something like

 val = xadd(&lock->ticket.head_tail, TICKET_LOCK_INC << TICKET_SHIFT);
 if (unlikely(val & TICKET_SLOWPATH_FLAG)) ...

would be the right thing to do. Somebody should just check that I got
that shift right, and that the tail is in the high bytes (head really
needs to be high to work, if it's in the low byte(s) the xadd would
overflow from head into tail which would be wrong).


Unfortunately xadd could result in head overflow as tail is high.

The other option was repeated cmpxchg which is bad I believe.
Any suggestions?

(I have V3 with Oleg's suggestion and performance numbers, but without this
getting resolved it would be one unnecessary iteration.)

How about getting rid of SLOWPATH_FLAG in the spinlock (i.e. using it only
as a hint for paravirt), but doing an unlock_kick whenever we see that
(tail - head) > TICKET_LOCK_INC? (This may also need a cmpxchg loop in
unlock, but we would be able to get rid of the clear-slowpath logic.)

Only problem is we may do unnecessary kicks even in 1x load.
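
A tiny standalone demo of the overflow concern above (assumptions: 8-bit
tickets, head in the low byte and tail in the high byte as on little-endian
x86 with small NR_CPUS; the plain += only models the arithmetic effect of the
xadd, not its atomicity):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
        union {
                uint16_t head_tail;
                struct { uint8_t head, tail; } t;       /* little-endian layout */
        } lock;

        lock.t.head = 0xFE;             /* head about to wrap */
        lock.t.tail = 0x10;

        lock.head_tail += 2;            /* xadd on the combined word, increment meant for head */

        /* head wrapped to 0x00 and the carry corrupted tail: 0x10 -> 0x11 */
        printf("head=%#x tail=%#x\n", lock.t.head, lock.t.tail);
        return 0;
}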










Re: [PATCH] x86 spinlock: Fix memory corruption on completing completions

2015-02-09 Thread Raghavendra K T

On 02/09/2015 02:44 AM, Jeremy Fitzhardinge wrote:

On 02/06/2015 06:49 AM, Raghavendra K T wrote:

[...]



Linus suggested that we should not do any writes to lock after unlock(),
and we can move slowpath clearing to fastpath lock.


Yep, that seems like a sound approach.


The current approach seems to be working now (though we could not avoid the
read). Related question: do you think we could avoid SLOWPATH_FLAG itself by
checking the head and tail difference, or is it costly because it may result
in unnecessary unlock_kicks?


However it brings additional case to be handled, viz., slowpath still
could be set when somebody does arch_trylock. Handle that too by ignoring
slowpath flag during lock availability check.

Reported-by: Sasha Levin sasha.le...@oracle.com
Suggested-by: Linus Torvalds torva...@linux-foundation.org
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---
  arch/x86/include/asm/spinlock.h | 70 -
  1 file changed, 34 insertions(+), 36 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 625660f..0829f86 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -49,6 +49,23 @@ static inline void __ticket_enter_slowpath(arch_spinlock_t *lock)
 	set_bit(0, (volatile unsigned long *)&lock->tickets.tail);
  }

+static inline void __ticket_check_and_clear_slowpath(arch_spinlock_t *lock)
+{
+   arch_spinlock_t old, new;
+   __ticket_t diff;
+
+   old.tickets = READ_ONCE(lock->tickets);


Couldn't the caller pass in the lock state that it read rather than
re-reading it?



Yes, we could. Do you mean we could pass an additional read value apart from
the lock (because the lock will be needed anyway for cmpxchg)?




+static inline void __ticket_check_and_clear_slowpath(arch_spinlock_t *lock)
+{
+}
+
  #endif /* CONFIG_PARAVIRT_SPINLOCKS */

  static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
@@ -84,7 +105,7 @@ static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
 	register struct __raw_tickets inc = { .tail = TICKET_LOCK_INC };

 	inc = xadd(&lock->tickets, inc);
-   if (likely(inc.head == inc.tail))
+   if (likely(inc.head == (inc.tail & ~TICKET_SLOWPATH_FLAG)))




good point, we can get rid of this as well.


The intent of this conditional was to be the quickest possible path when
taking a fastpath lock, with the code below being used for all slowpath
locks (free or taken). So I don't think masking out SLOWPATH_FLAG is
necessary here.


goto out;

	inc.tail &= ~TICKET_SLOWPATH_FLAG;
@@ -98,7 +119,10 @@ static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
} while (--count);
__ticket_lock_spinning(lock, inc.tail);
}
-out:   barrier();  /* make sure nothing creeps before the lock is taken */
+out:
+   __ticket_check_and_clear_slowpath(lock);
+
+   barrier();  /* make sure nothing creeps before the lock is taken */


Which means that if goto out path is only ever used for fastpath
locks, you can limit calling __ticket_check_and_clear_slowpath() to the
slowpath case.



Yes, I'll move that call up.


  }

  static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
@@ -115,47 +139,21 @@ static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
 	return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) == old.head_tail;
  }

-static inline void __ticket_unlock_slowpath(arch_spinlock_t *lock,
-   arch_spinlock_t old)
-{
-   arch_spinlock_t new;
-
-   BUILD_BUG_ON(((__ticket_t)NR_CPUS) != NR_CPUS);
-
-   /* Perform the unlock on the before copy */
-   old.tickets.head += TICKET_LOCK_INC;


NB (see below)


Thanks for pointing this out; it solved the hang issue. I
missed this exact addition.




-
-   /* Clear the slowpath flag */
-   new.head_tail = old.head_tail & ~(TICKET_SLOWPATH_FLAG << TICKET_SHIFT);
-
-   /*
-* If the lock is uncontended, clear the flag - use cmpxchg in
-* case it changes behind our back though.
-*/
-   if (new.tickets.head != new.tickets.tail ||
-   cmpxchg(&lock->head_tail, old.head_tail,
-   new.head_tail) != old.head_tail) {
-   /*
-* Lock still has someone queued for it, so wake up an
-* appropriate waiter.
-*/
-   __ticket_unlock_kick(lock, old.tickets.head);
-   }
-}
-
  static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
  {
 if (TICKET_SLOWPATH_FLAG &&
-   static_key_false(&paravirt_ticketlocks_enabled)) {
-   arch_spinlock_t prev;
+   static_key_false(&paravirt_ticketlocks_enabled)) {
+   __ticket_t prev_head;

-   prev = *lock;
+   prev_head = lock->tickets.head

Re: [PATCH V2] x86 spinlock: Fix memory corruption on completing completions

2015-02-09 Thread Raghavendra K T

Ccing Davidlohr, (sorry that I got confused with similar address in cc
list).

On 02/09/2015 08:44 PM, Oleg Nesterov wrote:

On 02/09, Raghavendra K T wrote:


+static inline void __ticket_check_and_clear_slowpath(arch_spinlock_t *lock)
+{
+   arch_spinlock_t old, new;
+   __ticket_t diff;
+
+   old.tickets = READ_ONCE(lock->tickets);
+   diff = (old.tickets.tail & ~TICKET_SLOWPATH_FLAG) - old.tickets.head;
+
+   /* try to clear slowpath flag when there are no contenders */
+   if ((old.tickets.tail & TICKET_SLOWPATH_FLAG) &&
+       (diff == TICKET_LOCK_INC)) {
+   new = old;
+   new.tickets.tail &= ~TICKET_SLOWPATH_FLAG;
+   cmpxchg(&lock->head_tail, old.head_tail, new.head_tail);
+   }
+}


Can't we simplify it? We own .head, and we already know it. We only need
to clear TICKET_SLOWPATH_FLAG in .tail atomically?

IOW,

static inline void __ticket_check_and_clear_slowpath(arch_spinlock_t *lock, __ticket_t head)
{
__ticket_t old_tail, new_tail;

new_tail = head + TICKET_LOCK_INC;
old_tail = new_tail | TICKET_SLOWPATH_FLAG;

	if (READ_ONCE(lock->tickets.tail) == old_tail)
		cmpxchg(&lock->tickets.tail, old_tail, new_tail);
}

Plus

-   __ticket_check_and_clear_slowpath(lock);
+   __ticket_check_and_clear_slowpath(lock, inc.tail);

Or I missed something?


Thanks.. Perfect, I'll update with this change. (Jeremy had hinted at
something similar.)



And I think it would be better to avoid ifdef(CONFIG_PARAVIRT_SPINLOCKS),
we can just do

if (TICKET_SLOWPATH_FLAG)
__ticket_check_and_clear_slowpath();



Okay.
While at it, I think the current arch_spin_unlock() has a similar structure
and I wanted to clean it up. Considering we define
TICKET_SLOWPATH_FLAG as 0 or 1, I think the compiler would be smart enough
to generate appropriate code and we could avoid the #ifdef.
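Roughly like this, based on the hunk quoted above (sketch only, not the final
committed code); when TICKET_SLOWPATH_FLAG is 0 the whole first branch is dead
code, so no #ifdef is needed:

static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
{
	if (TICKET_SLOWPATH_FLAG &&
	    static_key_false(&paravirt_ticketlocks_enabled)) {
		__ticket_t prev_head;

		prev_head = lock->tickets.head;
		add_smp(&lock->tickets.head, TICKET_LOCK_INC);

		/* add_smp() is a full mb() */

		if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
			__ticket_unlock_kick(lock, prev_head + TICKET_LOCK_INC);
	} else
		__add(&lock->tickets.head, TICKET_LOCK_INC, UNLOCK_LOCK_PREFIX);
}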



Re: [PATCH] x86 spinlock: Fix memory corruption on completing completions

2015-02-09 Thread Raghavendra K T

On 02/09/2015 05:32 PM, Peter Zijlstra wrote:

On Mon, Feb 09, 2015 at 03:04:22PM +0530, Raghavendra K T wrote:

So we have 3 choices,
1. xadd
2. continue with current approach.
3. a read before unlock and also after that.


For the truly paranoid we have probe_kernel_address(), suppose the lock
was in module space and the module just got unloaded under us.



Thanks.. Good idea. How costly is it?
At least we could do probe_kernel_address() and check the value of the
slowpath flag if people ask us to address the invalid read problem.
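One way the idea could look in the unlock path (illustrative only;
probe_kernel_address() copies the value via probe_kernel_read() and returns
non-zero if the address is no longer mapped, and prev_head is the head value
read before the unlocking add, as in the V2 patch):

	__ticket_t tail;

	/* the lock may live in module memory that has since been freed */
	if (probe_kernel_address(&lock->tickets.tail, tail))
		return;		/* mapping gone, nobody left to kick */

	if (unlikely(tail & TICKET_SLOWPATH_FLAG))
		__ticket_unlock_kick(lock, prev_head + TICKET_LOCK_INC);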



[PATCH V2] x86 spinlock: Fix memory corruption on completing completions

2015-02-09 Thread Raghavendra K T
Paravirt spinlock clears slowpath flag after doing unlock.
As explained by Linus currently it does:
prev = *lock;
add_smp(&lock->tickets.head, TICKET_LOCK_INC);

/* add_smp() is a full mb() */

if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
__ticket_unlock_slowpath(lock, prev);


which is *exactly* the kind of things you cannot do with spinlocks,
because after you've done the add_smp() and released the spinlock
for the fast-path, you can't access the spinlock any more.  Exactly
because a fast-path lock might come in, and release the whole data
structure.

Linus suggested that we should not do any writes to lock after unlock(),
and we can move slowpath clearing to fastpath lock.

However it brings additional case to be handled, viz., slowpath still
could be set when somebody does arch_trylock. Handle that too by ignoring
slowpath flag during lock availability check.

[Jeremy: hinted missing TICKET_LOCK_INC for kick]

Reported-by: Sasha Levin sasha.le...@oracle.com
Suggested-by: Linus Torvalds torva...@linux-foundation.org
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---
 arch/x86/include/asm/spinlock.h | 70 -
 1 file changed, 34 insertions(+), 36 deletions(-)


Changes since V1:
  
  Add missing TICKET_LOCK_INC before unlock kick (fixes hang in overcommit: 
Jeremy).
  Remove SLOWPATH_FLAG clearing in fast lock. (Jeremy)
  clear SLOWPATH_FLAG in arch_spin_value_unlocked during comparison.
 Note: The current implementation is still based on avoiding writes after unlock.
  We could still have a potential invalid memory read. (Sasha)

 Patch has passed stress testing.

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 625660f..7fc50d7 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -49,6 +49,23 @@ static inline void __ticket_enter_slowpath(arch_spinlock_t 
*lock)
	set_bit(0, (volatile unsigned long *)&lock->tickets.tail);
 }
 
+static inline void __ticket_check_and_clear_slowpath(arch_spinlock_t *lock)
+{
+   arch_spinlock_t old, new;
+   __ticket_t diff;
+
+   old.tickets = READ_ONCE(lock->tickets);
+   diff = (old.tickets.tail & ~TICKET_SLOWPATH_FLAG) - old.tickets.head;
+
+   /* try to clear slowpath flag when there are no contenders */
+   if ((old.tickets.tail & TICKET_SLOWPATH_FLAG) &&
+   (diff == TICKET_LOCK_INC)) {
+   new = old;
+   new.tickets.tail &= ~TICKET_SLOWPATH_FLAG;
+   cmpxchg(&lock->head_tail, old.head_tail, new.head_tail);
+   }
+}
+
 #else  /* !CONFIG_PARAVIRT_SPINLOCKS */
 static __always_inline void __ticket_lock_spinning(arch_spinlock_t *lock,
__ticket_t ticket)
@@ -59,11 +76,15 @@ static inline void __ticket_unlock_kick(arch_spinlock_t 
*lock,
 {
 }
 
+static inline void __ticket_check_and_clear_slowpath(arch_spinlock_t *lock)
+{
+}
+
 #endif /* CONFIG_PARAVIRT_SPINLOCKS */
 
 static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
 {
-   return lock.tickets.head == lock.tickets.tail;
+   return lock.tickets.head == (lock.tickets.tail & ~TICKET_SLOWPATH_FLAG);
 }
 
 /*
@@ -98,7 +119,9 @@ static __always_inline void arch_spin_lock(arch_spinlock_t 
*lock)
} while (--count);
__ticket_lock_spinning(lock, inc.tail);
}
-out:   barrier();  /* make sure nothing creeps before the lock is taken */
+   __ticket_check_and_clear_slowpath(lock);
+out:
+   barrier();  /* make sure nothing creeps before the lock is taken */
 }
 
 static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
@@ -115,47 +138,22 @@ static __always_inline int 
arch_spin_trylock(arch_spinlock_t *lock)
	return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) == 
old.head_tail;
 }
 
-static inline void __ticket_unlock_slowpath(arch_spinlock_t *lock,
-   arch_spinlock_t old)
-{
-   arch_spinlock_t new;
-
-   BUILD_BUG_ON(((__ticket_t)NR_CPUS) != NR_CPUS);
-
-   /* Perform the unlock on the before copy */
-   old.tickets.head += TICKET_LOCK_INC;
-
-   /* Clear the slowpath flag */
-   new.head_tail = old.head_tail & ~(TICKET_SLOWPATH_FLAG << TICKET_SHIFT);
-
-   /*
-* If the lock is uncontended, clear the flag - use cmpxchg in
-* case it changes behind our back though.
-*/
-   if (new.tickets.head != new.tickets.tail ||
-   cmpxchg(&lock->head_tail, old.head_tail,
-   new.head_tail) != old.head_tail) {
-   /*
-* Lock still has someone queued for it, so wake up an
-* appropriate waiter.
-*/
-   __ticket_unlock_kick(lock, old.tickets.head);
-   }
-}
-
 static __always_inline

Re: [PATCH] x86 spinlock: Fix memory corruption on completing completions

2015-02-08 Thread Raghavendra K T

On 02/07/2015 12:27 AM, Sasha Levin wrote:

On 02/06/2015 09:49 AM, Raghavendra K T wrote:

Paravirt spinlock clears slowpath flag after doing unlock.
As explained by Linus currently it does:
 prev = *lock;
 add_smp(&lock->tickets.head, TICKET_LOCK_INC);

 /* add_smp() is a full mb() */

 if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
 __ticket_unlock_slowpath(lock, prev);


which is *exactly* the kind of things you cannot do with spinlocks,
because after you've done the add_smp() and released the spinlock
for the fast-path, you can't access the spinlock any more.  Exactly
because a fast-path lock might come in, and release the whole data
structure.

Linus suggested that we should not do any writes to lock after unlock(),
and we can move slowpath clearing to fastpath lock.

However it brings additional case to be handled, viz., slowpath still
could be set when somebody does arch_trylock. Handle that too by ignoring
slowpath flag during lock availability check.

Reported-by: Sasha Levin sasha.le...@oracle.com
Suggested-by: Linus Torvalds torva...@linux-foundation.org
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com


With this patch, my VMs lock up quickly after boot with:


I tried to reproduce the hang myself, and there still seems to be a
barrier (or some logic) that I am missing.

Looking closely below, unlock_kick got missed though we see
that SLOWPATH_FLAG is still set:

/me goes back to look closely

(gdb) bt
#0  native_halt () at ./arch/x86/include/asm/irqflags.h:55
#1  0x81037c27 in halt () at ./arch/x86/include/asm/paravirt.h:116
#2  kvm_lock_spinning (lock=0x88023ffe8240, want=52504) at 
arch/x86/kernel/kvm.c:786

#3  0x81037251 in __raw_callee_save_kvm_lock_spinning ()
#4  0x88023fc0edb0 in ?? ()
#5  0x in ?? ()

(gdb) p *(arch_spinlock_t *)0x88023ffe8240
$1 = {{head_tail = 3441806612, tickets = {head = 52500, tail = 52517}}}
(gdb) t 2
[Switching to thread 2 (Thread 2)]
#0  native_halt () at ./arch/x86/include/asm/irqflags.h:55
55  }
(gdb) bt
#0  native_halt () at ./arch/x86/include/asm/irqflags.h:55
#1  0x81037c27 in halt () at ./arch/x86/include/asm/paravirt.h:116
#2  kvm_lock_spinning (lock=0x88023ffe8240, want=52502) at 
arch/x86/kernel/kvm.c:786

#3  0x81037251 in __raw_callee_save_kvm_lock_spinning ()
#4  0x0246 in irq_stack_union ()
#5  0x00080750 in ?? ()
#6  0x0002 in ?? ()
#7  0x0004 in irq_stack_union ()
#8  0xcd16 in nmi_print_seq ()
Cannot access memory at address 0xbfc0
(gdb) t 3
[Switching to thread 3 (Thread 3)]
#0  native_halt () at ./arch/x86/include/asm/irqflags.h:55
55  }
(gdb) bt
#0  native_halt () at ./arch/x86/include/asm/irqflags.h:55
#1  0x81037c27 in halt () at ./arch/x86/include/asm/paravirt.h:116
#2  kvm_lock_spinning (lock=0x88023ffe8240, want=52512) at 
arch/x86/kernel/kvm.c:786

#3  0x81037251 in __raw_callee_save_kvm_lock_spinning ()
#4  0x88023fc8edb0 in ?? ()
#5  0x in ?? ()

[...] //other threads with similar output

(gdb) t 8
[Switching to thread 8 (Thread 8)]
#0  native_halt () at ./arch/x86/include/asm/irqflags.h:55
55  }
(gdb) bt
#0  native_halt () at ./arch/x86/include/asm/irqflags.h:55
#1  0x81037c27 in halt () at ./arch/x86/include/asm/paravirt.h:116
#2  kvm_lock_spinning (lock=0x88023ffe8240, want=52500) at 
arch/x86/kernel/kvm.c:786

#3  0x81037251 in __raw_callee_save_kvm_lock_spinning ()
#4  0x88023fdcedb0 in ?? ()
#5  0x in ?? ()



Re: [PATCH] x86 spinlock: Fix memory corruption on completing completions

2015-02-08 Thread Raghavendra K T

On 02/06/2015 09:55 PM, Linus Torvalds wrote:

On Fri, Feb 6, 2015 at 6:49 AM, Raghavendra K T
raghavendra...@linux.vnet.ibm.com wrote:

Paravirt spinlock clears slowpath flag after doing unlock.

[ fix edited out ]

So I'm not going to be applying this for 3.19, because it's much too
late and the patch is too scary. Plus the bug probably effectively
never shows up in real life (it is probably easy to trigger the
speculative *read* but probably never the actual speculative write
after dropping the lock last).



Understood and agreed.


This will need a lot of testing by the paravirt people - both
performance and correctness. So *maybe* for 3.20, but maybe for even
later, and then marked for stable, of course.

Are there any good paravirt stress-tests that people could run for
extended times?



I have been running several benchmarks (kern, sys, hack, ebizzy etc.)
in 1x and 2x scenarios. I run them for performance testing as well.
(With the current patch I did not get a kvm hang in a normal run, but
overcommit reproduced it).



[PATCH] x86 spinlock: Fix memory corruption on completing completions

2015-02-06 Thread Raghavendra K T
Paravirt spinlock clears slowpath flag after doing unlock.
As explained by Linus currently it does:
prev = *lock;
add_smp(&lock->tickets.head, TICKET_LOCK_INC);

/* add_smp() is a full mb() */

if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
__ticket_unlock_slowpath(lock, prev);


which is *exactly* the kind of things you cannot do with spinlocks,
because after you've done the add_smp() and released the spinlock
for the fast-path, you can't access the spinlock any more.  Exactly
because a fast-path lock might come in, and release the whole data
structure.

Linus suggested that we should not do any writes to lock after unlock(),
and we can move slowpath clearing to fastpath lock.

However it brings additional case to be handled, viz., slowpath still
could be set when somebody does arch_trylock. Handle that too by ignoring
slowpath flag during lock availability check.

Reported-by: Sasha Levin sasha.le...@oracle.com
Suggested-by: Linus Torvalds torva...@linux-foundation.org
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---
 arch/x86/include/asm/spinlock.h | 70 -
 1 file changed, 34 insertions(+), 36 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 625660f..0829f86 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -49,6 +49,23 @@ static inline void __ticket_enter_slowpath(arch_spinlock_t 
*lock)
	set_bit(0, (volatile unsigned long *)&lock->tickets.tail);
 }
 
+static inline void __ticket_check_and_clear_slowpath(arch_spinlock_t *lock)
+{
+   arch_spinlock_t old, new;
+   __ticket_t diff;
+
+   old.tickets = READ_ONCE(lock->tickets);
+   diff = (old.tickets.tail & ~TICKET_SLOWPATH_FLAG) - old.tickets.head;
+
+   /* try to clear slowpath flag when there are no contenders */
+   if ((old.tickets.tail & TICKET_SLOWPATH_FLAG) &&
+   (diff == TICKET_LOCK_INC)) {
+   new = old;
+   new.tickets.tail &= ~TICKET_SLOWPATH_FLAG;
+   cmpxchg(&lock->head_tail, old.head_tail, new.head_tail);
+   }
+}
+
 #else  /* !CONFIG_PARAVIRT_SPINLOCKS */
 static __always_inline void __ticket_lock_spinning(arch_spinlock_t *lock,
__ticket_t ticket)
@@ -59,6 +76,10 @@ static inline void __ticket_unlock_kick(arch_spinlock_t 
*lock,
 {
 }
 
+static inline void __ticket_check_and_clear_slowpath(arch_spinlock_t *lock)
+{
+}
+
 #endif /* CONFIG_PARAVIRT_SPINLOCKS */
 
 static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
@@ -84,7 +105,7 @@ static __always_inline void arch_spin_lock(arch_spinlock_t 
*lock)
register struct __raw_tickets inc = { .tail = TICKET_LOCK_INC };
 
	inc = xadd(&lock->tickets, inc);
-   if (likely(inc.head == inc.tail))
+   if (likely(inc.head == (inc.tail & ~TICKET_SLOWPATH_FLAG)))
goto out;
 
	inc.tail &= ~TICKET_SLOWPATH_FLAG;
@@ -98,7 +119,10 @@ static __always_inline void arch_spin_lock(arch_spinlock_t 
*lock)
} while (--count);
__ticket_lock_spinning(lock, inc.tail);
}
-out:   barrier();  /* make sure nothing creeps before the lock is taken */
+out:
+   __ticket_check_and_clear_slowpath(lock);
+
+   barrier();  /* make sure nothing creeps before the lock is taken */
 }
 
 static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
@@ -115,47 +139,21 @@ static __always_inline int 
arch_spin_trylock(arch_spinlock_t *lock)
	return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) == 
old.head_tail;
 }
 
-static inline void __ticket_unlock_slowpath(arch_spinlock_t *lock,
-   arch_spinlock_t old)
-{
-   arch_spinlock_t new;
-
-   BUILD_BUG_ON(((__ticket_t)NR_CPUS) != NR_CPUS);
-
-   /* Perform the unlock on the before copy */
-   old.tickets.head += TICKET_LOCK_INC;
-
-   /* Clear the slowpath flag */
-   new.head_tail = old.head_tail & ~(TICKET_SLOWPATH_FLAG << TICKET_SHIFT);
-
-   /*
-* If the lock is uncontended, clear the flag - use cmpxchg in
-* case it changes behind our back though.
-*/
-   if (new.tickets.head != new.tickets.tail ||
-   cmpxchg(&lock->head_tail, old.head_tail,
-   new.head_tail) != old.head_tail) {
-   /*
-* Lock still has someone queued for it, so wake up an
-* appropriate waiter.
-*/
-   __ticket_unlock_kick(lock, old.tickets.head);
-   }
-}
-
 static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
 {
	if (TICKET_SLOWPATH_FLAG &&
-   static_key_false(&paravirt_ticketlocks_enabled)) {
-   arch_spinlock_t prev;
+   static_key_false

Re: [PATCH v14 08/11] qspinlock, x86: Rename paravirt_ticketlocks_enabled

2015-01-21 Thread Raghavendra K T

On 01/21/2015 01:42 AM, Waiman Long wrote:

This patch renames the paravirt_ticketlocks_enabled static key to a
more generic paravirt_spinlocks_enabled name.

Signed-off-by: Waiman Long waiman.l...@hp.com
Signed-off-by: Peter Zijlstra pet...@infradead.org
---

Reviewed-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com



Re: [RFC PATCH v2] Implement Batched (group) ticket lock

2014-07-01 Thread Raghavendra K T

On 07/01/2014 01:35 PM, Peter Zijlstra wrote:

On Sat, Jun 28, 2014 at 02:47:04PM +0530, Raghavendra K T wrote:

In virtualized environment there are mainly three problems
related to spinlocks that affects performance.
1. LHP (lock holder preemption)
2. Lock Waiter Preemption (LWP)
3. Starvation/fairness

Though Ticketlocks solve fairness problem it worsens LWP, LHP problems. Though
pv-ticketlocks tried to address these problems we can further improve at the
  cost of relaxed fairness. The following patch tries to achieve that by 
grouping
(batched) ticketlocks.


And here I stop reading and ignore this patch, right?

Why should I look at this?



For baremetal we continue to have 'fully fair ticketlock' with this 
patch series.


The good thing about the patch is that we do have an upper bound on
starvation, and at the same time we avoid the Lock Waiter Preemption horror
of strict serialization of ticketlocks in a guest, while keeping the
definite advantages of unfair locks for guests.



Re: [RFC PATCH v2] Implement Batched (group) ticket lock

2014-07-01 Thread Raghavendra K T


For baremetal we continue to have 'fully fair ticketlock' with this patch
series.



But but but, we're looking at removing ticket locks. So why do we want
to invest in them now?



I have nothing against qspinlock. I am happy to test it/add any bit to 
it if I could.


With this patch we get excellent performance for guest with the
unmodified kernel without affecting host.

My test on guest with batch_size =16,32 showed even better performance
  bs=16  bs=32
 ebizzy_0.5x   0.14   0.90
 ebizzy_1.0x   3.57   7.52
 ebizzy_1.5x  58.97  67.65
 ebizzy_2.0x 121.55 136.45






[RFC PATCH v2] Implement Batched (group) ticket lock

2014-06-28 Thread Raghavendra K T
In virtualized environment there are mainly three problems
related to spinlocks that affects performance.
1. LHP (lock holder preemption)
2. Lock Waiter Preemption (LWP)
3. Starvation/fairness

Though Ticketlocks solve fairness problem it worsens LWP, LHP problems. Though
pv-ticketlocks tried to address these problems we can further improve at the
 cost of relaxed fairness. The following patch tries to achieve that by grouping
(batched) ticketlocks.

Here we form a batch of eligible lock holders and we serve the eligible (to hold
the lock) batches in FIFO, but the lock-waiters within eligible batch are served
in an unfair manner. This increases probability of any eligible lock-holder 
being
in running state (to an average of (batch_size/2)-1) and also it provides needed
bounded starvation since any lock requester cannot acquire the lock more than batch_size
times repeatedly during contention. On the negative side we would need an extra
cmpxchg.

The patch has the batch size of 8. (As we know increasing  batch size means we 
are
closer to unfair locks and batch size of 1 = ticketlock).
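Put differently, batch eligibility is just a masked comparison of head and
tail. A simplified, illustrative sketch (helper name and exact mask form are
mine, not the patch's):

#define TICKET_BATCH		8	/* this posting uses a batch of 8 */
#define TICKET_LOCK_BATCH_MASK	(~((TICKET_BATCH << TICKET_LOCK_INC_SHIFT) - 1))

/* a waiter may contend for the lock once its ticket is in the same batch as head */
static inline int __ticket_in_eligible_batch(__ticket_t head, __ticket_t tail)
{
	return (head & TICKET_LOCK_BATCH_MASK) == (tail & TICKET_LOCK_BATCH_MASK);
}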

V2 implementation has been optimized for baremetal :
(1) Making !CONFIG_PARAVIRT_SPINLOCK case equivalent to older ticketlock.
(2) Removing extra overhead of cmpxchg for CONFIG_PARAVIRT_SPINLOCK case
by replacing with add_smp call.

Result:
Test system: 32cpu 2node machine w/ 64GB each (16 pcpu machine +ht).
Guests:  8GB  16vcpu guests (average of 8 iterations)

A: batched ticket with batch_size = 4
B: batched ticket with batch_size = 8
C: qspinlock V11 of Waiman
D: qspinlock V11 of Waiman with VIRT_UNFAIR=y

 % Improvements with kvm guests
                   A       B       C       D
  ebizzy_0.5x    -0.20    1.07   -0.07    1.39
  ebizzy_1.0x     0.83    8.89    7.89    6.08
  ebizzy_1.5x    24.11   38.39    5.42   42.56
  ebizzy_2.0x    47.77   78.49   11.94  108.14

Observations:
With increased batch_size, the batched ticketlock has performance closer
to unfair locks and is very promising for kvm guests that expect the
high performance of unfair locks together with the bounded starvation
property of fair locks.

Baremetal:
No significant performance difference even with CONFIG_PARAVIRT_SPINLOCK enabled
on baremetal

Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---
 arch/x86/include/asm/spinlock.h   | 71 +--
 arch/x86/include/asm/spinlock_types.h | 46 ---
 arch/x86/kernel/kvm.c |  6 ++-
 arch/x86/xen/spinlock.c   |  6 ++-
 4 files changed, 107 insertions(+), 22 deletions(-)

Note:
 I tried different implementations for CONFIG_PARAVIRT enabled performance
without luck for some time, which added delay for V2 :(.

 (could not test with PeterZ's latest qspinlock which I reported)

Changes since V1:
 - Make sure no extra cmpxchg overhead for !CONFIG_PARAVIRT (it works like 
previous ticketlock now
   since batch size is set to 1).
 - Change TICKET_LOCK_INC_SHIFT back to 0 in !CONFIG_PARAVIRT case (Rik).
 - Add build check to make sure TICKET_BATCH is power of 2 (Rik).
 - Replace extra cmpxchg with add_smp for CONFIG_PARAVIRT enabled case in host 
(Waiman, Linus.. had concernes).
 - Correct TICKET_LOCK_BATCH_MASK to suit all TAIL_INC  (expression given by 
Waiman).
 - Add comment on LOCK bit (Waiman).
 - Add Xen support (Completely untested).

TODO: (But none of the below imply shortcoming of current patch)
- we can further add dynamically changing batch_size (to 1 and MAX size)
  implementation (inspiration and hint by Paul McKenney) as necessary.

- I have found that increasing  batch size gives excellent improvements for
 overcommitted guests. As Rik pointed out, we may want to further increase batch_size,
 but we need to check its (side) effects on bigger machines for undercommit cases.

- Should we enable it for baremetal too? (As Rik pointed, exploiting NUMA may
  give more performance)

- Should we have config option for batch_size?

 I would like to thank Rik, Waiman for their review and valuable feedback for 
V1.
 Please provide your suggestion and comments.

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 54f1c80..cfb78e6 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -8,6 +8,7 @@
 #include linux/compiler.h
 #include asm/paravirt.h
 #include asm/bitops.h
+#include linux/bug.h
 
 /*
  * Your basic SMP spinlocks, allowing only a single CPU anywhere
@@ -49,7 +50,42 @@ static inline void __ticket_enter_slowpath(arch_spinlock_t 
*lock)
	set_bit(0, (volatile unsigned long *)&lock->tickets.tail);
 }
 
+static int __ticket_lock_get_batch_mask(void)
+{
+   if (static_key_false(&paravirt_ticketlocks_enabled))
+   return TICKET_BATCH_MASK;
+   else
+   return TICKET_BATCH_MASK_NATIVE;
+}
+
+static void __ticket_lock_batch_spin(arch_spinlock_t *lock, __ticket_t ticket)
+{
+   if (static_key_false(&paravirt_ticketlocks_enabled

Re: [PATCH 11/11] qspinlock, kvm: Add paravirt support

2014-06-22 Thread Raghavendra K T

On 06/15/2014 06:17 PM, Peter Zijlstra wrote:

Signed-off-by: Peter Zijlstra pet...@infradead.org
---

[...]

+
+void kvm_wait(int *ptr, int val)
+{
+   unsigned long flags;
+
+   if (in_nmi())
+   return;
+
+   /*
+* Make sure an interrupt handler can't upset things in a
+* partially setup state.
+*/


I am seeing a hang with even a 2 cpu guest (with patches on top of 3.15-rc6).
Looking further with gdb, I see one cpu is stuck in native_halt with the
slowpath flag (_Q_LOCKED_SLOW) set when it was called.

(gdb) bt
#0  native_halt () at /test/master/arch/x86/include/asm/irqflags.h:55
#1  0x81033118 in halt (ptr=0x81eb0e58, val=524291) at 
/test/master/arch/x86/include/asm/paravirt.h:116
#2  kvm_wait (ptr=0x81eb0e58, val=524291) at 
arch/x86/kernel/kvm.c:835
#3  kvm_wait (ptr=0x81eb0e58, val=524291) at 
arch/x86/kernel/kvm.c:809
#4  0x810a2d8e in pv_wait (lock=0x81eb0e58) at 
/test/master/arch/x86/include/asm/paravirt.h:744
#5  __pv_wait_head (lock=0x81eb0e58) at 
kernel/locking/qspinlock.c:352


The value of the lock seems to be 524288 (meaning it is already unlocked?).
So apart from the races Waiman mentioned, are we also in need of an smp_mb()
here and/or in native_queue_unlock()?

Interestingly I see other cpu stuck at multi_cpu_stop().

(gdb) thr 1
[Switching to thread 1 (Thread 1)]#0  multi_cpu_stop 
(data=0x8802140d1da0) at kernel/stop_machine.c:192

192 if (msdata-state != curstate) {

Or am I missing something?

Please let me know if the .config needs to be shared.


+   local_irq_save(flags);
+
+   /*
+* check again make sure it didn't become free while
+* we weren't looking.
+*/
+   if (ACCESS_ONCE(*ptr) != val)
+   goto out;
+
+   /*
+* halt until it's our turn and kicked. Note that we do safe halt
+* for irq enabled case to avoid hang when lock info is overwritten
+* in irq spinlock slowpath and no spurious interrupt occur to save us.
+*/
+   if (arch_irqs_disabled_flags(flags))
+   halt();
+   else
+   safe_halt();
+
+out:
+   local_irq_restore(flags);
+}
+#endif /* QUEUE_SPINLOCK */




Re: [RFC] Implement Batched (group) ticket lock

2014-05-30 Thread Raghavendra K T

On 05/30/2014 04:15 AM, Waiman Long wrote:

On 05/28/2014 08:16 AM, Raghavendra K T wrote:

- we need an intelligent way to nullify the effect of batching for
baremetal
  (because extra cmpxchg is not required).


To do this, you will need to have 2 slightly different algorithms
depending on the paravirt_ticketlocks_enabled jump label.


Thanks for the hint Waiman.

[...]

+spin:
+for (;;) {
+inc.head = ACCESS_ONCE(lock->tickets.head);
+if (!(inc.head & TICKET_LOCK_HEAD_INC)) {
+new.head = inc.head | TICKET_LOCK_HEAD_INC;
+if (cmpxchg(&lock->tickets.head, inc.head, new.head)
+== inc.head)
+goto out;
+}
+cpu_relax();
+}
+


It had taken me some time to figure out that the LSB of inc.head is used
as a bit lock for the contending tasks in the spin loop. I would suggest
adding some comment here to make it easier to look at.


Agree. I'll add a comment.
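Something along these lines, for illustration (the comment wording is mine,
not from the posted patch):

spin:
	/*
	 * The LSB of tickets.head (TICKET_LOCK_HEAD_INC) acts as a bit lock
	 * among the waiters of the currently eligible batch: whoever sets it
	 * first takes the lock, the rest keep spinning.
	 */
	for (;;) {
		inc.head = ACCESS_ONCE(lock->tickets.head);
		if (!(inc.head & TICKET_LOCK_HEAD_INC)) {
			new.head = inc.head | TICKET_LOCK_HEAD_INC;
			if (cmpxchg(&lock->tickets.head, inc.head, new.head)
							== inc.head)
				goto out;
		}
		cpu_relax();
	}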

[...]

+#define TICKET_BATCH	0x4 /* 4 waiters can contend simultaneously */
+#define TICKET_LOCK_BATCH_MASK	(~(TICKET_BATCH << TICKET_LOCK_INC_SHIFT) + \
+				  TICKET_LOCK_TAIL_INC - 1)


I don't think TAIL_INC has anything to do with setting the BATCH_MASK.
It works here because TAIL_INC is 2. I think it is clearer to define it
as either (~(TICKET_BATCH << TICKET_LOCK_INC_SHIFT) + 1) or
(~((TICKET_BATCH << TICKET_LOCK_INC_SHIFT) - 1)).


You are right.
Thanks for pointing it out. Your expression is simpler and clearer. I'll use
one of them.



Re: [RFC] Implement Batched (group) ticket lock

2014-05-29 Thread Raghavendra K T

On 05/29/2014 03:25 AM, Rik van Riel wrote:

On 05/28/2014 08:16 AM, Raghavendra K T wrote:

This patch looks very promising.


Thank you Rik.

[...]


- My kernbench/ebizzy test on baremetal (32 cpu +ht sandybridge) did not seem to
   show the impact of extra cmpxchg. but there should be effect of extra 
cmpxchg.


Canceled out by better NUMA locality?


Yes perhaps. it was even slightly better.

[...]

- we can further add dynamically changing batch_size implementation 
(inspiration and
   hint by Paul McKenney) as necessary.


I could see a larger batch size being beneficial.

Currently the maximum wait time for a spinlock on a system
with N CPUs is N times the length of the largest critical
section.

Having the batch size set equal to the number of CPUs would only
double that, and better locality (CPUs local to the current
lock holder winning the spinlock operation) might speed things
up enough to cancel that part of that out again...


Having batch size = number of cpus would definitely help contended cases,
especially on larger machines (from my experience testing on a 4-node
32-core machine). The +ht case should make it even more
beneficial.

My only concern was the overhead in undercommit cases because of the extra
cmpxchg.
So maybe batch_size = total cpus / numa node would be optimal?...

[...]

+#define TICKET_LOCK_INC_SHIFT 1
+#define __TICKET_LOCK_TAIL_INC (1 << TICKET_LOCK_INC_SHIFT)
+
  #ifdef CONFIG_PARAVIRT_SPINLOCKS
-#define __TICKET_LOCK_INC  2
  #define TICKET_SLOWPATH_FLAG  ((__ticket_t)1)
  #else
-#define __TICKET_LOCK_INC  1
  #define TICKET_SLOWPATH_FLAG  ((__ticket_t)0)
  #endif


For the !CONFIG_PARAVIRT case, TICKET_LOCK_INC_SHIFT used to be 0,
now you are making it one. Probably not an issue, since even people
who compile with 128 < CONFIG_NR_CPUS <= 256 will likely have their
spinlocks padded out to 32 or 64 bits anyway in most data structures.


Yes..

[...]

+#define TICKET_BATCH	0x4 /* 4 waiters can contend simultaneously */
+#define TICKET_LOCK_BATCH_MASK (~(TICKET_BATCH << TICKET_LOCK_INC_SHIFT) + \
+				 TICKET_LOCK_TAIL_INC - 1)


I do not see the value in having TICKET_BATCH declared with a
hexadecimal number,


yes.. It had only helped me to make the idea readable to myself, I
could get rid of this if needed.

and it may be worth making sure the code

does not compile if someone tried a TICKET_BATCH value that
is not a power of 2.


I agree. I will add a BUILD_BUG for non power of 2 in the next version.
But yes, it reminds me that I wanted to have TICKET_BATCH = 1 for
!CONFIG_PARAVIRT so that we continue to have the original fair lock version.
Does that make sense? I left it out after considering that the same kernel
running on host/guest would anyway have CONFIG_PARAVIRT on.
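For illustration, the compile-time pieces being discussed could look
something like this (sketch only, keyed off CONFIG_PARAVIRT_SPINLOCKS here;
not from the posted patch):

#ifdef CONFIG_PARAVIRT_SPINLOCKS
#define TICKET_BATCH	8	/* relaxed fairness helps preempted guests */
#else
#define TICKET_BATCH	1	/* batch of 1 behaves like the original fair ticketlock */
#endif

/* and somewhere that is always compiled, e.g. early in arch_spin_lock(): */
	BUILD_BUG_ON(TICKET_BATCH & (TICKET_BATCH - 1));	/* power of 2 */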



Re: [RFC] Implement Batched (group) ticket lock

2014-05-29 Thread Raghavendra K T

On 05/29/2014 12:16 PM, Peter Zijlstra wrote:

On Wed, May 28, 2014 at 05:46:39PM +0530, Raghavendra K T wrote:

In virtualized environment there are mainly three problems
related to spinlocks that affect performance.
1. LHP (lock holder preemption)
2. Lock Waiter Preemption (LWP)
3. Starvation/fairness

  Though ticketlocks solve the fairness problem, it worsens LWP, LHP problems.
pv-ticketlocks tried to address this. But we can further improve at the
cost of relaxed fairness.


So I really hate the idea of having different locks for paravirt and
normal kernels.


Yes. I understand that a queued lock for the normal kernel and an unfair
version of the queued spinlock for virtual guests would do better.


Since strict serialization of lock waiters (in both ticketlock/queued
spinlock) does not work well for a virtualized guest, my idea was to
give an alternative which has bounded starvation and performs as well
as the unfair version for virtualized guests.




And we're looking to move to that queued lock for normal kernels.


Agree. and I have tested that too.



[RFC] Implement Batched (group) ticket lock

2014-05-28 Thread Raghavendra K T
In virtualized environment there are mainly three problems
related to spinlocks that affect performance.
1. LHP (lock holder preemption)
2. Lock Waiter Preemption (LWP)
3. Starvation/fairness

 Though ticketlocks solve the fairness problem, it worsens LWP, LHP problems.
pv-ticketlocks tried to address this. But we can further improve at the
cost of relaxed fairness.

In this patch, we form a batch of eligible lock holders and we serve the 
eligible
(to hold the lock) batches in FIFO, but the lock-waiters within eligible batch 
are served
in an unfair manner. This increases probability of any eligible lock-holder 
being
in running state (to an average of (batch_size/2)-1). It also provides needed
bounded starvation since any lock requester cannot acquire the lock more than batch_size
times repeatedly during contention. On the negative side we would need an extra
cmpxchg.

 The patch has the batch size of 4. (As we know increasing  batch size means we 
are
closer to unfair locks and batch size of 1 = ticketlock).

Result:
Test system: 32cpu 2node machine w/ 64GB each (16 pcpu machine +ht).
Guests:  8GB  16vcpu guests (average of 8 iterations)

 % Improvements with kvm guests (batch size = 4):
  ebizzy_0.5x   4.3
  ebizzy_1.0x   7.8
  ebizzy_1.5x  23.4
  ebizzy_2.0x  48.6

Baremetal:
ebizzy showed very high stdev, kernbench result was good but both of them did
not show much difference.

ebizzy: rec/sec higher is better
base50452
patched 50703

kernbench  time in sec (lesser is better)
base48.9 sec
patched 48.8 sec

Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---
 arch/x86/include/asm/spinlock.h   | 35 +--
 arch/x86/include/asm/spinlock_types.h | 14 ++
 arch/x86/kernel/kvm.c |  6 --
 3 files changed, 39 insertions(+), 16 deletions(-)

TODO:
- we need an intelligent way to nullify the effect of batching for baremetal
 (because extra cmpxchg is not required).

- we may have to make batch size as kernel arg to solve above problem
 (to run same kernel for host/guest). Increasing batch size also seems to help
 virtualized guest more, so we will have flexibility of tuning depending on vm 
size.

- My kernbench/ebizzy test on baremetal (32 cpu +ht sandybridge) did not seem to
  show the impact of extra cmpxchg. but there should be effect of extra cmpxchg.

- virtualized guest had slight impact on 1x cases of some benchmarks but we have got
 impressive performance for >1x cases. So overall, the patch needs exhaustive testing.

- we can further add dynamically changing batch_size implementation 
(inspiration and
  hint by Paul McKenney) as necessary.
 
 I have found that increasing  batch size gives excellent improvements for 
 overcommitted guests. I understand that we need more exhaustive testing.

 Please provide your suggestion and comments.

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 0f62f54..87685f1 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -81,23 +81,36 @@ static __always_inline int 
arch_spin_value_unlocked(arch_spinlock_t lock)
  */
 static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
 {
-   register struct __raw_tickets inc = { .tail = TICKET_LOCK_INC };
+   register struct __raw_tickets inc = { .tail = TICKET_LOCK_TAIL_INC };
+   struct __raw_tickets new;
 
	inc = xadd(&lock->tickets, inc);
-   if (likely(inc.head == inc.tail))
-   goto out;
 
	inc.tail &= ~TICKET_SLOWPATH_FLAG;
for (;;) {
unsigned count = SPIN_THRESHOLD;
 
do {
-   if (ACCESS_ONCE(lock->tickets.head) == inc.tail)
-   goto out;
+   if ((inc.head & TICKET_LOCK_BATCH_MASK) == (inc.tail &
+   TICKET_LOCK_BATCH_MASK))
+   goto spin;
cpu_relax();
+   inc.head = ACCESS_ONCE(lock->tickets.head);
} while (--count);
__ticket_lock_spinning(lock, inc.tail);
}
+spin:
+   for (;;) {
+   inc.head = ACCESS_ONCE(lock->tickets.head);
+   if (!(inc.head & TICKET_LOCK_HEAD_INC)) {
+   new.head = inc.head | TICKET_LOCK_HEAD_INC;
+   if (cmpxchg(&lock->tickets.head, inc.head, new.head)
+   == inc.head)
+   goto out;
+   }
+   cpu_relax();
+   }
+
 out:   barrier();  /* make sure nothing creeps before the lock is taken */
 }
 
@@ -109,7 +122,8 @@ static __always_inline int 
arch_spin_trylock(arch_spinlock_t *lock)
	if (old.tickets.head != (old.tickets.tail & ~TICKET_SLOWPATH_FLAG))
return 0;
 
-   new.head_tail = old.head_tail + (TICKET_LOCK_INC << TICKET_SHIFT);
+   new.head_tail

Re: [PATCH v9 00/19] qspinlock: a 4-byte queue spinlock with PV support

2014-04-27 Thread Raghavendra K T

On 04/17/2014 08:33 PM, Waiman Long wrote:

v8->v9:
   - Integrate PeterZ's version of the queue spinlock patch with some
 modification:
 http://lkml.kernel.org/r/20140310154236.038181...@infradead.org
   - Break the more complex patches into smaller ones to ease review effort.
   - Fix a racing condition in the PV qspinlock code.

v7->v8:
   - Remove one unneeded atomic operation from the slowpath, thus
 improving performance.
   - Simplify some of the codes and add more comments.
   - Test for X86_FEATURE_HYPERVISOR CPU feature bit to enable/disable
 unfair lock.
   - Reduce unfair lock slowpath lock stealing frequency depending
 on its distance from the queue head.
   - Add performance data for IvyBridge-EX CPU.

v6->v7:
   - Remove an atomic operation from the 2-task contending code
   - Shorten the names of some macros
   - Make the queue waiter to attempt to steal lock when unfair lock is
 enabled.
   - Remove lock holder kick from the PV code and fix a race condition
   - Run the unfair lock  PV code on overcommitted KVM guests to collect
 performance data.

v5->v6:
  - Change the optimized 2-task contending code to make it fairer at the
expense of a bit of performance.
  - Add a patch to support unfair queue spinlock for Xen.
  - Modify the PV qspinlock code to follow what was done in the PV
ticketlock.
  - Add performance data for the unfair lock as well as the PV
support code.

v4->v5:
  - Move the optimized 2-task contending code to the generic file to
enable more architectures to use it without code duplication.
  - Address some of the style-related comments by PeterZ.
  - Allow the use of unfair queue spinlock in a real para-virtualized
execution environment.
  - Add para-virtualization support to the qspinlock code by ensuring
that the lock holder and queue head stay alive as much as possible.

v3->v4:
  - Remove debugging code and fix a configuration error
  - Simplify the qspinlock structure and streamline the code to make it
perform a bit better
  - Add an x86 version of asm/qspinlock.h for holding x86 specific
optimization.
  - Add an optimized x86 code path for 2 contending tasks to improve
low contention performance.

v2->v3:
  - Simplify the code by using numerous mode only without an unfair option.
  - Use the latest smp_load_acquire()/smp_store_release() barriers.
  - Move the queue spinlock code to kernel/locking.
  - Make the use of queue spinlock the default for x86-64 without user
configuration.
  - Additional performance tuning.

v1->v2:
  - Add some more comments to document what the code does.
  - Add a numerous CPU mode to support >= 16K CPUs
  - Add a configuration option to allow lock stealing which can further
improve performance in many cases.
  - Enable wakeup of queue head CPU at unlock time for non-numerous
CPU mode.

This patch set has 3 different sections:
  1) Patches 1-7: Introduces a queue-based spinlock implementation that
 can replace the default ticket spinlock without increasing the
 size of the spinlock data structure. As a result, critical kernel
 data structures that embed spinlock won't increase in size and
 break data alignments.
  2) Patches 8-13: Enables the use of unfair queue spinlock in a
 virtual guest. This can resolve some of the locking related
 performance issues due to the fact that the next CPU to get the
 lock may have been scheduled out for a period of time.
  3) Patches 14-19: Enable qspinlock para-virtualization support
 by halting the waiting CPUs after spinning for a certain amount of
  time. The unlock code will detect a sleeping waiter and wake it
 up. This is essentially the same logic as the PV ticketlock code.

The queue spinlock has slightly better performance than the ticket
spinlock in uncontended case. Its performance can be much better
with moderate to heavy contention.  This patch has the potential of
improving the performance of all the workloads that have moderate to
heavy spinlock contention.

The queue spinlock is especially suitable for NUMA machines with at
least 2 sockets, though noticeable performance benefit probably won't
show up in machines with less than 4 sockets.

The purpose of this patch set is not to solve any particular spinlock
contention problems. Those need to be solved by refactoring the code
to make more efficient use of the lock or finer granularity ones. The
main purpose is to make the lock contention problems more tolerable
until someone can spend the time and effort to fix them.


For kvm part feel free to add:
Tested-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com

V9 testing has shown no hangs.
I was able to do some performance testing. Here are the results:

Overall we are seeing good improvement for pv-unfair version.

System : 32 cpu sandybridge with HT on. (4 node machine with 32 GB each)
Guest: 8GB with 16 vcpu/VM.
Average was taken over 8-10 data points.

Base = 3.15-rc2

Re: [PATCH v9 00/19] qspinlock: a 4-byte queue spinlock with PV support

2014-04-17 Thread Raghavendra K T

On 04/17/2014 10:53 PM, Konrad Rzeszutek Wilk wrote:

On Thu, Apr 17, 2014 at 11:03:52AM -0400, Waiman Long wrote:

v8->v9:
   - Integrate PeterZ's version of the queue spinlock patch with some
 modification:
 http://lkml.kernel.org/r/20140310154236.038181...@infradead.org
   - Break the more complex patches into smaller ones to ease review effort.
   - Fix a racing condition in the PV qspinlock code.


I am not seeing anything mentioning that the overcommit scenario
for KVM and Xen had been fixed. Or was the 'racing condition' said
issue?


I saw changes in patch 18 that fix this for kvm (19 for xen). I'll
test the series and confirm.



Re: [PATCH v8 00/10] qspinlock: a 4-byte queue spinlock with PV support

2014-04-09 Thread Raghavendra K T

On 04/09/2014 12:45 AM, Waiman Long wrote:

Yes, I am able to reproduce the hang problem with ebizzy. BTW, could you
try to apply the attached patch file on top of the v8 patch series to
see if it can fix the hang problem?


Ran the benchmarks with the fix and I am not seeing a hang so far.
ebizzy improvements.

0.5x  2.7345
1x   -10.6593
1.5x  35.6962
2x88.0461

dbench improvements
0.5x   3.2428
1x 1.1514
1.5x   5.5071
2x23.8700
(I will have to retest the ebizzy 1x regression but overall performance
numbers look good).


Baseline was 3.14-rc8 without any of the qspin patch series.



Does the baseline have PV ticketlock or without any PV support?


baseline had PV ticketlock enabled.



Re: [PATCH v8 00/10] qspinlock: a 4-byte queue spinlock with PV support

2014-04-07 Thread Raghavendra K T

On 04/02/2014 06:57 PM, Waiman Long wrote:

N.B. Sorry for the duplicate. This patch series were resent as the
  original one was rejected by the vger.kernel.org list server
  due to long header. There is no change in content.

v7->v8:
   - Remove one unneeded atomic operation from the slowpath, thus
 improving performance.
   - Simplify some of the codes and add more comments.
   - Test for X86_FEATURE_HYPERVISOR CPU feature bit to enable/disable
 unfair lock.
   - Reduce unfair lock slowpath lock stealing frequency depending
 on its distance from the queue head.
   - Add performance data for IvyBridge-EX CPU.

v6->v7:
   - Remove an atomic operation from the 2-task contending code
   - Shorten the names of some macros
   - Make the queue waiter to attempt to steal lock when unfair lock is
 enabled.
   - Remove lock holder kick from the PV code and fix a race condition
   - Run the unfair lock  PV code on overcommitted KVM guests to collect
 performance data.

v5->v6:
  - Change the optimized 2-task contending code to make it fairer at the
expense of a bit of performance.
  - Add a patch to support unfair queue spinlock for Xen.
  - Modify the PV qspinlock code to follow what was done in the PV
ticketlock.
  - Add performance data for the unfair lock as well as the PV
support code.

v4->v5:
  - Move the optimized 2-task contending code to the generic file to
enable more architectures to use it without code duplication.
  - Address some of the style-related comments by PeterZ.
  - Allow the use of unfair queue spinlock in a real para-virtualized
execution environment.
  - Add para-virtualization support to the qspinlock code by ensuring
that the lock holder and queue head stay alive as much as possible.

v3->v4:
  - Remove debugging code and fix a configuration error
  - Simplify the qspinlock structure and streamline the code to make it
perform a bit better
  - Add an x86 version of asm/qspinlock.h for holding x86 specific
optimization.
  - Add an optimized x86 code path for 2 contending tasks to improve
low contention performance.

v2->v3:
  - Simplify the code by using numerous mode only without an unfair option.
  - Use the latest smp_load_acquire()/smp_store_release() barriers.
  - Move the queue spinlock code to kernel/locking.
  - Make the use of queue spinlock the default for x86-64 without user
configuration.
  - Additional performance tuning.

v1->v2:
  - Add some more comments to document what the code does.
  - Add a numerous CPU mode to support >= 16K CPUs
  - Add a configuration option to allow lock stealing which can further
improve performance in many cases.
  - Enable wakeup of queue head CPU at unlock time for non-numerous
CPU mode.

This patch set has 3 different sections:
  1) Patches 1-4: Introduces a queue-based spinlock implementation that
 can replace the default ticket spinlock without increasing the
 size of the spinlock data structure. As a result, critical kernel
 data structures that embed spinlock won't increase in size and
 break data alignments.
  2) Patches 5-6: Enables the use of unfair queue spinlock in a
 para-virtualized execution environment. This can resolve some
 of the locking related performance issues due to the fact that
 the next CPU to get the lock may have been scheduled out for a
 period of time.
  3) Patches 7-10: Enable qspinlock para-virtualization support
 by halting the waiting CPUs after spinning for a certain amount of
  time. The unlock code will detect a sleeping waiter and wake it
 up. This is essentially the same logic as the PV ticketlock code.

The queue spinlock has slightly better performance than the ticket
spinlock in uncontended case. Its performance can be much better
with moderate to heavy contention.  This patch has the potential of
improving the performance of all the workloads that have moderate to
heavy spinlock contention.

The queue spinlock is especially suitable for NUMA machines with at
least 2 sockets, though noticeable performance benefit probably won't
show up in machines with less than 4 sockets.

The purpose of this patch set is not to solve any particular spinlock
contention problems. Those need to be solved by refactoring the code
to make more efficient use of the lock or finer granularity ones. The
main purpose is to make the lock contention problems more tolerable
until someone can spend the time and effort to fix them.

To illustrate the performance benefit of the queue spinlock, the
ebizzy benchmark was run with the -m option in two different computers:

   Test machine ticket-lock queue-lock
    --- --
   4-socket 40-core 2316 rec/s  2899 rec/s
   Westmere-EX (HT off)
   2-socket 12-core 2130 rec/s  2176 rec/s
   Westmere-EP (HT on)



I tested the v7,v8 of qspinlock with unfair config on kvm guest.
I was 

Re: [PATCH v8 00/10] qspinlock: a 4-byte queue spinlock with PV support

2014-04-07 Thread Raghavendra K T

On 04/07/2014 10:08 PM, Waiman Long wrote:

On 04/07/2014 02:14 AM, Raghavendra K T wrote:

[...]

But I am seeing hang in overcommit cases. Gdb showed that many vcpus
are halted and there was no progress. Suspecting the problem /race with
halting, I removed the halt() part of kvm_hibernate().  I am yet to
take a closer look at the code on halt() related changes.


It seems like there may still be race conditions where the current code
is not handling correctly. I will look into that to see where the
problem is. BTW, what test do you use to produce the hang condition?


Running ebizzy on 2 of the vms simultaneously (for some time in a repeated
loop) could reproduce it.



Patch series with that change gave around 20% improvement for dbench
2x and 30% improvement for ebizzy 2x cases. (1x has no significant
loss/gain).



While at it, just a correction: it was 30% for ebizzy 1.5x and around
80% for ebizzy 2x.


What is the baseline for the performance improvement? Is it without the
unfair lock and PV qspinlock?


Baseline was 3.14-rc8 without any of the qspin patch series.



Re: [PATCH RFC v5 4/8] pvqspinlock, x86: Allow unfair spinlock in a real PV environment

2014-03-04 Thread Raghavendra K T

The PV code in the v5 patch did seem to improve benchmark performance
with moderate to heavy spinlock contention. However, I didn't see much
CPU kicking going on. My theory is that the additional PV code
complicates the pause loop timing so that the hardware PLE didn't kick
in, whereas the original pause loop is pretty simple causing PLE to
happen fairly frequently.


You could play with the ple_gap parameter to make it work for bigger
spin-loops in such cases.
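For example, PLE behaviour can be tuned on the host through the kvm_intel
module parameters, e.g. something like

	modprobe kvm_intel ple_gap=256

(value purely illustrative; the useful setting has to be measured), so that
more loosely spaced pause loops still count toward a PLE exit.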




Re: [PATCH RFC v5 5/8] pvqspinlock, x86: Enable unfair queue spinlock in a KVM guest

2014-02-27 Thread Raghavendra K T

On 02/26/2014 08:44 PM, Waiman Long wrote:

This patch adds a KVM init function to activate the unfair queue
spinlock in a KVM guest when the PARAVIRT_UNFAIR_LOCKS kernel config
option is selected.

Signed-off-by: Waiman Long waiman.l...@hp.com
---
  arch/x86/kernel/kvm.c |   17 +
  1 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 713f1b3..a489140 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -826,3 +826,20 @@ static __init int kvm_spinlock_init_jump(void)
  early_initcall(kvm_spinlock_init_jump);

  #endif/* CONFIG_PARAVIRT_SPINLOCKS */
+
+#ifdef CONFIG_PARAVIRT_UNFAIR_LOCKS
+/*
+ * Enable unfair lock if running in a real para-virtualized environment
+ */
+static __init int kvm_unfair_locks_init_jump(void)
+{
+   if (!kvm_para_available())
+   return 0;
+


kvm_kick_cpu_type() in patch 8 assumes that host has support for kick
hypercall (KVM_HC_KICK_CPU).

I think for that we need an explicit check of
kvm_para_has_feature(KVM_FEATURE_PV_UNHALT).


Otherwise things may break in the unlikely case of running a new guest on
an old host?



+   static_key_slow_inc(&paravirt_unfairlocks_enabled);
+   printk(KERN_INFO "KVM setup unfair spinlock\n");
+
+   return 0;
+}
+early_initcall(kvm_unfair_locks_init_jump);
+#endif
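For illustration, the guarded init being suggested would look roughly like
this (sketch only):

static __init int kvm_unfair_locks_init_jump(void)
{
	if (!kvm_para_available())
		return 0;

	/* only enable if the host advertises PV unhalt (kick) support */
	if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))
		return 0;

	static_key_slow_inc(&paravirt_unfairlocks_enabled);
	printk(KERN_INFO "KVM setup unfair spinlock\n");

	return 0;
}
early_initcall(kvm_unfair_locks_init_jump);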





Re: [PATCH RFC v5 7/8] pvqspinlock, x86: Add qspinlock para-virtualization support

2014-02-27 Thread Raghavendra K T

On 02/27/2014 08:15 PM, Paolo Bonzini wrote:
[...]

But neither of the VCPUs being kicked here are halted -- they're either
running or runnable (descheduled by the hypervisor).


/me actually looks at Waiman's code...

Right, this is really different from pvticketlocks, where the *unlock*
primitive wakes up a sleeping VCPU.  It is more similar to PLE
(pause-loop exiting).


Adding to the discussion, I see there are two possibilities here,
considering that in undercommit cases we should not exceed
HEAD_SPIN_THRESHOLD,

1. the looping vcpu in pv_head_spin_check() should do halt()
considering that we have done enough spinning (more than typical
lock-hold time), and hence we are in potential overcommit.

2. multiplex kick_cpu to do directed yield in qspinlock case.
But this may result in some ping ponging?





[PATCH V13 2/4] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks

2013-08-26 Thread Raghavendra K T
From: Srivatsa Vaddagiri va...@linux.vnet.ibm.com

kvm_hc_kick_cpu allows the calling vcpu to kick another vcpu out of halt state.
the presence of these hypercalls is indicated to guest via
kvm_feature_pv_unhalt.

Fold pv_unhalt flag into GET_MP_STATE ioctl to aid migration
During migration, any vcpu that got kicked but did not become runnable
(still in halted state) should be runnable after migration.

Signed-off-by: Srivatsa Vaddagiri va...@linux.vnet.ibm.com
Signed-off-by: Suzuki Poulose suz...@in.ibm.com
[Raghu: Apic related changes, folding pvunhalted into vcpu_runnable
 Added flags for future use (suggested by Gleb)]
[ Raghu: fold pv_unhalt flag as suggested by Eric Northup]
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
Acked-by: Gleb Natapov g...@redhat.com
Acked-by: Ingo Molnar mi...@kernel.org
---
 arch/x86/include/asm/kvm_host.h |  5 +
 arch/x86/kvm/cpuid.c|  3 ++-
 arch/x86/kvm/x86.c  | 44 -
 3 files changed, 50 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f87f7fc..1d1f711 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -511,6 +511,11 @@ struct kvm_vcpu_arch {
 * instruction.
 */
bool write_fault_to_shadow_pgtable;
+
+   /* pv related host specific info */
+   struct {
+   bool pv_unhalted;
+   } pv;
 };
 
 struct kvm_lpage_info {
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index a20ecb5..b110fe6 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -413,7 +413,8 @@ static int do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 
function,
 (1 << KVM_FEATURE_CLOCKSOURCE2) |
 (1 << KVM_FEATURE_ASYNC_PF) |
 (1 << KVM_FEATURE_PV_EOI) |
-(1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
+(1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
+(1 << KVM_FEATURE_PV_UNHALT);
 
if (sched_info_on())
	entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d21bce5..1e73dab 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5495,6 +5495,36 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
return 1;
 }
 
+/*
+ * kvm_pv_kick_cpu_op:  Kick a vcpu.
+ *
+ * @apicid - apicid of vcpu to be kicked.
+ */
+static void kvm_pv_kick_cpu_op(struct kvm *kvm, unsigned long flags, int 
apicid)
+{
+   struct kvm_vcpu *vcpu = NULL;
+   int i;
+
+   kvm_for_each_vcpu(i, vcpu, kvm) {
+   if (!kvm_apic_present(vcpu))
+   continue;
+
+   if (kvm_apic_match_dest(vcpu, 0, 0, apicid, 0))
+   break;
+   }
+   if (vcpu) {
+   /*
+* Setting unhalt flag here can result in spurious runnable
+* state when unhalt reset does not happen in vcpu_block.
+* But that is harmless since that should soon result in halt.
+*/
+   vcpu->arch.pv.pv_unhalted = true;
+   /* We need everybody see unhalt before vcpu unblocks */
+   smp_wmb();
+   kvm_vcpu_kick(vcpu);
+   }
+}
+
 int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 {
unsigned long nr, a0, a1, a2, a3, ret;
@@ -5528,6 +5558,10 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
case KVM_HC_VAPIC_POLL_IRQ:
ret = 0;
break;
+   case KVM_HC_KICK_CPU:
+   kvm_pv_kick_cpu_op(vcpu->kvm, a0, a1);
+   ret = 0;
+   break;
default:
ret = -KVM_ENOSYS;
break;
@@ -5950,6 +5984,7 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
kvm_apic_accept_events(vcpu);
	switch(vcpu->arch.mp_state) {
	case KVM_MP_STATE_HALTED:
+   vcpu->arch.pv.pv_unhalted = false;
	vcpu->arch.mp_state =
KVM_MP_STATE_RUNNABLE;
case KVM_MP_STATE_RUNNABLE:
@@ -6249,7 +6284,12 @@ int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu 
*vcpu,
struct kvm_mp_state *mp_state)
 {
kvm_apic_accept_events(vcpu);
-   mp_state->mp_state = vcpu->arch.mp_state;
+   if (vcpu->arch.mp_state == KVM_MP_STATE_HALTED &&
+   vcpu->arch.pv.pv_unhalted)
+   mp_state->mp_state = KVM_MP_STATE_RUNNABLE;
+   else
+   mp_state->mp_state = vcpu->arch.mp_state;
+
return 0;
 }
 
@@ -6770,6 +6810,7 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
	BUG_ON(vcpu->kvm == NULL);
kvm = vcpu

[PATCH V13 3/4] kvm hypervisor: Simplify kvm_for_each_vcpu with kvm_irq_delivery_to_apic

2013-08-26 Thread Raghavendra K T
Note that we are using APIC_DM_REMRD which has reserved usage.
In future if APIC_DM_REMRD usage is standardized, then we should
find some other way or go back to old method.

Suggested-by: Gleb Natapov g...@redhat.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
Acked-by: Gleb Natapov g...@redhat.com
Acked-by: Ingo Molnar mi...@kernel.org
---
 arch/x86/kvm/lapic.c |  5 -
 arch/x86/kvm/x86.c   | 25 ++---
 2 files changed, 10 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index afc1124..48c13c9 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -706,7 +706,10 @@ out:
break;
 
case APIC_DM_REMRD:
-   apic_debug("Ignoring delivery mode 3\n");
+   result = 1;
+   vcpu->arch.pv.pv_unhalted = 1;
+   kvm_make_request(KVM_REQ_EVENT, vcpu);
+   kvm_vcpu_kick(vcpu);
break;
 
case APIC_DM_SMI:
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1e73dab..640d112 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5502,27 +5502,14 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
  */
 static void kvm_pv_kick_cpu_op(struct kvm *kvm, unsigned long flags, int 
apicid)
 {
-   struct kvm_vcpu *vcpu = NULL;
-   int i;
+   struct kvm_lapic_irq lapic_irq;
 
-   kvm_for_each_vcpu(i, vcpu, kvm) {
-   if (!kvm_apic_present(vcpu))
-   continue;
+   lapic_irq.shorthand = 0;
+   lapic_irq.dest_mode = 0;
+   lapic_irq.dest_id = apicid;
 
-   if (kvm_apic_match_dest(vcpu, 0, 0, apicid, 0))
-   break;
-   }
-   if (vcpu) {
-   /*
-* Setting unhalt flag here can result in spurious runnable
-* state when unhalt reset does not happen in vcpu_block.
-* But that is harmless since that should soon result in halt.
-*/
-   vcpu->arch.pv.pv_unhalted = true;
-   /* We need everybody see unhalt before vcpu unblocks */
-   smp_wmb();
-   kvm_vcpu_kick(vcpu);
-   }
+   lapic_irq.delivery_mode = APIC_DM_REMRD;
+   kvm_irq_delivery_to_apic(kvm, 0, &lapic_irq, NULL);
 }
 
 int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
-- 
1.7.11.7



[PATCH V13 4/4] Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock

2013-08-26 Thread Raghavendra K T
KVM_HC_KICK_CPU  hypercall added to wakeup halted vcpu in paravirtual spinlock
enabled guest.

KVM_FEATURE_PV_UNHALT enables guest to check whether pv spinlock can be enabled
in guest.

Thanks Vatsa for rewriting KVM_HC_KICK_CPU
Cc: Rob Landley r...@landley.net
Signed-off-by: Srivatsa Vaddagiri va...@linux.vnet.ibm.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
Acked-by: Gleb Natapov g...@redhat.com
Acked-by: Ingo Molnar mi...@kernel.org
---
 Documentation/virtual/kvm/cpuid.txt  |  4 
 Documentation/virtual/kvm/hypercalls.txt | 14 ++
 2 files changed, 18 insertions(+)

diff --git a/Documentation/virtual/kvm/cpuid.txt 
b/Documentation/virtual/kvm/cpuid.txt
index 83afe65..22ff659 100644
--- a/Documentation/virtual/kvm/cpuid.txt
+++ b/Documentation/virtual/kvm/cpuid.txt
@@ -43,6 +43,10 @@ KVM_FEATURE_CLOCKSOURCE2   || 3 || kvmclock 
available at msrs
 KVM_FEATURE_ASYNC_PF   || 4 || async pf can be enabled by
||   || writing to msr 0x4b564d02
 --
+KVM_FEATURE_PV_UNHALT  || 7 || guest checks this feature bit
+   ||   || before enabling paravirtualized
+   ||   || spinlock support.
+--
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||24 || host will warn if no guest-side
||   || per-cpu warps are expected in
||   || kvmclock.
diff --git a/Documentation/virtual/kvm/hypercalls.txt 
b/Documentation/virtual/kvm/hypercalls.txt
index ea113b5..022198e 100644
--- a/Documentation/virtual/kvm/hypercalls.txt
+++ b/Documentation/virtual/kvm/hypercalls.txt
@@ -64,3 +64,17 @@ Purpose: To enable communication between the hypervisor and 
guest there is a
 shared page that contains parts of supervisor visible register state.
 The guest can map this shared page to access its supervisor register through
 memory using this hypercall.
+
+5. KVM_HC_KICK_CPU
+
+Architecture: x86
+Status: active
+Purpose: Hypercall used to wakeup a vcpu from HLT state
+Usage example : A vcpu of a paravirtualized guest that is busywaiting in guest
+kernel mode for an event to occur (ex: a spinlock to become available) can
+execute HLT instruction once it has busy-waited for more than a threshold
+time-interval. Execution of HLT instruction would cause the hypervisor to put
+the vcpu to sleep until occurrence of an appropriate event. Another vcpu of the
+same guest can wakeup the sleeping vcpu by issuing KVM_HC_KICK_CPU hypercall,
+specifying APIC ID (a1) of the vcpu to be woken up. An additional argument (a0)
+is used in the hypercall for future use.
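
For illustration only, a guest-side pairing of HLT with this hypercall might
look like the sketch below (the example_* names are hypothetical; the real
guest implementation is in the pv-ticketlock guest patches):

/* Sketch: wait for a kick, and kick a remote vcpu by its APIC ID. */
static void example_wait_for_kick(void)
{
	if (irqs_disabled())
		halt();			/* sleep until kicked or interrupted */
	else
		safe_halt();		/* enable interrupts and halt atomically */
}

static void example_kick_vcpu(int cpu)
{
	unsigned long flags = 0;	/* a0: reserved for future use */
	int apicid = per_cpu(x86_cpu_to_apicid, cpu);

	kvm_hypercall2(KVM_HC_KICK_CPU, flags, apicid);
}
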
-- 
1.7.11.7



[PATCH V13 1/4] kvm uapi: Add KICK_CPU and PV_UNHALT definition to uapi

2013-08-26 Thread Raghavendra K T
this is needed by both guest and host.

Originally-from: Srivatsa Vaddagiri va...@linux.vnet.ibm.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
Acked-by: Gleb Natapov g...@redhat.com
Acked-by: Ingo Molnar mi...@kernel.org
---
 arch/x86/include/uapi/asm/kvm_para.h | 1 +
 include/uapi/linux/kvm_para.h| 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/x86/include/uapi/asm/kvm_para.h 
b/arch/x86/include/uapi/asm/kvm_para.h
index 06fdbd9..94dc8ca 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -23,6 +23,7 @@
 #define KVM_FEATURE_ASYNC_PF   4
 #define KVM_FEATURE_STEAL_TIME 5
 #define KVM_FEATURE_PV_EOI 6
+#define KVM_FEATURE_PV_UNHALT  7
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
index cea2c5c..2841f86 100644
--- a/include/uapi/linux/kvm_para.h
+++ b/include/uapi/linux/kvm_para.h
@@ -19,6 +19,7 @@
 #define KVM_HC_MMU_OP  2
 #define KVM_HC_FEATURES3
 #define KVM_HC_PPC_MAP_MAGIC_PAGE  4
+#define KVM_HC_KICK_CPU5
 
 /*
  * hypercalls use architecture specific
-- 
1.7.11.7



Re: [PATCH V13 0/4] Paravirtualized ticket spinlocks for KVM host

2013-08-26 Thread Raghavendra K T

On 08/26/2013 03:34 PM, Gleb Natapov wrote:

On Mon, Aug 26, 2013 at 02:18:32PM +0530, Raghavendra K T wrote:


  This series forms the kvm host part of paravirtual spinlocks,
  based against the kvm tree.

  Please refer to https://lkml.org/lkml/2013/8/9/265 for
  kvm guest and Xen, x86 part merged to -tip spinlocks.

  Please note that:
  "kvm uapi: Add KICK_CPU and PV_UNHALT definition to uapi" is a common patch
  for both guest and host.


Thanks, applied. The patchset is not against kvm.git queue though, so I
had to fix one minor conflict manually.


Thank you Gleb.



Re: [PATCH delta V13 14/14] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor

2013-08-14 Thread Raghavendra K T

On 08/14/2013 01:32 AM, Raghavendra K T wrote:


Ingo, the delta patch below should fix it; I hope you will be folding this
back into patch 14/14 itself. Else please let me know.
I have already run allnoconfig, allyesconfig, and randconfig with the patch
below, but will test again.


I did 2 more runs of allyesconfig, allnoconfig, and 40 runs of
randconfig. The patchset with the fix is looking good now.



Re: [PATCH RESEND V13 14/14] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor

2013-08-10 Thread Raghavendra K T
* Raghavendra K T raghavendra...@linux.vnet.ibm.com [2013-08-09 19:52:02]:
Resending because x86_cpu_to_apicid is defined only for SMP systems,
so fold the kvm_kick_vcpu function back into CONFIG_PARAVIRT_SPINLOCK, which
depends on SMP. (This was taken out for pv-flushtlb usage.)
---8---
From 10e92f7911a8aed5b8574f53607ffc5d094d4de1 Mon Sep 17 00:00:00 2001
From: Srivatsa Vaddagiri va...@linux.vnet.ibm.com
Date: Tue, 6 Aug 2013 14:55:41 +0530
Subject: [PATCH V13 14/14] kvm : Paravirtual ticketlocks support for linux
 guests running on KVM hypervisor

During smp_boot_cpus, a paravirtualized KVM guest detects if the hypervisor has
the required feature (KVM_FEATURE_PV_UNHALT) to support pv-ticketlocks. If so,
support for pv-ticketlocks is registered via pv_lock_ops.

Use KVM_HC_KICK_CPU hypercall to wakeup waiting/halted vcpu.

Signed-off-by: Srivatsa Vaddagiri va...@linux.vnet.ibm.com
Signed-off-by: Suzuki Poulose suz...@in.ibm.com
[Raghu: check_zero race fix, enum for kvm_contention_stat, jumplabel related 
changes,
addition of safe_halt for irq enabled case, bailout spinning in nmi case(Gleb)]
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
Acked-by: Gleb Natapov g...@redhat.com
Acked-by: Ingo Molnar mi...@kernel.org
---
 arch/x86/include/asm/kvm_para.h |  14 ++-
 arch/x86/kernel/kvm.c   | 262 
 2 files changed, 274 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 695399f..427afcb 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -118,10 +118,20 @@ void kvm_async_pf_task_wait(u32 token);
 void kvm_async_pf_task_wake(u32 token);
 u32 kvm_read_and_reset_pf_reason(void);
 extern void kvm_disable_steal_time(void);
-#else
-#define kvm_guest_init() do { } while (0)
+
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+void __init kvm_spinlock_init(void);
+#else /* !CONFIG_PARAVIRT_SPINLOCKS */
+static inline void kvm_spinlock_init(void)
+{
+}
+#endif /* CONFIG_PARAVIRT_SPINLOCKS */
+
+#else /* CONFIG_KVM_GUEST */
+#define kvm_guest_init() do {} while (0)
 #define kvm_async_pf_task_wait(T) do {} while(0)
 #define kvm_async_pf_task_wake(T) do {} while(0)
+
 static inline u32 kvm_read_and_reset_pf_reason(void)
 {
return 0;
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index a96d32c..d442471 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -34,6 +34,7 @@
 #include <linux/sched.h>
 #include <linux/slab.h>
 #include <linux/kprobes.h>
+#include <linux/debugfs.h>
 #include <asm/timer.h>
 #include <asm/cpu.h>
 #include <asm/traps.h>
@@ -419,6 +420,7 @@ static void __init kvm_smp_prepare_boot_cpu(void)
WARN_ON(kvm_register_clock("primary cpu clock"));
kvm_guest_cpu_init();
native_smp_prepare_boot_cpu();
+   kvm_spinlock_init();
 }
 
 static void kvm_guest_cpu_online(void *dummy)
@@ -523,3 +525,263 @@ static __init int activate_jump_labels(void)
return 0;
 }
 arch_initcall(activate_jump_labels);
+
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+
+/* Kick a cpu by its apicid. Used to wake up a halted vcpu */
+void kvm_kick_cpu(int cpu)
+{
+   int apicid;
+   unsigned long flags = 0;
+
+   apicid = per_cpu(x86_cpu_to_apicid, cpu);
+   kvm_hypercall2(KVM_HC_KICK_CPU, flags, apicid);
+}
+
+enum kvm_contention_stat {
+   TAKEN_SLOW,
+   TAKEN_SLOW_PICKUP,
+   RELEASED_SLOW,
+   RELEASED_SLOW_KICKED,
+   NR_CONTENTION_STATS
+};
+
+#ifdef CONFIG_KVM_DEBUG_FS
+#define HISTO_BUCKETS  30
+
+static struct kvm_spinlock_stats
+{
+   u32 contention_stats[NR_CONTENTION_STATS];
+   u32 histo_spin_blocked[HISTO_BUCKETS+1];
+   u64 time_blocked;
+} spinlock_stats;
+
+static u8 zero_stats;
+
+static inline void check_zero(void)
+{
+   u8 ret;
+   u8 old;
+
+   old = ACCESS_ONCE(zero_stats);
+   if (unlikely(old)) {
+   ret = cmpxchg(&zero_stats, old, 0);
+   /* This ensures only one fellow resets the stat */
+   if (ret == old)
+   memset(&spinlock_stats, 0, sizeof(spinlock_stats));
+   }
+}
+
+static inline void add_stats(enum kvm_contention_stat var, u32 val)
+{
+   check_zero();
+   spinlock_stats.contention_stats[var] += val;
+}
+
+
+static inline u64 spin_time_start(void)
+{
+   return sched_clock();
+}
+
+static void __spin_time_accum(u64 delta, u32 *array)
+{
+   unsigned index;
+
+   index = ilog2(delta);
+   check_zero();
+
+   if (index < HISTO_BUCKETS)
+   array[index]++;
+   else
+   array[HISTO_BUCKETS]++;
+}
+
+static inline void spin_time_accum_blocked(u64 start)
+{
+   u32 delta;
+
+   delta = sched_clock() - start;
+   __spin_time_accum(delta, spinlock_stats.histo_spin_blocked);
+   spinlock_stats.time_blocked += delta;
+}
+
+static struct dentry *d_spin_debug;
+static struct dentry *d_kvm_debug;
+
+struct dentry *kvm_init_debugfs(void

[PATCH RESEND V13 14/14] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor

2013-08-10 Thread Raghavendra K T
* Raghavendra K T raghavendra...@linux.vnet.ibm.com [2013-08-09 19:52:02]:

From 10e92f7911a8aed5b8574f53607ffc5d094d4de1 Mon Sep 17 00:00:00 2001
From: Srivatsa Vaddagiri va...@linux.vnet.ibm.com
Date: Tue, 6 Aug 2013 14:55:41 +0530
Subject: [PATCH V13 RESEND 14/14] kvm : Paravirtual ticketlocks support for 
linux
 guests running on KVM hypervisor

During smp_boot_cpus, a paravirtualized KVM guest detects if the hypervisor has
the required feature (KVM_FEATURE_PV_UNHALT) to support pv-ticketlocks. If so,
support for pv-ticketlocks is registered via pv_lock_ops.

Use KVM_HC_KICK_CPU hypercall to wakeup waiting/halted vcpu.

Signed-off-by: Srivatsa Vaddagiri va...@linux.vnet.ibm.com
Signed-off-by: Suzuki Poulose suz...@in.ibm.com
[Raghu: check_zero race fix, enum for kvm_contention_stat, jumplabel related 
changes,
addition of safe_halt for irq enabled case, bailout spinning in nmi case(Gleb)
move kvm_vcpu_kick inside CONFIG_PARAVIRT_SPINLOCK to fix UP compilation]
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
Acked-by: Gleb Natapov g...@redhat.com
Acked-by: Ingo Molnar mi...@kernel.org
---
 arch/x86/include/asm/kvm_para.h |  14 ++-
 arch/x86/kernel/kvm.c   | 262 
 2 files changed, 274 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 695399f..427afcb 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -118,10 +118,20 @@ void kvm_async_pf_task_wait(u32 token);
 void kvm_async_pf_task_wake(u32 token);
 u32 kvm_read_and_reset_pf_reason(void);
 extern void kvm_disable_steal_time(void);
-#else
-#define kvm_guest_init() do { } while (0)
+
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+void __init kvm_spinlock_init(void);
+#else /* !CONFIG_PARAVIRT_SPINLOCKS */
+static inline void kvm_spinlock_init(void)
+{
+}
+#endif /* CONFIG_PARAVIRT_SPINLOCKS */
+
+#else /* CONFIG_KVM_GUEST */
+#define kvm_guest_init() do {} while (0)
 #define kvm_async_pf_task_wait(T) do {} while(0)
 #define kvm_async_pf_task_wake(T) do {} while(0)
+
 static inline u32 kvm_read_and_reset_pf_reason(void)
 {
return 0;
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index a96d32c..d442471 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -34,6 +34,7 @@
 #include <linux/sched.h>
 #include <linux/slab.h>
 #include <linux/kprobes.h>
+#include <linux/debugfs.h>
 #include <asm/timer.h>
 #include <asm/cpu.h>
 #include <asm/traps.h>
@@ -419,6 +420,7 @@ static void __init kvm_smp_prepare_boot_cpu(void)
WARN_ON(kvm_register_clock("primary cpu clock"));
kvm_guest_cpu_init();
native_smp_prepare_boot_cpu();
+   kvm_spinlock_init();
 }
 
 static void kvm_guest_cpu_online(void *dummy)
@@ -523,3 +525,263 @@ static __init int activate_jump_labels(void)
return 0;
 }
 arch_initcall(activate_jump_labels);
+
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+
+/* Kick a cpu by its apicid. Used to wake up a halted vcpu */
+void kvm_kick_cpu(int cpu)
+{
+   int apicid;
+   unsigned long flags = 0;
+
+   apicid = per_cpu(x86_cpu_to_apicid, cpu);
+   kvm_hypercall2(KVM_HC_KICK_CPU, flags, apicid);
+}
+
+enum kvm_contention_stat {
+   TAKEN_SLOW,
+   TAKEN_SLOW_PICKUP,
+   RELEASED_SLOW,
+   RELEASED_SLOW_KICKED,
+   NR_CONTENTION_STATS
+};
+
+#ifdef CONFIG_KVM_DEBUG_FS
+#define HISTO_BUCKETS  30
+
+static struct kvm_spinlock_stats
+{
+   u32 contention_stats[NR_CONTENTION_STATS];
+   u32 histo_spin_blocked[HISTO_BUCKETS+1];
+   u64 time_blocked;
+} spinlock_stats;
+
+static u8 zero_stats;
+
+static inline void check_zero(void)
+{
+   u8 ret;
+   u8 old;
+
+   old = ACCESS_ONCE(zero_stats);
+   if (unlikely(old)) {
+   ret = cmpxchg(&zero_stats, old, 0);
+   /* This ensures only one fellow resets the stat */
+   if (ret == old)
+   memset(&spinlock_stats, 0, sizeof(spinlock_stats));
+   }
+}
+
+static inline void add_stats(enum kvm_contention_stat var, u32 val)
+{
+   check_zero();
+   spinlock_stats.contention_stats[var] += val;
+}
+
+
+static inline u64 spin_time_start(void)
+{
+   return sched_clock();
+}
+
+static void __spin_time_accum(u64 delta, u32 *array)
+{
+   unsigned index;
+
+   index = ilog2(delta);
+   check_zero();
+
+   if (index < HISTO_BUCKETS)
+   array[index]++;
+   else
+   array[HISTO_BUCKETS]++;
+}
+
+static inline void spin_time_accum_blocked(u64 start)
+{
+   u32 delta;
+
+   delta = sched_clock() - start;
+   __spin_time_accum(delta, spinlock_stats.histo_spin_blocked);
+   spinlock_stats.time_blocked += delta;
+}
+
+static struct dentry *d_spin_debug;
+static struct dentry *d_kvm_debug;
+
+struct dentry *kvm_init_debugfs(void)
+{
+   d_kvm_debug = debugfs_create_dir(kvm, NULL);
+   if (!d_kvm_debug)
+   printk(KERN_WARNING Could not create 'kvm

Re: [PATCH V12 0/14] Paravirtualized ticket spinlocks

2013-08-09 Thread Raghavendra K T

On 08/09/2013 04:34 AM, H. Peter Anvin wrote:


Okay, I figured it out.

One of several problems with the formatting of this patchset is that it
has one- and two-digit patch numbers in the headers, which meant that my
scripts tried to apply patch 10 first.



My bad. I 'll send out in uniform digit form next time.



Re: [PATCH V12 0/14] Paravirtualized ticket spinlocks

2013-08-09 Thread Raghavendra K T

On 08/09/2013 06:30 PM, Konrad Rzeszutek Wilk wrote:

My bad. I 'll send out in uniform digit form next time.



If you use 'git format-patch --subject-prefix PATCH V14 v3.11-rc4..'
and 'git send-email --subject [PATCH V14] bla blah ..'
that should be automatically taken care of?



Thanks Konrad.
Yes. it should. I am planning to use git send-email instead of custom
script.




Re: [PATCH V12 0/14] Paravirtualized ticket spinlocks

2013-08-09 Thread Raghavendra K T

On 08/09/2013 06:34 AM, H. Peter Anvin wrote:

The kbuild test bot is reporting some pretty serious errors for this
patchset.  I think these are serious enough that the patchset will need
to be respun.



There were two problems:
(1) we were including spinlock_types.h in 
arch/x86/include/asm/paravirt_types.h.


This was resulting in redefinition of arch_spinlock_t for non-SMP kernels.

The solution was conditional inclusion of spinlock_types.h, like below:

#ifdef CONFIG_SMP
#include <asm/spinlock_types.h>
#else
typedef u16 __ticket_t;
#endif

(we needed __ticket_t declaration for UP)

(2) we had a forward declaration of atomic_read in jump_label.
It was causing a problem for alpha, which has a macro for atomic_read
instead of a static inline.

Solution was to remove forward declaration and have atomic.h included.

Peter,
 the above two changes need small changes in patch 1 and patch 9.
I 'll be resending only those two patches.

I am happy to resend the whole series if it is better and easier for
you.



Re: [PATCH V12 0/14] Paravirtualized ticket spinlocks

2013-08-09 Thread Raghavendra K T

On 08/09/2013 06:34 AM, H. Peter Anvin wrote:

The kbuild test bot is reporting some pretty serious errors for this
patchset.  I think these are serious enough that the patchset will need
to be respun.



Sent V13, there were 3 patches in total that changed due to dependency.




[PATCH V13 00/14] Paravirtualized ticket spinlocks

2013-08-09 Thread Raghavendra K T
This series replaces the existing paravirtualized spinlock mechanism
with a paravirtualized ticketlock mechanism. The series provides
implementation for both Xen and KVM.

The current set of patches are for Xen/x86 spinlock/KVM guest side, to be 
included
against -tip.

A separate patchset for KVM host based on kvm tree is already sent.

Results:
===
setup: 32 core machine with 32 vcpu KVM guest (HT off)  with 8GB RAM
base = 3.11-rc
patched = base + V12 series

+-----------------+----------------+--------+
     dbench (Throughput in MB/sec. Higher is better)
+-----------------+----------------+--------+
|   base (stdev %)|patched(stdev%) | %gain  |
+-----------------+----------------+--------+
| 15035.3   (0.3) |15150.0   (0.6) |   0.8  |
|  1470.0   (2.2) | 1713.7   (1.9) |  16.6  |
|   848.6   (4.3) |  967.8   (4.3) |  14.0  |
|   652.9   (3.5) |  685.3   (3.7) |   5.0  |
+-----------------+----------------+--------+

pvspinlock shows benefits for overcommit ratio > 1 for PLE enabled cases,
and undercommit results are flat.

non PLE results are much better for smaller VMs. 
http://lkml.indiana.edu/hypermail/linux/kernel/1306.3/01095.html

I would like to thank all the experts here for their time in reviewing and
commenting on the patch series (including perhaps those who have not been
listed).

Finally special thanks to KVM maintainers (Avi, Marcelo, Gleb) for
their comments and Andrew Theurer, Chegu Vinod for their valuable
testing and feedback.

The older series[4] was tested by Attilio for Xen implementation.

Note that Konrad needs to revert below two patches to enable xen on hvm 
  70dd4998, f10cd522c

Changes in V13
 - fixed arch_spinlock_t redefinition issue.
 - fixed forward declaration of atomic_read issue in jump_label

Changes in V12
 - spiltted uapi header patch.
 - dropped patch 18.
 - bailout of lock spinning in case of NMI (Gleb)
 - drop out patch 18 whose benefits are inconclusive (Gleb, Ingo)

Changes in V11:
 - use safe_halt in lock_spinning path to avoid potential problem 
  in case of irq_handlers taking lock in slowpath (Gleb)
 - add a0 flag for the kick hypercall for future extension  (Gleb)
 - add stubs for missing architecture for kvm_vcpu_schedule() (Gleb)
 - Change hypercall documentation.
 - Rebased to 3.11-rc1

Changes in V10:
Addressed Konrad's review comments:
- Added break in patch 5 since now we know exact cpu to wakeup
- Dropped patch 12 and Konrad needs to revert two patches to enable xen on hvm 
  70dd4998, f10cd522c
- Remove TIMEOUT and corrected spacing in patch 15
- Kicked spelling and correct spacing in patches 17, 18 

Changes in V9:
- Changed spin_threshold to 32k to avoid excess halt exits that are
   causing undercommit degradation (after PLE handler improvement).
- Added  kvm_irq_delivery_to_apic (suggested by Gleb)
- Optimized halt exit path to use PLE handler

V8 of PVspinlock was posted last year. After Avi's suggestions to look
at PLE handler's improvements, various optimizations in PLE handling
have been tried.

With this series we see that we could get little more improvements on top
of that.

Changes before V8:
Ticketlock change history:
Changes in V7:
 - Rebased patches to 3.4-rc3
 - Added jumplabel split patch (originally from Andrew Jones, rebased to
3.4-rc3)
 - jumplabel changes from Ingo and Jason taken and now using static_key_*
instead of static_branch.
 - using UNINLINE_SPIN_UNLOCK (which was split as per suggestion from Linus)
 - This patch series is rebased on the debugfs patch (that should be already in
Xen/linux-next https://lkml.org/lkml/2012/3/23/51)

Changes in V6 posting: (Raghavendra K T)
 - Rebased to linux-3.3-rc6.
 - used function+enum in place of macro (better type checking) 
 - use cmpxchg while resetting zero status for possible race
[suggested by Dave Hansen for KVM patches ]

KVM patch Change history:
Changes in V6:
- Rebased to 3.4-rc3
- Removed debugfs changes patch which should now be in Xen/linux-next.
  (https://lkml.org/lkml/2012/3/30/687)
- Removed PV_UNHALT_MSR since currently we don't need guest communication,
  and made pv_unhalt folded to GET_MP_STATE (Marcello, Avi[long back])
- Take jumplabel changes from Ingo/Jason into use (static_key_slow_inc usage)
- Added inline to spinlock_init in non PARAVIRT case
- Move arch specific code to arch/x86 and add stubs to other archs (Marcello)
- Added more comments on pv_unhalt usage etc (Marcello)

Changes in V5:
- rebased to 3.3-rc6
- added PV_UNHALT_MSR that would help in live migration (Avi)
- removed PV_LOCK_KICK vcpu request and pv_unhalt flag (re)added.
- Changed hypercall documentation (Alex).
- mode_t changed to umode_t in debugfs.
- MSR related documentation added.
- rename PV_LOCK_KICK to PV_UNHALT. 
- host and guest patches not mixed. (Marcelo, Alex)
- kvm_kick_cpu now takes cpu so it can be used by flush_tlb_ipi_other 
   paravirtualization (Nikunj)
- coding style changes in variable declaration etc (Srikar)

Changes in V4:
- rebased to 3.2.0 pre.
- use APIC ID

[PATCH V13 02/14] x86/ticketlock: Don't inline _spin_unlock when using paravirt spinlocks

2013-08-09 Thread Raghavendra K T
The code size expands somewhat, and it's better to just call
a function rather than inline it.

Thanks Jeremy for original version of ARCH_NOINLINE_SPIN_UNLOCK config patch,
which is simplified.

Suggested-by: Linus Torvalds torva...@linux-foundation.org
Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
Acked-by: Ingo Molnar mi...@kernel.org
---
 arch/x86/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 7f3867c..724ce55 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -663,6 +663,7 @@ config PARAVIRT_DEBUG
 config PARAVIRT_SPINLOCKS
bool "Paravirtualization layer for spinlocks"
depends on PARAVIRT && SMP
+   select UNINLINE_SPIN_UNLOCK
---help---
  Paravirtualized spinlocks allow a pvops backend to replace the
  spinlock implementation with something virtualization-friendly
-- 
1.7.11.7



[PATCH V13 03/14] x86/ticketlock: Collapse a layer of functions

2013-08-09 Thread Raghavendra K T
From: Jeremy Fitzhardinge jer...@goop.org

Now that the paravirtualization layer doesn't exist at the spinlock
level any more, we can collapse the __ticket_ functions into the arch_
functions.

Signed-off-by: Jeremy Fitzhardinge jer...@goop.org
Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Tested-by: Attilio Rao attilio@citrix.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
Acked-by: Ingo Molnar mi...@kernel.org
---
 arch/x86/include/asm/spinlock.h | 35 +--
 1 file changed, 5 insertions(+), 30 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 4d54244..7442410 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -76,7 +76,7 @@ static __always_inline void __ticket_unlock_kick(struct 
arch_spinlock *lock,
  * in the high part, because a wide xadd increment of the low part would carry
  * up and contaminate the high part.
  */
-static __always_inline void __ticket_spin_lock(struct arch_spinlock *lock)
+static __always_inline void arch_spin_lock(struct arch_spinlock *lock)
 {
register struct __raw_tickets inc = { .tail = 1 };
 
@@ -96,7 +96,7 @@ static __always_inline void __ticket_spin_lock(struct 
arch_spinlock *lock)
 out:   barrier();  /* make sure nothing creeps before the lock is taken */
 }
 
-static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
+static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
 {
arch_spinlock_t old, new;
 
@@ -110,7 +110,7 @@ static __always_inline int 
__ticket_spin_trylock(arch_spinlock_t *lock)
return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) == 
old.head_tail;
 }
 
-static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
+static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
 {
__ticket_t next = lock->tickets.head + 1;
 
@@ -118,46 +118,21 @@ static __always_inline void 
__ticket_spin_unlock(arch_spinlock_t *lock)
__ticket_unlock_kick(lock, next);
 }
 
-static inline int __ticket_spin_is_locked(arch_spinlock_t *lock)
+static inline int arch_spin_is_locked(arch_spinlock_t *lock)
 {
struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
 
return tmp.tail != tmp.head;
 }
 
-static inline int __ticket_spin_is_contended(arch_spinlock_t *lock)
+static inline int arch_spin_is_contended(arch_spinlock_t *lock)
 {
struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
 
return (__ticket_t)(tmp.tail - tmp.head) > 1;
 }
-
-static inline int arch_spin_is_locked(arch_spinlock_t *lock)
-{
-   return __ticket_spin_is_locked(lock);
-}
-
-static inline int arch_spin_is_contended(arch_spinlock_t *lock)
-{
-   return __ticket_spin_is_contended(lock);
-}
 #define arch_spin_is_contended arch_spin_is_contended
 
-static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
-{
-   __ticket_spin_lock(lock);
-}
-
-static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
-{
-   return __ticket_spin_trylock(lock);
-}
-
-static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
-{
-   __ticket_spin_unlock(lock);
-}
-
 static __always_inline void arch_spin_lock_flags(arch_spinlock_t *lock,
  unsigned long flags)
 {
-- 
1.7.11.7



[PATCH V13 04/14] xen: Defer spinlock setup until boot CPU setup

2013-08-09 Thread Raghavendra K T
From: Jeremy Fitzhardinge jer...@goop.org

There's no need to do it at very early init, and doing it there
makes it impossible to use the jump_label machinery.

Signed-off-by: Jeremy Fitzhardinge jer...@goop.org
Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
Acked-by: Ingo Molnar mi...@kernel.org
---
 arch/x86/xen/smp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index ca92754..3b52d80 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -279,6 +279,7 @@ static void __init xen_smp_prepare_boot_cpu(void)
 
xen_filter_cpu_maps();
xen_setup_vcpu_info_placement();
+   xen_init_spinlocks();
 }
 
 static void __init xen_smp_prepare_cpus(unsigned int max_cpus)
@@ -680,7 +681,6 @@ void __init xen_smp_init(void)
 {
smp_ops = xen_smp_ops;
xen_fill_possible_map();
-   xen_init_spinlocks();
 }
 
 static void __init xen_hvm_smp_prepare_cpus(unsigned int max_cpus)
-- 
1.7.11.7



[PATCH V13 05/14] xen/pvticketlock: Xen implementation for PV ticket locks

2013-08-09 Thread Raghavendra K T
From: Jeremy Fitzhardinge jer...@goop.org

Replace the old Xen implementation of PV spinlocks with an implementation
of xen_lock_spinning and xen_unlock_kick.

xen_lock_spinning simply registers the cpu in its entry in lock_waiting,
adds itself to the waiting_cpus set, and blocks on an event channel
until the channel becomes pending.

xen_unlock_kick searches the cpus in waiting_cpus looking for the one
which next wants this lock with the next ticket, if any.  If found,
it kicks it by making its event channel pending, which wakes it up.

We need to make sure interrupts are disabled while we're relying on the
contents of the per-cpu lock_waiting values, otherwise an interrupt
handler could come in, try to take some other lock, block, and overwrite
our values.
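
Condensed, the mechanism described above is roughly the sketch below (the
example_* names are placeholders, and irq setup, stats, and the careful
ordering are omitted; the real code is in the diff that follows):

static void example_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
{
	struct xen_lock_waiting *w = this_cpu_ptr(&lock_waiting);
	int irq = __this_cpu_read(lock_kicker_irq);
	int cpu = smp_processor_id();

	/* Advertise which ticket this cpu is waiting for... */
	w->want = want;
	w->lock = lock;
	cpumask_set_cpu(cpu, &waiting_cpus);

	xen_clear_irq_pending(irq);
	/* ...and block until the unlocker makes our event channel pending. */
	xen_poll_irq(irq);

	cpumask_clear_cpu(cpu, &waiting_cpus);
	w->lock = NULL;
}

static void example_unlock_kick(struct arch_spinlock *lock, __ticket_t next)
{
	int cpu;

	for_each_cpu(cpu, &waiting_cpus) {
		const struct xen_lock_waiting *w = &per_cpu(lock_waiting, cpu);

		if (w->lock == lock && w->want == next) {
			/* Wake exactly the vcpu holding the next ticket. */
			xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
			break;
		}
	}
}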

Signed-off-by: Jeremy Fitzhardinge jer...@goop.org
Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
 [ Raghavendra:  use function + enum instead of macro, cmpxchg for zero status 
reset
Reintroduce break since we know the exact vCPU to send IPI as suggested by 
Konrad.]
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
Acked-by: Ingo Molnar mi...@kernel.org
---
 arch/x86/xen/spinlock.c | 348 +++-
 1 file changed, 79 insertions(+), 269 deletions(-)

diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index d509629..a458729 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -17,45 +17,44 @@
 #include xen-ops.h
 #include debugfs.h
 
-#ifdef CONFIG_XEN_DEBUG_FS
-static struct xen_spinlock_stats
-{
-   u64 taken;
-   u32 taken_slow;
-   u32 taken_slow_nested;
-   u32 taken_slow_pickup;
-   u32 taken_slow_spurious;
-   u32 taken_slow_irqenable;
+enum xen_contention_stat {
+   TAKEN_SLOW,
+   TAKEN_SLOW_PICKUP,
+   TAKEN_SLOW_SPURIOUS,
+   RELEASED_SLOW,
+   RELEASED_SLOW_KICKED,
+   NR_CONTENTION_STATS
+};
 
-   u64 released;
-   u32 released_slow;
-   u32 released_slow_kicked;
 
+#ifdef CONFIG_XEN_DEBUG_FS
 #define HISTO_BUCKETS  30
-   u32 histo_spin_total[HISTO_BUCKETS+1];
-   u32 histo_spin_spinning[HISTO_BUCKETS+1];
+static struct xen_spinlock_stats
+{
+   u32 contention_stats[NR_CONTENTION_STATS];
u32 histo_spin_blocked[HISTO_BUCKETS+1];
-
-   u64 time_total;
-   u64 time_spinning;
u64 time_blocked;
 } spinlock_stats;
 
 static u8 zero_stats;
 
-static unsigned lock_timeout = 1 << 10;
-#define TIMEOUT lock_timeout
-
 static inline void check_zero(void)
 {
-   if (unlikely(zero_stats)) {
-   memset(spinlock_stats, 0, sizeof(spinlock_stats));
-   zero_stats = 0;
+   u8 ret;
+   u8 old = ACCESS_ONCE(zero_stats);
+   if (unlikely(old)) {
+   ret = cmpxchg(&zero_stats, old, 0);
+   /* This ensures only one fellow resets the stat */
+   if (ret == old)
+   memset(&spinlock_stats, 0, sizeof(spinlock_stats));
}
 }
 
-#define ADD_STATS(elem, val)   \
-   do { check_zero(); spinlock_stats.elem += (val); } while(0)
+static inline void add_stats(enum xen_contention_stat var, u32 val)
+{
+   check_zero();
+   spinlock_stats.contention_stats[var] += val;
+}
 
 static inline u64 spin_time_start(void)
 {
@@ -74,22 +73,6 @@ static void __spin_time_accum(u64 delta, u32 *array)
array[HISTO_BUCKETS]++;
 }
 
-static inline void spin_time_accum_spinning(u64 start)
-{
-   u32 delta = xen_clocksource_read() - start;
-
-   __spin_time_accum(delta, spinlock_stats.histo_spin_spinning);
-   spinlock_stats.time_spinning += delta;
-}
-
-static inline void spin_time_accum_total(u64 start)
-{
-   u32 delta = xen_clocksource_read() - start;
-
-   __spin_time_accum(delta, spinlock_stats.histo_spin_total);
-   spinlock_stats.time_total += delta;
-}
-
 static inline void spin_time_accum_blocked(u64 start)
 {
u32 delta = xen_clocksource_read() - start;
@@ -99,19 +82,15 @@ static inline void spin_time_accum_blocked(u64 start)
 }
 #else  /* !CONFIG_XEN_DEBUG_FS */
 #define TIMEOUT (1 << 10)
-#define ADD_STATS(elem, val)   do { (void)(val); } while(0)
+static inline void add_stats(enum xen_contention_stat var, u32 val)
+{
+}
 
 static inline u64 spin_time_start(void)
 {
return 0;
 }
 
-static inline void spin_time_accum_total(u64 start)
-{
-}
-static inline void spin_time_accum_spinning(u64 start)
-{
-}
 static inline void spin_time_accum_blocked(u64 start)
 {
 }
@@ -134,230 +113,84 @@ typedef u16 xen_spinners_t;
asm(LOCK_PREFIX " decw %0" : "+m" ((xl)->spinners) : : "memory");
 #endif
 
-struct xen_spinlock {
-   unsigned char lock; /* 0 -> free; 1 -> locked */
-   xen_spinners_t spinners;/* count of waiting cpus */
+struct xen_lock_waiting {
+   struct arch_spinlock *lock;
+   __ticket_t want;
 };
 
 static DEFINE_PER_CPU(int, lock_kicker_irq) = -1;
-
-#if 0

[PATCH V13 09/14] jump_label: Split jumplabel ratelimit

2013-08-09 Thread Raghavendra K T
From: Andrew Jones drjo...@redhat.com

Commit b202952075f62603bea9bfb6ebc6b0420db11949 (perf, core: Rate limit
perf_sched_events jump_label patching) introduced rate limiting
for jump label disabling. The changes were made in the jump label code
in order to be more widely available and to keep things tidier. This is
all fine, except now jump_label.h includes linux/workqueue.h, which
makes it impossible to include jump_label.h from anything that
workqueue.h needs. For example, it's now impossible to include
jump_label.h from asm/spinlock.h, which is done in proposed
pv-ticketlock patches. This patch splits out the rate limiting related
changes from jump_label.h into a new file, jump_label_ratelimit.h, to
resolve the issue.
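
For illustration, after this split a user of the deferred API includes the new
header (the example_* names below are hypothetical):

#include <linux/jump_label_ratelimit.h>

static struct static_key_deferred example_key;

static void example_disable_later(void)
{
	/* Batch disables: at most one patching operation per HZ jiffies. */
	jump_label_rate_limit(&example_key, HZ);
	static_key_slow_dec_deferred(&example_key);
}

while a header such as asm/spinlock.h can now include plain
<linux/jump_label.h> without dragging in linux/workqueue.h.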

Signed-off-by: Andrew Jones drjo...@redhat.com
Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
Acked-by: Ingo Molnar mi...@kernel.org
---
 include/linux/jump_label.h   | 28 +---
 include/linux/jump_label_ratelimit.h | 34 ++
 include/linux/perf_event.h   |  1 +
 kernel/jump_label.c  |  1 +
 4 files changed, 37 insertions(+), 27 deletions(-)
 create mode 100644 include/linux/jump_label_ratelimit.h

diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
index 0976fc4..a507907 100644
--- a/include/linux/jump_label.h
+++ b/include/linux/jump_label.h
@@ -48,7 +48,6 @@
 
 #include <linux/types.h>
 #include <linux/compiler.h>
-#include <linux/workqueue.h>
 
 #if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
 
@@ -61,12 +60,6 @@ struct static_key {
 #endif
 };
 
-struct static_key_deferred {
-   struct static_key key;
-   unsigned long timeout;
-   struct delayed_work work;
-};
-
 # include <asm/jump_label.h>
 # define HAVE_JUMP_LABEL
 #endif /* CC_HAVE_ASM_GOTO && CONFIG_JUMP_LABEL */
@@ -78,6 +71,7 @@ enum jump_label_type {
 
 struct module;
 
+#include <linux/atomic.h>
 #ifdef HAVE_JUMP_LABEL
 
 #define JUMP_LABEL_TRUE_BRANCH 1UL
@@ -119,10 +113,7 @@ extern void arch_jump_label_transform_static(struct 
jump_entry *entry,
 extern int jump_label_text_reserved(void *start, void *end);
 extern void static_key_slow_inc(struct static_key *key);
 extern void static_key_slow_dec(struct static_key *key);
-extern void static_key_slow_dec_deferred(struct static_key_deferred *key);
 extern void jump_label_apply_nops(struct module *mod);
-extern void
-jump_label_rate_limit(struct static_key_deferred *key, unsigned long rl);
 
 #define STATIC_KEY_INIT_TRUE ((struct static_key) \
{ .enabled = ATOMIC_INIT(1), .entries = (void *)1 })
@@ -131,8 +122,6 @@ jump_label_rate_limit(struct static_key_deferred *key, 
unsigned long rl);
 
 #else  /* !HAVE_JUMP_LABEL */
 
-#include <linux/atomic.h>
-
 struct static_key {
atomic_t enabled;
 };
@@ -141,10 +130,6 @@ static __always_inline void jump_label_init(void)
 {
 }
 
-struct static_key_deferred {
-   struct static_key  key;
-};
-
 static __always_inline bool static_key_false(struct static_key *key)
 {
if (unlikely(atomic_read(&key->enabled)) > 0)
@@ -169,11 +154,6 @@ static inline void static_key_slow_dec(struct static_key 
*key)
atomic_dec(&key->enabled);
 }
 
-static inline void static_key_slow_dec_deferred(struct static_key_deferred 
*key)
-{
-   static_key_slow_dec(&key->key);
-}
-
 static inline int jump_label_text_reserved(void *start, void *end)
 {
return 0;
@@ -187,12 +167,6 @@ static inline int jump_label_apply_nops(struct module *mod)
return 0;
 }
 
-static inline void
-jump_label_rate_limit(struct static_key_deferred *key,
-   unsigned long rl)
-{
-}
-
 #define STATIC_KEY_INIT_TRUE ((struct static_key) \
{ .enabled = ATOMIC_INIT(1) })
 #define STATIC_KEY_INIT_FALSE ((struct static_key) \
diff --git a/include/linux/jump_label_ratelimit.h 
b/include/linux/jump_label_ratelimit.h
new file mode 100644
index 000..1137883
--- /dev/null
+++ b/include/linux/jump_label_ratelimit.h
@@ -0,0 +1,34 @@
+#ifndef _LINUX_JUMP_LABEL_RATELIMIT_H
+#define _LINUX_JUMP_LABEL_RATELIMIT_H
+
+#include <linux/jump_label.h>
+#include <linux/workqueue.h>
+
+#if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
+struct static_key_deferred {
+   struct static_key key;
+   unsigned long timeout;
+   struct delayed_work work;
+};
+#endif
+
+#ifdef HAVE_JUMP_LABEL
+extern void static_key_slow_dec_deferred(struct static_key_deferred *key);
+extern void
+jump_label_rate_limit(struct static_key_deferred *key, unsigned long rl);
+
+#else  /* !HAVE_JUMP_LABEL */
+struct static_key_deferred {
+   struct static_key  key;
+};
+static inline void static_key_slow_dec_deferred(struct static_key_deferred 
*key)
+{
+   static_key_slow_dec(&key->key);
+}
+static inline void
+jump_label_rate_limit(struct static_key_deferred *key,
+   unsigned long rl)
+{
+}
+#endif /* HAVE_JUMP_LABEL */
+#endif

[PATCH V13 08/14] x86/pvticketlock: When paravirtualizing ticket locks, increment by 2

2013-08-09 Thread Raghavendra K T
From: Jeremy Fitzhardinge jer...@goop.org

Increment ticket head/tails by 2 rather than 1 to leave the LSB free
to store an "is in slowpath" state bit.  This halves the number
of possible CPUs for a given ticket size, but this shouldn't matter
in practice - kernels built for 32k+ CPU systems are probably
specially built for the hardware rather than a generic distro
kernel.
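
Concretely, ticket values now advance in steps of two and bit 0 of the tail
is left free; a rough illustration, assuming the TICKET_SLOWPATH_FLAG
definition that a later patch in this series adds:

/* Sketch only: bit 0 of tail means "a waiter is (or was) in the slowpath". */
#define TICKET_SLOWPATH_FLAG	((__ticket_t)1)

static inline bool example_slowpath_flag_set(arch_spinlock_t *lock)
{
	/* head/tail move by TICKET_LOCK_INC (2), so bit 0 never carries
	 * into the ticket count and can be tested independently. */
	return ACCESS_ONCE(lock->tickets.tail) & TICKET_SLOWPATH_FLAG;
}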

Signed-off-by: Jeremy Fitzhardinge jer...@goop.org
Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Tested-by: Attilio Rao attilio@citrix.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
Acked-by: Ingo Molnar mi...@kernel.org
---
 arch/x86/include/asm/spinlock.h   | 10 +-
 arch/x86/include/asm/spinlock_types.h | 10 +-
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 7442410..04a5cd5 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -78,7 +78,7 @@ static __always_inline void __ticket_unlock_kick(struct 
arch_spinlock *lock,
  */
 static __always_inline void arch_spin_lock(struct arch_spinlock *lock)
 {
-   register struct __raw_tickets inc = { .tail = 1 };
+   register struct __raw_tickets inc = { .tail = TICKET_LOCK_INC };
 
inc = xadd(&lock->tickets, inc);
 
@@ -104,7 +104,7 @@ static __always_inline int 
arch_spin_trylock(arch_spinlock_t *lock)
if (old.tickets.head != old.tickets.tail)
return 0;
 
-   new.head_tail = old.head_tail + (1 << TICKET_SHIFT);
+   new.head_tail = old.head_tail + (TICKET_LOCK_INC << TICKET_SHIFT);
 
/* cmpxchg is a full barrier, so nothing can move before it */
return cmpxchg(lock-head_tail, old.head_tail, new.head_tail) == 
old.head_tail;
@@ -112,9 +112,9 @@ static __always_inline int 
arch_spin_trylock(arch_spinlock_t *lock)
 
 static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
 {
-   __ticket_t next = lock->tickets.head + 1;
+   __ticket_t next = lock->tickets.head + TICKET_LOCK_INC;
 
-   __add(&lock->tickets.head, 1, UNLOCK_LOCK_PREFIX);
+   __add(&lock->tickets.head, TICKET_LOCK_INC, UNLOCK_LOCK_PREFIX);
__ticket_unlock_kick(lock, next);
 }
 
@@ -129,7 +129,7 @@ static inline int arch_spin_is_contended(arch_spinlock_t 
*lock)
 {
struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
 
-   return (__ticket_t)(tmp.tail - tmp.head) > 1;
+   return (__ticket_t)(tmp.tail - tmp.head) > TICKET_LOCK_INC;
 }
 #define arch_spin_is_contended arch_spin_is_contended
 
diff --git a/arch/x86/include/asm/spinlock_types.h 
b/arch/x86/include/asm/spinlock_types.h
index 83fd3c7..e96fcbd 100644
--- a/arch/x86/include/asm/spinlock_types.h
+++ b/arch/x86/include/asm/spinlock_types.h
@@ -3,7 +3,13 @@
 
 #include linux/types.h
 
-#if (CONFIG_NR_CPUS < 256)
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+#define __TICKET_LOCK_INC  2
+#else
+#define __TICKET_LOCK_INC  1
+#endif
+
+#if (CONFIG_NR_CPUS < (256 / __TICKET_LOCK_INC))
 typedef u8  __ticket_t;
 typedef u16 __ticketpair_t;
 #else
@@ -11,6 +17,8 @@ typedef u16 __ticket_t;
 typedef u32 __ticketpair_t;
 #endif
 
+#define TICKET_LOCK_INC((__ticket_t)__TICKET_LOCK_INC)
+
 #define TICKET_SHIFT   (sizeof(__ticket_t) * 8)
 
 typedef struct arch_spinlock {
-- 
1.7.11.7



[PATCH V13 07/14] x86/pvticketlock: Use callee-save for lock_spinning

2013-08-09 Thread Raghavendra K T
From: Jeremy Fitzhardinge jer...@goop.org

Although the lock_spinning calls in the spinlock code are on the
uncommon path, their presence can cause the compiler to generate many
more register save/restores in the function pre/postamble, which is in
the fast path.  To avoid this, convert it to using the pvops callee-save
calling convention, which defers all the save/restores until the actual
function is called, keeping the fastpath clean.

Signed-off-by: Jeremy Fitzhardinge jer...@goop.org
Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Tested-by: Attilio Rao attilio@citrix.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
Acked-by: Ingo Molnar mi...@kernel.org
---
 arch/x86/include/asm/paravirt.h   | 2 +-
 arch/x86/include/asm/paravirt_types.h | 2 +-
 arch/x86/kernel/paravirt-spinlocks.c  | 2 +-
 arch/x86/xen/spinlock.c   | 3 ++-
 4 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 040e72d..7131e12c 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -715,7 +715,7 @@ static inline void __set_fixmap(unsigned /* enum 
fixed_addresses */ idx,
 static __always_inline void __ticket_lock_spinning(struct arch_spinlock *lock,
__ticket_t ticket)
 {
-   PVOP_VCALL2(pv_lock_ops.lock_spinning, lock, ticket);
+   PVOP_VCALLEE2(pv_lock_ops.lock_spinning, lock, ticket);
 }
 
 static __always_inline void ticket_unlock_kick(struct arch_spinlock *lock,
diff --git a/arch/x86/include/asm/paravirt_types.h 
b/arch/x86/include/asm/paravirt_types.h
index 346a07c..04ac40e 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -334,7 +334,7 @@ typedef u16 __ticket_t;
 #endif
 
 struct pv_lock_ops {
-   void (*lock_spinning)(struct arch_spinlock *lock, __ticket_t ticket);
+   struct paravirt_callee_save lock_spinning;
void (*unlock_kick)(struct arch_spinlock *lock, __ticket_t ticket);
 };
 
diff --git a/arch/x86/kernel/paravirt-spinlocks.c 
b/arch/x86/kernel/paravirt-spinlocks.c
index c2e010e..4251c1d 100644
--- a/arch/x86/kernel/paravirt-spinlocks.c
+++ b/arch/x86/kernel/paravirt-spinlocks.c
@@ -9,7 +9,7 @@
 
 struct pv_lock_ops pv_lock_ops = {
 #ifdef CONFIG_SMP
-   .lock_spinning = paravirt_nop,
+   .lock_spinning = __PV_IS_CALLEE_SAVE(paravirt_nop),
.unlock_kick = paravirt_nop,
 #endif
 };
diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index 669a971..6c8792b 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -173,6 +173,7 @@ out:
local_irq_restore(flags);
spin_time_accum_blocked(start);
 }
+PV_CALLEE_SAVE_REGS_THUNK(xen_lock_spinning);
 
 static void xen_unlock_kick(struct arch_spinlock *lock, __ticket_t next)
 {
@@ -260,7 +261,7 @@ void __init xen_init_spinlocks(void)
return;
}
 
-   pv_lock_ops.lock_spinning = xen_lock_spinning;
+   pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(xen_lock_spinning);
pv_lock_ops.unlock_kick = xen_unlock_kick;
 }
 
-- 
1.7.11.7



[PATCH V13 11/14] xen/pvticketlock: Allow interrupts to be enabled while blocking

2013-08-09 Thread Raghavendra K T
From: Jeremy Fitzhardinge jer...@goop.org

If interrupts were enabled when taking the spinlock, we can leave them
enabled while blocking to get the lock.

If we can enable interrupts while waiting for the lock to become
available, and we take an interrupt before entering the poll,
and the handler takes a spinlock which ends up going into
the slow state (invalidating the per-cpu lock and want values),
then when the interrupt handler returns the event channel will
remain pending so the poll will return immediately, causing it to
return out to the main spinlock loop.

Signed-off-by: Jeremy Fitzhardinge jer...@goop.org
Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
Acked-by: Ingo Molnar mi...@kernel.org
---
 arch/x86/xen/spinlock.c | 46 --
 1 file changed, 40 insertions(+), 6 deletions(-)

diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index 546112e..0438b93 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -142,7 +142,20 @@ static void xen_lock_spinning(struct arch_spinlock *lock, 
__ticket_t want)
 * partially setup state.
 */
local_irq_save(flags);
-
+   /*
+* We don't really care if we're overwriting some other
+* (lock,want) pair, as that would mean that we're currently
+* in an interrupt context, and the outer context had
+* interrupts enabled.  That has already kicked the VCPU out
+* of xen_poll_irq(), so it will just return spuriously and
+* retry with newly setup (lock,want).
+*
+* The ordering protocol on this is that the lock pointer
+* may only be set non-NULL if the want ticket is correct.
+* If we're updating want, we must first clear lock.
+*/
+   w->lock = NULL;
+   smp_wmb();
w->want = want;
smp_wmb();
w->lock = lock;
@@ -157,24 +170,43 @@ static void xen_lock_spinning(struct arch_spinlock *lock, 
__ticket_t want)
/* Only check lock once pending cleared */
barrier();
 
-   /* Mark entry to slowpath before doing the pickup test to make
-  sure we don't deadlock with an unlocker. */
+   /*
+* Mark entry to slowpath before doing the pickup test to make
+* sure we don't deadlock with an unlocker.
+*/
__ticket_enter_slowpath(lock);
 
-   /* check again make sure it didn't become free while
-  we weren't looking  */
+   /*
+* check again make sure it didn't become free while
+* we weren't looking
+*/
if (ACCESS_ONCE(lock->tickets.head) == want) {
add_stats(TAKEN_SLOW_PICKUP, 1);
goto out;
}
+
+   /* Allow interrupts while blocked */
+   local_irq_restore(flags);
+
+   /*
+* If an interrupt happens here, it will leave the wakeup irq
+* pending, which will cause xen_poll_irq() to return
+* immediately.
+*/
+
/* Block until irq becomes pending (or perhaps a spurious wakeup) */
xen_poll_irq(irq);
add_stats(TAKEN_SLOW_SPURIOUS, !xen_test_irq_pending(irq));
+
+   local_irq_save(flags);
+
kstat_incr_irqs_this_cpu(irq, irq_to_desc(irq));
 out:
cpumask_clear_cpu(cpu, waiting_cpus);
w->lock = NULL;
+
local_irq_restore(flags);
+
spin_time_accum_blocked(start);
 }
 PV_CALLEE_SAVE_REGS_THUNK(xen_lock_spinning);
@@ -188,7 +220,9 @@ static void xen_unlock_kick(struct arch_spinlock *lock, 
__ticket_t next)
for_each_cpu(cpu, waiting_cpus) {
const struct xen_lock_waiting *w = per_cpu(lock_waiting, cpu);
 
-   if (w->lock == lock && w->want == next) {
+   /* Make sure we read lock before want */
+   if (ACCESS_ONCE(w->lock) == lock &&
+   ACCESS_ONCE(w->want) == next) {
add_stats(RELEASED_SLOW_KICKED, 1);
xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
break;
-- 
1.7.11.7



[PATCH V13 12/14] kvm uapi: Add KICK_CPU and PV_UNHALT definition to uapi

2013-08-09 Thread Raghavendra K T
this is needed by both guest and host.

Originally-from: Srivatsa Vaddagiri va...@linux.vnet.ibm.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
Acked-by: Gleb Natapov g...@redhat.com
Acked-by: Ingo Molnar mi...@kernel.org
---
 arch/x86/include/uapi/asm/kvm_para.h | 1 +
 include/uapi/linux/kvm_para.h| 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/x86/include/uapi/asm/kvm_para.h 
b/arch/x86/include/uapi/asm/kvm_para.h
index 06fdbd9..94dc8ca 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -23,6 +23,7 @@
 #define KVM_FEATURE_ASYNC_PF   4
 #define KVM_FEATURE_STEAL_TIME 5
 #define KVM_FEATURE_PV_EOI 6
+#define KVM_FEATURE_PV_UNHALT  7
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
index cea2c5c..2841f86 100644
--- a/include/uapi/linux/kvm_para.h
+++ b/include/uapi/linux/kvm_para.h
@@ -19,6 +19,7 @@
 #define KVM_HC_MMU_OP  2
 #define KVM_HC_FEATURES3
 #define KVM_HC_PPC_MAP_MAGIC_PAGE  4
+#define KVM_HC_KICK_CPU5
 
 /*
  * hypercalls use architecture specific
-- 
1.7.11.7



[PATCH V13 06/14] xen/pvticketlocks: Add xen_nopvspin parameter to disable xen pv ticketlocks

2013-08-09 Thread Raghavendra K T
From: Jeremy Fitzhardinge jer...@goop.org

Signed-off-by: Jeremy Fitzhardinge jer...@goop.org
Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
Acked-by: Ingo Molnar mi...@kernel.org
---
 arch/x86/xen/spinlock.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index a458729..669a971 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -244,6 +244,8 @@ void xen_uninit_lock_cpu(int cpu)
per_cpu(irq_name, cpu) = NULL;
 }
 
+static bool xen_pvspin __initdata = true;
+
 void __init xen_init_spinlocks(void)
 {
/*
@@ -253,10 +255,22 @@ void __init xen_init_spinlocks(void)
if (xen_hvm_domain())
return;
 
+   if (!xen_pvspin) {
+   printk(KERN_DEBUG "xen: PV spinlocks disabled\n");
+   return;
+   }
+
pv_lock_ops.lock_spinning = xen_lock_spinning;
pv_lock_ops.unlock_kick = xen_unlock_kick;
 }
 
+static __init int xen_parse_nopvspin(char *arg)
+{
+   xen_pvspin = false;
+   return 0;
+}
early_param("xen_nopvspin", xen_parse_nopvspin);
+
 #ifdef CONFIG_XEN_DEBUG_FS
 
 static struct dentry *d_spin_debug;
-- 
1.7.11.7



[PATCH V13 10/14] x86/ticketlock: Add slowpath logic

2013-08-09 Thread Raghavendra K T
From: Jeremy Fitzhardinge jer...@goop.org

Maintain a flag in the LSB of the ticket lock tail which indicates
whether anyone is in the lock slowpath and may need kicking when
the current holder unlocks.  The flags are set when the first locker
enters the slowpath, and cleared when unlocking to an empty queue (ie,
no contention).

In the specific implementation of lock_spinning(), make sure to set
the slowpath flags on the lock just before blocking.  We must do
this before the last-chance pickup test to prevent a deadlock
with the unlocker:

Unlocker                Locker
                        test for lock pickup
                            -> fail
unlock
test slowpath
    -> false
                        set slowpath flags
                        block

Whereas this works in any ordering:

Unlocker                Locker
                        set slowpath flags
                        test for lock pickup
                            -> fail
                        block
unlock
test slowpath
    -> true, kick

If the unlocker finds that the lock has the slowpath flag set but it is
actually uncontended (ie, head == tail, so nobody is waiting), then it
clears the slowpath flag.

The unlock code uses a locked add to update the head counter.  This also
acts as a full memory barrier so that it's safe to subsequently
read back the slowflag state, knowing that the updated lock is visible
to the other CPUs.  If it were an unlocked add, then the flag read may
just be forwarded from the store buffer before it was visible to the other
CPUs, which could result in a deadlock.

Unfortunately this means we need to do a locked instruction when
unlocking with PV ticketlocks.  However, if PV ticketlocks are not
enabled, then the old non-locked add is the only unlocking code.

Note: this code relies on gcc making sure that unlikely() code is out of
line of the fastpath, which only happens when OPTIMIZE_SIZE=n.  If it
doesn't, the generated code isn't too bad, but it's definitely suboptimal.

Thanks to Srivatsa Vaddagiri for providing a bugfix to the original
version of this change, which has been folded in.
Thanks to Stephan Diestelhorst for commenting on some code which relied
on an inaccurate reading of the x86 memory ordering rules.
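
As an illustration of the "uncontended, so clear the flag" case described
above, here is a sketch under this series' definitions (not the exact helper,
which lives in the spinlock.h hunk below):

static inline void example_check_and_clear_slowpath(arch_spinlock_t *lock)
{
	arch_spinlock_t old, new;

	old.tickets = ACCESS_ONCE(lock->tickets);
	if (old.tickets.head != (old.tickets.tail & ~TICKET_SLOWPATH_FLAG))
		return;		/* somebody is still waiting; keep the flag */

	/* Nobody is waiting: try to drop the slowpath flag from the tail. */
	new.head_tail = old.head_tail &
			~((__ticketpair_t)TICKET_SLOWPATH_FLAG << TICKET_SHIFT);
	cmpxchg(&lock->head_tail, old.head_tail, new.head_tail);
}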

Signed-off-by: Jeremy Fitzhardinge jer...@goop.org
Signed-off-by: Srivatsa Vaddagiri va...@linux.vnet.ibm.com
Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Cc: Stephan Diestelhorst stephan.diestelho...@amd.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
Acked-by: Ingo Molnar mi...@kernel.org
---
 arch/x86/include/asm/paravirt.h   |  2 +-
 arch/x86/include/asm/spinlock.h   | 86 +--
 arch/x86/include/asm/spinlock_types.h |  2 +
 arch/x86/kernel/paravirt-spinlocks.c  |  3 ++
 arch/x86/xen/spinlock.c   |  6 +++
 5 files changed, 74 insertions(+), 25 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 7131e12c..401f350 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -718,7 +718,7 @@ static __always_inline void __ticket_lock_spinning(struct 
arch_spinlock *lock,
PVOP_VCALLEE2(pv_lock_ops.lock_spinning, lock, ticket);
 }
 
-static __always_inline void ticket_unlock_kick(struct arch_spinlock *lock,
+static __always_inline void __ticket_unlock_kick(struct arch_spinlock *lock,
__ticket_t ticket)
 {
PVOP_VCALL2(pv_lock_ops.unlock_kick, lock, ticket);
diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 04a5cd5..d68883d 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -1,11 +1,14 @@
 #ifndef _ASM_X86_SPINLOCK_H
 #define _ASM_X86_SPINLOCK_H
 
+#include <linux/jump_label.h>
 #include <linux/atomic.h>
 #include <asm/page.h>
 #include <asm/processor.h>
 #include <linux/compiler.h>
 #include <asm/paravirt.h>
+#include <asm/bitops.h>
+
 /*
  * Your basic SMP spinlocks, allowing only a single CPU anywhere
  *
@@ -37,32 +40,28 @@
 /* How long a lock should spin before we consider blocking */
 #define SPIN_THRESHOLD (1  15)
 
-#ifndef CONFIG_PARAVIRT_SPINLOCKS
+extern struct static_key paravirt_ticketlocks_enabled;
+static __always_inline bool static_key_false(struct static_key *key);
 
-static __always_inline void __ticket_lock_spinning(struct arch_spinlock *lock,
-   __ticket_t ticket)
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+
+static inline void __ticket_enter_slowpath(arch_spinlock_t *lock)
 {
+   set_bit(0, (volatile unsigned long *)lock-tickets.tail);
 }
 
-static __always_inline void ticket_unlock_kick(struct arch_spinlock *lock,
-__ticket_t ticket)
+#else  /* !CONFIG_PARAVIRT_SPINLOCKS */
+static

[PATCH V13 13/14] kvm guest : Add configuration support to enable debug information for KVM Guests

2013-08-09 Thread Raghavendra K T
From: Srivatsa Vaddagiri va...@linux.vnet.ibm.com

Signed-off-by: Srivatsa Vaddagiri va...@linux.vnet.ibm.com
Signed-off-by: Suzuki Poulose suz...@in.ibm.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
Acked-by: Gleb Natapov g...@redhat.com
Acked-by: Ingo Molnar mi...@kernel.org
---
 arch/x86/Kconfig | 9 +
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 724ce55..f108f00 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -688,6 +688,15 @@ config KVM_GUEST
  underlying device model, the host provides the guest with
  timing infrastructure such as time of day, and system time
 
+config KVM_DEBUG_FS
+   bool Enable debug information for KVM Guests in debugfs
+   depends on KVM_GUEST  DEBUG_FS
+   default n
+   ---help---
+ This option enables collection of various statistics for KVM guest.
+ Statistics are displayed in debugfs filesystem. Enabling this option
+ may incur significant overhead.
+
 source arch/x86/lguest/Kconfig
 
 config PARAVIRT_TIME_ACCOUNTING
-- 
1.7.11.7



[PATCH V13 14/14] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor

2013-08-09 Thread Raghavendra K T
From: Srivatsa Vaddagiri va...@linux.vnet.ibm.com

During smp_boot_cpus, the paravirtualized KVM guest detects whether the
hypervisor has the required feature (KVM_FEATURE_PV_UNHALT) to support
pv-ticketlocks. If so, support for pv-ticketlocks is registered via pv_lock_ops.

Use the KVM_HC_KICK_CPU hypercall to wake up a waiting/halted vcpu.
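
The registration itself is a short sketch along these lines (simplified;
kvm_lock_spinning()/kvm_unlock_kick() are the KVM-side waiting and kicking
helpers added by this patch):

        void __init kvm_spinlock_init(void)
        {
                if (!kvm_para_available())
                        return;
                /* Does the host advertise KVM_FEATURE_PV_UNHALT? */
                if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))
                        return;

                pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(kvm_lock_spinning);
                pv_lock_ops.unlock_kick = kvm_unlock_kick;
        }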

Signed-off-by: Srivatsa Vaddagiri va...@linux.vnet.ibm.com
Signed-off-by: Suzuki Poulose suz...@in.ibm.com
[Raghu: check_zero race fix, enum for kvm_contention_stat, jumplabel related 
changes,
addition of safe_halt for irq enabled case, bailout spinning in nmi case(Gleb)]
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
Acked-by: Gleb Natapov g...@redhat.com
Acked-by: Ingo Molnar mi...@kernel.org
---
 arch/x86/include/asm/kvm_para.h |  14 ++-
 arch/x86/kernel/kvm.c   | 262 
 2 files changed, 274 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 695399f..427afcb 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -118,10 +118,20 @@ void kvm_async_pf_task_wait(u32 token);
 void kvm_async_pf_task_wake(u32 token);
 u32 kvm_read_and_reset_pf_reason(void);
 extern void kvm_disable_steal_time(void);
-#else
-#define kvm_guest_init() do { } while (0)
+
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+void __init kvm_spinlock_init(void);
+#else /* !CONFIG_PARAVIRT_SPINLOCKS */
+static inline void kvm_spinlock_init(void)
+{
+}
+#endif /* CONFIG_PARAVIRT_SPINLOCKS */
+
+#else /* CONFIG_KVM_GUEST */
+#define kvm_guest_init() do {} while (0)
 #define kvm_async_pf_task_wait(T) do {} while(0)
 #define kvm_async_pf_task_wake(T) do {} while(0)
+
 static inline u32 kvm_read_and_reset_pf_reason(void)
 {
return 0;
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index a96d32c..9b33a27 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -34,6 +34,7 @@
 #include linux/sched.h
 #include linux/slab.h
 #include linux/kprobes.h
+#include linux/debugfs.h
 #include asm/timer.h
 #include asm/cpu.h
 #include asm/traps.h
@@ -419,6 +420,7 @@ static void __init kvm_smp_prepare_boot_cpu(void)
WARN_ON(kvm_register_clock(primary cpu clock));
kvm_guest_cpu_init();
native_smp_prepare_boot_cpu();
+   kvm_spinlock_init();
 }
 
 static void kvm_guest_cpu_online(void *dummy)
@@ -523,3 +525,263 @@ static __init int activate_jump_labels(void)
return 0;
 }
 arch_initcall(activate_jump_labels);
+
+/* Kick a cpu by its apicid. Used to wake up a halted vcpu */
+void kvm_kick_cpu(int cpu)
+{
+   int apicid;
+   unsigned long flags = 0;
+
+   apicid = per_cpu(x86_cpu_to_apicid, cpu);
+   kvm_hypercall2(KVM_HC_KICK_CPU, flags, apicid);
+}
+
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+
+enum kvm_contention_stat {
+   TAKEN_SLOW,
+   TAKEN_SLOW_PICKUP,
+   RELEASED_SLOW,
+   RELEASED_SLOW_KICKED,
+   NR_CONTENTION_STATS
+};
+
+#ifdef CONFIG_KVM_DEBUG_FS
+#define HISTO_BUCKETS  30
+
+static struct kvm_spinlock_stats
+{
+   u32 contention_stats[NR_CONTENTION_STATS];
+   u32 histo_spin_blocked[HISTO_BUCKETS+1];
+   u64 time_blocked;
+} spinlock_stats;
+
+static u8 zero_stats;
+
+static inline void check_zero(void)
+{
+   u8 ret;
+   u8 old;
+
+   old = ACCESS_ONCE(zero_stats);
+   if (unlikely(old)) {
+   ret = cmpxchg(zero_stats, old, 0);
+   /* This ensures only one fellow resets the stat */
+   if (ret == old)
+   memset(spinlock_stats, 0, sizeof(spinlock_stats));
+   }
+}
+
+static inline void add_stats(enum kvm_contention_stat var, u32 val)
+{
+   check_zero();
+   spinlock_stats.contention_stats[var] += val;
+}
+
+
+static inline u64 spin_time_start(void)
+{
+   return sched_clock();
+}
+
+static void __spin_time_accum(u64 delta, u32 *array)
+{
+   unsigned index;
+
+   index = ilog2(delta);
+   check_zero();
+
+   if (index  HISTO_BUCKETS)
+   array[index]++;
+   else
+   array[HISTO_BUCKETS]++;
+}
+
+static inline void spin_time_accum_blocked(u64 start)
+{
+   u32 delta;
+
+   delta = sched_clock() - start;
+   __spin_time_accum(delta, spinlock_stats.histo_spin_blocked);
+   spinlock_stats.time_blocked += delta;
+}
+
+static struct dentry *d_spin_debug;
+static struct dentry *d_kvm_debug;
+
+struct dentry *kvm_init_debugfs(void)
+{
+   d_kvm_debug = debugfs_create_dir(kvm, NULL);
+   if (!d_kvm_debug)
+   printk(KERN_WARNING Could not create 'kvm' debugfs 
directory\n);
+
+   return d_kvm_debug;
+}
+
+static int __init kvm_spinlock_debugfs(void)
+{
+   struct dentry *d_kvm;
+
+   d_kvm = kvm_init_debugfs();
+   if (d_kvm == NULL)
+   return -ENOMEM;
+
+   d_spin_debug = debugfs_create_dir(spinlocks, d_kvm);
+
+   debugfs_create_u8(zero_stats, 0644, d_spin_debug, zero_stats

Re: [PATCH V12 0/14] Paravirtualized ticket spinlocks

2013-08-08 Thread Raghavendra K T

On 08/09/2013 06:34 AM, H. Peter Anvin wrote:

The kbuild test bot is reporting some pretty serious errors for this
patchset.  I think these are serious enough that the patchset will need
to be respun.



I am working on that.





[PATCH V12 0/14] Paravirtualized ticket spinlocks

2013-08-06 Thread Raghavendra K T

This series replaces the existing paravirtualized spinlock mechanism
with a paravirtualized ticketlock mechanism. The series provides
implementation for both Xen and KVM.

The current set of patches is for the Xen/x86 spinlock/KVM guest side, to be
included against -tip.

I'll be sending a separate patchset for the KVM host based on the kvm tree.

Please note I have added the below performance result (as suggested by HPA)
in the commit log of the first patch.

Results:
===
setup: 32 core machine with 32 vcpu KVM guest (HT off)  with 8GB RAM
base = 3.11-rc
patched = base + pvspinlock V12 

+-+++
 dbench (Throughput in MB/sec. Higher is better)
+-+++
|   base (stdev %)|patched(stdev%) | %gain  |
+-+++
| 15035.3   (0.3) |15150.0   (0.6) |   0.8  |
|  1470.0   (2.2) | 1713.7   (1.9) |  16.6  |
|   848.6   (4.3) |  967.8   (4.3) |  14.0  |
|   652.9   (3.5) |  685.3   (3.7) |   5.0  |
+-+++

pvspinlock shows benefits for overcommit ratio > 1 for PLE enabled cases,
and undercommit results are flat.

non PLE results are much better for smaller VMs. 
http://lkml.indiana.edu/hypermail/linux/kernel/1306.3/01095.html

I would like to thank all the experts here for their time in reviewing and
commenting on the patch series (including, perhaps, those who have not been
listed).

Finally special thanks to KVM maintainers (Avi, Marcelo, Gleb) for
their comments and Andrew Theurer, Chegu Vinod for their valuable
testing and feedback.

The older series[4] was tested by Attilio for Xen implementation.

Note that Konrad needs to revert the below two patches to enable Xen on HVM:
  70dd4998, f10cd522c

Changes in V12:
 - split out the uapi header patch.
 - bail out of lock spinning in case of NMI (Gleb)
 - dropped patch 18, whose benefits are inconclusive (Gleb, Ingo)

Changes in V11:
 - use safe_halt in lock_spinning path to avoid potential problem 
  in case of irq_handlers taking lock in slowpath (Gleb)
 - add a0 flag for the kick hypercall for future extension  (Gleb)
 - add stubs for missing architecture for kvm_vcpu_schedule() (Gleb)
 - Change hypercall documentation.
 - Rebased to 3.11-rc1

Changes in V10:
Addressed Konrad's review comments:
- Added break in patch 5 since now we know exact cpu to wakeup
- Dropped patch 12 and Konrad needs to revert two patches to enable xen on hvm 
  70dd4998, f10cd522c
- Remove TIMEOUT and corrected spacing in patch 15
- Kicked spelling and correct spacing in patches 17, 18 

Changes in V9:
- Changed spin_threshold to 32k to avoid excess halt exits that are
   causing undercommit degradation (after PLE handler improvement).
- Added  kvm_irq_delivery_to_apic (suggested by Gleb)
- Optimized halt exit path to use PLE handler

V8 of PVspinlock was posted last year. After Avi's suggestions to look
at PLE handler's improvements, various optimizations in PLE handling
have been tried.

With this series we see that we could get little more improvements on top
of that.

Changes before V8:
Ticketlock change history:
Changes in V7:
 - Rebased patches to 3.4-rc3
 - Added jumplabel split patch (originally from Andrew Jones, rebased to
3.4-rc3)
 - jumplabel changes from Ingo and Jason taken and now using static_key_*
instead of static_branch.
 - using UNINLINE_SPIN_UNLOCK (which was split as per suggestion from Linus)
 - This patch series is rebased on the debugfs patch (that should already be in
Xen/linux-next https://lkml.org/lkml/2012/3/23/51)

Changes in V6 posting: (Raghavendra K T)
 - Rebased to linux-3.3-rc6.
 - used function+enum in place of macro (better type checking) 
 - use cmpxchg while resetting zero status for possible race
[suggested by Dave Hansen for KVM patches ]

KVM patch Change history:
Changes in V6:
- Rebased to 3.4-rc3
- Removed debugfs changes patch which should now be in Xen/linux-next.
  (https://lkml.org/lkml/2012/3/30/687)
- Removed PV_UNHALT_MSR since currently we don't need guest communication,
  and made pv_unhalt folded to GET_MP_STATE (Marcello, Avi[long back])
- Take jumplabel changes from Ingo/Jason into use (static_key_slow_inc usage)
- Added inline to spinlock_init in non PARAVIRT case
- Move arch specific code to arch/x86 and add stubs to other archs (Marcello)
- Added more comments on pv_unhalt usage etc (Marcello)

Changes in V5:
- rebased to 3.3-rc6
- added PV_UNHALT_MSR that would help in live migration (Avi)
- removed PV_LOCK_KICK vcpu request and pv_unhalt flag (re)added.
- Changed hypercall documentation (Alex).
- mode_t changed to umode_t in debugfs.
- MSR related documentation added.
- rename PV_LOCK_KICK to PV_UNHALT. 
- host and guest patches not mixed. (Marcelo, Alex)
- kvm_kick_cpu now takes cpu so it can be used by flush_tlb_ipi_other 
   paravirtualization (Nikunj)
- coding style changes in variable declaration etc (Srikar)

Changes in V4:
- rebased to 3.2.0 pre.
- use APIC ID

[PATCH V12 3/14] x86/ticketlock: Collapse a layer of functions

2013-08-06 Thread Raghavendra K T
x86/ticketlock: Collapse a layer of functions

From: Jeremy Fitzhardinge jer...@goop.org

Now that the paravirtualization layer doesn't exist at the spinlock
level any more, we can collapse the __ticket_ functions into the arch_
functions.

Signed-off-by: Jeremy Fitzhardinge jer...@goop.org
Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Tested-by: Attilio Rao attilio@citrix.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
Acked-by: Ingo Molnar mi...@kernel.org
---
 arch/x86/include/asm/spinlock.h |   35 +--
 1 file changed, 5 insertions(+), 30 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 4d54244..7442410 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -76,7 +76,7 @@ static __always_inline void __ticket_unlock_kick(struct 
arch_spinlock *lock,
  * in the high part, because a wide xadd increment of the low part would carry
  * up and contaminate the high part.
  */
-static __always_inline void __ticket_spin_lock(struct arch_spinlock *lock)
+static __always_inline void arch_spin_lock(struct arch_spinlock *lock)
 {
register struct __raw_tickets inc = { .tail = 1 };
 
@@ -96,7 +96,7 @@ static __always_inline void __ticket_spin_lock(struct 
arch_spinlock *lock)
 out:   barrier();  /* make sure nothing creeps before the lock is taken */
 }
 
-static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
+static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
 {
arch_spinlock_t old, new;
 
@@ -110,7 +110,7 @@ static __always_inline int 
__ticket_spin_trylock(arch_spinlock_t *lock)
return cmpxchg(lock-head_tail, old.head_tail, new.head_tail) == 
old.head_tail;
 }
 
-static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
+static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
 {
__ticket_t next = lock-tickets.head + 1;
 
@@ -118,46 +118,21 @@ static __always_inline void 
__ticket_spin_unlock(arch_spinlock_t *lock)
__ticket_unlock_kick(lock, next);
 }
 
-static inline int __ticket_spin_is_locked(arch_spinlock_t *lock)
+static inline int arch_spin_is_locked(arch_spinlock_t *lock)
 {
struct __raw_tickets tmp = ACCESS_ONCE(lock-tickets);
 
return tmp.tail != tmp.head;
 }
 
-static inline int __ticket_spin_is_contended(arch_spinlock_t *lock)
+static inline int arch_spin_is_contended(arch_spinlock_t *lock)
 {
struct __raw_tickets tmp = ACCESS_ONCE(lock-tickets);
 
return (__ticket_t)(tmp.tail - tmp.head)  1;
 }
-
-static inline int arch_spin_is_locked(arch_spinlock_t *lock)
-{
-   return __ticket_spin_is_locked(lock);
-}
-
-static inline int arch_spin_is_contended(arch_spinlock_t *lock)
-{
-   return __ticket_spin_is_contended(lock);
-}
 #define arch_spin_is_contended arch_spin_is_contended
 
-static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
-{
-   __ticket_spin_lock(lock);
-}
-
-static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
-{
-   return __ticket_spin_trylock(lock);
-}
-
-static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
-{
-   __ticket_spin_unlock(lock);
-}
-
 static __always_inline void arch_spin_lock_flags(arch_spinlock_t *lock,
  unsigned long flags)
 {



[PATCH V12 4/14] xen: Defer spinlock setup until boot CPU setup

2013-08-06 Thread Raghavendra K T
xen: Defer spinlock setup until boot CPU setup

From: Jeremy Fitzhardinge jer...@goop.org

There's no need to do it at very early init, and doing it there
makes it impossible to use the jump_label machinery.

Signed-off-by: Jeremy Fitzhardinge jer...@goop.org
Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
Acked-by: Ingo Molnar mi...@kernel.org
---
 arch/x86/xen/smp.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index ca92754..3b52d80 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -279,6 +279,7 @@ static void __init xen_smp_prepare_boot_cpu(void)
 
xen_filter_cpu_maps();
xen_setup_vcpu_info_placement();
+   xen_init_spinlocks();
 }
 
 static void __init xen_smp_prepare_cpus(unsigned int max_cpus)
@@ -680,7 +681,6 @@ void __init xen_smp_init(void)
 {
smp_ops = xen_smp_ops;
xen_fill_possible_map();
-   xen_init_spinlocks();
 }
 
 static void __init xen_hvm_smp_prepare_cpus(unsigned int max_cpus)



[PATCH V12 6/14] xen/pvticketlocks: Add xen_nopvspin parameter to disable xen pv ticketlocks

2013-08-06 Thread Raghavendra K T
xen/pvticketlocks: Add xen_nopvspin parameter to disable xen pv ticketlocks

From: Jeremy Fitzhardinge jer...@goop.org

Signed-off-by: Jeremy Fitzhardinge jer...@goop.org
Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
Acked-by: Ingo Molnar mi...@kernel.org
---
 arch/x86/xen/spinlock.c |   14 ++
 1 file changed, 14 insertions(+)

diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index a458729..669a971 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -244,6 +244,8 @@ void xen_uninit_lock_cpu(int cpu)
per_cpu(irq_name, cpu) = NULL;
 }
 
+static bool xen_pvspin __initdata = true;
+
 void __init xen_init_spinlocks(void)
 {
/*
@@ -253,10 +255,22 @@ void __init xen_init_spinlocks(void)
if (xen_hvm_domain())
return;
 
+   if (!xen_pvspin) {
+   printk(KERN_DEBUG xen: PV spinlocks disabled\n);
+   return;
+   }
+
pv_lock_ops.lock_spinning = xen_lock_spinning;
pv_lock_ops.unlock_kick = xen_unlock_kick;
 }
 
+static __init int xen_parse_nopvspin(char *arg)
+{
+   xen_pvspin = false;
+   return 0;
+}
+early_param(xen_nopvspin, xen_parse_nopvspin);
+
 #ifdef CONFIG_XEN_DEBUG_FS
 
 static struct dentry *d_spin_debug;



[PATCH V12 5/14] xen/pvticketlock: Xen implementation for PV ticket locks

2013-08-06 Thread Raghavendra K T
xen/pvticketlock: Xen implementation for PV ticket locks

From: Jeremy Fitzhardinge jer...@goop.org

Replace the old Xen implementation of PV spinlocks with an implementation
of xen_lock_spinning and xen_unlock_kick.

xen_lock_spinning simply registers the cpu in its entry in lock_waiting,
adds itself to the waiting_cpus set, and blocks on an event channel
until the channel becomes pending.

xen_unlock_kick searches the cpus in waiting_cpus looking for the one
which next wants this lock with the next ticket, if any.  If found,
it kicks it by making its event channel pending, which wakes it up.

We need to make sure interrupts are disabled while we're relying on the
contents of the per-cpu lock_waiting values, otherwise an interrupt
handler could come in, try to take some other lock, block, and overwrite
our values.
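
Condensed, the per-cpu bookkeeping and the kick loop described above look
roughly like this (a sketch of the code added in the diff below):

        struct xen_lock_waiting {
                struct arch_spinlock *lock;
                __ticket_t want;
        };
        static DEFINE_PER_CPU(struct xen_lock_waiting, lock_waiting);
        static cpumask_t waiting_cpus;

        static void xen_unlock_kick(struct arch_spinlock *lock, __ticket_t next)
        {
                int cpu;

                add_stats(RELEASED_SLOW, 1);
                for_each_cpu(cpu, &waiting_cpus) {
                        const struct xen_lock_waiting *w = &per_cpu(lock_waiting, cpu);

                        /* kick only the cpu waiting for the next ticket, if any */
                        if (w->lock == lock && w->want == next) {
                                add_stats(RELEASED_SLOW_KICKED, 1);
                                xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
                                break;
                        }
                }
        }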

Signed-off-by: Jeremy Fitzhardinge jer...@goop.org
Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
 [ Raghavendra: use function + enum instead of macro, cmpxchg for zero status
   reset. Reintroduce break since we know the exact vCPU to send the IPI to, as
   suggested by Konrad. ]
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
Acked-by: Ingo Molnar mi...@kernel.org
---
 arch/x86/xen/spinlock.c |  348 +++
 1 file changed, 79 insertions(+), 269 deletions(-)

diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index d509629..a458729 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -17,45 +17,44 @@
 #include xen-ops.h
 #include debugfs.h
 
-#ifdef CONFIG_XEN_DEBUG_FS
-static struct xen_spinlock_stats
-{
-   u64 taken;
-   u32 taken_slow;
-   u32 taken_slow_nested;
-   u32 taken_slow_pickup;
-   u32 taken_slow_spurious;
-   u32 taken_slow_irqenable;
+enum xen_contention_stat {
+   TAKEN_SLOW,
+   TAKEN_SLOW_PICKUP,
+   TAKEN_SLOW_SPURIOUS,
+   RELEASED_SLOW,
+   RELEASED_SLOW_KICKED,
+   NR_CONTENTION_STATS
+};
 
-   u64 released;
-   u32 released_slow;
-   u32 released_slow_kicked;
 
+#ifdef CONFIG_XEN_DEBUG_FS
 #define HISTO_BUCKETS  30
-   u32 histo_spin_total[HISTO_BUCKETS+1];
-   u32 histo_spin_spinning[HISTO_BUCKETS+1];
+static struct xen_spinlock_stats
+{
+   u32 contention_stats[NR_CONTENTION_STATS];
u32 histo_spin_blocked[HISTO_BUCKETS+1];
-
-   u64 time_total;
-   u64 time_spinning;
u64 time_blocked;
 } spinlock_stats;
 
 static u8 zero_stats;
 
-static unsigned lock_timeout = 1  10;
-#define TIMEOUT lock_timeout
-
 static inline void check_zero(void)
 {
-   if (unlikely(zero_stats)) {
-   memset(spinlock_stats, 0, sizeof(spinlock_stats));
-   zero_stats = 0;
+   u8 ret;
+   u8 old = ACCESS_ONCE(zero_stats);
+   if (unlikely(old)) {
+   ret = cmpxchg(zero_stats, old, 0);
+   /* This ensures only one fellow resets the stat */
+   if (ret == old)
+   memset(spinlock_stats, 0, sizeof(spinlock_stats));
}
 }
 
-#define ADD_STATS(elem, val)   \
-   do { check_zero(); spinlock_stats.elem += (val); } while(0)
+static inline void add_stats(enum xen_contention_stat var, u32 val)
+{
+   check_zero();
+   spinlock_stats.contention_stats[var] += val;
+}
 
 static inline u64 spin_time_start(void)
 {
@@ -74,22 +73,6 @@ static void __spin_time_accum(u64 delta, u32 *array)
array[HISTO_BUCKETS]++;
 }
 
-static inline void spin_time_accum_spinning(u64 start)
-{
-   u32 delta = xen_clocksource_read() - start;
-
-   __spin_time_accum(delta, spinlock_stats.histo_spin_spinning);
-   spinlock_stats.time_spinning += delta;
-}
-
-static inline void spin_time_accum_total(u64 start)
-{
-   u32 delta = xen_clocksource_read() - start;
-
-   __spin_time_accum(delta, spinlock_stats.histo_spin_total);
-   spinlock_stats.time_total += delta;
-}
-
 static inline void spin_time_accum_blocked(u64 start)
 {
u32 delta = xen_clocksource_read() - start;
@@ -99,19 +82,15 @@ static inline void spin_time_accum_blocked(u64 start)
 }
 #else  /* !CONFIG_XEN_DEBUG_FS */
 #define TIMEOUT(1  10)
-#define ADD_STATS(elem, val)   do { (void)(val); } while(0)
+static inline void add_stats(enum xen_contention_stat var, u32 val)
+{
+}
 
 static inline u64 spin_time_start(void)
 {
return 0;
 }
 
-static inline void spin_time_accum_total(u64 start)
-{
-}
-static inline void spin_time_accum_spinning(u64 start)
-{
-}
 static inline void spin_time_accum_blocked(u64 start)
 {
 }
@@ -134,230 +113,84 @@ typedef u16 xen_spinners_t;
asm(LOCK_PREFIX  decw %0 : +m ((xl)-spinners) : : memory);
 #endif
 
-struct xen_spinlock {
-   unsigned char lock; /* 0 - free; 1 - locked */
-   xen_spinners_t spinners;/* count of waiting cpus */
+struct xen_lock_waiting {
+   struct arch_spinlock *lock;
+   __ticket_t want

[PATCH V12 7/14] x86/pvticketlock: Use callee-save for lock_spinning

2013-08-06 Thread Raghavendra K T
x86/pvticketlock: Use callee-save for lock_spinning

From: Jeremy Fitzhardinge jer...@goop.org

Although the lock_spinning calls in the spinlock code are on the
uncommon path, their presence can cause the compiler to generate many
more register save/restores in the function pre/postamble, which is in
the fast path.  To avoid this, convert it to using the pvops callee-save
calling convention, which defers all the save/restores until the actual
function is called, keeping the fastpath clean.

Signed-off-by: Jeremy Fitzhardinge jer...@goop.org
Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Tested-by: Attilio Rao attilio@citrix.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
Acked-by: Ingo Molnar mi...@kernel.org
---
 arch/x86/include/asm/paravirt.h   |2 +-
 arch/x86/include/asm/paravirt_types.h |2 +-
 arch/x86/kernel/paravirt-spinlocks.c  |2 +-
 arch/x86/xen/spinlock.c   |3 ++-
 4 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 040e72d..7131e12c 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -715,7 +715,7 @@ static inline void __set_fixmap(unsigned /* enum 
fixed_addresses */ idx,
 static __always_inline void __ticket_lock_spinning(struct arch_spinlock *lock,
__ticket_t ticket)
 {
-   PVOP_VCALL2(pv_lock_ops.lock_spinning, lock, ticket);
+   PVOP_VCALLEE2(pv_lock_ops.lock_spinning, lock, ticket);
 }
 
 static __always_inline void ticket_unlock_kick(struct arch_spinlock *lock,
diff --git a/arch/x86/include/asm/paravirt_types.h 
b/arch/x86/include/asm/paravirt_types.h
index d5deb6d..350d017 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -330,7 +330,7 @@ struct arch_spinlock;
 #include asm/spinlock_types.h
 
 struct pv_lock_ops {
-   void (*lock_spinning)(struct arch_spinlock *lock, __ticket_t ticket);
+   struct paravirt_callee_save lock_spinning;
void (*unlock_kick)(struct arch_spinlock *lock, __ticket_t ticket);
 };
 
diff --git a/arch/x86/kernel/paravirt-spinlocks.c 
b/arch/x86/kernel/paravirt-spinlocks.c
index c2e010e..4251c1d 100644
--- a/arch/x86/kernel/paravirt-spinlocks.c
+++ b/arch/x86/kernel/paravirt-spinlocks.c
@@ -9,7 +9,7 @@
 
 struct pv_lock_ops pv_lock_ops = {
 #ifdef CONFIG_SMP
-   .lock_spinning = paravirt_nop,
+   .lock_spinning = __PV_IS_CALLEE_SAVE(paravirt_nop),
.unlock_kick = paravirt_nop,
 #endif
 };
diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index 669a971..6c8792b 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -173,6 +173,7 @@ out:
local_irq_restore(flags);
spin_time_accum_blocked(start);
 }
+PV_CALLEE_SAVE_REGS_THUNK(xen_lock_spinning);
 
 static void xen_unlock_kick(struct arch_spinlock *lock, __ticket_t next)
 {
@@ -260,7 +261,7 @@ void __init xen_init_spinlocks(void)
return;
}
 
-   pv_lock_ops.lock_spinning = xen_lock_spinning;
+   pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(xen_lock_spinning);
pv_lock_ops.unlock_kick = xen_unlock_kick;
 }
 



[PATCH V12 8/14] x86/pvticketlock: When paravirtualizing ticket locks, increment by 2

2013-08-06 Thread Raghavendra K T
x86/pvticketlock: When paravirtualizing ticket locks, increment by 2

From: Jeremy Fitzhardinge jer...@goop.org

Increment ticket head/tails by 2 rather than 1 to leave the LSB free
to store an "is in slowpath state" bit.  This halves the number
of possible CPUs for a given ticket size, but this shouldn't matter
in practice - kernels built for 32k+ CPU systems are probably
specially built for the hardware rather than a generic distro
kernel.
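
As a worked illustration of the size trade-off (the actual cutover is in the
spinlock_types.h hunk below):

        /*
         * With PV spinlocks, tickets advance by __TICKET_LOCK_INC == 2 and
         * bit 0 of the tail is reserved for the "in slowpath" flag, so an
         * 8-bit __ticket_t only covers 256 / 2 = 128 CPUs.  Hence the
         * u8 -> u16 cutover becomes
         * CONFIG_NR_CPUS < (256 / __TICKET_LOCK_INC) instead of < 256.
         */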

Signed-off-by: Jeremy Fitzhardinge jer...@goop.org
Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Tested-by: Attilio Rao attilio@citrix.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
Acked-by: Ingo Molnar mi...@kernel.org
---
 arch/x86/include/asm/spinlock.h   |   10 +-
 arch/x86/include/asm/spinlock_types.h |   10 +-
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 7442410..04a5cd5 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -78,7 +78,7 @@ static __always_inline void __ticket_unlock_kick(struct 
arch_spinlock *lock,
  */
 static __always_inline void arch_spin_lock(struct arch_spinlock *lock)
 {
-   register struct __raw_tickets inc = { .tail = 1 };
+   register struct __raw_tickets inc = { .tail = TICKET_LOCK_INC };
 
inc = xadd(lock-tickets, inc);
 
@@ -104,7 +104,7 @@ static __always_inline int 
arch_spin_trylock(arch_spinlock_t *lock)
if (old.tickets.head != old.tickets.tail)
return 0;
 
-   new.head_tail = old.head_tail + (1  TICKET_SHIFT);
+   new.head_tail = old.head_tail + (TICKET_LOCK_INC  TICKET_SHIFT);
 
/* cmpxchg is a full barrier, so nothing can move before it */
return cmpxchg(lock-head_tail, old.head_tail, new.head_tail) == 
old.head_tail;
@@ -112,9 +112,9 @@ static __always_inline int 
arch_spin_trylock(arch_spinlock_t *lock)
 
 static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
 {
-   __ticket_t next = lock-tickets.head + 1;
+   __ticket_t next = lock-tickets.head + TICKET_LOCK_INC;
 
-   __add(lock-tickets.head, 1, UNLOCK_LOCK_PREFIX);
+   __add(lock-tickets.head, TICKET_LOCK_INC, UNLOCK_LOCK_PREFIX);
__ticket_unlock_kick(lock, next);
 }
 
@@ -129,7 +129,7 @@ static inline int arch_spin_is_contended(arch_spinlock_t 
*lock)
 {
struct __raw_tickets tmp = ACCESS_ONCE(lock-tickets);
 
-   return (__ticket_t)(tmp.tail - tmp.head)  1;
+   return (__ticket_t)(tmp.tail - tmp.head)  TICKET_LOCK_INC;
 }
 #define arch_spin_is_contended arch_spin_is_contended
 
diff --git a/arch/x86/include/asm/spinlock_types.h 
b/arch/x86/include/asm/spinlock_types.h
index 83fd3c7..e96fcbd 100644
--- a/arch/x86/include/asm/spinlock_types.h
+++ b/arch/x86/include/asm/spinlock_types.h
@@ -3,7 +3,13 @@
 
 #include linux/types.h
 
-#if (CONFIG_NR_CPUS  256)
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+#define __TICKET_LOCK_INC  2
+#else
+#define __TICKET_LOCK_INC  1
+#endif
+
+#if (CONFIG_NR_CPUS  (256 / __TICKET_LOCK_INC))
 typedef u8  __ticket_t;
 typedef u16 __ticketpair_t;
 #else
@@ -11,6 +17,8 @@ typedef u16 __ticket_t;
 typedef u32 __ticketpair_t;
 #endif
 
+#define TICKET_LOCK_INC((__ticket_t)__TICKET_LOCK_INC)
+
 #define TICKET_SHIFT   (sizeof(__ticket_t) * 8)
 
 typedef struct arch_spinlock {



[PATCH V12 9/14] jump_label: Split out rate limiting from jump_label.h

2013-08-06 Thread Raghavendra K T
jump_label: Split jumplabel ratelimit

From: Andrew Jones drjo...@redhat.com

Commit b202952075f62603bea9bfb6ebc6b0420db11949 (perf, core: Rate limit
perf_sched_events jump_label patching) introduced rate limiting
for jump label disabling. The changes were made in the jump label code
in order to be more widely available and to keep things tidier. This is
all fine, except now jump_label.h includes linux/workqueue.h, which
makes it impossible to include jump_label.h from anything that
workqueue.h needs. For example, it's now impossible to include
jump_label.h from asm/spinlock.h, which is done in proposed
pv-ticketlock patches. This patch splits out the rate limiting related
changes from jump_label.h into a new file, jump_label_ratelimit.h, to
resolve the issue.
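
After the split, a low-level header such as asm/spinlock.h only needs the
light header, while rate-limit users pull in the new one; illustrative include
lines (not part of the diff):

        #include <linux/jump_label.h>           /* static_key, static_key_false() */
        #include <linux/jump_label_ratelimit.h> /* static_key_deferred (drags in workqueue.h) */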

Signed-off-by: Andrew Jones drjo...@redhat.com
Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
Acked-by: Ingo Molnar mi...@kernel.org
---
 include/linux/jump_label.h   |   26 +-
 include/linux/jump_label_ratelimit.h |   34 ++
 include/linux/perf_event.h   |1 +
 kernel/jump_label.c  |1 +
 4 files changed, 37 insertions(+), 25 deletions(-)
 create mode 100644 include/linux/jump_label_ratelimit.h

diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
index 0976fc4..53cdf89 100644
--- a/include/linux/jump_label.h
+++ b/include/linux/jump_label.h
@@ -48,7 +48,6 @@
 
 #include linux/types.h
 #include linux/compiler.h
-#include linux/workqueue.h
 
 #if defined(CC_HAVE_ASM_GOTO)  defined(CONFIG_JUMP_LABEL)
 
@@ -61,12 +60,6 @@ struct static_key {
 #endif
 };
 
-struct static_key_deferred {
-   struct static_key key;
-   unsigned long timeout;
-   struct delayed_work work;
-};
-
 # include asm/jump_label.h
 # define HAVE_JUMP_LABEL
 #endif /* CC_HAVE_ASM_GOTO  CONFIG_JUMP_LABEL */
@@ -119,10 +112,7 @@ extern void arch_jump_label_transform_static(struct 
jump_entry *entry,
 extern int jump_label_text_reserved(void *start, void *end);
 extern void static_key_slow_inc(struct static_key *key);
 extern void static_key_slow_dec(struct static_key *key);
-extern void static_key_slow_dec_deferred(struct static_key_deferred *key);
 extern void jump_label_apply_nops(struct module *mod);
-extern void
-jump_label_rate_limit(struct static_key_deferred *key, unsigned long rl);
 
 #define STATIC_KEY_INIT_TRUE ((struct static_key) \
{ .enabled = ATOMIC_INIT(1), .entries = (void *)1 })
@@ -141,10 +131,6 @@ static __always_inline void jump_label_init(void)
 {
 }
 
-struct static_key_deferred {
-   struct static_key  key;
-};
-
 static __always_inline bool static_key_false(struct static_key *key)
 {
if (unlikely(atomic_read(key-enabled))  0)
@@ -169,11 +155,6 @@ static inline void static_key_slow_dec(struct static_key 
*key)
atomic_dec(key-enabled);
 }
 
-static inline void static_key_slow_dec_deferred(struct static_key_deferred 
*key)
-{
-   static_key_slow_dec(key-key);
-}
-
 static inline int jump_label_text_reserved(void *start, void *end)
 {
return 0;
@@ -187,12 +168,6 @@ static inline int jump_label_apply_nops(struct module *mod)
return 0;
 }
 
-static inline void
-jump_label_rate_limit(struct static_key_deferred *key,
-   unsigned long rl)
-{
-}
-
 #define STATIC_KEY_INIT_TRUE ((struct static_key) \
{ .enabled = ATOMIC_INIT(1) })
 #define STATIC_KEY_INIT_FALSE ((struct static_key) \
@@ -203,6 +178,7 @@ jump_label_rate_limit(struct static_key_deferred *key,
 #define STATIC_KEY_INIT STATIC_KEY_INIT_FALSE
 #define jump_label_enabled static_key_enabled
 
+static inline int atomic_read(const atomic_t *v);
 static inline bool static_key_enabled(struct static_key *key)
 {
return (atomic_read(key-enabled)  0);
diff --git a/include/linux/jump_label_ratelimit.h 
b/include/linux/jump_label_ratelimit.h
new file mode 100644
index 000..1137883
--- /dev/null
+++ b/include/linux/jump_label_ratelimit.h
@@ -0,0 +1,34 @@
+#ifndef _LINUX_JUMP_LABEL_RATELIMIT_H
+#define _LINUX_JUMP_LABEL_RATELIMIT_H
+
+#include linux/jump_label.h
+#include linux/workqueue.h
+
+#if defined(CC_HAVE_ASM_GOTO)  defined(CONFIG_JUMP_LABEL)
+struct static_key_deferred {
+   struct static_key key;
+   unsigned long timeout;
+   struct delayed_work work;
+};
+#endif
+
+#ifdef HAVE_JUMP_LABEL
+extern void static_key_slow_dec_deferred(struct static_key_deferred *key);
+extern void
+jump_label_rate_limit(struct static_key_deferred *key, unsigned long rl);
+
+#else  /* !HAVE_JUMP_LABEL */
+struct static_key_deferred {
+   struct static_key  key;
+};
+static inline void static_key_slow_dec_deferred(struct static_key_deferred 
*key)
+{
+   static_key_slow_dec(key-key);
+}
+static inline void
+jump_label_rate_limit(struct static_key_deferred *key,
+   unsigned long rl)
+{
+}
+#endif /* HAVE_JUMP_LABEL

[PATCH V12 10/14] x86/ticketlock: Add slowpath logic

2013-08-06 Thread Raghavendra K T
x86/ticketlock: Add slowpath logic

From: Jeremy Fitzhardinge jer...@goop.org

Maintain a flag in the LSB of the ticket lock tail which indicates
whether anyone is in the lock slowpath and may need kicking when
the current holder unlocks.  The flags are set when the first locker
enters the slowpath, and cleared when unlocking to an empty queue (ie,
no contention).

In the specific implementation of lock_spinning(), make sure to set
the slowpath flags on the lock just before blocking.  We must do
this before the last-chance pickup test to prevent a deadlock
with the unlocker:

Unlocker                        Locker
                                test for lock pickup
                                    -> fail
unlock
test slowpath
    -> false
                                set slowpath flags
                                block

Whereas this works in any ordering:

Unlocker                        Locker
                                set slowpath flags
                                test for lock pickup
                                    -> fail
                                block
unlock
test slowpath
    -> true, kick

If the unlocker finds that the lock has the slowpath flag set but it is
actually uncontended (ie, head == tail, so nobody is waiting), then it
clears the slowpath flag.

The unlock code uses a locked add to update the head counter.  This also
acts as a full memory barrier so that it's safe to subsequently
read back the slowflag state, knowing that the updated lock is visible
to the other CPUs.  If it were an unlocked add, then the flag read may
just be forwarded from the store buffer before it was visible to the other
CPUs, which could result in a deadlock.

Unfortunately this means we need to do a locked instruction when
unlocking with PV ticketlocks.  However, if PV ticketlocks are not
enabled, then the old non-locked add is the only unlocking code.

Note: this code relies on gcc making sure that unlikely() code is out of
line of the fastpath, which only happens when OPTIMIZE_SIZE=n.  If it
doesn't, the generated code isn't too bad, but it's definitely suboptimal.

Thanks to Srivatsa Vaddagiri for providing a bugfix to the original
version of this change, which has been folded in.
Thanks to Stephan Diestelhorst for commenting on some code which relied
on an inaccurate reading of the x86 memory ordering rules.

Signed-off-by: Jeremy Fitzhardinge jer...@goop.org
Signed-off-by: Srivatsa Vaddagiri va...@linux.vnet.ibm.com
Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Cc: Stephan Diestelhorst stephan.diestelho...@amd.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
Acked-by: Ingo Molnar mi...@kernel.org
---
 arch/x86/include/asm/paravirt.h   |2 -
 arch/x86/include/asm/spinlock.h   |   86 -
 arch/x86/include/asm/spinlock_types.h |2 +
 arch/x86/kernel/paravirt-spinlocks.c  |3 +
 arch/x86/xen/spinlock.c   |6 ++
 5 files changed, 74 insertions(+), 25 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 7131e12c..401f350 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -718,7 +718,7 @@ static __always_inline void __ticket_lock_spinning(struct 
arch_spinlock *lock,
PVOP_VCALLEE2(pv_lock_ops.lock_spinning, lock, ticket);
 }
 
-static __always_inline void ticket_unlock_kick(struct arch_spinlock *lock,
+static __always_inline void __ticket_unlock_kick(struct arch_spinlock *lock,
__ticket_t ticket)
 {
PVOP_VCALL2(pv_lock_ops.unlock_kick, lock, ticket);
diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 04a5cd5..d68883d 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -1,11 +1,14 @@
 #ifndef _ASM_X86_SPINLOCK_H
 #define _ASM_X86_SPINLOCK_H
 
+#include linux/jump_label.h
 #include linux/atomic.h
 #include asm/page.h
 #include asm/processor.h
 #include linux/compiler.h
 #include asm/paravirt.h
+#include asm/bitops.h
+
 /*
  * Your basic SMP spinlocks, allowing only a single CPU anywhere
  *
@@ -37,32 +40,28 @@
 /* How long a lock should spin before we consider blocking */
 #define SPIN_THRESHOLD (1  15)
 
-#ifndef CONFIG_PARAVIRT_SPINLOCKS
+extern struct static_key paravirt_ticketlocks_enabled;
+static __always_inline bool static_key_false(struct static_key *key);
 
-static __always_inline void __ticket_lock_spinning(struct arch_spinlock *lock,
-   __ticket_t ticket)
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+
+static inline void __ticket_enter_slowpath(arch_spinlock_t *lock)
 {
+   set_bit(0, (volatile unsigned long *)lock-tickets.tail);
 }
 
-static __always_inline void ticket_unlock_kick(struct arch_spinlock *lock,
-__ticket_t ticket)
+#else

[PATCH V12 12/14] kvm uapi: Add KICK_CPU and PV_UNHALT definition to uapi

2013-08-06 Thread Raghavendra K T
kvm uapi: Add KICK_CPU and PV_UNHALT definition to uapi

From: Raghavendra K T raghavendra...@linux.vnet.ibm.com

this is needed by both guest and host.

Originally-from: Srivatsa Vaddagiri va...@linux.vnet.ibm.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
Acked-by: Gleb Natapov g...@redhat.com
Acked-by: Ingo Molnar mi...@kernel.org
---
 arch/x86/include/uapi/asm/kvm_para.h |1 +
 include/uapi/linux/kvm_para.h|1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/x86/include/uapi/asm/kvm_para.h 
b/arch/x86/include/uapi/asm/kvm_para.h
index 06fdbd9..94dc8ca 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -23,6 +23,7 @@
 #define KVM_FEATURE_ASYNC_PF   4
 #define KVM_FEATURE_STEAL_TIME 5
 #define KVM_FEATURE_PV_EOI 6
+#define KVM_FEATURE_PV_UNHALT  7
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
index cea2c5c..2841f86 100644
--- a/include/uapi/linux/kvm_para.h
+++ b/include/uapi/linux/kvm_para.h
@@ -19,6 +19,7 @@
 #define KVM_HC_MMU_OP  2
 #define KVM_HC_FEATURES3
 #define KVM_HC_PPC_MAP_MAGIC_PAGE  4
+#define KVM_HC_KICK_CPU5
 
 /*
  * hypercalls use architecture specific



[PATCH V12 11/14] xen/pvticketlock: Allow interrupts to be enabled while blocking

2013-08-06 Thread Raghavendra K T
xen/pvticketlock: Allow interrupts to be enabled while blocking

From: Jeremy Fitzhardinge jer...@goop.org

If interrupts were enabled when taking the spinlock, we can leave them
enabled while blocking to get the lock.

If we can enable interrupts while waiting for the lock to become
available, and we take an interrupt before entering the poll,
and the handler takes a spinlock which ends up going into
the slow state (invalidating the per-cpu "lock" and "want" values),
then when the interrupt handler returns the event channel will
remain pending so the poll will return immediately, causing it to
return out to the main spinlock loop.

Signed-off-by: Jeremy Fitzhardinge jer...@goop.org
Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
Acked-by: Ingo Molnar mi...@kernel.org
---
 arch/x86/xen/spinlock.c |   46 --
 1 file changed, 40 insertions(+), 6 deletions(-)

diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index 546112e..0438b93 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -142,7 +142,20 @@ static void xen_lock_spinning(struct arch_spinlock *lock, 
__ticket_t want)
 * partially setup state.
 */
local_irq_save(flags);
-
+   /*
+* We don't really care if we're overwriting some other
+* (lock,want) pair, as that would mean that we're currently
+* in an interrupt context, and the outer context had
+* interrupts enabled.  That has already kicked the VCPU out
+* of xen_poll_irq(), so it will just return spuriously and
+* retry with newly setup (lock,want).
+*
+* The ordering protocol on this is that the lock pointer
+* may only be set non-NULL if the want ticket is correct.
+* If we're updating want, we must first clear lock.
+*/
+   w-lock = NULL;
+   smp_wmb();
w-want = want;
smp_wmb();
w-lock = lock;
@@ -157,24 +170,43 @@ static void xen_lock_spinning(struct arch_spinlock *lock, 
__ticket_t want)
/* Only check lock once pending cleared */
barrier();
 
-   /* Mark entry to slowpath before doing the pickup test to make
-  sure we don't deadlock with an unlocker. */
+   /*
+* Mark entry to slowpath before doing the pickup test to make
+* sure we don't deadlock with an unlocker.
+*/
__ticket_enter_slowpath(lock);
 
-   /* check again make sure it didn't become free while
-  we weren't looking  */
+   /*
+* check again make sure it didn't become free while
+* we weren't looking
+*/
if (ACCESS_ONCE(lock-tickets.head) == want) {
add_stats(TAKEN_SLOW_PICKUP, 1);
goto out;
}
+
+   /* Allow interrupts while blocked */
+   local_irq_restore(flags);
+
+   /*
+* If an interrupt happens here, it will leave the wakeup irq
+* pending, which will cause xen_poll_irq() to return
+* immediately.
+*/
+
/* Block until irq becomes pending (or perhaps a spurious wakeup) */
xen_poll_irq(irq);
add_stats(TAKEN_SLOW_SPURIOUS, !xen_test_irq_pending(irq));
+
+   local_irq_save(flags);
+
kstat_incr_irqs_this_cpu(irq, irq_to_desc(irq));
 out:
cpumask_clear_cpu(cpu, waiting_cpus);
w-lock = NULL;
+
local_irq_restore(flags);
+
spin_time_accum_blocked(start);
 }
 PV_CALLEE_SAVE_REGS_THUNK(xen_lock_spinning);
@@ -188,7 +220,9 @@ static void xen_unlock_kick(struct arch_spinlock *lock, 
__ticket_t next)
for_each_cpu(cpu, waiting_cpus) {
const struct xen_lock_waiting *w = per_cpu(lock_waiting, cpu);
 
-   if (w-lock == lock  w-want == next) {
+   /* Make sure we read lock before want */
+   if (ACCESS_ONCE(w-lock) == lock 
+   ACCESS_ONCE(w-want) == next) {
add_stats(RELEASED_SLOW_KICKED, 1);
xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
break;



[PATCH V12 13/14] kvm guest : Add configuration support to enable debug information for KVM Guests

2013-08-06 Thread Raghavendra K T
kvm guest : Add configuration support to enable debug information for KVM Guests

From: Srivatsa Vaddagiri va...@linux.vnet.ibm.com

Signed-off-by: Srivatsa Vaddagiri va...@linux.vnet.ibm.com
Signed-off-by: Suzuki Poulose suz...@in.ibm.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
Acked-by: Gleb Natapov g...@redhat.com
Acked-by: Ingo Molnar mi...@kernel.org
---
 arch/x86/Kconfig |9 +
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 724ce55..f108f00 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -688,6 +688,15 @@ config KVM_GUEST
  underlying device model, the host provides the guest with
  timing infrastructure such as time of day, and system time
 
+config KVM_DEBUG_FS
+   bool Enable debug information for KVM Guests in debugfs
+   depends on KVM_GUEST  DEBUG_FS
+   default n
+   ---help---
+ This option enables collection of various statistics for KVM guest.
+ Statistics are displayed in debugfs filesystem. Enabling this option
+ may incur significant overhead.
+
 source arch/x86/lguest/Kconfig
 
 config PARAVIRT_TIME_ACCOUNTING



[PATCH V12 14/14] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor

2013-08-06 Thread Raghavendra K T
kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor

From: Srivatsa Vaddagiri va...@linux.vnet.ibm.com

During smp_boot_cpus, the paravirtualized KVM guest detects whether the
hypervisor has the required feature (KVM_FEATURE_PV_UNHALT) to support
pv-ticketlocks. If so, support for pv-ticketlocks is registered via pv_lock_ops.

Use the KVM_HC_KICK_CPU hypercall to wake up a waiting/halted vcpu.

Signed-off-by: Srivatsa Vaddagiri va...@linux.vnet.ibm.com
Signed-off-by: Suzuki Poulose suz...@in.ibm.com
[Raghu: check_zero race fix, enum for kvm_contention_stat, jumplabel related 
changes,
addition of safe_halt for irq enabled case, bailout spinning in nmi case(Gleb)]
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
Acked-by: Gleb Natapov g...@redhat.com
Acked-by: Ingo Molnar mi...@kernel.org
---
 arch/x86/include/asm/kvm_para.h |   14 ++
 arch/x86/kernel/kvm.c   |  262 +++
 2 files changed, 274 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 695399f..427afcb 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -118,10 +118,20 @@ void kvm_async_pf_task_wait(u32 token);
 void kvm_async_pf_task_wake(u32 token);
 u32 kvm_read_and_reset_pf_reason(void);
 extern void kvm_disable_steal_time(void);
-#else
-#define kvm_guest_init() do { } while (0)
+
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+void __init kvm_spinlock_init(void);
+#else /* !CONFIG_PARAVIRT_SPINLOCKS */
+static inline void kvm_spinlock_init(void)
+{
+}
+#endif /* CONFIG_PARAVIRT_SPINLOCKS */
+
+#else /* CONFIG_KVM_GUEST */
+#define kvm_guest_init() do {} while (0)
 #define kvm_async_pf_task_wait(T) do {} while(0)
 #define kvm_async_pf_task_wake(T) do {} while(0)
+
 static inline u32 kvm_read_and_reset_pf_reason(void)
 {
return 0;
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index a96d32c..9b33a27 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -34,6 +34,7 @@
 #include linux/sched.h
 #include linux/slab.h
 #include linux/kprobes.h
+#include linux/debugfs.h
 #include asm/timer.h
 #include asm/cpu.h
 #include asm/traps.h
@@ -419,6 +420,7 @@ static void __init kvm_smp_prepare_boot_cpu(void)
WARN_ON(kvm_register_clock(primary cpu clock));
kvm_guest_cpu_init();
native_smp_prepare_boot_cpu();
+   kvm_spinlock_init();
 }
 
 static void kvm_guest_cpu_online(void *dummy)
@@ -523,3 +525,263 @@ static __init int activate_jump_labels(void)
return 0;
 }
 arch_initcall(activate_jump_labels);
+
+/* Kick a cpu by its apicid. Used to wake up a halted vcpu */
+void kvm_kick_cpu(int cpu)
+{
+   int apicid;
+   unsigned long flags = 0;
+
+   apicid = per_cpu(x86_cpu_to_apicid, cpu);
+   kvm_hypercall2(KVM_HC_KICK_CPU, flags, apicid);
+}
+
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+
+enum kvm_contention_stat {
+   TAKEN_SLOW,
+   TAKEN_SLOW_PICKUP,
+   RELEASED_SLOW,
+   RELEASED_SLOW_KICKED,
+   NR_CONTENTION_STATS
+};
+
+#ifdef CONFIG_KVM_DEBUG_FS
+#define HISTO_BUCKETS  30
+
+static struct kvm_spinlock_stats
+{
+   u32 contention_stats[NR_CONTENTION_STATS];
+   u32 histo_spin_blocked[HISTO_BUCKETS+1];
+   u64 time_blocked;
+} spinlock_stats;
+
+static u8 zero_stats;
+
+static inline void check_zero(void)
+{
+   u8 ret;
+   u8 old;
+
+   old = ACCESS_ONCE(zero_stats);
+   if (unlikely(old)) {
+   ret = cmpxchg(zero_stats, old, 0);
+   /* This ensures only one fellow resets the stat */
+   if (ret == old)
+   memset(spinlock_stats, 0, sizeof(spinlock_stats));
+   }
+}
+
+static inline void add_stats(enum kvm_contention_stat var, u32 val)
+{
+   check_zero();
+   spinlock_stats.contention_stats[var] += val;
+}
+
+
+static inline u64 spin_time_start(void)
+{
+   return sched_clock();
+}
+
+static void __spin_time_accum(u64 delta, u32 *array)
+{
+   unsigned index;
+
+   index = ilog2(delta);
+   check_zero();
+
+   if (index  HISTO_BUCKETS)
+   array[index]++;
+   else
+   array[HISTO_BUCKETS]++;
+}
+
+static inline void spin_time_accum_blocked(u64 start)
+{
+   u32 delta;
+
+   delta = sched_clock() - start;
+   __spin_time_accum(delta, spinlock_stats.histo_spin_blocked);
+   spinlock_stats.time_blocked += delta;
+}
+
+static struct dentry *d_spin_debug;
+static struct dentry *d_kvm_debug;
+
+struct dentry *kvm_init_debugfs(void)
+{
+   d_kvm_debug = debugfs_create_dir(kvm, NULL);
+   if (!d_kvm_debug)
+   printk(KERN_WARNING Could not create 'kvm' debugfs 
directory\n);
+
+   return d_kvm_debug;
+}
+
+static int __init kvm_spinlock_debugfs(void)
+{
+   struct dentry *d_kvm;
+
+   d_kvm = kvm_init_debugfs();
+   if (d_kvm == NULL)
+   return -ENOMEM;
+
+   d_spin_debug = debugfs_create_dir(spinlocks, d_kvm

[PATCH V12 0/5] Paravirtualized ticket spinlocks for KVM host

2013-08-06 Thread Raghavendra K T

 This series forms the KVM host part of paravirtual spinlocks,
 based against the kvm tree.

 Please refer to https://lkml.org/lkml/2013/8/6/178 for the kvm guest part
 of the series.

 Please note that "kvm uapi: Add KICK_CPU and PV_UNHALT definition to uapi"
 is a common patch for both guest and host.

 Srivatsa Vaddagiri (1):
  kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks

 Raghavendra K T (4):
  kvm uapi: Add KICK_CPU and PV_UNHALT definition to uapi
  kvm : Fold pv_unhalt flag into GET_MP_STATE ioctl to aid migration
  kvm hypervisor: Simplify kvm_for_each_vcpu with kvm_irq_delivery_to_apic
  Documentation/kvm : Add documentation on Hypercalls and features used for PV 
spinlock 
--- 
 Documentation/virtual/kvm/cpuid.txt  |  4 
 Documentation/virtual/kvm/hypercalls.txt | 14 ++
 arch/x86/include/asm/kvm_host.h  |  5 +
 arch/x86/include/uapi/asm/kvm_para.h |  1 +
 arch/x86/kvm/cpuid.c |  3 ++-
 arch/x86/kvm/lapic.c |  5 -
 arch/x86/kvm/x86.c   | 31 ++-
 include/uapi/linux/kvm_para.h|  1 +
 8 files changed, 61 insertions(+), 3 deletions(-)



[PATCH V12 1/5] kvm uapi: Add KICK_CPU and PV_UNHALT definition to uapi

2013-08-06 Thread Raghavendra K T
kvm uapi: Add KICK_CPU and PV_UNHALT definition to uapi

From: Raghavendra K T raghavendra...@linux.vnet.ibm.com

this is needed by both guest and host.

Originally-from: Srivatsa Vaddagiri va...@linux.vnet.ibm.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
Acked-by: Gleb Natapov g...@redhat.com
Acked-by: Ingo Molnar mi...@kernel.org
---
 arch/x86/include/uapi/asm/kvm_para.h |1 +
 include/uapi/linux/kvm_para.h|1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/x86/include/uapi/asm/kvm_para.h 
b/arch/x86/include/uapi/asm/kvm_para.h
index 06fdbd9..94dc8ca 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -23,6 +23,7 @@
 #define KVM_FEATURE_ASYNC_PF   4
 #define KVM_FEATURE_STEAL_TIME 5
 #define KVM_FEATURE_PV_EOI 6
+#define KVM_FEATURE_PV_UNHALT  7
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
index cea2c5c..2841f86 100644
--- a/include/uapi/linux/kvm_para.h
+++ b/include/uapi/linux/kvm_para.h
@@ -19,6 +19,7 @@
 #define KVM_HC_MMU_OP  2
 #define KVM_HC_FEATURES3
 #define KVM_HC_PPC_MAP_MAGIC_PAGE  4
+#define KVM_HC_KICK_CPU5
 
 /*
  * hypercalls use architecture specific



[PATCH V12 4/5] kvm hypervisor : Simplify kvm_for_each_vcpu with kvm_irq_delivery_to_apic

2013-08-06 Thread Raghavendra K T
kvm hypervisor: Simplify kvm_for_each_vcpu with kvm_irq_delivery_to_apic

From: Raghavendra K T raghavendra...@linux.vnet.ibm.com

Note that we are using APIC_DM_REMRD, which has reserved usage.
In the future, if APIC_DM_REMRD usage is standardized, we should
find some other way or go back to the old method.

Suggested-by: Gleb Natapov g...@redhat.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
Acked-by: Gleb Natapov g...@redhat.com
Acked-by: Ingo Molnar mi...@kernel.org
---
 arch/x86/kvm/lapic.c |5 -
 arch/x86/kvm/x86.c   |   25 ++---
 2 files changed, 10 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index afc1124..48c13c9 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -706,7 +706,10 @@ out:
break;
 
case APIC_DM_REMRD:
-   apic_debug(Ignoring delivery mode 3\n);
+   result = 1;
+   vcpu-arch.pv.pv_unhalted = 1;
+   kvm_make_request(KVM_REQ_EVENT, vcpu);
+   kvm_vcpu_kick(vcpu);
break;
 
case APIC_DM_SMI:
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1e73dab..640d112 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5502,27 +5502,14 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
  */
 static void kvm_pv_kick_cpu_op(struct kvm *kvm, unsigned long flags, int 
apicid)
 {
-   struct kvm_vcpu *vcpu = NULL;
-   int i;
+   struct kvm_lapic_irq lapic_irq;
 
-   kvm_for_each_vcpu(i, vcpu, kvm) {
-   if (!kvm_apic_present(vcpu))
-   continue;
+   lapic_irq.shorthand = 0;
+   lapic_irq.dest_mode = 0;
+   lapic_irq.dest_id = apicid;
 
-   if (kvm_apic_match_dest(vcpu, 0, 0, apicid, 0))
-   break;
-   }
-   if (vcpu) {
-   /*
-* Setting unhalt flag here can result in spurious runnable
-* state when unhalt reset does not happen in vcpu_block.
-* But that is harmless since that should soon result in halt.
-*/
-   vcpu-arch.pv.pv_unhalted = true;
-   /* We need everybody see unhalt before vcpu unblocks */
-   smp_wmb();
-   kvm_vcpu_kick(vcpu);
-   }
+   lapic_irq.delivery_mode = APIC_DM_REMRD;
+   kvm_irq_delivery_to_apic(kvm, 0, lapic_irq, NULL);
 }
 
 int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)



[PATCH V12 3/5] kvm : Fold pv_unhalt flag into GET_MP_STATE ioctl to aid migration

2013-08-06 Thread Raghavendra K T
kvm : Fold pv_unhalt flag into GET_MP_STATE ioctl to aid migration

From: Raghavendra K T raghavendra...@linux.vnet.ibm.com

During migration, any vcpu that got kicked but did not become runnable
(still in halted state) should be runnable after migration.

Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
Acked-by: Gleb Natapov g...@redhat.com
Acked-by: Ingo Molnar mi...@kernel.org
---
 arch/x86/kvm/x86.c |7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index dae4575..1e73dab 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6284,7 +6284,12 @@ int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
struct kvm_mp_state *mp_state)
 {
kvm_apic_accept_events(vcpu);
-   mp_state->mp_state = vcpu->arch.mp_state;
+   if (vcpu->arch.mp_state == KVM_MP_STATE_HALTED &&
+   vcpu->arch.pv.pv_unhalted)
+   mp_state->mp_state = KVM_MP_STATE_RUNNABLE;
+   else
+   mp_state->mp_state = vcpu->arch.mp_state;
+
return 0;
 }
 



[PATCH V12 2/5] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks

2013-08-06 Thread Raghavendra K T
kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks

From: Srivatsa Vaddagiri va...@linux.vnet.ibm.com

KVM_HC_KICK_CPU allows the calling vcpu to kick another vcpu out of halt state.
The presence of this hypercall is indicated to the guest via
KVM_FEATURE_PV_UNHALT.

Signed-off-by: Srivatsa Vaddagiri va...@linux.vnet.ibm.com
Signed-off-by: Suzuki Poulose suz...@in.ibm.com
[Raghu: Apic related changes, folding pvunhalted into vcpu_runnable
 Added flags for future use (suggested by Gleb)]
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
Acked-by: Gleb Natapov g...@redhat.com
Acked-by: Ingo Molnar mi...@kernel.org
---
 arch/x86/include/asm/kvm_host.h |5 +
 arch/x86/kvm/cpuid.c|3 ++-
 arch/x86/kvm/x86.c  |   37 +
 3 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f87f7fc..1d1f711 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -511,6 +511,11 @@ struct kvm_vcpu_arch {
 * instruction.
 */
bool write_fault_to_shadow_pgtable;
+
+   /* pv related host specific info */
+   struct {
+   bool pv_unhalted;
+   } pv;
 };
 
 struct kvm_lpage_info {
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index a20ecb5..b110fe6 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -413,7 +413,8 @@ static int do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 (1 << KVM_FEATURE_CLOCKSOURCE2) |
 (1 << KVM_FEATURE_ASYNC_PF) |
 (1 << KVM_FEATURE_PV_EOI) |
-(1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
+(1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
+(1 << KVM_FEATURE_PV_UNHALT);
 
if (sched_info_on())
entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d21bce5..dae4575 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5495,6 +5495,36 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
return 1;
 }
 
+/*
+ * kvm_pv_kick_cpu_op:  Kick a vcpu.
+ *
+ * @apicid - apicid of vcpu to be kicked.
+ */
+static void kvm_pv_kick_cpu_op(struct kvm *kvm, unsigned long flags, int apicid)
+{
+   struct kvm_vcpu *vcpu = NULL;
+   int i;
+
+   kvm_for_each_vcpu(i, vcpu, kvm) {
+   if (!kvm_apic_present(vcpu))
+   continue;
+
+   if (kvm_apic_match_dest(vcpu, 0, 0, apicid, 0))
+   break;
+   }
+   if (vcpu) {
+   /*
+* Setting unhalt flag here can result in spurious runnable
+* state when unhalt reset does not happen in vcpu_block.
+* But that is harmless since that should soon result in halt.
+*/
+   vcpu->arch.pv.pv_unhalted = true;
+   /* We need everybody see unhalt before vcpu unblocks */
+   smp_wmb();
+   kvm_vcpu_kick(vcpu);
+   }
+}
+
 int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 {
unsigned long nr, a0, a1, a2, a3, ret;
@@ -5528,6 +5558,10 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
case KVM_HC_VAPIC_POLL_IRQ:
ret = 0;
break;
+   case KVM_HC_KICK_CPU:
+   kvm_pv_kick_cpu_op(vcpu->kvm, a0, a1);
+   ret = 0;
+   break;
default:
ret = -KVM_ENOSYS;
break;
@@ -5950,6 +5984,7 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
kvm_apic_accept_events(vcpu);
switch(vcpu->arch.mp_state) {
case KVM_MP_STATE_HALTED:
+   vcpu->arch.pv.pv_unhalted = false;
vcpu->arch.mp_state =
KVM_MP_STATE_RUNNABLE;
case KVM_MP_STATE_RUNNABLE:
@@ -6770,6 +6805,7 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
BUG_ON(vcpu->kvm == NULL);
kvm = vcpu->kvm;
 
+   vcpu->arch.pv.pv_unhalted = false;
vcpu->arch.emulate_ctxt.ops = emulate_ops;
if (!irqchip_in_kernel(kvm) || kvm_vcpu_is_bsp(vcpu))
vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
@@ -7103,6 +7139,7 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
!vcpu->arch.apf.halted)
|| !list_empty_careful(&vcpu->async_pf.done)
|| kvm_apic_has_events(vcpu)
+   || vcpu->arch.pv.pv_unhalted
|| atomic_read(&vcpu->arch.nmi_queued) ||
(kvm_arch_interrupt_allowed(vcpu) &&
 kvm_cpu_has_interrupt(vcpu));

[PATCH V12 5/5] Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock

2013-08-06 Thread Raghavendra K T
Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock

From: Raghavendra K T raghavendra...@linux.vnet.ibm.com

KVM_HC_KICK_CPU hypercall added to wake up a halted vcpu in a paravirtual
spinlock enabled guest.

KVM_FEATURE_PV_UNHALT enables guest to check whether pv spinlock can be enabled
in guest.

Thanks Vatsa for rewriting KVM_HC_KICK_CPU

Signed-off-by: Srivatsa Vaddagiri va...@linux.vnet.ibm.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
Acked-by: Gleb Natapov g...@redhat.com
Acked-by: Ingo Molnar mi...@kernel.org
---
 Documentation/virtual/kvm/cpuid.txt  |4 
 Documentation/virtual/kvm/hypercalls.txt |   14 ++
 2 files changed, 18 insertions(+)

diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
index 83afe65..22ff659 100644
--- a/Documentation/virtual/kvm/cpuid.txt
+++ b/Documentation/virtual/kvm/cpuid.txt
@@ -43,6 +43,10 @@ KVM_FEATURE_CLOCKSOURCE2   || 3 || kvmclock available at msrs
 KVM_FEATURE_ASYNC_PF   || 4 || async pf can be enabled by
||   || writing to msr 0x4b564d02
 --
+KVM_FEATURE_PV_UNHALT  || 7 || guest checks this feature bit
+   ||   || before enabling paravirtualized
+   ||   || spinlock support.
+--
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||24 || host will warn if no guest-side
||   || per-cpu warps are expected in
||   || kvmclock.
diff --git a/Documentation/virtual/kvm/hypercalls.txt b/Documentation/virtual/kvm/hypercalls.txt
index ea113b5..022198e 100644
--- a/Documentation/virtual/kvm/hypercalls.txt
+++ b/Documentation/virtual/kvm/hypercalls.txt
@@ -64,3 +64,17 @@ Purpose: To enable communication between the hypervisor and guest there is a
 shared page that contains parts of supervisor visible register state.
 The guest can map this shared page to access its supervisor register through
 memory using this hypercall.
+
+5. KVM_HC_KICK_CPU
+
+Architecture: x86
+Status: active
+Purpose: Hypercall used to wakeup a vcpu from HLT state
+Usage example : A vcpu of a paravirtualized guest that is busywaiting in guest
+kernel mode for an event to occur (ex: a spinlock to become available) can
+execute HLT instruction once it has busy-waited for more than a threshold
+time-interval. Execution of HLT instruction would cause the hypervisor to put
+the vcpu to sleep until occurence of an appropriate event. Another vcpu of the
+same guest can wakeup the sleeping vcpu by issuing KVM_HC_KICK_CPU hypercall,
+specifying APIC ID (a1) of the vcpu to be woken up. An additional argument (a0)
+is used in the hypercall for future use.
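
For reference, a minimal guest-side sketch of how the feature bit and the
hypercall are meant to be used together (illustrative only: the two example_*
helpers below are made-up names, while kvm_para_available(),
kvm_para_has_feature() and kvm_hypercall2() are the existing guest helpers):

#include <linux/kvm_para.h>

/* Only use the pv kick/halt protocol if the host advertises it. */
static bool example_pv_unhalt_supported(void)
{
	return kvm_para_available() &&
	       kvm_para_has_feature(KVM_FEATURE_PV_UNHALT);
}

/* Kick the halted vcpu identified by @apicid: a0 = flags (future use), a1 = APIC ID. */
static void example_kick_cpu(unsigned long flags, int apicid)
{
	kvm_hypercall2(KVM_HC_KICK_CPU, flags, apicid);
}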



Re: [PATCH V12 3/5] kvm : Fold pv_unhalt flag into GET_MP_STATE ioctl to aid migration

2013-08-06 Thread Raghavendra K T

On 08/07/2013 12:02 AM, Eric Northup wrote:


If this is about migration correctness, could it get folded into the
previous patch 2/5, so that there's not a broken commit which could
hurt bisection?


Yes, it could be. The only reason I kept it like that was that the
original author of the previous patch is different (Srivatsa), and I did
not want to mix the sign-offs by merging this hunk as the patch series
evolved.

Gleb, Paolo please let me know.




Re: [PATCH V12 0/14] Paravirtualized ticket spinlocks

2013-08-06 Thread Raghavendra K T

On 08/07/2013 02:31 AM, H. Peter Anvin wrote:


What is the baseline for this patchset?  I tried to apply it on top of
3.11-rc4 and I got nontrivial conflicts.



I had based it on top of 445363e8 [ Merge branch 'perf/urgent']
of tip.  Sorry for not mentioning that.

Please let me know, if I should rebase again.



Re: [PATCH RFC V11 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor

2013-08-05 Thread Raghavendra K T



That result was only for patch 18 of the series, not pvspinlock in
general.


Okay - I've re-read the performance numbers and they are impressive, so no
objections from me.

The x86 impact seems to be a straightforward API change, with most of the
changes on the virtualization side. So:

Acked-by: Ingo Molnar mi...@kernel.org

I guess you'd want to carry this in the KVM tree or so - maybe in a
separate branch because it changes Xen as well?



Thank you, Ingo, for taking another look.

Gleb, please let me know if you want me to resend the first 17 patches
with acked-bys, i.e. excluding the 18th patch.






Re: [PATCH RFC V11 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor

2013-08-05 Thread Raghavendra K T

On 08/05/2013 07:35 PM, Paolo Bonzini wrote:

I guess you'd want to carry this in the KVM tree or so - maybe in a
separate branch because it changes Xen as well?


It changes KVM host and guest side, XEN and common x86 spinlock code. I
think it would be best to merge common x86 spinlock bits and guest side
KVM/XEN bits through tip tree and host KVM part will go through KVM
tree. If this is OK with you, Ingo, and XEN folks Raghavendra can send
two separate patch series one for the tip and one for KVM host side.


Sure, that's fine - if the initial series works fine in isolation as well
(i.e. won't break anything).


It would be a big problem if it didn't!  Raghavendra, please send the
two separate series as Gleb explained above.



Yes, sure. The patches have been split that way.

The only thing I am wondering about is the KVM_FEATURE_PV_UNHALT and
KVM_HC_KICK_CPU definitions in the hunk below, which are needed by the guest
as well. Maybe this header file change can be a separate patch so that the
duplicate can be handled easily during the merge?

I will test all combinations after splitting and then post.

diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 06fdbd9..94dc8ca 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -23,6 +23,7 @@
 #define KVM_FEATURE_ASYNC_PF   4
 #define KVM_FEATURE_STEAL_TIME 5
 #define KVM_FEATURE_PV_EOI 6
+#define KVM_FEATURE_PV_UNHALT  7

diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
index cea2c5c..2841f86 100644
--- a/include/uapi/linux/kvm_para.h
+++ b/include/uapi/linux/kvm_para.h
@@ -19,6 +19,7 @@
 #define KVM_HC_MMU_OP  2
 #define KVM_HC_FEATURES  3
 #define KVM_HC_PPC_MAP_MAGIC_PAGE  4
+#define KVM_HC_KICK_CPU  5




Re: [PATCH RFC V11 0/18] Paravirtualized ticket spinlocks

2013-08-05 Thread Raghavendra K T

On 08/06/2013 04:20 AM, H. Peter Anvin wrote:

So, having read through the entire thread I *think* this is what the
status of this patchset is:

1. Patches 1-17 are noncontroversial, Raghavendra is going to send an
update split into two patchsets;


Yes. Only one patch would be common to both host and guest, and it will
be sent as a separate patch.
I'll rebase the first patchset onto -next and the second patchset onto the
kvm tree as needed.


2. There are at least two versions of patch 15; I think the PATCH
RESEND RFC is the right one.


True.


3. Patch 18 is controversial but there are performance numbers; these
should be integrated in the patch description.


The current plan is to drop patch 18 for now.



Re: [PATCH RFC V11 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor

2013-08-02 Thread Raghavendra K T

On 08/02/2013 02:53 PM, Ingo Molnar wrote:


* Raghavendra K T raghavendra...@linux.vnet.ibm.com wrote:


On 08/01/2013 02:34 PM, Raghavendra K T wrote:

On 08/01/2013 01:15 PM, Gleb Natapov wrote:

Shall I consider this as an ack for kvm part?


For everything except 18/18. For that I still want to see numbers. But
18/18 is pretty independent from the reset of the series so it should
not stop the reset from going in.


Yes. agreed.
I am going to evaluate patch 18 separately and come with results for
that. Now we can consider only 1-17 patches.



Gleb,

32 core machine with HT off 32 vcpu guests.
base = 3.11-rc + patch 1 -17 pvspinlock_v11
patched = base + patch 18

+---+---+---++---+
   dbench  (Throughput in MB/sec higher is better)
+---+---+---++---+
       base         stdev      patched       stdev      %improvement
+---+---+---++---+
1x  14584.3800   146.9074   14705.1000   163.1060   0.82773
2x   1713.7300    32.8750    1717.3200    45.5979   0.20948
3x    967.8212    42.0257     971.8855    18.8532   0.41994
4x    685.2764    25.7150     694.5881     8.3907   1.35882
+---+---+---++---+


Please list stddev in percentage as well ...


Sure. will do this from next time.



a blind stab gave me these figures:


       base       stdev    patched     stdev    %improvement
3x    967.8212   4.3%     971.8855    1.8%     0.4


That makes the improvement an order of magnitude smaller than the noise of
the measurement ... i.e. totally inconclusive.


Okay. agreed.

I had always seen a positive effect from the patch, since it uses the PLE
handler heuristics and thus avoids the directed yield to vcpus in the
halt handler. But the current results clearly do not show
anything in its favor. :(

So please drop patch 18 for now.



Also please cut the excessive decimal points: with 2-4% noise what point
is there in 5 decimal point results??


Yes.

Ingo, do you think the patch series (patches 1 to 17) is now in good
shape? Please let me know if you have any concerns that need to be
addressed.





Re: [PATCH RFC V11 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor

2013-08-02 Thread Raghavendra K T

On 08/02/2013 03:24 PM, Gleb Natapov wrote:

On Fri, Aug 02, 2013 at 11:25:39AM +0200, Ingo Molnar wrote:

Ingo,

Do you have any concerns reg this series? please let me know if this
looks good now to you.


I'm inclined to NAK it for excessive quotation - who knows how many people
left the discussion in disgust? Was it done to drive away as many
reviewers as possible?


Ingo, Peter,
Sorry for the confusion caused by the nesting. I should have trimmed
it, as Peter also pointed out in the other thread.

I will ensure that in future mails.



Anyway, see my other reply, the measurement results seem hard to interpret
and inconclusive at the moment.


As Gleb already pointed out, patches 1-17 as a whole give a good performance
improvement. It was only patch 18 that Gleb had concerns about.




Re: [PATCH RFC V11 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor

2013-08-01 Thread Raghavendra K T

On 07/31/2013 11:54 AM, Gleb Natapov wrote:

On Tue, Jul 30, 2013 at 10:13:12PM +0530, Raghavendra K T wrote:

On 07/25/2013 03:08 PM, Raghavendra K T wrote:

On 07/25/2013 02:45 PM, Gleb Natapov wrote:

On Thu, Jul 25, 2013 at 02:47:37PM +0530, Raghavendra K T wrote:

On 07/24/2013 06:06 PM, Raghavendra K T wrote:

On 07/24/2013 05:36 PM, Gleb Natapov wrote:

On Wed, Jul 24, 2013 at 05:30:20PM +0530, Raghavendra K T wrote:

On 07/24/2013 04:09 PM, Gleb Natapov wrote:

On Wed, Jul 24, 2013 at 03:15:50PM +0530, Raghavendra K T wrote:

On 07/23/2013 08:37 PM, Gleb Natapov wrote:

On Mon, Jul 22, 2013 at 11:50:16AM +0530, Raghavendra K T wrote:

+static void kvm_lock_spinning(struct arch_spinlock *lock, __ticket_t want)

[...]

+
+/*
+ * halt until it's our turn and kicked. Note that we do safe halt
+ * for irq enabled case to avoid hang when lock info is overwritten
+ * in irq spinlock slowpath and no spurious interrupt occur to save us.
+ */
+if (arch_irqs_disabled_flags(flags))
+halt();
+else
+safe_halt();
+
+out:

So here now interrupts can be either disabled or enabled. Previous
version disabled interrupts here, so are we sure it is safe to
have them
enabled at this point? I do not see any problem yet, will keep
thinking.


If we enable interrupt here, then



+cpumask_clear_cpu(cpu, &waiting_cpus);


and if we start serving a lock for an interrupt that came in here, the
cpumask clear and w->lock = NULL may not happen atomically.
If the irq spinlock does not take the slow path, we would have a non-null
value for lock, but with no information in waiting_cpus.

I am still thinking about what the problem with that would be.


Exactly; for the kicker, the waiting_cpus and w->lock updates are
non-atomic anyway.


+w->lock = NULL;
+local_irq_restore(flags);
+spin_time_accum_blocked(start);
+}
+PV_CALLEE_SAVE_REGS_THUNK(kvm_lock_spinning);
+
+/* Kick vcpu waiting on @lock->head to reach value @ticket */
+static void kvm_unlock_kick(struct arch_spinlock *lock, __ticket_t ticket)
+{
+int cpu;
+
+add_stats(RELEASED_SLOW, 1);
+for_each_cpu(cpu, &waiting_cpus) {
+const struct kvm_lock_waiting *w = &per_cpu(lock_waiting, cpu);
+if (ACCESS_ONCE(w->lock) == lock &&
+ACCESS_ONCE(w->want) == ticket) {
+add_stats(RELEASED_SLOW_KICKED, 1);
+kvm_kick_cpu(cpu);

What about using NMI to wake sleepers? I think it was
discussed, but
forgot why it was dismissed.


I think I have missed that discussion; I'll go back and check. So
what is the idea here? That we can easily wake up the halted vcpus that
have interrupts disabled?

We can of course. IIRC the objection was that NMI handling path
is very
fragile and handling NMI on each wakeup will be more expensive than
waking up a guest without injecting an event, but it is still
interesting
to see the numbers.



Hmm, now I remember. We had tried a request-based mechanism (a new
request like REQ_UNHALT) and processed that. It had worked, but it had
some complex hacks in vcpu_enter_guest to avoid a guest hang in case the
request got cleared. So we had left it there.

https://lkml.org/lkml/2012/4/30/67

But I do not remember performance impact though.

No, this is something different. Wakeup with NMI does not need KVM
changes at
all. Instead of kvm_kick_cpu(cpu) in kvm_unlock_kick you send NMI IPI.



True, it was not NMI.
Just to confirm, are you talking about trying something like this?

apic->send_IPI_mask(cpumask_of(cpu), APIC_DM_NMI);
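
In full, the experimental guest-side kick would look roughly like this (a
sketch only; kvm_kick_cpu_nmi() is a made-up name and this is not a hunk
from the series):

/*
 * Experimental, as proposed above: wake the halted waiter with an NMI IPI
 * instead of the KVM_HC_KICK_CPU hypercall.
 */
static void kvm_kick_cpu_nmi(int cpu)
{
	apic->send_IPI_mask(cpumask_of(cpu), APIC_DM_NMI);
}

(The default unknown-NMI path then also needs to recognise these wakeups, as
discussed below.)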


When I started the benchmark, I started seeing
"Dazed and confused, but trying to continue" from the unknown NMI error
handling.
Did I miss anything (because we did not register any NMI handler)? Or
is it that spurious NMIs are trouble, because we could get a spurious NMI
if the next waiter has already acquired the lock?

There is a default NMI handler that tries to detect the reason why NMI
happened (which is not so easy on x86) and prints this message if it
fails. You need to add logic to detect spinlock slow path there. Check
bit in waiting_cpus for instance.


aha.. Okay. will check that.


yes. Thanks.. that did the trick.

I did something like the below in unknown_nmi_error():
if (cpumask_test_cpu(smp_processor_id(), &waiting_cpus))
return;

But I believe you asked for the NMI method only for experimental purposes,
to check the upper bound, because as I suspected above, for a spurious NMI
(i.e. when the unlocker kicks after the waiter has already got the lock) we
would still hit the unknown NMI error.

I hit spurious NMIs over 1656 times over the entire benchmark run,
along with
"INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too
long to run: 24.886 msecs" etc.


I wonder why this happens.


(And we cannot get away with that either, because it means we bypass the
unknown NMI error even in genuine cases.)

Here was the result for my dbench test (32 core machine with 32
vcpu guest, HT off):

  -- % improvement --
pvspinlock  pvspin_ipi  pvpsin_nmi
dbench_1x   0.9016

Re: [PATCH RFC V11 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor

2013-08-01 Thread Raghavendra K T

On 08/01/2013 02:34 PM, Raghavendra K T wrote:

On 08/01/2013 01:15 PM, Gleb Natapov wrote:

Shall I consider this as an ack for kvm part?


For everything except 18/18. For that I still want to see numbers. But
18/18 is pretty independent from the reset of the series so it should
not stop the reset from going in.


Yes. agreed.
I am going to evaluate patch 18 separately and come with results for
that. Now we can consider only 1-17 patches.



Gleb,

32 core machine with HT off 32 vcpu guests.
base = 3.11-rc + patch 1 -17 pvspinlock_v11
patched = base + patch 18

+---+---+---++---+
  dbench  (Throughput in MB/sec higher is better)
+---+---+---++---+
       base         stdev      patched       stdev      %improvement
+---+---+---++---+
1x  14584.3800   146.9074   14705.1000   163.1060   0.82773
2x   1713.7300    32.8750    1717.3200    45.5979   0.20948
3x    967.8212    42.0257     971.8855    18.8532   0.41994
4x    685.2764    25.7150     694.5881     8.3907   1.35882
+---+---+---++---+

I saw -0.78% to +0.84% changes in ebizzy and a 1-2% improvement in
hackbench.


