Re: [PATCH 0/6] x86: reduce paravirtualized spinlock overhead

2015-05-05 Thread Jeremy Fitzhardinge
On 05/03/2015 10:55 PM, Juergen Gross wrote:
 I did a small measurement of the pure locking functions on bare metal
 without and with my patches.

 spin_lock() for the first time (lock and code not in cache) dropped from
 about 600 to 500 cycles.

 spin_unlock() for first time dropped from 145 to 87 cycles.

 spin_lock() in a loop dropped from 48 to 45 cycles.

 spin_unlock() in the same loop dropped from 24 to 22 cycles.

Did you isolate icache hot/cold from dcache hot/cold? It seems to me the
main difference will be whether the branch predictor is warmed up rather
than if the lock itself is in dcache, but it's much more likely that the
lock code is in icache if the code is lock-intensive, making the cold case
moot. But that's pure speculation.

Could you see any differences in workloads beyond microbenchmarks?

Not that it's my call at all, but I think we'd need to see some concrete
improvements in real workloads before adding the complexity of more pvops.

J


Re: [PATCH 0/6] x86: reduce paravirtualized spinlock overhead

2015-04-30 Thread Jeremy Fitzhardinge
On 04/30/2015 03:53 AM, Juergen Gross wrote:
 Paravirtualized spinlocks produce some overhead even if the kernel is
 running on bare metal. The main reason are the more complex locking
 and unlocking functions. Especially unlocking is no longer just one
 instruction but so complex that it is no longer inlined.

 This patch series addresses this issue by adding two more pvops
 functions to reduce the size of the inlined spinlock functions. When
 running on bare metal unlocking is again basically one instruction.

Out of curiosity, is there a measurable difference?

J



Re: [PATCH] x86 spinlock: Fix memory corruption on completing completions

2015-02-11 Thread Jeremy Fitzhardinge

On 02/11/2015 09:24 AM, Oleg Nesterov wrote:
 I agree, and I have to admit I am not sure I fully understand why
 unlock uses the locked add. Except we need a barrier to avoid the race
 with the enter_slowpath() users, of course. Perhaps this is the only
 reason?

Right now it needs to be a locked operation to prevent read-reordering.
x86 memory ordering rules state that all writes are seen in a globally
consistent order, and are globally ordered wrt reads *on the same
addresses*, but reads to different addresses can be reordered wrt writes.

So, if the unlocking add were not a locked operation:

__add(&lock->tickets.head, TICKET_LOCK_INC);    /* not locked */

if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
    __ticket_unlock_slowpath(lock, prev);

Then the read of lock-tickets.tail can be reordered before the unlock,
which introduces a race:

/* read reordered here */
if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG)) /* false */
/* ... */;

/* other CPU sets SLOWPATH and blocks */

__add(&lock->tickets.head, TICKET_LOCK_INC);    /* not locked */

/* other CPU hung */

So it doesn't *have* to be a locked operation. This should also work:

__add(&lock->tickets.head, TICKET_LOCK_INC);    /* not locked */

lfence();                                       /* prevent read reordering */
if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
    __ticket_unlock_slowpath(lock, prev);

but in practice a locked add is cheaper than an lfence (or at least was).

This *might* be OK, but I think it's on dubious ground:

__add(&lock->tickets.head, TICKET_LOCK_INC);    /* not locked */

/* read overlaps write, and so is ordered */
if (unlikely(lock->head_tail & (TICKET_SLOWPATH_FLAG << TICKET_SHIFT)))
    __ticket_unlock_slowpath(lock, prev);

because I think Intel and AMD differed in interpretation about how
overlapping but different-sized reads and writes are ordered (or it simply
isn't architecturally defined).

If the slowpath flag is moved to head, then it would always have to be
locked anyway, because it needs to be atomic against other CPU's RMW
operations setting the flag.

J


Re: [PATCH] x86 spinlock: Fix memory corruption on completing completions

2015-02-11 Thread Jeremy Fitzhardinge

On 02/11/2015 03:28 PM, Linus Torvalds wrote:


 On Feb 11, 2015 3:15 PM, Jeremy Fitzhardinge jer...@goop.org wrote:
 
  Right now it needs to be a locked operation to prevent read-reordering.
  x86 memory ordering rules state that all writes are seen in a globally
  consistent order, and are globally ordered wrt reads *on the same
 addresses*, but reads to different addresses can be reordered wrt
 writes.

 The modern x86 rules are actually much tighter than that.

 Every store is a release, and every load is an acquire. So a
 non-atomic store is actually a perfectly fine unlock. All preceding
 stores will be seen by other cpu's before the unlock, and while reads
 can pass stores, they only pass *earlier* stores.


Right, so in this particular instance, the read of the SLOWPATH flag
*can* pass the previous unlock store, hence the need for an atomic
unlock or some other mechanism to prevent the read from being reordered.

J


Re: [PATCH] x86 spinlock: Fix memory corruption on completing completions

2015-02-10 Thread Jeremy Fitzhardinge

On 02/10/2015 05:26 AM, Oleg Nesterov wrote:
 On 02/10, Raghavendra K T wrote:
 On 02/10/2015 06:23 AM, Linus Torvalds wrote:

  add_smp(&lock->tickets.head, TICKET_LOCK_INC);
  if (READ_ONCE(lock->tickets.tail) & TICKET_SLOWPATH_FLAG) ..

 into something like

  val = xadd(&lock->ticket.head_tail, TICKET_LOCK_INC << TICKET_SHIFT);
  if (unlikely(val & TICKET_SLOWPATH_FLAG)) ...

 would be the right thing to do. Somebody should just check that I got
 that shift right, and that the tail is in the high bytes (head really
 needs to be high to work, if it's in the low byte(s) the xadd would
 overflow from head into tail which would be wrong).
 Unfortunately xadd could result in head overflow as tail is high.

 The other option was repeated cmpxchg which is bad I believe.
 Any suggestions?
 Stupid question... what if we simply move SLOWPATH from .tail to .head?
 In this case arch_spin_unlock() could do xadd(tickets.head) and check
 the result

Well, right now, tail is manipulated by locked instructions by CPUs
who are contending for the ticketlock, but head can be manipulated
unlocked by the CPU which currently owns the ticketlock. If SLOWPATH
moved into head, then non-owner CPUs would be touching head, requiring
everyone to use locked instructions on it.

That's the theory, but I don't see much (any?) code which depends on that.

Ideally we could find a way so that pv ticketlocks could use a plain
unlocked add for the unlock like the non-pv case, but I just don't see a
way to do it.
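(For reference, a sketch of the layout being discussed, along the lines of the
3.x-era x86 spinlock_types.h; shown only to make the head/tail split and who
touches what explicit:

    typedef struct arch_spinlock {
        union {
            __ticketpair_t head_tail;   /* both halves, for xadd/cmpxchg */
            struct __raw_tickets {
                __ticket_t head;        /* advanced by the owner on unlock */
                __ticket_t tail;        /* advanced by waiters with a locked xadd;
                                         * TICKET_SLOWPATH_FLAG lives in its low
                                         * bit in the pv patches */
            } tickets;
        };
    } arch_spinlock_t;
)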

 In this case __ticket_check_and_clear_slowpath() really needs to cmpxchg
 the whole .head_tail. Plus obviously more boring changes. This needs a
 separate patch even _if_ this can work.

Definitely.

 BTW. If we move clear slowpath into lock path, then probably trylock
 should be changed too? Something like below, we just need to clear SLOWPATH
 before cmpxchg.

How important / widely used is trylock these days?

J


 Oleg.

 --- x/arch/x86/include/asm/spinlock.h
 +++ x/arch/x86/include/asm/spinlock.h
 @@ -109,7 +109,8 @@ static __always_inline int arch_spin_try
   if (old.tickets.head != (old.tickets.tail & ~TICKET_SLOWPATH_FLAG))
   return 0;
  
 - new.head_tail = old.head_tail + (TICKET_LOCK_INC << TICKET_SHIFT);
 + new.tickets.head = old.tickets.head;
 + new.tickets.tail = (old.tickets.tail & ~TICKET_SLOWPATH_FLAG) + TICKET_LOCK_INC;
  
   /* cmpxchg is a full barrier, so nothing can move before it */
   return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) == old.head_tail;




Re: [PATCH] x86 spinlock: Fix memory corruption on completing completions

2015-02-08 Thread Jeremy Fitzhardinge
On 02/06/2015 06:49 AM, Raghavendra K T wrote:
 Paravirt spinlock clears slowpath flag after doing unlock.
 As explained by Linus currently it does:
 prev = *lock;
 add_smp(&lock->tickets.head, TICKET_LOCK_INC);

 /* add_smp() is a full mb() */

 if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
 __ticket_unlock_slowpath(lock, prev);


 which is *exactly* the kind of things you cannot do with spinlocks,
 because after you've done the add_smp() and released the spinlock
 for the fast-path, you can't access the spinlock any more.  Exactly
 because a fast-path lock might come in, and release the whole data
 structure.

Yeah, that's an embarrassingly obvious bug in retrospect.

 Linus suggested that we should not do any writes to lock after unlock(),
 and we can move slowpath clearing to fastpath lock.

Yep, that seems like a sound approach.

 However it brings additional case to be handled, viz., slowpath still
 could be set when somebody does arch_trylock. Handle that too by ignoring
 slowpath flag during lock availability check.

 Reported-by: Sasha Levin sasha.le...@oracle.com
 Suggested-by: Linus Torvalds torva...@linux-foundation.org
 Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
 ---
  arch/x86/include/asm/spinlock.h | 70 
 -
  1 file changed, 34 insertions(+), 36 deletions(-)

 diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
 index 625660f..0829f86 100644
 --- a/arch/x86/include/asm/spinlock.h
 +++ b/arch/x86/include/asm/spinlock.h
 @@ -49,6 +49,23 @@ static inline void __ticket_enter_slowpath(arch_spinlock_t *lock)
   set_bit(0, (volatile unsigned long *)&lock->tickets.tail);
  }
  
 +static inline void __ticket_check_and_clear_slowpath(arch_spinlock_t *lock)
 +{
 + arch_spinlock_t old, new;
 + __ticket_t diff;
 +
 + old.tickets = READ_ONCE(lock->tickets);

Couldn't the caller pass in the lock state that it read rather than
re-reading it?
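(For illustration only, a sketch of that variant: the caller hands in the
tickets value it already has, and the cmpxchg still catches any state that
has moved on underneath us. Untested, names as in the patch:

    static inline void
    __ticket_check_and_clear_slowpath(arch_spinlock_t *lock,
                                      struct __raw_tickets tickets)
    {
        arch_spinlock_t old, new;
        __ticket_t diff;

        old.tickets = tickets;  /* caller's copy, no second read of *lock */
        diff = (old.tickets.tail & ~TICKET_SLOWPATH_FLAG) - old.tickets.head;

        /* try to clear slowpath flag when there are no contenders */
        if ((old.tickets.tail & TICKET_SLOWPATH_FLAG) &&
            (diff == TICKET_LOCK_INC)) {
            new = old;
            new.tickets.tail &= ~TICKET_SLOWPATH_FLAG;
            /* a failed cmpxchg just means someone else changed the lock */
            cmpxchg(&lock->head_tail, old.head_tail, new.head_tail);
        }
    }
)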

 + diff = (old.tickets.tail & ~TICKET_SLOWPATH_FLAG) - old.tickets.head;
 +
 + /* try to clear slowpath flag when there are no contenders */
 + if ((old.tickets.tail & TICKET_SLOWPATH_FLAG) &&
 +     (diff == TICKET_LOCK_INC)) {
 + new = old;
 + new.tickets.tail &= ~TICKET_SLOWPATH_FLAG;
 + cmpxchg(&lock->head_tail, old.head_tail, new.head_tail);
 + }
 +}
 +
  #else  /* !CONFIG_PARAVIRT_SPINLOCKS */
  static __always_inline void __ticket_lock_spinning(arch_spinlock_t *lock,
   __ticket_t ticket)
 @@ -59,6 +76,10 @@ static inline void __ticket_unlock_kick(arch_spinlock_t 
 *lock,
  {
  }
  
 +static inline void __ticket_check_and_clear_slowpath(arch_spinlock_t *lock)
 +{
 +}
 +
  #endif /* CONFIG_PARAVIRT_SPINLOCKS */
  
  static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
 @@ -84,7 +105,7 @@ static __always_inline void arch_spin_lock(arch_spinlock_t 
 *lock)
   register struct __raw_tickets inc = { .tail = TICKET_LOCK_INC };
  
   inc = xadd(&lock->tickets, inc);
 - if (likely(inc.head == inc.tail))
 + if (likely(inc.head == (inc.tail & ~TICKET_SLOWPATH_FLAG)))

The intent of this conditional was to be the quickest possible path when
taking a fastpath lock, with the code below being used for all slowpath
locks (free or taken). So I don't think masking out SLOWPATH_FLAG is
necessary here.

   goto out;
  
   inc.tail &= ~TICKET_SLOWPATH_FLAG;
 @@ -98,7 +119,10 @@ static __always_inline void 
 arch_spin_lock(arch_spinlock_t *lock)
   } while (--count);
   __ticket_lock_spinning(lock, inc.tail);
   }
 -out: barrier();  /* make sure nothing creeps before the lock is taken */
 +out:
 + __ticket_check_and_clear_slowpath(lock);
 +
 + barrier();  /* make sure nothing creeps before the lock is taken */

Which means that if goto out path is only ever used for fastpath
locks, you can limit calling __ticket_check_and_clear_slowpath() to the
slowpath case.
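(Roughly this shape, purely as a sketch of the suggestion rather than a
replacement hunk: the uncontended xadd exits through "out" untouched, and
only the spinning loop goes through the clear:

    static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
    {
        register struct __raw_tickets inc = { .tail = TICKET_LOCK_INC };

        inc = xadd(&lock->tickets, inc);
        if (likely(inc.head == inc.tail))
            goto out;                   /* fastpath: tail had no flag set */

        inc.tail &= ~TICKET_SLOWPATH_FLAG;
        for (;;) {
            unsigned count = SPIN_THRESHOLD;

            do {
                if (READ_ONCE(lock->tickets.head) == inc.tail)
                    goto clear_slowpath;
                cpu_relax();
            } while (--count);
            __ticket_lock_spinning(lock, inc.tail);
        }
    clear_slowpath:
        __ticket_check_and_clear_slowpath(lock);    /* slowpath exits only */
    out:
        barrier();  /* make sure nothing creeps before the lock is taken */
    }
)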

  }
  
  static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
 @@ -115,47 +139,21 @@ static __always_inline int 
 arch_spin_trylock(arch_spinlock_t *lock)
   return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) == old.head_tail;
  }
  
 -static inline void __ticket_unlock_slowpath(arch_spinlock_t *lock,
 - arch_spinlock_t old)
 -{
 - arch_spinlock_t new;
 -
 - BUILD_BUG_ON(((__ticket_t)NR_CPUS) != NR_CPUS);
 -
 - /* Perform the unlock on the before copy */
 - old.tickets.head += TICKET_LOCK_INC;

NB (see below)

 -
 - /* Clear the slowpath flag */
 - new.head_tail = old.head_tail & ~(TICKET_SLOWPATH_FLAG << TICKET_SHIFT);
 -
 - /*
 -  * If the lock is uncontended, clear the flag - use cmpxchg in
 -  * case it changes behind our back 

Re: [PATCH delta V13 14/14] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor

2013-08-13 Thread Jeremy Fitzhardinge
On 08/13/2013 01:02 PM, Raghavendra K T wrote:
 * Ingo Molnar mi...@kernel.org [2013-08-13 18:55:52]:

 Would be nice to have a delta fix patch against tip:x86/spinlocks, which 
 I'll then backmerge into that series via rebasing it.

 There was a namespace collision of PER_CPU lock_waiting variable when
 we have both Xen and KVM enabled. 

 Perhaps this week wasn't for me. Had run 100 times randconfig in a loop
 for the fix sent earlier :(. 

 Ingo, below delta patch should fix it, IIRC, I hope you will be folding this
 back to patch 14/14 itself. Else please let me.
 I have already run allnoconfig, allyesconfig, randconfig with below patch. 
 But will
 test again. This should apply on top of tip:x86/spinlocks.

 ---8---
 From: Raghavendra K T raghavendra...@linux.vnet.ibm.com

 Fix Namespace collision for lock_waiting

 Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
 ---
 diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
 index d442471..b8ef630 100644
 --- a/arch/x86/kernel/kvm.c
 +++ b/arch/x86/kernel/kvm.c
 @@ -673,7 +673,7 @@ struct kvm_lock_waiting {
  static cpumask_t waiting_cpus;
  
  /* Track spinlock on which a cpu is waiting */
 -static DEFINE_PER_CPU(struct kvm_lock_waiting, lock_waiting);
 +static DEFINE_PER_CPU(struct kvm_lock_waiting, klock_waiting);

Has static stopped meaning static?

J

  
  static void kvm_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
  {
 @@ -685,7 +685,7 @@ static void kvm_lock_spinning(struct arch_spinlock *lock, 
 __ticket_t want)
   if (in_nmi())
   return;
  
 - w = &__get_cpu_var(lock_waiting);
 + w = &__get_cpu_var(klock_waiting);
   cpu = smp_processor_id();
   start = spin_time_start();
  
 @@ -756,7 +756,7 @@ static void kvm_unlock_kick(struct arch_spinlock *lock, 
 __ticket_t ticket)
  
   add_stats(RELEASED_SLOW, 1);
   for_each_cpu(cpu, waiting_cpus) {
 - const struct kvm_lock_waiting *w = &per_cpu(lock_waiting, cpu);
 + const struct kvm_lock_waiting *w = &per_cpu(klock_waiting, cpu);
   if (ACCESS_ONCE(w->lock) == lock &&
       ACCESS_ONCE(w->want) == ticket) {
   add_stats(RELEASED_SLOW_KICKED, 1);





Re: [PATCH RFC V9 0/19] Paravirtualized ticket spinlocks

2013-06-01 Thread Jeremy Fitzhardinge
On 06/01/2013 01:14 PM, Andi Kleen wrote:
 FWIW I use the paravirt spinlock ops for adding lock elision
 to the spinlocks.

Does lock elision still use the ticketlock algorithm/structure, or are
they different?  If they're still basically ticketlocks, then it seems
to me that they're complementary - hle handles the fastpath, and pv the
slowpath.

 This needs to be done at the top level (so the level you're removing)

 However I don't like the pv mechanism very much and would 
 be fine with using an static key hook in the main path
 like I do for all the other lock types.

Right.

J


Re: [PATCH RFC V9 1/19] x86/spinlock: Replace pv spinlocks with pv ticketlocks

2013-06-01 Thread Jeremy Fitzhardinge
On 06/01/2013 12:21 PM, Raghavendra K T wrote:
 x86/spinlock: Replace pv spinlocks with pv ticketlocks

 From: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
I'm not sure what the etiquette is here; I did the work while at Citrix,
but jer...@goop.org is my canonical email address.  The Citrix address
is dead and bounces, so is useless for anything.  Probably best to
change it.

J


 Rather than outright replacing the entire spinlock implementation in
 order to paravirtualize it, keep the ticket lock implementation but add
 a couple of pvops hooks on the slow path (long spin on lock, unlocking
 a contended lock).

 Ticket locks have a number of nice properties, but they also have some
 surprising behaviours in virtual environments.  They enforce a strict
 FIFO ordering on cpus trying to take a lock; however, if the hypervisor
 scheduler does not schedule the cpus in the correct order, the system can
 waste a huge amount of time spinning until the next cpu can take the lock.

 (See Thomas Friebel's talk Prevent Guests from Spinning Around
 http://www.xen.org/files/xensummitboston08/LHP.pdf for more details.)

 To address this, we add two hooks:
  - __ticket_spin_lock which is called after the cpu has been
spinning on the lock for a significant number of iterations but has
failed to take the lock (presumably because the cpu holding the lock
has been descheduled).  The lock_spinning pvop is expected to block
the cpu until it has been kicked by the current lock holder.
  - __ticket_spin_unlock, which on releasing a contended lock
(there are more cpus with tail tickets), it looks to see if the next
cpu is blocked and wakes it if so.

 When compiled with CONFIG_PARAVIRT_SPINLOCKS disabled, a set of stub
 functions causes all the extra code to go away.

 Signed-off-by: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
 Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
 Tested-by: Attilio Rao attilio@citrix.com
 [ Raghavendra: Changed SPIN_THRESHOLD ]
 Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
 ---
  arch/x86/include/asm/paravirt.h   |   32 
  arch/x86/include/asm/paravirt_types.h |   10 ++
  arch/x86/include/asm/spinlock.h   |   53 
 +++--
  arch/x86/include/asm/spinlock_types.h |4 --
  arch/x86/kernel/paravirt-spinlocks.c  |   15 +
  arch/x86/xen/spinlock.c   |8 -
  6 files changed, 61 insertions(+), 61 deletions(-)

 diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
 index cfdc9ee..040e72d 100644
 --- a/arch/x86/include/asm/paravirt.h
 +++ b/arch/x86/include/asm/paravirt.h
 @@ -712,36 +712,16 @@ static inline void __set_fixmap(unsigned /* enum 
 fixed_addresses */ idx,
  
  #if defined(CONFIG_SMP)  defined(CONFIG_PARAVIRT_SPINLOCKS)
  
 -static inline int arch_spin_is_locked(struct arch_spinlock *lock)
 +static __always_inline void __ticket_lock_spinning(struct arch_spinlock 
 *lock,
 + __ticket_t ticket)
  {
 - return PVOP_CALL1(int, pv_lock_ops.spin_is_locked, lock);
 + PVOP_VCALL2(pv_lock_ops.lock_spinning, lock, ticket);
  }
  
 -static inline int arch_spin_is_contended(struct arch_spinlock *lock)
 +static __always_inline void __ticket_unlock_kick(struct arch_spinlock *lock,
 +  __ticket_t ticket)
  {
 - return PVOP_CALL1(int, pv_lock_ops.spin_is_contended, lock);
 -}
 -#define arch_spin_is_contended   arch_spin_is_contended
 -
 -static __always_inline void arch_spin_lock(struct arch_spinlock *lock)
 -{
 - PVOP_VCALL1(pv_lock_ops.spin_lock, lock);
 -}
 -
 -static __always_inline void arch_spin_lock_flags(struct arch_spinlock *lock,
 -   unsigned long flags)
 -{
 - PVOP_VCALL2(pv_lock_ops.spin_lock_flags, lock, flags);
 -}
 -
 -static __always_inline int arch_spin_trylock(struct arch_spinlock *lock)
 -{
 - return PVOP_CALL1(int, pv_lock_ops.spin_trylock, lock);
 -}
 -
 -static __always_inline void arch_spin_unlock(struct arch_spinlock *lock)
 -{
 - PVOP_VCALL1(pv_lock_ops.spin_unlock, lock);
 + PVOP_VCALL2(pv_lock_ops.unlock_kick, lock, ticket);
  }
  
  #endif
 diff --git a/arch/x86/include/asm/paravirt_types.h 
 b/arch/x86/include/asm/paravirt_types.h
 index 0db1fca..d5deb6d 100644
 --- a/arch/x86/include/asm/paravirt_types.h
 +++ b/arch/x86/include/asm/paravirt_types.h
 @@ -327,13 +327,11 @@ struct pv_mmu_ops {
  };
  
  struct arch_spinlock;
 +#include <asm/spinlock_types.h>
 +
  struct pv_lock_ops {
 - int (*spin_is_locked)(struct arch_spinlock *lock);
 - int (*spin_is_contended)(struct arch_spinlock *lock);
 - void (*spin_lock)(struct arch_spinlock *lock);
 - void (*spin_lock_flags)(struct arch_spinlock *lock, unsigned long 
 flags);
 - int (*spin_trylock)(struct arch_spinlock *lock);
 - void (*spin_unlock

Re: [PATCH 09/14] xen: events: Remove redundant check on unsigned variable

2012-11-16 Thread Jeremy Fitzhardinge
To be honest I'd nack this kind of patch. The test is only redundant in the 
most trivial sense that the compiler can easily optimise it away. The point of the 
test is to make sure that the range is OK even if the type subsequently becomes 
signed (to hold a -ve error, for example).

J

Konrad Rzeszutek Wilk konrad.w...@oracle.com wrote:

On Fri, Nov 16, 2012 at 12:20:41PM +0530, Tushar Behera wrote:
 No need to check whether unsigned variable is less than 0.
 
 CC: Konrad Rzeszutek Wilk konrad.w...@oracle.com

Acked-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com

 CC: Jeremy Fitzhardinge jer...@goop.org
 CC: xen-de...@lists.xensource.com
 CC: virtualization@lists.linux-foundation.org
 Signed-off-by: Tushar Behera tushar.beh...@linaro.org
 ---
  drivers/xen/events.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)
 
 diff --git a/drivers/xen/events.c b/drivers/xen/events.c
 index 4293c57..cadd7d1 100644
 --- a/drivers/xen/events.c
 +++ b/drivers/xen/events.c
 @@ -216,7 +216,7 @@ static void xen_irq_info_pirq_init(unsigned irq,
   */
  static unsigned int evtchn_from_irq(unsigned irq)
  {
 -if (unlikely(WARN(irq < 0 || irq >= nr_irqs, "Invalid irq %d!\n", irq)))
 +if (unlikely(WARN(irq >= nr_irqs, "Invalid irq %d!\n", irq)))
  return 0;
  
  return info_for_irq(irq)->evtchn;
 -- 
 1.7.4.1

-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.

Re: [PATCH RFC V8 0/17] Paravirtualized ticket spinlocks

2012-05-14 Thread Jeremy Fitzhardinge
On 05/13/2012 11:45 AM, Raghavendra K T wrote:
 On 05/07/2012 08:22 PM, Avi Kivity wrote:

 I could not come with pv-flush results (also Nikunj had clarified that
 the result was on NOn PLE

 I'd like to see those numbers, then.

 Ingo, please hold on the kvm-specific patches, meanwhile.


 3 guests 8GB RAM, 1 used for kernbench
 (kernbench -f -H -M -o 20) other for cpuhog (shell script with  while
 true do hackbench)

 1x: no hogs
 2x: 8hogs in one guest
 3x: 8hogs each in two guest

 kernbench on PLE:
 Machine : IBM xSeries with Intel(R) Xeon(R)  X7560 2.27GHz CPU with 32
 core, with 8 online cpus and 4*64GB RAM.

 The average is taken over 4 iterations with 3 run each (4*3=12). and
 stdev is calculated over mean reported in each run.


 A): 8 vcpu guest

              BASE                 BASE+patch             %improvement w.r.t
              mean (sd)            mean (sd)              patched kernel time
 case 1*1x:   61.7075  (1.17872)   60.93     (1.475625)   1.27605
 case 1*2x:   107.2125 (1.3821349) 97.506675 (1.3461878)  9.95401
 case 1*3x:   144.3515 (1.8203927) 138.9525  (0.58309319) 3.8855


 B): 16 vcpu guest
              BASE                 BASE+patch             %improvement w.r.t
              mean (sd)            mean (sd)              patched kernel time
 case 2*1x:   70.524   (1.5941395) 69.68866  (1.9392529)  1.19867
 case 2*2x:   133.0738 (1.4558653) 124.8568  (1.4544986)  6.58114
 case 2*3x:   206.0094 (1.3437359) 181.4712  (2.9134116)  13.5218

 B): 32 vcpu guest
              BASE                  BASE+patch             %improvement w.r.t
              mean (sd)             mean (sd)              patched kernel time
 case 4*1x:   100.61046 (2.7603485) 85.48734  (2.6035035)  17.6905

What does the 4*1x notation mean? Do these workloads have overcommit
of the PCPU resources?

When I measured it, even quite small amounts of overcommit lead to large
performance drops with non-pv ticket locks (on the order of 10%
improvements when there were 5 busy VCPUs on a 4 cpu system).  I never
tested it on larger machines, but I guess that represents around 25%
overcommit, or 40 busy VCPUs on a 32-PCPU system.

J


Re: [PATCH RFC V8 0/17] Paravirtualized ticket spinlocks

2012-05-07 Thread Jeremy Fitzhardinge
On 05/07/2012 06:49 AM, Avi Kivity wrote:
 On 05/07/2012 04:46 PM, Srivatsa Vaddagiri wrote:
 * Raghavendra K T raghavendra...@linux.vnet.ibm.com [2012-05-07 19:08:51]:

 I 'll get hold of a PLE mc  and come up with the numbers soon. but I
 'll expect the improvement around 1-3% as it was in last version.
 Deferring preemption (when vcpu is holding lock) may give us better than 
 1-3% 
 results on PLE hardware. Something worth trying IMHO.
 Is the improvement so low, because PLE is interfering with the patch, or
 because PLE already does a good job?

How does PLE help with ticket scheduling on unlock?  I thought it would
just help with the actual spin loops.

J


Re: [Xen-devel] [PATCH RFC V6 0/11] Paravirtualized ticketlocks

2012-04-16 Thread Jeremy Fitzhardinge
On 04/16/2012 09:36 AM, Ian Campbell wrote:
 On Mon, 2012-04-16 at 16:44 +0100, Konrad Rzeszutek Wilk wrote:
 On Sat, Mar 31, 2012 at 09:37:45AM +0530, Srivatsa Vaddagiri wrote:
 * Thomas Gleixner t...@linutronix.de [2012-03-31 00:07:58]:

 I know that Peter is going to go berserk on me, but if we are running
 a paravirt guest then it's simple to provide a mechanism which allows
 the host (aka hypervisor) to check that in the guest just by looking
 at some global state.

 So if a guest exits due to an external event it's easy to inspect the
 state of that guest and avoid to schedule away when it was interrupted
 in a spinlock held section. That guest/host shared state needs to be
 modified to indicate the guest to invoke an exit when the last nested
 lock has been released.
 I had attempted something like that long back:

 http://lkml.org/lkml/2010/6/3/4

 The issue is with ticketlocks though. VCPUs could go into a spin w/o
 a lock being held by anybody. Say VCPUs 1-99 try to grab a lock in
 that order (on a host with one cpu). VCPU1 wins (after VCPU0 releases it)
 and releases the lock. VCPU1 is next eligible to take the lock. If 
 that is not scheduled early enough by host, then remaining vcpus would keep 
 spinning (even though lock is technically not held by anybody) w/o making 
 forward progress.

 In that situation, what we really need is for the guest to hint to host
 scheduler to schedule VCPU1 early (via yield_to or something similar). 

 The current pv-spinlock patches however does not track which vcpu is
 spinning at what head of the ticketlock. I suppose we can consider 
 that optimization in future and see how much benefit it provides (over
 plain yield/sleep the way its done now).
 Right. I think Jeremy played around with this some time?
 5/11 xen/pvticketlock: Xen implementation for PV ticket locks tracks
 which vcpus are waiting for a lock in cpumask_t waiting_cpus and
 tracks which lock each is waiting for in per-cpu lock_waiting. This is
 used in xen_unlock_kick to kick the right CPU. There's a loop over only
 the waiting cpus to figure out who to kick.

Yes, and AFAIK the KVM pv-ticketlock patches do the same thing.  If a
(V)CPU is asleep, then sending it a kick is pretty much equivalent to a
yield to (not precisely, but it should get scheduled soon enough, and it
won't be competing with a pile of VCPUs with no useful work to do).

J


Re: [PATCH RFC V4 4/5] kvm : pv-ticketlocks support for linux guests running on KVM hypervisor

2012-01-17 Thread Jeremy Fitzhardinge
On 01/17/2012 10:33 PM, Srivatsa Vaddagiri wrote:
 * Marcelo Tosatti mtosa...@redhat.com [2012-01-17 09:02:11]:

 +/* Kick vcpu waiting on @lock->head to reach value @ticket */
 +static void kvm_unlock_kick(struct arch_spinlock *lock, __ticket_t ticket)
 +{
 +   int cpu;
 +   int apicid;
 +
 +   add_stats(RELEASED_SLOW, 1);
 +
 +   for_each_cpu(cpu, waiting_cpus) {
 +   const struct kvm_lock_waiting *w = &per_cpu(lock_waiting, cpu);
 +   if (ACCESS_ONCE(w->lock) == lock &&
 +       ACCESS_ONCE(w->want) == ticket) {
 +   add_stats(RELEASED_SLOW_KICKED, 1);
 +   apicid = per_cpu(x86_cpu_to_apicid, cpu);
 +   kvm_kick_cpu(apicid);
 +   break;
 +   }
 +   }
 What prevents a kick from being lost here, if say, the waiter is at
 local_irq_save in kvm_lock_spinning, before the lock/want assignments?
 The waiter does check for lock becoming available before actually
 sleeping:

 + /*
 +* check again make sure it didn't become free while
 +* we weren't looking.
 +*/
 + if (ACCESS_ONCE(lock->tickets.head) == want) {
 +   add_stats(TAKEN_SLOW_PICKUP, 1);
 +   goto out;
 + }

That logic relies on the kick being level triggered, so that kick
before block will cause the block to fall out immediately.  If you're
using hlt as the block and it has the usual edge-triggered behaviour,
what stops a kick-before-hlt from losing the kick?
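(The window in question, as a simplified waiter-side sketch based on the
quoted kvm_lock_spinning(); field names are from the patch:

    local_irq_save(flags);              /* interrupts off while we register */
    w->want = want;
    w->lock = lock;
    cpumask_set_cpu(cpu, &waiting_cpus);

    /* check again make sure it didn't become free while we weren't looking */
    if (ACCESS_ONCE(lock->tickets.head) == want)
        goto out;

    /* <-- if the unlocker's kick (kvm_kick_cpu) lands here, before the
     * halt below, will it still wake us?  It only works if the kick
     * remains pending (level-triggered) so the halt falls straight
     * through, rather than being a one-shot edge that's already gone. */
    halt();
    out:
    local_irq_restore(flags);
)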

J


Re: [PATCH RFC V4 4/5] kvm : pv-ticketlocks support for linux guests running on KVM hypervisor

2012-01-16 Thread Jeremy Fitzhardinge
On 01/17/2012 01:47 AM, Avi Kivity wrote:
 On 01/16/2012 04:13 PM, Raghavendra K T wrote:
 Please drop all of these and replace with tracepoints in the appropriate
 spots.  Everything else (including the histogram) can be reconstructed
 from the tracepoints in userspace.


 I think Jeremy pointed that tracepoints use spinlocks and hence debugfs
 is the option.. no ?

 Yeah, I think you're right.  What a pity.

Yes, I went through the same thought process.

J


Re: [PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests

2012-01-16 Thread Jeremy Fitzhardinge
On 01/16/2012 09:24 PM, Alexander Graf wrote:
 This is true in case you're spinning. If on overcommit spinlocks would
 instead of spin just yield(), we wouldn't have any vcpu running that's
 just waiting for a late ticket.

Yes, but the reality is that most spinlocks are held for a short period
of time and there's a low likelihood of being preempted while within a
spinlock critical section.  Therefore if someone else tries to get the
spinlock and there's contention, it's always worth spinning for a little
while because the lock will likely become free soon.

At least that's the case if the lock has low contention (shallow queue
depth and not in slow state).  Again, maybe it makes sense to never spin
for deep queues or already slowstate locks.

 We still have an issue finding the point in time when a vcpu could run again, 
 which is what this whole series is about. My point above was that instead of 
 doing a count loop, we could just do the normal spin dance and set the 
 threshold to when we enable the magic to have another spin lock notify us in 
 the CPU. That way we

   * don't change the uncontended case

I don't follow you.  What do you mean by the normal spin dance?  What
do you mean by have another spinlock notify us in the CPU?  Don't
change which uncontended case?  Do you mean in the locking path?  Or the
unlock path?  Or both?

   * can set the threshold on the host, which knows how contended the system is

Hm, I'm not convinced that knowing how contended the system is is all
that useful overall.  What's important is how contended a particular
lock is, and what state the current holder is in.  If it's not currently
running, then knowing the overall system contention would give you some
idea about how long you need to wait for it to be rescheduled, but
that's getting pretty indirect.

I think the slowpath if preempted while spinning idea I mentioned in
the other mail is probably worth following up, since that give specific
actionable information to the guest from the hypervisor.  But lots of
caveats.

[[
A possible mechanism:

  * register ranges of [er]ips with the hypervisor
  * each range is paired with a resched handler block
  * if vcpu is preempted within such a range, make sure it is
rescheduled in the resched handler block

This is obviously akin to the exception mechanism, but it is partially
implemented by the hypervisor.  It allows the spinlock code to be
unchanged from native, but make use of a resched rather than an explicit
counter to determine when to slowpath the lock.  And it's a nice general
mechanism that could be potentially useful elsewhere.

Unfortunately, it doesn't change the unlock path at all; it still needs
to explicitly test if a VCPU needs to be kicked on unlock.
]]


 And since we control what spin locks look like, we can for example always 
 keep the pointer to it in a specific register so that we can handle 
 pv_lock_ops.lock_spinning() inside there and fetch all the information we 
 need from our pt_regs.

You've left a pile of parts of an idea lying around, but I'm not sure
what shape you intend it to be.

 Speaking of which - have you benchmarked performance degradation of pv 
 ticket locks on bare metal? Last time I checked, enabling all the PV ops 
 did incur significant slowdown which is why I went though the work to split 
 the individual pv ops features up to only enable a few for KVM guests.
 The whole point of the pv-ticketlock work is to keep the pvops hooks out of 
 the locking fast path, so that the calls are only made on the slow path - 
 that is, when spinning too long on a contended lock, and when releasing a 
 lock that's in a slow state.  In the fast path case of no contention, 
 there are no pvops, and the executed code path is almost identical to native.
 You're still changing a tight loop that does nothing (CPU detects it and 
 saves power) into something that performs calculations.

It still has a pause instruction in that loop, so that CPU mechanism
will still come into play.  pause doesn't directly save power; it's
more about making sure that memory dependence cycles are broken and that
two competing threads will make similar progress.  Besides I'm not sure
adding a dec+test to a loop that's already got a memory read and compare
in it is adding much in the way of calculations.
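(Concretely, the loop shape under discussion is roughly the following sketch;
SPIN_THRESHOLD and __ticket_lock_spinning() are from the patch series, the
rest is paraphrased:

    for (;;) {
        unsigned count = SPIN_THRESHOLD;

        do {
            if (ACCESS_ONCE(lock->tickets.head) == inc.tail)
                goto out;               /* we own the lock */
            cpu_relax();                /* still the pause instruction */
        } while (--count);              /* the extra dec+test being debated */

        __ticket_lock_spinning(lock, inc.tail);     /* pvop: block until kicked */
    }
    out:
)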

J


Re: [PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests

2012-01-15 Thread Jeremy Fitzhardinge
 are only made on the slow path - that is, 
when spinning too long on a contended lock, and when releasing a lock that's in 
a slow state.  In the fast path case of no contention, there are no pvops, 
and the executed code path is almost identical to native.

But as I mentioned above, I'd like to see some benchmarks to prove that's the 
case.

J

 
 
 Changes in V4:
 - rebased to 3.2.0 pre.
 - use APIC ID for kicking the vcpu and use kvm_apic_match_dest for matching. 
 (Avi)
 - fold vcpu-kicked flag into vcpu-requests (KVM_REQ_PVLOCK_KICK) and 
 related 
 changes for UNHALT path to make pv ticket spinlock migration friendly. (Avi, 
 Marcello)
 - Added Documentation for CPUID, Hypercall (KVM_HC_KICK_CPU)
 and capabilty (KVM_CAP_PVLOCK_KICK) (Avi)
 - Remove unneeded kvm_arch_vcpu_ioctl_set_mpstate call. (Marcello)
 - cumulative variable type changed (int == u32) in add_stat (Konrad)
 - remove unneeded kvm_guest_init for !CONFIG_KVM_GUEST case
 
 Changes in V3:
 - rebased to 3.2-rc1
 - use halt() instead of wait for kick hypercall.
 - modify kick hyper call to do wakeup halted vcpu.
 - hook kvm_spinlock_init to smp_prepare_cpus call (moved the call out of 
 head##.c).
 - fix the potential race when zero_stat is read.
 - export debugfs_create_32 and add documentation to API.
 - use static inline and enum instead of ADDSTAT macro. 
 - add  barrier() in after setting kick_vcpu.
 - empty static inline function for kvm_spinlock_init.
 - combine the patches one and two to reduce overhead.
 - make KVM_DEBUGFS depends on DEBUGFS.
 - include debugfs header unconditionally.
 
 Changes in V2:
 - rebased patchesto -rc9
 - synchronization related changes based on Jeremy's changes 
 (Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com) pointed by 
 Stephan Diestelhorst stephan.diestelho...@amd.com
 - enabling 32 bit guests
 - splitted patches into two more chunks
 
 Srivatsa Vaddagiri, Suzuki Poulose, Raghavendra K T (5): 
 Add debugfs support to print u32-arrays in debugfs
 Add a hypercall to KVM hypervisor to support pv-ticketlocks
 Added configuration support to enable debug information for KVM Guests
 pv-ticketlocks support for linux guests running on KVM hypervisor
 Add documentation on Hypercalls and features used for PV spinlock
 
 Test Set up :
 The BASE patch is pre 3.2.0 + Jeremy's following patches.
 xadd (https://lkml.org/lkml/2011/10/4/328)
 x86/ticketlocklock  (https://lkml.org/lkml/2011/10/12/496).
 Kernel for host/guest : 3.2.0 + Jeremy's xadd, pv spinlock patches as BASE
 (Note:locked add change is not taken yet)
 
 Results:
 The performance gain is mainly because of reduced busy-wait time.
 From the results we can see that patched kernel performance is similar to
 BASE when there is no lock contention. But once we start seeing more
 contention, patched kernel outperforms BASE (non PLE).
 On PLE machine we do not see greater performance improvement because of PLE
 complimenting halt()
 
 3 guests with 8VCPU, 4GB RAM, 1 used for kernbench
 (kernbench -f -H -M -o 20) other for cpuhog (shell script while
 true with an instruction)
 
 scenario A: unpinned
 
 1x: no hogs
 2x: 8hogs in one guest
 3x: 8hogs each in two guest
 
 scenario B: unpinned, run kernbench on all the guests no hogs.
 
 Dbench on PLE machine:
 dbench run on all the guest simultaneously with
 dbench --warmup=30 -t 120 with NRCLIENTS=(8/16/32).
 
 Result for Non PLE machine :
 
 Machine : IBM xSeries with Intel(R) Xeon(R) x5570 2.93GHz CPU with 8 core , 
 64GB RAM
             BASE                BASE+patch           %improvement
             mean (sd)           mean (sd)
 Scenario A:
 case 1x:    164.233 (16.5506)   163.584 (15.4598)    0.39517
 case 2x:    897.654 (543.993)   328.63  (103.771)    63.3901
 case 3x:    2855.73 (2201.41)   315.029 (111.854)    88.9685
 
 Dbench:
 Throughput is in MB/sec
 NRCLIENTS   BASE                  BASE+patch            %improvement
             mean (sd)             mean (sd)
 8           1.774307  (0.061361)  1.725667  (0.034644)  -2.74135
 16          1.445967  (0.044805)  1.463173  (0.094399)  1.18993
 32          2.136667  (0.105717)  2.193792  (0.129357)  2.67356
 
 Result for PLE machine:
 ==
 Machine : IBM xSeries with Intel(R) Xeon(R)  X7560 2.27GHz CPU with 32/64 
 core, with 8
online cores and 4*64GB RAM
 
 Kernbench:
             BASE                BASE+patch          %improvement
             mean (sd)           mean (sd)
 Scenario A:
 case 1x:    161.263 (56.518)    159.635 (40.5621)   1.00953
 case 2x:    190.748 (61.2745)   190.606 (54.4766)   0.0744438
 case 3x:    227.378 (100.215)   225.442 (92.0809)   0.851446
 
 Scenario B:
             446.104 (58.54)     433.12733 (54.476)  2.91
 
 Dbench:
 Throughput is in MB/sec
 NRCLIENTS   BASE            BASE+patch

Re: [Xen-devel] [PATCH] xen: remove CONFIG_XEN_DOM0 compile option

2012-01-12 Thread Jeremy Fitzhardinge
On 01/10/2012 05:44 AM, Stefano Stabellini wrote:
 On Mon, 9 Jan 2012, Andrew Jones wrote:
 I guess if we did the s/XEN_DOM0/LOCAL_APIC && IO_APIC && ACPI/ in
 arch/x86/pci/xen.c it would be pretty easy to review for equivalence.
 Then keep CONFIG_PRIVILIGED, but drop XEN_DOM0 from everywhere else and
 compile in the 3 files. I don't think it makes much sense to do it
 though. XEN_DOM0 keeps things tidier now and might be useful later.
 we can keep things clean with the following:

 #ifdef CONFIG_LOCAL_APIC && CONFIG_IO_APIC && CONFIG_ACPI && CONFIG_PCI_XEN

 #define XEN_DOM0

 #endif

 in include/xen/xen.h.

 So in the source files we can still '#ifdef XEN_DOM0', but at the same
 time we can get rid of the build symbol: everybody wins.

No, really, I think this is silly.  This kind of dependency information
should be encoded in the Kconfig system, and not reimplemented in an
ad-hoc way.

If there were a clean way to do what Andrew wants then I'd support it,
but this thread has descended into a spiral of madness, which makes me
think it's a lost cause.

If the root complaint is that customers think that anything set in
.config is a supported feature, then the solutions are to support all
the features in .config, re-educate the customers that they're wrong, or
maintain a local patch to do this stuff.

J


Re: [PATCH v2 0/6] Collected vdso/vsyscall fixes for 3.1

2011-08-11 Thread Jeremy Fitzhardinge
On 08/03/2011 06:59 AM, Andrew Lutomirski wrote:
 On Wed, Aug 3, 2011 at 9:56 AM, Konrad Rzeszutek Wilk
 konrad.w...@oracle.com wrote:
 On Wed, Aug 03, 2011 at 09:53:22AM -0400, Konrad Rzeszutek Wilk wrote:
 On Wed, Aug 03, 2011 at 09:31:48AM -0400, Andy Lutomirski wrote:
 This fixes various problems that cropped up with the vdso patches.

  - Patch 1 fixes an information leak to userspace.
  - Patches 2 and 3 fix the kernel build on gold.
  - Patches 4 and 5 fix Xen (I hope).
  - Patch 6 (optional) adds a trace event to vsyscall emulation.  It will
make it easier to handle performance regression reports :)
 Hm, you seemed to have the x86 maintainers on your email..
 I definitly need some coffee. What I meant was that you missing putting
 the x86 maintainers on this patchset. They are the ones that will handle 
 this.

 I put them on the To list for you.
 Are you sure about that coffee?  I'm pretty sure I had x...@kernel.org in 
 there.

What's the state of this series?  Has tip.git picked it up, or does it
need another go-around?

I just booted a Linus tree and got a backtrace that looks like this
issue, though I haven't looked into it in detail yet.

Thanks,
J


Re: [Xen-devel] Re: linux-next: Tree for July 25 (xen)

2011-08-10 Thread Jeremy Fitzhardinge
On 08/08/2011 06:01 AM, Steven Rostedt wrote:
 On Sat, 2011-08-06 at 11:22 -0700, Jeremy Fitzhardinge wrote:
 On 08/05/2011 06:58 PM, Konrad Rzeszutek Wilk wrote:
 On Fri, Aug 05, 2011 at 11:14:23PM +0200, Ingo Molnar wrote:
 * Ingo Molnar mi...@elte.hu wrote:

 * Randy Dunlap rdun...@xenotime.net wrote:

 On 08/04/11 15:40, Konrad Rzeszutek Wilk wrote:
 On Thu, Aug 04, 2011 at 06:32:59PM -0400, Konrad Rzeszutek Wilk wrote:
 These build failures are still triggering upstream:

  arch/x86/xen/trace.c:44:2: error: array index in initializer not 
 of integer type
  arch/x86/xen/trace.c:44:2: error: (near initialization for 
 ‘xen_hypercall_names’)
  arch/x86/xen/trace.c:45:1: error: ‘__HYPERVISOR_arch_4’ undeclared 
 here (not in a function)
  arch/x86/xen/trace.c:45:2: error: array index in initializer not 
 of integer type
  arch/x86/xen/trace.c:45:2: error: (near initialization for 
 ‘xen_hypercall_names’)
 Oh, that I haven't seen. Can you send me the .config for that please.
 You can't be trying very hard then.  I see lots of these (but no,
 Ah, I am getting it now. Thanks for reporting it.
 This should do the trick:
 Acked-by: Randy Dunlap rdun...@xenotime.net
 That patch did the trick here too:

 Acked-by: Ingo Molnar mi...@elte.hu
 Except that i'm still seeing the occasional build failure - see the 
 error log below. Config attached.
 Much appreciate for the report. I believe this fix (which I think hit
 linux-next yesterday?) should do that:

 commit 1e9ea2656b656edd3c8de98675bbc0340211b5bd
 Author: Jeremy Fitzhardinge jer...@goop.org
 Date:   Wed Aug 3 09:43:44 2011 -0700

  xen/tracing: it looks like we wanted CONFIG_FTRACE
 
 Apparently we wanted CONFIG_FTRACE rather the CONFIG_FUNCTION_TRACER.
 
 Reported-by: Sander Eikelenboom li...@eikelenboom.it
 Tested-by: Sander Eikelenboom li...@eikelenboom.it
 Signed-off-by: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
 Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com

 diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile
 index 45e94ac..3326204 100644
 --- a/arch/x86/xen/Makefile
 +++ b/arch/x86/xen/Makefile
 @@ -15,7 +15,7 @@ obj-y := enlighten.o setup.o multicalls.o 
 mmu.o irq.o \
 grant-table.o suspend.o platform-pci-unplug.o \
 p2m.o
  
 -obj-$(CONFIG_FUNCTION_TRACER) += trace.o
 +obj-$(CONFIG_FTRACE) += trace.o
  
 I'm not sure this is correct either.  Maybe it should be
 CONFIG_TRACEPOINTS?  Steven?
 Actually, I believe the correct answer is:

 CONFIG_EVENT_TRACING

OK, thanks.

J

Re: [Xen-devel] Re: linux-next: Tree for July 25 (xen)

2011-08-06 Thread Jeremy Fitzhardinge
On 08/05/2011 06:58 PM, Konrad Rzeszutek Wilk wrote:
 On Fri, Aug 05, 2011 at 11:14:23PM +0200, Ingo Molnar wrote:
 * Ingo Molnar mi...@elte.hu wrote:

 * Randy Dunlap rdun...@xenotime.net wrote:

 On 08/04/11 15:40, Konrad Rzeszutek Wilk wrote:
 On Thu, Aug 04, 2011 at 06:32:59PM -0400, Konrad Rzeszutek Wilk wrote:
 These build failures are still triggering upstream:

  arch/x86/xen/trace.c:44:2: error: array index in initializer not of 
 integer type
  arch/x86/xen/trace.c:44:2: error: (near initialization for 
 ‘xen_hypercall_names’)
  arch/x86/xen/trace.c:45:1: error: ‘__HYPERVISOR_arch_4’ undeclared 
 here (not in a function)
  arch/x86/xen/trace.c:45:2: error: array index in initializer not of 
 integer type
  arch/x86/xen/trace.c:45:2: error: (near initialization for 
 ‘xen_hypercall_names’)
 Oh, that I haven't seen. Can you send me the .config for that please.
 You can't be trying very hard then.  I see lots of these (but no,
 Ah, I am getting it now. Thanks for reporting it.
 This should do the trick:
 Acked-by: Randy Dunlap rdun...@xenotime.net
 That patch did the trick here too:

 Acked-by: Ingo Molnar mi...@elte.hu
 Except that i'm still seeing the occasional build failure - see the 
 error log below. Config attached.
 Much appreciate for the report. I believe this fix (which I think hit
 linux-next yesterday?) should do that:

 commit 1e9ea2656b656edd3c8de98675bbc0340211b5bd
 Author: Jeremy Fitzhardinge jer...@goop.org
 Date:   Wed Aug 3 09:43:44 2011 -0700

  xen/tracing: it looks like we wanted CONFIG_FTRACE
 
 Apparently we wanted CONFIG_FTRACE rather the CONFIG_FUNCTION_TRACER.
 
 Reported-by: Sander Eikelenboom li...@eikelenboom.it
 Tested-by: Sander Eikelenboom li...@eikelenboom.it
 Signed-off-by: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
 Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com

 diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile
 index 45e94ac..3326204 100644
 --- a/arch/x86/xen/Makefile
 +++ b/arch/x86/xen/Makefile
 @@ -15,7 +15,7 @@ obj-y   := enlighten.o setup.o multicalls.o 
 mmu.o irq.o \
   grant-table.o suspend.o platform-pci-unplug.o \
   p2m.o
  
 -obj-$(CONFIG_FUNCTION_TRACER) += trace.o
 +obj-$(CONFIG_FTRACE) += trace.o
  

I'm not sure this is correct either.  Maybe it should be
CONFIG_TRACEPOINTS?  Steven?

Thanks,
J

Re: [PATCH 5/5] x86-64: Add user_64bit_mode paravirt op

2011-07-27 Thread Jeremy Fitzhardinge
On 07/26/2011 08:20 PM, Andy Lutomirski wrote:
 Three places in the kernel assume that the only long mode CPL 3
 selector is __USER_CS.  This is not true on Xen -- Xen's sysretq
 changes cs to the magic value 0xe033.

 Two of the places are corner cases, but as of x86-64: Improve
 vsyscall emulation CS and RIP handling
 (c9712944b2a12373cb6ff8059afcfb7e826a6c54), vsyscalls will segfault
 if called with Xen's extra CS selector.  This causes a panic when
 older init builds die.

 It seems impossible to make Xen use __USER_CS reliably without
 taking a performance hit on every system call, so this fixes the
 tests instead with a new paravirt op.  It's a little ugly because
 ptrace.h can't include paravirt.h.

 Signed-off-by: Andy Lutomirski l...@mit.edu
 Reported-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
 ---
  arch/x86/include/asm/desc.h   |4 ++--
  arch/x86/include/asm/paravirt_types.h |6 ++
  arch/x86/include/asm/ptrace.h |   19 +++
  arch/x86/kernel/paravirt.c|4 
  arch/x86/kernel/step.c|2 +-
  arch/x86/kernel/vsyscall_64.c |6 +-
  arch/x86/mm/fault.c   |2 +-
  arch/x86/xen/enlighten.c  |1 +
  8 files changed, 35 insertions(+), 9 deletions(-)

 diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
 index 7b439d9..41935fa 100644
 --- a/arch/x86/include/asm/desc.h
 +++ b/arch/x86/include/asm/desc.h
 @@ -27,8 +27,8 @@ static inline void fill_ldt(struct desc_struct *desc, const 
 struct user_desc *in
  
   desc->base2 = (info->base_addr & 0xff000000) >> 24;
   /*
 -  * Don't allow setting of the lm bit. It is useless anyway
 -  * because 64bit system calls require __USER_CS:
 +  * Don't allow setting of the lm bit. It would confuse
 +  * user_64bit_mode and would get overridden by sysret anyway.
*/
   desc->l = 0;
  }
 diff --git a/arch/x86/include/asm/paravirt_types.h 
 b/arch/x86/include/asm/paravirt_types.h
 index 2c76521..8e8b9a4 100644
 --- a/arch/x86/include/asm/paravirt_types.h
 +++ b/arch/x86/include/asm/paravirt_types.h
 @@ -41,6 +41,7 @@
  
  #include <asm/desc_defs.h>
  #include <asm/kmap_types.h>
 +#include <asm/pgtable_types.h>
  
  struct page;
  struct thread_struct;
 @@ -63,6 +64,11 @@ struct paravirt_callee_save {
  struct pv_info {
   unsigned int kernel_rpl;
   int shared_kernel_pmd;
 +
 +#ifdef CONFIG_X86_64
 + u16 extra_user_64bit_cs;  /* __USER_CS if none */
 +#endif
 +
   int paravirt_enabled;
   const char *name;
  };
 diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
 index 94e7618..3566454 100644
 --- a/arch/x86/include/asm/ptrace.h
 +++ b/arch/x86/include/asm/ptrace.h
 @@ -131,6 +131,9 @@ struct pt_regs {
  #ifdef __KERNEL__
  
  #include <linux/init.h>
 +#ifdef CONFIG_PARAVIRT
 +#include <asm/paravirt_types.h>
 +#endif
  
  struct cpuinfo_x86;
  struct task_struct;
 @@ -187,6 +190,22 @@ static inline int v8086_mode(struct pt_regs *regs)
  #endif
  }
  
 +#ifdef CONFIG_X86_64
 +static inline bool user_64bit_mode(struct pt_regs *regs)
 +{
 +#ifndef CONFIG_PARAVIRT
 + /*
 +  * On non-paravirt systems, this is the only long mode CPL 3
 +  * selector.  We do not allow long mode selectors in the LDT.
 +  */
 + return regs->cs == __USER_CS;
 +#else
 + /* Headers are too twisted for this to go in paravirt.h. */
 + return regs->cs == __USER_CS || regs->cs == pv_info.extra_user_64bit_cs;

Is this necessary because usermode may sometimes be on __USER_CS or
sometimes on Xen's?  Could we just commit to one or the other and make
it a simple comparison?

What if __USER_CS were a variable?
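(For illustration, the variable version could be as small as this;
user_64bit_cs is a made-up name, not something in the patch:

    /* set once at boot; Xen would overwrite it with its own selector */
    u16 user_64bit_cs = __USER_CS;

    static inline bool user_64bit_mode(struct pt_regs *regs)
    {
        return regs->cs == user_64bit_cs;
    }
)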

J
 +#endif
 +}
 +#endif
 +
  /*
   * X86_32 CPUs don't save ss and esp if the CPU is already in kernel mode
   * when it traps.  The previous stack will be directly underneath the saved
 diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
 index 613a793..d90272e 100644
 --- a/arch/x86/kernel/paravirt.c
 +++ b/arch/x86/kernel/paravirt.c
 @@ -307,6 +307,10 @@ struct pv_info pv_info = {
   .paravirt_enabled = 0,
   .kernel_rpl = 0,
   .shared_kernel_pmd = 1, /* Only used when CONFIG_X86_PAE is set */
 +
 +#ifdef CONFIG_X86_64
 + .extra_user_64bit_cs = __USER_CS,
 +#endif
  };
  
  struct pv_init_ops pv_init_ops = {
 diff --git a/arch/x86/kernel/step.c b/arch/x86/kernel/step.c
 index 7977f0c..c346d11 100644
 --- a/arch/x86/kernel/step.c
 +++ b/arch/x86/kernel/step.c
 @@ -74,7 +74,7 @@ static int is_setting_trap_flag(struct task_struct *child, 
 struct pt_regs *regs)
  
  #ifdef CONFIG_X86_64
   case 0x40 ... 0x4f:
 - if (regs->cs != __USER_CS)
 + if (!user_64bit_mode(regs))
   /* 32-bit mode: register increment */
   return 0;
   /* 64-bit mode: REX prefix */
 diff --git 

Re: [PATCH 09/11] xen/xenbus: use printk_ratelimited() instead of printk_ratelimit()

2011-06-16 Thread Jeremy Fitzhardinge
On 06/16/2011 05:14 AM, Manuel Zerpies wrote:
 Since printk_ratelimit() shouldn't be used anymore (see comment in
 include/linux/printk.h), replace it with printk_ratelimited()

Looks OK to me, but please fix the indentation of the rest of the
statement to match.

Thanks,
J

 Signed-off-by: Manuel Zerpies manuel.f.zerp...@ww.stud.uni-erlangen.de
 ---
  drivers/xen/xenbus/xenbus_xs.c |4 ++--
  1 files changed, 2 insertions(+), 2 deletions(-)

 diff --git a/drivers/xen/xenbus/xenbus_xs.c b/drivers/xen/xenbus/xenbus_xs.c
 index 5534690..4055858 100644
 --- a/drivers/xen/xenbus/xenbus_xs.c
 +++ b/drivers/xen/xenbus/xenbus_xs.c
 @@ -44,6 +44,7 @@
  #include linux/rwsem.h
  #include linux/module.h
  #include linux/mutex.h
 +#include linux/ratelimit.h
  #include xen/xenbus.h
  #include xenbus_comms.h
  
 @@ -270,8 +271,7 @@ static void *xs_talkv(struct xenbus_transaction t,
   }
  
   if (msg.type != type) {
 - if (printk_ratelimit())
 - printk(KERN_WARNING
  + printk_ratelimited(KERN_WARNING
   "XENBUS unexpected type [%d], expected [%d]\n",
   msg.type, type);
   kfree(ret);



Re: [patch] xen: off by one errors in multicalls.c

2011-06-03 Thread Jeremy Fitzhardinge
On 06/02/2011 09:45 PM, Dan Carpenter wrote:
 b->args[] has MC_ARGS elements, so the comparison here should be
 ">=" instead of ">".  Otherwise we read past the end of the array
 one space.

Yeah, looks like a correct fix.  Fortunately I don't think anything
currently hits that path in practice, though there are some pending
patches which will exercise it more.

Thanks,
J

 Signed-off-by: Dan Carpenter erro...@gmail.com
 ---
 This is a static checker patch and I haven't tested it.  Please
 review carefully.

 diff --git a/arch/x86/xen/multicalls.c b/arch/x86/xen/multicalls.c
 index 8bff7e7..1b2b73f 100644
 --- a/arch/x86/xen/multicalls.c
 +++ b/arch/x86/xen/multicalls.c
 @@ -189,10 +189,10 @@ struct multicall_space __xen_mc_entry(size_t args)
   unsigned argidx = roundup(b->argidx, sizeof(u64));
  
   BUG_ON(preemptible());
 - BUG_ON(b->argidx > MC_ARGS);
 + BUG_ON(b->argidx >= MC_ARGS);
  
   if (b->mcidx == MC_BATCH ||
 - (argidx + args) > MC_ARGS) {
 + (argidx + args) >= MC_ARGS) {
   mc_stats_flush(b->mcidx == MC_BATCH ? FL_SLOTS : FL_ARGS);
   xen_mc_flush();
   argidx = roundup(b->argidx, sizeof(u64));
 @@ -206,7 +206,7 @@ struct multicall_space __xen_mc_entry(size_t args)
   ret.args = &b->args[argidx];
   b->argidx = argidx + args;
  
 - BUG_ON(b->argidx > MC_ARGS);
 + BUG_ON(b->argidx >= MC_ARGS);
   return ret;
  }
  
 @@ -216,7 +216,7 @@ struct multicall_space xen_mc_extend_args(unsigned long op, size_t size)
   struct multicall_space ret = { NULL, NULL };
  
   BUG_ON(preemptible());
 - BUG_ON(b->argidx > MC_ARGS);
 + BUG_ON(b->argidx >= MC_ARGS);
  
   if (b->mcidx == 0)
   return ret;
 @@ -224,14 +224,14 @@ struct multicall_space xen_mc_extend_args(unsigned long op, size_t size)
   if (b->entries[b->mcidx - 1].op != op)
   return ret;
  
 - if ((b->argidx + size) > MC_ARGS)
 + if ((b->argidx + size) >= MC_ARGS)
   return ret;
  
   ret.mc = &b->entries[b->mcidx - 1];
   ret.args = &b->args[b->argidx];
   b->argidx += size;
  
 - BUG_ON(b->argidx > MC_ARGS);
 + BUG_ON(b->argidx >= MC_ARGS);
   return ret;
  }
  


Re: [Xen-devel] Re: [PATCH] xen: drop anti-dependency on X86_VISWS

2011-04-14 Thread Jeremy Fitzhardinge
On 04/09/2011 03:34 AM, Ian Campbell wrote:
 Actually it does - see the #ifndef CONFIG_X86_CMPXCHG section
 in asm/cmpxchg_32.h.
 Hm, OK.  Still, I'm happiest with that dependency in case someone
 knobbles the cpu to exclude cmpxchg and breaks things.
 Dropping the TSC patch is sensible though?

You mean dropping the TSC dependency?  Yes, I think so.

J


Re: [PATCH] xen: drop anti-dependency on X86_VISWS

2011-04-08 Thread Jeremy Fitzhardinge
On 04/07/2011 11:38 PM, Ian Campbell wrote:
 Not really.  The TSC register is a requirement, but that's going to be
 present on any CPU which can boot Xen.  We don't need any of the
 kernel's TSC machinery though.
 So why the Kconfig dependency then? In principal a kernel compiled for a
 non-TSC processor (which meets the other requirements for Xen, such as
 PAE support) will run just fine under Xen on a newer piece of hardware.

Not sure where it came from.  It was probably never needed, or just
added for some secondary effect we wanted.

 Is there any downside to this patch (is X86_CMPXCHG in the same sort of
 boat?)

Only if we don't use cmpxchg in shared memory with other domains or the
hypervisor.  (I don't think it will dynamically switch between real and
emulated cmpxchg depending on availability.)

J



Re: [Xen-devel] Re: [PATCH] xen: drop anti-dependency on X86_VISWS

2011-04-08 Thread Jeremy Fitzhardinge
On 04/08/2011 08:42 AM, Jan Beulich wrote:
 On 08.04.11 at 17:25, Jeremy Fitzhardinge jer...@goop.org wrote:
 On 04/07/2011 11:38 PM, Ian Campbell wrote:
 Is there any downside to this patch (is X86_CMPXCHG in the same sort of
 boat?)
 Only if we don't use cmpxchg in shared memory with other domains or the
 hypervisor.  (I don't think it will dynamically switch between real and
 emulated cmpxchg depending on availability.)
 Actually it does - see the #ifndef CONFIG_X86_CMPXCHG section
 in asm/cmpxchg_32.h.

Hm, OK.  Still, I'm happiest with that dependency in case someone
knobbles the cpu to exclude cmpxchg and breaks things.

J


Re: [PATCH] xen: drop anti-dependency on X86_VISWS

2011-04-07 Thread Jeremy Fitzhardinge
On 04/06/2011 11:58 PM, Ian Campbell wrote:
 On Wed, 2011-04-06 at 22:45 +0100, David Miller wrote:
 From: Ian Campbell ian.campb...@eu.citrix.com
 Date: Mon, 4 Apr 2011 10:55:55 +0100

 You mean the !X86_VISWS I presume? It doesn't make sense to me either.
 No, I think 32-bit x86 allmodconfig elides XEN because of its X86_TSC 
 dependency.
 TSC is a real dependency of the Xen interfaces.

Not really.  The TSC register is a requirement, but that's going to be
present on any CPU which can boot Xen.  We don't need any of the
kernel's TSC machinery though.

J


Re: [PATCH 2/3] kvm hypervisor : Add hypercalls to support pv-ticketlock

2011-01-24 Thread Jeremy Fitzhardinge
On 01/22/2011 06:53 AM, Rik van Riel wrote:
 The main question that remains is whether the PV ticketlocks are
 a large enough improvement to also merge those.  I expect they
 will be, and we'll see so in the benchmark numbers.

The pathological worst-case of ticket locks in a virtual environment
isn't very hard to hit, so I expect they'll make a huge difference
there.   On lightly loaded systems (little or no CPU overcommit) then
they should be close to a no-op.

J


Re: [PATCH 13/14] x86/ticketlock: add slowpath logic

2011-01-24 Thread Jeremy Fitzhardinge
On 01/19/2011 08:23 AM, Srivatsa Vaddagiri wrote:
 On Mon, Jan 17, 2011 at 08:52:22PM +0530, Srivatsa Vaddagiri wrote:
 I think this is still racy ..

  Unlocker                      Locker

  test slowpath
      -> false
                                set slowpath flag
                                test for lock pickup
                                    -> fail
                                block

  unlock

 unlock needs to happen first before testing slowpath? I have made that change
 for my KVM guest and it seems to be working well with that change .. Will
 cleanup and post my patches shortly
 Patch below fixes the race described above. You can fold this to your patch
 13/14 if you agree this is in right direction.


 Signed-off-by: Srivatsa Vaddagiri va...@linux.vnet.ibm.com

 ---
  arch/x86/include/asm/spinlock.h  |7 +++
  arch/x86/kernel/paravirt-spinlocks.c |   22 +-
  2 files changed, 8 insertions(+), 21 deletions(-)

 Index: linux-2.6.37/arch/x86/include/asm/spinlock.h
 ===
 --- linux-2.6.37.orig/arch/x86/include/asm/spinlock.h
 +++ linux-2.6.37/arch/x86/include/asm/spinlock.h
 @@ -55,7 +55,7 @@ static __always_inline void __ticket_unl
  
  /* Only defined when CONFIG_PARAVIRT_SPINLOCKS defined, but may as
   * well leave the prototype always visible.  */
 -extern void __ticket_unlock_release_slowpath(struct arch_spinlock *lock);
 +extern void __ticket_unlock_slowpath(struct arch_spinlock *lock);
  
  #ifdef CONFIG_PARAVIRT_SPINLOCKS
  
 @@ -166,10 +166,9 @@ static __always_inline int arch_spin_try
  static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
  {
   barrier();  /* prevent reordering out of locked region */
 + __ticket_unlock_release(lock);
   if (unlikely(__ticket_in_slowpath(lock)))
 - __ticket_unlock_release_slowpath(lock);
 - else
 - __ticket_unlock_release(lock);
 + __ticket_unlock_slowpath(lock);
   barrier();  /* prevent reordering into locked region */
  }
  
 Index: linux-2.6.37/arch/x86/kernel/paravirt-spinlocks.c
 ===
 --- linux-2.6.37.orig/arch/x86/kernel/paravirt-spinlocks.c
 +++ linux-2.6.37/arch/x86/kernel/paravirt-spinlocks.c
 @@ -22,33 +22,21 @@ EXPORT_SYMBOL(pv_lock_ops);
   * bits.  However, we need to be careful about this because someone
   * may just be entering as we leave, and enter the slowpath.
   */
 -void __ticket_unlock_release_slowpath(struct arch_spinlock *lock)
 +void __ticket_unlock_slowpath(struct arch_spinlock *lock)
  {
   struct arch_spinlock old, new;
  
   BUILD_BUG_ON(((__ticket_t)NR_CPUS) != NR_CPUS);
  
   old = ACCESS_ONCE(*lock);
 -
   new = old;
 - new.tickets.head += TICKET_LOCK_INC;
  
   /* Clear the slowpath flag */
   new.tickets.tail = ~TICKET_SLOWPATH_FLAG;
 + if (new.tickets.head == new.tickets.tail)
 + cmpxchg(lock-head_tail, old.head_tail, new.head_tail);
  
 - /*
 -  * If there's currently people waiting or someone snuck in
 -  * since we read the lock above, then do a normal unlock and
 -  * kick.  If we managed to unlock with no queued waiters, then
 -  * we can clear the slowpath flag.
 -  */
 - if (new.tickets.head != new.tickets.tail ||
 - cmpxchg(lock-head_tail,
 - old.head_tail, new.head_tail) != old.head_tail) {
 - /* still people waiting */
 - __ticket_unlock_release(lock);
 - }
 -
 + /* Wake up an appropriate waiter */
   __ticket_unlock_kick(lock, new.tickets.head);

Does the __ticket_unlock_kick need to be unconditional?  If the
cmpxchg() fails, then it means someone else has come in and changed the
lock under our feet, and presumably they'll do the kick as needed?

Or is there the possibility that the kick will get lost?

There's certainly no harm in doing a redundant kick, and I'd expect that
case to be pretty rare...

J


Re: [PATCH 2/3] kvm hypervisor : Add hypercalls to support pv-ticketlock

2011-01-20 Thread Jeremy Fitzhardinge
On 01/20/2011 03:42 AM, Srivatsa Vaddagiri wrote:
 On Wed, Jan 19, 2011 at 10:53:52AM -0800, Jeremy Fitzhardinge wrote:
 The reason for wanting this should be clear I guess, it allows PI.
 Well, if we can expand the spinlock to include an owner, then all this
 becomes moot...
 How so? Having an owner will not eliminate the need for pv-ticketlocks
 afaict. We still need a mechanism to reduce latency in scheduling the next
 thread-in-waiting, which is achieved by your patches. 

No, sorry, I should have been clearer.  I meant that going to the effort
of not increasing the lock size to record the "in slowpath" state would
become moot.

J


Re: [PATCH 2/3] kvm hypervisor : Add hypercalls to support pv-ticketlock

2011-01-20 Thread Jeremy Fitzhardinge
On 01/20/2011 03:59 AM, Srivatsa Vaddagiri wrote:
 At least in the Xen code, a current owner isn't very useful, because we
 need the current owner to kick the *next* owner to life at release time,
 which we can't do without some structure recording which ticket belongs
 to which cpu.
 If we had a yield-to [1] sort of interface _and_ information on which vcpu
 owns a lock, then lock-spinners can yield-to the owning vcpu, while the
 unlocking vcpu can yield-to the next-vcpu-in-waiting.

Perhaps, but the core problem is how to find next-vcpu-in-waiting
efficiently.  Once you have that info, there's a number of things you
can usefully do with it.

  The key here is not to
 sleep when waiting for locks (as implemented by current patch-series, which 
 can 
 put other VMs at an advantage by giving them more time than they are entitled 
 to)

Why?  If a VCPU can't make progress because its waiting for some
resource, then why not schedule something else instead?  Presumably when
the VCPU does become runnable, the scheduler will credit its previous
blocked state and let it run in preference to something else.

  and also to ensure that lock-owner as well as the next-in-line lock-owner 
 are not unduly made to wait for cpu. 

 Is there a way we can dynamically expand the size of lock only upon contention
 to include additional information like owning vcpu? Have the lock point to a
 per-cpu area upon contention where additional details can be stored perhaps?

As soon as you add a pointer to the lock, you're increasing its size. 
If we had a pointer in there already, then all of this would be moot.

If auxiliary per-lock data is uncommon, then using a hash keyed on lock
address would be one way to do it.
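
For concreteness, a hash keyed on the lock address could look something
like the sketch below.  This is purely illustrative (the names and sizing
here are made up), and it glosses over the collision and lockless-update
problems that make this hard in practice:

#include <linux/hash.h>

#define LOCK_AUX_HASH_BITS	8
#define LOCK_AUX_HASH_SIZE	(1 << LOCK_AUX_HASH_BITS)

/* hypothetical side table: per-lock data kept outside the lock itself,
 * so arch_spinlock_t doesn't have to grow */
struct lock_aux {
	struct arch_spinlock *lock;	/* which lock this slot describes */
	int owner_cpu;			/* e.g. owner, for yield-to/PI ideas */
};

static struct lock_aux lock_aux_table[LOCK_AUX_HASH_SIZE];

static struct lock_aux *lock_aux(struct arch_spinlock *lock)
{
	/* collisions, and keeping ->lock/->owner_cpu coherent without
	 * extra locking, are the hard part; not handled in this sketch */
	return &lock_aux_table[hash_ptr(lock, LOCK_AUX_HASH_BITS)];
}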

J


Re: [PATCH 2/3] kvm hypervisor : Add hypercalls to support pv-ticketlock

2011-01-19 Thread Jeremy Fitzhardinge
On 01/19/2011 09:21 AM, Peter Zijlstra wrote:
 On Wed, 2011-01-19 at 22:42 +0530, Srivatsa Vaddagiri wrote:
 Add two hypercalls to KVM hypervisor to support pv-ticketlocks.

 KVM_HC_WAIT_FOR_KICK blocks the calling vcpu until another vcpu kicks it or 
 it
 is woken up because of an event like interrupt.

 KVM_HC_KICK_CPU allows the calling vcpu to kick another vcpu.

 The presence of these hypercalls is indicated to guest via
 KVM_FEATURE_WAIT_FOR_KICK/KVM_CAP_WAIT_FOR_KICK. Qemu needs a corresponding
 patch to pass up the presence of this feature to guest via cpuid. Patch to 
 qemu
 will be sent separately.

 I didn't really read the patch, and I totally forgot everything from
 when I looked at the Xen series, but does the Xen/KVM hypercall
 interface for this include the vcpu to await the kick from?

 My guess is not, since the ticket locks used don't know who the owner
 is, which is of course, sad. There are FIFO spinlock implementations
 that can do this though.. although I think they all have a bigger memory
 footprint.

At least in the Xen code, a current owner isn't very useful, because we
need the current owner to kick the *next* owner to life at release time,
which we can't do without some structure recording which ticket belongs
to which cpu.

(A reminder: the big problem with ticket locks is not with the current
owner getting preempted, but making sure the next VCPU gets scheduled
quickly when the current one releases; without that all the waiting
VCPUs burn the timeslices until the VCPU scheduler gets around to
scheduling the actual next in line.)

At present, the code needs to scan an array of percpu "I am waiting on
lock X with ticket Y" structures to work out who's next.  The search is
somewhat optimised by keeping a cpuset of which CPUs are actually
blocked on spinlocks, but it's still going to scale badly with lots of CPUs.
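
For reference, the scan described above looks roughly like the sketch
below.  The structure and variable names follow the Xen patch in this
series (lock_waiting, waiting_cpus); the function body is illustrative
rather than a copy of the real xen_unlock_kick():

struct xen_lock_waiting {
	struct arch_spinlock *lock;
	__ticket_t want;
};

static DEFINE_PER_CPU(struct xen_lock_waiting, lock_waiting);
static cpumask_t waiting_cpus;

static void xen_unlock_kick(struct arch_spinlock *lock, __ticket_t next)
{
	int cpu;

	for_each_cpu(cpu, &waiting_cpus) {
		const struct xen_lock_waiting *w = &per_cpu(lock_waiting, cpu);

		/* kick the vcpu that is blocked waiting for ticket 'next'
		 * on this particular lock */
		if (w->lock == lock && w->want == next) {
			xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
			break;
		}
	}
}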

I haven't thought of a good way to improve on this; an obvious approach
is to just add a pointer to the spinlock and hang an explicit linked
list off it, but that's incompatible with wanting to avoid expanding the
lock.

You could have a table of auxiliary per-lock data hashed on the lock
address, but its not clear to me that its an improvement on the array
approach, especially given the synchronization issues of keeping that
structure up to date (do we have a generic lockless hashtable
implementation?).  But perhaps its one of those things that makes sense
at larger scales.

 The reason for wanting this should be clear I guess, it allows PI.

Well, if we can expand the spinlock to include an owner, then all this
becomes moot...
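
Purely as an illustration of what expanding the lock would mean - this is
not from any posted patch - the structure would grow roughly like this:

typedef struct arch_spinlock {
	union {
		unsigned int slock;
		struct __raw_tickets {
			__ticket_t head, tail;
		} tickets;
	};
	int owner_cpu;	/* hypothetical: -1 when unlocked; what PI/yield-to needs */
} arch_spinlock_t;

i.e. roughly doubling the size, which is exactly what this series is
trying to avoid.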

J




Re: [PATCH 13/14] x86/ticketlock: add slowpath logic

2011-01-19 Thread Jeremy Fitzhardinge
On 01/19/2011 10:39 AM, Srivatsa Vaddagiri wrote:
 I have tested quite extensively with booting a 16-vcpu guest (on a 16-pcpu 
 host)
 and running kernel compine (with 32-threads). Without this patch, I had
 difficulty booting/shutting-down successfully (it would hang mid-way).

Sounds good.  But I like to test with make -j 100-200 to really give
things a workout.

J


Re: [PATCH] x86/pvclock-xen: zero last_value on resume

2010-11-26 Thread Jeremy Fitzhardinge
On 11/26/2010 02:40 AM, Ingo Molnar wrote:
 * Jeremy Fitzhardinge jer...@goop.org wrote:

 On 10/26/2010 09:59 AM, Jeremy Fitzhardinge wrote:
 If the guest domain has been suspend/resumed or migrated, then the
 system clock backing the pvclock clocksource may revert to a smaller
 value (ie, can be non-monotonic across the migration/save-restore).
 Make sure we zero last_value in that case so that the domain
 continues to see clock updates.
 Ping?  Looks like this fell through the gaps.
 Does not apply cleanly here - mind resending the latest version?

It rebased cleanly to 2.6.37-rc3.  You can pull it from:

  git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git pvclock-resume

Jeremy Fitzhardinge (1):
  x86/pvclock: zero last_value on resume

 arch/x86/include/asm/pvclock.h |1 +
 arch/x86/kernel/pvclock.c  |5 +
 arch/x86/xen/time.c|2 ++
 3 files changed, 8 insertions(+), 0 deletions(-)

Thanks,
J
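
For background, the reason a stale last_value matters: the read side clamps
its result against last_value to keep the clocksource monotonic, roughly as
in the sketch below (illustrative only, not part of the patch), so a
too-large pre-restore value would pin the clock until the new system time
catches up with it:

	/* tail of pvclock_clocksource_read(), once 'ret' has been computed */
	last = atomic64_read(&last_value);
	for (;;) {
		u64 prev;

		/* never go backwards: if we're behind the global maximum,
		 * return that instead */
		if (ret < last)
			return last;
		prev = atomic64_cmpxchg(&last_value, last, ret);
		if (prev == last)
			break;		/* we advanced last_value to ret */
		last = prev;		/* raced with another vcpu; retry */
	}
	return ret;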

Subject: [PATCH] x86/pvclock: zero last_value on resume

If the guest domain has been suspend/resumed or migrated, then the
system clock backing the pvclock clocksource may revert to a smaller
value (ie, can be non-monotonic across the migration/save-restore).
Make sure we zero last_value in that case so that the domain
continues to see clock updates.

Signed-off-by: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com

diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h
index 7f7e577..31d84ac 100644
--- a/arch/x86/include/asm/pvclock.h
+++ b/arch/x86/include/asm/pvclock.h
@@ -11,6 +11,7 @@ unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info 
*src);
 void pvclock_read_wallclock(struct pvclock_wall_clock *wall,
struct pvclock_vcpu_time_info *vcpu,
struct timespec *ts);
+void pvclock_resume(void);
 
 /*
  * Scale a 64-bit delta by scaling and multiplying by a 32-bit fraction,
diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index 008b91e..42eb330 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -83,6 +83,11 @@ unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info 
*src)
 
 static atomic64_t last_value = ATOMIC64_INIT(0);
 
+void pvclock_resume(void)
+{
+   atomic64_set(last_value, 0);
+}
+
 cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
 {
struct pvclock_shadow_time shadow;
diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
index b2bb5aa..5da5e53 100644
--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ -426,6 +426,8 @@ void xen_timer_resume(void)
 {
int cpu;
 
+   pvclock_resume();
+
if (xen_clockevent != xen_vcpuop_clockevent)
return;
 




Re: [PATCH] x86/pvclock-xen: zero last_value on resume

2010-11-22 Thread Jeremy Fitzhardinge
On 10/26/2010 09:59 AM, Jeremy Fitzhardinge wrote:

 If the guest domain has been suspend/resumed or migrated, then the
 system clock backing the pvclock clocksource may revert to a smaller
 value (ie, can be non-monotonic across the migration/save-restore).
 Make sure we zero last_value in that case so that the domain
 continues to see clock updates.

Ping?  Looks like this fell through the gaps.

J

 [ I don't know if kvm needs an analogous fix or not. ]

 Signed-off-by: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
 Cc: Stable Kernel sta...@kernel.org
 Reported-by: Eelco Dolstra e.dols...@tudelft.nl
 Reported-by: Olivier Hanesse olivier.hane...@gmail.com
 Bisected-by: Cédric Schieli cschi...@gmail.com
 Tested-by: Cédric Schieli cschi...@gmail.com

 diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h
 index cd02f32..6226870 100644
 --- a/arch/x86/include/asm/pvclock.h
 +++ b/arch/x86/include/asm/pvclock.h
 @@ -11,5 +11,6 @@ unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info 
 *src);
  void pvclock_read_wallclock(struct pvclock_wall_clock *wall,
   struct pvclock_vcpu_time_info *vcpu,
   struct timespec *ts);
 +void pvclock_resume(void);
  
  #endif /* _ASM_X86_PVCLOCK_H */
 diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
 index 239427c..a4f07c1 100644
 --- a/arch/x86/kernel/pvclock.c
 +++ b/arch/x86/kernel/pvclock.c
 @@ -120,6 +120,11 @@ unsigned long pvclock_tsc_khz(struct 
 pvclock_vcpu_time_info *src)
  
  static atomic64_t last_value = ATOMIC64_INIT(0);
  
 +void pvclock_resume(void)
 +{
 + atomic64_set(last_value, 0);
 +}
 +
  cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
  {
   struct pvclock_shadow_time shadow;
 diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
 index b2bb5aa..5da5e53 100644
 --- a/arch/x86/xen/time.c
 +++ b/arch/x86/xen/time.c
 @@ -426,6 +426,8 @@ void xen_timer_resume(void)
  {
   int cpu;
  
 + pvclock_resume();
 +
   if (xen_clockevent != xen_vcpuop_clockevent)
   return;
  






Re: [PATCH 09/14] xen/pvticketlock: Xen implementation for PV ticket locks

2010-11-17 Thread Jeremy Fitzhardinge
On 11/17/2010 12:11 AM, Jan Beulich wrote:
 On 16.11.10 at 22:08, Jeremy Fitzhardinge jer...@goop.org wrote:
 +static void xen_lock_spinning(struct arch_spinlock *lock, unsigned want)
  {
 -struct xen_spinlock *xl = (struct xen_spinlock *)lock;
 -struct xen_spinlock *prev;
  int irq = __get_cpu_var(lock_kicker_irq);
 -int ret;
 +struct xen_lock_waiting *w = __get_cpu_var(lock_waiting);
 +int cpu = smp_processor_id();
  u64 start;
  
  /* If kicker interrupts not initialized yet, just spin */
  if (irq == -1)
 -return 0;
 +return;
  
  start = spin_time_start();
  
 -/* announce we're spinning */
 -prev = spinning_lock(xl);
 +w-want = want;
 +w-lock = lock;
 +
 +/* This uses set_bit, which atomic and therefore a barrier */
 +cpumask_set_cpu(cpu, waiting_cpus);
 Since you don't allow nesting, don't you need to disable
 interrupts before you touch per-CPU state?

Yes, I think you're right - interrupts need to be disabled for the bulk
of this function.

J


Re: [PATCH 13/14] x86/ticketlock: add slowpath logic

2010-11-17 Thread Jeremy Fitzhardinge
On 11/17/2010 12:31 AM, Jan Beulich wrote:
 On 16.11.10 at 22:08, Jeremy Fitzhardinge jer...@goop.org wrote:
 +static inline void __ticket_enter_slowpath(struct arch_spinlock *lock)
 +{
 +if (sizeof(lock-tickets.tail) == sizeof(u8))
 +asm (LOCK_PREFIX orb %1, %0
 + : +m (lock-tickets.tail)
 + : i (TICKET_SLOWPATH_FLAG) : memory);
 +else
 +asm (LOCK_PREFIX orw %1, %0
 + : +m (lock-tickets.tail)
 + : i (TICKET_SLOWPATH_FLAG) : memory);
 +}
 Came only now to mind: Here and elsewhere, did you try using
 %z0 to have gcc produce the opcode suffix character, rather
 than having these somewhat ugly if()-s?

Actually in this case I'm pretty sure there's already a set bit
function which will do the job.  set_bit(), I guess, though it takes a
bit number rather than a mask...

But, yes, %z0 sounds interesting.  Is it documented anywhere?  I think
I've tried to use it in the past and run into gcc bugs.
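
For reference, the %z0 suggestion would look something like the sketch
below, with gcc picking the b/w size suffix from the operand's mode rather
than an explicit sizeof() test.  This is illustrative only - and, as the
follow-up further down shows, gcc ends up emitting an invalid "ors" suffix
for the 16-bit ticket case, so it isn't usable here as-is:

static inline void __ticket_enter_slowpath(struct arch_spinlock *lock)
{
	asm volatile(LOCK_PREFIX "or%z0 %1, %0"
		     : "+m" (lock->tickets.tail)
		     : "i" (TICKET_SLOWPATH_FLAG)
		     : "memory");
}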

Thanks,
J


Re: [PATCH 13/14] x86/ticketlock: add slowpath logic

2010-11-17 Thread Jeremy Fitzhardinge
On 11/17/2010 12:52 AM, Jeremy Fitzhardinge wrote:
 But, yes, %z0 sounds interesting.  Is it documented anywhere?  I think
 I've tried to use it in the past and run into gcc bugs.
This one: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39590

Should be OK in this case because there's no 64-bit values to be seen...

J


Re: [PATCH 13/14] x86/ticketlock: add slowpath logic

2010-11-17 Thread Jeremy Fitzhardinge
On 11/17/2010 12:58 AM, Avi Kivity wrote:
 Actually in this case I'm pretty sure there's already a set bit
 function which will do the job.  set_bit(), I guess, though it takes a
 bit number rather than a mask...



 set_bit() operates on a long, while the intel manuals recommend
 against operating on operands of different size, especially with
 locked operations.  I think newer processors have more relaxed
 requirements, though.

Despite its prototype, set_bit() is pretty specifically using "orb" for
the constant case, or "bts" otherwise (I don't know what size memory
operation bts is considered to generate).

J



Re: [PATCH 13/14] x86/ticketlock: add slowpath logic

2010-11-17 Thread Jeremy Fitzhardinge
On 11/17/2010 12:56 AM, Jeremy Fitzhardinge wrote:
 On 11/17/2010 12:52 AM, Jeremy Fitzhardinge wrote:
 But, yes, %z0 sounds interesting.  Is it documented anywhere?  I think
 I've tried to use it in the past and run into gcc bugs.
 This one: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39590

 Should be OK in this case because there's no 64-bit values to be seen...
Hm, it fails when __ticket_t is 16 bits:

/home/jeremy/git/linux/arch/x86/include/asm/spinlock.h: Assembler messages:
/home/jeremy/git/linux/arch/x86/include/asm/spinlock.h:73: Error: suffix
or operands invalid for `or'

lock; ors $1, 2(%rbx)   #,


So I don't think that's going to work out...

J


Re: [Xen-devel] Re: [PATCH 09/14] xen/pvticketlock: Xen implementation for PV ticket locks

2010-11-17 Thread Jeremy Fitzhardinge
On 11/17/2010 12:52 AM, Jeremy Fitzhardinge wrote:
 On 11/17/2010 12:11 AM, Jan Beulich wrote:
 On 16.11.10 at 22:08, Jeremy Fitzhardinge jer...@goop.org wrote:
 +static void xen_lock_spinning(struct arch_spinlock *lock, unsigned want)
  {
 -   struct xen_spinlock *xl = (struct xen_spinlock *)lock;
 -   struct xen_spinlock *prev;
 int irq = __get_cpu_var(lock_kicker_irq);
 -   int ret;
 +   struct xen_lock_waiting *w = __get_cpu_var(lock_waiting);
 +   int cpu = smp_processor_id();
 u64 start;
  
 /* If kicker interrupts not initialized yet, just spin */
 if (irq == -1)
 -   return 0;
 +   return;
  
 start = spin_time_start();
  
 -   /* announce we're spinning */
 -   prev = spinning_lock(xl);
 +   w-want = want;
 +   w-lock = lock;
 +
 +   /* This uses set_bit, which atomic and therefore a barrier */
 +   cpumask_set_cpu(cpu, waiting_cpus);
 Since you don't allow nesting, don't you need to disable
 interrupts before you touch per-CPU state?
 Yes, I think you're right - interrupts need to be disabled for the bulk
 of this function.

Actually, on second thoughts, maybe it doesn't matter so much.  The main
issue is making sure that the interrupt will make the VCPU drop out of
xen_poll_irq() - if it happens before xen_poll_irq(), it should leave
the event pending, which will cause the poll to return immediately.  I
hope.  Certainly disabling interrupts for some of the function will make
it easier to analyze with respect to interrupt nesting.

Another issue may be making sure the writes and reads of w->want and
w->lock are ordered properly to make sure that xen_unlock_kick() never
sees an inconsistent view of the (lock, want) tuple.  The risk being that
xen_unlock_kick() sees a random, spurious (lock, want) pairing and sends
the kick event to the wrong VCPU, leaving the deserving one hung.

J



Re: [Xen-devel] Re: [PATCH 13/14] x86/ticketlock: add slowpath logic

2010-11-17 Thread Jeremy Fitzhardinge
On 11/17/2010 04:21 AM, Peter Zijlstra wrote:
 On Tue, 2010-11-16 at 13:08 -0800, Jeremy Fitzhardinge wrote:
 Maintain a flag in both LSBs of the ticket lock which indicates whether
 anyone is in the lock slowpath and may need kicking when the current
 holder unlocks.  The flags are set when the first locker enters
 the slowpath, and cleared when unlocking to an empty queue.
 So here you say you set both LSBs in order to keep head == tail working,
 but the code seems to suggest you only use the tail LSB.

 I think I see why using only one LSB is sufficient, but some consistency
 would be nice :-)

I tried that initially, but it turned out more messy.  The problem is
that the flag can change while you're spinning on the lock, so you need
to mask it out every time you read tail before you can compare it to
head; if head has the flag set too, you just need to mask it out of
there as well:

ticket = xadd(lock, 2 << TICKET_SHIFT);
ticket.tail &= ~TICKET_SLOW_FLAGS;

while (ticket.head != ticket.tail) {
	relax();
	ticket.head = lock->head /* & ~TICKET_SLOW_FLAGS */;
}

IOW setting both doesn't help anything, and just requires an extra mask
in the spin loop (and anywhere else that uses 'head').

And hey, extra bit.  Bound to be useful for something.

J


Re: [Xen-devel] Re: [PATCH 09/14] xen/pvticketlock: Xen implementation for PV ticket locks

2010-11-17 Thread Jeremy Fitzhardinge
On 11/17/2010 02:34 AM, Jan Beulich wrote:
 Actually, on second thoughts, maybe it doesn't matter so much.  The main
 issue is making sure that the interrupt will make the VCPU drop out of
 xen_poll_irq() - if it happens before xen_poll_irq(), it should leave
 the event pending, which will cause the poll to return immediately.  I
 hope.  Certainly disabling interrupts for some of the function will make
 it easier to analyze with respect to interrupt nesting.
 That's not my main concern. Instead, what if you get interrupted
 anywhere here, the interrupt handler tries to acquire another
 spinlock and also has to go into the slow path? It'll overwrite part
 or all of the outer context's state.

That doesn't matter if the outer context doesn't end up blocking.  If it
has already blocked then it will unblock as a result of the interrupt;
if it hasn't yet blocked, then the inner context will leave the event
pending and cause it to not block.  Either way, it no longer uses or
needs that per-cpu state: it will return to the spin loop and (maybe)
get re-entered, setting it all up again.

I think there is a problem with the code as posted because it sets up
the percpu data before clearing the pending event, so it can end up
blocking with bad percpu data.
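
To make that concrete, the ordering being argued for is roughly the
following - an illustrative sketch of the relevant part of
xen_lock_spinning(), not the actual patch:

	/* clear any stale pending kick first: anything that becomes pending
	 * after this point makes xen_poll_irq() return immediately */
	xen_clear_irq_pending(irq);

	/* now publish what we're waiting for, for xen_unlock_kick() to find */
	w->want = want;
	w->lock = lock;
	cpumask_set_cpu(cpu, &waiting_cpus);	/* set_bit: atomic, acts as a barrier */

	/* last-chance pickup test: only block if we still don't own the lock */
	if (ACCESS_ONCE(lock->tickets.head) != want)
		xen_poll_irq(irq);	/* block until kicked (or a spurious wakeup) */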

 Another issue may be making sure the writes and reads of w-want and
 w-lock are ordered properly to make sure that xen_unlock_kick() never
 sees an inconsistent view of the (lock,want) tuple.  The risk being that
 xen_unlock_kick() sees a random, spurious (lock,want) pairing and sends
 the kick event to the wrong VCPU, leaving the deserving one hung.
 Yes, proper operation sequence (and barriers) is certainly
 required here. If you allowed nesting, this may even become
 simpler (as you'd have a single write making visible the new
 head pointer, after having written all relevant fields of the
 new head structure).

Yes, simple nesting should be quite straightforward (ie allowing an
interrupt handler to take some other lock than the one the outer context
is waiting on).

J


[PATCH 03/14] x86/ticketlock: Use C for __ticket_spin_unlock

2010-11-16 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com

If we don't need to use a locked inc for unlock, then implement it in C.

Signed-off-by: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
---
 arch/x86/include/asm/spinlock.h |   32 +---
 1 files changed, 17 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index f48a6e3..0170ba9 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -33,9 +33,21 @@
  * On PPro SMP or if we are using OOSTORE, we use a locked operation to unlock
  * (PPro errata 66, 92)
  */
-# define UNLOCK_LOCK_PREFIX LOCK_PREFIX
+static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
+{
+   if (sizeof(lock-tickets.head) == sizeof(u8))
+   asm (LOCK_PREFIX incb %0
+: +m (lock-tickets.head) : : memory);
+   else
+   asm (LOCK_PREFIX incw %0
+: +m (lock-tickets.head) : : memory);
+
+}
 #else
-# define UNLOCK_LOCK_PREFIX
+static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
+{
+   lock-tickets.head++;
+}
 #endif
 
 /*
@@ -93,14 +105,6 @@ static __always_inline int 
__ticket_spin_trylock(arch_spinlock_t *lock)
 
return tmp;
 }
-
-static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
-{
-   asm volatile(UNLOCK_LOCK_PREFIX incb %0
-: +m (lock-slock)
-:
-: memory, cc);
-}
 #else
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
@@ -144,15 +148,13 @@ static __always_inline int 
__ticket_spin_trylock(arch_spinlock_t *lock)
 
return tmp;
 }
+#endif
 
 static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 {
-   asm volatile(UNLOCK_LOCK_PREFIX incw %0
-: +m (lock-slock)
-:
-: memory, cc);
+   __ticket_unlock_release(lock);
+   barrier();  /* prevent reordering into locked region */
 }
-#endif
 
 static inline int __ticket_spin_is_locked(arch_spinlock_t *lock)
 {
-- 
1.7.2.3



[PATCH 02/14] x86/ticketlock: convert spin loop to C

2010-11-16 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com

The inner loop of __ticket_spin_lock isn't doing anything very special,
so reimplement it in C.

For the 8 bit ticket lock variant, we use a register union to get direct
access to the lower and upper bytes in the tickets, but unfortunately gcc
won't generate a direct comparison between the two halves of the register,
so the generated asm isn't quite as pretty as the hand-coded version.
However benchmarking shows that this is actually a small improvement in
runtime performance on some benchmarks, and never a slowdown.

We also need to make sure there's a barrier at the end of the lock loop
to make sure that the compiler doesn't move any instructions from within
the locked region into the region where we don't yet own the lock.

Signed-off-by: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
---
 arch/x86/include/asm/spinlock.h |   58 +++---
 1 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index d6d5784..f48a6e3 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -58,21 +58,21 @@
 #if (NR_CPUS  256)
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
-   unsigned short inc = 1  TICKET_SHIFT;
-
-   asm volatile (
-   LOCK_PREFIX xaddw %w0, %1\n
-   1:\t
-   cmpb %h0, %b0\n\t
-   je 2f\n\t
-   rep ; nop\n\t
-   movb %1, %b0\n\t
-   /* don't need lfence here, because loads are in-order */
-   jmp 1b\n
-   2:
-   : +Q (inc), +m (lock-slock)
-   :
-   : memory, cc);
+   register union {
+   struct __raw_tickets tickets;
+   unsigned short slock;
+   } inc = { .slock = 1  TICKET_SHIFT };
+
+   asm volatile (LOCK_PREFIX xaddw %w0, %1\n
+ : +Q (inc), +m (lock-slock) : : memory, cc);
+
+   for (;;) {
+   if (inc.tickets.head == inc.tickets.tail)
+   goto out;
+   cpu_relax();
+   inc.tickets.head = ACCESS_ONCE(lock-tickets.head);
+   }
+out:   barrier();  /* make sure nothing creeps before the lock is 
taken */
 }
 
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
@@ -105,22 +105,22 @@ static __always_inline void 
__ticket_spin_unlock(arch_spinlock_t *lock)
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
unsigned inc = 1  TICKET_SHIFT;
-   unsigned tmp;
+   __ticket_t tmp;
 
-   asm volatile(LOCK_PREFIX xaddl %0, %1\n
-movzwl %w0, %2\n\t
-shrl $16, %0\n\t
-1:\t
-cmpl %0, %2\n\t
-je 2f\n\t
-rep ; nop\n\t
-movzwl %1, %2\n\t
-/* don't need lfence here, because loads are in-order */
-jmp 1b\n
-2:
-: +r (inc), +m (lock-slock), =r (tmp)
-:
-: memory, cc);
+   asm volatile(LOCK_PREFIX xaddl %0, %1\n\t
+: +r (inc), +m (lock-slock)
+: : memory, cc);
+
+   tmp = inc;
+   inc = TICKET_SHIFT;
+
+   for (;;) {
+   if ((__ticket_t)inc == tmp)
+   goto out;
+   cpu_relax();
+   tmp = ACCESS_ONCE(lock-tickets.head);
+   }
+out:   barrier();  /* make sure nothing creeps before the lock is 
taken */
 }
 
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
-- 
1.7.2.3



[PATCH 04/14] x86/ticketlock: make large and small ticket versions of spin_lock the same

2010-11-16 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com

Make the bulk of __ticket_spin_lock look identical for large and small
number of cpus.

Signed-off-by: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
---
 arch/x86/include/asm/spinlock.h |   23 ---
 1 files changed, 8 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 0170ba9..1b81809 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -70,19 +70,16 @@ static __always_inline void __ticket_unlock_release(struct 
arch_spinlock *lock)
 #if (NR_CPUS  256)
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
-   register union {
-   struct __raw_tickets tickets;
-   unsigned short slock;
-   } inc = { .slock = 1  TICKET_SHIFT };
+   register struct __raw_tickets inc = { .tail = 1 };
 
asm volatile (LOCK_PREFIX xaddw %w0, %1\n
- : +Q (inc), +m (lock-slock) : : memory, cc);
+ : +r (inc), +m (lock-tickets) : : memory, cc);
 
for (;;) {
-   if (inc.tickets.head == inc.tickets.tail)
+   if (inc.head == inc.tail)
goto out;
cpu_relax();
-   inc.tickets.head = ACCESS_ONCE(lock-tickets.head);
+   inc.head = ACCESS_ONCE(lock-tickets.head);
}
 out:   barrier();  /* make sure nothing creeps before the lock is 
taken */
 }
@@ -108,21 +105,17 @@ static __always_inline int 
__ticket_spin_trylock(arch_spinlock_t *lock)
 #else
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
-   unsigned inc = 1  TICKET_SHIFT;
-   __ticket_t tmp;
+   register struct __raw_tickets inc = { .tail = 1 };
 
asm volatile(LOCK_PREFIX xaddl %0, %1\n\t
-: +r (inc), +m (lock-slock)
+: +r (inc), +m (lock-tickets)
 : : memory, cc);
 
-   tmp = inc;
-   inc = TICKET_SHIFT;
-
for (;;) {
-   if ((__ticket_t)inc == tmp)
+   if (inc.head == inc.tail)
goto out;
cpu_relax();
-   tmp = ACCESS_ONCE(lock-tickets.head);
+   inc.head = ACCESS_ONCE(lock-tickets.head);
}
 out:   barrier();  /* make sure nothing creeps before the lock is 
taken */
 }
-- 
1.7.2.3



[PATCH 00/14] PV ticket locks without expanding spinlock

2010-11-16 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com

Hi all,

This is a revised version of the pvticket lock series.

The early part of the series is mostly unchanged: it converts the bulk
of the ticket lock code into C and makes the small and large
ticket code common.  The only changes are the incorporation of various
review comments.

The latter part of the series converts from pv spinlocks to pv ticket
locks (ie, using the ticket lock fastpath as-is, but adding pv ops for
the ticketlock slowpaths).

The significant difference here is that rather than adding a new
ticket_t-sized element to arch_spinlock_t - effectively doubling the
size - I steal the LSB of the tickets themselves to store a bit.  This
allows the structure to remain the same size, but at the cost of
halving the max number of CPUs (127 for a 8-bit ticket, and a hard max
of 32767 overall).

The extra bit (well, two, but one is unused) indicates whether the
lock has gone into "slowpath" state, which means one of its lockers
has entered its slowpath and has blocked in the hypervisor.  This
means the current lock-holder needs to make sure it gets kicked out of
the hypervisor on unlock.
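
Concretely, the encoding works out to roughly the constants below
(illustrative; the increment-by-2 and slowpath patches later in the series
define the real versions):

#ifdef CONFIG_PARAVIRT_SPINLOCKS
/* tickets advance by 2; the tail's LSB is the "slowpath" flag */
#define TICKET_LOCK_INC		2
#define TICKET_SLOWPATH_FLAG	((__ticket_t)1)
#else
#define TICKET_LOCK_INC		1
#define TICKET_SLOWPATH_FLAG	((__ticket_t)0)
#endif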

The spinlock remains in slowpath state until the last unlock happens
(ie there are no more queued lockers).

This code survives for a while with moderate testing (make -j 100 on
8 VCPUs on a 4 PCPU system), but locks up after about 20 iterations,
so there's still some race/deadlock in there (probably something
misordered).  I think the basic approach is sound, though.

Thanks,
J

Jeremy Fitzhardinge (14):
  x86/ticketlock: clean up types and accessors
  x86/ticketlock: convert spin loop to C
  x86/ticketlock: Use C for __ticket_spin_unlock
  x86/ticketlock: make large and small ticket versions of spin_lock the
same
  x86/ticketlock: make __ticket_spin_lock common
  x86/ticketlock: make __ticket_spin_trylock common
  x86/spinlocks: replace pv spinlocks with pv ticketlocks
  x86/ticketlock: collapse a layer of functions
  xen/pvticketlock: Xen implementation for PV ticket locks
  x86/pvticketlock: use callee-save for lock_spinning
  x86/ticketlock: don't inline _spin_unlock when using paravirt
spinlocks
  x86/ticketlocks: when paravirtualizing ticket locks, increment by 2
  x86/ticketlock: add slowpath logic
  x86/ticketlocks: tidy up __ticket_unlock_kick()

 arch/x86/Kconfig  |3 +
 arch/x86/include/asm/paravirt.h   |   30 +---
 arch/x86/include/asm/paravirt_types.h |8 +-
 arch/x86/include/asm/spinlock.h   |  236 +---
 arch/x86/include/asm/spinlock_types.h |   32 -
 arch/x86/kernel/paravirt-spinlocks.c  |   52 +--
 arch/x86/xen/spinlock.c   |  281 +
 kernel/Kconfig.locks  |2 +-
 8 files changed, 231 insertions(+), 413 deletions(-)

-- 
1.7.2.3



[PATCH 08/14] x86/ticketlock: collapse a layer of functions

2010-11-16 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com

Now that the paravirtualization layer doesn't exist at the spinlock
level any more, we can collapse the __ticket_ functions into the arch_
functions.

Signed-off-by: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
---
 arch/x86/include/asm/spinlock.h |   35 +--
 1 files changed, 5 insertions(+), 30 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 8e379d3..cfa80b5 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -108,7 +108,7 @@ static __always_inline void __ticket_unlock_kick(struct 
arch_spinlock *lock, __t
ticket_unlock_kick(lock, next);
 }
 
-static __always_inline void __ticket_spin_lock(struct arch_spinlock *lock)
+static __always_inline void arch_spin_lock(struct arch_spinlock *lock)
 {
register struct __raw_tickets inc;
 
@@ -128,7 +128,7 @@ static __always_inline void __ticket_spin_lock(struct 
arch_spinlock *lock)
 out:   barrier();  /* make sure nothing creeps before the lock is 
taken */
 }
 
-static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
+static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
 {
arch_spinlock_t old, new;
 
@@ -142,7 +142,7 @@ static __always_inline int 
__ticket_spin_trylock(arch_spinlock_t *lock)
return cmpxchg(lock-head_tail, old.head_tail, new.head_tail) == 
old.head_tail;
 }
 
-static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
+static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
 {
__ticket_t next = lock-tickets.head + 1;
__ticket_unlock_release(lock);
@@ -150,46 +150,21 @@ static __always_inline void 
__ticket_spin_unlock(arch_spinlock_t *lock)
barrier();  /* prevent reordering into locked region */
 }
 
-static inline int __ticket_spin_is_locked(arch_spinlock_t *lock)
+static inline int arch_spin_is_locked(arch_spinlock_t *lock)
 {
struct __raw_tickets tmp = ACCESS_ONCE(lock-tickets);
 
return !!(tmp.tail ^ tmp.head);
 }
 
-static inline int __ticket_spin_is_contended(arch_spinlock_t *lock)
+static inline int arch_spin_is_contended(arch_spinlock_t *lock)
 {
struct __raw_tickets tmp = ACCESS_ONCE(lock-tickets);
 
return ((tmp.tail - tmp.head)  TICKET_MASK)  1;
 }
-
-static inline int arch_spin_is_locked(arch_spinlock_t *lock)
-{
-   return __ticket_spin_is_locked(lock);
-}
-
-static inline int arch_spin_is_contended(arch_spinlock_t *lock)
-{
-   return __ticket_spin_is_contended(lock);
-}
 #define arch_spin_is_contended arch_spin_is_contended
 
-static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
-{
-   __ticket_spin_lock(lock);
-}
-
-static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
-{
-   return __ticket_spin_trylock(lock);
-}
-
-static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
-{
-   __ticket_spin_unlock(lock);
-}
-
 static __always_inline void arch_spin_lock_flags(arch_spinlock_t *lock,
  unsigned long flags)
 {
-- 
1.7.2.3



[PATCH 07/14] x86/spinlocks: replace pv spinlocks with pv ticketlocks

2010-11-16 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com

Rather than outright replacing the entire spinlock implementation in
order to paravirtualize it, keep the ticket lock implementation but add
a couple of pvops hooks on the slow path (long spin on lock, unlocking
a contended lock).

Ticket locks have a number of nice properties, but they also have some
surprising behaviours in virtual environments.  They enforce a strict
FIFO ordering on cpus trying to take a lock; however, if the hypervisor
scheduler does not schedule the cpus in the correct order, the system can
waste a huge amount of time spinning until the next cpu can take the lock.

(See Thomas Friebel's talk Prevent Guests from Spinning Around
http://www.xen.org/files/xensummitboston08/LHP.pdf for more details.)

To address this, we add two hooks:
 - __ticket_spin_lock which is called after the cpu has been
   spinning on the lock for a significant number of iterations but has
   failed to take the lock (presumably because the cpu holding the lock
   has been descheduled).  The lock_spinning pvop is expected to block
   the cpu until it has been kicked by the current lock holder.
 - __ticket_spin_unlock, which on releasing a contended lock
   (there are more cpus with tail tickets), it looks to see if the next
   cpu is blocked and wakes it if so.

When compiled with CONFIG_PARAVIRT_SPINLOCKS disabled, a set of stub
functions causes all the extra code to go away.
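
Roughly, the resulting lock fastpath/slowpath split looks like the sketch
below.  This is illustrative only (not copied from the patch); it uses the
SPIN_THRESHOLD and __ticket_lock_spinning() hook added here, plus the
__ticket_spin_claim() helper from earlier in the series:

static __always_inline void __ticket_spin_lock(struct arch_spinlock *lock)
{
	register struct __raw_tickets inc;

	inc = __ticket_spin_claim(lock);	/* xadd: grab our ticket */

	for (;;) {
		unsigned count = SPIN_THRESHOLD;

		do {
			if (inc.head == inc.tail)
				goto out;	/* our turn: lock taken */
			cpu_relax();
			inc.head = ACCESS_ONCE(lock->tickets.head);
		} while (--count);

		/* spun too long: ask the hypervisor to block us until the
		 * current holder kicks our ticket */
		__ticket_lock_spinning(lock, inc.tail);
	}
out:
	barrier();	/* keep the critical section after the lock */
}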

Signed-off-by: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
---
 arch/x86/include/asm/paravirt.h   |   30 +++---
 arch/x86/include/asm/paravirt_types.h |8 +
 arch/x86/include/asm/spinlock.h   |   44 +++--
 arch/x86/kernel/paravirt-spinlocks.c  |   15 +-
 arch/x86/xen/spinlock.c   |7 -
 5 files changed, 50 insertions(+), 54 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 18e3b8a..c864775 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -717,36 +717,14 @@ static inline void __set_fixmap(unsigned /* enum 
fixed_addresses */ idx,
 
 #if defined(CONFIG_SMP)  defined(CONFIG_PARAVIRT_SPINLOCKS)
 
-static inline int arch_spin_is_locked(struct arch_spinlock *lock)
+static inline void __ticket_lock_spinning(struct arch_spinlock *lock, unsigned 
ticket)
 {
-   return PVOP_CALL1(int, pv_lock_ops.spin_is_locked, lock);
+   PVOP_VCALL2(pv_lock_ops.lock_spinning, lock, ticket);
 }
 
-static inline int arch_spin_is_contended(struct arch_spinlock *lock)
+static inline void ticket_unlock_kick(struct arch_spinlock *lock, unsigned 
ticket)
 {
-   return PVOP_CALL1(int, pv_lock_ops.spin_is_contended, lock);
-}
-#define arch_spin_is_contended arch_spin_is_contended
-
-static __always_inline void arch_spin_lock(struct arch_spinlock *lock)
-{
-   PVOP_VCALL1(pv_lock_ops.spin_lock, lock);
-}
-
-static __always_inline void arch_spin_lock_flags(struct arch_spinlock *lock,
- unsigned long flags)
-{
-   PVOP_VCALL2(pv_lock_ops.spin_lock_flags, lock, flags);
-}
-
-static __always_inline int arch_spin_trylock(struct arch_spinlock *lock)
-{
-   return PVOP_CALL1(int, pv_lock_ops.spin_trylock, lock);
-}
-
-static __always_inline void arch_spin_unlock(struct arch_spinlock *lock)
-{
-   PVOP_VCALL1(pv_lock_ops.spin_unlock, lock);
+   PVOP_VCALL2(pv_lock_ops.unlock_kick, lock, ticket);
 }
 
 #endif
diff --git a/arch/x86/include/asm/paravirt_types.h 
b/arch/x86/include/asm/paravirt_types.h
index b82bac9..1078474 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -315,12 +315,8 @@ struct pv_mmu_ops {
 
 struct arch_spinlock;
 struct pv_lock_ops {
-   int (*spin_is_locked)(struct arch_spinlock *lock);
-   int (*spin_is_contended)(struct arch_spinlock *lock);
-   void (*spin_lock)(struct arch_spinlock *lock);
-   void (*spin_lock_flags)(struct arch_spinlock *lock, unsigned long 
flags);
-   int (*spin_trylock)(struct arch_spinlock *lock);
-   void (*spin_unlock)(struct arch_spinlock *lock);
+   void (*lock_spinning)(struct arch_spinlock *lock, unsigned ticket);
+   void (*unlock_kick)(struct arch_spinlock *lock, unsigned ticket);
 };
 
 /* This contains all the paravirt structures: we get a convenient
diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 3afb1a7..8e379d3 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -50,6 +50,21 @@ static __always_inline void __ticket_unlock_release(struct 
arch_spinlock *lock)
 }
 #endif
 
+/* How long a lock should spin before we consider blocking */
+#define SPIN_THRESHOLD (1  11)
+
+#ifndef CONFIG_PARAVIRT_SPINLOCKS
+
+static __always_inline void __ticket_lock_spinning(struct arch_spinlock *lock, 
unsigned ticket)
+{
+}
+
+static __always_inline void ticket_unlock_kick

[PATCH 13/14] x86/ticketlock: add slowpath logic

2010-11-16 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com

Maintain a flag in both LSBs of the ticket lock which indicates whether
anyone is in the lock slowpath and may need kicking when the current
holder unlocks.  The flags are set when the first locker enters
the slowpath, and cleared when unlocking to an empty queue.

In the specific implementation of lock_spinning(), make sure to set
the slowpath flags on the lock just before blocking.  We must do
this before the last-chance pickup test to prevent a deadlock
with the unlocker:

    Unlocker                    Locker
                                test for lock pickup
                                    -> fail
    test slowpath + unlock
        -> false
                                set slowpath flags
                                block

Whereas this works in any ordering:

    Unlocker                    Locker
                                set slowpath flags
                                test for lock pickup
                                    -> fail
                                block
    test slowpath + unlock
        -> true, kick

Signed-off-by: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
---
 arch/x86/include/asm/spinlock.h   |   53 -
 arch/x86/include/asm/spinlock_types.h |2 +
 arch/x86/kernel/paravirt-spinlocks.c  |   37 +++
 arch/x86/xen/spinlock.c   |4 ++
 4 files changed, 88 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 9e1c7ce..8d1cb42 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -53,7 +53,38 @@ static __always_inline void __ticket_unlock_release(struct 
arch_spinlock *lock)
 /* How long a lock should spin before we consider blocking */
 #define SPIN_THRESHOLD (1  11)
 
-#ifndef CONFIG_PARAVIRT_SPINLOCKS
+/* Only defined when CONFIG_PARAVIRT_SPINLOCKS defined, but may as
+ * well leave the prototype always visible.  */
+extern void __ticket_unlock_release_slowpath(struct arch_spinlock *lock);
+
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+
+/*
+ * Return true if someone is in the slowpath on this lock.  This
+ * should only be used by the current lock-holder.
+ */
+static inline bool __ticket_in_slowpath(struct arch_spinlock *lock)
+{
+   return !!(lock-tickets.tail  TICKET_SLOWPATH_FLAG);
+}
+
+static inline void __ticket_enter_slowpath(struct arch_spinlock *lock)
+{
+   if (sizeof(lock-tickets.tail) == sizeof(u8))
+   asm (LOCK_PREFIX orb %1, %0
+: +m (lock-tickets.tail)
+: i (TICKET_SLOWPATH_FLAG) : memory);
+   else
+   asm (LOCK_PREFIX orw %1, %0
+: +m (lock-tickets.tail)
+: i (TICKET_SLOWPATH_FLAG) : memory);
+}
+
+#else  /* !CONFIG_PARAVIRT_SPINLOCKS */
+static inline bool __ticket_in_slowpath(struct arch_spinlock *lock)
+{
+   return false;
+}
 
 static __always_inline void __ticket_lock_spinning(struct arch_spinlock *lock, 
unsigned ticket)
 {
@@ -84,18 +115,22 @@ static __always_inline void ticket_unlock_kick(struct 
arch_spinlock *lock, u
  */
 static __always_inline struct __raw_tickets __ticket_spin_claim(struct 
arch_spinlock *lock)
 {
-   register struct __raw_tickets tickets = { .tail = TICKET_LOCK_INC };
+   register struct arch_spinlock tickets = {
+   { .tickets.tail = TICKET_LOCK_INC }
+   };
 
if (sizeof(lock-tickets.head) == sizeof(u8))
asm volatile (LOCK_PREFIX xaddw %w0, %1\n
- : +r (tickets), +m (lock-tickets)
+ : +r (tickets.head_tail), +m (lock-tickets)
  : : memory, cc);
else
asm volatile (LOCK_PREFIX xaddl %0, %1\n
-: +r (tickets), +m (lock-tickets)
+: +r (tickets.head_tail), +m (lock-tickets)
 : : memory, cc);
 
-   return tickets;
+   tickets.tickets.tail = ~TICKET_SLOWPATH_FLAG;
+
+   return tickets.tickets;
 }
 
 /* 
@@ -144,9 +179,11 @@ static __always_inline int 
arch_spin_trylock(arch_spinlock_t *lock)
 
 static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
 {
-   __ticket_t next = lock-tickets.head + TICKET_LOCK_INC;
-   __ticket_unlock_release(lock);
-   __ticket_unlock_kick(lock, next);
+   barrier();  /* prevent reordering out of locked region */
+   if (unlikely(__ticket_in_slowpath(lock)))
+   __ticket_unlock_release_slowpath(lock);
+   else
+   __ticket_unlock_release(lock);
barrier();  /* prevent reordering into locked region */
 }
 
diff --git a/arch/x86/include/asm/spinlock_types.h 
b/arch/x86/include/asm/spinlock_types.h
index 0553c0b..7b383e2 100644
--- a/arch/x86/include/asm/spinlock_types.h
+++ b/arch/x86/include/asm/spinlock_types.h

[PATCH 05/14] x86/ticketlock: make __ticket_spin_lock common

2010-11-16 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com

Aside from the particular form of the xadd instruction, they're identical.
So factor out the xadd and use common code for the rest.

Signed-off-by: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
---
 arch/x86/include/asm/spinlock.h |   42 ++
 1 files changed, 20 insertions(+), 22 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 1b81809..f722f96 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -67,13 +67,27 @@ static __always_inline void __ticket_unlock_release(struct 
arch_spinlock *lock)
  * save some instructions and make the code more elegant. There really isn't
  * much between them in performance though, especially as locks are out of 
line.
  */
-#if (NR_CPUS  256)
-static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
+static __always_inline struct __raw_tickets __ticket_spin_claim(struct 
arch_spinlock *lock)
 {
-   register struct __raw_tickets inc = { .tail = 1 };
+   register struct __raw_tickets tickets = { .tail = 1 };
+
+   if (sizeof(lock-tickets.head) == sizeof(u8))
+   asm volatile (LOCK_PREFIX xaddw %w0, %1\n
+ : +r (tickets), +m (lock-tickets)
+ : : memory, cc);
+   else
+   asm volatile (LOCK_PREFIX xaddl %0, %1\n
+: +r (tickets), +m (lock-tickets)
+: : memory, cc);
 
-   asm volatile (LOCK_PREFIX xaddw %w0, %1\n
- : +r (inc), +m (lock-tickets) : : memory, cc);
+   return tickets;
+}
+
+static __always_inline void __ticket_spin_lock(struct arch_spinlock *lock)
+{
+   register struct __raw_tickets inc;
+
+   inc = __ticket_spin_claim(lock);
 
for (;;) {
if (inc.head == inc.tail)
@@ -84,6 +98,7 @@ static __always_inline void 
__ticket_spin_lock(arch_spinlock_t *lock)
 out:   barrier();  /* make sure nothing creeps before the lock is 
taken */
 }
 
+#if (NR_CPUS  256)
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 {
unsigned int tmp, new;
@@ -103,23 +118,6 @@ static __always_inline int 
__ticket_spin_trylock(arch_spinlock_t *lock)
return tmp;
 }
 #else
-static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
-{
-   register struct __raw_tickets inc = { .tail = 1 };
-
-   asm volatile(LOCK_PREFIX xaddl %0, %1\n\t
-: +r (inc), +m (lock-tickets)
-: : memory, cc);
-
-   for (;;) {
-   if (inc.head == inc.tail)
-   goto out;
-   cpu_relax();
-   inc.head = ACCESS_ONCE(lock-tickets.head);
-   }
-out:   barrier();  /* make sure nothing creeps before the lock is 
taken */
-}
-
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 {
unsigned tmp;
-- 
1.7.2.3



[PATCH 01/14] x86/ticketlock: clean up types and accessors

2010-11-16 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com

A few cleanups to the way spinlocks are defined and accessed:
 - define __ticket_t which is the size of a spinlock ticket (ie, enough
   bits to hold all the cpus)
 - Define struct arch_spinlock as a union containing plain slock and
   the head and tail tickets
 - Use head and tail to implement some of the spinlock predicates.
 - Make all ticket variables unsigned.
 - Use TICKET_SHIFT to form constants

Most of this will be used in later patches.

Signed-off-by: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
---
 arch/x86/include/asm/spinlock.h   |   24 ++--
 arch/x86/include/asm/spinlock_types.h |   20 ++--
 2 files changed, 28 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 3089f70..d6d5784 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -56,11 +56,9 @@
  * much between them in performance though, especially as locks are out of 
line.
  */
 #if (NR_CPUS  256)
-#define TICKET_SHIFT 8
-
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
-   short inc = 0x0100;
+   unsigned short inc = 1  TICKET_SHIFT;
 
asm volatile (
LOCK_PREFIX xaddw %w0, %1\n
@@ -79,7 +77,7 @@ static __always_inline void 
__ticket_spin_lock(arch_spinlock_t *lock)
 
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 {
-   int tmp, new;
+   unsigned int tmp, new;
 
asm volatile(movzwl %2, %0\n\t
 cmpb %h0,%b0\n\t
@@ -104,12 +102,10 @@ static __always_inline void 
__ticket_spin_unlock(arch_spinlock_t *lock)
 : memory, cc);
 }
 #else
-#define TICKET_SHIFT 16
-
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
-   int inc = 0x0001;
-   int tmp;
+   unsigned inc = 1  TICKET_SHIFT;
+   unsigned tmp;
 
asm volatile(LOCK_PREFIX xaddl %0, %1\n
 movzwl %w0, %2\n\t
@@ -129,8 +125,8 @@ static __always_inline void 
__ticket_spin_lock(arch_spinlock_t *lock)
 
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 {
-   int tmp;
-   int new;
+   unsigned tmp;
+   unsigned new;
 
asm volatile(movl %2,%0\n\t
 movl %0,%1\n\t
@@ -160,16 +156,16 @@ static __always_inline void 
__ticket_spin_unlock(arch_spinlock_t *lock)
 
 static inline int __ticket_spin_is_locked(arch_spinlock_t *lock)
 {
-   int tmp = ACCESS_ONCE(lock-slock);
+   struct __raw_tickets tmp = ACCESS_ONCE(lock-tickets);
 
-   return !!(((tmp  TICKET_SHIFT) ^ tmp)  ((1  TICKET_SHIFT) - 1));
+   return !!(tmp.tail ^ tmp.head);
 }
 
 static inline int __ticket_spin_is_contended(arch_spinlock_t *lock)
 {
-   int tmp = ACCESS_ONCE(lock-slock);
+   struct __raw_tickets tmp = ACCESS_ONCE(lock-tickets);
 
-   return (((tmp  TICKET_SHIFT) - tmp)  ((1  TICKET_SHIFT) - 1))  1;
+   return ((tmp.tail - tmp.head)  TICKET_MASK)  1;
 }
 
 #ifndef CONFIG_PARAVIRT_SPINLOCKS
diff --git a/arch/x86/include/asm/spinlock_types.h 
b/arch/x86/include/asm/spinlock_types.h
index dcb48b2..e3ad1e3 100644
--- a/arch/x86/include/asm/spinlock_types.h
+++ b/arch/x86/include/asm/spinlock_types.h
@@ -5,11 +5,27 @@
 # error please don't include this file directly
 #endif
 
+#include linux/types.h
+
+#if (CONFIG_NR_CPUS  256)
+typedef u8  __ticket_t;
+#else
+typedef u16 __ticket_t;
+#endif
+
+#define TICKET_SHIFT   (sizeof(__ticket_t) * 8)
+#define TICKET_MASK((__ticket_t)((1  TICKET_SHIFT) - 1))
+
 typedef struct arch_spinlock {
-   unsigned int slock;
+   union {
+   unsigned int slock;
+   struct __raw_tickets {
+   __ticket_t head, tail;
+   } tickets;
+   };
 } arch_spinlock_t;
 
-#define __ARCH_SPIN_LOCK_UNLOCKED  { 0 }
+#define __ARCH_SPIN_LOCK_UNLOCKED  { { .slock = 0 } }
 
 typedef struct {
unsigned int lock;
-- 
1.7.2.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH 11/14] x86/ticketlock: don't inline _spin_unlock when using paravirt spinlocks

2010-11-16 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com

The code size expands somewhat, and it's probably better to just call
a function rather than inline it.

Signed-off-by: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
---
 arch/x86/Kconfig |3 +++
 kernel/Kconfig.locks |2 +-
 2 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index e832768..a615c9c 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -531,6 +531,9 @@ config PARAVIRT_SPINLOCKS
 
  If you are unsure how to answer this question, answer N.
 
+config ARCH_NOINLINE_SPIN_UNLOCK
+   def_bool PARAVIRT_SPINLOCKS
+
 config PARAVIRT_CLOCK
bool
 
diff --git a/kernel/Kconfig.locks b/kernel/Kconfig.locks
index 88c92fb..3216c22 100644
--- a/kernel/Kconfig.locks
+++ b/kernel/Kconfig.locks
@@ -125,7 +125,7 @@ config INLINE_SPIN_LOCK_IRQSAVE
 ARCH_INLINE_SPIN_LOCK_IRQSAVE
 
 config INLINE_SPIN_UNLOCK
-   def_bool !DEBUG_SPINLOCK  (!PREEMPT || ARCH_INLINE_SPIN_UNLOCK)
+   def_bool !DEBUG_SPINLOCK  (!PREEMPT || ARCH_INLINE_SPIN_UNLOCK)  
!ARCH_NOINLINE_SPIN_UNLOCK
 
 config INLINE_SPIN_UNLOCK_BH
def_bool !DEBUG_SPINLOCK  ARCH_INLINE_SPIN_UNLOCK_BH
-- 
1.7.2.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH 06/14] x86/ticketlock: make __ticket_spin_trylock common

2010-11-16 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com

Make trylock code common regardless of ticket size.

Signed-off-by: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
---
 arch/x86/include/asm/spinlock.h   |   49 +++--
 arch/x86/include/asm/spinlock_types.h |6 +++-
 2 files changed, 14 insertions(+), 41 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index f722f96..3afb1a7 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -98,48 +98,19 @@ static __always_inline void __ticket_spin_lock(struct 
arch_spinlock *lock)
 out:   barrier();  /* make sure nothing creeps before the lock is 
taken */
 }
 
-#if (NR_CPUS  256)
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 {
-   unsigned int tmp, new;
-
-   asm volatile(movzwl %2, %0\n\t
-cmpb %h0,%b0\n\t
-leal 0x100(% REG_PTR_MODE 0), %1\n\t
-jne 1f\n\t
-LOCK_PREFIX cmpxchgw %w1,%2\n\t
-1:
-sete %b1\n\t
-movzbl %b1,%0\n\t
-: =a (tmp), =q (new), +m (lock-slock)
-:
-: memory, cc);
-
-   return tmp;
-}
-#else
-static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
-{
-   unsigned tmp;
-   unsigned new;
-
-   asm volatile(movl %2,%0\n\t
-movl %0,%1\n\t
-roll $16, %0\n\t
-cmpl %0,%1\n\t
-leal 0x0001(% REG_PTR_MODE 0), %1\n\t
-jne 1f\n\t
-LOCK_PREFIX cmpxchgl %1,%2\n\t
-1:
-sete %b1\n\t
-movzbl %b1,%0\n\t
-: =a (tmp), =q (new), +m (lock-slock)
-:
-: memory, cc);
-
-   return tmp;
+   arch_spinlock_t old, new;
+
+   old.tickets = ACCESS_ONCE(lock-tickets);
+   if (old.tickets.head != old.tickets.tail)
+   return 0;
+
+   new.head_tail = old.head_tail + (1  TICKET_SHIFT);
+
+   /* cmpxchg is a full barrier, so nothing can move before it */
+   return cmpxchg(lock-head_tail, old.head_tail, new.head_tail) == 
old.head_tail;
 }
-#endif
 
 static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 {
diff --git a/arch/x86/include/asm/spinlock_types.h 
b/arch/x86/include/asm/spinlock_types.h
index e3ad1e3..72e154e 100644
--- a/arch/x86/include/asm/spinlock_types.h
+++ b/arch/x86/include/asm/spinlock_types.h
@@ -9,8 +9,10 @@
 
 #if (CONFIG_NR_CPUS  256)
 typedef u8  __ticket_t;
+typedef u16 __ticketpair_t;
 #else
 typedef u16 __ticket_t;
+typedef u32 __ticketpair_t;
 #endif
 
 #define TICKET_SHIFT   (sizeof(__ticket_t) * 8)
@@ -18,14 +20,14 @@ typedef u16 __ticket_t;
 
 typedef struct arch_spinlock {
union {
-   unsigned int slock;
+   __ticketpair_t head_tail;
struct __raw_tickets {
__ticket_t head, tail;
} tickets;
};
 } arch_spinlock_t;
 
-#define __ARCH_SPIN_LOCK_UNLOCKED  { { .slock = 0 } }
+#define __ARCH_SPIN_LOCK_UNLOCKED  { { 0 } }
 
 typedef struct {
unsigned int lock;
-- 
1.7.2.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH 14/14] x86/ticketlocks: tidy up __ticket_unlock_kick()

2010-11-16 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com

__ticket_unlock_kick() is now only called from known slowpaths, so there's
no need for it to do any checking of its own.

Signed-off-by: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
---
 arch/x86/include/asm/paravirt.h |2 +-
 arch/x86/include/asm/spinlock.h |   14 --
 2 files changed, 1 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 6f275ca..7755b16 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -722,7 +722,7 @@ static inline void __ticket_lock_spinning(struct 
arch_spinlock *lock, unsigned t
PVOP_VCALLEE2(pv_lock_ops.lock_spinning, lock, ticket);
 }
 
-static inline void ticket_unlock_kick(struct arch_spinlock *lock, unsigned 
ticket)
+static inline void __ticket_unlock_kick(struct arch_spinlock *lock, unsigned 
ticket)
 {
PVOP_VCALL2(pv_lock_ops.unlock_kick, lock, ticket);
 }
diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 8d1cb42..70675bc 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -90,10 +90,6 @@ static __always_inline void __ticket_lock_spinning(struct 
arch_spinlock *lock, u
 {
 }
 
-static __always_inline void ticket_unlock_kick(struct arch_spinlock *lock, 
unsigned ticket)
-{
-}
-
 #endif /* CONFIG_PARAVIRT_SPINLOCKS */
 
 /*
@@ -133,16 +129,6 @@ static __always_inline struct __raw_tickets 
__ticket_spin_claim(struct arch_spin
return tickets.tickets;
 }
 
-/* 
- * If a spinlock has someone waiting on it, then kick the appropriate
- * waiting cpu.
- */
-static __always_inline void __ticket_unlock_kick(struct arch_spinlock *lock, 
__ticket_t next)
-{
-   if (unlikely(lock-tickets.tail != next))
-   ticket_unlock_kick(lock, next);
-}
-
 static __always_inline void arch_spin_lock(struct arch_spinlock *lock)
 {
register struct __raw_tickets inc;
-- 
1.7.2.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH 10/14] x86/pvticketlock: use callee-save for lock_spinning

2010-11-16 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com

Although the lock_spinning calls in the spinlock code are on the
uncommon path, their presence can cause the compiler to generate many
more register save/restores in the function pre/postamble, which is in
the fast path.  To avoid this, convert it to using the pvops callee-save
calling convention, which defers all the save/restores until the actual
function is called, keeping the fastpath clean.

Signed-off-by: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
---
 arch/x86/include/asm/paravirt.h   |2 +-
 arch/x86/include/asm/paravirt_types.h |2 +-
 arch/x86/kernel/paravirt-spinlocks.c  |2 +-
 arch/x86/xen/spinlock.c   |3 ++-
 4 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index c864775..6f275ca 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -719,7 +719,7 @@ static inline void __set_fixmap(unsigned /* enum 
fixed_addresses */ idx,
 
 static inline void __ticket_lock_spinning(struct arch_spinlock *lock, unsigned 
ticket)
 {
-   PVOP_VCALL2(pv_lock_ops.lock_spinning, lock, ticket);
+   PVOP_VCALLEE2(pv_lock_ops.lock_spinning, lock, ticket);
 }
 
 static inline void ticket_unlock_kick(struct arch_spinlock *lock, unsigned 
ticket)
diff --git a/arch/x86/include/asm/paravirt_types.h 
b/arch/x86/include/asm/paravirt_types.h
index 1078474..53f249a 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -315,7 +315,7 @@ struct pv_mmu_ops {
 
 struct arch_spinlock;
 struct pv_lock_ops {
-   void (*lock_spinning)(struct arch_spinlock *lock, unsigned ticket);
+   struct paravirt_callee_save lock_spinning;
void (*unlock_kick)(struct arch_spinlock *lock, unsigned ticket);
 };
 
diff --git a/arch/x86/kernel/paravirt-spinlocks.c 
b/arch/x86/kernel/paravirt-spinlocks.c
index c2e010e..4251c1d 100644
--- a/arch/x86/kernel/paravirt-spinlocks.c
+++ b/arch/x86/kernel/paravirt-spinlocks.c
@@ -9,7 +9,7 @@
 
 struct pv_lock_ops pv_lock_ops = {
 #ifdef CONFIG_SMP
-   .lock_spinning = paravirt_nop,
+   .lock_spinning = __PV_IS_CALLEE_SAVE(paravirt_nop),
.unlock_kick = paravirt_nop,
 #endif
 };
diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index 5feb897..c31c5a3 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -137,6 +137,7 @@ out:
w-lock = NULL;
spin_time_accum_blocked(start);
 }
+PV_CALLEE_SAVE_REGS_THUNK(xen_lock_spinning);
 
 static void xen_unlock_kick(struct arch_spinlock *lock, unsigned next)
 {
@@ -189,7 +190,7 @@ void xen_uninit_lock_cpu(int cpu)
 
 void __init xen_init_spinlocks(void)
 {
-   pv_lock_ops.lock_spinning = xen_lock_spinning;
+   pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(xen_lock_spinning);
pv_lock_ops.unlock_kick = xen_unlock_kick;
 }
 
-- 
1.7.2.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH 09/14] xen/pvticketlock: Xen implementation for PV ticket locks

2010-11-16 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com

Replace the old Xen implementation of PV spinlocks with an implementation
of xen_lock_spinning and xen_unlock_kick.

xen_lock_spinning simply registers the cpu in its entry in lock_waiting,
adds itself to the waiting_cpus set, and blocks on an event channel
until the channel becomes pending.

xen_unlock_kick searches the cpus in waiting_cpus looking for the one
which next wants this lock with the next ticket, if any.  If found,
it kicks it by making its event channel pending, which wakes it up.
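
Since the diff below is truncated in this archive, here is a rough sketch of
what the two functions boil down to.  This is illustrative only -- stats,
memory barriers, nesting of the per-cpu slot and the irq setup are omitted,
and the exact kick mechanism shown (xen_poll_irq plus an IPI vector) is an
assumption rather than a quote of the patch:

struct xen_lock_waiting {
        struct arch_spinlock *lock;
        __ticket_t want;
};

static DEFINE_PER_CPU(int, lock_kicker_irq) = -1;
static DEFINE_PER_CPU(struct xen_lock_waiting, lock_waiting);
static cpumask_t waiting_cpus;

static void xen_lock_spinning(struct arch_spinlock *lock, unsigned want)
{
        int irq = __get_cpu_var(lock_kicker_irq);
        struct xen_lock_waiting *w = &__get_cpu_var(lock_waiting);
        int cpu = smp_processor_id();

        /* publish which lock/ticket this cpu is waiting for */
        w->want = want;
        w->lock = lock;
        cpumask_set_cpu(cpu, &waiting_cpus);

        /* drop any stale event, then block until we are kicked */
        xen_clear_irq_pending(irq);
        xen_poll_irq(irq);

        cpumask_clear_cpu(cpu, &waiting_cpus);
        w->lock = NULL;
}

static void xen_unlock_kick(struct arch_spinlock *lock, unsigned next)
{
        int cpu;

        for_each_cpu(cpu, &waiting_cpus) {
                const struct xen_lock_waiting *w = &per_cpu(lock_waiting, cpu);

                /* kick whichever cpu is waiting for the next ticket */
                if (w->lock == lock && w->want == next) {
                        xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
                        break;
                }
        }
}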

Signed-off-by: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
---
 arch/x86/xen/spinlock.c |  281 ++-
 1 files changed, 36 insertions(+), 245 deletions(-)

diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index 3d9da72..5feb897 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -19,32 +19,21 @@
 #ifdef CONFIG_XEN_DEBUG_FS
 static struct xen_spinlock_stats
 {
-   u64 taken;
u32 taken_slow;
-   u32 taken_slow_nested;
u32 taken_slow_pickup;
u32 taken_slow_spurious;
-   u32 taken_slow_irqenable;
 
-   u64 released;
u32 released_slow;
u32 released_slow_kicked;
 
 #define HISTO_BUCKETS  30
-   u32 histo_spin_total[HISTO_BUCKETS+1];
-   u32 histo_spin_spinning[HISTO_BUCKETS+1];
u32 histo_spin_blocked[HISTO_BUCKETS+1];
 
-   u64 time_total;
-   u64 time_spinning;
u64 time_blocked;
 } spinlock_stats;
 
 static u8 zero_stats;
 
-static unsigned lock_timeout = 1  10;
-#define TIMEOUT lock_timeout
-
 static inline void check_zero(void)
 {
if (unlikely(zero_stats)) {
@@ -73,22 +62,6 @@ static void __spin_time_accum(u64 delta, u32 *array)
array[HISTO_BUCKETS]++;
 }
 
-static inline void spin_time_accum_spinning(u64 start)
-{
-   u32 delta = xen_clocksource_read() - start;
-
-   __spin_time_accum(delta, spinlock_stats.histo_spin_spinning);
-   spinlock_stats.time_spinning += delta;
-}
-
-static inline void spin_time_accum_total(u64 start)
-{
-   u32 delta = xen_clocksource_read() - start;
-
-   __spin_time_accum(delta, spinlock_stats.histo_spin_total);
-   spinlock_stats.time_total += delta;
-}
-
 static inline void spin_time_accum_blocked(u64 start)
 {
u32 delta = xen_clocksource_read() - start;
@@ -105,214 +78,76 @@ static inline u64 spin_time_start(void)
return 0;
 }
 
-static inline void spin_time_accum_total(u64 start)
-{
-}
-static inline void spin_time_accum_spinning(u64 start)
-{
-}
 static inline void spin_time_accum_blocked(u64 start)
 {
 }
 #endif  /* CONFIG_XEN_DEBUG_FS */
 
-struct xen_spinlock {
-   unsigned char lock; /* 0 - free; 1 - locked */
-   unsigned short spinners;/* count of waiting cpus */
+struct xen_lock_waiting {
+   struct arch_spinlock *lock;
+   __ticket_t want;
 };
 
 static DEFINE_PER_CPU(int, lock_kicker_irq) = -1;
+static DEFINE_PER_CPU(struct xen_lock_waiting, lock_waiting);
+static cpumask_t waiting_cpus;
 
-#if 0
-static int xen_spin_is_locked(struct arch_spinlock *lock)
-{
-   struct xen_spinlock *xl = (struct xen_spinlock *)lock;
-
-   return xl-lock != 0;
-}
-
-static int xen_spin_is_contended(struct arch_spinlock *lock)
-{
-   struct xen_spinlock *xl = (struct xen_spinlock *)lock;
-
-   /* Not strictly true; this is only the count of contended
-  lock-takers entering the slow path. */
-   return xl-spinners != 0;
-}
-
-static int xen_spin_trylock(struct arch_spinlock *lock)
-{
-   struct xen_spinlock *xl = (struct xen_spinlock *)lock;
-   u8 old = 1;
-
-   asm(xchgb %b0,%1
-   : +q (old), +m (xl-lock) : : memory);
-
-   return old == 0;
-}
-
-static DEFINE_PER_CPU(struct xen_spinlock *, lock_spinners);
-
-/*
- * Mark a cpu as interested in a lock.  Returns the CPU's previous
- * lock of interest, in case we got preempted by an interrupt.
- */
-static inline struct xen_spinlock *spinning_lock(struct xen_spinlock *xl)
-{
-   struct xen_spinlock *prev;
-
-   prev = __get_cpu_var(lock_spinners);
-   __get_cpu_var(lock_spinners) = xl;
-
-   wmb();  /* set lock of interest before count */
-
-   asm(LOCK_PREFIX  incw %0
-   : +m (xl-spinners) : : memory);
-
-   return prev;
-}
-
-/*
- * Mark a cpu as no longer interested in a lock.  Restores previous
- * lock of interest (NULL for none).
- */
-static inline void unspinning_lock(struct xen_spinlock *xl, struct 
xen_spinlock *prev)
-{
-   asm(LOCK_PREFIX  decw %0
-   : +m (xl-spinners) : : memory);
-   wmb();  /* decrement count before restoring lock */
-   __get_cpu_var(lock_spinners) = prev;
-}
-
-static noinline int xen_spin_lock_slow(struct arch_spinlock *lock, bool 
irq_enable)
+static void xen_lock_spinning(struct arch_spinlock *lock, unsigned want)
 {
-   struct xen_spinlock *xl

[PATCH 12/14] x86/ticketlocks: when paravirtualizing ticket locks, increment by 2

2010-11-16 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com

Increment ticket head/tails by 2 rather than 1 to leave the LSB free
to store a "is in slowpath" state bit.  This halves the number
of possible CPUs for a given ticket size, but this shouldn't matter
in practice - kernels built for 32k+ CPU systems are probably
specially built for the hardware rather than a generic distro
kernel.
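
The freed-up bit is what a later patch in the series uses as the slowpath
flag.  Purely as an illustration, the flag name and accessor below are
assumptions about that later patch, not something introduced here:

#define TICKET_SLOWPATH_FLAG	((__ticket_t)1)	/* lives in the LSB */

/* With TICKET_LOCK_INC == 2, head and tail always move in steps of 2,
 * so the LSB of tail is free to mean "someone is blocked in the slowpath". */
static inline bool __ticket_in_slowpath(arch_spinlock_t *lock)
{
        return ACCESS_ONCE(lock->tickets.tail) & TICKET_SLOWPATH_FLAG;
}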

Signed-off-by: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
---
 arch/x86/include/asm/spinlock.h   |   18 +-
 arch/x86/include/asm/spinlock_types.h |   10 +-
 2 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index cfa80b5..9e1c7ce 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -36,17 +36,17 @@
 static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
 {
if (sizeof(lock-tickets.head) == sizeof(u8))
-   asm (LOCK_PREFIX incb %0
-: +m (lock-tickets.head) : : memory);
+   asm (LOCK_PREFIX addb %1, %0
+: +m (lock-tickets.head) : i (TICKET_LOCK_INC) : 
memory);
else
-   asm (LOCK_PREFIX incw %0
-: +m (lock-tickets.head) : : memory);
+   asm (LOCK_PREFIX addw %1, %0
+: +m (lock-tickets.head) : i (TICKET_LOCK_INC) : 
memory);
 
 }
 #else
 static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
 {
-   lock-tickets.head++;
+   lock-tickets.head += TICKET_LOCK_INC;
 }
 #endif
 
@@ -84,7 +84,7 @@ static __always_inline void ticket_unlock_kick(struct 
arch_spinlock *lock, u
  */
 static __always_inline struct __raw_tickets __ticket_spin_claim(struct 
arch_spinlock *lock)
 {
-   register struct __raw_tickets tickets = { .tail = 1 };
+   register struct __raw_tickets tickets = { .tail = TICKET_LOCK_INC };
 
if (sizeof(lock-tickets.head) == sizeof(u8))
asm volatile (LOCK_PREFIX xaddw %w0, %1\n
@@ -136,7 +136,7 @@ static __always_inline int 
arch_spin_trylock(arch_spinlock_t *lock)
if (old.tickets.head != old.tickets.tail)
return 0;
 
-   new.head_tail = old.head_tail + (1  TICKET_SHIFT);
+   new.head_tail = old.head_tail + (TICKET_LOCK_INC  TICKET_SHIFT);
 
/* cmpxchg is a full barrier, so nothing can move before it */
return cmpxchg(lock-head_tail, old.head_tail, new.head_tail) == 
old.head_tail;
@@ -144,7 +144,7 @@ static __always_inline int 
arch_spin_trylock(arch_spinlock_t *lock)
 
 static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
 {
-   __ticket_t next = lock-tickets.head + 1;
+   __ticket_t next = lock-tickets.head + TICKET_LOCK_INC;
__ticket_unlock_release(lock);
__ticket_unlock_kick(lock, next);
barrier();  /* prevent reordering into locked region */
@@ -161,7 +161,7 @@ static inline int arch_spin_is_contended(arch_spinlock_t 
*lock)
 {
struct __raw_tickets tmp = ACCESS_ONCE(lock-tickets);
 
-   return ((tmp.tail - tmp.head)  TICKET_MASK)  1;
+   return ((tmp.tail - tmp.head)  TICKET_MASK)  TICKET_LOCK_INC;
 }
 #define arch_spin_is_contended arch_spin_is_contended
 
diff --git a/arch/x86/include/asm/spinlock_types.h 
b/arch/x86/include/asm/spinlock_types.h
index 72e154e..0553c0b 100644
--- a/arch/x86/include/asm/spinlock_types.h
+++ b/arch/x86/include/asm/spinlock_types.h
@@ -7,7 +7,13 @@
 
 #include linux/types.h
 
-#if (CONFIG_NR_CPUS  256)
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+#define __TICKET_LOCK_INC  2
+#else
+#define __TICKET_LOCK_INC  1
+#endif
+
+#if (CONFIG_NR_CPUS  (256 / __TICKET_LOCK_INC))
 typedef u8  __ticket_t;
 typedef u16 __ticketpair_t;
 #else
@@ -15,6 +21,8 @@ typedef u16 __ticket_t;
 typedef u32 __ticketpair_t;
 #endif
 
+#define TICKET_LOCK_INC((__ticket_t)__TICKET_LOCK_INC)
+
 #define TICKET_SHIFT   (sizeof(__ticket_t) * 8)
 #define TICKET_MASK((__ticket_t)((1  TICKET_SHIFT) - 1))
 
-- 
1.7.2.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH -next] xen: fix header export to userspace

2010-11-15 Thread Jeremy Fitzhardinge
On 11/13/2010 08:44 AM, Randy Dunlap wrote:
 From: Randy Dunlap randy.dun...@oracle.com

 scripts/headers_install.pl prevents __user from being exported
 to userspace headers, so just use compiler.h to make sure that
 __user is defined and avoid the error.

 unifdef: linux-next-20101112/xx64/usr/include/xen/privcmd.h.tmp: 79: 
 Premature EOF (#if line 33 depth 1)

Ah, OK, thanks.  I was wondering what the proper fix for this was.  I'll
stick this in my tree.

Thanks,
J

 Signed-off-by: Randy Dunlap randy.dun...@oracle.com
 Cc:   Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
 Cc:   Konrad Rzeszutek Wilk konrad.w...@oracle.com
 Cc:   xen-de...@lists.xensource.com (moderated for non-subscribers)
 Cc:   virtualizat...@lists.osdl.org
 Cc:   Tony Finch d...@dotat.at
 ---
  include/xen/privcmd.h |5 +
  1 file changed, 1 insertion(+), 4 deletions(-)

 --- linux-next-20101112.orig/include/xen/privcmd.h
 +++ linux-next-20101112/include/xen/privcmd.h
 @@ -34,13 +34,10 @@
  #define __LINUX_PUBLIC_PRIVCMD_H__
  
  #include linux/types.h
 +#include linux/compiler.h
  
  typedef unsigned long xen_pfn_t;
  
 -#ifndef __user
 -#define __user
 -#endif
 -
  struct privcmd_hypercall {
   __u64 op;
   __u64 arg[5];

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH 01/20] x86/ticketlock: clean up types and accessors

2010-11-15 Thread Jeremy Fitzhardinge
On 11/13/2010 01:57 AM, Américo Wang wrote:
 On Wed, Nov 03, 2010 at 10:59:42AM -0400, Jeremy Fitzhardinge wrote:
 ...
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
 -int inc = 0x0001;
 -int tmp;
 +unsigned inc = 1  TICKET_SHIFT;
 +unsigned tmp;
 Please don't use old implicit-int.

I don't mind much either way, but I don't think I've seen anyone worry
about this before.

 -return (((tmp  TICKET_SHIFT) - tmp)  ((1  TICKET_SHIFT) - 1))  1;
 +return ((tmp.tail - tmp.head)  TICKET_MASK)  1;

 There is a type promotion here.

...

 }

 #ifndef CONFIG_PARAVIRT_SPINLOCKS
 diff --git a/arch/x86/include/asm/spinlock_types.h 
 b/arch/x86/include/asm/spinlock_types.h
 index dcb48b2..4582640 100644
 --- a/arch/x86/include/asm/spinlock_types.h
 +++ b/arch/x86/include/asm/spinlock_types.h
 @@ -5,11 +5,27 @@
 # error please don't include this file directly
 #endif

 +#include linux/types.h
 +
 +#if (CONFIG_NR_CPUS  256)
 +typedef u8  __ticket_t;
 +#else
 +typedef u16 __ticket_t;
 +#endif
 +
 +#define TICKET_SHIFT(sizeof(__ticket_t) * 8)
 +#define TICKET_MASK ((1  TICKET_SHIFT) - 1)

 So here you may need to cast the result to __ticket_t.

OK.

J
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH 03/20] x86/ticketlock: Use C for __ticket_spin_unlock

2010-11-15 Thread Jeremy Fitzhardinge
On 11/13/2010 02:05 AM, Américo Wang wrote:
 On Wed, Nov 03, 2010 at 10:59:44AM -0400, Jeremy Fitzhardinge wrote:
  * On PPro SMP or if we are using OOSTORE, we use a locked operation to 
 unlock
  * (PPro errata 66, 92)
  */
 -# define UNLOCK_LOCK_PREFIX LOCK_PREFIX
 +static __always_inline void __ticket_unlock_release(struct arch_spinlock 
 *lock)
 +{
 +if (sizeof(lock-tickets.head) == sizeof(u8))
 +asm (LOCK_PREFIX incb %0
 + : +m (lock-tickets.head) : : memory);
 +else
 +asm (LOCK_PREFIX incw %0
 + : +m (lock-tickets.head) : : memory);
 This 'if/else' really should be done with #ifdef, even though
 the compiler may be smart enough to remove it.

No, we depend on if/else with constant arguments doing the right thing
all over the kernel.  It is always preferable to use it instead of
#ifdef where possible, so that the two branches of code are always
subjected to compiler checking, even if they're not being used.

 +
 +}
 #else
 -# define UNLOCK_LOCK_PREFIX
 +static __always_inline void __ticket_unlock_release(struct arch_spinlock 
 *lock)
 +{
 +barrier();
 +lock-tickets.head++;
 +barrier();
 The second barrier() is not needed.

Agreed.  It gets removed in a later patch.

J
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH 06/20] x86/ticketlock: make __ticket_spin_trylock common

2010-11-15 Thread Jeremy Fitzhardinge
On 11/13/2010 02:48 AM, Eric Dumazet wrote:
 On Saturday, 13 November 2010 at 18:17 +0800, Américo Wang wrote:
 On Wed, Nov 03, 2010 at 10:59:47AM -0400, Jeremy Fitzhardinge wrote:

 +   union {
 +   struct __raw_tickets tickets;
 +   __ticketpair_t slock;
 +   } tmp, new;
 +   int ret;
 +
 +   tmp.tickets = ACCESS_ONCE(lock-tickets);
 +   if (tmp.tickets.head != tmp.tickets.tail)
 +   return 0;
 +
 +   new.slock = tmp.slock + (1  TICKET_SHIFT);
 +
 +   ret = cmpxchg(lock-ticketpair, tmp.slock, new.slock) == tmp.slock;
 +   barrier();  /* just make nothing creeps before lock is 
 claimed */
 This one should be smp_wmb(), right? No CONFIG_X86_OOSTORE protected.
 cmpxchg() is a full memory barrier, no need for smp_wmb() or barrier()

Agreed.

J
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization

Re: [PATCH 00/20] x86: ticket lock rewrite and paravirtualization

2010-11-15 Thread Jeremy Fitzhardinge
On 11/12/2010 02:20 PM, H. Peter Anvin wrote:
 On 11/12/2010 02:17 PM, Jeremy Fitzhardinge wrote:
 On 11/12/2010 02:12 PM, H. Peter Anvin wrote:
 On 11/03/2010 07:59 AM, Jeremy Fitzhardinge wrote:
   - with an unmodified struct spinlock, it can check to see if
 head == tail after unlock; if not, then there's someone else
 trying to lock, and we can do a kick.  Unfortunately this
 generates very high level of redundant kicks, because the
 waiting CPU might not have blocked yet (which is the common
 case)

 How high is "very high" here -- most of the time (so that any mitigation
 on the slow patch is useless)?
 I'll need to remeasure, but I think around 90% of the slowpath entries
 were spurious without this.  In other words, when spinlocks do contend,
 most of the time it isn't very serious and the other cpu doesn't spend
 much time spinning.

 90% of the slowpath entries is one thing, my real question is the
 fraction of fastpath entries that get diverted to the slowpath.  It
 affects where mitigation needs to happen.

There are two questions: how many unlock events *must* go into the
slowpath for correctness reasons (ie, because the corresponding lock
also went slowpath and got blocked there), and how many end up going
into the slowpath due to imperfect heuristics?

The number of lock events which go slowpath is very dependent on the
workload of the kernel in question and of the machine overall.  On a
system with no CPU overcommit it should be zero (assuming that in the
native case no Linux spinlock remains contended for so long that it will
trigger the slowpath).  On a very overcommitted system, it comes down to
the likelihood that a VCPU will get preempted while running in a
critical region: Tcrit * Phz, where Tcrit is the critical section time
in seconds and Phz is the preemption rate of the VCPU scheduler in Hz.  So,
for example, a lock with a 10uS critical section and a 100Hz preemption
rate will have a .1% chance of getting preempted and possibly causing
the other lockers to enter the slow path.
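
(Spelled out: Tcrit * Phz = 10e-6 s * 100 /s = 1e-3, i.e. roughly a 0.1%
chance per acquisition of that lock.)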

On the unlock side, it needs to test whether the lock has any waiters in a
slowpath state.  A conservative test is whether there are any
outstanding tickets, but in my measurements 90% of CPUs which spun on a
lock ended up getting it without having to take the slowpath.  This led
me to investigate more precise tests, which is currently a count of
slowpath-entering CPUs waiting on the lock.

Another approach I discussed with PeterZ and Mathieu is to steal the LSB
of the ticket counters (halving the max CPU count) to use as a "there is
someone in slowpath waiting on this lock" flag.  But I haven't spent the time
to work out an algorithm to maintain that flag (or flags, since there
are bits available) in a correct and efficient way.

J
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH 00/20] x86: ticket lock rewrite and paravirtualization

2010-11-15 Thread Jeremy Fitzhardinge
On 11/15/2010 12:14 PM, Peter Zijlstra wrote:
 On Mon, 2010-11-15 at 12:03 -0800, H. Peter Anvin wrote:
 On 11/15/2010 12:00 PM, Jeremy Fitzhardinge wrote:
 Another approach I discussed with PeterZ and Mathieu is to steal the LSB
 of the ticket counters (halving the max CPU count) to use as a "there is
 someone in slowpath waiting on this lock" flag.  But I haven't spent the time
 to work out an algorithm to maintain that flag (or flags, since there
 are bits available) in a correct and efficient way.

 Definitely worth pondering.
 Right, so the idea was to make the ticket increment 2, which would leave
 the LSB of both the head and tail available. I think that if one were to
 set both (using a cmpxchg), the ticket fast-path wouldn't need any
 changes since head==tail is still the correct condition for acquisition.

 Then the unlock needs an added conditional:
    if (tail & 1)
            unlock_slowpath()

The tricky part is knowing how to clear the bit(s) on the last person
dropping out of the slow path, and making that race-free with respect to
new lockers entering the slow path.  I guess you could leave it in
slowpath state until you're the last unlocker (ie, you're unlocking into
uncontended state), whereupon you also clear the bits; I guess that
would probably need a cmpxchg to make it safe WRT new lockers entering
slowpath.
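
Purely to make that idea concrete, one possible shape of the scheme being
discussed -- not code from this series, and assuming TICKET_LOCK_INC == 2
with the flag kept only in the tail LSB -- could look like:

/* Called on unlock; clears the slowpath flag only when the unlocker is
 * the last one out, i.e. the lock is becoming uncontended. */
static inline void __ticket_maybe_clear_slowpath(arch_spinlock_t *lock)
{
        arch_spinlock_t old, new;

        old.tickets = ACCESS_ONCE(lock->tickets);
        if (!(old.tickets.tail & 1))
                return;                 /* not in slowpath state */

        /* still contended: head hasn't caught up with tail (flag masked) */
        if (old.tickets.head != (old.tickets.tail & ~1))
                return;

        new = old;
        new.tickets.tail &= ~1;

        /* cmpxchg so a new locker entering the slowpath (and re-setting
         * the flag) can't have its update silently overwritten */
        cmpxchg(&lock->head_tail, old.head_tail, new.head_tail);
}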

As a heuristic, it shouldn't be too bad performancewise, since
(handwaving) if ticketholder N has entered the slowpath, then it's likely
that N+1 will as well.

J

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH 04/20] x86/ticketlock: make large and small ticket versions of spin_lock the same

2010-11-12 Thread Jeremy Fitzhardinge
On 11/12/2010 04:19 AM, Srivatsa Vaddagiri wrote:
 On Wed, Nov 03, 2010 at 10:59:45AM -0400, Jeremy Fitzhardinge wrote:
 Make the bulk of __ticket_spin_lock look identical for large and small
 number of cpus.
 [snip]

  #if (NR_CPUS  256)
  static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
  {
 -register union {
 -struct __raw_tickets tickets;
 -unsigned short slock;
 -} inc = { .slock = 1  TICKET_SHIFT };
 +register struct __raw_tickets inc = { .tail = 1 };
 [snip]

  #else
  static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
  {
 -unsigned inc = 1  TICKET_SHIFT;
 -__ticket_t tmp;
 +register struct __raw_tickets inc = { .tickets.tail = 1 };
 s/.tickets//?

 Otherwise I get a compile error for NR_CPUS > 256, with just 4 patches 
 applied.

Yeah, likely.  That's precisely why I wanted to make them the same ;).

J
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH 00/20] x86: ticket lock rewrite and paravirtualization

2010-11-12 Thread Jeremy Fitzhardinge
On 11/12/2010 02:12 PM, H. Peter Anvin wrote:
 On 11/03/2010 07:59 AM, Jeremy Fitzhardinge wrote:
   - with an unmodified struct spinlock, it can check to see if
 head == tail after unlock; if not, then there's someone else
 trying to lock, and we can do a kick.  Unfortunately this
 generates very high level of redundant kicks, because the
 waiting CPU might not have blocked yet (which is the common
 case)

 How high is "very high" here -- most of the time (so that any mitigation
 on the slow patch is useless)?

I'll need to remeasure, but I think around 90% of the slowpath entries
were spurious without this.  In other words, when spinlocks do contend,
most of the time it isn't very serious and the other cpu doesn't spend
much time spinning.

J
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH 3/3] VIDEO: xen-fb, switch to for_each_console

2010-11-04 Thread Jeremy Fitzhardinge
On 11/03/2010 11:28 AM, Jiri Slaby wrote:
 Use newly added for_each_console for iterating consoles.

 Signed-off-by: Jiri Slaby jsl...@suse.cz
 Cc: Jeremy Fitzhardinge jer...@xensource.com

Sure, if that's what all the kids are doing these days.

Acked-by: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com

J

 Cc: Chris Wright chr...@sous-sol.org
 Cc: virtualizat...@lists.osdl.org
 Cc: xen-de...@lists.xensource.com
 Cc: linux-fb...@vger.kernel.org
 ---
  drivers/video/xen-fbfront.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

 diff --git a/drivers/video/xen-fbfront.c b/drivers/video/xen-fbfront.c
 index 428d273..4abb0b9 100644
 --- a/drivers/video/xen-fbfront.c
 +++ b/drivers/video/xen-fbfront.c
 @@ -492,7 +492,7 @@ xenfb_make_preferred_console(void)
   return;
  
   acquire_console_sem();
 - for (c = console_drivers; c; c = c-next) {
 + for_each_console(c) {
   if (!strcmp(c-name, tty)  c-index == 0)
   break;
   }

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH 04/20] x86/ticketlock: make large and small ticket versions of spin_lock the same

2010-11-03 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com

Make the bulk of __ticket_spin_lock look identical for large and small
number of cpus.

Signed-off-by: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
---
 arch/x86/include/asm/spinlock.h |   23 ---
 1 files changed, 8 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 082990a..7586d7a 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -72,19 +72,16 @@ static __always_inline void __ticket_unlock_release(struct 
arch_spinlock *lock)
 #if (NR_CPUS  256)
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
-   register union {
-   struct __raw_tickets tickets;
-   unsigned short slock;
-   } inc = { .slock = 1  TICKET_SHIFT };
+   register struct __raw_tickets inc = { .tail = 1 };
 
asm volatile (LOCK_PREFIX xaddw %w0, %1\n
- : +Q (inc), +m (lock-slock) : : memory, cc);
+ : +r (inc), +m (lock-tickets) : : memory, cc);
 
for (;;) {
-   if (inc.tickets.head == inc.tickets.tail)
+   if (inc.head == inc.tail)
return;
cpu_relax();
-   inc.tickets.head = ACCESS_ONCE(lock-tickets.head);
+   inc.head = ACCESS_ONCE(lock-tickets.head);
}
barrier();  /* make sure nothing creeps before the lock is 
taken */
 }
@@ -110,21 +107,17 @@ static __always_inline int 
__ticket_spin_trylock(arch_spinlock_t *lock)
 #else
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
-   unsigned inc = 1  TICKET_SHIFT;
-   __ticket_t tmp;
+   register struct __raw_tickets inc = { .tickets.tail = 1 };
 
asm volatile(LOCK_PREFIX xaddl %0, %1\n\t
-: +r (inc), +m (lock-slock)
+: +r (inc), +m (lock-tickets)
 : : memory, cc);
 
-   tmp = inc;
-   inc = TICKET_SHIFT;
-
for (;;) {
-   if ((__ticket_t)inc == tmp)
+   if (inc.head == inc.tail)
return;
cpu_relax();
-   tmp = ACCESS_ONCE(lock-tickets.head);
+   inc.head = ACCESS_ONCE(lock-tickets.head);
}
barrier();  /* make sure nothing creeps before the lock is 
taken */
 }
-- 
1.7.2.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH 03/20] x86/ticketlock: Use C for __ticket_spin_unlock

2010-11-03 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com

If we don't need to use a locked inc for unlock, then implement it in C.

Signed-off-by: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
---
 arch/x86/include/asm/spinlock.h |   33 ++---
 1 files changed, 18 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 6711d36..082990a 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -33,9 +33,23 @@
  * On PPro SMP or if we are using OOSTORE, we use a locked operation to unlock
  * (PPro errata 66, 92)
  */
-# define UNLOCK_LOCK_PREFIX LOCK_PREFIX
+static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
+{
+   if (sizeof(lock-tickets.head) == sizeof(u8))
+   asm (LOCK_PREFIX incb %0
+: +m (lock-tickets.head) : : memory);
+   else
+   asm (LOCK_PREFIX incw %0
+: +m (lock-tickets.head) : : memory);
+
+}
 #else
-# define UNLOCK_LOCK_PREFIX
+static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
+{
+   barrier();
+   lock-tickets.head++;
+   barrier();
+}
 #endif
 
 /*
@@ -93,14 +107,6 @@ static __always_inline int 
__ticket_spin_trylock(arch_spinlock_t *lock)
 
return tmp;
 }
-
-static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
-{
-   asm volatile(UNLOCK_LOCK_PREFIX incb %0
-: +m (lock-slock)
-:
-: memory, cc);
-}
 #else
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
@@ -144,15 +150,12 @@ static __always_inline int 
__ticket_spin_trylock(arch_spinlock_t *lock)
 
return tmp;
 }
+#endif
 
 static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 {
-   asm volatile(UNLOCK_LOCK_PREFIX incw %0
-: +m (lock-slock)
-:
-: memory, cc);
+   __ticket_unlock_release(lock);
 }
-#endif
 
 static inline int __ticket_spin_is_locked(arch_spinlock_t *lock)
 {
-- 
1.7.2.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH 08/20] x86/ticketlock: collapse a layer of functions

2010-11-03 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com

Now that the paravirtualization layer doesn't exist at the spinlock
level any more, we can collapse the __ticket_ functions into the arch_
functions.

Signed-off-by: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
---
 arch/x86/include/asm/spinlock.h |   35 +--
 1 files changed, 5 insertions(+), 30 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 06815f3..a79dfee 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -110,7 +110,7 @@ static __always_inline void __ticket_unlock_kick(struct 
arch_spinlock *lock, __t
ticket_unlock_kick(lock, next);
 }
 
-static __always_inline void __ticket_spin_lock(struct arch_spinlock *lock)
+static __always_inline void arch_spin_lock(struct arch_spinlock *lock)
 {
register struct __raw_tickets inc;
 
@@ -130,7 +130,7 @@ static __always_inline void __ticket_spin_lock(struct 
arch_spinlock *lock)
barrier();  /* make sure nothing creeps before the lock is 
taken */
 }
 
-static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
+static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
 {
union {
struct __raw_tickets tickets;
@@ -150,53 +150,28 @@ static __always_inline int 
__ticket_spin_trylock(arch_spinlock_t *lock)
return ret;
 }
 
-static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
+static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
 {
__ticket_t next = lock-tickets.head + 1;
__ticket_unlock_release(lock);
__ticket_unlock_kick(lock, next);
 }
 
-static inline int __ticket_spin_is_locked(arch_spinlock_t *lock)
+static inline int arch_spin_is_locked(arch_spinlock_t *lock)
 {
struct __raw_tickets tmp = ACCESS_ONCE(lock-tickets);
 
return !!(tmp.tail ^ tmp.head);
 }
 
-static inline int __ticket_spin_is_contended(arch_spinlock_t *lock)
+static inline int arch_spin_is_contended(arch_spinlock_t *lock)
 {
struct __raw_tickets tmp = ACCESS_ONCE(lock-tickets);
 
return ((tmp.tail - tmp.head)  TICKET_MASK)  1;
 }
-
-static inline int arch_spin_is_locked(arch_spinlock_t *lock)
-{
-   return __ticket_spin_is_locked(lock);
-}
-
-static inline int arch_spin_is_contended(arch_spinlock_t *lock)
-{
-   return __ticket_spin_is_contended(lock);
-}
 #define arch_spin_is_contended arch_spin_is_contended
 
-static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
-{
-   __ticket_spin_lock(lock);
-}
-
-static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
-{
-   return __ticket_spin_trylock(lock);
-}
-
-static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
-{
-   __ticket_spin_unlock(lock);
-}
-
 static __always_inline void arch_spin_lock_flags(arch_spinlock_t *lock,
  unsigned long flags)
 {
-- 
1.7.2.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH 06/20] x86/ticketlock: make __ticket_spin_trylock common

2010-11-03 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com

Make trylock code common regardless of ticket size.

Signed-off-by: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
---
 arch/x86/include/asm/spinlock.h   |   55 +---
 arch/x86/include/asm/spinlock_types.h |3 ++
 2 files changed, 19 insertions(+), 39 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 4f9fa24..507f83c 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -100,48 +100,25 @@ static __always_inline void __ticket_spin_lock(struct 
arch_spinlock *lock)
barrier();  /* make sure nothing creeps before the lock is 
taken */
 }
 
-#if (NR_CPUS  256)
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 {
-   unsigned int tmp, new;
-
-   asm volatile(movzwl %2, %0\n\t
-cmpb %h0,%b0\n\t
-leal 0x100(% REG_PTR_MODE 0), %1\n\t
-jne 1f\n\t
-LOCK_PREFIX cmpxchgw %w1,%2\n\t
-1:
-sete %b1\n\t
-movzbl %b1,%0\n\t
-: =a (tmp), =q (new), +m (lock-slock)
-:
-: memory, cc);
-
-   return tmp;
-}
-#else
-static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
-{
-   unsigned tmp;
-   unsigned new;
-
-   asm volatile(movl %2,%0\n\t
-movl %0,%1\n\t
-roll $16, %0\n\t
-cmpl %0,%1\n\t
-leal 0x0001(% REG_PTR_MODE 0), %1\n\t
-jne 1f\n\t
-LOCK_PREFIX cmpxchgl %1,%2\n\t
-1:
-sete %b1\n\t
-movzbl %b1,%0\n\t
-: =a (tmp), =q (new), +m (lock-slock)
-:
-: memory, cc);
-
-   return tmp;
+   union {
+   struct __raw_tickets tickets;
+   __ticketpair_t slock;
+   } tmp, new;
+   int ret;
+
+   tmp.tickets = ACCESS_ONCE(lock-tickets);
+   if (tmp.tickets.head != tmp.tickets.tail)
+   return 0;
+
+   new.slock = tmp.slock + (1  TICKET_SHIFT);
+
+   ret = cmpxchg(lock-ticketpair, tmp.slock, new.slock) == tmp.slock;
+   barrier();  /* just make nothing creeps before lock is 
claimed */
+
+   return ret;
 }
-#endif
 
 static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 {
diff --git a/arch/x86/include/asm/spinlock_types.h 
b/arch/x86/include/asm/spinlock_types.h
index 4582640..48dafc3 100644
--- a/arch/x86/include/asm/spinlock_types.h
+++ b/arch/x86/include/asm/spinlock_types.h
@@ -9,8 +9,10 @@
 
 #if (CONFIG_NR_CPUS  256)
 typedef u8  __ticket_t;
+typedef u16 __ticketpair_t;
 #else
 typedef u16 __ticket_t;
+typedef u32 __ticketpair_t;
 #endif
 
 #define TICKET_SHIFT   (sizeof(__ticket_t) * 8)
@@ -19,6 +21,7 @@ typedef u16 __ticket_t;
 typedef struct arch_spinlock {
union {
unsigned int slock;
+   __ticketpair_t ticketpair;
struct __raw_tickets {
__ticket_t head, tail;
} tickets;
-- 
1.7.2.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH 11/20] x86/pvticketlock: use callee-save for lock_spinning

2010-11-03 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com

Although the lock_spinning calls in the spinlock code are on the
uncommon path, their presence can cause the compiler to generate many
more register save/restores in the function pre/postamble, which is in
the fast path.  To avoid this, convert it to using the pvops callee-save
calling convention, which defers all the save/restores until the actual
function is called, keeping the fastpath clean.

Signed-off-by: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
---
 arch/x86/include/asm/paravirt.h   |2 +-
 arch/x86/include/asm/paravirt_types.h |2 +-
 arch/x86/kernel/paravirt-spinlocks.c  |2 +-
 arch/x86/xen/spinlock.c   |3 ++-
 4 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 79c77e5..dbd5914 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -724,7 +724,7 @@ static inline void __set_fixmap(unsigned /* enum 
fixed_addresses */ idx,
 
 static inline void __ticket_lock_spinning(struct arch_spinlock *lock, unsigned 
ticket)
 {
-   PVOP_VCALL2(pv_lock_ops.lock_spinning, lock, ticket);
+   PVOP_VCALLEE2(pv_lock_ops.lock_spinning, lock, ticket);
 }
 
 static inline void ticket_unlock_kick(struct arch_spinlock *lock, unsigned 
ticket)
diff --git a/arch/x86/include/asm/paravirt_types.h 
b/arch/x86/include/asm/paravirt_types.h
index 968ee31..9ad6d088 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -316,7 +316,7 @@ struct pv_mmu_ops {
 
 struct arch_spinlock;
 struct pv_lock_ops {
-   void (*lock_spinning)(struct arch_spinlock *lock, unsigned ticket);
+   struct paravirt_callee_save lock_spinning;
void (*unlock_kick)(struct arch_spinlock *lock, unsigned ticket);
 };
 
diff --git a/arch/x86/kernel/paravirt-spinlocks.c 
b/arch/x86/kernel/paravirt-spinlocks.c
index c2e010e..4251c1d 100644
--- a/arch/x86/kernel/paravirt-spinlocks.c
+++ b/arch/x86/kernel/paravirt-spinlocks.c
@@ -9,7 +9,7 @@
 
 struct pv_lock_ops pv_lock_ops = {
 #ifdef CONFIG_SMP
-   .lock_spinning = paravirt_nop,
+   .lock_spinning = __PV_IS_CALLEE_SAVE(paravirt_nop),
.unlock_kick = paravirt_nop,
 #endif
 };
diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index b8f09d5..2bd51d8 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -141,6 +141,7 @@ out:
w-lock = NULL;
spin_time_accum_blocked(start);
 }
+PV_CALLEE_SAVE_REGS_THUNK(xen_lock_spinning);
 
 static void xen_unlock_kick(struct arch_spinlock *lock, unsigned next)
 {
@@ -193,7 +194,7 @@ void xen_uninit_lock_cpu(int cpu)
 
 void __init xen_init_spinlocks(void)
 {
-   pv_lock_ops.lock_spinning = xen_lock_spinning;
+   pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(xen_lock_spinning);
pv_lock_ops.unlock_kick = xen_unlock_kick;
 }
 
-- 
1.7.2.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH 00/20] x86: ticket lock rewrite and paravirtualization

2010-11-03 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com

Hi all,

This series does two major things:

1. It converts the bulk of the implementation to C, and makes the
   small ticket and large ticket code common.  Only the actual
   size-dependent asm instructions are specific to the ticket size.
   The resulting generated asm is very similar to the current
   hand-written code.

   This results in a very large reduction in lines of code.

2. Get rid of pv spinlocks, and replace them with pv ticket locks.
   Currently we have the notion of pv spinlocks where a pv-ops
   backend can completely replace the spinlock implementation with a
   new one.  This has two disadvantages:
- it's a completely separate spinlock implementation, and
- there's an extra layer of indirection in front of every
  spinlock operation.

   To replace this, this series introduces the notion of pv ticket
   locks.  In this design, the ticket lock fastpath is the standard
   ticketlock algorithm.  However, after an iteration threshold it
   falls into a slow path which invokes a pv-op to block the spinning
   CPU.  Conversely, on unlock it does the normal unlock, and then
   checks to see if it needs to do a special kick to wake the next
   CPU.

   The net result is that the pv-op calls are restricted to the slow
   paths, and the normal fast-paths are largely unaffected.

   There are still some overheads, however:
   - When locking, there are some extra tests to count the spin iterations.
 There are no extra instructions in the uncontended case though.

   - When unlocking, there are two ways to detect when it is necessary
 to kick a blocked CPU:

  - with an unmodified struct spinlock, it can check to see if
head == tail after unlock; if not, then there's someone else
trying to lock, and we can do a kick.  Unfortunately this
generates very high level of redundant kicks, because the
waiting CPU might not have blocked yet (which is the common
case)

  - With a struct spinlock modified to include a waiters field,
to keep count of blocked CPUs, which is a much tighter test.
But it does increase the size of each spinlock by 50%
(doubled with padding).
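
To make the shape of the lock side concrete, the resulting fast+slow path
ends up looking roughly like the sketch below (condensed from the patches
that follow; SPIN_THRESHOLD is just an illustrative name for the iteration
threshold):

static __always_inline void arch_spin_lock(struct arch_spinlock *lock)
{
        register struct __raw_tickets inc;

        inc = __ticket_spin_claim(lock);        /* atomic xadd of the tail */

        for (;;) {
                unsigned count = SPIN_THRESHOLD;

                do {
                        if (inc.head == inc.tail)
                                goto out;       /* our turn: lock taken */
                        cpu_relax();
                        inc.head = ACCESS_ONCE(lock->tickets.head);
                } while (--count);

                /* spun "too long": let the pvop block this vcpu until
                 * the lock holder kicks our ticket */
                __ticket_lock_spinning(lock, inc.tail);
        }
out:
        barrier();      /* make sure nothing creeps before the lock is taken */
}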

The series is very fine-grained, and I've left a lightly cleaned up
history of the various options I evaluated, since they're not all
obvious.

Jeremy Fitzhardinge (20):
  x86/ticketlock: clean up types and accessors
  x86/ticketlock: convert spin loop to C
  x86/ticketlock: Use C for __ticket_spin_unlock
  x86/ticketlock: make large and small ticket versions of spin_lock the
same
  x86/ticketlock: make __ticket_spin_lock common
  x86/ticketlock: make __ticket_spin_trylock common
  x86/spinlocks: replace pv spinlocks with pv ticketlocks
  x86/ticketlock: collapse a layer of functions
  xen/pvticketlock: Xen implementation for PV ticket locks
  x86/pvticketlock: keep count of blocked cpus
  x86/pvticketlock: use callee-save for lock_spinning
  x86/pvticketlock: use callee-save for unlock_kick as well
  x86/pvticketlock: make sure unlock is seen by everyone before
checking waiters
  x86/ticketlock: loosen ordering restraints on unlock
  x86/ticketlock: prevent compiler reordering into locked region
  x86/ticketlock: don't inline _spin_unlock when using paravirt
spinlocks
  x86/ticketlock: clarify barrier in arch_spin_lock
  x86/ticketlock: remove .slock
  x86/ticketlocks: use overlapping read to eliminate mb()
  x86/ticketlock: rename ticketpair to head_tail

 arch/x86/Kconfig  |3 +
 arch/x86/include/asm/paravirt.h   |   30 +---
 arch/x86/include/asm/paravirt_types.h |8 +-
 arch/x86/include/asm/spinlock.h   |  250 +++--
 arch/x86/include/asm/spinlock_types.h |   41 +-
 arch/x86/kernel/paravirt-spinlocks.c  |   15 +--
 arch/x86/xen/spinlock.c   |  282 +
 kernel/Kconfig.locks  |2 +-
 8 files changed, 221 insertions(+), 410 deletions(-)

-- 
1.7.2.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH 12/20] x86/pvticketlock: use callee-save for unlock_kick as well

2010-11-03 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com

The unlock code is typically inlined throughout the kernel, so it's useful
to make sure there's minimal register pressure overhead from the presence
of the unlock_kick pvop call.

Signed-off-by: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
---
 arch/x86/include/asm/paravirt.h   |2 +-
 arch/x86/include/asm/paravirt_types.h |2 +-
 arch/x86/kernel/paravirt-spinlocks.c  |2 +-
 arch/x86/xen/spinlock.c   |3 ++-
 4 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index dbd5914..b6ff89a 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -729,7 +729,7 @@ static inline void __ticket_lock_spinning(struct 
arch_spinlock *lock, unsigned t
 
 static inline void ticket_unlock_kick(struct arch_spinlock *lock, unsigned 
ticket)
 {
-   PVOP_VCALL2(pv_lock_ops.unlock_kick, lock, ticket);
+   PVOP_VCALLEE2(pv_lock_ops.unlock_kick, lock, ticket);
 }
 
 #endif
diff --git a/arch/x86/include/asm/paravirt_types.h 
b/arch/x86/include/asm/paravirt_types.h
index 9ad6d088..bb02831 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -317,7 +317,7 @@ struct pv_mmu_ops {
 struct arch_spinlock;
 struct pv_lock_ops {
struct paravirt_callee_save lock_spinning;
-   void (*unlock_kick)(struct arch_spinlock *lock, unsigned ticket);
+   struct paravirt_callee_save unlock_kick;
 };
 
 /* This contains all the paravirt structures: we get a convenient
diff --git a/arch/x86/kernel/paravirt-spinlocks.c 
b/arch/x86/kernel/paravirt-spinlocks.c
index 4251c1d..efad219 100644
--- a/arch/x86/kernel/paravirt-spinlocks.c
+++ b/arch/x86/kernel/paravirt-spinlocks.c
@@ -10,7 +10,7 @@
 struct pv_lock_ops pv_lock_ops = {
 #ifdef CONFIG_SMP
.lock_spinning = __PV_IS_CALLEE_SAVE(paravirt_nop),
-   .unlock_kick = paravirt_nop,
+   .unlock_kick = __PV_IS_CALLEE_SAVE(paravirt_nop),
 #endif
 };
 EXPORT_SYMBOL(pv_lock_ops);
diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index 2bd51d8..bf2f409 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -159,6 +159,7 @@ static void xen_unlock_kick(struct arch_spinlock *lock, 
unsigned next)
}
}
 }
+PV_CALLEE_SAVE_REGS_THUNK(xen_unlock_kick);
 
 static irqreturn_t dummy_handler(int irq, void *dev_id)
 {
@@ -195,7 +196,7 @@ void xen_uninit_lock_cpu(int cpu)
 void __init xen_init_spinlocks(void)
 {
pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(xen_lock_spinning);
-   pv_lock_ops.unlock_kick = xen_unlock_kick;
+   pv_lock_ops.unlock_kick = PV_CALLEE_SAVE(xen_unlock_kick);
 }
 
 #ifdef CONFIG_XEN_DEBUG_FS
-- 
1.7.2.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH 17/20] x86/ticketlock: clarify barrier in arch_spin_lock

2010-11-03 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com

Make sure the barrier in arch_spin_lock is definitely in the code path.

Signed-off-by: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
---
 arch/x86/include/asm/spinlock.h |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 158b330..c7455e1 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -151,13 +151,13 @@ static __always_inline void arch_spin_lock(struct 
arch_spinlock *lock)
 
do {
if (inc.head == inc.tail)
-   return;
+   goto out;
cpu_relax();
inc.head = ACCESS_ONCE(lock-tickets.head);
} while (--count);
__ticket_lock_spinning(lock, inc.tail);
}
-   barrier();  /* make sure nothing creeps before the lock is 
taken */
+out:   barrier();  /* make sure nothing creeps before the lock is 
taken */
 }
 
 static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
-- 
1.7.2.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH 16/20] x86/ticketlock: don't inline _spin_unlock when using paravirt spinlocks

2010-11-03 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com

The code size expands somewhat, and it's probably better to just call
a function rather than inline it.

Signed-off-by: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
---
 arch/x86/Kconfig |3 +++
 kernel/Kconfig.locks |2 +-
 2 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index cea0cd9..e67bf49 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -577,6 +577,9 @@ config PARAVIRT_SPINLOCKS
 
  If you are unsure how to answer this question, answer N.
 
+config ARCH_NOINLINE_SPIN_UNLOCK
+   def_bool PARAVIRT_SPINLOCKS
+
 config PARAVIRT_CLOCK
bool
 
diff --git a/kernel/Kconfig.locks b/kernel/Kconfig.locks
index 88c92fb..3216c22 100644
--- a/kernel/Kconfig.locks
+++ b/kernel/Kconfig.locks
@@ -125,7 +125,7 @@ config INLINE_SPIN_LOCK_IRQSAVE
 ARCH_INLINE_SPIN_LOCK_IRQSAVE
 
 config INLINE_SPIN_UNLOCK
-   def_bool !DEBUG_SPINLOCK  (!PREEMPT || ARCH_INLINE_SPIN_UNLOCK)
+   def_bool !DEBUG_SPINLOCK  (!PREEMPT || ARCH_INLINE_SPIN_UNLOCK)  
!ARCH_NOINLINE_SPIN_UNLOCK
 
 config INLINE_SPIN_UNLOCK_BH
def_bool !DEBUG_SPINLOCK  ARCH_INLINE_SPIN_UNLOCK_BH
-- 
1.7.2.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH 15/20] x86/ticketlock: prevent compiler reordering into locked region

2010-11-03 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com

Add a barrier() at the end of __raw_spin_unlock() to prevent instructions
that follow the locked region from being reordered into it.  In theory such
reordering shouldn't cause any problems, but in practice the system locks up
under load...

Signed-off-by: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
---
 arch/x86/include/asm/spinlock.h |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index d6de5e7..158b330 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -189,6 +189,8 @@ static __always_inline void 
arch_spin_unlock(arch_spinlock_t *lock)
next = lock-tickets.head + 1;
__ticket_unlock_release(lock);
__ticket_unlock_kick(lock, next);
+
+   barrier();  /* prevent reordering into locked region */
 }
 
 static inline int arch_spin_is_locked(arch_spinlock_t *lock)
-- 
1.7.2.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH 07/20] x86/spinlocks: replace pv spinlocks with pv ticketlocks

2010-11-03 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com

Rather than outright replacing the entire spinlock implementation in
order to paravirtualize it, keep the ticket lock implementation but add
a couple of pvops hooks on the slow patch (long spin on lock, unlocking
a contended lock).

Ticket locks have a number of nice properties, but they also have some
surprising behaviours in virtual environments.  They enforce a strict
FIFO ordering on cpus trying to take a lock; however, if the hypervisor
scheduler does not schedule the cpus in the correct order, the system can
waste a huge amount of time spinning until the next cpu can take the lock.

(See Thomas Friebel's talk Prevent Guests from Spinning Around
http://www.xen.org/files/xensummitboston08/LHP.pdf for more details.)

To address this, we add two hooks:
 - __ticket_spin_lock which is called after the cpu has been
   spinning on the lock for a significant number of iterations but has
   failed to take the lock (presumably because the cpu holding the lock
   has been descheduled).  The lock_spinning pvop is expected to block
   the cpu until it has been kicked by the current lock holder.
 - __ticket_spin_unlock, which, on releasing a contended lock
   (there are more cpus with tail tickets), looks to see if the next
   cpu is blocked and wakes it if so.

When compiled with CONFIG_PARAVIRT_SPINLOCKS disabled, a set of stub
functions causes all the extra code to go away.
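
For readers skimming the series, here is a stand-alone sketch of how the two
hooks are intended to sit around the ticket-lock spin loop.  This is plain C
with stand-in types, not the kernel code in the diff below; only
SPIN_THRESHOLD and the hook names mirror the patch.

#include <stdint.h>

typedef uint8_t __ticket_t;
struct arch_spinlock {
    volatile struct { __ticket_t head, tail; } tickets;
};

#define SPIN_THRESHOLD  (1 << 11)

/* With CONFIG_PARAVIRT_SPINLOCKS=n these stay empty stubs, so the fast
 * path compiles down to the plain ticket lock. */
static inline void __ticket_lock_spinning(struct arch_spinlock *lock,
                                          __ticket_t ticket)
{
    (void)lock; (void)ticket;           /* pvop: block until kicked */
}

static inline void __ticket_unlock_kick(struct arch_spinlock *lock,
                                        __ticket_t next)
{
    (void)lock; (void)next;             /* pvop: wake the blocked waiter */
}

/* my_ticket has already been claimed with a locked xadd on tickets.tail */
void ticket_spin_lock(struct arch_spinlock *lock, __ticket_t my_ticket)
{
    for (;;) {
        unsigned int count = SPIN_THRESHOLD;

        do {
            if (lock->tickets.head == my_ticket)
                return;                 /* fast path */
        } while (--count);

        /* Spun too long; the holder's vcpu is probably descheduled. */
        __ticket_lock_spinning(lock, my_ticket);
    }
}

void ticket_spin_unlock(struct arch_spinlock *lock)
{
    __ticket_t next = lock->tickets.head + 1;

    lock->tickets.head = next;          /* release */
    __ticket_unlock_kick(lock, next);   /* wake the next waiter, if blocked */
}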

Signed-off-by: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
---
 arch/x86/include/asm/paravirt.h   |   30 +++---
 arch/x86/include/asm/paravirt_types.h |8 +
 arch/x86/include/asm/spinlock.h   |   44 +++--
 arch/x86/kernel/paravirt-spinlocks.c  |   15 +-
 arch/x86/xen/spinlock.c   |7 -
 5 files changed, 50 insertions(+), 54 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 5653f43..79c77e5 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -722,36 +722,14 @@ static inline void __set_fixmap(unsigned /* enum 
fixed_addresses */ idx,
 
 #if defined(CONFIG_SMP)  defined(CONFIG_PARAVIRT_SPINLOCKS)
 
-static inline int arch_spin_is_locked(struct arch_spinlock *lock)
+static inline void __ticket_lock_spinning(struct arch_spinlock *lock, unsigned 
ticket)
 {
-   return PVOP_CALL1(int, pv_lock_ops.spin_is_locked, lock);
+   PVOP_VCALL2(pv_lock_ops.lock_spinning, lock, ticket);
 }
 
-static inline int arch_spin_is_contended(struct arch_spinlock *lock)
+static inline void ticket_unlock_kick(struct arch_spinlock *lock, unsigned 
ticket)
 {
-   return PVOP_CALL1(int, pv_lock_ops.spin_is_contended, lock);
-}
-#define arch_spin_is_contended arch_spin_is_contended
-
-static __always_inline void arch_spin_lock(struct arch_spinlock *lock)
-{
-   PVOP_VCALL1(pv_lock_ops.spin_lock, lock);
-}
-
-static __always_inline void arch_spin_lock_flags(struct arch_spinlock *lock,
- unsigned long flags)
-{
-   PVOP_VCALL2(pv_lock_ops.spin_lock_flags, lock, flags);
-}
-
-static __always_inline int arch_spin_trylock(struct arch_spinlock *lock)
-{
-   return PVOP_CALL1(int, pv_lock_ops.spin_trylock, lock);
-}
-
-static __always_inline void arch_spin_unlock(struct arch_spinlock *lock)
-{
-   PVOP_VCALL1(pv_lock_ops.spin_unlock, lock);
+   PVOP_VCALL2(pv_lock_ops.unlock_kick, lock, ticket);
 }
 
 #endif
diff --git a/arch/x86/include/asm/paravirt_types.h 
b/arch/x86/include/asm/paravirt_types.h
index db9ef55..968ee31 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -316,12 +316,8 @@ struct pv_mmu_ops {
 
 struct arch_spinlock;
 struct pv_lock_ops {
-   int (*spin_is_locked)(struct arch_spinlock *lock);
-   int (*spin_is_contended)(struct arch_spinlock *lock);
-   void (*spin_lock)(struct arch_spinlock *lock);
-   void (*spin_lock_flags)(struct arch_spinlock *lock, unsigned long 
flags);
-   int (*spin_trylock)(struct arch_spinlock *lock);
-   void (*spin_unlock)(struct arch_spinlock *lock);
+   void (*lock_spinning)(struct arch_spinlock *lock, unsigned ticket);
+   void (*unlock_kick)(struct arch_spinlock *lock, unsigned ticket);
 };
 
 /* This contains all the paravirt structures: we get a convenient
diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 507f83c..06815f3 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -52,6 +52,21 @@ static __always_inline void __ticket_unlock_release(struct 
arch_spinlock *lock)
 }
 #endif
 
+/* How long a lock should spin before we consider blocking */
+#define SPIN_THRESHOLD (1  11)
+
+#ifndef CONFIG_PARAVIRT_SPINLOCKS
+
+static __always_inline void __ticket_lock_spinning(struct arch_spinlock *lock, 
unsigned ticket)
+{
+}
+
+static __always_inline void ticket_unlock_kick

[PATCH 02/20] x86/ticketlock: convert spin loop to C

2010-11-03 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com

The inner loop of __ticket_spin_lock isn't doing anything very special,
so reimplement it in C.

For the 8 bit ticket lock variant, we use a register union to get direct
access to the lower and upper bytes in the tickets, but unfortunately gcc
won't generate a direct comparison between the two halves of the register,
so the generated asm isn't quite as pretty as the hand-coded version.
However benchmarking shows that this is actually a small improvement in
runtime performance on some benchmarks, and never a slowdown.

We also need to make sure there's a barrier at the end of the lock loop
to make sure that the compiler doesn't move any instructions from within
the locked region into the region where we don't yet own the lock.

Signed-off-by: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
---
 arch/x86/include/asm/spinlock.h |   58 +++---
 1 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index d6d5784..6711d36 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -58,21 +58,21 @@
 #if (NR_CPUS  256)
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
-   unsigned short inc = 1  TICKET_SHIFT;
-
-   asm volatile (
-   LOCK_PREFIX xaddw %w0, %1\n
-   1:\t
-   cmpb %h0, %b0\n\t
-   je 2f\n\t
-   rep ; nop\n\t
-   movb %1, %b0\n\t
-   /* don't need lfence here, because loads are in-order */
-   jmp 1b\n
-   2:
-   : +Q (inc), +m (lock-slock)
-   :
-   : memory, cc);
+   register union {
+   struct __raw_tickets tickets;
+   unsigned short slock;
+   } inc = { .slock = 1  TICKET_SHIFT };
+
+   asm volatile (LOCK_PREFIX xaddw %w0, %1\n
+ : +Q (inc), +m (lock-slock) : : memory, cc);
+
+   for (;;) {
+   if (inc.tickets.head == inc.tickets.tail)
+   return;
+   cpu_relax();
+   inc.tickets.head = ACCESS_ONCE(lock-tickets.head);
+   }
+   barrier();  /* make sure nothing creeps before the lock is 
taken */
 }
 
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
@@ -105,22 +105,22 @@ static __always_inline void 
__ticket_spin_unlock(arch_spinlock_t *lock)
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
unsigned inc = 1  TICKET_SHIFT;
-   unsigned tmp;
+   __ticket_t tmp;
 
-   asm volatile(LOCK_PREFIX xaddl %0, %1\n
-movzwl %w0, %2\n\t
-shrl $16, %0\n\t
-1:\t
-cmpl %0, %2\n\t
-je 2f\n\t
-rep ; nop\n\t
-movzwl %1, %2\n\t
-/* don't need lfence here, because loads are in-order */
-jmp 1b\n
-2:
-: +r (inc), +m (lock-slock), =r (tmp)
-:
-: memory, cc);
+   asm volatile(LOCK_PREFIX xaddl %0, %1\n\t
+: +r (inc), +m (lock-slock)
+: : memory, cc);
+
+   tmp = inc;
+   inc = TICKET_SHIFT;
+
+   for (;;) {
+   if ((__ticket_t)inc == tmp)
+   return;
+   cpu_relax();
+   tmp = ACCESS_ONCE(lock-tickets.head);
+   }
+   barrier();  /* make sure nothing creeps before the lock is 
taken */
 }
 
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
-- 
1.7.2.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH 14/20] x86/ticketlock: loosen ordering restraints on unlock

2010-11-03 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com

It's only necessary to prevent the compiler from reordering code out of
the locked region past the unlock.  Putting too many barriers in prevents
the compiler from generating the best code when PV spinlocks are enabled.

Signed-off-by: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
---
 arch/x86/include/asm/spinlock.h |8 +---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index b9a1aae..d6de5e7 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -46,9 +46,7 @@ static __always_inline void __ticket_unlock_release(struct 
arch_spinlock *lock)
 #else
 static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
 {
-   barrier();
lock-tickets.head++;
-   barrier();
 }
 #endif
 
@@ -184,7 +182,11 @@ static __always_inline int 
arch_spin_trylock(arch_spinlock_t *lock)
 
 static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
 {
-   __ticket_t next = lock-tickets.head + 1;
+   __ticket_t next;
+
+   barrier();  /* prevent reordering out of locked region */
+
+   next = lock-tickets.head + 1;
__ticket_unlock_release(lock);
__ticket_unlock_kick(lock, next);
 }
-- 
1.7.2.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH 20/20] x86/ticketlock: rename ticketpair to head_tail

2010-11-03 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com

Make it clearer what fields head_tail is actually overlapping with.

Signed-off-by: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
---
 arch/x86/include/asm/spinlock.h   |2 +-
 arch/x86/include/asm/spinlock_types.h |4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index a16b6e4..6850884 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -174,7 +174,7 @@ static __always_inline int 
arch_spin_trylock(arch_spinlock_t *lock)
 
new.slock = tmp.slock + (1  TICKET_SHIFT);
 
-   ret = cmpxchg(lock-ticketpair, tmp.slock, new.slock) == tmp.slock;
+   ret = cmpxchg(lock-head_tail, tmp.slock, new.slock) == tmp.slock;
barrier();  /* just make nothing creeps before lock is 
claimed */
 
return ret;
diff --git a/arch/x86/include/asm/spinlock_types.h 
b/arch/x86/include/asm/spinlock_types.h
index 307ef0b..df031ec 100644
--- a/arch/x86/include/asm/spinlock_types.h
+++ b/arch/x86/include/asm/spinlock_types.h
@@ -24,7 +24,7 @@ typedef struct arch_spinlock {
struct {
__ticket_t waiting;
union {
-   __ticketpair_t ticketpair;
+   __ticketpair_t head_tail;
struct __raw_tickets {
__ticket_t head, tail;
} tickets;
@@ -38,7 +38,7 @@ typedef struct arch_spinlock {
 #else
 typedef struct arch_spinlock {
union {
-   __ticketpair_t ticketpair;
+   __ticketpair_t head_tail;
struct __raw_tickets {
__ticket_t head, tail;
} tickets;
-- 
1.7.2.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH 10/20] x86/pvticketlock: keep count of blocked cpus

2010-11-03 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com

When a CPU blocks by calling into __ticket_lock_spinning, keep a count in
the spinlock.  This allows __ticket_lock_kick to more accurately tell
whether it has any work to do (in many cases, a spinlock may be contended,
but none of the waiters have gone into blocking).

This adds two locked instructions to the spinlock slow path (once the
lock has already spun for SPIN_THRESHOLD iterations), and adds another
one or two bytes to struct arch_spinlock.

We need to make sure we increment the waiting counter before doing the
last-chance check of the lock to see if we picked it up in the meantime.
If we don't then there's a potential deadlock:

lock holder                     lock waiter

                                clear event channel
                                check lock for pickup (did not)
release lock
check waiting counter
        (=0, no kick)
                                add waiting counter
                                block (=deadlock)

Moving the "add waiting counter" step earlier avoids the deadlock:

lock holder                     lock waiter

                                clear event channel
                                add waiting counter
                                check lock for pickup (did not)
release lock
check waiting counter
        (=1, kick)
                                block (and immediately wake)
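
A compressed C sketch of the corrected ordering (stand-in names; the real
event-channel and per-cpu plumbing lives in the Xen patch elsewhere in the
series):

#include <stdint.h>
#include <stdatomic.h>

typedef uint8_t __ticket_t;
struct arch_spinlock {
    volatile struct { __ticket_t head, tail; } tickets;
    atomic_uint waiting;                    /* stands in for the locked addb/addw */
};

static void clear_kick_event(void)   { /* clear this cpu's event channel */ }
static void block_until_kicked(void) { /* poll the event channel */ }
static void kick_waiter(struct arch_spinlock *lock, __ticket_t next)
{
    (void)lock; (void)next;                 /* make the waiter's event pending */
}

/* Waiter slow path: the waiting count must be visible *before* the
 * last-chance check of head, or the unlocker can miss us (see above). */
void lock_spinning(struct arch_spinlock *lock, __ticket_t want)
{
    clear_kick_event();
    atomic_fetch_add(&lock->waiting, 1);    /* "add waiting counter" */

    if (lock->tickets.head != want)         /* "check lock for pickup" */
        block_until_kicked();               /* unlocker now sees waiting != 0 */

    atomic_fetch_sub(&lock->waiting, 1);
}

/* Unlock side: release head, then kick only if someone actually blocked. */
void unlock_kick_if_needed(struct arch_spinlock *lock)
{
    __ticket_t next = lock->tickets.head + 1;

    lock->tickets.head = next;              /* release the lock */
    if (atomic_load(&lock->waiting) != 0)   /* cheap when nobody blocked */
        kick_waiter(lock, next);
}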

Signed-off-by: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
---
 arch/x86/include/asm/spinlock.h   |   27 ++-
 arch/x86/include/asm/spinlock_types.h |3 +++
 arch/x86/xen/spinlock.c   |4 
 3 files changed, 33 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index a79dfee..3deabca 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -65,6 +65,31 @@ static __always_inline void ticket_unlock_kick(struct 
arch_spinlock *lock, u
 {
 }
 
+static __always_inline bool __ticket_lock_waiters(const struct arch_spinlock 
*lock)
+{
+   return false;
+}
+#else
+static inline void __ticket_add_waiting(struct arch_spinlock *lock)
+{
+   if (sizeof(lock-waiting) == sizeof(u8))
+   asm (LOCK_PREFIX addb $1, %0 : +m (lock-waiting) : : 
memory);
+   else
+   asm (LOCK_PREFIX addw $1, %0 : +m (lock-waiting) : : 
memory);
+}
+
+static inline void __ticket_sub_waiting(struct arch_spinlock *lock)
+{
+   if (sizeof(lock-waiting) == sizeof(u8))
+   asm (LOCK_PREFIX subb $1, %0 : +m (lock-waiting) : : 
memory);
+   else
+   asm (LOCK_PREFIX subw $1, %0 : +m (lock-waiting) : : 
memory);
+}
+
+static __always_inline bool __ticket_lock_waiters(const struct arch_spinlock 
*lock)
+{
+   return ACCESS_ONCE(lock-waiting) != 0;
+}
 #endif /* CONFIG_PARAVIRT_SPINLOCKS */
 
 /*
@@ -106,7 +131,7 @@ static __always_inline struct __raw_tickets 
__ticket_spin_claim(struct arch_spin
  */
 static __always_inline void __ticket_unlock_kick(struct arch_spinlock *lock, 
__ticket_t next)
 {
-   if (unlikely(lock-tickets.tail != next))
+   if (unlikely(__ticket_lock_waiters(lock)))
ticket_unlock_kick(lock, next);
 }
 
diff --git a/arch/x86/include/asm/spinlock_types.h 
b/arch/x86/include/asm/spinlock_types.h
index 48dafc3..b396ed5 100644
--- a/arch/x86/include/asm/spinlock_types.h
+++ b/arch/x86/include/asm/spinlock_types.h
@@ -26,6 +26,9 @@ typedef struct arch_spinlock {
__ticket_t head, tail;
} tickets;
};
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+   __ticket_t waiting;
+#endif
 } arch_spinlock_t;
 
 #define __ARCH_SPIN_LOCK_UNLOCKED  { { .slock = 0 } }
diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index 5feb897..b8f09d5 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -119,6 +119,8 @@ static void xen_lock_spinning(struct arch_spinlock *lock, 
unsigned want)
/* Only check lock once pending cleared */
barrier();
 
+   __ticket_add_waiting(lock);
+
/* check again make sure it didn't become free while
   we weren't looking  */
if (ACCESS_ONCE(lock-tickets.head) == want) {
@@ -133,6 +135,8 @@ static void xen_lock_spinning(struct arch_spinlock *lock, 
unsigned want)
kstat_incr_irqs_this_cpu(irq, irq_to_desc(irq));
 
 out:
+   __ticket_sub_waiting(lock);
+
cpumask_clear_cpu(cpu, waiting_cpus);
w-lock = NULL;
spin_time_accum_blocked(start);
-- 
1.7.2.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH 19/20] x86/ticketlocks: use overlapping read to eliminate mb()

2010-11-03 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com

When reading the 'waiting' counter, use a longer-than-necessary read
which also overlaps 'head'.  This read is guaranteed to be in-order
with respect to any unlock writes to 'head', thereby eliminating the
need for an explicit mb() to enforce the read-after-write ordering.

Signed-off-by: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
---
 arch/x86/include/asm/spinlock.h   |   14 +++---
 arch/x86/include/asm/spinlock_types.h |   24 
 2 files changed, 27 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index c7455e1..a16b6e4 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -86,14 +86,14 @@ static inline void __ticket_sub_waiting(struct 
arch_spinlock *lock)
 
 static __always_inline bool __ticket_lock_waiters(const struct arch_spinlock 
*lock)
 {
-   /*
-* Make sure everyone sees the unlock write before we check the
-* waiting count.  The processor ordering doesn't guarantee this
-* because they're different memory locations.
+   /* 
+* lock-waiting_head is a union element which aliases both
+* 'waiting' and 'head'.  Since the memory access overlaps
+* 'head', the read is forced to be in-order with respect to
+* unlock writes to 'head', eliminating the need for an
+* explicit mb(). (Intel memory ordering rules.)
 */
-   mb();
-
-   return ACCESS_ONCE(lock-waiting) != 0;
+   return ((__ticket_t)ACCESS_ONCE(lock-waiting_head)) != 0;
 }
 #endif /* CONFIG_PARAVIRT_SPINLOCKS */
 
diff --git a/arch/x86/include/asm/spinlock_types.h 
b/arch/x86/include/asm/spinlock_types.h
index def8010..307ef0b 100644
--- a/arch/x86/include/asm/spinlock_types.h
+++ b/arch/x86/include/asm/spinlock_types.h
@@ -18,6 +18,24 @@ typedef u32 __ticketpair_t;
 #define TICKET_SHIFT   (sizeof(__ticket_t) * 8)
 #define TICKET_MASK((1  TICKET_SHIFT) - 1)
 
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+typedef struct arch_spinlock {
+   union {
+   struct {
+   __ticket_t waiting;
+   union {
+   __ticketpair_t ticketpair;
+   struct __raw_tickets {
+   __ticket_t head, tail;
+   } tickets;
+   };
+   };
+   __ticketpair_t waiting_head; /* aliases waiting and head */
+   };
+} arch_spinlock_t;
+
+#define __ARCH_SPIN_LOCK_UNLOCKED  { { { 0, { 0 } } } }
+#else
 typedef struct arch_spinlock {
union {
__ticketpair_t ticketpair;
@@ -25,12 +43,10 @@ typedef struct arch_spinlock {
__ticket_t head, tail;
} tickets;
};
-#ifdef CONFIG_PARAVIRT_SPINLOCKS
-   __ticket_t waiting;
-#endif
 } arch_spinlock_t;
 
-#define __ARCH_SPIN_LOCK_UNLOCKED  { { .tickets = {0, 0} } }
+#define __ARCH_SPIN_LOCK_UNLOCKED  { { 0 } }
+#endif
 
 typedef struct {
unsigned int lock;
-- 
1.7.2.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH 09/20] xen/pvticketlock: Xen implementation for PV ticket locks

2010-11-03 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com

Replace the old Xen implementation of PV spinlocks with an implementation
of xen_lock_spinning and xen_unlock_kick.

xen_lock_spinning simply registers the cpu in its entry in lock_waiting,
adds itself to the waiting_cpus set, and blocks on an event channel
until the channel becomes pending.

xen_unlock_kick searches the cpus in waiting_cpus looking for the one
which is waiting for this lock with the next ticket, if any.  If found,
it kicks it by making its event channel pending, which wakes it up.
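
For completeness (the quoted diff below is cut off before it gets there),
a simplified stand-alone model of the kick side described above; the
per-cpu data and the kick itself are stand-ins for the real per_cpu()
state and the event-channel send:

#include <stdint.h>
#include <stdbool.h>

#define NR_CPUS 4

typedef uint8_t __ticket_t;
struct arch_spinlock;                       /* opaque here */

struct xen_lock_waiting {
    struct arch_spinlock *lock;             /* lock this cpu is blocked on */
    __ticket_t want;                        /* ticket it is waiting for */
};

static struct xen_lock_waiting lock_waiting[NR_CPUS];
static bool waiting_cpus[NR_CPUS];          /* stand-in for the cpumask */

static void kick_cpu(int cpu)
{
    (void)cpu;                              /* make its event channel pending */
}

void unlock_kick(struct arch_spinlock *lock, __ticket_t next)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        const struct xen_lock_waiting *w = &lock_waiting[cpu];

        if (waiting_cpus[cpu] && w->lock == lock && w->want == next) {
            kick_cpu(cpu);                  /* wake exactly the next-in-line vcpu */
            break;
        }
    }
}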

Signed-off-by: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
---
 arch/x86/xen/spinlock.c |  281 ++-
 1 files changed, 36 insertions(+), 245 deletions(-)

diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index 7e9eb52..5feb897 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -19,32 +19,21 @@
 #ifdef CONFIG_XEN_DEBUG_FS
 static struct xen_spinlock_stats
 {
-   u64 taken;
u32 taken_slow;
-   u32 taken_slow_nested;
u32 taken_slow_pickup;
u32 taken_slow_spurious;
-   u32 taken_slow_irqenable;
 
-   u64 released;
u32 released_slow;
u32 released_slow_kicked;
 
 #define HISTO_BUCKETS  30
-   u32 histo_spin_total[HISTO_BUCKETS+1];
-   u32 histo_spin_spinning[HISTO_BUCKETS+1];
u32 histo_spin_blocked[HISTO_BUCKETS+1];
 
-   u64 time_total;
-   u64 time_spinning;
u64 time_blocked;
 } spinlock_stats;
 
 static u8 zero_stats;
 
-static unsigned lock_timeout = 1  10;
-#define TIMEOUT lock_timeout
-
 static inline void check_zero(void)
 {
if (unlikely(zero_stats)) {
@@ -73,22 +62,6 @@ static void __spin_time_accum(u64 delta, u32 *array)
array[HISTO_BUCKETS]++;
 }
 
-static inline void spin_time_accum_spinning(u64 start)
-{
-   u32 delta = xen_clocksource_read() - start;
-
-   __spin_time_accum(delta, spinlock_stats.histo_spin_spinning);
-   spinlock_stats.time_spinning += delta;
-}
-
-static inline void spin_time_accum_total(u64 start)
-{
-   u32 delta = xen_clocksource_read() - start;
-
-   __spin_time_accum(delta, spinlock_stats.histo_spin_total);
-   spinlock_stats.time_total += delta;
-}
-
 static inline void spin_time_accum_blocked(u64 start)
 {
u32 delta = xen_clocksource_read() - start;
@@ -105,214 +78,76 @@ static inline u64 spin_time_start(void)
return 0;
 }
 
-static inline void spin_time_accum_total(u64 start)
-{
-}
-static inline void spin_time_accum_spinning(u64 start)
-{
-}
 static inline void spin_time_accum_blocked(u64 start)
 {
 }
 #endif  /* CONFIG_XEN_DEBUG_FS */
 
-struct xen_spinlock {
-   unsigned char lock; /* 0 - free; 1 - locked */
-   unsigned short spinners;/* count of waiting cpus */
+struct xen_lock_waiting {
+   struct arch_spinlock *lock;
+   __ticket_t want;
 };
 
 static DEFINE_PER_CPU(int, lock_kicker_irq) = -1;
+static DEFINE_PER_CPU(struct xen_lock_waiting, lock_waiting);
+static cpumask_t waiting_cpus;
 
-#if 0
-static int xen_spin_is_locked(struct arch_spinlock *lock)
-{
-   struct xen_spinlock *xl = (struct xen_spinlock *)lock;
-
-   return xl-lock != 0;
-}
-
-static int xen_spin_is_contended(struct arch_spinlock *lock)
-{
-   struct xen_spinlock *xl = (struct xen_spinlock *)lock;
-
-   /* Not strictly true; this is only the count of contended
-  lock-takers entering the slow path. */
-   return xl-spinners != 0;
-}
-
-static int xen_spin_trylock(struct arch_spinlock *lock)
-{
-   struct xen_spinlock *xl = (struct xen_spinlock *)lock;
-   u8 old = 1;
-
-   asm(xchgb %b0,%1
-   : +q (old), +m (xl-lock) : : memory);
-
-   return old == 0;
-}
-
-static DEFINE_PER_CPU(struct xen_spinlock *, lock_spinners);
-
-/*
- * Mark a cpu as interested in a lock.  Returns the CPU's previous
- * lock of interest, in case we got preempted by an interrupt.
- */
-static inline struct xen_spinlock *spinning_lock(struct xen_spinlock *xl)
-{
-   struct xen_spinlock *prev;
-
-   prev = __get_cpu_var(lock_spinners);
-   __get_cpu_var(lock_spinners) = xl;
-
-   wmb();  /* set lock of interest before count */
-
-   asm(LOCK_PREFIX  incw %0
-   : +m (xl-spinners) : : memory);
-
-   return prev;
-}
-
-/*
- * Mark a cpu as no longer interested in a lock.  Restores previous
- * lock of interest (NULL for none).
- */
-static inline void unspinning_lock(struct xen_spinlock *xl, struct 
xen_spinlock *prev)
-{
-   asm(LOCK_PREFIX  decw %0
-   : +m (xl-spinners) : : memory);
-   wmb();  /* decrement count before restoring lock */
-   __get_cpu_var(lock_spinners) = prev;
-}
-
-static noinline int xen_spin_lock_slow(struct arch_spinlock *lock, bool 
irq_enable)
+static void xen_lock_spinning(struct arch_spinlock *lock, unsigned want)
 {
-   struct xen_spinlock *xl

[PATCH 05/20] x86/ticketlock: make __ticket_spin_lock common

2010-11-03 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com

Aside from the particular form of the xadd instruction, they're identical.
So factor out the xadd and use common code for the rest.

Signed-off-by: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
---
 arch/x86/include/asm/spinlock.h |   42 ++
 1 files changed, 20 insertions(+), 22 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 7586d7a..4f9fa24 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -69,13 +69,27 @@ static __always_inline void __ticket_unlock_release(struct 
arch_spinlock *lock)
  * save some instructions and make the code more elegant. There really isn't
  * much between them in performance though, especially as locks are out of 
line.
  */
-#if (NR_CPUS  256)
-static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
+static __always_inline struct __raw_tickets __ticket_spin_claim(struct 
arch_spinlock *lock)
 {
-   register struct __raw_tickets inc = { .tail = 1 };
+   register struct __raw_tickets tickets = { .tail = 1 };
+
+   if (sizeof(lock-tickets.head) == sizeof(u8))
+   asm volatile (LOCK_PREFIX xaddw %w0, %1\n
+ : +r (tickets), +m (lock-tickets)
+ : : memory, cc);
+   else
+   asm volatile (LOCK_PREFIX xaddl %0, %1\n
+: +r (tickets), +m (lock-tickets)
+: : memory, cc);
 
-   asm volatile (LOCK_PREFIX xaddw %w0, %1\n
- : +r (inc), +m (lock-tickets) : : memory, cc);
+   return tickets;
+}
+
+static __always_inline void __ticket_spin_lock(struct arch_spinlock *lock)
+{
+   register struct __raw_tickets inc;
+
+   inc = __ticket_spin_claim(lock);
 
for (;;) {
if (inc.head == inc.tail)
@@ -86,6 +100,7 @@ static __always_inline void 
__ticket_spin_lock(arch_spinlock_t *lock)
barrier();  /* make sure nothing creeps before the lock is 
taken */
 }
 
+#if (NR_CPUS  256)
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 {
unsigned int tmp, new;
@@ -105,23 +120,6 @@ static __always_inline int 
__ticket_spin_trylock(arch_spinlock_t *lock)
return tmp;
 }
 #else
-static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
-{
-   register struct __raw_tickets inc = { .tickets.tail = 1 };
-
-   asm volatile(LOCK_PREFIX xaddl %0, %1\n\t
-: +r (inc), +m (lock-tickets)
-: : memory, cc);
-
-   for (;;) {
-   if (inc.head == inc.tail)
-   return;
-   cpu_relax();
-   inc.head = ACCESS_ONCE(lock-tickets.head);
-   }
-   barrier();  /* make sure nothing creeps before the lock is 
taken */
-}
-
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 {
unsigned tmp;
-- 
1.7.2.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH 02/20] x86/ticketlock: convert spin loop to C

2010-11-03 Thread Jeremy Fitzhardinge
On 11/03/2010 11:11 AM, Eric Dumazet wrote:
 On Wednesday, 3 November 2010 at 10:59 -0400, Jeremy Fitzhardinge
 wrote:
 From: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com

 The inner loop of __ticket_spin_lock isn't doing anything very special,
 so reimplement it in C.

 For the 8 bit ticket lock variant, we use a register union to get direct
 access to the lower and upper bytes in the tickets, but unfortunately gcc
 won't generate a direct comparison between the two halves of the register,
 so the generated asm isn't quite as pretty as the hand-coded version.
 However benchmarking shows that this is actually a small improvement in
 runtime performance on some benchmarks, and never a slowdown.

 We also need to make sure there's a barrier at the end of the lock loop
 to make sure that the compiler doesn't move any instructions from within
 the locked region into the region where we don't yet own the lock.

 Signed-off-by: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
 ---
  arch/x86/include/asm/spinlock.h |   58 
 +++---
  1 files changed, 29 insertions(+), 29 deletions(-)

 diff --git a/arch/x86/include/asm/spinlock.h 
 b/arch/x86/include/asm/spinlock.h
 index d6d5784..6711d36 100644
 --- a/arch/x86/include/asm/spinlock.h
 +++ b/arch/x86/include/asm/spinlock.h
 @@ -58,21 +58,21 @@
  #if (NR_CPUS  256)
  static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
  {
 -unsigned short inc = 1  TICKET_SHIFT;
 -
 -asm volatile (
 -LOCK_PREFIX xaddw %w0, %1\n
 -1:\t
 -cmpb %h0, %b0\n\t
 -je 2f\n\t
 -rep ; nop\n\t
 -movb %1, %b0\n\t
 -/* don't need lfence here, because loads are in-order */
 -jmp 1b\n
 -2:
 -: +Q (inc), +m (lock-slock)
 -:
 -: memory, cc);
 +register union {
 +struct __raw_tickets tickets;
 +unsigned short slock;
 +} inc = { .slock = 1  TICKET_SHIFT };
 +
 +asm volatile (LOCK_PREFIX xaddw %w0, %1\n
 +  : +Q (inc), +m (lock-slock) : : memory, cc);
 +
 +for (;;) {
 +if (inc.tickets.head == inc.tickets.tail)
 +return;
 +cpu_relax();
 +inc.tickets.head = ACCESS_ONCE(lock-tickets.head);
 +}
 +barrier();  /* make sure nothing creeps before the lock is 
 taken */
 Isn't this barrier() never reached?

Sorry, a later patch makes this clearer.  I should have folded it in.
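
Roughly, the later change (the arch_spin_lock barrier patch elsewhere in
this archive) turns the early return into a goto so both exit paths hit
the same compiler barrier.  A stand-alone model of that control flow, with
stand-in types:

#include <stdint.h>

#define barrier()       __asm__ __volatile__("" ::: "memory")
#define SPIN_THRESHOLD  (1 << 11)

struct tickets { volatile uint8_t head, tail; };

void spin_until_owner(struct tickets *t, uint8_t my_ticket,
                      void (*lock_spinning)(struct tickets *, uint8_t))
{
    for (;;) {
        unsigned int count = SPIN_THRESHOLD;

        do {
            if (t->head == my_ticket)
                goto out;           /* was "return", which skipped the barrier */
        } while (--count);

        lock_spinning(t, my_ticket);    /* pv slow path */
    }
out:
    barrier();  /* now on every path: nothing creeps before the lock is taken */
}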

J
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization

[PATCH 18/20] x86/ticketlock: remove .slock

2010-11-03 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com

Signed-off-by: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
---
 arch/x86/include/asm/spinlock_types.h |3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/spinlock_types.h 
b/arch/x86/include/asm/spinlock_types.h
index b396ed5..def8010 100644
--- a/arch/x86/include/asm/spinlock_types.h
+++ b/arch/x86/include/asm/spinlock_types.h
@@ -20,7 +20,6 @@ typedef u32 __ticketpair_t;
 
 typedef struct arch_spinlock {
union {
-   unsigned int slock;
__ticketpair_t ticketpair;
struct __raw_tickets {
__ticket_t head, tail;
@@ -31,7 +30,7 @@ typedef struct arch_spinlock {
 #endif
 } arch_spinlock_t;
 
-#define __ARCH_SPIN_LOCK_UNLOCKED  { { .slock = 0 } }
+#define __ARCH_SPIN_LOCK_UNLOCKED  { { .tickets = {0, 0} } }
 
 typedef struct {
unsigned int lock;
-- 
1.7.2.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH 03/20] x86/ticketlock: Use C for __ticket_spin_unlock

2010-11-03 Thread Jeremy Fitzhardinge
On 11/03/2010 11:13 AM, Eric Dumazet wrote:
 On Wednesday, 3 November 2010 at 10:59 -0400, Jeremy Fitzhardinge
 wrote:
 From: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com

 If we don't need to use a locked inc for unlock, then implement it in C.

 Signed-off-by: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
 ---
  arch/x86/include/asm/spinlock.h |   33 ++---
  1 files changed, 18 insertions(+), 15 deletions(-)

 diff --git a/arch/x86/include/asm/spinlock.h 
 b/arch/x86/include/asm/spinlock.h
 index 6711d36..082990a 100644
 --- a/arch/x86/include/asm/spinlock.h
 +++ b/arch/x86/include/asm/spinlock.h
 @@ -33,9 +33,23 @@
   * On PPro SMP or if we are using OOSTORE, we use a locked operation to 
 unlock
   * (PPro errata 66, 92)
   */
 -# define UNLOCK_LOCK_PREFIX LOCK_PREFIX
 +static __always_inline void __ticket_unlock_release(struct arch_spinlock 
 *lock)
 +{
 +if (sizeof(lock-tickets.head) == sizeof(u8))
 +asm (LOCK_PREFIX incb %0
 + : +m (lock-tickets.head) : : memory);
 +else
 +asm (LOCK_PREFIX incw %0
 + : +m (lock-tickets.head) : : memory);
 +
 +}
  #else
 -# define UNLOCK_LOCK_PREFIX
 +static __always_inline void __ticket_unlock_release(struct arch_spinlock 
 *lock)
 +{
 +barrier();
 technically speaking, it should be : smp_wmb()

Perhaps.  In practice it won't make a difference because it is defined
as barrier() unless OOSTORE is defined, in which case we need to do a
locked increment anyway.

Thanks,
J
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization

Re: [PATCH] xen: xenfs: privcmd: check put_user() return code

2010-10-29 Thread Jeremy Fitzhardinge
 On 10/28/2010 04:39 AM, Vasiliy Kulikov wrote:
 put_user() may fail.  In this case propagate error code from
 privcmd_ioctl_mmap_batch().

Thanks for looking at this.  I'm in two minds about this; the existing
logic is such that these put_users can only fail if something else has
already failed and it's returning an error.  I guess it would be useful
to get an EFAULT if you've got a problem writing back the results.

IanC, any opinion?

Thanks,
J

 Signed-off-by: Vasiliy Kulikov sego...@gmail.com
 ---
  Compile tested.

  drivers/xen/xenfs/privcmd.c |8 ++--
  1 files changed, 2 insertions(+), 6 deletions(-)

 diff --git a/drivers/xen/xenfs/privcmd.c b/drivers/xen/xenfs/privcmd.c
 index f80be7f..2eb04c8 100644
 --- a/drivers/xen/xenfs/privcmd.c
 +++ b/drivers/xen/xenfs/privcmd.c
 @@ -266,9 +266,7 @@ static int mmap_return_errors(void *data, void *state)
   xen_pfn_t *mfnp = data;
   struct mmap_batch_state *st = state;
  
 - put_user(*mfnp, st-user++);
 -
 - return 0;
 + return put_user(*mfnp, st-user++);
  }
  
  static struct vm_operations_struct privcmd_vm_ops;
 @@ -323,10 +321,8 @@ static long privcmd_ioctl_mmap_batch(void __user *udata)
   up_write(mm-mmap_sem);
  
   if (state.err  0) {
 - ret = 0;
 -
   state.user = m.arr;
 - traverse_pages(m.num, sizeof(xen_pfn_t),
 + ret = traverse_pages(m.num, sizeof(xen_pfn_t),
  pagelist,
  mmap_return_errors, state);
   }

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH] xen: xenfs: privcmd: check put_user() return code

2010-10-29 Thread Jeremy Fitzhardinge
 On 10/29/2010 10:44 AM, Ian Campbell wrote:
 On Fri, 2010-10-29 at 18:18 +0100, Jeremy Fitzhardinge wrote:
 On 10/28/2010 04:39 AM, Vasiliy Kulikov wrote:
 put_user() may fail.  In this case propagate error code from
 privcmd_ioctl_mmap_batch().
 Thanks for looking at this.  I'm in two minds about this; the existing
 logic is such that these put_users can only fail if something else has
 already failed and its returning an error.  I guess it would be useful
 to get an EFAULT if you've got a problem writing back the results.

 IanC, any opinion?
 Not a strong one.

 Perhaps what we really want in this case is for traverse_pages to return
 the total number of callback failures it encountered rather than
 aborting after the first failure?

 On the other hand you are correct that gather_array() has already
 touched all the pages which we are going to be touching here so how
 likely is a new failure at this point anyway?

I could think of two cases: the array is mapped RO, so only the
writeback fails, or someone changes the mapping under our feet from
another thread.
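
For what it's worth, a sketch of the variant Ian is describing -- keep
walking and report how many callbacks failed rather than stopping at the
first error.  This is a hypothetical flat-array helper for illustration,
not the real list-walking traverse_pages():

#include <stddef.h>

typedef int (*traverse_fn)(void *data, void *state);

int traverse_count_errors(unsigned int nelem, size_t size,
                          char *array, traverse_fn fn, void *state)
{
    int failures = 0;

    for (unsigned int i = 0; i < nelem; i++) {
        if (fn(array + i * size, state))
            failures++;         /* record the failure, but keep going */
    }

    return failures;            /* 0 means every element was written back */
}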

J

 Ian.

 Thanks,
 J

 Signed-off-by: Vasiliy Kulikov sego...@gmail.com
 ---
  Compile tested.

  drivers/xen/xenfs/privcmd.c |8 ++--
  1 files changed, 2 insertions(+), 6 deletions(-)

 diff --git a/drivers/xen/xenfs/privcmd.c b/drivers/xen/xenfs/privcmd.c
 index f80be7f..2eb04c8 100644
 --- a/drivers/xen/xenfs/privcmd.c
 +++ b/drivers/xen/xenfs/privcmd.c
 @@ -266,9 +266,7 @@ static int mmap_return_errors(void *data, void *state)
 xen_pfn_t *mfnp = data;
 struct mmap_batch_state *st = state;
  
 -   put_user(*mfnp, st-user++);
 -
 -   return 0;
 +   return put_user(*mfnp, st-user++);
  }
  
  static struct vm_operations_struct privcmd_vm_ops;
 @@ -323,10 +321,8 @@ static long privcmd_ioctl_mmap_batch(void __user 
 *udata)
 up_write(mm-mmap_sem);
  
 if (state.err  0) {
 -   ret = 0;
 -
 state.user = m.arr;
 -   traverse_pages(m.num, sizeof(xen_pfn_t),
 +   ret = traverse_pages(m.num, sizeof(xen_pfn_t),
pagelist,
mmap_return_errors, state);
 }


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [Xen-devel] Re: [PATCH] x86/pvclock-xen: zero last_value on resume

2010-10-27 Thread Jeremy Fitzhardinge
 On 10/26/2010 10:48 AM, Glauber Costa wrote:
 On Tue, 2010-10-26 at 09:59 -0700, Jeremy Fitzhardinge wrote:
 If the guest domain has been suspend/resumed or migrated, then the
 system clock backing the pvclock clocksource may revert to a smaller
 value (ie, can be non-monotonic across the migration/save-restore).
 Make sure we zero last_value in that case so that the domain
 continues to see clock updates.

 [ I don't know if kvm needs an analogous fix or not. ]
 After migration, save/restore, etc, we issue an ioctl where we tell
 the host the last clock value. That (in theory) guarantees monotonicity.

 I am not opposed to this patch in any way, however.

Thanks.

HPA, do you want to take this, or shall I send it on?

Thanks,
J
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH] x86/pvclock-xen: zero last_value on resume

2010-10-26 Thread Jeremy Fitzhardinge


If the guest domain has been suspend/resumed or migrated, then the
system clock backing the pvclock clocksource may revert to a smaller
value (ie, can be non-monotonic across the migration/save-restore).
Make sure we zero last_value in that case so that the domain
continues to see clock updates.

[ I don't know if kvm needs an analogous fix or not. ]
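
To spell out why a stale last_value freezes the clock: reads are clamped so
they never go backwards relative to the last value handed out.  A simplified
stand-alone model of that read path (the real TSC arithmetic and version
handling are omitted):

#include <stdint.h>
#include <stdatomic.h>

static _Atomic uint64_t last_value;         /* last timestamp handed out */

static uint64_t raw_system_time(void)
{
    /* stand-in for the pvclock TSC-based system time calculation */
    static uint64_t fake;
    return fake += 1000;
}

uint64_t pvclock_read(void)
{
    uint64_t ret = raw_system_time();
    uint64_t last = atomic_load(&last_value);

    /* Never return less than what some other vcpu has already seen. */
    while (ret > last) {
        if (atomic_compare_exchange_weak(&last_value, &last, ret))
            return ret;
    }

    /* If the backing clock jumped backwards (e.g. after save/restore) and
     * last_value is not reset, every caller keeps getting this stale value
     * -- hence zeroing it in the resume path. */
    return last;
}

void pvclock_resume(void)
{
    atomic_store(&last_value, 0);
}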

Signed-off-by: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
Cc: Stable Kernel sta...@kernel.org
Reported-by: Eelco Dolstra e.dols...@tudelft.nl
Reported-by: Olivier Hanesse olivier.hane...@gmail.com
Bisected-by: Cédric Schieli cschi...@gmail.com
Tested-by: Cédric Schieli cschi...@gmail.com

diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h
index cd02f32..6226870 100644
--- a/arch/x86/include/asm/pvclock.h
+++ b/arch/x86/include/asm/pvclock.h
@@ -11,5 +11,6 @@ unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info 
*src);
 void pvclock_read_wallclock(struct pvclock_wall_clock *wall,
struct pvclock_vcpu_time_info *vcpu,
struct timespec *ts);
+void pvclock_resume(void);
 
 #endif /* _ASM_X86_PVCLOCK_H */
diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index 239427c..a4f07c1 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -120,6 +120,11 @@ unsigned long pvclock_tsc_khz(struct 
pvclock_vcpu_time_info *src)
 
 static atomic64_t last_value = ATOMIC64_INIT(0);
 
+void pvclock_resume(void)
+{
+   atomic64_set(last_value, 0);
+}
+
 cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
 {
struct pvclock_shadow_time shadow;
diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
index b2bb5aa..5da5e53 100644
--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ -426,6 +426,8 @@ void xen_timer_resume(void)
 {
int cpu;
 
+   pvclock_resume();
+
if (xen_clockevent != xen_vcpuop_clockevent)
return;
 


From 29acbb4e1d93e719250648db1ce8c7a24144fd86 Mon Sep 17 00:00:00 2001
From: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
Date: Mon, 25 Oct 2010 16:53:46 -0700
Subject: [PATCH] x86/pvclock: zero last_value on resume

If the guest domain has been suspend/resumed or migrated, then the
system clock backing the pvclock clocksource may revert to a smaller
value (ie, can be non-monotonic across the migration/save-restore).
Make sure we zero last_value in that case so that the domain
continues to see clock updates.

Signed-off-by: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
Cc: Stable Kernel sta...@kernel.org
Reported-by: Eelco Dolstra e.dols...@tudelft.nl
Reported-by: Olivier Hanesse olivier.hane...@gmail.com
Bisected-by: Cédric Schieli cschi...@gmail.com
Tested-by: Cédric Schieli cschi...@gmail.com

diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h
index cd02f32..6226870 100644
--- a/arch/x86/include/asm/pvclock.h
+++ b/arch/x86/include/asm/pvclock.h
@@ -11,5 +11,6 @@ unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info 
*src);
 void pvclock_read_wallclock(struct pvclock_wall_clock *wall,
struct pvclock_vcpu_time_info *vcpu,
struct timespec *ts);
+void pvclock_resume(void);
 
 #endif /* _ASM_X86_PVCLOCK_H */
diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index 239427c..a4f07c1 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -120,6 +120,11 @@ unsigned long pvclock_tsc_khz(struct 
pvclock_vcpu_time_info *src)
 
 static atomic64_t last_value = ATOMIC64_INIT(0);
 
+void pvclock_resume(void)
+{
+   atomic64_set(last_value, 0);
+}
+
 cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
 {
struct pvclock_shadow_time shadow;
diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
index b2bb5aa..5da5e53 100644
--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ -426,6 +426,8 @@ void xen_timer_resume(void)
 {
int cpu;
 
+   pvclock_resume();
+
if (xen_clockevent != xen_vcpuop_clockevent)
return;
 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization

Re: xen PV on HVM and initial domain merge in linux-next

2010-10-20 Thread Jeremy Fitzhardinge
Yes all ok.

J

Stefano Stabellini stefano.stabell...@eu.citrix.com wrote:

On Wed, 20 Oct 2010, Stephen Rothwell wrote:
 Hi Stefano,
 
 [just casting the net a bit wider ...]
 
 On Tue, 19 Oct 2010 18:51:47 +0100 Stefano Stabellini
stefano.stabell...@eu.citrix.com wrote:
 
  I forgot to CC the LKML and linux-next...
  
  On Tue, 19 Oct 2010, Stefano Stabellini wrote:
   Stephen,
   I have two patch series to merge in linux-next:
   
   PV on HVM: receive interrupts as xen events
   xen: initial domain support
   
   they have all the acked-by needed and are both stable since
several
   weeks, however they depend on Konrad's xen-pcifront series and
for this
   reason I waited until now to ask for a merge in linux-next.
   
   Could you please pull:
   
   git://xenbits.xen.org/people/sstabellini/linux-pvhvm.git
linux-next-initial-domain-v4
   
   it contains both series rebased on Konrad's pcifront series
merged on
   linux-next (warning: it still contains the merge commit of
   xen-pcifront-0.8.2 in linux-next).
   Let me know if you have any conflicts or if you need me to change
the
   branch somehow.
 
 Not following the Xen development at all, I would like to have a
positive
 reply from the listed Xen contacts, please,
 

Sure.
Jeremy?


 I do have concerns that this is turning up so late, but I realise
that
 that is mainly due to a misunderstanding on the part of some of the
Xen
 community.
 

Thank you very much for understanding!


 Also, the above tree is based on next-20101019 which means that I
cannot
 use it as is.  All the trees merged into linux-next must be based on
some
 other stable tree (almost always Linus' tree).  linux-next is rebuilt
 from scratch every day, so I cannot ever include a previous day's
version.
 
 Merging in other stable trees is OK (as long as the other maintainer
is
 aware of that and makes sure that their tree does not rebase).
 
 Basically what you send to me should be what you intend to send to
Linus
 during the next merge window.
 
All right.
I merged Jeremy's and Konrad's branches (the ones you just merged on
linux-next) on top of linux 2.6.36 rc8, then I rebased my series on top
of the result.
Please checkout this branch:

git://xenbits.xen.org/people/sstabellini/linux-pvhvm.git
2.6.36-rc8-initial-domain-v5

and let me know if it is suitable, it shouldn't have any merge
conflicts.

Cheers,

Stefano

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization

-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [Xen-devel] PROBLEM: [BISECTED] 2.6.35.5 xen domU panics just after the boot

2010-09-24 Thread Jeremy Fitzhardinge
 On 09/23/2010 10:40 PM, Paweł Zuzelski wrote:
 Have you seen the message from James Dingwall?

 He had found the typo in fb412a178502dc498430723b082a932f797e4763
 commit and sent the patch to lkml that fixes it.

 Please, see https://patchwork.kernel.org/patch/202282/ if you have
 not received James' mail. That patch resolves the problem for me and
 for James.


(!) Oh, well.  Yes, that would explain it. I'd noticed the missing 'x',
but assumed I'd just done it with an editor fat-finger...

Looks like upstream is OK, so it crept in with the stable
cherry-pick/backport to both stable-2.6.32/master and stable-2.6.35/master.

J
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [Xen-devel] PROBLEM: [BISECTED] 2.6.35.5 xen domU panics just after the boot

2010-09-24 Thread Jeremy Fitzhardinge
 On 09/23/2010 07:13 PM, Dan Magenheimer wrote:
 Jeremy --

 FYI, I think I've also seen this problem, or something similar,
 but ONLY on a Nehalem box (and only intermittently), not on my
 Core 2 Duo boxen.  The
 Nehalem box is an SDP so I had assumed that it was something
 to do with that, but maybe not.  Maybe some feature is
 leaking through to the guest.  Hyperthreading? MSI? EPT?

 Anyway, if you have a newer box, you might try reproducing
 on that, rather than your usual development box (which IIRC
 was a Core 2 Duo laptop?)

No, this bug turned out to be a simple typo in the patch that got merged
into the upstream stable kernels.

J

 -Original Message-
 From: Jeremy Fitzhardinge [mailto:jer...@goop.org]
 Sent: Thursday, September 23, 2010 6:02 PM
 To: Pawel Zuzelski
 Cc: virtualizat...@lists.osdl.org; Jeremy Fitzhardinge; xen-
 de...@lists.xensource.com; lkml
 Subject: Re: [Xen-devel] PROBLEM: [BISECTED] 2.6.35.5 xen domU panics
 just after the boot

  On 09/21/2010 12:05 PM, Paweł Zuzelski wrote:
 Hello,

 kernels 2.6.35.5 and 2.6.32.22 xen domU panics at the very begining
 of the
 boot process.

 I have bisected it to a single commit, and the first bad commit is:
 [fb412a178502dc498430723b082a932f797e4763] xen: use percpu interrupts
 for IPIs and VIRQs
 kernel v2.6.35.5 with reverted this commit works for me.
 Thanks very much for doing that.  I'll have to work out what's going on
 (obviously it doesn't do it for me).

 J

 Here are the kernel configs I was using:
 http://carme.pld-linux.org/~pawelz/kernel-2.6.35.5-domU-config
 http://carme.pld-linux.org/~pawelz/kernel-2.6.32.22-domU-config
 As you can see they are stripped down configs, intended to run in
 domU only.
 I was testing it with the very simple domU configuration:

 kernel  = '/srv/xen/bzImage'
 memory  = '128'
 vcpus = 2
 name= 'test'
 on_poweroff = 'destroy'
 on_reboot   = 'restart'
 on_crash= 'restart'

 Here is the full output of kernel 2.6.35.5:

 Using config file /etc/xen/test.
 Started domain test
 [0.00] Policy zone: DMA32
 [0.00] Kernel command line:
 [0.00] PID hash table entries: 512 (order: 0, 4096 bytes)
 [0.00] Subtract (33 early reservations)
 [0.00]   #1 [0001976000 - 0001987000]  XEN PAGETABLES
 [0.00]   #2 [000100 - 00019125f8]   TEXT DATA BSS
 [0.00]   #3 [0001933000 - 0001976000]  XEN START INFO
 [0.00]   #4 [01 - 012000]  TRAMPOLINE
 [0.00]   #5 [012000 - 04] PGTABLE
 [0.00]   #6 [0001912600 - 0001917600]   NODE_DATA
 [0.00]   #7 [0001917600 - 0001918600] BOOTMEM
 [0.00]   #8 [0001918600 - 0001918618] BOOTMEM
 [0.00]   #9 [0001919000 - 000191a000] BOOTMEM
 [0.00]   #10 [000191a000 - 000191b000] BOOTMEM
 [0.00]   #11 [000191b000 - 000191c000] BOOTMEM
 [0.00]   #12 [000220 - 00023c]MEMMAP 0
 [0.00]   #13 [0001918640 - 00019187c0] BOOTMEM
 [0.00]   #14 [000191c000 - 000191cc00] BOOTMEM
 [0.00]   #15 [000191d000 - 000191e000] BOOTMEM
 [0.00]   #16 [000191e000 - 000191f000] BOOTMEM
 [0.00]   #17 [000191f000 - 000192] BOOTMEM
 [0.00]   #18 [00019187c0 - 00019188a0] BOOTMEM
 [0.00]   #19 [00019188c0 - 0001918928] BOOTMEM
 [0.00]   #20 [0001918940 - 00019189a8] BOOTMEM
 [0.00]   #21 [00019189c0 - 0001918a28] BOOTMEM
 [0.00]   #22 [0001918a40 - 0001918a41] BOOTMEM
 [0.00]   #23 [0001918a80 - 0001918a81] BOOTMEM
 [0.00]   #24 [0001987000 - 00019c1000] BOOTMEM
 [0.00]   #25 [0001918ac0 - 0001918ac8] BOOTMEM
 [0.00]   #26 [0001918b00 - 0001918b08] BOOTMEM
 [0.00]   #27 [0001918b40 - 0001918b48] BOOTMEM
 [0.00]   #28 [0001918b80 - 0001918b90] BOOTMEM
 [0.00]   #29 [0001918bc0 - 0001918cc0] BOOTMEM
 [0.00]   #30 [0001918cc0 - 0001918d08] BOOTMEM
 [0.00]   #31 [0001918d40 - 0001918d88] BOOTMEM
 [0.00]   #32 [000192 - 0001921000] BOOTMEM
 [0.00] Memory: 118724k/131072k available (3327k kernel code,
 448k absent, 11900k reserved, 3931k data, 440k init)
 [0.00] SLUB: Genslabs=14, HWalign=64, Order=0-3,
 MinObjects=0, CPUs=2, Nodes=1
 [0.00] Hierarchical RCU implementation.
 [0.00]  RCU-based detection of stalled CPUs is disabled.
 [0.00]  Verbose stalled-CPUs detection is disabled.
 [0.00] NR_IRQS:2304
 [0.00] Console: colour dummy device 80x25
 [0.00] console [tty0] enabled
 [0.00] console [hvc0] enabled
 [0.00] installing Xen timer for CPU 0
 [0.00] BUG: unable to handle kernel NULL pointer dereference
 at (null)
 [0.00] IP: [(null)] (null)
 [0.00

Re: [Xen-devel] PROBLEM: [BISECTED] 2.6.35.5 xen domU panics just after the boot

2010-09-23 Thread Jeremy Fitzhardinge
 On 09/21/2010 12:05 PM, Paweł Zuzelski wrote:
 Hello,

 kernels 2.6.35.5 and 2.6.32.22 xen domU panics at the very begining of the
 boot process.

 I have bisected it to a single commit, and the first bad commit is:
 [fb412a178502dc498430723b082a932f797e4763] xen: use percpu interrupts for 
 IPIs and VIRQs

 kernel v2.6.35.5 with reverted this commit works for me.

Thanks very much for doing that.  I'll have to work out what's going on
(obviously it doesn't do it for me).

J

 Here are the kernel configs I was using:
 http://carme.pld-linux.org/~pawelz/kernel-2.6.35.5-domU-config
 http://carme.pld-linux.org/~pawelz/kernel-2.6.32.22-domU-config
 As you can see they are stripped down configs, intended to run in domU only.

 I was testing it with the very simple domU configuration:

 kernel  = '/srv/xen/bzImage'
 memory  = '128'
 vcpus = 2
 name= 'test'
 on_poweroff = 'destroy'
 on_reboot   = 'restart'
 on_crash= 'restart'

 Here is the full output of kernel 2.6.35.5:

 Using config file /etc/xen/test.
 Started domain test
 [0.00] Policy zone: DMA32
 [0.00] Kernel command line: 
 [0.00] PID hash table entries: 512 (order: 0, 4096 bytes)
 [0.00] Subtract (33 early reservations)
 [0.00]   #1 [0001976000 - 0001987000]  XEN PAGETABLES
 [0.00]   #2 [000100 - 00019125f8]   TEXT DATA BSS
 [0.00]   #3 [0001933000 - 0001976000]  XEN START INFO
 [0.00]   #4 [01 - 012000]  TRAMPOLINE
 [0.00]   #5 [012000 - 04] PGTABLE
 [0.00]   #6 [0001912600 - 0001917600]   NODE_DATA
 [0.00]   #7 [0001917600 - 0001918600] BOOTMEM
 [0.00]   #8 [0001918600 - 0001918618] BOOTMEM
 [0.00]   #9 [0001919000 - 000191a000] BOOTMEM
 [0.00]   #10 [000191a000 - 000191b000] BOOTMEM
 [0.00]   #11 [000191b000 - 000191c000] BOOTMEM
 [0.00]   #12 [000220 - 00023c]MEMMAP 0
 [0.00]   #13 [0001918640 - 00019187c0] BOOTMEM
 [0.00]   #14 [000191c000 - 000191cc00] BOOTMEM
 [0.00]   #15 [000191d000 - 000191e000] BOOTMEM
 [0.00]   #16 [000191e000 - 000191f000] BOOTMEM
 [0.00]   #17 [000191f000 - 000192] BOOTMEM
 [0.00]   #18 [00019187c0 - 00019188a0] BOOTMEM
 [0.00]   #19 [00019188c0 - 0001918928] BOOTMEM
 [0.00]   #20 [0001918940 - 00019189a8] BOOTMEM
 [0.00]   #21 [00019189c0 - 0001918a28] BOOTMEM
 [0.00]   #22 [0001918a40 - 0001918a41] BOOTMEM
 [0.00]   #23 [0001918a80 - 0001918a81] BOOTMEM
 [0.00]   #24 [0001987000 - 00019c1000] BOOTMEM
 [0.00]   #25 [0001918ac0 - 0001918ac8] BOOTMEM
 [0.00]   #26 [0001918b00 - 0001918b08] BOOTMEM
 [0.00]   #27 [0001918b40 - 0001918b48] BOOTMEM
 [0.00]   #28 [0001918b80 - 0001918b90] BOOTMEM
 [0.00]   #29 [0001918bc0 - 0001918cc0] BOOTMEM
 [0.00]   #30 [0001918cc0 - 0001918d08] BOOTMEM
 [0.00]   #31 [0001918d40 - 0001918d88] BOOTMEM
 [0.00]   #32 [000192 - 0001921000] BOOTMEM
 [0.00] Memory: 118724k/131072k available (3327k kernel code, 448k 
 absent, 11900k reserved, 3931k data, 440k init)
 [0.00] SLUB: Genslabs=14, HWalign=64, Order=0-3, MinObjects=0, 
 CPUs=2, Nodes=1
 [0.00] Hierarchical RCU implementation.
 [0.00]RCU-based detection of stalled CPUs is disabled.
 [0.00]Verbose stalled-CPUs detection is disabled.
 [0.00] NR_IRQS:2304
 [0.00] Console: colour dummy device 80x25
 [0.00] console [tty0] enabled
 [0.00] console [hvc0] enabled
 [0.00] installing Xen timer for CPU 0
 [0.00] BUG: unable to handle kernel NULL pointer dereference at (null)
 [0.00] IP: [(null)] (null)
 [0.00] PGD 0 
 [0.00] Oops: 0010 [#1] SMP 
 [0.00] last sysfs file: 
 [0.00] CPU 0 
 [0.00] Modules linked in:
 [0.00] 
 [0.00] Pid: 0, comm: swapper Not tainted 2.6.35.5 #1 /
 [0.00] RIP: e030:[]  [(null)] (null)
 [0.00] RSP: e02b:81601d70  EFLAGS: 00010082
 [0.00] RAX: 818fdb50 RBX:  RCX: 
 
 [0.00] RDX:  RSI: 818c7958 RDI: 
 
 [0.00] RBP: 81601d88 R08: ea1b22d8 R09: 
 001a
 [0.00] R10:  R11: 6477 R12: 
 81623280
 [0.00] R13:  R14: ffea R15: 
 
 [0.00] FS:  () GS:880001987000() 
 knlGS:
 [0.00] CS:  e033 DS:  ES:  CR0: 8005003b
 [0.00] CR2:  CR3: 016b9000 

Re: [PATCH 1/4] x86: remove cast from void*

2010-09-21 Thread Jeremy Fitzhardinge
 On 09/13/2010 10:18 PM, matt mooney wrote:
 Unnecessary cast from void* in assignment.

 Signed-off-by: matt mooney m...@muteddisk.com
 ---
  arch/x86/xen/mmu.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

 diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
 index 42086ac..7436283 100644
 --- a/arch/x86/xen/mmu.c
 +++ b/arch/x86/xen/mmu.c
 @@ -737,7 +737,7 @@ pgd_t *xen_get_user_pgd(pgd_t *pgd)
  
   if (offset < pgd_index(USER_LIMIT)) {
   struct page *page = virt_to_page(pgd_page);
 - user_ptr = (pgd_t *)page->private;
 + user_ptr = page->private;

Um, page->private is unsigned long anyway, so the cast is needed either way.

J

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH 1/4] x86: remove cast from void*

2010-09-14 Thread Jeremy Fitzhardinge
 On 09/13/2010 10:18 PM, matt mooney wrote:
 Unnecessary cast from void* in assignment.

Not very keen on this.  The cast may not be strictly required, but it
does document what's going on there.

J

 Signed-off-by: matt mooney m...@muteddisk.com
 ---
  arch/x86/xen/mmu.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

 diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
 index 42086ac..7436283 100644
 --- a/arch/x86/xen/mmu.c
 +++ b/arch/x86/xen/mmu.c
 @@ -737,7 +737,7 @@ pgd_t *xen_get_user_pgd(pgd_t *pgd)
  
   if (offset  pgd_index(USER_LIMIT)) {
   struct page *page = virt_to_page(pgd_page);
 - user_ptr = (pgd_t *)page-private;
 + user_ptr = page-private;
   if (user_ptr)
   user_ptr += offset;
   }

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH 1/4] x86: remove cast from void*

2010-09-14 Thread Jeremy Fitzhardinge
 On 09/14/2010 11:20 AM, H. Peter Anvin wrote:
 On 09/14/2010 10:49 AM, Jeremy Fitzhardinge wrote:
  On 09/13/2010 10:18 PM, matt mooney wrote:
 Unnecessary cast from void* in assignment.
 Not very keen on this.  The cast may not be strictly required, but it
 does document what's going on there.

 But unnecessary casts are problematic in that if the type changes, they
 can hide a real bug in the future.

If page->private changes in a way which invalidates this kind of usage,
I would hope it would also change its name to something more descriptive of
its new role.  If it changes type but is broadly compatible with this
usage (say, void * <-> unsigned long), then the cast won't matter, or
will become a requirement.

I'm more concerned about user_ptr changing type, in which case the
cast will highlight the problem.

In general, the casts from these kinds of generic/polymorphic fields let
you easily eyeball how they're being used, without having to work out
what the type of the other side is.
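
A toy example of the point, with stand-in types rather than the real mm
structures: 'private' is a deliberately generic unsigned long, so the cast
at the use site is what documents (and type-checks against user_ptr) what
is actually stored there:

#include <stdio.h>

struct page { unsigned long private; };         /* generic storage field */
typedef struct { unsigned long pgd; } pgd_t;

int main(void)
{
    pgd_t user_pgd = { 0x1234 };
    struct page pg = { .private = (unsigned long)&user_pgd };

    pgd_t *user_ptr = (pgd_t *)pg.private;      /* cast documents the contents */

    printf("%lx\n", user_ptr->pgd);
    return 0;
}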

J
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


  1   2   3   4   5   6   7   >