Re: [PATCH 0/6] x86: reduce paravirtualized spinlock overhead

2015-05-05 Thread Jeremy Fitzhardinge
On 05/03/2015 10:55 PM, Juergen Gross wrote:
> I did a small measurement of the pure locking functions on bare metal
> without and with my patches.
>
> spin_lock() for the first time (lock and code not in cache) dropped from
> about 600 to 500 cycles.
>
> spin_unlock() for first time dropped from 145 to 87 cycles.
>
> spin_lock() in a loop dropped from 48 to 45 cycles.
>
> spin_unlock() in the same loop dropped from 24 to 22 cycles.

Did you isolate icache hot/cold from dcache hot/cold? It seems to me the
main difference will be whether the branch predictor is warmed up rather
than whether the lock itself is in dcache, but it's much more likely that the
lock code is in icache if the code is lock-intensive, making the cold case
moot. But that's pure speculation.

Could you see any differences in workloads beyond microbenchmarks?

Not that it's my call at all, but I think we'd need to see some concrete
improvements in real workloads before adding the complexity of more pvops.

J


Re: [PATCH 0/6] x86: reduce paravirtualized spinlock overhead

2015-04-30 Thread Jeremy Fitzhardinge
On 04/30/2015 03:53 AM, Juergen Gross wrote:
> Paravirtualized spinlocks produce some overhead even if the kernel is
> running on bare metal. The main reason are the more complex locking
> and unlocking functions. Especially unlocking is no longer just one
> instruction but so complex that it is no longer inlined.
>
> This patch series addresses this issue by adding two more pvops
> functions to reduce the size of the inlined spinlock functions. When
> running on bare metal unlocking is again basically one instruction.

Out of curiosity, is there a measurable difference?

J



Re: [PATCH] x86 spinlock: Fix memory corruption on completing completions

2015-02-11 Thread Jeremy Fitzhardinge

On 02/11/2015 03:28 PM, Linus Torvalds wrote:
>
>
> On Feb 11, 2015 3:15 PM, "Jeremy Fitzhardinge" <jer...@goop.org> wrote:
> >
> > Right now it needs to be a locked operation to prevent read-reordering.
> > x86 memory ordering rules state that all writes are seen in a globally
> > consistent order, and are globally ordered wrt reads *on the same
> > addresses*, but reads to different addresses can be reordered wrt to
> writes.
>
> The modern x86 rules are actually much tighter than that.
>
> Every store is a release, and every load is an acquire. So a
> non-atomic store is actually a perfectly fine unlock. All preceding
> stores will be seen by other cpu's before the unlock, and while reads
> can pass stores, they only pass *earlier* stores.
>

Right, so in this particular instance, the read of the SLOWPATH flag
*can* pass the previous unlock store, hence the need for an atomic
unlock or some other mechanism to prevent the read from being reordered.

J


Re: [PATCH] x86 spinlock: Fix memory corruption on completing completions

2015-02-11 Thread Jeremy Fitzhardinge

On 02/11/2015 09:24 AM, Oleg Nesterov wrote:
> I agree, and I have to admit I am not sure I fully understand why
> unlock uses the locked add. Except we need a barrier to avoid the race
> with the enter_slowpath() users, of course. Perhaps this is the only
> reason?

Right now it needs to be a locked operation to prevent read-reordering.
x86 memory ordering rules state that all writes are seen in a globally
consistent order, and are globally ordered wrt reads *on the same
addresses*, but reads to different addresses can be reordered wrt writes.

So, if the unlocking add were not a locked operation:

__add(&lock->tickets.head, TICKET_LOCK_INC);	/* not locked */

if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
	__ticket_unlock_slowpath(lock, prev);

Then the read of lock->tickets.tail can be reordered before the unlock,
which introduces a race:

/* read reordered here */
if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG)) /* false */
	/* ... */;

/* other CPU sets SLOWPATH and blocks */

__add(&lock->tickets.head, TICKET_LOCK_INC);	/* not locked */

/* other CPU hung */

So it doesn't *have* to be a locked operation. This should also work:

__add(&lock->tickets.head, TICKET_LOCK_INC);	/* not locked */

lfence();	/* prevent read reordering */
if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
	__ticket_unlock_slowpath(lock, prev);

but in practice a locked add is cheaper than an lfence (or at least was).

This *might* be OK, but I think it's on dubious ground:

__add(&lock->tickets.head, TICKET_LOCK_INC);	/* not locked */

/* read overlaps write, and so is ordered */
if (unlikely(lock->head_tail & (TICKET_SLOWPATH_FLAG << TICKET_SHIFT)))
	__ticket_unlock_slowpath(lock, prev);

because I think Intel and AMD differed in interpretation about how
overlapping but different-sized reads & writes are ordered (or it simply
isn't architecturally defined).

If the slowpath flag is moved to head, then it would always have to be
locked anyway, because it needs to be atomic against other CPU's RMW
operations setting the flag.

J


Re: [PATCH] x86 spinlock: Fix memory corruption on completing completions

2015-02-10 Thread Jeremy Fitzhardinge

On 02/10/2015 05:26 AM, Oleg Nesterov wrote:
> On 02/10, Raghavendra K T wrote:
>> On 02/10/2015 06:23 AM, Linus Torvalds wrote:
>>
>>>  add_smp(&lock->tickets.head, TICKET_LOCK_INC);
>>>  if (READ_ONCE(lock->tickets.tail) & TICKET_SLOWPATH_FLAG) ..
>>>
>>> into something like
>>>
>>>  val = xadd(&lock->ticket.head_tail, TICKET_LOCK_INC << TICKET_SHIFT);
>>>  if (unlikely(val & TICKET_SLOWPATH_FLAG)) ...
>>>
>>> would be the right thing to do. Somebody should just check that I got
>>> that shift right, and that the tail is in the high bytes (head really
>>> needs to be high to work, if it's in the low byte(s) the xadd would
>>> overflow from head into tail which would be wrong).
>> Unfortunately xadd could result in head overflow as tail is high.
>>
>> The other option was repeated cmpxchg which is bad I believe.
>> Any suggestions?
> Stupid question... what if we simply move SLOWPATH from .tail to .head?
> In this case arch_spin_unlock() could do xadd(tickets.head) and check
> the result

Well, right now, "tail" is manipulated by locked instructions by CPUs
who are contending for the ticketlock, but head can be manipulated
unlocked by the CPU which currently owns the ticketlock. If SLOWPATH
moved into head, then non-owner CPUs would be touching head, requiring
everyone to use locked instructions on it.

That's the theory, but I don't see much (any?) code which depends on that.

Ideally we could find a way so that pv ticketlocks could use a plain
unlocked add for the unlock like the non-pv case, but I just don't see a
way to do it.
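(For comparison: the non-pv unlock is just

	__add(&lock->tickets.head, TICKET_LOCK_INC, UNLOCK_LOCK_PREFIX);

i.e. an unlocked add, because only the owner ever writes .head, while the pv
unlock has to do a locked add_smp() plus the SLOWPATH check discussed
elsewhere in this thread. I'm quoting the non-pv line from memory, so treat
it as a sketch rather than gospel.)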

> In this case __ticket_check_and_clear_slowpath() really needs to cmpxchg
> the whole .head_tail. Plus obviously more boring changes. This needs a
> separate patch even _if_ this can work.

Definitely.

> BTW. If we move "clear slowpath" into "lock" path, then probably trylock
> should be changed too? Something like below, we just need to clear SLOWPATH
> before cmpxchg.

How important / widely used is trylock these days?

J

>
> Oleg.
>
> --- x/arch/x86/include/asm/spinlock.h
> +++ x/arch/x86/include/asm/spinlock.h
> @@ -109,7 +109,8 @@ static __always_inline int arch_spin_try
>   if (old.tickets.head != (old.tickets.tail & ~TICKET_SLOWPATH_FLAG))
>   return 0;
>  
> - new.head_tail = old.head_tail + (TICKET_LOCK_INC << TICKET_SHIFT);
> + new.tickets.head = old.tickets.head;
> + new.tickets.tail = (old.tickets.tail & ~TICKET_SLOWPATH_FLAG) + 
> TICKET_LOCK_INC;
>  
>   /* cmpxchg is a full barrier, so nothing can move before it */
>   return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) == 
> old.head_tail;
>



Re: [PATCH] x86 spinlock: Fix memory corruption on completing completions

2015-02-08 Thread Jeremy Fitzhardinge
On 02/06/2015 06:49 AM, Raghavendra K T wrote:
> Paravirt spinlock clears slowpath flag after doing unlock.
> As explained by Linus currently it does:
> prev = *lock;
> add_smp(&lock->tickets.head, TICKET_LOCK_INC);
>
> /* add_smp() is a full mb() */
>
> if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
> __ticket_unlock_slowpath(lock, prev);
>
>
> which is *exactly* the kind of things you cannot do with spinlocks,
> because after you've done the "add_smp()" and released the spinlock
> for the fast-path, you can't access the spinlock any more.  Exactly
> because a fast-path lock might come in, and release the whole data
> structure.

Yeah, that's an embarrassingly obvious bug in retrospect.

> Linus suggested that we should not do any writes to lock after unlock(),
> and we can move slowpath clearing to fastpath lock.

Yep, that seems like a sound approach.

> However it brings additional case to be handled, viz., slowpath still
> could be set when somebody does arch_trylock. Handle that too by ignoring
> slowpath flag during lock availability check.
>
> Reported-by: Sasha Levin 
> Suggested-by: Linus Torvalds 
> Signed-off-by: Raghavendra K T 
> ---
>  arch/x86/include/asm/spinlock.h | 70 
> -
>  1 file changed, 34 insertions(+), 36 deletions(-)
>
> diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
> index 625660f..0829f86 100644
> --- a/arch/x86/include/asm/spinlock.h
> +++ b/arch/x86/include/asm/spinlock.h
> @@ -49,6 +49,23 @@ static inline void __ticket_enter_slowpath(arch_spinlock_t 
> *lock)
>   set_bit(0, (volatile unsigned long *)&lock->tickets.tail);
>  }
>  
> +static inline void __ticket_check_and_clear_slowpath(arch_spinlock_t *lock)
> +{
> + arch_spinlock_t old, new;
> + __ticket_t diff;
> +
> + old.tickets = READ_ONCE(lock->tickets);

Couldn't the caller pass in the lock state that it read rather than
re-reading it?
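(Illustratively, something along these lines, reusing the logic from the hunk
above but with the caller handing in the tickets value it already has; a
sketch only, and if that value has gone stale the cmpxchg just fails
harmlessly:

static inline void __ticket_check_and_clear_slowpath(arch_spinlock_t *lock,
						      struct __raw_tickets old_tickets)
{
	arch_spinlock_t old, new;
	__ticket_t diff;

	old.tickets = old_tickets;	/* caller's copy, no re-read */
	diff = (old.tickets.tail & ~TICKET_SLOWPATH_FLAG) - old.tickets.head;

	/* try to clear slowpath flag when there are no contenders */
	if ((old.tickets.tail & TICKET_SLOWPATH_FLAG) &&
	    (diff == TICKET_LOCK_INC)) {
		new = old;
		new.tickets.tail &= ~TICKET_SLOWPATH_FLAG;
		cmpxchg(&lock->head_tail, old.head_tail, new.head_tail);
	}
}
)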

> + diff = (old.tickets.tail & ~TICKET_SLOWPATH_FLAG) - old.tickets.head;
> +
> + /* try to clear slowpath flag when there are no contenders */
> + if ((old.tickets.tail & TICKET_SLOWPATH_FLAG) &&
> + (diff == TICKET_LOCK_INC)) {
> + new = old;
> + new.tickets.tail &= ~TICKET_SLOWPATH_FLAG;
> + cmpxchg(&lock->head_tail, old.head_tail, new.head_tail);
> + }
> +}
> +
>  #else  /* !CONFIG_PARAVIRT_SPINLOCKS */
>  static __always_inline void __ticket_lock_spinning(arch_spinlock_t *lock,
>   __ticket_t ticket)
> @@ -59,6 +76,10 @@ static inline void __ticket_unlock_kick(arch_spinlock_t 
> *lock,
>  {
>  }
>  
> +static inline void __ticket_check_and_clear_slowpath(arch_spinlock_t *lock)
> +{
> +}
> +
>  #endif /* CONFIG_PARAVIRT_SPINLOCKS */
>  
>  static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
> @@ -84,7 +105,7 @@ static __always_inline void arch_spin_lock(arch_spinlock_t 
> *lock)
>   register struct __raw_tickets inc = { .tail = TICKET_LOCK_INC };
>  
>   inc = xadd(&lock->tickets, inc);
> - if (likely(inc.head == inc.tail))
> + if (likely(inc.head == (inc.tail & ~TICKET_SLOWPATH_FLAG)))

The intent of this conditional was to be the quickest possible path when
taking a fastpath lock, with the code below being used for all slowpath
locks (free or taken). So I don't think masking out SLOWPATH_FLAG is
necessary here.

>   goto out;
>  
>   inc.tail &= ~TICKET_SLOWPATH_FLAG;
> @@ -98,7 +119,10 @@ static __always_inline void 
> arch_spin_lock(arch_spinlock_t *lock)
>   } while (--count);
>   __ticket_lock_spinning(lock, inc.tail);
>   }
> -out: barrier();  /* make sure nothing creeps before the lock is taken */
> +out:
> + __ticket_check_and_clear_slowpath(lock);
> +
> + barrier();  /* make sure nothing creeps before the lock is taken */

Which means that if the "goto out" path is only ever used for fastpath
locks, you can limit calling __ticket_check_and_clear_slowpath() to the
slowpath case.
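(Roughly the following, untested and with a made-up label name, so only the
exit taken after spinning pays for the check:

	inc = xadd(&lock->tickets, inc);
	if (likely(inc.head == inc.tail))
		goto out;		/* fastpath: lock was free and not in slow state */

	inc.tail &= ~TICKET_SLOWPATH_FLAG;
	for (;;) {
		unsigned count = SPIN_THRESHOLD;

		do {
			if (READ_ONCE(lock->tickets.head) == inc.tail)
				goto taken_slow;
			cpu_relax();
		} while (--count);
		__ticket_lock_spinning(lock, inc.tail);
	}

taken_slow:
	__ticket_check_and_clear_slowpath(lock);	/* only on the slowpath exit */
out:
	barrier();	/* make sure nothing creeps before the lock is taken */
)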

>  }
>  
>  static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
> @@ -115,47 +139,21 @@ static __always_inline int 
> arch_spin_trylock(arch_spinlock_t *lock)
>   return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) == 
> old.head_tail;
>  }
>  
> -static inline void __ticket_unlock_slowpath(arch_spinlock_t *lock,
> - arch_spinlock_t old)
> -{
> - arch_spinlock_t new;
> -
> - BUILD_BUG_ON(((__ticket_t)NR_CPUS) != NR_CPUS);
> -
> - /* Perform the unlock on the "before" copy */
> - old.tickets.head += TICKET_LOCK_INC;

NB (see below)

> -
> - /* Clear the slowpath flag */
> - new.head_tail = old.head_tail & ~(TICKET_SLOWPATH_FLAG << TICKET_SHIFT);
> -
> - /*
> -  * If the lock is uncontended, clear the flag - us

Re: [PATCH delta V13 14/14] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor

2013-08-13 Thread Jeremy Fitzhardinge
On 08/13/2013 01:02 PM, Raghavendra K T wrote:
> * Ingo Molnar  [2013-08-13 18:55:52]:
>
>> Would be nice to have a delta fix patch against tip:x86/spinlocks, which 
>> I'll then backmerge into that series via rebasing it.
>>
> There was a namespace collision of PER_CPU lock_waiting variable when
> we have both Xen and KVM enabled. 
>
> Perhaps this week wasn't for me. Had run 100 times randconfig in a loop
> for the fix sent earlier :(. 
>
> Ingo, below delta patch should fix it, IIRC, I hope you will be folding this
> back to patch 14/14 itself. Else please let me.
> I have already run allnoconfig, allyesconfig, randconfig with below patch. 
> But will
> test again. This should apply on top of tip:x86/spinlocks.
>
> ---8<---
> From: Raghavendra K T 
>
> Fix Namespace collision for lock_waiting
>
> Signed-off-by: Raghavendra K T 
> ---
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index d442471..b8ef630 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -673,7 +673,7 @@ struct kvm_lock_waiting {
>  static cpumask_t waiting_cpus;
>  
>  /* Track spinlock on which a cpu is waiting */
> -static DEFINE_PER_CPU(struct kvm_lock_waiting, lock_waiting);
> +static DEFINE_PER_CPU(struct kvm_lock_waiting, klock_waiting);

Has static stopped meaning static?

J

>  
>  static void kvm_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
>  {
> @@ -685,7 +685,7 @@ static void kvm_lock_spinning(struct arch_spinlock *lock, 
> __ticket_t want)
>   if (in_nmi())
>   return;
>  
> - w = &__get_cpu_var(lock_waiting);
> + w = &__get_cpu_var(klock_waiting);
>   cpu = smp_processor_id();
>   start = spin_time_start();
>  
> @@ -756,7 +756,7 @@ static void kvm_unlock_kick(struct arch_spinlock *lock, 
> __ticket_t ticket)
>  
>   add_stats(RELEASED_SLOW, 1);
>   for_each_cpu(cpu, &waiting_cpus) {
> - const struct kvm_lock_waiting *w = &per_cpu(lock_waiting, cpu);
> + const struct kvm_lock_waiting *w = &per_cpu(klock_waiting, cpu);
>   if (ACCESS_ONCE(w->lock) == lock &&
>   ACCESS_ONCE(w->want) == ticket) {
>   add_stats(RELEASED_SLOW_KICKED, 1);
>
>



Re: [PATCH RFC V9 1/19] x86/spinlock: Replace pv spinlocks with pv ticketlocks

2013-06-01 Thread Jeremy Fitzhardinge
On 06/01/2013 12:21 PM, Raghavendra K T wrote:
> x86/spinlock: Replace pv spinlocks with pv ticketlocks
>
> From: Jeremy Fitzhardinge 
I'm not sure what the etiquette is here; I did the work while at Citrix,
but jer...@goop.org is my canonical email address.  The Citrix address
is dead and bounces, so is useless for anything.  Probably best to
change it.

J

>
> Rather than outright replacing the entire spinlock implementation in
> order to paravirtualize it, keep the ticket lock implementation but add
> a couple of pvops hooks on the slow path (long spin on lock, unlocking
> a contended lock).
>
> Ticket locks have a number of nice properties, but they also have some
> surprising behaviours in virtual environments.  They enforce a strict
> FIFO ordering on cpus trying to take a lock; however, if the hypervisor
> scheduler does not schedule the cpus in the correct order, the system can
> waste a huge amount of time spinning until the next cpu can take the lock.
>
> (See Thomas Friebel's talk "Prevent Guests from Spinning Around"
> http://www.xen.org/files/xensummitboston08/LHP.pdf for more details.)
>
> To address this, we add two hooks:
>  - __ticket_spin_lock which is called after the cpu has been
>spinning on the lock for a significant number of iterations but has
>failed to take the lock (presumably because the cpu holding the lock
>has been descheduled).  The lock_spinning pvop is expected to block
>the cpu until it has been kicked by the current lock holder.
>  - __ticket_spin_unlock, which on releasing a contended lock
>(there are more cpus with tail tickets), it looks to see if the next
>cpu is blocked and wakes it if so.
>
> When compiled with CONFIG_PARAVIRT_SPINLOCKS disabled, a set of stub
> functions causes all the extra code to go away.
>
> Signed-off-by: Jeremy Fitzhardinge 
> Reviewed-by: Konrad Rzeszutek Wilk 
> Tested-by: Attilio Rao 
> [ Raghavendra: Changed SPIN_THRESHOLD ]
> Signed-off-by: Raghavendra K T 
> ---
>  arch/x86/include/asm/paravirt.h   |   32 
>  arch/x86/include/asm/paravirt_types.h |   10 ++
>  arch/x86/include/asm/spinlock.h   |   53 
> +++--
>  arch/x86/include/asm/spinlock_types.h |4 --
>  arch/x86/kernel/paravirt-spinlocks.c  |   15 +
>  arch/x86/xen/spinlock.c   |8 -
>  6 files changed, 61 insertions(+), 61 deletions(-)
>
> diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
> index cfdc9ee..040e72d 100644
> --- a/arch/x86/include/asm/paravirt.h
> +++ b/arch/x86/include/asm/paravirt.h
> @@ -712,36 +712,16 @@ static inline void __set_fixmap(unsigned /* enum 
> fixed_addresses */ idx,
>  
>  #if defined(CONFIG_SMP) && defined(CONFIG_PARAVIRT_SPINLOCKS)
>  
> -static inline int arch_spin_is_locked(struct arch_spinlock *lock)
> +static __always_inline void __ticket_lock_spinning(struct arch_spinlock 
> *lock,
> + __ticket_t ticket)
>  {
> - return PVOP_CALL1(int, pv_lock_ops.spin_is_locked, lock);
> + PVOP_VCALL2(pv_lock_ops.lock_spinning, lock, ticket);
>  }
>  
> -static inline int arch_spin_is_contended(struct arch_spinlock *lock)
> +static __always_inline void ticket_unlock_kick(struct arch_spinlock 
> *lock,
> + __ticket_t ticket)
>  {
> - return PVOP_CALL1(int, pv_lock_ops.spin_is_contended, lock);
> -}
> -#define arch_spin_is_contended   arch_spin_is_contended
> -
> -static __always_inline void arch_spin_lock(struct arch_spinlock *lock)
> -{
> - PVOP_VCALL1(pv_lock_ops.spin_lock, lock);
> -}
> -
> -static __always_inline void arch_spin_lock_flags(struct arch_spinlock *lock,
> -   unsigned long flags)
> -{
> - PVOP_VCALL2(pv_lock_ops.spin_lock_flags, lock, flags);
> -}
> -
> -static __always_inline int arch_spin_trylock(struct arch_spinlock *lock)
> -{
> - return PVOP_CALL1(int, pv_lock_ops.spin_trylock, lock);
> -}
> -
> -static __always_inline void arch_spin_unlock(struct arch_spinlock *lock)
> -{
> - PVOP_VCALL1(pv_lock_ops.spin_unlock, lock);
> + PVOP_VCALL2(pv_lock_ops.unlock_kick, lock, ticket);
>  }
>  
>  #endif
> diff --git a/arch/x86/include/asm/paravirt_types.h 
> b/arch/x86/include/asm/paravirt_types.h
> index 0db1fca..d5deb6d 100644
> --- a/arch/x86/include/asm/paravirt_types.h
> +++ b/arch/x86/include/asm/paravirt_types.h
> @@ -327,13 +327,11 @@ struct pv_mmu_ops {
>  };
>  
>  struct arch_spinlock;
> +#include 
> +
>  struct pv_lock_ops {
> - int (*spin_is_locked)(st

Re: [PATCH RFC V9 0/19] Paravirtualized ticket spinlocks

2013-06-01 Thread Jeremy Fitzhardinge
On 06/01/2013 01:14 PM, Andi Kleen wrote:
> FWIW I use the paravirt spinlock ops for adding lock elision
> to the spinlocks.

Does lock elision still use the ticketlock algorithm/structure, or are
they different?  If they're still basically ticketlocks, then it seems
to me that they're complementary - hle handles the fastpath, and pv the
slowpath.

> This needs to be done at the top level (so the level you're removing)
>
> However I don't like the pv mechanism very much and would 
> be fine with using an static key hook in the main path
> like I do for all the other lock types.

Right.

J


Re: [PATCH 09/14] xen: events: Remove redundant check on unsigned variable

2012-11-16 Thread Jeremy Fitzhardinge
To be honest I'd nack this kind of patch. The test is only redundant in the
most trivial sense, one that the compiler can easily optimise away anyway. The
point of the test is to make sure that the range stays OK even if the type
subsequently becomes signed (to hold a -ve error, for example).
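For example (contrived illustration, reusing the function from the patch
below with the parameter type changed for the sake of argument):

static unsigned int evtchn_from_irq(int irq)	/* "unsigned irq" made signed, say to carry -EINVAL */
{
	/* ...at which point the "redundant" half of the check is the only
	   thing standing between a negative errno and a bogus lookup: */
	if (unlikely(WARN(irq < 0 || irq >= nr_irqs, "Invalid irq %d!\n", irq)))
		return 0;

	return info_for_irq(irq)->evtchn;
}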

J

Konrad Rzeszutek Wilk  wrote:

>On Fri, Nov 16, 2012 at 12:20:41PM +0530, Tushar Behera wrote:
>> No need to check whether unsigned variable is less than 0.
>> 
>> CC: Konrad Rzeszutek Wilk 
>
>Acked-by: Konrad Rzeszutek Wilk 
>
>> CC: Jeremy Fitzhardinge 
>> CC: xen-de...@lists.xensource.com
>> CC: virtualization@lists.linux-foundation.org
>> Signed-off-by: Tushar Behera 
>> ---
>>  drivers/xen/events.c |2 +-
>>  1 files changed, 1 insertions(+), 1 deletions(-)
>> 
>> diff --git a/drivers/xen/events.c b/drivers/xen/events.c
>> index 4293c57..cadd7d1 100644
>> --- a/drivers/xen/events.c
>> +++ b/drivers/xen/events.c
>> @@ -216,7 +216,7 @@ static void xen_irq_info_pirq_init(unsigned irq,
>>   */
>>  static unsigned int evtchn_from_irq(unsigned irq)
>>  {
>> -if (unlikely(WARN(irq < 0 || irq >= nr_irqs, "Invalid irq %d!\n",
>irq)))
>> +if (unlikely(WARN(irq >= nr_irqs, "Invalid irq %d!\n", irq)))
>>  return 0;
>>  
>>  return info_for_irq(irq)->evtchn;
>> -- 
>> 1.7.4.1

-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.

Re: [PATCH RFC V8 0/17] Paravirtualized ticket spinlocks

2012-05-14 Thread Jeremy Fitzhardinge
On 05/13/2012 11:45 AM, Raghavendra K T wrote:
> On 05/07/2012 08:22 PM, Avi Kivity wrote:
>
> I could not come with pv-flush results (also Nikunj had clarified that
> the result was on NOn PLE
>
>> I'd like to see those numbers, then.
>>
>> Ingo, please hold on the kvm-specific patches, meanwhile.
>>
>
> 3 guests 8GB RAM, 1 used for kernbench
> (kernbench -f -H -M -o 20) other for cpuhog (shell script with  while
> true do hackbench)
>
> 1x: no hogs
> 2x: 8hogs in one guest
> 3x: 8hogs each in two guest
>
> kernbench on PLE:
> Machine : IBM xSeries with Intel(R) Xeon(R)  X7560 2.27GHz CPU with 32
> core, with 8 online cpus and 4*64GB RAM.
>
> The average is taken over 4 iterations with 3 run each (4*3=12). and
> stdev is calculated over mean reported in each run.
>
>
> A): 8 vcpu guest
>
>  BASEBASE+patch %improvement w.r.t
>  mean (sd)   mean (sd) 
> patched kernel time
> case 1*1x:61.7075  (1.17872)60.93 (1.475625)1.27605
> case 1*2x:107.2125 (1.3821349)97.506675 (1.3461878)   9.95401
> case 1*3x:144.3515 (1.8203927)138.9525  (0.58309319)  3.8855
>
>
> B): 16 vcpu guest
>  BASEBASE+patch %improvement w.r.t
>  mean (sd)   mean (sd) 
> patched kernel time
> case 2*1x:70.524   (1.5941395)69.68866  (1.9392529)   1.19867
> case 2*2x:133.0738 (1.4558653)124.8568  (1.4544986)   6.58114
> case 2*3x:206.0094 (1.3437359)181.4712  (2.9134116)   13.5218
>
> B): 32 vcpu guest
>  BASEBASE+patch %improvementw.r.t
>  mean (sd)   mean (sd) 
> patched kernel time
> case 4*1x:100.61046 (2.7603485) 85.48734  (2.6035035)  17.6905

What does the "4*1x" notation mean? Do these workloads have overcommit
of the PCPU resources?

When I measured it, even quite small amounts of overcommit lead to large
performance drops with non-pv ticket locks (on the order of 10%
improvements when there were 5 busy VCPUs on a 4 cpu system).  I never
tested it on larger machines, but I guess that represents around 25%
overcommit, or 40 busy VCPUs on a 32-PCPU system.

J


Re: [PATCH RFC V8 0/17] Paravirtualized ticket spinlocks

2012-05-07 Thread Jeremy Fitzhardinge
On 05/07/2012 06:49 AM, Avi Kivity wrote:
> On 05/07/2012 04:46 PM, Srivatsa Vaddagiri wrote:
>> * Raghavendra K T  [2012-05-07 19:08:51]:
>>
>>> I 'll get hold of a PLE mc  and come up with the numbers soon. but I
>>> 'll expect the improvement around 1-3% as it was in last version.
>> Deferring preemption (when vcpu is holding lock) may give us better than 
>> 1-3% 
>> results on PLE hardware. Something worth trying IMHO.
> Is the improvement so low, because PLE is interfering with the patch, or
> because PLE already does a good job?

How does PLE help with ticket scheduling on unlock?  I thought it would
just help with the actual spin loops.

J


Re: [Xen-devel] [PATCH RFC V6 0/11] Paravirtualized ticketlocks

2012-04-16 Thread Jeremy Fitzhardinge
On 04/16/2012 09:36 AM, Ian Campbell wrote:
> On Mon, 2012-04-16 at 16:44 +0100, Konrad Rzeszutek Wilk wrote:
>> On Sat, Mar 31, 2012 at 09:37:45AM +0530, Srivatsa Vaddagiri wrote:
>>> * Thomas Gleixner  [2012-03-31 00:07:58]:
>>>
 I know that Peter is going to go berserk on me, but if we are running
 a paravirt guest then it's simple to provide a mechanism which allows
 the host (aka hypervisor) to check that in the guest just by looking
 at some global state.

 So if a guest exits due to an external event it's easy to inspect the
 state of that guest and avoid to schedule away when it was interrupted
 in a spinlock held section. That guest/host shared state needs to be
 modified to indicate the guest to invoke an exit when the last nested
 lock has been released.
>>> I had attempted something like that long back:
>>>
>>> http://lkml.org/lkml/2010/6/3/4
>>>
>>> The issue is with ticketlocks though. VCPUs could go into a spin w/o
>>> a lock being held by anybody. Say VCPUs 1-99 try to grab a lock in
>>> that order (on a host with one cpu). VCPU1 wins (after VCPU0 releases it)
>>> and releases the lock. VCPU1 is next eligible to take the lock. If 
>>> that is not scheduled early enough by host, then remaining vcpus would keep 
>>> spinning (even though lock is technically not held by anybody) w/o making 
>>> forward progress.
>>>
>>> In that situation, what we really need is for the guest to hint to host
>>> scheduler to schedule VCPU1 early (via yield_to or something similar). 
>>>
>>> The current pv-spinlock patches however does not track which vcpu is
>>> spinning at what head of the ticketlock. I suppose we can consider 
>>> that optimization in future and see how much benefit it provides (over
>>> plain yield/sleep the way its done now).
>> Right. I think Jeremy played around with this some time?
> 5/11 "xen/pvticketlock: Xen implementation for PV ticket locks" tracks
> which vcpus are waiting for a lock in "cpumask_t waiting_cpus" and
> tracks which lock each is waiting for in per-cpu "lock_waiting". This is
> used in xen_unlock_kick to kick the right CPU. There's a loop over only
> the waiting cpus to figure out who to kick.

Yes, and AFAIK the KVM pv-ticketlock patches do the same thing.  If a
(V)CPU is asleep, then sending it a kick is pretty much equivalent to a
yield to (not precisely, but it should get scheduled soon enough, and it
won't be competing with a pile of VCPUs with no useful work to do).

J


Re: [PATCH RFC V4 4/5] kvm : pv-ticketlocks support for linux guests running on KVM hypervisor

2012-01-18 Thread Jeremy Fitzhardinge
On 01/19/2012 12:54 AM, Srivatsa Vaddagiri wrote:
>
>> That logic relies on the "kick" being level triggered, so that "kick"
>> before "block" will cause the block to fall out immediately.  If you're
>> using "hlt" as the block and it has the usual edge-triggered behaviour,
>> what stops a "kick-before-hlt" from losing the kick?
> Hmm ..'hlt' should result in a check for kick request (in hypervisor
> context) before vcpu is put to sleep. IOW vcpu1 that is attempting to kick 
> vcpu0
> will set a 'somebody_tried_kicking_vcpu0' flag, which hypervisor should check 
> before it puts vcpu0 to sleep because of trapped 'hlt' instruction.
>
> Won't that trap the 'kick-before-hlt' case? What am I missing here?

Nothing, that sounds fine.  It wasn't clear to me that your kick
operation left persistent state, and so has a level-triggered effect on hlt.

J


Re: [PATCH RFC V4 4/5] kvm : pv-ticketlocks support for linux guests running on KVM hypervisor

2012-01-17 Thread Jeremy Fitzhardinge
On 01/17/2012 10:33 PM, Srivatsa Vaddagiri wrote:
> * Marcelo Tosatti  [2012-01-17 09:02:11]:
>
>>> +/* Kick vcpu waiting on @lock->head to reach value @ticket */
>>> +static void kvm_unlock_kick(struct arch_spinlock *lock, __ticket_t ticket)
>>> +{
>>> +   int cpu;
>>> +   int apicid;
>>> +
>>> +   add_stats(RELEASED_SLOW, 1);
>>> +
>>> +   for_each_cpu(cpu, &waiting_cpus) {
>>> +   const struct kvm_lock_waiting *w = &per_cpu(lock_waiting, cpu);
>>> +   if (ACCESS_ONCE(w->lock) == lock &&
>>> +   ACCESS_ONCE(w->want) == ticket) {
>>> +   add_stats(RELEASED_SLOW_KICKED, 1);
>>> +   apicid = per_cpu(x86_cpu_to_apicid, cpu);
>>> +   kvm_kick_cpu(apicid);
>>> +   break;
>>> +   }
>>> +   }
>> What prevents a kick from being lost here, if say, the waiter is at
>> local_irq_save in kvm_lock_spinning, before the lock/want assignments?
> The waiter does check for lock becoming available before actually
> sleeping:
>
> + /*
> +* check again make sure it didn't become free while
> +* we weren't looking.
> +*/
> + if (ACCESS_ONCE(lock->tickets.head) == want) {
> +   add_stats(TAKEN_SLOW_PICKUP, 1);
> +   goto out;
> + }

That logic relies on the "kick" being level triggered, so that "kick"
before "block" will cause the block to fall out immediately.  If you're
using "hlt" as the block and it has the usual edge-triggered behaviour,
what stops a "kick-before-hlt" from losing the kick?
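To spell out the interleaving I'm worried about:

/*
 *   waiter vcpu                          kicker vcpu
 *   -----------                          -----------
 *   w->lock = lock; w->want = want;
 *   recheck lock->tickets.head != want
 *                                        sees w->lock/w->want match
 *                                        kick(waiter)   <- delivered here
 *   hlt();
 *
 * If the kick isn't latched anywhere, that hlt sleeps until some unrelated
 * interrupt happens along.
 */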

J


Re: [PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests

2012-01-16 Thread Jeremy Fitzhardinge
On 01/16/2012 09:24 PM, Alexander Graf wrote:
> This is true in case you're spinning. If on overcommit spinlocks would
> instead of spin just yield(), we wouldn't have any vcpu running that's
> just waiting for a late ticket.

Yes, but the reality is that most spinlocks are held for a short period
of time and there's a low likelihood of being preempted while within a
spinlock critical section.  Therefore if someone else tries to get the
spinlock and there's contention, it's always worth spinning for a little
while because the lock will likely become free soon.

At least that's the case if the lock has low contention (shallow queue
depth and not in slow state).  Again, maybe it makes sense to never spin
for deep queues or already slowstate locks.

> We still have an issue finding the point in time when a vcpu could run again, 
> which is what this whole series is about. My point above was that instead of 
> doing a count loop, we could just do the normal spin dance and set the 
> threshold to when we enable the magic to have another spin lock notify us in 
> the CPU. That way we
>
>   * don't change the uncontended case

I don't follow you.  What do you mean by "the normal spin dance"?  What
do you mean by "have another spinlock notify us in the CPU"?  Don't
change which uncontended case?  Do you mean in the locking path?  Or the
unlock path?  Or both?

>   * can set the threshold on the host, which knows how contended the system is

Hm, I'm not convinced that knowing how contended the system is is all
that useful overall.  What's important is how contended a particular
lock is, and what state the current holder is in.  If it's not currently
running, then knowing the overall system contention would give you some
idea about how long you need to wait for it to be rescheduled, but
that's getting pretty indirect.

I think the "slowpath if preempted while spinning" idea I mentioned in
the other mail is probably worth following up, since that gives specific
actionable information to the guest from the hypervisor.  But lots of
caveats.

[[
A possible mechanism:

  * register ranges of [er]ips with the hypervisor
  * each range is paired with a "resched handler block"
  * if vcpu is preempted within such a range, make sure it is
rescheduled in the resched handler block

This is obviously akin to the exception mechanism, but it is partially
implemented by the hypervisor.  It allows the spinlock code to be
unchanged from native, but make use of a resched rather than an explicit
counter to determine when to slowpath the lock.  And it's a nice general
mechanism that could be potentially useful elsewhere.

Unfortunately, it doesn't change the unlock path at all; it still needs
to explicitly test if a VCPU needs to be kicked on unlock.
]]
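(To make that slightly less hand-wavy, the registration could look something
like the following; nothing like this exists today and every name here is
invented:

struct resched_range {
	unsigned long start, end;	/* [er]ip range covering the spin loop */
	unsigned long handler;		/* entry point of the "resched handler block" */
};

/* guest -> hypervisor, registered once per range at boot (hypothetical) */
int hv_register_resched_ranges(const struct resched_range *ranges, unsigned int nr);
)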


> And since we control what spin locks look like, we can for example always 
> keep the pointer to it in a specific register so that we can handle 
> pv_lock_ops.lock_spinning() inside there and fetch all the information we 
> need from our pt_regs.

You've left a pile of parts of an idea lying around, but I'm not sure
what shape you intend it to be.

>>> Speaking of which - have you benchmarked performance degradation of pv 
>>> ticket locks on bare metal? Last time I checked, enabling all the PV ops 
>>> did incur significant slowdown which is why I went though the work to split 
>>> the individual pv ops features up to only enable a few for KVM guests.
>> The whole point of the pv-ticketlock work is to keep the pvops hooks out of 
>> the locking fast path, so that the calls are only made on the slow path - 
>> that is, when spinning too long on a contended lock, and when releasing a 
>> lock that's in a "slow" state.  In the fast path case of no contention, 
>> there are no pvops, and the executed code path is almost identical to native.
> You're still changing a tight loop that does nothing (CPU detects it and 
> saves power) into something that performs calculations.

It still has a "pause" instruction in that loop, so that CPU mechanism
will still come into play.  "pause" doesn't directly "save power"; it's
more about making sure that memory dependence cycles are broken and that
two competing threads will make similar progress.  Besides I'm not sure
adding a dec+test to a loop that's already got a memory read and compare
in it is adding much in the way of "calculations".
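Schematically, the before/after of that loop body looks like this
(paraphrasing, not quoting the series):

	/* native: */
	while (ACCESS_ONCE(lock->tickets.head) != inc.tail)
		cpu_relax();				/* "pause" */

	/* pv series: same read, compare and pause, plus one dec+test */
	do {
		if (ACCESS_ONCE(lock->tickets.head) == inc.tail)
			break;
		cpu_relax();				/* "pause" still here */
	} while (--count);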

J


Re: [PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests

2012-01-16 Thread Jeremy Fitzhardinge
On 01/16/2012 07:55 PM, Avi Kivity wrote:
> On 01/16/2012 08:40 AM, Jeremy Fitzhardinge wrote:
>>> That means we're spinning for n cycles, then notify the spinlock holder 
>>> that we'd like to get kicked and go sleeping. While I'm pretty sure that it 
>>> improves the situation, it doesn't solve all of the issues we have.
>>>
>>> Imagine we have an idle host. All vcpus can freely run and everyone can 
>>> fetch the lock as fast as on real machines. We don't need to / want to go 
>>> to sleep here. Locks that take too long are bugs that need to be solved on 
>>> real hw just as well, so all we do is possibly incur overhead.
>> I'm not quite sure what your concern is.  The lock is under contention, so 
>> there's nothing to do except spin; all this patch adds is a variable 
>> decrement/test to the spin loop, but that's not going to waste any more CPU 
>> than the non-counting case.  And once it falls into the blocking path, its a 
>> win because the VCPU isn't burning CPU any more.
> The wakeup path is slower though.  The previous lock holder has to
> hypercall, and the new lock holder has to be scheduled, and transition
> from halted state to running (a vmentry).  So it's only a clear win if
> we can do something with the cpu other than go into the idle loop.

Not burning power is a win too.

Actually what you want is something like "if you preempt a VCPU while
it's spinning in a lock, then push it into the slowpath and don't
reschedule it without a kick".  But I think that interface would have a
lot of fiddly corners.

J


Re: [PATCH RFC V4 4/5] kvm : pv-ticketlocks support for linux guests running on KVM hypervisor

2012-01-16 Thread Jeremy Fitzhardinge
On 01/17/2012 01:47 AM, Avi Kivity wrote:
> On 01/16/2012 04:13 PM, Raghavendra K T wrote:
>>> Please drop all of these and replace with tracepoints in the appropriate
>>> spots.  Everything else (including the historgram) can be reconstructed
>>> the tracepoints in userspace.
>>>
>>
>> I think Jeremy pointed that tracepoints use spinlocks and hence debugfs
>> is the option.. no ?
>>
> Yeah, I think you're right.  What a pity.

Yes, I went through the same thought process.

J


Re: [PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests

2012-01-15 Thread Jeremy Fitzhardinge
problem gets.

> Speaking of which - have you benchmarked performance degradation of pv ticket 
> locks on bare metal? Last time I checked, enabling all the PV ops did incur 
> significant slowdown which is why I went though the work to split the 
> individual pv ops features up to only enable a few for KVM guests.

The whole point of the pv-ticketlock work is to keep the pvops hooks out of the 
locking fast path, so that the calls are only made on the slow path - that is, 
when spinning too long on a contended lock, and when releasing a lock that's in 
a "slow" state.  In the fast path case of no contention, there are no pvops, 
and the executed code path is almost identical to native.

But as I mentioned above, I'd like to see some benchmarks to prove that's the 
case.

J

> 
>> 
>> Changes in V4:
>> - reabsed to 3.2.0 pre.
>> - use APIC ID for kicking the vcpu and use kvm_apic_match_dest for matching. 
>> (Avi)
>> - fold vcpu->kicked flag into vcpu->requests (KVM_REQ_PVLOCK_KICK) and 
>> related 
>> changes for UNHALT path to make pv ticket spinlock migration friendly. (Avi, 
>> Marcello)
>> - Added Documentation for CPUID, Hypercall (KVM_HC_KICK_CPU)
>> and capabilty (KVM_CAP_PVLOCK_KICK) (Avi)
>> - Remove unneeded kvm_arch_vcpu_ioctl_set_mpstate call. (Marcello)
>> - cumulative variable type changed (int ==> u32) in add_stat (Konrad)
>> - remove unneeded kvm_guest_init for !CONFIG_KVM_GUEST case
>> 
>> Changes in V3:
>> - rebased to 3.2-rc1
>> - use halt() instead of wait for kick hypercall.
>> - modify kick hyper call to do wakeup halted vcpu.
>> - hook kvm_spinlock_init to smp_prepare_cpus call (moved the call out of 
>> head##.c).
>> - fix the potential race when zero_stat is read.
>> - export debugfs_create_32 and add documentation to API.
>> - use static inline and enum instead of ADDSTAT macro. 
>> - add  barrier() in after setting kick_vcpu.
>> - empty static inline function for kvm_spinlock_init.
>> - combine the patches one and two readuce overhead.
>> - make KVM_DEBUGFS depends on DEBUGFS.
>> - include debugfs header unconditionally.
>> 
>> Changes in V2:
>> - rebased patchesto -rc9
>> - synchronization related changes based on Jeremy's changes 
>> (Jeremy Fitzhardinge ) pointed by 
>> Stephan Diestelhorst 
>> - enabling 32 bit guests
>> - splitted patches into two more chunks
>> 
>> Srivatsa Vaddagiri, Suzuki Poulose, Raghavendra K T (5): 
>> Add debugfs support to print u32-arrays in debugfs
>> Add a hypercall to KVM hypervisor to support pv-ticketlocks
>> Added configuration support to enable debug information for KVM Guests
>> pv-ticketlocks support for linux guests running on KVM hypervisor
>> Add documentation on Hypercalls and features used for PV spinlock
>> 
>> Test Set up :
>> The BASE patch is pre 3.2.0 + Jeremy's following patches.
>> xadd (https://lkml.org/lkml/2011/10/4/328)
>> x86/ticketlocklock  (https://lkml.org/lkml/2011/10/12/496).
>> Kernel for host/guest : 3.2.0 + Jeremy's xadd, pv spinlock patches as BASE
>> (Note:locked add change is not taken yet)
>> 
>> Results:
>> The performance gain is mainly because of reduced busy-wait time.
>> From the results we can see that patched kernel performance is similar to
>> BASE when there is no lock contention. But once we start seeing more
>> contention, patched kernel outperforms BASE (non PLE).
>> On PLE machine we do not see greater performance improvement because of PLE
>> complimenting halt()
>> 
>> 3 guests with 8VCPU, 4GB RAM, 1 used for kernbench
>> (kernbench -f -H -M -o 20) other for cpuhog (shell script while
>> true with an instruction)
>> 
>> scenario A: unpinned
>> 
>> 1x: no hogs
>> 2x: 8hogs in one guest
>> 3x: 8hogs each in two guest
>> 
>> scenario B: unpinned, run kernbench on all the guests no hogs.
>> 
>> Dbench on PLE machine:
>> dbench run on all the guest simultaneously with
>> dbench --warmup=30 -t 120 with NRCLIENTS=(8/16/32).
>> 
>> Result for Non PLE machine :
>> 
>> Machine : IBM xSeries with Intel(R) Xeon(R) x5570 2.93GHz CPU with 8 core , 
>> 64GB RAM
>>   BASEBASE+patch%improvement
>>   mean (sd)   mean (sd)
>> Scenario A:
>> case 1x:  164.233 (16.5506)   163.584 (15.4598   0.39517
>> case 2x:  897.654 (543.993)   328.63 (103.771)   63.3901
>> case 3x:  2855.73 (2201.41)   315.029 (111.854)

Re: [Xen-devel] [PATCH] xen: remove CONFIG_XEN_DOM0 compile option

2012-01-12 Thread Jeremy Fitzhardinge
On 01/10/2012 05:44 AM, Stefano Stabellini wrote:
> On Mon, 9 Jan 2012, Andrew Jones wrote:
>> I guess if we did the s/XEN_DOM0/LOCAL_APIC && IO_APIC && ACPI/ in
>> arch/x86/pci/xen.c it would be pretty easy to review for equivalence.
>> Then keep CONFIG_PRIVILIGED, but drop XEN_DOM0 from everywhere else and
>> compile in the 3 files. I don't think it makes much sense to do it
>> though. XEN_DOM0 keeps things tidier now and might be useful later.
> we can keep things clean with the following:
>
> #ifdef CONFIG_LOCAL_APIC && CONFIG_IO_APIC && CONFIG_ACPI && CONFIG_PCI_XEN
>
> #define XEN_DOM0
>
> #endif
>
> in include/xen/xen.h.
>
> So in the source files we can still '#ifdef XEN_DOM0', but at the same
> time we can get rid of the build symbol: everybody wins.

No, really, I think this is silly.  This kind of dependency information
should be encoded in the Kconfig system, and not reimplemented in an
ad-hoc way.

If there were a clean way to do what Andrew wants then I'd support it,
but this thread has descended into a spiral of madness, which makes me
think it's a lost cause.

If the root complaint is that "customers think that anything set in
.config is a supported feature", then the solutions are to support all
the features in .config, re-educate the customers that they're wrong, or
maintain a local patch to do this stuff.

J


Re: [PATCH v2 0/6] Collected vdso/vsyscall fixes for 3.1

2011-08-11 Thread Jeremy Fitzhardinge
On 08/03/2011 06:59 AM, Andrew Lutomirski wrote:
> On Wed, Aug 3, 2011 at 9:56 AM, Konrad Rzeszutek Wilk
>  wrote:
>> On Wed, Aug 03, 2011 at 09:53:22AM -0400, Konrad Rzeszutek Wilk wrote:
>>> On Wed, Aug 03, 2011 at 09:31:48AM -0400, Andy Lutomirski wrote:
 This fixes various problems that cropped up with the vdso patches.

  - Patch 1 fixes an information leak to userspace.
  - Patches 2 and 3 fix the kernel build on gold.
  - Patches 4 and 5 fix Xen (I hope).
  - Patch 6 (optional) adds a trace event to vsyscall emulation.  It will
make it easier to handle performance regression reports :)
>>> Hm, you seemed to have the x86 maintainers on your email..
>> I definitly need some coffee. What I meant was that you missing putting
>> the x86 maintainers on this patchset. They are the ones that will handle 
>> this.
>>
>> I put them on the To list for you.
> Are you sure about that coffee?  I'm pretty sure I had x...@kernel.org in 
> there.

What's the state of this series?  Has tip.git picked it up, or does it
need another go-around?

I just booted a Linus tree and got a backtrace that looks like this
issue, though I haven't looked into it in detail yet.

Thanks,
J


Re: [Xen-devel] Re: linux-next: Tree for July 25 (xen)

2011-08-10 Thread Jeremy Fitzhardinge
On 08/08/2011 06:01 AM, Steven Rostedt wrote:
> On Sat, 2011-08-06 at 11:22 -0700, Jeremy Fitzhardinge wrote:
>> On 08/05/2011 06:58 PM, Konrad Rzeszutek Wilk wrote:
>>> On Fri, Aug 05, 2011 at 11:14:23PM +0200, Ingo Molnar wrote:
>>>> * Ingo Molnar  wrote:
>>>>
>>>>> * Randy Dunlap  wrote:
>>>>>
>>>>>> On 08/04/11 15:40, Konrad Rzeszutek Wilk wrote:
>>>>>>> On Thu, Aug 04, 2011 at 06:32:59PM -0400, Konrad Rzeszutek Wilk wrote:
>>>>>>>>>>> These build failures are still triggering upstream:
>>>>>>>>>>>
>>>>>>>>>>>  arch/x86/xen/trace.c:44:2: error: array index in initializer not 
>>>>>>>>>>> of integer type
>>>>>>>>>>>  arch/x86/xen/trace.c:44:2: error: (near initialization for 
>>>>>>>>>>> ‘xen_hypercall_names’)
>>>>>>>>>>>  arch/x86/xen/trace.c:45:1: error: ‘__HYPERVISOR_arch_4’ undeclared 
>>>>>>>>>>> here (not in a function)
>>>>>>>>>>>  arch/x86/xen/trace.c:45:2: error: array index in initializer not 
>>>>>>>>>>> of integer type
>>>>>>>>>>>  arch/x86/xen/trace.c:45:2: error: (near initialization for 
>>>>>>>>>>> ‘xen_hypercall_names’)
>>>>>>>>>> Oh, that I haven't seen. Can you send me the .config for that please.
>>>>>>>>> You can't be trying very hard then.  I see lots of these (but no,
>>>>>>>> Ah, I am getting it now. Thanks for reporting it.
>>>>>>> This should do the trick:
>>>>>> Acked-by: Randy Dunlap 
>>>>> That patch did the trick here too:
>>>>>
>>>>> Acked-by: Ingo Molnar 
>>>> Except that i'm still seeing the occasional build failure - see the 
>>>> error log below. Config attached.
>>> Much appreciate for the report. I believe this fix (which I think hit
>>> linux-next yesterday?) should do that:
>>>
>>> commit 1e9ea2656b656edd3c8de98675bbc0340211b5bd
>>> Author: Jeremy Fitzhardinge 
>>> Date:   Wed Aug 3 09:43:44 2011 -0700
>>>
>>>  xen/tracing: it looks like we wanted CONFIG_FTRACE
>>> 
>>> Apparently we wanted CONFIG_FTRACE rather the CONFIG_FUNCTION_TRACER.
>>> 
>>> Reported-by: Sander Eikelenboom 
>>> Tested-by: Sander Eikelenboom 
>>> Signed-off-by: Jeremy Fitzhardinge 
>>> Signed-off-by: Konrad Rzeszutek Wilk 
>>>
>>> diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile
>>> index 45e94ac..3326204 100644
>>> --- a/arch/x86/xen/Makefile
>>> +++ b/arch/x86/xen/Makefile
>>> @@ -15,7 +15,7 @@ obj-y := enlighten.o setup.o multicalls.o 
>>> mmu.o irq.o \
>>> grant-table.o suspend.o platform-pci-unplug.o \
>>> p2m.o
>>>  
>>> -obj-$(CONFIG_FUNCTION_TRACER) += trace.o
>>> +obj-$(CONFIG_FTRACE) += trace.o
>>>  
>> I'm not sure this is correct either.  Maybe it should be
>> CONFIG_TRACEPOINTS?  Steven?
> Actually, I believe the correct answer is:
>
> CONFIG_EVENT_TRACING

OK, thanks.

J

Re: [Xen-devel] Re: linux-next: Tree for July 25 (xen)

2011-08-06 Thread Jeremy Fitzhardinge
On 08/05/2011 06:58 PM, Konrad Rzeszutek Wilk wrote:
> On Fri, Aug 05, 2011 at 11:14:23PM +0200, Ingo Molnar wrote:
>> * Ingo Molnar  wrote:
>>
>>> * Randy Dunlap  wrote:
>>>
>>>> On 08/04/11 15:40, Konrad Rzeszutek Wilk wrote:
>>>>> On Thu, Aug 04, 2011 at 06:32:59PM -0400, Konrad Rzeszutek Wilk wrote:
>>>>>>>>> These build failures are still triggering upstream:
>>>>>>>>>
>>>>>>>>>  arch/x86/xen/trace.c:44:2: error: array index in initializer not of 
>>>>>>>>> integer type
>>>>>>>>>  arch/x86/xen/trace.c:44:2: error: (near initialization for 
>>>>>>>>> ‘xen_hypercall_names’)
>>>>>>>>>  arch/x86/xen/trace.c:45:1: error: ‘__HYPERVISOR_arch_4’ undeclared 
>>>>>>>>> here (not in a function)
>>>>>>>>>  arch/x86/xen/trace.c:45:2: error: array index in initializer not of 
>>>>>>>>> integer type
>>>>>>>>>  arch/x86/xen/trace.c:45:2: error: (near initialization for 
>>>>>>>>> ‘xen_hypercall_names’)
>>>>>>>> Oh, that I haven't seen. Can you send me the .config for that please.
>>>>>>> You can't be trying very hard then.  I see lots of these (but no,
>>>>>> Ah, I am getting it now. Thanks for reporting it.
>>>>> This should do the trick:
>>>> Acked-by: Randy Dunlap 
>>> That patch did the trick here too:
>>>
>>> Acked-by: Ingo Molnar 
>> Except that i'm still seeing the occasional build failure - see the 
>> error log below. Config attached.
> Much appreciate for the report. I believe this fix (which I think hit
> linux-next yesterday?) should do that:
>
> commit 1e9ea2656b656edd3c8de98675bbc0340211b5bd
> Author: Jeremy Fitzhardinge 
> Date:   Wed Aug 3 09:43:44 2011 -0700
>
>  xen/tracing: it looks like we wanted CONFIG_FTRACE
> 
> Apparently we wanted CONFIG_FTRACE rather the CONFIG_FUNCTION_TRACER.
> 
> Reported-by: Sander Eikelenboom 
> Tested-by: Sander Eikelenboom 
> Signed-off-by: Jeremy Fitzhardinge 
> Signed-off-by: Konrad Rzeszutek Wilk 
>
> diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile
> index 45e94ac..3326204 100644
> --- a/arch/x86/xen/Makefile
> +++ b/arch/x86/xen/Makefile
> @@ -15,7 +15,7 @@ obj-y   := enlighten.o setup.o multicalls.o 
> mmu.o irq.o \
>   grant-table.o suspend.o platform-pci-unplug.o \
>   p2m.o
>  
> -obj-$(CONFIG_FUNCTION_TRACER) += trace.o
> +obj-$(CONFIG_FTRACE) += trace.o
>  

I'm not sure this is correct either.  Maybe it should be
CONFIG_TRACEPOINTS?  Steven?

Thanks,
J

Re: [PATCH 5/5] x86-64: Add user_64bit_mode paravirt op

2011-07-27 Thread Jeremy Fitzhardinge
On 07/26/2011 08:20 PM, Andy Lutomirski wrote:
> Three places in the kernel assume that the only long mode CPL 3
> selector is __USER_CS.  This is not true on Xen -- Xen's sysretq
> changes cs to the magic value 0xe033.
>
> Two of the places are corner cases, but as of "x86-64: Improve
> vsyscall emulation CS and RIP handling"
> (c9712944b2a12373cb6ff8059afcfb7e826a6c54), vsyscalls will segfault
> if called with Xen's extra CS selector.  This causes a panic when
> older init builds die.
>
> It seems impossible to make Xen use __USER_CS reliably without
> taking a performance hit on every system call, so this fixes the
> tests instead with a new paravirt op.  It's a little ugly because
> ptrace.h can't include paravirt.h.
>
> Signed-off-by: Andy Lutomirski 
> Reported-by: Konrad Rzeszutek Wilk 
> ---
>  arch/x86/include/asm/desc.h   |4 ++--
>  arch/x86/include/asm/paravirt_types.h |6 ++
>  arch/x86/include/asm/ptrace.h |   19 +++
>  arch/x86/kernel/paravirt.c|4 
>  arch/x86/kernel/step.c|2 +-
>  arch/x86/kernel/vsyscall_64.c |6 +-
>  arch/x86/mm/fault.c   |2 +-
>  arch/x86/xen/enlighten.c  |1 +
>  8 files changed, 35 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
> index 7b439d9..41935fa 100644
> --- a/arch/x86/include/asm/desc.h
> +++ b/arch/x86/include/asm/desc.h
> @@ -27,8 +27,8 @@ static inline void fill_ldt(struct desc_struct *desc, const 
> struct user_desc *in
>  
>   desc->base2 = (info->base_addr & 0xff00) >> 24;
>   /*
> -  * Don't allow setting of the lm bit. It is useless anyway
> -  * because 64bit system calls require __USER_CS:
> +  * Don't allow setting of the lm bit. It would confuse
> +  * user_64bit_mode and would get overridden by sysret anyway.
>*/
>   desc->l = 0;
>  }
> diff --git a/arch/x86/include/asm/paravirt_types.h 
> b/arch/x86/include/asm/paravirt_types.h
> index 2c76521..8e8b9a4 100644
> --- a/arch/x86/include/asm/paravirt_types.h
> +++ b/arch/x86/include/asm/paravirt_types.h
> @@ -41,6 +41,7 @@
>  
>  #include 
>  #include 
> +#include 
>  
>  struct page;
>  struct thread_struct;
> @@ -63,6 +64,11 @@ struct paravirt_callee_save {
>  struct pv_info {
>   unsigned int kernel_rpl;
>   int shared_kernel_pmd;
> +
> +#ifdef CONFIG_X86_64
> + u16 extra_user_64bit_cs;  /* __USER_CS if none */
> +#endif
> +
>   int paravirt_enabled;
>   const char *name;
>  };
> diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
> index 94e7618..3566454 100644
> --- a/arch/x86/include/asm/ptrace.h
> +++ b/arch/x86/include/asm/ptrace.h
> @@ -131,6 +131,9 @@ struct pt_regs {
>  #ifdef __KERNEL__
>  
>  #include 
> +#ifdef CONFIG_PARAVIRT
> +#include 
> +#endif
>  
>  struct cpuinfo_x86;
>  struct task_struct;
> @@ -187,6 +190,22 @@ static inline int v8086_mode(struct pt_regs *regs)
>  #endif
>  }
>  
> +#ifdef CONFIG_X86_64
> +static inline bool user_64bit_mode(struct pt_regs *regs)
> +{
> +#ifndef CONFIG_PARAVIRT
> + /*
> +  * On non-paravirt systems, this is the only long mode CPL 3
> +  * selector.  We do not allow long mode selectors in the LDT.
> +  */
> + return regs->cs == __USER_CS;
> +#else
> + /* Headers are too twisted for this to go in paravirt.h. */
> + return regs->cs == __USER_CS || regs->cs == pv_info.extra_user_64bit_cs;

Is this necessary because usermode may sometimes be on __USER_CS or
sometimes on Xen's?  Could we just commit to one or the other and make
it a simple comparison?

What if __USER_CS were a variable?
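(i.e., hypothetically, with __USER_CS turned into a variable that Xen could
overwrite at boot, the test would stay a single comparison:

	return regs->cs == __USER_CS;	/* __USER_CS now a per-boot variable */

That's purely a thought experiment; today __USER_CS is a compile-time
constant selector.)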

J
> +#endif
> +}
> +#endif
> +
>  /*
>   * X86_32 CPUs don't save ss and esp if the CPU is already in kernel mode
>   * when it traps.  The previous stack will be directly underneath the saved
> diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
> index 613a793..d90272e 100644
> --- a/arch/x86/kernel/paravirt.c
> +++ b/arch/x86/kernel/paravirt.c
> @@ -307,6 +307,10 @@ struct pv_info pv_info = {
>   .paravirt_enabled = 0,
>   .kernel_rpl = 0,
>   .shared_kernel_pmd = 1, /* Only used when CONFIG_X86_PAE is set */
> +
> +#ifdef CONFIG_X86_64
> + .extra_user_64bit_cs = __USER_CS,
> +#endif
>  };
>  
>  struct pv_init_ops pv_init_ops = {
> diff --git a/arch/x86/kernel/step.c b/arch/x86/kernel/step.c
> index 7977f0c..c346d11 100644
> --- a/arch/x86/kernel/step.c
> +++ b/arch/x86/kernel/step.c
> @@ -74,7 +74,7 @@ static int is_setting_trap_flag(struct task_struct *child, 
> struct pt_regs *regs)
>  
>  #ifdef CONFIG_X86_64
>   case 0x40 ... 0x4f:
> - if (regs->cs != __USER_CS)
> + if (!user_64bit_mode(regs))
>   /* 32-bit mode: register increment */
>   return 0;
>   /* 64-bit mode: REX p

Re: [PATCH 5/7] Xen: fix whitespaces, tabs coding style issue in drivers/xen/xenbus/xenbus_client.c

2011-07-26 Thread Jeremy Fitzhardinge
On 07/26/2011 04:16 AM, ruslanpisa...@gmail.com wrote:
> From: Ruslan Pisarev 
>
> This is a patch to the xenbus_client.c file that fixed up whitespaces, tabs 
> errors found by the checkpatch.pl tools.
>
> Signed-off-by: Ruslan Pisarev 
> ---
>  drivers/xen/xenbus/xenbus_client.c |   18 +-
>  1 files changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/xen/xenbus/xenbus_client.c 
> b/drivers/xen/xenbus/xenbus_client.c
> index cdacf92..ff4f03c 100644
> --- a/drivers/xen/xenbus/xenbus_client.c
> +++ b/drivers/xen/xenbus/xenbus_client.c
> @@ -43,15 +43,15 @@
>  const char *xenbus_strstate(enum xenbus_state state)
>  {
>   static const char *const name[] = {
> - [ XenbusStateUnknown  ] = "Unknown",
> - [ XenbusStateInitialising ] = "Initialising",
> - [ XenbusStateInitWait ] = "InitWait",
> - [ XenbusStateInitialised  ] = "Initialised",
> - [ XenbusStateConnected] = "Connected",
> - [ XenbusStateClosing  ] = "Closing",
> - [ XenbusStateClosed   ] = "Closed",
> - [XenbusStateReconfiguring] = "Reconfiguring",
> - [XenbusStateReconfigured] = "Reconfigured",
> + [XenbusStateUnknown] =  "Unknown",
> + [XenbusStateInitialising] = "Initialising",
> + [XenbusStateInitWait] = "InitWait",
> + [XenbusStateInitialised] =  "Initialised",
> + [XenbusStateConnected] ="Connected",
> + [XenbusStateClosing] =  "Closing",
> + [XenbusStateClosed] =   "Closed",
> + [XenbusStateReconfiguring] ="Reconfiguring",
> + [XenbusStateReconfigured] = "Reconfigured",
>   };

Eh, I think this looks worse now.

J
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH 09/11] xen/xenbus: use printk_ratelimited() instead of printk_ratelimit()

2011-06-16 Thread Jeremy Fitzhardinge
On 06/16/2011 05:14 AM, Manuel Zerpies wrote:
> Since printk_ratelimit() shouldn't be used anymore (see comment in
> include/linux/printk.h), replace it with printk_ratelimited()
>
Looks OK to me, but please fix the indentation of the rest of the
statement to match.
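
i.e. something along these lines (indentation shown approximately):

		printk_ratelimited(KERN_WARNING
				   "XENBUS unexpected type [%d], expected [%d]\n",
				   msg.type, type);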

Thanks,
J

> Signed-off-by: Manuel Zerpies 
> ---
>  drivers/xen/xenbus/xenbus_xs.c |4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/xen/xenbus/xenbus_xs.c b/drivers/xen/xenbus/xenbus_xs.c
> index 5534690..4055858 100644
> --- a/drivers/xen/xenbus/xenbus_xs.c
> +++ b/drivers/xen/xenbus/xenbus_xs.c
> @@ -44,6 +44,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include "xenbus_comms.h"
>  
> @@ -270,8 +271,7 @@ static void *xs_talkv(struct xenbus_transaction t,
>   }
>  
>   if (msg.type != type) {
> - if (printk_ratelimit())
> - printk(KERN_WARNING
> + printk_ratelimited(KERN_WARNING
>  "XENBUS unexpected type [%d], expected [%d]\n",
>  msg.type, type);
>   kfree(ret);

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [patch] xen: off by one errors in multicalls.c

2011-06-03 Thread Jeremy Fitzhardinge
On 06/02/2011 09:45 PM, Dan Carpenter wrote:
> b->args[] has MC_ARGS elements, so the comparison here should be
> ">=" instead of ">".  Otherwise we read past the end of the array
> one space.

Yeah, looks like a correct fix.  Fortunately I don't think anything
currently hits that path in practice, though there are some pending
patches which will exercise it more.

Thanks,
J

> Signed-off-by: Dan Carpenter 
> ---
> This is a static checker patch and I haven't tested it.  Please
> review carefully.
>
> diff --git a/arch/x86/xen/multicalls.c b/arch/x86/xen/multicalls.c
> index 8bff7e7..1b2b73f 100644
> --- a/arch/x86/xen/multicalls.c
> +++ b/arch/x86/xen/multicalls.c
> @@ -189,10 +189,10 @@ struct multicall_space __xen_mc_entry(size_t args)
>   unsigned argidx = roundup(b->argidx, sizeof(u64));
>  
>   BUG_ON(preemptible());
> - BUG_ON(b->argidx > MC_ARGS);
> + BUG_ON(b->argidx >= MC_ARGS);
>  
>   if (b->mcidx == MC_BATCH ||
> - (argidx + args) > MC_ARGS) {
> + (argidx + args) >= MC_ARGS) {
>   mc_stats_flush(b->mcidx == MC_BATCH ? FL_SLOTS : FL_ARGS);
>   xen_mc_flush();
>   argidx = roundup(b->argidx, sizeof(u64));
> @@ -206,7 +206,7 @@ struct multicall_space __xen_mc_entry(size_t args)
>   ret.args = &b->args[argidx];
>   b->argidx = argidx + args;
>  
> - BUG_ON(b->argidx > MC_ARGS);
> + BUG_ON(b->argidx >= MC_ARGS);
>   return ret;
>  }
>  
> @@ -216,7 +216,7 @@ struct multicall_space xen_mc_extend_args(unsigned long 
> op, size_t size)
>   struct multicall_space ret = { NULL, NULL };
>  
>   BUG_ON(preemptible());
> - BUG_ON(b->argidx > MC_ARGS);
> + BUG_ON(b->argidx >= MC_ARGS);
>  
>   if (b->mcidx == 0)
>   return ret;
> @@ -224,14 +224,14 @@ struct multicall_space xen_mc_extend_args(unsigned long 
> op, size_t size)
>   if (b->entries[b->mcidx - 1].op != op)
>   return ret;
>  
> - if ((b->argidx + size) > MC_ARGS)
> + if ((b->argidx + size) >= MC_ARGS)
>   return ret;
>  
>   ret.mc = &b->entries[b->mcidx - 1];
>   ret.args = &b->args[b->argidx];
>   b->argidx += size;
>  
> - BUG_ON(b->argidx > MC_ARGS);
> + BUG_ON(b->argidx >= MC_ARGS);
>   return ret;
>  }
>  
> ___
> Virtualization mailing list
> Virtualization@lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/virtualization
>

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [Xen-devel] Re: [PATCH] xen: drop anti-dependency on X86_VISWS

2011-04-14 Thread Jeremy Fitzhardinge
On 04/09/2011 03:34 AM, Ian Campbell wrote:
>>> Actually it does - see the "#ifndef CONFIG_X86_CMPXCHG" section
>>> in asm/cmpxchg_32.h.
>> Hm, OK.  Still, I'm happiest with that dependency in case someone
>> knobbles the cpu to exclude cmpxchg and breaks things.
> Dropping the TSC patch is sensible though?

You mean dropping the TSC dependency?  Yes, I think so.

J
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [Xen-devel] Re: [PATCH] xen: drop anti-dependency on X86_VISWS

2011-04-08 Thread Jeremy Fitzhardinge
On 04/08/2011 08:42 AM, Jan Beulich wrote:
>>>> On 08.04.11 at 17:25, Jeremy Fitzhardinge  wrote:
>> On 04/07/2011 11:38 PM, Ian Campbell wrote:
>>> Is there any downside to this patch (is X86_CMPXCHG in the same sort of
>>> boat?)
>> Only if we don't use cmpxchg in shared memory with other domains or the
>> hypervisor.  (I don't think it will dynamically switch between real and
>> emulated cmpxchg depending on availability.)
> Actually it does - see the "#ifndef CONFIG_X86_CMPXCHG" section
> in asm/cmpxchg_32.h.

Hm, OK.  Still, I'm happiest with that dependency in case someone
knobbles the cpu to exclude cmpxchg and breaks things.

J
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH] xen: drop anti-dependency on X86_VISWS

2011-04-08 Thread Jeremy Fitzhardinge
On 04/07/2011 11:38 PM, Ian Campbell wrote:
>> Not really.  The TSC register is a requirement, but that's going to be
>> present on any CPU which can boot Xen.  We don't need any of the
>> kernel's TSC machinery though.
> So why the Kconfig dependency then? In principal a kernel compiled for a
> non-TSC processor (which meets the other requirements for Xen, such as
> PAE support) will run just fine under Xen on a newer piece of hardware.

Not sure where it came from.  It was probably never needed, or just
added for some secondary effect we wanted.

> Is there any downside to this patch (is X86_CMPXCHG in the same sort of
> boat?)

Only if we don't use cmpxchg in shared memory with other domains or the
hypervisor.  (I don't think it will dynamically switch between real and
emulated cmpxchg depending on availability.)

J

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH] xen: drop anti-dependency on X86_VISWS

2011-04-07 Thread Jeremy Fitzhardinge
On 04/06/2011 11:58 PM, Ian Campbell wrote:
> On Wed, 2011-04-06 at 22:45 +0100, David Miller wrote:
>> From: Ian Campbell 
>> Date: Mon, 4 Apr 2011 10:55:55 +0100
>>
>>> You mean the "!X86_VISWS" I presume? It doesn't make sense to me either.
>> No, I think 32-bit x86 allmodconfig elides XEN because of it's X86_TSC 
>> dependency.
> TSC is a real dependency of the Xen interfaces.

Not really.  The TSC register is a requirement, but that's going to be
present on any CPU which can boot Xen.  We don't need any of the
kernel's TSC machinery though.

J
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH 13/14] x86/ticketlock: add slowpath logic

2011-01-24 Thread Jeremy Fitzhardinge
On 01/19/2011 08:23 AM, Srivatsa Vaddagiri wrote:
> On Mon, Jan 17, 2011 at 08:52:22PM +0530, Srivatsa Vaddagiri wrote:
>> I think this is still racy ..
>>
>> Unlocker Locker
>>
>>  
>> test slowpath
>>  -> false
>>  
>>  set slowpath flag
>>  test for lock pickup
>>  -> fail
>>  block
>>
>>
>> unlock
>>
>> unlock needs to happen first before testing slowpath? I have made that change
>> for my KVM guest and it seems to be working well with that change .. Will
>> cleanup and post my patches shortly
> Patch below fixes the race described above. You can fold this to your patch
> 13/14 if you agree this is in right direction.
>
>
> Signed-off-by: Srivatsa Vaddagiri 
>
> ---
>  arch/x86/include/asm/spinlock.h  |7 +++
>  arch/x86/kernel/paravirt-spinlocks.c |   22 +-
>  2 files changed, 8 insertions(+), 21 deletions(-)
>
> Index: linux-2.6.37/arch/x86/include/asm/spinlock.h
> ===
> --- linux-2.6.37.orig/arch/x86/include/asm/spinlock.h
> +++ linux-2.6.37/arch/x86/include/asm/spinlock.h
> @@ -55,7 +55,7 @@ static __always_inline void __ticket_unl
>  
>  /* Only defined when CONFIG_PARAVIRT_SPINLOCKS defined, but may as
>   * well leave the prototype always visible.  */
> -extern void __ticket_unlock_release_slowpath(struct arch_spinlock *lock);
> +extern void __ticket_unlock_slowpath(struct arch_spinlock *lock);
>  
>  #ifdef CONFIG_PARAVIRT_SPINLOCKS
>  
> @@ -166,10 +166,9 @@ static __always_inline int arch_spin_try
>  static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
>  {
>   barrier();  /* prevent reordering out of locked region */
> + __ticket_unlock_release(lock);
>   if (unlikely(__ticket_in_slowpath(lock)))
> - __ticket_unlock_release_slowpath(lock);
> - else
> - __ticket_unlock_release(lock);
> + __ticket_unlock_slowpath(lock);
>   barrier();  /* prevent reordering into locked region */
>  }
>  
> Index: linux-2.6.37/arch/x86/kernel/paravirt-spinlocks.c
> ===
> --- linux-2.6.37.orig/arch/x86/kernel/paravirt-spinlocks.c
> +++ linux-2.6.37/arch/x86/kernel/paravirt-spinlocks.c
> @@ -22,33 +22,21 @@ EXPORT_SYMBOL(pv_lock_ops);
>   * bits.  However, we need to be careful about this because someone
>   * may just be entering as we leave, and enter the slowpath.
>   */
> -void __ticket_unlock_release_slowpath(struct arch_spinlock *lock)
> +void __ticket_unlock_slowpath(struct arch_spinlock *lock)
>  {
>   struct arch_spinlock old, new;
>  
>   BUILD_BUG_ON(((__ticket_t)NR_CPUS) != NR_CPUS);
>  
>   old = ACCESS_ONCE(*lock);
> -
>   new = old;
> - new.tickets.head += TICKET_LOCK_INC;
>  
>   /* Clear the slowpath flag */
>   new.tickets.tail &= ~TICKET_SLOWPATH_FLAG;
> + if (new.tickets.head == new.tickets.tail)
> + cmpxchg(&lock->head_tail, old.head_tail, new.head_tail);
>  
> - /*
> -  * If there's currently people waiting or someone snuck in
> -  * since we read the lock above, then do a normal unlock and
> -  * kick.  If we managed to unlock with no queued waiters, then
> -  * we can clear the slowpath flag.
> -  */
> - if (new.tickets.head != new.tickets.tail ||
> - cmpxchg(&lock->head_tail,
> - old.head_tail, new.head_tail) != old.head_tail) {
> - /* still people waiting */
> - __ticket_unlock_release(lock);
> - }
> -
> + /* Wake up an appropriate waiter */
>   __ticket_unlock_kick(lock, new.tickets.head);

Does the __ticket_unlock_kick need to be unconditional?  If the
cmpxchg() fails, then it means someone else has come in and changed the
lock under our feet, and presumably they'll do the kick as needed?

Or is there the possibility that the kick will get lost?

There's certainly no harm in doing a redundant kick, and I'd expect that
case to be pretty rare...

J
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH 2/3] kvm hypervisor : Add hypercalls to support pv-ticketlock

2011-01-24 Thread Jeremy Fitzhardinge
On 01/22/2011 06:53 AM, Rik van Riel wrote:
> The main question that remains is whether the PV ticketlocks are
> a large enough improvement to also merge those.  I expect they
> will be, and we'll see so in the benchmark numbers.

The pathological worst-case of ticket locks in a virtual environment
isn't very hard to hit, so I expect they'll make a huge difference
there.   On lightly loaded systems (little or no CPU overcommit) then
they should be close to a no-op.

J
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH 2/3] kvm hypervisor : Add hypercalls to support pv-ticketlock

2011-01-20 Thread Jeremy Fitzhardinge
On 01/20/2011 03:59 AM, Srivatsa Vaddagiri wrote:
>> At least in the Xen code, a current owner isn't very useful, because we
>> need the current owner to kick the *next* owner to life at release time,
>> which we can't do without some structure recording which ticket belongs
>> to which cpu.
> If we had a yield-to [1] sort of interface _and_ information on which vcpu
> owns a lock, then lock-spinners can yield-to the owning vcpu, while the
> unlocking vcpu can yield-to the next-vcpu-in-waiting.

Perhaps, but the core problem is how to find "next-vcpu-in-waiting"
efficiently.  Once you have that info, there's a number of things you
can usefully do with it.

>  The key here is not to
> sleep when waiting for locks (as implemented by current patch-series, which 
> can 
> put other VMs at an advantage by giving them more time than they are entitled 
> to)

Why?  If a VCPU can't make progress because it's waiting for some
resource, then why not schedule something else instead?  Presumably when
the VCPU does become runnable, the scheduler will credit its previous
blocked state and let it run in preference to something else.

>  and also to ensure that lock-owner as well as the next-in-line lock-owner 
> are not unduly made to wait for cpu. 
>
> Is there a way we can dynamically expand the size of lock only upon contention
> to include additional information like owning vcpu? Have the lock point to a
> per-cpu area upon contention where additional details can be stored perhaps?

As soon as you add a pointer to the lock, you're increasing its size. 
If we had a pointer in there already, then all of this would be moot.

If auxiliary per-lock is uncommon, then using a hash keyed on lock
address would be one way to do it.

J
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH 2/3] kvm hypervisor : Add hypercalls to support pv-ticketlock

2011-01-20 Thread Jeremy Fitzhardinge
On 01/20/2011 03:42 AM, Srivatsa Vaddagiri wrote:
> On Wed, Jan 19, 2011 at 10:53:52AM -0800, Jeremy Fitzhardinge wrote:
>>> The reason for wanting this should be clear I guess, it allows PI.
>> Well, if we can expand the spinlock to include an owner, then all this
>> becomes moot...
> How so? Having an owner will not eliminate the need for pv-ticketlocks
> afaict. We still need a mechanism to reduce latency in scheduling the next
> thread-in-waiting, which is achieved by your patches. 

No, sorry, I should have been clearer.  I meant that it would make moot the
effort of not increasing the lock size to record the "in slowpath" state.

J
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH 13/14] x86/ticketlock: add slowpath logic

2011-01-19 Thread Jeremy Fitzhardinge
On 01/19/2011 10:39 AM, Srivatsa Vaddagiri wrote:
> I have tested quite extensively with booting a 16-vcpu guest (on a 16-pcpu 
> host)
> and running kernel compile (with 32-threads). Without this patch, I had
> difficulty booting/shutting-down successfully (it would hang mid-way).

Sounds good.  But I like to test with "make -j 100-200" to really give
things a workout.

J
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH 2/3] kvm hypervisor : Add hypercalls to support pv-ticketlock

2011-01-19 Thread Jeremy Fitzhardinge
On 01/19/2011 09:21 AM, Peter Zijlstra wrote:
> On Wed, 2011-01-19 at 22:42 +0530, Srivatsa Vaddagiri wrote:
>> Add two hypercalls to KVM hypervisor to support pv-ticketlocks.
>>
>> KVM_HC_WAIT_FOR_KICK blocks the calling vcpu until another vcpu kicks it or 
>> it
>> is woken up because of an event like interrupt.
>>
>> KVM_HC_KICK_CPU allows the calling vcpu to kick another vcpu.
>>
>> The presence of these hypercalls is indicated to guest via
>> KVM_FEATURE_WAIT_FOR_KICK/KVM_CAP_WAIT_FOR_KICK. Qemu needs a corresponding
>> patch to pass up the presence of this feature to guest via cpuid. Patch to 
>> qemu
>> will be sent separately.
>
> I didn't really read the patch, and I totally forgot everything from
> when I looked at the Xen series, but does the Xen/KVM hypercall
> interface for this include the vcpu to await the kick from?
>
> My guess is not, since the ticket locks used don't know who the owner
> is, which is of course, sad. There are FIFO spinlock implementations
> that can do this though.. although I think they all have a bigger memory
> footprint.

At least in the Xen code, a current owner isn't very useful, because we
need the current owner to kick the *next* owner to life at release time,
which we can't do without some structure recording which ticket belongs
to which cpu.

(A reminder: the big problem with ticket locks is not with the current
owner getting preempted, but making sure the next VCPU gets scheduled
quickly when the current one releases; without that all the waiting
VCPUs burn their timeslices until the VCPU scheduler gets around to
scheduling the actual next in line.)

At present, the code needs to scan an array of percpu "I am waiting on
lock X with ticket Y" structures to work out who's next.  The search is
somewhat optimised by keeping a cpuset of which CPUs are actually
blocked on spinlocks, but it's still going to scale badly with lots of CPUs.
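
Roughly, the bookkeeping and the unlock-side scan look like this (a
simplified sketch using the names from the patches; the real code also
has to care about ordering and races, and kick_cpu() is just a
placeholder for making the target's event channel pending):

struct xen_lock_waiting {
	struct arch_spinlock *lock;
	__ticket_t want;
};

static DEFINE_PER_CPU(struct xen_lock_waiting, lock_waiting);
static cpumask_t waiting_cpus;

static void xen_unlock_kick(struct arch_spinlock *lock, __ticket_t next)
{
	int cpu;

	for_each_cpu(cpu, &waiting_cpus) {
		const struct xen_lock_waiting *w = &per_cpu(lock_waiting, cpu);

		/* is this cpu waiting for ticket 'next' on this lock? */
		if (w->lock == lock && w->want == next) {
			kick_cpu(cpu);	/* placeholder: pend its event channel */
			break;
		}
	}
}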

I haven't thought of a good way to improve on this; an obvious approach
is to just add a pointer to the spinlock and hang an explicit linked
list off it, but that's incompatible with wanting to avoid expanding the
lock.

You could have a table of auxiliary per-lock data hashed on the lock
address, but its not clear to me that its an improvement on the array
approach, especially given the synchronization issues of keeping that
structure up to date (do we have a generic lockless hashtable
implementation?).  But perhaps it's one of those things that makes sense
at larger scales.

> The reason for wanting this should be clear I guess, it allows PI.

Well, if we can expand the spinlock to include an owner, then all this
becomes moot...

J


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH 13/14] x86/ticketlock: add slowpath logic

2011-01-19 Thread Jeremy Fitzhardinge
On 01/17/2011 07:22 AM, Srivatsa Vaddagiri wrote:
> On Tue, Nov 16, 2010 at 01:08:44PM -0800, Jeremy Fitzhardinge wrote:
>> From: Jeremy Fitzhardinge 
>>
>> Maintain a flag in both LSBs of the ticket lock which indicates whether
>> anyone is in the lock slowpath and may need kicking when the current
>> holder unlocks.  The flags are set when the first locker enters
>> the slowpath, and cleared when unlocking to an empty queue.
>>
>> In the specific implementation of lock_spinning(), make sure to set
>> the slowpath flags on the lock just before blocking.  We must do
>> this before the last-chance pickup test to prevent a deadlock
>> with the unlocker:
>>
>> Unlocker Locker
>>  test for lock pickup
>>  -> fail
>> test slowpath + unlock
>>  -> false
>>  set slowpath flags
>>  block
>>
>> Whereas this works in any ordering:
>>
>> Unlocker Locker
>>  set slowpath flags
>>  test for lock pickup
>>  -> fail
>>  block
>> test slowpath + unlock
>>  -> true, kick
> I think this is still racy ..
>
> Unlocker  Locker
>
>   
> test slowpath
>   -> false
>   
>   set slowpath flag
>   test for lock pickup
>   -> fail
>   block
>
>
> unlock
>
> unlock needs to happen first before testing slowpath? I have made that change
> for my KVM guest and it seems to be working well with that change .. Will
> cleanup and post my patches shortly

I think you're probably right; when I last tested this code, it was
hanging at about the rate this kind of race would cause.  And in my
previous analysis of similar schemes (the current pv spinlock code), it
was always correct to do the "slowpath" test after doing the unlock.
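
In code terms that ordering is roughly (a sketch of the fix being
discussed, following the names in this series -- not the final code):

static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
{
	barrier();	/* prevent reordering out of the locked region */
	__ticket_unlock_release(lock);		/* drop the lock first ... */
	if (unlikely(__ticket_in_slowpath(lock)))
		__ticket_unlock_slowpath(lock);	/* ... then test and kick */
	barrier();	/* prevent reordering into the locked region */
}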

*But* I haven't yet had the chance to specifically go through and
analyse your patch in detail to make sure there's nothing else going on,
so take this as provisional.

How much testing have you given it?

Thanks,
J
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH] x86/pvclock-xen: zero last_value on resume

2010-11-26 Thread Jeremy Fitzhardinge
On 11/26/2010 02:40 AM, Ingo Molnar wrote:
> * Jeremy Fitzhardinge  wrote:
>
>> On 10/26/2010 09:59 AM, Jeremy Fitzhardinge wrote:
>>> If the guest domain has been suspend/resumed or migrated, then the
>>> system clock backing the pvclock clocksource may revert to a smaller
>>> value (ie, can be non-monotonic across the migration/save-restore).
>>> Make sure we zero last_value in that case so that the domain
>>> continues to see clock updates.
>> Ping?  Looks like this fell through the gaps.
> Does not apply cleanly here - mind resending the latest version?

It rebased cleanly to 2.6.37-rc3.  You can pull it from:

  git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git pvclock-resume

Jeremy Fitzhardinge (1):
  x86/pvclock: zero last_value on resume

 arch/x86/include/asm/pvclock.h |1 +
 arch/x86/kernel/pvclock.c  |5 +
 arch/x86/xen/time.c|2 ++
 3 files changed, 8 insertions(+), 0 deletions(-)

Thanks,
J

Subject: [PATCH] x86/pvclock: zero last_value on resume

If the guest domain has been suspend/resumed or migrated, then the
system clock backing the pvclock clocksource may revert to a smaller
value (ie, can be non-monotonic across the migration/save-restore).
Make sure we zero last_value in that case so that the domain
continues to see clock updates.
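
(For reference: the read side only uses last_value to enforce
monotonicity, roughly as below -- simplified from
pvclock_clocksource_read() -- so after zeroing it the smaller
post-migration timestamps are accepted again.)

	last = atomic64_read(&last_value);
	do {
		if (ret < last)
			return last;	/* never step backwards */
		last = atomic64_cmpxchg(&last_value, last, ret);
	} while (unlikely(last != ret));

	return ret;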

Signed-off-by: Jeremy Fitzhardinge 

diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h
index 7f7e577..31d84ac 100644
--- a/arch/x86/include/asm/pvclock.h
+++ b/arch/x86/include/asm/pvclock.h
@@ -11,6 +11,7 @@ unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info 
*src);
 void pvclock_read_wallclock(struct pvclock_wall_clock *wall,
struct pvclock_vcpu_time_info *vcpu,
struct timespec *ts);
+void pvclock_resume(void);
 
 /*
  * Scale a 64-bit delta by scaling and multiplying by a 32-bit fraction,
diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index 008b91e..42eb330 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -83,6 +83,11 @@ unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info 
*src)
 
 static atomic64_t last_value = ATOMIC64_INIT(0);
 
+void pvclock_resume(void)
+{
+   atomic64_set(&last_value, 0);
+}
+
 cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
 {
struct pvclock_shadow_time shadow;
diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
index b2bb5aa..5da5e53 100644
--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ -426,6 +426,8 @@ void xen_timer_resume(void)
 {
int cpu;
 
+   pvclock_resume();
+
if (xen_clockevent != &xen_vcpuop_clockevent)
return;
 


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH] x86/pvclock-xen: zero last_value on resume

2010-11-22 Thread Jeremy Fitzhardinge
On 10/26/2010 09:59 AM, Jeremy Fitzhardinge wrote:
>
> If the guest domain has been suspend/resumed or migrated, then the
> system clock backing the pvclock clocksource may revert to a smaller
> value (ie, can be non-monotonic across the migration/save-restore).
> Make sure we zero last_value in that case so that the domain
> continues to see clock updates.

Ping?  Looks like this fell through the gaps.

J

> [ I don't know if kvm needs an analogous fix or not. ]
>
> Signed-off-by: Jeremy Fitzhardinge 
> Cc: Stable Kernel 
> Reported-by: Eelco Dolstra 
> Reported-by: Olivier Hanesse 
> Bisected-by: Cédric Schieli 
> Tested-by: Cédric Schieli 
>
> diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h
> index cd02f32..6226870 100644
> --- a/arch/x86/include/asm/pvclock.h
> +++ b/arch/x86/include/asm/pvclock.h
> @@ -11,5 +11,6 @@ unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info 
> *src);
>  void pvclock_read_wallclock(struct pvclock_wall_clock *wall,
>   struct pvclock_vcpu_time_info *vcpu,
>   struct timespec *ts);
> +void pvclock_resume(void);
>  
>  #endif /* _ASM_X86_PVCLOCK_H */
> diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
> index 239427c..a4f07c1 100644
> --- a/arch/x86/kernel/pvclock.c
> +++ b/arch/x86/kernel/pvclock.c
> @@ -120,6 +120,11 @@ unsigned long pvclock_tsc_khz(struct 
> pvclock_vcpu_time_info *src)
>  
>  static atomic64_t last_value = ATOMIC64_INIT(0);
>  
> +void pvclock_resume(void)
> +{
> + atomic64_set(&last_value, 0);
> +}
> +
>  cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
>  {
>   struct pvclock_shadow_time shadow;
> diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
> index b2bb5aa..5da5e53 100644
> --- a/arch/x86/xen/time.c
> +++ b/arch/x86/xen/time.c
> @@ -426,6 +426,8 @@ void xen_timer_resume(void)
>  {
>   int cpu;
>  
> + pvclock_resume();
> +
>   if (xen_clockevent != &xen_vcpuop_clockevent)
>   return;
>  
>
>
>
>
> ___
> Virtualization mailing list
> Virtualization@lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/virtualization

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization

Re: [Xen-devel] Re: [PATCH 09/14] xen/pvticketlock: Xen implementation for PV ticket locks

2010-11-17 Thread Jeremy Fitzhardinge
On 11/17/2010 02:34 AM, Jan Beulich wrote:
>> Actually, on second thoughts, maybe it doesn't matter so much.  The main
>> issue is making sure that the interrupt will make the VCPU drop out of
>> xen_poll_irq() - if it happens before xen_poll_irq(), it should leave
>> the event pending, which will cause the poll to return immediately.  I
>> hope.  Certainly disabling interrupts for some of the function will make
>> it easier to analyze with respect to interrupt nesting.
> That's not my main concern. Instead, what if you get interrupted
> anywhere here, the interrupt handler tries to acquire another
> spinlock and also has to go into the slow path? It'll overwrite part
> or all of the outer context's state.

That doesn't matter if the outer context doesn't end up blocking.  If it
has already blocked then it will unblock as a result of the interrupt;
if it hasn't yet blocked, then the inner context will leave the event
pending and cause it to not block.  Either way, it no longer uses or
needs that per-cpu state: it will return to the spin loop and (maybe)
get re-entered, setting it all up again.

I think there is a problem with the code as posted because it sets up
the percpu data before clearing the pending event, so it can end up
blocking with bad percpu data.

>> Another issue may be making sure the writes and reads of "w->want" and
>> "w->lock" are ordered properly to make sure that xen_unlock_kick() never
>> sees an inconsistent view of the (lock,want) tuple.  The risk being that
>> xen_unlock_kick() sees a random, spurious (lock,want) pairing and sends
>> the kick event to the wrong VCPU, leaving the deserving one hung.
> Yes, proper operation sequence (and barriers) is certainly
> required here. If you allowed nesting, this may even become
> simpler (as you'd have a single write making visible the new
> "head" pointer, after having written all relevant fields of the
> new "head" structure).

Yes, simple nesting should be quite straightforward (ie allowing an
interrupt handler to take some other lock than the one the outer context
is waiting on).

J
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [Xen-devel] Re: [PATCH 13/14] x86/ticketlock: add slowpath logic

2010-11-17 Thread Jeremy Fitzhardinge
On 11/17/2010 04:21 AM, Peter Zijlstra wrote:
> On Tue, 2010-11-16 at 13:08 -0800, Jeremy Fitzhardinge wrote:
>> Maintain a flag in both LSBs of the ticket lock which indicates whether
>> anyone is in the lock slowpath and may need kicking when the current
>> holder unlocks.  The flags are set when the first locker enters
>> the slowpath, and cleared when unlocking to an empty queue.
> So here you say you set both LSBs in order to keep head == tail working,
> but the code seems to suggest you only use the tail LSB.
>
> I think I see why using only one LSB is sufficient, but some consistency
> would be nice :-)

I tried that initially, but it turned out more messy.  The problem is
that the flag can change while you're spinning on the lock, so you need
to mask it out every time you read tail before you can compare it to
head; if head has the flag set too, you just need to mask it out of
there as well:

ticket = xadd(lock, 2 << TICKET_SHIFT);
ticket.tail &= ~TICKET_SLOW_FLAGS;

while (ticket.head != ticket.tail) {
	relax();
	ticket.head = lock->head /* & ~TICKET_SLOW_FLAGS */;
}

IOW setting both doesn't help anything, and just requires an extra mask
in the spin loop (and anywhere else that uses 'head').

And hey, extra bit.  Bound to be useful for something.

J
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [Xen-devel] Re: [PATCH 09/14] xen/pvticketlock: Xen implementation for PV ticket locks

2010-11-17 Thread Jeremy Fitzhardinge
On 11/17/2010 12:52 AM, Jeremy Fitzhardinge wrote:
> On 11/17/2010 12:11 AM, Jan Beulich wrote:
>>>>> On 16.11.10 at 22:08, Jeremy Fitzhardinge  wrote:
>>> +static void xen_lock_spinning(struct arch_spinlock *lock, unsigned want)
>>>  {
>>> -   struct xen_spinlock *xl = (struct xen_spinlock *)lock;
>>> -   struct xen_spinlock *prev;
>>> int irq = __get_cpu_var(lock_kicker_irq);
>>> -   int ret;
>>> +   struct xen_lock_waiting *w = &__get_cpu_var(lock_waiting);
>>> +   int cpu = smp_processor_id();
>>> u64 start;
>>>  
>>> /* If kicker interrupts not initialized yet, just spin */
>>> if (irq == -1)
>>> -   return 0;
>>> +   return;
>>>  
>>> start = spin_time_start();
>>>  
>>> -   /* announce we're spinning */
>>> -   prev = spinning_lock(xl);
>>> +   w->want = want;
>>> +   w->lock = lock;
>>> +
>>> +   /* This uses set_bit, which is atomic and therefore a barrier */
>>> +   cpumask_set_cpu(cpu, &waiting_cpus);
>> Since you don't allow nesting, don't you need to disable
>> interrupts before you touch per-CPU state?
> Yes, I think you're right - interrupts need to be disabled for the bulk
> of this function.

Actually, on second thoughts, maybe it doesn't matter so much.  The main
issue is making sure that the interrupt will make the VCPU drop out of
xen_poll_irq() - if it happens before xen_poll_irq(), it should leave
the event pending, which will cause the poll to return immediately.  I
hope.  Certainly disabling interrupts for some of the function will make
it easier to analyze with respect to interrupt nesting.

Another issue may be making sure the writes and reads of "w->want" and
"w->lock" are ordered properly to make sure that xen_unlock_kick() never
sees an inconsistent view of the (lock,want) tuple.  The risk being that
xen_unlock_kick() sees a random, spurious (lock,want) pairing and sends
the kick event to the wrong VCPU, leaving the deserving one hung.
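
Concretely, something along these lines is probably needed (sketch only;
the exact barrier choice is the open question here, not what the patch
currently does):

	w->want = want;
	smp_wmb();		/* publish want before lock ... */
	w->lock = lock;
	smp_wmb();		/* ... and both before the waiting_cpus bit */
	cpumask_set_cpu(cpu, &waiting_cpus);

with the kick side testing the waiting_cpus bit first and then reading
lock/want, with matching read ordering.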

J

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH 13/14] x86/ticketlock: add slowpath logic

2010-11-17 Thread Jeremy Fitzhardinge
On 11/17/2010 12:56 AM, Jeremy Fitzhardinge wrote:
> On 11/17/2010 12:52 AM, Jeremy Fitzhardinge wrote:
>> But, yes, %z0 sounds interesting.  Is it documented anywhere?  I think
>> I've tried to use it in the past and run into gcc bugs.
> This one: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39590
>
> Should be OK in this case because there's no 64-bit values to be seen...
Hm, it fails when __ticket_t is 16 bits:

/home/jeremy/git/linux/arch/x86/include/asm/spinlock.h: Assembler messages:
/home/jeremy/git/linux/arch/x86/include/asm/spinlock.h:73: Error: suffix
or operands invalid for `or'

lock; ors $1, 2(%rbx)   #,


So I don't think that's going to work out...

J
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH 13/14] x86/ticketlock: add slowpath logic

2010-11-17 Thread Jeremy Fitzhardinge
On 11/17/2010 12:58 AM, Avi Kivity wrote:
>> Actually in this case I'm pretty sure there's already a "set bit"
>> function which will do the job.  set_bit(), I guess, though it takes a
>> bit number rather than a mask...
>>
>
>
> set_bit() operates on a long, while the intel manuals recommend
> against operating on operands of different size, especially with
> locked operations.  I think newer processors have more relaxed
> requirements, though.

Despite its prototype, set_bit() is pretty specifically using "orb" for
the constant case, or bts otherwise (I don't know what size memory
operation bts is considered to generate).

J

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH 13/14] x86/ticketlock: add slowpath logic

2010-11-17 Thread Jeremy Fitzhardinge
On 11/17/2010 12:52 AM, Jeremy Fitzhardinge wrote:
> But, yes, %z0 sounds interesting.  Is it documented anywhere?  I think
> I've tried to use it in the past and run into gcc bugs.
This one: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39590

Should be OK in this case because there's no 64-bit values to be seen...

J
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH 13/14] x86/ticketlock: add slowpath logic

2010-11-17 Thread Jeremy Fitzhardinge
On 11/17/2010 12:31 AM, Jan Beulich wrote:
>>>> On 16.11.10 at 22:08, Jeremy Fitzhardinge  wrote:
>> +static inline void __ticket_enter_slowpath(struct arch_spinlock *lock)
>> +{
>> +if (sizeof(lock->tickets.tail) == sizeof(u8))
>> +asm (LOCK_PREFIX "orb %1, %0"
>> + : "+m" (lock->tickets.tail)
>> + : "i" (TICKET_SLOWPATH_FLAG) : "memory");
>> +else
>> +asm (LOCK_PREFIX "orw %1, %0"
>> + : "+m" (lock->tickets.tail)
>> + : "i" (TICKET_SLOWPATH_FLAG) : "memory");
>> +}
> Came only now to mind: Here and elsewhere, did you try using
> %z0 to have gcc produce the opcode suffix character, rather
> than having these somewhat ugly if()-s?

Actually in this case I'm pretty sure there's already a "set bit"
function which will do the job.  set_bit(), I guess, though it takes a
bit number rather than a mask...

But, yes, %z0 sounds interesting.  Is it documented anywhere?  I think
I've tried to use it in the past and run into gcc bugs.
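
(For the "or" above it would presumably look something like the
following -- untested:)

	asm (LOCK_PREFIX "or%z0 %1, %0"
	     : "+m" (lock->tickets.tail)
	     : "i" (TICKET_SLOWPATH_FLAG) : "memory");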

Thanks,
J
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH 09/14] xen/pvticketlock: Xen implementation for PV ticket locks

2010-11-17 Thread Jeremy Fitzhardinge
On 11/17/2010 12:11 AM, Jan Beulich wrote:
>>>> On 16.11.10 at 22:08, Jeremy Fitzhardinge  wrote:
>> +static void xen_lock_spinning(struct arch_spinlock *lock, unsigned want)
>>  {
>> -struct xen_spinlock *xl = (struct xen_spinlock *)lock;
>> -struct xen_spinlock *prev;
>>  int irq = __get_cpu_var(lock_kicker_irq);
>> -int ret;
>> +struct xen_lock_waiting *w = &__get_cpu_var(lock_waiting);
>> +int cpu = smp_processor_id();
>>  u64 start;
>>  
>>  /* If kicker interrupts not initialized yet, just spin */
>>  if (irq == -1)
>> -return 0;
>> +return;
>>  
>>  start = spin_time_start();
>>  
>> -/* announce we're spinning */
>> -prev = spinning_lock(xl);
>> +w->want = want;
>> +w->lock = lock;
>> +
>> +/* This uses set_bit, which is atomic and therefore a barrier */
>> +cpumask_set_cpu(cpu, &waiting_cpus);
> Since you don't allow nesting, don't you need to disable
> interrupts before you touch per-CPU state?

Yes, I think you're right - interrupts need to be disabled for the bulk
of this function.

J
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH 09/14] xen/pvticketlock: Xen implementation for PV ticket locks

2010-11-16 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge 

Replace the old Xen implementation of PV spinlocks with an implementation
of xen_lock_spinning and xen_unlock_kick.

xen_lock_spinning simply registers the cpu in its entry in lock_waiting,
adds itself to the waiting_cpus set, and blocks on an event channel
until the channel becomes pending.

xen_unlock_kick searches the cpus in waiting_cpus looking for the one
which next wants this lock with the next ticket, if any.  If found,
it kicks it by making its event channel pending, which wakes it up.
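
In outline, the slow path looks roughly like this (a heavily simplified
sketch: the stats accounting and some details of the real function are
omitted):

static void xen_lock_spinning(struct arch_spinlock *lock, unsigned want)
{
	int irq = __get_cpu_var(lock_kicker_irq);
	struct xen_lock_waiting *w = &__get_cpu_var(lock_waiting);
	int cpu = smp_processor_id();

	/* If kicker interrupts not initialized yet, just spin */
	if (irq == -1)
		return;

	w->want = want;
	w->lock = lock;
	cpumask_set_cpu(cpu, &waiting_cpus);

	xen_clear_irq_pending(irq);	/* clear any stale kick */

	/* last-chance check: did the lock become ours while registering? */
	if (ACCESS_ONCE(lock->tickets.head) != want)
		xen_poll_irq(irq);	/* block until kicked (or spurious wakeup) */

	cpumask_clear_cpu(cpu, &waiting_cpus);
	w->lock = NULL;
}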

Signed-off-by: Jeremy Fitzhardinge 
---
 arch/x86/xen/spinlock.c |  281 ++-
 1 files changed, 36 insertions(+), 245 deletions(-)

diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index 3d9da72..5feb897 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -19,32 +19,21 @@
 #ifdef CONFIG_XEN_DEBUG_FS
 static struct xen_spinlock_stats
 {
-   u64 taken;
u32 taken_slow;
-   u32 taken_slow_nested;
u32 taken_slow_pickup;
u32 taken_slow_spurious;
-   u32 taken_slow_irqenable;
 
-   u64 released;
u32 released_slow;
u32 released_slow_kicked;
 
 #define HISTO_BUCKETS  30
-   u32 histo_spin_total[HISTO_BUCKETS+1];
-   u32 histo_spin_spinning[HISTO_BUCKETS+1];
u32 histo_spin_blocked[HISTO_BUCKETS+1];
 
-   u64 time_total;
-   u64 time_spinning;
u64 time_blocked;
 } spinlock_stats;
 
 static u8 zero_stats;
 
-static unsigned lock_timeout = 1 << 10;
-#define TIMEOUT lock_timeout
-
 static inline void check_zero(void)
 {
if (unlikely(zero_stats)) {
@@ -73,22 +62,6 @@ static void __spin_time_accum(u64 delta, u32 *array)
array[HISTO_BUCKETS]++;
 }
 
-static inline void spin_time_accum_spinning(u64 start)
-{
-   u32 delta = xen_clocksource_read() - start;
-
-   __spin_time_accum(delta, spinlock_stats.histo_spin_spinning);
-   spinlock_stats.time_spinning += delta;
-}
-
-static inline void spin_time_accum_total(u64 start)
-{
-   u32 delta = xen_clocksource_read() - start;
-
-   __spin_time_accum(delta, spinlock_stats.histo_spin_total);
-   spinlock_stats.time_total += delta;
-}
-
 static inline void spin_time_accum_blocked(u64 start)
 {
u32 delta = xen_clocksource_read() - start;
@@ -105,214 +78,76 @@ static inline u64 spin_time_start(void)
return 0;
 }
 
-static inline void spin_time_accum_total(u64 start)
-{
-}
-static inline void spin_time_accum_spinning(u64 start)
-{
-}
 static inline void spin_time_accum_blocked(u64 start)
 {
 }
 #endif  /* CONFIG_XEN_DEBUG_FS */
 
-struct xen_spinlock {
-   unsigned char lock; /* 0 -> free; 1 -> locked */
-   unsigned short spinners;/* count of waiting cpus */
+struct xen_lock_waiting {
+   struct arch_spinlock *lock;
+   __ticket_t want;
 };
 
 static DEFINE_PER_CPU(int, lock_kicker_irq) = -1;
+static DEFINE_PER_CPU(struct xen_lock_waiting, lock_waiting);
+static cpumask_t waiting_cpus;
 
-#if 0
-static int xen_spin_is_locked(struct arch_spinlock *lock)
-{
-   struct xen_spinlock *xl = (struct xen_spinlock *)lock;
-
-   return xl->lock != 0;
-}
-
-static int xen_spin_is_contended(struct arch_spinlock *lock)
-{
-   struct xen_spinlock *xl = (struct xen_spinlock *)lock;
-
-   /* Not strictly true; this is only the count of contended
-  lock-takers entering the slow path. */
-   return xl->spinners != 0;
-}
-
-static int xen_spin_trylock(struct arch_spinlock *lock)
-{
-   struct xen_spinlock *xl = (struct xen_spinlock *)lock;
-   u8 old = 1;
-
-   asm("xchgb %b0,%1"
-   : "+q" (old), "+m" (xl->lock) : : "memory");
-
-   return old == 0;
-}
-
-static DEFINE_PER_CPU(struct xen_spinlock *, lock_spinners);
-
-/*
- * Mark a cpu as interested in a lock.  Returns the CPU's previous
- * lock of interest, in case we got preempted by an interrupt.
- */
-static inline struct xen_spinlock *spinning_lock(struct xen_spinlock *xl)
-{
-   struct xen_spinlock *prev;
-
-   prev = __get_cpu_var(lock_spinners);
-   __get_cpu_var(lock_spinners) = xl;
-
-   wmb();  /* set lock of interest before count */
-
-   asm(LOCK_PREFIX " incw %0"
-   : "+m" (xl->spinners) : : "memory");
-
-   return prev;
-}
-
-/*
- * Mark a cpu as no longer interested in a lock.  Restores previous
- * lock of interest (NULL for none).
- */
-static inline void unspinning_lock(struct xen_spinlock *xl, struct 
xen_spinlock *prev)
-{
-   asm(LOCK_PREFIX " decw %0"
-   : "+m" (xl->spinners) : : "memory");
-   wmb();  /* decrement count before restoring lock */
-   __get_cpu_var(lock_spinners) = prev;
-}
-
-static noinline int xen_spin_lock_slow(struct arch_spinlock *lock, bool 
irq_enable)
+stati

[PATCH 12/14] x86/ticketlocks: when paravirtualizing ticket locks, increment by 2

2010-11-16 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge 

Increment ticket head/tails by 2 rather than 1 to leave the LSB free
to store a "is in slowpath state" bit.  This halves the number
of possible CPUs for a given ticket size, but this shouldn't matter
in practice - kernels built for 32k+ CPU systems are probably
specially built for the hardware rather than a generic distro
kernel.
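
Illustration only (userspace demo, not part of the patch): with an
increment of two the low bit of the counters never carries ticket
information, so it stays free for the flag a later patch adds.

#include <stdio.h>

#define TICKET_LOCK_INC 2

int main(void)
{
	unsigned int tail = 0;
	int i;

	for (i = 0; i < 4; i++) {
		tail += TICKET_LOCK_INC;
		printf("locker %d: tail=%u low bit=%u\n", i, tail, tail & 1);
	}
	return 0;	/* the low bit is always 0 */
}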

Signed-off-by: Jeremy Fitzhardinge 
---
 arch/x86/include/asm/spinlock.h   |   18 +-
 arch/x86/include/asm/spinlock_types.h |   10 +-
 2 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index cfa80b5..9e1c7ce 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -36,17 +36,17 @@
 static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
 {
if (sizeof(lock->tickets.head) == sizeof(u8))
-   asm (LOCK_PREFIX "incb %0"
-: "+m" (lock->tickets.head) : : "memory");
+   asm (LOCK_PREFIX "addb %1, %0"
+: "+m" (lock->tickets.head) : "i" (TICKET_LOCK_INC) : 
"memory");
else
-   asm (LOCK_PREFIX "incw %0"
-: "+m" (lock->tickets.head) : : "memory");
+   asm (LOCK_PREFIX "addw %1, %0"
+: "+m" (lock->tickets.head) : "i" (TICKET_LOCK_INC) : 
"memory");
 
 }
 #else
 static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
 {
-   lock->tickets.head++;
+   lock->tickets.head += TICKET_LOCK_INC;
 }
 #endif
 
@@ -84,7 +84,7 @@ static __always_inline void ticket_unlock_kick(struct 
arch_spinlock *lock, u
  */
 static __always_inline struct __raw_tickets __ticket_spin_claim(struct 
arch_spinlock *lock)
 {
-   register struct __raw_tickets tickets = { .tail = 1 };
+   register struct __raw_tickets tickets = { .tail = TICKET_LOCK_INC };
 
if (sizeof(lock->tickets.head) == sizeof(u8))
asm volatile (LOCK_PREFIX "xaddw %w0, %1\n"
@@ -136,7 +136,7 @@ static __always_inline int 
arch_spin_trylock(arch_spinlock_t *lock)
if (old.tickets.head != old.tickets.tail)
return 0;
 
-   new.head_tail = old.head_tail + (1 << TICKET_SHIFT);
+   new.head_tail = old.head_tail + (TICKET_LOCK_INC << TICKET_SHIFT);
 
/* cmpxchg is a full barrier, so nothing can move before it */
return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) == 
old.head_tail;
@@ -144,7 +144,7 @@ static __always_inline int 
arch_spin_trylock(arch_spinlock_t *lock)
 
 static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
 {
-   __ticket_t next = lock->tickets.head + 1;
+   __ticket_t next = lock->tickets.head + TICKET_LOCK_INC;
__ticket_unlock_release(lock);
__ticket_unlock_kick(lock, next);
barrier();  /* prevent reordering into locked region */
@@ -161,7 +161,7 @@ static inline int arch_spin_is_contended(arch_spinlock_t 
*lock)
 {
struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
 
-   return ((tmp.tail - tmp.head) & TICKET_MASK) > 1;
+   return ((tmp.tail - tmp.head) & TICKET_MASK) > TICKET_LOCK_INC;
 }
 #define arch_spin_is_contended arch_spin_is_contended
 
diff --git a/arch/x86/include/asm/spinlock_types.h 
b/arch/x86/include/asm/spinlock_types.h
index 72e154e..0553c0b 100644
--- a/arch/x86/include/asm/spinlock_types.h
+++ b/arch/x86/include/asm/spinlock_types.h
@@ -7,7 +7,13 @@
 
 #include 
 
-#if (CONFIG_NR_CPUS < 256)
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+#define __TICKET_LOCK_INC  2
+#else
+#define __TICKET_LOCK_INC  1
+#endif
+
+#if (CONFIG_NR_CPUS < (256 / __TICKET_LOCK_INC))
 typedef u8  __ticket_t;
 typedef u16 __ticketpair_t;
 #else
@@ -15,6 +21,8 @@ typedef u16 __ticket_t;
 typedef u32 __ticketpair_t;
 #endif
 
+#define TICKET_LOCK_INC((__ticket_t)__TICKET_LOCK_INC)
+
 #define TICKET_SHIFT   (sizeof(__ticket_t) * 8)
 #define TICKET_MASK((__ticket_t)((1 << TICKET_SHIFT) - 1))
 
-- 
1.7.2.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH 10/14] x86/pvticketlock: use callee-save for lock_spinning

2010-11-16 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge 

Although the lock_spinning calls in the spinlock code are on the
uncommon path, their presence can cause the compiler to generate many
more register save/restores in the function pre/postamble, which is in
the fast path.  To avoid this, convert it to using the pvops callee-save
calling convention, which defers all the save/restores until the actual
function is called, keeping the fastpath clean.

Signed-off-by: Jeremy Fitzhardinge 
---
 arch/x86/include/asm/paravirt.h   |2 +-
 arch/x86/include/asm/paravirt_types.h |2 +-
 arch/x86/kernel/paravirt-spinlocks.c  |2 +-
 arch/x86/xen/spinlock.c   |3 ++-
 4 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index c864775..6f275ca 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -719,7 +719,7 @@ static inline void __set_fixmap(unsigned /* enum 
fixed_addresses */ idx,
 
 static inline void __ticket_lock_spinning(struct arch_spinlock *lock, unsigned 
ticket)
 {
-   PVOP_VCALL2(pv_lock_ops.lock_spinning, lock, ticket);
+   PVOP_VCALLEE2(pv_lock_ops.lock_spinning, lock, ticket);
 }
 
 static inline void ticket_unlock_kick(struct arch_spinlock *lock, unsigned 
ticket)
diff --git a/arch/x86/include/asm/paravirt_types.h 
b/arch/x86/include/asm/paravirt_types.h
index 1078474..53f249a 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -315,7 +315,7 @@ struct pv_mmu_ops {
 
 struct arch_spinlock;
 struct pv_lock_ops {
-   void (*lock_spinning)(struct arch_spinlock *lock, unsigned ticket);
+   struct paravirt_callee_save lock_spinning;
void (*unlock_kick)(struct arch_spinlock *lock, unsigned ticket);
 };
 
diff --git a/arch/x86/kernel/paravirt-spinlocks.c 
b/arch/x86/kernel/paravirt-spinlocks.c
index c2e010e..4251c1d 100644
--- a/arch/x86/kernel/paravirt-spinlocks.c
+++ b/arch/x86/kernel/paravirt-spinlocks.c
@@ -9,7 +9,7 @@
 
 struct pv_lock_ops pv_lock_ops = {
 #ifdef CONFIG_SMP
-   .lock_spinning = paravirt_nop,
+   .lock_spinning = __PV_IS_CALLEE_SAVE(paravirt_nop),
.unlock_kick = paravirt_nop,
 #endif
 };
diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index 5feb897..c31c5a3 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -137,6 +137,7 @@ out:
w->lock = NULL;
spin_time_accum_blocked(start);
 }
+PV_CALLEE_SAVE_REGS_THUNK(xen_lock_spinning);
 
 static void xen_unlock_kick(struct arch_spinlock *lock, unsigned next)
 {
@@ -189,7 +190,7 @@ void xen_uninit_lock_cpu(int cpu)
 
 void __init xen_init_spinlocks(void)
 {
-   pv_lock_ops.lock_spinning = xen_lock_spinning;
+   pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(xen_lock_spinning);
pv_lock_ops.unlock_kick = xen_unlock_kick;
 }
 
-- 
1.7.2.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH 14/14] x86/ticketlocks: tidy up __ticket_unlock_kick()

2010-11-16 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge 

__ticket_unlock_kick() is now only called from known slowpaths, so there's
no need for it to do any checking of its own.

Signed-off-by: Jeremy Fitzhardinge 
---
 arch/x86/include/asm/paravirt.h |2 +-
 arch/x86/include/asm/spinlock.h |   14 --
 2 files changed, 1 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 6f275ca..7755b16 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -722,7 +722,7 @@ static inline void __ticket_lock_spinning(struct 
arch_spinlock *lock, unsigned t
PVOP_VCALLEE2(pv_lock_ops.lock_spinning, lock, ticket);
 }
 
-static inline void ticket_unlock_kick(struct arch_spinlock *lock, unsigned 
ticket)
+static inline void __ticket_unlock_kick(struct arch_spinlock *lock, unsigned 
ticket)
 {
PVOP_VCALL2(pv_lock_ops.unlock_kick, lock, ticket);
 }
diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 8d1cb42..70675bc 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -90,10 +90,6 @@ static __always_inline void __ticket_lock_spinning(struct 
arch_spinlock *lock, u
 {
 }
 
-static __always_inline void ticket_unlock_kick(struct arch_spinlock *lock, 
unsigned ticket)
-{
-}
-
 #endif /* CONFIG_PARAVIRT_SPINLOCKS */
 
 /*
@@ -133,16 +129,6 @@ static __always_inline struct __raw_tickets 
__ticket_spin_claim(struct arch_spin
return tickets.tickets;
 }
 
-/* 
- * If a spinlock has someone waiting on it, then kick the appropriate
- * waiting cpu.
- */
-static __always_inline void __ticket_unlock_kick(struct arch_spinlock *lock, 
__ticket_t next)
-{
-   if (unlikely(lock->tickets.tail != next))
-   ticket_unlock_kick(lock, next);
-}
-
 static __always_inline void arch_spin_lock(struct arch_spinlock *lock)
 {
register struct __raw_tickets inc;
-- 
1.7.2.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH 06/14] x86/ticketlock: make __ticket_spin_trylock common

2010-11-16 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge 

Make trylock code common regardless of ticket size.

Signed-off-by: Jeremy Fitzhardinge 
---
 arch/x86/include/asm/spinlock.h   |   49 +++--
 arch/x86/include/asm/spinlock_types.h |6 +++-
 2 files changed, 14 insertions(+), 41 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index f722f96..3afb1a7 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -98,48 +98,19 @@ static __always_inline void __ticket_spin_lock(struct 
arch_spinlock *lock)
 out:   barrier();  /* make sure nothing creeps before the lock is 
taken */
 }
 
-#if (NR_CPUS < 256)
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 {
-   unsigned int tmp, new;
-
-   asm volatile("movzwl %2, %0\n\t"
-"cmpb %h0,%b0\n\t"
-"leal 0x100(%" REG_PTR_MODE "0), %1\n\t"
-"jne 1f\n\t"
-LOCK_PREFIX "cmpxchgw %w1,%2\n\t"
-"1:"
-"sete %b1\n\t"
-"movzbl %b1,%0\n\t"
-: "=&a" (tmp), "=&q" (new), "+m" (lock->slock)
-:
-: "memory", "cc");
-
-   return tmp;
-}
-#else
-static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
-{
-   unsigned tmp;
-   unsigned new;
-
-   asm volatile("movl %2,%0\n\t"
-"movl %0,%1\n\t"
-"roll $16, %0\n\t"
-"cmpl %0,%1\n\t"
-"leal 0x0001(%" REG_PTR_MODE "0), %1\n\t"
-"jne 1f\n\t"
-LOCK_PREFIX "cmpxchgl %1,%2\n\t"
-"1:"
-"sete %b1\n\t"
-"movzbl %b1,%0\n\t"
-: "=&a" (tmp), "=&q" (new), "+m" (lock->slock)
-:
-: "memory", "cc");
-
-   return tmp;
+   arch_spinlock_t old, new;
+
+   old.tickets = ACCESS_ONCE(lock->tickets);
+   if (old.tickets.head != old.tickets.tail)
+   return 0;
+
+   new.head_tail = old.head_tail + (1 << TICKET_SHIFT);
+
+   /* cmpxchg is a full barrier, so nothing can move before it */
+   return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) == 
old.head_tail;
 }
-#endif
 
 static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 {
diff --git a/arch/x86/include/asm/spinlock_types.h 
b/arch/x86/include/asm/spinlock_types.h
index e3ad1e3..72e154e 100644
--- a/arch/x86/include/asm/spinlock_types.h
+++ b/arch/x86/include/asm/spinlock_types.h
@@ -9,8 +9,10 @@
 
 #if (CONFIG_NR_CPUS < 256)
 typedef u8  __ticket_t;
+typedef u16 __ticketpair_t;
 #else
 typedef u16 __ticket_t;
+typedef u32 __ticketpair_t;
 #endif
 
 #define TICKET_SHIFT   (sizeof(__ticket_t) * 8)
@@ -18,14 +20,14 @@ typedef u16 __ticket_t;
 
 typedef struct arch_spinlock {
union {
-   unsigned int slock;
+   __ticketpair_t head_tail;
struct __raw_tickets {
__ticket_t head, tail;
} tickets;
};
 } arch_spinlock_t;
 
-#define __ARCH_SPIN_LOCK_UNLOCKED  { { .slock = 0 } }
+#define __ARCH_SPIN_LOCK_UNLOCKED  { { 0 } }
 
 typedef struct {
unsigned int lock;
-- 
1.7.2.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH 11/14] x86/ticketlock: don't inline _spin_unlock when using paravirt spinlocks

2010-11-16 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge 

The code size expands somewhat, and it's probably better to just call
a function rather than inline it.

Signed-off-by: Jeremy Fitzhardinge 
---
 arch/x86/Kconfig |3 +++
 kernel/Kconfig.locks |2 +-
 2 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index e832768..a615c9c 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -531,6 +531,9 @@ config PARAVIRT_SPINLOCKS
 
  If you are unsure how to answer this question, answer N.
 
+config ARCH_NOINLINE_SPIN_UNLOCK
+   def_bool PARAVIRT_SPINLOCKS
+
 config PARAVIRT_CLOCK
bool
 
diff --git a/kernel/Kconfig.locks b/kernel/Kconfig.locks
index 88c92fb..3216c22 100644
--- a/kernel/Kconfig.locks
+++ b/kernel/Kconfig.locks
@@ -125,7 +125,7 @@ config INLINE_SPIN_LOCK_IRQSAVE
 ARCH_INLINE_SPIN_LOCK_IRQSAVE
 
 config INLINE_SPIN_UNLOCK
-   def_bool !DEBUG_SPINLOCK && (!PREEMPT || ARCH_INLINE_SPIN_UNLOCK)
+   def_bool !DEBUG_SPINLOCK && (!PREEMPT || ARCH_INLINE_SPIN_UNLOCK) && 
!ARCH_NOINLINE_SPIN_UNLOCK
 
 config INLINE_SPIN_UNLOCK_BH
def_bool !DEBUG_SPINLOCK && ARCH_INLINE_SPIN_UNLOCK_BH
-- 
1.7.2.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH 01/14] x86/ticketlock: clean up types and accessors

2010-11-16 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge 

A few cleanups to the way spinlocks are defined and accessed:
 - define __ticket_t which is the size of a spinlock ticket (ie, enough
   bits to hold all the cpus)
 - Define struct arch_spinlock as a union containing plain slock and
   the head and tail tickets
 - Use head and tail to implement some of the spinlock predicates.
 - Make all ticket variables unsigned.
 - Use TICKET_SHIFT to form constants

Most of this will be used in later patches.

Signed-off-by: Jeremy Fitzhardinge 
---
 arch/x86/include/asm/spinlock.h   |   24 ++--
 arch/x86/include/asm/spinlock_types.h |   20 ++--
 2 files changed, 28 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 3089f70..d6d5784 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -56,11 +56,9 @@
  * much between them in performance though, especially as locks are out of 
line.
  */
 #if (NR_CPUS < 256)
-#define TICKET_SHIFT 8
-
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
-   short inc = 0x0100;
+   unsigned short inc = 1 << TICKET_SHIFT;
 
asm volatile (
LOCK_PREFIX "xaddw %w0, %1\n"
@@ -79,7 +77,7 @@ static __always_inline void 
__ticket_spin_lock(arch_spinlock_t *lock)
 
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 {
-   int tmp, new;
+   unsigned int tmp, new;
 
asm volatile("movzwl %2, %0\n\t"
 "cmpb %h0,%b0\n\t"
@@ -104,12 +102,10 @@ static __always_inline void 
__ticket_spin_unlock(arch_spinlock_t *lock)
 : "memory", "cc");
 }
 #else
-#define TICKET_SHIFT 16
-
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
-   int inc = 0x0001;
-   int tmp;
+   unsigned inc = 1 << TICKET_SHIFT;
+   unsigned tmp;
 
asm volatile(LOCK_PREFIX "xaddl %0, %1\n"
 "movzwl %w0, %2\n\t"
@@ -129,8 +125,8 @@ static __always_inline void 
__ticket_spin_lock(arch_spinlock_t *lock)
 
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 {
-   int tmp;
-   int new;
+   unsigned tmp;
+   unsigned new;
 
asm volatile("movl %2,%0\n\t"
 "movl %0,%1\n\t"
@@ -160,16 +156,16 @@ static __always_inline void 
__ticket_spin_unlock(arch_spinlock_t *lock)
 
 static inline int __ticket_spin_is_locked(arch_spinlock_t *lock)
 {
-   int tmp = ACCESS_ONCE(lock->slock);
+   struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
 
-   return !!(((tmp >> TICKET_SHIFT) ^ tmp) & ((1 << TICKET_SHIFT) - 1));
+   return !!(tmp.tail ^ tmp.head);
 }
 
 static inline int __ticket_spin_is_contended(arch_spinlock_t *lock)
 {
-   int tmp = ACCESS_ONCE(lock->slock);
+   struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
 
-   return (((tmp >> TICKET_SHIFT) - tmp) & ((1 << TICKET_SHIFT) - 1)) > 1;
+   return ((tmp.tail - tmp.head) & TICKET_MASK) > 1;
 }
 
 #ifndef CONFIG_PARAVIRT_SPINLOCKS
diff --git a/arch/x86/include/asm/spinlock_types.h 
b/arch/x86/include/asm/spinlock_types.h
index dcb48b2..e3ad1e3 100644
--- a/arch/x86/include/asm/spinlock_types.h
+++ b/arch/x86/include/asm/spinlock_types.h
@@ -5,11 +5,27 @@
 # error "please don't include this file directly"
 #endif
 
+#include 
+
+#if (CONFIG_NR_CPUS < 256)
+typedef u8  __ticket_t;
+#else
+typedef u16 __ticket_t;
+#endif
+
+#define TICKET_SHIFT   (sizeof(__ticket_t) * 8)
+#define TICKET_MASK((__ticket_t)((1 << TICKET_SHIFT) - 1))
+
 typedef struct arch_spinlock {
-   unsigned int slock;
+   union {
+   unsigned int slock;
+   struct __raw_tickets {
+   __ticket_t head, tail;
+   } tickets;
+   };
 } arch_spinlock_t;
 
-#define __ARCH_SPIN_LOCK_UNLOCKED  { 0 }
+#define __ARCH_SPIN_LOCK_UNLOCKED  { { .slock = 0 } }
 
 typedef struct {
unsigned int lock;
-- 
1.7.2.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH 05/14] x86/ticketlock: make __ticket_spin_lock common

2010-11-16 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge 

Aside from the particular form of the xadd instruction, they're identical.
So factor out the xadd and use common code for the rest.

Signed-off-by: Jeremy Fitzhardinge 
---
 arch/x86/include/asm/spinlock.h |   42 ++
 1 files changed, 20 insertions(+), 22 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 1b81809..f722f96 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -67,13 +67,27 @@ static __always_inline void __ticket_unlock_release(struct 
arch_spinlock *lock)
  * save some instructions and make the code more elegant. There really isn't
  * much between them in performance though, especially as locks are out of 
line.
  */
-#if (NR_CPUS < 256)
-static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
+static __always_inline struct __raw_tickets __ticket_spin_claim(struct 
arch_spinlock *lock)
 {
-   register struct __raw_tickets inc = { .tail = 1 };
+   register struct __raw_tickets tickets = { .tail = 1 };
+
+   if (sizeof(lock->tickets.head) == sizeof(u8))
+   asm volatile (LOCK_PREFIX "xaddw %w0, %1\n"
+ : "+r" (tickets), "+m" (lock->tickets)
+ : : "memory", "cc");
+   else
+   asm volatile (LOCK_PREFIX "xaddl %0, %1\n"
+: "+r" (tickets), "+m" (lock->tickets)
+: : "memory", "cc");
 
-   asm volatile (LOCK_PREFIX "xaddw %w0, %1\n"
- : "+r" (inc), "+m" (lock->tickets) : : "memory", "cc");
+   return tickets;
+}
+
+static __always_inline void __ticket_spin_lock(struct arch_spinlock *lock)
+{
+   register struct __raw_tickets inc;
+
+   inc = __ticket_spin_claim(lock);
 
for (;;) {
if (inc.head == inc.tail)
@@ -84,6 +98,7 @@ static __always_inline void 
__ticket_spin_lock(arch_spinlock_t *lock)
 out:   barrier();  /* make sure nothing creeps before the lock is 
taken */
 }
 
+#if (NR_CPUS < 256)
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 {
unsigned int tmp, new;
@@ -103,23 +118,6 @@ static __always_inline int 
__ticket_spin_trylock(arch_spinlock_t *lock)
return tmp;
 }
 #else
-static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
-{
-   register struct __raw_tickets inc = { .tail = 1 };
-
-   asm volatile(LOCK_PREFIX "xaddl %0, %1\n\t"
-: "+r" (inc), "+m" (lock->tickets)
-: : "memory", "cc");
-
-   for (;;) {
-   if (inc.head == inc.tail)
-   goto out;
-   cpu_relax();
-   inc.head = ACCESS_ONCE(lock->tickets.head);
-   }
-out:   barrier();  /* make sure nothing creeps before the lock is 
taken */
-}
-
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 {
unsigned tmp;
-- 
1.7.2.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH 13/14] x86/ticketlock: add slowpath logic

2010-11-16 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge 

Maintain a flag in both LSBs of the ticket lock which indicates whether
anyone is in the lock slowpath and may need kicking when the current
holder unlocks.  The flags are set when the first locker enters
the slowpath, and cleared when unlocking to an empty queue.

In the specific implementation of lock_spinning(), make sure to set
the slowpath flags on the lock just before blocking.  We must do
this before the last-chance pickup test to prevent a deadlock
with the unlocker:

Unlocker                        Locker
                                test for lock pickup
                                        -> fail
test slowpath + unlock
        -> false
                                set slowpath flags
                                block

Whereas this works in any ordering:

Unlocker                        Locker
                                set slowpath flags
                                test for lock pickup
                                        -> fail
                                block
test slowpath + unlock
        -> true, kick
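
Purely as an illustration of that ordering (not part of the patch): a toy
C version with C11 atomics standing in for the locked instructions, and the
actual blocking/kicking machinery reduced to comments.  The flag and
increment names mirror the series; everything else is made up.

/* Illustration only, not the patch itself. */
#include <stdatomic.h>
#include <stdint.h>

#define TICKET_SLOWPATH_FLAG	((uint8_t)1)
#define TICKET_LOCK_INC		((uint8_t)2)

struct sketch_lock {
	_Atomic uint8_t head;
	_Atomic uint8_t tail;
};

/* Locker: set the slowpath flag *before* the last-chance pickup test. */
static void locker_slowpath(struct sketch_lock *lk, uint8_t want)
{
	atomic_fetch_or(&lk->tail, TICKET_SLOWPATH_FLAG);

	if (atomic_load(&lk->head) == want)	/* last-chance pickup */
		return;				/* got the lock, don't block */

	/* block here until kicked by the unlocker */
}

/* Unlocker: release, then test the flag and kick if it is set. */
static void unlocker(struct sketch_lock *lk)
{
	atomic_fetch_add(&lk->head, TICKET_LOCK_INC);

	if (atomic_load(&lk->tail) & TICKET_SLOWPATH_FLAG) {
		/* kick the blocked waiter here */
	}
}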

Signed-off-by: Jeremy Fitzhardinge 
---
 arch/x86/include/asm/spinlock.h   |   53 -
 arch/x86/include/asm/spinlock_types.h |2 +
 arch/x86/kernel/paravirt-spinlocks.c  |   37 +++
 arch/x86/xen/spinlock.c   |4 ++
 4 files changed, 88 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 9e1c7ce..8d1cb42 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -53,7 +53,38 @@ static __always_inline void __ticket_unlock_release(struct 
arch_spinlock *lock)
 /* How long a lock should spin before we consider blocking */
 #define SPIN_THRESHOLD (1 << 11)
 
-#ifndef CONFIG_PARAVIRT_SPINLOCKS
+/* Only defined when CONFIG_PARAVIRT_SPINLOCKS defined, but may as
+ * well leave the prototype always visible.  */
+extern void __ticket_unlock_release_slowpath(struct arch_spinlock *lock);
+
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+
+/*
+ * Return true if someone is in the slowpath on this lock.  This
+ * should only be used by the current lock-holder.
+ */
+static inline bool __ticket_in_slowpath(struct arch_spinlock *lock)
+{
+   return !!(lock->tickets.tail & TICKET_SLOWPATH_FLAG);
+}
+
+static inline void __ticket_enter_slowpath(struct arch_spinlock *lock)
+{
+   if (sizeof(lock->tickets.tail) == sizeof(u8))
+   asm (LOCK_PREFIX "orb %1, %0"
+: "+m" (lock->tickets.tail)
+: "i" (TICKET_SLOWPATH_FLAG) : "memory");
+   else
+   asm (LOCK_PREFIX "orw %1, %0"
+: "+m" (lock->tickets.tail)
+: "i" (TICKET_SLOWPATH_FLAG) : "memory");
+}
+
+#else  /* !CONFIG_PARAVIRT_SPINLOCKS */
+static inline bool __ticket_in_slowpath(struct arch_spinlock *lock)
+{
+   return false;
+}
 
 static __always_inline void __ticket_lock_spinning(struct arch_spinlock *lock, 
unsigned ticket)
 {
@@ -84,18 +115,22 @@ static __always_inline void ticket_unlock_kick(struct 
arch_spinlock *lock, u
  */
 static __always_inline struct __raw_tickets __ticket_spin_claim(struct 
arch_spinlock *lock)
 {
-   register struct __raw_tickets tickets = { .tail = TICKET_LOCK_INC };
+   register struct arch_spinlock tickets = {
+   { .tickets.tail = TICKET_LOCK_INC }
+   };
 
if (sizeof(lock->tickets.head) == sizeof(u8))
asm volatile (LOCK_PREFIX "xaddw %w0, %1\n"
- : "+r" (tickets), "+m" (lock->tickets)
+ : "+r" (tickets.head_tail), "+m" (lock->tickets)
  : : "memory", "cc");
else
asm volatile (LOCK_PREFIX "xaddl %0, %1\n"
-: "+r" (tickets), "+m" (lock->tickets)
+: "+r" (tickets.head_tail), "+m" (lock->tickets)
 : : "memory", "cc");
 
-   return tickets;
+   tickets.tickets.tail &= ~TICKET_SLOWPATH_FLAG;
+
+   return tickets.tickets;
 }
 
 /* 
@@ -144,9 +179,11 @@ static __always_inline int 
arch_spin_trylock(arch_spinlock_t *lock)
 
 static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
 {
-   __ticket_t next = lock->tickets.head + TICKET_LOCK_INC;
-   __ticket_unlock_release(lock);
-   __ticket_unlock_kick(lock, next);
+   barrier();  /* prevent reordering out of locked region */
+   if (unlikely(__ticket_in_slowpath(lock)))
+   __ticket_unlock_release_slowpath(lock);
+   else
+   __ticket_unlock_release(lock);
barrie

[PATCH 07/14] x86/spinlocks: replace pv spinlocks with pv ticketlocks

2010-11-16 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge 

Rather than outright replacing the entire spinlock implementation in
order to paravirtualize it, keep the ticket lock implementation but add
a couple of pvops hooks on the slow path (long spin on lock, unlocking
a contended lock).

Ticket locks have a number of nice properties, but they also have some
surprising behaviours in virtual environments.  They enforce a strict
FIFO ordering on cpus trying to take a lock; however, if the hypervisor
scheduler does not schedule the cpus in the correct order, the system can
waste a huge amount of time spinning until the next cpu can take the lock.

(See Thomas Friebel's talk "Prevent Guests from Spinning Around"
http://www.xen.org/files/xensummitboston08/LHP.pdf for more details.)

To address this, we add two hooks:
 - __ticket_spin_lock which is called after the cpu has been
   spinning on the lock for a significant number of iterations but has
   failed to take the lock (presumably because the cpu holding the lock
   has been descheduled).  The lock_spinning pvop is expected to block
   the cpu until it has been kicked by the current lock holder.
 - __ticket_spin_unlock, which, on releasing a contended lock
   (there are more cpus with tail tickets), looks to see if the next
   cpu is blocked and wakes it if so.

When compiled with CONFIG_PARAVIRT_SPINLOCKS disabled, a set of stub
functions causes all the extra code to go away.
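
For illustration, the intended shape is roughly the following (a compilable
userspace sketch only: C11 atomics instead of the kernel's asm, hook bodies
stubbed out to the bare-metal no-ops, and details simplified):

#include <stdatomic.h>

#define SPIN_THRESHOLD (1 << 11)

struct sketch_lock { _Atomic unsigned head, tail; };

/* pvops stubs: on bare metal these compile away to nothing. */
static void lock_spinning(struct sketch_lock *lk, unsigned t) { (void)lk; (void)t; }
static void unlock_kick(struct sketch_lock *lk, unsigned t)   { (void)lk; (void)t; }

static void sketch_spin_lock(struct sketch_lock *lk)
{
	unsigned ticket = atomic_fetch_add(&lk->tail, 1);	/* claim a ticket */

	while (atomic_load(&lk->head) != ticket) {
		unsigned count = SPIN_THRESHOLD;

		/* fast path: plain spinning, as on bare metal */
		while (--count && atomic_load(&lk->head) != ticket)
			;

		/* slow path: spun too long, block until the holder kicks us */
		if (atomic_load(&lk->head) != ticket)
			lock_spinning(lk, ticket);
	}
}

static void sketch_spin_unlock(struct sketch_lock *lk)
{
	unsigned next = atomic_fetch_add(&lk->head, 1) + 1;	/* release */

	if (atomic_load(&lk->tail) != next)	/* someone still queued? */
		unlock_kick(lk, next);
}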

Signed-off-by: Jeremy Fitzhardinge 
---
 arch/x86/include/asm/paravirt.h   |   30 +++---
 arch/x86/include/asm/paravirt_types.h |8 +
 arch/x86/include/asm/spinlock.h   |   44 +++--
 arch/x86/kernel/paravirt-spinlocks.c  |   15 +-
 arch/x86/xen/spinlock.c   |7 -
 5 files changed, 50 insertions(+), 54 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 18e3b8a..c864775 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -717,36 +717,14 @@ static inline void __set_fixmap(unsigned /* enum 
fixed_addresses */ idx,
 
 #if defined(CONFIG_SMP) && defined(CONFIG_PARAVIRT_SPINLOCKS)
 
-static inline int arch_spin_is_locked(struct arch_spinlock *lock)
+static inline void __ticket_lock_spinning(struct arch_spinlock *lock, unsigned 
ticket)
 {
-   return PVOP_CALL1(int, pv_lock_ops.spin_is_locked, lock);
+   PVOP_VCALL2(pv_lock_ops.lock_spinning, lock, ticket);
 }
 
-static inline int arch_spin_is_contended(struct arch_spinlock *lock)
+static inline void ticket_unlock_kick(struct arch_spinlock *lock, unsigned 
ticket)
 {
-   return PVOP_CALL1(int, pv_lock_ops.spin_is_contended, lock);
-}
-#define arch_spin_is_contended arch_spin_is_contended
-
-static __always_inline void arch_spin_lock(struct arch_spinlock *lock)
-{
-   PVOP_VCALL1(pv_lock_ops.spin_lock, lock);
-}
-
-static __always_inline void arch_spin_lock_flags(struct arch_spinlock *lock,
- unsigned long flags)
-{
-   PVOP_VCALL2(pv_lock_ops.spin_lock_flags, lock, flags);
-}
-
-static __always_inline int arch_spin_trylock(struct arch_spinlock *lock)
-{
-   return PVOP_CALL1(int, pv_lock_ops.spin_trylock, lock);
-}
-
-static __always_inline void arch_spin_unlock(struct arch_spinlock *lock)
-{
-   PVOP_VCALL1(pv_lock_ops.spin_unlock, lock);
+   PVOP_VCALL2(pv_lock_ops.unlock_kick, lock, ticket);
 }
 
 #endif
diff --git a/arch/x86/include/asm/paravirt_types.h 
b/arch/x86/include/asm/paravirt_types.h
index b82bac9..1078474 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -315,12 +315,8 @@ struct pv_mmu_ops {
 
 struct arch_spinlock;
 struct pv_lock_ops {
-   int (*spin_is_locked)(struct arch_spinlock *lock);
-   int (*spin_is_contended)(struct arch_spinlock *lock);
-   void (*spin_lock)(struct arch_spinlock *lock);
-   void (*spin_lock_flags)(struct arch_spinlock *lock, unsigned long 
flags);
-   int (*spin_trylock)(struct arch_spinlock *lock);
-   void (*spin_unlock)(struct arch_spinlock *lock);
+   void (*lock_spinning)(struct arch_spinlock *lock, unsigned ticket);
+   void (*unlock_kick)(struct arch_spinlock *lock, unsigned ticket);
 };
 
 /* This contains all the paravirt structures: we get a convenient
diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 3afb1a7..8e379d3 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -50,6 +50,21 @@ static __always_inline void __ticket_unlock_release(struct 
arch_spinlock *lock)
 }
 #endif
 
+/* How long a lock should spin before we consider blocking */
+#define SPIN_THRESHOLD (1 << 11)
+
+#ifndef CONFIG_PARAVIRT_SPINLOCKS
+
+static __always_inline void __ticket_lock_spinning(struct arch_spinlock *lock, 
unsigned ticket)
+{
+}
+
+static __always_inline void ticket_unlock_kick(struct arch_spi

[PATCH 08/14] x86/ticketlock: collapse a layer of functions

2010-11-16 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge 

Now that the paravirtualization layer doesn't exist at the spinlock
level any more, we can collapse the __ticket_ functions into the arch_
functions.

Signed-off-by: Jeremy Fitzhardinge 
---
 arch/x86/include/asm/spinlock.h |   35 +--
 1 files changed, 5 insertions(+), 30 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 8e379d3..cfa80b5 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -108,7 +108,7 @@ static __always_inline void __ticket_unlock_kick(struct 
arch_spinlock *lock, __t
ticket_unlock_kick(lock, next);
 }
 
-static __always_inline void __ticket_spin_lock(struct arch_spinlock *lock)
+static __always_inline void arch_spin_lock(struct arch_spinlock *lock)
 {
register struct __raw_tickets inc;
 
@@ -128,7 +128,7 @@ static __always_inline void __ticket_spin_lock(struct 
arch_spinlock *lock)
 out:   barrier();  /* make sure nothing creeps before the lock is 
taken */
 }
 
-static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
+static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
 {
arch_spinlock_t old, new;
 
@@ -142,7 +142,7 @@ static __always_inline int 
__ticket_spin_trylock(arch_spinlock_t *lock)
return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) == 
old.head_tail;
 }
 
-static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
+static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
 {
__ticket_t next = lock->tickets.head + 1;
__ticket_unlock_release(lock);
@@ -150,46 +150,21 @@ static __always_inline void 
__ticket_spin_unlock(arch_spinlock_t *lock)
barrier();  /* prevent reordering into locked region */
 }
 
-static inline int __ticket_spin_is_locked(arch_spinlock_t *lock)
+static inline int arch_spin_is_locked(arch_spinlock_t *lock)
 {
struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
 
return !!(tmp.tail ^ tmp.head);
 }
 
-static inline int __ticket_spin_is_contended(arch_spinlock_t *lock)
+static inline int arch_spin_is_contended(arch_spinlock_t *lock)
 {
struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
 
return ((tmp.tail - tmp.head) & TICKET_MASK) > 1;
 }
-
-static inline int arch_spin_is_locked(arch_spinlock_t *lock)
-{
-   return __ticket_spin_is_locked(lock);
-}
-
-static inline int arch_spin_is_contended(arch_spinlock_t *lock)
-{
-   return __ticket_spin_is_contended(lock);
-}
 #define arch_spin_is_contended arch_spin_is_contended
 
-static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
-{
-   __ticket_spin_lock(lock);
-}
-
-static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
-{
-   return __ticket_spin_trylock(lock);
-}
-
-static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
-{
-   __ticket_spin_unlock(lock);
-}
-
 static __always_inline void arch_spin_lock_flags(arch_spinlock_t *lock,
  unsigned long flags)
 {
-- 
1.7.2.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH 00/14] PV ticket locks without expanding spinlock

2010-11-16 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge 

Hi all,

This is a revised version of the pvticket lock series.

The early part of the series is mostly unchanged: it converts the bulk
of the ticket lock code into C and makes the "small" and "large"
ticket code common.  The only changes are the incorporation of various
review comments.

The latter part of the series converts from pv spinlocks to pv ticket
locks (ie, using the ticket lock fastpath as-is, but adding pv ops for
the ticketlock slowpaths).

The significant difference here is that rather than adding a new
ticket_t-sized element to arch_spinlock_t - effectively doubling the
size - I steal the LSB of the tickets themselves to store a bit.  This
allows the structure to remain the same size, but at the cost of
halving the max number of CPUs (127 for a 8-bit ticket, and a hard max
of 32767 overall).

The extra bit (well, two, but one is unused) indicates whether the
lock has gone into "slowpath state", which means one of its lockers
has entered its slowpath and has blocked in the hypervisor.  This
means the current lock-holder needs to make sure it gets kicked out of
the hypervisor on unlock.

The spinlock remains in slowpath state until the last unlock happens
(ie there are no more queued lockers).
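
Concretely, the layout this implies looks like the following (8-bit ticket
case; the macro names match the later patches, the comments just spell out
the arithmetic):

typedef unsigned char __ticket_t;	/* 8-bit ticket case */

#define TICKET_LOCK_INC		2	/* tickets now advance by 2... */
#define TICKET_SLOWPATH_FLAG	1	/* ...leaving the LSB for "someone is
					   blocked in the slowpath on this lock" */

/* With the LSB gone, an 8-bit ticket distinguishes at most 127 queued CPUs,
   and a 16-bit ticket 32767, ie the halved limits mentioned above. */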

This code survives for a while with moderate testing (make -j 100 on
8 VCPUs on a 4 PCPU system), but locks up after about 20 iterations,
so there's still some race/deadlock in there (probably something
misordered), but I think the basic approach is sound.

Thanks,
J

Jeremy Fitzhardinge (14):
  x86/ticketlock: clean up types and accessors
  x86/ticketlock: convert spin loop to C
  x86/ticketlock: Use C for __ticket_spin_unlock
  x86/ticketlock: make large and small ticket versions of spin_lock the
same
  x86/ticketlock: make __ticket_spin_lock common
  x86/ticketlock: make __ticket_spin_trylock common
  x86/spinlocks: replace pv spinlocks with pv ticketlocks
  x86/ticketlock: collapse a layer of functions
  xen/pvticketlock: Xen implementation for PV ticket locks
  x86/pvticketlock: use callee-save for lock_spinning
  x86/ticketlock: don't inline _spin_unlock when using paravirt
spinlocks
  x86/ticketlocks: when paravirtualizing ticket locks, increment by 2
  x86/ticketlock: add slowpath logic
  x86/ticketlocks: tidy up __ticket_unlock_kick()

 arch/x86/Kconfig  |3 +
 arch/x86/include/asm/paravirt.h   |   30 +---
 arch/x86/include/asm/paravirt_types.h |8 +-
 arch/x86/include/asm/spinlock.h   |  236 +---
 arch/x86/include/asm/spinlock_types.h |   32 -
 arch/x86/kernel/paravirt-spinlocks.c  |   52 +--
 arch/x86/xen/spinlock.c   |  281 +
 kernel/Kconfig.locks  |2 +-
 8 files changed, 231 insertions(+), 413 deletions(-)

-- 
1.7.2.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH 04/14] x86/ticketlock: make large and small ticket versions of spin_lock the same

2010-11-16 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge 

Make the bulk of __ticket_spin_lock look identical for large and small
number of cpus.

Signed-off-by: Jeremy Fitzhardinge 
---
 arch/x86/include/asm/spinlock.h |   23 ---
 1 files changed, 8 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 0170ba9..1b81809 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -70,19 +70,16 @@ static __always_inline void __ticket_unlock_release(struct 
arch_spinlock *lock)
 #if (NR_CPUS < 256)
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
-   register union {
-   struct __raw_tickets tickets;
-   unsigned short slock;
-   } inc = { .slock = 1 << TICKET_SHIFT };
+   register struct __raw_tickets inc = { .tail = 1 };
 
asm volatile (LOCK_PREFIX "xaddw %w0, %1\n"
- : "+Q" (inc), "+m" (lock->slock) : : "memory", "cc");
+ : "+r" (inc), "+m" (lock->tickets) : : "memory", "cc");
 
for (;;) {
-   if (inc.tickets.head == inc.tickets.tail)
+   if (inc.head == inc.tail)
goto out;
cpu_relax();
-   inc.tickets.head = ACCESS_ONCE(lock->tickets.head);
+   inc.head = ACCESS_ONCE(lock->tickets.head);
}
 out:   barrier();  /* make sure nothing creeps before the lock is 
taken */
 }
@@ -108,21 +105,17 @@ static __always_inline int 
__ticket_spin_trylock(arch_spinlock_t *lock)
 #else
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
-   unsigned inc = 1 << TICKET_SHIFT;
-   __ticket_t tmp;
+   register struct __raw_tickets inc = { .tail = 1 };
 
asm volatile(LOCK_PREFIX "xaddl %0, %1\n\t"
-: "+r" (inc), "+m" (lock->slock)
+: "+r" (inc), "+m" (lock->tickets)
 : : "memory", "cc");
 
-   tmp = inc;
-   inc >>= TICKET_SHIFT;
-
for (;;) {
-   if ((__ticket_t)inc == tmp)
+   if (inc.head == inc.tail)
goto out;
cpu_relax();
-   tmp = ACCESS_ONCE(lock->tickets.head);
+   inc.head = ACCESS_ONCE(lock->tickets.head);
}
 out:   barrier();  /* make sure nothing creeps before the lock is 
taken */
 }
-- 
1.7.2.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH 02/14] x86/ticketlock: convert spin loop to C

2010-11-16 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge 

The inner loop of __ticket_spin_lock isn't doing anything very special,
so reimplement it in C.

For the 8 bit ticket lock variant, we use a register union to get direct
access to the lower and upper bytes in the tickets, but unfortunately gcc
won't generate a direct comparison between the two halves of the register,
so the generated asm isn't quite as pretty as the hand-coded version.
However benchmarking shows that this is actually a small improvement in
runtime performance on some benchmarks, and never a slowdown.

We also need a barrier at the end of the lock loop
to make sure that the compiler doesn't move any instructions from within
the locked region into the region where we don't yet own the lock.

Signed-off-by: Jeremy Fitzhardinge 
---
 arch/x86/include/asm/spinlock.h |   58 +++---
 1 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index d6d5784..f48a6e3 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -58,21 +58,21 @@
 #if (NR_CPUS < 256)
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
-   unsigned short inc = 1 << TICKET_SHIFT;
-
-   asm volatile (
-   LOCK_PREFIX "xaddw %w0, %1\n"
-   "1:\t"
-   "cmpb %h0, %b0\n\t"
-   "je 2f\n\t"
-   "rep ; nop\n\t"
-   "movb %1, %b0\n\t"
-   /* don't need lfence here, because loads are in-order */
-   "jmp 1b\n"
-   "2:"
-   : "+Q" (inc), "+m" (lock->slock)
-   :
-   : "memory", "cc");
+   register union {
+   struct __raw_tickets tickets;
+   unsigned short slock;
+   } inc = { .slock = 1 << TICKET_SHIFT };
+
+   asm volatile (LOCK_PREFIX "xaddw %w0, %1\n"
+ : "+Q" (inc), "+m" (lock->slock) : : "memory", "cc");
+
+   for (;;) {
+   if (inc.tickets.head == inc.tickets.tail)
+   goto out;
+   cpu_relax();
+   inc.tickets.head = ACCESS_ONCE(lock->tickets.head);
+   }
+out:   barrier();  /* make sure nothing creeps before the lock is 
taken */
 }
 
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
@@ -105,22 +105,22 @@ static __always_inline void 
__ticket_spin_unlock(arch_spinlock_t *lock)
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
unsigned inc = 1 << TICKET_SHIFT;
-   unsigned tmp;
+   __ticket_t tmp;
 
-   asm volatile(LOCK_PREFIX "xaddl %0, %1\n"
-"movzwl %w0, %2\n\t"
-"shrl $16, %0\n\t"
-"1:\t"
-"cmpl %0, %2\n\t"
-"je 2f\n\t"
-"rep ; nop\n\t"
-"movzwl %1, %2\n\t"
-/* don't need lfence here, because loads are in-order */
-"jmp 1b\n"
-"2:"
-: "+r" (inc), "+m" (lock->slock), "=&r" (tmp)
-:
-: "memory", "cc");
+   asm volatile(LOCK_PREFIX "xaddl %0, %1\n\t"
+: "+r" (inc), "+m" (lock->slock)
+: : "memory", "cc");
+
+   tmp = inc;
+   inc >>= TICKET_SHIFT;
+
+   for (;;) {
+   if ((__ticket_t)inc == tmp)
+   goto out;
+   cpu_relax();
+   tmp = ACCESS_ONCE(lock->tickets.head);
+   }
+out:   barrier();  /* make sure nothing creeps before the lock is 
taken */
 }
 
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
-- 
1.7.2.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH 03/14] x86/ticketlock: Use C for __ticket_spin_unlock

2010-11-16 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge 

If we don't need to use a locked inc for unlock, then implement it in C.

Signed-off-by: Jeremy Fitzhardinge 
---
 arch/x86/include/asm/spinlock.h |   32 +---
 1 files changed, 17 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index f48a6e3..0170ba9 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -33,9 +33,21 @@
  * On PPro SMP or if we are using OOSTORE, we use a locked operation to unlock
  * (PPro errata 66, 92)
  */
-# define UNLOCK_LOCK_PREFIX LOCK_PREFIX
+static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
+{
+   if (sizeof(lock->tickets.head) == sizeof(u8))
+   asm (LOCK_PREFIX "incb %0"
+: "+m" (lock->tickets.head) : : "memory");
+   else
+   asm (LOCK_PREFIX "incw %0"
+: "+m" (lock->tickets.head) : : "memory");
+
+}
 #else
-# define UNLOCK_LOCK_PREFIX
+static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
+{
+   lock->tickets.head++;
+}
 #endif
 
 /*
@@ -93,14 +105,6 @@ static __always_inline int 
__ticket_spin_trylock(arch_spinlock_t *lock)
 
return tmp;
 }
-
-static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
-{
-   asm volatile(UNLOCK_LOCK_PREFIX "incb %0"
-: "+m" (lock->slock)
-:
-: "memory", "cc");
-}
 #else
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
@@ -144,15 +148,13 @@ static __always_inline int 
__ticket_spin_trylock(arch_spinlock_t *lock)
 
return tmp;
 }
+#endif
 
 static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 {
-   asm volatile(UNLOCK_LOCK_PREFIX "incw %0"
-: "+m" (lock->slock)
-:
-: "memory", "cc");
+   __ticket_unlock_release(lock);
+   barrier();  /* prevent reordering into locked region */
 }
-#endif
 
 static inline int __ticket_spin_is_locked(arch_spinlock_t *lock)
 {
-- 
1.7.2.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH 00/20] x86: ticket lock rewrite and paravirtualization

2010-11-15 Thread Jeremy Fitzhardinge
On 11/15/2010 12:14 PM, Peter Zijlstra wrote:
> On Mon, 2010-11-15 at 12:03 -0800, H. Peter Anvin wrote:
>> On 11/15/2010 12:00 PM, Jeremy Fitzhardinge wrote:
>>> Another approach I discussed with PeterZ and Mathieu is to steal the LSB
>>> of the ticket counters (halving the max CPU count) to use as a "there is
>>> someone in slowpath waiting on this lock".  But I haven't spent the time
>>> to work out an algorithm to maintain that flag (or flags, since there
>>> are bits available) in a correct and efficient way.
>>>
>> Definitely worth pondering.
> Right, so the idea was to make the ticket increment 2, which would leave
> the LSB of both the head and tail available. I think that if one were to
> set both (using a cmpxchg), the ticket fast-path wouldn't need any
> changes since head==tail is still the correct condition for acquisition.
>
> Then the unlock needs an added conditional:
>   if (tail & 1) 
>   unlock_slowpath()

The tricky part is knowing how to clear the bit(s) on the last person
dropping out of the slow path, and making that race-free with respect to
new lockers entering the slow path.  I guess you could leave it in
slowpath state until you're the last unlocker (ie, you're unlocking into
uncontended state), whereupon you also clear the bits; I guess that
would probably need a cmpxchg to make it safe WRT new lockers entering
slowpath.
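
Something vaguely like this, perhaps (very rough sketch, hypothetical 8-bit
layout with the flag in the tail's LSB, C11 atomics in place of a real
cmpxchg; the names are made up):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SLOWPATH_FLAG	1u
#define LOCK_INC	2u

struct sketch_lock { _Atomic uint16_t head_tail; };	/* head low byte, tail high byte */

/* Called by the unlocker while the lock is in slowpath state.  Returns true
 * if we were the last unlocker and managed to clear the flag; false means a
 * new locker raced in (bumping tail), so the flag has to stay set for them
 * and we fall back to the normal unlock + kick path. */
static bool unlock_try_clear_slowpath(struct sketch_lock *lk)
{
	uint16_t old = atomic_load(&lk->head_tail);
	uint8_t head = (uint8_t)old;
	uint8_t tail = (uint8_t)(old >> 8);

	if ((uint8_t)(head + LOCK_INC) != (uint8_t)(tail & ~SLOWPATH_FLAG))
		return false;			/* still contended */

	uint16_t new = (uint16_t)((uint8_t)(head + LOCK_INC))
		     | (uint16_t)((uint8_t)(tail & ~SLOWPATH_FLAG) << 8);

	return atomic_compare_exchange_strong(&lk->head_tail, &old, new);
}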

As a heuristic, it shouldn't be too bad performancewise, since
(handwaving) if ticketholder N has entered the slowpath, then it's likely
that N+1 will as well.

J

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH 00/20] x86: ticket lock rewrite and paravirtualization

2010-11-15 Thread Jeremy Fitzhardinge
On 11/12/2010 02:20 PM, H. Peter Anvin wrote:
> On 11/12/2010 02:17 PM, Jeremy Fitzhardinge wrote:
>> On 11/12/2010 02:12 PM, H. Peter Anvin wrote:
>>> On 11/03/2010 07:59 AM, Jeremy Fitzhardinge wrote:
>>>>   - with an unmodified struct spinlock, it can check to see if
>>>> head == tail after unlock; if not, then there's someone else
>>>> trying to lock, and we can do a kick.  Unfortunately this
>>>> generates very high level of redundant kicks, because the
>>>> waiting CPU might not have blocked yet (which is the common
>>>> case)
>>>>
>>> How high is "very high" here -- most of the time (so that any mitigation
>>> on the slow patch is useless)?
>> I'll need to remeasure, but I think around 90% of the slowpath entries
>> were spurious without this.  In other words, when spinlocks do contend,
>> most of the time it isn't very serious and the other cpu doesn't spend
>> much time spinning.
>>
> 90% of the slowpath entries is one thing, my real question is the
> fraction of fastpath entries that get diverted to the slowpath.  It
> affects where mitigation needs to happen.

There are two questions: how many unlock events *must* go into the
slowpath for correctness reasons (ie, because the corresponding lock
also went slowpath and got blocked there), and how many end up going
into the slowpath due to imperfect heuristics?

The number of lock events which go slowpath is very dependent on the
workload of the kernel in question and of the machine overall.  On a
system with no CPU overcommit it should be zero (assuming that in the
native case no Linux spinlock remains contended for so long that it will
trigger the slowpath).  On a very overcommitted system, it comes down to
the likelihood that a VCPU will get preempted while running in a
critical region: Tcrit * Phz, where Tcrit is the critical section time
in seconds and Phz is the preemption rate of the VCPU scheduler in Hz.  So,
for example, a lock with a 10us critical section and a 100Hz preemption
rate will have a .1% chance of getting preempted and possibly causing
the other lockers to enter the slow path.
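(Spelling that out: Tcrit * Phz = 10us * 100Hz = 1e-5 s * 100/s = 1e-3, ie
the 0.1% above.)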

On the unlock side, it needs to test whether the lock has any waiters in a
slowpath state.  A conservative test is whether there are any
outstanding tickets, but in my measurements 90% of CPUs which spun on a
lock ended up getting it without having to take the slowpath.  This led
me to investigate more precise tests; the current one is a count of
slowpath-entering CPUs waiting on the lock.
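
In code terms, the two unlock-side tests being compared are roughly (sketch
only; the field names follow the series, the struct around them is
illustrative):

#include <stdbool.h>

typedef unsigned char __ticket_t;

struct sketch_spinlock {
	struct { volatile __ticket_t head, tail; } tickets;
	volatile __ticket_t waiting;		/* count of blocked CPUs */
};

/* Conservative test: kick whenever any ticket is still outstanding.
 * As measured above, ~90% of these kicks turn out to be spurious. */
static bool need_kick_conservative(struct sketch_spinlock *lock, __ticket_t next)
{
	return lock->tickets.tail != next;
}

/* More precise test: only kick when someone has actually blocked. */
static bool need_kick_precise(struct sketch_spinlock *lock)
{
	return lock->waiting != 0;
}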

Another approach I discussed with PeterZ and Mathieu is to steal the LSB
of the ticket counters (halving the max CPU count) to use as a "there is
someone in slowpath waiting on this lock".  But I haven't spent the time
to work out an algorithm to maintain that flag (or flags, since there
are bits available) in a correct and efficient way.

J
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH 06/20] x86/ticketlock: make __ticket_spin_trylock common

2010-11-15 Thread Jeremy Fitzhardinge
On 11/13/2010 02:48 AM, Eric Dumazet wrote:
> Le samedi 13 novembre 2010 à 18:17 +0800, Américo Wang a écrit :
>> On Wed, Nov 03, 2010 at 10:59:47AM -0400, Jeremy Fitzhardinge wrote:
>>
>>> +   union {
>>> +   struct __raw_tickets tickets;
>>> +   __ticketpair_t slock;
>>> +   } tmp, new;
>>> +   int ret;
>>> +
>>> +   tmp.tickets = ACCESS_ONCE(lock->tickets);
>>> +   if (tmp.tickets.head != tmp.tickets.tail)
>>> +   return 0;
>>> +
>>> +   new.slock = tmp.slock + (1 << TICKET_SHIFT);
>>> +
>>> +   ret = cmpxchg(&lock->ticketpair, tmp.slock, new.slock) == tmp.slock;
>>> +   barrier();  /* just make nothing creeps before lock is 
>>> claimed */
>> This one should be smp_wmb(), right? No CONFIG_X86_OOSTORE protected.
> cmpxchg() is a full memory barrier, no need for smp_wmb() or barrier()

Agreed.

J
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization

Re: [PATCH 03/20] x86/ticketlock: Use C for __ticket_spin_unlock

2010-11-15 Thread Jeremy Fitzhardinge
On 11/13/2010 02:05 AM, Américo Wang wrote:
> On Wed, Nov 03, 2010 at 10:59:44AM -0400, Jeremy Fitzhardinge wrote:
>>  * On PPro SMP or if we are using OOSTORE, we use a locked operation to 
>> unlock
>>  * (PPro errata 66, 92)
>>  */
>> -# define UNLOCK_LOCK_PREFIX LOCK_PREFIX
>> +static __always_inline void __ticket_unlock_release(struct arch_spinlock 
>> *lock)
>> +{
>> +if (sizeof(lock->tickets.head) == sizeof(u8))
>> +asm (LOCK_PREFIX "incb %0"
>> + : "+m" (lock->tickets.head) : : "memory");
>> +else
>> +asm (LOCK_PREFIX "incw %0"
>> + : "+m" (lock->tickets.head) : : "memory");
> This 'if/else' really should be done with #ifdef, even though
> the compiler may be smart enough to remove it.

No, we depend on if/else with constant arguments doing the right thing
all over the kernel.  It is always preferable to use it instead of
#ifdef where possible, so that the two branches of code are always
subjected to compiler checking, even if they're not being used.
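
ie the usual idiom, as a standalone illustration (nothing to do with the
actual spinlock code):

#include <stdio.h>

typedef unsigned char __ticket_t;	/* unsigned short when NR_CPUS >= 256 */

int main(void)
{
	/* constant condition: both arms are parsed and type-checked,
	 * but only the live one survives in the generated code */
	if (sizeof(__ticket_t) == sizeof(unsigned char))
		puts("byte tickets -> incb");
	else
		puts("word tickets -> incw");

	return 0;
}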

>> +
>> +}
>> #else
>> -# define UNLOCK_LOCK_PREFIX
>> +static __always_inline void __ticket_unlock_release(struct arch_spinlock 
>> *lock)
>> +{
>> +barrier();
>> +lock->tickets.head++;
>> +barrier();
> The second barrier() is not needed.

Agreed.  It gets removed in a later patch.

J
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH 01/20] x86/ticketlock: clean up types and accessors

2010-11-15 Thread Jeremy Fitzhardinge
On 11/13/2010 01:57 AM, Américo Wang wrote:
> On Wed, Nov 03, 2010 at 10:59:42AM -0400, Jeremy Fitzhardinge wrote:
> ...
>> static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
>> {
>> -int inc = 0x0001;
>> -int tmp;
>> +unsigned inc = 1 << TICKET_SHIFT;
>> +unsigned tmp;
> Please don't use old implicit-int.

I don't mind much either way, but I don't think I've seen anyone worry
about this before.

>> -return (((tmp >> TICKET_SHIFT) - tmp) & ((1 << TICKET_SHIFT) - 1)) > 1;
>> +return ((tmp.tail - tmp.head) & TICKET_MASK) > 1;
>
> There is a type promotion here.

...

>> }
>>
>> #ifndef CONFIG_PARAVIRT_SPINLOCKS
>> diff --git a/arch/x86/include/asm/spinlock_types.h 
>> b/arch/x86/include/asm/spinlock_types.h
>> index dcb48b2..4582640 100644
>> --- a/arch/x86/include/asm/spinlock_types.h
>> +++ b/arch/x86/include/asm/spinlock_types.h
>> @@ -5,11 +5,27 @@
>> # error "please don't include this file directly"
>> #endif
>>
>> +#include 
>> +
>> +#if (CONFIG_NR_CPUS < 256)
>> +typedef u8  __ticket_t;
>> +#else
>> +typedef u16 __ticket_t;
>> +#endif
>> +
>> +#define TICKET_SHIFT(sizeof(__ticket_t) * 8)
>> +#define TICKET_MASK ((1 << TICKET_SHIFT) - 1)
>
> So here you may need to cast the result to __ticket_t.

OK.

J
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH -next] xen: fix header export to userspace

2010-11-15 Thread Jeremy Fitzhardinge
On 11/13/2010 08:44 AM, Randy Dunlap wrote:
> From: Randy Dunlap 
>
> scripts/headers_install.pl prevents "__user" from being exported
> to userspace headers, so just use compiler.h to make sure that
> __user is defined and avoid the error.
>
> unifdef: linux-next-20101112/xx64/usr/include/xen/privcmd.h.tmp: 79: 
> Premature EOF (#if line 33 depth 1)

Ah, OK, thanks.  I was wondering what the proper fix for this was.  I'll
stick this in my tree.

Thanks,
J

> Signed-off-by: Randy Dunlap 
> Cc:   Jeremy Fitzhardinge 
> Cc:   Konrad Rzeszutek Wilk 
> Cc:   xen-de...@lists.xensource.com (moderated for non-subscribers)
> Cc:   virtualizat...@lists.osdl.org
> Cc:   Tony Finch 
> ---
>  include/xen/privcmd.h |5 +
>  1 file changed, 1 insertion(+), 4 deletions(-)
>
> --- linux-next-20101112.orig/include/xen/privcmd.h
> +++ linux-next-20101112/include/xen/privcmd.h
> @@ -34,13 +34,10 @@
>  #define __LINUX_PUBLIC_PRIVCMD_H__
>  
>  #include 
> +#include 
>  
>  typedef unsigned long xen_pfn_t;
>  
> -#ifndef __user
> -#define __user
> -#endif
> -
>  struct privcmd_hypercall {
>   __u64 op;
>   __u64 arg[5];

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH 00/20] x86: ticket lock rewrite and paravirtualization

2010-11-12 Thread Jeremy Fitzhardinge
On 11/12/2010 02:12 PM, H. Peter Anvin wrote:
> On 11/03/2010 07:59 AM, Jeremy Fitzhardinge wrote:
>>   - with an unmodified struct spinlock, it can check to see if
>> head == tail after unlock; if not, then there's someone else
>> trying to lock, and we can do a kick.  Unfortunately this
>> generates very high level of redundant kicks, because the
>> waiting CPU might not have blocked yet (which is the common
>> case)
>>
> How high is "very high" here -- most of the time (so that any mitigation
> on the slow patch is useless)?

I'll need to remeasure, but I think around 90% of the slowpath entries
were spurious without this.  In other words, when spinlocks do contend,
most of the time it isn't very serious and the other cpu doesn't spend
much time spinning.

J
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH 04/20] x86/ticketlock: make large and small ticket versions of spin_lock the same

2010-11-12 Thread Jeremy Fitzhardinge
On 11/12/2010 04:19 AM, Srivatsa Vaddagiri wrote:
> On Wed, Nov 03, 2010 at 10:59:45AM -0400, Jeremy Fitzhardinge wrote:
>> Make the bulk of __ticket_spin_lock look identical for large and small
>> number of cpus.
> [snip]
>
>>  #if (NR_CPUS < 256)
>>  static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
>>  {
>> -register union {
>> -struct __raw_tickets tickets;
>> -unsigned short slock;
>> -} inc = { .slock = 1 << TICKET_SHIFT };
>> +register struct __raw_tickets inc = { .tail = 1 };
> [snip]
>
>>  #else
>>  static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
>>  {
>> -unsigned inc = 1 << TICKET_SHIFT;
>> -__ticket_t tmp;
>> +register struct __raw_tickets inc = { .tickets.tail = 1 };
> s/.tickets//?
>
> Otherwise I get a compile error for NR_CPUS > 256, with just 4 patches 
> applied.

Yeah, likely.  That's precisely why I wanted to make them the same ;).

J
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH 3/3] VIDEO: xen-fb, switch to for_each_console

2010-11-04 Thread Jeremy Fitzhardinge
On 11/03/2010 11:28 AM, Jiri Slaby wrote:
> Use newly added for_each_console for iterating consoles.
>
> Signed-off-by: Jiri Slaby 
> Cc: Jeremy Fitzhardinge 

Sure, if that's what all the kids are doing these days.

Acked-by: Jeremy Fitzhardinge 

J

> Cc: Chris Wright 
> Cc: virtualizat...@lists.osdl.org
> Cc: xen-de...@lists.xensource.com
> Cc: linux-fb...@vger.kernel.org
> ---
>  drivers/video/xen-fbfront.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/video/xen-fbfront.c b/drivers/video/xen-fbfront.c
> index 428d273..4abb0b9 100644
> --- a/drivers/video/xen-fbfront.c
> +++ b/drivers/video/xen-fbfront.c
> @@ -492,7 +492,7 @@ xenfb_make_preferred_console(void)
>   return;
>  
>   acquire_console_sem();
> - for (c = console_drivers; c; c = c->next) {
> + for_each_console(c) {
>   if (!strcmp(c->name, "tty") && c->index == 0)
>   break;
>   }

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH 03/20] x86/ticketlock: Use C for __ticket_spin_unlock

2010-11-03 Thread Jeremy Fitzhardinge
On 11/03/2010 11:13 AM, Eric Dumazet wrote:
> Le mercredi 03 novembre 2010 à 10:59 -0400, Jeremy Fitzhardinge a
> écrit :
>> From: Jeremy Fitzhardinge 
>>
>> If we don't need to use a locked inc for unlock, then implement it in C.
>>
>> Signed-off-by: Jeremy Fitzhardinge 
>> ---
>>  arch/x86/include/asm/spinlock.h |   33 ++---
>>  1 files changed, 18 insertions(+), 15 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/spinlock.h 
>> b/arch/x86/include/asm/spinlock.h
>> index 6711d36..082990a 100644
>> --- a/arch/x86/include/asm/spinlock.h
>> +++ b/arch/x86/include/asm/spinlock.h
>> @@ -33,9 +33,23 @@
>>   * On PPro SMP or if we are using OOSTORE, we use a locked operation to 
>> unlock
>>   * (PPro errata 66, 92)
>>   */
>> -# define UNLOCK_LOCK_PREFIX LOCK_PREFIX
>> +static __always_inline void __ticket_unlock_release(struct arch_spinlock 
>> *lock)
>> +{
>> +if (sizeof(lock->tickets.head) == sizeof(u8))
>> +asm (LOCK_PREFIX "incb %0"
>> + : "+m" (lock->tickets.head) : : "memory");
>> +else
>> +asm (LOCK_PREFIX "incw %0"
>> + : "+m" (lock->tickets.head) : : "memory");
>> +
>> +}
>>  #else
>> -# define UNLOCK_LOCK_PREFIX
>> +static __always_inline void __ticket_unlock_release(struct arch_spinlock 
>> *lock)
>> +{
>> +barrier();
> technically speaking, it should be : smp_wmb()

Perhaps.  In practice it won't make a difference because it is defined
as barrier() unless OOSTORE is defined, in which case we need to do a
locked increment anyway.

Thanks,
J
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization

[PATCH 18/20] x86/ticketlock: remove .slock

2010-11-03 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge 

Signed-off-by: Jeremy Fitzhardinge 
---
 arch/x86/include/asm/spinlock_types.h |3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/spinlock_types.h 
b/arch/x86/include/asm/spinlock_types.h
index b396ed5..def8010 100644
--- a/arch/x86/include/asm/spinlock_types.h
+++ b/arch/x86/include/asm/spinlock_types.h
@@ -20,7 +20,6 @@ typedef u32 __ticketpair_t;
 
 typedef struct arch_spinlock {
union {
-   unsigned int slock;
__ticketpair_t ticketpair;
struct __raw_tickets {
__ticket_t head, tail;
@@ -31,7 +30,7 @@ typedef struct arch_spinlock {
 #endif
 } arch_spinlock_t;
 
-#define __ARCH_SPIN_LOCK_UNLOCKED  { { .slock = 0 } }
+#define __ARCH_SPIN_LOCK_UNLOCKED  { { .tickets = {0, 0} } }
 
 typedef struct {
unsigned int lock;
-- 
1.7.2.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH 02/20] x86/ticketlock: convert spin loop to C

2010-11-03 Thread Jeremy Fitzhardinge
On 11/03/2010 11:11 AM, Eric Dumazet wrote:
> Le mercredi 03 novembre 2010 à 10:59 -0400, Jeremy Fitzhardinge a
> écrit :
>> From: Jeremy Fitzhardinge 
>>
>> The inner loop of __ticket_spin_lock isn't doing anything very special,
>> so reimplement it in C.
>>
>> For the 8 bit ticket lock variant, we use a register union to get direct
>> access to the lower and upper bytes in the tickets, but unfortunately gcc
>> won't generate a direct comparison between the two halves of the register,
>> so the generated asm isn't quite as pretty as the hand-coded version.
>> However benchmarking shows that this is actually a small improvement in
>> runtime performance on some benchmarks, and never a slowdown.
>>
>> We also need to make sure there's a barrier at the end of the lock loop
>> to make sure that the compiler doesn't move any instructions from within
>> the locked region into the region where we don't yet own the lock.
>>
>> Signed-off-by: Jeremy Fitzhardinge 
>> ---
>>  arch/x86/include/asm/spinlock.h |   58 
>> +++---
>>  1 files changed, 29 insertions(+), 29 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/spinlock.h 
>> b/arch/x86/include/asm/spinlock.h
>> index d6d5784..6711d36 100644
>> --- a/arch/x86/include/asm/spinlock.h
>> +++ b/arch/x86/include/asm/spinlock.h
>> @@ -58,21 +58,21 @@
>>  #if (NR_CPUS < 256)
>>  static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
>>  {
>> -unsigned short inc = 1 << TICKET_SHIFT;
>> -
>> -asm volatile (
>> -LOCK_PREFIX "xaddw %w0, %1\n"
>> -"1:\t"
>> -"cmpb %h0, %b0\n\t"
>> -"je 2f\n\t"
>> -"rep ; nop\n\t"
>> -"movb %1, %b0\n\t"
>> -/* don't need lfence here, because loads are in-order */
>> -"jmp 1b\n"
>> -"2:"
>> -: "+Q" (inc), "+m" (lock->slock)
>> -:
>> -: "memory", "cc");
>> +register union {
>> +struct __raw_tickets tickets;
>> +unsigned short slock;
>> +} inc = { .slock = 1 << TICKET_SHIFT };
>> +
>> +asm volatile (LOCK_PREFIX "xaddw %w0, %1\n"
>> +  : "+Q" (inc), "+m" (lock->slock) : : "memory", "cc");
>> +
>> +for (;;) {
>> +if (inc.tickets.head == inc.tickets.tail)
>> +return;
>> +cpu_relax();
>> +inc.tickets.head = ACCESS_ONCE(lock->tickets.head);
>> +}
>> +barrier();  /* make sure nothing creeps before the lock is 
>> taken */
> Isnt this barrier() never reached ?

Sorry, a later patch makes this clearer.  I should have folded it in.

J
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization

[PATCH 05/20] x86/ticketlock: make __ticket_spin_lock common

2010-11-03 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge 

Aside from the particular form of the xadd instruction, they're identical.
So factor out the xadd and use common code for the rest.

Signed-off-by: Jeremy Fitzhardinge 
---
 arch/x86/include/asm/spinlock.h |   42 ++
 1 files changed, 20 insertions(+), 22 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 7586d7a..4f9fa24 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -69,13 +69,27 @@ static __always_inline void __ticket_unlock_release(struct 
arch_spinlock *lock)
  * save some instructions and make the code more elegant. There really isn't
  * much between them in performance though, especially as locks are out of 
line.
  */
-#if (NR_CPUS < 256)
-static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
+static __always_inline struct __raw_tickets __ticket_spin_claim(struct 
arch_spinlock *lock)
 {
-   register struct __raw_tickets inc = { .tail = 1 };
+   register struct __raw_tickets tickets = { .tail = 1 };
+
+   if (sizeof(lock->tickets.head) == sizeof(u8))
+   asm volatile (LOCK_PREFIX "xaddw %w0, %1\n"
+ : "+r" (tickets), "+m" (lock->tickets)
+ : : "memory", "cc");
+   else
+   asm volatile (LOCK_PREFIX "xaddl %0, %1\n"
+: "+r" (tickets), "+m" (lock->tickets)
+: : "memory", "cc");
 
-   asm volatile (LOCK_PREFIX "xaddw %w0, %1\n"
- : "+r" (inc), "+m" (lock->tickets) : : "memory", "cc");
+   return tickets;
+}
+
+static __always_inline void __ticket_spin_lock(struct arch_spinlock *lock)
+{
+   register struct __raw_tickets inc;
+
+   inc = __ticket_spin_claim(lock);
 
for (;;) {
if (inc.head == inc.tail)
@@ -86,6 +100,7 @@ static __always_inline void 
__ticket_spin_lock(arch_spinlock_t *lock)
barrier();  /* make sure nothing creeps before the lock is 
taken */
 }
 
+#if (NR_CPUS < 256)
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 {
unsigned int tmp, new;
@@ -105,23 +120,6 @@ static __always_inline int 
__ticket_spin_trylock(arch_spinlock_t *lock)
return tmp;
 }
 #else
-static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
-{
-   register struct __raw_tickets inc = { .tickets.tail = 1 };
-
-   asm volatile(LOCK_PREFIX "xaddl %0, %1\n\t"
-: "+r" (inc), "+m" (lock->tickets)
-: : "memory", "cc");
-
-   for (;;) {
-   if (inc.head == inc.tail)
-   return;
-   cpu_relax();
-   inc.head = ACCESS_ONCE(lock->tickets.head);
-   }
-   barrier();  /* make sure nothing creeps before the lock is 
taken */
-}
-
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 {
unsigned tmp;
-- 
1.7.2.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH 09/20] xen/pvticketlock: Xen implementation for PV ticket locks

2010-11-03 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge 

Replace the old Xen implementation of PV spinlocks with an implementation
of xen_lock_spinning and xen_unlock_kick.

xen_lock_spinning simply registers the cpu in its entry in lock_waiting,
adds itself to the waiting_cpus set, and blocks on an event channel
until the channel becomes pending.

xen_unlock_kick searches the cpus in waiting_cpus looking for the one
which wants this lock with the next ticket, if any.  If found,
it kicks it by making its event channel pending, which wakes it up.
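
In outline, the structure is something like this (heavily simplified sketch;
the per-cpu and event-channel plumbing is stubbed out and the names are
illustrative, not the real ones):

#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 8

struct sketch_lock;			/* stands in for struct arch_spinlock */

struct lock_waiting {
	struct sketch_lock *lock;
	unsigned want;			/* the ticket this cpu is waiting for */
	bool blocked;			/* stands in for waiting_cpus membership */
};

static struct lock_waiting waiting[NR_CPUS];

static void block_until_kicked(int cpu) { (void)cpu; }	/* event channel wait */
static void kick(int cpu)		{ (void)cpu; }	/* make channel pending */

static void sketch_lock_spinning(int cpu, struct sketch_lock *lock, unsigned want)
{
	waiting[cpu].lock = lock;	/* register what we are waiting for */
	waiting[cpu].want = want;
	waiting[cpu].blocked = true;	/* "add ourselves to waiting_cpus" */

	block_until_kicked(cpu);	/* sleep until the holder kicks us */

	waiting[cpu].blocked = false;
	waiting[cpu].lock = NULL;
}

static void sketch_unlock_kick(struct sketch_lock *lock, unsigned next)
{
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (waiting[cpu].blocked && waiting[cpu].lock == lock &&
		    waiting[cpu].want == next) {
			kick(cpu);	/* wake exactly that waiter */
			break;
		}
	}
}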

Signed-off-by: Jeremy Fitzhardinge 
---
 arch/x86/xen/spinlock.c |  281 ++-
 1 files changed, 36 insertions(+), 245 deletions(-)

diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index 7e9eb52..5feb897 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -19,32 +19,21 @@
 #ifdef CONFIG_XEN_DEBUG_FS
 static struct xen_spinlock_stats
 {
-   u64 taken;
u32 taken_slow;
-   u32 taken_slow_nested;
u32 taken_slow_pickup;
u32 taken_slow_spurious;
-   u32 taken_slow_irqenable;
 
-   u64 released;
u32 released_slow;
u32 released_slow_kicked;
 
 #define HISTO_BUCKETS  30
-   u32 histo_spin_total[HISTO_BUCKETS+1];
-   u32 histo_spin_spinning[HISTO_BUCKETS+1];
u32 histo_spin_blocked[HISTO_BUCKETS+1];
 
-   u64 time_total;
-   u64 time_spinning;
u64 time_blocked;
 } spinlock_stats;
 
 static u8 zero_stats;
 
-static unsigned lock_timeout = 1 << 10;
-#define TIMEOUT lock_timeout
-
 static inline void check_zero(void)
 {
if (unlikely(zero_stats)) {
@@ -73,22 +62,6 @@ static void __spin_time_accum(u64 delta, u32 *array)
array[HISTO_BUCKETS]++;
 }
 
-static inline void spin_time_accum_spinning(u64 start)
-{
-   u32 delta = xen_clocksource_read() - start;
-
-   __spin_time_accum(delta, spinlock_stats.histo_spin_spinning);
-   spinlock_stats.time_spinning += delta;
-}
-
-static inline void spin_time_accum_total(u64 start)
-{
-   u32 delta = xen_clocksource_read() - start;
-
-   __spin_time_accum(delta, spinlock_stats.histo_spin_total);
-   spinlock_stats.time_total += delta;
-}
-
 static inline void spin_time_accum_blocked(u64 start)
 {
u32 delta = xen_clocksource_read() - start;
@@ -105,214 +78,76 @@ static inline u64 spin_time_start(void)
return 0;
 }
 
-static inline void spin_time_accum_total(u64 start)
-{
-}
-static inline void spin_time_accum_spinning(u64 start)
-{
-}
 static inline void spin_time_accum_blocked(u64 start)
 {
 }
 #endif  /* CONFIG_XEN_DEBUG_FS */
 
-struct xen_spinlock {
-   unsigned char lock; /* 0 -> free; 1 -> locked */
-   unsigned short spinners;/* count of waiting cpus */
+struct xen_lock_waiting {
+   struct arch_spinlock *lock;
+   __ticket_t want;
 };
 
 static DEFINE_PER_CPU(int, lock_kicker_irq) = -1;
+static DEFINE_PER_CPU(struct xen_lock_waiting, lock_waiting);
+static cpumask_t waiting_cpus;
 
-#if 0
-static int xen_spin_is_locked(struct arch_spinlock *lock)
-{
-   struct xen_spinlock *xl = (struct xen_spinlock *)lock;
-
-   return xl->lock != 0;
-}
-
-static int xen_spin_is_contended(struct arch_spinlock *lock)
-{
-   struct xen_spinlock *xl = (struct xen_spinlock *)lock;
-
-   /* Not strictly true; this is only the count of contended
-  lock-takers entering the slow path. */
-   return xl->spinners != 0;
-}
-
-static int xen_spin_trylock(struct arch_spinlock *lock)
-{
-   struct xen_spinlock *xl = (struct xen_spinlock *)lock;
-   u8 old = 1;
-
-   asm("xchgb %b0,%1"
-   : "+q" (old), "+m" (xl->lock) : : "memory");
-
-   return old == 0;
-}
-
-static DEFINE_PER_CPU(struct xen_spinlock *, lock_spinners);
-
-/*
- * Mark a cpu as interested in a lock.  Returns the CPU's previous
- * lock of interest, in case we got preempted by an interrupt.
- */
-static inline struct xen_spinlock *spinning_lock(struct xen_spinlock *xl)
-{
-   struct xen_spinlock *prev;
-
-   prev = __get_cpu_var(lock_spinners);
-   __get_cpu_var(lock_spinners) = xl;
-
-   wmb();  /* set lock of interest before count */
-
-   asm(LOCK_PREFIX " incw %0"
-   : "+m" (xl->spinners) : : "memory");
-
-   return prev;
-}
-
-/*
- * Mark a cpu as no longer interested in a lock.  Restores previous
- * lock of interest (NULL for none).
- */
-static inline void unspinning_lock(struct xen_spinlock *xl, struct 
xen_spinlock *prev)
-{
-   asm(LOCK_PREFIX " decw %0"
-   : "+m" (xl->spinners) : : "memory");
-   wmb();  /* decrement count before restoring lock */
-   __get_cpu_var(lock_spinners) = prev;
-}
-
-static noinline int xen_spin_lock_slow(struct arch_spinlock *lock, bool 
irq_enable)
+stati

[PATCH 19/20] x86/ticketlocks: use overlapping read to eliminate mb()

2010-11-03 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge 

When reading the 'waiting' counter, use a longer-than-necessary read
which also overlaps 'head'.  This read is guaranteed to be in-order
with respect to any unlock writes to 'head', thereby eliminating the
need for an explicit mb() to enforce the read-after-write ordering.
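
Roughly, the layout trick is the following (illustrative types only; the
point is the x86 ordering argument, not the exact definitions):

#include <stdint.h>

typedef uint16_t __ticket_t;
typedef uint32_t __ticketpair_t;

struct sketch_spinlock {
	union {
		struct {
			__ticket_t waiting;
			struct { __ticket_t head, tail; } tickets;
		};
		__ticketpair_t waiting_head;	/* same bytes as waiting + head */
	};
};

static int lock_waiters(const volatile struct sketch_spinlock *lock)
{
	/* The 32-bit read covers the same bytes as the 16-bit 'head' that
	 * unlock stores to, so by the x86 rules it cannot be hoisted above
	 * that store, which is exactly what the mb() was there to prevent.
	 * The low half of the value is 'waiting'. */
	return ((__ticket_t)(lock->waiting_head)) != 0;
}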

Signed-off-by: Jeremy Fitzhardinge 
---
 arch/x86/include/asm/spinlock.h   |   14 +++---
 arch/x86/include/asm/spinlock_types.h |   24 
 2 files changed, 27 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index c7455e1..a16b6e4 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -86,14 +86,14 @@ static inline void __ticket_sub_waiting(struct 
arch_spinlock *lock)
 
 static __always_inline bool __ticket_lock_waiters(const struct arch_spinlock 
*lock)
 {
-   /*
-* Make sure everyone sees the unlock write before we check the
-* waiting count.  The processor ordering doesn't guarantee this
-* because they're different memory locations.
+   /* 
+* lock->waiting_head is a union element which aliases both
+* 'waiting' and 'head'.  Since the memory access overlaps
+* 'head', the read is forced to be in-order with respect to
+* unlock writes to 'head', eliminating the need for an
+* explicit mb(). (Intel memory ordering rules.)
 */
-   mb();
-
-   return ACCESS_ONCE(lock->waiting) != 0;
+   return ((__ticket_t)ACCESS_ONCE(lock->waiting_head)) != 0;
 }
 #endif /* CONFIG_PARAVIRT_SPINLOCKS */
 
diff --git a/arch/x86/include/asm/spinlock_types.h 
b/arch/x86/include/asm/spinlock_types.h
index def8010..307ef0b 100644
--- a/arch/x86/include/asm/spinlock_types.h
+++ b/arch/x86/include/asm/spinlock_types.h
@@ -18,6 +18,24 @@ typedef u32 __ticketpair_t;
 #define TICKET_SHIFT   (sizeof(__ticket_t) * 8)
 #define TICKET_MASK((1 << TICKET_SHIFT) - 1)
 
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+typedef struct arch_spinlock {
+   union {
+   struct {
+   __ticket_t waiting;
+   union {
+   __ticketpair_t ticketpair;
+   struct __raw_tickets {
+   __ticket_t head, tail;
+   } tickets;
+   };
+   };
+   __ticketpair_t waiting_head; /* aliases waiting and head */
+   };
+} arch_spinlock_t;
+
+#define __ARCH_SPIN_LOCK_UNLOCKED  { { { 0, { 0 } } } }
+#else
 typedef struct arch_spinlock {
union {
__ticketpair_t ticketpair;
@@ -25,12 +43,10 @@ typedef struct arch_spinlock {
__ticket_t head, tail;
} tickets;
};
-#ifdef CONFIG_PARAVIRT_SPINLOCKS
-   __ticket_t waiting;
-#endif
 } arch_spinlock_t;
 
-#define __ARCH_SPIN_LOCK_UNLOCKED  { { .tickets = {0, 0} } }
+#define __ARCH_SPIN_LOCK_UNLOCKED  { { 0 } }
+#endif
 
 typedef struct {
unsigned int lock;
-- 
1.7.2.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH 10/20] x86/pvticketlock: keep count of blocked cpus

2010-11-03 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge 

When a CPU blocks by calling into __ticket_lock_spinning, keep a count in
the spinlock.  This allows __ticket_unlock_kick to more accurately tell
whether it has any work to do (in many cases, a spinlock may be contended,
but none of the waiters have gone into blocking).

This adds two locked instructions to the spinlock slow path (once the
lock has already spun for SPIN_THRESHOLD iterations), and adds another
one or two bytes to struct arch_spinlock.

We need to make sure we increment the waiting counter before doing the
last-chance check of the lock to see if we picked it up in the meantime.
If we don't then there's a potential deadlock:

lock holder                     lock waiter

                                clear event channel
                                check lock for pickup (did not)
release lock
check waiting counter
        (=0, no kick)
                                add waiting counter
                                block (=deadlock)

Moving the "add waiting counter" earlier avoids the deadlock:

lock holder                     lock waiter

                                clear event channel
                                add waiting counter
                                check lock for pickup (did not)
release lock
check waiting counter
        (=1, kick)
                                block (and immediately wake)

Signed-off-by: Jeremy Fitzhardinge 
---
 arch/x86/include/asm/spinlock.h   |   27 ++-
 arch/x86/include/asm/spinlock_types.h |3 +++
 arch/x86/xen/spinlock.c   |4 
 3 files changed, 33 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index a79dfee..3deabca 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -65,6 +65,31 @@ static __always_inline void ticket_unlock_kick(struct 
arch_spinlock *lock, u
 {
 }
 
+static __always_inline bool __ticket_lock_waiters(const struct arch_spinlock 
*lock)
+{
+   return false;
+}
+#else
+static inline void __ticket_add_waiting(struct arch_spinlock *lock)
+{
+   if (sizeof(lock->waiting) == sizeof(u8))
+   asm (LOCK_PREFIX "addb $1, %0" : "+m" (lock->waiting) : : 
"memory");
+   else
+   asm (LOCK_PREFIX "addw $1, %0" : "+m" (lock->waiting) : : 
"memory");
+}
+
+static inline void __ticket_sub_waiting(struct arch_spinlock *lock)
+{
+   if (sizeof(lock->waiting) == sizeof(u8))
+   asm (LOCK_PREFIX "subb $1, %0" : "+m" (lock->waiting) : : 
"memory");
+   else
+   asm (LOCK_PREFIX "subw $1, %0" : "+m" (lock->waiting) : : 
"memory");
+}
+
+static __always_inline bool __ticket_lock_waiters(const struct arch_spinlock 
*lock)
+{
+   return ACCESS_ONCE(lock->waiting) != 0;
+}
 #endif /* CONFIG_PARAVIRT_SPINLOCKS */
 
 /*
@@ -106,7 +131,7 @@ static __always_inline struct __raw_tickets 
__ticket_spin_claim(struct arch_spin
  */
 static __always_inline void __ticket_unlock_kick(struct arch_spinlock *lock, 
__ticket_t next)
 {
-   if (unlikely(lock->tickets.tail != next))
+   if (unlikely(__ticket_lock_waiters(lock)))
ticket_unlock_kick(lock, next);
 }
 
diff --git a/arch/x86/include/asm/spinlock_types.h 
b/arch/x86/include/asm/spinlock_types.h
index 48dafc3..b396ed5 100644
--- a/arch/x86/include/asm/spinlock_types.h
+++ b/arch/x86/include/asm/spinlock_types.h
@@ -26,6 +26,9 @@ typedef struct arch_spinlock {
__ticket_t head, tail;
} tickets;
};
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+   __ticket_t waiting;
+#endif
 } arch_spinlock_t;
 
 #define __ARCH_SPIN_LOCK_UNLOCKED  { { .slock = 0 } }
diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index 5feb897..b8f09d5 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -119,6 +119,8 @@ static void xen_lock_spinning(struct arch_spinlock *lock, 
unsigned want)
/* Only check lock once pending cleared */
barrier();
 
+   __ticket_add_waiting(lock);
+
/* check again make sure it didn't become free while
   we weren't looking  */
if (ACCESS_ONCE(lock->tickets.head) == want) {
@@ -133,6 +135,8 @@ static void xen_lock_spinning(struct arch_spinlock *lock, 
unsigned want)
kstat_incr_irqs_this_cpu(irq, irq_to_desc(irq));
 
 out:
+   __ticket_sub_waiting(lock);
+
cpumask_clear_cpu(cpu, &waiting_cpus);
w->lock = NULL;
spin_time_accum_blocked(start);
-- 
1.7.2.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH 20/20] x86/ticketlock: rename ticketpair to head_tail

2010-11-03 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge 

Make it clearer what fields head_tail is actually overlapping with.

Signed-off-by: Jeremy Fitzhardinge 
---
 arch/x86/include/asm/spinlock.h   |2 +-
 arch/x86/include/asm/spinlock_types.h |4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index a16b6e4..6850884 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -174,7 +174,7 @@ static __always_inline int 
arch_spin_trylock(arch_spinlock_t *lock)
 
new.slock = tmp.slock + (1 << TICKET_SHIFT);
 
-   ret = cmpxchg(&lock->ticketpair, tmp.slock, new.slock) == tmp.slock;
+   ret = cmpxchg(&lock->head_tail, tmp.slock, new.slock) == tmp.slock;
barrier();  /* just make nothing creeps before lock is 
claimed */
 
return ret;
diff --git a/arch/x86/include/asm/spinlock_types.h 
b/arch/x86/include/asm/spinlock_types.h
index 307ef0b..df031ec 100644
--- a/arch/x86/include/asm/spinlock_types.h
+++ b/arch/x86/include/asm/spinlock_types.h
@@ -24,7 +24,7 @@ typedef struct arch_spinlock {
struct {
__ticket_t waiting;
union {
-   __ticketpair_t ticketpair;
+   __ticketpair_t head_tail;
struct __raw_tickets {
__ticket_t head, tail;
} tickets;
@@ -38,7 +38,7 @@ typedef struct arch_spinlock {
 #else
 typedef struct arch_spinlock {
union {
-   __ticketpair_t ticketpair;
+   __ticketpair_t head_tail;
struct __raw_tickets {
__ticket_t head, tail;
} tickets;
-- 
1.7.2.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH 14/20] x86/ticketlock: loosen ordering restraints on unlock

2010-11-03 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge 

It's only necessary to prevent the compiler from reordering code out of
the locked region past the unlock.  Putting too many barriers in prevents
the compiler from generating the best code when PV spinlocks are enabled.
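
For reference, barrier() and mb() are very different animals, and that
distinction is what this and the neighbouring ordering patches turn on.
Roughly, on x86-64 (this only approximates the real definitions in
compiler.h and system.h):

/* approximate x86-64 forms; not the verbatim kernel definitions */
#define barrier()	asm volatile("" ::: "memory")		/* compiler-only fence */
#define mb()		asm volatile("mfence" ::: "memory")	/* CPU + compiler fence */

int main(void)
{
	barrier();	/* emits no instruction, only constrains the compiler */
	mb();		/* emits a serializing mfence */
	return 0;
}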

Signed-off-by: Jeremy Fitzhardinge 
---
 arch/x86/include/asm/spinlock.h |8 +---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index b9a1aae..d6de5e7 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -46,9 +46,7 @@ static __always_inline void __ticket_unlock_release(struct 
arch_spinlock *lock)
 #else
 static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
 {
-   barrier();
lock->tickets.head++;
-   barrier();
 }
 #endif
 
@@ -184,7 +182,11 @@ static __always_inline int 
arch_spin_trylock(arch_spinlock_t *lock)
 
 static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
 {
-   __ticket_t next = lock->tickets.head + 1;
+   __ticket_t next;
+
+   barrier();  /* prevent reordering out of locked region */
+
+   next = lock->tickets.head + 1;
__ticket_unlock_release(lock);
__ticket_unlock_kick(lock, next);
 }
-- 
1.7.2.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH 13/20] x86/pvticketlock: make sure unlock is seen by everyone before checking waiters

2010-11-03 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge 

If we don't put a real memory barrier between the unlocking increment
of the queue head and the check for lock waiters, we can end up with a
deadlock as a result of the unlock write being reordered with respect
to the waiters read.  In effect, the check on the waiters count could
happen before the unlock, which leads to a deadlock:

unlocker                locker

check waiters (none)
                        check lock head (timeout)
                        add to waiters
                        block
release lock            => deadlock

Adding a mb() barrier prevents the unlocker's waiter check from happening
before the lock release.

Signed-off-by: Jeremy Fitzhardinge 
---
 arch/x86/include/asm/spinlock.h |7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 3deabca..b9a1aae 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -88,6 +88,13 @@ static inline void __ticket_sub_waiting(struct arch_spinlock 
*lock)
 
 static __always_inline bool __ticket_lock_waiters(const struct arch_spinlock 
*lock)
 {
+   /*
+* Make sure everyone sees the unlock write before we check the
+* waiting count.  The processor ordering doesn't guarantee this
+* because they're different memory locations.
+*/
+   mb();
+
return ACCESS_ONCE(lock->waiting) != 0;
 }
 #endif /* CONFIG_PARAVIRT_SPINLOCKS */
-- 
1.7.2.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH 02/20] x86/ticketlock: convert spin loop to C

2010-11-03 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge 

The inner loop of __ticket_spin_lock isn't doing anything very special,
so reimplement it in C.

For the 8 bit ticket lock variant, we use a register union to get direct
access to the lower and upper bytes in the tickets, but unfortunately gcc
won't generate a direct comparison between the two halves of the register,
so the generated asm isn't quite as pretty as the hand-coded version.
However benchmarking shows that this is actually a small improvement in
runtime performance on some benchmarks, and never a slowdown.

We also need to make sure there's a barrier at the end of the lock loop
to make sure that the compiler doesn't move any instructions from within
the locked region into the region where we don't yet own the lock.
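
For illustration, the same loop can be modelled in self-contained
user-space C, with the GCC atomic builtins standing in for the xaddw
(a sketch that assumes an x86-like little-endian layout; it is not the
kernel code):

/* user-space model of the ticket-lock loop above; not the kernel code */
#include <stdint.h>

typedef uint8_t __ticket_t;

union ticketlock {
	uint16_t slock;
	struct { __ticket_t head, tail; } tickets;
};

static void ticket_lock(union ticketlock *lock)
{
	union ticketlock inc;

	/* xadd: bump 'tail' (the high byte) and fetch the old head/tail pair */
	inc.slock = __atomic_fetch_add(&lock->slock, 1u << 8, __ATOMIC_ACQUIRE);

	for (;;) {
		if (inc.tickets.head == inc.tickets.tail)
			return;		/* our ticket came up */
		/* cpu_relax() goes here in the kernel */
		inc.tickets.head = __atomic_load_n(&lock->tickets.head,
						   __ATOMIC_ACQUIRE);
	}
}

static void ticket_unlock(union ticketlock *lock)
{
	/* hand the lock to the next ticket */
	__atomic_fetch_add(&lock->tickets.head, 1, __ATOMIC_RELEASE);
}

int main(void)
{
	union ticketlock l = { .slock = 0 };

	ticket_lock(&l);
	ticket_unlock(&l);
	return 0;
}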

Signed-off-by: Jeremy Fitzhardinge 
---
 arch/x86/include/asm/spinlock.h |   58 +++---
 1 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index d6d5784..6711d36 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -58,21 +58,21 @@
 #if (NR_CPUS < 256)
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
-   unsigned short inc = 1 << TICKET_SHIFT;
-
-   asm volatile (
-   LOCK_PREFIX "xaddw %w0, %1\n"
-   "1:\t"
-   "cmpb %h0, %b0\n\t"
-   "je 2f\n\t"
-   "rep ; nop\n\t"
-   "movb %1, %b0\n\t"
-   /* don't need lfence here, because loads are in-order */
-   "jmp 1b\n"
-   "2:"
-   : "+Q" (inc), "+m" (lock->slock)
-   :
-   : "memory", "cc");
+   register union {
+   struct __raw_tickets tickets;
+   unsigned short slock;
+   } inc = { .slock = 1 << TICKET_SHIFT };
+
+   asm volatile (LOCK_PREFIX "xaddw %w0, %1\n"
+ : "+Q" (inc), "+m" (lock->slock) : : "memory", "cc");
+
+   for (;;) {
+   if (inc.tickets.head == inc.tickets.tail)
+   return;
+   cpu_relax();
+   inc.tickets.head = ACCESS_ONCE(lock->tickets.head);
+   }
+   barrier();  /* make sure nothing creeps before the lock is 
taken */
 }
 
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
@@ -105,22 +105,22 @@ static __always_inline void 
__ticket_spin_unlock(arch_spinlock_t *lock)
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
unsigned inc = 1 << TICKET_SHIFT;
-   unsigned tmp;
+   __ticket_t tmp;
 
-   asm volatile(LOCK_PREFIX "xaddl %0, %1\n"
-"movzwl %w0, %2\n\t"
-"shrl $16, %0\n\t"
-"1:\t"
-"cmpl %0, %2\n\t"
-"je 2f\n\t"
-"rep ; nop\n\t"
-"movzwl %1, %2\n\t"
-/* don't need lfence here, because loads are in-order */
-"jmp 1b\n"
-"2:"
-: "+r" (inc), "+m" (lock->slock), "=&r" (tmp)
-:
-: "memory", "cc");
+   asm volatile(LOCK_PREFIX "xaddl %0, %1\n\t"
+: "+r" (inc), "+m" (lock->slock)
+: : "memory", "cc");
+
+   tmp = inc;
+   inc >>= TICKET_SHIFT;
+
+   for (;;) {
+   if ((__ticket_t)inc == tmp)
+   return;
+   cpu_relax();
+   tmp = ACCESS_ONCE(lock->tickets.head);
+   }
+   barrier();  /* make sure nothing creeps before the lock is 
taken */
 }
 
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
-- 
1.7.2.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH 07/20] x86/spinlocks: replace pv spinlocks with pv ticketlocks

2010-11-03 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge 

Rather than outright replacing the entire spinlock implementation in
order to paravirtualize it, keep the ticket lock implementation but add
a couple of pvops hooks on the slow path (long spin on lock, unlocking
a contended lock).

Ticket locks have a number of nice properties, but they also have some
surprising behaviours in virtual environments.  They enforce a strict
FIFO ordering on cpus trying to take a lock; however, if the hypervisor
scheduler does not schedule the cpus in the correct order, the system can
waste a huge amount of time spinning until the next cpu can take the lock.

(See Thomas Friebel's talk "Prevent Guests from Spinning Around"
http://www.xen.org/files/xensummitboston08/LHP.pdf for more details.)

To address this, we add two hooks:
 - __ticket_spin_lock which is called after the cpu has been
   spinning on the lock for a significant number of iterations but has
   failed to take the lock (presumably because the cpu holding the lock
   has been descheduled).  The lock_spinning pvop is expected to block
   the cpu until it has been kicked by the current lock holder.
 - __ticket_spin_unlock, which, on releasing a contended lock
   (there are more cpus with tail tickets), looks to see if the next
   cpu is blocked and wakes it if so.

When compiled with CONFIG_PARAVIRT_SPINLOCKS disabled, a set of stub
functions causes all the extra code to go away.
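
To give the overall shape, here is an illustrative sketch built from the
helpers this series introduces (__ticket_spin_claim, SPIN_THRESHOLD,
cpu_relax and the two pvop stubs), rather than the literal contents of
any one patch; the lock and unlock paths end up looking roughly like
this:

/* illustrative sketch only; the real code is in the series' patches */
static __always_inline void arch_spin_lock(struct arch_spinlock *lock)
{
	register struct __raw_tickets inc;

	inc = __ticket_spin_claim(lock);	/* xadd: take a ticket */

	for (;;) {
		unsigned count = SPIN_THRESHOLD;

		do {
			if (inc.head == inc.tail)
				return;		/* fast path: lock taken */
			cpu_relax();
			inc.head = ACCESS_ONCE(lock->tickets.head);
		} while (--count);

		/* spun too long: block this (v)cpu until the current
		   holder kicks our ticket */
		__ticket_lock_spinning(lock, inc.tail);
	}
}

static __always_inline void arch_spin_unlock(struct arch_spinlock *lock)
{
	__ticket_t next = lock->tickets.head + 1;

	__ticket_unlock_release(lock);		/* head++ */
	__ticket_unlock_kick(lock, next);	/* kick a blocked waiter, if any */
}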

Signed-off-by: Jeremy Fitzhardinge 
---
 arch/x86/include/asm/paravirt.h   |   30 +++---
 arch/x86/include/asm/paravirt_types.h |8 +
 arch/x86/include/asm/spinlock.h   |   44 +++--
 arch/x86/kernel/paravirt-spinlocks.c  |   15 +-
 arch/x86/xen/spinlock.c   |7 -
 5 files changed, 50 insertions(+), 54 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 5653f43..79c77e5 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -722,36 +722,14 @@ static inline void __set_fixmap(unsigned /* enum 
fixed_addresses */ idx,
 
 #if defined(CONFIG_SMP) && defined(CONFIG_PARAVIRT_SPINLOCKS)
 
-static inline int arch_spin_is_locked(struct arch_spinlock *lock)
+static inline void __ticket_lock_spinning(struct arch_spinlock *lock, unsigned 
ticket)
 {
-   return PVOP_CALL1(int, pv_lock_ops.spin_is_locked, lock);
+   PVOP_VCALL2(pv_lock_ops.lock_spinning, lock, ticket);
 }
 
-static inline int arch_spin_is_contended(struct arch_spinlock *lock)
+static inline void ticket_unlock_kick(struct arch_spinlock *lock, unsigned 
ticket)
 {
-   return PVOP_CALL1(int, pv_lock_ops.spin_is_contended, lock);
-}
-#define arch_spin_is_contended arch_spin_is_contended
-
-static __always_inline void arch_spin_lock(struct arch_spinlock *lock)
-{
-   PVOP_VCALL1(pv_lock_ops.spin_lock, lock);
-}
-
-static __always_inline void arch_spin_lock_flags(struct arch_spinlock *lock,
- unsigned long flags)
-{
-   PVOP_VCALL2(pv_lock_ops.spin_lock_flags, lock, flags);
-}
-
-static __always_inline int arch_spin_trylock(struct arch_spinlock *lock)
-{
-   return PVOP_CALL1(int, pv_lock_ops.spin_trylock, lock);
-}
-
-static __always_inline void arch_spin_unlock(struct arch_spinlock *lock)
-{
-   PVOP_VCALL1(pv_lock_ops.spin_unlock, lock);
+   PVOP_VCALL2(pv_lock_ops.unlock_kick, lock, ticket);
 }
 
 #endif
diff --git a/arch/x86/include/asm/paravirt_types.h 
b/arch/x86/include/asm/paravirt_types.h
index db9ef55..968ee31 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -316,12 +316,8 @@ struct pv_mmu_ops {
 
 struct arch_spinlock;
 struct pv_lock_ops {
-   int (*spin_is_locked)(struct arch_spinlock *lock);
-   int (*spin_is_contended)(struct arch_spinlock *lock);
-   void (*spin_lock)(struct arch_spinlock *lock);
-   void (*spin_lock_flags)(struct arch_spinlock *lock, unsigned long 
flags);
-   int (*spin_trylock)(struct arch_spinlock *lock);
-   void (*spin_unlock)(struct arch_spinlock *lock);
+   void (*lock_spinning)(struct arch_spinlock *lock, unsigned ticket);
+   void (*unlock_kick)(struct arch_spinlock *lock, unsigned ticket);
 };
 
 /* This contains all the paravirt structures: we get a convenient
diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 507f83c..06815f3 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -52,6 +52,21 @@ static __always_inline void __ticket_unlock_release(struct 
arch_spinlock *lock)
 }
 #endif
 
+/* How long a lock should spin before we consider blocking */
+#define SPIN_THRESHOLD (1 << 11)
+
+#ifndef CONFIG_PARAVIRT_SPINLOCKS
+
+static __always_inline void __ticket_lock_spinning(struct arch_spinlock *lock, 
unsigned ticket)
+{
+}
+
+static __always_inline void ticket_unlock_kick(struct arch_spi

[PATCH 15/20] x86/ticketlock: prevent compiler reordering into locked region

2010-11-03 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge 

Add a barrier() at the end of __raw_spin_unlock() to prevent instructions
from after the locked region from being reordered into it.  In theory doing
so shouldn't cause any problems, but in practice, the system locks up
under load...

Signed-off-by: Jeremy Fitzhardinge 
---
 arch/x86/include/asm/spinlock.h |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index d6de5e7..158b330 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -189,6 +189,8 @@ static __always_inline void 
arch_spin_unlock(arch_spinlock_t *lock)
next = lock->tickets.head + 1;
__ticket_unlock_release(lock);
__ticket_unlock_kick(lock, next);
+
+   barrier();  /* prevent reordering into locked region */
 }
 
 static inline int arch_spin_is_locked(arch_spinlock_t *lock)
-- 
1.7.2.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH 16/20] x86/ticketlock: don't inline _spin_unlock when using paravirt spinlocks

2010-11-03 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge 

The code size expands somewhat, and it's probably better to just call
a function rather than inline it.

Signed-off-by: Jeremy Fitzhardinge 
---
 arch/x86/Kconfig |3 +++
 kernel/Kconfig.locks |2 +-
 2 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index cea0cd9..e67bf49 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -577,6 +577,9 @@ config PARAVIRT_SPINLOCKS
 
  If you are unsure how to answer this question, answer N.
 
+config ARCH_NOINLINE_SPIN_UNLOCK
+   def_bool PARAVIRT_SPINLOCKS
+
 config PARAVIRT_CLOCK
bool
 
diff --git a/kernel/Kconfig.locks b/kernel/Kconfig.locks
index 88c92fb..3216c22 100644
--- a/kernel/Kconfig.locks
+++ b/kernel/Kconfig.locks
@@ -125,7 +125,7 @@ config INLINE_SPIN_LOCK_IRQSAVE
 ARCH_INLINE_SPIN_LOCK_IRQSAVE
 
 config INLINE_SPIN_UNLOCK
-   def_bool !DEBUG_SPINLOCK && (!PREEMPT || ARCH_INLINE_SPIN_UNLOCK)
+   def_bool !DEBUG_SPINLOCK && (!PREEMPT || ARCH_INLINE_SPIN_UNLOCK) && 
!ARCH_NOINLINE_SPIN_UNLOCK
 
 config INLINE_SPIN_UNLOCK_BH
def_bool !DEBUG_SPINLOCK && ARCH_INLINE_SPIN_UNLOCK_BH
-- 
1.7.2.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH 17/20] x86/ticketlock: clarify barrier in arch_spin_lock

2010-11-03 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge 

Make sure the barrier in arch_spin_lock is definitely in the code path.

Signed-off-by: Jeremy Fitzhardinge 
---
 arch/x86/include/asm/spinlock.h |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 158b330..c7455e1 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -151,13 +151,13 @@ static __always_inline void arch_spin_lock(struct 
arch_spinlock *lock)
 
do {
if (inc.head == inc.tail)
-   return;
+   goto out;
cpu_relax();
inc.head = ACCESS_ONCE(lock->tickets.head);
} while (--count);
__ticket_lock_spinning(lock, inc.tail);
}
-   barrier();  /* make sure nothing creeps before the lock is 
taken */
+out:   barrier();  /* make sure nothing creeps before the lock is 
taken */
 }
 
 static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
-- 
1.7.2.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH 12/20] x86/pvticketlock: use callee-save for unlock_kick as well

2010-11-03 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge 

The unlock code is typically inlined throughout the kernel, so it's useful
to make sure there's minimal register pressure overhead from the presence
of the unlock_kick pvop call.

Signed-off-by: Jeremy Fitzhardinge 
---
 arch/x86/include/asm/paravirt.h   |2 +-
 arch/x86/include/asm/paravirt_types.h |2 +-
 arch/x86/kernel/paravirt-spinlocks.c  |2 +-
 arch/x86/xen/spinlock.c   |3 ++-
 4 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index dbd5914..b6ff89a 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -729,7 +729,7 @@ static inline void __ticket_lock_spinning(struct 
arch_spinlock *lock, unsigned t
 
 static inline void ticket_unlock_kick(struct arch_spinlock *lock, unsigned 
ticket)
 {
-   PVOP_VCALL2(pv_lock_ops.unlock_kick, lock, ticket);
+   PVOP_VCALLEE2(pv_lock_ops.unlock_kick, lock, ticket);
 }
 
 #endif
diff --git a/arch/x86/include/asm/paravirt_types.h 
b/arch/x86/include/asm/paravirt_types.h
index 9ad6d088..bb02831 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -317,7 +317,7 @@ struct pv_mmu_ops {
 struct arch_spinlock;
 struct pv_lock_ops {
struct paravirt_callee_save lock_spinning;
-   void (*unlock_kick)(struct arch_spinlock *lock, unsigned ticket);
+   struct paravirt_callee_save unlock_kick;
 };
 
 /* This contains all the paravirt structures: we get a convenient
diff --git a/arch/x86/kernel/paravirt-spinlocks.c 
b/arch/x86/kernel/paravirt-spinlocks.c
index 4251c1d..efad219 100644
--- a/arch/x86/kernel/paravirt-spinlocks.c
+++ b/arch/x86/kernel/paravirt-spinlocks.c
@@ -10,7 +10,7 @@
 struct pv_lock_ops pv_lock_ops = {
 #ifdef CONFIG_SMP
.lock_spinning = __PV_IS_CALLEE_SAVE(paravirt_nop),
-   .unlock_kick = paravirt_nop,
+   .unlock_kick = __PV_IS_CALLEE_SAVE(paravirt_nop),
 #endif
 };
 EXPORT_SYMBOL(pv_lock_ops);
diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index 2bd51d8..bf2f409 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -159,6 +159,7 @@ static void xen_unlock_kick(struct arch_spinlock *lock, 
unsigned next)
}
}
 }
+PV_CALLEE_SAVE_REGS_THUNK(xen_unlock_kick);
 
 static irqreturn_t dummy_handler(int irq, void *dev_id)
 {
@@ -195,7 +196,7 @@ void xen_uninit_lock_cpu(int cpu)
 void __init xen_init_spinlocks(void)
 {
pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(xen_lock_spinning);
-   pv_lock_ops.unlock_kick = xen_unlock_kick;
+   pv_lock_ops.unlock_kick = PV_CALLEE_SAVE(xen_unlock_kick);
 }
 
 #ifdef CONFIG_XEN_DEBUG_FS
-- 
1.7.2.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH 00/20] x86: ticket lock rewrite and paravirtualization

2010-11-03 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge 

Hi all,

This series does two major things:

1. It converts the bulk of the implementation to C, and makes the
   "small ticket" and "large ticket" code common.  Only the actual
   size-dependent asm instructions are specific to the ticket size.
   The resulting generated asm is very similar to the current
   hand-written code.

   This results in a very large reduction in lines of code.

2. Get rid of pv spinlocks, and replace them with pv ticket locks.
   Currently we have the notion of "pv spinlocks" where a pv-ops
   backend can completely replace the spinlock implementation with a
   new one.  This has two disadvantages:
- it's a completely separate spinlock implementation, and
- there's an extra layer of indirection in front of every
  spinlock operation.

   To replace this, this series introduces the notion of pv ticket
   locks.  In this design, the ticket lock fastpath is the standard
   ticketlock algorithm.  However, after an iteration threshold it
   falls into a slow path which invokes a pv-op to block the spinning
   CPU.  Conversely, on unlock it does the normal unlock, and then
   checks to see if it needs to do a special "kick" to wake the next
   CPU.

   The net result is that the pv-op calls are restricted to the slow
   paths, and the normal fast-paths are largely unaffected.

   There are still some overheads, however:
   - When locking, there are some extra tests to count the spin iterations.
 There are no extra instructions in the uncontended case though.

   - When unlocking, there are two ways to detect when it is necessary
 to kick a blocked CPU:

  - with an unmodified struct spinlock, it can check to see if
head == tail after unlock; if not, then there's someone else
trying to lock, and we can do a kick.  Unfortunately this
generates a very high level of redundant kicks, because the
waiting CPU might not have blocked yet (which is the common
case)

  - With a struct spinlock modified to include a "waiters" field,
to keep count of blocked CPUs, which is a much tighter test.
But it does increase the size of each spinlock by 50%
(doubled with padding).

The series is very fine-grained, and I've left a lightly cleaned up
history of the various options I evaluated, since they're not all
obvious.

Jeremy Fitzhardinge (20):
  x86/ticketlock: clean up types and accessors
  x86/ticketlock: convert spin loop to C
  x86/ticketlock: Use C for __ticket_spin_unlock
  x86/ticketlock: make large and small ticket versions of spin_lock the
same
  x86/ticketlock: make __ticket_spin_lock common
  x86/ticketlock: make __ticket_spin_trylock common
  x86/spinlocks: replace pv spinlocks with pv ticketlocks
  x86/ticketlock: collapse a layer of functions
  xen/pvticketlock: Xen implementation for PV ticket locks
  x86/pvticketlock: keep count of blocked cpus
  x86/pvticketlock: use callee-save for lock_spinning
  x86/pvticketlock: use callee-save for unlock_kick as well
  x86/pvticketlock: make sure unlock is seen by everyone before
checking waiters
  x86/ticketlock: loosen ordering restraints on unlock
  x86/ticketlock: prevent compiler reordering into locked region
  x86/ticketlock: don't inline _spin_unlock when using paravirt
spinlocks
  x86/ticketlock: clarify barrier in arch_spin_lock
  x86/ticketlock: remove .slock
  x86/ticketlocks: use overlapping read to eliminate mb()
  x86/ticketlock: rename ticketpair to head_tail

 arch/x86/Kconfig  |3 +
 arch/x86/include/asm/paravirt.h   |   30 +---
 arch/x86/include/asm/paravirt_types.h |8 +-
 arch/x86/include/asm/spinlock.h   |  250 +++--
 arch/x86/include/asm/spinlock_types.h |   41 +-
 arch/x86/kernel/paravirt-spinlocks.c  |   15 +--
 arch/x86/xen/spinlock.c   |  282 +
 kernel/Kconfig.locks  |2 +-
 8 files changed, 221 insertions(+), 410 deletions(-)

-- 
1.7.2.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH 11/20] x86/pvticketlock: use callee-save for lock_spinning

2010-11-03 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge 

Although the lock_spinning calls in the spinlock code are on the
uncommon path, their presence can cause the compiler to generate many
more register save/restores in the function pre/postamble, which is in
the fast path.  To avoid this, convert it to using the pvops callee-save
calling convention, which defers all the save/restores until the actual
function is called, keeping the fastpath clean.

Signed-off-by: Jeremy Fitzhardinge 
---
 arch/x86/include/asm/paravirt.h   |2 +-
 arch/x86/include/asm/paravirt_types.h |2 +-
 arch/x86/kernel/paravirt-spinlocks.c  |2 +-
 arch/x86/xen/spinlock.c   |3 ++-
 4 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 79c77e5..dbd5914 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -724,7 +724,7 @@ static inline void __set_fixmap(unsigned /* enum 
fixed_addresses */ idx,
 
 static inline void __ticket_lock_spinning(struct arch_spinlock *lock, unsigned 
ticket)
 {
-   PVOP_VCALL2(pv_lock_ops.lock_spinning, lock, ticket);
+   PVOP_VCALLEE2(pv_lock_ops.lock_spinning, lock, ticket);
 }
 
 static inline void ticket_unlock_kick(struct arch_spinlock *lock, unsigned 
ticket)
diff --git a/arch/x86/include/asm/paravirt_types.h 
b/arch/x86/include/asm/paravirt_types.h
index 968ee31..9ad6d088 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -316,7 +316,7 @@ struct pv_mmu_ops {
 
 struct arch_spinlock;
 struct pv_lock_ops {
-   void (*lock_spinning)(struct arch_spinlock *lock, unsigned ticket);
+   struct paravirt_callee_save lock_spinning;
void (*unlock_kick)(struct arch_spinlock *lock, unsigned ticket);
 };
 
diff --git a/arch/x86/kernel/paravirt-spinlocks.c 
b/arch/x86/kernel/paravirt-spinlocks.c
index c2e010e..4251c1d 100644
--- a/arch/x86/kernel/paravirt-spinlocks.c
+++ b/arch/x86/kernel/paravirt-spinlocks.c
@@ -9,7 +9,7 @@
 
 struct pv_lock_ops pv_lock_ops = {
 #ifdef CONFIG_SMP
-   .lock_spinning = paravirt_nop,
+   .lock_spinning = __PV_IS_CALLEE_SAVE(paravirt_nop),
.unlock_kick = paravirt_nop,
 #endif
 };
diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index b8f09d5..2bd51d8 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -141,6 +141,7 @@ out:
w->lock = NULL;
spin_time_accum_blocked(start);
 }
+PV_CALLEE_SAVE_REGS_THUNK(xen_lock_spinning);
 
 static void xen_unlock_kick(struct arch_spinlock *lock, unsigned next)
 {
@@ -193,7 +194,7 @@ void xen_uninit_lock_cpu(int cpu)
 
 void __init xen_init_spinlocks(void)
 {
-   pv_lock_ops.lock_spinning = xen_lock_spinning;
+   pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(xen_lock_spinning);
pv_lock_ops.unlock_kick = xen_unlock_kick;
 }
 
-- 
1.7.2.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH 01/20] x86/ticketlock: clean up types and accessors

2010-11-03 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge 

A few cleanups to the way spinlocks are defined and accessed:
 - define __ticket_t which is the size of a spinlock ticket (ie, enough
   bits to hold all the cpus)
 - Define struct arch_spinlock as a union containing plain slock and
   the head and tail tickets
 - Use head and tail to implement some of the spinlock predicates.
 - Make all ticket variables unsigned.
 - Use TICKET_SHIFT to form constants

Most of this will be used in later patches.

Signed-off-by: Jeremy Fitzhardinge 
---
 arch/x86/include/asm/spinlock.h   |   24 ++--
 arch/x86/include/asm/spinlock_types.h |   20 ++--
 2 files changed, 28 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 3089f70..d6d5784 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -56,11 +56,9 @@
  * much between them in performance though, especially as locks are out of 
line.
  */
 #if (NR_CPUS < 256)
-#define TICKET_SHIFT 8
-
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
-   short inc = 0x0100;
+   unsigned short inc = 1 << TICKET_SHIFT;
 
asm volatile (
LOCK_PREFIX "xaddw %w0, %1\n"
@@ -79,7 +77,7 @@ static __always_inline void 
__ticket_spin_lock(arch_spinlock_t *lock)
 
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 {
-   int tmp, new;
+   unsigned int tmp, new;
 
asm volatile("movzwl %2, %0\n\t"
 "cmpb %h0,%b0\n\t"
@@ -104,12 +102,10 @@ static __always_inline void 
__ticket_spin_unlock(arch_spinlock_t *lock)
 : "memory", "cc");
 }
 #else
-#define TICKET_SHIFT 16
-
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
-   int inc = 0x0001;
-   int tmp;
+   unsigned inc = 1 << TICKET_SHIFT;
+   unsigned tmp;
 
asm volatile(LOCK_PREFIX "xaddl %0, %1\n"
 "movzwl %w0, %2\n\t"
@@ -129,8 +125,8 @@ static __always_inline void 
__ticket_spin_lock(arch_spinlock_t *lock)
 
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 {
-   int tmp;
-   int new;
+   unsigned tmp;
+   unsigned new;
 
asm volatile("movl %2,%0\n\t"
 "movl %0,%1\n\t"
@@ -160,16 +156,16 @@ static __always_inline void 
__ticket_spin_unlock(arch_spinlock_t *lock)
 
 static inline int __ticket_spin_is_locked(arch_spinlock_t *lock)
 {
-   int tmp = ACCESS_ONCE(lock->slock);
+   struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
 
-   return !!(((tmp >> TICKET_SHIFT) ^ tmp) & ((1 << TICKET_SHIFT) - 1));
+   return !!(tmp.tail ^ tmp.head);
 }
 
 static inline int __ticket_spin_is_contended(arch_spinlock_t *lock)
 {
-   int tmp = ACCESS_ONCE(lock->slock);
+   struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
 
-   return (((tmp >> TICKET_SHIFT) - tmp) & ((1 << TICKET_SHIFT) - 1)) > 1;
+   return ((tmp.tail - tmp.head) & TICKET_MASK) > 1;
 }
 
 #ifndef CONFIG_PARAVIRT_SPINLOCKS
diff --git a/arch/x86/include/asm/spinlock_types.h 
b/arch/x86/include/asm/spinlock_types.h
index dcb48b2..4582640 100644
--- a/arch/x86/include/asm/spinlock_types.h
+++ b/arch/x86/include/asm/spinlock_types.h
@@ -5,11 +5,27 @@
 # error "please don't include this file directly"
 #endif
 
+#include 
+
+#if (CONFIG_NR_CPUS < 256)
+typedef u8  __ticket_t;
+#else
+typedef u16 __ticket_t;
+#endif
+
+#define TICKET_SHIFT   (sizeof(__ticket_t) * 8)
+#define TICKET_MASK((1 << TICKET_SHIFT) - 1)
+
 typedef struct arch_spinlock {
-   unsigned int slock;
+   union {
+   unsigned int slock;
+   struct __raw_tickets {
+   __ticket_t head, tail;
+   } tickets;
+   };
 } arch_spinlock_t;
 
-#define __ARCH_SPIN_LOCK_UNLOCKED  { 0 }
+#define __ARCH_SPIN_LOCK_UNLOCKED  { { .slock = 0 } }
 
 typedef struct {
unsigned int lock;
-- 
1.7.2.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH 06/20] x86/ticketlock: make __ticket_spin_trylock common

2010-11-03 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge 

Make trylock code common regardless of ticket size.
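
The idea, modelled here in self-contained user-space C with GCC
builtins (a sketch of the 8-bit case, not the kernel code), is that a
single cmpxchg on the combined head/tail word can both check that the
lock is free and claim the next ticket, whatever the ticket width:

/* user-space model of the common trylock; not the kernel code */
#include <stdbool.h>
#include <stdint.h>

typedef uint8_t  ticket_t;	/* u16 (with a u32 pair) for NR_CPUS >= 256 */
typedef uint16_t ticketpair_t;

union ticketlock {
	ticketpair_t ticketpair;
	struct { ticket_t head, tail; } tickets;
};

static bool ticket_trylock(union ticketlock *lock)
{
	union ticketlock old, new;

	old.ticketpair = __atomic_load_n(&lock->ticketpair, __ATOMIC_RELAXED);
	if (old.tickets.head != old.tickets.tail)
		return false;			/* somebody holds it */

	new = old;
	new.tickets.tail++;			/* claim the next ticket */

	/* succeeds only if head/tail did not change under us */
	return __atomic_compare_exchange_n(&lock->ticketpair, &old.ticketpair,
					   new.ticketpair, false,
					   __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
}

int main(void)
{
	union ticketlock l = { .ticketpair = 0 };

	return ticket_trylock(&l) ? 0 : 1;
}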

Signed-off-by: Jeremy Fitzhardinge 
---
 arch/x86/include/asm/spinlock.h   |   55 +---
 arch/x86/include/asm/spinlock_types.h |3 ++
 2 files changed, 19 insertions(+), 39 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 4f9fa24..507f83c 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -100,48 +100,25 @@ static __always_inline void __ticket_spin_lock(struct 
arch_spinlock *lock)
barrier();  /* make sure nothing creeps before the lock is 
taken */
 }
 
-#if (NR_CPUS < 256)
 static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 {
-   unsigned int tmp, new;
-
-   asm volatile("movzwl %2, %0\n\t"
-"cmpb %h0,%b0\n\t"
-"leal 0x100(%" REG_PTR_MODE "0), %1\n\t"
-"jne 1f\n\t"
-LOCK_PREFIX "cmpxchgw %w1,%2\n\t"
-"1:"
-"sete %b1\n\t"
-"movzbl %b1,%0\n\t"
-: "=&a" (tmp), "=&q" (new), "+m" (lock->slock)
-:
-: "memory", "cc");
-
-   return tmp;
-}
-#else
-static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
-{
-   unsigned tmp;
-   unsigned new;
-
-   asm volatile("movl %2,%0\n\t"
-"movl %0,%1\n\t"
-"roll $16, %0\n\t"
-"cmpl %0,%1\n\t"
-"leal 0x0001(%" REG_PTR_MODE "0), %1\n\t"
-"jne 1f\n\t"
-LOCK_PREFIX "cmpxchgl %1,%2\n\t"
-"1:"
-"sete %b1\n\t"
-"movzbl %b1,%0\n\t"
-: "=&a" (tmp), "=&q" (new), "+m" (lock->slock)
-:
-: "memory", "cc");
-
-   return tmp;
+   union {
+   struct __raw_tickets tickets;
+   __ticketpair_t slock;
+   } tmp, new;
+   int ret;
+
+   tmp.tickets = ACCESS_ONCE(lock->tickets);
+   if (tmp.tickets.head != tmp.tickets.tail)
+   return 0;
+
+   new.slock = tmp.slock + (1 << TICKET_SHIFT);
+
+   ret = cmpxchg(&lock->ticketpair, tmp.slock, new.slock) == tmp.slock;
+   barrier();  /* just make nothing creeps before lock is 
claimed */
+
+   return ret;
 }
-#endif
 
 static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 {
diff --git a/arch/x86/include/asm/spinlock_types.h 
b/arch/x86/include/asm/spinlock_types.h
index 4582640..48dafc3 100644
--- a/arch/x86/include/asm/spinlock_types.h
+++ b/arch/x86/include/asm/spinlock_types.h
@@ -9,8 +9,10 @@
 
 #if (CONFIG_NR_CPUS < 256)
 typedef u8  __ticket_t;
+typedef u16 __ticketpair_t;
 #else
 typedef u16 __ticket_t;
+typedef u32 __ticketpair_t;
 #endif
 
 #define TICKET_SHIFT   (sizeof(__ticket_t) * 8)
@@ -19,6 +21,7 @@ typedef u16 __ticket_t;
 typedef struct arch_spinlock {
union {
unsigned int slock;
+   __ticketpair_t ticketpair;
struct __raw_tickets {
__ticket_t head, tail;
} tickets;
-- 
1.7.2.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH 08/20] x86/ticketlock: collapse a layer of functions

2010-11-03 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge 

Now that the paravirtualization layer doesn't exist at the spinlock
level any more, we can collapse the __ticket_ functions into the arch_
functions.

Signed-off-by: Jeremy Fitzhardinge 
---
 arch/x86/include/asm/spinlock.h |   35 +--
 1 files changed, 5 insertions(+), 30 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 06815f3..a79dfee 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -110,7 +110,7 @@ static __always_inline void __ticket_unlock_kick(struct 
arch_spinlock *lock, __t
ticket_unlock_kick(lock, next);
 }
 
-static __always_inline void __ticket_spin_lock(struct arch_spinlock *lock)
+static __always_inline void arch_spin_lock(struct arch_spinlock *lock)
 {
register struct __raw_tickets inc;
 
@@ -130,7 +130,7 @@ static __always_inline void __ticket_spin_lock(struct 
arch_spinlock *lock)
barrier();  /* make sure nothing creeps before the lock is 
taken */
 }
 
-static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
+static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
 {
union {
struct __raw_tickets tickets;
@@ -150,53 +150,28 @@ static __always_inline int 
__ticket_spin_trylock(arch_spinlock_t *lock)
return ret;
 }
 
-static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
+static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
 {
__ticket_t next = lock->tickets.head + 1;
__ticket_unlock_release(lock);
__ticket_unlock_kick(lock, next);
 }
 
-static inline int __ticket_spin_is_locked(arch_spinlock_t *lock)
+static inline int arch_spin_is_locked(arch_spinlock_t *lock)
 {
struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
 
return !!(tmp.tail ^ tmp.head);
 }
 
-static inline int __ticket_spin_is_contended(arch_spinlock_t *lock)
+static inline int arch_spin_is_contended(arch_spinlock_t *lock)
 {
struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
 
return ((tmp.tail - tmp.head) & TICKET_MASK) > 1;
 }
-
-static inline int arch_spin_is_locked(arch_spinlock_t *lock)
-{
-   return __ticket_spin_is_locked(lock);
-}
-
-static inline int arch_spin_is_contended(arch_spinlock_t *lock)
-{
-   return __ticket_spin_is_contended(lock);
-}
 #define arch_spin_is_contended arch_spin_is_contended
 
-static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
-{
-   __ticket_spin_lock(lock);
-}
-
-static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
-{
-   return __ticket_spin_trylock(lock);
-}
-
-static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
-{
-   __ticket_spin_unlock(lock);
-}
-
 static __always_inline void arch_spin_lock_flags(arch_spinlock_t *lock,
  unsigned long flags)
 {
-- 
1.7.2.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH 03/20] x86/ticketlock: Use C for __ticket_spin_unlock

2010-11-03 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge 

If we don't need to use a locked inc for unlock, then implement it in C.

Signed-off-by: Jeremy Fitzhardinge 
---
 arch/x86/include/asm/spinlock.h |   33 ++---
 1 files changed, 18 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 6711d36..082990a 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -33,9 +33,23 @@
  * On PPro SMP or if we are using OOSTORE, we use a locked operation to unlock
  * (PPro errata 66, 92)
  */
-# define UNLOCK_LOCK_PREFIX LOCK_PREFIX
+static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
+{
+   if (sizeof(lock->tickets.head) == sizeof(u8))
+   asm (LOCK_PREFIX "incb %0"
+: "+m" (lock->tickets.head) : : "memory");
+   else
+   asm (LOCK_PREFIX "incw %0"
+: "+m" (lock->tickets.head) : : "memory");
+
+}
 #else
-# define UNLOCK_LOCK_PREFIX
+static __always_inline void __ticket_unlock_release(struct arch_spinlock *lock)
+{
+   barrier();
+   lock->tickets.head++;
+   barrier();
+}
 #endif
 
 /*
@@ -93,14 +107,6 @@ static __always_inline int 
__ticket_spin_trylock(arch_spinlock_t *lock)
 
return tmp;
 }
-
-static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
-{
-   asm volatile(UNLOCK_LOCK_PREFIX "incb %0"
-: "+m" (lock->slock)
-:
-: "memory", "cc");
-}
 #else
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
@@ -144,15 +150,12 @@ static __always_inline int 
__ticket_spin_trylock(arch_spinlock_t *lock)
 
return tmp;
 }
+#endif
 
 static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 {
-   asm volatile(UNLOCK_LOCK_PREFIX "incw %0"
-: "+m" (lock->slock)
-:
-: "memory", "cc");
+   __ticket_unlock_release(lock);
 }
-#endif
 
 static inline int __ticket_spin_is_locked(arch_spinlock_t *lock)
 {
-- 
1.7.2.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH 04/20] x86/ticketlock: make large and small ticket versions of spin_lock the same

2010-11-03 Thread Jeremy Fitzhardinge
From: Jeremy Fitzhardinge 

Make the bulk of __ticket_spin_lock look identical for large and small
number of cpus.

Signed-off-by: Jeremy Fitzhardinge 
---
 arch/x86/include/asm/spinlock.h |   23 ---
 1 files changed, 8 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 082990a..7586d7a 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -72,19 +72,16 @@ static __always_inline void __ticket_unlock_release(struct 
arch_spinlock *lock)
 #if (NR_CPUS < 256)
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
-   register union {
-   struct __raw_tickets tickets;
-   unsigned short slock;
-   } inc = { .slock = 1 << TICKET_SHIFT };
+   register struct __raw_tickets inc = { .tail = 1 };
 
asm volatile (LOCK_PREFIX "xaddw %w0, %1\n"
- : "+Q" (inc), "+m" (lock->slock) : : "memory", "cc");
+ : "+r" (inc), "+m" (lock->tickets) : : "memory", "cc");
 
for (;;) {
-   if (inc.tickets.head == inc.tickets.tail)
+   if (inc.head == inc.tail)
return;
cpu_relax();
-   inc.tickets.head = ACCESS_ONCE(lock->tickets.head);
+   inc.head = ACCESS_ONCE(lock->tickets.head);
}
barrier();  /* make sure nothing creeps before the lock is 
taken */
 }
@@ -110,21 +107,17 @@ static __always_inline int 
__ticket_spin_trylock(arch_spinlock_t *lock)
 #else
 static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
 {
-   unsigned inc = 1 << TICKET_SHIFT;
-   __ticket_t tmp;
+   register struct __raw_tickets inc = { .tickets.tail = 1 };
 
asm volatile(LOCK_PREFIX "xaddl %0, %1\n\t"
-: "+r" (inc), "+m" (lock->slock)
+: "+r" (inc), "+m" (lock->tickets)
 : : "memory", "cc");
 
-   tmp = inc;
-   inc >>= TICKET_SHIFT;
-
for (;;) {
-   if ((__ticket_t)inc == tmp)
+   if (inc.head == inc.tail)
return;
cpu_relax();
-   tmp = ACCESS_ONCE(lock->tickets.head);
+   inc.head = ACCESS_ONCE(lock->tickets.head);
}
barrier();  /* make sure nothing creeps before the lock is 
taken */
 }
-- 
1.7.2.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH] xen: xenfs: privcmd: check put_user() return code

2010-10-29 Thread Jeremy Fitzhardinge
 On 10/29/2010 10:44 AM, Ian Campbell wrote:
> On Fri, 2010-10-29 at 18:18 +0100, Jeremy Fitzhardinge wrote:
>> On 10/28/2010 04:39 AM, Vasiliy Kulikov wrote:
>>> put_user() may fail.  In this case propagate error code from
>>> privcmd_ioctl_mmap_batch().
>> Thanks for looking at this.  I'm in two minds about this; the existing
>> logic is such that these put_users can only fail if something else has
>> already failed and its returning an error.  I guess it would be useful
>> to get an EFAULT if you've got a problem writing back the results.
>>
>> IanC, any opinion?
> Not a strong one.
>
> Perhaps what we really want in this case is for traverse_pages to return
> the total number of callback failures it encountered rather than
> aborting after the first failure?
>
> On the other hand you are correct that gather_array() has already
> touched all the pages which we are going to be touching here so how
> likely is a new failure at this point anyway?

I could think of two cases: the array is mapped RO, so only the
writeback fails, or someone changes the mapping under our feet from
another thread.

J

> Ian.
>
>> Thanks,
>> J
>>
>>> Signed-off-by: Vasiliy Kulikov 
>>> ---
>>>  Compile tested.
>>>
>>>  drivers/xen/xenfs/privcmd.c |8 ++--
>>>  1 files changed, 2 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/xen/xenfs/privcmd.c b/drivers/xen/xenfs/privcmd.c
>>> index f80be7f..2eb04c8 100644
>>> --- a/drivers/xen/xenfs/privcmd.c
>>> +++ b/drivers/xen/xenfs/privcmd.c
>>> @@ -266,9 +266,7 @@ static int mmap_return_errors(void *data, void *state)
>>> xen_pfn_t *mfnp = data;
>>> struct mmap_batch_state *st = state;
>>>  
>>> -   put_user(*mfnp, st->user++);
>>> -
>>> -   return 0;
>>> +   return put_user(*mfnp, st->user++);
>>>  }
>>>  
>>>  static struct vm_operations_struct privcmd_vm_ops;
>>> @@ -323,10 +321,8 @@ static long privcmd_ioctl_mmap_batch(void __user 
>>> *udata)
>>> up_write(&mm->mmap_sem);
>>>  
>>> if (state.err > 0) {
>>> -   ret = 0;
>>> -
>>> state.user = m.arr;
>>> -   traverse_pages(m.num, sizeof(xen_pfn_t),
>>> +   ret = traverse_pages(m.num, sizeof(xen_pfn_t),
>>>&pagelist,
>>>mmap_return_errors, &state);
>>> }
>

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [PATCH] xen: xenfs: privcmd: check put_user() return code

2010-10-29 Thread Jeremy Fitzhardinge
 On 10/28/2010 04:39 AM, Vasiliy Kulikov wrote:
> put_user() may fail.  In this case propagate error code from
> privcmd_ioctl_mmap_batch().

Thanks for looking at this.  I'm in two minds about this; the existing
logic is such that these put_users can only fail if something else has
already failed and it's returning an error.  I guess it would be useful
to get an EFAULT if you've got a problem writing back the results.

IanC, any opinion?

Thanks,
J

> Signed-off-by: Vasiliy Kulikov 
> ---
>  Compile tested.
>
>  drivers/xen/xenfs/privcmd.c |8 ++--
>  1 files changed, 2 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/xen/xenfs/privcmd.c b/drivers/xen/xenfs/privcmd.c
> index f80be7f..2eb04c8 100644
> --- a/drivers/xen/xenfs/privcmd.c
> +++ b/drivers/xen/xenfs/privcmd.c
> @@ -266,9 +266,7 @@ static int mmap_return_errors(void *data, void *state)
>   xen_pfn_t *mfnp = data;
>   struct mmap_batch_state *st = state;
>  
> - put_user(*mfnp, st->user++);
> -
> - return 0;
> + return put_user(*mfnp, st->user++);
>  }
>  
>  static struct vm_operations_struct privcmd_vm_ops;
> @@ -323,10 +321,8 @@ static long privcmd_ioctl_mmap_batch(void __user *udata)
>   up_write(&mm->mmap_sem);
>  
>   if (state.err > 0) {
> - ret = 0;
> -
>   state.user = m.arr;
> - traverse_pages(m.num, sizeof(xen_pfn_t),
> + ret = traverse_pages(m.num, sizeof(xen_pfn_t),
>  &pagelist,
>  mmap_return_errors, &state);
>   }

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [Xen-devel] Re: [PATCH] x86/pvclock-xen: zero last_value on resume

2010-10-27 Thread Jeremy Fitzhardinge
 On 10/26/2010 10:48 AM, Glauber Costa wrote:
> On Tue, 2010-10-26 at 09:59 -0700, Jeremy Fitzhardinge wrote:
>> If the guest domain has been suspend/resumed or migrated, then the
>> system clock backing the pvclock clocksource may revert to a smaller
>> value (ie, can be non-monotonic across the migration/save-restore).
>> Make sure we zero last_value in that case so that the domain
>> continues to see clock updates.
>>
>> [ I don't know if kvm needs an analogous fix or not. ]
> After migration, save/restore, etc, we issue an ioctl where we tell
> the host the last clock value. That (in theory) guarantees monotonicity.
>
> I am not opposed to this patch in any way, however.

Thanks.

HPA, do you want to take this, or shall I send it on?

Thanks,
J
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH] x86/pvclock-xen: zero last_value on resume

2010-10-26 Thread Jeremy Fitzhardinge


If the guest domain has been suspend/resumed or migrated, then the
system clock backing the pvclock clocksource may revert to a smaller
value (ie, can be non-monotonic across the migration/save-restore).
Make sure we zero last_value in that case so that the domain
continues to see clock updates.

[ I don't know if kvm needs an analogous fix or not. ]
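
To see why this matters, here is a user-space model of the clamp that
pvclock_clocksource_read() applies (a simplified sketch, not the kernel
code): the returned time never drops below last_value, so a backwards
jump of the backing clock across save/restore leaves time frozen until
it catches up, unless last_value is reset on resume.

/* user-space model of the last_value clamp; not the kernel code */
#include <stdint.h>
#include <stdatomic.h>

static _Atomic uint64_t last_value;

static uint64_t clamped_read(uint64_t raw)	/* raw = backing system time */
{
	uint64_t last = atomic_load(&last_value);

	do {
		if (raw < last)
			return last;		/* never go backwards */
	} while (!atomic_compare_exchange_weak(&last_value, &last, raw));

	return raw;
}

static void model_pvclock_resume(void)
{
	/* forget the pre-migration value so the clock can move again */
	atomic_store(&last_value, 0);
}

int main(void)
{
	uint64_t t = clamped_read(100);	/* later reads never return < 100... */

	model_pvclock_resume();		/* ...until the resume hook clears it */
	return t == 100 ? 0 : 1;
}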

Signed-off-by: Jeremy Fitzhardinge 
Cc: Stable Kernel 
Reported-by: Eelco Dolstra 
Reported-by: Olivier Hanesse 
Bisected-by: Cédric Schieli 
Tested-by: Cédric Schieli 

diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h
index cd02f32..6226870 100644
--- a/arch/x86/include/asm/pvclock.h
+++ b/arch/x86/include/asm/pvclock.h
@@ -11,5 +11,6 @@ unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info 
*src);
 void pvclock_read_wallclock(struct pvclock_wall_clock *wall,
struct pvclock_vcpu_time_info *vcpu,
struct timespec *ts);
+void pvclock_resume(void);
 
 #endif /* _ASM_X86_PVCLOCK_H */
diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index 239427c..a4f07c1 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -120,6 +120,11 @@ unsigned long pvclock_tsc_khz(struct 
pvclock_vcpu_time_info *src)
 
 static atomic64_t last_value = ATOMIC64_INIT(0);
 
+void pvclock_resume(void)
+{
+   atomic64_set(&last_value, 0);
+}
+
 cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
 {
struct pvclock_shadow_time shadow;
diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
index b2bb5aa..5da5e53 100644
--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ -426,6 +426,8 @@ void xen_timer_resume(void)
 {
int cpu;
 
+   pvclock_resume();
+
if (xen_clockevent != &xen_vcpuop_clockevent)
return;
 


From 29acbb4e1d93e719250648db1ce8c7a24144fd86 Mon Sep 17 00:00:00 2001
From: Jeremy Fitzhardinge 
Date: Mon, 25 Oct 2010 16:53:46 -0700
Subject: [PATCH] x86/pvclock: zero last_value on resume

If the guest domain has been suspend/resumed or migrated, then the
system clock backing the pvclock clocksource may revert to a smaller
value (ie, can be non-monotonic across the migration/save-restore).
Make sure we zero last_value in that case so that the domain
continues to see clock updates.

Signed-off-by: Jeremy Fitzhardinge 
Cc: Stable Kernel 
Reported-by: Eelco Dolstra 
Reported-by: Olivier Hanesse 
Bisected-by: Cédric Schieli 
Tested-by: Cédric Schieli 

diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h
index cd02f32..6226870 100644
--- a/arch/x86/include/asm/pvclock.h
+++ b/arch/x86/include/asm/pvclock.h
@@ -11,5 +11,6 @@ unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info 
*src);
 void pvclock_read_wallclock(struct pvclock_wall_clock *wall,
struct pvclock_vcpu_time_info *vcpu,
struct timespec *ts);
+void pvclock_resume(void);
 
 #endif /* _ASM_X86_PVCLOCK_H */
diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index 239427c..a4f07c1 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -120,6 +120,11 @@ unsigned long pvclock_tsc_khz(struct 
pvclock_vcpu_time_info *src)
 
 static atomic64_t last_value = ATOMIC64_INIT(0);
 
+void pvclock_resume(void)
+{
+   atomic64_set(&last_value, 0);
+}
+
 cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
 {
struct pvclock_shadow_time shadow;
diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
index b2bb5aa..5da5e53 100644
--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ -426,6 +426,8 @@ void xen_timer_resume(void)
 {
int cpu;
 
+   pvclock_resume();
+
if (xen_clockevent != &xen_vcpuop_clockevent)
return;
 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization

Re: xen PV on HVM and initial domain merge in linux-next

2010-10-20 Thread Jeremy Fitzhardinge
Yes all ok.

J

"Stefano Stabellini"  wrote:

>On Wed, 20 Oct 2010, Stephen Rothwell wrote:
>> Hi Stefano,
>> 
>> [just casting the net a bit wider ...]
>> 
>> On Tue, 19 Oct 2010 18:51:47 +0100 Stefano Stabellini
> wrote:
>> >
>> > I forgot to CC the LKML and linux-next...
>> > 
>> > On Tue, 19 Oct 2010, Stefano Stabellini wrote:
>> > > Stephen,
>> > > I have two patch series to merge in linux-next:
>> > > 
>> > > PV on HVM: receive interrupts as xen events
>> > > xen: initial domain support
>> > > 
>> > > they have all the acked-by needed and are both stable since
>several
>> > > weeks, however they depend on Konrad's xen-pcifront series and
>for this
>> > > reason I waited until now to ask for a merge in linux-next.
>> > > 
>> > > Could you please pull:
>> > > 
>> > > git://xenbits.xen.org/people/sstabellini/linux-pvhvm.git
>linux-next-initial-domain-v4
>> > > 
>> > > it contains both series rebased on Konrad's pcifront series
>merged on
>> > > linux-next (warning: it still contains the merge commit of
>> > > xen-pcifront-0.8.2 in linux-next).
>> > > Let me know if you have any conflicts or if you need me to change
>the
>> > > branch somehow.
>> 
>> Not following the Xen develpment at all, I would like to have a
>positive
>> reply from the listed Xen contacts, please,
>> 
>
>Sure.
>Jeremy?
>
>
>> I do have concerns that this is turning up so late, but I realise
>that
>> that is mainly due to a misunderstanding on the part of some of the
>Xen
>> community.
>> 
>
>Thank you very much for understanding!
>
>
>> Also, the above tree is based on next-20101019 which means that I
>cannot
>> use it as is.  All the trees merged into linux-next must be base on
>some
>> other stable tree (almost always Linus' tree).  linux-next is rebuilt
>> from scratch every day, so I cannot ever include a previous day's
>version.
>> 
>> Merging in other stable trees is OK (as long as the other maintainer
>is
>> aware of that and makes sure that their tree does not reabse).
>> 
>> Basically what you send to me should be what you intend to send to
>Linus
>> during the next merge window.
> 
>All right.
>I merged Jeremy's and Konrad's branches (the ones you just merged on
>linux-next) on top of linux 2.6.36 rc8, then I rebased my series on top
>of the result.
>Please checkout this branch:
>
>git://xenbits.xen.org/people/sstabellini/linux-pvhvm.git
>2.6.36-rc8-initial-domain-v5
>
>and let me know if it is suitable, it shouldn't have any merge
>conflicts.
>
>Cheers,
>
>Stefano
>
>___
>Virtualization mailing list
>Virtualization@lists.linux-foundation.org
>https://lists.linux-foundation.org/mailman/listinfo/virtualization

-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization

