Re: [PATCH 0/6] x86: reduce paravirtualized spinlock overhead

2015-05-05 Thread Jeremy Fitzhardinge
On 05/03/2015 10:55 PM, Juergen Gross wrote:
> I did a small measurement of the pure locking functions on bare metal
> without and with my patches.
>
> spin_lock() for the first time (lock and code not in cache) dropped from
> about 600 to 500 cycles.
>
> spin_unlock() for first time dropped from 145 to 87 cycles.
>
> spin_lock() in a loop dropped from 48 to 45 cycles.
>
> spin_unlock() in the same loop dropped from 24 to 22 cycles.

Did you isolate icache hot/cold from dcache hot/cold? It seems to me the
main difference will be whether the branch predictor is warmed up rather
than whether the lock itself is in dcache, but it's much more likely that
the lock code is icache-hot if the code is lock-intensive, making the cold
case moot. But that's pure speculation.
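
For reference, a minimal userspace sketch of the kind of cycle measurement
being discussed -- assuming x86 TSC via __rdtsc() and a stand-in test-and-set
lock, not Juergen's actual benchmark:

    #include <stdio.h>
    #include <x86intrin.h>

    static volatile int testlock;

    int main(void)
    {
            unsigned long long t0, t1;

            t0 = __rdtsc();
            while (__sync_lock_test_and_set(&testlock, 1))
                    ;                               /* "cold" first lock */
            t1 = __rdtsc();
            printf("first lock:   %llu cycles\n", t1 - t0);

            t0 = __rdtsc();
            __sync_lock_release(&testlock);         /* "cold" first unlock */
            t1 = __rdtsc();
            printf("first unlock: %llu cycles\n", t1 - t0);
            return 0;
    }

A real measurement would also need to serialize around rdtsc and separate
icache-cold from dcache-cold runs, which is the point of the question above.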

Could you see any differences in workloads beyond microbenchmarks?

Not that it's my call at all, but I think we'd need to see some concrete
improvements in real workloads before adding the complexity of more pvops.

J


Re: [PATCH 0/6] x86: reduce paravirtualized spinlock overhead

2015-04-30 Thread Jeremy Fitzhardinge
On 04/30/2015 03:53 AM, Juergen Gross wrote:
> Paravirtualized spinlocks produce some overhead even if the kernel is
> running on bare metal. The main reason is the more complex locking
> and unlocking functions. Especially unlocking is no longer just one
> instruction but so complex that it is no longer inlined.
>
> This patch series addresses this issue by adding two more pvops
> functions to reduce the size of the inlined spinlock functions. When
> running on bare metal unlocking is again basically one instruction.

Out of curiosity, is there a measurable difference?

J



Re: [PATCH] x86 spinlock: Fix memory corruption on completing completions

2015-02-11 Thread Jeremy Fitzhardinge

On 02/11/2015 09:24 AM, Oleg Nesterov wrote:
> I agree, and I have to admit I am not sure I fully understand why
> unlock uses the locked add. Except we need a barrier to avoid the race
> with the enter_slowpath() users, of course. Perhaps this is the only
> reason?

Right now it needs to be a locked operation to prevent read-reordering.
x86 memory ordering rules state that all writes are seen in a globally
consistent order, and are globally ordered wrt reads *on the same
addresses*, but reads to different addresses can be reordered wrt writes.
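
As a concrete illustration, this is the classic store-buffer litmus test:
with x and y both initially 0 and only plain stores and loads, x86 permits
the outcome r0 == 0 && r1 == 0, because each CPU's load may be satisfied
before its own store becomes globally visible.

    /* CPU 0 */             /* CPU 1 */
    x = 1;                  y = 1;
    r0 = y;                 r1 = x;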

So, if the unlocking add were not a locked operation:

__add(&lock->tickets.head, TICKET_LOCK_INC); /* not locked */

if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
__ticket_unlock_slowpath(lock, prev);

Then the read of lock->tickets.tail can be reordered before the unlock,
which introduces a race:

/* read reordered here */
if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG)) /* false */
/* ... */;

/* other CPU sets SLOWPATH and blocks */

__add(&lock->tickets.head, TICKET_LOCK_INC); /* not locked */

/* other CPU hung */

So it doesn't *have* to be a locked operation. This should also work:

__add(&lock->tickets.head, TICKET_LOCK_INC); /* not locked */

lfence();   /* prevent read reordering */
if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
__ticket_unlock_slowpath(lock, prev);

but in practice a locked add is cheaper than an lfence (or at least was).

This *might* be OK, but I think it's on dubious ground:

__add(&lock->tickets.head, TICKET_LOCK_INC); /* not locked */

/* read overlaps write, and so is ordered */
if (unlikely(lock->head_tail & (TICKET_SLOWPATH_FLAG << TICKET_SHIFT)))
__ticket_unlock_slowpath(lock, prev);

because I think Intel and AMD differed in interpretation about how
overlapping but different-sized reads & writes are ordered (or it simply
isn't architecturally defined).

If the slowpath flag is moved to head, then it would always have to be
locked anyway, because it needs to be atomic against other CPU's RMW
operations setting the flag.

J


Re: [PATCH] x86 spinlock: Fix memory corruption on completing completions

2015-02-10 Thread Jeremy Fitzhardinge

On 02/10/2015 05:26 AM, Oleg Nesterov wrote:
> On 02/10, Raghavendra K T wrote:
>> On 02/10/2015 06:23 AM, Linus Torvalds wrote:
>>
>>>  add_smp(&lock->tickets.head, TICKET_LOCK_INC);
>>>  if (READ_ONCE(lock->tickets.tail) & TICKET_SLOWPATH_FLAG) ..
>>>
>>> into something like
>>>
>>>  val = xadd(&lock->ticket.head_tail, TICKET_LOCK_INC << TICKET_SHIFT);
>>>  if (unlikely(val & TICKET_SLOWPATH_FLAG)) ...
>>>
>>> would be the right thing to do. Somebody should just check that I got
>>> that shift right, and that the tail is in the high bytes (head really
>>> needs to be high to work, if it's in the low byte(s) the xadd would
>>> overflow from head into tail which would be wrong).
>> Unfortunately xadd could result in head overflow as tail is high.
>>
>> The other option was repeated cmpxchg which is bad I believe.
>> Any suggestions?
> Stupid question... what if we simply move SLOWPATH from .tail to .head?
> In this case arch_spin_unlock() could do xadd(tickets.head) and check
> the result

Well, right now, "tail" is manipulated by locked instructions by CPUs
who are contending for the ticketlock, but head can be manipulated
unlocked by the CPU which currently owns the ticketlock. If SLOWPATH
moved into head, then non-owner CPUs would be touching head, requiring
everyone to use locked instructions on it.

That's the theory, but I don't see much (any?) code which depends on that.

Ideally we could find a way so that pv ticketlocks could use a plain
unlocked add for the unlock like the non-pv case, but I just don't see a
way to do it.
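
For reference, the layout under discussion is roughly the following
(simplified from arch/x86/include/asm/spinlock_types.h):

    typedef struct arch_spinlock {
            union {
                    __ticketpair_t head_tail;
                    struct __raw_tickets {
                            __ticket_t head;  /* owner-only, updated without a lock prefix */
                            __ticket_t tail;  /* contenders bump this with a locked xadd */
                    } tickets;
            };
    } arch_spinlock_t;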

> In this case __ticket_check_and_clear_slowpath() really needs to cmpxchg
> the whole .head_tail. Plus obviously more boring changes. This needs a
> separate patch even _if_ this can work.

Definitely.

> BTW. If we move "clear slowpath" into "lock" path, then probably trylock
> should be changed too? Something like below, we just need to clear SLOWPATH
> before cmpxchg.

How important / widely used is trylock these days?

J

>
> Oleg.
>
> --- x/arch/x86/include/asm/spinlock.h
> +++ x/arch/x86/include/asm/spinlock.h
> @@ -109,7 +109,8 @@ static __always_inline int arch_spin_try
>   if (old.tickets.head != (old.tickets.tail & ~TICKET_SLOWPATH_FLAG))
>   return 0;
>  
> - new.head_tail = old.head_tail + (TICKET_LOCK_INC << TICKET_SHIFT);
> + new.tickets.head = old.tickets.head;
> + new.tickets.tail = (old.tickets.tail & ~TICKET_SLOWPATH_FLAG) + 
> TICKET_LOCK_INC;
>  
>   /* cmpxchg is a full barrier, so nothing can move before it */
>   return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) == 
> old.head_tail;
>



Re: [PATCH] x86 spinlock: Fix memory corruption on completing completions

2015-02-08 Thread Jeremy Fitzhardinge
On 02/06/2015 06:49 AM, Raghavendra K T wrote:
> Paravirt spinlock clears slowpath flag after doing unlock.
> As explained by Linus currently it does:
> prev = *lock;
> add_smp(&lock->tickets.head, TICKET_LOCK_INC);
>
> /* add_smp() is a full mb() */
>
> if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
> __ticket_unlock_slowpath(lock, prev);
>
>
> which is *exactly* the kind of things you cannot do with spinlocks,
> because after you've done the "add_smp()" and released the spinlock
> for the fast-path, you can't access the spinlock any more.  Exactly
> because a fast-path lock might come in, and release the whole data
> structure.

Yeah, that's an embarrassingly obvious bug in retrospect.

> Linus suggested that we should not do any writes to lock after unlock(),
> and we can move slowpath clearing to fastpath lock.

Yep, that seems like a sound approach.

> However it brings additional case to be handled, viz., slowpath still
> could be set when somebody does arch_trylock. Handle that too by ignoring
> slowpath flag during lock availability check.
>
> Reported-by: Sasha Levin 
> Suggested-by: Linus Torvalds 
> Signed-off-by: Raghavendra K T 
> ---
>  arch/x86/include/asm/spinlock.h | 70 
> -
>  1 file changed, 34 insertions(+), 36 deletions(-)
>
> diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
> index 625660f..0829f86 100644
> --- a/arch/x86/include/asm/spinlock.h
> +++ b/arch/x86/include/asm/spinlock.h
> @@ -49,6 +49,23 @@ static inline void __ticket_enter_slowpath(arch_spinlock_t 
> *lock)
>   set_bit(0, (volatile unsigned long *)&lock->tickets.tail);
>  }
>  
> +static inline void __ticket_check_and_clear_slowpath(arch_spinlock_t *lock)
> +{
> + arch_spinlock_t old, new;
> + __ticket_t diff;
> +
> + old.tickets = READ_ONCE(lock->tickets);

Couldn't the caller pass in the lock state that it read rather than
re-reading it?
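
A hypothetical shape of that suggestion (not code from this patch), with the
already-read state handed down instead of the second READ_ONCE():

    static inline void
    __ticket_check_and_clear_slowpath(arch_spinlock_t *lock, arch_spinlock_t old)
    {
            __ticket_t diff = (old.tickets.tail & ~TICKET_SLOWPATH_FLAG) -
                              old.tickets.head;

            /* try to clear the slowpath flag when there are no contenders */
            if ((old.tickets.tail & TICKET_SLOWPATH_FLAG) &&
                diff == TICKET_LOCK_INC) {
                    arch_spinlock_t new = old;

                    new.tickets.tail &= ~TICKET_SLOWPATH_FLAG;
                    cmpxchg(&lock->head_tail, old.head_tail, new.head_tail);
            }
    }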

> + diff = (old.tickets.tail & ~TICKET_SLOWPATH_FLAG) - old.tickets.head;
> +
> + /* try to clear slowpath flag when there are no contenders */
> + if ((old.tickets.tail & TICKET_SLOWPATH_FLAG) &&
> + (diff == TICKET_LOCK_INC)) {
> + new = old;
> + new.tickets.tail &= ~TICKET_SLOWPATH_FLAG;
> + cmpxchg(&lock->head_tail, old.head_tail, new.head_tail);
> + }
> +}
> +
>  #else  /* !CONFIG_PARAVIRT_SPINLOCKS */
>  static __always_inline void __ticket_lock_spinning(arch_spinlock_t *lock,
>   __ticket_t ticket)
> @@ -59,6 +76,10 @@ static inline void __ticket_unlock_kick(arch_spinlock_t 
> *lock,
>  {
>  }
>  
> +static inline void __ticket_check_and_clear_slowpath(arch_spinlock_t *lock)
> +{
> +}
> +
>  #endif /* CONFIG_PARAVIRT_SPINLOCKS */
>  
>  static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
> @@ -84,7 +105,7 @@ static __always_inline void arch_spin_lock(arch_spinlock_t 
> *lock)
>   register struct __raw_tickets inc = { .tail = TICKET_LOCK_INC };
>  
>   inc = xadd(&lock->tickets, inc);
> - if (likely(inc.head == inc.tail))
> + if (likely(inc.head == (inc.tail & ~TICKET_SLOWPATH_FLAG)))

The intent of this conditional was to be the quickest possible path when
taking a fastpath lock, with the code below being used for all slowpath
locks (free or taken). So I don't think masking out SLOWPATH_FLAG is
necessary here.

>   goto out;
>  
>   inc.tail &= ~TICKET_SLOWPATH_FLAG;
> @@ -98,7 +119,10 @@ static __always_inline void 
> arch_spin_lock(arch_spinlock_t *lock)
>   } while (--count);
>   __ticket_lock_spinning(lock, inc.tail);
>   }
> -out: barrier();  /* make sure nothing creeps before the lock is taken */
> +out:
> + __ticket_check_and_clear_slowpath(lock);
> +
> + barrier();  /* make sure nothing creeps before the lock is taken */

Which means that if "goto out" path is only ever used for fastpath
locks, you can limit calling __ticket_check_and_clear_slowpath() to the
slowpath case.
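
A hypothetical sketch of that refinement (not code from this patch): the
uncontended xadd goes straight to out, and only the spin loop falls through
the clearing step:

    inc = xadd(&lock->tickets, inc);
    if (likely(inc.head == inc.tail))
            goto out;               /* fastpath: SLOWPATH flag cannot be set */

    inc.tail &= ~TICKET_SLOWPATH_FLAG;
    for (;;) {
            /* spin, possibly calling __ticket_lock_spinning(), until owned */
            if (ACCESS_ONCE(lock->tickets.head) == inc.tail)
                    break;
            cpu_relax();
    }
    __ticket_check_and_clear_slowpath(lock);    /* reached only on the slow path */
out:
    barrier();  /* make sure nothing creeps before the lock is taken */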

>  }
>  
>  static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
> @@ -115,47 +139,21 @@ static __always_inline int 
> arch_spin_trylock(arch_spinlock_t *lock)
> return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) == 
> old.head_tail;
>  }
>  
> -static inline void __ticket_unlock_slowpath(arch_spinlock_t *lock,
> - arch_spinlock_t old)
> -{
> - arch_spinlock_t new;
> -
> - BUILD_BUG_ON(((__ticket_t)NR_CPUS) != NR_CPUS);
> -
> - /* Perform the unlock on the "before" copy */
> - old.tickets.head += TICKET_LOCK_INC;

NB (see below)

> -
> - /* Clear the slowpath flag */
> - new.head_tail = old.head_tail & ~(TICKET_SLOWPATH_FLAG << TICKET_SHIFT);
> -
> - /*
> -  * If the lock is uncontended, clear the flag - use cmpxchg in
> -  * case it changes behind our back

Re: [PATCH 1/3] MAINTAINERS: Remove Jeremy from the Xen subsystem.

2013-08-13 Thread Jeremy Fitzhardinge
On 08/05/2013 11:05 AM, Konrad Rzeszutek Wilk wrote:
> Jeremy has been a key person in making Linux work with Xen.
> He has been enjoying the last year working on something
> different so reflect that in the maintainers file.

Ack.

J
>
> CC: Jeremy Fitzhardinge 
> Signed-off-by: Konrad Rzeszutek Wilk 
> ---
>  CREDITS | 1 +
>  MAINTAINERS | 1 -
>  2 files changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/CREDITS b/CREDITS
> index 206d0fc..646a0a9 100644
> --- a/CREDITS
> +++ b/CREDITS
> @@ -1120,6 +1120,7 @@ D: author of userfs filesystem
>  D: Improved mmap and munmap handling
>  D: General mm minor tidyups
>  D: autofs v4 maintainer
> +D: Xen subsystem
>  S: 987 Alabama St
>  S: San Francisco
>  S: CA, 94110
> diff --git a/MAINTAINERS b/MAINTAINERS
> index defc053..440af74 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -9237,7 +9237,6 @@ F:  drivers/media/tuners/tuner-xc2028.*
>  
>  XEN HYPERVISOR INTERFACE
>  M:   Konrad Rzeszutek Wilk 
> -M:   Jeremy Fitzhardinge 
>  L:   xen-de...@lists.xensource.com (moderated for non-subscribers)
>  L:   virtualizat...@lists.linux-foundation.org
>  S:   Supported



Re: [PATCH delta V13 14/14] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor

2013-08-13 Thread Jeremy Fitzhardinge
On 08/13/2013 01:02 PM, Raghavendra K T wrote:
> * Ingo Molnar  [2013-08-13 18:55:52]:
>
>> Would be nice to have a delta fix patch against tip:x86/spinlocks, which 
>> I'll then backmerge into that series via rebasing it.
>>
> There was a namespace collision of PER_CPU lock_waiting variable when
> we have both Xen and KVM enabled. 
>
> Perhaps this week wasn't for me. Had run 100 times randconfig in a loop
> for the fix sent earlier :(. 
>
> Ingo, below delta patch should fix it, IIRC, I hope you will be folding this
> back to patch 14/14 itself. Else please let me.
> I have already run allnoconfig, allyesconfig, randconfig with below patch. 
> But will
> test again. This should apply on top of tip:x86/spinlocks.
>
> ---8<---
> From: Raghavendra K T 
>
> Fix Namespace collision for lock_waiting
>
> Signed-off-by: Raghavendra K T 
> ---
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index d442471..b8ef630 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -673,7 +673,7 @@ struct kvm_lock_waiting {
>  static cpumask_t waiting_cpus;
>  
>  /* Track spinlock on which a cpu is waiting */
> -static DEFINE_PER_CPU(struct kvm_lock_waiting, lock_waiting);
> +static DEFINE_PER_CPU(struct kvm_lock_waiting, klock_waiting);

Has static stopped meaning static?

J

>  
>  static void kvm_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
>  {
> @@ -685,7 +685,7 @@ static void kvm_lock_spinning(struct arch_spinlock *lock, 
> __ticket_t want)
>   if (in_nmi())
>   return;
>  
> - w = &__get_cpu_var(lock_waiting);
> + w = &__get_cpu_var(klock_waiting);
>   cpu = smp_processor_id();
>   start = spin_time_start();
>  
> @@ -756,7 +756,7 @@ static void kvm_unlock_kick(struct arch_spinlock *lock, 
> __ticket_t ticket)
>  
>   add_stats(RELEASED_SLOW, 1);
>   for_each_cpu(cpu, &waiting_cpus) {
> - const struct kvm_lock_waiting *w = &per_cpu(lock_waiting, cpu);
> + const struct kvm_lock_waiting *w = &per_cpu(klock_waiting, cpu);
>   if (ACCESS_ONCE(w->lock) == lock &&
>   ACCESS_ONCE(w->want) == ticket) {
>   add_stats(RELEASED_SLOW_KICKED, 1);
>
>



[tip:x86/spinlocks] x86, spinlock: Replace pv spinlocks with pv ticketlocks

2013-08-10 Thread tip-bot for Jeremy Fitzhardinge
Commit-ID:  545ac13892ab391049a92108cf59a0d05de7e28c
Gitweb: http://git.kernel.org/tip/545ac13892ab391049a92108cf59a0d05de7e28c
Author: Jeremy Fitzhardinge 
AuthorDate: Fri, 9 Aug 2013 19:51:49 +0530
Committer:  H. Peter Anvin 
CommitDate: Fri, 9 Aug 2013 07:53:05 -0700

x86, spinlock: Replace pv spinlocks with pv ticketlocks

Rather than outright replacing the entire spinlock implementation in
order to paravirtualize it, keep the ticket lock implementation but add
a couple of pvops hooks on the slow path (long spin on lock, unlocking
a contended lock).

Ticket locks have a number of nice properties, but they also have some
surprising behaviours in virtual environments.  They enforce a strict
FIFO ordering on cpus trying to take a lock; however, if the hypervisor
scheduler does not schedule the cpus in the correct order, the system can
waste a huge amount of time spinning until the next cpu can take the lock.

(See Thomas Friebel's talk "Prevent Guests from Spinning Around"
http://www.xen.org/files/xensummitboston08/LHP.pdf for more details.)

To address this, we add two hooks:
 - __ticket_spin_lock which is called after the cpu has been
   spinning on the lock for a significant number of iterations but has
   failed to take the lock (presumably because the cpu holding the lock
   has been descheduled).  The lock_spinning pvop is expected to block
   the cpu until it has been kicked by the current lock holder.
 - __ticket_spin_unlock, which on releasing a contended lock
   (there are more cpus with tail tickets), it looks to see if the next
   cpu is blocked and wakes it if so.

When compiled with CONFIG_PARAVIRT_SPINLOCKS disabled, a set of stub
functions causes all the extra code to go away.
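
(The stubs in question take roughly this shape -- paraphrased, not the literal
hunk from this commit:)

    #else  /* !CONFIG_PARAVIRT_SPINLOCKS */
    static __always_inline void __ticket_lock_spinning(arch_spinlock_t *lock,
                                                       __ticket_t ticket)
    {
    }
    static inline void __ticket_unlock_kick(arch_spinlock_t *lock,
                                            __ticket_t ticket)
    {
    }
    #endif /* CONFIG_PARAVIRT_SPINLOCKS */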

Results:
===
setup: 32 core machine with 32 vcpu KVM guest (HT off)  with 8GB RAM
base = 3.11-rc
patched = base + pvspinlock V12

+-----------------+----------------+--------+
 dbench (Throughput in MB/sec. Higher is better)
+-----------------+----------------+--------+
|   base (stdev %)|patched(stdev%) | %gain  |
+-----------------+----------------+--------+
| 15035.3   (0.3) |15150.0   (0.6) |   0.8  |
|  1470.0   (2.2) | 1713.7   (1.9) |  16.6  |
|   848.6   (4.3) |  967.8   (4.3) |  14.0  |
|   652.9   (3.5) |  685.3   (3.7) |   5.0  |
+-----------------+----------------+--------+

pvspinlock shows benefits for overcommit ratio > 1 for PLE enabled cases,
and undercommit results are flat

Signed-off-by: Jeremy Fitzhardinge 
Link: 
http://lkml.kernel.org/r/1376058122-8248-2-git-send-email-raghavendra...@linux.vnet.ibm.com
Reviewed-by: Konrad Rzeszutek Wilk 
Tested-by: Attilio Rao 
[ Raghavendra: Changed SPIN_THRESHOLD, fixed redefinition of arch_spinlock_t]
Signed-off-by: Raghavendra K T 
Acked-by: Ingo Molnar 
Signed-off-by: H. Peter Anvin 
---
 arch/x86/include/asm/paravirt.h   | 32 -
 arch/x86/include/asm/paravirt_types.h | 14 +
 arch/x86/include/asm/spinlock.h   | 53 ---
 arch/x86/include/asm/spinlock_types.h |  4 ---
 arch/x86/kernel/paravirt-spinlocks.c  | 15 ++
 arch/x86/xen/spinlock.c   |  8 --
 6 files changed, 65 insertions(+), 61 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index cfdc9ee..040e72d 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -712,36 +712,16 @@ static inline void __set_fixmap(unsigned /* enum 
fixed_addresses */ idx,
 
 #if defined(CONFIG_SMP) && defined(CONFIG_PARAVIRT_SPINLOCKS)
 
-static inline int arch_spin_is_locked(struct arch_spinlock *lock)
+static __always_inline void __ticket_lock_spinning(struct arch_spinlock *lock,
+   __ticket_t ticket)
 {
-   return PVOP_CALL1(int, pv_lock_ops.spin_is_locked, lock);
+   PVOP_VCALL2(pv_lock_ops.lock_spinning, lock, ticket);
 }
 
-static inline int arch_spin_is_contended(struct arch_spinlock *lock)
+static __always_inline void ticket_unlock_kick(struct arch_spinlock *lock,
+   __ticket_t ticket)
 {
-   return PVOP_CALL1(int, pv_lock_ops.spin_is_contended, lock);
-}
-#define arch_spin_is_contended arch_spin_is_contended
-
-static __always_inline void arch_spin_lock(struct arch_spinlock *lock)
-{
-   PVOP_VCALL1(pv_lock_ops.spin_lock, lock);
-}
-
-static __always_inline void arch_spin_lock_flags(struct arch_spinlock *lock,
- unsigned long flags)
-{
-   PVOP_VCALL2(pv_lock_ops.spin_lock_flags, lock, flags);
-}
-
-static __always_inline int arch_spin_trylock(struct arch_spinlock *lock)
-{
-   return PVOP_CALL1(int, pv_lock_ops.spin_trylock, lock);
-}
-
-static __always_inline void arch_spin_unlock(struct arch_spinlock *lock)
-{
-   PVOP_VCALL1(pv_lock_ops.spin_unlock, lock);
+   PVOP_VCALL2(pv_lock_ops.unlock_kick, lock, ticket);
 }
 
 #endif

[tip:x86/spinlocks] x86, pvticketlock: When paravirtualizing ticket locks, increment by 2

2013-08-10 Thread tip-bot for Jeremy Fitzhardinge
Commit-ID:  4a1ed4ca681e7df38ed1b609a11aab38cbc515b3
Gitweb: http://git.kernel.org/tip/4a1ed4ca681e7df38ed1b609a11aab38cbc515b3
Author: Jeremy Fitzhardinge 
AuthorDate: Fri, 9 Aug 2013 19:51:56 +0530
Committer:  H. Peter Anvin 
CommitDate: Fri, 9 Aug 2013 07:53:50 -0700

x86, pvticketlock: When paravirtualizing ticket locks, increment by 2

Increment ticket head/tails by 2 rather than 1 to leave the LSB free
to store a "is in slowpath state" bit.  This halves the number
of possible CPUs for a given ticket size, but this shouldn't matter
in practice - kernels built for 32k+ CPU systems are probably
specially built for the hardware rather than a generic distro
kernel.
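
(Concretely: with the 8-bit __ticket_t and __TICKET_LOCK_INC == 2, the usable
ticket space halves from 256 to 128 values, which is why the patch below
changes the type-selection test to CONFIG_NR_CPUS < (256 / __TICKET_LOCK_INC).)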

Signed-off-by: Jeremy Fitzhardinge 
Link: 
http://lkml.kernel.org/r/1376058122-8248-9-git-send-email-raghavendra...@linux.vnet.ibm.com
Reviewed-by: Konrad Rzeszutek Wilk 
Tested-by: Attilio Rao 
Signed-off-by: Raghavendra K T 
Acked-by: Ingo Molnar 
Signed-off-by: H. Peter Anvin 
---
 arch/x86/include/asm/spinlock.h   | 10 +-
 arch/x86/include/asm/spinlock_types.h | 10 +-
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 7442410..04a5cd5 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -78,7 +78,7 @@ static __always_inline void __ticket_unlock_kick(struct 
arch_spinlock *lock,
  */
 static __always_inline void arch_spin_lock(struct arch_spinlock *lock)
 {
-   register struct __raw_tickets inc = { .tail = 1 };
+   register struct __raw_tickets inc = { .tail = TICKET_LOCK_INC };
 
inc = xadd(&lock->tickets, inc);
 
@@ -104,7 +104,7 @@ static __always_inline int 
arch_spin_trylock(arch_spinlock_t *lock)
if (old.tickets.head != old.tickets.tail)
return 0;
 
-   new.head_tail = old.head_tail + (1 << TICKET_SHIFT);
+   new.head_tail = old.head_tail + (TICKET_LOCK_INC << TICKET_SHIFT);
 
/* cmpxchg is a full barrier, so nothing can move before it */
return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) == 
old.head_tail;
@@ -112,9 +112,9 @@ static __always_inline int 
arch_spin_trylock(arch_spinlock_t *lock)
 
 static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
 {
-   __ticket_t next = lock->tickets.head + 1;
+   __ticket_t next = lock->tickets.head + TICKET_LOCK_INC;
 
-   __add(&lock->tickets.head, 1, UNLOCK_LOCK_PREFIX);
+   __add(&lock->tickets.head, TICKET_LOCK_INC, UNLOCK_LOCK_PREFIX);
__ticket_unlock_kick(lock, next);
 }
 
@@ -129,7 +129,7 @@ static inline int arch_spin_is_contended(arch_spinlock_t 
*lock)
 {
struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
 
-   return (__ticket_t)(tmp.tail - tmp.head) > 1;
+   return (__ticket_t)(tmp.tail - tmp.head) > TICKET_LOCK_INC;
 }
 #define arch_spin_is_contended arch_spin_is_contended
 
diff --git a/arch/x86/include/asm/spinlock_types.h 
b/arch/x86/include/asm/spinlock_types.h
index 83fd3c7..e96fcbd 100644
--- a/arch/x86/include/asm/spinlock_types.h
+++ b/arch/x86/include/asm/spinlock_types.h
@@ -3,7 +3,13 @@
 
 #include 
 
-#if (CONFIG_NR_CPUS < 256)
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+#define __TICKET_LOCK_INC  2
+#else
+#define __TICKET_LOCK_INC  1
+#endif
+
+#if (CONFIG_NR_CPUS < (256 / __TICKET_LOCK_INC))
 typedef u8  __ticket_t;
 typedef u16 __ticketpair_t;
 #else
@@ -11,6 +17,8 @@ typedef u16 __ticket_t;
 typedef u32 __ticketpair_t;
 #endif
 
+#define TICKET_LOCK_INC ((__ticket_t)__TICKET_LOCK_INC)
+
 #define TICKET_SHIFT   (sizeof(__ticket_t) * 8)
 
 typedef struct arch_spinlock {


[tip:x86/spinlocks] xen: Defer spinlock setup until boot CPU setup

2013-08-10 Thread tip-bot for Jeremy Fitzhardinge
Commit-ID:  bf7aab3ad4b4364a293421d628a912a2153ee1ee
Gitweb: http://git.kernel.org/tip/bf7aab3ad4b4364a293421d628a912a2153ee1ee
Author: Jeremy Fitzhardinge 
AuthorDate: Fri, 9 Aug 2013 19:51:52 +0530
Committer:  H. Peter Anvin 
CommitDate: Fri, 9 Aug 2013 07:53:18 -0700

xen: Defer spinlock setup until boot CPU setup

There's no need to do it at very early init, and doing it there
makes it impossible to use the jump_label machinery.

Signed-off-by: Jeremy Fitzhardinge 
Link: 
http://lkml.kernel.org/r/1376058122-8248-5-git-send-email-raghavendra...@linux.vnet.ibm.com
Reviewed-by: Konrad Rzeszutek Wilk 
Signed-off-by: Raghavendra K T 
Acked-by: Ingo Molnar 
Signed-off-by: H. Peter Anvin 
---
 arch/x86/xen/smp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index ca92754..3b52d80 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -279,6 +279,7 @@ static void __init xen_smp_prepare_boot_cpu(void)
 
xen_filter_cpu_maps();
xen_setup_vcpu_info_placement();
+   xen_init_spinlocks();
 }
 
 static void __init xen_smp_prepare_cpus(unsigned int max_cpus)
@@ -680,7 +681,6 @@ void __init xen_smp_init(void)
 {
smp_ops = xen_smp_ops;
xen_fill_possible_map();
-   xen_init_spinlocks();
 }
 
 static void __init xen_hvm_smp_prepare_cpus(unsigned int max_cpus)


[tip:x86/spinlocks] xen, pvticketlock: Allow interrupts to be enabled while blocking

2013-08-10 Thread tip-bot for Jeremy Fitzhardinge
Commit-ID:  1ed7bf5f5227169b661c619636f754b98001ec30
Gitweb: http://git.kernel.org/tip/1ed7bf5f5227169b661c619636f754b98001ec30
Author: Jeremy Fitzhardinge 
AuthorDate: Fri, 9 Aug 2013 19:51:59 +0530
Committer:  H. Peter Anvin 
CommitDate: Fri, 9 Aug 2013 07:54:03 -0700

xen, pvticketlock: Allow interrupts to be enabled while blocking

If interrupts were enabled when taking the spinlock, we can leave them
enabled while blocking to get the lock.

If we can enable interrupts while waiting for the lock to become
available, and we take an interrupt before entering the poll,
and the handler takes a spinlock which ends up going into
the slow state (invalidating the per-cpu "lock" and "want" values),
then when the interrupt handler returns the event channel will
remain pending so the poll will return immediately, causing it to
return out to the main spinlock loop.

Signed-off-by: Jeremy Fitzhardinge 
Link: 
http://lkml.kernel.org/r/1376058122-8248-12-git-send-email-raghavendra...@linux.vnet.ibm.com
Reviewed-by: Konrad Rzeszutek Wilk 
Signed-off-by: Raghavendra K T 
Acked-by: Ingo Molnar 
Signed-off-by: H. Peter Anvin 
---
 arch/x86/xen/spinlock.c | 46 --
 1 file changed, 40 insertions(+), 6 deletions(-)

diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index 546112e..0438b93 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -142,7 +142,20 @@ static void xen_lock_spinning(struct arch_spinlock *lock, 
__ticket_t want)
 * partially setup state.
 */
local_irq_save(flags);
-
+   /*
+* We don't really care if we're overwriting some other
+* (lock,want) pair, as that would mean that we're currently
+* in an interrupt context, and the outer context had
+* interrupts enabled.  That has already kicked the VCPU out
+* of xen_poll_irq(), so it will just return spuriously and
+* retry with newly setup (lock,want).
+*
+* The ordering protocol on this is that the "lock" pointer
+* may only be set non-NULL if the "want" ticket is correct.
+* If we're updating "want", we must first clear "lock".
+*/
+   w->lock = NULL;
+   smp_wmb();
w->want = want;
smp_wmb();
w->lock = lock;
@@ -157,24 +170,43 @@ static void xen_lock_spinning(struct arch_spinlock *lock, 
__ticket_t want)
/* Only check lock once pending cleared */
barrier();
 
-   /* Mark entry to slowpath before doing the pickup test to make
-  sure we don't deadlock with an unlocker. */
+   /*
+* Mark entry to slowpath before doing the pickup test to make
+* sure we don't deadlock with an unlocker.
+*/
__ticket_enter_slowpath(lock);
 
-   /* check again make sure it didn't become free while
-  we weren't looking  */
+   /*
+* check again make sure it didn't become free while
+* we weren't looking
+*/
if (ACCESS_ONCE(lock->tickets.head) == want) {
add_stats(TAKEN_SLOW_PICKUP, 1);
goto out;
}
+
+   /* Allow interrupts while blocked */
+   local_irq_restore(flags);
+
+   /*
+* If an interrupt happens here, it will leave the wakeup irq
+* pending, which will cause xen_poll_irq() to return
+* immediately.
+*/
+
/* Block until irq becomes pending (or perhaps a spurious wakeup) */
xen_poll_irq(irq);
add_stats(TAKEN_SLOW_SPURIOUS, !xen_test_irq_pending(irq));
+
+   local_irq_save(flags);
+
kstat_incr_irqs_this_cpu(irq, irq_to_desc(irq));
 out:
cpumask_clear_cpu(cpu, &waiting_cpus);
w->lock = NULL;
+
local_irq_restore(flags);
+
spin_time_accum_blocked(start);
 }
 PV_CALLEE_SAVE_REGS_THUNK(xen_lock_spinning);
@@ -188,7 +220,9 @@ static void xen_unlock_kick(struct arch_spinlock *lock, 
__ticket_t next)
for_each_cpu(cpu, &waiting_cpus) {
const struct xen_lock_waiting *w = &per_cpu(lock_waiting, cpu);
 
-   if (w->lock == lock && w->want == next) {
+   /* Make sure we read lock before want */
+   if (ACCESS_ONCE(w->lock) == lock &&
+   ACCESS_ONCE(w->want) == next) {
add_stats(RELEASED_SLOW_KICKED, 1);
xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
break;


[tip:x86/spinlocks] xen, pvticketlocks: Add xen_nopvspin parameter to disable xen pv ticketlocks

2013-08-10 Thread tip-bot for Jeremy Fitzhardinge
Commit-ID:  b8fa70b51aa76737bdb6b493901ef7376977489c
Gitweb: http://git.kernel.org/tip/b8fa70b51aa76737bdb6b493901ef7376977489c
Author: Jeremy Fitzhardinge 
AuthorDate: Fri, 9 Aug 2013 19:51:54 +0530
Committer:  H. Peter Anvin 
CommitDate: Fri, 9 Aug 2013 07:53:37 -0700

xen, pvticketlocks: Add xen_nopvspin parameter to disable xen pv ticketlocks

Signed-off-by: Jeremy Fitzhardinge 
Link: 
http://lkml.kernel.org/r/1376058122-8248-7-git-send-email-raghavendra...@linux.vnet.ibm.com
Reviewed-by: Konrad Rzeszutek Wilk 
Signed-off-by: Raghavendra K T 
Acked-by: Ingo Molnar 
Signed-off-by: H. Peter Anvin 
---
 arch/x86/xen/spinlock.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index a458729..669a971 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -244,6 +244,8 @@ void xen_uninit_lock_cpu(int cpu)
per_cpu(irq_name, cpu) = NULL;
 }
 
+static bool xen_pvspin __initdata = true;
+
 void __init xen_init_spinlocks(void)
 {
/*
@@ -253,10 +255,22 @@ void __init xen_init_spinlocks(void)
if (xen_hvm_domain())
return;
 
+   if (!xen_pvspin) {
+   printk(KERN_DEBUG "xen: PV spinlocks disabled\n");
+   return;
+   }
+
pv_lock_ops.lock_spinning = xen_lock_spinning;
pv_lock_ops.unlock_kick = xen_unlock_kick;
 }
 
+static __init int xen_parse_nopvspin(char *arg)
+{
+   xen_pvspin = false;
+   return 0;
+}
+early_param("xen_nopvspin", xen_parse_nopvspin);
+
 #ifdef CONFIG_XEN_DEBUG_FS
 
 static struct dentry *d_spin_debug;


[tip:x86/spinlocks] x86, pvticketlock: Use callee-save for lock_spinning

2013-08-10 Thread tip-bot for Jeremy Fitzhardinge
Commit-ID:  354714dd2607778692db53947ab93b74956494e5
Gitweb: http://git.kernel.org/tip/354714dd2607778692db53947ab93b74956494e5
Author: Jeremy Fitzhardinge 
AuthorDate: Fri, 9 Aug 2013 19:51:55 +0530
Committer:  H. Peter Anvin 
CommitDate: Fri, 9 Aug 2013 07:53:44 -0700

x86, pvticketlock: Use callee-save for lock_spinning

Although the lock_spinning calls in the spinlock code are on the
uncommon path, their presence can cause the compiler to generate many
more register save/restores in the function pre/postamble, which is in
the fast path.  To avoid this, convert it to using the pvops callee-save
calling convention, which defers all the save/restores until the actual
function is called, keeping the fastpath clean.

Signed-off-by: Jeremy Fitzhardinge 
Link: 
http://lkml.kernel.org/r/1376058122-8248-8-git-send-email-raghavendra...@linux.vnet.ibm.com
Reviewed-by: Konrad Rzeszutek Wilk 
Tested-by: Attilio Rao 
Signed-off-by: Raghavendra K T 
Acked-by: Ingo Molnar 
Signed-off-by: H. Peter Anvin 
---
 arch/x86/include/asm/paravirt.h   | 2 +-
 arch/x86/include/asm/paravirt_types.h | 2 +-
 arch/x86/kernel/paravirt-spinlocks.c  | 2 +-
 arch/x86/xen/spinlock.c   | 3 ++-
 4 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 040e72d..7131e12c 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -715,7 +715,7 @@ static inline void __set_fixmap(unsigned /* enum 
fixed_addresses */ idx,
 static __always_inline void __ticket_lock_spinning(struct arch_spinlock *lock,
__ticket_t ticket)
 {
-   PVOP_VCALL2(pv_lock_ops.lock_spinning, lock, ticket);
+   PVOP_VCALLEE2(pv_lock_ops.lock_spinning, lock, ticket);
 }
 
 static __always_inline void ticket_unlock_kick(struct arch_spinlock *lock,
diff --git a/arch/x86/include/asm/paravirt_types.h 
b/arch/x86/include/asm/paravirt_types.h
index 346a07c..04ac40e 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -334,7 +334,7 @@ typedef u16 __ticket_t;
 #endif
 
 struct pv_lock_ops {
-   void (*lock_spinning)(struct arch_spinlock *lock, __ticket_t ticket);
+   struct paravirt_callee_save lock_spinning;
void (*unlock_kick)(struct arch_spinlock *lock, __ticket_t ticket);
 };
 
diff --git a/arch/x86/kernel/paravirt-spinlocks.c 
b/arch/x86/kernel/paravirt-spinlocks.c
index c2e010e..4251c1d 100644
--- a/arch/x86/kernel/paravirt-spinlocks.c
+++ b/arch/x86/kernel/paravirt-spinlocks.c
@@ -9,7 +9,7 @@
 
 struct pv_lock_ops pv_lock_ops = {
 #ifdef CONFIG_SMP
-   .lock_spinning = paravirt_nop,
+   .lock_spinning = __PV_IS_CALLEE_SAVE(paravirt_nop),
.unlock_kick = paravirt_nop,
 #endif
 };
diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index 669a971..6c8792b 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -173,6 +173,7 @@ out:
local_irq_restore(flags);
spin_time_accum_blocked(start);
 }
+PV_CALLEE_SAVE_REGS_THUNK(xen_lock_spinning);
 
 static void xen_unlock_kick(struct arch_spinlock *lock, __ticket_t next)
 {
@@ -260,7 +261,7 @@ void __init xen_init_spinlocks(void)
return;
}
 
-   pv_lock_ops.lock_spinning = xen_lock_spinning;
+   pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(xen_lock_spinning);
pv_lock_ops.unlock_kick = xen_unlock_kick;
 }
 


[tip:x86/spinlocks] xen, pvticketlock: Xen implementation for PV ticket locks

2013-08-10 Thread tip-bot for Jeremy Fitzhardinge
Commit-ID:  80bd58fef495d000a02fc5b55ca76d423400e748
Gitweb: http://git.kernel.org/tip/80bd58fef495d000a02fc5b55ca76d423400e748
Author: Jeremy Fitzhardinge 
AuthorDate: Fri, 9 Aug 2013 19:51:53 +0530
Committer:  H. Peter Anvin 
CommitDate: Fri, 9 Aug 2013 07:53:23 -0700

xen, pvticketlock: Xen implementation for PV ticket locks

Replace the old Xen implementation of PV spinlocks with an implementation
of xen_lock_spinning and xen_unlock_kick.

xen_lock_spinning simply registers the cpu in its entry in lock_waiting,
adds itself to the waiting_cpus set, and blocks on an event channel
until the channel becomes pending.

xen_unlock_kick searches the cpus in waiting_cpus looking for the one
which next wants this lock with the next ticket, if any.  If found,
it kicks it by making its event channel pending, which wakes it up.

We need to make sure interrupts are disabled while we're relying on the
contents of the per-cpu lock_waiting values, otherwise an interrupt
handler could come in, try to take some other lock, block, and overwrite
our values.
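
(The per-cpu state referred to here looks roughly like this -- a simplified
sketch, with field names as used elsewhere in the series:)

    struct xen_lock_waiting {
            struct arch_spinlock *lock;
            __ticket_t want;
    };
    static DEFINE_PER_CPU(struct xen_lock_waiting, lock_waiting);
    static cpumask_t waiting_cpus;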

Signed-off-by: Jeremy Fitzhardinge 
Link: 
http://lkml.kernel.org/r/1376058122-8248-6-git-send-email-raghavendra...@linux.vnet.ibm.com
Reviewed-by: Konrad Rzeszutek Wilk 
 [ Raghavendra:  use function + enum instead of macro, cmpxchg for zero status 
reset
Reintroduce break since we know the exact vCPU to send IPI as suggested by 
Konrad.]
Signed-off-by: Raghavendra K T 
Acked-by: Ingo Molnar 
Signed-off-by: H. Peter Anvin 
---
 arch/x86/xen/spinlock.c | 348 +++-
 1 file changed, 79 insertions(+), 269 deletions(-)

diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index d509629..a458729 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -17,45 +17,44 @@
 #include "xen-ops.h"
 #include "debugfs.h"
 
-#ifdef CONFIG_XEN_DEBUG_FS
-static struct xen_spinlock_stats
-{
-   u64 taken;
-   u32 taken_slow;
-   u32 taken_slow_nested;
-   u32 taken_slow_pickup;
-   u32 taken_slow_spurious;
-   u32 taken_slow_irqenable;
+enum xen_contention_stat {
+   TAKEN_SLOW,
+   TAKEN_SLOW_PICKUP,
+   TAKEN_SLOW_SPURIOUS,
+   RELEASED_SLOW,
+   RELEASED_SLOW_KICKED,
+   NR_CONTENTION_STATS
+};
 
-   u64 released;
-   u32 released_slow;
-   u32 released_slow_kicked;
 
+#ifdef CONFIG_XEN_DEBUG_FS
 #define HISTO_BUCKETS  30
-   u32 histo_spin_total[HISTO_BUCKETS+1];
-   u32 histo_spin_spinning[HISTO_BUCKETS+1];
+static struct xen_spinlock_stats
+{
+   u32 contention_stats[NR_CONTENTION_STATS];
u32 histo_spin_blocked[HISTO_BUCKETS+1];
-
-   u64 time_total;
-   u64 time_spinning;
u64 time_blocked;
 } spinlock_stats;
 
 static u8 zero_stats;
 
-static unsigned lock_timeout = 1 << 10;
-#define TIMEOUT lock_timeout
-
 static inline void check_zero(void)
 {
-   if (unlikely(zero_stats)) {
-   memset(&spinlock_stats, 0, sizeof(spinlock_stats));
-   zero_stats = 0;
+   u8 ret;
+   u8 old = ACCESS_ONCE(zero_stats);
+   if (unlikely(old)) {
+   ret = cmpxchg(&zero_stats, old, 0);
+   /* This ensures only one fellow resets the stat */
+   if (ret == old)
+   memset(&spinlock_stats, 0, sizeof(spinlock_stats));
}
 }
 
-#define ADD_STATS(elem, val)   \
-   do { check_zero(); spinlock_stats.elem += (val); } while(0)
+static inline void add_stats(enum xen_contention_stat var, u32 val)
+{
+   check_zero();
+   spinlock_stats.contention_stats[var] += val;
+}
 
 static inline u64 spin_time_start(void)
 {
@@ -74,22 +73,6 @@ static void __spin_time_accum(u64 delta, u32 *array)
array[HISTO_BUCKETS]++;
 }
 
-static inline void spin_time_accum_spinning(u64 start)
-{
-   u32 delta = xen_clocksource_read() - start;
-
-   __spin_time_accum(delta, spinlock_stats.histo_spin_spinning);
-   spinlock_stats.time_spinning += delta;
-}
-
-static inline void spin_time_accum_total(u64 start)
-{
-   u32 delta = xen_clocksource_read() - start;
-
-   __spin_time_accum(delta, spinlock_stats.histo_spin_total);
-   spinlock_stats.time_total += delta;
-}
-
 static inline void spin_time_accum_blocked(u64 start)
 {
u32 delta = xen_clocksource_read() - start;
@@ -99,19 +82,15 @@ static inline void spin_time_accum_blocked(u64 start)
 }
 #else  /* !CONFIG_XEN_DEBUG_FS */
#define TIMEOUT (1 << 10)
-#define ADD_STATS(elem, val)   do { (void)(val); } while(0)
+static inline void add_stats(enum xen_contention_stat var, u32 val)
+{
+}
 
 static inline u64 spin_time_start(void)
 {
return 0;
 }
 
-static inline void spin_time_accum_total(u64 start)
-{
-}
-static inline void spin_time_accum_spinning(u64 start)
-{
-}
 static inline void spin_time_accum_blocked(u64 start)
 {
 }
@@ -134,230 +113,84 @@ typedef u16 xen_spinners_t;
asm(LOCK_PREFIX &

[tip:x86/spinlocks] x86, ticketlock: Add slowpath logic

2013-08-10 Thread tip-bot for Jeremy Fitzhardinge
Commit-ID:  96f853eaa889c7a22718d275b0df7bebdbd6780e
Gitweb: http://git.kernel.org/tip/96f853eaa889c7a22718d275b0df7bebdbd6780e
Author: Jeremy Fitzhardinge 
AuthorDate: Fri, 9 Aug 2013 19:51:58 +0530
Committer:  H. Peter Anvin 
CommitDate: Fri, 9 Aug 2013 07:54:00 -0700

x86, ticketlock: Add slowpath logic

Maintain a flag in the LSB of the ticket lock tail which indicates
whether anyone is in the lock slowpath and may need kicking when
the current holder unlocks.  The flags are set when the first locker
enters the slowpath, and cleared when unlocking to an empty queue (ie,
no contention).

In the specific implementation of lock_spinning(), make sure to set
the slowpath flags on the lock just before blocking.  We must do
this before the last-chance pickup test to prevent a deadlock
with the unlocker:

Unlocker                        Locker
                                test for lock pickup
                                        -> fail
unlock
test slowpath
        -> false
                                set slowpath flags
                                block

Whereas this works in any ordering:

Unlocker                        Locker
                                set slowpath flags
                                test for lock pickup
                                        -> fail
                                block
unlock
test slowpath
        -> true, kick

If the unlocker finds that the lock has the slowpath flag set but it is
actually uncontended (ie, head == tail, so nobody is waiting), then it
clears the slowpath flag.

The unlock code uses a locked add to update the head counter.  This also
acts as a full memory barrier so that its safe to subsequently
read back the slowflag state, knowing that the updated lock is visible
to the other CPUs.  If it were an unlocked add, then the flag read may
just be forwarded from the store buffer before it was visible to the other
CPUs, which could result in a deadlock.
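
Concretely, the unlock fast path ends up looking roughly like this once
the series is applied (a reconstructed sketch, not a quote of the hunks
below; add_smp() is the locked add being discussed):

    static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
    {
            if (TICKET_SLOWPATH_FLAG &&
                static_key_false(&paravirt_ticketlocks_enabled)) {
                    arch_spinlock_t prev = *lock;

                    /* locked add: also a full barrier, so the flag read
                     * below cannot be satisfied early from the store buffer */
                    add_smp(&lock->tickets.head, TICKET_LOCK_INC);

                    if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
                            __ticket_unlock_slowpath(lock, prev);
            } else {
                    /* no PV ticketlocks: keep the old unlocked add */
                    __add(&lock->tickets.head, TICKET_LOCK_INC, UNLOCK_LOCK_PREFIX);
            }
    }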

Unfortunately this means we need to do a locked instruction when
unlocking with PV ticketlocks.  However, if PV ticketlocks are not
enabled, then the old non-locked "add" is the only unlocking code.

Note: this code relies on gcc making sure that unlikely() code is out of
line of the fastpath, which only happens when OPTIMIZE_SIZE=n.  If it
doesn't the generated code isn't too bad, but its definitely suboptimal.

Thanks to Srivatsa Vaddagiri for providing a bugfix to the original
version of this change, which has been folded in.
Thanks to Stephan Diestelhorst for commenting on some code which relied
on an inaccurate reading of the x86 memory ordering rules.

Signed-off-by: Jeremy Fitzhardinge 
Link: 
http://lkml.kernel.org/r/1376058122-8248-11-git-send-email-raghavendra...@linux.vnet.ibm.com
Signed-off-by: Srivatsa Vaddagiri 
Reviewed-by: Konrad Rzeszutek Wilk 
Cc: Stephan Diestelhorst 
Signed-off-by: Raghavendra K T 
Acked-by: Ingo Molnar 
Signed-off-by: H. Peter Anvin 
---
 arch/x86/include/asm/paravirt.h   |  2 +-
 arch/x86/include/asm/spinlock.h   | 86 +--
 arch/x86/include/asm/spinlock_types.h |  2 +
 arch/x86/kernel/paravirt-spinlocks.c  |  3 ++
 arch/x86/xen/spinlock.c   |  6 +++
 5 files changed, 74 insertions(+), 25 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 7131e12c..401f350 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -718,7 +718,7 @@ static __always_inline void __ticket_lock_spinning(struct 
arch_spinlock *lock,
PVOP_VCALLEE2(pv_lock_ops.lock_spinning, lock, ticket);
 }
 
-static __always_inline void ticket_unlock_kick(struct arch_spinlock *lock,
+static __always_inline void __ticket_unlock_kick(struct arch_spinlock *lock,
__ticket_t ticket)
 {
PVOP_VCALL2(pv_lock_ops.unlock_kick, lock, ticket);
diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 04a5cd5..d68883d 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -1,11 +1,14 @@
 #ifndef _ASM_X86_SPINLOCK_H
 #define _ASM_X86_SPINLOCK_H
 
+#include <linux/jump_label.h>
 #include <linux/atomic.h>
 #include <asm/page.h>
 #include <asm/processor.h>
 #include <linux/compiler.h>
 #include <asm/paravirt.h>
+#include <asm/bitops.h>
+
 /*
  * Your basic SMP spinlocks, allowing only a single CPU anywhere
  *
@@ -37,32 +40,28 @@
 /* How long a lock should spin before we consider blocking */
 #define SPIN_THRESHOLD (1 << 15)
 
-#ifndef CONFIG_PARAVIRT_SPINLOCKS
+extern struct static_key paravirt_ticketlocks_enabled;
+static __always_inline bool static_key_false(struct static_key *key);
 
-static __always_inline void __ticket_lock_spinning(struct arch_spinlock *lock,
-   __ticket_t ticket)
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+
+static inline void __ticket_enter_slowpath(arch_spinlock_t *lock)
 {
+   set_bit(0, (volatile unsigned long *)&lock->tickets.tail);
 }

[tip:x86/spinlocks] x86, ticketlock: Collapse a layer of functions

2013-08-10 Thread tip-bot for Jeremy Fitzhardinge
Commit-ID:  b798df09f919c52823110a74bd568c6a4e98e6b2
Gitweb: http://git.kernel.org/tip/b798df09f919c52823110a74bd568c6a4e98e6b2
Author: Jeremy Fitzhardinge 
AuthorDate: Fri, 9 Aug 2013 19:51:51 +0530
Committer:  H. Peter Anvin 
CommitDate: Fri, 9 Aug 2013 07:53:14 -0700

x86, ticketlock: Collapse a layer of functions

Now that the paravirtualization layer doesn't exist at the spinlock
level any more, we can collapse the __ticket_ functions into the arch_
functions.

Signed-off-by: Jeremy Fitzhardinge 
Link: 
http://lkml.kernel.org/r/1376058122-8248-4-git-send-email-raghavendra...@linux.vnet.ibm.com
Reviewed-by: Konrad Rzeszutek Wilk 
Tested-by: Attilio Rao 
Signed-off-by: Raghavendra K T 
Acked-by: Ingo Molnar 
Signed-off-by: H. Peter Anvin 
---
 arch/x86/include/asm/spinlock.h | 35 +--
 1 file changed, 5 insertions(+), 30 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 4d54244..7442410 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -76,7 +76,7 @@ static __always_inline void __ticket_unlock_kick(struct 
arch_spinlock *lock,
  * in the high part, because a wide xadd increment of the low part would carry
  * up and contaminate the high part.
  */
-static __always_inline void __ticket_spin_lock(struct arch_spinlock *lock)
+static __always_inline void arch_spin_lock(struct arch_spinlock *lock)
 {
register struct __raw_tickets inc = { .tail = 1 };
 
@@ -96,7 +96,7 @@ static __always_inline void __ticket_spin_lock(struct 
arch_spinlock *lock)
 out:   barrier();  /* make sure nothing creeps before the lock is taken */
 }
 
-static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
+static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
 {
arch_spinlock_t old, new;
 
@@ -110,7 +110,7 @@ static __always_inline int 
__ticket_spin_trylock(arch_spinlock_t *lock)
return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) ==
old.head_tail;
 }
 
-static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
+static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
 {
__ticket_t next = lock->tickets.head + 1;
 
@@ -118,46 +118,21 @@ static __always_inline void 
__ticket_spin_unlock(arch_spinlock_t *lock)
__ticket_unlock_kick(lock, next);
 }
 
-static inline int __ticket_spin_is_locked(arch_spinlock_t *lock)
+static inline int arch_spin_is_locked(arch_spinlock_t *lock)
 {
struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
 
return tmp.tail != tmp.head;
 }
 
-static inline int __ticket_spin_is_contended(arch_spinlock_t *lock)
+static inline int arch_spin_is_contended(arch_spinlock_t *lock)
 {
struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
 
return (__ticket_t)(tmp.tail - tmp.head) > 1;
 }
-
-static inline int arch_spin_is_locked(arch_spinlock_t *lock)
-{
-   return __ticket_spin_is_locked(lock);
-}
-
-static inline int arch_spin_is_contended(arch_spinlock_t *lock)
-{
-   return __ticket_spin_is_contended(lock);
-}
 #define arch_spin_is_contended arch_spin_is_contended
 
-static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
-{
-   __ticket_spin_lock(lock);
-}
-
-static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
-{
-   return __ticket_spin_trylock(lock);
-}
-
-static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
-{
-   __ticket_spin_unlock(lock);
-}
-
 static __always_inline void arch_spin_lock_flags(arch_spinlock_t *lock,
  unsigned long flags)
 {
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[tip:x86/spinlocks] xen, pvticketlocks: Add xen_nopvspin parameter to disable xen pv ticketlocks

2013-08-10 Thread tip-bot for Jeremy Fitzhardinge
Commit-ID:  b8fa70b51aa76737bdb6b493901ef7376977489c
Gitweb: http://git.kernel.org/tip/b8fa70b51aa76737bdb6b493901ef7376977489c
Author: Jeremy Fitzhardinge jer...@goop.org
AuthorDate: Fri, 9 Aug 2013 19:51:54 +0530
Committer:  H. Peter Anvin h...@linux.intel.com
CommitDate: Fri, 9 Aug 2013 07:53:37 -0700

xen, pvticketlocks: Add xen_nopvspin parameter to disable xen pv ticketlocks
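
(Usage note, not part of the original changelog: this adds a boot-time
knob, so a guest started with something like

    vmlinuz ... console=hvc0 xen_nopvspin

on its kernel command line makes xen_init_spinlocks() bail out early and
keeps the bare ticketlock behaviour. The console= argument here is just
an illustrative placeholder.)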

Signed-off-by: Jeremy Fitzhardinge jer...@goop.org
Link: 
http://lkml.kernel.org/r/1376058122-8248-7-git-send-email-raghavendra...@linux.vnet.ibm.com
Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
Acked-by: Ingo Molnar mi...@kernel.org
Signed-off-by: H. Peter Anvin h...@linux.intel.com
---
 arch/x86/xen/spinlock.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index a458729..669a971 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -244,6 +244,8 @@ void xen_uninit_lock_cpu(int cpu)
per_cpu(irq_name, cpu) = NULL;
 }
 
+static bool xen_pvspin __initdata = true;
+
 void __init xen_init_spinlocks(void)
 {
/*
@@ -253,10 +255,22 @@ void __init xen_init_spinlocks(void)
if (xen_hvm_domain())
return;
 
+   if (!xen_pvspin) {
+   printk(KERN_DEBUG "xen: PV spinlocks disabled\n");
+   return;
+   }
+
pv_lock_ops.lock_spinning = xen_lock_spinning;
pv_lock_ops.unlock_kick = xen_unlock_kick;
 }
 
+static __init int xen_parse_nopvspin(char *arg)
+{
+   xen_pvspin = false;
+   return 0;
+}
+early_param("xen_nopvspin", xen_parse_nopvspin);
+
 #ifdef CONFIG_XEN_DEBUG_FS
 
 static struct dentry *d_spin_debug;
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[tip:x86/spinlocks] x86, pvticketlock: Use callee-save for lock_spinning

2013-08-10 Thread tip-bot for Jeremy Fitzhardinge
Commit-ID:  354714dd2607778692db53947ab93b74956494e5
Gitweb: http://git.kernel.org/tip/354714dd2607778692db53947ab93b74956494e5
Author: Jeremy Fitzhardinge jer...@goop.org
AuthorDate: Fri, 9 Aug 2013 19:51:55 +0530
Committer:  H. Peter Anvin h...@linux.intel.com
CommitDate: Fri, 9 Aug 2013 07:53:44 -0700

x86, pvticketlock: Use callee-save for lock_spinning

Although the lock_spinning calls in the spinlock code are on the
uncommon path, their presence can cause the compiler to generate many
more register save/restores in the function pre/postamble, which is in
the fast path.  To avoid this, convert it to using the pvops callee-save
calling convention, which defers all the save/restores until the actual
function is called, keeping the fastpath clean.
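
Outside the kernel the same effect can be sketched with GCC/Clang's x86
no_caller_saved_registers attribute -- only an analogy, since the kernel
implements this with its own PV_CALLEE_SAVE_REGS_THUNK machinery rather
than a compiler attribute, and the function names below are made up:

    /* cold path: the callee promises to preserve every register it uses */
    __attribute__((no_caller_saved_registers))
    void slow_notify(void);

    void hot_path(long a, long b)
    {
            /* a and b stay live across the call */
            if (__builtin_expect(a == b, 0))
                    slow_notify();  /* caller need not spill/reload around it */
            /* ... keeps using a and b without reloading them ... */
    }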

Signed-off-by: Jeremy Fitzhardinge jer...@goop.org
Link: 
http://lkml.kernel.org/r/1376058122-8248-8-git-send-email-raghavendra...@linux.vnet.ibm.com
Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Tested-by: Attilio Rao attilio@citrix.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
Acked-by: Ingo Molnar mi...@kernel.org
Signed-off-by: H. Peter Anvin h...@linux.intel.com
---
 arch/x86/include/asm/paravirt.h   | 2 +-
 arch/x86/include/asm/paravirt_types.h | 2 +-
 arch/x86/kernel/paravirt-spinlocks.c  | 2 +-
 arch/x86/xen/spinlock.c   | 3 ++-
 4 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 040e72d..7131e12c 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -715,7 +715,7 @@ static inline void __set_fixmap(unsigned /* enum 
fixed_addresses */ idx,
 static __always_inline void __ticket_lock_spinning(struct arch_spinlock *lock,
__ticket_t ticket)
 {
-   PVOP_VCALL2(pv_lock_ops.lock_spinning, lock, ticket);
+   PVOP_VCALLEE2(pv_lock_ops.lock_spinning, lock, ticket);
 }
 
 static __always_inline void ticket_unlock_kick(struct arch_spinlock *lock,
diff --git a/arch/x86/include/asm/paravirt_types.h 
b/arch/x86/include/asm/paravirt_types.h
index 346a07c..04ac40e 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -334,7 +334,7 @@ typedef u16 __ticket_t;
 #endif
 
 struct pv_lock_ops {
-   void (*lock_spinning)(struct arch_spinlock *lock, __ticket_t ticket);
+   struct paravirt_callee_save lock_spinning;
void (*unlock_kick)(struct arch_spinlock *lock, __ticket_t ticket);
 };
 
diff --git a/arch/x86/kernel/paravirt-spinlocks.c 
b/arch/x86/kernel/paravirt-spinlocks.c
index c2e010e..4251c1d 100644
--- a/arch/x86/kernel/paravirt-spinlocks.c
+++ b/arch/x86/kernel/paravirt-spinlocks.c
@@ -9,7 +9,7 @@
 
 struct pv_lock_ops pv_lock_ops = {
 #ifdef CONFIG_SMP
-   .lock_spinning = paravirt_nop,
+   .lock_spinning = __PV_IS_CALLEE_SAVE(paravirt_nop),
.unlock_kick = paravirt_nop,
 #endif
 };
diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index 669a971..6c8792b 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -173,6 +173,7 @@ out:
local_irq_restore(flags);
spin_time_accum_blocked(start);
 }
+PV_CALLEE_SAVE_REGS_THUNK(xen_lock_spinning);
 
 static void xen_unlock_kick(struct arch_spinlock *lock, __ticket_t next)
 {
@@ -260,7 +261,7 @@ void __init xen_init_spinlocks(void)
return;
}
 
-   pv_lock_ops.lock_spinning = xen_lock_spinning;
+   pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(xen_lock_spinning);
pv_lock_ops.unlock_kick = xen_unlock_kick;
 }
 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[tip:x86/spinlocks] xen, pvticketlock: Xen implementation for PV ticket locks

2013-08-10 Thread tip-bot for Jeremy Fitzhardinge
Commit-ID:  80bd58fef495d000a02fc5b55ca76d423400e748
Gitweb: http://git.kernel.org/tip/80bd58fef495d000a02fc5b55ca76d423400e748
Author: Jeremy Fitzhardinge jer...@goop.org
AuthorDate: Fri, 9 Aug 2013 19:51:53 +0530
Committer:  H. Peter Anvin h...@linux.intel.com
CommitDate: Fri, 9 Aug 2013 07:53:23 -0700

xen, pvticketlock: Xen implementation for PV ticket locks

Replace the old Xen implementation of PV spinlocks with an implementation
of xen_lock_spinning and xen_unlock_kick.

xen_lock_spinning simply registers the cpu in its entry in lock_waiting,
adds itself to the waiting_cpus set, and blocks on an event channel
until the channel becomes pending.

xen_unlock_kick searches the cpus in waiting_cpus looking for the one
which next wants this lock with the next ticket, if any.  If found,
it kicks it by making its event channel pending, which wakes it up.

We need to make sure interrupts are disabled while we're relying on the
contents of the per-cpu lock_waiting values, otherwise an interrupt
handler could come in, try to take some other lock, block, and overwrite
our values.

Signed-off-by: Jeremy Fitzhardinge jer...@goop.org
Link: 
http://lkml.kernel.org/r/1376058122-8248-6-git-send-email-raghavendra...@linux.vnet.ibm.com
Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
 [ Raghavendra:  use function + enum instead of macro, cmpxchg for zero status 
reset
Reintroduce break since we know the exact vCPU to send IPI as suggested by 
Konrad.]
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
Acked-by: Ingo Molnar mi...@kernel.org
Signed-off-by: H. Peter Anvin h...@linux.intel.com
---
 arch/x86/xen/spinlock.c | 348 +++-
 1 file changed, 79 insertions(+), 269 deletions(-)

diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index d509629..a458729 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -17,45 +17,44 @@
 #include "xen-ops.h"
 #include "debugfs.h"
 
-#ifdef CONFIG_XEN_DEBUG_FS
-static struct xen_spinlock_stats
-{
-   u64 taken;
-   u32 taken_slow;
-   u32 taken_slow_nested;
-   u32 taken_slow_pickup;
-   u32 taken_slow_spurious;
-   u32 taken_slow_irqenable;
+enum xen_contention_stat {
+   TAKEN_SLOW,
+   TAKEN_SLOW_PICKUP,
+   TAKEN_SLOW_SPURIOUS,
+   RELEASED_SLOW,
+   RELEASED_SLOW_KICKED,
+   NR_CONTENTION_STATS
+};
 
-   u64 released;
-   u32 released_slow;
-   u32 released_slow_kicked;
 
+#ifdef CONFIG_XEN_DEBUG_FS
 #define HISTO_BUCKETS  30
-   u32 histo_spin_total[HISTO_BUCKETS+1];
-   u32 histo_spin_spinning[HISTO_BUCKETS+1];
+static struct xen_spinlock_stats
+{
+   u32 contention_stats[NR_CONTENTION_STATS];
u32 histo_spin_blocked[HISTO_BUCKETS+1];
-
-   u64 time_total;
-   u64 time_spinning;
u64 time_blocked;
 } spinlock_stats;
 
 static u8 zero_stats;
 
-static unsigned lock_timeout = 1 << 10;
-#define TIMEOUT lock_timeout
-
 static inline void check_zero(void)
 {
-   if (unlikely(zero_stats)) {
-   memset(&spinlock_stats, 0, sizeof(spinlock_stats));
-   zero_stats = 0;
+   u8 ret;
+   u8 old = ACCESS_ONCE(zero_stats);
+   if (unlikely(old)) {
+   ret = cmpxchg(&zero_stats, old, 0);
+   /* This ensures only one fellow resets the stat */
+   if (ret == old)
+   memset(&spinlock_stats, 0, sizeof(spinlock_stats));
}
 }
 
-#define ADD_STATS(elem, val)   \
-   do { check_zero(); spinlock_stats.elem += (val); } while(0)
+static inline void add_stats(enum xen_contention_stat var, u32 val)
+{
+   check_zero();
+   spinlock_stats.contention_stats[var] += val;
+}
 
 static inline u64 spin_time_start(void)
 {
@@ -74,22 +73,6 @@ static void __spin_time_accum(u64 delta, u32 *array)
array[HISTO_BUCKETS]++;
 }
 
-static inline void spin_time_accum_spinning(u64 start)
-{
-   u32 delta = xen_clocksource_read() - start;
-
-   __spin_time_accum(delta, spinlock_stats.histo_spin_spinning);
-   spinlock_stats.time_spinning += delta;
-}
-
-static inline void spin_time_accum_total(u64 start)
-{
-   u32 delta = xen_clocksource_read() - start;
-
-   __spin_time_accum(delta, spinlock_stats.histo_spin_total);
-   spinlock_stats.time_total += delta;
-}
-
 static inline void spin_time_accum_blocked(u64 start)
 {
u32 delta = xen_clocksource_read() - start;
@@ -99,19 +82,15 @@ static inline void spin_time_accum_blocked(u64 start)
 }
 #else  /* !CONFIG_XEN_DEBUG_FS */
 #define TIMEOUT(1 << 10)
-#define ADD_STATS(elem, val)   do { (void)(val); } while(0)
+static inline void add_stats(enum xen_contention_stat var, u32 val)
+{
+}
 
 static inline u64 spin_time_start(void)
 {
return 0;
 }
 
-static inline void spin_time_accum_total(u64 start)
-{
-}
-static inline void spin_time_accum_spinning(u64 start)
-{
-}
 static inline void

[tip:x86/spinlocks] x86, ticketlock: Add slowpath logic

2013-08-10 Thread tip-bot for Jeremy Fitzhardinge
Commit-ID:  96f853eaa889c7a22718d275b0df7bebdbd6780e
Gitweb: http://git.kernel.org/tip/96f853eaa889c7a22718d275b0df7bebdbd6780e
Author: Jeremy Fitzhardinge jer...@goop.org
AuthorDate: Fri, 9 Aug 2013 19:51:58 +0530
Committer:  H. Peter Anvin h...@linux.intel.com
CommitDate: Fri, 9 Aug 2013 07:54:00 -0700

x86, ticketlock: Add slowpath logic

Maintain a flag in the LSB of the ticket lock tail which indicates
whether anyone is in the lock slowpath and may need kicking when
the current holder unlocks.  The flags are set when the first locker
enters the slowpath, and cleared when unlocking to an empty queue (ie,
no contention).

In the specific implementation of lock_spinning(), make sure to set
the slowpath flags on the lock just before blocking.  We must do
this before the last-chance pickup test to prevent a deadlock
with the unlocker:

Unlocker                        Locker
                                test for lock pickup
                                        -> fail
unlock
test slowpath
        -> false
                                set slowpath flags
                                block

Whereas this works in any ordering:

Unlocker                        Locker
                                set slowpath flags
                                test for lock pickup
                                        -> fail
                                block
unlock
test slowpath
        -> true, kick

If the unlocker finds that the lock has the slowpath flag set but it is
actually uncontended (ie, head == tail, so nobody is waiting), then it
clears the slowpath flag.

The unlock code uses a locked add to update the head counter.  This also
acts as a full memory barrier so that its safe to subsequently
read back the slowflag state, knowing that the updated lock is visible
to the other CPUs.  If it were an unlocked add, then the flag read may
just be forwarded from the store buffer before it was visible to the other
CPUs, which could result in a deadlock.

Unfortunately this means we need to do a locked instruction when
unlocking with PV ticketlocks.  However, if PV ticketlocks are not
enabled, then the old non-locked "add" is the only unlocking code.

Note: this code relies on gcc making sure that unlikely() code is out of
line of the fastpath, which only happens when OPTIMIZE_SIZE=n.  If it
doesn't the generated code isn't too bad, but its definitely suboptimal.

Thanks to Srivatsa Vaddagiri for providing a bugfix to the original
version of this change, which has been folded in.
Thanks to Stephan Diestelhorst for commenting on some code which relied
on an inaccurate reading of the x86 memory ordering rules.

Signed-off-by: Jeremy Fitzhardinge jer...@goop.org
Link: 
http://lkml.kernel.org/r/1376058122-8248-11-git-send-email-raghavendra...@linux.vnet.ibm.com
Signed-off-by: Srivatsa Vaddagiri va...@linux.vnet.ibm.com
Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Cc: Stephan Diestelhorst stephan.diestelho...@amd.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
Acked-by: Ingo Molnar mi...@kernel.org
Signed-off-by: H. Peter Anvin h...@linux.intel.com
---
 arch/x86/include/asm/paravirt.h   |  2 +-
 arch/x86/include/asm/spinlock.h   | 86 +--
 arch/x86/include/asm/spinlock_types.h |  2 +
 arch/x86/kernel/paravirt-spinlocks.c  |  3 ++
 arch/x86/xen/spinlock.c   |  6 +++
 5 files changed, 74 insertions(+), 25 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 7131e12c..401f350 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -718,7 +718,7 @@ static __always_inline void __ticket_lock_spinning(struct 
arch_spinlock *lock,
PVOP_VCALLEE2(pv_lock_ops.lock_spinning, lock, ticket);
 }
 
-static __always_inline void ticket_unlock_kick(struct arch_spinlock *lock,
+static __always_inline void __ticket_unlock_kick(struct arch_spinlock *lock,
__ticket_t ticket)
 {
PVOP_VCALL2(pv_lock_ops.unlock_kick, lock, ticket);
diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 04a5cd5..d68883d 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -1,11 +1,14 @@
 #ifndef _ASM_X86_SPINLOCK_H
 #define _ASM_X86_SPINLOCK_H
 
+#include <linux/jump_label.h>
 #include <linux/atomic.h>
 #include <asm/page.h>
 #include <asm/processor.h>
 #include <linux/compiler.h>
 #include <asm/paravirt.h>
+#include <asm/bitops.h>
+
 /*
  * Your basic SMP spinlocks, allowing only a single CPU anywhere
  *
@@ -37,32 +40,28 @@
 /* How long a lock should spin before we consider blocking */
 #define SPIN_THRESHOLD (1 << 15)
 
-#ifndef CONFIG_PARAVIRT_SPINLOCKS
+extern struct static_key paravirt_ticketlocks_enabled;
+static __always_inline bool static_key_false(struct static_key *key);
 
-static __always_inline void __ticket_lock_spinning(struct arch_spinlock *lock

[tip:x86/spinlocks] x86, ticketlock: Collapse a layer of functions

2013-08-10 Thread tip-bot for Jeremy Fitzhardinge
Commit-ID:  b798df09f919c52823110a74bd568c6a4e98e6b2
Gitweb: http://git.kernel.org/tip/b798df09f919c52823110a74bd568c6a4e98e6b2
Author: Jeremy Fitzhardinge jer...@goop.org
AuthorDate: Fri, 9 Aug 2013 19:51:51 +0530
Committer:  H. Peter Anvin h...@linux.intel.com
CommitDate: Fri, 9 Aug 2013 07:53:14 -0700

x86, ticketlock: Collapse a layer of functions

Now that the paravirtualization layer doesn't exist at the spinlock
level any more, we can collapse the __ticket_ functions into the arch_
functions.

Signed-off-by: Jeremy Fitzhardinge jer...@goop.org
Link: 
http://lkml.kernel.org/r/1376058122-8248-4-git-send-email-raghavendra...@linux.vnet.ibm.com
Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Tested-by: Attilio Rao attilio@citrix.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
Acked-by: Ingo Molnar mi...@kernel.org
Signed-off-by: H. Peter Anvin h...@linux.intel.com
---
 arch/x86/include/asm/spinlock.h | 35 +--
 1 file changed, 5 insertions(+), 30 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 4d54244..7442410 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -76,7 +76,7 @@ static __always_inline void __ticket_unlock_kick(struct 
arch_spinlock *lock,
  * in the high part, because a wide xadd increment of the low part would carry
  * up and contaminate the high part.
  */
-static __always_inline void __ticket_spin_lock(struct arch_spinlock *lock)
+static __always_inline void arch_spin_lock(struct arch_spinlock *lock)
 {
register struct __raw_tickets inc = { .tail = 1 };
 
@@ -96,7 +96,7 @@ static __always_inline void __ticket_spin_lock(struct 
arch_spinlock *lock)
 out:   barrier();  /* make sure nothing creeps before the lock is taken */
 }
 
-static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
+static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
 {
arch_spinlock_t old, new;
 
@@ -110,7 +110,7 @@ static __always_inline int 
__ticket_spin_trylock(arch_spinlock_t *lock)
return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) ==
old.head_tail;
 }
 
-static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
+static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
 {
__ticket_t next = lock->tickets.head + 1;
 
@@ -118,46 +118,21 @@ static __always_inline void 
__ticket_spin_unlock(arch_spinlock_t *lock)
__ticket_unlock_kick(lock, next);
 }
 
-static inline int __ticket_spin_is_locked(arch_spinlock_t *lock)
+static inline int arch_spin_is_locked(arch_spinlock_t *lock)
 {
struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
 
return tmp.tail != tmp.head;
 }
 
-static inline int __ticket_spin_is_contended(arch_spinlock_t *lock)
+static inline int arch_spin_is_contended(arch_spinlock_t *lock)
 {
struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
 
return (__ticket_t)(tmp.tail - tmp.head) > 1;
 }
-
-static inline int arch_spin_is_locked(arch_spinlock_t *lock)
-{
-   return __ticket_spin_is_locked(lock);
-}
-
-static inline int arch_spin_is_contended(arch_spinlock_t *lock)
-{
-   return __ticket_spin_is_contended(lock);
-}
 #define arch_spin_is_contended arch_spin_is_contended
 
-static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
-{
-   __ticket_spin_lock(lock);
-}
-
-static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
-{
-   return __ticket_spin_trylock(lock);
-}
-
-static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
-{
-   __ticket_spin_unlock(lock);
-}
-
 static __always_inline void arch_spin_lock_flags(arch_spinlock_t *lock,
  unsigned long flags)
 {
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[tip:x86/spinlocks] xen, pvticketlock: Allow interrupts to be enabled while blocking

2013-08-10 Thread tip-bot for Jeremy Fitzhardinge
Commit-ID:  1ed7bf5f5227169b661c619636f754b98001ec30
Gitweb: http://git.kernel.org/tip/1ed7bf5f5227169b661c619636f754b98001ec30
Author: Jeremy Fitzhardinge jer...@goop.org
AuthorDate: Fri, 9 Aug 2013 19:51:59 +0530
Committer:  H. Peter Anvin h...@linux.intel.com
CommitDate: Fri, 9 Aug 2013 07:54:03 -0700

xen, pvticketlock: Allow interrupts to be enabled while blocking

If interrupts were enabled when taking the spinlock, we can leave them
enabled while blocking to get the lock.

If we can enable interrupts while waiting for the lock to become
available, and we take an interrupt before entering the poll,
and the handler takes a spinlock which ends up going into
the slow state (invalidating the per-cpu "lock" and "want" values),
then when the interrupt handler returns the event channel will
remain pending so the poll will return immediately, causing it to
return out to the main spinlock loop.
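
The per-cpu (lock, want) handshake that keeps this safe is easiest to
read pulled out of the diff below (locker side in xen_lock_spinning,
unlocker side in xen_unlock_kick):

    /* locker: "lock" may only be non-NULL while "want" is valid,
     * so invalidate the old pair before publishing a new one */
    w->lock = NULL;
    smp_wmb();
    w->want = want;
    smp_wmb();
    w->lock = lock;

    /* unlocker: read "lock" before "want", matching the writer order */
    if (ACCESS_ONCE(w->lock) == lock && ACCESS_ONCE(w->want) == next)
            xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);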

Signed-off-by: Jeremy Fitzhardinge jer...@goop.org
Link: 
http://lkml.kernel.org/r/1376058122-8248-12-git-send-email-raghavendra...@linux.vnet.ibm.com
Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
Acked-by: Ingo Molnar mi...@kernel.org
Signed-off-by: H. Peter Anvin h...@linux.intel.com
---
 arch/x86/xen/spinlock.c | 46 --
 1 file changed, 40 insertions(+), 6 deletions(-)

diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index 546112e..0438b93 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -142,7 +142,20 @@ static void xen_lock_spinning(struct arch_spinlock *lock, 
__ticket_t want)
 * partially setup state.
 */
local_irq_save(flags);
-
+   /*
+* We don't really care if we're overwriting some other
+* (lock,want) pair, as that would mean that we're currently
+* in an interrupt context, and the outer context had
+* interrupts enabled.  That has already kicked the VCPU out
+* of xen_poll_irq(), so it will just return spuriously and
+* retry with newly setup (lock,want).
+*
+* The ordering protocol on this is that the "lock" pointer
+* may only be set non-NULL if the "want" ticket is correct.
+* If we're updating "want", we must first clear "lock".
+*/
+   w->lock = NULL;
+   smp_wmb();
w->want = want;
smp_wmb();
w->lock = lock;
@@ -157,24 +170,43 @@ static void xen_lock_spinning(struct arch_spinlock *lock, 
__ticket_t want)
/* Only check lock once pending cleared */
barrier();
 
-   /* Mark entry to slowpath before doing the pickup test to make
-  sure we don't deadlock with an unlocker. */
+   /*
+* Mark entry to slowpath before doing the pickup test to make
+* sure we don't deadlock with an unlocker.
+*/
__ticket_enter_slowpath(lock);
 
-   /* check again make sure it didn't become free while
-  we weren't looking  */
+   /*
+* check again make sure it didn't become free while
+* we weren't looking
+*/
if (ACCESS_ONCE(lock->tickets.head) == want) {
add_stats(TAKEN_SLOW_PICKUP, 1);
goto out;
}
+
+   /* Allow interrupts while blocked */
+   local_irq_restore(flags);
+
+   /*
+* If an interrupt happens here, it will leave the wakeup irq
+* pending, which will cause xen_poll_irq() to return
+* immediately.
+*/
+
/* Block until irq becomes pending (or perhaps a spurious wakeup) */
xen_poll_irq(irq);
add_stats(TAKEN_SLOW_SPURIOUS, !xen_test_irq_pending(irq));
+
+   local_irq_save(flags);
+
kstat_incr_irqs_this_cpu(irq, irq_to_desc(irq));
 out:
cpumask_clear_cpu(cpu, &waiting_cpus);
w->lock = NULL;
+
local_irq_restore(flags);
+
spin_time_accum_blocked(start);
 }
 PV_CALLEE_SAVE_REGS_THUNK(xen_lock_spinning);
@@ -188,7 +220,9 @@ static void xen_unlock_kick(struct arch_spinlock *lock, 
__ticket_t next)
for_each_cpu(cpu, &waiting_cpus) {
const struct xen_lock_waiting *w = &per_cpu(lock_waiting, cpu);
 
-   if (w->lock == lock && w->want == next) {
+   /* Make sure we read lock before want */
+   if (ACCESS_ONCE(w->lock) == lock &&
+   ACCESS_ONCE(w->want) == next) {
add_stats(RELEASED_SLOW_KICKED, 1);
xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
break;
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[tip:x86/spinlocks] x86, pvticketlock: When paravirtualizing ticket locks, increment by 2

2013-08-10 Thread tip-bot for Jeremy Fitzhardinge
Commit-ID:  4a1ed4ca681e7df38ed1b609a11aab38cbc515b3
Gitweb: http://git.kernel.org/tip/4a1ed4ca681e7df38ed1b609a11aab38cbc515b3
Author: Jeremy Fitzhardinge jer...@goop.org
AuthorDate: Fri, 9 Aug 2013 19:51:56 +0530
Committer:  H. Peter Anvin h...@linux.intel.com
CommitDate: Fri, 9 Aug 2013 07:53:50 -0700

x86, pvticketlock: When paravirtualizing ticket locks, increment by 2

Increment ticket head/tails by 2 rather than 1 to leave the LSB free
to store a "is in slowpath state" bit.  This halves the number
of possible CPUs for a given ticket size, but this shouldn't matter
in practice - kernels built for 32k+ CPU systems are probably
specially built for the hardware rather than a generic distro
kernel.
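
The arithmetic is easy to check in isolation; a throwaway userspace
snippet (illustrative only, nothing here is kernel code) shows that
ticket values stay even, so the LSB is always free for the flag added
later in the series:

    #include <stdio.h>
    #include <stdint.h>

    #define TICKET_LOCK_INC       2   /* CONFIG_PARAVIRT_SPINLOCKS case */
    #define TICKET_SLOWPATH_FLAG  1   /* lives in the LSB of the tail */

    int main(void)
    {
            uint8_t tail = 0;
            uint8_t marked;
            int i;

            for (i = 0; i < 4; i++) {
                    tail += TICKET_LOCK_INC;          /* tickets: 2, 4, 6, 8 */
                    marked = tail | TICKET_SLOWPATH_FLAG;
                    printf("ticket %u  marked %u  ticket part %u\n",
                           tail, marked, marked & ~TICKET_SLOWPATH_FLAG);
            }
            return 0;
    }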

Signed-off-by: Jeremy Fitzhardinge jer...@goop.org
Link: 
http://lkml.kernel.org/r/1376058122-8248-9-git-send-email-raghavendra...@linux.vnet.ibm.com
Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Tested-by: Attilio Rao attilio@citrix.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
Acked-by: Ingo Molnar mi...@kernel.org
Signed-off-by: H. Peter Anvin h...@linux.intel.com
---
 arch/x86/include/asm/spinlock.h   | 10 +-
 arch/x86/include/asm/spinlock_types.h | 10 +-
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 7442410..04a5cd5 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -78,7 +78,7 @@ static __always_inline void __ticket_unlock_kick(struct 
arch_spinlock *lock,
  */
 static __always_inline void arch_spin_lock(struct arch_spinlock *lock)
 {
-   register struct __raw_tickets inc = { .tail = 1 };
+   register struct __raw_tickets inc = { .tail = TICKET_LOCK_INC };
 
inc = xadd(&lock->tickets, inc);
 
@@ -104,7 +104,7 @@ static __always_inline int 
arch_spin_trylock(arch_spinlock_t *lock)
if (old.tickets.head != old.tickets.tail)
return 0;
 
-   new.head_tail = old.head_tail + (1 << TICKET_SHIFT);
+   new.head_tail = old.head_tail + (TICKET_LOCK_INC << TICKET_SHIFT);
 
/* cmpxchg is a full barrier, so nothing can move before it */
return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) ==
old.head_tail;
@@ -112,9 +112,9 @@ static __always_inline int 
arch_spin_trylock(arch_spinlock_t *lock)
 
 static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
 {
-   __ticket_t next = lock->tickets.head + 1;
+   __ticket_t next = lock->tickets.head + TICKET_LOCK_INC;
 
-   __add(&lock->tickets.head, 1, UNLOCK_LOCK_PREFIX);
+   __add(&lock->tickets.head, TICKET_LOCK_INC, UNLOCK_LOCK_PREFIX);
__ticket_unlock_kick(lock, next);
 }
 
@@ -129,7 +129,7 @@ static inline int arch_spin_is_contended(arch_spinlock_t 
*lock)
 {
struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
 
-   return (__ticket_t)(tmp.tail - tmp.head) > 1;
+   return (__ticket_t)(tmp.tail - tmp.head) > TICKET_LOCK_INC;
 }
 #define arch_spin_is_contended arch_spin_is_contended
 
diff --git a/arch/x86/include/asm/spinlock_types.h 
b/arch/x86/include/asm/spinlock_types.h
index 83fd3c7..e96fcbd 100644
--- a/arch/x86/include/asm/spinlock_types.h
+++ b/arch/x86/include/asm/spinlock_types.h
@@ -3,7 +3,13 @@
 
 #include linux/types.h
 
-#if (CONFIG_NR_CPUS < 256)
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+#define __TICKET_LOCK_INC  2
+#else
+#define __TICKET_LOCK_INC  1
+#endif
+
+#if (CONFIG_NR_CPUS < (256 / __TICKET_LOCK_INC))
 typedef u8  __ticket_t;
 typedef u16 __ticketpair_t;
 #else
@@ -11,6 +17,8 @@ typedef u16 __ticket_t;
 typedef u32 __ticketpair_t;
 #endif
 
+#define TICKET_LOCK_INC((__ticket_t)__TICKET_LOCK_INC)
+
 #define TICKET_SHIFT   (sizeof(__ticket_t) * 8)
 
 typedef struct arch_spinlock {
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[tip:x86/spinlocks] xen: Defer spinlock setup until boot CPU setup

2013-08-10 Thread tip-bot for Jeremy Fitzhardinge
Commit-ID:  bf7aab3ad4b4364a293421d628a912a2153ee1ee
Gitweb: http://git.kernel.org/tip/bf7aab3ad4b4364a293421d628a912a2153ee1ee
Author: Jeremy Fitzhardinge jer...@goop.org
AuthorDate: Fri, 9 Aug 2013 19:51:52 +0530
Committer:  H. Peter Anvin h...@linux.intel.com
CommitDate: Fri, 9 Aug 2013 07:53:18 -0700

xen: Defer spinlock setup until boot CPU setup

There's no need to do it at very early init, and doing it there
makes it impossible to use the jump_label machinery.
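
For context, this is the kind of construct the jump_label machinery
provides (a sketch using the static_key API of this era; the key name is
the one added later in the series, the surrounding code is elided):

    struct static_key paravirt_ticketlocks_enabled = STATIC_KEY_INIT_FALSE;

    /* fast path: compiles down to a patchable no-op until the key is set */
    if (static_key_false(&paravirt_ticketlocks_enabled))
            __ticket_unlock_slowpath(lock, prev);

    /* boot-time setup -- must run late enough for jump labels to work,
     * which is why the spinlock init is moved in this patch */
    static_key_slow_inc(&paravirt_ticketlocks_enabled);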

Signed-off-by: Jeremy Fitzhardinge jer...@goop.org
Link: 
http://lkml.kernel.org/r/1376058122-8248-5-git-send-email-raghavendra...@linux.vnet.ibm.com
Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
Acked-by: Ingo Molnar mi...@kernel.org
Signed-off-by: H. Peter Anvin h...@linux.intel.com
---
 arch/x86/xen/smp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index ca92754..3b52d80 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -279,6 +279,7 @@ static void __init xen_smp_prepare_boot_cpu(void)
 
xen_filter_cpu_maps();
xen_setup_vcpu_info_placement();
+   xen_init_spinlocks();
 }
 
 static void __init xen_smp_prepare_cpus(unsigned int max_cpus)
@@ -680,7 +681,6 @@ void __init xen_smp_init(void)
 {
smp_ops = xen_smp_ops;
xen_fill_possible_map();
-   xen_init_spinlocks();
 }
 
 static void __init xen_hvm_smp_prepare_cpus(unsigned int max_cpus)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[tip:x86/spinlocks] x86, spinlock: Replace pv spinlocks with pv ticketlocks

2013-08-10 Thread tip-bot for Jeremy Fitzhardinge
Commit-ID:  545ac13892ab391049a92108cf59a0d05de7e28c
Gitweb: http://git.kernel.org/tip/545ac13892ab391049a92108cf59a0d05de7e28c
Author: Jeremy Fitzhardinge jer...@goop.org
AuthorDate: Fri, 9 Aug 2013 19:51:49 +0530
Committer:  H. Peter Anvin h...@linux.intel.com
CommitDate: Fri, 9 Aug 2013 07:53:05 -0700

x86, spinlock: Replace pv spinlocks with pv ticketlocks

Rather than outright replacing the entire spinlock implementation in
order to paravirtualize it, keep the ticket lock implementation but add
a couple of pvops hooks on the slow path (long spin on lock, unlocking
a contended lock).

Ticket locks have a number of nice properties, but they also have some
surprising behaviours in virtual environments.  They enforce a strict
FIFO ordering on cpus trying to take a lock; however, if the hypervisor
scheduler does not schedule the cpus in the correct order, the system can
waste a huge amount of time spinning until the next cpu can take the lock.

(See Thomas Friebel's talk "Prevent Guests from Spinning Around"
http://www.xen.org/files/xensummitboston08/LHP.pdf for more details.)

To address this, we add two hooks:
 - __ticket_spin_lock which is called after the cpu has been
   spinning on the lock for a significant number of iterations but has
   failed to take the lock (presumably because the cpu holding the lock
   has been descheduled).  The lock_spinning pvop is expected to block
   the cpu until it has been kicked by the current lock holder.
 - __ticket_spin_unlock, which on releasing a contended lock
   (there are more cpus with tail tickets), it looks to see if the next
   cpu is blocked and wakes it if so.

When compiled with CONFIG_PARAVIRT_SPINLOCKS disabled, a set of stub
functions causes all the extra code to go away.
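
Roughly, the lock side ends up shaped like this with the spinning hook
in place (a reconstructed sketch, trimmed from the patch rather than
quoted from the hunks below):

    static __always_inline void arch_spin_lock(struct arch_spinlock *lock)
    {
            register struct __raw_tickets inc = { .tail = 1 };

            inc = xadd(&lock->tickets, inc);        /* grab a ticket */

            for (;;) {
                    unsigned count = SPIN_THRESHOLD;

                    do {
                            if (inc.head == inc.tail)
                                    goto out;       /* our turn, lock taken */
                            cpu_relax();
                            inc.head = ACCESS_ONCE(lock->tickets.head);
                    } while (--count);

                    /* spun too long: ask the hypervisor to block this vcpu
                     * until the current holder kicks us */
                    __ticket_lock_spinning(lock, inc.tail);
            }
    out:
            barrier();      /* nothing creeps before the lock is taken */
    }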

Results:
===
setup: 32 core machine with 32 vcpu KVM guest (HT off)  with 8GB RAM
base = 3.11-rc
patched = base + pvspinlock V12

+-----------------+----------------+--------+
 dbench (Throughput in MB/sec. Higher is better)
+-----------------+----------------+--------+
|   base (stdev %)|patched(stdev%) | %gain  |
+-----------------+----------------+--------+
| 15035.3   (0.3) |15150.0   (0.6) |   0.8  |
|  1470.0   (2.2) | 1713.7   (1.9) |  16.6  |
|   848.6   (4.3) |  967.8   (4.3) |  14.0  |
|   652.9   (3.5) |  685.3   (3.7) |   5.0  |
+-----------------+----------------+--------+

pvspinlock shows benefits for overcommit ratio > 1 for PLE enabled cases,
and undercommits results are flat

Signed-off-by: Jeremy Fitzhardinge jer...@goop.org
Link: 
http://lkml.kernel.org/r/1376058122-8248-2-git-send-email-raghavendra...@linux.vnet.ibm.com
Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Tested-by: Attilio Rao attilio@citrix.com
[ Raghavendra: Changed SPIN_THRESHOLD, fixed redefinition of arch_spinlock_t]
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
Acked-by: Ingo Molnar mi...@kernel.org
Signed-off-by: H. Peter Anvin h...@linux.intel.com
---
 arch/x86/include/asm/paravirt.h   | 32 -
 arch/x86/include/asm/paravirt_types.h | 14 +
 arch/x86/include/asm/spinlock.h   | 53 ---
 arch/x86/include/asm/spinlock_types.h |  4 ---
 arch/x86/kernel/paravirt-spinlocks.c  | 15 ++
 arch/x86/xen/spinlock.c   |  8 --
 6 files changed, 65 insertions(+), 61 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index cfdc9ee..040e72d 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -712,36 +712,16 @@ static inline void __set_fixmap(unsigned /* enum 
fixed_addresses */ idx,
 
 #if defined(CONFIG_SMP) && defined(CONFIG_PARAVIRT_SPINLOCKS)
 
-static inline int arch_spin_is_locked(struct arch_spinlock *lock)
+static __always_inline void __ticket_lock_spinning(struct arch_spinlock *lock,
+   __ticket_t ticket)
 {
-   return PVOP_CALL1(int, pv_lock_ops.spin_is_locked, lock);
+   PVOP_VCALL2(pv_lock_ops.lock_spinning, lock, ticket);
 }
 
-static inline int arch_spin_is_contended(struct arch_spinlock *lock)
+static __always_inline void ticket_unlock_kick(struct arch_spinlock *lock,
+   __ticket_t ticket)
 {
-   return PVOP_CALL1(int, pv_lock_ops.spin_is_contended, lock);
-}
-#define arch_spin_is_contended arch_spin_is_contended
-
-static __always_inline void arch_spin_lock(struct arch_spinlock *lock)
-{
-   PVOP_VCALL1(pv_lock_ops.spin_lock, lock);
-}
-
-static __always_inline void arch_spin_lock_flags(struct arch_spinlock *lock,
- unsigned long flags)
-{
-   PVOP_VCALL2(pv_lock_ops.spin_lock_flags, lock, flags);
-}
-
-static __always_inline int arch_spin_trylock(struct arch_spinlock *lock)
-{
-   return PVOP_CALL1(int, pv_lock_ops.spin_trylock, lock);
-}
-
-static __always_inline void arch_spin_unlock(struct arch_spinlock

[tip:x86/spinlocks] xen, pvticketlock: Allow interrupts to be enabled while blocking

2013-08-08 Thread tip-bot for Jeremy Fitzhardinge
Commit-ID:  38eddb85894561ab32c1de4171e1c1582f0efa78
Gitweb: http://git.kernel.org/tip/38eddb85894561ab32c1de4171e1c1582f0efa78
Author: Jeremy Fitzhardinge 
AuthorDate: Tue, 6 Aug 2013 17:14:12 +0530
Committer:  H. Peter Anvin 
CommitDate: Thu, 8 Aug 2013 16:07:01 -0700

xen, pvticketlock: Allow interrupts to be enabled while blocking

If interrupts were enabled when taking the spinlock, we can leave them
enabled while blocking to get the lock.

If we can enable interrupts while waiting for the lock to become
available, and we take an interrupt before entering the poll,
and the handler takes a spinlock which ends up going into
the slow state (invalidating the per-cpu "lock" and "want" values),
then when the interrupt handler returns the event channel will
remain pending so the poll will return immediately, causing it to
return out to the main spinlock loop.

Signed-off-by: Jeremy Fitzhardinge 
Link: 
http://lkml.kernel.org/r/20130806114412.20643.84141.sendpatch...@codeblue.in.ibm.com
Reviewed-by: Konrad Rzeszutek Wilk 
Signed-off-by: Raghavendra K T 
Acked-by: Ingo Molnar 
Signed-off-by: H. Peter Anvin 
---
 arch/x86/xen/spinlock.c | 46 --
 1 file changed, 40 insertions(+), 6 deletions(-)

diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index 546112e..0438b93 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -142,7 +142,20 @@ static void xen_lock_spinning(struct arch_spinlock *lock, 
__ticket_t want)
 * partially setup state.
 */
local_irq_save(flags);
-
+   /*
+* We don't really care if we're overwriting some other
+* (lock,want) pair, as that would mean that we're currently
+* in an interrupt context, and the outer context had
+* interrupts enabled.  That has already kicked the VCPU out
+* of xen_poll_irq(), so it will just return spuriously and
+* retry with newly setup (lock,want).
+*
+* The ordering protocol on this is that the "lock" pointer
+* may only be set non-NULL if the "want" ticket is correct.
+* If we're updating "want", we must first clear "lock".
+*/
+   w->lock = NULL;
+   smp_wmb();
w->want = want;
smp_wmb();
w->lock = lock;
@@ -157,24 +170,43 @@ static void xen_lock_spinning(struct arch_spinlock *lock, 
__ticket_t want)
/* Only check lock once pending cleared */
barrier();
 
-   /* Mark entry to slowpath before doing the pickup test to make
-  sure we don't deadlock with an unlocker. */
+   /*
+* Mark entry to slowpath before doing the pickup test to make
+* sure we don't deadlock with an unlocker.
+*/
__ticket_enter_slowpath(lock);
 
-   /* check again make sure it didn't become free while
-  we weren't looking  */
+   /*
+* check again make sure it didn't become free while
+* we weren't looking
+*/
if (ACCESS_ONCE(lock->tickets.head) == want) {
add_stats(TAKEN_SLOW_PICKUP, 1);
goto out;
}
+
+   /* Allow interrupts while blocked */
+   local_irq_restore(flags);
+
+   /*
+* If an interrupt happens here, it will leave the wakeup irq
+* pending, which will cause xen_poll_irq() to return
+* immediately.
+*/
+
/* Block until irq becomes pending (or perhaps a spurious wakeup) */
xen_poll_irq(irq);
add_stats(TAKEN_SLOW_SPURIOUS, !xen_test_irq_pending(irq));
+
+   local_irq_save(flags);
+
kstat_incr_irqs_this_cpu(irq, irq_to_desc(irq));
 out:
cpumask_clear_cpu(cpu, &waiting_cpus);
w->lock = NULL;
+
local_irq_restore(flags);
+
spin_time_accum_blocked(start);
 }
 PV_CALLEE_SAVE_REGS_THUNK(xen_lock_spinning);
@@ -188,7 +220,9 @@ static void xen_unlock_kick(struct arch_spinlock *lock, 
__ticket_t next)
for_each_cpu(cpu, &waiting_cpus) {
const struct xen_lock_waiting *w = &per_cpu(lock_waiting, cpu);
 
-   if (w->lock == lock && w->want == next) {
+   /* Make sure we read lock before want */
+   if (ACCESS_ONCE(w->lock) == lock &&
+   ACCESS_ONCE(w->want) == next) {
add_stats(RELEASED_SLOW_KICKED, 1);
xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
break;
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[tip:x86/spinlocks] x86, ticketlock: Add slowpath logic

2013-08-08 Thread tip-bot for Jeremy Fitzhardinge
Commit-ID:  faf5f87c305fc58464bf8c47d0a5d148ceee2e32
Gitweb: http://git.kernel.org/tip/faf5f87c305fc58464bf8c47d0a5d148ceee2e32
Author: Jeremy Fitzhardinge 
AuthorDate: Tue, 6 Aug 2013 17:13:52 +0530
Committer:  H. Peter Anvin 
CommitDate: Thu, 8 Aug 2013 16:06:58 -0700

x86, ticketlock: Add slowpath logic

Maintain a flag in the LSB of the ticket lock tail which indicates
whether anyone is in the lock slowpath and may need kicking when
the current holder unlocks.  The flags are set when the first locker
enters the slowpath, and cleared when unlocking to an empty queue (ie,
no contention).

In the specific implementation of lock_spinning(), make sure to set
the slowpath flags on the lock just before blocking.  We must do
this before the last-chance pickup test to prevent a deadlock
with the unlocker:

Unlocker                        Locker
                                test for lock pickup
                                        -> fail
unlock
test slowpath
        -> false
                                set slowpath flags
                                block

Whereas this works in any ordering:

Unlocker                        Locker
                                set slowpath flags
                                test for lock pickup
                                        -> fail
                                block
unlock
test slowpath
        -> true, kick

If the unlocker finds that the lock has the slowpath flag set but it is
actually uncontended (ie, head == tail, so nobody is waiting), then it
clears the slowpath flag.

The unlock code uses a locked add to update the head counter.  This also
acts as a full memory barrier so that its safe to subsequently
read back the slowflag state, knowing that the updated lock is visible
to the other CPUs.  If it were an unlocked add, then the flag read may
just be forwarded from the store buffer before it was visible to the other
CPUs, which could result in a deadlock.

Unfortunately this means we need to do a locked instruction when
unlocking with PV ticketlocks.  However, if PV ticketlocks are not
enabled, then the old non-locked "add" is the only unlocking code.

Note: this code relies on gcc making sure that unlikely() code is out of
line of the fastpath, which only happens when OPTIMIZE_SIZE=n.  If it
doesn't the generated code isn't too bad, but its definitely suboptimal.

Thanks to Srivatsa Vaddagiri for providing a bugfix to the original
version of this change, which has been folded in.
Thanks to Stephan Diestelhorst for commenting on some code which relied
on an inaccurate reading of the x86 memory ordering rules.

Signed-off-by: Jeremy Fitzhardinge 
Link: 
http://lkml.kernel.org/r/20130806114352.20643.75343.sendpatch...@codeblue.in.ibm.com
Signed-off-by: Srivatsa Vaddagiri 
Reviewed-by: Konrad Rzeszutek Wilk 
Cc: Stephan Diestelhorst 
Signed-off-by: Raghavendra K T 
Acked-by: Ingo Molnar 
Signed-off-by: H. Peter Anvin 
---
 arch/x86/include/asm/paravirt.h   |  2 +-
 arch/x86/include/asm/spinlock.h   | 86 +--
 arch/x86/include/asm/spinlock_types.h |  2 +
 arch/x86/kernel/paravirt-spinlocks.c  |  3 ++
 arch/x86/xen/spinlock.c   |  6 +++
 5 files changed, 74 insertions(+), 25 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 7131e12c..401f350 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -718,7 +718,7 @@ static __always_inline void __ticket_lock_spinning(struct 
arch_spinlock *lock,
PVOP_VCALLEE2(pv_lock_ops.lock_spinning, lock, ticket);
 }
 
-static __always_inline void ticket_unlock_kick(struct arch_spinlock *lock,
+static __always_inline void __ticket_unlock_kick(struct arch_spinlock *lock,
__ticket_t ticket)
 {
PVOP_VCALL2(pv_lock_ops.unlock_kick, lock, ticket);
diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 04a5cd5..d68883d 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -1,11 +1,14 @@
 #ifndef _ASM_X86_SPINLOCK_H
 #define _ASM_X86_SPINLOCK_H
 
+#include <linux/jump_label.h>
 #include <linux/atomic.h>
 #include <asm/page.h>
 #include <asm/processor.h>
 #include <linux/compiler.h>
 #include <asm/paravirt.h>
+#include <asm/bitops.h>
+
 /*
  * Your basic SMP spinlocks, allowing only a single CPU anywhere
  *
@@ -37,32 +40,28 @@
 /* How long a lock should spin before we consider blocking */
 #define SPIN_THRESHOLD (1 << 15)
 
-#ifndef CONFIG_PARAVIRT_SPINLOCKS
+extern struct static_key paravirt_ticketlocks_enabled;
+static __always_inline bool static_key_false(struct static_key *key);
 
-static __always_inline void __ticket_lock_spinning(struct arch_spinlock *lock,
-   __ticket_t ticket)
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+
+static inline void __ticket_enter_slowpath(arch_spinlock_t *lock)
 {
+   set_bit(0, (volatile unsigned long *)&lock->tickets.tail);
 }
 
-static __always_inl

[tip:x86/spinlocks] x86, pvticketlock: When paravirtualizing ticket locks, increment by 2

2013-08-08 Thread tip-bot for Jeremy Fitzhardinge
Commit-ID:  00de643f44ed84156e18f8b9afa8613d3b37a298
Gitweb: http://git.kernel.org/tip/00de643f44ed84156e18f8b9afa8613d3b37a298
Author: Jeremy Fitzhardinge 
AuthorDate: Tue, 6 Aug 2013 17:13:13 +0530
Committer:  H. Peter Anvin 
CommitDate: Thu, 8 Aug 2013 16:06:50 -0700

x86, pvticketlock: When paravirtualizing ticket locks, increment by 2

Increment ticket head/tails by 2 rather than 1 to leave the LSB free
to store a "is in slowpath state" bit.  This halves the number
of possible CPUs for a given ticket size, but this shouldn't matter
in practice - kernels built for 32k+ CPU systems are probably
specially built for the hardware rather than a generic distro
kernel.

Signed-off-by: Jeremy Fitzhardinge 
Link: 
http://lkml.kernel.org/r/20130806114313.20643.60805.sendpatch...@codeblue.in.ibm.com
Reviewed-by: Konrad Rzeszutek Wilk 
Tested-by: Attilio Rao 
Signed-off-by: Raghavendra K T 
Acked-by: Ingo Molnar 
Signed-off-by: H. Peter Anvin 
---
 arch/x86/include/asm/spinlock.h   | 10 +-
 arch/x86/include/asm/spinlock_types.h | 10 +-
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 7442410..04a5cd5 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -78,7 +78,7 @@ static __always_inline void __ticket_unlock_kick(struct 
arch_spinlock *lock,
  */
 static __always_inline void arch_spin_lock(struct arch_spinlock *lock)
 {
-   register struct __raw_tickets inc = { .tail = 1 };
+   register struct __raw_tickets inc = { .tail = TICKET_LOCK_INC };
 
inc = xadd(&lock->tickets, inc);
 
@@ -104,7 +104,7 @@ static __always_inline int 
arch_spin_trylock(arch_spinlock_t *lock)
if (old.tickets.head != old.tickets.tail)
return 0;
 
-   new.head_tail = old.head_tail + (1 << TICKET_SHIFT);
+   new.head_tail = old.head_tail + (TICKET_LOCK_INC << TICKET_SHIFT);
 
/* cmpxchg is a full barrier, so nothing can move before it */
return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) ==
old.head_tail;
@@ -112,9 +112,9 @@ static __always_inline int 
arch_spin_trylock(arch_spinlock_t *lock)
 
 static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
 {
-   __ticket_t next = lock->tickets.head + 1;
+   __ticket_t next = lock->tickets.head + TICKET_LOCK_INC;
 
-   __add(&lock->tickets.head, 1, UNLOCK_LOCK_PREFIX);
+   __add(&lock->tickets.head, TICKET_LOCK_INC, UNLOCK_LOCK_PREFIX);
__ticket_unlock_kick(lock, next);
 }
 
@@ -129,7 +129,7 @@ static inline int arch_spin_is_contended(arch_spinlock_t 
*lock)
 {
struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
 
-   return (__ticket_t)(tmp.tail - tmp.head) > 1;
+   return (__ticket_t)(tmp.tail - tmp.head) > TICKET_LOCK_INC;
 }
 #define arch_spin_is_contended arch_spin_is_contended
 
diff --git a/arch/x86/include/asm/spinlock_types.h 
b/arch/x86/include/asm/spinlock_types.h
index 83fd3c7..e96fcbd 100644
--- a/arch/x86/include/asm/spinlock_types.h
+++ b/arch/x86/include/asm/spinlock_types.h
@@ -3,7 +3,13 @@
 
 #include 
 
-#if (CONFIG_NR_CPUS < 256)
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+#define __TICKET_LOCK_INC  2
+#else
+#define __TICKET_LOCK_INC  1
+#endif
+
+#if (CONFIG_NR_CPUS < (256 / __TICKET_LOCK_INC))
 typedef u8  __ticket_t;
 typedef u16 __ticketpair_t;
 #else
@@ -11,6 +17,8 @@ typedef u16 __ticket_t;
 typedef u32 __ticketpair_t;
 #endif
 
+#define TICKET_LOCK_INC((__ticket_t)__TICKET_LOCK_INC)
+
 #define TICKET_SHIFT   (sizeof(__ticket_t) * 8)
 
 typedef struct arch_spinlock {
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[tip:x86/spinlocks] xen: Defer spinlock setup until boot CPU setup

2013-08-08 Thread tip-bot for Jeremy Fitzhardinge
Commit-ID:  26dca1a3262c451846537d8c1f3f56290718b7d4
Gitweb: http://git.kernel.org/tip/26dca1a3262c451846537d8c1f3f56290718b7d4
Author: Jeremy Fitzhardinge 
AuthorDate: Tue, 6 Aug 2013 17:11:45 +0530
Committer:  H. Peter Anvin 
CommitDate: Thu, 8 Aug 2013 16:06:34 -0700

xen: Defer spinlock setup until boot CPU setup

There's no need to do it at very early init, and doing it there
makes it impossible to use the jump_label machinery.

Signed-off-by: Jeremy Fitzhardinge 
Link: 
http://lkml.kernel.org/r/20130806114145.20643.7527.sendpatch...@codeblue.in.ibm.com
Reviewed-by: Konrad Rzeszutek Wilk 
Signed-off-by: Raghavendra K T 
Acked-by: Ingo Molnar 
Signed-off-by: H. Peter Anvin 
---
 arch/x86/xen/smp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index ca92754..3b52d80 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -279,6 +279,7 @@ static void __init xen_smp_prepare_boot_cpu(void)
 
xen_filter_cpu_maps();
xen_setup_vcpu_info_placement();
+   xen_init_spinlocks();
 }
 
 static void __init xen_smp_prepare_cpus(unsigned int max_cpus)
@@ -680,7 +681,6 @@ void __init xen_smp_init(void)
 {
smp_ops = xen_smp_ops;
xen_fill_possible_map();
-   xen_init_spinlocks();
 }
 
 static void __init xen_hvm_smp_prepare_cpus(unsigned int max_cpus)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[tip:x86/spinlocks] x86, ticketlock: Collapse a layer of functions

2013-08-08 Thread tip-bot for Jeremy Fitzhardinge
Commit-ID:  c1230bf0bf6c93ff2db93ba0b66839068f498e8b
Gitweb: http://git.kernel.org/tip/c1230bf0bf6c93ff2db93ba0b66839068f498e8b
Author: Jeremy Fitzhardinge 
AuthorDate: Tue, 6 Aug 2013 17:11:20 +0530
Committer:  H. Peter Anvin 
CommitDate: Thu, 8 Aug 2013 16:06:30 -0700

x86, ticketlock: Collapse a layer of functions

Now that the paravirtualization layer doesn't exist at the spinlock
level any more, we can collapse the __ticket_ functions into the arch_
functions.

Signed-off-by: Jeremy Fitzhardinge 
Link: 
http://lkml.kernel.org/r/20130806114120.20643.87847.sendpatch...@codeblue.in.ibm.com
Reviewed-by: Konrad Rzeszutek Wilk 
Tested-by: Attilio Rao 
Signed-off-by: Raghavendra K T 
Acked-by: Ingo Molnar 
Signed-off-by: H. Peter Anvin 
---
 arch/x86/include/asm/spinlock.h | 35 +--
 1 file changed, 5 insertions(+), 30 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 4d54244..7442410 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -76,7 +76,7 @@ static __always_inline void __ticket_unlock_kick(struct 
arch_spinlock *lock,
  * in the high part, because a wide xadd increment of the low part would carry
  * up and contaminate the high part.
  */
-static __always_inline void __ticket_spin_lock(struct arch_spinlock *lock)
+static __always_inline void arch_spin_lock(struct arch_spinlock *lock)
 {
register struct __raw_tickets inc = { .tail = 1 };
 
@@ -96,7 +96,7 @@ static __always_inline void __ticket_spin_lock(struct 
arch_spinlock *lock)
 out:   barrier();  /* make sure nothing creeps before the lock is taken */
 }
 
-static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
+static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
 {
arch_spinlock_t old, new;
 
@@ -110,7 +110,7 @@ static __always_inline int 
__ticket_spin_trylock(arch_spinlock_t *lock)
return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) == 
old.head_tail;
 }
 
-static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
+static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
 {
__ticket_t next = lock->tickets.head + 1;
 
@@ -118,46 +118,21 @@ static __always_inline void 
__ticket_spin_unlock(arch_spinlock_t *lock)
__ticket_unlock_kick(lock, next);
 }
 
-static inline int __ticket_spin_is_locked(arch_spinlock_t *lock)
+static inline int arch_spin_is_locked(arch_spinlock_t *lock)
 {
struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
 
return tmp.tail != tmp.head;
 }
 
-static inline int __ticket_spin_is_contended(arch_spinlock_t *lock)
+static inline int arch_spin_is_contended(arch_spinlock_t *lock)
 {
struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
 
return (__ticket_t)(tmp.tail - tmp.head) > 1;
 }
-
-static inline int arch_spin_is_locked(arch_spinlock_t *lock)
-{
-   return __ticket_spin_is_locked(lock);
-}
-
-static inline int arch_spin_is_contended(arch_spinlock_t *lock)
-{
-   return __ticket_spin_is_contended(lock);
-}
 #define arch_spin_is_contended arch_spin_is_contended
 
-static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
-{
-   __ticket_spin_lock(lock);
-}
-
-static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
-{
-   return __ticket_spin_trylock(lock);
-}
-
-static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
-{
-   __ticket_spin_unlock(lock);
-}
-
 static __always_inline void arch_spin_lock_flags(arch_spinlock_t *lock,
  unsigned long flags)
 {
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[tip:x86/spinlocks] xen, pvticketlocks: Add xen_nopvspin parameter to disable xen pv ticketlocks

2013-08-08 Thread tip-bot for Jeremy Fitzhardinge
Commit-ID:  9f091fd2047dc4f9cc512e88f818786274fa646f
Gitweb: http://git.kernel.org/tip/9f091fd2047dc4f9cc512e88f818786274fa646f
Author: Jeremy Fitzhardinge 
AuthorDate: Tue, 6 Aug 2013 17:12:24 +0530
Committer:  H. Peter Anvin 
CommitDate: Thu, 8 Aug 2013 16:06:42 -0700

xen, pvticketlocks: Add xen_nopvspin parameter to disable xen pv ticketlocks

Signed-off-by: Jeremy Fitzhardinge 
Link: 
http://lkml.kernel.org/r/20130806114224.20643.9099.sendpatch...@codeblue.in.ibm.com
Reviewed-by: Konrad Rzeszutek Wilk 
Signed-off-by: Raghavendra K T 
Acked-by: Ingo Molnar 
Signed-off-by: H. Peter Anvin 
---
 arch/x86/xen/spinlock.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index a458729..669a971 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -244,6 +244,8 @@ void xen_uninit_lock_cpu(int cpu)
per_cpu(irq_name, cpu) = NULL;
 }
 
+static bool xen_pvspin __initdata = true;
+
 void __init xen_init_spinlocks(void)
 {
/*
@@ -253,10 +255,22 @@ void __init xen_init_spinlocks(void)
if (xen_hvm_domain())
return;
 
+   if (!xen_pvspin) {
+   printk(KERN_DEBUG "xen: PV spinlocks disabled\n");
+   return;
+   }
+
pv_lock_ops.lock_spinning = xen_lock_spinning;
pv_lock_ops.unlock_kick = xen_unlock_kick;
 }
 
+static __init int xen_parse_nopvspin(char *arg)
+{
+   xen_pvspin = false;
+   return 0;
+}
+early_param("xen_nopvspin", xen_parse_nopvspin);
+
 #ifdef CONFIG_XEN_DEBUG_FS
 
 static struct dentry *d_spin_debug;
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[tip:x86/spinlocks] x86, pvticketlock: Use callee-save for lock_spinning

2013-08-08 Thread tip-bot for Jeremy Fitzhardinge
Commit-ID:  80aefe73ea1611bb064f4dcfe49b1aa9648922b3
Gitweb: http://git.kernel.org/tip/80aefe73ea1611bb064f4dcfe49b1aa9648922b3
Author: Jeremy Fitzhardinge 
AuthorDate: Tue, 6 Aug 2013 17:12:44 +0530
Committer:  H. Peter Anvin 
CommitDate: Thu, 8 Aug 2013 16:06:46 -0700

x86, pvticketlock: Use callee-save for lock_spinning

Although the lock_spinning calls in the spinlock code are on the
uncommon path, their presence can cause the compiler to generate many
more register save/restores in the function pre/postamble, which is in
the fast path.  To avoid this, convert it to using the pvops callee-save
calling convention, which defers all the save/restores until the actual
function is called, keeping the fastpath clean.
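
(For readers not familiar with the pvops callee-save convention, a toy sketch of
the cost being avoided -- plain userspace C, not kernel code; fast_op() and
slowpath_hook() are made-up names.  Any ordinary call, even a rarely-taken one,
makes the compiler assume the C ABI's caller-saved registers are clobbered, so
the surrounding hot loop picks up extra spills; a callee-save thunk lets the
call site be treated as clobbering nothing:)

extern void slowpath_hook(unsigned long v);	/* ordinary C-ABI call */

unsigned long fast_op(const unsigned long *p, unsigned long n)
{
	unsigned long sum = 0;
	unsigned long i;

	for (i = 0; i < n; i++) {
		sum += p[i];
		/* cold path, but the ABI still costs registers on the hot path */
		if (__builtin_expect(sum == 0, 0))
			slowpath_hook(sum);
	}
	return sum;
}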

Signed-off-by: Jeremy Fitzhardinge 
Link: 
http://lkml.kernel.org/r/20130806114244.20643.21618.sendpatch...@codeblue.in.ibm.com
Reviewed-by: Konrad Rzeszutek Wilk 
Tested-by: Attilio Rao 
Signed-off-by: Raghavendra K T 
Acked-by: Ingo Molnar 
Signed-off-by: H. Peter Anvin 
---
 arch/x86/include/asm/paravirt.h   | 2 +-
 arch/x86/include/asm/paravirt_types.h | 2 +-
 arch/x86/kernel/paravirt-spinlocks.c  | 2 +-
 arch/x86/xen/spinlock.c   | 3 ++-
 4 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 040e72d..7131e12c 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -715,7 +715,7 @@ static inline void __set_fixmap(unsigned /* enum 
fixed_addresses */ idx,
 static __always_inline void __ticket_lock_spinning(struct arch_spinlock *lock,
__ticket_t ticket)
 {
-   PVOP_VCALL2(pv_lock_ops.lock_spinning, lock, ticket);
+   PVOP_VCALLEE2(pv_lock_ops.lock_spinning, lock, ticket);
 }
 
 static __always_inline void ticket_unlock_kick(struct arch_spinlock *lock,
diff --git a/arch/x86/include/asm/paravirt_types.h 
b/arch/x86/include/asm/paravirt_types.h
index d5deb6d..350d017 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -330,7 +330,7 @@ struct arch_spinlock;
 #include <asm/spinlock_types.h>
 
 struct pv_lock_ops {
-   void (*lock_spinning)(struct arch_spinlock *lock, __ticket_t ticket);
+   struct paravirt_callee_save lock_spinning;
void (*unlock_kick)(struct arch_spinlock *lock, __ticket_t ticket);
 };
 
diff --git a/arch/x86/kernel/paravirt-spinlocks.c 
b/arch/x86/kernel/paravirt-spinlocks.c
index c2e010e..4251c1d 100644
--- a/arch/x86/kernel/paravirt-spinlocks.c
+++ b/arch/x86/kernel/paravirt-spinlocks.c
@@ -9,7 +9,7 @@
 
 struct pv_lock_ops pv_lock_ops = {
 #ifdef CONFIG_SMP
-   .lock_spinning = paravirt_nop,
+   .lock_spinning = __PV_IS_CALLEE_SAVE(paravirt_nop),
.unlock_kick = paravirt_nop,
 #endif
 };
diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index 669a971..6c8792b 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -173,6 +173,7 @@ out:
local_irq_restore(flags);
spin_time_accum_blocked(start);
 }
+PV_CALLEE_SAVE_REGS_THUNK(xen_lock_spinning);
 
 static void xen_unlock_kick(struct arch_spinlock *lock, __ticket_t next)
 {
@@ -260,7 +261,7 @@ void __init xen_init_spinlocks(void)
return;
}
 
-   pv_lock_ops.lock_spinning = xen_lock_spinning;
+   pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(xen_lock_spinning);
pv_lock_ops.unlock_kick = xen_unlock_kick;
 }
 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[tip:x86/spinlocks] xen, pvticketlock: Xen implementation for PV ticket locks

2013-08-08 Thread tip-bot for Jeremy Fitzhardinge
Commit-ID:  82a5c77805c1824da74912f986f6dab241589ac7
Gitweb: http://git.kernel.org/tip/82a5c77805c1824da74912f986f6dab241589ac7
Author: Jeremy Fitzhardinge 
AuthorDate: Tue, 6 Aug 2013 17:12:04 +0530
Committer:  H. Peter Anvin 
CommitDate: Thu, 8 Aug 2013 16:06:38 -0700

xen, pvticketlock: Xen implementation for PV ticket locks

Replace the old Xen implementation of PV spinlocks with an implementation
of xen_lock_spinning and xen_unlock_kick.

xen_lock_spinning simply registers the cpu in its entry in lock_waiting,
adds itself to the waiting_cpus set, and blocks on an event channel
until the channel becomes pending.

xen_unlock_kick searches the cpus in waiting_cpus looking for the one
which next wants this lock with the next ticket, if any.  If found,
it kicks it by making its event channel pending, which wakes it up.

We need to make sure interrupts are disabled while we're relying on the
contents of the per-cpu lock_waiting values, otherwise an interrupt
handler could come in, try to take some other lock, block, and overwrite
our values.
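
(A sketch of the kick side described above -- the xen_unlock_kick() hunk falls
outside the quoted part of the diff below, so this is reconstructed from the
description plus the names visible elsewhere in the series; treat the details
as approximate:)

struct xen_lock_waiting {
	struct arch_spinlock *lock;
	__ticket_t want;
};

static DEFINE_PER_CPU(struct xen_lock_waiting, lock_waiting);
static cpumask_t waiting_cpus;

static void xen_unlock_kick(struct arch_spinlock *lock, __ticket_t next)
{
	int cpu;

	add_stats(RELEASED_SLOW, 1);

	for_each_cpu(cpu, &waiting_cpus) {
		const struct xen_lock_waiting *w = &per_cpu(lock_waiting, cpu);

		/* kick only the cpu that holds the next ticket for this lock */
		if (w->lock == lock && w->want == next) {
			add_stats(RELEASED_SLOW_KICKED, 1);
			xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
			break;
		}
	}
}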

Signed-off-by: Jeremy Fitzhardinge 
Link: 
http://lkml.kernel.org/r/20130806114204.20643.58941.sendpatch...@codeblue.in.ibm.com
Reviewed-by: Konrad Rzeszutek Wilk 
 [ Raghavendra:  use function + enum instead of macro, cmpxchg for zero status 
reset
Reintroduce break since we know the exact vCPU to send IPI as suggested by 
Konrad.]
Signed-off-by: Raghavendra K T 
Acked-by: Ingo Molnar 
Signed-off-by: H. Peter Anvin 
---
 arch/x86/xen/spinlock.c | 348 +++-
 1 file changed, 79 insertions(+), 269 deletions(-)

diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index d509629..a458729 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -17,45 +17,44 @@
 #include "xen-ops.h"
 #include "debugfs.h"
 
-#ifdef CONFIG_XEN_DEBUG_FS
-static struct xen_spinlock_stats
-{
-   u64 taken;
-   u32 taken_slow;
-   u32 taken_slow_nested;
-   u32 taken_slow_pickup;
-   u32 taken_slow_spurious;
-   u32 taken_slow_irqenable;
+enum xen_contention_stat {
+   TAKEN_SLOW,
+   TAKEN_SLOW_PICKUP,
+   TAKEN_SLOW_SPURIOUS,
+   RELEASED_SLOW,
+   RELEASED_SLOW_KICKED,
+   NR_CONTENTION_STATS
+};
 
-   u64 released;
-   u32 released_slow;
-   u32 released_slow_kicked;
 
+#ifdef CONFIG_XEN_DEBUG_FS
 #define HISTO_BUCKETS  30
-   u32 histo_spin_total[HISTO_BUCKETS+1];
-   u32 histo_spin_spinning[HISTO_BUCKETS+1];
+static struct xen_spinlock_stats
+{
+   u32 contention_stats[NR_CONTENTION_STATS];
u32 histo_spin_blocked[HISTO_BUCKETS+1];
-
-   u64 time_total;
-   u64 time_spinning;
u64 time_blocked;
 } spinlock_stats;
 
 static u8 zero_stats;
 
-static unsigned lock_timeout = 1 << 10;
-#define TIMEOUT lock_timeout
-
 static inline void check_zero(void)
 {
-   if (unlikely(zero_stats)) {
-   memset(&spinlock_stats, 0, sizeof(spinlock_stats));
-   zero_stats = 0;
+   u8 ret;
+   u8 old = ACCESS_ONCE(zero_stats);
+   if (unlikely(old)) {
+   ret = cmpxchg(&zero_stats, old, 0);
+   /* This ensures only one fellow resets the stat */
+   if (ret == old)
+   memset(&spinlock_stats, 0, sizeof(spinlock_stats));
}
 }
 
-#define ADD_STATS(elem, val)   \
-   do { check_zero(); spinlock_stats.elem += (val); } while(0)
+static inline void add_stats(enum xen_contention_stat var, u32 val)
+{
+   check_zero();
+   spinlock_stats.contention_stats[var] += val;
+}
 
 static inline u64 spin_time_start(void)
 {
@@ -74,22 +73,6 @@ static void __spin_time_accum(u64 delta, u32 *array)
array[HISTO_BUCKETS]++;
 }
 
-static inline void spin_time_accum_spinning(u64 start)
-{
-   u32 delta = xen_clocksource_read() - start;
-
-   __spin_time_accum(delta, spinlock_stats.histo_spin_spinning);
-   spinlock_stats.time_spinning += delta;
-}
-
-static inline void spin_time_accum_total(u64 start)
-{
-   u32 delta = xen_clocksource_read() - start;
-
-   __spin_time_accum(delta, spinlock_stats.histo_spin_total);
-   spinlock_stats.time_total += delta;
-}
-
 static inline void spin_time_accum_blocked(u64 start)
 {
u32 delta = xen_clocksource_read() - start;
@@ -99,19 +82,15 @@ static inline void spin_time_accum_blocked(u64 start)
 }
 #else  /* !CONFIG_XEN_DEBUG_FS */
 #define TIMEOUT			(1 << 10)
-#define ADD_STATS(elem, val)   do { (void)(val); } while(0)
+static inline void add_stats(enum xen_contention_stat var, u32 val)
+{
+}
 
 static inline u64 spin_time_start(void)
 {
return 0;
 }
 
-static inline void spin_time_accum_total(u64 start)
-{
-}
-static inline void spin_time_accum_spinning(u64 start)
-{
-}
 static inline void spin_time_accum_blocked(u64 start)
 {
 }
@@ -134,230 +113,84 @@ typedef u16 xen_spinners_t;
asm(LOCK_PREFIX " decw %0

[tip:x86/spinlocks] x86, spinlock: Replace pv spinlocks with pv ticketlocks

2013-08-08 Thread tip-bot for Jeremy Fitzhardinge
Commit-ID:  ee120b6a347e3ce6526bd2380fdf5858d5a8550d
Gitweb: http://git.kernel.org/tip/ee120b6a347e3ce6526bd2380fdf5858d5a8550d
Author: Jeremy Fitzhardinge 
AuthorDate: Tue, 6 Aug 2013 17:10:40 +0530
Committer:  H. Peter Anvin 
CommitDate: Thu, 8 Aug 2013 16:06:23 -0700

x86, spinlock: Replace pv spinlocks with pv ticketlocks

Rather than outright replacing the entire spinlock implementation in
order to paravirtualize it, keep the ticket lock implementation but add
a couple of pvops hooks on the slow path (long spin on lock, unlocking
a contended lock).

Ticket locks have a number of nice properties, but they also have some
surprising behaviours in virtual environments.  They enforce a strict
FIFO ordering on cpus trying to take a lock; however, if the hypervisor
scheduler does not schedule the cpus in the correct order, the system can
waste a huge amount of time spinning until the next cpu can take the lock.

(See Thomas Friebel's talk "Prevent Guests from Spinning Around"
http://www.xen.org/files/xensummitboston08/LHP.pdf for more details.)

To address this, we add two hooks:
 - __ticket_spin_lock which is called after the cpu has been
   spinning on the lock for a significant number of iterations but has
   failed to take the lock (presumably because the cpu holding the lock
   has been descheduled).  The lock_spinning pvop is expected to block
   the cpu until it has been kicked by the current lock holder.
 - __ticket_spin_unlock, which on releasing a contended lock
   (there are more cpus with tail tickets), it looks to see if the next
   cpu is blocked and wakes it if so.

When compiled with CONFIG_PARAVIRT_SPINLOCKS disabled, a set of stub
functions causes all the extra code to go away.
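
(As a sketch of how the lock-side hook hangs off the ticket-lock fast path --
the spinlock.h hunk is cut off in the quote below; SPIN_THRESHOLD and the
function names match the series, the loop details are approximate:)

static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
{
	register struct __raw_tickets inc = { .tail = 1 };

	inc = xadd(&lock->tickets, inc);

	for (;;) {
		unsigned count = SPIN_THRESHOLD;

		do {
			if (inc.head == inc.tail)
				goto out;
			cpu_relax();
			inc.head = ACCESS_ONCE(lock->tickets.head);
		} while (--count);

		/* pvop hook: block until the unlocker kicks us */
		__ticket_lock_spinning(lock, inc.tail);
	}
out:
	barrier();	/* make sure nothing creeps before the lock is taken */
}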

Results:
===
setup: 32 core machine with 32 vcpu KVM guest (HT off)  with 8GB RAM
base = 3.11-rc
patched = base + pvspinlock V12

+-----------------+----------------+--------+
 dbench (Throughput in MB/sec. Higher is better)
+-----------------+----------------+--------+
|   base (stdev %)|patched(stdev%) | %gain  |
+-----------------+----------------+--------+
| 15035.3   (0.3) |15150.0   (0.6) |   0.8  |
|  1470.0   (2.2) | 1713.7   (1.9) |  16.6  |
|   848.6   (4.3) |  967.8   (4.3) |  14.0  |
|   652.9   (3.5) |  685.3   (3.7) |   5.0  |
+-----------------+----------------+--------+

pvspinlock shows benefits for overcommit ratio > 1 for PLE enabled cases,
and undercommits results are flat

Signed-off-by: Jeremy Fitzhardinge 
Link: 
http://lkml.kernel.org/r/20130806114040.20643.84140.sendpatch...@codeblue.in.ibm.com
Reviewed-by: Konrad Rzeszutek Wilk 
Tested-by: Attilio Rao 
[ Raghavendra: Changed SPIN_THRESHOLD ]
Signed-off-by: Raghavendra K T 
Acked-by: Ingo Molnar 
Signed-off-by: H. Peter Anvin 
---
 arch/x86/include/asm/paravirt.h   | 32 -
 arch/x86/include/asm/paravirt_types.h | 10 +++
 arch/x86/include/asm/spinlock.h   | 53 ---
 arch/x86/include/asm/spinlock_types.h |  4 ---
 arch/x86/kernel/paravirt-spinlocks.c  | 15 ++
 arch/x86/xen/spinlock.c   |  8 --
 6 files changed, 61 insertions(+), 61 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index cfdc9ee..040e72d 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -712,36 +712,16 @@ static inline void __set_fixmap(unsigned /* enum 
fixed_addresses */ idx,
 
 #if defined(CONFIG_SMP) && defined(CONFIG_PARAVIRT_SPINLOCKS)
 
-static inline int arch_spin_is_locked(struct arch_spinlock *lock)
+static __always_inline void __ticket_lock_spinning(struct arch_spinlock *lock,
+   __ticket_t ticket)
 {
-   return PVOP_CALL1(int, pv_lock_ops.spin_is_locked, lock);
+   PVOP_VCALL2(pv_lock_ops.lock_spinning, lock, ticket);
 }
 
-static inline int arch_spin_is_contended(struct arch_spinlock *lock)
+static __always_inline void ticket_unlock_kick(struct arch_spinlock *lock,
+   __ticket_t ticket)
 {
-   return PVOP_CALL1(int, pv_lock_ops.spin_is_contended, lock);
-}
-#define arch_spin_is_contended arch_spin_is_contended
-
-static __always_inline void arch_spin_lock(struct arch_spinlock *lock)
-{
-   PVOP_VCALL1(pv_lock_ops.spin_lock, lock);
-}
-
-static __always_inline void arch_spin_lock_flags(struct arch_spinlock *lock,
- unsigned long flags)
-{
-   PVOP_VCALL2(pv_lock_ops.spin_lock_flags, lock, flags);
-}
-
-static __always_inline int arch_spin_trylock(struct arch_spinlock *lock)
-{
-   return PVOP_CALL1(int, pv_lock_ops.spin_trylock, lock);
-}
-
-static __always_inline void arch_spin_unlock(struct arch_spinlock *lock)
-{
-   PVOP_VCALL1(pv_lock_ops.spin_unlock, lock);
+   PVOP_VCALL2(pv_lock_ops.unlock_kick, lock, ticket);
 }
 
 #endif
diff --git a/arch/x86/include/asm/paravirt_types.h 
b


[tip:x86/spinlocks] x86, ticketlock: Add slowpath logic

2013-08-08 Thread tip-bot for Jeremy Fitzhardinge
Commit-ID:  faf5f87c305fc58464bf8c47d0a5d148ceee2e32
Gitweb: http://git.kernel.org/tip/faf5f87c305fc58464bf8c47d0a5d148ceee2e32
Author: Jeremy Fitzhardinge jer...@goop.org
AuthorDate: Tue, 6 Aug 2013 17:13:52 +0530
Committer:  H. Peter Anvin h...@linux.intel.com
CommitDate: Thu, 8 Aug 2013 16:06:58 -0700

x86, ticketlock: Add slowpath logic

Maintain a flag in the LSB of the ticket lock tail which indicates
whether anyone is in the lock slowpath and may need kicking when
the current holder unlocks.  The flags are set when the first locker
enters the slowpath, and cleared when unlocking to an empty queue (ie,
no contention).

In the specific implementation of lock_spinning(), make sure to set
the slowpath flags on the lock just before blocking.  We must do
this before the last-chance pickup test to prevent a deadlock
with the unlocker:

	Unlocker		Locker
				test for lock pickup
					-> fail
	unlock
	test slowpath
		-> false
				set slowpath flags
				block

Whereas this works in any ordering:

	Unlocker		Locker
				set slowpath flags
				test for lock pickup
					-> fail
				block
	unlock
	test slowpath
		-> true, kick

If the unlocker finds that the lock has the slowpath flag set but it is
actually uncontended (ie, head == tail, so nobody is waiting), then it
clears the slowpath flag.

The unlock code uses a locked add to update the head counter.  This also
acts as a full memory barrier so that it's safe to subsequently
read back the slowflag state, knowing that the updated lock is visible
to the other CPUs.  If it were an unlocked add, then the flag read may
just be forwarded from the store buffer before it was visible to the other
CPUs, which could result in a deadlock.

Unfortunately this means we need to do a locked instruction when
unlocking with PV ticketlocks.  However, if PV ticketlocks are not
enabled, then the old non-locked add is the only unlocking code.
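
(Roughly, the unlock ends up shaped like this -- a sketch from the description,
since the spinlock.h hunk is cut off below; TICKET_SLOWPATH_FLAG and the helper
name are as I remember them from the series and may not match exactly:)

static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
{
	if (TICKET_SLOWPATH_FLAG &&
	    static_key_false(&paravirt_ticketlocks_enabled)) {
		arch_spinlock_t prev;

		prev = *lock;
		add_smp(&lock->tickets.head, TICKET_LOCK_INC);

		/* add_smp() is a full mb(), so the flag re-read below is safe */
		if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
			__ticket_unlock_slowpath(lock, prev);	/* clear flag or kick waiter */
	} else
		__add(&lock->tickets.head, TICKET_LOCK_INC, UNLOCK_LOCK_PREFIX);
}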

Note: this code relies on gcc making sure that unlikely() code is out of
line of the fastpath, which only happens when OPTIMIZE_SIZE=n.  If it
doesn't the generated code isn't too bad, but its definitely suboptimal.

Thanks to Srivatsa Vaddagiri for providing a bugfix to the original
version of this change, which has been folded in.
Thanks to Stephan Diestelhorst for commenting on some code which relied
on an inaccurate reading of the x86 memory ordering rules.

Signed-off-by: Jeremy Fitzhardinge jer...@goop.org
Link: 
http://lkml.kernel.org/r/20130806114352.20643.75343.sendpatch...@codeblue.in.ibm.com
Signed-off-by: Srivatsa Vaddagiri va...@linux.vnet.ibm.com
Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Cc: Stephan Diestelhorst stephan.diestelho...@amd.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
Acked-by: Ingo Molnar mi...@kernel.org
Signed-off-by: H. Peter Anvin h...@linux.intel.com
---
 arch/x86/include/asm/paravirt.h   |  2 +-
 arch/x86/include/asm/spinlock.h   | 86 +--
 arch/x86/include/asm/spinlock_types.h |  2 +
 arch/x86/kernel/paravirt-spinlocks.c  |  3 ++
 arch/x86/xen/spinlock.c   |  6 +++
 5 files changed, 74 insertions(+), 25 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 7131e12c..401f350 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -718,7 +718,7 @@ static __always_inline void __ticket_lock_spinning(struct 
arch_spinlock *lock,
PVOP_VCALLEE2(pv_lock_ops.lock_spinning, lock, ticket);
 }
 
-static __always_inline void ticket_unlock_kick(struct arch_spinlock *lock,
+static __always_inline void __ticket_unlock_kick(struct arch_spinlock *lock,
__ticket_t ticket)
 {
PVOP_VCALL2(pv_lock_ops.unlock_kick, lock, ticket);
diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 04a5cd5..d68883d 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -1,11 +1,14 @@
 #ifndef _ASM_X86_SPINLOCK_H
 #define _ASM_X86_SPINLOCK_H
 
+#include <linux/jump_label.h>
 #include <linux/atomic.h>
 #include <asm/page.h>
 #include <asm/processor.h>
 #include <linux/compiler.h>
 #include <asm/paravirt.h>
+#include <asm/bitops.h>
+
 /*
  * Your basic SMP spinlocks, allowing only a single CPU anywhere
  *
@@ -37,32 +40,28 @@
 /* How long a lock should spin before we consider blocking */
 #define SPIN_THRESHOLD	(1 << 15)
 
-#ifndef CONFIG_PARAVIRT_SPINLOCKS
+extern struct static_key paravirt_ticketlocks_enabled;
+static __always_inline bool static_key_false(struct static_key *key);
 
-static __always_inline void __ticket_lock_spinning(struct arch_spinlock *lock


[tip:x86/spinlocks] xen, pvticketlock: Allow interrupts to be enabled while blocking

2013-08-08 Thread tip-bot for Jeremy Fitzhardinge
Commit-ID:  38eddb85894561ab32c1de4171e1c1582f0efa78
Gitweb: http://git.kernel.org/tip/38eddb85894561ab32c1de4171e1c1582f0efa78
Author: Jeremy Fitzhardinge jer...@goop.org
AuthorDate: Tue, 6 Aug 2013 17:14:12 +0530
Committer:  H. Peter Anvin h...@linux.intel.com
CommitDate: Thu, 8 Aug 2013 16:07:01 -0700

xen, pvticketlock: Allow interrupts to be enabled while blocking

If interrupts were enabled when taking the spinlock, we can leave them
enabled while blocking to get the lock.

If we can enable interrupts while waiting for the lock to become
available, and we take an interrupt before entering the poll,
and the handler takes a spinlock which ends up going into
the slow state (invalidating the per-cpu lock and want values),
then when the interrupt handler returns the event channel will
remain pending so the poll will return immediately, causing it to
return out to the main spinlock loop.

Signed-off-by: Jeremy Fitzhardinge jer...@goop.org
Link: 
http://lkml.kernel.org/r/20130806114412.20643.84141.sendpatch...@codeblue.in.ibm.com
Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
Acked-by: Ingo Molnar mi...@kernel.org
Signed-off-by: H. Peter Anvin h...@linux.intel.com
---
 arch/x86/xen/spinlock.c | 46 --
 1 file changed, 40 insertions(+), 6 deletions(-)

diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index 546112e..0438b93 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -142,7 +142,20 @@ static void xen_lock_spinning(struct arch_spinlock *lock, 
__ticket_t want)
 * partially setup state.
 */
local_irq_save(flags);
-
+   /*
+* We don't really care if we're overwriting some other
+* (lock,want) pair, as that would mean that we're currently
+* in an interrupt context, and the outer context had
+* interrupts enabled.  That has already kicked the VCPU out
+* of xen_poll_irq(), so it will just return spuriously and
+* retry with newly setup (lock,want).
+*
+* The ordering protocol on this is that the lock pointer
+* may only be set non-NULL if the want ticket is correct.
+* If we're updating want, we must first clear lock.
+*/
+   w->lock = NULL;
+   smp_wmb();
w->want = want;
smp_wmb();
w->lock = lock;
@@ -157,24 +170,43 @@ static void xen_lock_spinning(struct arch_spinlock *lock, 
__ticket_t want)
/* Only check lock once pending cleared */
barrier();
 
-   /* Mark entry to slowpath before doing the pickup test to make
-  sure we don't deadlock with an unlocker. */
+   /*
+* Mark entry to slowpath before doing the pickup test to make
+* sure we don't deadlock with an unlocker.
+*/
__ticket_enter_slowpath(lock);
 
-   /* check again make sure it didn't become free while
-  we weren't looking  */
+   /*
+* check again make sure it didn't become free while
+* we weren't looking
+*/
if (ACCESS_ONCE(lock->tickets.head) == want) {
add_stats(TAKEN_SLOW_PICKUP, 1);
goto out;
}
+
+   /* Allow interrupts while blocked */
+   local_irq_restore(flags);
+
+   /*
+* If an interrupt happens here, it will leave the wakeup irq
+* pending, which will cause xen_poll_irq() to return
+* immediately.
+*/
+
/* Block until irq becomes pending (or perhaps a spurious wakeup) */
xen_poll_irq(irq);
add_stats(TAKEN_SLOW_SPURIOUS, !xen_test_irq_pending(irq));
+
+   local_irq_save(flags);
+
kstat_incr_irqs_this_cpu(irq, irq_to_desc(irq));
 out:
cpumask_clear_cpu(cpu, waiting_cpus);
w->lock = NULL;
+
local_irq_restore(flags);
+
spin_time_accum_blocked(start);
 }
 PV_CALLEE_SAVE_REGS_THUNK(xen_lock_spinning);
@@ -188,7 +220,9 @@ static void xen_unlock_kick(struct arch_spinlock *lock, 
__ticket_t next)
for_each_cpu(cpu, &waiting_cpus) {
const struct xen_lock_waiting *w = &per_cpu(lock_waiting, cpu);
 
-   if (w->lock == lock && w->want == next) {
+   /* Make sure we read lock before want */
+   if (ACCESS_ONCE(w->lock) == lock &&
+   ACCESS_ONCE(w->want) == next) {
add_stats(RELEASED_SLOW_KICKED, 1);
xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
break;
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Replace in linux-next the xen, xen-two, xen-arm with xen/tip.git tree instead.

2013-07-30 Thread Jeremy Fitzhardinge
On 07/30/2013 12:53 PM, Konrad Rzeszutek Wilk wrote:
> Hey,
>
> I was wondering if it would be possible to remove from linux-next
> the three xen trees and instead use a combined tree, similar to the
> x86 tip (so the various maintainers share it)?
>
> The ones that would be removed are:
>
> xen   git 
> git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git#upstream/xen
> xen-two   git 
> git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git#linux-next
> xen-arm   git 
> git://git.kernel.org/pub/scm/linux/kernel/git/sstabellini/xen.git#linux-next
>
> And instead it would be pulled from:
>
> xen-tip   git 
> git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git#linux-next
>
> I presume you need Ack's from all of us (so Jeremy and Stefano) so CC-ing 
> them here.
>
> And Acked-by: Konrad Rzeszutek Wilk 
>

Ack from me.

J
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix

2013-07-14 Thread Jeremy Fitzhardinge
On 07/14/2013 12:30 PM, Tim Northover wrote:
>> And that is why I think you should just consider "bt $x,y" to be
>> trivially the same thing and not at all ambiguous. Because there is
>> ABSOLUTELY ZERO ambiguity when people write
>>
>>bt $63, mem
>>
>> Zero. Nada. None. The semantics are *exactly* the same for btl and btq
>> in this case, so why would you want the user to specify one or the
>> other?
> I don't think you've actually tested that, have you? (x86-64)
>
> int main() {
>   long val = 0x;
>   char res;
>
>   asm("btl $63, %1\n\tsetc %0" : "=r"(res) : "m"(val));
>   printf("%d\n", res);
>
>   asm("btq $63, %1\n\tsetc %0" : "=r"(res) : "m"(val));
>   printf("%d\n", res);
> }

Blerk.  It doesn't undermine the original point - that gas can
unambiguously choose the right operation size for a constant bit offset
- but yes, the operation size is meaningful in the case of an immediate
bit offset. It's pretty nasty of Intel to hide that detail in Table 3-2,
far from the instructions which use it...
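
(To spell that out, a variant of the test above with the register form added
for contrast -- a sketch assuming x86-64, with the expected results in the
comments:)

#include <stdio.h>

int main(void)
{
	unsigned long val = 1UL << 63;	/* bit 63 set, bit 31 clear */
	unsigned int bit = 63;
	char res;

	/* an immediate bit offset is truncated to the operand width */
	asm("btl $63, %1\n\tsetc %0" : "=r"(res) : "m"(val));
	printf("btl imm: %d\n", res);	/* tests bit 63 mod 32 = 31 -> 0 */

	asm("btq $63, %1\n\tsetc %0" : "=r"(res) : "m"(val));
	printf("btq imm: %d\n", res);	/* tests bit 63 -> 1 */

	/* a register bit offset can reach past the operand width */
	asm("btl %2, %1\n\tsetc %0" : "=r"(res) : "m"(val), "r"(bit));
	printf("btl reg: %d\n", res);	/* also reaches bit 63 -> 1 */

	return 0;
}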

J

>
> Tim.
>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86/asm: avoid mnemonics without type suffix

2013-07-14 Thread Jeremy Fitzhardinge
(resent without HTML)

On 07/14/2013 05:56 AM, Ramkumar Ramachandra wrote:
> 1c54d77 (x86: partial unification of asm-x86/bitops.h, 2008-01-30)
> changed a bunch of btrl/btsl instructions to btr/bts, with the following
> justification:
>
>   The inline assembly for the bit operations has been changed to remove
>   explicit sizing hints on the instructions, so the assembler will pick
>   the appropriate instruction forms depending on the architecture and
>   the context.
>
> Unfortunately, GNU as does no such thing, and the AT&T syntax manual
> [1] contains no references to any such inference.  As evidenced by the
> following experiment, gas always disambiguates btr/bts to btrl/btsl.
> Feed the following input to gas:
>
>   btrl$1, 0
>   btr $1, 0
>   btsl$1, 0
>   bts $1, 0

When I originally did those patches, I was careful to make sure that we
didn't give implied sizes to operations with only immediate and/or
memory operands because - in general - gas can't infer the operation
size from such operands. However, in the case of the bit test/set
operations, the memory access size is not really derived from the
operation size (the SDM is a bit vague), and even if it were it would be
an operational rather than a semantic difference.  So there's no real
problem with gas choosing 'l' as a default size in the absence of any
explicit override or constraint.

> Check that btr matches btrl, and bts matches btsl in both cases:
>
>   $ as --32 -a in.s
>   $ as --64 -a in.s
>
> To avoid giving readers the illusion of such an inference, and for
> clarity, change btr/bts back to btrl/btsl.  Also, llvm-mc refuses to
> disambiguate btr/bts automatically.

That sounds reasonable for all other operations because it makes a real
semantic difference, but overly strict for bit operations.

    J


> [1]: http://docs.oracle.com/cd/E19253-01/817-5477/817-5477.pdf
>
> Cc: Jeremy Fitzhardinge 
> Cc: Andi Kleen 
> Cc: Linus Torvalds 
> Cc: Ingo Molnar 
> Cc: Thomas Gleixner 
> Cc: Eli Friedman 
> Cc: Jim Grosbach 
> Cc: Stephen Checkoway 
> Cc: LLVMdev 
> Signed-off-by: Ramkumar Ramachandra 
> ---
>  We discussed this pretty extensively on LLVMDev, but I'm still not
>  sure that I haven't missed something.
>
>  arch/x86/include/asm/bitops.h | 16 
>  arch/x86/include/asm/percpu.h |  2 +-
>  2 files changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
> index 6dfd019..6ed3d1e 100644
> --- a/arch/x86/include/asm/bitops.h
> +++ b/arch/x86/include/asm/bitops.h
> @@ -67,7 +67,7 @@ set_bit(unsigned int nr, volatile unsigned long *addr)
>   : "iq" ((u8)CONST_MASK(nr))
>   : "memory");
>   } else {
> - asm volatile(LOCK_PREFIX "bts %1,%0"
> + asm volatile(LOCK_PREFIX "btsl %1,%0"
>   : BITOP_ADDR(addr) : "Ir" (nr) : "memory");
>   }
>  }
> @@ -83,7 +83,7 @@ set_bit(unsigned int nr, volatile unsigned long *addr)
>   */
>  static inline void __set_bit(int nr, volatile unsigned long *addr)
>  {
> - asm volatile("bts %1,%0" : ADDR : "Ir" (nr) : "memory");
> + asm volatile("btsl %1,%0" : ADDR : "Ir" (nr) : "memory");
>  }
>  
>  /**
> @@ -104,7 +104,7 @@ clear_bit(int nr, volatile unsigned long *addr)
>   : CONST_MASK_ADDR(nr, addr)
>   : "iq" ((u8)~CONST_MASK(nr)));
>   } else {
> - asm volatile(LOCK_PREFIX "btr %1,%0"
> + asm volatile(LOCK_PREFIX "btrl %1,%0"
>   : BITOP_ADDR(addr)
>   : "Ir" (nr));
>   }
> @@ -126,7 +126,7 @@ static inline void clear_bit_unlock(unsigned nr, volatile 
> unsigned long *addr)
>  
>  static inline void __clear_bit(int nr, volatile unsigned long *addr)
>  {
> - asm volatile("btr %1,%0" : ADDR : "Ir" (nr));
> + asm volatile("btrl %1,%0" : ADDR : "Ir" (nr));
>  }
>  
>  /*
> @@ -198,7 +198,7 @@ static inline int test_and_set_bit(int nr, volatile 
> unsigned long *addr)
>  {
>   int oldbit;
>  
> - asm volatile(LOCK_PREFIX "bts %2,%1\n\t"
> + asm volatile(LOCK_PREFIX "btsl %2,%1\n\t"
>"sbb %0,%0" : "=r" (oldbit), ADDR : "Ir" (nr) : "memory");
>  
>   return oldbit;
> @@ -230,7 +230,7 @@ static inline int __test_and_set_bit(int nr, volatile 
> unsigned long *addr)
>  {
>   int oldbit;
>  
> - asm("bts %

Re: [PATCH] x86/asm: avoid mnemonics without type suffix

2013-07-14 Thread Jeremy Fitzhardinge
(Resent without HTML)

On 07/14/2013 10:19 AM, Linus Torvalds wrote:
> Now, there are possible cases where you want to make the size explicit
> because you are mixing memory operand sizes and there can be nasty
> performance implications of doing a 32-bit write and then doing a
> 64-bit read of the result. I'm not actually aware of us having ever
> worried/cared about it, but it's a possible source of trouble: mixing
> bitop instructions with non-bitop instructions can have some subtle
> interactions, and you need to be careful, since the size of the
> operand affects both the offset *and* the memory access size.
The SDM entry for BT mentions that the instruction may touch 2 or 4
bytes depending on the operand size, but doesn't specifically mention
that a 64 bit operation size touches 8 bytes - and it doesn't mention
anything at all about operand size and access size in BTR/BTS/BTC
(unless it's implied as part of the discussion about encoding the MSBs
of a constant bit offset in the offset of the addressing mode). Is that
an oversight?

>  The
> access size generally is meaningless from a semantic standpoint
> (little-endian being the only sane model), but the access size *can*
> have performance implications for the write queue forwarding.

It looks like if the base address isn't aligned then neither is the
generated access, so you could get a protection fault if it overlaps a
page boundary, which is a semantic rather than purely operational
difference.

J

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC V9 1/19] x86/spinlock: Replace pv spinlocks with pv ticketlocks

2013-06-01 Thread Jeremy Fitzhardinge
On 06/01/2013 12:21 PM, Raghavendra K T wrote:
> x86/spinlock: Replace pv spinlocks with pv ticketlocks
>
> From: Jeremy Fitzhardinge 
I'm not sure what the etiquette is here; I did the work while at Citrix,
but jer...@goop.org is my canonical email address.  The Citrix address
is dead and bounces, so is useless for anything.  Probably best to
change it.

J

>
> Rather than outright replacing the entire spinlock implementation in
> order to paravirtualize it, keep the ticket lock implementation but add
> a couple of pvops hooks on the slow path (long spin on lock, unlocking
> a contended lock).
>
> Ticket locks have a number of nice properties, but they also have some
> surprising behaviours in virtual environments.  They enforce a strict
> FIFO ordering on cpus trying to take a lock; however, if the hypervisor
> scheduler does not schedule the cpus in the correct order, the system can
> waste a huge amount of time spinning until the next cpu can take the lock.
>
> (See Thomas Friebel's talk "Prevent Guests from Spinning Around"
> http://www.xen.org/files/xensummitboston08/LHP.pdf for more details.)
>
> To address this, we add two hooks:
>  - __ticket_spin_lock which is called after the cpu has been
>spinning on the lock for a significant number of iterations but has
>failed to take the lock (presumably because the cpu holding the lock
>has been descheduled).  The lock_spinning pvop is expected to block
>the cpu until it has been kicked by the current lock holder.
>  - __ticket_spin_unlock, which on releasing a contended lock
>(there are more cpus with tail tickets), it looks to see if the next
>cpu is blocked and wakes it if so.
>
> When compiled with CONFIG_PARAVIRT_SPINLOCKS disabled, a set of stub
> functions causes all the extra code to go away.
>
> Signed-off-by: Jeremy Fitzhardinge 
> Reviewed-by: Konrad Rzeszutek Wilk 
> Tested-by: Attilio Rao 
> [ Raghavendra: Changed SPIN_THRESHOLD ]
> Signed-off-by: Raghavendra K T 
> ---
>  arch/x86/include/asm/paravirt.h   |   32 
>  arch/x86/include/asm/paravirt_types.h |   10 ++
>  arch/x86/include/asm/spinlock.h   |   53 
> +++--
>  arch/x86/include/asm/spinlock_types.h |4 --
>  arch/x86/kernel/paravirt-spinlocks.c  |   15 +
>  arch/x86/xen/spinlock.c   |8 -
>  6 files changed, 61 insertions(+), 61 deletions(-)
>
> diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
> index cfdc9ee..040e72d 100644
> --- a/arch/x86/include/asm/paravirt.h
> +++ b/arch/x86/include/asm/paravirt.h
> @@ -712,36 +712,16 @@ static inline void __set_fixmap(unsigned /* enum 
> fixed_addresses */ idx,
>  
>  #if defined(CONFIG_SMP) && defined(CONFIG_PARAVIRT_SPINLOCKS)
>  
> -static inline int arch_spin_is_locked(struct arch_spinlock *lock)
> +static __always_inline void __ticket_lock_spinning(struct arch_spinlock 
> *lock,
> + __ticket_t ticket)
>  {
> - return PVOP_CALL1(int, pv_lock_ops.spin_is_locked, lock);
> + PVOP_VCALL2(pv_lock_ops.lock_spinning, lock, ticket);
>  }
>  
> -static inline int arch_spin_is_contended(struct arch_spinlock *lock)
> +static __always_inline void ticket_unlock_kick(struct arch_spinlock 
> *lock,
> + __ticket_t ticket)
>  {
> - return PVOP_CALL1(int, pv_lock_ops.spin_is_contended, lock);
> -}
> -#define arch_spin_is_contended   arch_spin_is_contended
> -
> -static __always_inline void arch_spin_lock(struct arch_spinlock *lock)
> -{
> - PVOP_VCALL1(pv_lock_ops.spin_lock, lock);
> -}
> -
> -static __always_inline void arch_spin_lock_flags(struct arch_spinlock *lock,
> -   unsigned long flags)
> -{
> - PVOP_VCALL2(pv_lock_ops.spin_lock_flags, lock, flags);
> -}
> -
> -static __always_inline int arch_spin_trylock(struct arch_spinlock *lock)
> -{
> - return PVOP_CALL1(int, pv_lock_ops.spin_trylock, lock);
> -}
> -
> -static __always_inline void arch_spin_unlock(struct arch_spinlock *lock)
> -{
> - PVOP_VCALL1(pv_lock_ops.spin_unlock, lock);
> + PVOP_VCALL2(pv_lock_ops.unlock_kick, lock, ticket);
>  }
>  
>  #endif
> diff --git a/arch/x86/include/asm/paravirt_types.h 
> b/arch/x86/include/asm/paravirt_types.h
> index 0db1fca..d5deb6d 100644
> --- a/arch/x86/include/asm/paravirt_types.h
> +++ b/arch/x86/include/asm/paravirt_types.h
> @@ -327,13 +327,11 @@ struct pv_mmu_ops {
>  };
>  
>  struct arch_spinlock;
> +#include 
> +
>  struct pv_lock_ops {
> - int (*spin_is_locked)(struct arch_

Re: [PATCH RFC V9 0/19] Paravirtualized ticket spinlocks

2013-06-01 Thread Jeremy Fitzhardinge
On 06/01/2013 01:14 PM, Andi Kleen wrote:
> FWIW I use the paravirt spinlock ops for adding lock elision
> to the spinlocks.

Does lock elision still use the ticketlock algorithm/structure, or are
they different?  If they're still basically ticketlocks, then it seems
to me that they're complementary - hle handles the fastpath, and pv the
slowpath.
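
(To make that concrete, a purely illustrative sketch - every function name
below is invented for the example and is not the real locking API:)

/* hypothetical composition: elision fastpath, pv ticketlock slowpath */
static inline void spin_lock_sketch(arch_spinlock_t *lock)
{
	unsigned ticket, spins = 0;

	if (elide_lock(lock))			/* HLE/RTM fastpath: try to elide */
		return;

	ticket = take_ticket(lock);		/* ordinary ticketlock path */
	while (!ticket_is_ours(lock, ticket)) {
		cpu_relax();
		if (++spins > SPIN_THRESHOLD)		/* pv slowpath: block until the */
			pv_lock_spinning(lock, ticket);	/* holder kicks this ticket */
	}
}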

> This needs to be done at the top level (so the level you're removing)
>
> However I don't like the pv mechanism very much and would 
> be fine with using an static key hook in the main path
> like I do for all the other lock types.

Right.

J
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] Drivers: hv: Add Hyper-V balloon driver

2012-10-10 Thread Jeremy Fitzhardinge
On 10/09/2012 06:14 PM, Andrew Morton wrote:
> On Wed, 10 Oct 2012 00:09:12 + KY Srinivasan  wrote:
>
 +  if (!pg) {
 +  *alloc_error = true;
 +  return i * alloc_unit;
 +  }
 +
 +  totalram_pages -= alloc_unit;
>>> Well, I'd consider totalram_pages to be an mm-private thing which drivers
>>> shouldn't muck with.  Why is this done?
>> By modifying the totalram_pages, the information presented in /proc/meminfo
>> correctly reflects what is currently assigned to the guest (MemTotal).
> eh?  /proc/meminfo:MemTotal tells you the total memory in the machine. 
> The only thing which should change it after boot is memory hotplug. 
[...]
> Why on earth do balloon drivers do this?  If the amount of memory which
> is consumed by balloons is interesting then it should be exported via a
> standalone metric, not by mucking with totalram_pages.

Balloon drivers are trying to fake a form of page-by-page memory
hotplug.  When they allocate memory from the kernel, they're actually
giving the pages back to the hypervisor to redistribute to other
guests.  They reduce totalram_pages to try to reflect that the memory
is no longer available to the kernel (in Xen, at least, the pfns will no longer have
any physical page underlying them).

I agree this is pretty ugly; it would be nice to have some better
interface to indicate what's going on.  At one point I tried to use the
memory hotplug interfaces for larger-scale dynamic transfers of memory
between a domain and the host, but when I last looked at it, it was too
coarse grained and heavyweight to replace the balloon mechanism.
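
(A purely illustrative sketch of the pattern being described - the hypercall
wrapper name is made up, and this is not the Xen or Hyper-V driver code:)

/* balloon "inflate": hand pages back to the hypervisor and account for it */
static LIST_HEAD(ballooned_pages);

static int balloon_inflate_sketch(unsigned long nr_pages)
{
	unsigned long i;

	for (i = 0; i < nr_pages; i++) {
		struct page *page = alloc_page(GFP_HIGHUSER | __GFP_NORETRY);

		if (!page)
			return -ENOMEM;

		give_page_to_hypervisor(page);		/* hypothetical hypercall wrapper */
		list_add(&page->lru, &ballooned_pages);	/* remember it so it can be deflated */
		totalram_pages--;			/* the accounting being questioned above */
	}
	return 0;
}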

J
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 53/74] x86, lto, paravirt: Make paravirt thunks global

2012-08-19 Thread Jeremy Fitzhardinge
On 08/18/2012 07:56 PM, Andi Kleen wrote:
> From: Andi Kleen 
>
> The paravirt thunks use a hack of using a static reference to a static
> function to reference that function from the top level statement.
>
> This assumes that gcc always generates static function names in a specific
> format, which is not necessarily true.
>
> Simply make these functions global and asmlinkage. This way the
> static __used variables are not needed and everything works.

I'm not a huge fan of unstaticing all this stuff, but it doesn't
surprise me that the current code is brittle in the face of gcc changes.

J

>
> Changed in paravirt and in all users (Xen and vsmp)
>
> Cc: jer...@goop.org
> Signed-off-by: Andi Kleen 
> ---
>  arch/x86/include/asm/paravirt.h |2 +-
>  arch/x86/kernel/vsmp_64.c   |8 
>  arch/x86/xen/irq.c  |8 
>  arch/x86/xen/mmu.c  |   16 
>  4 files changed, 17 insertions(+), 17 deletions(-)
>
> diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
> index a0facf3..cc733a6 100644
> --- a/arch/x86/include/asm/paravirt.h
> +++ b/arch/x86/include/asm/paravirt.h
> @@ -804,9 +804,9 @@ static __always_inline void arch_spin_unlock(struct 
> arch_spinlock *lock)
>   */
>  #define PV_CALLEE_SAVE_REGS_THUNK(func)  
> \
>   extern typeof(func) __raw_callee_save_##func;   \
> - static void *__##func##__ __used = func;\
>   \
>   asm(".pushsection .text;"   \
> + ".globl __raw_callee_save_" #func " ; " \
>   "__raw_callee_save_" #func ": " \
>   PV_SAVE_ALL_CALLER_REGS \
>   "call " #func ";"   \
> diff --git a/arch/x86/kernel/vsmp_64.c b/arch/x86/kernel/vsmp_64.c
> index 992f890..f393d6d 100644
> --- a/arch/x86/kernel/vsmp_64.c
> +++ b/arch/x86/kernel/vsmp_64.c
> @@ -33,7 +33,7 @@
>   * and vice versa.
>   */
>  
> -static unsigned long vsmp_save_fl(void)
> +asmlinkage unsigned long vsmp_save_fl(void)
>  {
>   unsigned long flags = native_save_fl();
>  
> @@ -43,7 +43,7 @@ static unsigned long vsmp_save_fl(void)
>  }
>  PV_CALLEE_SAVE_REGS_THUNK(vsmp_save_fl);
>  
> -static void vsmp_restore_fl(unsigned long flags)
> +asmlinkage void vsmp_restore_fl(unsigned long flags)
>  {
>   if (flags & X86_EFLAGS_IF)
>   flags &= ~X86_EFLAGS_AC;
> @@ -53,7 +53,7 @@ static void vsmp_restore_fl(unsigned long flags)
>  }
>  PV_CALLEE_SAVE_REGS_THUNK(vsmp_restore_fl);
>  
> -static void vsmp_irq_disable(void)
> +asmlinkage void vsmp_irq_disable(void)
>  {
>   unsigned long flags = native_save_fl();
>  
> @@ -61,7 +61,7 @@ static void vsmp_irq_disable(void)
>  }
>  PV_CALLEE_SAVE_REGS_THUNK(vsmp_irq_disable);
>  
> -static void vsmp_irq_enable(void)
> +asmlinkage void vsmp_irq_enable(void)
>  {
>   unsigned long flags = native_save_fl();
>  
> diff --git a/arch/x86/xen/irq.c b/arch/x86/xen/irq.c
> index 1573376..3dd8831 100644
> --- a/arch/x86/xen/irq.c
> +++ b/arch/x86/xen/irq.c
> @@ -21,7 +21,7 @@ void xen_force_evtchn_callback(void)
>   (void)HYPERVISOR_xen_version(0, NULL);
>  }
>  
> -static unsigned long xen_save_fl(void)
> +asmlinkage unsigned long xen_save_fl(void)
>  {
>   struct vcpu_info *vcpu;
>   unsigned long flags;
> @@ -39,7 +39,7 @@ static unsigned long xen_save_fl(void)
>  }
>  PV_CALLEE_SAVE_REGS_THUNK(xen_save_fl);
>  
> -static void xen_restore_fl(unsigned long flags)
> +asmlinkage void xen_restore_fl(unsigned long flags)
>  {
>   struct vcpu_info *vcpu;
>  
> @@ -66,7 +66,7 @@ static void xen_restore_fl(unsigned long flags)
>  }
>  PV_CALLEE_SAVE_REGS_THUNK(xen_restore_fl);
>  
> -static void xen_irq_disable(void)
> +asmlinkage void xen_irq_disable(void)
>  {
>   /* There's a one instruction preempt window here.  We need to
>  make sure we're don't switch CPUs between getting the vcpu
> @@ -77,7 +77,7 @@ static void xen_irq_disable(void)
>  }
>  PV_CALLEE_SAVE_REGS_THUNK(xen_irq_disable);
>  
> -static void xen_irq_enable(void)
> +asmlinkage void xen_irq_enable(void)
>  {
>   struct vcpu_info *vcpu;
>  
> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> index b65a761..9f82443 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -429,7 +429,7 @@ static pteval_t iomap_pte(pteval_t val)
>   return val;
>  }
>  
> -static pteval_t xen_pte_val(pte_t pte)
> +asmlinkage pteval_t xen_pte_val(pte_t pte)
>  {
>   pteval_t pteval = pte.pte;
>  #if 0
> @@ -446,7 +446,7 @@ static pteval_t xen_pte_val(pte_t pte)
>  }
>  PV_CALLEE_SAVE_REGS_THUNK(xen_pte_val);
>  
> -static pgdval_t xen_pgd_val(pgd_t pgd)
> +asmlinkage pgdval_t xen_pgd_val(pgd_t pgd)
>  {
>   return pte_mfn_to_pfn(pgd.pgd);
>  }
> @@ 

Re: [PATCH 52/74] x86, lto, paravirt: Don't rely on local assembler labels

2012-08-19 Thread Jeremy Fitzhardinge
On 08/18/2012 07:56 PM, Andi Kleen wrote:
> From: Andi Kleen 
>
> The paravirt patching code assumes that it can reference a
> local assembler label between two different top level assembler
> statements. This does not work with some experimental gcc builds,
> where the assembler code may end up in different assembler files.

Egad, what are those zany gcc chaps up to now?

J

>
> Replace it with extern / global /asm linkage labels.
>
> This also removes one redundant copy of the macro.
>
> Cc: jer...@goop.org
> Signed-off-by: Andi Kleen 
> ---
>  arch/x86/include/asm/paravirt_types.h |9 +
>  arch/x86/kernel/paravirt.c|5 -
>  2 files changed, 5 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/include/asm/paravirt_types.h 
> b/arch/x86/include/asm/paravirt_types.h
> index 4f262bc..6a464ba 100644
> --- a/arch/x86/include/asm/paravirt_types.h
> +++ b/arch/x86/include/asm/paravirt_types.h
> @@ -385,10 +385,11 @@ extern struct pv_lock_ops pv_lock_ops;
>   _paravirt_alt(insn_string, "%c[paravirt_typenum]", 
> "%c[paravirt_clobber]")
>  
>  /* Simple instruction patching code. */
> -#define DEF_NATIVE(ops, name, code)  \
> - extern const char start_##ops##_##name[] __visible, \
> -   end_##ops##_##name[] __visible;   \
> - asm("start_" #ops "_" #name ": " code "; end_" #ops "_" #name ":")
> +#define NATIVE_LABEL(a,x,b) "\n\t.globl " a #x "_" #b "\n" a #x "_" #b 
> ":\n\t"
> +
> +#define DEF_NATIVE(ops, name, code)  \
> + __visible extern const char start_##ops##_##name[], 
> end_##ops##_##name[];   \
> + asm(NATIVE_LABEL("start_", ops, name) code NATIVE_LABEL("end_", ops, 
> name))
>  
>  unsigned paravirt_patch_nop(void);
>  unsigned paravirt_patch_ident_32(void *insnbuf, unsigned len);
> diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
> index 17fff18..947255e 100644
> --- a/arch/x86/kernel/paravirt.c
> +++ b/arch/x86/kernel/paravirt.c
> @@ -62,11 +62,6 @@ void __init default_banner(void)
>  pv_info.name);
>  }
>  
> -/* Simple instruction patching code. */
> -#define DEF_NATIVE(ops, name, code)  \
> - extern const char start_##ops##_##name[], end_##ops##_##name[]; \
> - asm("start_" #ops "_" #name ": " code "; end_" #ops "_" #name ":")
> -
>  /* Undefined instruction for dealing with missing ops pointers. */
>  static const unsigned char ud2a[] = { 0x0f, 0x0b };
>  

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] netvm: check for page == NULL when propogating the skb->pfmemalloc flag

2012-08-13 Thread Jeremy Fitzhardinge
On 08/13/2012 03:47 AM, Mel Gorman wrote:
> Resending to correct Jeremy's address.
>
> On Wed, Aug 08, 2012 at 03:50:46PM -0700, David Miller wrote:
>> From: Mel Gorman 
>> Date: Tue, 7 Aug 2012 09:55:55 +0100
>>
>>> Commit [c48a11c7: netvm: propagate page->pfmemalloc to skb] is responsible
>>> for the following bug triggered by a xen network driver
>>  ...
>>> The problem is that the xenfront driver is passing a NULL page to
>>> __skb_fill_page_desc() which was unexpected. This patch checks that
>>> there is a page before dereferencing.
>>>
>>> Reported-and-Tested-by: Konrad Rzeszutek Wilk 
>>> Signed-off-by: Mel Gorman 
>> That call to __skb_fill_page_desc() in xen-netfront.c looks completely bogus.
>> It's the only driver passing NULL here.
>>
>> That whole song and dance figuring out what to do with the head
>> fragment page, depending upon whether the length is greater than the
>> RX_COPY_THRESHOLD, is completely unnecessary.
>>
>> Just use something like a call to __pskb_pull_tail(skb, len) and all
>> that other crap around that area can simply be deleted.
> I looked at this for a while but I did not see how __pskb_pull_tail()
> could be used sensibly, though I'm simply not familiar with writing network
> device drivers or Xen.
>
> This messing with RX_COPY_THRESHOLD seems to be related to how the frontend
> and backend communicate (maybe some fixed limitation of the xenbus). The
> existing code looks like it is trying to take the fragments received and
> pass them straight to the backend without copying. I worry that if I try
> converting this to __pskb_pull_tail() it would either hit the limitation of
> xenbus or introduce copying where it is not wanted.
>
> I'm going to have to punt this to Jeremy and the other Xen folk as I'm not
> sure what the original intention was and I don't have a Xen setup anywhere
> to test any patch. Jeremy, xen folk? 

It's been a while since I've looked at that stuff, but as I remember,
the issue is that since the packet ring memory is shared with another
domain which may be untrustworthy, we want to make copies of the headers
before making any decisions based on them so that the other domain can't
change them after header processing but before they're actually sent. 
(The packet payload is considered less important, but of course the same
issue applies if you're using some kind of content-aware packet filter.)

So that's the rationale for always copying RX_COPY_THRESHOLD, even if
the packet is larger than that amount.  As far as I know, changing this
behaviour wouldn't break the ring protocol, but it does introduce a
potential security issue.
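
(A tiny sketch of that concern, with made-up helper names rather than the
actual netfront code: anything the other domain can still write to has to
be snapshotted into private memory before it is parsed.)

static void handle_rx_sketch(const void *shared_hdr, struct sk_buff *skb)
{
	struct ethhdr hdr_copy;

	/* shared_hdr points into the shared ring page, which the other
	 * domain can modify at any time; decisions use the private copy. */
	memcpy(&hdr_copy, shared_hdr, sizeof(hdr_copy));

	if (classify_packet(&hdr_copy))		/* hypothetical policy check */
		deliver_packet(skb);		/* later writes to shared_hdr can no
						 * longer influence this decision */
}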

J

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86/mm: stop allocating pmd page if failed

2012-07-25 Thread Jeremy Fitzhardinge
On 07/24/2012 06:15 AM, Yuanhan Liu wrote:
> The old code would call __get_free_page() even though a previous
> allocation had already failed. This is not needed.

Yeah, I guess, but it's hardly worth changing.

J


>
> Signed-off-by: Yuanhan Liu 
> Cc: Jeremy Fitzhardinge 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: "H. Peter Anvin" 
> ---
>  arch/x86/mm/pgtable.c |   18 +-
>  1 files changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
> index 8573b83..6760348 100644
> --- a/arch/x86/mm/pgtable.c
> +++ b/arch/x86/mm/pgtable.c
> @@ -181,24 +181,24 @@ static void free_pmds(pmd_t *pmds[])
>  {
>   int i;
>  
> - for(i = 0; i < PREALLOCATED_PMDS; i++)
> - if (pmds[i])
> - free_page((unsigned long)pmds[i]);
> + for(i = 0; i < PREALLOCATED_PMDS; i++) {
> + if (pmds[i] == NULL)
> + break;
> + free_page((unsigned long)pmds[i]);
> + }
>  }
>  
>  static int preallocate_pmds(pmd_t *pmds[])
>  {
>   int i;
> - bool failed = false;
>  
>   for(i = 0; i < PREALLOCATED_PMDS; i++) {
> - pmd_t *pmd = (pmd_t *)__get_free_page(PGALLOC_GFP);
> - if (pmd == NULL)
> - failed = true;
> - pmds[i] = pmd;
> + pmds[i] = (pmd_t *)__get_free_page(PGALLOC_GFP);
> + if (pmds[i] == NULL)
> + break;
>   }
>  
> - if (failed) {
> + if (i < PREALLOCATED_PMDS) {
>   free_pmds(pmds);
>   return -ENOMEM;
>   }

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.25-rc1 xen pvops regression

2008-02-26 Thread Jeremy Fitzhardinge

Mark McLoughlin wrote:

@@ -371,6 +372,9 @@ void __init dmi_scan_machine(void)
}
}
else {
+   if (e820_all_mapped(0xF, 0xF+0x1, E820_RAM))
+   goto out;



One issue with using the e820 map for this is that a Xen Dom0 will also
have this region marked as RAM in the e820 map, but will set up a fixmap
for it, allowing dmi_scan_machine() to map the region.
  


Would it be easier to just fake up a mapping so that window points to 
the real dmi area, and mark E820 accordingly?


   J
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: compile problem in current x86.git

2008-02-25 Thread Jeremy Fitzhardinge

Ingo Molnar wrote:

Jeremy, you might want to start tracking x86.git#testing:

  http://people.redhat.com/mingo/x86.git/README

if you want to follow the latest & greatest x86.git code.
  


Right, will do.

   J
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: compile problem in current x86.git

2008-02-25 Thread Jeremy Fitzhardinge

Ingo Molnar wrote:

* Vegard Nossum <[EMAIL PROTECTED]> wrote:

  
 asm-x86/kmemcheck.h does seem to be completely missing.  Looks like 
 8db0acefb3025795abe3f37669354677a03de680 "x86: add hooks for 
 kmemcheck" should have added the file.
  
Hm. This is x86#testing, no? I don't think there's any kmemcheck code 
whatsoever in other branches.


The file should be added with this commit:

kmemcheck: add the kmemcheck core 
http://git.kernel.org/?p=linux/kernel/git/x86/linux-2.6-x86.git;a=commit;h=c83d05d69382945c92a2e7a2b168c1cc2aa77c29



yes, x86.git looks fine here too:

 ~/linux.trees.git> git-checkout -b tmp x86/testing
 Branch tmp set up to track remote branch refs/remotes/x86/testing.
 Switched to a new branch "tmp"
 ~/linux.trees.git> cd include/asm-x86/
 ~/linux.trees.git/include/asm-x86> ls -l kmemcheck.h
 -rw-rw-r-- 1 mingo mingo 55 2008-02-25 21:41 kmemcheck.h
 ~/linux.trees.git/include/asm-x86> cd ..
 ~/linux.trees.git/include> cd ..
 ~/linux.trees.git> ls -ldt include/asm-x86/kmemcheck.h
 -rw-rw-r-- 1 mingo mingo 55 2008-02-25 21:41 include/asm-x86/kmemcheck.h
 ~/linux.trees.git> git-log | head -1
 commit c9d2f5489cec70f814bf64033290e5f05b4d7f33


I'm using #mm.  Should I be using #testing?

   J
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


compile problem in current x86.git

2008-02-25 Thread Jeremy Fitzhardinge

 CC  arch/x86/kernel/traps_32.o
/home/jeremy/hg/xen/paravirt/linux/arch/x86/kernel/traps_32.c:59:27: error: 
asm/kmemcheck.h: No such file or directory


asm-x86/kmemcheck.h does seem to be completely missing.  Looks like 
8db0acefb3025795abe3f37669354677a03de680 "x86: add hooks for kmemcheck" 
should have added the file.


   J
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] xen pvfb: Para-virtual framebuffer, keyboard and pointer driver

2008-02-22 Thread Jeremy Fitzhardinge

Markus Armbruster wrote:

Jeremy Fitzhardinge <[EMAIL PROTECTED]> writes:

  

Markus Armbruster wrote:


This is a pair of Xen para-virtual frontend device drivers:
drivers/video/xen-fbfront.c provides a framebuffer, and
drivers/input/xen-kbdfront provides keyboard and mouse.
  
  

Unless they're actually inter-dependent, could you post this as two
separate patches?  I don't know anything about these parts of the
kernel, so it would be nice to make it very obvious which changes are
fb vs mouse/keyboard.



I could do that do that, but the intermediate step (one driver, not
the other) is somewhat problematic: the backend in dom0 needs both
drivers, and will refuse to complete device initialization unless
they're both present.
  


That's OK.  In that case keep them together.

   J
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH] let XEN depend on PAE

2008-02-22 Thread Jeremy Fitzhardinge

Arnd Hannemann wrote:

Jeremy Fitzhardinge wrote:
  

Arnd Hannemann wrote:


This is with 2.6.24.2, but latest-git looks the same:
I also tried with 2.6.23 which crashes instantly, without any output
of the guest.
  
  

I'm not too surprised.  Non-PAE Xen is a bit of a rarity, and it only
gets tested rarely.  Chris Wright did spend some time on it a while ago,
but I don't know that its had any real attention since.  I've been
making sure non-PAE compiles, but I've been lax about testing it.
This is the first usermode exec, I guess?  The backtrace is a bit odd;
I've never seen a problem in move_page_tables before.



Yes its trying to execute the first script in initramfs, I also tried with 
initramdisk
and got a similar error. (move_page_tables also involved)

  

Does "xm dmesg" tell you what Xen is complaining about?  You may need to
compile with debug=y in Config.mk.



(XEN) mm.c:645:d44 Non-privileged (44) attempt to map I/O space 

I will recompile with debug=y and post the output.
If I reduce the dom0 memory with dom0_mem=20 I see something like
0080 with dom0_mem=80 I always see .
  


That's helpful.  Looks like the mfn is getting mushed to 0.

   J
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] xen: Implement getgeo for Xen virtual block device.

2008-02-22 Thread Jeremy Fitzhardinge

Linus Torvalds wrote:
This isn't a problem with things like "Signed-off-by:" etc tags, because 
they have no automated meaning and don't really change the commit itself, 
but the "From:"/"Date:"/"Subject:" markers at the head of the message 
really do have real meaning, and get removed from the commit message and 
instead get put into the SCM headers.
  


It may be worth having a definitive and unambiguous Author: tag then, 
which can appear among Signed-off-by:s and is used in preference to 
anything else.  From: is a useful heuristic which seems to work well in 
general, but as you say, it gets a bit hairy when you have something 
which means different things to different parts of the software stack at 
the same time.


   J
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH] let XEN depend on PAE

2008-02-22 Thread Jeremy Fitzhardinge

Arnd Hannemann wrote:

This is with 2.6.24.2, but latest-git looks the same:
I also tried with 2.6.23 which crashes instantly, without any output of the 
guest.
  


I'm not too surprised.  Non-PAE Xen is a bit of a rarity, and it only 
gets tested rarely.  Chris Wright did spend some time on it a while ago, 
but I don't know that its had any real attention since.  I've been 
making sure non-PAE compiles, but I've been lax about testing it.


This is the first usermode exec, I guess?  The backtrace is a bit odd; 
I've never seen a problem in move_page_tables before.


Does "xm dmesg" tell you what Xen is complaining about?  You may need to 
compile with debug=y in Config.mk.



[0.599806] 1 multicall(s) failed: cpu 0
[0.599816]   call  1/2: op=26 arg=[c1051860] result=0
[0.599825]   call  2/2: op=14 arg=[bf9c7000] result=-22
[0.599841] [ cut here ]
[0.599851] kernel BUG at arch/x86/xen/multicalls.c:103!
[0.599861] invalid opcode:  [#1] SMP
[0.599871] Modules linked in:
[0.599879]
[0.599885] Pid: 1, comm: init Not tainted (2.6.24.2 #6)
[0.599895] EIP: 0061:[<c0101b7c>] EFLAGS: 00010202 CPU: 0
[0.599910] EIP is at xen_mc_flush+0x19c/0x1b0
[0.599919] EAX:  EBX: c10510a0 ECX: c1051060 EDX: c1051060
[0.599930] ESI: 0002 EDI: 0001 EBP: c2417c10 ESP: c2417be4
[0.599940]  DS: 007b ES: 007b FS: 00d8 GS:  SS: e021
[0.599951] Process init (pid: 1, ti=c2417000 task=c2416ab0 task.ti=c2417000)
[0.599960] Stack: c0443c98 0002 0002 000e bf9c7000 ffea 
c1051060 0200
[0.599984]0067 c193fffc bf9c7000 c2417c18 c0101112 c2417c5c 
c0166dfc c193ce40
[0.66]c193e5c0 c000 c193e5c0 1000 c000 c193ce40 
c198e71c c10331cc
[0.600029] Call Trace:
[0.600036]  [<c0107a6a>] show_trace_log_lvl+0x1a/0x30
[0.600050]  [<c0107b29>] show_stack_log_lvl+0xa9/0xd0
[0.600062]  [<c0107c1a>] show_registers+0xca/0x1e0
[0.600074]  [<c0107e4a>] die+0x11a/0x250
[0.600085]  [<c0108003>] do_trap+0x83/0xb0
[0.600096]  [<c0108318>] do_invalid_op+0x88/0xa0
[0.600108]  [<c03e89d2>] error_code+0x72/0x80
[0.600121]  [<c0101112>] xen_leave_lazy+0x12/0x20
[0.600134]  [<c0166dfc>] move_page_tables+0x27c/0x300
[0.600149]  [<c0174762>] setup_arg_pages+0x162/0x2a0
[0.600162]  [<c019cad3>] load_elf_binary+0x3d3/0x1bd0
[0.600175]  [<c0173f92>] search_binary_handler+0x92/0x200
[0.600190]  [<c019b1ef>] load_script+0x1bf/0x200
[0.600202]  [<c0173f92>] search_binary_handler+0x92/0x200
[0.600215]  [<c0175bab>] do_execve+0x15b/0x180
[0.600227]  [<c0104a2e>] sys_execve+0x2e/0x80
[0.600241]  [<c0106342>] syscall_call+0x7/0xb
[0.600253]  ===
[0.600259] Code: 24 08 89 44 24 0c 89 74 24 04 c7 04 24 98 3c 44 c0 e8 c9 36 02 
00 8b 45 ec 83 c3 20 8b 90 00 0b 00 00 39 d6 72 c0 e9 04 ff ff ff <0f> 0b eb fe 
0f 0b eb fe 8d b6 00 00 00 00 8d bf 00 00 00 00 55
[0.600370] EIP: [<c0101b7c>] xen_mc_flush+0x19c/0x1b0 SS:ESP e021:c2417be4
[0.600393] ---[ end trace a686db401f06e173 ]---
[0.600403] Kernel panic - not syncing: Attempted to kill init!

full dmesg, config here:
http://lists.xensource.com/archives/html/xen-devel/2008-02/msg00716.html

Best regards,
Arnd Hannemann

  


   J
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH] let XEN depend on PAE

2008-02-22 Thread Jeremy Fitzhardinge

Arnd Hannemann wrote:

As paravirtualized xen guests won't work with !X86_PAE, change the Kconfig
accordingly.
  


!PAE is supposed to work, but it is a rarely used configuration.  How 
does it fail?


   J
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.25-rc1 xen pvops regression

2008-02-21 Thread Jeremy Fitzhardinge

H. Peter Anvin wrote:

Jeremy Fitzhardinge wrote:


It seems to me that those pages are being handed out as heap pages by 
the early allocator.  In the Xen case this is OK because there's 
nothing magic about them.  But if real hardware doesn't reserve these 
pages in the E820 map, then they could end up being used as regular 
memory by mistake, which is an issue.




No, they couldn't.

On real hardware they'll be memory types 0 or 2, depending on whether 
or not they're marked reserved.


Available RAM is type 1. 


OK.  Well, perhaps Ian's patch could be amended to test to see if the 
e820 map marks the ISA ROM region as normal RAM, and skip it if so?
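
A rough sketch of that check, reusing the e820_all_mapped() helper referenced
elsewhere in this thread; the range below (the legacy BIOS/DMI window) and the
early-exit label are illustrative only, not from a posted patch:

    /* If the e820 map says 0xF0000-0xFFFFF is ordinary RAM -- as a Xen
     * domU reports -- there is no ISA ROM/DMI table there to scan. */
    if (e820_all_mapped(0xF0000, 0x100000, E820_RAM))
        goto out;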


   J
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] xen: Implement getgeo for Xen virtual block device.

2008-02-21 Thread Jeremy Fitzhardinge

Linus Torvalds wrote:

On Thu, 21 Feb 2008, Jeremy Fitzhardinge wrote:
  

OK.  Have you fixed it, or shall I resend?



I'll fix it, but I want people to know so that I don't have to fix things 
like this in the future (*).


Linus

(*) I keed, I keed. Of *course* I'll have to fix things like this in the 
future too. But hopefully not quite as often.
  


Putting the From: in the Signed-off-by block is a result of two thoughts:

  1. putting it at the top makes the most sense from an email
 perspective, but it often seem to get lost by various
 patch-posting programs if it gets tangled in the Subject/summary
 part of the patch.  The result is that it needs to float in an odd
 way:

 Subject: wooble the foo

 From: Foo Woobler <[EMAIL PROTECTED]>

 Wooble foos in the appropriate manner.

 Signed-off-by: Foo Woobler <[EMAIL PROTECTED]>
 Cc: Bar Mangler <[EMAIL PROTECTED]> 
 


  2. There's already a block of email addresses which describe how
 people relate to this patch, so why not put From: there (since it
 isn't really an email From header, but a patch metadata header). 
 I'd assumed that tools which pick "Thing: Email" pairs out of a
 patch would deal with From in the same place as a Signed-off-by. 
 After all, tools deal with Cc:s there.



I'll make sure From: is in the right place in future, but I just wanted 
to point out it wasn't complete randomness.


   J
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.25-rc1 xen pvops regression

2008-02-21 Thread Jeremy Fitzhardinge

H. Peter Anvin wrote:
Still curious about why a pagetable page is ending up in that range 
though.  Seems like it shouldn't be possible, since we shouldn't be 
allowed to allocate from those pages, at least until the DMI probe 
has happened...  Unless the early allocator is only excluded from 
e820 reserved pages, which would cause a problem on systems which 
don't reserve the DMI space...  HPA?




I thought the problem was a Xen-provided pagetable from before Linux 
started? 


Hm, I don't think so.  The domain-builder pagetable is put after the 
kernel, so it shouldn't be under 1M.


   J
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.25-rc1 xen pvops regression

2008-02-21 Thread Jeremy Fitzhardinge

Ian Campbell wrote:

I'll see if I can track down where the page is getting used and have a
go at getting in there first. It must be pretty early to be allocated
already when dmi_scan_machine gets called.

It's possible that the domain builder might have already allocated a PT
at this address. I haven't checked but I think currently the domain
builder always puts PT pages after the kernel so hopefully it's only a
theoretical problem.
  


Yes, it does.  And presumably the early pagetable builder is guaranteed 
to avoid special memory like the DMI space.  But the bug definitely 
seems to be a result of the DMI code trying to make a RW mapping of a 
pagetable page, so something is amiss there.


Ooh, sleazy hack idea: make DMI always map RO, so even if it does get a 
pagetable it causes no complaint...  A bit awkward, since there doesn't 
seem to be an RO form of early_ioremap.



Another option I was thinking of was a command line option to disable
DMI, which (maybe) isn't terribly useful in itself but it introduces an
associated variable to frob with. That's similar to how the TSC was
handled in the past (well, the opposite since TSC was forced on).
  


Yep, that would work too.
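
For illustration only, a minimal sketch of such a switch; the parameter name
and the flag are made up here, not taken from any posted patch:

    static int dmi_disabled __initdata;

    static int __init setup_nodmi(char *str)
    {
        dmi_disabled = 1;
        return 1;
    }
    __setup("nodmi", setup_nodmi);

    /* dmi_scan_machine() would then bail out before touching 0xF0000: */
    if (dmi_disabled)
        return;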

Still curious about why a pagetable page is ending up in that range 
though.  Seems like it shouldn't be possible, since we shouldn't be 
allowed to allocate from those pages, at least until the DMI probe has 
happened...  Unless the early allocator is only excluded from e820 
reserved pages, which would cause a problem on systems which don't 
reserve the DMI space...  HPA?


   J
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] xen: Implement getgeo for Xen virtual block device.

2008-02-21 Thread Jeremy Fitzhardinge

Linus Torvalds wrote:

On Thu, 21 Feb 2008, Jeremy Fitzhardinge wrote:
  

Signed-off-by: Ian Campbell <[EMAIL PROTECTED]>
From: Ian Campbell <[EMAIL PROTECTED]>
Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>



This is just wrong. The From: goes at the *top*, and if it's not there, 
my scripts won't pick it up as the author. 


OK.  Have you fixed it, or shall I resend?

Thanks,
   J
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] xen: Implement getgeo for Xen virtual block device.

2008-02-21 Thread Jeremy Fitzhardinge

The below implements the getgeo hook for Xen block devices. Extracted
from the xen-unstable tree where it has been used for ages.

It is useful to have because it allows things like grub2 (used by the
Debian installer images) to work in a guest domain without having to
sprinkle Xen specific hacks around the place.

Signed-off-by: Ian Campbell <[EMAIL PROTECTED]>
From: Ian Campbell <[EMAIL PROTECTED]>
Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

---
drivers/block/xen-blkfront.c |   18 ++
1 file changed, 18 insertions(+)

===================================================================
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -37,6 +37,7 @@

#include 
#include 
+#include <linux/hdreg.h>
#include 

#include 
@@ -134,6 +135,22 @@ static void blkif_restart_queue_callback
{
struct blkfront_info *info = (struct blkfront_info *)arg;
schedule_work(&info->work);
+}
+
+int blkif_getgeo(struct block_device *bd, struct hd_geometry *hg)
+{
+   /* We don't have real geometry info, but let's at least return
+  values consistent with the size of the device */
+   sector_t nsect = get_capacity(bd->bd_disk);
+   sector_t cylinders = nsect;
+
+   hg->heads = 0xff;
+   hg->sectors = 0x3f;
+   sector_div(cylinders, hg->heads * hg->sectors);
+   hg->cylinders = cylinders;
+   if ((sector_t)(hg->cylinders + 1) * hg->heads * hg->sectors < nsect)
+   hg->cylinders = 0xffff;
+   return 0;
}

/*
@@ -946,6 +963,7 @@ static struct block_device_operations xl
.owner = THIS_MODULE,
.open = blkif_open,
.release = blkif_release,
+   .getgeo = blkif_getgeo,
};
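
As a stand-alone illustration of the fake-geometry arithmetic above (plain
user-space C with an arbitrary 8 GiB example capacity; not part of the patch):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t nsect = 16u * 1024 * 1024;   /* 8 GiB in 512-byte sectors */
        unsigned heads = 0xff, sectors = 0x3f;
        uint64_t cylinders = nsect / (heads * sectors);

        if ((cylinders + 1) * heads * sectors < nsect)
            cylinders = 0xffff;               /* same clamp as blkif_getgeo() */

        printf("C/H/S = %llu/%u/%u\n",
               (unsigned long long)cylinders, heads, sectors);
        return 0;
    }

The clamp only triggers for devices larger than 0xffff * 255 * 63 sectors
(roughly 500 GB), where the legacy 16-bit cylinder field simply cannot
represent the capacity.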



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] xen pvfb: Para-virtual framebuffer, keyboard and pointer driver

2008-02-21 Thread Jeremy Fitzhardinge

Markus Armbruster wrote:

This is a pair of Xen para-virtual frontend device drivers:
drivers/video/xen-fbfront.c provides a framebuffer, and
drivers/input/xen-kbdfront provides keyboard and mouse.
  


Unless they're actually inter-dependent, could you post this as two 
separate patches?  I don't know anything about these parts of the 
kernel, so it would be nice to make it very obvious which changes are fb 
vs mouse/keyboard.


(I guess input/* vs video/* should make it obvious, but it looks like 
input has a config dependency on fb, so I'll avoid making too many 
presumptions...)


(Couple of comments below)

   J


The backends run in dom0 user space.

Signed-off-by: Markus Armbruster <[EMAIL PROTECTED]>

---

 drivers/input/Kconfig|9 
 drivers/input/Makefile   |2 
 drivers/input/xen-kbdfront.c |  337 +++
 drivers/video/Kconfig|   14 
 drivers/video/Makefile   |1 
 drivers/video/xen-fbfront.c  |  550 +++

 include/xen/interface/io/fbif.h  |  124 
 include/xen/interface/io/kbdif.h |  114 
 8 files changed, 1151 insertions(+)

diff --git a/drivers/input/Kconfig b/drivers/input/Kconfig
index 9dea14d..5f9d860 100644
--- a/drivers/input/Kconfig
+++ b/drivers/input/Kconfig
@@ -149,6 +149,15 @@ config INPUT_APMPOWER
  To compile this driver as a module, choose M here: the
  module will be called apm-power.
 
+config XEN_KBDDEV_FRONTEND

+   tristate "Xen virtual keyboard and mouse support"
+   depends on XEN_FBDEV_FRONTEND
+   default y
+   help
+ This driver implements the front-end of the Xen virtual
+ keyboard and mouse device driver.  It communicates with a back-end
+ in another domain.
+
 comment "Input Device Drivers"
 
 source "drivers/input/keyboard/Kconfig"

diff --git a/drivers/input/Makefile b/drivers/input/Makefile
index 2ae87b1..98c4f9a 100644
--- a/drivers/input/Makefile
+++ b/drivers/input/Makefile
@@ -23,3 +23,5 @@ obj-$(CONFIG_INPUT_TOUCHSCREEN)   += touchscreen/
 obj-$(CONFIG_INPUT_MISC)   += misc/
 
 obj-$(CONFIG_INPUT_APMPOWER)	+= apm-power.o

+
+obj-$(CONFIG_XEN_KBDDEV_FRONTEND)  += xen-kbdfront.o
diff --git a/drivers/input/xen-kbdfront.c b/drivers/input/xen-kbdfront.c
new file mode 100644
index 000..84f65cf
--- /dev/null
+++ b/drivers/input/xen-kbdfront.c
@@ -0,0 +1,337 @@
+/*
+ * Xen para-virtual input device
+ *
+ * Copyright (C) 2005 Anthony Liguori <[EMAIL PROTECTED]>
+ * Copyright (C) 2006-2008 Red Hat, Inc., Markus Armbruster <[EMAIL PROTECTED]>
+ *
+ *  Based on linux/drivers/input/mouse/sermouse.c
+ *
+ *  This file is subject to the terms and conditions of the GNU General Public
+ *  License. See the file COPYING in the main directory of this archive for
+ *  more details.
+ */
+
+/*
+ * TODO:
+ *
+ * Switch to grant tables together with xen-fbfront.c.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct xenkbd_info {
+   struct input_dev *kbd;
+   struct input_dev *ptr;
+   struct xenkbd_page *page;
+   int evtchn, irq;
+   struct xenbus_device *xbdev;
+   char phys[32];
+};
+
+static int xenkbd_remove(struct xenbus_device *);
+static int xenkbd_connect_backend(struct xenbus_device *, struct xenkbd_info 
*);
+static void xenkbd_disconnect_backend(struct xenkbd_info *);
+
+/*
+ * Note: if you need to send out events, see xenfb_do_update() for how
+ * to do that.
+ */
+
+static irqreturn_t input_handler(int rq, void *dev_id)
+{
+   struct xenkbd_info *info = dev_id;
+   struct xenkbd_page *page = info->page;
+   __u32 cons, prod;
+
+   prod = page->in_prod;
+   if (prod == page->in_cons)
+   return IRQ_HANDLED;
+   rmb();  /* ensure we see ring contents up to prod */
+   for (cons = page->in_cons; cons != prod; cons++) {
+   union xenkbd_in_event *event;
+   struct input_dev *dev;
+   event = &XENKBD_IN_RING_REF(page, cons);
+
+   dev = info->ptr;
+   switch (event->type) {
+   case XENKBD_TYPE_MOTION:
+   input_report_rel(dev, REL_X, event->motion.rel_x);
+   input_report_rel(dev, REL_Y, event->motion.rel_y);
+   break;
+   case XENKBD_TYPE_KEY:
+   dev = NULL;
+   if (test_bit(event->key.keycode, info->kbd->keybit))
+   dev = info->kbd;
+   if (test_bit(event->key.keycode, info->ptr->keybit))
+   dev = info->ptr;
+   if (dev)
+   input_report_key(dev, event->key.keycode,
+event->key.pressed);
+   else
+   printk(KERN_WARNING
+  
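
The in-event ring consumed above is the usual Xen shared-page pattern: the
backend advances in_prod, the frontend walks in_cons up to it, then publishes
the new consumer index and kicks the event channel.  A hedged sketch of how
such a handler typically finishes, reusing the field names from the quoted
code but not taken from the patch itself:

    /* ...after the for (cons = page->in_cons; cons != prod; cons++) loop */
    mb();                   /* ensure we finished reading ring contents */
    page->in_cons = cons;
    notify_remote_via_evtchn(info->evtchn);
    return IRQ_HANDLED;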

Re: [PATCH 0/2] xen pvfb: Para-virtual framebuffer, keyboard and pointer

2008-02-21 Thread Jeremy Fitzhardinge

Markus Armbruster wrote:

Forgot to mention: This patch depends on

Subject: [PATCH] xen: Make xen-blkfront write its protocol ABI to xenstore
From: Markus Armbruster <>
Date: Thu, 06 Dec 2007 14:45:53 +0100

http://lkml.org/lkml/2007/12/6/132

Sorry!


Sorry, I haven't pushed this upstream yet, since there didn't seem to be 
any particular urgency.  What's the dependency?


   J
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 03/11] xen: add missing definitions for xen grant table which ia64/xen needs.

2008-02-21 Thread Jeremy Fitzhardinge

[EMAIL PROTECTED] wrote:

Yep.  We removed the guest handle stuff for the initial upstreaming, 
since it isn't necessary on x86 and it quietened some of the reviewer 
noise.  But I expected we'd need to reintroduce it at some stage.
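
For readers who have not seen it, the "guest handle" machinery is a set of
typed-pointer wrappers from the Xen interface headers, roughly like the
following (simplified from memory, so check the real headers before relying
on it):

    #define __DEFINE_GUEST_HANDLE(name, type) \
        typedef struct { type *p; } __guest_handle_ ## name

    #define DEFINE_GUEST_HANDLE_STRUCT(name) \
        __DEFINE_GUEST_HANDLE(name, struct name)
    #define GUEST_HANDLE(name) __guest_handle_ ## name

The indirection lets each architecture decide how a guest pointer embedded in
a hypercall argument is represented, which is why ia64 wants it back while x86
could drop it for the initial merge.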


   J
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 06/11] xen: move arch/x86/xen/events.c undedr drivers/xen and split out arch specific part.

2008-02-21 Thread Jeremy Fitzhardinge

[EMAIL PROTECTED] wrote:

diff --git a/arch/x86/xen/events.c b/drivers/xen/events.c
similarity index 95%
rename from arch/x86/xen/events.c
rename to drivers/xen/events.c
index dcf613e..7474739 100644
--- a/arch/x86/xen/events.c
+++ b/drivers/xen/events.c
@@ -37,7 +37,9 @@
 #include 
 #include 
 
-#include "xen-ops.h"

+#ifdef CONFIG_X86
+# include "../arch/x86/xen/xen-ops.h"
+#endif


Hm.  Perhaps it would be better to move whatever definition you need 
into a header in a common place (or move xen-ops.h entirely).
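
A sketch of the kind of split being suggested -- a shared header under
include/xen/ rather than a relative include across arch directories; the
header name and its contents here are illustrative only:

    /* include/xen/xen-ops.h (hypothetical location) */
    #ifndef _XEN_XEN_OPS_H
    #define _XEN_XEN_OPS_H

    #include <linux/percpu.h>

    DECLARE_PER_CPU(struct vcpu_info *, xen_vcpu);

    #endif

    /* drivers/xen/events.c would then just do: */
    #include <xen/xen-ops.h>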


   J
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

