Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-08-31 Thread Paul Mackerras
On Thu, Jul 16, 2015 at 03:14:55PM +1000, Benjamin Herrenschmidt wrote:
> On Thu, 2015-07-16 at 15:03 +1000, Benjamin Herrenschmidt wrote:
> > On Thu, 2015-07-16 at 12:00 +1000, Michael Ellerman wrote:
> > > That would fix the problem with smp_mb__after_unlock_lock(), but not
> > > the original worry we had about loads happening before the SC in lock.
> > 
> > However I think isync fixes *that* :-) The problem with isync is as you
> > said, it's not a -memory- barrier per-se, it's an execution barrier /
> > context synchronizing instruction. The combination stwcx. + bne + isync
> > however prevents the execution of anything past the isync until the
> > stwcx. has completed and the bne has been "decided", which prevents loads
> > from leaking into the LL/SC loop. It will also prevent a store in the
> > lock from being issued before the stwcx. has completed. It does *not*
> > prevent as far as I can tell another unrelated store before the lock
> > from leaking into the lock, including the one used to unlock a different
> > lock.
> 
> Except that the architecture says:
> 
> <<
> Because a Store Conditional instruction may complete before its store
> has been performed, a conditional Branch instruction that depends on
> the CR0 value set by a Store Conditional instruction does not order
> the Store Conditional's store with respect to storage accesses caused
> by instructions that follow the Branch
> >>
> 
> So isync in lock is architecturally incorrect, despite being what the
> architecture recommends using, yay !

That paragraph doesn't say anything about isync.  Doing stcx., bne,
isync to acquire a lock will ensure that the instructions inside the
locked region can't execute until the lock is globally known to be
acquired.  It is true that another CPU that doesn't take the lock
could see the effect of some load or store inside the locked region
and then see the lock variable itself being clear; but any CPU that
tried to acquire the lock would have its stcx. fail (at least until
the first CPU releases the lock).

So isync in the lock sequence is architecturally correct.
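
For reference, a minimal sketch of the lwarx/stwcx./bne/isync acquire
sequence under discussion (illustrative only: no lock token, no
SHARED_PROCESSOR yielding, and a bare C wrapper rather than the kernel's
actual arch_spin_lock):

	/*
	 * The isync keeps every later instruction -- including loads in
	 * the critical section -- from executing until the stwcx./bne
	 * pair has been decided.
	 */
	static inline void sketch_spin_lock(volatile unsigned int *lock)
	{
		unsigned int tmp;

		__asm__ __volatile__(
	"1:	lwarx	%0,0,%1\n"	/* load-reserve the lock word */
	"	cmpwi	0,%0,0\n"	/* already held? */
	"	bne-	1b\n"		/* yes: spin */
	"	stwcx.	%2,0,%1\n"	/* store-conditional our value */
	"	bne-	1b\n"		/* lost the reservation: retry */
	"	isync\n"		/* acquire barrier, per the above */
		: "=&r" (tmp)
		: "r" (lock), "r" (1)
		: "cr0", "memory");
	}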

Paul.

Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-08-25 Thread Michael Ellerman
On Tue, 2015-08-25 at 17:27 -0700, Paul E. McKenney wrote:
> On Thu, Aug 20, 2015 at 04:56:04PM +0100, Will Deacon wrote:
> > On Thu, Aug 20, 2015 at 10:45:05AM +0100, Michael Ellerman wrote:
> > > On Tue, 2015-08-18 at 09:37 +0100, Will Deacon wrote:
> > > > 
> > > > Thanks, that sounds great. FWIW, there are multiple ways of implementing
> > > > the patch (i.e. whether you strengthen lock or unlock). I had a crack at
> > > > something here, but it's not tested:
> > > > 
> > > >   http://marc.info/?l=linux-arch&m=143758379023849&w=2
> > > 
> > > I notice you are not changing PPC_RELEASE_BARRIER, but only the spin unlock
> > > code. But from my reading of the docs we need to make sure any UNLOCK+LOCK
> > > is a full barrier, not just spin unlock/lock?
> > > 
> > > So don't we need to worry about some of the other locks as well? At least
> > > rwlock, and mutex fast path?
> > 
> > Hmm, that's a good question. I notice that you don't do any of the SYNC_IO
> > stuff for any locks other than spinlocks but I don't know whether
> > smp_mb__after_unlock_lock is similarly limited in scope.
> > 
> > Paul?
> 
> I would expect the various locks to have similar ordering characteristics.
> 
> Or am I missing something subtle here?

I don't think so.

The docs just talk about ACQUIRE/RELEASE, so I think it needs to apply to all
lock types. Or at least the list mentioned in the docs which is:

 (*) spin locks
 (*) R/W spin locks
 (*) mutexes
 (*) semaphores
 (*) R/W semaphores
 (*) RCU
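
To make the requirement concrete, here is a hedged litmus-style sketch
using spinlocks (variable names are illustrative; per the list above,
the same pattern would apply to the other lock types):

	spinlock_t a, b;
	int x, y;			/* both initially 0 */

	/* CPU 0 */
	WRITE_ONCE(x, 1);
	spin_unlock(&a);		/* RELEASE (lwsync on powerpc) */
	spin_lock(&b);			/* ACQUIRE of a *different* lock */
	smp_mb__after_unlock_lock();	/* makes the pair a full barrier */
	r1 = READ_ONCE(y);

	/* CPU 1 */
	WRITE_ONCE(y, 1);
	smp_mb();
	r2 = READ_ONCE(x);

If UNLOCK+LOCK is a full barrier, the store-buffering outcome
r1 == 0 && r2 == 0 is forbidden; lwsync alone does not order CPU 0's
store to x against its later load from y, so an unstrengthened
unlock/lock would allow it.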

cheers




Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-08-25 Thread Paul E. McKenney
On Thu, Aug 20, 2015 at 04:56:04PM +0100, Will Deacon wrote:
> On Thu, Aug 20, 2015 at 10:45:05AM +0100, Michael Ellerman wrote:
> > On Tue, 2015-08-18 at 09:37 +0100, Will Deacon wrote:
> > > On Tue, Aug 18, 2015 at 02:50:55AM +0100, Michael Ellerman wrote:
> > > > On Mon, 2015-08-17 at 09:57 +0100, Will Deacon wrote:
> > > > > On Mon, Aug 17, 2015 at 07:15:01AM +0100, Paul E. McKenney wrote:
> > > > > > On Mon, Aug 17, 2015 at 02:06:07PM +1000, Michael Ellerman wrote:
> > > > > > > On Wed, 2015-08-12 at 08:43 -0700, Paul E. McKenney wrote:
> > > > > > > I thought the end result of this thread was that we didn't *need*
> > > > > > > to change the powerpc lock semantics? Or did I read it wrong?
> > > > > > > 
> > > > > > > ie. the docs now say that RELEASE+ACQUIRE is not a full barrier,
> > > > > > > which is consistent with our current implementation.
> > > > > > 
> > > > > > That change happened about 1.5 years ago, and I thought that the
> > > > > > current discussion was about reversing it, based in part on the
> > > > > > recent powerpc benchmarks of locking primitives with and without the
> > > > > > sync instruction.  But regardless, I clearly cannot remove either
> > > > > > the smp_mb__after_unlock_lock() or the powerpc definition of it to
> > > > > > be smp_mb() if powerpc unlock/lock is not strengthened.
> > > > > 
> > > > > Yup. Peter and I would really like to get rid of
> > > > > smp_mb__after_unlock_lock entirely, which would mean strengthening
> > > > > the ppc spinlocks. Moving the barrier primitive into RCU is a good
> > > > > step to prevent more widespread usage of the barrier, but we'd
> > > > > really like to go further if the performance impact is deemed
> > > > > acceptable (which is what this thread is about).
> > > > 
> > > > OK, sorry for completely missing the point, too many balls in the air here.
> > > 
> > > No problem!
> > > 
> > > > I'll do some benchmarks and see what we come up with.
> > > 
> > > Thanks, that sounds great. FWIW, there are multiple ways of implementing
> > > the patch (i.e. whether you strengthen lock or unlock). I had a crack at
> > > something here, but it's not tested:
> > > 
> > >   http://marc.info/?l=linux-arch&m=143758379023849&w=2
> > 
> > Thanks.
> > 
> > I notice you are not changing PPC_RELEASE_BARRIER, but only the spin unlock
> > code. But from my reading of the docs we need to make sure any UNLOCK+LOCK
> > is a full barrier, not just spin unlock/lock?
> > 
> > So don't we need to worry about some of the other locks as well? At least
> > rwlock, and mutex fast path?
> 
> Hmm, that's a good question. I notice that you don't do any of the SYNC_IO
> stuff for any locks other than spinlocks but I don't know whether
> smp_mb__after_unlock_lock is similarly limited in scope.
> 
> Paul?

I would expect the various locks to have similar ordering characteristics.

Or am I missing something subtle here?

Thanx, Paul


Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-08-20 Thread Will Deacon
On Thu, Aug 20, 2015 at 10:45:05AM +0100, Michael Ellerman wrote:
> On Tue, 2015-08-18 at 09:37 +0100, Will Deacon wrote:
> > On Tue, Aug 18, 2015 at 02:50:55AM +0100, Michael Ellerman wrote:
> > > On Mon, 2015-08-17 at 09:57 +0100, Will Deacon wrote:
> > > > On Mon, Aug 17, 2015 at 07:15:01AM +0100, Paul E. McKenney wrote:
> > > > > On Mon, Aug 17, 2015 at 02:06:07PM +1000, Michael Ellerman wrote:
> > > > > > On Wed, 2015-08-12 at 08:43 -0700, Paul E. McKenney wrote:
> > > > > > I thought the end result of this thread was that we didn't *need*
> > > > > > to change the powerpc lock semantics? Or did I read it wrong?
> > > > > > 
> > > > > > ie. the docs now say that RELEASE+ACQUIRE is not a full barrier,
> > > > > > which is consistent with our current implementation.
> > > > > 
> > > > > That change happened about 1.5 years ago, and I thought that the
> > > > > current discussion was about reversing it, based in part on the
> > > > > recent powerpc benchmarks of locking primitives with and without the
> > > > > sync instruction.  But regardless, I clearly cannot remove either the
> > > > > smp_mb__after_unlock_lock() or the powerpc definition of it to be
> > > > > smp_mb() if powerpc unlock/lock is not strengthened.
> > > > 
> > > > Yup. Peter and I would really like to get rid of smp_mb__after_unlock_lock
> > > > entirely, which would mean strengthening the ppc spinlocks. Moving the
> > > > barrier primitive into RCU is a good step to prevent more widespread
> > > > usage of the barrier, but we'd really like to go further if the
> > > > performance impact is deemed acceptable (which is what this thread is about).
> > > 
> > > OK, sorry for completely missing the point, too many balls in the air here.
> > 
> > No problem!
> > 
> > > I'll do some benchmarks and see what we come up with.
> > 
> > Thanks, that sounds great. FWIW, there are multiple ways of implementing
> > the patch (i.e. whether you strengthen lock or unlock). I had a crack at
> > something here, but it's not tested:
> > 
> >   http://marc.info/?l=linux-arch&m=143758379023849&w=2
> 
> Thanks.
> 
> I notice you are not changing PPC_RELEASE_BARRIER, but only the spin unlock
> code. But from my reading of the docs we need to make sure any UNLOCK+LOCK
> is a full barrier, not just spin unlock/lock?
> 
> So don't we need to worry about some of the other locks as well? At least
> rwlock, and mutex fast path?

Hmm, that's a good question. I notice that you don't do any of the SYNC_IO
stuff for any locks other than spinlocks but I don't know whether
smp_mb__after_unlock_lock is similarly limited in scope.
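
(For reference, the SYNC_IO mechanism referred to here is roughly the
following, from powerpc's asm/spinlock.h -- quoted from memory, so the
details may differ: unlock emits a full barrier only if MMIO was done
while the lock was held, and only the spinlock paths use it.)

	#ifdef CONFIG_PPC64
	#define SYNC_IO		do {						\
					if (unlikely(get_paca()->io_sync)) {	\
						mb();				\
						get_paca()->io_sync = 0;	\
					}					\
				} while (0)
	#else
	#define SYNC_IO
	#endif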

Paul?

Will


Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-08-20 Thread Michael Ellerman
On Tue, 2015-08-18 at 09:37 +0100, Will Deacon wrote:
> On Tue, Aug 18, 2015 at 02:50:55AM +0100, Michael Ellerman wrote:
> > On Mon, 2015-08-17 at 09:57 +0100, Will Deacon wrote:
> > > On Mon, Aug 17, 2015 at 07:15:01AM +0100, Paul E. McKenney wrote:
> > > > On Mon, Aug 17, 2015 at 02:06:07PM +1000, Michael Ellerman wrote:
> > > > > On Wed, 2015-08-12 at 08:43 -0700, Paul E. McKenney wrote:
> > > > > I thought the end result of this thread was that we didn't *need* to
> > > > > change the powerpc lock semantics? Or did I read it wrong?
> > > > > 
> > > > > ie. the docs now say that RELEASE+ACQUIRE is not a full barrier,
> > > > > which is consistent with our current implementation.
> > > > 
> > > > That change happened about 1.5 years ago, and I thought that the
> > > > current discussion was about reversing it, based in part on the
> > > > recent powerpc benchmarks of locking primitives with and without the
> > > > sync instruction.  But regardless, I clearly cannot remove either the
> > > > smp_mb__after_unlock_lock() or the powerpc definition of it to be
> > > > smp_mb() if powerpc unlock/lock is not strengthened.
> > > 
> > > Yup. Peter and I would really like to get rid of smp_mb__after_unlock_lock
> > > entirely, which would mean strengthening the ppc spinlocks. Moving the
> > > barrier primitive into RCU is a good step to prevent more widespread usage
> > > of the barrier, but we'd really like to go further if the performance
> > > impact is deemed acceptable (which is what this thread is about).
> > 
> > OK, sorry for completely missing the point, too many balls in the air here.
> 
> No problem!
> 
> > I'll do some benchmarks and see what we come up with.
> 
> Thanks, that sounds great. FWIW, there are multiple ways of implementing
> the patch (i.e. whether you strengthen lock or unlock). I had a crack at
> something here, but it's not tested:
> 
>   http://marc.info/?l=linux-arch&m=143758379023849&w=2

Thanks.

I notice you are not changing PPC_RELEASE_BARRIER, but only the spin unlock
code. But from my reading of the docs we need to make sure any UNLOCK+LOCK is a
full barrier, not just spin unlock/lock?

So don't we need to worry about some of the other locks as well? At least
rwlock, and mutex fast path?

cheers




Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-08-18 Thread Will Deacon
On Tue, Aug 18, 2015 at 02:50:55AM +0100, Michael Ellerman wrote:
> On Mon, 2015-08-17 at 09:57 +0100, Will Deacon wrote:
> > On Mon, Aug 17, 2015 at 07:15:01AM +0100, Paul E. McKenney wrote:
> > > On Mon, Aug 17, 2015 at 02:06:07PM +1000, Michael Ellerman wrote:
> > > > On Wed, 2015-08-12 at 08:43 -0700, Paul E. McKenney wrote:
> > > > I thought the end result of this thread was that we didn't *need* to
> > > > change the powerpc lock semantics? Or did I read it wrong?
> > > > 
> > > > ie. the docs now say that RELEASE+ACQUIRE is not a full barrier,
> > > > which is consistent with our current implementation.
> > > 
> > > That change happened about 1.5 years ago, and I thought that the
> > > current discussion was about reversing it, based in part on the
> > > recent powerpc benchmarks of locking primitives with and without the
> > > sync instruction.  But regardless, I clearly cannot remove either the
> > > smp_mb__after_unlock_lock() or the powerpc definition of it to be smp_mb()
> > > if powerpc unlock/lock is not strengthened.
> > 
> > Yup. Peter and I would really like to get rid of smp_mb__after_unlock_lock
> > entirely, which would mean strengthening the ppc spinlocks. Moving the
> > barrier primitive into RCU is a good step to prevent more widespread usage
> > of the barrier, but we'd really like to go further if the performance impact
> > is deemed acceptable (which is what this thread is about).
> 
> OK, sorry for completely missing the point, too many balls in the air here.

No problem!

> I'll do some benchmarks and see what we come up with.

Thanks, that sounds great. FWIW, there are multiple ways of implementing
the patch (i.e. whether you strengthen lock or unlock). I had a crack at
something here, but it's not tested:

  http://marc.info/?l=linux-arch&m=143758379023849&w=2
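
In barrier-placement terms, the two options look roughly like this (a
hedged sketch of the idea, not the patch itself; the helper names are
made up):

	/* Option A: strengthen unlock -- full sync instead of lwsync
	 * before the store that releases the lock. */
	static inline void sketch_unlock_strong(volatile unsigned int *lock)
	{
		__asm__ __volatile__("sync" : : : "memory");
		*lock = 0;
	}

	/* Option B: strengthen lock -- full sync instead of isync after
	 * the ll/sc loop, so the acquire also orders earlier accesses
	 * against the critical section. */
	static inline void sketch_lock_strong(volatile unsigned int *lock)
	{
		/* stand-in for the lwarx/stwcx. loop */
		while (__atomic_exchange_n(lock, 1, __ATOMIC_RELAXED))
			;
		__asm__ __volatile__("sync" : : : "memory");
	}

Either placement gives UNLOCK+LOCK full-barrier semantics; the
trade-off is which path pays for the sync.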

Will


Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-08-17 Thread Michael Ellerman
On Mon, 2015-08-17 at 09:57 +0100, Will Deacon wrote:
> On Mon, Aug 17, 2015 at 07:15:01AM +0100, Paul E. McKenney wrote:
> > On Mon, Aug 17, 2015 at 02:06:07PM +1000, Michael Ellerman wrote:
> > > On Wed, 2015-08-12 at 08:43 -0700, Paul E. McKenney wrote:
> > > I thought the end result of this thread was that we didn't *need* to
> > > change the powerpc lock semantics? Or did I read it wrong?
> > > 
> > > ie. the docs now say that RELEASE+ACQUIRE is not a full barrier, which is
> > > consistent with our current implementation.
> > 
> > That change happened about 1.5 years ago, and I thought that the
> > current discussion was about reversing it, based in part on the
> > recent powerpc benchmarks of locking primitives with and without the
> > sync instruction.  But regardless, I clearly cannot remove either the
> > smp_mb__after_unlock_lock() or the powerpc definition of it to be smp_mb()
> > if powerpc unlock/lock is not strengthened.
> 
> Yup. Peter and I would really like to get rid of smp_mb__after_unlock_lock
> entirely, which would mean strengthening the ppc spinlocks. Moving the
> barrier primitive into RCU is a good step to prevent more widespread usage
> of the barrier, but we'd really like to go further if the performance impact
> is deemed acceptable (which is what this thread is about).

OK, sorry for completely missing the point, too many balls in the air here.

I'll do some benchmarks and see what we come up with.

cheers




Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-08-17 Thread Will Deacon
On Mon, Aug 17, 2015 at 07:15:01AM +0100, Paul E. McKenney wrote:
> On Mon, Aug 17, 2015 at 02:06:07PM +1000, Michael Ellerman wrote:
> > On Wed, 2015-08-12 at 08:43 -0700, Paul E. McKenney wrote:
> > > On Wed, Aug 12, 2015 at 02:44:15PM +0100, Will Deacon wrote:
> > > > On Fri, Jul 24, 2015 at 04:30:46PM +0100, Paul E. McKenney wrote:
> > > > > On Fri, Jul 24, 2015 at 12:31:01PM +0100, Will Deacon wrote:
> > > > > > On Wed, Jul 15, 2015 at 02:12:21PM +0100, Paul E. McKenney wrote:
> > > > > > > > > commit 695c05d4b9666c50b40a1c022678b5f6e2e3e771
> > > > > > > > > Author: Paul E. McKenney 
> > > > > > > > > Date:   Tue Jul 14 18:35:23 2015 -0700
> > > > > > > > > 
> > > > > > > > > rcu,locking: Privatize smp_mb__after_unlock_lock()
> > > > > > > > > 
> > > > > > > > > RCU is the only thing that uses smp_mb__after_unlock_lock(),
> > > > > > > > > and is likely the only thing that ever will use it, so this
> > > > > > > > > commit makes this macro private to RCU.
> > > > > > > > > 
> > > > > > > > > Signed-off-by: Paul E. McKenney 
> > > > > > > > > 
> > > > > > > > > Cc: Will Deacon 
> > > > > > > > > Cc: Peter Zijlstra 
> > > > > > > > > Cc: Benjamin Herrenschmidt 
> > > > > > > > > Cc: "linux-a...@vger.kernel.org" 
> > > > > > > > > 
> > > > > > 
> > > > > > Are you planning to queue this somewhere? I think it makes sense
> > > > > > regardless of whether we change PowerPC or not and ideally it would
> > > > > > be merged around the same time as my relaxed atomics series.
> > > > > 
> > > > > I have it in -rcu.  By default, I will push it to the 4.4 merge
> > > > > window.  Please let me know if you need it sooner.
> > > > 
> > > > The generic relaxed atomics are now queued in -tip, so it would be
> > > > really good to see this Documentation update land in 4.3 if at all
> > > > possible. I appreciate it's late in the cycle, but it's always worth asking.
> > > 
> > > Can't hurt to give it a try.  I have set -rcu's rcu/next branch to this
> > > commit, and if it passes a few day's worth of testing, I will see what
> > > Ingo has to say about a pull request.
> > > 
> > > This commit also privatizes smp_mb__after_unlock_lock() as well as
> > > updating documentation.  Looks like we need to strengthen powerpc's
> > > locking primitives, then get rid of smp_mb__after_unlock_lock() entirely.
> > > Or did that already happen and I just missed it?
> > 
> > No it didn't.
> > 
> > I thought the end result of this thread was that we didn't *need* to
> > change the powerpc lock semantics? Or did I read it wrong?
> > 
> > ie. the docs now say that RELEASE+ACQUIRE is not a full barrier, which is
> > consistent with our current implementation.
> 
> That change happened about 1.5 years ago, and I thought that the
> current discussion was about reversing it, based in part on the
> recent powerpc benchmarks of locking primitives with and without the
> sync instruction.  But regardless, I clearly cannot remove either the
> smp_mb__after_unlock_lock() or the powerpc definition of it to be smp_mb()
> if powerpc unlock/lock is not strengthened.

Yup. Peter and I would really like to get rid of smp_mb__after_unlock_lock
entirely, which would mean strengthening the ppc spinlocks. Moving the
barrier primitive into RCU is a good step to prevent more widespread usage
of the barrier, but we'd really like to go further if the performance impact
is deemed acceptable (which is what this thread is about).

Will


Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-08-17 Thread Paul E. McKenney
On Mon, Aug 17, 2015 at 02:06:07PM +1000, Michael Ellerman wrote:
> On Wed, 2015-08-12 at 08:43 -0700, Paul E. McKenney wrote:
> > On Wed, Aug 12, 2015 at 02:44:15PM +0100, Will Deacon wrote:
> > > Hello Paul,
> > > 
> > > On Fri, Jul 24, 2015 at 04:30:46PM +0100, Paul E. McKenney wrote:
> > > > On Fri, Jul 24, 2015 at 12:31:01PM +0100, Will Deacon wrote:
> > > > > On Wed, Jul 15, 2015 at 02:12:21PM +0100, Paul E. McKenney wrote:
> > > > > > > > commit 695c05d4b9666c50b40a1c022678b5f6e2e3e771
> > > > > > > > Author: Paul E. McKenney 
> > > > > > > > Date:   Tue Jul 14 18:35:23 2015 -0700
> > > > > > > > 
> > > > > > > > rcu,locking: Privatize smp_mb__after_unlock_lock()
> > > > > > > > 
> > > > > > > > RCU is the only thing that uses smp_mb__after_unlock_lock(),
> > > > > > > > and is likely the only thing that ever will use it, so this
> > > > > > > > commit makes this macro private to RCU.
> > > > > > > > 
> > > > > > > > Signed-off-by: Paul E. McKenney 
> > > > > > > > Cc: Will Deacon 
> > > > > > > > Cc: Peter Zijlstra 
> > > > > > > > Cc: Benjamin Herrenschmidt 
> > > > > > > > Cc: "linux-a...@vger.kernel.org" 
> > > > > > > > 
> > > > > 
> > > > > Are you planning to queue this somewhere? I think it makes sense
> > > > > regardless of whether we change PowerPC or not and ideally it would
> > > > > be merged around the same time as my relaxed atomics series.
> > > > 
> > > > I have it in -rcu.  By default, I will push it to the 4.4 merge window.
> > > > Please let me know if you need it sooner.
> > > 
> > > The generic relaxed atomics are now queued in -tip, so it would be really
> > > good to see this Documentation update land in 4.3 if at all possible. I
> > > appreciate it's late in the cycle, but it's always worth asking.
> > 
> > Can't hurt to give it a try.  I have set -rcu's rcu/next branch to this
> > commit, and if it passes a few day's worth of testing, I will see what
> > Ingo has to say about a pull request.
> > 
> > This commit also privatizes smp_mb__after_unlock_lock() as well as
> > updating documentation.  Looks like we need to strengthen powerpc's
> > locking primitives, then get rid of smp_mb__after_unlock_lock() entirely.
> > Or did that already happen and I just missed it?
> 
> No it didn't.
> 
> I thought the end result of this thread was that we didn't *need* to
> change the powerpc lock semantics? Or did I read it wrong?
> 
> ie. the docs now say that RELEASE+ACQUIRE is not a full barrier, which is
> consistent with our current implementation.

That change happened about 1.5 years ago, and I thought that the
current discussion was about reversing it, based in part on the
recent powerpc benchmarks of locking primitives with and without the
sync instruction.  But regardless, I clearly cannot remove either the
smp_mb__after_unlock_lock() or the powerpc definition of it to be smp_mb()
if powerpc unlock/lock is not strengthened.

Thanx, Paul



Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-08-16 Thread Michael Ellerman
On Wed, 2015-08-12 at 08:43 -0700, Paul E. McKenney wrote:
> On Wed, Aug 12, 2015 at 02:44:15PM +0100, Will Deacon wrote:
> > Hello Paul,
> > 
> > On Fri, Jul 24, 2015 at 04:30:46PM +0100, Paul E. McKenney wrote:
> > > On Fri, Jul 24, 2015 at 12:31:01PM +0100, Will Deacon wrote:
> > > > On Wed, Jul 15, 2015 at 02:12:21PM +0100, Paul E. McKenney wrote:
> > > > > > > commit 695c05d4b9666c50b40a1c022678b5f6e2e3e771
> > > > > > > Author: Paul E. McKenney 
> > > > > > > Date:   Tue Jul 14 18:35:23 2015 -0700
> > > > > > > 
> > > > > > > rcu,locking: Privatize smp_mb__after_unlock_lock()
> > > > > > > 
> > > > > > > RCU is the only thing that uses smp_mb__after_unlock_lock(),
> > > > > > > and is likely the only thing that ever will use it, so this
> > > > > > > commit makes this macro private to RCU.
> > > > > > > 
> > > > > > > Signed-off-by: Paul E. McKenney 
> > > > > > > Cc: Will Deacon 
> > > > > > > Cc: Peter Zijlstra 
> > > > > > > Cc: Benjamin Herrenschmidt 
> > > > > > > Cc: "linux-a...@vger.kernel.org" 
> > > > 
> > > > Are you planning to queue this somewhere? I think it makes sense
> > > > regardless of whether we change PowerPC or not and ideally it would
> > > > be merged around the same time as my relaxed atomics series.
> > > 
> > > I have it in -rcu.  By default, I will push it to the 4.4 merge window.
> > > Please let me know if you need it sooner.
> > 
> > The generic relaxed atomics are now queued in -tip, so it would be really
> > good to see this Documentation update land in 4.3 if at all possible. I
> > appreciate it's late in the cycle, but it's always worth asking.
> 
> Can't hurt to give it a try.  I have set -rcu's rcu/next branch to this
> commit, and if it passes a few day's worth of testing, I will see what
> Ingo has to say about a pull request.
> 
> This commit also privatizes smp_mb__after_unlock_lock() as well as
> updating documentation.  Looks like we need to strengthen powerpc's
> locking primitives, then get rid of smp_mb__after_unlock_lock() entirely.
> Or did that already happen and I just missed it?

No it didn't.

I thought the end result of this thread was that we didn't *need* to change the
powerpc lock semantics? Or did I read it wrong?

ie. the docs now say that RELEASE+ACQUIRE is not a full barrier, which is
consistent with our current implementation.

cheers




Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-08-13 Thread Paul E. McKenney
On Thu, Aug 13, 2015 at 11:49:28AM +0100, Will Deacon wrote:
> On Wed, Aug 12, 2015 at 06:59:38PM +0100, Paul E. McKenney wrote:
> > On Wed, Aug 12, 2015 at 08:43:46AM -0700, Paul E. McKenney wrote:
> > > On Wed, Aug 12, 2015 at 02:44:15PM +0100, Will Deacon wrote:
> > > > The generic relaxed atomics are now queued in -tip, so it would be
> > > > really good to see this Documentation update land in 4.3 if at all
> > > > possible. I appreciate it's late in the cycle, but it's always worth asking.
> > > 
> > > Can't hurt to give it a try.  I have set -rcu's rcu/next branch to this
> > > commit, and if it passes a few day's worth of testing, I will see what
> > > Ingo has to say about a pull request.
> > > 
> > > This commit also privatizes smp_mb__after_unlock_lock() as well as
> > > updating documentation.  Looks like we need to strengthen powerpc's
> > > locking primitives, then get rid of smp_mb__after_unlock_lock() entirely.
> > > Or did that already happen and I just missed it?
> > 
> > And just for completeness, here is the current version of that commit.
> > 
> > Thanx, Paul
> > 
> > 
> > 
> >  b/Documentation/memory-barriers.txt   |   71 +-
> >  b/arch/powerpc/include/asm/spinlock.h |    2
> >  b/include/linux/spinlock.h            |   10
> >  b/kernel/rcu/tree.h                   |   12 +
> >  4 files changed, 16 insertions(+), 79 deletions(-)
> > 
> > commit 12d560f4ea87030667438a169912380be00cea4b
> > Author: Paul E. McKenney 
> > Date:   Tue Jul 14 18:35:23 2015 -0700
> > 
> > rcu,locking: Privatize smp_mb__after_unlock_lock()
> > 
> > RCU is the only thing that uses smp_mb__after_unlock_lock(), and is
> > likely the only thing that ever will use it, so this commit makes this
> > macro private to RCU.
> > 
> > Signed-off-by: Paul E. McKenney 
> > Cc: Will Deacon 
> > Cc: Peter Zijlstra 
> > Cc: Benjamin Herrenschmidt 
> > Cc: "linux-a...@vger.kernel.org" 
> 
> Acked-by: Will Deacon 
> 
> I don't think the PowerPC spinlock change is queued anywhere (I sent it
> out as a diff for discussion, but that was it). This patch doesn't rely
> on that though, right?

No, this patch just moves the smp_mb__after_unlock_lock() definition,
it does not change the code generated.
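
For reference, the definition that moves into kernel/rcu/tree.h is
essentially (modulo comments):

	#ifdef CONFIG_PPC
	#define smp_mb__after_unlock_lock()	smp_mb()  /* Full ordering for lock. */
	#else
	#define smp_mb__after_unlock_lock()	do { } while (0)
	#endif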

Thanx, Paul



Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-08-13 Thread Will Deacon
On Wed, Aug 12, 2015 at 06:59:38PM +0100, Paul E. McKenney wrote:
> On Wed, Aug 12, 2015 at 08:43:46AM -0700, Paul E. McKenney wrote:
> > On Wed, Aug 12, 2015 at 02:44:15PM +0100, Will Deacon wrote:
> > > The generic relaxed atomics are now queued in -tip, so it would be really
> > > good to see this Documentation update land in 4.3 if at all possible. I
> > > appreciate it's late in the cycle, but it's always worth asking.
> > 
> > Can't hurt to give it a try.  I have set -rcu's rcu/next branch to this
> > commit, and if it passes a few day's worth of testing, I will see what
> > Ingo has to say about a pull request.
> > 
> > This commit also privatizes smp_mb__after_unlock_lock() as well as
> > updating documentation.  Looks like we need to strengthen powerpc's
> > locking primitives, then get rid of smp_mb__after_unlock_lock() entirely.
> > Or did that already happen and I just missed it?
> 
> And just for completeness, here is the current version of that commit.
> 
>   Thanx, Paul
> 
> 
> 
>  b/Documentation/memory-barriers.txt   |   71 +-
>  b/arch/powerpc/include/asm/spinlock.h |    2
>  b/include/linux/spinlock.h            |   10
>  b/kernel/rcu/tree.h                   |   12 +
>  4 files changed, 16 insertions(+), 79 deletions(-)
> 
> commit 12d560f4ea87030667438a169912380be00cea4b
> Author: Paul E. McKenney 
> Date:   Tue Jul 14 18:35:23 2015 -0700
> 
> rcu,locking: Privatize smp_mb__after_unlock_lock()
> 
> RCU is the only thing that uses smp_mb__after_unlock_lock(), and is
> likely the only thing that ever will use it, so this commit makes this
> macro private to RCU.
> 
> Signed-off-by: Paul E. McKenney 
> Cc: Will Deacon 
> Cc: Peter Zijlstra 
> Cc: Benjamin Herrenschmidt 
> Cc: "linux-a...@vger.kernel.org" 

Acked-by: Will Deacon 

I don't think the PowerPC spinlock change is queued anywhere (I sent it
out as a diff for discussion, but that was it). This patch doesn't rely
on that though, right?

Will


Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-08-12 Thread Paul E. McKenney
On Wed, Aug 12, 2015 at 08:43:46AM -0700, Paul E. McKenney wrote:
> On Wed, Aug 12, 2015 at 02:44:15PM +0100, Will Deacon wrote:
> > Hello Paul,
> > 
> > On Fri, Jul 24, 2015 at 04:30:46PM +0100, Paul E. McKenney wrote:
> > > On Fri, Jul 24, 2015 at 12:31:01PM +0100, Will Deacon wrote:
> > > > On Wed, Jul 15, 2015 at 02:12:21PM +0100, Paul E. McKenney wrote:
> > > > > > > commit 695c05d4b9666c50b40a1c022678b5f6e2e3e771
> > > > > > > Author: Paul E. McKenney 
> > > > > > > Date:   Tue Jul 14 18:35:23 2015 -0700
> > > > > > > 
> > > > > > > rcu,locking: Privatize smp_mb__after_unlock_lock()
> > > > > > > 
> > > > > > > RCU is the only thing that uses smp_mb__after_unlock_lock(), and is
> > > > > > > likely the only thing that ever will use it, so this commit makes this
> > > > > > > macro private to RCU.
> > > > > > > 
> > > > > > > Signed-off-by: Paul E. McKenney 
> > > > > > > Cc: Will Deacon 
> > > > > > > Cc: Peter Zijlstra 
> > > > > > > Cc: Benjamin Herrenschmidt 
> > > > > > > Cc: "linux-a...@vger.kernel.org" 
> > > > 
> > > > Are you planning to queue this somewhere? I think it makes sense regardless
> > > > of whether we change PowerPC or not and ideally it would be merged around
> > > > the same time as my relaxed atomics series.
> > > 
> > > I have it in -rcu.  By default, I will push it to the 4.4 merge window.
> > > Please let me know if you need it sooner.
> > 
> > The generic relaxed atomics are now queued in -tip, so it would be really
> > good to see this Documentation update land in 4.3 if at all possible. I
> > appreciate it's late in the cycle, but it's always worth asking.
> 
> Can't hurt to give it a try.  I have set -rcu's rcu/next branch to this
> commit, and if it passes a few day's worth of testing, I will see what
> Ingo has to say about a pull request.
> 
> This commit also privatizes smp_mb__after_unlock_lock() as well as
> updating documentation.  Looks like we need to strengthen powerpc's
> locking primitives, then get rid of smp_mb__after_unlock_lock() entirely.
> Or did that already happen and I just missed it?

And just for completeness, here is the current version of that commit.

Thanx, Paul



 b/Documentation/memory-barriers.txt   |   71 +-
 b/arch/powerpc/include/asm/spinlock.h |2 
 b/include/linux/spinlock.h|   10 
 b/kernel/rcu/tree.h   |   12 +
 4 files changed, 16 insertions(+), 79 deletions(-)

commit 12d560f4ea87030667438a169912380be00cea4b
Author: Paul E. McKenney 
Date:   Tue Jul 14 18:35:23 2015 -0700

rcu,locking: Privatize smp_mb__after_unlock_lock()

RCU is the only thing that uses smp_mb__after_unlock_lock(), and is
likely the only thing that ever will use it, so this commit makes this
macro private to RCU.

Signed-off-by: Paul E. McKenney 
Cc: Will Deacon 
Cc: Peter Zijlstra 
Cc: Benjamin Herrenschmidt 
Cc: "linux-a...@vger.kernel.org" 

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 318523872db5..eafa6a53f72c 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1854,16 +1854,10 @@ RELEASE are to the same lock variable, but only from the perspective of
 another CPU not holding that lock.  In short, a ACQUIRE followed by an
 RELEASE may -not- be assumed to be a full memory barrier.
 
-Similarly, the reverse case of a RELEASE followed by an ACQUIRE does not
-imply a full memory barrier.  If it is necessary for a RELEASE-ACQUIRE
-pair to produce a full barrier, the ACQUIRE can be followed by an
-smp_mb__after_unlock_lock() invocation.  This will produce a full barrier
-(including transitivity) if either (a) the RELEASE and the ACQUIRE are
-executed by the same CPU or task, or (b) the RELEASE and ACQUIRE act on
-the same variable.  The smp_mb__after_unlock_lock() primitive is free
-on many architectures.  Without smp_mb__after_unlock_lock(), the CPU's
-execution of the critical sections corresponding to the RELEASE and the
-ACQUIRE can cross, so that:
+Similarly, the reverse case of a RELEASE followed by an ACQUIRE does
+not imply a full memory barrier.  Therefore, the CPU's execution of the
+critical sections corresponding to the RELEASE and the ACQUIRE can cross,
+so that:
 
*A = a;
RELEASE M
@@ -1901,29 +1895,6 @@ the RELEASE would simply complete, thereby avoiding the deadlock.
a sleep-unlock race, but the locking primitive needs to resolve
such races properly in any case.
 
-With smp_mb__after_unlock_lock(), the two critical sections cannot overlap.
-For example, with the following code, the store to *A will always be
-seen by other CPUs before the store to *B:
-
-   *A = a;
-   RELEASE M
-   ACQUIRE 
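
The weakening this hunk documents can be written as a store-buffering litmus
sketch (illustrative only; RELEASE/ACQUIRE stand for the unlock and lock of
the two distinct locks M and N in the text above):

	/* CPU 0                            CPU 1                      */
	WRITE_ONCE(*A, a);               /* WRITE_ONCE(*B, b);          */
	RELEASE M;                       /* smp_mb();                   */
	ACQUIRE N;                       /* r2 = READ_ONCE(*A);         */
	r1 = READ_ONCE(*B);

	/* The outcome r1 == 0 && r2 == 0 is permitted: RELEASE M followed
	 * by ACQUIRE N is not a full barrier, so CPU 0's store to *A need
	 * not be visible before its load from *B.  Adding
	 * smp_mb__after_unlock_lock() after ACQUIRE N forbids it. */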

Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-08-12 Thread Paul E. McKenney
On Wed, Aug 12, 2015 at 02:44:15PM +0100, Will Deacon wrote:
> Hello Paul,
> 
> On Fri, Jul 24, 2015 at 04:30:46PM +0100, Paul E. McKenney wrote:
> > On Fri, Jul 24, 2015 at 12:31:01PM +0100, Will Deacon wrote:
> > > On Wed, Jul 15, 2015 at 02:12:21PM +0100, Paul E. McKenney wrote:
> > > > > > commit 695c05d4b9666c50b40a1c022678b5f6e2e3e771
> > > > > > Author: Paul E. McKenney 
> > > > > > Date:   Tue Jul 14 18:35:23 2015 -0700
> > > > > > 
> > > > > > rcu,locking: Privatize smp_mb__after_unlock_lock()
> > > > > > 
> > > > > > RCU is the only thing that uses smp_mb__after_unlock_lock(), and is
> > > > > > likely the only thing that ever will use it, so this commit makes this
> > > > > > macro private to RCU.
> > > > > > 
> > > > > > Signed-off-by: Paul E. McKenney 
> > > > > > Cc: Will Deacon 
> > > > > > Cc: Peter Zijlstra 
> > > > > > Cc: Benjamin Herrenschmidt 
> > > > > > Cc: "linux-a...@vger.kernel.org" 
> > > 
> > > Are you planning to queue this somewhere? I think it makes sense regardless
> > > of whether we change PowerPC or not and ideally it would be merged around
> > > the same time as my relaxed atomics series.
> > 
> > I have it in -rcu.  By default, I will push it to the 4.4 merge window.
> > Please let me know if you need it sooner.
> 
> The generic relaxed atomics are now queued in -tip, so it would be really
> good to see this Documentation update land in 4.3 if at all possible. I
> appreciate it's late in the cycle, but it's always worth asking.

Can't hurt to give it a try.  I have set -rcu's rcu/next branch to this
commit, and if it passes a few day's worth of testing, I will see what
Ingo has to say about a pull request.

This commit also privatizes smp_mb__after_unlock_lock() as well as
updating documentation.  Looks like we need to strengthen powerpc's
locking primitives, then get rid of smp_mb__after_unlock_lock() entirely.
Or did that already happen and I just missed it?

Thanx, Paul



Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-08-12 Thread Will Deacon
Hello Paul,

On Fri, Jul 24, 2015 at 04:30:46PM +0100, Paul E. McKenney wrote:
> On Fri, Jul 24, 2015 at 12:31:01PM +0100, Will Deacon wrote:
> > On Wed, Jul 15, 2015 at 02:12:21PM +0100, Paul E. McKenney wrote:
> > > > > commit 695c05d4b9666c50b40a1c022678b5f6e2e3e771
> > > > > Author: Paul E. McKenney 
> > > > > Date:   Tue Jul 14 18:35:23 2015 -0700
> > > > > 
> > > > > rcu,locking: Privatize smp_mb__after_unlock_lock()
> > > > > 
> > > > > RCU is the only thing that uses smp_mb__after_unlock_lock(), and is
> > > > > likely the only thing that ever will use it, so this commit makes this
> > > > > macro private to RCU.
> > > > > 
> > > > > Signed-off-by: Paul E. McKenney 
> > > > > Cc: Will Deacon 
> > > > > Cc: Peter Zijlstra 
> > > > > Cc: Benjamin Herrenschmidt 
> > > > > Cc: "linux-a...@vger.kernel.org" 
> > 
> > Are you planning to queue this somewhere? I think it makes sense regardless
> > of whether we change PowerPC or not and ideally it would be merged around
> > the same time as my relaxed atomics series.
> 
> I have it in -rcu.  By default, I will push it to the 4.4 merge window.
> Please let me know if you need it sooner.

The generic relaxed atomics are now queued in -tip, so it would be really
good to see this Documentation update land in 4.3 if at all possible. I
appreciate it's late in the cycle, but it's always worth asking.

Thanks,

Will


Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-07-24 Thread Paul E. McKenney
On Fri, Jul 24, 2015 at 12:31:01PM +0100, Will Deacon wrote:
> Hi Paul,
> 
> On Wed, Jul 15, 2015 at 02:12:21PM +0100, Paul E. McKenney wrote:
> > On Wed, Jul 15, 2015 at 11:51:35AM +0100, Will Deacon wrote:
> > > On Wed, Jul 15, 2015 at 02:38:20AM +0100, Paul E. McKenney wrote:
> > > > On Tue, Jul 14, 2015 at 12:31:44PM -0700, Paul E. McKenney wrote:
> > > > > On Tue, Jul 14, 2015 at 03:12:16PM +0100, Will Deacon wrote:
> > > > > > On Tue, Jul 14, 2015 at 03:00:14PM +0100, Paul E. McKenney wrote:
> > > > > > > On Tue, Jul 14, 2015 at 01:51:46PM +0100, Will Deacon wrote:
> > > > > > > > > On Tue, Jul 14, 2015 at 01:45:40PM +0100, Paul E. McKenney wrote:
> > > > > > > > > On Tue, Jul 14, 2015 at 11:04:29AM +0100, Will Deacon wrote:
> > > > > > > > > > Given that RCU is currently the only user of this barrier, how would you
> > > > > > > > > > feel about making the barrier local to RCU and not part of the general
> > > > > > > > > > memory-barrier API?
> > > > > > > > > 
> > > > > > > > > In theory, no objection.  Your thought is to leave the definitions where
> > > > > > > > > they are, mark them as being used only by RCU, and removing mention from
> > > > > > > > > memory-barriers.txt?  Or did you have something else in mind?
> > > > > > > > 
> > > > > > > > Actually, I was thinking of defining them in an RCU header file with an
> > > > > > > > #ifdef CONFIG_POWERPC for the smp_mb() version. Then you could have a big
> > > > > > > > comment describing the semantics, or put that in an RCU Documentation file
> > > > > > > > instead of memory-barriers.txt.
> > > > > > > > 
> > > > > > > > That *should* then mean we notice anybody else trying to use the barrier,
> > > > > > > > because they'd need to send patches to either add something equivalent
> > > > > > > > or move the definition out again.
> > > > > > > 
> > > > > > > My concern with this approach is that someone putting together a new
> > > > > > > architecture might miss this.  That said, this approach certainly would
> > > > > > > work for the current architectures.
> > > > > > 
> > > > > > I don't think they're any more likely to miss it than with the current
> > > > > > situation where the generic code defines the macro as a NOP unless you
> > > > > > explicitly override it.
> > > > > 
> > > > > Fair enough...
> > > > 
> > > > Like this?
> > > 
> > > Precisely! Thanks for cooking the patch -- this lays all my worries to
> > > rest, so:
> > > 
> > >   Acked-by: Will Deacon 
> > 
> > Thank you!
> 
> [...]
> 
> > > > commit 695c05d4b9666c50b40a1c022678b5f6e2e3e771
> > > > Author: Paul E. McKenney 
> > > > Date:   Tue Jul 14 18:35:23 2015 -0700
> > > > 
> > > > rcu,locking: Privatize smp_mb__after_unlock_lock()
> > > > 
> > > > RCU is the only thing that uses smp_mb__after_unlock_lock(), and is
> > > > > likely the only thing that ever will use it, so this commit makes this
> > > > > macro private to RCU.
> > > > 
> > > > Signed-off-by: Paul E. McKenney 
> > > > Cc: Will Deacon 
> > > > Cc: Peter Zijlstra 
> > > > Cc: Benjamin Herrenschmidt 
> > > > Cc: "linux-a...@vger.kernel.org" 
> 
> Are you planning to queue this somewhere? I think it makes sense regardless
> of whether we change PowerPC or not and ideally it would be merged around
> the same time as my relaxed atomics series.

I have it in -rcu.  By default, I will push it to the 4.4 merge window.
Please let me know if you need it sooner.

Thanx, Paul



Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-07-24 Thread Will Deacon
Hi Paul,

On Wed, Jul 15, 2015 at 02:12:21PM +0100, Paul E. McKenney wrote:
> On Wed, Jul 15, 2015 at 11:51:35AM +0100, Will Deacon wrote:
> > On Wed, Jul 15, 2015 at 02:38:20AM +0100, Paul E. McKenney wrote:
> > > On Tue, Jul 14, 2015 at 12:31:44PM -0700, Paul E. McKenney wrote:
> > > > On Tue, Jul 14, 2015 at 03:12:16PM +0100, Will Deacon wrote:
> > > > > On Tue, Jul 14, 2015 at 03:00:14PM +0100, Paul E. McKenney wrote:
> > > > > > On Tue, Jul 14, 2015 at 01:51:46PM +0100, Will Deacon wrote:
> > > > > > > On Tue, Jul 14, 2015 at 01:45:40PM +0100, Paul E. McKenney wrote:
> > > > > > > > On Tue, Jul 14, 2015 at 11:04:29AM +0100, Will Deacon wrote:
> > > > > > > > > Given that RCU is currently the only user of this barrier, how would you
> > > > > > > > > feel about making the barrier local to RCU and not part of the general
> > > > > > > > > memory-barrier API?
> > > > > > > > 
> > > > > > > > In theory, no objection.  Your thought is to leave the definitions where
> > > > > > > > they are, mark them as being used only by RCU, and removing mention from
> > > > > > > > memory-barriers.txt?  Or did you have something else in mind?
> > > > > > > 
> > > > > > > Actually, I was thinking of defining them in an RCU header file with an
> > > > > > > #ifdef CONFIG_POWERPC for the smp_mb() version. Then you could have a big
> > > > > > > comment describing the semantics, or put that in an RCU Documentation file
> > > > > > > instead of memory-barriers.txt.
> > > > > > > 
> > > > > > > That *should* then mean we notice anybody else trying to use the barrier,
> > > > > > > because they'd need to send patches to either add something equivalent
> > > > > > > or move the definition out again.
> > > > > > 
> > > > > > My concern with this approach is that someone putting together a new
> > > > > > architecture might miss this.  That said, this approach certainly would
> > > > > > work for the current architectures.
> > > > > 
> > > > > I don't think they're any more likely to miss it than with the current
> > > > > situation where the generic code defines the macro as a NOP unless you
> > > > > explicitly override it.
> > > > 
> > > > Fair enough...
> > > 
> > > Like this?
> > 
> > Precisely! Thanks for cooking the patch -- this lays all my worries to
> > rest, so:
> > 
> >   Acked-by: Will Deacon 
> 
> Thank you!

[...]

> > > commit 695c05d4b9666c50b40a1c022678b5f6e2e3e771
> > > Author: Paul E. McKenney 
> > > Date:   Tue Jul 14 18:35:23 2015 -0700
> > > 
> > > rcu,locking: Privatize smp_mb__after_unlock_lock()
> > > 
> > > RCU is the only thing that uses smp_mb__after_unlock_lock(), and is
> > > likely the only thing that ever will use it, so this commit makes this
> > > macro private to RCU.
> > > 
> > > Signed-off-by: Paul E. McKenney 
> > > Cc: Will Deacon 
> > > Cc: Peter Zijlstra 
> > > Cc: Benjamin Herrenschmidt 
> > > Cc: "linux-a...@vger.kernel.org" 

Are you planning to queue this somewhere? I think it makes sense regardless
of whether we change PowerPC or not and ideally it would be merged around
the same time as my relaxed atomics series.

Will




Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-07-22 Thread Will Deacon
On Mon, Jul 20, 2015 at 10:18:14PM +0100, Benjamin Herrenschmidt wrote:
> On Mon, 2015-07-20 at 14:39 +0100, Will Deacon wrote:
> > On Fri, Jul 17, 2015 at 11:14:14PM +0100, Benjamin Herrenschmidt wrote:
> > > On Fri, 2015-07-17 at 10:32 +0100, Will Deacon wrote:
> > > >  static inline void arch_spin_unlock(arch_spinlock_t *lock)
> > > >  {
> > > > -   SYNC_IO;
> > > > -   __asm__ __volatile__("# arch_spin_unlock\n\t"
> > > > -   PPC_RELEASE_BARRIER: : :"memory");
> > > > +   smp_mb();
> > > > lock->slock = 0;
> > > >  }
> > > 
> > > That probably needs to be mb() in case somebody has the expectation that
> > > it does a barrier vs. DMA on UP.
> > 
> > Hmm, but on !SMP doesn't arch_spin_unlock effectively expand to barrier()
> > in the core code?
> 
> include/linux/spinlock_up.h:# define arch_spin_unlock(lock)   do { barrier(); (void)(lock); } while (0)
> 
> Indeed...

So, leaving mmiowb() alone, we end up with the patch below.

Will

--->8

From 9373a4226986dd69b6aaf896590ea43abeb1cc8e Mon Sep 17 00:00:00 2001
From: Will Deacon 
Date: Wed, 22 Jul 2015 17:37:58 +0100
Subject: [PATCH] powerpc: kill smp_mb__after_unlock_lock

PowerPC is the only architecture defining smp_mb__after_unlock_lock,
but we can remove it by adding an unconditional sync (smp_mb()) to the
spin_unlock code.

Signed-off-by: Will Deacon 
---
 arch/powerpc/include/asm/io.h   | 13 +
 arch/powerpc/include/asm/spinlock.h | 22 +-
 2 files changed, 2 insertions(+), 33 deletions(-)

diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h
index a8d2ef30d473..a3ad51046b04 100644
--- a/arch/powerpc/include/asm/io.h
+++ b/arch/powerpc/include/asm/io.h
@@ -105,12 +105,6 @@ extern bool isa_io_special;
  *
  */
 
-#ifdef CONFIG_PPC64
-#define IO_SET_SYNC_FLAG() do { local_paca->io_sync = 1; } while(0)
-#else
-#define IO_SET_SYNC_FLAG()
-#endif
-
 /* gcc 4.0 and older doesn't have 'Z' constraint */
 #if __GNUC__ < 4 || (__GNUC__ == 4 && __GNUC_MINOR__ == 0)
 #define DEF_MMIO_IN_X(name, size, insn)\
@@ -127,7 +121,6 @@ static inline void name(volatile u##size __iomem *addr, u##size val)\
 {  \
__asm__ __volatile__("sync;"#insn" %1,0,%2" \
: "=m" (*addr) : "r" (val), "r" (addr) : "memory"); \
-   IO_SET_SYNC_FLAG(); \
 }
 #else /* newer gcc */
 #define DEF_MMIO_IN_X(name, size, insn)\
@@ -144,7 +137,6 @@ static inline void name(volatile u##size __iomem *addr, u##size val)\
 {  \
__asm__ __volatile__("sync;"#insn" %1,%y0"  \
: "=Z" (*addr) : "r" (val) : "memory"); \
-   IO_SET_SYNC_FLAG(); \
 }
 #endif
 
@@ -162,7 +154,6 @@ static inline void name(volatile u##size __iomem *addr, u##size val)\
 {  \
__asm__ __volatile__("sync;"#insn"%U0%X0 %1,%0" \
: "=m" (*addr) : "r" (val) : "memory"); \
-   IO_SET_SYNC_FLAG(); \
 }
 
 DEF_MMIO_IN_D(in_8, 8, lbz);
@@ -630,9 +621,7 @@ static inline void name at \
 #define mmiowb()
 #else
 /*
- * Enforce synchronisation of stores vs. spin_unlock
- * (this does it explicitly, though our implementation of spin_unlock
- * does it implicitely too)
+ * Explicitly enforce synchronisation of stores vs. spin_unlock
  */
 static inline void mmiowb(void)
 {
diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h
index 4dbe072eecbe..9da89ea4ff31 100644
--- a/arch/powerpc/include/asm/spinlock.h
+++ b/arch/powerpc/include/asm/spinlock.h
@@ -28,8 +28,6 @@
 #include 
 #include 
 
-#define smp_mb__after_unlock_lock()smp_mb()  /* Full ordering for lock. */
-
 #ifdef CONFIG_PPC64
 /* use 0x80yy when locked, where yy == CPU number */
 #ifdef __BIG_ENDIAN__
@@ -41,19 +39,6 @@
 #define LOCK_TOKEN 1
 #endif
 
-#if defined(CONFIG_PPC64) && defined(CONFIG_SMP)
-#define CLEAR_IO_SYNC  (get_paca()->io_sync = 0)
-#define SYNC_IO		do {						\
-   if (unlikely(get_paca()->io_sync)) {\
-   mb();   \
-   get_paca()->io_sync = 0;\
-   }   \
-   } while (0)
-#else
-#define CLEAR_IO_SYNC
-#define SYNC_IO
-#endif
-
 static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
 {
 	return lock.slock == 0;
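
The net effect on the unlock path itself is the hunk quoted at the top of
this message; in sketch form, the resulting unlock is:

	/* arch/powerpc/include/asm/spinlock.h after the patch (sketch) */
	static inline void arch_spin_unlock(arch_spinlock_t *lock)
	{
		smp_mb();	/* full "sync"; also subsumes the old SYNC_IO logic */
		lock->slock = 0;
	}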


Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-07-20 Thread Benjamin Herrenschmidt
On Mon, 2015-07-20 at 14:39 +0100, Will Deacon wrote:
> On Fri, Jul 17, 2015 at 11:14:14PM +0100, Benjamin Herrenschmidt wrote:
> > On Fri, 2015-07-17 at 10:32 +0100, Will Deacon wrote:
> > >  static inline void arch_spin_unlock(arch_spinlock_t *lock)
> > >  {
> > > -   SYNC_IO;
> > > -   __asm__ __volatile__("# arch_spin_unlock\n\t"
> > > -   PPC_RELEASE_BARRIER: : :"memory");
> > > +   smp_mb();
> > > lock->slock = 0;
> > >  }
> > 
> > That probably needs to be mb() in case somebody has the expectation that
> > it does a barrier vs. DMA on UP.
> 
> Hmm, but on !SMP doesn't arch_spin_unlock effectively expand to barrier()
> in the core code?

include/linux/spinlock_up.h:# define arch_spin_unlock(lock) do { barrier(); (void)(lock); } while (0)

Indeed...

Ben.


> Will


Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-07-20 Thread Will Deacon
On Mon, Jul 20, 2015 at 02:48:49PM +0100, Paul E. McKenney wrote:
> On Mon, Jul 20, 2015 at 02:39:06PM +0100, Will Deacon wrote:
> > On Fri, Jul 17, 2015 at 11:14:14PM +0100, Benjamin Herrenschmidt wrote:
> > > On Fri, 2015-07-17 at 10:32 +0100, Will Deacon wrote:
> > > >  static inline void arch_spin_unlock(arch_spinlock_t *lock)
> > > >  {
> > > > -   SYNC_IO;
> > > > -   __asm__ __volatile__("# arch_spin_unlock\n\t"
> > > > -   PPC_RELEASE_BARRIER: : :"memory");
> > > > +   smp_mb();
> > > > lock->slock = 0;
> > > >  }
> > > 
> > > That probably needs to be mb() in case somebody has the expectation that
> > > it does a barrier vs. DMA on UP.
> > 
> > Hmm, but on !SMP doesn't arch_spin_unlock effectively expand to barrier()
> > in the core code?
> 
> Yes, to barrier(), but that doesn't generate any code.  In contrast, the
> mb() that Ben is asking for puts out a sync instruction.  Without that
> sync instruction, MMIO accesses can be reordered with the spin_unlock(),
> even on single-CPU systems.  So the mb() is really needed if unlock is
> to order against MMIO (and thus DMA) on UP.

Understood, but my point was that this needs to be done in the core code
rather than here. Perhaps it's easier to leave mmiowb() alone for PowerPC
for now and instead predicate that on !SMP?

Will


Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-07-20 Thread Paul E. McKenney
On Mon, Jul 20, 2015 at 02:39:06PM +0100, Will Deacon wrote:
> On Fri, Jul 17, 2015 at 11:14:14PM +0100, Benjamin Herrenschmidt wrote:
> > On Fri, 2015-07-17 at 10:32 +0100, Will Deacon wrote:
> > >  static inline void arch_spin_unlock(arch_spinlock_t *lock)
> > >  {
> > > -   SYNC_IO;
> > > -   __asm__ __volatile__("# arch_spin_unlock\n\t"
> > > -   PPC_RELEASE_BARRIER: : :"memory");
> > > +   smp_mb();
> > > lock->slock = 0;
> > >  }
> > 
> > That probably needs to be mb() in case somebody has the expectation that
> > it does a barrier vs. DMA on UP.
> 
> Hmm, but on !SMP doesn't arch_spin_unlock effectively expand to barrier()
> in the core code?

Yes, to barrier(), but that doesn't generate any code.  In contrast, the
mb() that Ben is asking for puts out a sync instruction.  Without that
sync instruction, MMIO accesses can be reordered with the spin_unlock(),
even on single-CPU systems.  So the mb() is really needed if unlock is
to order against MMIO (and thus DMA) on UP.
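
The pattern at stake, in sketch form (hypothetical driver code; mydev, ctrl
and START_DMA are illustrative names, not a real API):

	static void kick_dma(struct mydev *dev)
	{
		spin_lock(&dev->lock);
		writel(START_DMA, dev->ctrl);	/* MMIO store starting DMA */
		spin_unlock(&dev->lock);	/* without a real sync here, the
						 * MMIO store can still be in
						 * flight after the unlock, even
						 * with only one CPU */
	}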

Thanx, Paul



Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-07-20 Thread Will Deacon
On Fri, Jul 17, 2015 at 11:14:14PM +0100, Benjamin Herrenschmidt wrote:
> On Fri, 2015-07-17 at 10:32 +0100, Will Deacon wrote:
> >  static inline void arch_spin_unlock(arch_spinlock_t *lock)
> >  {
> > -   SYNC_IO;
> > -   __asm__ __volatile__("# arch_spin_unlock\n\t"
> > -   PPC_RELEASE_BARRIER: : :"memory");
> > +   smp_mb();
> > lock->slock = 0;
> >  }
> 
> That probably needs to be mb() in case somebody has the expectation that
> it does a barrier vs. DMA on UP.

Hmm, but on !SMP doesn't arch_spin_unlock effectively expand to barrier()
in the core code?

Will




Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-07-17 Thread Benjamin Herrenschmidt
On Fri, 2015-07-17 at 10:32 +0100, Will Deacon wrote:
>  static inline void arch_spin_unlock(arch_spinlock_t *lock)
>  {
> -   SYNC_IO;
> -   __asm__ __volatile__("# arch_spin_unlock\n\t"
> -   PPC_RELEASE_BARRIER: : :"memory");
> +   smp_mb();
> lock->slock = 0;
>  }

That probably needs to be mb() in case somebody has the expectation that
it does a barrier vs. DMA on UP.

Cheers,
Ben.




Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-07-17 Thread Paul E. McKenney
On Fri, Jul 17, 2015 at 12:15:35PM +0200, Peter Zijlstra wrote:
> On Fri, Jul 17, 2015 at 10:32:21AM +0100, Will Deacon wrote:
> > @@ -158,9 +140,7 @@ void arch_spin_lock_flags(arch_spinlock_t *lock, unsigned long flags)
> >  
> >  static inline void arch_spin_unlock(arch_spinlock_t *lock)
> >  {
> > -   SYNC_IO;
> > -   __asm__ __volatile__("# arch_spin_unlock\n\t"
> > -   PPC_RELEASE_BARRIER: : :"memory");
> > +   smp_mb();
> > lock->slock = 0;
> >  }
> 
> Should we then also make smp_store_release() use sync instead of lwsync
> to keep it consistent?

Unless smp_store_release() needs to interact with MMIO accesses, it
should still be able to be lwsync.  This means that unlock-lock is a full
barrier, but release-acquire is not necessarily one, which should be just fine.

Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
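
Concretely, the asymmetry Paul is drawing would look something like this
(a sketch modelled on the powerpc definitions of the era; the exact
store primitive is hedged): release keeps the cheaper lwsync, while the
unlock in Will's patch pays for a full sync via smp_mb().

#define smp_store_release(p, v)					\
do {									\
	__asm__ __volatile__("lwsync" : : : "memory");			\
	WRITE_ONCE(*p, v);						\
} while (0)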


Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-07-17 Thread Peter Zijlstra
On Fri, Jul 17, 2015 at 10:32:21AM +0100, Will Deacon wrote:
> @@ -158,9 +140,7 @@ void arch_spin_lock_flags(arch_spinlock_t *lock, unsigned 
> long flags)
>  
>  static inline void arch_spin_unlock(arch_spinlock_t *lock)
>  {
> - SYNC_IO;
> - __asm__ __volatile__("# arch_spin_unlock\n\t"
> - PPC_RELEASE_BARRIER: : :"memory");
> + smp_mb();
>   lock->slock = 0;
>  }

Should we then also make smp_store_release() use sync instead of lwsync
to keep it consistent?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-07-17 Thread Will Deacon
On Thu, Jul 16, 2015 at 11:54:25PM +0100, Benjamin Herrenschmidt wrote:
> On Thu, 2015-07-16 at 08:11 -0700, Paul E. McKenney wrote:
> > > So isync in lock is architecturally incorrect, despite being what
> > the
> > > architecture recommends using, yay !
> > 
> > Well, the architecture isn't expecting that crazies like myself would
> > want to have an unlock-lock provide ordering to some CPU not holding
> > the lock.  :-/
> 
> Yes, isync in lock effectively allows any load or store before the lock
> to leak into the lock and get re-ordered with things in there.
> 
> lwsync leaves us exposed to the re-order inside the LL/SC of a
> subsequent load.
> 
> So to get the full barrier semantic, the only option is a full sync,
> either in lock or unlock. Instinctively I prefer in lock but there's
> an argument to have it in unlock so we can get rid of the iosync
> business.

Yeah, removing the iosync logic kills mmiowb() as well (totally untested
diff below and I was too scared to touch that paca thing!). It also sticks
the full barrier in one place, since there's no try_unlock to worry about.

Will

--->8

diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h
index a8d2ef30d473..bb34f3bb66dc 100644
--- a/arch/powerpc/include/asm/io.h
+++ b/arch/powerpc/include/asm/io.h
@@ -105,12 +105,6 @@ extern bool isa_io_special;
  *
  */
 
-#ifdef CONFIG_PPC64
-#define IO_SET_SYNC_FLAG() do { local_paca->io_sync = 1; } while(0)
-#else
-#define IO_SET_SYNC_FLAG()
-#endif
-
 /* gcc 4.0 and older doesn't have 'Z' constraint */
 #if __GNUC__ < 4 || (__GNUC__ == 4 && __GNUC_MINOR__ == 0)
 #define DEF_MMIO_IN_X(name, size, insn)\
@@ -127,7 +121,6 @@ static inline void name(volatile u##size __iomem *addr, 
u##size val)\
 {  \
__asm__ __volatile__("sync;"#insn" %1,0,%2" \
: "=m" (*addr) : "r" (val), "r" (addr) : "memory"); \
-   IO_SET_SYNC_FLAG(); \
 }
 #else /* newer gcc */
 #define DEF_MMIO_IN_X(name, size, insn)\
@@ -144,7 +137,6 @@ static inline void name(volatile u##size __iomem *addr, 
u##size val)\
 {  \
__asm__ __volatile__("sync;"#insn" %1,%y0"  \
: "=Z" (*addr) : "r" (val) : "memory"); \
-   IO_SET_SYNC_FLAG(); \
 }
 #endif
 
@@ -162,7 +154,6 @@ static inline void name(volatile u##size __iomem *addr, 
u##size val)\
 {  \
__asm__ __volatile__("sync;"#insn"%U0%X0 %1,%0" \
: "=m" (*addr) : "r" (val) : "memory"); \
-   IO_SET_SYNC_FLAG(); \
 }
 
 DEF_MMIO_IN_D(in_8, 8, lbz);
@@ -626,23 +617,7 @@ static inline void name at 
\
 #define writel_relaxed(v, addr)	writel(v, addr)
 #define writeq_relaxed(v, addr)	writeq(v, addr)
 
-#ifdef CONFIG_PPC32
 #define mmiowb()
-#else
-/*
- * Enforce synchronisation of stores vs. spin_unlock
- * (this does it explicitly, though our implementation of spin_unlock
- * does it implicitely too)
- */
-static inline void mmiowb(void)
-{
-   unsigned long tmp;
-
-   __asm__ __volatile__("sync; li %0,0; stb %0,%1(13)"
-   : "=" (tmp) : "i" (offsetof(struct paca_struct, io_sync))
-   : "memory");
-}
-#endif /* !CONFIG_PPC32 */
 
 static inline void iosync(void)
 {
diff --git a/arch/powerpc/include/asm/spinlock.h 
b/arch/powerpc/include/asm/spinlock.h
index 4dbe072eecbe..9da89ea4ff31 100644
--- a/arch/powerpc/include/asm/spinlock.h
+++ b/arch/powerpc/include/asm/spinlock.h
@@ -28,8 +28,6 @@
 #include <asm/synch.h>
 #include <asm/ppc-opcode.h>
 
-#define smp_mb__after_unlock_lock()smp_mb()  /* Full ordering for lock. */
-
 #ifdef CONFIG_PPC64
 /* use 0x80yy when locked, where yy == CPU number */
 #ifdef __BIG_ENDIAN__
@@ -41,19 +39,6 @@
 #define LOCK_TOKEN 1
 #endif
 
-#if defined(CONFIG_PPC64) && defined(CONFIG_SMP)
-#define CLEAR_IO_SYNC  (get_paca()->io_sync = 0)
-#define SYNC_IO	do {						\
-   if (unlikely(get_paca()->io_sync)) {\
-   mb();   \
-   get_paca()->io_sync = 0;\
-   }   \
-   } while (0)
-#else
-#define CLEAR_IO_SYNC
-#define SYNC_IO
-#endif
-
 static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
 {
return lock.slock == 0;
@@ -91,7 +76,6 @@ static inline unsigned long 
__arch_spin_trylock(arch_spinlock_t *lock)
 
 static inline int arch_spin_trylock(arch_spinlock_t *lock)
 {
-	CLEAR_IO_SYNC;


Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-07-16 Thread Benjamin Herrenschmidt
On Thu, 2015-07-16 at 08:11 -0700, Paul E. McKenney wrote:
> > So isync in lock is architecturally incorrect, despite being what
> the
> > architecture recommends using, yay !
> 
> Well, the architecture isn't expecting that crazies like myself would
> want to have an unlock-lock provide ordering to some CPU not holding
> the lock.  :-/

Yes, isync in lock effectively allows any load or store before the lock
to leak into the lock and get re-ordered with things in there.

lwsync leaves us exposed to the re-order inside the LL/SC of a
subsequent load.

So to get the full barrier semantic, the only option is a full sync,
either in lock or unlock. Instinctively I prefer in lock but there's
an argument to have it in unlock so we can get rid of the iosync
business.

Ben.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
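
For reference, the sequence being picked apart is roughly the following
(a sketch of the powerpc lock fastpath; registers and labels are
illustrative):

1:	lwarx	r9,0,r3		# load-reserve the lock word
	cmpwi	r9,0		# already held?
	bne-	2f
	stwcx.	r10,0,r3	# try to store our lock token
	bne-	1b		# lost the reservation, retry
	isync			# nothing later *executes* until the bne
				# above is decided
	# critical section; note isync does not order *prior* stores,
	# so an unrelated store issued before label 1 (e.g. the store
	# that unlocked some other lock) can still be performed in here
2:	...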


Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-07-16 Thread Paul E. McKenney
On Thu, Jul 16, 2015 at 03:14:55PM +1000, Benjamin Herrenschmidt wrote:
> On Thu, 2015-07-16 at 15:03 +1000, Benjamin Herrenschmidt wrote:
> > On Thu, 2015-07-16 at 12:00 +1000, Michael Ellerman wrote:
> > > That would fix the problem with smp_mb__after_unlock_lock(), but not
> > > the original worry we had about loads happening before the SC in lock.
> > 
> > However I think isync fixes *that* :-) The problem with isync is as you
> > said, it's not a -memory- barrier per-se, it's an execution barrier /
> > context synchronizing instruction. The combination stwcx. + bne + isync
> > however prevents the execution of anything past the isync until the
> > stwcx has completed and the bne has been "decided", which prevents loads
> > from leaking into the LL/SC loop. It will also prevent a store in the
> > lock from being issued before the stwcx. has completed. It does *not*
> > prevent as far as I can tell another unrelated store before the lock
> > from leaking into the lock, including the one used to unlock a different
> > lock.
> 
> Except that the architecture says:
> 
> <<
> Because a Store Conditional instruction may com-
> plete before its store has been performed, a condi-
> tional Branch instruction that depends on the CR0
> value set by a Store Conditional instruction does
> not order the Store Conditional's store with respect
> to storage accesses caused by instructions that
> follow the Branch
> >>
> 
> So isync in lock is architecturally incorrect, despite being what the
> architecture recommends using, yay !

Well, the architecture isn't expecting that crazies like myself would
want to have an unlock-lock provide ordering to some CPU not holding
the lock.  :-/

Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
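
The RCU expectation in question, as a litmus sketch (variables are
hypothetical; the "forbidden" outcome is what smp_mb__after_unlock_lock()
exists to rule out):

	CPU0			CPU1				CPU2
	WRITE_ONCE(x, 1);	spin_lock(&l);			r1 = READ_ONCE(y);
	spin_unlock(&l);	smp_mb__after_unlock_lock();	smp_mb();
				WRITE_ONCE(y, 1);		r2 = READ_ONCE(x);

	Forbidden: r1 == 1 && r2 == 0

With lwsync-based lock/unlock alone, CPU2 (which never takes the lock)
could see y == 1 and still see x == 0; hence the debate about promoting
one of the barriers to a full sync.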


Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-07-15 Thread Benjamin Herrenschmidt
On Thu, 2015-07-16 at 15:03 +1000, Benjamin Herrenschmidt wrote:
> On Thu, 2015-07-16 at 12:00 +1000, Michael Ellerman wrote:
> > That would fix the problem with smp_mb__after_unlock_lock(), but not
> > the original worry we had about loads happening before the SC in lock.
> 
> However I think isync fixes *that* :-) The problem with isync is as you
> said, it's not a -memory- barrier per-se, it's an execution barrier /
> context synchronizing instruction. The combination stwcx. + bne + isync
> however prevents the execution of anything past the isync until the
> stwcx has completed and the bne has been "decided", which prevents loads
> from leaking into the LL/SC loop. It will also prevent a store in the
> lock from being issued before the stwcx. has completed. It does *not*
> prevent as far as I can tell another unrelated store before the lock
> from leaking into the lock, including the one used to unlock a different
> lock.

Except that the architecture says:

<<
Because a Store Conditional instruction may com-
plete before its store has been performed, a condi-
tional Branch instruction that depends on the CR0
value set by a Store Conditional instruction does
not order the Store Conditional's store with respect
to storage accesses caused by instructions that
follow the Branch
>>

So isync in lock is architecturally incorrect, despite being what the
architecture recommends using, yay !

Ben.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-07-15 Thread Benjamin Herrenschmidt
On Thu, 2015-07-16 at 12:00 +1000, Michael Ellerman wrote:
> That would fix the problem with smp_mb__after_unlock_lock(), but not
> the original worry we had about loads happening before the SC in lock.

However I think isync fixes *that* :-) The problem with isync is as you
said, it's not a -memory- barrier per-se, it's an execution barrier /
context synchronizing instruction. The combination stwcx. + bne + isync
however prevents the execution of anything past the isync until the
stwcx has completed and the bne has been "decided", which prevents loads
from leaking into the LL/SC loop. It will also prevent a store in the
lock from being issued before the stwcx. has completed. It does *not*
prevent as far as I can tell another unrelated store before the lock
from leaking into the lock, including the one used to unlock a different
lock.

Cheers,
Ben.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-07-15 Thread Michael Ellerman
On Wed, 2015-07-15 at 11:44 +0100, Will Deacon wrote:
> Hi Michael,
> 
> On Wed, Jul 15, 2015 at 04:06:18AM +0100, Michael Ellerman wrote:
> > On Tue, 2015-07-14 at 08:31 +1000, Benjamin Herrenschmidt wrote:
> > > On Mon, 2015-07-13 at 13:15 +0100, Will Deacon wrote:
> > > > This didn't go anywhere last time I posted it, but here it is again.
> > > > I'd really appreciate some feedback from the PowerPC guys, especially as
> > > > to whether this change requires them to add an additional barrier in
> > > > arch_spin_unlock and what the cost of that would be.
> > > 
> > > We'd have to turn the lwsync in unlock or the isync in lock into a full
> > > barrier. As it is, we *almost* have a full barrier semantic, but not
> > > quite, as in things can get mixed up inside spin_lock between the LL and
> > > the SC (things leaking in past LL and things leaking "out" up before SC
> > > and then getting mixed up in there).
> > > 
> > > Michael, at some point you were experimenting a bit with that and tried
> > > to get some perf numbers of the impact that would have, did that
> > > solidify ? Otherwise, I'll have a look when I'm back next week.
> > 
> > I was mainly experimenting with replacing the lwsync in lock with an isync.
> > 
> > But I think you're talking about making it a full sync in lock.
> > 
> > That was about +7% on p8, +25% on p7 and +88% on p6.
> 
> Ok, so that's overhead incurred by moving from isync -> lwsync? The numbers
> look quite large...

Sorry no.

Currently we use lwsync in lock. You'll see isync in the code
(PPC_ACQUIRE_BARRIER), but on most modern CPUs that is patched at runtime to
lwsync.

I benchmarked lwsync (current), isync (proposed at the time) and sync (just for
comparison).

The numbers above are going from lwsync -> sync, as I thought that was what Ben
was talking about.

Going from lwsync to isync was actually a small speedup on power8, which was
surprising.

> > We got stuck deciding whether isync was safe to use as a memory barrier,
> > because the wording in the arch is a bit vague.
> > 
> > But if we're talking about a full sync then I think there is no question 
> > that's
> > OK and we should just do it.
> 
> Is this because there's a small overhead from lwsync -> sync? Just want to
> make sure I follow your reasoning.

No I mean that sync is clearly a memory barrier. The issue with switching to
isync in lock was that it's not a memory barrier per se, so we were not 100%
confident in it.

> If you went the way of sync in unlock, could you remove the conditional
> SYNC_IO stuff?

Yeah we could, it's just a conditional sync in unlock when mmio has been done.

That would fix the problem with smp_mb__after_unlock_lock(), but not the
original worry we had about loads happening before the SC in lock.

cheers


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-07-15 Thread Michael Ellerman
On Wed, 2015-07-15 at 07:18 -0700, Paul E. McKenney wrote:
> On Wed, Jul 15, 2015 at 01:06:18PM +1000, Michael Ellerman wrote:
> > On Tue, 2015-07-14 at 08:31 +1000, Benjamin Herrenschmidt wrote:
> > > Michael, at some point you were experimenting a bit with that and tried
> > > to get some perf numbers of the impact that would have, did that
> > > solidify ? Otherwise, I'll have a look when I'm back next week.
> > 
> > I was mainly experimenting with replacing the lwsync in lock with an isync.
> > 
> > But I think you're talking about making it a full sync in lock.
> > 
> > That was about +7% on p8, +25% on p7 and +88% on p6.
> 
> Just for completeness, what were you running as benchmark?  ;-)

Heh sorry :)

That was my lockcomparison benchmark, based on Anton's original. It just does
100,000,000 lock/unlocks for each type in userspace:

  https://github.com/mpe/lockcomparison/blob/master/lock_comparison.c

cheers


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
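
The shape of such a measurement, for readers without the link handy (a
minimal powerpc-only userspace sketch, not Michael's actual harness):

#include <stdio.h>
#include <time.h>

#define ITERATIONS	100000000UL

static unsigned long lock_word;

/* LL/SC lock with the current lwsync acquire barrier */
static inline void lock_lwsync(unsigned long *l)
{
	unsigned long tmp, token = 1;

	__asm__ __volatile__(
"1:	lwarx	%0,0,%2\n"
"	cmpwi	%0,0\n"
"	bne-	1b\n"
"	stwcx.	%1,0,%2\n"
"	bne-	1b\n"
"	lwsync\n"		/* swap in "isync" or "sync" to compare */
	: "=&r" (tmp) : "r" (token), "r" (l) : "cr0", "memory");
}

static inline void unlock_lwsync(unsigned long *l)
{
	__asm__ __volatile__("lwsync" : : : "memory");	/* release */
	*l = 0;
}

int main(void)
{
	struct timespec t0, t1;
	unsigned long i;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < ITERATIONS; i++) {
		lock_lwsync(&lock_word);
		unlock_lwsync(&lock_word);
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);

	printf("%.2f ns per lock/unlock\n",
	       ((t1.tv_sec - t0.tv_sec) * 1e9 +
		(t1.tv_nsec - t0.tv_nsec)) / ITERATIONS);
	return 0;
}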


Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-07-15 Thread Paul E. McKenney
On Wed, Jul 15, 2015 at 01:06:18PM +1000, Michael Ellerman wrote:
> On Tue, 2015-07-14 at 08:31 +1000, Benjamin Herrenschmidt wrote:
> > On Mon, 2015-07-13 at 13:15 +0100, Will Deacon wrote:
> > > smp_mb__after_unlock_lock is used to promote an UNLOCK + LOCK sequence
> > > into a full memory barrier.
> > > 
> > > However:
> > > 
> > >   - This ordering guarantee is already provided without the barrier on
> > > all architectures apart from PowerPC
> > > 
> > >   - The barrier only applies to UNLOCK + LOCK, not general
> > > RELEASE + ACQUIRE operations
> > > 
> > >   - Locks are generally assumed to offer SC ordering semantics, so
> > > having this additional barrier is error-prone and complicates the
> > > callers of LOCK/UNLOCK primitives
> > > 
> > >   - The barrier is not well used outside of RCU and, because it was
> > > retrofitted into the kernel, it's not clear whether other areas of
> > > the kernel are incorrectly relying on UNLOCK + LOCK implying a full
> > > barrier
> > > 
> > > This patch removes the barrier and instead requires architectures to
> > > provide full barrier semantics for an UNLOCK + LOCK sequence.
> > > 
> > > Cc: Benjamin Herrenschmidt 
> > > Cc: Paul McKenney 
> > > Cc: Peter Zijlstra 
> > > Signed-off-by: Will Deacon 
> > > ---
> > > 
> > > This didn't go anywhere last time I posted it, but here it is again.
> > > I'd really appreciate some feedback from the PowerPC guys, especially as
> > > to whether this change requires them to add an additional barrier in
> > > arch_spin_unlock and what the cost of that would be.
> > 
> > We'd have to turn the lwsync in unlock or the isync in lock into a full
> > barrier. As it is, we *almost* have a full barrier semantic, but not
> > quite, as in things can get mixed up inside spin_lock between the LL and
> > the SC (things leaking in past LL and things leaking "out" up before SC
> > and then getting mixed up in there).
> > 
> > Michael, at some point you were experimenting a bit with that and tried
> > to get some perf numbers of the impact that would have, did that
> > solidify ? Otherwise, I'll have a look when I'm back next week.
> 
> I was mainly experimenting with replacing the lwsync in lock with an isync.
> 
> But I think you're talking about making it a full sync in lock.
> 
> That was about +7% on p8, +25% on p7 and +88% on p6.

Just for completeness, what were you running as benchmark?  ;-)

Thanx, Paul

> We got stuck deciding whether isync was safe to use as a memory barrier,
> because the wording in the arch is a bit vague.
> 
> But if we're talking about a full sync then I think there is no question 
> that's
> OK and we should just do it.
> 
> cheers
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-07-15 Thread Paul E. McKenney
On Wed, Jul 15, 2015 at 11:51:35AM +0100, Will Deacon wrote:
> Hi Paul,
> 
> On Wed, Jul 15, 2015 at 02:38:20AM +0100, Paul E. McKenney wrote:
> > On Tue, Jul 14, 2015 at 12:31:44PM -0700, Paul E. McKenney wrote:
> > > On Tue, Jul 14, 2015 at 03:12:16PM +0100, Will Deacon wrote:
> > > > On Tue, Jul 14, 2015 at 03:00:14PM +0100, Paul E. McKenney wrote:
> > > > > On Tue, Jul 14, 2015 at 01:51:46PM +0100, Will Deacon wrote:
> > > > > > On Tue, Jul 14, 2015 at 01:45:40PM +0100, Paul E. McKenney wrote:
> > > > > > > On Tue, Jul 14, 2015 at 11:04:29AM +0100, Will Deacon wrote:
> > > > > > > > Given that RCU is currently the only user of this barrier, how 
> > > > > > > > would you
> > > > > > > > feel about making the barrier local to RCU and not part of the 
> > > > > > > > general
> > > > > > > > memory-barrier API?
> > > > > > > 
> > > > > > > In theory, no objection.  Your thought is to leave the 
> > > > > > > definitions where
> > > > > > > they are, mark them as being used only by RCU, and removing 
> > > > > > > mention from
> > > > > > > memory-barriers.txt?  Or did you have something else in mind?
> > > > > > 
> > > > > > Actually, I was thinking of defining them in an RCU header file 
> > > > > > with an
> > > > > > #ifdef CONFIG_POWERPC for the smb_mb() version. Then you could have 
> > > > > > a big
> > > > > > comment describing the semantics, or put that in an RCU 
> > > > > > Documentation file
> > > > > > instead of memory-barriers.txt.
> > > > > > 
> > > > > > That *should* then mean we notice anybody else trying to use the 
> > > > > > barrier,
> > > > > > because they'd need to send patches to either add something 
> > > > > > equivalent
> > > > > > or move the definition out again.
> > > > > 
> > > > > My concern with this approach is that someone putting together a new
> > > > > architecture might miss this.  That said, this approach certainly 
> > > > > would
> > > > > work for the current architectures.
> > > > 
> > > > I don't think they're any more likely to miss it than with the current
> > > > situation where the generic code defines the macro as a NOP unless you
> > > > explicitly override it.
> > > 
> > > Fair enough...
> > 
> > Like this?
> 
> Precisely! Thanks for cooking the patch -- this lays all my worries to
> rest, so:
> 
>   Acked-by: Will Deacon 

Thank you!

> We should continue the discussion with Ben and Michael about whether or
> not the PowerPC locking code can be strengthened, though (making this
> barrier a NOP on all currently supported archs).

Indeed -- if it becomes a NOP on all supported architectures, we might
want to consider just removing it completely.

Thanx, Paul

> Will
> 
> > commit 695c05d4b9666c50b40a1c022678b5f6e2e3e771
> > Author: Paul E. McKenney 
> > Date:   Tue Jul 14 18:35:23 2015 -0700
> > 
> > rcu,locking: Privatize smp_mb__after_unlock_lock()
> > 
> > RCU is the only thing that uses smp_mb__after_unlock_lock(), and is
> > likely the only thing that ever will use it, so this commit makes this
> > macro private to RCU.
> > 
> > Signed-off-by: Paul E. McKenney 
> > Cc: Will Deacon 
> > Cc: Peter Zijlstra 
> > Cc: Benjamin Herrenschmidt 
> > Cc: "linux-a...@vger.kernel.org" 
> > 
> > diff --git a/Documentation/memory-barriers.txt 
> > b/Documentation/memory-barriers.txt
> > index 318523872db5..eafa6a53f72c 100644
> > --- a/Documentation/memory-barriers.txt
> > +++ b/Documentation/memory-barriers.txt
> > @@ -1854,16 +1854,10 @@ RELEASE are to the same lock variable, but only 
> > from the perspective of
> >  another CPU not holding that lock.  In short, a ACQUIRE followed by an
> >  RELEASE may -not- be assumed to be a full memory barrier.
> >  
> > -Similarly, the reverse case of a RELEASE followed by an ACQUIRE does not
> > -imply a full memory barrier.  If it is necessary for a RELEASE-ACQUIRE
> > -pair to produce a full barrier, the ACQUIRE can be followed by an
> > -smp_mb__after_unlock_lock() invocation.  This will produce a full barrier
> > -(including transitivity) if either (a) the RELEASE and the ACQUIRE are
> > -executed by the same CPU or task, or (b) the RELEASE and ACQUIRE act on
> > -the same variable.  The smp_mb__after_unlock_lock() primitive is free
> > -on many architectures.  Without smp_mb__after_unlock_lock(), the CPU's
> > -execution of the critical sections corresponding to the RELEASE and the
> > -ACQUIRE can cross, so that:
> > +Similarly, the reverse case of a RELEASE followed by an ACQUIRE does
> > +not imply a full memory barrier.  Therefore, the CPU's execution of the
> > +critical sections corresponding to the RELEASE and the ACQUIRE can cross,
> > +so that:
> >  
> > *A = a;
> > RELEASE M
> > @@ -1901,29 +1895,6 @@ the RELEASE would simply complete, thereby avoiding 
> > the deadlock.
> > a sleep-unlock race, but the locking primitive needs to resolve
> > such races properly in any case.
> >  
> > 
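
The net effect of the patch is small: the override moves into RCU's own
header, roughly as follows (a sketch consistent with the commit message;
exact file placement hedged):

#ifdef CONFIG_PPC
#define smp_mb__after_unlock_lock()	smp_mb()  /* Full ordering for lock. */
#else /* #ifdef CONFIG_PPC */
#define smp_mb__after_unlock_lock()	do { } while (0)
#endif /* #else #ifdef CONFIG_PPC */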

Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-07-15 Thread Will Deacon
Hi Paul,

On Wed, Jul 15, 2015 at 02:38:20AM +0100, Paul E. McKenney wrote:
> On Tue, Jul 14, 2015 at 12:31:44PM -0700, Paul E. McKenney wrote:
> > On Tue, Jul 14, 2015 at 03:12:16PM +0100, Will Deacon wrote:
> > > On Tue, Jul 14, 2015 at 03:00:14PM +0100, Paul E. McKenney wrote:
> > > > On Tue, Jul 14, 2015 at 01:51:46PM +0100, Will Deacon wrote:
> > > > > On Tue, Jul 14, 2015 at 01:45:40PM +0100, Paul E. McKenney wrote:
> > > > > > On Tue, Jul 14, 2015 at 11:04:29AM +0100, Will Deacon wrote:
> > > > > > > Given that RCU is currently the only user of this barrier, how 
> > > > > > > would you
> > > > > > > feel about making the barrier local to RCU and not part of the 
> > > > > > > general
> > > > > > > memory-barrier API?
> > > > > > 
> > > > > > In theory, no objection.  Your thought is to leave the definitions 
> > > > > > where
> > > > > > they are, mark them as being used only by RCU, and removing mention 
> > > > > > from
> > > > > > memory-barriers.txt?  Or did you have something else in mind?
> > > > > 
> > > > > Actually, I was thinking of defining them in an RCU header file with 
> > > > > an
> > > > > #ifdef CONFIG_POWERPC for the smb_mb() version. Then you could have a 
> > > > > big
> > > > > comment describing the semantics, or put that in an RCU Documentation 
> > > > > file
> > > > > instead of memory-barriers.txt.
> > > > > 
> > > > > That *should* then mean we notice anybody else trying to use the 
> > > > > barrier,
> > > > > because they'd need to send patches to either add something equivalent
> > > > > or move the definition out again.
> > > > 
> > > > My concern with this approach is that someone putting together a new
> > > > architecture might miss this.  That said, this approach certainly would
> > > > work for the current architectures.
> > > 
> > > I don't think they're any more likely to miss it than with the current
> > > situation where the generic code defines the macro as a NOP unless you
> > > explicitly override it.
> > 
> > Fair enough...
> 
> Like this?

Precisely! Thanks for cooking the patch -- this lays all my worries to
rest, so:

  Acked-by: Will Deacon 

We should continue the discussion with Ben and Michael about whether or
not the PowerPC locking code can be strengthened, though (making this
barrier a NOP on all currently supported archs).

Will

> commit 695c05d4b9666c50b40a1c022678b5f6e2e3e771
> Author: Paul E. McKenney 
> Date:   Tue Jul 14 18:35:23 2015 -0700
> 
> rcu,locking: Privatize smp_mb__after_unlock_lock()
> 
> RCU is the only thing that uses smp_mb__after_unlock_lock(), and is
> likely the only thing that ever will use it, so this commit makes this
> macro private to RCU.
> 
> Signed-off-by: Paul E. McKenney 
> Cc: Will Deacon 
> Cc: Peter Zijlstra 
> Cc: Benjamin Herrenschmidt 
> Cc: "linux-a...@vger.kernel.org" 
> 
> diff --git a/Documentation/memory-barriers.txt 
> b/Documentation/memory-barriers.txt
> index 318523872db5..eafa6a53f72c 100644
> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -1854,16 +1854,10 @@ RELEASE are to the same lock variable, but only from 
> the perspective of
>  another CPU not holding that lock.  In short, a ACQUIRE followed by an
>  RELEASE may -not- be assumed to be a full memory barrier.
>  
> -Similarly, the reverse case of a RELEASE followed by an ACQUIRE does not
> -imply a full memory barrier.  If it is necessary for a RELEASE-ACQUIRE
> -pair to produce a full barrier, the ACQUIRE can be followed by an
> -smp_mb__after_unlock_lock() invocation.  This will produce a full barrier
> -(including transitivity) if either (a) the RELEASE and the ACQUIRE are
> -executed by the same CPU or task, or (b) the RELEASE and ACQUIRE act on
> -the same variable.  The smp_mb__after_unlock_lock() primitive is free
> -on many architectures.  Without smp_mb__after_unlock_lock(), the CPU's
> -execution of the critical sections corresponding to the RELEASE and the
> -ACQUIRE can cross, so that:
> +Similarly, the reverse case of a RELEASE followed by an ACQUIRE does
> +not imply a full memory barrier.  Therefore, the CPU's execution of the
> +critical sections corresponding to the RELEASE and the ACQUIRE can cross,
> +so that:
>  
>   *A = a;
>   RELEASE M
> @@ -1901,29 +1895,6 @@ the RELEASE would simply complete, thereby avoiding 
> the deadlock.
>   a sleep-unlock race, but the locking primitive needs to resolve
>   such races properly in any case.
>  
> -With smp_mb__after_unlock_lock(), the two critical sections cannot overlap.
> -For example, with the following code, the store to *A will always be
> -seen by other CPUs before the store to *B:
> -
> - *A = a;
> - RELEASE M
> - ACQUIRE N
> - smp_mb__after_unlock_lock();
> - *B = b;
> -
> -The operations will always occur in one of the following orders:
> -
> - STORE *A, RELEASE, ACQUIRE, smp_mb__after_unlock_lock(), STORE *B
> - STORE *A, ACQUIRE, RELEASE, smp_mb__after_unlock_lock(), STORE *B
> - ACQUIRE, STORE *A, RELEASE, 

Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-07-15 Thread Will Deacon
Hi Michael,

On Wed, Jul 15, 2015 at 04:06:18AM +0100, Michael Ellerman wrote:
> On Tue, 2015-07-14 at 08:31 +1000, Benjamin Herrenschmidt wrote:
> > On Mon, 2015-07-13 at 13:15 +0100, Will Deacon wrote:
> > > This didn't go anywhere last time I posted it, but here it is again.
> > > I'd really appreciate some feedback from the PowerPC guys, especially as
> > > to whether this change requires them to add an additional barrier in
> > > arch_spin_unlock and what the cost of that would be.
> > 
> > We'd have to turn the lwsync in unlock or the isync in lock into a full
> > barrier. As it is, we *almost* have a full barrier semantic, but not
> > quite, as in things can get mixed up inside spin_lock between the LL and
> > the SC (things leaking in past LL and things leaking "out" up before SC
> > and then getting mixed up in there).
> > 
> > Michael, at some point you were experimenting a bit with that and tried
> > to get some perf numbers of the impact that would have, did that
> > solidify ? Otherwise, I'll have a look when I'm back next week.
> 
> I was mainly experimenting with replacing the lwsync in lock with an isync.
> 
> But I think you're talking about making it a full sync in lock.
> 
> That was about +7% on p8, +25% on p7 and +88% on p6.

Ok, so that's overhead incurred by moving from isync -> lwsync? The numbers
look quite large...

> We got stuck deciding whether isync was safe to use as a memory barrier,
> because the wording in the arch is a bit vague.
> 
> But if we're talking about a full sync then I think there is no question 
> that's
> OK and we should just do it.

Is this because there's a small overhead from lwsync -> sync? Just want to
make sure I follow your reasoning.

If you went the way of sync in unlock, could you remove the conditional
SYNC_IO stuff?

Will
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-07-15 Thread Benjamin Herrenschmidt
On Thu, 2015-07-16 at 12:00 +1000, Michael Ellerman wrote:
 That would fix the problem with smp_mb__after_unlock_lock(), but not
 the original worry we had about loads happening before the SC in lock.

However I think isync fixes *that* :-) The problem with isync is as you
said, it's not a -memory- barrier per-se, it's an execution barrier /
context synchronizing instruction. The combination stwcx. + bne + isync
however prevents the execution of anything past the isync until the
stwcx has completed and the bne has been decided, which prevents loads
from leaking into the LL/SC loop. It will also prevent a store in the
lock from being issued before the stwcx. has completed. It does *not*
prevent as far as I can tell another unrelated store before the lock
from leaking into the lock, including the one used to unlock a different
lock.

Cheers,
Ben.


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-07-15 Thread Michael Ellerman
On Wed, 2015-07-15 at 11:44 +0100, Will Deacon wrote:
 Hi Michael,
 
 On Wed, Jul 15, 2015 at 04:06:18AM +0100, Michael Ellerman wrote:
  On Tue, 2015-07-14 at 08:31 +1000, Benjamin Herrenschmidt wrote:
   On Mon, 2015-07-13 at 13:15 +0100, Will Deacon wrote:
This didn't go anywhere last time I posted it, but here it is again.
I'd really appreciate some feedback from the PowerPC guys, especially as
to whether this change requires them to add an additional barrier in
arch_spin_unlock and what the cost of that would be.
   
   We'd have to turn the lwsync in unlock or the isync in lock into a full
   barrier. As it is, we *almost* have a full barrier semantic, but not
   quite, as in things can get mixed up inside spin_lock between the LL and
   the SC (things leaking in past LL and things leaking out up before SC
   and then getting mixed up in there).
   
   Michael, at some point you were experimenting a bit with that and tried
   to get some perf numbers of the impact that would have, did that
   solidify ? Otherwise, I'll have a look when I'm back next week.
  
  I was mainly experimenting with replacing the lwsync in lock with an isync.
  
  But I think you're talking about making it a full sync in lock.
  
  That was about +7% on p8, +25% on p7 and +88% on p6.
 
 Ok, so that's overhead incurred by moving from isync - lwsync? The numbers
 look quite large...

Sorry no.

Currently we use lwsync in lock. You'll see isync in the code
(PPC_ACQUIRE_BARRIER), but on most modern CPUs that is patched at runtime to
lwsync.

I benchmarked lwsync (current), isync (proposed at the time) and sync (just for
comparison).

The numbers above are going from lwsync - sync, as I thought that was what Ben
was talking about.

Going from lwsync to isync was actually a small speedup on power8, which was
surprising.

  We got stuck deciding whether isync was safe to use as a memory barrier,
  because the wording in the arch is a bit vague.
  
  But if we're talking about a full sync then I think there is no question 
  that's
  OK and we should just do it.
 
 Is this because there's a small overhead from lwsync - sync? Just want to
 make sure I follow your reasoning.

No I mean that sync is clearly a memory barrier. The issue with switching to
isync in lock was that it's not a memory barrier per se, so we were not 100%
confident in it.

 If you went the way of sync in unlock, could you remove the conditional
 SYNC_IO stuff?

Yeah we could, it's just a conditional sync in unlock when mmio has been done.

That would fix the problem with smp_mb__after_unlock_lock(), but not the
original worry we had about loads happening before the SC in lock.

cheers


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-07-15 Thread Benjamin Herrenschmidt
On Thu, 2015-07-16 at 15:03 +1000, Benjamin Herrenschmidt wrote:
 On Thu, 2015-07-16 at 12:00 +1000, Michael Ellerman wrote:
  That would fix the problem with smp_mb__after_unlock_lock(), but not
  the original worry we had about loads happening before the SC in lock.
 
 However I think isync fixes *that* :-) The problem with isync is as you
 said, it's not a -memory- barrier per-se, it's an execution barrier /
 context synchronizing instruction. The combination stwcx. + bne + isync
 however prevents the execution of anything past the isync until the
 stwcx has completed and the bne has been decided, which prevents loads
 from leaking into the LL/SC loop. It will also prevent a store in the
 lock from being issued before the stwcx. has completed. It does *not*
 prevent as far as I can tell another unrelated store before the lock
 from leaking into the lock, including the one used to unlock a different
 lock.

Except that the architecture says:


Because a Store Conditional instruction may com-
plete before its store has been performed, a condi-
tional Branch instruction that depends on the CR0
value set by a Store Conditional instruction does
not order the Store Conditional's store with respect
to storage accesses caused by instructions that
follow the Branch


So isync in lock in architecturally incorrect, despite being what the
architecture recommends using, yay !

Ben.


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-07-15 Thread Michael Ellerman
On Wed, 2015-07-15 at 07:18 -0700, Paul E. McKenney wrote:
 On Wed, Jul 15, 2015 at 01:06:18PM +1000, Michael Ellerman wrote:
  On Tue, 2015-07-14 at 08:31 +1000, Benjamin Herrenschmidt wrote:
   Michael, at some point you were experimenting a bit with that and tried
   to get some perf numbers of the impact that would have, did that
   solidify ? Otherwise, I'll have a look when I'm back next week.
  
  I was mainly experimenting with replacing the lwsync in lock with an isync.
  
  But I think you're talking about making it a full sync in lock.
  
  That was about +7% on p8, +25% on p7 and +88% on p6.
 
 Just for completeness, what were you running as benchmark?  ;-)

Heh sorry :)

That was my lockcomparison benchmark, based on Anton's original. It just does
100,000,000 lock/unlocks for each type in userspace:

  https://github.com/mpe/lockcomparison/blob/master/lock_comparison.c

cheers


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-07-15 Thread Will Deacon
Hi Paul,

On Wed, Jul 15, 2015 at 02:38:20AM +0100, Paul E. McKenney wrote:
 On Tue, Jul 14, 2015 at 12:31:44PM -0700, Paul E. McKenney wrote:
  On Tue, Jul 14, 2015 at 03:12:16PM +0100, Will Deacon wrote:
   On Tue, Jul 14, 2015 at 03:00:14PM +0100, Paul E. McKenney wrote:
On Tue, Jul 14, 2015 at 01:51:46PM +0100, Will Deacon wrote:
 On Tue, Jul 14, 2015 at 01:45:40PM +0100, Paul E. McKenney wrote:
  On Tue, Jul 14, 2015 at 11:04:29AM +0100, Will Deacon wrote:
   Given that RCU is currently the only user of this barrier, how 
   would you
   feel about making the barrier local to RCU and not part of the 
   general
   memory-barrier API?
  
  In theory, no objection.  Your thought is to leave the definitions 
  where
  they are, mark them as being used only by RCU, and removing mention 
  from
  memory-barriers.txt?  Or did you have something else in mind?
 
 Actually, I was thinking of defining them in an RCU header file with 
 an
 #ifdef CONFIG_POWERPC for the smb_mb() version. Then you could have a 
 big
 comment describing the semantics, or put that in an RCU Documentation 
 file
 instead of memory-barriers.txt.
 
 That *should* then mean we notice anybody else trying to use the 
 barrier,
 because they'd need to send patches to either add something equivalent
 or move the definition out again.

My concern with this approach is that someone putting together a new
architecture might miss this.  That said, this approach certainly would
work for the current architectures.
   
   I don't think they're any more likely to miss it than with the current
   situation where the generic code defines the macro as a NOP unless you
   explicitly override it.
  
  Fair enough...
 
 Like this?

Precisely! Thanks for cooking the patch -- this lays all my worries to
rest, so:

  Acked-by: Will Deacon will.dea...@arm.com

We should continue the discussion with Ben and Michael about whether or
not the PowerPC locking code can be strengthened, though (making this
barrier a NOP on all currently supported archs).

Will

 commit 695c05d4b9666c50b40a1c022678b5f6e2e3e771
 Author: Paul E. McKenney paul...@linux.vnet.ibm.com
 Date:   Tue Jul 14 18:35:23 2015 -0700
 
 rcu,locking: Privatize smp_mb__after_unlock_lock()
 
 RCU is the only thing that uses smp_mb__after_unlock_lock(), and is
 likely the only thing that ever will use it, so this commit makes this
 macro private to RCU.
 
 Signed-off-by: Paul E. McKenney paul...@linux.vnet.ibm.com
 Cc: Will Deacon will.dea...@arm.com
 Cc: Peter Zijlstra pet...@infradead.org
 Cc: Benjamin Herrenschmidt b...@kernel.crashing.org
 Cc: linux-a...@vger.kernel.org linux-a...@vger.kernel.org
 
 diff --git a/Documentation/memory-barriers.txt 
 b/Documentation/memory-barriers.txt
 index 318523872db5..eafa6a53f72c 100644
 --- a/Documentation/memory-barriers.txt
 +++ b/Documentation/memory-barriers.txt
 @@ -1854,16 +1854,10 @@ RELEASE are to the same lock variable, but only from 
 the perspective of
  another CPU not holding that lock.  In short, a ACQUIRE followed by an
  RELEASE may -not- be assumed to be a full memory barrier.
  
 -Similarly, the reverse case of a RELEASE followed by an ACQUIRE does not
 -imply a full memory barrier.  If it is necessary for a RELEASE-ACQUIRE
 -pair to produce a full barrier, the ACQUIRE can be followed by an
 -smp_mb__after_unlock_lock() invocation.  This will produce a full barrier
 -(including transitivity) if either (a) the RELEASE and the ACQUIRE are
 -executed by the same CPU or task, or (b) the RELEASE and ACQUIRE act on
 -the same variable.  The smp_mb__after_unlock_lock() primitive is free
 -on many architectures.  Without smp_mb__after_unlock_lock(), the CPU's
 -execution of the critical sections corresponding to the RELEASE and the
 -ACQUIRE can cross, so that:
 +Similarly, the reverse case of a RELEASE followed by an ACQUIRE does
 +not imply a full memory barrier.  Therefore, the CPU's execution of the
 +critical sections corresponding to the RELEASE and the ACQUIRE can cross,
 +so that:
  
   *A = a;
   RELEASE M
 @@ -1901,29 +1895,6 @@ the RELEASE would simply complete, thereby avoiding 
 the deadlock.
   a sleep-unlock race, but the locking primitive needs to resolve
   such races properly in any case.
  
 -With smp_mb__after_unlock_lock(), the two critical sections cannot overlap.
 -For example, with the following code, the store to *A will always be
 -seen by other CPUs before the store to *B:
 -
 - *A = a;
 - RELEASE M
 - ACQUIRE N
 - smp_mb__after_unlock_lock();
 - *B = b;
 -
 -The operations will always occur in one of the following orders:
 -
 - STORE *A, RELEASE, ACQUIRE, smp_mb__after_unlock_lock(), STORE *B
 - STORE *A, ACQUIRE, RELEASE, smp_mb__after_unlock_lock(), STORE *B
 - ACQUIRE, STORE *A, RELEASE, 

Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-07-15 Thread Will Deacon
Hi Michael,

On Wed, Jul 15, 2015 at 04:06:18AM +0100, Michael Ellerman wrote:
 On Tue, 2015-07-14 at 08:31 +1000, Benjamin Herrenschmidt wrote:
  On Mon, 2015-07-13 at 13:15 +0100, Will Deacon wrote:
   This didn't go anywhere last time I posted it, but here it is again.
   I'd really appreciate some feedback from the PowerPC guys, especially as
   to whether this change requires them to add an additional barrier in
   arch_spin_unlock and what the cost of that would be.
  
  We'd have to turn the lwsync in unlock or the isync in lock into a full
  barrier. As it is, we *almost* have a full barrier semantic, but not
  quite, as in things can get mixed up inside spin_lock between the LL and
  the SC (things leaking in past LL and things leaking out up before SC
  and then getting mixed up in there).
  
  Michael, at some point you were experimenting a bit with that and tried
  to get some perf numbers of the impact that would have, did that
  solidify ? Otherwise, I'll have a look when I'm back next week.
 
 I was mainly experimenting with replacing the lwsync in lock with an isync.
 
 But I think you're talking about making it a full sync in lock.
 
 That was about +7% on p8, +25% on p7 and +88% on p6.

Ok, so that's overhead incurred by moving from isync - lwsync? The numbers
look quite large...

 We got stuck deciding whether isync was safe to use as a memory barrier,
 because the wording in the arch is a bit vague.
 
 But if we're talking about a full sync then I think there is no question 
 that's
 OK and we should just do it.

Is this because there's a small overhead from lwsync - sync? Just want to
make sure I follow your reasoning.

If you went the way of sync in unlock, could you remove the conditional
SYNC_IO stuff?

Will
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-07-15 Thread Paul E. McKenney
On Wed, Jul 15, 2015 at 01:06:18PM +1000, Michael Ellerman wrote:
 On Tue, 2015-07-14 at 08:31 +1000, Benjamin Herrenschmidt wrote:
  On Mon, 2015-07-13 at 13:15 +0100, Will Deacon wrote:
   smp_mb__after_unlock_lock is used to promote an UNLOCK + LOCK sequence
   into a full memory barrier.
   
   However:
   
 - This ordering guarantee is already provided without the barrier on
   all architectures apart from PowerPC
   
 - The barrier only applies to UNLOCK + LOCK, not general
   RELEASE + ACQUIRE operations
   
 - Locks are generally assumed to offer SC ordering semantics, so
   having this additional barrier is error-prone and complicates the
   callers of LOCK/UNLOCK primitives
   
 - The barrier is not well used outside of RCU and, because it was
   retrofitted into the kernel, it's not clear whether other areas of
   the kernel are incorrectly relying on UNLOCK + LOCK implying a full
   barrier
   
   This patch removes the barrier and instead requires architectures to
   provide full barrier semantics for an UNLOCK + LOCK sequence.
   
   Cc: Benjamin Herrenschmidt b...@kernel.crashing.org
   Cc: Paul McKenney paul...@linux.vnet.ibm.com
   Cc: Peter Zijlstra pet...@infradead.org
   Signed-off-by: Will Deacon will.dea...@arm.com
   ---
   
   This didn't go anywhere last time I posted it, but here it is again.
   I'd really appreciate some feedback from the PowerPC guys, especially as
   to whether this change requires them to add an additional barrier in
   arch_spin_unlock and what the cost of that would be.
  
  We'd have to turn the lwsync in unlock or the isync in lock into a full
  barrier. As it is, we *almost* have a full barrier semantic, but not
  quite, as in things can get mixed up inside spin_lock between the LL and
  the SC (things leaking in past LL and things leaking out up before SC
  and then getting mixed up in there).
  
  Michael, at some point you were experimenting a bit with that and tried
  to get some perf numbers of the impact that would have, did that
  solidify ? Otherwise, I'll have a look when I'm back next week.
 
 I was mainly experimenting with replacing the lwsync in lock with an isync.
 
 But I think you're talking about making it a full sync in lock.
 
 That was about +7% on p8, +25% on p7 and +88% on p6.

Just for completeness, what were you running as benchmark?  ;-)

Thanx, Paul

 We got stuck deciding whether isync was safe to use as a memory barrier,
 because the wording in the arch is a bit vague.
 
 But if we're talking about a full sync then I think there is no question that's
 OK and we should just do it.
 
 cheers
 
 



Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-07-15 Thread Paul E. McKenney
On Wed, Jul 15, 2015 at 11:51:35AM +0100, Will Deacon wrote:
 Hi Paul,
 
 On Wed, Jul 15, 2015 at 02:38:20AM +0100, Paul E. McKenney wrote:
  On Tue, Jul 14, 2015 at 12:31:44PM -0700, Paul E. McKenney wrote:
   On Tue, Jul 14, 2015 at 03:12:16PM +0100, Will Deacon wrote:
On Tue, Jul 14, 2015 at 03:00:14PM +0100, Paul E. McKenney wrote:
 On Tue, Jul 14, 2015 at 01:51:46PM +0100, Will Deacon wrote:
  On Tue, Jul 14, 2015 at 01:45:40PM +0100, Paul E. McKenney wrote:
   On Tue, Jul 14, 2015 at 11:04:29AM +0100, Will Deacon wrote:
Given that RCU is currently the only user of this barrier, how would you
feel about making the barrier local to RCU and not part of the general
memory-barrier API?
   
    In theory, no objection.  Your thought is to leave the definitions where
    they are, mark them as being used only by RCU, and removing mention from
    memory-barriers.txt?  Or did you have something else in mind?
  
   Actually, I was thinking of defining them in an RCU header file with an
   #ifdef CONFIG_POWERPC for the smp_mb() version. Then you could have a big
   comment describing the semantics, or put that in an RCU Documentation file
   instead of memory-barriers.txt.
  
   That *should* then mean we notice anybody else trying to use the barrier,
   because they'd need to send patches to either add something equivalent
   or move the definition out again.
 
  My concern with this approach is that someone putting together a new
  architecture might miss this.  That said, this approach certainly would
  work for the current architectures.

I don't think they're any more likely to miss it than with the current
situation where the generic code defines the macro as a NOP unless you
explicitly override it.
   
   Fair enough...
  
  Like this?
 
 Precisely! Thanks for cooking the patch -- this lays all my worries to
 rest, so:
 
   Acked-by: Will Deacon will.dea...@arm.com

Thank you!

 We should continue the discussion with Ben and Michael about whether or
 not the PowerPC locking code can be strengthened, though (making this
 barrier a NOP on all currently supported archs).

Indeed -- if it becomes a NOP on all supported architectures, we might
want to consider just removing it completely.

Thanx, Paul

 Will
 
  commit 695c05d4b9666c50b40a1c022678b5f6e2e3e771
  Author: Paul E. McKenney paul...@linux.vnet.ibm.com
  Date:   Tue Jul 14 18:35:23 2015 -0700
  
  rcu,locking: Privatize smp_mb__after_unlock_lock()
  
  RCU is the only thing that uses smp_mb__after_unlock_lock(), and is
  likely the only thing that ever will use it, so this commit makes this
  macro private to RCU.
  
  Signed-off-by: Paul E. McKenney paul...@linux.vnet.ibm.com
  Cc: Will Deacon will.dea...@arm.com
  Cc: Peter Zijlstra pet...@infradead.org
  Cc: Benjamin Herrenschmidt b...@kernel.crashing.org
  Cc: linux-a...@vger.kernel.org linux-a...@vger.kernel.org
  
   diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
  index 318523872db5..eafa6a53f72c 100644
  --- a/Documentation/memory-barriers.txt
  +++ b/Documentation/memory-barriers.txt
   @@ -1854,16 +1854,10 @@ RELEASE are to the same lock variable, but only from the perspective of
   another CPU not holding that lock.  In short, a ACQUIRE followed by an
   RELEASE may -not- be assumed to be a full memory barrier.
   
  -Similarly, the reverse case of a RELEASE followed by an ACQUIRE does not
  -imply a full memory barrier.  If it is necessary for a RELEASE-ACQUIRE
  -pair to produce a full barrier, the ACQUIRE can be followed by an
  -smp_mb__after_unlock_lock() invocation.  This will produce a full barrier
  -(including transitivity) if either (a) the RELEASE and the ACQUIRE are
  -executed by the same CPU or task, or (b) the RELEASE and ACQUIRE act on
  -the same variable.  The smp_mb__after_unlock_lock() primitive is free
  -on many architectures.  Without smp_mb__after_unlock_lock(), the CPU's
  -execution of the critical sections corresponding to the RELEASE and the
  -ACQUIRE can cross, so that:
  +Similarly, the reverse case of a RELEASE followed by an ACQUIRE does
  +not imply a full memory barrier.  Therefore, the CPU's execution of the
  +critical sections corresponding to the RELEASE and the ACQUIRE can cross,
  +so that:
   
  *A = a;
  RELEASE M
   @@ -1901,29 +1895,6 @@ the RELEASE would simply complete, thereby avoiding the deadlock.
  a sleep-unlock race, but the locking primitive needs to resolve
  such races properly in any case.
   
  -With smp_mb__after_unlock_lock(), the two critical sections cannot overlap.
  -For example, with the following code, the store to *A will always be
  -seen by other CPUs before the store to *B:
  -
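
[The Documentation diff is truncated above in this archive. For reference,
the code half of the patch being acked amounts to something like the
following sketch of an RCU-private header -- see the actual commit for the
exact file and comment wording:]

	/* Promote UNLOCK+LOCK to a full barrier; only powerpc needs help. */
	#ifdef CONFIG_PPC
	#define smp_mb__after_unlock_lock()	smp_mb()  /* Full ordering for lock. */
	#else /* #ifdef CONFIG_PPC */
	#define smp_mb__after_unlock_lock()	do { } while (0)
	#endif /* #else #ifdef CONFIG_PPC */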
  

Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-07-14 Thread Michael Ellerman
On Tue, 2015-07-14 at 08:31 +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2015-07-13 at 13:15 +0100, Will Deacon wrote:
> > smp_mb__after_unlock_lock is used to promote an UNLOCK + LOCK sequence
> > into a full memory barrier.
> > 
> > However:
> > 
> >   - This ordering guarantee is already provided without the barrier on
> > all architectures apart from PowerPC
> > 
> >   - The barrier only applies to UNLOCK + LOCK, not general
> > RELEASE + ACQUIRE operations
> > 
> >   - Locks are generally assumed to offer SC ordering semantics, so
> > having this additional barrier is error-prone and complicates the
> > callers of LOCK/UNLOCK primitives
> > 
> >   - The barrier is not well used outside of RCU and, because it was
> > retrofitted into the kernel, it's not clear whether other areas of
> > the kernel are incorrectly relying on UNLOCK + LOCK implying a full
> > barrier
> > 
> > This patch removes the barrier and instead requires architectures to
> > provide full barrier semantics for an UNLOCK + LOCK sequence.
> > 
> > Cc: Benjamin Herrenschmidt 
> > Cc: Paul McKenney 
> > Cc: Peter Zijlstra 
> > Signed-off-by: Will Deacon 
> > ---
> > 
> > This didn't go anywhere last time I posted it, but here it is again.
> > I'd really appreciate some feedback from the PowerPC guys, especially as
> > to whether this change requires them to add an additional barrier in
> > arch_spin_unlock and what the cost of that would be.
> 
> We'd have to turn the lwsync in unlock or the isync in lock into a full
> barrier. As it is, we *almost* have a full barrier semantic, but not
> quite, as in things can get mixed up inside spin_lock between the LL and
> the SC (things leaking in past LL and things leaking "out" up before SC
> and then getting mixed up in there).
> 
> Michael, at some point you were experimenting a bit with that and tried
> to get some perf numbers of the impact that would have, did that
> solidify ? Otherwise, I'll have a look when I'm back next week.

I was mainly experimenting with replacing the lwsync in lock with an isync.

But I think you're talking about making it a full sync in lock.

That was about +7% on p8, +25% on p7 and +88% on p6.

We got stuck deciding whether isync was safe to use as a memory barrier,
because the wording in the arch is a bit vague.

But if we're talking about a full sync then I think there is no question that's
OK and we should just do it.

cheers




Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-07-14 Thread Paul E. McKenney
On Tue, Jul 14, 2015 at 12:31:44PM -0700, Paul E. McKenney wrote:
> On Tue, Jul 14, 2015 at 03:12:16PM +0100, Will Deacon wrote:
> > On Tue, Jul 14, 2015 at 03:00:14PM +0100, Paul E. McKenney wrote:
> > > On Tue, Jul 14, 2015 at 01:51:46PM +0100, Will Deacon wrote:
> > > > On Tue, Jul 14, 2015 at 01:45:40PM +0100, Paul E. McKenney wrote:
> > > > > On Tue, Jul 14, 2015 at 11:04:29AM +0100, Will Deacon wrote:
> > > > > > Given that RCU is currently the only user of this barrier, how would you
> > > > > > feel about making the barrier local to RCU and not part of the general
> > > > > > memory-barrier API?
> > > > > 
> > > > > In theory, no objection.  Your thought is to leave the definitions where
> > > > > they are, mark them as being used only by RCU, and removing mention from
> > > > > memory-barriers.txt?  Or did you have something else in mind?
> > > > 
> > > > Actually, I was thinking of defining them in an RCU header file with an
> > > > #ifdef CONFIG_POWERPC for the smp_mb() version. Then you could have a big
> > > > comment describing the semantics, or put that in an RCU Documentation file
> > > > instead of memory-barriers.txt.
> > > > 
> > > > That *should* then mean we notice anybody else trying to use the barrier,
> > > > because they'd need to send patches to either add something equivalent
> > > > or move the definition out again.
> > > 
> > > My concern with this approach is that someone putting together a new
> > > architecture might miss this.  That said, this approach certainly would
> > > work for the current architectures.
> > 
> > I don't think they're any more likely to miss it than with the current
> > situation where the generic code defines the macro as a NOP unless you
> > explicitly override it.
> 
> Fair enough...

Like this?

Thanx, Paul



commit 695c05d4b9666c50b40a1c022678b5f6e2e3e771
Author: Paul E. McKenney 
Date:   Tue Jul 14 18:35:23 2015 -0700

rcu,locking: Privatize smp_mb__after_unlock_lock()

RCU is the only thing that uses smp_mb__after_unlock_lock(), and is
likely the only thing that ever will use it, so this commit makes this
macro private to RCU.

Signed-off-by: Paul E. McKenney 
Cc: Will Deacon 
Cc: Peter Zijlstra 
Cc: Benjamin Herrenschmidt 
Cc: "linux-a...@vger.kernel.org" 

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 318523872db5..eafa6a53f72c 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1854,16 +1854,10 @@ RELEASE are to the same lock variable, but only from the perspective of
 another CPU not holding that lock.  In short, a ACQUIRE followed by an
 RELEASE may -not- be assumed to be a full memory barrier.
 
-Similarly, the reverse case of a RELEASE followed by an ACQUIRE does not
-imply a full memory barrier.  If it is necessary for a RELEASE-ACQUIRE
-pair to produce a full barrier, the ACQUIRE can be followed by an
-smp_mb__after_unlock_lock() invocation.  This will produce a full barrier
-(including transitivity) if either (a) the RELEASE and the ACQUIRE are
-executed by the same CPU or task, or (b) the RELEASE and ACQUIRE act on
-the same variable.  The smp_mb__after_unlock_lock() primitive is free
-on many architectures.  Without smp_mb__after_unlock_lock(), the CPU's
-execution of the critical sections corresponding to the RELEASE and the
-ACQUIRE can cross, so that:
+Similarly, the reverse case of a RELEASE followed by an ACQUIRE does
+not imply a full memory barrier.  Therefore, the CPU's execution of the
+critical sections corresponding to the RELEASE and the ACQUIRE can cross,
+so that:
 
*A = a;
RELEASE M
@@ -1901,29 +1895,6 @@ the RELEASE would simply complete, thereby avoiding the deadlock.
a sleep-unlock race, but the locking primitive needs to resolve
such races properly in any case.
 
-With smp_mb__after_unlock_lock(), the two critical sections cannot overlap.
-For example, with the following code, the store to *A will always be
-seen by other CPUs before the store to *B:
-
-   *A = a;
-   RELEASE M
-   ACQUIRE N
-   smp_mb__after_unlock_lock();
-   *B = b;
-
-The operations will always occur in one of the following orders:
-
-   STORE *A, RELEASE, ACQUIRE, smp_mb__after_unlock_lock(), STORE *B
-   STORE *A, ACQUIRE, RELEASE, smp_mb__after_unlock_lock(), STORE *B
-   ACQUIRE, STORE *A, RELEASE, smp_mb__after_unlock_lock(), STORE *B
-
-If the RELEASE and ACQUIRE were instead both operating on the same lock
-variable, only the first of these alternatives can occur.  In addition,
-the more strongly ordered systems may rule out some of the above orders.
-But in any case, as noted earlier, the smp_mb__after_unlock_lock()
-ensures that the store to *A will always be seen by other CPUs before
-the store to *B.

Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-07-14 Thread Paul E. McKenney
On Tue, Jul 14, 2015 at 03:12:16PM +0100, Will Deacon wrote:
> On Tue, Jul 14, 2015 at 03:00:14PM +0100, Paul E. McKenney wrote:
> > On Tue, Jul 14, 2015 at 01:51:46PM +0100, Will Deacon wrote:
> > > On Tue, Jul 14, 2015 at 01:45:40PM +0100, Paul E. McKenney wrote:
> > > > On Tue, Jul 14, 2015 at 11:04:29AM +0100, Will Deacon wrote:
> > > > > > Given that RCU is currently the only user of this barrier, how would you
> > > > > feel about making the barrier local to RCU and not part of the general
> > > > > memory-barrier API?
> > > > 
> > > > In theory, no objection.  Your thought is to leave the definitions where
> > > > they are, mark them as being used only by RCU, and removing mention from
> > > > memory-barriers.txt?  Or did you have something else in mind?
> > > 
> > > Actually, I was thinking of defining them in an RCU header file with an
> > > #ifdef CONFIG_POWERPC for the smp_mb() version. Then you could have a big
> > > comment describing the semantics, or put that in an RCU Documentation file
> > > instead of memory-barriers.txt.
> > > 
> > > That *should* then mean we notice anybody else trying to use the barrier,
> > > because they'd need to send patches to either add something equivalent
> > > or move the definition out again.
> > 
> > My concern with this approach is that someone putting together a new
> > architecture might miss this.  That said, this approach certainly would
> > work for the current architectures.
> 
> I don't think they're any more likely to miss it than with the current
> situation where the generic code defines the macro as a NOP unless you
> explicitly override it.

Fair enough...

Thanx, Paul



Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-07-14 Thread Will Deacon
On Tue, Jul 14, 2015 at 03:00:14PM +0100, Paul E. McKenney wrote:
> On Tue, Jul 14, 2015 at 01:51:46PM +0100, Will Deacon wrote:
> > On Tue, Jul 14, 2015 at 01:45:40PM +0100, Paul E. McKenney wrote:
> > > On Tue, Jul 14, 2015 at 11:04:29AM +0100, Will Deacon wrote:
> > > > Given that RCU is currently the only user of this barrier, how would you
> > > > feel about making the barrier local to RCU and not part of the general
> > > > memory-barrier API?
> > > 
> > > In theory, no objection.  Your thought is to leave the definitions where
> > > they are, mark them as being used only by RCU, and removing mention from
> > > memory-barriers.txt?  Or did you have something else in mind?
> > 
> > Actually, I was thinking of defining them in an RCU header file with an
> > #ifdef CONFIG_POWERPC for the smp_mb() version. Then you could have a big
> > comment describing the semantics, or put that in an RCU Documentation file
> > instead of memory-barriers.txt.
> > 
> > That *should* then mean we notice anybody else trying to use the barrier,
> > because they'd need to send patches to either add something equivalent
> > or move the definition out again.
> 
> My concern with this approach is that someone putting together a new
> architecture might miss this.  That said, this approach certainly would
> work for the current architectures.

I don't think they're any more likely to miss it than with the current
situation where the generic code defines the macro as a NOP unless you
explicitly override it.

Will


Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-07-14 Thread Paul E. McKenney
On Tue, Jul 14, 2015 at 01:51:46PM +0100, Will Deacon wrote:
> Hi Paul,
> 
> On Tue, Jul 14, 2015 at 01:45:40PM +0100, Paul E. McKenney wrote:
> > On Tue, Jul 14, 2015 at 11:04:29AM +0100, Will Deacon wrote:
> > > Given that RCU is currently the only user of this barrier, how would you
> > > feel about making the barrier local to RCU and not part of the general
> > > memory-barrier API?
> > 
> > In theory, no objection.  Your thought is to leave the definitions where
> > they are, mark them as being used only by RCU, and removing mention from
> > memory-barriers.txt?  Or did you have something else in mind?
> 
> Actually, I was thinking of defining them in an RCU header file with an
> #ifdef CONFIG_POWERPC for the smb_mb() version. Then you could have a big
> comment describing the semantics, or put that in an RCU Documentation file
> instead of memory-barriers.txt.
> 
> That *should* then mean we notice anybody else trying to use the barrier,
> because they'd need to send patches to either add something equivalent
> or move the definition out again.

My concern with this approach is that someone putting together a new
architecture might miss this.  That said, this approach certainly would
work for the current architectures.

> > > My main reason for proposing its removal is because I don't want to see
> > > it being used (incorrectly) all over the place to order the new RELEASE
> > > and ACQUIRE operations I posted separately, at which point we have to try
> > > fixing up all the callers or retrofitting some semantics. It doesn't help
> > > that memory-barriers.txt lumps things like LOCK and ACQUIRE together,
> > > whereas this barrier is currently only intended to be used in conjunction
> > > with the former.
> > 
> > Heh!  That lumping was considered to be a feature at the time.  ;-)
> 
> Oh, I'm sure it was added with good intentions!

And we all know which road is paved with good intentions!  ;-)

Thanx, Paul



Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-07-14 Thread Will Deacon
Hi Paul,

On Tue, Jul 14, 2015 at 01:45:40PM +0100, Paul E. McKenney wrote:
> On Tue, Jul 14, 2015 at 11:04:29AM +0100, Will Deacon wrote:
> > Given that RCU is currently the only user of this barrier, how would you
> > feel about making the barrier local to RCU and not part of the general
> > memory-barrier API?
> 
> In theory, no objection.  Your thought is to leave the definitions where
> they are, mark them as being used only by RCU, and removing mention from
> memory-barriers.txt?  Or did you have something else in mind?

Actually, I was thinking of defining them in an RCU header file with an
#ifdef CONFIG_POWERPC for the smp_mb() version. Then you could have a big
comment describing the semantics, or put that in an RCU Documentation file
instead of memory-barriers.txt.

That *should* then mean we notice anybody else trying to use the barrier,
because they'd need to send patches to either add something equivalent
or move the definition out again.

> > My main reason for proposing its removal is because I don't want to see
> > it being used (incorrectly) all over the place to order the new RELEASE
> > and ACQUIRE operations I posted separately, at which point we have to try
> > fixing up all the callers or retrofitting some semantics. It doesn't help
> > that memory-barriers.txt lumps things like LOCK and ACQUIRE together,
> > whereas this barrier is currently only intended to be used in conjunction
> > with the former.
> 
> Heh!  That lumping was considered to be a feature at the time.  ;-)

Oh, I'm sure it was added with good intentions!

Will


Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-07-14 Thread Paul E. McKenney
On Tue, Jul 14, 2015 at 11:04:29AM +0100, Will Deacon wrote:
> On Tue, Jul 14, 2015 at 12:04:06AM +0100, Paul E. McKenney wrote:
> > On Tue, Jul 14, 2015 at 12:23:46AM +0200, Peter Zijlstra wrote:
> > > If we look at the inside of the critical section again -- similar
> > > argument as before:
> > > 
> > >   *A = a
> > >   smp_mb()
> > >   store M
> > >   load N
> > >   smp_mb()
> > >   *B = b
> > > 
> > > A and B are fully ordered, and in this case even transitivity is
> > > provided.
> > > 
> > > I'm stating that the order of M and N doesn't matter, only the
> > > load/stores that are inside the acquire/release are constrained.
> > 
> > No argument here.
> > 
> > > IOW, I think smp_mb__after_unlock_lock() already works as advertised
> > > with all our acquire/release thingies -- as is stated by the
> > > documentation.
> > > 
> > That said, I'm not aware of anybody but RCU actually using this, so it's
> > > not used in that capacity.
> > 
> > OK, I might actually understand what you are getting at.  And, yes, if
> > someone actually comes up with a need to combine smp_mb__after_unlock_lock()
> > with something other than locking, we should worry about it at that point.
> > And probably rename smp_mb__after_unlock_lock() at that point, as well.
> > Until then, why lock ourselves into semantics that no one needs, and
> > that it is quite possible that no one will ever need?
> 
> Given that RCU is currently the only user of this barrier, how would you
> feel about making the barrier local to RCU and not part of the general
> memory-barrier API?

In theory, no objection.  Your thought is to leave the definitions where
they are, mark them as being used only by RCU, and removing mention from
memory-barriers.txt?  Or did you have something else in mind?

> My main reason for proposing its removal is because I don't want to see
> it being used (incorrectly) all over the place to order the new RELEASE
> and ACQUIRE operations I posted separately, at which point we have to try
> fixing up all the callers or retrofitting some semantics. It doesn't help
> that memory-barriers.txt lumps things like LOCK and ACQUIRE together,
> whereas this barrier is currently only intended to be used in conjunction
> with the former.

Heh!  That lumping was considered to be a feature at the time.  ;-)

Thanx, Paul
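
[Sketch of the point Peter is appealing to above: on an architecture whose
acquire/release operations fall back to full barriers, the definitions look
roughly like this (illustrative names, not the kernel's actual generic
fallbacks):]

	#define sketch_store_release(p, v) \
		do { smp_mb(); WRITE_ONCE(*(p), (v)); } while (0)

	#define sketch_load_acquire(p) \
		({ typeof(*(p)) ___v = READ_ONCE(*(p)); smp_mb(); ___v; })

With smp_mb() before the releasing store and after the acquiring load, the
accesses surrounding an UNLOCK+LOCK pair are fully ordered, which is why
such architectures get the smp_mb__after_unlock_lock() guarantee for free.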



Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-07-14 Thread Will Deacon
On Mon, Jul 13, 2015 at 11:31:29PM +0100, Benjamin Herrenschmidt wrote:
> On Mon, 2015-07-13 at 13:15 +0100, Will Deacon wrote:
> > This didn't go anywhere last time I posted it, but here it is again.
> > I'd really appreciate some feedback from the PowerPC guys, especially as
> > to whether this change requires them to add an additional barrier in
> > arch_spin_unlock and what the cost of that would be.
> 
> We'd have to turn the lwsync in unlock or the isync in lock into a full
> barrier. As it is, we *almost* have a full barrier semantic, but not
> quite, as in things can get mixed up inside spin_lock between the LL and
> the SC (things leaking in past LL and things leaking "out" up before SC
> and then getting mixed up in there).

Thanks, Ben.

> Michael, at some point you were experimenting a bit with that and tried
> to get some perf numbers of the impact that would have, did that
> solidify ? Otherwise, I'll have a look when I'm back next week.

These numbers would be really interesting...

Will


Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-07-14 Thread Will Deacon
On Tue, Jul 14, 2015 at 12:04:06AM +0100, Paul E. McKenney wrote:
> On Tue, Jul 14, 2015 at 12:23:46AM +0200, Peter Zijlstra wrote:
> > If we look at the inside of the critical section again -- similar
> > argument as before:
> > 
> > *A = a
> > smp_mb()
> > store M
> > load N
> > smp_mb()
> > *B = b
> > 
> > A and B are fully ordered, and in this case even transitivity is
> > provided.
> > 
> > I'm stating that the order of M and N doesn't matter, only the
> > load/stores that are inside the acquire/release are constrained.
> 
> No argument here.
> 
> > IOW, I think smp_mb__after_unlock_lock() already works as advertised
> > with all our acquire/release thingies -- as is stated by the
> > documentation.
> > 
> > That said, I'm not aware of anybody but RCU actually using this, so it's
> > not used in that capacity.
> 
> OK, I might actually understand what you are getting at.  And, yes, if
> someone actually comes up with a need to combine smp_mb__after_unlock_lock()
> with something other than locking, we should worry about it at that point.
> And probably rename smp_mb__after_unlock_lock() at that point, as well.
> Until then, why lock ourselves into semantics that no one needs, and
> that it is quite possible that no one will ever need?

Given that RCU is currently the only user of this barrier, how would you
feel about making the barrier local to RCU and not part of the general
memory-barrier API?

My main reason for proposing its removal is because I don't want to see
it being used (incorrectly) all over the place to order the new RELEASE
and ACQUIRE operations I posted separately, at which point we have to try
fixing up all the callers or retrofitting some semantics. It doesn't help
that memory-barriers.txt lumps things like LOCK and ACQUIRE together,
whereas this barrier is currently only intended to be used in conjunction
with the former.

Will
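
[Made concrete (hypothetical code, not from the kernel) -- the pattern Will
wants to rule out, in which the barrier follows one of the new ACQUIRE
operations rather than a real UNLOCK+LOCK:]

	r = smp_load_acquire(&flag);	/* ACQUIRE, but not a LOCK */
	smp_mb__after_unlock_lock();	/* contract covers UNLOCK+LOCK only */
	WRITE_ONCE(shared, 1);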


Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-07-14 Thread Peter Zijlstra
On Tue, Jul 14, 2015 at 08:43:44AM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2015-07-14 at 00:15 +0200, Peter Zijlstra wrote:
> 
> > > This is instead the sequence that is of concern:
> > > 
> > >   store a
> > >   unlock M
> > >   lock N
> > >   load b
> > 
> > So it's late and that table didn't parse, but that should be ordered too.
> > The load of b should not be able to escape the lock N.
> > 
> > If only because LWSYNC is a valid RMB and any LOCK implementation must
> > load the lock state to observe it unlocked.
> 
> What happens is that the load passes the store conditional, though it
> doesn't pass the load with reserve. However, both store A and unlock M
> being just stores with an lwsync, can pass a load, so they can pass the
> load with reserve. And thus inside the LL/SC loop, our store A has
> passed our load B.

Ah cute.. Thanks, clearly I wasn't awake enough anymore :-)
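
[Ben's reordering, spelled out (illustrative powerpc-flavoured pseudo-code;
lwarx is the load-with-reserve, stwcx. the store-conditional):]

	store  a		<- ordered only by the lwsync below
	lwsync
	store  M = 0		   (unlock M)
    1:	lwarx  r, N		<- lock N: load-with-reserve
	stwcx. N = 1
	bne-   1b
	isync
	load   b

The load of b may execute as early as just after the lwarx (it cannot pass
the lwarx, but it can pass the stwcx.), while "store a", being a store
ordered only by lwsync, may sink below the lwarx.  In that window, inside
the LL/SC loop, the store to a takes effect after the load of b.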


Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-07-13 Thread Paul E. McKenney
On Tue, Jul 14, 2015 at 12:23:46AM +0200, Peter Zijlstra wrote:
> On Mon, Jul 13, 2015 at 01:20:32PM -0700, Paul E. McKenney wrote:
> > On Mon, Jul 13, 2015 at 06:50:29PM +0100, Will Deacon wrote:
> 
> > > So if I'm following along, smp_mb__after_unlock_lock *does* provide
> > > transitivity when used with UNLOCK + LOCK, which is stronger than your
> > > example here.
> > 
> > Yes, that is indeed the intent.
> 
> Maybe good to state this explicitly somewhere.

Fair enough!  Please see patch below.

> > > I don't think we want to make the same guarantee for general RELEASE +
> > > ACQUIRE, because we'd end up forcing most architectures to implement the
> > > expensive macro for a case that currently has no users.
> > 
> > Agreed, smp_mb__after_unlock_lock() makes a limited guarantee.
> 
> I'm still not seeing how the archs that implement load_acquire and
> store_release with smp_mb() are a problem.

I don't know that I ever claimed such architectures to be a problem.
Color me confused.

> If we look at the inside of the critical section again -- similar
> argument as before:
> 
>   *A = a
>   smp_mb()
>   store M
>   load N
>   smp_mb()
>   *B = b
> 
> A and B are fully ordered, and in this case even transitivity is
> provided.
> 
> I'm stating that the order of M and N doesn't matter, only the
> load/stores that are inside the acquire/release are constrained.

No argument here.

> IOW, I think smp_mb__after_unlock_lock() already works as advertised
> with all our acquire/release thingies -- as is stated by the
> documentation.
> 
> That said, I'm not aware of anybody but RCU actually using this, so it's
> not used in that capacity.

OK, I might actually understand what you are getting at.  And, yes, if
someone actually comes up with a need to combine smp_mb__after_unlock_lock()
with something other than locking, we should worry about it at that point.
And probably rename smp_mb__after_unlock_lock() at that point, as well.
Until then, why lock ourselves into semantics that no one needs, and
that it is quite possible that no one will ever need?

> > > In which case, it boils down to the question of how expensive it would
> > > be to implement an SC UNLOCK operation on PowerPC and whether that justifies
> > > the existence of a complicated barrier macro that isn't used outside of
> > > RCU.
> > 
> > Given that it is either smp_mb() or nothing, I am not seeing the
> > "complicated" part...
> 
> The 'complicated' part is that we need to think about it; that is we need
> to realize and remember that UNLOCK+LOCK is a load-store barrier but
> fails to provide transitivity.

...  unless you are holding the lock.  So in the common case, you do
get transitivity.
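
[The "common case" spelled out (illustrative) -- every observer holds one of
the locks in the chain, so the ordering comes along for free:]

	CPU 0			CPU 1			CPU 2
	spin_lock(&m);		spin_lock(&m);		spin_lock(&n);
	WRITE_ONCE(a, 1);	r1 = READ_ONCE(a);	r2 = READ_ONCE(b);
	spin_unlock(&m);	spin_unlock(&m);	r3 = READ_ONCE(a);
				spin_lock(&n);		spin_unlock(&n);
				WRITE_ONCE(b, 1);
				spin_unlock(&n);

If r1 == 1 and r2 == 1, then r3 == 1 is guaranteed without any extra
barrier.  smp_mb__after_unlock_lock() only earns its keep for observers
holding neither lock, which is exactly RCU's situation.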

Thanx, Paul



commit bae5cf1e2973bb1e8f852abb54f8b1948ffd82e4
Author: Paul E. McKenney 
Date:   Mon Jul 13 15:55:52 2015 -0700

doc: Call out smp_mb__after_unlock_lock() transitivity

Although "full barrier" should be interpreted as providing transitivity,
it is worth eliminating any possible confusion.  This commit therefore
adds "(including transitivity)" to eliminate any possible confusion.

Reported-by: Peter Zijlstra 
Signed-off-by: Paul E. McKenney 

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 470c07c868e4..318523872db5 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1858,11 +1858,12 @@ Similarly, the reverse case of a RELEASE followed by an ACQUIRE does not
 imply a full memory barrier.  If it is necessary for a RELEASE-ACQUIRE
 pair to produce a full barrier, the ACQUIRE can be followed by an
 smp_mb__after_unlock_lock() invocation.  This will produce a full barrier
-if either (a) the RELEASE and the ACQUIRE are executed by the same
-CPU or task, or (b) the RELEASE and ACQUIRE act on the same variable.
-The smp_mb__after_unlock_lock() primitive is free on many architectures.
-Without smp_mb__after_unlock_lock(), the CPU's execution of the critical
-sections corresponding to the RELEASE and the ACQUIRE can cross, so that:
+(including transitivity) if either (a) the RELEASE and the ACQUIRE are
+executed by the same CPU or task, or (b) the RELEASE and ACQUIRE act on
+the same variable.  The smp_mb__after_unlock_lock() primitive is free
+on many architectures.  Without smp_mb__after_unlock_lock(), the CPU's
+execution of the critical sections corresponding to the RELEASE and the
+ACQUIRE can cross, so that:
 
*A = a;
RELEASE M



Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-07-13 Thread Paul E. McKenney
On Tue, Jul 14, 2015 at 12:15:03AM +0200, Peter Zijlstra wrote:
> On Mon, Jul 13, 2015 at 01:16:42PM -0700, Paul E. McKenney wrote:
> > On Mon, Jul 13, 2015 at 03:41:53PM -0400, Peter Hurley wrote:
> > > > Does that answer the question, or am I missing the point?
> > > 
> > > Yes, it shows that smp_mb__after_unlock_lock() has no purpose, since it
> > > is defined only for PowerPC and your test above just showed that for
> > > the sequence
> 
> The only purpose is to provide transitivity, but the documentation fails
> to explicitly call that out.

It does say that it is a full barrier, but I added explicit mention of
transitivity.

> > > 
> > >   store a
> > >   UNLOCK M
> > >   LOCK N
> > >   store b
> > > 
> > > a and b are always observed as an ordered pair {a,b}.
> > 
> > Not quite.
> > 
> > This is instead the sequence that is of concern:
> > 
> > store a
> > unlock M
> > lock N
> > load b
> 
> So it's late and that table didn't parse, but that should be ordered too.
> The load of b should not be able to escape the lock N.
> 
> If only because LWSYNC is a valid RMB and any LOCK implementation must
> load the lock state to observe it unlocked.

If you actually hold a given lock, then yes, you will observe anything
previously done while holding that same lock, even if you don't use
smp_mb__after_unlock_lock().  The smp_mb__after_unlock_lock() comes into
play when code not holding a lock needs to see the ordering.  RCU needs
this because of the strong ordering that grace periods must provide:
regardless of who started or ended the grace period, anything on any
CPU preceding a given grace period is fully ordered before anything on
any CPU following that same grace period.  It is not clear to me that
anything else would need such strong ordering.
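For reference, this is roughly how RCU uses it (a sketch in the style
of the kernel/rcu/tree.c callers, abbreviated rather than quoted
exactly):

	static void rcu_gp_sketch(struct rcu_node *rnp)
	{
		unsigned long flags;

		raw_spin_lock_irqsave(&rnp->lock, flags);
		smp_mb__after_unlock_lock();	/* with the preceding unlock of
						 * this (or another) rnp->lock,
						 * a full, transitive barrier */
		/* ... grace-period state updates ... */
		raw_spin_unlock_irqrestore(&rnp->lock, flags);
	}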

> > > Additionally, the assertion in Documentation/memory_barriers.txt that
> > > the sequence above can be reordered as
> > > 
> > >   LOCK N
> > >   store b
> > >   store a
> > >   UNLOCK M
> > > 
> > > is not true on any existing arch in Linux.
> > 
> > It was at one time and might be again.
> 
> What would be required to make this true? I'm having a hard time seeing
> how things can get reordered like that.

You are right, I failed to merge current and past knowledge.  At one time,
Itanium was said to allow things to bleed into lock-based critical sections.
However, we now know that ld,acq and st,rel really do full ordering.

Compilers might one day do this sort of reordering, but I would guess
that Linux kernel builds would disable this sort of thing.  Something
about wanting critical sections to remain small.

Thanx, Paul



Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-07-13 Thread Benjamin Herrenschmidt
On Tue, 2015-07-14 at 00:15 +0200, Peter Zijlstra wrote:

> > This is instead the sequence that is of concern:
> > 
> > store a
> > unlock M
> > lock N
> > load b
> 
> So it's late and that table didn't parse, but that should be ordered too.
> The load of b should not be able to escape the lock N.
> 
> If only because LWSYNC is a valid RMB and any LOCK implementation must
> load the lock state to observe it unlocked.

What happens is that the load passes the store conditional, though it
doesn't pass the load with reserve. However, both store A and unlock M,
being just stores with an lwsync, can pass a load, so they can pass the
load with reserve. And thus inside the LL/SC loop, our store A has
passed our load B.
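To illustrate, a simplified sketch of the lock sequence (C with inline
asm, illustrative only, not the exact arch/powerpc source):

	static inline void ppc_spin_lock_sketch(unsigned int *lock)
	{
		unsigned int tmp;

		__asm__ __volatile__(
	"1:	lwarx	%0,0,%1\n"	/* load with reserve: our load B
					 * cannot pass this */
	"	cmpwi	0,%0,0\n"
	"	bne-	1b\n"		/* spin while the lock is held */
	"	stwcx.	%2,0,%1\n"	/* store conditional: store A and the
					 * unlock of M can drift down past
					 * the lwarx into this window */
	"	bne-	1b\n"
	"	isync\n"		/* nothing later executes until the
					 * stwcx. outcome is decided */
		: "=&r" (tmp)
		: "r" (lock), "r" (1)
		: "cr0", "memory");
	}

The window between the lwarx and the stwcx. is where our store A ends
up ahead of our load B.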

> > > Additionally, the assertion in Documentation/memory_barriers.txt that
> > > the sequence above can be reordered as
> > > 
> > >   LOCK N
> > >   store b
> > >   store a
> > >   UNLOCK M
> > > 
> > > is not true on any existing arch in Linux.
> > 
> > It was at one time and might be again.
> 
> What would be required to make this true? I'm having a hard time seeing
> how things can get reordered like that.




Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-07-13 Thread Benjamin Herrenschmidt
On Mon, 2015-07-13 at 17:54 +0200, Peter Zijlstra wrote:

> That said, I don't think this could even happen on PPC because we have
> load_acquire and store_release, this means that:
> 
>   *A = a
>   lwsync
>   store_release M
>   load_acquire N
>   lwsync
>   *B = b
> 
> And since the store to M is wrapped inside two lwsync there must be
> strong store order, and because the load from N is equally wrapped in
> two lwsyncs there must also be strong load order.
> 
> In fact, no store/load can cross from before the first lwsync to after
> the latter and the other way around.
> 
> So in that respect it does provide full load-store ordering. What it
> does not provide is order for M and N, nor does it provide transitivity,
> but looking at our documentation I'm not at all sure we guarantee that
> in any case.

The problem is that if you have a load after the second lwsync, that load
can go up past the store release. This has caused issues when N or M is
what you are trying to order against. That's why we had to add a sync to
spin_is_locked or similar.
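The fix looks roughly like this (a sketch modeled on the powerpc change
I'm referring to, assuming the usual lock word layout, not verbatim
kernel source):

	static inline int ppc_spin_is_locked_sketch(arch_spinlock_t *lock)
	{
		smp_mb();	/* full barrier: keeps the load below from
				 * rising above an earlier store release */
		return READ_ONCE(lock->slock) != 0;
	}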

Ben.





Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-07-13 Thread Benjamin Herrenschmidt
On Mon, 2015-07-13 at 13:15 +0100, Will Deacon wrote:
> smp_mb__after_unlock_lock is used to promote an UNLOCK + LOCK sequence
> into a full memory barrier.
> 
> However:
> 
>   - This ordering guarantee is already provided without the barrier on
> all architectures apart from PowerPC
> 
>   - The barrier only applies to UNLOCK + LOCK, not general
> RELEASE + ACQUIRE operations
> 
>   - Locks are generally assumed to offer SC ordering semantics, so
> having this additional barrier is error-prone and complicates the
> callers of LOCK/UNLOCK primitives
> 
>   - The barrier is not well used outside of RCU and, because it was
> retrofitted into the kernel, it's not clear whether other areas of
> the kernel are incorrectly relying on UNLOCK + LOCK implying a full
> barrier
> 
> This patch removes the barrier and instead requires architectures to
> provide full barrier semantics for an UNLOCK + LOCK sequence.
> 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul McKenney 
> Cc: Peter Zijlstra 
> Signed-off-by: Will Deacon 
> ---
> 
> This didn't go anywhere last time I posted it, but here it is again.
> I'd really appreciate some feedback from the PowerPC guys, especially as
> to whether this change requires them to add an additional barrier in
> arch_spin_unlock and what the cost of that would be.

We'd have to turn the lwsync in unlock or the isync in lock into a full
barrier. As it is, we *almost* have full barrier semantics, but not
quite: things can get mixed up inside spin_lock between the LL and the
SC (later accesses leaking in past the LL, and earlier accesses leaking
"out" to before the SC, so the two get mixed up in that window).
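Concretely, the strengthened unlock would look something like this
(sketch only, assuming the usual powerpc lock word layout):

	static inline void ppc_spin_unlock_sketch(arch_spinlock_t *lock)
	{
		/* today this is lwsync (a release barrier); making
		 * UNLOCK + LOCK a full barrier means promoting it */
		__asm__ __volatile__("sync" : : : "memory");
		lock->slock = 0;
	}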

Michael, at some point you were experimenting a bit with that and trying
to get some perf numbers for the impact that would have. Did that
solidify? Otherwise, I'll have a look when I'm back next week.

Ben.

>  Documentation/memory-barriers.txt  | 77 ++---
>  arch/powerpc/include/asm/spinlock.h |  2 -
>  include/linux/spinlock.h            | 10 -
>  kernel/locking/mcs_spinlock.h       |  9 -
>  kernel/rcu/tree.c                   | 21 +-
>  kernel/rcu/tree_plugin.h            | 11 --
>  6 files changed, 4 insertions(+), 126 deletions(-)
> 
> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
> index 13feb697271f..fff21b632893 100644
> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -1848,74 +1848,9 @@ RELEASE are to the same lock variable, but only from the perspective of
>  another CPU not holding that lock.  In short, a ACQUIRE followed by an
>  RELEASE may -not- be assumed to be a full memory barrier.
>  
> -Similarly, the reverse case of a RELEASE followed by an ACQUIRE does not
> -imply a full memory barrier.  If it is necessary for a RELEASE-ACQUIRE
> -pair to produce a full barrier, the ACQUIRE can be followed by an
> -smp_mb__after_unlock_lock() invocation.  This will produce a full barrier
> -if either (a) the RELEASE and the ACQUIRE are executed by the same
> -CPU or task, or (b) the RELEASE and ACQUIRE act on the same variable.
> -The smp_mb__after_unlock_lock() primitive is free on many architectures.
> -Without smp_mb__after_unlock_lock(), the CPU's execution of the critical
> -sections corresponding to the RELEASE and the ACQUIRE can cross, so that:
> -
> - *A = a;
> - RELEASE M
> - ACQUIRE N
> - *B = b;
> -
> -could occur as:
> -
> - ACQUIRE N, STORE *B, STORE *A, RELEASE M
> -
> -It might appear that this reordering could introduce a deadlock.
> -However, this cannot happen because if such a deadlock threatened,
> -the RELEASE would simply complete, thereby avoiding the deadlock.
> -
> - Why does this work?
> -
> - One key point is that we are only talking about the CPU doing
> - the reordering, not the compiler.  If the compiler (or, for
> - that matter, the developer) switched the operations, deadlock
> - -could- occur.
> -
> - But suppose the CPU reordered the operations.  In this case,
> - the unlock precedes the lock in the assembly code.  The CPU
> - simply elected to try executing the later lock operation first.
> - If there is a deadlock, this lock operation will simply spin (or
> - try to sleep, but more on that later).  The CPU will eventually
> - execute the unlock operation (which preceded the lock operation
> - in the assembly code), which will unravel the potential deadlock,
> - allowing the lock operation to succeed.
> -
> - But what if the lock is a sleeplock?  In that case, the code will
> - try to enter the scheduler, where it will eventually encounter
> - a memory barrier, which will force the earlier unlock operation
> - to complete, again unraveling the deadlock.  There might be
> - a sleep-unlock race, but the locking primitive needs to resolve
> - such races properly in any case.
> -
> -With 

Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-07-13 Thread Peter Zijlstra
On Mon, Jul 13, 2015 at 01:20:32PM -0700, Paul E. McKenney wrote:
> On Mon, Jul 13, 2015 at 06:50:29PM +0100, Will Deacon wrote:

> > So if I'm following along, smp_mb__after_unlock_lock *does* provide
> > transitivity when used with UNLOCK + LOCK, which is stronger than your
> > example here.
> 
> Yes, that is indeed the intent.

Maybe good to state this explicitly somewhere.

> > I don't think we want to make the same guarantee for general RELEASE +
> > ACQUIRE, because we'd end up forcing most architectures to implement the
> > expensive macro for a case that currently has no users.
> 
> Agreed, smp_mb__after_unlock_lock() makes a limited guarantee.

I'm still not seeing how the archs that implement load_acquire and
store_release with smp_mb() are a problem.

If we look at the inside of the critical section again -- similar
argument as before:

*A = a
smp_mb()
store M
load N
smp_mb()
*B = b

A and B are fully ordered, and in this case even transitivity is
provided.

I'm stating that the order of M and N doesn't matter; only the
loads/stores that are inside the acquire/release are constrained.
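Concretely, on such an arch the helpers would amount to something like
this (hypothetical names, sketch only):

	static inline void arch_store_release_sketch(int *p, int v)
	{
		smp_mb();	/* orders *A = a before the store to M */
		WRITE_ONCE(*p, v);
	}

	static inline int arch_load_acquire_sketch(int *p)
	{
		int v = READ_ONCE(*p);

		smp_mb();	/* orders the load of N before *B = b */
		return v;
	}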

IOW, I think smp_mb__after_unlock_lock() already works as advertised
with all our acquire/release thingies -- as is stated by the
documentation.

That said, I'm not aware of anybody but RCU actually using this, so it's
not used in that capacity.

> > In which case, it boils down to the question of how expensive it would
> > be to implement an SC UNLOCK operation on PowerPC and whether that justifies
> > the existence of a complicated barrier macro that isn't used outside of
> > RCU.
> 
> Given that it is either smp_mb() or nothing, I am not seeing the
> "complicated" part...

The 'complicated' part is that we need to think about it; that is, we
need to realize and remember that UNLOCK+LOCK is a load-store barrier
but fails to provide transitivity.


Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-07-13 Thread Peter Zijlstra
On Mon, Jul 13, 2015 at 01:16:42PM -0700, Paul E. McKenney wrote:
> On Mon, Jul 13, 2015 at 03:41:53PM -0400, Peter Hurley wrote:
> > > Does that answer the question, or am I missing the point?
> > 
> > Yes, it shows that smp_mb__after_unlock_lock() has no purpose, since it
> > is defined only for PowerPC and your test above just showed that for
> > the sequence

The only purpose is to provide transitivity, but the documentation fails
to explicitly call that out.

> > 
> >   store a
> >   UNLOCK M
> >   LOCK N
> >   store b
> > 
> > a and b are always observed as an ordered pair {a,b}.
> 
> Not quite.
> 
> This is instead the sequence that is of concern:
> 
>   store a
>   unlock M
>   lock N
>   load b

So it's late and that table didn't parse, but that should be ordered too.
The load of b should not be able to escape the lock N.

If only because LWSYNC is a valid RMB and any LOCK implementation must
load the lock state to observe it unlocked.

> > Additionally, the assertion in Documentation/memory_barriers.txt that
> > the sequence above can be reordered as
> > 
> >   LOCK N
> >   store b
> >   store a
> >   UNLOCK M
> > 
> > is not true on any existing arch in Linux.
> 
> It was at one time and might be again.

What would be required to make this true? I'm having a hard time seeing
how things can get reordered like that.


Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

2015-07-13 Thread Paul E. McKenney
On Mon, Jul 13, 2015 at 06:50:29PM +0100, Will Deacon wrote:
> On Mon, Jul 13, 2015 at 04:54:47PM +0100, Peter Zijlstra wrote:
> > However I think we should look at the insides of the critical sections;
> > for example (from Documentation/memory-barriers.txt):
> > 
> > "   *A = a;
> > RELEASE M
> > ACQUIRE N
> > *B = b;
> > 
> > could occur as:
> > 
> > ACQUIRE N, STORE *B, STORE *A, RELEASE M"
> > 
> > This could not in fact happen, even though we could flip M and N, A and
> > B will remain strongly ordered.
> > 
> > That said, I don't think this could even happen on PPC because we have
> > load_acquire and store_release, this means that:
> > 
> > *A = a
> > lwsync
> > store_release M
> > load_acquire N
> > lwsync
> > *B = b
> > 
> > And since the store to M is wrapped inside two lwsync there must be
> > strong store order, and because the load from N is equally wrapped in
> > two lwsyncs there must also be strong load order.
> > 
> > In fact, no store/load can cross from before the first lwsync to after
> > the latter and the other way around.
> > 
> > So in that respect it does provide full load-store ordering. What it
> > does not provide is order for M and N, nor does it provide transitivity,
> > but looking at our documentation I'm not at all sure we guarantee that
> > in any case.
> 
> So if I'm following along, smp_mb__after_unlock_lock *does* provide
> transitivity when used with UNLOCK + LOCK, which is stronger than your
> example here.

Yes, that is indeed the intent.

> I don't think we want to make the same guarantee for general RELEASE +
> ACQUIRE, because we'd end up forcing most architectures to implement the
> expensive macro for a case that currently has no users.

Agreed, smp_mb__after_unlock_lock() makes a limited guarantee.

> In which case, it boils down to the question of how expensive it would
> be to implement an SC UNLOCK operation on PowerPC and whether that justifies
> the existence of a complicated barrier macro that isn't used outside of
> RCU.

Given that it is either smp_mb() or nothing, I am not seeing the
"complicated" part...

Thanx, Paul


