Re: [RFC 3/5] powerpc: atomic: implement atomic{,64}_{add,sub}_return_* variants

2015-09-14 Thread Paul E. McKenney
On Mon, Sep 14, 2015 at 04:38:48PM +0100, Will Deacon wrote:
> On Mon, Sep 14, 2015 at 01:11:56PM +0100, Peter Zijlstra wrote:
> > On Mon, Sep 14, 2015 at 02:01:53PM +0200, Peter Zijlstra wrote:
> > > The scenario is:
> > > 
> > > 	CPU0				CPU1
> > > 
> > > 				unlock(x)
> > > 				  smp_store_release(&x->lock, 0);
> > > 
> > > 	unlock(y)
> > > 	  smp_store_release(&next->lock, 1); /* next == &y */
> > > 
> > > 				lock(y)
> > > 				  while (!(smp_load_acquire(&y->lock)))
> > > 					cpu_relax();
> > > 
> > > 
> > > Where the lock does _NOT_ issue a store to acquire the lock at all. Now
> > > I don't think any of our current primitives manage this, so we should be
> > > good, but it might just be possible.
> > 
> > So with a bit more thought this seems fundamentally impossible, you
> > always need some stores in a lock() implementation, the above for
> > instance needs to queue itself, otherwise CPU0 will not be able to find
> > it etc..
> 
> Which brings us back round to separating LOCK/UNLOCK from ACQUIRE/RELEASE.

I believe that we do need to do this, unless we decide to have unlock-lock
continue to imply only acquire and release, rather than full ordering.
I believe that Mike Ellerman is working up additional benchmarking
on this.

Thanx, Paul

> If we say that UNLOCK(foo) -> LOCK(bar) is ordered but RELEASE(baz) ->
> ACQUIRE(boz) is only ordered by smp_mb__release_acquire(), then I think
> we're in a position where we can at least build arbitrary locks portably
> out of ACQUIRE/RELEASE operations, even though I don't see any users of
> that macro in the imminent future.
> 
> I'll have a crack at some documentation.
> 
> Will
> 



Re: [RFC 3/5] powerpc: atomic: implement atomic{,64}_{add,sub}_return_* variants

2015-09-14 Thread Will Deacon
On Mon, Sep 14, 2015 at 01:11:56PM +0100, Peter Zijlstra wrote:
> On Mon, Sep 14, 2015 at 02:01:53PM +0200, Peter Zijlstra wrote:
> > The scenario is:
> > 
> > 	CPU0				CPU1
> > 
> > 				unlock(x)
> > 				  smp_store_release(&x->lock, 0);
> > 
> > 	unlock(y)
> > 	  smp_store_release(&next->lock, 1); /* next == &y */
> > 
> > 				lock(y)
> > 				  while (!(smp_load_acquire(&y->lock)))
> > 					cpu_relax();
> > 
> > 
> > Where the lock does _NOT_ issue a store to acquire the lock at all. Now
> > I don't think any of our current primitives manage this, so we should be
> > good, but it might just be possible.
> 
> So with a bit more thought this seems fundamentally impossible, you
> always need some stores in a lock() implementation, the above for
> instance needs to queue itself, otherwise CPU0 will not be able to find
> it etc..

Which brings us back round to separating LOCK/UNLOCK from ACQUIRE/RELEASE.

If we say that UNLOCK(foo) -> LOCK(bar) is ordered but RELEASE(baz) ->
ACQUIRE(boz) is only ordered by smp_mb__release_acquire(), then I think
we're in a position where we can at least build arbitrary locks portably
out of ACQUIRE/RELEASE operations, even though I don't see any users of
that macro in the imminent future.
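
For illustration only (not part of Will's mail), here is a minimal sketch of
the kind of lock that could be built portably out of ACQUIRE/RELEASE, using
the cmpxchg_acquire()/smp_store_release() style primitives this series is
about; smp_mb__release_acquire() below is the macro *proposed* in this
thread, not an existing kernel API:

	struct toy_lock {
		int val;		/* 0 == unlocked, 1 == locked */
	};

	static inline void toy_lock(struct toy_lock *l)
	{
		/* cmpxchg_acquire() gives ACQUIRE semantics on success */
		while (cmpxchg_acquire(&l->val, 0, 1) != 0)
			cpu_relax();
	}

	static inline void toy_unlock(struct toy_lock *l)
	{
		smp_store_release(&l->val, 0);	/* RELEASE */
	}

	/*
	 * Under the proposal, a native UNLOCK(foo) -> LOCK(bar) pair would be
	 * fully ordered by the lock implementation itself, whereas
	 *
	 *	toy_unlock(&foo);
	 *	toy_lock(&bar);
	 *
	 * would only be guaranteed fully ordered if the caller inserted
	 * smp_mb__release_acquire() between the two calls.
	 */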

I'll have a crack at some documentation.

Will


Re: [RFC 3/5] powerpc: atomic: implement atomic{,64}_{add,sub}_return_* variants

2015-09-14 Thread Peter Zijlstra
On Mon, Sep 14, 2015 at 02:01:53PM +0200, Peter Zijlstra wrote:
> The scenario is:
> 
> 	CPU0				CPU1
> 
> 				unlock(x)
> 				  smp_store_release(&x->lock, 0);
> 
> 	unlock(y)
> 	  smp_store_release(&next->lock, 1); /* next == &y */
> 
> 				lock(y)
> 				  while (!(smp_load_acquire(&y->lock)))
> 					cpu_relax();
> 
> 
> Where the lock does _NOT_ issue a store to acquire the lock at all. Now
> I don't think any of our current primitives manage this, so we should be
> good, but it might just be possible.

So with a bit more thought this seems fundamentally impossible, you
always need some stores in a lock() implementation, the above for
instance needs to queue itself, otherwise CPU0 will not be able to find
it etc..
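
For illustration only (not part of Peter's mail), a rough MCS-style sketch of
why lock() cannot avoid stores: the acquirer has to publish its queue node so
that the current owner (CPU0 in the scenario above) can find it and perform
the hand-off store. Names and details here are invented for the example:

	struct mcs_node {
		struct mcs_node *next;
		int locked;
	};

	void mcs_lock(struct mcs_node **tail, struct mcs_node *me)
	{
		struct mcs_node *prev;

		me->next = NULL;
		me->locked = 0;

		prev = xchg(tail, me);			/* a store: queue ourselves */
		if (prev) {
			WRITE_ONCE(prev->next, me);	/* another store, so the owner can find us */
			while (!smp_load_acquire(&me->locked))
				cpu_relax();
		}
	}

	void mcs_unlock(struct mcs_node **tail, struct mcs_node *me)
	{
		struct mcs_node *next = READ_ONCE(me->next);

		if (!next) {
			if (cmpxchg_release(tail, me, NULL) == me)
				return;			/* nobody queued behind us */
			while (!(next = READ_ONCE(me->next)))
				cpu_relax();
		}
		smp_store_release(&next->locked, 1);	/* the hand-off store from the scenario */
	}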


Re: [RFC 3/5] powerpc: atomic: implement atomic{,64}_{add,sub}_return_* variants

2015-09-14 Thread Peter Zijlstra
On Mon, Sep 14, 2015 at 01:35:20PM +0200, Peter Zijlstra wrote:
> 
> Sorry for being tardy, I had a wee spell of feeling horrible and then I
> procrastinated longer than I should have.
> 
> On Fri, Sep 11, 2015 at 01:45:07PM +0100, Will Deacon wrote:
> 
> > Peter, any thoughts? I'm not au fait with the x86 memory model, but what
> > Paul's saying is worrying.
> 
> Right, so Paul is right -- and I completely forgot (I used to know about
> that).
> 
> So all the TSO archs (SPARC-TSO, x86 (!OOSTORE) and s390) can do
> smp_load_acquire()/smp_store_release() with just barrier(), and while:
> 
>   smp_store_release(&x);
>   smp_load_acquire(&x);
> 
> will provide full order by means of the address dependency,
> 
>   smp_store_release(&x);
>   smp_load_acquire(&y);
> 
> will not. Because the one reorder TSO allows is exactly that one.
> 
> > Peter -- if the above reordering can happen on x86, then moving away
> > from RCpc is going to be less popular than I hoped...
> 
> Sadly yes.. We could of course try and split LOCK from ACQUIRE again,
> but I'm not sure that's going to help anything except confusion.

This of course also means we need something like:

smp_mb__release_acquire()

which cannot be a no-op for TSO archs. And it might even mean it needs
to be the same as smp_mb__unlock_lock(), but I need to think more on
this.
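
For concreteness (editorial sketch; smp_mb__release_acquire() is only being
proposed in this thread and does not exist in the kernel), the barrier would
sit between the RELEASE and the subsequent ACQUIRE, and on a TSO arch it
would presumably have to expand to a full smp_mb(), since store->load
ordering is exactly what TSO does not give for free:

	smp_store_release(&x, 1);
	smp_mb__release_acquire();	/* could be a no-op where RELEASE/ACQUIRE
					 * already give full order; at least a
					 * store->load fence on TSO */
	r1 = smp_load_acquire(&y);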

The scenario is:

	CPU0				CPU1

				unlock(x)
				  smp_store_release(&x->lock, 0);

	unlock(y)
	  smp_store_release(&next->lock, 1); /* next == &y */

				lock(y)
				  while (!(smp_load_acquire(&y->lock)))
					cpu_relax();


Where the lock does _NOT_ issue a store to acquire the lock at all. Now
I don't think any of our current primitives manage this, so we should be
good, but it might just be possible.


And at the same time; having both:

smp_mb__release_acquire()
smp_mb__unlock_lock()

is quite horrible, for it clearly shows a LOCK isn't quite the same as
ACQUIRE :/




Re: [RFC 3/5] powerpc: atomic: implement atomic{,64}_{add,sub}_return_* variants

2015-09-14 Thread Peter Zijlstra

Sorry for being tardy, I had a wee spell of feeling horrible and then I
procrastinated longer than I should have.

On Fri, Sep 11, 2015 at 01:45:07PM +0100, Will Deacon wrote:

> Peter, any thoughts? I'm not au fait with the x86 memory model, but what
> Paul's saying is worrying.

Right, so Paul is right -- and I completely forgot (I used to know about
that).

So all the TSO archs (SPARC-TSO, x86 (!OOSTORE) and s390) can do
smp_load_acquire()/smp_store_release() with just barrier(), and while:

	smp_store_release(&x);
	smp_load_acquire(&x);

will provide full order by means of the address dependency,

	smp_store_release(&x);
	smp_load_acquire(&y);

will not. Because the one reorder TSO allows is exactly that one.
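
As an illustration (not from the original mail) of that one permitted
reordering, assume a TSO machine where both primitives are just plain
accesses plus barrier():

	int x, y;
	int r0, r1;

	void cpu0(void)
	{
		smp_store_release(&x, 1);	/* plain store on TSO */
		r0 = smp_load_acquire(&y);	/* plain load on TSO; may be satisfied
						 * before the store to x is globally
						 * visible */
	}

	void cpu1(void)
	{
		smp_store_release(&y, 1);
		r1 = smp_load_acquire(&x);
	}

	/*
	 * On TSO, r0 == 0 && r1 == 0 is observable: a later load may pass an
	 * earlier store to a *different* address, so this RELEASE->ACQUIRE
	 * sequence is not a full barrier there.
	 */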

> Peter -- if the above reordering can happen on x86, then moving away
> from RCpc is going to be less popular than I hoped...

Sadly yes.. We could of course try and split LOCK from ACQUIRE again,
but I'm not sure that's going to help anything except confusion.



Re: [RFC 3/5] powerpc: atomic: implement atomic{,64}_{add,sub}_return_* variants

2015-09-11 Thread Paul E. McKenney
On Fri, Sep 11, 2015 at 01:45:07PM +0100, Will Deacon wrote:
> [left the context in the hope that we can make some progress]
> 
> On Wed, Sep 02, 2015 at 10:59:06AM +0100, Will Deacon wrote:
> > On Tue, Sep 01, 2015 at 10:45:40PM +0100, Paul E. McKenney wrote:
> > > On Tue, Sep 01, 2015 at 08:00:27PM +0100, Will Deacon wrote:
> > > > On Fri, Aug 28, 2015 at 04:39:21PM +0100, Peter Zijlstra wrote:
> > > > > Yes, the difference between RCpc and RCsc is in the meaning of RELEASE +
> > > > > ACQUIRE. With RCsc that implies a full memory barrier, with RCpc it does
> > > > > not.
> > > > 
> > > > We've discussed this before, but for the sake of completeness, I don't
> > > > think we're fully RCsc either because we don't order the actual RELEASE
> > > > operation against a subsequent ACQUIRE operation:
> > > > 
> > > > P0
> > > > smp_store_release(&x, 1);
> > > > foo = smp_load_acquire(&y);
> > > > 
> > > > P1
> > > > smp_store_release(&y, 1);
> > > > bar = smp_load_acquire(&x);
> > > > 
> > > > We allow foo == bar == 0, which is prohibited by SC.
> > > 
> > > I certainly hope that no one expects foo == bar == 0 to be prohibited!!!
> > 
> > I just thought it was worth making this point, because it is prohibited
> > in SC and I don't want people to think that our RELEASE/ACQUIRE operations
> > are SC (even though they happen to be on arm64).
> > 
> > > On the other hand, in this case, foo == bar == 1 will be prohibited:
> > > 
> > > P0
> > > foo = smp_load_acquire(&x);
> > > smp_store_release(&y, 1);
> > > 
> > > P1
> > > bar = smp_load_acquire(&y);
> > > smp_store_release(&x, 1);
> > 
> > Agreed.
> > 
> > > > However, we *do* enforce ordering on any prior or subsequent accesses
> > > > for the code snippet above (the release and acquire combine to give a
> > > > full barrier), which makes these primitives well suited to things like
> > > > message passing.
> > > 
> > > If I understand your example correctly, neither x86 nor Power implement
> > > a full barrier in this case.  For example:
> > > 
> > >   P0
> > >   WRITE_ONCE(a, 1);
> > >   smp_store_release(b, 1);
> > >   r1 = smp_load_acquire(c);
> > >   r2 = READ_ONCE(d);
> > > 
> > >   P1
> > >   WRITE_ONCE(d, 1);
> > >   smp_mb();
> > >   r3 = READ_ONCE(a);
> > > 
> > > Both x86 and Power can reorder P0 as follows:
> > > 
> > >   P0
> > >   r1 = smp_load_acquire(c);
> > >   r2 = READ_ONCE(d);
> > >   WRITE_ONCE(a, 1);
> > >   smp_store_release(b, 1);
> > > 
> > > Which clearly shows that the non-SC outcome r2 == 0 && r3 == 0 is allowed.
> > > 
> > > Or am I missing your point here?
> > 
> > I think this example is slightly different. Having the RELEASE/ACQUIRE
> > operations being reordered with respect to each other is one thing, but
> > I thought we were heading in a direction where they combined to give a
> > full barrier with respect to other accesses. In that case, the reordering
> > above would be forbidden.
> > 
> > Peter -- if the above reordering can happen on x86, then moving away
> > from RCpc is going to be less popular than I hoped...
> 
> Peter, any thoughts? I'm not au fait with the x86 memory model, but what
> Paul's saying is worrying.

The herd tool has an x86 mode, which will allow you to double-check
my scenario.  This tool is described in "Herding Cats: Modelling,
Simulation, Testing, and Data-mining for Weak Memory" by Alglave,
Maranget, and Tautschnig.  The herd tool is available at this git
repository: https://github.com/herd/herdtools.

Thanx, Paul



Re: [RFC 3/5] powerpc: atomic: implement atomic{,64}_{add,sub}_return_* variants

2015-09-11 Thread Will Deacon
[left the context in the hope that we can make some progress]

On Wed, Sep 02, 2015 at 10:59:06AM +0100, Will Deacon wrote:
> On Tue, Sep 01, 2015 at 10:45:40PM +0100, Paul E. McKenney wrote:
> > On Tue, Sep 01, 2015 at 08:00:27PM +0100, Will Deacon wrote:
> > > On Fri, Aug 28, 2015 at 04:39:21PM +0100, Peter Zijlstra wrote:
> > > > Yes, the difference between RCpc and RCsc is in the meaning of RELEASE +
> > > > ACQUIRE. With RCsc that implies a full memory barrier, with RCpc it does
> > > > not.
> > > 
> > > We've discussed this before, but for the sake of completeness, I don't
> > > think we're fully RCsc either because we don't order the actual RELEASE
> > > operation against a subsequent ACQUIRE operation:
> > > 
> > > P0
> > > smp_store_release(&x, 1);
> > > foo = smp_load_acquire(&y);
> > > 
> > > P1
> > > smp_store_release(&y, 1);
> > > bar = smp_load_acquire(&x);
> > > 
> > > We allow foo == bar == 0, which is prohibited by SC.
> > 
> > I certainly hope that no one expects foo == bar == 0 to be prohibited!!!
> 
> I just thought it was worth making this point, because it is prohibited
> in SC and I don't want people to think that our RELEASE/ACQUIRE operations
> are SC (even though they happen to be on arm64).
> 
> > On the other hand, in this case, foo == bar == 1 will be prohibited:
> > 
> > P0
> > foo = smp_load_acquire(&x);
> > smp_store_release(&y, 1);
> > 
> > P1
> > bar = smp_load_acquire(&y);
> > smp_store_release(&x, 1);
> 
> Agreed.
> 
> > > However, we *do* enforce ordering on any prior or subsequent accesses
> > > for the code snippet above (the release and acquire combine to give a
> > > full barrier), which makes these primitives well suited to things like
> > > message passing.
> > 
> > If I understand your example correctly, neither x86 nor Power implement
> > a full barrier in this case.  For example:
> > 
> > P0
> > WRITE_ONCE(a, 1);
> > smp_store_release(b, 1);
> > r1 = smp_load_acquire(c);
> > r2 = READ_ONCE(d);
> > 
> > P1
> > WRITE_ONCE(d, 1);
> > smp_mb();
> > r3 = READ_ONCE(a);
> > 
> > Both x86 and Power can reorder P0 as follows:
> > 
> > P0
> > r1 = smp_load_acquire(c);
> > r2 = READ_ONCE(d);
> > WRITE_ONCE(a, 1);
> > smp_store_release(b, 1);
> > 
> > Which clearly shows that the non-SC outcome r2 == 0 && r3 == 0 is allowed.
> > 
> > Or am I missing your point here?
> 
> I think this example is slightly different. Having the RELEASE/ACQUIRE
> operations being reordered with respect to each other is one thing, but
> I thought we were heading in a direction where they combined to give a
> full barrier with respect to other accesses. In that case, the reordering
> above would be forbidden.
> 
> Peter -- if the above reordering can happen on x86, then moving away
> from RCpc is going to be less popular than I hoped...

Peter, any thoughts? I'm not au fait with the x86 memory model, but what
Paul's saying is worrying.

Will


Re: [RFC 3/5] powerpc: atomic: implement atomic{,64}_{add,sub}_return_* variants

2015-09-03 Thread Will Deacon
On Wed, Sep 02, 2015 at 04:36:09PM +0100, Pranith Kumar wrote:
> On Wed, Sep 2, 2015 at 11:23 AM, Pranith Kumar  wrote:
> > On 09/02/2015 05:59 AM, Will Deacon wrote:
> >> I just thought it was worth making this point, because it is prohibited
> >> in SC and I don't want people to think that our RELEASE/ACQUIRE operations
> >> are SC (even though they happen to be on arm64).
> >
> > This is interesting information. Does that mean that the following patch
> > should work? (I am not proposing to use it, just trying to understand if
> > REL+ACQ will act as a full barrier on ARM64, which you say it does).
> >
> > Thanks,
> > Pranith.
> >
> > diff --git a/arch/arm64/include/asm/cmpxchg.h 
> > b/arch/arm64/include/asm/cmpxchg.h
> > index d8c25b7..14a1b35 100644
> > --- a/arch/arm64/include/asm/cmpxchg.h
> > +++ b/arch/arm64/include/asm/cmpxchg.h
> > @@ -68,8 +68,7 @@ static inline unsigned long __xchg(unsigned long x, 
> > volatile void *ptr, int size
> > BUILD_BUG();
> > }
> >
> > -   smp_mb();
> > -   return ret;
> > +   return smp_load_acquire(ret);
> 
> I meant 'smp_load_acquire(&ret);'

Yes, I think that would work on arm64, but it's not portable between
architectures.

Will


Re: [RFC 3/5] powerpc: atomic: implement atomic{,64}_{add,sub}_return_* variants

2015-09-02 Thread Pranith Kumar
On Wed, Sep 2, 2015 at 11:23 AM, Pranith Kumar  wrote:
> Hi Will,
>
> On 09/02/2015 05:59 AM, Will Deacon wrote:
>> I just thought it was worth making this point, because it is prohibited
>> in SC and I don't want people to think that our RELEASE/ACQUIRE operations
>> are SC (even though they happen to be on arm64).
>
> This is interesting information. Does that mean that the following patch
> should work? (I am not proposing to use it, just trying to understand if
> REL+ACQ will act as a full barrier on ARM64, which you say it does).
>
> Thanks,
> Pranith.
>
> diff --git a/arch/arm64/include/asm/cmpxchg.h 
> b/arch/arm64/include/asm/cmpxchg.h
> index d8c25b7..14a1b35 100644
> --- a/arch/arm64/include/asm/cmpxchg.h
> +++ b/arch/arm64/include/asm/cmpxchg.h
> @@ -68,8 +68,7 @@ static inline unsigned long __xchg(unsigned long x, 
> volatile void *ptr, int size
> BUILD_BUG();
> }
>
> -   smp_mb();
> -   return ret;
> +   return smp_load_acquire(ret);

I meant 'smp_load_acquire(&ret);'

-- 
Pranith


Re: [RFC 3/5] powerpc: atomic: implement atomic{,64}_{add,sub}_return_* variants

2015-09-02 Thread Pranith Kumar
Hi Will,

On 09/02/2015 05:59 AM, Will Deacon wrote:
> I just thought it was worth making this point, because it is prohibited
> in SC and I don't want people to think that our RELEASE/ACQUIRE operations
> are SC (even though they happen to be on arm64).

This is interesting information. Does that mean that the following patch
should work? (I am not proposing to use it, just trying to understand if 
REL+ACQ will act as a full barrier on ARM64, which you say it does).

Thanks,
Pranith.

diff --git a/arch/arm64/include/asm/cmpxchg.h b/arch/arm64/include/asm/cmpxchg.h
index d8c25b7..14a1b35 100644
--- a/arch/arm64/include/asm/cmpxchg.h
+++ b/arch/arm64/include/asm/cmpxchg.h
@@ -68,8 +68,7 @@ static inline unsigned long __xchg(unsigned long x, volatile 
void *ptr, int size
BUILD_BUG();
}
 
-   smp_mb();
-   return ret;
+   return smp_load_acquire(ret);
 }
 
 #define xchg(ptr,x) \



Re: [RFC 3/5] powerpc: atomic: implement atomic{,64}_{add,sub}_return_* variants

2015-09-02 Thread Paul E. McKenney
On Wed, Sep 02, 2015 at 10:59:06AM +0100, Will Deacon wrote:
> Hi Paul,
> 
> On Tue, Sep 01, 2015 at 10:45:40PM +0100, Paul E. McKenney wrote:
> > On Tue, Sep 01, 2015 at 08:00:27PM +0100, Will Deacon wrote:
> > > On Fri, Aug 28, 2015 at 04:39:21PM +0100, Peter Zijlstra wrote:
> > > > Yes, the difference between RCpc and RCsc is in the meaning of RELEASE +
> > > > ACQUIRE. With RCsc that implies a full memory barrier, with RCpc it does
> > > > not.
> > > 
> > > We've discussed this before, but for the sake of completeness, I don't
> > > think we're fully RCsc either because we don't order the actual RELEASE
> > > operation against a subsequent ACQUIRE operation:
> > > 
> > > P0
> > > smp_store_release(&x, 1);
> > > foo = smp_load_acquire(&y);
> > > 
> > > P1
> > > smp_store_release(&y, 1);
> > > bar = smp_load_acquire(&x);
> > > 
> > > We allow foo == bar == 0, which is prohibited by SC.
> > 
> > I certainly hope that no one expects foo == bar == 0 to be prohibited!!!
> 
> I just thought it was worth making this point, because it is prohibited
> in SC and I don't want people to think that our RELEASE/ACQUIRE operations
> are SC (even though they happen to be on arm64).

OK, good.

> > On the other hand, in this case, foo == bar == 1 will be prohibited:
> > 
> > P0
> > foo = smp_load_acquire(&x);
> > smp_store_release(&y, 1);
> > 
> > P1
> > bar = smp_load_acquire(&y);
> > smp_store_release(&x, 1);
> 
> Agreed.

Good as well.

> > > However, we *do* enforce ordering on any prior or subsequent accesses
> > > for the code snippet above (the release and acquire combine to give a
> > > full barrier), which makes these primitives well suited to things like
> > > message passing.
> > 
> > If I understand your example correctly, neither x86 nor Power implement
> > a full barrier in this case.  For example:
> > 
> > P0
> > WRITE_ONCE(a, 1);
> > smp_store_release(b, 1);
> > r1 = smp_load_acquire(c);
> > r2 = READ_ONCE(d);
> > 
> > P1
> > WRITE_ONCE(d, 1);
> > smp_mb();
> > r3 = READ_ONCE(a);
> > 
> > Both x86 and Power can reorder P0 as follows:
> > 
> > P0
> > r1 = smp_load_acquire(c);
> > r2 = READ_ONCE(d);
> > WRITE_ONCE(a, 1);
> > smp_store_release(b, 1);
> > 
> > Which clearly shows that the non-SC outcome r2 == 0 && r3 == 0 is allowed.
> > 
> > Or am I missing your point here?
> 
> I think this example is slightly different. Having the RELEASE/ACQUIRE
> operations being reordered with respect to each other is one thing, but
> I thought we were heading in a direction where they combined to give a
> full barrier with respect to other accesses. In that case, the reordering
> above would be forbidden.

It is certainly less added overhead to make unlock-lock a full barrier
than it is to make smp_store_release()-smp_load_acquire() a full barrier.
I am not fully convinced on either, aside from needing some way to make
unlock-lock a full barrier within the RCU implementation, for which the
now-privatized smp_mb__after_unlock_lock() suffices.
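
As a point of reference (editorial note, with placeholder lock names), the
RCU-internal usage Paul mentions places the barrier right after the second
acquisition, upgrading the unlock+lock pair to a full barrier:

	spin_unlock(&a->lock);
	spin_lock(&b->lock);
	smp_mb__after_unlock_lock();	/* unlock+lock now implies smp_mb() */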

> Peter -- if the above reordering can happen on x86, then moving away
> from RCpc is going to be less popular than I hoped...

;-)

Thanx, Paul



Re: [RFC 3/5] powerpc: atomic: implement atomic{,64}_{add,sub}_return_* variants

2015-09-02 Thread Will Deacon
Hi Paul,

On Tue, Sep 01, 2015 at 10:45:40PM +0100, Paul E. McKenney wrote:
> On Tue, Sep 01, 2015 at 08:00:27PM +0100, Will Deacon wrote:
> > On Fri, Aug 28, 2015 at 04:39:21PM +0100, Peter Zijlstra wrote:
> > > Yes, the difference between RCpc and RCsc is in the meaning of RELEASE +
> > > ACQUIRE. With RCsc that implies a full memory barrier, with RCpc it does
> > > not.
> > 
> > We've discussed this before, but for the sake of completeness, I don't
> > think we're fully RCsc either because we don't order the actual RELEASE
> > operation against a subsequent ACQUIRE operation:
> > 
> > P0
> > smp_store_release(&x, 1);
> > foo = smp_load_acquire(&y);
> > 
> > P1
> > smp_store_release(&y, 1);
> > bar = smp_load_acquire(&x);
> > 
> > We allow foo == bar == 0, which is prohibited by SC.
> 
> I certainly hope that no one expects foo == bar == 0 to be prohibited!!!

I just thought it was worth making this point, because it is prohibited
in SC and I don't want people to think that our RELEASE/ACQUIRE operations
are SC (even though they happen to be on arm64).

> On the other hand, in this case, foo == bar == 1 will be prohibited:
> 
> P0
> foo = smp_load_acquire(&x);
> smp_store_release(&y, 1);
> 
> P1
> bar = smp_load_acquire(&y);
> smp_store_release(&x, 1);

Agreed.

> > However, we *do* enforce ordering on any prior or subsequent accesses
> > for the code snippet above (the release and acquire combine to give a
> > full barrier), which makes these primitives well suited to things like
> > message passing.
> 
> If I understand your example correctly, neither x86 nor Power implement
> a full barrier in this case.  For example:
> 
>   P0
>   WRITE_ONCE(a, 1);
>   smp_store_release(b, 1);
>   r1 = smp_load_acquire(c);
>   r2 = READ_ONCE(d);
> 
>   P1
>   WRITE_ONCE(d, 1);
>   smp_mb();
>   r3 = READ_ONCE(a);
> 
> Both x86 and Power can reorder P0 as follows:
> 
>   P0
>   r1 = smp_load_acquire(c);
>   r2 = READ_ONCE(d);
>   WRITE_ONCE(a, 1);
>   smp_store_release(b, 1);
> 
> Which clearly shows that the non-SC outcome r2 == 0 && r3 == 0 is allowed.
> 
> Or am I missing your point here?

I think this example is slightly different. Having the RELEASE/ACQUIRE
operations being reordered with respect to each other is one thing, but
I thought we were heading in a direction where they combined to give a
full barrier with respect to other accesses. In that case, the reordering
above would be forbidden.

Peter -- if the above reordering can happen on x86, then moving away
from RCpc is going to be less popular than I hoped...

Will


Re: [RFC 3/5] powerpc: atomic: implement atomic{,64}_{add,sub}_return_* variants

2015-09-01 Thread Paul E. McKenney
On Tue, Sep 01, 2015 at 08:00:27PM +0100, Will Deacon wrote:
> On Fri, Aug 28, 2015 at 04:39:21PM +0100, Peter Zijlstra wrote:
> > On Fri, Aug 28, 2015 at 10:16:02PM +0800, Boqun Feng wrote:
> > > Ah.. just read through the thread you mentioned, I might misunderstand
> > > you, probably because I didn't understand RCpc well..
> > > 
> > > You are saying that in a RELEASE we -might- switch from smp_lwsync() to
> > > smp_mb() semantically, right? I guess this means we -might- switch from
> > > RCpc to RCsc, right?
> > > 
> > > If so, I think I'd better to wait until we have a conclusion for this.
> > 
> > Yes, the difference between RCpc and RCsc is in the meaning of RELEASE +
> > ACQUIRE. With RCsc that implies a full memory barrier, with RCpc it does
> > not.
> 
> We've discussed this before, but for the sake of completeness, I don't
> think we're fully RCsc either because we don't order the actual RELEASE
> operation against a subsequent ACQUIRE operation:
> 
> P0
> smp_store_release(&x, 1);
> foo = smp_load_acquire(&y);
> 
> P1
> smp_store_release(&y, 1);
> bar = smp_load_acquire(&x);
> 
> We allow foo == bar == 0, which is prohibited by SC.

I certainly hope that no one expects foo == bar == 0 to be prohibited!!!

On the other hand, in this case, foo == bar == 1 will be prohibited:

P0
foo = smp_load_acquire(&x);
smp_store_release(&y, 1);

P1
bar = smp_load_acquire(&y);
smp_store_release(&x, 1);

> However, we *do* enforce ordering on any prior or subsequent accesses
> for the code snippet above (the release and acquire combine to give a
> full barrier), which makes these primitives well suited to things like
> message passing.

If I understand your example correctly, neither x86 nor Power implement
a full barrier in this case.  For example:

P0
WRITE_ONCE(a, 1);
smp_store_release(b, 1);
r1 = smp_load_acquire(c);
r2 = READ_ONCE(d);

P1
WRITE_ONCE(d, 1);
smp_mb();
r3 = READ_ONCE(a);

Both x86 and Power can reorder P0 as follows:

P0
r1 = smp_load_acquire(c);
r2 = READ_ONCE(d);
WRITE_ONCE(a, 1);
smp_store_release(b, 1);

Which clearly shows that the non-SC outcome r2 == 0 && r3 == 0 is allowed.

Or am I missing your point here?

Thanx, Paul



Re: [RFC 3/5] powerpc: atomic: implement atomic{,64}_{add,sub}_return_* variants

2015-09-01 Thread Will Deacon
On Fri, Aug 28, 2015 at 04:39:21PM +0100, Peter Zijlstra wrote:
> On Fri, Aug 28, 2015 at 10:16:02PM +0800, Boqun Feng wrote:
> > Ah.. just read through the thread you mentioned, I might misunderstand
> > you, probably because I didn't understand RCpc well..
> > 
> > You are saying that in a RELEASE we -might- switch from smp_lwsync() to
> > smp_mb() semantically, right? I guess this means we -might- switch from
> > RCpc to RCsc, right?
> > 
> > If so, I think I'd better to wait until we have a conclusion for this.
> 
> Yes, the difference between RCpc and RCsc is in the meaning of RELEASE +
> ACQUIRE. With RCsc that implies a full memory barrier, with RCpc it does
> not.

We've discussed this before, but for the sake of completeness, I don't
think we're fully RCsc either because we don't order the actual RELEASE
operation against a subsequent ACQUIRE operation:


P0
smp_store_release(&x, 1);
foo = smp_load_acquire(&y);

P1
smp_store_release(&y, 1);
bar = smp_load_acquire(&x);

We allow foo == bar == 0, which is prohibited by SC.


However, we *do* enforce ordering on any prior or subsequent accesses
for the code snippet above (the release and acquire combine to give a
full barrier), which makes these primitives well suited to things like
message passing.
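
For illustration (not part of the original mail), the message-passing pattern
referred to above, where the RELEASE/ACQUIRE pair by itself is enough:

	int data;
	int flag;

	void producer(void)
	{
		data = 42;			/* write the payload first */
		smp_store_release(&flag, 1);	/* then publish it */
	}

	void consumer(void)
	{
		if (smp_load_acquire(&flag))	/* if the flag is observed set... */
			BUG_ON(data != 42);	/* ...the payload is guaranteed visible */
	}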

Will
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [RFC 3/5] powerpc: atomic: implement atomic{,64}_{add,sub}_return_* variants

2015-09-01 Thread Paul E. McKenney
On Tue, Sep 01, 2015 at 08:00:27PM +0100, Will Deacon wrote:
> On Fri, Aug 28, 2015 at 04:39:21PM +0100, Peter Zijlstra wrote:
> > On Fri, Aug 28, 2015 at 10:16:02PM +0800, Boqun Feng wrote:
> > > Ah.. I just read through the thread you mentioned; I might have
> > > misunderstood you, probably because I didn't understand RCpc well..
> > > 
> > > You are saying that in a RELEASE we -might- switch from smp_lwsync() to
> > > smp_mb() semantically, right? I guess this means we -might- switch from
> > > RCpc to RCsc, right?
> > > 
> > > If so, I think I'd better wait until we have a conclusion on this.
> > 
> > Yes, the difference between RCpc and RCsc is in the meaning of RELEASE +
> > ACQUIRE. With RCsc that implies a full memory barrier, with RCpc it does
> > not.
> 
> We've discussed this before, but for the sake of completeness, I don't
> think we're fully RCsc either because we don't order the actual RELEASE
> operation against a subsequent ACQUIRE operation:
> 
> P0
> smp_store_release(&x, 1);
> foo = smp_load_acquire(&y);
> 
> P1
> smp_store_release(&y, 1);
> bar = smp_load_acquire(&x);
> 
> We allow foo == bar == 0, which is prohibited by SC.

I certainly hope that no one expects foo == bar == 0 to be prohibited!!!

On the other hand, in this case, foo == bar == 1 will be prohibited:

P0
foo = smp_load_acquire(&y);
smp_store_release(&x, 1);

P1
bar = smp_load_acquire(&x);
smp_store_release(&y, 1);

> However, we *do* enforce ordering on any prior or subsequent accesses
> for the code snippet above (the release and acquire combine to give a
> full barrier), which makes these primitives well suited to things like
> message passing.

If I understand your example correctly, neither x86 nor Power implements
a full barrier in this case.  For example:

P0
WRITE_ONCE(a, 1);
smp_store_release(b, 1);
r1 = smp_load_acquire(c);
r2 = READ_ONCE(d);

P1
WRITE_ONCE(d, 1);
smp_mb();
r3 = READ_ONCE(a);

Both x86 and Power can reorder P0 as follows:

P0
r1 = smp_load_acquire(c);
r2 = READ_ONCE(d);
WRITE_ONCE(a, 1);
smp_store_release(b, 1);

Which clearly shows that the non-SC outcome r2 == 0 && r3 == 0 is allowed.

Or am I missing your point here?

Thanx, Paul



Re: [RFC 3/5] powerpc: atomic: implement atomic{,64}_{add,sub}_return_* variants

2015-08-28 Thread Boqun Feng
On Fri, Aug 28, 2015 at 05:39:21PM +0200, Peter Zijlstra wrote:
> On Fri, Aug 28, 2015 at 10:16:02PM +0800, Boqun Feng wrote:

> > 
> > Ah.. I just read through the thread you mentioned; I might have
> > misunderstood you, probably because I didn't understand RCpc well..
> > 
> > You are saying that in a RELEASE we -might- switch from smp_lwsync() to
> > smp_mb() semantically, right? I guess this means we -might- switch from
> > RCpc to RCsc, right?
> > 
> > If so, I think I'd better wait until we have a conclusion on this.
> 
> Yes, the difference between RCpc and RCsc is in the meaning of RELEASE +
> ACQUIRE. With RCsc that implies a full memory barrier, with RCpc it does
> not.
> 
> Currently PowerPC is the only arch that (can, and) does RCpc and gives a
> weaker RELEASE + ACQUIRE. Only the CPU who did the ACQUIRE is guaranteed
> to see the stores of the CPU which did the RELEASE in order.
> 
> As it stands, RCU is the only _known_ codebase where this matters, but
> we did in fact write code for a fair number of years 'assuming' RELEASE
> + ACQUIRE was a full barrier, so who knows what else is out there.
> 
> 
> RCsc - release consistency sequential consistency
> RCpc - release consistency processor consistency
> 
> https://en.wikipedia.org/wiki/Processor_consistency (where they have
> s/sequential/causal/)

Thank you for your detailed explanation! Much clearer now ;-)

Regards,
Boqun




Re: [RFC 3/5] powerpc: atomic: implement atomic{,64}_{add,sub}_return_* variants

2015-08-28 Thread Peter Zijlstra
On Fri, Aug 28, 2015 at 10:16:02PM +0800, Boqun Feng wrote:
> On Fri, Aug 28, 2015 at 08:06:14PM +0800, Boqun Feng wrote:
> > Hi Peter,
> > 
> > On Fri, Aug 28, 2015 at 12:48:54PM +0200, Peter Zijlstra wrote:
> > > On Fri, Aug 28, 2015 at 10:48:17AM +0800, Boqun Feng wrote:
> > > > +/*
> > > > + * Since {add,sub}_return_relaxed and xchg_relaxed are implemented with
> > > > + * a "bne-" instruction at the end, an isync is enough as an acquire
> > > > + * barrier on platforms without lwsync.
> > > > + */
> > > > +#ifdef CONFIG_SMP
> > > > +#define smp_acquire_barrier__after_atomic() \
> > > > +        __asm__ __volatile__(PPC_ACQUIRE_BARRIER : : : "memory")
> > > > +#else
> > > > +#define smp_acquire_barrier__after_atomic() barrier()
> > > > +#endif
> > > > +#define arch_atomic_op_acquire(op, args...)                     \
> > > > +({                                                              \
> > > > +        typeof(op##_relaxed(args)) __ret = op##_relaxed(args);  \
> > > > +        smp_acquire_barrier__after_atomic();                    \
> > > > +        __ret;                                                  \
> > > > +})
> > > > +
> > > > +#define arch_atomic_op_release(op, args...)                     \
> > > > +({                                                              \
> > > > +        smp_lwsync();                                           \
> > > > +        op##_relaxed(args);                                     \
> > > > +})
> > > 
> > > Urgh, so this is RCpc. We were trying to get rid of that if possible.
> > > Let's wait until that's settled before introducing more of it.
> > > 
> > > lkml.kernel.org/r/20150820155604.gb24...@arm.com
> > 
> > OK, get it. Thanks.
> > 
> > So I'm not going to introduce these arch specific macros, I think what I
> > need to implement are just _relaxed variants and cmpxchg_acquire.
> 
> Ah.. I just read through the thread you mentioned; I might have
> misunderstood you, probably because I didn't understand RCpc well..
> 
> You are saying that in a RELEASE we -might- switch from smp_lwsync() to
> smp_mb() semantically, right? I guess this means we -might- switch from
> RCpc to RCsc, right?
> 
> If so, I think I'd better wait until we have a conclusion on this.

Yes, the difference between RCpc and RCsc is in the meaning of RELEASE +
ACQUIRE. With RCsc that implies a full memory barrier, with RCpc it does
not.
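
To make that concrete, here is a sketch (not from this thread) of the
store-buffering test written with C11 atomics: with release/acquire the
outcome r0 == r1 == 0 is allowed, which is the RCpc-style behaviour; if
both threads used memory_order_seq_cst instead, that outcome would be
forbidden, which is what the RCsc "full barrier" reading gives you.

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_int x, y;
static int r0, r1;

static void *t0_fn(void *arg)
{
        atomic_store_explicit(&x, 1, memory_order_release);
        r0 = atomic_load_explicit(&y, memory_order_acquire);
        return NULL;
}

static void *t1_fn(void *arg)
{
        atomic_store_explicit(&y, 1, memory_order_release);
        r1 = atomic_load_explicit(&x, memory_order_acquire);
        return NULL;
}

int main(void)
{
        pthread_t t0, t1;

        pthread_create(&t0, NULL, t0_fn, NULL);
        pthread_create(&t1, NULL, t1_fn, NULL);
        pthread_join(t0, NULL);
        pthread_join(t1, NULL);

        /* r0 == 0 && r1 == 0 can be observed here; seq_cst would forbid it. */
        printf("r0 = %d, r1 = %d\n", r0, r1);
        return 0;
}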

Currently PowerPC is the only arch that (can, and) does RCpc and gives a
weaker RELEASE + ACQUIRE. Only the CPU who did the ACQUIRE is guaranteed
to see the stores of the CPU which did the RELEASE in order.

As it stands, RCU is the only _known_ codebase where this matters, but
we did in fact write code for a fair number of years 'assuming' RELEASE
+ ACQUIRE was a full barrier, so who knows what else is out there.


RCsc - release consistency sequential consistency
RCpc - release consistency processor consistency

https://en.wikipedia.org/wiki/Processor_consistency (where they have
s/sequential/causal/)


Re: [RFC 3/5] powerpc: atomic: implement atomic{,64}_{add,sub}_return_* variants

2015-08-28 Thread Boqun Feng
On Fri, Aug 28, 2015 at 08:06:14PM +0800, Boqun Feng wrote:
> Hi Peter,
> 
> On Fri, Aug 28, 2015 at 12:48:54PM +0200, Peter Zijlstra wrote:
> > On Fri, Aug 28, 2015 at 10:48:17AM +0800, Boqun Feng wrote:
> > > +/*
> > > + * Since {add,sub}_return_relaxed and xchg_relaxed are implemented with
> > > + * a "bne-" instruction at the end, an isync is enough as an acquire
> > > + * barrier on platforms without lwsync.
> > > + */
> > > +#ifdef CONFIG_SMP
> > > +#define smp_acquire_barrier__after_atomic() \
> > > +        __asm__ __volatile__(PPC_ACQUIRE_BARRIER : : : "memory")
> > > +#else
> > > +#define smp_acquire_barrier__after_atomic() barrier()
> > > +#endif
> > > +#define arch_atomic_op_acquire(op, args...)                     \
> > > +({                                                              \
> > > +        typeof(op##_relaxed(args)) __ret = op##_relaxed(args);  \
> > > +        smp_acquire_barrier__after_atomic();                    \
> > > +        __ret;                                                  \
> > > +})
> > > +
> > > +#define arch_atomic_op_release(op, args...)                     \
> > > +({                                                              \
> > > +        smp_lwsync();                                           \
> > > +        op##_relaxed(args);                                     \
> > > +})
> > 
> > Urgh, so this is RCpc. We were trying to get rid of that if possible.
> > Let's wait until that's settled before introducing more of it.
> > 
> > lkml.kernel.org/r/20150820155604.gb24...@arm.com
> 
> OK, get it. Thanks.
> 
> So I'm not going to introduce these arch specific macros, I think what I
> need to implement are just _relaxed variants and cmpxchg_acquire.

Ah.. I just read through the thread you mentioned; I might have
misunderstood you, probably because I didn't understand RCpc well..

You are saying that in a RELEASE we -might- switch from smp_lwsync() to
smp_mb() semantically, right? I guess this means we -might- switch from
RCpc to RCsc, right?

If so, I think I'd better wait until we have a conclusion on this.

Thank you for your comments!

Regards,
Boqun





Re: [RFC 3/5] powerpc: atomic: implement atomic{,64}_{add,sub}_return_* variants

2015-08-28 Thread Boqun Feng
Hi Peter,

On Fri, Aug 28, 2015 at 12:48:54PM +0200, Peter Zijlstra wrote:
> On Fri, Aug 28, 2015 at 10:48:17AM +0800, Boqun Feng wrote:
> > +/*
> > + * Since {add,sub}_return_relaxed and xchg_relaxed are implemented with
> > + * a "bne-" instruction at the end, an isync is enough as an acquire
> > + * barrier on platforms without lwsync.
> > + */
> > +#ifdef CONFIG_SMP
> > +#define smp_acquire_barrier__after_atomic() \
> > +        __asm__ __volatile__(PPC_ACQUIRE_BARRIER : : : "memory")
> > +#else
> > +#define smp_acquire_barrier__after_atomic() barrier()
> > +#endif
> > +#define arch_atomic_op_acquire(op, args...)                     \
> > +({                                                              \
> > +        typeof(op##_relaxed(args)) __ret = op##_relaxed(args);  \
> > +        smp_acquire_barrier__after_atomic();                    \
> > +        __ret;                                                  \
> > +})
> > +
> > +#define arch_atomic_op_release(op, args...)                     \
> > +({                                                              \
> > +        smp_lwsync();                                           \
> > +        op##_relaxed(args);                                     \
> > +})
> 
> Urgh, so this is RCpc. We were trying to get rid of that if possible.
> Let's wait until that's settled before introducing more of it.
> 
> lkml.kernel.org/r/20150820155604.gb24...@arm.com

OK, get it. Thanks.

So I'm not going to introduce these arch specific macros, I think what I
need to implement are just _relaxed variants and cmpxchg_acquire.
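
For reference, one illustrative way those macros would be used (a sketch
only, not part of the posted patch; atomic_add_return is just an example
operation) is to generate the ordered variants from the _relaxed ones:

/* Build the _acquire/_release forms of an atomic from its _relaxed form,
 * using the arch_atomic_op_acquire()/arch_atomic_op_release() helpers
 * quoted above. */
#define atomic_add_return_acquire(a, v) \
        arch_atomic_op_acquire(atomic_add_return, a, v)

#define atomic_add_return_release(a, v) \
        arch_atomic_op_release(atomic_add_return, a, v)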

Regards,
Boqun





Re: [RFC 3/5] powerpc: atomic: implement atomic{,64}_{add,sub}_return_* variants

2015-08-28 Thread Peter Zijlstra
On Fri, Aug 28, 2015 at 10:48:17AM +0800, Boqun Feng wrote:
> +/*
> + * Since {add,sub}_return_relaxed and xchg_relaxed are implemented with
> + * a "bne-" instruction at the end, so an isync is enough as a acquire 
> barrier
> + * on the platform without lwsync.
> + */
> +#ifdef CONFIG_SMP
> +#define smp_acquire_barrier__after_atomic() \
> + __asm__ __volatile__(PPC_ACQUIRE_BARRIER : : : "memory")
> +#else
> +#define smp_acquire_barrier__after_atomic() barrier()
> +#endif
> +#define arch_atomic_op_acquire(op, args...)  \
> +({   \
> + typeof(op##_relaxed(args)) __ret  = op##_relaxed(args); \
> + smp_acquire_barrier__after_atomic();\
> + __ret;  \
> +})
> +
> +#define arch_atomic_op_release(op, args...)  \
> +({   \
> + smp_lwsync();   \
> + op##_relaxed(args); \
> +})

Urgh, so this is RCpc. We were trying to get rid of that if possible.
Let's wait until that's settled before introducing more of it.

lkml.kernel.org/r/20150820155604.gb24...@arm.com

