Re: wake_q memory ordering

2019-10-11 Thread Davidlohr Bueso

On Fri, 11 Oct 2019, Manfred Spraul wrote:
I don't know. The new documentation would not have answered my 
question (is it ok to combine smp_mb__before_atomic() with 
atomic_relaxed()?). And it copies content already present in 
atomic_t.txt.


Well, the _relaxed (and release/acquire) extensions refer to a
_successful_ operation (LL/SC), and whether it has barriers on
each side, before and after. I thought you were mainly worried
about the failed CAS scenario, not the relaxed semantics itself.

I don't know how this copies content from atomic_t.txt; at no
point does it talk about failed CAS.
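
For reference, a minimal sketch of the pairing in question
(illustrative only, not kernel source; v is an arbitrary atomic_t):

	/* Fully ordered RMW: full barrier on each side, on all archs. */
	old = atomic_fetch_add(1, &v);

	/* Relaxed RMW upgraded by the helpers: equivalent ordering. */
	smp_mb__before_atomic();
	old = atomic_fetch_add_relaxed(1, &v);
	smp_mb__after_atomic();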



Thus: I would prefer it if the first sentence of the paragraph were 
replaced: the list of operations should end with "...", and it should 
match what is in atomic_t.txt.


I'll see about combining some of your changes in patch 5/5 of
your new series, but have to say I still prefer my documentation
change.

Thanks,
Davidlohr


Re: wake_q memory ordering

2019-10-11 Thread Manfred Spraul

Hi Davidlohr,

On 10/10/19 9:25 PM, Davidlohr Bueso wrote:

On Thu, 10 Oct 2019, Peter Zijlstra wrote:


On Thu, Oct 10, 2019 at 02:13:47PM +0200, Manfred Spraul wrote:


Therefore smp_mb__{before,after}_atomic() may be combined with
cmpxchg_relaxed, to form a full memory barrier, on all archs.


Just so.


We might want something like this?

8<-

From: Davidlohr Bueso 
Subject: [PATCH] Documentation/memory-barriers.txt: Mention 
smp_mb__{before,after}_atomic() and CAS


Explicitly mention possible usages to guarantee serialization even upon
failed cmpxchg (or similar) calls along with smp_mb__{before,after}_atomic().


Signed-off-by: Davidlohr Bueso 
---
Documentation/memory-barriers.txt | 12 ++++++++++++
1 file changed, 12 insertions(+)

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt

index 1adbb8a371c7..5d2873d4b442 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1890,6 +1890,18 @@ There are some more advanced barrier functions:
 This makes sure that the death mark on the object is perceived to be set
 *before* the reference counter is decremented.

+ Similarly, these barriers can be used to guarantee serialization for atomic
+ RMW calls on architectures which may not imply memory barriers upon failure.
+
+    obj->next = NULL;
+    smp_mb__before_atomic();
+    if (cmpxchg(&obj->ptr, NULL, val))
+        return;
+
+ This makes sure that the store to the next pointer always has smp_store_mb()
+ semantics. As such, smp_mb__{before,after}_atomic() calls allow optimizing
+ the barrier usage by finer grained serialization.
+
 See Documentation/atomic_{t,bitops}.txt for more information.


I don't know. The new documentation would not have answered my question 
(is it ok to combine smp_mb__before_atomic() with atomic_relaxed()?). 
And it copies content already present in atomic_t.txt.


Thus: I would prefer it if the first sentence of the paragraph were 
replaced: the list of operations should end with "...", and it should 
match what is in atomic_t.txt.


Ok?

--

    Manfred


From 8df60211228042672ba0cd89c3566c5145e8b203 Mon Sep 17 00:00:00 2001
From: Manfred Spraul 
Date: Fri, 11 Oct 2019 10:33:26 +0200
Subject: [PATCH 4/4] Documentation/memory-barriers.txt: Clarify cmpxchg()

The documentation in memory-barriers.txt claims that
smp_mb__{before,after}_atomic() are for atomic ops that do not return a
value.

This is misleading and doesn't match the example in atomic_t.txt;
e.g. smp_mb__before_atomic() may be, and is, used together with
cmpxchg_relaxed() in the wake_q code.

The purpose of e.g. smp_mb__before_atomic() is to "upgrade" a following
RMW atomic operation to a full memory barrier.
The return code of the atomic operation has no impact, so all of the
following examples are valid:

1)
	smp_mb__before_atomic();
	atomic_add();

2)
	smp_mb__before_atomic();
	atomic_xchg_relaxed();

3)
	smp_mb__before_atomic();
	atomic_fetch_add_relaxed();

Invalid would be:
	smp_mb__before_atomic();
	atomic_set();
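
A minimal sketch of why the invalid case differs (illustrative only,
not part of the list above): atomic_set() is a plain store, not an
RMW, so the helpers do not apply; an explicit barrier is needed
instead:

	atomic_set(&v, 1);
	smp_mb();	/* explicit full barrier: the __before/__after
			 * helpers only upgrade RMW atomics */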

Signed-off-by: Manfred Spraul 
Cc: Waiman Long 
Cc: Davidlohr Bueso 
Cc: Peter Zijlstra 
---
 Documentation/memory-barriers.txt | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 1adbb8a371c7..52076b057400 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1873,12 +1873,13 @@ There are some more advanced barrier functions:
  (*) smp_mb__before_atomic();
  (*) smp_mb__after_atomic();
 
- These are for use with atomic (such as add, subtract, increment and
- decrement) functions that don't return a value, especially when used for
- reference counting.  These functions do not imply memory barriers.
+ These are for use with atomic RMW functions (such as add, subtract,
+ increment, decrement, failed conditional operations, ...) that do
+ not imply memory barriers, but where the code needs a memory barrier,
+ for example when used for reference counting.
 
- These are also used for atomic bitop functions that do not return a
- value (such as set_bit and clear_bit).
+ These are also used for atomic RMW bitop functions that do not imply a
+ memory barrier (such as set_bit and clear_bit).
 
  As an example, consider a piece of code that marks an object as being dead
  and then decrements the object's reference count:
-- 
2.21.0



Re: wake_q memory ordering

2019-10-10 Thread Davidlohr Bueso

On Thu, 10 Oct 2019, Peter Zijlstra wrote:


On Thu, Oct 10, 2019 at 02:13:47PM +0200, Manfred Spraul wrote:

Hi Peter,

On 10/10/19 1:42 PM, Peter Zijlstra wrote:
> On Thu, Oct 10, 2019 at 12:41:11PM +0200, Manfred Spraul wrote:
> > Hi,
> >
> > Waiman Long noticed that the memory barriers in sem_lock() are not really
> > documented, and while adding documentation, I ended up with one case where
> > I'm not certain about the wake_q code:
> >
> > Questions:
> > - Does smp_mb__before_atomic() + a (failed) cmpxchg_relaxed provide an
> >    ordering guarantee?
> Yep. Either the atomic instruction implies ordering (e.g. x86 LOCK
> prefix) or it doesn't (most RISC LL/SC). If it does,
> smp_mb__{before,after}_atomic() are a NO-OP and the ordering is
> unconditional; if it does not, then smp_mb__{before,after}_atomic() are
> unconditional barriers.

And _relaxed() differs from "normal" cmpxchg only for LL/SC architectures,
correct?


Indeed.


Therefore smp_mb__{before,after}_atomic() may be combined with
cmpxchg_relaxed, to form a full memory barrier, on all archs.


Just so.


We might want something like this?

8<-

From: Davidlohr Bueso 
Subject: [PATCH] Documentation/memory-barriers.txt: Mention 
smp_mb__{before,after}_atomic() and CAS

Explicitly mention possible usages to guarantee serialization even upon
failed cmpxchg (or similar) calls along with smp_mb__{before,after}_atomic().

Signed-off-by: Davidlohr Bueso 
---
Documentation/memory-barriers.txt | 12 ++++++++++++
1 file changed, 12 insertions(+)

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 1adbb8a371c7..5d2873d4b442 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1890,6 +1890,18 @@ There are some more advanced barrier functions:
 This makes sure that the death mark on the object is perceived to be set
 *before* the reference counter is decremented.

+ Similarly, these barriers can be used to guarantee serialization for atomic
+ RMW calls on architectures which may not imply memory barriers upon failure.
+
+    obj->next = NULL;
+    smp_mb__before_atomic();
+    if (cmpxchg(&obj->ptr, NULL, val))
+        return;
+
+ This makes sure that the store to the next pointer always has smp_store_mb()
+ semantics. As such, smp_mb__{before,after}_atomic() calls allow optimizing
+ the barrier usage by finer grained serialization.
+
 See Documentation/atomic_{t,bitops}.txt for more information.


--
2.16.4


Re: wake_q memory ordering

2019-10-10 Thread Peter Zijlstra
On Thu, Oct 10, 2019 at 02:13:47PM +0200, Manfred Spraul wrote:
> Hi Peter,
> 
> On 10/10/19 1:42 PM, Peter Zijlstra wrote:
> > On Thu, Oct 10, 2019 at 12:41:11PM +0200, Manfred Spraul wrote:
> > > Hi,
> > > 
> > > Waiman Long noticed that the memory barriers in sem_lock() are not really
> > > documented, and while adding documentation, I ended up with one case where
> > > I'm not certain about the wake_q code:
> > > 
> > > Questions:
> > > - Does smp_mb__before_atomic() + a (failed) cmpxchg_relaxed provide an
> > >    ordering guarantee?
> > Yep. Either the atomic instruction implies ordering (e.g. x86 LOCK
> > prefix) or it doesn't (most RISC LL/SC). If it does,
> > smp_mb__{before,after}_atomic() are a NO-OP and the ordering is
> > unconditional; if it does not, then smp_mb__{before,after}_atomic() are
> > unconditional barriers.
> 
> And _relaxed() differs from "normal" cmpxchg only for LL/SC architectures,
> correct?

Indeed.

> Therefore smp_mb__{before,after}_atomic() may be combined with
> cmpxchg_relaxed, to form a full memory barrier, on all archs.

Just so.

> > > - Is it ok that wake_up_q just writes wake_q->next, shouldn't
> > >    smp_store_acquire() be used? I.e.: guarantee that wake_up_process()
> > >    happens after cmpxchg_relaxed(), assuming that a failed cmpxchg_relaxed
> > >    provides any ordering.
> > There is no such thing as store_acquire, it is either load_acquire or
> > store_release. But just like how we can write load-acquire as
> > load+smp_mb(), so too I suppose we could write store-acquire as
> > store+smp_mb(), and that is exactly what is there (through the implied
> > barrier of wake_up_process()).
> 
> Thanks for confirming my assumption:
> The code is correct, due to the implied barrier inside wake_up_process().

It has a comment there, trying to state this.

task->wake_q.next = NULL;

/*
 * wake_up_process() executes a full barrier, which pairs with
 * the queueing in wake_q_add() so as not to miss wakeups.
 */
wake_up_process(task);


Re: wake_q memory ordering

2019-10-10 Thread Manfred Spraul

Hi Peter,

On 10/10/19 1:42 PM, Peter Zijlstra wrote:

On Thu, Oct 10, 2019 at 12:41:11PM +0200, Manfred Spraul wrote:

Hi,

Waiman Long noticed that the memory barriers in sem_lock() are not really
documented, and while adding documentation, I ended up with one case where
I'm not certain about the wake_q code:

Questions:
- Does smp_mb__before_atomic() + a (failed) cmpxchg_relaxed provide an
   ordering guarantee?

Yep. Either the atomic instruction implies ordering (e.g. x86 LOCK
prefix) or it doesn't (most RISC LL/SC). If it does,
smp_mb__{before,after}_atomic() are a NO-OP and the ordering is
unconditional; if it does not, then smp_mb__{before,after}_atomic() are
unconditional barriers.


And _relaxed() differs from "normal" cmpxchg only for LL/SC 
architectures, correct?


Therefore smp_mb__{before,after}_atomic() may be combined with 
cmpxchg_relaxed, to form a full memory barrier, on all archs.


[...]



- Is it ok that wake_up_q just writes wake_q->next, shouldn't
   smp_store_acquire() be used? I.e.: guarantee that wake_up_process()
   happens after cmpxchg_relaxed(), assuming that a failed cmpxchg_relaxed
   provides any ordering.

There is no such thing as store_acquire, it is either load_acquire or
store_release. But just like how we can write load-acquire as
load+smp_mb(), so too I suppose we could write store-acquire as
store+smp_mb(), and that is exactly what is there (through the implied
barrier of wake_up_process()).


Thanks for confirming my assumption:
The code is correct, due to the implied barrier inside wake_up_process().

[...]

rewritten:

start condition: A = 1; B = 0;

CPU1:
     B = 1;
     RELEASE, unlock LockX;

CPU2:
     lock LockX, ACQUIRE
     if (LOAD A == 1) return; /* using cmpxchg_relaxed */

CPU2:
     A = 0;
     ACQUIRE, lock LockY
     smp_mb__after_spinlock();
     READ B

Question: is A = 1, B = 0 possible?

Your example is incomplete (there is no A=1 assignment for example), but
I'm thinking I can guess where that should go given the earlier text.


A=1 is listed as the start condition. Way before, someone did wake_q_add().



I don't think this is broken.


Thanks.

--

    Manfred



Re: wake_q memory ordering

2019-10-10 Thread Peter Zijlstra
On Thu, Oct 10, 2019 at 12:41:11PM +0200, Manfred Spraul wrote:
> Hi,
> 
> Waiman Long noticed that the memory barriers in sem_lock() are not really
> documented, and while adding documentation, I ended up with one case where
> I'm not certain about the wake_q code:
> 
> Questions:
> - Does smp_mb__before_atomic() + a (failed) cmpxchg_relaxed provide an
>   ordering guarantee?

Yep. Either the atomic instruction implies ordering (e.g. x86 LOCK
prefix) or it doesn't (most RISC LL/SC). If it does,
smp_mb__{before,after}_atomic() are a NO-OP and the ordering is
unconditional; if it does not, then smp_mb__{before,after}_atomic() are
unconditional barriers.

IOW, the only way to get a cmpxchg without barriers on failure is with
LL/SC, and in that case smp_mb__{before,after}_atomic() are
unconditional.

For instance, the way ARM64 does cmpxchg() is:

cmpxchg(p, o, n)
        do {
                v = LL(p);
                if (v != o)
                        return v;
        } while (!SC_RELEASE(p, n))
        smp_mb();
        return v;

And you'll note how on success the store-release constrains all prior
memory operations, and the smp_mb() constrains all later memops. But on
failure there's not a barrier to be found.
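
Putting that together with the wake_q case, a minimal sketch (a
paraphrase for illustration, not the exact kernel source) of why the
pattern is ordered even on the LL/SC failure path:

	static bool wake_q_add_sketch(struct wake_q_node *node)
	{
		/* Real barrier on LL/SC archs; NO-OP where the RMW
		 * already implies ordering (e.g. x86 LOCK prefix). */
		smp_mb__before_atomic();

		/* A failed CAS carries no barrier of its own, but the
		 * explicit barrier above has already ordered all
		 * prior memory accesses. */
		if (cmpxchg_relaxed(&node->next, NULL, WAKE_Q_TAIL))
			return false;	/* already queued */

		return true;
	}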

> - Is it ok that wake_up_q just writes wake_q->next, shouldn't
>   smp_store_acquire() be used? I.e.: guarantee that wake_up_process()
>   happens after cmpxchg_relaxed(), assuming that a failed cmpxchg_relaxed
>   provides any ordering.

There is no such thing as store_acquire, it is either load_acquire or
store_release. But just like how we can write load-acquire as
load+smp_mb(), so too I suppose we could write store-acquire as
store+smp_mb(), and that is exactly what is there (through the implied
barrier of wake_up_process()).

(arguably it should've been WRITE_ONCE() I suppose)
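
A sketch of that aside (assuming the obvious form; this is not an
actual posted patch):

	/* WRITE_ONCE() would keep the compiler from tearing or fusing
	 * the store; the ordering still comes from the full barrier
	 * implied by wake_up_process(). */
	WRITE_ONCE(task->wake_q.next, NULL);
	wake_up_process(task);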

> 
> Example:
> - CPU2 never touches lock a. It is just an unrelated wake_q user that also
>   wants to wake up task 1234.
> - I've noticed already that smp_store_acquire() doesn't exist.
>   So smp_store_mb() is required. But from a semantic point of view, we would
>   need an ACQUIRE: the wake_up_process() must happen after cmpxchg().
> - May wake_up_q() rely on the spinlocks/memory barriers in try_to_wake_up,
>   or should the function be safe by itself?
> 
> CPU1: /current=1234, inside do_semtimedop()/
>     g_wakee = current;
>     current->state = TASK_INTERRUPTIBLE;
>     spin_unlock(a);
> 
> CPU2: / arbitrary kernel thread that uses wake_q /
>     wake_q_add(&wake_q, 1234);
>     wake_up_q(&wake_q);
>     <...ongoing>
> 
> CPU3: / do_semtimedop() + wake_up_sem_queue_prepare() /
>     spin_lock(a);
>     wake_q_add(&wake_q, g_wakee);
>     < within wake_q_add() >:
>         smp_mb__before_atomic();
>         if (unlikely(cmpxchg_relaxed(&node->next, NULL, WAKE_Q_TAIL)))
>             return false; /* -> this happens */
> 
> CPU2:
>     
>     1234->wake_q.next = NULL; < Ok? Is store_acquire() missing? >

/* smp_mb(); implied by the following wake_up_process() */

>     wake_up_process(1234);
>     < within wake_up_process/try_to_wake_up():
>         raw_spin_lock_irqsave()
>         smp_mb__after_spinlock()
>         if (1234->state == TASK_RUNNING) return;
>     >
> 
> 
> rewritten:
> 
> start condition: A = 1; B = 0;
> 
> CPU1:
>     B = 1;
>     RELEASE, unlock LockX;
> 
> CPU2:
>     lock LockX, ACQUIRE
>     if (LOAD A == 1) return; /* using cmpxchg_relaxed */
> 
> CPU2:
>     A = 0;
>     ACQUIRE, lock LockY
>     smp_mb__after_spinlock();
>     READ B
> 
> Question: is A = 1, B = 0 possible?

Your example is incomplete (there is no A=1 assignment for example), but
I'm thinking I can guess where that should go given the earlier text.

I don't think this is broken.



wake_q memory ordering

2019-10-10 Thread Manfred Spraul

Hi,

Waiman Long noticed that the memory barriers in sem_lock() are not 
really documented, and while adding documentation, I ended up with one 
case where I'm not certain about the wake_q code:


Questions:
- Does smp_mb__before_atomic() + a (failed) cmpxchg_relaxed provide an
  ordering guarantee?
- Is it ok that wake_up_q just writes wake_q->next, shouldn't
  smp_store_acquire() be used? I.e.: guarantee that wake_up_process()
  happens after cmpxchg_relaxed(), assuming that a failed cmpxchg_relaxed
  provides any ordering.

Example:
- CPU2 never touches lock a. It is just an unrelated wake_q user that also
  wants to wake up task 1234.
- I've noticed already that smp_store_acquire() doesn't exist.
  So smp_store_mb() is required. But from a semantic point of view, we
  would need an ACQUIRE: the wake_up_process() must happen after cmpxchg().
- May wake_up_q() rely on the spinlocks/memory barriers in try_to_wake_up,
  or should the function be safe by itself?

CPU1: /current=1234, inside do_semtimedop()/
    g_wakee = current;
    current->state = TASK_INTERRUPTIBLE;
    spin_unlock(a);

CPU2: / arbitrary kernel thread that uses wake_q /
    wake_q_add(&wake_q, 1234);
    wake_up_q(&wake_q);
    <...ongoing>

CPU3: / do_semtimedop() + wake_up_sem_queue_prepare() /
    spin_lock(a);
    wake_q_add(&wake_q, g_wakee);
    < within wake_q_add() >:
        smp_mb__before_atomic();
        if (unlikely(cmpxchg_relaxed(&node->next, NULL, WAKE_Q_TAIL)))
            return false; /* -> this happens */

CPU2:
    
    1234->wake_q.next = NULL; < Ok? Is store_acquire() missing? >
    wake_up_process(1234);
    < within wake_up_process/try_to_wake_up():
        raw_spin_lock_irqsave()
        smp_mb__after_spinlock()
        if (1234->state == TASK_RUNNING) return;
    >


rewritten:

start condition: A = 1; B = 0;

CPU1:
    B = 1;
    RELEASE, unlock LockX;

CPU2:
    lock LockX, ACQUIRE
    if (LOAD A == 1) return; /* using cmpxchg_relaxed */

CPU2:
    A = 0;
    ACQUIRE, lock LockY
    smp_mb__after_spinlock();
    READ B

Question: is A = 1, B = 0 possible?

--

    Manfred