On 01/27/2016 03:54 PM, Peter Zijlstra wrote:
On Wed, Jan 27, 2016 at 03:22:19PM -0500, Waiman Long wrote:

+       /*
+        * Put itself into the list_batch queue
+        */
+       node.next  = NULL;
+       node.entry = entry;
+       node.cmd   = cmd;
+       node.state = lb_state_waiting;
+
Here we rely on the release barrier implied by xchg() to ensure the node
initialization is complete before the xchg() publishes the thing.

But do we also need the acquire part of this barrier? From what I could
tell, the primitive as a whole does not imply any ordering.
I think we probably won't need the acquire part, but I don't have a non-x86
machine that can really test out the more relaxed versions of the atomic
ops. That is why I use the strict versions. We can always relax it later on
with additional patches.
Yeah, I have no hardware either. But at least we should comment the bits
we do know to rely upon.


Using xchg_release() looks OK to me. As this feature is enabled on x86 only for this patch, we can make the change and whoever enabling it for other architectures that have a real release function will have to test it.

+       if (!next) {
+               /*
+                * The queue tail should equal to nptr, so clear it to
+                * mark the queue as empty.
+                */
+               if (cmpxchg(&batch->tail, nptr, NULL) != nptr) {
+                       /*
+                        * Queue not empty, wait until the next pointer is
+                        * initialized.
+                        */
+                       while (!(next = READ_ONCE(nptr->next)))
+                               cpu_relax();
+               }
+               /* The above cmpxchg acts as a memory barrier */
for what? :-)

Also, if that cmpxchg() fails, it very much does _not_ act as one.

I suspect you want smp_store_release() setting the state_done, just as
above, and then use cmpxchg_relaxed().
You are right. I did forgot about there was no memory barrier guarantee when
cmpxchg() fails.
However, in that case, the READ_ONCE() and WRITE_ONCE()
macros should still provide the necessary ordering, IMO.
READ/WRITE_ONCE() provide _no_ order what so ever. And the issue here is
that we must not do any other stores to nptr after the state_done.


I thought if those macros are accessing the same cacheline, the compiler won't change the ordering and the hardware will take care of the proper ordering. Anyway, I do intended to change to use smp_store_release() for safety.

I can certainly
change it to use cmpxchg_relaxed() and smp_store_release() instead.
That seems a safe combination and would still generate the exact same
code on x86.

Cheers,
Longman

Reply via email to