On Wed, Jan 27, 2016 at 03:22:19PM -0500, Waiman Long wrote: > >>+ /* > >>+ * Put itself into the list_batch queue > >>+ */ > >>+ node.next = NULL; > >>+ node.entry = entry; > >>+ node.cmd = cmd; > >>+ node.state = lb_state_waiting; > >>+ > >Here we rely on the release barrier implied by xchg() to ensure the node > >initialization is complete before the xchg() publishes the thing. > > > >But do we also need the acquire part of this barrier? From what I could > >tell, the primitive as a whole does not imply any ordering. > > I think we probably won't need the acquire part, but I don't have a non-x86 > machine that can really test out the more relaxed versions of the atomic > ops. That is why I use the strict versions. We can always relax it later on > with additional patches.
Yeah, I have no hardware either. But at least we should comment the bits we do know to rely upon. > >>+ if (!next) { > >>+ /* > >>+ * The queue tail should equal to nptr, so clear it to > >>+ * mark the queue as empty. > >>+ */ > >>+ if (cmpxchg(&batch->tail, nptr, NULL) != nptr) { > >>+ /* > >>+ * Queue not empty, wait until the next pointer is > >>+ * initialized. > >>+ */ > >>+ while (!(next = READ_ONCE(nptr->next))) > >>+ cpu_relax(); > >>+ } > >>+ /* The above cmpxchg acts as a memory barrier */ > >for what? :-) > > > >Also, if that cmpxchg() fails, it very much does _not_ act as one. > > > >I suspect you want smp_store_release() setting the state_done, just as > >above, and then use cmpxchg_relaxed(). > > You are right. I did forgot about there was no memory barrier guarantee when > cmpxchg() fails. > However, in that case, the READ_ONCE() and WRITE_ONCE() > macros should still provide the necessary ordering, IMO. READ/WRITE_ONCE() provide _no_ order what so ever. And the issue here is that we must not do any other stores to nptr after the state_done. > I can certainly > change it to use cmpxchg_relaxed() and smp_store_release() instead. That seems a safe combination and would still generate the exact same code on x86.