Linus Torvalds <torva...@linux-foundation.org> wrote: > > It mandates using smp_store_release() to update buffer->head in the producer > > and buffer->tail in the consumer - but these need pairing with memory > > barriers > > used when reading buffer->head and buffer->tail on the other side. > > No, the rule with smp_store_release() should be that it's paired with > "smp_load_acquire()".
Yes. > No other barriers needed. See below. > If you do that > > thread #1 thread #2 > > ... add data to queue .. > smp_store_release(x) > > smp_load_acquire(x) > ... read data from queue .. > ... > But yes, store_release(x) should always pair with the load_acquire(x), > and the guarantee is that if the load_acquire reads the value that the > store_release stored, then all subsequent reads in thread #2 will see > all preceding writes in thread #1. I agree with this bit, but it's only half the picture. > then you should need no other barriers. But I don't agree with this. You're missing half the barriers. There should be *four* barriers. The document mandates only 3 barriers, and uses READ_ONCE() where the fourth should be, i.e.: thread #1 thread #2 smp_load_acquire(head) ... read data from queue .. smp_store_release(tail) READ_ONCE(tail) ... add data to queue .. smp_store_release(head) but I think that READ_ONCE() should instead be smp_load_acquire() so that the read of the data is ordered before it gets overwritten by the writer, ordering with respect to accesses on tail (it now implies smp_read_barrier_depends() which I think is sufficient). Another way of looking at it is that smp_store_release(tail) needs pairing with something, so it should be: thread #1 thread #2 smp_load_acquire(head) ... read data from queue .. smp_store_release(tail) smp_load_acquire(tail) ... add data to queue .. smp_store_release(head) Take your example, reorder the threads and add the missing indices accesses: thread #1 thread #2 smp_load_acquire(x) ... read data from queue .. tail++ read tail; ... add data to queue .. smp_store_release(x) Is there anything to stop the cpus doing out-of-order loads/stores such that the data read in thread 2 doesn't come after the update of tail? If that can happen, the data being written by thread 1 may be being read by thread 2 when in fact, it shouldn't see it yet, e.g.: LOAD head ACQUIRE_BARRIER LOAD tail LOAD head STORE tail++ LOAD tail STORE data[head] LOAD data[tail] RELEASE_BARRIER STORE head++ I would really like Paul and Will to check me on this. -~- > HOWEVER. > > I think this is all entirely pointless wrt the pipe buffer use. You > don't have a simple queue. You have multiple writers, and you have > multiple readers. As a result, you need real locking. Yes, and I don't deny that. Between readers and readers you do; between writers and writers you do. But barriers are about the coupling between the active reader and the active writer - and even with locking, you *still* need the barriers, it's just that there are barriers implicit in the locking primitives - so they're there, just hidden. > So don't do the barriers. If you do the barriers, you're almost > certainly doing something buggy. You don't have the simple "consumer > -> producer" thing. Or rather, you don't _only_ have that thing. I think what you're thinking of still has a problem. Could you give a simple 'trace' of what it is you're thinking of, particularly in the reader, so that I can see. I think that the following points need to be considered: (1) Accessing the ring requires *four* barriers, two read-side and two write-side. (2) pipe_lock() and pipe_unlock() currently provide the barrier and the code places them very wide so that they encompass the index access, the data access and the index update, and order them with respect to pipe->mutex: pipe_lock(); // barrier #1 ... get curbuf and nbufs... ... write data ... ... nbufs++ ... pipe_unlock(); // barrier #2 pipe_lock(); // barrier #3 ... get curbuf and nbufs... ... read data ... ... curbuf++; nbufs--; ... pipe_unlock(); // barrier #4 The barriers pair up #2->#3 and #4->#1. (3) Multiple readers are serialised against each other by pipe_lock(), and will continue to be so. (4) Multiple userspace writers are serialised against each other by pipe_lock(), and will continue to be so. (5) Readers and userspace writers are serialised against each other by pipe_lock(). This could be changed at some point in the future. (6) pipe_lock() and pipe_unlock() cannot be used by post_notification() as it needs to be able to insert a message from softirq context or with a spinlock held. (7) It might be possible to use pipe->wait's spinlock to guard access to the ring indices since we might be taking that lock anyway to poke the other side. (8) We don't want to call copy_page_to/from_iter() with spinlock held. As we discussed at Plumbers, we could use pipe->wait's spinlock in the from-userspace writer to do something like: pipe_lock(); ... spin_lock_irq(&pipe->wait->lock); // barrier #1 ... get indices... ... advance index ... spin_unlock_irq(&pipe->wait->lock); ... write data ... pipe_unlock(); // barrier #2 ie. allocate the slot inside the spinlock, then write it whilst holding off the reader with the pipe lock. Yes, that would work as pipe_unlock() acts as the closing barrier. But this doesn't work for post_notification(). That would have to do: spin_lock_irq(&pipe->wait->lock); // barrier #1 ... get indices... ... write data ... ... advance index ... spin_unlock_irq(&pipe->wait->lock); // barrier #2 which is fine, even if it occurs within the from-userspace writer as the latter is holding the pipe lock. However, consider the reader. If you're thinking of: pipe_lock(); // barrier #3? ... ... get indices... ... read data ... spin_lock_irq(&pipe->wait->lock); ... advance index ... spin_unlock_irq(&pipe->wait->lock); // barrier #4 pipe_unlock(); that works against the from-userspace writer, but barrier #3 is in the wrong place against post_notification(). We could do this: pipe_lock(); ... spin_lock_irq(&pipe->wait->lock); // barrier #3 ... get indices... ... read data ... ... advance index ... spin_unlock_irq(&pipe->wait->lock); // barrier #4 pipe_unlock(); which does indeed protect the reader against post_notification(), but it means that the data copy happens with the lock held. An alternative could be: pipe_lock(); ... spin_lock_irq(&pipe->wait->lock); // barrier #3 ... get indices... spin_unlock_irq(&pipe->wait->lock); ... read data ... spin_lock_irq(&pipe->wait->lock); ... advance index ... spin_unlock_irq(&pipe->wait->lock); // barrier #4 pipe_unlock(); but that would take the spinlock twice. There shouldn't need to be *any* common lock between the reader and the writer[*] - not even pipe_lock() or pipe->wait, both of which could be split. [*] Apart from the minor fact that, currently, the writer can violate normal ring-buffer semantics and go back and continue filling something it's already committed. David