On 02/09/2016 16:33, Pranith Kumar wrote:
>
> Hi Paolo,
>
> This is in reference to the discussion we had yesterday on IRC. I am trying to
> understand the need for smp_read_barrier_depends() and how it prevents the
> following race condition. I think a regular barrier() should suffice instead
> of smp_read_barrier_depends(). Consider:
>
>            P0                          P1
> ----------------------------------------
> bh = ctx->first_bh;
> smp_read_barrier_depends(); // barrier() should be sufficient since bh
>                             // is local variable
> next = bh->next;
>                               lock(bh_lock);
>                               new_bh->next = ctx->first_bh;
>                               smp_wmb();
>                               ctx->first_bh = new_bh;
>                               unlock(bh_lock);
>
> if (bh) {
>    // do something
> }
>
> Why do you think smp_read_barrier_depends() is necessary here? If bh was a
> shared variable I would understand, but here bh is local and a regular
> barrier() would make sure that we are not optimizing the initial load into bh.

Honestly, I don't think you understand why memory barriers exist...
They are used to synchronize writes to shared *data*, not to shared
variables. It doesn't matter whether bh is a shared variable. The
*data that it points to* is shared with other threads.

ctx->first_bh and bh->next are both shared by P0 and P1.

P1 must make sure that ctx->first_bh is written after all of its context
(which in aio_bh_new includes new_bh->next) is ready. It uses smp_wmb
for that. A "release store" for ctx->first_bh would be okay too. This
is easy.

P0 must make sure that bh->next is read after ctx->first_bh. The
simplest way to ensure this is an "acquire load" for ctx->first_bh, or
an smp_rmb between the loads of ctx->first_bh and bh->next, where there
is currently smp_read_barrier_depends(). This is easy too, but a bit
overkill because bh->next is really ctx->first_bh->next, and
data-dependent reads do not need full-blown acquire semantics.

However, you still need to make sure that bh->next is read from
_exactly_ the ctx->first_bh that was assigned to bh, and not for example
a value that was changed in the meanwhile by another processor. Most
processors promise this (except the Alpha!) but compilers might reload
values if they think it's useful.

For this reason Linux and QEMU have smp_read_barrier_depends(), and for
this reason C11/C++11 introduce the "consume" memory order.
smp_read_barrier_depends() is the same as C11's
atomic_thread_fence(memory_order_consume). We didn't make it up.

So instead of smp_read_barrier_depends() you could load ctx->first_bh
with the consume memory order, so that the dependent read of bh->next is
ordered after it, but you do need _something_.

Paolo
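
For illustration only, the alternative described above (a release store
on the writer side, a consume load on the reader side) might look
roughly like the following C11 sketch. This is not the QEMU code:
struct bh, struct ctx, bh_push() and bh_sum() are hypothetical
stand-ins, and the payload field exists only to give the reader
something to do. Writers are assumed to be serialized by the caller,
which is what bh_lock does in the example being discussed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real bottom-half and context types. */
struct bh {
    struct bh *next;
    int payload;
};

struct ctx {
    _Atomic(struct bh *) first_bh;
};

/* P1: publish a new node at the head of the list.
 * Assumption: the caller serializes writers (bh_lock in the example). */
static void bh_push(struct ctx *ctx, struct bh *new_bh)
{
    assert(new_bh != NULL);
    /* Prepare the node first; only writers modify first_bh, so a
     * relaxed load is enough here. */
    new_bh->next = atomic_load_explicit(&ctx->first_bh,
                                        memory_order_relaxed);
    /* Release store: takes the place of smp_wmb() followed by a plain
     * store, so new_bh->next is visible before the node is reachable. */
    atomic_store_explicit(&ctx->first_bh, new_bh, memory_order_release);
}

/* P0: walk the list and add up the payloads. */
static int bh_sum(struct ctx *ctx)
{
    int sum = 0;
    /* Consume load: takes the place of a plain load followed by
     * smp_read_barrier_depends(). Reads through bh depend on it. */
    struct bh *bh = atomic_load_explicit(&ctx->first_bh,
                                         memory_order_consume);
    while (bh) {
        sum += bh->payload;
        bh = bh->next;   /* data-dependent on the consume load */
    }
    return sum;
}
```

Current compilers generally implement memory_order_consume by promoting
it to memory_order_acquire, so in practice the reader above gets the
stronger ordering.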