On Tue, 2015-12-15 at 14:41 -0800, Davidlohr Bueso wrote:
> On Tue, 15 Dec 2015, Michael Kerrisk (man-pages) wrote:
>
> > When executing a futex operation that requests to block a thread,
> > the kernel will block only if the futex word has the value that
> > the calling thread supplied (as one of the arguments of the
> > futex() call) as the expected value of the futex word. The
> > loading of the futex word's value, the comparison of that value with
> > the expected value, and the actual blocking will happen atomically
> >
> >FIXME: for next line, it would be good to have an explanation of
> >"totally ordered" somewhere around here.
> >
> > and totally ordered with respect to concurrently executing
> > futex operations on the same futex word.
>
> So there are two things here regarding ordering. One is the most obvious
> which is ordered due to the taking/dropping the hb spinlock.
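To make the "ordered by the hb spinlock" point concrete, here is a userspace model (not kernel code) of the conceptual critical section of FUTEX_WAIT. It assumes a pthread mutex standing in for the hash-bucket spinlock and a condition variable standing in for the bucket's wait queue; `model_bucket`, `hb`, `model_futex_wait` and `model_futex_wake` are hypothetical names. The point it illustrates is that the load of the futex word can be `memory_order_relaxed`, because the lock already makes load, compare and enqueue atomic with respect to other operations on the same bucket.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical model of one futex hash bucket: the mutex plays the role of
 * the kernel's hb spinlock, the condvar the role of the bucket's wait queue. */
struct model_bucket {
    pthread_mutex_t lock;
    pthread_cond_t  queue;
    unsigned long   wake_seq;   /* bumped by every model wake */
};

static struct model_bucket hb = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0
};

/* Model of FUTEX_WAIT: the load, the comparison and the enqueue all happen
 * while holding hb.lock, so they are atomic with respect to other model
 * operations on this bucket. The load itself only needs to be relaxed. */
static int model_futex_wait(atomic_int *uaddr, int expected)
{
    pthread_mutex_lock(&hb.lock);
    if (atomic_load_explicit(uaddr, memory_order_relaxed) != expected) {
        pthread_mutex_unlock(&hb.lock);
        return -EAGAIN;             /* kernel-style error: value changed, do not block */
    }
    unsigned long seq = hb.wake_seq;
    while (hb.wake_seq == seq)      /* ignore spurious condvar wakeups */
        pthread_cond_wait(&hb.queue, &hb.lock);
    pthread_mutex_unlock(&hb.lock);
    return 0;
}

/* Model of FUTEX_WAKE; for simplicity it wakes every current waiter. */
static void model_futex_wake(void)
{
    pthread_mutex_lock(&hb.lock);
    hb.wake_seq++;
    pthread_cond_broadcast(&hb.queue);
    pthread_mutex_unlock(&hb.lock);
}
```

A waker that stores a new value into the word and then calls model_futex_wake() cannot slip in between a waiter's comparison and its enqueue, since both sides take hb.lock; that is the atomicity the manpage text describes.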
I suppose that this means what is described in the manpage already?
That is, that futex operations (ie, the syscalls) are atomic wrt each
other and in a strict total order?

> Secondly, it's
> the cases which Peter brought up a while ago that involves atomic futex ops
> futex_atomic_*(), which do not have clearly defined semantics, and you get
> inconsistencies with certain archs (tile being the worst iirc).

OK. So, from a user's POV, this is about the semantics of the kernel's
accesses to the futex word. I agree that specifying this more clearly
would be helpful.

First, there are the comparisons of the futex words used in, for example,
FUTEX_WAIT. They should use an atomic load within the conceptual critical
sections that make up futex operations. This load itself doesn't need to
establish any ordering, so it can be equivalent to a C11
memory_order_relaxed load. Are there any objections to that?

Second, we have the write accesses in FUTEX_[TRY]LOCK_PI and
FUTEX_UNLOCK_PI. We already specify those as atomic and within the
conceptual critical sections of the futex operation. In addition, they
should establish ordering themselves, so they should have C11
memory_order_acquire / memory_order_release semantics, respectively.
Specifying this would be good. Any objections to these semantics?

Third, we have the atomic read-modify-write operation that is part of
FUTEX_WAKE_OP (ie, AFAIU, the case you pointed at specifically). I don't
have a strong opinion on what it should be, because I think userspace can
enforce the orderings it needs on its own (eg, if I interpret Peter
Zijlstra's example correctly, userspace can add appropriate fences before
the CPU0/futex_unlock and after the CPU2/futex_load calls). FUTEX_WAKE_OP
accesses no other userspace memory location, so there's no ordering
relation to other accesses to userspace memory that userspace cannot
affect.
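For the FUTEX_[TRY]LOCK_PI / FUTEX_UNLOCK_PI point, it may help to look at the userspace side the kernel's writes have to pair with. The futex(2) PI protocol has userspace take an uncontended lock by atomically changing the word from 0 to its TID, and release it by changing TID back to 0. The sketch below assumes exactly that, ignores the FUTEX_WAITERS and FUTEX_OWNER_DIED bits, and uses hypothetical function names. If the lock is sometimes acquired here and sometimes in the kernel (and likewise for release), the kernel's writes need the same acquire/release semantics for the critical sections to be ordered correctly.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical PI-mutex fast paths: the futex word is 0 when free and holds
 * the owner's TID otherwise (waiter/owner-died bits ignored in this sketch).
 * On failure, real code would fall back to FUTEX_LOCK_PI / FUTEX_UNLOCK_PI. */
static bool pi_trylock_fast(atomic_uint *futex_word, unsigned int tid)
{
    unsigned int expected = 0;
    /* Acquire on success: the critical section cannot move before this CAS. */
    return atomic_compare_exchange_strong_explicit(
        futex_word, &expected, tid,
        memory_order_acquire, memory_order_relaxed);
}

static bool pi_unlock_fast(atomic_uint *futex_word, unsigned int tid)
{
    unsigned int expected = tid;
    /* Release: writes made in the critical section become visible to the
     * next owner that acquires the word. Fails if the word is not exactly
     * our TID (e.g. another owner, or extra bits set). */
    return atomic_compare_exchange_strong_explicit(
        futex_word, &expected, 0,
        memory_order_release, memory_order_relaxed);
}
```

The failure ordering is relaxed in both cases because a failed CAS does not enter or leave a critical section; only the successful transitions need to synchronize.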
OTOH, legacy userspace may have assumed strong semantics, so making the
read-modify-write have memory_order_seq_cst semantics is probably a safe
bet. Futex operations typically shouldn't be on the fast paths anyway.

> But anyway, the important thing users need to know about is that the atomic
> futex operation must be totally ordered wrt any other user tasks that are
> trying
> to access that address.

I'm not sure what you mean precisely. One can't order whole futex
operations totally wrt memory accesses by userspace, because they'd need
to synchronize to do that, and thus userspace would have to either hook
into the kernel's synchronization or use HTM or such.

> This is not necessarily the case for kernel ops. Peter
> illustrates this nicely with lock stealing example;
> (see https://lkml.org/lkml/2015/8/26/596).
>
> Internally, I believe we decided that making it fully ordered (as opposed to
> making use of implicit barriers for ACQUIRE/RELEASE), so you'd end up having
> an MB ll/sc MB kind of setup.

OK. So, any objections to documenting that the read-modify-write op in
FUTEX_WAKE_OP has memory_order_seq_cst semantics?
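As a sketch of what "the read-modify-write op in FUTEX_WAKE_OP has memory_order_seq_cst semantics" would mean, here is a C11 model of just that step on uaddr2. It follows the futex(2) description (oldval = *uaddr2; *uaddr2 = oldval op oparg; waiters on uaddr2 are also woken if oldval cmp cmparg holds), but the enums and `model_wake_op_rmw` are hypothetical, and the packing of op/cmp into val3 and FUTEX_OP_OPARG_SHIFT are left out. It is a model of the proposed semantics, not the kernel's futex_atomic_op_inuser().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical enums named after the FUTEX_OP_* and FUTEX_OP_CMP_* values. */
enum model_op  { MODEL_OP_SET, MODEL_OP_ADD, MODEL_OP_OR, MODEL_OP_ANDN, MODEL_OP_XOR };
enum model_cmp { MODEL_CMP_EQ, MODEL_CMP_NE, MODEL_CMP_LT,
                 MODEL_CMP_LE, MODEL_CMP_GT, MODEL_CMP_GE };

/* Model of the RMW step of FUTEX_WAKE_OP with seq_cst ordering. Returns true
 * if the old value satisfies the comparison, i.e. waiters on uaddr2 would be
 * woken as well. */
static bool model_wake_op_rmw(atomic_int *uaddr2, enum model_op op, int oparg,
                              enum model_cmp cmp, int cmparg)
{
    int old;
    switch (op) {
    case MODEL_OP_SET:  old = atomic_exchange_explicit(uaddr2, oparg, memory_order_seq_cst); break;
    case MODEL_OP_ADD:  old = atomic_fetch_add_explicit(uaddr2, oparg, memory_order_seq_cst); break;
    case MODEL_OP_OR:   old = atomic_fetch_or_explicit(uaddr2, oparg, memory_order_seq_cst); break;
    case MODEL_OP_ANDN: old = atomic_fetch_and_explicit(uaddr2, ~oparg, memory_order_seq_cst); break;
    case MODEL_OP_XOR:  old = atomic_fetch_xor_explicit(uaddr2, oparg, memory_order_seq_cst); break;
    default:            return false;
    }
    switch (cmp) {
    case MODEL_CMP_EQ: return old == cmparg;
    case MODEL_CMP_NE: return old != cmparg;
    case MODEL_CMP_LT: return old <  cmparg;
    case MODEL_CMP_LE: return old <= cmparg;
    case MODEL_CMP_GT: return old >  cmparg;
    case MODEL_CMP_GE: return old >= cmparg;
    }
    return false;
}
```

With seq_cst, the RMW behaves like the "MB ll/sc MB" setup mentioned above, so legacy userspace that assumed a full barrier keeps working; code that only needs weaker ordering could instead rely on its own fences around the syscall.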