If we look at using purely store fences and purely load fences in the "initialized flag" example as in this discussion, I think it's worth distinguishing two possible scenarios:

1) We guarantee some form of dependency-based ordering, as most real computer architectures do. This probably invalidates the example from my committee paper that's under discussion here. The problem is, as always, that we don't know how to make this precise at the programming language level. It's the compiler's job to break certain dependencies, like the dependency of the store to x on the load of y in x = 0 * y. Many people are thinking about this problem, both to deal with "out-of-thin-air" issues correctly in various memory models, and to design a version of C++'s memory_order_consume that's more usable. If we had a way to guarantee some well-defined notion of dependency-based ordering, then at least some of the examples here would need to be revisited.

2) We don't guarantee that dependencies imply any sort of ordering. Then I think the weird example under discussion here stands. There is officially nothing to prevent the load of x.a in thread 1 from being reordered with the store to x_init.

But there may actually be better examples as to why the store-store ordering in the initializing thread is not always enough. Consider:

    Thread 1:
      x.a = 1;
      if (x.a != 1) world_is_broken = true;
      StoreStore fence;
      x_init = true;
      ...
      if (world_is_broken) die();

    Thread 2:
      if (x_init) {
        full fence;
        x.a++;
      }

I think there is nothing to prevent the read of x.a in Thread 1 from seeing the incremented value, at least if (1) the compiler promotes world_is_broken to a register, and (2) at the assembly level the store to x_init is not dependent on the load of x.a. (1) seems quite plausible, and (2) seems very reasonable if the architecture has a conditional move instruction or the like. (For Itanium, (2) holds even for the naive compilation.) This is not a particularly likely scenario, but I have no idea how one would concoct programming rules that are guaranteed to prevent this kind of weirdness.
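
As a rough illustration only, the interleaving described above can be checked mechanically with a toy model. The sketch below is a deliberate simplification, not a real memory model: it assumes Thread 1's three memory operations may be reordered in any way the chosen fence does not forbid, that the result then runs against sequentially consistent memory, that x.a++ is a single atomic step, and that Thread 2's full fence simply keeps its operations in program order. All names (store_a, load_a, observed_values, and so on) are hypothetical, and the "release" case is included only for comparison.

```python
from itertools import permutations

# Thread 1's memory operations, in program order (hypothetical names).
T1 = ("store_a", "load_a", "store_init")
# Thread 2 in program order; its full fence is modeled by never reordering these.
T2 = ("load_init", "incr_a")


def t1_orders(fence):
    """Yield the Thread 1 execution orders this toy model allows."""
    for order in permutations(T1):
        pos = {op: i for i, op in enumerate(order)}
        # Accesses to the same location (x.a) stay in program order.
        if pos["store_a"] > pos["load_a"]:
            continue
        # StoreStore: the store to x.a precedes the store to x_init.
        if pos["store_a"] > pos["store_init"]:
            continue
        # A release fence also orders the earlier load before the later store.
        if fence == "release" and pos["load_a"] > pos["store_init"]:
            continue
        yield order


def interleavings(a, b):
    """Yield every merge of tuples a and b that preserves each one's order."""
    if not a:
        yield tuple(b)
        return
    if not b:
        yield tuple(a)
        return
    for rest in interleavings(a[1:], b):
        yield (a[0],) + rest
    for rest in interleavings(a, b[1:]):
        yield (b[0],) + rest


def observed_values(fence):
    """Return the set of values Thread 1's load of x.a can observe."""
    seen = set()
    for order in t1_orders(fence):
        for schedule in interleavings(order, T2):
            mem = {"a": 0, "init": False}
            loaded = None
            saw_init = False
            for op in schedule:
                if op == "store_a":
                    mem["a"] = 1
                elif op == "load_a":
                    loaded = mem["a"]
                elif op == "store_init":
                    mem["init"] = True
                elif op == "load_init":
                    saw_init = mem["init"]
                elif op == "incr_a" and saw_init:
                    mem["a"] += 1  # x.a++ treated as one atomic step
            seen.add(loaded)
    return seen


if __name__ == "__main__":
    for fence in ("storestore", "release"):
        print(fence, sorted(observed_values(fence)))
```

In this model, the "storestore" case lets Thread 1's load observe 2 (the incremented value) as well as 1, because the load may slide past the store to x_init. The "release" case yields only 1, since the load is then ordered before that store.
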

The first two statements of Thread 1 might appear inside an "initialize a" library routine that knows nothing about concurrency.

Hans

On Wed, Dec 17, 2014 at 10:54 AM, Martin Buchholz <marti...@google.com> wrote:

> On Wed, Dec 17, 2014 at 1:28 AM, Peter Levart <peter.lev...@gmail.com> wrote:
> > On 12/17/2014 03:28 AM, David Holmes wrote:
> >>
> >> On 17/12/2014 10:06 AM, Martin Buchholz wrote:
> >> Hans allows for the nonsensical, in my view, possibility that the load of
> >> x.a can happen after the x_init=true store and yet somehow be subject to the
> >> ++ and the ensuing store that has to come before the x_init = true.
> >
> > Perhaps, he is speaking about why it is dangerous to replace BOTH release
> > with just store-store AND acquire with just load-load?
>
> I'm pretty sure he's talking about weakening EITHER.
>
> """Clearly, and unsurprisingly, it is unsafe to replace the
> load_acquire with a version that restricts only load ordering in this
> case. That would allow the store to x in thread 2 to become visible
> before the initialization of x by thread 1 is complete, possibly
> losing the update, or corrupting the state of x during initialization.
>
> More interestingly, it is also generally unsafe to restrict the
> release ordering constraint in thread 1 to only stores."""
>
> (What's "clear and unsurprising" to Hans may not be to the rest of us)
>