On Tuesday, 25 November 2014, 11:15:36, Hans Boehm wrote:
> I'm no hardware architect, but fundamentally it seems to me that
>
> load x
> acquire_fence
>
> imposes a much more stringent constraint than
>
> load_acquire x
>
> Consider the case in which the load from x is an L1 hit, but a preceding
> load (from say y) is a long-latency miss. If we enforce ordering by just
> waiting for completion of prior operation, the former has to wait for the
> load from y to complete; while the latter doesn't. I find it hard to
> believe that this doesn't leave an appreciable amount of performance on the
> table, at least for some interesting microarchitectures.
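To make the two idioms concrete, here is a minimal C++11 sketch, assuming std::atomic as the vehicle. The variables x, y, z and both function names are hypothetical illustrations, not anything from the thread. The point is the scope of the acquire: a standalone acquire fence gives acquire semantics to every relaxed load sequenced before it (here both y and x), while an acquire load constrains only the load of x itself.

```cpp
#include <atomic>

// Hypothetical shared variables; x and y follow the names in Hans's example,
// z stands for "some later access".
std::atomic<int> x{0};
std::atomic<int> y{0};
std::atomic<int> z{0};

// "load x; acquire_fence": the fence gives acquire semantics to BOTH
// preceding relaxed loads (y and x), so the later load of z is ordered
// after the possibly slow load of y as well.
int fence_version() {
    int ry = y.load(std::memory_order_relaxed);
    int rx = x.load(std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_acquire);
    int rz = z.load(std::memory_order_relaxed);
    return ry + rx + rz;
}

// "load_acquire x": only the load of x is ordered before the load of z.
// The earlier relaxed load of y may still be reordered after it
// (roach-motel style), so z need not wait for y.
int acquire_load_version() {
    int ry = y.load(std::memory_order_relaxed);
    int rx = x.load(std::memory_order_acquire);
    int rz = z.load(std::memory_order_relaxed);
    return ry + rx + rz;
}
```

Both functions compute the same values in a single thread; they differ only in which cross-thread orderings they guarantee, which is exactly the extra latency Hans is worried about.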
I agree, Hans, that this is a reasonable assumption. The load_acquire x form still allows roach-motel reordering (earlier accesses may move down past the acquire), whereas the acquire fence does not.

> In addition, for better or worse, fencing requirements on at least
> Power are actually driven as much by store atomicity issues, as by
> the ordering issues discussed in the cookbook. This was not
> understood in 2005, and unfortunately doesn't seem to be amenable to
> the kind of straightforward explanation as in Doug's cookbook.

Having moved from a strongly ordered architecture to a weakly ordered one myself, I also needed some mental adjustment regarding store (multi-copy) atomicity. I imagine others are unaware of this difference too, even in 2014.

Stephan
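For readers meeting store (multi-copy) atomicity for the first time, the classic illustration is the IRIW ("independent reads of independent writes") litmus test. Below is a minimal C++11 sketch; IriwResult, run_iriw_once and readers_disagree are hypothetical names. With every access seq_cst, as written, C++ forbids the two readers from disagreeing about the order of the two independent writes. If the stores are weakened to release and the loads to acquire, C++ permits the disagreement, and hardware that is not multi-copy atomic, such as Power, can actually produce it.

```cpp
#include <atomic>
#include <thread>

// Values observed by the two reader threads in one IRIW run.
struct IriwResult {
    int r1, r2;  // reader 1: x then y
    int r3, r4;  // reader 2: y then x
};

IriwResult run_iriw_once() {
    std::atomic<int> x{0}, y{0};
    int r1 = 0, r2 = 0, r3 = 0, r4 = 0;

    // Two independent writers.
    std::thread w1([&] { x.store(1, std::memory_order_seq_cst); });
    std::thread w2([&] { y.store(1, std::memory_order_seq_cst); });

    // Two readers that read the variables in opposite orders.
    std::thread rd1([&] {
        r1 = x.load(std::memory_order_seq_cst);
        r2 = y.load(std::memory_order_seq_cst);
    });
    std::thread rd2([&] {
        r3 = y.load(std::memory_order_seq_cst);
        r4 = x.load(std::memory_order_seq_cst);
    });

    // join() makes the plain ints r1..r4 safely visible here.
    w1.join();
    w2.join();
    rd1.join();
    rd2.join();
    return {r1, r2, r3, r4};
}

// Reader 1 saw x's write before y's; reader 2 saw y's write before x's.
// Under seq_cst this outcome is forbidden.
bool readers_disagree(const IriwResult& r) {
    return r.r1 == 1 && r.r2 == 0 && r.r3 == 1 && r.r4 == 0;
}
```

A thread-spawning harness like this rarely hits the critical window, so a run that never shows the disagreement proves little on its own. It is meant only to show which outcome store atomicity is about.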