Stephan Diestelhorst writes:
>
> On Tuesday, 25 November 2014, 11:15:36, Hans Boehm wrote:
> > I'm no hardware architect, but fundamentally it seems to me that
> >
> >   load x
> >   acquire_fence
> >
> > imposes a much more stringent constraint than
> >
> >   load_acquire x
> >
> > Consider the case in which the load from x is an L1 hit, but a preceding
> > load (from say y) is a long-latency miss. If we enforce ordering by just
> > waiting for completion of prior operation, the former has to wait for the
> > load from y to complete; while the latter doesn't. I find it hard to
> > believe that this doesn't leave an appreciable amount of performance on the
> > table, at least for some interesting microarchitectures.
>
> I agree, Hans, that this is a reasonable assumption. Load_acquire x
> does allow roach motel, whereas the acquire fence does not.
>
> > In addition, for better or worse, fencing requirements on at least
> > Power are actually driven as much by store atomicity issues, as by
> > the ordering issues discussed in the cookbook. This was not
> > understood in 2005, and unfortunately doesn't seem to be amenable to
> > the kind of straightforward explanation as in Doug's cookbook.
>
> Coming from a strongly ordered architecture to a weakly ordered one
> myself, I also needed some mental adjustment about store (multi-copy)
> atomicity. I can imagine others will be unaware of this difference,
> too, even in 2014.

Sorry I'm missing the connection between fences and multi-copy atomicity.

David

> Stephan
>
> _______________________________________________
> Concurrency-interest mailing list
> concurrency-inter...@cs.oswego.edu
> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
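
For concreteness, the two idioms Hans contrasts ("load x; acquire_fence" versus "load_acquire x") might be sketched in Java roughly as follows. This is only an illustration under assumptions: it uses the VarHandle API from Java 9, which postdates this thread; the class AcquireDemo and its fields x, y and data are hypothetical names; and mapping "load x" to getOpaque is my own choice, not something stated in the discussion. The code shows where the ordering constraint sits; it does not measure the performance difference being debated.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical demo class; all names here are illustrative assumptions.
final class AcquireDemo {
    int x;     // flag, written with release semantics by the publisher
    int y;     // unrelated location (stands in for the long-latency miss)
    int data;  // payload guarded by the flag x

    private static final VarHandle X, Y;
    static {
        try {
            MethodHandles.Lookup l = MethodHandles.lookup();
            X = l.findVarHandle(AcquireDemo.class, "x", int.class);
            Y = l.findVarHandle(AcquireDemo.class, "y", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Writer side: store the payload, then publish it via a release store to x.
    void publish(int value) {
        data = value;
        X.setRelease(this, 1);
    }

    // "load x; acquire_fence": the fence orders ALL earlier loads,
    // including the unrelated load of y, before any later access.
    int readWithFence() {
        int unrelated = (int) Y.getOpaque(this);
        int flag = (int) X.getOpaque(this);
        VarHandle.acquireFence();
        return flag == 1 ? data + unrelated : -1;
    }

    // "load_acquire x": only the load of x is ordered before later accesses;
    // the earlier load of y is not constrained by it.
    int readWithLoadAcquire() {
        int unrelated = (int) Y.getOpaque(this);
        int flag = (int) X.getAcquire(this);
        return flag == 1 ? data + unrelated : -1;
    }
}
```

Both readers return -1 until publish has been observed and the published value afterwards; the difference is only in how much reordering freedom each form leaves to the compiler and hardware around the load of y.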