I was thinking about my response, and realized there was one major misleading thing in my description. The load reordering I described applies to load instructions in C-- proper, i.e. things that show up in the C-- dup as:
W_ x = I64[...addr...] Reads to IORefs and reads to vectors get compiled inline (as they eventually translate into inline primops), so my admonitions are applicable. However, the story with *foreign primops* (which is how loadLoadBarrier in atomic-primops is defined, how you might imagine defining a custom read function as a primop) is a little different. First, what does a call to an foreign primop compile into? It is *not* inlined, so it will eventually get compiled into a jump (this could be a problem if you're really trying to squeeze out performance!) Second, the optimizer is a bit more conservative when it comes to primop calls (internally referred to as "unsafe foreign calls"); at the moment, the optimizer assumes these foreign calls clobber heap memory, so we *automatically* will not push loads/stores beyond this boundary. (NB: We reserve the right to change this in the future!) This is probably why atomic-primops, as it is written today, seems to work OK, even in the presence of the optimizer. But I also have a hard time believing it gives the speedups you want, due to the current design. (CC'd Ryan Newton, because I would love to be wrong here, and maybe he can correct me on this note.) Cheers, Edward P.S. loadLoadBarrier compiles to a no-op on x86 architectures, but because it's not inlined I think you will still end up with a jump (LLVM might be able to eliminate it). Excerpts from John Lato's message of 2013-12-31 03:01:58 +0800: > Hi Edward, > > Thanks very much for this reply, it answers a lot of questions I'd had. > I'd hoped that ordering would be preserved through C--, but c'est la vie. > Optimizing compilers are ever the bane of concurrent algorithms! > > stg/SMP.h does define a loadLoadBarrier, which is exposed in Ryan Newton's > atomic-primops package. From the docs, I think that's a general read > barrier, and should do what I want. Assuming it works properly, of course. > If I'm lucky it might even be optimized out. > > Thanks, > John > > On Mon, Dec 30, 2013 at 6:04 AM, Edward Z. Yang <ezy...@mit.edu> wrote: > > > Hello John, > > > > Here are some prior discussions (which I will attempt to summarize > > below): > > > > http://www.haskell.org/pipermail/haskell-cafe/2011-May/091878.html > > http://www.haskell.org/pipermail/haskell-prime/2006-April/001237.html > > http://www.haskell.org/pipermail/haskell-prime/2006-March/001079.html > > > > The guarantees that Haskell and GHC give in this area are hand-wavy at > > best; at the moment, I don't think Haskell or GHC have a formal memory > > model—this seems to be an open research problem. (Unfortunately, AFAICT > > all the researchers working on relaxed memory models have their hands > > full with things like C++ :-) > > > > If you want to go ahead and build something that /just/ works for a > > /specific version/ of GHC, you will need to answer this question > > separately for every phase of the compiler. For Core and STG, monads > > will preserve ordering, so there is no trouble. However, for C--, we > > will almost certainly apply optimizations which reorder reads (look at > > CmmSink.hs). To properly support your algorithm, you will have to add > > some new read barrier mach-ops, and teach the optimizer to respect them. > > (This could be fiendishly subtle; it might be better to give C-- a > > memory model first.) These mach-ops would then translate into > > appropriate arch-specific assembly or LLVM instructions, preserving > > the guarantees further. > > > > This is not related to your original question, but the situation is a > > bit better with regards to reordering stores: we have a WriteBarrier > > MachOp, which in principle, prevents store reordering. In practice, we > > don't seem to actually have any C-- optimizations that reorder stores. > > So, at least you can assume these will work OK! > > > > Hope this helps (and is not too inaccurate), > > Edward > > > > Excerpts from John Lato's message of 2013-12-20 09:36:11 +0800: > > > Hello, > > > > > > I'm working on a lock-free algorithm that's meant to be used in a > > > concurrent setting, and I've run into a possible issue. > > > > > > The crux of the matter is that a particular function needs to perform the > > > following: > > > > > > > x <- MVector.read vec ix > > > > position <- readIORef posRef > > > > > > and the algorithm is only safe if these two reads are not reordered (both > > > the vector and IORef are written to by other threads). > > > > > > My concern is, according to standard Haskell semantics this should be > > safe, > > > as IO sequencing should guarantee that the reads happen in-order. Of > > > course this also relies upon the architecture's memory model, but x86 > > also > > > guarantees that reads happen in order. However doubts remain; I do not > > > have confidence that the code generator will handle this properly. In > > > particular, LLVM may freely re-order loads of NotAtomic and Unordered > > > values. > > > > > > The one hope I have is that ghc will preserve IO semantics through the > > > entire pipeline. This seems like it would be necessary for proper > > handling > > > of exceptions, for example. So, can anyone tell me if my worries are > > > unfounded, or if there's any way to ensure the behavior I want? I could > > > change the readIORef to an atomicModifyIORef, which should issue an > > mfence, > > > but that seems a bit heavy-handed as just a read fence would be > > sufficient > > > (although even that seems more than necessary). > > > > > > Thanks, > > > John L. > > _______________________________________________ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users