Awesome summary by Ben! theres also some crazy hijinks needed currently to do compare and swap style stuff
see https://hackage.haskell.org/package/atomic-primops-0.8.2/docs/Data-Atomics.html for some discussion. (i'm not sure what the current state of the world is on CAS tech) On Thu, Nov 29, 2018 at 9:45 AM Ben Gamari <[email protected]> wrote: > Travis Whitaker <[email protected]> writes: > > > Hello GHC Devs, > > > > I'm trying to get my head around ticket #15449 ( > > https://ghc.haskell.org/trac/ghc/ticket/15449). This gist of things is > that > > GHC generates incorrect aarch64 code that causes memory corruption in > > multithreaded programs run on out-of-order machines. User trommler > > discovered that similar issues are present on PowerPC, and indeed ARMv7 > and > > PowerPC support the same types of load/store reorderings. The LLVM code > > emitted by GHC may be incorrect with respect to LLVM's memory model, but > > this isn't a problem on architectures with minimal reordering like x86. > > > Thank you for picking this up! > > > I had initially thought that GHC simply wasn't emitting the appropriate > > LLVM fences; there's an elephant-gun-approach here ( > > https://github.com/TravisWhitaker/ghc/commits/ghc843-wip/T15449) that > > guards each atomic operation with a full barrier. I still believe that > GHC > > is omitting necessary LLVM fences, but this change is insufficient to fix > > the behavior of the test case (which is simply GHC itself compiling a > test > > package with '-jN', N > 1). > > > > It seems there's a long and foggy history of the Cmm memory model. Edward > > Yang discusses this a bit in his post here ( > > > http://blog.ezyang.com/2014/01/so-you-want-to-add-a-new-concurrency-primitive-to-ghc/ > ) > > and issues similar to #15449 have plagued GHC in the past, like #12469 ( > > https://ghc.haskell.org/trac/ghc/ticket/12469). Worryingly, GHC only has > > MO_WriteBarrier, whereas PowerPC and ARMv7 really need read, write, and > > full memory barriers. On ARM an instruction memory barrier might be > > required as well, but I don't know enough about STG/Cmm to say for sure, > > and it'd likely be LLVM's responsibility to emit that anyway. > > > In my opinion GHC's current memory model story is quite unsustainable. As > you point out, we currently have a limited selection of plain barriers > and a few atomic operations, none of which are adequately documented > given the subtlety they involve. > > I think we would at this point be foolish not to take advantage of the > research that has been done in this area by moving to C11-style > acquire/release semantics wherever possible. While following through on > this is not a small task, these semantics are both well defined and > easier to reason about intuitively. We might also be able to benefit > from the wealth of model checking tools that are now available for this > formalism. > > Ulan Degenbaev (CC'd) and I discussed him possibly picking up the task > of moving to C11 atomics a few weeks ago. > > > I'm hoping that someone with more tribal knowledge than I might be able > to > > give me some pointers with regards to the following areas: > > > > > > - Does STG itself have anything like a memory model? My intuition says > > 'definitely not', but given that STG expressions may contain Cmm > operations > > (via StgCmmPrim), there may be STG-to-STG transformations that need > to care > > about the target machine's memory model. > > As far as I know, it has nothing formally, or even informally, defined. > That being said, relatively few of our primops make any guarantees about > their operation in a concurrent setting. The cases that I can think of > include `casArray#`, `atomicModifyIORef#`, `atomic{Read,Write}*Array#` > and the STM operations. > > > - With respect to Cmm, what reorderings does GHC perform? What are the > > relevant parts of the compiler to begin studying? > > As far as I know, very few. We only perform a handful of optimizations > on C--. These include, > > * Sinking of assignments (see compiler/cmm/CmmSink.hs); see > CmmSink.conflicts for what commutation we allow. We notably don't > allow sinking of any memory assignments past calls (including > primops) > > * Simple constant folding (see CmmOpt.cmmMachOpFoldM) > > * Common block elimination (see CmmCommonBlockElim) > > * Some simple control-flow optimisation (see CmmContFlowOpt) > > > - Are the LLVM atomics that GHC emits correct with respect to the LLVM > > memory model? As it stands now LLVM fences are only emitted for > > MO_WriteBarrier. Without fences accompanying the atomics, it seems > the LLVM > > compiler could float dependent loads/stores past atomic operations. > > Frankly, I would be surprised if they are correct. Few people have > really looked at GHC's memory ordering properties and even fewer have > looked at those of the LLVM backend. > > > - Why is MO_WriteBarrier the only provided memory barrier? My hunch is > > that it's because this is the only sort of barrier required on x86, > which > > only allows loads to be reordered with older stores, but perhaps I'm > > missing something? Is it plausible that Cmm simply needs additional > barrier > > primitives to target these weaker memory models? Conversely, is there > some > > property of Cmm that let's us get away without read barriers at all? > > > Note that there are a few others defined in SMP.h (namely > store_load_barrier and load_load_barrier). However, see the discussion > above. My understanding is that there were just very few formalisms for > reasoning about weak memory models when this code was written. The > literature now contains a much better understanding of this field but > GHC has yet to catch up. > > > Cheers, > > - Ben > > _______________________________________________ > ghc-devs mailing list > [email protected] > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs >
_______________________________________________ ghc-devs mailing list [email protected] http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
