On 16 May 2011 21:31, <dm-list-haskell-c...@scs.stanford.edu> wrote: > At Mon, 16 May 2011 10:56:02 +0100, > Simon Marlow wrote: > > > > Yes, it's not actually documented as far as I know, and we should fix > > that. But if you think about it, sequential consistency is really the > > only sensible policy: suppose one processor creates a heap object and > > writes a reference to it in the IORef, then another processor reads the > > IORef. The writes that created the heap object must be visible to the > > second processor, otherwise it will encounter uninitialised memory and > > crash. So sequential consistency is necessary to ensure concurrent > > programs can't crash. > > > > Now perhaps it's possible to have a relaxed memory model that provides > > the no-crashes guarantee but still allows IORef writes to be reordered > > (e.g. some kind of causal consistency). That might be important if > > there is some processor arcitecture that provides that memory model, but > > as far as I know there isn't. > > Actually, in your heap object example, it sounds like you only really > care about preserving program order, rather than write atomicity. > Thus, you can get away with less-than-sequential consistency and not > crash. > > The x86 is an example of a relaxed memory model that provides the > no-crashes guarantee you are talking about. Specifically, the x86 > deviates from sequential consistency in two ways > > 1. A load can finish before an earlier store to a different memory > location. [intel, Sec. 8.2.3.4] > > 2. A thread can read its own writes early. [intel, 8.2.3.5] > > [Section references are to the intel architecture manual, vol 3a: > http://www.intel.com/Assets/PDF/manual/253668.pdf] > > One could imagine an implementation of IORefs that relies on the fact > that pointer writes are atomic and that program order is preserved to > avoid mutex overhead for most calls. E.g.: > > struct IORef { > spinlock_t lock; /* Only ever used by atomicModifyIORef */ > HaskellValue *val; /* Updated atomically because pointer-sized > writes are atomic */ > }; > > HaskellValue * > readIORef (struct IORef *ref) > { > return ref->val; > } > > void > writeIORef (struct IORef *ref, HaskellValue *val) > { > /* Note that if *val was initialized in the same thread, then by > * the time another CPU sees ref->val, it will also see the > * correct contents of *ref->val, because stores are seen in a > * consistent order by other processors [intel, Sec. 8.2.3.7]. > * > * If *val was initialized in a different thread, then since this > * thread has seen it, other threads will too, because x86 > * guarantees stores are transitively visible [intel, Sec. 8.2.3.6]. > */ > ref->val = val; > } > > /* modifyIORef is built out of readIORef and writeIORef */ > > HaskellValue * > atomicModifyIORef (Struct IORef *ref, HaskellFunction *f) > { > HaskellValue *result; > spinlock_acquire (&ref->lock); > > result = modifyIORef (ref, f); > > spinlock_release (&ref->lock); > return result; > } > > This is actually how I assumed IORefs worked. But then consider the > following program: > > maybePrint :: IORef Bool -> IORef Bool -> IO () > maybePrint myRef yourRef = do > writeIORef myRef True > yourVal <- readIORef yourRef > unless yourVal $ putStrLn "critical section" > > main :: IO () > main = do > r1 <- newIORef False > r2 <- newIORef False > forkOS $ maybePrint r1 r2 > forkOS $ maybePrint r2 r1 > threadDelay 1000000 > > Under sequential consistency, the string "critical section" should be > output at most once. However, with the above IORef implementation on > x86, since a read can finish before a write to a different location, > both threads might see False for the value of yourVal. > > To prevent this deviation from sequential consistency, you would need > to do something like stick an MFENCE instruction at the end of > writeIORef, and that would slow down the common case where you don't > care about sequential consistency. In fact, I would argue that if you > care about S.C., you should either be using atomicModifyIORef or > MVars. >
mfence is apparently slower than lock add. see http://blogs.oracle.com/dave/entry/instruction_selection_for_volatile_fences so using mfence would make it slower than atomicModifyIORef, and with weaker guarantees. not a good combination. Alexander > Can you explain what actually happens inside the real IORef > implementation? > > As an aside, these days one sees a lot of hand-wringing over the fact > that CPU clock rates have been flat for a while and the only way to > get more performance is through parallelism. "How are we going to > teach programmers to write concurrent code when it's so hard to write > and debug?" I've heard numerous people ask. > > Haskell could be a major step in the right direction, since in the > absence of variables, it's impossible to have data races. (You can > still have deadlock and other kinds of race condition, such as the one > in maybePrint above, if you had my definition of IORef, but data races > are by far the most pernicious concurrency problems.) Of course, the > key to making Haskell useful in a parallel setting is that things like > the memory model have to be fully specified... > > Thanks, > David > > _______________________________________________ > Haskell-Cafe mailing list > Haskell-Cafe@haskell.org > http://www.haskell.org/mailman/listinfo/haskell-cafe >
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe