William Lee Irwin III <[EMAIL PROTECTED]> asks about true SMP haskell. > On Mon, Oct 13, 2003 at 11:13:56AM +0200, Wolfgang Thaller wrote: > > The reason why we currently do not take advantage of SMP is that the > > Haskell Heap is a shared data structure which is modified whenever a > > thunk (an unevaluated expression) is evaluated. Using synchronisation > > primitives like pthread_mutex_lock for every evaluation of a thunk > > would be deadly for performance. > > There is some old (1999) code in the GHC RTS that attempts this (using > > intel cmpxchg instructions for synchronisation), but it's currently > > "bitrotted" and I don't know how successful it was. > > cmpxchg and then taking a blocking lock sounds like the 2-tier locking > supported with Linux' new futex system calls. I wonder how they chose > to block in the older GHC RTS.
This invariably proves tricky, even more so now that GHC effectively uses a many-to-one threading model (and where it may therefore be wrong to simply block the OS-level thread). In pH, which isn't quite haskell but has the same syntax and runs on multiple processors with a shared heap, we kludged: we simply called "yield" or "sleep" and kept spinning. We were actually using work stealing (with lock-free work queues as I recall) along with wait queues on empty locations, so we didn't sleep very often. Nonetheless, this did occasionally turn out to cause real performance problems. In reality you probably want to use some sort of thin lock / fat lock approach a la Java. Is the location thinly locked? If so, CAS in a fat lock instead, then wait on that. Now update requires a CAS as well, to see whether the thin lock was fattened. If so, the fat lock must be unlocked and deallocated. Naturally, we'd like to keep track of unshared objects so we can avoid any kind of complicated update protocol in the common case. This requires the implementor to establish a careful and clear contract between the compiler, the GC, and the thread scheduler. Finally, the complexity of thunk update pales in comparison to the complexity of multiprocessor GC. In pH we punted on this issue---debugging a GC with per-thread allocation pools was hard enough. It killed us; thread performance didn't scale because GC didn't scale. Eager Haskell was designed with multithreaded shared-memory execution in mind, but I simply couldn't find the year or more it would take to build and debug a true multiprocessor GC---so it remains a uniprocessor implementation. -Jan-Willem Maessen [EMAIL PROTECTED] _______________________________________________ Haskell mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell