Re: [Haskell-cafe] Parallel weirdness [new insights]

Andrew Coppin Sun, 20 Apr 2008 12:41:53 -0700

OK, well I now have so much data sitting in from of me I don't even know*what* I'm seeing any more. I have made several significant discoveriesthough...


Consider the following:


 msort [] = []
 msort [x] = [x]
 msort xs =
   let
     xs0 = msort (split0 xs)
     xs1 = msort (split1 xs)
   in merge xs0 xs1

This takes roughly 14 seconds to sort a list of one million Word32values. If I now change the final line to read


 in listSeq rwhnf xs0 `seq` listSeq rwhnf xs1 `seq` merge xs0 xs1

it now takes 8 seconds to do the same job.

Notice that this is still completely sequential execution. It's justexecuting in a different order. (And, at first glance, doing slightlymore work.) Of all the benchmarks I've performed, I have yet to findanything that goes faster than this. If I make it so that xs0 iscomputed in parallel with xs1 instead of in series, then it goes atroughly the same speed (but with more variation) if the number of realthreads is 1. If you add more real threads, execution slows down. (Gofigure!) I was expecting running parallel at just the top few levels andthen switching to pure sequential for the lower levels to give the bestperformance. But the numbers I have seem to say that more parallel =slower, with 100% sequential giving me the fastest time of all.

The next insight happens when you look at the GC statistics. Both theunmarked and the explicitly sequential program are giving me roughly 55%GC time and 45% user time. (!!) Obviously this is a Very Bad Thing. Idiscovered that simply by adding -H200m to the command line, I can makeboth programs speed up by about 20% or so. (They then drop down toroughly 25% GC time. Adding more RAM doesn't seem to make any difference.)

I had assumed that the explicitly sequential program was faster becauseit was somehow demanding less GC time, but that doesn't appear to be thecase - both GC time and user time are lower for the explicitlysequential version. And adding more heap space doesn't make the (large)speed difference go away. Is the strictness of the seq operator makingGHC come up with different a Core implementation for this function orsomething? I have no idea.

With the extra heap space, the speed difference between the sequentialand parallel programs becomes smaller. The sequential version *is* stillfaster, however. I have no explanation for why that might be. Addingmore heap also seems to make the runtimes more variable. (I run eachtest 8 times. One test, the fastest run was 7 seconds and the slowestwas 11 seconds. That's quite a variation. The sequential algorithm onlyvaries by a few milliseconds each time.)

In short, it seems my little sorting algorithm test is *actually* juststressing out the GC engine, and I'm "really" benchmarking different GCsettings. :-(


Questions:

1. Does running the GC force all threads to stop? I know some GC designsdo this, but I have no idea how the one GHC implements works.


2. Is the GC still single-threaded? (GHC 6.8.2 here.)

3. Is there any way for a running Haskell program to find out how muchheap space is currently allocated / used / free? I know you can find outhow much wall time and CPU time you've used, but I couldn't findanything for RAM usage.


_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Parallel weirdness [new insights]

Reply via email to