Thanks for the suggestions. I'll try them and report back. Although I've since found that out of 3 not-identical systems, this problem only occurs on one. So I may try different kernel/system libs and see where that gets me.
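For concreteness, here is a rough sketch of the flag sets I'd compare. The `rts_flags` helper is purely hypothetical (it just prints option strings), and the added `-s` is my own assumption so each run reports GC statistics. Only `-N`, `-qa` and `-qg` come from the suggestions in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print the RTS option sets to compare for a machine
# with the given number of cores. It only prints strings; nothing is run.
rts_flags() {
    cores=$1
    workers=$((cores - 1))   # leave one core free for the OS
    # Baseline: all cores, parallel GC left at its default.
    echo "+RTS -N${cores} -s -RTS"
    # One core fewer, with thread affinity enabled.
    echo "+RTS -N${workers} -qa -s -RTS"
    # Same, with parallel GC disabled entirely.
    echo "+RTS -N${workers} -qa -qg -s -RTS"
}

rts_flags 8
```

Each printed line would be appended to the program's command line, assuming the binary was built with `-threaded -rtsopts`.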
-qg is funny. My interpretation from the results so far is that, when the parallel collector doesn't get stalled, it results in a big win. But when parGC does stall, it's slower than disabling parallel gc entirely. I had thought the last core parallel slowdown problem was fixed a while ago, but apparently not?

Thanks,
John

On Tue, Jun 19, 2012 at 8:49 AM, Ben Lippmeier <[email protected]> wrote:
>
> On 19/06/2012, at 24:48 , Tyson Whitehead wrote:
>
>> On June 18, 2012 04:20:51 John Lato wrote:
>>> Given this, can anyone suggest any likely causes of this issue, or
>>> anything I might want to look for? Also, should I be concerned about
>>> the much larger gc_alloc_block_sync level for the slow run? Does that
>>> indicate the allocator waiting to alloc a new block, or is it
>>> something else? Am I on completely the wrong track?
>>
>> A total shot in the dark here, but wasn't there something about really bad
>> performance when you used all the CPUs on your machine under Linux?
>>
>> Presumably very tight coupling that is causing all the threads to stall
>> everytime the OS needs to do something or something?
>
> This can be a problem for data parallel computations (like in Repa). In Repa
> all threads in the gang are supposed to run for the same time, but if one
> gets swapped out by the OS then the whole gang is stalled.
>
> I tend to get best results using -N7 for an 8 core machine.
>
> It is also important to enable thread affinity (with the -qa) flag.
>
> For a Repa program on an 8 core machine I use +RTS -N7 -qa -qg
>
> Ben.
>
> _______________________________________________
Glasgow-haskell-users mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
