I'm afraid without uniqueness or linear typing it would be hard to avoid. 2012/3/10 Thiago Negri <evoh...@gmail.com>
> I see. Thanks for the answers. > > Any data structure or source annotation that would prevent that? > > For example, if I try the same program to run on a > [1..9999999999999999] list, I'll get an out of memory error for the > single-threaded version. Any way to prevent it without declaring two > different versions of "list"? > > > 2012/3/10 Anthony Cowley <acow...@gmail.com>: > > From that profiling data, I think you're just seeing a decrease in > sharing. With one thread, you create the list structure in memory: the > first fold could consume it in-place, but the second fold is still waiting > for its turn. The list is built on the heap so the two folds can both > refer to the same list. > > > > With two threads, GHC is being clever and inlining the definition you > give for list, which is then optimized into two parallel loops. No list on > the heap means there's not much for the GC to do. > > > > Sharing of index lists like this is a common source of problems. In > particular, nested loops can make it even trickier to prevent sharing as > there may not be an opportunity for parallel evaluation. > > > > Anthony > > > > On Mar 10, 2012, at 10:21 AM, Thiago Negri <evoh...@gmail.com> wrote: > > > >> Hi all. > >> > >> I wrote a very simple program to try out parallel Haskel and check how > >> it would look like to make use of more than one core in this language. > >> > >> When I tried the program with RTS option -N1, total time shows it took > >> 2.48 seconds to complete and around 65% of that time was taken by GC. > >> > >> Then I tried the same program with RTS options -N2 and total time > >> decreased to 1.15 seconds as I expected a gain here. But what I didn't > >> expect is the GC time to drop to 0%. > >> > >> I guess I'm having trouble to understand the output of the RTS option > -s. > >> Can you enlighten me? > >> > >> > >> The source for the testing program: > >> > >>> module Main where > >>> > >>> import Data.List (foldl1') > >>> import Control.Parallel (par, pseq) > >>> import Control.Arrow ((&&&)) > >>> > >>> f `parApp` (a, b) = a `par` (b `pseq` (f a b)) > >>> seqApp = uncurry > >>> > >>> main = print result > >>> where result = (+) `parApp` minMax list > >>> minMax = minlist &&& maxlist > >>> minlist = foldl1' min > >>> maxlist = foldl1' max > >>> list = [1..19999999] > >> > >> > >> The results on a Windows 7 64bits with an Intel Core 2 Duo, compiled > >> with GHC from Haskell Platform: > >> > >> c:\tmp\hs>par +RTS -s -N1 > >> par +RTS -s -N1 > >> 20000000 > >> 803,186,152 bytes allocated in the heap > >> 859,916,960 bytes copied during GC > >> 233,465,740 bytes maximum residency (10 sample(s)) > >> 30,065,860 bytes maximum slop > >> 483 MB total memory in use (0 MB lost due to fragmentation) > >> > >> Generation 0: 1523 collections, 0 parallel, 0.80s, 0.75s elapsed > >> Generation 1: 10 collections, 0 parallel, 0.83s, 0.99s elapsed > >> > >> Parallel GC work balance: nan (0 / 0, ideal 1) > >> > >> MUT time (elapsed) GC time (elapsed) > >> Task 0 (worker) : 0.00s ( 0.90s) 0.00s ( 0.06s) > >> Task 1 (worker) : 0.00s ( 0.90s) 0.00s ( 0.00s) > >> Task 2 (bound) : 0.86s ( 0.90s) 1.62s ( 1.69s) > >> > >> SPARKS: 1 (0 converted, 0 pruned) > >> > >> INIT time 0.00s ( 0.00s elapsed) > >> MUT time 0.86s ( 0.90s elapsed) > >> GC time 1.62s ( 1.74s elapsed) > >> EXIT time 0.00s ( 0.00s elapsed) > >> Total time 2.48s ( 2.65s elapsed) > >> > >> %GC time 65.4% (65.9% elapsed) > >> > >> Alloc rate 936,110,032 bytes per MUT second > >> > >> Productivity 34.6% of total user, 32.4% of total elapsed > >> > >> gc_alloc_block_sync: 0 > >> whitehole_spin: 0 > >> gen[0].sync_large_objects: 0 > >> gen[1].sync_large_objects: 0 > >> > >> > >> c:\tmp\hs>par +RTS -s -N2 > >> par +RTS -s -N2 > >> 20000000 > >> 1,606,279,644 bytes allocated in the heap > >> 74,924 bytes copied during GC > >> 28,340 bytes maximum residency (1 sample(s)) > >> 29,004 bytes maximum slop > >> 2 MB total memory in use (0 MB lost due to fragmentation) > >> > >> Generation 0: 1566 collections, 1565 parallel, 0.00s, 0.01s elapsed > >> Generation 1: 1 collections, 1 parallel, 0.00s, 0.00s elapsed > >> > >> Parallel GC work balance: 1.78 (15495 / 8703, ideal 2) > >> > >> MUT time (elapsed) GC time (elapsed) > >> Task 0 (worker) : 0.00s ( 0.59s) 0.00s ( 0.00s) > >> Task 1 (worker) : 0.58s ( 0.59s) 0.00s ( 0.01s) > >> Task 2 (bound) : 0.58s ( 0.59s) 0.00s ( 0.00s) > >> Task 3 (worker) : 0.00s ( 0.59s) 0.00s ( 0.00s) > >> > >> SPARKS: 1 (1 converted, 0 pruned) > >> > >> INIT time 0.00s ( 0.00s elapsed) > >> MUT time 1.15s ( 0.59s elapsed) > >> GC time 0.00s ( 0.01s elapsed) > >> EXIT time 0.00s ( 0.00s elapsed) > >> Total time 1.15s ( 0.61s elapsed) > >> > >> %GC time 0.0% (2.4% elapsed) > >> > >> Alloc rate 1,391,432,695 bytes per MUT second > >> > >> Productivity 100.0% of total user, 190.3% of total elapsed > >> > >> gc_alloc_block_sync: 90 > >> whitehole_spin: 0 > >> gen[0].sync_large_objects: 0 > >> gen[1].sync_large_objects: 0 > >> > >> _______________________________________________ > >> Haskell-Cafe mailing list > >> Haskell-Cafe@haskell.org > >> http://www.haskell.org/mailman/listinfo/haskell-cafe > > _______________________________________________ > Haskell-Cafe mailing list > Haskell-Cafe@haskell.org > http://www.haskell.org/mailman/listinfo/haskell-cafe >
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe