A better test program import Control.Parallel
main = a `par` b `pseq` print (a + b ) where a = ack 3 11 b = fib 39 ack 0 n = n+1 ack m 0 = ack (m-1) 1 ack m n = ack (m-1) (ack m (n-1)) fib 0 = 0 fib 1 = 1 fib n = fib (n-1) + fib (n-2) running it , these are the results paol...@paolino-casa:~$ ghc --make prova -threaded [1 of 1] Compiling Main ( prova.hs, prova.o ) Linking prova ... paol...@paolino-casa:~$ time ./prova +RTS -N1 63262367 real 1m17.485s user 1m16.473s sys 0m0.392s paol...@paolino-casa:~$ time ./prova +RTS -N2 63262367 real 1m20.186s user 1m31.554s sys 0m0.600s paol...@paolino-casa:~$ touch prova.hs paol...@paolino-casa:~$ ghc --make prova -O2 -threaded [1 of 1] Compiling Main ( prova.hs, prova.o ) Linking prova ... paol...@paolino-casa:~$ time ./prova +RTS -N1 63262367 real 0m17.652s user 0m15.277s sys 0m0.108s paol...@paolino-casa:~$ time ./prova +RTS -N2 63262367 real 0m13.650s user 0m15.121s sys 0m0.188s >From the resource graph and the timings it is clear that the program is not able to use all the 2 cores powers, considering computing 'a' alone is about 7 seconds and 'b' alone 9. What is retaining the cpu's to run full power ? paolino 2009/8/21 Don Stewart <d...@galois.com> > paolo.veronelli: > > Hi, reading a previous thread I got interested. > > I simplified the example pointed by dons in > > > > import Control.Parallel > > > > main = a `par` b `pseq` print (a + b ) > > where > > a = ack 3 11 > > b = ack 3 11 > > > > ack 0 n = n+1 > > ack m 0 = ack (m-1) 1 > > ack m n = ack (m-1) (ack m (n-1)) > > > > compiled with > > ghc --make prova -O2 -threaded > > > > timings > > paol...@paolino-casa:~$ time ./prova +RTS -N1 > > 32762 > > > > real 0m7.031s > > user 0m6.304s > > sys 0m0.004s > > paol...@paolino-casa:~$ time ./prova +RTS -N2 > > 32762 > > > > real 0m6.997s > > user 0m6.728s > > sys 0m0.020s > > paol...@paolino-casa:~$ > > > > without optimizations it gets worse > > > > paol...@paolino-casa:~$ time ./prova +RTS -N1 > > 32762 > > > > real 1m20.706s > > user 1m18.197s > > sys 0m0.104s > > paol...@paolino-casa:~$ time ./prova +RTS -N2 > > 32762 > > > > real 1m38.927s > > user 1m45.039s > > sys 0m0.536s > > paol...@paolino-casa:~$ > > > > staring at the resource usage graph I can see it does use 2 cores when > told to > > do it, but with -N1 the used cpu goes 100% and with -N2 they both run > just over > > 50% > > > > thanks for comments > > > Firstly, a and b are identical, so GHC commons them up. The compiler > transforms it into: > > a `par` a `seq` print (a + a) > > So you essentially fork a spark to evaluate 'a', and then have the main > thread also evaluate 'a' again. One of them wins, then you add the > result to itself. The runtime may choose not to convert your first spark > into a thread. > > Running with a 2009 GHC head snapshot, we can see with +RTS -sstderr > > SPARKS: 1 (0 converted, 0 pruned) > > That indeed, it doesn't convert your `par` into a real thread. > > While, for example, the helloworld on the wiki: > > http://haskell.org/haskellwiki/Haskell_in_5_steps > > Converts 2 sparks to 2 theads: > > SPARKS: 2 (2 converted, 0 pruned) > ./B +RTS -threaded -N2 -sstderr 2.13s user 0.04s system 137% cpu 1.570 > total > > -- Don > >
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe