vandijk.roel: > I replaced the standard random number generated with the one from > mersenne-random. On my system this makes the resulting program about > 14 times faster than the original. I also made a change to > accumulateHit because it doesn't need to count to total. That is > already known. > > {-# LANGUAGE BangPatterns #-} > > import System( getArgs ) > import Data.List( foldl' ) > > import System.Random.Mersenne > > pairs :: [a] -> [(a,a)] > pairs [] = [] > pairs (x:[]) = [] > pairs (x:y:rest) = (x, y) : pairs rest > > isInCircle :: (Double, Double) -> Bool > isInCircle (x,y) = sqrt (x*x + y*y) <= 1.0 > > accumulateHit :: Int -> (Double, Double) -> Int > accumulateHit (!hits) pair | isInCircle pair = hits + 1 > | otherwise = hits > > countHits :: [(Double, Double)] -> Int > countHits ps = foldl' accumulateHit 0 ps > > monteCarloPi :: Int -> [(Double, Double)] -> Double > monteCarloPi n xs = 4.0 * fromIntegral hits / fromIntegral n > where hits = countHits $ take n xs > > main = do > args <- getArgs > let samples = read $ head args > > randomNumberGenerator <- getStdGen > randomNumbers <- randoms randomNumberGenerator > > let res = monteCarloPi samples $ pairs randomNumbers > putStrLn $ show $ res
But note the lazy list of Double pairs, so the inner loop still looks like this though: $wlgo :: Int# -> [(Double, Double)] -> Int $wlgo = \ (ww_s1pv :: Int#) (w_s1px :: [(Double, Double)]) -> case w_s1px of wild_aTl { [] -> I# ww_s1pv; : x_aTp xs_aTq -> case x_aTp of wild1_B1 { (x1_ak3, y_ak5) -> case x1_ak3 of wild2_aX8 { D# x2_aXa -> case y_ak5 of wild3_XYs { D# x3_XYx -> case <=## (sqrtDouble# (+## (*## x2_aXa x2_aXa) (*## x3_XYx x3_XYx))) 1.0 of wild4_X1D { False -> $wlgo ww_s1pv xs_aTq; True -> $wlgo (+# ww_s1pv 1) xs_aTq } while we want to keep everything in registers with something like: Int# -> Double# -> Double# -> Int# So we'll be paying a penalty to force the next elem of the list (instead of just calling the Double generator). This definitely has an impact on performance. $ ghc-core B.hs -O2 -fvia-C -optc-O3 -fexcess-precision -optc-march=core2 -funbox-strict-fields $ time ./B 10000000 3.1407688 ./B 10000000 2.41s user 0.01s system 99% cpu 2.415 total Now, what if we just rewrote that inner loop directly to avoid intermediate stuff? That'd give us a decent lower bound. {-# LANGUAGE BangPatterns #-} import System.Environment import System.Random.Mersenne isInCircle :: Double -> Double -> Bool isInCircle x y = sqrt (x*x + y*y) <= 1.0 countHits :: Int -> IO Int countHits lim = do g <- newMTGen Nothing let go :: Int -> Int -> IO Int go !throws !hits | throws >= lim = return hits | otherwise = do x <- random g -- use mersenne-random-pure64 to stay pure! y <- random g if isInCircle x y then go (throws+1) (hits+1) else go (throws+1) hits go 0 0 monteCarloPi :: Int -> IO Double monteCarloPi n = do hits <- countHits n return $ 4.0 * fromIntegral hits / fromIntegral n main = do [n] <- getArgs res <- monteCarloPi (read n) print res And now the inner loop looks like: $wa_s1yW :: Int# -> Int# -> State# RealWorld -> (# State# RealWorld, Int #) Pretty good. Can't avoid the Int boxed return (and resulting heap check) due to use of IO monad. But at least does away with heap allocs in the inner loop! How does it go: $ ghc-core A.hs -O2 -fvia-C -optc-O3 -fexcess-precision -optc-march=core2 -funbox-strict-fields $ time ./A 10000000 3.1412564 ./A 10000000 0.81s user 0.00s system 99% cpu 0.818 total Ok. So 3 times faster. Now the goal is to recover the high level version. We have many tools to employ: switching to mersenne-random-pure64 might help here. And seeing if you can fuse filling a uvector with randoms, and folding over it... t -- Don _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe