[Haskell-cafe] How to improve speed? (MersenneTwister is several times slower than C version)
Hi all, On HaWiki was an announcement of MersenneTwister made by Lennart Augustsson. On a typical run to find out 1000th rnd num the output is (code shown below): $ time ./testMTla Testing Mersenne Twister. Result is [3063349438] real0m4.925s user0m4.856s I was exercising with the very same algorithm and tried to make it efficient (by using IOUArray): now a typical run looks like (code shown below): $ time ./testMT Testing Mersenne Twister. 3063349438 real0m3.032s user0m3.004s The original C-version (modified so that only the last number is shown) gives typically $ time ./mt19937ar outputs of genrand_int32() 3063349438 real0m0.624s user0m0.616s Results are similar with 64 bit IOUArray against 64 bit C variant. C seems to work about 5 to 10 times faster in this case. I have tried to do different things but now I'm stuck. unsafeRead and unsafeWrite improved a bit the lazy (STUArray-version) and IOUArray-versions but not very much. I took a look of Core file but then, I'm not sure where the boxed values are ok. E.g. should IOUArray Int Word64 be replaced with something else? Any hints and comments on how to improve the efficiency and make everything better will be appreciated a lot! br, Isto - testMTla.hs (MersenneTwister, see HaWiki) module Main where -- ghc -O3 -optc-O3 -optc-ffast-math -fexcess-precision --make testMTla import MersenneTwister main = do putStrLn "Testing Mersenne Twister." let mt = mersenneTwister 100 w = take 1 (drop 999 mt) -- w = take 1 (drop 99 mt) putStrLn $ "Result is " ++ (show w) - - testMT.hs module Main where -- Compile eg with -- ghc -O3 -optc-O3 -optc-ffast-math -fexcess-precision --make testMT import Mersenne genRNums32 :: MT32 -> Int -> IO (MT32) genRNums32 mt nCnt = gRN mt nCnt where gRN :: MT32 -> Int -> IO (MT32) gRN mt nCnt | mt `seq` nCnt `seq` False = undefined gRN mt 1= do (r,mt') <- next32 mt putStrLn $ (show r) return mt' gRN mt nCnt = do (r,mt') <- next32 mt gRN mt' $! (nCnt-1) main = do putStrLn "Testing Mersenne Twister." mt32 <- initialiseGenerator32 100 genRNums32 mt32 1000 - - Mersenne.hs (sorry for linewraps) module Mersenne where import Data.Bits import Data.Word import Data.Array.Base import Data.Array.MArray import Data.Array.IO -- import System.Random data MT32 = MT32 (IOUArray Int Word32) Int data MT64 = MT64 (IOUArray Int Word64) Int last32bitsof :: Word32 -> Word32 last32bitsof a = a .&. 0x -- == (2^32-1) lm32 = 0x7fff :: Word32 um32 = 0x8000 :: Word32 mA32 = 0x9908b0df :: Word32 -- == 2567483615 -- Array of length 624. initialiseGenerator32 :: Int -> IO MT32 initialiseGenerator32 seed = do let s = last32bitsof (fromIntegral seed)::Word32 mt <- newArray (0,623) (0::Word32) unsafeWrite mt 0 s iG mt s 1 mt' <- generateNumbers32 mt return (MT32 mt' 0) where iG :: (IOUArray Int Word32) -> Word32 -> Int -> IO (IOUArray Int Word32) iG mt lastNro n | n == 624= return mt | otherwise = do let n1 = lastNro `xor` (shiftR lastNro 30) new = (1812433253 * n1 + (fromIntegral n)::Word32) unsafeWrite mt n new iG mt new (n+1) generateNumbers32 :: (IOUArray Int Word32) -> IO (IOUArray Int Word32) generateNumbers32 mt = gLoop 0 mt where gLoop :: Int -> (IOUArray Int Word32) -> IO (IOUArray Int Word32) gLoop i mt | i==623 = do wL <- unsafeRead mt 623 w0 <- unsafeRead mt 0 w396 <- unsafeRead mt 396 let y = (wL .&. um32) .|. (w0 .&. lm32) :: Word32 if even y then unsafeWrite mt 623 (w396 `xor` (shiftR y 1)) else unsafeWrite mt 623 (w396 `xor` (shiftR y 1) `xor` mA32) return mt | otherwise = do wi <- unsafeRead mt i wi1 <- unsafeRead mt (i+1) w3 <- unsafeRead mt ((i+397) `mod` 624) let y = (wi .&. um32) .|. (wi1 .&. lm32) if even y then unsafeWrite mt i (w3 `xor` (shiftR y
Re: [Haskell-cafe] How to improve speed? (MersenneTwister is several times slower than C version)
Now, this will be hard to get close the the highly tuned C. Possibly its doable. The main tricks are documented here: http://haskell.org/haskellwiki/Performance/GHC#Unboxed_types Inspecting the Core to ensure the math is being inlined and unboxed will be the most crucial issue, I'd imagine. Then again, an FFI binding to mersenne.c is also a good idea :) -- Don isto.aho: > Hi all, > > On HaWiki was an announcement of MersenneTwister made by Lennart > Augustsson. On a typical run to find out 1000th rnd num the output > is (code shown below): > > $ time ./testMTla > Testing Mersenne Twister. > Result is [3063349438] > > real0m4.925s > user0m4.856s > > > I was exercising with the very same algorithm and tried to make it > efficient (by using IOUArray): now a typical run looks like (code shown > below): > > $ time ./testMT > Testing Mersenne Twister. > 3063349438 > > real0m3.032s > user0m3.004s > > > The original C-version (modified so that only the last number is > shown) gives typically > > $ time ./mt19937ar > outputs of genrand_int32() > 3063349438 > > real0m0.624s > user0m0.616s > > Results are similar with 64 bit IOUArray against 64 bit C variant. > C seems to work about 5 to 10 times faster in this case. > > I have tried to do different things but now I'm stuck. unsafeRead > and unsafeWrite improved a bit the lazy (STUArray-version) and > IOUArray-versions but not very much. I took a look of Core file but > then, I'm not sure where the boxed values are ok. E.g. should IOUArray > Int Word64 be replaced with something else? > > Any hints and comments on how to improve the efficiency and make > everything better will be appreciated a lot! > > br, Isto > > - testMTla.hs (MersenneTwister, see HaWiki) > module Main where > > -- ghc -O3 -optc-O3 -optc-ffast-math -fexcess-precision --make testMTla > > import MersenneTwister > > main = do > putStrLn "Testing Mersenne Twister." > let mt = mersenneTwister 100 > w = take 1 (drop 999 mt) > -- w = take 1 (drop 99 mt) > putStrLn $ "Result is " ++ (show w) > - > > - testMT.hs > module Main where > > -- Compile eg with > -- ghc -O3 -optc-O3 -optc-ffast-math -fexcess-precision --make testMT > > import Mersenne > > genRNums32 :: MT32 -> Int -> IO (MT32) > genRNums32 mt nCnt = gRN mt nCnt > where gRN :: MT32 -> Int -> IO (MT32) > gRN mt nCnt | mt `seq` nCnt `seq` False = undefined > gRN mt 1= do > (r,mt') <- next32 mt > putStrLn $ (show r) > return mt' > gRN mt nCnt = do > (r,mt') <- next32 mt > gRN mt' $! (nCnt-1) > > > main = do > putStrLn "Testing Mersenne Twister." > mt32 <- initialiseGenerator32 100 > genRNums32 mt32 1000 > - > > - Mersenne.hs (sorry for linewraps) > module Mersenne where > > import Data.Bits > import Data.Word > import Data.Array.Base > import Data.Array.MArray > import Data.Array.IO > -- import System.Random > > > data MT32 = MT32 (IOUArray Int Word32) Int > data MT64 = MT64 (IOUArray Int Word64) Int > > > last32bitsof :: Word32 -> Word32 > last32bitsof a = a .&. 0x -- == (2^32-1) > > lm32 = 0x7fff :: Word32 > um32 = 0x8000 :: Word32 > mA32 = 0x9908b0df :: Word32 -- == 2567483615 > > -- Array of length 624. > initialiseGenerator32 :: Int -> IO MT32 > initialiseGenerator32 seed = do > let s = last32bitsof (fromIntegral seed)::Word32 > mt <- newArray (0,623) (0::Word32) > unsafeWrite mt 0 s > iG mt s 1 > mt' <- generateNumbers32 mt > return (MT32 mt' 0) > where > iG :: (IOUArray Int Word32) -> Word32 -> Int -> IO (IOUArray Int > Word32) > iG mt lastNro n > | n == 624= return mt > | otherwise = do let n1 = lastNro `xor` (shiftR lastNro > 30) >new = (1812433253 * n1 + > (fromIntegral n)::Word32) >unsafeWrite mt n new >iG mt new (n+1) > > > generateNumbers32 :: (IOUArray Int Word32) -> IO (IOUArray Int Word32) > generateNumbers32 mt = gLoop 0 mt > where > gLoop :: Int -> (IOUArray Int Word32) -> IO (IOUArray Int > Word32) > gLoop i mt > | i==623 = do > wL <- unsafeRead mt 623 > w0 <- unsafeRead mt 0 > w396 <- unsafeRead mt 396 > let y = (wL .&. um32) .|. (w0 .&. lm32) :: > Word32 > if even y > then unsafe
Re: [Haskell-cafe] How to improve speed? (MersenneTwister is several times slower than C version)
A big problem with the Mersenne Twister is the shifts. As has been noted elsewhere, ghc doesn't do such a great job on those. -- Lennart On Nov 1, 2006, at 20:17 , Donald Bruce Stewart wrote: Now, this will be hard to get close the the highly tuned C. Possibly its doable. The main tricks are documented here: http://haskell.org/haskellwiki/Performance/GHC#Unboxed_types Inspecting the Core to ensure the math is being inlined and unboxed will be the most crucial issue, I'd imagine. Then again, an FFI binding to mersenne.c is also a good idea :) -- Don isto.aho: Hi all, On HaWiki was an announcement of MersenneTwister made by Lennart Augustsson. On a typical run to find out 1000th rnd num the output is (code shown below): $ time ./testMTla Testing Mersenne Twister. Result is [3063349438] real0m4.925s user0m4.856s I was exercising with the very same algorithm and tried to make it efficient (by using IOUArray): now a typical run looks like (code shown below): $ time ./testMT Testing Mersenne Twister. 3063349438 real0m3.032s user0m3.004s The original C-version (modified so that only the last number is shown) gives typically $ time ./mt19937ar outputs of genrand_int32() 3063349438 real0m0.624s user0m0.616s Results are similar with 64 bit IOUArray against 64 bit C variant. C seems to work about 5 to 10 times faster in this case. I have tried to do different things but now I'm stuck. unsafeRead and unsafeWrite improved a bit the lazy (STUArray-version) and IOUArray-versions but not very much. I took a look of Core file but then, I'm not sure where the boxed values are ok. E.g. should IOUArray Int Word64 be replaced with something else? Any hints and comments on how to improve the efficiency and make everything better will be appreciated a lot! br, Isto - testMTla.hs (MersenneTwister, see HaWiki) module Main where -- ghc -O3 -optc-O3 -optc-ffast-math -fexcess-precision --make testMTla import MersenneTwister main = do putStrLn "Testing Mersenne Twister." let mt = mersenneTwister 100 w = take 1 (drop 999 mt) -- w = take 1 (drop 99 mt) putStrLn $ "Result is " ++ (show w) - - testMT.hs module Main where -- Compile eg with -- ghc -O3 -optc-O3 -optc-ffast-math -fexcess-precision --make testMT import Mersenne genRNums32 :: MT32 -> Int -> IO (MT32) genRNums32 mt nCnt = gRN mt nCnt where gRN :: MT32 -> Int -> IO (MT32) gRN mt nCnt | mt `seq` nCnt `seq` False = undefined gRN mt 1= do (r,mt') <- next32 mt putStrLn $ (show r) return mt' gRN mt nCnt = do (r,mt') <- next32 mt gRN mt' $! (nCnt-1) main = do putStrLn "Testing Mersenne Twister." mt32 <- initialiseGenerator32 100 genRNums32 mt32 1000 - - Mersenne.hs (sorry for linewraps) module Mersenne where import Data.Bits import Data.Word import Data.Array.Base import Data.Array.MArray import Data.Array.IO -- import System.Random data MT32 = MT32 (IOUArray Int Word32) Int data MT64 = MT64 (IOUArray Int Word64) Int last32bitsof :: Word32 -> Word32 last32bitsof a = a .&. 0x -- == (2^32-1) lm32 = 0x7fff :: Word32 um32 = 0x8000 :: Word32 mA32 = 0x9908b0df :: Word32 -- == 2567483615 -- Array of length 624. initialiseGenerator32 :: Int -> IO MT32 initialiseGenerator32 seed = do let s = last32bitsof (fromIntegral seed)::Word32 mt <- newArray (0,623) (0::Word32) unsafeWrite mt 0 s iG mt s 1 mt' <- generateNumbers32 mt return (MT32 mt' 0) where iG :: (IOUArray Int Word32) -> Word32 -> Int -> IO (IOUArray Int Word32) iG mt lastNro n | n == 624= return mt | otherwise = do let n1 = lastNro `xor` (shiftR lastNro 30) new = (1812433253 * n1 + (fromIntegral n)::Word32) unsafeWrite mt n new iG mt new (n+1) generateNumbers32 :: (IOUArray Int Word32) -> IO (IOUArray Int Word32) generateNumbers32 mt = gLoop 0 mt where gLoop :: Int -> (IOUArray Int Word32) -> IO (IOUArray Int Word32) gLoop i mt | i==623 = do wL <- unsafeRead mt 623 w0 <- unsafeRead mt 0 w396 <- unsafeRead mt 396 let y = (wL .&. um32) .|. (w0 .&. lm32) :: Word32 if even y then unsa
Re: [Haskell-cafe] How to improve speed? (MersenneTwister is several times slower than C version)
On 11/1/06, isto <[EMAIL PROTECTED]> wrote: Hi all, On HaWiki was an announcement of MersenneTwister made by Lennart Augustsson. On a typical run to find out 1000th rnd num the output is (code shown below): $ time ./testMTla Testing Mersenne Twister. Result is [3063349438] real0m4.925s user0m4.856s I was exercising with the very same algorithm and tried to make it efficient (by using IOUArray): now a typical run looks like (code shown below): $ time ./testMT Testing Mersenne Twister. 3063349438 real0m3.032s user0m3.004s The original C-version (modified so that only the last number is shown) gives typically $ time ./mt19937ar outputs of genrand_int32() 3063349438 real0m0.624s user0m0.616s Results are similar with 64 bit IOUArray against 64 bit C variant. C seems to work about 5 to 10 times faster in this case. I have tried to do different things but now I'm stuck. unsafeRead and unsafeWrite improved a bit the lazy (STUArray-version) and IOUArray-versions but not very much. I took a look of Core file but then, I'm not sure where the boxed values are ok. E.g. should IOUArray Int Word64 be replaced with something else? Any hints and comments on how to improve the efficiency and make everything better will be appreciated a lot! br, Isto Greetings, Applying a few optimations can make it about 3x faster. 1. Hoist the array out of your loops. (See generateNumbers32, initialiseGenerator32 and genRNums). 2. Don't create too many new MT32 boxes. Most of the time is spent in 'next32' and changing its type to 'IOUArray Int Word32 -> Int -> IO (Word32, Int)' makes it much faster. 3. Demand more inlining. If you're using GHC, -funfolding-use-threshold=16 will substantially improve the performance. Using 'seq' is generally a bad idea. It can worsen the performance if not used carefully and GHCs strictness analyser is usually good enough. I used the profiler and -ddump-simpl to analyse this program. Donald suggested manual unboxing. However, in this case it won't help much (if at all) since GHC is doing such an excellent job on its own. -- Cheers, Lemmih ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] How to improve speed? (MersenneTwister is several times slower than C version)
On 11/2/06, Lennart Augustsson <[EMAIL PROTECTED]> wrote: A big problem with the Mersenne Twister is the shifts. As has been noted elsewhere, ghc doesn't do such a great job on those. Actually, the shifts are only evaluated once (hurrah for lazy evaluation) and with -funfolding-use-threshold=16 they're all compiled to unchecked primitives (GHC.Prim.uncheckedShiftRL#). -- Cheers, Lemmih ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] How to improve speed? (MersenneTwister is several times slower than C version)
The whole point of writing the Mersenne Twister was that I wanted to show how a stateful computation could be encapsulated in the ST monad and none of it showing up outside. This aspect of the code is totally gone now when everything is in the IO monad. Is there some good reason to have it in the IO monad? -- Lennart On Nov 1, 2006, at 20:51 , Lemmih wrote: On 11/1/06, isto <[EMAIL PROTECTED]> wrote: Hi all, On HaWiki was an announcement of MersenneTwister made by Lennart Augustsson. On a typical run to find out 1000th rnd num the output is (code shown below): $ time ./testMTla Testing Mersenne Twister. Result is [3063349438] real0m4.925s user0m4.856s I was exercising with the very same algorithm and tried to make it efficient (by using IOUArray): now a typical run looks like (code shown below): $ time ./testMT Testing Mersenne Twister. 3063349438 real0m3.032s user0m3.004s The original C-version (modified so that only the last number is shown) gives typically $ time ./mt19937ar outputs of genrand_int32() 3063349438 real0m0.624s user0m0.616s Results are similar with 64 bit IOUArray against 64 bit C variant. C seems to work about 5 to 10 times faster in this case. I have tried to do different things but now I'm stuck. unsafeRead and unsafeWrite improved a bit the lazy (STUArray-version) and IOUArray-versions but not very much. I took a look of Core file but then, I'm not sure where the boxed values are ok. E.g. should IOUArray Int Word64 be replaced with something else? Any hints and comments on how to improve the efficiency and make everything better will be appreciated a lot! br, Isto Greetings, Applying a few optimations can make it about 3x faster. 1. Hoist the array out of your loops. (See generateNumbers32, initialiseGenerator32 and genRNums). 2. Don't create too many new MT32 boxes. Most of the time is spent in 'next32' and changing its type to 'IOUArray Int Word32 -> Int -> IO (Word32, Int)' makes it much faster. 3. Demand more inlining. If you're using GHC, -funfolding-use-threshold=16 will substantially improve the performance. Using 'seq' is generally a bad idea. It can worsen the performance if not used carefully and GHCs strictness analyser is usually good enough. I used the profiler and -ddump-simpl to analyse this program. Donald suggested manual unboxing. However, in this case it won't help much (if at all) since GHC is doing such an excellent job on its own. -- Cheers, Lemmih ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] How to improve speed? (MersenneTwister is several times slower than C version)
lemmih: > On 11/1/06, isto <[EMAIL PROTECTED]> wrote: > >Hi all, > > > >On HaWiki was an announcement of MersenneTwister made by Lennart > >Augustsson. On a typical run to find out 1000th rnd num the output > >is (code shown below): > > > >$ time ./testMTla > >Testing Mersenne Twister. > >Result is [3063349438] > > > >real0m4.925s > >user0m4.856s > > > > > >I was exercising with the very same algorithm and tried to make it > >efficient (by using IOUArray): now a typical run looks like (code shown > >below): > > > >$ time ./testMT > >Testing Mersenne Twister. > >3063349438 > > > >real0m3.032s > >user0m3.004s > > > > > >The original C-version (modified so that only the last number is > >shown) gives typically > > > >$ time ./mt19937ar > >outputs of genrand_int32() > >3063349438 > > > >real0m0.624s > >user0m0.616s > > > >Results are similar with 64 bit IOUArray against 64 bit C variant. > >C seems to work about 5 to 10 times faster in this case. > > > >I have tried to do different things but now I'm stuck. unsafeRead > >and unsafeWrite improved a bit the lazy (STUArray-version) and > >IOUArray-versions but not very much. I took a look of Core file but > >then, I'm not sure where the boxed values are ok. E.g. should IOUArray > >Int Word64 be replaced with something else? > > > >Any hints and comments on how to improve the efficiency and make > >everything better will be appreciated a lot! > > > >br, Isto > > Greetings, > > Applying a few optimations can make it about 3x faster. > > 1. Hoist the array out of your loops. (See generateNumbers32, > initialiseGenerator32 and genRNums). > 2. Don't create too many new MT32 boxes. Most of the time is spent in > 'next32' and changing its type to 'IOUArray Int Word32 -> Int -> IO > (Word32, Int)' makes it much faster. > 3. Demand more inlining. If you're using GHC, > -funfolding-use-threshold=16 will substantially improve the > performance. > > Using 'seq' is generally a bad idea. It can worsen the performance if > not used carefully and GHCs strictness analyser is usually good > enough. > I used the profiler and -ddump-simpl to analyse this program. > > Donald suggested manual unboxing. However, in this case it won't help > much (if at all) since GHC is doing such an excellent job on its own. I wasn't suggesting manual unboxing, more that you should carefully inspect the Core, and tune with bang patterns where necessary. -funfolding-use-threshold=16 is a good idea though. or =100 ;) -- Don ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] How to improve speed? (MersenneTwister is several times slower than C version)
Hi, When writing IO version, I wasn't aware of other twister versions, and the only reason is/was that it was easiest to me and that I knew (believed) that plain lists would have been inefficient. I just wanted to see and learn, how close to C version this can be made. (And still do.) There were some good suggestions on this thread - next I'll try to get grasp on how to apply the suggestions and do something... br, Isto ke, 2006-11-01 kello 22:04 -0500, Lennart Augustsson kirjoitti: > The whole point of writing the Mersenne Twister was that I wanted to > show how a stateful computation could be encapsulated in the ST monad > and none of it showing up outside. This aspect of the code is > totally gone now when everything is in the IO monad. Is there some > good reason to have it in the IO monad? > ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] How to improve speed? (MersenneTwister is several times slower than C version)
Hello isto, Thursday, November 2, 2006, 1:16:55 AM, you wrote: > I have tried to do different things but now I'm stuck. unsafeRead > and unsafeWrite improved a bit the lazy (STUArray-version) and why you think it's a lazy? :) ST monad is just the same as IO monad internally, only types are different (there is also Lazy.ST monad - this is really lazy) 10-20 times difference is typical for GHC programs. i think that better results are observed only when C version is really bound by memory speed (that is rather typical situation) use JHC if you need to generate really fast code. or use ocaml/clean -- Best regards, Bulatmailto:[EMAIL PROTECTED] ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] How to improve speed? (MersenneTwister is several times slower than C version)
bulat.ziganshin: > Hello isto, > > Thursday, November 2, 2006, 1:16:55 AM, you wrote: > > > I have tried to do different things but now I'm stuck. unsafeRead > > and unsafeWrite improved a bit the lazy (STUArray-version) and > > why you think it's a lazy? :) ST monad is just the same as IO monad > internally, only types are different (there is also Lazy.ST monad - > this is really lazy) > > 10-20 times difference is typical for GHC programs. ! It's really more like 2-4x. Sometimes better than C. Where's this huge figure coming from Bulat? If you have code that behaves like this, you should report it. -- Don ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] How to improve speed? (MersenneTwister is several times slower than C version)
Oh, sorry, I thought your version was a rewritten version of mine. :) The names are so similar, after all. On Nov 2, 2006, at 02:26 , isto wrote: Hi, When writing IO version, I wasn't aware of other twister versions, and the only reason is/was that it was easiest to me and that I knew (believed) that plain lists would have been inefficient. I just wanted to see and learn, how close to C version this can be made. (And still do.) There were some good suggestions on this thread - next I'll try to get grasp on how to apply the suggestions and do something... br, Isto ke, 2006-11-01 kello 22:04 -0500, Lennart Augustsson kirjoitti: The whole point of writing the Mersenne Twister was that I wanted to show how a stateful computation could be encapsulated in the ST monad and none of it showing up outside. This aspect of the code is totally gone now when everything is in the IO monad. Is there some good reason to have it in the IO monad? ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] How to improve speed? (MersenneTwister is several times slower than C version)
Hi & no problems (I didn't tell it clearly right away), I modified the code along the comments given by Lemmih and things improved a lot. mod-operator can be removed by two loops as in C version, which still further improved the speed. I tried this with the old version and the speed-up was next to nothing. The new version can be found below. Now, the results are comparable to the C version. On the second run it was even better, but usually it is about 0.78 on my machine. $ time ./testMT Testing Mersenne Twister. 3063349438 real0m0.791s user0m0.776s sys 0m0.008s $ time ./testMT Testing Mersenne Twister. 3063349438 real0m0.414s user0m0.400s Can somebody tell, why this happens? There were similar timings with ST-version and with the old version (so that it sometimes runs almost twice as fast or twice as slow than normally). Is this something system dependent (amd64/ubuntu)? C version seems to be within 10% that is between 0.60 and 0.66. Anyhow, this random generator now seems to randomly outperform this particular C version :) C version is located at http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html About Lemmih's proposals. Did I follow them correctly? I'm using ghc and there the inlining demanding seemed to work. With =100 there are more often 0.4 timings than with =16, but also =100 gives 0.78 seconds often (majority of runs). And can something still be done? At this point I would leave FFI and other languages out as one goal here is to learn Haskelling and somehow solutions that require writing code only in one language sounds nicer. br, Isto to, 2006-11-02 kello 08:18 -0500, Lennart Augustsson kirjoitti: > Oh, sorry, I thought your version was a rewritten version of mine. :) > The names are so similar, after all. -- Mersenne.hs (64bit version removed) module Mersenne where import Data.Bits import Data.Word import Data.Array.Base import Data.Array.MArray import Data.Array.IO data MT32 = MT32 (IOUArray Int Word32) Int last32bitsof :: Word32 -> Word32 last32bitsof a = a .&. 0x -- == (2^32-1) lm32 = 0x7fff :: Word32 um32 = 0x8000 :: Word32 mA32 = 0x9908b0df :: Word32 -- == 2567483615 -- Array of length 624. initialiseGenerator32 :: Int -> IO MT32 initialiseGenerator32 seed = do let s = last32bitsof (fromIntegral seed)::Word32 mt <- newArray (0,623) (0::Word32) unsafeWrite mt 0 s mtLoop mt s 1 generateNumbers32 mt return (MT32 mt 0) mtLoop :: (IOUArray Int Word32) -> Word32 -> Int -> IO (IOUArray Int Word32) mtLoop mt lastNro n = loop lastNro n where loop :: Word32 -> Int -> IO (IOUArray Int Word32) loop lastNro n | n == 624= return mt | otherwise = do let n1 = lastNro `xor` (shiftR lastNro 30) new = (1812433253 * n1 + (fromIntegral n)::Word32) unsafeWrite mt n new loop new $! (n+1) generateNumbers32 :: (IOUArray Int Word32) -> IO () generateNumbers32 mt = do gLoop1 0 gLoop2 227 wL <- unsafeRead mt 623 w0 <- unsafeRead mt 0 w396 <- unsafeRead mt 396 let y = (wL .&. um32) .|. (w0 .&. lm32) :: Word32 if even y then unsafeWrite mt 623 (w396 `xor` (shiftR y 1)) else unsafeWrite mt 623 (w396 `xor` (shiftR y 1) `xor` mA32) return () where gLoop1 :: Int -> IO () gLoop1 227 = return () gLoop1 i= do wi <- unsafeRead mt i wi1 <- unsafeRead mt (i+1) -- w3 <- unsafeRead mt ((i+397) `mod` 624) w3 <- unsafeRead mt (i+397) let y = (wi .&. um32) .|. (wi1 .&. lm32) if even y then unsafeWrite mt i (w3 `xor` (shiftR y 1)) else unsafeWrite mt i (w3 `xor` (shiftR y 1) `xor` mA32) gLoop1 $! (i+1) gLoop2 :: Int -> IO () gLoop2 623 = return () gLoop2 i= do wi <- unsafeRead mt i wi1 <- unsafeRead mt (i+1) -- w3 <- unsafeRead mt ((i+397) `mod` 624) w3 <- unsafeRead mt (i-227) let y = (wi .&. um32) .|. (wi1 .&. lm32) if even y then unsafeWrite mt i (w3 `xor` (shiftR y 1)) else unsafeWrite mt i (w3 `xor` (shiftR y 1) `xor` mA32) gLoop2 $! (i+1) {- gLoop :: Int -> IO () gLoop 623 = do wL <- unsafeRead mt 623 w0 <-
Re[2]: [Haskell-cafe] How to improve speed? (MersenneTwister is several times slower than C version)
Hello Lennart, Thursday, November 2, 2006, 4:34:04 AM, you wrote: > A big problem with the Mersenne Twister is the shifts. As has been > noted elsewhere, ghc doesn't do such a great job on those. #ifdef __GLASGOW_HASKELL__ (I# a) <<# (I# b) = (I# (a `iShiftL#` b)) (I# a) >># (I# b) = (I# (a `uncheckedIShiftRL#` b)) #else /* ! __GLASGOW_HASKELL__ */ a <<# b = a `shiftL` b a >># b = a `shiftR` b #endif /* ! __GLASGOW_HASKELL__ */ -- Best regards, Bulatmailto:[EMAIL PROTECTED] ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re[2]: [Haskell-cafe] How to improve speed? (MersenneTwister is several times slower than C version)
Hello Lennart, Thursday, November 2, 2006, 6:04:39 AM, you wrote: > The whole point of writing the Mersenne Twister was that I wanted to > show how a stateful computation could be encapsulated in the ST monad > and none of it showing up outside. This aspect of the code is > totally gone now when everything is in the IO monad. Is there some > good reason to have it in the IO monad? i think no. ST computations internally are really the same as IO ones -- Best regards, Bulatmailto:[EMAIL PROTECTED] ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re[2]: [Haskell-cafe] How to improve speed? (MersenneTwister is several times slower than C version)
Hello Donald, Thursday, November 2, 2006, 2:21:31 PM, you wrote: >> 10-20 times difference is typical for GHC programs. > It's really more like 2-4x. Sometimes better than C. > Where's this huge figure coming from Bulat? If you have code that > behaves like this, you should report it. are you analyzed the cases where performance is close or even better? i does. it's either because C version is limited by memory performance or just use different, less efficient algorithm the cases which shows slowness of ghc-generated code is factorial algorithm and the program attached. despite that Haskell code is far uglier, C version outperforms it 20 times. run both programs with arrays of about 10k elements to see the difference: a.out 1 elements 10 iterations in February i've written detailed explanation (in ghc-users) of why this comes and made some suggestions on improving it. of course, main problem is ghc's own code generator which is far away from gcc or even ocaml ones -- Best regards, Bulatmailto:[EMAIL PROTECTED] full.c Description: Binary data newest.hs Description: Binary data ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe