lemmih: > On 11/1/06, isto <[EMAIL PROTECTED]> wrote: > >Hi all, > > > >On HaWiki was an announcement of MersenneTwister made by Lennart > >Augustsson. On a typical run to find out 10000000th rnd num the output > >is (code shown below): > > > >$ time ./testMTla > >Testing Mersenne Twister. > >Result is [3063349438] > > > >real 0m4.925s > >user 0m4.856s > > > > > >I was exercising with the very same algorithm and tried to make it > >efficient (by using IOUArray): now a typical run looks like (code shown > >below): > > > >$ time ./testMT > >Testing Mersenne Twister. > >3063349438 > > > >real 0m3.032s > >user 0m3.004s > > > > > >The original C-version (modified so that only the last number is > >shown) gives typically > > > >$ time ./mt19937ar > >outputs of genrand_int32() > >3063349438 > > > >real 0m0.624s > >user 0m0.616s > > > >Results are similar with 64 bit IOUArray against 64 bit C variant. > >C seems to work about 5 to 10 times faster in this case. > > > >I have tried to do different things but now I'm stuck. unsafeRead > >and unsafeWrite improved a bit the lazy (STUArray-version) and > >IOUArray-versions but not very much. I took a look of Core file but > >then, I'm not sure where the boxed values are ok. E.g. should IOUArray > >Int Word64 be replaced with something else? > > > >Any hints and comments on how to improve the efficiency and make > >everything better will be appreciated a lot! > > > >br, Isto > > Greetings, > > Applying a few optimations can make it about 3x faster. > > 1. Hoist the array out of your loops. (See generateNumbers32, > initialiseGenerator32 and genRNums). > 2. Don't create too many new MT32 boxes. Most of the time is spent in > 'next32' and changing its type to 'IOUArray Int Word32 -> Int -> IO > (Word32, Int)' makes it much faster. > 3. Demand more inlining. If you're using GHC, > -funfolding-use-threshold=16 will substantially improve the > performance. > > Using 'seq' is generally a bad idea. It can worsen the performance if > not used carefully and GHCs strictness analyser is usually good > enough. > I used the profiler and -ddump-simpl to analyse this program. > > Donald suggested manual unboxing. However, in this case it won't help > much (if at all) since GHC is doing such an excellent job on its own.
I wasn't suggesting manual unboxing, more that you should carefully inspect the Core, and tune with bang patterns where necessary. -funfolding-use-threshold=16 is a good idea though. or =100 ;) -- Don _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe