[Haskell-cafe] How to improve speed? (MersenneTwister is several times slower than C version)

2006-11-01 Thread isto
Hi all,

On HaWiki was an announcement of MersenneTwister made by Lennart
Augustsson.  On a typical run to find out 1000th rnd num the output
is (code shown below):

$ time ./testMTla
Testing Mersenne Twister.
Result is [3063349438]

real0m4.925s
user0m4.856s


I was exercising with the very same algorithm and tried to make it
efficient (by using IOUArray): now a typical run looks like (code shown
below):

$ time ./testMT
Testing Mersenne Twister.
3063349438

real0m3.032s
user0m3.004s


The original C-version (modified so that only the last number is
shown) gives typically

$ time ./mt19937ar
outputs of genrand_int32()
3063349438

real0m0.624s
user0m0.616s

Results are similar with 64 bit IOUArray against 64 bit C variant.
C seems to work about 5 to 10 times faster in this case.

I have tried to do different things but now I'm stuck.  unsafeRead
and unsafeWrite improved a bit the lazy (STUArray-version) and
IOUArray-versions but not very much.  I took a look of Core file but
then, I'm not sure where the boxed values are ok. E.g. should  IOUArray
Int Word64  be replaced with something else?

Any hints and comments on how to improve the efficiency and make
everything better will be appreciated a lot!  

br, Isto

- testMTla.hs (MersenneTwister, see HaWiki)
module Main where

-- ghc -O3 -optc-O3 -optc-ffast-math -fexcess-precision --make testMTla

import MersenneTwister

main = do
putStrLn "Testing Mersenne Twister."
let mt = mersenneTwister 100
w = take 1 (drop 999 mt)
-- w = take 1 (drop 99 mt)
putStrLn $ "Result is " ++ (show w)
-

- testMT.hs
module Main where

-- Compile eg with
--   ghc -O3 -optc-O3 -optc-ffast-math -fexcess-precision --make testMT

import Mersenne

genRNums32 :: MT32 -> Int -> IO (MT32)
genRNums32 mt nCnt = gRN mt nCnt 
where gRN :: MT32 -> Int -> IO (MT32)
  gRN mt nCnt | mt `seq` nCnt `seq` False = undefined
  gRN mt 1= do 
(r,mt') <- next32 mt
putStrLn $ (show r)
return mt'
  gRN mt nCnt = do
(r,mt') <- next32 mt
gRN mt' $! (nCnt-1) 


main = do
putStrLn "Testing Mersenne Twister."
mt32 <- initialiseGenerator32 100
genRNums32 mt32 1000
-

- Mersenne.hs (sorry for linewraps)
module Mersenne where

import Data.Bits
import Data.Word
import Data.Array.Base
import Data.Array.MArray
import Data.Array.IO
-- import System.Random


data MT32 = MT32 (IOUArray Int Word32) Int
data MT64 = MT64 (IOUArray Int Word64) Int


last32bitsof :: Word32 -> Word32 
last32bitsof a = a .&. 0x -- == (2^32-1)  

lm32 = 0x7fff :: Word32
um32 = 0x8000 :: Word32
mA32 = 0x9908b0df :: Word32 -- == 2567483615

-- Array of length 624.
initialiseGenerator32 :: Int -> IO MT32 
initialiseGenerator32 seed = do
let s = last32bitsof (fromIntegral seed)::Word32
mt <- newArray (0,623) (0::Word32)
unsafeWrite mt 0 s
iG mt s 1
mt' <- generateNumbers32 mt
return (MT32 mt' 0)
where
iG :: (IOUArray Int Word32) -> Word32 -> Int -> IO (IOUArray Int
Word32)
iG mt lastNro n  
| n == 624= return mt
| otherwise = do let n1 = lastNro `xor` (shiftR lastNro 
30)
 new = (1812433253 * n1 + 
(fromIntegral n)::Word32) 
 unsafeWrite mt n new
 iG mt new (n+1)


generateNumbers32 :: (IOUArray Int Word32) -> IO (IOUArray Int Word32)
generateNumbers32 mt = gLoop 0 mt
where
gLoop :: Int -> (IOUArray Int Word32) -> IO (IOUArray Int 
Word32)
gLoop i mt 
| i==623  = do 
wL <- unsafeRead mt 623
w0 <- unsafeRead mt 0
w396 <- unsafeRead mt 396
let y = (wL .&. um32) .|. (w0 .&. lm32) :: 
Word32
if even y 
   then unsafeWrite mt 623 (w396 `xor` (shiftR 
y 1))
   else unsafeWrite mt 623 (w396 `xor` (shiftR 
y 1) `xor` mA32)
return mt
| otherwise = do
wi  <- unsafeRead mt i
wi1 <- unsafeRead mt (i+1) 
w3  <- unsafeRead mt ((i+397) `mod` 624)
let y = (wi .&. um32) .|. (wi1 .&. lm32)
if even y 
   then unsafeWrite mt i (w3 `xor` (shiftR y 

Re: [Haskell-cafe] How to improve speed? (MersenneTwister is several times slower than C version)

2006-11-01 Thread Donald Bruce Stewart
Now, this will be hard to get close the the highly tuned C. Possibly its
doable.

The main tricks are documented here:
http://haskell.org/haskellwiki/Performance/GHC#Unboxed_types

Inspecting the Core to ensure the math is being inlined and unboxed will
be the most crucial issue, I'd imagine.

Then again, an FFI binding to mersenne.c is also a good idea :)

-- Don


isto.aho:
> Hi all,
> 
> On HaWiki was an announcement of MersenneTwister made by Lennart
> Augustsson.  On a typical run to find out 1000th rnd num the output
> is (code shown below):
> 
> $ time ./testMTla
> Testing Mersenne Twister.
> Result is [3063349438]
> 
> real0m4.925s
> user0m4.856s
> 
> 
> I was exercising with the very same algorithm and tried to make it
> efficient (by using IOUArray): now a typical run looks like (code shown
> below):
> 
> $ time ./testMT
> Testing Mersenne Twister.
> 3063349438
> 
> real0m3.032s
> user0m3.004s
> 
> 
> The original C-version (modified so that only the last number is
> shown) gives typically
> 
> $ time ./mt19937ar
> outputs of genrand_int32()
> 3063349438
> 
> real0m0.624s
> user0m0.616s
> 
> Results are similar with 64 bit IOUArray against 64 bit C variant.
> C seems to work about 5 to 10 times faster in this case.
> 
> I have tried to do different things but now I'm stuck.  unsafeRead
> and unsafeWrite improved a bit the lazy (STUArray-version) and
> IOUArray-versions but not very much.  I took a look of Core file but
> then, I'm not sure where the boxed values are ok. E.g. should  IOUArray
> Int Word64  be replaced with something else?
> 
> Any hints and comments on how to improve the efficiency and make
> everything better will be appreciated a lot!  
> 
> br, Isto
> 
> - testMTla.hs (MersenneTwister, see HaWiki)
> module Main where
> 
> -- ghc -O3 -optc-O3 -optc-ffast-math -fexcess-precision --make testMTla
> 
> import MersenneTwister
> 
> main = do
>   putStrLn "Testing Mersenne Twister."
>   let mt = mersenneTwister 100
>   w = take 1 (drop 999 mt)
>   -- w = take 1 (drop 99 mt)
>   putStrLn $ "Result is " ++ (show w)
> -
> 
> - testMT.hs
> module Main where
> 
> -- Compile eg with
> --   ghc -O3 -optc-O3 -optc-ffast-math -fexcess-precision --make testMT
> 
> import Mersenne
> 
> genRNums32 :: MT32 -> Int -> IO (MT32)
> genRNums32 mt nCnt = gRN mt nCnt 
>   where gRN :: MT32 -> Int -> IO (MT32)
> gRN mt nCnt | mt `seq` nCnt `seq` False = undefined
> gRN mt 1= do 
>   (r,mt') <- next32 mt
>   putStrLn $ (show r)
>   return mt'
> gRN mt nCnt = do
>   (r,mt') <- next32 mt
>   gRN mt' $! (nCnt-1) 
> 
> 
> main = do
>   putStrLn "Testing Mersenne Twister."
>   mt32 <- initialiseGenerator32 100
>   genRNums32 mt32 1000
> -
> 
> - Mersenne.hs (sorry for linewraps)
> module Mersenne where
> 
> import Data.Bits
> import Data.Word
> import Data.Array.Base
> import Data.Array.MArray
> import Data.Array.IO
> -- import System.Random
> 
> 
> data MT32 = MT32 (IOUArray Int Word32) Int
> data MT64 = MT64 (IOUArray Int Word64) Int
> 
> 
> last32bitsof :: Word32 -> Word32 
> last32bitsof a = a .&. 0x -- == (2^32-1)  
> 
> lm32 = 0x7fff :: Word32
> um32 = 0x8000 :: Word32
> mA32 = 0x9908b0df :: Word32 -- == 2567483615
> 
> -- Array of length 624.
> initialiseGenerator32 :: Int -> IO MT32 
> initialiseGenerator32 seed = do
>   let s = last32bitsof (fromIntegral seed)::Word32
>   mt <- newArray (0,623) (0::Word32)
>   unsafeWrite mt 0 s
>   iG mt s 1
>   mt' <- generateNumbers32 mt
>   return (MT32 mt' 0)
>   where
>   iG :: (IOUArray Int Word32) -> Word32 -> Int -> IO (IOUArray Int
> Word32)
>   iG mt lastNro n  
>   | n == 624= return mt
>   | otherwise = do let n1 = lastNro `xor` (shiftR lastNro 
> 30)
>new = (1812433253 * n1 + 
> (fromIntegral n)::Word32) 
>unsafeWrite mt n new
>iG mt new (n+1)
> 
> 
> generateNumbers32 :: (IOUArray Int Word32) -> IO (IOUArray Int Word32)
> generateNumbers32 mt = gLoop 0 mt
>   where
>   gLoop :: Int -> (IOUArray Int Word32) -> IO (IOUArray Int 
> Word32)
>   gLoop i mt 
>   | i==623  = do 
>   wL <- unsafeRead mt 623
>   w0 <- unsafeRead mt 0
>   w396 <- unsafeRead mt 396
>   let y = (wL .&. um32) .|. (w0 .&. lm32) :: 
> Word32
>   if even y 
>  then unsafe

Re: [Haskell-cafe] How to improve speed? (MersenneTwister is several times slower than C version)

2006-11-01 Thread Lennart Augustsson
A big problem with the Mersenne Twister is the shifts.  As has been  
noted elsewhere, ghc doesn't do such a great job on those.


-- Lennart

On Nov 1, 2006, at 20:17 , Donald Bruce Stewart wrote:

Now, this will be hard to get close the the highly tuned C.  
Possibly its

doable.

The main tricks are documented here:
http://haskell.org/haskellwiki/Performance/GHC#Unboxed_types

Inspecting the Core to ensure the math is being inlined and unboxed  
will

be the most crucial issue, I'd imagine.

Then again, an FFI binding to mersenne.c is also a good idea :)

-- Don


isto.aho:

Hi all,

On HaWiki was an announcement of MersenneTwister made by Lennart
Augustsson.  On a typical run to find out 1000th rnd num the  
output

is (code shown below):

$ time ./testMTla
Testing Mersenne Twister.
Result is [3063349438]

real0m4.925s
user0m4.856s


I was exercising with the very same algorithm and tried to make it
efficient (by using IOUArray): now a typical run looks like (code  
shown

below):

$ time ./testMT
Testing Mersenne Twister.
3063349438

real0m3.032s
user0m3.004s


The original C-version (modified so that only the last number is
shown) gives typically

$ time ./mt19937ar
outputs of genrand_int32()
3063349438

real0m0.624s
user0m0.616s

Results are similar with 64 bit IOUArray against 64 bit C variant.
C seems to work about 5 to 10 times faster in this case.

I have tried to do different things but now I'm stuck.  unsafeRead
and unsafeWrite improved a bit the lazy (STUArray-version) and
IOUArray-versions but not very much.  I took a look of Core file but
then, I'm not sure where the boxed values are ok. E.g. should   
IOUArray

Int Word64  be replaced with something else?

Any hints and comments on how to improve the efficiency and make
everything better will be appreciated a lot!

br, Isto

- testMTla.hs (MersenneTwister, see  
HaWiki)

module Main where

-- ghc -O3 -optc-O3 -optc-ffast-math -fexcess-precision --make  
testMTla


import MersenneTwister

main = do
putStrLn "Testing Mersenne Twister."
let mt = mersenneTwister 100
w = take 1 (drop 999 mt)
-- w = take 1 (drop 99 mt)
putStrLn $ "Result is " ++ (show w)
-

- testMT.hs
module Main where

-- Compile eg with
--   ghc -O3 -optc-O3 -optc-ffast-math -fexcess-precision --make  
testMT


import Mersenne

genRNums32 :: MT32 -> Int -> IO (MT32)
genRNums32 mt nCnt = gRN mt nCnt
where gRN :: MT32 -> Int -> IO (MT32)
  gRN mt nCnt | mt `seq` nCnt `seq` False = undefined
  gRN mt 1= do
(r,mt') <- next32 mt
putStrLn $ (show r)
return mt'
  gRN mt nCnt = do
(r,mt') <- next32 mt
gRN mt' $! (nCnt-1)


main = do
putStrLn "Testing Mersenne Twister."
mt32 <- initialiseGenerator32 100
genRNums32 mt32 1000
-

- Mersenne.hs (sorry for linewraps)
module Mersenne where

import Data.Bits
import Data.Word
import Data.Array.Base
import Data.Array.MArray
import Data.Array.IO
-- import System.Random


data MT32 = MT32 (IOUArray Int Word32) Int
data MT64 = MT64 (IOUArray Int Word64) Int


last32bitsof :: Word32 -> Word32
last32bitsof a = a .&. 0x -- == (2^32-1)

lm32 = 0x7fff :: Word32
um32 = 0x8000 :: Word32
mA32 = 0x9908b0df :: Word32 -- == 2567483615

-- Array of length 624.
initialiseGenerator32 :: Int -> IO MT32
initialiseGenerator32 seed = do
let s = last32bitsof (fromIntegral seed)::Word32
mt <- newArray (0,623) (0::Word32)
unsafeWrite mt 0 s
iG mt s 1
mt' <- generateNumbers32 mt
return (MT32 mt' 0)
where
iG :: (IOUArray Int Word32) -> Word32 -> Int -> IO (IOUArray Int
Word32)
iG mt lastNro n
| n == 624= return mt
| otherwise = do let n1 = lastNro `xor` (shiftR lastNro 
30)
 new = (1812433253 * n1 + 
(fromIntegral n)::Word32)
 unsafeWrite mt n new
 iG mt new (n+1)


generateNumbers32 :: (IOUArray Int Word32) -> IO (IOUArray Int  
Word32)

generateNumbers32 mt = gLoop 0 mt
where
gLoop :: Int -> (IOUArray Int Word32) -> IO (IOUArray Int 
Word32)
gLoop i mt
| i==623  = do
wL <- unsafeRead mt 623
w0 <- unsafeRead mt 0
w396 <- unsafeRead mt 396
let y = (wL .&. um32) .|. (w0 .&. lm32) :: 
Word32
if even y
   then unsa

Re: [Haskell-cafe] How to improve speed? (MersenneTwister is several times slower than C version)

2006-11-01 Thread Lemmih

On 11/1/06, isto <[EMAIL PROTECTED]> wrote:

Hi all,

On HaWiki was an announcement of MersenneTwister made by Lennart
Augustsson.  On a typical run to find out 1000th rnd num the output
is (code shown below):

$ time ./testMTla
Testing Mersenne Twister.
Result is [3063349438]

real0m4.925s
user0m4.856s


I was exercising with the very same algorithm and tried to make it
efficient (by using IOUArray): now a typical run looks like (code shown
below):

$ time ./testMT
Testing Mersenne Twister.
3063349438

real0m3.032s
user0m3.004s


The original C-version (modified so that only the last number is
shown) gives typically

$ time ./mt19937ar
outputs of genrand_int32()
3063349438

real0m0.624s
user0m0.616s

Results are similar with 64 bit IOUArray against 64 bit C variant.
C seems to work about 5 to 10 times faster in this case.

I have tried to do different things but now I'm stuck.  unsafeRead
and unsafeWrite improved a bit the lazy (STUArray-version) and
IOUArray-versions but not very much.  I took a look of Core file but
then, I'm not sure where the boxed values are ok. E.g. should  IOUArray
Int Word64  be replaced with something else?

Any hints and comments on how to improve the efficiency and make
everything better will be appreciated a lot!

br, Isto


Greetings,

Applying a few optimations can make it about 3x faster.

1. Hoist the array out of your loops. (See generateNumbers32,
initialiseGenerator32 and genRNums).
2. Don't create too many new MT32 boxes. Most of the time is spent in
'next32' and changing its type to 'IOUArray Int Word32 -> Int -> IO
(Word32, Int)' makes it much faster.
3. Demand more inlining. If you're using GHC,
-funfolding-use-threshold=16 will substantially improve the
performance.

Using 'seq' is generally a bad idea. It can worsen the performance if
not used carefully and GHCs strictness analyser is usually good
enough.
I used the profiler and -ddump-simpl to analyse this program.

Donald suggested manual unboxing. However, in this case it won't help
much (if at all) since GHC is doing such an excellent job on its own.

--
Cheers,
 Lemmih
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] How to improve speed? (MersenneTwister is several times slower than C version)

2006-11-01 Thread Lemmih

On 11/2/06, Lennart Augustsson <[EMAIL PROTECTED]> wrote:

A big problem with the Mersenne Twister is the shifts.  As has been
noted elsewhere, ghc doesn't do such a great job on those.


Actually, the shifts are only evaluated once (hurrah for lazy
evaluation) and with -funfolding-use-threshold=16 they're all compiled
to unchecked primitives (GHC.Prim.uncheckedShiftRL#).

--
Cheers,
 Lemmih
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] How to improve speed? (MersenneTwister is several times slower than C version)

2006-11-01 Thread Lennart Augustsson
The whole point of writing the Mersenne Twister was that I wanted to  
show how a stateful computation could be encapsulated in the ST monad  
and none of it showing up outside.  This aspect of the code is  
totally gone now when everything is in the IO monad.  Is there some  
good reason to have it in the IO monad?


-- Lennart

On Nov 1, 2006, at 20:51 , Lemmih wrote:


On 11/1/06, isto <[EMAIL PROTECTED]> wrote:

Hi all,

On HaWiki was an announcement of MersenneTwister made by Lennart
Augustsson.  On a typical run to find out 1000th rnd num the  
output

is (code shown below):

$ time ./testMTla
Testing Mersenne Twister.
Result is [3063349438]

real0m4.925s
user0m4.856s


I was exercising with the very same algorithm and tried to make it
efficient (by using IOUArray): now a typical run looks like (code  
shown

below):

$ time ./testMT
Testing Mersenne Twister.
3063349438

real0m3.032s
user0m3.004s


The original C-version (modified so that only the last number is
shown) gives typically

$ time ./mt19937ar
outputs of genrand_int32()
3063349438

real0m0.624s
user0m0.616s

Results are similar with 64 bit IOUArray against 64 bit C variant.
C seems to work about 5 to 10 times faster in this case.

I have tried to do different things but now I'm stuck.  unsafeRead
and unsafeWrite improved a bit the lazy (STUArray-version) and
IOUArray-versions but not very much.  I took a look of Core file but
then, I'm not sure where the boxed values are ok. E.g. should   
IOUArray

Int Word64  be replaced with something else?

Any hints and comments on how to improve the efficiency and make
everything better will be appreciated a lot!

br, Isto


Greetings,

Applying a few optimations can make it about 3x faster.

1. Hoist the array out of your loops. (See generateNumbers32,
initialiseGenerator32 and genRNums).
2. Don't create too many new MT32 boxes. Most of the time is spent in
'next32' and changing its type to 'IOUArray Int Word32 -> Int -> IO
(Word32, Int)' makes it much faster.
3. Demand more inlining. If you're using GHC,
-funfolding-use-threshold=16 will substantially improve the
performance.

Using 'seq' is generally a bad idea. It can worsen the performance if
not used carefully and GHCs strictness analyser is usually good
enough.
I used the profiler and -ddump-simpl to analyse this program.

Donald suggested manual unboxing. However, in this case it won't help
much (if at all) since GHC is doing such an excellent job on its own.

--
Cheers,
 Lemmih
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] How to improve speed? (MersenneTwister is several times slower than C version)

2006-11-01 Thread Donald Bruce Stewart
lemmih:
> On 11/1/06, isto <[EMAIL PROTECTED]> wrote:
> >Hi all,
> >
> >On HaWiki was an announcement of MersenneTwister made by Lennart
> >Augustsson.  On a typical run to find out 1000th rnd num the output
> >is (code shown below):
> >
> >$ time ./testMTla
> >Testing Mersenne Twister.
> >Result is [3063349438]
> >
> >real0m4.925s
> >user0m4.856s
> >
> >
> >I was exercising with the very same algorithm and tried to make it
> >efficient (by using IOUArray): now a typical run looks like (code shown
> >below):
> >
> >$ time ./testMT
> >Testing Mersenne Twister.
> >3063349438
> >
> >real0m3.032s
> >user0m3.004s
> >
> >
> >The original C-version (modified so that only the last number is
> >shown) gives typically
> >
> >$ time ./mt19937ar
> >outputs of genrand_int32()
> >3063349438
> >
> >real0m0.624s
> >user0m0.616s
> >
> >Results are similar with 64 bit IOUArray against 64 bit C variant.
> >C seems to work about 5 to 10 times faster in this case.
> >
> >I have tried to do different things but now I'm stuck.  unsafeRead
> >and unsafeWrite improved a bit the lazy (STUArray-version) and
> >IOUArray-versions but not very much.  I took a look of Core file but
> >then, I'm not sure where the boxed values are ok. E.g. should  IOUArray
> >Int Word64  be replaced with something else?
> >
> >Any hints and comments on how to improve the efficiency and make
> >everything better will be appreciated a lot!
> >
> >br, Isto
> 
> Greetings,
> 
> Applying a few optimations can make it about 3x faster.
> 
> 1. Hoist the array out of your loops. (See generateNumbers32,
> initialiseGenerator32 and genRNums).
> 2. Don't create too many new MT32 boxes. Most of the time is spent in
> 'next32' and changing its type to 'IOUArray Int Word32 -> Int -> IO
> (Word32, Int)' makes it much faster.
> 3. Demand more inlining. If you're using GHC,
> -funfolding-use-threshold=16 will substantially improve the
> performance.
> 
> Using 'seq' is generally a bad idea. It can worsen the performance if
> not used carefully and GHCs strictness analyser is usually good
> enough.
> I used the profiler and -ddump-simpl to analyse this program.
> 
> Donald suggested manual unboxing. However, in this case it won't help
> much (if at all) since GHC is doing such an excellent job on its own.

I wasn't suggesting manual unboxing, more that you should carefully
inspect the Core, and tune with bang patterns where necessary. 

-funfolding-use-threshold=16 is a good idea though. or =100 ;)

-- Don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] How to improve speed? (MersenneTwister is several times slower than C version)

2006-11-01 Thread isto
Hi, 

When writing IO version, I wasn't aware of other twister versions,
and the only reason is/was that it was easiest to me and that I knew
(believed) that plain lists would have been inefficient.  I just wanted
to see and learn, how close to C version this can be made.  (And still
do.)

There were some good suggestions on this thread - next I'll try 
to get grasp on how to apply the suggestions and do something...

br, Isto

ke, 2006-11-01 kello 22:04 -0500, Lennart Augustsson kirjoitti:
> The whole point of writing the Mersenne Twister was that I wanted to  
> show how a stateful computation could be encapsulated in the ST monad  
> and none of it showing up outside.  This aspect of the code is  
> totally gone now when everything is in the IO monad.  Is there some  
> good reason to have it in the IO monad?
> 


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] How to improve speed? (MersenneTwister is several times slower than C version)

2006-11-02 Thread Bulat Ziganshin
Hello isto,

Thursday, November 2, 2006, 1:16:55 AM, you wrote:

> I have tried to do different things but now I'm stuck.  unsafeRead
> and unsafeWrite improved a bit the lazy (STUArray-version) and

why you think it's a lazy? :)  ST monad is just the same as IO monad
internally, only types are different (there is also Lazy.ST monad -
this is really lazy)

10-20 times difference is typical for GHC programs. i think that
better results are observed only when C version is really bound by
memory speed (that is rather typical situation)

use JHC if you need to generate really fast code. or use ocaml/clean



-- 
Best regards,
 Bulatmailto:[EMAIL PROTECTED]

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] How to improve speed? (MersenneTwister is several times slower than C version)

2006-11-02 Thread Donald Bruce Stewart
bulat.ziganshin:
> Hello isto,
> 
> Thursday, November 2, 2006, 1:16:55 AM, you wrote:
> 
> > I have tried to do different things but now I'm stuck.  unsafeRead
> > and unsafeWrite improved a bit the lazy (STUArray-version) and
> 
> why you think it's a lazy? :)  ST monad is just the same as IO monad
> internally, only types are different (there is also Lazy.ST monad -
> this is really lazy)
> 
> 10-20 times difference is typical for GHC programs.

!

It's really more like 2-4x. Sometimes better than C.

Where's this huge figure coming from Bulat? If you have code that
behaves like this, you should report it.

-- Don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] How to improve speed? (MersenneTwister is several times slower than C version)

2006-11-02 Thread Lennart Augustsson

Oh, sorry, I thought your version was a rewritten version of mine. :)
The names are so similar, after all.

On Nov 2, 2006, at 02:26 , isto wrote:


Hi,

When writing IO version, I wasn't aware of other twister versions,
and the only reason is/was that it was easiest to me and that I knew
(believed) that plain lists would have been inefficient.  I just  
wanted

to see and learn, how close to C version this can be made.  (And still
do.)

There were some good suggestions on this thread - next I'll try
to get grasp on how to apply the suggestions and do something...

br, Isto

ke, 2006-11-01 kello 22:04 -0500, Lennart Augustsson kirjoitti:

The whole point of writing the Mersenne Twister was that I wanted to
show how a stateful computation could be encapsulated in the ST monad
and none of it showing up outside.  This aspect of the code is
totally gone now when everything is in the IO monad.  Is there some
good reason to have it in the IO monad?





___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] How to improve speed? (MersenneTwister is several times slower than C version)

2006-11-02 Thread isto
Hi & no problems (I didn't tell it clearly right away),

I modified the code along the comments given by Lemmih and things
improved a lot.  mod-operator can be removed by two loops as in C
version, which still further improved the speed.  I tried this with
the old version and the speed-up was next to nothing.  The new 
version can be found below.

Now, the results are comparable to the C version.  On the second run 
it was even better, but usually it is about 0.78 on my machine.

$ time ./testMT
Testing Mersenne Twister.
3063349438

real0m0.791s
user0m0.776s
sys 0m0.008s

$ time ./testMT
Testing Mersenne Twister.
3063349438

real0m0.414s
user0m0.400s

Can somebody tell, why this happens?  There were similar timings
with ST-version and with the old version (so that it sometimes runs
almost twice as fast or twice as slow than normally).  Is this something
system dependent (amd64/ubuntu)?  C version seems to be within 10% 
that is between 0.60 and 0.66.  

Anyhow, this random generator now seems to randomly outperform this 
particular C version :)  C version is located at
http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html

About Lemmih's proposals.  Did I follow them correctly?  I'm using ghc
and there the inlining demanding seemed to work.  With =100 there are
more often 0.4 timings than with =16, but also =100 gives 0.78 seconds
often (majority of runs).  

And can something still be done?  At this point I would leave FFI and 
other languages out as one goal here is to learn Haskelling and somehow
solutions that require writing code only in one language sounds nicer.

br, Isto


to, 2006-11-02 kello 08:18 -0500, Lennart Augustsson kirjoitti:
> Oh, sorry, I thought your version was a rewritten version of mine. :)
> The names are so similar, after all.

-- Mersenne.hs (64bit version removed)
module Mersenne where

import Data.Bits
import Data.Word
import Data.Array.Base
import Data.Array.MArray
import Data.Array.IO

data MT32 = MT32 (IOUArray Int Word32) Int

last32bitsof :: Word32 -> Word32 
last32bitsof a = a .&. 0x -- == (2^32-1)  

lm32 = 0x7fff :: Word32
um32 = 0x8000 :: Word32
mA32 = 0x9908b0df :: Word32 -- == 2567483615

-- Array of length 624.
initialiseGenerator32 :: Int -> IO MT32 
initialiseGenerator32 seed = do
let s = last32bitsof (fromIntegral seed)::Word32
mt <- newArray (0,623) (0::Word32)
unsafeWrite mt 0 s
mtLoop mt s 1
generateNumbers32 mt
return (MT32 mt 0)

mtLoop :: (IOUArray Int Word32) -> Word32 -> Int -> IO (IOUArray Int
Word32) 
mtLoop mt lastNro n = loop lastNro n
where
loop :: Word32 -> Int -> IO (IOUArray Int Word32) 
loop lastNro n  
| n == 624= return mt
| otherwise = do 
let n1 = lastNro `xor` (shiftR lastNro 30)
new = (1812433253 * n1 + (fromIntegral 
n)::Word32) 
unsafeWrite mt n new
loop new $! (n+1)


generateNumbers32 :: (IOUArray Int Word32) -> IO ()
generateNumbers32 mt = do
gLoop1 0 
gLoop2 227
wL <- unsafeRead mt 623
w0 <- unsafeRead mt 0
w396 <- unsafeRead mt 396
let y = (wL .&. um32) .|. (w0 .&. lm32) :: Word32
if even y 
then unsafeWrite mt 623 (w396 `xor` (shiftR y 1))
else unsafeWrite mt 623 (w396 `xor` (shiftR y 1) `xor` mA32)
return ()
where
gLoop1 :: Int -> IO ()
gLoop1 227  = return ()
gLoop1 i= do 
wi  <- unsafeRead mt i
wi1 <- unsafeRead mt (i+1) 
-- w3  <- unsafeRead mt ((i+397) `mod` 624)
w3  <- unsafeRead mt (i+397)
let y = (wi .&. um32) .|. (wi1 .&. lm32)
if even y 
   then unsafeWrite mt i (w3 `xor` (shiftR y 1))
   else unsafeWrite mt i (w3 `xor` (shiftR y 1) `xor` 
mA32)
gLoop1 $! (i+1)
gLoop2 :: Int -> IO ()
gLoop2 623  = return ()
gLoop2 i= do 
wi  <- unsafeRead mt i
wi1 <- unsafeRead mt (i+1) 
-- w3  <- unsafeRead mt ((i+397) `mod` 624)
w3  <- unsafeRead mt (i-227)
let y = (wi .&. um32) .|. (wi1 .&. lm32)
if even y 
   then unsafeWrite mt i (w3 `xor` (shiftR y 1))
   else unsafeWrite mt i (w3 `xor` (shiftR y 1) `xor` 
mA32)
gLoop2 $! (i+1)

{-
gLoop :: Int -> IO ()
gLoop 623  = do 
wL <- unsafeRead mt 623
w0 <-

Re[2]: [Haskell-cafe] How to improve speed? (MersenneTwister is several times slower than C version)

2006-11-02 Thread Bulat Ziganshin
Hello Lennart,

Thursday, November 2, 2006, 4:34:04 AM, you wrote:

> A big problem with the Mersenne Twister is the shifts.  As has been  
> noted elsewhere, ghc doesn't do such a great job on those.

#ifdef __GLASGOW_HASKELL__

(I# a) <<# (I# b) = (I# (a `iShiftL#` b))
(I# a) >># (I# b) = (I# (a `uncheckedIShiftRL#` b))

#else /* ! __GLASGOW_HASKELL__ */

a <<# b = a `shiftL` b
a >># b = a `shiftR` b

#endif /* ! __GLASGOW_HASKELL__ */



-- 
Best regards,
 Bulatmailto:[EMAIL PROTECTED]

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re[2]: [Haskell-cafe] How to improve speed? (MersenneTwister is several times slower than C version)

2006-11-02 Thread Bulat Ziganshin
Hello Lennart,

Thursday, November 2, 2006, 6:04:39 AM, you wrote:

> The whole point of writing the Mersenne Twister was that I wanted to  
> show how a stateful computation could be encapsulated in the ST monad
> and none of it showing up outside.  This aspect of the code is  
> totally gone now when everything is in the IO monad.  Is there some  
> good reason to have it in the IO monad?

i think no. ST computations internally are really the same as IO ones



-- 
Best regards,
 Bulatmailto:[EMAIL PROTECTED]

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re[2]: [Haskell-cafe] How to improve speed? (MersenneTwister is several times slower than C version)

2006-11-03 Thread Bulat Ziganshin
Hello Donald,

Thursday, November 2, 2006, 2:21:31 PM, you wrote:

>> 10-20 times difference is typical for GHC programs.

> It's really more like 2-4x. Sometimes better than C.

> Where's this huge figure coming from Bulat? If you have code that
> behaves like this, you should report it.

are you analyzed the cases where performance is close or even better?
i does. it's either because C version is limited by memory performance
or just use different, less efficient algorithm

the cases which shows slowness of ghc-generated code is factorial
algorithm and the program attached. despite that Haskell code is far
uglier, C version outperforms it 20 times. run both programs with
arrays of about 10k elements to see the difference:

a.out 1 elements 10 iterations

in February i've written detailed explanation (in ghc-users) of why this
comes and made some suggestions on improving it. of course, main
problem is ghc's own code generator which is far away from gcc or even
ocaml ones

-- 
Best regards,
 Bulatmailto:[EMAIL PROTECTED]

full.c
Description: Binary data


newest.hs
Description: Binary data
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe