Am 28.09.2011 02:35, schrieb Ryan Ingram:
My guess is that Cont plays really nicely with GHC's inliner, so things that end up looking like

 return x >>= \y -> ...

get optimized really well

    return x >>= f
    -- inline >>=
    = ContState $ \s0 k -> runCS (return x) s0 $ \a s1 -> runCS (f a) s1 k
    -- inline return
= ContState $ \s0 k -> runCS (ContState $ \s2 k2 -> k2 x s2) s0 $ \a s1 -> runCS (f a) s1 k
    -- runCS record selector
= ContState $ \s0 k -> (\s2 k2 -> k2 x s2) s0 $ \a s1 -> runCS (f a) s1 k
    -- beta
    = ContState $ \s0 k -> (\k2 -> k2 x s0) $ \a s1 -> runCS (f a) s1 k
    -- beta
    = ContState $ \s0 k -> (\a s1 -> runCS (f a) s1 k) x s0
    -- beta
    = ContState $ \s0 k -> runCS (f x) s0 k

and then further inlining of f can take place.

I was even thinking - and this would have been the next idea to try if I couldn't get your example code to run so fast - to define some rules for the state monad (transformer) to "fuse" such expressions like

m >>= f >>= g = ...

or even

modify f >>= modify g = modify (g . f)

and perhaps other variations, so that it would perhaps end up in some nice combination of f and g, avoiding the intermediate tuples, hopefully with better performance. But then I did not follow it, and I want to concentrate on further improvements with the new code. The way is still long, because the top engines (written in C or C++) can do about 10 mil nps on my machine :-)

Nicu

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Reply via email to