On Thursday 17 June 2010 11:43:09, Roman Cheplyaka wrote:
> * Roman Cheplyaka <r...@ro-che.info> [2010-06-17 12:40:59+0300]
>
> > I'm trying to optimize the following program:
> > http://github.com/feuerbach/particles/blob/303c8a17c9b732e22457b5409bd
> >ce4b7520be94a/run.hs
> >
> > Of course general suggestions are welcome (BTW I'm going to give a try
> > to vector), but currently I'm concerned with two questions:
> >
> > 1. Heavy allocations in 'distance' function. Here is (part of) the
> > profile:
> >
> > COST CENTRE   MODULE    %time %alloc  ticks     bytes
> >
> > d2            Main        9.0   22.0    290 600000000
> > d             Main        8.6   65.9    278 1800000000
> > d1            Main        7.5   11.0    242 299700000
> >

I suspect the distance function is not what you intended,


distance :: Double -> Double -> Double
distance !x1 !x2 = {-# SCC "min" #-} min d1 d2
    where
    d = {-# SCC "d" #-} x1 - x2
    d1 = {-# SCC "d1" #-} abs d
    d2 = {-# SCC "d2" #-} abs $ l - d

that would give

distance 0.2 24.8 = 24.6, while the wrapping suggests that it should be 
0.4, so in d2, it should be d1 instead of d.
Either way, both d and d1 are <= 25, so the 'abs' in d2 is superfluous, 
removing that alone reduces the allocations drastically and the running 
time by ~40% (astonishingly, not in the profiling version, I suspect it's 
because profiling needs a few registers so that there aren't enough left 
for the loops on my box).

Further, if you export only main from the module, you allow GHC to be more 
aggressive with optimising. On my box, that leads to more allocation again 
because there aren't enough registers, but things become a little faster.

Also, a few more bangs here and there plus a couple of INLINE and UNPACK 
pragmas speed things up, the (on my box) fastest combination I've found is 
attached, it has the same semantics for distance as the original code, 
changing distance to what I believe it should be unfortunately slows it 
down significantly.

On my box, I get a further big speedup by compiling with

-O2 -fexcess-precision -fvia-C -optc-O3

> > >From reading core I got the impression that everything is strict &
> >
> > unboxed.

Not everything, there lurk a few boxed Doubles e.g. in average.

> > Perhaps this is related to creating some closures? How to get
> > rid of those allocations?
> >

Do you need to? Sometimes an allocating loop is faster than a non-
allocating one (of course, if you have enough registers for the allocating 
loop to run entirely in registers, it'll be much faster still).

IMO, the important criteria are time and resident memory, not allocation.

> > 2. Again from reading the core I learned that although 'l' and other
> > constants are inlined, their type is boxed Double. This makes sense
> > since CAFs are evaluated on demand, but obviously in this particular
> > case it does not make sense, so can I somehow make them unboxed?

Putting bangs in the loops where they are used likely uses the unboxed 
values; not exporting them too.

>
> Forgot to mention, I'm using ghc 6.12.1, compiling with -O2.

{-# LANGUAGE BangPatterns #-}

module Main (main)
where
import System.Random
import Text.Printf
import Data.List
import System.IO


{-# INLINE r #-}
r = 1
{-# INLINE r2 #-}
r2 = r*r
n = 1000
time = 100
{-# INLINE l #-}
l = 25
{-# INLINE h #-}
h = 0.01

data Point = Point {-# UNPACK #-} !Double {-# UNPACK #-} !Double
data Particle = Particle { point :: !Point, angle :: !Double } -- point, angle

{-# INLINE distance #-}
distance :: Double -> Double -> Double
distance !x1 !x2
    | x1 < x2   = x2-x1
    | otherwise = let !d = x1-x2 in min d (l-d)
-- distance !x1 !x2 = {-# SCC "min" #-} min d1 d2
--     where
--     !d  = {-# SCC "d" #-}  x1 - x2
--     !d1 = {-# SCC "d1" #-} abs d
--     !d2 = {-# SCC "d2" #-} abs $ l - d

{-# INLINE pointsAreClose #-}
pointsAreClose :: Point -> Point -> Bool
pointsAreClose (Point x1 y1) (Point x2 y2) = sqr (distance x1 x2) + sqr (distance y1 y2) < r2
--    where sqr !x = x * x

{-# INLINE sqr #-}
sqr :: Double -> Double
sqr !x = x*x

{-# INLINE average #-}
average :: [Double] -> Double
average list = -- let (!s,!n) = foldl' (\(!s,!n) x -> (s+x,n+1)) (0,0) list in s / n
    case foldl' (\(!s, !n) !x -> (s + x, n + 1)) (0.0, 0.0) list of
      (!s, !n) -> s / n

{-# INLINE wrap #-}
wrap :: Double -> Double
wrap x | x < 0     = x + l
       | x > l     = x - l
       | otherwise = x

makeStep :: [Particle] -> Double -> [Particle]
makeStep allParticles dt = map (makeStep1 dt allParticles) allParticles

makeStep1 :: Double -> [Particle] -> Particle -> Particle
makeStep1 dt allParticles particle = updateParticle dt particle newAngle
    where
      !newAngle = average . map angle . filter (pointsAreClose (point particle) . point) $ allParticles

updateParticle dt (Particle (Point x0 y0) _) newAngle = Particle (Point x1 y1) newAngle
    where
    x1 = wrap $ x0 + cos newAngle * dt
    y1 = wrap $ y0 + sin newAngle * dt

create = zipWith3 (\x y a -> Particle (Point x y) a)

move t list = foldl' makeStep list (replicate t h)

prettyPrint list =
    withFile "output.txt" WriteMode $ \h -> do
    mapM_ (pp1 h) list
    where
        pp1 h (Particle (Point kx ky) ka) = hPrintf h "%.4f %.4f %.4f %.4f" kx ky (cos ka) (sin ka)

order list = let (c,s) = foldl' (\(!c,!s) (Particle _ a) -> (c+cos a, s+sin a)) (0,0) list
             in sqrt (sqr (c / (fromIntegral n)) + sqr (s / (fromIntegral n)))

groupN n list = let (h,!t) = splitAt n list in h:groupN n t

main = do
        g <- getStdGen
        let [x,y,alpha] = take 3 $ groupN n $ randomRs (0, l) g
            initialPosition = create x y alpha
            finalPosition   = move time initialPosition
            orderPar        = order finalPosition

        prettyPrint finalPosition
        printf "Order parameter: %.4f\n" orderPar
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Reply via email to