derek.a.elkins:
> On Mon, 2008-05-12 at 19:30 -0700, Don Stewart wrote:
>
> > > I offer up the following example:
> > >
> > >     mean xs = sum xs / length xs
> > >
> > > Now try, say, "mean [1.. 1e9]", and watch GHC eat several GB of
> > > RAM. (!!)
> >
> > But you know why, don't you?
> >
> > > I sat down and spent the best part of a day writing an MD5
> > > implementation. Eventually I got it so that all the test vectors
> > > work right. (Stupid little-endian nonsense... mutter mutter...)
> > > When I tried it on a file containing more than 1 MB of data...
> > > ooooohhhh dear...
> >
> > Did you use Data.Binary or the other libraries for efficient access
> > to gigabytes of data?
> >
> > > Of course, the first step in any serious attempt at performance
> > > improvement is to actually profile the code to figure out where
> > > the time is being spent.
> >
> > Well, actually, for our 'mean()' case, it means just using the right
> > structures. Arrays, for example:
> >
> >     import Data.Array.Vector
> >     import Text.Printf
> >
> >     mean :: UArr Double -> Double
> >     mean arr = b / fromIntegral a
> >       where
> >         k (n :*: s) a = n+1 :*: s+a
> >         a :*: b = foldlU k (0 :*: 0) arr :: (Int :*: Double)
> >
> >     main = printf "%f\n" . mean $ enumFromToFracU 1 1e9
> >
> > For example,
> >
> >     $ time ./A
> >     5.00000000067109e8
> >     ./A  3.53s user 0.00s system 99% cpu 3.532 total
> >
> > Try allocating an array of doubles in C, and getting similar
> > results. (The compiler is optimising this to:
> >
> >     Main_zdszdwfold_info:
> >         leaq    32(%r12), %rax
> >         cmpq    %r15, %rax
> >         movq    %rax, %r12
> >         ja      .L10
> >         movsd   .LC0(%rip), %xmm0
> >         ucomisd %xmm5, %xmm0
> >         jae     .L12
> >         movq    %rsi, (%rax)
> >         movq    $base_GHCziFloat_Dzh_con_info, -24(%rax)
> >         movsd   %xmm6, -16(%rax)
> >         movq    $base_GHCziBase_Izh_con_info, -8(%rax)
> >         leaq    -7(%rax), %rbx
> >         leaq    -23(%rax), %rsi
> >         jmp     *(%rbp)
> >     .L12:
> >         movapd  %xmm6, %xmm0
> >         addq    $1, %rsi
> >         subq    $32, %r12
> >         addsd   %xmm5, %xmm0
> >         addsd   .LC2(%rip), %xmm5
> >         movapd  %xmm0, %xmm6
> >         jmp     Main_zdszdwfold_info
> >
> > Not even any garbage collection. This should be pretty much the
> > same, performance-wise, as unoptimised C.
> >
> > > almost any nontrivial program you write spends 60% or more of its
> > > time doing GC rather than actual work.
> >
> > Ok, you're doing something very wrong. GC time is typically less
> > than 15% of the running time of the programs I hack on. I bet you're
> > using lists inappropriately?
> >
> > > I find it completely impossible to write code that doesn't crawl
> > > along at a snail's pace. Even when I manage to make it faster, I
> > > usually have no clue why.
> >
> > I think there is a problem: few people take the time to understand
> > the compilation model of Haskell, while they've had the C runtime
> > behaviour drilled into their brains since college.
> >
> > Until you sit down and understand what your Haskell code means, it
> > will be very hard to reason about optimisations, unfortunately.
> >
> > GHC does what it does well, and it's predictable, as long as you
> > understand the operations you're trying to make predictions about.
> >
> > I suggest installing ghc-core and looking at how your code is
> > compiled. Try many examples, and have a good knowledge of the STG
> > paper.
>
> My experience has been that simply understanding lazy/normal-order
> evaluation is enough to catch most of the "big performance problems"
> beginners complain about.
> If you can't evaluate Haskell by hand at the source level, you have
> no hope of understanding performance issues.
>
> For going more for C-like speed, your advice seems more appropriate.
Yes, I agree. You must first be able to reason about the evaluation
strategy on paper. If you want to get C-like speed, you should
understand Core. Thankfully, the knowledge of how to do this, and the
tools to help, are improving.

-- Don
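
Both points lend themselves to a worked example. Expanding the naive
fold by hand (an illustrative trace, not part of the original thread)
shows exactly where "mean [1.. 1e9]" puts its gigabytes: foldl defers
every addition, while foldl' forces the accumulator at each step:

    foldl (+) 0 [1,2,3,4]
      = foldl (+) (0 + 1) [2,3,4]
      = foldl (+) ((0 + 1) + 2) [3,4]
      = foldl (+) (((0 + 1) + 2) + 3) [4]
      = foldl (+) ((((0 + 1) + 2) + 3) + 4) []
      = (((0 + 1) + 2) + 3) + 4
        -- the whole chain of additions is suspended as thunks before
        -- any arithmetic runs; for [1 .. 1e9] that chain is the
        -- "several GB of RAM"

    foldl' (+) 0 [1,2,3,4]
      = foldl' (+) 1 [2,3,4]
      = foldl' (+) 3 [3,4]
      = foldl' (+) 6 [4]
      = foldl' (+) 10 []
      = 10
        -- foldl' forces the accumulator at each step, so the fold
        -- runs in constant space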
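
Applied to the mean example without reaching for Data.Array.Vector: a
minimal single-pass sketch in plain Haskell, assuming only
Data.List.foldl' and bang patterns (illustrative code, not from the
thread):

    {-# LANGUAGE BangPatterns #-}
    import Data.List (foldl')

    -- One pass over the list, keeping the count and the running sum
    -- in a strict accumulator so no thunk chain builds up; this is
    -- the plain-Haskell analogue of the foldlU version above.
    mean :: [Double] -> Double
    mean xs = s / fromIntegral n
      where
        step (!c, !t) x = (c + 1, t + x)
        (n, s) = foldl' step (0 :: Int, 0) xs

    main :: IO ()
    main = print (mean [1 .. 1e7])

This still walks a boxed list, so it won't match the unboxed foldlU
version for raw speed, but it runs in constant space: at any moment
only one layer of unevaluated additions exists, and the bang patterns
force it on the next step.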
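
For the MD5-on-a-large-file problem, the same principle applies to
I/O: avoid String and process the file in chunks. A sketch using
Data.ByteString.Lazy, with byte-counting standing in for a real MD5
fold (the body is illustrative only):

    import qualified Data.ByteString.Lazy as L
    import System.Environment (getArgs)

    -- Lazy ByteStrings read the file in chunks (roughly 32k each), so
    -- a single pass like this runs in constant space even on multi-GB
    -- input; a hash fold over the chunks works the same way.
    main :: IO ()
    main = do
      [path] <- getArgs
      bytes  <- L.readFile path
      print (L.length bytes)

Each chunk becomes garbage as soon as the pass moves on, so the
traversal's memory use is independent of file size.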
