bulat.ziganshin:
> Hello haskell-cafe,
>
> since there are no objective tests comparing ghc to gcc, i made my own
> one. these are 3 programs, calculating sum in c++ and haskell:

Wonderful. Thank you!

> main = print $ sum[1..10^9::Int]

This won't be comparable to your loop below, as 'sum' is a left fold
(which doesn't fuse under build/foldr). You should use the list
implementation from the stream-fusion package (or uvector) if you're
expecting it to fuse to the following loop:

> main = print $ sum0 (10^9) 0
>
> sum0 :: Int -> Int -> Int
> sum0 0 !acc = acc
> sum0 !x !acc = sum0 (x-1) (acc+x)

Note the bang patterns aren't required here. It compiles to the
following core:

    $wsum0 :: Int# -> Int# -> Int#
    $wsum0 =
      \ (ww_sON :: Int#) (ww1_sOR :: Int#) ->
        case ww_sON of ds_XD0 {
          _ -> $wsum0 (-# ds_XD0 1) (+# ww1_sOR ds_XD0);
          0 -> ww1_sOR
        }

which is perfect.

    Main_zdwsum0_info:
      testq   %rsi, %rsi
      movq    %rsi, %rax
      jne     .L2
      movq    %rdi, %rbx
      jmp     *(%rbp)
    .L2:
      leaq    -1(%rsi), %rsi
      addq    %rax, %rdi
      jmp     Main_zdwsum0_info

Which seems ... OK.

    $ ghc-core A.hs -fvia-C -optc-O3
    $ time ./A
    500000000500000000
    ./A  1.12s user 0.00s system 99% cpu 1.127 total

Works for me. That's on Linux x86_64, gcc 4.4.

Trying -fasm:

    Main_zdwsum0_info:
    .LcQs:
      movq    %rsi,%rax
      testq   %rax,%rax
      jne     .LcQw
      movq    %rdi,%rbx
      jmp     *(%rbp)
    .LcQw:
      movq    %rdi,%rcx
      addq    %rax,%rcx
      leaq    -1(%rax),%rsi
      movq    %rcx,%rdi
      jmp     Main_zdwsum0_info

    $ time ./A
    500000000500000000
    ./A  1.65s user 0.00s system 98% cpu 1.677 total

This is a bit slower.

> main()
> {
>   int sum=0;
>   //for(int j=0; j<100;j++)
>     for(int i=0; i<1000*1000*1000;i++)
>       sum += i;
>   return sum;
> }

Well, that's a bit different. It doesn't print the result, and it
returns a different result on 64 bit....

    $ gcc -O0 t.c
    $ time ./a.out
    -1243309312
    ./a.out  3.99s user 0.00s system 88% cpu 4.500 total

    $ gcc -O1 t.c
    $ time ./a.out
    -1243309312
    ./a.out  0.88s user 0.00s system 99% cpu 0.892 total

    $ gcc -O3 -funroll-loops t.c
    $ time ./a.out
    -1243309312
    ./a.out  0.31s user 0.00s system 97% cpu 0.318 total

I don't get anything near the 0.062s, which is interesting. The print
statement slows things down, I guess...

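To make the difference between the two outputs concrete, here is a small sketch of where -1243309312 comes from. It assumes the C 'int' is 32 bits wide and wraps around on overflow (signed overflow is formally undefined in C, so this is only what the printed result suggests), and the names sumBelow and wrap32 are made up for illustration. Note also that the C loop covers 0..10^9-1, while the Haskell programs cover 1..10^9.

```haskell
import Data.Int (Int32, Int64)

-- Closed form for 0 + 1 + ... + (n-1), the range the C loop covers.
-- (Hypothetical helper, not part of either benchmark program.)
sumBelow :: Int64 -> Int64
sumBelow n = n * (n - 1) `div` 2

-- Truncate to 32 bits, as a wrapping 32-bit accumulator would.
-- (Assumption: this models the C 'int' on the test machine.)
wrap32 :: Int64 -> Int32
wrap32 = fromIntegral

main :: IO ()
main = do
  let n = 10 ^ 9 :: Int64
  print (sumBelow n)           -- 499999999500000000, the C range, no overflow
  print (wrap32 (sumBelow n))  -- -1243309312, matches the a.out output
  print (sumBelow (n + 1))     -- 500000000500000000, the Haskell range 1..10^9
```

So the two programs do the same number of additions, but they are not computing the same value, and only the Haskell one has room for the full result.
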
So we have:

                                 total time (s)
    ghc -fvia-C -O2              1.127
    ghc -fasm                    1.677
    gcc -O0                      4.500
    gcc -O3 -funroll-loops       0.318

So, some lessons. GHC is around 3-4x slower on this tight loop. (Which
isn't as bad as it used to be.)

That's actually a worse margin than any current shootout program, where
we are no worse than 2.9x slower on larger things:

    http://shootout.alioth.debian.org/u64q/benchmark.php?test=all&lang=ghc&lang2=gcc&box=1

>
> execution times:
> sum:
>   ghc 6.6.1 -O2  : 12.433 secs
>   ghc 6.10.1 -O2 : 12.792 secs
> sum-fast:
>   ghc 6.6.1 -O2  : 1.919 secs
>   ghc 6.10.1 -O2 : 1.856 secs
>   ghc 6.10.1 -O2 -fvia-C : 1.966 secs
> C++:
>   gcc 3.4.5 -O3 -funroll-loops: 0.062 secs
>

I couldn't reproduce your final number. Now, given GHC gets most of the
way there -- I think this might make a good bug report against GHC head,
so we can see if the new register allocator helps any.

    http://hackage.haskell.org/trac/ghc/newticket?type=bug

Thanks for the report, Bulat!

-- Don
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe