Re: Removing/deprecating -fvia-c

David Terei Thu, 18 Feb 2010 01:09:43 -0800

Don Stewart wrote:

Here's an example that doesn't use floating point:


import Data.Array.Vector
import Data.Bits

main = print . sumU $ zipWith3U (\x y z -> x * y * z)
(enumFromToU 1 (100000000 :: Int))
(enumFromToU 2 (100000001 :: Int))
(enumFromToU 7 (100000008 :: Int))

In core:

main_$s$wfold :: Int# -> Int# -> Int# -> Int# -> Int#
main_$s$wfold =
\ (sc_s1l1 :: Int#)
(sc1_s1l2 :: Int#)
(sc2_s1l3 :: Int#)
(sc3_s1l4 :: Int#) ->
case># sc2_s1l3 100000000 of _ {
False ->
case># sc1_s1l2 100000001 of _ {
False ->
case># sc_s1l1 100000008 of _ {
False ->
main_$s$wfold
(+# sc_s1l1 1)
(+# sc1_s1l2 1)
(+# sc2_s1l3 1)
(+#
sc3_s1l4 (*# (*# sc2_s1l3 sc1_s1l2) sc_s1l1));
True -> sc3_s1l4
};
True -> sc3_s1l4
};
True -> sc3_s1l4
}

Rather nice!

-fvia-C -optc-O3

Main_mainzuzdszdwfold_info:
cmpq $100000000, %rdi
jg .L6
cmpq $100000001, %rsi
jg .L6
cmpq $100000008, %r14
jle .L10
.L6:
movq %r8, %rbx
movq (%rbp), %rax
jmp *%rax
.L10:
movq %rsi, %r10
leaq 1(%rsi), %rsi
imulq %rdi, %r10
leaq 1(%rdi), %rdi
imulq %r14, %r10
leaq 1(%r14), %r14
leaq (%r10,%r8), %r8
jmp Main_mainzuzdszdwfold_info

Which looks ok.

$ time ./zipwith3
3541230156834269568
./zipwith3 0.33s user 0.00s system 99% cpu 0.337 total

And -fasm we get very different code, and a bit of a slowdown:

Main_mainzuzdszdwfold_info:
.Lc1mo:
cmpq $100000000,%rdi
jg .Lc1mq
cmpq $100000001,%rsi
jg .Lc1ms
cmpq $100000008,%r14
jg .Lc1mv

movq %rsi,%rax
imulq %r14,%rax
movq %rdi,%rcx
imulq %rax,%rcx
movq %r8,%rax
addq %rcx,%rax
leaq 1(%rdi),%rcx
leaq 1(%rsi),%rdx
incq %r14
movq %rdx,%rsi
movq %rcx,%rdi
movq %rax,%r8
jmp Main_mainzuzdszdwfold_info

.Lc1mq:
movq %r8,%rbx
jmp *(%rbp)
.Lc1ms:
movq %r8,%rbx
jmp *(%rbp)
.Lc1mv:
movq %r8,%rbx
jmp *(%rbp)

Slower:

$ time ./zipwith3
3541230156834269568
./zipwith3 0.38s user 0.00s system 98% cpu 0.384 total

Now maybe we need to wait on the new backend optimizations to get there?

I've been looking at some of these cases and seeing how the LLVMback-end performs. My general impression from benchmarking the LLVMback-end in the past has been that it generally performs with similarcharacteristics as the C code generator (that is, where the C codegenerator stood out compared to the NCG, so did LLVM).


(On x86-32/Mac OS X 10.5, had some issues getting x64 working at moment):

./zipWith3_viac 0.72s
./zipWith3_fasm 0.65s
./zipWith3_llvm 0.38s

Code that LLVM produces:

_Main_mainzuzdszdwfold_entry:
## BB#0: ## %c1qP
subl $12, %esp
jmp LBB2_1
.align 4, 0x90
LBB2_4: ## %n1re
## Loop Depth 1
## Loop Header is BB2_1
## Inner Loop
movl %ecx, %esi
incl %ecx
imull %eax, %esi
incl %eax
imull %edx, %esi
incl %edx
addl (%ebp), %esi
movl %edx, 12(%ebp)
movl %ecx, 8(%ebp)
movl %eax, 4(%ebp)
movl %esi, (%ebp)
LBB2_1: ## %tailrecurse
## Loop Depth 1
## Loop Header
## Inner Loop
movl 4(%ebp), %eax
cmpl $100000000, %eax
jg LBB2_5
## BB#2: ## %n1qX
## Loop Depth 1
## Loop Header is BB2_1
## Inner Loop
movl 8(%ebp), %ecx
cmpl $100000001, %ecx
jg LBB2_5
## BB#3: ## %n1r5
## Loop Depth 1
## Loop Header is BB2_1
## Inner Loop
movl 12(%ebp), %edx
cmpl $100000008, %edx
jle LBB2_4
LBB2_5: ## %c1qW
movl 16(%ebp), %eax
movl (%ebp), %esi
addl $16, %ebp
movl (%eax), %eax
addl $12, %esp
jmpl *%eax # TAILCALL

Which is very nice. (The comments in the code are inserted by LLVM, not me).

I also ran through some of the programs outlined here:

http://permalink.gmane.org/gmane.comp.lang.haskell.glasgow.user/18151

All ran with 'echo '1e-8' | ./$PRG'.

Loop.hs:
========================================
{-# LANGUAGE BangPatterns #-}
module Main (main) where

main :: IO ()
main = do
putStrLn "EPS: "
eps <- readLn :: IO Double
let !mx = (4/eps)
!pi14 = pisum mx
putStrLn $ "PI mit EPS "++(show eps)++" = "++ show(4*pi14)

pisum :: Double -> Double
pisum cut = go True 1 0
where
go b n s | cut < n = if b then s+1/(2*n) else s-1/(2*n)
go True n !s = go False (n+2) (s+recip n)
go False n !s = go True (n+2) (s-recip n)
========================================

./Loops_fasm 4.53s
./Loops_viac 4.22s
./Loops_llvm 2.89s


Fusion.hs (uses stream-fusion package)
========================================
module Main (main) where

import qualified Data.List.Stream as S

main :: IO ()
main = do
putStrLn "EPS: "
eps <- readLn :: IO Double
let !mx = floor (4/eps)
!k = (mx+1) `quot` 2
putStrLn $ "PI mit EPS " ++ (show eps) ++ " = " ++ show (leibniz k)

leibniz n = (4 *) $ S.sum $ S.take n step

step :: [Double]
step = S.unfoldr phi (True,1) where
phi (sig,d) | sig = Just (1/d, (False,d+2))
| otherwise = Just (negate (1/d), (True,d+2))
========================================

./Fusion_fasm 4.61s
./Fusion_viac 4.22s
./Fusion_llvm 3.62s


List.hs
========================================
module Main (main) where

import Data.List (unfoldr)

main :: IO ()
main = do
putStrLn "EPS: "
eps <- readLn :: IO Double
let mx = floor (4/eps)
!k = (mx+1) `quot` 2
putStrLn $ "PI mit EPS " ++ (show eps) ++ " = " ++ show (leibniz k)

leibniz n = (4 *) $ sum $ take n step

step :: [Double]
step = unfoldr phi (True,1) where
phi (sig,d) | sig = Just (1/d, (False,d+2))
| otherwise = Just (negate (1/d), (True,d+2))
========================================

./List_fasm 18.21s
./List_viac 16.71s
./List_llvm 16.92s

So with these kinds of results (obviously I'm biased though since Iwrote the llvm back-end) I think the sentiment that the -fvia-C approachshould be eventually removed is the right way to go since with the LLVMback-end and the new code generator there is a promising and much moreinteresting future.


~ David
_______________________________________________
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

Re: Removing/deprecating -fvia-c

Reply via email to