Hi,
I've narrowed down one of the problems with the LLVM backend that
David descried in his thesis. LLVM sometimes generates gratuitous
extra stack manipulations.
For example, consider this input program:
{{{
module Toy where
toy :: (Int -> Maybe Int) -> Int
toy f = case f 10 of Just x -> x; Nothing -> 20
}}}
Compiled using ghc -O (which runs llvl's "opt -O2"), we get the
following LLVM IR:
{{{
define cc10 void @Toy_toy_entry(i32 %stg_terei_baseArg, i32
%stg_terei_spArg, i32 %stg_terei_hpArg, i32 %stg_terei_r1Arg) nounwind
{
cdx:
%ndz = add i32 %stg_terei_spArg, -4 ; <i32> [#uses=3]
%ndB = add i32 %stg_terei_baseArg, 84 ; <i32> [#uses=1]
%ndC = inttoptr i32 %ndB to i32* ; <i32*> [#uses=1]
%ndD = load i32* %ndC ; <i32> [#uses=1]
%ndE = icmp ult i32 %ndz, %ndD ; <i1> [#uses=1]
br i1 %ndE, label %cdG, label %ndH
ndH: ; preds = %cdx
%ndJ = inttoptr i32 %stg_terei_spArg to i32* ; <i32*> [#uses=2]
%ndK = load i32* %ndJ ; <i32> [#uses=1]
%ndP = inttoptr i32 %ndz to i32* ; <i32*> [#uses=1]
store i32 add (i32 ptrtoint (%Toy_toy1_closure_struct*
@Toy_toy2_closure to i32), i32 1), i32* %ndP
store i32 ptrtoint (%sbo_info_struct* @sbo_info to i32), i32* %ndJ
tail call cc10 void @stg_ap_p_fast(i32 %stg_terei_baseArg, i32 %ndz,
i32 %stg_terei_hpArg, i32 %ndK) nounwind
ret void
cdG: ; preds = %cdx
%ne1 = add i32 %stg_terei_baseArg, -4 ; <i32> [#uses=1]
%ne2 = inttoptr i32 %ne1 to i32* ; <i32*> [#uses=1]
%ne3 = load i32* %ne2 ; <i32> [#uses=1]
%ne4 = inttoptr i32 %ne3 to void (i32, i32, i32, i32)* ; <void (i32,
i32, i32, i32)*> [#uses=1]
tail call cc10 void %ne4(i32 %stg_terei_baseArg, i32
%stg_terei_spArg, i32 %stg_terei_hpArg, i32 ptrtoint
(%Toy_toy_closure_struct* @Toy_toy_closure to i32)) nounwind
ret void
}
}}}
So far so good. However, when we compile this with "llc -tailcallopt
-O2 -march=x86", we get:
{{{
_Toy_toy_entry: ## @Toy_toy_entry
## BB#0: ## %cdx
subl $12, %esp
leal -4(%ebp), %eax
cmpl 84(%ebx), %eax
jb LBB2_2
## BB#1: ## %ndH
movl (%ebp), %esi
movl $_Toy_toy2_closure+1, -4(%ebp)
movl $_sbo_info, (%ebp)
movl %eax, %ebp
addl $12, %esp
jmp _stg_ap_p_fast # TAILCALL
LBB2_2: ## %cdG
movl -4(%ebx), %eax
movl $_Toy_toy_closure, %esi
addl $12, %esp
jmpl *%eax # TAILCALL
.globl ___stginit_Toy_
.align 4, 0x90
}}}
Note the "subl $12, %esp" and "addl $12, %esp" pairs. %esp is never
actually used, so this fiddling is entirely pointless! This seems to
happen in every single function LLVM compiles, so I imagine fixing the
problem would be quite a big win.
Without -tailcallopt, we get nice code:
{{{
_Toy_toy_entry: ## @Toy_toy_entry
## BB#0: ## %cdx
leal -4(%ebp), %eax
cmpl 84(%ebx), %eax
jb LBB2_2
## BB#1: ## %ndH
movl (%ebp), %esi
movl $_Toy_toy2_closure+1, -4(%ebp)
movl $_sbo_info, (%ebp)
movl %eax, %ebp
jmp _stg_ap_p_fast # TAILCALL
LBB2_2: ## %cdG
movl -4(%ebx), %eax
movl $_Toy_toy_closure, %esi
jmpl *%eax # TAILCALL
}}}
Unfortunately -tailcallopt is apparently the only way to get
*guaranteed* tail calls. The LLVM developers appear to know that
tailcallopt causes this sort of rubbish in the output
(http://old.nabble.com/Removing--tailcallopt--td27475523.html) but
obviously haven't fixed it (my results are with the very recent LLVM
r100183).
I couldn't get the stack manipulations to go away with any amount of
"noreturn" and "unreachable" annotations around "tail call"s and
function definitions. It was worth a try though :-)
In summary, it looks like anyone wanting to fix the excess stack
manipulations issue is going to have to get their hands messy and
delve into LLVM's LLC!
Cheers,
Max
_______________________________________________
Cvs-ghc mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/cvs-ghc