Hi guys,

Looks like there's a possible fix for that bug already.
This is a pretty interesting discovery, and from the llvm-dev thread it
sounds like those guys are aware of this but probably snowed under. I wonder
how much mileage there might be in attempting to collar one of the devs for
a chat (e.g. IRC, Skype or whatever)? I wouldn't be surprised if they had a
load of useful info to divulge, and could probably tell you whether
-tailcallopt is essential or not.

Anyway, just a thought.

Cheers,
Sam

-----Original Message-----
From: [email protected] on behalf of David Terei
Sent: Mon 05/04/2010 15:03
To: Max Bolingbroke
Cc: GHC Development Mailing List
Subject: Re: Stack manipulation in LLVM backend

Yeah, I had noticed this as well recently when I tried out the sibling call
optimisation improvements in LLVM. The interesting question is: can we get
away with removing the '-tailcallopt' flag and just hoping for the best? It
makes me a little worried that the tail calls aren't guaranteed, but for the
code GHC generates, LLVM really shouldn't have any trouble picking them up
with the plain sibling call optimisation.

I ran the testsuite (fast) against the LLVM back-end on x86-32 today without
the '-tailcallopt' argument to llc and had everything pass fine. I also ran
nofib a couple of times and got a nice performance improvement of 2%. I'll
try some more tests, and on a different platform or two as well, but it's
probably worthwhile getting rid of the '-tailcallopt' argument for now.

Thanks for filing the bug report; it will be interesting to see if there is
much of a response.
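As background for why guaranteed tail calls matter to GHC at all: compiled
Haskell code is one long chain of jumps between closures that never return,
so any per-call stack growth accumulates without bound. A minimal sketch of
the pattern in plain Haskell (a hypothetical example, not code from this
thread; with the tail calls eliminated, the loop runs in constant stack):

```haskell
module Main where

import Control.Exception (assert)

-- Mutually recursive loop: every recursive call is in tail position,
-- so a backend with guaranteed tail calls runs this in constant stack.
-- If each call consumed a stack frame, a million alternating calls
-- between the two functions would blow the stack.
isEven :: Int -> Bool
isEven 0 = True
isEven n = isOdd (n - 1)

isOdd :: Int -> Bool
isOdd 0 = False
isOdd n = isEven (n - 1)

main :: IO ()
main = assert (isEven 1000000) (putStrLn "constant-stack loop finished")
```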
NoFib Results

--------------------------------------------------------------------------------
        Program           Size    Allocs   Runtime   Elapsed
--------------------------------------------------------------------------------
           anna          +2.0%     +0.0%      0.12      0.13
           ansi          +0.0%     +0.0%      0.00      0.00
           atom          +0.0%     +0.0%     +8.6%    +10.0%
         awards          +0.0%     +0.0%      0.00      0.00
         banner          +0.0%     +0.0%      0.00      0.00
     bernouilli          +0.0%     +0.0%     +0.5%     +0.5%
          boyer          +0.0%     +0.0%      0.04      0.05
         boyer2          +0.1%     +0.0%      0.01      0.01
           bspt          +0.5%     +0.0%      0.01      0.01
      cacheprof          +0.6%     +0.0%     +2.6%     +2.9%
       calendar          +0.0%     +0.0%      0.00      0.00
       cichelli          +0.1%     +0.0%      0.10      0.10
        circsim          +0.2%     +0.0%     +0.7%     +0.2%
       clausify          +0.0%     +0.0%      0.05      0.05
  comp_lab_zift          +0.1%     +0.0%     +0.0%     +0.9%
       compress          +0.1%     +0.0%      0.20      0.21
      compress2          +0.1%     +0.0%      0.16      0.18
    constraints          +0.1%     +0.0%     -1.4%     +1.5%
   cryptarithm1          +0.0%     +0.0%     +5.1%     +1.3%
   cryptarithm2          +0.1%     +0.0%      0.02      0.01
            cse          +0.1%     +0.0%      0.00      0.00
          eliza          +0.0%     +0.0%      0.00      0.00
          event          +0.0%     +0.0%      0.16      0.18
         exp3_8          +0.0%     +0.0%      0.11      0.12
         expert          +0.1%     +0.0%      0.00      0.00
            fem          +0.5%     -0.0%      0.02      0.03
            fft          +0.1%     +0.0%      0.03      0.03
           fft2          +0.1%     +0.0%      0.08      0.09
       fibheaps          +0.1%     +0.0%      0.03      0.04
           fish          +0.1%     +0.0%      0.02      0.02
          fluid          +0.8%     +0.0%      0.01      0.01
         fulsom          +0.6%     +0.0%     +2.3%     +0.7%
         gamteb          +0.2%     +0.0%      0.10      0.10
            gcd          +0.0%     +0.0%      0.03      0.03
    gen_regexps          +0.0%     +0.0%      0.00      0.00
         genfft          +0.0%     +0.0%      0.04      0.04
             gg          +0.6%     +0.0%      0.01      0.01
           grep          +0.2%     +0.0%      0.00      0.00
         hidden          +0.4%     -0.0%     +1.3%     -2.5%
            hpg          +0.4%     +0.0%      0.17     +2.7%
            ida          +0.1%     +0.0%      0.11      0.12
          infer          +0.4%     +0.0%      0.06      0.06
        integer          +0.0%     +0.0%     +3.0%     +4.2%
      integrate          +0.0%     +0.0%     -1.4%     -3.0%
        knights          +0.2%     +0.0%      0.00      0.01
           lcss          +0.0%     +0.0%     +3.8%     +1.1%
           life          +0.0%     +0.0%     +4.9%     +4.0%
           lift          +0.3%     +0.0%      0.00      0.00
      listcompr          +0.0%     +0.0%      0.11      0.11
       listcopy          +0.1%     +0.0%      0.11      0.12
       maillist          +0.0%     -0.0%      0.06      0.10
         mandel          +0.1%     -0.0%      0.13      0.13
        mandel2          +0.0%     +0.0%      0.01      0.01
        minimax          +0.1%     +0.0%      0.00      0.00
        mkhprog          +0.1%     +0.0%      0.00      0.00
     multiplier          +0.2%     +0.0%      0.13      0.14
       nucleic2          +0.2%     +0.0%      0.09      0.09
           para          +0.2%     +0.0%     +0.0%     -0.9%
      paraffins          +0.1%     +0.0%      0.10      0.11
         parser          +0.5%     +0.0%      0.04      0.04
        parstof          +0.3%     +0.0%      0.00      0.01
            pic          +0.2%     +0.0%      0.01      0.01
          power          +0.2%     +0.0%     +0.8%     +3.5%
         pretty          +0.1%     +0.0%      0.00      0.00
         primes          +0.0%     +0.0%      0.05      0.06
      primetest          +0.1%     +0.0%     -2.4%     -3.4%
         prolog          +0.2%     +0.0%      0.00      0.00
         puzzle          +0.1%     +0.0%      0.20     +5.9%
         queens          +0.0%     +0.0%      0.02      0.02
        reptile          +0.3%     +0.0%      0.02      0.02
        rewrite          +0.2%     +0.0%      0.02      0.02
           rfib          +0.0%     +0.0%      0.04      0.04
            rsa          +0.0%     +0.0%      0.09      0.09
            scc          +0.0%     +0.0%      0.00      0.00
          sched          +0.0%     +0.0%      0.03      0.03
            scs          +0.5%     -0.0%     -1.3%     -2.0%
         simple          +0.8%     +0.0%     +3.8%     +0.0%
          solid          +0.1%     +0.0%      0.18    -10.6%
        sorting          +0.0%     +0.0%      0.00      0.00
         sphere          +0.3%     +0.0%      0.12      0.13
         symalg          +0.4%     +0.0%      0.04      0.04
            tak          +0.0%     +0.0%      0.02      0.02
      transform          +0.4%     +0.0%     +4.3%     +0.5%
       treejoin          +0.0%     -0.0%     +1.7%    -12.3%
      typecheck          +0.1%     +0.0%     +1.6%     +0.7%
        veritas          +1.6%     +0.0%      0.00      0.00
           wang          +0.1%     +0.0%      0.10      0.11
      wave4main          +0.1%     +0.0%     +2.9%     +2.8%
   wheel-sieve1          +0.0%     +0.0%    +10.1%     +9.9%
   wheel-sieve2          +0.0%     +0.0%      0.17     -2.9%
           x2n1          +0.0%     +0.0%      0.02      0.03
--------------------------------------------------------------------------------
            Min          +0.0%     -0.0%     -2.4%    -12.3%
            Max          +2.0%     +0.0%    +10.1%    +10.0%
 Geometric Mean          +0.2%     -0.0%     +2.2%     +0.5%

On 4 April 2010 21:52, Max Bolingbroke <[email protected]> wrote:
> I've filed this as bug 6774 on the LLVM trac
> (http://llvm.org/bugs/show_bug.cgi?id=6774), so the issue at least
> stands a chance of being fixed.
>
> On 2 April 2010 16:02, Max Bolingbroke <[email protected]> wrote:
>> Hi,
>>
>> I've narrowed down one of the problems with the LLVM backend that
>> David described in his thesis. LLVM sometimes generates gratuitous
>> extra stack manipulations.
>>
>> For example, consider this input program:
>>
>> {{{
>> module Toy where
>>
>> toy :: (Int -> Maybe Int) -> Int
>> toy f = case f 10 of Just x -> x; Nothing -> 20
>> }}}
>>
>> Compiled using ghc -O (which runs LLVM's "opt -O2"), we get the
>> following LLVM IR:
>>
>> {{{
>> define cc10 void @Toy_toy_entry(i32 %stg_terei_baseArg, i32 %stg_terei_spArg,
>>                                 i32 %stg_terei_hpArg, i32 %stg_terei_r1Arg) nounwind {
>> cdx:
>>   %ndz = add i32 %stg_terei_spArg, -4           ; <i32> [#uses=3]
>>   %ndB = add i32 %stg_terei_baseArg, 84         ; <i32> [#uses=1]
>>   %ndC = inttoptr i32 %ndB to i32*              ; <i32*> [#uses=1]
>>   %ndD = load i32* %ndC                         ; <i32> [#uses=1]
>>   %ndE = icmp ult i32 %ndz, %ndD                ; <i1> [#uses=1]
>>   br i1 %ndE, label %cdG, label %ndH
>>
>> ndH:                                            ; preds = %cdx
>>   %ndJ = inttoptr i32 %stg_terei_spArg to i32*  ; <i32*> [#uses=2]
>>   %ndK = load i32* %ndJ                         ; <i32> [#uses=1]
>>   %ndP = inttoptr i32 %ndz to i32*              ; <i32*> [#uses=1]
>>   store i32 add (i32 ptrtoint (%Toy_toy1_closure_struct* @Toy_toy2_closure to i32), i32 1), i32* %ndP
>>   store i32 ptrtoint (%sbo_info_struct* @sbo_info to i32), i32* %ndJ
>>   tail call cc10 void @stg_ap_p_fast(i32 %stg_terei_baseArg, i32 %ndz, i32 %stg_terei_hpArg, i32 %ndK) nounwind
>>   ret void
>>
>> cdG:                                            ; preds = %cdx
>>   %ne1 = add i32 %stg_terei_baseArg, -4         ; <i32> [#uses=1]
>>   %ne2 = inttoptr i32 %ne1 to i32*              ; <i32*> [#uses=1]
>>   %ne3 = load i32* %ne2                         ; <i32> [#uses=1]
>>   %ne4 = inttoptr i32 %ne3 to void (i32, i32, i32, i32)*  ; <void (i32, i32, i32, i32)*> [#uses=1]
>>   tail call cc10 void %ne4(i32 %stg_terei_baseArg, i32 %stg_terei_spArg, i32 %stg_terei_hpArg,
>>                            i32 ptrtoint (%Toy_toy_closure_struct* @Toy_toy_closure to i32)) nounwind
>>   ret void
>> }
>> }}}
>>
>> So far so good.
>> However, when we compile this with "llc -tailcallopt -O2 -march=x86", we get:
>>
>> {{{
>> _Toy_toy_entry:                         ## @Toy_toy_entry
>> ## BB#0:                                ## %cdx
>>         subl    $12, %esp
>>         leal    -4(%ebp), %eax
>>         cmpl    84(%ebx), %eax
>>         jb      LBB2_2
>> ## BB#1:                                ## %ndH
>>         movl    (%ebp), %esi
>>         movl    $_Toy_toy2_closure+1, -4(%ebp)
>>         movl    $_sbo_info, (%ebp)
>>         movl    %eax, %ebp
>>         addl    $12, %esp
>>         jmp     _stg_ap_p_fast          # TAILCALL
>> LBB2_2:                                 ## %cdG
>>         movl    -4(%ebx), %eax
>>         movl    $_Toy_toy_closure, %esi
>>         addl    $12, %esp
>>         jmpl    *%eax                   # TAILCALL
>>
>>         .globl  ___stginit_Toy_
>>         .align  4, 0x90
>> }}}
>>
>> Note the "subl $12, %esp" and "addl $12, %esp" pairs. %esp is never
>> actually used, so this fiddling is entirely pointless! This seems to
>> happen in every single function LLVM compiles, so I imagine fixing the
>> problem would be quite a big win.
>>
>> Without -tailcallopt, we get nice code:
>>
>> {{{
>> _Toy_toy_entry:                         ## @Toy_toy_entry
>> ## BB#0:                                ## %cdx
>>         leal    -4(%ebp), %eax
>>         cmpl    84(%ebx), %eax
>>         jb      LBB2_2
>> ## BB#1:                                ## %ndH
>>         movl    (%ebp), %esi
>>         movl    $_Toy_toy2_closure+1, -4(%ebp)
>>         movl    $_sbo_info, (%ebp)
>>         movl    %eax, %ebp
>>         jmp     _stg_ap_p_fast          # TAILCALL
>> LBB2_2:                                 ## %cdG
>>         movl    -4(%ebx), %eax
>>         movl    $_Toy_toy_closure, %esi
>>         jmpl    *%eax                   # TAILCALL
>> }}}
>>
>> Unfortunately, -tailcallopt is apparently the only way to get
>> *guaranteed* tail calls. The LLVM developers appear to know that
>> -tailcallopt causes this sort of rubbish in the output
>> (http://old.nabble.com/Removing--tailcallopt--td27475523.html) but
>> obviously haven't fixed it (my results are with the very recent LLVM
>> r100183).
>>
>> I couldn't get the stack manipulations to go away with any amount of
>> "noreturn" and "unreachable" annotations around "tail call"s and
>> function definitions. It was worth a try though :-)
>>
>> In summary, it looks like anyone wanting to fix the excess stack
>> manipulation issue is going to have to get their hands dirty and
>> delve into LLVM's llc!
>>
>> Cheers,
>> Max
>>
>
> _______________________________________________
> Cvs-ghc mailing list
> [email protected]
> http://www.haskell.org/mailman/listinfo/cvs-ghc
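As a source-level companion to the IR and assembly quoted above, this is how
the Toy function from Max's message behaves; the Main wrapper below is a
hypothetical usage sketch, not part of the original thread:

```haskell
module Main where

-- The function from Max's Toy example, reproduced verbatim: apply the
-- argument to 10, then scrutinise the result. Each alternative of the
-- case ends in a jump, which is why the generated code is all tail calls.
toy :: (Int -> Maybe Int) -> Int
toy f = case f 10 of Just x -> x; Nothing -> 20

main :: IO ()
main = do
  print (toy (\n -> Just (n + 1)))  -- Just branch: 11
  print (toy (const Nothing))       -- Nothing branch: 20
```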
