#4065: Inconsistent loop performance
---------------------------------+------------------------------------------
    Reporter:  rl                |        Owner:                         
        Type:  bug               |       Status:  new                    
    Priority:  normal            |    Milestone:  7.2.1                  
   Component:  Compiler          |      Version:  6.13                   
    Keywords:                    |     Testcase:                         
   Blockedby:                    |   Difficulty:                         
          Os:  Unknown/Multiple  |     Blocking:                         
Architecture:  Unknown/Multiple  |      Failure:  Runtime performance bug
---------------------------------+------------------------------------------

Comment(by daniel.is.fischer):

 I just ran the benchmark with 7.0.3 and get the reverse outcome: foo takes
 about 48ms and bar about 42ms. Via C (-O2 -fvia-C -optc-O3), both take
 40-41ms (bar is typically a bit faster).
 So, unless it's platform-dependent, the new code generator seems to have
 turned things around.

 The Core looks fine for both, basically
 {{{
 case n of
   0# -> ...
   _  -> ...

 case n <=# 0# of
   True  -> ...
   False -> ...
 }}}
 for the workers. I don't understand Cmm, so I'm guessing, but the relevant
 parts seem to be
 {{{
     cnO:
         _sn9::I32 = I32[Sp + 0];
         if (_sn9::I32 != 0) goto cnS;
         R1 = I32[Sp + 4];
         Sp = Sp + 8;
         jump (I32[Sp + 0]) ();
     cnS:
         _snI::I32 = I32[Sp + 4] + 1;
         _snJ::I32 = _sn9::I32 - 1;
         I32[Sp + 4] = _snI::I32;
         I32[Sp + 0] = _snJ::I32;
         jump FooBar_zdwbar_info ();
 }}}
 for bar and
 {{{
     coF:
         _coI::I32 = %MO_S_Le_W32(I32[Sp + 0], 0);
         ;
         if (_coI::I32 >= 1) goto coL;
         _soy::I32 = I32[Sp + 4] + 1;
         _soz::I32 = I32[Sp + 0] - 1;
         I32[Sp + 4] = _soy::I32;
         I32[Sp + 0] = _soz::I32;
         jump FooBar_zdwfoo_info ();
     coL:
         R1 = I32[Sp + 4];
         Sp = Sp + 8;
         jump (I32[Sp + 0]) ();
 }}}
 for foo. The asm:
 {{{
 FooBar_zdwbar_info:
 .LcnT:
         movl 0(%ebp),%eax
         testl %eax,%eax
         jne .LcnX
         movl 4(%ebp),%esi
         addl $8,%ebp
         jmp *0(%ebp)
 .LcnX:
         decl %eax
         incl 4(%ebp)
         movl %eax,0(%ebp)
         jmp FooBar_zdwbar_info
 }}}
 and
 {{{
 FooBar_zdwfoo_info:
 .Lcpe:
         cmpl $0,0(%ebp)
         jle .Lcph
         movl 0(%ebp),%eax
         decl %eax
         incl 4(%ebp)
         movl %eax,0(%ebp)
         jmp FooBar_zdwfoo_info
 .Lcph:
         movl 4(%ebp),%esi
         addl $8,%ebp
         jmp *0(%ebp)
 }}}
 doesn't tell me why bar is faster than foo either.
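
 For readers without the ticket's attachment at hand, here is a minimal
 sketch of source that would plausibly produce the two loop shapes above.
 It is a reconstruction, not the reporter's code: the names foo and bar come
 from the ticket, but the argument order (counter first, accumulator second,
 guessed from the stack slots Sp + 0 and Sp + 4), the Int type, and the
 iteration count in main are all assumptions.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Hypothetical reconstruction: the original benchmark source is not quoted
-- in this comment, so argument order, types and the loop bound are guesses.

-- | Counts down using an equality test against 0, which corresponds to the
-- @case n of 0# -> ...; _ -> ...@ shape. Assumes n >= 0.
bar :: Int -> Int -> Int
bar 0 !acc = acc
bar n !acc = bar (n - 1) (acc + 1)

-- | Counts down using an ordering test, which corresponds to the
-- @case n <=# 0# of True -> ...; False -> ...@ shape.
foo :: Int -> Int -> Int
foo n !acc
  | n <= 0    = acc
  | otherwise = foo (n - 1) (acc + 1)

main :: IO ()
main = do
  let n = 10000000 :: Int  -- assumed iteration count, for illustration only
  print (bar n 0)
  print (foo n 0)
```

 Compiling this with -O2 -ddump-simpl -ddump-cmm -ddump-asm should show
 whether a given GHC version still produces the same pair of loops; the
 exact output will differ between versions and platforms.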

-- 
Ticket URL: <http://hackage.haskell.org/trac/ghc/ticket/4065#comment:4>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

_______________________________________________
Glasgow-haskell-bugs mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-bugs
