#4951: Performance regression 7.0.1 -> 7.0.1.20110201
--------------------------------------+-------------------------------------
  Reporter:  simonmar                 |          Owner:  simonpj         
      Type:  bug                      |         Status:  new             
  Priority:  highest                  |      Milestone:  7.0.2           
 Component:  Compiler                 |        Version:  7.0.1           
Resolution:                           |       Keywords:                  
  Testcase:                           |      Blockedby:                  
Difficulty:                           |             Os:  Unknown/Multiple
  Blocking:                           |   Architecture:  Unknown/Multiple
   Failure:  Runtime performance bug  |  
--------------------------------------+-------------------------------------

Comment(by simonpj):

 I'm very puzzled.  I've been looking at `imaginary/primes`, a very simple
 benchmark.  With GHC 6.12 -O I get
 {{{
 bash$ ./primes-612 4000 +RTS  -s
 37831
      457,709,320 bytes allocated in the heap
       82,966,936 bytes copied during GC
          378,968 bytes maximum residency (29 sample(s))
           89,944 bytes maximum slop
                3 MB total memory in use (0 MB lost due to fragmentation)

   Generation 0:   844 collections,     0 parallel,  0.99s,  0.96s elapsed
   Generation 1:    29 collections,     0 parallel,  0.05s,  0.06s elapsed

   INIT  time    0.00s  (  0.00s elapsed)
   MUT   time    0.67s  (  0.69s elapsed)
   GC    time    1.04s  (  1.03s elapsed)
   EXIT  time    0.00s  (  0.00s elapsed)
   Total time    1.71s  (  1.72s elapsed)
 }}}
 With HEAD I get
 {{{
 bash$ ./primes 4000 +RTS -rprimes.ticky -s
 37831
      718,383,496 bytes allocated in the heap
       79,051,592 bytes copied during GC
          357,648 bytes maximum residency (25 sample(s))
           88,384 bytes maximum slop
                3 MB total memory in use (0 MB lost due to fragmentation)

   Generation 0:   848 collections,     0 parallel,  0.95s,  0.95s elapsed
   Generation 1:    25 collections,     0 parallel,  0.05s,  0.05s elapsed

   INIT  time    0.00s  (  0.00s elapsed)
   MUT   time    1.43s  (  1.43s elapsed)
   GC    time    1.00s  (  1.00s elapsed)
   EXIT  time    0.00s  (  0.00s elapsed)
   Total time    2.43s  (  2.43s elapsed)
 }}}
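 Note the massive increase in allocation (457,709,320 to 718,383,496 bytes,
 about 57% more) and in mutator time (0.67s to 1.43s, more than double).
 GC time is essentially unchanged.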

 BUT when I compare the `-ddump-simpl` output I see virtually the same code.
 (For HEAD I also used `-funfolding-use-threshold=9` to make `mod` inline;
 HEAD seems a tiny bit less keen to inline.  With this flag the two are
 close to identical.)

 Moreover, I compiled both with `-ticky` (including all the libraries). The
 allocation word counts from `-ticky` are practically the same for the two
 programs!

 So I'm stumped. Somewhere a lot of time and allocation is happening, but
 `-ticky` isn't seeing it.

 It's a really small program and (by the time we've done inlining) almost
 all the code is in Main (though it still calls `GHC.List.filter`).
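
 For readers without nofib to hand, the benchmark is essentially a lazy
 list sieve built on `filter`.  What follows is only a minimal sketch of
 that shape, reconstructed from memory rather than copied from the nofib
 source; the name `sieveStep` and the argument handling are assumptions
 made for illustration, and the real program may differ in detail.

 ```haskell
 module Main (main) where

 import System.Environment (getArgs)

 -- One sieving pass: drop the head n and every later multiple of n.
 -- (Hypothetical helper name; not taken from the nofib source.)
 sieveStep :: [Int] -> [Int]
 sieveStep (n:ns) = filter (\x -> x `mod` n /= 0) ns
 sieveStep []     = []

 -- After k passes the head of the list is the k-th prime (0-based),
 -- so taking the head of every intermediate list yields the primes.
 primes :: [Int]
 primes = map head (iterate sieveStep [2 ..])

 main :: IO ()
 main = do
   args <- getArgs
   case args of
     [arg] -> print (primes !! read arg)
     _     -> putStrLn "usage: primes N"
 ```

 In a program of this shape nearly all the work is the `mod` test inside
 the nested `filter` calls, which is why whether `mod` inlines matters so
 much here.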

 So where is that time and allocation going?  Somewhere in the RTS?

 Simon

-- 
Ticket URL: <http://hackage.haskell.org/trac/ghc/ticket/4951#comment:8>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler

_______________________________________________
Glasgow-haskell-bugs mailing list
Glasgow-haskell-bugs@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-bugs
