#4951: Performance regression 7.0.1 -> 7.0.1.20110201
--------------------------------------+-------------------------------------
  Reporter:  simonmar                 |        Owner:  simonpj
      Type:  bug                      |       Status:  new
  Priority:  highest                  |    Milestone:  7.0.2
 Component:  Compiler                 |      Version:  7.0.1
Resolution:                           |     Keywords:
  Testcase:                           |    Blockedby:
Difficulty:                           |           Os:  Unknown/Multiple
  Blocking:                           | Architecture:  Unknown/Multiple
   Failure:  Runtime performance bug  |
--------------------------------------+-------------------------------------
Comment(by simonpj):

 I'm very puzzled. I've been looking at `imaginary/primes`, a very simple benchmark. With GHC 6.12 and -O I get
{{{
bash$ ./primes-612 4000 +RTS -s
37831
     457,709,320 bytes allocated in the heap
      82,966,936 bytes copied during GC
         378,968 bytes maximum residency (29 sample(s))
          89,944 bytes maximum slop
               3 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0:   844 collections,     0 parallel,  0.99s,  0.96s elapsed
  Generation 1:    29 collections,     0 parallel,  0.05s,  0.06s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    0.67s  (  0.69s elapsed)
  GC    time    1.04s  (  1.03s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    1.71s  (  1.72s elapsed)
}}}
 With HEAD I get
{{{
bash$ ./primes 4000 +RTS -rprimes.ticky -s
37831
     718,383,496 bytes allocated in the heap
      79,051,592 bytes copied during GC
         357,648 bytes maximum residency (25 sample(s))
          88,384 bytes maximum slop
               3 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0:   848 collections,     0 parallel,  0.95s,  0.95s elapsed
  Generation 1:    25 collections,     0 parallel,  0.05s,  0.05s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    1.43s  (  1.43s elapsed)
  GC    time    1.00s  (  1.00s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    2.43s  (  2.43s elapsed)
}}}
 Note the massive increase in allocation (718,383,496 vs 457,709,320 bytes, roughly 57% more) and in mutator time (1.43s vs 0.67s, more than double), while GC time and bytes copied are essentially unchanged.

 BUT when I look at the `-ddump-simpl` output I see virtually the same code. (For HEAD I also used `-funfolding-use-threshold=9` to make `mod` inline; HEAD seems a tiny bit less keen to inline. With this flag the two are close to identical.)

 Moreover, I compiled both with `-ticky` (including all the libraries). The allocation word counts reported by `-ticky` are practically the same for the two programs!

 So I'm stumped. Somewhere a lot of time and allocation is happening, but `-ticky` isn't seeing it. It's a really small program and (by the time we've done inlining) almost all the code is in Main (though it still calls `GHC.List.filter`). So where is that time and allocation going?
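 For readers who don't have nofib at hand, here is a minimal sketch of the kind of program `imaginary/primes` is, assuming it is the classic repeated-filter "sieve": each prime removes its multiples from the remaining stream via `filter` with a `mod` test. The names `suCC`, `isdivs`, `theFilter` and the default argument of 4000 are illustrative assumptions, not necessarily the exact nofib source.

```haskell
-- Hypothetical reconstruction in the style of nofib's imaginary/primes;
-- not necessarily the exact benchmark source.
module Main (main) where

import System.Environment (getArgs)

-- Successor, used to build the candidate stream [2, 3, 4, ...].
suCC :: Int -> Int
suCC x = x + 1

-- True when x is NOT divisible by n; this is where `mod` is used.
isdivs :: Int -> Int -> Bool
isdivs n x = x `mod` n /= 0

-- Drop the head n and remove its multiples from the rest of the stream.
-- Every stage is another lazy call to filter (GHC.List.filter).
theFilter :: [Int] -> [Int]
theFilter (n:ns) = filter (isdivs n) ns
theFilter []     = []

-- The k-th element is the head of the k-th filtered stream.
primes :: [Int]
primes = map head (iterate theFilter (iterate suCC 2))

main :: IO ()
main = do
  args <- getArgs
  -- Hypothetical default index when no argument is given.
  let k = case args of
            (a:_) -> read a
            []    -> 4000
  print (primes !! k)
```

 Built with `ghc -O -rtsopts` and run as `./primes 4000 +RTS -s`, a program of this shape leans almost entirely on an inlined `mod` and a chain of `filter` calls, each of which allocates a new list cell for every element it keeps. That is why nearly all of the work should end up in Main after inlining.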
 Somewhere in the RTS?

 Simon

-- 
Ticket URL: <http://hackage.haskell.org/trac/ghc/ticket/4951#comment:8>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
_______________________________________________
Glasgow-haskell-bugs mailing list
Glasgow-haskell-bugs@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-bugs