Is it possible INLINE didn't inline the function because it's recursive? If it were my function, I'd probably try a manual worker /wrapper.
On 07:59, Wed, Dec 17, 2014 Simon Peyton Jones <simo...@microsoft.com> wrote: > I still would like to understand why INLINE does not make it inline. > That's weird. > > Eg way to reproduce. > > Simion > > | -----Original Message----- > | From: Richard Eisenberg [mailto:e...@cis.upenn.edu] > | Sent: 17 December 2014 15:56 > | To: Simon Peyton Jones > | Cc: Joachim Breitner; ghc-devs@haskell.org > | Subject: Re: performance regressions > | > | By unsubstantiated guess is that INLINEABLE would have the same effect > | as INLINE here, as GHC doesn't see fit to actually inline the > | function, even with INLINE -- the big improvement seen between (1) and > | (2) is actually specialization, not inlining. The jump from (2) to (3) > | is actual inlining. Thus, it seems that GHC's heuristics for inlining > | aren't working out for the best here. > | > | I've pushed my changes, though I agree with Simon that more research > | may uncover even more improvements here. I didn't focus on the number > | of calls because that number didn't regress. Will look into this soon. > | > | Richard > | > | On Dec 17, 2014, at 4:15 AM, Simon Peyton Jones > | <simo...@microsoft.com> wrote: > | > | > If you use INLINEABLE, that should make the function specialisable > | to a particular monad, even if it's in a different module. You > | shouldn't need INLINE for that. > | > > | > I don't understand the difference between cases (2) and (3). > | > > | > I am still suspicious of why there are so many calls to this one > | function that it, alone, is allocating a significant proportion of > | compilation of the entire run of GHC. Are you sure there isn't an > | algorithmic improvement to be had, to simply reduce the number of > | calls? > | > > | > Simon > | > > | > | -----Original Message----- > | > | From: ghc-devs [mailto:ghc-devs-boun...@haskell.org] On Behalf Of > | > | Richard Eisenberg > | > | Sent: 16 December 2014 21:46 > | > | To: Joachim Breitner > | > | Cc: ghc-devs@haskell.org > | > | Subject: Re: performance regressions > | > | > | > | I've learned several very interesting things in this analysis. > | > | > | > | - Inlining polymorphic methods is very important. Here are some > | > | data points to back up that claim: > | > | * Original implementation using zipWithAndUnzipM: > | 8,472,613,440 > | > | bytes allocated in the heap > | > | * Adding {-# INLINE #-} to the definition thereof: > | 6,639,253,488 > | > | bytes allocated in the heap > | > | * Using `inline` at call site to force inlining: > | 6,281,539,792 > | > | bytes allocated in the heap > | > | > | > | The middle step above allowed GHC to specialize zipWithAndUnzipM > | to > | > | my particular monad, but GHC didn't see fit to actually inline > | the > | > | function. Using `inline` forced it, to good effect. (I did not > | > | collect data on code sizes, but it wouldn't be hard to.) > | > | > | > | By comparison: > | > | * Hand-written recursion: 6,587,809,112 bytes allocated in > | the > | > | heap > | > | Interestingly, this is *not* the best result! > | > | > | > | Conclusion: We should probably add INLINE pragmas to Util and > | > | MonadUtils. > | > | > | > | > | > | - I then looked at rejiggering the algorithm to keep the common > | > | case fast. This had a side effect of changing the > | zipWithAndUnzipM > | > | to mapAndUnzipM, from Control.Monad. To my surprise, this brought > | > | disaster! > | > | * Using `inline` and mapAndUnzipM: 7,463,047,432 bytes > | > | allocated in the heap > | > | * Hand-written recursion: 5,848,602,848 bytes > | > | allocated in the heap > | > | > | > | That last number is better than the numbers above because of the > | > | algorithm streamlining. But, the inadequacy of mapAndUnzipM > | > | surprised me -- it already has an INLINE pragma in Control.Monad > | of course. > | > | Looking at -ddump-simpl, it seems that mapAndUnzipM was indeed > | > | getting inlined, but a call to `map` remained, perhaps causing > | > | extra allocation. > | > | > | > | Conclusion: We should examine the implementation of mapAndUnzipM > | > | (and similar functions) in Control.Monad. Is it as fast as > | possible? > | > | > | > | > | > | > | > | In the end, I was unable to bring the allocation numbers down to > | > | where they were before my work. This is because the flattener now > | > | deals in roles. Most of its behavior is the same between nominal > | > | and representational roles, so it seems silly (though very > | > | possible) to specialize the code to nominal to keep that path > | fast. > | > | Instead, I identified one key spot and made that go fast. > | > | > | > | Thus, there is a 7% bump to memory usage on very-type-family- > | heavy > | > | code, compared to before my commit on Friday. (On more ordinary > | > | code, there is no noticeable change.) > | > | > | > | Validating my patch locally now; will push when that's done. > | > | > | > | Thanks, > | > | Richard > | > | > | > | On Dec 16, 2014, at 10:41 AM, Joachim Breitner <mail@joachim- > | > | breitner.de> wrote: > | > | > | > | > Hi, > | > | > > | > | > > | > | > Am Dienstag, den 16.12.2014, 09:59 -0500 schrieb Richard > | Eisenberg: > | > | >> On Dec 16, 2014, at 4:01 AM, Joachim Breitner <mail@joachim- > | > | breitner.de> wrote: > | > | >> > | > | >>> another guess (without looking at the code, sorry): Are they > | in > | > | the >>> same module? I.e., can GHC specialize the code to your > | > | particular Monad? > | > | > > | > | >> No, they're not in the same module. I could also try moving > | the > | > | >> zipWithAndUnzipM function to the same module, and even > | > | specializing >> it by hand to the right monad. > | > | > > | > | > I did mean zipWithAndUnzipM, so maybe yes: Try that. > | > | > > | > | > (I find it hard to believe that any polymorphic monadic code > | > | should > perform well, with those many calls to an unknown (>>=) > | > | with a > function parameter, but maybe I'm too pessimistic here.) > | > | > > >> Could that be preventing the fusing? > | > | > > | > | > There is not going to be any fusing here, at least not list > | > | fusion; > that would require your code to be written in terms of > | > | functions with > fusion rules. > | > | > > | > | > Greetings, > | > | > Joachim > | > | > > | > | > -- > | > | > Joachim "nomeata" Breitner > | > | > m...@joachim-breitner.de * http://www.joachim-breitner.de/ > > | > | Jabber: nome...@joachim-breitner.de * GPG-Key: 0xF0FBF51F Debian > | > | > Developer: nome...@debian.org > > > | > | _______________________________________________ > | > | > ghc-devs mailing list > | > | > ghc-devs@haskell.org > | > | > http://www.haskell.org/mailman/listinfo/ghc-devs > | > | > | > | _______________________________________________ > | > | ghc-devs mailing list > | > | ghc-devs@haskell.org > | > | http://www.haskell.org/mailman/listinfo/ghc-devs > | > > > _______________________________________________ > ghc-devs mailing list > ghc-devs@haskell.org > http://www.haskell.org/mailman/listinfo/ghc-devs >
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://www.haskell.org/mailman/listinfo/ghc-devs