I've learned several very interesting things in this analysis.

- Inlining polymorphic methods is very important. Here are some data points to back up that claim:
   * Original implementation using zipWithAndUnzipM:    8,472,613,440 bytes allocated in the heap
   * Adding {-# INLINE #-} to the definition thereof:   6,639,253,488 bytes allocated in the heap
   * Using `inline` at call site to force inlining:     6,281,539,792 bytes allocated in the heap

The middle step above allowed GHC to specialize zipWithAndUnzipM to my particular monad, but GHC didn't see fit to actually inline the function. Using `inline` forced it, to good effect. (I did not collect data on code sizes, but it wouldn't be hard to do so.)
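
For concreteness, here is roughly the shape of what I mean (a minimal sketch: the types are simplified and the call site is made up, but `inline` is the real one from GHC.Exts):

    import GHC.Exts ( inline )   -- inline :: a -> a; forces inlining at this call site

    -- Roughly the MonadUtils definition (simplified). The INLINE pragma is what
    -- lets GHC specialize it to the monad at a monomorphic call site.
    {-# INLINE zipWithAndUnzipM #-}
    zipWithAndUnzipM :: Monad m
                     => (a -> b -> m (c, d)) -> [a] -> [b] -> m ([c], [d])
    zipWithAndUnzipM f (x:xs) (y:ys)
      = do { (c, d)   <- f x y
           ; (cs, ds) <- zipWithAndUnzipM f xs ys
           ; return (c:cs, d:ds) }
    zipWithAndUnzipM _ _ _ = return ([], [])

    -- A made-up monomorphic call site; in the real code this lives in another
    -- module. Wrapping the call in `inline` is what finally forced the inlining.
    callSite :: [Int] -> [Int] -> IO ([Int], [Int])
    callSite = inline zipWithAndUnzipM (\x y -> return (x + y, x * y))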

By comparison:
   * Hand-written recursion:    6,587,809,112 bytes allocated in the heap
Interestingly, this is *not* the best result!

Conclusion: We should probably add INLINE pragmas to Util and MonadUtils.


- I then looked at rejiggering the algorithm to keep the common case fast. A side effect of this was replacing the zipWithAndUnzipM call with mapAndUnzipM, from Control.Monad. To my surprise, this brought disaster!
   * Using `inline` and mapAndUnzipM:        7,463,047,432 bytes allocated in the heap
   * Hand-written recursion:                 5,848,602,848 bytes allocated in the heap

That last number is better than the numbers above because of the algorithm streamlining. But the poor showing of mapAndUnzipM surprised me -- it already has an INLINE pragma in Control.Monad, of course. Looking at -ddump-simpl, it seems that mapAndUnzipM was indeed getting inlined, but a call to `map` remained, perhaps causing extra allocation.
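
For reference, this is the shape of the hand-written recursion I mean (a sketch, not the exact flattener code): it builds both result lists directly, with no intermediate list from `map` and no separate unzip pass.

    -- Sketch of the hand-rolled replacement for mapAndUnzipM: one direct
    -- recursion that produces both result lists as it goes.
    mapAndUnzipM' :: Monad m => (a -> m (b, c)) -> [a] -> m ([b], [c])
    mapAndUnzipM' _ []     = return ([], [])
    mapAndUnzipM' f (x:xs)
      = do { (b, c)   <- f x
           ; (bs, cs) <- mapAndUnzipM' f xs
           ; return (b:bs, c:cs) }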

Conclusion: We should examine the implementation of mapAndUnzipM (and similar 
functions) in Control.Monad. Is it as fast as possible?


In the end, I was unable to bring the allocation numbers down to where they were before my work. This is because the flattener now deals in roles. Most of its behavior is the same between nominal and representational roles, so it seems silly (though entirely possible) to specialize the code to the nominal case just to keep that path fast. Instead, I identified one key spot and made that one fast.

Thus, there is a 7% increase in allocation on very-type-family-heavy code, compared to before my commit on Friday. (On more ordinary code, there is no noticeable change.)

Validating my patch locally now; will push when that's done.

Thanks,
Richard

On Dec 16, 2014, at 10:41 AM, Joachim Breitner <m...@joachim-breitner.de> wrote:

> Hi,
> 
> 
> On Tuesday, 16.12.2014, at 09:59 -0500, Richard Eisenberg wrote:
>> On Dec 16, 2014, at 4:01 AM, Joachim Breitner <m...@joachim-breitner.de> 
>> wrote:
>> 
>>> another guess (without looking at the code, sorry): Are they in the same
>>> module? I.e., can GHC specialize the code to your particular Monad?
> 
>> No, they're not in the same module. I could also try moving the
>> zipWithAndUnzipM function to the same module, and even specializing it
>> by hand to the right monad.
> 
> I did mean zipWithAndUnzipM, so maybe yes: Try that.
> 
> (I find it hard to believe that any polymorphic monadic code should
> perform well, with those many calls to an unknown (>>=) with a function
> parameter, but maybe I’m too pessimistic here.)
> 
> 
>> Could that be preventing the fusing?
> 
> There is not going to be any fusing here, at least not list fusion; that
> would require your code to be written in terms of functions with fusion
> rules.
> 
> Greetings,
> Joachim
> 
> -- 
> Joachim “nomeata” Breitner
>  m...@joachim-breitner.de • http://www.joachim-breitner.de/
>  Jabber: nome...@joachim-breitner.de  • GPG-Key: 0xF0FBF51F
>  Debian Developer: nome...@debian.org
> 
