Hi all, I spent some time today looking into the performance of a program involving a parser type that looks something like this:
data AnnotatedParser a = AnnotatedParser { annotation :: Annotation , parser :: String -> (Maybe a, String) } The `Annotation` records metadata about the structure of an `AnnotatedParser` that can be accessed statically (that is, without having to run the parser on some input). `AnnotatedParser`s are built from various primitive constructors and composed using various combinators. These combinators end up looking something like this: (<+>) :: AnnotatedParser a -> AnnotatedParser b -> AnnotatedParser (a, b) AnnotatedParser ann1 f <+> AnnotatedParser ann2 g = AnnotatedParser { annotation = Seq ann1 ann2 , parser = \s1 -> let !(a, s2) = f s1 !(b, s3) = g s2 in ((,) <$> a <*> b, s3) } Use of these combinators leads to the construction and subsequent case analysis of numerous `AnnotatedParser` closures. Happily, constructed product result[1] analysis kicks in and rewrites such combinators to cut down on the needless boxing, leading to worker/wrapper splits like this: $w<+> :: Annotation -> (String -> (Maybe a, String)) -> Annotation -> (String -> (Maybe b, String)) -> (# Annotation, String -> (Maybe (a, b), String) #) $w<+> ann1 f ann2 g = (# Seq ann1 ann2 , \s1 -> let !(a, s2) = f s1 !(b, s3) = g s2 in ((,) <$> a <*> b, s3) #) <+> :: AnnotatedParser a -> AnnotatedParser b -> AnnotatedParser (a, b) <+> (AnnotatedParser ann1 f) (AnnotatedParser ann2 g) = case $w<+> ann1 f ann2 g of (# a, b #) -> AnnotatedParser a b {-# INLINE <+> #-} This is great, and it cuts down on allocation significantly, but there is still something unsatisfying about it: the `parser` function inside the record is not affected by CPR! This is a shame, because essentially all use sites immediately deconstruct the pair, making it a prime candidate for unboxing. Ideally, we’d like to get this, instead: $w<+> :: Annotation -> (String -> (Maybe a, String)) -> Annotation -> (String -> (Maybe b, String)) -> (# Annotation, String -> (# Maybe (a, b), String #) #) $w<+> ann1 f ann2 g = (# Seq ann1 ann2 , \s1 -> let !(a, s2) = f s1 !(b, s3) = g s2 in (# (,) <$> a <*> b, s3 #) #) In practice, little combinators like `$w<+>` are marked INLINE, so `f` and `g` are usually known rather than unknown calls. This nested CPR transformation would allow the tuple construction to fuse with the tuple deconstruction, eliminating quite a lot of unnecessary boxing/unboxing. Unfortunately, it seems as though GHC’s implementation of CPR is entirely first-order: although function arguments are given rich demand signatures, results are only described one level deep. But as the above example hopefully illustrates, that’s leaving significant optimization opportunities on the table! Hence, my questions: 1. Has this notion of “nested CPR” been explored at all before? 2. Does such an extension to CPR sound worth its weight? I peeked a little at GHC.Types.Cpr and GHC.Core.Opt.CprAnal, and it seems quite manageable to me, but I haven’t actually looked into an implementation attempt just yet. I’m mostly interested in whether others have thought about something like this and/or run into similar issues in the past, or if this is really an unusual construction. Thanks, Alexis [1]: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/07/cpr.pdf _______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs