Dan Weston wrote:
So to be clear with the terminology:

inductive   = good consumer?
coinductive = good producer?

So fusion should be possible (automatically? or do I need a GHC rule?) with
  inductive . coinductive

Or have I bungled it?

Not quite. Induction means starting from base cases and building things upwards from those. Coinduction is the dual and can be thought of as starting from the ceiling and building your way downwards (until you hit the base cases, or possibly forever).

So, if you have potentially infinite data (aka co-data) coming in, then you need to use coinduction because you may never see the basis but you want to make progress anyways. In formal terms, coinduction on co-data gives the same progress guarantees as induction on data, though termination is no longer a conclusion of progress (since coinduction may produce an infinite output from infinite input).

Haskell doesn't distinguish data and co-data, but you can imagine data as if all the data constructors are strict, and co-data as if all the constructors are lazy. Another way to think of it is that finite lists (ala OCaml and SML) are data, but streams are co-data.



For fusion there's the build/fold type and its dual unfold/destroy, where build/unfold are producers and fold/destroy are consumers. To understand how fusion works, let's look at the types of build and fold.

    GHC.Exts.build      :: (forall b. (a -> b -> b) -> b -> b) -> [a]
    flip (flip . foldr) :: [a] -> ( (a -> b -> b) -> b -> b )

Together they give an isomorphism between lists as an ADT [a] and as a catamorphism (forall b. (a -> b -> b) -> b -> b), aka Church encoding. When we have build followed by foldr, we can remove the intermediate list and pass the F-algebra down directly:

    foldr cons nil (build k) = k cons nil

For unfold/destroy fusion the idea is the same except that we use unfold (an anamorphism on the greatest fixed point) instead of fold (a catamorphism on the least fixed point). The two fixed points coincide in Haskell.

Since Haskell does build/fold fusion, "good producer" requires that the function was written using build, and "good consumer" requires it's written using foldr. Using these functions allows us to apply the rule, though it's not sufficient for "good fusion". Why the functions have the particular types they do and why this is safe has to do with induction and coinduction, but the relationship isn't direct.



The reason a coinductive function is easy to make into a good producer has to do with that relationship. Take a canonically coinductive function like

    f []     = []
    f (x:xs) = x : f xs

Once we've made one step of recursion, we've generated (x:) and then have a thunk for recursing. Most importantly is that no matter how we evaluate the rest of the list, the head of the return value is already known to be (:) thus we can get to WHNF after one step. Whatever function is consuming this output can then take x and do whatever with it, and then pull on f xs which then takes a single step and returns (x':) along with a thunk f xs'. Because all of those (:) are being produced immediately, it's easy to abstract it out as a functional argument--- thus we can use build.

Coinduction doesn't need to do 1-to-1 mapping of input to output, there just needs to be the guarantee that we only need to read a finite amount of input before producing a non-zero finite amount of output. These functions are also coinductive:

    p []       = []
    p [x]      = [x]
    p (x:y:ys) = y : x : p ys

    q []       = []
    q [x]      = []
    q (x:y:ys) = y : q ys

    r []     = []
    r (x:xs) = x : x : r xs

They can also be written using build, though they're chunkier about reading input or producing output. These functions are not coinductive because there's no finite bound on how long it takes to reach WHNF:

    bad []     = []
    bad (x:xs) = bad xs

    reverse []     = []
    reverse (x:xs) = reverse xs ++ [x]

Because build/fold is an isomorphism, we can technically use build for writing *any* function that produces a list. However, there's more to fusion than just using the build/fold isomorphism. The big idea behind it all is that when producers and consumers are in 1-to-1 correlation, then we can avoid allocating that 1 (the cons cell) and can just pass the arguments of the constructor directly to the consumer. For example:

    let buildF []     = []
        buildF (x:xs) = x : buildF xs

        consumeF []     = 0
        consumeF (x:xs) = 1 + consumeF xs
    in
        consumeF . buildF
==
    let buildF = \xs -> build (f xs)
            where
            f []     cons nil = nil
            f (x:xs) cons nil = x `cons` f xs cons nil

        consumeF = foldr consumeCons consumeNil
            where
            consumeNil       = 0
            consumeCons x rs = 1 + rs
    in
        consumeF . buildF
==
    let f []     cons nil = nil
        f (x:xs) cons nil = x `cons` f xs cons nil

        consumeNil       = 0
        consumeCons x rs = 1 + rs
    in
        foldr consumeCons consumeNil . \xs -> build (f xs)
==
    let... in
        \xs -> foldr consumeCons consumeNil (build (f xs))
==
    let... in
        \xs -> (f xs) consumeCons consumeNil

And now f never allocates any (:) or [], it just calls the two consumers directly. The first step of choosing to use build and foldr instead of primitive recursion is what enables the compiler to automatically do all the other steps.

Leaving it at that is cute since we can avoid allocating the list, however, due to laziness we may still end up allocating a spine of calls to consumeCons, which isn't much better than a spine of calls to (:). This is why "good producers" are ones which are capable of producing a single cons at a time, they never construct a spine before it is needed by the consumer. And this is why "good consumers" are ones which are capable of consuming a single cons at a time, they never force the production of a spine without immediately consuming it. We can relax this goodness from 1-to-1 to chunkier things, but that also reduces the benefits of fusion.



All of this can be generalized to other types besides lists, of course.

--
Live well,
~wren
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Reply via email to