Stupid Haskell question

Guy Steele Mon, 22 Feb 93 13:42:18 EST
   Date: Mon, 22 Feb 93 14:28:47 GMT
   From: wadler <[EMAIL PROTECTED]>

   Guy asks the following (non-stupid) Haskell question, which I reply to
   below.  The question points out an area in the Haskell report that
   seems to be unclear; and a place where it might be worthwhile to change
   the design to be less conservative but more uniform.

   Guy's question:


   ----- Begin Included Message -----

   >From [EMAIL PROTECTED] Thu Feb 18 17:21:56 1993
   From: Guy Steele <[EMAIL PROTECTED]>
   Date: Thu, 18 Feb 93 12:20:44 EST
   To: wadler <[EMAIL PROTECTED]>
   Cc: [EMAIL PROTECTED]
   Subject: Re: Stupid Haskell question
   Cc: [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], 
       [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED]


   Haskell theoretically allows recursive datatypes.  But the following
   example does not work (he said innocently).

       module Rec where

       data Unary a = Zero | Successor a

       f :: Unary z -> [Unary z]
       f x = [x, Successor x]

   I think that the compiler ought to deduce the restriction
   x::q  where  q = Unary q.  It ought to be okay for q to be Unary q
   because "an algebraic datatype intervenes" (Haskell report, 4.2.2).

   But the Glasgow compiler says

       "/users/lang1/gls/Haskell/monads/Rec.hs", line 6:
           Type variable "a" occurs within the type "Unary a".
           In a list expression: [x, Successor x]

   and the Chalmers compiler says

       Errors:
       "/users/lang1/gls/Haskell/monads/Rec.hs", line 6, [63] unify1 (occurence)
           a
       and Unary a
        in  (:) A1_f ((:) (Successor A1_f) ([]))
        in f

   Now everything is okay if I write

       module Rec where

       data Unary a = Zero | Successor (Unary a)

       f :: Unary z -> [Unary z]
       f x = [x, Successor x]

   but I have reasons in my actual code (which is hairy--this is a
   stripped-down example) not to force the data type to be recursive,
   but to let the type analysis deduce it where necessary.  Am I foolish
   to expect this?

   --Guy


   ----- End Included Message -----

   Phil's response:

   Guy,

   Haskell requires that `an algebraic datatype intervenes' in order that
   all types can be written as a finite tree.  The type you refer to,

           q where q = Unary q,

   is an infinite tree (though a finite graph).  If we intended to allow
   such infinite solutions, we wouldn't need the restriction to algebraic
   datatypes at all.  This suggests we should clear up the wording in the
   Haskell report, so I've forwarded your question and this response to
   the Haskell mailing list.

   Why not allow cyclic types (i.e., any type expressible as a
   finite graph)?  It turns out there is a unification algorithm
   that works for finite graphs, so this is in theory possible.
   But the intent of Haskell was to be a conservative design, so
   we stuck with what we were familiar with.  Your example of
   a place where cyclic types are useful provides an impetus
   to step into the less familiar but more uniform territory.

Phil,

  Thanks for the reply.  I do think some clarification in the
report is called for because I suspect that there are some
implicit implementation-motivated assumptions about how types are
processed.  For example, I suspect there is an assumption that
type synonyms may be handled simply by substitution (inline
expansion), so to speak, whereas algebraic data types are not.
(If algebraic data types were substitutive, then recursive
algebraic types would be expressed as infinite trees after all.)

  So it is not so much that cyclic graphs are disallowed, as that
the user is notationally required to indicate where the graph is
to be cut so as to render it acyclic (and therefore notatable as a
tree, since sharing in a acyclic dag is irrelevant here).  This,
plus the inability to explicit construct type declarations
dynamically, means that all cycles in a type graph must be of
fixed size, determined by the static program text.  And this is
what is getting in my way.

  Here is a less stripped-down version of what I am trying
to accomplish.  I want a data structure that is a tree:

  data FooTree a = Leaf a | Node (FooTree a) (FooTree a)

But I want to be able to annotate these trees in various ways.
So I wrap an annotation structure around the recursion:

  data FooTree a = Leaf a | Node (FooTree (Wrapper a)) (FooTree (Wrapper a))

  data Wrapper a = Annotation a Int

Now sometimes I need different kinds of trees that have different
annotations.  But the basic tree structure is common to them all.
So I want to abstract over the annotation structures.  This leads me
to write:

  data FooTree a = Leaf a | Node t t

where my intent is that "t" is some complex type that may
eventually be discovered to involve "FooTree a".  Unfortunately,
"t" is not allowed on the right-hand side unless it appears on the
left-hand side.  (For a while, I thought that the existential
types in the Chalmers compiler would give me a way out.  But the
existential types must "escape", which is forbidden in the current
Chalmers implementation.)  So I must write

  data FooTree t a = Leaf a | Node t t

And this is rejected by the "occurs check".

  What I want is a tree-walking routine

  treewalk f tree = ...

where "f" knows how to process annotations.  Given something of type "t",
it can walk through the annotations and find the subnode of type FooTree,
which is then handed back to "treewalk" for recursive processing.
I want treewalk to be polymorphic over different kinds of trees and
thus over different routines "f"; the type of "f" at the call site bears
the necessary information to deduce the overall type of the tree data
structure.  So, depending on "f", the tree data structure might be

  data FooTree a = Leaf a | Node (FooTree (Wrapper1 a))
                                 (FooTree (Wrapper1 a))

  data FooTree a = Leaf a | Node (FooTree (Wrapper2 a))
                                 (FooTree (Wrapper2 a))

  data FooTree a = Leaf a | Node (FooTree (Wrapper1 (Wrapper2 a)))
                                 (FooTree (Wrapper1 (Wrapper2 a)))

  data FooTree a = Leaf a | Node (FooTree (Wrapper3 (Wrapper2 a)))
                                 (FooTree (Wrapper3 (Wrapper2 a)))

and so on.  Any one of these is legitimate when written out explicitly;
but I want the treewalk routine to be polymorphic over all of them.
Moreover, it is impractical to list out all possible combinations
explicitly, partly because of the large number of annotation types
and partly because I want the set of annotations to be open-ended.

  So I was genuinely puzzled when the Haskell type-checker did not
figure this out, as indeed an algebraic data type did "intervene"
in every cycle, in some sense.  When the type checker encounters
a unification of x with t(x), it would seem to be a simple exercise
to invent a new name for t(x), certainly when the outermost type name
in the expression t(x) is algebraic.

  I suppose I am asking the type checker to automatically discover
restrictions to certain fixpoints:  when it discovers that

        t = FooTree t a

this should be unified as something like

        mu t. FooTree t a

but of course "mu" as such is not part of the Haskell type system,
although some forms of recursive reference (explicitly written-out
algebraic data types) are.  Ah, well.

  Thanks for hearing me out.

--Guy
Stupid Haskell question

Reply via email to