Hi -
One of the main problems I've found with Haskell is that, within a module, too many things are put into the same scope. For example

   data Tree a b = Leaf a | Node {elem::b, lhs::Tree a b, rhs::Tree a b}

puts Leaf, Node, elem, lhs, and rhs into the module's namespace,
as well as Tree. This means that if you want another kind of tree in
the same module, you've got to think of another word for Leaf,
or else prefix every value constructor with some abbreviation of the tycon
to prevent name collisions.
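
To make the clash concrete (the second tree type below and all of its names are invented purely for illustration):

   import Prelude hiding (elem)   -- elem already collides with Prelude.elem

   data Tree a b = Leaf a | Node{elem::b, lhs::Tree a b, rhs::Tree a b}

   -- a second kind of tree in the same module: every name has to be
   -- invented afresh (RLeaf, RNode, relem, ...) purely to avoid colliding
   -- with the names introduced by Tree
   data RoseTree a = RLeaf a | RNode{relem::a, rkids::[RoseTree a]}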

I propose that the above declaration should instead introduce a *new module* Tree, as a submodule of the containing module, and that Leaf, Node, elem, etc. be put into this submodule rather than into the module containing the data declaration itself.

Identifiers in this child module could then be used either qualified or unqualified, as long as they don't conflict with anything in the parent (containing) module. Thus, in the parent module, we could have:

   a = Tree.Leaf 3
   b = Node{elem = 6, lhs = Leaf 2, rhs = Leaf 3}
   c = Tree.elem b
   d = b^elem          -- same value as c, using the sugar described below

where ^ is just a helpful piece of sugar which binds tighter than function application and can be used anywhere to allow an "object-oriented" way of thinking, in which the first argument is put on the left of the function (more on this later).
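
As a rough approximation in today's Haskell, one can define a reverse-application operator (I'll write it ^. here to avoid clashing with the numeric ^; the same thing is nowadays available as (&) in Data.Function). An ordinary operator binds looser than function application, not tighter, so this is only an approximation of the sugar I have in mind:

   infixl 1 ^.

   -- reverse application: the "object" goes on the left of the function
   (^.) :: a -> (a -> b) -> b
   x ^. f = f x

   -- e.g.  b ^. lhs  instead of  lhs b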

Also, we could generalise GHC's GADT-style data declarations by allowing arbitrary declarations to appear in the where part, and allow a module to consist of a single data declaration (i.e. to begin with the keyword data instead of module), so that the contents of the Data.Set module would instead be something like:

   data Ord a => Set a = .... where
       insert :: Set a -> a -> Set a
       ...

This would allow Set to be imported qualified into other modules, where we could refer to the Set type by the single word Set instead of Set.Set (which I always think looks very strange indeed), i.e.:

   module M where
       import qualified Data.Set as Set

       f :: a -> Set a            -- no longer need to write a -> Set.Set a

In fact, the whole complexity associated with hiding components of a module and allowing unqualified imports could be discarded, since the correct resolution for each identifier would be determined by its typed context, just as in OOP the resolution is determined by the type of the object.

So to summarise so far: for any value f :: T0 -> T1 -> ... -> Tn, we construct the tuple (Q0, Q1, ..., Qn) where, for each i, Qi is obtained from Ti by ignoring all predicates and quantification and replacing each tyvar by the symbol '?'.

For example, for

   module M where
   foo :: forall b. Eq a => Set (Tree a) -> Tree ([a],b) -> [Tree (a,b)]

we ignore the forall and the Eq constraint, replace the tyvars by '?', and tuple the results to get:

  M.(Set (Tree ?), Tree ( (,) ( [] ?) ?), [] (Tree ( (,) ? ?))).foo

as the fully qualified reference to the entity we've just declared.
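
To make this construction concrete, here is a small sketch over a simplified representation of types (the Ty type and all the names here are my own, just for illustration; predicates and quantifiers are assumed to have been stripped already, and the function type split at the arrows):

   data Ty = TCon String [Ty] | TVar String
     deriving Show

   -- Qi: replace every tyvar by "?" but keep the constructor structure
   abstractTy :: Ty -> Ty
   abstractTy (TVar _)    = TCon "?" []
   abstractTy (TCon c ts) = TCon c (map abstractTy ts)

   -- the ty-tuple of f :: T0 -> T1 -> ... -> Tn is (Q0, Q1, ..., Qn)
   tyTuple :: [Ty] -> [Ty]
   tyTuple = map abstractTy

   -- the pieces of foo's type, writing [] and (,) as ordinary tycons
   fooTys :: [Ty]
   fooTys =
     [ TCon "Set"  [TCon "Tree" [TVar "a"]]
     , TCon "Tree" [TCon "(,)" [TCon "[]" [TVar "a"], TVar "b"]]
     , TCon "[]"   [TCon "Tree" [TCon "(,)" [TVar "a", TVar "b"]]]
     ]
   -- tyTuple fooTys  ~  (Set (Tree ?), Tree ((,) ([] ?) ?), [] (Tree ((,) ? ?)))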

If we call the tuple (Set (Tree ?), ...) the "ty-tuple", which specifies a space of values (i.e. something that each value is considered to "belong" to - the analogue of the object in OOP), then we can define projections from a space into a containing space as follows:

1) Let Qi == Ui Si0 Si1 ... Sik be any element of a ty-tuple P. Then we can form a new ty-tuple P' by replacing Qi by Ui.

2) Let Qi be any element of a ty-tuple P of arity n+1. Then we can form
P' == (Q0, ..., Qi-1, Qi+1, ..., Qn) of arity n.

In both cases, we can use P' to qualify the identifier to get a value reference, as long as it still carries enough information to distinguish this binding from other uses of the same identifier in M.

Applying these transformations to the ty-tuple of the example above,
any of the following could be used as a partial qualification of foo in M:

   (Set Tree, Tree ([],?), [Tree]).foo
   (Set, Tree, []).foo
   Tree.foo
   foo

Rule 2 allows us to propagate identifiers back up to their ancestor modules as long as there are no conflicts along the way.
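
In the same style of sketch (again, representation and names are my own), the two projection rules are just:

   data Ty = TCon String [Ty] | TVar String

   type TyTuple = [Ty]

   -- Rule 1: replace one element Qi = Ui Si0 ... Sik by its head Ui
   rule1 :: Int -> TyTuple -> TyTuple
   rule1 i p = [ if j == i then headOf q else q | (j, q) <- zip [0..] p ]
     where headOf (TCon u _) = TCon u []
           headOf v          = v

   -- Rule 2: drop the element Qi altogether
   rule2 :: Int -> TyTuple -> TyTuple
   rule2 i p = [ q | (j, q) <- zip [0..] p, j /= i ]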

The type inference algorithm could then be modified (I hope) to use the top-level annotation on a value declaration to determine which entities its identifiers refer to, by searching in the appropriate value spaces determined by the types of the arguments and the return value. For example:

   label :: Tree a -> Int -> (Tree a, Int)
   label (Leaf x) i = (Leaf i, i+1)

The identifier Leaf is resolved in the current namespace augmented by the namespace of the Tree module; thus we don't have to write Tree.Leaf explicitly, even though the declaration of label occurs outside the Tree module itself.

(I suggest that top-level type annotations should be made mandatory, since without them one just drowns in a sea of type-inference confusion whenever there is a type error. By making them mandatory, the TI algorithm could make real use of them to resolve identifier bindings, as in OOP.)
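
To sketch the namespace-selection step (again with my own made-up names and a simplified type representation): from the annotated type we take the head tycon of each argument type and of the result type, and consult the corresponding (proposed) submodules in addition to the current scope:

   data Ty = TCon String [Ty] | TVar String

   -- split a curried function type into its argument types plus result type
   argsAndResult :: Ty -> [Ty]
   argsAndResult (TCon "->" [a, r]) = a : argsAndResult r
   argsAndResult t                  = [t]

   -- extra namespaces to search; e.g. for
   --   label :: Tree a -> Int -> (Tree a, Int)
   -- this yields ["Tree", "Int", "(,)"], so Leaf resolves via Tree
   extraNamespaces :: Ty -> [String]
   extraNamespaces t = [ c | TCon c _ <- argsAndResult t ]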

Similarly:

   foo :: a -> Set a
   foo x = singleton x

There is no need to say Set.singleton x, because we know that the result of foo has type Set a; therefore we search the current module, augmented by the contents of the Set module (every type is also a module), for a binding of "singleton" that maps a to Set a.
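
For comparison, this is what one has to write today, where the qualification is needed even though the annotation already pins down which singleton is meant:

   module M where

   import qualified Data.Set as Set

   foo :: a -> Set.Set a
   foo x = Set.singleton x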

Finally (apologies for this long post), returning to the use of ^ to allow an object-oriented way of thinking, consider:

   insert :: a -> Set a -> Set a
   ps = singleton 3
   qs = insert 4 ps
   rs = ps^insert 4

When resolving "insert" in the binding for rs, the compiler should see that we are looking for some function of type Set Int -> Int -> Set Int, and hence will look in the current module augmented by the Set module. However, the Set module only has a binding for insert with type a -> Set a -> Set a, so the compiler should generate a new function insert' from insert by moving the argument of type Set a to the front.
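
In other words, the generated insert' would amount to moving the Set argument to the front (using the real Data.Set here for concreteness, hence the Ord constraint):

   import Data.Set (Set, insert)

   -- insert with the Set argument first, so that  ps^insert 4  can mean
   -- insert' ps 4
   insert' :: Ord a => Set a -> a -> Set a
   insert' s x = insert x s     -- i.e. insert' = flip insert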

Summary:

1) Every value binding belongs to a value space represented by a ty-tuple, which abstracts away from the details of the actual type.

2) The space of ty-tuples forms a lattice, and in particular this means qualification is optional when there are no conflicts (we can use Leaf 3 instead of Tree.Leaf 3), and we can use partial qualification to supply just the disambiguation info needed.

3) The TI algorithm should use the type of the expression (propagated top-down from the top-level annotation) to search in the correct value spaces to resolve identifiers.

4) Record field selection doesn't need any special scoping rules, and is just one example of a general object-oriented way of thinking about function application.

5) We can get, in functional programming, all the advantages of automatic namespace management that OOP programmers take for granted, by using value spaces as the analogue of objects, and can thereby get rid of complicated import/export directives and improve code readability with less typing (but more use of typing - excuse the pun :-)).

I'm in the process of trying to implement a pre-processor for Haskell that would use the algorithms sketched above (as well as fixing a few other things I think are wrong with Haskell, such as the way the current layout rule allows one to write code that breaks if you replace an identifier with one that has a different number of characters, etc.), so any feedback on these ideas would be welcome.

Thanks for reading so far!

Brian Hulley

PS: Everything above is purely intended for resolving the kind of ad-hoc overloading that is not amenable to treatment by type classes (which, in the usual implementation, turn a subset of the ad-hoc-overloaded functions sharing the same shape into a single function with implicit dictionary arguments).
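
For contrast, the case classes already handle well is where the overloaded functions share a common shape, e.g. (the class name here is made up):

   -- the overloaded functions share one shape, so they become methods of a
   -- single class and the dictionary is passed implicitly
   class Container f where
     empty  :: f a
     insert :: a -> f a -> f a

   instance Container [] where
     empty  = []
     insert = (:)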
