Hi -
One of the main problems I've found with Haskell is that, within a module, too many things are put into the same scope. For example

   data Tree a b = Leaf a | Node {elem::b, lhs::Tree a b, rhs::Tree a b}

puts Leaf, Node, elem, lhs, and rhs into the module's namespace,
as well as Tree. This means that if you want another kind of tree in
the same module, you've got to think of another word for Leaf,
or else prefix every value constructor with some abbreviation of the tycon
to prevent name collisions.
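
To make the clash concrete (the second tree type below and all of its names are invented purely for illustration):

   import Prelude hiding (elem)   -- elem already collides with Prelude.elem

   data Tree a b = Leaf a | Node{elem::b, lhs::Tree a b, rhs::Tree a b}

   -- a second kind of tree in the same module: every name has to be
   -- invented afresh (RLeaf, RNode, relem, ...) purely to avoid colliding
   -- with the names introduced by Tree
   data RoseTree a = RLeaf a | RNode{relem::a, rkids::[RoseTree a]}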

I propose that the above declaration should instead introduce a *new module* Tree, as a submodule of the containing module, and that Leaf, Node, elem, etc. be put into this submodule rather than into the module containing the data declaration itself.

Identifiers in this child module could then be used either qualified or unqualified, as long as they don't conflict with anything in the parent (containing) module. Thus, in the parent module, we could have:

   a = Tree.Leaf 3
   b = Node{elem = 6, lhs = Leaf 2, rhs = Leaf 3}
   c = Tree.elem b
   d = b^elem          -- same value as c, using the sugar described below

where ^ is just a helpful piece of sugar which binds tighter than function application and can be used anywhere to allow an "object-oriented" way of thinking, in which the first argument is put on the left of the function (more on this later).
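
As a rough approximation in today's Haskell, one can define a reverse-application operator (I'll write it ^. here to avoid clashing with the numeric ^; the same thing is nowadays available as (&) in Data.Function). An ordinary operator binds looser than function application, not tighter, so this is only an approximation of the sugar I have in mind:

   infixl 1 ^.

   -- reverse application: the "object" goes on the left of the function
   (^.) :: a -> (a -> b) -> b
   x ^. f = f x

   -- e.g.  b ^. lhs  instead of  lhs b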

Also, we could generalise GHC's GADT-style data declarations by allowing arbitrary declarations to appear in the where part, and allow a module to consist of a single data declaration (i.e. to begin with the keyword data instead of module), so that the contents of the Data.Set module would instead be something like:

   data Ord a => Set a = .... where
       insert :: Set a -> a -> Set a
       ...

This would allow Set to be imported qualified into other modules, where we could refer to the Set type by the single word Set instead of Set.Set (which I always think looks very strange indeed), i.e.:

   module M where
       import qualified Data.Set as Set

       f :: a -> Set a            -- no longer need to write a -> Set.Set a

In fact, the whole complexity associated with hiding components of a module and allowing unqualified imports could be discarded, since the correct resolution for each identifier would be determined by its typed context, just as in OOP the resolution is determined by the type of the object.

So to summarise so far: for any value f :: T0 -> T1 -> ... -> Tn, we construct the tuple (Q0, Q1, ..., Qn) where, for each i, Qi is obtained from Ti by ignoring all predicates and quantification and replacing each tyvar by the symbol '?'.

For example, for

   module M where
   foo :: forall b. Eq a => Set (Tree a) -> Tree ([a],b) -> [Tree (a,b)]

we ignore the forall and the Eq constraint, replace the tyvars by '?', and tuple the results to get:

  M.(Set (Tree ?), Tree ( (,) ( [] ?) ?), [] (Tree ( (,) ? ?))).foo

as the fully qualified reference to the entity we've just declared.
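
To make this construction concrete, here is a small sketch over a simplified representation of types (the Ty type and all the names here are my own, just for illustration; predicates and quantifiers are assumed to have been stripped already, and the function type split at the arrows):

   data Ty = TCon String [Ty] | TVar String
     deriving Show

   -- Qi: replace every tyvar by "?" but keep the constructor structure
   abstractTy :: Ty -> Ty
   abstractTy (TVar _)    = TCon "?" []
   abstractTy (TCon c ts) = TCon c (map abstractTy ts)

   -- the ty-tuple of f :: T0 -> T1 -> ... -> Tn is (Q0, Q1, ..., Qn)
   tyTuple :: [Ty] -> [Ty]
   tyTuple = map abstractTy

   -- the pieces of foo's type, writing [] and (,) as ordinary tycons
   fooTys :: [Ty]
   fooTys =
     [ TCon "Set"  [TCon "Tree" [TVar "a"]]
     , TCon "Tree" [TCon "(,)" [TCon "[]" [TVar "a"], TVar "b"]]
     , TCon "[]"   [TCon "Tree" [TCon "(,)" [TVar "a", TVar "b"]]]
     ]
   -- tyTuple fooTys  ~  (Set (Tree ?), Tree ((,) ([] ?) ?), [] (Tree ((,) ? ?)))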

If we call the tuple (Set (Tree ?), ...) the "ty-tuple", which specifies a space of values (i.e. something that each value is considered to "belong" to - the analogue of the object in OOP), then we can define projections from a space into a containing space as follows:

1) Let Qi == Ui Si0 Si1 ... Sik be any element of a ty-tuple P. Then we can form a new ty-tuple P' by replacing Qi by Ui.

2) Let Qi be any element of a ty-tuple P of arity n+1. Then we can form
P' == (Q0, ..., Qi-1, Qi+1, ..., Qn) of arity n.

In both cases, we can use P' to qualify the identifier to get a value reference, as long as it still carries enough information to distinguish this binding from other uses of the same identifier in M.

Applying these transformations to the ty-tuple of the example above,
any of the following could be used as a partial qualification of foo in M:

   (Set Tree, Tree ([],?), [Tree]).foo
   (Set, Tree, []).foo
   Tree.foo
   foo

Rule 2 allows us to propagate identifiers back up to their ancestor modules as long as there are no conflicts along the way.
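
In the same style of sketch (again, representation and names are my own), the two projection rules are just:

   data Ty = TCon String [Ty] | TVar String

   type TyTuple = [Ty]

   -- Rule 1: replace one element Qi = Ui Si0 ... Sik by its head Ui
   rule1 :: Int -> TyTuple -> TyTuple
   rule1 i p = [ if j == i then headOf q else q | (j, q) <- zip [0..] p ]
     where headOf (TCon u _) = TCon u []
           headOf v          = v

   -- Rule 2: drop the element Qi altogether
   rule2 :: Int -> TyTuple -> TyTuple
   rule2 i p = [ q | (j, q) <- zip [0..] p, j /= i ]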

The type inference algorithm could then be modified (I hope) to use the top-level annotation on a value declaration to determine which entities its identifiers refer to, by searching in the appropriate value spaces determined by the types of the arguments and the return value. For example:

   label :: Tree a -> Int -> (Tree a, Int)
   label (Leaf x) i = (Leaf i, i+1)

The identifier Leaf is resolved in the current namespace augmented by the namespace of the Tree module; thus we don't have to write Tree.Leaf explicitly, even though the declaration of label occurs outside the Tree module itself.

(I suggest that top-level type annotations should be made mandatory, since without them one just drowns in a sea of type-inference confusion whenever there is a type error. By making them mandatory, the TI algorithm could make real use of them to resolve identifier bindings, as in OOP.)
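
To sketch the namespace-selection step (again with my own made-up names and a simplified type representation): from the annotated type we take the head tycon of each argument type and of the result type, and consult the corresponding (proposed) submodules in addition to the current scope:

   data Ty = TCon String [Ty] | TVar String

   -- split a curried function type into its argument types plus result type
   argsAndResult :: Ty -> [Ty]
   argsAndResult (TCon "->" [a, r]) = a : argsAndResult r
   argsAndResult t                  = [t]

   -- extra namespaces to search; e.g. for
   --   label :: Tree a -> Int -> (Tree a, Int)
   -- this yields ["Tree", "Int", "(,)"], so Leaf resolves via Tree
   extraNamespaces :: Ty -> [String]
   extraNamespaces t = [ c | TCon c _ <- argsAndResult t ]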

Similarly:

   foo :: a -> Set a
   foo x = singleton x

There is no need to say Set.singleton x, because we know that the result of foo has type Set a; therefore we search the current module, augmented by the contents of the Set module (every type is also a module), for a binding of "singleton" that maps a to Set a.
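
For comparison, this is what one has to write today, where the qualification is needed even though the annotation already pins down which singleton is meant:

   module M where

   import qualified Data.Set as Set

   foo :: a -> Set.Set a
   foo x = Set.singleton x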

Finally (apologies for this long post), returning to the use of ^ to allow an object-oriented way of thinking, consider:

   insert :: a -> Set a -> Set a
   ps = singleton 3
   qs = insert 4 ps
   rs = ps^insert 4

When resolving "insert" in the binding for rs, the compiler should see that we are looking for some function of type Set Int -> Int -> Set Int, and hence will look in the current module augmented by the Set module. However, the Set module only has a binding for insert with type a -> Set a -> Set a, so the compiler should generate a new function insert' from insert by moving the argument of type Set a to the front.
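
In other words, the generated insert' would amount to moving the Set argument to the front (using the real Data.Set here for concreteness, hence the Ord constraint):

   import Data.Set (Set, insert)

   -- insert with the Set argument first, so that  ps^insert 4  can mean
   -- insert' ps 4
   insert' :: Ord a => Set a -> a -> Set a
   insert' s x = insert x s     -- i.e. insert' = flip insert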

Summary:

1) Every value binding belongs to a value space represented by a ty-tuple, which abstracts away from the details of the actual type.

2) The space of ty-tuples forms a lattice, and in particular this means qualification is optional when there are no conflicts (we can use Leaf 3 instead of Tree.Leaf 3), and we can use partial qualification to supply just the disambiguation info needed.

3) The TI algorithm should use the type of the expression (propagated top-down from the top-level annotation) to search in the correct value spaces to resolve identifiers.

4) Record field selection doesn't need any special scoping rules, and is just one example of a general object-oriented way of thinking about function application.

5) We can get, in functional programming, all the advantages of automatic namespace management that OOP programmers take for granted, by using value spaces as the analogue of objects, and can thereby get rid of complicated import/export directives and improve code readability with less typing (but more use of typing - excuse the pun :-)).

I'm in the process of trying to implement a pre-processor for Haskell that would use the algorithms sketched above (as well as fixing a few other things I think are wrong with Haskell, such as the way the current layout rule allows one to write code that breaks if you replace an identifier with one that has a different number of characters, etc.), so any feedback on these ideas would be welcome.

Thanks for reading so far!

Brian Hulley

PS: Everything above is purely intended for resolving the kind of ad-hoc overloading that is not amenable to treatment by type classes (which, in the usual implementation, turn a subset of the ad-hoc-overloaded functions sharing the same shape into a single function with implicit dictionary arguments).
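
For contrast, the case classes already handle well is where the overloaded functions share a common shape, e.g. (the class name here is made up):

   -- the overloaded functions share one shape, so they become methods of a
   -- single class and the dictionary is passed implicitly
   class Container f where
     empty  :: f a
     insert :: a -> f a -> f a

   instance Container [] where
     empty  = []
     insert = (:)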
