Hi all,

It looks like there has been some recent discussion regarding module and package namespaces. There is one possible design feature that I don't think has been mentioned yet and that I think would be very helpful, so I thought I should at least bring it up.
What I want is to be able to build a program's module namespace out of packages in much the same way that filesystem namespaces are built, namely with mounting operations, rather than only by "union" or "overlay" operations as in the status quo. In other words, I would like to be able to specify, along with the "-package" option, a "mount point" for that package in the module namespace. One possible syntax might be e.g. "-package my-graphics-lib -package-base Graphics.UI.MyGraphicsLib". (Also, for backward compatibility and convenience, packages should probably be able to specify a default "mount point", so that the existing compiler command-line syntax can still be used.)

The idea is that with such a feature, library packages could get rid of the common module path prefixes which currently must be specified in every module of the library (such as "Graphics.UI.MyGraphicsLib" above). These prefixes would instead be specified once by each user of the library package (unless the default was desired), perhaps after the package import option on the compiler command line. Modules would have simple unqualified names within the library, like "Button" or "Window", which, if the package mount point were specified as, say, "Graphics.UI.MyGraphicsLib" in a certain compiler invocation, would be mapped to "Graphics.UI.MyGraphicsLib.Button" and "Graphics.UI.MyGraphicsLib.Window" respectively for code compiled by that invocation. But they could just as easily be mapped to "MGL.Button" etc. in a different invocation in a different project, if a different mount point were preferred or were necessary to eliminate a namespace collision.

There would be many benefits to being able to do things this way. First, developers would be able to move shared code across libraries without having to make widespread trivial changes to reflect the new module names.
I could copy a 'Debug' or 'Util' module into my library from another library without having to go through the code to update the base location in the module hierarchy; furthermore, I could easily incorporate new upstream changes without repeating this menial fixing-up procedure each time. While it's true that new version control systems like 'darcs' are meant to handle search-and-replace style changes effectively, I think that as far as this issue goes, a VC-based solution would be less elegant and less usable than what I am proposing.

Second, this would decouple some aspects of the design process that in my opinion shouldn't be coupled. For instance, I would be able to start writing a library before deciding on a name; currently I at least have to stick in a dummy name as the module namespace base to avoid potential conflicts with other library imports while testing. Under this proposal I could instead concentrate on building the interior, bottom-up functionality first. At the end of the process, a certain set of the package modules would be marked for external visibility, would comprise the exterior interface, and would suggest a fitting package name to me. Setting this name would only involve touching the Cabal file rather than every single source file in my library. This would also make it easier to merge and split packages.

Third, it would encourage the use of lightweight modules, by reducing the maintenance overhead of each module.
Currently modules are the only way (correct me if I'm wrong) to partition the top-level namespace of a program. This is OK, except that, especially in libraries, each module carries a certain amount of administrative paperwork: it has to know the name of the library that contains it, because that name, or some form of it, has to be part of the module name; other importers of the module have to specify this information too; and, as argued above, there is a little work involved in touching up these references when code moves between libraries or when the library name changes. As a result, I think people sometimes end up putting more code in a single module when multiple modules would otherwise have been more suitable.

Fourth, I think there would be psychological benefits. I think it's a bit patronizing to the programmer that he has to pretend to remind himself "you are in the following package" at the top of each file. People can easily enough keep track of that amount of state. It's as if the building code required me to put a sign with the current city and country in each room of my house. These are bits of context that I can easily call to mind if necessary, but which I would sometimes like to temporarily forget about. I believe programming is somewhat the same. We've come a long way from languages like C, where one has to decide whether to prefix each symbol in a library with an otherwise meaningless identifier like "Py_" or risk namespace collisions with other libraries, but I don't think we're at the end of the road yet. It's true that a module belongs to whatever package it occurs in, but often the package name does nothing to suggest what the module does: maybe the module exists only because other modules in the package depend on it, or maybe the package provides an assortment of otherwise unrelated functionality in its modules.
In other words, the package name may describe something that the module functionality is only *applicable* to, or it may just be a catchy name. Being able to leave out this not-quite-relevant piece of information would make package-distributed code more conceptually streamlined and easier to quote out of context, e.g. in papers. Haskell is really a beautiful language. It is very dense, and I think one of its great advantages is that it allows the programmer to eliminate from code almost everything except what is absolutely relevant to its functionality; this proposal would take the language further in that direction.

There is a strong tradition in science of putting things into taxonomies, into static hierarchies, and people have tried to do this with collections of code libraries too, perhaps in imitation of scientists. One thing to note, however, is that things from the natural world, such as the genes of biological organisms, change a lot more slowly than man-made code does. Science is different from engineering. A related reason that language designers may be drawn to requiring users to participate in universal classifications is that doing so projects an artificial aura of stability and organization onto a body of code that is constantly evolving. But these designers, in shading themselves from progress, also stifle it. They create a central administrative hoop which couples all packages and limits the scalability of the collective development effort. What I'm proposing would be a big departure from the practice of languages like Perl and Java, which demand such a global module hierarchy.

I've been told that the Haskell community is trying to make it possible for two packages to have modules of the same name, as long as they aren't imported in the same compilation unit. My proposal would go further by (1) removing the latter restriction and (2) allowing the package code to be completely ignorant of any "mount point".
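To make the mounting idea a little more concrete, here is a minimal Haskell sketch that models it as plain data. It is a toy under stated assumptions, not a description of how any existing compiler or Cabal resolves modules: "Package", "qualify", "buildNamespace", "collisions" and the example packages are all hypothetical names. A package lists only relative module names, each compiler invocation supplies a mount point per package, and a clash arises only when two packages end up at the same visible name.

```haskell
import Data.Function (on)
import Data.List (groupBy, sortOn)

-- Hypothetical model: a package lists its modules by *relative* name
-- only ("Button", "Util"); no hierarchy prefix appears in its sources.
data Package = Package
  { pkgName    :: String
  , pkgModules :: [String]
  }

-- Attach a mount point to a relative module name. An empty mount point
-- places the module at the root of the namespace.
qualify :: String -> String -> String
qualify ""     m = m
qualify prefix m = prefix ++ "." ++ m

-- Build one compiler invocation's module namespace from
-- (mount point, package) pairs. Each entry maps a visible module name
-- to the package and relative module it refers to.
buildNamespace :: [(String, Package)] -> [(String, (String, String))]
buildNamespace mounts =
  [ (qualify prefix m, (pkgName p, m))
  | (prefix, p) <- mounts
  , m <- pkgModules p
  ]

-- Visible names claimed by more than one package module; in this model
-- these are the only cases where the user must choose another mount point.
collisions :: [(String, (String, String))] -> [String]
collisions ns =
  [ name | ((name, _) : _ : _) <- groupBy ((==) `on` fst) (sortOn fst ns) ]

-- Hypothetical example packages.
myGraphicsLib, libA, libB :: Package
myGraphicsLib = Package "my-graphics-lib" ["Button", "Window"]
libA          = Package "lib-a" ["Util", "Debug", "Parser"]
libB          = Package "lib-b" ["Util", "Debug", "Render"]
```

For instance, mounting myGraphicsLib at "Graphics.UI.MyGraphicsLib" makes its "Button" visible as "Graphics.UI.MyGraphicsLib.Button", while mounting the same unchanged package at "MGL" gives "MGL.Button". Mounting libA and libB both at the root reports ["Debug","Util"] as collisions; mounting them at "A" and "B" reports none, so both 'Util' modules can be used in one compilation unit.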
By the way, if you look at some aspects of operating system interfaces, I think you'll see that the choice I'm suggesting has often already been made. For instance, you don't have to specify your current working directory with every command you execute, and furthermore the same command can easily be used in different working directories without modification; you can install binaries at different locations in the filesystem; you can mount filesystems at different mount points, etc.

One further thing: there have been proposals to simplify importing collections of modules from a given point in the namespace. I hope it is clear that they are independent of my proposal. They would not be very useful for implementing my proposal (at least, I think any such solution would be far from optimal), and vice versa. Modules and packages are quite distinct constructs: modules are needed for namespace partitioning, and packages are needed to delineate administrative boundaries and sources of change. Both are necessary, and both deserve special consideration in the ongoing design of Haskell.

I will not be surprised if it seems strange to people that I attach such importance to what is likely seen as an unimportant detail of the language, but I do, and I hope that people will consider my suggestion. Also, I haven't said anything about implementation. I realize that this would probably require some modification to the linker. I hope I'm correct in assuming that the modifications will be relatively easy to make, provided, of course, that this feature turns out to be something people really want.

Frederik

P.S. Thanks to John Meacham for a useful discussion.

--
http://ofb.net/~frederik/

_______________________________________________
Haskell mailing list
Haskell@haskell.org
http://www.haskell.org/mailman/listinfo/haskell