Re: [Haskell-cafe] Correspondence between libraries and modules
On Wed, May 23, 2012 at 9:44 PM, Gregg Lebovitz glebov...@gmail.com wrote:

Rustom, I am drafting a document that captures some of the social norms from the comments on this list, mostly from Brandon and Wren. I have captured the discussion about module namespace and am sorting out the comments on the relationship between libraries and packages. My initial question to the list was to try to identify where Haskell is different from other open source distributions. From what I can tell, the issues are very similar. The module name space seems to have characteristics very similar to the include file hierarchy of linux distributions. If you have some spare cycles and would like to contribute, I think everyone would appreciate your help and effort

Gregg

Hi Gregg.

One of the common complaints one gets from a first-year programming student (and it's now about 3 decades that I have been dealing with these!) is: The compiler/interpreter etc HAS a BUG!!!

So... While I am an old geezer with programming and functional programming -- doing, teaching, playing, implementing, or just plain thinking -- I am too much of a noob to ghc to risk falling into the 1st year student trap above.

Yes perhaps not a typical noob... Some things are easier for me than for the typical noob -- all the 'classical' good-stuff like pattern-matching, lambda-calculus, type inference, polymorphism etc. And this is helpful to understand the 'modern good stuff' starting with monads and onwards.

But then I get hit -- finding my way round hackage, installing with cabal etc -- even tho I'm an ol-time unix hacker and sysadmin-er. So I guess it's best to assume (as of now) that I don't know the ropes rather than that something is wrong/broken with them.

O well... If the noob trap is one error, playing it safe is probably another, so here goes with me saying things that I (probably) know nothing about:

1. cabal was a beautiful system 10 years ago. Now it's being forcibly scaled up 2 (3?) orders of magnitude and is creaking at the seams
2. There are too many conflicting suggestions out there on the web for a noob
   - use system install (eg apt-get) or use cabal
   - cabal in user area or system area etc
   - the problem is exponentiated by the absence of cabal uninstall
Re: [Haskell-cafe] Correspondence between libraries and modules
Rustom:

O well... If the noob trap is one error playing it safe is probably another so here goes with me saying things that I (probably) know nothing about: 1. cabal was a beautiful system 10 years ago. Now its being forcibly scaled up 2 (3?) orders of magnitude and is creaking at the seams

The problem is, Cabal is not a package management system. The name gives it away: it is the Common Architecture for *Building* Applications and Libraries. Cabal is to Haskell what GNU autotools + make is to C: a thin wrapper that checks for dependencies and invokes the compiler. All that boring not-making-your-package-break-everything-else stuff belongs to the distribution maintainer, not Hackage and Cabal.

2. There's too much conflicting suggestions out there on the web for a noob - use system install (eg apt-get) or use cabal

Use apt-get. Your distribution packages are usually new enough, have been tested thoroughly, and most importantly, do not conflict with each other.

- cabal in user area or system area etc

Installing with --user is usually best, since user-installed packages won't clobber system packages and if^H^Hwhen they do go wrong, you can simply rm -r ~/.ghc. For actual coding, it's better to use a sandboxing tool such as [cabal-dev][] instead.

[cabal-dev]: http://hackage.haskell.org/package/cabal-dev

- the problem is exponentiated by the absence of cabal uninstall

See above. By the way, someone else wrote a whole article about it: https://ivanmiljenovic.wordpress.com/2010/03/15/repeat-after-me-cabal-is-not-a-package-manager/

Hope that clears it up for you.

Chris
Re: [Haskell-cafe] Correspondence between libraries and modules
On Tue, Apr 24, 2012 at 7:29 PM, Gregg Lebovitz glebov...@gmail.com wrote: On 4/23/2012 10:17 PM, Brandon Allbery wrote: On Mon, Apr 23, 2012 at 17:16, Gregg Lebovitz glebov...@gmail.comwrote: On 4/23/2012 3:39 PM, Brandon Allbery wrote: The other dirty little secret that is carefully being avoided here is the battle between the folks for whom Haskell is a language research platform and those who use it to get work done. It's not entirely inaccurate to say the former group would regard a fragmented module namespace as a good thing, specifically because it discourages people from considering it to be stable Brandon, I find that a little hard to believe. If the issues are similar to other systems and languages, then I think it is more likely that no one has volunteered to work on it. You volunteering to help? Does haskell/hackage have something like debian's lintian? Debian has a detailed policy document that keeps evolving: http://www.debian.org/doc/debian-policy/ Lintian tries hard to automate (as much as possible) policy-compliance http://lintian.debian.org/manual/index.html Eg how packages should use the file system http://tldp.org/LDP/Linux-Filesystem-Hierarchy/html/ Even 'boring' legal stuff like license-checking is somewhat automated http://dep.debian.net/deps/dep5/ And most important is the dos and donts for package dependency making possible nice pics http://collab-maint.alioth.debian.org/debtree/ Of course as Wren pointed out, the Linux communities have enough manpower to police their distributions which haskell perhaps cannot. My question is really: Would not something like a haskell-lintian make such sanity checking easier and more useful for everyone? ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
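As a rough illustration of the kind of check a hypothetical "haskell-lintian" could automate, here is a minimal Haskell sketch that reports module names exposed by more than one package. The function name moduleClashes, the type synonyms, and the sample package names are all made up for illustration; a real tool would presumably read this information from package descriptions rather than from a hard-coded list.

```haskell
import Data.List (sort)
import qualified Data.Map.Strict as Map

-- Hypothetical type synonyms, for readability only.
type PackageName = String
type ModuleName  = String

-- | Map every module name that is exposed by more than one package
-- to the (sorted) list of packages exposing it.
moduleClashes :: [(PackageName, [ModuleName])] -> Map.Map ModuleName [PackageName]
moduleClashes pkgs =
  Map.map sort
    . Map.filter ((> 1) . length)
    $ Map.fromListWith (++) [ (m, [p]) | (p, ms) <- pkgs, m <- ms ]

-- Made-up sample data: two packages both claim Data.Foo.
samplePackages :: [(PackageName, [ModuleName])]
samplePackages =
  [ ("pkg-a", ["Data.Foo", "Data.Bar"])
  , ("pkg-b", ["Data.Foo", "Control.Baz"])
  ]

main :: IO ()
main = mapM_ print (Map.toList (moduleClashes samplePackages))
```

With the sample data, the only clash reported is Data.Foo, claimed by both pkg-a and pkg-b. A check like this only detects collisions; deciding who should give way remains the social question discussed in the thread.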
Re: [Haskell-cafe] Correspondence between libraries and modules
Rustom,

I am drafting a document that captures some of the social norms from the comments on this list, mostly from Brandon and Wren. I have captured the discussion about module namespace and am sorting out the comments on the relationship between libraries and packages.

My initial question to the list was to try to identify where Haskell is different from other open source distributions. From what I can tell, the issues are very similar. The module name space seems to have characteristics very similar to the include file hierarchy of Linux distributions.

If you have some spare cycles and would like to contribute, I think everyone would appreciate your help and effort.

Gregg
Re: [Haskell-cafe] Correspondence between libraries and modules
Alvaro Gutierrez wrote:

I've only dabbled in Haskell, so please excuse my ignorance: why isn't there a 1-to-1 mapping between libraries and modules? As I understand it, a library can provide any number of unrelated modules, and conversely, a single module could be provided by more than one library. I can see how this affords library authors more flexibility, but at a cost: there is no longer a single, unified view of the library universe. (The alternative would be for every module to be its own, hermetic library.) So I'm very interested in the rationale behind that aspect of the library system.

I am probably repeating arguments brought forward by others, but I really like that the Haskell module name space is ordered along functionality rather than authorship. If I ever manage to complete an implementation of the EPICS pvData project in Haskell, it will certainly inherit the Java module naming convention and thus will contain modules named Org.Epics.PvData.XXX, *but* if I need to add utility functions to the API that are generic list processing functions they will certainly live in the Data.List.* name space, and if I need to add type level stuff (which is likely) it will be published under Data.Type.* etc.

This strikes me as promoting re-use: it makes it far easier and more likely to factor out these things into a separate general purpose library or maybe even integrate them into a widely known standard library. It also gives you a much better idea of what the thing you export is doing than if it is from, say, Org.Epics.PvData.Util. Finally, it gives the package author an incentive to actually do the refactoring that makes it obvious where the function belongs, functionally.

Cheers
Ben
Re: [Haskell-cafe] Correspondence between libraries and modules
On 4/24/2012 11:44 PM, wren ng thornton wrote:

To pick another similar namespacing issue, consider the problem of Google Code. In Google Code there's a single namespace for projects, and the Google team spends a lot of effort on maintaining that namespace and resolving conflicts. (I know folks who've worked in the lab next door to that team. So, yes, they do spend a lot of work on it.) Whereas if you consider BitBucket or GitHub, each user is given a separate project namespace, and therefore the only thing that has to be maintained is the user namespace--- which has to be done anyways in order to deal with logins. The model of Google Code, SourceForge, and Java all assume that projects and repositories are scarce resources. Back in the day that may have been true (or may not), but today it is clearly false. Repos are cheap and everyone has a dozen side projects.

Actually, I like the idea of combining an assigned user name with the repo name as the namespace. We already have login names for haskell.org, why not use those? I agree that it is not an end-all, but it would be a start. My top level namespace would be Org.Haskell.Glebovitz. It is democratic and it identifies the code by the repo and the user that created it. If someone else decided to use their github id, then their modules would be org.github.username or org.github.project. Of course people can choose to ignore the namespace common practice, but they can do that anyway.

If you look at the case of Perl and CPAN, there's the same old story: universal authority. Contrary to Java, CPAN does very much actively police (or rather, vet) the namespace. However, this extreme level of policing requires a great deal of work and serves to drive away a great many developers from publishing their code on CPAN. I'm not as familiar with the innards of how various Linux distros manage things, but they're also tasked with the additional burden of needing to pull in stuff from places like CPAN, Hackage, etc. Because of that, their namespace situation seems quite different from that of Hackage or CPAN on their own. I do know that Debian at least (and presumably the others as well) devote a great deal of manpower to all this.

Yes, but that goes back to my comments about upstream and downstream. Hackage can try to solve the problem for itself, but eventually someone is going to put together a distribution, whether it be ubuntu, or Microsoft, and they will have to sort out the name collisions for their packages and modules. If we have a good naming scheme to start with, it will make the downstream problem a bit easier. Even so, they will probably change it anyways. I know that ubuntu and fedora take different approaches to packaging. When I try to use a package like Qt on these different platforms, I have to figure out which package contains which library.

So we have (1) the Java model where there are rules that no one follows; (2) the Google Code, CPAN, and Linux distro model of devoting a great deal of community resources to maintaining the rules; and (3) the BitBucket, GitHub, Hackage model of having few institutionalized rules and leaving it to social factors. The first option buys us nothing over the last, excepting a false sense of security and the ability to alienate private open-source developers.

I think my combo of formalized namespace and social rules would work best here. The problem is that we do have module collisions because the namespace is too simple. Right now it is not an issue because the community is not huge. Eventually it will be a problem if Haskell's popularity grows.

There is no technical solution to this problem, at least not any used by the communities you cite. The only solutions on offer require a great deal of human effort, which is always a social/political/economic matter. The only technical avenues I see are ways of making the problem less problematic, such as GitHub and BitBucket distinguishing the user namespace from each user's project namespace, such as the -XPackageImports extension (which is essentially the same as GitHub/BitBucket), or such as various ideas about using tree-grafting to rearrange the module namespace on a per-project basis thereby allowing clients to resolve the conflicts rather than requiring a global solution. I'm quite interested in that last one, though I don't have any time for it in the foreseeable future.

There probably is a technical solution, but no one is going to discover it and build it anytime soon. I think we all agree that a centralized global solution is out. No one would want to manage it. I do think the repo.username namespace has potential. The problem is that informal social convention works if the community is small. Once it starts to grow it has to be codified to some degree.
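For readers who have not met the -XPackageImports extension mentioned above, the following is a minimal sketch of what it looks like in use. It assumes GHC with the containers package available (it ships with GHC); the phoneBook data is made up for illustration.

```haskell
{-# LANGUAGE PackageImports #-}
module Main (main) where

-- The string literal names the package the module must come from.
-- If two installed packages both exposed a module called Data.Map,
-- this would pick the one from "containers" explicitly.
import qualified "containers" Data.Map as Map

-- Made-up sample data.
phoneBook :: Map.Map String Int
phoneBook = Map.fromList [("alice", 1234), ("bob", 5678)]

main :: IO ()
main = print (Map.lookup "alice" phoneBook)
```

The extension lets the client of two clashing packages disambiguate locally, which is why it makes the problem less painful without resolving who "owns" a module name.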
Re: [Haskell-cafe] Correspondence between libraries and modules
On 4/24/2012 11:49 PM, wren ng thornton wrote:

On 4/24/12 9:59 AM, Gregg Lebovitz wrote:

The question of how to support rapid innovation and stable deployment is not an us versus them problem. It is one of staging releases. The Linux kernel is a really good example. The Linux development team innovates faster than the community can absorb it. The same was true of the GNU team. Distributions addressed the gap by staging releases.

In that case, what you are interested in is not Hackage (the too-fast torrent of development) but rather the Haskell Platform (a policed set of stable/core libraries with staged releases).

No, that was not what I was thinking, because a stable policed set of core libraries is at the opposite end of the spectrum from how you describe Hackage. What I am suggesting is a way of creating an upstream that feeds increasingly stable code into an ever increasing set of stable and useful components. Using the current open system model, the core compiler team for gcc releases the compiler and a set of libstdc and libstdc++ libraries. The GNU folks release more useful libraries, and then projects like GNOME build on the other components.

Right now we have Hackage, which moves too fast, and the Haskell core, which rightfully moves more slowly. Maybe the answer is to add a rating system to Hackage and mark packages as experimental, unsupported, and supported, or use a 5 star rating system like the app store. Later on, when we have appropriate testing tools, we can include a rating from the automated tests.

I forget who the best person to contact is these days if you want to get involved with helping the HP, but I'm sure someone on the list will say shortly :)
Re: [Haskell-cafe] Correspondence between libraries and modules
On 4/23/2012 10:17 PM, Brandon Allbery wrote:

On Mon, Apr 23, 2012 at 17:16, Gregg Lebovitz glebov...@gmail.com wrote:

On 4/23/2012 3:39 PM, Brandon Allbery wrote:

The other dirty little secret that is carefully being avoided here is the battle between the folks for whom Haskell is a language research platform and those who use it to get work done. It's not entirely inaccurate to say the former group would regard a fragmented module namespace as a good thing, specifically because it discourages people from considering it to be stable

Brandon, I find that a little hard to believe. If the issues are similar to other systems and languages, then I think it is more likely that no one has volunteered to work on it.

You volunteering to help?

Yes, you do find it hard to believe; so hard that you went straight past it and tried to point to the "easy" technical solution to the problem you decided to see in place of the real one, which doesn't have a technical solution.

Brandon, I am very glad to make your acquaintance. I think you have given these issues much thought. That is good.

No, I don't think I "went straight past it". I think we are trying to address the same issue, but from different directions. If you take the time to look at my history, you'll find that I spent my career bridging the very gap you make so very salient.

Here's where we differ: you see an untenable political issue, and I see a technical one. The question of how to support rapid innovation and stable deployment is not an us versus them problem. It is one of staging releases. The Linux kernel is a really good example. The Linux development team innovates faster than the community can absorb it. The same was true of the GNU team. Distributions addressed the gap by staging releases.

I fought this very battle in the 1980s with the Andrew system. The technology coming out of the ITC (research community) was evolving faster than users could absorb. Researchers want to innovate and push the limits, and users want stability.

I've spoken with many in the Haskell research community, and I never heard anyone say "no, we want to obfuscate Haskell so that we never have to make it stable." I think both communities want success. The question is how to build a system that will address both.

From your history, I see you are knowledgeable and well known on the deployment side of technology. You also understand what Haskell needs to move forward. So I ask you again, are you volunteering to help?
Re: [Haskell-cafe] Correspondence between libraries and modules
On 4/23/12 11:39 AM, Gregg Lebovitz wrote: On 04/23/2012 12:03 AM, wren ng thornton wrote: However, until better technical support is implemented (not just for GHC, but also jhc, UHC,...) it's best to follow social practice. Wren, I am new to Haskell and not aware of all of the conventions. Is there a place where I can find information on these social practices? Are they documented some place? Not that I know of, though they're fairly standard for any open-source programming community. E.g., when it comes to module names: familiarize yourself with what's out there; try to fit in with the patterns you see[1]; don't intentionally clash, steal namespaces[2], or squat on valuable territory[3]; be reasonable and conscientious when interacting with people. [1] e.g., the use of Data.* for data structures which are predominantly/universally treated as such, vs the use of Control.* for things which are often thought of as control structures (monads, etc). The use of Foo.Bar.Strict and Foo.Bar.Lazy when you provide both strict and lazy versions of some whole API, usually with Foo.Bar re-exporting whichever one seems the sensible default. The use of Foo.Bar.Class to resolve circular import issues when defining a class and a bunch of datatypes with instances. Etc. [2] I mean things like if some package is providing a bunch of Foo.Bar.* modules, and it's the only one doing so, then you should try to get in touch with the maintainer before you start publishing your own Foo.Bar.* modules--- in order to collaborate, to send patches up-stream, or just to let them know what's going on. [3] Witness an unintentional breach of this myself a while back. When I was hacking up the exact-combinatorics package for my own use, I put things in Math.Combinatorics.* since that's a reasonable place and wasn't in use; but I didn't think of that fact when I decided to publish the code. 
When pointed out, I promptly moved everything to Math.Combinatorics.Exact.* since that project is only interested in exact combinatorics and I have no intention of codifying all of combinatoric theory; hence using Math.Combinatorics.* would be squatting on very valuable names.

However, centralization is prone to bottlenecks and systemic failure. As such, while it would be nice to ensure that a given module is provided by only one package, there is no mechanism in place to enforce this (except at compile time for the code that links the conflicting modules together).

From someone new to the community, it seems that yes centralization has its issues, but it also seems that practices could be put in place that minimize the bottlenecks and systemic failures. Unless I greatly misunderstand the challenges, there seem to be a lot of ways to approach this problem and none of them are new. We all use systems that are composed of many modules neatly combined into complete systems. Linux distributions do this well. So does Java. Maybe we should borrow from their experiences and think about how we put packages together and what mechanisms we need to resolve inter-package dependencies.

Java attempts to resolve the issue by imposing universal authority (use reverse urls for the first part of your package name). Many Java developers flagrantly ignore that claim to authority. Sun/Oracle has no interest in actually policing these violations, and there's no central repository for leveraging social pressure to do it. Moreover, open-source developers who do not have a commercial/institutional affiliation are specifically placed in a tough spot, and are elided from public discourse because of that fact, which is extremely problematic on too many levels to get into here. Furthermore, many developers ---especially among open-source and academic authors--- have an inherent distrust for ambient authority like this.

To pick another similar namespacing issue, consider the problem of Google Code.
In Google Code there's a single namespace for projects, and the Google team spends a lot of effort on maintaining that namespace and resolving conflicts. (I know folks who've worked in the lab next door to that team. So, yes, they do spend a lot of work on it.) Whereas if you consider BitBucket or GitHub, each user is given a separate project namespace, and therefore the only thing that has to be maintained is the user namespace--- which has to be done anyways in order to deal with logins. The model of Google Code, SourceForge, and Java all assume that projects and repositories are scarce resources. Back in the day that may have been true (or may not), but today it is clearly false. Repos are cheap and everyone has a dozen side projects. If you look at the case of Perl and CPAN, there's the same old story: universal authority. Contrary to Java, CPAN does very much actively police (or rather, vett) the namespace. However, this extreme level of policing requires a great deal of work and serves to drive away a great many
Re: [Haskell-cafe] Correspondence between libraries and modules
On 4/24/12 9:59 AM, Gregg Lebovitz wrote: The question of how to support rapid innovation and stable deployment is not an us versus them problem. It is one of staging releases. The Linux kernel is a really good example. The Linux development team innovates faster than the community can absorb it. The same was true of the GNU team. Distributions addressed the gap by staging releases. In that case, what you are interested in is not Hackage (the too-fast torrent of development) but rather the Haskell Platform (a policed set of stable/core libraries with staged releases). I forget who the best person to contact is these days if you want to get involved with helping the HP, but I'm sure someone on the list will say shortly :) -- Live well, ~wren ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Correspondence between libraries and modules
On 4/23/12 3:06 PM, Alvaro Gutierrez wrote: I see. The first thing that comes to mind is the notion of module granularity, which of course is subjective, so whether a single module or multiple ones should handle e.g. doubles and integrals is a good question; are there guidelines as to how those choices are made? I'm not sure if there are any guidelines per se; that's more of a general software engineering problem. If you browse around on Hackage you'll get a fairly good idea what the norms are though. Everyone seems to have settled on a common range of scope--- with notable exceptions like the containers library with far too many functions per module, and some of Ed Kmett's work on category theory which tends towards very few declarations per module. At any rate, why do these modules, with sufficiently-different functionality, live in the same library -- is it that they share some common bits of implementation, or to ease the management of source code? I contacted Don Stewart (the former maintainer) to see whether he thought I should release the integral stuff on its own, or integrate it into bytestring-lexing. We agreed that it made more sense to try to build up a core library for lexing various common data types, rather than having a bunch of little libraries. He'd just never had time to get around to developing bytestring-lexing further; so I took over. Eventually I plan to add rendering functions for floating point, and to split up the parsers for different floating point formats[1], so that it more closely resembles the integral stuff. But that won't be until this fall or later, unless someone requests it sooner. [1] Having an omni-parser can be helpful when you want to be liberal about your input. But when you're writing parsers for a specified format, usually they're not that liberal so we need to offer restricted lexers in order to give code reuse. 
When dealing with FFI code, because of the impedance mismatch between Haskell and imperative languages like C, it's clear that there's going to be some massaging of the API beyond simply declaring FFI calls. As such, clearly we'd like to have separate modules for doing the low-level binding vs presenting a high-level API. Moreover, depending on what you're interfacing with, you may be forced to have multiple low-level modules. Ah, that's a good use case. Is the lower-level module usually made public as well, or is it only an implementation detail? Depends on the project. For ByteStrings, most of that is hidden away as implementation details. For binding to C libraries, I think the current advice is to offer the low-level interface so that if there's something the high-level interface can't handle well, people have some easy recourse. On the other hand, the main purpose of packages or libraries is as unit of distribution, code reuse, and separate compilation. Even with the Haskell culture of making small libraries, most worthwhile units of distribution/reuse/compilation tend to be larger than a single namespace/concern. Thus, it makes sense to have more than one module per package, because otherwise we'd need some higher level mechanism in order to manage the collections of package-modules which should be considered a single unit (i.e., clients will almost always want the whole bunch of them). This is the part that I'm trying to get a better sense of. I can see how in some cases, it makes sense for more than one module to form a unit, because they are tightly coupled semantically or implementation-wise -- so clients will indeed want the whole bunch. On the other hand, several libraries provide modules that are all over the place, in a way that doesn't form a unit of any kind (e.g. MissingH), and it's not clear that you would want any Network stuff when all you need is String utilities. Yeah, MissingH and similar libraries are just grab-bags full of stuff. 
Usually grab-bag libraries think of themselves as place-holders, with the intention of breaking things out once there's something of a large enough size to warrant being its own package. (Whether the breaking out actually happens is another matter.) But to get the general sense of things, you should ignore them. Instead, consider one of the parsing libraries like uu-parsinglib, attoparsec, parsec, frisby. There are lots of pieces to a parsing framework, but it makes sense to distribute them together. Or, consider one of the base libraries for iteratees, enumerators, pipes, conduits, etc. Like parsing, these offer a whole framework. You won't usually need 100% of it, but everyone needs a different 80%. Or to mention some more of my own packages, consider stm-chans, unification-fd, or unix-bytestrings. In unification-fd, the stuff outside of Control.Unification.* could be moved elsewhere, but the stuff within there makes sense to be split up yet distributed together. For stm-chans because of the similarity in interfaces, use cases, etc,
Re: [Haskell-cafe] Correspondence between libraries and modules
On Wed, 25 Apr 2012 05:44:28 +0200, wren ng thornton w...@freegeek.org wrote: On 4/23/12 11:39 AM, Gregg Lebovitz wrote: On 04/23/2012 12:03 AM, wren ng thornton wrote: However, until better technical support is implemented (not just for GHC, but also jhc, UHC,...) it's best to follow social practice. Wren, I am new to Haskell and not aware of all of the conventions. Is there a place where I can find information on these social practices? Are they documented some place? Not that I know of, though they're fairly standard for any open-source programming community. E.g., when it comes to module names: familiarize yourself with what's out there; try to fit in with the patterns you see[1]; don't intentionally clash, steal namespaces[2], or squat on valuable territory[3]; be reasonable and conscientious when interacting with people. The following page gives you some idea of the module names: http://www.haskell.org/haskellwiki/Hierarchical_module_names An overview of pages about programming style: http://www.haskell.org/haskellwiki/Category:Style Regards, Henk-Jan van Tuyl -- http://Van.Tuyl.eu/ http://members.chello.nl/hjgtuyl/tourdemonad.html Haskell programming -- ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
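One of the naming patterns mentioned earlier in this thread is offering Foo.Bar.Strict and Foo.Bar.Lazy for two versions of the same API, with Foo.Bar re-exporting a sensible default. The containers package follows this pattern with Data.Map.Strict and Data.Map.Lazy (Data.Map re-exports the lazy API). The sketch below, which assumes GHC with containers available, shows the same function name behaving differently in the two modules; the names lazySize and strictSize are made up for illustration.

```haskell
import Control.Exception (SomeException, evaluate, try)
import qualified Data.Map.Lazy as ML
import qualified Data.Map.Strict as MS

-- Lazy API: the stored value is never forced, so the undefined is harmless.
lazySize :: Int
lazySize = ML.size (ML.insert "k" (undefined :: Int) ML.empty)

-- Strict API: insert evaluates the value to weak head normal form,
-- so building the map forces the undefined and raises an exception.
strictSize :: Int
strictSize = MS.size (MS.insert "k" (undefined :: Int) MS.empty)

main :: IO ()
main = do
  print lazySize
  r <- try (evaluate strictSize) :: IO (Either SomeException Int)
  putStrLn (either (const "strict insert forced the value") show r)
```

Because both modules export the same names, switching between them is a one-line change to the import, which is the practical payoff of the convention.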
Re: [Haskell-cafe] Correspondence between libraries and modules
On 04/23/2012 12:03 AM, wren ng thornton wrote: However, until better technical support is implemented (not just for GHC, but also jhc, UHC,...) it's best to follow social practice. Wren, I am new to Haskell and not aware of all of the conventions. Is there a place where I can find information on these social practices? Are they documented some place? However, centralization is prone to bottlenecks and systemic failure. As such, while it would be nice to ensure that a given module is provided by only one package, there is no mechanism in place to enforce this (except at compile time for the code that links the conflicting modules together). To someone new to the community, it seems that, yes, centralization has its issues, but it also seems that practices could be put in place that minimize the bottlenecks and systemic failures. Unless I greatly misunderstand the challenges, there seem to be a lot of ways to approach this problem and none of them are new. We all use systems that are composed of many modules neatly combined into complete systems. Linux distributions do this well. So does Java. Maybe we should borrow from their experiences and think about how we put packages together and what mechanisms we need to resolve inter-package dependencies. Am I missing something that makes this problem harder than other systems and languages? Is anyone currently working on the packaging and distribution issues? If not, does anyone else want to work on it?
Re: [Haskell-cafe] Correspondence between libraries and modules
Thanks for the write-up -- it's been very helpful! On Mon, Apr 23, 2012 at 12:03 AM, wren ng thornton w...@freegeek.org wrote: Consider one of my own libraries (chosen randomly via Safari's url autocompletion): http://hackage.haskell.org/package/bytestring-lexing When I inherited this package there were the Data.ByteString.Lex.Double and Data.ByteString.Lex.Lazy.Double modules, which were separated because they provide the same API but for strict vs lazy ByteStrings. Both of those modules are concerned with lexing floating point numbers. I inherited the package because I wanted to publicize some code I had for lexing integers in various formats. Since that's quite a different task than lexing floating point numbers, I put it in its own module: Data.ByteString.Lex.Integral. I see. The first thing that comes to mind is the notion of module granularity, which of course is subjective, so whether a single module or multiple ones should handle e.g. doubles and integrals is a good question; are there guidelines as to how those choices are made? At any rate, why do these modules, with sufficiently-different functionality, live in the same library -- is it that they share some common bits of implementation, or to ease the management of source code? When dealing with FFI code, because of the impedance mismatch between Haskell and imperative languages like C, it's clear that there's going to be some massaging of the API beyond simply declaring FFI calls. As such, clearly we'd like to have separate modules for doing the low-level binding vs presenting a high-level API. Moreover, depending on what you're interfacing with, you may be forced to have multiple low-level modules. Ah, that's a good use case. Is the lower-level module usually made public as well, or is it only an implementation detail? On the other hand, the main purpose of packages or libraries is as unit of distribution, code reuse, and separate compilation. 
Even with the Haskell culture of making small libraries, most worthwhile units of distribution/reuse/compilation tend to be larger than a single namespace/concern. Thus, it makes sense to have more than one module per package, because otherwise we'd need some higher level mechanism in order to manage the collections of package-modules which should be considered a single unit (i.e., clients will almost always want the whole bunch of them). This is the part that I'm trying to get a better sense of. I can see how in some cases, it makes sense for more than one module to form a unit, because they are tightly coupled semantically or implementation-wise -- so clients will indeed want the whole bunch. On the other hand, several libraries provide modules that are all over the place, in a way that doesn't form a unit of any kind (e.g. MissingH), and it's not clear that you would want any Network stuff when all you need is String utilities. However, centralization is prone to bottlenecks and systemic failure. As such, while it would be nice to ensure that a given module is provided by only one package, there is no mechanism in place to enforce this (except at compile time for the code that links the conflicting modules together). With few exceptions, it's considered bad form to knowingly use the same module name as is being used by another package. In part, it's bad form because egos are involved; but it's also bad form because there's poor technical support for resolving namespace collisions for module names. In GHC you can use -XPackageImports, which is workable but conflates issues of code with issues of provenance, which the Haskell Report intentionally keeps separate. However, until better technical support is implemented (not just for GHC, but also jhc, UHC,...) it's best to follow social practice. But the way you describe it, it seems that despite centralization having those disadvantages, it is more or less the way the system works, socially (egos, bad form, etc.) 
and technically (because of the lack of compiler support) -- except that it is ad-hoc instead of mechanically enforced. In other words, I don't see what the advantages of allowing ambiguity currently are. Some people figured to solve the new issue by implementing it both ways in separate packages, but reusing the same module names. (Witness for example mtl-2 aka monads-fd, vs monads-tf.) In practice, that didn't work out so well. Part of the reason for failure is that although fundeps and TF/ATs are formally equivalent in theory, in practice the implementation of TF/ATs has(had?) been missing some necessary machinery, and consequentially the TF/AT versions were not as powerful as the original fundep versions. Though the butterfly dependency issues certainly didn't help. Ah, interesting. So, perhaps I misunderstand, but this seems like an argument in favor of having uniquely-named modules (e.g. Foo.FD and Foo.TF) instead of
Re: [Haskell-cafe] Correspondence between libraries and modules
On Mon, Apr 23, 2012 at 11:39, Gregg Lebovitz glebov...@gmail.com wrote: Am I missing something that makes this problem harder than other systems and languages? Is anyone currently working on the packaging and distribution issues? If not, does anyone else want to work on it? The other dirty little secret that is carefully being avoided here is the battle between the folks for whom Haskell is a language research platform and those who use it to get work done. It's not entirely inaccurate to say the former group would regard a fragmented module namespace as a good thing, specifically because it discourages people from considering it to be stable -- brandon s allbery allber...@gmail.com wandering unix systems administrator (available) (412) 475-9364 vm/sms
Re: [Haskell-cafe] Correspondence between libraries and modules
On 4/23/2012 3:39 PM, Brandon Allbery wrote: On Mon, Apr 23, 2012 at 11:39, Gregg Lebovitz glebov...@gmail.com wrote: Am I missing something that makes this problem harder than other systems and languages? Is anyone currently working on the packaging and distribution issues? If not, does anyone else want to work on it? The other dirty little secret that is carefully being avoided here is the battle between the folks for whom Haskell is a language research platform and those who use it to get work done. It's not entirely inaccurate to say the former group would regard a fragmented module namespace as a good thing, specifically because it discourages people from considering it to be stable Brandon, I find that a little hard to believe. If the issues are similar to other systems and languages, then I think it is more likely that no one has volunteered to work on it. You volunteering to help? -- brandon s allbery allber...@gmail.com wandering unix systems administrator (available) (412) 475-9364 vm/sms
Re: [Haskell-cafe] Correspondence between libraries and modules
On Mon, Apr 23, 2012 at 17:16, Gregg Lebovitz glebov...@gmail.com wrote: On 4/23/2012 3:39 PM, Brandon Allbery wrote: The other dirty little secret that is carefully being avoided here is the battle between the folks for whom Haskell is a language research platform and those who use it to get work done. It's not entirely inaccurate to say the former group would regard a fragmented module namespace as a good thing, specifically because it discourages people from considering it to be stable Brandon, I find that a little hard to believe. If the issues are similar to other systems and languages, then I think it is more likely that no one has volunteered to work on it. You volunteering to help? Yes, you do find it hard to believe; so hard that you went straight past it and tried to point to the easy technical solution to the problem you decided to see in place of the real one, which doesn't have a technical solution. -- brandon s allbery allber...@gmail.com wandering unix systems administrator (available) (412) 475-9364 vm/sms
Re: [Haskell-cafe] Correspondence between libraries and modules
On Sun, Apr 22, 2012 at 13:15, Alvaro Gutierrez radi...@google.com wrote: As I understand it, a library can provide any number of unrelated modules, and conversely, a single module could be provided by more than one library. I can see how this affords library authors more flexibility, but at a cost: there is no longer a single, unified view of the library universe. (The alternative would be for every module to be its own, hermetic library.) So I'm very interested in the rationale behind that aspect of the library system. One reason: modules serve multiple purposes; one of these is namespacing, and in the case of interfaces to foreign libraries that may force a division that would otherwise not exist. More generally, making libraries and modules one-to-one means that either modules exist solely for libraries, or libraries must be artificially split. Perhaps this indicates that modules have too many other functions, but in that case you should propose an alternative system to replace them. As to multiple libraries providing the same module: the Haskell ecosystem is still evolving and it's not always appropriate to give a particular implementation sole ownership of a general module name. Type families vs. functional dependencies are an example of this (theoretically type families were considered superior but to date they haven't lived up to it and recently some cases were shown that fundeps can solve but type families can't; parallel monad libraries based on both still exist). New container implementations have existed as standalone packages, some of which later merge with standard packages while others are discarded. Your proposal to reject this reflects a static library ecosystem that does not exist. (It could be enforced dictatorially, but there is no Guido van Rossum of Haskell and a mistake in an evolving system is difficult to fix after the fact even with a dictator; we're already living with some difficult to fix issues not related to modules.) 
-- brandon s allbery allber...@gmail.com wandering unix systems administrator (available) (412) 475-9364 vm/sms
Re: [Haskell-cafe] Correspondence between libraries and modules
Thanks for your response. On Sun, Apr 22, 2012 at 4:45 PM, Brandon Allbery allber...@gmail.com wrote: One reason: modules serve multiple purposes; one of these is namespacing, and in the case of interfaces to foreign libraries that may force a division that would otherwise not exist. Interesting. Could you elaborate on what the other purposes are, and perhaps point to an instance of the foreign library case? More generally, making libraries and modules one-to-one means that either modules exist solely for libraries, or libraries must be artificially split. Perhaps this indicates that modules have too many other functions, but in that case you should propose an alternative system to replace them. Oh, I don't intend to replace it -- at most I want to understand why the system is set up the way it is, what the pros and cons are, and so on. I've come across a lot of design discussions for various Haskell features, but not this one; are there any? As to multiple libraries providing the same module: the Haskell ecosystem is still evolving and it's not always appropriate to give a particular implementation sole ownership of a general module name. Type families vs. functional dependencies are an example of this (theoretically type families were considered superior but to date they haven't lived up to it and recently some cases were shown that fundeps can solve but type families can't; parallel monad libraries based on both still exist). New container implementations have existed as standalone packages, some of which later merge with standard packages while others are discarded. I see. I didn't imagine there was as much variability with respect to module names and implementations as you suggest. I'm confused as to how type families vs. fundeps play a role here -- as far as I can tell both are compiler extensions that do not provide modules. 
I'm interested to see examples where two or more well-known yet unrelated modules clash under the same name; I can't imagine them coexisting in public very long -- wouldn't the confusion among users (e.g. when looking for documentation) be enough to either reconcile the modules or change one of the names? Your proposal to reject this reflects a static library ecosystem that does not exist. (It could be enforced dictatorially, but there is no Guido van Rossum of Haskell and a mistake in an evolving system is difficult to fix after the fact even with a dictator; we're already living with some difficult to fix issues not related to modules.) Right, assuming there could only be one implementation of a module, this is one of the main drawbacks; on the flip side, it is a feature in that there is no confusion as to what Foo.Bar.Qux means. As it is, any import requires out-of-band information in order to be resolved (both cognitively and by the compiler), in the form of the library it comes from. (There's also versioning information, but that could be equally specified per-library or per-module.) On the other hand, enforcing a single implementation is orthogonal to having a 1-to-1 module/library mapping. That is, you could allow multiple implementations either way. Alvaro ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
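For the "out-of-band information" point above, here is a minimal sketch of what naming the package in the import itself looks like in GHC, using the PackageImports extension that comes up elsewhere in this thread. It deliberately uses only "base" so that it compiles on its own; with two packages exporting the same module name, the string literal is what would disambiguate them. This is an illustration only, not a recommendation from any of the posters.

```haskell
{-# LANGUAGE PackageImports #-}
module Main (main) where

-- The string literal names the package that must provide the module.
-- "base" is used here only so the sketch has no extra dependencies.
import "base" Data.List (sort)

main :: IO ()
main = print (sort [3, 1, 2 :: Int])
```

Note that this ties the source file to a package name, which is the mixing of code and provenance that the thread describes as a drawback of the extension.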
Re: [Haskell-cafe] Correspondence between libraries and modules
On 4/22/12 6:30 PM, Alvaro Gutierrez wrote: On Sun, Apr 22, 2012 at 4:45 PM, Brandon Allbery allber...@gmail.com wrote: One reason: modules serve multiple purposes; one of these is namespacing, and in the case of interfaces to foreign libraries that may force a division that would otherwise not exist. Interesting. Could you elaborate on what the other purposes are, and perhaps point to an instance of the foreign library case? The main purpose of namespacing (IMO) is to separate concerns and make it easier to figure out how a project fits together. The primary goal of modules is to resolve namespacing issues. Consider one of my own libraries (chosen randomly via Safari's url autocompletion): http://hackage.haskell.org/package/bytestring-lexing When I inherited this package there were the Data.ByteString.Lex.Double and Data.ByteString.Lex.Lazy.Double modules, which were separated because they provide the same API but for strict vs lazy ByteStrings. Both of those modules are concerned with lexing floating point numbers. I inherited the package because I wanted to publicize some code I had for lexing integers in various formats. Since that's quite a different task than lexing floating point numbers, I put it in its own module: Data.ByteString.Lex.Integral. When dealing with FFI code, because of the impedance mismatch between Haskell and imperative languages like C, it's clear that there's going to be some massaging of the API beyond simply declaring FFI calls. As such, clearly we'd like to have separate modules for doing the low-level binding vs presenting a high-level API. Moreover, depending on what you're interfacing with, you may be forced to have multiple low-level modules. For example, if you use Google protocol buffers via the hprotoc package, then it will generate a separate module for each buffer type. That's fine, but usually it's not something you want to foist on your users. 
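To make the low-level binding vs high-level API split concrete, here is a small sketch. It binds the C function cos from math.h and wraps it in an ordinary Haskell function. The names c_cos and cosine are made up for illustration, and both halves are collapsed into one file only so the example is self-contained; in a real package the raw binding would more plausibly sit in its own module, possibly one that is not exposed to users.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main (main) where

import Foreign.C.Types (CDouble (..))

-- Low-level half: mirrors the C signature exactly, C types and all.
-- (Hypothetical name; this would normally live in a separate module.)
foreign import ccall unsafe "math.h cos"
  c_cos :: CDouble -> CDouble

-- High-level half: the function users would actually import,
-- speaking plain Haskell types instead of C ones.
cosine :: Double -> Double
cosine = realToFrac . c_cos . realToFrac

main :: IO ()
main = print (cosine 0) -- prints 1.0
```

Even in a case this small, the conversion between CDouble and Double is the kind of "massaging" that has no place in the raw binding itself.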
On the other hand, the main purpose of packages or libraries is as unit of distribution, code reuse, and separate compilation. Even with the Haskell culture of making small libraries, most worthwhile units of distribution/reuse/compilation tend to be larger than a single namespace/concern. Thus, it makes sense to have more than one module per package, because otherwise we'd need some higher level mechanism in order to manage the collections of package-modules which should be considered a single unit (i.e., clients will almost always want the whole bunch of them). However, centralization is prone to bottlenecks and systemic failure. As such, while it would be nice to ensure that a given module is provided by only one package, there is no mechanism in place to enforce this (except at compile time for the code that links the conflicting modules together). With few exceptions, it's considered bad form to knowingly use the same module name as is being used by another package. In part, it's bad form because egos are involved; but it's also bad form because there's poor technical support for resolving namespace collisions for module names. In GHC you can use -XPackageImports, which is workable but conflates issues of code with issues of provenance, which the Haskell Report intentionally keeps separate. However, until better technical support is implemented (not just for GHC, but also jhc, UHC,...) it's best to follow social practice. I'm confused as to how type families vs. fundeps play a role here -- as far as I can tell both are compiler extensions that do not provide modules. Both TFs (or rather associated types) and fundeps aim to solve the same problem. Namely: when using multi-parameter type classes, it is often desirable to declare that one parameter is wholly defined by other parameters, either for semantic reasons or (more often) to help type inference. 
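As a rough illustration of "one parameter is wholly defined by other parameters", here is the same toy class written both ways. ContainerFD, ContainerTF, Elem, firstFD and firstTF are hypothetical names chosen for this sketch; none of them come from mtl, monads-fd or monads-tf.

```haskell
{-# LANGUAGE MultiParamTypeClasses  #-}
{-# LANGUAGE FunctionalDependencies #-}
{-# LANGUAGE FlexibleInstances      #-}
{-# LANGUAGE TypeFamilies           #-}
module Main (main) where

-- Fundep version: two class parameters, with the dependency c -> e
-- declaring that the container type determines the element type.
class ContainerFD c e | c -> e where
  firstFD :: c -> Maybe e

instance ContainerFD [a] a where
  firstFD []      = Nothing
  firstFD (x : _) = Just x

-- Associated-type version: one class parameter, with the element
-- type computed from it by the type family Elem.
class ContainerTF c where
  type Elem c
  firstTF :: c -> Maybe (Elem c)

instance ContainerTF [a] where
  type Elem [a] = a
  firstTF []      = Nothing
  firstTF (x : _) = Just x

main :: IO ()
main = do
  -- In both cases the result type is inferred from the argument alone.
  print (firstFD "abc")
  print (firstTF "abc")
```

In both versions the element type never has to be annotated at the call site, which is the type-inference benefit described above.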
Since they both aim to solve the same problem, this raises a new problem: for some given type class, do I implement it with TF/ATs or with fundeps? Some people figured to solve the new issue by implementing it both ways in separate packages, but reusing the same module names. (Witness for example mtl-2 aka monads-fd, vs monads-tf.) In practice, that didn't work out so well. Part of the reason for failure is that although fundeps and TF/ATs are formally equivalent in theory, in practice the implementation of TF/ATs has(had?) been missing some necessary machinery, and consequentially the TF/AT versions were not as powerful as the original fundep versions. Though the butterfly dependency issues certainly didn't help. I'm interested to see examples where two or more well-known yet unrelated modules clash under the same name; I can't imagine them coexisting in public very long -- wouldn't the confusion among users (e.g. when looking for documentation) be enough to either reconcile the modules or change one of the names? That's not much of a problem in practice. There are lots of different