Re: Proposal: priority queues in containers
Okay, here's my current plan: I think I'm going to withdraw the ticket for containers, and launch a package to be included in a future release of the Haskell Platform. First off, I'd like to know what the procedure is for adding something to the Platform. I haven't been able to find a mailing list dedicated to it, so I don't know where it should be discussed. (In all likelihood, I'm missing some obvious link on either Trac or HaskellWiki or something...) First, people should look at my current setup, and complain at me about the design. Haddock documentation is herehttp://code.haskell.org/containers-pqueue/pqueue-1.0.0/html/, though I've only so far documented the minimum-queue variants; in general, I aimed to use the same function names used by Data.Map and Data.Set wherever possible, so extract, top, and delete have been replaced by the function names used for the analogous methods in Data.Map and Data.Set. Probably the single aspect of the design I'm least happy about is how I distinguish between the priority queues that distinguish a key and an element, and those which don't. Currently, the Data.Map-style key/element version is named MinPQueue k a, and lives in the module Data.PQueue.Prio.Min, and the Data.Set-style version is named MinQueue a, and lives in the module Data.PQueue.Min. Can anybody think of better names for these things? Finally, I'm not going to consider any more alternative implementations at this point. My implementation has performed outstandingly on benchmarks, and has beaten a number of competitors. If someone believes that they have a superior implementation, they should email me off-list, and perhaps a later version of the package will be based on a different implementation, but for the moment I'm sticking to the type-magical binomial heap implementation. Louis Wasserman wasserman.lo...@gmail.com http://profiles.google.com/wasserman.louis ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Proposal: priority queues in containers
On 03/19/10 09:39, Louis Wasserman wrote: Yo, * I'm not comfortable with having two redundant modules, one for Min- and one for MaxQueue I'm pretty sure there won't be a containers-compatible solution, certainly not a solution compatible with the style of containers as it's currently written On 03/19/10 06:26, Ross Paterson wrote: If keys are separate, the two versions could be easily achieved using an inversion adaptor on Ord, which has been proposed before and would have many other uses. I agree with Ross. We should add Data.Ord.Opposite, or whatever we decide to call it, (newtype Opposite a = Opposite { getOpposite :: a } deriving (Eq, Show, ...), and the obvious Ord instance.) This works with Data.Map too. Additionally I suggest, since folding a Map produces keys lowest-first, that we choose the same behavior for the priority queue. On the other hand, separating the key and the value gives us the same Data.Map vs. Data.Set mess (which maybe is not a bad mess, but, it would suggest there should be two different PQ modules too, one with associated non-ordered data and the other with only the priorities). -Isaac ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Proposal: priority queues in containers
Oh, god, so much to respond to...heh. Submit this package for canonicalization as part of the Haskell Platform. I would for one would support its inclusion. This is an option I seriously hadn't considered. To be fair, that's because I've never used the Platform myself, preferring rather to have the most up-to-date version of GHC at all times, heh. That said, while I'd be okay with this option, I'd prefer putting it into containers, because I feel like a canonical, reliable priority queue implementation is the sort of thing a self-respecting language ought to have built in. As does Python. In Python, though, the PQ implementation is not a built-in class in the default namespace (as are dict and set). Rather, it is one of the batteries included libraries that come with Python. I think that's the right place for it in Haskell, too. I don't know Python, but according to Wikipedia, dict and set are built into the language. I don't think it's a fair comparison: set and dict in Python seem to have a role almost as ubiquitous as [] in Haskell, much more ubiquitous than e.g. Data.Set or Data.Map. I'm also not entirely sure that batteries included doesn't describe containers, given all the other packages that come with GHC. * There is no distinction between keys and priority values. A utility type Prio p a with the instance Ord p = Ord (Prio p a) is exported to allow usage of distinct keys and priority values. I disagree with this one. It requires an Ord instance that isn't really an ordering, and makes a Functor instance impossible. I would prefer an interface separating keys and values like that of Data.Map (which would also increase consistency within the package). I'd be okay with separating out a priority/value version. However, I'm still clueless as to what you're talking about with respect to Functor -- what's wrong with the following? data Prio p a = Prio p a instance Ord p = Ord (Prio p a) where ... instance Functor (Prio p) where fmap f (Prio p a) = Prio p (f a) I can understand if you're uncomfortable with (==) and (\ x y - compare x y == EQ) being inequivalent, but neither the H98 Report nor the Prelude make any such claim, as far as I can tell. The Foldable instance breaks the abstraction. I think it should process elements in order. I think that wanting to iterate over the priority queue in the middle of the computation, without caring about order, is a perfectly legitimate desire for a programmer! Moreover, the only way to make a Foldable instance process elements in order would be something like data Ord a = PQueue a = which I think is an awfully dirty approach. I'm not at all a fan of adding restrictions like that, not least because it adds lots of awkward overhead. Would you be okay with not exporting a Foldable instance at all, but still exporting a toList method which doesn't guarantee any ordering on the return list? My advice would be not to stress over whether priority queues go into containers. It's not some pristine thing of beauty that deserves treatment with velvet gloves. I'm...not sure how to respond to this claim. At least part of me wants to say, I genuinely do think the containers package is a piece of art...and then another part pipes up, except for the inconsistencies between the various minView/maxView versions, and the little differences between IntMap and Map, and... That said, I wouldn't be a fan of scrapping the style which the containers package has at least tried to create. I could be convinced that rewriting the rest of containers would be a good thing to do, though...and I might just want to do that myself. Hah. How does this implementation compare with using Map/Set as a priority queue? Continuing the discussion of the benchmarks: first, Jim, it appears that I'm the one who made a n00b benchmarking error. TT_TT That said, as you found, my implementation is still slightly faster when the benchmark is corrected. Some comments: - QuickBinom needs to have O(1) findMin for a reasonable comparison. I added that in the benchmark below, and here. - I can't think of any more optimizations for the sparse binomial heap -- I genuinely think it's not going to get better. - There is a way to optimize my implementation still further, but it makes my code much less readable. (Specifically, I start the BinomForest at Succ Zero, and unpack the first child of every node still in the forest. Modifying the whole implementation that way, though, makes it unreadably ugly...and I think QuickBinom is possibly already at that point. I started implementing it, and realized just how ugly it was, and I stopped, but I can finish it if I have to.) Sorting 500,000 elements, compiled with -O2, run with +RTS -H128m -K64m, with another few implementations thrown in for good measure: Times (ms) min mean +/-sdmedian max Pairing:1440.090 1482.893
Re: Proposal: priority queues in containers
On Thu, Mar 18, 2010 at 7:43 AM, Louis Wasserman wasserman.lo...@gmail.com wrote: Oh, god, so much to respond to...heh. You did request feedback back there, didn't you :P As does Python. In Python, though, the PQ implementation is not a built-in class in the default namespace (as are dict and set). Rather, it is one of the batteries included libraries that come with Python. I think that's the right place for it in Haskell, too. I don't know Python, but according to Wikipedia, dict and set are built into the language. I don't think it's a fair comparison: set and dict in Python seem to have a role almost as ubiquitous as [] in Haskell, much more It's not really the same. pqueue is not in the built-in namespace in python, that's like the Prelude in haskell. pqueue *is* in the default library, which every python installation will have since it comes with the default download, this is what's meant by batteries included. So that's like containers: you have to explicitly import it, but you shouldn't worry about installations that don't have it because it comes with the compiler. The main difference is that python doesn't have cabal and doesn't have anything like the haskell platform and installing new packages, while easy, is not as automatic as cabal can be. After about five hours' work (!!!) I *finally* managed to install Criterion yesterday, so I'll send out those tests ASAP. I wanted to use criterion too at one point, but it looked too hard to install so I was scared away... ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Proposal: priority queues in containers
Hi, After about five hours' work (!!!) I *finally* managed to install Criterion yesterday, so I'll send out those tests ASAP. I wanted to use criterion too at one point, but it looked too hard to install so I was scared away... Why is that? Because of the Chart depencency? You can install progression -- a wrapper over criterion, which uses gnuplot to draw graphs, and disable the Chart dependency. I did just cabal --user install -f -Chart criterion progression and it installed without a problem, on ghc-6.12.1, ghc-6.13.something, both Linux and Windows. Cheers, Milan ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Proposal: priority queues in containers
On Thu, Mar 18, 2010 at 4:43 PM, Louis Wasserman wasserman.lo...@gmail.com wrote: Submit this package for canonicalization as part of the Haskell Platform. I would for one would support its inclusion. This is an option I seriously hadn't considered. To be fair, that's because I've never used the Platform myself, preferring rather to have the most up-to-date version of GHC at all times, heh. That said, while I'd be okay with this option, I'd prefer putting it into containers, because I feel like a canonical, reliable priority queue implementation is the sort of thing a self-respecting language ought to have built in. I don't like libraries getting bigger, I like them getting smaller. When they're smaller they're easier to understand and easier to upgrade. So I would also advice proposing your package for the HP (Haskell Platform). I'm even for splitting containers into sub-packages: maps, sets, sequence, graph and tree. Those sub-packages would then need to be added to the HP. Then we could turn containers into a meta-package that depends on these sub-packages (similar to how the HP works[1]). Finally we could deprecate containers and after some time remove it. (I'm also for splitting base even more... but one thing at a time) regards, Bas [1] http://hackage.haskell.org/platform/2009.2.0.2/haskell-platform.cabal ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Proposal: priority queues in containers
Hi, I don't like libraries getting bigger, I like them getting smaller. When they're smaller they're easier to understand and easier to upgrade. So I would also advice proposing your package for the HP (Haskell Platform). I'm even for splitting containers into sub-packages: maps, sets, sequence, graph and tree. Those sub-packages would then need to be added to the HP. Then we could turn containers into a meta-package that depends on these sub-packages (similar to how the HP works[1]). Finally we could deprecate containers and after some time remove it. (I'm also for splitting base even more... but one thing at a time) personally I am against splitting containers. It is a collection of several basic data structures with similar design decisions (reasonably efficient, can be used persistently, decent API). I think these structures should stay together, to have a library of data structures for common usage. I am for adding the priority queues to the containers. Cheers, Milan Straka ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Proposal: priority queues in containers
First off: I've finally gotten set up with code.haskell.org. A darcs repo of my binomial heap implementation is at http://code.haskell.org/containers-pqueue/. Also on that site is the haddock documentationhttp://code.haskell.org/containers-pqueue/containers-0.3.0.0/html/, which I'm sure many people will appreciate. Somebody else (Ross?) had requested a separate version of the priority queue to handle priorities and values separately, so I'm working on that. I've also deleted the Foldable instance of MinQueue, though I still offer a clearly documented unordered toList, which will stay in place. Well, I only tested one thing (heap sort), and QuickBinom is actually faster under different options (-prof -auto-all or without calling performGC before every heapsort). -prof -auto-all blocks a significant number of optimizations from actually happening. Essentially, if the compiler has to figure out how much time is taken by some particular function, then it's not allowed to inline or eliminate uses of that function. I don't consider it a fair comparison. Moreover, calling performGC makes sense -- it essentially wipes the slate, making each successive comparison unbiased by the previous one. Louis, you note later in this email that your implementation is done. That seems important to me. If we fix a sane interface now, the implementation can be changed later if we find something more efficient, right? Absolutely true. However, I finally assembled a benchmark that I think is a fair comparison -- a heapsort, essentially length . Data.List.unfoldr extract . foldr insert empty. The resultshttp://code.haskell.org/containers-pqueue/bench-chart.pdfare pretty supportive of my implementation. (Original timing data, outputted by Progression, is herehttp://code.haskell.org/containers-pqueue/bench-final.csv. I think all of the original code for the benchmark is in the code.haskell.org folder, just not part of the darcs repo. However, I had to slightly modify my copy of Progression to force the GC, so YMMV.) I'm still pretty strongly in favor of putting priority queues into containers: other programming languages consider it necessary for inclusion into standardized libraries, people will be more likely to use appropriate data structures for their needs when reliable, friendly implementations are already at their fingertips, and other reasons already discussed. In general, I think priority queues should be treated the same as Data.Map or Data.Set, like java.util.* or the C++ STL treat their respective versions of those structures. I don't think there's likely to be any agreement any time soon to split up containers, so my inclination is to put pqueues into containers, and maybe if we agree later to split containers, then priority queues will be part of that decision. On a somewhat different note: writing unit tests in the existing framework is awkward and hard! Writing QuickCheck tests is trivial and much more exhaustive than what the existing tests look like. The existing containers tests appear to check one particular example that was the source of a preexisting bug, rather than examining exhaustively that methods work correctly, to eliminate potentially new bugs. I mean, Data.IntSet's one existing test is main = print $ isProperSubsetOf (fromList [2,3]) $ fromList [2,3,4] which might have found some preexisting bug, but certainly doesn't convince me of Data.IntSet's correctness. Is this normal? Is it permissible to use QuickCheck inside unit tests? (A collection of QuickCheck tests -- all of which my implementation passes -- is in the code.haskell.org directory.) Louis Wasserman ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Proposal: priority queues in containers
On Thu, Mar 18, 2010 at 10:39 PM, Milan Straka f...@ucw.cz wrote: personally I am against splitting containers. It is a collection of several basic data structures with similar design decisions (reasonably efficient, can be used persistently, decent API). I think these structures should stay together, to have a library of data structures for common usage. But when turning containers into a meta-package these structures will stay together while at the same time allowing people to use and upgrade them separately. regards, Bas ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Proposal: priority queues in containers
Hi, On Thu, Mar 18, 2010 at 10:39 PM, Milan Straka f...@ucw.cz wrote: personally I am against splitting containers. It is a collection of several basic data structures with similar design decisions (reasonably efficient, can be used persistently, decent API). I think these structures should stay together, to have a library of data structures for common usage. But when turning containers into a meta-package these structures will stay together while at the same time allowing people to use and upgrade them separately. If the metapackage stays, then there is probably little difference. I was under impression you wanted to remove the metapackage and leave the individual structures as packages (that was my understanding of your mail). I would like to have basic data structures connected together. I do not really mind if the modules are in one library or in several, as long as I could say I want 'containers'. But independently on the form of the containers package, I would like it to contain a priority queue. Including it to the containers package seems as a good first step. Cheers, Milan ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Proposal: priority queues in containers
On 18 March 2010 22:02, Louis Wasserman wasserman.lo...@gmail.com wrote: I'm still pretty strongly in favor of putting priority queues into containers: other programming languages consider it necessary for inclusion into standardized libraries, people will be more likely to use appropriate data structures for their needs when reliable, friendly implementations are already at their fingertips, and other reasons already discussed. The Haskell Platform is really is intended to be available at your fingertips. Unfortunately, the following does not work (although I thought it's supposed to) $ cabal install haskell-platform Nevertheless, the libraries bundled with GHC are those libraries that GHC itself needs and which therefore cannot be upgraded independently. The real standard libraries are the Haskell Platform and if your package is part of the platform, then your package *is* in status equivalent to things like java.util.*. This weekend's Hackathon in Zürich will partly be dedicated to getting the next release of the Platform release ready. If you can get your package into the following platform release (due 6 months after the current release), then this would surely make it the default package for anyone in need of a PQ. / Thomas -- Push the envelope. Watch it bend. ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Proposal: priority queues in containers
Okay, let me ask the following question: Would anybody besides me be heartbroken if priority queues *weren't* put into containers, but were instead put into the Platform? Louis Wasserman wasserman.lo...@gmail.com http://profiles.google.com/wasserman.louis On Thu, Mar 18, 2010 at 6:50 PM, Thomas Schilling nomin...@googlemail.comwrote: On 18 March 2010 22:02, Louis Wasserman wasserman.lo...@gmail.com wrote: I'm still pretty strongly in favor of putting priority queues into containers: other programming languages consider it necessary for inclusion into standardized libraries, people will be more likely to use appropriate data structures for their needs when reliable, friendly implementations are already at their fingertips, and other reasons already discussed. The Haskell Platform is really is intended to be available at your fingertips. Unfortunately, the following does not work (although I thought it's supposed to) $ cabal install haskell-platform Nevertheless, the libraries bundled with GHC are those libraries that GHC itself needs and which therefore cannot be upgraded independently. The real standard libraries are the Haskell Platform and if your package is part of the platform, then your package *is* in status equivalent to things like java.util.*. This weekend's Hackathon in Zürich will partly be dedicated to getting the next release of the Platform release ready. If you can get your package into the following platform release (due 6 months after the current release), then this would surely make it the default package for anyone in need of a PQ. / Thomas -- Push the envelope. Watch it bend. ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Proposal: priority queues in containers
On Mar 18, 2010, at 18:33 , Milan Straka wrote: I would like to have basic data structures connected together. I do not really mind if the modules are in one library or in several, as long as I could say I want 'containers'. This is what the Haskell Platform is for. No real need to add more stuff to containers; if anything, the libraries stored there should be decoupled so that e.g. the recent Data.Map proposals wouldn't disturb programs not using Data.Map. -- brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allb...@kf8nh.com system administrator [openafs,heimdal,too many hats] allb...@ece.cmu.edu electrical and computer engineering, carnegie mellon universityKF8NH PGP.sig Description: This is a digitally signed message part ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Proposal: priority queues in containers
On Mar 18, 2010, at 19:50 , Thomas Schilling wrote: The Haskell Platform is really is intended to be available at your fingertips. Unfortunately, the following does not work (although I thought it's supposed to) $ cabal install haskell-platform Other way around: installing the Haskell Platorm gives you ghc, a set of libraries known to work with that version of ghc and with each other, and cabal-install. -- brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allb...@kf8nh.com system administrator [openafs,heimdal,too many hats] allb...@ece.cmu.edu electrical and computer engineering, carnegie mellon universityKF8NH PGP.sig Description: This is a digitally signed message part ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users