[Haskell-cafe] cheap in-repo local branches (just needs implementation)
Hi everyone,

Max Battcher had an idea that I thought I should post on the mailing list. The idea is about making branches in darcs. Right now, we take the view that a darcs branch is a darcs repository, plain and simple. If you want to create a branch, all you have to do is darcs get (darcs get --lazy to be faster). While this is very simple, a lot of us think that it's inconvenient: first, because it's slow, and second, because you have to think of where to put the branch. So darcs users have been asking about in-repo branches for a while.

And now, Max has come up with a way to implement them. What's nice about his approach is that it lets us keep the simplicity of darcs while giving more demanding users a chance to work with branches. It also takes advantage of Petr Ročkai's Summer of Code project to make darcs faster in our daily lives and, for that matter, paves the way for a possible darcs plugin system in the future.

On Max's advice, I'm cross-posting to Haskell Cafe. Haskellers: here's a nice chance for you to get a cool Darcs feature without very much effort or Darcs hacking experience :-)

More info on: http://bugs.darcs.net/issue555

Max's write-up:

Here's a quick primer: Basically, darcs >= 2.0 uses a hashed pristine store that acts as a file object cache. An interesting artifact of the pristine.hashed store, which is being moved into a reusable, third-party-accessible library named hashed-storage, is that it does (for many reasons, mostly co-evolutionary) resemble the git object store. There are several differences, but the key one for the topic at hand is that darcs generally garbage collects pristine.hashed objects much faster than git. Darcs is very quick to garbage collect old objects partly because many aren't all that useful, but mostly because the primary representation of a repository state is the patch store (and inventory), so there is only one root pointer in the pristine store.
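To make the garbage-collection point concrete, here is a toy model of a content-addressed object store with a single root pointer. It is a hypothetical sketch, not the hashed-storage API: the class ObjectStore, its methods, and hashing a repr() encoding with SHA-256 are all assumptions made for illustration. Every blob or tree is stored under the hash of its content, so unchanged files are shared between states, and anything the root cannot reach is collected.

```python
import hashlib

# Hypothetical toy model of a content-addressed pristine store
# (illustration only; not the hashed-storage API).
class ObjectStore:
    def __init__(self):
        self.objects = {}  # hash -> (kind, payload)

    def _put(self, kind, payload):
        # Identical content always hashes to the same key, so it is stored once.
        key = hashlib.sha256(repr((kind, payload)).encode()).hexdigest()
        self.objects[key] = (kind, payload)
        return key

    def put_blob(self, content: bytes) -> str:
        return self._put("blob", content)

    def put_tree(self, entries: dict) -> str:
        # entries: file or directory name -> hash of a blob or subtree
        return self._put("tree", tuple(sorted(entries.items())))

    def reachable(self, roots):
        # Mark phase: walk every object reachable from the root pointers.
        seen, stack = set(), list(roots)
        while stack:
            key = stack.pop()
            if key in seen:
                continue
            seen.add(key)
            kind, payload = self.objects[key]
            if kind == "tree":
                stack.extend(child for _, child in payload)
        return seen

    def gc(self, roots):
        # Sweep phase: drop everything the roots cannot reach.
        live = self.reachable(roots)
        dead = [key for key in self.objects if key not in live]
        for key in dead:
            del self.objects[key]
        return len(dead)


store = ObjectStore()
readme = store.put_blob(b"hello\n")
main_v1 = store.put_blob(b"main = print 1\n")
root_v1 = store.put_tree({"README": readme, "Main.hs": main_v1})

# Recording a change adds only a new Main.hs blob and a new root tree;
# the README blob is shared between both states.
main_v2 = store.put_blob(b"main = print 2\n")
root_v2 = store.put_tree({"README": readme, "Main.hs": main_v2})

# With a single root pointer (the current pristine), the old state is garbage.
removed = store.gc([root_v2])
print(removed)  # 2: the old Main.hs blob and the old root tree
```

Because only one root is ever passed to gc(), the previous tree disappears as soon as the pristine moves on, which is the fast collection behaviour described above.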
Petr, the author of the hashed-storage library, briefly discusses this in his most recent design post about the future of hashed-storage: http://mornfall.net/blog/designing_storage_for_darcs.html

Here's where the primer meets the topic at hand: A darcs branch consists of three major components: an inventory store, a patch store, and a pristine store. To store multiple branches in the same place you need to take care of:

1) storing the alternate inventories, and
2) if you want it to be relatively fast, storing additional objects in the pristine store.

(The patch store will already happily hold more patches than are referenced in the current inventory.)

(1) is mostly a matter of naming alternate inventories and swapping between them. As for (2), thanks to the *ahem* git-like nature of pristine.hashed/hashed-storage, darcs could easily archive many more pristine objects in pristine.hashed than it does during normal operation, and it may be as simple as storing additional, useful root pointers visible to hashed-storage so that it knows not to garbage collect the objects from other branches.

Here's where the fun happens: It seems to me that a branch switching tool, utilizing darcs' existing repository data stores, could be built almost entirely on top of the hashed-storage library (which has been designed for reuse), either as it exists today or hopefully with only minor tweaking, and with only minimal interaction with darcs itself. That is, in-repo branching could be provided entirely, today or soon, as a second/third-party tool to darcs. (!)

I think this is great from a darcs perspective: darcs itself remains conceptually simple (1 repository == 1 branch), which is something that I for one love about darcs, and it doesn't need additional commands in darcs itself. Yet power users (and git escapees) would have easy access to a ``darcs-branch`` tool that provides simple and powerful in-repo switching.
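The two ingredients described above, named alternate inventories and extra root pointers for the garbage collector, can be sketched with a similar toy model. Again this is a hypothetical sketch, not darcs' or hashed-storage's actual API: Store, Repo, branch(), switch(), gc(), the flat file-name-to-blob trees, and the default branch name "master" are all invented for illustration, and patches are represented only by their names.

```python
import hashlib

# Hypothetical toy model (not darcs' or hashed-storage's real API).
class Store:
    def __init__(self):
        self.objects = {}  # hash -> blob bytes or ("tree", entries)

    def put(self, obj) -> str:
        key = hashlib.sha256(repr(obj).encode()).hexdigest()
        self.objects[key] = obj
        return key

    def put_tree(self, files: dict) -> str:
        # Flat tree: file name -> blob hash (no subdirectories, for brevity).
        entries = tuple(sorted((name, self.put(data)) for name, data in files.items()))
        return self.put(("tree", entries))

    def gc(self, roots) -> int:
        # Keep every root tree plus the blobs it lists; sweep the rest.
        live = set(roots)
        for root in roots:
            _, entries = self.objects[root]
            live.update(blob for _, blob in entries)
        dead = [key for key in self.objects if key not in live]
        for key in dead:
            del self.objects[key]
        return len(dead)


class Repo:
    def __init__(self, store, inventory, pristine_root):
        self.store = store
        self.current = "master"            # hypothetical default branch name
        self.inventory = list(inventory)   # patch names in the working branch
        self.pristine = pristine_root      # the one root darcs keeps today
        self.branches = {}                 # parked branch -> (inventory, root)

    def branch(self, name):
        self.branches[name] = (list(self.inventory), self.pristine)

    def switch(self, name):
        if name == self.current:
            return
        # (1) Swap inventories: park the current one, load the target.
        inventory, root = self.branches.pop(name)
        self.branches[self.current] = (self.inventory, self.pristine)
        self.current, self.inventory, self.pristine = name, inventory, root

    def gc(self, keep_other_branches=True) -> int:
        # (2) Extra root pointers: parked branches protect their objects.
        roots = [self.pristine]
        if keep_other_branches:
            roots += [root for _, root in self.branches.values()]
        return self.store.gc(roots)


store = Store()
repo = Repo(store, ["initial"], store.put_tree({"Main.hs": b"v1"}))
repo.branch("experimental")
repo.switch("experimental")
repo.inventory.append("try-new-api")
repo.pristine = store.put_tree({"Main.hs": b"v2"})

kept = repo.gc()  # parked "master" root keeps its objects alive
repo.switch("master")
print(repo.current, repo.inventory, kept)  # master ['initial'] 0

# Today's single-root behaviour would discard the parked branch's objects.
dropped = repo.gc(keep_other_branches=False)
print(dropped)  # 2
```

Parking each branch's pristine root alongside its inventory is what keeps that branch cheap to switch back to; calling gc with keep_other_branches=False mimics the current single-root collector and throws the experimental branch's objects away.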
Potentially, such a tool is also a great candidate to be an early adopter of the darcs library support and could help better define and enhance darcs' public API. (It's also interesting in that it mirrors the fact that hg's support for branches is an add-on, and that both hg and git have darcs-like patch queues as add-ons.)

I think this is even better from a hashed-storage perspective: ``darcs-branch`` would be a strong (new) use case for hashed-storage as a public API. The tool would provide good incentive to keep hashed-storage's API clean, and better incentive (than darcs' normal operation) to keep hashed-storage's garbage collection and object compaction strong. (With the 'cheap' cost of in-repo branches primarily a consequence of how well hashed-storage stores the additional objects of multiple branches. As a bonus, normal darcs operations should benefit as well from the gc/compaction optimizations that
Re: [Haskell-cafe] cheap in-repo local branches (just needs implementation)
I like it. git branches are nice to work with, and they don't have the conceptual pain of creating a new repository.

Things that make them nice:
* When switching branches, all your files magically update (if they have not been modified).
* Easy to maintain multiple branches, say stable and experimental. That helps me avoid getting clobbered by others' changes to APIs I depend on.

Things that are a pain:
* Comparing commits (patches) between branches. It's hard to tell what is in one and what is in another.
* When you have modified files, git is super picky about switching branches.
* Once a remote branch is pushed to a public repo, it's scary to remove it. You don't want to break somebody, but you don't want that old junk hanging around either.

I don't mean to write about git, but if darcs were to have branches, that's the kind of stuff I would love to see.

On Tue, Jul 21, 2009 at 2:23 PM, Eric Kow <ko...@darcs.net> wrote: