[Haskell-cafe] cheap in-repo local branches (just needs implementation)

2009-07-21 Thread Eric Kow
Hi everyone,

Max Battcher had an idea that I thought I should post on the mailing list.

The idea is about making branches in darcs.  Right now, we take the view that a
darcs branch is a darcs repository plain and simple.  If you want to create a
branch, all you have to do is darcs get (darcs get --lazy to be faster).  While
this is very simple, a lot of us think that it's inconvenient (one because it's
slow, and two because you have to think of where to put the branch).

So darcs users have been asking about in-repo branches for a while.  And now,
Max has come up with a way to implement them.  What's nice about his approach
is that it lets us keep the simplicity of darcs, while giving more demanding
users a chance to work with branches.  It also takes advantage of Petr
Ročkai's Summer of Code project to make darcs faster in our daily lives and,
for that matter, paves the way for a possible darcs plugin system in the future.

On Max's advice, I'm cross-posting to Haskell Cafe.  Haskellers: here's a nice
chance for you to get a cool Darcs feature without very much effort or Darcs
hacking experience :-)
 
More info on: http://bugs.darcs.net/issue555


Max's write-up


Here's a quick primer: Basically, darcs >= 2.0 uses a hashed pristine 
store that acts as a file object cache. The pristine.hashed store is 
being pushed into a useful, third-party accessible library named 
hashed-storage. An interesting artifact of that store is that it does 
(for many reasons, most co-evolutionary) resemble the git object store. 
There are several differences, but the key one for the topic at hand is 
that darcs generally garbage collects pristine.hashed objects much 
sooner than git.

Darcs is very quick to garbage collect old objects partly because many 
aren't all that useful, but mostly because the primary representation 
for a repository state is the patch store (and inventory), so there is 
only one root pointer in the pristine store. Petr, the author of the 
hashed-storage library, briefly discusses this in his most recent design 
post about the future of hashed-storage:

http://mornfall.net/blog/designing_storage_for_darcs.html
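
To make the single-root point concrete, here is a toy sketch of a
content-addressed store with a reachability-based garbage collector. It is
an illustration only, written in Python for brevity: the object layout and
the names put_blob, put_tree, reachable and collect_garbage are assumptions
made up for this example, not the on-disk format of pristine.hashed or the
hashed-storage API.

```python
import hashlib

def put_blob(store, data):
    """Store file contents under the SHA-256 of the tagged bytes."""
    payload = b"blob\n" + data
    key = hashlib.sha256(payload).hexdigest()
    store[key] = payload
    return key

def put_tree(store, entries):
    """Store a directory listing (name -> object hash) and return its hash."""
    listing = "\n".join(f"{name} {h}" for name, h in sorted(entries.items()))
    payload = b"tree\n" + listing.encode()
    key = hashlib.sha256(payload).hexdigest()
    store[key] = payload
    return key

def reachable(store, root):
    """Return the set of hashes reachable from one root tree."""
    seen, stack = set(), [root]
    while stack:
        key = stack.pop()
        if key in seen:
            continue
        seen.add(key)
        payload = store[key]
        if payload.startswith(b"tree\n"):
            for line in payload[5:].decode().splitlines():
                # The hash is the last space-separated field on the line.
                stack.append(line.rsplit(" ", 1)[1])
    return seen

def collect_garbage(store, roots):
    """Delete every object that no root can reach; return the live set."""
    live = set()
    for root in roots:
        live |= reachable(store, root)
    for key in list(store):
        if key not in live:
            del store[key]
    return live

# Two successive states of the same one-file repository.
store = {}
old_root = put_tree(store, {"README": put_blob(store, b"first draft")})
new_root = put_tree(store, {"README": put_blob(store, b"second draft")})

# With a single root pointer, everything belonging to the old state is swept.
collect_garbage(store, [new_root])
```

Passing both old_root and new_root as roots would keep all four objects,
which is the only change in-repo branches would need from the collector.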

Here's where the primer meets the topic at hand: A darcs branch consists 
of three major components: an inventory store, a patch store, and a 
pristine store. To store multiple branches in the same place you need 
to take care of: 1) storing the alternate inventories, and 2) if you 
want it to be relatively fast, storing additional objects in the 
pristine store. (The patch store will already happily hold more patches 
than are referenced in the current inventory.) (1) is mostly a matter of 
naming alternate inventories and swapping between them. As for (2), 
thanks to the *ahem* git-like nature of pristine.hashed/hashed-storage, 
darcs could easily archive (many) more pristine objects in 
pristine.hashed than it does during normal operation. It may be as 
simple as storing additional, useful root pointers visible to 
hashed-storage so that it knows not to garbage collect the objects from 
other branches.
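
The two steps above can be modelled in a few lines. This is a hedged
sketch in Python, not darcs code: Repo, switch_branch and gc_roots are
hypothetical names, patch ids and pristine roots are stand-in strings, and
a real tool would also have to update the working directory.

```python
from dataclasses import dataclass, field

@dataclass
class Repo:
    inventory: list = field(default_factory=list)  # ordered patch ids of the active branch
    pristine_root: str = "root-empty"              # root pointer of the active pristine tree
    current: str = "default"                       # name of the active branch
    parked: dict = field(default_factory=dict)     # name -> (inventory, pristine root)

def switch_branch(repo, name):
    """Park the active inventory and root, then activate the named branch."""
    repo.parked[repo.current] = (list(repo.inventory), repo.pristine_root)
    # A branch that does not exist yet starts as a copy of the one we leave.
    inventory, root = repo.parked.pop(name, (repo.inventory, repo.pristine_root))
    repo.inventory = list(inventory)
    repo.pristine_root = root
    repo.current = name

def gc_roots(repo):
    """Every root the pristine store must keep alive: active plus parked."""
    roots = {root for _, root in repo.parked.values()}
    roots.add(repo.pristine_root)
    return roots

# Start an experimental branch, record a patch there, then switch back.
repo = Repo(inventory=["patch-1"], pristine_root="root-A")
switch_branch(repo, "experimental")
repo.inventory.append("patch-2")
repo.pristine_root = "root-B"
switch_branch(repo, "default")
```

After the final switch, the active inventory is back to just patch-1,
while gc_roots still reports both roots, so the experimental branch's
pristine objects would survive a collection.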

Here's where the fun happens: It seems to me that a branch switching 
tool, utilizing darcs' existing repository data stores, could be built 
almost entirely on top of the hashed-storage library (which has been 
designed for reuse), as it exists today or hopefully with only minor 
tweaking, and with only minimal interaction with darcs itself. 
That is, in-repo branching could be provided entirely, today or soon, as 
a second/third-party tool to darcs. (!)

I think this is great from a darcs perspective: darcs itself remains 
conceptually simple (1 repository == 1 branch), which is something that 
I for one love about darcs, and doesn't need additional commands in 
darcs itself. Yet power users (and git escapees) would have easy 
access to a ``darcs-branch`` tool that provides simple and powerful 
in-repo switching. Potentially, such a tool is also a great candidate to 
be an early adopter of the darcs library support and can help better 
define and enhance darcs' public API. (It's also interesting in that it 
mirrors how hg's support for branches is an add-on, and how both hg and 
git have darcs-like patch queues as add-ons.)

I think this is even better from a hashed-storage perspective: 
``darcs-branch`` would be a strong (new) use case for hashed-storage as 
a public API. The tool would provide good incentive to keep 
hashed-storage's API clean, and better incentive (than darcs' normal 
operation) to keep hashed-storage's garbage collection and object 
compaction strong. (With the 'cheap' cost of in-repo branches primarily 
a consequence of how well hashed-storage stores the additional objects 
of multiple branches. As a bonus, normal darcs operations should benefit 
as well from the gc/compaction optimizations that 

Re: [Haskell-cafe] cheap in-repo local branches (just needs implementation)

2009-07-21 Thread Justin Bailey
I like it. git branches are nice to work with, and they don't have the
conceptual pain of creating a new repository.

Things that make them nice:

  * When switching branches, all your files magically update (if they
have not been modified).
  * Easy to maintain multiple branches, say stable and
experimental. That helps me avoid getting clobbered by others'
changes to APIs I depend on.

Things that are a pain:

  * Comparing commits (patches) between branches. It's hard to tell
what is in one and what is in another.
  * When you have modified files, git is super picky about switching branches.
  * Once a remote branch is pushed to a public repo, it's scary to
remove it. You don't want to break somebody, but you don't want that
old junk hanging around either.

I don't mean to write about git, but if darcs were to have branches,
that's the kind of stuff I would love to see.
