[darcs-devel] Re: summary of my recent spurt of patches, and request for suggestions

2006-12-04 Thread Simon Marlow

David Roundy wrote:

> I've been working hard on getting support for the new hashed inventory
> format into good shape.  If you aren't familiar with the benefits of the
> new format (which I've talked about with at least some of you in person),
> suffice to say that I see it as a precursor to working out the new way of
> dealing with conflicts.

As an interested bystander, I'd really like to hear a brief description of what 
a "hashed inventory" is, and what benefits it brings.  Not a 12-page paper, just 
a quick outline will do fine, I don't want to distract you from the hacking 
frenzy :)


Cheers,
Simon


___
darcs-devel mailing list
darcs-devel@darcs.net
http://www.abridgegame.org/cgi-bin/mailman/listinfo/darcs-devel


[darcs-devel] Re: summary of my recent spurt of patches, and request for suggestions

2006-12-04 Thread David Roundy
On Mon, Dec 04, 2006 at 04:34:48PM +, Simon Marlow wrote:
> David Roundy wrote:
> 
> >I've been working hard on getting support for the new hashed inventory
> >format into good shape.  If you aren't familiar with the benefits of the
> >new format (which I've talked about with at least some of you in person),
> >suffice to say that I see it as a precursor to working out the new way of
> >dealing with conflicts.
> 
> As an interested bystander, I'd really like to hear a brief description of 
> what a "hashed inventory" is, and what benefits it brings.  Not a 12-page 
> paper, just a quick outline will do fine, I don't want to distract you from 
> the hacking frenzy :)

A hashed inventory is a modification of the darcs repository format, which
essentially replaces the _darcs/inventory file (which is human-readable, if
not human-modifiable, so if you're not familiar with it, you could take a
look) with a _darcs/hashed_inventory file.  The difference is that a hash
of the contents of each patch is stored alongside the patch identifier
(the identifier is what is stored today).  This hash is then used as the
patch's filename in _darcs/patches/.  This has several benefits.
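
To make the layout concrete, here is a minimal sketch of content-addressed
patch storage.  It is an illustration only, not the darcs implementation:
the function name store_patch, the (identifier, hash) tuple and the sample
patch text are assumptions, and the real on-disk inventory format is not
shown.

```python
import hashlib
import os
import tempfile


def store_patch(repo, patch_id, contents):
    """Write a patch under a filename derived from its contents.

    Returns the (identifier, hash) pair that an inventory entry would hold.
    Hypothetical helper, not part of darcs.
    """
    digest = hashlib.sha1(contents).hexdigest()
    patches_dir = os.path.join(repo, "_darcs", "patches")
    os.makedirs(patches_dir, exist_ok=True)
    with open(os.path.join(patches_dir, digest), "wb") as f:
        f.write(contents)
    return (patch_id, digest)


repo = tempfile.mkdtemp()
entry = store_patch(repo, "fix typo in README", b"hunk ./README 1\n-teh\n+the\n")
print(entry[0], "->", os.path.join("_darcs", "patches", entry[1]))
```

The filename depends only on the bytes of the patch, so two patches with
identical contents share one file regardless of their names.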

At the most obvious level, we've now got some extra information for
checking the consistency of a repository (helpful if, e.g. an http proxy
modifies files in transit).
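
A consistency check along these lines could be sketched as follows.  The
inventory is modelled as a list of (identifier, hash) pairs and read_patch
is a hypothetical callable returning the stored bytes for a hash; neither
reflects the actual darcs data structures.

```python
import hashlib


def corrupted_patches(inventory, read_patch):
    """Return identifiers of patches whose contents no longer match their hash."""
    bad = []
    for patch_id, recorded in inventory:
        if hashlib.sha1(read_patch(recorded)).hexdigest() != recorded:
            bad.append(patch_id)
    return bad


good = b"hunk ./a 1\n+hello\n"
digest = hashlib.sha1(good).hexdigest()
inventory = [("add hello", digest)]
store = {digest: good}
print(corrupted_patches(inventory, store.__getitem__))  # []

# Simulate something rewriting line endings behind our back.
store[digest] = good.replace(b"\n", b"\r\n")
print(corrupted_patches(inventory, store.__getitem__))  # ['add hello']
```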

The next advantage is that by cryptographically signing the hashed
inventory, you cryptographically sign the entire contents of the repository
(unless someone cracks sha1).  This is potentially valuable to high-profile
projects, or projects that use untrusted mirrors.
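
The reasoning can be sketched in a few lines: a signature over the
inventory covers every hash, and every hash covers one patch's contents.
In this sketch an HMAC stands in for a real public-key signature (such as
one made with GPG), and the inventory serialisation is invented for the
example; both are assumptions, not what darcs does.

```python
import hashlib
import hmac


def inventory_text(entries):
    # Hypothetical serialisation: identifier and hash on alternating lines.
    return "".join(f"{pid}\n{digest}\n" for pid, digest in entries).encode()


def sign(key, entries):
    # HMAC is only a stand-in for a real signature scheme here.
    return hmac.new(key, inventory_text(entries), hashlib.sha256).hexdigest()


def verify(key, entries, signature, read_patch):
    """Check the signature, then check each patch against its signed hash."""
    if not hmac.compare_digest(sign(key, entries), signature):
        return False
    return all(hashlib.sha1(read_patch(d)).hexdigest() == d for _, d in entries)


key = b"demo key"
patch = b"hunk ./a 1\n+hello\n"
digest = hashlib.sha1(patch).hexdigest()
entries = [("add hello", digest)]
signature = sign(key, entries)

honest = {digest: patch}
tampered = {digest: b"hunk ./a 1\n+evil\n"}
print(verify(key, entries, signature, honest.__getitem__))    # True
print(verify(key, entries, signature, tampered.__getitem__))  # False
```

A mirror that alters a patch must either leave the hash mismatched or
change the inventory, which breaks the signature.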

Next, because the filename for patches now depends on patch contents, all
darcs commands will be atomic (except with respect to the pristine
cache--but atomic with respect to remote access), including those that
currently aren't, such as amend-record and obliterate.
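
One way to picture why content-derived filenames help: a changed patch
gets a new filename, so existing patch files are never overwritten, and
the visible state changes only when the inventory itself is swapped.  The
sketch below assumes the swap is done with an atomic rename (os.replace);
that detail, the helper name and the inventory serialisation are
assumptions for illustration, not a description of the darcs code.

```python
import hashlib
import os
import tempfile


def write_repo_state(repo, patches):
    """Write patch files first, then atomically publish a new inventory."""
    darcs = os.path.join(repo, "_darcs")
    patches_dir = os.path.join(darcs, "patches")
    os.makedirs(patches_dir, exist_ok=True)
    lines = []
    for patch_id, contents in patches:
        digest = hashlib.sha1(contents).hexdigest()
        path = os.path.join(patches_dir, digest)
        if not os.path.exists(path):  # existing files are never modified
            with open(path, "wb") as f:
                f.write(contents)
        lines.append(f"{patch_id}\n{digest}\n")
    fd, tmp = tempfile.mkstemp(dir=darcs)
    with os.fdopen(fd, "w") as f:
        f.write("".join(lines))
    # Readers see either the old inventory or the new one, never a mix.
    os.replace(tmp, os.path.join(darcs, "hashed_inventory"))


repo = tempfile.mkdtemp()
v1 = b"hunk ./a 1\n+hello\n"
v2 = b"hunk ./a 1\n+hello, world\n"
write_repo_state(repo, [("add hello", v1)])
write_repo_state(repo, [("add hello", v2)])  # roughly what amend-record does
```

After the second call the old patch file is still on disk, so a remote
reader holding the old inventory can still fetch everything it refers to.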

With hashed inventories it will be possible to implement "lazy" partial
repositories, in which darcs downloads patch files as needed to do the
commands you ask, since we'll have the hash with which to verify that the
patch files haven't been commuted (and therefore are still in the proper
context for our use).
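
A lazy repository of this kind might look roughly like the sketch below.
The class name LazyPatches and the fetch callable (hash in, bytes out) are
hypothetical; the point is only that a patch is downloaded on first use
and rejected if its contents do not match the hash we asked for.

```python
import hashlib


class LazyPatches:
    """Fetch patch contents on demand and verify them against their hash."""

    def __init__(self, fetch):
        self.fetch = fetch  # hypothetical callable: digest -> bytes
        self.cache = {}

    def get(self, digest):
        if digest not in self.cache:
            data = self.fetch(digest)
            if hashlib.sha1(data).hexdigest() != digest:
                # Different bytes, e.g. a commuted version of the patch.
                raise ValueError("patch does not match hash " + digest)
            self.cache[digest] = data
        return self.cache[digest]


patch = b"hunk ./a 1\n+hello\n"
digest = hashlib.sha1(patch).hexdigest()
remote = {digest: patch}
downloads = []


def fetch(d):
    downloads.append(d)
    return remote[d]


lazy = LazyPatches(fetch)
lazy.get(digest)
lazy.get(digest)
print(len(downloads))  # 1: the second request is served from the cache
```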

Finally, as I mentioned above, the refactoring for this change should help
with our plans for new conflict handling, which will probably require that
we break the current picture of one patch file per named patch (which
wouldn't work in the current scheme where the patch filename is determined
by the name of the patch).
-- 
David Roundy
Department of Physics
Oregon State University



[darcs-devel] Re: summary of my recent spurt of patches, and request for suggestions

2006-12-05 Thread Simon Marlow

Great, thanks David!

Simon




Re: [darcs-devel] Re: summary of my recent spurt of patches, and request for suggestions

2006-12-17 Thread Juliusz Chroboczek
> At the most obvious level, we've now got some extra information for
> checking the consistency of a repository (helpful if, e.g. an http proxy
> modifies files in transit).

This is especially important for Windows users, who tend to have
random software modify line endings behind their back.  (Since they
tend to have their repos corrupted in a consistent manner, the normal
consistency checks don't notice the corruption.)

> The next advantage is that by cryptographically signing the hashed
> inventory, you cryptographically sign the entire contents of the repository
> (unless someone cracks sha1).  This is potentially valuable to high-profile
> projects, or projects that use untrusted mirrors.

...and projects that make their tree available over plain HTTP (no TLS).
Which is every project known to me.

Note, however, that unlike what happens with Git, Monotone or Arch,
the hashes do not protect patches in transit: a patch's hash is
invalidated when the patch is commuted.  At the last FOSDEM we thought
about a hashing algorithm that is invariant w.r.t. commutation and came
up with a rather nice design, but it needs implementing[1].

> With hashed inventories it will be possible to implement "lazy" partial
> repositories, in which darcs downloads patch files as needed to do the
> commands you ask, since we'll have the hash with which to verify that the
> patch files haven't been commuted (and therefore are still in the proper
> context for our use).

It will also be possible to do a

  darcs pull --sibling ../darcs-unstable http://darcs.net/repos/darcs

where a patch will be copied locally if it is found in the sibling
repo.
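
The lookup order implied here can be sketched as: local repo first, then
any sibling, then the network.  The function obtain_patch and the download
callable are hypothetical, and directories of files named by hash are
assumed; this is not how darcs itself is written.

```python
import hashlib
import os
import shutil
import tempfile


def obtain_patch(digest, local_dir, sibling_dirs, download):
    """Get a patch by hash, preferring local and sibling copies over download."""
    target = os.path.join(local_dir, digest)
    if os.path.exists(target):
        return "local"
    for sibling in sibling_dirs:
        candidate = os.path.join(sibling, digest)
        if os.path.exists(candidate):
            shutil.copyfile(candidate, target)
            return "sibling"
    with open(target, "wb") as f:
        f.write(download(digest))  # hypothetical callable: digest -> bytes
    return "downloaded"


local = tempfile.mkdtemp()
sibling = tempfile.mkdtemp()
patch = b"hunk ./a 1\n+hello\n"
digest = hashlib.sha1(patch).hexdigest()
with open(os.path.join(sibling, digest), "wb") as f:
    f.write(patch)

first = obtain_patch(digest, local, [sibling], lambda d: patch)
second = obtain_patch(digest, local, [sibling], lambda d: patch)
print(first, second)  # sibling local
```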

As should be clear from the above, I am convinced that hashed
inventories are a Good Thing (tm).

Juliusz

[1] My vow of no hacking ends at the beginning of February.



Re: [darcs-devel] Re: summary of my recent spurt of patches, and request for suggestions

2006-12-17 Thread David Roundy
On Sun, Dec 17, 2006 at 09:16:01PM +0100, Juliusz Chroboczek wrote:
> > With hashed inventories it will be possible to implement "lazy" partial
> > repositories, in which darcs downloads patch files as needed to do the
> > commands you ask, since we'll have the hash with which to verify that the
> > patch files haven't been commuted (and therefore are still in the proper
> > context for our use).
> 
> It will also be possible to do a
> 
>   darcs pull --sibling ../darcs-unstable http://darcs.net/repos/darcs
> 
> where a patch will be copied locally if it is found in the sibling
> repo.

And, in fact, I've thought that once hashed inventories are in, it'd make a
lot of sense, on systems supporting hard links, to stick all patches in
~/.darcs/patches/ or something by default.  We could possibly also support
an env variable to indicate a system-wide patch store, so that we could
avoid the inconvenience of specifying siblings, and could in certain
circumstances avoid a heck of a lot of patch copying (and copies).
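
A rough sketch of such a shared store, assuming patches are files named by
hash: each repository hard-links the patch from the store instead of
keeping its own copy, falling back to a plain copy where hard links are
not available.  The helper name and the store location passed in as a
parameter are assumptions for the example.

```python
import hashlib
import os
import shutil
import tempfile


def add_from_store(store_dir, repo_patches_dir, contents):
    """Put a patch in the shared store and hard-link it into a repository."""
    digest = hashlib.sha1(contents).hexdigest()
    os.makedirs(store_dir, exist_ok=True)
    os.makedirs(repo_patches_dir, exist_ok=True)
    stored = os.path.join(store_dir, digest)
    if not os.path.exists(stored):
        with open(stored, "wb") as f:
            f.write(contents)
    target = os.path.join(repo_patches_dir, digest)
    if not os.path.exists(target):
        try:
            os.link(stored, target)  # no extra disk space used
        except OSError:
            shutil.copyfile(stored, target)  # no hard-link support here
    return target


base = tempfile.mkdtemp()
store = os.path.join(base, "store")
patch = b"hunk ./a 1\n+hello\n"
a = add_from_store(store, os.path.join(base, "repo-a", "_darcs", "patches"), patch)
b = add_from_store(store, os.path.join(base, "repo-b", "_darcs", "patches"), patch)
```

Both repositories end up with the patch under the same hash-derived name,
backed by one copy in the store wherever hard links work.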

> As should be clear from the above, I am convinced that hashed
> inventories are a Good Thing (tm).

:)

> [1] My vow of no hacking ends at the beginning of February.

Yay!
-- 
David Roundy
Department of Physics
Oregon State University
