Re: [darcs-devel] Proposal for new format to store patch files

Ian Lynagh Wed, 01 Jun 2005 17:41:47 -0700

On Wed, Jun 01, 2005 at 03:39:20PM -0700, Jason Dagit wrote:
> Ian Lynagh <[EMAIL PROTECTED]> writes:
> 
> > On Thu, May 12, 2005 at 12:22:34PM -0700, Jason Dagit wrote:
> >> My proposal is to use a format similar to ar (or we could maybe use ar
> >> as is).
> >> [snip description of ar idea]
> >
> > When applying patches etc this isn't a problem.


["this" refers to indexing of patch files; not a problem because we're
 going to apply the whole lot anyway]

> Fortunately applying patches remains fast for small patches.  I have a
> repository where the pristine tree is about 200MB.  Some of the
> initial patches are 40-60MB in size.  Getting the repository isn't
> really an option at this point.  I believe the last time I tried it,
> it took on the order of 6 hours.

With a darcs release, presumably? darcs-unstable should be a lot faster.
The new hunk format (not yet implemented) should be faster still.

> > If we just want to know what files a patch affects, or what patches
> > affect a file, then we should have a separate index for this info.
> > (we have to be careful re: renames, of course).
> 
> Is anyone working on this?  Is there a bug that I should be watching?

No idea.

> > I think we'd just want our own header at the start though, so rather
> > than having
> >
> >     gzip(foo_hunks, bar_hunks)
> >
> > we would have
> >
> >     gzip(foo=0\nbar=sizeof(foo_hunks)\n, EOH, foo_hunks, bar_hunks)
> >
> > (this would also give us one of the above indices).
> > (also, we have to be careful that either all the bits for one file are
> > together or that we get them all. Probably best to try to do both).
> 
> I respect the idea of starting small.  I'm not up to speed on the
> darcs source, but I do know some haskell.  What would be the impacts
> of making this change?  How far reaching would it be?  I'm trying to
> gauge if making this change would be a reasonable way to learn the
> darcs source tree.

Hmm, commuting patches can change the size of the data. I suspect the
easiest way to do this without huge memory use will be to write the
index to a different file than the patch.
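
For comparison, here is a minimal sketch of the single-file layout quoted
above, gzip(header, EOH, hunks), written in Python rather than Haskell for
brevity. The "EOH" marker spelling, the "name=offset" line syntax and the
function names pack/unpack are assumptions made for illustration; none of
them is a decided format. It suggests why this layout is awkward when sizes
can change: every offset has to be known before the first hunk is written,
so the sketch holds all hunks in memory.

```python
import gzip
import io

# Hypothetical end-of-header marker; the thread does not fix its spelling.
EOH = b"EOH\n"


def pack(hunks_by_file):
    """Build gzip("name=offset" lines, EOH, concatenated hunks) in memory.

    hunks_by_file is a list of (file_name, hunk_bytes) pairs, one pair per
    file, so that all the bits for one file are kept together.
    """
    header_lines = []
    offset = 0
    for name, data in hunks_by_file:
        # Offsets are relative to the first byte after the EOH marker.
        header_lines.append(b"%s=%d\n" % (name, offset))
        offset += len(data)
    body = b"".join(data for _, data in hunks_by_file)
    return gzip.compress(b"".join(header_lines) + EOH + body)


def unpack(blob):
    """Return {file_name: hunk_bytes} by following the header offsets."""
    stream = io.BytesIO(gzip.decompress(blob))
    entries = []
    while True:
        line = stream.readline()
        if line == EOH:
            break
        if not line:
            raise ValueError("missing end-of-header marker")
        name, _, offset = line.rstrip(b"\n").rpartition(b"=")
        entries.append((name, int(offset)))
    body = stream.read()
    # Each hunk ends where the next one starts; the last runs to the end.
    ends = [start for _, start in entries[1:]] + [len(body)]
    return {name: body[start:end] for (name, start), end in zip(entries, ends)}
```

With two files, pack([(b"foo", ...), (b"bar", ...)]) yields a header of
"foo=0" followed by "bar=" and the byte length of foo's hunks, matching the
shape in the quoted proposal.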

Essentially, when writing a patch list, alternate between writing the
current offset/patch description to the index file and writing the patch
(or the set of contiguous patches affecting the same file). After
writing a patch see how many bytes are written (either keep track of how
many you are writing or, probably worth it for the simplicity despite the
possibly higher overhead, look at the filesize before/after writing).
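
The alternating write described above can be sketched roughly as follows,
again in Python rather than Haskell. The "offset description" index line
syntax, the function names write_with_index/read_index and the use of an
uncompressed patch file are all assumptions for illustration; the real index
format is still to be decided. The offset is taken from the file size after
each write, which is the simpler of the two options mentioned.

```python
import os
import tempfile


def write_with_index(groups, patch_path, index_path):
    """Write patches and a separate index, one group at a time.

    groups is an iterable of (description, patch_bytes) pairs, where each
    pair is a patch or a set of contiguous patches affecting the same file.
    Only one group is held in memory at once.
    """
    with open(patch_path, "wb") as patches, \
            open(index_path, "w", encoding="utf-8") as index:
        offset = 0
        for description, data in groups:
            # Index entry first: where this group starts in the patch file.
            index.write(f"{offset} {description}\n")
            patches.write(data)
            # Look at the file size after writing instead of counting bytes.
            patches.flush()
            offset = os.fstat(patches.fileno()).st_size


def read_index(index_path):
    """Return the index as a list of (offset, description) pairs."""
    entries = []
    with open(index_path, encoding="utf-8") as index:
        for line in index:
            offset, _, description = line.rstrip("\n").partition(" ")
            entries.append((int(offset), description))
    return entries
```

A reader can then seek straight to an offset from read_index in the patch
file without parsing the patches that come before it.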

I think the impacted area should be fairly small, around
writePatch/gzWritePatch in PatchShow. Rather than just using showPatch
you'll have to go inside the outer NamedP/ComP and hPutDoc the
individual patches shown with showPatch. However you'll have to get the
repoformat info there (which doesn't exist yet) so you know whether you
are meant to be writing in the old format or the new format. Also, the
index format needs to be decided (e.g. the exact syntax, how
"rename foo bar" should appear in the index).


Thanks
Ian


_______________________________________________
darcs-devel mailing list
[email protected]
http://www.abridgegame.org/cgi-bin/mailman/listinfo/darcs-devel
