On Wed, Jun 01, 2005 at 03:39:20PM -0700, Jason Dagit wrote:
> Ian Lynagh <[EMAIL PROTECTED]> writes:
>
> > On Thu, May 12, 2005 at 12:22:34PM -0700, Jason Dagit wrote:
> >> My proposal is to use a format similar to ar (or we could maybe use ar
> >> as is).
> >> [snip description of ar idea]
> >
> > When applying patches etc this isn't a problem.
["this" refers to indexing of patch files; it's not a problem because we're going to apply the whole lot anyway]

> Fortunately, applying patches remains fast for small patches. I have a
> repository where the pristine tree is about 200MB. Some of the
> initial patches are 40-60MB in size. Getting the repository isn't
> really an option at this point. I believe the last time I tried it,
> it took on the order of 6 hours.

With a darcs release, presumably? darcs-unstable should be a lot faster. The new hunk format (not yet implemented) should be faster still.

> > If we just want to know what files a patch affects, or what patches
> > affect a file, then we should have a separate index for this info
> > (we have to be careful re: renames, of course).
>
> Is anyone working on this? Is there a bug that I should be watching?

No idea.

> > I think we'd just want our own header at the start though, so rather
> > than having
> >
> > gzip(foo_hunks, bar_hunks)
> >
> > we would have
> >
> > gzip(foo=0\nbar=sizeof(foo_hunks)\n, EOH, foo_hunks, bar_hunks)
> >
> > (this would also give us one of the above indices).
> > (Also, we have to be careful that either all the bits for one file are
> > together or that we get them all. Probably best to try to do both.)
>
> I respect the idea of starting small. I'm not up to speed on the
> darcs source, but I do know some Haskell. What would be the impact
> of making this change? How far-reaching would it be? I'm trying to
> gauge whether making this change would be a reasonable way to learn the
> darcs source tree.

Hmm, commuting patches can change the size of the data. I suspect the easiest way to do this without huge memory use will be to write the index to a separate file from the patch. Essentially, when writing a patch list, alternate between writing the current offset/patch description to the index file and writing the patch (or the set of contiguous patches affecting the same file).
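A minimal Haskell sketch of the alternating write described above, under assumptions that are not darcs code. `Hunk` is a hypothetical stand-in for a primitive patch that has already been rendered to text (in darcs that text would come from showPatch). The patch file is written uncompressed rather than gzipped, and the text is assumed to be ASCII so one Char is one byte. The `<file> <offset> <length>` index line is a placeholder syntax that ignores renames. `writeWithIndex`, `writePatchFiles` and `readFileHunks` are illustrative names only. The sketch counts bytes as it writes instead of checking the file size.

```haskell
import Control.Monad (foldM, replicateM)
import System.IO

-- Hypothetical stand-in for a primitive patch that has already been
-- rendered to text (in darcs this text would come from showPatch).
data Hunk = Hunk { hunkFile :: FilePath, hunkText :: String }

-- One index record: which file, where its hunks start in the patch file,
-- and how many bytes they occupy.
data IndexEntry = IndexEntry
  { entryFile   :: FilePath
  , entryOffset :: Int
  , entryLength :: Int
  } deriving (Show, Eq)

-- Merge contiguous hunks on the same file into one chunk, so that the
-- bits for one file sit together under a single index entry.
groupByFile :: [Hunk] -> [(FilePath, String)]
groupByFile [] = []
groupByFile (h : hs) =
  (hunkFile h, concatMap hunkText (h : same)) : groupByFile rest
  where
    (same, rest) = span ((== hunkFile h) . hunkFile) hs

-- Alternate between writing an index line and writing the chunk it
-- describes. Bytes are counted as we go instead of asking for the file
-- size; this assumes ASCII text, so one Char is written as one byte.
writeWithIndex :: Handle -> Handle -> [Hunk] -> IO [IndexEntry]
writeWithIndex patchH indexH hunks =
  reverse . snd <$> foldM step (0, []) (groupByFile hunks)
  where
    step (offset, acc) (file, body) = do
      let len = length body
      -- Placeholder index syntax: "<file> <offset> <length>"; renames ignored.
      hPutStrLn indexH (file ++ " " ++ show offset ++ " " ++ show len)
      hPutStr patchH body
      return (offset + len, IndexEntry file offset len : acc)

-- Write a patch file and its separate index file in one pass.
writePatchFiles :: FilePath -> FilePath -> [Hunk] -> IO [IndexEntry]
writePatchFiles patchPath indexPath hunks =
  withBinaryFile patchPath WriteMode $ \patchH ->
    withFile indexPath WriteMode $ \indexH ->
      writeWithIndex patchH indexH hunks

-- Use an index entry to read back only one file's hunks: seek to the
-- recorded offset and read exactly the recorded number of bytes.
readFileHunks :: FilePath -> IndexEntry -> IO String
readFileHunks patchPath entry =
  withBinaryFile patchPath ReadMode $ \h -> do
    hSeek h AbsoluteSeek (fromIntegral (entryOffset entry))
    replicateM (entryLength entry) (hGetChar h)
```

A file whose hunks land in non-contiguous groups simply gets several entries. Looking it up means collecting all of them, e.g. `filter ((== "bar") . entryFile) idx`, which covers the "get them all" half of the caveat quoted above. The direct seek in `readFileHunks` only works because this sketch's patch file is uncompressed. In a gzipped patch the offsets would refer to the decompressed stream, so reaching an entry would still mean decompressing up to it.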
After writing a patch, see how many bytes were written (either keep track of how many you are writing or, probably worth it for the simplicity despite the possibly higher overhead, look at the file size before and after writing). I think the impacted area should be fairly small, around writePatch/gzWritePatch in PatchShow. Rather than just using showPatch, you'll have to go inside the outer NamedP/ComP and hPutDoc the individual patches shown with showPatch. However, you'll have to get the repoformat info there (which doesn't exist yet) so that you know whether you should be writing the old format or the new one.

Also, the index format needs to be decided (e.g. the exact syntax, and how "rename foo bar" should appear in the index).

Thanks
Ian

_______________________________________________
darcs-devel mailing list
[email protected]
http://www.abridgegame.org/cgi-bin/mailman/listinfo/darcs-devel
