On 05/07/2013 09:12 AM, Junio C Hamano wrote:
> Michael Haggerty <mhag...@alum.mit.edu> writes:
> 
>>>>>> CVS stores all of the revisions of a single file in a single filename,v
>>>>>> file in rcsfile(5) format.  The revisions are stored as deltas ordered
>>>>>> so that a single revision can be reconstructed from a single serial read
>>>>>> of the file.
>>>>>>
>>>>>> cvs2git reads each of these files once, reconstructing *all* of the
>>>>>> revisions for a file in a single go.  It then pours them into a
>>>>>> git-fast-import stream as blobs and sets a mark on each blob.
> 
> This is more or less off-topic but in the bigger picture it is more
> interesting and important X-<.
> 
> The way you describe how cvs2git handles the blobs is the more
> efficient way, given that fast-import does not even attempt to
> bother to create good deltas. The only thing it does is to see if
> the current data deltifies against the last object.
> 
> IIRC, CVS's backend storage is mostly recorded in backward delta, so
> if you are feeding the blob data from new to old, then the resulting
> pack would follow Linus's law (the file generally grows over time)
> and would generally give you a good deltified chain of objects.

Yes, you are correct about how CVS orders revisions on the mainline.
Branches are stored in the opposite order (oldest to newest), but CVS
users don't tend to get carried away with branches, and if the
changes are small, deltification should still help a lot.
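
To make the trunk ordering concrete, here is a rough sketch of walking
a chain of backward deltas from the newest revision to the oldest. It
assumes a simplified model of the rcsfile(5) edit script ("dN M"
deletes M lines starting at line N, "aN M" adds the next M lines after
line N, with line numbers referring to the text being edited). The
names apply_delta and reconstruct_trunk are made up for illustration;
this is not how cvs2git is actually implemented, and it ignores
@-quoting, keywords, and branches.

```python
def apply_delta(base, delta):
    """Apply a simplified RCS-style edit script to a list of lines."""
    out = []
    pos = 0  # number of lines of `base` consumed so far
    i = 0
    while i < len(delta):
        cmd = delta[i]
        start, count = (int(x) for x in cmd[1:].split())
        i += 1
        if cmd[0] == "d":
            # Copy untouched lines before the deletion, then skip `count` lines.
            out.extend(base[pos:start - 1])
            pos = max(pos, start - 1 + count)
        elif cmd[0] == "a":
            # Copy through line `start`, then append the `count` new lines.
            out.extend(base[pos:start])
            pos = max(pos, start)
            out.extend(delta[i:i + count])
            i += count
        else:
            raise ValueError("unknown edit command: %r" % cmd)
    out.extend(base[pos:])  # copy whatever is left of the base text
    return out


def reconstruct_trunk(head, backward_deltas):
    """Yield trunk revisions newest first, starting from the full head text."""
    text = head
    yield text
    for delta in backward_deltas:
        text = apply_delta(text, delta)
        yield text


# Hypothetical three-revision file: the head is stored in full, and each
# older revision is stored as a delta against the next newer one.
head = ["alpha", "beta", "gamma"]
deltas = [
    ["d2 2", "a3 1", "BETA"],  # newest -> middle revision
    ["d1 1"],                  # middle -> oldest revision
]
trunk = list(reconstruct_trunk(head, deltas))
```

Each step only needs the text of the next newer revision, which is why
a single pass over the file yields the trunk in new-to-old order.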

Cool.  I knew that fast-import didn't do much in the way of compression,
but I didn't realize that it can compute deltas only between adjacent
blobs.  So cvs2git happily hits the sweet spot of fast-import.
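
For illustration, a minimal sketch of what pouring one file's
revisions into a fast-import stream as marked blobs, newest first,
might look like. The "blob" / "mark :N" / "data <count>" framing is
the documented fast-import input format; the helper names and the
sample revisions are invented here and are not taken from cvs2git.

```python
import io


def write_blob(out, mark, data):
    """Write one fast-import 'blob' command, with a mark, to a binary stream."""
    out.write(b"blob\n")
    out.write(b"mark :%d\n" % mark)
    out.write(b"data %d\n" % len(data))  # exact byte count of the payload
    out.write(data)
    out.write(b"\n")  # optional LF that fast-import accepts after the data


def emit_revisions(out, revisions_newest_first):
    """Emit each (revision, content) pair as a blob; return revision -> mark."""
    marks = {}
    for mark, (rev, data) in enumerate(revisions_newest_first, start=1):
        write_blob(out, mark, data)
        marks[rev] = mark
    return marks


# Hypothetical revisions of a single file, fed from new to old so that each
# blob is adjacent in the stream to the revision it most resembles.
revisions = [
    ("1.3", b"a\nb\nc\n"),
    ("1.2", b"a\nb\n"),
    ("1.1", b"a\n"),
]
buf = io.BytesIO()
marks = emit_revisions(buf, revisions)
stream = buf.getvalue()
```

Later commit commands in the same stream can then refer to a blob by
its mark (":1", ":2", ...) instead of by its object name.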

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/