Hi,

On Sun, Sep 16, 2018 at 11:17:27AM -0700, John Austin wrote:
> Taylor Blau wrote:

>> Right, though this still subjects the remote copy to all of the
>> difficulty of packing large objects (though Christian's work to support
>> other object database implementations would go a long way to help this).
>
> Ah, interesting -- I didn't realize this step was part of the
> bottleneck. I presumed git didn't do much more than perhaps gzip'ing
> binary files when it packed them up. Or do you mean the growing cost
> of storing the objects locally as you work? Perhaps that could be
> solved by allowing the client more control (ie. delete the oldest
> blobs that exist on the server).

John, I believe you are correct.  Taylor, can you elaborate about what
packing overhead you are referring to?

One thing I would like to see in the long run to help Git cope with
very large files is adding something similar to bup's "bupsplit" to
the packfile format (or even better, to the actual object format, so
that it affects object names).  In other words, using a rolling hash
to decide where to split a blob and using a tree-like structure so
that (1) common portions between files can be deduplicated and
(2) portions can be hashed in parallel.  I haven't heard of these
things being the bottleneck for anyone in practice today, though.
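
To illustrate the idea (this is not bup's actual rollsum; the window
size, the hash, and the split mask below are just illustrative
choices), a rough sketch of content-defined chunking in Python might
look like this:

  WINDOW = 64            # rolling window size in bytes (illustrative)
  SPLIT_MASK = 0x1FFF    # cut when low 13 bits are all ones -> ~8 KiB average chunks
  MOD = 1 << 32
  POW_OUT = pow(31, WINDOW, MOD)   # weight of the byte leaving the window

  def chunk_boundaries(data: bytes):
      """Yield (offset, length) for content-defined chunks of `data`."""
      h = 0
      start = 0
      for i, b in enumerate(data):
          # Polynomial rolling hash over the last WINDOW bytes.
          h = (h * 31 + b) % MOD
          if i >= WINDOW:
              h = (h - data[i - WINDOW] * POW_OUT) % MOD
          # Split whenever the low bits of the hash are all ones.
          if (h & SPLIT_MASK) == SPLIT_MASK:
              yield start, i + 1 - start
              start = i + 1
      if start < len(data):
          yield start, len(data) - start

Because the cut points depend only on the bytes in the local window,
an insertion or deletion in the middle of a large file only perturbs
the boundaries near the edit; most chunks, and hence the hashes
computed over them, stay the same, which is what makes the
deduplication and parallel hashing possible.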

Thanks,
Jonathan
