Al Viro <v...@zeniv.linux.org.uk> writes:

> FWIW, I wasn't proposing to recreate the remaining bits of that _pack_;
> just do the normal pull with one addition: start with sending the list
> of sha1 of objects you are about to send and let the recepient reply
> with "I already have <set of sha1>, don't bother with those". And exclude
> those from the transfer.
I did a quick-and-dirty unscientific experiment.  I had a clone of
Linus's repository that was about a week old, whose tip was at
4de8ebef (Merge tag 'trace-fixes-v4.5-rc5' of
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace,
2016-02-22).  To bring it up to date (i.e. to pull about a week's
worth of progress) to f691b77b (Merge branch 'for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs, 2016-03-01):

    $ git rev-list --objects 4de8ebef..f691b77b1fc | wc -l
    1396
    $ git rev-parse 4de8ebef..f691b77b1fc |
      git pack-objects --revs --delta-base-offset --stdout | wc -c
    2444127

So in order to salvage some transfer out of 2.4MB, the hypothetical
Al protocol would first have upload-pack give 20*1396 = 28kB worth
of object names to fetch-pack; no matter how fetch-pack encodes its
preference, its answer would be less than 28kB.  We would likely
design this part of the new protocol in line with the existing part
and use textual object names, so let's round that up to 100kB.  That
is quite small; even on a crappy connection that makes you retry 5
times, the additional overhead to negotiate the list of objects
alone would be 0.5MB, or about 20% of the real transfer.  That is
quite interesting [*1*].

For the approach to be practical, you would have to write a program
that reads from a truncated packfile and writes a new packfile,
excising deltas that lack their bases, in order to salvage objects
from a half-transferred packfile; it is however unclear how involved
that code would get.  It is probably OK for a tiny pack that has
only 1400 objects--we could just pass the early part through
unpack-objects and let it die when it hits EOF--but for a "resumable
clone", I do not think you can afford to unpack the 4.6M objects in
the kernel repository into loose objects.

The approach of course requires the server end to spend 5 times as
many cycles as usual in order to help a client that retries 5 times.
On the other hand, the resumable "clone" we were discussing, which
lets the server respond with a slightly older bundle or pack and
then asks the client to fill in the latest bits with a follow-up
fetch, aims to reduce the load on the server side (the "slightly
older" part can be offloaded to a CDN).  It is a happy side effect
that material offloaded to a CDN can be more easily obtained via
HTTPS, which is trivially resumable ;-)

I think your "I've got these already" extension may be worth trying,
and it is definitely better than "let's make sure the server end
creates a byte-for-byte identical pack stream, and discard the early
part without sending it to the network"; it may help resuming a
small incremental fetch.  But I do not think it is advisable to use
it for a full clone, given that it is very likely that we would be
adding the "offload 'clone' to a CDN" kind of resumability.  Even
though I can foresee both kinds co-existing, I do not think it is
practical to offer this one for resuming a multi-hour cloning of the
kernel repository (or worse, the Android repositories) over a
trans-Pacific link, for example.

[Footnote]

*1* Updating v4.5-rc1 to today's HEAD involves 10809 objects, and
    the pack data takes 14955728 bytes.  That translates to ~440kB
    of textual object names advertised in order to salvage an object
    transfer of 15MB.
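
P.S. To make the "pass the early part through unpack-objects" step
above concrete, here is a rough sketch using existing plumbing (the
file name is made up); this is exactly the part that does not scale
past a tiny incremental pack, because everything that is salvaged
ends up as loose objects:

    $ # Feed the half-transferred pack to unpack-objects; every object
    $ # that arrived in full and whose delta base is available is
    $ # written out loose, and -r asks it to keep going past the damage
    $ # at the truncation point instead of dying at the first error.
    $ git unpack-objects -r < partial-fetch.pack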