Jeff King wrote:

> It depends on what each side has, doesn't it? We generally try to
> reuse on-disk deltas when we can, since they require no computation. If
> I have object A delta'd against B, and I know that the other side wants
> A and has B (or I am also sending B), I can simply send what I have on
> disk. So we do not just blit out the existing pack as-is, but we may
> reuse portions of it as appropriate.
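
To check my reading of the reuse rule quoted above, here is a minimal
sketch of the decision as described. It is a toy model, not git's code:
StoredObject, can_reuse_delta, receiver_has and also_sending are all
names made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class StoredObject:
    """Hypothetical view of how one object is stored on disk."""
    oid: str
    delta_base: Optional[str] = None  # None means the object is stored whole


def can_reuse_delta(obj: StoredObject,
                    receiver_has: Set[str],
                    also_sending: Set[str]) -> bool:
    """Return True if the on-disk delta for obj could be sent as-is.

    Follows the quoted rule: the delta is usable when the other side
    already has the base, or when the base is part of what we send.
    """
    if obj.delta_base is None:
        return False  # nothing to reuse; the object is not a delta on disk
    return obj.delta_base in receiver_has or obj.delta_base in also_sending


# Object A is stored as a delta against B.
a = StoredObject("A", delta_base="B")
print(can_reuse_delta(a, receiver_has={"B"}, also_sending=set()))
print(can_reuse_delta(a, receiver_has=set(), also_sending=set()))
```

If neither condition holds, the sender would have to fall back to
computing a new delta or sending the object whole.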
I'll raise some (hopefully interesting) points. Let's take the example
of a simple push: I start send-pack, which in turn starts receive-pack
on the server and connects its stdin/stdout to it (using git_connect).
Now, send-pack reads the (sha1, ref) pairs it receives on stdin and
spawns pack-objects --stdout with the right arguments as the response,
right? Overall, nothing special: just pack-objects invoked with
specific arguments.

How does pack-objects work? ll_find_deltas() spawns some worker threads
to run find_deltas(). This is where it gets fuzzy for me: it calls
try_delta() in a nested loop, trying to find the smallest delta, right?
I'm not sure whether the interfaces it uses to read objects
differentiate between packed deltas, packed undeltified objects, and
loose objects at all.

Now, the main problem I see is that a pack has to be self-contained: I
can't have an object "bar" which is a delta against an object that is
not present in the pack, right? Therefore, no matter what the server
already has, I have to prepare deltas only against the data that I'm
sending across.

> Of course we may have to reconstruct deltas for trees in order to find
> the correct set of objects (i.e., the history traversal). But that is a
> separate phase from generating the pack's object content, and we do not
> reuse any of the traversal work in later phases.

I see. Are we talking about tree-walk.c here? This is not unique to
packing at all; we need to do this kind of traversal for any git
operation that digs into older history, no? I recall a discussion about
using generation numbers to speed up the walk: I tried playing with
your series (where you use a cache to keep the generation numbers), but
got nowhere. Does it make sense to think about speeding up the walk
(and if so, how)?
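
For reference, this is how I understand the generation-number idea, as a
minimal sketch. A commit's generation is one more than the largest
generation among its parents, so every ancestor has a strictly smaller
number and a walk can stop descending early. The graph representation
and the names generations and is_ancestor are assumptions for
illustration only; they have nothing to do with the actual series.

```python
from typing import Dict, List, Tuple

# Toy history: commit id -> list of parent ids (assumed acyclic).
Graph = Dict[str, List[str]]


def generations(parents: Graph) -> Dict[str, int]:
    """Compute generation numbers: roots get 1, others 1 + max over parents."""
    gen: Dict[str, int] = {}
    for start in parents:
        stack = [start]
        while stack:
            c = stack[-1]
            if c in gen:
                stack.pop()
                continue
            missing = [p for p in parents[c] if p not in gen]
            if missing:
                stack.extend(missing)  # resolve the parents first
            else:
                gen[c] = 1 + max((gen[p] for p in parents[c]), default=0)
                stack.pop()
    return gen


def is_ancestor(parents: Graph, gen: Dict[str, int],
                ancestor: str, descendant: str) -> Tuple[bool, int]:
    """Return (answer, number of commits visited) for a pruned walk."""
    seen = set()
    stack = [descendant]
    visited = 0
    while stack:
        c = stack.pop()
        if c in seen:
            continue
        seen.add(c)
        visited += 1
        if c == ancestor:
            return True, visited
        # Every ancestor of c has a smaller generation than c, so if c is
        # already at or below the target's generation, stop descending.
        if gen[c] <= gen[ancestor]:
            continue
        stack.extend(parents[c])
    return False, visited


# Mainline a <- b <- c <- d, plus a side branch a <- x <- y.
history: Graph = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"],
                  "x": ["a"], "y": ["x"]}
gen = generations(history)
print(is_ancestor(history, gen, "y", "d"))
print(is_ancestor(history, gen, "a", "d"))
```

In the first query the walk gives up at "c" instead of going all the way
down to the root, which is the whole point of the cutoff; a positive
answer still has to walk the path to the target.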