On Sun, Dec 22, 2013 at 09:55:23PM +0000, Ben Maurer wrote:

> One issue with this approach is that it seems git-pack-index doesn't
> perform as well with thin packs. git-index-pack uses a multi-threaded
> approach to resolving the deltas. However, the multithreading only
> works on deltas that are exclusively in the pack. After the
> multi-threaded phase, it incrementally brings in objects from external
> packs, but in single threaded manner. Many objects in the pack have
> some dependency on an external object, therefore, defeating the
> multithreading.

Yes. It will also just plain perform worse, because it will have to
copy over more external objects. This is somewhat made up for getting an
actual smaller pack size, but I suspect the completed thin-pack ends up
larger than what the server would otherwise send. Because the server is
blindly reusing on-disk deltas (which is good, because it takes load off
of the server), it misses good delta opportunities between objects in
the sent pack (which are likely almost as small, but would not require
fixing on the other end).

Single-threading the extra work we have to do just exacerbates the
problem, of course.

Still, I think it will be a net win for end-to-end wall clock time of
the operation. You are saving CPU time on the server end, and you're
saving network bandwidth with a smaller pack.

In my tests on torvalds/linux, doing a fetch across a local machine (so
basically discounting network improvements), the times look like (this
is end-to-end, counting both server and client CPU time):

  [vanilla]
  real    0m3.850s
  user    0m7.504s
  sys     0m0.380s

  [patched]
  real    0m2.785s
  user    0m2.472s
  sys     0m0.180s

So it was a win both for wall-clock and CPU.

> What's the use case for a pack file with a SHA1 reference living
> inside the pack file (why not just use an offset?) Would it make sense
> to assume that all such files are external and bring them in in the
> first phase.

Once upon a time, ref-delta was the only format supported by packfiles. Later,
delta-base-offset was invented, and the client and server negotiate the
use of the feature before the packfile is generated (and even when we
reuse objects, pack-objects will rewrite the header on the fly to use
ref-delta if necessary).

These days, pretty much everybody supports delta-base-offset, so I don't
think there is any reason index-pack should see ref-delta for a non-thin
object. We could probably teach index-pack an "--assume-refs-are-thin"
option to optimize for this case, and have fetch-pack/receive-pack pass
it whenever they know that delta-base-offset was negotiated.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to