On 3/16/2018 4:19 PM, Jeff King wrote:
On Fri, Mar 16, 2018 at 04:06:39PM -0400, Jeff King wrote:

Furthermore, in order to look at an object it has to be zlib inflated
first, and since commit objects tend to be much smaller than trees and
especially blobs, there are a lot less bytes to inflate:

   $ grep ^commit type-size |cut -d' ' -f2 |avg
   34395730 / 53754 = 639
   $ cat type-size |cut -d' ' -f2 |avg
   3866685744 / 244723 = 15800

So a simple revision walk inflates less than 1% of the bytes that the
"enumerate objects packfiles" approach has to inflate.
I don't think this is quite accurate. It's true that we have to
_consider_ every object, but Git is smart enough not to inflate each one
to find its type. For loose objects we just inflate the header. For
packed objects, we either pick the type directly out of the packfile
header (for a non-delta) or can walk the delta chain (without actually
looking at the data bytes!) until we hit the base.
Hmm, so that's a big part of the problem with this patch series. It
actually _does_ unpack every object with --stdin-packs to get the type,
which is just silly. With the patch below, my time for "commit-graph
write --stdin-packs" on linux.git goes from over 5 minutes (I got bored
and killed it) to 17 seconds.

diff --git a/commit-graph.c b/commit-graph.c
index 6348bab82b..cf1da2e8c1 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -491,11 +491,12 @@ static int add_packed_commits(const struct object_id *oid,
  {
        struct packed_oid_list *list = (struct packed_oid_list*)data;
        enum object_type type;
-       unsigned long size;
-       void *inner_data;
        off_t offset = nth_packed_object_offset(pack, pos);
-       inner_data = unpack_entry(pack, offset, &type, &size);
-       FREE_AND_NULL(inner_data);
+       struct object_info oi = OBJECT_INFO_INIT;
+
+       oi.typep = &type;
+       if (packed_object_info(pack, offset, &oi) < 0)
+               die("unable to get type of object %s", oid_to_hex(oid));
if (type != OBJ_COMMIT)
                return 0;

-Peff

Thanks for this! Fixing this performance problem is very important to me, as we will use the "--stdin-packs" mechanism in the GVFS scenario (we will walk all commits in the prefetch packs full of commits and trees instead of relying on refs). This speedup is very valuable!

Thanks,
-Stolee

Reply via email to