I know I still have a lot of holes to plug, but this was more
interesting because we could see some encouraging numbers.
Unfortunately the result is disappointing. Maybe I did it in a stupid
way and need to start over with a totally different approach.

"rev-list --objects" on v2 takes 4 secs, v4 with current walker 11s
and the new walker 16s (the worst!). perf's top functions with v2 are:

 23,51%  git  libz.so.1.2.7       [.] inflate
 16,66%  git  git                 [.] lookup_object
 11,46%  git  libz.so.1.2.7       [.] inflate_fast
  6,89%  git  libc-2.16.so        [.] __memcpy_ssse3_back
  4,19%  git  libz.so.1.2.7       [.] inflate_table
  4,15%  git  git                 [.] find_pack_entry_one
  3,84%  git  git                 [.] decode_tree_entry

and with new walker

 58,61%  git  git                [.] decode_entries
 18,66%  git  git                [.] decode_varint
  9,73%  git  git                [.] use_pack
  3,31%  git  git                [.] nth_packed_object_offset
  1,73%  git  git                [.] process_tree
  1,66%  git  git                [.] pv4_lookup_blob
  1,09%  git  git                [.] get_pathref
  1,03%  git  libc-2.16.so       [.] __memcpy_ssse3_back
  0,90%  git  libz.so.1.2.7      [.] inflate
  0,50%  git  libz.so.1.2.7      [.] inflate_table

It's no surprise that lookup_object is no longer hot. The closest
equivalent is pv4_lookup_blob. nth_packed_object_offset is getting
hotter as it's used extensively by decode_entries.

And decode_entries is getting far too hot. This function is now called
for each tree entry of every tree, and it does a get_tree_offset_cache()
lookup on every call (ironic, given how hard we try to avoid the hash
lookup in lookup_object).

The only bit I haven't done is checking whether a tree has already been
examined and, if so, not bothering with copy sequences that refer to it.
That should cut down the number of decode_entries calls, but I'm not
sure by how much because there's no relation between the tree traversal
order and how copy sequences are made.
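
To make the idea concrete, here is a rough toy model of skipping copy
sequences whose source tree has already been walked. Everything in it
(struct toy_op, struct toy_tree, toy_walk_tree) is made up for
illustration and is not the real packv4-parse.c code; it also assumes
that a tree marked "seen" had all of its entries reported.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a v4 tree: literal entries plus copy sequences. */
enum toy_op_type { TOY_LITERAL, TOY_COPY };

struct toy_op {
	enum toy_op_type type;
	int src_tree;   /* TOY_COPY: index of the tree copied from, else -1 */
	unsigned count; /* number of entries this op expands to */
};

struct toy_tree {
	const struct toy_op *ops;
	size_t nr_ops;
	int seen;       /* set once the tree has been fully walked */
};

/*
 * Walk tree "idx" and return how many entries had to be decoded.
 * A copy sequence is skipped when its source tree was already walked,
 * on the assumption that those entries were reported back then.
 */
static unsigned toy_walk_tree(struct toy_tree *trees, int idx)
{
	struct toy_tree *t = &trees[idx];
	unsigned decoded = 0;
	size_t i;

	assert(t->ops != NULL || t->nr_ops == 0);
	for (i = 0; i < t->nr_ops; i++) {
		const struct toy_op *op = &t->ops[i];

		if (op->type == TOY_COPY && trees[op->src_tree].seen)
			continue; /* nothing new can come out of this copy */
		decoded += op->count;
	}
	t->seen = 1;
	return decoded;
}
```

In this model the saving depends entirely on walking the copy source
before the tree that copies from it. If the order is reversed, nothing
is skipped, which is the reason I can't predict the gain.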

Maybe we could make an exception and allow the tree walker to pass
pv4_tree_cache* directly to decode_entries so it does not need to do
the first lookup every time.
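
Something along these lines, as a minimal sketch only. The names
(struct toy_tree_cache, toy_get_cache, toy_decode_entry, toy_walk) are
hypothetical stand-ins and not the actual pv4_tree_cache or
decode_entries interfaces; the lookup counter exists purely to show the
effect.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-tree cache entry keyed by pack offset. */
struct toy_tree_cache {
	size_t offset;       /* pack offset of the tree this slot describes */
	unsigned nr_decoded; /* entries decoded through this slot so far */
};

#define TOY_CACHE_SIZE 64
static struct toy_tree_cache toy_cache[TOY_CACHE_SIZE];
static unsigned long toy_lookups; /* counts hash lookups, for illustration */

/* Stand-in for the hash lookup that is currently done on every call. */
static struct toy_tree_cache *toy_get_cache(size_t offset)
{
	struct toy_tree_cache *c = &toy_cache[offset % TOY_CACHE_SIZE];

	toy_lookups++;
	if (c->offset != offset) { /* slot belongs to another tree: reuse it */
		c->offset = offset;
		c->nr_decoded = 0;
	}
	return c;
}

/*
 * Decode one entry of the tree at "offset". If the caller passes a
 * cache pointer that matches the offset, the lookup is skipped.
 */
static struct toy_tree_cache *toy_decode_entry(size_t offset,
					       struct toy_tree_cache *hint)
{
	struct toy_tree_cache *c =
		(hint && hint->offset == offset) ? hint : toy_get_cache(offset);

	assert(c != NULL);
	c->nr_decoded++;
	return c;
}

/* Walk "nr" entries of one tree and return how many lookups it cost. */
static unsigned long toy_walk(size_t offset, unsigned nr, int use_hint)
{
	struct toy_tree_cache *hint = NULL;
	unsigned long before = toy_lookups;
	unsigned i;

	for (i = 0; i < nr; i++) {
		struct toy_tree_cache *c = toy_decode_entry(offset, hint);

		if (use_hint)
			hint = c; /* the walker keeps the pointer between calls */
	}
	return toy_lookups - before;
}
```

With the hint, a tree of N entries costs one lookup instead of N in this
toy. The real function would also need to deal with the cache entry
being evicted while the walker still holds the pointer, which the sketch
ignores.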

Suggestions?

Nguyễn Thái Ngọc Duy (9):
  sha1_file: provide real packed type in object_info_extended
  pack v4: move v2 tree entry generation code out of decode_entries
  pv4_tree_desc: introduce new struct for pack v4 tree walker
  pv4_tree_desc: use struct tree_desc from pv4_tree_desc
  pv4_tree_desc: allow decode_entries to return v4 trees, one at a time
  pv4_tree_desc: complete interface
  pv4_tree_desc: don't bother looking for v4 trees if no v4 packs are present
  pv4_tree_desc: avoid lookup_object() when possible
  list-object.c: take "advantage" of new pv4_tree_desc interface

 cache.h        |   3 +-
 list-objects.c |  38 +++++----
 packv4-parse.c | 263 ++++++++++++++++++++++++++++++++++++++++++++++-----------
 packv4-parse.h |  48 +++++++++++
 sha1_file.c    |   9 +-
 streaming.c    |   9 +-
 6 files changed, 300 insertions(+), 70 deletions(-)

-- 
1.8.2.83.gc99314b
