On 8/28/2018 3:25 PM, Duy Nguyen wrote:
On Mon, Aug 27, 2018 at 9:36 PM Junio C Hamano <gits...@pobox.com> wrote:
PS. I notice that v4 does not pad to align entries at 4 byte boundary
like v2/v3. This could cause a slight slow down on x86 and segfault on
some other platforms.

Care to elaborate?

Long time ago, we used to mmap and read directly from the index file
contents, requiring either an unaligned read or padded entries.  But
that was eons ago and we first read and convert from on-disk using
get_be32() etc. to in-core structure, so I am not sure what you mean
by "segfault" here.


My bad. I saw this line

#define get_be16(p) ntohs(*(unsigned short *)(p))

and jumped to conclusion without realizing that block is for safe
unaligned access.

@@ -1898,7 +1884,8 @@ int do_read_index(struct index_state *istate, const char 
*path, int must_exist)
       struct cache_header *hdr;
       void *mmap;
       size_t mmap_size;
-     struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;
+     const struct cache_entry *previous_ce = NULL;
+     struct cache_entry *dummy_entry = NULL;

       if (istate->initialized)
               return istate->cache_nr;
@@ -1936,11 +1923,10 @@ int do_read_index(struct index_state *istate, const 
char *path, int must_exist)
       istate->initialized = 1;

       if (istate->version == 4) {
-             previous_name = &previous_name_buf;
+             previous_ce = dummy_entry = make_empty_transient_cache_entry(0);

I do like the idea of passing the previous ce around to tell the
next one what the previous name was, but I would have preferred to
see this done a bit more cleanly without requiring us to support "a
dummy entry with name whose length is 0"; a real cache entry never
has zero-length name, and our code may want to enforce it as a
sanity check.

I think we can just call create_from_disk() with NULL set to
previous_ce in the first round; of course, the logic to assign the
one we just created to previous_ce must check istate->version,
instead of "is previous_ce NULL?" (which is an indirect way to check
the same thing used in this patch).

Yeah I kinda hated dummy_entry too but the feeling wasn't strong
enough to move towards the index->version check. I guess I'm going to
do it now.


I ran some perf tests using p0002-read-cache.sh to compare V4 performance before and after this patch so I could get a feel for how much it helps.

100,000 files

Test                                  HEAD~1   HEAD
------------------------------------------------------------
read_cache/discard_cache 1000 times    14.12    10.75 -23.9%

1,000,000 files

Test                                  HEAD~1   HEAD
------------------------------------------------------------
read_cache/discard_cache 1000 times   202.81   170.33 -16.0%


This provides a nice speedup and IMO simplifies the code as well. Nicely done.

Reply via email to