Re: [PATCH] sha1_file: avoid comparison if no packed hash matches the first byte
On Wed, Aug 09, 2017 at 05:20:05AM -0400, Jeff King wrote:

> > I still wonder if we want to retire that conditional invocation of
> > sha1_entry_pos(), though.
>
> I think so. Digging in the list for it, almost every mention is either
> somebody asking "should we scrap this?" or somebody showing benchmarks
> in which it is slower than the normal lookup (and then somebody asking
> "should we scrap this" :) ).
>
> I just re-ran a simple benchmark and it is indeed slower. I also came
> across the hashcmp open-code-versus-memcmp discussion, which shows that
> the memcmp in recent glibc is much faster. That has been around long
> enough that it's probably worth switching to.

So here are two patches (on top of René's since there is otherwise a
minor textual conflict).

  [1/2]: sha1_file: drop experimental GIT_USE_LOOKUP search
  [2/2]: hashcmp: use memcmp instead of open-coded loop

 cache.h                           |   9 +-
 sha1-lookup.c                     | 216 --
 sha1_file.c                       |  11 --
 t/t5308-pack-detect-duplicates.sh |  11 +-
 t/test-lib.sh                     |   1 -
 5 files changed, 2 insertions(+), 246 deletions(-)

-Peff
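For readers who have not seen the two variants side by side, here is a minimal sketch of what "open-coded loop versus memcmp" means. It is an illustration under assumptions, not the code from cache.h: the names hashcmp_loop, hashcmp_memcmp and HASH_RAWSZ are hypothetical, and the hash size is taken to be the 20 bytes of SHA-1. Both functions return a value whose sign gives the ordering of the two hashes.

```c
#include <assert.h>
#include <string.h>

#define HASH_RAWSZ 20 /* SHA-1 size; name assumed for this sketch */

/* Open-coded variant: walk both hashes byte by byte. */
static int hashcmp_loop(const unsigned char *a, const unsigned char *b)
{
	int i;

	assert(a && b);
	for (i = 0; i < HASH_RAWSZ; i++, a++, b++) {
		if (*a != *b)
			return *a - *b; /* sign reflects the first differing byte */
	}
	return 0;
}

/* memcmp variant: let the C library do the comparison. */
static int hashcmp_memcmp(const unsigned char *a, const unsigned char *b)
{
	assert(a && b);
	return memcmp(a, b, HASH_RAWSZ);
}
```

Only the sign of the result is comparable between the two; memcmp() is free to return any negative or positive value, so callers should test for < 0, == 0 or > 0 rather than for specific numbers.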
Re: [PATCH] sha1_file: avoid comparison if no packed hash matches the first byte
On Tue, Aug 08, 2017 at 10:36:33PM -0700, Junio C Hamano wrote:

> > Actually, I take it back. The problem happens when we enter the loop
> > with no entries to look at. But both sha1_pos() and sha1_entry_pos()
> > return early before hitting their do-while loops in that case.
>
> Ah, I was not looking at that part of the code. Thanks.
>
> I still wonder if we want to retire that conditional invocation of
> sha1_entry_pos(), though.

I think so. Digging in the list for it, almost every mention is either
somebody asking "should we scrap this?" or somebody showing benchmarks
in which it is slower than the normal lookup (and then somebody asking
"should we scrap this" :) ).

I just re-ran a simple benchmark and it is indeed slower. I also came
across the hashcmp open-code-versus-memcmp discussion, which shows that
the memcmp in recent glibc is much faster. That has been around long
enough that it's probably worth switching to.

-Peff
Re: [PATCH] sha1_file: avoid comparison if no packed hash matches the first byte
Jeff King writes:

> On Tue, Aug 08, 2017 at 06:52:31PM -0400, Jeff King wrote:
>
>> > Interesting. I see that we still have the conditional code to call
>> > out to sha1-lookup.c::sha1_entry_pos(). Do we need a similar change
>> > over there, I wonder? Alternatively, as we have had the experimental
>> > sha1-lookup.c::sha1_entry_pos() long enough without anybody using it,
>> > perhaps we should write it off as a failed experiment and retire it?
>>
>> There is also sha1_pos(), which seems to have the same problem (and is
>> used in several places).
>
> Actually, I take it back. The problem happens when we enter the loop
> with no entries to look at. But both sha1_pos() and sha1_entry_pos()
> return early before hitting their do-while loops in that case.

Ah, I was not looking at that part of the code. Thanks.

I still wonder if we want to retire that conditional invocation of
sha1_entry_pos(), though.
Re: [PATCH] sha1_file: avoid comparison if no packed hash matches the first byte
On Tue, Aug 08, 2017 at 06:52:31PM -0400, Jeff King wrote:

> > Interesting. I see that we still have the conditional code to call
> > out to sha1-lookup.c::sha1_entry_pos(). Do we need a similar change
> > over there, I wonder? Alternatively, as we have had the experimental
> > sha1-lookup.c::sha1_entry_pos() long enough without anybody using it,
> > perhaps we should write it off as a failed experiment and retire it?
>
> There is also sha1_pos(), which seems to have the same problem (and is
> used in several places).

Actually, I take it back. The problem happens when we enter the loop
with no entries to look at. But both sha1_pos() and sha1_entry_pos()
return early before hitting their do-while loops in that case.

-Peff
Re: [PATCH] sha1_file: avoid comparison if no packed hash matches the first byte
On Tue, Aug 08, 2017 at 03:43:13PM -0700, Junio C Hamano wrote:

> > @@ -2812,7 +2812,7 @@ off_t find_pack_entry_one(const unsigned char *sha1,
> >  			hi = mi;
> >  		else
> >  			lo = mi+1;
> > -	} while (lo < hi);
> > +	}
> >  	return 0;
> >  }
>
> Interesting. I see that we still have the conditional code to call
> out to sha1-lookup.c::sha1_entry_pos(). Do we need a similar change
> over there, I wonder? Alternatively, as we have had the experimental
> sha1-lookup.c::sha1_entry_pos() long enough without anybody using it,
> perhaps we should write it off as a failed experiment and retire it?

There is also sha1_pos(), which seems to have the same problem (and is
used in several places).

-Peff
Re: [PATCH] sha1_file: avoid comparison if no packed hash matches the first byte
René Scharfe writes:

> find_pack_entry_one() uses the fan-out table of pack indexes to find out
> which entries match the first byte of the searched hash and does a
> binary search on this subset of the main index table.
>
> If there are no matching entries then lo and hi will have the same
> value. The binary search still starts and compares the hash of the
> following entry (which has a non-matching first byte, so won't cause any
> trouble), or whatever comes after the sorted list of entries.
>
> The probability of that stray comparison matching by mistake is low, but
> let's not take any chances and check when entering the binary search
> loop if we're actually done already.
>
> Signed-off-by: Rene Scharfe
> ---
>  sha1_file.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/sha1_file.c b/sha1_file.c
> index b60ae15f70..11ee69a99d 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -2799,7 +2799,7 @@ off_t find_pack_entry_one(const unsigned char *sha1,
>  		return nth_packed_object_offset(p, pos);
>  	}
>
> -	do {
> +	while (lo < hi) {
>  		unsigned mi = (lo + hi) / 2;
>  		int cmp = hashcmp(index + mi * stride, sha1);
>
> @@ -2812,7 +2812,7 @@ off_t find_pack_entry_one(const unsigned char *sha1,
>  			hi = mi;
>  		else
>  			lo = mi+1;
> -	} while (lo < hi);
> +	}
>  	return 0;
>  }

Interesting. I see that we still have the conditional code to call
out to sha1-lookup.c::sha1_entry_pos(). Do we need a similar change
over there, I wonder? Alternatively, as we have had the experimental
sha1-lookup.c::sha1_entry_pos() long enough without anybody using it,
perhaps we should write it off as a failed experiment and retire it?
Re: [PATCH] sha1_file: avoid comparison if no packed hash matches the first byte
René Scharfe wrote:

> find_pack_entry_one() uses the fan-out table of pack indexes to find out
> which entries match the first byte of the searched hash and does a
> binary search on this subset of the main index table.
>
> If there are no matching entries then lo and hi will have the same
> value. The binary search still starts and compares the hash of the
> following entry (which has a non-matching first byte, so won't cause any
> trouble), or whatever comes after the sorted list of entries.
>
> The probability of that stray comparison matching by mistake is low, but
> let's not take any chances and check when entering the binary search
> loop if we're actually done already.
>
> Signed-off-by: Rene Scharfe
> ---
>  sha1_file.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Thanks for a clear explanation.

Sanity checking: is this correct in the sha1[0] == 0 case? In that case,
we have lo = 0, hi = the 0th offset from the fanout table. The offsets
in the fanout table are defined as "the number of objects in the
corresponding pack, the first byte of whose object name is less than or
equal to N." So hi == lo would mean there are no objects with id
starting with 0, as hoped.

Or in other words, the [lo, hi) interval we're trying to search is
indeed a half-open interval.

Reviewed-by: Jonathan Nieder
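The half-open interval argument above can be sketched in a few lines. This is an illustrative model rather than Git's code: fanout_range is a hypothetical helper, and the fan-out table is assumed to be a plain array of 256 host-order counts (the on-disk table stores them in network byte order, so real code has to convert them first).

```c
#include <assert.h>
#include <stdint.h>

/*
 * fanout[n] holds the number of objects whose first hash byte is <= n.
 * Objects whose first byte equals first_byte therefore occupy the
 * half-open index range [lo, hi) of the sorted table.
 */
static void fanout_range(const uint32_t fanout[256], unsigned char first_byte,
			 uint32_t *lo, uint32_t *hi)
{
	*hi = fanout[first_byte];
	/* Nothing sorts before byte 0x00, so its range starts at index 0. */
	*lo = first_byte ? fanout[first_byte - 1] : 0;
	assert(*lo <= *hi); /* cumulative counts never decrease */
}
```

With a table in which no object starts with 0x00, the call for first_byte == 0 yields lo == hi == 0, which is exactly the empty range that the patched loop skips without comparing anything.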
[PATCH] sha1_file: avoid comparison if no packed hash matches the first byte
find_pack_entry_one() uses the fan-out table of pack indexes to find out
which entries match the first byte of the searched hash and does a
binary search on this subset of the main index table.

If there are no matching entries then lo and hi will have the same
value. The binary search still starts and compares the hash of the
following entry (which has a non-matching first byte, so won't cause any
trouble), or whatever comes after the sorted list of entries.

The probability of that stray comparison matching by mistake is low, but
let's not take any chances and check when entering the binary search
loop if we're actually done already.

Signed-off-by: Rene Scharfe
---
 sha1_file.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index b60ae15f70..11ee69a99d 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2799,7 +2799,7 @@ off_t find_pack_entry_one(const unsigned char *sha1,
 		return nth_packed_object_offset(p, pos);
 	}

-	do {
+	while (lo < hi) {
 		unsigned mi = (lo + hi) / 2;
 		int cmp = hashcmp(index + mi * stride, sha1);

@@ -2812,7 +2812,7 @@ off_t find_pack_entry_one(const unsigned char *sha1,
 			hi = mi;
 		else
 			lo = mi+1;
-	} while (lo < hi);
+	}
 	return 0;
 }
--
2.14.0
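To make the effect of the loop change concrete, here is a small standalone sketch. It is not the code from sha1_file.c: search_do_while, search_while and HASH_LEN are hypothetical names, the table is assumed to be a flat array of sorted 20-byte entries, and a counter records how many entries each variant compares. Both run the same binary search over the half-open range [lo, hi).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HASH_LEN 20 /* SHA-1 size; assumed for this sketch */

/* Old shape: the body runs once even when lo == hi (empty range). */
static int search_do_while(const unsigned char *table, unsigned lo, unsigned hi,
			   const unsigned char *want, unsigned *compares)
{
	assert(lo <= hi);
	*compares = 0;
	do {
		unsigned mi = (lo + hi) / 2;
		int cmp = memcmp(table + (size_t)mi * HASH_LEN, want, HASH_LEN);

		(*compares)++;
		if (!cmp)
			return (int)mi;
		if (cmp > 0)
			hi = mi;
		else
			lo = mi + 1;
	} while (lo < hi);
	return -1; /* not found */
}

/* New shape: the range is checked before the first comparison. */
static int search_while(const unsigned char *table, unsigned lo, unsigned hi,
			const unsigned char *want, unsigned *compares)
{
	assert(lo <= hi);
	*compares = 0;
	while (lo < hi) {
		unsigned mi = (lo + hi) / 2;
		int cmp = memcmp(table + (size_t)mi * HASH_LEN, want, HASH_LEN);

		(*compares)++;
		if (!cmp)
			return (int)mi;
		if (cmp > 0)
			hi = mi;
		else
			lo = mi + 1;
	}
	return -1; /* not found */
}
```

Called with lo == hi, search_do_while() still reads and compares the entry at index lo, which lies outside the requested range, while search_while() returns without touching the table. For non-empty ranges the two behave identically.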