Re: [ANNOUNCE] Git v2.19.0-rc0
On Thu, 2018-08-23 at 06:26 -0400, Derrick Stolee wrote:
> Around the time that my proposed approaches were getting vetoed for alignment issues, I figured I was out of my depth here. I reached out to Daniel Lemire (of EWAH bitmap fame) on Twitter [1]. His blog is full of posts of word-based approaches to different problems, so I thought he might know something off the top of his head that would be applicable. His conclusion (after looking only a short time) was to take a 'hasheq' approach [2] like Peff suggested [3]. Since that requires auditing all callers of hashcmp to see if hasheq is appropriate, it is not a good solution for 2.19 but (in my opinion) should be evaluated as part of the 2.20 cycle.

That was an interesting blog post, indeed. It had an interesting comments section. One comment that especially caught my eye was [a]:

    "So the way gcc (and maybe clang) handles this is specifically by recognizing memcmp and checking whether only a 2-way result is needed and then essentially replacing it with a memcmp_eq call. ..."

I find this to be an interesting note. It seems GCC does optimize when we clearly indicate that we use the result of the memcmp as a boolean. So would that help in any way? Maybe it would help in writing a `hasheq` method easily? I'm not sure.

> [1] https://twitter.com/stolee/status/1032312965754748930
>
> [2] https://lemire.me/blog/2018/08/22/avoid-lexicographical-comparisons-when-testing-for-string-equality/
>
> [3] https://public-inbox.org/git/20180822030344.ga14...@sigill.intra.peff.net/
>
> [4] https://public-inbox.org/git/7ea416cf-b043-1274-e161-85a8780b8...@gmail.com/

[a]: https://lemire.me/blog/2018/08/22/avoid-lexicographical-comparisons-when-testing-for-string-equality/#comment-344073

--
Sivaraam
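The comment's observation is easy to demonstrate in isolation. Below is a minimal sketch (mine, not from the thread; HASH_RAWSZ and both function names are illustrative) contrasting a 2-way use of memcmp() with a 3-way one:

#include <string.h>

#define HASH_RAWSZ 20

/*
 * Only equality is observed, so gcc/clang may replace the call with
 * a few inline word comparisons (the memcmp_eq lowering the comment
 * describes).
 */
static inline int hashes_equal(const unsigned char *a,
			       const unsigned char *b)
{
	return !memcmp(a, b, HASH_RAWSZ);
}

/*
 * The sign of the result is observed, so the full lexicographic
 * comparison (and typically the library call) must be kept.
 */
static inline int hashes_order(const unsigned char *a,
			       const unsigned char *b)
{
	return memcmp(a, b, HASH_RAWSZ);
}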
Re: [ANNOUNCE] Git v2.19.0-rc0
Jeff King writes:

> Actually, what I showed earlier does seem to have some weirdness with else-if. But I finally stumbled on something even better:
>
>   - oidcmp(a, b) != 0
>   + !oideq(a, b)
>
> Because of the isomorphisms that coccinelle knows about, that catches everything we want.

Nice ;-)
Re: [ANNOUNCE] Git v2.19.0-rc0
On Fri, Aug 24, 2018 at 12:45:10PM -0400, Derrick Stolee wrote:

> Here are the numbers for Linux:
>
> |      | v2.18.0  | v2.19.0-rc0 | HEAD   |
> |------|----------|-------------|--------|
> |      | 86.5     | 70.739      | 57.266 |
> |      | 60.582   | 101.928     | 56.641 |
> |      | 58.964   | 60.139      | 60.258 |
> |      | 59.47    | 61.141      | 58.213 |
> |      | 62.554   | 60.73       | 84.54  |
> |      | 59.139   | 85.424      | 57.745 |
> |      | 58.487   | 59.31       | 59.979 |
> |      | 58.653   | 69.845      | 60.181 |
> |      | 58.085   | 102.777     | 61.455 |
> |      | 58.304   | 60.459      | 62.551 |
> | Max  | 86.5     | 102.777     | 84.54  |
> | Min  | 58.085   | 59.31       | 56.641 |
> | Avg* | 59.51913 | 71.30063    | 59.706 |
> | Med  | 59.0515  | 65.493      | 60.08  |

Thanks. The median ones are the most interesting, I think (and show a very satisfying recovery from my patch).

I'm surprised at the variance, especially in your v2.19 runs. It makes me distrust the mean (and implies to me we could do better by throwing away outliers based on value, not just the single high/low; or just using the median, which also solves that). The variance in the v2.18 column is what I'd expect based on past experience (slow cold cache to start, then a few percent change run-to-run).

-Peff
Re: [ANNOUNCE] Git v2.19.0-rc0
On 8/23/2018 4:59 PM, Derrick Stolee wrote:

> When I choose my own metrics for performance tests, I like to run at least 10 runs, remove the largest AND smallest runs from the samples, and then take the average. I did this manually for 'git rev-list --all --objects' on git/git and got the following results:
>
>   v2.18.0   v2.19.0-rc0   HEAD
>   3.126 s   3.308 s       3.170 s
>
> For full disclosure, here is a full table including all samples:
>
> |      | v2.18.0 | v2.19.0-rc0 | HEAD    |
> |------|---------|-------------|---------|
> |      | 4.58    | 3.302       | 3.239   |
> |      | 3.13    | 3.337       | 3.133   |
> |      | 3.213   | 3.291       | 3.159   |
> |      | 3.219   | 3.318       | 3.131   |
> |      | 3.077   | 3.302       | 3.163   |
> |      | 3.074   | 3.328       | 3.119   |
> |      | 3.022   | 3.277       | 3.125   |
> |      | 3.083   | 3.259       | 3.203   |
> |      | 3.057   | 3.311       | 3.223   |
> |      | 3.155   | 3.413       | 3.225   |
> | Max  | 4.58    | 3.413       | 3.239   |
> | Min  | 3.022   | 3.259       | 3.119   |
> | Avg* | 3.126   | 3.30825     | 3.17025 |
>
> (Note that the largest one was the first run, on v2.18.0, which is due to a cold disk.)
>
> I just kicked off a script that will run this test on the Linux repo while I drive home. I'll be able to report a similar table of data easily.

Here are the numbers for Linux:

|      | v2.18.0  | v2.19.0-rc0 | HEAD   |
|------|----------|-------------|--------|
|      | 86.5     | 70.739      | 57.266 |
|      | 60.582   | 101.928     | 56.641 |
|      | 58.964   | 60.139      | 60.258 |
|      | 59.47    | 61.141      | 58.213 |
|      | 62.554   | 60.73       | 84.54  |
|      | 59.139   | 85.424      | 57.745 |
|      | 58.487   | 59.31       | 59.979 |
|      | 58.653   | 69.845      | 60.181 |
|      | 58.085   | 102.777     | 61.455 |
|      | 58.304   | 60.459      | 62.551 |
| Max  | 86.5     | 102.777     | 84.54  |
| Min  | 58.085   | 59.31       | 56.641 |
| Avg* | 59.51913 | 71.30063    | 59.706 |
| Med  | 59.0515  | 65.493      | 60.08  |

Thanks,
-Stolee
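For readers who want to reproduce the aggregation Stolee describes, here is a small self-contained sketch (mine, not git code) that drops the single min and max and averages the rest; fed the v2.18.0 column above, it prints the same 3.126 s:

#include <stdio.h>
#include <stdlib.h>

static int cmp_double(const void *a, const void *b)
{
	double x = *(const double *)a, y = *(const double *)b;
	return (x > y) - (x < y);
}

int main(void)
{
	/* the ten v2.18.0 samples from the table above */
	double t[] = { 4.58, 3.13, 3.213, 3.219, 3.077,
		       3.074, 3.022, 3.083, 3.057, 3.155 };
	size_t n = sizeof(t) / sizeof(*t);
	double sum = 0;

	qsort(t, n, sizeof(*t), cmp_double);
	for (size_t i = 1; i < n - 1; i++) /* skip min and max */
		sum += t[i];
	printf("trimmed avg: %.3f s\n", sum / (n - 2));
	/* the median falls out of the same sorted array */
	printf("median: %.4f s\n", (t[n / 2 - 1] + t[n / 2]) / 2);
	return 0;
}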
Re: [ANNOUNCE] Git v2.19.0-rc0
On 8/24/2018 2:45 AM, Jeff King wrote:
> On Thu, Aug 23, 2018 at 10:59:55PM -0400, Jeff King wrote:
>
>> So I think we have a winner. I'll polish that up into patches and send it out later tonight.
>
> Oof. This rabbit hole keeps going deeper and deeper.
>
> I wrote up my coccinelle findings separately in:
>
>   https://public-inbox.org/git/20180824064228.ga3...@sigill.intra.peff.net/
>
> which is possibly a coccinelle bug (there I talked about oidcmp, since it can be demonstrated with the existing transformations, but the same thing happens with my hasheq patches).
>
> I'll wait to see how that discussion plays out, but I do otherwise have hasheq() patches ready to go, so it's probably not worth anybody else digging in further.

I was trying this myself yesterday, and found a doc _somewhere_ that said what we should do is this:

  @@
  expression e1, e2;
  @@
  if (
  - oidcmp(e1, e2)
  + !oideq(e1, e2)
   )

(Don't forget the space before the last end-paren!)

-Stolee
Re: [ANNOUNCE] Git v2.19.0-rc0
On Fri, Aug 24 2018, Jeff King wrote:

> On Thu, Aug 23, 2018 at 04:59:27PM -0400, Derrick Stolee wrote:
>
>> Using git/git:
>>
>>   Test        v2.18.0           v2.19.0-rc0               HEAD
>>   ---------------------------------------------------------------------------
>>   0001.2:     3.10(3.02+0.08)   3.27(3.17+0.09) +5.5%     3.14(3.02+0.11) +1.3%
>>
>> Using torvalds/linux:
>>
>>   Test        v2.18.0             v2.19.0-rc0               HEAD
>>   -----------------------------------------------------------------------------
>>   0001.2:     56.08(45.91+1.50)   56.60(46.62+1.50) +0.9%   54.61(45.47+1.46) -2.6%
>
> Interesting that these timings aren't as dramatic as the ones you got the other day (mine seemed to shift, too; for whatever reason it seems like under load the difference is larger).
>
>> Now here is where I get on my soapbox (and create a TODO for myself later). I ran the above with GIT_PERF_REPEAT_COUNT=10, which intuitively suggests that the results should be _more_ accurate than the default of 3. However, I then remembered that we only report the *minimum* time from all the runs, which is likely to select an outlier from the distribution. To test this, I ran a few tests manually and found the variation between runs to be larger than 3%.
>
> Yes, I agree it's not a great system. The whole "best of 3" thing is OK for throwing out cold-cache warmups, but it's really bad for teasing out the significance of small changes, or even understanding how much run-to-run noise there is.
>
>> When I choose my own metrics for performance tests, I like to run at least 10 runs, remove the largest AND smallest runs from the samples, and then take the average. I did this manually for 'git rev-list --all --objects' on git/git and got the following results:
>
> I agree that technique is better. I wonder if there's something even more statistically rigorous we could do. E.g., to compute the variance and throw away outliers based on standard deviations. And also to report the variance to give a sense of the significance of any changes.
>
> Obviously more runs gives greater confidence in the results, but 10 sounds like a lot. Many of these tests take minutes to run. Letting it go overnight is OK if you're doing a once-per-release mega-run, but it's pretty painful if you just want to generate some numbers to show off your commit.

An ex-coworker who's a *lot* smarter about these things than I am wrote this module: https://metacpan.org/pod/Dumbbench

A while ago I dabbled briefly with integrating it into t/perf/ but got distracted by something else:

    The module currently works similar to [more traditional iterative benchmark modules], except (in layman terms) it will run the command many times, estimate the uncertainty of the result and keep iterating until a certain user-defined precision has been reached. Then, it calculates the resulting uncertainty and goes through some pain to discard bad runs and subtract overhead from the timings. The reported timing includes an uncertainty, so that multiple benchmarks can more easily be compared.

Details of how it works here: https://metacpan.org/pod/Dumbbench#HOW-IT-WORKS-AND-WHY-IT-DOESN'T

Something like that seems to me to be an inherently better approach. I.e. we have lots of test cases that take 500ms, and some that take maybe 5 minutes (depending on the size of the repository). Indiscriminately repeating all of those for GIT_PERF_REPEAT_COUNT must be dumber than something like the above method.

We could also speed up the runtime of the perf tests a lot with such a method, by e.g. saying that we're OK with less certainty on tests that take a longer time than those that take a shorter time.

>> v2.18.0   v2.19.0-rc0   HEAD
>> 3.126 s   3.308 s       3.170 s
>
> So that's 5% worsening in 2.19, and we reclaim all but 1.4% of it. Those numbers match what I expect (and what I was seeing in some of my earlier timings).
>
>> I just kicked off a script that will run this test on the Linux repo while I drive home. I'll be able to report a similar table of data easily.
>
> Thanks, I'd expect it to come up with similar percentages. So we'll see if that holds true. :)
>
>> My TODO is to consider aggregating the data this way (or with a median) instead of reporting the minimum.
>
> Yes, I think that would be a great improvement for t/perf.
>
> -Peff
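The Dumbbench idea translates readily outside Perl. Here is a rough C sketch of the iterate-until-precise loop (mine, not Dumbbench's actual code; run_once() is a stand-in for timing one invocation of the command under test, and min_runs is assumed to be at least 2):

#include <math.h>
#include <stdio.h>

/* Stand-in: one timed run of the command under test, in seconds. */
extern double run_once(void);

static void benchmark(double rel_err, int min_runs, int max_runs)
{
	double sum = 0, sumsq = 0, mean = 0, sem = 0;
	int n = 0;

	while (n < max_runs) {
		double t = run_once();

		sum += t;
		sumsq += t * t;
		n++;
		mean = sum / n;
		if (n >= min_runs) {
			/* sample variance and standard error of the mean */
			double var = (sumsq - n * mean * mean) / (n - 1);

			sem = sqrt(var / n);
			/* stop once the result is precise enough */
			if (sem <= rel_err * mean)
				break;
		}
	}
	printf("%d runs: %.3f s +/- %.3f s\n", n, mean, sem);
}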
Re: [ANNOUNCE] Git v2.19.0-rc0
On Thu, Aug 23, 2018 at 04:59:27PM -0400, Derrick Stolee wrote:

> Using git/git:
>
>   Test        v2.18.0           v2.19.0-rc0               HEAD
>   ---------------------------------------------------------------------------
>   0001.2:     3.10(3.02+0.08)   3.27(3.17+0.09) +5.5%     3.14(3.02+0.11) +1.3%
>
> Using torvalds/linux:
>
>   Test        v2.18.0             v2.19.0-rc0               HEAD
>   -----------------------------------------------------------------------------
>   0001.2:     56.08(45.91+1.50)   56.60(46.62+1.50) +0.9%   54.61(45.47+1.46) -2.6%

Interesting that these timings aren't as dramatic as the ones you got the other day (mine seemed to shift, too; for whatever reason it seems like under load the difference is larger).

> Now here is where I get on my soapbox (and create a TODO for myself later). I ran the above with GIT_PERF_REPEAT_COUNT=10, which intuitively suggests that the results should be _more_ accurate than the default of 3. However, I then remember that we only report the *minimum* time from all the runs, which is likely to select an outlier from the distribution. To test this, I ran a few tests manually and found the variation between runs to be larger than 3%.

Yes, I agree it's not a great system. The whole "best of 3" thing is OK for throwing out cold-cache warmups, but it's really bad for teasing out the significance of small changes, or even understanding how much run-to-run noise there is.

> When I choose my own metrics for performance tests, I like to run at least 10 runs, remove the largest AND smallest runs from the samples, and then take the average. I did this manually for 'git rev-list --all --objects' on git/git and got the following results:

I agree that technique is better. I wonder if there's something even more statistically rigorous we could do. E.g., to compute the variance and throw away outliers based on standard deviations. And also to report the variance to give a sense of the significance of any changes.

Obviously more runs gives greater confidence in the results, but 10 sounds like a lot. Many of these tests take minutes to run. Letting it go overnight is OK if you're doing a once-per-release mega-run, but it's pretty painful if you just want to generate some numbers to show off your commit.

> v2.18.0   v2.19.0-rc0   HEAD
> 3.126 s   3.308 s       3.170 s

So that's 5% worsening in 2.19, and we reclaim all but 1.4% of it. Those numbers match what I expect (and what I was seeing in some of my earlier timings).

> I just kicked off a script that will run this test on the Linux repo while I drive home. I'll be able to report a similar table of data easily.

Thanks, I'd expect it to come up with similar percentages. So we'll see if that holds true. :)

> My TODO is to consider aggregating the data this way (or with a median) instead of reporting the minimum.

Yes, I think that would be a great improvement for t/perf.

-Peff
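The standard-deviation idea Peff floats could look something like this sketch (mine, not t/perf code); it assumes at least two samples: discard anything more than k standard deviations from the mean, then average what remains.

#include <math.h>
#include <stddef.h>

static double clipped_mean(const double *t, size_t n, double k)
{
	double sum = 0, sumsq = 0, mean, sd, kept_sum = 0;
	size_t kept = 0;

	for (size_t i = 0; i < n; i++) {
		sum += t[i];
		sumsq += t[i] * t[i];
	}
	mean = sum / n;
	sd = sqrt((sumsq - n * mean * mean) / (n - 1));

	/* keep only samples within k standard deviations of the mean */
	for (size_t i = 0; i < n; i++) {
		if (fabs(t[i] - mean) <= k * sd) {
			kept_sum += t[i];
			kept++;
		}
	}
	return kept ? kept_sum / kept : mean;
}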
Re: [ANNOUNCE] Git v2.19.0-rc0
On Thu, Aug 23, 2018 at 10:59:55PM -0400, Jeff King wrote:

> So I think we have a winner. I'll polish that up into patches and send it out later tonight.

Oof. This rabbit hole keeps going deeper and deeper.

I wrote up my coccinelle findings separately in:

  https://public-inbox.org/git/20180824064228.ga3...@sigill.intra.peff.net/

which is possibly a coccinelle bug (there I talked about oidcmp, since it can be demonstrated with the existing transformations, but the same thing happens with my hasheq patches).

I'll wait to see how that discussion plays out, but I do otherwise have hasheq() patches ready to go, so it's probably not worth anybody else digging in further.

-Peff
Re: [ANNOUNCE] Git v2.19.0-rc0
On Thu, Aug 23, 2018 at 07:48:42PM -0700, Jacob Keller wrote:

> Odd...
>
> What about..
>
>   - if (oidcmp(a,b))
>   + if(!oideq(a,b))
>   { ... }

Nope, it doesn't like that syntactically.

> Or maybe you need to use something like
>
>   <...
>   - if (oidcmp(a,b))
>   + if (!oideq(a,b))
>   ...>

Nor that (I also tried finding documentation on what exactly the angle brackets mean, but couldn't).

> Hmm. Yea, semantic patches are a bit confusing overall sometimes.
>
> But it looks like you got something which works?

Actually, what I showed earlier does seem to have some weirdness with else-if. But I finally stumbled on something even better:

  - oidcmp(a, b) != 0
  + !oideq(a, b)

Because of the isomorphisms that coccinelle knows about, that catches everything we want. Obvious ones like:

diff --git a/bisect.c b/bisect.c
index 41c56a665e..7c1d8f1a6d 100644
--- a/bisect.c
+++ b/bisect.c
@@ -595,7 +595,7 @@ static struct commit_list *skip_away(struct commit_list *list, int count)
 	for (i = 0; cur; cur = cur->next, i++) {
 		if (i == index) {
-			if (oidcmp(&cur->item->object.oid, current_bad_oid))
+			if (!oideq(&cur->item->object.oid, current_bad_oid))
 				return cur;
 			if (previous)
 				return previous;

and compound conditionals like:

diff --git a/blame.c b/blame.c
index 10d72e36dd..538d0ab1aa 100644
--- a/blame.c
+++ b/blame.c
@@ -1834,7 +1834,7 @@ void setup_scoreboard(struct blame_scoreboard *sb,
 	sb->revs->children.name = "children";
 	while (c->parents &&
-	       oidcmp(&c->object.oid, &sb->final->object.oid)) {
+	       !oideq(&c->object.oid, &sb->final->object.oid)) {
 		struct commit_list *l = xcalloc(1, sizeof(*l));
 
 		l->item = c;

and even non-if contexts, like:

diff --git a/sha1-file.c b/sha1-file.c
index 631f6b9dc2..d85f4e93e1 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -825,7 +825,7 @@ int check_object_signature(const struct object_id *oid, void *map,
 
 	if (map) {
 		hash_object_file(map, size, type, &real_oid);
-		return oidcmp(oid, &real_oid) ? -1 : 0;
+		return !oideq(oid, &real_oid) ? -1 : 0;
 	}

So I think we have a winner. I'll polish that up into patches and send it out later tonight.

-Peff
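The hasheq()/oideq() helpers themselves are not shown in this thread; at their simplest they could be thin equality wrappers like the sketch below (my reconstruction of the obvious shape, not Peff's actual patch). Returning a plain boolean makes the 2-way intent visible to callers and to the compiler, which is the point of the conversion.

/* sketch only; assumes git's hashcmp() and struct object_id */
static inline int hasheq(const unsigned char *sha1, const unsigned char *sha2)
{
	return !hashcmp(sha1, sha2);
}

static inline int oideq(const struct object_id *oid1, const struct object_id *oid2)
{
	return hasheq(oid1->hash, oid2->hash);
}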
Re: [ANNOUNCE] Git v2.19.0-rc0
On Thu, Aug 23, 2018 at 5:16 PM Jeff King wrote:
>
> On Thu, Aug 23, 2018 at 08:06:37PM -0400, Jeff King wrote:
>
>> This almost works:
>>
>>   @@
>>   expression a, b;
>>   statement s;
>>   @@
>>   - if (oidcmp(a, b)) s
>>   + if (!oideq(a, b)) s
>>
>> [...]
>>
>> So I really do want some way of saying "all of the block, no matter what it is". Or of leaving it out as context.
>
> Aha. The magic invocation is:
>
>   @@
>   expression a, b;
>   statement s;
>   @@
>   - if (oidcmp(a, b))
>   + if (!oideq(a, b))
>   s
>
> I would have expected that you could replace "s" with "...", but that does not seem to work.
>
> -Peff

Odd...

What about..

  - if (oidcmp(a,b))
  + if(!oideq(a,b))
  { ... }

Or maybe you need to use something like

  <...
  - if (oidcmp(a,b))
  + if (!oideq(a,b))
  ...>

Hmm. Yea, semantic patches are a bit confusing overall sometimes.

But it looks like you got something which works?

Thanks,
Jake
Re: [ANNOUNCE] Git v2.19.0-rc0
On Thu, Aug 23, 2018 at 08:06:37PM -0400, Jeff King wrote:

> This almost works:
>
>   @@
>   expression a, b;
>   statement s;
>   @@
>   - if (oidcmp(a, b)) s
>   + if (!oideq(a, b)) s
>
> [...]
>
> So I really do want some way of saying "all of the block, no matter what it is". Or of leaving it out as context.

Aha. The magic invocation is:

  @@
  expression a, b;
  statement s;
  @@
  - if (oidcmp(a, b))
  + if (!oideq(a, b))
  s

I would have expected that you could replace "s" with "...", but that does not seem to work.

-Peff
Re: [ANNOUNCE] Git v2.19.0-rc0
On Thu, Aug 23, 2018 at 07:40:49PM -0400, Jeff King wrote:

>> You can look for explicitly "if (oidcmp(...))" though. I don't know if you can catch *any* use which degrades to boolean outside of an if statement, but I wouldn't expect there to be too many of those?
>
> Yeah, that was my thought, too. And I've been trying this all afternoon without success. Why doesn't this work:
>
>   @@
>   expression a, b;
>   @@
>   - if (oidcmp(a, b))
>   + if (!oideq(a, b))
>
> I get:
>
>   Fatal error: exception Failure("minus: parse error: \n = File \"contrib/coccinelle/oideq.cocci\", line 21, column 0, charpos = 221\naround = '', whole content = \n")
>
> If I do:
>
>   - if (oidcmp(a, b)) { ... }
>
> that seems to please the parser for the minus line. But I cannot include the "..." on the plus line. Clearly the "..." part should be context, but I can't seem to find the right syntax.

This almost works:

  @@
  expression a, b;
  statement s;
  @@
  - if (oidcmp(a, b)) s
  + if (!oideq(a, b)) s

It generates this, for example:

diff -u -p a/bisect.c b/bisect.c
--- a/bisect.c
+++ b/bisect.c
@@ -595,7 +595,7 @@ static struct commit_list *skip_away(str
 	for (i = 0; cur; cur = cur->next, i++) {
 		if (i == index) {
-			if (oidcmp(&cur->item->object.oid, current_bad_oid))
+			if (!oideq(&cur->item->object.oid, current_bad_oid))
 				return cur;
 			if (previous)
 				return previous;

which is what we want. But it also generates this:

diff -u -p a/bundle.c b/bundle.c
--- a/bundle.c
+++ b/bundle.c
@@ -369,25 +369,11 @@ static int write_bundle_refs(int bundle_
 		 * commit that is referenced by the tag, and not the tag
 		 * itself.
 		 */
-		if (oidcmp(&oid, &e->item->oid)) {
-			/*
-			 * Is this the positive end of a range expressed
-			 * in terms of a tag (e.g. v2.0 from the range
-			 * "v1.0..v2.0")?
-			 */
-			struct commit *one = lookup_commit_reference(the_repository,
-								     &oid);
+		if (!oideq(&oid, &e->item->oid)) {
+			struct commit *one=lookup_commit_reference(the_repository,
+								   &oid);
 			struct object *obj;
 
-			if (e->item == &(one->object)) {
-				/*
-				 * Need to include e->name as an
-				 * independent ref to the pack-objects
-				 * input, so that the tag is included
-				 * in the output; otherwise we would
-				 * end up triggering "empty bundle"
-				 * error.
-				 */
 				obj = parse_object_or_die(&oid, e->name);
 				obj->flags |= SHOWN;
 				add_pending_object(revs, obj, e->name);

So I really do want some way of saying "all of the block, no matter what it is". Or of leaving it out as context.

-Peff
Re: [ANNOUNCE] Git v2.19.0-rc0
On Thu, Aug 23, 2018 at 04:30:10PM -0700, Jacob Keller wrote:

> On Thu, Aug 23, 2018 at 9:30 AM Jeff King wrote:
>> I think that audit isn't actually too bad (but definitely not something we should do for v2.19). The cocci patch I showed earlier hits most of them. It misses the negated ones (e.g., "if (oidcmp(...))"). I'm not sure if there's a way to ask coccinelle only for oidcmp() used in a boolean context.
>
> You can look for explicitly "if (oidcmp(...))" though. I don't know if you can catch *any* use which degrades to boolean outside of an if statement, but I wouldn't expect there to be too many of those?

Yeah, that was my thought, too. And I've been trying this all afternoon without success. Why doesn't this work:

  @@
  expression a, b;
  @@
  - if (oidcmp(a, b))
  + if (!oideq(a, b))

I get:

  Fatal error: exception Failure("minus: parse error: \n = File \"contrib/coccinelle/oideq.cocci\", line 21, column 0, charpos = 221\naround = '', whole content = \n")

If I do:

  - if (oidcmp(a, b)) { ... }

that seems to please the parser for the minus line. But I cannot include the "..." on the plus line. Clearly the "..." part should be context, but I can't seem to find the right syntax.

FWIW, I do have patches adding hasheq() and converting all of the !oidcmp() cases. I may resort to hand-investigating each of the negated ones, but I really feel like I should be able to do better with coccinelle.

-Peff
Re: [ANNOUNCE] Git v2.19.0-rc0
On Thu, Aug 23, 2018 at 9:30 AM Jeff King wrote:

> I think that audit isn't actually too bad (but definitely not something we should do for v2.19). The cocci patch I showed earlier hits most of them. It misses the negated ones (e.g., "if (oidcmp(...))"). I'm not sure if there's a way to ask coccinelle only for oidcmp() used in a boolean context.

You can look for explicitly "if (oidcmp(...))" though. I don't know if you can catch *any* use which degrades to boolean outside of an if statement, but I wouldn't expect there to be too many of those?

Thanks,
Jake
Re: [ANNOUNCE] Git v2.19.0-rc0
On 8/23/2018 2:53 PM, Jeff King wrote:
> On Thu, Aug 23, 2018 at 06:26:58AM -0400, Derrick Stolee wrote:
>
>>> I think you can safely ignore the rest of it if you are otherwise occupied. Even if v2.19 ships without some mitigation, I don't know that it's all that big a deal, given the numbers I generated (which for some reason are less dramatic than Stolee's).
>>
>> My numbers may be more dramatic because my Linux environment is a virtual machine.
>
> If you have a chance, can you run p0001 on my patch (compared to 2.19-rc0, or to both v2.18 and v2.19-rc0)? It would be nice to double check that it really is fixing the problem you saw.

Sure. Note: I had to create a new Linux VM on a different machine between Tuesday and today, so the absolute numbers are different.

Using git/git:

  Test        v2.18.0           v2.19.0-rc0               HEAD
  ---------------------------------------------------------------------------
  0001.2:     3.10(3.02+0.08)   3.27(3.17+0.09) +5.5%     3.14(3.02+0.11) +1.3%

Using torvalds/linux:

  Test        v2.18.0             v2.19.0-rc0               HEAD
  -----------------------------------------------------------------------------
  0001.2:     56.08(45.91+1.50)   56.60(46.62+1.50) +0.9%   54.61(45.47+1.46) -2.6%

Now here is where I get on my soapbox (and create a TODO for myself later). I ran the above with GIT_PERF_REPEAT_COUNT=10, which intuitively suggests that the results should be _more_ accurate than the default of 3. However, I then remembered that we only report the *minimum* time from all the runs, which is likely to select an outlier from the distribution. To test this, I ran a few tests manually and found the variation between runs to be larger than 3%.

When I choose my own metrics for performance tests, I like to run at least 10 runs, remove the largest AND smallest runs from the samples, and then take the average. I did this manually for 'git rev-list --all --objects' on git/git and got the following results:

  v2.18.0   v2.19.0-rc0   HEAD
  3.126 s   3.308 s       3.170 s

For full disclosure, here is a full table including all samples:

|      | v2.18.0 | v2.19.0-rc0 | HEAD    |
|------|---------|-------------|---------|
|      | 4.58    | 3.302       | 3.239   |
|      | 3.13    | 3.337       | 3.133   |
|      | 3.213   | 3.291       | 3.159   |
|      | 3.219   | 3.318       | 3.131   |
|      | 3.077   | 3.302       | 3.163   |
|      | 3.074   | 3.328       | 3.119   |
|      | 3.022   | 3.277       | 3.125   |
|      | 3.083   | 3.259       | 3.203   |
|      | 3.057   | 3.311       | 3.223   |
|      | 3.155   | 3.413       | 3.225   |
| Max  | 4.58    | 3.413       | 3.239   |
| Min  | 3.022   | 3.259       | 3.119   |
| Avg* | 3.126   | 3.30825     | 3.17025 |

(Note that the largest one was the first run, on v2.18.0, which is due to a cold disk.)

I just kicked off a script that will run this test on the Linux repo while I drive home. I'll be able to report a similar table of data easily.

My TODO is to consider aggregating the data this way (or with a median) instead of reporting the minimum.

Thanks,
-Stolee
Re: [ANNOUNCE] Git v2.19.0-rc0
On Thu, Aug 23, 2018 at 06:26:58AM -0400, Derrick Stolee wrote:

>> I think you can safely ignore the rest of it if you are otherwise occupied. Even if v2.19 ships without some mitigation, I don't know that it's all that big a deal, given the numbers I generated (which for some reason are less dramatic than Stolee's).
>
> My numbers may be more dramatic because my Linux environment is a virtual machine.

If you have a chance, can you run p0001 on my patch (compared to 2.19-rc0, or to both v2.18 and v2.19-rc0)? It would be nice to double check that it really is fixing the problem you saw.

-Peff
wide t/perf output, was Re: [ANNOUNCE] Git v2.19.0-rc0
On Thu, Aug 23, 2018 at 06:20:26AM -0700, Junio C Hamano wrote:

> Jeff King writes:
>
>> Here are numbers for p0001.2 run against linux.git on a few versions. This is using -O2 with gcc 8.2.0.
>>
>>   Test        v2.18.0             v2.19.0-rc0               HEAD
>>   -----------------------------------------------------------------------------
>>   0001.2:     34.24(33.81+0.43)   34.83(34.42+0.40) +1.7%   33.90(33.47+0.42) -1.0%
>
> I see what you did to the formatting here, which is a topic of another thread ;-).

Do you happen to have a link? I missed that one, and digging turned up nothing.

A while ago I wrote the patch below. I don't recall why I never sent it in (and it doesn't apply cleanly these days, though I'm sure it could be forward-ported).

-- >8 --
Date: Wed, 20 Jan 2016 23:54:14 -0500
Subject: [PATCH] t/perf: add "tall" output format

When aggregating results, we usually show a list of tests, one per line, with the tested revisions in columns across. Like:

  $ ./aggregate.perl 348d4f2^ 348d4f2 p7000-filter-branch.sh
  Test                  348d4f2^               348d4f2
  ----------------------------------------------------------------
  7000.2: noop filter   295.32(269.61+14.36)   7.92(0.85+0.72) -97.3%

This is useful if you have a lot of tests to show, but few revisions; you're effectively comparing the two items on each line. But sometimes you have the opposite: few tests, but a large number of revisions. In this case, the lines get very long, and it's hard to compare values.

This patch introduces a "tall" format that shows the same data in a more vertical manner:

  $ ./aggregate.perl --tall \
        348d4f2^ 348d4f2 \
        jk/filter-branch-empty^ \
        jk/filter-branch-empty \
        p7000-filter-branch.sh
  Test: p7000-filter-branch.2
    348d4f2^                 295.32(269.61+14.36)
    348d4f2                  7.92(0.85+0.72) -97.3%
    jk/filter-branch-empty^  9.37(0.87+0.80) -96.8%
    jk/filter-branch-empty   7.71(0.92+0.62) -97.4%

Signed-off-by: Jeff King
---
 t/perf/aggregate.perl | 124 ++++++++++++++++++++++++++++++------------
 1 file changed, 88 insertions(+), 36 deletions(-)

diff --git a/t/perf/aggregate.perl b/t/perf/aggregate.perl
index e401208488..d108a02ccd 100755
--- a/t/perf/aggregate.perl
+++ b/t/perf/aggregate.perl
@@ -17,29 +17,41 @@ sub get_times {
 	return ($rt, $4, $5);
 }
 
-sub format_times {
+sub format_times_list {
 	my ($r, $u, $s, $firstr) = @_;
 	if (!defined $r) {
 		return "<missing>";
 	}
 	my $out = sprintf "%.2f(%.2f+%.2f)", $r, $u, $s;
+	my $pct;
 	if (defined $firstr) {
 		if ($firstr > 0) {
-			$out .= sprintf " %+.1f%%", 100.0*($r-$firstr)/$firstr;
+			$pct = sprintf "%+.1f%%", 100.0*($r-$firstr)/$firstr;
 		} elsif ($r == 0) {
-			$out .= " =";
+			$pct = "=";
 		} else {
-			$out .= " +inf";
+			$pct = "+inf";
 		}
 	}
-	return $out;
+	return ($out, $pct);
+}
+
+sub format_times {
+	my ($times, $pct) = format_times_list(@_);
+	return defined $pct ? "$times $pct" : $times;
 }
 
 my (@dirs, %dirnames, %dirabbrevs, %prefixes, @tests);
+my ($tall_format);
 while (scalar @ARGV) {
 	my $arg = $ARGV[0];
 	my $dir;
 	last if -f $arg or $arg eq "--";
+	if ($arg eq "--tall") {
+		$tall_format = 1;
+		shift @ARGV;
+		next;
+	}
 	if (! -d $arg) {
 		my $rev = Git::command_oneline(qw(rev-parse --verify), $arg);
 		$dir = "build/".$rev;
@@ -122,6 +134,11 @@ sub have_slash {
 	return 0;
 }
 
+sub printable_dir {
+	my ($d) = @_;
+	return exists $dirabbrevs{$d} ? $dirabbrevs{$d} : $dirnames{$d};
+}
+
 my %newdirabbrevs = %dirabbrevs;
 while (!have_duplicate(values %newdirabbrevs)) {
 	%dirabbrevs = %newdirabbrevs;
@@ -132,44 +149,79 @@ sub have_slash {
 	}
 }
 
-my %times;
-my @colwidth = ((0)x@dirs);
-for my $i (0..$#dirs) {
-	my $d = $dirs[$i];
-	my $w = length (exists $dirabbrevs{$d} ? $dirabbrevs{$d} : $dirnames{$d});
-	$colwidth[$i] = $w if $w > $colwidth[$i];
-}
-for my $t (@subtests) {
-	my $firstr;
+binmode STDOUT, ":utf8" or die "PANIC on binmode: $!";
+
+if (!$tall_format) {
+	my %times;
+	my @colwidth = ((0)x@dirs);
 	for my $i (0..$#dirs) {
 		my $d = $dirs[$i];
-		$times{$prefixes{$d}.$t} = [get_times("$resultsdir/$prefixes{$d}$t.times")];
-		my ($r,$u,$s) = @{$times{$prefixes{$d}.$t}};
-		my $w = length format_times($r,$u,$s,$firstr);
+		my $w = length(printable_dir($d));
 		$colwidth[$i] = $w if $w > $colwidth[$i];
-		$firstr = $r unless defined $firstr;
 	}
-}
-my $t
Re: [ANNOUNCE] Git v2.19.0-rc0
On Thu, Aug 23, 2018 at 06:26:58AM -0400, Derrick Stolee wrote:

> Around the time that my proposed approaches were getting vetoed for alignment issues, I figured I was out of my depth here. I reached out to Daniel Lemire (of EWAH bitmap fame) on Twitter [1]. His blog is full of posts of word-based approaches to different problems, so I thought he might know something off the top of his head that would be applicable. His conclusion (after looking only a short time) was to take a 'hasheq' approach [2] like Peff suggested [3]. Since that requires auditing all callers of hashcmp to see if hasheq is appropriate, it is not a good solution for 2.19 but (in my opinion) should be evaluated as part of the 2.20 cycle.

I think that audit isn't actually too bad (but definitely not something we should do for v2.19). The cocci patch I showed earlier hits most of them. It misses the negated ones (e.g., "if (oidcmp(...))"). I'm not sure if there's a way to ask coccinelle only for oidcmp() used in a boolean context.

Just skimming the grep output, it's basically every call except the ones we use for binary search.

-Peff
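To make the binary-search exception concrete, here is an illustrative sketch (find_oid() and the flat table are made up for this example; only the oidcmp() call reflects git API): lookup needs the sign of the result to pick a half, while most other callers only branch on equality and could use oideq().

/* assumes git's struct object_id and oidcmp() from cache.h */
static int find_oid(const struct object_id *oid,
		    const struct object_id *table, size_t n)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mi = lo + (hi - lo) / 2;
		int cmp = oidcmp(oid, &table[mi]); /* sign matters here */

		if (!cmp)
			return (int)mi;
		if (cmp < 0)
			hi = mi;   /* search the lower half */
		else
			lo = mi + 1; /* search the upper half */
	}
	return -1; /* not found */
}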
Re: [ANNOUNCE] Git v2.19.0-rc0
Jeff King writes:

> Here are numbers for p0001.2 run against linux.git on a few versions. This is using -O2 with gcc 8.2.0.
>
>   Test        v2.18.0             v2.19.0-rc0               HEAD
>   -----------------------------------------------------------------------------
>   0001.2:     34.24(33.81+0.43)   34.83(34.42+0.40) +1.7%   33.90(33.47+0.42) -1.0%

I see what you did to the formatting here, which is a topic of another thread ;-).

Thanks, as Derrick also noted, I agree this is an appropriate workaround for the upcoming release and we may want to explore hasheq() and other solutions as part of the effort during the next cycle (which can start now if people are bored---after all working on the codebase is the best way to hunt for recent bugs).

Thanks, will queue.

> diff --git a/cache.h b/cache.h
> index b1fd3d58ab..4d014541ab 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1023,6 +1023,16 @@ extern const struct object_id null_oid;
>  
>  static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
>  {
> +	/*
> +	 * This is a temporary optimization hack. By asserting the size here,
> +	 * we let the compiler know that it's always going to be 20, which lets
> +	 * it turn this fixed-size memcmp into a few inline instructions.
> +	 *
> +	 * This will need to be extended or ripped out when we learn about
> +	 * hashes of different sizes.
> +	 */
> +	if (the_hash_algo->rawsz != 20)
> +		BUG("hash size not yet supported by hashcmp");
>  	return memcmp(sha1, sha2, the_hash_algo->rawsz);
>  }
Re: [ANNOUNCE] Git v2.19.0-rc0
Derrick Stolee writes:

> I was thinking that having a mitigation for 2.19 is best, and then we can focus as part of the 2.20 cycle how we can properly avoid this cost, especially when 32 is a valid option.
> ...
> ... to take a 'hasheq' approach [2] like Peff suggested [3]. Since that requires auditing all callers of hashcmp to see if hasheq is appropriate, it is not a good solution for 2.19 but (in my opinion) should be evaluated as part of the 2.20 cycle.

Thanks for thoughtful comments. I think it makes sense to go with the "tell compiler hashcmp() currently is only about 20-byte array" approach for now, as it is trivial to see why it cannot break things.

During the 2.20 cycle, we may find out that the hasheq() abstraction performs better than hashcmp() with multiple lengths, and end up doing that "audit and replace to use hasheq()". But as you said, it probably is not a good idea to rush it in this cycle.
Re: [ANNOUNCE] Git v2.19.0-rc0
On 8/23/2018 1:04 AM, Jeff King wrote:
> On Thu, Aug 23, 2018 at 03:47:07AM +0000, brian m. carlson wrote:
>
>> I expect that's going to be the case as well. I have patches that wire up actual SHA-256 support in my hash-impl branch.
>>
>> However, having said that, I'm happy to defer to whatever everyone else thinks is best for 2.19. The assert solution would be fine with me in this situation, and if we need to pull it out in the future, that's okay with me.
>>
>> I don't really have a strong opinion on this either way, so if someone else does, please say so. I have somewhat more limited availability over the next couple days, as I'm travelling on business, but I'm happy to review a patch (and it seems like Peff has one minus the actual commit message).
>
> I just posted the patch elsewhere in the thread.

Thank you for that!

> I think you can safely ignore the rest of it if you are otherwise occupied. Even if v2.19 ships without some mitigation, I don't know that it's all that big a deal, given the numbers I generated (which for some reason are less dramatic than Stolee's).

My numbers may be more dramatic because my Linux environment is a virtual machine.

I was thinking that having a mitigation for 2.19 is best, and then we can focus as part of the 2.20 cycle on how we can properly avoid this cost, especially when 32 is a valid option.

Around the time that my proposed approaches were getting vetoed for alignment issues, I figured I was out of my depth here. I reached out to Daniel Lemire (of EWAH bitmap fame) on Twitter [1]. His blog is full of posts of word-based approaches to different problems, so I thought he might know something off the top of his head that would be applicable. His conclusion (after looking only a short time) was to take a 'hasheq' approach [2] like Peff suggested [3]. Since that requires auditing all callers of hashcmp to see if hasheq is appropriate, it is not a good solution for 2.19 but (in my opinion) should be evaluated as part of the 2.20 cycle.

Of course, if someone with knowledge of word-alignment issues across the platforms we support knows how to enforce an alignment for object_id, then something word-based like [4] could be reconsidered.

Thanks, everyone!
-Stolee

[1] https://twitter.com/stolee/status/1032312965754748930
[2] https://lemire.me/blog/2018/08/22/avoid-lexicographical-comparisons-when-testing-for-string-equality/
[3] https://public-inbox.org/git/20180822030344.ga14...@sigill.intra.peff.net/
[4] https://public-inbox.org/git/7ea416cf-b043-1274-e161-85a8780b8...@gmail.com/
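For what it's worth, one conceivable way to enforce such an alignment is sketched below, under the assumption of a C11 compiler (aligned_object_id is a made-up name, and this is not a git patch; older compilers would need __attribute__((aligned(8))) or similar, and every allocation path that embeds the struct would still have to be audited):

#include <stdalign.h>

#define GIT_MAX_RAWSZ 32

struct aligned_object_id {
	/* force 8-byte alignment so 64-bit word loads are legal */
	alignas(8) unsigned char hash[GIT_MAX_RAWSZ];
};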
Re: [ANNOUNCE] Git v2.19.0-rc0
On Thu, Aug 23, 2018 at 01:02:25AM -0400, Jeff King wrote:

> Here's the patch. For some reason my numbers aren't quite as large as they were yesterday (I was very careful to keep the system unloaded today, whereas yesterday I was doing a few other things, so perhaps that is the difference).

This looks sane to me. Thanks for writing this up.

--
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204
Re: [ANNOUNCE] Git v2.19.0-rc0
Jeff King wrote:

> Here's the patch. For some reason my numbers aren't quite as large as they were yesterday (I was very careful to keep the system unloaded today, whereas yesterday I was doing a few other things, so perhaps that is the difference).
>
> -- >8 --
> Subject: [PATCH] hashcmp: assert constant hash size
>
> Prior to 509f6f62a4 (cache: update object ID functions for the_hash_algo, 2018-07-16), hashcmp() called memcmp() with a constant size of 20 bytes. Some compilers were able to turn that into a few quad-word comparisons, which is faster than actually calling memcmp().
>
> In 509f6f62a4, we started using the_hash_algo->rawsz instead. Even though this will always be 20, the compiler doesn't know that while inlining hashcmp() and ends up just generating a call to memcmp().
>
> Eventually we'll have to deal with multiple hash sizes, but for the upcoming v2.19, we can restore some of the original performance by asserting on the size. That gives the compiler enough information to know that the memcmp will always be called with a length of 20, and it performs the same optimization.
>
> Here are numbers for p0001.2 run against linux.git on a few versions. This is using -O2 with gcc 8.2.0.
>
>   Test        v2.18.0             v2.19.0-rc0               HEAD
>   -----------------------------------------------------------------------------
>   0001.2:     34.24(33.81+0.43)   34.83(34.42+0.40) +1.7%   33.90(33.47+0.42) -1.0%
>
> You can see that v2.19 is a little slower than v2.18. This commit ended up slightly faster than v2.18, but there's a fair bit of run-to-run noise (the generated code in the two cases is basically the same). This patch does seem to be consistently 1-2% faster than v2.19.
>
> I tried changing hashcpy(), which was also touched by 509f6f62a4, in the same way, but couldn't measure any speedup. Which makes sense, at least for this workload. A traversal of the whole commit graph requires looking up every entry of every tree via lookup_object(). That's many multiples of the numbers of objects in the repository (most of the lookups just return "yes, we already saw that object").
>
> Reported-by: Derrick Stolee
> Signed-off-by: Jeff King
> ---
>  cache.h | 10 ++++++++++
>  1 file changed, 10 insertions(+)

Reviewed-by: Jonathan Nieder

Verified using "make object.s" that the memcmp call goes away. Thank you.
Re: [ANNOUNCE] Git v2.19.0-rc0
On Thu, Aug 23, 2018 at 03:47:07AM +0000, brian m. carlson wrote:

> I expect that's going to be the case as well. I have patches that wire up actual SHA-256 support in my hash-impl branch.
>
> However, having said that, I'm happy to defer to whatever everyone else thinks is best for 2.19. The assert solution would be fine with me in this situation, and if we need to pull it out in the future, that's okay with me.
>
> I don't really have a strong opinion on this either way, so if someone else does, please say so. I have somewhat more limited availability over the next couple days, as I'm travelling on business, but I'm happy to review a patch (and it seems like Peff has one minus the actual commit message).

I just posted the patch elsewhere in the thread. I think you can safely ignore the rest of it if you are otherwise occupied. Even if v2.19 ships without some mitigation, I don't know that it's all that big a deal, given the numbers I generated (which for some reason are less dramatic than Stolee's).

-Peff
Re: [ANNOUNCE] Git v2.19.0-rc0
On Wed, Aug 22, 2018 at 07:27:56PM -0700, Jonathan Nieder wrote:

> Jeff King wrote:
>
>> FWIW, it's not 10%. The best I measured was ~4% on a very hashcmp-limited operation, and I suspect even that may be highly dependent on the compiler. We might be able to improve more by sprinkling more asserts around, but there are 75 mentions of the_hash_algo->rawsz. I wouldn't want to add an assert at each one.
>>
>> I don't mind doing one or a handful of these asserts as part of v2.19 if we want to try to reclaim those few percent. But I suspect the very first commit in any further hash-transition work is just going to be to rip them all out.
>
> I was thinking just hashcmp and hashcpy.
>
> Ideally such a change would come with a performance test to help the person writing that very first commit. Except we already have performance tests that capture this. ;-)
>
> For further hash-transition work, I agree someone may want to revert this, and I don't mind such a revert appearing right away in "next". And it's possible that we might have to do the equivalent of manual template expansion to recover the performance in some performance-sensitive areas. Maybe we can get the compiler to cooperate with us in that and maybe we can't. That's okay with me.
>
> Anyway, I'll resend your patch with a commit message added some time this evening.

Here's the patch. For some reason my numbers aren't quite as large as they were yesterday (I was very careful to keep the system unloaded today, whereas yesterday I was doing a few other things, so perhaps that is the difference).

-- >8 --
Subject: [PATCH] hashcmp: assert constant hash size

Prior to 509f6f62a4 (cache: update object ID functions for the_hash_algo, 2018-07-16), hashcmp() called memcmp() with a constant size of 20 bytes. Some compilers were able to turn that into a few quad-word comparisons, which is faster than actually calling memcmp().

In 509f6f62a4, we started using the_hash_algo->rawsz instead. Even though this will always be 20, the compiler doesn't know that while inlining hashcmp() and ends up just generating a call to memcmp().

Eventually we'll have to deal with multiple hash sizes, but for the upcoming v2.19, we can restore some of the original performance by asserting on the size. That gives the compiler enough information to know that the memcmp will always be called with a length of 20, and it performs the same optimization.

Here are numbers for p0001.2 run against linux.git on a few versions. This is using -O2 with gcc 8.2.0.

  Test        v2.18.0             v2.19.0-rc0               HEAD
  -----------------------------------------------------------------------------
  0001.2:     34.24(33.81+0.43)   34.83(34.42+0.40) +1.7%   33.90(33.47+0.42) -1.0%

You can see that v2.19 is a little slower than v2.18. This commit ended up slightly faster than v2.18, but there's a fair bit of run-to-run noise (the generated code in the two cases is basically the same). This patch does seem to be consistently 1-2% faster than v2.19.

I tried changing hashcpy(), which was also touched by 509f6f62a4, in the same way, but couldn't measure any speedup. Which makes sense, at least for this workload. A traversal of the whole commit graph requires looking up every entry of every tree via lookup_object(). That's many multiples of the numbers of objects in the repository (most of the lookups just return "yes, we already saw that object").

Reported-by: Derrick Stolee
Signed-off-by: Jeff King
---
 cache.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/cache.h b/cache.h
index b1fd3d58ab..4d014541ab 100644
--- a/cache.h
+++ b/cache.h
@@ -1023,6 +1023,16 @@ extern const struct object_id null_oid;
 
 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 {
+	/*
+	 * This is a temporary optimization hack. By asserting the size here,
+	 * we let the compiler know that it's always going to be 20, which lets
+	 * it turn this fixed-size memcmp into a few inline instructions.
+	 *
+	 * This will need to be extended or ripped out when we learn about
+	 * hashes of different sizes.
+	 */
+	if (the_hash_algo->rawsz != 20)
+		BUG("hash size not yet supported by hashcmp");
 	return memcmp(sha1, sha2, the_hash_algo->rawsz);
 }
-- 
2.19.0.rc0.412.g7005db4e88
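A tiny standalone illustration of why the BUG() guard helps (my sketch, not part of the patch): once the guard proves the length is 20 on the surviving path, the compiler may expand the memcmp() inline, exactly as it did when the size was a literal constant.

#include <string.h>

/* The length is opaque here, so this stays a real memcmp() call. */
int cmp_var(const unsigned char *a, const unsigned char *b, size_t len)
{
	return memcmp(a, b, len);
}

/*
 * After the guard, len is provably 20 on the surviving path, so the
 * call can be lowered to a few fixed-size word comparisons.
 */
int cmp_fixed(const unsigned char *a, const unsigned char *b, size_t len)
{
	if (len != 20)
		return -1; /* stand-in for BUG(); never taken today */
	return memcmp(a, b, len);
}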
Re: [ANNOUNCE] Git v2.19.0-rc0
On Wed, Aug 22, 2018 at 10:16:18PM -0400, Jeff King wrote:

> FWIW, it's not 10%. The best I measured was ~4% on a very hashcmp-limited operation, and I suspect even that may be highly dependent on the compiler. We might be able to improve more by sprinkling more asserts around, but there are 75 mentions of the_hash_algo->rawsz. I wouldn't want to add an assert at each one.
>
> I don't mind doing one or a handful of these asserts as part of v2.19 if we want to try to reclaim those few percent. But I suspect the very first commit in any further hash-transition work is just going to be to rip them all out.

I expect that's going to be the case as well. I have patches that wire up actual SHA-256 support in my hash-impl branch.

However, having said that, I'm happy to defer to whatever everyone else thinks is best for 2.19. The assert solution would be fine with me in this situation, and if we need to pull it out in the future, that's okay with me.

I don't really have a strong opinion on this either way, so if someone else does, please say so. I have somewhat more limited availability over the next couple days, as I'm travelling on business, but I'm happy to review a patch (and it seems like Peff has one minus the actual commit message).

--
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204
Re: [ANNOUNCE] Git v2.19.0-rc0
Jeff King wrote:

> FWIW, it's not 10%. The best I measured was ~4% on a very hashcmp-limited operation, and I suspect even that may be highly dependent on the compiler. We might be able to improve more by sprinkling more asserts around, but there are 75 mentions of the_hash_algo->rawsz. I wouldn't want to add an assert at each one.
>
> I don't mind doing one or a handful of these asserts as part of v2.19 if we want to try to reclaim those few percent. But I suspect the very first commit in any further hash-transition work is just going to be to rip them all out.

I was thinking just hashcmp and hashcpy.

Ideally such a change would come with a performance test to help the person writing that very first commit. Except we already have performance tests that capture this. ;-)

For further hash-transition work, I agree someone may want to revert this, and I don't mind such a revert appearing right away in "next". And it's possible that we might have to do the equivalent of manual template expansion to recover the performance in some performance-sensitive areas. Maybe we can get the compiler to cooperate with us in that and maybe we can't. That's okay with me.

Anyway, I'll resend your patch with a commit message added some time this evening.

Thanks,
Jonathan
Re: [ANNOUNCE] Git v2.19.0-rc0
On Wed, Aug 22, 2018 at 06:23:43PM -0700, Jonathan Nieder wrote:

> Jeff King wrote:
>>> On Tue, 2018-08-21 at 23:03 -0400, Jeff King wrote:
>
>>>> static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
>>>> {
>>>> +	assert(the_hash_algo->rawsz == 20);
>>>> 	return memcmp(sha1, sha2, the_hash_algo->rawsz);
>>>> }
> [...]
>> The bigger questions are:
>>
>>   - are we OK with such an assertion; and
>>
>>   - does the assertion still give us the desired behavior when we add in a branch for rawsz==32?
>>
>> And I think the answers for those are both "probably not".
>
> At this point in the release process, I think the answer to the first question is a pretty clear "yes".
>
> A ~10% increase in latency of some operations is quite significant, in exchange for no user benefit yet. We can continue to try to figure out how to convince compilers to generate good code for this (and that's useful), but in the meantime we should also do the simple thing to avoid the regression for users.

FWIW, it's not 10%. The best I measured was ~4% on a very hashcmp-limited operation, and I suspect even that may be highly dependent on the compiler. We might be able to improve more by sprinkling more asserts around, but there are 75 mentions of the_hash_algo->rawsz. I wouldn't want to add an assert at each one.

I don't mind doing one or a handful of these asserts as part of v2.19 if we want to try to reclaim those few percent. But I suspect the very first commit in any further hash-transition work is just going to be to rip them all out.

-Peff
Re: [ANNOUNCE] Git v2.19.0-rc0
Hi,

Jeff King wrote:
>> On Tue, 2018-08-21 at 23:03 -0400, Jeff King wrote:

>>> static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
>>> {
>>> +	assert(the_hash_algo->rawsz == 20);
>>> 	return memcmp(sha1, sha2, the_hash_algo->rawsz);
>>> }
[...]
> The bigger questions are:
>
>   - are we OK with such an assertion; and
>
>   - does the assertion still give us the desired behavior when we add in a branch for rawsz==32?
>
> And I think the answers for those are both "probably not".

At this point in the release process, I think the answer to the first question is a pretty clear "yes".

A ~10% increase in latency of some operations is quite significant, in exchange for no user benefit yet. We can continue to try to figure out how to convince compilers to generate good code for this (and that's useful), but in the meantime we should also do the simple thing to avoid the regression for users.

Thanks,
Jonathan
Re: [ANNOUNCE] Git v2.19.0-rc0
On 8/22/2018 12:58 PM, Duy Nguyen wrote:
> On Wed, Aug 22, 2018 at 6:49 PM Derrick Stolee wrote:
>>
>> On 8/22/2018 12:26 PM, Jeff King wrote:
>>> On Wed, Aug 22, 2018 at 06:14:24PM +0200, Duy Nguyen wrote:
>>>
>>>> On Wed, Aug 22, 2018 at 6:08 PM Duy Nguyen wrote:
>>>>> On Wed, Aug 22, 2018 at 6:03 PM Jeff King wrote:
>>>>>> On Wed, Aug 22, 2018 at 07:14:42AM -0400, Derrick Stolee wrote:
>>>>>>
>>>>>>> The other thing I was going to recommend (and I'll try to test this out myself later) is to see if 'the_hash_algo->rawsz' is being treated as a volatile variable, since it is being referenced through a pointer. Perhaps storing the value locally and then casing on it would help?
>>>>>>
>>>>>> I tried various sprinkling of "const" around the declarations to make it clear that the values wouldn't change once we saw them. But I couldn't detect any difference. At most I think that would let us hoist the "if" out of the loop, but gcc still seems unwilling to expand the memcmp when there are other branches.
>>>>>>
>>>>>> I think if that's the thing we want to have happen, we really do need to just write it out on that branch rather than saying "memcmp".
>>>>>
>>>>> This reminds me of an old discussion about memcpy() vs doing explicit compare loop with lots of performance measurements..
>>>>
>>>> Ah found it. Not sure if it is still relevant in light of multiple hash support
>>>>
>>>> https://public-inbox.org/git/20110427225114.ga16...@elte.hu/
>>>
>>> Yes, that was what I meant. We actually did switch to that hand-rolled loop, but later we went back to memcmp in 0b006014c8 (hashcmp: use memcmp instead of open-coded loop, 2017-08-09).
>>
>> Looking at that commit, I'm surprised the old logic was just a for loop, instead of a word-based approach, such as the following:
>
> Might work on x86 but it breaks on cpu architectures with stricter alignment. I don't think we have a guarantee that object_id is always 8 byte aligned.

You (and Peff) are probably correct here, which is unfortunate. I'm not familiar with alignment constraints, but assumed that such a word-based approach would be best.

Thanks,
-Stolee
Re: [ANNOUNCE] Git v2.19.0-rc0
Jeff King writes:

> On Wed, Aug 22, 2018 at 12:49:34PM -0400, Derrick Stolee wrote:
>
>>> Yes, that was what I meant. We actually did switch to that hand-rolled loop, but later we went back to memcmp in 0b006014c8 (hashcmp: use memcmp instead of open-coded loop, 2017-08-09).
>>
>> Looking at that commit, I'm surprised the old logic was just a for loop, instead of a word-based approach, such as the following:
>> [...]
>> +struct object_id_20 {
>> +	uint64_t data0;
>> +	uint64_t data1;
>> +	uint32_t data2;
>> +};
>> +
>>  static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
>>  {
>> -	return memcmp(sha1, sha2, the_hash_algo->rawsz);
>> +	if (the_hash_algo->rawsz == 20) {
>> +		struct object_id_20 *obj1 = (struct object_id_20 *)sha1;
>> +		struct object_id_20 *obj2 = (struct object_id_20 *)sha2;
>
> I wonder if you're potentially running afoul of alignment requirements here.

Yup, and I think that all was discussed in that old thread ;-)
Re: [ANNOUNCE] Git v2.19.0-rc0
On Wed, Aug 22, 2018 at 12:49:34PM -0400, Derrick Stolee wrote:

>> Yes, that was what I meant. We actually did switch to that hand-rolled loop, but later we went back to memcmp in 0b006014c8 (hashcmp: use memcmp instead of open-coded loop, 2017-08-09).
>
> Looking at that commit, I'm surprised the old logic was just a for loop, instead of a word-based approach, such as the following:
> [...]
> +struct object_id_20 {
> +	uint64_t data0;
> +	uint64_t data1;
> +	uint32_t data2;
> +};
> +
>  static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
>  {
> -	return memcmp(sha1, sha2, the_hash_algo->rawsz);
> +	if (the_hash_algo->rawsz == 20) {
> +		struct object_id_20 *obj1 = (struct object_id_20 *)sha1;
> +		struct object_id_20 *obj2 = (struct object_id_20 *)sha2;

I wonder if you're potentially running afoul of alignment requirements here.

-Peff
Re: [ANNOUNCE] Git v2.19.0-rc0
On Wed, Aug 22, 2018 at 6:49 PM Derrick Stolee wrote:
>
> On 8/22/2018 12:26 PM, Jeff King wrote:
>> On Wed, Aug 22, 2018 at 06:14:24PM +0200, Duy Nguyen wrote:
>>
>>> On Wed, Aug 22, 2018 at 6:08 PM Duy Nguyen wrote:
>>>> On Wed, Aug 22, 2018 at 6:03 PM Jeff King wrote:
>>>>> On Wed, Aug 22, 2018 at 07:14:42AM -0400, Derrick Stolee wrote:
>>>>>
>>>>>> The other thing I was going to recommend (and I'll try to test this out myself later) is to see if 'the_hash_algo->rawsz' is being treated as a volatile variable, since it is being referenced through a pointer. Perhaps storing the value locally and then casing on it would help?
>>>>>
>>>>> I tried various sprinkling of "const" around the declarations to make it clear that the values wouldn't change once we saw them. But I couldn't detect any difference. At most I think that would let us hoist the "if" out of the loop, but gcc still seems unwilling to expand the memcmp when there are other branches.
>>>>>
>>>>> I think if that's the thing we want to have happen, we really do need to just write it out on that branch rather than saying "memcmp".
>>>>
>>>> This reminds me of an old discussion about memcpy() vs doing explicit compare loop with lots of performance measurements..
>>>
>>> Ah found it. Not sure if it is still relevant in light of multiple hash support
>>>
>>> https://public-inbox.org/git/20110427225114.ga16...@elte.hu/
>>
>> Yes, that was what I meant. We actually did switch to that hand-rolled loop, but later we went back to memcmp in 0b006014c8 (hashcmp: use memcmp instead of open-coded loop, 2017-08-09).
>
> Looking at that commit, I'm surprised the old logic was just a for loop, instead of a word-based approach, such as the following:

Might work on x86 but it breaks on cpu architectures with stricter alignment. I don't think we have a guarantee that object_id is always 8 byte aligned.

> diff --git a/cache.h b/cache.h
> index b1fd3d58ab..5e5819ad49 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1021,9 +1021,41 @@ extern int find_unique_abbrev_r(char *hex, const struct object_id *oid, int len)
>  extern const unsigned char null_sha1[GIT_MAX_RAWSZ];
>  extern const struct object_id null_oid;
>
> +static inline int word_cmp_32(uint32_t a, uint32_t b)
> +{
> +	return memcmp(&a, &b, sizeof(uint32_t));
> +}
> +
> +static inline int word_cmp_64(uint64_t a, uint64_t b)
> +{
> +	return memcmp(&a, &b, sizeof(uint64_t));
> +}
> +
> +struct object_id_20 {
> +	uint64_t data0;
> +	uint64_t data1;
> +	uint32_t data2;
> +};
> +
>  static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
>  {
> -	return memcmp(sha1, sha2, the_hash_algo->rawsz);
> +	if (the_hash_algo->rawsz == 20) {
> +		struct object_id_20 *obj1 = (struct object_id_20 *)sha1;
> +		struct object_id_20 *obj2 = (struct object_id_20 *)sha2;
> +
> +		if (obj1->data0 == obj2->data0) {
> +			if (obj1->data1 == obj2->data1) {
> +				if (obj1->data2 == obj2->data2) {
> +					return 0;
> +				}
> +				return word_cmp_32(obj1->data2, obj2->data2);
> +			}
> +			return word_cmp_64(obj1->data1, obj2->data1);
> +		}
> +		return word_cmp_64(obj1->data0, obj2->data0);
> +	}
> +
> +	assert(0);
>  }
>
>  static inline int oidcmp(const struct object_id *oid1, const struct object_id *oid2)

--
Duy
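One standard way around the alignment objection, for what it's worth (a sketch, not something proposed in the thread): instead of casting to a wider struct, copy the bytes into local words with memcpy(), which compilers lower to unaligned-safe loads on strict-alignment targets and to plain word loads elsewhere.

#include <stdint.h>
#include <string.h>

/*
 * Compare the first 8 bytes of two hashes without assuming any
 * alignment; the memcpy into locals lets the compiler pick safe loads.
 */
static inline int first_words_equal(const unsigned char *a,
				    const unsigned char *b)
{
	uint64_t wa, wb;

	memcpy(&wa, a, sizeof(wa));
	memcpy(&wb, b, sizeof(wb));
	return wa == wb;
}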
Re: [ANNOUNCE] Git v2.19.0-rc0
On 8/22/2018 12:26 PM, Jeff King wrote:
> On Wed, Aug 22, 2018 at 06:14:24PM +0200, Duy Nguyen wrote:
>
>> On Wed, Aug 22, 2018 at 6:08 PM Duy Nguyen wrote:
>>> On Wed, Aug 22, 2018 at 6:03 PM Jeff King wrote:
>>>> On Wed, Aug 22, 2018 at 07:14:42AM -0400, Derrick Stolee wrote:
>>>>
>>>>> The other thing I was going to recommend (and I'll try to test this out myself later) is to see if 'the_hash_algo->rawsz' is being treated as a volatile variable, since it is being referenced through a pointer. Perhaps storing the value locally and then casing on it would help?
>>>>
>>>> I tried various sprinkling of "const" around the declarations to make it clear that the values wouldn't change once we saw them. But I couldn't detect any difference. At most I think that would let us hoist the "if" out of the loop, but gcc still seems unwilling to expand the memcmp when there are other branches.
>>>>
>>>> I think if that's the thing we want to have happen, we really do need to just write it out on that branch rather than saying "memcmp".
>>>
>>> This reminds me of an old discussion about memcpy() vs doing explicit compare loop with lots of performance measurements..
>>
>> Ah found it. Not sure if it is still relevant in light of multiple hash support
>>
>> https://public-inbox.org/git/20110427225114.ga16...@elte.hu/
>
> Yes, that was what I meant. We actually did switch to that hand-rolled loop, but later we went back to memcmp in 0b006014c8 (hashcmp: use memcmp instead of open-coded loop, 2017-08-09).

Looking at that commit, I'm surprised the old logic was just a for loop, instead of a word-based approach, such as the following:

diff --git a/cache.h b/cache.h
index b1fd3d58ab..5e5819ad49 100644
--- a/cache.h
+++ b/cache.h
@@ -1021,9 +1021,41 @@ extern int find_unique_abbrev_r(char *hex, const struct object_id *oid, int len)
 extern const unsigned char null_sha1[GIT_MAX_RAWSZ];
 extern const struct object_id null_oid;
 
+static inline int word_cmp_32(uint32_t a, uint32_t b)
+{
+	return memcmp(&a, &b, sizeof(uint32_t));
+}
+
+static inline int word_cmp_64(uint64_t a, uint64_t b)
+{
+	return memcmp(&a, &b, sizeof(uint64_t));
+}
+
+struct object_id_20 {
+	uint64_t data0;
+	uint64_t data1;
+	uint32_t data2;
+};
+
 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 {
-	return memcmp(sha1, sha2, the_hash_algo->rawsz);
+	if (the_hash_algo->rawsz == 20) {
+		struct object_id_20 *obj1 = (struct object_id_20 *)sha1;
+		struct object_id_20 *obj2 = (struct object_id_20 *)sha2;
+
+		if (obj1->data0 == obj2->data0) {
+			if (obj1->data1 == obj2->data1) {
+				if (obj1->data2 == obj2->data2) {
+					return 0;
+				}
+				return word_cmp_32(obj1->data2, obj2->data2);
+			}
+			return word_cmp_64(obj1->data1, obj2->data1);
+		}
+		return word_cmp_64(obj1->data0, obj2->data0);
+	}
+
+	assert(0);
 }
 
 static inline int oidcmp(const struct object_id *oid1, const struct object_id *oid2)
Re: [ANNOUNCE] Git v2.19.0-rc0
On Wed, Aug 22, 2018 at 06:14:24PM +0200, Duy Nguyen wrote:

> On Wed, Aug 22, 2018 at 6:08 PM Duy Nguyen wrote:
> >
> > On Wed, Aug 22, 2018 at 6:03 PM Jeff King wrote:
> > >
> > > On Wed, Aug 22, 2018 at 07:14:42AM -0400, Derrick Stolee wrote:
> > >
> > > > The other thing I was going to recommend (and I'll try to test this
> > > > out myself later) is to see if 'the_hash_algo->rawsz' is being
> > > > treated as a volatile variable, since it is being referenced through
> > > > a pointer. Perhaps storing the value locally and then casing on it
> > > > would help?
> > >
> > > I tried various sprinklings of "const" around the declarations to make
> > > it clear that the values wouldn't change once we saw them. But I
> > > couldn't detect any difference. At most I think that would let us
> > > hoist the "if" out of the loop, but gcc still seems unwilling to
> > > expand the memcmp when there are other branches.
> > >
> > > I think if that's the thing we want to have happen, we really do need
> > > to just write it out on that branch rather than saying "memcmp".
> >
> > This reminds me of an old discussion about memcmp() vs doing an
> > explicit compare loop, with lots of performance measurements..
>
> Ah found it. Not sure if it is still relevant in light of multiple hash
> support
>
> https://public-inbox.org/git/20110427225114.ga16...@elte.hu/

Yes, that was what I meant. We actually did switch to that hand-rolled
loop, but later we went back to memcmp in 0b006014c8 (hashcmp: use
memcmp instead of open-coded loop, 2017-08-09).

-Peff
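For reference, the open-coded loop in question looked roughly like this
(reconstructed from memory of what 0b006014c8 removed, with the name
changed here to avoid confusion with the current function, so treat the
details as approximate):

    static inline int hashcmp_loop(const unsigned char *sha1, const unsigned char *sha2)
    {
        int i;

        /* byte-by-byte compare; returns the sign of the first differing byte */
        for (i = 0; i < 20; i++, sha1++, sha2++) {
            if (*sha1 != *sha2)
                return *sha1 - *sha2;
        }

        return 0;
    }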
Re: [ANNOUNCE] Git v2.19.0-rc0
On Wed, Aug 22, 2018 at 6:08 PM Duy Nguyen wrote:
>
> On Wed, Aug 22, 2018 at 6:03 PM Jeff King wrote:
> >
> > On Wed, Aug 22, 2018 at 07:14:42AM -0400, Derrick Stolee wrote:
> >
> > > The other thing I was going to recommend (and I'll try to test this
> > > out myself later) is to see if 'the_hash_algo->rawsz' is being treated
> > > as a volatile variable, since it is being referenced through a
> > > pointer. Perhaps storing the value locally and then casing on it would
> > > help?
> >
> > I tried various sprinklings of "const" around the declarations to make
> > it clear that the values wouldn't change once we saw them. But I
> > couldn't detect any difference. At most I think that would let us hoist
> > the "if" out of the loop, but gcc still seems unwilling to expand the
> > memcmp when there are other branches.
> >
> > I think if that's the thing we want to have happen, we really do need to
> > just write it out on that branch rather than saying "memcmp".
>
> This reminds me of an old discussion about memcmp() vs doing an explicit
> compare loop, with lots of performance measurements..

Ah found it. Not sure if it is still relevant in light of multiple hash
support

https://public-inbox.org/git/20110427225114.ga16...@elte.hu/

--
Duy
Re: [ANNOUNCE] Git v2.19.0-rc0
On Wed, Aug 22, 2018 at 6:03 PM Jeff King wrote:
>
> On Wed, Aug 22, 2018 at 07:14:42AM -0400, Derrick Stolee wrote:
>
> > The other thing I was going to recommend (and I'll try to test this out
> > myself later) is to see if 'the_hash_algo->rawsz' is being treated as a
> > volatile variable, since it is being referenced through a pointer.
> > Perhaps storing the value locally and then casing on it would help?
>
> I tried various sprinklings of "const" around the declarations to make it
> clear that the values wouldn't change once we saw them. But I couldn't
> detect any difference. At most I think that would let us hoist the "if"
> out of the loop, but gcc still seems unwilling to expand the memcmp when
> there are other branches.
>
> I think if that's the thing we want to have happen, we really do need to
> just write it out on that branch rather than saying "memcmp".

This reminds me of an old discussion about memcmp() vs doing an explicit
compare loop, with lots of performance measurements.. Is that what you
meant by "write it out"?

--
Duy
Re: [ANNOUNCE] Git v2.19.0-rc0
On Wed, Aug 22, 2018 at 10:28:56AM -0400, Derrick Stolee wrote:

> In my testing, I've had the best luck with this change:
>
> diff --git a/cache.h b/cache.h
> index b1fd3d58ab..6c8b51c390 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1023,7 +1023,14 @@ extern const struct object_id null_oid;
>
>  static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
>  {
> -	return memcmp(sha1, sha2, the_hash_algo->rawsz);
> +	switch (the_hash_algo->rawsz) {
> +	case 20:
> +		return memcmp(sha1, sha2, 20);
> +	case 32:
> +		return memcmp(sha1, sha2, 32);
> +	default:
> +		assert(0);
> +	}
>  }
>
> The fact that '20' and '32' are constants here may be helpful to the
> compiler. Can someone else test the perf?

I tested that one last night (and just re-tested it now to be sure). It
seems to just generate two separate calls to memcmp, with no speed
improvement.

-Peff
Re: [ANNOUNCE] Git v2.19.0-rc0
On Wed, Aug 22, 2018 at 08:42:20AM -0400, Paul Smith wrote:

> On Tue, 2018-08-21 at 23:03 -0400, Jeff King wrote:
> >  static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
> >  {
> > +	assert(the_hash_algo->rawsz == 20);
> >  	return memcmp(sha1, sha2, the_hash_algo->rawsz);
> >  }
>
> I'm not familiar with Git code, but for most environments assert() is a
> macro which is compiled out when built for "release mode" (whatever
> that might mean). If that's the case for Git too, then relying on
> assert() to provide a side-effect (even an optimizer hint side-effect)
> won't work and this will actually get slower when built for "release
> mode".
>
> Just a thought...

We don't have such a "release mode" in Git, though of course people may
pass -DNDEBUG to the compiler if they want.

However, to me how we spell the assert is mostly orthogonal to the
discussion. We can do "if (...) BUG(...)" to get a guaranteed-present
conditional. The bigger questions are:

  - are we OK with such an assertion; and

  - does the assertion still give us the desired behavior when we add in
    a branch for rawsz==32?

And I think the answers for those are both "probably not".

-Peff
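To make the "guaranteed-present conditional" concrete, here is a small
sketch of that shape, using a simplified stand-in for Git's BUG() macro
(the real one also reports the file and line) and passing the size as a
parameter so the snippet is self-contained:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* simplified stand-in for Git's BUG() */
    #define BUG(msg) do { fprintf(stderr, "BUG: %s\n", msg); abort(); } while (0)

    static inline int hashcmp_checked(const unsigned char *sha1,
                                      const unsigned char *sha2,
                                      size_t rawsz)
    {
        /* unlike assert(), this check survives -DNDEBUG builds */
        if (rawsz != 20)
            BUG("hash sizes other than 20 are not handled here");
        return memcmp(sha1, sha2, 20);
    }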
Re: [ANNOUNCE] Git v2.19.0-rc0
On Wed, Aug 22, 2018 at 07:14:42AM -0400, Derrick Stolee wrote:

> The other thing I was going to recommend (and I'll try to test this out
> myself later) is to see if 'the_hash_algo->rawsz' is being treated as a
> volatile variable, since it is being referenced through a pointer.
> Perhaps storing the value locally and then casing on it would help?

I tried various sprinklings of "const" around the declarations to make it
clear that the values wouldn't change once we saw them. But I couldn't
detect any difference. At most I think that would let us hoist the "if"
out of the loop, but gcc still seems unwilling to expand the memcmp when
there are other branches.

I think if that's the thing we want to have happen, we really do need to
just write it out on that branch rather than saying "memcmp".

-Peff
Re: [ANNOUNCE] Git v2.19.0-rc0
On Wed, Aug 22, 2018 at 09:39:57AM +0200, Ævar Arnfjörð Bjarmason wrote:

> > I don't have a good option. The assert() thing works until I add in the
> > "32" branch, but that's just punting the issue off until you add support
> > for the new hash.
> >
> > Hand-rolling our own asm or C is a portability headache, and we need to
> > change all of the callsites to use a new hasheq().
> >
> > Hiding it behind a per-hash function is conceptually cleanest, but not
> > quite as fast. And it also requires hasheq().
> >
> > So all of the solutions seem non-trivial. Again, I'm starting to wonder
> > if it's worth chasing this few percent.
>
> Did you try __builtin_expect? It's a GCC builtin for these sorts of
> situations, and sometimes helps:
> https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html
>
> I.e. you'd tell GCC we expect to have the 20 there with:
>
>     if (__builtin_expect(the_hash_algo->rawsz == 20, 1)) { ... }
>
> The perl codebase has LIKELY() and UNLIKELY() macros for this which if
> the feature isn't available fall back on just plain C code:
> https://github.com/Perl/perl5/blob/v5.27.7/perl.h#L3335-L3344

Sadly, no, this doesn't seem to change anything. We still end up with a
single call to memcmp. I also tried "hiding" the fallback call like this:

diff --git a/cache.h b/cache.h
index b1fd3d58ab..7808bf3d6b 100644
--- a/cache.h
+++ b/cache.h
@@ -1021,9 +1021,13 @@ extern int find_unique_abbrev_r(char *hex, const struct object_id *oid, int len)
 extern const unsigned char null_sha1[GIT_MAX_RAWSZ];
 extern const struct object_id null_oid;

+int super_secret_memcmp(const void *a, const void *b, size_t len);
+
 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 {
-	return memcmp(sha1, sha2, the_hash_algo->rawsz);
+	if (the_hash_algo->rawsz == 20)
+		return memcmp(sha1, sha2, 20);
+	return super_secret_memcmp(sha1, sha2, the_hash_algo->rawsz);
 }

 static inline int oidcmp(const struct object_id *oid1, const struct object_id *oid2)
diff --git a/sha1-file.c b/sha1-file.c
index 97b7423848..5cd0a4b73f 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -2280,3 +2280,8 @@ int read_loose_object(const char *path,
 	munmap(map, mapsize);
 	return ret;
 }
+
+int super_secret_memcmp(const void *a, const void *b, size_t len)
+{
+	return memcmp(a, b, len);
+}

but that just results in calling memcmp and super_secret_memcmp on the
two codepaths (with or without the __builtin_expect).

-Peff
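The LIKELY()/UNLIKELY() pattern Ævar mentions is easy to sketch: use the
GCC/Clang builtin when it is available and fall back to a no-op
otherwise. The names here are illustrative; Git itself does not define
these macros:

    #if defined(__GNUC__) || defined(__clang__)
    #define LIKELY(x)   __builtin_expect(!!(x), 1)
    #define UNLIKELY(x) __builtin_expect(!!(x), 0)
    #else
    #define LIKELY(x)   (x)
    #define UNLIKELY(x) (x)
    #endif

    /* usage: hint that the SHA-1 width is the hot case */
    if (LIKELY(the_hash_algo->rawsz == 20))
        return memcmp(sha1, sha2, 20);

Note that __builtin_expect only steers block placement and branch
prediction; as the measurement above shows, it does not force the
compiler to inline a constant-size memcmp.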
Re: [ANNOUNCE] Git v2.19.0-rc0
On 8/22/2018 1:36 AM, brian m. carlson wrote:
> On Tue, Aug 21, 2018 at 11:03:44PM -0400, Jeff King wrote:
> > So I wonder if there's some other way to tell the compiler that we'll
> > only have a few values. An enum comes to mind, though I don't think the
> > enum rules are strict enough to make this guarantee (after all, it's OK
> > to bitwise-OR enums, so they clearly don't specify all possible
> > values).
>
> I was thinking about this:
>
> diff --git a/cache.h b/cache.h
> index 1398b2a4e4..1f5c6e9319 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1033,7 +1033,14 @@ extern const struct object_id null_oid;
>
>  static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
>  {
> -	return memcmp(sha1, sha2, the_hash_algo->rawsz);
> +	switch (the_hash_algo->rawsz) {
> +	case 20:
> +		return memcmp(sha1, sha2, 20);
> +	case 32:
> +		return memcmp(sha1, sha2, 32);
> +	default:
> +		assert(0);
> +	}
>  }
>
>  static inline int oidcmp(const struct object_id *oid1, const struct object_id *oid2)
>
> That would make it obvious that there are at most two options.
> Unfortunately, gcc for me determines that the buffer in walker.c is 20
> bytes in size and steadfastly refuses to compile because it doesn't know
> that the value will never be 32 in our codebase currently. I'd need to
> send in more patches before it would compile.
>
> I don't know if something like this is an improvement or not, but this
> seems to at least compile:
>
> diff --git a/cache.h b/cache.h
> index 1398b2a4e4..3207f74771 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1033,7 +1033,13 @@ extern const struct object_id null_oid;
>
>  static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
>  {
> -	return memcmp(sha1, sha2, the_hash_algo->rawsz);
> +	switch (the_hash_algo->rawsz) {
> +	case 20:
> +	case 32:
> +		return memcmp(sha1, sha2, the_hash_algo->rawsz);
> +	default:
> +		assert(0);
> +	}
>  }

In my testing, I've had the best luck with this change:

diff --git a/cache.h b/cache.h
index b1fd3d58ab..6c8b51c390 100644
--- a/cache.h
+++ b/cache.h
@@ -1023,7 +1023,14 @@ extern const struct object_id null_oid;

 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 {
-	return memcmp(sha1, sha2, the_hash_algo->rawsz);
+	switch (the_hash_algo->rawsz) {
+	case 20:
+		return memcmp(sha1, sha2, 20);
+	case 32:
+		return memcmp(sha1, sha2, 32);
+	default:
+		assert(0);
+	}
 }

The fact that '20' and '32' are constants here may be helpful to the
compiler. Can someone else test the perf?

Thanks,
-Stolee
Re: [ANNOUNCE] Git v2.19.0-rc0
On Tue, 2018-08-21 at 23:03 -0400, Jeff King wrote:
>  static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
>  {
> +	assert(the_hash_algo->rawsz == 20);
>  	return memcmp(sha1, sha2, the_hash_algo->rawsz);
>  }

I'm not familiar with Git code, but for most environments assert() is a
macro which is compiled out when built for "release mode" (whatever that
might mean). If that's the case for Git too, then relying on assert() to
provide a side-effect (even an optimizer hint side-effect) won't work,
and this will actually get slower when built for "release mode".

Just a thought...
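A minimal demonstration of the point above: under -DNDEBUG, assert()
expands to nothing, so any optimizer hint it carried disappears.
Comparing the generated assembly with and without the flag (e.g.
"cc -O2 -S demo.c" vs "cc -O2 -DNDEBUG -S demo.c") makes this visible:

    #include <assert.h>
    #include <string.h>

    int cmp_n(const unsigned char *a, const unsigned char *b, size_t n)
    {
        assert(n == 20);	/* compiled out entirely under -DNDEBUG */
        return memcmp(a, b, n);
    }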
Re: [ANNOUNCE] Git v2.19.0-rc0
On 8/22/2018 3:39 AM, Ævar Arnfjörð Bjarmason wrote:
> On Wed, Aug 22, 2018 at 8:20 AM Jeff King wrote:
> > On Wed, Aug 22, 2018 at 05:36:26AM +0000, brian m. carlson wrote:
> > > On Tue, Aug 21, 2018 at 11:03:44PM -0400, Jeff King wrote:
> > >
> > > I don't know if something like this is an improvement or not, but
> > > this seems to at least compile:
> > >
> > > diff --git a/cache.h b/cache.h
> > > index 1398b2a4e4..3207f74771 100644
> > > --- a/cache.h
> > > +++ b/cache.h
> > > @@ -1033,7 +1033,13 @@ extern const struct object_id null_oid;
> > >
> > >  static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
> > >  {
> > > -	return memcmp(sha1, sha2, the_hash_algo->rawsz);
> > > +	switch (the_hash_algo->rawsz) {
> > > +	case 20:
> > > +	case 32:
> > > +		return memcmp(sha1, sha2, the_hash_algo->rawsz);
> > > +	default:
> > > +		assert(0);
> > > +	}
> >
> > I think that would end up with the same slow code, as gcc would rather
> > call memcmp than expand out the two sets of asm.
> >
> > > I won't have time to sit down and test this out until tomorrow
> > > afternoon at the earliest. If you want to send in something in the
> > > mean time, even if that limits things to just 20 for now, that's fine.
> >
> > I don't have a good option. The assert() thing works until I add in the
> > "32" branch, but that's just punting the issue off until you add
> > support for the new hash.
> >
> > Hand-rolling our own asm or C is a portability headache, and we need to
> > change all of the callsites to use a new hasheq().
> >
> > Hiding it behind a per-hash function is conceptually cleanest, but not
> > quite as fast. And it also requires hasheq().
> >
> > So all of the solutions seem non-trivial. Again, I'm starting to wonder
> > if it's worth chasing this few percent.
>
> Did you try __builtin_expect? It's a GCC builtin for these sorts of
> situations, and sometimes helps:
> https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html
>
> I.e. you'd tell GCC we expect to have the 20 there with:
>
>     if (__builtin_expect(the_hash_algo->rawsz == 20, 1)) { ... }
>
> The perl codebase has LIKELY() and UNLIKELY() macros for this which if
> the feature isn't available fall back on just plain C code:
> https://github.com/Perl/perl5/blob/v5.27.7/perl.h#L3335-L3344

The other thing I was going to recommend (and I'll try to test this out
myself later) is to see if 'the_hash_algo->rawsz' is being treated as a
volatile variable, since it is being referenced through a pointer.
Perhaps storing the value locally and then casing on it would help?
Re: [ANNOUNCE] Git v2.19.0-rc0
On 8/21/2018 11:36 PM, Jeff King wrote:
> On Tue, Aug 21, 2018 at 11:03:44PM -0400, Jeff King wrote:
> > with the obvious "oideq()" implementation added, that seems to get me
> > to 2-3%. Not _quite_ as good as the original branching version I
> > showed. And we had to touch all the callsites (although arguably that
> > kind of "eq" function is a better interface anyway, since it obviously
> > allows for more optimization).
> >
> > So maybe the branching thing is actually not so insane. It makes new
> > hash_algo's Just Work; they just won't be optimized. And the change is
> > very localized.
>
> Hmph. So I went back to double-check my measurements on that branching
> version, and I couldn't replicate it!

I'm actually relieved to see this, as I couldn't either.

> It turns out what I showed (and measured) before has a bug. Can you see
> it?

I had rewritten the section from scratch instead of applying your diff,
so I didn't get the sha1-sha1 error. I decided to sleep on it instead of
sending my email.

> So the assert() version really is the fastest.
>
> I didn't test, but I suspect we could "trick" the compiler by having the
> fallback call an opaque wrapper around memcmp(). That would prevent it
> from combining the two paths, and presumably it would still optimize the
> constant-20 side. Or maybe it would eventually decide our inline
> function is getting too big and scrap it. Which probably crosses a line
> of craziness (if I didn't already cross it two emails ago).

I appreciate your effort here.

Thanks,
-Stolee
Re: [ANNOUNCE] Git v2.19.0-rc0
On Wed, Aug 22, 2018 at 8:20 AM Jeff King wrote:
>
> On Wed, Aug 22, 2018 at 05:36:26AM +0000, brian m. carlson wrote:
>
> > On Tue, Aug 21, 2018 at 11:03:44PM -0400, Jeff King wrote:
> > > So I wonder if there's some other way to tell the compiler that we'll
> > > only have a few values. An enum comes to mind, though I don't think
> > > the enum rules are strict enough to make this guarantee (after all,
> > > it's OK to bitwise-OR enums, so they clearly don't specify all
> > > possible values).
> >
> > I was thinking about this:
> >
> > diff --git a/cache.h b/cache.h
> > index 1398b2a4e4..1f5c6e9319 100644
> > --- a/cache.h
> > +++ b/cache.h
> > @@ -1033,7 +1033,14 @@ extern const struct object_id null_oid;
> >
> >  static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
> >  {
> > -	return memcmp(sha1, sha2, the_hash_algo->rawsz);
> > +	switch (the_hash_algo->rawsz) {
> > +	case 20:
> > +		return memcmp(sha1, sha2, 20);
> > +	case 32:
> > +		return memcmp(sha1, sha2, 32);
> > +	default:
> > +		assert(0);
> > +	}
> >  }
>
> Unfortunately this version doesn't seem to be any faster than the status
> quo. And looking at the generated asm, it still looks to be calling
> memcmp(). Removing the "case 32" branch switches it back to fast
> assembly (this is all using gcc 8.2.0, btw). So I think we're deep into
> guessing what the optimizer is going to do, and there's a good chance
> that other versions are going to optimize it differently.
>
> We might be better off just writing it out manually. Unfortunately, it's
> a bit hard because the neg/0/pos return is more expensive to compute
> than pure equality. And only the compiler knows at each inlined site
> whether we actually want equality. So now we're back to switching every
> caller to use hasheq() if that's what they want.
>
> But _if_ we're OK with that, and _if_ we don't mind some ifdefs for
> portability, then this seems as fast as the original (memcmp+constant)
> code on my machine:
>
> diff --git a/cache.h b/cache.h
> index b1fd3d58ab..c406105f3c 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1023,7 +1023,16 @@ extern const struct object_id null_oid;
>
>  static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
>  {
> -	return memcmp(sha1, sha2, the_hash_algo->rawsz);
> +	switch (the_hash_algo->rawsz) {
> +	case 20:
> +		if (*(uint32_t *)sha1 == *(uint32_t *)sha2 &&
> +		    *(unsigned __int128 *)(sha1+4) == *(unsigned __int128 *)(sha2+4))
> +			return 0;
> +	case 32:
> +		return memcmp(sha1, sha2, 32);
> +	default:
> +		assert(0);
> +	}
>  }
>
>  static inline int oidcmp(const struct object_id *oid1, const struct object_id *oid2)
>
> Which is really no surprise, because the generated asm looks about the
> same. There are obviously alignment questions there. It's possible it
> could even be written portably as a simple loop. Or maybe not. We used
> to do that, but modern compilers were able to optimize the memcmp
> better. Maybe that's changed. Or maybe they were simply unwilling to
> unroll a 20-length loop to find out that it could be turned into a few
> quad-word compares.
>
> > That would make it obvious that there are at most two options.
> > Unfortunately, gcc for me determines that the buffer in walker.c is 20
> > bytes in size and steadfastly refuses to compile because it doesn't
> > know that the value will never be 32 in our codebase currently. I'd
> > need to send in more patches before it would compile.
>
> Yeah, I see that warning all over the place (everywhere that calls
> is_null_oid(), which is passing in a 20-byte buffer).
>
> > I don't know if something like this is an improvement or not, but this
> > seems to at least compile:
> >
> > diff --git a/cache.h b/cache.h
> > index 1398b2a4e4..3207f74771 100644
> > --- a/cache.h
> > +++ b/cache.h
> > @@ -1033,7 +1033,13 @@ extern const struct object_id null_oid;
> >
> >  static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
> >  {
> > -	return memcmp(sha1, sha2, the_hash_algo->rawsz);
> > +	switch (the_hash_algo->rawsz) {
> > +	case 20:
> > +	case 32:
> > +		return memcmp(sha1, sha2, the_hash_algo->rawsz);
> > +	default:
> > +		assert(0);
> > +	}
>
> I think that would end up with the same slow code, as gcc would rather
> call memcmp than expand out the two sets of asm.
>
> > I won't have time to sit down and test this out until tomorrow
> > afternoon at the earliest. If you want to send in something in the mean
> > time, even if that limits things to just 20 for now, that's fine.
>
> I don't have a good option. The assert() thing works until I add in the
> "32" branch, but that's just punting the issue off until you add support
> for the new hash.
>
> Hand-rolling our own asm or C is a portability headache, and we need to
> change all of the callsites to use a new hasheq().
>
> Hiding it behind a per-hash function is conceptually cleanest, but not
> quite as fast. And it also requires hasheq().
>
> So all of the solutions seem non-trivial. Again, I'm starting to wonder
> if it's worth chasing this few percent.

Did you try __builtin_expect? It's a GCC builtin for these sorts of
situations, and sometimes helps:
https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html

I.e. you'd tell GCC we expect to have the 20 there with:

    if (__builtin_expect(the_hash_algo->rawsz == 20, 1)) { ... }

The perl codebase has LIKELY() and UNLIKELY() macros for this which if
the feature isn't available fall back on just plain C code:
https://github.com/Perl/perl5/blob/v5.27.7/perl.h#L3335-L3344
Re: [ANNOUNCE] Git v2.19.0-rc0
On Wed, Aug 22, 2018 at 05:36:26AM +0000, brian m. carlson wrote:

> On Tue, Aug 21, 2018 at 11:03:44PM -0400, Jeff King wrote:
> > So I wonder if there's some other way to tell the compiler that we'll
> > only have a few values. An enum comes to mind, though I don't think the
> > enum rules are strict enough to make this guarantee (after all, it's OK
> > to bitwise-OR enums, so they clearly don't specify all possible
> > values).
>
> I was thinking about this:
>
> diff --git a/cache.h b/cache.h
> index 1398b2a4e4..1f5c6e9319 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1033,7 +1033,14 @@ extern const struct object_id null_oid;
>
>  static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
>  {
> -	return memcmp(sha1, sha2, the_hash_algo->rawsz);
> +	switch (the_hash_algo->rawsz) {
> +	case 20:
> +		return memcmp(sha1, sha2, 20);
> +	case 32:
> +		return memcmp(sha1, sha2, 32);
> +	default:
> +		assert(0);
> +	}
>  }

Unfortunately this version doesn't seem to be any faster than the status
quo. And looking at the generated asm, it still looks to be calling
memcmp(). Removing the "case 32" branch switches it back to fast
assembly (this is all using gcc 8.2.0, btw). So I think we're deep into
guessing what the optimizer is going to do, and there's a good chance
that other versions are going to optimize it differently.

We might be better off just writing it out manually. Unfortunately, it's
a bit hard because the neg/0/pos return is more expensive to compute
than pure equality. And only the compiler knows at each inlined site
whether we actually want equality. So now we're back to switching every
caller to use hasheq() if that's what they want.

But _if_ we're OK with that, and _if_ we don't mind some ifdefs for
portability, then this seems as fast as the original (memcmp+constant)
code on my machine:

diff --git a/cache.h b/cache.h
index b1fd3d58ab..c406105f3c 100644
--- a/cache.h
+++ b/cache.h
@@ -1023,7 +1023,16 @@ extern const struct object_id null_oid;

 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 {
-	return memcmp(sha1, sha2, the_hash_algo->rawsz);
+	switch (the_hash_algo->rawsz) {
+	case 20:
+		if (*(uint32_t *)sha1 == *(uint32_t *)sha2 &&
+		    *(unsigned __int128 *)(sha1+4) == *(unsigned __int128 *)(sha2+4))
+			return 0;
+	case 32:
+		return memcmp(sha1, sha2, 32);
+	default:
+		assert(0);
+	}
 }

 static inline int oidcmp(const struct object_id *oid1, const struct object_id *oid2)

Which is really no surprise, because the generated asm looks about the
same. There are obviously alignment questions there. It's possible it
could even be written portably as a simple loop. Or maybe not. We used
to do that, but modern compilers were able to optimize the memcmp
better. Maybe that's changed. Or maybe they were simply unwilling to
unroll a 20-length loop to find out that it could be turned into a few
quad-word compares.

> That would make it obvious that there are at most two options.
> Unfortunately, gcc for me determines that the buffer in walker.c is 20
> bytes in size and steadfastly refuses to compile because it doesn't know
> that the value will never be 32 in our codebase currently. I'd need to
> send in more patches before it would compile.

Yeah, I see that warning all over the place (everywhere that calls
is_null_oid(), which is passing in a 20-byte buffer).

> I don't know if something like this is an improvement or not, but this
> seems to at least compile:
>
> diff --git a/cache.h b/cache.h
> index 1398b2a4e4..3207f74771 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1033,7 +1033,13 @@ extern const struct object_id null_oid;
>
>  static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
>  {
> -	return memcmp(sha1, sha2, the_hash_algo->rawsz);
> +	switch (the_hash_algo->rawsz) {
> +	case 20:
> +	case 32:
> +		return memcmp(sha1, sha2, the_hash_algo->rawsz);
> +	default:
> +		assert(0);
> +	}

I think that would end up with the same slow code, as gcc would rather
call memcmp than expand out the two sets of asm.

> I won't have time to sit down and test this out until tomorrow afternoon
> at the earliest. If you want to send in something in the mean time, even
> if that limits things to just 20 for now, that's fine.

I don't have a good option. The assert() thing works until I add in the
"32" branch, but that's just punting the issue off until you add support
for the new hash.

Hand-rolling our own asm or C is a portability headache, and we need to
change all of the callsites to use a new hasheq().

Hiding it behind a per-hash function is conceptually cleanest, but not
quite as fast. And it also requires hasheq().

So all of the solutions seem non-trivial. Again, I'm starting to wonder
if it's worth chasing this few percent.

-Peff
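Since several branches of this thread converge on a hasheq()-style
interface, here is a minimal sketch of what the equality-only pair could
look like in the context of cache.h (a sketch of the idea under
discussion, not necessarily the form that eventually lands):

    static inline int hasheq(const unsigned char *sha1, const unsigned char *sha2)
    {
        /*
         * Equality-only: branching to a constant size lets the compiler
         * inline the comparison, and no neg/0/pos ordering is computed.
         */
        if (the_hash_algo->rawsz == 20)
            return !memcmp(sha1, sha2, 20);
        return !memcmp(sha1, sha2, the_hash_algo->rawsz);
    }

    static inline int oideq(const struct object_id *oid1,
                            const struct object_id *oid2)
    {
        return hasheq(oid1->hash, oid2->hash);
    }

Callers that only test equality would then write "if (oideq(a, b))"
instead of "if (!oidcmp(a, b))".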
Re: [ANNOUNCE] Git v2.19.0-rc0
On Tue, Aug 21, 2018 at 11:03:44PM -0400, Jeff King wrote:

> So I wonder if there's some other way to tell the compiler that we'll
> only have a few values. An enum comes to mind, though I don't think the
> enum rules are strict enough to make this guarantee (after all, it's OK
> to bitwise-OR enums, so they clearly don't specify all possible values).

I was thinking about this:

diff --git a/cache.h b/cache.h
index 1398b2a4e4..1f5c6e9319 100644
--- a/cache.h
+++ b/cache.h
@@ -1033,7 +1033,14 @@ extern const struct object_id null_oid;

 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 {
-	return memcmp(sha1, sha2, the_hash_algo->rawsz);
+	switch (the_hash_algo->rawsz) {
+	case 20:
+		return memcmp(sha1, sha2, 20);
+	case 32:
+		return memcmp(sha1, sha2, 32);
+	default:
+		assert(0);
+	}
 }

 static inline int oidcmp(const struct object_id *oid1, const struct object_id *oid2)

That would make it obvious that there are at most two options.
Unfortunately, gcc for me determines that the buffer in walker.c is 20
bytes in size and steadfastly refuses to compile because it doesn't know
that the value will never be 32 in our codebase currently. I'd need to
send in more patches before it would compile.

I don't know if something like this is an improvement or not, but this
seems to at least compile:

diff --git a/cache.h b/cache.h
index 1398b2a4e4..3207f74771 100644
--- a/cache.h
+++ b/cache.h
@@ -1033,7 +1033,13 @@ extern const struct object_id null_oid;

 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 {
-	return memcmp(sha1, sha2, the_hash_algo->rawsz);
+	switch (the_hash_algo->rawsz) {
+	case 20:
+	case 32:
+		return memcmp(sha1, sha2, the_hash_algo->rawsz);
+	default:
+		assert(0);
+	}
 }

 static inline int oidcmp(const struct object_id *oid1, const struct object_id *oid2)

I won't have time to sit down and test this out until tomorrow afternoon
at the earliest. If you want to send in something in the mean time, even
if that limits things to just 20 for now, that's fine.

--
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204
Re: [ANNOUNCE] Git v2.19.0-rc0
On Tue, Aug 21, 2018 at 11:03:44PM -0400, Jeff King wrote:

> with the obvious "oideq()" implementation added, that seems to get me to
> 2-3%. Not _quite_ as good as the original branching version I showed.
> And we had to touch all the callsites (although arguably that kind of
> "eq" function is a better interface anyway, since it obviously allows
> for more optimization).
>
> So maybe the branching thing is actually not so insane. It makes new
> hash_algo's Just Work; they just won't be optimized. And the change is
> very localized.

Hmph. So I went back to double-check my measurements on that branching
version, and I couldn't replicate it! It turns out what I showed (and
measured) before has a bug. Can you see it?

diff --git a/cache.h b/cache.h
index b1fd3d58ab..9c004a26c9 100644
--- a/cache.h
+++ b/cache.h
@@ -1023,7 +1023,10 @@ extern const struct object_id null_oid;

 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 {
-	return memcmp(sha1, sha2, the_hash_algo->rawsz);
+	if (the_hash_algo->rawsz == 20)
+		return memcmp(sha1, sha2, 20);
+	else
+		return memcmp(sha1, sha1, the_hash_algo->rawsz);
 }

 static inline int oidcmp(const struct object_id *oid1, const struct object_id *oid2)

The problem is the fallback code compares "sha1" to "sha1". The compiler
realizes that's a noop and is able to treat it like a constant. Thus
essentially leaving only the first branch, which it then expands into a
few instructions.

If we fix that bug, then we really do memcmp on either side of the
conditional. And the compiler is smart enough to realize that hey,
that's the same as just calling memcmp with the_hash_algo->rawsz on
either side. And we end up with roughly the same code that we started
with.

So the assert() version really is the fastest.

I didn't test, but I suspect we could "trick" the compiler by having the
fallback call an opaque wrapper around memcmp(). That would prevent it
from combining the two paths, and presumably it would still optimize the
constant-20 side. Or maybe it would eventually decide our inline
function is getting too big and scrap it. Which probably crosses a line
of craziness (if I didn't already cross it two emails ago).

-Peff
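For clarity, the corrected form of that branching patch, with sha2 on
both paths, is:

    static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
    {
        if (the_hash_algo->rawsz == 20)
            return memcmp(sha1, sha2, 20);
        else
            return memcmp(sha1, sha2, the_hash_algo->rawsz);
    }

As described above, once the typo is fixed the compiler notices that
both arms are the same memcmp() call and merges them back into a single
variable-length comparison, which erases the apparent speedup.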
Re: [ANNOUNCE] Git v2.19.0-rc0
On Wed, Aug 22, 2018 at 12:48:16AM +0000, brian m. carlson wrote:

> > diff --git a/cache.h b/cache.h
> > index b1fd3d58ab..9c004a26c9 100644
> > --- a/cache.h
> > +++ b/cache.h
> > @@ -1023,7 +1023,10 @@ extern const struct object_id null_oid;
> >
> >  static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
> >  {
> > -	return memcmp(sha1, sha2, the_hash_algo->rawsz);
> > +	if (the_hash_algo->rawsz == 20)
> > +		return memcmp(sha1, sha2, 20);
> > +	else
> > +		return memcmp(sha1, sha1, the_hash_algo->rawsz);
> >  }
> >
> >  static inline int oidcmp(const struct object_id *oid1, const struct object_id *oid2)
> >
> > on top of v2.19-rc0 seems to give me about a 3% speedup (though I might
> > be imagining it, as there's a bit of noise). A function pointer in
> > the_hash_algo might make even more sense.
>
> It's possible that might be a better solution. I looked into a GCC
> assertion that the value was either 20 or 32, and that in itself didn't
> seem to help, at least in the generated code. Your solution is likely
> better in that regard.
>
> We could wire it up to be either 20 or 32 and let people experimenting
> with other sizes of algorithms just add another branch. I haven't
> tested how that performs, though.

Here's a _really_ dirty one:

diff --git a/cache.h b/cache.h
index b1fd3d58ab..a6750524ea 100644
--- a/cache.h
+++ b/cache.h
@@ -1023,6 +1023,7 @@ extern const struct object_id null_oid;

 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 {
+	assert(the_hash_algo->rawsz == 20);
 	return memcmp(sha1, sha2, the_hash_algo->rawsz);
 }

We probably don't want to do that, because it makes experimenting with
new hash algos a bit painful, but it gives the same 3-4% speedup pretty
consistently. But I think it demonstrates pretty clearly that giving the
compiler the extra limit information is sufficient. Presumably the
fixed-size memcmp turns into a few multi-word compares.

And indeed, if I look at the generated asm for the call in lookup_object
(which is likely the one we're hitting a lot in this case), I see:

# cache.h:1027: return memcmp(sha1, sha2, the_hash_algo->rawsz);
	.loc 4 1027 9 is_stmt 0 view .LVU86
	movq	(%rsi), %rcx	# MEM[(void *)sha1_25(D)], MEM[(void *)sha1_25(D)]
	movq	8(%rsi), %rdi	# MEM[(void *)sha1_25(D)], tmp125
	xorq	4(%rax), %rcx	# MEM[(void *)_6], tmp116
	xorq	8(%r8), %rdi	# MEM[(void *)_6], tmp115
	orq	%rcx, %rdi	# tmp116, tmp115
	jne	.L27	#,
	movl	16(%r8), %ecx	# MEM[(void *)_6], tmp129
	cmpl	%ecx, 16(%rsi)	# tmp129, MEM[(void *)sha1_25(D)]
	jne	.L27	#,

So I wonder if there's some other way to tell the compiler that we'll
only have a few values. An enum comes to mind, though I don't think the
enum rules are strict enough to make this guarantee (after all, it's OK
to bitwise-OR enums, so they clearly don't specify all possible values).

Having a dedicated hashcmp function for each hash_algo seems like the
sanest approach. We pay for one indirect function call, but the function
itself will have the constants available. But it does introduce one
extra complication. We're benefiting here not only from knowing that the
size is always 20, but also from the fact that the inline hashcmp knows
we only care about equality, not comparison.

So if I do this:

diff --git a/cache.h b/cache.h
index b1fd3d58ab..da56da7be2 100644
--- a/cache.h
+++ b/cache.h
@@ -1023,7 +1023,7 @@ extern const struct object_id null_oid;

 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 {
-	return memcmp(sha1, sha2, the_hash_algo->rawsz);
+	return the_hash_algo->cmp_fn(sha1, sha2);
 }

 static inline int oidcmp(const struct object_id *oid1, const struct object_id *oid2)
diff --git a/hash.h b/hash.h
index 7c8238bc2e..ac22ba63b6 100644
--- a/hash.h
+++ b/hash.h
@@ -64,6 +64,7 @@ typedef union git_hash_ctx git_hash_ctx;
 typedef void (*git_hash_init_fn)(git_hash_ctx *ctx);
 typedef void (*git_hash_update_fn)(git_hash_ctx *ctx, const void *in, size_t len);
 typedef void (*git_hash_final_fn)(unsigned char *hash, git_hash_ctx *ctx);
+typedef int (*git_hash_cmp_fn)(const void *a, const void *b);

 struct git_hash_algo {
 	/*
@@ -90,6 +91,8 @@ struct git_hash_algo {
 	/* The hash finalization function. */
 	git_hash_final_fn final_fn;

+	git_hash_cmp_fn cmp_fn;
+
 	/* The OID of the empty tree. */
 	const struct object_id *empty_tree;
diff --git a/sha1-file.c b/sha1-file.c
index 97b7423848..7072e360d7 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -69,6 +69,11 @@ static void git_hash_sha1_final(unsigned char *hash, git_hash_ctx *ctx)
 	git_SHA1_Final(hash, &ctx->sha1);
 }

+static int git_hash_sha1_cmp(const void *a, const void *b)
+{
+	return memcmp(a, b, 20);
+}
+
 static void git_hash
Re: [ANNOUNCE] Git v2.19.0-rc0
On Tue, Aug 21, 2018 at 05:29:24PM -0400, Jeff King wrote:

> 0001.2: rev-list --all --objects   37.07(36.62+0.45)   39.11(38.58+0.51) +5.5%
>
> Less change, but my overall times were smaller, too, so clearly our
> hardware or exact repos are a little bit different. Those numbers seem
> pretty consistent in further runs.
>
> It bisects to 509f6f62a4 (cache: update object ID functions for
> the_hash_algo, 2018-07-16). Which makes sense. An "--objects" traversal
> spends a huge amount of time checking each tree entry to see if we've
> processed that object yet, which ends up as hashcmp() in the hash table.
> I expect that a fixed 20-byte memcmp() can be optimized a lot more than
> one with an arbitrary value.
>
> Even if _we_ know the value can only take on one of a few values, I
> don't know that we have an easy way to tell the compiler that. Possibly
> we could improve things by jumping directly to an optimized code path.
> Sort of a poor-man's JIT. ;)
>
> Doing this:
>
> diff --git a/cache.h b/cache.h
> index b1fd3d58ab..9c004a26c9 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1023,7 +1023,10 @@ extern const struct object_id null_oid;
>
>  static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
>  {
> -	return memcmp(sha1, sha2, the_hash_algo->rawsz);
> +	if (the_hash_algo->rawsz == 20)
> +		return memcmp(sha1, sha2, 20);
> +	else
> +		return memcmp(sha1, sha1, the_hash_algo->rawsz);
>  }
>
>  static inline int oidcmp(const struct object_id *oid1, const struct object_id *oid2)
>
> on top of v2.19-rc0 seems to give me about a 3% speedup (though I might
> be imagining it, as there's a bit of noise). A function pointer in
> the_hash_algo might make even more sense.

It's possible that might be a better solution. I looked into a GCC
assertion that the value was either 20 or 32, and that in itself didn't
seem to help, at least in the generated code. Your solution is likely
better in that regard.

We could wire it up to be either 20 or 32 and let people experimenting
with other sizes of algorithms just add another branch. I haven't tested
how that performs, though.

--
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204
Re: [ANNOUNCE] Git v2.19.0-rc0
On Tue, Aug 21, 2018 at 04:41:02PM -0400, Derrick Stolee wrote:

> On 8/20/2018 6:13 PM, Junio C Hamano wrote:
> > An early preview release Git v2.19.0-rc0 is now available for
> > testing at the usual places.
>
> As part of testing the release candidate, I ran the performance suite
> against a fresh clone of the Linux repository using v2.18.0 and
> v2.19.0-rc0 (also: GIT_PERF_REPEAT_COUNT=10).

Wow, you're a glutton for punishment. :)

> I found a few nice improvements, but I also found a possible regression
> in tree walking. I say "tree walking" because it was revealed using
> p0001-rev-list.sh, but only with the "--objects" flag. I also saw some
> similar numbers on 'git log --raw'.
>
> Test                               v2.18.0             v2.19.0-rc0
> --------------------------------------------------------------------------
> 0001.1: rev-list --all             6.69(6.33+0.35)     6.52(6.20+0.31)   -2.5%
> 0001.2: rev-list --all --objects   52.14(47.43+1.02)   57.15(51.09+1.18) +9.6%
>
> To me, 9.6% seems out of the range of just noise for this length of a
> command, but I could be wrong. Could anyone else try to repro these
> results?

I got:

0001.2: rev-list --all --objects   37.07(36.62+0.45)   39.11(38.58+0.51) +5.5%

Less change, but my overall times were smaller, too, so clearly our
hardware or exact repos are a little bit different. Those numbers seem
pretty consistent in further runs.

It bisects to 509f6f62a4 (cache: update object ID functions for
the_hash_algo, 2018-07-16). Which makes sense. An "--objects" traversal
spends a huge amount of time checking each tree entry to see if we've
processed that object yet, which ends up as hashcmp() in the hash table.
I expect that a fixed 20-byte memcmp() can be optimized a lot more than
one with an arbitrary value.

Even if _we_ know the value can only take on one of a few values, I
don't know that we have an easy way to tell the compiler that. Possibly
we could improve things by jumping directly to an optimized code path.
Sort of a poor-man's JIT. ;)

Doing this:

diff --git a/cache.h b/cache.h
index b1fd3d58ab..9c004a26c9 100644
--- a/cache.h
+++ b/cache.h
@@ -1023,7 +1023,10 @@ extern const struct object_id null_oid;

 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 {
-	return memcmp(sha1, sha2, the_hash_algo->rawsz);
+	if (the_hash_algo->rawsz == 20)
+		return memcmp(sha1, sha2, 20);
+	else
+		return memcmp(sha1, sha1, the_hash_algo->rawsz);
 }

 static inline int oidcmp(const struct object_id *oid1, const struct object_id *oid2)

on top of v2.19-rc0 seems to give me about a 3% speedup (though I might
be imagining it, as there's a bit of noise). A function pointer in
the_hash_algo might make even more sense.

-Peff
Re: [ANNOUNCE] Git v2.19.0-rc0
On 8/20/2018 6:13 PM, Junio C Hamano wrote:
> An early preview release Git v2.19.0-rc0 is now available for
> testing at the usual places.

As part of testing the release candidate, I ran the performance suite
against a fresh clone of the Linux repository using v2.18.0 and
v2.19.0-rc0 (also: GIT_PERF_REPEAT_COUNT=10).

I found a few nice improvements, but I also found a possible regression
in tree walking. I say "tree walking" because it was revealed using
p0001-rev-list.sh, but only with the "--objects" flag. I also saw some
similar numbers on 'git log --raw'.

Test                               v2.18.0             v2.19.0-rc0
--------------------------------------------------------------------------
0001.1: rev-list --all             6.69(6.33+0.35)     6.52(6.20+0.31)   -2.5%
0001.2: rev-list --all --objects   52.14(47.43+1.02)   57.15(51.09+1.18) +9.6%

To me, 9.6% seems out of the range of just noise for this length of a
command, but I could be wrong. Could anyone else try to repro these
results?

(This may also not just be tree-walking, but general pack-file loading
and decompression, since I computed and stored a commit-graph file.
Hence, commits are not being parsed from the pack-file by either
command.)

Aside: the perf results were not all bad. Here was an interesting
improvement:

Test                                          v2.18.0           v2.19.0-rc0
---------------------------------------------------------------------------
0002.1: read_cache/discard_cache 1000 times   5.63(5.30+0.32)   3.34(3.03+0.30) -40.7%

Thanks,
-Stolee
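The fixed-size vs. variable-size memcmp() effect is easy to poke at
outside of Git with a rough micro-benchmark like the one below. It is
only a sketch (defeating the optimizer properly takes more care, and the
numbers in this thread come from Git's own t/perf suite), but it shows
the shape of the comparison:

    #include <stdio.h>
    #include <string.h>
    #include <time.h>

    size_t rawsz = 20;	/* stand-in for the_hash_algo->rawsz; external
                           linkage keeps the compiler from folding it */

    int main(void)
    {
        unsigned char a[20] = {0}, b[20] = {0};
        volatile int sink = 0;	/* keep the results observable */
        long i, n = 100000000L;
        clock_t t;

        t = clock();
        for (i = 0; i < n; i++) {
            a[i % 20] = (unsigned char)i;	/* vary input each round */
            sink += !memcmp(a, b, 20);		/* size known at compile time */
        }
        printf("constant size: %.2fs\n", (double)(clock() - t) / CLOCKS_PER_SEC);

        t = clock();
        for (i = 0; i < n; i++) {
            a[i % 20] = (unsigned char)i;
            sink += !memcmp(a, b, rawsz);	/* size loaded at run time */
        }
        printf("variable size: %.2fs\n", (double)(clock() - t) / CLOCKS_PER_SEC);

        return 0;
    }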
Re: [ANNOUNCE] Git v2.19.0-rc0
On Mon, Aug 20, 2018 at 5:27 PM Jonathan Nieder wrote:
>
> Jonathan Nieder wrote:
> > Stefan Beller wrote:
> >> Junio C Hamano wrote:
>
> >>> * "git submodule" did not correctly adjust core.worktree setting that
> >>>   indicates whether/where a submodule repository has its associated
> >>>   working tree across various state transitions, which has been
> >>>   corrected.
> >>>   (merge 984cd77ddb sb/submodule-core-worktree later to maint).
> >>
> >> Personally I do not view this as a bug fix but a feature
> >> (but then again my thinking might be tainted by too much
> >> submodule work) hence I would not merge it down.
> >
> > Can you elaborate?
>
> ... ah, I figured it out. You are saying "would not merge it down to
> maint". In that case, I agree, since this is not a recent bug (it's
> existed since before v1.7.10-rc1~14^2~2, 2012-03-02).

Yeah; the behavior was the gold standard for submodules ever since, so I
am wary of changing it under the guise of fixing a bug. The
core.worktree setting doesn't harm the user by default; you need to
craft a very specific situation to benefit from this feature.

Stefan
Re: [ANNOUNCE] Git v2.19.0-rc0
Jonathan Nieder wrote:
> Stefan Beller wrote:
>> Junio C Hamano wrote:
>>> * "git submodule" did not correctly adjust core.worktree setting that
>>>   indicates whether/where a submodule repository has its associated
>>>   working tree across various state transitions, which has been
>>>   corrected.
>>>   (merge 984cd77ddb sb/submodule-core-worktree later to maint).
>>
>> Personally I do not view this as a bug fix but a feature
>> (but then again my thinking might be tainted by too much
>> submodule work) hence I would not merge it down.
>
> Can you elaborate?

... ah, I figured it out. You are saying "would not merge it down to
maint". In that case, I agree, since this is not a recent bug (it's
existed since before v1.7.10-rc1~14^2~2, 2012-03-02).

Thanks,
Jonathan
Re: [ANNOUNCE] Git v2.19.0-rc0
(-cc: other lists)

Stefan Beller wrote:
> Junio C Hamano wrote:
>> * "git submodule" did not correctly adjust core.worktree setting that
>>   indicates whether/where a submodule repository has its associated
>>   working tree across various state transitions, which has been
>>   corrected.
>>   (merge 984cd77ddb sb/submodule-core-worktree later to maint).
>
> Personally I do not view this as a bug fix but a feature
> (but then again my thinking might be tainted by too much
> submodule work) hence I would not merge it down.

Can you elaborate? The symptom that this series fixes was pretty bad, so
I'm pretty glad you wrote it.

Thanks,
Jonathan