[PATCH 19/20] abbrev: support relative abbrev values
Change the core.abbrev config variable and the corresponding --abbrev command-line option to support relative values such as +1 or -1. Before Linus's e6c587c733 ("abbrev: auto size the default abbreviation", 2016-09-30) git would default to abbreviating object names to 7-hexdigits, and only picking longer SHA-1s as needed if that was ambiguous. That change instead set the default length as a function of the estimated current count of objects: Based on the expectation that we would see collision in a repository with 2^(2N) objects when using object names shortened to first N bits, use sufficient number of hexdigits to cover the number of objects in the repository. Each hexdigit (4-bits) we add to the shortened name allows us to have four times (2-bits) as many objects in the repository. By supporting relative values for core.abbrev we can allow users to consistently opt-in for either a higher or lower probability of collision, without needing to hardcode a given numeric value like "10", which would be overkill on some repositories, and far to small on others. Signed-off-by: Ævar Arnfjörð Bjarmason --- Documentation/config.txt | 6 ++ Documentation/diff-options.txt | 3 + Documentation/git-blame.txt| 3 + Documentation/git-branch.txt | 3 + Documentation/git-describe.txt | 3 + Documentation/git-ls-files.txt | 3 + Documentation/git-ls-tree.txt | 3 + Documentation/git-show-ref.txt | 3 + cache.h| 1 + config.c | 11 +++ diff.c | 18 +++- environment.c | 1 + parse-options-cb.c | 12 ++- revision.c | 18 +++- sha1-name.c| 11 +++ t/t0014-abbrev.sh | 170 ++--- 16 files changed, 204 insertions(+), 65 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index ab641bf5a9..abf07be7b6 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -919,6 +919,12 @@ core.abbrev:: in your repository, which hopefully is enough for abbreviated object names to stay unique for some time. The minimum length is 4. ++ +This can also be set to relative values such as `+2` or `-2`, which +means to add or subtract N characters from the SHA-1 that Git would +otherwise print, this allows for producing more future-proof SHA-1s +for use within a given project, while adjusting the value for the +current approximate number of objects. add.ignoreErrors:: add.ignore-errors (deprecated):: diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt index f466600972..f1114a7b8d 100644 --- a/Documentation/diff-options.txt +++ b/Documentation/diff-options.txt @@ -384,6 +384,9 @@ endif::git-format-patch[] independent of the `--full-index` option above, which controls the diff-patch output format. Non default number of digits can be specified with `--abbrev=`. ++ +Can also be set to a relative value, see `core.abbrev` in +linkgit:git-diff[1]. -B[][/]:: --break-rewrites[=[][/]]:: diff --git a/Documentation/git-blame.txt b/Documentation/git-blame.txt index d6cddbcb2e..8559d3b0c7 100644 --- a/Documentation/git-blame.txt +++ b/Documentation/git-blame.txt @@ -99,6 +99,9 @@ requested, resulting in unaligned output. Before 2.19, setting `core.abbrev=40` wouldn't apply the above rule and would cause blame to emit output that was unaligned. This bug has since been fixed. ++ +Can also be set to a relative value, see `core.abbrev` in +linkgit:git-diff[1]. THE PORCELAIN FORMAT diff --git a/Documentation/git-branch.txt b/Documentation/git-branch.txt index 02eccbb931..0f8032cec6 100644 --- a/Documentation/git-branch.txt +++ b/Documentation/git-branch.txt @@ -182,6 +182,9 @@ See `--create-reflog` above for details. Alter the sha1's minimum display length in the output listing. The default value is 7 and can be overridden by the `core.abbrev` config option. ++ +Can also be set to a relative value, see `core.abbrev` in +linkgit:git-diff[1]. --no-abbrev:: Display the full sha1s in the output listing rather than abbreviating them. diff --git a/Documentation/git-describe.txt b/Documentation/git-describe.txt index e027fb8c4b..a3d5c7e817 100644 --- a/Documentation/git-describe.txt +++ b/Documentation/git-describe.txt @@ -65,6 +65,9 @@ OPTIONS abbreviated object name, use digits, or as many digits as needed to form a unique object name. An of 0 will suppress long format, only showing the closest tag. ++ +Can also be set to a relative value, see `core.abbrev` in +linkgit:git-diff[1]. --candidates=:: Instead of considering only the 10 most recent tags as diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt index 5298f1bc30..f9af2b23bf 100644 --- a/Documentation/git-ls-files.txt +++ b/Documentation/git-ls-file
Re: [PATCH 19/20] abbrev: support relative abbrev values
On 9 June 2018 at 00:41, Ævar Arnfjörð Bjarmason wrote: > diff --git a/Documentation/config.txt b/Documentation/config.txt > index ab641bf5a9..abf07be7b6 100644 > --- a/Documentation/config.txt > +++ b/Documentation/config.txt > @@ -919,6 +919,12 @@ core.abbrev:: > in your repository, which hopefully is enough for > abbreviated object names to stay unique for some time. > The minimum length is 4. > ++ > +This can also be set to relative values such as `+2` or `-2`, which > +means to add or subtract N characters from the SHA-1 that Git would > +otherwise print, this allows for producing more future-proof SHA-1s > +for use within a given project, while adjusting the value for the > +current approximate number of objects. How about s/, this/. This/ to break it up a little? Also, you write "+2" and "-2" but then "N". Unify it? Also, I'd suggest s/SHA-1/object ID/ to be future-proof. > diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt > index f466600972..f1114a7b8d 100644 > --- a/Documentation/diff-options.txt > +++ b/Documentation/diff-options.txt > @@ -384,6 +384,9 @@ endif::git-format-patch[] > independent of the `--full-index` option above, which controls > the diff-patch output format. Non default number of > digits can be specified with `--abbrev=`. > ++ > +Can also be set to a relative value, see `core.abbrev` in > +linkgit:git-diff[1]. Good. You then add this paragraph to lots of other places... > diff --git a/config.c b/config.c > index 12f762ad92..cd95c6bdfb 100644 > --- a/config.c > +++ b/config.c > @@ -1151,6 +1151,17 @@ static int git_default_core_config(const char *var, > const char *value) > return config_error_nonbool(var); > if (!strcasecmp(value, "auto")) { > default_abbrev = -1; > + } else if (*value == '+' || *value == '-') { > + int relative = git_config_int(var, value); > + if (relative == 0) > + die(_("bad core.abbrev value %s. " Trailing period? Same below. > + "relative values must be non-zero"), > + value); > + if (abs(relative) > GIT_SHA1_HEXSZ) > + die(_("bad core.abbrev value %s. " > + "impossibly out of range"), > + value); > + default_abbrev_relative = relative; > } else { > int abbrev = git_config_int(var, value); > if (abbrev < minimum_abbrev || abbrev > 40) > diff --git a/diff.c b/diff.c > index e0141cfbc0..f7861b8472 100644 > --- a/diff.c > +++ b/diff.c > @@ -4801,16 +4801,28 @@ int diff_opt_parse(struct diff_options *options, > else if (!strcmp(arg, "--abbrev")) > options->abbrev = DEFAULT_ABBREV; > else if (skip_prefix(arg, "--abbrev=", &arg)) { > + int v; > char *end; > if (!strcmp(arg, "")) > die("--abbrev expects a value, got '%s'", arg); > - options->abbrev = strtoul(arg, &end, 10); > + v = strtoul(arg, &end, 10); > if (*end) > die("--abbrev expects a numerical value, got '%s'", > arg); > - if (options->abbrev < MINIMUM_ABBREV) { > + if (*arg == '+' || *arg == '-') { > + if (v == 0) { > + die("relative abbrev must be non-zero"); > + } else if (abs(v) > the_hash_algo->hexsz) { > + die("relative abbrev impossibly out of > range"); > + } else { > + default_abbrev_relative = v; > + options->abbrev = DEFAULT_ABBREV; > + } > + } else if (v < MINIMUM_ABBREV) { > options->abbrev = MINIMUM_ABBREV; > - } else if (the_hash_algo->hexsz < options->abbrev) { > + } else if (the_hash_algo->hexsz < v) { > options->abbrev = the_hash_algo->hexsz; > + } else { > + options->abbrev = v; I've cut out a few instances of more-or-less repeated code. Any possibility of extracting this into a helper? Maybe after you've done the preparatory work of unifying these sites. Or as part of it, i.e., "let's switch this spot to use the helper; that makes it stricter in this-and-that sense". These can't all be entirely unified, I guess, but maybe "mostly"? Martin
Re: [PATCH 19/20] abbrev: support relative abbrev values
Ævar Arnfjörð Bjarmason writes: > Change the core.abbrev config variable and the corresponding --abbrev > command-line option to support relative values such as +1 or -1. > > Before Linus's e6c587c733 ("abbrev: auto size the default > abbreviation", 2016-09-30) git would default to abbreviating object > names to 7-hexdigits, and only picking longer SHA-1s as needed if that > was ambiguous. > > That change instead set the default length as a function of the > estimated current count of objects: > > Based on the expectation that we would see collision in a > repository with 2^(2N) objects when using object names shortened > to first N bits, use sufficient number of hexdigits to cover the > number of objects in the repository. Each hexdigit (4-bits) we > add to the shortened name allows us to have four times (2-bits) as > many objects in the repository. > > By supporting relative values for core.abbrev we can allow users to > consistently opt-in for either a higher or lower probability of > collision, without needing to hardcode a given numeric value like > "10", which would be overkill on some repositories, and far to small > on others. Nicely explained and calculated ;-) > test_expect_success 'describe core.abbrev=[-+]1 and --abbrev=[-+]1' ' > - test_must_fail git -c core.abbrev=+1 describe | sed_g_tr_d_n >describe > && > - test_must_fail git -c core.abbrev=-1 describe | sed_g_tr_d_n >describe > && > + git -c core.abbrev=-1 describe | sed_g_tr_d_n >describe && > + test_byte_count = 6 describe && > + > + git -c core.abbrev=+1 describe | sed_g_tr_d_n >describe && > + test_byte_count = 8 describe && Even though I see the point of supporting absurdly small absolute values like 4, I do not quite see the point of supporting negative relative values here. What's the expected use case? > git log --abbrev=+1 --pretty=format:%h -1 | tr_d_n >log && > - test_byte_count = 4 log && > + test_byte_count = 8 log && This, together with many many others in the rest of the patch, is cute but confusing in that the diff shows change from 4 to 8 due to the redefinition of what abbrev=+1 means. I have a feeling that it may not be worth doing it "right", but if we were doing it "right", we would probably have done it in multiple steps: - the earlier patches in this series that demonstrates --abbrev=+1 is --abbrev=1 and core.abbrev=+1 is an error. - ensure --abbrev=+1 is rejected as syntax error just like core.abbrev=+1 was, without introducing relative values - introduce relative value. That way, the last step (which corresponds to this patch) would show change from "log --abbrev=+1" failing due to syntax error to showing abbreviated value that is slightly longer than the default. But a I said, it may not be worth doing so. "Is it worth supporting negative relative length?" still stands, though.
Re: [PATCH 19/20] abbrev: support relative abbrev values
On Tue, Jun 12 2018, Junio C Hamano wrote: > Ævar Arnfjörð Bjarmason writes: > >> Change the core.abbrev config variable and the corresponding --abbrev >> command-line option to support relative values such as +1 or -1. >> >> Before Linus's e6c587c733 ("abbrev: auto size the default >> abbreviation", 2016-09-30) git would default to abbreviating object >> names to 7-hexdigits, and only picking longer SHA-1s as needed if that >> was ambiguous. >> >> That change instead set the default length as a function of the >> estimated current count of objects: >> >> Based on the expectation that we would see collision in a >> repository with 2^(2N) objects when using object names shortened >> to first N bits, use sufficient number of hexdigits to cover the >> number of objects in the repository. Each hexdigit (4-bits) we >> add to the shortened name allows us to have four times (2-bits) as >> many objects in the repository. >> >> By supporting relative values for core.abbrev we can allow users to >> consistently opt-in for either a higher or lower probability of >> collision, without needing to hardcode a given numeric value like >> "10", which would be overkill on some repositories, and far to small >> on others. > > Nicely explained and calculated ;-) > >> test_expect_success 'describe core.abbrev=[-+]1 and --abbrev=[-+]1' ' >> -test_must_fail git -c core.abbrev=+1 describe | sed_g_tr_d_n >describe >> && >> -test_must_fail git -c core.abbrev=-1 describe | sed_g_tr_d_n >describe >> && >> +git -c core.abbrev=-1 describe | sed_g_tr_d_n >describe && >> +test_byte_count = 6 describe && >> + >> +git -c core.abbrev=+1 describe | sed_g_tr_d_n >describe && >> +test_byte_count = 8 describe && > > Even though I see the point of supporting absurdly small absolute > values like 4, I do not quite see the point of supporting negative > relative values here. What's the expected use case? I'll add a better explanation for this to the commit message. Initially I did this for consistency, since it was easy to implement, and there's no reason to have that arbitrary limitation, but thinking about it again I think I'll use this for some of my projects. E.g. here's a breakdown of my dotfiles repo: $ git -c core.abbrev=4 log --pretty=format:%h|perl -nE 'chomp;say length'|sort|uniq -c|sort -nr 784 4 59 5 7 6 I don't have a single commit that needs 7 characters, yet that's our default. This is a sane trade-off for the kernel, but for something that's just a toy or something you're playing around with something shorter can make sense. SHA-1s aren't just written down, but also e.g. remembered in wetware short-term memory. >> git log --abbrev=+1 --pretty=format:%h -1 | tr_d_n >log && >> -test_byte_count = 4 log && >> +test_byte_count = 8 log && > > This, together with many many others in the rest of the patch, is > cute but confusing in that the diff shows change from 4 to 8 due to > the redefinition of what abbrev=+1 means. I have a feeling that it > may not be worth doing it "right", but if we were doing it "right", > we would probably have done it in multiple steps: > > - the earlier patches in this series that demonstrates > --abbrev=+1 is --abbrev=1 and core.abbrev=+1 is an error. > > - ensure --abbrev=+1 is rejected as syntax error just like > core.abbrev=+1 was, without introducing relative values > > - introduce relative value. > > That way, the last step (which corresponds to this patch) would show > change from "log --abbrev=+1" failing due to syntax error to showing > abbreviated value that is slightly longer than the default. > > But a I said, it may not be worth doing so. "Is it worth supporting > negative relative length?" still stands, though. I'll see what I can do about this value churn.
Re: [PATCH 19/20] abbrev: support relative abbrev values
Ævar Arnfjörð Bjarmason writes: > E.g. here's a breakdown of my dotfiles repo: > > $ git -c core.abbrev=4 log --pretty=format:%h|perl -nE 'chomp;say > length'|sort|uniq -c|sort -nr > 784 4 > 59 5 > 7 6 > > I don't have a single commit that needs 7 characters, yet that's our > default. This is a sane trade-off for the kernel, but for something > that's just a toy or something you're playing around with something > shorter can make sense. > > SHA-1s aren't just written down, but also e.g. remembered in wetware > short-term memory. That's a fine example of what I called "supporting absurdly small absolute values like 4"; I still do not see why you want "negative relative values" from that example.
Re: [PATCH 19/20] abbrev: support relative abbrev values
On Wed, Jun 13 2018, Junio C Hamano wrote: > Ævar Arnfjörð Bjarmason writes: > >> E.g. here's a breakdown of my dotfiles repo: >> >> $ git -c core.abbrev=4 log --pretty=format:%h|perl -nE 'chomp;say >> length'|sort|uniq -c|sort -nr >> 784 4 >> 59 5 >> 7 6 >> >> I don't have a single commit that needs 7 characters, yet that's our >> default. This is a sane trade-off for the kernel, but for something >> that's just a toy or something you're playing around with something >> shorter can make sense. >> >> SHA-1s aren't just written down, but also e.g. remembered in wetware >> short-term memory. > > That's a fine example of what I called "supporting absurdly small > absolute values like 4"; I still do not see why you want "negative > relative values" from that example. Because hardcoding -2 is very different than setting it to 5, because the -2 will scale to the size of the repository, but 5 is just 7-2 where 7 is our default value. So, in general if you want less future proof hashes by some probabilistic metric (whether you use core.validateAbbrev or not) you'd use -2 or -3, just like you might use +2 or +3 if you'd like to have more future-proof hashes (especially with core.validateAbbrev=true).
Re: [PATCH 19/20] abbrev: support relative abbrev values
Ævar Arnfjörð Bjarmason writes: > On Wed, Jun 13 2018, Junio C Hamano wrote: > >> Ævar Arnfjörð Bjarmason writes: >> >>> E.g. here's a breakdown of my dotfiles repo: >>> >>> $ git -c core.abbrev=4 log --pretty=format:%h|perl -nE 'chomp;say >>> length'|sort|uniq -c|sort -nr >>> 784 4 >>> 59 5 >>> 7 6 >>> >>> I don't have a single commit that needs 7 characters, yet that's our >>> default. This is a sane trade-off for the kernel, but for something >>> that's just a toy or something you're playing around with something >>> shorter can make sense. >>> >>> SHA-1s aren't just written down, but also e.g. remembered in wetware >>> short-term memory. >> >> That's a fine example of what I called "supporting absurdly small >> absolute values like 4"; I still do not see why you want "negative >> relative values" from that example. > > Because hardcoding -2 is very different than setting it to 5, because > the -2 will scale to the size of the repository, but 5 is just 7-2 where > 7 is our default value. > > So, in general if you want less future proof hashes by some > probabilistic metric (whether you use core.validateAbbrev or not) you'd > use -2 or -3, just like you might use +2 or +3 if you'd like to have > more future-proof hashes (especially with core.validateAbbrev=true). That still does not make much sense to me at all. I do agree that something shorter than the default 7 may be more appropriate for our wetware short-term memory, and it would make sense to grow the "riskier to collide than the default heuristics but more memorable" variant as the project grows, _ONLY_ _IF_ our wetware short-term memory scales with the project we happen to be working on. But our wetware does not scale with the project we work on; at least mine does not. So...
Re: [PATCH 19/20] abbrev: support relative abbrev values
On Thu, Jun 14 2018, Junio C Hamano wrote: > Ævar Arnfjörð Bjarmason writes: > >> On Wed, Jun 13 2018, Junio C Hamano wrote: >> >>> Ævar Arnfjörð Bjarmason writes: >>> E.g. here's a breakdown of my dotfiles repo: $ git -c core.abbrev=4 log --pretty=format:%h|perl -nE 'chomp;say length'|sort|uniq -c|sort -nr 784 4 59 5 7 6 I don't have a single commit that needs 7 characters, yet that's our default. This is a sane trade-off for the kernel, but for something that's just a toy or something you're playing around with something shorter can make sense. SHA-1s aren't just written down, but also e.g. remembered in wetware short-term memory. >>> >>> That's a fine example of what I called "supporting absurdly small >>> absolute values like 4"; I still do not see why you want "negative >>> relative values" from that example. >> >> Because hardcoding -2 is very different than setting it to 5, because >> the -2 will scale to the size of the repository, but 5 is just 7-2 where >> 7 is our default value. >> >> So, in general if you want less future proof hashes by some >> probabilistic metric (whether you use core.validateAbbrev or not) you'd >> use -2 or -3, just like you might use +2 or +3 if you'd like to have >> more future-proof hashes (especially with core.validateAbbrev=true). > > That still does not make much sense to me at all. > > I do agree that something shorter than the default 7 may be more > appropriate for our wetware short-term memory, and it would make > sense to grow the "riskier to collide than the default heuristics > but more memorable" variant as the project grows, _ONLY_ _IF_ our > wetware short-term memory scales with the project we happen to be > working on. But our wetware does not scale with the project we work > on; at least mine does not. Yes, it's a trade-off, but just because something is a trade-off doesn't make it useless. Aside from the feature I'm proposing, the same thing applies to the short SHA-1 currently. My ~1k commit dotfiles has 7 characters, but linux.git has 12. Does that make printing the short SHA-1 at all useless and we should just fall back to 40 characters if it's say >= 12? No. 12 is still better than 40, but if we could get away with it 10 would be better than 12. Right now the "get away with it" calculation is a hardcoded constant, this makes it configurable. The reason to make it configurable is because you may want more future proof *or* less future-proof SHA-1s depending on the use-case. Printing a longer SHA-1 has a cost, including but not limited to: * Remembering it for a short time * Seeing it on one screen and typing it into another computer manually * `git log --oneline` output in a terminal where horizontal space is at a premium