[PATCH 19/20] abbrev: support relative abbrev values

2018-06-08 Thread Ævar Arnfjörð Bjarmason
Change the core.abbrev config variable and the corresponding --abbrev
command-line option to support relative values such as +1 or -1.

Before Linus's e6c587c733 ("abbrev: auto size the default
abbreviation", 2016-09-30) git would default to abbreviating object
names to 7-hexdigits, and only picking longer SHA-1s as needed if that
was ambiguous.

That change instead set the default length as a function of the
estimated current count of objects:

Based on the expectation that we would see collision in a
repository with 2^(2N) objects when using object names shortened
to first N bits, use sufficient number of hexdigits to cover the
number of objects in the repository.  Each hexdigit (4-bits) we
add to the shortened name allows us to have four times (2-bits) as
many objects in the repository.

By supporting relative values for core.abbrev we can allow users to
consistently opt-in for either a higher or lower probability of
collision, without needing to hardcode a given numeric value like
"10", which would be overkill on some repositories, and far to small
on others.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 Documentation/config.txt   |   6 ++
 Documentation/diff-options.txt |   3 +
 Documentation/git-blame.txt|   3 +
 Documentation/git-branch.txt   |   3 +
 Documentation/git-describe.txt |   3 +
 Documentation/git-ls-files.txt |   3 +
 Documentation/git-ls-tree.txt  |   3 +
 Documentation/git-show-ref.txt |   3 +
 cache.h|   1 +
 config.c   |  11 +++
 diff.c |  18 +++-
 environment.c  |   1 +
 parse-options-cb.c |  12 ++-
 revision.c |  18 +++-
 sha1-name.c|  11 +++
 t/t0014-abbrev.sh  | 170 ++---
 16 files changed, 204 insertions(+), 65 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index ab641bf5a9..abf07be7b6 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -919,6 +919,12 @@ core.abbrev::
in your repository, which hopefully is enough for
abbreviated object names to stay unique for some time.
The minimum length is 4.
++
+This can also be set to relative values such as `+2` or `-2`, which
+means to add or subtract N characters from the SHA-1 that Git would
+otherwise print, this allows for producing more future-proof SHA-1s
+for use within a given project, while adjusting the value for the
+current approximate number of objects.
 
 add.ignoreErrors::
 add.ignore-errors (deprecated)::
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index f466600972..f1114a7b8d 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -384,6 +384,9 @@ endif::git-format-patch[]
independent of the `--full-index` option above, which controls
the diff-patch output format.  Non default number of
digits can be specified with `--abbrev=`.
++
+Can also be set to a relative value, see `core.abbrev` in
+linkgit:git-diff[1].
 
 -B[][/]::
 --break-rewrites[=[][/]]::
diff --git a/Documentation/git-blame.txt b/Documentation/git-blame.txt
index d6cddbcb2e..8559d3b0c7 100644
--- a/Documentation/git-blame.txt
+++ b/Documentation/git-blame.txt
@@ -99,6 +99,9 @@ requested, resulting in unaligned output.
 Before 2.19, setting `core.abbrev=40` wouldn't apply the above rule
 and would cause blame to emit output that was unaligned. This bug has
 since been fixed.
++
+Can also be set to a relative value, see `core.abbrev` in
+linkgit:git-diff[1].
 
 
 THE PORCELAIN FORMAT
diff --git a/Documentation/git-branch.txt b/Documentation/git-branch.txt
index 02eccbb931..0f8032cec6 100644
--- a/Documentation/git-branch.txt
+++ b/Documentation/git-branch.txt
@@ -182,6 +182,9 @@ See `--create-reflog` above for details.
Alter the sha1's minimum display length in the output listing.
The default value is 7 and can be overridden by the `core.abbrev`
config option.
++
+Can also be set to a relative value, see `core.abbrev` in
+linkgit:git-diff[1].
 
 --no-abbrev::
Display the full sha1s in the output listing rather than abbreviating 
them.
diff --git a/Documentation/git-describe.txt b/Documentation/git-describe.txt
index e027fb8c4b..a3d5c7e817 100644
--- a/Documentation/git-describe.txt
+++ b/Documentation/git-describe.txt
@@ -65,6 +65,9 @@ OPTIONS
abbreviated object name, use  digits, or as many digits
as needed to form a unique object name.  An  of 0
will suppress long format, only showing the closest tag.
++
+Can also be set to a relative value, see `core.abbrev` in
+linkgit:git-diff[1].
 
 --candidates=::
Instead of considering only the 10 most recent tags as
diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt
index 5298f1bc30..f9af2b23bf 100644
--- a/Documentation/git-ls-files.txt
+++ b/Documentation/git-ls-file

Re: [PATCH 19/20] abbrev: support relative abbrev values

2018-06-09 Thread Martin Ågren
On 9 June 2018 at 00:41, Ævar Arnfjörð Bjarmason  wrote:
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index ab641bf5a9..abf07be7b6 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -919,6 +919,12 @@ core.abbrev::
> in your repository, which hopefully is enough for
> abbreviated object names to stay unique for some time.
> The minimum length is 4.
> ++
> +This can also be set to relative values such as `+2` or `-2`, which
> +means to add or subtract N characters from the SHA-1 that Git would
> +otherwise print, this allows for producing more future-proof SHA-1s
> +for use within a given project, while adjusting the value for the
> +current approximate number of objects.

How about s/, this/. This/ to break it up a little? Also, you write "+2"
and "-2" but then "N". Unify it?

Also, I'd suggest s/SHA-1/object ID/ to be future-proof.

> diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
> index f466600972..f1114a7b8d 100644
> --- a/Documentation/diff-options.txt
> +++ b/Documentation/diff-options.txt
> @@ -384,6 +384,9 @@ endif::git-format-patch[]
> independent of the `--full-index` option above, which controls
> the diff-patch output format.  Non default number of
> digits can be specified with `--abbrev=`.
> ++
> +Can also be set to a relative value, see `core.abbrev` in
> +linkgit:git-diff[1].

Good. You then add this paragraph to lots of other places...

> diff --git a/config.c b/config.c
> index 12f762ad92..cd95c6bdfb 100644
> --- a/config.c
> +++ b/config.c
> @@ -1151,6 +1151,17 @@ static int git_default_core_config(const char *var, 
> const char *value)
> return config_error_nonbool(var);
> if (!strcasecmp(value, "auto")) {
> default_abbrev = -1;
> +   } else if (*value == '+' || *value == '-') {
> +   int relative = git_config_int(var, value);
> +   if (relative == 0)
> +   die(_("bad core.abbrev value %s. "

Trailing period? Same below.

> + "relative values must be non-zero"),
> +   value);
> +   if (abs(relative) > GIT_SHA1_HEXSZ)
> +   die(_("bad core.abbrev value %s. "
> + "impossibly out of range"),
> +   value);
> +   default_abbrev_relative = relative;
> } else {
> int abbrev = git_config_int(var, value);
> if (abbrev < minimum_abbrev || abbrev > 40)
> diff --git a/diff.c b/diff.c
> index e0141cfbc0..f7861b8472 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -4801,16 +4801,28 @@ int diff_opt_parse(struct diff_options *options,
> else if (!strcmp(arg, "--abbrev"))
> options->abbrev = DEFAULT_ABBREV;
> else if (skip_prefix(arg, "--abbrev=", &arg)) {
> +   int v;
> char *end;
> if (!strcmp(arg, ""))
> die("--abbrev expects a value, got '%s'", arg);
> -   options->abbrev = strtoul(arg, &end, 10);
> +   v = strtoul(arg, &end, 10);
> if (*end)
> die("--abbrev expects a numerical value, got '%s'", 
> arg);
> -   if (options->abbrev < MINIMUM_ABBREV) {
> +   if (*arg == '+' || *arg == '-') {
> +   if (v == 0) {
> +   die("relative abbrev must be non-zero");
> +   } else if (abs(v) > the_hash_algo->hexsz) {
> +   die("relative abbrev impossibly out of 
> range");
> +   } else {
> +   default_abbrev_relative = v;
> +   options->abbrev = DEFAULT_ABBREV;
> +   }
> +   } else if (v < MINIMUM_ABBREV) {
> options->abbrev = MINIMUM_ABBREV;
> -   } else if (the_hash_algo->hexsz < options->abbrev) {
> +   } else if (the_hash_algo->hexsz < v) {
> options->abbrev = the_hash_algo->hexsz;
> +   } else {
> +   options->abbrev = v;

I've cut out a few instances of more-or-less repeated code. Any
possibility of extracting this into a helper? Maybe after you've done
the preparatory work of unifying these sites. Or as part of it, i.e.,
"let's switch this spot to use the helper; that makes it stricter in
this-and-that sense".

These can't all be entirely unified, I guess, but maybe "mostly"?

Martin


Re: [PATCH 19/20] abbrev: support relative abbrev values

2018-06-12 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason   writes:

> Change the core.abbrev config variable and the corresponding --abbrev
> command-line option to support relative values such as +1 or -1.
>
> Before Linus's e6c587c733 ("abbrev: auto size the default
> abbreviation", 2016-09-30) git would default to abbreviating object
> names to 7-hexdigits, and only picking longer SHA-1s as needed if that
> was ambiguous.
>
> That change instead set the default length as a function of the
> estimated current count of objects:
>
> Based on the expectation that we would see collision in a
> repository with 2^(2N) objects when using object names shortened
> to first N bits, use sufficient number of hexdigits to cover the
> number of objects in the repository.  Each hexdigit (4-bits) we
> add to the shortened name allows us to have four times (2-bits) as
> many objects in the repository.
>
> By supporting relative values for core.abbrev we can allow users to
> consistently opt-in for either a higher or lower probability of
> collision, without needing to hardcode a given numeric value like
> "10", which would be overkill on some repositories, and far to small
> on others.

Nicely explained and calculated ;-)

>  test_expect_success 'describe core.abbrev=[-+]1 and --abbrev=[-+]1' '
> - test_must_fail git -c core.abbrev=+1 describe | sed_g_tr_d_n >describe 
> &&
> - test_must_fail git -c core.abbrev=-1 describe | sed_g_tr_d_n >describe 
> &&
> + git -c core.abbrev=-1 describe | sed_g_tr_d_n >describe &&
> + test_byte_count = 6 describe &&
> +
> + git -c core.abbrev=+1 describe | sed_g_tr_d_n >describe &&
> + test_byte_count = 8 describe &&

Even though I see the point of supporting absurdly small absolute
values like 4, I do not quite see the point of supporting negative
relative values here.  What's the expected use case?

>   git log --abbrev=+1 --pretty=format:%h -1 | tr_d_n >log &&
> - test_byte_count = 4 log &&
> + test_byte_count = 8 log &&

This, together with many many others in the rest of the patch, is
cute but confusing in that the diff shows change from 4 to 8 due to
the redefinition of what abbrev=+1 means.  I have a feeling that it
may not be worth doing it "right", but if we were doing it "right",
we would probably have done it in multiple steps:

- the earlier patches in this series that demonstrates
  --abbrev=+1 is --abbrev=1 and core.abbrev=+1 is an error.

- ensure --abbrev=+1 is rejected as syntax error just like
  core.abbrev=+1 was, without introducing relative values

- introduce relative value.

That way, the last step (which corresponds to this patch) would show
change from "log --abbrev=+1" failing due to syntax error to showing
abbreviated value that is slightly longer than the default.

But a I said, it may not be worth doing so.  "Is it worth supporting
negative relative length?" still stands, though.


Re: [PATCH 19/20] abbrev: support relative abbrev values

2018-06-13 Thread Ævar Arnfjörð Bjarmason


On Tue, Jun 12 2018, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason   writes:
>
>> Change the core.abbrev config variable and the corresponding --abbrev
>> command-line option to support relative values such as +1 or -1.
>>
>> Before Linus's e6c587c733 ("abbrev: auto size the default
>> abbreviation", 2016-09-30) git would default to abbreviating object
>> names to 7-hexdigits, and only picking longer SHA-1s as needed if that
>> was ambiguous.
>>
>> That change instead set the default length as a function of the
>> estimated current count of objects:
>>
>> Based on the expectation that we would see collision in a
>> repository with 2^(2N) objects when using object names shortened
>> to first N bits, use sufficient number of hexdigits to cover the
>> number of objects in the repository.  Each hexdigit (4-bits) we
>> add to the shortened name allows us to have four times (2-bits) as
>> many objects in the repository.
>>
>> By supporting relative values for core.abbrev we can allow users to
>> consistently opt-in for either a higher or lower probability of
>> collision, without needing to hardcode a given numeric value like
>> "10", which would be overkill on some repositories, and far to small
>> on others.
>
> Nicely explained and calculated ;-)
>
>>  test_expect_success 'describe core.abbrev=[-+]1 and --abbrev=[-+]1' '
>> -test_must_fail git -c core.abbrev=+1 describe | sed_g_tr_d_n >describe 
>> &&
>> -test_must_fail git -c core.abbrev=-1 describe | sed_g_tr_d_n >describe 
>> &&
>> +git -c core.abbrev=-1 describe | sed_g_tr_d_n >describe &&
>> +test_byte_count = 6 describe &&
>> +
>> +git -c core.abbrev=+1 describe | sed_g_tr_d_n >describe &&
>> +test_byte_count = 8 describe &&
>
> Even though I see the point of supporting absurdly small absolute
> values like 4, I do not quite see the point of supporting negative
> relative values here.  What's the expected use case?

I'll add a better explanation for this to the commit message.

Initially I did this for consistency, since it was easy to implement,
and there's no reason to have that arbitrary limitation, but thinking
about it again I think I'll use this for some of my projects.

E.g. here's a breakdown of my dotfiles repo:

$ git -c core.abbrev=4 log  --pretty=format:%h|perl -nE 'chomp;say 
length'|sort|uniq -c|sort -nr
784 4
 59 5
  7 6

I don't have a single commit that needs 7 characters, yet that's our
default. This is a sane trade-off for the kernel, but for something
that's just a toy or something you're playing around with something
shorter can make sense.

SHA-1s aren't just written down, but also e.g. remembered in wetware
short-term memory.

>>  git log --abbrev=+1 --pretty=format:%h -1 | tr_d_n >log &&
>> -test_byte_count = 4 log &&
>> +test_byte_count = 8 log &&
>
> This, together with many many others in the rest of the patch, is
> cute but confusing in that the diff shows change from 4 to 8 due to
> the redefinition of what abbrev=+1 means.  I have a feeling that it
> may not be worth doing it "right", but if we were doing it "right",
> we would probably have done it in multiple steps:
>
> - the earlier patches in this series that demonstrates
>   --abbrev=+1 is --abbrev=1 and core.abbrev=+1 is an error.
>
> - ensure --abbrev=+1 is rejected as syntax error just like
>   core.abbrev=+1 was, without introducing relative values
>
> - introduce relative value.
>
> That way, the last step (which corresponds to this patch) would show
> change from "log --abbrev=+1" failing due to syntax error to showing
> abbreviated value that is slightly longer than the default.
>
> But a I said, it may not be worth doing so.  "Is it worth supporting
> negative relative length?" still stands, though.

I'll see what I can do about this value churn.


Re: [PATCH 19/20] abbrev: support relative abbrev values

2018-06-13 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason  writes:

> E.g. here's a breakdown of my dotfiles repo:
>
> $ git -c core.abbrev=4 log  --pretty=format:%h|perl -nE 'chomp;say 
> length'|sort|uniq -c|sort -nr
> 784 4
>  59 5
>   7 6
>
> I don't have a single commit that needs 7 characters, yet that's our
> default. This is a sane trade-off for the kernel, but for something
> that's just a toy or something you're playing around with something
> shorter can make sense.
>
> SHA-1s aren't just written down, but also e.g. remembered in wetware
> short-term memory.

That's a fine example of what I called "supporting absurdly small
absolute values like 4"; I still do not see why you want "negative
relative values" from that example.


Re: [PATCH 19/20] abbrev: support relative abbrev values

2018-06-14 Thread Ævar Arnfjörð Bjarmason


On Wed, Jun 13 2018, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  writes:
>
>> E.g. here's a breakdown of my dotfiles repo:
>>
>> $ git -c core.abbrev=4 log  --pretty=format:%h|perl -nE 'chomp;say 
>> length'|sort|uniq -c|sort -nr
>> 784 4
>>  59 5
>>   7 6
>>
>> I don't have a single commit that needs 7 characters, yet that's our
>> default. This is a sane trade-off for the kernel, but for something
>> that's just a toy or something you're playing around with something
>> shorter can make sense.
>>
>> SHA-1s aren't just written down, but also e.g. remembered in wetware
>> short-term memory.
>
> That's a fine example of what I called "supporting absurdly small
> absolute values like 4"; I still do not see why you want "negative
> relative values" from that example.

Because hardcoding -2 is very different than setting it to 5, because
the -2 will scale to the size of the repository, but 5 is just 7-2 where
7 is our default value.

So, in general if you want less future proof hashes by some
probabilistic metric (whether you use core.validateAbbrev or not) you'd
use -2 or -3, just like you might use +2 or +3 if you'd like to have
more future-proof hashes (especially with core.validateAbbrev=true).


Re: [PATCH 19/20] abbrev: support relative abbrev values

2018-06-14 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason  writes:

> On Wed, Jun 13 2018, Junio C Hamano wrote:
>
>> Ævar Arnfjörð Bjarmason  writes:
>>
>>> E.g. here's a breakdown of my dotfiles repo:
>>>
>>> $ git -c core.abbrev=4 log  --pretty=format:%h|perl -nE 'chomp;say 
>>> length'|sort|uniq -c|sort -nr
>>> 784 4
>>>  59 5
>>>   7 6
>>>
>>> I don't have a single commit that needs 7 characters, yet that's our
>>> default. This is a sane trade-off for the kernel, but for something
>>> that's just a toy or something you're playing around with something
>>> shorter can make sense.
>>>
>>> SHA-1s aren't just written down, but also e.g. remembered in wetware
>>> short-term memory.
>>
>> That's a fine example of what I called "supporting absurdly small
>> absolute values like 4"; I still do not see why you want "negative
>> relative values" from that example.
>
> Because hardcoding -2 is very different than setting it to 5, because
> the -2 will scale to the size of the repository, but 5 is just 7-2 where
> 7 is our default value.
>
> So, in general if you want less future proof hashes by some
> probabilistic metric (whether you use core.validateAbbrev or not) you'd
> use -2 or -3, just like you might use +2 or +3 if you'd like to have
> more future-proof hashes (especially with core.validateAbbrev=true).

That still does not make much sense to me at all.

I do agree that something shorter than the default 7 may be more
appropriate for our wetware short-term memory, and it would make
sense to grow the "riskier to collide than the default heuristics
but more memorable" variant as the project grows, _ONLY_ _IF_ our
wetware short-term memory scales with the project we happen to be
working on.  But our wetware does not scale with the project we work
on; at least mine does not.

So...



Re: [PATCH 19/20] abbrev: support relative abbrev values

2018-06-14 Thread Ævar Arnfjörð Bjarmason


On Thu, Jun 14 2018, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  writes:
>
>> On Wed, Jun 13 2018, Junio C Hamano wrote:
>>
>>> Ævar Arnfjörð Bjarmason  writes:
>>>
 E.g. here's a breakdown of my dotfiles repo:

 $ git -c core.abbrev=4 log  --pretty=format:%h|perl -nE 'chomp;say 
 length'|sort|uniq -c|sort -nr
 784 4
  59 5
   7 6

 I don't have a single commit that needs 7 characters, yet that's our
 default. This is a sane trade-off for the kernel, but for something
 that's just a toy or something you're playing around with something
 shorter can make sense.

 SHA-1s aren't just written down, but also e.g. remembered in wetware
 short-term memory.
>>>
>>> That's a fine example of what I called "supporting absurdly small
>>> absolute values like 4"; I still do not see why you want "negative
>>> relative values" from that example.
>>
>> Because hardcoding -2 is very different than setting it to 5, because
>> the -2 will scale to the size of the repository, but 5 is just 7-2 where
>> 7 is our default value.
>>
>> So, in general if you want less future proof hashes by some
>> probabilistic metric (whether you use core.validateAbbrev or not) you'd
>> use -2 or -3, just like you might use +2 or +3 if you'd like to have
>> more future-proof hashes (especially with core.validateAbbrev=true).
>
> That still does not make much sense to me at all.
>
> I do agree that something shorter than the default 7 may be more
> appropriate for our wetware short-term memory, and it would make
> sense to grow the "riskier to collide than the default heuristics
> but more memorable" variant as the project grows, _ONLY_ _IF_ our
> wetware short-term memory scales with the project we happen to be
> working on.  But our wetware does not scale with the project we work
> on; at least mine does not.

Yes, it's a trade-off, but just because something is a trade-off doesn't
make it useless.

Aside from the feature I'm proposing, the same thing applies to the
short SHA-1 currently. My ~1k commit dotfiles has 7 characters, but
linux.git has 12. Does that make printing the short SHA-1 at all useless
and we should just fall back to 40 characters if it's say >= 12? No.

12 is still better than 40, but if we could get away with it 10 would be
better than 12. Right now the "get away with it" calculation is a
hardcoded constant, this makes it configurable.

The reason to make it configurable is because you may want more future
proof *or* less future-proof SHA-1s depending on the use-case. Printing
a longer SHA-1 has a cost, including but not limited to:

 * Remembering it for a short time
 * Seeing it on one screen and typing it into another computer manually
 * `git log --oneline` output in a terminal where horizontal space is at
   a premium