date:20180608

Re: [RFC PATCH v1] telemetry design overview (part 1)

2018-06-08 Thread Duy Nguyen

On Sat, Jun 9, 2018 at 12:22 AM Ævar Arnfjörð Bjarmason
 wrote:
>
>
> On Fri, Jun 08 2018, Johannes Sixt wrote:
>
> > Am 08.06.2018 um 18:00 schrieb Thomas Braun:
> >> I for my part would much rather prefer that to be a compile time
> >> option so that I don't need to check on every git update on windows
> >> if  this is now enabled or not.
> >
> > This exactly my concern, too! A compile-time option may make it a good
> > deal less worrisome.
>
> Can you elaborate on how someone who can maintain inject malicious code
> into your git package + config would be thwarted by this being some
> compile-time option, wouldn't they just compile it in?

Look at this from a different angle. This is driven by the needs to
collect telemetry in _controlled_ environment (mostly server side, I
guess) and it should be no problem to make custom builds there for
you. Not making it a compile-time option could force [1] linux distro
to carry this function to everybody even if they don't use it (and
it's kinda dangerous to misuse if you don't anonymize the data
properly). I also prefer this a compile time option.

[1] Of course many distros can choose to patch it out. But it's the
same argument as bringing this option in in the first place: you guys
already have that code in private and now want to put it in stock git
to reduce maintenance cost, why add extra cost on linux distro
maintenance?
-- 
Duy

Re: GDPR compliance best practices?

2018-06-08 Thread Ævar Arnfjörð Bjarmason



On Fri, Jun 08 2018, Jonathan Nieder wrote:

> Separate from that legal context, though, I think it's an interesting
> feature request.  I don't think it goes far enough: I would like a way
> to erase arbitrary information from the history in a repository.  For
> example, if I accidentally check in an encryption key in my repository
> as content or a commit message, I would like a way to remove it,
> assuming that others who fetch from the same repo are willing to
> cooperate with me, of course (i.e. in place of the object, the server
> would store a placeholder and an _advisory_ token allowing clients to
> know (1) that this object was deleted, (2) what object to use instead,
> and (3) an explanatory note about why the deletion occured; clients
> could make whatever use of this information they choose).
>
> I've seen some discussion on this subject at
> https://www.mercurial-scm.org/pipermail/mercurial/2008-March/017802.html
> long ago and have some ideas of my own, but nothing concrete yet.
> Anyway, I thought it might be useful to get people's minds working on
> it.

You may find it interesting to look at how git-annex-forget does this:
https://git-annex.branchable.com/git-annex-forget/ &
http://git-annex.branchable.com/devblog/day_-4__forgetting/

Re: GDPR compliance best practices?

2018-06-08 Thread Ævar Arnfjörð Bjarmason

On Fri, Jun 08 2018, Peter Backes wrote:

> On Fri, Jun 08, 2018 at 10:13:20AM +0200, Ævar Arnfjörð Bjarmason wrote:
>> Can you walk us through how anyone would be expected to fork (as create
>> a new project, not the github-ism) existing projects under such a
>> regiment?
>
> I don't see your point. Copy the repository to fork. Nothing changes
> about that. Nothing prevents anyone from forking a repository which had
> some of its author names removed from the commits.

This basically the same as saying the whole notion of Signed-off-by
should be abandoned entirely, since in this case the fork will only have
a partial set of these.

The point is that we're recording information so each line in the
repository can be traced back to a SOB.

These sorts of take-downs would destroy that information, and the
proposed solution of having some party retain these creates a special
class of free software users who are capable of following that line of
attributions.

[PATCH 05/20] abbrev tests: test "git-blame" behavior

2018-06-08 Thread Ævar Arnfjörð Bjarmason

Add tests showing how "git-blame" behaves. As noted in an earlier
change there's a behavior difference between core.abbrev=40 and
--abbrev=40.

Let's also assert that neither way of changing the abbreviation length
modifies the porcelain output.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/t0014-abbrev.sh | 48 +++
 1 file changed, 48 insertions(+)

diff --git a/t/t0014-abbrev.sh b/t/t0014-abbrev.sh
index 1c60f5ff93..77f15d5b0b 100755
--- a/t/t0014-abbrev.sh
+++ b/t/t0014-abbrev.sh
@@ -12,6 +12,10 @@ cut_tr_d_n_field_n() {
cut -d " " -f $1 | tr_d_n
 }
 
+nocaret() {
+   sed 's/\^//'
+}
+
 test_expect_success 'setup' '
test_commit A &&
git tag -a -mannotated A.annotated &&
@@ -115,4 +119,48 @@ do
"
 done
 
+for i in $(test_seq 4 40)
+do
+   for opt in --porcelain --line-porcelain
+   do
+   test_expect_success "blame $opt ignores core.abbrev=$i and 
--abbrev=$i" "
+   git -c core.abbrev=$i blame $opt A.t | head -n 1 | 
cut_tr_d_n_field_n 1 >blame &&
+   test_byte_count = 40 blame &&
+   git blame $opt --abbrev=$i A.t | head -n 1 | 
cut_tr_d_n_field_n 1 >blame &&
+   test_byte_count = 40 blame
+   "
+   done
+
+
+   test_expect_success "blame core.abbrev=$i and --abbrev=$i with 
boundary" "
+   # See the blame documentation for why this is off-by-one
+   git -c core.abbrev=$i blame A.t | cut_tr_d_n_field_n 1 | 
nocaret >blame &&
+   test_byte_count = $i blame &&
+   git blame --abbrev=$i A.t | cut_tr_d_n_field_n 1 | nocaret 
>blame &&
+   if test $i -eq 40
+   then
+   test_byte_count = 39 blame
+   else
+   test_byte_count = $i blame
+   fi
+   "
+
+   test_expect_success "blame core.abbrev=$i and --abbrev=$i without 
boundary" "
+   git -c core.abbrev=$i blame B.t | cut_tr_d_n_field_n 1 | 
nocaret >blame &&
+   if test $i -eq 40
+   then
+   test_byte_count = $i blame
+   else
+   test_byte_count = \$(($i + 1)) blame
+   fi &&
+   git blame --abbrev=$i B.t | cut_tr_d_n_field_n 1 | nocaret 
>blame &&
+   if test $i -eq 40
+   then
+   test_byte_count = $i blame
+   else
+   test_byte_count = \$(($i + 1)) blame
+   fi
+   "
+done
+
 test_done
-- 
2.17.0.290.gded63e768a

[PATCH 20/20] abbrev: add a core.validateAbbrev setting

2018-06-08 Thread Ævar Arnfjörð Bjarmason

Operations that need to abbreviate a lot of SHA-1s such as 'git log
--oneline' experience degraded performance when traversing a lot of
packs. See [1] and [2] for some relevant performance numbers.

One way to address this is something like the MIDX to make looking up
the SHA-1s cheaper.

This change adds an alternate method of achieving some of the same
ends (but possibly not all, see [3] and replies to the original thread
at [1]).

Instead of trying really hard to find an unambiguous SHA-1 we can with
core.validateAbbrev=false, and preferably combined with the new
support for relative core.abbrev values (such as +4) unconditionally
print a short SHA-1 without doing any disambiguation check. I.e. it
allows for picking a trade-off between performance, and the odds that
future or remote (or current and local) short SHA-1 will be ambiguous.

With the included performance test my git.git repacked with with `git
repack -A -d --max-pack-size=X` gives the following results against
git.git itself with X=64M:

TestHEAD~ HEAD


0014.1: git log --oneline --raw --parents   2.53(2.48+0.05)   
2.20(2.14+0.05) -13.0%

With one big pack -7.6%, and with 16M packs -23.8%.

1. https://public-inbox.org/git/20180107181459.222909-1-dsto...@microsoft.com/
2. https://public-inbox.org/git/20180607140338.32440-1-dsto...@microsoft.com/
3. https://public-inbox.org/git/87lgbsz61p@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 Documentation/config.txt| 43 ++
 cache.h |  1 +
 config.c|  7 +
 environment.c   |  1 +
 sha1-name.c |  4 +++
 t/perf/p0014-abbrev.sh  | 13 
 t/t1512-rev-parse-disambiguation.sh | 47 +
 7 files changed, 116 insertions(+)
 create mode 100755 t/perf/p0014-abbrev.sh

diff --git a/Documentation/config.txt b/Documentation/config.txt
index abf07be7b6..df31d1351f 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -925,6 +925,49 @@ means to add or subtract N characters from the SHA-1 that 
Git would
 otherwise print, this allows for producing more future-proof SHA-1s
 for use within a given project, while adjusting the value for the
 current approximate number of objects.
++
+This is especially useful in combination with the
+`core.validateAbbrev` setting, or to get more future-proof hashes to
+reference in the future in a repository whose number of objects is
+expected to grow.
+
+core.validateAbbrev::
+   If set to false (true by default) don't do any validation when
+   printing abbreviated object names to see if they're really
+   unique. This makes printing objects more performant at the
+   cost of potentially printing object names that aren't unique
+   within the current repository.
++
+When printing abbreviated object names Git needs to look through the
+local object store. This is an `O(log N)` operation assuming all the
+objects are in a single pack file, but `X * O(log N)` given `X` pack
+files, which can get expensive on some larger repositories.
++
+This setting changes that to `O(1)`, but with the trade-off that
+depending on the value of `core.abbrev` we may be printing abbreviated
+hashes that collide. Too see how likely this is, try running:
++
+---
+git log --all --pretty=format:%h --abbrev=4 | perl -nE 'chomp; say length' | 
sort | uniq -c | sort -nr
+---
++
+This shows how many commits were found at each abbreviation length. On
+linux.git in June 2018 this shows a bit more than 750,000 commits,
+with just 4 needing 11 characters to be fully abbreviated, and the
+default heuristic picks a length of 12.
++
+Even without `core.validateAbbrev=false` the results abbreviation
+already a bit of a probability game. They're guaranteed at the moment
+of generation, but as more objects are added, ambiguities may be
+introduced. Likewise, what's unambiguous for you may not be for
+somebody else you're communicating with, if they have their own clone.
++
+Therefore the default of `core.validateAbbrev=true` may not save you
+in practice if you're sharing the SHA-1 or noting it now to use after
+a `git fetch`. You may be better off setting `core.abbrev` to
+e.g. `+2` to add 2 extra characters to the SHA-1, and possibly combine
+that with `core.validateAbbrev=false` to get a reasonable trade-off
+between safety and performance.
 
 add.ignoreErrors::
 add.ignore-errors (deprecated)::
diff --git a/cache.h b/cache.h
index 0fb4211804..6dc5af9482 100644
--- a/cache.h
+++ b/cache.h
@@ -773,6 +773,7 @@ extern int

[PATCH 13/20] parse-options-cb.c: convert uses of 40 to GIT_SHA1_HEXSZ

2018-06-08 Thread Ævar Arnfjörð Bjarmason

In ac53fe8601 ("sha1_name: convert uses of 40 to GIT_SHA1_HEXSZ",
2017-07-13) the code this is validating user input for in
find_unique_abbrev_r() was converted to GIT_SHA1_HEXSZ, but the
corresponding validation codepath wasn't change. Let's do that.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 parse-options-cb.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/parse-options-cb.c b/parse-options-cb.c
index 0f9f311a7a..298b5735c8 100644
--- a/parse-options-cb.c
+++ b/parse-options-cb.c
@@ -21,8 +21,8 @@ int parse_opt_abbrev_cb(const struct option *opt, const char 
*arg, int unset)
return opterror(opt, "expects a numerical value", 0);
if (v && v < MINIMUM_ABBREV)
v = MINIMUM_ABBREV;
-   else if (v > 40)
-   v = 40;
+   else if (v > GIT_SHA1_HEXSZ)
+   v = GIT_SHA1_HEXSZ;
}
*(int *)(opt->value) = v;
return 0;
-- 
2.17.0.290.gded63e768a

Re: GDPR compliance best practices?

2018-06-08 Thread Jonathan Nieder

Hi,

Peter Backes wrote:

> I'd like to ask whether anyone has best practices for achieving GDPR
> compliance for git repos? The GDPR will come into effect in the EU next
> month.

This is a reasonable question to ask other Git users on this list to
share ideas, so thanks for asking it.

> In particular, how do you cope with the "Right to erasure" concerning
> entries in the history of your git repos?

Later in the thread you discussed some changes you would like to make
to Git or in front of Git to ensure that people can erase their
authorship information from a repository after the fact in a
non-disruptive way.

I have no opinion about how that relates to GDPR requirements.  I tend
to expect any legal advice a person gets to be situation-specific;
it's much harder to get legal advice that is useful to share.

Separate from that legal context, though, I think it's an interesting
feature request.  I don't think it goes far enough: I would like a way
to erase arbitrary information from the history in a repository.  For
example, if I accidentally check in an encryption key in my repository
as content or a commit message, I would like a way to remove it,
assuming that others who fetch from the same repo are willing to
cooperate with me, of course (i.e. in place of the object, the server
would store a placeholder and an _advisory_ token allowing clients to
know (1) that this object was deleted, (2) what object to use instead,
and (3) an explanatory note about why the deletion occured; clients
could make whatever use of this information they choose).

I've seen some discussion on this subject at
https://www.mercurial-scm.org/pipermail/mercurial/2008-March/017802.html
long ago and have some ideas of my own, but nothing concrete yet.
Anyway, I thought it might be useful to get people's minds working on
it.

Thanks,
Jonathan

[PATCH 09/20] abbrev tests: test for "git-log" behavior

2018-06-08 Thread Ævar Arnfjörð Bjarmason

The "log" family of commands does its own parsing for --abbrev in
revision.c, so having dedicated tests for it makes sense.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/t0014-abbrev.sh | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/t/t0014-abbrev.sh b/t/t0014-abbrev.sh
index 645bcca1d1..a66051c040 100755
--- a/t/t0014-abbrev.sh
+++ b/t/t0014-abbrev.sh
@@ -203,4 +203,14 @@ do
"
 done
 
+for i in $(test_seq 4 40)
+do
+   test_expect_success "log core.abbrev=$i and --abbrev=$i" "
+   git -c core.abbrev=$i log --pretty=format:%h -1 | tr_d_n >log &&
+   test_byte_count = $i log &&
+   git log --abbrev=$i --pretty=format:%h -1 | tr_d_n >log &&
+   test_byte_count = $i log
+   "
+done
+
 test_done
-- 
2.17.0.290.gded63e768a

[PATCH 06/20] blame: fix a bug, core.abbrev should work like --abbrev

2018-06-08 Thread Ævar Arnfjörð Bjarmason

Since 84393bfd73 ("blame: add --abbrev command line option and make it
honor core.abbrev", 2011-04-06) first released with v1.7.6 "git blame"
has supported both core.abbrev and --abbrev to change the abbreviation
length.

Initially output wouldn't alter the abbreviation length to account for
the boundary commit. This was changed in 91229834c2 ("blame: fix
alignment with --abbrev=40", 2017-01-05) first released with v2.11.1.

That change had a bug which I'm fixing here. It didn't account for the
abbreviation length also being set via core.abbrev, not just via the
--abbrev command-line option.

So let's handle that. The easiest way to do that is to check if the
global default_abbrev variable (-1 by default) has been set by
git_default_core_config(), and if so pretend we had the --abbrev
option is set, in lieu of making everything that uses the "abbrev"
variable now read that OR default_abbrev).

The reason I'm documenting these past behaviors is that whatever our
desires with --porcelain people do parse the human-readable "git
blame" output, and for any machine use are likely to use
core.abbrev=40 or --abbrev=40. Documenting how the format has changed
over time will help those users avoid nasty surprises.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 Documentation/git-blame.txt | 8 +++-
 builtin/blame.c | 2 ++
 t/t0014-abbrev.sh   | 7 ++-
 3 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-blame.txt b/Documentation/git-blame.txt
index 7b562494ac..d6cddbcb2e 100644
--- a/Documentation/git-blame.txt
+++ b/Documentation/git-blame.txt
@@ -92,7 +92,13 @@ include::blame-options.txt[]
 Because of this UI design, the only way to get the full SHA-1 of the
 boundary commit is to use the `--porcelain` format. With `--abbrev=40`
 only 39 characters of the boundary SHA-1 will be emitted, since one
-will be used for the caret to mark the boundary.
+will be used for the caret to mark the boundary. This behavior was
+different before 2.11.1, git would then emit the 40 character if
+requested, resulting in unaligned output.
++
+Before 2.19, setting `core.abbrev=40` wouldn't apply the above rule
+and would cause blame to emit output that was unaligned. This bug has
+since been fixed.
 
 
 THE PORCELAIN FORMAT
diff --git a/builtin/blame.c b/builtin/blame.c
index 4202584f97..6ab08561d1 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -868,6 +868,8 @@ int cmd_blame(int argc, const char **argv, const char 
*prefix)
} else if (show_progress < 0)
show_progress = isatty(2);
 
+   if (default_abbrev >= 0)
+   abbrev = default_abbrev;
if (0 < abbrev && abbrev < GIT_SHA1_HEXSZ)
/* one more abbrev length is needed for the boundary commit */
abbrev++;
diff --git a/t/t0014-abbrev.sh b/t/t0014-abbrev.sh
index 77f15d5b0b..934c54a96b 100755
--- a/t/t0014-abbrev.sh
+++ b/t/t0014-abbrev.sh
@@ -135,7 +135,12 @@ do
test_expect_success "blame core.abbrev=$i and --abbrev=$i with 
boundary" "
# See the blame documentation for why this is off-by-one
git -c core.abbrev=$i blame A.t | cut_tr_d_n_field_n 1 | 
nocaret >blame &&
-   test_byte_count = $i blame &&
+   if test $i -eq 40
+   then
+   test_byte_count = 39 blame
+   else
+   test_byte_count = $i blame
+   fi &&
git blame --abbrev=$i A.t | cut_tr_d_n_field_n 1 | nocaret 
>blame &&
if test $i -eq 40
then
-- 
2.17.0.290.gded63e768a

[PATCH 18/20] abbrev parsing: use braces on multiple conditional arms

2018-06-08 Thread Ævar Arnfjörð Bjarmason

Adjust this code that'll be modified in a subsequent change to have
more than one line per branch to use braces per the CodingGuidelines,
this makes the later change easier to understand.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 diff.c | 5 +++--
 revision.c | 5 +++--
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/diff.c b/diff.c
index cab79d24ab..e0141cfbc0 100644
--- a/diff.c
+++ b/diff.c
@@ -4807,10 +4807,11 @@ int diff_opt_parse(struct diff_options *options,
options->abbrev = strtoul(arg, , 10);
if (*end)
die("--abbrev expects a numerical value, got '%s'", 
arg);
-   if (options->abbrev < MINIMUM_ABBREV)
+   if (options->abbrev < MINIMUM_ABBREV) {
options->abbrev = MINIMUM_ABBREV;
-   else if (the_hash_algo->hexsz < options->abbrev)
+   } else if (the_hash_algo->hexsz < options->abbrev) {
options->abbrev = the_hash_algo->hexsz;
+   }
}
else if ((argcount = parse_long_opt("src-prefix", av, ))) {
options->a_prefix = optarg;
diff --git a/revision.c b/revision.c
index d39a292895..2a75fef22d 100644
--- a/revision.c
+++ b/revision.c
@@ -2053,10 +2053,11 @@ static int handle_revision_opt(struct rev_info *revs, 
int argc, const char **arg
revs->abbrev = strtoul(optarg, , 10);
if (*end)
die("--abbrev expects a numerical value, got '%s'", 
optarg);
-   if (revs->abbrev < MINIMUM_ABBREV)
+   if (revs->abbrev < MINIMUM_ABBREV) {
revs->abbrev = MINIMUM_ABBREV;
-   else if (revs->abbrev > hexsz)
+   } else if (revs->abbrev > hexsz) {
revs->abbrev = hexsz;
+   }
} else if (!strcmp(arg, "--abbrev-commit")) {
revs->abbrev_commit = 1;
revs->abbrev_commit_given = 1;
-- 
2.17.0.290.gded63e768a

[PATCH 12/20] abbrev tests: test for --abbrev and core.abbrev=[+-]N

2018-06-08 Thread Ævar Arnfjörð Bjarmason

In a later change I mean to make values like -1 and +1 mean something
different, but right now they're implicitly parsed. Let's test for the
current behavior before changing it.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/t0014-abbrev.sh | 131 --
 1 file changed, 128 insertions(+), 3 deletions(-)

diff --git a/t/t0014-abbrev.sh b/t/t0014-abbrev.sh
index 5a99cbe434..6dee92f35e 100755
--- a/t/t0014-abbrev.sh
+++ b/t/t0014-abbrev.sh
@@ -74,7 +74,7 @@ test_expect_success 'abbrev non-integer value handling 
differs ' '
! test -s stderr
 '
 
-for i in -41 -20 -10 -1 0 1 2 3 41
+for i in -41 -20 -10 -1 -0 +0 0 1 2 3 41
 do
test_expect_success "core.abbrev value $i out of range errors out" "
test_must_fail git -c core.abbrev=$i log -1 --pretty=format:%h 
2>stderr &&
@@ -90,7 +90,7 @@ do
"
 done
 
-for i in 0 1 2 3 4
+for i in 0 1 2 3 4 -0 +0 +1 +2 +3 +4
 do
test_expect_success "non-negative --abbrev=$i value log &&
@@ -98,7 +98,7 @@ do
"
 done
 
-for i in 41 9001
+for i in 41 9001 +41 +9001
 do
test_expect_success "non-negative --abbrev=$i value >MINIMUM_ABBREV 
falls back on 40" "
git log --abbrev=$i -1 --pretty=format:%h >log &&
@@ -116,6 +116,10 @@ do
git log --abbrev=$i -1 --pretty=format:%h >log &&
test_byte_count = $i log &&
 
+   # core.abbrev=+N is the same as core.abbrev=N
+   git -c core.abbrev=+$i log -1 --pretty=format:%h >log &&
+   test_byte_count = $i log &&
+
# The --abbrev option should take priority over
# core.abbrev
git -c core.abbrev=20 log --abbrev=$i -1 --pretty=format:%h 
>log &&
@@ -172,16 +176,39 @@ do
"
 done
 
+test_expect_success 'blame core.abbrev=[-+]1 and --abbrev=[-+]1' '
+   test_must_fail git -c core.abbrev=+1 blame A.t | cut_tr_d_n_field_n 1 
>blame &&
+   test_must_fail git -c core.abbrev=-1 blame A.t | cut_tr_d_n_field_n 1 
>blame &&
+
+   git blame --abbrev=-1 A.t | cut_tr_d_n_field_n 1 >blame &&
+   test_byte_count = 5 blame &&
+
+   git blame --abbrev=+1 A.t | cut_tr_d_n_field_n 1 >blame &&
+   test_byte_count = 5 blame
+'
+
 for i in $(test_seq 4 40)
 do
test_expect_success "branch core.abbrev=$i and --abbrev=$i" "
git -c core.abbrev=$i branch -v | cut_tr_d_n_field_n 3 >branch 
&&
test_byte_count = $i branch &&
+
git branch --abbrev=$i -v | cut_tr_d_n_field_n 3 >branch &&
test_byte_count = $i branch
"
 done
 
+test_expect_success 'branch core.abbrev=[-+]1 and --abbrev=[-+]1' '
+   test_must_fail git -c core.abbrev=+1 branch -v | cut_tr_d_n_field_n 3 
>branch &&
+   test_must_fail git -c core.abbrev=-1 branch -v | cut_tr_d_n_field_n 3 
>branch &&
+
+   git branch --abbrev=-1 -v | cut_tr_d_n_field_n 3 >branch &&
+   test_byte_count = 4 branch &&
+
+   git branch --abbrev=+1 -v | cut_tr_d_n_field_n 3 >branch &&
+   test_byte_count = 4 branch
+'
+
 test_expect_success 'describe core.abbrev and --abbrev special cases' '
# core.abbrev=0 behaves as usual...
test_must_fail git -c core.abbrev=0 describe &&
@@ -203,6 +230,17 @@ do
"
 done
 
+test_expect_success 'describe core.abbrev=[-+]1 and --abbrev=[-+]1' '
+   test_must_fail git -c core.abbrev=+1 describe | sed_g_tr_d_n >describe 
&&
+   test_must_fail git -c core.abbrev=-1 describe | sed_g_tr_d_n >describe 
&&
+
+   git describe --abbrev=-1 | sed_g_tr_d_n >describe &&
+   test_byte_count = 4 describe &&
+
+   git describe --abbrev=+1 | sed_g_tr_d_n >describe &&
+   test_byte_count = 4 describe
+'
+
 for i in $(test_seq 4 40)
 do
test_expect_success "log core.abbrev=$i and --abbrev=$i" "
@@ -213,6 +251,20 @@ do
"
 done
 
+test_expect_success 'log core.abbrev=[-+]1 and --abbrev=[-+]1' '
+   test_must_fail git -c core.abbrev=+1 log --pretty=format:%h -1 2>stderr 
&&
+   test_i18ngrep "abbrev length out of range" stderr &&
+
+   test_must_fail git -c core.abbrev=-1 log --pretty=format:%h -1 2>stderr 
&&
+   test_i18ngrep "abbrev length out of range" stderr &&
+
+   git log --abbrev=+1 --pretty=format:%h -1 | tr_d_n >log &&
+   test_byte_count = 4 log &&
+
+   git log --abbrev=-1 --pretty=format:%h -1 | tr_d_n >log &&
+   test_byte_count = 40 log
+'
+
 for i in $(test_seq 4 40)
 do
test_expect_success "diff --no-index --raw core.abbrev=$i and 
--abbrev=$i" "
@@ -244,6 +296,46 @@ do
"
 done
 
+test_expect_success 'diff --no-index --raw core.abbrev=[-+]1 and 
--abbrev=[-+]1' '
+   test_must_fail git -c core.abbrev=+1 diff --no-index --raw X Y 2>stderr 
&&
+   test_i18ngrep "abbrev length out of range" stderr &&
+
+   test_must_fail git -c core.abbrev=-1 diff --no-index --raw X Y 2>stderr 
&&
+   test_i18ngrep "abbrev length out of

[PATCH 08/20] abbrev tests: test for "git-describe" behavior

2018-06-08 Thread Ævar Arnfjörð Bjarmason

The only thing out of the ordinary with git-describe is the --abbrev=0
special-case.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/t0014-abbrev.sh | 25 +
 1 file changed, 25 insertions(+)

diff --git a/t/t0014-abbrev.sh b/t/t0014-abbrev.sh
index d8b060d922..645bcca1d1 100755
--- a/t/t0014-abbrev.sh
+++ b/t/t0014-abbrev.sh
@@ -16,6 +16,10 @@ nocaret() {
sed 's/\^//'
 }
 
+sed_g_tr_d_n() {
+   sed 's/.*g//' | tr_d_n
+}
+
 test_expect_success 'setup' '
test_commit A &&
git tag -a -mannotated A.annotated &&
@@ -178,4 +182,25 @@ do
"
 done
 
+test_expect_success 'describe core.abbrev and --abbrev special cases' '
+   # core.abbrev=0 behaves as usual...
+   test_must_fail git -c core.abbrev=0 describe &&
+
+   # ...but --abbrev=0 is special-cased to print the nearest tag,
+   # not fall back on "4" like git-log.
+   echo A.annotated >expected &&
+   git describe --abbrev=0 >actual &&
+   test_cmp expected actual
+'
+
+for i in $(test_seq 4 40)
+do
+   test_expect_success "describe core.abbrev=$i and --abbrev=$i" "
+   git -c core.abbrev=$i describe | sed_g_tr_d_n >describe &&
+   test_byte_count = $i describe &&
+   git describe --abbrev=$i | sed_g_tr_d_n >describe &&
+   test_byte_count = $i describe
+   "
+done
+
 test_done
-- 
2.17.0.290.gded63e768a

[PATCH 03/20] blame doc: explicitly note how --abbrev=40 gives 39 chars

2018-06-08 Thread Ævar Arnfjörð Bjarmason

In a later change I'm adding stress testing of the commit abbreviation
as it relates to git-blame and others, and initially thought that the
inability to extract full SHA-1s from the non-"--porcelain" output was
a bug.

In hindsight I could have read the existing paragraph more carefully,
but let's make this clearer by explicitly stating this limitation of
--abbrev as it relates to git-blame, it is not shared by any other
command that supports core.abbrev or --abbrev.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 Documentation/git-blame.txt | 5 +
 1 file changed, 5 insertions(+)

diff --git a/Documentation/git-blame.txt b/Documentation/git-blame.txt
index 16323eb80e..7b562494ac 100644
--- a/Documentation/git-blame.txt
+++ b/Documentation/git-blame.txt
@@ -88,6 +88,11 @@ include::blame-options.txt[]
Instead of using the default 7+1 hexadecimal digits as the
abbreviated object name, use +1 digits. Note that 1 column
is used for a caret to mark the boundary commit.
++
+Because of this UI design, the only way to get the full SHA-1 of the
+boundary commit is to use the `--porcelain` format. With `--abbrev=40`
+only 39 characters of the boundary SHA-1 will be emitted, since one
+will be used for the caret to mark the boundary.
 
 
 THE PORCELAIN FORMAT
-- 
2.17.0.290.gded63e768a

[PATCH 11/20] abbrev tests: test for plumbing behavior

2018-06-08 Thread Ævar Arnfjörð Bjarmason

The "git-{ls-files,ls-tree,show-ref}" commands all have in common that
the core.abbrev variable is ignored, and only --abbrev works. This is
intentional since these are all plumbing.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/t0014-abbrev.sh | 32 
 1 file changed, 32 insertions(+)

diff --git a/t/t0014-abbrev.sh b/t/t0014-abbrev.sh
index 783807475f..5a99cbe434 100755
--- a/t/t0014-abbrev.sh
+++ b/t/t0014-abbrev.sh
@@ -244,4 +244,36 @@ do
"
 done
 
+for i in $(test_seq 4 40)
+do
+   test_expect_success "ls-files core.abbrev=$i and --abbrev=$i" "
+   git -c core.abbrev=$i ls-files --stage A.t | cut_tr_d_n_field_n 
2 >ls-files &&
+   test_byte_count = 40 ls-files &&
+   git ls-files --abbrev=$i --stage A.t | cut_tr_d_n_field_n 2 
>ls-files &&
+   test_byte_count = $i ls-files
+   "
+done
+
+for i in $(test_seq 4 40)
+do
+   test_expect_success "ls-tree core.abbrev=$i and --abbrev=$i" "
+   git -c core.abbrev=$i ls-tree HEAD A.t | cut -f 1 | 
cut_tr_d_n_field_n 3 >ls-tree &&
+   test_byte_count = 40 ls-tree &&
+   git ls-tree --abbrev=$i HEAD A.t | cut -f 1 | 
cut_tr_d_n_field_n 3 >ls-tree &&
+   test_byte_count = $i ls-tree
+   "
+done
+
+for i in $(test_seq 4 40)
+do
+   test_expect_success "show-ref core.abbrev=$i and --abbrev=$i" "
+   git -c core.abbrev=$i show-ref --hash refs/heads/master | 
tr_d_n >show-ref &&
+   test_byte_count = 40 show-ref &&
+   git show-ref --hash --abbrev=$i refs/heads/master | tr_d_n 
>show-ref &&
+   test_byte_count = $i show-ref &&
+   git show-ref --hash=$i refs/heads/master | tr_d_n >show-ref &&
+   test_byte_count = $i show-ref
+   "
+done
+
 test_done
-- 
2.17.0.290.gded63e768a

[PATCH 15/20] parse-options-cb.c: use braces on multiple conditional arms

2018-06-08 Thread Ævar Arnfjörð Bjarmason

Adjust this code that'll be modified in a subsequent change to have
more than one line per branch to use braces per the CodingGuidelines,
this makes the later change easier to understand.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 parse-options-cb.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/parse-options-cb.c b/parse-options-cb.c
index 298b5735c8..e3cd87fbd6 100644
--- a/parse-options-cb.c
+++ b/parse-options-cb.c
@@ -19,10 +19,11 @@ int parse_opt_abbrev_cb(const struct option *opt, const 
char *arg, int unset)
v = strtol(arg, (char **), 10);
if (*arg)
return opterror(opt, "expects a numerical value", 0);
-   if (v && v < MINIMUM_ABBREV)
+   if (v && v < MINIMUM_ABBREV) {
v = MINIMUM_ABBREV;
-   else if (v > GIT_SHA1_HEXSZ)
+   } else if (v > GIT_SHA1_HEXSZ) {
v = GIT_SHA1_HEXSZ;
+   }
}
*(int *)(opt->value) = v;
return 0;
-- 
2.17.0.290.gded63e768a

[PATCH 10/20] abbrev tests: test for "git-diff" behavior

2018-06-08 Thread Ævar Arnfjörð Bjarmason

The "diff" family of commands does its own parsing for --abbrev in
diff.c, so having dedicated tests for it makes sense.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/t0014-abbrev.sh | 31 +++
 1 file changed, 31 insertions(+)

diff --git a/t/t0014-abbrev.sh b/t/t0014-abbrev.sh
index a66051c040..783807475f 100755
--- a/t/t0014-abbrev.sh
+++ b/t/t0014-abbrev.sh
@@ -213,4 +213,35 @@ do
"
 done
 
+for i in $(test_seq 4 40)
+do
+   test_expect_success "diff --no-index --raw core.abbrev=$i and 
--abbrev=$i" "
+   test_must_fail git -c core.abbrev=$i diff --no-index --raw X Y 
>diff &&
+   cut_tr_d_n_field_n 3 diff.3 &&
+   test_byte_count = $i diff.3 &&
+   cut_tr_d_n_field_n 4 diff.4 &&
+   test_byte_count = $i diff.4 &&
+
+   test_must_fail git diff --no-index --raw --abbrev=$i X Y >diff 
&&
+   cut_tr_d_n_field_n 3 diff.3 &&
+   test_byte_count = $i diff.3 &&
+   cut_tr_d_n_field_n 4 diff.4 &&
+   test_byte_count = $i diff.4
+   "
+
+   test_expect_success "diff --raw core.abbrev=$i and --abbrev=$i" "
+   git -c core.abbrev=$i diff --raw HEAD~ >diff &&
+   cut_tr_d_n_field_n 3 diff.3 &&
+   test_byte_count = $i diff.3 &&
+   cut_tr_d_n_field_n 4 diff.4 &&
+   test_byte_count = $i diff.4 &&
+
+   git diff --raw --abbrev=$i HEAD~ >diff &&
+   cut_tr_d_n_field_n 3 diff.3 &&
+   test_byte_count = $i diff.3 &&
+   cut_tr_d_n_field_n 4 diff.4 &&
+   test_byte_count = $i diff.4
+   "
+done
+
 test_done
-- 
2.17.0.290.gded63e768a

[PATCH 14/20] config.c: use braces on multiple conditional arms

2018-06-08 Thread Ævar Arnfjörð Bjarmason

Adjust this code that'll be modified in a subsequent change to have
more than one line per branch to use braces per the CodingGuidelines,
this makes the later change easier to understand.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 config.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/config.c b/config.c
index fbbf0f8e9f..12f762ad92 100644
--- a/config.c
+++ b/config.c
@@ -1149,9 +1149,9 @@ static int git_default_core_config(const char *var, const 
char *value)
if (!strcmp(var, "core.abbrev")) {
if (!value)
return config_error_nonbool(var);
-   if (!strcasecmp(value, "auto"))
+   if (!strcasecmp(value, "auto")) {
default_abbrev = -1;
-   else {
+   } else {
int abbrev = git_config_int(var, value);
if (abbrev < minimum_abbrev || abbrev > 40)
return error("abbrev length out of range: %d", 
abbrev);
-- 
2.17.0.290.gded63e768a

[PATCH 16/20] abbrev: unify the handling of non-numeric values

2018-06-08 Thread Ævar Arnfjörð Bjarmason

Unify how --abbrev=XYZ and core.abbrev=XYZ is handled. 2/4 parsers for
these values would just let these invalid values pass, treating them
as though "0" was provided, which due to other inconsistent fallback
logic (soon to be fixed) would be treated as providing MINIMUM_ABBREV.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 diff.c|  5 -
 revision.c|  5 -
 t/t0014-abbrev.sh | 10 +-
 3 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/diff.c b/diff.c
index 136d44b455..75935322f1 100644
--- a/diff.c
+++ b/diff.c
@@ -4801,7 +4801,10 @@ int diff_opt_parse(struct diff_options *options,
else if (!strcmp(arg, "--abbrev"))
options->abbrev = DEFAULT_ABBREV;
else if (skip_prefix(arg, "--abbrev=", )) {
-   options->abbrev = strtoul(arg, NULL, 10);
+   char *end;
+   options->abbrev = strtoul(arg, , 10);
+   if (*end)
+   die("--abbrev expects a numerical value, got '%s'", 
arg);
if (options->abbrev < MINIMUM_ABBREV)
options->abbrev = MINIMUM_ABBREV;
else if (the_hash_algo->hexsz < options->abbrev)
diff --git a/revision.c b/revision.c
index 40fd91ff2b..aa87afa77f 100644
--- a/revision.c
+++ b/revision.c
@@ -2047,7 +2047,10 @@ static int handle_revision_opt(struct rev_info *revs, 
int argc, const char **arg
} else if (!strcmp(arg, "--abbrev")) {
revs->abbrev = DEFAULT_ABBREV;
} else if (skip_prefix(arg, "--abbrev=", )) {
-   revs->abbrev = strtoul(optarg, NULL, 10);
+   char *end;
+   revs->abbrev = strtoul(optarg, , 10);
+   if (*end)
+   die("--abbrev expects a numerical value, got '%s'", 
optarg);
if (revs->abbrev < MINIMUM_ABBREV)
revs->abbrev = MINIMUM_ABBREV;
else if (revs->abbrev > hexsz)
diff --git a/t/t0014-abbrev.sh b/t/t0014-abbrev.sh
index 6dee92f35e..203fe316b9 100755
--- a/t/t0014-abbrev.sh
+++ b/t/t0014-abbrev.sh
@@ -64,14 +64,14 @@ test_expect_success 'abbrev non-integer value handling 
differs ' '
test_must_fail git branch -v --abbrev=XYZ 2>stderr &&
test_i18ngrep "expects a numerical value" stderr &&
 
-   git log --abbrev=XYZ -1 --pretty=format:%h 2>stderr &&
-   ! test -s stderr &&
+   test_must_fail git log --abbrev=XYZ -1 --pretty=format:%h 2>stderr &&
+   test_i18ngrep "expects a numerical value" stderr &&
 
-   git diff --raw --abbrev=XYZ HEAD~ 2>stderr &&
-   ! test -s stderr &&
+   test_must_fail git diff --raw --abbrev=XYZ HEAD~ 2>stderr &&
+   test_i18ngrep "expects a numerical value" stderr &&
 
test_must_fail git diff --raw --abbrev=XYZ --no-index X Y 2>stderr &&
-   ! test -s stderr
+   test_i18ngrep "expects a numerical value" stderr
 '
 
 for i in -41 -20 -10 -1 -0 +0 0 1 2 3 41
-- 
2.17.0.290.gded63e768a

[PATCH 17/20] abbrev: unify the handling of empty values

2018-06-08 Thread Ævar Arnfjörð Bjarmason

For no good reason the --abbrev= command-line option was less strict
than the core.abbrev config option, which came down to the latter
using git_config_int() which rejects an empty string, but the rest of
the parsing using strtoul() which will convert it to 0.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 diff.c |  2 ++
 parse-options-cb.c |  2 ++
 revision.c |  2 ++
 t/t0014-abbrev.sh  | 22 --
 4 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/diff.c b/diff.c
index 75935322f1..cab79d24ab 100644
--- a/diff.c
+++ b/diff.c
@@ -4802,6 +4802,8 @@ int diff_opt_parse(struct diff_options *options,
options->abbrev = DEFAULT_ABBREV;
else if (skip_prefix(arg, "--abbrev=", )) {
char *end;
+   if (!strcmp(arg, ""))
+   die("--abbrev expects a value, got '%s'", arg);
options->abbrev = strtoul(arg, , 10);
if (*end)
die("--abbrev expects a numerical value, got '%s'", 
arg);
diff --git a/parse-options-cb.c b/parse-options-cb.c
index e3cd87fbd6..aa9984f164 100644
--- a/parse-options-cb.c
+++ b/parse-options-cb.c
@@ -16,6 +16,8 @@ int parse_opt_abbrev_cb(const struct option *opt, const char 
*arg, int unset)
if (!arg) {
v = unset ? 0 : DEFAULT_ABBREV;
} else {
+   if (!strcmp(arg, ""))
+   return opterror(opt, "expects a value", 0);
v = strtol(arg, (char **), 10);
if (*arg)
return opterror(opt, "expects a numerical value", 0);
diff --git a/revision.c b/revision.c
index aa87afa77f..d39a292895 100644
--- a/revision.c
+++ b/revision.c
@@ -2048,6 +2048,8 @@ static int handle_revision_opt(struct rev_info *revs, int 
argc, const char **arg
revs->abbrev = DEFAULT_ABBREV;
} else if (skip_prefix(arg, "--abbrev=", )) {
char *end;
+   if (!strcmp(optarg, ""))
+   die("--abbrev expects a value, got '%s'", optarg);
revs->abbrev = strtoul(optarg, , 10);
if (*end)
die("--abbrev expects a numerical value, got '%s'", 
optarg);
diff --git a/t/t0014-abbrev.sh b/t/t0014-abbrev.sh
index 203fe316b9..8448f78560 100755
--- a/t/t0014-abbrev.sh
+++ b/t/t0014-abbrev.sh
@@ -38,23 +38,17 @@ test_expect_success 'abbrev empty value handling differs ' '
test_must_fail git -c core.abbrev= log -1 --pretty=format:%h 2>stderr &&
test_i18ngrep "bad numeric config value.*invalid unit" stderr &&
 
-   git branch -v --abbrev= | cut_tr_d_n_field_n 3 >branch &&
-   test_byte_count = 40 branch &&
+   test_must_fail git branch -v --abbrev= 2>stderr &&
+   test_i18ngrep "expects a value" stderr &&
 
-   git log --abbrev= -1 --pretty=format:%h >log &&
-   test_byte_count = 4 log &&
+   test_must_fail git log --abbrev= -1 --pretty=format:%h 2>stderr &&
+   test_i18ngrep "expects a value" stderr &&
 
-   git diff --raw --abbrev= HEAD~ >diff &&
-   cut_tr_d_n_field_n 3 diff.3 &&
-   test_byte_count = 4 diff.3 &&
-   cut_tr_d_n_field_n 4 diff.4 &&
-   test_byte_count = 4 diff.4 &&
+   test_must_fail git diff --raw --abbrev= HEAD~ 2>stderr &&
+   test_i18ngrep "expects a value" stderr &&
 
-   test_must_fail git diff --raw --abbrev= --no-index X Y >diff &&
-   cut_tr_d_n_field_n 3 diff.3 &&
-   test_byte_count = 4 diff.3 &&
-   cut_tr_d_n_field_n 4 diff.4 &&
-   test_byte_count = 4 diff.4
+   test_must_fail git diff --raw --abbrev= --no-index X Y 2>stderr &&
+   test_i18ngrep "expects a value" stderr
 '
 
 test_expect_success 'abbrev non-integer value handling differs ' '
-- 
2.17.0.290.gded63e768a

[PATCH 19/20] abbrev: support relative abbrev values

2018-06-08 Thread Ævar Arnfjörð Bjarmason

Change the core.abbrev config variable and the corresponding --abbrev
command-line option to support relative values such as +1 or -1.

Before Linus's e6c587c733 ("abbrev: auto size the default
abbreviation", 2016-09-30) git would default to abbreviating object
names to 7-hexdigits, and only picking longer SHA-1s as needed if that
was ambiguous.

That change instead set the default length as a function of the
estimated current count of objects:

Based on the expectation that we would see collision in a
repository with 2^(2N) objects when using object names shortened
to first N bits, use sufficient number of hexdigits to cover the
number of objects in the repository.  Each hexdigit (4-bits) we
add to the shortened name allows us to have four times (2-bits) as
many objects in the repository.

By supporting relative values for core.abbrev we can allow users to
consistently opt-in for either a higher or lower probability of
collision, without needing to hardcode a given numeric value like
"10", which would be overkill on some repositories, and far to small
on others.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 Documentation/config.txt   |   6 ++
 Documentation/diff-options.txt |   3 +
 Documentation/git-blame.txt|   3 +
 Documentation/git-branch.txt   |   3 +
 Documentation/git-describe.txt |   3 +
 Documentation/git-ls-files.txt |   3 +
 Documentation/git-ls-tree.txt  |   3 +
 Documentation/git-show-ref.txt |   3 +
 cache.h|   1 +
 config.c   |  11 +++
 diff.c |  18 +++-
 environment.c  |   1 +
 parse-options-cb.c |  12 ++-
 revision.c |  18 +++-
 sha1-name.c|  11 +++
 t/t0014-abbrev.sh  | 170 ++---
 16 files changed, 204 insertions(+), 65 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index ab641bf5a9..abf07be7b6 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -919,6 +919,12 @@ core.abbrev::
in your repository, which hopefully is enough for
abbreviated object names to stay unique for some time.
The minimum length is 4.
++
+This can also be set to relative values such as `+2` or `-2`, which
+means to add or subtract N characters from the SHA-1 that Git would
+otherwise print, this allows for producing more future-proof SHA-1s
+for use within a given project, while adjusting the value for the
+current approximate number of objects.
 
 add.ignoreErrors::
 add.ignore-errors (deprecated)::
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index f466600972..f1114a7b8d 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -384,6 +384,9 @@ endif::git-format-patch[]
independent of the `--full-index` option above, which controls
the diff-patch output format.  Non default number of
digits can be specified with `--abbrev=`.
++
+Can also be set to a relative value, see `core.abbrev` in
+linkgit:git-diff[1].
 
 -B[][/]::
 --break-rewrites[=[][/]]::
diff --git a/Documentation/git-blame.txt b/Documentation/git-blame.txt
index d6cddbcb2e..8559d3b0c7 100644
--- a/Documentation/git-blame.txt
+++ b/Documentation/git-blame.txt
@@ -99,6 +99,9 @@ requested, resulting in unaligned output.
 Before 2.19, setting `core.abbrev=40` wouldn't apply the above rule
 and would cause blame to emit output that was unaligned. This bug has
 since been fixed.
++
+Can also be set to a relative value, see `core.abbrev` in
+linkgit:git-diff[1].
 
 
 THE PORCELAIN FORMAT
diff --git a/Documentation/git-branch.txt b/Documentation/git-branch.txt
index 02eccbb931..0f8032cec6 100644
--- a/Documentation/git-branch.txt
+++ b/Documentation/git-branch.txt
@@ -182,6 +182,9 @@ See `--create-reflog` above for details.
Alter the sha1's minimum display length in the output listing.
The default value is 7 and can be overridden by the `core.abbrev`
config option.
++
+Can also be set to a relative value, see `core.abbrev` in
+linkgit:git-diff[1].
 
 --no-abbrev::
Display the full sha1s in the output listing rather than abbreviating 
them.
diff --git a/Documentation/git-describe.txt b/Documentation/git-describe.txt
index e027fb8c4b..a3d5c7e817 100644
--- a/Documentation/git-describe.txt
+++ b/Documentation/git-describe.txt
@@ -65,6 +65,9 @@ OPTIONS
abbreviated object name, use  digits, or as many digits
as needed to form a unique object name.  An  of 0
will suppress long format, only showing the closest tag.
++
+Can also be set to a relative value, see `core.abbrev` in
+linkgit:git-diff[1].
 
 --candidates=::
Instead of considering only the 10 most recent tags as
diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt
index 5298f1bc30..f9af2b23bf 100644
--- a/Documentation/git-ls-files.txt
+++

[PATCH 07/20] abbrev tests: test "git branch" behavior

2018-06-08 Thread Ævar Arnfjörð Bjarmason

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/t0014-abbrev.sh | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/t/t0014-abbrev.sh b/t/t0014-abbrev.sh
index 934c54a96b..d8b060d922 100755
--- a/t/t0014-abbrev.sh
+++ b/t/t0014-abbrev.sh
@@ -168,4 +168,14 @@ do
"
 done
 
+for i in $(test_seq 4 40)
+do
+   test_expect_success "branch core.abbrev=$i and --abbrev=$i" "
+   git -c core.abbrev=$i branch -v | cut_tr_d_n_field_n 3 >branch 
&&
+   test_byte_count = $i branch &&
+   git branch --abbrev=$i -v | cut_tr_d_n_field_n 3 >branch &&
+   test_byte_count = $i branch
+   "
+done
+
 test_done
-- 
2.17.0.290.gded63e768a

[PATCH 02/20] test library: add a test_byte_count

2018-06-08 Thread Ævar Arnfjörð Bjarmason

This new function is like test_line_count except it uses "wc -c"
instead of "wc -l". Perhaps this should be a parameter, but I don't
see us needing "wc -m" (chars), "wc -w" (words) etc.

Change a couple of existing tests that use this, I expect to use this
in later patches when adding more tests.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/README   |  4 
 t/t6006-rev-list-format.sh |  6 ++
 t/test-lib-functions.sh| 23 +++
 3 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/t/README b/t/README
index eede11d649..3139c27e4e 100644
--- a/t/README
+++ b/t/README
@@ -728,6 +728,10 @@ library for your script to use.
Check whether the  rev points to the same commit as the
 rev.
 
+ - test_byte_count (= | -lt | -ge | ...)  
+
+   Check whether a file has the number of bytes it is expected to.
+
  - test_line_count (= | -lt | -ge | ...)  
 
Check whether a file has the number of lines it is expected to.
diff --git a/t/t6006-rev-list-format.sh b/t/t6006-rev-list-format.sh
index ec42c2f779..ec068c55ab 100755
--- a/t/t6006-rev-list-format.sh
+++ b/t/t6006-rev-list-format.sh
@@ -456,14 +456,12 @@ test_expect_success '--abbrev' '
 
 test_expect_success '%H is not affected by --abbrev-commit' '
git log -1 --format=%H --abbrev-commit --abbrev=20 HEAD >actual &&
-   len=$(wc -c actual &&
-   len=$(wc -c output &&
+#  test_byte_count = 1 output
+#  '
+#
+# is like "test $(wc -c

[PATCH 04/20] abbrev tests: add tests for core.abbrev and --abbrev

2018-06-08 Thread Ævar Arnfjörð Bjarmason

How hashes are abbreviated is a core feature of git, and should have
its own t0*.sh tests. There's a few tests for core.abbrev and --abbrev
already, but none of those stress the feature itself and its edge
cases much. We should have tests for those in one place.

I don't like some of this behavior of --abbrev being looser about
input values that core.abbrev, But let's start by asserting the
current behavior we have before we change any of it.

That difference in behavior wasn't intentional. The code that does the
command-line parsing was initially added in 0ce865b134 ("Add shortcuts
for very often used options.", 2007-10-14), and it wasn't until much
later in dce9648916 ("Make the default abbrev length configurable",
2010-10-28) when core.abbrev was added with stricter parsing.

That's also only most of the command-line parsing. The diff and log
family of commands have their own parsing for it in diff.c and
revision.c, respectively. Those were added earlier in
47dd0d595d ("diff: --abbrev option", 2005-12-13) and 508d9e372e ("Fix
"--abbrev=xyz" for revision listing", 2006-05-27), although note that
sometimes diff goes via the revision.c path.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/t0014-abbrev.sh | 118 ++
 1 file changed, 118 insertions(+)
 create mode 100755 t/t0014-abbrev.sh

diff --git a/t/t0014-abbrev.sh b/t/t0014-abbrev.sh
new file mode 100755
index 00..1c60f5ff93
--- /dev/null
+++ b/t/t0014-abbrev.sh
@@ -0,0 +1,118 @@
+#!/bin/sh
+
+test_description='test core.abbrev and related features'
+
+. ./test-lib.sh
+
+tr_d_n() {
+   tr -d '\n'
+}
+
+cut_tr_d_n_field_n() {
+   cut -d " " -f $1 | tr_d_n
+}
+
+test_expect_success 'setup' '
+   test_commit A &&
+   git tag -a -mannotated A.annotated &&
+   test_commit B &&
+   test_commit C &&
+   mkdir X Y &&
+   touch X/file1 Y/file2
+'
+
+test_expect_success 'the FALLBACK_DEFAULT_ABBREV is 7' '
+   git log -1 --pretty=format:%h >log &&
+   test_byte_count = 7 log
+'
+
+test_expect_success 'abbrev empty value handling differs ' '
+   test_must_fail git -c core.abbrev= log -1 --pretty=format:%h 2>stderr &&
+   test_i18ngrep "bad numeric config value.*invalid unit" stderr &&
+
+   git branch -v --abbrev= | cut_tr_d_n_field_n 3 >branch &&
+   test_byte_count = 40 branch &&
+
+   git log --abbrev= -1 --pretty=format:%h >log &&
+   test_byte_count = 4 log &&
+
+   git diff --raw --abbrev= HEAD~ >diff &&
+   cut_tr_d_n_field_n 3 diff.3 &&
+   test_byte_count = 4 diff.3 &&
+   cut_tr_d_n_field_n 4 diff.4 &&
+   test_byte_count = 4 diff.4 &&
+
+   test_must_fail git diff --raw --abbrev= --no-index X Y >diff &&
+   cut_tr_d_n_field_n 3 diff.3 &&
+   test_byte_count = 4 diff.3 &&
+   cut_tr_d_n_field_n 4 diff.4 &&
+   test_byte_count = 4 diff.4
+'
+
+test_expect_success 'abbrev non-integer value handling differs ' '
+   test_must_fail git -c core.abbrev=XYZ log -1 --pretty=format:%h 
2>stderr &&
+   test_i18ngrep "bad numeric config value.*invalid unit" stderr &&
+
+   test_must_fail git branch -v --abbrev=XYZ 2>stderr &&
+   test_i18ngrep "expects a numerical value" stderr &&
+
+   git log --abbrev=XYZ -1 --pretty=format:%h 2>stderr &&
+   ! test -s stderr &&
+
+   git diff --raw --abbrev=XYZ HEAD~ 2>stderr &&
+   ! test -s stderr &&
+
+   test_must_fail git diff --raw --abbrev=XYZ --no-index X Y 2>stderr &&
+   ! test -s stderr
+'
+
+for i in -41 -20 -10 -1 0 1 2 3 41
+do
+   test_expect_success "core.abbrev value $i out of range errors out" "
+   test_must_fail git -c core.abbrev=$i log -1 --pretty=format:%h 
2>stderr &&
+   test_i18ngrep 'abbrev length out of range' stderr
+   "
+done
+
+for i in -41 -20 -10 -1
+do
+   test_expect_success "negative --abbrev=$i value out of range means 
--abbrev=40" "
+   git log --abbrev=$i -1 --pretty=format:%h >log &&
+   test_byte_count = 40 log
+   "
+done
+
+for i in 0 1 2 3 4
+do
+   test_expect_success "non-negative --abbrev=$i value log &&
+   test_byte_count = 4 log
+   "
+done
+
+for i in 41 9001
+do
+   test_expect_success "non-negative --abbrev=$i value >MINIMUM_ABBREV 
falls back on 40" "
+   git log --abbrev=$i -1 --pretty=format:%h >log &&
+   test_byte_count = 40 log
+   "
+done
+
+for i in $(test_seq 4 40)
+do
+   test_expect_success "core.abbrev=$i and --abbrev=$i in combination 
within the valid range" "
+   # Both core.abbrev=X and --abbrev=X do the same thing
+   # in isolation
+   git -c core.abbrev=$i log -1 --pretty=format:%h >log &&
+   test_byte_count = $i log &&
+   git log --abbrev=$i -1 --pretty=format:%h >log &&
+   test_byte_count = $i log &&
+
+   # The --abbrev option should take priority over
+

[PATCH 00/20] unconditional O(1) SHA-1 abbreviation

2018-06-08 Thread Ævar Arnfjörð Bjarmason

This patch series implements an entirely alternate method of achieving
some of the same ends as the MIDX, using the approach suggested by
Jeff King from the RFC thread back in January[1]. You can now do:

core.abbrev = 20
core.validateAbbrev = false

Or:

core.abbrev = +2
core.validateAbbrev = false

On linux.git `git log --oneline --raw --parents` with 64MB packs gives
this improvement with core.abbrev=15 & core.validateAbbrev=false
(v.s. true):

TestHEAD~   HEAD


0014.1: git log --oneline --raw --parents   95.68(95.07+0.53)   
42.74(42.33+0.39) -55.3%

While cleaning up the RFC version of this which I sent in [2] I
discovered that almost none of the existing functionality was tested
for, and was very inconsistent since we have 4 different places where
the abbrev config is parsed.

See 19/20 and 20/20 for what this whole thing is building towards, the
rest is all tests, cleanup, and preparatory work.

(There's still other long-standing issues with the existing behavior
which this doesn't change, but I had to stop somewhere to make this
digestible).

1. https://public-inbox.org/git/20180108102029.ga21...@sigill.intra.peff.net/
2. https://public-inbox.org/git/20180606102719.27145-1-ava...@gmail.com/

Ævar Arnfjörð Bjarmason (20):
  t/README: clarify the description of test_line_count
  test library: add a test_byte_count
  blame doc: explicitly note how --abbrev=40 gives 39 chars
  abbrev tests: add tests for core.abbrev and --abbrev
  abbrev tests: test "git-blame" behavior
  blame: fix a bug, core.abbrev should work like --abbrev
  abbrev tests: test "git branch" behavior
  abbrev tests: test for "git-describe" behavior
  abbrev tests: test for "git-log" behavior
  abbrev tests: test for "git-diff" behavior
  abbrev tests: test for plumbing behavior
  abbrev tests: test for --abbrev and core.abbrev=[+-]N
  parse-options-cb.c: convert uses of 40 to GIT_SHA1_HEXSZ
  config.c: use braces on multiple conditional arms
  parse-options-cb.c: use braces on multiple conditional arms
  abbrev: unify the handling of non-numeric values
  abbrev: unify the handling of empty values
  abbrev parsing: use braces on multiple conditional arms
  abbrev: support relative abbrev values
  abbrev: add a core.validateAbbrev setting

 Documentation/config.txt|  49 +++
 Documentation/diff-options.txt  |   3 +
 Documentation/git-blame.txt |  14 +
 Documentation/git-branch.txt|   3 +
 Documentation/git-describe.txt  |   3 +
 Documentation/git-ls-files.txt  |   3 +
 Documentation/git-ls-tree.txt   |   3 +
 Documentation/git-show-ref.txt  |   3 +
 builtin/blame.c |   2 +
 cache.h |   2 +
 config.c|  22 +-
 diff.c  |  24 +-
 environment.c   |   2 +
 parse-options-cb.c  |  19 +-
 revision.c  |  24 +-
 sha1-name.c |  15 +
 t/README|   6 +-
 t/perf/p0014-abbrev.sh  |  13 +
 t/t0014-abbrev.sh   | 452 
 t/t1512-rev-parse-disambiguation.sh |  47 +++
 t/t6006-rev-list-format.sh  |   6 +-
 t/test-lib-functions.sh |  23 ++
 22 files changed, 722 insertions(+), 16 deletions(-)
 create mode 100755 t/perf/p0014-abbrev.sh
 create mode 100755 t/t0014-abbrev.sh

-- 
2.17.0.290.gded63e768a

[PATCH 01/20] t/README: clarify the description of test_line_count

2018-06-08 Thread Ævar Arnfjörð Bjarmason

Referring to the "length" of a file is ambiguous. We mean the number
of lines, so let's say that.

Changes the description added in fb3340a6a7 ("test-lib: introduce
test_line_count to measure files", 2010-10-31).

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/README | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/README b/t/README
index 8373a27fea..eede11d649 100644
--- a/t/README
+++ b/t/README
@@ -730,7 +730,7 @@ library for your script to use.
 
  - test_line_count (= | -lt | -ge | ...)  
 
-   Check whether a file has the length it is expected to.
+   Check whether a file has the number of lines it is expected to.
 
  - test_path_is_file  []
test_path_is_dir  []
-- 
2.17.0.290.gded63e768a

Re: [PATCH v6 20/21] gc: automatically write commit-graph files

2018-06-08 Thread SZEDER Gábor



> The commit-graph file is a very helpful feature for speeding up git
> operations. In order to make it more useful, make it possible to
> write the commit-graph file during standard garbage collection
> operations.
> 
> Add a 'gc.commitGraph' config setting that triggers writing a
> commit-graph file after any non-trivial 'git gc' command. Defaults to
> false while the commit-graph feature matures. We specifically do not
> want to have this on by default until the commit-graph feature is fully
> integrated with history-modifying features like shallow clones.

So I played around with an earlier version of this patch series a
while ago... and as I looked into my gitconfig today I was surprised
to have both 'core.commitGraph' and 'gc.commitGraph' config variables
set.  When I looked into it I came across this email from Ævar:

  https://public-inbox.org/git/87fu3peni2@evledraar.gmail.com/

  > Other than the question if 'gc.commitGraph' and 'core.commitGraph'
  > should be independent config variables, and the exact wording of the
  > git-gc docs, it looks good to me.

  Sans doc errors you pointed out in other places (you need to set
  core.commitGraph so it's read at all), I think it's very useful to have
  these split up. It's simliar to pack.useBitmaps & pack.writeBitmaps.

I think the comparison with pack bitmaps makes a lot of sense and I
have to say that I really like those 'useBitmaps' and 'writeBitmaps'
variable names, because it's clear right away which one is which,
without consulting the documentation.  I think having 'useCommitGraph'
and 'writeCommitGraph' variables would be a lot better than the same
variable name in two different sections, and I'm sure that then I
wouldn't have been caught off guard.

Yeah, I know, my timing sucks, with 'core.commitGraph' already out
there in the -rc releases...  sorry.

Re: [RFC PATCH v1] telemetry design overview (part 1)

2018-06-08 Thread Ævar Arnfjörð Bjarmason

On Fri, Jun 08 2018, Johannes Sixt wrote:

> Am 08.06.2018 um 18:00 schrieb Thomas Braun:
>> I for my part would much rather prefer that to be a compile time
>> option so that I don't need to check on every git update on windows
>> if  this is now enabled or not.
>
> This exactly my concern, too! A compile-time option may make it a good
> deal less worrisome.

Can you elaborate on how someone who can maintain inject malicious code
into your git package + config would be thwarted by this being some
compile-time option, wouldn't they just compile it in?

Re: GDPR compliance best practices?

2018-06-08 Thread Johannes Sixt


Am 08.06.2018 um 04:53 schrieb Theodore Y. Ts'o:

And of course, that's the other thing you seem to fundamentally not
understand about how git works.  Every developer in the world working
on that open source project has their own copy.


Everyone here understands how Git works, of course.

"*shrug* but that's how Git works" does *NOT* override the GDPR.

-- Hannes

Re: [RFC PATCH v1] telemetry design overview (part 1)

2018-06-08 Thread Johannes Sixt


Am 08.06.2018 um 18:00 schrieb Thomas Braun:

I for my part would much rather prefer that to be a compile time
option so that I don't need to check on every git update on windows
if  this is now enabled or not.


This exactly my concern, too! A compile-time option may make it a good 
deal less worrisome.


-- Hannes

Re: [PATCH v2] completion: reduce overhead of clearing cached --options

2018-06-08 Thread SZEDER Gábor

On Fri, Jun 8, 2018 at 11:52 PM, Jonathan Nieder  wrote:
> SZEDER Gábor wrote:
>> On Fri, Jun 8, 2018 at 11:23 PM, Jonathan Nieder  wrote:
>
 +   [[ -z "${GIT_SOURCING_ZSH_COMPLETION}" ]] ; then
>>>
>>> Needs a - before the } to avoid errors in a shell where the user has
>>> chosen to use "set -u".  See v1.7.4-rc0~159 (completion: fix zsh check
>>> under bash with 'set -u', 2010-10-27) for more details.
>>
>> Right...  I did remember that, but by the time I finished typing out
>> that long variable name I forgot about it... :)
>>
>> However, I'm not sure it's worth caring about, because the
>> bash-competion scripts don't work with 'set -u' anyway...
>
> I'd be happy to see these '-'s go away (it would make the code both
> easier to read and easier to maintain), but as a post-2.18 topic,
> please.

Of course.

> Alternatively, if there's a real need for them, I'd be happy to see
> the completion tests start using 'set -u'.

Our test harness doesn't work with 'set -u' at all...

Re: [PATCH v2] completion: reduce overhead of clearing cached --options

2018-06-08 Thread Jonathan Nieder

SZEDER Gábor wrote:
> On Fri, Jun 8, 2018 at 11:23 PM, Jonathan Nieder  wrote:

>>> +   [[ -z "${GIT_SOURCING_ZSH_COMPLETION}" ]] ; then
>>
>> Needs a - before the } to avoid errors in a shell where the user has
>> chosen to use "set -u".  See v1.7.4-rc0~159 (completion: fix zsh check
>> under bash with 'set -u', 2010-10-27) for more details.
>
> Right...  I did remember that, but by the time I finished typing out
> that long variable name I forgot about it... :)
>
> However, I'm not sure it's worth caring about, because the
> bash-competion scripts don't work with 'set -u' anyway...

I'd be happy to see these '-'s go away (it would make the code both
easier to read and easier to maintain), but as a post-2.18 topic,
please.

Alternatively, if there's a real need for them, I'd be happy to see
the completion tests start using 'set -u'.

Thanks,
Jonathan

Re: [PATCH v2] completion: reduce overhead of clearing cached --options

2018-06-08 Thread SZEDER Gábor

On Fri, Jun 8, 2018 at 11:23 PM, Jonathan Nieder  wrote:
> Hi,
>
> SZEDER Gábor wrote:
>
>> Being in RC phase, I'm all for aiming for a minimal solution.
>> However, I don't think that the better fix would be erm.. any "less
>> minimal":
>>
>> diff --git a/contrib/completion/git-completion.bash 
>> b/contrib/completion/git-completion.bash
>> index f2aa484758..7aeb575cd1 100644
>> --- a/contrib/completion/git-completion.bash
>> +++ b/contrib/completion/git-completion.bash
>> @@ -3244,7 +3244,10 @@ __gitk_main ()
>>   __git_complete_revlist
>>  }
>>
>> -if [[ -n ${ZSH_VERSION-} ]]; then
>> +if [[ -n ${ZSH_VERSION-} ]] &&
>> +   # Don't define these functions when sourced from 'git-completion.zsh',
>> +   # it has its own implementations.
>> +   [[ -z "${GIT_SOURCING_ZSH_COMPLETION}" ]] ; then
>
> Needs a - before the } to avoid errors in a shell where the user has
> chosen to use "set -u".  See v1.7.4-rc0~159 (completion: fix zsh check
> under bash with 'set -u', 2010-10-27) for more details.

Right...  I did remember that, but by the time I finished typing out
that long variable name I forgot about it... :)

However, I'm not sure it's worth caring about, because the
bash-competion scripts don't work with 'set -u' anyway...

>>   echo "WARNING: this script is deprecated, please see 
>> git-completion.zsh" 1>&2
>>
>>   autoload -U +X compinit && compinit
>> diff --git a/contrib/completion/git-completion.zsh 
>> b/contrib/completion/git-completion.zsh
>> index 53cb0f934f..049d6b80f6 100644
>> --- a/contrib/completion/git-completion.zsh
>> +++ b/contrib/completion/git-completion.zsh
>> @@ -39,7 +39,7 @@ if [ -z "$script" ]; then
>>   test -f $e && script="$e" && break
>>   done
>>  fi
>> -ZSH_VERSION='' . "$script"
>> +GIT_SOURCING_ZSH_COMPLETION=y . "$script"
>>
>>  __gitcomp ()
>>  {
>
> Except for that tweak,
> Reviewed-by: Jonathan Nieder 
> Thanks.
>
> Now it just needs a commit message. :)

Re: [PATCH v2] completion: reduce overhead of clearing cached --options

2018-06-08 Thread Jonathan Nieder

Hi,

SZEDER Gábor wrote:

> Being in RC phase, I'm all for aiming for a minimal solution.
> However, I don't think that the better fix would be erm.. any "less
> minimal":
>
> diff --git a/contrib/completion/git-completion.bash 
> b/contrib/completion/git-completion.bash
> index f2aa484758..7aeb575cd1 100644
> --- a/contrib/completion/git-completion.bash
> +++ b/contrib/completion/git-completion.bash
> @@ -3244,7 +3244,10 @@ __gitk_main ()
>   __git_complete_revlist
>  }
>  
> -if [[ -n ${ZSH_VERSION-} ]]; then
> +if [[ -n ${ZSH_VERSION-} ]] &&
> +   # Don't define these functions when sourced from 'git-completion.zsh',
> +   # it has its own implementations.
> +   [[ -z "${GIT_SOURCING_ZSH_COMPLETION}" ]] ; then

Needs a - before the } to avoid errors in a shell where the user has
chosen to use "set -u".  See v1.7.4-rc0~159 (completion: fix zsh check
under bash with 'set -u', 2010-10-27) for more details.

>   echo "WARNING: this script is deprecated, please see 
> git-completion.zsh" 1>&2
>  
>   autoload -U +X compinit && compinit
> diff --git a/contrib/completion/git-completion.zsh 
> b/contrib/completion/git-completion.zsh
> index 53cb0f934f..049d6b80f6 100644
> --- a/contrib/completion/git-completion.zsh
> +++ b/contrib/completion/git-completion.zsh
> @@ -39,7 +39,7 @@ if [ -z "$script" ]; then
>   test -f $e && script="$e" && break
>   done
>  fi
> -ZSH_VERSION='' . "$script"
> +GIT_SOURCING_ZSH_COMPLETION=y . "$script"
>  
>  __gitcomp ()
>  {

Except for that tweak,
Reviewed-by: Jonathan Nieder 
Thanks.

Now it just needs a commit message. :)

Re: [PATCH v2] completion: reduce overhead of clearing cached --options

2018-06-08 Thread SZEDER Gábor



> Related question: what would it take to add a zsh completion sanity
> check to t/t9902-completion.sh?

I don't know.  What I do know is that we can't just run our tests with
ZSH, e.g. running 'zsh ./t-basic.sh' shows mostly failures.  So it
won't be as simple as modifying 't/lib-bash.sh' to somehow support ZSH
as well.

> Here's a minimal fix, untested.  As a followup, as G�bor suggests at [2],
> it would presumably make sense to stop overriding ZSH_VERSION, using
> this GIT_ scoped variable everywhere instead.
> 
> Thoughts?
> 
> Reported-by: Rick van Hattem 
> Reported-by: Dave Borowitz 
> Signed-off-by: Jonathan Nieder 
> 
> [1] 
> https://public-inbox.org/git/01020163c683e753-04629405-15f8-4a30-9dc3-e4e3f2a5aa26-000...@eu-west-1.amazonses.com/
> [2] https://public-inbox.org/git/20180606114147.7753-1-szeder@gmail.com/
> 
> diff --git i/contrib/completion/git-completion.bash 
> w/contrib/completion/git-completion.bash
> index 12814e9bbf..e4bcc435ea 100644
> --- i/contrib/completion/git-completion.bash
> +++ w/contrib/completion/git-completion.bash
> @@ -348,7 +348,7 @@ __gitcomp ()
>  
>  # Clear the variables caching builtins' options when (re-)sourcing
>  # the completion script.
> -if [[ -n ${ZSH_VERSION-} ]]; then
> +if [[ -n ${ZSH_VERSION-} || -n ${GIT_SOURCING_ZSH_COMPLETION-} ]]; then
>   unset $(set |sed -ne 
> 's/^\(__gitcomp_builtin_[a-zA-Z0-9_][a-zA-Z0-9_]*\)=.*/\1/p') 2>/dev/null
>  else
>   unset $(compgen -v __gitcomp_builtin_)
> diff --git i/contrib/completion/git-completion.zsh 
> w/contrib/completion/git-completion.zsh
> index 53cb0f934f..c7be9fd4dc 100644
> --- i/contrib/completion/git-completion.zsh
> +++ w/contrib/completion/git-completion.zsh
> @@ -39,7 +39,7 @@ if [ -z "$script" ]; then
>   test -f $e && script="$e" && break
>   done
>  fi
> -ZSH_VERSION='' . "$script"
> +GIT_SOURCING_ZSH_COMPLETION=1 ZSH_VERSION='' . "$script"
>  
>  __gitcomp ()
>  {
> 

Being in RC phase, I'm all for aiming for a minimal solution.
However, I don't think that the better fix would be erm.. any "less
minimal":

diff --git a/contrib/completion/git-completion.bash 
b/contrib/completion/git-completion.bash
index f2aa484758..7aeb575cd1 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -3244,7 +3244,10 @@ __gitk_main ()
__git_complete_revlist
 }
 
-if [[ -n ${ZSH_VERSION-} ]]; then
+if [[ -n ${ZSH_VERSION-} ]] &&
+   # Don't define these functions when sourced from 'git-completion.zsh',
+   # it has its own implementations.
+   [[ -z "${GIT_SOURCING_ZSH_COMPLETION}" ]] ; then
echo "WARNING: this script is deprecated, please see 
git-completion.zsh" 1>&2
 
autoload -U +X compinit && compinit
diff --git a/contrib/completion/git-completion.zsh 
b/contrib/completion/git-completion.zsh
index 53cb0f934f..049d6b80f6 100644
--- a/contrib/completion/git-completion.zsh
+++ b/contrib/completion/git-completion.zsh
@@ -39,7 +39,7 @@ if [ -z "$script" ]; then
test -f $e && script="$e" && break
done
 fi
-ZSH_VERSION='' . "$script"
+GIT_SOURCING_ZSH_COMPLETION=y . "$script"
 
 __gitcomp ()
 {

[PATCHv2 0/6] git-p4: some small fixes updated

2018-06-08 Thread Luke Diamand

This is an updated version of the set of changes I posted recently,
following comments on the list:

disable automatic sync after git-p4 submit:
https://marc.info/?l=git=152818734814838=2

better handling of being logged out by Perforce:
   https://marc.info/?l=git=152818893815326=2

adapt the block size automatically on git-p4 submit:
   https://marc.info/?l=git=152819004315688=2

- Spelling corrections (Eric)
- Improved comments (Eric)
- Exception class hierarchy fix (Merland)
- test simplification (Eric)


Luke Diamand (6):
  git-p4: disable-rebase: allow setting this via configuration
  git-p4: add option to disable syncing of p4/master with p4
  git-p4: better error reporting when p4 fails
  git-p4: raise exceptions from p4CmdList based on error from p4 server
  git-p4: narrow the scope of exceptions caught when parsing an int
  git-p4: auto-size the block

 Documentation/git-p4.txt |  13 +++-
 git-p4.py| 161 +--
 t/t9818-git-p4-block.sh  |   8 ++
 t/t9833-errors.sh|  78 +++
 4 files changed, 236 insertions(+), 24 deletions(-)
 create mode 100755 t/t9833-errors.sh

-- 
2.17.0.392.gdeb1a6e9b7

[PATCHv2 6/6] git-p4: auto-size the block

2018-06-08 Thread Luke Diamand

git-p4 originally would fetch changes in one query. On large repos this
could fail because of the limits that Perforce imposes on the number of
items returned and the number of queries in the database.

To fix this, git-p4 learned to query changes in blocks of 512 changes,
However, this can be very slow - if you have a few million changes,
with each chunk taking about a second, it can be an hour or so.

Although it's possible to tune this value manually with the
"--changes-block-size" option, it's far from obvious to ordinary users
that this is what needs doing.

This change alters the block size dynamically by looking for the
specific error messages returned from the Perforce server, and reducing
the block size if the error is seen, either to the limit reported by the
server, or to half the current block size.

That means we can start out with a very large block size, and then let
it automatically drop down to a value that works without error, while
still failing correctly if some other error occurs.

Signed-off-by: Luke Diamand 
---
 git-p4.py   | 27 +--
 t/t9818-git-p4-block.sh |  8 
 2 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/git-p4.py b/git-p4.py
index 5f59feb5bb..0354d4df5c 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -47,8 +47,8 @@ def __str__(self):
 # Only labels/tags matching this will be imported/exported
 defaultLabelRegexp = r'[a-zA-Z0-9_\-.]+$'
 
-# Grab changes in blocks of this many revisions, unless otherwise requested
-defaultBlockSize = 512
+# The block size is reduced automatically if required
+defaultBlockSize = 1<<20
 
 p4_access_checked = False
 
@@ -969,7 +969,8 @@ def p4ChangesForPaths(depotPaths, changeRange, 
requestedBlockSize):
 changes = set()
 
 # Retrieve changes a block at a time, to prevent running
-# into a MaxResults/MaxScanRows error from the server.
+# into a MaxResults/MaxScanRows error from the server. If
+# we _do_ hit one of those errors, turn down the block size
 
 while True:
 cmd = ['changes']
@@ -983,10 +984,24 @@ def p4ChangesForPaths(depotPaths, changeRange, 
requestedBlockSize):
 for p in depotPaths:
 cmd += ["%s...@%s" % (p, revisionRange)]
 
+# fetch the changes
+try:
+result = p4CmdList(cmd, errors_as_exceptions=True)
+except P4RequestSizeException as e:
+if not block_size:
+block_size = e.limit
+elif block_size > e.limit:
+block_size = e.limit
+else:
+block_size = max(2, block_size // 2)
+
+if verbose: print("block size error, retrying with block size 
{0}".format(block_size))
+continue
+except P4Exception as e:
+die('Error retrieving changes description 
({0})'.format(e.p4ExitCode))
+
 # Insert changes in chronological order
-for entry in reversed(p4CmdList(cmd)):
-if entry.has_key('p4ExitCode'):
-die('Error retrieving changes descriptions 
({})'.format(entry['p4ExitCode']))
+for entry in reversed(result):
 if not entry.has_key('change'):
 continue
 changes.add(int(entry['change']))
diff --git a/t/t9818-git-p4-block.sh b/t/t9818-git-p4-block.sh
index 8840a183ac..ce7cb22ad3 100755
--- a/t/t9818-git-p4-block.sh
+++ b/t/t9818-git-p4-block.sh
@@ -129,6 +129,7 @@ test_expect_success 'Create a repo with multiple depot 
paths' '
 '
 
 test_expect_success 'Clone repo with multiple depot paths' '
+   test_when_finished cleanup_git &&
(
cd "$git" &&
git p4 clone --changes-block-size=4 //depot/pathA@all 
//depot/pathB@all \
@@ -138,6 +139,13 @@ test_expect_success 'Clone repo with multiple depot paths' 
'
)
 '
 
+test_expect_success 'Clone repo with self-sizing block size' '
+   test_when_finished cleanup_git &&
+   git p4 clone --changes-block-size=100 //depot@all 
--destination="$git" &&
+   git -C "$git" log --oneline >log &&
+   test_line_count \> 10 log
+'
+
 test_expect_success 'kill p4d' '
kill_p4d
 '
-- 
2.17.0.392.gdeb1a6e9b7

[PATCHv2 3/6] git-p4: better error reporting when p4 fails

2018-06-08 Thread Luke Diamand

Currently when p4 fails to run, git-p4 just crashes with an obscure
error message.

For example, if the P4 ticket has expired, you get:

  Error: Cannot locate perforce checkout of  in client view

This change checks whether git-p4 can talk to the Perforce server when
the first P4 operation is attempted, and tries to print a meaningful
error message if it fails.

Signed-off-by: Luke Diamand 
---
 git-p4.py | 55 +
 t/t9833-errors.sh | 78 +++
 2 files changed, 133 insertions(+)
 create mode 100755 t/t9833-errors.sh

diff --git a/git-p4.py b/git-p4.py
index b61f47cc61..3de12a4a0a 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -50,6 +50,8 @@ def __str__(self):
 # Grab changes in blocks of this many revisions, unless otherwise requested
 defaultBlockSize = 512
 
+p4_access_checked = False
+
 def p4_build_cmd(cmd):
 """Build a suitable p4 command line.
 
@@ -91,6 +93,13 @@ def p4_build_cmd(cmd):
 real_cmd = ' '.join(real_cmd) + ' ' + cmd
 else:
 real_cmd += cmd
+
+# now check that we can actually talk to the server
+global p4_access_checked
+if not p4_access_checked:
+p4_access_checked = True# suppress access checks in 
p4_check_access itself
+p4_check_access()
+
 return real_cmd
 
 def git_dir(path):
@@ -264,6 +273,52 @@ def p4_system(cmd):
 if retcode:
 raise CalledProcessError(retcode, real_cmd)
 
+def die_bad_access(s):
+die("failure accessing depot: {0}".format(s.rstrip()))
+
+def p4_check_access(min_expiration=1):
+""" Check if we can access Perforce - account still logged in
+"""
+results = p4CmdList(["login", "-s"])
+
+if len(results) == 0:
+# should never get here: always get either some results, or a 
p4ExitCode
+assert("could not parse response from perforce")
+
+result = results[0]
+
+if 'p4ExitCode' in result:
+# p4 returned non-zero status, e.g. P4PORT invalid, or p4 not in path
+die_bad_access("could not run p4")
+
+code = result.get("code")
+if not code:
+# we get here if we couldn't connect and there was nothing to unmarshal
+die_bad_access("could not connect")
+
+elif code == "stat":
+expiry = result.get("TicketExpiration")
+if expiry:
+expiry = int(expiry)
+if expiry > min_expiration:
+# ok to carry on
+return
+else:
+die_bad_access("perforce ticket expires in {0} 
seconds".format(expiry))
+
+else:
+# account without a timeout - all ok
+return
+
+elif code == "error":
+data = result.get("data")
+if data:
+die_bad_access("p4 error: {0}".format(data))
+else:
+die_bad_access("unknown error")
+else:
+die_bad_access("unknown error code {0}".format(code))
+
 _p4_version_string = None
 def p4_version_string():
 """Read the version string, showing just the last line, which
diff --git a/t/t9833-errors.sh b/t/t9833-errors.sh
new file mode 100755
index 00..9ba892de7a
--- /dev/null
+++ b/t/t9833-errors.sh
@@ -0,0 +1,78 @@
+#!/bin/sh
+
+test_description='git p4 errors'
+
+. ./lib-git-p4.sh
+
+test_expect_success 'start p4d' '
+   start_p4d
+'
+
+test_expect_success 'add p4 files' '
+   (
+   cd "$cli" &&
+   echo file1 >file1 &&
+   p4 add file1 &&
+   p4 submit -d "file1"
+   )
+'
+
+# after this test, the default user requires a password
+test_expect_success 'error handling' '
+   git p4 clone --dest="$git" //depot@all &&
+   (
+   cd "$git" &&
+   P4PORT=: test_must_fail git p4 submit 2>errmsg
+   ) &&
+   p4 passwd -P newpassword &&
+   (
+   P4PASSWD=badpassword test_must_fail git p4 clone //depot/foo 
2>errmsg &&
+   grep -q "failure accessing depot.*P4PASSWD" errmsg
+   )
+'
+
+test_expect_success 'ticket logged out' '
+   P4TICKETS="$cli/tickets" &&
+   echo "newpassword" | p4 login &&
+   (
+   cd "$git" &&
+   test_commit "ticket-auth-check" &&
+   p4 logout &&
+   test_must_fail git p4 submit 2>errmsg &&
+   grep -q "failure accessing depot" errmsg
+   )
+'
+
+test_expect_success 'create group with short ticket expiry' '
+   P4TICKETS="$cli/tickets" &&
+   echo "newpassword" | p4 login &&
+   p4_add_user short_expiry_user &&
+   p4 -u short_expiry_user passwd -P password &&
+   p4 group -i <<-EOF &&
+   Group: testgroup
+   Timeout: 3
+   Users: short_expiry_user
+   EOF
+
+   p4 users | grep short_expiry_user
+'
+
+test_expect_success 'git operation with expired ticket' '
+   P4TICKETS="$cli/tickets" &&
+   P4USER=short_expiry_user &&
+   echo "password" | p4 login &&
+   (
+   cd

[PATCHv2 4/6] git-p4: raise exceptions from p4CmdList based on error from p4 server

2018-06-08 Thread Luke Diamand

This change lays some groundwork for better handling of rowcount errors
from the server, where it fails to send us results because we requested
too many.

It adds an option to p4CmdList() to return errors as a Python exception.

The exceptions are derived from P4Exception (something went wrong),
P4ServerException (the server sent us an error code) and
P4RequestSizeException (we requested too many rows/results from the
server database).

This makes the code that handles the errors a bit easier.

The default behavior is unchanged; the new code is enabled with a flag.

Signed-off-by: Luke Diamand 
---
 git-p4.py | 44 
 1 file changed, 40 insertions(+), 4 deletions(-)

diff --git a/git-p4.py b/git-p4.py
index 3de12a4a0a..f4b5deeb83 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -566,10 +566,30 @@ def isModeExec(mode):
 # otherwise False.
 return mode[-3:] == "755"
 
+class P4Exception(Exception):
+""" Base class for exceptions from the p4 client """
+def __init__(self, exit_code):
+self.p4ExitCode = exit_code
+
+class P4ServerException(P4Exception):
+""" Base class for exceptions where we get some kind of marshalled up 
result from the server """
+def __init__(self, exit_code, p4_result):
+super(P4ServerException, self).__init__(exit_code)
+self.p4_result = p4_result
+self.code = p4_result[0]['code']
+self.data = p4_result[0]['data']
+
+class P4RequestSizeException(P4ServerException):
+""" One of the maxresults or maxscanrows errors """
+def __init__(self, exit_code, p4_result, limit):
+super(P4RequestSizeException, self).__init__(exit_code, p4_result)
+self.limit = limit
+
 def isModeExecChanged(src_mode, dst_mode):
 return isModeExec(src_mode) != isModeExec(dst_mode)
 
-def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False):
+def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, skip_info=False,
+errors_as_exceptions=False):
 
 if isinstance(cmd,basestring):
 cmd = "-G " + cmd
@@ -616,9 +636,25 @@ def p4CmdList(cmd, stdin=None, stdin_mode='w+b', cb=None, 
skip_info=False):
 pass
 exitCode = p4.wait()
 if exitCode != 0:
-entry = {}
-entry["p4ExitCode"] = exitCode
-result.append(entry)
+if errors_as_exceptions:
+if len(result) > 0:
+data = result[0].get('data')
+if data:
+m = re.search('Too many rows scanned \(over (\d+)\)', data)
+if not m:
+m = re.search('Request too large \(over (\d+)\)', data)
+
+if m:
+limit = int(m.group(1))
+raise P4RequestSizeException(exitCode, result, limit)
+
+raise P4ServerException(exitCode, result)
+else:
+raise P4Exception(exitCode)
+else:
+entry = {}
+entry["p4ExitCode"] = exitCode
+result.append(entry)
 
 return result
 
-- 
2.17.0.392.gdeb1a6e9b7

[PATCHv2 2/6] git-p4: add option to disable syncing of p4/master with p4

2018-06-08 Thread Luke Diamand

Add an option to the git-p4 submit command to disable syncing
with Perforce.

This is useful for the case where a git-p4 mirror has been setup
on a server somewhere, running from (e.g.) cron, and developers
then clone from this. Having the local cloned copy also sync
from Perforce just isn't useful.

Signed-off-by: Luke Diamand 
---
 Documentation/git-p4.txt |  8 
 git-p4.py| 31 ---
 2 files changed, 28 insertions(+), 11 deletions(-)

diff --git a/Documentation/git-p4.txt b/Documentation/git-p4.txt
index 3d83842e47..f0de3b891b 100644
--- a/Documentation/git-p4.txt
+++ b/Documentation/git-p4.txt
@@ -369,6 +369,11 @@ These options can be used to modify 'git p4 submit' 
behavior.
 Disable the automatic rebase after all commits have been successfully
 submitted. Can also be set with git-p4.disableRebase.
 
+--disable-p4sync::
+Disable the automatic sync of p4/master from Perforce after commits have
+been submitted. Implies --disable-rebase. Can also be set with
+git-p4.disableP4Sync. Sync with origin/master still goes ahead if possible.
+
 Rebase options
 ~~
 These options can be used to modify 'git p4 rebase' behavior.
@@ -693,6 +698,9 @@ git-p4.conflict::
 git-p4.disableRebase::
 Do not rebase the tree against p4/master following a submit.
 
+git-p4.disableP4Sync::
+Do not sync p4/master with Perforce following a submit. Implies 
git-p4.disableRebase.
+
 IMPLEMENTATION DETAILS
 --
 * Changesets from p4 are imported using Git fast-import.
diff --git a/git-p4.py b/git-p4.py
index 5ab9421af8..b61f47cc61 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -1368,7 +1368,9 @@ def __init__(self):
  help="submit only the specified 
commit(s), one commit or xxx..xxx"),
 optparse.make_option("--disable-rebase", 
dest="disable_rebase", action="store_true",
  help="Disable rebase after submit is 
completed. Can be useful if you "
- "work from a local git branch that is not 
master")
+ "work from a local git branch that is not 
master"),
+optparse.make_option("--disable-p4sync", 
dest="disable_p4sync", action="store_true",
+ help="Skip Perforce sync of p4/master 
after submit or shelve"),
 ]
 self.description = "Submit changes from git to the perforce depot."
 self.usage += " [name of git branch to submit into perforce depot]"
@@ -1380,6 +1382,7 @@ def __init__(self):
 self.update_shelve = list()
 self.commit = ""
 self.disable_rebase = gitConfigBool("git-p4.disableRebase")
+self.disable_p4sync = gitConfigBool("git-p4.disableP4Sync")
 self.prepare_p4_only = False
 self.conflict_behavior = None
 self.isWindows = (platform.system() == "Windows")
@@ -2240,11 +2243,14 @@ def run(self, args):
 sync = P4Sync()
 if self.branch:
 sync.branch = self.branch
-sync.run([])
+if self.disable_p4sync:
+sync.sync_origin_only()
+else:
+sync.run([])
 
-if self.disable_rebase is False:
-rebase = P4Rebase()
-rebase.rebase()
+if not self.disable_rebase:
+rebase = P4Rebase()
+rebase.rebase()
 
 else:
 if len(applied) == 0:
@@ -3324,6 +3330,14 @@ def importChanges(self, changes, shelved=False, 
origin_revision=0):
 print self.gitError.read()
 sys.exit(1)
 
+def sync_origin_only(self):
+if self.syncWithOrigin:
+self.hasOrigin = originP4BranchesExist()
+if self.hasOrigin:
+if not self.silent:
+print 'Syncing with origin first, using "git fetch origin"'
+system("git fetch origin")
+
 def importHeadRevision(self, revision):
 print "Doing initial import of %s from revision %s into %s" % (' 
'.join(self.depotPaths), revision, self.branch)
 
@@ -3402,12 +3416,7 @@ def run(self, args):
 else:
 self.refPrefix = "refs/heads/p4/"
 
-if self.syncWithOrigin:
-self.hasOrigin = originP4BranchesExist()
-if self.hasOrigin:
-if not self.silent:
-print 'Syncing with origin first, using "git fetch origin"'
-system("git fetch origin")
+self.sync_origin_only()
 
 branch_arg_given = bool(self.branch)
 if len(self.branch) == 0:
-- 
2.17.0.392.gdeb1a6e9b7

[PATCHv2 1/6] git-p4: disable-rebase: allow setting this via configuration

2018-06-08 Thread Luke Diamand

This just lets you set the --disable-rebase option with the
git configuration options git-p4.disableRebase. If you're
using this option, you probably want to set it all the time
for a given repo.

Signed-off-by: Luke Diamand 
---
 Documentation/git-p4.txt | 5 -
 git-p4.py| 2 +-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-p4.txt b/Documentation/git-p4.txt
index e8452528fc..3d83842e47 100644
--- a/Documentation/git-p4.txt
+++ b/Documentation/git-p4.txt
@@ -367,7 +367,7 @@ These options can be used to modify 'git p4 submit' 
behavior.
 
 --disable-rebase::
 Disable the automatic rebase after all commits have been successfully
-submitted.
+submitted. Can also be set with git-p4.disableRebase.
 
 Rebase options
 ~~
@@ -690,6 +690,9 @@ git-p4.conflict::
Specify submit behavior when a conflict with p4 is found, as per
--conflict.  The default behavior is 'ask'.
 
+git-p4.disableRebase::
+Do not rebase the tree against p4/master following a submit.
+
 IMPLEMENTATION DETAILS
 --
 * Changesets from p4 are imported using Git fast-import.
diff --git a/git-p4.py b/git-p4.py
index c4581d20a6..5ab9421af8 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -1379,7 +1379,7 @@ def __init__(self):
 self.shelve = False
 self.update_shelve = list()
 self.commit = ""
-self.disable_rebase = False
+self.disable_rebase = gitConfigBool("git-p4.disableRebase")
 self.prepare_p4_only = False
 self.conflict_behavior = None
 self.isWindows = (platform.system() == "Windows")
-- 
2.17.0.392.gdeb1a6e9b7

[PATCHv2 5/6] git-p4: narrow the scope of exceptions caught when parsing an int

2018-06-08 Thread Luke Diamand

The current code traps all exceptions around some code which parses an
integer, and then talks to Perforce.

That can result in errors from Perforce being ignored. Change the code
to only catch the integer conversion exceptions.

Signed-off-by: Luke Diamand 
---
 git-p4.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/git-p4.py b/git-p4.py
index f4b5deeb83..5f59feb5bb 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -959,7 +959,7 @@ def p4ChangesForPaths(depotPaths, changeRange, 
requestedBlockSize):
 try:
 (changeStart, changeEnd) = p4ParseNumericChangeRange(parts)
 block_size = chooseBlockSize(requestedBlockSize)
-except:
+except ValueError:
 changeStart = parts[0][1:]
 changeEnd = parts[1]
 if requestedBlockSize:
-- 
2.17.0.392.gdeb1a6e9b7

Re: [PATCH v8 1/2] json_writer: new routines to create data in JSON format

2018-06-08 Thread René Scharfe

Am 07.06.2018 um 16:12 schrieb g...@jeffhostetler.com:
>   Makefile|   2 +
>   json-writer.c   | 419 
>   json-writer.h   | 113 +
>   t/helper/test-json-writer.c | 572 
> 
>   t/t0019-json-writer.sh  | 236 ++
>   5 files changed, 1342 insertions(+)
>   create mode 100644 json-writer.c
>   create mode 100644 json-writer.h
>   create mode 100644 t/helper/test-json-writer.c
>   create mode 100755 t/t0019-json-writer.sh

$ git grep 'static inline' '*json*'
json-writer.c:static inline void indent_pretty(struct json_writer *jw)
json-writer.c:static inline void begin(struct json_writer *jw, char ch_open, 
int pretty)
json-writer.c:static inline void assert_in_object(const struct json_writer *jw, 
const char *key)
json-writer.c:static inline void assert_in_array(const struct json_writer *jw)
json-writer.c:static inline void maybe_add_comma(struct json_writer *jw)
json-writer.c:static inline void fmt_double(struct json_writer *jw, int 
precision,
json-writer.c:static inline void object_common(struct json_writer *jw, const 
char *key)
json-writer.c:static inline void array_common(struct json_writer *jw)
json-writer.c:static inline void assert_is_terminated(const struct json_writer 
*jw)
t/helper/test-json-writer.c:static inline int scripted(int argc, const char 
**argv)
t/helper/test-json-writer.c:static inline int my_usage(void)

Do you need all those inline keywords?  I'd rather leave the decision
about inlining to the compiler and (via optimization flags) the user
as much as possible.  Not a biggie, but the high frequency of that
word made me blink..

René

I NEED YOUR HELP

2018-06-08 Thread Malouf Tuma

Dear Nguyễn,
I am barrister Malouf Tuma, the solicitor and executor of late
(Eng.G.E. Nguyễn,), will" and "Testament"; who died of kidney cancer
with UN-identified family or relative. I am contacting you to stand in
as a next of kin to his deposit of US$11.350million, with one of the
leading bank here in west Africa since you share the same last name
with my late client.
Note that i find it very difficult to send you this letter due to ip
differences therefore; i implore you to create gmail.com, Hotmail.com,
or yahoo.com and contact me through it. Thank you for your
understanding.
Please respond directly to my private email address (malouftum...@gmail.com)
Regards,
Malouf Tuma (Esq.)

Re: [PATCH v8 1/2] json_writer: new routines to create data in JSON format

2018-06-08 Thread Jeff Hostetler





On 6/7/2018 1:24 PM, Eric Sunshine wrote:

On Thu, Jun 7, 2018 at 10:12 AM,   wrote:

Add a series of jw_ routines and "struct json_writer" structure to compose
JSON data.  The resulting string data can then be output by commands wanting
to support a JSON output format.
[...]
Signed-off-by: Jeff Hostetler 
---
diff --git a/t/t0019-json-writer.sh b/t/t0019-json-writer.sh
@@ -0,0 +1,236 @@
+test_expect_success 'simple object' '
+   cat >expect <<-\EOF &&
+   {"a":"abc","b":42,"c":3.14,"d":true,"e":false,"f":null}
+   EOF
+   test-json-writer >actual \
+   @object \
+   @object-string a abc \
+   @object-int b 42 \
+   @object-double c 2 3.140 \
+   @object-true d \
+   @object-false e \
+   @object-null f \
+   @end &&
+   test_cmp expect actual
+'


To make it easier on people writing these tests, it might be nice for
this to be less noisy by getting rid of "@" and "\". To get rid of
"\", the test program could grab its script commands from stdin (one
instruction per line) rather than from argv[]. For instance:

 test-json-writer >actual <<-\EOF &&
 object
 object-string a abc
 ...
 end
 EOF

Not a big deal, and certainly not worth a re-roll.



I hadn't thought about doing it that way.  Might be a little easier
to use.  Let me take a look and see if it would be much work to switch.

Thanks
Jeff

Re: [PATCH v8 1/2] json_writer: new routines to create data in JSON format

2018-06-08 Thread René Scharfe

Am 07.06.2018 um 16:12 schrieb g...@jeffhostetler.com:
> From: Jeff Hostetler 
> +/*
> + * Add comma if we have already seen a member at this level.
> + */
> +static inline void maybe_add_comma(struct json_writer *jw)
> +{
> + if (!jw->open_stack.len)
> + return;

This is impossible.  maybe_add_comma() is only called directly after
assert_in_object() and assert_in_object(), which abort if open_stack 
is empty.

> + if (jw->first_stack.buf[jw->first_stack.len - 1] == '1')
> + jw->first_stack.buf[jw->first_stack.len - 1] = '0';
> + else
> + strbuf_addch(>json, ',');

You only need a binary flag to indicate if a comma is needed, not a
stack.  We need a comma at the current level if and only if we already
wrote a child item.  If we finish a level then we do need a comma at the
previous level because we just wrote a sub-object or sub-array there.
And that should cover all cases.  Am I missing something?

I get a sense of déjà vu, by the way. :)

Here's what I mean:
---
 json-writer.c | 17 ++---
 json-writer.h | 13 +
 2 files changed, 11 insertions(+), 19 deletions(-)

diff --git a/json-writer.c b/json-writer.c
index f35ce1912a..14a4e89188 100644
--- a/json-writer.c
+++ b/json-writer.c
@@ -5,7 +5,7 @@ void jw_init(struct json_writer *jw)
 {
strbuf_reset(>json);
strbuf_reset(>open_stack);
-   strbuf_reset(>first_stack);
+   jw->need_comma = 0;
jw->pretty = 0;
 }
 
@@ -13,7 +13,6 @@ void jw_release(struct json_writer *jw)
 {
strbuf_release(>json);
strbuf_release(>open_stack);
-   strbuf_release(>first_stack);
 }
 
 /*
@@ -69,7 +68,7 @@ static inline void begin(struct json_writer *jw, char 
ch_open, int pretty)
strbuf_addch(>json, ch_open);
 
strbuf_addch(>open_stack, ch_open);
-   strbuf_addch(>first_stack, '1');
+   jw->need_comma = 0;
 }
 
 /*
@@ -99,12 +98,10 @@ static inline void assert_in_array(const struct json_writer 
*jw)
  */
 static inline void maybe_add_comma(struct json_writer *jw)
 {
-   if (!jw->open_stack.len)
-   return;
-   if (jw->first_stack.buf[jw->first_stack.len - 1] == '1')
-   jw->first_stack.buf[jw->first_stack.len - 1] = '0';
-   else
+   if (jw->need_comma)
strbuf_addch(>json, ',');
+   else
+   jw->need_comma = 1;
 }
 
 static inline void fmt_double(struct json_writer *jw, int precision,
@@ -397,8 +394,6 @@ void jw_end(struct json_writer *jw)
char ch_open;
int len;
 
-   if (jw->open_stack.len != jw->first_stack.len)
-   BUG("jwon-writer: open_ and first_ stacks out of sync");
if (!jw->open_stack.len)
BUG("json-writer: too many jw_end(): '%s'", jw->json.buf);
 
@@ -406,7 +401,7 @@ void jw_end(struct json_writer *jw)
ch_open = jw->open_stack.buf[len];
 
strbuf_setlen(>open_stack, len);
-   strbuf_setlen(>first_stack, len);
+   jw->need_comma = 1;
 
if (jw->pretty)
strbuf_addch(>json, '\n');
diff --git a/json-writer.h b/json-writer.h
index af9c2612f8..c437462ba8 100644
--- a/json-writer.h
+++ b/json-writer.h
@@ -59,18 +59,15 @@ struct json_writer
struct strbuf open_stack;
 
/*
-* Another simple stack of the currently open array and object
-* forms.  This is used in parallel to "open_stack" (same length).
-* It contains a string of '1' and '0' where '1' indicates that
-* the next child-element seen will be the first child (and does
-* not need a leading comma).
+* Indicates if the next child item needs a leading comma.
+* Only the first item of arrays and objects doesn't need one.
 */
-   struct strbuf first_stack;
+   unsigned int need_comma:1;
 
-   int pretty;
+   unsigned int pretty:1;
 };
 
-#define JSON_WRITER_INIT { STRBUF_INIT, STRBUF_INIT, STRBUF_INIT, 0 }
+#define JSON_WRITER_INIT { STRBUF_INIT, STRBUF_INIT, 0, 0 }
 
 void jw_init(struct json_writer *jw);
 void jw_release(struct json_writer *jw);
-- 
2.17.1

[PATCH] docs/git-blame: explain carets on boundary commits

2018-06-08 Thread Stephen Kitt

The caret on boundary commits is only mentioned in the description of
--abbrev; this patch adds a description of the behaviour in the
paragraph describing how revision range specifiers are handled.

Signed-off-by: Stephen Kitt 
---
 Documentation/git-blame.txt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-blame.txt b/Documentation/git-blame.txt
index 16323eb80..7f814dcef 100644
--- a/Documentation/git-blame.txt
+++ b/Documentation/git-blame.txt
@@ -163,7 +163,8 @@ When revision range specifiers are used to limit the 
annotation,
 lines that have not changed since the range boundary (either the
 commit v2.6.18 or the most recent commit that is more than 3
 weeks old in the above example) are blamed for that range
-boundary commit.
+boundary commit.  The sha1 is preceded by a caret, '^' to indicate
+this (and its length is reduced by 1).
 
 A particularly useful way is to see if an added file has lines
 created by copy-and-paste from existing files.  Sometimes this
-- 
2.11.0

Re: GDPR compliance best practices?

2018-06-08 Thread David Lang


On Fri, 8 Jun 2018, Peter Backes wrote:


On Fri, Jun 08, 2018 at 12:42:54AM -0700, David Lang wrote:

Wrong, if you have to delete info, you are not allowed to keep a private
copy.


Yes you are allowed. See Art. 17 (3) lit e GDPR.


There is _nothing_ in the GDPR about publishing information,
everything in it is about what you are allowed to store privately, how you
are required to protect it (or more precisely, what you are required to do
if private data gets hacked), and how you are required to keep it available.


Nope, the GDPR is not at all restricted to private copies.


If the GDPR doesn't restrict private copies, then Google and Facebook are free 
to keep all data about everyone. That is explicitly what the GDPR is trying to 
prevent.



The GDPR has special jargon for publishing; the GDPR calls it
"disclosure (Art. 4 (2) GDPR) to an unspecified number of unspecified
recipients (Art. 4 (9) GDPR), including ones in third countries
(Chapter 5) in a repetitive (Art 49 (1) GDPR) fashion".


disclosure is what the person who submits the patch is doing, torturing the 
language of the GDPR to say that hanging on to data that people want you to 
delete is legal, and echoing public data that people have asked to be public is 
not legal is not going to be a successful line of argument, it's the exact 
opposite of the stated goals of the GDPR.


David Lang

Truncating file names with Unicode characters

2018-06-08 Thread Vitaly Potyarkin

# Truncating file names with Unicode characters

When shortening file names that contain Unicode characters, git performs
truncation without awareness of two-byte characters. That often leads to
splitting a character in half and displaying a garbage byte that's left.

Unawareness of Unicode also means that filename length is calculated incorrectly
and some output gets misaligned.

I have tested this with git 2.14.1 on Windows and with git 2.11.0 on Linux. My
configuration includes setting `core.quotepath = off` to display Unicode paths.

# Example: `git log --stat`

## Bad output: half-characters and wrong text alignment

The last file name gets truncated in the middle of the character (`ˆ` is
what's left of it). Text alignment is off because string lengths are calculated
in bytes instead of characters.

Extension/README.md|  28 +
.../Catalog.Номенклатура.xml   |  32 ++
.../Configuration.xml  |   5 +-
...етПереработчика.ObjectModule.txt |  39 
...cument.ОтчетПереработчика.xml |  68 +
.../Enum.СтавкиНДС.xml|  24 
...ˆирениеERPПотяркин_2018-06-05.cfe | Bin 0 -> 22018 bytes
7 files changed, 195 insertions(+), 1 deletion(-)

## Good output with ASCII file names

Truncation and alignment are done right because each character is represented
by a single byte.

.../index.html | 14
++
docs/posts/2017/loops-in-power-query-m-language/index.html | 14
++
.../index.html |  7 +++
.../temporary-virtual-environment-for-python/index.html| 14
++
.../index.html | 14
++
docs/posts/2018/getting-started-with-libpq/index.html  | 14
++
.../index.html | 14
++
.../2018/unit-testing-in-power-query-m-language/index.html |  7 +++
8 files changed, 98 insertions(+)

Re: is there a canonical doc about how to deal with whitespace issues?

2018-06-08 Thread Derrick Stolee


On 6/8/2018 9:18 AM, Robert P. J. Day wrote:

   for one of my courses, i wanted to write a section about the various
techniques for dealing with whitespace issues in git, so i started
making a list, things like:

   - running "git diff --check"
   - "git commit --cleanup=" possibilities
   - config options like core.{eol,safecrlf,autocrlf}
   - i'm sure there are client-side hooks that can be mentioned

etc, etc.

   has anyone ever written a doc that collects these things in one
place? if not, i guess i have to write one.

rday



I don't know of a doc for whitespace issues, but the contributing guide 
on GitForWindows [1] recommends `git rebase --whitespace=fix`.


Thanks,
-Stolee

[1] 
https://github.com/git-for-windows/git/blob/master/CONTRIBUTING.md#polish-your-commits

Re: GDPR compliance best practices?

2018-06-08 Thread Peter Backes

On Fri, Jun 08, 2018 at 10:45:51AM -0400, Theodore Y. Ts'o wrote:
> *Anyone* can run a repository.  It's not just github and gitlab.  The
> hobbiest in New Zealand, who might never visit Europe (so she can't
> be arrested when she visits the fair shores of Europe) and who has no
> business interests in Europe, can host such a web site.

Just because letters of request are hardly enforced doesn't make it 
legal to break the GDPR. For sure, a hobbyist would not have much to 
fear, even if he is violating the GDPR and coming to Europe. The GDPR 
is mostly about taming the megacorporations, not about arresting 
tourists.

> So the person trying to engage in censorship

Censorship? The GDPR is not about censorship.

If you want to write an opionion about someone by name, the GDPR gives 
you all legitimization to do so, against that person's will.

This is about removing the data under ordinary circumstances.

> would need to contact *everyone*.

This is the subject's problem, not the repository provider's.

> And someone who has a git note in their private repo who
> then pushes to github/gitlab would end up pushing that note back up to
> the web server.

If that note has been deleted based on the right to be forgotten, you 
as the repository provider have to make sure you don't publish it 
again. Since you are allowed to keep a private copy, ensuring that 
shouldn't be a problem for you. 

> Great, so you can get github and gitlab to get rid of the information.
> But it's *pointless*.

It's up to the subject to consider it pointless or not to exercise his 
rights...

> Your problem is in the word: "a"

...and against whom, whether one repository provider, the major ones, 
all of them he can find.

Best wishes
Peter

-- 
Peter Backes, r...@helen.plasma.xg8.de

Re: [RFC PATCH v1] telemetry design overview (part 1)

2018-06-08 Thread Thomas Braun

Am 08.06.2018 um 11:07 schrieb Jeff King:
> On Thu, Jun 07, 2018 at 11:10:52PM +0200, Johannes Sixt wrote:
> 
>> Am 07.06.2018 um 16:53 schrieb g...@jeffhostetler.com:
>>> From: Jeff Hostetler 
>>>
>>> I've been working to add code to Git to optionally collect telemetry data.
>>> The goal is to be able to collect performance data from Git commands and
>>> allow it to be aggregated over a user community to find "slow commands".
>>
>> Seriously? "add code to collect telemetry data" said by somebody whose email
>> address ends with @microsoft.com is very irritating. I really don't want to
>> have yet another switch that I must check after every update that it is
>> still off.
> 
> If you look at the design document, it's off by default and would write
> to a file on the filesystem. That doesn't seem all that different from
> GIT_TRACE.

The patch also includes the following part

+telemetry.plugin
+
+
+If the config setting "telemetry.plugin" contains the pathname to a shared
+library, the library will be dynamically loaded during start up and events
+will be sent to it using the plugin API.
+
+This plugin model allows an organization to define a custom or private
+telemetry solution while using a stock version of Git.
+
+For example, on Windows, it allows telemetry events to go directly to the
+kernel via the plugin using the high performance Event Tracing for Windows
+(ETW) facility.
+
+The contrib/telemetry-plugin-examples directory contains two example
+plugins:
+ * A trivial log to stderr
+ * A trivial ETW writer

which is not a file but, if enabled, some windows internal thingie where the 
data is gone/duplicated/sent out/whatever.

I for my part would much rather prefer that to be a compile time option so that 
I don't need to check on every git update on windows if this is now enabled or 
not.

Re: [PATCH v4 04/23] unpack-tress: convert clear_ce_flags* to avoid the_index

2018-06-08 Thread Duy Nguyen

On Thu, Jun 7, 2018 at 5:11 PM Duy Nguyen  wrote:
>
> On Thu, Jun 7, 2018 at 9:41 AM, Elijah Newren  wrote:
> > Subject line: unpack-trees rather than unpack-tress.
> >
> >
> >
> > On Wed, Jun 6, 2018 at 9:49 AM, Nguyễn Thái Ngọc Duy  
> > wrote:
> >> Prior to fba92be8f7, this code implicitly (and incorrectly) assumes
> >> the_index when running the exclude machinery. fba92be8f7 helps show
> >> this problem clearer because unpack-trees operation is supposed to
> >> work on whatever index the caller specifies... not specifically
> >> the_index.
> >>
> >> Update the code to use "istate" argument that's originally from
> >> mark_new_skip_worktree(). From the call sites, both in unpack_trees(),
> >> you can see that this function works on two separate indexes:
> >> o->src_index and o->result. The second mark_new_skip_worktree() so far
> >> has incorecctly applied exclude rules on o->src_index instead of
> >> o->result. It's unclear what is the consequences of this, but it's
> >> definitely wrong.
> >>
> >> [1] fba92be8f7 (dir: convert is_excluded_from_list to take an index -
> >> 2017-05-05)
> >>
> >> Signed-off-by: Nguyễn Thái Ngọc Duy 
> >
> > A somewhat curious finding: when I was rebuilding and re-testing all
> > 23 patches, I got a failure on this patch in test 31 of
> > t7063-status-untracked-cache.sh. I did not get any test failures with
> > any of the other patches.  However, after re-running that test or the
> > whole suite half a dozen times with just up to this patch applied, I
> > was not able to trigger the failure again.  Is there a rare race in
> > that testcase?
>
> Untracked cache tests are very time-sensitive. I'll try to run and
> re-run them a couple times to understand more. Thanks for pointing it
> out.

after hours of running tests, either with full 23 patches or just the
first 4, and failing to catch the failure, i declare (or more like,
pray) that you ran into a crack of time that led to a race. I'll take
no action on this, but I'll remember this and watch out for untracked
cache related mails for some time.
-- 
Duy

Re: [RFC PATCH v1] telemetry design overview (part 1)

2018-06-08 Thread Duy Nguyen

On Fri, Jun 8, 2018 at 11:40 AM, Ævar Arnfjörð Bjarmason
 wrote:
>
> On Thu, Jun 07 2018, Johannes Sixt wrote:
>
>> Am 07.06.2018 um 16:53 schrieb g...@jeffhostetler.com:
>>> From: Jeff Hostetler 
>>>
>>> I've been working to add code to Git to optionally collect telemetry data.
>>> The goal is to be able to collect performance data from Git commands and
>>> allow it to be aggregated over a user community to find "slow commands".
>>
>> Seriously? "add code to collect telemetry data" said by somebody whose
>> email address ends with @microsoft.com is very irritating. I really
>> don't want to have yet another switch that I must check after every
>> update that it is still off.
>
> To elaborate a bit on Jeff's reply (since this was discussed in more
> detail at Git Merge this year), the point of this feature is not to ship
> git.git with some default telemetry, but so that in-house users of git
> like Microsoft, Dropbox, Booking.com etc. can build & configure their
> internal versions of git to turn on telemetry for their own users.
>
> There's numerous in-house monkeypatches to git on various sites (at
> least Microsoft & Dropbox reported having internal patches already).
> Something like this facility would allow us to agree on some
> implementation that could be shipped by default (but would be off by
> default), those of us who'd make use of this feature already have "root"
> on those user's machines, and control what git binary they use etc,
> their /etc/gitconfig and so on.

This elaboration should have been in the commit message of the first
patch (and I hope it will be in v2). You guys knew because it was
discussed at Git Merge but the rest of us may not.. It would be much
more convincing to have something like this spelled out.
-- 
Duy

BUG: Merge commits are displayed by `--cherry-pick` despite on they introduce same change

2018-06-08 Thread KES

The `--cherry-pick` option states:

>Omit any commit that introduces the same change as another commit on the 
>“other side”


For next git command I see next tree:

$ git log --graph --decorate --pretty=oneline --abbrev-commit --cherry-mark 
--boundary --left-right

<   bc2820d (openapi3) Merge branch 'openapi3_exception_handling' into 
openapi3
|\
| = 4e1e0aa Testcase for not_found/exception requests to api
| = b39673d 'not_found' templates should not be probed automatically. Only 
when fallback
| = a8677df Dump empty schemas and non empty data
| = 27db48f Implemente before_render hook
|/
<   aa60f0a Merge branch 'hide_temporal_interface' into openapi3
|\
| = 5dd02f2 Mutate current object after update
| = d58c0ab Install missed modules
|/
| >   b8d09ce (xtucha/openapi3) Merge branch 'openapi3_exception_handling' 
into openapi3
| |\
| | = b6362ad Testcase for not_found/exception requests to api
| | = dc926cc 'not_found' templates should not be probed automatically. 
Only when fallback
| | = 55dc88d Dump empty schemas and non empty data
| | = 8369185 Implemente before_render hook
| |/
| >   c03438d Merge branch 'hide_temporal_interface' into openapi3
| |\
| | = d534cc4 Mutate current object after update
| | = 1b6af27 Install modules required by MonkeyMan
| |/
| > a3b7230 Clarify schema
|/
o f5b06ed Assign some defaults

If I want to see only new commit I add `--left-only` option and get merge 
commits in output.


<   bc2820d (openapi3) Merge branch 'openapi3_exception_handling' into 
<   aa60f0a Merge branch 'hide_temporal_interface' into openapi3

This commits introduce same change as another merge commit on the "other side"
Thus they should not be displayed

Re: [PATCH v6 00/21] Integrate commit-graph into 'fsck' and 'gc'

2018-06-08 Thread Derrick Stolee


On 6/8/2018 11:05 AM, Jakub Narębski wrote:

On Fri, 8 Jun 2018 at 15:56, Derrick Stolee  wrote:


[..], the following
diff occurs from the previous patch:

[...]

diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index b24e8b6689..9a0661983c 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -33,8 +33,8 @@ test_expect_success 'create commits and repack' '
  '

  graph_git_two_modes() {
-   git -c core.commitGraph=true $1 >output
-   git -c core.commitGraph=false $1 >expect
+   git -c core.graph=true $1 >output
+   git -c core.graph=false $1 >expect
 test_cmp output expect
  }

It seems that you have accidentally removed the fix from previous version.
It needs to be core.commitGraph, not core.graph.




I didn't rebase the fix that I sent as a separate patch [1] (we want 
that change applied to 'master' while this one targets topics in 'next' 
and 'pu'). So this specific diff is unfortunate noise.


Thanks!
-Stolee

[1] 
https://public-inbox.org/git/20180604123906.136417-1-dsto...@microsoft.com/

    [PATCH] t5318-commit-graph.sh: use core.commitGraph

Re: [PATCH v6 00/21] Integrate commit-graph into 'fsck' and 'gc'

2018-06-08 Thread Jakub Narębski

On Fri, 8 Jun 2018 at 15:56, Derrick Stolee  wrote:

> [..], the following
> diff occurs from the previous patch:
[...]
> diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
> index b24e8b6689..9a0661983c 100755
> --- a/t/t5318-commit-graph.sh
> +++ b/t/t5318-commit-graph.sh
> @@ -33,8 +33,8 @@ test_expect_success 'create commits and repack' '
>  '
>
>  graph_git_two_modes() {
> -   git -c core.commitGraph=true $1 >output
> -   git -c core.commitGraph=false $1 >expect
> +   git -c core.graph=true $1 >output
> +   git -c core.graph=false $1 >expect
> test_cmp output expect
>  }

It seems that you have accidentally removed the fix from previous version.
It needs to be core.commitGraph, not core.graph.

Best,
-- 
Jakub Narębski

Re: GDPR compliance best practices?

2018-06-08 Thread Theodore Y. Ts'o

On Fri, Jun 08, 2018 at 08:26:57AM +0200, Peter Backes wrote:
> 
> If you run a website where the world can access a repository, you are 
> responsible for obeying the GDPR with respect to that repository. If 
> you receive a request to be forgotten, you have to make sure you stop 
> publishing that author's identity as part of the repository.
>

*Anyone* can run a repository.  It's not just github and gitlab.  The
hobbiest in New Zealand, who might never visit Europe (so she can't
be arrested when she visits the fair shores of Europe) and who has no
business interests in Europe, can host such a web site.

So the person trying to engage in censorship would need to contact
*everyone*.  And someone who has a git note in their private repo who
then pushes to github/gitlab would end up pushing that note back up to
the web server.

> You do NOT need to
> 
> - delete it from a private copy you have
> - care about others who publish that data
> - or even make sure the data is deleted from private copies others may 
> have, even if the number of copies is in the thousands.

Great, so you can get github and gitlab to get rid of the information.
But it's *pointless*.  And given that real developers really do care
about who authored a patch, and regularly will do operations that
reference the authorship information, the fact that it is stored
somewhere else (e.g., in a git note, per your proposal), *will* slow
down those operations.

> In practical terms, if someone wishes to exercise his right to be 
> forgotten, he will usually send the request to the maintainer and stop 
> him from distributing the information, and perhaps to a third party he 
> might use as a platform for publication, such as github.

Your problem is in the word: "a"

- Ted

[PATCH v6 13/21] commit-graph: verify generation number

2018-06-08 Thread Derrick Stolee

While iterating through the commit parents, perform the generation
number calculation and compare against the value stored in the
commit-graph.

The tests demonstrate that having a different set of parents affects
the generation number calculation, and this value propagates to
descendants. Hence, we drop the single-line condition on the output.

Since Git will ship with the commit-graph feature without generation
numbers, we need to accept commit-graphs with all generation numbers
equal to zero. In this case, ignore the generation number calculation.

However, verify that we should never have a mix of zero and non-zero
generation numbers. Create a test that sets one commit to generation
zero and all following commits report a failure as they have non-zero
generation in a file that contains generation number zero.

Signed-off-by: Derrick Stolee 
---
 commit-graph.c  | 34 ++
 t/t5318-commit-graph.sh | 11 +++
 2 files changed, 45 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index 6d8d774eb0..e0f71658da 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -846,10 +846,14 @@ static void graph_report(const char *fmt, ...)
va_end(ap);
 }
 
+#define GENERATION_ZERO_EXISTS 1
+#define GENERATION_NUMBER_EXISTS 2
+
 int verify_commit_graph(struct repository *r, struct commit_graph *g)
 {
uint32_t i, cur_fanout_pos = 0;
struct object_id prev_oid, cur_oid;
+   int generation_zero = 0;
 
if (!g) {
graph_report("no commit-graph file loaded");
@@ -911,6 +915,7 @@ int verify_commit_graph(struct repository *r, struct 
commit_graph *g)
for (i = 0; i < g->num_commits; i++) {
struct commit *graph_commit, *odb_commit;
struct commit_list *graph_parents, *odb_parents;
+   uint32_t max_generation = 0;
 
hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
 
@@ -945,6 +950,9 @@ int verify_commit_graph(struct repository *r, struct 
commit_graph *g)
 
oid_to_hex(_parents->item->object.oid),
 
oid_to_hex(_parents->item->object.oid));
 
+   if (graph_parents->item->generation > max_generation)
+   max_generation = 
graph_parents->item->generation;
+
graph_parents = graph_parents->next;
odb_parents = odb_parents->next;
}
@@ -952,6 +960,32 @@ int verify_commit_graph(struct repository *r, struct 
commit_graph *g)
if (odb_parents != NULL)
graph_report("commit-graph parent list for commit %s 
terminates early",
 oid_to_hex(_oid));
+
+   if (!graph_commit->generation) {
+   if (generation_zero == GENERATION_NUMBER_EXISTS)
+   graph_report("commit-graph has generation 
number zero for commit %s, but non-zero elsewhere",
+oid_to_hex(_oid));
+   generation_zero = GENERATION_ZERO_EXISTS;
+   } else if (generation_zero == GENERATION_ZERO_EXISTS)
+   graph_report("commit-graph has non-zero generation 
number for commit %s, but zero elsewhere",
+oid_to_hex(_oid));
+
+   if (generation_zero == GENERATION_ZERO_EXISTS)
+   continue;
+
+   /*
+* If one of our parents has generation GENERATION_NUMBER_MAX, 
then
+* our generation is also GENERATION_NUMBER_MAX. Decrement to 
avoid
+* extra logic in the following condition.
+*/
+   if (max_generation == GENERATION_NUMBER_MAX)
+   max_generation--;
+
+   if (graph_commit->generation != max_generation + 1)
+   graph_report("commit-graph generation for commit %s is 
%u != %u",
+oid_to_hex(_oid),
+graph_commit->generation,
+max_generation + 1);
}
 
return verify_commit_graph_error;
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 9a3481c30f..5b75c4dca2 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -272,6 +272,7 @@ GRAPH_BYTE_COMMIT_TREE=$GRAPH_COMMIT_DATA_OFFSET
 GRAPH_BYTE_COMMIT_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN))
 GRAPH_BYTE_COMMIT_EXTRA_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 4))
 GRAPH_BYTE_COMMIT_WRONG_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 3))
+GRAPH_BYTE_COMMIT_GENERATION=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 11))
 
 # usage: corrupt_graph_and_verify   
 # Manipulates the commit-graph file at the position
@@ -366,4 +367,14 @@ test_expect_success 'detect wrong parent' '

[PATCH v6 04/21] commit: force commit to parse from object database

2018-06-08 Thread Derrick Stolee

In anticipation of verifying commit-graph file contents against the
object database, create parse_commit_internal() to allow side-stepping
the commit-graph file and parse directly from the object database.

Due to the use of generation numbers, this method should not be called
unless the intention is explicit in avoiding commits from the
commit-graph file.

Signed-off-by: Derrick Stolee 
---
 commit.c | 9 +++--
 commit.h | 1 +
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/commit.c b/commit.c
index d53dc16d72..720c6acddf 100644
--- a/commit.c
+++ b/commit.c
@@ -418,7 +418,7 @@ int parse_commit_buffer(struct commit *item, const void 
*buffer, unsigned long s
return 0;
 }
 
-int parse_commit_gently(struct commit *item, int quiet_on_missing)
+int parse_commit_internal(struct commit *item, int quiet_on_missing, int 
use_commit_graph)
 {
enum object_type type;
void *buffer;
@@ -429,7 +429,7 @@ int parse_commit_gently(struct commit *item, int 
quiet_on_missing)
return -1;
if (item->object.parsed)
return 0;
-   if (parse_commit_in_graph(item))
+   if (use_commit_graph && parse_commit_in_graph(item))
return 0;
buffer = read_object_file(>object.oid, , );
if (!buffer)
@@ -450,6 +450,11 @@ int parse_commit_gently(struct commit *item, int 
quiet_on_missing)
return ret;
 }
 
+int parse_commit_gently(struct commit *item, int quiet_on_missing)
+{
+   return parse_commit_internal(item, quiet_on_missing, 1);
+}
+
 void parse_commit_or_die(struct commit *item)
 {
if (parse_commit(item))
diff --git a/commit.h b/commit.h
index 3ad07c2e3d..7e0f273720 100644
--- a/commit.h
+++ b/commit.h
@@ -77,6 +77,7 @@ struct commit *lookup_commit_reference_by_name(const char 
*name);
 struct commit *lookup_commit_or_die(const struct object_id *oid, const char 
*ref_name);
 
 int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long 
size, int check_graph);
+int parse_commit_internal(struct commit *item, int quiet_on_missing, int 
use_commit_graph);
 int parse_commit_gently(struct commit *item, int quiet_on_missing);
 static inline int parse_commit(struct commit *item)
 {
-- 
2.18.0.rc1

[PATCH v6 01/21] commit-graph: UNLEAK before die()

2018-06-08 Thread Derrick Stolee

Signed-off-by: Derrick Stolee 
---
 builtin/commit-graph.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 37420ae0fd..f0875b8bf3 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -51,8 +51,11 @@ static int graph_read(int argc, const char **argv)
graph_name = get_commit_graph_filename(opts.obj_dir);
graph = load_commit_graph_one(graph_name);
 
-   if (!graph)
+   if (!graph) {
+   UNLEAK(graph_name);
die("graph file %s does not exist", graph_name);
+   }
+
FREE_AND_NULL(graph_name);
 
printf("header: %08x %d %d %d %d\n",
-- 
2.18.0.rc1

[PATCH v6 11/21] commit-graph: verify root tree OIDs

2018-06-08 Thread Derrick Stolee

The 'verify' subcommand must compare the commit content parsed from the
commit-graph against the content in the object database. Use
lookup_commit() and parse_commit_in_graph_one() to parse the commits
from the graph and compare against a commit that is loaded separately
and parsed directly from the object database.

Add checks for the root tree OID.

Signed-off-by: Derrick Stolee 
---
 commit-graph.c  | 17 -
 t/t5318-commit-graph.sh |  7 +++
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index 00e89b71e9..5df18394f9 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -866,6 +866,8 @@ int verify_commit_graph(struct repository *r, struct 
commit_graph *g)
return verify_commit_graph_error;
 
for (i = 0; i < g->num_commits; i++) {
+   struct commit *graph_commit;
+
hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
 
if (i && oidcmp(_oid, _oid) >= 0)
@@ -883,6 +885,11 @@ int verify_commit_graph(struct repository *r, struct 
commit_graph *g)
 
cur_fanout_pos++;
}
+
+   graph_commit = lookup_commit(_oid);
+   if (!parse_commit_in_graph_one(g, graph_commit))
+   graph_report("failed to parse %s from commit-graph",
+oid_to_hex(_oid));
}
 
while (cur_fanout_pos < 256) {
@@ -899,16 +906,24 @@ int verify_commit_graph(struct repository *r, struct 
commit_graph *g)
return verify_commit_graph_error;
 
for (i = 0; i < g->num_commits; i++) {
-   struct commit *odb_commit;
+   struct commit *graph_commit, *odb_commit;
 
hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
 
+   graph_commit = lookup_commit(_oid);
odb_commit = (struct commit *)create_object(r, cur_oid.hash, 
alloc_commit_node(r));
if (parse_commit_internal(odb_commit, 0, 0)) {
graph_report("failed to parse %s from object database",
 oid_to_hex(_oid));
continue;
}
+
+   if (oidcmp(_commit_tree_in_graph_one(g, 
graph_commit)->object.oid,
+  get_commit_tree_oid(odb_commit)))
+   graph_report("root tree OID for commit %s in 
commit-graph is %s != %s",
+oid_to_hex(_oid),
+
oid_to_hex(get_commit_tree_oid(graph_commit)),
+
oid_to_hex(get_commit_tree_oid(odb_commit)));
}
 
return verify_commit_graph_error;
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index af5e34c0cb..2b9214bc83 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -267,6 +267,8 @@ GRAPH_BYTE_FANOUT2=$(($GRAPH_FANOUT_OFFSET + 4 * 255))
 GRAPH_OID_LOOKUP_OFFSET=$(($GRAPH_FANOUT_OFFSET + 4 * 256))
 GRAPH_BYTE_OID_LOOKUP_ORDER=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * 8))
 GRAPH_BYTE_OID_LOOKUP_MISSING=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * 4 + 
10))
+GRAPH_COMMIT_DATA_OFFSET=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * 
$NUM_COMMITS))
+GRAPH_BYTE_COMMIT_TREE=$GRAPH_COMMIT_DATA_OFFSET
 
 # usage: corrupt_graph_and_verify   
 # Manipulates the commit-graph file at the position
@@ -341,4 +343,9 @@ test_expect_success 'detect OID not in object database' '
"from object database"
 '
 
+test_expect_success 'detect incorrect tree OID' '
+   corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_TREE "\01" \
+   "root tree OID for commit"
+'
+
 test_done
-- 
2.18.0.rc1

[PATCH v6 14/21] commit-graph: verify commit date

2018-06-08 Thread Derrick Stolee

Signed-off-by: Derrick Stolee 
---
 commit-graph.c  | 6 ++
 t/t5318-commit-graph.sh | 6 ++
 2 files changed, 12 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index e0f71658da..6d6c6beff9 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -986,6 +986,12 @@ int verify_commit_graph(struct repository *r, struct 
commit_graph *g)
 oid_to_hex(_oid),
 graph_commit->generation,
 max_generation + 1);
+
+   if (graph_commit->date != odb_commit->date)
+   graph_report("commit date for commit %s in commit-graph 
is %"PRItime" != %"PRItime,
+oid_to_hex(_oid),
+graph_commit->date,
+odb_commit->date);
}
 
return verify_commit_graph_error;
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 5b75c4dca2..b7b4410e75 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -273,6 +273,7 @@ GRAPH_BYTE_COMMIT_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + 
$HASH_LEN))
 GRAPH_BYTE_COMMIT_EXTRA_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 4))
 GRAPH_BYTE_COMMIT_WRONG_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 3))
 GRAPH_BYTE_COMMIT_GENERATION=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 11))
+GRAPH_BYTE_COMMIT_DATE=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 12))
 
 # usage: corrupt_graph_and_verify   
 # Manipulates the commit-graph file at the position
@@ -377,4 +378,9 @@ test_expect_success 'detect incorrect generation number' '
"non-zero generation number"
 '
 
+test_expect_success 'detect incorrect commit date' '
+   corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_DATE "\01" \
+   "commit date"
+'
+
 test_done
-- 
2.18.0.rc1

[PATCH v6 07/21] commit-graph: verify catches corrupt signature

2018-06-08 Thread Derrick Stolee

This is the first of several commits that add a test to check that
'git commit-graph verify' catches corruption in the commit-graph
file. The first test checks that the command catches an error in
the file signature. This is a check that exists in the existing
commit-graph reading code.

Add a helper method 'corrupt_graph_and_verify' to the test script
t5318-commit-graph.sh. This helper corrupts the commit-graph file
at a certain location, runs 'git commit-graph verify', and reports
the output to the 'err' file. This data is filtered to remove the
lines added by 'test_must_fail' when the test is run verbosely.
Then, the output is checked to contain a specific error message.

Most messages from 'git commit-graph verify' will not be marked
for translation. There will be one exception: the message that
reports an invalid checksum will be marked for translation, as that
is the only message that is intended for a typical user.

Helped-by: Szeder Gábor 
Signed-off-by: Derrick Stolee 
---
 t/t5318-commit-graph.sh | 43 +
 1 file changed, 43 insertions(+)

diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 6ca451dfd2..8f96e2636c 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -235,9 +235,52 @@ test_expect_success 'perform fast-forward merge in full 
repo' '
test_cmp expect output
 '
 
+# the verify tests below expect the commit-graph to contain
+# exactly the commits reachable from the commits/8 branch.
+# If the file changes the set of commits in the list, then the
+# offsets into the binary file will result in different edits
+# and the tests will likely break.
+
 test_expect_success 'git commit-graph verify' '
cd "$TRASH_DIRECTORY/full" &&
+   git rev-parse commits/8 | git commit-graph write --stdin-commits &&
git commit-graph verify >output
 '
 
+GRAPH_BYTE_VERSION=4
+GRAPH_BYTE_HASH=5
+
+# usage: corrupt_graph_and_verify   
+# Manipulates the commit-graph file at the position
+# by inserting the data, then runs 'git commit-graph verify'
+# and places the output in the file 'err'. Test 'err' for
+# the given string.
+corrupt_graph_and_verify() {
+   pos=$1
+   data="${2:-\0}"
+   grepstr=$3
+   cd "$TRASH_DIRECTORY/full" &&
+   test_when_finished mv commit-graph-backup $objdir/info/commit-graph &&
+   cp $objdir/info/commit-graph commit-graph-backup &&
+   printf "$data" | dd of="$objdir/info/commit-graph" bs=1 seek="$pos" 
conv=notrunc &&
+   test_must_fail git commit-graph verify 2>test_err &&
+   grep -v "^+" test_err >err
+   test_i18ngrep "$grepstr" err
+}
+
+test_expect_success 'detect bad signature' '
+   corrupt_graph_and_verify 0 "\0" \
+   "graph signature"
+'
+
+test_expect_success 'detect bad version' '
+   corrupt_graph_and_verify $GRAPH_BYTE_VERSION "\02" \
+   "graph version"
+'
+
+test_expect_success 'detect bad hash version' '
+   corrupt_graph_and_verify $GRAPH_BYTE_HASH "\02" \
+   "hash version"
+'
+
 test_done
-- 
2.18.0.rc1

[PATCH v6 08/21] commit-graph: verify required chunks are present

2018-06-08 Thread Derrick Stolee

The commit-graph file requires the following three chunks:

* OID Fanout
* OID Lookup
* Commit Data

If any of these are missing, then the 'verify' subcommand should
report a failure. This includes the chunk IDs malformed or the
chunk count is truncated.

Signed-off-by: Derrick Stolee 
---
 commit-graph.c  |  9 +
 t/t5318-commit-graph.sh | 29 +
 2 files changed, 38 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index 22ef696e18..f30b4ccee9 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -848,5 +848,14 @@ int verify_commit_graph(struct repository *r, struct 
commit_graph *g)
return 1;
}
 
+   verify_commit_graph_error = 0;
+
+   if (!g->chunk_oid_fanout)
+   graph_report("commit-graph is missing the OID Fanout chunk");
+   if (!g->chunk_oid_lookup)
+   graph_report("commit-graph is missing the OID Lookup chunk");
+   if (!g->chunk_commit_data)
+   graph_report("commit-graph is missing the Commit Data chunk");
+
return verify_commit_graph_error;
 }
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 8f96e2636c..c03792a8ed 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -249,6 +249,15 @@ test_expect_success 'git commit-graph verify' '
 
 GRAPH_BYTE_VERSION=4
 GRAPH_BYTE_HASH=5
+GRAPH_BYTE_CHUNK_COUNT=6
+GRAPH_CHUNK_LOOKUP_OFFSET=8
+GRAPH_CHUNK_LOOKUP_WIDTH=12
+GRAPH_CHUNK_LOOKUP_ROWS=5
+GRAPH_BYTE_OID_FANOUT_ID=$GRAPH_CHUNK_LOOKUP_OFFSET
+GRAPH_BYTE_OID_LOOKUP_ID=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \
+   1 * $GRAPH_CHUNK_LOOKUP_WIDTH))
+GRAPH_BYTE_COMMIT_DATA_ID=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \
+2 * $GRAPH_CHUNK_LOOKUP_WIDTH))
 
 # usage: corrupt_graph_and_verify   
 # Manipulates the commit-graph file at the position
@@ -283,4 +292,24 @@ test_expect_success 'detect bad hash version' '
"hash version"
 '
 
+test_expect_success 'detect low chunk count' '
+   corrupt_graph_and_verify $GRAPH_BYTE_CHUNK_COUNT "\02" \
+   "missing the .* chunk"
+'
+
+test_expect_success 'detect missing OID fanout chunk' '
+   corrupt_graph_and_verify $GRAPH_BYTE_OID_FANOUT_ID "\0" \
+   "missing the OID Fanout chunk"
+'
+
+test_expect_success 'detect missing OID lookup chunk' '
+   corrupt_graph_and_verify $GRAPH_BYTE_OID_LOOKUP_ID "\0" \
+   "missing the OID Lookup chunk"
+'
+
+test_expect_success 'detect missing commit data chunk' '
+   corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_DATA_ID "\0" \
+   "missing the Commit Data chunk"
+'
+
 test_done
-- 
2.18.0.rc1

[PATCH v6 19/21] commit-graph: add '--reachable' option

2018-06-08 Thread Derrick Stolee

When writing commit-graph files, it can be convenient to ask for all
reachable commits (starting at the ref set) in the resulting file. This
is particularly helpful when writing to stdin is complicated, such as a
future integration with 'git gc'.

Signed-off-by: Derrick Stolee 
---
 Documentation/git-commit-graph.txt |  8 ++--
 builtin/commit-graph.c | 16 
 commit-graph.c | 18 ++
 commit-graph.h |  1 +
 t/t5318-commit-graph.sh| 10 ++
 5 files changed, 47 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-commit-graph.txt 
b/Documentation/git-commit-graph.txt
index a222cfab08..dececb79d7 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -38,12 +38,16 @@ Write a commit graph file based on the commits found in 
packfiles.
 +
 With the `--stdin-packs` option, generate the new commit graph by
 walking objects only in the specified pack-indexes. (Cannot be combined
-with --stdin-commits.)
+with `--stdin-commits` or `--reachable`.)
 +
 With the `--stdin-commits` option, generate the new commit graph by
 walking commits starting at the commits specified in stdin as a list
 of OIDs in hex, one OID per line. (Cannot be combined with
---stdin-packs.)
+`--stdin-packs` or `--reachable`.)
++
+With the `--reachable` option, generate the new commit graph by walking
+commits starting at all refs. (Cannot be combined with `--stdin-commits`
+or `--stdin-packs`.)
 +
 With the `--append` option, include all commits that are present in the
 existing commit-graph file.
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index ea28bc311a..c7d0db5ab4 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -10,7 +10,7 @@ static char const * const builtin_commit_graph_usage[] = {
N_("git commit-graph [--object-dir ]"),
N_("git commit-graph read [--object-dir ]"),
N_("git commit-graph verify [--object-dir ]"),
-   N_("git commit-graph write [--object-dir ] [--append] 
[--stdin-packs|--stdin-commits]"),
+   N_("git commit-graph write [--object-dir ] [--append] 
[--reachable|--stdin-packs|--stdin-commits]"),
NULL
 };
 
@@ -25,12 +25,13 @@ static const char * const builtin_commit_graph_read_usage[] 
= {
 };
 
 static const char * const builtin_commit_graph_write_usage[] = {
-   N_("git commit-graph write [--object-dir ] [--append] 
[--stdin-packs|--stdin-commits]"),
+   N_("git commit-graph write [--object-dir ] [--append] 
[--reachable|--stdin-packs|--stdin-commits]"),
NULL
 };
 
 static struct opts_commit_graph {
const char *obj_dir;
+   int reachable;
int stdin_packs;
int stdin_commits;
int append;
@@ -127,6 +128,8 @@ static int graph_write(int argc, const char **argv)
OPT_STRING(0, "object-dir", _dir,
N_("dir"),
N_("The object directory to store the graph")),
+   OPT_BOOL(0, "reachable", ,
+   N_("start walk at all refs")),
OPT_BOOL(0, "stdin-packs", _packs,
N_("scan pack-indexes listed by stdin for commits")),
OPT_BOOL(0, "stdin-commits", _commits,
@@ -140,11 +143,16 @@ static int graph_write(int argc, const char **argv)
 builtin_commit_graph_write_options,
 builtin_commit_graph_write_usage, 0);
 
-   if (opts.stdin_packs && opts.stdin_commits)
-   die(_("cannot use both --stdin-commits and --stdin-packs"));
+   if (opts.reachable + opts.stdin_packs + opts.stdin_commits > 1)
+   die(_("use at most one of --reachable, --stdin-commits, or 
--stdin-packs"));
if (!opts.obj_dir)
opts.obj_dir = get_object_directory();
 
+   if (opts.reachable) {
+   write_commit_graph_reachable(opts.obj_dir, opts.append);
+   return 0;
+   }
+
string_list_init(, 0);
if (opts.stdin_packs || opts.stdin_commits) {
struct strbuf buf = STRBUF_INIT;
diff --git a/commit-graph.c b/commit-graph.c
index a06e7e9549..adf54e3fe7 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -7,6 +7,7 @@
 #include "packfile.h"
 #include "commit.h"
 #include "object.h"
+#include "refs.h"
 #include "revision.h"
 #include "sha1-lookup.h"
 #include "commit-graph.h"
@@ -656,6 +657,23 @@ static void compute_generation_numbers(struct 
packed_commit_list* commits)
}
 }
 
+static int add_ref_to_list(const char *refname,
+  const struct object_id *oid,
+  int flags, void *cb_data)
+{
+   struct string_list *list = (struct string_list*)cb_data;
+   string_list_append(list, oid_to_hex(oid));
+   return 0;
+}
+
+void write_commit_graph_reachable(const char *obj_dir, int append)
+{
+   struct string_list list;
+

[PATCH v6 20/21] gc: automatically write commit-graph files

2018-06-08 Thread Derrick Stolee

The commit-graph file is a very helpful feature for speeding up git
operations. In order to make it more useful, make it possible to
write the commit-graph file during standard garbage collection
operations.

Add a 'gc.commitGraph' config setting that triggers writing a
commit-graph file after any non-trivial 'git gc' command. Defaults to
false while the commit-graph feature matures. We specifically do not
want to have this on by default until the commit-graph feature is fully
integrated with history-modifying features like shallow clones.

Helped-by: Ævar Arnfjörð Bjarmason 
Signed-off-by: Derrick Stolee 
---
 Documentation/config.txt | 10 +-
 Documentation/git-gc.txt |  4 
 builtin/gc.c |  6 ++
 t/t5318-commit-graph.sh  | 14 ++
 4 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index ab641bf5a9..f2b5ed17c8 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -906,7 +906,8 @@ the `GIT_NOTES_REF` environment variable.  See 
linkgit:git-notes[1].
 
 core.commitGraph::
Enable git commit graph feature. Allows reading from the
-   commit-graph file.
+   commit-graph file. See `gc.commitGraph` for automatically
+   maintaining the file.
 
 core.sparseCheckout::
Enable "sparse checkout" feature. See section "Sparse checkout" in
@@ -1647,6 +1648,13 @@ this configuration variable is ignored, all packs except 
the base pack
 will be repacked. After this the number of packs should go below
 gc.autoPackLimit and gc.bigPackThreshold should be respected again.
 
+gc.commitGraph::
+   If true, then gc will rewrite the commit-graph file when
+   linkgit:git-gc[1] is run. When using linkgit:git-gc[1]
+   '--auto' the commit-graph will be updated if housekeeping is
+   required. Default is false. See linkgit:git-commit-graph[1]
+   for details.
+
 gc.logExpiry::
If the file gc.log exists, then `git gc --auto` won't run
unless that file is more than 'gc.logExpiry' old.  Default is
diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index 24b2dd44fe..f5bc98ccb3 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -136,6 +136,10 @@ The optional configuration variable `gc.packRefs` 
determines if
 it within all non-bare repos or it can be set to a boolean value.
 This defaults to true.
 
+The optional configuration variable `gc.commitGraph` determines if
+'git gc' should run 'git commit-graph write'. This can be set to a
+boolean value. This defaults to false.
+
 The optional configuration variable `gc.aggressiveWindow` controls how
 much time is spent optimizing the delta compression of the objects in
 the repository when the --aggressive option is specified.  The larger
diff --git a/builtin/gc.c b/builtin/gc.c
index ccfb1ceaeb..4e06e8372d 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -20,6 +20,7 @@
 #include "sigchain.h"
 #include "argv-array.h"
 #include "commit.h"
+#include "commit-graph.h"
 #include "packfile.h"
 #include "object-store.h"
 #include "pack.h"
@@ -40,6 +41,7 @@ static int aggressive_depth = 50;
 static int aggressive_window = 250;
 static int gc_auto_threshold = 6700;
 static int gc_auto_pack_limit = 50;
+static int gc_commit_graph = 0;
 static int detach_auto = 1;
 static timestamp_t gc_log_expire_time;
 static const char *gc_log_expire = "1.day.ago";
@@ -129,6 +131,7 @@ static void gc_config(void)
git_config_get_int("gc.aggressivedepth", _depth);
git_config_get_int("gc.auto", _auto_threshold);
git_config_get_int("gc.autopacklimit", _auto_pack_limit);
+   git_config_get_bool("gc.commitgraph", _commit_graph);
git_config_get_bool("gc.autodetach", _auto);
git_config_get_expiry("gc.pruneexpire", _expire);
git_config_get_expiry("gc.worktreepruneexpire", 
_worktrees_expire);
@@ -641,6 +644,9 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
if (pack_garbage.nr > 0)
clean_pack_garbage();
 
+   if (gc_commit_graph)
+   write_commit_graph_reachable(get_object_directory(), 0);
+
if (auto_gc && too_many_loose_objects())
warning(_("There are too many unreachable loose objects; "
"run 'git prune' to remove them."));
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 06ecbb3f4a..9a0661983c 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -245,6 +245,20 @@ test_expect_success 'perform fast-forward merge in full 
repo' '
test_cmp expect output
 '
 
+test_expect_success 'check that gc computes commit-graph' '
+   cd "$TRASH_DIRECTORY/full" &&
+   git commit --allow-empty -m "blank" &&
+   git commit-graph write --reachable &&
+   cp $objdir/info/commit-graph commit-graph-before-gc &&
+   git reset --hard HEAD~1 &&
+   git config gc.commitGraph true &&
+   git gc &&
+   cp

[PATCH v6 15/21] commit-graph: test for corrupted octopus edge

2018-06-08 Thread Derrick Stolee

The commit-graph file has an extra chunk to store the parent int-ids for
parents beyond the first parent for octopus merges. Our test repo has a
single octopus merge that we can manipulate to demonstrate the 'verify'
subcommand detects incorrect values in that chunk.

Signed-off-by: Derrick Stolee 
---
 t/t5318-commit-graph.sh | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index b7b4410e75..cbd6462226 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -248,6 +248,7 @@ test_expect_success 'git commit-graph verify' '
 '
 
 NUM_COMMITS=9
+NUM_OCTOPUS_EDGES=2
 HASH_LEN=20
 GRAPH_BYTE_VERSION=4
 GRAPH_BYTE_HASH=5
@@ -274,6 +275,10 @@ 
GRAPH_BYTE_COMMIT_EXTRA_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 4))
 GRAPH_BYTE_COMMIT_WRONG_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 3))
 GRAPH_BYTE_COMMIT_GENERATION=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 11))
 GRAPH_BYTE_COMMIT_DATE=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 12))
+GRAPH_COMMIT_DATA_WIDTH=$(($HASH_LEN + 16))
+GRAPH_OCTOPUS_DATA_OFFSET=$(($GRAPH_COMMIT_DATA_OFFSET + \
+$GRAPH_COMMIT_DATA_WIDTH * $NUM_COMMITS))
+GRAPH_BYTE_OCTOPUS=$(($GRAPH_OCTOPUS_DATA_OFFSET + 4))
 
 # usage: corrupt_graph_and_verify   
 # Manipulates the commit-graph file at the position
@@ -383,4 +388,9 @@ test_expect_success 'detect incorrect commit date' '
"commit date"
 '
 
+test_expect_success 'detect incorrect parent for octopus merge' '
+   corrupt_graph_and_verify $GRAPH_BYTE_OCTOPUS "\01" \
+   "invalid parent"
+'
+
 test_done
-- 
2.18.0.rc1

[PATCH v6 16/21] commit-graph: verify contents match checksum

2018-06-08 Thread Derrick Stolee

The commit-graph file ends with a SHA1 hash of the previous contents. If
a commit-graph file has errors but the checksum hash is correct, then we
know that the problem is a bug in Git and not simply file corruption
after-the-fact.

Compute the checksum right away so it is the first error that appears,
and make the message translatable since this error can be "corrected" by
a user by simply deleting the file and recomputing. The rest of the
errors are useful only to developers.

Be sure to continue checking the rest of the file data if the checksum
is wrong. This is important for our tests, as we break the checksum as
we modify bytes of the commit-graph file.

Signed-off-by: Derrick Stolee 
---
 commit-graph.c  | 16 ++--
 t/t5318-commit-graph.sh |  6 ++
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 6d6c6beff9..d926c4b59f 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -833,6 +833,7 @@ void write_commit_graph(const char *obj_dir,
oids.nr = 0;
 }
 
+#define VERIFY_COMMIT_GRAPH_ERROR_HASH 2
 static int verify_commit_graph_error;
 
 static void graph_report(const char *fmt, ...)
@@ -852,8 +853,10 @@ static void graph_report(const char *fmt, ...)
 int verify_commit_graph(struct repository *r, struct commit_graph *g)
 {
uint32_t i, cur_fanout_pos = 0;
-   struct object_id prev_oid, cur_oid;
+   struct object_id prev_oid, cur_oid, checksum;
int generation_zero = 0;
+   struct hashfile *f;
+   int devnull;
 
if (!g) {
graph_report("no commit-graph file loaded");
@@ -872,6 +875,15 @@ int verify_commit_graph(struct repository *r, struct 
commit_graph *g)
if (verify_commit_graph_error)
return verify_commit_graph_error;
 
+   devnull = open("/dev/null", O_WRONLY);
+   f = hashfd(devnull, NULL);
+   hashwrite(f, g->data, g->data_len - g->hash_len);
+   finalize_hashfile(f, checksum.hash, CSUM_CLOSE);
+   if (hashcmp(checksum.hash, g->data + g->data_len - g->hash_len)) {
+   graph_report(_("the commit-graph file has incorrect checksum 
and is likely corrupt"));
+   verify_commit_graph_error = VERIFY_COMMIT_GRAPH_ERROR_HASH;
+   }
+
for (i = 0; i < g->num_commits; i++) {
struct commit *graph_commit;
 
@@ -909,7 +921,7 @@ int verify_commit_graph(struct repository *r, struct 
commit_graph *g)
cur_fanout_pos++;
}
 
-   if (verify_commit_graph_error)
+   if (verify_commit_graph_error & ~VERIFY_COMMIT_GRAPH_ERROR_HASH)
return verify_commit_graph_error;
 
for (i = 0; i < g->num_commits; i++) {
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index cbd6462226..bd5b8428af 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -279,6 +279,7 @@ GRAPH_COMMIT_DATA_WIDTH=$(($HASH_LEN + 16))
 GRAPH_OCTOPUS_DATA_OFFSET=$(($GRAPH_COMMIT_DATA_OFFSET + \
 $GRAPH_COMMIT_DATA_WIDTH * $NUM_COMMITS))
 GRAPH_BYTE_OCTOPUS=$(($GRAPH_OCTOPUS_DATA_OFFSET + 4))
+GRAPH_BYTE_FOOTER=$(($GRAPH_OCTOPUS_DATA_OFFSET + 4 * $NUM_OCTOPUS_EDGES))
 
 # usage: corrupt_graph_and_verify   
 # Manipulates the commit-graph file at the position
@@ -393,4 +394,9 @@ test_expect_success 'detect incorrect parent for octopus 
merge' '
"invalid parent"
 '
 
+test_expect_success 'detect invalid checksum hash' '
+   corrupt_graph_and_verify $GRAPH_BYTE_FOOTER "\00" \
+   "incorrect checksum"
+'
+
 test_done
-- 
2.18.0.rc1

[PATCH v6 21/21] commit-graph: update design document

2018-06-08 Thread Derrick Stolee

The commit-graph feature is now integrated with 'fsck' and 'gc',
so remove those items from the "Future Work" section of the
commit-graph design document.

Also remove the section on lazy-loading trees, as that was completed
in an earlier patch series.

Signed-off-by: Derrick Stolee 
---
 Documentation/technical/commit-graph.txt | 22 --
 1 file changed, 22 deletions(-)

diff --git a/Documentation/technical/commit-graph.txt 
b/Documentation/technical/commit-graph.txt
index e1a883eb46..c664acbd76 100644
--- a/Documentation/technical/commit-graph.txt
+++ b/Documentation/technical/commit-graph.txt
@@ -118,9 +118,6 @@ Future Work
 - The commit graph feature currently does not honor commit grafts. This can
   be remedied by duplicating or refactoring the current graft logic.
 
-- The 'commit-graph' subcommand does not have a "verify" mode that is
-  necessary for integration with fsck.
-
 - After computing and storing generation numbers, we must make graph
   walks aware of generation numbers to gain the performance benefits they
   enable. This will mostly be accomplished by swapping a commit-date-ordered
@@ -130,25 +127,6 @@ Future Work
 - 'log --topo-order'
 - 'tag --merged'
 
-- Currently, parse_commit_gently() requires filling in the root tree
-  object for a commit. This passes through lookup_tree() and consequently
-  lookup_object(). Also, it calls lookup_commit() when loading the parents.
-  These method calls check the ODB for object existence, even if the
-  consumer does not need the content. For example, we do not need the
-  tree contents when computing merge bases. Now that commit parsing is
-  removed from the computation time, these lookup operations are the
-  slowest operations keeping graph walks from being fast. Consider
-  loading these objects without verifying their existence in the ODB and
-  only loading them fully when consumers need them. Consider a method
-  such as "ensure_tree_loaded(commit)" that fully loads a tree before
-  using commit->tree.
-
-- The current design uses the 'commit-graph' subcommand to generate the graph.
-  When this feature stabilizes enough to recommend to most users, we should
-  add automatic graph writes to common operations that create many commits.
-  For example, one could compute a graph on 'clone', 'fetch', or 'repack'
-  commands.
-
 - A server could provide a commit graph file as part of the network protocol
   to avoid extra calculations by clients. This feature is only of benefit if
   the user is willing to trust the file, because verifying the file is correct
-- 
2.18.0.rc1

[PATCH v6 18/21] commit-graph: use string-list API for input

2018-06-08 Thread Derrick Stolee

Signed-off-by: Derrick Stolee 
---
 builtin/commit-graph.c | 39 +--
 commit-graph.c | 15 +++
 commit-graph.h |  7 +++
 3 files changed, 23 insertions(+), 38 deletions(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 9d108f43a9..ea28bc311a 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -119,13 +119,9 @@ static int graph_read(int argc, const char **argv)
 
 static int graph_write(int argc, const char **argv)
 {
-   const char **pack_indexes = NULL;
-   int packs_nr = 0;
-   const char **commit_hex = NULL;
-   int commits_nr = 0;
-   const char **lines = NULL;
-   int lines_nr = 0;
-   int lines_alloc = 0;
+   struct string_list *pack_indexes = NULL;
+   struct string_list *commit_hex = NULL;
+   struct string_list lines;
 
static struct option builtin_commit_graph_write_options[] = {
OPT_STRING(0, "object-dir", _dir,
@@ -149,34 +145,25 @@ static int graph_write(int argc, const char **argv)
if (!opts.obj_dir)
opts.obj_dir = get_object_directory();
 
+   string_list_init(, 0);
if (opts.stdin_packs || opts.stdin_commits) {
struct strbuf buf = STRBUF_INIT;
-   lines_nr = 0;
-   lines_alloc = 128;
-   ALLOC_ARRAY(lines, lines_alloc);
-
-   while (strbuf_getline(, stdin) != EOF) {
-   ALLOC_GROW(lines, lines_nr + 1, lines_alloc);
-   lines[lines_nr++] = strbuf_detach(, NULL);
-   }
-
-   if (opts.stdin_packs) {
-   pack_indexes = lines;
-   packs_nr = lines_nr;
-   }
-   if (opts.stdin_commits) {
-   commit_hex = lines;
-   commits_nr = lines_nr;
-   }
+
+   while (strbuf_getline(, stdin) != EOF)
+   string_list_append(, strbuf_detach(, NULL));
+
+   if (opts.stdin_packs)
+   pack_indexes = 
+   if (opts.stdin_commits)
+   commit_hex = 
}
 
write_commit_graph(opts.obj_dir,
   pack_indexes,
-  packs_nr,
   commit_hex,
-  commits_nr,
   opts.append);
 
+   string_list_clear(, 0);
return 0;
 }
 
diff --git a/commit-graph.c b/commit-graph.c
index d926c4b59f..a06e7e9549 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -657,10 +657,8 @@ static void compute_generation_numbers(struct 
packed_commit_list* commits)
 }
 
 void write_commit_graph(const char *obj_dir,
-   const char **pack_indexes,
-   int nr_packs,
-   const char **commit_hex,
-   int nr_commits,
+   struct string_list *pack_indexes,
+   struct string_list *commit_hex,
int append)
 {
struct packed_oid_list oids;
@@ -701,10 +699,10 @@ void write_commit_graph(const char *obj_dir,
int dirlen;
strbuf_addf(, "%s/pack/", obj_dir);
dirlen = packname.len;
-   for (i = 0; i < nr_packs; i++) {
+   for (i = 0; i < pack_indexes->nr; i++) {
struct packed_git *p;
strbuf_setlen(, dirlen);
-   strbuf_addstr(, pack_indexes[i]);
+   strbuf_addstr(, pack_indexes->items[i].string);
p = add_packed_git(packname.buf, packname.len, 1);
if (!p)
die("error adding pack %s", packname.buf);
@@ -717,12 +715,13 @@ void write_commit_graph(const char *obj_dir,
}
 
if (commit_hex) {
-   for (i = 0; i < nr_commits; i++) {
+   for (i = 0; i < commit_hex->nr; i++) {
const char *end;
struct object_id oid;
struct commit *result;
 
-   if (commit_hex[i] && parse_oid_hex(commit_hex[i], , 
))
+   if (commit_hex->items[i].string &&
+   parse_oid_hex(commit_hex->items[i].string, , 
))
continue;
 
result = lookup_commit_reference_gently(, 1);
diff --git a/commit-graph.h b/commit-graph.h
index 4359812fb4..a79b854470 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -3,6 +3,7 @@
 
 #include "git-compat-util.h"
 #include "repository.h"
+#include "string-list.h"
 
 char *get_commit_graph_filename(const char *obj_dir);
 
@@ -48,10 +49,8 @@ struct commit_graph {
 struct commit_graph *load_commit_graph_one(const char *graph_file);
 
 void write_commit_graph(const char *obj_dir,
-   const

[PATCH v6 06/21] commit-graph: add 'verify' subcommand

2018-06-08 Thread Derrick Stolee

If the commit-graph file becomes corrupt, we need a way to verify
that its contents match the object database. In the manner of
'git fsck' we will implement a 'git commit-graph verify' subcommand
to report all issues with the file.

Add the 'verify' subcommand to the 'commit-graph' builtin and its
documentation. The subcommand is currently a no-op except for
loading the commit-graph into memory, which may trigger run-time
errors that would be caught by normal use. Add a simple test that
ensures the command returns a zero error code.

If no commit-graph file exists, this is an acceptable state. Do
not report any errors.

Helped-by: Ramsay Jones 
Signed-off-by: Derrick Stolee 
---
 Documentation/git-commit-graph.txt |  6 +
 builtin/commit-graph.c | 39 ++
 commit-graph.c | 23 ++
 commit-graph.h |  3 +++
 t/t5318-commit-graph.sh| 10 
 5 files changed, 81 insertions(+)

diff --git a/Documentation/git-commit-graph.txt 
b/Documentation/git-commit-graph.txt
index 4c97b555cc..a222cfab08 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -10,6 +10,7 @@ SYNOPSIS
 
 [verse]
 'git commit-graph read' [--object-dir ]
+'git commit-graph verify' [--object-dir ]
 'git commit-graph write'  [--object-dir ]
 
 
@@ -52,6 +53,11 @@ existing commit-graph file.
 Read a graph file given by the commit-graph file and output basic
 details about the graph file. Used for debugging purposes.
 
+'verify'::
+
+Read the commit-graph file and verify its contents against the object
+database. Used to check for corrupted data.
+
 
 EXAMPLES
 
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index f0875b8bf3..9d108f43a9 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -3,15 +3,22 @@
 #include "dir.h"
 #include "lockfile.h"
 #include "parse-options.h"
+#include "repository.h"
 #include "commit-graph.h"
 
 static char const * const builtin_commit_graph_usage[] = {
N_("git commit-graph [--object-dir ]"),
N_("git commit-graph read [--object-dir ]"),
+   N_("git commit-graph verify [--object-dir ]"),
N_("git commit-graph write [--object-dir ] [--append] 
[--stdin-packs|--stdin-commits]"),
NULL
 };
 
+static const char * const builtin_commit_graph_verify_usage[] = {
+   N_("git commit-graph verify [--object-dir ]"),
+   NULL
+};
+
 static const char * const builtin_commit_graph_read_usage[] = {
N_("git commit-graph read [--object-dir ]"),
NULL
@@ -29,6 +36,36 @@ static struct opts_commit_graph {
int append;
 } opts;
 
+
+static int graph_verify(int argc, const char **argv)
+{
+   struct commit_graph *graph = NULL;
+   char *graph_name;
+
+   static struct option builtin_commit_graph_verify_options[] = {
+   OPT_STRING(0, "object-dir", _dir,
+  N_("dir"),
+  N_("The object directory to store the graph")),
+   OPT_END(),
+   };
+
+   argc = parse_options(argc, argv, NULL,
+builtin_commit_graph_verify_options,
+builtin_commit_graph_verify_usage, 0);
+
+   if (!opts.obj_dir)
+   opts.obj_dir = get_object_directory();
+
+   graph_name = get_commit_graph_filename(opts.obj_dir);
+   graph = load_commit_graph_one(graph_name);
+   FREE_AND_NULL(graph_name);
+
+   if (!graph)
+   return 0;
+
+   return verify_commit_graph(the_repository, graph);
+}
+
 static int graph_read(int argc, const char **argv)
 {
struct commit_graph *graph = NULL;
@@ -165,6 +202,8 @@ int cmd_commit_graph(int argc, const char **argv, const 
char *prefix)
if (argc > 0) {
if (!strcmp(argv[0], "read"))
return graph_read(argc, argv);
+   if (!strcmp(argv[0], "verify"))
+   return graph_verify(argc, argv);
if (!strcmp(argv[0], "write"))
return graph_write(argc, argv);
}
diff --git a/commit-graph.c b/commit-graph.c
index 9e228d3bb5..22ef696e18 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -827,3 +827,26 @@ void write_commit_graph(const char *obj_dir,
oids.alloc = 0;
oids.nr = 0;
 }
+
+static int verify_commit_graph_error;
+
+static void graph_report(const char *fmt, ...)
+{
+   va_list ap;
+   verify_commit_graph_error = 1;
+
+   va_start(ap, fmt);
+   vfprintf(stderr, fmt, ap);
+   fprintf(stderr, "\n");
+   va_end(ap);
+}
+
+int verify_commit_graph(struct repository *r, struct commit_graph *g)
+{
+   if (!g) {
+   graph_report("no commit-graph file loaded");
+   return 1;
+   }
+
+   return verify_commit_graph_error;
+}
diff --git a/commit-graph.h b/commit-graph.h
index 96cccb10f3..4359812fb4 100644
---

[PATCH v6 03/21] commit-graph: parse commit from chosen graph

2018-06-08 Thread Derrick Stolee

Before verifying a commit-graph file against the object database, we
need to parse all commits from the given commit-graph file. Create
parse_commit_in_graph_one() to target a given struct commit_graph.

Signed-off-by: Derrick Stolee 
---
 commit-graph.c | 18 +++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index f83f6d2373..e77b19971d 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -314,7 +314,7 @@ static int find_commit_in_graph(struct commit *item, struct 
commit_graph *g, uin
}
 }
 
-int parse_commit_in_graph(struct commit *item)
+static int parse_commit_in_graph_one(struct commit_graph *g, struct commit 
*item)
 {
uint32_t pos;
 
@@ -322,9 +322,21 @@ int parse_commit_in_graph(struct commit *item)
return 0;
if (item->object.parsed)
return 1;
+
+   if (find_commit_in_graph(item, g, ))
+   return fill_commit_in_graph(item, g, pos);
+
+   return 0;
+}
+
+int parse_commit_in_graph(struct commit *item)
+{
+   if (!core_commit_graph)
+   return 0;
+
prepare_commit_graph();
-   if (commit_graph && find_commit_in_graph(item, commit_graph, ))
-   return fill_commit_in_graph(item, commit_graph, pos);
+   if (commit_graph)
+   return parse_commit_in_graph_one(commit_graph, item);
return 0;
 }
 
-- 
2.18.0.rc1

[PATCH v6 05/21] commit-graph: load a root tree from specific graph

2018-06-08 Thread Derrick Stolee

When lazy-loading a tree for a commit, it will be important to select
the tree from a specific struct commit_graph. Create a new method that
specifies the commit-graph file and use that in
get_commit_tree_in_graph().

Signed-off-by: Derrick Stolee 
---
 commit-graph.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index e77b19971d..9e228d3bb5 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -362,14 +362,20 @@ static struct tree *load_tree_for_commit(struct 
commit_graph *g, struct commit *
return c->maybe_tree;
 }
 
-struct tree *get_commit_tree_in_graph(const struct commit *c)
+static struct tree *get_commit_tree_in_graph_one(struct commit_graph *g,
+const struct commit *c)
 {
if (c->maybe_tree)
return c->maybe_tree;
if (c->graph_pos == COMMIT_NOT_FROM_GRAPH)
-   BUG("get_commit_tree_in_graph called from non-commit-graph 
commit");
+   BUG("get_commit_tree_in_graph_one called from non-commit-graph 
commit");
+
+   return load_tree_for_commit(g, (struct commit *)c);
+}
 
-   return load_tree_for_commit(commit_graph, (struct commit *)c);
+struct tree *get_commit_tree_in_graph(const struct commit *c)
+{
+   return get_commit_tree_in_graph_one(commit_graph, c);
 }
 
 static void write_graph_chunk_fanout(struct hashfile *f,
-- 
2.18.0.rc1

[PATCH v6 17/21] fsck: verify commit-graph

2018-06-08 Thread Derrick Stolee

If core.commitGraph is true, verify the contents of the commit-graph
during 'git fsck' using the 'git commit-graph verify' subcommand. Run
this check on all alternates, as well.

We use a new process for two reasons:

1. The subcommand decouples the details of loading and verifying a
   commit-graph file from the other fsck details.

2. The commit-graph verification requires the commits to be loaded
   in a specific order to guarantee we parse from the commit-graph
   file for some objects and from the object database for others.

Signed-off-by: Derrick Stolee 
---
 Documentation/git-fsck.txt |  3 +++
 builtin/fsck.c | 21 +
 t/t5318-commit-graph.sh|  8 
 3 files changed, 32 insertions(+)

diff --git a/Documentation/git-fsck.txt b/Documentation/git-fsck.txt
index b9f060e3b2..ab9a93fb9b 100644
--- a/Documentation/git-fsck.txt
+++ b/Documentation/git-fsck.txt
@@ -110,6 +110,9 @@ Any corrupt objects you will have to find in backups or 
other archives
 (i.e., you can just remove them and do an 'rsync' with some other site in
 the hopes that somebody else has the object you have corrupted).
 
+If core.commitGraph is true, the commit-graph file will also be inspected
+using 'git commit-graph verify'. See linkgit:git-commit-graph[1].
+
 Extracted Diagnostics
 -
 
diff --git a/builtin/fsck.c b/builtin/fsck.c
index 3ad4f160f9..9fb2edc69f 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -18,6 +18,7 @@
 #include "decorate.h"
 #include "packfile.h"
 #include "object-store.h"
+#include "run-command.h"
 
 #define REACHABLE 0x0001
 #define SEEN  0x0002
@@ -47,6 +48,7 @@ static int name_objects;
 #define ERROR_REACHABLE 02
 #define ERROR_PACK 04
 #define ERROR_REFS 010
+#define ERROR_COMMIT_GRAPH 020
 
 static const char *describe_object(struct object *obj)
 {
@@ -822,5 +824,24 @@ int cmd_fsck(int argc, const char **argv, const char 
*prefix)
}
 
check_connectivity();
+
+   if (core_commit_graph) {
+   struct child_process commit_graph_verify = CHILD_PROCESS_INIT;
+   const char *verify_argv[] = { "commit-graph", "verify", NULL, 
NULL, NULL };
+   commit_graph_verify.argv = verify_argv;
+   commit_graph_verify.git_cmd = 1;
+
+   if (run_command(_graph_verify))
+   errors_found |= ERROR_COMMIT_GRAPH;
+
+   prepare_alt_odb(the_repository);
+   for (alt =  the_repository->objects->alt_odb_list; alt; alt = 
alt->next) {
+   verify_argv[2] = "--object-dir";
+   verify_argv[3] = alt->path;
+   if (run_command(_graph_verify))
+   errors_found |= ERROR_COMMIT_GRAPH;
+   }
+   }
+
return errors_found;
 }
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index bd5b8428af..0da5a51552 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -399,4 +399,12 @@ test_expect_success 'detect invalid checksum hash' '
"incorrect checksum"
 '
 
+test_expect_success 'git fsck (checks commit-graph)' '
+   cd "$TRASH_DIRECTORY/full" &&
+   git fsck &&
+   corrupt_graph_and_verify $GRAPH_BYTE_FOOTER "\00" \
+   "incorrect checksum" &&
+   test_must_fail git fsck
+'
+
 test_done
-- 
2.18.0.rc1

[PATCH v6 12/21] commit-graph: verify parent list

2018-06-08 Thread Derrick Stolee

The commit-graph file stores parents in a two-column portion of the
commit data chunk. If there is only one parent, then the second column
stores 0x to indicate no second parent.

The 'verify' subcommand checks the parent list for the commit loaded
from the commit-graph and the one parsed from the object database. Test
these checks for corrupt parents, too many parents, and wrong parents.

Add a boundary check to insert_parent_or_die() for when the parent
position value is out of range.

The octopus merge will be tested in a later commit.

Signed-off-by: Derrick Stolee 
---
 commit-graph.c  | 28 
 t/t5318-commit-graph.sh | 18 ++
 2 files changed, 46 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index 5df18394f9..6d8d774eb0 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -244,6 +244,9 @@ static struct commit_list **insert_parent_or_die(struct 
commit_graph *g,
struct commit *c;
struct object_id oid;
 
+   if (pos >= g->num_commits)
+   die("invalid parent position %"PRIu64, pos);
+
hashcpy(oid.hash, g->chunk_oid_lookup + g->hash_len * pos);
c = lookup_commit();
if (!c)
@@ -907,6 +910,7 @@ int verify_commit_graph(struct repository *r, struct 
commit_graph *g)
 
for (i = 0; i < g->num_commits; i++) {
struct commit *graph_commit, *odb_commit;
+   struct commit_list *graph_parents, *odb_parents;
 
hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
 
@@ -924,6 +928,30 @@ int verify_commit_graph(struct repository *r, struct 
commit_graph *g)
 oid_to_hex(_oid),
 
oid_to_hex(get_commit_tree_oid(graph_commit)),
 
oid_to_hex(get_commit_tree_oid(odb_commit)));
+
+   graph_parents = graph_commit->parents;
+   odb_parents = odb_commit->parents;
+
+   while (graph_parents) {
+   if (odb_parents == NULL) {
+   graph_report("commit-graph parent list for 
commit %s is too long",
+oid_to_hex(_oid));
+   break;
+   }
+
+   if (oidcmp(_parents->item->object.oid, 
_parents->item->object.oid))
+   graph_report("commit-graph parent for %s is %s 
!= %s",
+oid_to_hex(_oid),
+
oid_to_hex(_parents->item->object.oid),
+
oid_to_hex(_parents->item->object.oid));
+
+   graph_parents = graph_parents->next;
+   odb_parents = odb_parents->next;
+   }
+
+   if (odb_parents != NULL)
+   graph_report("commit-graph parent list for commit %s 
terminates early",
+oid_to_hex(_oid));
}
 
return verify_commit_graph_error;
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 2b9214bc83..9a3481c30f 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -269,6 +269,9 @@ GRAPH_BYTE_OID_LOOKUP_ORDER=$(($GRAPH_OID_LOOKUP_OFFSET + 
$HASH_LEN * 8))
 GRAPH_BYTE_OID_LOOKUP_MISSING=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * 4 + 
10))
 GRAPH_COMMIT_DATA_OFFSET=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * 
$NUM_COMMITS))
 GRAPH_BYTE_COMMIT_TREE=$GRAPH_COMMIT_DATA_OFFSET
+GRAPH_BYTE_COMMIT_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN))
+GRAPH_BYTE_COMMIT_EXTRA_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 4))
+GRAPH_BYTE_COMMIT_WRONG_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 3))
 
 # usage: corrupt_graph_and_verify   
 # Manipulates the commit-graph file at the position
@@ -348,4 +351,19 @@ test_expect_success 'detect incorrect tree OID' '
"root tree OID for commit"
 '
 
+test_expect_success 'detect incorrect parent int-id' '
+   corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_PARENT "\01" \
+   "invalid parent"
+'
+
+test_expect_success 'detect extra parent int-id' '
+   corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_EXTRA_PARENT "\00" \
+   "is too long"
+'
+
+test_expect_success 'detect wrong parent' '
+   corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_WRONG_PARENT "\01" \
+   "commit-graph parent for"
+'
+
 test_done
-- 
2.18.0.rc1

[PATCH v6 09/21] commit-graph: verify corrupt OID fanout and lookup

2018-06-08 Thread Derrick Stolee

In the commit-graph file, the OID fanout chunk provides an index into
the OID lookup. The 'verify' subcommand should find incorrect values
in the fanout.

Similarly, the 'verify' subcommand should find out-of-order values in
the OID lookup.

Signed-off-by: Derrick Stolee 
---
 commit-graph.c  | 36 
 t/t5318-commit-graph.sh | 22 ++
 2 files changed, 58 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index f30b4ccee9..866a9e7e41 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -843,6 +843,9 @@ static void graph_report(const char *fmt, ...)
 
 int verify_commit_graph(struct repository *r, struct commit_graph *g)
 {
+   uint32_t i, cur_fanout_pos = 0;
+   struct object_id prev_oid, cur_oid;
+
if (!g) {
graph_report("no commit-graph file loaded");
return 1;
@@ -857,5 +860,38 @@ int verify_commit_graph(struct repository *r, struct 
commit_graph *g)
if (!g->chunk_commit_data)
graph_report("commit-graph is missing the Commit Data chunk");
 
+   if (verify_commit_graph_error)
+   return verify_commit_graph_error;
+
+   for (i = 0; i < g->num_commits; i++) {
+   hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
+
+   if (i && oidcmp(_oid, _oid) >= 0)
+   graph_report("commit-graph has incorrect OID order: %s 
then %s",
+oid_to_hex(_oid),
+oid_to_hex(_oid));
+
+   oidcpy(_oid, _oid);
+
+   while (cur_oid.hash[0] > cur_fanout_pos) {
+   uint32_t fanout_value = get_be32(g->chunk_oid_fanout + 
cur_fanout_pos);
+   if (i != fanout_value)
+   graph_report("commit-graph has incorrect fanout 
value: fanout[%d] = %u != %u",
+cur_fanout_pos, fanout_value, i);
+
+   cur_fanout_pos++;
+   }
+   }
+
+   while (cur_fanout_pos < 256) {
+   uint32_t fanout_value = get_be32(g->chunk_oid_fanout + 
cur_fanout_pos);
+
+   if (g->num_commits != fanout_value)
+   graph_report("commit-graph has incorrect fanout value: 
fanout[%d] = %u != %u",
+cur_fanout_pos, fanout_value, i);
+
+   cur_fanout_pos++;
+   }
+
return verify_commit_graph_error;
 }
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index c03792a8ed..4809cc881f 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -247,6 +247,7 @@ test_expect_success 'git commit-graph verify' '
git commit-graph verify >output
 '
 
+HASH_LEN=20
 GRAPH_BYTE_VERSION=4
 GRAPH_BYTE_HASH=5
 GRAPH_BYTE_CHUNK_COUNT=6
@@ -258,6 +259,12 @@ GRAPH_BYTE_OID_LOOKUP_ID=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \
1 * $GRAPH_CHUNK_LOOKUP_WIDTH))
 GRAPH_BYTE_COMMIT_DATA_ID=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \
 2 * $GRAPH_CHUNK_LOOKUP_WIDTH))
+GRAPH_FANOUT_OFFSET=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \
+  $GRAPH_CHUNK_LOOKUP_WIDTH * $GRAPH_CHUNK_LOOKUP_ROWS))
+GRAPH_BYTE_FANOUT1=$(($GRAPH_FANOUT_OFFSET + 4 * 4))
+GRAPH_BYTE_FANOUT2=$(($GRAPH_FANOUT_OFFSET + 4 * 255))
+GRAPH_OID_LOOKUP_OFFSET=$(($GRAPH_FANOUT_OFFSET + 4 * 256))
+GRAPH_BYTE_OID_LOOKUP_ORDER=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * 8))
 
 # usage: corrupt_graph_and_verify   
 # Manipulates the commit-graph file at the position
@@ -312,4 +319,19 @@ test_expect_success 'detect missing commit data chunk' '
"missing the Commit Data chunk"
 '
 
+test_expect_success 'detect incorrect fanout' '
+   corrupt_graph_and_verify $GRAPH_BYTE_FANOUT1 "\01" \
+   "fanout value"
+'
+
+test_expect_success 'detect incorrect fanout final value' '
+   corrupt_graph_and_verify $GRAPH_BYTE_FANOUT2 "\01" \
+   "fanout value"
+'
+
+test_expect_success 'detect incorrect OID order' '
+   corrupt_graph_and_verify $GRAPH_BYTE_OID_LOOKUP_ORDER "\01" \
+   "incorrect OID order"
+'
+
 test_done
-- 
2.18.0.rc1

[PATCH v6 10/21] commit-graph: verify objects exist

2018-06-08 Thread Derrick Stolee

In the 'verify' subcommand, load commits directly from the object
database to ensure they exist. Parse by skipping the commit-graph.

Signed-off-by: Derrick Stolee 
---
 commit-graph.c  | 18 ++
 t/t5318-commit-graph.sh |  7 +++
 2 files changed, 25 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index 866a9e7e41..00e89b71e9 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -11,6 +11,7 @@
 #include "sha1-lookup.h"
 #include "commit-graph.h"
 #include "object-store.h"
+#include "alloc.h"
 
 #define GRAPH_SIGNATURE 0x43475048 /* "CGPH" */
 #define GRAPH_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */
@@ -242,6 +243,7 @@ static struct commit_list **insert_parent_or_die(struct 
commit_graph *g,
 {
struct commit *c;
struct object_id oid;
+
hashcpy(oid.hash, g->chunk_oid_lookup + g->hash_len * pos);
c = lookup_commit();
if (!c)
@@ -893,5 +895,21 @@ int verify_commit_graph(struct repository *r, struct 
commit_graph *g)
cur_fanout_pos++;
}
 
+   if (verify_commit_graph_error)
+   return verify_commit_graph_error;
+
+   for (i = 0; i < g->num_commits; i++) {
+   struct commit *odb_commit;
+
+   hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
+
+   odb_commit = (struct commit *)create_object(r, cur_oid.hash, 
alloc_commit_node(r));
+   if (parse_commit_internal(odb_commit, 0, 0)) {
+   graph_report("failed to parse %s from object database",
+oid_to_hex(_oid));
+   continue;
+   }
+   }
+
return verify_commit_graph_error;
 }
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 4809cc881f..af5e34c0cb 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -247,6 +247,7 @@ test_expect_success 'git commit-graph verify' '
git commit-graph verify >output
 '
 
+NUM_COMMITS=9
 HASH_LEN=20
 GRAPH_BYTE_VERSION=4
 GRAPH_BYTE_HASH=5
@@ -265,6 +266,7 @@ GRAPH_BYTE_FANOUT1=$(($GRAPH_FANOUT_OFFSET + 4 * 4))
 GRAPH_BYTE_FANOUT2=$(($GRAPH_FANOUT_OFFSET + 4 * 255))
 GRAPH_OID_LOOKUP_OFFSET=$(($GRAPH_FANOUT_OFFSET + 4 * 256))
 GRAPH_BYTE_OID_LOOKUP_ORDER=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * 8))
+GRAPH_BYTE_OID_LOOKUP_MISSING=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * 4 + 
10))
 
 # usage: corrupt_graph_and_verify   
 # Manipulates the commit-graph file at the position
@@ -334,4 +336,9 @@ test_expect_success 'detect incorrect OID order' '
"incorrect OID order"
 '
 
+test_expect_success 'detect OID not in object database' '
+   corrupt_graph_and_verify $GRAPH_BYTE_OID_LOOKUP_MISSING "\01" \
+   "from object database"
+'
+
 test_done
-- 
2.18.0.rc1

[PATCH v6 02/21] commit-graph: fix GRAPH_MIN_SIZE

2018-06-08 Thread Derrick Stolee

The GRAPH_MIN_SIZE macro should be the smallest size of a parsable
commit-graph file. However, the minimum number of chunks was wrong.
It is possible to write a commit-graph file with zero commits, and
that violates this macro's value.

Rewrite the macro, and use extra macros to better explain the magic
constants.

Signed-off-by: Derrick Stolee 
---
 commit-graph.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index b63a1fc85e..f83f6d2373 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -35,10 +35,11 @@
 
 #define GRAPH_LAST_EDGE 0x8000
 
+#define GRAPH_HEADER_SIZE 8
 #define GRAPH_FANOUT_SIZE (4 * 256)
 #define GRAPH_CHUNKLOOKUP_WIDTH 12
-#define GRAPH_MIN_SIZE (5 * GRAPH_CHUNKLOOKUP_WIDTH + GRAPH_FANOUT_SIZE + \
-   GRAPH_OID_LEN + 8)
+#define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * GRAPH_CHUNKLOOKUP_WIDTH \
+   + GRAPH_FANOUT_SIZE + GRAPH_OID_LEN)
 
 char *get_commit_graph_filename(const char *obj_dir)
 {
-- 
2.18.0.rc1

[PATCH v6 00/21] Integrate commit-graph into 'fsck' and 'gc'

2018-06-08 Thread Derrick Stolee

Sorry for the fast rerolls, but Ævar found some runtime issues with the
string-list use and '\*' as multiplication on some platforms. Thus, v4
and v5 are broken. I did get a repro of those issues using VSTS Linux
build servers and checked that they are fixed in this version.

This version depends on 'next' and sb/object-store-alloc. To accommodate
the repository arguments added by sb/object-store-alloc, the following
diff occurs from the previous patch:

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 76423b3fa5..c7d0db5ab4 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -3,6 +3,7 @@
 #include "dir.h"
 #include "lockfile.h"
 #include "parse-options.h"
+#include "repository.h"
 #include "commit-graph.h"

 static char const * const builtin_commit_graph_usage[] = {
@@ -63,7 +64,7 @@ static int graph_verify(int argc, const char **argv)
if (!graph)
return 0;

-   return verify_commit_graph(graph);
+   return verify_commit_graph(the_repository, graph);
 }

 static int graph_read(int argc, const char **argv)
@@ -152,9 +153,9 @@ static int graph_write(int argc, const char **argv)
return 0;
}

+   string_list_init(, 0);
if (opts.stdin_packs || opts.stdin_commits) {
struct strbuf buf = STRBUF_INIT;
-   string_list_init(, 0);

while (strbuf_getline(, stdin) != EOF)
string_list_append(, strbuf_detach(, NULL));
diff --git a/commit-graph.c b/commit-graph.c
index 0d5adc8035..adf54e3fe7 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -12,6 +12,7 @@
 #include "sha1-lookup.h"
 #include "commit-graph.h"
 #include "object-store.h"
+#include "alloc.h"

 #define GRAPH_SIGNATURE 0x43475048 /* "CGPH" */
 #define GRAPH_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */
@@ -866,7 +867,7 @@ static void graph_report(const char *fmt, ...)
 #define GENERATION_ZERO_EXISTS 1
 #define GENERATION_NUMBER_EXISTS 2

-int verify_commit_graph(struct commit_graph *g)
+int verify_commit_graph(struct repository *r, struct commit_graph *g)
 {
uint32_t i, cur_fanout_pos = 0;
struct object_id prev_oid, cur_oid, checksum;
@@ -948,7 +949,7 @@ int verify_commit_graph(struct commit_graph *g)
hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);

graph_commit = lookup_commit(_oid);
-   odb_commit = (struct commit *)create_object(cur_oid.hash, 
alloc_commit_node());
+   odb_commit = (struct commit *)create_object(r, cur_oid.hash, 
alloc_commit_node(r));
if (parse_commit_internal(odb_commit, 0, 0)) {
graph_report("failed to parse %s from object database",
 oid_to_hex(_oid));
diff --git a/commit-graph.h b/commit-graph.h
index ee20f5e280..506cb45fb1 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -2,6 +2,7 @@
 #define COMMIT_GRAPH_H

 #include "git-compat-util.h"
+#include "repository.h"
 #include "string-list.h"

 char *get_commit_graph_filename(const char *obj_dir);
@@ -53,6 +54,6 @@ void write_commit_graph(const char *obj_dir,
struct string_list *commit_hex,
int append);

-int verify_commit_graph(struct commit_graph *g);
+int verify_commit_graph(struct repository *r, struct commit_graph *g);

 #endif
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index b24e8b6689..9a0661983c 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -33,8 +33,8 @@ test_expect_success 'create commits and repack' '
 '

 graph_git_two_modes() {
-   git -c core.commitGraph=true $1 >output
-   git -c core.commitGraph=false $1 >expect
+   git -c core.graph=true $1 >output
+   git -c core.graph=false $1 >expect
test_cmp output expect
 }

@@ -282,17 +282,17 @@ GRAPH_CHUNK_LOOKUP_WIDTH=12
 GRAPH_CHUNK_LOOKUP_ROWS=5
 GRAPH_BYTE_OID_FANOUT_ID=$GRAPH_CHUNK_LOOKUP_OFFSET
 GRAPH_BYTE_OID_LOOKUP_ID=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \
-   1 \* $GRAPH_CHUNK_LOOKUP_WIDTH))
+   1 * $GRAPH_CHUNK_LOOKUP_WIDTH))
 GRAPH_BYTE_COMMIT_DATA_ID=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \
-2 \* $GRAPH_CHUNK_LOOKUP_WIDTH))
+2 * $GRAPH_CHUNK_LOOKUP_WIDTH))
 GRAPH_FANOUT_OFFSET=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \
-  $GRAPH_CHUNK_LOOKUP_WIDTH \* $GRAPH_CHUNK_LOOKUP_ROWS))
-GRAPH_BYTE_FANOUT1=$(($GRAPH_FANOUT_OFFSET + 4 \* 4))
-GRAPH_BYTE_FANOUT2=$(($GRAPH_FANOUT_OFFSET + 4 \* 255))
-GRAPH_OID_LOOKUP_OFFSET=$(($GRAPH_FANOUT_OFFSET + 4 \* 256))
-GRAPH_BYTE_OID_LOOKUP_ORDER=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN \* 8))
-GRAPH_BYTE_OID_LOOKUP_MISSING=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN \* 4 + 
10))
-GRAPH_COMMIT_DATA_OFFSET=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN \* 
$NUM_COMMITS))
+  $GRAPH_CHUNK_LOOKUP_WIDTH * $GRAPH_CHUNK_LOOKUP_ROWS))

is there a canonical doc about how to deal with whitespace issues?

2018-06-08 Thread Robert P. J. Day



  for one of my courses, i wanted to write a section about the various
techniques for dealing with whitespace issues in git, so i started
making a list, things like:

  - running "git diff --check"
  - "git commit --cleanup=" possibilities
  - config options like core.{eol,safecrlf,autocrlf}
  - i'm sure there are client-side hooks that can be mentioned

etc, etc.

  has anyone ever written a doc that collects these things in one
place? if not, i guess i have to write one.

rday

-- 


Robert P. J. Day Ottawa, Ontario, CANADA
  http://crashcourse.ca/dokuwiki

Twitter:   http://twitter.com/rpjday
LinkedIn:   http://ca.linkedin.com/in/rpjday

Re: [PATCH v5 18/21] commit-graph: use string-list API for input

2018-06-08 Thread Derrick Stolee


On 6/6/2018 8:45 AM, Derrick Stolee wrote:

On 6/6/2018 8:26 AM, Ævar Arnfjörð Bjarmason wrote:

On Wed, Jun 06 2018, Derrick Stolee wrote:


On 6/6/2018 8:11 AM, Ævar Arnfjörð Bjarmason wrote:

On Wed, Jun 06 2018, Derrick Stolee wrote:


Signed-off-by: Derrick Stolee 

+    string_list_clear(, 0);
   return 0;
   }

This results in an invalid free() & segfault because you're freeing
 which may not have been allocated by string_list_init().

Good point. Did my tests not catch this? (seems it requires calling
`git commit-graph write` with no `--stdin-packs` or
`--stdin-commits`).

Most of your tests (t5318-commit-graph.sh) segfaulted, but presumably
you're on a more forgiving compiler/platform/options. I compiled with
-O0 -g on clang 4.0.1-8 + Debian testing.


I appreciate the extra platform testing. I'm using GCC on Ubuntu (gcc 
(Ubuntu 7.3.0-16ubuntu3) 7.3.0).


While the errors didn't repro on my box, they did on the VSTS Linux 
build machines [1]. I created an internal PR just so I could run those 
tests (and on our OSX agents which use clang) and so I can verify I 
fixed the situation. I'll send a v6 soon so Junio only picks up a 
version that succeeds in tests.


Thanks,
-Stolee

[1] 
https://github.com/Microsoft/vsts-agent-docker/blob/master/ubuntu/16.04/standard/Dockerfile

    What's installed on a VSTS Linux build agent

Re: GDPR compliance best practices?

2018-06-08 Thread Peter Backes

On Fri, Jun 08, 2018 at 10:13:20AM +0200, Ævar Arnfjörð Bjarmason wrote:
> Can you walk us through how anyone would be expected to fork (as create
> a new project, not the github-ism) existing projects under such a
> regiment?

I don't see your point. Copy the repository to fork. Nothing changes 
about that. Nothing prevents anyone from forking a repository which had 
some of its author names removed from the commits.

> As David Lang notes upthread, "the license is granted to the world, so
> the world has an interest in it". I wouldn't be so sure that this line
> of argument wouldn't work.

As I already stressed, having an interest is not enough. You need to 
have overriding legitimate grounds.

Best wishes
Peter

-- 
Peter Backes, r...@helen.plasma.xg8.de

Re: GDPR compliance best practices?

2018-06-08 Thread Peter Backes

On Fri, Jun 08, 2018 at 12:42:54AM -0700, David Lang wrote:
> Wrong, if you have to delete info, you are not allowed to keep a private
> copy.

Yes you are allowed. See Art. 17 (3) lit e GDPR.

> There is _nothing_ in the GDPR about publishing information,
> everything in it is about what you are allowed to store privately, how you
> are required to protect it (or more precisely, what you are required to do
> if private data gets hacked), and how you are required to keep it available.

Nope, the GDPR is not at all restricted to private copies.

The GDPR has special jargon for publishing; the GDPR calls it 
"disclosure (Art. 4 (2) GDPR) to an unspecified number of unspecified 
recipients (Art. 4 (9) GDPR), including ones in third countries 
(Chapter 5) in a repetitive (Art 49 (1) GDPR) fashion".

Best wishes
Peter

-- 
Peter Backes, r...@helen.plasma.xg8.de

Re: [RFC PATCH 0/6] Fix commit-graph/graft/replace/shallow combo

2018-06-08 Thread Jakub Narebski

Derrick Stolee  writes:

> The commit-graph file stores a condensed version of the commit history.
> This helps speed up several operations involving commit walks. This
> feature does not work well if those commits "change" using features like
> commit grafts, replace objects, or shallow clones.

I like to think about those features as providing an overlay for the
commit graph (similar to overlay filesystems, like overlayfs); in the
case of git-replace quite literally.

I will be calling all those features "history-altering features", for
short.

> Since the commit-graph feature is optional, hidden behind the
> 'core.commitGraph' config option, and requires manual maintenance (until
> ds/commit-graph-fsck' is merged), these issues only arise for expert
> users who decided to opt-in.
>
> This RFC is a first attempt at rectify the issues that come up when
> these features interact. I have not succeeded entirely, as I will
> discuss below.

What I would like to see here in cover letter is a generic description
of _strategy_ to deal with those history-altering features.

>From what I understand you have the following options for each of
replacements, shallow clones and grafts:
 - turn off serialized commit-graph if given history-altering feature is
   present, as if core.commitGraph was set to false
 - invalidate and optionally refresh commit-graph file if given
   history-altering feature is present (maybe only if it changed the
   view of the history, is such check is possible)

For automatic invalidation you would need to have either:
 - cover all possible ways by which given history-altering feature can
   change the view of history, or
 - keep state of history-altering change for which commit-graph was
   created (e.g. in proposed VAL4 chunk) and compare with current view
   of history if it changed

For automatic turning off you would need only to check if
history-altering feature is present.


Let us examine each of those history-altering features that Git
supports:

* shallow clone

  - shallow clone usually means having shorter history, so serialized
commit-graph is not needed as much; also changing the depth
screws-up assumptions about generation numbers

  - there are only a few entry points changing the view of history,
namely fetch and push with shallow options (--depth=,
--deepen=, --shallow-since=, --update-shallow,
--shallow-exclude=, --unshallow)

  - it is easy to check for presence of shallow clone feature by
chacking of $GIT_DIR/shallow exists and is not empty

  - different contents of $GIT_DIR/shallow means different view of
history

  - NOTE: internally uses grafts mechanism (emulated grafts)

* replacements (replace objects, git-replace)

  - git-replace can be used to join current development repository with
historical repository, in which case we would certainly want to make
use of serialied commit graph; on the other hand the replacements do
not necessary alter the view of the history

  - theoretically you could create replacement refs by hand, but in
practice there are TWO ways of getting them:

- using git-replace command to create, edit/change and delete
  replacement objects'
  '
- fetching or having pushed-to refs in refs/replace/* namespace

  - you need to check if there are any refs in refs/replace/* namespace
to check if the feature is used (but it doesn't necessarily mean
that it altered project history)

  - changed list of refs in refs/replace/* namespace (which you can get
with for-each-ref command/API) does not necessarily mean that the
view of the history changed: you can replace non-commit object, you
can replace commits and not change their parents; it is not as easy
as checking if file exists

  - NOTE: the feature can be turned off manually with
GIT_NO_REPLACE_OBJECTS environment variable and with
--no-replace-objects option to git wrapper.

Also when pushing, fetching and fsck-ing this feature is turned off
and refs in refs/replace/* namespace are treated as ordinary refs.

This may mean that we would want to create commit-graph with
replacements for ordinary non-bare repository, and without
replacements for server-side bare repository.

* grafts

  - subset of uses of git-replace, older and *obsolete* feature (because
it is unsafe to use; that is you need to be careful with it).

  - edited by hand, no automatic entry points

  - if $GIT_DIR/info/grafts file is present, then feature is enabled
(barring some corner cases, like empty file or file consisting only
of comments)

  - changed contents of this file means changed view of history (well,
except for reordering lines, or removing/adding empty lines and
comments)

>
> The first two "DO NOT MERGE" commits are not intended to be part of the
> main product, but are ways to help the full test suite run while
> computing and consuming commit-graph files. This is acheived by calling
>

Re: [RFC PATCH v1] telemetry: design documenation

2018-06-08 Thread Ævar Arnfjörð Bjarmason



On Thu, Jun 07 2018, g...@jeffhostetler.com wrote:

> From: Jeff Hostetler 
>
> Create design documentation to describe the telemetry feature.
>
> Signed-off-by: Jeff Hostetler 
> ---
>  Documentation/technical/telemetry.txt | 475 
> ++
>  1 file changed, 475 insertions(+)
>  create mode 100644 Documentation/technical/telemetry.txt
>
> diff --git a/Documentation/technical/telemetry.txt 
> b/Documentation/technical/telemetry.txt
> new file mode 100644
> index 000..0a708ad
> --- /dev/null
> +++ b/Documentation/technical/telemetry.txt
> @@ -0,0 +1,475 @@
> +Telemetry Design Notes
> +==
> +
> +The telemetry feature allows Git to generate structured telemetry data
> +for executed commands.  Data includes command line arguments, execution
> +times, error codes and messages, and information about child processes.
> +
> +Structued data is produced in a JSON-like format.  (See the UTF-8 related

s/Structued/Structured/

> +"limitations" described in json-writer.h)
> +
> +Telemetry data can be written to a local file or sent to a dynamically
> +loaded shared library via a plugin API.
> +
> +The telemetry feature is similar to the existing trace API (defined in
> +Documentation/technical/api-trace.txt).  Telemetry events are generated
> +thoughout the life of a Git command just like trace messages.  But where

s/thoughout/throughout/

> +as trace messages are essentially developer debug messages, telemetry
> +events are intended for logging and automated analysis.
> +
> +The goal of the telemetry feature is to be able to gather usage data across
> +a group of production users to identify real-world performance problems in
> +production.  Additionally, it might help identify common user errors and
> +guide future user training.
> +
> +By default, telemetry is disabled.  Telemetry is controlled using config
> +settings (see "telemetry.*" in Documentation/config.txt).
> +
> +
> +Telemetry Events
> +
> +
> +Telemetry data is generated as a series of events.  Each event is written
> +as a self-describing JSON object.
> +
> +Events: cmd_start and cmd_exit
> +--
> +
> +The `cmd_start` event is emitted the very beginning of the git.exe process
> +in cmd_main() and `cmd_exit` event is emitted at the end of the process in
> +the atexit cleanup routine.
> +
> +For example, running "git version" produces:
> +
> +{
> +  "event_name": "cmd_start",
> +  "argv": [
> +"C:\\work\\gfw\\git.exe",
> +"version"
> +  ],
> +  "clock": 1525978509976086000,
> +  "pid": 25460,
> +  "git_version": "2.17.0.windows.1",
> +  "telemetry_version": "1",
> +  "session_id": "1525978509976086000-25460"
> +}
> +{
> +  "event_name": "cmd_exit",
> +  "argv": [
> +"C:\\work\\gfw\\git.exe",
> +"version"
> +  ],
> +  "clock": 1525978509980903391,
> +  "pid": 25460,
> +  "git_version": "2.17.0.windows.1",
> +  "telemetry_version": "1",
> +  "session_id": "1525978509976086000-25460",
> +  "is_interactive": false,
> +  "exit_code": 0,
> +  "elapsed_time_core": 0.004814,
> +  "elapsed_time_total": 0.004817,
> +  "builtin": {
> +"name": "version"
> +  }
> +}
> +
> +Fields common to all events:
> + * `event_name` is the name of the event.
> + * `argv` is the array of command line arguments.
> + * `clock` is the time of the event in nanoseconds since the epoch.
> + * `pid` is the process id.
> + * `git_version` is the git version string.
> + * `telemetry_version` is the version of the telemetry format.
> + * `session_id` is described in a later section.

I wonder if we should add the path to git to these if RUNTIME_PREFIX is
defined. Would be useful to log experiments with different git versions,
and as noted by the RUNTIME_PREFIX feature this information is not
guaranteed to be something you can extract from argv.

> +Additional fields in cmd_exit:
> + * `is_interactive` is true if git.exe spawned an interactive child process,
> +  such as a pager, editor, prompt, or gui tool.
> + * `exit_code` is the value passed to exit() from main().
> + * `error_message` (not shown) is the array of error messages.
> + * `elapsed-core-time` measures the time in seconds until exit() was called.
> + * `elapsed-total-time` measures the time until the atexit() routine starts
> +  (which will include time spend in other atexit() routines cleaning up
> +  child processes and etc.).
> + * `alias` (not shown) the updated argv after alias expansion.
> + * `builtin.name` is the canonical command name (from the cmd_struct[]
> +  table) of a builtin command.
> + * `builtin.mode` (not shown) is shown for some commands that have different
> +  major modes and performance times.  For example, checkout can switch
> +  branches or repair a single file.
> + * `child_summary` (not shown) is described in a later section.
> + * `timers` (not shown) is described in a later section.
> + * `aux` (not shown) is described in a later section.
> +
> +
> +Events:

Dear Friend,

2018-06-08 Thread Azzia Mohammad

Dear Friend,
 I am Mr.Azzia Mohammed, the head of file department of Bank of Africa
(B.O.A) herein Burkina Faso / Ouagadougou. In my department we
discover an abandoned sum of (US$18 million US Dollars) in an account
that belongs to one of our foreign customer who died along with his
family in plane crash.

 It is therefore upon this discovery that I now decided to make this
business proposal to you and release the money to you as the next of
kin or relation to the deceased for the safety and subsequent
disbursement since nobody is coming for it. I agree that 40% of this
money will be for you, while 60% would be for me. Then after the money
is been transferred into your account, I will visit your country for
an investment under your kind control.

 You have to contact my Bank directly as the real next of kin of this
deceased account with next of kin application form. You have to send
me those your information below to enable me use it and get you next
of kin application form from bank, so that you will contact Bank for
the transfer of this money into your account.


Your Full Name___
Your Home Address
Your Age ___
Your Handset Number
Your Occupation ___

I am waiting for your urgent respond to enable us proceed further for
the transfer.
Yours faithfully,
Mr. Azzia Mohammed.

Re: [PATCHv1 1/3] git-p4: raise exceptions from p4CmdList based on error from p4 server

2018-06-08 Thread Luke Diamand

>>
>>> ErrorId MsgDb::MaxResults  = { ErrorOf( ES_DB, 32,
>>> E_FAILED, EV_ADMIN, 1 ), "Request too large (over %maxResults%); see
>>> 'p4 help maxresults'." } ;//NOTRANS
>>> ErrorId MsgDb::MaxScanRows = { ErrorOf( ES_DB, 61,
>>> E_FAILED, EV_ADMIN, 1 ), "Too many rows scanned (over %maxScanRows%);
>>> see 'p4 help maxscanrows'." } ;//NOTRANS
>>>
>>> I don't think there's actually a way to make it return any language
>>> other than English though. [...]
>>> So I think probably the language is always English.
>>
>> The "NOTRANS" annotation on the error messages is reassuring.
>
> I'll check it works OK on Windows; charset translation might cause a problem.

Works OK on Windows with 2017.2 client/server.

wir bieten 2% Kredite

2018-06-08 Thread Ronald Bernstein

Sehr geehrte Damen und Herren,

Sie brauchen Geld? Sie sind auf der suche nach einem Darlehen? Seriös und
unkompliziert?
Dann sind Sie hier bei uns genau richtig.
Durch unsere jahrelange Erfahrung und kompetente Beratung sind wir
Europaweit tätig.

Wir bieten jedem ein GÜNSTIGES Darlehen zu TOP Konditionen an.
Darlehnen zwischen 5000 CHF/Euro bis zu 20 Millionen CHF/Euro möglich.
Wir erheben dazu 2% Zinssatz.

Lassen Sie sich von unserem kompetenten Team beraten.

Zögern Sie nicht und kontaktieren Sie mich unter für weitere Infos &
Anfragen unter der eingeblendeten Email Adresse.


Ich freue mich von Ihnen zu hören.

Re: [RFC PATCH v1] telemetry design overview (part 1)

2018-06-08 Thread Ævar Arnfjörð Bjarmason

On Thu, Jun 07 2018, Johannes Sixt wrote:

> Am 07.06.2018 um 16:53 schrieb g...@jeffhostetler.com:
>> From: Jeff Hostetler 
>>
>> I've been working to add code to Git to optionally collect telemetry data.
>> The goal is to be able to collect performance data from Git commands and
>> allow it to be aggregated over a user community to find "slow commands".
>
> Seriously? "add code to collect telemetry data" said by somebody whose
> email address ends with @microsoft.com is very irritating. I really
> don't want to have yet another switch that I must check after every
> update that it is still off.

To elaborate a bit on Jeff's reply (since this was discussed in more
detail at Git Merge this year), the point of this feature is not to ship
git.git with some default telemetry, but so that in-house users of git
like Microsoft, Dropbox, Booking.com etc. can build & configure their
internal versions of git to turn on telemetry for their own users.

There's numerous in-house monkeypatches to git on various sites (at
least Microsoft & Dropbox reported having internal patches already).
Something like this facility would allow us to agree on some
implementation that could be shipped by default (but would be off by
default), those of us who'd make use of this feature already have "root"
on those user's machines, and control what git binary they use etc,
their /etc/gitconfig and so on.

Re: [RFC PATCH v1] telemetry design overview (part 1)

2018-06-08 Thread Jeff King

On Thu, Jun 07, 2018 at 11:10:52PM +0200, Johannes Sixt wrote:

> Am 07.06.2018 um 16:53 schrieb g...@jeffhostetler.com:
> > From: Jeff Hostetler 
> > 
> > I've been working to add code to Git to optionally collect telemetry data.
> > The goal is to be able to collect performance data from Git commands and
> > allow it to be aggregated over a user community to find "slow commands".
> 
> Seriously? "add code to collect telemetry data" said by somebody whose email
> address ends with @microsoft.com is very irritating. I really don't want to
> have yet another switch that I must check after every update that it is
> still off.

If you look at the design document, it's off by default and would write
to a file on the filesystem. That doesn't seem all that different from
GIT_TRACE.

-Peff

Re: GDPR compliance best practices?

2018-06-08 Thread Ævar Arnfjörð Bjarmason

On Fri, Jun 08 2018, Peter Backes wrote:

> On Thu, Jun 07, 2018 at 10:53:13PM -0400, Theodore Y. Ts'o wrote:
>> The problem is you've left undefined who is "you"?  With an open
>> source project, anyone who has contributed to open source project has
>> a copyright interest.  That hobbyist in German who submitted a patch?
>> They have a copyright interest.  That US Company based in Redmond,
>> Washington?  They own a copyright interest.  Huawei in China?  They
>> have a copyright interest.
>>
>> So there is no "privately".  And "you" numbers in the thousands and
>> thousands of copyright holders of portions of the open source code.
>
> Of course there is "privately". Every single one of those who have the
> author information can keep it, privately, for themselves. But those
> that have received a request to be forgotten must not keep publishing
> it on the Internet for download or distribute it to others.

Can you walk us through how anyone would be expected to fork (as create
a new project, not the github-ism) existing projects under such a
regiment?

E.g. in git.git we have SOB lines for the whole history, in lieu of
GNU-style copyright assignment (which is how things mainly worked back
in the CVS days) someone can just clone the repository and create a
hostile fork, which is one of the central ideas of free software.

In the world you're describing the history would have been expunged
publicly, and no hosting site would be willing to host it. It might be
gone in practical terms to anyone who just doesn't like how (in this
example) the Git project is run, and thinks they can do it better.

Maybe (again, in this example) the Software Freedom Conservancy's scope
would have to expand to retain this private history (right now they have
nothing to do with copyright).

But then how am I going to fork the Git project if the SFC decides they
don't want to cooperate with me?

As David Lang notes upthread, "the license is granted to the world, so
the world has an interest in it". I wouldn't be so sure that this line
of argument wouldn't work.

Re: GDPR compliance best practices?

2018-06-08 Thread David Lang


On Fri, 8 Jun 2018, Peter Backes wrote:


you are the one arguing that the GDPR prohibits Git from storing and
revealing this license granting data, not me.


It prohibits publishing, and only after a request to be forgotten. It
does not prohibit storing your private copy.


Wrong, if you have to delete info, you are not allowed to keep a private copy. 
There is _nothing_ in the GDPR about publishing information, everything in it is 
about what you are allowed to store privately, how you are required to protect 
it (or more precisely, what you are required to do if private data gets hacked), 
and how you are required to keep it available.

Re: GDPR compliance best practices?

2018-06-08 Thread Peter Backes

On Thu, Jun 07, 2018 at 10:53:13PM -0400, Theodore Y. Ts'o wrote:
> The problem is you've left undefined who is "you"?  With an open
> source project, anyone who has contributed to open source project has
> a copyright interest.  That hobbyist in German who submitted a patch?
> They have a copyright interest.  That US Company based in Redmond,
> Washington?  They own a copyright interest.  Huawei in China?  They
> have a copyright interest.
> 
> So there is no "privately".  And "you" numbers in the thousands and
> thousands of copyright holders of portions of the open source code.

Of course there is "privately". Every single one of those who have the 
author information can keep it, privately, for themselves. But those 
that have received a request to be forgotten must not keep publishing 
it on the Internet for download or distribute it to others.

> And of course, that's the other thing you seem to fundamentally not
> understand about how git works.  Every developer in the world working
> on that open source project has their own copy.  There is
> fundamentally no way that you can expunge that information from every
> single git repository in the world.

The misunderstanding is on your side.

If you run a website where the world can access a repository, you are 
responsible for obeying the GDPR with respect to that repository. If 
you receive a request to be forgotten, you have to make sure you stop 
publishing that author's identity as part of the repository.

You do NOT need to

- delete it from a private copy you have
- care about others who publish that data
- or even make sure the data is deleted from private copies others may 
have, even if the number of copies is in the thousands.

In practical terms, if someone wishes to exercise his right to be 
forgotten, he will usually send the request to the maintainer and stop 
him from distributing the information, and perhaps to a third party he 
might use as a platform for publication, such as github.

Best wishes
Peter

-- 
Peter Backes, r...@helen.plasma.xg8.de

Re: GDPR compliance best practices?

2018-06-08 Thread Peter Backes

On Thu, Jun 07, 2018 at 04:53:16PM -0700, David Lang wrote:
> the license is granted to the world, so the world has an interest in it.

Certainly, but you need to have overriding legitimate grounds. An 
interest is not enough for justification. You have to weight your 
interests against those of the subject.

> Unless you are going to argue that the GDPR outlawed open source 
> development.

No it certainly did not and I don't see how it could.

All the GDPR arguably demands is that the author's identity is deleted 
from a public repository if he wishes so.

Just assume it was a CVS repo. Then removal would not be any issue at 
all. It is a technical speciality of git that makes the removal so 
intricate to implement, which is not at all an intrinsic property of 
open source development.

> you are the one arguing that the GDPR prohibits Git from storing and
> revealing this license granting data, not me.

It prohibits publishing, and only after a request to be forgotten. It 
does not prohibit storing your private copy.

Best wishes
Peter

-- 
Peter Backes, r...@helen.plasma.xg8.de

Re: [PATCH v8 1/2] json_writer: new routines to create data in JSON format

2018-06-08 Thread René Scharfe

Am 07.06.2018 um 16:12 schrieb g...@jeffhostetler.com:
> From: Jeff Hostetler 
> 
> Add a series of jw_ routines and "struct json_writer" structure to compose
> JSON data.  The resulting string data can then be output by commands wanting
> to support a JSON output format.
> 
> The json-writer routines can be used to generate structured data in a
> JSON-like format.  We say "JSON-like" because we do not enforce the Unicode
> (usually UTF-8) requirement on string fields.  Internally, Git does not
> necessarily have Unicode/UTF-8 data for most fields, so it is currently
> unclear the best way to enforce that requirement.  For example, on Linx
> pathnames can contain arbitrary 8-bit character data, so a command like
> "status" would not know how to encode the reported pathnames.  We may want
> to revisit this (or double encode such strings) in the future.
> 
> The initial use for the json-writer routines is for generating telemetry
> data for executed Git commands.  Later, we may want to use them in other
> commands, such as status.
> 
> Helped-by: René Scharfe 
> Helped-by: Wink Saville 
> Helped-by: Ramsay Jones 
> Signed-off-by: Jeff Hostetler 
> ---
>   Makefile|   2 +
>   json-writer.c   | 419 
>   json-writer.h   | 113 +
>   t/helper/test-json-writer.c | 572 
> 
>   t/t0019-json-writer.sh  | 236 ++
>   5 files changed, 1342 insertions(+)
>   create mode 100644 json-writer.c
>   create mode 100644 json-writer.h
>   create mode 100644 t/helper/test-json-writer.c
>   create mode 100755 t/t0019-json-writer.sh
> 
> diff --git a/Makefile b/Makefile
> index a1d8775..4ae6946 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -666,6 +666,7 @@ TEST_PROGRAMS_NEED_X += test-fake-ssh
>  TEST_PROGRAMS_NEED_X += test-genrandom
>  TEST_PROGRAMS_NEED_X += test-hashmap
>  TEST_PROGRAMS_NEED_X += test-index-version
> +TEST_PROGRAMS_NEED_X += test-json-writer>  TEST_PROGRAMS_NEED_X += 
> test-lazy-init-name-hash
>  TEST_PROGRAMS_NEED_X += test-line-buffer
>  TEST_PROGRAMS_NEED_X += test-match-trees

This doesn't apply cleanly on master, because most test helpers have
been combined into a single binary to reduce their disk footprint and
link times.  See efd71f8913 (t/helper: add an empty test-tool program)
for the overall rationale.

test-json-writer could become a built-in as well, I think.  You can see
e.g. in c932a5ff28 (t/helper: merge test-string-list into test-tool)
what needs to be done to convert a helper.

René

95 matches

Mail list logo