Re: [RFC PATCH 0/6] Fix commit-graph/graft/replace/shallow combo

2018-06-08 Thread Jakub Narebski
Derrick Stolee  writes:

> The commit-graph file stores a condensed version of the commit history.
> This helps speed up several operations involving commit walks. This
> feature does not work well if those commits "change" using features like
> commit grafts, replace objects, or shallow clones.

I like to think about those features as providing an overlay for the
commit graph (similar to overlay filesystems, like overlayfs); in the
case of git-replace quite literally.

I will be calling all those features "history-altering features", for
short.

> Since the commit-graph feature is optional, hidden behind the
> 'core.commitGraph' config option, and requires manual maintenance (until
> ds/commit-graph-fsck' is merged), these issues only arise for expert
> users who decided to opt-in.
>
> This RFC is a first attempt at rectify the issues that come up when
> these features interact. I have not succeeded entirely, as I will
> discuss below.

What I would like to see here in cover letter is a generic description
of _strategy_ to deal with those history-altering features.

>From what I understand you have the following options for each of
replacements, shallow clones and grafts:
 - turn off serialized commit-graph if given history-altering feature is
   present, as if core.commitGraph was set to false
 - invalidate and optionally refresh commit-graph file if given
   history-altering feature is present (maybe only if it changed the
   view of the history, is such check is possible)

For automatic invalidation you would need to have either:
 - cover all possible ways by which given history-altering feature can
   change the view of history, or
 - keep state of history-altering change for which commit-graph was
   created (e.g. in proposed VAL4 chunk) and compare with current view
   of history if it changed

For automatic turning off you would need only to check if
history-altering feature is present.


Let us examine each of those history-altering features that Git
supports:

* shallow clone

  - shallow clone usually means having shorter history, so serialized
commit-graph is not needed as much; also changing the depth
screws-up assumptions about generation numbers

  - there are only a few entry points changing the view of history,
namely fetch and push with shallow options (--depth=,
--deepen=, --shallow-since=, --update-shallow,
--shallow-exclude=, --unshallow)

  - it is easy to check for presence of shallow clone feature by
chacking of $GIT_DIR/shallow exists and is not empty

  - different contents of $GIT_DIR/shallow means different view of
history

  - NOTE: internally uses grafts mechanism (emulated grafts)

* replacements (replace objects, git-replace)

  - git-replace can be used to join current development repository with
historical repository, in which case we would certainly want to make
use of serialied commit graph; on the other hand the replacements do
not necessary alter the view of the history

  - theoretically you could create replacement refs by hand, but in
practice there are TWO ways of getting them:

- using git-replace command to create, edit/change and delete
  replacement objects'
  '
- fetching or having pushed-to refs in refs/replace/* namespace

  - you need to check if there are any refs in refs/replace/* namespace
to check if the feature is used (but it doesn't necessarily mean
that it altered project history)

  - changed list of refs in refs/replace/* namespace (which you can get
with for-each-ref command/API) does not necessarily mean that the
view of the history changed: you can replace non-commit object, you
can replace commits and not change their parents; it is not as easy
as checking if file exists

  - NOTE: the feature can be turned off manually with
GIT_NO_REPLACE_OBJECTS environment variable and with
--no-replace-objects option to git wrapper.

Also when pushing, fetching and fsck-ing this feature is turned off
and refs in refs/replace/* namespace are treated as ordinary refs.

This may mean that we would want to create commit-graph with
replacements for ordinary non-bare repository, and without
replacements for server-side bare repository.

* grafts

  - subset of uses of git-replace, older and *obsolete* feature (because
it is unsafe to use; that is you need to be careful with it).

  - edited by hand, no automatic entry points

  - if $GIT_DIR/info/grafts file is present, then feature is enabled
(barring some corner cases, like empty file or file consisting only
of comments)

  - changed contents of this file means changed view of history (well,
except for reordering lines, or removing/adding empty lines and
comments)

>
> The first two "DO NOT MERGE" commits are not intended to be part of the
> main product, but are ways to help the full test suite run while
> computing and consuming commit-graph files. This is acheived by calling
> 

Re: [RFC PATCH 0/6] Fix commit-graph/graft/replace/shallow combo

2018-05-31 Thread Derrick Stolee

On 5/31/2018 2:33 PM, Stefan Beller wrote:

On Thu, May 31, 2018 at 10:40 AM, Derrick Stolee  wrote:

The commit-graph file stores a condensed version of the commit history.
This helps speed up several operations involving commit walks. This
feature does not work well if those commits "change" using features like
commit grafts, replace objects, or shallow clones.

Since the commit-graph feature is optional, hidden behind the
'core.commitGraph' config option, and requires manual maintenance (until
ds/commit-graph-fsck' is merged), these issues only arise for expert
users who decided to opt-in.

This RFC is a first attempt at rectify the issues that come up when
these features interact. I have not succeeded entirely, as I will
discuss below.

The first two "DO NOT MERGE" commits are not intended to be part of the
main product, but are ways to help the full test suite run while
computing and consuming commit-graph files. This is acheived by calling
write_commit_graph_reachable() from `git fetch` and `git commit`. I
believe this is the only dependence on ds/commit-graph-fsck. The other
commits should only depend on ds/commit-graph-lockfile-fix.

I applied these patches on top of ds/commit-graph-fsck
(it would have been nice if you mentioned that it applies there)
Running the test suite with all patches applied results in:

./t0410-partial-clone.sh(Wstat: 256 Tests: 15 Failed: 2)
   Failed tests:  5, 8
./t5307-pack-missing-commit.sh  (Wstat: 256 Tests: 5 Failed: 2)
   Failed tests:  3-4
./t5500-fetch-pack.sh   (Wstat: 256 Tests: 353 Failed: 1)
   Failed test:  348
./t6011-rev-list-with-bad-commit.sh (Wstat: 256 Tests: 6 Failed: 1)
   Failed test:  4
./t6024-recursive-merge.sh  (Wstat: 256 Tests: 6 Failed: 1)
   Failed test:  4

which you identified as 4x noise and t5500 as not understood.

$ GIT_TRACE=1 ./t5500-fetch-pack.sh -d -i -v -x
[...]
expecting success:
git -C shallow12 fetch --shallow-exclude one origin &&
git -C shallow12 log --pretty=tformat:%s origin/master >actual &&
test_write_lines three two >expected &&
test_cmp expected actual

++ git -C shallow12 fetch --shallow-exclude one origin
trace: built-in: git fetch --shallow-exclude one origin
trace: run_command: unset GIT_PREFIX; 'git-upload-pack
'\''/u/git/t/trash directory.t5500-fetch-pack/shallow-exclude/.'\'''
trace: run_command: git --shallow-file  pack-objects --revs --thin
--stdout --shallow --progress --delta-base-offset --include-tag
trace: built-in: git pack-objects --revs --thin --stdout --shallow
--progress --delta-base-offset --include-tag
remote: Counting objects: 4, done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 4 (delta 0), reused 0 (delta 0)
trace: run_command: git --shallow-file  unpack-objects --pack_header=2,4
trace: built-in: git unpack-objects --pack_header=2,4
Unpacking objects: 100% (4/4), done.
trace: run_command: git rev-list --objects --stdin --not --all --quiet
trace: built-in: git rev-list --objects --stdin --not --all --quiet
trace: run_command: unset GIT_PREFIX; 'git-upload-pack
'\''/u/git/t/trash directory.t5500-fetch-pack/shallow-exclude/.'\'''
trace: run_command: git pack-objects --revs --thin --stdout --progress
--delta-base-offset
trace: built-in: git pack-objects --revs --thin --stdout --progress
--delta-base-offset
remote: Total 0 (delta 0), reused 0 (delta 0)
trace: run_command: git unpack-objects --pack_header=2,0
trace: built-in: git unpack-objects --pack_header=2,0
trace: run_command: git rev-list --objects --stdin --not --all --quiet
trace: built-in: git rev-list --objects --stdin --not --all --quiet
 From file:///u/git/t/trash directory.t5500-fetch-pack/shallow-exclude/.
  * [new tag] one-> one
  * [new tag] two-> two
run_processes_parallel: preparing to run up to 1 tasks
run_processes_parallel: done
trace: run_command: git gc --auto
trace: built-in: git gc --auto
++ git -C shallow12 log --pretty=tformat:%s origin/master
trace: built-in: git log '--pretty=tformat:%s' origin/master
++ test_write_lines three two
++ printf '%s\n' three two
++ test_cmp expected actual
++ diff -u expected actual
--- expected 2018-05-31 18:14:25.944540810 +
+++ actual 2018-05-31 18:14:25.944540810 +
@@ -1,2 +1,3 @@
  three
  two
+one
error: last command exited with $?=1
not ok 348 - fetch exclude tag one
#
# git -C shallow12 fetch --shallow-exclude one origin &&
# git -C shallow12 log --pretty=tformat:%s origin/master >actual &&
# test_write_lines three two >expected &&
# test_cmp expected actual
#



After these changes, there is one test case that still fails, and I
cannot understand why:

t5500-fetch-pack.sh Failed test:  348

This test fails when performing a `git fetch --shallow-exclude`. When I
halt the test using `t5500-fetch-pack.sh -d -i` and navigate to the
directory to replay the fetch it performs as expected.

What is "as expected" ?

When I insert a test_pause be

Re: [RFC PATCH 0/6] Fix commit-graph/graft/replace/shallow combo

2018-05-31 Thread Stefan Beller
On Thu, May 31, 2018 at 10:40 AM, Derrick Stolee  wrote:
> The commit-graph file stores a condensed version of the commit history.
> This helps speed up several operations involving commit walks. This
> feature does not work well if those commits "change" using features like
> commit grafts, replace objects, or shallow clones.
>
> Since the commit-graph feature is optional, hidden behind the
> 'core.commitGraph' config option, and requires manual maintenance (until
> ds/commit-graph-fsck' is merged), these issues only arise for expert
> users who decided to opt-in.
>
> This RFC is a first attempt at rectify the issues that come up when
> these features interact. I have not succeeded entirely, as I will
> discuss below.
>
> The first two "DO NOT MERGE" commits are not intended to be part of the
> main product, but are ways to help the full test suite run while
> computing and consuming commit-graph files. This is acheived by calling
> write_commit_graph_reachable() from `git fetch` and `git commit`. I
> believe this is the only dependence on ds/commit-graph-fsck. The other
> commits should only depend on ds/commit-graph-lockfile-fix.

I applied these patches on top of ds/commit-graph-fsck
(it would have been nice if you mentioned that it applies there)
Running the test suite with all patches applied results in:

./t0410-partial-clone.sh(Wstat: 256 Tests: 15 Failed: 2)
  Failed tests:  5, 8
./t5307-pack-missing-commit.sh  (Wstat: 256 Tests: 5 Failed: 2)
  Failed tests:  3-4
./t5500-fetch-pack.sh   (Wstat: 256 Tests: 353 Failed: 1)
  Failed test:  348
./t6011-rev-list-with-bad-commit.sh (Wstat: 256 Tests: 6 Failed: 1)
  Failed test:  4
./t6024-recursive-merge.sh  (Wstat: 256 Tests: 6 Failed: 1)
  Failed test:  4

which you identified as 4x noise and t5500 as not understood.

$ GIT_TRACE=1 ./t5500-fetch-pack.sh -d -i -v -x
[...]
expecting success:
git -C shallow12 fetch --shallow-exclude one origin &&
git -C shallow12 log --pretty=tformat:%s origin/master >actual &&
test_write_lines three two >expected &&
test_cmp expected actual

++ git -C shallow12 fetch --shallow-exclude one origin
trace: built-in: git fetch --shallow-exclude one origin
trace: run_command: unset GIT_PREFIX; 'git-upload-pack
'\''/u/git/t/trash directory.t5500-fetch-pack/shallow-exclude/.'\'''
trace: run_command: git --shallow-file  pack-objects --revs --thin
--stdout --shallow --progress --delta-base-offset --include-tag
trace: built-in: git pack-objects --revs --thin --stdout --shallow
--progress --delta-base-offset --include-tag
remote: Counting objects: 4, done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 4 (delta 0), reused 0 (delta 0)
trace: run_command: git --shallow-file  unpack-objects --pack_header=2,4
trace: built-in: git unpack-objects --pack_header=2,4
Unpacking objects: 100% (4/4), done.
trace: run_command: git rev-list --objects --stdin --not --all --quiet
trace: built-in: git rev-list --objects --stdin --not --all --quiet
trace: run_command: unset GIT_PREFIX; 'git-upload-pack
'\''/u/git/t/trash directory.t5500-fetch-pack/shallow-exclude/.'\'''
trace: run_command: git pack-objects --revs --thin --stdout --progress
--delta-base-offset
trace: built-in: git pack-objects --revs --thin --stdout --progress
--delta-base-offset
remote: Total 0 (delta 0), reused 0 (delta 0)
trace: run_command: git unpack-objects --pack_header=2,0
trace: built-in: git unpack-objects --pack_header=2,0
trace: run_command: git rev-list --objects --stdin --not --all --quiet
trace: built-in: git rev-list --objects --stdin --not --all --quiet
>From file:///u/git/t/trash directory.t5500-fetch-pack/shallow-exclude/.
 * [new tag] one-> one
 * [new tag] two-> two
run_processes_parallel: preparing to run up to 1 tasks
run_processes_parallel: done
trace: run_command: git gc --auto
trace: built-in: git gc --auto
++ git -C shallow12 log --pretty=tformat:%s origin/master
trace: built-in: git log '--pretty=tformat:%s' origin/master
++ test_write_lines three two
++ printf '%s\n' three two
++ test_cmp expected actual
++ diff -u expected actual
--- expected 2018-05-31 18:14:25.944540810 +
+++ actual 2018-05-31 18:14:25.944540810 +
@@ -1,2 +1,3 @@
 three
 two
+one
error: last command exited with $?=1
not ok 348 - fetch exclude tag one
#
# git -C shallow12 fetch --shallow-exclude one origin &&
# git -C shallow12 log --pretty=tformat:%s origin/master >actual &&
# test_write_lines three two >expected &&
# test_cmp expected actual
#


> After these changes, there is one test case that still fails, and I
> cannot understand why:
>
> t5500-fetch-pack.sh Failed test:  348
>
> This test fails when performing a `git fetch --shallow-exclude`. When I
> halt the test using `t5500-fetch-pack.sh -d -i` and navigate to the
> directory to replay the fetch it performs as expected.

What is "as expected" ?

When I insert a test_pause before 

[RFC PATCH 0/6] Fix commit-graph/graft/replace/shallow combo

2018-05-31 Thread Derrick Stolee
The commit-graph file stores a condensed version of the commit history.
This helps speed up several operations involving commit walks. This
feature does not work well if those commits "change" using features like
commit grafts, replace objects, or shallow clones.

Since the commit-graph feature is optional, hidden behind the
'core.commitGraph' config option, and requires manual maintenance (until
ds/commit-graph-fsck' is merged), these issues only arise for expert
users who decided to opt-in.

This RFC is a first attempt at rectify the issues that come up when
these features interact. I have not succeeded entirely, as I will
discuss below.

The first two "DO NOT MERGE" commits are not intended to be part of the
main product, but are ways to help the full test suite run while
computing and consuming commit-graph files. This is acheived by calling
write_commit_graph_reachable() from `git fetch` and `git commit`. I
believe this is the only dependence on ds/commit-graph-fsck. The other
commits should only depend on ds/commit-graph-lockfile-fix.

Running the full test suite after these DO NO MERGE commits results in
the following test failures, which I categorize by type of failure.

The following tests are red herrings. Most work by deleting a commit
from the object database and trying to demonstrate a failure. Others
rely on how certain objects are loaded. These are not bugs, but will
add noise if you run the tests suite with this patch.

t0410-partial-clone.sh  Failed tests:  5, 8
t5307-pack-missing-commit.shFailed tests:  3-4
t6011-rev-list-with-bad-commit.sh   Failed test:  4
t6024-recursive-merge.shFailed test:  4

The following tests are fixed in "commit-graph: enable replace-object
and grafts".

t6001-rev-list-graft.sh Failed tests:  3, 5, 7, 9, 11, 13
t6050-replace.shFailed tests:  11-15, 23-24, 29

The following tests involve shallow clones.

t5500-fetch-pack.sh Failed tests:  30-31, 34, 348-349
t5537-fetch-shallow.sh  Failed tests:  4-7, 9
t5802-connect-helper.sh Failed test:  3

These tests are mostly fixed by three commits:

* commit-graph: avoid writing when repo is shallow
* fetch: destroy commit graph on shallow parameters
* commit-graph: revert to odb on missing parents

Each of these cases cover a different interaction that occurs with
shallow clones. Some are due to a commit becoming shallow, while others
are due to a commit becoming unshallow (and hence invalidating its
generation number).

After these changes, there is one test case that still fails, and I
cannot understand why:

t5500-fetch-pack.sh Failed test:  348

This test fails when performing a `git fetch --shallow-exclude`. When I
halt the test using `t5500-fetch-pack.sh -d -i` and navigate to the
directory to replay the fetch it performs as expected. After struggling
with it for a while, I figured I should just put this series up for
discussion. Maybe someone with more experience in shallow clones can
point out the obvious issues I'm having.

Thanks,
-Stolee

Derrick Stolee (6):
  DO NOT MERGE: compute commit-graph on every commit
  DO NOT MERGE: write commit-graph on every fetch
  commit-graph: enable replace-object and grafts
  commit-graph: avoid writing when repo is shallow
  fetch: destroy commit graph on shallow parameters
  commit-graph: revert to odb on missing parents

 builtin/commit.c  |  5 +
 builtin/fetch.c   | 10 +
 builtin/gc.c  |  2 +-
 builtin/replace.c |  3 +++
 commit-graph.c| 65 +++
 commit-graph.h|  9 
 commit.c  |  5 +
 environment.c |  2 +-
 8 files changed, 95 insertions(+), 6 deletions(-)

-- 
2.16.2.338.gcfe06ae955