Re: [RFC PATCH 0/6] Fix commit-graph/graft/replace/shallow combo
Derrick Stolee writes: > The commit-graph file stores a condensed version of the commit history. > This helps speed up several operations involving commit walks. This > feature does not work well if those commits "change" using features like > commit grafts, replace objects, or shallow clones. I like to think about those features as providing an overlay for the commit graph (similar to overlay filesystems, like overlayfs); in the case of git-replace quite literally. I will be calling all those features "history-altering features", for short. > Since the commit-graph feature is optional, hidden behind the > 'core.commitGraph' config option, and requires manual maintenance (until > ds/commit-graph-fsck' is merged), these issues only arise for expert > users who decided to opt-in. > > This RFC is a first attempt at rectify the issues that come up when > these features interact. I have not succeeded entirely, as I will > discuss below. What I would like to see here in cover letter is a generic description of _strategy_ to deal with those history-altering features. >From what I understand you have the following options for each of replacements, shallow clones and grafts: - turn off serialized commit-graph if given history-altering feature is present, as if core.commitGraph was set to false - invalidate and optionally refresh commit-graph file if given history-altering feature is present (maybe only if it changed the view of the history, is such check is possible) For automatic invalidation you would need to have either: - cover all possible ways by which given history-altering feature can change the view of history, or - keep state of history-altering change for which commit-graph was created (e.g. in proposed VAL4 chunk) and compare with current view of history if it changed For automatic turning off you would need only to check if history-altering feature is present. Let us examine each of those history-altering features that Git supports: * shallow clone - shallow clone usually means having shorter history, so serialized commit-graph is not needed as much; also changing the depth screws-up assumptions about generation numbers - there are only a few entry points changing the view of history, namely fetch and push with shallow options (--depth=, --deepen=, --shallow-since=, --update-shallow, --shallow-exclude=, --unshallow) - it is easy to check for presence of shallow clone feature by chacking of $GIT_DIR/shallow exists and is not empty - different contents of $GIT_DIR/shallow means different view of history - NOTE: internally uses grafts mechanism (emulated grafts) * replacements (replace objects, git-replace) - git-replace can be used to join current development repository with historical repository, in which case we would certainly want to make use of serialied commit graph; on the other hand the replacements do not necessary alter the view of the history - theoretically you could create replacement refs by hand, but in practice there are TWO ways of getting them: - using git-replace command to create, edit/change and delete replacement objects' ' - fetching or having pushed-to refs in refs/replace/* namespace - you need to check if there are any refs in refs/replace/* namespace to check if the feature is used (but it doesn't necessarily mean that it altered project history) - changed list of refs in refs/replace/* namespace (which you can get with for-each-ref command/API) does not necessarily mean that the view of the history changed: you can replace non-commit object, you can replace commits and not change their parents; it is not as easy as checking if file exists - NOTE: the feature can be turned off manually with GIT_NO_REPLACE_OBJECTS environment variable and with --no-replace-objects option to git wrapper. Also when pushing, fetching and fsck-ing this feature is turned off and refs in refs/replace/* namespace are treated as ordinary refs. This may mean that we would want to create commit-graph with replacements for ordinary non-bare repository, and without replacements for server-side bare repository. * grafts - subset of uses of git-replace, older and *obsolete* feature (because it is unsafe to use; that is you need to be careful with it). - edited by hand, no automatic entry points - if $GIT_DIR/info/grafts file is present, then feature is enabled (barring some corner cases, like empty file or file consisting only of comments) - changed contents of this file means changed view of history (well, except for reordering lines, or removing/adding empty lines and comments) > > The first two "DO NOT MERGE" commits are not intended to be part of the > main product, but are ways to help the full test suite run while > computing and consuming commit-graph files. This is acheived by calling >
Re: [RFC PATCH 0/6] Fix commit-graph/graft/replace/shallow combo
On 5/31/2018 2:33 PM, Stefan Beller wrote: On Thu, May 31, 2018 at 10:40 AM, Derrick Stolee wrote: The commit-graph file stores a condensed version of the commit history. This helps speed up several operations involving commit walks. This feature does not work well if those commits "change" using features like commit grafts, replace objects, or shallow clones. Since the commit-graph feature is optional, hidden behind the 'core.commitGraph' config option, and requires manual maintenance (until ds/commit-graph-fsck' is merged), these issues only arise for expert users who decided to opt-in. This RFC is a first attempt at rectify the issues that come up when these features interact. I have not succeeded entirely, as I will discuss below. The first two "DO NOT MERGE" commits are not intended to be part of the main product, but are ways to help the full test suite run while computing and consuming commit-graph files. This is acheived by calling write_commit_graph_reachable() from `git fetch` and `git commit`. I believe this is the only dependence on ds/commit-graph-fsck. The other commits should only depend on ds/commit-graph-lockfile-fix. I applied these patches on top of ds/commit-graph-fsck (it would have been nice if you mentioned that it applies there) Running the test suite with all patches applied results in: ./t0410-partial-clone.sh(Wstat: 256 Tests: 15 Failed: 2) Failed tests: 5, 8 ./t5307-pack-missing-commit.sh (Wstat: 256 Tests: 5 Failed: 2) Failed tests: 3-4 ./t5500-fetch-pack.sh (Wstat: 256 Tests: 353 Failed: 1) Failed test: 348 ./t6011-rev-list-with-bad-commit.sh (Wstat: 256 Tests: 6 Failed: 1) Failed test: 4 ./t6024-recursive-merge.sh (Wstat: 256 Tests: 6 Failed: 1) Failed test: 4 which you identified as 4x noise and t5500 as not understood. $ GIT_TRACE=1 ./t5500-fetch-pack.sh -d -i -v -x [...] expecting success: git -C shallow12 fetch --shallow-exclude one origin && git -C shallow12 log --pretty=tformat:%s origin/master >actual && test_write_lines three two >expected && test_cmp expected actual ++ git -C shallow12 fetch --shallow-exclude one origin trace: built-in: git fetch --shallow-exclude one origin trace: run_command: unset GIT_PREFIX; 'git-upload-pack '\''/u/git/t/trash directory.t5500-fetch-pack/shallow-exclude/.'\''' trace: run_command: git --shallow-file pack-objects --revs --thin --stdout --shallow --progress --delta-base-offset --include-tag trace: built-in: git pack-objects --revs --thin --stdout --shallow --progress --delta-base-offset --include-tag remote: Counting objects: 4, done. remote: Compressing objects: 100% (3/3), done. remote: Total 4 (delta 0), reused 0 (delta 0) trace: run_command: git --shallow-file unpack-objects --pack_header=2,4 trace: built-in: git unpack-objects --pack_header=2,4 Unpacking objects: 100% (4/4), done. trace: run_command: git rev-list --objects --stdin --not --all --quiet trace: built-in: git rev-list --objects --stdin --not --all --quiet trace: run_command: unset GIT_PREFIX; 'git-upload-pack '\''/u/git/t/trash directory.t5500-fetch-pack/shallow-exclude/.'\''' trace: run_command: git pack-objects --revs --thin --stdout --progress --delta-base-offset trace: built-in: git pack-objects --revs --thin --stdout --progress --delta-base-offset remote: Total 0 (delta 0), reused 0 (delta 0) trace: run_command: git unpack-objects --pack_header=2,0 trace: built-in: git unpack-objects --pack_header=2,0 trace: run_command: git rev-list --objects --stdin --not --all --quiet trace: built-in: git rev-list --objects --stdin --not --all --quiet From file:///u/git/t/trash directory.t5500-fetch-pack/shallow-exclude/. * [new tag] one-> one * [new tag] two-> two run_processes_parallel: preparing to run up to 1 tasks run_processes_parallel: done trace: run_command: git gc --auto trace: built-in: git gc --auto ++ git -C shallow12 log --pretty=tformat:%s origin/master trace: built-in: git log '--pretty=tformat:%s' origin/master ++ test_write_lines three two ++ printf '%s\n' three two ++ test_cmp expected actual ++ diff -u expected actual --- expected 2018-05-31 18:14:25.944540810 + +++ actual 2018-05-31 18:14:25.944540810 + @@ -1,2 +1,3 @@ three two +one error: last command exited with $?=1 not ok 348 - fetch exclude tag one # # git -C shallow12 fetch --shallow-exclude one origin && # git -C shallow12 log --pretty=tformat:%s origin/master >actual && # test_write_lines three two >expected && # test_cmp expected actual # After these changes, there is one test case that still fails, and I cannot understand why: t5500-fetch-pack.sh Failed test: 348 This test fails when performing a `git fetch --shallow-exclude`. When I halt the test using `t5500-fetch-pack.sh -d -i` and navigate to the directory to replay the fetch it performs as expected. What is "as expected" ? When I insert a test_pause be
Re: [RFC PATCH 0/6] Fix commit-graph/graft/replace/shallow combo
On Thu, May 31, 2018 at 10:40 AM, Derrick Stolee wrote: > The commit-graph file stores a condensed version of the commit history. > This helps speed up several operations involving commit walks. This > feature does not work well if those commits "change" using features like > commit grafts, replace objects, or shallow clones. > > Since the commit-graph feature is optional, hidden behind the > 'core.commitGraph' config option, and requires manual maintenance (until > ds/commit-graph-fsck' is merged), these issues only arise for expert > users who decided to opt-in. > > This RFC is a first attempt at rectify the issues that come up when > these features interact. I have not succeeded entirely, as I will > discuss below. > > The first two "DO NOT MERGE" commits are not intended to be part of the > main product, but are ways to help the full test suite run while > computing and consuming commit-graph files. This is acheived by calling > write_commit_graph_reachable() from `git fetch` and `git commit`. I > believe this is the only dependence on ds/commit-graph-fsck. The other > commits should only depend on ds/commit-graph-lockfile-fix. I applied these patches on top of ds/commit-graph-fsck (it would have been nice if you mentioned that it applies there) Running the test suite with all patches applied results in: ./t0410-partial-clone.sh(Wstat: 256 Tests: 15 Failed: 2) Failed tests: 5, 8 ./t5307-pack-missing-commit.sh (Wstat: 256 Tests: 5 Failed: 2) Failed tests: 3-4 ./t5500-fetch-pack.sh (Wstat: 256 Tests: 353 Failed: 1) Failed test: 348 ./t6011-rev-list-with-bad-commit.sh (Wstat: 256 Tests: 6 Failed: 1) Failed test: 4 ./t6024-recursive-merge.sh (Wstat: 256 Tests: 6 Failed: 1) Failed test: 4 which you identified as 4x noise and t5500 as not understood. $ GIT_TRACE=1 ./t5500-fetch-pack.sh -d -i -v -x [...] expecting success: git -C shallow12 fetch --shallow-exclude one origin && git -C shallow12 log --pretty=tformat:%s origin/master >actual && test_write_lines three two >expected && test_cmp expected actual ++ git -C shallow12 fetch --shallow-exclude one origin trace: built-in: git fetch --shallow-exclude one origin trace: run_command: unset GIT_PREFIX; 'git-upload-pack '\''/u/git/t/trash directory.t5500-fetch-pack/shallow-exclude/.'\''' trace: run_command: git --shallow-file pack-objects --revs --thin --stdout --shallow --progress --delta-base-offset --include-tag trace: built-in: git pack-objects --revs --thin --stdout --shallow --progress --delta-base-offset --include-tag remote: Counting objects: 4, done. remote: Compressing objects: 100% (3/3), done. remote: Total 4 (delta 0), reused 0 (delta 0) trace: run_command: git --shallow-file unpack-objects --pack_header=2,4 trace: built-in: git unpack-objects --pack_header=2,4 Unpacking objects: 100% (4/4), done. trace: run_command: git rev-list --objects --stdin --not --all --quiet trace: built-in: git rev-list --objects --stdin --not --all --quiet trace: run_command: unset GIT_PREFIX; 'git-upload-pack '\''/u/git/t/trash directory.t5500-fetch-pack/shallow-exclude/.'\''' trace: run_command: git pack-objects --revs --thin --stdout --progress --delta-base-offset trace: built-in: git pack-objects --revs --thin --stdout --progress --delta-base-offset remote: Total 0 (delta 0), reused 0 (delta 0) trace: run_command: git unpack-objects --pack_header=2,0 trace: built-in: git unpack-objects --pack_header=2,0 trace: run_command: git rev-list --objects --stdin --not --all --quiet trace: built-in: git rev-list --objects --stdin --not --all --quiet >From file:///u/git/t/trash directory.t5500-fetch-pack/shallow-exclude/. * [new tag] one-> one * [new tag] two-> two run_processes_parallel: preparing to run up to 1 tasks run_processes_parallel: done trace: run_command: git gc --auto trace: built-in: git gc --auto ++ git -C shallow12 log --pretty=tformat:%s origin/master trace: built-in: git log '--pretty=tformat:%s' origin/master ++ test_write_lines three two ++ printf '%s\n' three two ++ test_cmp expected actual ++ diff -u expected actual --- expected 2018-05-31 18:14:25.944540810 + +++ actual 2018-05-31 18:14:25.944540810 + @@ -1,2 +1,3 @@ three two +one error: last command exited with $?=1 not ok 348 - fetch exclude tag one # # git -C shallow12 fetch --shallow-exclude one origin && # git -C shallow12 log --pretty=tformat:%s origin/master >actual && # test_write_lines three two >expected && # test_cmp expected actual # > After these changes, there is one test case that still fails, and I > cannot understand why: > > t5500-fetch-pack.sh Failed test: 348 > > This test fails when performing a `git fetch --shallow-exclude`. When I > halt the test using `t5500-fetch-pack.sh -d -i` and navigate to the > directory to replay the fetch it performs as expected. What is "as expected" ? When I insert a test_pause before
[RFC PATCH 0/6] Fix commit-graph/graft/replace/shallow combo
The commit-graph file stores a condensed version of the commit history. This helps speed up several operations involving commit walks. This feature does not work well if those commits "change" using features like commit grafts, replace objects, or shallow clones. Since the commit-graph feature is optional, hidden behind the 'core.commitGraph' config option, and requires manual maintenance (until ds/commit-graph-fsck' is merged), these issues only arise for expert users who decided to opt-in. This RFC is a first attempt at rectify the issues that come up when these features interact. I have not succeeded entirely, as I will discuss below. The first two "DO NOT MERGE" commits are not intended to be part of the main product, but are ways to help the full test suite run while computing and consuming commit-graph files. This is acheived by calling write_commit_graph_reachable() from `git fetch` and `git commit`. I believe this is the only dependence on ds/commit-graph-fsck. The other commits should only depend on ds/commit-graph-lockfile-fix. Running the full test suite after these DO NO MERGE commits results in the following test failures, which I categorize by type of failure. The following tests are red herrings. Most work by deleting a commit from the object database and trying to demonstrate a failure. Others rely on how certain objects are loaded. These are not bugs, but will add noise if you run the tests suite with this patch. t0410-partial-clone.sh Failed tests: 5, 8 t5307-pack-missing-commit.shFailed tests: 3-4 t6011-rev-list-with-bad-commit.sh Failed test: 4 t6024-recursive-merge.shFailed test: 4 The following tests are fixed in "commit-graph: enable replace-object and grafts". t6001-rev-list-graft.sh Failed tests: 3, 5, 7, 9, 11, 13 t6050-replace.shFailed tests: 11-15, 23-24, 29 The following tests involve shallow clones. t5500-fetch-pack.sh Failed tests: 30-31, 34, 348-349 t5537-fetch-shallow.sh Failed tests: 4-7, 9 t5802-connect-helper.sh Failed test: 3 These tests are mostly fixed by three commits: * commit-graph: avoid writing when repo is shallow * fetch: destroy commit graph on shallow parameters * commit-graph: revert to odb on missing parents Each of these cases cover a different interaction that occurs with shallow clones. Some are due to a commit becoming shallow, while others are due to a commit becoming unshallow (and hence invalidating its generation number). After these changes, there is one test case that still fails, and I cannot understand why: t5500-fetch-pack.sh Failed test: 348 This test fails when performing a `git fetch --shallow-exclude`. When I halt the test using `t5500-fetch-pack.sh -d -i` and navigate to the directory to replay the fetch it performs as expected. After struggling with it for a while, I figured I should just put this series up for discussion. Maybe someone with more experience in shallow clones can point out the obvious issues I'm having. Thanks, -Stolee Derrick Stolee (6): DO NOT MERGE: compute commit-graph on every commit DO NOT MERGE: write commit-graph on every fetch commit-graph: enable replace-object and grafts commit-graph: avoid writing when repo is shallow fetch: destroy commit graph on shallow parameters commit-graph: revert to odb on missing parents builtin/commit.c | 5 + builtin/fetch.c | 10 + builtin/gc.c | 2 +- builtin/replace.c | 3 +++ commit-graph.c| 65 +++ commit-graph.h| 9 commit.c | 5 + environment.c | 2 +- 8 files changed, 95 insertions(+), 6 deletions(-) -- 2.16.2.338.gcfe06ae955