Re: [PATCH v3 17/20] fsck: verify commit-graph

Derrick Stolee Mon, 04 Jun 2018 04:59:32 -0700

On 6/2/2018 12:17 PM, Jakub Narebski wrote:

Derrick Stolee <dsto...@microsoft.com> writes:

If core.commitGraph is true, verify the contents of the commit-graph
during 'git fsck' using the 'git commit-graph verify' subcommand. Run
this check on all alternates, as well.

All right, so we have one config variable to control the use of
serialized commit-graph feaature.  Nice.

We use a new process for two reasons:

1. The subcommand decouples the details of loading and verifying a
    commit-graph file from the other fsck details.

All right, I can agree with that.

On the other hand using subcommand makes debugging harder, though not in
this case (well separated functionality that can be easily called with a
standalone command to be debugged).

2. The commit-graph verification requires the commits to be loaded
    in a specific order to guarantee we parse from the commit-graph
    file for some objects and from the object database for others.

I don't quite understand this.  Could you explain it in more detail?

We use `lookup_commit()` when verifying the commit-graph. If thesecommits were loaded earlier in the process and parsed directly from theobject database, then we aren't comparing the commit-graph file contentsagainst the ODB.

Signed-off-by: Derrick Stolee <dsto...@microsoft.com>
---
  Documentation/git-fsck.txt |  3 +++
  builtin/fsck.c             | 21 +++++++++++++++++++++
  t/t5318-commit-graph.sh    |  8 ++++++++
  3 files changed, 32 insertions(+)

diff --git a/Documentation/git-fsck.txt b/Documentation/git-fsck.txt
index b9f060e3b2..ab9a93fb9b 100644
--- a/Documentation/git-fsck.txt
+++ b/Documentation/git-fsck.txt
@@ -110,6 +110,9 @@ Any corrupt objects you will have to find in backups or 
other archives
  (i.e., you can just remove them and do an 'rsync' with some other site in
  the hopes that somebody else has the object you have corrupted).

+If core.commitGraph is true, the commit-graph file will also be inspected

Shouldn't we use `core.commitGraph` here?

+using 'git commit-graph verify'. See linkgit:git-commit-graph[1].
+
  Extracted Diagnostics
  ---------------------

diff --git a/builtin/fsck.c b/builtin/fsck.c

index ef78c6c00c..a6d5045b77 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -16,6 +16,7 @@
  #include "streaming.h"
  #include "decorate.h"
  #include "packfile.h"
+#include "run-command.h"

#define REACHABLE 0x0001

  #define SEEN      0x0002
@@ -45,6 +46,7 @@ static int name_objects;
  #define ERROR_REACHABLE 02
  #define ERROR_PACK 04
  #define ERROR_REFS 010
+#define ERROR_COMMIT_GRAPH 020

Minor nitpick and a sidenote: I wonder if it wouldn't be better to
either use hexadecimal constants, or use (1 << n) for all ERROR_*
preprocesor constants.

static const char *describe_object(struct object *obj)

  {
@@ -815,5 +817,24 @@ int cmd_fsck(int argc, const char **argv, const char 
*prefix)
        }

check_connectivity();

+
+       if (core_commit_graph) {
+               struct child_process commit_graph_verify = CHILD_PROCESS_INIT;
+               const char *verify_argv[] = { "commit-graph", "verify", NULL, 
NULL, NULL, NULL };

I see that NULL at index 2 and 3 (at 3rd and 4th place) are here for
"--object-dir" and <alternates-object-dir-path>, the last one is
terminator for that case, but what is next to last NULL (at 5th place)
for?

+               commit_graph_verify.argv = verify_argv;
+               commit_graph_verify.git_cmd = 1;
+
+               if (run_command(&commit_graph_verify))
+                       errors_found |= ERROR_COMMIT_GRAPH;
+
+               prepare_alt_odb();
+               for (alt = alt_odb_list; alt; alt = alt->next) {
+                       verify_argv[2] = "--object-dir";
+                       verify_argv[3] = alt->path;
+                       if (run_command(&commit_graph_verify))
+                               errors_found |= ERROR_COMMIT_GRAPH;
+               }
+       }

For performance reasons it may be better to start those 'git
commit-graph verify' commands asynchronously earlier, so that they can
run in parallel / concurrently wth other checks, and wait for them and
get their error code at the end of git-fsck run.

But that is probably better left for a separate commit.

+
        return errors_found;
  }
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 2680a2ebff..4941937163 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -394,4 +394,12 @@ test_expect_success 'detect invalid checksum hash' '
                "incorrect checksum"
  '

+test_expect_success 'git fsck (checks commit-graph)' '

+       cd "$TRASH_DIRECTORY/full" &&
+       git fsck &&
+       corrupt_graph_and_verify $GRAPH_BYTE_FOOTER "\00" \
+               "incorrect checksum" &&
+       test_must_fail git fsck
+'

All right; though the same caveats apply as with previous commit in
series.  Perhaps it would be better to truncate commit-graph file, or
corrupt it in some 'random' place.

+
  test_done

Best,

Re: [PATCH v3 17/20] fsck: verify commit-graph

Reply via email to