Re: [PATCH v3 09/20] commit-graph: verify corrupt OID fanout and lookup
On Mon, Jun 4, 2018 at 1:32 PM, Derrick Stolee wrote: > On 6/2/2018 12:38 AM, Duy Nguyen wrote: >> >> On Thu, May 24, 2018 at 6:25 PM, Derrick Stolee >> wrote: >>> >>> + if (i && oidcmp(_oid, _oid) >= 0) >>> + graph_report("commit-graph has incorrect OID >>> order: %s then %s", >>> +oid_to_hex(_oid), >>> +oid_to_hex(_oid)); >> >> Should these strings be marked for translation with _()? > > I've been asking myself "Is this message helpful to anyone other than a Git > developer?" and for this series the only one that is helpful to an end-user > is the message about the final hash. If the hash is correct, but these other > messages appear, then there is a bug in the code that wrote the file. > Otherwise, file corruption is more likely and the correct course of action > is to delete and rebuild. Dev-only strings like this are typically prefixed with "BUG:" or "internal error:" (unless BUG() is a better choice). Git is unfortunately not fully i18n-ized and devs from time to time still forget to mark string for translations when appropriate, including me. Because of this, we still have to slowly scan through the code base and mark more strings for translation. Something to say clearly "not translatable on purpose" would help a lot. If "BUG:" and friends are too much noise, a /* no translate */ comment or some other form could also help. But your explanation to me still sounds like corrupted file in some form, which should be translated unless it's too cryptic. commit-graph format may be available in non-English languages and people can still try to figure out the problem without relying entirely on git developers. -- Duy
Re: [PATCH v3 09/20] commit-graph: verify corrupt OID fanout and lookup
On 6/2/2018 12:38 AM, Duy Nguyen wrote: On Thu, May 24, 2018 at 6:25 PM, Derrick Stolee wrote: + if (i && oidcmp(_oid, _oid) >= 0) + graph_report("commit-graph has incorrect OID order: %s then %s", +oid_to_hex(_oid), +oid_to_hex(_oid)); Should these strings be marked for translation with _()? I've been asking myself "Is this message helpful to anyone other than a Git developer?" and for this series the only one that is helpful to an end-user is the message about the final hash. If the hash is correct, but these other messages appear, then there is a bug in the code that wrote the file. Otherwise, file corruption is more likely and the correct course of action is to delete and rebuild. Thanks for being diligent in checking. Thanks, -Stolee
Re: [PATCH v3 09/20] commit-graph: verify corrupt OID fanout and lookup
On Thu, May 24, 2018 at 6:25 PM, Derrick Stolee wrote: > + if (i && oidcmp(_oid, _oid) >= 0) > + graph_report("commit-graph has incorrect OID order: > %s then %s", > +oid_to_hex(_oid), > +oid_to_hex(_oid)); Should these strings be marked for translation with _()? -- Duy
Re: [PATCH v3 09/20] commit-graph: verify corrupt OID fanout and lookup
On 5/30/2018 9:34 AM, Jakub Narebski wrote: Derrick Stolee writes: In the commit-graph file, the OID fanout chunk provides an index into the OID lookup. The 'verify' subcommand should find incorrect values in the fanout. Similarly, the 'verify' subcommand should find out-of-order values in the OID lookup. O.K., the OID Lookup chunk is checked together with OID Fanout chunk because those two chunks are tightly connected: OID Fanout is fanout into OID Lookup. Signed-off-by: Derrick Stolee --- commit-graph.c | 36 t/t5318-commit-graph.sh | 22 ++ 2 files changed, 58 insertions(+) diff --git a/commit-graph.c b/commit-graph.c index 06e3e4f9ba..cbd1aae514 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -855,6 +855,9 @@ static void graph_report(const char *fmt, ...) int verify_commit_graph(struct commit_graph *g) { + uint32_t i, cur_fanout_pos = 0; + struct object_id prev_oid, cur_oid; Minor nitpick about the naming convention: why cur_*, and not curr_*? + if (!g) { graph_report("no commit-graph file loaded"); return 1; @@ -869,5 +872,38 @@ int verify_commit_graph(struct commit_graph *g) if (!g->chunk_commit_data) graph_report("commit-graph is missing the Commit Data chunk"); + if (verify_commit_graph_error) + return verify_commit_graph_error; + Before checking that commits in OID Lookup are sorted, and that OID Fanout correctly points into OID Lookup (thus also checking that OID Lookup is sorted), it would be good to verify that sizes of those chunks are correct. Out of 4 standrd chunks, 1 of them (OIDF) has constant size, and 2 of them have size given by number of commits and hash size - OID Fanout is (256 * 4 bytes) - OID Lookup is (N * H bytes), where N is number of commits, and H is hash size The one that is more significant is if number of commits gets corrupted upwards, and walking through OID Lookup would lead us out of bounds, outside the file size. IIRC we have checked that all chunks fit within file size, isn't it? + for (i = 0; i < g->num_commits; i++) { + hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i); Why do you use hashcpy() here, but oidcpy() below? We are copying from a section of binary data, not from a struct object_id *. + + if (i && oidcmp(_oid, _oid) >= 0) All right, OIDs needs to be sorted in ascending lexicographical order, and the above condition checks that this constraint is fullfilled. + graph_report("commit-graph has incorrect OID order: %s then %s", +oid_to_hex(_oid), +oid_to_hex(_oid)); Nice informative error message. + + oidcpy(_oid, _oid); + + while (cur_oid.hash[0] > cur_fanout_pos) { + uint32_t fanout_value = get_be32(g->chunk_oid_fanout + cur_fanout_pos); + if (i != fanout_value) + graph_report("commit-graph has incorrect fanout value: fanout[%d] = %u != %u", +cur_fanout_pos, fanout_value, i); + + cur_fanout_pos++; + } The k-th entry of fanout, F[k], stores the number of OIDs with first byte at most k. Let's examine it in detail on a simple example: fanout lookup -- -- 0 : 0 ---> 0: 1 1 : 2 -\ 1: 1 2 : 2 --\> 2: 3 3 : 3 ---> 3: 4 We are checking that after most significant byte (first byte) changes, then all fanout position up to the current first byte value (exclusive) point to current position in OID Lookup chunk. Seems all right; it would be nice to come up with rigorous proof that it is all right. + } + + while (cur_fanout_pos < 256) { + uint32_t fanout_value = get_be32(g->chunk_oid_fanout + cur_fanout_pos); + + if (g->num_commits != fanout_value) + graph_report("commit-graph has incorrect fanout value: fanout[%d] = %u != %u", +cur_fanout_pos, fanout_value, i); + + cur_fanout_pos++; + } All right, this checks that all remaining fanout entries, up and including the 255-ith entry store the total number of commits, N. + return verify_commit_graph_error; } diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh index 4ef3fe3dc2..c050ef980b 100755 --- a/t/t5318-commit-graph.sh +++ b/t/t5318-commit-graph.sh @@ -247,6 +247,7 @@ test_expect_success 'git commit-graph verify' ' git commit-graph verify >output ' +HASH_LEN=20 GRAPH_BYTE_VERSION=4 GRAPH_BYTE_HASH=5 GRAPH_BYTE_CHUNK_COUNT=6 @@ -258,6 +259,12 @@
Re: [PATCH v3 09/20] commit-graph: verify corrupt OID fanout and lookup
Derrick Stolee writes: > In the commit-graph file, the OID fanout chunk provides an index into > the OID lookup. The 'verify' subcommand should find incorrect values > in the fanout. > > Similarly, the 'verify' subcommand should find out-of-order values in > the OID lookup. O.K., the OID Lookup chunk is checked together with OID Fanout chunk because those two chunks are tightly connected: OID Fanout is fanout into OID Lookup. > > Signed-off-by: Derrick Stolee > --- > commit-graph.c | 36 > t/t5318-commit-graph.sh | 22 ++ > 2 files changed, 58 insertions(+) > > diff --git a/commit-graph.c b/commit-graph.c > index 06e3e4f9ba..cbd1aae514 100644 > --- a/commit-graph.c > +++ b/commit-graph.c > @@ -855,6 +855,9 @@ static void graph_report(const char *fmt, ...) > > int verify_commit_graph(struct commit_graph *g) > { > + uint32_t i, cur_fanout_pos = 0; > + struct object_id prev_oid, cur_oid; Minor nitpick about the naming convention: why cur_*, and not curr_*? > + > if (!g) { > graph_report("no commit-graph file loaded"); > return 1; > @@ -869,5 +872,38 @@ int verify_commit_graph(struct commit_graph *g) > if (!g->chunk_commit_data) > graph_report("commit-graph is missing the Commit Data chunk"); > > + if (verify_commit_graph_error) > + return verify_commit_graph_error; > + Before checking that commits in OID Lookup are sorted, and that OID Fanout correctly points into OID Lookup (thus also checking that OID Lookup is sorted), it would be good to verify that sizes of those chunks are correct. Out of 4 standrd chunks, 1 of them (OIDF) has constant size, and 2 of them have size given by number of commits and hash size - OID Fanout is (256 * 4 bytes) - OID Lookup is (N * H bytes), where N is number of commits, and H is hash size The one that is more significant is if number of commits gets corrupted upwards, and walking through OID Lookup would lead us out of bounds, outside the file size. IIRC we have checked that all chunks fit within file size, isn't it? > + for (i = 0; i < g->num_commits; i++) { > + hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i); Why do you use hashcpy() here, but oidcpy() below? > + > + if (i && oidcmp(_oid, _oid) >= 0) All right, OIDs needs to be sorted in ascending lexicographical order, and the above condition checks that this constraint is fullfilled. > + graph_report("commit-graph has incorrect OID order: %s > then %s", > + oid_to_hex(_oid), > + oid_to_hex(_oid)); Nice informative error message. > + > + oidcpy(_oid, _oid); > + > + while (cur_oid.hash[0] > cur_fanout_pos) { > + uint32_t fanout_value = get_be32(g->chunk_oid_fanout + > cur_fanout_pos); > + if (i != fanout_value) > + graph_report("commit-graph has incorrect fanout > value: fanout[%d] = %u != %u", > + cur_fanout_pos, fanout_value, i); > + > + cur_fanout_pos++; > + } The k-th entry of fanout, F[k], stores the number of OIDs with first byte at most k. Let's examine it in detail on a simple example: fanout lookup -- -- 0 : 0 ---> 0: 1 1 : 2 -\ 1: 1 2 : 2 --\> 2: 3 3 : 3 ---> 3: 4 We are checking that after most significant byte (first byte) changes, then all fanout position up to the current first byte value (exclusive) point to current position in OID Lookup chunk. Seems all right; it would be nice to come up with rigorous proof that it is all right. > + } > + > + while (cur_fanout_pos < 256) { > + uint32_t fanout_value = get_be32(g->chunk_oid_fanout + > cur_fanout_pos); > + > + if (g->num_commits != fanout_value) > + graph_report("commit-graph has incorrect fanout value: > fanout[%d] = %u != %u", > + cur_fanout_pos, fanout_value, i); > + > + cur_fanout_pos++; > + } All right, this checks that all remaining fanout entries, up and including the 255-ith entry store the total number of commits, N. > + > return verify_commit_graph_error; > } > diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh > index 4ef3fe3dc2..c050ef980b 100755 > --- a/t/t5318-commit-graph.sh > +++ b/t/t5318-commit-graph.sh > @@ -247,6 +247,7 @@ test_expect_success 'git commit-graph verify' ' > git commit-graph verify >output > ' > > +HASH_LEN=20 > GRAPH_BYTE_VERSION=4 > GRAPH_BYTE_HASH=5 > GRAPH_BYTE_CHUNK_COUNT=6 > @@ -258,6 +259,12 @@ GRAPH_BYTE_OID_LOOKUP_ID=`expr > $GRAPH_CHUNK_LOOKUP_OFFSET + \ >
[PATCH v3 09/20] commit-graph: verify corrupt OID fanout and lookup
In the commit-graph file, the OID fanout chunk provides an index into the OID lookup. The 'verify' subcommand should find incorrect values in the fanout. Similarly, the 'verify' subcommand should find out-of-order values in the OID lookup. Signed-off-by: Derrick Stolee--- commit-graph.c | 36 t/t5318-commit-graph.sh | 22 ++ 2 files changed, 58 insertions(+) diff --git a/commit-graph.c b/commit-graph.c index 06e3e4f9ba..cbd1aae514 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -855,6 +855,9 @@ static void graph_report(const char *fmt, ...) int verify_commit_graph(struct commit_graph *g) { + uint32_t i, cur_fanout_pos = 0; + struct object_id prev_oid, cur_oid; + if (!g) { graph_report("no commit-graph file loaded"); return 1; @@ -869,5 +872,38 @@ int verify_commit_graph(struct commit_graph *g) if (!g->chunk_commit_data) graph_report("commit-graph is missing the Commit Data chunk"); + if (verify_commit_graph_error) + return verify_commit_graph_error; + + for (i = 0; i < g->num_commits; i++) { + hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i); + + if (i && oidcmp(_oid, _oid) >= 0) + graph_report("commit-graph has incorrect OID order: %s then %s", +oid_to_hex(_oid), +oid_to_hex(_oid)); + + oidcpy(_oid, _oid); + + while (cur_oid.hash[0] > cur_fanout_pos) { + uint32_t fanout_value = get_be32(g->chunk_oid_fanout + cur_fanout_pos); + if (i != fanout_value) + graph_report("commit-graph has incorrect fanout value: fanout[%d] = %u != %u", +cur_fanout_pos, fanout_value, i); + + cur_fanout_pos++; + } + } + + while (cur_fanout_pos < 256) { + uint32_t fanout_value = get_be32(g->chunk_oid_fanout + cur_fanout_pos); + + if (g->num_commits != fanout_value) + graph_report("commit-graph has incorrect fanout value: fanout[%d] = %u != %u", +cur_fanout_pos, fanout_value, i); + + cur_fanout_pos++; + } + return verify_commit_graph_error; } diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh index 4ef3fe3dc2..c050ef980b 100755 --- a/t/t5318-commit-graph.sh +++ b/t/t5318-commit-graph.sh @@ -247,6 +247,7 @@ test_expect_success 'git commit-graph verify' ' git commit-graph verify >output ' +HASH_LEN=20 GRAPH_BYTE_VERSION=4 GRAPH_BYTE_HASH=5 GRAPH_BYTE_CHUNK_COUNT=6 @@ -258,6 +259,12 @@ GRAPH_BYTE_OID_LOOKUP_ID=`expr $GRAPH_CHUNK_LOOKUP_OFFSET + \ 1 \* $GRAPH_CHUNK_LOOKUP_WIDTH` GRAPH_BYTE_COMMIT_DATA_ID=`expr $GRAPH_CHUNK_LOOKUP_OFFSET + \ 2 \* $GRAPH_CHUNK_LOOKUP_WIDTH` +GRAPH_FANOUT_OFFSET=`expr $GRAPH_CHUNK_LOOKUP_OFFSET + \ + $GRAPH_CHUNK_LOOKUP_WIDTH \* $GRAPH_CHUNK_LOOKUP_ROWS` +GRAPH_BYTE_FANOUT1=`expr $GRAPH_FANOUT_OFFSET + 4 \* 4` +GRAPH_BYTE_FANOUT2=`expr $GRAPH_FANOUT_OFFSET + 4 \* 255` +GRAPH_OID_LOOKUP_OFFSET=`expr $GRAPH_FANOUT_OFFSET + 4 \* 256` +GRAPH_BYTE_OID_LOOKUP_ORDER=`expr $GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN \* 8` # usage: corrupt_graph_and_verify # Manipulates the commit-graph file at the position @@ -312,4 +319,19 @@ test_expect_success 'detect missing commit data chunk' ' "missing the Commit Data chunk" ' +test_expect_success 'detect incorrect fanout' ' + corrupt_graph_and_verify $GRAPH_BYTE_FANOUT1 "\01" \ + "fanout value" +' + +test_expect_success 'detect incorrect fanout' ' + corrupt_graph_and_verify $GRAPH_BYTE_FANOUT2 "\01" \ + "fanout value" +' + +test_expect_success 'detect incorrect OID order' ' + corrupt_graph_and_verify $GRAPH_BYTE_OID_LOOKUP_ORDER "\01" \ + "incorrect OID order" +' + test_done -- 2.16.2.329.gfb62395de6