Re: [PATCH v3 09/20] commit-graph: verify corrupt OID fanout and lookup

2018-06-04 Thread Duy Nguyen
On Mon, Jun 4, 2018 at 1:32 PM, Derrick Stolee  wrote:
> On 6/2/2018 12:38 AM, Duy Nguyen wrote:
>>
>> On Thu, May 24, 2018 at 6:25 PM, Derrick Stolee 
>> wrote:
>>>
>>> +   if (i && oidcmp(_oid, _oid) >= 0)
>>> +   graph_report("commit-graph has incorrect OID
>>> order: %s then %s",
>>> +oid_to_hex(_oid),
>>> +oid_to_hex(_oid));
>>
>> Should these strings be marked for translation with _()?
>
> I've been asking myself "Is this message helpful to anyone other than a Git
> developer?" and for this series the only one that is helpful to an end-user
> is the message about the final hash. If the hash is correct, but these other
> messages appear, then there is a bug in the code that wrote the file.
> Otherwise, file corruption is more likely and the correct course of action
> is to delete and rebuild.

Dev-only strings like this are typically prefixed with "BUG:" or
"internal error:" (unless BUG() is a better choice). Git is
unfortunately not fully i18n-ized and devs from time to time still
forget to mark string for translations when appropriate, including me.
Because of this, we still have to slowly scan through the code base
and mark more strings for translation. Something to say clearly "not
translatable on purpose" would help a lot. If "BUG:" and friends are
too much noise, a /* no translate */ comment or some other form could
also help.

But your explanation to me still sounds like corrupted file in some
form, which should be translated unless it's too cryptic. commit-graph
format may be available in non-English languages and people can still
try to figure out the problem without relying entirely on git
developers.
-- 
Duy


Re: [PATCH v3 09/20] commit-graph: verify corrupt OID fanout and lookup

2018-06-04 Thread Derrick Stolee

On 6/2/2018 12:38 AM, Duy Nguyen wrote:

On Thu, May 24, 2018 at 6:25 PM, Derrick Stolee  wrote:

+   if (i && oidcmp(_oid, _oid) >= 0)
+   graph_report("commit-graph has incorrect OID order: %s then 
%s",
+oid_to_hex(_oid),
+oid_to_hex(_oid));

Should these strings be marked for translation with _()?
I've been asking myself "Is this message helpful to anyone other than a 
Git developer?" and for this series the only one that is helpful to an 
end-user is the message about the final hash. If the hash is correct, 
but these other messages appear, then there is a bug in the code that 
wrote the file. Otherwise, file corruption is more likely and the 
correct course of action is to delete and rebuild.


Thanks for being diligent in checking.

Thanks,
-Stolee


Re: [PATCH v3 09/20] commit-graph: verify corrupt OID fanout and lookup

2018-06-01 Thread Duy Nguyen
On Thu, May 24, 2018 at 6:25 PM, Derrick Stolee  wrote:
> +   if (i && oidcmp(_oid, _oid) >= 0)
> +   graph_report("commit-graph has incorrect OID order: 
> %s then %s",
> +oid_to_hex(_oid),
> +oid_to_hex(_oid));

Should these strings be marked for translation with _()?
-- 
Duy


Re: [PATCH v3 09/20] commit-graph: verify corrupt OID fanout and lookup

2018-05-30 Thread Derrick Stolee



On 5/30/2018 9:34 AM, Jakub Narebski wrote:

Derrick Stolee  writes:


In the commit-graph file, the OID fanout chunk provides an index into
the OID lookup. The 'verify' subcommand should find incorrect values
in the fanout.

Similarly, the 'verify' subcommand should find out-of-order values in
the OID lookup.

O.K., the OID Lookup chunk is checked together with OID Fanout chunk
because those two chunks are tightly connected: OID Fanout is fanout
into OID Lookup.


Signed-off-by: Derrick Stolee 
---
  commit-graph.c  | 36 
  t/t5318-commit-graph.sh | 22 ++
  2 files changed, 58 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index 06e3e4f9ba..cbd1aae514 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -855,6 +855,9 @@ static void graph_report(const char *fmt, ...)
  
  int verify_commit_graph(struct commit_graph *g)

  {
+   uint32_t i, cur_fanout_pos = 0;
+   struct object_id prev_oid, cur_oid;

Minor nitpick about the naming convention: why cur_*, and not curr_*?


+
if (!g) {
graph_report("no commit-graph file loaded");
return 1;
@@ -869,5 +872,38 @@ int verify_commit_graph(struct commit_graph *g)
if (!g->chunk_commit_data)
graph_report("commit-graph is missing the Commit Data chunk");
  
+	if (verify_commit_graph_error)

+   return verify_commit_graph_error;
+

Before checking that commits in OID Lookup are sorted, and that OID
Fanout correctly points into OID Lookup (thus also checking that OID
Lookup is sorted), it would be good to verify that sizes of those chunks
are correct.

Out of 4 standrd chunks, 1 of them (OIDF) has constant size, and 2 of
them have size given by number of commits and hash size
  - OID Fanout is (256 * 4 bytes)
  - OID Lookup is (N * H bytes),
where N is number of commits, and H is hash size

The one that is more significant is if number of commits gets corrupted
upwards, and walking through OID Lookup would lead us out of bounds,
outside the file size.

IIRC we have checked that all chunks fit within file size, isn't it?


+   for (i = 0; i < g->num_commits; i++) {
+   hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);

Why do you use hashcpy() here, but oidcpy() below?


We are copying from a section of binary data, not from a struct object_id *.




+
+   if (i && oidcmp(_oid, _oid) >= 0)

All right, OIDs needs to be sorted in ascending lexicographical order,
and the above condition checks that this constraint is fullfilled.


+   graph_report("commit-graph has incorrect OID order: %s then 
%s",
+oid_to_hex(_oid),
+oid_to_hex(_oid));

Nice informative error message.


+
+   oidcpy(_oid, _oid);
+
+   while (cur_oid.hash[0] > cur_fanout_pos) {
+   uint32_t fanout_value = get_be32(g->chunk_oid_fanout + 
cur_fanout_pos);
+   if (i != fanout_value)
+   graph_report("commit-graph has incorrect fanout 
value: fanout[%d] = %u != %u",
+cur_fanout_pos, fanout_value, i);
+
+   cur_fanout_pos++;
+   }

The k-th entry of fanout, F[k], stores the number of OIDs with first
byte at most k.

Let's examine it in detail on a simple example:

fanout lookup
-- --
0 : 0  ---> 0: 1
1 : 2  -\   1: 1
2 : 2  --\> 2: 3
3 : 3  ---> 3: 4

We are checking that after most significant byte (first byte) changes,
then all fanout position up to the current first byte value (exclusive)
point to current position in OID Lookup chunk.

Seems all right; it would be nice to come up with rigorous proof that it
is all right.


+   }
+
+   while (cur_fanout_pos < 256) {
+   uint32_t fanout_value = get_be32(g->chunk_oid_fanout + 
cur_fanout_pos);
+
+   if (g->num_commits != fanout_value)
+   graph_report("commit-graph has incorrect fanout value: 
fanout[%d] = %u != %u",
+cur_fanout_pos, fanout_value, i);
+
+   cur_fanout_pos++;
+   }

All right, this checks that all remaining fanout entries, up and
including the 255-ith entry store the total number of commits, N.


+
return verify_commit_graph_error;
  }
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 4ef3fe3dc2..c050ef980b 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -247,6 +247,7 @@ test_expect_success 'git commit-graph verify' '
git commit-graph verify >output
  '
  
+HASH_LEN=20

  GRAPH_BYTE_VERSION=4
  GRAPH_BYTE_HASH=5
  GRAPH_BYTE_CHUNK_COUNT=6
@@ -258,6 +259,12 @@ 

Re: [PATCH v3 09/20] commit-graph: verify corrupt OID fanout and lookup

2018-05-30 Thread Jakub Narebski
Derrick Stolee  writes:

> In the commit-graph file, the OID fanout chunk provides an index into
> the OID lookup. The 'verify' subcommand should find incorrect values
> in the fanout.
>
> Similarly, the 'verify' subcommand should find out-of-order values in
> the OID lookup.

O.K., the OID Lookup chunk is checked together with OID Fanout chunk
because those two chunks are tightly connected: OID Fanout is fanout
into OID Lookup.

>
> Signed-off-by: Derrick Stolee 
> ---
>  commit-graph.c  | 36 
>  t/t5318-commit-graph.sh | 22 ++
>  2 files changed, 58 insertions(+)
>
> diff --git a/commit-graph.c b/commit-graph.c
> index 06e3e4f9ba..cbd1aae514 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -855,6 +855,9 @@ static void graph_report(const char *fmt, ...)
>  
>  int verify_commit_graph(struct commit_graph *g)
>  {
> + uint32_t i, cur_fanout_pos = 0;
> + struct object_id prev_oid, cur_oid;

Minor nitpick about the naming convention: why cur_*, and not curr_*?

> +
>   if (!g) {
>   graph_report("no commit-graph file loaded");
>   return 1;
> @@ -869,5 +872,38 @@ int verify_commit_graph(struct commit_graph *g)
>   if (!g->chunk_commit_data)
>   graph_report("commit-graph is missing the Commit Data chunk");
>  
> + if (verify_commit_graph_error)
> + return verify_commit_graph_error;
> +

Before checking that commits in OID Lookup are sorted, and that OID
Fanout correctly points into OID Lookup (thus also checking that OID
Lookup is sorted), it would be good to verify that sizes of those chunks
are correct.

Out of 4 standrd chunks, 1 of them (OIDF) has constant size, and 2 of
them have size given by number of commits and hash size
 - OID Fanout is (256 * 4 bytes)
 - OID Lookup is (N * H bytes),
   where N is number of commits, and H is hash size

The one that is more significant is if number of commits gets corrupted
upwards, and walking through OID Lookup would lead us out of bounds,
outside the file size.

IIRC we have checked that all chunks fit within file size, isn't it?

> + for (i = 0; i < g->num_commits; i++) {
> + hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);

Why do you use hashcpy() here, but oidcpy() below?

> +
> + if (i && oidcmp(_oid, _oid) >= 0)

All right, OIDs needs to be sorted in ascending lexicographical order,
and the above condition checks that this constraint is fullfilled. 

> + graph_report("commit-graph has incorrect OID order: %s 
> then %s",
> +  oid_to_hex(_oid),
> +  oid_to_hex(_oid));

Nice informative error message.

> +
> + oidcpy(_oid, _oid);
> +
> + while (cur_oid.hash[0] > cur_fanout_pos) {
> + uint32_t fanout_value = get_be32(g->chunk_oid_fanout + 
> cur_fanout_pos);
> + if (i != fanout_value)
> + graph_report("commit-graph has incorrect fanout 
> value: fanout[%d] = %u != %u",
> +  cur_fanout_pos, fanout_value, i);
> +
> + cur_fanout_pos++;
> + }

The k-th entry of fanout, F[k], stores the number of OIDs with first
byte at most k.

Let's examine it in detail on a simple example:

   fanout lookup
   -- --
   0 : 0  ---> 0: 1
   1 : 2  -\   1: 1
   2 : 2  --\> 2: 3
   3 : 3  ---> 3: 4

We are checking that after most significant byte (first byte) changes,
then all fanout position up to the current first byte value (exclusive)
point to current position in OID Lookup chunk.

Seems all right; it would be nice to come up with rigorous proof that it
is all right.

> + }
> +
> + while (cur_fanout_pos < 256) {
> + uint32_t fanout_value = get_be32(g->chunk_oid_fanout + 
> cur_fanout_pos);
> +
> + if (g->num_commits != fanout_value)
> + graph_report("commit-graph has incorrect fanout value: 
> fanout[%d] = %u != %u",
> +  cur_fanout_pos, fanout_value, i);
> +
> + cur_fanout_pos++;
> + }

All right, this checks that all remaining fanout entries, up and
including the 255-ith entry store the total number of commits, N.

> +
>   return verify_commit_graph_error;
>  }
> diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
> index 4ef3fe3dc2..c050ef980b 100755
> --- a/t/t5318-commit-graph.sh
> +++ b/t/t5318-commit-graph.sh
> @@ -247,6 +247,7 @@ test_expect_success 'git commit-graph verify' '
>   git commit-graph verify >output
>  '
>  
> +HASH_LEN=20
>  GRAPH_BYTE_VERSION=4
>  GRAPH_BYTE_HASH=5
>  GRAPH_BYTE_CHUNK_COUNT=6
> @@ -258,6 +259,12 @@ GRAPH_BYTE_OID_LOOKUP_ID=`expr 
> $GRAPH_CHUNK_LOOKUP_OFFSET + \
>

[PATCH v3 09/20] commit-graph: verify corrupt OID fanout and lookup

2018-05-24 Thread Derrick Stolee
In the commit-graph file, the OID fanout chunk provides an index into
the OID lookup. The 'verify' subcommand should find incorrect values
in the fanout.

Similarly, the 'verify' subcommand should find out-of-order values in
the OID lookup.

Signed-off-by: Derrick Stolee 
---
 commit-graph.c  | 36 
 t/t5318-commit-graph.sh | 22 ++
 2 files changed, 58 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index 06e3e4f9ba..cbd1aae514 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -855,6 +855,9 @@ static void graph_report(const char *fmt, ...)
 
 int verify_commit_graph(struct commit_graph *g)
 {
+   uint32_t i, cur_fanout_pos = 0;
+   struct object_id prev_oid, cur_oid;
+
if (!g) {
graph_report("no commit-graph file loaded");
return 1;
@@ -869,5 +872,38 @@ int verify_commit_graph(struct commit_graph *g)
if (!g->chunk_commit_data)
graph_report("commit-graph is missing the Commit Data chunk");
 
+   if (verify_commit_graph_error)
+   return verify_commit_graph_error;
+
+   for (i = 0; i < g->num_commits; i++) {
+   hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
+
+   if (i && oidcmp(_oid, _oid) >= 0)
+   graph_report("commit-graph has incorrect OID order: %s 
then %s",
+oid_to_hex(_oid),
+oid_to_hex(_oid));
+
+   oidcpy(_oid, _oid);
+
+   while (cur_oid.hash[0] > cur_fanout_pos) {
+   uint32_t fanout_value = get_be32(g->chunk_oid_fanout + 
cur_fanout_pos);
+   if (i != fanout_value)
+   graph_report("commit-graph has incorrect fanout 
value: fanout[%d] = %u != %u",
+cur_fanout_pos, fanout_value, i);
+
+   cur_fanout_pos++;
+   }
+   }
+
+   while (cur_fanout_pos < 256) {
+   uint32_t fanout_value = get_be32(g->chunk_oid_fanout + 
cur_fanout_pos);
+
+   if (g->num_commits != fanout_value)
+   graph_report("commit-graph has incorrect fanout value: 
fanout[%d] = %u != %u",
+cur_fanout_pos, fanout_value, i);
+
+   cur_fanout_pos++;
+   }
+
return verify_commit_graph_error;
 }
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 4ef3fe3dc2..c050ef980b 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -247,6 +247,7 @@ test_expect_success 'git commit-graph verify' '
git commit-graph verify >output
 '
 
+HASH_LEN=20
 GRAPH_BYTE_VERSION=4
 GRAPH_BYTE_HASH=5
 GRAPH_BYTE_CHUNK_COUNT=6
@@ -258,6 +259,12 @@ GRAPH_BYTE_OID_LOOKUP_ID=`expr $GRAPH_CHUNK_LOOKUP_OFFSET 
+ \
  1 \* $GRAPH_CHUNK_LOOKUP_WIDTH`
 GRAPH_BYTE_COMMIT_DATA_ID=`expr $GRAPH_CHUNK_LOOKUP_OFFSET + \
2 \* $GRAPH_CHUNK_LOOKUP_WIDTH`
+GRAPH_FANOUT_OFFSET=`expr $GRAPH_CHUNK_LOOKUP_OFFSET + \
+ $GRAPH_CHUNK_LOOKUP_WIDTH \* $GRAPH_CHUNK_LOOKUP_ROWS`
+GRAPH_BYTE_FANOUT1=`expr $GRAPH_FANOUT_OFFSET + 4 \* 4`
+GRAPH_BYTE_FANOUT2=`expr $GRAPH_FANOUT_OFFSET + 4 \* 255`
+GRAPH_OID_LOOKUP_OFFSET=`expr $GRAPH_FANOUT_OFFSET + 4 \* 256`
+GRAPH_BYTE_OID_LOOKUP_ORDER=`expr $GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN \* 8`
 
 # usage: corrupt_graph_and_verify   
 # Manipulates the commit-graph file at the position
@@ -312,4 +319,19 @@ test_expect_success 'detect missing commit data chunk' '
"missing the Commit Data chunk"
 '
 
+test_expect_success 'detect incorrect fanout' '
+   corrupt_graph_and_verify $GRAPH_BYTE_FANOUT1 "\01" \
+   "fanout value"
+'
+
+test_expect_success 'detect incorrect fanout' '
+   corrupt_graph_and_verify $GRAPH_BYTE_FANOUT2 "\01" \
+   "fanout value"
+'
+
+test_expect_success 'detect incorrect OID order' '
+   corrupt_graph_and_verify $GRAPH_BYTE_OID_LOOKUP_ORDER "\01" \
+   "incorrect OID order"
+'
+
 test_done
-- 
2.16.2.329.gfb62395de6