Re: [PATCH v7 3/7] eoie: add End of Index Entry (EOIE) extension

2018-10-02 Thread Ben Peart




On 10/1/2018 11:30 AM, Duy Nguyen wrote:

On Mon, Oct 1, 2018 at 3:46 PM Ben Peart  wrote:

@@ -2479,6 +2491,7 @@ static int do_write_index(struct index_state *istate, 
struct tempfile *tempfile,
 if (ce_write(&c, newfd, &hdr, sizeof(hdr)) < 0)
 return -1;

+   offset = lseek(newfd, 0, SEEK_CUR) + write_buffer_len;


Note, lseek() could in theory return -1 on error. Looking at the error
code list in the man page it's pretty unlikely though, unless



Good catch. I'll add the logic to check for an error.


+static size_t read_eoie_extension(const char *mmap, size_t mmap_size)
+{
+   /*
+* The end of index entries (EOIE) extension is guaranteed to be last
+* so that it can be found by scanning backwards from the EOF.
+*
+* "EOIE"
+* <4-byte length>
+* <4-byte offset>
+* <20-byte hash>
+*/
+   const char *index, *eoie;
+   uint32_t extsize;
+   size_t offset, src_offset;
+   unsigned char hash[GIT_MAX_RAWSZ];
+   git_hash_ctx c;
+
+   /* ensure we have an index big enough to contain an EOIE extension */
+   if (mmap_size < sizeof(struct cache_header) + EOIE_SIZE_WITH_HEADER + 
the_hash_algo->rawsz)


Using sizeof() for on-disk structures could be dangerous because you
don't know how much padding there could be (I'm not sure if it's
actually specified in the C language spec). I've checked, on at least
x86 and amd64, sizeof(struct cache_header) is 12 bytes, but I don't
know if there are any crazy architectures out there that set higher
padding.



This must be safe as the same code has been in do_read_index() and 
verify_index_from() for a long time.


Re: [PATCH v7 3/7] eoie: add End of Index Entry (EOIE) extension

2018-10-02 Thread Ben Peart




On 10/1/2018 11:17 AM, SZEDER Gábor wrote:

On Mon, Oct 01, 2018 at 09:45:52AM -0400, Ben Peart wrote:

From: Ben Peart 

The End of Index Entry (EOIE) is used to locate the end of the variable
length index entries and the beginning of the extensions. Code can take
advantage of this to quickly locate the index extensions without having
to parse through all of the index entries.

Because it must be able to be loaded before the variable length cache
entries and other index extensions, this extension must be written last.
The signature for this extension is { 'E', 'O', 'I', 'E' }.

The extension consists of:

- 32-bit offset to the end of the index entries

- 160-bit SHA-1 over the extension types and their sizes (but not
their contents).  E.g. if we have "TREE" extension that is N-bytes
long, "REUC" extension that is M-bytes long, followed by "EOIE",
then the hash would be:

SHA-1("TREE" +  +
"REUC" + )

Signed-off-by: Ben Peart 


I think the commit message should explicitly mention that this this
extension

   - will always be written and why,
   - but is optional, so other Git implementations not supporting it will
 have no troubles reading the index,
   - and that it is written even to the shared index and why, and that
 because of this the index checksums in t1700 had to be updated.



Sure, I'll add that additional information to the commit message on the 
next spin.


Re: [PATCH v7 3/7] eoie: add End of Index Entry (EOIE) extension

2018-10-01 Thread Duy Nguyen
On Mon, Oct 1, 2018 at 3:46 PM Ben Peart  wrote:
> @@ -2479,6 +2491,7 @@ static int do_write_index(struct index_state *istate, 
> struct tempfile *tempfile,
> if (ce_write(&c, newfd, &hdr, sizeof(hdr)) < 0)
> return -1;
>
> +   offset = lseek(newfd, 0, SEEK_CUR) + write_buffer_len;

Note, lseek() could in theory return -1 on error. Looking at the error
code list in the man page it's pretty unlikely though, unless

> +static size_t read_eoie_extension(const char *mmap, size_t mmap_size)
> +{
> +   /*
> +* The end of index entries (EOIE) extension is guaranteed to be last
> +* so that it can be found by scanning backwards from the EOF.
> +*
> +* "EOIE"
> +* <4-byte length>
> +* <4-byte offset>
> +* <20-byte hash>
> +*/
> +   const char *index, *eoie;
> +   uint32_t extsize;
> +   size_t offset, src_offset;
> +   unsigned char hash[GIT_MAX_RAWSZ];
> +   git_hash_ctx c;
> +
> +   /* ensure we have an index big enough to contain an EOIE extension */
> +   if (mmap_size < sizeof(struct cache_header) + EOIE_SIZE_WITH_HEADER + 
> the_hash_algo->rawsz)

Using sizeof() for on-disk structures could be dangerous because you
don't know how much padding there could be (I'm not sure if it's
actually specified in the C language spec). I've checked, on at least
x86 and amd64, sizeof(struct cache_header) is 12 bytes, but I don't
know if there are any crazy architectures out there that set higher
padding.
-- 
Duy


Re: [PATCH v7 3/7] eoie: add End of Index Entry (EOIE) extension

2018-10-01 Thread SZEDER Gábor
On Mon, Oct 01, 2018 at 09:45:52AM -0400, Ben Peart wrote:
> From: Ben Peart 
> 
> The End of Index Entry (EOIE) is used to locate the end of the variable
> length index entries and the beginning of the extensions. Code can take
> advantage of this to quickly locate the index extensions without having
> to parse through all of the index entries.
> 
> Because it must be able to be loaded before the variable length cache
> entries and other index extensions, this extension must be written last.
> The signature for this extension is { 'E', 'O', 'I', 'E' }.
> 
> The extension consists of:
> 
> - 32-bit offset to the end of the index entries
> 
> - 160-bit SHA-1 over the extension types and their sizes (but not
> their contents).  E.g. if we have "TREE" extension that is N-bytes
> long, "REUC" extension that is M-bytes long, followed by "EOIE",
> then the hash would be:
> 
> SHA-1("TREE" +  +
>   "REUC" + )
> 
> Signed-off-by: Ben Peart 

I think the commit message should explicitly mention that this this
extension

  - will always be written and why,
  - but is optional, so other Git implementations not supporting it will
have no troubles reading the index,
  - and that it is written even to the shared index and why, and that
because of this the index checksums in t1700 had to be updated.