Re: [PATCH v7 3/7] eoie: add End of Index Entry (EOIE) extension
On 10/1/2018 11:30 AM, Duy Nguyen wrote: On Mon, Oct 1, 2018 at 3:46 PM Ben Peart wrote: @@ -2479,6 +2491,7 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile, if (ce_write(&c, newfd, &hdr, sizeof(hdr)) < 0) return -1; + offset = lseek(newfd, 0, SEEK_CUR) + write_buffer_len; Note, lseek() could in theory return -1 on error. Looking at the error code list in the man page it's pretty unlikely though, unless Good catch. I'll add the logic to check for an error. +static size_t read_eoie_extension(const char *mmap, size_t mmap_size) +{ + /* +* The end of index entries (EOIE) extension is guaranteed to be last +* so that it can be found by scanning backwards from the EOF. +* +* "EOIE" +* <4-byte length> +* <4-byte offset> +* <20-byte hash> +*/ + const char *index, *eoie; + uint32_t extsize; + size_t offset, src_offset; + unsigned char hash[GIT_MAX_RAWSZ]; + git_hash_ctx c; + + /* ensure we have an index big enough to contain an EOIE extension */ + if (mmap_size < sizeof(struct cache_header) + EOIE_SIZE_WITH_HEADER + the_hash_algo->rawsz) Using sizeof() for on-disk structures could be dangerous because you don't know how much padding there could be (I'm not sure if it's actually specified in the C language spec). I've checked, on at least x86 and amd64, sizeof(struct cache_header) is 12 bytes, but I don't know if there are any crazy architectures out there that set higher padding. This must be safe as the same code has been in do_read_index() and verify_index_from() for a long time.
Re: [PATCH v7 3/7] eoie: add End of Index Entry (EOIE) extension
On 10/1/2018 11:17 AM, SZEDER Gábor wrote: On Mon, Oct 01, 2018 at 09:45:52AM -0400, Ben Peart wrote: From: Ben Peart The End of Index Entry (EOIE) is used to locate the end of the variable length index entries and the beginning of the extensions. Code can take advantage of this to quickly locate the index extensions without having to parse through all of the index entries. Because it must be able to be loaded before the variable length cache entries and other index extensions, this extension must be written last. The signature for this extension is { 'E', 'O', 'I', 'E' }. The extension consists of: - 32-bit offset to the end of the index entries - 160-bit SHA-1 over the extension types and their sizes (but not their contents). E.g. if we have "TREE" extension that is N-bytes long, "REUC" extension that is M-bytes long, followed by "EOIE", then the hash would be: SHA-1("TREE" + + "REUC" + ) Signed-off-by: Ben Peart I think the commit message should explicitly mention that this this extension - will always be written and why, - but is optional, so other Git implementations not supporting it will have no troubles reading the index, - and that it is written even to the shared index and why, and that because of this the index checksums in t1700 had to be updated. Sure, I'll add that additional information to the commit message on the next spin.
Re: [PATCH v7 3/7] eoie: add End of Index Entry (EOIE) extension
On Mon, Oct 1, 2018 at 3:46 PM Ben Peart wrote: > @@ -2479,6 +2491,7 @@ static int do_write_index(struct index_state *istate, > struct tempfile *tempfile, > if (ce_write(&c, newfd, &hdr, sizeof(hdr)) < 0) > return -1; > > + offset = lseek(newfd, 0, SEEK_CUR) + write_buffer_len; Note, lseek() could in theory return -1 on error. Looking at the error code list in the man page it's pretty unlikely though, unless > +static size_t read_eoie_extension(const char *mmap, size_t mmap_size) > +{ > + /* > +* The end of index entries (EOIE) extension is guaranteed to be last > +* so that it can be found by scanning backwards from the EOF. > +* > +* "EOIE" > +* <4-byte length> > +* <4-byte offset> > +* <20-byte hash> > +*/ > + const char *index, *eoie; > + uint32_t extsize; > + size_t offset, src_offset; > + unsigned char hash[GIT_MAX_RAWSZ]; > + git_hash_ctx c; > + > + /* ensure we have an index big enough to contain an EOIE extension */ > + if (mmap_size < sizeof(struct cache_header) + EOIE_SIZE_WITH_HEADER + > the_hash_algo->rawsz) Using sizeof() for on-disk structures could be dangerous because you don't know how much padding there could be (I'm not sure if it's actually specified in the C language spec). I've checked, on at least x86 and amd64, sizeof(struct cache_header) is 12 bytes, but I don't know if there are any crazy architectures out there that set higher padding. -- Duy
Re: [PATCH v7 3/7] eoie: add End of Index Entry (EOIE) extension
On Mon, Oct 01, 2018 at 09:45:52AM -0400, Ben Peart wrote: > From: Ben Peart > > The End of Index Entry (EOIE) is used to locate the end of the variable > length index entries and the beginning of the extensions. Code can take > advantage of this to quickly locate the index extensions without having > to parse through all of the index entries. > > Because it must be able to be loaded before the variable length cache > entries and other index extensions, this extension must be written last. > The signature for this extension is { 'E', 'O', 'I', 'E' }. > > The extension consists of: > > - 32-bit offset to the end of the index entries > > - 160-bit SHA-1 over the extension types and their sizes (but not > their contents). E.g. if we have "TREE" extension that is N-bytes > long, "REUC" extension that is M-bytes long, followed by "EOIE", > then the hash would be: > > SHA-1("TREE" + + > "REUC" + ) > > Signed-off-by: Ben Peart I think the commit message should explicitly mention that this this extension - will always be written and why, - but is optional, so other Git implementations not supporting it will have no troubles reading the index, - and that it is written even to the shared index and why, and that because of this the index checksums in t1700 had to be updated.