Re: [PATCH v2 06/24] multi-pack-index: load into memory

2018-07-06 Thread Junio C Hamano
Eric Sunshine  writes:

> On Thu, Jul 5, 2018 at 10:20 AM Derrick Stolee  wrote:
>> On 6/25/2018 3:38 PM, Junio C Hamano wrote:
>> While I don't use substitutions in this patch, I do use them in later
>> patches. Here is the final version of this method:
>>
>> midx_read_expect () {
>>  NUM_PACKS=$1
>>  NUM_OBJECTS=$2
>>  NUM_CHUNKS=$3
>>  EXTRA_CHUNKS="$5"
>>  cat >expect <<-\EOF
>>  header: 4d494458 1 $NUM_CHUNKS $NUM_PACKS
>>  chunks: pack_names oid_fanout oid_lookup
>> object_offsets$EXTRA_CHUNKS
>>  num_objects: $NUM_OBJECTS
>>  packs:
>>  EOF
>>
>> Using <<-\EOF causes these substitutions to fail. Is there a different
>> way I should construct this method?
>
> When you need to interpolate variables into the here-doc, use <<-EOF;
> when you don't, use <<-\EOF.

I think what was said is "in an early step there is no need to
interpolate but the same here-doc will need substitution in a later
step, and that is why I started an early step with a form without
quoting", which is different from "when should I use the form with
and without quoting?"

I think a reasonable response would have been "then please do use
the quoted form in the early step to help reviewers to let them know
there is not yet any substitutions, and then switch to quote-less form
at the step that starts needing substitution.  That way, reviewers can
see the test started to become "interesting" by interpolating variable
bits in the test vector by seeing the line with "<

Re: [PATCH v2 06/24] multi-pack-index: load into memory

2018-07-05 Thread Eric Sunshine
On Thu, Jul 5, 2018 at 10:20 AM Derrick Stolee  wrote:
> On 6/25/2018 3:38 PM, Junio C Hamano wrote:
> While I don't use substitutions in this patch, I do use them in later
> patches. Here is the final version of this method:
>
> midx_read_expect () {
>  NUM_PACKS=$1
>  NUM_OBJECTS=$2
>  NUM_CHUNKS=$3
>  EXTRA_CHUNKS="$5"
>  cat >expect <<-\EOF
>  header: 4d494458 1 $NUM_CHUNKS $NUM_PACKS
>  chunks: pack_names oid_fanout oid_lookup
> object_offsets$EXTRA_CHUNKS
>  num_objects: $NUM_OBJECTS
>  packs:
>  EOF
>
> Using <<-\EOF causes these substitutions to fail. Is there a different
> way I should construct this method?

When you need to interpolate variables into the here-doc, use <<-EOF;
when you don't, use <<-\EOF.


Re: [PATCH v2 06/24] multi-pack-index: load into memory

2018-07-05 Thread Derrick Stolee

On 6/25/2018 3:38 PM, Junio C Hamano wrote:

Derrick Stolee  writes:

+   cat >expect <<- EOF

"<<-\EOF", i.e. make it easy for readers to spot that there is no
funny substitutions happening in the here-doc body.


While I don't use substitutions in this patch, I do use them in later 
patches. Here is the final version of this method:


midx_read_expect () {
    NUM_PACKS=$1
    NUM_OBJECTS=$2
    NUM_CHUNKS=$3
    OBJECT_DIR=$4
    EXTRA_CHUNKS="$5"
    cat >expect <<-\EOF
    header: 4d494458 1 $NUM_CHUNKS $NUM_PACKS
    chunks: pack_names oid_fanout oid_lookup 
object_offsets$EXTRA_CHUNKS

    num_objects: $NUM_OBJECTS
    packs:
    EOF
    if [ $NUM_PACKS -ge 1 ]
    then
    ls $OBJECT_DIR/pack/ | grep idx | sort >> expect
    fi
    printf "object_dir: $OBJECT_DIR\n" >>expect &&
    test-tool read-midx $OBJECT_DIR >actual &&
    test_cmp expect actual
}

Using <<-\EOF causes these substitutions to fail. Is there a different 
way I should construct this method?


Thanks,
-Stolee


Re: [PATCH v2 06/24] multi-pack-index: load into memory

2018-06-25 Thread Junio C Hamano
Derrick Stolee  writes:

> +#define MIDX_HASH_LEN 20
> +#define MIDX_MIN_SIZE (MIDX_HEADER_SIZE + MIDX_HASH_LEN)
>  
>  static char *get_midx_filename(const char *object_dir)
>  {
>   return xstrfmt("%s/pack/multi-pack-index", object_dir);
>  }
>  
> +struct multi_pack_index *load_multi_pack_index(const char *object_dir)
> +{
> + struct multi_pack_index *m = NULL;
> + int fd;
> + struct stat st;
> + size_t midx_size;
> + void *midx_map = NULL;
> + uint32_t hash_version;
> + char *midx_name = get_midx_filename(object_dir);
> +
> + fd = git_open(midx_name);
> +
> + if (fd < 0) {
> + error_errno(_("failed to read %s"), midx_name);
> + FREE_AND_NULL(midx_name);
> + return NULL;
> + }
> + if (fstat(fd, )) {
> + error_errno(_("failed to read %s"), midx_name);
> + FREE_AND_NULL(midx_name);
> + close(fd);
> + return NULL;
> + }
> +
> + midx_size = xsize_t(st.st_size);
> +
> + if (midx_size < MIDX_MIN_SIZE) {
> + close(fd);
> + error(_("multi-pack-index file %s is too small"), midx_name);
> + goto cleanup_fail;
> + }
> +
> + FREE_AND_NULL(midx_name);

Error handling in the above part looks a bit inconsistent.  I first
thought that the earlier ones manually clean up and leave because
jumping to cleanup_fail would need a successfully opened fd and
successfully mmapped midx_map, but the above "goto" forces
cleanup_fail: to munmap NULL and close an already closed fd.

I wonder if it is simpler to do

cleanup_fail:
/* no need to check for NULL when freeing */
free(m);
free(midx_name);
if (midx_map)
munmap(midx_map, midx_size);
if (0 <= fd)
close(fd);
return NULL;

and have all of the above error codepath to jump there.

> + midx_map = xmmap(NULL, midx_size, PROT_READ, MAP_PRIVATE, fd, 0);
> +
> + m = xcalloc(1, sizeof(*m) + strlen(object_dir) + 1);
> + strcpy(m->object_dir, object_dir);
> + m->data = midx_map;
> +
> + m->signature = get_be32(m->data);
> + if (m->signature != MIDX_SIGNATURE) {
> + error(_("multi-pack-index signature 0x%08x does not match 
> signature 0x%08x"),
> +   m->signature, MIDX_SIGNATURE);
> + goto cleanup_fail;
> + }
> +
> + m->version = m->data[4];
> + if (m->version != MIDX_VERSION) {
> + error(_("multi-pack-index version %d not recognized"),
> +   m->version);
> + goto cleanup_fail;
> + }
> +
> + hash_version = m->data[5];

Is there a good existing example to show a better way to avoid these
hard-coded constants that describe/define the file format?

> + if (hash_version != MIDX_HASH_VERSION) {
> + error(_("hash version %u does not match"), hash_version);
> + goto cleanup_fail;
> + }
> + m->hash_len = MIDX_HASH_LEN;
> +
> + m->num_chunks = *(m->data + 6);

By the way, this mixture of m->data[4] and *(m->data + 6) is even
worse.  You could do get_be32(&8[m->data]) if you want to irritate
readers even more ;-)

> + m->num_packs = get_be32(m->data + 8);
> +
> + return m;
> +
> +cleanup_fail:
> + FREE_AND_NULL(m);
> + FREE_AND_NULL(midx_name);
> + munmap(midx_map, midx_size);
> + close(fd);
> + return NULL;
> +}
> +


> diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
> index 8622a7cdce..0372704c96 100755
> --- a/t/t5319-multi-pack-index.sh
> +++ b/t/t5319-multi-pack-index.sh
> @@ -3,9 +3,19 @@
>  test_description='multi-pack-indexes'
>  . ./test-lib.sh
>  
> +midx_read_expect() {

"midx_read_expect () {", i.e. SP on both sides of (), please.

> + cat >expect <<- EOF

"<<-\EOF", i.e. make it easy for readers to spot that there is no
funny substitutions happening in the here-doc body.


> + header: 4d494458 1 0 0
> + object_dir: .
> + EOF
> + test-tool read-midx . >actual &&
> + test_cmp expect actual
> +}
> +
>  test_expect_success 'write midx with no packs' '
>   git multi-pack-index --object-dir=. write &&
> - test_path_is_file pack/multi-pack-index
> + test_path_is_file pack/multi-pack-index &&
> + midx_read_expect
>  '
>  
>  test_done


[PATCH v2 06/24] multi-pack-index: load into memory

2018-06-25 Thread Derrick Stolee
Create a new multi_pack_index struct for loading multi-pack-indexes into
memory. Create a test-tool builtin for reading basic information about
that multi-pack-index to verify the correct data is written.

Signed-off-by: Derrick Stolee 
---
 Makefile|  1 +
 midx.c  | 78 +
 midx.h  |  4 ++
 object-store.h  | 16 
 t/helper/test-read-midx.c   | 34 
 t/helper/test-tool.c|  1 +
 t/helper/test-tool.h|  1 +
 t/t5319-multi-pack-index.sh | 12 +-
 8 files changed, 146 insertions(+), 1 deletion(-)
 create mode 100644 t/helper/test-read-midx.c

diff --git a/Makefile b/Makefile
index f5636c711d..0b801d1b16 100644
--- a/Makefile
+++ b/Makefile
@@ -717,6 +717,7 @@ TEST_BUILTINS_OBJS += test-online-cpus.o
 TEST_BUILTINS_OBJS += test-path-utils.o
 TEST_BUILTINS_OBJS += test-prio-queue.o
 TEST_BUILTINS_OBJS += test-read-cache.o
+TEST_BUILTINS_OBJS += test-read-midx.o
 TEST_BUILTINS_OBJS += test-ref-store.o
 TEST_BUILTINS_OBJS += test-regex.o
 TEST_BUILTINS_OBJS += test-revision-walking.o
diff --git a/midx.c b/midx.c
index 393d526881..0977397d6a 100644
--- a/midx.c
+++ b/midx.c
@@ -1,18 +1,96 @@
 #include "cache.h"
 #include "csum-file.h"
 #include "lockfile.h"
+#include "object-store.h"
 #include "midx.h"
 
 #define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
 #define MIDX_VERSION 1
 #define MIDX_HASH_VERSION 1
 #define MIDX_HEADER_SIZE 12
+#define MIDX_HASH_LEN 20
+#define MIDX_MIN_SIZE (MIDX_HEADER_SIZE + MIDX_HASH_LEN)
 
 static char *get_midx_filename(const char *object_dir)
 {
return xstrfmt("%s/pack/multi-pack-index", object_dir);
 }
 
+struct multi_pack_index *load_multi_pack_index(const char *object_dir)
+{
+   struct multi_pack_index *m = NULL;
+   int fd;
+   struct stat st;
+   size_t midx_size;
+   void *midx_map = NULL;
+   uint32_t hash_version;
+   char *midx_name = get_midx_filename(object_dir);
+
+   fd = git_open(midx_name);
+
+   if (fd < 0) {
+   error_errno(_("failed to read %s"), midx_name);
+   FREE_AND_NULL(midx_name);
+   return NULL;
+   }
+   if (fstat(fd, )) {
+   error_errno(_("failed to read %s"), midx_name);
+   FREE_AND_NULL(midx_name);
+   close(fd);
+   return NULL;
+   }
+
+   midx_size = xsize_t(st.st_size);
+
+   if (midx_size < MIDX_MIN_SIZE) {
+   close(fd);
+   error(_("multi-pack-index file %s is too small"), midx_name);
+   goto cleanup_fail;
+   }
+
+   FREE_AND_NULL(midx_name);
+
+   midx_map = xmmap(NULL, midx_size, PROT_READ, MAP_PRIVATE, fd, 0);
+
+   m = xcalloc(1, sizeof(*m) + strlen(object_dir) + 1);
+   strcpy(m->object_dir, object_dir);
+   m->data = midx_map;
+
+   m->signature = get_be32(m->data);
+   if (m->signature != MIDX_SIGNATURE) {
+   error(_("multi-pack-index signature 0x%08x does not match 
signature 0x%08x"),
+ m->signature, MIDX_SIGNATURE);
+   goto cleanup_fail;
+   }
+
+   m->version = m->data[4];
+   if (m->version != MIDX_VERSION) {
+   error(_("multi-pack-index version %d not recognized"),
+ m->version);
+   goto cleanup_fail;
+   }
+
+   hash_version = m->data[5];
+   if (hash_version != MIDX_HASH_VERSION) {
+   error(_("hash version %u does not match"), hash_version);
+   goto cleanup_fail;
+   }
+   m->hash_len = MIDX_HASH_LEN;
+
+   m->num_chunks = *(m->data + 6);
+
+   m->num_packs = get_be32(m->data + 8);
+
+   return m;
+
+cleanup_fail:
+   FREE_AND_NULL(m);
+   FREE_AND_NULL(midx_name);
+   munmap(midx_map, midx_size);
+   close(fd);
+   return NULL;
+}
+
 static size_t write_midx_header(struct hashfile *f,
unsigned char num_chunks,
uint32_t num_packs)
diff --git a/midx.h b/midx.h
index dbdbe9f873..2d83dd9ec1 100644
--- a/midx.h
+++ b/midx.h
@@ -1,6 +1,10 @@
 #ifndef __MIDX_H__
 #define __MIDX_H__
 
+struct multi_pack_index;
+
+struct multi_pack_index *load_multi_pack_index(const char *object_dir);
+
 int write_midx_file(const char *object_dir);
 
 #endif
diff --git a/object-store.h b/object-store.h
index d683112fd7..4f410841cc 100644
--- a/object-store.h
+++ b/object-store.h
@@ -84,6 +84,22 @@ struct packed_git {
char pack_name[FLEX_ARRAY]; /* more */
 };
 
+struct multi_pack_index {
+   int fd;
+
+   const unsigned char *data;
+   size_t data_len;
+
+   uint32_t signature;
+   unsigned char version;
+   unsigned char hash_len;
+   unsigned char num_chunks;
+   uint32_t num_packs;
+   uint32_t num_objects;
+
+   char object_dir[FLEX_ARRAY];
+};
+
 struct raw_object_store {
/*
 * Path to the