Re: [PATCH v3 24/24] midx: clear midx on repack

2018-07-05 Thread Eric Sunshine
On Thu, Jul 5, 2018 at 8:54 PM Derrick Stolee  wrote:
> If a 'git repack' command replaces existing packfiles, then we must
> clear the existing multi-pack-index before moving the packfiles it
> references.
>
> Signed-off-by: Derrick Stolee 
> ---
>  builtin/repack.c |  8 
>  midx.c   | 12 
>  midx.h   |  1 +
>  3 files changed, 21 insertions(+)

This seems like a pretty important bit of functionality. Is there any
way to add a test of it to t5319 or would it be too difficult (or not
important enough)?


Re: [PATCH v3 17/24] midx: prepare midxed_git struct

2018-07-05 Thread Eric Sunshine
On Thu, Jul 5, 2018 at 8:54 PM Derrick Stolee  wrote:
> midx: prepare midxed_git struct

What's a "midxed_git"? I don't see it in the code anywhere.

> Signed-off-by: Derrick Stolee 


Re: [PATCH v3 16/24] config: create core.multiPackIndex setting

2018-07-05 Thread Eric Sunshine
On Thu, Jul 5, 2018 at 8:54 PM Derrick Stolee  wrote:
> The core.multiPackIndex config setting controls the multi-pack-
> index (MIDX) feature. If false, the setting will disable all reads
> from the multi-pack-index file.
>
> Add comparison commands in t5319-multi-pack-index.sh to check
> typical Git behavior remains the same as the config setting is turned
> on and off. This currently includes 'git rev-list' and 'git log'
> commands to trigger several object database reads.
>
> Signed-off-by: Derrick Stolee 
> ---
> diff --git a/cache.h b/cache.h
> @@ -814,6 +814,7 @@ extern char *git_replace_ref_base;
> +extern int core_multi_pack_index;
> diff --git a/config.c b/config.c
> @@ -1313,6 +1313,11 @@ static int git_default_core_config(const char *var, 
> const char *value)
> +   if (!strcmp(var, "core.multipackindex")) {
> +   core_multi_pack_index = git_config_bool(var, value);
> +   return 0;
> +   }

This is a rather unusual commit. This new configuration is assigned,
but it's never actually consulted, which means that it's impossible
for it to have any impact on functionality, yet tests are being added
to check whether it _did_ have any impact on functionality. Confusing.

Patch 17 does consult 'core_multi_pack_index', so it's only at that
point that it could have any impact. This situation would be less
confusing if you swapped patches 16 and 17 (and, of course, move the
declaration of 'core_multi_pack_index' to patch 17 with a reasonable
default value).

> diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
> @@ -61,12 +63,42 @@ test_expect_success 'write midx with one v1 pack' '
> +midx_git_two_modes() {
> +   git -c core.multiPackIndex=false $1 >expect &&
> +   git -c core.multiPackIndex=true $1 >actual &&
> +   test_cmp expect actual
> +}
> +
> +compare_results_with_midx() {
> +   MSG=$1
> +   test_expect_success "check normal git operations: $MSG" '
> +   midx_git_two_modes "rev-list --objects --all" &&
> +   midx_git_two_modes "log --raw"
> +   '
> +}

Here, you define midx_git_two_modes() and compare_results_with_midx()...

>  test_expect_success 'write midx with one v2 pack' '
> -   git pack-objects --index-version=2,0x40 pack/test  -   git multi-pack-index --object-dir=. write &&
> -   midx_read_expect 1 17 4 .
> +   git pack-objects --index-version=2,0x40 $objdir/pack/test  +   git multi-pack-index --object-dir=$objdir write &&
> +   midx_read_expect 1 17 4 $objdir
>  '
>
> +midx_git_two_modes() {
> +   git -c core.multiPackIndex=false $1 >expect &&
> +   git -c core.multiPackIndex=true $1 >actual &&
> +   test_cmp expect actual
> +}
> +
> +compare_results_with_midx() {
> +   MSG=$1
> +   test_expect_success "check normal git operations: $MSG" '
> +   midx_git_two_modes "rev-list --objects --all" &&
> +   midx_git_two_modes "log --raw"
> +   '
> +}

... and here you define both functions again.

> +
> +compare_results_with_midx "one v2 pack"


Re: [PATCH v3 15/24] midx: write object offsets

2018-07-05 Thread Eric Sunshine
On Thu, Jul 5, 2018 at 8:54 PM Derrick Stolee  wrote:
> The final pair of chunks for the multi-pack-index file stores the object
> offsets. We default to using 32-bit offsets as in the pack-index version
> 1 format, but if there exists an offset larger than 32-bits, we use a
> trick similar to the pack-index version 2 format by storing all offsets
> at least 2^31 in a 64-bit table; we use the 32-bit table to point into
> that 64-bit table as necessary.
>
> We only store these 64-bit offsets if necessary, so create a test that
> manipulates a version 2 pack-index to fake a large offset. This allows
> us to test that the large offset table is created, but the data does not
> match the actual packfile offsets. The multi-pack-index offset does match
> the (corrupted) pack-index offset, so a future feature will compare these
> offsets during a 'verify' step.
>
> Signed-off-by: Derrick Stolee 
> ---
> diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
> @@ -6,25 +6,28 @@ test_description='multi-pack-indexes'
> +# usage: corrupt_data   []
> +corrupt_data() {

Style: corrupt_data () {

> +   file=$1
> +   pos=$2
> +   data="${3:-\0}"
> +   printf "$data" | dd of="$file" bs=1 seek="$pos" conv=notrunc
> +}
> +
> +# Force 64-bit offsets by manipulating the idx file.
> +# This makes the IDX file _incorrect_ so be careful to clean up after!
> +test_expect_success 'force some 64-bit offsets with pack-objects' '
> +   mkdir objects64 &&
> +   mkdir objects64/pack &&
> +   pack64=$(git pack-objects --index-version=2,0x40 
> objects64/pack/test-64  +   idx64=objects64/pack/test-64-$pack64.idx &&
> +   chmod u+w $idx64 &&

I guess you don't have to worry about the POSIXPERM prerequisite here
because the file is already writable on Windows, right?

> +   corrupt_data $idx64 2899 "\02" &&
> +   midx64=$(git multi-pack-index write --object-dir=objects64) &&
> +   midx_read_expect 1 62 5 objects64 " large_offsets"
>  '


Re: [PATCH v3 07/24] multi-pack-index: expand test data

2018-07-05 Thread Eric Sunshine
On Fri, Jul 6, 2018 at 12:36 AM Eric Sunshine  wrote:
> On Thu, Jul 5, 2018 at 8:54 PM Derrick Stolee  wrote:
> > +test_expect_success 'write midx with one v1 pack' '
> > +   pack=$(git pack-objects --index-version=1 pack/test  > +   test_when_finished rm pack/test-$pack.pack pack/test-$pack.idx 
> > pack/multi-pack-index &&
> > +   git multi-pack-index --object-dir=. write &&
> > +   midx_read_expect
> > +'
>
> It's odd to see all these tests ending by creating an 'expect' file
> but not actually doing anything with that file.

Ignore this comment. As mentioned in my follow-up to 6/24, I missed
the fact that midx_read_expect() is doing more than just creating the
'expect' file.


Re: [PATCH v3 06/24] multi-pack-index: load into memory

2018-07-05 Thread Eric Sunshine
On Fri, Jul 6, 2018 at 12:19 AM Eric Sunshine  wrote:
> On Thu, Jul 5, 2018 at 8:53 PM Derrick Stolee  wrote:
> > +midx_read_expect () {
> > +   cat >expect <<-EOF
> > +   header: 4d494458 1 0 0
> > +   object_dir: .
> > +   EOF
> > +   test-tool read-midx . >actual &&
> > +   test_cmp expect actual
> > +}
> > +
> >  test_expect_success 'write midx with no packs' '
> > git multi-pack-index --object-dir=. write &&
> > -   test_path_is_file pack/multi-pack-index
> > +   test_path_is_file pack/multi-pack-index &&
> > +   midx_read_expect
>
> Kind of a do-nothing change. I wonder if this step would better be
> delayed until a later patch. (Not necessarily a big deal.)

Never mind. I missed that midx_read_expect() is comparing the 'expect'
and 'actual' files, so this is not a do-nothing change. The function
name may have helped to mislead me (or I was just unobservant or too
focused on its creation of the 'expect' file). I wonder if a different
name would have helped. midx_check(), midx_verify()? Meh.


Re: [PATCH v3 13/24] midx: write object ids in a chunk

2018-07-05 Thread Eric Sunshine
On Thu, Jul 5, 2018 at 8:54 PM Derrick Stolee  wrote:
> Signed-off-by: Derrick Stolee 
> ---
> diff --git a/midx.c b/midx.c
> @@ -18,9 +18,10 @@
> @@ -384,6 +391,32 @@ static size_t write_midx_pack_names(struct hashfile *f,
> +static size_t write_midx_oid_lookup(struct hashfile *f, unsigned char 
> hash_len,
> +   struct pack_midx_entry *objects,
> +   uint32_t nr_objects)
> +{
> +   struct pack_midx_entry *list = objects;
> +   uint32_t i;
> +   size_t written = 0;
> +
> +   for (i = 0; i < nr_objects; i++) {
> +   struct pack_midx_entry *obj = list++;
> +
> +   if (i < nr_objects - 1) {
> +   struct pack_midx_entry *next = list;
> +   if (oidcmp(>oid, >oid) >= 0)
> +   BUG("OIDs not in order: %s >= %s",
> +   oid_to_hex(>oid),
> +   oid_to_hex(>oid));

The above two lines are arguments to BUG(), thus should be indented more.

> +   }
> +


Re: [PATCH v3 11/24] midx: read pack names into array

2018-07-05 Thread Eric Sunshine
On Thu, Jul 5, 2018 at 8:54 PM Derrick Stolee  wrote:
> Signed-off-by: Derrick Stolee 
> ---
> diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
> @@ -8,8 +8,13 @@ midx_read_expect () {
> cat >expect <<-EOF

Broken &&-chain.

> header: 4d494458 1 1 $NUM_PACKS
> chunks: pack_names
> -   object_dir: .
> +   packs:
> EOF
> +   if [ $NUM_PACKS -ge 1 ]

On this project, use 'test' rather than '['.

> +   then
> +   ls pack/ | grep idx | sort >> expect
> +   fi

Broken &&-chain.

> +   printf "object_dir: .\n" >>expect &&

All this code building up 'expect' could be in a {...} block to make
it clearer and less noisy:

{
cat <<-EOF &&
...
EOF
if test $NUM_PACKS -ge 1
then
ls -1 pack/ | ...
fi &&
printf "..."
} >expect &&

And, some pointless bike-shedding while here: perhaps dashes instead
of underlines? "pack-names", "object-dir"

> test-tool read-midx . >actual &&
> test_cmp expect actual
>  }


Re: [PATCH v3 07/24] multi-pack-index: expand test data

2018-07-05 Thread Eric Sunshine
On Thu, Jul 5, 2018 at 8:54 PM Derrick Stolee  wrote:
> multi-pack-index: expand test data

Since this patch is touching only t5319, a more typical title would be:

t5319: expand test data

> As we build the multi-pack-index file format, we want to test the format
> on real repoasitories. Add tests to t5319-multi-pack-index.sh that

s/repoasitories/repositories/

And, since the title now mentions t5319, this can become simply:

Add tests that...

> create repository data including multiple packfiles with both version 1
> and version 2 formats.
>
> The current 'git multi-pack-index write' command will always write the
> same file with no "real" data. This will be expanded in future commits,
> along with the test expectations.
>
> Signed-off-by: Derrick Stolee 
> ---
> diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
> @@ -13,9 +13,108 @@ midx_read_expect () {
>  test_expect_success 'write midx with no packs' '
> +   test_when_finished rm -f pack/multi-pack-index &&

Should this test_when_finished() have been added in an earlier patch?
It seems out of place here.

> git multi-pack-index --object-dir=. write &&
> test_path_is_file pack/multi-pack-index &&
> midx_read_expect
>  '
>
> +test_expect_success 'create objects' '
> +   for i in $(test_seq 1 5)
> +   do
> +   iii=$(printf '%03i' $i)
> +   test-tool genrandom "bar" 200 >wide_delta_$iii &&
> +   test-tool genrandom "baz $iii" 50 >>wide_delta_$iii &&

Alternately:

{
test-tool genrandom "bar" 200 &&
 test-tool genrandom "baz $iii" 50
} >wide_delta_$iii &&

which makes it easier to see at a glance that both commands are
populating the same file. Same comment for the other files. (Not worth
a re-roll.)

> +   test-tool genrandom "foo"$i 100 >deep_delta_$iii &&
> +   test-tool genrandom "foo"$(expr $i + 1) 100 >>deep_delta_$iii 
> &&
> +   test-tool genrandom "foo"$(expr $i + 2) 100 >>deep_delta_$iii 
> &&

Or, just use POSIX arithmetic expansion:

$(( $i + 1 ))

> +   echo $iii >file_$iii &&
> +   test-tool genrandom "$iii" 8192 >>file_$iii &&
> +   git update-index --add file_$iii deep_delta_$iii 
> wide_delta_$iii &&
> +   i=$(expr $i + 1) || return 1

Ditto, POSIX arithmetic expansion:

i=$(( $i + 1 ))

(Not worth a re-roll.)

> +   done &&
> +   { echo 101 && test-tool genrandom 100 8192; } >file_101 &&
> +   git update-index --add file_101 &&
> +   tree=$(git write-tree) &&
> +   commit=$(git commit-tree $tree  +   echo $tree &&
> +   git ls-tree $tree | sed -e "s/.* \\([0-9a-f]*\\).*/\\1/"
> +   } >obj-list &&

Perhaps indent the content of the {...} block?

> +   git update-ref HEAD $commit
> +'
> +
> +test_expect_success 'write midx with one v1 pack' '
> +   pack=$(git pack-objects --index-version=1 pack/test  +   test_when_finished rm pack/test-$pack.pack pack/test-$pack.idx 
> pack/multi-pack-index &&
> +   git multi-pack-index --object-dir=. write &&
> +   midx_read_expect
> +'

It's odd to see all these tests ending by creating an 'expect' file
but not actually doing anything with that file.

> +test_expect_success 'Add more objects' '

s/Add/add/

> +   for i in $(test_seq 6 10)
> +   do
> +   iii=$(printf '%03i' $i)
> +   test-tool genrandom "bar" 200 >wide_delta_$iii &&
> +   test-tool genrandom "baz $iii" 50 >>wide_delta_$iii &&
> +   test-tool genrandom "foo"$i 100 >deep_delta_$iii &&
> +   test-tool genrandom "foo"$(expr $i + 1) 100 >>deep_delta_$iii 
> &&
> +   test-tool genrandom "foo"$(expr $i + 2) 100 >>deep_delta_$iii 
> &&
> +   echo $iii >file_$iii &&
> +   test-tool genrandom "$iii" 8192 >>file_$iii &&
> +   git update-index --add file_$iii deep_delta_$iii 
> wide_delta_$iii &&
> +   i=$(expr $i + 1) || return 1
> +   done &&
> +   { echo 101 && test-tool genrandom 100 8192; } >file_101 &&
> +   git update-index --add file_101 &&
> +   tree=$(git write-tree) &&
> +   commit=$(git commit-tree $tree -p HEAD +   echo $tree &&
> +   git ls-tree $tree | sed -e "s/.* \\([0-9a-f]*\\).*/\\1/"
> +   } >obj-list2 &&
> +   git update-ref HEAD $commit
> +'

There seems to be a fair bit of duplication in these tests which
create objects. Is it possible to factor out some of this code into a
shell function?

> +test_expect_success 'write midx with two packs' '
> +   git pack-objects --index-version=1 pack/test-2  +   git multi-pack-index --object-dir=. write &&
> +   midx_read_expect
> +'
> +
> +test_expect_success 'Add more packs' '

s/Add/add/

> +   for j in $(test_seq 1 10)
> +   do
> +   [...]
> +   done
> +'


Re: [PATCH v3 06/24] multi-pack-index: load into memory

2018-07-05 Thread Eric Sunshine
On Thu, Jul 5, 2018 at 8:53 PM Derrick Stolee  wrote:
> Create a new multi_pack_index struct for loading multi-pack-indexes into
> memory. Create a test-tool builtin for reading basic information about
> that multi-pack-index to verify the correct data is written.
>
> Signed-off-by: Derrick Stolee 
> ---
> diff --git a/t/helper/test-read-midx.c b/t/helper/test-read-midx.c
> @@ -0,0 +1,34 @@
> +/*
> + * test-mktemp.c: code to exercise the creation of temporary files
> + */

Meh. Copy/paste botch.

> +static int read_midx_file(const char *object_dir)
> +{
> +   struct multi_pack_index *m = load_multi_pack_index(object_dir);
> +
> +   if (!m)
> +   return 0;

Should this 'return 0' be a die() or BUG() or something?

> +   printf("header: %08x %d %d %d\n",
> +  m->signature,
> +  m->version,
> +  m->num_chunks,
> +  m->num_packs);
> +
> +   printf("object_dir: %s\n", m->object_dir);
> +
> +   return 0;
> +}
> diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
> @@ -3,9 +3,19 @@
> +midx_read_expect () {
> +   cat >expect <<-EOF

I guess you're planning on interpolating some variables here in a
later patch, which is why you used -EOF rather than -\EOF?

> +   header: 4d494458 1 0 0
> +   object_dir: .
> +   EOF
> +   test-tool read-midx . >actual &&
> +   test_cmp expect actual
> +}
> +
>  test_expect_success 'write midx with no packs' '
> git multi-pack-index --object-dir=. write &&
> -   test_path_is_file pack/multi-pack-index
> +   test_path_is_file pack/multi-pack-index &&
> +   midx_read_expect
>  '

Kind of a do-nothing change. I wonder if this step would better be
delayed until a later patch. (Not necessarily a big deal.)


Re: [PATCH v3 04/24] multi-pack-index: add 'write' verb

2018-07-05 Thread Eric Sunshine
On Thu, Jul 5, 2018 at 8:53 PM Derrick Stolee  wrote:
> In anticipation of writing multi-pack-indexes, add a
> 'git multi-pack-index write' subcommand and send the options to a
> write_midx_file() method.

Since the 'write' command is a no-op at this point, perhaps say so in
the commit message. Something like:

... add a skeleton 'git multi-pack-index write' subcommand,
which will be fleshed-out by a later commit.

The bit about sending options to write_midx_file() is superfluous;
it's a mere implementation detail which is clearly seen by reading the
patch.

> Also create a basic test file that tests
> the 'write' subcommand.

Maybe: s/file/script

And, as above, perhaps mention that this is a _skeleton_ test script
so as to avoid confusing readers into thinking that something
significant is happening at this stage.

> Signed-off-by: Derrick Stolee 
> ---
> diff --git a/Documentation/git-multi-pack-index.txt 
> b/Documentation/git-multi-pack-index.txt
> +* Write a MIDX file for the packfiles in an alternate.

In an alternate what?

> +---
> +$ git multi-pack-index --object-dir  write
> +---
> diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
> @@ -2,9 +2,10 @@
>  static char const * const builtin_multi_pack_index_usage[] = {
> -   N_("git multi-pack-index [--object-dir ]"),
> +   N_("git multi-pack-index [--object-dir ] [write]"),

Is there going to be some default behavior when no verb is provided?
The below implementation seems to suggest that the verb is required,
so this probably ought to be typeset as:

git multi-pack-index [--object-dir=] write

Later, when you add more (mutually exclusive) verbs, change the typesetting to:

git multi-pack-index [--object-dir=] (write|...|...)

Alternately, just use:

git multi-pack-index [--object-dir=] 

> @@ -34,5 +35,12 @@ int cmd_multi_pack_index(int argc, const char **argv,
> +   if (argc == 0)
> +   usage_with_options(builtin_multi_pack_index_usage,
> +  builtin_multi_pack_index_options);
> +
> +   if (!strcmp(argv[0], "write"))
> +   return write_midx_file(opts.object_dir);
> +
> return 0;

This should be throwing an error when an unrecognized verb is provided.

It also should be throwing an error when 'write' is given too many
arguments (which, at this point, appears to be 0).


Re: [PATCH v3 03/24] multi-pack-index: add builtin

2018-07-05 Thread Eric Sunshine
On Thu, Jul 5, 2018 at 8:53 PM Derrick Stolee  wrote:
> This new 'git multi-pack-index' builtin will be the plumbing access
> for writing, reading, and checking multi-pack-index files. The
> initial implementation is a no-op.
>
> Signed-off-by: Derrick Stolee 
> ---
> diff --git a/Documentation/git-multi-pack-index.txt 
> b/Documentation/git-multi-pack-index.txt
> +++ b/Documentation/git-multi-pack-index.txt
> +SYNOPSIS
> +
> +'git multi-pack-index' [--object-dir ]

In Git documentation, this is more typically written: [--object-dir=]

> +OPTIONS
> +---
> +--object-dir ::

Ditto: --object-dir=::

> +   Use given directory for the location of Git objects. We check
> +   `/packs/multi-pack-index` for the current MIDX file, and
> +   `/packs` for the pack-files to index.
> diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
> @@ -0,0 +1,38 @@
> +static char const * const builtin_multi_pack_index_usage[] = {
> +   N_("git multi-pack-index [--object-dir ]"),

Likewise.

> +int cmd_multi_pack_index(int argc, const char **argv,
> +const char *prefix)
> +{
> +   static struct option builtin_multi_pack_index_options[] = {
> +   OPT_FILENAME(0, "object-dir", _dir,
> + N_("The object directory containing set of packfile and 
> pack-index pairs")),

It's more typical not to capitalize these. Also, keep them short, if
possible, so perhaps drop "The".

> +   OPT_END(),
> +   };
> +
> +   if (argc == 2 && !strcmp(argv[1], "-h"))
> +   usage_with_options(builtin_multi_pack_index_usage,
> +  builtin_multi_pack_index_options);

Unless you are planning on adding a short "-h " option
later in the series, then you can do away with this conditional
altogether since the below parse_options() will give you "-h" as help
for free.

> +   git_config(git_default_config, NULL);
> +
> +   argc = parse_options(argc, argv, prefix,
> +builtin_multi_pack_index_options,
> +builtin_multi_pack_index_usage, 0);


[PATCH 3/3] send-email: automatically determine transfer-encoding

2018-07-05 Thread brian m. carlson
git send-email, when invoked without a --transfer-encoding option, sends
8bit data without a MIME version or a transfer encoding.  This has
several downsides.

First, unless the transfer encoding is specified, it defaults to 7bit,
meaning that non-ASCII data isn't allowed.  Second, if lines longer than
998 bytes are used, we will send an message that is invalid according to
RFC 5321.  The --validate option, which is the default, catches this
issue, but it isn't clear to many people how to resolve this.

To solve these issues, default the transfer encoding to "auto", so that
we explicitly specify 8bit encoding when lines don't exceed 998 bytes
and quoted-printable otherwise.  This means that we now always emit
Content-Transfer-Encoding and MIME-Version headers, so remove the
conditionals from this portion of the code.

It is unlikely that the unconditional inclusion of these two headers
will affect the deliverability of messages in anything but a positive
way, since MIME is already widespread and well understood by most email
programs.

Signed-off-by: brian m. carlson 
---
 Documentation/git-send-email.txt |  3 +--
 git-send-email.perl  | 18 ++
 t/t9001-send-email.sh| 21 +
 3 files changed, 28 insertions(+), 14 deletions(-)

diff --git a/Documentation/git-send-email.txt b/Documentation/git-send-email.txt
index f44fb4b81e..1a240b52fc 100644
--- a/Documentation/git-send-email.txt
+++ b/Documentation/git-send-email.txt
@@ -147,8 +147,7 @@ Note that no attempts whatsoever are made to validate the 
encoding.
otherwise.
 +
 Default is the value of the `sendemail.transferEncoding` configuration
-value; if that is unspecified, git will use 8bit and not add a
-Content-Transfer-Encoding header.
+value; if that is unspecified, default to `auto`.
 
 --xmailer::
 --no-xmailer::
diff --git a/git-send-email.perl b/git-send-email.perl
index 4ea30c4070..c4fae17787 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -231,7 +231,7 @@ sub do_edit {
 my (@suppress_cc);
 my ($auto_8bit_encoding);
 my ($compose_encoding);
-my ($target_xfer_encoding);
+my $target_xfer_encoding = 'auto';
 
 my ($debug_net_smtp) = 0;  # Net::SMTP, see send_message()
 
@@ -1737,17 +1737,11 @@ sub process_file {
}
}
}
-   if (defined $target_xfer_encoding) {
-   $xfer_encoding = '8bit' if not defined $xfer_encoding;
-   ($message, $xfer_encoding) = apply_transfer_encoding(
-   $message, $xfer_encoding, $target_xfer_encoding);
-   }
-   if (defined $xfer_encoding) {
-   push @xh, "Content-Transfer-Encoding: $xfer_encoding";
-   }
-   if (defined $xfer_encoding or $has_content_type) {
-   unshift @xh, 'MIME-Version: 1.0' unless $has_mime_version;
-   }
+   $xfer_encoding = '8bit' if not defined $xfer_encoding;
+   ($message, $xfer_encoding) = apply_transfer_encoding(
+   $message, $xfer_encoding, $target_xfer_encoding);
+   push @xh, "Content-Transfer-Encoding: $xfer_encoding";
+   unshift @xh, 'MIME-Version: 1.0' unless $has_mime_version;
 
$needs_confirm = (
$confirm eq "always" or
diff --git a/t/t9001-send-email.sh b/t/t9001-send-email.sh
index 6a0ace386b..7edce50e65 100755
--- a/t/t9001-send-email.sh
+++ b/t/t9001-send-email.sh
@@ -225,6 +225,8 @@ X-Mailer: X-MAILER-STRING
 In-Reply-To: 
 References: 
 Reply-To: Reply 
+MIME-Version: 1.0
+Content-Transfer-Encoding: 8bit
 
 Result: OK
 EOF
@@ -415,6 +417,7 @@ test_expect_success $PREREQ 'reject long lines' '
--from="Example " \
--to=nob...@example.com \
--smtp-server="$(pwd)/fake.sendmail" \
+   --transfer-encoding=8bit \
$patches longline.patch \
2>errors &&
grep longline.patch errors
@@ -610,6 +613,8 @@ Subject: [PATCH 1/1] Second.
 Date: DATE-STRING
 Message-Id: MESSAGE-ID-STRING
 X-Mailer: X-MAILER-STRING
+MIME-Version: 1.0
+Content-Transfer-Encoding: 8bit
 
 Result: OK
 EOF
@@ -654,6 +659,8 @@ Subject: [PATCH 1/1] Second.
 Date: DATE-STRING
 Message-Id: MESSAGE-ID-STRING
 X-Mailer: X-MAILER-STRING
+MIME-Version: 1.0
+Content-Transfer-Encoding: 8bit
 
 Result: OK
 EOF
@@ -689,6 +696,8 @@ Subject: [PATCH 1/1] Second.
 Date: DATE-STRING
 Message-Id: MESSAGE-ID-STRING
 X-Mailer: X-MAILER-STRING
+MIME-Version: 1.0
+Content-Transfer-Encoding: 8bit
 
 Result: OK
 EOF
@@ -715,6 +724,8 @@ Subject: [PATCH 1/1] Second.
 Date: DATE-STRING
 Message-Id: MESSAGE-ID-STRING
 X-Mailer: X-MAILER-STRING
+MIME-Version: 1.0
+Content-Transfer-Encoding: 8bit
 
 Result: OK
 EOF
@@ -749,6 +760,8 @@ Subject: [PATCH 1/1] Second.
 Date: DATE-STRING
 Message-Id: MESSAGE-ID-STRING
 X-Mailer: X-MAILER-STRING
+MIME-Version: 1.0
+Content-Transfer-Encoding: 8bit
 
 Result: OK
 EOF
@@ -780,6 +793,8 @@ Subject: [PATCH 1/1] Second.
 Date: DATE-STRING
 

[PATCH 2/3] send-email: accept long lines with suitable transfer encoding

2018-07-05 Thread brian m. carlson
With --validate (which is the default), we warn about lines exceeding
998 characters due to the limits specified in RFC 5321.  However, if
we're using a suitable transfer encoding (quoted-printable or base64),
we're guaranteed not to have lines exceeding 76 characters, so there's
no need to fail in this case.  The auto transfer encoding handles this
specific case, so accept it as well.

Signed-off-by: brian m. carlson 
---
 Documentation/git-send-email.txt |  5 +++--
 git-send-email.perl  |  8 ++--
 t/t9001-send-email.sh| 13 +
 3 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-send-email.txt b/Documentation/git-send-email.txt
index 385c7de9e2..f44fb4b81e 100644
--- a/Documentation/git-send-email.txt
+++ b/Documentation/git-send-email.txt
@@ -401,8 +401,9 @@ have been specified, in which case default to 'compose'.
 +
 --
*   Invoke the sendemail-validate hook if present (see 
linkgit:githooks[5]).
-   *   Warn of patches that contain lines longer than 998 
characters; this
-   is due to SMTP limits as described by 
http://www.ietf.org/rfc/rfc2821.txt.
+   *   Warn of patches that contain lines longer than 998 
characters unless
+   a suitable transfer encoding is used; this is due to 
SMTP limits as
+   described by http://www.ietf.org/rfc/rfc2821.txt.
 --
 +
 Default is the value of `sendemail.validate`; if this is not set,
diff --git a/git-send-email.perl b/git-send-email.perl
index a76953c310..4ea30c4070 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -645,7 +645,7 @@ sub is_format_patch_arg {
 if ($validate) {
foreach my $f (@files) {
unless (-p $f) {
-   my $error = validate_patch($f);
+   my $error = validate_patch($f, $target_xfer_encoding);
$error and die sprintf(__("fatal: %s: %s\nwarning: no 
patches were sent\n"),
  $f, $error);
}
@@ -1879,7 +1879,7 @@ sub unique_email_list {
 }
 
 sub validate_patch {
-   my $fn = shift;
+   my ($fn, $xfer_encoding) = @_;
 
if ($repo) {
my $validate_hook = catfile(catdir($repo->repo_path(), 'hooks'),
@@ -1899,6 +1899,10 @@ sub validate_patch {
return $hook_error if $hook_error;
}
 
+   # Any long lines will be automatically fixed if we use a suitable 
transfer
+   # encoding.
+   return if $xfer_encoding =~ /^(?:auto|quoted-printable|base64)$/;
+
open(my $fh, '<', $fn)
or die sprintf(__("unable to open %s: %s\n"), $fn, $!);
while (my $line = <$fh>) {
diff --git a/t/t9001-send-email.sh b/t/t9001-send-email.sh
index 6cdcbcb19e..6a0ace386b 100755
--- a/t/t9001-send-email.sh
+++ b/t/t9001-send-email.sh
@@ -480,6 +480,19 @@ test_expect_success $PREREQ 'long lines with auto encoding 
are quoted-printable'
grep "Content-Transfer-Encoding: quoted-printable" msgtxt1
 '
 
+test_expect_success $PREREQ '--validate passes with certain encodings' '
+   for enc in auto quoted-printable base64
+   do
+   git send-email \
+   --from="Example " \
+   --to=nob...@example.com \
+   --smtp-server="$(pwd)/fake.sendmail" \
+   --transfer-encoding=$enc \
+   --validate \
+   $patches longline.patch
+   done
+'
+
 test_expect_success $PREREQ 'Invalid In-Reply-To' '
clean_fake_sendmail &&
git send-email \


[PATCH 0/3] Automatic transfer encoding for patches

2018-07-05 Thread brian m. carlson
git send-email has the --validate option, which is the default, to check
if a patch has lines longer than RFC 5321 allows (998 octets).  However,
it doesn't take into account the fact that using certain transfer
encodings makes this issue moot, and it also doesn't provide any helpful
clue to the user about what to do when this fails.

This series introduces an "auto" value for --transfer-encoding that uses
8bit when possible (i.e. when lines are 998 octets or shorter) and
quoted-printable otherwise; it then makes this the default behavior.  It
also makes --validate aware of transfer encoding so it doesn't complain
when using quoted-printable or base64.

git am already understands how to handle patches in any valid transfer
encoding.

As a result of this series, we always produce MIME messages with a
Content-Transfer-Encoding header.  I don't think this will cause any
problems, but if someone knows of a reason why it would, please shout
loudly.

brian m. carlson (3):
  send-email: add an auto option for transfer encoding
  send-email: accept long lines with suitable transfer encoding
  send-email: automatically determine transfer-encoding

 Documentation/git-send-email.txt | 15 +
 git-send-email.perl  | 36 ++--
 t/t9001-send-email.sh| 58 
 3 files changed, 85 insertions(+), 24 deletions(-)



[PATCH 1/3] send-email: add an auto option for transfer encoding

2018-07-05 Thread brian m. carlson
For most patches, using a transfer encoding of 8bit provides good
compatibility with most servers and makes it as easy as possible to view
patches.  However, there are some patches for which 8bit is not a valid
encoding: RFC 5321 specifies that a message must not have lines
exceeding 998 octets.

Add a transfer encoding value, auto, which indicates that a patch should
use 8bit where allowed and quoted-printable otherwise.  Choose
quoted-printable instead of base64, since base64-encoded plain text is
treated as suspicious by some spam filters.

Signed-off-by: brian m. carlson 
---
 Documentation/git-send-email.txt | 11 +++
 git-send-email.perl  | 12 +++-
 t/t9001-send-email.sh| 24 
 3 files changed, 38 insertions(+), 9 deletions(-)

diff --git a/Documentation/git-send-email.txt b/Documentation/git-send-email.txt
index 4f3efde80c..385c7de9e2 100644
--- a/Documentation/git-send-email.txt
+++ b/Documentation/git-send-email.txt
@@ -137,15 +137,18 @@ Note that no attempts whatsoever are made to validate the 
encoding.
Specify encoding of compose message. Default is the value of the
'sendemail.composeencoding'; if that is unspecified, UTF-8 is assumed.
 
---transfer-encoding=(7bit|8bit|quoted-printable|base64)::
+--transfer-encoding=(7bit|8bit|quoted-printable|base64|auto)::
Specify the transfer encoding to be used to send the message over SMTP.
7bit will fail upon encountering a non-ASCII message.  quoted-printable
can be useful when the repository contains files that contain carriage
returns, but makes the raw patch email file (as saved from a MUA) much
harder to inspect manually.  base64 is even more fool proof, but also
-   even more opaque.  Default is the value of the 
`sendemail.transferEncoding`
-   configuration value; if that is unspecified, git will use 8bit and not
-   add a Content-Transfer-Encoding header.
+   even more opaque.  auto will use 8bit when possible, and 
quoted-printable
+   otherwise.
++
+Default is the value of the `sendemail.transferEncoding` configuration
+value; if that is unspecified, git will use 8bit and not add a
+Content-Transfer-Encoding header.
 
 --xmailer::
 --no-xmailer::
diff --git a/git-send-email.perl b/git-send-email.perl
index 8ec70e58ed..a76953c310 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -1739,9 +1739,8 @@ sub process_file {
}
if (defined $target_xfer_encoding) {
$xfer_encoding = '8bit' if not defined $xfer_encoding;
-   $message = apply_transfer_encoding(
+   ($message, $xfer_encoding) = apply_transfer_encoding(
$message, $xfer_encoding, $target_xfer_encoding);
-   $xfer_encoding = $target_xfer_encoding;
}
if (defined $xfer_encoding) {
push @xh, "Content-Transfer-Encoding: $xfer_encoding";
@@ -1852,13 +1851,16 @@ sub apply_transfer_encoding {
$message = MIME::Base64::decode($message)
if ($from eq 'base64');
 
+   $to = ($message =~ /.{999,}/) ? 'quoted-printable' :'8bit'
+   if $to eq 'auto';
+
die __("cannot send message as 7bit")
if ($to eq '7bit' and $message =~ /[^[:ascii:]]/);
-   return $message
+   return ($message, $to)
if ($to eq '7bit' or $to eq '8bit');
-   return MIME::QuotedPrint::encode($message, "\n", 0)
+   return (MIME::QuotedPrint::encode($message, "\n", 0), $to)
if ($to eq 'quoted-printable');
-   return MIME::Base64::encode($message, "\n")
+   return (MIME::Base64::encode($message, "\n"), $to)
if ($to eq 'base64');
die __("invalid transfer encoding");
 }
diff --git a/t/t9001-send-email.sh b/t/t9001-send-email.sh
index e80eacbb1b..6cdcbcb19e 100755
--- a/t/t9001-send-email.sh
+++ b/t/t9001-send-email.sh
@@ -456,6 +456,30 @@ test_expect_success $PREREQ 'allow long lines with 
--no-validate' '
2>errors
 '
 
+test_expect_success $PREREQ 'short lines with auto encoding are 8bit' '
+   clean_fake_sendmail &&
+   git send-email \
+   --from="A " \
+   --to=nob...@example.com \
+   --smtp-server="$(pwd)/fake.sendmail" \
+   --transfer-encoding=auto \
+   $patches &&
+   grep "Content-Transfer-Encoding: 8bit" msgtxt1
+'
+
+test_expect_success $PREREQ 'long lines with auto encoding are 
quoted-printable' '
+   clean_fake_sendmail &&
+   git send-email \
+   --from="Example " \
+   --to=nob...@example.com \
+   --smtp-server="$(pwd)/fake.sendmail" \
+   --transfer-encoding=auto \
+   --no-validate \
+   longline.patch \
+   2>errors &&
+   grep "Content-Transfer-Encoding: quoted-printable" msgtxt1
+'
+
 test_expect_success $PREREQ 'Invalid In-Reply-To' 

Re: [PATCH 0/8] X509 (gpgsm) commit signing support

2018-07-05 Thread brian m. carlson
On Tue, Jul 03, 2018 at 02:38:12PM +0200, Henning Schild wrote:
> This series adds support for signing commits with gpgsm.
> 
> The first two patches are cleanups of gpg-interface, while the next
> four prepare for the introduction of the actual feature in patch 7.
> Finally patch 8 extends the testsuite to cover the new feature.
> 
> This series can be seen as a follow up of a series that appeared under
> the name "gpg-interface: Multiple signing tools" in april 2018 [1]. After
> that series was not merged i decided to get my patches ready. The
> original series aimed at being generic for any sort of signing tool, while
> this series just introduced the X509 variant of gpg. (gpgsm)
> I collected authors and reviewers of that first series and already put them
> on cc.

Overall, I think this is heading in a good direction.  I left a few
comments, but it seemed pretty sane.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH 8/8] gpg-interface t: extend the existing GPG tests with GPGSM

2018-07-05 Thread brian m. carlson
On Tue, Jul 03, 2018 at 02:38:20PM +0200, Henning Schild wrote:
> Add test cases to cover the new X509/gpgsm support. Most of them
> resemble existing ones. They just switch the format to X509 and set the
> signingkey when creating signatures. Validation of signatures does not
> need any configuration of git, it does need gpgsm to be configured to
> trust the key(-chain).
> We generate a self-signed key for commit...@example.com and configure
> gpgsm to trust it.
> 
> Signed-off-by: Henning Schild 
> ---
>  t/lib-gpg.sh   |  9 ++-
>  t/lib-gpg/gpgsm-gen-key.in |  6 +
>  t/t4202-log.sh | 66 
> ++
>  t/t5534-push-signed.sh | 52 
>  t/t7003-filter-branch.sh   | 15 +++
>  t/t7030-verify-tag.sh  | 47 +++--
>  t/t7600-merge.sh   | 31 ++
>  7 files changed, 223 insertions(+), 3 deletions(-)
>  create mode 100644 t/lib-gpg/gpgsm-gen-key.in
> 
> diff --git a/t/lib-gpg.sh b/t/lib-gpg.sh
> index a5d3b2cba..9dcb4e990 100755
> --- a/t/lib-gpg.sh
> +++ b/t/lib-gpg.sh
> @@ -38,7 +38,14 @@ then
>   "$TEST_DIRECTORY"/lib-gpg/ownertrust &&
>   gpg --homedir "${GNUPGHOME}" /dev/null 2>&1 \
>   --sign -u commit...@example.com &&
> - test_set_prereq GPG
> + test_set_prereq GPG &&
> + echo | gpgsm --homedir "${GNUPGHOME}" -o 
> "$TEST_DIRECTORY"/lib-gpg/gpgsm.crt.user --passphrase-fd 0 --pinentry-mode 
> loopback --generate-key --batch "$TEST_DIRECTORY"/lib-gpg/gpgsm-gen-key.in &&
> + gpgsm --homedir "${GNUPGHOME}" --import 
> "$TEST_DIRECTORY"/lib-gpg/gpgsm.crt.user &&
> + gpgsm --homedir "${GNUPGHOME}" -K | grep fingerprint: | cut -d" 
> " -f4 | tr -d '\n' > ${GNUPGHOME}/trustlist.txt &&
> + echo " S relax" >> ${GNUPGHOME}/trustlist.txt &&
> + (gpgconf --kill gpg-agent >/dev/null 2>&1 || : ) &&
> + echo hello | gpgsm --homedir "${GNUPGHOME}" -u 
> commit...@example.com -o /dev/null --sign - 2>&1 &&
> + test_set_prereq GPGSM

It looks like the GPGSM prerequisite will only be set if the GPG
prerequisite is set as well.  Do we want to consider the case when the
user might have gpgsm but not gpg?
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH 7/8] gpg-interface: introduce new signature format "X509" using gpgsm

2018-07-05 Thread brian m. carlson
On Tue, Jul 03, 2018 at 02:38:19PM +0200, Henning Schild wrote:
> This commit allows git to create and check X509 type signatures using
> gpgsm.
> 
> Signed-off-by: Henning Schild 
> ---
>  Documentation/config.txt |  5 -
>  gpg-interface.c  | 10 +-
>  2 files changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index c88903399..337df6e48 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -1828,9 +1828,12 @@ gpg.program::
>   signed, and the program is expected to send the result to its
>   standard output.
>  
> +gpg.programX509::

I'm not super excited about this name.  It seems to indicate we want a
level of hierarchy involved.

A hierarchy like sign.openpgp.program (falling back to gpg.program) and
sign.x509.program might be more logical.

> diff --git a/gpg-interface.c b/gpg-interface.c
> index aa747278e..85d721007 100644
> --- a/gpg-interface.c
> +++ b/gpg-interface.c
> @@ -16,13 +16,18 @@ struct gpg_format_data {
>  
>  #define PGP_SIGNATURE "-BEGIN PGP SIGNATURE-"
>  #define PGP_MESSAGE "-BEGIN PGP MESSAGE-"
> +#define X509_SIGNATURE "-BEGIN SIGNED MESSAGE-"
>  
> -enum gpgformats { PGP_FMT };
> +enum gpgformats { PGP_FMT, X509_FMT };
>  struct gpg_format_data gpg_formats[] = {
>   { .format = "PGP", .program = "gpg",
> .extra_args_verify = { "--keyid-format=long", },
> .sigs = { PGP_SIGNATURE, PGP_MESSAGE, },
>   },
> + { .format = "X509", .program = "gpgsm",

Similarly to my comment about "PGP", I think this would do well as
"x509".
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH 3/8] gpg-interface: add new config to select how to sign a commit

2018-07-05 Thread brian m. carlson
On Tue, Jul 03, 2018 at 02:38:15PM +0200, Henning Schild wrote:
> Add "gpg.format" where the user can specify which type of signature to
> use for commits. At the moment only "PGP" is supported and the value is
> not even used. This commit prepares for a new types of signatures.

We typically prefer to have option values specified in lower case.  I
also think "openpgp" might be better than "PGP", since that's the name
of the specification and it would avoid any potential unhappiness about
compatibility with PGP or trademarks.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


[PATCH v3 23/24] packfile: skip loading index if in multi-pack-index

2018-07-05 Thread Derrick Stolee
Signed-off-by: Derrick Stolee 
---
 packfile.c | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/packfile.c b/packfile.c
index 2c819a0ad8..e6ecf12ab5 100644
--- a/packfile.c
+++ b/packfile.c
@@ -469,8 +469,19 @@ static int open_packed_git_1(struct packed_git *p)
ssize_t read_result;
const unsigned hashsz = the_hash_algo->rawsz;
 
-   if (!p->index_data && open_pack_index(p))
-   return error("packfile %s index unavailable", p->pack_name);
+   if (!p->index_data) {
+   struct multi_pack_index *m;
+   const char *pack_name = strrchr(p->pack_name, '/');
+
+   for (m = the_repository->objects->multi_pack_index;
+m; m = m->next) {
+   if (midx_contains_pack(m, pack_name))
+   break;
+   }
+
+   if (!m && open_pack_index(p))
+   return error("packfile %s index unavailable", 
p->pack_name);
+   }
 
if (!pack_max_fds) {
unsigned int max_fds = get_max_fd_limit();
@@ -521,6 +532,10 @@ static int open_packed_git_1(struct packed_git *p)
" supported (try upgrading GIT to a newer version)",
p->pack_name, ntohl(hdr.hdr_version));
 
+   /* Skip index checking if in multi-pack-index */
+   if (!p->index_data)
+   return 0;
+
/* Verify the pack matches its index. */
if (p->num_objects != ntohl(hdr.hdr_entries))
return error("packfile %s claims to have %"PRIu32" objects"
-- 
2.18.0.118.gd4f65b8d14



[PATCH v3 11/24] midx: read pack names into array

2018-07-05 Thread Derrick Stolee
Signed-off-by: Derrick Stolee 
---
 midx.c  | 17 +
 object-store.h  |  1 +
 t/helper/test-read-midx.c   |  5 +
 t/t5319-multi-pack-index.sh |  7 ++-
 4 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/midx.c b/midx.c
index f78c161422..ffe29af65d 100644
--- a/midx.c
+++ b/midx.c
@@ -37,6 +37,7 @@ struct multi_pack_index *load_multi_pack_index(const char 
*object_dir)
uint32_t hash_version;
char *midx_name = get_midx_filename(object_dir);
uint32_t i;
+   const char *cur_pack_name;
 
fd = git_open(midx_name);
 
@@ -117,6 +118,22 @@ struct multi_pack_index *load_multi_pack_index(const char 
*object_dir)
if (!m->chunk_pack_names)
die(_("multi-pack-index missing required pack-name chunk"));
 
+   m->pack_names = xcalloc(m->num_packs, sizeof(*m->pack_names));
+
+   cur_pack_name = (const char *)m->chunk_pack_names;
+   for (i = 0; i < m->num_packs; i++) {
+   m->pack_names[i] = cur_pack_name;
+
+   cur_pack_name += strlen(cur_pack_name) + 1;
+
+   if (i && strcmp(m->pack_names[i], m->pack_names[i - 1]) <= 0) {
+   error(_("multi-pack-index pack names out of order: '%s' 
before '%s'"),
+ m->pack_names[i - 1],
+ m->pack_names[i]);
+   goto cleanup_fail;
+   }
+   }
+
return m;
 
 cleanup_fail:
diff --git a/object-store.h b/object-store.h
index c87d051849..88169b33e9 100644
--- a/object-store.h
+++ b/object-store.h
@@ -99,6 +99,7 @@ struct multi_pack_index {
 
const unsigned char *chunk_pack_names;
 
+   const char **pack_names;
char object_dir[FLEX_ARRAY];
 };
 
diff --git a/t/helper/test-read-midx.c b/t/helper/test-read-midx.c
index a9232d8219..0b53a9e8b5 100644
--- a/t/helper/test-read-midx.c
+++ b/t/helper/test-read-midx.c
@@ -9,6 +9,7 @@
 
 static int read_midx_file(const char *object_dir)
 {
+   uint32_t i;
struct multi_pack_index *m = load_multi_pack_index(object_dir);
 
if (!m)
@@ -27,6 +28,10 @@ static int read_midx_file(const char *object_dir)
 
printf("\n");
 
+   printf("packs:\n");
+   for (i = 0; i < m->num_packs; i++)
+   printf("%s\n", m->pack_names[i]);
+
printf("object_dir: %s\n", m->object_dir);
 
return 0;
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index f458758945..4610352b69 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -8,8 +8,13 @@ midx_read_expect () {
cat >expect <<-EOF
header: 4d494458 1 1 $NUM_PACKS
chunks: pack_names
-   object_dir: .
+   packs:
EOF
+   if [ $NUM_PACKS -ge 1 ]
+   then
+   ls pack/ | grep idx | sort >> expect
+   fi
+   printf "object_dir: .\n" >>expect &&
test-tool read-midx . >actual &&
test_cmp expect actual
 }
-- 
2.18.0.118.gd4f65b8d14



[PATCH v3 22/24] midx: prevent duplicate packfile loads

2018-07-05 Thread Derrick Stolee
The multi-pack-index, when present, tracks the existence of objects and
their offsets within a list of packfiles. This allows us to use the
multi-pack-index for object lookups, abbreviations, and object counts.

When the multi-pack-index tracks a packfile, then we do not need to add
that packfile to the packed_git linked list or the MRU list.

We still need to load the packfiles that are not tracked by the
multi-pack-index.

Signed-off-by: Derrick Stolee 
---
 packfile.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/packfile.c b/packfile.c
index 97e7812b6b..2c819a0ad8 100644
--- a/packfile.c
+++ b/packfile.c
@@ -795,6 +795,7 @@ struct prepare_pack_data {
struct repository *r;
struct string_list *garbage;
int local;
+   struct multi_pack_index *m;
 };
 
 static void prepare_pack(const char *full_name, size_t full_name_len,
@@ -805,6 +806,8 @@ static void prepare_pack(const char *full_name, size_t 
full_name_len,
size_t base_len = full_name_len;
 
if (strip_suffix_mem(full_name, _len, ".idx")) {
+   if (data->m && midx_contains_pack(data->m, file_name))
+   return;
/* Don't reopen a pack we already have. */
for (p = data->r->objects->packed_git; p; p = p->next) {
size_t len;
@@ -839,6 +842,12 @@ static void prepare_packed_git_one(struct repository *r, 
char *objdir, int local
struct prepare_pack_data data;
struct string_list garbage = STRING_LIST_INIT_DUP;
 
+   data.m = r->objects->multi_pack_index;
+
+   /* look for the multi-pack-index for this object directory */
+   while (data.m && strcmp(data.m->object_dir, objdir))
+   data.m = data.m->next;
+
data.r = r;
data.garbage = 
data.local = local;
-- 
2.18.0.118.gd4f65b8d14



[PATCH v3 07/24] multi-pack-index: expand test data

2018-07-05 Thread Derrick Stolee
As we build the multi-pack-index file format, we want to test the format
on real repoasitories. Add tests to t5319-multi-pack-index.sh that
create repository data including multiple packfiles with both version 1
and version 2 formats.

The current 'git multi-pack-index write' command will always write the
same file with no "real" data. This will be expanded in future commits,
along with the test expectations.

Signed-off-by: Derrick Stolee 
---
 t/t5319-multi-pack-index.sh | 99 +
 1 file changed, 99 insertions(+)

diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 2ecc369529..1be7be02b8 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -13,9 +13,108 @@ midx_read_expect () {
 }
 
 test_expect_success 'write midx with no packs' '
+   test_when_finished rm -f pack/multi-pack-index &&
git multi-pack-index --object-dir=. write &&
test_path_is_file pack/multi-pack-index &&
midx_read_expect
 '
 
+test_expect_success 'create objects' '
+   for i in $(test_seq 1 5)
+   do
+   iii=$(printf '%03i' $i)
+   test-tool genrandom "bar" 200 >wide_delta_$iii &&
+   test-tool genrandom "baz $iii" 50 >>wide_delta_$iii &&
+   test-tool genrandom "foo"$i 100 >deep_delta_$iii &&
+   test-tool genrandom "foo"$(expr $i + 1) 100 >>deep_delta_$iii &&
+   test-tool genrandom "foo"$(expr $i + 2) 100 >>deep_delta_$iii &&
+   echo $iii >file_$iii &&
+   test-tool genrandom "$iii" 8192 >>file_$iii &&
+   git update-index --add file_$iii deep_delta_$iii 
wide_delta_$iii &&
+   i=$(expr $i + 1) || return 1
+   done &&
+   { echo 101 && test-tool genrandom 100 8192; } >file_101 &&
+   git update-index --add file_101 &&
+   tree=$(git write-tree) &&
+   commit=$(git commit-tree $tree obj-list &&
+   git update-ref HEAD $commit
+'
+
+test_expect_success 'write midx with one v1 pack' '
+   pack=$(git pack-objects --index-version=1 pack/test wide_delta_$iii &&
+   test-tool genrandom "baz $iii" 50 >>wide_delta_$iii &&
+   test-tool genrandom "foo"$i 100 >deep_delta_$iii &&
+   test-tool genrandom "foo"$(expr $i + 1) 100 >>deep_delta_$iii &&
+   test-tool genrandom "foo"$(expr $i + 2) 100 >>deep_delta_$iii &&
+   echo $iii >file_$iii &&
+   test-tool genrandom "$iii" 8192 >>file_$iii &&
+   git update-index --add file_$iii deep_delta_$iii 
wide_delta_$iii &&
+   i=$(expr $i + 1) || return 1
+   done &&
+   { echo 101 && test-tool genrandom 100 8192; } >file_101 &&
+   git update-index --add file_101 &&
+   tree=$(git write-tree) &&
+   commit=$(git commit-tree $tree -p HEADobj-list2 &&
+   git update-ref HEAD $commit
+'
+
+test_expect_success 'write midx with two packs' '
+   git pack-objects --index-version=1 pack/test-2 wide_delta_$iii &&
+   test-tool genrandom "baz $iii" 50 >>wide_delta_$iii &&
+   test-tool genrandom "foo"$i 100 >deep_delta_$iii &&
+   test-tool genrandom "foo"$(expr $i + 1) 100 >>deep_delta_$iii &&
+   test-tool genrandom "foo"$(expr $i + 2) 100 >>deep_delta_$iii &&
+   echo $iii >file_$iii &&
+   test-tool genrandom "$iii" 8192 >>file_$iii &&
+   git update-index --add file_$iii deep_delta_$iii 
wide_delta_$iii &&
+   { echo 101 && test-tool genrandom 100 8192; } >file_101 &&
+   git update-index --add file_101 &&
+   tree=$(git write-tree) &&
+   commit=$(git commit-tree $tree -p HEADobj-list &&
+   git update-ref HEAD $commit &&
+   git pack-objects --index-version=2 test-pack 

[PATCH v3 16/24] config: create core.multiPackIndex setting

2018-07-05 Thread Derrick Stolee
The core.multiPackIndex config setting controls the multi-pack-
index (MIDX) feature. If false, the setting will disable all reads
from the multi-pack-index file.

Add comparison commands in t5319-multi-pack-index.sh to check
typical Git behavior remains the same as the config setting is turned
on and off. This currently includes 'git rev-list' and 'git log'
commands to trigger several object database reads.

Signed-off-by: Derrick Stolee 
---
 Documentation/config.txt|  4 +++
 cache.h |  1 +
 config.c|  5 
 environment.c   |  1 +
 t/t5319-multi-pack-index.sh | 56 +++--
 5 files changed, 58 insertions(+), 9 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index ab641bf5a9..ab895ebb32 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -908,6 +908,10 @@ core.commitGraph::
Enable git commit graph feature. Allows reading from the
commit-graph file.
 
+core.multiPackIndex::
+   Use the multi-pack-index file to track multiple packfiles using a
+   single index. See linkgit:technical/multi-pack-index[1].
+
 core.sparseCheckout::
Enable "sparse checkout" feature. See section "Sparse checkout" in
linkgit:git-read-tree[1] for more information.
diff --git a/cache.h b/cache.h
index 89a107a7f7..d12aa49710 100644
--- a/cache.h
+++ b/cache.h
@@ -814,6 +814,7 @@ extern char *git_replace_ref_base;
 extern int fsync_object_files;
 extern int core_preload_index;
 extern int core_commit_graph;
+extern int core_multi_pack_index;
 extern int core_apply_sparse_checkout;
 extern int precomposed_unicode;
 extern int protect_hfs;
diff --git a/config.c b/config.c
index fbbf0f8e9f..95d8da4243 100644
--- a/config.c
+++ b/config.c
@@ -1313,6 +1313,11 @@ static int git_default_core_config(const char *var, 
const char *value)
return 0;
}
 
+   if (!strcmp(var, "core.multipackindex")) {
+   core_multi_pack_index = git_config_bool(var, value);
+   return 0;
+   }
+
if (!strcmp(var, "core.sparsecheckout")) {
core_apply_sparse_checkout = git_config_bool(var, value);
return 0;
diff --git a/environment.c b/environment.c
index 2a6de2330b..b9bc919cdb 100644
--- a/environment.c
+++ b/environment.c
@@ -67,6 +67,7 @@ enum object_creation_mode object_creation_mode = 
OBJECT_CREATION_MODE;
 char *notes_ref_name;
 int grafts_replace_parents = 1;
 int core_commit_graph;
+int core_multi_pack_index;
 int core_apply_sparse_checkout;
 int merge_log_config = -1;
 int precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index ae6c9d4d02..fc582c9a59 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -3,6 +3,8 @@
 test_description='multi-pack-indexes'
 . ./test-lib.sh
 
+objdir=.git/objects
+
 midx_read_expect () {
NUM_PACKS=$1
NUM_OBJECTS=$2
@@ -61,12 +63,42 @@ test_expect_success 'write midx with one v1 pack' '
midx_read_expect 1 17 4 .
 '
 
+midx_git_two_modes() {
+   git -c core.multiPackIndex=false $1 >expect &&
+   git -c core.multiPackIndex=true $1 >actual &&
+   test_cmp expect actual
+}
+
+compare_results_with_midx() {
+   MSG=$1
+   test_expect_success "check normal git operations: $MSG" '
+   midx_git_two_modes "rev-list --objects --all" &&
+   midx_git_two_modes "log --raw"
+   '
+}
+
 test_expect_success 'write midx with one v2 pack' '
-   git pack-objects --index-version=2,0x40 pack/test expect &&
+   git -c core.multiPackIndex=true $1 >actual &&
+   test_cmp expect actual
+}
+
+compare_results_with_midx() {
+   MSG=$1
+   test_expect_success "check normal git operations: $MSG" '
+   midx_git_two_modes "rev-list --objects --all" &&
+   midx_git_two_modes "log --raw"
+   '
+}
+
+compare_results_with_midx "one v2 pack"
+
 test_expect_success 'Add more objects' '
for i in $(test_seq 6 10)
do
@@ -92,11 +124,13 @@ test_expect_success 'Add more objects' '
 '
 
 test_expect_success 'write midx with two packs' '
-   git pack-objects --index-version=1 pack/test-2 obj-list &&
git update-ref HEAD $commit &&
-   git pack-objects --index-version=2 pack/test-pack   []
 corrupt_data() {
file=$1
-- 
2.18.0.118.gd4f65b8d14



[PATCH v3 17/24] midx: prepare midxed_git struct

2018-07-05 Thread Derrick Stolee
Signed-off-by: Derrick Stolee 
---
 midx.c | 22 ++
 midx.h |  3 +++
 object-store.h |  9 +
 packfile.c |  6 +-
 4 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/midx.c b/midx.c
index cc35abe7a2..d5a61c0c53 100644
--- a/midx.c
+++ b/midx.c
@@ -180,6 +180,28 @@ struct multi_pack_index *load_multi_pack_index(const char 
*object_dir)
return NULL;
 }
 
+int prepare_multi_pack_index_one(struct repository *r, const char *object_dir)
+{
+   struct multi_pack_index *m = r->objects->multi_pack_index;
+   struct multi_pack_index *m_search;
+
+   if (!core_multi_pack_index)
+   return 0;
+
+   for (m_search = m; m_search; m_search = m_search->next)
+   if (!strcmp(object_dir, m_search->object_dir))
+   return 1;
+
+   r->objects->multi_pack_index = load_multi_pack_index(object_dir);
+
+   if (r->objects->multi_pack_index) {
+   r->objects->multi_pack_index->next = m;
+   return 1;
+   }
+
+   return 0;
+}
+
 static size_t write_midx_header(struct hashfile *f,
unsigned char num_chunks,
uint32_t num_packs)
diff --git a/midx.h b/midx.h
index 2d83dd9ec1..731ad6f094 100644
--- a/midx.h
+++ b/midx.h
@@ -1,9 +1,12 @@
 #ifndef __MIDX_H__
 #define __MIDX_H__
 
+#include "repository.h"
+
 struct multi_pack_index;
 
 struct multi_pack_index *load_multi_pack_index(const char *object_dir);
+int prepare_multi_pack_index_one(struct repository *r, const char *object_dir);
 
 int write_midx_file(const char *object_dir);
 
diff --git a/object-store.h b/object-store.h
index 07bcc80e02..7d67ad7aa9 100644
--- a/object-store.h
+++ b/object-store.h
@@ -85,6 +85,8 @@ struct packed_git {
 };
 
 struct multi_pack_index {
+   struct multi_pack_index *next;
+
int fd;
 
const unsigned char *data;
@@ -126,6 +128,13 @@ struct raw_object_store {
 */
struct oidmap *replace_map;
 
+   /*
+* private data
+*
+* should only be accessed directly by packfile.c and midx.c
+*/
+   struct multi_pack_index *multi_pack_index;
+
/*
 * private data
 *
diff --git a/packfile.c b/packfile.c
index 3d652212c6..5d4493dbf4 100644
--- a/packfile.c
+++ b/packfile.c
@@ -15,6 +15,7 @@
 #include "tree-walk.h"
 #include "tree.h"
 #include "object-store.h"
+#include "midx.h"
 
 char *odb_pack_name(struct strbuf *buf,
const unsigned char *sha1,
@@ -935,10 +936,13 @@ static void prepare_packed_git(struct repository *r)
 
if (r->objects->packed_git_initialized)
return;
+   prepare_multi_pack_index_one(r, r->objects->objectdir);
prepare_packed_git_one(r, r->objects->objectdir, 1);
prepare_alt_odb(r);
-   for (alt = r->objects->alt_odb_list; alt; alt = alt->next)
+   for (alt = r->objects->alt_odb_list; alt; alt = alt->next) {
+   prepare_multi_pack_index_one(r, alt->path);
prepare_packed_git_one(r, alt->path, 0);
+   }
rearrange_packed_git(r);
prepare_packed_git_mru(r);
r->objects->packed_git_initialized = 1;
-- 
2.18.0.118.gd4f65b8d14



[PATCH v3 24/24] midx: clear midx on repack

2018-07-05 Thread Derrick Stolee
If a 'git repack' command replaces existing packfiles, then we must
clear the existing multi-pack-index before moving the packfiles it
references.

Signed-off-by: Derrick Stolee 
---
 builtin/repack.c |  8 
 midx.c   | 12 
 midx.h   |  1 +
 3 files changed, 21 insertions(+)

diff --git a/builtin/repack.c b/builtin/repack.c
index 6c636e159e..66a7d8e8ea 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -8,6 +8,7 @@
 #include "strbuf.h"
 #include "string-list.h"
 #include "argv-array.h"
+#include "midx.h"
 
 static int delta_base_offset = 1;
 static int pack_kept_objects = -1;
@@ -174,6 +175,7 @@ int cmd_repack(int argc, const char **argv, const char 
*prefix)
int no_update_server_info = 0;
int quiet = 0;
int local = 0;
+   int midx_cleared = 0;
 
struct option builtin_repack_options[] = {
OPT_BIT('a', NULL, _everything,
@@ -340,6 +342,12 @@ int cmd_repack(int argc, const char **argv, const char 
*prefix)
continue;
}
 
+   if (!midx_cleared) {
+   /* if we move a packfile, it will invalidated 
the midx */
+   clear_midx_file(get_object_directory());
+   midx_cleared = 1;
+   }
+
fname_old = mkpathdup("%s/old-%s%s", packdir,
item->string, exts[ext].name);
if (file_exists(fname_old))
diff --git a/midx.c b/midx.c
index 7c00b02436..8afd08f3fe 100644
--- a/midx.c
+++ b/midx.c
@@ -903,3 +903,15 @@ int write_midx_file(const char *object_dir)
free(midx_name);
return 0;
 }
+
+void clear_midx_file(const char *object_dir)
+{
+   char *midx = get_midx_filename(object_dir);
+
+   if (remove_path(midx)) {
+   UNLEAK(midx);
+   die(_("failed to clear multi-pack-index at %s"), midx);
+   }
+
+   free(midx);
+}
diff --git a/midx.h b/midx.h
index 5faffb7bc6..5a42cbed1d 100644
--- a/midx.h
+++ b/midx.h
@@ -15,5 +15,6 @@ int midx_contains_pack(struct multi_pack_index *m, const char 
*idx_name);
 int prepare_multi_pack_index_one(struct repository *r, const char *object_dir);
 
 int write_midx_file(const char *object_dir);
+void clear_midx_file(const char *object_dir);
 
 #endif
-- 
2.18.0.118.gd4f65b8d14



[PATCH v3 21/24] midx: use midx in approximate_object_count

2018-07-05 Thread Derrick Stolee
Signed-off-by: Derrick Stolee 
---
 packfile.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/packfile.c b/packfile.c
index c0eb5ac885..97e7812b6b 100644
--- a/packfile.c
+++ b/packfile.c
@@ -861,10 +861,13 @@ unsigned long approximate_object_count(void)
 {
if (!the_repository->objects->approximate_object_count_valid) {
unsigned long count;
+   struct multi_pack_index *m;
struct packed_git *p;
 
prepare_packed_git(the_repository);
count = 0;
+   for (m = get_multi_pack_index(the_repository); m; m = m->next)
+   count += m->num_objects;
for (p = the_repository->objects->packed_git; p; p = p->next) {
if (open_pack_index(p))
continue;
-- 
2.18.0.118.gd4f65b8d14



[PATCH v3 20/24] midx: use existing midx when writing new one

2018-07-05 Thread Derrick Stolee
Due to how Windows handles replacing a lockfile when there is an open
handle, create the close_midx() method to close the existing midx before
writing the new one.

Signed-off-by: Derrick Stolee 
---
 midx.c | 116 ++---
 midx.h |   1 +
 2 files changed, 111 insertions(+), 6 deletions(-)

diff --git a/midx.c b/midx.c
index e66025f066..7c00b02436 100644
--- a/midx.c
+++ b/midx.c
@@ -181,6 +181,23 @@ struct multi_pack_index *load_multi_pack_index(const char 
*object_dir)
return NULL;
 }
 
+static void close_midx(struct multi_pack_index *m)
+{
+   uint32_t i;
+   munmap((unsigned char *)m->data, m->data_len);
+   close(m->fd);
+   m->fd = -1;
+
+   for (i = 0; i < m->num_packs; i++) {
+   if (m->packs[i]) {
+   close_pack(m->packs[i]);
+   free(m->packs);
+   }
+   }
+   FREE_AND_NULL(m->packs);
+   FREE_AND_NULL(m->pack_names);
+}
+
 static int prepare_midx_pack(struct multi_pack_index *m, uint32_t pack_int_id)
 {
struct strbuf pack_name = STRBUF_INIT;
@@ -280,6 +297,29 @@ int fill_midx_entry(const struct object_id *oid, struct 
pack_entry *e, struct mu
return nth_midxed_pack_entry(m, e, pos);
 }
 
+int midx_contains_pack(struct multi_pack_index *m, const char *idx_name)
+{
+   uint32_t first = 0, last = m->num_packs;
+
+   while (first < last) {
+   uint32_t mid = first + (last - first) / 2;
+   const char *current;
+   int cmp;
+
+   current = m->pack_names[mid];
+   cmp = strcmp(idx_name, current);
+   if (!cmp)
+   return 1;
+   if (cmp > 0) {
+   first = mid + 1;
+   continue;
+   }
+   last = mid;
+   }
+
+   return 0;
+}
+
 int prepare_multi_pack_index_one(struct repository *r, const char *object_dir)
 {
struct multi_pack_index *m = r->objects->multi_pack_index;
@@ -326,6 +366,7 @@ struct pack_list {
uint32_t alloc_list;
uint32_t alloc_names;
size_t pack_name_concat_len;
+   struct multi_pack_index *m;
 };
 
 static void add_pack_to_midx(const char *full_path, size_t full_path_len,
@@ -334,6 +375,9 @@ static void add_pack_to_midx(const char *full_path, size_t 
full_path_len,
struct pack_list *packs = (struct pack_list *)data;
 
if (ends_with(file_name, ".idx")) {
+   if (packs->m && midx_contains_pack(packs->m, file_name))
+   return;
+
ALLOC_GROW(packs->list, packs->nr + 1, packs->alloc_list);
ALLOC_GROW(packs->names, packs->nr + 1, packs->alloc_names);
 
@@ -418,6 +462,23 @@ static int midx_oid_compare(const void *_a, const void *_b)
return a->pack_int_id - b->pack_int_id;
 }
 
+static int nth_midxed_pack_midx_entry(struct multi_pack_index *m,
+ uint32_t *pack_perm,
+ struct pack_midx_entry *e,
+ uint32_t pos)
+{
+   if (pos >= m->num_objects)
+   return 1;
+
+   nth_midxed_object_oid(>oid, m, pos);
+   e->pack_int_id = pack_perm[nth_midxed_pack_int_id(m, pos)];
+   e->offset = nth_midxed_offset(m, pos);
+
+   /* consider objects in midx to be from "old" packs */
+   e->pack_mtime = 0;
+   return 0;
+}
+
 static void fill_pack_entry(uint32_t pack_int_id,
struct packed_git *p,
uint32_t cur_object,
@@ -443,7 +504,8 @@ static void fill_pack_entry(uint32_t pack_int_id,
  * Copy only the de-duplicated entries (selected by most-recent modified time
  * of a packfile containing the object).
  */
-static struct pack_midx_entry *get_sorted_entries(struct packed_git **p,
+static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
+ struct packed_git **p,
  uint32_t *perm,
  uint32_t nr_packs,
  uint32_t *nr_objects)
@@ -452,8 +514,9 @@ static struct pack_midx_entry *get_sorted_entries(struct 
packed_git **p,
uint32_t alloc_fanout, alloc_objects, total_objects = 0;
struct pack_midx_entry *entries_by_fanout = NULL;
struct pack_midx_entry *deduplicated_entries = NULL;
+   uint32_t start_pack = m ? m->num_packs : 0;
 
-   for (cur_pack = 0; cur_pack < nr_packs; cur_pack++)
+   for (cur_pack = start_pack; cur_pack < nr_packs; cur_pack++)
total_objects += p[cur_pack]->num_objects;
 
/*
@@ -470,7 +533,23 @@ static struct pack_midx_entry *get_sorted_entries(struct 
packed_git **p,
for (cur_fanout = 0; cur_fanout < 256; cur_fanout++) {
uint32_t nr_fanout = 

[PATCH v3 18/24] midx: read objects from multi-pack-index

2018-07-05 Thread Derrick Stolee
Signed-off-by: Derrick Stolee 
---
 midx.c | 91 +-
 midx.h |  2 ++
 object-store.h |  1 +
 packfile.c |  8 -
 4 files changed, 100 insertions(+), 2 deletions(-)

diff --git a/midx.c b/midx.c
index d5a61c0c53..84b045060a 100644
--- a/midx.c
+++ b/midx.c
@@ -4,7 +4,7 @@
 #include "lockfile.h"
 #include "packfile.h"
 #include "object-store.h"
-#include "packfile.h"
+#include "sha1-lookup.h"
 #include "midx.h"
 
 #define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
@@ -152,6 +152,7 @@ struct multi_pack_index *load_multi_pack_index(const char 
*object_dir)
m->num_objects = ntohl(m->chunk_oid_fanout[255]);
 
m->pack_names = xcalloc(m->num_packs, sizeof(*m->pack_names));
+   m->packs = xcalloc(m->num_packs, sizeof(*m->packs));
 
cur_pack_name = (const char *)m->chunk_pack_names;
for (i = 0; i < m->num_packs; i++) {
@@ -180,6 +181,94 @@ struct multi_pack_index *load_multi_pack_index(const char 
*object_dir)
return NULL;
 }
 
+static int prepare_midx_pack(struct multi_pack_index *m, uint32_t pack_int_id)
+{
+   struct strbuf pack_name = STRBUF_INIT;
+
+   if (pack_int_id >= m->num_packs)
+   BUG("bad pack-int-id");
+
+   if (m->packs[pack_int_id])
+   return 0;
+
+   strbuf_addf(_name, "%s/pack/%s", m->object_dir,
+   m->pack_names[pack_int_id]);
+
+   m->packs[pack_int_id] = add_packed_git(pack_name.buf, pack_name.len, 1);
+   strbuf_release(_name);
+   return !m->packs[pack_int_id];
+}
+
+int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m, 
uint32_t *result)
+{
+   return bsearch_hash(oid->hash, m->chunk_oid_fanout, m->chunk_oid_lookup,
+   MIDX_HASH_LEN, result);
+}
+
+static off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos)
+{
+   const unsigned char *offset_data;
+   uint32_t offset32;
+
+   offset_data = m->chunk_object_offsets + pos * MIDX_CHUNK_OFFSET_WIDTH;
+   offset32 = get_be32(offset_data + sizeof(uint32_t));
+
+   if (m->chunk_large_offsets && offset32 & MIDX_LARGE_OFFSET_NEEDED) {
+   if (sizeof(offset32) < sizeof(uint64_t))
+   die(_("multi-pack-index stores a 64-bit offset, but 
off_t is too small"));
+
+   offset32 ^= MIDX_LARGE_OFFSET_NEEDED;
+   return get_be64(m->chunk_large_offsets + sizeof(uint64_t) * 
offset32);
+   }
+
+   return offset32;
+}
+
+static uint32_t nth_midxed_pack_int_id(struct multi_pack_index *m, uint32_t 
pos)
+{
+   return get_be32(m->chunk_object_offsets + pos * 
MIDX_CHUNK_OFFSET_WIDTH);
+}
+
+static int nth_midxed_pack_entry(struct multi_pack_index *m, struct pack_entry 
*e, uint32_t pos)
+{
+   uint32_t pack_int_id;
+   struct packed_git *p;
+
+   if (pos >= m->num_objects)
+   return 0;
+
+   pack_int_id = nth_midxed_pack_int_id(m, pos);
+
+   if (prepare_midx_pack(m, pack_int_id))
+   die(_("error preparing packfile from multi-pack-index"));
+   p = m->packs[pack_int_id];
+
+   /*
+   * We are about to tell the caller where they can locate the
+   * requested object.  We better make sure the packfile is
+   * still here and can be accessed before supplying that
+   * answer, as it may have been deleted since the MIDX was
+   * loaded!
+   */
+   if (!is_pack_valid(p))
+   return 0;
+
+   e->offset = nth_midxed_offset(m, pos);
+   e->p = p;
+
+   return 1;
+}
+
+int fill_midx_entry(const struct object_id *oid, struct pack_entry *e, struct 
multi_pack_index *m)
+{
+   uint32_t pos;
+
+   if (!bsearch_midx(oid, m, ))
+   return 0;
+
+   return nth_midxed_pack_entry(m, e, pos);
+}
+
 int prepare_multi_pack_index_one(struct repository *r, const char *object_dir)
 {
struct multi_pack_index *m = r->objects->multi_pack_index;
diff --git a/midx.h b/midx.h
index 731ad6f094..6b74a0640f 100644
--- a/midx.h
+++ b/midx.h
@@ -6,6 +6,8 @@
 struct multi_pack_index;
 
 struct multi_pack_index *load_multi_pack_index(const char *object_dir);
+int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m, 
uint32_t *result);
+int fill_midx_entry(const struct object_id *oid, struct pack_entry *e, struct 
multi_pack_index *m);
 int prepare_multi_pack_index_one(struct repository *r, const char *object_dir);
 
 int write_midx_file(const char *object_dir);
diff --git a/object-store.h b/object-store.h
index 7d67ad7aa9..03cc278758 100644
--- a/object-store.h
+++ b/object-store.h
@@ -106,6 +106,7 @@ struct multi_pack_index {
const unsigned char *chunk_large_offsets;
 
const char **pack_names;
+   struct packed_git **packs;
char object_dir[FLEX_ARRAY];
 };
 
diff --git a/packfile.c b/packfile.c
index 5d4493dbf4..bc763d91b9 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1902,11 +1902,17 @@ static int 

[PATCH v3 15/24] midx: write object offsets

2018-07-05 Thread Derrick Stolee
The final pair of chunks for the multi-pack-index file stores the object
offsets. We default to using 32-bit offsets as in the pack-index version
1 format, but if there exists an offset larger than 32-bits, we use a
trick similar to the pack-index version 2 format by storing all offsets
at least 2^31 in a 64-bit table; we use the 32-bit table to point into
that 64-bit table as necessary.

We only store these 64-bit offsets if necessary, so create a test that
manipulates a version 2 pack-index to fake a large offset. This allows
us to test that the large offset table is created, but the data does not
match the actual packfile offsets. The multi-pack-index offset does match
the (corrupted) pack-index offset, so a future feature will compare these
offsets during a 'verify' step.

Signed-off-by: Derrick Stolee 
---
 Documentation/technical/pack-format.txt |  15 +++-
 midx.c  | 100 +++-
 object-store.h  |   2 +
 t/helper/test-read-midx.c   |   4 +
 t/t5319-multi-pack-index.sh |  44 ---
 5 files changed, 150 insertions(+), 15 deletions(-)

diff --git a/Documentation/technical/pack-format.txt 
b/Documentation/technical/pack-format.txt
index 3215f7bfcd..cab5bdd2ff 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -311,7 +311,20 @@ CHUNK DATA:
The OIDs for all objects in the MIDX are stored in lexicographic
order in this chunk.
 
-   (This section intentionally left incomplete.)
+   Object Offsets (ID: {'O', 'O', 'F', 'F'})
+   Stores two 4-byte values for every object.
+   1: The pack-int-id for the pack storing this object.
+   2: The offset within the pack.
+   If all offsets are less than 2^31, then the large offset chunk
+   will not exist and offsets are stored as in IDX v1.
+   If there is at least one offset value larger than 2^32-1, then
+   the large offset chunk must exist. If the large offset chunk
+   exists and the 31st bit is on, then removing that bit reveals
+   the row in the large offsets containing the 8-byte offset of
+   this object.
+
+   [Optional] Object Large Offsets (ID: {'L', 'O', 'F', 'F'})
+   8-byte offsets into large packfiles.
 
 TRAILER:
 
diff --git a/midx.c b/midx.c
index 404147bb9f..cc35abe7a2 100644
--- a/midx.c
+++ b/midx.c
@@ -18,13 +18,18 @@
 #define MIDX_HASH_LEN 20
 #define MIDX_MIN_SIZE (MIDX_HEADER_SIZE + MIDX_HASH_LEN)
 
-#define MIDX_MAX_CHUNKS 3
+#define MIDX_MAX_CHUNKS 5
 #define MIDX_CHUNK_ALIGNMENT 4
 #define MIDX_CHUNKID_PACKNAMES 0x504e414d /* "PNAM" */
 #define MIDX_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */
 #define MIDX_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */
+#define MIDX_CHUNKID_OBJECTOFFSETS 0x4f4f4646 /* "OOFF" */
+#define MIDX_CHUNKID_LARGEOFFSETS 0x4c4f4646 /* "LOFF" */
 #define MIDX_CHUNKLOOKUP_WIDTH (sizeof(uint32_t) + sizeof(uint64_t))
 #define MIDX_CHUNK_FANOUT_SIZE (sizeof(uint32_t) * 256)
+#define MIDX_CHUNK_OFFSET_WIDTH (2 * sizeof(uint32_t))
+#define MIDX_CHUNK_LARGE_OFFSET_WIDTH (sizeof(uint64_t))
+#define MIDX_LARGE_OFFSET_NEEDED 0x8000
 
 static char *get_midx_filename(const char *object_dir)
 {
@@ -114,6 +119,14 @@ struct multi_pack_index *load_multi_pack_index(const char 
*object_dir)
m->chunk_oid_lookup = m->data + chunk_offset;
break;
 
+   case MIDX_CHUNKID_OBJECTOFFSETS:
+   m->chunk_object_offsets = m->data + 
chunk_offset;
+   break;
+
+   case MIDX_CHUNKID_LARGEOFFSETS:
+   m->chunk_large_offsets = m->data + chunk_offset;
+   break;
+
case 0:
die(_("terminating multi-pack-index chunk id 
appears earlier than expected"));
break;
@@ -133,6 +146,8 @@ struct multi_pack_index *load_multi_pack_index(const char 
*object_dir)
die(_("multi-pack-index missing required OID fanout chunk"));
if (!m->chunk_oid_lookup)
die(_("multi-pack-index missing required OID lookup chunk"));
+   if (!m->chunk_object_offsets)
+   die(_("multi-pack-index missing required object offsets 
chunk"));
 
m->num_objects = ntohl(m->chunk_oid_fanout[255]);
 
@@ -456,6 +471,56 @@ static size_t write_midx_oid_lookup(struct hashfile *f, 
unsigned char hash_len,
return written;
 }
 
+static size_t write_midx_object_offsets(struct hashfile *f, int 
large_offset_needed,
+   struct pack_midx_entry *objects, 
uint32_t nr_objects)
+{
+   struct pack_midx_entry *list = objects;
+   uint32_t i, nr_large_offset = 0;
+   size_t written = 0;
+
+   for (i 

[PATCH v3 19/24] midx: use midx in abbreviation calculations

2018-07-05 Thread Derrick Stolee
Signed-off-by: Derrick Stolee 
---
 midx.c  | 11 ++
 midx.h  |  3 ++
 packfile.c  |  6 
 packfile.h  |  1 +
 sha1-name.c | 70 +
 t/t5319-multi-pack-index.sh |  3 +-
 6 files changed, 93 insertions(+), 1 deletion(-)

diff --git a/midx.c b/midx.c
index 84b045060a..e66025f066 100644
--- a/midx.c
+++ b/midx.c
@@ -205,6 +205,17 @@ int bsearch_midx(const struct object_id *oid, struct 
multi_pack_index *m, uint32
MIDX_HASH_LEN, result);
 }
 
+struct object_id *nth_midxed_object_oid(struct object_id *oid,
+   struct multi_pack_index *m,
+   uint32_t n)
+{
+   if (n >= m->num_objects)
+   return NULL;
+
+   hashcpy(oid->hash, m->chunk_oid_lookup + m->hash_len * n);
+   return oid;
+}
+
 static off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos)
 {
const unsigned char *offset_data;
diff --git a/midx.h b/midx.h
index 6b74a0640f..f7c2ec7893 100644
--- a/midx.h
+++ b/midx.h
@@ -7,6 +7,9 @@ struct multi_pack_index;
 
 struct multi_pack_index *load_multi_pack_index(const char *object_dir);
 int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m, 
uint32_t *result);
+struct object_id *nth_midxed_object_oid(struct object_id *oid,
+   struct multi_pack_index *m,
+   uint32_t n);
 int fill_midx_entry(const struct object_id *oid, struct pack_entry *e, struct 
multi_pack_index *m);
 int prepare_multi_pack_index_one(struct repository *r, const char *object_dir);
 
diff --git a/packfile.c b/packfile.c
index bc763d91b9..c0eb5ac885 100644
--- a/packfile.c
+++ b/packfile.c
@@ -961,6 +961,12 @@ struct packed_git *get_packed_git(struct repository *r)
return r->objects->packed_git;
 }
 
+struct multi_pack_index *get_multi_pack_index(struct repository *r)
+{
+   prepare_packed_git(r);
+   return r->objects->multi_pack_index;
+}
+
 struct list_head *get_packed_git_mru(struct repository *r)
 {
prepare_packed_git(r);
diff --git a/packfile.h b/packfile.h
index b0eed44c0b..046280caf3 100644
--- a/packfile.h
+++ b/packfile.h
@@ -45,6 +45,7 @@ extern void install_packed_git(struct repository *r, struct 
packed_git *pack);
 
 struct packed_git *get_packed_git(struct repository *r);
 struct list_head *get_packed_git_mru(struct repository *r);
+struct multi_pack_index *get_multi_pack_index(struct repository *r);
 
 /*
  * Give a rough count of objects in the repository. This sacrifices accuracy
diff --git a/sha1-name.c b/sha1-name.c
index 60d9ef3c7e..7dc71201e6 100644
--- a/sha1-name.c
+++ b/sha1-name.c
@@ -12,6 +12,7 @@
 #include "packfile.h"
 #include "object-store.h"
 #include "repository.h"
+#include "midx.h"
 
 static int get_oid_oneline(const char *, struct object_id *, struct 
commit_list *);
 
@@ -149,6 +150,32 @@ static int match_sha(unsigned len, const unsigned char *a, 
const unsigned char *
return 1;
 }
 
+static void unique_in_midx(struct multi_pack_index *m,
+  struct disambiguate_state *ds)
+{
+   uint32_t num, i, first = 0;
+   const struct object_id *current = NULL;
+   num = m->num_objects;
+
+   if (!num)
+   return;
+
+   bsearch_midx(>bin_pfx, m, );
+
+   /*
+* At this point, "first" is the location of the lowest object
+* with an object name that could match "bin_pfx".  See if we have
+* 0, 1 or more objects that actually match(es).
+*/
+   for (i = first; i < num && !ds->ambiguous; i++) {
+   struct object_id oid;
+   current = nth_midxed_object_oid(, m, i);
+   if (!match_sha(ds->len, ds->bin_pfx.hash, current->hash))
+   break;
+   update_candidates(ds, current);
+   }
+}
+
 static void unique_in_pack(struct packed_git *p,
   struct disambiguate_state *ds)
 {
@@ -177,8 +204,12 @@ static void unique_in_pack(struct packed_git *p,
 
 static void find_short_packed_object(struct disambiguate_state *ds)
 {
+   struct multi_pack_index *m;
struct packed_git *p;
 
+   for (m = get_multi_pack_index(the_repository); m && !ds->ambiguous;
+m = m->next)
+   unique_in_midx(m, ds);
for (p = get_packed_git(the_repository); p && !ds->ambiguous;
 p = p->next)
unique_in_pack(p, ds);
@@ -527,6 +558,42 @@ static int extend_abbrev_len(const struct object_id *oid, 
void *cb_data)
return 0;
 }
 
+static void find_abbrev_len_for_midx(struct multi_pack_index *m,
+struct min_abbrev_data *mad)
+{
+   int match = 0;
+   uint32_t num, first = 0;
+   struct object_id oid;
+   const struct object_id *mad_oid;
+
+   if 

[PATCH v3 14/24] midx: write object id fanout chunk

2018-07-05 Thread Derrick Stolee
Signed-off-by: Derrick Stolee 
---
 Documentation/technical/pack-format.txt |  5 +++
 midx.c  | 53 +++--
 object-store.h  |  1 +
 t/helper/test-read-midx.c   |  4 +-
 t/t5319-multi-pack-index.sh | 16 
 5 files changed, 68 insertions(+), 11 deletions(-)

diff --git a/Documentation/technical/pack-format.txt 
b/Documentation/technical/pack-format.txt
index 78ee0489c6..3215f7bfcd 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -302,6 +302,11 @@ CHUNK DATA:
name. This is the only chunk not guaranteed to be a multiple of four
bytes in length, so should be the last chunk for alignment reasons.
 
+   OID Fanout (ID: {'O', 'I', 'D', 'F'})
+   The ith entry, F[i], stores the number of OIDs with first
+   byte at most i. Thus F[255] stores the total
+   number of objects.
+
OID Lookup (ID: {'O', 'I', 'D', 'L'})
The OIDs for all objects in the MIDX are stored in lexicographic
order in this chunk.
diff --git a/midx.c b/midx.c
index 7606addab6..404147bb9f 100644
--- a/midx.c
+++ b/midx.c
@@ -18,11 +18,13 @@
 #define MIDX_HASH_LEN 20
 #define MIDX_MIN_SIZE (MIDX_HEADER_SIZE + MIDX_HASH_LEN)
 
-#define MIDX_MAX_CHUNKS 2
+#define MIDX_MAX_CHUNKS 3
 #define MIDX_CHUNK_ALIGNMENT 4
 #define MIDX_CHUNKID_PACKNAMES 0x504e414d /* "PNAM" */
+#define MIDX_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */
 #define MIDX_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */
 #define MIDX_CHUNKLOOKUP_WIDTH (sizeof(uint32_t) + sizeof(uint64_t))
+#define MIDX_CHUNK_FANOUT_SIZE (sizeof(uint32_t) * 256)
 
 static char *get_midx_filename(const char *object_dir)
 {
@@ -104,6 +106,10 @@ struct multi_pack_index *load_multi_pack_index(const char 
*object_dir)
m->chunk_pack_names = m->data + chunk_offset;
break;
 
+   case MIDX_CHUNKID_OIDFANOUT:
+   m->chunk_oid_fanout = (uint32_t *)(m->data + 
chunk_offset);
+   break;
+
case MIDX_CHUNKID_OIDLOOKUP:
m->chunk_oid_lookup = m->data + chunk_offset;
break;
@@ -123,9 +129,13 @@ struct multi_pack_index *load_multi_pack_index(const char 
*object_dir)
 
if (!m->chunk_pack_names)
die(_("multi-pack-index missing required pack-name chunk"));
+   if (!m->chunk_oid_fanout)
+   die(_("multi-pack-index missing required OID fanout chunk"));
if (!m->chunk_oid_lookup)
die(_("multi-pack-index missing required OID lookup chunk"));
 
+   m->num_objects = ntohl(m->chunk_oid_fanout[255]);
+
m->pack_names = xcalloc(m->num_packs, sizeof(*m->pack_names));
 
cur_pack_name = (const char *)m->chunk_pack_names;
@@ -391,6 +401,35 @@ static size_t write_midx_pack_names(struct hashfile *f,
return written;
 }
 
+static size_t write_midx_oid_fanout(struct hashfile *f,
+   struct pack_midx_entry *objects,
+   uint32_t nr_objects)
+{
+   struct pack_midx_entry *list = objects;
+   struct pack_midx_entry *last = objects + nr_objects;
+   uint32_t count = 0;
+   uint32_t i;
+
+   /*
+   * Write the first-level table (the list is sorted,
+   * but we use a 256-entry lookup to be able to avoid
+   * having to do eight extra binary search iterations).
+   */
+   for (i = 0; i < 256; i++) {
+   struct pack_midx_entry *next = list;
+
+   while (next < last && next->oid.hash[0] == i) {
+   count++;
+   next++;
+   }
+
+   hashwrite_be32(f, count);
+   list = next;
+   }
+
+   return MIDX_CHUNK_FANOUT_SIZE;
+}
+
 static size_t write_midx_oid_lookup(struct hashfile *f, unsigned char hash_len,
struct pack_midx_entry *objects,
uint32_t nr_objects)
@@ -463,7 +502,7 @@ int write_midx_file(const char *object_dir)
FREE_AND_NULL(midx_name);
 
cur_chunk = 0;
-   num_chunks = 2;
+   num_chunks = 3;
 
written = write_midx_header(f, num_chunks, packs.nr);
 
@@ -471,9 +510,13 @@ int write_midx_file(const char *object_dir)
chunk_offsets[cur_chunk] = written + (num_chunks + 1) * 
MIDX_CHUNKLOOKUP_WIDTH;
 
cur_chunk++;
-   chunk_ids[cur_chunk] = MIDX_CHUNKID_OIDLOOKUP;
+   chunk_ids[cur_chunk] = MIDX_CHUNKID_OIDFANOUT;
chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + 
packs.pack_name_concat_len;
 
+   cur_chunk++;
+   chunk_ids[cur_chunk] = MIDX_CHUNKID_OIDLOOKUP;
+   chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + 
MIDX_CHUNK_FANOUT_SIZE;
+
 

[PATCH v3 10/24] multi-pack-index: write pack names in chunk

2018-07-05 Thread Derrick Stolee
The multi-pack-index needs to track which packfiles it indexes. Store
these in our first required chunk. Since filenames are not well
structured, add padding to keep good alignment in later chunks.

Modify the 'git multi-pack-index read' subcommand to output the
existence of the pack-file name chunk. Modify t5319-multi-pack-index.sh
to reflect this new output and the new expected number of chunks.

Defense in depth: A pattern we are using in the multi-pack-index feature
is to verify the data as we write it. We want to ensure we never write
invalid data to the multi-pack-index. There are many checks that verify
that the values we are writing fit the format definitions. This mainly
helps developers while working on the feature, but it can also identify
issues that only appear when dealing with very large data sets. These
large sets are hard to encode into test cases.

Signed-off-by: Derrick Stolee 
---
 Documentation/technical/pack-format.txt |   6 +
 midx.c  | 176 +++-
 object-store.h  |   2 +
 t/helper/test-read-midx.c   |   7 +
 t/t5319-multi-pack-index.sh |   3 +-
 5 files changed, 191 insertions(+), 3 deletions(-)

diff --git a/Documentation/technical/pack-format.txt 
b/Documentation/technical/pack-format.txt
index e060e693f4..6c5a77475f 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -296,6 +296,12 @@ CHUNK LOOKUP:
 
 CHUNK DATA:
 
+   Packfile Names (ID: {'P', 'N', 'A', 'M'})
+   Stores the packfile names as concatenated, null-terminated strings.
+   Packfiles must be listed in lexicographic order for fast lookups by
+   name. This is the only chunk not guaranteed to be a multiple of four
+   bytes in length, so should be the last chunk for alignment reasons.
+
(This section intentionally left incomplete.)
 
 TRAILER:
diff --git a/midx.c b/midx.c
index b0722485df..f78c161422 100644
--- a/midx.c
+++ b/midx.c
@@ -17,6 +17,11 @@
 #define MIDX_HASH_LEN 20
 #define MIDX_MIN_SIZE (MIDX_HEADER_SIZE + MIDX_HASH_LEN)
 
+#define MIDX_MAX_CHUNKS 1
+#define MIDX_CHUNK_ALIGNMENT 4
+#define MIDX_CHUNKID_PACKNAMES 0x504e414d /* "PNAM" */
+#define MIDX_CHUNKLOOKUP_WIDTH (sizeof(uint32_t) + sizeof(uint64_t))
+
 static char *get_midx_filename(const char *object_dir)
 {
return xstrfmt("%s/pack/multi-pack-index", object_dir);
@@ -31,6 +36,7 @@ struct multi_pack_index *load_multi_pack_index(const char 
*object_dir)
void *midx_map = NULL;
uint32_t hash_version;
char *midx_name = get_midx_filename(object_dir);
+   uint32_t i;
 
fd = git_open(midx_name);
 
@@ -84,6 +90,33 @@ struct multi_pack_index *load_multi_pack_index(const char 
*object_dir)
 
m->num_packs = get_be32(m->data + MIDX_BYTE_NUM_PACKS);
 
+   for (i = 0; i < m->num_chunks; i++) {
+   uint32_t chunk_id = get_be32(m->data + MIDX_HEADER_SIZE +
+MIDX_CHUNKLOOKUP_WIDTH * i);
+   uint64_t chunk_offset = get_be64(m->data + MIDX_HEADER_SIZE + 4 
+
+MIDX_CHUNKLOOKUP_WIDTH * i);
+
+   switch (chunk_id) {
+   case MIDX_CHUNKID_PACKNAMES:
+   m->chunk_pack_names = m->data + chunk_offset;
+   break;
+
+   case 0:
+   die(_("terminating multi-pack-index chunk id 
appears earlier than expected"));
+   break;
+
+   default:
+   /*
+* Do nothing on unrecognized chunks, allowing 
future
+* extensions to add optional chunks.
+*/
+   break;
+   }
+   }
+
+   if (!m->chunk_pack_names)
+   die(_("multi-pack-index missing required pack-name chunk"));
+
return m;
 
 cleanup_fail:
@@ -116,8 +149,11 @@ static size_t write_midx_header(struct hashfile *f,
 
 struct pack_list {
struct packed_git **list;
+   char **names;
uint32_t nr;
uint32_t alloc_list;
+   uint32_t alloc_names;
+   size_t pack_name_concat_len;
 };
 
 static void add_pack_to_midx(const char *full_path, size_t full_path_len,
@@ -127,6 +163,7 @@ static void add_pack_to_midx(const char *full_path, size_t 
full_path_len,
 
if (ends_with(file_name, ".idx")) {
ALLOC_GROW(packs->list, packs->nr + 1, packs->alloc_list);
+   ALLOC_GROW(packs->names, packs->nr + 1, packs->alloc_names);
 
packs->list[packs->nr] = add_packed_git(full_path,
 full_path_len,
@@ -136,17 +173,90 @@ static void add_pack_to_midx(const char *full_path, 
size_t full_path_len,

[PATCH v3 02/24] multi-pack-index: add format details

2018-07-05 Thread Derrick Stolee
The multi-pack-index feature generalizes the existing pack-index
feature by indexing objects across multiple pack-files.

Describe the basic file format, using a 12-byte header followed by
a lookup table for a list of "chunks" which will be described later.
The file ends with a footer containing a checksum using the hash
algorithm.

The header allows later versions to create breaking changes by
advancing the version number. We can also change the hash algorithm
using a different version value.

We will add the individual chunk format information as we introduce
the code that writes that information.

Signed-off-by: Derrick Stolee 
---
 Documentation/technical/pack-format.txt | 49 +
 1 file changed, 49 insertions(+)

diff --git a/Documentation/technical/pack-format.txt 
b/Documentation/technical/pack-format.txt
index 70a99fd142..e060e693f4 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -252,3 +252,52 @@ Pack file entry: <+
 corresponding packfile.
 
 20-byte SHA-1-checksum of all of the above.
+
+== multi-pack-index (MIDX) files have the following format:
+
+The multi-pack-index files refer to multiple pack-files and loose objects.
+
+In order to allow extensions that add extra data to the MIDX, we organize
+the body into "chunks" and provide a lookup table at the beginning of the
+body. The header includes certain length values, such as the number of packs,
+the number of base MIDX files, hash lengths and types.
+
+All 4-byte numbers are in network order.
+
+HEADER:
+
+   4-byte signature:
+   The signature is: {'M', 'I', 'D', 'X'}
+
+   1-byte version number:
+   Git only writes or recognizes version 1.
+
+   1-byte Object Id Version
+   Git only writes or recognizes version 1 (SHA1).
+
+   1-byte number of "chunks"
+
+   1-byte number of base multi-pack-index files:
+   This value is currently always zero.
+
+   4-byte number of pack files
+
+CHUNK LOOKUP:
+
+   (C + 1) * 12 bytes providing the chunk offsets:
+   First 4 bytes describe chunk id. Value 0 is a terminating label.
+   Other 8 bytes provide offset in current file for chunk to start.
+   (Chunks are provided in file-order, so you can infer the length
+   using the next chunk position if necessary.)
+
+   The remaining data in the body is described one chunk at a time, and
+   these chunks may be given in any order. Chunks are required unless
+   otherwise specified.
+
+CHUNK DATA:
+
+   (This section intentionally left incomplete.)
+
+TRAILER:
+
+   20-byte SHA1-checksum of the above contents.
-- 
2.18.0.118.gd4f65b8d14



[PATCH v3 06/24] multi-pack-index: load into memory

2018-07-05 Thread Derrick Stolee
Create a new multi_pack_index struct for loading multi-pack-indexes into
memory. Create a test-tool builtin for reading basic information about
that multi-pack-index to verify the correct data is written.

Signed-off-by: Derrick Stolee 
---
 Makefile|  1 +
 midx.c  | 82 +
 midx.h  |  4 ++
 object-store.h  | 16 
 t/helper/test-read-midx.c   | 34 +++
 t/helper/test-tool.c|  1 +
 t/helper/test-tool.h|  1 +
 t/t5319-multi-pack-index.sh | 12 +-
 8 files changed, 150 insertions(+), 1 deletion(-)
 create mode 100644 t/helper/test-read-midx.c

diff --git a/Makefile b/Makefile
index f5636c711d..0b801d1b16 100644
--- a/Makefile
+++ b/Makefile
@@ -717,6 +717,7 @@ TEST_BUILTINS_OBJS += test-online-cpus.o
 TEST_BUILTINS_OBJS += test-path-utils.o
 TEST_BUILTINS_OBJS += test-prio-queue.o
 TEST_BUILTINS_OBJS += test-read-cache.o
+TEST_BUILTINS_OBJS += test-read-midx.o
 TEST_BUILTINS_OBJS += test-ref-store.o
 TEST_BUILTINS_OBJS += test-regex.o
 TEST_BUILTINS_OBJS += test-revision-walking.o
diff --git a/midx.c b/midx.c
index f85f2d334d..fb388f5858 100644
--- a/midx.c
+++ b/midx.c
@@ -1,18 +1,100 @@
 #include "cache.h"
 #include "csum-file.h"
 #include "lockfile.h"
+#include "object-store.h"
 #include "midx.h"
 
 #define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
 #define MIDX_VERSION 1
+#define MIDX_BYTE_FILE_VERSION 4
+#define MIDX_BYTE_HASH_VERSION 5
+#define MIDX_BYTE_NUM_CHUNKS 6
+#define MIDX_BYTE_NUM_PACKS 8
 #define MIDX_HASH_VERSION 1
 #define MIDX_HEADER_SIZE 12
+#define MIDX_HASH_LEN 20
+#define MIDX_MIN_SIZE (MIDX_HEADER_SIZE + MIDX_HASH_LEN)
 
 static char *get_midx_filename(const char *object_dir)
 {
return xstrfmt("%s/pack/multi-pack-index", object_dir);
 }
 
+struct multi_pack_index *load_multi_pack_index(const char *object_dir)
+{
+   struct multi_pack_index *m = NULL;
+   int fd;
+   struct stat st;
+   size_t midx_size;
+   void *midx_map = NULL;
+   uint32_t hash_version;
+   char *midx_name = get_midx_filename(object_dir);
+
+   fd = git_open(midx_name);
+
+   if (fd < 0)
+   goto cleanup_fail;
+   if (fstat(fd, )) {
+   error_errno(_("failed to read %s"), midx_name);
+   goto cleanup_fail;
+   }
+
+   midx_size = xsize_t(st.st_size);
+
+   if (midx_size < MIDX_MIN_SIZE) {
+   close(fd);
+   error(_("multi-pack-index file %s is too small"), midx_name);
+   goto cleanup_fail;
+   }
+
+   FREE_AND_NULL(midx_name);
+
+   midx_map = xmmap(NULL, midx_size, PROT_READ, MAP_PRIVATE, fd, 0);
+
+   m = xcalloc(1, sizeof(*m) + strlen(object_dir) + 1);
+   strcpy(m->object_dir, object_dir);
+   m->fd = fd;
+   m->data = midx_map;
+   m->data_len = midx_size;
+
+   m->signature = get_be32(m->data);
+   if (m->signature != MIDX_SIGNATURE) {
+   error(_("multi-pack-index signature 0x%08x does not match 
signature 0x%08x"),
+ m->signature, MIDX_SIGNATURE);
+   goto cleanup_fail;
+   }
+
+   m->version = m->data[MIDX_BYTE_FILE_VERSION];
+   if (m->version != MIDX_VERSION) {
+   error(_("multi-pack-index version %d not recognized"),
+ m->version);
+   goto cleanup_fail;
+   }
+
+   hash_version = m->data[MIDX_BYTE_HASH_VERSION];
+   if (hash_version != MIDX_HASH_VERSION) {
+   error(_("hash version %u does not match"), hash_version);
+   goto cleanup_fail;
+   }
+   m->hash_len = MIDX_HASH_LEN;
+
+   m->num_chunks = m->data[MIDX_BYTE_NUM_CHUNKS];
+
+   m->num_packs = get_be32(m->data + MIDX_BYTE_NUM_PACKS);
+
+   return m;
+
+cleanup_fail:
+   /* no need to check for NULL when freeing */
+   free(m);
+   free(midx_name);
+   if (midx_map)
+   munmap(midx_map, midx_size);
+   if (0 <= fd)
+   close(fd);
+   return NULL;
+}
+
 static size_t write_midx_header(struct hashfile *f,
unsigned char num_chunks,
uint32_t num_packs)
diff --git a/midx.h b/midx.h
index dbdbe9f873..2d83dd9ec1 100644
--- a/midx.h
+++ b/midx.h
@@ -1,6 +1,10 @@
 #ifndef __MIDX_H__
 #define __MIDX_H__
 
+struct multi_pack_index;
+
+struct multi_pack_index *load_multi_pack_index(const char *object_dir);
+
 int write_midx_file(const char *object_dir);
 
 #endif
diff --git a/object-store.h b/object-store.h
index d683112fd7..4f410841cc 100644
--- a/object-store.h
+++ b/object-store.h
@@ -84,6 +84,22 @@ struct packed_git {
char pack_name[FLEX_ARRAY]; /* more */
 };
 
+struct multi_pack_index {
+   int fd;
+
+   const unsigned char *data;
+   size_t data_len;
+
+   uint32_t signature;
+   unsigned char version;
+   unsigned char hash_len;
+   unsigned 

[PATCH v3 09/24] multi-pack-index: read packfile list

2018-07-05 Thread Derrick Stolee
When constructing a multi-pack-index file for a given object directory,
read the files within the enclosed pack directory and find matches that
end with ".idx" and find the correct paired packfile using
add_packed_git().

Signed-off-by: Derrick Stolee 
---
 midx.c  | 46 -
 t/t5319-multi-pack-index.sh | 16 ++---
 2 files changed, 53 insertions(+), 9 deletions(-)

diff --git a/midx.c b/midx.c
index fb388f5858..b0722485df 100644
--- a/midx.c
+++ b/midx.c
@@ -1,6 +1,8 @@
 #include "cache.h"
 #include "csum-file.h"
+#include "dir.h"
 #include "lockfile.h"
+#include "packfile.h"
 #include "object-store.h"
 #include "midx.h"
 
@@ -112,12 +114,39 @@ static size_t write_midx_header(struct hashfile *f,
return MIDX_HEADER_SIZE;
 }
 
+struct pack_list {
+   struct packed_git **list;
+   uint32_t nr;
+   uint32_t alloc_list;
+};
+
+static void add_pack_to_midx(const char *full_path, size_t full_path_len,
+const char *file_name, void *data)
+{
+   struct pack_list *packs = (struct pack_list *)data;
+
+   if (ends_with(file_name, ".idx")) {
+   ALLOC_GROW(packs->list, packs->nr + 1, packs->alloc_list);
+
+   packs->list[packs->nr] = add_packed_git(full_path,
+full_path_len,
+0);
+   if (!packs->list[packs->nr]) {
+   warning(_("failed to add packfile '%s'"),
+   full_path);
+   return;
+   }
+   }
+}
+
 int write_midx_file(const char *object_dir)
 {
unsigned char num_chunks = 0;
char *midx_name;
+   uint32_t i;
struct hashfile *f = NULL;
struct lock_file lk;
+   struct pack_list packs;
 
midx_name = get_midx_filename(object_dir);
if (safe_create_leading_directories(midx_name)) {
@@ -126,14 +155,29 @@ int write_midx_file(const char *object_dir)
  midx_name);
}
 
+   packs.nr = 0;
+   packs.alloc_list = 16;
+   packs.list = NULL;
+   ALLOC_ARRAY(packs.list, packs.alloc_list);
+
+   for_each_file_in_pack_dir(object_dir, add_pack_to_midx, );
+
hold_lock_file_for_update(, midx_name, LOCK_DIE_ON_ERROR);
f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf);
FREE_AND_NULL(midx_name);
 
-   write_midx_header(f, num_chunks, 0);
+   write_midx_header(f, num_chunks, packs.nr);
 
finalize_hashfile(f, NULL, CSUM_FSYNC | CSUM_HASH_IN_STREAM);
commit_lock_file();
 
+   for (i = 0; i < packs.nr; i++) {
+   if (packs.list[i]) {
+   close_pack(packs.list[i]);
+   free(packs.list[i]);
+   }
+   }
+
+   free(packs.list);
return 0;
 }
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 1be7be02b8..fd0a3f3be7 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -4,8 +4,9 @@ test_description='multi-pack-indexes'
 . ./test-lib.sh
 
 midx_read_expect () {
+   NUM_PACKS=$1
cat >expect <<-EOF
-   header: 4d494458 1 0 0
+   header: 4d494458 1 0 $NUM_PACKS
object_dir: .
EOF
test-tool read-midx . >actual &&
@@ -15,8 +16,7 @@ midx_read_expect () {
 test_expect_success 'write midx with no packs' '
test_when_finished rm -f pack/multi-pack-index &&
git multi-pack-index --object-dir=. write &&
-   test_path_is_file pack/multi-pack-index &&
-   midx_read_expect
+   midx_read_expect 0
 '
 
 test_expect_success 'create objects' '
@@ -47,13 +47,13 @@ test_expect_success 'write midx with one v1 pack' '
pack=$(git pack-objects --index-version=1 pack/test obj-list &&
git update-ref HEAD $commit &&
-   git pack-objects --index-version=2 test-pack 

[PATCH v3 05/24] midx: write header information to lockfile

2018-07-05 Thread Derrick Stolee
As we begin writing the multi-pack-index format to disk, start with
the basics: the 12-byte header and the 20-byte checksum footer. Start
with these basics so we can add the rest of the format in small
increments.

As we implement the format, we will use a technique to check that our
computed offsets within the multi-pack-index file match what we are
actually writing. Each method that writes to the hashfile will return
the number of bytes written, and we will track that those values match
our expectations.

Currently, write_midx_header() returns 12, but is not checked. We will
check the return value in a later commit.

Signed-off-by: Derrick Stolee 
---
 midx.c  | 50 +
 t/t5319-multi-pack-index.sh |  3 ++-
 2 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/midx.c b/midx.c
index 32468db1a2..f85f2d334d 100644
--- a/midx.c
+++ b/midx.c
@@ -1,7 +1,57 @@
 #include "cache.h"
+#include "csum-file.h"
+#include "lockfile.h"
 #include "midx.h"
 
+#define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
+#define MIDX_VERSION 1
+#define MIDX_HASH_VERSION 1
+#define MIDX_HEADER_SIZE 12
+
+static char *get_midx_filename(const char *object_dir)
+{
+   return xstrfmt("%s/pack/multi-pack-index", object_dir);
+}
+
+static size_t write_midx_header(struct hashfile *f,
+   unsigned char num_chunks,
+   uint32_t num_packs)
+{
+   unsigned char byte_values[4];
+
+   hashwrite_be32(f, MIDX_SIGNATURE);
+   byte_values[0] = MIDX_VERSION;
+   byte_values[1] = MIDX_HASH_VERSION;
+   byte_values[2] = num_chunks;
+   byte_values[3] = 0; /* unused */
+   hashwrite(f, byte_values, sizeof(byte_values));
+   hashwrite_be32(f, num_packs);
+
+   return MIDX_HEADER_SIZE;
+}
+
 int write_midx_file(const char *object_dir)
 {
+   unsigned char num_chunks = 0;
+   char *midx_name;
+   struct hashfile *f = NULL;
+   struct lock_file lk;
+
+   midx_name = get_midx_filename(object_dir);
+   if (safe_create_leading_directories(midx_name)) {
+   UNLEAK(midx_name);
+   die_errno(_("unable to create leading directories of %s"),
+ midx_name);
+   }
+
+   hold_lock_file_for_update(, midx_name, LOCK_DIE_ON_ERROR);
+   f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf);
+   FREE_AND_NULL(midx_name);
+
+   write_midx_header(f, num_chunks, 0);
+
+   finalize_hashfile(f, NULL, CSUM_FSYNC | CSUM_HASH_IN_STREAM);
+   commit_lock_file();
+
return 0;
 }
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index ec3ddbe79c..8622a7cdce 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -4,7 +4,8 @@ test_description='multi-pack-indexes'
 . ./test-lib.sh
 
 test_expect_success 'write midx with no packs' '
-   git multi-pack-index --object-dir=. write
+   git multi-pack-index --object-dir=. write &&
+   test_path_is_file pack/multi-pack-index
 '
 
 test_done
-- 
2.18.0.118.gd4f65b8d14



[PATCH v3 03/24] multi-pack-index: add builtin

2018-07-05 Thread Derrick Stolee
This new 'git multi-pack-index' builtin will be the plumbing access
for writing, reading, and checking multi-pack-index files. The
initial implementation is a no-op.

Signed-off-by: Derrick Stolee 
---
 .gitignore |  3 +-
 Documentation/git-multi-pack-index.txt | 36 
 Makefile   |  1 +
 builtin.h  |  1 +
 builtin/multi-pack-index.c | 38 ++
 command-list.txt   |  1 +
 git.c  |  1 +
 7 files changed, 80 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/git-multi-pack-index.txt
 create mode 100644 builtin/multi-pack-index.c

diff --git a/.gitignore b/.gitignore
index 388cc4beee..25633bc515 100644
--- a/.gitignore
+++ b/.gitignore
@@ -99,8 +99,9 @@
 /git-mergetool--lib
 /git-mktag
 /git-mktree
-/git-name-rev
+/git-multi-pack-index
 /git-mv
+/git-name-rev
 /git-notes
 /git-p4
 /git-pack-redundant
diff --git a/Documentation/git-multi-pack-index.txt 
b/Documentation/git-multi-pack-index.txt
new file mode 100644
index 00..a83c0496a6
--- /dev/null
+++ b/Documentation/git-multi-pack-index.txt
@@ -0,0 +1,36 @@
+git-multi-pack-index(1)
+==
+
+NAME
+
+git-multi-pack-index - Write and verify multi-pack-indexes
+
+
+SYNOPSIS
+
+[verse]
+'git multi-pack-index' [--object-dir ]
+
+DESCRIPTION
+---
+Write or verify a multi-pack-index (MIDX) file.
+
+OPTIONS
+---
+
+--object-dir ::
+   Use given directory for the location of Git objects. We check
+   `/packs/multi-pack-index` for the current MIDX file, and
+   `/packs` for the pack-files to index.
+
+
+SEE ALSO
+
+See link:technical/multi-pack-index.html[The Multi-Pack-Index Design
+Document] and link:technical/pack-format.html[The Multi-Pack-Index
+Format] for more information on the multi-pack-index feature.
+
+
+GIT
+---
+Part of the linkgit:git[1] suite
diff --git a/Makefile b/Makefile
index e4b503d259..54610875ec 100644
--- a/Makefile
+++ b/Makefile
@@ -1047,6 +1047,7 @@ BUILTIN_OBJS += builtin/merge-recursive.o
 BUILTIN_OBJS += builtin/merge-tree.o
 BUILTIN_OBJS += builtin/mktag.o
 BUILTIN_OBJS += builtin/mktree.o
+BUILTIN_OBJS += builtin/multi-pack-index.o
 BUILTIN_OBJS += builtin/mv.o
 BUILTIN_OBJS += builtin/name-rev.o
 BUILTIN_OBJS += builtin/notes.o
diff --git a/builtin.h b/builtin.h
index 4e0f64723e..70997d7ace 100644
--- a/builtin.h
+++ b/builtin.h
@@ -191,6 +191,7 @@ extern int cmd_merge_recursive(int argc, const char **argv, 
const char *prefix);
 extern int cmd_merge_tree(int argc, const char **argv, const char *prefix);
 extern int cmd_mktag(int argc, const char **argv, const char *prefix);
 extern int cmd_mktree(int argc, const char **argv, const char *prefix);
+extern int cmd_multi_pack_index(int argc, const char **argv, const char 
*prefix);
 extern int cmd_mv(int argc, const char **argv, const char *prefix);
 extern int cmd_name_rev(int argc, const char **argv, const char *prefix);
 extern int cmd_notes(int argc, const char **argv, const char *prefix);
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
new file mode 100644
index 00..4853291477
--- /dev/null
+++ b/builtin/multi-pack-index.c
@@ -0,0 +1,38 @@
+#include "builtin.h"
+#include "cache.h"
+#include "config.h"
+#include "parse-options.h"
+
+static char const * const builtin_multi_pack_index_usage[] = {
+   N_("git multi-pack-index [--object-dir ]"),
+   NULL
+};
+
+static struct opts_multi_pack_index {
+   const char *object_dir;
+} opts;
+
+int cmd_multi_pack_index(int argc, const char **argv,
+const char *prefix)
+{
+   static struct option builtin_multi_pack_index_options[] = {
+   OPT_FILENAME(0, "object-dir", _dir,
+ N_("The object directory containing set of packfile and 
pack-index pairs")),
+   OPT_END(),
+   };
+
+   if (argc == 2 && !strcmp(argv[1], "-h"))
+   usage_with_options(builtin_multi_pack_index_usage,
+  builtin_multi_pack_index_options);
+
+   git_config(git_default_config, NULL);
+
+   argc = parse_options(argc, argv, prefix,
+builtin_multi_pack_index_options,
+builtin_multi_pack_index_usage, 0);
+
+   if (!opts.object_dir)
+   opts.object_dir = get_object_directory();
+
+   return 0;
+}
diff --git a/command-list.txt b/command-list.txt
index e1c26c1bb7..61071f8fa2 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -123,6 +123,7 @@ git-merge-index plumbingmanipulators
 git-merge-one-file  purehelpers
 git-mergetool   ancillarymanipulators   
complete
 git-merge-tree  ancillaryinterrogators
+git-multi-pack-indexplumbingmanipulators
 git-mktag 

[PATCH v3 12/24] midx: sort and deduplicate objects from packfiles

2018-07-05 Thread Derrick Stolee
Before writing a list of objects and their offsets to a multi-pack-index,
we need to collect the list of objects contained in the packfiles. There
may be multiple copies of some objects, so this list must be deduplicated.

It is possible to artificially get into a state where there are many
duplicate copies of objects. That can create high memory pressure if we
are to create a list of all objects before de-duplication. To reduce
this memory pressure without a significant performance drop,
automatically group objects by the first byte of their object id. Use
the IDX fanout tables to group the data, copy to a local array, then
sort.

Copy only the de-duplicated entries. Select the duplicate based on the
most-recent modified time of a packfile containing the object.

Signed-off-by: Derrick Stolee 
---
 midx.c | 131 -
 packfile.c |  17 +++
 packfile.h |   2 +
 3 files changed, 148 insertions(+), 2 deletions(-)

diff --git a/midx.c b/midx.c
index ffe29af65d..028e3aa5e9 100644
--- a/midx.c
+++ b/midx.c
@@ -4,6 +4,7 @@
 #include "lockfile.h"
 #include "packfile.h"
 #include "object-store.h"
+#include "packfile.h"
 #include "midx.h"
 
 #define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
@@ -183,14 +184,22 @@ static void add_pack_to_midx(const char *full_path, 
size_t full_path_len,
ALLOC_GROW(packs->names, packs->nr + 1, packs->alloc_names);
 
packs->list[packs->nr] = add_packed_git(full_path,
-full_path_len,
-0);
+   full_path_len, 0);
+
if (!packs->list[packs->nr]) {
warning(_("failed to add packfile '%s'"),
full_path);
return;
}
 
+   if (open_pack_index(packs->list[packs->nr])) {
+   warning(_("failed to open pack-index '%s'"),
+   full_path);
+   close_pack(packs->list[packs->nr]);
+   FREE_AND_NULL(packs->list[packs->nr]);
+   return;
+   }
+
packs->names[packs->nr] = xstrdup(file_name);
packs->pack_name_concat_len += strlen(file_name) + 1;
packs->nr++;
@@ -231,6 +240,119 @@ static void sort_packs_by_name(char **pack_names, 
uint32_t nr_packs, uint32_t *p
free(pairs);
 }
 
+struct pack_midx_entry {
+   struct object_id oid;
+   uint32_t pack_int_id;
+   time_t pack_mtime;
+   uint64_t offset;
+};
+
+static int midx_oid_compare(const void *_a, const void *_b)
+{
+   const struct pack_midx_entry *a = (const struct pack_midx_entry *)_a;
+   const struct pack_midx_entry *b = (const struct pack_midx_entry *)_b;
+   int cmp = oidcmp(>oid, >oid);
+
+   if (cmp)
+   return cmp;
+
+   if (a->pack_mtime > b->pack_mtime)
+   return -1;
+   else if (a->pack_mtime < b->pack_mtime)
+   return 1;
+
+   return a->pack_int_id - b->pack_int_id;
+}
+
+static void fill_pack_entry(uint32_t pack_int_id,
+   struct packed_git *p,
+   uint32_t cur_object,
+   struct pack_midx_entry *entry)
+{
+   if (!nth_packed_object_oid(>oid, p, cur_object))
+   die(_("failed to locate object %d in packfile"), cur_object);
+
+   entry->pack_int_id = pack_int_id;
+   entry->pack_mtime = p->mtime;
+
+   entry->offset = nth_packed_object_offset(p, cur_object);
+}
+
+/*
+ * It is possible to artificially get into a state where there are many
+ * duplicate copies of objects. That can create high memory pressure if
+ * we are to create a list of all objects before de-duplication. To reduce
+ * this memory pressure without a significant performance drop, automatically
+ * group objects by the first byte of their object id. Use the IDX fanout
+ * tables to group the data, copy to a local array, then sort.
+ *
+ * Copy only the de-duplicated entries (selected by most-recent modified time
+ * of a packfile containing the object).
+ */
+static struct pack_midx_entry *get_sorted_entries(struct packed_git **p,
+ uint32_t *perm,
+ uint32_t nr_packs,
+ uint32_t *nr_objects)
+{
+   uint32_t cur_fanout, cur_pack, cur_object;
+   uint32_t alloc_fanout, alloc_objects, total_objects = 0;
+   struct pack_midx_entry *entries_by_fanout = NULL;
+   struct pack_midx_entry *deduplicated_entries = NULL;
+
+   for (cur_pack = 0; cur_pack < nr_packs; cur_pack++)
+   total_objects += p[cur_pack]->num_objects;
+
+   /*
+* As we de-duplicate by fanout value, we expect the 

[PATCH v3 08/24] packfile: generalize pack directory list

2018-07-05 Thread Derrick Stolee
In anticipation of sharing the pack directory listing with the
multi-pack-index, generalize prepare_packed_git_one() into
for_each_file_in_pack_dir().

Signed-off-by: Derrick Stolee 
---
 packfile.c | 101 +
 packfile.h |   6 
 2 files changed, 69 insertions(+), 38 deletions(-)

diff --git a/packfile.c b/packfile.c
index 7cd45aa4b2..ee1ab9b804 100644
--- a/packfile.c
+++ b/packfile.c
@@ -738,13 +738,14 @@ static void report_pack_garbage(struct string_list *list)
report_helper(list, seen_bits, first, list->nr);
 }
 
-static void prepare_packed_git_one(struct repository *r, char *objdir, int 
local)
+void for_each_file_in_pack_dir(const char *objdir,
+  each_file_in_pack_dir_fn fn,
+  void *data)
 {
struct strbuf path = STRBUF_INIT;
size_t dirnamelen;
DIR *dir;
struct dirent *de;
-   struct string_list garbage = STRING_LIST_INIT_DUP;
 
strbuf_addstr(, objdir);
strbuf_addstr(, "/pack");
@@ -759,53 +760,77 @@ static void prepare_packed_git_one(struct repository *r, 
char *objdir, int local
strbuf_addch(, '/');
dirnamelen = path.len;
while ((de = readdir(dir)) != NULL) {
-   struct packed_git *p;
-   size_t base_len;
-
if (is_dot_or_dotdot(de->d_name))
continue;
 
strbuf_setlen(, dirnamelen);
strbuf_addstr(, de->d_name);
 
-   base_len = path.len;
-   if (strip_suffix_mem(path.buf, _len, ".idx")) {
-   /* Don't reopen a pack we already have. */
-   for (p = r->objects->packed_git; p;
-p = p->next) {
-   size_t len;
-   if (strip_suffix(p->pack_name, ".pack", ) &&
-   len == base_len &&
-   !memcmp(p->pack_name, path.buf, len))
-   break;
-   }
-   if (p == NULL &&
-   /*
-* See if it really is a valid .idx file with
-* corresponding .pack file that we can map.
-*/
-   (p = add_packed_git(path.buf, path.len, local)) != 
NULL)
-   install_packed_git(r, p);
-   }
-
-   if (!report_garbage)
-   continue;
-
-   if (ends_with(de->d_name, ".idx") ||
-   ends_with(de->d_name, ".pack") ||
-   ends_with(de->d_name, ".bitmap") ||
-   ends_with(de->d_name, ".keep") ||
-   ends_with(de->d_name, ".promisor"))
-   string_list_append(, path.buf);
-   else
-   report_garbage(PACKDIR_FILE_GARBAGE, path.buf);
+   fn(path.buf, path.len, de->d_name, data);
}
+
closedir(dir);
-   report_pack_garbage();
-   string_list_clear(, 0);
strbuf_release();
 }
 
+struct prepare_pack_data {
+   struct repository *r;
+   struct string_list *garbage;
+   int local;
+};
+
+static void prepare_pack(const char *full_name, size_t full_name_len,
+const char *file_name, void *_data)
+{
+   struct prepare_pack_data *data = (struct prepare_pack_data *)_data;
+   struct packed_git *p;
+   size_t base_len = full_name_len;
+
+   if (strip_suffix_mem(full_name, _len, ".idx")) {
+   /* Don't reopen a pack we already have. */
+   for (p = data->r->objects->packed_git; p; p = p->next) {
+   size_t len;
+   if (strip_suffix(p->pack_name, ".pack", ) &&
+   len == base_len &&
+   !memcmp(p->pack_name, full_name, len))
+   break;
+   }
+
+   if (!p) {
+   p = add_packed_git(full_name, full_name_len, 
data->local);
+   if (p)
+   install_packed_git(data->r, p);
+   }
+   }
+
+   if (!report_garbage)
+   return;
+
+   if (ends_with(file_name, ".idx") ||
+   ends_with(file_name, ".pack") ||
+   ends_with(file_name, ".bitmap") ||
+   ends_with(file_name, ".keep") ||
+   ends_with(file_name, ".promisor"))
+   string_list_append(data->garbage, full_name);
+   else
+   report_garbage(PACKDIR_FILE_GARBAGE, full_name);
+}
+
+static void prepare_packed_git_one(struct repository *r, char *objdir, int 
local)
+{
+   struct prepare_pack_data data;
+   struct string_list garbage = STRING_LIST_INIT_DUP;
+
+   data.r = r;
+   data.garbage = 
+   

[PATCH v3 04/24] multi-pack-index: add 'write' verb

2018-07-05 Thread Derrick Stolee
In anticipation of writing multi-pack-indexes, add a
'git multi-pack-index write' subcommand and send the options to a
write_midx_file() method. Also create a basic test file that tests
the 'write' subcommand.

Signed-off-by: Derrick Stolee 
---
 Documentation/git-multi-pack-index.txt | 22 +-
 Makefile   |  1 +
 builtin/multi-pack-index.c | 10 +-
 midx.c |  7 +++
 midx.h |  6 ++
 t/t5319-multi-pack-index.sh| 10 ++
 6 files changed, 54 insertions(+), 2 deletions(-)
 create mode 100644 midx.c
 create mode 100644 midx.h
 create mode 100755 t/t5319-multi-pack-index.sh

diff --git a/Documentation/git-multi-pack-index.txt 
b/Documentation/git-multi-pack-index.txt
index a83c0496a6..be97c9372e 100644
--- a/Documentation/git-multi-pack-index.txt
+++ b/Documentation/git-multi-pack-index.txt
@@ -9,7 +9,7 @@ git-multi-pack-index - Write and verify multi-pack-indexes
 SYNOPSIS
 
 [verse]
-'git multi-pack-index' [--object-dir ]
+'git multi-pack-index' [--object-dir ] 
 
 DESCRIPTION
 ---
@@ -23,6 +23,26 @@ OPTIONS
`/packs/multi-pack-index` for the current MIDX file, and
`/packs` for the pack-files to index.
 
+write::
+   When given as the verb, write a new MIDX file to
+   `/packs/multi-pack-index`.
+
+
+EXAMPLES
+
+
+* Write a MIDX file for the packfiles in the current .git folder.
++
+---
+$ git multi-pack-index write
+---
+
+* Write a MIDX file for the packfiles in an alternate.
++
+---
+$ git multi-pack-index --object-dir  write
+---
+
 
 SEE ALSO
 
diff --git a/Makefile b/Makefile
index 54610875ec..f5636c711d 100644
--- a/Makefile
+++ b/Makefile
@@ -890,6 +890,7 @@ LIB_OBJS += merge.o
 LIB_OBJS += merge-blobs.o
 LIB_OBJS += merge-recursive.o
 LIB_OBJS += mergesort.o
+LIB_OBJS += midx.o
 LIB_OBJS += name-hash.o
 LIB_OBJS += notes.o
 LIB_OBJS += notes-cache.o
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 4853291477..14b32e1373 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -2,9 +2,10 @@
 #include "cache.h"
 #include "config.h"
 #include "parse-options.h"
+#include "midx.h"
 
 static char const * const builtin_multi_pack_index_usage[] = {
-   N_("git multi-pack-index [--object-dir ]"),
+   N_("git multi-pack-index [--object-dir ] [write]"),
NULL
 };
 
@@ -34,5 +35,12 @@ int cmd_multi_pack_index(int argc, const char **argv,
if (!opts.object_dir)
opts.object_dir = get_object_directory();
 
+   if (argc == 0)
+   usage_with_options(builtin_multi_pack_index_usage,
+  builtin_multi_pack_index_options);
+
+   if (!strcmp(argv[0], "write"))
+   return write_midx_file(opts.object_dir);
+
return 0;
 }
diff --git a/midx.c b/midx.c
new file mode 100644
index 00..32468db1a2
--- /dev/null
+++ b/midx.c
@@ -0,0 +1,7 @@
+#include "cache.h"
+#include "midx.h"
+
+int write_midx_file(const char *object_dir)
+{
+   return 0;
+}
diff --git a/midx.h b/midx.h
new file mode 100644
index 00..dbdbe9f873
--- /dev/null
+++ b/midx.h
@@ -0,0 +1,6 @@
+#ifndef __MIDX_H__
+#define __MIDX_H__
+
+int write_midx_file(const char *object_dir);
+
+#endif
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
new file mode 100755
index 00..ec3ddbe79c
--- /dev/null
+++ b/t/t5319-multi-pack-index.sh
@@ -0,0 +1,10 @@
+#!/bin/sh
+
+test_description='multi-pack-indexes'
+. ./test-lib.sh
+
+test_expect_success 'write midx with no packs' '
+   git multi-pack-index --object-dir=. write
+'
+
+test_done
-- 
2.18.0.118.gd4f65b8d14



[PATCH v3 13/24] midx: write object ids in a chunk

2018-07-05 Thread Derrick Stolee
Signed-off-by: Derrick Stolee 
---
 Documentation/technical/pack-format.txt |  4 +++
 midx.c  | 47 +++--
 object-store.h  |  1 +
 t/helper/test-read-midx.c   |  2 ++
 t/t5319-multi-pack-index.sh |  4 +--
 5 files changed, 53 insertions(+), 5 deletions(-)

diff --git a/Documentation/technical/pack-format.txt 
b/Documentation/technical/pack-format.txt
index 6c5a77475f..78ee0489c6 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -302,6 +302,10 @@ CHUNK DATA:
name. This is the only chunk not guaranteed to be a multiple of four
bytes in length, so should be the last chunk for alignment reasons.
 
+   OID Lookup (ID: {'O', 'I', 'D', 'L'})
+   The OIDs for all objects in the MIDX are stored in lexicographic
+   order in this chunk.
+
(This section intentionally left incomplete.)
 
 TRAILER:
diff --git a/midx.c b/midx.c
index 028e3aa5e9..7606addab6 100644
--- a/midx.c
+++ b/midx.c
@@ -18,9 +18,10 @@
 #define MIDX_HASH_LEN 20
 #define MIDX_MIN_SIZE (MIDX_HEADER_SIZE + MIDX_HASH_LEN)
 
-#define MIDX_MAX_CHUNKS 1
+#define MIDX_MAX_CHUNKS 2
 #define MIDX_CHUNK_ALIGNMENT 4
 #define MIDX_CHUNKID_PACKNAMES 0x504e414d /* "PNAM" */
+#define MIDX_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */
 #define MIDX_CHUNKLOOKUP_WIDTH (sizeof(uint32_t) + sizeof(uint64_t))
 
 static char *get_midx_filename(const char *object_dir)
@@ -103,6 +104,10 @@ struct multi_pack_index *load_multi_pack_index(const char 
*object_dir)
m->chunk_pack_names = m->data + chunk_offset;
break;
 
+   case MIDX_CHUNKID_OIDLOOKUP:
+   m->chunk_oid_lookup = m->data + chunk_offset;
+   break;
+
case 0:
die(_("terminating multi-pack-index chunk id 
appears earlier than expected"));
break;
@@ -118,6 +123,8 @@ struct multi_pack_index *load_multi_pack_index(const char 
*object_dir)
 
if (!m->chunk_pack_names)
die(_("multi-pack-index missing required pack-name chunk"));
+   if (!m->chunk_oid_lookup)
+   die(_("multi-pack-index missing required OID lookup chunk"));
 
m->pack_names = xcalloc(m->num_packs, sizeof(*m->pack_names));
 
@@ -384,6 +391,32 @@ static size_t write_midx_pack_names(struct hashfile *f,
return written;
 }
 
+static size_t write_midx_oid_lookup(struct hashfile *f, unsigned char hash_len,
+   struct pack_midx_entry *objects,
+   uint32_t nr_objects)
+{
+   struct pack_midx_entry *list = objects;
+   uint32_t i;
+   size_t written = 0;
+
+   for (i = 0; i < nr_objects; i++) {
+   struct pack_midx_entry *obj = list++;
+
+   if (i < nr_objects - 1) {
+   struct pack_midx_entry *next = list;
+   if (oidcmp(>oid, >oid) >= 0)
+   BUG("OIDs not in order: %s >= %s",
+   oid_to_hex(>oid),
+   oid_to_hex(>oid));
+   }
+
+   hashwrite(f, obj->oid.hash, (int)hash_len);
+   written += hash_len;
+   }
+
+   return written;
+}
+
 int write_midx_file(const char *object_dir)
 {
unsigned char cur_chunk, num_chunks = 0;
@@ -430,7 +463,7 @@ int write_midx_file(const char *object_dir)
FREE_AND_NULL(midx_name);
 
cur_chunk = 0;
-   num_chunks = 1;
+   num_chunks = 2;
 
written = write_midx_header(f, num_chunks, packs.nr);
 
@@ -438,9 +471,13 @@ int write_midx_file(const char *object_dir)
chunk_offsets[cur_chunk] = written + (num_chunks + 1) * 
MIDX_CHUNKLOOKUP_WIDTH;
 
cur_chunk++;
-   chunk_ids[cur_chunk] = 0;
+   chunk_ids[cur_chunk] = MIDX_CHUNKID_OIDLOOKUP;
chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + 
packs.pack_name_concat_len;
 
+   cur_chunk++;
+   chunk_ids[cur_chunk] = 0;
+   chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + nr_entries * 
MIDX_HASH_LEN;
+
for (i = 0; i <= num_chunks; i++) {
if (i && chunk_offsets[i] < chunk_offsets[i - 1])
BUG("incorrect chunk offsets: %"PRIu64" before %"PRIu64,
@@ -470,6 +507,10 @@ int write_midx_file(const char *object_dir)
written += write_midx_pack_names(f, 
packs.names, packs.nr);
break;
 
+   case MIDX_CHUNKID_OIDLOOKUP:
+   written += write_midx_oid_lookup(f, 
MIDX_HASH_LEN, entries, nr_entries);
+   break;
+
default:
BUG("trying to 

[PATCH v3 00/24] Multi-pack-index (MIDX)

2018-07-05 Thread Derrick Stolee
Thanks for the feedback on v2. I cleaned up the patch to respond to all
that feedback. I'll include the version diff in a follow-up email.

You can see the CI builds for Linux, Mac, and Windows linked from the
GitHub pull request [1].

Biggest changes in this version:

* Deleted the extra static method.

* Use $(...) instead of `...` in test script

* Fixed spaces in tabbing (hopefully I caught them all)

* Cleaned up some inter-patch diff issues

* `...` quotes in git-multi-pack-index.txt

* Replace FREE_AND_NULL() with free() when obvious that a variable is
  going out of scope

* Due to how Windows handles open file handles when replacing a
  lockfile, be sure to close the midx before writing a new one.

I do still want to revisit the "order by fanout" vs "merge sort"
discussion that we had on v1, but this series is very big already. I'd
like to come back to that as a follow-up.

Thanks,
-Stolee

[1] https://github.com/gitgitgadget/git/pull/5

Derrick Stolee (24):
  multi-pack-index: add design document
  multi-pack-index: add format details
  multi-pack-index: add builtin
  multi-pack-index: add 'write' verb
  midx: write header information to lockfile
  multi-pack-index: load into memory
  multi-pack-index: expand test data
  packfile: generalize pack directory list
  multi-pack-index: read packfile list
  multi-pack-index: write pack names in chunk
  midx: read pack names into array
  midx: sort and deduplicate objects from packfiles
  midx: write object ids in a chunk
  midx: write object id fanout chunk
  midx: write object offsets
  config: create core.multiPackIndex setting
  midx: prepare midxed_git struct
  midx: read objects from multi-pack-index
  midx: use midx in abbreviation calculations
  midx: use existing midx when writing new one
  midx: use midx in approximate_object_count
  midx: prevent duplicate packfile loads
  packfile: skip loading index if in multi-pack-index
  midx: clear midx on repack

 .gitignore   |   3 +-
 Documentation/config.txt |   4 +
 Documentation/git-multi-pack-index.txt   |  56 ++
 Documentation/technical/multi-pack-index.txt | 109 +++
 Documentation/technical/pack-format.txt  |  77 ++
 Makefile |   3 +
 builtin.h|   1 +
 builtin/multi-pack-index.c   |  46 +
 builtin/repack.c |   8 +
 cache.h  |   1 +
 command-list.txt |   1 +
 config.c |   5 +
 environment.c|   1 +
 git.c|   1 +
 midx.c   | 896 +++
 midx.h   |  20 +
 object-store.h   |  33 +
 packfile.c   | 169 +++-
 packfile.h   |   9 +
 sha1-name.c  |  70 ++
 t/helper/test-read-midx.c|  54 ++
 t/helper/test-tool.c |   1 +
 t/helper/test-tool.h |   1 +
 t/t5319-multi-pack-index.sh  | 191 
 24 files changed, 1717 insertions(+), 43 deletions(-)
 create mode 100644 Documentation/git-multi-pack-index.txt
 create mode 100644 Documentation/technical/multi-pack-index.txt
 create mode 100644 builtin/multi-pack-index.c
 create mode 100644 midx.c
 create mode 100644 midx.h
 create mode 100644 t/helper/test-read-midx.c
 create mode 100755 t/t5319-multi-pack-index.sh


base-commit: 53f9a3e157dbbc901a02ac2c73346d375e24978c
-- 
2.18.0.118.gd4f65b8d14



[PATCH v3 01/24] multi-pack-index: add design document

2018-07-05 Thread Derrick Stolee
Signed-off-by: Derrick Stolee 
---
 Documentation/technical/multi-pack-index.txt | 109 +++
 1 file changed, 109 insertions(+)
 create mode 100644 Documentation/technical/multi-pack-index.txt

diff --git a/Documentation/technical/multi-pack-index.txt 
b/Documentation/technical/multi-pack-index.txt
new file mode 100644
index 00..d7e57639f7
--- /dev/null
+++ b/Documentation/technical/multi-pack-index.txt
@@ -0,0 +1,109 @@
+Multi-Pack-Index (MIDX) Design Notes
+
+
+The Git object directory contains a 'pack' directory containing
+packfiles (with suffix ".pack") and pack-indexes (with suffix
+".idx"). The pack-indexes provide a way to lookup objects and
+navigate to their offset within the pack, but these must come
+in pairs with the packfiles. This pairing depends on the file
+names, as the pack-index differs only in suffix with its pack-
+file. While the pack-indexes provide fast lookup per packfile,
+this performance degrades as the number of packfiles increases,
+because abbreviations need to inspect every packfile and we are
+more likely to have a miss on our most-recently-used packfile.
+For some large repositories, repacking into a single packfile
+is not feasible due to storage space or excessive repack times.
+
+The multi-pack-index (MIDX for short) stores a list of objects
+and their offsets into multiple packfiles. It contains:
+
+- A list of packfile names.
+- A sorted list of object IDs.
+- A list of metadata for the ith object ID including:
+  - A value j referring to the jth packfile.
+  - An offset within the jth packfile for the object.
+- If large offsets are required, we use another list of large
+  offsets similar to version 2 pack-indexes.
+
+Thus, we can provide O(log N) lookup time for any number
+of packfiles.
+
+Design Details
+--
+
+- The MIDX is stored in a file named 'multi-pack-index' in the
+  .git/objects/pack directory. This could be stored in the pack
+  directory of an alternate. It refers only to packfiles in that
+  same directory.
+
+- The pack.multiIndex config setting must be on to consume MIDX files.
+
+- The file format includes parameters for the object ID hash
+  function, so a future change of hash algorithm does not require
+  a change in format.
+
+- The MIDX keeps only one record per object ID. If an object appears
+  in multiple packfiles, then the MIDX selects the copy in the most-
+  recently modified packfile.
+
+- If there exist packfiles in the pack directory not registered in
+  the MIDX, then those packfiles are loaded into the `packed_git`
+  list and `packed_git_mru` cache.
+
+- The pack-indexes (.idx files) remain in the pack directory so we
+  can delete the MIDX file, set core.midx to false, or downgrade
+  without any loss of information.
+
+- The MIDX file format uses a chunk-based approach (similar to the
+  commit-graph file) that allows optional data to be added.
+
+Future Work
+---
+
+- Add a 'verify' subcommand to the 'git midx' builtin to verify the
+  contents of the multi-pack-index file match the offsets listed in
+  the corresponding pack-indexes.
+
+- The multi-pack-index allows many packfiles, especially in a context
+  where repacking is expensive (such as a very large repo), or
+  unexpected maintenance time is unacceptable (such as a high-demand
+  build machine). However, the multi-pack-index needs to be rewritten
+  in full every time. We can extend the format to be incremental, so
+  writes are fast. By storing a small "tip" multi-pack-index that
+  points to large "base" MIDX files, we can keep writes fast while
+  still reducing the number of binary searches required for object
+  lookups.
+
+- The reachability bitmap is currently paired directly with a single
+  packfile, using the pack-order as the object order to hopefully
+  compress the bitmaps well using run-length encoding. This could be
+  extended to pair a reachability bitmap with a multi-pack-index. If
+  the multi-pack-index is extended to store a "stable object order"
+  (a function Order(hash) = integer that is constant for a given hash,
+  even as the multi-pack-index is updated) then a reachability bitmap
+  could point to a multi-pack-index and be updated independently.
+
+- Packfiles can be marked as "special" using empty files that share
+  the initial name but replace ".pack" with ".keep" or ".promisor".
+  We can add an optional chunk of data to the multi-pack-index that
+  records flags of information about the packfiles. This allows new
+  states, such as 'repacked' or 'redeltified', that can help with
+  pack maintenance in a multi-pack environment. It may also be
+  helpful to organize packfiles by object type (commit, tree, blob,
+  etc.) and use this metadata to help that maintenance.
+
+- The partial clone feature records special "promisor" packs that
+  may point to objects that are not stored locally, but available
+  on request to a server. The multi-pack-index does not 

Business Proposal

2018-07-05 Thread BRENDA WILSON



I am Sgt.Brenda Wilson, originally from Lake Jackson Texas USA.I personally 
made a special research and I came across your information. I am presently 
writing this mail to you from U.S Military base Kabul Afghanistan I have a 
secured business proposal for you. Reply for more details via my private E-mail 
( brendawilson...@hotmail.com )


Re: [PATCH] builtin/config: work around an unsized array forward declaration

2018-07-05 Thread Jeff King
On Thu, Jul 05, 2018 at 09:50:53PM +0200, Beat Bolli wrote:

> > Your patch is obviously correct, but I think here there might be an even
> > simpler solution: just bump option_parse_type() below the declaration,
> > since it's the only one that needs it. That hunk is bigger, but the
> > overall diff is simpler, and we don't need to carry that extra wrapper
> > function.
> 
> That was dscho's first try in the GitHub issue. It doesn't compile
> because the OPT_CALLBACK* macros in the builtin_config_options
> declaration inserts a pointer to option_parse_type into the array items.
> We need at least one forward declaration, and my patch seemed the least
> intrusive.

Ah, right, so it actually is mutually recursive.  Forward-declaring
option_parse_type() would fix it, along with the reordering. I'm
ambivalent between the available options, then; we might as well go with
what you posted, then, since it's already done. :)

-Peff


Re: [PATCH] builtin/config: work around an unsized array forward declaration

2018-07-05 Thread Beat Bolli
Hi Peff

On 05.07.18 21:38, Jeff King wrote:
> On Thu, Jul 05, 2018 at 08:34:45PM +0200, Beat Bolli wrote:
> 
>> As reported here[0], Microsoft Visual Studio 2017.2 and "gcc -pedantic"
>> don't understand the forward declaration of an unsized static array.
>> They insist on an array size:
>>
>> d:\git\src\builtin\config.c(70,46): error C2133: 
>> 'builtin_config_options': unknown size
>>
>> The thread [1] explains that this is due to the single-pass nature of
>> old compilers.
> 
> Right, that makes sense.
> 
>> To work around this error, introduce the forward-declared function
>> usage_builtin_config() instead that uses the array
>> builtin_config_options only after it has been defined.
>>
>> Also use this function in all other places where usage_with_options() is
>> called with the same arguments.
> 
> Your patch is obviously correct, but I think here there might be an even
> simpler solution: just bump option_parse_type() below the declaration,
> since it's the only one that needs it. That hunk is bigger, but the
> overall diff is simpler, and we don't need to carry that extra wrapper
> function.

That was dscho's first try in the GitHub issue. It doesn't compile
because the OPT_CALLBACK* macros in the builtin_config_options
declaration inserts a pointer to option_parse_type into the array items.
We need at least one forward declaration, and my patch seemed the least
intrusive.

> As a general rule for this case (because reordering isn't always an
> option), I also wonder if we should prefer just introducing a pointer
> alias:
> 
>   /* forward declaration is a pointer */
>   static struct option *builtin_config_options;
> 
>   /* later, declare the actual storage and its alias */
>   static struct option builtin_config_options_storage[] = {
>   ...
>   };
>   static struct option *builtin_config_options = 
> builtin_config_options_storage;
> 
> There are occasionally cases where the caller really wants an array and
> not a pointer, but in practice those are pretty rare.
> 
> I have a slight preference for the reordering solution in this case, but
> any of them would be OK with me.
> 
> -Peff 

Regards, Beat



Re: [PATCH] builtin/config: work around an unsized array forward declaration

2018-07-05 Thread Jeff King
On Thu, Jul 05, 2018 at 08:34:45PM +0200, Beat Bolli wrote:

> As reported here[0], Microsoft Visual Studio 2017.2 and "gcc -pedantic"
> don't understand the forward declaration of an unsized static array.
> They insist on an array size:
> 
> d:\git\src\builtin\config.c(70,46): error C2133: 
> 'builtin_config_options': unknown size
> 
> The thread [1] explains that this is due to the single-pass nature of
> old compilers.

Right, that makes sense.

> To work around this error, introduce the forward-declared function
> usage_builtin_config() instead that uses the array
> builtin_config_options only after it has been defined.
> 
> Also use this function in all other places where usage_with_options() is
> called with the same arguments.

Your patch is obviously correct, but I think here there might be an even
simpler solution: just bump option_parse_type() below the declaration,
since it's the only one that needs it. That hunk is bigger, but the
overall diff is simpler, and we don't need to carry that extra wrapper
function.

As a general rule for this case (because reordering isn't always an
option), I also wonder if we should prefer just introducing a pointer
alias:

  /* forward declaration is a pointer */
  static struct option *builtin_config_options;

  /* later, declare the actual storage and its alias */
  static struct option builtin_config_options_storage[] = {
...
  };
  static struct option *builtin_config_options = builtin_config_options_storage;

There are occasionally cases where the caller really wants an array and
not a pointer, but in practice those are pretty rare.

I have a slight preference for the reordering solution in this case, but
any of them would be OK with me.

-Peff


Re: [PATCH] builtin/config: work around an unsized array forward declaration

2018-07-05 Thread Taylor Blau
On Thu, Jul 05, 2018 at 08:34:45PM +0200, Beat Bolli wrote:
> As reported here[0], Microsoft Visual Studio 2017.2 and "gcc -pedantic"
> don't understand the forward declaration of an unsized static array.
> They insist on an array size:
>
> d:\git\src\builtin\config.c(70,46): error C2133: 
> 'builtin_config_options': unknown size
>
> The thread [1] explains that this is due to the single-pass nature of
> old compilers.
>
> To work around this error, introduce the forward-declared function
> usage_builtin_config() instead that uses the array
> builtin_config_options only after it has been defined.

Argh, I think that this is my fault (via: fb0dc3bac1 (builtin/config.c:
support `--type=` as preferred alias for `--`, 2018-04-18)).

Thank you for the explanation above, and for the patch below. I reviewed
it myself, and the fix seems to be appropriate.

> Also use this function in all other places where usage_with_options() is
> called with the same arguments.
>
> [0]: https://github.com/git-for-windows/git/issues/1735
> [1]: https://groups.google.com/forum/#!topic/comp.lang.c.moderated/bmiF2xMz51U
>
> Fixes https://github.com/git-for-windows/git/issues/1735
>
> Reported-By: Karen Huang (via GitHub)
> Signed-off-by: Beat Bolli 

Thanks,
Taylor


Re: [PATCH v2 05/24] midx: write header information to lockfile

2018-07-05 Thread Derrick Stolee

On 6/25/2018 3:19 PM, Junio C Hamano wrote:

Derrick Stolee  writes:


+#define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
+#define MIDX_VERSION 1
+#define MIDX_HASH_VERSION 1
+#define MIDX_HEADER_SIZE 12
+
+static char *get_midx_filename(const char *object_dir)
+{
+   return xstrfmt("%s/pack/multi-pack-index", object_dir);
+}
+
+static size_t write_midx_header(struct hashfile *f,
+   unsigned char num_chunks,
+   uint32_t num_packs)
+{
+   unsigned char byte_values[4];
+   hashwrite_be32(f, MIDX_SIGNATURE);

WARNING: Missing a blank line after declarations
#48: FILE: midx.c:21:
+   unsigned char byte_values[4];
+   hashwrite_be32(f, MIDX_SIGNATURE);


+   byte_values[0] = MIDX_VERSION;
+   byte_values[1] = MIDX_HASH_VERSION;
+   byte_values[2] = num_chunks;
+   byte_values[3] = 0; /* unused */
+   hashwrite(f, byte_values, sizeof(byte_values));
+   hashwrite_be32(f, num_packs);
+
+   return MIDX_HEADER_SIZE;
+}
+
  int write_midx_file(const char *object_dir)
  {
+   unsigned char num_chunks = 0;
+   char *midx_name;
+   struct hashfile *f;
+   struct lock_file lk;
+
+   midx_name = get_midx_filename(object_dir);
+   if (safe_create_leading_directories(midx_name)) {
+   UNLEAK(midx_name);
+   die_errno(_("unable to create leading directories of %s"),
+ midx_name);
+   }
+
+   hold_lock_file_for_update(, midx_name, LOCK_DIE_ON_ERROR);
+   f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf);
+   FREE_AND_NULL(midx_name);

I am not sure why people prefer FREE_AND_NULL over free() for things
like this.  It is on stack; it's not like this is a static variable
visible after this function returns or anything like that.


I default to FREE_AND_NULL(X) because a later change may introduce logic 
to use X later in the same code block. In this case, we add a 'cleanup:' 
at the end which would fail in a success case if we don't set midx_name 
to NULL here.


I think there are some other FREE_AND_NULLs that are currently at the 
end of a code block, so I'll work to clean those up.





+   write_midx_header(f, num_chunks, 0);
+
+   finalize_hashfile(f, NULL, CSUM_FSYNC | CSUM_HASH_IN_STREAM);
+   commit_lock_file();
+
return 0;
  }
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index ec3ddbe79c..8622a7cdce 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -4,7 +4,8 @@ test_description='multi-pack-indexes'
  . ./test-lib.sh
  
  test_expect_success 'write midx with no packs' '

-   git multi-pack-index --object-dir=. write
+   git multi-pack-index --object-dir=. write &&
+   test_path_is_file pack/multi-pack-index
  '
  
  test_done




Re: [PATCH v2 06/24] multi-pack-index: load into memory

2018-07-05 Thread Eric Sunshine
On Thu, Jul 5, 2018 at 10:20 AM Derrick Stolee  wrote:
> On 6/25/2018 3:38 PM, Junio C Hamano wrote:
> While I don't use substitutions in this patch, I do use them in later
> patches. Here is the final version of this method:
>
> midx_read_expect () {
>  NUM_PACKS=$1
>  NUM_OBJECTS=$2
>  NUM_CHUNKS=$3
>  EXTRA_CHUNKS="$5"
>  cat >expect <<-\EOF
>  header: 4d494458 1 $NUM_CHUNKS $NUM_PACKS
>  chunks: pack_names oid_fanout oid_lookup
> object_offsets$EXTRA_CHUNKS
>  num_objects: $NUM_OBJECTS
>  packs:
>  EOF
>
> Using <<-\EOF causes these substitutions to fail. Is there a different
> way I should construct this method?

When you need to interpolate variables into the here-doc, use <<-EOF;
when you don't, use <<-\EOF.


[PATCH] builtin/config: work around an unsized array forward declaration

2018-07-05 Thread Beat Bolli
As reported here[0], Microsoft Visual Studio 2017.2 and "gcc -pedantic"
don't understand the forward declaration of an unsized static array.
They insist on an array size:

d:\git\src\builtin\config.c(70,46): error C2133: 'builtin_config_options': 
unknown size

The thread [1] explains that this is due to the single-pass nature of
old compilers.

To work around this error, introduce the forward-declared function
usage_builtin_config() instead that uses the array
builtin_config_options only after it has been defined.

Also use this function in all other places where usage_with_options() is
called with the same arguments.

[0]: https://github.com/git-for-windows/git/issues/1735
[1]: https://groups.google.com/forum/#!topic/comp.lang.c.moderated/bmiF2xMz51U

Fixes https://github.com/git-for-windows/git/issues/1735

Reported-By: Karen Huang (via GitHub)
Signed-off-by: Beat Bolli 
---
 builtin/config.c | 27 +++
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/builtin/config.c b/builtin/config.c
index b29d26dede..2c93a289a7 100644
--- a/builtin/config.c
+++ b/builtin/config.c
@@ -67,7 +67,7 @@ static int show_origin;
{ OPTION_CALLBACK, (s), (l), (v), NULL, (h), PARSE_OPT_NOARG | \
PARSE_OPT_NONEG, option_parse_type, (i) }
 
-static struct option builtin_config_options[];
+static NORETURN void usage_builtin_config(void);
 
 static int option_parse_type(const struct option *opt, const char *arg,
 int unset)
@@ -111,8 +111,7 @@ static int option_parse_type(const struct option *opt, 
const char *arg,
 * --type=int'.
 */
error("only one type at a time.");
-   usage_with_options(builtin_config_usage,
-   builtin_config_options);
+   usage_builtin_config();
}
*to_type = new_type;
 
@@ -157,11 +156,16 @@ static struct option builtin_config_options[] = {
OPT_END(),
 };
 
+static NORETURN void usage_builtin_config(void)
+{
+   usage_with_options(builtin_config_usage, builtin_config_options);
+}
+
 static void check_argc(int argc, int min, int max) {
if (argc >= min && argc <= max)
return;
error("wrong number of arguments");
-   usage_with_options(builtin_config_usage, builtin_config_options);
+   usage_builtin_config();
 }
 
 static void show_config_origin(struct strbuf *buf)
@@ -596,7 +600,7 @@ int cmd_config(int argc, const char **argv, const char 
*prefix)
if (use_global_config + use_system_config + use_local_config +
!!given_config_source.file + !!given_config_source.blob > 1) {
error("only one config file at a time.");
-   usage_with_options(builtin_config_usage, 
builtin_config_options);
+   usage_builtin_config();
}
 
if (use_local_config && nongit)
@@ -660,12 +664,12 @@ int cmd_config(int argc, const char **argv, const char 
*prefix)
 
if ((actions & (ACTION_GET_COLOR|ACTION_GET_COLORBOOL)) && type) {
error("--get-color and variable type are incoherent");
-   usage_with_options(builtin_config_usage, 
builtin_config_options);
+   usage_builtin_config();
}
 
if (HAS_MULTI_BITS(actions)) {
error("only one action at a time.");
-   usage_with_options(builtin_config_usage, 
builtin_config_options);
+   usage_builtin_config();
}
if (actions == 0)
switch (argc) {
@@ -673,25 +677,24 @@ int cmd_config(int argc, const char **argv, const char 
*prefix)
case 2: actions = ACTION_SET; break;
case 3: actions = ACTION_SET_ALL; break;
default:
-   usage_with_options(builtin_config_usage, 
builtin_config_options);
+   usage_builtin_config();
}
if (omit_values &&
!(actions == ACTION_LIST || actions == ACTION_GET_REGEXP)) {
error("--name-only is only applicable to --list or 
--get-regexp");
-   usage_with_options(builtin_config_usage, 
builtin_config_options);
+   usage_builtin_config();
}
 
if (show_origin && !(actions &
(ACTION_GET|ACTION_GET_ALL|ACTION_GET_REGEXP|ACTION_LIST))) {
error("--show-origin is only applicable to --get, --get-all, "
  "--get-regexp, and --list.");
-   usage_with_options(builtin_config_usage, 
builtin_config_options);
+   usage_builtin_config();
}
 
if (default_value && !(actions & ACTION_GET)) {
error("--default is only applicable to --get");
-   usage_with_options(builtin_config_usage,
-   builtin_config_options);
+   usage_builtin_config();
}
 
if (actions & PAGING_ACTIONS)
-- 
2.15.0.rc1.299.gda03b47c3



Re: Git 2.18: RUNTIME_PREFIX... is it working?

2018-07-05 Thread Paul Smith
On Wed, 2018-07-04 at 13:22 +0200, Johannes Schindelin wrote:
> > Basically what happens is that I run configure with
> > --prefix=/my/install/path --with-gitconfig=etc/gitconfig
> > --with-gitattributes=etc/gitattributes.
> > 
> > Then I run make with RUNTIME_PREFIX=YesPlease.
> 
> Ah. In Git for Windows, we do not use configure. I *think* this
> points to an incompatibility of the RUNTIME_PREFIX feature with our
> autoconf support, and this is a grand opportunity for you to step in
> and help.
> 
> Essentially, what you will want to do is to implement a new configure
> option --with-runtime-prefix that then prevents the autoconf script
> from munging the relative paths in the way it does.

FYI I was able to get this to work by overriding variables on the make
command line, like this:

  make ... RUNTIME_PREFIX=YesPlease \
  gitexecdir=libexec/git-core \
  template_dir=share/git-core/templates \
  sysconfdir=etc

I agree a new autoconf option would be much simpler to use.  I'll think
about it as I happen to have some some experience in these areas ;) ...
but time is limited of course :).


Re: [GSoC][PATCH v2 1/7] sequencer: make two functions and an enum from sequencer.c public

2018-07-05 Thread Alban Gruin
Hi Junio,

Le 03/07/2018 à 22:20, Junio C Hamano a écrit :
> Alban Gruin  writes:
> 
>> -enum check_level {
>> -CHECK_IGNORE = 0, CHECK_WARN, CHECK_ERROR
>> -};
>> -
>> -static enum check_level get_missing_commit_check_level(void)
>> +enum missing_commit_check_level get_missing_commit_check_level(void)
> 
> The new name definitely is better than "check_level" in the global
> context, but "missing_commit" is much less important thing to say
> than "this symbol is to be used when driving 'rebase' (or even
> 'rebase-i')", I think.  "enum rebase_i_drop_commit_check" with
> "get_rebase_i_drop_commit_check()" perhaps?
> 

I don’t really like those names, but the function and the enum should
eventually move to rebase-interactive.c and become static again, so we
could revert their names in due course.

Cheers,
Alban



RE: [PATCH v6 2/8] read-cache: teach make_cache_entry to take object_id

2018-07-05 Thread Jameson Miller
> >
> > Teach make_cache_entry function to take object_id instead of a SHA-1.
> 
> This repeats the subject line?
> 
> Sign off missing.

Thank you Stefan for pointing this out. This does not add much information to 
the subject line. I could clean it up if I re-roll to fix up the missing sign 
off.

Junio - would you like me to re-roll this patch series and include the correct 
sign off, or would you be able to correct this? I can do whichever is easiest 
for you.


Re: [PATCH v3 0/2] grep.c: teach --only-matching to 'git-grep(1)'

2018-07-05 Thread Taylor Blau
On Thu, Jul 05, 2018 at 10:21:11AM -0400, Jeff King wrote:
> On Tue, Jul 03, 2018 at 04:51:52PM -0500, Taylor Blau wrote:
>
> > diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
> > index 0de3493b80..be13fc3253 100644
> > --- a/Documentation/git-grep.txt
> > +++ b/Documentation/git-grep.txt
> > @@ -17,7 +17,7 @@ SYNOPSIS
> >[-l | --files-with-matches] [-L | --files-without-match]
> >[(-O | --open-files-in-pager) []]
> >[-z | --null]
> > -  [-c | --count] [--all-match] [-q | --quiet]
> > +  [ -o | --only-matching ] [-c | --count] [--all-match] [-q | --quiet]
> >[--max-depth ]
> >[--color[=] | --no-color]
> >[--break] [--heading] [-p | --show-function]
> > @@ -201,6 +201,10 @@ providing this option will cause it to die.
> > Output \0 instead of the character that normally follows a
> > file name.
> >
> > +-o::
> > +--only-matching::
> > +  Output only the matching part of the lines.
> > +
>
> Putting myself into the shoes of a naive reader, I have to wonder what
> happens when there are multiple matching parts on the same line. I know
> the answer from your commit message, but maybe that should be covered
> here? Maybe:
>
>   Output only the matching part of the lines. If there are multiple
>   matching parts, each is output on a separate line.

I think that this might be clear enough on its own, especially since
this is the same as BSD grep on my machine. I think that part_s_ of a
line indicates that behavior, but perhaps not. On GNU grep, this is:

  Print only the matched (non-empty) parts of a matching line, with each
  such part on a separate output line.

I'm happy to pick either and re-send this patch (2/2) again, if it
wouldn't be too much to juggle. Otherwise, I can re-roll to v4.


Thanks,
Taylor


Re: [PATCH v3 0/2] grep.c: teach --only-matching to 'git-grep(1)'

2018-07-05 Thread Jeff King
On Tue, Jul 03, 2018 at 04:51:52PM -0500, Taylor Blau wrote:

> diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
> index 0de3493b80..be13fc3253 100644
> --- a/Documentation/git-grep.txt
> +++ b/Documentation/git-grep.txt
> @@ -17,7 +17,7 @@ SYNOPSIS
>  [-l | --files-with-matches] [-L | --files-without-match]
>  [(-O | --open-files-in-pager) []]
>  [-z | --null]
> -[-c | --count] [--all-match] [-q | --quiet]
> +[ -o | --only-matching ] [-c | --count] [--all-match] [-q | --quiet]
>  [--max-depth ]
>  [--color[=] | --no-color]
>  [--break] [--heading] [-p | --show-function]
> @@ -201,6 +201,10 @@ providing this option will cause it to die.
>   Output \0 instead of the character that normally follows a
>   file name.
> 
> +-o::
> +--only-matching::
> +  Output only the matching part of the lines.
> +

Putting myself into the shoes of a naive reader, I have to wonder what
happens when there are multiple matching parts on the same line. I know
the answer from your commit message, but maybe that should be covered
here? Maybe:

  Output only the matching part of the lines. If there are multiple
  matching parts, each is output on a separate line.

-Peff


Re: [PATCH v2 06/24] multi-pack-index: load into memory

2018-07-05 Thread Derrick Stolee

On 6/25/2018 3:38 PM, Junio C Hamano wrote:

Derrick Stolee  writes:

+   cat >expect <<- EOF

"<<-\EOF", i.e. make it easy for readers to spot that there is no
funny substitutions happening in the here-doc body.


While I don't use substitutions in this patch, I do use them in later 
patches. Here is the final version of this method:


midx_read_expect () {
    NUM_PACKS=$1
    NUM_OBJECTS=$2
    NUM_CHUNKS=$3
    OBJECT_DIR=$4
    EXTRA_CHUNKS="$5"
    cat >expect <<-\EOF
    header: 4d494458 1 $NUM_CHUNKS $NUM_PACKS
    chunks: pack_names oid_fanout oid_lookup 
object_offsets$EXTRA_CHUNKS

    num_objects: $NUM_OBJECTS
    packs:
    EOF
    if [ $NUM_PACKS -ge 1 ]
    then
    ls $OBJECT_DIR/pack/ | grep idx | sort >> expect
    fi
    printf "object_dir: $OBJECT_DIR\n" >>expect &&
    test-tool read-midx $OBJECT_DIR >actual &&
    test_cmp expect actual
}

Using <<-\EOF causes these substitutions to fail. Is there a different 
way I should construct this method?


Thanks,
-Stolee


Re: [PATCH 4/8] gpg-interface: introduce an abstraction for multiple gpg formats

2018-07-05 Thread Henning Schild
Am Wed, 4 Jul 2018 09:10:17 +0200
schrieb Martin Ågren :

> Hi Henning,
> 
> On 3 July 2018 at 14:38, Henning Schild 
> wrote:
> > Create a struct that holds the format details for the supported
> > formats. At the moment that is still just "PGP". This commit
> > prepares for the introduction of more formats, that might use other
> > programs and match other signatures.
> >
> > Signed-off-by: Henning Schild   
> 
> Welcome to the mailing list! :-)

Thanks!

> I'll just comment on a few thoughts I had while skimming this.
> 
> >  static char *configured_signing_key;
> > -static const char *gpg_format = "PGP";
> > -static const char *gpg_program = "gpg";
> > +struct gpg_format_data {
> > +   const char *format;
> > +   const char *program;
> > +   const char *extra_args_verify[1];
> > +   const char *sigs[2];
> > +};
> >
> >  #define PGP_SIGNATURE "-BEGIN PGP SIGNATURE-"
> >  #define PGP_MESSAGE "-BEGIN PGP MESSAGE-"
> >
> > +enum gpgformats { PGP_FMT };
> > +struct gpg_format_data gpg_formats[] = {
> > +   { .format = "PGP", .program = "gpg",
> > + .extra_args_verify = { "--keyid-format=long", },
> > + .sigs = { PGP_SIGNATURE, PGP_MESSAGE, },
> > +   },
> > +};  
> 
> I think those trailing commas are ok now, but I'm not sure...
> 
> I had the same thought about designated initializers. Those should be
> ok now, c.f. cbc0f81d96 (strbuf: use designated initializers in
> STRBUF_INIT, 2017-07-10) and a73b3680c4 (Add and use generic name->id
> mapping code for color slot parsing, 2018-05-26).

Ok, i did not actually check coding style yet. I could run it through a
tool, given there is a suggestion. Or i could address issues someone
points out in the review.
What i get from your comment is that it might be ok to leave the code
as is, others have introduces similar constructs before me.

> > +static const char *gpg_format = "PGP";
> > +
> > +static struct gpg_format_data *get_format_data(void)
> > +{
> > +   int i;
> > +   for (i = 0; i < ARRAY_SIZE(gpg_formats); i++)
> > +   if (!strcmp(gpg_formats[i].format, gpg_format))
> > +   return gpg_formats + i;
> > +   assert(0);  
> 
> This might be better written as `BUG("bad gpg_format '%s'",
> gpg_format);` or something like that.
> 
> (It's not supposed to be triggered, not even by invalid data from the
> user, right?)

Yes that is code that can not (should not) be reached. I agree that an
assert(0) is not very expressive and will fix that in v2.

> 
> > if (!strcmp(var, "gpg.format")) {
> > -   if (!strcmp(value, "PGP"))  
> 
> This line was added in patch 3. It errors out precisely when
> gpg.format is "PGP", no? That this doesn't break the whole series is
> explained by 1) it being removed in this patch 4, and 2) there being
> no tests. It makes me wonder if something like patch 5 (test
> gpg.format) could be part of patch 3, both with negative ("=
> malformed") and positive ("= PGP") tests?

I will pull the tests from patch 5 before touching that code and fix up
issues inbetween. The whole series saw a "make test" inbetween all
commits.

> > +   j = 0;
> > +   for (i = 0; i < ARRAY_SIZE(gpg_formats); i++)
> > +   if (!strcmp(value, gpg_formats[i].format)) {
> > +   j++;
> > +   break;
> > +   }
> > +   if (!j)
> > return error("malformed value for %s: %s",
> > var, value);  
> 
> `if (i == ARRAY_SIZE(gpg_formats))` and drop `j`?
> 
> Or check whether `get_format_data()` returns NULL? Hmm, well you
> can't, since it takes its "input" from a global variable...
> 
> If you want to keep that global nature, the duplication of
> search-logic could perhaps be avoided by having a helper function for
> returning the index of a gpg_format (or -1).

True, the two are almost the same and should be merged. Will do in v2.

Henning

> > return git_config_string(_format, var, value);
> > }  
> 
> Martin