Re: [PATCH] t3404: fix use of "VAR=VAL cmd" with a shell function

2018-07-12 Thread Eric Sunshine
On Thu, Jul 12, 2018 at 7:52 PM Jeff King  wrote:
> (I do agree that being able to automatically catch these with a linter
> would be worth brain cycles, but I cannot immediately think of a way
> to do so).

Perhaps something like this[1]?

[1]: https://public-inbox.org/git/20180713055205.32351-1-sunsh...@sunshineco.com/T/


[PATCH 2/4] t/check-non-portable-shell: stop being so polite

2018-07-12 Thread Eric Sunshine
Error messages emitted by this linting script are long and noisy,
consisting of several sections:

    <file>:<line>: error: <explanation>: <failed shell text>

Many problem explanations ask the reader to "please" use a suggested
alternative; however, such politeness is unnecessary and just adds to
the noise and length of the line, so drop "please" to make the message a
bit more concise.

Signed-off-by: Eric Sunshine 
---
 t/check-non-portable-shell.pl | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/t/check-non-portable-shell.pl b/t/check-non-portable-shell.pl
index e07f028437..11028578ff 100755
--- a/t/check-non-portable-shell.pl
+++ b/t/check-non-portable-shell.pl
@@ -17,12 +17,12 @@ sub err {
 while (<>) {
chomp;
/\bsed\s+-i/ and err 'sed -i is not portable';
-   /\becho\s+-[neE]/ and err 'echo with option is not portable (please use printf)';
+   /\becho\s+-[neE]/ and err 'echo with option is not portable (use printf)';
	/^\s*declare\s+/ and err 'arrays/declare not portable';
-   /^\s*[^#]\s*which\s/ and err 'which is not portable (please use type)';
-   /\btest\s+[^=]*==/ and err '"test a == b" is not portable (please use =)';
-   /\bwc -l.*"\s*=/ and err '`"$(wc -l)"` is not portable (please use test_line_count)';
-   /\bexport\s+[A-Za-z0-9_]*=/ and err '"export FOO=bar" is not portable (please use FOO=bar && export FOO)';
+   /^\s*[^#]\s*which\s/ and err 'which is not portable (use type)';
+   /\btest\s+[^=]*==/ and err '"test a == b" is not portable (use =)';
+   /\bwc -l.*"\s*=/ and err '`"$(wc -l)"` is not portable (use test_line_count)';
+   /\bexport\s+[A-Za-z0-9_]*=/ and err '"export FOO=bar" is not portable (use FOO=bar && export FOO)';
# this resets our $. for each file
close ARGV if eof;
 }
-- 
2.18.0.233.g985f88cf7e



[PATCH 4/4] t/check-non-portable-shell: detect "FOO=bar shell_func"

2018-07-12 Thread Eric Sunshine
One-shot environment variable assignments, such as 'FOO' in
"FOO=bar cmd", exist only during the invocation of 'cmd'. However, if
'cmd' happens to be a shell function, then 'FOO' is assigned in the
executing shell itself, and that assignment remains until the process
exits (unless explicitly unset). Since this side-effect of
"FOO=bar shell_func" is unlikely to be intentional, detect and report
such usage.

To distinguish shell functions from other commands, perform a pre-scan
of shell scripts named as input, gleaning a list of function names by
recognizing lines of the form (loosely matching whitespace):

shell_func () {

and later report suspect lines of the form (loosely matching quoted
values):

FOO=bar [BAR=foo ...] shell_func

Also take care to stitch together incomplete lines (those ending with
"\") since suspect invocations may be split over multiple lines:

FOO=bar BAR=foo \
shell_func
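
For illustration only (this example is not part of the patch; the
function and variable names are invented), the side-effect looks like
this in a POSIX shell such as dash:

	leak_check () {
		echo "inside: FOO=$FOO"
	}

	FOO=bar leak_check        # prints "inside: FOO=bar"
	echo "outside: FOO=$FOO"  # in dash and friends the assignment has
	                          # leaked, so this still prints "FOO=bar"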

Signed-off-by: Eric Sunshine 
---
 t/check-non-portable-shell.pl | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/t/check-non-portable-shell.pl b/t/check-non-portable-shell.pl
index f6dbe28b19..d5823f71d8 100755
--- a/t/check-non-portable-shell.pl
+++ b/t/check-non-portable-shell.pl
@@ -7,17 +7,34 @@
 use warnings;
 
 my $exit_code=0;
+my %func;
 
 sub err {
my $msg = shift;
s/^\s+//;
s/\s+$//;
+   s/\s+/ /g;
print "$ARGV:$.: error: $msg: $_\n";
$exit_code = 1;
 }
 
+# glean names of shell functions
+for my $i (@ARGV) {
+   open(my $f, '<', $i) or die "$0: $i: $!\n";
+   while (<$f>) {
+   $func{$1} = 1 if /^\s*(\w+)\s*\(\)\s*{\s*$/;
+   }
+   close $f;
+}
+
 while (<>) {
chomp;
+   # stitch together incomplete lines (those ending with "\")
+   while (s/\\$//) {
+   $_ .= readline;
+   chomp;
+   }
+
/\bsed\s+-i/ and err 'sed -i is not portable';
	/\becho\s+-[neE]/ and err 'echo with option is not portable (use printf)';
	/^\s*declare\s+/ and err 'arrays/declare not portable';
@@ -25,6 +42,8 @@ sub err {
	/\btest\s+[^=]*==/ and err '"test a == b" is not portable (use =)';
	/\bwc -l.*"\s*=/ and err '`"$(wc -l)"` is not portable (use test_line_count)';
	/\bexport\s+[A-Za-z0-9_]*=/ and err '"export FOO=bar" is not portable (use FOO=bar && export FOO)';
+	/^\s*([A-Z0-9_]+=(\w+|(["']).*?\3)\s+)+(\w+)/ and exists($func{$4}) and
+		err '"FOO=bar shell_func" assignment extends beyond "shell_func"';
# this resets our $. for each file
close ARGV if eof;
 }
-- 
2.18.0.233.g985f88cf7e



[PATCH 3/4] t/check-non-portable-shell: make error messages more compact

2018-07-12 Thread Eric Sunshine
Error messages emitted by this linting script are long and noisy,
consisting of several sections:

    <file>:<line>: error: <explanation>: <failed shell text>

The line of failed shell text, usually coming from within a test body,
is often indented by one or two TABs, with the result that the actual
(important) text is separated from <explanation> by a good deal of empty
space. This can make for a difficult read, especially on typical
80-column terminals.

Make the messages more compact and perhaps a bit easier to digest by
folding out the leading whitespace from <failed shell text>.
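
To make that concrete, here is a hypothetical before/after for a single
message (the script name and line number are invented; the linter is run
by hand on one file):

	$ perl check-non-portable-shell.pl t1234-example.sh

	# before this patch: the TAB-indented test body is reproduced as-is
	t1234-example.sh:56: error: sed -i is not portable: 		sed -i -e s/x/y/ file

	# after this patch: leading (and trailing) whitespace is folded out
	t1234-example.sh:56: error: sed -i is not portable: sed -i -e s/x/y/ file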

Signed-off-by: Eric Sunshine 
---
 t/check-non-portable-shell.pl | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/t/check-non-portable-shell.pl b/t/check-non-portable-shell.pl
index 11028578ff..f6dbe28b19 100755
--- a/t/check-non-portable-shell.pl
+++ b/t/check-non-portable-shell.pl
@@ -10,6 +10,8 @@
 
 sub err {
my $msg = shift;
+   s/^\s+//;
+   s/\s+$//;
print "$ARGV:$.: error: $msg: $_\n";
$exit_code = 1;
 }
-- 
2.18.0.233.g985f88cf7e



[PATCH 1/4] t6046/t9833: fix use of "VAR=VAL cmd" with a shell function

2018-07-12 Thread Eric Sunshine
Unlike "FOO=bar cmd" one-shot environment variable assignments
which exist only for the invocation of 'cmd', those assigned by
"FOO=bar shell_func" exist within the running shell and continue to
do so until the process exits (or are explicitly unset). It is
unlikely that this behavior was intended by the test author.

In these particular tests, the "FOO=bar shell_func" invocations are
already in subshells, so the assignments don't last too long, don't
appear to harm subsequent commands in the same subshells, and don't
affect other tests in the same scripts. However, the usage is
nevertheless misleading and poor practice, so fix the tests to assign
and export the environment variables in the usual fashion.
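
For reference, a minimal sketch of the before/after shape of such a fix
(the variable and function names here are invented):

	# suspect: if do_check is a shell function, FOO stays set afterwards
	FOO=bar do_check

	# fixed: assign and export explicitly; in these two tests this
	# already happens inside a subshell, so nothing escapes the test
	(
		FOO=bar &&
		export FOO &&
		do_check
	)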

Signed-off-by: Eric Sunshine 
---
 t/t6046-merge-skip-unneeded-updates.sh | 4 +++-
 t/t9833-errors.sh  | 4 +++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/t/t6046-merge-skip-unneeded-updates.sh b/t/t6046-merge-skip-unneeded-updates.sh
index fcefffcaec..38e24f787c 100755
--- a/t/t6046-merge-skip-unneeded-updates.sh
+++ b/t/t6046-merge-skip-unneeded-updates.sh
@@ -366,7 +366,9 @@ test_expect_success '2c-check: Modify b & add c VS rename b->c' '
 
	git checkout A^0 &&
 
-	GIT_MERGE_VERBOSITY=3 test_must_fail git merge -s recursive B^0 >out 2>err &&
+	GIT_MERGE_VERBOSITY=3 &&
+	export GIT_MERGE_VERBOSITY &&
+	test_must_fail git merge -s recursive B^0 >out 2>err &&
 
test_i18ngrep "CONFLICT (rename/add): Rename b->c" out &&
test_i18ngrep ! "Skipped c" out &&
diff --git a/t/t9833-errors.sh b/t/t9833-errors.sh
index 9ba892de7a..277d347012 100755
--- a/t/t9833-errors.sh
+++ b/t/t9833-errors.sh
@@ -26,7 +26,9 @@ test_expect_success 'error handling' '
) &&
p4 passwd -P newpassword &&
(
-	P4PASSWD=badpassword test_must_fail git p4 clone //depot/foo 2>errmsg &&
+	P4PASSWD=badpassword &&
+	export P4PASSWD &&
+	test_must_fail git p4 clone //depot/foo 2>errmsg &&
grep -q "failure accessing depot.*P4PASSWD" errmsg
)
 '
-- 
2.18.0.233.g985f88cf7e



[PATCH 0/4] test-lint: detect one-shot "FOO=bar shell_func"

2018-07-12 Thread Eric Sunshine
One-shot "FOO=bar cmd" environment variable assignments exist only
during invocation of 'cmd'. However, if 'cmd' is a shell function, then
the variable is assigned in the running shell and exists until the
process exits (or is unset explicitly). Such a side-effect is almost
certainly unintended by a test author and is likely due to lack of
familiarity with the problem.

Upgrade "make test-lint" to detect this sort of suspect usage.
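
For reference, the linting referred to above can be run on its own from
the t/ directory (assuming the usual t/Makefile targets):

	cd t &&
	make test-lint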

Also fix a couple instances of "FOO=bar shell_func" detected by the
improved linting.

This series is built atop 'jc/t3404-one-shot-export-fix'[1].

[1]: https://public-inbox.org/git/xmqqefg8w73c@gitster-ct.c.googlers.com/T/

Eric Sunshine (4):
  t6046/t9833: fix use of "VAR=VAL cmd" with a shell function
  t/check-non-portable-shell: stop being so polite
  t/check-non-portable-shell: make error messages more compact
  t/check-non-portable-shell: detect "FOO=bar shell_func"

 t/check-non-portable-shell.pl  | 31 +-
 t/t6046-merge-skip-unneeded-updates.sh |  4 +++-
 t/t9833-errors.sh  |  4 +++-
 3 files changed, 32 insertions(+), 7 deletions(-)

-- 
2.18.0.233.g985f88cf7e




Re: [PATCH v4 16/23] config: create core.multiPackIndex setting

2018-07-12 Thread Derrick Stolee

On 7/12/2018 5:05 PM, Junio C Hamano wrote:

Derrick Stolee  writes:


The core.multiPackIndex config setting controls the multi-pack-
index (MIDX) feature. If false, the setting will disable all reads
from the multi-pack-index file.

Read this config setting in the new prepare_multi_pack_index_one()
which is called during prepare_packed_git(). This check is run once
per repository.

Add comparison commands in t5319-multi-pack-index.sh to check
typical Git behavior remains the same as the config setting is turned
on and off. This currently includes 'git rev-list' and 'git log'
commands to trigger several object database reads. Currently, these
would only catch an error in the prepare_multi_pack_index_one(), but
with later commits will catch errors in object lookups, abbreviations,
and approximate object counts.

Signed-off-by: Derrick Stolee 

midx: prepare midxed_git struct

Signed-off-by: Derrick Stolee 

What is going on around here?
Sorry. I squashed the commits, and intended to drop this second commit 
message.



diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 4a4fa26f7a..601e28a2f0 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -3,6 +3,8 @@
  test_description='multi-pack-indexes'
  . ./test-lib.sh
  
+objdir=.git/objects
+
  midx_read_expect () {
NUM_PACKS=$1
NUM_OBJECTS=$2
@@ -76,18 +78,35 @@ test_expect_success 'create objects' '
  '
  
  test_expect_success 'write midx with one v1 pack' '

-   pack=$(git pack-objects --index-version=1 pack/test 
Hmph, so we used to run tests as if $cwd were GIT_OBJECT_DIRECTORY
but now we are running them from the top-level of the working tree,
just like all the other tests?  Interesting.

This is the first time we _need_ them in the .git/objects directory.

  '
  
+midx_git_two_modes() {

+   git -c core.multiPackIndex=false $1 >expect &&
+   git -c core.multiPackIndex=true $1 >actual &&
+   test_cmp expect actual
+}
+
+compare_results_with_midx() {

Style: "compare_results_with_midx () {", just like mdx_read_expect
near the top of the file, but unlike midx_git_two_modes we see
nearby.  Please keep "git grep 'funcname () {'" a usable way to
locate where a shell function is defined without forcing people to
type an asterisk.


Sorry that I missed these two.

Thanks,

-Stolee





[PATCH 2/2] tag: don't warn if target is missing but promised

2018-07-12 Thread Jonathan Tan
deref_tag() prints a warning if the object that a tag refers to does not
exist. However, when a partial clone has an annotated tag from its
promisor remote, but not the object that it refers to, printing a
warning on such a tag is incorrect.

This occurs, for example, when the checkout that happens after a partial
clone causes some objects to be fetched - and as part of the fetch, all
local refs are read. The test included in this patch demonstrates this
situation.

Therefore, do not print a warning in this case.

Signed-off-by: Jonathan Tan 
---
 t/t5616-partial-clone.sh |  9 +++--
 tag.c| 13 ++---
 2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index e8dfeafe7..bbbe7537d 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -229,9 +229,13 @@ test_expect_success 'when partial cloning, tolerate server not sending target of
git -C "$SERVER" tag -m message -a myblob "$BLOB" &&
 
# Craft a packfile including the tag, but not the blob it points to.
-   printf "%s\n%s\n--not\n%s\n" \
+   # Also, omit objects referenced from HEAD in order to force a second
+   # fetch (to fetch missing objects) upon the automatic checkout that
+   # happens after a clone.
+   printf "%s\n%s\n--not\n%s\n%s\n" \
$(git -C "$SERVER" rev-parse HEAD) \
$(git -C "$SERVER" rev-parse myblob) \
+   $(git -C "$SERVER" rev-parse HEAD^{tree}) \
$(git -C "$SERVER" rev-parse myblob^{blob}) |
git -C "$SERVER" pack-objects --thin --stdout >incomplete.pack 
&&
 
@@ -249,7 +253,8 @@ test_expect_success 'when partial cloning, tolerate server 
not sending target of
 
# Exercise to make sure it works.
git -c protocol.version=2 clone \
-   --filter=blob:none $HTTPD_URL/one_time_sed/server repo &&
+   --filter=blob:none $HTTPD_URL/one_time_sed/server repo 2> err &&
+   ! grep "missing object referenced by" err &&
 
# Ensure that the one-time-sed script was used.
! test -e "$HTTPD_ROOT_PATH/one-time-sed"
diff --git a/tag.c b/tag.c
index 3d37c1bd2..1110e3643 100644
--- a/tag.c
+++ b/tag.c
@@ -4,6 +4,7 @@
 #include "tree.h"
 #include "blob.h"
 #include "gpg-interface.h"
+#include "packfile.h"
 
 const char *tag_type = "tag";
 
@@ -64,12 +65,18 @@ int gpg_verify_tag(const struct object_id *oid, const char *name_to_report,
 
 struct object *deref_tag(struct object *o, const char *warn, int warnlen)
 {
+   struct object_id *last_oid = NULL;
while (o && o->type == OBJ_TAG)
-   if (((struct tag *)o)->tagged)
-   o = parse_object(&((struct tag *)o)->tagged->oid);
-   else
+   if (((struct tag *)o)->tagged) {
+   last_oid = &((struct tag *)o)->tagged->oid;
+   o = parse_object(last_oid);
+   } else {
+   last_oid = NULL;
o = NULL;
+   }
if (!o && warn) {
+   if (last_oid && is_promisor_object(last_oid))
+   return NULL;
if (!warnlen)
warnlen = strlen(warn);
error("missing object referenced by '%.*s'", warnlen, warn);
-- 
2.18.0.203.gfac676dfb9-goog



[PATCH 1/2] revision: tolerate promised targets of tags

2018-07-12 Thread Jonathan Tan
In handle_commit(), it is fatal for an annotated tag to point to a
non-existent object. --exclude-promisor-objects should relax this rule
and allow non-existent objects that are promisor objects, but this is
not the case. Update handle_commit() to tolerate this situation.

This was observed when cloning from a repository with an annotated tag
pointing to a blob. The test included in this patch demonstrates this
case.
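
For orientation, here is a rough sketch of the kind of repository and
clone being described (the repository layout and names are invented;
the patch's actual test below drives this over HTTP instead):

	# a repository whose annotated tag points directly at a blob
	git init server
	git -C server commit --allow-empty -m base
	blob=$(echo blob-contents | git -C server hash-object -w --stdin)
	git -C server tag -a -m message myblob "$blob"
	git -C server config uploadpack.allowFilter true
	git -C server config uploadpack.allowAnySHA1InWant true

	# a blob-less partial clone may omit the tag's target; before this
	# patch, walking the refs could then die with "bad object <oid>"
	# instead of treating the missing blob as a promisor object
	git clone --no-local --filter=blob:none server repo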

Signed-off-by: Jonathan Tan 
---
---
 revision.c   |  3 +++
 t/t5616-partial-clone.sh | 39 +++
 2 files changed, 42 insertions(+)

diff --git a/revision.c b/revision.c
index 1b37da988..95546e6d4 100644
--- a/revision.c
+++ b/revision.c
@@ -248,6 +248,9 @@ static struct commit *handle_commit(struct rev_info *revs,
if (!object) {
		if (revs->ignore_missing_links || (flags & UNINTERESTING))
			return NULL;
+		if (revs->exclude_promisor_objects &&
+		    is_promisor_object(&tag->tagged->oid))
+			return NULL;
		die("bad object %s", oid_to_hex(&tag->tagged->oid));
}
object->flags |= flags;
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index 44d8e8017..e8dfeafe7 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -216,6 +216,45 @@ test_expect_success 'upon cloning, check that all refs point to objects' '
	! test -e "$HTTPD_ROOT_PATH/one-time-sed"
 '
 
+test_expect_success 'when partial cloning, tolerate server not sending target of tag' '
+   SERVER="$HTTPD_DOCUMENT_ROOT_PATH/server" &&
+   rm -rf "$SERVER" repo &&
+   test_create_repo "$SERVER" &&
+   test_commit -C "$SERVER" foo &&
+   test_config -C "$SERVER" uploadpack.allowfilter 1 &&
+   test_config -C "$SERVER" uploadpack.allowanysha1inwant 1 &&
+
+   # Create an annotated tag pointing to a blob.
+   BLOB=$(echo blob-contents | git -C "$SERVER" hash-object --stdin -w) &&
+   git -C "$SERVER" tag -m message -a myblob "$BLOB" &&
+
+   # Craft a packfile including the tag, but not the blob it points to.
+   printf "%s\n%s\n--not\n%s\n" \
+   $(git -C "$SERVER" rev-parse HEAD) \
+   $(git -C "$SERVER" rev-parse myblob) \
+   $(git -C "$SERVER" rev-parse myblob^{blob}) |
+   git -C "$SERVER" pack-objects --thin --stdout >incomplete.pack 
&&
+
+   # Replace the existing packfile with the crafted one. The protocol
+   # requires that the packfile be sent in sideband 1, hence the extra
+   # \x01 byte at the beginning.
+   printf "1,/packfile/!c %04xx01%s" \
+   "$(($(wc -c "$HTTPD_ROOT_PATH/one-time-sed" &&
+
+   # Use protocol v2 because the sed command looks for the "packfile"
+   # section header.
+   test_config -C "$SERVER" protocol.version 2 &&
+
+   # Exercise to make sure it works.
+   git -c protocol.version=2 clone \
+   --filter=blob:none $HTTPD_URL/one_time_sed/server repo &&
+
+   # Ensure that the one-time-sed script was used.
+   ! test -e "$HTTPD_ROOT_PATH/one-time-sed"
+'
+
 stop_httpd
 
 test_done
-- 
2.18.0.203.gfac676dfb9-goog



[PATCH 0/2] Annotated tags pointing to missing but promised blobs

2018-07-12 Thread Jonathan Tan
These are based on jt/partial-clone-fsck-connectivity.

The patches in jt/partial-clone-fsck-connectivity were motivated by bugs
I discovered in partial clones when refs pointed to blobs directly.
While continuing to work on this, I noticed issues with annotated tags -
that is, refs pointing to tags pointing to blobs. Here are some fixes.

Jonathan Tan (2):
  revision: tolerate promised targets of tags
  tag: don't warn if target is missing but promised

 revision.c   |  3 +++
 t/t5616-partial-clone.sh | 44 
 tag.c| 13 +---
 3 files changed, 57 insertions(+), 3 deletions(-)

-- 
2.18.0.203.gfac676dfb9-goog



Re: [PATCH] t3404: fix use of "VAR=VAL cmd" with a shell function

2018-07-12 Thread Jeff King
On Thu, Jul 12, 2018 at 01:31:49PM -0700, Junio C Hamano wrote:

> >> ...would you want to use test_when_finished here (both for robustness,
> >> but also to make it more clear to a reader what's going on)?
> >
> > Perhaps.
> 
> Yes, but this one ends up being overly ugly.
> 
> The restoring of the AUTHOR_NAME must be done not just before this
> test_expect_success finishes and control goes on to the next test,
> but also before "git rebase -i" in the middle of this test that is
> expected to fail with conflict, as we want it to see the original
> author name (not the modified AttributeMe name).

OK, that is ugly.

>  test_expect_success 'retain authorship w/ conflicts' '
> + oGIT_AUTHOR_NAME=$GIT_AUTHOR_NAME &&
> + test_when_finished "GIT_AUTHOR_NAME=\$oGIT_AUTHOR_NAME" &&
> +
>   git reset --hard twerp &&
>   test_commit a conflict a conflict-a &&
>   git reset --hard twerp &&
> - GIT_AUTHOR_NAME=AttributeMe \
> +
> + GIT_AUTHOR_NAME=AttributeMe &&
> + export GIT_AUTHOR_NAME &&
>   test_commit b conflict b conflict-b &&
> + GIT_AUTHOR_NAME=$oGIT_AUTHOR_NAME &&
> +

I'd actually go so far as to say that it is less ugly without the helper
entirely, like:

diff --git a/t/t3404-rebase-interactive.sh b/t/t3404-rebase-interactive.sh
index 352a52e59d..10d50650ae 100755
--- a/t/t3404-rebase-interactive.sh
+++ b/t/t3404-rebase-interactive.sh
@@ -267,8 +267,9 @@ test_expect_success 'retain authorship w/ conflicts' '
git reset --hard twerp &&
test_commit a conflict a conflict-a &&
git reset --hard twerp &&
-   GIT_AUTHOR_NAME=AttributeMe \
-   test_commit b conflict b conflict-b &&
+   echo b >conflict &&
+   git add conflict &&
+   git commit --author="AttributeMe " -m b &&
set_fake_editor &&
test_must_fail git rebase -i conflict-a &&
echo resolved >conflict &&

but it is probably not worth spending more brain cycles on this. Any of
the solutions is fine by me.

(I do agree that being able to automatically catch these with a linter
would be worth brain cycles, but I cannot immediately think of a way
to do so).

-Peff




Re: [PATCH v4 04/23] multi-pack-index: add 'write' verb

2018-07-12 Thread Eric Sunshine
On Thu, Jul 12, 2018 at 3:40 PM Derrick Stolee  wrote:
> In anticipation of writing multi-pack-indexes, add a skeleton
> 'git multi-pack-index write' subcommand and send the options to a
> write_midx_file() method. Also create a skeleton test script that
> tests the 'write' subcommand.
>
> Signed-off-by: Derrick Stolee 
> ---
> diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
> @@ -30,5 +31,17 @@ int cmd_multi_pack_index(int argc, const char **argv,
> +   if (argc == 0)
> +   goto usage;
> +
> +   if (!strcmp(argv[0], "write")) {
> +   if (argc > 1)
> +   goto usage;
> +
> +   return write_midx_file(opts.object_dir);
> +   }
> +
> +usage:
> +   usage_with_options(builtin_multi_pack_index_usage,
> +  builtin_multi_pack_index_options);
>  }

I realize that some other commands may work this way, but merely
showing 'usage' is fairly user-hostile. What would be much more
helpful would be to explain to the user what the problem is; for
instance, "no action specified" and "'write' takes no arguments", or
something. That way the user knows exactly what corrective action to
take rather than having to try to guess based upon 'usage'. Showing
'usage' after the actual error message may or may not make sense
(though, I prefer not doing so since noisy 'usage' tends to swamp the
actual error message, making it easy to miss).

I wouldn't want to see a re-roll just for this, especially for such a
long series. Perhaps such a change can be done as a follow-up patch by
someone at some point.

By the way, if you ever need to add options or arguments to "write" or
"verify", you might want to model it after how it's done in
builtin/worktree.c, in which each subcommand is responsible for its
own argument processing, rather than handling subcommand arguments
here in the top-level function.




Re: [PATCH 2/6] git-submodule.sh: rename unused variables

2018-07-12 Thread Junio C Hamano
Stefan Beller  writes:

> The 'mode' variable is not used in cmd_update for its original purpose;
> rename it to 'dummy', as it only serves the purpose of aborting quickly,
> documenting this knowledge.

It seems that

 (1) the while loop in git-submodule.sh we see in this patch is the
     only thing that reads from submodule--helper update-clone; and

 (2) the mode/sha1/stage output from prepare_to_clone_next_submodule()
     is shown only when update_clone_get_next_task is called, which in
     turn happens only during the update-clone task,

so this conversion will not have an unintended bad effect on other
codepaths that use similarly formatted (but already different)
output used by the users of module_list.

Which tells us that this step is safe.

I am not sure how much it buys us not having to format mode into hex
and not having to call ce_stage(), when we have the cache entry anyway
in the codeflow, though.  IOW, it is unclear at this second step in
the six-patch series if we get to the end of the tunnel sooner by
having this step here.  Let's keep reading ;-).




> The variable 'stage' is also not used any more in cmd_update, so remove it.
>
> This went unnoticed as first each function used the commonly used
> submodule listing, which was converted in 74703a1e4df (submodule: rewrite
> `module_list` shell function in C, 2015-09-02). When cmd_update was
> using its own function starting in 48308681b07 (git submodule update:
> have a dedicated helper for cloning, 2016-02-29), its removal was missed.
>
> Signed-off-by: Stefan Beller 
> ---
>  builtin/submodule--helper.c | 5 ++---
>  git-submodule.sh| 4 ++--
>  2 files changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
> index 20ae9191ca3..ebcfe85bfa9 100644
> --- a/builtin/submodule--helper.c
> +++ b/builtin/submodule--helper.c
> @@ -1571,9 +1571,8 @@ static int prepare_to_clone_next_submodule(const struct cache_entry *ce,
>   needs_cloning = !file_exists(sb.buf);
>  
>   strbuf_reset(&sb);
> - strbuf_addf(&sb, "%06o %s %d %d\t%s\n", ce->ce_mode,
> - oid_to_hex(&ce->oid), ce_stage(ce),
> - needs_cloning, ce->name);
> + strbuf_addf(&sb, "dummy %s %d\t%s\n",
> + oid_to_hex(&ce->oid), needs_cloning, ce->name);
>   string_list_append(&suc->projectlines, sb.buf);
>  
>   if (!needs_cloning)
> diff --git a/git-submodule.sh b/git-submodule.sh
> index 8a611865397..56588aa304d 100755
> --- a/git-submodule.sh
> +++ b/git-submodule.sh
> @@ -531,9 +531,9 @@ cmd_update()
>   "$@" || echo "#unmatched" $?
>   } | {
>   err=
> - while read -r mode sha1 stage just_cloned sm_path
> + while read -r quickabort sha1 just_cloned sm_path
>   do
> - die_if_unmatched "$mode" "$sha1"
> + die_if_unmatched "$quickabort" "$sha1"
>  
>   name=$(git submodule--helper name "$sm_path") || exit
>   if ! test -z "$update"


Re: [PATCH 1/6] git-submodule.sh: align error reporting for update mode to use path

2018-07-12 Thread Junio C Hamano
Stefan Beller  writes:

> All other error messages in cmd_update are reporting the submodule based
> on its path, so let's do that for invalid update modes, too.
>
> Signed-off-by: Stefan Beller 
> ---

Makes sense.

>  git-submodule.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/git-submodule.sh b/git-submodule.sh
> index 5f9d9f6ea37..8a611865397 100755
> --- a/git-submodule.sh
> +++ b/git-submodule.sh
> @@ -627,7 +627,7 @@ cmd_update()
>   must_die_on_failure=yes
>   ;;
>   *)
> - die "$(eval_gettext "Invalid update mode 
> '$update_module' for submodule '$name'")"
> + die "$(eval_gettext "Invalid update mode 
> '$update_module' for submodule path '$path'")"
>   esac
>  
>   if (sanitize_submodule_env; cd "$sm_path" && $command 
> "$sha1")


Re: [PATCH v4 00/23] Multi-pack-index (MIDX)

2018-07-12 Thread Junio C Hamano
Derrick Stolee  writes:

> * The global 'core_multi_pack_index' is replaced with a one-time call to
>   git_config_bool() per repository that loads a multi-pack-index.

OK.  I guess that is more consistent, as the configuration would be
per-repo and multi-pack-index is per object-store.

After a quick look at the diff between the application of these
patches and what has been queued in 'pu', I found most (perhaps all)
of the changes are strict improvements.  I may have more comments
later after reading the patches in full.

Thanks.


Re: [PATCH v4 16/23] config: create core.multiPackIndex setting

2018-07-12 Thread Junio C Hamano
Derrick Stolee  writes:

> The core.multiPackIndex config setting controls the multi-pack-
> index (MIDX) feature. If false, the setting will disable all reads
> from the multi-pack-index file.
>
> Read this config setting in the new prepare_multi_pack_index_one()
> which is called during prepare_packed_git(). This check is run once
> per repository.
>
> Add comparison commands in t5319-multi-pack-index.sh to check
> typical Git behavior remains the same as the config setting is turned
> on and off. This currently includes 'git rev-list' and 'git log'
> commands to trigger several object database reads. Currently, these
> would only catch an error in the prepare_multi_pack_index_one(), but
> with later commits will catch errors in object lookups, abbreviations,
> and approximate object counts.
>
> Signed-off-by: Derrick Stolee 
>
> midx: prepare midxed_git struct
>
> Signed-off-by: Derrick Stolee 

What is going on around here?

> diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
> index 4a4fa26f7a..601e28a2f0 100755
> --- a/t/t5319-multi-pack-index.sh
> +++ b/t/t5319-multi-pack-index.sh
> @@ -3,6 +3,8 @@
>  test_description='multi-pack-indexes'
>  . ./test-lib.sh
>  
> +objdir=.git/objects
> +
>  midx_read_expect () {
>   NUM_PACKS=$1
>   NUM_OBJECTS=$2
> @@ -76,18 +78,35 @@ test_expect_success 'create objects' '
>  '
>  
>  test_expect_success 'write midx with one v1 pack' '
> - pack=$(git pack-objects --index-version=1 pack/test  - test_when_finished rm pack/test-$pack.pack pack/test-$pack.idx 
> pack/multi-pack-index &&
> - git multi-pack-index --object-dir=. write &&
> - midx_read_expect 1 18 4 .
> + pack=$(git pack-objects --index-version=1 $objdir/pack/test  &&
> + test_when_finished rm $objdir/pack/test-$pack.pack \
> + $objdir/pack/test-$pack.idx $objdir/pack/multi-pack-index &&
> + git multi-pack-index --object-dir=$objdir write &&
> + midx_read_expect 1 18 4 $objdir

Hmph, so we used to run tests as if $cwd were GIT_OBJECT_DIRECTORY
but now we are running them from the top-level of the working tree,
just like all the other tests?  Interesting.

>  '
>  
> +midx_git_two_modes() {
> + git -c core.multiPackIndex=false $1 >expect &&
> + git -c core.multiPackIndex=true $1 >actual &&
> + test_cmp expect actual
> +}
> +
> +compare_results_with_midx() {

Style: "compare_results_with_midx () {", just like mdx_read_expect
near the top of the file, but unlike midx_git_two_modes we see
nearby.  Please keep "git grep 'funcname () {'" a usable way to
locate where a shell function is defined without forcing people to
type an asterisk.
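
For what it's worth, the convention being asked for is what keeps a
plain 'git grep' for the definition usable once the function is spelled
that way (hypothetical invocation):

	git grep -n "compare_results_with_midx () {" t/t5319-multi-pack-index.sh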


Re: [PATCH] RFC: submodule-config: introduce trust level

2018-07-12 Thread Junio C Hamano
Stefan Beller  writes:

> In practice we always want to stack the settings starting with the
> .gitmodules file as a base and then put the config on top of it.

Could you substantiate this claim with justification?  Thanks.


Re: [PATCH] sequencer.c: terminate the last line of author-script properly

2018-07-12 Thread Junio C Hamano
Junio C Hamano  writes:

> I _think_ the right and safe way to fix that is to do something like
> this:
>
>   test -f .git/rebase-merge/author-script &&
>   (
>   safe_unset GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL ... &&
>   eval "$(cat .git/rebase-merge/author-script)" &&
>   test ... &&
>   test ... &&
>   test ...
>   )
>
> That way, we won't have to worry about GIT_AUTHOR_* variables
> getting modified and affecting the tests that come later in the
> script.

It turns out that the use of a subshell is *essential* for this test,
as the GIT_AUTHOR_* variables are exported and must remain so.  Unsetting
and reading them back may allow us to ensure that the shell variables have
the expected values, but then they are no longer exported, which would
mean later tests would use whatever random author ident belongs to the
person or 'bot running the tests, not the one the test author(s) expected
to be used.
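
A tiny illustration of why the subshell keeps the rest of the script
safe (variable name and value invented):

	export GIT_AUTHOR_NAME="A U Thor"
	(
		unset GIT_AUTHOR_NAME
		# inside the subshell the variable is gone...
	)
	echo "$GIT_AUTHOR_NAME"   # ...but out here it is still set and exported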

For tonight's pushout, I'll queue this on top.

-- >8 --
From: Junio C Hamano 
Date: Thu, 12 Jul 2018 13:23:02 -0700
Subject: [PATCH] SQUASH???

---
 t/t3404-rebase-interactive.sh | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/t/t3404-rebase-interactive.sh b/t/t3404-rebase-interactive.sh
index 2d189da2f1..b0cef509ab 100755
--- a/t/t3404-rebase-interactive.sh
+++ b/t/t3404-rebase-interactive.sh
@@ -81,11 +81,13 @@ test_expect_success 'rebase -i writes out .git/rebase-merge/author-script in "ed
	set_fake_editor &&
	FAKE_LINES="edit 1" git rebase -i HEAD^ &&
	test -f .git/rebase-merge/author-script &&
-	unset GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_AUTHOR_DATE &&
-	eval "$(cat .git/rebase-merge/author-script)" &&
-	test "$(git show --quiet --pretty=format:%an)" = "$GIT_AUTHOR_NAME" &&
-	test "$(git show --quiet --pretty=format:%ae)" = "$GIT_AUTHOR_EMAIL" &&
-	test "$(git show --quiet --date=raw --pretty=format:@%ad)" = "$GIT_AUTHOR_DATE"
+	(
+		sane_unset GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_AUTHOR_DATE &&
+		eval "$(cat .git/rebase-merge/author-script)" &&
+		test "$(git show --quiet --pretty=format:%an)" = "$GIT_AUTHOR_NAME" &&
+		test "$(git show --quiet --pretty=format:%ae)" = "$GIT_AUTHOR_EMAIL" &&
+		test "$(git show --quiet --date=raw --pretty=format:@%ad)" = "$GIT_AUTHOR_DATE"
+	)
+   )
 '
 
 test_expect_success 'rebase -i with the exec command' '
-- 
2.18.0-129-ge3331758f1



Re: [PATCH v3] handle lower case drive letters on Windows

2018-07-12 Thread Junio C Hamano
Ben Peart  writes:

>> > Thanks.  I thought it was strange as well until I realized you only
>> > see the error message if there _isn't_ a valid drive letter so seeing
>> > the entire path makes sense as it will likely be something like
>> "\\server\volume\directory"
>> 
>> Yup, that is why I thought Dscho meant to say something like
>> 
>>  "'%s': path without valid drive prefix"
>> 
>> in my response ;-)
>
> I'm fine with that - want me to leave it alone, spin a V4 or have you tweak 
> it?

That's "helped-by Dscho" patch, so I'd leave it to him by queuing v3
overnight and wait to deal with the final decision by the weekend
;-) That way, I do not have to make a decision on anything Windows
related.


Re: [PATCH] t3404: fix use of "VAR=VAL cmd" with a shell function

2018-07-12 Thread Junio C Hamano
Junio C Hamano  writes:

>>> diff --git a/t/t3404-rebase-interactive.sh b/t/t3404-rebase-interactive.sh
>>> index 7e9f375a24..fd43443ff5 100755
>>> --- a/t/t3404-rebase-interactive.sh
>>> +++ b/t/t3404-rebase-interactive.sh
>>> @@ -280,8 +280,11 @@ test_expect_success 'retain authorship w/ conflicts' '
>>> git reset --hard twerp &&
>>> test_commit a conflict a conflict-a &&
>>> git reset --hard twerp &&
>>> -   GIT_AUTHOR_NAME=AttributeMe \
>>> +   oGIT_AUTHOR_NAME=$GIT_AUTHOR_NAME &&
>>> +   GIT_AUTHOR_NAME=AttributeMe &&
>>> +   export GIT_AUTHOR_NAME &&
>>> test_commit b conflict b conflict-b &&
>>> +   GIT_AUTHOR_NAME=$oGIT_AUTHOR_NAME &&
>>
>> ...would you want to use test_when_finished here (both for robustness,
>> but also to make it more clear to a reader what's going on)?
>
> Perhaps.

Yes, but this one ends up being overly ugly.

The restoring of the AUTHOR_NAME must be done not just before this
test_expect_success finishes and control goes on to the next test,
but also before "git rebase -i" in the middle of this test that is
expected to fail with conflict, as we want it to see the original
author name (not the modified AttributeMe name).

 t/t3404-rebase-interactive.sh | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/t/t3404-rebase-interactive.sh b/t/t3404-rebase-interactive.sh
index 481a350090..489b6196e0 100755
--- a/t/t3404-rebase-interactive.sh
+++ b/t/t3404-rebase-interactive.sh
@@ -256,11 +256,18 @@ test_expect_success 'retain authorship' '
 '
 
 test_expect_success 'retain authorship w/ conflicts' '
+   oGIT_AUTHOR_NAME=$GIT_AUTHOR_NAME &&
+   test_when_finished "GIT_AUTHOR_NAME=\$oGIT_AUTHOR_NAME" &&
+
git reset --hard twerp &&
test_commit a conflict a conflict-a &&
git reset --hard twerp &&
-   GIT_AUTHOR_NAME=AttributeMe \
+
+   GIT_AUTHOR_NAME=AttributeMe &&
+   export GIT_AUTHOR_NAME &&
test_commit b conflict b conflict-b &&
+   GIT_AUTHOR_NAME=$oGIT_AUTHOR_NAME &&
+
set_fake_editor &&
test_must_fail git rebase -i conflict-a &&
echo resolved >conflict &&


Re: [PATCH] t3404: fix use of "VAR=VAL cmd" with a shell function

2018-07-12 Thread Junio C Hamano
Jeff King  writes:

> On Thu, Jul 12, 2018 at 01:07:51PM -0700, Junio C Hamano wrote:
>
>> Bash may take it happily but running test with dash reveals a breakage.
>> 
>> This was not discovered for a long time as no tests after this test
>> depended on GIT_AUTHOR_NAME to be reverted correctly back to the
>> original value after this step is done.
>> 
>> Signed-off-by: Junio C Hamano 
>> ---
>> 
>>  * We could enclose the setting and exporting inside a subshell and
>>do without the oGIT_AUTHOR_NAME temporary variable, but that
>>would interfere with the timestamp increments done by
>>test_commit, so I think doing it this way may be preferable.
>
> Yeah, I agree that setting/unsetting is probably more sane for this
> case. Though...
>
>> diff --git a/t/t3404-rebase-interactive.sh b/t/t3404-rebase-interactive.sh
>> index 7e9f375a24..fd43443ff5 100755
>> --- a/t/t3404-rebase-interactive.sh
>> +++ b/t/t3404-rebase-interactive.sh
>> @@ -280,8 +280,11 @@ test_expect_success 'retain authorship w/ conflicts' '
>>  git reset --hard twerp &&
>>  test_commit a conflict a conflict-a &&
>>  git reset --hard twerp &&
>> -GIT_AUTHOR_NAME=AttributeMe \
>> +oGIT_AUTHOR_NAME=$GIT_AUTHOR_NAME &&
>> +GIT_AUTHOR_NAME=AttributeMe &&
>> +export GIT_AUTHOR_NAME &&
>>  test_commit b conflict b conflict-b &&
>> +GIT_AUTHOR_NAME=$oGIT_AUTHOR_NAME &&
>
> ...would you want to use test_when_finished here (both for robustness,
> but also to make it more clear to a reader what's going on)?

Perhaps.

I wish our test-lint caught "VAR=VAL shellfunc", but it is rather
hard to do so, I would imagine.




Re: [PATCH] sequencer.c: terminate the last line of author-script properly

2018-07-12 Thread Junio C Hamano
Yup ;-)

On Thu, Jul 12, 2018 at 1:16 PM, Eric Sunshine  wrote:
> On Thu, Jul 12, 2018 at 4:13 PM Junio C Hamano  wrote:
> I _think_ the right and safe way to fix that is to do something like
>> this:
>>
>> test -f .git/rebase-merge/author-script &&
>> (
>> safe_unset GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL ... &&
>
> s/safe_unset/sane_unset/
>
>> eval "$(cat .git/rebase-merge/author-script)" &&
>> test ... &&
>> test ... &&
>> test ...
>> )


Re: [PATCH] sequencer.c: terminate the last line of author-script properly

2018-07-12 Thread Eric Sunshine
On Thu, Jul 12, 2018 at 4:13 PM Junio C Hamano  wrote:
> I _think_ the right and safe way to fix that is to do something like
> this:
>
> test -f .git/rebase-merge/author-script &&
> (
> safe_unset GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL ... &&

s/safe_unset/sane_unset/

> eval "$(cat .git/rebase-merge/author-script)" &&
> test ... &&
> test ... &&
> test ...
> )


Re: [PATCH] t3404: fix use of "VAR=VAL cmd" with a shell function

2018-07-12 Thread Jeff King
On Thu, Jul 12, 2018 at 01:07:51PM -0700, Junio C Hamano wrote:

> Bash may take it happily, but running the test with dash reveals a breakage.
> 
> This was not discovered for a long time, as no tests after this test
> depended on GIT_AUTHOR_NAME being reverted correctly back to the
> original value after this step is done.
> 
> Signed-off-by: Junio C Hamano 
> ---
> 
>  * We could enclose the setting and exporting inside a subshell and
>do without the oGIT_AUTHOR_NAME temporary variable, but that
>would interfere with the timestamp increments done by
>test_commit, so I think doing it this way may be preferable.

Yeah, I agree that setting/unsetting is probably more sane for this
case. Though...

> diff --git a/t/t3404-rebase-interactive.sh b/t/t3404-rebase-interactive.sh
> index 7e9f375a24..fd43443ff5 100755
> --- a/t/t3404-rebase-interactive.sh
> +++ b/t/t3404-rebase-interactive.sh
> @@ -280,8 +280,11 @@ test_expect_success 'retain authorship w/ conflicts' '
>   git reset --hard twerp &&
>   test_commit a conflict a conflict-a &&
>   git reset --hard twerp &&
> - GIT_AUTHOR_NAME=AttributeMe \
> + oGIT_AUTHOR_NAME=$GIT_AUTHOR_NAME &&
> + GIT_AUTHOR_NAME=AttributeMe &&
> + export GIT_AUTHOR_NAME &&
>   test_commit b conflict b conflict-b &&
> + GIT_AUTHOR_NAME=$oGIT_AUTHOR_NAME &&

...would you want to use test_when_finished here (both for robustness,
but also to make it more clear to a reader what's going on)?

It's too bad test_commit does not take arbitrary options, as you could
just use --author then, side-stepping the whole thing.

-Peff


Re: [PATCH] sequencer.c: terminate the last line of author-script properly

2018-07-12 Thread Junio C Hamano
"Akinori MUSHA"  writes:

> diff --git a/t/t3404-rebase-interactive.sh b/t/t3404-rebase-interactive.sh
> index 352a52e59..345b103eb 100755
> --- a/t/t3404-rebase-interactive.sh
> +++ b/t/t3404-rebase-interactive.sh
> @@ -75,6 +75,19 @@ test_expect_success 'rebase --keep-empty' '
>   test_line_count = 6 actual
>  '
>  
> +test_expect_success 'rebase -i writes out .git/rebase-merge/author-script in "edit" that sh(1) can parse' '
> + test_when_finished "git rebase --abort ||:" &&
> + git checkout master &&
> + set_fake_editor &&
> + FAKE_LINES="edit 1" git rebase -i HEAD^ &&
> + test -f .git/rebase-merge/author-script &&
> + unset GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_AUTHOR_DATE &&

Is this "unset" safe?  Some POSIX compliant shells barf if you unset
a variable that is not set, so the answer to my question is yes only
if we know these three variables are always set.

> + eval "$(cat .git/rebase-merge/author-script)" &&
> + test "$(git show --quiet --pretty=format:%an)" = "$GIT_AUTHOR_NAME" &&
> + test "$(git show --quiet --pretty=format:%ae)" = "$GIT_AUTHOR_EMAIL" &&
> + test "$(git show --quiet --date=raw --pretty=format:@%ad)" = 
> "$GIT_AUTHOR_DATE"

Oh, actually it is even worse than that.  What if author-script is
bogus, like in the version before your patch fixes the code?  We do
not restore the AUTHOR_NAME/EMAIL/DATE after this test_expect_success
fails.  How does that, i.e. missing some variable, affect execution
of later steps in this same test script?

I _think_ the right and safe way to fix that is to do something like
this:

test -f .git/rebase-merge/author-script &&
(
safe_unset GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL ... &&
eval "$(cat .git/rebase-merge/author-script)" &&
test ... &&
test ... &&
test ...
)

That way, we won't have to worry about GIT_AUTHOR_* variables
getting modified and affecting the tests that come later in the
script.

> +'
> +
>  test_expect_success 'rebase -i with the exec command' '
>   git checkout master &&
>   (
> -- 
> 2.18.0


RE: [PATCH v3] handle lower case drive letters on Windows

2018-07-12 Thread Ben Peart
> > Thanks.  I thought it was strange as well until I realized you only
> > see the error message if there _isn't_ a valid drive letter so seeing
> > the entire path makes sense as it will likely be something like
> "\\server\volume\directory"
> 
> Yup, that is why I thought Dscho meant to say something like
> 
>   "'%s': path without valid drive prefix"
> 
> in my response ;-)

I'm fine with that - want me to leave it alone, spin a V4 or have you tweak it?


[PATCH] t3404: fix use of "VAR=VAL cmd" with a shell function

2018-07-12 Thread Junio C Hamano
Bash may take it happily, but running the test with dash reveals a breakage.

This was not discovered for a long time, as no tests after this test
depended on GIT_AUTHOR_NAME being reverted correctly back to the
original value after this step is done.

Signed-off-by: Junio C Hamano 
---

 * We could enclose the setting and exporting inside a subshell and
   do without the oGIT_AUTHOR_NAME temporary variable, but that
   would interfere with the timestamp increments done by
   test_commit, so I think doing it this way may be preferable.
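
   To see why a subshell would interfere, a tiny illustration (the
   variable name is made up; test_commit's timestamp bookkeeping behaves
   analogously):

	tick=1000000
	(
		tick=$((tick + 60))   # bumped inside a subshell...
	)
	echo "$tick"                  # ...still 1000000 outside of it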

 t/t3404-rebase-interactive.sh | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/t/t3404-rebase-interactive.sh b/t/t3404-rebase-interactive.sh
index 7e9f375a24..fd43443ff5 100755
--- a/t/t3404-rebase-interactive.sh
+++ b/t/t3404-rebase-interactive.sh
@@ -280,8 +280,11 @@ test_expect_success 'retain authorship w/ conflicts' '
git reset --hard twerp &&
test_commit a conflict a conflict-a &&
git reset --hard twerp &&
-   GIT_AUTHOR_NAME=AttributeMe \
+   oGIT_AUTHOR_NAME=$GIT_AUTHOR_NAME &&
+   GIT_AUTHOR_NAME=AttributeMe &&
+   export GIT_AUTHOR_NAME &&
test_commit b conflict b conflict-b &&
+   GIT_AUTHOR_NAME=$oGIT_AUTHOR_NAME &&
set_fake_editor &&
test_must_fail git rebase -i conflict-a &&
echo resolved >conflict &&


Re: [PATCH v2] sha1-name.c: for ":/", find detached HEAD commits

2018-07-12 Thread William Chargin
Contents look good to me. I don't understand why the file name in your
patch is sha1_name.c as opposed to sha1-name.c (I see e5e5e0883 from
2018-04-10, but that sounds pretty old), but I trust that whatever
you're doing there is correct.

> Thanks for working on this.

You're quite welcome. Thanks to you, Peff, and Duy for your help. This
was a pleasant experience for me as a new contributor. :-)


Re: [PATCH v2 01/10] t/test-lib: teach --chain-lint to detect broken &&-chains in subshells

2018-07-12 Thread Junio C Hamano
Eric Sunshine  writes:

> However, existing tests aside, the more important goal is detecting
> problems in new or updated tests hiding genuine bugs in changes to Git
> itself, so it may have some value.

Yes, indeed.


Re: [PATCH v3] handle lower case drive letters on Windows

2018-07-12 Thread Junio C Hamano
Ben Peart  writes:

>> -Original Message-
>> From: Junio C Hamano  On Behalf Of Junio C Hamano
>> Sent: Thursday, July 12, 2018 3:13 PM
>> To: Ben Peart 
>> Cc: git@vger.kernel.org; sbel...@google.com; johannes.schinde...@gmx.de
>> Subject: Re: [PATCH v3] handle lower case drive letters on Windows
>> 
>> Ben Peart  writes:
>> 
>> > On Windows, if a tool calls SetCurrentDirectory with a lower case drive
>> > letter, the subsequent call to GetCurrentDirectory will return the same
>> > lower case drive letter. Powershell, for example, does not normalize the
>> > path. If that happens, test-drop-caches will error out as it does not
>> > correctly to handle lower case drive letters.
>> >
>> > Helped-by: Johannes Schindelin 
>> > Signed-off-by: Ben Peart 
>> > ---
>> 
>> Thanks.  Will replace, even though showing the whole Buffer (I am
>> guessing it is the equivalent of result from getcwd(3) call) and
>> complaining about drive "letter" feels a bit strange ;-)
>> 
>
> Thanks.  I thought it was strange as well until I realized you only see the
> error message if there _isn't_ a valid drive letter so seeing the entire
> path makes sense as it will likely be something like 
> "\\server\volume\directory"

Yup, that is why I thought Dscho meant to say something like

"'%s': path without valid drive prefix"

in my response ;-)



[PATCH 3/6] builtin/submodule--helper: factor out submodule updating

2018-07-12 Thread Stefan Beller
Separate the command line parsing from the actual execution of the command
within the repository. For now there is not a lot of execution as
most of it is still in git-submodule.sh.

Signed-off-by: Stefan Beller 
---
 builtin/submodule--helper.c | 59 +
 1 file changed, 33 insertions(+), 26 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index ebcfe85bfa9..96929ba1096 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -1472,6 +1472,8 @@ struct submodule_update_clone {
/* failed clones to be retried again */
const struct cache_entry **failed_clones;
int failed_clones_nr, failed_clones_alloc;
+
+   int max_jobs;
 };
 #define SUBMODULE_UPDATE_CLONE_INIT {0, MODULE_LIST_INIT, 0, \
SUBMODULE_UPDATE_STRATEGY_INIT, 0, 0, -1, STRING_LIST_INIT_DUP, 0, \
@@ -1714,11 +1716,36 @@ static int gitmodules_update_clone_config(const char *var, const char *value,
return 0;
 }
 
+static int update_submodules(struct submodule_update_clone *suc)
+{
+   struct string_list_item *item;
+
+   run_processes_parallel(suc->max_jobs,
+  update_clone_get_next_task,
+  update_clone_start_failure,
+  update_clone_task_finished,
+  suc);
+
+   /*
+* We saved the output and put it out all at once now.
+* That means:
+* - the listener does not have to interleave their (checkout)
+*   work with our fetching.  The writes involved in a
+*   checkout involve more straightforward sequential I/O.
+* - the listener can avoid doing any work if fetching failed.
+*/
+   if (suc->quickstop)
+   return 1;
+
+   for_each_string_list_item(item, &suc->projectlines)
+   fprintf(stdout, "%s", item->string);
+
+   return 0;
+}
+
 static int update_clone(int argc, const char **argv, const char *prefix)
 {
const char *update = NULL;
-   int max_jobs = 1;
-   struct string_list_item *item;
struct pathspec pathspec;
struct submodule_update_clone suc = SUBMODULE_UPDATE_CLONE_INIT;
 
@@ -1740,7 +1767,7 @@ static int update_clone(int argc, const char **argv, const char *prefix)
	OPT_STRING(0, "depth", &suc.depth, "<depth>",
		   N_("Create a shallow clone truncated to the "
		      "specified number of revisions")),
-	OPT_INTEGER('j', "jobs", &max_jobs,
+	OPT_INTEGER('j', "jobs", &suc.max_jobs,
		    N_("parallel jobs")),
	OPT_BOOL(0, "recommend-shallow", &suc.recommend_shallow,
		 N_("whether the initial clone should follow the shallow recommendation")),
@@ -1756,8 +1783,8 @@ static int update_clone(int argc, const char **argv, const char *prefix)
};
suc.prefix = prefix;
 
-	config_from_gitmodules(gitmodules_update_clone_config, &max_jobs);
-	git_config(gitmodules_update_clone_config, &max_jobs);
+	config_from_gitmodules(gitmodules_update_clone_config, &suc.max_jobs);
+	git_config(gitmodules_update_clone_config, &suc.max_jobs);
 
argc = parse_options(argc, argv, prefix, module_update_clone_options,
 git_submodule_helper_usage, 0);
@@ -1772,27 +1799,7 @@ static int update_clone(int argc, const char **argv, const char *prefix)
if (pathspec.nr)
suc.warn_if_uninitialized = 1;
 
-   run_processes_parallel(max_jobs,
-  update_clone_get_next_task,
-  update_clone_start_failure,
-  update_clone_task_finished,
-  &suc);
-
-   /*
-* We saved the output and put it out all at once now.
-* That means:
-* - the listener does not have to interleave their (checkout)
-*   work with our fetching.  The writes involved in a
-*   checkout involve more straightforward sequential I/O.
-* - the listener can avoid doing any work if fetching failed.
-*/
-   if (suc.quickstop)
-   return 1;
-
-   for_each_string_list_item(item, &suc.projectlines)
-   fprintf(stdout, "%s", item->string);
-
-   return 0;
+   return update_submodules(&suc);
 }
 
 static int resolve_relative_path(int argc, const char **argv, const char *prefix)
-- 
2.18.0.203.gfac676dfb9-goog



[PATCH 4/6] builtin/submodule--helper: store update_clone information in a struct

2018-07-12 Thread Stefan Beller
The information that is printed for update_submodules in
'submodule--helper update-clone' and consumed by 'git submodule update'
is stored as a string per submodule. This made sense at the time of
48308681b07 (git submodule update: have a dedicated helper for cloning,
2016-02-29), but as we want to migrate the rest of the submodule update
into C, we're better off having access to the raw information in a helper
struct.

Signed-off-by: Stefan Beller 
---
 builtin/submodule--helper.c | 37 +++--
 1 file changed, 27 insertions(+), 10 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 96929ba1096..c9c3fe2bf28 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -1444,6 +1444,12 @@ static int module_clone(int argc, const char **argv, const char *prefix)
return 0;
 }
 
+struct submodule_update_clone_information {
+   const struct submodule *sub;
+   struct object_id oid;
+   unsigned just_cloned;
+};
+
 struct submodule_update_clone {
/* index into 'list', the list of submodules to look into for cloning */
int current;
@@ -1463,8 +1469,9 @@ struct submodule_update_clone {
const char *recursive_prefix;
const char *prefix;
 
-   /* Machine-readable status lines to be consumed by git-submodule.sh */
-   struct string_list projectlines;
+   /* to be consumed by git-submodule.sh */
+   struct submodule_update_clone_information *submodule_lines;
+   int submodule_lines_nr; int submodule_lines_alloc;
 
/* If we want to stop as fast as possible and return an error */
unsigned quickstop : 1;
@@ -1478,7 +1485,7 @@ struct submodule_update_clone {
 #define SUBMODULE_UPDATE_CLONE_INIT {0, MODULE_LIST_INIT, 0, \
SUBMODULE_UPDATE_STRATEGY_INIT, 0, 0, -1, STRING_LIST_INIT_DUP, 0, \
NULL, NULL, NULL, \
-   STRING_LIST_INIT_DUP, 0, NULL, 0, 0}
+   NULL, 0, 0, 0, NULL, 0, 0, 0}
 
 
 static void next_submodule_warn_missing(struct submodule_update_clone *suc,
@@ -1572,10 +1579,12 @@ static int prepare_to_clone_next_submodule(const struct cache_entry *ce,
	strbuf_addf(&sb, "%s/.git", ce->name);
	needs_cloning = !file_exists(sb.buf);
 
-	strbuf_reset(&sb);
-	strbuf_addf(&sb, "dummy %s %d\t%s\n",
-		    oid_to_hex(&ce->oid), needs_cloning, ce->name);
-	string_list_append(&suc->projectlines, sb.buf);
+	ALLOC_GROW(suc->submodule_lines, suc->submodule_lines_nr + 1,
+		   suc->submodule_lines_alloc);
+	oidcpy(&suc->submodule_lines[suc->submodule_lines_nr].oid, &ce->oid);
+	suc->submodule_lines[suc->submodule_lines_nr].just_cloned = needs_cloning;
+	suc->submodule_lines[suc->submodule_lines_nr].sub = sub;
+	suc->submodule_lines_nr++;
 
if (!needs_cloning)
goto cleanup;
@@ -1718,7 +1727,8 @@ static int gitmodules_update_clone_config(const char *var, const char *value,
 
 static int update_submodules(struct submodule_update_clone *suc)
 {
-   struct string_list_item *item;
+   int i;
+   struct strbuf sb = STRBUF_INIT;
 
run_processes_parallel(suc->max_jobs,
   update_clone_get_next_task,
@@ -1737,9 +1747,16 @@ static int update_submodules(struct submodule_update_clone *suc)
	if (suc->quickstop)
		return 1;
 
-	for_each_string_list_item(item, &suc->projectlines)
-		fprintf(stdout, "%s", item->string);
+	for (i = 0; i < suc->submodule_lines_nr; i++) {
+		strbuf_addf(&sb, "dummy %s %d\t%s\n",
+			    oid_to_hex(&suc->submodule_lines[i].oid),
+			    suc->submodule_lines[i].just_cloned,
+			    suc->submodule_lines[i].sub->path);
+		fprintf(stdout, "%s", sb.buf);
+		strbuf_reset(&sb);
+	}
 
+	strbuf_release(&sb);
return 0;
 }
 
-- 
2.18.0.203.gfac676dfb9-goog



[PATCH 2/6] git-submodule.sh: rename unused variables

2018-07-12 Thread Stefan Beller
The 'mode' variable is not used in cmd_update for its original purpose;
rename it to 'dummy', as it only serves the purpose of aborting quickly,
documenting this knowledge.

The variable 'stage' is also not used any more in cmd_update, so remove it.

This went unnoticed as first each function used the commonly used
submodule listing, which was converted in 74703a1e4df (submodule: rewrite
`module_list` shell function in C, 2015-09-02). When cmd_update was
using its own function starting in 48308681b07 (git submodule update:
have a dedicated helper for cloning, 2016-02-29), its removal was missed.

Signed-off-by: Stefan Beller 
---
 builtin/submodule--helper.c | 5 ++---
 git-submodule.sh| 4 ++--
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 20ae9191ca3..ebcfe85bfa9 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -1571,9 +1571,8 @@ static int prepare_to_clone_next_submodule(const struct cache_entry *ce,
	needs_cloning = !file_exists(sb.buf);
 
	strbuf_reset(&sb);
-	strbuf_addf(&sb, "%06o %s %d %d\t%s\n", ce->ce_mode,
-		    oid_to_hex(&ce->oid), ce_stage(ce),
-		    needs_cloning, ce->name);
+	strbuf_addf(&sb, "dummy %s %d\t%s\n",
+		    oid_to_hex(&ce->oid), needs_cloning, ce->name);
	string_list_append(&suc->projectlines, sb.buf);
 
if (!needs_cloning)
diff --git a/git-submodule.sh b/git-submodule.sh
index 8a611865397..56588aa304d 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -531,9 +531,9 @@ cmd_update()
"$@" || echo "#unmatched" $?
} | {
err=
-   while read -r mode sha1 stage just_cloned sm_path
+   while read -r quickabort sha1 just_cloned sm_path
do
-   die_if_unmatched "$mode" "$sha1"
+   die_if_unmatched "$quickabort" "$sha1"
 
name=$(git submodule--helper name "$sm_path") || exit
if ! test -z "$update"
-- 
2.18.0.203.gfac676dfb9-goog



[PATCH 5/6] builtin/submodule--helper: factor out method to update a single submodule

2018-07-12 Thread Stefan Beller
In a later patch we'll find this method handy.

Signed-off-by: Stefan Beller 
---
 builtin/submodule--helper.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index c9c3fe2bf28..4bbf580df79 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -1725,10 +1725,17 @@ static int gitmodules_update_clone_config(const char 
*var, const char *value,
return 0;
 }
 
+static void update_submodule(struct submodule_update_clone_information *suci)
+{
+   fprintf(stdout, "dummy %s %d\t%s\n",
+   oid_to_hex(&suci->oid),
+   suci->just_cloned,
+   suci->sub->path);
+}
+
 static int update_submodules(struct submodule_update_clone *suc)
 {
int i;
-   struct strbuf sb = STRBUF_INIT;
 
run_processes_parallel(suc->max_jobs,
   update_clone_get_next_task,
@@ -1747,16 +1754,9 @@ static int update_submodules(struct 
submodule_update_clone *suc)
if (suc->quickstop)
return 1;
 
-   for (i = 0; i < suc->submodule_lines_nr; i++) {
-   strbuf_addf(&sb, "dummy %s %d\t%s\n",
-   oid_to_hex(&suc->submodule_lines[i].oid),
-   suc->submodule_lines[i].just_cloned,
-   suc->submodule_lines[i].sub->path);
-   fprintf(stdout, "%s", sb.buf);
-   strbuf_reset(&sb);
-   }
+   for (i = 0; i < suc->submodule_lines_nr; i++)
+   update_submodule(&suc->submodule_lines[i]);
 
-   strbuf_release(&sb);
return 0;
 }
 
-- 
2.18.0.203.gfac676dfb9-goog



[PATCH 1/6] git-submodule.sh: align error reporting for update mode to use path

2018-07-12 Thread Stefan Beller
All other error messages in cmd_update are reporting the submodule based
on its path, so let's do that for invalid update modes, too.

Signed-off-by: Stefan Beller 
---
 git-submodule.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/git-submodule.sh b/git-submodule.sh
index 5f9d9f6ea37..8a611865397 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -627,7 +627,7 @@ cmd_update()
must_die_on_failure=yes
;;
*)
-   die "$(eval_gettext "Invalid update mode 
'$update_module' for submodule '$name'")"
+   die "$(eval_gettext "Invalid update mode 
'$update_module' for submodule path '$path'")"
esac
 
if (sanitize_submodule_env; cd "$sm_path" && $command 
"$sha1")
-- 
2.18.0.203.gfac676dfb9-goog



[PATCH 6/6] submodule--helper: introduce new update-module-mode helper

2018-07-12 Thread Stefan Beller
This chews off a bit of the shell part of the update command in
git-submodule.sh. When writing the C code, keep in mind that the
submodule--helper part will go away eventually and that we want a C
function which is able to determine the submodule update strategy; as a
nicety, make determine_submodule_update_strategy accessible for
arbitrary repositories.
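
Once the function is accessible outside the helper, a caller could do
something like the following minimal sketch (not part of this patch; the
wrapper name and path are made up, and it assumes the usual
submodule--helper includes):

static void show_update_mode(struct repository *subrepo, const char *path)
{
	struct submodule_update_strategy st = SUBMODULE_UPDATE_STRATEGY_INIT;

	/* works for any repository handle, not just the_repository */
	determine_submodule_update_strategy(subrepo, 0 /* just_cloned */,
					    path, NULL /* no override */, &st);
	printf("%s\n", submodule_strategy_to_string(&st));
}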

Signed-off-by: Stefan Beller 
---
 builtin/submodule--helper.c | 61 +
 git-submodule.sh| 16 +-
 2 files changed, 62 insertions(+), 15 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 4bbf580df79..e53231cf286 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -1444,6 +1444,66 @@ static int module_clone(int argc, const char **argv, 
const char *prefix)
return 0;
 }
 
+static void determine_submodule_update_strategy(struct repository *r,
+   int just_cloned,
+   const char *path,
+   const char *update,
+   struct 
submodule_update_strategy *out)
+{
+   const struct submodule *sub = submodule_from_path(r, &null_oid, path);
+   char *key;
+   const char *val;
+
+   key = xstrfmt("submodule.%s.update", sub->name);
+
+   if (update) {
+   trace_printf("parsing update");
+   if (parse_submodule_update_strategy(update, out) < 0)
+   die(_("Invalid update mode '%s' for submodule path 
'%s'"),
+   update, path);
+   } else if (!repo_config_get_string_const(r, key, &val)) {
+   if (parse_submodule_update_strategy(val, out) < 0)
+   die(_("Invalid update mode '%s' configured for 
submodule path '%s'"),
+   val, path);
+   } else if (sub->update_strategy.type != SM_UPDATE_UNSPECIFIED) {
+   trace_printf("loaded thing");
+   out->type = sub->update_strategy.type;
+   out->command = sub->update_strategy.command;
+   } else
+   out->type = SM_UPDATE_CHECKOUT;
+
+   if (just_cloned &&
+   (out->type == SM_UPDATE_MERGE ||
+out->type == SM_UPDATE_REBASE ||
+out->type == SM_UPDATE_NONE))
+   out->type = SM_UPDATE_CHECKOUT;
+
+   free(key);
+}
+
+static int module_update_module_mode(int argc, const char **argv, const char 
*prefix)
+{
+   const char *path, *update = NULL;
+   int just_cloned;
+   struct submodule_update_strategy update_strategy = { .type = 
SM_UPDATE_CHECKOUT };
+
+   if (argc < 3 || argc > 4)
+   die("submodule--helper update-module-clone expects <just_cloned> <path> [<update>]");
+
+   just_cloned = git_config_int("just_cloned", argv[1]);
+   path = argv[2];
+
+   if (argc == 4)
+   update = argv[3];
+
+   determine_submodule_update_strategy(the_repository,
+   just_cloned, path, update,
+   &update_strategy);
+   fprintf(stdout, submodule_strategy_to_string(&update_strategy));
+
+   return 0;
+}
+
 struct submodule_update_clone_information {
const struct submodule *sub;
struct object_id oid;
@@ -2039,6 +2099,7 @@ static struct cmd_struct commands[] = {
{"list", module_list, 0},
{"name", module_name, 0},
{"clone", module_clone, 0},
+   {"update-module-mode", module_update_module_mode, 0},
{"update-clone", update_clone, 0},
{"relative-path", resolve_relative_path, 0},
{"resolve-relative-url", resolve_relative_url, 0},
diff --git a/git-submodule.sh b/git-submodule.sh
index 56588aa304d..215760898ce 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -535,27 +535,13 @@ cmd_update()
do
die_if_unmatched "$quickabort" "$sha1"
 
-   name=$(git submodule--helper name "$sm_path") || exit
-   if ! test -z "$update"
-   then
-   update_module=$update
-   else
-   update_module=$(git config submodule."$name".update)
-   if test -z "$update_module"
-   then
-   update_module="checkout"
-   fi
-   fi
+   update_module=$(git submodule--helper update-module-mode 
$just_cloned "$sm_path" $update)
 
displaypath=$(git submodule--helper relative-path 
"$prefix$sm_path" "$wt_prefix")
 
if test $just_cloned -eq 1
then
subsha1=
-   case "$update_module" in
-   merge | rebase | none)
-   update_module=checkout ;;
-   esac
else

[PATCH 0/6] git-submodule.sh: convert part of cmd_update to C

2018-07-12 Thread Stefan Beller
I thought about writing it all in one go, but the series got too large,
so let's chew one bite at a time.

Thanks,
Stefan

Stefan Beller (6):
  git-submodule.sh: align error reporting for update mode to use path
  git-submodule.sh: rename unused variables
  builtin/submodule--helper: factor out submodule updating
  builtin/submodule--helper: store update_clone information in a struct
  builtin/submodule--helper: factor out method to update a single
submodule
  submodule--helper: introduce new update-module-mode helper

 builtin/submodule--helper.c | 152 
 git-submodule.sh|  22 +-
 2 files changed, 122 insertions(+), 52 deletions(-)

-- 
2.18.0.203.gfac676dfb9-goog



[PATCH v4 22/23] packfile: skip loading index if in multi-pack-index

2018-07-12 Thread Derrick Stolee
Signed-off-by: Derrick Stolee 
---
 packfile.c | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/packfile.c b/packfile.c
index 2c819a0ad8..e6ecf12ab5 100644
--- a/packfile.c
+++ b/packfile.c
@@ -469,8 +469,19 @@ static int open_packed_git_1(struct packed_git *p)
ssize_t read_result;
const unsigned hashsz = the_hash_algo->rawsz;
 
-   if (!p->index_data && open_pack_index(p))
-   return error("packfile %s index unavailable", p->pack_name);
+   if (!p->index_data) {
+   struct multi_pack_index *m;
+   const char *pack_name = strrchr(p->pack_name, '/');
+
+   for (m = the_repository->objects->multi_pack_index;
+m; m = m->next) {
+   if (midx_contains_pack(m, pack_name))
+   break;
+   }
+
+   if (!m && open_pack_index(p))
+   return error("packfile %s index unavailable", 
p->pack_name);
+   }
 
if (!pack_max_fds) {
unsigned int max_fds = get_max_fd_limit();
@@ -521,6 +532,10 @@ static int open_packed_git_1(struct packed_git *p)
" supported (try upgrading GIT to a newer version)",
p->pack_name, ntohl(hdr.hdr_version));
 
+   /* Skip index checking if in multi-pack-index */
+   if (!p->index_data)
+   return 0;
+
/* Verify the pack matches its index. */
if (p->num_objects != ntohl(hdr.hdr_entries))
return error("packfile %s claims to have %"PRIu32" objects"
-- 
2.18.0.118.gd4f65b8d14



[PATCH v4 23/23] midx: clear midx on repack

2018-07-12 Thread Derrick Stolee
If a 'git repack' command replaces existing packfiles, then we must
clear the existing multi-pack-index before moving the packfiles it
references.

Signed-off-by: Derrick Stolee 
---
 builtin/repack.c|  9 +
 midx.c  | 12 
 midx.h  |  1 +
 t/t5319-multi-pack-index.sh |  9 +
 4 files changed, 31 insertions(+)

diff --git a/builtin/repack.c b/builtin/repack.c
index 6c636e159e..7f7cdc8b17 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -8,6 +8,7 @@
 #include "strbuf.h"
 #include "string-list.h"
 #include "argv-array.h"
+#include "midx.h"
 
 static int delta_base_offset = 1;
 static int pack_kept_objects = -1;
@@ -174,6 +175,7 @@ int cmd_repack(int argc, const char **argv, const char 
*prefix)
int no_update_server_info = 0;
int quiet = 0;
int local = 0;
+   int midx_cleared = 0;
 
struct option builtin_repack_options[] = {
OPT_BIT('a', NULL, _everything,
@@ -333,6 +335,13 @@ int cmd_repack(int argc, const char **argv, const char 
*prefix)
for_each_string_list_item(item, &names) {
for (ext = 0; ext < ARRAY_SIZE(exts); ext++) {
char *fname, *fname_old;
+
+   if (!midx_cleared) {
+   /* if we move a packfile, it will invalidate the midx */
+   clear_midx_file(get_object_directory());
+   midx_cleared = 1;
+   }
+
fname = mkpathdup("%s/pack-%s%s", packdir,
item->string, exts[ext].name);
if (!file_exists(fname)) {
diff --git a/midx.c b/midx.c
index bf2334acc6..19b7df338e 100644
--- a/midx.c
+++ b/midx.c
@@ -904,3 +904,15 @@ int write_midx_file(const char *object_dir)
free(midx_name);
return 0;
 }
+
+void clear_midx_file(const char *object_dir)
+{
+   char *midx = get_midx_filename(object_dir);
+
+   if (remove_path(midx)) {
+   UNLEAK(midx);
+   die(_("failed to clear multi-pack-index at %s"), midx);
+   }
+
+   free(midx);
+}
diff --git a/midx.h b/midx.h
index d4cde99473..e3b07f1586 100644
--- a/midx.h
+++ b/midx.h
@@ -39,5 +39,6 @@ int midx_contains_pack(struct multi_pack_index *m, const char 
*idx_name);
 int prepare_multi_pack_index_one(struct repository *r, const char *object_dir);
 
 int write_midx_file(const char *object_dir);
+void clear_midx_file(const char *object_dir);
 
 #endif
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 601e28a2f0..5ad6614465 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -141,6 +141,15 @@ test_expect_success 'write midx with twelve packs' '
 
 compare_results_with_midx "twelve packs"
 
+test_expect_success 'repack removes multi-pack-index' '
+   test_path_is_file $objdir/pack/multi-pack-index &&
+   git repack -adf &&
+   test_path_is_missing $objdir/pack/multi-pack-index
+'
+
+compare_results_with_midx "after repack"
+
+
# usage: corrupt_data <file> <pos> [<data>]
 corrupt_data () {
file=$1
-- 
2.18.0.118.gd4f65b8d14



[PATCH v4 19/23] midx: use existing midx when writing new one

2018-07-12 Thread Derrick Stolee
Due to how Windows handles replacing a lockfile when there is an open
handle, create the close_midx() method to close the existing midx before
writing the new one.

Signed-off-by: Derrick Stolee 
---
 midx.c | 116 ++---
 midx.h |   1 +
 2 files changed, 111 insertions(+), 6 deletions(-)

diff --git a/midx.c b/midx.c
index 4e014ff6e3..bf2334acc6 100644
--- a/midx.c
+++ b/midx.c
@@ -179,6 +179,23 @@ struct multi_pack_index *load_multi_pack_index(const char 
*object_dir)
return NULL;
 }
 
+static void close_midx(struct multi_pack_index *m)
+{
+   uint32_t i;
+   munmap((unsigned char *)m->data, m->data_len);
+   close(m->fd);
+   m->fd = -1;
+
+   for (i = 0; i < m->num_packs; i++) {
+   if (m->packs[i]) {
+   close_pack(m->packs[i]);
+   free(m->packs);
+   }
+   }
+   FREE_AND_NULL(m->packs);
+   FREE_AND_NULL(m->pack_names);
+}
+
 static int prepare_midx_pack(struct multi_pack_index *m, uint32_t pack_int_id)
 {
struct strbuf pack_name = STRBUF_INIT;
@@ -278,6 +295,29 @@ int fill_midx_entry(const struct object_id *oid, struct 
pack_entry *e, struct mu
return nth_midxed_pack_entry(m, e, pos);
 }
 
+int midx_contains_pack(struct multi_pack_index *m, const char *idx_name)
+{
+   uint32_t first = 0, last = m->num_packs;
+
+   while (first < last) {
+   uint32_t mid = first + (last - first) / 2;
+   const char *current;
+   int cmp;
+
+   current = m->pack_names[mid];
+   cmp = strcmp(idx_name, current);
+   if (!cmp)
+   return 1;
+   if (cmp > 0) {
+   first = mid + 1;
+   continue;
+   }
+   last = mid;
+   }
+
+   return 0;
+}
+
 int prepare_multi_pack_index_one(struct repository *r, const char *object_dir)
 {
struct multi_pack_index *m = r->objects->multi_pack_index;
@@ -326,6 +366,7 @@ struct pack_list {
uint32_t alloc_list;
uint32_t alloc_names;
size_t pack_name_concat_len;
+   struct multi_pack_index *m;
 };
 
 static void add_pack_to_midx(const char *full_path, size_t full_path_len,
@@ -334,6 +375,9 @@ static void add_pack_to_midx(const char *full_path, size_t 
full_path_len,
struct pack_list *packs = (struct pack_list *)data;
 
if (ends_with(file_name, ".idx")) {
+   if (packs->m && midx_contains_pack(packs->m, file_name))
+   return;
+
ALLOC_GROW(packs->list, packs->nr + 1, packs->alloc_list);
ALLOC_GROW(packs->names, packs->nr + 1, packs->alloc_names);
 
@@ -419,6 +463,23 @@ static int midx_oid_compare(const void *_a, const void *_b)
return a->pack_int_id - b->pack_int_id;
 }
 
+static int nth_midxed_pack_midx_entry(struct multi_pack_index *m,
+ uint32_t *pack_perm,
+ struct pack_midx_entry *e,
+ uint32_t pos)
+{
+   if (pos >= m->num_objects)
+   return 1;
+
+   nth_midxed_object_oid(&e->oid, m, pos);
+   e->pack_int_id = pack_perm[nth_midxed_pack_int_id(m, pos)];
+   e->offset = nth_midxed_offset(m, pos);
+
+   /* consider objects in midx to be from "old" packs */
+   e->pack_mtime = 0;
+   return 0;
+}
+
 static void fill_pack_entry(uint32_t pack_int_id,
struct packed_git *p,
uint32_t cur_object,
@@ -444,7 +505,8 @@ static void fill_pack_entry(uint32_t pack_int_id,
  * Copy only the de-duplicated entries (selected by most-recent modified time
  * of a packfile containing the object).
  */
-static struct pack_midx_entry *get_sorted_entries(struct packed_git **p,
+static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
+ struct packed_git **p,
  uint32_t *perm,
  uint32_t nr_packs,
  uint32_t *nr_objects)
@@ -453,8 +515,9 @@ static struct pack_midx_entry *get_sorted_entries(struct 
packed_git **p,
uint32_t alloc_fanout, alloc_objects, total_objects = 0;
struct pack_midx_entry *entries_by_fanout = NULL;
struct pack_midx_entry *deduplicated_entries = NULL;
+   uint32_t start_pack = m ? m->num_packs : 0;
 
-   for (cur_pack = 0; cur_pack < nr_packs; cur_pack++)
+   for (cur_pack = start_pack; cur_pack < nr_packs; cur_pack++)
total_objects += p[cur_pack]->num_objects;
 
/*
@@ -471,7 +534,23 @@ static struct pack_midx_entry *get_sorted_entries(struct 
packed_git **p,
for (cur_fanout = 0; cur_fanout < 256; cur_fanout++) {
uint32_t nr_fanout = 

[PATCH v4 21/23] midx: prevent duplicate packfile loads

2018-07-12 Thread Derrick Stolee
The multi-pack-index, when present, tracks the existence of objects and
their offsets within a list of packfiles. This allows us to use the
multi-pack-index for object lookups, abbreviations, and object counts.

When the multi-pack-index tracks a packfile, then we do not need to add
that packfile to the packed_git linked list or the MRU list.

We still need to load the packfiles that are not tracked by the
multi-pack-index.

Signed-off-by: Derrick Stolee 
---
 packfile.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/packfile.c b/packfile.c
index 97e7812b6b..2c819a0ad8 100644
--- a/packfile.c
+++ b/packfile.c
@@ -795,6 +795,7 @@ struct prepare_pack_data {
struct repository *r;
struct string_list *garbage;
int local;
+   struct multi_pack_index *m;
 };
 
 static void prepare_pack(const char *full_name, size_t full_name_len,
@@ -805,6 +806,8 @@ static void prepare_pack(const char *full_name, size_t 
full_name_len,
size_t base_len = full_name_len;
 
if (strip_suffix_mem(full_name, _len, ".idx")) {
+   if (data->m && midx_contains_pack(data->m, file_name))
+   return;
/* Don't reopen a pack we already have. */
for (p = data->r->objects->packed_git; p; p = p->next) {
size_t len;
@@ -839,6 +842,12 @@ static void prepare_packed_git_one(struct repository *r, 
char *objdir, int local
struct prepare_pack_data data;
struct string_list garbage = STRING_LIST_INIT_DUP;
 
+   data.m = r->objects->multi_pack_index;
+
+   /* look for the multi-pack-index for this object directory */
+   while (data.m && strcmp(data.m->object_dir, objdir))
+   data.m = data.m->next;
+
data.r = r;
data.garbage = &garbage;
data.local = local;
-- 
2.18.0.118.gd4f65b8d14



[PATCH v4 09/23] multi-pack-index: read packfile list

2018-07-12 Thread Derrick Stolee
When constructing a multi-pack-index file for a given object directory,
read the files within the enclosed pack directory, find the ones that
end with ".idx", and load the corresponding packfile using
add_packed_git().

Signed-off-by: Derrick Stolee 
---
 midx.c  | 48 -
 t/t5319-multi-pack-index.sh | 15 ++--
 2 files changed, 55 insertions(+), 8 deletions(-)

diff --git a/midx.c b/midx.c
index c1ff5acf85..f742d7ccd7 100644
--- a/midx.c
+++ b/midx.c
@@ -1,6 +1,8 @@
 #include "cache.h"
 #include "csum-file.h"
+#include "dir.h"
 #include "lockfile.h"
+#include "packfile.h"
 #include "object-store.h"
 #include "midx.h"
 
@@ -109,12 +111,41 @@ static size_t write_midx_header(struct hashfile *f,
return MIDX_HEADER_SIZE;
 }
 
+struct pack_list {
+   struct packed_git **list;
+   uint32_t nr;
+   uint32_t alloc_list;
+};
+
+static void add_pack_to_midx(const char *full_path, size_t full_path_len,
+const char *file_name, void *data)
+{
+   struct pack_list *packs = (struct pack_list *)data;
+
+   if (ends_with(file_name, ".idx")) {
+   ALLOC_GROW(packs->list, packs->nr + 1, packs->alloc_list);
+
+   packs->list[packs->nr] = add_packed_git(full_path,
+   full_path_len,
+   0);
+   if (!packs->list[packs->nr]) {
+   warning(_("failed to add packfile '%s'"),
+   full_path);
+   return;
+   }
+
+   packs->nr++;
+   }
+}
+
 int write_midx_file(const char *object_dir)
 {
unsigned char num_chunks = 0;
char *midx_name;
+   uint32_t i;
struct hashfile *f = NULL;
struct lock_file lk;
+   struct pack_list packs;
 
midx_name = get_midx_filename(object_dir);
if (safe_create_leading_directories(midx_name)) {
@@ -123,14 +154,29 @@ int write_midx_file(const char *object_dir)
  midx_name);
}
 
+   packs.nr = 0;
+   packs.alloc_list = 16;
+   packs.list = NULL;
+   ALLOC_ARRAY(packs.list, packs.alloc_list);
+
+   for_each_file_in_pack_dir(object_dir, add_pack_to_midx, &packs);
+
hold_lock_file_for_update(&lk, midx_name, LOCK_DIE_ON_ERROR);
f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf);
FREE_AND_NULL(midx_name);
 
-   write_midx_header(f, num_chunks, 0);
+   write_midx_header(f, num_chunks, packs.nr);
 
finalize_hashfile(f, NULL, CSUM_FSYNC | CSUM_HASH_IN_STREAM);
commit_lock_file(&lk);
 
+   for (i = 0; i < packs.nr; i++) {
+   if (packs.list[i]) {
+   close_pack(packs.list[i]);
+   free(packs.list[i]);
+   }
+   }
+
+   free(packs.list);
return 0;
 }
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 1240127ec1..54117a7f49 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -4,8 +4,9 @@ test_description='multi-pack-indexes'
 . ./test-lib.sh
 
 midx_read_expect () {
+   NUM_PACKS=$1
cat >expect <<-EOF
-   header: 4d494458 1 0 0
+   header: 4d494458 1 0 $NUM_PACKS
object-dir: .
EOF
test-tool read-midx . >actual &&
@@ -15,7 +16,7 @@ midx_read_expect () {
 test_expect_success 'write midx with no packs' '
test_when_finished rm -f pack/multi-pack-index &&
git multi-pack-index --object-dir=. write &&
-   midx_read_expect
+   midx_read_expect 0
 '
 
 generate_objects () {
@@ -65,13 +66,13 @@ test_expect_success 'write midx with one v1 pack' '
pack=$(git pack-objects --index-version=1 pack/test 

[PATCH v4 13/23] midx: write object ids in a chunk

2018-07-12 Thread Derrick Stolee
Signed-off-by: Derrick Stolee 
---
 Documentation/technical/pack-format.txt |  4 +++
 midx.c  | 47 +++--
 midx.h  |  1 +
 t/helper/test-read-midx.c   |  2 ++
 t/t5319-multi-pack-index.sh |  4 +--
 5 files changed, 53 insertions(+), 5 deletions(-)

diff --git a/Documentation/technical/pack-format.txt 
b/Documentation/technical/pack-format.txt
index 6c5a77475f..78ee0489c6 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -302,6 +302,10 @@ CHUNK DATA:
name. This is the only chunk not guaranteed to be a multiple of four
bytes in length, so should be the last chunk for alignment reasons.
 
+   OID Lookup (ID: {'O', 'I', 'D', 'L'})
+   The OIDs for all objects in the MIDX are stored in lexicographic
+   order in this chunk.
+
(This section intentionally left incomplete.)
 
 TRAILER:
diff --git a/midx.c b/midx.c
index 29f8de5ee6..3f113e1beb 100644
--- a/midx.c
+++ b/midx.c
@@ -18,9 +18,10 @@
 #define MIDX_HASH_LEN 20
 #define MIDX_MIN_SIZE (MIDX_HEADER_SIZE + MIDX_HASH_LEN)
 
-#define MIDX_MAX_CHUNKS 1
+#define MIDX_MAX_CHUNKS 2
 #define MIDX_CHUNK_ALIGNMENT 4
 #define MIDX_CHUNKID_PACKNAMES 0x504e414d /* "PNAM" */
+#define MIDX_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */
 #define MIDX_CHUNKLOOKUP_WIDTH (sizeof(uint32_t) + sizeof(uint64_t))
 
 static char *get_midx_filename(const char *object_dir)
@@ -101,6 +102,10 @@ struct multi_pack_index *load_multi_pack_index(const char 
*object_dir)
m->chunk_pack_names = m->data + chunk_offset;
break;
 
+   case MIDX_CHUNKID_OIDLOOKUP:
+   m->chunk_oid_lookup = m->data + chunk_offset;
+   break;
+
case 0:
die(_("terminating multi-pack-index chunk id 
appears earlier than expected"));
break;
@@ -116,6 +121,8 @@ struct multi_pack_index *load_multi_pack_index(const char 
*object_dir)
 
if (!m->chunk_pack_names)
die(_("multi-pack-index missing required pack-name chunk"));
+   if (!m->chunk_oid_lookup)
+   die(_("multi-pack-index missing required OID lookup chunk"));
 
m->pack_names = xcalloc(m->num_packs, sizeof(*m->pack_names));
 
@@ -382,6 +389,32 @@ static size_t write_midx_pack_names(struct hashfile *f,
return written;
 }
 
+static size_t write_midx_oid_lookup(struct hashfile *f, unsigned char hash_len,
+   struct pack_midx_entry *objects,
+   uint32_t nr_objects)
+{
+   struct pack_midx_entry *list = objects;
+   uint32_t i;
+   size_t written = 0;
+
+   for (i = 0; i < nr_objects; i++) {
+   struct pack_midx_entry *obj = list++;
+
+   if (i < nr_objects - 1) {
+   struct pack_midx_entry *next = list;
+   if (oidcmp(&obj->oid, &next->oid) >= 0)
+   BUG("OIDs not in order: %s >= %s",
+   oid_to_hex(&obj->oid),
+   oid_to_hex(&next->oid));
+   }
+
+   hashwrite(f, obj->oid.hash, (int)hash_len);
+   written += hash_len;
+   }
+
+   return written;
+}
+
 int write_midx_file(const char *object_dir)
 {
unsigned char cur_chunk, num_chunks = 0;
@@ -428,7 +461,7 @@ int write_midx_file(const char *object_dir)
FREE_AND_NULL(midx_name);
 
cur_chunk = 0;
-   num_chunks = 1;
+   num_chunks = 2;
 
written = write_midx_header(f, num_chunks, packs.nr);
 
@@ -436,9 +469,13 @@ int write_midx_file(const char *object_dir)
chunk_offsets[cur_chunk] = written + (num_chunks + 1) * 
MIDX_CHUNKLOOKUP_WIDTH;
 
cur_chunk++;
-   chunk_ids[cur_chunk] = 0;
+   chunk_ids[cur_chunk] = MIDX_CHUNKID_OIDLOOKUP;
chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + 
packs.pack_name_concat_len;
 
+   cur_chunk++;
+   chunk_ids[cur_chunk] = 0;
+   chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + nr_entries * 
MIDX_HASH_LEN;
+
for (i = 0; i <= num_chunks; i++) {
if (i && chunk_offsets[i] < chunk_offsets[i - 1])
BUG("incorrect chunk offsets: %"PRIu64" before %"PRIu64,
@@ -468,6 +505,10 @@ int write_midx_file(const char *object_dir)
written += write_midx_pack_names(f, 
packs.names, packs.nr);
break;
 
+   case MIDX_CHUNKID_OIDLOOKUP:
+   written += write_midx_oid_lookup(f, 
MIDX_HASH_LEN, entries, nr_entries);
+   break;
+
default:

[PATCH v4 16/23] config: create core.multiPackIndex setting

2018-07-12 Thread Derrick Stolee
The core.multiPackIndex config setting controls the multi-pack-
index (MIDX) feature. If false, the setting will disable all reads
from the multi-pack-index file.

Read this config setting in the new prepare_multi_pack_index_one()
which is called during prepare_packed_git(). This check is run once
per repository.

Add comparison commands in t5319-multi-pack-index.sh to check
typical Git behavior remains the same as the config setting is turned
on and off. This currently includes 'git rev-list' and 'git log'
commands to trigger several object database reads. Currently, these
would only catch an error in the prepare_multi_pack_index_one(), but
with later commits will catch errors in object lookups, abbreviations,
and approximate object counts.

Signed-off-by: Derrick Stolee 

midx: prepare midxed_git struct

Signed-off-by: Derrick Stolee 
---
 Documentation/config.txt|  5 
 midx.c  | 25 ++
 midx.h  |  5 
 object-store.h  |  7 +
 packfile.c  |  6 -
 t/t5319-multi-pack-index.sh | 51 +++--
 6 files changed, 85 insertions(+), 14 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index ab641bf5a9..25f817ca42 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -908,6 +908,11 @@ core.commitGraph::
Enable git commit graph feature. Allows reading from the
commit-graph file.
 
+core.multiPackIndex::
+   Use the multi-pack-index file to track multiple packfiles using a
+   single index. See link:technical/multi-pack-index.html[the
+   multi-pack-index design document].
+
 core.sparseCheckout::
Enable "sparse checkout" feature. See section "Sparse checkout" in
linkgit:git-read-tree[1] for more information.
diff --git a/midx.c b/midx.c
index e83110ae92..4090cf4ca4 100644
--- a/midx.c
+++ b/midx.c
@@ -1,4 +1,5 @@
 #include "cache.h"
+#include "config.h"
 #include "csum-file.h"
 #include "dir.h"
 #include "lockfile.h"
@@ -177,6 +178,30 @@ struct multi_pack_index *load_multi_pack_index(const char 
*object_dir)
return NULL;
 }
 
+int prepare_multi_pack_index_one(struct repository *r, const char *object_dir)
+{
+   struct multi_pack_index *m = r->objects->multi_pack_index;
+   struct multi_pack_index *m_search;
+   int config_value;
+
+   if (repo_config_get_bool(r, "core.multipackindex", &config_value) ||
+   !config_value)
+   return 0;
+
+   for (m_search = m; m_search; m_search = m_search->next)
+   if (!strcmp(object_dir, m_search->object_dir))
+   return 1;
+
+   r->objects->multi_pack_index = load_multi_pack_index(object_dir);
+
+   if (r->objects->multi_pack_index) {
+   r->objects->multi_pack_index->next = m;
+   return 1;
+   }
+
+   return 0;
+}
+
 static size_t write_midx_header(struct hashfile *f,
unsigned char num_chunks,
uint32_t num_packs)
diff --git a/midx.h b/midx.h
index e15966272f..9bcfc82d2e 100644
--- a/midx.h
+++ b/midx.h
@@ -1,7 +1,11 @@
 #ifndef __MIDX_H__
 #define __MIDX_H__
 
+#include "repository.h"
+
 struct multi_pack_index {
+   struct multi_pack_index *next;
+
int fd;
 
const unsigned char *data;
@@ -25,6 +29,7 @@ struct multi_pack_index {
 };
 
 struct multi_pack_index *load_multi_pack_index(const char *object_dir);
+int prepare_multi_pack_index_one(struct repository *r, const char *object_dir);
 
 int write_midx_file(const char *object_dir);
 
diff --git a/object-store.h b/object-store.h
index 13a766aea8..c2b162489a 100644
--- a/object-store.h
+++ b/object-store.h
@@ -105,6 +105,13 @@ struct raw_object_store {
 */
struct oidmap *replace_map;
 
+   /*
+* private data
+*
+* should only be accessed directly by packfile.c and midx.c
+*/
+   struct multi_pack_index *multi_pack_index;
+
/*
 * private data
 *
diff --git a/packfile.c b/packfile.c
index 3d652212c6..5d4493dbf4 100644
--- a/packfile.c
+++ b/packfile.c
@@ -15,6 +15,7 @@
 #include "tree-walk.h"
 #include "tree.h"
 #include "object-store.h"
+#include "midx.h"
 
 char *odb_pack_name(struct strbuf *buf,
const unsigned char *sha1,
@@ -935,10 +936,13 @@ static void prepare_packed_git(struct repository *r)
 
if (r->objects->packed_git_initialized)
return;
+   prepare_multi_pack_index_one(r, r->objects->objectdir);
prepare_packed_git_one(r, r->objects->objectdir, 1);
prepare_alt_odb(r);
-   for (alt = r->objects->alt_odb_list; alt; alt = alt->next)
+   for (alt = r->objects->alt_odb_list; alt; alt = alt->next) {
+   prepare_multi_pack_index_one(r, alt->path);
prepare_packed_git_one(r, alt->path, 0);
+   }
rearrange_packed_git(r);

[PATCH v4 18/23] midx: use midx in abbreviation calculations

2018-07-12 Thread Derrick Stolee
Signed-off-by: Derrick Stolee 
---
 midx.c  | 11 +
 midx.h  |  3 +++
 packfile.c  |  6 +
 packfile.h  |  1 +
 sha1-name.c | 70 +
 5 files changed, 91 insertions(+)

diff --git a/midx.c b/midx.c
index 182535933c..4e014ff6e3 100644
--- a/midx.c
+++ b/midx.c
@@ -203,6 +203,17 @@ int bsearch_midx(const struct object_id *oid, struct 
multi_pack_index *m, uint32
MIDX_HASH_LEN, result);
 }
 
+struct object_id *nth_midxed_object_oid(struct object_id *oid,
+   struct multi_pack_index *m,
+   uint32_t n)
+{
+   if (n >= m->num_objects)
+   return NULL;
+
+   hashcpy(oid->hash, m->chunk_oid_lookup + m->hash_len * n);
+   return oid;
+}
+
 static off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos)
 {
const unsigned char *offset_data;
diff --git a/midx.h b/midx.h
index 377838c9ca..1b976df873 100644
--- a/midx.h
+++ b/midx.h
@@ -31,6 +31,9 @@ struct multi_pack_index {
 
 struct multi_pack_index *load_multi_pack_index(const char *object_dir);
 int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m, 
uint32_t *result);
+struct object_id *nth_midxed_object_oid(struct object_id *oid,
+   struct multi_pack_index *m,
+   uint32_t n);
 int fill_midx_entry(const struct object_id *oid, struct pack_entry *e, struct 
multi_pack_index *m);
 int prepare_multi_pack_index_one(struct repository *r, const char *object_dir);
 
diff --git a/packfile.c b/packfile.c
index bc763d91b9..c0eb5ac885 100644
--- a/packfile.c
+++ b/packfile.c
@@ -961,6 +961,12 @@ struct packed_git *get_packed_git(struct repository *r)
return r->objects->packed_git;
 }
 
+struct multi_pack_index *get_multi_pack_index(struct repository *r)
+{
+   prepare_packed_git(r);
+   return r->objects->multi_pack_index;
+}
+
 struct list_head *get_packed_git_mru(struct repository *r)
 {
prepare_packed_git(r);
diff --git a/packfile.h b/packfile.h
index b0eed44c0b..046280caf3 100644
--- a/packfile.h
+++ b/packfile.h
@@ -45,6 +45,7 @@ extern void install_packed_git(struct repository *r, struct 
packed_git *pack);
 
 struct packed_git *get_packed_git(struct repository *r);
 struct list_head *get_packed_git_mru(struct repository *r);
+struct multi_pack_index *get_multi_pack_index(struct repository *r);
 
 /*
  * Give a rough count of objects in the repository. This sacrifices accuracy
diff --git a/sha1-name.c b/sha1-name.c
index 60d9ef3c7e..7dc71201e6 100644
--- a/sha1-name.c
+++ b/sha1-name.c
@@ -12,6 +12,7 @@
 #include "packfile.h"
 #include "object-store.h"
 #include "repository.h"
+#include "midx.h"
 
 static int get_oid_oneline(const char *, struct object_id *, struct 
commit_list *);
 
@@ -149,6 +150,32 @@ static int match_sha(unsigned len, const unsigned char *a, 
const unsigned char *
return 1;
 }
 
+static void unique_in_midx(struct multi_pack_index *m,
+  struct disambiguate_state *ds)
+{
+   uint32_t num, i, first = 0;
+   const struct object_id *current = NULL;
+   num = m->num_objects;
+
+   if (!num)
+   return;
+
+   bsearch_midx(&ds->bin_pfx, m, &first);
+
+   /*
+* At this point, "first" is the location of the lowest object
+* with an object name that could match "bin_pfx".  See if we have
+* 0, 1 or more objects that actually match(es).
+*/
+   for (i = first; i < num && !ds->ambiguous; i++) {
+   struct object_id oid;
+   current = nth_midxed_object_oid(&oid, m, i);
+   if (!match_sha(ds->len, ds->bin_pfx.hash, current->hash))
+   break;
+   update_candidates(ds, current);
+   }
+}
+
 static void unique_in_pack(struct packed_git *p,
   struct disambiguate_state *ds)
 {
@@ -177,8 +204,12 @@ static void unique_in_pack(struct packed_git *p,
 
 static void find_short_packed_object(struct disambiguate_state *ds)
 {
+   struct multi_pack_index *m;
struct packed_git *p;
 
+   for (m = get_multi_pack_index(the_repository); m && !ds->ambiguous;
+m = m->next)
+   unique_in_midx(m, ds);
for (p = get_packed_git(the_repository); p && !ds->ambiguous;
 p = p->next)
unique_in_pack(p, ds);
@@ -527,6 +558,42 @@ static int extend_abbrev_len(const struct object_id *oid, 
void *cb_data)
return 0;
 }
 
+static void find_abbrev_len_for_midx(struct multi_pack_index *m,
+struct min_abbrev_data *mad)
+{
+   int match = 0;
+   uint32_t num, first = 0;
+   struct object_id oid;
+   const struct object_id *mad_oid;
+
+   if (!m->num_objects)
+   return;
+
+   num = m->num_objects;
+   mad_oid = mad->oid;
+   match = 

[PATCH v4 08/23] packfile: generalize pack directory list

2018-07-12 Thread Derrick Stolee
In anticipation of sharing the pack directory listing with the
multi-pack-index, generalize prepare_packed_git_one() into
for_each_file_in_pack_dir().
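
For illustration, a hypothetical additional consumer of the new
iteration API could look like this (the callback and wrapper are made up
and assume git's usual headers, e.g. cache.h and packfile.h):

static void count_idx_files(const char *full_path, size_t full_path_len,
			    const char *file_name, void *data)
{
	unsigned int *count = data;

	/* called once per entry found in $objdir/pack */
	if (ends_with(file_name, ".idx"))
		(*count)++;
}

static unsigned int count_pack_indexes(const char *objdir)
{
	unsigned int count = 0;

	for_each_file_in_pack_dir(objdir, count_idx_files, &count);
	return count;
}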

Signed-off-by: Derrick Stolee 
---
 packfile.c | 101 +
 packfile.h |   6 
 2 files changed, 69 insertions(+), 38 deletions(-)

diff --git a/packfile.c b/packfile.c
index 7cd45aa4b2..ee1ab9b804 100644
--- a/packfile.c
+++ b/packfile.c
@@ -738,13 +738,14 @@ static void report_pack_garbage(struct string_list *list)
report_helper(list, seen_bits, first, list->nr);
 }
 
-static void prepare_packed_git_one(struct repository *r, char *objdir, int 
local)
+void for_each_file_in_pack_dir(const char *objdir,
+  each_file_in_pack_dir_fn fn,
+  void *data)
 {
struct strbuf path = STRBUF_INIT;
size_t dirnamelen;
DIR *dir;
struct dirent *de;
-   struct string_list garbage = STRING_LIST_INIT_DUP;
 
strbuf_addstr(, objdir);
strbuf_addstr(, "/pack");
@@ -759,53 +760,77 @@ static void prepare_packed_git_one(struct repository *r, 
char *objdir, int local
strbuf_addch(, '/');
dirnamelen = path.len;
while ((de = readdir(dir)) != NULL) {
-   struct packed_git *p;
-   size_t base_len;
-
if (is_dot_or_dotdot(de->d_name))
continue;
 
strbuf_setlen(, dirnamelen);
strbuf_addstr(, de->d_name);
 
-   base_len = path.len;
-   if (strip_suffix_mem(path.buf, _len, ".idx")) {
-   /* Don't reopen a pack we already have. */
-   for (p = r->objects->packed_git; p;
-p = p->next) {
-   size_t len;
-   if (strip_suffix(p->pack_name, ".pack", ) &&
-   len == base_len &&
-   !memcmp(p->pack_name, path.buf, len))
-   break;
-   }
-   if (p == NULL &&
-   /*
-* See if it really is a valid .idx file with
-* corresponding .pack file that we can map.
-*/
-   (p = add_packed_git(path.buf, path.len, local)) != 
NULL)
-   install_packed_git(r, p);
-   }
-
-   if (!report_garbage)
-   continue;
-
-   if (ends_with(de->d_name, ".idx") ||
-   ends_with(de->d_name, ".pack") ||
-   ends_with(de->d_name, ".bitmap") ||
-   ends_with(de->d_name, ".keep") ||
-   ends_with(de->d_name, ".promisor"))
-   string_list_append(, path.buf);
-   else
-   report_garbage(PACKDIR_FILE_GARBAGE, path.buf);
+   fn(path.buf, path.len, de->d_name, data);
}
+
closedir(dir);
-   report_pack_garbage();
-   string_list_clear(, 0);
strbuf_release();
 }
 
+struct prepare_pack_data {
+   struct repository *r;
+   struct string_list *garbage;
+   int local;
+};
+
+static void prepare_pack(const char *full_name, size_t full_name_len,
+const char *file_name, void *_data)
+{
+   struct prepare_pack_data *data = (struct prepare_pack_data *)_data;
+   struct packed_git *p;
+   size_t base_len = full_name_len;
+
+   if (strip_suffix_mem(full_name, _len, ".idx")) {
+   /* Don't reopen a pack we already have. */
+   for (p = data->r->objects->packed_git; p; p = p->next) {
+   size_t len;
+   if (strip_suffix(p->pack_name, ".pack", ) &&
+   len == base_len &&
+   !memcmp(p->pack_name, full_name, len))
+   break;
+   }
+
+   if (!p) {
+   p = add_packed_git(full_name, full_name_len, 
data->local);
+   if (p)
+   install_packed_git(data->r, p);
+   }
+   }
+
+   if (!report_garbage)
+   return;
+
+   if (ends_with(file_name, ".idx") ||
+   ends_with(file_name, ".pack") ||
+   ends_with(file_name, ".bitmap") ||
+   ends_with(file_name, ".keep") ||
+   ends_with(file_name, ".promisor"))
+   string_list_append(data->garbage, full_name);
+   else
+   report_garbage(PACKDIR_FILE_GARBAGE, full_name);
+}
+
+static void prepare_packed_git_one(struct repository *r, char *objdir, int 
local)
+{
+   struct prepare_pack_data data;
+   struct string_list garbage = STRING_LIST_INIT_DUP;
+
+   data.r = r;
+   data.garbage = &garbage;
+   

[PATCH v4 17/23] midx: read objects from multi-pack-index

2018-07-12 Thread Derrick Stolee
Signed-off-by: Derrick Stolee 
---
 midx.c | 91 +-
 midx.h |  3 ++
 packfile.c |  8 -
 3 files changed, 100 insertions(+), 2 deletions(-)

diff --git a/midx.c b/midx.c
index 4090cf4ca4..182535933c 100644
--- a/midx.c
+++ b/midx.c
@@ -5,7 +5,7 @@
 #include "lockfile.h"
 #include "packfile.h"
 #include "object-store.h"
-#include "packfile.h"
+#include "sha1-lookup.h"
 #include "midx.h"
 
 #define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
@@ -151,6 +151,7 @@ struct multi_pack_index *load_multi_pack_index(const char 
*object_dir)
m->num_objects = ntohl(m->chunk_oid_fanout[255]);
 
m->pack_names = xcalloc(m->num_packs, sizeof(*m->pack_names));
+   m->packs = xcalloc(m->num_packs, sizeof(*m->packs));
 
cur_pack_name = (const char *)m->chunk_pack_names;
for (i = 0; i < m->num_packs; i++) {
@@ -178,6 +179,94 @@ struct multi_pack_index *load_multi_pack_index(const char 
*object_dir)
return NULL;
 }
 
+static int prepare_midx_pack(struct multi_pack_index *m, uint32_t pack_int_id)
+{
+   struct strbuf pack_name = STRBUF_INIT;
+
+   if (pack_int_id >= m->num_packs)
+   BUG("bad pack-int-id");
+
+   if (m->packs[pack_int_id])
+   return 0;
+
+   strbuf_addf(_name, "%s/pack/%s", m->object_dir,
+   m->pack_names[pack_int_id]);
+
+   m->packs[pack_int_id] = add_packed_git(pack_name.buf, pack_name.len, 1);
+   strbuf_release(_name);
+   return !m->packs[pack_int_id];
+}
+
+int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m, 
uint32_t *result)
+{
+   return bsearch_hash(oid->hash, m->chunk_oid_fanout, m->chunk_oid_lookup,
+   MIDX_HASH_LEN, result);
+}
+
+static off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos)
+{
+   const unsigned char *offset_data;
+   uint32_t offset32;
+
+   offset_data = m->chunk_object_offsets + pos * MIDX_CHUNK_OFFSET_WIDTH;
+   offset32 = get_be32(offset_data + sizeof(uint32_t));
+
+   if (m->chunk_large_offsets && offset32 & MIDX_LARGE_OFFSET_NEEDED) {
+   if (sizeof(offset32) < sizeof(uint64_t))
+   die(_("multi-pack-index stores a 64-bit offset, but 
off_t is too small"));
+
+   offset32 ^= MIDX_LARGE_OFFSET_NEEDED;
+   return get_be64(m->chunk_large_offsets + sizeof(uint64_t) * 
offset32);
+   }
+
+   return offset32;
+}
+
+static uint32_t nth_midxed_pack_int_id(struct multi_pack_index *m, uint32_t 
pos)
+{
+   return get_be32(m->chunk_object_offsets + pos * 
MIDX_CHUNK_OFFSET_WIDTH);
+}
+
+static int nth_midxed_pack_entry(struct multi_pack_index *m, struct pack_entry 
*e, uint32_t pos)
+{
+   uint32_t pack_int_id;
+   struct packed_git *p;
+
+   if (pos >= m->num_objects)
+   return 0;
+
+   pack_int_id = nth_midxed_pack_int_id(m, pos);
+
+   if (prepare_midx_pack(m, pack_int_id))
+   die(_("error preparing packfile from multi-pack-index"));
+   p = m->packs[pack_int_id];
+
+   /*
+   * We are about to tell the caller where they can locate the
+   * requested object.  We better make sure the packfile is
+   * still here and can be accessed before supplying that
+   * answer, as it may have been deleted since the MIDX was
+   * loaded!
+   */
+   if (!is_pack_valid(p))
+   return 0;
+
+   e->offset = nth_midxed_offset(m, pos);
+   e->p = p;
+
+   return 1;
+}
+
+int fill_midx_entry(const struct object_id *oid, struct pack_entry *e, struct 
multi_pack_index *m)
+{
+   uint32_t pos;
+
+   if (!bsearch_midx(oid, m, &pos))
+   return 0;
+
+   return nth_midxed_pack_entry(m, e, pos);
+}
+
 int prepare_multi_pack_index_one(struct repository *r, const char *object_dir)
 {
struct multi_pack_index *m = r->objects->multi_pack_index;
diff --git a/midx.h b/midx.h
index 9bcfc82d2e..377838c9ca 100644
--- a/midx.h
+++ b/midx.h
@@ -25,10 +25,13 @@ struct multi_pack_index {
const unsigned char *chunk_large_offsets;
 
const char **pack_names;
+   struct packed_git **packs;
char object_dir[FLEX_ARRAY];
 };
 
 struct multi_pack_index *load_multi_pack_index(const char *object_dir);
+int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m, 
uint32_t *result);
+int fill_midx_entry(const struct object_id *oid, struct pack_entry *e, struct 
multi_pack_index *m);
 int prepare_multi_pack_index_one(struct repository *r, const char *object_dir);
 
 int write_midx_file(const char *object_dir);
diff --git a/packfile.c b/packfile.c
index 5d4493dbf4..bc763d91b9 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1902,11 +1902,17 @@ static int fill_pack_entry(const struct object_id *oid,
 int find_pack_entry(struct repository *r, const struct object_id *oid, struct 
pack_entry *e)
 {
struct list_head *pos;
+   struct 

[PATCH v4 12/23] midx: sort and deduplicate objects from packfiles

2018-07-12 Thread Derrick Stolee
Before writing a list of objects and their offsets to a multi-pack-index,
we need to collect the list of objects contained in the packfiles. There
may be multiple copies of some objects, so this list must be deduplicated.

It is possible to artificially get into a state where there are many
duplicate copies of objects. That can create high memory pressure if we
are to create a list of all objects before de-duplication. To reduce
this memory pressure without a significant performance drop,
automatically group objects by the first byte of their object id. Use
the IDX fanout tables to group the data, copy to a local array, then
sort.

Copy only the de-duplicated entries. Select the duplicate based on the
most-recent modified time of a packfile containing the object.
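
The selection rule boils down to "sort by OID, newest pack mtime first,
then keep the first copy of each OID". A self-contained toy program
(plain C, not git code) showing just that rule:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct entry { const char *oid; long mtime; int pack; };

static int cmp(const void *pa, const void *pb)
{
	const struct entry *a = pa, *b = pb;
	int c = strcmp(a->oid, b->oid);

	if (c)
		return c;
	if (a->mtime > b->mtime)	/* newer pack sorts first */
		return -1;
	if (a->mtime < b->mtime)
		return 1;
	return a->pack - b->pack;
}

int main(void)
{
	struct entry e[] = {
		{ "beef", 100, 0 }, { "beef", 200, 1 }, { "cafe", 150, 2 },
	};
	int n = 3, i, kept = 0;

	qsort(e, n, sizeof(*e), cmp);
	for (i = 0; i < n; i++) {
		if (i && !strcmp(e[i - 1].oid, e[i].oid))
			continue;	/* duplicate; newer copy already kept */
		e[kept++] = e[i];
	}
	for (i = 0; i < kept; i++)
		printf("%s from pack %d\n", e[i].oid, e[i].pack);
	return 0;
}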

Signed-off-by: Derrick Stolee 
---
 midx.c | 128 +
 packfile.c |  17 +++
 packfile.h |   2 +
 3 files changed, 147 insertions(+)

diff --git a/midx.c b/midx.c
index fcdf6553ce..29f8de5ee6 100644
--- a/midx.c
+++ b/midx.c
@@ -4,6 +4,7 @@
 #include "lockfile.h"
 #include "packfile.h"
 #include "object-store.h"
+#include "packfile.h"
 #include "midx.h"
 
 #define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
@@ -182,12 +183,21 @@ static void add_pack_to_midx(const char *full_path, 
size_t full_path_len,
packs->list[packs->nr] = add_packed_git(full_path,
full_path_len,
0);
+
if (!packs->list[packs->nr]) {
warning(_("failed to add packfile '%s'"),
full_path);
return;
}
 
+   if (open_pack_index(packs->list[packs->nr])) {
+   warning(_("failed to open pack-index '%s'"),
+   full_path);
+   close_pack(packs->list[packs->nr]);
+   FREE_AND_NULL(packs->list[packs->nr]);
+   return;
+   }
+
packs->names[packs->nr] = xstrdup(file_name);
packs->pack_name_concat_len += strlen(file_name) + 1;
packs->nr++;
@@ -228,6 +238,119 @@ static void sort_packs_by_name(char **pack_names, 
uint32_t nr_packs, uint32_t *p
free(pairs);
 }
 
+struct pack_midx_entry {
+   struct object_id oid;
+   uint32_t pack_int_id;
+   time_t pack_mtime;
+   uint64_t offset;
+};
+
+static int midx_oid_compare(const void *_a, const void *_b)
+{
+   const struct pack_midx_entry *a = (const struct pack_midx_entry *)_a;
+   const struct pack_midx_entry *b = (const struct pack_midx_entry *)_b;
+   int cmp = oidcmp(&a->oid, &b->oid);
+
+   if (cmp)
+   return cmp;
+
+   if (a->pack_mtime > b->pack_mtime)
+   return -1;
+   else if (a->pack_mtime < b->pack_mtime)
+   return 1;
+
+   return a->pack_int_id - b->pack_int_id;
+}
+
+static void fill_pack_entry(uint32_t pack_int_id,
+   struct packed_git *p,
+   uint32_t cur_object,
+   struct pack_midx_entry *entry)
+{
+   if (!nth_packed_object_oid(&entry->oid, p, cur_object))
+   die(_("failed to locate object %d in packfile"), cur_object);
+
+   entry->pack_int_id = pack_int_id;
+   entry->pack_mtime = p->mtime;
+
+   entry->offset = nth_packed_object_offset(p, cur_object);
+}
+
+/*
+ * It is possible to artificially get into a state where there are many
+ * duplicate copies of objects. That can create high memory pressure if
+ * we are to create a list of all objects before de-duplication. To reduce
+ * this memory pressure without a significant performance drop, automatically
+ * group objects by the first byte of their object id. Use the IDX fanout
+ * tables to group the data, copy to a local array, then sort.
+ *
+ * Copy only the de-duplicated entries (selected by most-recent modified time
+ * of a packfile containing the object).
+ */
+static struct pack_midx_entry *get_sorted_entries(struct packed_git **p,
+ uint32_t *perm,
+ uint32_t nr_packs,
+ uint32_t *nr_objects)
+{
+   uint32_t cur_fanout, cur_pack, cur_object;
+   uint32_t alloc_fanout, alloc_objects, total_objects = 0;
+   struct pack_midx_entry *entries_by_fanout = NULL;
+   struct pack_midx_entry *deduplicated_entries = NULL;
+
+   for (cur_pack = 0; cur_pack < nr_packs; cur_pack++)
+   total_objects += p[cur_pack]->num_objects;
+
+   /*
+* As we de-duplicate by fanout value, we expect the fanout
+* slices to be evenly distributed, with some noise. Hence,
+* allocate slightly more than one 256th.
+*/
+   alloc_objects = alloc_fanout 

[PATCH v4 20/23] midx: use midx in approximate_object_count

2018-07-12 Thread Derrick Stolee
Signed-off-by: Derrick Stolee 
---
 packfile.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/packfile.c b/packfile.c
index c0eb5ac885..97e7812b6b 100644
--- a/packfile.c
+++ b/packfile.c
@@ -861,10 +861,13 @@ unsigned long approximate_object_count(void)
 {
if (!the_repository->objects->approximate_object_count_valid) {
unsigned long count;
+   struct multi_pack_index *m;
struct packed_git *p;
 
prepare_packed_git(the_repository);
count = 0;
+   for (m = get_multi_pack_index(the_repository); m; m = m->next)
+   count += m->num_objects;
for (p = the_repository->objects->packed_git; p; p = p->next) {
if (open_pack_index(p))
continue;
-- 
2.18.0.118.gd4f65b8d14



[PATCH v4 15/23] midx: write object offsets

2018-07-12 Thread Derrick Stolee
The final pair of chunks for the multi-pack-index file stores the object
offsets. We default to using 32-bit offsets as in the pack-index version
1 format, but if there exists an offset larger than 32-bits, we use a
trick similar to the pack-index version 2 format by storing all offsets
at least 2^31 in a 64-bit table; we use the 32-bit table to point into
that 64-bit table as necessary.

We only store these 64-bit offsets if necessary, so create a test that
manipulates a version 2 pack-index to fake a large offset. This allows
us to test that the large offset table is created, but the data does not
match the actual packfile offsets. The multi-pack-index offset does match
the (corrupted) pack-index offset, so a future feature will compare these
offsets during a 'verify' step.
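
For clarity, a reader interprets the 32-bit offset field roughly like
this (a simplified sketch; the real code reads big-endian values with
get_be32()/get_be64() instead of indexing a host-order array):

#include <stdint.h>

static uint64_t decode_midx_offset(uint32_t offset32,
				   const uint64_t *large_offsets)
{
	/* high bit set: low 31 bits index the 8-byte large-offset table */
	if (large_offsets && (offset32 & 0x80000000))
		return large_offsets[offset32 & 0x7fffffff];

	/* otherwise the 32-bit value is the pack offset itself */
	return offset32;
}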

Signed-off-by: Derrick Stolee 
---
 Documentation/technical/pack-format.txt |  15 +++-
 midx.c  | 100 +++-
 midx.h  |   2 +
 t/helper/test-read-midx.c   |   4 +
 t/t5319-multi-pack-index.sh |  49 +---
 5 files changed, 155 insertions(+), 15 deletions(-)

diff --git a/Documentation/technical/pack-format.txt 
b/Documentation/technical/pack-format.txt
index 3215f7bfcd..cab5bdd2ff 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -311,7 +311,20 @@ CHUNK DATA:
The OIDs for all objects in the MIDX are stored in lexicographic
order in this chunk.
 
-   (This section intentionally left incomplete.)
+   Object Offsets (ID: {'O', 'O', 'F', 'F'})
+   Stores two 4-byte values for every object.
+   1: The pack-int-id for the pack storing this object.
+   2: The offset within the pack.
+   If all offsets are less than 2^31, then the large offset chunk
+   will not exist and offsets are stored as in IDX v1.
+   If there is at least one offset value larger than 2^32-1, then
+   the large offset chunk must exist. If the large offset chunk
+   exists and the 31st bit is on, then removing that bit reveals
+   the row in the large offsets containing the 8-byte offset of
+   this object.
+
+   [Optional] Object Large Offsets (ID: {'L', 'O', 'F', 'F'})
+   8-byte offsets into large packfiles.
 
 TRAILER:
 
diff --git a/midx.c b/midx.c
index 7a954eb0cd..e83110ae92 100644
--- a/midx.c
+++ b/midx.c
@@ -18,13 +18,18 @@
 #define MIDX_HASH_LEN 20
 #define MIDX_MIN_SIZE (MIDX_HEADER_SIZE + MIDX_HASH_LEN)
 
-#define MIDX_MAX_CHUNKS 3
+#define MIDX_MAX_CHUNKS 5
 #define MIDX_CHUNK_ALIGNMENT 4
 #define MIDX_CHUNKID_PACKNAMES 0x504e414d /* "PNAM" */
 #define MIDX_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */
 #define MIDX_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */
+#define MIDX_CHUNKID_OBJECTOFFSETS 0x4f4f4646 /* "OOFF" */
+#define MIDX_CHUNKID_LARGEOFFSETS 0x4c4f4646 /* "LOFF" */
 #define MIDX_CHUNKLOOKUP_WIDTH (sizeof(uint32_t) + sizeof(uint64_t))
 #define MIDX_CHUNK_FANOUT_SIZE (sizeof(uint32_t) * 256)
+#define MIDX_CHUNK_OFFSET_WIDTH (2 * sizeof(uint32_t))
+#define MIDX_CHUNK_LARGE_OFFSET_WIDTH (sizeof(uint64_t))
+#define MIDX_LARGE_OFFSET_NEEDED 0x8000
 
 static char *get_midx_filename(const char *object_dir)
 {
@@ -112,6 +117,14 @@ struct multi_pack_index *load_multi_pack_index(const char 
*object_dir)
m->chunk_oid_lookup = m->data + chunk_offset;
break;
 
+   case MIDX_CHUNKID_OBJECTOFFSETS:
+   m->chunk_object_offsets = m->data + 
chunk_offset;
+   break;
+
+   case MIDX_CHUNKID_LARGEOFFSETS:
+   m->chunk_large_offsets = m->data + chunk_offset;
+   break;
+
case 0:
die(_("terminating multi-pack-index chunk id 
appears earlier than expected"));
break;
@@ -131,6 +144,8 @@ struct multi_pack_index *load_multi_pack_index(const char 
*object_dir)
die(_("multi-pack-index missing required OID fanout chunk"));
if (!m->chunk_oid_lookup)
die(_("multi-pack-index missing required OID lookup chunk"));
+   if (!m->chunk_object_offsets)
+   die(_("multi-pack-index missing required object offsets 
chunk"));
 
m->num_objects = ntohl(m->chunk_oid_fanout[255]);
 
@@ -454,6 +469,56 @@ static size_t write_midx_oid_lookup(struct hashfile *f, 
unsigned char hash_len,
return written;
 }
 
+static size_t write_midx_object_offsets(struct hashfile *f, int 
large_offset_needed,
+   struct pack_midx_entry *objects, 
uint32_t nr_objects)
+{
+   struct pack_midx_entry *list = objects;
+   uint32_t i, nr_large_offset = 0;
+   size_t written = 0;
+
+   for 

[PATCH v4 14/23] midx: write object id fanout chunk

2018-07-12 Thread Derrick Stolee
Signed-off-by: Derrick Stolee 
---
 Documentation/technical/pack-format.txt |  5 +++
 midx.c  | 53 +++--
 midx.h  |  1 +
 t/helper/test-read-midx.c   |  4 +-
 t/t5319-multi-pack-index.sh | 16 
 5 files changed, 68 insertions(+), 11 deletions(-)

diff --git a/Documentation/technical/pack-format.txt 
b/Documentation/technical/pack-format.txt
index 78ee0489c6..3215f7bfcd 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -302,6 +302,11 @@ CHUNK DATA:
name. This is the only chunk not guaranteed to be a multiple of four
bytes in length, so should be the last chunk for alignment reasons.
 
+   OID Fanout (ID: {'O', 'I', 'D', 'F'})
+   The ith entry, F[i], stores the number of OIDs with first
+   byte at most i. Thus F[255] stores the total
+   number of objects.
+
OID Lookup (ID: {'O', 'I', 'D', 'L'})
The OIDs for all objects in the MIDX are stored in lexicographic
order in this chunk.
diff --git a/midx.c b/midx.c
index 3f113e1beb..7a954eb0cd 100644
--- a/midx.c
+++ b/midx.c
@@ -18,11 +18,13 @@
 #define MIDX_HASH_LEN 20
 #define MIDX_MIN_SIZE (MIDX_HEADER_SIZE + MIDX_HASH_LEN)
 
-#define MIDX_MAX_CHUNKS 2
+#define MIDX_MAX_CHUNKS 3
 #define MIDX_CHUNK_ALIGNMENT 4
 #define MIDX_CHUNKID_PACKNAMES 0x504e414d /* "PNAM" */
+#define MIDX_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */
 #define MIDX_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */
 #define MIDX_CHUNKLOOKUP_WIDTH (sizeof(uint32_t) + sizeof(uint64_t))
+#define MIDX_CHUNK_FANOUT_SIZE (sizeof(uint32_t) * 256)
 
 static char *get_midx_filename(const char *object_dir)
 {
@@ -102,6 +104,10 @@ struct multi_pack_index *load_multi_pack_index(const char 
*object_dir)
m->chunk_pack_names = m->data + chunk_offset;
break;
 
+   case MIDX_CHUNKID_OIDFANOUT:
+   m->chunk_oid_fanout = (uint32_t *)(m->data + 
chunk_offset);
+   break;
+
case MIDX_CHUNKID_OIDLOOKUP:
m->chunk_oid_lookup = m->data + chunk_offset;
break;
@@ -121,9 +127,13 @@ struct multi_pack_index *load_multi_pack_index(const char 
*object_dir)
 
if (!m->chunk_pack_names)
die(_("multi-pack-index missing required pack-name chunk"));
+   if (!m->chunk_oid_fanout)
+   die(_("multi-pack-index missing required OID fanout chunk"));
if (!m->chunk_oid_lookup)
die(_("multi-pack-index missing required OID lookup chunk"));
 
+   m->num_objects = ntohl(m->chunk_oid_fanout[255]);
+
m->pack_names = xcalloc(m->num_packs, sizeof(*m->pack_names));
 
cur_pack_name = (const char *)m->chunk_pack_names;
@@ -389,6 +399,35 @@ static size_t write_midx_pack_names(struct hashfile *f,
return written;
 }
 
+static size_t write_midx_oid_fanout(struct hashfile *f,
+   struct pack_midx_entry *objects,
+   uint32_t nr_objects)
+{
+   struct pack_midx_entry *list = objects;
+   struct pack_midx_entry *last = objects + nr_objects;
+   uint32_t count = 0;
+   uint32_t i;
+
+   /*
+   * Write the first-level table (the list is sorted,
+   * but we use a 256-entry lookup to be able to avoid
+   * having to do eight extra binary search iterations).
+   */
+   for (i = 0; i < 256; i++) {
+   struct pack_midx_entry *next = list;
+
+   while (next < last && next->oid.hash[0] == i) {
+   count++;
+   next++;
+   }
+
+   hashwrite_be32(f, count);
+   list = next;
+   }
+
+   return MIDX_CHUNK_FANOUT_SIZE;
+}
+
 static size_t write_midx_oid_lookup(struct hashfile *f, unsigned char hash_len,
struct pack_midx_entry *objects,
uint32_t nr_objects)
@@ -461,7 +500,7 @@ int write_midx_file(const char *object_dir)
FREE_AND_NULL(midx_name);
 
cur_chunk = 0;
-   num_chunks = 2;
+   num_chunks = 3;
 
written = write_midx_header(f, num_chunks, packs.nr);
 
@@ -469,9 +508,13 @@ int write_midx_file(const char *object_dir)
chunk_offsets[cur_chunk] = written + (num_chunks + 1) * 
MIDX_CHUNKLOOKUP_WIDTH;
 
cur_chunk++;
-   chunk_ids[cur_chunk] = MIDX_CHUNKID_OIDLOOKUP;
+   chunk_ids[cur_chunk] = MIDX_CHUNKID_OIDFANOUT;
chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + 
packs.pack_name_concat_len;
 
+   cur_chunk++;
+   chunk_ids[cur_chunk] = MIDX_CHUNKID_OIDLOOKUP;
+   chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + 
MIDX_CHUNK_FANOUT_SIZE;
+
 

[PATCH v4 11/23] midx: read pack names into array

2018-07-12 Thread Derrick Stolee
Signed-off-by: Derrick Stolee 
---
 midx.c  | 17 +
 midx.h  |  1 +
 t/helper/test-read-midx.c   |  5 +
 t/t5319-multi-pack-index.sh | 17 -
 4 files changed, 35 insertions(+), 5 deletions(-)

diff --git a/midx.c b/midx.c
index ca7a32bf95..fcdf6553ce 100644
--- a/midx.c
+++ b/midx.c
@@ -37,6 +37,7 @@ struct multi_pack_index *load_multi_pack_index(const char 
*object_dir)
uint32_t hash_version;
char *midx_name = get_midx_filename(object_dir);
uint32_t i;
+   const char *cur_pack_name;
 
fd = git_open(midx_name);
 
@@ -115,6 +116,22 @@ struct multi_pack_index *load_multi_pack_index(const char 
*object_dir)
if (!m->chunk_pack_names)
die(_("multi-pack-index missing required pack-name chunk"));
 
+   m->pack_names = xcalloc(m->num_packs, sizeof(*m->pack_names));
+
+   cur_pack_name = (const char *)m->chunk_pack_names;
+   for (i = 0; i < m->num_packs; i++) {
+   m->pack_names[i] = cur_pack_name;
+
+   cur_pack_name += strlen(cur_pack_name) + 1;
+
+   if (i && strcmp(m->pack_names[i], m->pack_names[i - 1]) <= 0) {
+   error(_("multi-pack-index pack names out of order: '%s' 
before '%s'"),
+ m->pack_names[i - 1],
+ m->pack_names[i]);
+   goto cleanup_fail;
+   }
+   }
+
return m;
 
 cleanup_fail:
diff --git a/midx.h b/midx.h
index 38af01fa3b..17b56172e3 100644
--- a/midx.h
+++ b/midx.h
@@ -16,6 +16,7 @@ struct multi_pack_index {
 
const unsigned char *chunk_pack_names;
 
+   const char **pack_names;
char object_dir[FLEX_ARRAY];
 };
 
diff --git a/t/helper/test-read-midx.c b/t/helper/test-read-midx.c
index 3f2d2cfa78..76a60d7882 100644
--- a/t/helper/test-read-midx.c
+++ b/t/helper/test-read-midx.c
@@ -6,6 +6,7 @@
 
 static int read_midx_file(const char *object_dir)
 {
+   uint32_t i;
struct multi_pack_index *m = load_multi_pack_index(object_dir);
 
if (!m)
@@ -24,6 +25,10 @@ static int read_midx_file(const char *object_dir)
 
printf("\n");
 
+   printf("packs:\n");
+   for (i = 0; i < m->num_packs; i++)
+   printf("%s\n", m->pack_names[i]);
+
printf("object-dir: %s\n", m->object_dir);
 
return 0;
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 7512d55c92..e8da082c64 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -5,11 +5,18 @@ test_description='multi-pack-indexes'
 
 midx_read_expect () {
NUM_PACKS=$1
-   cat >expect <<-EOF
-   header: 4d494458 1 1 $NUM_PACKS
-   chunks: pack-names
-   object-dir: .
-   EOF
+   {
+   cat <<-EOF &&
+   header: 4d494458 1 1 $NUM_PACKS
+   chunks: pack-names
+   packs:
+   EOF
+   if test $NUM_PACKS -ge 1
+   then
+   ls pack/ | grep idx | sort
+   fi &&
+   printf "object-dir: .\n"
+   } >expect &&
test-tool read-midx . >actual &&
test_cmp expect actual
 }
-- 
2.18.0.118.gd4f65b8d14



[PATCH v4 10/23] multi-pack-index: write pack names in chunk

2018-07-12 Thread Derrick Stolee
The multi-pack-index needs to track which packfiles it indexes. Store
these in our first required chunk. Since filenames are not well
structured, add padding to keep good alignment in later chunks.

Modify the 'git multi-pack-index read' subcommand to output the
existence of the pack-file name chunk. Modify t5319-multi-pack-index.sh
to reflect this new output and the new expected number of chunks.

Defense in depth: A pattern we are using in the multi-pack-index feature
is to verify the data as we write it. We want to ensure we never write
invalid data to the multi-pack-index. There are many checks that verify
that the values we are writing fit the format definitions. This mainly
helps developers while working on the feature, but it can also identify
issues that only appear when dealing with very large data sets. These
large sets are hard to encode into test cases.
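
The padding itself is a small detail; here is a minimal sketch of the
idea (the helper name is invented for illustration, while
MIDX_CHUNK_ALIGNMENT is the constant this patch introduces):

#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MIDX_CHUNK_ALIGNMENT 4

/*
 * Sum the NUL-terminated pack names and round the total up so that
 * whatever chunk follows the pack-name chunk starts on a 4-byte
 * boundary.
 */
static size_t padded_pack_name_len(const char **names, uint32_t nr)
{
	size_t len = 0;
	uint32_t i;

	for (i = 0; i < nr; i++)
		len += strlen(names[i]) + 1;

	if (len % MIDX_CHUNK_ALIGNMENT)
		len += MIDX_CHUNK_ALIGNMENT - (len % MIDX_CHUNK_ALIGNMENT);

	return len;
}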

Signed-off-by: Derrick Stolee 
---
 Documentation/technical/pack-format.txt |   6 +
 midx.c  | 174 +++-
 midx.h  |   2 +
 t/helper/test-read-midx.c   |   7 +
 t/t5319-multi-pack-index.sh |   3 +-
 5 files changed, 189 insertions(+), 3 deletions(-)

diff --git a/Documentation/technical/pack-format.txt 
b/Documentation/technical/pack-format.txt
index e060e693f4..6c5a77475f 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -296,6 +296,12 @@ CHUNK LOOKUP:
 
 CHUNK DATA:
 
+   Packfile Names (ID: {'P', 'N', 'A', 'M'})
+   Stores the packfile names as concatenated, null-terminated strings.
+   Packfiles must be listed in lexicographic order for fast lookups by
+   name. This is the only chunk not guaranteed to be a multiple of four
+   bytes in length, so should be the last chunk for alignment reasons.
+
(This section intentionally left incomplete.)
 
 TRAILER:
diff --git a/midx.c b/midx.c
index f742d7ccd7..ca7a32bf95 100644
--- a/midx.c
+++ b/midx.c
@@ -17,6 +17,11 @@
 #define MIDX_HASH_LEN 20
 #define MIDX_MIN_SIZE (MIDX_HEADER_SIZE + MIDX_HASH_LEN)
 
+#define MIDX_MAX_CHUNKS 1
+#define MIDX_CHUNK_ALIGNMENT 4
+#define MIDX_CHUNKID_PACKNAMES 0x504e414d /* "PNAM" */
+#define MIDX_CHUNKLOOKUP_WIDTH (sizeof(uint32_t) + sizeof(uint64_t))
+
 static char *get_midx_filename(const char *object_dir)
 {
return xstrfmt("%s/pack/multi-pack-index", object_dir);
@@ -31,6 +36,7 @@ struct multi_pack_index *load_multi_pack_index(const char 
*object_dir)
void *midx_map = NULL;
uint32_t hash_version;
char *midx_name = get_midx_filename(object_dir);
+   uint32_t i;
 
fd = git_open(midx_name);
 
@@ -82,6 +88,33 @@ struct multi_pack_index *load_multi_pack_index(const char 
*object_dir)
 
m->num_packs = get_be32(m->data + MIDX_BYTE_NUM_PACKS);
 
+   for (i = 0; i < m->num_chunks; i++) {
+   uint32_t chunk_id = get_be32(m->data + MIDX_HEADER_SIZE +
+MIDX_CHUNKLOOKUP_WIDTH * i);
+   uint64_t chunk_offset = get_be64(m->data + MIDX_HEADER_SIZE + 4 
+
+MIDX_CHUNKLOOKUP_WIDTH * i);
+
+   switch (chunk_id) {
+   case MIDX_CHUNKID_PACKNAMES:
+   m->chunk_pack_names = m->data + chunk_offset;
+   break;
+
+   case 0:
+   die(_("terminating multi-pack-index chunk id 
appears earlier than expected"));
+   break;
+
+   default:
+   /*
+* Do nothing on unrecognized chunks, allowing 
future
+* extensions to add optional chunks.
+*/
+   break;
+   }
+   }
+
+   if (!m->chunk_pack_names)
+   die(_("multi-pack-index missing required pack-name chunk"));
+
return m;
 
 cleanup_fail:
@@ -113,8 +146,11 @@ static size_t write_midx_header(struct hashfile *f,
 
 struct pack_list {
struct packed_git **list;
+   char **names;
uint32_t nr;
uint32_t alloc_list;
+   uint32_t alloc_names;
+   size_t pack_name_concat_len;
 };
 
 static void add_pack_to_midx(const char *full_path, size_t full_path_len,
@@ -124,6 +160,7 @@ static void add_pack_to_midx(const char *full_path, size_t 
full_path_len,
 
if (ends_with(file_name, ".idx")) {
ALLOC_GROW(packs->list, packs->nr + 1, packs->alloc_list);
+   ALLOC_GROW(packs->names, packs->nr + 1, packs->alloc_names);
 
packs->list[packs->nr] = add_packed_git(full_path,
full_path_len,
@@ -134,18 +171,89 @@ static void add_pack_to_midx(const char *full_path, 
size_t full_path_len,
 

[PATCH v4 07/23] t5319: expand test data

2018-07-12 Thread Derrick Stolee
As we build the multi-pack-index file format, we want to test the format
on real repositories. Add tests that create repository data including
multiple packfiles with both version 1 and version 2 formats.

The current 'git multi-pack-index write' command will always write the
same file with no "real" data. This will be expanded in future commits,
along with the test expectations.

Signed-off-by: Derrick Stolee 
---
 t/t5319-multi-pack-index.sh | 84 +
 1 file changed, 84 insertions(+)

diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 506bd8abb8..1240127ec1 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -18,4 +18,88 @@ test_expect_success 'write midx with no packs' '
midx_read_expect
 '
 
+generate_objects () {
+   i=$1
+   iii=$(printf '%03i' $i)
+   {
+   test-tool genrandom "bar" 200 &&
+   test-tool genrandom "baz $iii" 50
+   } >wide_delta_$iii &&
+   {
+   test-tool genrandom "foo"$i 100 &&
+   test-tool genrandom "foo"$(( $i + 1 )) 100 &&
+   test-tool genrandom "foo"$(( $i + 2 )) 100
+   } >deep_delta_$iii &&
+   {
+   echo $iii &&
+   test-tool genrandom "$iii" 8192
+   } >file_$iii &&
+   git update-index --add file_$iii deep_delta_$iii wide_delta_$iii
+}
+
+commit_and_list_objects () {
+   {
+   echo 101 &&
+   test-tool genrandom 100 8192;
+   } >file_101 &&
+   git update-index --add file_101 &&
+   tree=$(git write-tree) &&
+   commit=$(git commit-tree $tree -p HEADobj-list &&
+   git reset --hard $commit
+}
+
+test_expect_success 'create objects' '
+   test_commit initial &&
+   for i in $(test_seq 1 5)
+   do
+   generate_objects $i
+   done &&
+   commit_and_list_objects
+'
+
+test_expect_success 'write midx with one v1 pack' '
+   pack=$(git pack-objects --index-version=1 pack/test 

[PATCH v4 05/23] midx: write header information to lockfile

2018-07-12 Thread Derrick Stolee
As we begin writing the multi-pack-index format to disk, start with
the basics: the 12-byte header and the 20-byte checksum footer. Start
with these basics so we can add the rest of the format in small
increments.

As we implement the format, we will use a technique to check that our
computed offsets within the multi-pack-index file match what we are
actually writing. Each method that writes to the hashfile will return
the number of bytes written, and we will track that those values match
our expectations.

Currently, write_midx_header() returns 12, but is not checked. We will
check the return value in a later commit.
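
Boiled down to a standalone sketch, the bookkeeping pattern looks like
this (the writer functions and file name are hypothetical; only the
idea of summing returned byte counts against expected offsets is taken
from this series):

#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* hypothetical section writers; each returns the bytes it wrote */
static size_t write_header(FILE *f)
{
	return fwrite("MIDXxxxxxxxx", 1, 12, f);	/* 12-byte header */
}

static size_t write_trailer(FILE *f)
{
	return fwrite("01234567890123456789", 1, 20, f); /* 20-byte checksum */
}

int main(void)
{
	FILE *f = fopen("midx.tmp", "wb");
	size_t written = 0;

	if (!f)
		return 1;

	written += write_header(f);
	assert(written == 12);		/* header offset check */

	written += write_trailer(f);
	assert(written == 12 + 20);	/* header + checksum footer */

	fclose(f);
	return 0;
}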

Signed-off-by: Derrick Stolee 
---
 midx.c  | 50 +
 t/t5319-multi-pack-index.sh |  4 ++-
 2 files changed, 53 insertions(+), 1 deletion(-)

diff --git a/midx.c b/midx.c
index 32468db1a2..f85f2d334d 100644
--- a/midx.c
+++ b/midx.c
@@ -1,7 +1,57 @@
 #include "cache.h"
+#include "csum-file.h"
+#include "lockfile.h"
 #include "midx.h"
 
+#define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
+#define MIDX_VERSION 1
+#define MIDX_HASH_VERSION 1
+#define MIDX_HEADER_SIZE 12
+
+static char *get_midx_filename(const char *object_dir)
+{
+   return xstrfmt("%s/pack/multi-pack-index", object_dir);
+}
+
+static size_t write_midx_header(struct hashfile *f,
+   unsigned char num_chunks,
+   uint32_t num_packs)
+{
+   unsigned char byte_values[4];
+
+   hashwrite_be32(f, MIDX_SIGNATURE);
+   byte_values[0] = MIDX_VERSION;
+   byte_values[1] = MIDX_HASH_VERSION;
+   byte_values[2] = num_chunks;
+   byte_values[3] = 0; /* unused */
+   hashwrite(f, byte_values, sizeof(byte_values));
+   hashwrite_be32(f, num_packs);
+
+   return MIDX_HEADER_SIZE;
+}
+
 int write_midx_file(const char *object_dir)
 {
+   unsigned char num_chunks = 0;
+   char *midx_name;
+   struct hashfile *f = NULL;
+   struct lock_file lk;
+
+   midx_name = get_midx_filename(object_dir);
+   if (safe_create_leading_directories(midx_name)) {
+   UNLEAK(midx_name);
+   die_errno(_("unable to create leading directories of %s"),
+ midx_name);
+   }
+
+   hold_lock_file_for_update(&lk, midx_name, LOCK_DIE_ON_ERROR);
+   f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf);
+   FREE_AND_NULL(midx_name);
+
+   write_midx_header(f, num_chunks, 0);
+
+   finalize_hashfile(f, NULL, CSUM_FSYNC | CSUM_HASH_IN_STREAM);
+   commit_lock_file(&lk);
+
return 0;
 }
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index ec3ddbe79c..50e80f8f2c 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -4,7 +4,9 @@ test_description='multi-pack-indexes'
 . ./test-lib.sh
 
 test_expect_success 'write midx with no packs' '
-   git multi-pack-index --object-dir=. write
+   test_when_finished rm -f pack/multi-pack-index &&
+   git multi-pack-index --object-dir=. write &&
+   test_path_is_file pack/multi-pack-index
 '
 
 test_done
-- 
2.18.0.118.gd4f65b8d14



[PATCH v4 04/23] multi-pack-index: add 'write' verb

2018-07-12 Thread Derrick Stolee
In anticipation of writing multi-pack-indexes, add a skeleton
'git multi-pack-index write' subcommand and send the options to a
write_midx_file() method. Also create a skeleton test script that
tests the 'write' subcommand.

Signed-off-by: Derrick Stolee 
---
 Documentation/git-multi-pack-index.txt | 22 +-
 Makefile   |  1 +
 builtin/multi-pack-index.c | 17 +++--
 midx.c |  7 +++
 midx.h |  6 ++
 t/t5319-multi-pack-index.sh| 10 ++
 6 files changed, 60 insertions(+), 3 deletions(-)
 create mode 100644 midx.c
 create mode 100644 midx.h
 create mode 100755 t/t5319-multi-pack-index.sh

diff --git a/Documentation/git-multi-pack-index.txt 
b/Documentation/git-multi-pack-index.txt
index ec9982cbfc..a62af1caca 100644
--- a/Documentation/git-multi-pack-index.txt
+++ b/Documentation/git-multi-pack-index.txt
@@ -9,7 +9,7 @@ git-multi-pack-index - Write and verify multi-pack-indexes
 SYNOPSIS
 
 [verse]
-'git multi-pack-index' [--object-dir=]
+'git multi-pack-index' [--object-dir=] 
 
 DESCRIPTION
 ---
@@ -23,6 +23,26 @@ OPTIONS
`/packs/multi-pack-index` for the current MIDX file, and
`/packs` for the pack-files to index.
 
+write::
+   When given as the verb, write a new MIDX file to
+   `/packs/multi-pack-index`.
+
+
+EXAMPLES
+
+
+* Write a MIDX file for the packfiles in the current .git folder.
++
+---
+$ git multi-pack-index write
+---
+
+* Write a MIDX file for the packfiles in an alternate object store.
++
+---
+$ git multi-pack-index --object-dir  write
+---
+
 
 SEE ALSO
 
diff --git a/Makefile b/Makefile
index 54610875ec..f5636c711d 100644
--- a/Makefile
+++ b/Makefile
@@ -890,6 +890,7 @@ LIB_OBJS += merge.o
 LIB_OBJS += merge-blobs.o
 LIB_OBJS += merge-recursive.o
 LIB_OBJS += mergesort.o
+LIB_OBJS += midx.o
 LIB_OBJS += name-hash.o
 LIB_OBJS += notes.o
 LIB_OBJS += notes-cache.o
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 3161ddae86..6a7aa00cf2 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -2,9 +2,10 @@
 #include "cache.h"
 #include "config.h"
 #include "parse-options.h"
+#include "midx.h"
 
 static char const * const builtin_multi_pack_index_usage[] = {
-   N_("git multi-pack-index [--object-dir=]"),
+   N_("git multi-pack-index [--object-dir=] write"),
NULL
 };
 
@@ -30,5 +31,17 @@ int cmd_multi_pack_index(int argc, const char **argv,
if (!opts.object_dir)
opts.object_dir = get_object_directory();
 
-   return 0;
+   if (argc == 0)
+   goto usage;
+
+   if (!strcmp(argv[0], "write")) {
+   if (argc > 1)
+   goto usage;
+
+   return write_midx_file(opts.object_dir);
+   }
+
+usage:
+   usage_with_options(builtin_multi_pack_index_usage,
+  builtin_multi_pack_index_options);
 }
diff --git a/midx.c b/midx.c
new file mode 100644
index 00..32468db1a2
--- /dev/null
+++ b/midx.c
@@ -0,0 +1,7 @@
+#include "cache.h"
+#include "midx.h"
+
+int write_midx_file(const char *object_dir)
+{
+   return 0;
+}
diff --git a/midx.h b/midx.h
new file mode 100644
index 00..dbdbe9f873
--- /dev/null
+++ b/midx.h
@@ -0,0 +1,6 @@
+#ifndef __MIDX_H__
+#define __MIDX_H__
+
+int write_midx_file(const char *object_dir);
+
+#endif
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
new file mode 100755
index 00..ec3ddbe79c
--- /dev/null
+++ b/t/t5319-multi-pack-index.sh
@@ -0,0 +1,10 @@
+#!/bin/sh
+
+test_description='multi-pack-indexes'
+. ./test-lib.sh
+
+test_expect_success 'write midx with no packs' '
+   git multi-pack-index --object-dir=. write
+'
+
+test_done
-- 
2.18.0.118.gd4f65b8d14



[PATCH v4 06/23] multi-pack-index: load into memory

2018-07-12 Thread Derrick Stolee
Create a new multi_pack_index struct for loading multi-pack-indexes into
memory. Create a test-tool builtin for reading basic information about
that multi-pack-index to verify the correct data is written.
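
The intended caller pattern is roughly the following (a sketch meant to
be compiled inside the git tree; it uses the load_multi_pack_index()
API and struct fields added below, with error handling reduced to a
single return):

#include "cache.h"
#include "midx.h"

/* print a couple of header fields, or report that no MIDX was loaded */
static int report_midx(const char *object_dir)
{
	struct multi_pack_index *m = load_multi_pack_index(object_dir);

	if (!m)
		return 1;	/* missing, too small, or failed validation */

	printf("version %u, %u packs\n",
	       (unsigned)m->version, (unsigned)m->num_packs);
	return 0;
}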

Signed-off-by: Derrick Stolee 
---
 Makefile|  1 +
 midx.c  | 79 +
 midx.h  | 18 +
 object-store.h  |  2 +
 t/helper/test-read-midx.c   | 31 +++
 t/helper/test-tool.c|  1 +
 t/helper/test-tool.h|  1 +
 t/t5319-multi-pack-index.sh | 11 +-
 8 files changed, 143 insertions(+), 1 deletion(-)
 create mode 100644 t/helper/test-read-midx.c

diff --git a/Makefile b/Makefile
index f5636c711d..0b801d1b16 100644
--- a/Makefile
+++ b/Makefile
@@ -717,6 +717,7 @@ TEST_BUILTINS_OBJS += test-online-cpus.o
 TEST_BUILTINS_OBJS += test-path-utils.o
 TEST_BUILTINS_OBJS += test-prio-queue.o
 TEST_BUILTINS_OBJS += test-read-cache.o
+TEST_BUILTINS_OBJS += test-read-midx.o
 TEST_BUILTINS_OBJS += test-ref-store.o
 TEST_BUILTINS_OBJS += test-regex.o
 TEST_BUILTINS_OBJS += test-revision-walking.o
diff --git a/midx.c b/midx.c
index f85f2d334d..c1ff5acf85 100644
--- a/midx.c
+++ b/midx.c
@@ -1,18 +1,97 @@
 #include "cache.h"
 #include "csum-file.h"
 #include "lockfile.h"
+#include "object-store.h"
 #include "midx.h"
 
 #define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
 #define MIDX_VERSION 1
+#define MIDX_BYTE_FILE_VERSION 4
+#define MIDX_BYTE_HASH_VERSION 5
+#define MIDX_BYTE_NUM_CHUNKS 6
+#define MIDX_BYTE_NUM_PACKS 8
 #define MIDX_HASH_VERSION 1
 #define MIDX_HEADER_SIZE 12
+#define MIDX_HASH_LEN 20
+#define MIDX_MIN_SIZE (MIDX_HEADER_SIZE + MIDX_HASH_LEN)
 
 static char *get_midx_filename(const char *object_dir)
 {
return xstrfmt("%s/pack/multi-pack-index", object_dir);
 }
 
+struct multi_pack_index *load_multi_pack_index(const char *object_dir)
+{
+   struct multi_pack_index *m = NULL;
+   int fd;
+   struct stat st;
+   size_t midx_size;
+   void *midx_map = NULL;
+   uint32_t hash_version;
+   char *midx_name = get_midx_filename(object_dir);
+
+   fd = git_open(midx_name);
+
+   if (fd < 0)
+   goto cleanup_fail;
+   if (fstat(fd, &st)) {
+   error_errno(_("failed to read %s"), midx_name);
+   goto cleanup_fail;
+   }
+
+   midx_size = xsize_t(st.st_size);
+
+   if (midx_size < MIDX_MIN_SIZE) {
+   error(_("multi-pack-index file %s is too small"), midx_name);
+   goto cleanup_fail;
+   }
+
+   FREE_AND_NULL(midx_name);
+
+   midx_map = xmmap(NULL, midx_size, PROT_READ, MAP_PRIVATE, fd, 0);
+
+   FLEX_ALLOC_MEM(m, object_dir, object_dir, strlen(object_dir));
+   m->fd = fd;
+   m->data = midx_map;
+   m->data_len = midx_size;
+
+   m->signature = get_be32(m->data);
+   if (m->signature != MIDX_SIGNATURE) {
+   error(_("multi-pack-index signature 0x%08x does not match 
signature 0x%08x"),
+ m->signature, MIDX_SIGNATURE);
+   goto cleanup_fail;
+   }
+
+   m->version = m->data[MIDX_BYTE_FILE_VERSION];
+   if (m->version != MIDX_VERSION) {
+   error(_("multi-pack-index version %d not recognized"),
+ m->version);
+   goto cleanup_fail;
+   }
+
+   hash_version = m->data[MIDX_BYTE_HASH_VERSION];
+   if (hash_version != MIDX_HASH_VERSION) {
+   error(_("hash version %u does not match"), hash_version);
+   goto cleanup_fail;
+   }
+   m->hash_len = MIDX_HASH_LEN;
+
+   m->num_chunks = m->data[MIDX_BYTE_NUM_CHUNKS];
+
+   m->num_packs = get_be32(m->data + MIDX_BYTE_NUM_PACKS);
+
+   return m;
+
+cleanup_fail:
+   free(m);
+   free(midx_name);
+   if (midx_map)
+   munmap(midx_map, midx_size);
+   if (0 <= fd)
+   close(fd);
+   return NULL;
+}
+
 static size_t write_midx_header(struct hashfile *f,
unsigned char num_chunks,
uint32_t num_packs)
diff --git a/midx.h b/midx.h
index dbdbe9f873..0e05051bca 100644
--- a/midx.h
+++ b/midx.h
@@ -1,6 +1,24 @@
 #ifndef __MIDX_H__
 #define __MIDX_H__
 
+struct multi_pack_index {
+   int fd;
+
+   const unsigned char *data;
+   size_t data_len;
+
+   uint32_t signature;
+   unsigned char version;
+   unsigned char hash_len;
+   unsigned char num_chunks;
+   uint32_t num_packs;
+   uint32_t num_objects;
+
+   char object_dir[FLEX_ARRAY];
+};
+
+struct multi_pack_index *load_multi_pack_index(const char *object_dir);
+
 int write_midx_file(const char *object_dir);
 
 #endif
diff --git a/object-store.h b/object-store.h
index d683112fd7..13a766aea8 100644
--- a/object-store.h
+++ b/object-store.h
@@ -84,6 +84,8 @@ struct packed_git {
char pack_name[FLEX_ARRAY]; /* more */
 };
 
+struct multi_pack_index;

[PATCH v4 03/23] multi-pack-index: add builtin

2018-07-12 Thread Derrick Stolee
This new 'git multi-pack-index' builtin will be the plumbing access
for writing, reading, and checking multi-pack-index files. The
initial implementation is a no-op.

Signed-off-by: Derrick Stolee 
---
 .gitignore |  3 ++-
 Documentation/git-multi-pack-index.txt | 36 ++
 Makefile   |  1 +
 builtin.h  |  1 +
 builtin/multi-pack-index.c | 34 
 command-list.txt   |  1 +
 git.c  |  1 +
 7 files changed, 76 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/git-multi-pack-index.txt
 create mode 100644 builtin/multi-pack-index.c

diff --git a/.gitignore b/.gitignore
index 388cc4beee..25633bc515 100644
--- a/.gitignore
+++ b/.gitignore
@@ -99,8 +99,9 @@
 /git-mergetool--lib
 /git-mktag
 /git-mktree
-/git-name-rev
+/git-multi-pack-index
 /git-mv
+/git-name-rev
 /git-notes
 /git-p4
 /git-pack-redundant
diff --git a/Documentation/git-multi-pack-index.txt 
b/Documentation/git-multi-pack-index.txt
new file mode 100644
index 00..ec9982cbfc
--- /dev/null
+++ b/Documentation/git-multi-pack-index.txt
@@ -0,0 +1,36 @@
+git-multi-pack-index(1)
+==
+
+NAME
+
+git-multi-pack-index - Write and verify multi-pack-indexes
+
+
+SYNOPSIS
+
+[verse]
+'git multi-pack-index' [--object-dir=]
+
+DESCRIPTION
+---
+Write or verify a multi-pack-index (MIDX) file.
+
+OPTIONS
+---
+
+--object-dir=::
+   Use given directory for the location of Git objects. We check
+   `/packs/multi-pack-index` for the current MIDX file, and
+   `/packs` for the pack-files to index.
+
+
+SEE ALSO
+
+See link:technical/multi-pack-index.html[The Multi-Pack-Index Design
+Document] and link:technical/pack-format.html[The Multi-Pack-Index
+Format] for more information on the multi-pack-index feature.
+
+
+GIT
+---
+Part of the linkgit:git[1] suite
diff --git a/Makefile b/Makefile
index e4b503d259..54610875ec 100644
--- a/Makefile
+++ b/Makefile
@@ -1047,6 +1047,7 @@ BUILTIN_OBJS += builtin/merge-recursive.o
 BUILTIN_OBJS += builtin/merge-tree.o
 BUILTIN_OBJS += builtin/mktag.o
 BUILTIN_OBJS += builtin/mktree.o
+BUILTIN_OBJS += builtin/multi-pack-index.o
 BUILTIN_OBJS += builtin/mv.o
 BUILTIN_OBJS += builtin/name-rev.o
 BUILTIN_OBJS += builtin/notes.o
diff --git a/builtin.h b/builtin.h
index 4e0f64723e..70997d7ace 100644
--- a/builtin.h
+++ b/builtin.h
@@ -191,6 +191,7 @@ extern int cmd_merge_recursive(int argc, const char **argv, 
const char *prefix);
 extern int cmd_merge_tree(int argc, const char **argv, const char *prefix);
 extern int cmd_mktag(int argc, const char **argv, const char *prefix);
 extern int cmd_mktree(int argc, const char **argv, const char *prefix);
+extern int cmd_multi_pack_index(int argc, const char **argv, const char 
*prefix);
 extern int cmd_mv(int argc, const char **argv, const char *prefix);
 extern int cmd_name_rev(int argc, const char **argv, const char *prefix);
 extern int cmd_notes(int argc, const char **argv, const char *prefix);
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
new file mode 100644
index 00..3161ddae86
--- /dev/null
+++ b/builtin/multi-pack-index.c
@@ -0,0 +1,34 @@
+#include "builtin.h"
+#include "cache.h"
+#include "config.h"
+#include "parse-options.h"
+
+static char const * const builtin_multi_pack_index_usage[] = {
+   N_("git multi-pack-index [--object-dir=]"),
+   NULL
+};
+
+static struct opts_multi_pack_index {
+   const char *object_dir;
+} opts;
+
+int cmd_multi_pack_index(int argc, const char **argv,
+const char *prefix)
+{
+   static struct option builtin_multi_pack_index_options[] = {
+   OPT_FILENAME(0, "object-dir", &opts.object_dir,
+ N_("object directory containing set of packfile and 
pack-index pairs")),
+   OPT_END(),
+   };
+
+   git_config(git_default_config, NULL);
+
+   argc = parse_options(argc, argv, prefix,
+builtin_multi_pack_index_options,
+builtin_multi_pack_index_usage, 0);
+
+   if (!opts.object_dir)
+   opts.object_dir = get_object_directory();
+
+   return 0;
+}
diff --git a/command-list.txt b/command-list.txt
index e1c26c1bb7..61071f8fa2 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -123,6 +123,7 @@ git-merge-index plumbingmanipulators
 git-merge-one-file  purehelpers
 git-mergetool   ancillarymanipulators   
complete
 git-merge-tree  ancillaryinterrogators
+git-multi-pack-indexplumbingmanipulators
 git-mktag   plumbingmanipulators
 git-mktree  plumbingmanipulators
 git-mv  mainporcelain   worktree

[PATCH v4 00/23] Multi-pack-index (MIDX)

2018-07-12 Thread Derrick Stolee
v3 had a lot of interesting feedback, most of which was non-functional,
but made a big impact on the shape of the patch, especially the test
script.

These are the important changes:

* 'git multi-pack-index' will report usage if the 'write' verb is not
  provided, or if extra parameters are provided. A later series will
  create the 'verify' verb.

* t5319-multi-pack-index.sh has a reorganized way to generate object
  data, so it has fewer code clones.

* 'test-tool read-midx' uses '-' instead of '_'.

* The global 'core_multi_pack_index' is replaced with a one-time call to
  git_config_bool() per repository that loads a multi-pack-index.

* 'struct multi_pack_index' is now defined in midx.h and kept anonymous
  to object-store.h.

* Added a test that 'git repack' removes the multi-pack-index.

* Fixed a doc bug when linking to the technical docs.

I included the diff between the latest ds/multi-pack-index and this
series as part of this message.

You can see the CI builds for Linux, Mac, and Windows linked from the
GitHub pull request [1].

Thanks,
-Stolee

[1] https://github.com/gitgitgadget/git/pull/5

Derrick Stolee (23):
  multi-pack-index: add design document
  multi-pack-index: add format details
  multi-pack-index: add builtin
  multi-pack-index: add 'write' verb
  midx: write header information to lockfile
  multi-pack-index: load into memory
  t5319: expand test data
  packfile: generalize pack directory list
  multi-pack-index: read packfile list
  multi-pack-index: write pack names in chunk
  midx: read pack names into array
  midx: sort and deduplicate objects from packfiles
  midx: write object ids in a chunk
  midx: write object id fanout chunk
  midx: write object offsets
  config: create core.multiPackIndex setting
  midx: read objects from multi-pack-index
  midx: use midx in abbreviation calculations
  midx: use existing midx when writing new one
  midx: use midx in approximate_object_count
  midx: prevent duplicate packfile loads
  packfile: skip loading index if in multi-pack-index
  midx: clear midx on repack

 .gitignore   |   3 +-
 Documentation/config.txt |   5 +
 Documentation/git-multi-pack-index.txt   |  56 ++
 Documentation/technical/multi-pack-index.txt | 109 +++
 Documentation/technical/pack-format.txt  |  77 ++
 Makefile |   3 +
 builtin.h|   1 +
 builtin/multi-pack-index.c   |  47 +
 builtin/repack.c |   9 +
 command-list.txt |   1 +
 git.c|   1 +
 midx.c   | 918 +++
 midx.h   |  44 +
 object-store.h   |   9 +
 packfile.c   | 169 +++-
 packfile.h   |   9 +
 sha1-name.c  |  70 ++
 t/helper/test-read-midx.c|  51 ++
 t/helper/test-tool.c |   1 +
 t/helper/test-tool.h |   1 +
 t/t5319-multi-pack-index.sh  | 179 
 21 files changed, 1720 insertions(+), 43 deletions(-)
 create mode 100644 Documentation/git-multi-pack-index.txt
 create mode 100644 Documentation/technical/multi-pack-index.txt
 create mode 100644 builtin/multi-pack-index.c
 create mode 100644 midx.c
 create mode 100644 midx.h
 create mode 100644 t/helper/test-read-midx.c
 create mode 100755 t/t5319-multi-pack-index.sh


base-commit: 53f9a3e157dbbc901a02ac2c73346d375e24978c
-- 
2.18.0.118.gd4f65b8d14

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 9dcde07a34..25f817ca42 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -910,7 +910,8 @@ core.commitGraph::
 
 core.multiPackIndex::
Use the multi-pack-index file to track multiple packfiles using a
-   single index. See link:technical/multi-pack-index[1].
+   single index. See link:technical/multi-pack-index.html[the
+   multi-pack-index design document].
 
 core.sparseCheckout::
Enable "sparse checkout" feature. See section "Sparse checkout" in
diff --git a/Documentation/git-multi-pack-index.txt 
b/Documentation/git-multi-pack-index.txt
index be97c9372e..a62af1caca 100644
--- a/Documentation/git-multi-pack-index.txt
+++ b/Documentation/git-multi-pack-index.txt
@@ -9,7 +9,7 @@ git-multi-pack-index - Write and verify multi-pack-indexes
 SYNOPSIS
 
 [verse]
-'git multi-pack-index' [--object-dir ] 
+'git multi-pack-index' [--object-dir=] 
 
 DESCRIPTION
 ---
@@ -18,7 +18,7 @@ Write or verify a multi-pack-index (MIDX) file.
 OPTIONS
 ---
 
---object-dir ::
+--object-dir=::
Use given directory for the location of Git objects. We check
`/packs/multi-pack-index` for the current MIDX file, and
`/packs` for the pack-files to index.

[PATCH v4 01/23] multi-pack-index: add design document

2018-07-12 Thread Derrick Stolee
Signed-off-by: Derrick Stolee 
---
 Documentation/technical/multi-pack-index.txt | 109 +++
 1 file changed, 109 insertions(+)
 create mode 100644 Documentation/technical/multi-pack-index.txt

diff --git a/Documentation/technical/multi-pack-index.txt 
b/Documentation/technical/multi-pack-index.txt
new file mode 100644
index 00..d7e57639f7
--- /dev/null
+++ b/Documentation/technical/multi-pack-index.txt
@@ -0,0 +1,109 @@
+Multi-Pack-Index (MIDX) Design Notes
+
+
+The Git object directory contains a 'pack' directory containing
+packfiles (with suffix ".pack") and pack-indexes (with suffix
+".idx"). The pack-indexes provide a way to lookup objects and
+navigate to their offset within the pack, but these must come
+in pairs with the packfiles. This pairing depends on the file
+names, as the pack-index differs only in suffix with its pack-
+file. While the pack-indexes provide fast lookup per packfile,
+this performance degrades as the number of packfiles increases,
+because abbreviations need to inspect every packfile and we are
+more likely to have a miss on our most-recently-used packfile.
+For some large repositories, repacking into a single packfile
+is not feasible due to storage space or excessive repack times.
+
+The multi-pack-index (MIDX for short) stores a list of objects
+and their offsets into multiple packfiles. It contains:
+
+- A list of packfile names.
+- A sorted list of object IDs.
+- A list of metadata for the ith object ID including:
+  - A value j referring to the jth packfile.
+  - An offset within the jth packfile for the object.
+- If large offsets are required, we use another list of large
+  offsets similar to version 2 pack-indexes.
+
+Thus, we can provide O(log N) lookup time for any number
+of packfiles.
+
+Design Details
+--
+
+- The MIDX is stored in a file named 'multi-pack-index' in the
+  .git/objects/pack directory. This could be stored in the pack
+  directory of an alternate. It refers only to packfiles in that
+  same directory.
+
+- The pack.multiIndex config setting must be on to consume MIDX files.
+
+- The file format includes parameters for the object ID hash
+  function, so a future change of hash algorithm does not require
+  a change in format.
+
+- The MIDX keeps only one record per object ID. If an object appears
+  in multiple packfiles, then the MIDX selects the copy in the most-
+  recently modified packfile.
+
+- If there exist packfiles in the pack directory not registered in
+  the MIDX, then those packfiles are loaded into the `packed_git`
+  list and `packed_git_mru` cache.
+
+- The pack-indexes (.idx files) remain in the pack directory so we
+  can delete the MIDX file, set core.midx to false, or downgrade
+  without any loss of information.
+
+- The MIDX file format uses a chunk-based approach (similar to the
+  commit-graph file) that allows optional data to be added.
+
+Future Work
+---
+
+- Add a 'verify' subcommand to the 'git midx' builtin to verify the
+  contents of the multi-pack-index file match the offsets listed in
+  the corresponding pack-indexes.
+
+- The multi-pack-index allows many packfiles, especially in a context
+  where repacking is expensive (such as a very large repo), or
+  unexpected maintenance time is unacceptable (such as a high-demand
+  build machine). However, the multi-pack-index needs to be rewritten
+  in full every time. We can extend the format to be incremental, so
+  writes are fast. By storing a small "tip" multi-pack-index that
+  points to large "base" MIDX files, we can keep writes fast while
+  still reducing the number of binary searches required for object
+  lookups.
+
+- The reachability bitmap is currently paired directly with a single
+  packfile, using the pack-order as the object order to hopefully
+  compress the bitmaps well using run-length encoding. This could be
+  extended to pair a reachability bitmap with a multi-pack-index. If
+  the multi-pack-index is extended to store a "stable object order"
+  (a function Order(hash) = integer that is constant for a given hash,
+  even as the multi-pack-index is updated) then a reachability bitmap
+  could point to a multi-pack-index and be updated independently.
+
+- Packfiles can be marked as "special" using empty files that share
+  the initial name but replace ".pack" with ".keep" or ".promisor".
+  We can add an optional chunk of data to the multi-pack-index that
+  records flags of information about the packfiles. This allows new
+  states, such as 'repacked' or 'redeltified', that can help with
+  pack maintenance in a multi-pack environment. It may also be
+  helpful to organize packfiles by object type (commit, tree, blob,
+  etc.) and use this metadata to help that maintenance.
+
+- The partial clone feature records special "promisor" packs that
+  may point to objects that are not stored locally, but available
+  on request to a server. The multi-pack-index does not 

[PATCH v4 02/23] multi-pack-index: add format details

2018-07-12 Thread Derrick Stolee
The multi-pack-index feature generalizes the existing pack-index
feature by indexing objects across multiple pack-files.

Describe the basic file format, using a 12-byte header followed by
a lookup table for a list of "chunks" which will be described later.
The file ends with a footer containing a checksum using the hash
algorithm.

The header allows later versions to create breaking changes by
advancing the version number. We can also change the hash algorithm
using a different version value.

We will add the individual chunk format information as we introduce
the code that writes that information.
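
For readers following along, the fixed-width part can be pictured as a
small decoder. The struct and helper names below are made up for
illustration; the byte layout is the one described by the
documentation this patch adds:

#include <stdint.h>

/* read a 4-byte network-order value, as the format requires */
static uint32_t be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

struct midx_header {
	uint32_t signature;		/* 'M' 'I' 'D' 'X' = 0x4d494458 */
	unsigned char version;		/* currently always 1 */
	unsigned char oid_version;	/* 1 = SHA-1 */
	unsigned char num_chunks;
	unsigned char num_base_midx;	/* currently always 0 */
	uint32_t num_packs;
};

/* decode the fixed 12-byte header at the start of a mapped file */
static struct midx_header decode_midx_header(const unsigned char *buf)
{
	struct midx_header h;

	h.signature = be32(buf);
	h.version = buf[4];
	h.oid_version = buf[5];
	h.num_chunks = buf[6];
	h.num_base_midx = buf[7];
	h.num_packs = be32(buf + 8);
	return h;
}

The chunk lookup table that follows is then just (num_chunks + 1)
12-byte rows of a 4-byte id and an 8-byte offset, with a zero id
terminating the list.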

Signed-off-by: Derrick Stolee 
---
 Documentation/technical/pack-format.txt | 49 +
 1 file changed, 49 insertions(+)

diff --git a/Documentation/technical/pack-format.txt 
b/Documentation/technical/pack-format.txt
index 70a99fd142..e060e693f4 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -252,3 +252,52 @@ Pack file entry: <+
 corresponding packfile.
 
 20-byte SHA-1-checksum of all of the above.
+
+== multi-pack-index (MIDX) files have the following format:
+
+The multi-pack-index files refer to multiple pack-files and loose objects.
+
+In order to allow extensions that add extra data to the MIDX, we organize
+the body into "chunks" and provide a lookup table at the beginning of the
+body. The header includes certain length values, such as the number of packs,
+the number of base MIDX files, hash lengths and types.
+
+All 4-byte numbers are in network order.
+
+HEADER:
+
+   4-byte signature:
+   The signature is: {'M', 'I', 'D', 'X'}
+
+   1-byte version number:
+   Git only writes or recognizes version 1.
+
+   1-byte Object Id Version
+   Git only writes or recognizes version 1 (SHA1).
+
+   1-byte number of "chunks"
+
+   1-byte number of base multi-pack-index files:
+   This value is currently always zero.
+
+   4-byte number of pack files
+
+CHUNK LOOKUP:
+
+   (C + 1) * 12 bytes providing the chunk offsets:
+   First 4 bytes describe chunk id. Value 0 is a terminating label.
+   Other 8 bytes provide offset in current file for chunk to start.
+   (Chunks are provided in file-order, so you can infer the length
+   using the next chunk position if necessary.)
+
+   The remaining data in the body is described one chunk at a time, and
+   these chunks may be given in any order. Chunks are required unless
+   otherwise specified.
+
+CHUNK DATA:
+
+   (This section intentionally left incomplete.)
+
+TRAILER:
+
+   20-byte SHA1-checksum of the above contents.
-- 
2.18.0.118.gd4f65b8d14



Re: [PATCH v2 01/10] t/test-lib: teach --chain-lint to detect broken &&-chains in subshells

2018-07-12 Thread Eric Sunshine
On Thu, Jul 12, 2018 at 12:56 PM Jeff King  wrote:
> Like Junio, I'm a little nervous that this is going to end up being a
> maintenance burden. People may hit false positives and then be
> confronted with this horrible mass of sed to try to figure out what went
> wrong [...]

A very valid concern.

> But I came around to thinking:
>   - this found and fixed real problems in the test suite, with minimal
> false positives across the existing code

The counterargument (and arguing against my own case) is that, while
it found 3 or 4 genuine test bugs hidden by &&-breakage, they were
just that: bugs in the tests; they weren't hiding any bugs in Git
itself, which is a pretty measly return for the effort invested in the
linter.

However, existing tests aside, the more important goal is detecting
problems in new or updated tests hiding genuine bugs in changes to Git
itself, so it may have some value.

>   - worst case is that relief is only a "git revert" away

Right. It's just a developer aid, not a user-facing feature which has
to be maintained in perpetuity, so retiring it is easy.


RE: [PATCH v3] handle lower case drive letters on Windows

2018-07-12 Thread Ben Peart
> -Original Message-
> From: Junio C Hamano  On Behalf Of Junio C Hamano
> Sent: Thursday, July 12, 2018 3:13 PM
> To: Ben Peart 
> Cc: git@vger.kernel.org; sbel...@google.com; johannes.schinde...@gmx.de
> Subject: Re: [PATCH v3] handle lower case drive letters on Windows
> 
> Ben Peart  writes:
> 
> > On Windows, if a tool calls SetCurrentDirectory with a lower case drive
> > letter, the subsequent call to GetCurrentDirectory will return the same
> > lower case drive letter. Powershell, for example, does not normalize the
> > path. If that happens, test-drop-caches will error out as it does not
> > correctly handle lower case drive letters.
> >
> > Helped-by: Johannes Schindelin 
> > Signed-off-by: Ben Peart 
> > ---
> 
> Thanks.  Will replace, even though showing the whole Buffer (I am
> guessing it is the equivalent of result from getcwd(3) call) and
> complaining about drive "letter" feels a bit strange ;-)
> 

Thanks.  I thought it was strange as well until I realized you only see the
error message if there _isn't_ a valid drive letter so seeing the entire
path makes sense as it will likely be something like "\\server\volume\directory"


Re: [PATCH v3] handle lower case drive letters on Windows

2018-07-12 Thread Junio C Hamano
Ben Peart  writes:

> On Windows, if a tool calls SetCurrentDirectory with a lower case drive
> letter, the subsequent call to GetCurrentDirectory will return the same
> lower case drive letter. Powershell, for example, does not normalize the
> path. If that happens, test-drop-caches will error out as it does not
> correctly handle lower case drive letters.
>
> Helped-by: Johannes Schindelin 
> Signed-off-by: Ben Peart 
> ---

Thanks.  Will replace, even though showing the whole Buffer (I am
guessing it is the equivalent of result from getcwd(3) call) and
complaining about drive "letter" feels a bit strange ;-)

> Notes:
> Base Ref: master
> Web-Diff: https://github.com/benpeart/git/commit/1ff8de1b6c
> Checkout: git fetch https://github.com/benpeart/git drop-caches-v3 && git 
> checkout 1ff8de1b6c
> 
> ### Interdiff (v2..v3):
> 
> diff --git a/t/helper/test-drop-caches.c b/t/helper/test-drop-caches.c
> index 37047189c3..f65e301f9d 100644
> --- a/t/helper/test-drop-caches.c
> +++ b/t/helper/test-drop-caches.c
> @@ -16,8 +16,8 @@ static int cmd_sync(void)
>   if ((0 == dwRet) || (dwRet > MAX_PATH))
>   return error("Error getting current directory");
> 
> - if ((toupper(Buffer[0]) < 'A') || (toupper(Buffer[0]) > 'Z'))
> - return error("Invalid drive letter '%c'", Buffer[0]);
> + if (!has_dos_drive_prefix(Buffer))
> + return error("'%s': invalid drive letter", Buffer);
> 
>   szVolumeAccessPath[4] = Buffer[0];
>   hVolWrite = CreateFile(szVolumeAccessPath, GENERIC_READ | GENERIC_WRITE,
> 
> ### Patches
>
>  t/helper/test-drop-caches.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/t/helper/test-drop-caches.c b/t/helper/test-drop-caches.c
> index d6bcfddf13..f65e301f9d 100644
> --- a/t/helper/test-drop-caches.c
> +++ b/t/helper/test-drop-caches.c
> @@ -16,8 +16,8 @@ static int cmd_sync(void)
>   if ((0 == dwRet) || (dwRet > MAX_PATH))
>   return error("Error getting current directory");
>  
> - if ((Buffer[0] < 'A') || (Buffer[0] > 'Z'))
> - return error("Invalid drive letter '%c'", Buffer[0]);
> + if (!has_dos_drive_prefix(Buffer))
> + return error("'%s': invalid drive letter", Buffer);
>  
>   szVolumeAccessPath[4] = Buffer[0];
>   hVolWrite = CreateFile(szVolumeAccessPath, GENERIC_READ | GENERIC_WRITE,
>
> base-commit: e3331758f12da22f4103eec7efe1b5304a9be5e9


Re: [PATCH v2] sha1-name.c: for ":/", find detached HEAD commits

2018-07-12 Thread Junio C Hamano
William Chargin  writes:

>> As we discussed during the review on v1, ":/"
>> is *NOT* pathspec (that is why having these tests in t4208 is wrong
>> but we are following existing mistakes).
>
> Ah, I understand the terminology better now. Thanks. I'll change the
> commit message wording to use "extended SHA-1s" instead of "pathspecs".
>
>> Now you have Peff's sign-off for the one-liner code change, the last
>> one is redundant.
>
> Okay: I'll remove the "Based-on-patch-by" line.
>
>> Other than the above two nits, the patch looks good.  I would have
>> broken the line after "including HEAD.", but the slightly-long line
>> is also OK.
>
> Happy to change this while making the above edits. :-)

Let's save a round-trip.  Here is what I'd queue for now, and you
can just say "looks good" if you agree with the result, or send an
updated version ;-).

Thanks for working on this.

-- >8 --
From: William Chargin 
Date: Wed, 11 Jul 2018 22:49:09 -0700
Subject: [PATCH] sha1-name.c: for ":/", find detached HEAD commits

This patch broadens the set of commits matched by ":/" to
include commits reachable from HEAD but not any named ref. This avoids
surprising behavior when working with a detached HEAD and trying to
refer to a commit that was recently created and only exists within the
detached state.

If multiple worktrees exist, only the current worktree's HEAD is
considered reachable. This is consistent with the existing behavior for
other per-worktree refs: e.g., bisect refs are considered reachable, but
only within the relevant worktree.

Signed-off-by: Jeff King 
Signed-off-by: William Chargin 
Signed-off-by: Junio C Hamano 
---
 Documentation/revisions.txt   |  3 ++-
 sha1_name.c   |  1 +
 t/t4208-log-magic-pathspec.sh | 26 ++
 3 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/Documentation/revisions.txt b/Documentation/revisions.txt
index dfcc49c72c..0845c3f590 100644
--- a/Documentation/revisions.txt
+++ b/Documentation/revisions.txt
@@ -180,7 +180,8 @@ existing tag object.
   A colon, followed by a slash, followed by a text, names
   a commit whose commit message matches the specified regular expression.
   This name returns the youngest matching commit which is
-  reachable from any ref. The regular expression can match any part of the
+  reachable from any ref, including HEAD.
+  The regular expression can match any part of the
   commit message. To match messages starting with a string, one can use
   e.g. ':/^foo'. The special sequence ':/!' is reserved for modifiers to what
   is matched. ':/!-foo' performs a negative match, while ':/!!foo' matches a
diff --git a/sha1_name.c b/sha1_name.c
index 611c7d24dd..19f66713e1 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -1649,6 +1649,7 @@ static int get_oid_with_context_1(const char *name,
struct commit_list *list = NULL;
 
for_each_ref(handle_one_ref, &list);
+   head_ref(handle_one_ref, &list);
commit_list_sort_by_date(&list);
return get_oid_oneline(name + 2, oid, list);
}
diff --git a/t/t4208-log-magic-pathspec.sh b/t/t4208-log-magic-pathspec.sh
index a1705f70cf..69643d101d 100755
--- a/t/t4208-log-magic-pathspec.sh
+++ b/t/t4208-log-magic-pathspec.sh
@@ -25,6 +25,32 @@ test_expect_success '"git log :/a -- " should not be 
ambiguous' '
git log :/a --
 '
 
+test_expect_success '"git log :/detached -- " should find a commit only in 
HEAD' '
+   test_when_finished "git checkout master" &&
+   git checkout --detach &&
+   # Must manually call `test_tick` instead of using `test_commit`,
+   # because the latter additionally creates a tag, which would make
+   # the commit reachable not only via HEAD.
+   test_tick &&
+   git commit --allow-empty -m detached &&
+   test_tick &&
+   git commit --allow-empty -m something-else &&
+   git log :/detached --
+'
+
+test_expect_success '"git log :/detached -- " should not find an orphaned 
commit' '
+   test_must_fail git log :/detached --
+'
+
+test_expect_success '"git log :/detached -- " should find HEAD only of own 
worktree' '
+   git worktree add other-tree HEAD &&
+   git -C other-tree checkout --detach &&
+   test_tick &&
+   git -C other-tree commit --allow-empty -m other-detached &&
+   git -C other-tree log :/other-detached -- &&
+   test_must_fail git log :/other-detached --
+'
+
 test_expect_success '"git log -- :/a" should not be ambiguous' '
git log -- :/a
 '
-- 
2.18.0-129-ge3331758f1



[PATCH] RFC: submodule-config: introduce trust level

2018-07-12 Thread Stefan Beller
The series merged at 614ea03a71e (Merge branch
'bw/submodule-config-cleanup', 2017-08-26) went to great length to make it
explicit to the caller where a value came from, as that would help the
caller to be careful to decide which values to take from where, i.e. be
careful about security implications.

In practice we always want to stack the settings starting with the
.gitmodules file as a base and then put the config on top of it.
So let's manage the security aspects impolitely in the submodule-config
machinery directly where we implement its parsing, as that is a good
place to reason about the trust that we need to put into a parsed value.

This patch implements the trust level that is passed to the parsing;
currently we only pass in the 'in_repo' level of trust, which is the
.gitmodules file.

Follow up patches could add other sources that populate the submodule
config again.
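
Reduced to a standalone sketch, the gating idea looks like this (the
enum values mirror the ones introduced below; the policy function and
its name are illustrative, not the exact logic of this patch):

enum submodule_config_trust_level {
	in_repo = 1,		/* value came from the .gitmodules file */
	configured_by_user = 2	/* value came from the user's configuration */
};

/*
 * Example policy: a field that can run arbitrary commands (such as
 * "update = !command") should only be honored when it comes from a
 * source more trusted than the repository contents themselves.
 */
static int trusted_for_commands(enum submodule_config_trust_level source)
{
	return source > in_repo;
}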

Signed-off-by: Stefan Beller 
---

 This is on top of ao/config-from-gitmodules.

 submodule-config.c | 31 ---
 1 file changed, 28 insertions(+), 3 deletions(-)

diff --git a/submodule-config.c b/submodule-config.c
index 77421a49719..09eab9f00e0 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -21,6 +21,11 @@ struct submodule_cache {
unsigned gitmodules_read:1;
 };
 
+enum submodule_config_trust_level {
+   in_repo = 1,
+   configured_by_user = 2
+};
+
 /*
  * thin wrapper struct needed to insert 'struct submodule' entries to
  * the hashmap
@@ -387,12 +392,14 @@ struct parse_config_parameter {
struct submodule_cache *cache;
const struct object_id *treeish_name;
const struct object_id *gitmodules_oid;
+   enum submodule_config_trust_level source;
int overwrite;
 };
 
 static int parse_config(const char *var, const char *value, void *data)
 {
struct parse_config_parameter *me = data;
+   enum submodule_config_trust_level source = me->source;
struct submodule *submodule;
struct strbuf name = STRBUF_INIT, item = STRBUF_INIT;
int ret = 0;
@@ -406,6 +413,7 @@ static int parse_config(const char *var, const char *value, 
void *data)
 name.buf);
 
if (!strcmp(item.buf, "path")) {
+   /* all sources allowed */
if (!value)
ret = config_error_nonbool(var);
else if (!me->overwrite && submodule->path)
@@ -419,6 +427,7 @@ static int parse_config(const char *var, const char *value, 
void *data)
cache_put_path(me->cache, submodule);
}
} else if (!strcmp(item.buf, "fetchrecursesubmodules")) {
+   /* all sources allowed */
/* when parsing worktree configurations we can die early */
int die_on_error = is_null_oid(me->gitmodules_oid);
if (!me->overwrite &&
@@ -430,6 +439,7 @@ static int parse_config(const char *var, const char *value, 
void *data)
var, value,
die_on_error);
} else if (!strcmp(item.buf, "ignore")) {
+   /* all sources allowed */
if (!value)
ret = config_error_nonbool(var);
else if (!me->overwrite && submodule->ignore)
@@ -446,6 +456,7 @@ static int parse_config(const char *var, const char *value, 
void *data)
submodule->ignore = xstrdup(value);
}
} else if (!strcmp(item.buf, "url")) {
+   /* all sources allowed */
if (!value) {
ret = config_error_nonbool(var);
} else if (!me->overwrite && submodule->url) {
@@ -456,16 +467,27 @@ static int parse_config(const char *var, const char 
*value, void *data)
submodule->url = xstrdup(value);
}
} else if (!strcmp(item.buf, "update")) {
+   struct submodule_update_strategy st;
if (!value)
ret = config_error_nonbool(var);
else if (!me->overwrite &&
 submodule->update_strategy.type != 
SM_UPDATE_UNSPECIFIED)
warn_multiple_config(me->treeish_name, submodule->name,
 "update");
-   else if (parse_submodule_update_strategy(value,
-            &submodule->update_strategy) < 0)
-   die(_("invalid value for %s"), var);
+   else if (parse_submodule_update_strategy(value, &st) < 0)
+   die(_("invalid value for %s"), var);
+   else if (source <= in_repo) {
+   if (st.type == SM_UPDATE_COMMAND) {
+   submodule->update_strategy.type = st.type;
+   submodule->update_strategy.command = \
+  

gitweb and Levenshtein

2018-07-12 Thread David Brown
Howdy, I want to hack the getweb_make_index.perl script to create a 
string search using: 
https://github.com/git/git/blob/master/levenshtein.c.


How do I reference the compiled code?

I would like to call this routine using Java and maybe Perl.

Please advise.

Thanks.

Regards,


Re: [PATCH 21/25] t5000-t5999: fix broken &&-chains

2018-07-12 Thread Eric Sunshine
On Thu, Jul 12, 2018 at 2:50 PM Junio C Hamano  wrote:
> Eric Sunshine  writes:
> > The exact reason is that the prerequisite was not met (indeed, I
> > wasn't even aware of that prerequisite), so the commit message can be
> > more direct:
> >
> > This was missed by the previous clean-ups due to an unmet
> > CLONE_2GB prerequisite.
> >
> > Thanks for saving a round-trip.
> >
> >> Signed-off-by: Junio C Hamano 
>
> There still need a sign-off, so ... ;-)

For whose sign-off are you waiting, SZEDER's or mine? (I presume SZEDER's.)


Re: [PATCH 21/25] t5000-t5999: fix broken &&-chains

2018-07-12 Thread Junio C Hamano
Eric Sunshine  writes:

> On Thu, Jul 12, 2018 at 2:35 PM Junio C Hamano  wrote:
>> Junio C Hamano  writes:
>> Oops, sent before completing the message.
>>
>> For that to happen, we need a sign-off ;-)
>>
>> I guess this one would have been caught with the "sed script on
>> subshell" linter that does not execute?
>
> Yes, this is correctly caught when the prerequisite is met.
>
>> -- >8 --
>> Subject: t5608: fix broken &&-chain
>>
>> This is inside a loop that is run inside a subshell, in a test that
>> is protected with CLONE_2GB prerequisite, one or more which is quite
>> likely reason why it wasn't caught durin the previous clean-up.
>
> s/durin/during/
>
> The exact reason is that the prerequisite was not met (indeed, I
> wasn't even aware of that prerequisite), so the commit message can be
> more direct:
>
> This was missed by the previous clean-ups due to an unmet
> CLONE_2GB prerequisite.
>
> Thanks for saving a round-trip.
>
>> Signed-off-by: Junio C Hamano 

There still need a sign-off, so ... ;-)


Re: [PATCH] t6036: fix broken && chain in sub-shell

2018-07-12 Thread Junio C Hamano
Ramsay Jones  writes:

> Signed-off-by: Ramsay Jones 
> ---
>
> Hi Junio,
>
> I had a test failure on 'pu' today - Eric's chain-lint series
> found another broken chain in one of Elijah's new tests (on the
> 'en/t6036-recursive-corner-cases' branch).

Thanks, I see the same breakage in my build, too.

>
> ATB,
> Ramsay Jones
>
>  t/t6036-recursive-corner-cases.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/t/t6036-recursive-corner-cases.sh 
> b/t/t6036-recursive-corner-cases.sh
> index 2a44acace..59e52c5a0 100755
> --- a/t/t6036-recursive-corner-cases.sh
> +++ b/t/t6036-recursive-corner-cases.sh
> @@ -495,7 +495,7 @@ test_expect_success 'setup differently handled merges of 
> directory/file conflict
>   test_write_lines 1 2 3 4 5 6 7 8 >a &&
>   git add a &&
>   git commit -m E3 &&
> - git tag E3
> + git tag E3 &&
>  
>   git checkout C^0 &&
>   test_must_fail git merge B^0 &&


Re: [PATCH 21/25] t5000-t5999: fix broken &&-chains

2018-07-12 Thread Eric Sunshine
On Thu, Jul 12, 2018 at 2:35 PM Junio C Hamano  wrote:
> Junio C Hamano  writes:
> Oops, sent before completing the message.
>
> For that to happen, we need a sign-off ;-)
>
> I guess this one would have been caught with the "sed script on
> subshell" linter that does not execute?

Yes, this is correctly caught when the prerequisite is met.

> -- >8 --
> Subject: t5608: fix broken &&-chain
>
> This is inside a loop that is run inside a subshell, in a test that
> is protected with CLONE_2GB prerequisite, one or more which is quite
> likely reason why it wasn't caught durin the previous clean-up.

s/durin/during/

The exact reason is that the prerequisite was not met (indeed, I
wasn't even aware of that prerequisite), so the commit message can be
more direct:

This was missed by the previous clean-ups due to an unmet
CLONE_2GB prerequisite.

Thanks for saving a round-trip.

> Signed-off-by: Junio C Hamano 


Re: [PATCH 21/25] t5000-t5999: fix broken &&-chains

2018-07-12 Thread Junio C Hamano
Junio C Hamano  writes:

> Eric Sunshine  writes:
>
>> On Thu, Jul 12, 2018 at 8:37 AM SZEDER Gábor  wrote:
>>> The change below should be squashed into this patch to fix a
>>> previously unnoticed broken &&-chain.  I think you missed it, because
>>> this test script is rather expensive and you didn't run it with
>>> GIT_TEST_CLONE_2GB=YesPlease.
>>>
>>> diff --git a/t/t5608-clone-2gb.sh b/t/t5608-clone-2gb.sh
>>> @@ -23,7 +23,7 @@ test_expect_success CLONE_2GB 'setup' '
>>> -   echo "M 100644 :$i $i" >> commit
>>> +   echo "M 100644 :$i $i" >> commit &&
>>
>> Thanks for finding this. I tried to get as much coverage as possible
>> by installing packages I don't normally have installed (Apache, cvs,
>> cvsps, Subversion, Perforce, etc.) and even temporarily modified a
>> script or two to force it run ...
>
> Thanks, both.  I think the &&-chain fix series is already large and
> also otherwise seem to be pretty solid, so let's not reroll but
> queue this addition on top.

Oops, sent before completing the message.

For that to happen, we need a sign-off ;-)

I guess this one would have been caught with the "sed script on
subshell" linter that does not execute?

-- >8 --
Subject: t5608: fix broken &&-chain

This is inside a loop that is run inside a subshell, in a test that
is protected with CLONE_2GB prerequisite, one or more which is quite
likely reason why it wasn't caught durin the previous clean-up.

Signed-off-by: Junio C Hamano 
---
 t/t5608-clone-2gb.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t5608-clone-2gb.sh b/t/t5608-clone-2gb.sh
index df822d9a3e..2c6bc07344 100755
--- a/t/t5608-clone-2gb.sh
+++ b/t/t5608-clone-2gb.sh
@@ -23,7 +23,7 @@ test_expect_success CLONE_2GB 'setup' '
printf "blob\nmark :$i\ndata $blobsize\n" &&
#test-tool genrandom $i $blobsize &&
printf "%-${blobsize}s" $i &&
-   echo "M 100644 :$i $i" >> commit
+   echo "M 100644 :$i $i" >> commit &&
i=$(($i+1)) ||
echo $? > exit-status
 done &&


Re: [PATCH 21/25] t5000-t5999: fix broken &&-chains

2018-07-12 Thread Junio C Hamano
Eric Sunshine  writes:

> On Thu, Jul 12, 2018 at 8:37 AM SZEDER Gábor  wrote:
>> The change below should be squashed into this patch to fix a
>> previously unnoticed broken &&-chain.  I think you missed it, because
>> this test script is rather expensive and you didn't run it with
>> GIT_TEST_CLONE_2GB=YesPlease.
>>
>> diff --git a/t/t5608-clone-2gb.sh b/t/t5608-clone-2gb.sh
>> @@ -23,7 +23,7 @@ test_expect_success CLONE_2GB 'setup' '
>> -   echo "M 100644 :$i $i" >> commit
>> +   echo "M 100644 :$i $i" >> commit &&
>
> Thanks for finding this. I tried to get as much coverage as possible
> by installing packages I don't normally have installed (Apache, cvs,
> cvsps, Subversion, Perforce, etc.) and even temporarily modified a
> script or two to force it run ...

Thanks, both.  I think the &&-chain fix series is already large and
also otherwise seem to be pretty solid, so let's not reroll but
queue this addition on top.


Re: [GSoC][PATCH v3 12/13] rebase -i: implement the logic to initialize the variable $revision in C

2018-07-12 Thread Junio C Hamano
Junio C Hamano  writes:

> If I am reading the body of this if() block correctly, I think it
> does everything init_revisions_and_shortrevisions shell function
> does, i.e. compute $revisions for both cases with or without
> upstream and write squash-onto state if needed, so that we can call
> the sequencer_make_script() helper with necessary $revisions arg
> without being passed from the command line of --make-script helper.
>
> But the hunk below tells me that you are still calling
> init_revisions_and_shortrevisions shell function before we are
> called.  Why?  IOW, why isn't this step removing that shell function
> *and* the call to it, now its logic is fully implemented in the body
> of this if() block?

You can ignore this part (but not the rest) of my comments, as 13/13
answers it adequately.  After this step, the shell function still
needs to be called to set $shortrevisions.

Thanks.


Re: [PATCH 21/25] t5000-t5999: fix broken &&-chains

2018-07-12 Thread Jeff King
On Thu, Jul 12, 2018 at 01:44:49PM -0400, Eric Sunshine wrote:

> On Thu, Jul 12, 2018 at 8:37 AM SZEDER Gábor  wrote:
> > The change below should be squashed into this patch to fix a
> > previously unnoticed broken &&-chain.  I think you missed it, because
> > this test script is rather expensive and you didn't run it with
> > GIT_TEST_CLONE_2GB=YesPlease.
> >
> > diff --git a/t/t5608-clone-2gb.sh b/t/t5608-clone-2gb.sh
> > @@ -23,7 +23,7 @@ test_expect_success CLONE_2GB 'setup' '
> > -   echo "M 100644 :$i $i" >> commit
> > +   echo "M 100644 :$i $i" >> commit &&
> 
> Thanks for finding this. I tried to get as much coverage as possible
> by installing packages I don't normally have installed (Apache, cvs,
> cvsps, Subversion, Perforce, etc.) and even temporarily modified a
> script or two to force it to run when I simply couldn't meet some
> prerequisite, thus reducing the "skipped" messages to a minimum, but I
> wasn't even aware of this prerequisite since I never saw a "skipped"
> message for it.

I think in theory we should be able to lint _every_ test, even if we're
not running it. After all, the point is that a proper linting runs zero
commands.

That said, it may not be worth the implementation effort. The linting
happens at a pretty low-level, and we've already decided to skip tests
long before then (even for single prereqs, let alone skip_all cases
where we exit the script early).

> Looking at it more closely, I think the reason it didn't come to my
> attention is that this script doesn't use the standard skip_all="..."
> mechanism for skipping the tests but instead "rolls its own", and
> apparently 'prove' simply swallowed (or was unable to produce) an
> overall "skipped" message for this script.

Yeah, that is a bit funny. For a whole-test skip like this, I think
skip_all is easier to read, as it is immediately apparent to the reader
that nothing that _doesn't_ meet that prereq should be in the file. So
I'd be happy to see it switched (though it's not _that_ big a deal, so
leaving it is fine, too).

By the way, "prove --directives" can help with finding individual
skipped tests.

-Peff
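
[For reference, the skip_all pattern being recommended above looks roughly
like this near the top of a test script; the message wording here is made
up:

    if test -z "$GIT_TEST_CLONE_2GB"
    then
        skip_all='expensive 2GB clone test; set GIT_TEST_CLONE_2GB=YesPlease to run it'
        test_done
    fi
]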


Re: [GSoC][PATCH v3 12/13] rebase -i: implement the logic to initialize the variable $revision in C

2018-07-12 Thread Junio C Hamano
Alban Gruin  writes:

> @@ -50,6 +71,13 @@ int cmd_rebase__helper(int argc, const char **argv, const char *prefix)
>   N_("prepare the branch to be rebased"), PREPARE_BRANCH),
>   OPT_CMDMODE(0, "complete-action", &command,
>   N_("complete the action"), COMPLETE_ACTION),
> + OPT_STRING(0, "onto", &onto, N_("onto"), N_("onto")),
> + OPT_STRING(0, "restrict-revision", &restrict_revisions,
> +N_("restrict-revision"), N_("restrict revision")),
> + OPT_STRING(0, "squash-onto", &squash_onto, N_("squash-onto"),
> +N_("squash onto")),
> + OPT_STRING(0, "upstream", &upstream, N_("upstream"),
> +N_("the upstream commit")),
>   OPT_END()
>   };
>  
> @@ -77,8 +105,30 @@ int cmd_rebase__helper(int argc, const char **argv, const char *prefix)
>   return !!sequencer_continue(&opts);
>   if (command == ABORT && argc == 1)
>   return !!sequencer_remove_state(&opts);
> - if (command == MAKE_SCRIPT && argc > 1)
> - return !!sequencer_make_script(stdout, argc, argv, flags);
> + if (command == MAKE_SCRIPT && (argc == 1 || argc == 2)) {

Sorry but I am confused.  The call to rebase--helper --make-script
in git-rebase--interactive.sh we see below has only dashed options
(like --upstream, --onto, --squash-onto, --restrict-revision) and
the option parameters given to these options.

What are the remaining 1 or 2 arguments we are processing here?

Well, actually argc==1 means there is nothing else, so I am asking
about the case where argc==2.

> + char *revisions = NULL;
> + struct argv_array make_script_args = ARGV_ARRAY_INIT;
> +
> + if (!upstream && squash_onto)
> + write_file(path_squash_onto(), "%s\n", squash_onto);
> +
> + ret = get_revision_ranges(upstream, onto, &head_hash, &revisions);
> + if (ret)
> + return ret;
> +
> + argv_array_pushl(&make_script_args, "", revisions, NULL);
> + if (argc == 2)
> + argv_array_push(&make_script_args, restrict_revisions);
> +
> + ret = !!sequencer_make_script(stdout,
> +   make_script_args.argc, make_script_args.argv,
> +   flags);

Exactly the same comment on !! as an earlier step applies here.

> + free(revisions);
> + argv_array_clear(&make_script_args);

If I am reading the body of this if() block correctly, I think it
does everything init_revisions_and_shortrevisions shell function
does, i.e. compute $revisions for both cases with or without
upstream and write squash-onto state if needed, so that we can call
the sequencer_make_script() helper with necessary $revisions arg
without being passed from the command line of --make-script helper.

But the hunk below tells me that you are still calling
init_revisions_and_shortrevisions shell function before we are
called.  Why?  IOW, why isn't this step removing that shell function
*and* the call to it, now its logic is fully implemented in the body
of this if() block?

> + return ret;
> + }
>   if (command == CHECK_TODO_LIST && argc == 1)
>   return !!check_todo_list();
>   if (command == REARRANGE_SQUASH && argc == 1)

Thanks.

> diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh
> index 0d66c0f8b..4ca47aed1 100644
> --- a/git-rebase--interactive.sh
> +++ b/git-rebase--interactive.sh
> @@ -92,7 +92,9 @@ git_rebase__interactive () {
>   git rebase--helper --make-script ${keep_empty:+--keep-empty} \
>   ${rebase_merges:+--rebase-merges} \
>   ${rebase_cousins:+--rebase-cousins} \
> - $revisions ${restrict_revision+^$restrict_revision} >"$todo" ||
> + ${upstream:+--upstream "$upstream"} ${onto:+--onto "$onto"} \
> + ${squash_onto:+--squash-onto "$squash_onto"} \
> + ${restrict_revision:+--restrict-revision ^"$restrict_revision"} >"$todo" ||
>   die "$(gettext "Could not generate todo list")"
>  
>   exec git rebase--helper --complete-action "$shortrevisions" 
> "$onto_name" \


Re: [PATCH v3 07/24] multi-pack-index: expand test data

2018-07-12 Thread Derrick Stolee

On 7/12/2018 2:02 PM, Eric Sunshine wrote:
> On Thu, Jul 12, 2018 at 10:10 AM Derrick Stolee  wrote:
>> On 7/6/2018 12:36 AM, Eric Sunshine wrote:
>>> There seems to be a fair bit of duplication in these tests which
>>> create objects. Is it possible to factor out some of this code into a
>>> shell function?
>>
>> In addition to the other small changes, this refactor in particular was
>> a big change (but a good one). I'm sending my current progress in this
>> direction, as I expect this can be improved.
>
> I like the amount of code reduction. A couple minor comments...
>
>> +generate_objects () {
>> +   i=$1
>> +   iii=$(printf '%03i' $i)
>> +   {
>> +   test-tool genrandom "bar" 200 &&
>> +   test-tool genrandom "baz $iii" 50
>> +   } >wide_delta_$iii &&
>> +   {
>> +   test-tool genrandom "foo"$i 100 &&
>> +   test-tool genrandom "foo"$(( $i + 1 )) 100 &&
>> +   test-tool genrandom "foo"$(( $i + 2 )) 100
>> +   } >>deep_delta_$iii &&
>
> I think this should be: s/>>/>/

It should!

>> +   echo $iii >file_$iii &&
>> +   test-tool genrandom "$iii" 8192 >>file_$iii &&
>
> And this: s/>>/>/

In addition, I should wrap these two commands in { } like the files above.

Thanks,
-Stolee
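
[Putting Eric's two redirection fixes and the brace grouping Stolee
mentions together, the helper would end up looking something like the
following; a sketch, not the final patch:

    generate_objects () {
        i=$1
        iii=$(printf '%03i' $i)
        {
            test-tool genrandom "bar" 200 &&
            test-tool genrandom "baz $iii" 50
        } >wide_delta_$iii &&
        {
            test-tool genrandom "foo"$i 100 &&
            test-tool genrandom "foo"$(( $i + 1 )) 100 &&
            test-tool genrandom "foo"$(( $i + 2 )) 100
        } >deep_delta_$iii &&
        {
            echo $iii &&
            test-tool genrandom "$iii" 8192
        } >file_$iii &&
        git update-index --add file_$iii deep_delta_$iii wide_delta_$iii
    }
]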



Re: [PATCH v3 07/24] multi-pack-index: expand test data

2018-07-12 Thread Eric Sunshine
On Thu, Jul 12, 2018 at 10:10 AM Derrick Stolee  wrote:
> On 7/6/2018 12:36 AM, Eric Sunshine wrote:
> > There seems to be a fair bit of duplication in these tests which
> > create objects. Is it possible to factor out some of this code into a
> > shell function?
>
> In addition to the other small changes, this refactor in particular was
> a big change (but a good one). I'm sending my current progress in this
> direction, as I expect this can be improved.

I like the amount of code reduction. A couple minor comments...

> +generate_objects () {
> +   i=$1
> +   iii=$(printf '%03i' $i)
> +   {
> +   test-tool genrandom "bar" 200 &&
> +   test-tool genrandom "baz $iii" 50
> +   } >wide_delta_$iii &&
> +   {
> +   test-tool genrandom "foo"$i 100 &&
> +   test-tool genrandom "foo"$(( $i + 1 )) 100 &&
> +   test-tool genrandom "foo"$(( $i + 2 )) 100
> +   } >>deep_delta_$iii &&

I think this should be: s/>>/>/

> +   echo $iii >file_$iii &&
> +   test-tool genrandom "$iii" 8192 >>file_$iii &&

And this: s/>>/>/

> +   git update-index --add file_$iii deep_delta_$iii wide_delta_$iii
> +}


Re: [PATCH 21/25] t5000-t5999: fix broken &&-chains

2018-07-12 Thread Eric Sunshine
On Thu, Jul 12, 2018 at 8:37 AM SZEDER Gábor  wrote:
> The change below should be squashed into this patch to fix a
> previously unnoticed broken &&-chain.  I think you missed it, because
> this test script is rather expensive and you didn't run it with
> GIT_TEST_CLONE_2GB=YesPlease.
>
> diff --git a/t/t5608-clone-2gb.sh b/t/t5608-clone-2gb.sh
> @@ -23,7 +23,7 @@ test_expect_success CLONE_2GB 'setup' '
> -   echo "M 100644 :$i $i" >> commit
> +   echo "M 100644 :$i $i" >> commit &&

Thanks for finding this. I tried to get as much coverage as possible
by installing packages I don't normally have installed (Apache, cvs,
cvsps, Subversion, Perforce, etc.) and even temporarily modified a
script or two to force it to run when I simply couldn't meet some
prerequisite, thus reducing the "skipped" messages to a minimum, but I
wasn't even aware of this prerequisite since I never saw a "skipped"
message for it.

Looking at it more closely, I think the reason it didn't come to my
attention is that this script doesn't use the standard skip_all="..."
mechanism for skipping the tests but instead "rolls its own", and
apparently 'prove' simply swallowed (or was unable to produce) an
overall "skipped" message for this script.


Re: [PATCH v3 0/6] Object store refactoring: commit graph

2018-07-12 Thread Derrick Stolee

On 7/11/2018 6:42 PM, Jonathan Tan wrote:
> This is on _both_ ds/commit-graph-fsck and sb/object-store-lookup,
> following Stolee's suggestion.
>
> (It also seems better to build it this way to me, since both these
> branches are going into "next" according to the latest What's Cooking.)
>
> Junio wrote in [1]:
>
>> I've added SQUASH??? patch at the tip of each of the above,
>> rebuilt 'pu' with them and pushed the result out.  It seems that
>> Travis is happier with the result.
>>
>> Please do not forget to squash them in when/if rerolling.  If there
>> is no need to change anything else other than squashing them, you
>> can tell me to which commit in your series the fix needs to be
>> squashed in (that would save me time to figure it out, obviously).
>
> I'm rerolling because I also need to update the last patch with the new
> lookup_commit() function signature that Stefan's sb/object-store-lookup
> introduces. I have squashed the SQUASH??? patch into the corresponding
> patch in this patch set.
>
> Changes from v2:
>  - now also based on sb/object-store-lookup in addition to
>    ds/commit-graph-fsck (I rebased ds/commit-graph-fsck onto
>    sb/object-store-lookup, then rebased this patch set onto the result)
>  - patches 1-5 are unchanged
>  - patch 6:
>    - used "PRItime" instead of "ul" when printing a timestamp (the
>      SQUASH??? patch)
>    - updated invocations of lookup_commit() to take a repository object
>
> [1] https://public-inbox.org/git/xmqqpnzt1myi@gitster-ct.c.googlers.com/


I re-read the patch series and think you addressed all feedback. I have 
no more to add.


Reviewed-by: Derrick Stolee 



Re: rev-parse --show-toplevel broken during exec'ed rebase?

2018-07-12 Thread Vitali Lovich
On Thu, Jul 12, 2018 at 8:23 AM Junio C Hamano  wrote:
>
> Vitali Lovich  writes:
>
> > Repro (starting with cwd within git project):
> >> (cd xdiff; git rev-parse --show-toplevel)
> > ... path to git repository
> >> git rebase -i 18404434bf406f6a6f892ed73320c5cf9cc187dd
> > # Stop at some commit for editing
> >> (cd xdiff; git rev-parse --show-toplevel)
> > ... path to git repository
> >> git rebase 18404434bf406f6a6f892ed73320c5cf9cc187dd -x "(cd xdiff; git 
> >> rev-parse --show-toplevel)"
> > ... path to git repository/xdiff !!!
> >
> > This seems like incorrect behaviour to me since it's a weird
> > inconsistency (even with other rebase contexts) & the documentation
> > says "Show the absolute path of the top-level directory." with no
> > caveats.
>
> Does it reproduce with older rebase (say from v2.10 days), too?
>
> I suspect that the above is because "git rebase" is exporting
> GIT_DIR without exporting GIT_WORK_TREE.  When the former is given
> without the latter, that tells Git that you are at the top-level of
> the working tree (if that is not the case, you also export the
> latter to tell Git where the top-level is).
>
> I suspect that "git rebase" before the ongoing piecemeal rewrite to
> C started (or scripted Porcelain in general) refrained from
> exporting GIT_DIR to the environment, and if my suspicion is correct,
> older "git rebase" would behave differently to your test case.

Unfortunately I don't have an older git version handy to test this out.
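
[For what it's worth, the behaviour Junio suspects is easy to observe
directly, paths hypothetical: with GIT_DIR set and GIT_WORK_TREE unset,
Git assumes the current directory is the top of the working tree.

    cd /path/to/repo/xdiff
    git rev-parse --show-toplevel
    # /path/to/repo
    GIT_DIR=/path/to/repo/.git git rev-parse --show-toplevel
    # /path/to/repo/xdiff
]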


Re: [PATCH v3 16/24] config: create core.multiPackIndex setting

2018-07-12 Thread Junio C Hamano
SZEDER Gábor  writes:

>> Thanks for this point. It seems to work using
>> "link:technical/multi-pack-index[1]", which is what I'll use in the next
>> version.
>
> It doesn't work, it merely works around the build failure.

Sorry. I fell into the same trap X-<.

    link:technical/multi-pack-index.html[the technical documentation
for it]

or something like that, perhaps.


Re: [PATCH] sequencer.c: terminate the last line of author-script properly

2018-07-12 Thread Junio C Hamano
"Akinori MUSHA"  writes:

> It looks like write_author_script() intends to write out a file in
> Bourne shell syntax, but it doesn't put a closing single quote on the
> last line.

s/closing single quote/& and the terminating newline/?

>
> This patch makes .git/rebase-merge/author-script actually parsable by
> sh(1) by adding a single quote and a linefeed to terminate the line
> properly.

Sounds good.
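
For concreteness, with this fix the file becomes a plain sh fragment
that can be eval'ed or sourced as-is; it looks something like this,
with made-up values:

    GIT_AUTHOR_NAME='A U Thor'
    GIT_AUTHOR_EMAIL='author@example.com'
    GIT_AUTHOR_DATE='@1234567890 +0000'

and the problem being fixed is that the last of these lines was written
without the closing quote and the newline.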

I wonder why this breakage was left unnoticed for a long time,
though.  It's not like writing and reading the author-script from C
code was done first in the "rebase -i" and friends that are users of
the sequencer machinery (I think we had code to do so in "git am"
that was rewritten in C first).  Do we have a similar issue over
there as well?  If not, perhaps if we reused the existing code that
was not broken, we wouldn't have seen this breakage on the sequencer
side?

Thanks.

>
> Signed-off-by: Akinori MUSHA 
> ---
>  sequencer.c   |  1 +
>  t/t3404-rebase-interactive.sh | 13 +
>  2 files changed, 14 insertions(+)
>
> diff --git a/sequencer.c b/sequencer.c
> index 4034c0461..5f32b6df1 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -651,6 +651,7 @@ static int write_author_script(const char *message)
>   strbuf_addch(&buf, *(message++));
>   else
>   strbuf_addf(&buf, "'%c'", *(message++));
> + strbuf_addstr(&buf, "'\n");
>   res = write_message(buf.buf, buf.len, rebase_path_author_script(), 1);
>   strbuf_release(&buf);
>   return res;
> diff --git a/t/t3404-rebase-interactive.sh b/t/t3404-rebase-interactive.sh
> index 352a52e59..345b103eb 100755
> --- a/t/t3404-rebase-interactive.sh
> +++ b/t/t3404-rebase-interactive.sh
> @@ -75,6 +75,19 @@ test_expect_success 'rebase --keep-empty' '
>   test_line_count = 6 actual
>  '
>  
> +test_expect_success 'rebase -i writes out .git/rebase-merge/author-script in "edit" that sh(1) can parse' '
> + test_when_finished "git rebase --abort ||:" &&
> + git checkout master &&
> + set_fake_editor &&
> + FAKE_LINES="edit 1" git rebase -i HEAD^ &&
> + test -f .git/rebase-merge/author-script &&
> + unset GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_AUTHOR_DATE &&
> + eval "$(cat .git/rebase-merge/author-script)" &&
> + test "$(git show --quiet --pretty=format:%an)" = "$GIT_AUTHOR_NAME" &&
> + test "$(git show --quiet --pretty=format:%ae)" = "$GIT_AUTHOR_EMAIL" &&
> + test "$(git show --quiet --date=raw --pretty=format:@%ad)" = "$GIT_AUTHOR_DATE"
> +'
> +
>  test_expect_success 'rebase -i with the exec command' '
>   git checkout master &&
>   (
> -- 
> 2.18.0


Re: [PATCH 0/2] Fix --rebase-merges with custom commentChar

2018-07-12 Thread Junio C Hamano
Aaron Schrab  writes:

> Subject: [PATCH v2] sequencer: use configured comment character
>
> Use the configured comment character when generating comments about
> branches in a todo list.  Failure to honor this configuration causes a
> failure to parse the resulting todo list.

OK.

>
> Note that the comment_line_char has already been resolved by this point,
> even if the user has configured the comment character to be selected
> automatically.

Isn't this a slight lie?

The core.commentchar=auto setting is noticed by everybody (including
the users of the sequencer machinery), but it is honored only by
builtin/commit.c::prepare_to_commit() that is called by
builtin/commit.c::cmd_commit(), i.e. the implementation of "git
commit" that should not be used as a subroutine by other commands,
and by nothing else.  If the user has core.commentchar=auto, the
comment_line_char is left to the default '#' in the sequencer
codepath.

I think the patch is still correct and safe, but the reason why it
is so is not because we chose a suitable character (that is how I
read what "has already been resolved by this point" means) by
calling builtin/commit.c::adjust_comment_line_char().  Isn't it
because the "script" the function is working on does not have a line
that came from arbitrary end-user input that may happen to begin
with '#', hence the default '#' is safe to use?
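
For anyone who wants to see the breakage this addresses, a rough
reproduction; it needs a history with merges in the rebased range so
that "Branch" comments are emitted at all:

    git config core.commentChar ";"
    git rebase -i --rebase-merges <some-older-commit>

Without the patch the generated todo list contains "# Branch <name>"
lines, which the parser, now expecting ';' as the comment character,
refuses to accept.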

> Signed-off-by: Aaron Schrab 
> ---
>  sequencer.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/sequencer.c b/sequencer.c
> index 4034c0461b..caf91af29d 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -3991,7 +3991,7 @@ static int make_script_with_merges(struct pretty_print_context *pp,
>   entry = oidmap_get(, >object.oid);
>  
>   if (entry)
> - fprintf(out, "\n# Branch %s\n", entry->string);
> + fprintf(out, "\n%c Branch %s\n", comment_line_char, entry->string);
>   else
>   fprintf(out, "\n");


Re: [PATCH] ref-filter: mark some file-local symbols as static

2018-07-12 Thread Оля Тележная
2018-07-12 18:57 GMT+03:00 Ramsay Jones :
>
> Signed-off-by: Ramsay Jones 
> ---
>
> Hi Olga,
>
> If you need to re-roll your 'ot/ref-filter-object-info' branch,
> could you please squash this into the relevant patch (commit c5d9a471d6,
> "ref-filter: use oid_object_info() to get object", 2018-07-09).
>
> [Both sparse and my static-check.pl script are complaining about
> the 'oi' and 'oi_deref' symbols.]

You are right. Thanks a lot!

>
> Thanks!
>
> ATB,
> Ramsay Jones
>
>  ref-filter.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/ref-filter.c b/ref-filter.c
> index 9aedf29e0..70fb15619 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -63,7 +63,7 @@ struct refname_atom {
> int lstrip, rstrip;
>  };
>
> -struct expand_data {
> +static struct expand_data {
> struct object_id oid;
> enum object_type type;
> unsigned long size;
> --
> 2.18.0
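
[The sparse run Ramsay mentions can be reproduced in git.git with the
Makefile target, assuming the sparse tool itself is installed:

    make sparse

which is what emits the "symbol ... was not declared. Should it be
static?" style of warning for cases like this.]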


Re: refs/notes/amlog problems, was Re: [PATCH v3 01/20] linear-assignment: a function to solve least-cost assignment problems

2018-07-12 Thread Junio C Hamano
Johannes Schindelin  writes:

> I would like to ask you to reinstate the post-rewrite hook, as it still
> improves the situation over the current one.

Without post-rewrite I seem to be getting correct amlog entries for
commits created by "git rebase"; does our rebase--am backend still
trigger the post-applypatch hook in its "am" phase when it applies the
patches created with "format-patch"?



Re: [PATCH v2 01/10] t/test-lib: teach --chain-lint to detect broken &&-chains in subshells

2018-07-12 Thread Jeff King
On Thu, Jul 12, 2018 at 06:50:20AM -0400, Eric Sunshine wrote:

> And, perhaps most important: We're not tied indefinitely to the
> "subset" implemented by the current linter. If it is indeed found to
> be too strict or limiting, it can always be loosened or retired
> altogether.

Yeah, I agree this is the key point.

Like Junio, I'm a little nervous that this is going to end up being a
maintenance burden. People may hit false positives and then be
confronted with this horrible mass of sed to try to figure out what went
wrong (which isn't to bust on your sed in particular; I think you made a
heroic effort in commenting).

But I came around to thinking:

  - this found and fixed real problems in the test suite, with minimal
false positives across the existing code

  - it's being done by a long-time contributor, not somebody who is
going to dump sed on us and leave

  - worst case is that relief is only a "git revert" away

So I'm OK with merging it, and even running it by default.

-Peff

