Re: [PATCH v6 5/6] grep: enable recurse-submodules to work on objects

2016-11-30 Thread Johannes Sixt

Am 01.12.2016 um 02:28 schrieb Brandon Williams:

+   git init "su:b" &&


Don't do that. Colons in file names won't work on Windows.

-- Hannes



Re: What's cooking in git.git (Nov 2016, #06; Mon, 28)

2016-11-30 Thread Jeff King
On Thu, Dec 01, 2016 at 08:09:16AM +0100, Johannes Sixt wrote:

> Am 01.12.2016 um 00:40 schrieb Jeff King:
> > 20813 access("su:b/../.git/modules/su:b/refs", X_OK) = 0
> 
> Side note: where does this weirdness "su:b" come from? (I haven't looked at
> any of the patches, yet, let alone tested.) Colons in file or directory
> names won't work on Windows (in the way one would expect).

It's explicitly used in the test, I assume to check that the recursive
grep is not confused into treating the name as a tree-ish.

I think it would have to either be marked with a prereq on Windows, or
modified to do the whole thing in-index (and use "grep --cached").

-Peff


Re: What's cooking in git.git (Nov 2016, #06; Mon, 28)

2016-11-30 Thread Johannes Sixt

Am 01.12.2016 um 00:40 schrieb Jeff King:

20813 access("su:b/../.git/modules/su:b/refs", X_OK) = 0


Side note: where does this weirdness "su:b" come from? (I haven't looked 
at any of the patches, yet, let alone tested.) Colons in file or 
directory names won't work on Windows (in the way one would expect).


-- Hannes



Re: 'git repack' and repack.writeBitmaps=true with kept packs

2016-11-30 Thread Jeff King
On Wed, Nov 30, 2016 at 04:15:33PM -0800, Steven Noonan wrote:

> It seems like it's behaving as though I've provided
> --pack-kept-objects. In order to ensure the .bitmap is created, it
> repacks everything, including everything in existing .pack files (not
> respecting .keep). But then it's not deleting the old .pack file
> (oddly, respecting .keep).

Right, that's exactly what's happening.

The bitmaps require a completely reachable set inside the pack, so if
you omit some objects that are in .keep packs, we cannot generate the
bitmap. So we have to either disable bitmaps, or pack the kept objects.
By default, we do the latter (and I'll explain why in a minute).

We can't delete the .keep packfiles because we don't know for sure that
we've included all of their contents in the new pack (not to mention
that somebody asked to keep them, and we don't know why; we should
respect that).

> What I'd expect it to do here is ignore the 'repack.writeBitmaps =
> true' value if there's a .keep that needs to be respected. Is this not
> a correct assumption?

In practice, I think that ends up worse. The .keep files are used by
receive-pack as lockfiles for incoming pushes. So imagine you kick off a
full repack just as somebody is pushing, and repack sees the temporary
.keep file. Your options are:

  1. Disable bitmaps, leaving the repository with no bitmaps at all
 until the next repack (because you're deleting the old bitmaps
 along with the old, non-kept pack).

  2. Duplicate the newly pushed objects in the pack (if they're even
 reachable; you're also racing to see the ref updates). Now you have
 bitmaps, but you're wasting a bit of space to store the racy push
 twice (and it goes away next time you repack).

If you're running a Git server which depends on bitmaps for good
performance, then (2) is much better. And that's the default.

If you want to override it, you can pass --no-pack-kept-objects, or set
repack.packKeptObjects to false.

I think the documentation for --pack-kept-objects could be a bit more
clear for this case. It doesn't mention the default value, nor that what
you really want with "-b" is probably "--no-pack-kept-objects".

-Peff


Re: [PATCH] xdiff: Do not enable XDL_FAST_HASH by default

2016-11-30 Thread Jeff King
On Wed, Nov 30, 2016 at 11:26:43PM -0500, Anders Kaseorg wrote:

> > So I suspect a better strategy in general is to just override the
> > uname_* variables when cross-compiling.
> 
> The specific case of an i386 userspace on an x86_64 kernel is important 
> independently of the general cross compilation problem (in fact, the words 
> “cross compilation” may not even really apply here).  And I don’t think 
> one should have to manually tweak the build for this setup, especially 
> since the compiler already has the needed information.

Ah, I mistook that you were really cross-compiling x86-64 from i386, in
which case you'd generally have to set CC, etc for the cross-compile
chain. I agree this is a much more subtle case, and it's nice for it to
work out of the box.

> diff --git a/Makefile b/Makefile
> index f53fcc90d..c237d4f91 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -341,7 +341,6 @@ all::
>  # Define XDL_FAST_HASH to use an alternative line-hashing method in
>  # the diff algorithm.  It gives a nice speedup if your processor has
>  # fast unaligned word loads.  Does NOT work on big-endian systems!
> -# Enabled by default on x86_64.

This is a nice incremental step in the sense that people can still
enable it if they want to in order to time it, play with it, etc. But
given what we know, I wonder if the help text here should warn people.

Or I guess we could move straight to dropping it entirely.

Here's what that patch might look like (I retimed it just be sure, and
was sad to see that it really _is_ making some cases faster. But I still
think slower-but-predictable is a better default).

I didn't drop uname_M here. If we go this route, I think it would make
sense to do that in a separate patch on top, with your commit message
explaining why it is a bad idea versus using compiler-defined macros.

-- >8 --
Subject: [PATCH] xdiff: drop XDL_FAST_HASH

The xdiff code hashes every line of both sides of a diff,
and then compares those hashes to find duplicates. The
overall performance depends both on how fast we can compute
the hashes, but also on how many hash collisions we see.

The idea of XDL_FAST_HASH is to speed up the hash
computation. But the generated hashes have worse collision
behavior. This means that in some cases it speeds diffs up
(running "git log -p" on git.git improves by ~8% with it),
but in others it can slow things down. One pathological case
saw over a 100x slowdown[1].

There may be a better hash function that covers both
properties, but in the meantime we are better off with the
original hash. It's slightly slower in the common case, but
it has fewer surprising pathological cases.

[1] http://public-inbox.org/git/20141222041944.ga...@peff.net/

Signed-off-by: Jeff King 
---
 Makefile |   9 -
 config.mak.uname |   3 --
 xdiff/xutils.c   | 106 ---
 3 files changed, 118 deletions(-)

diff --git a/Makefile b/Makefile
index f53fcc90d..f61076997 100644
--- a/Makefile
+++ b/Makefile
@@ -338,11 +338,6 @@ all::
 #
 # Define NATIVE_CRLF if your platform uses CRLF for line endings.
 #
-# Define XDL_FAST_HASH to use an alternative line-hashing method in
-# the diff algorithm.  It gives a nice speedup if your processor has
-# fast unaligned word loads.  Does NOT work on big-endian systems!
-# Enabled by default on x86_64.
-#
 # Define GIT_USER_AGENT if you want to change how git identifies itself during
 # network interactions.  The default is "git/$(GIT_VERSION)".
 #
@@ -1485,10 +1480,6 @@ ifndef NO_MSGFMT_EXTENDED_OPTIONS
MSGFMT += --check --statistics
 endif
 
-ifneq (,$(XDL_FAST_HASH))
-   BASIC_CFLAGS += -DXDL_FAST_HASH
-endif
-
 ifdef GMTIME_UNRELIABLE_ERRORS
COMPAT_OBJS += compat/gmtime.o
BASIC_CFLAGS += -DGMTIME_UNRELIABLE_ERRORS
diff --git a/config.mak.uname b/config.mak.uname
index b232908f8..447f36ac2 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -17,9 +17,6 @@ endif
 # because maintaining the nesting to match is a pain.  If
 # we had "elif" things would have been much nicer...
 
-ifeq ($(uname_M),x86_64)
-   XDL_FAST_HASH = YesPlease
-endif
 ifeq ($(uname_S),OSF1)
# Need this for u_short definitions et al
BASIC_CFLAGS += -D_OSF_SOURCE
diff --git a/xdiff/xutils.c b/xdiff/xutils.c
index 027192a1c..04d7b32e4 100644
--- a/xdiff/xutils.c
+++ b/xdiff/xutils.c
@@ -264,110 +264,6 @@ static unsigned long xdl_hash_record_with_whitespace(char 
const **data,
return ha;
 }
 
-#ifdef XDL_FAST_HASH
-
-#define REPEAT_BYTE(x)  ((~0ul / 0xff) * (x))
-
-#define ONEBYTES   REPEAT_BYTE(0x01)
-#define NEWLINEBYTES   REPEAT_BYTE(0x0a)
-#define HIGHBITS   REPEAT_BYTE(0x80)
-
-/* Return the high bit set in the first byte that is a zero */
-static inline unsigned long has_zero(unsigned long a)
-{
-   return ((a - ONEBYTES) & ~a) & HIGHBITS;
-}
-
-static inline long count_masked_bytes(unsigned long mask)
-{
-   if (sizeof(long) == 8) {
- 

[PATCH] xdiff: Do not enable XDL_FAST_HASH by default

2016-11-30 Thread Anders Kaseorg
Although XDL_FAST_HASH computes hashes slightly faster on some
architectures, its collision characteristics are much worse, resulting
in some pathological diffs running over 100x slower
(http://public-inbox.org/git/20141222041944.ga...@peff.net/).

Furthermore, it was being enabled when ‘uname -m’ returns x86_64, even
if we are cross-compiling for a different architecture.  This mistake
was also causing the Debian build reproducibility test to fail
(https://tests.reproducible-builds.org/debian/index_variations.html).
Future architecture-specific definitions should be based on compiler
macros such as __x86_64__ rather than uname.

Signed-off-by: Anders Kaseorg 
---

[Oops, also resending for Thomas’s new email address.  Sorry for the 
spam.]

On Wed, 30 Nov 2016, Jeff King wrote:
> However, I think this might be the tip of the iceberg. There are lots of
> Makefile knobs whose defaults are tweaked based on uname output. This
> one caught you because you are cross-compiling across architectures, but
> in theory you could cross-compile for FreeBSD from Linux, or whatever.
> 
> So I suspect a better strategy in general is to just override the
> uname_* variables when cross-compiling.

The specific case of an i386 userspace on an x86_64 kernel is important 
independently of the general cross compilation problem (in fact, the words 
“cross compilation” may not even really apply here).  And I don’t think 
one should have to manually tweak the build for this setup, especially 
since the compiler already has the needed information.

> All that being said, I actually think an easier fix for this particular
> case might be to drop XDL_FAST_HASH entirely.

Works for me.

Anders

 Makefile | 1 -
 config.mak.uname | 5 -
 2 files changed, 6 deletions(-)

diff --git a/Makefile b/Makefile
index f53fcc90d..c237d4f91 100644
--- a/Makefile
+++ b/Makefile
@@ -341,7 +341,6 @@ all::
 # Define XDL_FAST_HASH to use an alternative line-hashing method in
 # the diff algorithm.  It gives a nice speedup if your processor has
 # fast unaligned word loads.  Does NOT work on big-endian systems!
-# Enabled by default on x86_64.
 #
 # Define GIT_USER_AGENT if you want to change how git identifies itself during
 # network interactions.  The default is "git/$(GIT_VERSION)".
diff --git a/config.mak.uname b/config.mak.uname
index b232908f8..2831a68c3 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -1,10 +1,8 @@
 # Platform specific Makefile tweaks based on uname detection
 
 uname_S := $(shell sh -c 'uname -s 2>/dev/null || echo not')
-uname_M := $(shell sh -c 'uname -m 2>/dev/null || echo not')
 uname_O := $(shell sh -c 'uname -o 2>/dev/null || echo not')
 uname_R := $(shell sh -c 'uname -r 2>/dev/null || echo not')
-uname_P := $(shell sh -c 'uname -p 2>/dev/null || echo not')
 uname_V := $(shell sh -c 'uname -v 2>/dev/null || echo not')
 
 ifdef MSVC
@@ -17,9 +15,6 @@ endif
 # because maintaining the nesting to match is a pain.  If
 # we had "elif" things would have been much nicer...
 
-ifeq ($(uname_M),x86_64)
-   XDL_FAST_HASH = YesPlease
-endif
 ifeq ($(uname_S),OSF1)
# Need this for u_short definitions et al
BASIC_CFLAGS += -D_OSF_SOURCE
-- 
2.11.0



Re: [PATCH v6 1/6] submodules: add helper functions to determine presence of submodules

2016-11-30 Thread Jeff King
On Wed, Nov 30, 2016 at 05:28:29PM -0800, Brandon Williams wrote:

> +/*
> + * Determine if a submodule has been populated at a given 'path'
> + */
> +int is_submodule_populated(const char *path)
> +{
> + int ret = 0;
> + struct stat st;
> + char *gitdir = xstrfmt("%s/.git", path);
> +
> + if (!stat(gitdir, ))
> + ret = 1;
> +
> + free(gitdir);
> + return ret;
> +}

I don't know if it's worth changing or not, but this could be a bit
shorter:

  int is_submodule_populated(const char *path)
  {
return !access(mkpath("%s/.git", path), F_OK);
  }

There is a file_exists() helper, but it uses lstat(), which I think you
don't want (because you'd prefer to bail on a broken .git symlink). But
access(F_OK) does what you want, I think.

mkpath() is generally an unsafe function because it uses a static
buffer, but it's handy and safe for handing values to syscalls like
this.

I say "I don't know if it's worth it" because what you've written is
fine, and while more lines, it's fairly obvious and safe.

-Peff


[PATCH] xdiff: Do not enable XDL_FAST_HASH by default

2016-11-30 Thread Anders Kaseorg
Although XDL_FAST_HASH computes hashes slightly faster on some
architectures, its collision characteristics are much worse, resulting
in some pathological diffs running over 100x slower
(http://public-inbox.org/git/20141222041944.ga...@peff.net/).

Furthermore, it was being enabled when ‘uname -m’ returns x86_64, even
if we are cross-compiling for a different architecture.  This mistake
was also causing the Debian build reproducibility test to fail
(https://tests.reproducible-builds.org/debian/index_variations.html).
Future architecture-specific definitions should be based on compiler
macros such as __x86_64__ rather than uname.

Signed-off-by: Anders Kaseorg 
---

On Wed, 30 Nov 2016, Jeff King wrote:
> However, I think this might be the tip of the iceberg. There are lots of
> Makefile knobs whose defaults are tweaked based on uname output. This
> one caught you because you are cross-compiling across architectures, but
> in theory you could cross-compile for FreeBSD from Linux, or whatever.
> 
> So I suspect a better strategy in general is to just override the
> uname_* variables when cross-compiling.

The specific case of an i386 userspace on an x86_64 kernel is important 
independently of the general cross compilation problem (in fact, the words 
“cross compilation” may not even really apply here).  And I don’t think 
one should have to manually tweak the build for this setup, especially 
since the compiler already has the needed information.

> All that being said, I actually think an easier fix for this particular
> case might be to drop XDL_FAST_HASH entirely.

Works for me.

Anders

 Makefile | 1 -
 config.mak.uname | 5 -
 2 files changed, 6 deletions(-)

diff --git a/Makefile b/Makefile
index f53fcc90d..c237d4f91 100644
--- a/Makefile
+++ b/Makefile
@@ -341,7 +341,6 @@ all::
 # Define XDL_FAST_HASH to use an alternative line-hashing method in
 # the diff algorithm.  It gives a nice speedup if your processor has
 # fast unaligned word loads.  Does NOT work on big-endian systems!
-# Enabled by default on x86_64.
 #
 # Define GIT_USER_AGENT if you want to change how git identifies itself during
 # network interactions.  The default is "git/$(GIT_VERSION)".
diff --git a/config.mak.uname b/config.mak.uname
index b232908f8..2831a68c3 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -1,10 +1,8 @@
 # Platform specific Makefile tweaks based on uname detection
 
 uname_S := $(shell sh -c 'uname -s 2>/dev/null || echo not')
-uname_M := $(shell sh -c 'uname -m 2>/dev/null || echo not')
 uname_O := $(shell sh -c 'uname -o 2>/dev/null || echo not')
 uname_R := $(shell sh -c 'uname -r 2>/dev/null || echo not')
-uname_P := $(shell sh -c 'uname -p 2>/dev/null || echo not')
 uname_V := $(shell sh -c 'uname -v 2>/dev/null || echo not')
 
 ifdef MSVC
@@ -17,9 +15,6 @@ endif
 # because maintaining the nesting to match is a pain.  If
 # we had "elif" things would have been much nicer...
 
-ifeq ($(uname_M),x86_64)
-   XDL_FAST_HASH = YesPlease
-endif
 ifeq ($(uname_S),OSF1)
# Need this for u_short definitions et al
BASIC_CFLAGS += -D_OSF_SOURCE
-- 
2.11.0



Re: [PATCH v6 0/6] recursively grep across submodules

2016-11-30 Thread Jeff King
On Wed, Nov 30, 2016 at 05:28:28PM -0800, Brandon Williams wrote:

> v6 fixes a race condition which existed in the 'is_submodule_populated'
> function.  Instead of calling 'resolve_gitdir' to check for the existance of a
> .git file/directory, use 'stat'.  'resolve_gitdir' calls 'chdir' which can
> affect other running threads trying to load thier files into a buffer in
> memory.

This one passes my stress-test for t7814 (though I imagine you already
knew that).

I tried to think of things that could go wrong by using a simple stat()
instead of resolve_gitdir(). They should only differ when ".git" for
some reason does not point to a git repository. My initial thought is
that this might be more vocal about errors, because the child process
will complain. But actually, the original would already die if the
".git" file is funny, so we were pretty vocal already.

I also wondered whether the sub-process might skip a bogus ".git" file
and keep looking upward in the filesystem tree (which would confusingly
end up back in the super-project!). But it looks like we bail hard when
we see a ".git" file but it's bogus. Which is probably a good thing in
general for submodules.

I'm not sure any of that is actually even worth worrying about, as such
a setup is broken by definition. I just wanted to think it through as a
devil's advocate, and even that seems pretty reasonable.

-Peff


Re: [PATCH] difftool.c: mark a file-local symbol with static

2016-11-30 Thread Jeff King
On Thu, Dec 01, 2016 at 01:18:35AM +, Ramsay Jones wrote:

> >> I forgot, we ended up reversing course later and silencing them:
> >>
> >>   http://public-inbox.org/git/20140505052117.gc6...@sigill.intra.peff.net/
> >>
> >> By the rationale of that conversation, we should be doing:
> >>
> >>   warning("%s", "");
> >>
> >> here.
> > 
> > I forgot too.  Thanks for digging up that thread.
> 
> Yes, I blamed wt-status.c:227 and came up with commit 7d7d68022
> as well.
> 
> So, by the same rationale, we should remove -Wno-format-zero-length
> from DEVELOPER_CFLAGS. yes?

I don't have a preference on which direction we go, but yes, right now
we are in an awkward middle ground. We should do one of:

  1. Drop -Wno-format-zero-length from DEVELOPER_CFLAGS and make sure
 future patches to do not violate it.

  2. Declare warning("") as OK.

I still think the warning is silly, but (1) has value in that it
produces the least surprise and annoyance to various people building
Git.

-Peff


Re: [PATCH] Define XDL_FAST_HASH when building *for* (not *on*) x86_64

2016-11-30 Thread Jeff King
[resend; the original had an outdated address for Thomas, and I would
 definitely like his blessing before removing XDL_FAST_HASH].

On Wed, Nov 30, 2016 at 10:04:07PM -0500, Anders Kaseorg wrote:

> Previously, XDL_FAST_HASH was defined when ‘uname -m’ returns x86_64,
> even if we are cross-compiling for a different architecture.  Check
> the __x86_64__ compiler macro instead.
> 
> In addition to fixing the cross compilation bug, this is needed to
> pass the Debian build reproducibility test
> (https://tests.reproducible-builds.org/debian/index_variations.html).

I don't think this is a good approach to fix it. Right now XDL_FAST_HASH
is a Makefile knob that can be turned by the user, and can be used
either to explicitly enable or explicitly disable the feature.

With your patch, building with "make XDL_FAST_HASH=Yes" will still
explicitly enable it, but "make XDL_FAST_HASH=" would no longer disable
it (because even if unset, the compiler would turn it on when it sees
__x86_64__).

And being able to turn it off is important; more on that in a second.

So I think if we wanted to auto-detect based on __x86_64__, we'd
probably need to be able to set it to "auto" or something, and then

  #if defined(XDL_FAST_HASH_AUTO) && __x86_64__
  #define XDL_FAST_HASH
  #endif

or something.

However, I think this might be the tip of the iceberg. There are lots of
Makefile knobs whose defaults are tweaked based on uname output. This
one caught you because you are cross-compiling across architectures, but
in theory you could cross-compile for FreeBSD from Linux, or whatever.

So I suspect a better strategy in general is to just override the
uname_* variables when cross-compiling.


All that being said, I actually think an easier fix for this particular
case might be to drop XDL_FAST_HASH entirely. It computes the hashes
slightly faster, but its collision characteristics are much worse. About
2 years ago I ran across a pathological diff that ran over 100x slower
with XDL_FAST_HASH:

  http://public-inbox.org/git/20141222041944.ga...@peff.net/

The discussion veered into whether we should have a randomized hash
secured against DoS attacks. I played around with some alternatives, but
never found anything quite as fast for the "normal" case. And having
disabled XDL_FAST_HASH on GitHub's servers, it wasn't a big priority for
me.

I'd be happy if somebody wanted to investigate other hash functions
further. But barring that, I think we should drop XDL_FAST_HASH (or at
the very least stop turning it on by default) in the meantime. It's just
not a good tradeoff.

-Peff


Re: [PATCH] Define XDL_FAST_HASH when building *for* (not *on*) x86_64

2016-11-30 Thread Jeff King
On Wed, Nov 30, 2016 at 10:04:07PM -0500, Anders Kaseorg wrote:

> Previously, XDL_FAST_HASH was defined when ‘uname -m’ returns x86_64,
> even if we are cross-compiling for a different architecture.  Check
> the __x86_64__ compiler macro instead.
> 
> In addition to fixing the cross compilation bug, this is needed to
> pass the Debian build reproducibility test
> (https://tests.reproducible-builds.org/debian/index_variations.html).

I don't think this is a good approach to fix it. Right now XDL_FAST_HASH
is a Makefile knob that can be turned by the user, and can be used
either to explicitly enable or explicitly disable the feature.

With your patch, building with "make XDL_FAST_HASH=Yes" will still
explicitly enable it, but "make XDL_FAST_HASH=" would no longer disable
it (because even if unset, the compiler would turn it on when it sees
__x86_64__).

And being able to turn it off is important; more on that in a second.

So I think if we wanted to auto-detect based on __x86_64__, we'd
probably need to be able to set it to "auto" or something, and then

  #if defined(XDL_FAST_HASH_AUTO) && __x86_64__
  #define XDL_FAST_HASH
  #endif

or something.

However, I think this might be the tip of the iceberg. There are lots of
Makefile knobs whose defaults are tweaked based on uname output. This
one caught you because you are cross-compiling across architectures, but
in theory you could cross-compile for FreeBSD from Linux, or whatever.

So I suspect a better strategy in general is to just override the
uname_* variables when cross-compiling.


All that being said, I actually think an easier fix for this particular
case might be to drop XDL_FAST_HASH entirely. It computes the hashes
slightly faster, but its collision characteristics are much worse. About
2 years ago I ran across a pathological diff that ran over 100x slower
with XDL_FAST_HASH:

  http://public-inbox.org/git/20141222041944.ga...@peff.net/

The discussion veered into whether we should have a randomized hash
secured against DoS attacks. I played around with some alternatives, but
never found anything quite as fast for the "normal" case. And having
disabled XDL_FAST_HASH on GitHub's servers, it wasn't a big priority for
me.

I'd be happy if somebody wanted to investigate other hash functions
further. But barring that, I think we should drop XDL_FAST_HASH (or at
the very least stop turning it on by default) in the meantime. It's just
not a good tradeoff.

-Peff


[PATCH] Define XDL_FAST_HASH when building *for* (not *on*) x86_64

2016-11-30 Thread Anders Kaseorg
Previously, XDL_FAST_HASH was defined when ‘uname -m’ returns x86_64,
even if we are cross-compiling for a different architecture.  Check
the __x86_64__ compiler macro instead.

In addition to fixing the cross compilation bug, this is needed to
pass the Debian build reproducibility test
(https://tests.reproducible-builds.org/debian/index_variations.html).

Signed-off-by: Anders Kaseorg 
---
 config.mak.uname | 5 -
 xdiff/xutils.c   | 4 
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/config.mak.uname b/config.mak.uname
index b232908f8..2831a68c3 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -1,10 +1,8 @@
 # Platform specific Makefile tweaks based on uname detection
 
 uname_S := $(shell sh -c 'uname -s 2>/dev/null || echo not')
-uname_M := $(shell sh -c 'uname -m 2>/dev/null || echo not')
 uname_O := $(shell sh -c 'uname -o 2>/dev/null || echo not')
 uname_R := $(shell sh -c 'uname -r 2>/dev/null || echo not')
-uname_P := $(shell sh -c 'uname -p 2>/dev/null || echo not')
 uname_V := $(shell sh -c 'uname -v 2>/dev/null || echo not')
 
 ifdef MSVC
@@ -17,9 +15,6 @@ endif
 # because maintaining the nesting to match is a pain.  If
 # we had "elif" things would have been much nicer...
 
-ifeq ($(uname_M),x86_64)
-   XDL_FAST_HASH = YesPlease
-endif
 ifeq ($(uname_S),OSF1)
# Need this for u_short definitions et al
BASIC_CFLAGS += -D_OSF_SOURCE
diff --git a/xdiff/xutils.c b/xdiff/xutils.c
index 027192a1c..f46351fe4 100644
--- a/xdiff/xutils.c
+++ b/xdiff/xutils.c
@@ -264,6 +264,10 @@ static unsigned long xdl_hash_record_with_whitespace(char 
const **data,
return ha;
 }
 
+#ifdef __x86_64__
+#define XDL_FAST_HASH
+#endif
+
 #ifdef XDL_FAST_HASH
 
 #define REPEAT_BYTE(x)  ((~0ul / 0xff) * (x))
-- 
2.11.0



[PATCH v6 4/6] grep: optionally recurse into submodules

2016-11-30 Thread Brandon Williams
Allow grep to recognize submodules and recursively search for patterns in
each submodule.  This is done by forking off a process to recursively
call grep on each submodule.  The top level --super-prefix option is
used to pass a path to the submodule which can in turn be used to
prepend to output or in pathspec matching logic.

Recursion only occurs for submodules which have been initialized and
checked out by the parent project.  If a submodule hasn't been
initialized and checked out it is simply skipped.

In order to support the existing multi-threading infrastructure in grep,
output from each child process is captured in a strbuf so that it can be
later printed to the console in an ordered fashion.

To limit the number of theads that are created, each child process has
half the number of threads as its parents (minimum of 1), otherwise we
potentailly have a fork-bomb.

Signed-off-by: Brandon Williams 
---
 Documentation/git-grep.txt |   5 +
 builtin/grep.c | 300 ++---
 git.c  |   2 +-
 t/t7814-grep-recurse-submodules.sh |  99 
 4 files changed, 386 insertions(+), 20 deletions(-)
 create mode 100755 t/t7814-grep-recurse-submodules.sh

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 0ecea6e..17aa1ba 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -26,6 +26,7 @@ SYNOPSIS
   [--threads ]
   [-f ] [-e] 
   [--and|--or|--not|(|)|-e ...]
+  [--recurse-submodules]
   [ [--[no-]exclude-standard] [--cached | --no-index | --untracked] | 
...]
   [--] [...]
 
@@ -88,6 +89,10 @@ OPTIONS
mechanism.  Only useful when searching files in the current directory
with `--no-index`.
 
+--recurse-submodules::
+   Recursively search in each submodule that has been initialized and
+   checked out in the repository.
+
 -a::
 --text::
Process binary files as if they were text.
diff --git a/builtin/grep.c b/builtin/grep.c
index 8887b6a..dca0be6 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -18,12 +18,20 @@
 #include "quote.h"
 #include "dir.h"
 #include "pathspec.h"
+#include "submodule.h"
 
 static char const * const grep_usage[] = {
N_("git grep [] [-e]  [...] [[--] ...]"),
NULL
 };
 
+static const char *super_prefix;
+static int recurse_submodules;
+static struct argv_array submodule_options = ARGV_ARRAY_INIT;
+
+static int grep_submodule_launch(struct grep_opt *opt,
+const struct grep_source *gs);
+
 #define GREP_NUM_THREADS_DEFAULT 8
 static int num_threads;
 
@@ -174,7 +182,10 @@ static void *run(void *arg)
break;
 
opt->output_priv = w;
-   hit |= grep_source(opt, >source);
+   if (w->source.type == GREP_SOURCE_SUBMODULE)
+   hit |= grep_submodule_launch(opt, >source);
+   else
+   hit |= grep_source(opt, >source);
grep_source_clear_data(>source);
work_done(w);
}
@@ -300,6 +311,10 @@ static int grep_sha1(struct grep_opt *opt, const unsigned 
char *sha1,
if (opt->relative && opt->prefix_length) {
quote_path_relative(filename + tree_name_len, opt->prefix, 
);
strbuf_insert(, 0, filename, tree_name_len);
+   } else if (super_prefix) {
+   strbuf_add(, filename, tree_name_len);
+   strbuf_addstr(, super_prefix);
+   strbuf_addstr(, filename + tree_name_len);
} else {
strbuf_addstr(, filename);
}
@@ -328,10 +343,13 @@ static int grep_file(struct grep_opt *opt, const char 
*filename)
 {
struct strbuf buf = STRBUF_INIT;
 
-   if (opt->relative && opt->prefix_length)
+   if (opt->relative && opt->prefix_length) {
quote_path_relative(filename, opt->prefix, );
-   else
+   } else {
+   if (super_prefix)
+   strbuf_addstr(, super_prefix);
strbuf_addstr(, filename);
+   }
 
 #ifndef NO_PTHREADS
if (num_threads) {
@@ -378,31 +396,260 @@ static void run_pager(struct grep_opt *opt, const char 
*prefix)
exit(status);
 }
 
-static int grep_cache(struct grep_opt *opt, const struct pathspec *pathspec, 
int cached)
+static void compile_submodule_options(const struct grep_opt *opt,
+ const struct pathspec *pathspec,
+ int cached, int untracked,
+ int opt_exclude, int use_index,
+ int pattern_type_arg)
+{
+   struct grep_pat *pattern;
+   int i;
+
+   if (recurse_submodules)
+   argv_array_push(_options, "--recurse-submodules");
+
+   if (cached)
+   argv_array_push(_options, 

[PATCH v6 6/6] grep: search history of moved submodules

2016-11-30 Thread Brandon Williams
If a submodule was renamed at any point since it's inception then if you
were to try and grep on a commit prior to the submodule being moved, you
wouldn't be able to find a working directory for the submodule since the
path in the past is different from the current path.

This patch teaches grep to find the .git directory for a submodule in
the parents .git/modules/ directory in the event the path to the
submodule in the commit that is being searched differs from the state of
the currently checked out commit.  If found, the child process that is
spawned to grep the submodule will chdir into its gitdir instead of a
working directory.

In order to override the explicit setting of submodule child process's
gitdir environment variable (which was introduced in '10f5c526')
`GIT_DIR_ENVIORMENT` needs to be pushed onto child process's env_array.
This allows the searching of history from a submodule's gitdir, rather
than from a working directory.

Signed-off-by: Brandon Williams 
---
 builtin/grep.c | 20 +--
 t/t7814-grep-recurse-submodules.sh | 41 ++
 2 files changed, 59 insertions(+), 2 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 5918a26..2c727ef 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -547,6 +547,7 @@ static int grep_submodule_launch(struct grep_opt *opt,
name = gs->name;
 
prepare_submodule_repo_env(_array);
+   argv_array_push(_array, GIT_DIR_ENVIRONMENT);
 
/* Add super prefix */
argv_array_pushf(, "--super-prefix=%s%s/",
@@ -615,8 +616,23 @@ static int grep_submodule(struct grep_opt *opt, const 
unsigned char *sha1,
 {
if (!is_submodule_initialized(path))
return 0;
-   if (!is_submodule_populated(path))
-   return 0;
+   if (!is_submodule_populated(path)) {
+   /*
+* If searching history, check for the presense of the
+* submodule's gitdir before skipping the submodule.
+*/
+   if (sha1) {
+   const struct submodule *sub =
+   submodule_from_path(null_sha1, path);
+   if (sub)
+   path = git_path("modules/%s", sub->name);
+
+   if (!(is_directory(path) && is_git_directory(path)))
+   return 0;
+   } else {
+   return 0;
+   }
+   }
 
 #ifndef NO_PTHREADS
if (num_threads) {
diff --git a/t/t7814-grep-recurse-submodules.sh 
b/t/t7814-grep-recurse-submodules.sh
index 9e93fe7..0507771 100755
--- a/t/t7814-grep-recurse-submodules.sh
+++ b/t/t7814-grep-recurse-submodules.sh
@@ -186,6 +186,47 @@ test_expect_success 'grep recurse submodule colon in name' 
'
test_cmp expect actual
 '
 
+test_expect_success 'grep history with moved submoules' '
+   git init parent &&
+   test_when_finished "rm -rf parent" &&
+   echo "foobar" >parent/file &&
+   git -C parent add file &&
+   git -C parent commit -m "add file" &&
+
+   git init sub &&
+   test_when_finished "rm -rf sub" &&
+   echo "foobar" >sub/file &&
+   git -C sub add file &&
+   git -C sub commit -m "add file" &&
+
+   git -C parent submodule add ../sub dir/sub &&
+   git -C parent commit -m "add submodule" &&
+
+   cat >expect <<-\EOF &&
+   dir/sub/file:foobar
+   file:foobar
+   EOF
+   git -C parent grep -e "foobar" --recurse-submodules >actual &&
+   test_cmp expect actual &&
+
+   git -C parent mv dir/sub sub-moved &&
+   git -C parent commit -m "moved submodule" &&
+
+   cat >expect <<-\EOF &&
+   file:foobar
+   sub-moved/file:foobar
+   EOF
+   git -C parent grep -e "foobar" --recurse-submodules >actual &&
+   test_cmp expect actual &&
+
+   cat >expect <<-\EOF &&
+   HEAD^:dir/sub/file:foobar
+   HEAD^:file:foobar
+   EOF
+   git -C parent grep -e "foobar" --recurse-submodules HEAD^ >actual &&
+   test_cmp expect actual
+'
+
 test_incompatible_with_recurse_submodules ()
 {
test_expect_success "--recurse-submodules and $1 are incompatible" "
-- 
2.8.0.rc3.226.g39d4020



[PATCH v6 3/6] grep: add submodules as a grep source type

2016-11-30 Thread Brandon Williams
Add `GREP_SOURCE_SUBMODULE` as a grep_source type and cases for this new
type in the various switch statements in grep.c.

When initializing a grep_source with type `GREP_SOURCE_SUBMODULE` the
identifier can either be NULL (to indicate that the working tree will be
used) or a SHA1 (the REV of the submodule to be grep'd).  If the
identifier is a SHA1 then we want to fall through to the
`GREP_SOURCE_SHA1` case to handle the copying of the SHA1.

Signed-off-by: Brandon Williams 
---
 grep.c | 16 +++-
 grep.h |  1 +
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/grep.c b/grep.c
index 1194d35..0dbdc1d 100644
--- a/grep.c
+++ b/grep.c
@@ -1735,12 +1735,23 @@ void grep_source_init(struct grep_source *gs, enum 
grep_source_type type,
case GREP_SOURCE_FILE:
gs->identifier = xstrdup(identifier);
break;
+   case GREP_SOURCE_SUBMODULE:
+   if (!identifier) {
+   gs->identifier = NULL;
+   break;
+   }
+   /*
+* FALL THROUGH
+* If the identifier is non-NULL (in the submodule case) it
+* will be a SHA1 that needs to be copied.
+*/
case GREP_SOURCE_SHA1:
gs->identifier = xmalloc(20);
hashcpy(gs->identifier, identifier);
break;
case GREP_SOURCE_BUF:
gs->identifier = NULL;
+   break;
}
 }
 
@@ -1760,6 +1771,7 @@ void grep_source_clear_data(struct grep_source *gs)
switch (gs->type) {
case GREP_SOURCE_FILE:
case GREP_SOURCE_SHA1:
+   case GREP_SOURCE_SUBMODULE:
free(gs->buf);
gs->buf = NULL;
gs->size = 0;
@@ -1831,8 +1843,10 @@ static int grep_source_load(struct grep_source *gs)
return grep_source_load_sha1(gs);
case GREP_SOURCE_BUF:
return gs->buf ? 0 : -1;
+   case GREP_SOURCE_SUBMODULE:
+   break;
}
-   die("BUG: invalid grep_source type");
+   die("BUG: invalid grep_source type to load");
 }
 
 void grep_source_load_driver(struct grep_source *gs)
diff --git a/grep.h b/grep.h
index 5856a23..267534c 100644
--- a/grep.h
+++ b/grep.h
@@ -161,6 +161,7 @@ struct grep_source {
GREP_SOURCE_SHA1,
GREP_SOURCE_FILE,
GREP_SOURCE_BUF,
+   GREP_SOURCE_SUBMODULE,
} type;
void *identifier;
 
-- 
2.8.0.rc3.226.g39d4020



[PATCH v6 2/6] submodules: load gitmodules file from commit sha1

2016-11-30 Thread Brandon Williams
teach submodules to load a '.gitmodules' file from a commit sha1.  This
enables the population of the submodule_cache to be based on the state
of the '.gitmodules' file from a particular commit.

Signed-off-by: Brandon Williams 
---
 cache.h|  2 ++
 config.c   |  8 
 submodule-config.c |  6 +++---
 submodule-config.h |  3 +++
 submodule.c| 12 
 submodule.h|  1 +
 6 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/cache.h b/cache.h
index 1be6526..559a461 100644
--- a/cache.h
+++ b/cache.h
@@ -1690,6 +1690,8 @@ extern int git_default_config(const char *, const char *, 
void *);
 extern int git_config_from_file(config_fn_t fn, const char *, void *);
 extern int git_config_from_mem(config_fn_t fn, const enum config_origin_type,
const char *name, const char *buf, 
size_t len, void *data);
+extern int git_config_from_blob_sha1(config_fn_t fn, const char *name,
+const unsigned char *sha1, void *data);
 extern void git_config_push_parameter(const char *text);
 extern int git_config_from_parameters(config_fn_t fn, void *data);
 extern void git_config(config_fn_t fn, void *);
diff --git a/config.c b/config.c
index 83fdecb..4d78e72 100644
--- a/config.c
+++ b/config.c
@@ -1214,10 +1214,10 @@ int git_config_from_mem(config_fn_t fn, const enum 
config_origin_type origin_typ
return do_config_from(, fn, data);
 }
 
-static int git_config_from_blob_sha1(config_fn_t fn,
-const char *name,
-const unsigned char *sha1,
-void *data)
+int git_config_from_blob_sha1(config_fn_t fn,
+ const char *name,
+ const unsigned char *sha1,
+ void *data)
 {
enum object_type type;
char *buf;
diff --git a/submodule-config.c b/submodule-config.c
index 098085b..8b9a2ef 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -379,9 +379,9 @@ static int parse_config(const char *var, const char *value, 
void *data)
return ret;
 }
 
-static int gitmodule_sha1_from_commit(const unsigned char *commit_sha1,
- unsigned char *gitmodules_sha1,
- struct strbuf *rev)
+int gitmodule_sha1_from_commit(const unsigned char *commit_sha1,
+  unsigned char *gitmodules_sha1,
+  struct strbuf *rev)
 {
int ret = 0;
 
diff --git a/submodule-config.h b/submodule-config.h
index d05c542..78584ba 100644
--- a/submodule-config.h
+++ b/submodule-config.h
@@ -29,6 +29,9 @@ const struct submodule *submodule_from_name(const unsigned 
char *commit_sha1,
const char *name);
 const struct submodule *submodule_from_path(const unsigned char *commit_sha1,
const char *path);
+extern int gitmodule_sha1_from_commit(const unsigned char *commit_sha1,
+ unsigned char *gitmodules_sha1,
+ struct strbuf *rev);
 void submodule_free(void);
 
 #endif /* SUBMODULE_CONFIG_H */
diff --git a/submodule.c b/submodule.c
index f336ca9..8516ab0 100644
--- a/submodule.c
+++ b/submodule.c
@@ -198,6 +198,18 @@ void gitmodules_config(void)
}
 }
 
+void gitmodules_config_sha1(const unsigned char *commit_sha1)
+{
+   struct strbuf rev = STRBUF_INIT;
+   unsigned char sha1[20];
+
+   if (gitmodule_sha1_from_commit(commit_sha1, sha1, )) {
+   git_config_from_blob_sha1(submodule_config, rev.buf,
+ sha1, NULL);
+   }
+   strbuf_release();
+}
+
 /*
  * Determine if a submodule has been initialized at a given 'path'
  */
diff --git a/submodule.h b/submodule.h
index 6ec5f2f..9203d89 100644
--- a/submodule.h
+++ b/submodule.h
@@ -37,6 +37,7 @@ void set_diffopt_flags_from_submodule_config(struct 
diff_options *diffopt,
const char *path);
 int submodule_config(const char *var, const char *value, void *cb);
 void gitmodules_config(void);
+extern void gitmodules_config_sha1(const unsigned char *commit_sha1);
 extern int is_submodule_initialized(const char *path);
 extern int is_submodule_populated(const char *path);
 int parse_submodule_update_strategy(const char *value,
-- 
2.8.0.rc3.226.g39d4020



[PATCH v6 5/6] grep: enable recurse-submodules to work on objects

2016-11-30 Thread Brandon Williams
Teach grep to recursively search in submodules when provided with a
 object. This allows grep to search a submodule based on the state
of the submodule that is present in a commit of the super project.

When grep is provided with a  object, the name of the object is
prefixed to all output.  In order to provide uniformity of output
between the parent and child processes the option `--parent-basename`
has been added so that the child can preface all of it's output with the
name of the parent's object instead of the name of the commit SHA1 of
the submodule. This changes output from the command
`git grep -e. -l --recurse-submodules HEAD`

from:
  HEAD:file
  :sub/file

to:
  HEAD:file
  HEAD:sub/file

Signed-off-by: Brandon Williams 
---
 Documentation/git-grep.txt |  13 -
 builtin/grep.c |  76 ---
 t/t7814-grep-recurse-submodules.sh | 103 -
 tree-walk.c|  28 ++
 4 files changed, 211 insertions(+), 9 deletions(-)

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 17aa1ba..71f32f3 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -26,7 +26,7 @@ SYNOPSIS
   [--threads ]
   [-f ] [-e] 
   [--and|--or|--not|(|)|-e ...]
-  [--recurse-submodules]
+  [--recurse-submodules] [--parent-basename ]
   [ [--[no-]exclude-standard] [--cached | --no-index | --untracked] | 
...]
   [--] [...]
 
@@ -91,7 +91,16 @@ OPTIONS
 
 --recurse-submodules::
Recursively search in each submodule that has been initialized and
-   checked out in the repository.
+   checked out in the repository.  When used in combination with the
+option the prefix of all submodule output will be the name of
+   the parent project's  object.
+
+--parent-basename ::
+   For internal use only.  In order to produce uniform output with the
+   --recurse-submodules option, this option can be used to provide the
+   basename of a parent's  object to a submodule so the submodule
+   can prefix its output with the parent's name rather than the SHA1 of
+   the submodule.
 
 -a::
 --text::
diff --git a/builtin/grep.c b/builtin/grep.c
index dca0be6..5918a26 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -19,6 +19,7 @@
 #include "dir.h"
 #include "pathspec.h"
 #include "submodule.h"
+#include "submodule-config.h"
 
 static char const * const grep_usage[] = {
N_("git grep [] [-e]  [...] [[--] ...]"),
@@ -28,6 +29,7 @@ static char const * const grep_usage[] = {
 static const char *super_prefix;
 static int recurse_submodules;
 static struct argv_array submodule_options = ARGV_ARRAY_INIT;
+static const char *parent_basename;
 
 static int grep_submodule_launch(struct grep_opt *opt,
 const struct grep_source *gs);
@@ -534,19 +536,53 @@ static int grep_submodule_launch(struct grep_opt *opt,
 {
struct child_process cp = CHILD_PROCESS_INIT;
int status, i;
+   const char *end_of_base;
+   const char *name;
struct work_item *w = opt->output_priv;
 
+   end_of_base = strchr(gs->name, ':');
+   if (gs->identifier && end_of_base)
+   name = end_of_base + 1;
+   else
+   name = gs->name;
+
prepare_submodule_repo_env(_array);
 
/* Add super prefix */
argv_array_pushf(, "--super-prefix=%s%s/",
 super_prefix ? super_prefix : "",
-gs->name);
+name);
argv_array_push(, "grep");
 
+   /*
+* Add basename of parent project
+* When performing grep on a tree object the filename is prefixed
+* with the object's name: 'tree-name:filename'.  In order to
+* provide uniformity of output we want to pass the name of the
+* parent project's object name to the submodule so the submodule can
+* prefix its output with the parent's name and not its own SHA1.
+*/
+   if (gs->identifier && end_of_base)
+   argv_array_pushf(, "--parent-basename=%.*s",
+(int) (end_of_base - gs->name),
+gs->name);
+
/* Add options */
-   for (i = 0; i < submodule_options.argc; i++)
+   for (i = 0; i < submodule_options.argc; i++) {
+   /*
+* If there is a tree identifier for the submodule, add the
+* rev after adding the submodule options but before the
+* pathspecs.  To do this we listen for the '--' and insert the
+* sha1 before pushing the '--' onto the child process argv
+* array.
+*/
+   if (gs->identifier &&
+   !strcmp("--", submodule_options.argv[i])) {
+   argv_array_push(, sha1_to_hex(gs->identifier));
+  

[PATCH v6 1/6] submodules: add helper functions to determine presence of submodules

2016-11-30 Thread Brandon Williams
Add two helper functions to submodules.c.
`is_submodule_initialized()` checks if a submodule has been initialized
at a given path and `is_submodule_populated()` check if a submodule
has been checked out at a given path.

Signed-off-by: Brandon Williams 
---
 submodule.c | 39 +++
 submodule.h |  2 ++
 2 files changed, 41 insertions(+)

diff --git a/submodule.c b/submodule.c
index 6f7d883..f336ca9 100644
--- a/submodule.c
+++ b/submodule.c
@@ -198,6 +198,45 @@ void gitmodules_config(void)
}
 }
 
+/*
+ * Determine if a submodule has been initialized at a given 'path'
+ */
+int is_submodule_initialized(const char *path)
+{
+   int ret = 0;
+   const struct submodule *module = NULL;
+
+   module = submodule_from_path(null_sha1, path);
+
+   if (module) {
+   char *key = xstrfmt("submodule.%s.url", module->name);
+   char *value = NULL;
+
+   ret = !git_config_get_string(key, );
+
+   free(value);
+   free(key);
+   }
+
+   return ret;
+}
+
+/*
+ * Determine if a submodule has been populated at a given 'path'
+ */
+int is_submodule_populated(const char *path)
+{
+   int ret = 0;
+   struct stat st;
+   char *gitdir = xstrfmt("%s/.git", path);
+
+   if (!stat(gitdir, ))
+   ret = 1;
+
+   free(gitdir);
+   return ret;
+}
+
 int parse_submodule_update_strategy(const char *value,
struct submodule_update_strategy *dst)
 {
diff --git a/submodule.h b/submodule.h
index d9e197a..6ec5f2f 100644
--- a/submodule.h
+++ b/submodule.h
@@ -37,6 +37,8 @@ void set_diffopt_flags_from_submodule_config(struct 
diff_options *diffopt,
const char *path);
 int submodule_config(const char *var, const char *value, void *cb);
 void gitmodules_config(void);
+extern int is_submodule_initialized(const char *path);
+extern int is_submodule_populated(const char *path);
 int parse_submodule_update_strategy(const char *value,
struct submodule_update_strategy *dst);
 const char *submodule_strategy_to_string(const struct 
submodule_update_strategy *s);
-- 
2.8.0.rc3.226.g39d4020



[PATCH v6 0/6] recursively grep across submodules

2016-11-30 Thread Brandon Williams
v6 fixes a race condition which existed in the 'is_submodule_populated'
function.  Instead of calling 'resolve_gitdir' to check for the existance of a
.git file/directory, use 'stat'.  'resolve_gitdir' calls 'chdir' which can
affect other running threads trying to load thier files into a buffer in
memory.

Thanks to Stefan and Jeff for help debugging this problem.

Brandon Williams (6):
  submodules: add helper functions to determine presence of submodules
  submodules: load gitmodules file from commit sha1
  grep: add submodules as a grep source type
  grep: optionally recurse into submodules
  grep: enable recurse-submodules to work on  objects
  grep: search history of moved submodules

 Documentation/git-grep.txt |  14 ++
 builtin/grep.c | 386 ++---
 cache.h|   2 +
 config.c   |   8 +-
 git.c  |   2 +-
 grep.c |  16 +-
 grep.h |   1 +
 submodule-config.c |   6 +-
 submodule-config.h |   3 +
 submodule.c|  51 +
 submodule.h|   3 +
 t/t7814-grep-recurse-submodules.sh | 241 +++
 tree-walk.c|  28 +++
 13 files changed, 730 insertions(+), 31 deletions(-)
 create mode 100755 t/t7814-grep-recurse-submodules.sh

--- interdiff based on 'bw/grep-recurse-submodules'

diff --git a/submodule.c b/submodule.c
index 062e58b..8516ab0 100644
--- a/submodule.c
+++ b/submodule.c
@@ -239,9 +239,10 @@ int is_submodule_initialized(const char *path)
 int is_submodule_populated(const char *path)
 {
int ret = 0;
+   struct stat st;
char *gitdir = xstrfmt("%s/.git", path);
 
-   if (resolve_gitdir(gitdir))
+   if (!stat(gitdir, ))
ret = 1;
 
free(gitdir);

-- 
2.8.0.rc3.226.g39d4020



[PATCH] Documentation/install-webdoc.sh: quote a potentially unsafe shell expansion

2016-11-30 Thread Austin English
Found via shellcheck

In Documentation/install-webdoc.sh line 21:
mkdir -p $(dirname "$T/$h")
 ^-- SC2046: Quote this to prevent word splitting.

-- 
-Austin
GPG: 14FB D7EA A041 937B
From 1050538f252d22311185065ab8837c71b17003fb Mon Sep 17 00:00:00 2001
From: Austin English 
Date: Wed, 30 Nov 2016 19:21:25 -0600
Subject: [PATCH] Documentation/install-webdoc.sh: quote a potentially unsafe
 shell expansion

Signed-off-by: Austin English 
---
 Documentation/install-webdoc.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/install-webdoc.sh b/Documentation/install-webdoc.sh
index ed8b4ff..5fb2dc5 100755
--- a/Documentation/install-webdoc.sh
+++ b/Documentation/install-webdoc.sh
@@ -18,7 +18,7 @@ do
 	else
 		echo >&2 "# install $h $T/$h"
 		rm -f "$T/$h"
-		mkdir -p $(dirname "$T/$h")
+		mkdir -p "$(dirname "$T/$h")"
 		cp "$h" "$T/$h"
 	fi
 done
-- 
2.7.3



Re: What's cooking in git.git (Nov 2016, #06; Mon, 28)

2016-11-30 Thread Brandon Williams
On 11/30, Stefan Beller wrote:
> > Oh interesting, I wonder if there is a way to not have to perform a
> > chdir since taking a lock to lstat wouldn't be ideal.
> 
> I think we could rewrite is_submodule_populated to be
> 
> int is_submodule_populated_cheap_with_no_chdir(char *path)
> {
> return stat(path + ".git")
> }
> 
> i.e. just take the presence of the .git file/dir as a hint to run
> the child process?

I like this approach, its a quick (thread-safe) check to see if the
submodule is interesting.  If there happens to be an error with the
submodule's .git file/directory then the child process will fail out.

-- 
Brandon Williams


Re: [PATCH] difftool.c: mark a file-local symbol with static

2016-11-30 Thread Ramsay Jones


On 30/11/16 23:46, Junio C Hamano wrote:
> Jeff King  writes:
> 
>> I forgot, we ended up reversing course later and silencing them:
>>
>>   http://public-inbox.org/git/20140505052117.gc6...@sigill.intra.peff.net/
>>
>> By the rationale of that conversation, we should be doing:
>>
>>   warning("%s", "");
>>
>> here.
> 
> I forgot too.  Thanks for digging up that thread.

Yes, I blamed wt-status.c:227 and came up with commit 7d7d68022
as well.

So, by the same rationale, we should remove -Wno-format-zero-length
from DEVELOPER_CFLAGS. yes?

ATB,
Ramsay Jones



[PATCH] transport-helper: drop broken "unchanged" feature

2016-11-30 Thread Jonathan Tan
Commit f8ec916 ("Allow helpers to report in "list" command that the ref
is unchanged", 2009-11-17) allowed a remote helper to report that a ref
is unchanged even if it did not know the contents of a ref. However,
that commit also made Git wrongly assume that such a remote ref always
has the contents of the local ref of the same name.

(Git also cannot assume that the remote ref has the value of the current
destination local ref, or any other ref, since the previous import/fetch
could have been made using a different refspec.)

Drop that assumption.

Signed-off-by: Jonathan Tan 
---

Here is a patch that would remove the assumption (if it is indeed
wrong).

 Documentation/gitremote-helpers.txt |  3 +++
 transport-helper.c  | 43 -
 2 files changed, 3 insertions(+), 43 deletions(-)

diff --git a/Documentation/gitremote-helpers.txt 
b/Documentation/gitremote-helpers.txt
index 9e8681f..c862339 100644
--- a/Documentation/gitremote-helpers.txt
+++ b/Documentation/gitremote-helpers.txt
@@ -393,6 +393,9 @@ attributes are defined.
 'unchanged'::
This ref is unchanged since the last import or fetch, although
the helper cannot necessarily determine what value that produced.
+   Git may import or fetch this ref anyway, because it does not
+   keep a record of the last values corresponding to the refs of a
+   specific remote.
 
 OPTIONS
 ---
diff --git a/transport-helper.c b/transport-helper.c
index 91aed35..6ab8e2f 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -392,9 +392,6 @@ static int fetch_with_fetch(struct transport *transport,
 
for (i = 0; i < nr_heads; i++) {
const struct ref *posn = to_fetch[i];
-   if (posn->status & REF_STATUS_UPTODATE)
-   continue;
-
strbuf_addf(, "fetch %s %s\n",
oid_to_hex(>old_oid),
posn->symref ? posn->symref : posn->name);
@@ -492,9 +489,6 @@ static int fetch_with_import(struct transport *transport,
 
for (i = 0; i < nr_heads; i++) {
posn = to_fetch[i];
-   if (posn->status & REF_STATUS_UPTODATE)
-   continue;
-
strbuf_addf(, "import %s\n",
posn->symref ? posn->symref : posn->name);
sendline(data, );
@@ -531,8 +525,6 @@ static int fetch_with_import(struct transport *transport,
for (i = 0; i < nr_heads; i++) {
char *private, *name;
posn = to_fetch[i];
-   if (posn->status & REF_STATUS_UPTODATE)
-   continue;
name = posn->symref ? posn->symref : posn->name;
if (data->refspecs)
private = apply_refspecs(data->refspecs, 
data->refspec_nr, name);
@@ -649,21 +641,12 @@ static int fetch(struct transport *transport,
 int nr_heads, struct ref **to_fetch)
 {
struct helper_data *data = transport->data;
-   int i, count;
 
if (process_connect(transport, 0)) {
do_take_over(transport);
return transport->fetch(transport, nr_heads, to_fetch);
}
 
-   count = 0;
-   for (i = 0; i < nr_heads; i++)
-   if (!(to_fetch[i]->status & REF_STATUS_UPTODATE))
-   count++;
-
-   if (!count)
-   return 0;
-
if (data->check_connectivity &&
data->transport_options.check_self_contained_and_connected)
set_helper_option(transport, "check-connectivity", "true");
@@ -1009,23 +992,6 @@ static int push_refs(struct transport *transport,
return -1;
 }
 
-
-static int has_attribute(const char *attrs, const char *attr) {
-   int len;
-   if (!attrs)
-   return 0;
-
-   len = strlen(attr);
-   for (;;) {
-   const char *space = strchrnul(attrs, ' ');
-   if (len == space - attrs && !strncmp(attrs, attr, len))
-   return 1;
-   if (!*space)
-   return 0;
-   attrs = space + 1;
-   }
-}
-
 static struct ref *get_refs_list(struct transport *transport, int for_push)
 {
struct helper_data *data = transport->data;
@@ -1067,15 +1033,6 @@ static struct ref *get_refs_list(struct transport 
*transport, int for_push)
(*tail)->symref = xstrdup(buf.buf + 1);
else if (buf.buf[0] != '?')
get_oid_hex(buf.buf, &(*tail)->old_oid);
-   if (eon) {
-   if (has_attribute(eon + 1, "unchanged")) {
-   (*tail)->status |= REF_STATUS_UPTODATE;
-   if (read_ref((*tail)->name,
-(*tail)->old_oid.hash) < 0)
-   die(_("Could not read ref %s"),
- 

[BUG?] Remote helper's wrong assumption of unchanged ref

2016-11-30 Thread Jonathan Tan
Commit f8ec916 ("Allow helpers to report in "list" command that the ref
is unchanged", 2009-11-17) allowed a remote helper to report that a ref
is unchanged even if it did not know the contents of a ref. However,
that commit also made Git assume that such a remote ref has the contents
of the local ref of the same name. If I'm not missing anything, this
assumption seems wrong; the attached test illustrates one case of local
edits being made after cloning with default parameters.

The original e-mail thread [1] seems to indicate that this feature is
meant for a remote helper with no Git-specific code (which is possible
if it supports "import" but not "fetch" - in this case, it would not
deal with SHA-1s at all) to nevertheless indicate "unchanged", most
likely to support optimizations on the client side.

But it seems to me that Git cannot perform this optimization. In other
words, it should just ignore "unchanged". If this makes sense, I'll
prepare a patch to do this.

[1] "[PATCH 00/13] Native and foreign helpers"

---
 git-remote-testgit.sh |  8 +++-
 t/t5801-remote-helpers.sh | 40 
 2 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/git-remote-testgit.sh b/git-remote-testgit.sh
index 752c763..6357868 100755
--- a/git-remote-testgit.sh
+++ b/git-remote-testgit.sh
@@ -52,7 +52,13 @@ do
echo
;;
list)
-   git for-each-ref --format='? %(refname)' 'refs/heads/'
+   if test -n "$GIT_REMOTE_TESTGIT_UNCHANGED_BRANCH_REGEX"
+   then
+   git for-each-ref --format='? %(refname)' 'refs/heads/' |
+   sed 
"/${GIT_REMOTE_TESTGIT_UNCHANGED_BRANCH_REGEX}/s/$/ unchanged/"
+   else
+   git for-each-ref --format='? %(refname)' 'refs/heads/'
+   fi
head=$(git symbolic-ref HEAD)
echo "@$head HEAD"
echo
diff --git a/t/t5801-remote-helpers.sh b/t/t5801-remote-helpers.sh
index 362b158..4a48f2b 100755
--- a/t/t5801-remote-helpers.sh
+++ b/t/t5801-remote-helpers.sh
@@ -301,4 +301,44 @@ test_expect_success 'fetch url' '
compare_refs server HEAD local FETCH_HEAD
 '
 
+test_expect_success 'setup remote repository and divergent clone' '
+   git init s2 &&
+   (
+   cd s2 &&
+   test_commit M1 &&
+   git checkout -b mybranch &&
+   test_commit B1
+   ) &&
+   git clone "testgit::${PWD}/s2" divergent &&
+
+   (
+   cd divergent &&
+   git checkout master &&
+   test_commit M2 &&
+   git checkout mybranch &&
+   test_commit B2
+   )
+'
+
+test_expect_success 'fetch with unchanged claims' '
+   rm -rf local &&
+   cp -r divergent local &&
+
+   # No unchanged branches
+
+   GIT_REMOTE_TESTGIT_UNCHANGED_BRANCH_REGEX=ABCDE git -C local fetch &&
+   compare_refs s2 M1 local refs/remotes/origin/master &&
+   compare_refs s2 B1 local refs/remotes/origin/mybranch &&
+
+   # One unchanged branch
+
+   GIT_REMOTE_TESTGIT_UNCHANGED_BRANCH_REGEX=mybranch git -C local fetch &&
+   compare_refs s2 M1 local refs/remotes/origin/master &&
+
+   # I (Jonathan Tan) would expect refs/remotes/origin/mybranch to be B1,
+   # but it is B2.
+   test_must_fail compare_refs s2 B1 local refs/remotes/origin/mybranch &&
+   compare_refs local B2 local refs/remotes/origin/mybranch
+'
+
 test_done
-- 
2.8.0.rc3.226.g39d4020



Grant.

2016-11-30 Thread Capital Web

Hello,
 
A donation has been made out to you and your family from my foundation.
 
For details, send an E-mail now to: castrat...@gmail.com
 
Regards



Re: What's cooking in git.git (Nov 2016, #06; Mon, 28)

2016-11-30 Thread Jeff King
On Wed, Nov 30, 2016 at 04:06:05PM -0800, Brandon Williams wrote:

> On 11/30, Jeff King wrote:
> > So I think there is some other chdir(). I'm not sure if there is an easy
> > way to get a backtrace on every call to chdir() in every thread. I'm
> > sure somebody more clever than me could figure out how to make gdb do it
> > automatically, but it might be workable manually. I think the chdir was
> > in the main thread.
> > 
> > -Peff
> 
> Yeah maybe we're missing something else...
> 
> How did you run strace with your stress script?

It's hidden in the patch I sent a moment ago, but basically just "strace
-o foo.out" will dump the trace in the trash directory. After the stress
script runs, you can "cat fail/trash*/foo.out".

> > (gdb) bt
> > #0  chdir () at ../sysdeps/unix/syscall-template.S:84
> > #1  0x555fe259 in real_path_internal (path=0x559f6b30 
> > "su:b/../.git/modules/su:b", die_on_error=1)
> > at abspath.c:84
> > #2  0x555fe48a in real_path (path=0x559f6b30 
> > "su:b/../.git/modules/su:b") at abspath.c:135
> > #3  0x556d09e6 in read_gitfile_gently (path=0x559f6ac0 
> > "su:b/.git", return_error_code=0x0)
> > at setup.c:555
> > #4  0x556d19cf in resolve_gitdir (suspect=0x559f6ac0 
> > "su:b/.git") at setup.c:1021
> > #5  0x556e7e34 in is_submodule_populated (path=0x559f5ec8 
> > "su:b") at submodule.c:244
> > #6  0x555a0f05 in grep_submodule (opt=0x7fffd8b0, sha1=0x0, 
> > filename=0x559f5ec8 "su:b", 
> > path=0x559f5ec8 "su:b") at builtin/grep.c:619
> > #7  0x555a12ac in grep_cache (opt=0x7fffd8b0, 
> > pathspec=0x7fffd880, cached=0) at builtin/grep.c:700
> > #8  0x555a36cb in cmd_grep (argc=0, argv=0x7fffdf40, 
> > prefix=0x0) at builtin/grep.c:1257
> > #9  0x5556603b in run_builtin (p=0x559b3ad8 , 
> > argc=4, argv=0x7fffdf40) at git.c:373
> > #10 0x555662bc in handle_builtin (argc=4, argv=0x7fffdf40) at 
> > git.c:572
> > #11 0x5556641a in run_argv (argcp=0x7fffddfc, 
> > argv=0x7fffddf0) at git.c:630
> > #12 0x555665a8 in cmd_main (argc=4, argv=0x7fffdf40) at 
> > git.c:702
> > #13 0x555fde47 in main (argc=7, argv=0x7fffdf28) at 
> > common-main.c:40
> > 
> > So is_submodule_populated() needs to take a lock. But what's really
> > gross is that the _other_ threads need to lock just to call lstat().
> > Presumably it could be done as a reader/writer type of lock where many
> > "reader" threads can take the "I need to lstat()" lock simultaneously,
> > but block when an "I'm going to chdir()" writer holds it.
> 
> Oh interesting, I wonder if there is a way to not have to perform a
> chdir since taking a lock to lstat wouldn't be ideal.

I don't think so.  It comes from real_path(), which needs to either
chdir(), or start interpreting symbolic links itself (and madness that
way lies).

I think with a reader/writer lock as I described it wouldn't be too bad.
The common case would pay only the locking cost and not ever block,
since submodules are rare (and they're super-heavyweight to descend into
anyway).

I think putting it at the individual lstat() would be way too low, but
probably you could do it right before calling grep_source(). It may even
be possible to do some of the submodule work ahead of time while holding
grep_lock().

> Thanks for helping out with this!

I wasn't planning on it, but this turned into an intriguing puzzle. ;)

-Peff


'git repack' and repack.writeBitmaps=true with kept packs

2016-11-30 Thread Steven Noonan
I have some unexpected behavior with 'git repack' on git 2.10.2 and 2.11.0.

$ cat /etc/gitconfig
[pack]
writeBitmapHashCache = true
[repack]
writeBitmaps = true
$ touch objects/pack/pack-3841d81123a96cedeb3c1bd7acf7e29bfba26639.keep
$ find objects
objects
objects/pack
objects/pack/pack-3841d81123a96cedeb3c1bd7acf7e29bfba26639.keep
objects/pack/pack-3841d81123a96cedeb3c1bd7acf7e29bfba26639.idx
objects/pack/pack-3841d81123a96cedeb3c1bd7acf7e29bfba26639.bitmap
objects/pack/pack-3841d81123a96cedeb3c1bd7acf7e29bfba26639.pack
objects/info
objects/info/packs
$ git repack -Adfl
Counting objects: 16321, done.
Delta compression using up to 24 threads.
Compressing objects: 100% (16118/16118), done.
Writing objects: 100% (16321/16321), done.
Reusing bitmaps: 110, done.
Selecting bitmap commits: 3257, done.
Building bitmaps: 100% (137/137), done.
Total 16321 (delta 11568), reused 4507 (delta 0)
$ du -sh objects/pack/pack*
100Kobjects/pack/pack-3841d81123a96cedeb3c1bd7acf7e29bfba26639.bitmap
524Kobjects/pack/pack-3841d81123a96cedeb3c1bd7acf7e29bfba26639.idx
512 objects/pack/pack-3841d81123a96cedeb3c1bd7acf7e29bfba26639.keep
4.3Mobjects/pack/pack-3841d81123a96cedeb3c1bd7acf7e29bfba26639.pack
100Kobjects/pack/pack-6043151bee7bd61bdae6e3a2ba0f13cd1b0277af.bitmap
524Kobjects/pack/pack-6043151bee7bd61bdae6e3a2ba0f13cd1b0277af.idx
5.8Mobjects/pack/pack-6043151bee7bd61bdae6e3a2ba0f13cd1b0277af.pack

It seems like it's behaving as though I've provided
--pack-kept-objects. In order to ensure the .bitmap is created, it
repacks everything, including everything in existing .pack files (not
respecting .keep). But then it's not deleting the old .pack file
(oddly, respecting .keep).

What I'd expect it to do here is ignore the 'repack.writeBitmaps =
true' value if there's a .keep that needs to be respected. Is this not
a correct assumption?


Re: What's cooking in git.git (Nov 2016, #06; Mon, 28)

2016-11-30 Thread Stefan Beller
> Oh interesting, I wonder if there is a way to not have to perform a
> chdir since taking a lock to lstat wouldn't be ideal.

I think we could rewrite is_submodule_populated to be

int is_submodule_populated_cheap_with_no_chdir(char *path)
{
return stat(path + ".git")
}

i.e. just take the presence of the .git file/dir as a hint to run
the child process?


Re: What's cooking in git.git (Nov 2016, #06; Mon, 28)

2016-11-30 Thread Brandon Williams
On 11/30, Jeff King wrote:
> On Wed, Nov 30, 2016 at 06:59:52PM -0500, Jeff King wrote:
> 
> > So I think there is some other chdir(). I'm not sure if there is an easy
> > way to get a backtrace on every call to chdir() in every thread. I'm
> > sure somebody more clever than me could figure out how to make gdb do it
> > automatically, but it might be workable manually. I think the chdir was
> > in the main thread.
> 
> Ah, that turned out to be quite easy. The culprit is probably:
> 
> (gdb) bt
> #0  chdir () at ../sysdeps/unix/syscall-template.S:84
> #1  0x555fe259 in real_path_internal (path=0x559f6b30 
> "su:b/../.git/modules/su:b", die_on_error=1)
> at abspath.c:84
> #2  0x555fe48a in real_path (path=0x559f6b30 
> "su:b/../.git/modules/su:b") at abspath.c:135
> #3  0x556d09e6 in read_gitfile_gently (path=0x559f6ac0 
> "su:b/.git", return_error_code=0x0)
> at setup.c:555
> #4  0x556d19cf in resolve_gitdir (suspect=0x559f6ac0 "su:b/.git") 
> at setup.c:1021
> #5  0x556e7e34 in is_submodule_populated (path=0x559f5ec8 "su:b") 
> at submodule.c:244
> #6  0x555a0f05 in grep_submodule (opt=0x7fffd8b0, sha1=0x0, 
> filename=0x559f5ec8 "su:b", 
> path=0x559f5ec8 "su:b") at builtin/grep.c:619
> #7  0x555a12ac in grep_cache (opt=0x7fffd8b0, 
> pathspec=0x7fffd880, cached=0) at builtin/grep.c:700
> #8  0x555a36cb in cmd_grep (argc=0, argv=0x7fffdf40, prefix=0x0) 
> at builtin/grep.c:1257
> #9  0x5556603b in run_builtin (p=0x559b3ad8 , 
> argc=4, argv=0x7fffdf40) at git.c:373
> #10 0x555662bc in handle_builtin (argc=4, argv=0x7fffdf40) at 
> git.c:572
> #11 0x5556641a in run_argv (argcp=0x7fffddfc, 
> argv=0x7fffddf0) at git.c:630
> #12 0x555665a8 in cmd_main (argc=4, argv=0x7fffdf40) at git.c:702
> #13 0x555fde47 in main (argc=7, argv=0x7fffdf28) at 
> common-main.c:40
> 
> So is_submodule_populated() needs to take a lock. But what's really
> gross is that the _other_ threads need to lock just to call lstat().
> Presumably it could be done as a reader/writer type of lock where many
> "reader" threads can take the "I need to lstat()" lock simultaneously,
> but block when an "I'm going to chdir()" writer holds it.
> 
> -Peff

Oh interesting, I wonder if there is a way to not have to perform a
chdir since taking a lock to lstat wouldn't be ideal.

Thanks for helping out with this!

-- 
Brandon Williams


Re: What's cooking in git.git (Nov 2016, #06; Mon, 28)

2016-11-30 Thread Brandon Williams
On 11/30, Jeff King wrote:
> So I think there is some other chdir(). I'm not sure if there is an easy
> way to get a backtrace on every call to chdir() in every thread. I'm
> sure somebody more clever than me could figure out how to make gdb do it
> automatically, but it might be workable manually. I think the chdir was
> in the main thread.
> 
> -Peff

Yeah maybe we're missing something else...

How did you run strace with your stress script?

-- 
Brandon Williams


Re: What's cooking in git.git (Nov 2016, #06; Mon, 28)

2016-11-30 Thread Jeff King
On Wed, Nov 30, 2016 at 06:59:52PM -0500, Jeff King wrote:

> So I think there is some other chdir(). I'm not sure if there is an easy
> way to get a backtrace on every call to chdir() in every thread. I'm
> sure somebody more clever than me could figure out how to make gdb do it
> automatically, but it might be workable manually. I think the chdir was
> in the main thread.

Ah, that turned out to be quite easy. The culprit is probably:

(gdb) bt
#0  chdir () at ../sysdeps/unix/syscall-template.S:84
#1  0x555fe259 in real_path_internal (path=0x559f6b30 
"su:b/../.git/modules/su:b", die_on_error=1)
at abspath.c:84
#2  0x555fe48a in real_path (path=0x559f6b30 
"su:b/../.git/modules/su:b") at abspath.c:135
#3  0x556d09e6 in read_gitfile_gently (path=0x559f6ac0 "su:b/.git", 
return_error_code=0x0)
at setup.c:555
#4  0x556d19cf in resolve_gitdir (suspect=0x559f6ac0 "su:b/.git") 
at setup.c:1021
#5  0x556e7e34 in is_submodule_populated (path=0x559f5ec8 "su:b") 
at submodule.c:244
#6  0x555a0f05 in grep_submodule (opt=0x7fffd8b0, sha1=0x0, 
filename=0x559f5ec8 "su:b", 
path=0x559f5ec8 "su:b") at builtin/grep.c:619
#7  0x555a12ac in grep_cache (opt=0x7fffd8b0, 
pathspec=0x7fffd880, cached=0) at builtin/grep.c:700
#8  0x555a36cb in cmd_grep (argc=0, argv=0x7fffdf40, prefix=0x0) at 
builtin/grep.c:1257
#9  0x5556603b in run_builtin (p=0x559b3ad8 , argc=4, 
argv=0x7fffdf40) at git.c:373
#10 0x555662bc in handle_builtin (argc=4, argv=0x7fffdf40) at 
git.c:572
#11 0x5556641a in run_argv (argcp=0x7fffddfc, argv=0x7fffddf0) 
at git.c:630
#12 0x555665a8 in cmd_main (argc=4, argv=0x7fffdf40) at git.c:702
#13 0x555fde47 in main (argc=7, argv=0x7fffdf28) at common-main.c:40

So is_submodule_populated() needs to take a lock. But what's really
gross is that the _other_ threads need to lock just to call lstat().
Presumably it could be done as a reader/writer type of lock where many
"reader" threads can take the "I need to lstat()" lock simultaneously,
but block when an "I'm going to chdir()" writer holds it.

-Peff


Re: What's cooking in git.git (Nov 2016, #06; Mon, 28)

2016-11-30 Thread Jeff King
On Wed, Nov 30, 2016 at 06:46:36PM -0500, Jeff King wrote:

> > For now the work around could be to just pass "-C " to the child
> > process instead of relying on run-command to chdir.
> 
> Yeah, that would push it after the exec. I just don't understand why
> that would be necessary.

Hmm. It still seems to fail, even with the workaround (the patch in
run-command is there to make sure there's not some other call that we're
not catching):

diff --git a/builtin/grep.c b/builtin/grep.c
index 2c727ef49..3323a3e7f 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -553,6 +553,7 @@ static int grep_submodule_launch(struct grep_opt *opt,
argv_array_pushf(, "--super-prefix=%s%s/",
 super_prefix ? super_prefix : "",
 name);
+   argv_array_pushl(, "-C", gs->path, NULL);
argv_array_push(, "grep");
 
/*
@@ -586,7 +587,6 @@ static int grep_submodule_launch(struct grep_opt *opt,
}
 
cp.git_cmd = 1;
-   cp.dir = gs->path;
 
/*
 * Capture output to output buffer and check the return code from the
diff --git a/run-command.c b/run-command.c
index 5a4dbb66d..d040f4f77 100644
--- a/run-command.c
+++ b/run-command.c
@@ -393,6 +393,8 @@ int start_command(struct child_process *cmd)
close(cmd->out);
}
 
+   if (cmd->dir && git_env_bool("GIT_NO_CHDIR", 0))
+   die("temporarily disallowing chdir");
if (cmd->dir && chdir(cmd->dir))
die_errno("exec '%s': cd to '%s' failed", cmd->argv[0],
cmd->dir);
diff --git a/t/t7814-grep-recurse-submodules.sh 
b/t/t7814-grep-recurse-submodules.sh
index 050777186..591ff74ed 100755
--- a/t/t7814-grep-recurse-submodules.sh
+++ b/t/t7814-grep-recurse-submodules.sh
@@ -175,7 +175,7 @@ test_expect_success 'grep recurse submodule colon in name' '
fi:le:foobar
su:b/fi:le:foobar
EOF
-   git -C parent grep -e "foobar" --recurse-submodules >actual &&
+   GIT_NO_CHDIR=1 strace -o foo.out -f git -C parent grep -e "foobar" 
--recurse-submodules >actual &&
test_cmp expect actual &&
 
cat >expect <<-\EOF &&


So I think there is some other chdir(). I'm not sure if there is an easy
way to get a backtrace on every call to chdir() in every thread. I'm
sure somebody more clever than me could figure out how to make gdb do it
automatically, but it might be workable manually. I think the chdir was
in the main thread.

-Peff


Re: What's cooking in git.git (Nov 2016, #06; Mon, 28)

2016-11-30 Thread Brandon Williams
On 11/30, Jeff King wrote:
> On Wed, Nov 30, 2016 at 03:42:48PM -0800, Brandon Williams wrote:
> 
> > > where 20813 and 20867 are two threads of the main process. One is doing
> > > the lstat and the other calls chdir at the same moment.
> >
> > Yeah so it looks like the start_command function calls chdir.  Which
> > means any uses of the run-command interface are not thread safe
> 
> That seems crazy. The chdir should be happening on the child side of the
> fork (and looking at the code, it seems to be the case). And on the
> Windows side, without fork, it's an option to the spawn call, which
> makes sense.
> 
> > For now the work around could be to just pass "-C " to the child
> > process instead of relying on run-command to chdir.
> 
> Yeah, that would push it after the exec. I just don't understand why
> that would be necessary.
> 
> -Peff

You're right, I jumped the gun.  That doesn't seem to fix the problem
(as I'm still seeing the same failure).


-- 
Brandon Williams


Re: [PATCH] difftool.c: mark a file-local symbol with static

2016-11-30 Thread Junio C Hamano
Jeff King  writes:

> I forgot, we ended up reversing course later and silencing them:
>
>   http://public-inbox.org/git/20140505052117.gc6...@sigill.intra.peff.net/
>
> By the rationale of that conversation, we should be doing:
>
>   warning("%s", "");
>
> here.

I forgot too.  Thanks for digging up that thread.



Re: What's cooking in git.git (Nov 2016, #06; Mon, 28)

2016-11-30 Thread Jeff King
On Wed, Nov 30, 2016 at 03:42:48PM -0800, Brandon Williams wrote:

> > where 20813 and 20867 are two threads of the main process. One is doing
> > the lstat and the other calls chdir at the same moment.
>
> Yeah so it looks like the start_command function calls chdir.  Which
> means any uses of the run-command interface are not thread safe

That seems crazy. The chdir should be happening on the child side of the
fork (and looking at the code, it seems to be the case). And on the
Windows side, without fork, it's an option to the spawn call, which
makes sense.

> For now the work around could be to just pass "-C " to the child
> process instead of relying on run-command to chdir.

Yeah, that would push it after the exec. I just don't understand why
that would be necessary.

-Peff


Re: What's cooking in git.git (Nov 2016, #06; Mon, 28)

2016-11-30 Thread Brandon Williams
On 11/30, Jeff King wrote:
> On Wed, Nov 30, 2016 at 06:32:04PM -0500, Jeff King wrote:
> 
> > On Wed, Nov 30, 2016 at 03:28:23PM -0800, Brandon Williams wrote:
> > 
> > > So I couldn't find a race condition in the code.  I tracked the problem
> > > to grep_source_load_file which attempts to run lstat on the file so that
> > > it can read it into a buffer.  The lstat call fails with ENOENT (which
> > > conveniently is skipped by the if statement which calls error_errno).  So
> > > for some reason the file cannot be found and read into memory resulting
> > > in nothing being grep'ed for that particular file (since the buffer is
> > > NULL).
> > 
> > That's definitely weird. Is it possible that any of the underlying calls
> > from another thread are using chdir()? I think realpath() make do that
> > behind the scenes, and there may be others.
> > 
> > A full strace from a failing case would be interesting reading. In
> > theory we should be able to get that by running the stress script for
> > long enough. :)
> 
> Actually, it failed pretty much immediately. I guess the extra stracing
> changes the timing to make the problem _more_ likely.
> 
> And indeed, I see:
> 
> 20867 lstat("fi:le",  
> 20813 <... read resumed> "", 232)   = 0
> 20871 futex(0x558cdec8b164, FUTEX_WAIT_PRIVATE, 7, NULL 
> 20813 close(7 
> 20870 <... futex resumed> ) = 0
> 20869 lstat(".gitmodules",  
> 20813 <... close resumed> ) = 0
> 20865 set_robust_list(0x7f1df92579e0, 24 
> 20813 lstat("su:b/../.git/modules/su:b/commondir", 0x7ffecc8b3ac0) = -1 
> ENOENT (No such file or directory)
> 20865 <... set_robust_list resumed> )   = 0
> 20868 set_robust_list(0x7f1df7a549e0, 24 
> 20813 access("su:b/../.git/modules/su:b/objects", X_OK) = 0
> 20813 access("su:b/../.git/modules/su:b/refs", X_OK) = 0
> 20813 stat("su:b/../.git/modules/su:b", {st_mode=S_IFDIR|0755, st_size=280, 
> ...}) = 0
> 20813 getcwd("/var/ram/git-stress/root-4/trash 
> directory.t7814-grep-recurse-submodules/parent", 129) = 80
> 20869 <... lstat resumed> {st_mode=S_IFREG|0644, st_size=47, ...}) = 0
> 20813 chdir("su:b/../.git/modules/su:b") = 0
> 20869 open(".gitmodules", O_RDONLY 
> 20813 getcwd( 
> 20867 <... lstat resumed> 0x7f1df8254cf0) = -1 ENOENT (No such file or 
> directory)
> 
> where 20813 and 20867 are two threads of the main process. One is doing
> the lstat and the other calls chdir at the same moment.
> 
> -Peff

Yeah so it looks like the start_command function calls chdir.  Which
means any uses of the run-command interface are not thread safe

For now the work around could be to just pass "-C " to the child
process instead of relying on run-command to chdir.

-- 
Brandon Williams


Re: What's cooking in git.git (Nov 2016, #06; Mon, 28)

2016-11-30 Thread Stefan Beller
On Wed, Nov 30, 2016 at 3:40 PM, Jeff King  wrote:
> On Wed, Nov 30, 2016 at 06:32:04PM -0500, Jeff King wrote:
>
>> On Wed, Nov 30, 2016 at 03:28:23PM -0800, Brandon Williams wrote:
>>
>> > So I couldn't find a race condition in the code.  I tracked the problem
>> > to grep_source_load_file which attempts to run lstat on the file so that
>> > it can read it into a buffer.  The lstat call fails with ENOENT (which
>> > conveniently is skipped by the if statement which calls error_errno).  So
>> > for some reason the file cannot be found and read into memory resulting
>> > in nothing being grep'ed for that particular file (since the buffer is
>> > NULL).
>>
>> That's definitely weird. Is it possible that any of the underlying calls
>> from another thread are using chdir()? I think realpath() make do that
>> behind the scenes, and there may be others.
>>
>> A full strace from a failing case would be interesting reading. In
>> theory we should be able to get that by running the stress script for
>> long enough. :)
>
> Actually, it failed pretty much immediately. I guess the extra stracing
> changes the timing to make the problem _more_ likely.
>
> And indeed, I see:
>
> 20867 lstat("fi:le",  
> 20813 <... read resumed> "", 232)   = 0
> 20871 futex(0x558cdec8b164, FUTEX_WAIT_PRIVATE, 7, NULL 
> 20813 close(7 
> 20870 <... futex resumed> ) = 0
> 20869 lstat(".gitmodules",  
> 20813 <... close resumed> ) = 0
> 20865 set_robust_list(0x7f1df92579e0, 24 
> 20813 lstat("su:b/../.git/modules/su:b/commondir", 0x7ffecc8b3ac0) = -1 
> ENOENT (No such file or directory)
> 20865 <... set_robust_list resumed> )   = 0
> 20868 set_robust_list(0x7f1df7a549e0, 24 
> 20813 access("su:b/../.git/modules/su:b/objects", X_OK) = 0
> 20813 access("su:b/../.git/modules/su:b/refs", X_OK) = 0
> 20813 stat("su:b/../.git/modules/su:b", {st_mode=S_IFDIR|0755, st_size=280, 
> ...}) = 0
> 20813 getcwd("/var/ram/git-stress/root-4/trash 
> directory.t7814-grep-recurse-submodules/parent", 129) = 80
> 20869 <... lstat resumed> {st_mode=S_IFREG|0644, st_size=47, ...}) = 0
> 20813 chdir("su:b/../.git/modules/su:b") = 0
> 20869 open(".gitmodules", O_RDONLY 
> 20813 getcwd( 
> 20867 <... lstat resumed> 0x7f1df8254cf0) = -1 ENOENT (No such file or 
> directory)
>
> where 20813 and 20867 are two threads of the main process. One is doing
> the lstat and the other calls chdir at the same moment.
>

Lessons learned here:
The run-command API is not thread safe when used with setting the directory.
If you need to run a thing in a threaded environment run
git -C  ... such that the child chdirs.

Are there any other threaded environments that run things with .dir set?


Re: What's cooking in git.git (Nov 2016, #06; Mon, 28)

2016-11-30 Thread Jeff King
On Wed, Nov 30, 2016 at 06:32:04PM -0500, Jeff King wrote:

> On Wed, Nov 30, 2016 at 03:28:23PM -0800, Brandon Williams wrote:
> 
> > So I couldn't find a race condition in the code.  I tracked the problem
> > to grep_source_load_file which attempts to run lstat on the file so that
> > it can read it into a buffer.  The lstat call fails with ENOENT (which
> > conveniently is skipped by the if statement which calls error_errno).  So
> > for some reason the file cannot be found and read into memory resulting
> > in nothing being grep'ed for that particular file (since the buffer is
> > NULL).
> 
> That's definitely weird. Is it possible that any of the underlying calls
> from another thread are using chdir()? I think realpath() make do that
> behind the scenes, and there may be others.
> 
> A full strace from a failing case would be interesting reading. In
> theory we should be able to get that by running the stress script for
> long enough. :)

Actually, it failed pretty much immediately. I guess the extra stracing
changes the timing to make the problem _more_ likely.

And indeed, I see:

20867 lstat("fi:le",  
20813 <... read resumed> "", 232)   = 0
20871 futex(0x558cdec8b164, FUTEX_WAIT_PRIVATE, 7, NULL 
20813 close(7 
20870 <... futex resumed> ) = 0
20869 lstat(".gitmodules",  
20813 <... close resumed> ) = 0
20865 set_robust_list(0x7f1df92579e0, 24 
20813 lstat("su:b/../.git/modules/su:b/commondir", 0x7ffecc8b3ac0) = -1 ENOENT 
(No such file or directory)
20865 <... set_robust_list resumed> )   = 0
20868 set_robust_list(0x7f1df7a549e0, 24 
20813 access("su:b/../.git/modules/su:b/objects", X_OK) = 0
20813 access("su:b/../.git/modules/su:b/refs", X_OK) = 0
20813 stat("su:b/../.git/modules/su:b", {st_mode=S_IFDIR|0755, st_size=280, 
...}) = 0
20813 getcwd("/var/ram/git-stress/root-4/trash 
directory.t7814-grep-recurse-submodules/parent", 129) = 80
20869 <... lstat resumed> {st_mode=S_IFREG|0644, st_size=47, ...}) = 0
20813 chdir("su:b/../.git/modules/su:b") = 0
20869 open(".gitmodules", O_RDONLY 
20813 getcwd( 
20867 <... lstat resumed> 0x7f1df8254cf0) = -1 ENOENT (No such file or 
directory)

where 20813 and 20867 are two threads of the main process. One is doing
the lstat and the other calls chdir at the same moment.

-Peff


Re: [RFC/PATCH v3 01/16] Add initial external odb support

2016-11-30 Thread Jeff King
On Wed, Nov 30, 2016 at 03:30:09PM -0800, Junio C Hamano wrote:

> Christian Couder  writes:
> 
> > From: Jeff King 
> >
> > Signed-off-by: Christian Couder 
> > ---
> 
> By the time the series loses RFC prefix, we'd need to see the above
> three lines straightened out a bit more, e.g. a real message and a
> more proper sign-off sequence.

Actually, I would assume that this patch would go away altogether and
get folded into commits that get us closer to the end state. But I
haven't read the series carefully yet, so maybe it really does work
better with this as a base.

I am perfectly OK dropping my commit count by one and getting "based on
a patch by Peff" in a cover letter or elsewhere.

-Peff


Re: What's cooking in git.git (Nov 2016, #06; Mon, 28)

2016-11-30 Thread Jeff King
On Wed, Nov 30, 2016 at 03:28:23PM -0800, Brandon Williams wrote:

> So I couldn't find a race condition in the code.  I tracked the problem
> to grep_source_load_file which attempts to run lstat on the file so that
> it can read it into a buffer.  The lstat call fails with ENOENT (which
> conveniently is skipped by the if statement which calls error_errno).  So
> for some reason the file cannot be found and read into memory resulting
> in nothing being grep'ed for that particular file (since the buffer is
> NULL).

That's definitely weird. Is it possible that any of the underlying calls
from another thread are using chdir()? I think realpath() make do that
behind the scenes, and there may be others.

A full strace from a failing case would be interesting reading. In
theory we should be able to get that by running the stress script for
long enough. :)

-Peff


Re: [RFC/PATCH v3 01/16] Add initial external odb support

2016-11-30 Thread Junio C Hamano
Christian Couder  writes:

> From: Jeff King 
>
> Signed-off-by: Christian Couder 
> ---

By the time the series loses RFC prefix, we'd need to see the above
three lines straightened out a bit more, e.g. a real message and a
more proper sign-off sequence.

> +static struct odb_helper *find_or_create_helper(const char *name, int len)
> +{
> + struct odb_helper *o;
> +
> + for (o = helpers; o; o = o->next)
> + if (!strncmp(o->name, name, len) && !o->name[len])
> + return o;
> +
> + o = odb_helper_new(name, len);
> + *helpers_tail = o;
> + helpers_tail = >next;
> +
> + return o;
> +}
> +
> +static int external_odb_config(const char *var, const char *value, void 
> *data)
> +{
> + struct odb_helper *o;
> + const char *key, *dot;
> +
> + if (!skip_prefix(var, "odb.", ))
> + return 0;
> + dot = strrchr(key, '.');
> + if (!dot)
> + return 0;

Is this something Peff wrote long time ago?  I find it surprising
that he would write this without using parse_config_key().

> +struct odb_helper_cmd {
> + struct argv_array argv;
> + struct child_process child;
> +};
> +
> +static void prepare_helper_command(struct argv_array *argv, const char *cmd,
> +const char *fmt, va_list ap)
> +{
> + struct strbuf buf = STRBUF_INIT;
> +
> + strbuf_addstr(, cmd);
> + strbuf_addch(, ' ');
> + strbuf_vaddf(, fmt, ap);
> +
> + argv_array_push(argv, buf.buf);
> + strbuf_release();

This concatenates the cmdname (like "get") and its parameters into a
single string (e.g. "get 454cb6bd52a4de614a3633e4f547af03d5c3b640")
and then places the resulting string as the sole element in argv[]
array, which I find somewhat unusual.  It at least deserves a
comment in front of the function that the callers are responsible to
ensure that the result of vaddf(fmt, ap) is properly shell-quoted.
Otherwise it is inviting quoting errors in the future (even if there
is no such errors in the current code and command set, i.e. a 40-hex
object name that "get" subcommand takes happens to require no
special shell-quoting consideration).

> +int odb_helper_fetch_object(struct odb_helper *o, const unsigned char *sha1,
> + int fd)
> +{
> + struct odb_helper_object *obj;
> + struct odb_helper_cmd cmd;
> + unsigned long total_got;
> + git_zstream stream;
> + int zret = Z_STREAM_END;
> + git_SHA_CTX hash;
> + unsigned char real_sha1[20];
> +
> + obj = odb_helper_lookup(o, sha1);
> + if (!obj)
> + return -1;
> +
> + if (odb_helper_start(o, , "get %s", sha1_to_hex(sha1)) < 0)
> + return -1;
> +
> + memset(, 0, sizeof(stream));
> + git_inflate_init();
> + git_SHA1_Init();
> + total_got = 0;
> +
> + for (;;) {
> + unsigned char buf[4096];
> + int r;
> +
> + r = xread(cmd.child.out, buf, sizeof(buf));
> + if (r < 0) {
> + error("unable to read from odb helper '%s': %s",
> +   o->name, strerror(errno));
> + close(cmd.child.out);
> + odb_helper_finish(o, );
> + git_inflate_end();
> + return -1;
> + }
> + if (r == 0)
> + break;
> +
> + write_or_die(fd, buf, r);
> +
> + stream.next_in = buf;
> + stream.avail_in = r;
> + do {
> + unsigned char inflated[4096];
> + unsigned long got;
> +
> + stream.next_out = inflated;
> + stream.avail_out = sizeof(inflated);
> + zret = git_inflate(, Z_SYNC_FLUSH);
> + got = sizeof(inflated) - stream.avail_out;
> +
> + git_SHA1_Update(, inflated, got);
> + /* skip header when counting size */
> + if (!total_got) {
> + const unsigned char *p = memchr(inflated, '\0', 
> got);

Does this assume that a single xread() that can result in a
short-read would not return in the middle of "header", and if so, is
that a safe assumption to make?

> + if (p)
> + got -= p - inflated + 1;
> + else
> + got = 0;
> + }
> + total_got += got;
> + } while (stream.avail_in && zret == Z_OK);
> + }
> +
> + close(cmd.child.out);
> + git_inflate_end();
> + git_SHA1_Final(real_sha1, );
> + if (odb_helper_finish(o, ))
> + return -1;
> + if (zret != Z_STREAM_END) {
> + warning("bad zlib data from odb helper '%s' for %s",
> + o->name, 

Re: What's cooking in git.git (Nov 2016, #06; Mon, 28)

2016-11-30 Thread Brandon Williams
On 11/30, Brandon Williams wrote:
> On 11/29, Jeff King wrote:
> > On Tue, Nov 29, 2016 at 01:37:59AM -0500, Jeff King wrote:
> > 
> > >   2. Grep threads doing more complicated stuff that needs to take a
> > >  lock. You might try building with -fsanitize=thread to see if it
> > >  turns up anything.
> > 
> > I tried this and it didn't find anything useful. It complains about
> > multiple threads calling want_color() at the same time, which you can
> > silence with something like:
> > 
> > diff --git a/builtin/grep.c b/builtin/grep.c
> > index 2c727ef49..d48846f40 100644
> > --- a/builtin/grep.c
> > +++ b/builtin/grep.c
> > @@ -207,6 +207,12 @@ static void start_threads(struct grep_opt *opt)
> >  {
> > int i;
> >  
> > +   /*
> > +* trigger want_color() for its side effect of caching the result;
> > +* otherwise the threads will fight over setting the cache
> > +*/
> > +   want_color(GIT_COLOR_AUTO);
> > +
> > pthread_mutex_init(_mutex, NULL);
> > pthread_mutex_init(_read_mutex, NULL);
> > pthread_mutex_init(_attr_mutex, NULL);
> > 
> > But the problem persists even with that patch, so it is something else.
> > It may still be a threading problem; -fsanitize=thread isn't perfect. I
> > also couldn't get the stress-test to fail when compiled with it. But
> > that may simply mean that the timing of the resulting binary is changed
> > enough not to trigger the issue.
> > 
> > -Peff
> 
> With you're stress script I'm able to see the failures.  The interesting
> thing is that the entry missing is always from the non-submodule file.

So I couldn't find a race condition in the code.  I tracked the problem
to grep_source_load_file which attempts to run lstat on the file so that
it can read it into a buffer.  The lstat call fails with ENOENT (which
conveniently is skipped by the if statement which calls error_errno).  So
for some reason the file cannot be found and read into memory resulting
in nothing being grep'ed for that particular file (since the buffer is
NULL).

Maybe there is something wrong with the way I wrote the tests
themselves?  So I could try rewriting them.  Any other ideas?

-- 
Brandon Williams


[RFC/PATCH v3 08/16] t0400: add test for external odb write support

2016-11-30 Thread Christian Couder
Signed-off-by: Christian Couder 
---
 t/t0400-external-odb.sh | 8 
 1 file changed, 8 insertions(+)

diff --git a/t/t0400-external-odb.sh b/t/t0400-external-odb.sh
index 6c6da5cf4f..3c868cad4c 100755
--- a/t/t0400-external-odb.sh
+++ b/t/t0400-external-odb.sh
@@ -66,4 +66,12 @@ test_expect_success 'helper can add objects to alt repo' '
test "$size" -eq "$alt_size"
 '
 
+test_expect_success 'commit adds objects to alt repo' '
+   test_config odb.magic.command "$HELPER" &&
+   test_commit three &&
+   hash3=$(git ls-tree HEAD | grep three.t | cut -f1 | cut -d\  -f3) &&
+   content=$(cd alt-repo && git show "$hash3") &&
+   test "$content" = "three"
+'
+
 test_done
-- 
2.11.0.rc2.37.geb49ca6



[RFC/PATCH v3 04/16] t0400: add 'put' command to odb-helper script

2016-11-30 Thread Christian Couder
Signed-off-by: Christian Couder 
---
 t/t0400-external-odb.sh | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/t/t0400-external-odb.sh b/t/t0400-external-odb.sh
index fe85413725..0f1bb97f38 100755
--- a/t/t0400-external-odb.sh
+++ b/t/t0400-external-odb.sh
@@ -7,6 +7,10 @@ test_description='basic tests for external object databases'
 ALT_SOURCE="$PWD/alt-repo/.git"
 export ALT_SOURCE
 write_script odb-helper <<\EOF
+die() {
+   printf >&2 "%s\n" "$@"
+   exit 1
+}
 GIT_DIR=$ALT_SOURCE; export GIT_DIR
 case "$1" in
 have)
@@ -16,6 +20,16 @@ have)
 get)
cat "$GIT_DIR"/objects/$(echo $2 | sed 's#..#&/#')
;;
+put)
+   sha1="$2"
+   size="$3"
+   kind="$4"
+   writen=$(git hash-object -w -t "$kind" --stdin)
+   test "$writen" = "$sha1" || die "bad sha1 passed '$sha1' vs writen 
'$writen'"
+   ;;
+*)
+   die "unknown command '$1'"
+   ;;
 esac
 EOF
 HELPER="\"$PWD\"/odb-helper"
-- 
2.11.0.rc2.37.geb49ca6



Re: [PATCH] difftool.c: mark a file-local symbol with static

2016-11-30 Thread Jeff King
On Wed, Nov 30, 2016 at 10:37:40PM +, Ramsay Jones wrote:

> > Anyway. Those are all options, but I don't think there is any problem
> > with sticking with warning("") for now. It is not the first part of the
> > code that tickles the format-zero-length warning.
> 
> Hmm, well I have been building git for some time with the warning
> enabled without problem.
> 
> [I can see a few examples of 'status_printf_ln(s, c, "%s", "");' in
> git-grep output, so ...]
> 
> I am not suggesting any change here, and was just curious why it
> seemed to be unnecessary now (I knew it was necessary at one time).

I forgot, we ended up reversing course later and silencing them:

  http://public-inbox.org/git/20140505052117.gc6...@sigill.intra.peff.net/

By the rationale of that conversation, we should be doing:

  warning("%s", "");

here.

-Peff


Re: "git add -p ." raises an unexpected "warning: empty strings as pathspecs will be made invalid in upcoming releases. please use . instead if you meant to match all paths"

2016-11-30 Thread Junio C Hamano
Junio C Hamano  forgot to Cc: the author of the
most relevant change to the issue, d426430e6e ("pathspec: warn on
empty strings as pathspec", 2016-06-22).

> Kevin Daudt  writes:
>
>> On Wed, Nov 30, 2016 at 12:31:49PM -0800, Peter Urda wrote:
>>> After upgrading to version 2.11.0 I am getting a warning about empty
>>> strings as pathspecs while using 'patch'
>>> 
>>> - Ran 'git add -p .' from the root of my git repository.
>>> 
>>> - I was able to normally stage my changes, but was presented with a
>>> "warning: empty strings as pathspecs will be made invalid in upcoming
>>> releases. please use . instead if you meant to match all paths"
>>> message.
>>> 
>>> - I expected no warning message since I included a "." with my original 
>>> command.
>>> 
>>> I believe that I should not be seeing this warning message as I
>>> included the requested "." pathspec.
>
> Yes, this seems to be caused by pathspec.c::prefix_pathspec()
> overwriting the original pathspec "." into "".  The callchain
> looks like this:
>
> builtin/add.c::interactive_add()
>  -> parse_pathspec()
> passes argv[] that has "." to the caller,
> receives pathspec whose pathspec->items[].original
>   is supposed to point at the unmolested original,
> but prefix_pathspec() munges "." into ""
>  -> run_add_interactive()
> which runs "git add--interactive" with
>   pathspec->items[].original as pathspecs
>
>
> Perhaps this would work it around, but there should be a better way
> to fix it (like, making sure that what we call "original" indeed
> stays "original").
>
>  builtin/add.c | 13 +++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/builtin/add.c b/builtin/add.c
> index e8fb80b36e..137097192d 100644
> --- a/builtin/add.c
> +++ b/builtin/add.c
> @@ -167,9 +167,18 @@ int run_add_interactive(const char *revision, const char 
> *patch_mode,
>   if (revision)
>   argv_array_push(, revision);
>   argv_array_push(, "--");
> - for (i = 0; i < pathspec->nr; i++)
> + for (i = 0; i < pathspec->nr; i++) {
>   /* pass original pathspec, to be re-parsed */
> + if (!*pathspec->items[i].original) {
> + /*
> +  * work around a misfeature in parse_pathspecs()
> +  * that munges "." into "".
> +  */
> + argv_array_push(, ".");
> + continue;
> + }
>   argv_array_push(, pathspec->items[i].original);
> + }
>  
>   status = run_command_v_opt(argv.argv, RUN_GIT_CMD);
>   argv_array_clear();
> @@ -180,7 +189,7 @@ int interactive_add(int argc, const char **argv, const 
> char *prefix, int patch)
>  {
>   struct pathspec pathspec;
>  
> - parse_pathspec(, 0,
> + parse_pathspec(, 0,
>  PATHSPEC_PREFER_FULL |
>  PATHSPEC_SYMLINK_LEADING_PATH |
>  PATHSPEC_PREFIX_ORIGIN,


Re: [PATCH] difftool.c: mark a file-local symbol with static

2016-11-30 Thread Ramsay Jones


On 30/11/16 21:25, Jeff King wrote:
> On Wed, Nov 30, 2016 at 12:40:25PM -0800, Junio C Hamano wrote:
> 
>> Ramsay Jones  writes:
>>
>>> [I have fixed my config.mak file now, so I don't see the warning
>>> anymore! Having -Wno-format-zero-length in DEVELOPER_CFLAGS, or
>>> not, is a separate matter.]
>>
>> I suspect that 658df95a4a ("add DEVELOPER makefile knob to check for
>> acknowledged warnings", 2016-02-25) took it from me (namely, Make
>> script in my 'todo' branch).  In turn, I added it to my set of flags
>> in order to squelch this exact warning, so...
> 
> For anybody interested in the history, we started using this when
> status_printf() got the format attribute. Relevant patch and discussion:
> 
>   
> http://public-inbox.org/git/20130710002328.gc19...@sigill.intra.peff.net/T/#u

Ah, thank you! I've been trying to remember this for the last hour or so ...
(I misremembered something about gettext, and went looking in the wrong
direction ;-) ).

> We went with disabling the warning because it really is wrong. It makes
> an assumption that calling a format function with an empty string
> doesn't do anything, but that's only true of the stock printf functions.
> Our custom functions _do_ have a side effect in this case.

Yes, I agree, gcc is making an unwarranted assumption.

> The other options are to have a special function for "print a warning
> (or status) line with no content". Or to teach those functions to handle
> newlines specially. We've often discussed that you should be able to do:
> 
>   warning("foo\nbar");
> 
> and have it print:
> 
>   warning: foo
>   warning: bar
> 
> That's useful in itself, and would probably make cases like this easier
> to handle, too. But it's a pretty big change. Another option would be to
> just teach formatting functions to handle a single "\n" as a synonym for
> the empty string (or even detect trailing newlines and avoid appending
> our own in that case). That would mean you could do:
> 
>   warning("\n");
> 
> to print a blank line. That's arguably more obvious about the intent to
> a reader (I say arguably because the new behavior _is_ subtle if you
> happen to know that warning() usually appends a newline).

Yes, I remember the discussion now and agree that this could
cause problems.

> Anyway. Those are all options, but I don't think there is any problem
> with sticking with warning("") for now. It is not the first part of the
> code that tickles the format-zero-length warning.

Hmm, well I have been building git for some time with the warning
enabled without problem.

[I can see a few examples of 'status_printf_ln(s, c, "%s", "");' in
git-grep output, so ...]

I am not suggesting any change here, and was just curious why it
seemed to be unnecessary now (I knew it was necessary at one time).

ATB,
Ramsay Jones



Re: [RFC/PATCH v3 00/16] Add initial experimental external ODB support

2016-11-30 Thread Junio C Hamano
Christian Couder  writes:

> For now there should be one odb ref per blob. Each ref name should be
> refs/odbs// where  is the sha1 of the blob stored
> in the external odb named .
>
> These odb refs should all point to a blob that should be stored in the
> Git repository and contain information about the blob stored in the
> external odb. This information can be specific to the external odb.
> The repos can then share this information using commands like:
>
> `git fetch origin "refs/odbs//*:refs/odbs//*"`

Unless this is designed to serve only a handful of blobs, I cannot
see how this design would scale successfully.  I notice you wrote
"For now" at the beginning, but what is the envisioned way this will
evolve in the future?



Re: [PATCHv2 4/4] submodule: add embed-git-dir function

2016-11-30 Thread Junio C Hamano
Stefan Beller  writes:

> On Wed, Nov 30, 2016 at 1:39 PM, Junio C Hamano  wrote:
>> Stefan Beller  writes:
>>
>>> git relocate-git-dir (--into-workingtree|--into-gitdir) \
>>
>> I am not sure if you meant this as a submodule-specific subcommand
>> or more general helper.  "into-workingtree" suggests to me that it
>> is submodule specific, so I'll base my response on that assumption.
>>
>> Would there ever be a situation where you already have submodule
>> repositories in the right place (according to the more modern
>> practice, to keep them in .git/modules/ of superproject) and want to
>> move them to embed them in worktrees of submodules?  I do not think
>> of any.
>
>  "Hi, I made a mistake by using submodules. I don't want to use
>   them any more, I rather want to:
>   A) make it a separate git repo again and I'll keep them in sync myself
>   B) ... "

OK, I can buy that.  Thanks.


[RFC/PATCH v3 12/16] lib-httpd: add upload.sh

2016-11-30 Thread Christian Couder
This cgi will be used to upload objects to, or to delete
objects from, an apache web server.

This way the apache server can work as an external object
database.

Signed-off-by: Christian Couder 
---
 t/lib-httpd.sh|  1 +
 t/lib-httpd/upload.sh | 45 +
 2 files changed, 46 insertions(+)
 create mode 100644 t/lib-httpd/upload.sh

diff --git a/t/lib-httpd.sh b/t/lib-httpd.sh
index 2e659a8ee2..d80b004549 100644
--- a/t/lib-httpd.sh
+++ b/t/lib-httpd.sh
@@ -132,6 +132,7 @@ prepare_httpd() {
cp "$TEST_PATH"/passwd "$HTTPD_ROOT_PATH"
install_script broken-smart-http.sh
install_script error.sh
+   install_script upload.sh
 
ln -s "$LIB_HTTPD_MODULE_PATH" "$HTTPD_ROOT_PATH/modules"
 
diff --git a/t/lib-httpd/upload.sh b/t/lib-httpd/upload.sh
new file mode 100644
index 00..172be0f73f
--- /dev/null
+++ b/t/lib-httpd/upload.sh
@@ -0,0 +1,45 @@
+#!/bin/sh
+
+# In part from 
http://codereview.stackexchange.com/questions/79549/bash-cgi-upload-file
+
+FILES_DIR="www/files"
+
+OLDIFS="$IFS"
+IFS='&'
+set -- $QUERY_STRING
+IFS="$OLDIFS"
+
+while test $# -gt 0
+do
+key=${1%=*}
+val=${1#*=}
+
+case "$key" in
+   "sha1") sha1="$val" ;;
+   "type") type="$val" ;;
+   "size") size="$val" ;;
+   "delete") delete=1 ;;
+   *) echo >&2 "unknown key '$key'" ;;
+esac
+
+shift
+done
+
+case "$REQUEST_METHOD" in
+  POST)
+if test "$delete" = "1"
+then
+   rm -f "$FILES_DIR/$sha1-$size-$type"
+else
+   mkdir -p "$FILES_DIR"
+   cat >"$FILES_DIR/$sha1-$size-$type"
+fi
+
+echo 'Status: 204 No Content'
+echo
+;;
+
+  *)
+echo 'Status: 405 Method Not Allowed'
+echo
+esac
-- 
2.11.0.rc2.37.geb49ca6



[RFC/PATCH v3 02/16] external odb foreach

2016-11-30 Thread Christian Couder
From: Jeff King 

---
 external-odb.c | 14 ++
 external-odb.h |  6 ++
 odb-helper.c   | 15 +++
 odb-helper.h   |  4 
 4 files changed, 39 insertions(+)

diff --git a/external-odb.c b/external-odb.c
index 1ccfa99a01..42978a3298 100644
--- a/external-odb.c
+++ b/external-odb.c
@@ -113,3 +113,17 @@ int external_odb_fetch_object(const unsigned char *sha1)
 
return -1;
 }
+
+int external_odb_for_each_object(each_external_object_fn fn, void *data)
+{
+   struct odb_helper *o;
+
+   external_odb_init();
+
+   for (o = helpers; o; o = o->next) {
+   int r = odb_helper_for_each_object(o, fn, data);
+   if (r)
+   return r;
+   }
+   return 0;
+}
diff --git a/external-odb.h b/external-odb.h
index 2397477684..cea8570a49 100644
--- a/external-odb.h
+++ b/external-odb.h
@@ -5,4 +5,10 @@ const char *external_odb_root(void);
 int external_odb_has_object(const unsigned char *sha1);
 int external_odb_fetch_object(const unsigned char *sha1);
 
+typedef int (*each_external_object_fn)(const unsigned char *sha1,
+  enum object_type type,
+  unsigned long size,
+  void *data);
+int external_odb_for_each_object(each_external_object_fn, void *);
+
 #endif /* EXTERNAL_ODB_H */
diff --git a/odb-helper.c b/odb-helper.c
index 244bc86792..2db59caa53 100644
--- a/odb-helper.c
+++ b/odb-helper.c
@@ -237,3 +237,18 @@ int odb_helper_fetch_object(struct odb_helper *o, const 
unsigned char *sha1,
 
return 0;
 }
+
+int odb_helper_for_each_object(struct odb_helper *o,
+  each_external_object_fn fn,
+  void *data)
+{
+   int i;
+   for (i = 0; i < o->have_nr; i++) {
+   struct odb_helper_object *obj = >have[i];
+   int r = fn(obj->sha1, obj->type, obj->size, data);
+   if (r)
+   return r;
+   }
+
+   return 0;
+}
diff --git a/odb-helper.h b/odb-helper.h
index 0f704f9452..8c3916d215 100644
--- a/odb-helper.h
+++ b/odb-helper.h
@@ -1,6 +1,8 @@
 #ifndef ODB_HELPER_H
 #define ODB_HELPER_H
 
+#include "external-odb.h"
+
 struct odb_helper {
const char *name;
const char *cmd;
@@ -21,5 +23,7 @@ struct odb_helper *odb_helper_new(const char *name, int 
namelen);
 int odb_helper_has_object(struct odb_helper *o, const unsigned char *sha1);
 int odb_helper_fetch_object(struct odb_helper *o, const unsigned char *sha1,
int fd);
+int odb_helper_for_each_object(struct odb_helper *o,
+  each_external_object_fn, void *);
 
 #endif /* ODB_HELPER_H */
-- 
2.11.0.rc2.37.geb49ca6



Re: "git add -p ." raises an unexpected "warning: empty strings as pathspecs will be made invalid in upcoming releases. please use . instead if you meant to match all paths"

2016-11-30 Thread Junio C Hamano
Kevin Daudt  writes:

> On Wed, Nov 30, 2016 at 12:31:49PM -0800, Peter Urda wrote:
>> After upgrading to version 2.11.0 I am getting a warning about empty
>> strings as pathspecs while using 'patch'
>> 
>> - Ran 'git add -p .' from the root of my git repository.
>> 
>> - I was able to normally stage my changes, but was presented with a
>> "warning: empty strings as pathspecs will be made invalid in upcoming
>> releases. please use . instead if you meant to match all paths"
>> message.
>> 
>> - I expected no warning message since I included a "." with my original 
>> command.
>> 
>> I believe that I should not be seeing this warning message as I
>> included the requested "." pathspec.

Yes, this seems to be caused by pathspec.c::prefix_pathspec()
overwriting the original pathspec "." into "".  The callchain
looks like this:

builtin/add.c::interactive_add()
 -> parse_pathspec()
passes argv[] that has "." to the caller,
receives pathspec whose pathspec->items[].original
is supposed to point at the unmolested original,
but prefix_pathspec() munges "." into ""
 -> run_add_interactive()
which runs "git add--interactive" with
pathspec->items[].original as pathspecs


Perhaps this would work it around, but there should be a better way
to fix it (like, making sure that what we call "original" indeed
stays "original").

 builtin/add.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/builtin/add.c b/builtin/add.c
index e8fb80b36e..137097192d 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -167,9 +167,18 @@ int run_add_interactive(const char *revision, const char 
*patch_mode,
if (revision)
argv_array_push(, revision);
argv_array_push(, "--");
-   for (i = 0; i < pathspec->nr; i++)
+   for (i = 0; i < pathspec->nr; i++) {
/* pass original pathspec, to be re-parsed */
+   if (!*pathspec->items[i].original) {
+   /*
+* work around a misfeature in parse_pathspecs()
+* that munges "." into "".
+*/
+   argv_array_push(, ".");
+   continue;
+   }
argv_array_push(, pathspec->items[i].original);
+   }
 
status = run_command_v_opt(argv.argv, RUN_GIT_CMD);
argv_array_clear();
@@ -180,7 +189,7 @@ int interactive_add(int argc, const char **argv, const char 
*prefix, int patch)
 {
struct pathspec pathspec;
 
-   parse_pathspec(, 0,
+   parse_pathspec(, 0,
   PATHSPEC_PREFER_FULL |
   PATHSPEC_SYMLINK_LEADING_PATH |
   PATHSPEC_PREFIX_ORIGIN,


[RFC/PATCH v3 07/16] external-odb: accept only blobs for now

2016-11-30 Thread Christian Couder
Signed-off-by: Christian Couder 
---
 external-odb.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/external-odb.c b/external-odb.c
index bb70fe3298..6dd7b2548b 100644
--- a/external-odb.c
+++ b/external-odb.c
@@ -133,6 +133,10 @@ int external_odb_write_object(const void *buf, unsigned 
long len,
 {
struct odb_helper *o;
 
+   /* For now accept only blobs */
+   if (strcmp(type, "blob"))
+   return 1;
+
external_odb_init();
 
for (o = helpers; o; o = o->next) {
-- 
2.11.0.rc2.37.geb49ca6



[RFC/PATCH v3 11/16] lib-httpd: pass config file to start_httpd()

2016-11-30 Thread Christian Couder
This makes it possible to start an apache web server with different
config files.

This will be used in a later patch to pass a config file that makes
apache store external objects.

Signed-off-by: Christian Couder 
---
 t/lib-httpd.sh | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/t/lib-httpd.sh b/t/lib-httpd.sh
index 435a37465a..2e659a8ee2 100644
--- a/t/lib-httpd.sh
+++ b/t/lib-httpd.sh
@@ -171,12 +171,14 @@ prepare_httpd() {
 }
 
 start_httpd() {
+   APACHE_CONF_FILE=${1-apache.conf}
+
prepare_httpd >&3 2>&4
 
trap 'code=$?; stop_httpd; (exit $code); die' EXIT
 
"$LIB_HTTPD_PATH" -d "$HTTPD_ROOT_PATH" \
-   -f "$TEST_PATH/apache.conf" $HTTPD_PARA \
+   -f "$TEST_PATH/$APACHE_CONF_FILE" $HTTPD_PARA \
-c "Listen 127.0.0.1:$LIB_HTTPD_PORT" -k start \
>&3 2>&4
if test $? -ne 0
@@ -191,7 +193,7 @@ stop_httpd() {
trap 'die' EXIT
 
"$LIB_HTTPD_PATH" -d "$HTTPD_ROOT_PATH" \
-   -f "$TEST_PATH/apache.conf" $HTTPD_PARA -k stop
+   -f "$TEST_PATH/$APACHE_CONF_FILE" $HTTPD_PARA -k stop
 }
 
 test_http_push_nonff () {
-- 
2.11.0.rc2.37.geb49ca6



[RFC/PATCH v3 03/16] t0400: use --batch-all-objects to get all objects

2016-11-30 Thread Christian Couder
Signed-off-by: Christian Couder 
---
 t/t0400-external-odb.sh | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/t/t0400-external-odb.sh b/t/t0400-external-odb.sh
index 2b016173a0..fe85413725 100755
--- a/t/t0400-external-odb.sh
+++ b/t/t0400-external-odb.sh
@@ -10,9 +10,7 @@ write_script odb-helper <<\EOF
 GIT_DIR=$ALT_SOURCE; export GIT_DIR
 case "$1" in
 have)
-   git rev-list --all --objects |
-   cut -d' ' -f1 |
-   git cat-file --batch-check |
+   git cat-file --batch-check --batch-all-objects |
awk '{print $1 " " $3 " " $2}'
;;
 get)
-- 
2.11.0.rc2.37.geb49ca6



[RFC/PATCH v3 13/16] lib-httpd: add list.sh

2016-11-30 Thread Christian Couder
This cgi script can list Git objects that have been uploaded as
files to an apache web server. This script can also retrieve
the content of each of these files.

This will help make apache work as an external object database.

Signed-off-by: Christian Couder 
---
 t/lib-httpd.sh  |  1 +
 t/lib-httpd/list.sh | 34 ++
 2 files changed, 35 insertions(+)
 create mode 100644 t/lib-httpd/list.sh

diff --git a/t/lib-httpd.sh b/t/lib-httpd.sh
index d80b004549..f31ea261f5 100644
--- a/t/lib-httpd.sh
+++ b/t/lib-httpd.sh
@@ -133,6 +133,7 @@ prepare_httpd() {
install_script broken-smart-http.sh
install_script error.sh
install_script upload.sh
+   install_script list.sh
 
ln -s "$LIB_HTTPD_MODULE_PATH" "$HTTPD_ROOT_PATH/modules"
 
diff --git a/t/lib-httpd/list.sh b/t/lib-httpd/list.sh
new file mode 100644
index 00..a54402558f
--- /dev/null
+++ b/t/lib-httpd/list.sh
@@ -0,0 +1,34 @@
+#!/bin/sh
+
+FILES_DIR="www/files"
+
+OLDIFS="$IFS"
+IFS='&'
+set -- $QUERY_STRING
+IFS="$OLDIFS"
+
+while test $# -gt 0
+do
+key=${1%=*}
+val=${1#*=}
+
+case "$key" in
+   "sha1") sha1="$val" ;;
+   *) echo >&2 "unknown key '$key'" ;;
+esac
+
+shift
+done
+
+echo 'Status: 200 OK'
+echo
+
+if test -d "$FILES_DIR"
+then
+if test -n "$sha1"
+then
+   cat "$FILES_DIR/$sha1"-*
+else
+   ls "$FILES_DIR" | tr '-' ' '
+fi
+fi
-- 
2.11.0.rc2.37.geb49ca6



[RFC/PATCH v3 10/16] Add t0410 to test external ODB transfer

2016-11-30 Thread Christian Couder
Signed-off-by: Christian Couder 
---
 t/t0410-transfer-e-odb.sh | 136 ++
 1 file changed, 136 insertions(+)
 create mode 100755 t/t0410-transfer-e-odb.sh

diff --git a/t/t0410-transfer-e-odb.sh b/t/t0410-transfer-e-odb.sh
new file mode 100755
index 00..868b55db94
--- /dev/null
+++ b/t/t0410-transfer-e-odb.sh
@@ -0,0 +1,136 @@
+#!/bin/sh
+
+test_description='basic tests for transfering external ODBs'
+
+. ./test-lib.sh
+
+ORIG_SOURCE="$PWD/.git"
+export ORIG_SOURCE
+
+ALT_SOURCE1="$PWD/alt-repo1/.git"
+export ALT_SOURCE1
+write_script odb-helper1 <<\EOF
+die() {
+   printf >&2 "%s\n" "$@"
+   exit 1
+}
+GIT_DIR=$ALT_SOURCE1; export GIT_DIR
+case "$1" in
+have)
+   git cat-file --batch-check --batch-all-objects |
+   awk '{print $1 " " $3 " " $2}'
+   ;;
+get)
+   cat "$GIT_DIR"/objects/$(echo $2 | sed 's#..#&/#')
+   ;;
+put)
+   sha1="$2"
+   size="$3"
+   kind="$4"
+   writen=$(git hash-object -w -t "$kind" --stdin)
+   test "$writen" = "$sha1" || die "bad sha1 passed '$sha1' vs writen 
'$writen'"
+   ref_hash=$(echo "$sha1 $size $kind" | GIT_DIR=$ORIG_SOURCE 
GIT_NO_EXTERNAL_ODB=1 git hash-object -w -t blob --stdin) || exit
+   GIT_DIR=$ORIG_SOURCE git update-ref refs/odbs/magic/"$sha1" "$ref_hash"
+   ;;
+*)
+   die "unknown command '$1'"
+   ;;
+esac
+EOF
+HELPER1="\"$PWD\"/odb-helper1"
+
+OTHER_SOURCE="$PWD/.git"
+export OTHER_SOURCE
+
+ALT_SOURCE2="$PWD/alt-repo2/.git"
+export ALT_SOURCE2
+write_script odb-helper2 <<\EOF
+die() {
+   printf >&2 "%s\n" "$@"
+   exit 1
+}
+GIT_DIR=$ALT_SOURCE2; export GIT_DIR
+case "$1" in
+have)
+   GIT_DIR=$OTHER_SOURCE git for-each-ref --format='%(objectname)' 
refs/odbs/magic/ | GIT_DIR=$OTHER_SOURCE xargs git show
+   ;;
+get)
+   OBJ_FILE="$GIT_DIR"/objects/$(echo $2 | sed 's#..#&/#')
+   if ! test -f "$OBJ_FILE"
+   then
+   # "Download" the missing object by copying it from alt-repo1
+   OBJ_DIR=$(echo $2 | sed 's/\(..\).*/\1/')
+   OBJ_BASE=$(basename "$OBJ_FILE")
+   ALT_OBJ_DIR1="$ALT_SOURCE1/objects/$OBJ_DIR"
+   ALT_OBJ_DIR2="$ALT_SOURCE2/objects/$OBJ_DIR"
+   mkdir -p "$ALT_OBJ_DIR2" || die "Could not mkdir 
'$ALT_OBJ_DIR2'"
+   OBJ_SRC="$ALT_OBJ_DIR1/$OBJ_BASE"
+   cp "$OBJ_SRC" "$ALT_OBJ_DIR2" ||
+   die "Could not cp '$OBJ_SRC' into '$ALT_OBJ_DIR2'"
+   fi
+   cat "$OBJ_FILE" || die "Could not cat '$OBJ_FILE'"
+   ;;
+put)
+   sha1="$2"
+   size="$3"
+   kind="$4"
+   writen=$(git hash-object -w -t "$kind" --stdin)
+   test "$writen" = "$sha1" || die "bad sha1 passed '$sha1' vs writen 
'$writen'"
+   ref_hash=$(echo "$sha1 $size $kind" | GIT_DIR=$OTHER_SOURCE 
GIT_NO_EXTERNAL_ODB=1 git hash-object -w -t blob --stdin) || exit
+   GIT_DIR=$OTHER_SOURCE git update-ref refs/odbs/magic/"$sha1" "$ref_hash"
+   ;;
+*)
+   die "unknown command '$1'"
+   ;;
+esac
+EOF
+HELPER2="\"$PWD\"/odb-helper2"
+
+test_expect_success 'setup first alternate repo' '
+   git init alt-repo1 &&
+   test_commit zero &&
+   git config odb.magic.command "$HELPER1"
+'
+
+test_expect_success 'setup other repo and its alternate repo' '
+   git init other-repo &&
+   git init alt-repo2 &&
+   (cd other-repo &&
+git remote add origin .. &&
+git pull origin master &&
+git checkout master &&
+git log)
+'
+
+test_expect_success 'new blobs are put in first object store' '
+   test_commit one &&
+   hash1=$(git ls-tree HEAD | grep one.t | cut -f1 | cut -d\  -f3) &&
+   content=$(cd alt-repo1 && git show "$hash1") &&
+   test "$content" = "one" &&
+   test_commit two &&
+   hash2=$(git ls-tree HEAD | grep two.t | cut -f1 | cut -d\  -f3) &&
+   content=$(cd alt-repo1 && git show "$hash2") &&
+   test "$content" = "two"
+'
+
+test_expect_success 'other repo gets the blobs from object store' '
+   (cd other-repo &&
+git fetch origin "refs/odbs/magic/*:refs/odbs/magic/*" &&
+test_must_fail git cat-file blob "$hash1" &&
+test_must_fail git cat-file blob "$hash2" &&
+git config odb.magic.command "$HELPER2" &&
+git cat-file blob "$hash1" &&
+git cat-file blob "$hash2"
+   )
+'
+
+test_expect_success 'other repo gets everything else' '
+   (cd other-repo &&
+git fetch origin &&
+content=$(git show "$hash1") &&
+test "$content" = "one" &&
+content=$(git show "$hash2") &&
+test "$content" = "two")
+'
+
+test_done
-- 
2.11.0.rc2.37.geb49ca6



[RFC/PATCH v3 09/16] Add GIT_NO_EXTERNAL_ODB env variable

2016-11-30 Thread Christian Couder
Signed-off-by: Christian Couder 
---
 cache.h| 9 +
 environment.c  | 4 
 external-odb.c | 6 ++
 sha1_file.c| 3 +++
 4 files changed, 22 insertions(+)

diff --git a/cache.h b/cache.h
index b419b9b7ce..503b618a1f 100644
--- a/cache.h
+++ b/cache.h
@@ -422,6 +422,7 @@ static inline enum object_type object_type(unsigned int 
mode)
 #define CEILING_DIRECTORIES_ENVIRONMENT "GIT_CEILING_DIRECTORIES"
 #define NO_REPLACE_OBJECTS_ENVIRONMENT "GIT_NO_REPLACE_OBJECTS"
 #define GIT_REPLACE_REF_BASE_ENVIRONMENT "GIT_REPLACE_REF_BASE"
+#define NO_EXTERNAL_ODB_ENVIRONMENT "GIT_NO_EXTERNAL_ODB"
 #define GITATTRIBUTES_FILE ".gitattributes"
 #define INFOATTRIBUTES_FILE "info/attributes"
 #define ATTRIBUTE_MACRO_PREFIX "[attr]"
@@ -698,6 +699,14 @@ void reset_shared_repository(void);
 extern int check_replace_refs;
 extern char *git_replace_ref_base;
 
+/*
+ * Do external odbs need to be used this run?  This variable is
+ * initialized to true unless $GIT_NO_EXTERNAL_ODB is set, but it
+ * maybe set to false by some commands that do not want external
+ * odbs to be active.
+ */
+extern int use_external_odb;
+
 extern int fsync_object_files;
 extern int core_preload_index;
 extern int core_apply_sparse_checkout;
diff --git a/environment.c b/environment.c
index 0935ec696e..8aecdd0544 100644
--- a/environment.c
+++ b/environment.c
@@ -47,6 +47,7 @@ const char *excludes_file;
 enum auto_crlf auto_crlf = AUTO_CRLF_FALSE;
 int check_replace_refs = 1;
 char *git_replace_ref_base;
+int use_external_odb = 1;
 enum eol core_eol = EOL_UNSET;
 enum safe_crlf safe_crlf = SAFE_CRLF_WARN;
 unsigned whitespace_rule_cfg = WS_DEFAULT_RULE;
@@ -120,6 +121,7 @@ const char * const local_repo_env[] = {
INDEX_ENVIRONMENT,
NO_REPLACE_OBJECTS_ENVIRONMENT,
GIT_REPLACE_REF_BASE_ENVIRONMENT,
+   NO_EXTERNAL_ODB_ENVIRONMENT,
GIT_PREFIX_ENVIRONMENT,
GIT_SUPER_PREFIX_ENVIRONMENT,
GIT_SHALLOW_FILE_ENVIRONMENT,
@@ -185,6 +187,8 @@ static void setup_git_env(void)
replace_ref_base = getenv(GIT_REPLACE_REF_BASE_ENVIRONMENT);
git_replace_ref_base = xstrdup(replace_ref_base ? replace_ref_base
  : "refs/replace/");
+   if (getenv(NO_EXTERNAL_ODB_ENVIRONMENT))
+   use_external_odb = 0;
namespace = expand_namespace(getenv(GIT_NAMESPACE_ENVIRONMENT));
namespace_len = strlen(namespace);
shallow_file = getenv(GIT_SHALLOW_FILE_ENVIRONMENT);
diff --git a/external-odb.c b/external-odb.c
index 6dd7b2548b..a980fbfbf2 100644
--- a/external-odb.c
+++ b/external-odb.c
@@ -63,6 +63,9 @@ int external_odb_has_object(const unsigned char *sha1)
 {
struct odb_helper *o;
 
+   if (!use_external_odb)
+   return 0;
+
external_odb_init();
 
for (o = helpers; o; o = o->next)
@@ -133,6 +136,9 @@ int external_odb_write_object(const void *buf, unsigned 
long len,
 {
struct odb_helper *o;
 
+   if (!use_external_odb)
+   return 1;
+
/* For now accept only blobs */
if (strcmp(type, "blob"))
return 1;
diff --git a/sha1_file.c b/sha1_file.c
index 3532c1c598..92f1244205 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -556,6 +556,9 @@ void prepare_external_alt_odb(void)
static int linked_external;
const char *path;
 
+   if (!use_external_odb)
+   return;
+
if (linked_external)
return;
 
-- 
2.11.0.rc2.37.geb49ca6



Re: [PATCHv2 4/4] submodule: add embed-git-dir function

2016-11-30 Thread Stefan Beller
On Wed, Nov 30, 2016 at 1:39 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> git relocate-git-dir (--into-workingtree|--into-gitdir) \
>
> I am not sure if you meant this as a submodule-specific subcommand
> or more general helper.  "into-workingtree" suggests to me that it
> is submodule specific, so I'll base my response on that assumption.
>
> Would there ever be a situation where you already have submodule
> repositories in the right place (according to the more modern
> practice, to keep them in .git/modules/ of superproject) and want to
> move them to embed them in worktrees of submodules?  I do not think
> of any.

 "Hi, I made a mistake by using submodules. I don't want to use
  them any more, I rather want to:
  A) make it a separate git repo again and I'll keep them in sync myself
  B) ... "

 "I abuse submodules for what git-LFS was designed for, and the
  submodule is on a different mount point, please keep the git directory
  also at that mount point".

Not sure I agree these problems and the proposed solutions are beautiful,
but that is what people may think of as a fast hack?

>
> If there is no such situation, I do not think we want a verb that is
> direction-neutral (e.g. "move" or "relocate") with two options.
> Rather we would want "git submodule unembed-git-dir" or something
> like that.

So when we want to have a generic function in C ("relocate_gitdir")
for both worktree and submodules, the recursive flag is not supposed
to invoke a submodule specific helper, but a generic helper.

Alternatively we make the function not as generic and claim the
recursive part is submodule specific and we can happily call
"git submodule [un]embed-git-dir" recursively.


CVSImport - spaces in CVS path

2016-11-30 Thread Yojoa
I'm in the process of moving an entire collection of cvs modules into git.
I'm working in Mac Yosemite. Everything is working fine except for one
thing. A couple of the CVS modules have spaces in the paths. Below is what
my command line looks like. When the path has spaces I've tried putting it
in single and double quotes and using escape characters.  None of that
matters. I always get a message that it can't read dir/dir/path and then
messages that it can't find the modules "with" or "spaces".  

git cvsimport -v -d :pserver:MYLOGIN:/usr/local/cvsroot/
dir/dir/PathWithNoSpaces/dir

git cvsimport -v -d :pserver:MYLOGIN:/usr/local/cvsroot/ dir/dir/path with
spaces/dir





--
View this message in context: 
http://git.661346.n2.nabble.com/CVSImport-spaces-in-CVS-path-tp7657459.html
Sent from the git mailing list archive at Nabble.com.


Re: [PATCHv2 4/4] submodule: add embed-git-dir function

2016-11-30 Thread Junio C Hamano
Stefan Beller  writes:

> git relocate-git-dir (--into-workingtree|--into-gitdir) \

I am not sure if you meant this as a submodule-specific subcommand
or more general helper.  "into-workingtree" suggests to me that it
is submodule specific, so I'll base my response on that assumption.

Would there ever be a situation where you already have submodule
repositories in the right place (according to the more modern
practice, to keep them in .git/modules/ of superproject) and want to
move them to embed them in worktrees of submodules?  I do not think
of any.

If there is no such situation, I do not think we want a verb that is
direction-neutral (e.g. "move" or "relocate") with two options.
Rather we would want "git submodule unembed-git-dir" or something
like that.


Re: [PATCH v2] tag, branch, for-each-ref: add --ignore-case for sorting and filtering

2016-11-30 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy   writes:

> This option makes sorting ignore case, which is great when you have
> branches named bug-12-do-something, Bug-12-do-some-more and
> BUG-12-do-what and want to group them together. Sorting externally may
> not be an option because we lose coloring and column layout from
> git-branch and git-tag.
>
> The same could be said for filtering, but it's probably less important
> because you can always go with the ugly pattern [bB][uU][gG]-* if you're
> desperate.

But of course --ignore-case is of course much easier.

> You can't have case-sensitive filtering and case-insensitive sorting (or
> the other way around) with this though. But who would want that?

I do not feel uncomfortable declaring that the answer to that
question is "nobody" ;-)

> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
>  v2 has a different approach, and I think it's a better one even with
>  that unanswered question above.

Yeah, I think this would be easier to use.

> @@ -512,15 +512,6 @@ static void print_ref_list(struct ref_filter *filter, 
> struct ref_sorting *sortin
>   if (filter->verbose)
>   maxwidth = calc_maxwidth(, strlen(remote_prefix));
>  
> - /*
> -  * If no sorting parameter is given then we default to sorting
> -  * by 'refname'. This would give us an alphabetically sorted
> -  * array with the 'HEAD' ref at the beginning followed by
> -  * local branches 'refs/heads/...' and finally remote-tacking
> -  * branches 'refs/remotes/...'.
> -  */
> - if (!sorting)
> - sorting = ref_default_sorting();

So it is now a BUG() to give sorting==NULL to this function, which
is OK and I do not think we even need an assert() for it (i.e. the
code with the patch looks good).

> @@ -744,6 +739,16 @@ int cmd_branch(int argc, const char **argv, const char 
> *prefix)
>   if ((filter.kind & FILTER_REFS_BRANCHES) && filter.detached)
>   filter.kind |= FILTER_REFS_DETACHED_HEAD;
>   filter.name_patterns = argv;
> + /*
> +  * If no sorting parameter is given then we default to sorting
> +  * by 'refname'. This would give us an alphabetically sorted
> +  * array with the 'HEAD' ref at the beginning followed by
> +  * local branches 'refs/heads/...' and finally remote-tacking
> +  * branches 'refs/remotes/...'.
> +  */
> + if (!sorting)
> + sorting = ref_default_sorting();
> + sorting->ignore_case = icase;
>   print_ref_list(, sorting);
>   print_columns(, colopts, NULL);
>   string_list_clear(, 0);

... and the fallback is moved to the caller, which makes sense.

> diff --git a/ref-filter.c b/ref-filter.c
> index f5f7a70..bd98010 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -1231,8 +1231,14 @@ static int commit_contains(struct ref_filter *filter, 
> struct commit *commit)
>   * matches a pattern "refs/heads/mas") or a wildcard (e.g. the same ref
>   * matches "refs/heads/mas*", too).
>   */
> -static int match_pattern(const char **patterns, const char *refname)
> +static int match_pattern(const struct ref_filter *filter, const char 
> *refname)
>  {
> + const char **patterns = filter->name_patterns;
> + unsigned flags = 0;
> +
> + if (filter->ignore_case)
> + flags |= WM_CASEFOLD;
> +

Ahh, OK.  My reading stuttered when seeing "sorting and filtering"
in the option description for "git tag", but this makes perfect
sense.  Good job.

> @@ -1255,9 +1261,15 @@ static int match_pattern(const char **patterns, const 
> char *refname)
>   * matches a pattern "refs/heads/" but not "refs/heads/m") or a
>   * wildcard (e.g. the same ref matches "refs/heads/m*", too).
>   */
> -static int match_name_as_path(const char **pattern, const char *refname)
> +static int match_name_as_path(const struct ref_filter *filter, const char 
> *refname)
>  {
> + const char **pattern = filter->name_patterns;
>   int namelen = strlen(refname);
> + unsigned flags = WM_PATHNAME;
> +
> + if (filter->ignore_case)
> + flags |= WM_CASEFOLD;
> +

Likewise.  Very simple and nicely done.

> @@ -1536,18 +1548,20 @@ static int cmp_ref_sorting(struct ref_sorting *s, 
> struct ref_array_item *a, stru
>   struct atom_value *va, *vb;
>   int cmp;
>   cmp_type cmp_type = used_atom[s->atom].type;
> + int (*cmp_fn)(const char *, const char *);
>  
>   get_ref_atom_value(a, s->atom, );
>   get_ref_atom_value(b, s->atom, );
> + cmp_fn = s->ignore_case ? strcasecmp : strcmp;
>   if (s->version)
>   cmp = versioncmp(va->s, vb->s);
>   else if (cmp_type == FIELD_STR)
> - cmp = strcmp(va->s, vb->s);
> + cmp = cmp_fn(va->s, vb->s);
>   else {
>   if (va->ul < vb->ul)
>   cmp = -1;
>   else 

Re: "git add -p ." raises an unexpected "warning: empty strings as pathspecs will be made invalid in upcoming releases. please use . instead if you meant to match all paths"

2016-11-30 Thread Kevin Daudt
On Wed, Nov 30, 2016 at 12:31:49PM -0800, Peter Urda wrote:
> After upgrading to version 2.11.0 I am getting a warning about empty
> strings as pathspecs while using 'patch'
> 
> - Ran 'git add -p .' from the root of my git repository.
> 
> - I was able to normally stage my changes, but was presented with a
> "warning: empty strings as pathspecs will be made invalid in upcoming
> releases. please use . instead if you meant to match all paths"
> message.
> 
> - I expected no warning message since I included a "." with my original 
> command.
> 
> I believe that I should not be seeing this warning message as I
> included the requested "." pathspec.
> 
> ~ Peter Urda
> 
> http://urda.cc

I can reproduce this. Note that it only happens when you specify '-p'.
Without the --patch option, the warning does not appear.


Re: [PATCH] difftool.c: mark a file-local symbol with static

2016-11-30 Thread Jeff King
On Wed, Nov 30, 2016 at 12:40:25PM -0800, Junio C Hamano wrote:

> Ramsay Jones  writes:
> 
> > [I have fixed my config.mak file now, so I don't see the warning
> > anymore! Having -Wno-format-zero-length in DEVELOPER_CFLAGS, or
> > not, is a separate matter.]
> 
> I suspect that 658df95a4a ("add DEVELOPER makefile knob to check for
> acknowledged warnings", 2016-02-25) took it from me (namely, Make
> script in my 'todo' branch).  In turn, I added it to my set of flags
> in order to squelch this exact warning, so...

For anybody interested in the history, we started using this when
status_printf() got the format attribute. Relevant patch and discussion:

  http://public-inbox.org/git/20130710002328.gc19...@sigill.intra.peff.net/T/#u

We went with disabling the warning because it really is wrong. It makes
an assumption that calling a format function with an empty string
doesn't do anything, but that's only true of the stock printf functions.
Our custom functions _do_ have a side effect in this case.

The other options are to have a special function for "print a warning
(or status) line with no content". Or to teach those functions to handle
newlines specially. We've often discussed that you should be able to do:

  warning("foo\nbar");

and have it print:

  warning: foo
  warning: bar

That's useful in itself, and would probably make cases like this easier
to handle, too. But it's a pretty big change. Another option would be to
just teach formatting functions to handle a single "\n" as a synonym for
the empty string (or even detect trailing newlines and avoid appending
our own in that case). That would mean you could do:

  warning("\n");

to print a blank line. That's arguably more obvious about the intent to
a reader (I say arguably because the new behavior _is_ subtle if you
happen to know that warning() usually appends a newline).

Anyway. Those are all options, but I don't think there is any problem
with sticking with warning("") for now. It is not the first part of the
code that tickles the format-zero-length warning.

-Peff


[RFC/PATCH v3 15/16] odb-helper: add 'store_plain_objects' to 'struct odb_helper'

2016-11-30 Thread Christian Couder
This adds a configuration option odb..plainObjects and the
corresponding boolean variable called 'store_plain_objects' in
'struct odb_helper' to make it possible for external object
databases to store object as plain objects instead of Git objects.

The existing odb_helper_fetch_object() is renamed
odb_helper_fetch_git_object() and a new odb_helper_fetch_plain_object()
is introduce to deal with external objects that are not in Git format.

Signed-off-by: Christian Couder 
---
 external-odb.c |   2 +
 odb-helper.c   | 113 -
 odb-helper.h   |   1 +
 3 files changed, 114 insertions(+), 2 deletions(-)

diff --git a/external-odb.c b/external-odb.c
index a980fbfbf2..af55377281 100644
--- a/external-odb.c
+++ b/external-odb.c
@@ -36,6 +36,8 @@ static int external_odb_config(const char *var, const char 
*value, void *data)
 
if (!strcmp(key, "command"))
return git_config_string(>cmd, var, value);
+   if (!strcmp(key, "plainobjects"))
+   o->store_plain_objects = git_config_bool(var, value);
 
return 0;
 }
diff --git a/odb-helper.c b/odb-helper.c
index 7b7de7380f..6b9fb7927a 100644
--- a/odb-helper.c
+++ b/odb-helper.c
@@ -153,8 +153,107 @@ int odb_helper_has_object(struct odb_helper *o, const 
unsigned char *sha1)
return !!odb_helper_lookup(o, sha1);
 }
 
-int odb_helper_fetch_object(struct odb_helper *o, const unsigned char *sha1,
-   int fd)
+static int odb_helper_fetch_plain_object(struct odb_helper *o,
+const unsigned char *sha1,
+int fd)
+{
+   struct odb_helper_object *obj;
+   struct odb_helper_cmd cmd;
+   unsigned long total_got = 0;
+
+   char hdr[32];
+   int hdrlen;
+
+   int ret = Z_STREAM_END;
+   unsigned char compressed[4096];
+   git_zstream stream;
+   git_SHA_CTX hash;
+   unsigned char real_sha1[20];
+
+   obj = odb_helper_lookup(o, sha1);
+   if (!obj)
+   return -1;
+
+   if (odb_helper_start(o, , 0, "get %s", sha1_to_hex(sha1)) < 0)
+   return -1;
+
+   /* Set it up */
+   git_deflate_init(, zlib_compression_level);
+   stream.next_out = compressed;
+   stream.avail_out = sizeof(compressed);
+   git_SHA1_Init();
+
+   /* First header.. */
+   hdrlen = xsnprintf(hdr, sizeof(hdr), "%s %lu", typename(obj->type), 
obj->size) + 1;
+   stream.next_in = (unsigned char *)hdr;
+   stream.avail_in = hdrlen;
+   while (git_deflate(, 0) == Z_OK)
+   ; /* nothing */
+   git_SHA1_Update(, hdr, hdrlen);
+
+   for (;;) {
+   unsigned char buf[4096];
+   int r;
+
+   r = xread(cmd.child.out, buf, sizeof(buf));
+   if (r < 0) {
+   error("unable to read from odb helper '%s': %s",
+ o->name, strerror(errno));
+   close(cmd.child.out);
+   odb_helper_finish(o, );
+   git_deflate_end();
+   return -1;
+   }
+   if (r == 0)
+   break;
+
+   total_got += r;
+
+   /* Then the data itself.. */
+   stream.next_in = (void *)buf;
+   stream.avail_in = r;
+   do {
+   unsigned char *in0 = stream.next_in;
+   ret = git_deflate(, Z_FINISH);
+   git_SHA1_Update(, in0, stream.next_in - in0);
+   write_or_die(fd, compressed, stream.next_out - 
compressed);
+   stream.next_out = compressed;
+   stream.avail_out = sizeof(compressed);
+   } while (ret == Z_OK);
+   }
+
+   close(cmd.child.out);
+   if (ret != Z_STREAM_END) {
+   warning("bad zlib data from odb helper '%s' for %s",
+   o->name, sha1_to_hex(sha1));
+   return -1;
+   }
+   ret = git_deflate_end_gently();
+   if (ret != Z_OK) {
+   warning("deflateEnd on object %s from odb helper '%s' failed 
(%d)",
+   sha1_to_hex(sha1), o->name, ret);
+   return -1;
+   }
+   git_SHA1_Final(real_sha1, );
+   if (hashcmp(sha1, real_sha1)) {
+   warning("sha1 mismatch from odb helper '%s' for %s (got %s)",
+   o->name, sha1_to_hex(sha1), sha1_to_hex(real_sha1));
+   return -1;
+   }
+   if (odb_helper_finish(o, ))
+   return -1;
+   if (total_got != obj->size) {
+   warning("size mismatch from odb helper '%s' for %s (%lu != 
%lu)",
+   o->name, sha1_to_hex(sha1), total_got, obj->size);
+   return -1;
+   }
+
+   return 0;
+}
+
+static int 

[RFC/PATCH v3 01/16] Add initial external odb support

2016-11-30 Thread Christian Couder
From: Jeff King 

Signed-off-by: Christian Couder 
---
 Makefile|   2 +
 cache.h |   9 ++
 external-odb.c  | 115 +++
 external-odb.h  |   8 ++
 odb-helper.c| 239 
 odb-helper.h|  25 +
 sha1_file.c |  64 ++---
 t/t0400-external-odb.sh |  48 ++
 8 files changed, 495 insertions(+), 15 deletions(-)
 create mode 100644 external-odb.c
 create mode 100644 external-odb.h
 create mode 100644 odb-helper.c
 create mode 100644 odb-helper.h
 create mode 100755 t/t0400-external-odb.sh

diff --git a/Makefile b/Makefile
index f53fcc90d7..88e78da886 100644
--- a/Makefile
+++ b/Makefile
@@ -747,6 +747,7 @@ LIB_OBJS += ewah/ewah_bitmap.o
 LIB_OBJS += ewah/ewah_io.o
 LIB_OBJS += ewah/ewah_rlw.o
 LIB_OBJS += exec_cmd.o
+LIB_OBJS += external-odb.o
 LIB_OBJS += fetch-pack.o
 LIB_OBJS += fsck.o
 LIB_OBJS += gettext.o
@@ -779,6 +780,7 @@ LIB_OBJS += notes-cache.o
 LIB_OBJS += notes-merge.o
 LIB_OBJS += notes-utils.o
 LIB_OBJS += object.o
+LIB_OBJS += odb-helper.o
 LIB_OBJS += pack-bitmap.o
 LIB_OBJS += pack-bitmap-write.o
 LIB_OBJS += pack-check.o
diff --git a/cache.h b/cache.h
index a50a61a197..b419b9b7ce 100644
--- a/cache.h
+++ b/cache.h
@@ -885,6 +885,12 @@ const char *git_path_shallow(void);
 extern const char *sha1_file_name(const unsigned char *sha1);
 
 /*
+ * Like sha1_file_name, but return the filename within a specific alternate
+ * object directory. Shares the same static buffer with sha1_file_name.
+ */
+extern const char *sha1_file_name_alt(const char *objdir, const unsigned char 
*sha1);
+
+/*
  * Return the name of the (local) packfile with the specified sha1 in
  * its name.  The return value is a pointer to memory that is
  * overwritten each time this function is called.
@@ -1135,6 +1141,8 @@ extern int do_check_packed_object_crc;
 
 extern int check_sha1_signature(const unsigned char *sha1, void *buf, unsigned 
long size, const char *type);
 
+extern int create_object_tmpfile(struct strbuf *tmp, const char *filename);
+extern void close_sha1_file(int fd);
 extern int finalize_object_file(const char *tmpfile, const char *filename);
 
 extern int has_sha1_pack(const unsigned char *sha1);
@@ -1409,6 +1417,7 @@ extern void read_info_alternates(const char * 
relative_base, int depth);
 extern char *compute_alternate_path(const char *path, struct strbuf *err);
 typedef int alt_odb_fn(struct alternate_object_database *, void *);
 extern int foreach_alt_odb(alt_odb_fn, void*);
+extern void prepare_external_alt_odb(void);
 
 /*
  * Allocate a "struct alternate_object_database" but do _not_ actually
diff --git a/external-odb.c b/external-odb.c
new file mode 100644
index 00..1ccfa99a01
--- /dev/null
+++ b/external-odb.c
@@ -0,0 +1,115 @@
+#include "cache.h"
+#include "external-odb.h"
+#include "odb-helper.h"
+
+static struct odb_helper *helpers;
+static struct odb_helper **helpers_tail = 
+
+static struct odb_helper *find_or_create_helper(const char *name, int len)
+{
+   struct odb_helper *o;
+
+   for (o = helpers; o; o = o->next)
+   if (!strncmp(o->name, name, len) && !o->name[len])
+   return o;
+
+   o = odb_helper_new(name, len);
+   *helpers_tail = o;
+   helpers_tail = >next;
+
+   return o;
+}
+
+static int external_odb_config(const char *var, const char *value, void *data)
+{
+   struct odb_helper *o;
+   const char *key, *dot;
+
+   if (!skip_prefix(var, "odb.", ))
+   return 0;
+   dot = strrchr(key, '.');
+   if (!dot)
+   return 0;
+
+   o = find_or_create_helper(key, dot - key);
+   key = dot + 1;
+
+   if (!strcmp(key, "command"))
+   return git_config_string(>cmd, var, value);
+
+   return 0;
+}
+
+static void external_odb_init(void)
+{
+   static int initialized;
+
+   if (initialized)
+   return;
+   initialized = 1;
+
+   git_config(external_odb_config, NULL);
+}
+
+const char *external_odb_root(void)
+{
+   static const char *root;
+   if (!root)
+   root = git_pathdup("objects/external");
+   return root;
+}
+
+int external_odb_has_object(const unsigned char *sha1)
+{
+   struct odb_helper *o;
+
+   external_odb_init();
+
+   for (o = helpers; o; o = o->next)
+   if (odb_helper_has_object(o, sha1))
+   return 1;
+   return 0;
+}
+
+int external_odb_fetch_object(const unsigned char *sha1)
+{
+   struct odb_helper *o;
+   const char *path;
+
+   if (!external_odb_has_object(sha1))
+   return -1;
+
+   path = sha1_file_name_alt(external_odb_root(), sha1);
+   safe_create_leading_directories_const(path);
+   prepare_external_alt_odb();
+
+   for (o = helpers; o; o = o->next) {
+   struct strbuf tmpfile = 

[RFC/PATCH v3 14/16] lib-httpd: add apache-e-odb.conf

2016-11-30 Thread Christian Couder
This is an apache config file to test external object databases.
It uses the upload.sh and list.sh cgi that have been added
previously to make apache store external objects.

Signed-off-by: Christian Couder 
---
 t/lib-httpd/apache-e-odb.conf | 214 ++
 1 file changed, 214 insertions(+)
 create mode 100644 t/lib-httpd/apache-e-odb.conf

diff --git a/t/lib-httpd/apache-e-odb.conf b/t/lib-httpd/apache-e-odb.conf
new file mode 100644
index 00..19a1540c82
--- /dev/null
+++ b/t/lib-httpd/apache-e-odb.conf
@@ -0,0 +1,214 @@
+ServerName dummy
+PidFile httpd.pid
+DocumentRoot www
+LogFormat "%h %l %u %t \"%r\" %>s %b" common
+CustomLog access.log common
+ErrorLog error.log
+
+   LoadModule log_config_module modules/mod_log_config.so
+
+
+   LoadModule alias_module modules/mod_alias.so
+
+
+   LoadModule cgi_module modules/mod_cgi.so
+
+
+   LoadModule env_module modules/mod_env.so
+
+
+   LoadModule rewrite_module modules/mod_rewrite.so
+
+
+   LoadModule version_module modules/mod_version.so
+
+
+   LoadModule headers_module modules/mod_headers.so
+
+
+
+LockFile accept.lock
+
+
+
+
+   LoadModule auth_module modules/mod_auth.so
+
+
+
+= 2.1>
+
+   LoadModule auth_basic_module modules/mod_auth_basic.so
+
+
+   LoadModule authn_file_module modules/mod_authn_file.so
+
+
+   LoadModule authz_user_module modules/mod_authz_user.so
+
+
+   LoadModule authz_host_module modules/mod_authz_host.so
+
+
+
+= 2.4>
+
+   LoadModule authn_core_module modules/mod_authn_core.so
+
+
+   LoadModule authz_core_module modules/mod_authz_core.so
+
+
+   LoadModule access_compat_module modules/mod_access_compat.so
+
+
+   LoadModule mpm_prefork_module modules/mod_mpm_prefork.so
+
+
+   LoadModule unixd_module modules/mod_unixd.so
+
+
+
+PassEnv GIT_VALGRIND
+PassEnv GIT_VALGRIND_OPTIONS
+PassEnv GNUPGHOME
+PassEnv ASAN_OPTIONS
+PassEnv GIT_TRACE
+PassEnv GIT_CONFIG_NOSYSTEM
+
+Alias /dumb/ www/
+Alias /auth/dumb/ www/auth/dumb/
+
+
+   SetEnv GIT_EXEC_PATH ${GIT_EXEC_PATH}
+   SetEnv GIT_HTTP_EXPORT_ALL
+
+
+   SetEnv GIT_EXEC_PATH ${GIT_EXEC_PATH}
+
+
+   SetEnv GIT_EXEC_PATH ${GIT_EXEC_PATH}
+   SetEnv GIT_HTTP_EXPORT_ALL
+   SetEnv GIT_COMMITTER_NAME "Custom User"
+   SetEnv GIT_COMMITTER_EMAIL cus...@example.com
+
+
+   SetEnv GIT_EXEC_PATH ${GIT_EXEC_PATH}
+   SetEnv GIT_HTTP_EXPORT_ALL
+   SetEnv GIT_NAMESPACE ns
+
+
+   SetEnv GIT_EXEC_PATH ${GIT_EXEC_PATH}
+   SetEnv GIT_HTTP_EXPORT_ALL
+   Header set Set-Cookie name=value
+
+
+   SetEnv GIT_EXEC_PATH ${GIT_EXEC_PATH}
+   SetEnv GIT_HTTP_EXPORT_ALL
+
+ScriptAlias /upload/ upload.sh/
+ScriptAlias /list/ list.sh/
+
+   Options FollowSymlinks
+
+
+  Options ExecCGI
+
+
+  Options ExecCGI
+
+
+   Options ExecCGI
+
+
+RewriteEngine on
+RewriteRule ^/smart-redir-perm/(.*)$ /smart/$1 [R=301]
+RewriteRule ^/smart-redir-temp/(.*)$ /smart/$1 [R=302]
+RewriteRule ^/smart-redir-auth/(.*)$ /auth/smart/$1 [R=301]
+RewriteRule ^/smart-redir-limited/(.*)/info/refs$ /smart/$1/info/refs [R=301]
+RewriteRule ^/ftp-redir/(.*)$ ftp://localhost:1000/$1 [R=302]
+
+RewriteRule ^/loop-redir/x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-(.*) /$1 
[R=302]
+RewriteRule ^/loop-redir/(.*)$ /loop-redir/x-$1 [R=302]
+
+# Apache 2.2 does not understand , so we use RewriteCond.
+# And as RewriteCond does not allow testing for non-matches, we match
+# the desired case first (one has abra, two has cadabra), and let it
+# pass by marking the RewriteRule as [L], "last rule, do not process
+# any other matching RewriteRules after this"), and then have another
+# RewriteRule that matches all other cases and lets them fail via '[F]',
+# "fail the request".
+RewriteCond %{HTTP:x-magic-one} =abra
+RewriteCond %{HTTP:x-magic-two} =cadabra
+RewriteRule ^/smart_headers/.* - [L]
+RewriteRule ^/smart_headers/.* - [F]
+
+
+LoadModule ssl_module modules/mod_ssl.so
+
+SSLCertificateFile httpd.pem
+SSLCertificateKeyFile httpd.pem
+SSLRandomSeed startup file:/dev/urandom 512
+SSLRandomSeed connect file:/dev/urandom 512
+SSLSessionCache none
+SSLMutex file:ssl_mutex
+SSLEngine On
+
+
+
+   AuthType Basic
+   AuthName "git-auth"
+   AuthUserFile passwd
+   Require valid-user
+
+
+
+   AuthType Basic
+   AuthName "git-auth"
+   AuthUserFile passwd
+   Require valid-user
+
+
+
+   AuthType Basic
+   AuthName "git-auth"
+   AuthUserFile passwd
+   Require valid-user
+
+
+RewriteCond %{QUERY_STRING} service=git-receive-pack [OR]
+RewriteCond %{REQUEST_URI} /git-receive-pack$
+RewriteRule ^/half-auth-complete/ - [E=AUTHREQUIRED:yes]
+
+
+  Order Deny,Allow
+  Deny from env=AUTHREQUIRED
+
+  AuthType Basic
+  AuthName "Git Access"
+  AuthUserFile passwd
+  Require valid-user
+  Satisfy Any
+
+
+
+   LoadModule dav_module modules/mod_dav.so
+   LoadModule dav_fs_module 

[RFC/PATCH v3 16/16] t0420: add test with HTTP external odb

2016-11-30 Thread Christian Couder
This tests that an apache web server can be used as an
external object database and store files in their native
format instead of converting them to a Git object.

Signed-off-by: Christian Couder 
---
 t/t0420-transfer-http-e-odb.sh | 118 +
 1 file changed, 118 insertions(+)
 create mode 100755 t/t0420-transfer-http-e-odb.sh

diff --git a/t/t0420-transfer-http-e-odb.sh b/t/t0420-transfer-http-e-odb.sh
new file mode 100755
index 00..9fb84877b5
--- /dev/null
+++ b/t/t0420-transfer-http-e-odb.sh
@@ -0,0 +1,118 @@
+#!/bin/sh
+
+test_description='tests for transfering external objects to an HTTPD server'
+
+. ./test-lib.sh
+
+# If we don't specify a port, the current test number will be used
+# which will not work as it is less than 1024, so it can only be used by root.
+LIB_HTTPD_PORT=$(expr ${this_test#t} + 12000)
+
+. "$TEST_DIRECTORY"/lib-httpd.sh
+
+start_httpd apache-e-odb.conf
+
+# odb helper script must see this
+export HTTPD_URL
+
+write_script odb-http-helper <<\EOF
+die() {
+   printf >&2 "%s\n" "$@"
+   exit 1
+}
+echo >&2 "odb-http-helper args:" "$@"
+case "$1" in
+have)
+   list_url="$HTTPD_URL/list/"
+   curl "$list_url" ||
+   die "curl '$list_url' failed"
+   ;;
+get)
+   get_url="$HTTPD_URL/list/?sha1=$2"
+   curl "$get_url" ||
+   die "curl '$get_url' failed"
+   ;;
+put)
+   sha1="$2"
+   size="$3"
+   kind="$4"
+   upload_url="$HTTPD_URL/upload/?sha1=$sha1=$size=$kind"
+   curl --data-binary @- --include "$upload_url" >out ||
+   die "curl '$upload_url' failed"
+   ref_hash=$(echo "$sha1 $size $kind" | GIT_NO_EXTERNAL_ODB=1 git 
hash-object -w -t blob --stdin) || exit
+   git update-ref refs/odbs/magic/"$sha1" "$ref_hash"
+   ;;
+*)
+   die "unknown command '$1'"
+   ;;
+esac
+EOF
+HELPER="\"$PWD\"/odb-http-helper"
+
+
+test_expect_success 'setup repo with a root commit and the helper' '
+   test_commit zero &&
+   git config odb.magic.command "$HELPER" &&
+   git config odb.magic.plainObjects "true"
+'
+
+test_expect_success 'setup another repo from the first one' '
+   git init other-repo &&
+   (cd other-repo &&
+git remote add origin .. &&
+git pull origin master &&
+git checkout master &&
+git log)
+'
+
+UPLOADFILENAME="hello_apache_upload.txt"
+
+UPLOAD_URL="$HTTPD_URL/upload/?sha1=$UPLOADFILENAME=123=blob"
+
+test_expect_success 'can upload a file' '
+   echo "Hello Apache World!" >hello_to_send.txt &&
+   echo "How are you?" >>hello_to_send.txt &&
+   curl --data-binary @hello_to_send.txt --include "$UPLOAD_URL" 
>out_upload
+'
+
+LIST_URL="$HTTPD_URL/list/"
+
+test_expect_success 'can list uploaded files' '
+   curl --include "$LIST_URL" >out_list &&
+   grep "$UPLOADFILENAME" out_list
+'
+
+test_expect_success 'can delete uploaded files' '
+   curl --data "delete" --include "$UPLOAD_URL=1" >out_delete &&
+   curl --include "$LIST_URL" >out_list2 &&
+   ! grep "$UPLOADFILENAME" out_list2
+'
+
+FILES_DIR="httpd/www/files"
+
+test_expect_success 'new blobs are transfered to the http server' '
+   test_commit one &&
+   hash1=$(git ls-tree HEAD | grep one.t | cut -f1 | cut -d\  -f3) &&
+   echo "$hash1-4-blob" >expected &&
+   ls "$FILES_DIR" >actual &&
+   test_cmp expected actual
+'
+
+test_expect_success 'blobs can be retrieved from the http server' '
+   git cat-file blob "$hash1" &&
+   git log -p >expected
+'
+
+test_expect_success 'update other repo from the first one' '
+   (cd other-repo &&
+git fetch origin "refs/odbs/magic/*:refs/odbs/magic/*" &&
+test_must_fail git cat-file blob "$hash1" &&
+git config odb.magic.command "$HELPER" &&
+git config odb.magic.plainObjects "true" &&
+git cat-file blob "$hash1" &&
+git pull origin master)
+'
+
+stop_httpd
+
+test_done
-- 
2.11.0.rc2.37.geb49ca6



[RFC/PATCH v3 06/16] external odb: add write support

2016-11-30 Thread Christian Couder
Signed-off-by: Christian Couder 
---
 external-odb.c | 15 +++
 external-odb.h |  2 ++
 odb-helper.c   | 41 +
 odb-helper.h   |  3 +++
 sha1_file.c|  2 ++
 5 files changed, 59 insertions(+), 4 deletions(-)

diff --git a/external-odb.c b/external-odb.c
index 42978a3298..bb70fe3298 100644
--- a/external-odb.c
+++ b/external-odb.c
@@ -127,3 +127,18 @@ int external_odb_for_each_object(each_external_object_fn 
fn, void *data)
}
return 0;
 }
+
+int external_odb_write_object(const void *buf, unsigned long len,
+ const char *type, unsigned char *sha1)
+{
+   struct odb_helper *o;
+
+   external_odb_init();
+
+   for (o = helpers; o; o = o->next) {
+   int r = odb_helper_write_object(o, buf, len, type, sha1);
+   if (r <= 0)
+   return r;
+   }
+   return 1;
+}
diff --git a/external-odb.h b/external-odb.h
index cea8570a49..55d291d1cf 100644
--- a/external-odb.h
+++ b/external-odb.h
@@ -10,5 +10,7 @@ typedef int (*each_external_object_fn)(const unsigned char 
*sha1,
   unsigned long size,
   void *data);
 int external_odb_for_each_object(each_external_object_fn, void *);
+int external_odb_write_object(const void *buf, unsigned long len,
+ const char *type, unsigned char *sha1);
 
 #endif /* EXTERNAL_ODB_H */
diff --git a/odb-helper.c b/odb-helper.c
index 2db59caa53..7b7de7380f 100644
--- a/odb-helper.c
+++ b/odb-helper.c
@@ -33,9 +33,10 @@ static void prepare_helper_command(struct argv_array *argv, 
const char *cmd,
strbuf_release();
 }
 
-__attribute__((format (printf,3,4)))
+__attribute__((format (printf,4,5)))
 static int odb_helper_start(struct odb_helper *o,
struct odb_helper_cmd *cmd,
+   int use_stdin,
const char *fmt, ...)
 {
va_list ap;
@@ -52,7 +53,10 @@ static int odb_helper_start(struct odb_helper *o,
 
cmd->child.argv = cmd->argv.argv;
cmd->child.use_shell = 1;
-   cmd->child.no_stdin = 1;
+   if (use_stdin)
+   cmd->child.in = -1;
+   else
+   cmd->child.no_stdin = 1;
cmd->child.out = -1;
 
if (start_command(>child) < 0) {
@@ -109,7 +113,7 @@ static void odb_helper_load_have(struct odb_helper *o)
return;
o->have_valid = 1;
 
-   if (odb_helper_start(o, , "have") < 0)
+   if (odb_helper_start(o, , 0, "have") < 0)
return;
 
fh = xfdopen(cmd.child.out, "r");
@@ -164,7 +168,7 @@ int odb_helper_fetch_object(struct odb_helper *o, const 
unsigned char *sha1,
if (!obj)
return -1;
 
-   if (odb_helper_start(o, , "get %s", sha1_to_hex(sha1)) < 0)
+   if (odb_helper_start(o, , 0, "get %s", sha1_to_hex(sha1)) < 0)
return -1;
 
memset(, 0, sizeof(stream));
@@ -252,3 +256,32 @@ int odb_helper_for_each_object(struct odb_helper *o,
 
return 0;
 }
+
+int odb_helper_write_object(struct odb_helper *o,
+   const void *buf, unsigned long len,
+   const char *type, unsigned char *sha1)
+{
+   struct odb_helper_cmd cmd;
+
+   if (odb_helper_start(o, , 1, "put %s %lu %s",
+sha1_to_hex(sha1), len, type) < 0)
+   return -1;
+
+   do {
+   int w = xwrite(cmd.child.in, buf, len);
+   if (w < 0) {
+   error("unable to write to odb helper '%s': %s",
+ o->name, strerror(errno));
+   close(cmd.child.in);
+   close(cmd.child.out);
+   odb_helper_finish(o, );
+   return -1;
+   }
+   len -= w;
+   } while (len > 0);
+
+   close(cmd.child.in);
+   close(cmd.child.out);
+   odb_helper_finish(o, );
+   return 0;
+}
diff --git a/odb-helper.h b/odb-helper.h
index 8c3916d215..af31cc27d5 100644
--- a/odb-helper.h
+++ b/odb-helper.h
@@ -25,5 +25,8 @@ int odb_helper_fetch_object(struct odb_helper *o, const 
unsigned char *sha1,
int fd);
 int odb_helper_for_each_object(struct odb_helper *o,
   each_external_object_fn, void *);
+int odb_helper_write_object(struct odb_helper *o,
+   const void *buf, unsigned long len,
+   const char *type, unsigned char *sha1);
 
 #endif /* ODB_HELPER_H */
diff --git a/sha1_file.c b/sha1_file.c
index 6d68157e30..3532c1c598 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3320,6 +3320,8 @@ int write_sha1_file(const void *buf, unsigned long len, 
const char *type, unsign
 * it out into .git/objects/??/?{38} file.
 */

[RFC/PATCH v3 00/16] Add initial experimental external ODB support

2016-11-30 Thread Christian Couder
Goal


Git can store its objects only in the form of loose objects in
separate files or packed objects in a pack file.

To be able to better handle some kind of objects, for example big
blobs, it would be nice if Git could store its objects in other object
databases (ODB).

To do that, this patch series makes it possible to register commands,
using "odb..command" config variables, to access external
ODBs where objects can be stored and retrieved.

External ODBs should be able to tranfer information about the blobs
they store. This patch series shows how this is possible using kind of
replace refs.

Design
~~

* Registered command

Each registered command manages access to one external ODB and will be
called the following ways:

  - " have": the command should output the sha1, size and
type of all the objects the external ODB contains, one object per
line.

  - " get ": the command should then read from the
external ODB the content of the object corresponding to  and
output it on stdout.

  - " put   ": the command should then read
from stdin an object and store it in the external ODB.

* Transfer

To tranfer information about the blobs stored in external ODB, some
special refs, called "odb ref", similar as replace refs, are used.

For now there should be one odb ref per blob. Each ref name should be
refs/odbs// where  is the sha1 of the blob stored
in the external odb named .

These odb refs should all point to a blob that should be stored in the
Git repository and contain information about the blob stored in the
external odb. This information can be specific to the external odb.
The repos can then share this information using commands like:

`git fetch origin "refs/odbs//*:refs/odbs//*"`

* External object database

This RFC patch series shows in the tests:

  - how to use another git repository as an external ODB
  - how to use an http server as an external ODB

Design discussion about performance
~~~

Yeah, it is not efficient to fork/exec a command to just read or write
one object to or from the external ODB. Batch calls and/or using a
daemon and/or RPC should be used instead to be able to store regular
objects in an external ODB. But for now the external ODB would be all
about really big files, where the cost of a fork+exec should not
matter much. If we later want to extend usage of external ODBs, yeah
we will probably need to design other mechanisms.

Here are some related explanations from Peff:

{{{
Because this "external odb" essentially acts as a git alternate, we
would hit it only when we couldn't find an object through regular means.
Git would then make the object available in the usual on-disk format
(probably as a loose object).

So in most processes, we would not need to consult the odb command at
all. And when we do, the first thing would be to get its "have" list,
which would at most run once per process.

So the per-object cost is really calling "get", and my assumption there
was that the cost of actually retrieving the object over the network
would dwarf the fork/exec cost.

I also waffled on having git cache the output of " have" in
some fast-lookup format to save even the single fork/exec. But I figured
that was something that could be added later if needed.

You'll note that this is sort of a "fault-in" model. Another model would
be to treat external odb updates similar to fetches. I.e., we touch the
network only during a special update operation, and then try to work
locally with whatever the external odb has. IMHO this policy could
actually be up to the external odb itself (i.e., its "have" command
could serve from a local cache if it likes).
}}}

Implementation
~~

* Mechanism to call the registered commands

This series adds a set of function in external-odb.{c,h} that are
called by the rest of Git to manage all the external ODBs.

These functions use 'struct odb_helper' and its associated functions
defined in odb-helper.{c,h} to talk to the different external ODBs by
launching the configured "odb..command" commands and writing
to or reading from them.

The tests in this series creates an odb-helper script that is
registered using the "odb.magic.command" config variable, and then
called to read from and write to the external ODB.

* ODB refs

For now odb ref management is only implemented in a registered command
in t0410, but maybe this or some parts of it could be done by Git
itself.

When a new blob is added to an external odb, its sha1, size and type
are writen in another new blob and the odb ref is created.

When the list of existing blobs is requested from the external odb,
the content of the blobs pointed to by the odb refs can also be used
by the odb to claim that it can get the objects.

When a blob is actually requested from the external odb, it can use
the content stored in the blobs pointed to by the odb refs to get the
actual blobs and then pass them.

Highlevel view of the patches in the series

[RFC/PATCH v3 05/16] t0400: add test for 'put' command

2016-11-30 Thread Christian Couder
Signed-off-by: Christian Couder 
---
 t/t0400-external-odb.sh | 9 +
 1 file changed, 9 insertions(+)

diff --git a/t/t0400-external-odb.sh b/t/t0400-external-odb.sh
index 0f1bb97f38..6c6da5cf4f 100755
--- a/t/t0400-external-odb.sh
+++ b/t/t0400-external-odb.sh
@@ -57,4 +57,13 @@ test_expect_success 'helper can retrieve alt objects' '
test_cmp expect actual
 '
 
+test_expect_success 'helper can add objects to alt repo' '
+   hash=$(echo "Hello odb!" | git hash-object -w -t blob --stdin) &&
+   test -f .git/objects/$(echo $hash | sed "s#..#&/#") &&
+   size=$(git cat-file -s "$hash") &&
+   git cat-file blob "$hash" | ./odb-helper put "$hash" "$size" blob &&
+   alt_size=$(cd alt-repo && git cat-file -s "$hash") &&
+   test "$size" -eq "$alt_size"
+'
+
 test_done
-- 
2.11.0.rc2.37.geb49ca6



Grant.

2016-11-30 Thread web

Hello,
 
A donation has been made out to you and your family from my foundation.
 
For details, send an E-mail now to: castrat...@gmail.com
 
Regards



Re: [PATCHv2 4/4] submodule: add embed-git-dir function

2016-11-30 Thread Junio C Hamano
Duy Nguyen  writes:

> On Wed, Nov 23, 2016 at 2:22 AM, Stefan Beller  wrote:
>> +/*
>> + * Migrate the given submodule (and all its submodules recursively) from
>> + * having its git directory within the working tree to the git dir nested
>> + * in its superprojects git dir under modules/.
>> + */
>> +void migrate_submodule_gitdir(const char *prefix, const char *path,
>> + int recursive)
>
> Submodules and worktrees seem to have many things in common. The first
> one is this. "git worktree move" on a worktree that contains
> submodules .git also benefits from something like this [1]. I suggest
> you move this function to some neutral place and maybe rename it to
> relocate_gitdir() or something.

Yeah, good suggestion (including name; first round used "intern" I
had trouble with, then "embed" which was OK-ish, but probably
"relocate" is better choice.  If anything, what Stefan's series adds
is a command to un-embed embedded one).

> It probably should take a bit flag instead of "recursive" here. One
> thing I would need is the ability to tell this function "I have moved
> all these .git dirs already (because I move whole worktree in one
> operation), here are the old and new locations of them, fix them up!".
> In other words, no rename() could be optionally skipped.

Thanks two of you for working well together ;-)


Re: [PATCHv2 4/4] submodule: add embed-git-dir function

2016-11-30 Thread Stefan Beller
On Wed, Nov 30, 2016 at 10:04 AM, Stefan Beller  wrote:

>> Submodules and worktrees seem to have many things in common.
>
> Yes. :)

I moved the code to dir.{c,h} and renamed it to a more generic "move_gitdir",
but then I am thinking further about how to align worktrees and submodules here,
specifically the recursive part.

Whenever submodules and recursion is involved so far we spawned off a child
process to take care of the issue in the submodule itself. Here we followed
the same pattern and called the submodule--helper to embed the git dirs
in the submodule recursively.

As this function is not supposed to be submodule specific anymore, I'd
rather not want to have the recursive childrens code live inside the submodule
helper, so we also need a neutral helper for moving git directories?

So there are a couple ways forward:
* We declare the "recursive" flag to be submodule specific and keep
the recursion
  in the submodule helper
* the recursive flag is generic enough and we invent a plumbing helper.
  Analogous to the submodule--helper, that was originally invented as a C aid
  for the git-submodule shell script, we could have a "git--helper" or just
  "git plumbing ", though that sounds like reinventing the wheel as
  traditionally we'd just have a top level command to do a specific thing.
  (side note: Some parts of git-rev-parse could go into such a plumbing
  command)

For now I'd think to declare recursion in this function to be
submodule specific.

Stefan


Re: [PATCHv2 4/4] submodule: add embed-git-dir function

2016-11-30 Thread Stefan Beller
On Wed, Nov 30, 2016 at 12:51 PM, Junio C Hamano  wrote:
> Duy Nguyen  writes:
>
>> On Wed, Nov 23, 2016 at 2:22 AM, Stefan Beller  wrote:
>>> +/*
>>> + * Migrate the given submodule (and all its submodules recursively) from
>>> + * having its git directory within the working tree to the git dir nested
>>> + * in its superprojects git dir under modules/.
>>> + */
>>> +void migrate_submodule_gitdir(const char *prefix, const char *path,
>>> + int recursive)
>>
>> Submodules and worktrees seem to have many things in common. The first
>> one is this. "git worktree move" on a worktree that contains
>> submodules .git also benefits from something like this [1]. I suggest
>> you move this function to some neutral place and maybe rename it to
>> relocate_gitdir() or something.
>
> Yeah, good suggestion (including name; first round used "intern" I
> had trouble with, then "embed" which was OK-ish, but probably
> "relocate" is better choice.  If anything, what Stefan's series adds
> is a command to un-embed embedded one).

Heh, good counter perspective. I thought about embedding it
"into the superprojects git directory" whereas your un-embed sounds
like you'd assume embedding is targetd towards the working dir.

relocate as a fancy name for move sounds like it expects 2 arguments
(source and destination), but maybe we could go with:

git relocate-git-dir (--into-workingtree|--into-gitdir) \
  [--recurse-submodules] \
  [--only-fix-gitfile-links-and-core-setting-as-I-moved-it-myself-already] \
  [--dryrun] [--verbose] [--unsafe-move]

No need to have a pathspec here IMHO. Later it could have another flag
to specify the "main" worktree or such as well.


Re: [PATCH] difftool.c: mark a file-local symbol with static

2016-11-30 Thread Junio C Hamano
Ramsay Jones  writes:

> [I have fixed my config.mak file now, so I don't see the warning
> anymore! Having -Wno-format-zero-length in DEVELOPER_CFLAGS, or
> not, is a separate matter.]

I suspect that 658df95a4a ("add DEVELOPER makefile knob to check for
acknowledged warnings", 2016-02-25) took it from me (namely, Make
script in my 'todo' branch).  In turn, I added it to my set of flags
in order to squelch this exact warning, so...




"git add -p ." raises an unexpected "warning: empty strings as pathspecs will be made invalid in upcoming releases. please use . instead if you meant to match all paths"

2016-11-30 Thread Peter Urda
After upgrading to version 2.11.0 I am getting a warning about empty
strings as pathspecs while using 'patch'

- Ran 'git add -p .' from the root of my git repository.

- I was able to normally stage my changes, but was presented with a
"warning: empty strings as pathspecs will be made invalid in upcoming
releases. please use . instead if you meant to match all paths"
message.

- I expected no warning message since I included a "." with my original command.

I believe that I should not be seeing this warning message as I
included the requested "." pathspec.

~ Peter Urda

http://urda.cc


Re: [PATCH] difftool.c: mark a file-local symbol with static

2016-11-30 Thread Ramsay Jones


On 30/11/16 11:07, Johannes Schindelin wrote:
> Hi Ramsay,
> 
> On Tue, 29 Nov 2016, Ramsay Jones wrote:
>> Also, due to a problem in my config.mak file on Linux (a commented
>> out line that had a line continuation '\', gr!), gcc issued a
>> warning, thus:
>>
>>   builtin/difftool.c: In function ‘run_dir_diff’:
>>   builtin/difftool.c:568:13: warning: zero-length gnu_printf format string 
>> [-Wformat-zero-length]
>>warning("");
>>^
>> I am not sure why -Wno-format-zero-length is set in DEVELOPER_CFLAGS,
>> but do you really need to space the output with an an 'empty'
>> "warning:" line? (Just curious).
> 
> That `warning("");` comes from a straight-forward port of this line (see
> https://github.com/git/git/blob/v2.11.0/git-difftool.perl#L425):
> 
>   $errmsg .= "warning:\n";

Ah, OK, so it is being used to 'space out' the (possibly multiple)
'Both files modified' warning(s) followed by a 2-line warning
summary. Hmm, I am not sure the 'Working tree file has been left'
(after each pair of files) part of the message is adding much ...

> I could see two possible ways out:
> 
> - warning("%s", ""); (ugly!)
> 
> - do away with the "prefix every line with warning:" convention and simply
>   have a multi-line `warning(_("...\n...\n"), ...)`
> 
> What do you think?

I think, for now, it is probably best to do nothing. ;-)

Until you have replaced the 'legacy' difftool, it would be best
not to alter the output from the tool, so that the tests can be
used unaltered on both. This could be addressed (if necessary)
after you complete your series.

[I have fixed my config.mak file now, so I don't see the warning
anymore! Having -Wno-format-zero-length in DEVELOPER_CFLAGS, or
not, is a separate matter.]

ATB,
Ramsay Jones



Re: What's cooking in git.git (Nov 2016, #06; Mon, 28)

2016-11-30 Thread Brandon Williams
On 11/29, Jeff King wrote:
> On Tue, Nov 29, 2016 at 01:37:59AM -0500, Jeff King wrote:
> 
> >   2. Grep threads doing more complicated stuff that needs to take a
> >  lock. You might try building with -fsanitize=thread to see if it
> >  turns up anything.
> 
> I tried this and it didn't find anything useful. It complains about
> multiple threads calling want_color() at the same time, which you can
> silence with something like:
> 
> diff --git a/builtin/grep.c b/builtin/grep.c
> index 2c727ef49..d48846f40 100644
> --- a/builtin/grep.c
> +++ b/builtin/grep.c
> @@ -207,6 +207,12 @@ static void start_threads(struct grep_opt *opt)
>  {
>   int i;
>  
> + /*
> +  * trigger want_color() for its side effect of caching the result;
> +  * otherwise the threads will fight over setting the cache
> +  */
> + want_color(GIT_COLOR_AUTO);
> +
>   pthread_mutex_init(_mutex, NULL);
>   pthread_mutex_init(_read_mutex, NULL);
>   pthread_mutex_init(_attr_mutex, NULL);
> 
> But the problem persists even with that patch, so it is something else.
> It may still be a threading problem; -fsanitize=thread isn't perfect. I
> also couldn't get the stress-test to fail when compiled with it. But
> that may simply mean that the timing of the resulting binary is changed
> enough not to trigger the issue.
> 
> -Peff

With you're stress script I'm able to see the failures.  The interesting
thing is that the entry missing is always from the non-submodule file.

-- 
Brandon Williams


Re: gitconfig includes

2016-11-30 Thread Eli Barzilay
On Wed, Nov 30, 2016 at 2:06 PM, Jeff King  wrote:
>
> I'm not sure what your script does exactly, but in general I think the
> right thing for most scripts is _not_ to use a specific-file option
> like --global.
>
> If the script is looking up a config value on behalf of a user, it
> probably makes sense for it to use the normal config lookup procedure
> (system, global, repo, command-line), which also enables includes by
> default. That would make it consistent with internal git config
> lookups (e.g., user.name probably only ever appears in global config,
> but you _can_ override it at the repo level if you want to).

This is intended for git newbies (and big company => infinite supply of
them), and also allows them to conveniently nuke the repo and start from
a fresh copy, so it makes sense to make the script inspect/tweak the
global settings.  If knowing git "well enough" was an assumed
requirement, I'd definitely do the normal thing.


> I know that's mostly orthogonal to what we're discussing, but I'd feel
> more convinced that enabling "--includes" with "--global" is useful if
> I thought that "--global" was useful in the first place outside of a
> few narrow debugging cases.

Ok.  Perhaps I overestimated the utility of --global anyway, given the
above...

-- 
   ((x=>x(x))(x=>x(x)))  Eli Barzilay:
   http://barzilay.org/  Maze is Life!


Re: gitconfig includes

2016-11-30 Thread Jeff King
On Wed, Nov 30, 2016 at 01:54:35PM -0500, Eli Barzilay wrote:

> I don't have any strong opinion, but FWIW, the use case I have for this
> is as follows: I sync my ~/.gitconfig between my own machine and a work
> machine.  On the work machine though, I like people to have work emails,
> and I wrote some scripts that verify that.  For my case, I added an
> include of a ~/.gitconfig.more which is not synced, and has values that
> override the ones in ~/.gitconfig.  Since I'm the one who also wrote
> that script, I just added an "--includes" to the check so it won't barf
> on my setup, but had it not been my script I'd be stuck.

I'm not sure what your script does exactly, but in general I think the
right thing for most scripts is _not_ to use a specific-file option like
--global.

If the script is looking up a config value on behalf of a user, it
probably makes sense for it to use the normal config lookup procedure
(system, global, repo, command-line), which also enables includes by
default. That would make it consistent with internal git config lookups
(e.g., user.name probably only ever appears in global config, but you
_can_ override it at the repo level if you want to).

I know that's mostly orthogonal to what we're discussing, but I'd feel
more convinced that enabling "--includes" with "--global" is useful if I
thought that "--global" was useful in the first place outside of a few
narrow debugging cases.

-Peff


Re: gitconfig includes

2016-11-30 Thread Eli Barzilay
On Tue, Nov 29, 2016 at 4:50 PM, Junio C Hamano  wrote:
> Jeff King  writes:
>
>> I think it's arguable whether "--global" should behave the same.
>
> I know you know this and I am writing this message for others.
>
> I admit that I wondered if "a single file" ought to cover these
> short-hand notations like --global and --local while re-reading the
> log message of 9b25a0b52 (config: add include directive,
> 2012-02-06).  In other words, I agree that it used to be arguable
> before we released v1.7.10.
>
> It no longer is arguable simply due to backward compatibilty.  The
> ship has long sailed.

I don't have any strong opinion, but FWIW, the use case I have for this
is as follows: I sync my ~/.gitconfig between my own machine and a work
machine.  On the work machine though, I like people to have work emails,
and I wrote some scripts that verify that.  For my case, I added an
include of a ~/.gitconfig.more which is not synced, and has values that
override the ones in ~/.gitconfig.  Since I'm the one who also wrote
that script, I just added an "--includes" to the check so it won't barf
on my setup, but had it not been my script I'd be stuck.

This is all a "FWIW" -- in case anyone thinks about use cases for a
possible (future) change of the default.

-- 
   ((x=>x(x))(x=>x(x)))  Eli Barzilay:
   http://barzilay.org/  Maze is Life!


Re: [PATCH v3 1/2] difftool: add a skeleton for the upcoming builtin

2016-11-30 Thread Junio C Hamano
Jakub Narębski  writes:

>> My original "create a file in libexec/git-core/" was simple, did the job
>> reliably, and worked also for testing.
>> 
>> It is a pity that you two gentlemen shot it down for being inelegant. And
>> ever since, we try to find a solution that is as simple, works as
>> reliably, also for testing, *and* appeases your tastes.
>
> I just would like to note that existence of file is used for both
> git-daemon and gitweb (the latter following the git-daemon example).
>
> So there is a precedent for the use of this mechanism.

I think you are thinking about git-daemon-export-ok (for 'git
daemon') and $GITWEB_EXPORT_OK file (for 'gitweb').

You do realize that it is apples-and-oranges [*1*] to take these as
analogous to what Dscho is trying to do, don't you?

First of all, these are to control access to each repository on the
server side; the presence of the file is checked in each repository.
What Dscho wants is to control the behaviour of an installation of
Git as a whole, no matter which repository is being accessed [*2*,
*3*].

More importantly, did you notice that git-daemon-export-ok predates
the configuration mechanism by a large margin?  The "does the file
exist?" check done in a87e8be2ae ("Add a "git-daemon" that listens
on a TCP port", 2005-07-13) is a relic from the past [*4*], and
32f4aaccaa ("gitweb: export options", 2006-09-17) added
GITWEB_EXPORT_OK to mimic it, also long time ago [*5*].  They are
not something you would want to mimic in new programs these days.

Besides, $GIT_EXEC_PATH is where you place git subcommands.  Who in
the right mind considers it even remotely sane to design a system
where you have to throw in a file that is not a command to /usr/bin
to control the behaviour of your system? [*6*]

So the "precedent" is irrelevant in the first place, and even if it
were relevant, it is a bad piece of advice to mimic it.


[Footnote]

*1* Or is it apples-and-pineapples these days?

*2* Not that I agree with that desire, if I understand him correctly
from his description against the approach based on an
environment variable.  If a user has multiple installations and
not even aware of which one of them s/he is currently using, a
mechanism that affects only one of them (instead of consistently
affecting all of them) would lead to more confusion, I would
think.  

*3* If such hermetically configured independent installations are
desirable, etc/gitconfig aka "git config --system" is a more
appropriate thing to use, and you do not need to do repository
discovery before you can read it.

*4* If we had config mechanism, we would have used it just like we
use daemon.* variables to control what services are enabled for
each repository.

*5* By that time, the config mechanism did already exist, so the
GITWEB_EXPORT_OK could have been a per-repository configuration,
but "gitweb" had another excuse to deviate from the norm.  "Is
this repository visible?" was done during repository listing and
the script did not want to run "git config" in each and every
repository-like directory it encountered in File::Find::find().

*6* And I do not think $GIT_EXEC_PATH vs /usr/bin is
apples-and-oranges analogy.


Re: [PATCHv2 4/4] submodule: add embed-git-dir function

2016-11-30 Thread Stefan Beller
On Wed, Nov 30, 2016 at 3:14 AM, Duy Nguyen  wrote:
> On Wed, Nov 23, 2016 at 2:22 AM, Stefan Beller  wrote:
>> +/*
>> + * Migrate the given submodule (and all its submodules recursively) from
>> + * having its git directory within the working tree to the git dir nested
>> + * in its superprojects git dir under modules/.
>> + */
>> +void migrate_submodule_gitdir(const char *prefix, const char *path,
>> + int recursive)
>
> Submodules and worktrees seem to have many things in common.

Yes. :)

> The first
> one is this. "git worktree move" on a worktree that contains
> submodules .git also benefits from something like this [1].

That patch is a sensible approach. :)
(By checking all files to not be submodules a worktree would not run
into problems like
a127331cd81, mv: allow moving nested submodules)

> I suggest
> you move this function to some neutral place and maybe rename it to
> relocate_gitdir() or something.

ok tell me where this neutral place is found?
(I'd prefer to not clobber it into cache.h *the* most neutral place in git)
Maybe dir.{c,h} ?

>
> It probably should take a bit flag instead of "recursive" here. One
> thing I would need is the ability to tell this function "I have moved
> all these .git dirs already (because I move whole worktree in one
> operation), here are the old and new locations of them, fix them up!".
> In other words, no rename() could be optionally skipped.

In the non-main working trees you'd also have a .git file linking
to the actual git dir and you'd only have to fix them up instead of moving.


>
> [1] 
> https://public-inbox.org/git/20161128094319.16176-11-pclo...@gmail.com/T/#u
>
>> +{
>> +   char *old_git_dir;
>> +   const char *new_git_dir;
>> +   const struct submodule *sub;
>> +
>> +   old_git_dir = xstrfmt("%s/.git", path);
>> +   if (read_gitfile(old_git_dir))
>> +   /* If it is an actual gitfile, it doesn't need migration. */
>> +   goto out;
>> +
>> +   sub = submodule_from_path(null_sha1, path);
>> +   if (!sub)
>> +   die(_("Could not lookup name for submodule '%s'"),
>> + path);
>> +
>> +   new_git_dir = git_common_path("modules/%s", sub->name);
>
> Why doesn't git_path() work here? This would make "modules" shared
> between worktrees, even though it's not normally. That inconsistency
> could cause trouble.

I thought that was a long term goal?
(I actually think about reviving the series you sent out a few weeks ago
to make worktree and submodules work well together)

So for that we'd want to have at least the object store shared across all
worktrees.

>
>> +   if (safe_create_leading_directories_const(new_git_dir) < 0)
>> +   die(_("could not create directory '%s'"), new_git_dir);
>> +
>> +   if (!prefix)
>> +   prefix = get_super_prefix();
>> +   printf("Migrating git directory of %s%s from\n'%s' to\n'%s'\n",
>> +   prefix ? prefix : "", path,
>> +   real_path(old_git_dir), new_git_dir);
>> +
>> +   if (rename(old_git_dir, new_git_dir) < 0)
>> +   die_errno(_("Could not migrate git directory from '%s' to 
>> '%s'"),
>> +   old_git_dir, new_git_dir);
>> +
>> +   connect_work_tree_and_git_dir(path, new_git_dir);
>
> Another thing in common is, both submodules and worktrees use some
> form of textual symlinks. You need to fix up some here. But if this
> submodule has multiple worktreee, there may be some "symlinks" in
> .git/worktrees which would need fixing up as well.

We could signal that via one of the flag bits?
(e.g. FIXUP_WORKTREE_SYMLINKS )

>
> You don't have to do the fix up thing right away, but I think we
> should at least make sure we leave no dangling links behind (by
> die()ing early if we find a .git dir we can't handle yet)
> --
> Duy


[PATCH v2 1/1] convert: git cherry-pick -Xrenormalize did not work

2016-11-30 Thread tboegi
From: Torsten Bögershausen 

Working with a repo that used to be all CRLF. At some point it
was changed to all LF, with `text=auto` in .gitattributes.
Trying to cherry-pick a commit from before the switchover fails:

$ git cherry-pick -Xrenormalize 
fatal: CRLF would be replaced by LF in [path]

Commit 65237284 "unify the "auto" handling of CRLF" introduced
a regression:

Whenever crlf_action is CRLF_TEXT_XXX and not CRLF_AUTO_XXX,
SAFE_CRLF_RENORMALIZE was feed into check_safe_crlf().
This is wrong because here everything else than SAFE_CRLF_WARN is
treated as SAFE_CRLF_FAIL.

Call check_safe_crlf() only if checksafe is SAFE_CRLF_WARN or SAFE_CRLF_FAIL.

Reported-by: Eevee (Lexy Munroe) 
Signed-off-by: Torsten Bögershausen 
---
 convert.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/convert.c b/convert.c
index be91358..f8e4dfe 100644
--- a/convert.c
+++ b/convert.c
@@ -281,13 +281,13 @@ static int crlf_to_git(const char *path, const char *src, 
size_t len,
/*
 * If the file in the index has any CR in it, do not convert.
 * This is the new safer autocrlf handling.
+  - unless we want to renormalize in a merge or cherry-pick
 */
-   if (checksafe == SAFE_CRLF_RENORMALIZE)
-   checksafe = SAFE_CRLF_FALSE;
-   else if (has_cr_in_index(path))
+   if ((checksafe != SAFE_CRLF_RENORMALIZE) && 
has_cr_in_index(path))
convert_crlf_into_lf = 0;
}
-   if (checksafe && len) {
+   if ((checksafe == SAFE_CRLF_WARN ||
+   (checksafe == SAFE_CRLF_FAIL)) && len) {
struct text_stat new_stats;
memcpy(_stats, , sizeof(new_stats));
/* simulate "git add" */
-- 
2.10.0



Re: [PATCH v3 1/2] difftool: add a skeleton for the upcoming builtin

2016-11-30 Thread Jakub Narębski
Hello,

W dniu 28.11.2016 o 18:34, Johannes Schindelin pisze:

> My original "create a file in libexec/git-core/" was simple, did the job
> reliably, and worked also for testing.
> 
> It is a pity that you two gentlemen shot it down for being inelegant. And
> ever since, we try to find a solution that is as simple, works as
> reliably, also for testing, *and* appeases your tastes.

I just would like to note that existence of file is used for both
git-daemon and gitweb (the latter following the git-daemon example).

So there is a precedent for the use of this mechanism.

Best,
-- 
Jakub Narębski



Re: [PATCH v3 1/2] difftool: add a skeleton for the upcoming builtin

2016-11-30 Thread Jeff King
On Wed, Nov 30, 2016 at 01:30:47PM +0100, Johannes Schindelin wrote:

> On Tue, 29 Nov 2016, Jeff King wrote:
> 
> > On Tue, Nov 29, 2016 at 09:36:55PM +0100, Johannes Schindelin wrote:
> > 
> > > So the suggestion by both you and Peff, to use an environment variable,
> > > which is either global, or requires the user to set it manually per
> > > session, is simply not a good idea at all.
> > 
> > No, my suggestion was to use config and have the test suite use an
> > environment variable to test both cases (preferably automatically,
> > without the user having to do anything).
> > 
> > I do not see how that fails to cover all of your use cases.
> 
> Oh, so the suggestion is to have *both* a config *and* an environment
> variable. That is not elegant.

No, that is not at all what I said.  I was going to explain myself
again, but I do not see what good it would do, as clearly my point did
not come across in the other three emails. And then you would just
complain that I am making work for you. So whatever. I do not care about
your difftool topic at all. Do whatever you like (which hey, I already
said before, too).

-Peff


[PATCH v2] tag, branch, for-each-ref: add --ignore-case for sorting and filtering

2016-11-30 Thread Nguyễn Thái Ngọc Duy
This option makes sorting ignore case, which is great when you have
branches named bug-12-do-something, Bug-12-do-some-more and
BUG-12-do-what and want to group them together. Sorting externally may
not be an option because we lose coloring and column layout from
git-branch and git-tag.

The same could be said for filtering, but it's probably less important
because you can always go with the ugly pattern [bB][uU][gG]-* if you're
desperate.

You can't have case-sensitive filtering and case-insensitive sorting (or
the other way around) with this though. But who would want that?

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 v2 has a different approach, and I think it's a better one even with
 that unanswered question above.

 Documentation/git-branch.txt   |  4 
 Documentation/git-for-each-ref.txt |  3 +++
 Documentation/git-tag.txt  |  4 
 builtin/branch.c   | 23 ++-
 builtin/for-each-ref.c |  5 -
 builtin/tag.c  |  4 
 ref-filter.c   | 28 +---
 ref-filter.h   |  2 ++
 t/t3203-branch-output.sh   | 22 ++
 t/t7004-tag.sh | 20 
 10 files changed, 98 insertions(+), 17 deletions(-)

diff --git a/Documentation/git-branch.txt b/Documentation/git-branch.txt
index 1fe7344..5516a47 100644
--- a/Documentation/git-branch.txt
+++ b/Documentation/git-branch.txt
@@ -118,6 +118,10 @@ OPTIONS
default to color output.
Same as `--color=never`.
 
+-i::
+--ignore-case::
+   Sorting and filtering branches are case insensitive.
+
 --column[=]::
 --no-column::
Display branch listing in columns. See configuration variable
diff --git a/Documentation/git-for-each-ref.txt 
b/Documentation/git-for-each-ref.txt
index f57e69b..6d22974 100644
--- a/Documentation/git-for-each-ref.txt
+++ b/Documentation/git-for-each-ref.txt
@@ -79,6 +79,9 @@ OPTIONS
Only list refs which contain the specified commit (HEAD if not
specified).
 
+--ignore-case::
+   Sorting and filtering refs are case insensitive.
+
 FIELD NAMES
 ---
 
diff --git a/Documentation/git-tag.txt b/Documentation/git-tag.txt
index 80019c5..76cfe40 100644
--- a/Documentation/git-tag.txt
+++ b/Documentation/git-tag.txt
@@ -108,6 +108,10 @@ OPTIONS
variable if it exists, or lexicographic order otherwise. See
linkgit:git-config[1].
 
+-i::
+--ignore-case::
+   Sorting and filtering tags are case insensitive.
+
 --column[=]::
 --no-column::
Display tag listing in columns. See configuration variable
diff --git a/builtin/branch.c b/builtin/branch.c
index 60cc5c8..36e0a21 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -512,15 +512,6 @@ static void print_ref_list(struct ref_filter *filter, 
struct ref_sorting *sortin
if (filter->verbose)
maxwidth = calc_maxwidth(, strlen(remote_prefix));
 
-   /*
-* If no sorting parameter is given then we default to sorting
-* by 'refname'. This would give us an alphabetically sorted
-* array with the 'HEAD' ref at the beginning followed by
-* local branches 'refs/heads/...' and finally remote-tacking
-* branches 'refs/remotes/...'.
-*/
-   if (!sorting)
-   sorting = ref_default_sorting();
ref_array_sort(sorting, );
 
for (i = 0; i < array.nr; i++)
@@ -645,6 +636,7 @@ int cmd_branch(int argc, const char **argv, const char 
*prefix)
const char *new_upstream = NULL;
enum branch_track track;
struct ref_filter filter;
+   int icase = 0;
static struct ref_sorting *sorting = NULL, **sorting_tail = 
 
struct option options[] = {
@@ -686,6 +678,7 @@ int cmd_branch(int argc, const char **argv, const char 
*prefix)
OPTION_CALLBACK, 0, "points-at", _at, 
N_("object"),
N_("print only branches of the object"), 0, 
parse_opt_object_name
},
+   OPT_BOOL('i', "ignore-case", , N_("sorting and filtering 
are case insensitive")),
OPT_END(),
};
 
@@ -723,6 +716,8 @@ int cmd_branch(int argc, const char **argv, const char 
*prefix)
 
if (filter.abbrev == -1)
filter.abbrev = DEFAULT_ABBREV;
+   filter.ignore_case = icase;
+
finalize_colopts(, -1);
if (filter.verbose) {
if (explicitly_enable_column(colopts))
@@ -744,6 +739,16 @@ int cmd_branch(int argc, const char **argv, const char 
*prefix)
if ((filter.kind & FILTER_REFS_BRANCHES) && filter.detached)
filter.kind |= FILTER_REFS_DETACHED_HEAD;
filter.name_patterns = argv;
+   /*
+* If no sorting parameter is given then we default to sorting
+* by 'refname'. This would give us an alphabetically 

Re: [PATCH v3 1/2] difftool: add a skeleton for the upcoming builtin

2016-11-30 Thread Johannes Schindelin
Hi Junio,

On Tue, 29 Nov 2016, Junio C Hamano wrote:

> Johannes Schindelin  writes:
> 
> > So the suggestion by both you and Peff, to use an environment variable,
> > which is either global, or requires the user to set it manually per
> > session, is simply not a good idea at all.
> 
> As I already said, I do not have a strong preference between config
> and env.  I raised the env as a possible alternative that you can
> think about its pros and cons, and as I already said, if you thought
> and your concluded that config would work better for your needs,
> that is fine by me.

The env flat out fails, on the grounds of not integrating nicely into a
Git for Windows installer.

The config kinda works now. But for what price. It stole 4 hours I did not
have. When the libexec/git-core/use-builtin-difftool solution took me a
grand total of half an hour to devise, implement and test.

And you know what? I still do not really see what is so bad about it.

And I still see what is bad about the config "solution": it *creates* a
chicken-and-egg problem with the order of config reading vs running
scripts. It *creates* problems for requiring to spawn a `git config` call
because reading the config in-process would mess up the global variables
and environment *beyond repair*, making it *impossible* to even spawn the
git-legacy-difftool Perl script.

In short: the config setting now works. But it is ugly as hell. I wish I
never had listened to you.

Ciao,
Dscho


Re: [PATCH v3 1/2] difftool: add a skeleton for the upcoming builtin

2016-11-30 Thread Johannes Schindelin
Hi Peff,

On Tue, 29 Nov 2016, Jeff King wrote:

> On Tue, Nov 29, 2016 at 09:36:55PM +0100, Johannes Schindelin wrote:
> 
> > So the suggestion by both you and Peff, to use an environment variable,
> > which is either global, or requires the user to set it manually per
> > session, is simply not a good idea at all.
> 
> No, my suggestion was to use config and have the test suite use an
> environment variable to test both cases (preferably automatically,
> without the user having to do anything).
> 
> I do not see how that fails to cover all of your use cases.

Oh, so the suggestion is to have *both* a config *and* an environment
variable. That is not elegant.

Ciao,
Dscho


Re: [PATCHv2 4/4] submodule: add embed-git-dir function

2016-11-30 Thread Duy Nguyen
On Wed, Nov 23, 2016 at 2:22 AM, Stefan Beller  wrote:
> +/*
> + * Migrate the given submodule (and all its submodules recursively) from
> + * having its git directory within the working tree to the git dir nested
> + * in its superprojects git dir under modules/.
> + */
> +void migrate_submodule_gitdir(const char *prefix, const char *path,
> + int recursive)

Submodules and worktrees seem to have many things in common. The first
one is this. "git worktree move" on a worktree that contains
submodules .git also benefits from something like this [1]. I suggest
you move this function to some neutral place and maybe rename it to
relocate_gitdir() or something.

It probably should take a bit flag instead of "recursive" here. One
thing I would need is the ability to tell this function "I have moved
all these .git dirs already (because I move whole worktree in one
operation), here are the old and new locations of them, fix them up!".
In other words, no rename() could be optionally skipped.

[1] https://public-inbox.org/git/20161128094319.16176-11-pclo...@gmail.com/T/#u

> +{
> +   char *old_git_dir;
> +   const char *new_git_dir;
> +   const struct submodule *sub;
> +
> +   old_git_dir = xstrfmt("%s/.git", path);
> +   if (read_gitfile(old_git_dir))
> +   /* If it is an actual gitfile, it doesn't need migration. */
> +   goto out;
> +
> +   sub = submodule_from_path(null_sha1, path);
> +   if (!sub)
> +   die(_("Could not lookup name for submodule '%s'"),
> + path);
> +
> +   new_git_dir = git_common_path("modules/%s", sub->name);

Why doesn't git_path() work here? This would make "modules" shared
between worktrees, even though it's not normally. That inconsistency
could cause trouble.

> +   if (safe_create_leading_directories_const(new_git_dir) < 0)
> +   die(_("could not create directory '%s'"), new_git_dir);
> +
> +   if (!prefix)
> +   prefix = get_super_prefix();
> +   printf("Migrating git directory of %s%s from\n'%s' to\n'%s'\n",
> +   prefix ? prefix : "", path,
> +   real_path(old_git_dir), new_git_dir);
> +
> +   if (rename(old_git_dir, new_git_dir) < 0)
> +   die_errno(_("Could not migrate git directory from '%s' to 
> '%s'"),
> +   old_git_dir, new_git_dir);
> +
> +   connect_work_tree_and_git_dir(path, new_git_dir);

Another thing in common is, both submodules and worktrees use some
form of textual symlinks. You need to fix up some here. But if this
submodule has multiple worktreee, there may be some "symlinks" in
.git/worktrees which would need fixing up as well.

You don't have to do the fix up thing right away, but I think we
should at least make sure we leave no dangling links behind (by
die()ing early if we find a .git dir we can't handle yet)
-- 
Duy


Re: [PATCH] difftool.c: mark a file-local symbol with static

2016-11-30 Thread Johannes Schindelin
Hi Ramsay,

On Tue, 29 Nov 2016, Ramsay Jones wrote:

> If you need to re-roll your 'js/difftool-builtin' branch, could
> you please squash this into the relevant patch.

Fixed. Thanks!

> Also, due to a problem in my config.mak file on Linux (a commented
> out line that had a line continuation '\', gr!), gcc issued a
> warning, thus:
> 
>   builtin/difftool.c: In function ‘run_dir_diff’:
>   builtin/difftool.c:568:13: warning: zero-length gnu_printf format string 
> [-Wformat-zero-length]
>warning("");
>^
> I am not sure why -Wno-format-zero-length is set in DEVELOPER_CFLAGS,
> but do you really need to space the output with an an 'empty'
> "warning:" line? (Just curious).

That `warning("");` comes from a straight-forward port of this line (see
https://github.com/git/git/blob/v2.11.0/git-difftool.perl#L425):

$errmsg .= "warning:\n";

I could see two possible ways out:

- warning("%s", ""); (ugly!)

- do away with the "prefix every line with warning:" convention and simply
  have a multi-line `warning(_("...\n...\n"), ...)`

What do you think?
Dscho