date:20170302

[PATCH] t/perf: use $MODERN_GIT for all repo-copying steps

2017-03-02 Thread Jeff King

On Fri, Mar 03, 2017 at 01:45:12AM -0500, Jeff King wrote:

> I repeated the tests over fbd4a703 given in the commit message of
> ee9a7002fc and confirmed that it behaves the same with your fixed
> version of the test. I did have to tweak a few other things to get the
> test to run against such an old version of git, though. I'll follow-up
> with a patch.

Here's that patch.

-- >8 --
Subject: [PATCH] t/perf: use $MODERN_GIT for all repo-copying steps

Since 1a0962dee (t/perf: fix regression in testing older
versions of git, 2016-06-22), we point "$MODERN_GIT" to a
copy of git that matches the t/perf script itself, and which
can be used for tasks outside of the actual timings. This is
needed because the setup done by perf scripts keeps moving
forward in time, and may use features that the older
versions of git we are testing do not have.

That commit used $MODERN_GIT to fix a case where we relied
on the relatively recent --git-path option. But if you go
back further still, there are more problems.

Since 7501b5921 (perf: make the tests work in worktrees,
2016-05-13), we use "git -C", but versions of git older than
44e1e4d67 (git: run in a directory given with -C option,
2013-09-09) don't know about "-C". So testing an old version
of git with a new version of t/perf will fail the setup
step.

We can fix this by using $MODERN_GIT during the setup;
there's no need to use the antique version, since it doesn't
affect the timings. Likewise, we'll adjust the "init"
invocation; antique versions of git called this "init-db".

Signed-off-by: Jeff King 
---
With this patch I was able to run p0001 against v1.7.0. I don't think we
can go further back than that because the perf library depends on the
presence of bin-wrappers. That's probably enough. Unlike the t/interop
library I proposed recently it's not that interesting to go really far
back in time (and I did hack around the bin-wrappers thing in t/interop;
you really can test against v1.0.0 there).

 t/perf/perf-lib.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/t/perf/perf-lib.sh b/t/perf/perf-lib.sh
index 46f08ee08..ab4b8b06a 100644
--- a/t/perf/perf-lib.sh
+++ b/t/perf/perf-lib.sh
@@ -83,7 +83,7 @@ test_perf_create_repo_from () {
error "bug in the test script: not 2 parameters to test-create-repo"
repo="$1"
source="$2"
-   source_git="$(git -C "$source" rev-parse --git-dir)"
+   source_git="$("$MODERN_GIT" -C "$source" rev-parse --git-dir)"
objects_dir="$("$MODERN_GIT" -C "$source" rev-parse --git-path objects)"
mkdir -p "$repo/.git"
(
@@ -102,7 +102,7 @@ test_perf_create_repo_from () {
) &&
(
cd "$repo" &&
-   git init -q && {
+   "$MODERN_GIT" init -q && {
test_have_prereq SYMLINKS ||
git config core.symlinks false
} &&
-- 
2.12.0.385.gdf4947bc7

Re: [PATCH v2 8/9] read_early_config(): really discover .git/

2017-03-02 Thread Jeff King

On Fri, Mar 03, 2017 at 03:04:32AM +0100, Johannes Schindelin wrote:

> Earlier, we punted and simply assumed that we are in the top-level
> directory of the project, and that there is no .git file but a .git/
> directory so that we can read directly from .git/config.
> 
> However, that is not necessarily true. We may be in a subdirectory. Or
> .git may be a gitfile. Or the environment variable GIT_DIR may be set.
> 
> To remedy this situation, we just refactored the way
> setup_git_directory() discovers the .git/ directory, to make it
> reusable, and more importantly, to leave all global variables and the
> current working directory alone.
> 
> Let's discover the .git/ directory correctly in read_early_config() by
> using that new function.
> 
> This fixes 4 known breakages in t7006.

Yay, this is much nicer.

I think the "dirty hack..." comment in read_early_config() can be
condensed (or go away) now.

I think we _could_ do away with read_early_config() entirely, and just
have the regular config code do this lookup when we're not already in a
repo. But then we'd really need to depend on the "creating_repository"
flag being managed correctly.

I think I prefer the idea that a few "early" spots like pager and alias
config need to use this special function to access the config. That's
less likely to cause surprises when some config option is picked up
before we have run setup_git_directory().

There is one surprising case that I think we need to deal with even now,
though. If I do:

  git init repo
  git -C repo config pager.upload-pack 'echo whoops'
  git upload-pack repo
  cd repo && git upload-pack .

the first one is OK, but the second reads and executes the pager config
from the repo, even though we usually consider upload-pack to be OK to
run in an untrusted repo. This _isn't_ a new thing in your patch, but
just something I noticed while we are here.

And maybe it is a non-issue. The early-config bits all happen via the
git wrapper, and normally we use the direct dashed "git-upload-pack" to
access the other repositories. Certainly I have been known to use "git
-c ... upload-pack" while debugging various things.

So I dunno. You really have to jump through some hoops for it to matter,
but I just wonder if the git wrapper should be special-casing
upload-pack, too.

-Peff

[PATCH] t/perf: export variable used in other blocks

2017-03-02 Thread Jonathan Tan

In p0001, a variable was created in a test_expect_success block to be
used in later test_perf blocks, but was not exported. This caused the
variable to not appear in those blocks (this can be verified by writing
'test -n "$commit"' in those blocks), resulting in a slightly different
invocation than what was intended. Export that variable.

Signed-off-by: Jonathan Tan 
---

Performance indeed does drop significantly with this patch. And I cannot
say "at least this is more correct", because it isn't - as Peff thinks,
it is indeed still not 100% accurate because of the resurfacing-object
issue he mentions (I've verified this by constructing a test that fails
even after this patch set).

Also, since we don't want to unify tree_ and blob_ (for the reason Peff
mentioned in another e-mail), feel free to drop this patch set.

Having said that, while looking at the perf test, I noticed an issue
with a non-exported variable, which is corrected in this patch.

 t/perf/p0001-rev-list.sh | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/t/perf/p0001-rev-list.sh b/t/perf/p0001-rev-list.sh
index 16359d51a..ebf172401 100755
--- a/t/perf/p0001-rev-list.sh
+++ b/t/perf/p0001-rev-list.sh
@@ -15,7 +15,8 @@ test_perf 'rev-list --all --objects' '
 '
 
 test_expect_success 'create new unreferenced commit' '
-   commit=$(git commit-tree HEAD^{tree} -p HEAD)
+   commit=$(git commit-tree HEAD^{tree} -p HEAD) &&
+   test_export commit
 '
 
 test_perf 'rev-list $commit --not --all' '
-- 
2.12.0.rc1.440.g5b76565f74-goog

Re: [PATCH v7 3/3] config: add conditional include

2017-03-02 Thread Jeff King

On Wed, Mar 01, 2017 at 09:47:40AM -0800, Junio C Hamano wrote:

> > @@ -185,6 +271,12 @@ int git_config_include(const char *var, const char 
> > *value, void *data)
> >  
> > if (!strcmp(var, "include.path"))
> > ret = handle_path_include(value, inc);
> > +
> > +   if (!parse_config_key(var, "includeif", &cond, &cond_len, &key) &&
> > +   (cond && include_condition_is_true(cond, cond_len)) &&
> > +   !strcmp(key, "path"))
> > +   ret = handle_path_include(value, inc);
> > +
> > return ret;
> >  }
> 
> So "includeif.path" (misspelled one without any condition) falls
> through to "return ret" and gives the value we got from inc->fn().
> I am OK with that (i.e. "missing condition is false").

Yeah, I think that is sensible, just as not-understood nonsense like
"include.foobar" would be ignored, as well.

> Or we can make it go to handle_path_include(), effectively making
> the "include.path" a short-hand for "includeIf.path".  I am also OK
> with that (i.e. "missing condition is true").

I think we want "missing condition is false". Certainly an empty
condition like "includeIf..path" is false, as are conditions we don't
understand.

> Or we could even have "include.[.]path" without
> "includeIf"?  I am not sure if it is a bad idea that paints
> ourselves in a corner, but somehow I find it tempting.

That was how I had originally envisioned the namespace working when I
introduced include.path long ago. And I think Duy's v1 used that, but
the feedback was that it was not sufficiently obvious that the
subsection was a conditional.

-Peff

Re: [PATCH] t/perf: export variable used in other blocks

2017-03-02 Thread Jeff King

On Thu, Mar 02, 2017 at 11:50:41AM -0800, Jonathan Tan wrote:

> In p0001, a variable was created in a test_expect_success block to be
> used in later test_perf blocks, but was not exported. This caused the
> variable to not appear in those blocks (this can be verified by writing
> 'test -n "$commit"' in those blocks), resulting in a slightly different
> invocation than what was intended. Export that variable.

Thanks, this is obviously the right thing to do, and the mistake is mine
from ea97002fc (t/perf: time rev-list with UNINTERESTING commits,
2014-01-20). This is not the first time I've been confused by missing
variables in t/perf scripts, since it behaves differently than the
normal test suite. I wonder if we should turn on "set -a" during t/perf
setup snippets. That's a bit of a blunt tool, but I suspect it would
just be easier to work with.

I was curious that the tests added by ea97002fc showed something useful
even with the bug you're fixing here. But it's because the actual slice
of history we meant to traverse isn't important. It's intentionally
tiny to show off the time spent dealing with the UNINTERESTING commits.
So in effect we were traversing no commits instead of a tiny set, but
the timing results were the same.

I repeated the tests over fbd4a703 given in the commit message of
ee9a7002fc and confirmed that it behaves the same with your fixed
version of the test. I did have to tweak a few other things to get the
test to run against such an old version of git, though. I'll follow-up
with a patch.

-Peff

Re: [PATCH v7 0/3] Conditional config include

2017-03-02 Thread Jeff King

On Wed, Mar 01, 2017 at 06:26:28PM +0700, Nguyễn Thái Ngọc Duy wrote:

> I don't have a good answer for Jeff's PS about includeIf ugliness. I
> agree that includeif is ugly but includeIf looks a bit better. I don't
> see a better option (if only "include" does not start or end with a
> vowel...). Maybe includewith? Suggestions are welcome.

I actually think "include-if" _looks_ better, although maybe the
inconsistency with "-" is something we don't want to encourage (though I
also think the implicit include..path was OK, too). Feel free to
just ignore me. I will live with it either way.

For those following on the mailing list, there is some discussion at:

  https://github.com/git/git/commit/484f78e46d00c6d35f20058671a8c76bb924fb33

I think that is mostly focused around another failing in the
error-handling of the config code, and that does not need to be
addressed by this series (though of course I'd welcome any fixes).

But there's a test failure that probably does need to be dealt with
before this graduates to 'next'.

-Peff

Re: [PATCH v2 7/9] read_early_config(): avoid .git/config hack when unneeded

2017-03-02 Thread Jeff King

On Fri, Mar 03, 2017 at 03:04:28AM +0100, Johannes Schindelin wrote:

> So far, we only look whether the startup_info claims to have seen a
> git_dir.
> 
> However, do_git_config_sequence() (and consequently the
> git_config_with_options() call used by read_early_config() asks the
> have_git_dir() function whether we have a .git/ directory, which in turn
> also looks at git_dir and at the environment variable GIT_DIR. And when
> this is the case, the repository config is handled already, so we do not
> have to do that again explicitly.
> 
> Let's just use the same function, have_git_dir(), to determine whether we
> have to handle .git/config explicitly.

Good, this makes sense.

-Peff

Re: [PATCH v2 6/9] read_early_config(): special-case builtins that create a repository

2017-03-02 Thread Jeff King

On Fri, Mar 03, 2017 at 03:04:24AM +0100, Johannes Schindelin wrote:

> When the command we are about to execute wants to create a repository
> (i.e. the command is `init` or `clone`), we *must not* look for a
> repository config.

Hmm. I'm not sure if this one is worth the hackery here.

Yes, it would be wrong for init or clone to read something like
core.sharedrepository from a repo it happens to be in. But I wonder if
it would be cleaner to consider calls to read_early_config their own
"pre-command" stage that may respect global config, or config in a repo
directory you happen to be sitting in.

Because I think for aliases, we're going to end up having to do that
anyway (you won't know that your alias is "clone" until you've resolved
it!). And I think the pager fits into this "pre-command" concept, too
(we already have "-p" as a pre-command option on the command-line).

I dunno. It probably doesn't matter _too much_ either way. But it's one
less hack to maintain going forward, and it also makes your "git-init
respects the pager" into the normal, consistent thing.

-Peff

Re: log -S/-G (aka pickaxe) searches binary files by default

2017-03-02 Thread Jeff King

On Thu, Mar 02, 2017 at 05:36:17PM -0800, Junio C Hamano wrote:

> On Thu, Mar 2, 2017 at 4:52 PM, Thomas Braun
>  wrote:
> >
> > I happen to have quite large binary files in my repos.
> >
> > Today I realized that a line like
> > git log -G a
> > searches also files found to be binary (or explicitly marked as binary).
> >
> > Is that on purpose?
> 
> No, it's a mere oversight (as I do not think I never even thought
> about special casing binary
> files from day one, it is unlikely that you would find _any_ old
> version of Git that behaves
> differently).

The email focuses on "-G", and I think it is wrong to look in binary
files there, as "grep in diff" does not make sense for a binary file
that we would refuse to diff.

But the subject also mentions "-S". I always assumed it was intentional
to look in binary files there, as it is searching for a pure byte
sequence. I would not mind an option to disable that, but I think the
default should remain on.

-Peff

Re: [PATCH v2 0/9] Fix the early config

2017-03-02 Thread Jeff King

On Fri, Mar 03, 2017 at 03:03:56AM +0100, Johannes Schindelin wrote:

> These patches are an attempt to make Git's startup sequence a bit less
> surprising.
> 
> The idea here is to discover the .git/ directory gently (i.e. without
> changing the current working directory, or global variables), and to use
> it to read the .git/config file early, before we actually called
> setup_git_directory() (if we ever do that).

Thanks for working on this. I think it's a huge improvement over the
status quo, and over the earlier attempt. I don't see anything hugely
wrong with this series, though I did note one bug, along with some minor
refinements.

> My dirty little secret is that I actually need this for something else
> entirely. I need to patch an internal version of Git to gather
> statistics, and to that end I need to read the config before and after
> running every Git command. Hence the need for a gentle, and correct
> early config.

We do something similar at GitHub, but it falls into two categories:

  - stat-gathering that's on all the time, so doesn't need to look at
config (I'm not sure in your case if you want to trigger the
gathering with config, or if config is just one of the things you
are gathering).

  - logging that is turned on selectively for some repos, but which
doesn't have to be looked up until we know we are in a repo

I looked into making something upstream-able, and my approach was to
allow GIT_TRACE_* variables to be specified in the config. But of course
that ran into the early-config problem (and I really wanted repo-level
config there, because I'd like to be able to turn on tracing for just a
single problematic repo).

So not really something you need to work on, but maybe food for thought
as you work on your internal project.

-Peff

Re: [PATCH v2 9/9] Test read_early_config()

2017-03-02 Thread Jeff King

On Fri, Mar 03, 2017 at 03:04:36AM +0100, Johannes Schindelin wrote:

> So far, we had no explicit tests of that function.

Makes sense. The pager tests fixed in an earlier commit were effectively
checking those, but I don't mind making it explicit.

-Peff

Re: [PATCH v2 2/9] setup_git_directory(): use is_dir_sep() helper

2017-03-02 Thread Jeff King

On Fri, Mar 03, 2017 at 03:04:07AM +0100, Johannes Schindelin wrote:

> It is okay in practice to test for forward slashes in the output of
> getcwd(), because we go out of our way to convert backslashes to forward
> slashes in getcwd()'s output on Windows.
> 
> Still, the correct way to test for a dir separator is by using the
> helper function we introduced for that very purpose. It also serves as a
> good documentation what the code tries to do (not "how").

Makes sense, but...

> @@ -910,7 +910,8 @@ static const char *setup_git_directory_gently_1(int 
> *nongit_ok)
>   return setup_bare_git_dir(&cwd, offset, nongit_ok);
>  
>   offset_parent = offset;
> - while (--offset_parent > ceil_offset && cwd.buf[offset_parent] 
> != '/');
> + while (--offset_parent > ceil_offset &&
> +!is_dir_sep(dir->buf[offset_parent]));

What is "dir"? I'm guessing this patch got reordered and it should stay
as cwd.buf?

-Peff

Re: [PATCH v2 3/9] setup_git_directory(): avoid changing global state during discovery

2017-03-02 Thread Jeff King

On Fri, Mar 03, 2017 at 03:04:11AM +0100, Johannes Schindelin wrote:

> For historical reasons, Git searches for the .git/ directory (or the
> .git file) by changing the working directory successively to the parent
> directory of the current directory, until either anything was found or
> until a ceiling or a mount point is hit.

This is starting to get into the meat of changes we've been putting off
writing for ages. I'll read with my fingers crossed. :)

> Further global state may be changed, depending on the actual type of
> discovery, e.g. the global variable `repository_format_precious_objects`
> is set in the `check_repository_format_gently()` function (which is a
> bit surprising, given the function name).

It's gentle in the sense that if it does not find a valid repo, it
touches no state. I do suspect the functions you want are the ones it
builds on: read_repository_format() and verify_repository_format(),
which I added not too long ago for the exact purpose of checking repo
config without touching anything global.

> We do have a use case where we would like to find the .git/ directory
> without having any global state touched, though: when we read the early
> config e.g. for the pager or for alias expansion.
> 
> Let's just rename the function `setup_git_directory_gently_1()` to
> `discover_git_directory()` and move all code that changes any global
> state back into `setup_git_directory_gently()`.

Given the earlier paragraph, it sounds like you want to move the
global-state-changing parts out of check_repository_format_gently(). But
that wouldn't be right; it's triggered separate from the discovery
process by things like enter_repo().

However, I don't see that happening in the patch, which is good. I just
wonder if the earlier paragraph should really be complaining about how
setup_git_directory() (and its callees) touches a lot of global state,
not check_repository_format_gently(), whose use is but one of multiple
global-state modifications.

> Note: the new loop is a *little* tricky, as we have to handle the root
> directory specially: we cannot simply strip away the last component
> including the slash, as the root directory only has that slash. To remedy
> that, we introduce the `min_offset` variable that holds the minimal length
> of an absolute path, and using that to special-case the root directory,
> including an early exit before trying to find the parent of the root
> directory.

I wondered how t1509 fared with this, as it is the only test of
repositories at the root directory (and it is not run by default because
it involves a bunch of flaky and expensive chroot setup). Unfortunately
it seems to fail with your patch:

  expecting success: 
test_cmp_val "foo/" "$(git rev-parse --show-prefix)"

  --- expected
  +++ result
  @@ -1 +1 @@
  -foo/
  +oo/
  not ok 66 - auto gitdir, foo: prefix

Could the problem be this:

> + int ceil_offset = -1, min_offset = has_dos_drive_prefix(dir->buf) ? 3 : 
> 1;
> [...]
> - if (ceil_offset < 0 && has_dos_drive_prefix(cwd.buf))
> - ceil_offset = 1;
> + if (ceil_offset < 0)
> + ceil_offset = min_offset - 2;

It works the same as before in the dos-drive case, but we'd get
ceil_offset = -1 otherwise.  Which seems weird, but maybe I don't
understand how ceil_offset is supposed to work.

Interestingly, I don't think this is the bug, though. We still correctly
find /.git as a repository. The problem seems to happen later, in
setup_discovered_git_dir(), which does this:

  /* Make "offset" point to past the '/', and add a '/' at the end */
  offset++;
  strbuf_addch(cwd, '/');
  return cwd->buf + offset;

Here, "offset" is the length of the working tree path. The root
directory case differs from normal in that "offset" already accounts for
the trailing slash.

So I think the bug comes from:

> - ret = setup_discovered_git_dir(gitdirenv,
> -&cwd, offset,
> -nongit_ok);
> [...]
> + prefix = setup_discovered_git_dir(gitdir.buf, &cwd, dir.len,
> +   nongit_ok);

The original knew that "offset" took into account the off-by-one for the
root, but that's lost when we use dir.len. I haven't studied it enough
to know the best solution, but I suspect you will.

Other than this bug, I very much like the direction that this patch is
taking us.

-Peff

Business proposal

2017-03-02 Thread Qatif Oil Group Of Companies---




--
Dear Friend,

I would like to discuss a very important issue with you. I am writing  
to find out if this is your valid email. Please, let me know if this  
email is valid


Kind regards
Adrien Saif
Attorney to Qatif Group of Companies.

Re: [PATCH v2 1/9] t7006: replace dubious test

2017-03-02 Thread Jeff King

On Fri, Mar 03, 2017 at 03:04:02AM +0100, Johannes Schindelin wrote:

> The intention of that test case, however, was to verify that the
> setup_git_directory() function has not run, because it changes global
> state such as the current working directory.
> 
> To keep that spirit, but fix the incorrect assumption, this patch
> replaces that test case by a new one that verifies that the pager is
> run in the subdirectory, i.e. that the current working directory has
> not been changed at the time the pager is configured and launched, even
> if the `rev-parse` command requires a .git/ directory and *will* change
> the working directory.

This is a great solution to the question I had in v1 ("how do we know
when setup_git_directory() is run?"). It not only checks the
internal-code case that we care about, but it also shows how real users
would get bit if we do the wrong thing.

> +test_expect_failure TTY 'core.pager in repo config works and retains cwd' '
> + sane_unset GIT_PAGER &&
> + test_config core.pager "cat >cwd-retained" &&
> + (
> + cd sub &&
> + rm -f cwd-retained &&
> + test_terminal git -p rev-parse HEAD &&
> + test -e cwd-retained
> + )
> +'

Does this actually need TTY and test_terminal? You replace the pager
with something that doesn't care about the terminal, and "-p" should
unconditionally turn it on.

(Also a minor nit: we usually prefer test_path_is_file to "test -e"
these days).

-Peff

Re: SHA1 collisions found

2017-03-02 Thread Linus Torvalds

On Thu, Mar 2, 2017 at 5:50 PM, Mike Hommey  wrote:
>
> What if the "object version" is a hash of the content (as opposed to
> header + content like the normal git hash)?

It doesn't actually matter for that attack.

The concept of the attack is actually fairly simple: generate a lot of
collisions in the first hash (and they outline how you only need to
generate 't' serial collisions and turn them into 2**t collisions),
and then just check those collisions against the second hash.

If you have enough collisions in the first one, having a collision in
the second one is inevitable just from the birthday rule.

Now, *In*practice* that attack is not very easy to do. Each collision
is still hard to generate. And because of the git object rules (the
size has to match), it limits you a bit in what collisions you
generate.

But the fact that git requires the right header can be considered just
a variation on the initial state to SHA1, and then the additional
requirement might be as easy as just saying that your collision
generation function just always needs to generate a fixed-size block
(which could be just 64 bytes - the SHA1 blocking size).

So assuming you can arbitrarily generate collisions (brute force:
2**80 operations) you could make the rule be that you generate
something that starts with one 64-byte block that matches the git
rules:

   "blob 6454\0"..pad with repeating NUL bytes..

and then you generate 100 pairs of 64-byte SHA1 collisions (where the
first starts with the initial value of that fixed blob prefix, the
next with the state after the first block, etc etc).

Now you can generate 2**100 different sequences that all are exactly
6464 bytes (101 64-byte blocks) and all have the same SHA1 - all all
share that first fixed 64-byte block.

You needed "only" on the order of 100 * 2**80 SHA1 operations to do
that in theory.

An since you have 2**100 objects, you know that you will have a likely
birthday paradox even if your secondary hash is 200 bits long.

So all in all, you generated a collision in on the order of 2**100 operations.

So instead of getting the security of "160+200" bits, you only got 200
bits worth of real collision resistance.

NOTE!! All of the above is very much assuming "brute force". If you
can brute-force the hash, you can completely break any hash. The input
block to SHA1 is 64 bytes, so by definition you have 512 bits of data
to play with, and you're generating a 160-bit output: there is no
question what-so-ever that you couldn't generate any hash you want if
you brute-force things.

The place where things like having a fixed object header can help is
when the attack in question requires some repeated patterns. For
example, if you're not brute-forcing things, your attack on the hash
will likely involve using very particular patterns to change a number
of bits in certain ways, and then combining those particular patterns
to get the hash collision you wanted.  And *that* is when you may need
to add some particular data to the middle to make the end result be a
particular match.

But a brute-force attack definitely doesn't need size changes. You can
make the size be pretty much anything you want (modulo really small
inputs, of course - a one-byte input only has 256 different possible
hashes ;) if you have the resources to just go and try every possible
combination until you get the hash you wanted.

I may have overly simplified the paper, but that's the basic idea.

   Linus

Re: log -S/-G (aka pickaxe) searches binary files by default

2017-03-02 Thread Junio C Hamano

On Thu, Mar 2, 2017 at 4:52 PM, Thomas Braun
 wrote:
>
> I happen to have quite large binary files in my repos.
>
> Today I realized that a line like
> git log -G a
> searches also files found to be binary (or explicitly marked as binary).
>
> Is that on purpose?

No, it's a mere oversight (as I do not think I never even thought
about special casing binary
files from day one, it is unlikely that you would find _any_ old
version of Git that behaves
differently).

Business proposal

2017-03-02 Thread Qatif Oil Group Of Companies---




--
Dear Friend,

I would like to discuss a very important issue with you. I am writing  
to find out if this is your valid email. Please, let me know if this  
email is valid


Kind regards
Adrien Saif
Attorney to Qatif Group of Companies.

Re: [PATCH v1] Travis: also test on 32-bit Linux

2017-03-02 Thread Johannes Schindelin

Hi Junio,

On Thu, 2 Mar 2017, Junio C Hamano wrote:

> Another question is which v3 people mean in the discussion, when you
> and Dscho work on improvements at the same time and each post the
> "next" version marked as "v3", and they comment on one of them?

But then, Lars & I communicate in a more direct way than the mailing list
when discussing teeny tiny details such as: "could you have a look at my
mail" or "how would you change .travis.yml to do XYZ".

With that level of communication, there is almost no danger of two v3s.

Ciao,
Johannes

[PATCH v2 9/9] Test read_early_config()

2017-03-02 Thread Johannes Schindelin

So far, we had no explicit tests of that function.

Signed-off-by: Johannes Schindelin 
---
 t/helper/test-config.c  | 15 +++
 t/t1309-early-config.sh | 50 +
 2 files changed, 65 insertions(+)
 create mode 100755 t/t1309-early-config.sh

diff --git a/t/helper/test-config.c b/t/helper/test-config.c
index 83a4f2ab869..8e3ed6a76cb 100644
--- a/t/helper/test-config.c
+++ b/t/helper/test-config.c
@@ -66,6 +66,16 @@ static int iterate_cb(const char *var, const char *value, 
void *data)
return 0;
 }
 
+static int early_config_cb(const char *var, const char *value, void *vdata)
+{
+   const char *key = vdata;
+
+   if (!strcmp(key, var))
+   printf("%s\n", value);
+
+   return 0;
+}
+
 int cmd_main(int argc, const char **argv)
 {
int i, val;
@@ -73,6 +83,11 @@ int cmd_main(int argc, const char **argv)
const struct string_list *strptr;
struct config_set cs;
 
+   if (argc == 3 && !strcmp(argv[1], "read_early_config")) {
+   read_early_config(early_config_cb, (void *)argv[2]);
+   return 0;
+   }
+
setup_git_directory();
 
git_configset_init(&cs);
diff --git a/t/t1309-early-config.sh b/t/t1309-early-config.sh
new file mode 100755
index 000..0c55dee514c
--- /dev/null
+++ b/t/t1309-early-config.sh
@@ -0,0 +1,50 @@
+#!/bin/sh
+
+test_description='Test read_early_config()'
+
+. ./test-lib.sh
+
+test_expect_success 'read early config' '
+   test_config early.config correct &&
+   test-config read_early_config early.config >output &&
+   test correct = "$(cat output)"
+'
+
+test_expect_success 'in a sub-directory' '
+   test_config early.config sub &&
+   mkdir -p sub &&
+   (
+   cd sub &&
+   test-config read_early_config early.config
+   ) >output &&
+   test sub = "$(cat output)"
+'
+
+test_expect_success 'ceiling' '
+   test_config early.config ceiling &&
+   mkdir -p sub &&
+   (
+   GIT_CEILING_DIRECTORIES="$PWD" &&
+   export GIT_CEILING_DIRECTORIES &&
+   cd sub &&
+   test-config read_early_config early.config
+   ) >output &&
+   test -z "$(cat output)"
+'
+
+test_expect_success 'ceiling #2' '
+   mkdir -p xdg/git &&
+   git config -f xdg/git/config early.config xdg &&
+   test_config early.config ceiling &&
+   mkdir -p sub &&
+   (
+   XDG_CONFIG_HOME="$PWD"/xdg &&
+   GIT_CEILING_DIRECTORIES="$PWD" &&
+   export GIT_CEILING_DIRECTORIES XDG_CONFIG_HOME &&
+   cd sub &&
+   test-config read_early_config early.config
+   ) >output &&
+   test xdg = "$(cat output)"
+'
+
+test_done
-- 
2.12.0.windows.1.3.g8a117c48243

[PATCH v2 8/9] read_early_config(): really discover .git/

2017-03-02 Thread Johannes Schindelin

Earlier, we punted and simply assumed that we are in the top-level
directory of the project, and that there is no .git file but a .git/
directory so that we can read directly from .git/config.

However, that is not necessarily true. We may be in a subdirectory. Or
.git may be a gitfile. Or the environment variable GIT_DIR may be set.

To remedy this situation, we just refactored the way
setup_git_directory() discovers the .git/ directory, to make it
reusable, and more importantly, to leave all global variables and the
current working directory alone.

Let's discover the .git/ directory correctly in read_early_config() by
using that new function.

This fixes 4 known breakages in t7006.

Signed-off-by: Johannes Schindelin 
---
 config.c | 9 +++--
 t/t7006-pager.sh | 8 
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/config.c b/config.c
index 3ee5e5a15d6..bcda397d42e 100644
--- a/config.c
+++ b/config.c
@@ -1414,6 +1414,8 @@ static void configset_iter(struct config_set *cs, 
config_fn_t fn, void *data)
 
 void read_early_config(config_fn_t cb, void *data)
 {
+   struct strbuf buf = STRBUF_INIT;
+
git_config_with_options(cb, data, NULL, 1);
 
/*
@@ -1434,13 +1436,16 @@ void read_early_config(config_fn_t cb, void *data)
 * valid repository), and would rarely make things worse (i.e., you do
 * not generally have a .git/config file sitting around).
 */
-   if (!startup_info->creating_repository && !have_git_dir()) {
+   if (!startup_info->creating_repository && !have_git_dir() &&
+   discover_git_directory(&buf)) {
struct git_config_source repo_config;
 
memset(&repo_config, 0, sizeof(repo_config));
-   repo_config.file = ".git/config";
+   strbuf_addstr(&buf, "/config");
+   repo_config.file = buf.buf;
git_config_with_options(cb, data, &repo_config, 1);
}
+   strbuf_release(&buf);
 }
 
 static void git_config_check_init(void);
diff --git a/t/t7006-pager.sh b/t/t7006-pager.sh
index 427bfc605ad..bf89340988b 100755
--- a/t/t7006-pager.sh
+++ b/t/t7006-pager.sh
@@ -360,19 +360,19 @@ test_pager_choices   'git aliasedlog'
 test_default_pagerexpect_success 'git -p aliasedlog'
 test_PAGER_overrides  expect_success 'git -p aliasedlog'
 test_core_pager_overrides expect_success 'git -p aliasedlog'
-test_core_pager_subdirexpect_failure 'git -p aliasedlog'
+test_core_pager_subdirexpect_success 'git -p aliasedlog'
 test_GIT_PAGER_overrides  expect_success 'git -p aliasedlog'
 
 test_default_pagerexpect_success 'git -p true'
 test_PAGER_overrides  expect_success 'git -p true'
 test_core_pager_overrides expect_success 'git -p true'
-test_core_pager_subdirexpect_failure 'git -p true'
+test_core_pager_subdirexpect_success 'git -p true'
 test_GIT_PAGER_overrides  expect_success 'git -p true'
 
 test_default_pagerexpect_success test_must_fail 'git -p request-pull'
 test_PAGER_overrides  expect_success test_must_fail 'git -p request-pull'
 test_core_pager_overrides expect_success test_must_fail 'git -p request-pull'
-test_core_pager_subdirexpect_failure test_must_fail 'git -p request-pull'
+test_core_pager_subdirexpect_success test_must_fail 'git -p request-pull'
 test_GIT_PAGER_overrides  expect_success test_must_fail 'git -p request-pull'
 
 test_default_pagerexpect_success test_must_fail 'git -p'
@@ -380,7 +380,7 @@ test_PAGER_overrides  expect_success test_must_fail 
'git -p'
 test_local_config_ignored expect_failure test_must_fail 'git -p'
 test_GIT_PAGER_overrides  expect_success test_must_fail 'git -p'
 
-test_expect_failure TTY 'core.pager in repo config works and retains cwd' '
+test_expect_success TTY 'core.pager in repo config works and retains cwd' '
sane_unset GIT_PAGER &&
test_config core.pager "cat >cwd-retained" &&
(
-- 
2.12.0.windows.1.3.g8a117c48243

[PATCH v2 0/9] Fix the early config

2017-03-02 Thread Johannes Schindelin

These patches are an attempt to make Git's startup sequence a bit less
surprising.

The idea here is to discover the .git/ directory gently (i.e. without
changing the current working directory, or global variables), and to use
it to read the .git/config file early, before we actually called
setup_git_directory() (if we ever do that).

This also allows us to fix the early config e.g. to determine the pager
or to resolve aliases in a non-surprising manner.

The first iteration of this patch series still tried to be clever and to
avoid having to disentangle the side effects from the
setup_git_directory_gently_1() function simply by duplicating the logic.

However, Peff suggested in a very short sentence that this would not fly
well.

Little did I know that I would spend the better part of an entire week
on trying to address that innocuous comment! There are simply *so many*
side effects in that code. Who would have thought that a function
called check_repository_format() would set global variables?

But after all that work, I am actually quite a bit satisfied with the
way things turned out.

My dirty little secret is that I actually need this for something else
entirely. I need to patch an internal version of Git to gather
statistics, and to that end I need to read the config before and after
running every Git command. Hence the need for a gentle, and correct
early config.

Notes:

- I do not handle dashed invocations of `init` and `clone` correctly.
  That is, even if `git-init` and `git-clone` clearly do not want to
  read the local config, they do. It does not matter all that much
  because they do not use a pager, but still. It is a wart.

- The read_early_config() function is still called multiple times,
  re-reading all the config files and re-discovering the .git/ directory
  multiple times, which is quite wasteful. I was tempted to take care of
  that but I must not run the danger to spread myself even thinner these
  days. If a patch adding that caching were to fly my way, I'd gladly
  integrate it, of course... ;-)

Changes since v1:

- the discover_git_directory() function is no longer completely separate
  from setup_git_directory(), but a callee of the latter.

- t7006 succeeds now (I removed the incorrect test case in favor of one
  that verifies that setup_git_directory() was not run via the tell-tale
  that the current working directory has not changed when the pager
  runs).


Johannes Schindelin (9):
  t7006: replace dubious test
  setup_git_directory(): use is_dir_sep() helper
  setup_git_directory(): avoid changing global state during discovery
  Export the discover_git_directory() function
  Make read_early_config() reusable
  read_early_config(): special-case builtins that create a repository
  read_early_config(): avoid .git/config hack when unneeded
  read_early_config(): really discover .git/
  Test read_early_config()

 cache.h |   4 +-
 config.c|  36 +
 git.c   |   3 +
 pager.c |  31 ---
 setup.c | 211 +++-
 t/helper/test-config.c  |  15 
 t/t1309-early-config.sh |  50 
 t/t7006-pager.sh|  18 -
 8 files changed, 257 insertions(+), 111 deletions(-)
 create mode 100755 t/t1309-early-config.sh


base-commit: 3bc53220cb2dcf709f7a027a3f526befd021d858
Published-As: https://github.com/dscho/git/releases/tag/early-config-v2
Fetch-It-Via: git fetch https://github.com/dscho/git early-config-v2

Interdiff vs v1:

 diff --git a/builtin/am.c b/builtin/am.c
 index 2b81dabddd9..f7a7a971fbe 100644
 --- a/builtin/am.c
 +++ b/builtin/am.c
 @@ -1791,7 +1791,7 @@ static int do_interactive(struct am_state *state)
}
strbuf_release(&msg);
} else if (*reply == 'v' || *reply == 'V') {
 -  const char *pager = git_pager(1, 1);
 +  const char *pager = git_pager(1);
struct child_process cp = CHILD_PROCESS_INIT;
  
if (!pager)
 diff --git a/builtin/blame.c b/builtin/blame.c
 index 628ca237da6..cffc6265408 100644
 --- a/builtin/blame.c
 +++ b/builtin/blame.c
 @@ -2919,7 +2919,7 @@ int cmd_blame(int argc, const char **argv, const char 
*prefix)
assign_blame(&sb, opt);
  
if (!incremental)
 -  setup_pager(1);
 +  setup_pager();
  
free(final_commit_name);
  
 diff --git a/builtin/grep.c b/builtin/grep.c
 index f820e4b1c4d..9304c33e750 100644
 --- a/builtin/grep.c
 +++ b/builtin/grep.c
 @@ -1133,7 +1133,7 @@ int cmd_grep(int argc, const char **argv, const char 
*prefix)
}
  
if (show_in_pager == default_pager)
 -  show_in_pager = git_pager(1, 1);
 +  show_in_pager = git_pager(1);
if (show_in_pager) {
opt.color = 0;
opt.name_only = 1;
 @@ -1268,7 +1268,7 @@ int cmd_grep(int argc, c

[PATCH v2 1/9] t7006: replace dubious test

2017-03-02 Thread Johannes Schindelin

The idea of the test case "git -p - core.pager is not used from
subdirectory" was to verify that the setup_git_directory() function had
not been called just to obtain the core.pager setting.

However, we are about to fix the early config machinery so that it
*does* work, without messing up the global state.

Once that is done, the core.pager setting *will* be used, even when
running from a subdirectory, and that is a Good Thing.

The intention of that test case, however, was to verify that the
setup_git_directory() function has not run, because it changes global
state such as the current working directory.

To keep that spirit, but fix the incorrect assumption, this patch
replaces that test case by a new one that verifies that the pager is
run in the subdirectory, i.e. that the current working directory has
not been changed at the time the pager is configured and launched, even
if the `rev-parse` command requires a .git/ directory and *will* change
the working directory.

Signed-off-by: Johannes Schindelin 
---
 t/t7006-pager.sh | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/t/t7006-pager.sh b/t/t7006-pager.sh
index c8dc665f2fd..427bfc605ad 100755
--- a/t/t7006-pager.sh
+++ b/t/t7006-pager.sh
@@ -378,9 +378,19 @@ test_GIT_PAGER_overrides  expect_success test_must_fail 
'git -p request-pull'
 test_default_pagerexpect_success test_must_fail 'git -p'
 test_PAGER_overrides  expect_success test_must_fail 'git -p'
 test_local_config_ignored expect_failure test_must_fail 'git -p'
-test_no_local_config_subdir expect_success test_must_fail 'git -p'
 test_GIT_PAGER_overrides  expect_success test_must_fail 'git -p'
 
+test_expect_failure TTY 'core.pager in repo config works and retains cwd' '
+   sane_unset GIT_PAGER &&
+   test_config core.pager "cat >cwd-retained" &&
+   (
+   cd sub &&
+   rm -f cwd-retained &&
+   test_terminal git -p rev-parse HEAD &&
+   test -e cwd-retained
+   )
+'
+
 test_doesnt_paginate  expect_failure test_must_fail 'git -p nonsense'
 
 test_pager_choices   'git shortlog'
-- 
2.12.0.windows.1.3.g8a117c48243

[PATCH v2 6/9] read_early_config(): special-case builtins that create a repository

2017-03-02 Thread Johannes Schindelin

When the command we are about to execute wants to create a repository
(i.e. the command is `init` or `clone`), we *must not* look for a
repository config.

Signed-off-by: Johannes Schindelin 
---
 cache.h  | 2 +-
 config.c | 3 ++-
 git.c| 3 +++
 3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/cache.h b/cache.h
index 6b6780064f0..0af7141242f 100644
--- a/cache.h
+++ b/cache.h
@@ -2070,7 +2070,7 @@ const char *split_cmdline_strerror(int cmdline_errno);
 
 /* setup.c */
 struct startup_info {
-   int have_repository;
+   int have_repository, creating_repository;
const char *prefix;
 };
 extern struct startup_info *startup_info;
diff --git a/config.c b/config.c
index 9cfbeafd04c..980fcc6ff2e 100644
--- a/config.c
+++ b/config.c
@@ -1434,7 +1434,8 @@ void read_early_config(config_fn_t cb, void *data)
 * valid repository), and would rarely make things worse (i.e., you do
 * not generally have a .git/config file sitting around).
 */
-   if (!startup_info->have_repository) {
+   if (!startup_info->creating_repository &&
+   !startup_info->have_repository) {
struct git_config_source repo_config;
 
memset(&repo_config, 0, sizeof(repo_config));
diff --git a/git.c b/git.c
index 33f52acbcc8..9fb9bb90a21 100644
--- a/git.c
+++ b/git.c
@@ -337,6 +337,9 @@ static int run_builtin(struct cmd_struct *p, int argc, 
const char **argv)
struct stat st;
const char *prefix;
 
+   if (p->fn == cmd_init_db || p->fn == cmd_clone)
+   startup_info->creating_repository = 1;
+
prefix = NULL;
help = argc == 2 && !strcmp(argv[1], "-h");
if (!help) {
-- 
2.12.0.windows.1.3.g8a117c48243

[PATCH v2 7/9] read_early_config(): avoid .git/config hack when unneeded

2017-03-02 Thread Johannes Schindelin

So far, we only look whether the startup_info claims to have seen a
git_dir.

However, do_git_config_sequence() (and consequently the
git_config_with_options() call used by read_early_config() asks the
have_git_dir() function whether we have a .git/ directory, which in turn
also looks at git_dir and at the environment variable GIT_DIR. And when
this is the case, the repository config is handled already, so we do not
have to do that again explicitly.

Let's just use the same function, have_git_dir(), to determine whether we
have to handle .git/config explicitly.

Signed-off-by: Johannes Schindelin 
---
 config.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/config.c b/config.c
index 980fcc6ff2e..3ee5e5a15d6 100644
--- a/config.c
+++ b/config.c
@@ -1434,8 +1434,7 @@ void read_early_config(config_fn_t cb, void *data)
 * valid repository), and would rarely make things worse (i.e., you do
 * not generally have a .git/config file sitting around).
 */
-   if (!startup_info->creating_repository &&
-   !startup_info->have_repository) {
+   if (!startup_info->creating_repository && !have_git_dir()) {
struct git_config_source repo_config;
 
memset(&repo_config, 0, sizeof(repo_config));
-- 
2.12.0.windows.1.3.g8a117c48243

[PATCH v2 2/9] setup_git_directory(): use is_dir_sep() helper

2017-03-02 Thread Johannes Schindelin

It is okay in practice to test for forward slashes in the output of
getcwd(), because we go out of our way to convert backslashes to forward
slashes in getcwd()'s output on Windows.

Still, the correct way to test for a dir separator is by using the
helper function we introduced for that very purpose. It also serves as a
good documentation what the code tries to do (not "how").

Signed-off-by: Johannes Schindelin 
---
 setup.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/setup.c b/setup.c
index 967f289f1ef..89a0cef9231 100644
--- a/setup.c
+++ b/setup.c
@@ -910,7 +910,8 @@ static const char *setup_git_directory_gently_1(int 
*nongit_ok)
return setup_bare_git_dir(&cwd, offset, nongit_ok);
 
offset_parent = offset;
-   while (--offset_parent > ceil_offset && cwd.buf[offset_parent] 
!= '/');
+   while (--offset_parent > ceil_offset &&
+  !is_dir_sep(dir->buf[offset_parent]));
if (offset_parent <= ceil_offset)
return setup_nongit(cwd.buf, nongit_ok);
if (one_filesystem) {
-- 
2.12.0.windows.1.3.g8a117c48243

[PATCH v2 4/9] Export the discover_git_directory() function

2017-03-02 Thread Johannes Schindelin

The function we introduced earlier needs to return both the path to the
.git/ directory as well as the "cd-up" path to allow
setup_git_directory() to retain its previous behavior as if it changed
the current working directory on its quest for the .git/ directory.

Let's rename it and export a function with an easier signature that
*just* discovers the .git/ directory.

We will use it in a subsequent patch to support early config reading
better.

Signed-off-by: Johannes Schindelin 
---
 cache.h |  1 +
 setup.c | 29 ++---
 2 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/cache.h b/cache.h
index 80b6372cf76..a104b76c02e 100644
--- a/cache.h
+++ b/cache.h
@@ -518,6 +518,7 @@ extern void set_git_work_tree(const char *tree);
 #define ALTERNATE_DB_ENVIRONMENT "GIT_ALTERNATE_OBJECT_DIRECTORIES"
 
 extern void setup_work_tree(void);
+extern const char *discover_git_directory(struct strbuf *gitdir);
 extern const char *setup_git_directory_gently(int *);
 extern const char *setup_git_directory(void);
 extern char *prefix_path(const char *prefix, int len, const char *path);
diff --git a/setup.c b/setup.c
index edac3c27dc1..7ceca6cc6ef 100644
--- a/setup.c
+++ b/setup.c
@@ -839,8 +839,8 @@ enum discovery_result {
  * the discovered .git/ directory, if any. This path may be relative against
  * `dir` (i.e. *not* necessarily the cwd).
  */
-static enum discovery_result discover_git_directory(struct strbuf *dir,
-   struct strbuf *gitdir)
+static enum discovery_result discover_git_directory_1(struct strbuf *dir,
+ struct strbuf *gitdir)
 {
const char *env_ceiling_dirs = getenv(CEILING_DIRECTORIES_ENVIRONMENT);
struct string_list ceiling_dirs = STRING_LIST_INIT_DUP;
@@ -921,6 +921,29 @@ static enum discovery_result discover_git_directory(struct 
strbuf *dir,
}
 }
 
+const char *discover_git_directory(struct strbuf *gitdir)
+{
+   struct strbuf dir = STRBUF_INIT;
+   int len;
+
+   if (strbuf_getcwd(&dir))
+   return NULL;
+
+   len = dir.len;
+   if (discover_git_directory_1(&dir, gitdir) < 0) {
+   strbuf_release(&dir);
+   return NULL;
+   }
+
+   if (dir.len < len && !is_absolute_path(gitdir->buf)) {
+   strbuf_addch(&dir, '/');
+   strbuf_insert(gitdir, 0, dir.buf, dir.len);
+   }
+   strbuf_release(&dir);
+
+   return gitdir->buf;
+}
+
 const char *setup_git_directory_gently(int *nongit_ok)
 {
struct strbuf cwd = STRBUF_INIT, dir = STRBUF_INIT, gitdir = 
STRBUF_INIT;
@@ -947,7 +970,7 @@ const char *setup_git_directory_gently(int *nongit_ok)
die_errno(_("Unable to read current working directory"));
strbuf_addbuf(&dir, &cwd);
 
-   switch (discover_git_directory(&dir, &gitdir)) {
+   switch (discover_git_directory_1(&dir, &gitdir)) {
case GIT_DIR_NONE:
prefix = NULL;
break;
-- 
2.12.0.windows.1.3.g8a117c48243

[PATCH v2 5/9] Make read_early_config() reusable

2017-03-02 Thread Johannes Schindelin

The pager configuration needs to be read early, possibly before
discovering any .git/ directory.

Let's not hide this function in pager.c, but make it available to other
callers.

Signed-off-by: Johannes Schindelin 
---
 cache.h  |  1 +
 config.c | 31 +++
 pager.c  | 31 ---
 3 files changed, 32 insertions(+), 31 deletions(-)

diff --git a/cache.h b/cache.h
index a104b76c02e..6b6780064f0 100644
--- a/cache.h
+++ b/cache.h
@@ -1798,6 +1798,7 @@ extern int git_config_from_blob_sha1(config_fn_t fn, 
const char *name,
 const unsigned char *sha1, void *data);
 extern void git_config_push_parameter(const char *text);
 extern int git_config_from_parameters(config_fn_t fn, void *data);
+extern void read_early_config(config_fn_t cb, void *data);
 extern void git_config(config_fn_t fn, void *);
 extern int git_config_with_options(config_fn_t fn, void *,
   struct git_config_source *config_source,
diff --git a/config.c b/config.c
index c6b874a7bf7..9cfbeafd04c 100644
--- a/config.c
+++ b/config.c
@@ -1412,6 +1412,37 @@ static void configset_iter(struct config_set *cs, 
config_fn_t fn, void *data)
}
 }
 
+void read_early_config(config_fn_t cb, void *data)
+{
+   git_config_with_options(cb, data, NULL, 1);
+
+   /*
+* Note that this is a really dirty hack that does the wrong thing in
+* many cases. The crux of the problem is that we cannot run
+* setup_git_directory() early on in git's setup, so we have no idea if
+* we are in a repository or not, and therefore are not sure whether
+* and how to read repository-local config.
+*
+* So if we _aren't_ in a repository (or we are but we would reject its
+* core.repositoryformatversion), we'll read whatever is in .git/config
+* blindly. Similarly, if we _are_ in a repository, but not at the
+* root, we'll fail to find .git/config (because it's really
+* ../.git/config, etc). See t7006 for a complete set of failures.
+*
+* However, we have historically provided this hack because it does
+* work some of the time (namely when you are at the top-level of a
+* valid repository), and would rarely make things worse (i.e., you do
+* not generally have a .git/config file sitting around).
+*/
+   if (!startup_info->have_repository) {
+   struct git_config_source repo_config;
+
+   memset(&repo_config, 0, sizeof(repo_config));
+   repo_config.file = ".git/config";
+   git_config_with_options(cb, data, &repo_config, 1);
+   }
+}
+
 static void git_config_check_init(void);
 
 void git_config(config_fn_t fn, void *data)
diff --git a/pager.c b/pager.c
index ae796433630..73ca8bc3b17 100644
--- a/pager.c
+++ b/pager.c
@@ -43,37 +43,6 @@ static int core_pager_config(const char *var, const char 
*value, void *data)
return 0;
 }
 
-static void read_early_config(config_fn_t cb, void *data)
-{
-   git_config_with_options(cb, data, NULL, 1);
-
-   /*
-* Note that this is a really dirty hack that does the wrong thing in
-* many cases. The crux of the problem is that we cannot run
-* setup_git_directory() early on in git's setup, so we have no idea if
-* we are in a repository or not, and therefore are not sure whether
-* and how to read repository-local config.
-*
-* So if we _aren't_ in a repository (or we are but we would reject its
-* core.repositoryformatversion), we'll read whatever is in .git/config
-* blindly. Similarly, if we _are_ in a repository, but not at the
-* root, we'll fail to find .git/config (because it's really
-* ../.git/config, etc). See t7006 for a complete set of failures.
-*
-* However, we have historically provided this hack because it does
-* work some of the time (namely when you are at the top-level of a
-* valid repository), and would rarely make things worse (i.e., you do
-* not generally have a .git/config file sitting around).
-*/
-   if (!startup_info->have_repository) {
-   struct git_config_source repo_config;
-
-   memset(&repo_config, 0, sizeof(repo_config));
-   repo_config.file = ".git/config";
-   git_config_with_options(cb, data, &repo_config, 1);
-   }
-}
-
 const char *git_pager(int stdout_is_tty)
 {
const char *pager;
-- 
2.12.0.windows.1.3.g8a117c48243

[PATCH v2 3/9] setup_git_directory(): avoid changing global state during discovery

2017-03-02 Thread Johannes Schindelin

For historical reasons, Git searches for the .git/ directory (or the
.git file) by changing the working directory successively to the parent
directory of the current directory, until either anything was found or
until a ceiling or a mount point is hit.

Further global state may be changed, depending on the actual type of
discovery, e.g. the global variable `repository_format_precious_objects`
is set in the `check_repository_format_gently()` function (which is a
bit surprising, given the function name).

We do have a use case where we would like to find the .git/ directory
without having any global state touched, though: when we read the early
config e.g. for the pager or for alias expansion.

Let's just rename the function `setup_git_directory_gently_1()` to
`discover_git_directory()` and move all code that changes any global
state back into `setup_git_directory_gently()`.

In subsequent patches, we will export the `discover_git_directory()`
function and make use of it.

Note: the new loop is a *little* tricky, as we have to handle the root
directory specially: we cannot simply strip away the last component
including the slash, as the root directory only has that slash. To remedy
that, we introduce the `min_offset` variable that holds the minimal length
of an absolute path, and using that to special-case the root directory,
including an early exit before trying to find the parent of the root
directory.

Signed-off-by: Johannes Schindelin 
---
 setup.c | 187 ++--
 1 file changed, 112 insertions(+), 75 deletions(-)

diff --git a/setup.c b/setup.c
index 89a0cef9231..edac3c27dc1 100644
--- a/setup.c
+++ b/setup.c
@@ -816,50 +816,49 @@ static int canonicalize_ceiling_entry(struct 
string_list_item *item,
}
 }
 
+enum discovery_result {
+   GIT_DIR_NONE = 0,
+   GIT_DIR_EXPLICIT,
+   GIT_DIR_DISCOVERED,
+   GIT_DIR_BARE,
+   /* these are errors */
+   GIT_DIR_HIT_CEILING = -1,
+   GIT_DIR_HIT_MOUNT_POINT = -2
+};
+
 /*
  * We cannot decide in this function whether we are in the work tree or
  * not, since the config can only be read _after_ this function was called.
+ *
+ * Also, we avoid changing any global state (such as the current working
+ * directory) to allow early callers.
+ *
+ * The directory where the search should start needs to be passed in via the
+ * `dir` parameter; upon return, the `dir` buffer will contain the path of
+ * the directory where the search ended, and `gitdir` will contain the path of
+ * the discovered .git/ directory, if any. This path may be relative against
+ * `dir` (i.e. *not* necessarily the cwd).
  */
-static const char *setup_git_directory_gently_1(int *nongit_ok)
+static enum discovery_result discover_git_directory(struct strbuf *dir,
+   struct strbuf *gitdir)
 {
const char *env_ceiling_dirs = getenv(CEILING_DIRECTORIES_ENVIRONMENT);
struct string_list ceiling_dirs = STRING_LIST_INIT_DUP;
-   static struct strbuf cwd = STRBUF_INIT;
-   const char *gitdirenv, *ret;
-   char *gitfile;
-   int offset, offset_parent, ceil_offset = -1;
+   const char *gitdirenv;
+   int ceil_offset = -1, min_offset = has_dos_drive_prefix(dir->buf) ? 3 : 
1;
dev_t current_device = 0;
int one_filesystem = 1;
 
/*
-* We may have read an incomplete configuration before
-* setting-up the git directory. If so, clear the cache so
-* that the next queries to the configuration reload complete
-* configuration (including the per-repo config file that we
-* ignored previously).
-*/
-   git_config_clear();
-
-   /*
-* Let's assume that we are in a git repository.
-* If it turns out later that we are somewhere else, the value will be
-* updated accordingly.
-*/
-   if (nongit_ok)
-   *nongit_ok = 0;
-
-   if (strbuf_getcwd(&cwd))
-   die_errno(_("Unable to read current working directory"));
-   offset = cwd.len;
-
-   /*
 * If GIT_DIR is set explicitly, we're not going
 * to do any discovery, but we still do repository
 * validation.
 */
gitdirenv = getenv(GIT_DIR_ENVIRONMENT);
-   if (gitdirenv)
-   return setup_explicit_git_dir(gitdirenv, &cwd, nongit_ok);
+   if (gitdirenv) {
+   strbuf_addstr(gitdir, gitdirenv);
+   return GIT_DIR_EXPLICIT;
+   }
 
if (env_ceiling_dirs) {
int empty_entry_found = 0;
@@ -867,15 +866,15 @@ static const char *setup_git_directory_gently_1(int 
*nongit_ok)
string_list_split(&ceiling_dirs, env_ceiling_dirs, PATH_SEP, 
-1);
filter_string_list(&ceiling_dirs, 0,
   canonicalize_ceiling_entry, 
&empty_entry_found);
-   ceil_offset = longest_ancestor_length(cwd.buf,

Re: SHA1 collisions found

2017-03-02 Thread Mike Hommey

On Thu, Mar 02, 2017 at 02:27:15PM -0800, Linus Torvalds wrote:
> On Thu, Mar 2, 2017 at 1:54 PM, Joey Hess  wrote:
> >
> > There's a surprising result of combining iterated hash functions, that
> > the combination is no more difficult to attack than the strongest hash
> > function used.
> 
> Duh. I should actually have known that. I started reading the paper
> and went "this seems very familiar". I'm pretty sure I've been pointed
> at that paper before (or maybe just a similar one), and I just didn't
> react enough for it to leave a lasting impact.

What if the "object version" is a hash of the content (as opposed to
header + content like the normal git hash)?

Mike

Re: SHA1 collisions found

2017-03-02 Thread Linus Torvalds

On Thu, Mar 2, 2017 at 12:43 PM, Junio C Hamano  wrote:
>
> My reaction heavily depends on how that "object version" thing
> works.
>
> Would "object version" be like a truncated SHA-1 over the same data
> but with different IV or something, i.e. something that guarantees
> anybody would get the same result given the data to be hashed?

Yes, it does need to be that in practice. So what I was thinking the
object version would be is:

 (a) we actually take the object type into account explicitly.

 (b) we explicitly add another truncated hash.

The first part we can already do without any actual data structure
changes, since basically all users already know the type of an object
when they look it up.

So we already have information that we could use to narrow down the
hash collision case if we saw one.

There are some (very few) cases where we don't already explicitly have
the object type (a tag reference can be any object, for example, and
existing scripts might ask for "give me the type of this SHA1 object
with "git cat-file -t"), but that just goes back to the whole "yeah,
we'll handle legacy uses and we will look up objects even _without_
the extra version data, so it actually integrates well into the whole
notion.

Basically, once you accept that "hey, we'll just have a list of
objects with that hash", it just makes sense to narrow it down by the
object type we also already have.

But yes, the object type is obviously only two bits of information
(actually, considering the type distribution, probably just one bit),
and it's already encoded in the first hash, so it doesn't actually
help much as "collision avoidance" particularly once you have a
particular attack against that hash in place.

It's just that it *is* extra information that we already have, and
that is very natural to use once you start thinking of the hash lookup
as returning a list of objects. It also mitigates one of the worst
_confusions_ in git, and so basically mitigates the worst-case
downside of an attack basically for free, so it seems like a
no-brainer.

But the real new piece of object version would be a truncated second
hash of the object.

I don't think it matters too much what that second hash is, I would
say that we'd just approximate having a total of 256 bits of hash.

Since we already have basically 160 bits of fairly good hashing, and
roughly 128 bits of that isn't known to be attackable, we'd just use
another hash and truncate that to 128 bits. That would be *way*
overkill in practice, but maybe overkill is what we want. And it
wouldn't really expand the objects all that much more than just
picking a new 256-bit hash would do.

So you'd have to be able to attack both the full SHA1, _and_ whatever
other different good hash to 128 bits.

Linus

PS.  if people think that SHA1 is of a good _size_, and only worry
about the known weaknesses of the hashing itself, we'd only need to
get back the bits that the attacks take away from brute force. That's
currently the 80 -> ~63 bits attack, so you'd really only want about
40 bits of second hash to claw us back back up to 80 bits of brute
force (again: brute force is basically sqrt() of the search space, so
half the bits, so adding 40 bits of hash adds 20 bits to the brute
force cost and you'd get back up to the 2**80 we started with).

So 128 bits of secondary hash really is much more than we'd need. 64
bits would probably be fine.

log -S/-G (aka pickaxe) searches binary files by default

2017-03-02 Thread Thomas Braun

Hi,

I happen to have quite large binary files in my repos.

Today I realized that a line like
git log -G a
searches also files found to be binary (or explicitly marked as binary).

Is that on purpose?
The documentation of "-G" states

"Look for differences whose patch text contains added/removed lines that
match ."

which contradicts the current behaviour. At least for me text != binary.

To reproduce:
$ git init
$ echo -e "a\0b" > data.bin
$ git add data.bin
$ git commit -m "Add new data"
$ git log -p
[...]
diff --git a/data.bin b/data.bin
new file mode 100644
index 000..1a23e4b
Binary files /dev/null and b/data.bin differ
$ git log -G a
[...]

Add new data

I've verified the behaviour with git version 2.12.0.windows.1 and git
version 2.12.0.189.g3bc5322 on debian.

If it is on purpose is there a config option to disable that?

Thanks for reading,
Thomas

Re: SHA1 collisions found

2017-03-02 Thread Brandon Williams

On 02/26, Jeff King wrote:
> On Sun, Feb 26, 2017 at 01:13:59AM +, Jason Cooper wrote:
> 
> > On Fri, Feb 24, 2017 at 10:10:01PM -0800, Junio C Hamano wrote:
> > > I was thinking we would need mixed mode support for smoother
> > > transition, but it now seems to me that the approach to stratify the
> > > history into old and new is workable.
> > 
> > As someone looking to deploy (and having previously deployed) git in
> > unconventional roles, I'd like to add one caveat.  The flag day in the
> > history is great, but I'd like to be able to confirm the integrity of
> > the old history.
> > 
> > "Counter-hashing" the blobs is easy enough, but the trees, commits and
> > tags would need to have, iiuc, some sort of cross-reference.  As in my
> > previous example, "git tag -v v3.16" also checks the counter hash to
> > further verify the integrity of the history (yes, it *really* needs to
> > check all of the old hashes, but I'd like to make sure I can do step one
> > first).
> > 
> > Would there be opposition to counter-hashing the old commits at the flag
> > day?
> 
> I don't think a counter-hash needs to be embedded into the git objects
> themselves. If the "modern" repo format stores everything primarily as
> sha-256, say, it will probably need to maintain a (local) mapping table
> of sha1/sha256 equivalence. That table can be generated at any time from
> the object data (though I suspect we'll keep it up to date as objects
> enter the repository).
> 
> At the flag day[1], you can make a signed tag with the "correct" mapping
> in the tag body (so part of the actual GPG signed data, not referenced
> by sha1). Then later you can compare that mapping to the object content
> in the repo (or to the local copy of the mapping based on that data).
> 
> -Peff
> 
> [1] You don't even need to wait until the flag day. You can do it now.
> This is conceptually similar to the git-evtag tool, though it just
> signs the blob contents of the tag's current tree state. Signing the
> whole mapping lets you verify the entirety of history, but of course
> that mapping is quite big: 20 + 32 bytes per object for
> sha1/sha-256, which is ~250MB for the kernel. So you'd probably not
> want to do it more than once.

There were a few of us discussing this sort of approach internally.  We
also figured that, given some performance hit, you could maintain your
repo in sha256 and do some translation to sha1 if you need to push or
fetch to a server which has the the repo in a sha1 format.  This way you
can convert your repo independently of the rest of the world.

As for storing the translation table, you should really only need to
maintain the table until old clients are phased out and all of the repos
of a project have experienced flag day and have been converted to
sha256.

-- 
Brandon Williams

Re: SHA1 collisions found

2017-03-02 Thread Joey Hess

Linus Torvalds wrote:
> So you'd have to be able to attack both the full SHA1, _and_ whatever
> other different good hash to 128 bits.

There's a surprising result of combining iterated hash functions, that
the combination is no more difficult to attack than the strongest hash
function used.

https://www.iacr.org/cryptodb/archive/2004/CRYPTO/1472/1472.pdf

Perhaps you already knew about this, but I had only heard rumors
that was the case, until I found that reference recently.

-- 
see shy jo

signature.asc
Description: PGP signature

[PATCH] line-log.c: prevent crash during union of too many ranges

2017-03-02 Thread Allan Xavier

The existing implementation of range_set_union does not correctly
reallocate memory, leading to a heap overflow when it attempts to union
more than 24 separate line ranges.

For struct range_set *out to grow correctly it must have out->nr set to
the current size of the buffer when it is passed to range_set_grow.
However, the existing implementation of range_set_union only updates
out->nr at the end of the function, meaning that it is always zero
before this. This results in range_set_grow never growing the buffer, as
well as some of the union logic itself being incorrect as !out->nr is
always true.

The reason why 24 is the limit is that the first allocation of size 1
ends up allocating a buffer of size 24 (due to the call to alloc_nr in
ALLOC_GROW). This goes some way to explain why this hasn't been
caught before.

Fix the problem by correctly updating out->nr after reallocating the
range_set. As this results in out->nr containing the same value as the
variable o, replace o with out->nr as well.

Finally, add a new test to help prevent the problem reoccurring in the
future. Thanks to Vegard Nossum for writing the test.

Signed-off-by: Allan Xavier 
---

Originally sent to git-secur...@googlegroups.com to give hosted
services a chance to apply this if they were affected.

 line-log.c  | 15 +++
 t/t4211-line-log.sh | 10 ++
 2 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/line-log.c b/line-log.c
index 65f3558b3..951029665 100644
--- a/line-log.c
+++ b/line-log.c
@@ -144,7 +144,7 @@ void sort_and_merge_range_set(struct range_set *rs)
 static void range_set_union(struct range_set *out,
 struct range_set *a, struct range_set *b)
 {
-   int i = 0, j = 0, o = 0;
+   int i = 0, j = 0;
struct range *ra = a->ranges;
struct range *rb = b->ranges;
/* cannot make an alias of out->ranges: it may change during grow */
@@ -167,16 +167,15 @@ static void range_set_union(struct range_set *out,
new = &rb[j++];
if (new->start == new->end)
; /* empty range */
-   else if (!o || out->ranges[o-1].end < new->start) {
+   else if (!out->nr || out->ranges[out->nr-1].end < new->start) {
range_set_grow(out, 1);
-   out->ranges[o].start = new->start;
-   out->ranges[o].end = new->end;
-   o++;
-   } else if (out->ranges[o-1].end < new->end) {
-   out->ranges[o-1].end = new->end;
+   out->ranges[out->nr].start = new->start;
+   out->ranges[out->nr].end = new->end;
+   out->nr++;
+   } else if (out->ranges[out->nr-1].end < new->end) {
+   out->ranges[out->nr-1].end = new->end;
}
}
-   out->nr = o;
 }
 
 /*
diff --git a/t/t4211-line-log.sh b/t/t4211-line-log.sh
index 9d8b5..d0377fae5 100755
--- a/t/t4211-line-log.sh
+++ b/t/t4211-line-log.sh
@@ -106,4 +106,14 @@ test_expect_success '-L with --output' '
test_line_count = 70 log
 '
 
+test_expect_success 'range_set_union' '
+   test_seq 500 > c.c &&
+   git add c.c &&
+   git commit -m "many lines" &&
+   test_seq 1000 > c.c &&
+   git add c.c &&
+   git commit -m "modify many lines" &&
+   git log $(for x in $(test_seq 200); do echo -L $((2*x)),+1:c.c; done)
+'
+
 test_done
-- 
2.11.0

Re: [PATCH 3/4] filter-branch: fix --prune-empty on parentless commits

2017-03-02 Thread Jacob Keller

On Thu, Mar 2, 2017 at 11:36 AM, Junio C Hamano  wrote:
> "Devin J. Pohly"  writes:
>
>> I think your point is interesting too, though.  If a commit is also
>> TREESAME to its parent(s?) in the _pre-filtered_ branch, it seems
>> reasonable that someone might want to leave it in the filtered branch as
>> an empty commit while pruning empt*ied* commits.  I would imagine that
>> as another option (--prune-newly-empty?).
>
> I was hoping to hear from others who may care about filter-branch to
> comment on this topic to help me decide, but I haven't heard
> anything, so here is my tentative thinking.
>
> I am leaning to:
>
>  * Take your series as-is, which would mean --prune-empty will
>change the behaviour to unconditionally lose the empty root.
>

This new behavior is how I expected prune-empty to behave, so seeing
that it did not already behave this way was surprising.

Thanks,
Jake

Re: SHA1 collisions found

2017-03-02 Thread Linus Torvalds

On Thu, Mar 2, 2017 at 1:54 PM, Joey Hess  wrote:
>
> There's a surprising result of combining iterated hash functions, that
> the combination is no more difficult to attack than the strongest hash
> function used.

Duh. I should actually have known that. I started reading the paper
and went "this seems very familiar". I'm pretty sure I've been pointed
at that paper before (or maybe just a similar one), and I just didn't
react enough for it to leave a lasting impact.

  Linus

Re: [PATCH 3/4] filter-branch: fix --prune-empty on parentless commits

2017-03-02 Thread Junio C Hamano

"Devin J. Pohly"  writes:

> On Thu, Mar 02, 2017 at 11:36:18AM -0800, Junio C Hamano wrote:
>> "Devin J. Pohly"  writes:
>> 
>> > I think your point is interesting too, though.  If a commit is also
>> > TREESAME to its parent(s?) in the _pre-filtered_ branch, it seems
>> > reasonable that someone might want to leave it in the filtered branch as
>> > an empty commit while pruning empt*ied* commits.  I would imagine that
>> > as another option (--prune-newly-empty?).
>> 
>> I was hoping to hear from others who may care about filter-branch to
>> comment on this topic to help me decide, but I haven't heard
>> anything, so here is my tentative thinking.
>> 
>> I am leaning to:
>> 
>>  * Take your series as-is, which would mean --prune-empty will
>>change the behaviour to unconditionally lose the empty root.
>> 
>>  * Then, people who care deeply about it can add a new option that
>>prunes commits that become empty while keeping the originally
>>empty ones.
>> 
>> Thoughts?
>
> Sounds good to me.  I would be willing to work on a new option if needed
> (to "atone" for changing existing behavior), so you can loop me in if
> there are any complaints.

Thanks.  I'll wait for others who know filter-branch better than me
to say something for a few days before doing anything, though.

Re: [PATCH 0/5] A series of performance enhancements in the memihash and name-cache area

2017-03-02 Thread Junio C Hamano

Jeff Hostetler  writes:

> Sorry,  $DAYJOB got in the way (again).
>
> This is still on my short-list of things to take care of.
> I should have something for you next week.

That's perfectly OK.  I just wanted a newer articule in my
newsreader I can bookmark so that I won't forget ;-) No hurries.

Re: [PATCH 3/4] filter-branch: fix --prune-empty on parentless commits

2017-03-02 Thread Devin J. Pohly

On Thu, Mar 02, 2017 at 11:36:18AM -0800, Junio C Hamano wrote:
> "Devin J. Pohly"  writes:
> 
> > I think your point is interesting too, though.  If a commit is also
> > TREESAME to its parent(s?) in the _pre-filtered_ branch, it seems
> > reasonable that someone might want to leave it in the filtered branch as
> > an empty commit while pruning empt*ied* commits.  I would imagine that
> > as another option (--prune-newly-empty?).
> 
> I was hoping to hear from others who may care about filter-branch to
> comment on this topic to help me decide, but I haven't heard
> anything, so here is my tentative thinking.
> 
> I am leaning to:
> 
>  * Take your series as-is, which would mean --prune-empty will
>change the behaviour to unconditionally lose the empty root.
> 
>  * Then, people who care deeply about it can add a new option that
>prunes commits that become empty while keeping the originally
>empty ones.
> 
> Thoughts?

Sounds good to me.  I would be willing to work on a new option if needed
(to "atone" for changing existing behavior), so you can loop me in if
there are any complaints.

-- 
<><

Re: [PATCH v3] Documentation: Improve description for core.quotePath

2017-03-02 Thread Junio C Hamano

This is now a single-patch, which makes sense, too.

Let's merge it to 'next'.

Thanks.

RE: [PATCH 0/5] A series of performance enhancements in the memihash and name-cache area

2017-03-02 Thread Jeff Hostetler

Sorry,  $DAYJOB got in the way (again).

This is still on my short-list of things to take care of.
I should have something for you next week.

Thanks again,
Jeff


-Original Message-
From: Junio C Hamano [mailto:gits...@pobox.com] 
Sent: Thursday, March 2, 2017 4:12 PM
To: Jeff Hostetler 
Cc: Jeff King ; Johannes Schindelin 
; git@vger.kernel.org; Jeff Hostetler 

Subject: Re: [PATCH 0/5] A series of performance enhancements in the memihash 
and name-cache area

Jeff Hostetler  writes:

> On 2/14/2017 5:03 PM, Jeff King wrote:
>> On Tue, Feb 14, 2017 at 12:31:46PM +0100, Johannes Schindelin wrote:
>>
>>> On Windows, calls to memihash() and maintaining the istate.name_hash and
>>> istate.dir_hash HashMaps take significant time on very large
>>> repositories. This series of changes reduces the overall time taken for
>>> various operations by reducing the number calls to memihash(), moving
>>> some of them into multi-threaded code, and etc.
>>>
>>> Note: one commenter in 
>>> https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fgit-for-windows%2Fgit%2Fpull%2F964&data=02%7C01%7CJeff.Hostetler%40microsoft.com%7C1d493f3031f74657f29308d461b0be80%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636240859121403139&sdata=16RQH1%2BrDonsanClb3%2Fwue7pcy9l7cUq9lDJenqCgbE%3D&reserved=0
>>> pointed out that memihash() only handles ASCII correctly. That is true.
>>> And fixing this is outside the purview of this patch series.
>> Out of curiosity, do you have numbers? Bonus points if the speedup can
>> be shown via a t/perf script.
>>
>> We have a read-cache perf-test already, but I suspect you'd want
>> something more like "git status" or "ls-files -o" that calls into
>> read_directory().
>
> I have some informal numbers in a spreadsheet.  I was seeing
> a 8-9% speed up on a status on my gigantic repo.
>
> I'll try to put together a before/after perf-test to better
> demonstrate this.

Ping?  I do not think there is anything wrong with the changes from
correctness point of view, but as the series is about performance,
it somewhat feels pointless to merge to 'next' without mentioning
the actual numbers.  

It might be sufficient to mention the rough numbers in the log
messages, if additions to t/perf/ are too cumbersome to arrange.

Re: [PATCH 0/5] A series of performance enhancements in the memihash and name-cache area

2017-03-02 Thread Junio C Hamano

Jeff Hostetler  writes:

> On 2/14/2017 5:03 PM, Jeff King wrote:
>> On Tue, Feb 14, 2017 at 12:31:46PM +0100, Johannes Schindelin wrote:
>>
>>> On Windows, calls to memihash() and maintaining the istate.name_hash and
>>> istate.dir_hash HashMaps take significant time on very large
>>> repositories. This series of changes reduces the overall time taken for
>>> various operations by reducing the number calls to memihash(), moving
>>> some of them into multi-threaded code, and etc.
>>>
>>> Note: one commenter in https://github.com/git-for-windows/git/pull/964
>>> pointed out that memihash() only handles ASCII correctly. That is true.
>>> And fixing this is outside the purview of this patch series.
>> Out of curiosity, do you have numbers? Bonus points if the speedup can
>> be shown via a t/perf script.
>>
>> We have a read-cache perf-test already, but I suspect you'd want
>> something more like "git status" or "ls-files -o" that calls into
>> read_directory().
>
> I have some informal numbers in a spreadsheet.  I was seeing
> a 8-9% speed up on a status on my gigantic repo.
>
> I'll try to put together a before/after perf-test to better
> demonstrate this.

Ping?  I do not think there is anything wrong with the changes from
correctness point of view, but as the series is about performance,
it somewhat feels pointless to merge to 'next' without mentioning
the actual numbers.  

It might be sufficient to mention the rough numbers in the log
messages, if additions to t/perf/ are too cumbersome to arrange.

Re: SHA1 collisions found

2017-03-02 Thread Junio C Hamano

Linus Torvalds  writes:

> Anyway, I do have a suggestion for what the "object version" would be,
> but I'm not even going to mention it, because I want people to first
> think about the _concept_ and not the implementation.
>
> So: What do you think about the concept?

My reaction heavily depends on how that "object version" thing
works.  When I think I have "variant #1" of an object and say

  have 860cd699c285f02937a2edbdb78e8231292339a5#1

is there any guarantee that the other end has a (small) set of
different objects all sharing the same SHA-1 and it thinks it has
"variant #1" only when it has the same thing as I have (otherwise,
it may have "variant #2" that is an unrelated object but happens to
share the same hash)?  If so, I think I understand how things would
work within your "concept".  But otherwise, I am not really sure.

Would "object version" be like a truncated SHA-1 over the same data
but with different IV or something, i.e. something that guarantees
anybody would get the same result given the data to be hashed?

Re: Transition plan for git to move to a new hash function

2017-03-02 Thread Ian Jackson

brian m. carlson writes ("Re: Transition plan for git to move to a new hash 
function"):
> On Mon, Feb 27, 2017 at 01:00:01PM +, Ian Jackson wrote:
> > Objects of one hash may refer to objects named by a different hash
> > function to their own.  Preference rules arrange that normally, new
> > hash objects refer to other new hash objects.
> 
> The existing codebase isn't really intended with that in mind.

Yes.  I've seen the attempts to start to replace char* with a hash
struct.

> I like Peff's suggested approach in which we essentially rewrite history
> under the hood, but have a lookup table which looks up the old hash
> based on the new hash.  That allows us to refer to old objects, but not
> have to share serialized data that mentions both hashes.

I think this means that the when a project converts, every copy of the
history must be rewritten (separately).  Also, this leaves the whole
system lacking in algorithm agililty.  Meaning we may have to do all
of this again some time.

I also think that we need to distinguish old hashes from new hashes in
the command line interface etc.  Otherwise there is a possible
ambiguity.

> > The object name textual syntax is extended.  The new syntax may be
> > used in all textual git objects and protocols (commits, tags, command
> > lines, etc.).
> > 
> > We declare that the object name syntax is henceforth
> >   [A-Z]+[0-9a-z]+ | [0-9a-f]+
> > and that names [A-Z].* are deprecated as ref name components.
> 
> I'd simply say that we have data always be in the new format if it's
> available, and tag the old SHA-1 versions instead.  Otherwise, as Peff
> pointed out, we're going to be stuck typing a bunch of identical stuff
> every time.  Again, this encourages migration.

The hash identifier is only one character.  Object names are not
normally typed very much anyway.

If you say we must decorate old hashes, then all existing data
everywhere in the world which refers to any git objects by object name
will become invalid.  I don't mean just data in git here.  I mean CI
systems, mailing list archives, commit messages (perhaps in other
version control systems), test cases, and so on.

Ian.

Re: SHA1 collisions found

2017-03-02 Thread Linus Torvalds

On Fri, Feb 24, 2017 at 4:39 PM, Linus Torvalds
 wrote:
>
> Honestly, I think that a primary goal for a new hash implementation
> absolutely needs to be to minimize mixing.
>
> Not for security issues, but because of combinatorics. You want to
> have a model that basically reads old data, but that very aggressively
> approaches "new data only" in order to avoid the situation where you
> have basically the exact same tree state, just _represented_
> differently.
>
> For example, what I would suggest the rules be is something like this:

Hmm. Having looked at this a fair amount, and in particularly having
looked at the code as part of the hash typesafety patch I did, I am
actually starting to think that it would not be too painful at all to
have a totally different approach, which might be a lot easier to do.

So bear with me, let me try to explain my thinking:

 (a) if we want to be backwards compatible and not force people to
convert their trees on some flag day, we're going to be stuck with
having to have the SHA1 code around and all the existing object
parsing for basically forever

Now, this basically means that the _easy_ solution would be that we
just do the flag day, switch to sha-256, extend everything to 32-byte
hashes, and just have a "git2 fast-import" that makes it easy to
convert stuff.

But it really would be a completely different version of git, with a
new pack-file format and no real compatibility. Such a flag-day
approach would certainly have advantages: it would allow for just
re-architecting some bad choices:

 - make the hashing be something that can be threaded (ie maybe we can
just block it up in 4MB chunks that you can hash in parallel, and make
the git object hash be the hash of hashes)

 - replace zlib with something like zstd

 - get rid of old pack formats etc.

but  on the whole, I still think that the compatibility would be worth
much more than the possible technical advantages of a clean slate
restart.

 (b) the SHA1 hash is actually still quite strong, and the collision
detection code effectively means that we don't really have to worry
about collisions in the immediate future.

In other words, the mitigation of the current attack is actually
really easy technically (modulo perhaps the performance concerns), and
there's still nothing fundamentally wrong with using SHA1 as a content
hash. It's still a great hash.

Now, my initial reaction (since it's been discussed for so long
anyway) was obviously "pick a different hash". That was everybody's
initial reaction, I think.

But I'm starting to think that maybe that initial obvious reaction was wrong.

The thing that makes collision attacks so nasty is that our reaction
to a collision is so deadly.  But that's not necessarily fundamental:
we certainly uses hashes with collisions every day, and they work
fine. And they work fine because the code that uses those hashes is
designed to simply deal gracefully - although very possibly with a
performance degradation - with two different things hashing to the
same bucket.

So what if the solution to "SHA1 has potential collisions" is "any
hash can have collisions in theory, let's just make sure we handle
them gracefully"?

Because I'm looking at our data structures that have hashes in them,
and many of them are actually of the type where I go

  "Hmm..  Replacing the hash entirely is really really painful - but
it wouldn't necessarily be all that nasty to extend the format to have
additional version information".

and the advantage of that approach is that it actually makes the
compatibility part trivial. No more "have to pick a different hash and
two different formats", and more of a "just the same format with
extended information that might not be there for old objects".

So we have a few different types of formats:

 - the purely local data structures: the pack index file, the file
index, our refs etc

   These we could in change completely, and it wouldn't even be all
that painful. The pack index has already gone through versions, and it
doesn't affect anything else.

 - the actual objects.

   These are fairly painful to change, particularly things like the
"tree" object which is probably the worst designed of the lot. Making
it contain a fixed-size binary thing was really a nasty mistake. My
bad.

 - the pack format and the protocol to exchange "I have this" information

   This is *really* painful to change, because it contains not just
the raw object data, but it obviously ends up being the wire format
for remote accesses.

and it turns out that *all* of these formats look like they would be
fairly easy to extend to having extra object version information. Some
of that extra object version information we already have and don't
use, in fact.

Even the tree format, with the annoying fixed-size binary blob. Yes,
it has that fixed size binary blob, but it has it in _addition_ to the
ASCII textual form that would be really easy to just extend upon. We
have that "tree entry type" that w

Re: git status --> Out of memory, realloc failed

2017-03-02 Thread René Scharfe


Am 01.03.2017 um 21:12 schrieb Carsten Fuchs:

Hi René,

Am 01.03.2017 um 11:02 schrieb René Scharfe:

I use Git at a web hosting service, where my user account has a memory
limit of 768 MB:

(uiserver):p7715773:~$ uname -a
Linux infongp-de15 3.14.0-ui16322-uiabi1-infong-amd64 #1 SMP Debian
3.14.79-2~ui80+4 (2016-11-17) x86_64 GNU/Linux


What's the output of "ulimit -a"?


(uiserver):p7715773:~$ ulimit -a
core file size  (blocks, -c) 0
data seg size   (kbytes, -d) unlimited
scheduling priority (-e) 1
file size   (blocks, -f) unlimited
pending signals (-i) 16382
max locked memory   (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files  (-n) 512
pipe size(512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority  (-r) 0
stack size  (kbytes, -s) 8192
cpu time   (seconds, -t) 1800
max user processes  (-u) 42
virtual memory  (kbytes, -v) 786432
file locks  (-x) unlimited


When I use ulimit -v with lower and lower numbers I can provoke mmap 
failures on bigger pack files, but not the realloc failures that you're 
seeing.  And your packs should be only up to 20MB anyway (you can check 
that with "ls -l .git/objects/pack/*.pack").



The repository is tracking about 19000 files which together take 260 MB.
The git server version is 2.7.4.1.g5468f9e (Bitbucket)


Is your repository publicly accessible?


Unfortunately, no. There are no big secrets in there, but just a couple
of database details so that I cannot make it universally available. I
can gladly give you access though. (E.g. by adding your public SSH key?)


I'd rather not look at semi-confidential data, and you probably 
shouldn't hand it to a stranger on the internet anyway. ;)


So a shot in the dark: Do you have a lot of untracked files?  You could 
check by cloning your repository locally (which copies only tracked 
contents).  Does "git status" work on the clone?


Another one, darker yet: Does "git config core.preloadIndex 0" help?

René

Re: [PATCH v1 1/1] git diff --quiet exits with 1 on clean tree with CRLF conversions

2017-03-02 Thread Mike Crowe

On Thursday 02 March 2017 at 10:33:59 -0800, Junio C Hamano wrote:
> Mike Crowe  writes:
> 
> > All the solutions presented so far do cause a small change in behaviour
> > when using git diff --quiet: they may now cause warning messages like:
> >
> >  warning: CRLF will be replaced by LF in crlf.txt.
> >  The file will have its original line endings in your working directory.
> 
> That is actually a good thing, I think.  As the test modifies a file
> that originally has "Hello\r\nWorld\r\n" in it to this:
> 
> >> +test_expect_success 'quiet diff works on file with line-ending change 
> >> that has no effect on repository' '
> >> +  printf "Hello\r\nWorld\n" >crlf.txt &&
> 
> If you did "git add" at this point, you would get the same warning,
> because the lack of CR on the second line could well be a mistake
> you may want to notice and fix before going forward.  Otherwise
> you'd be losing information that _might_ matter to you (i.e. the
> fact that the first line had CRLF while the second had LF) and it is
> the whole point of safe_crlf setting.

Well, there is an argument that it's not very "--quiet" to emit this
message. Especially when it didn't used to come out. However, I can
understand that the message is useful if the line endings have changed
despite this.

However, I can make the message appear from "git diff --quiet" even if the
line endings have not changed. I merely need to touch a file where the line
endings do not match the canonical representation in the repository first.
Upon subsequent invocations of "git diff --quiet" the message does not come
out. (Note that in this may not be reproducible in a script without
sleeps.)

Perhaps this interactive log will make things clearer:

 $ git init
 Initialized empty Git repository in /tmp/test/.git/
 $ echo "* text=auto" >.gitattributes
 $ printf "Hello\r\nWorld\r\n" >crlf.txt
 $ git add .gitattributes crlf.txt
 warning: CRLF will be replaced by LF in crlf.txt.
 The file will have its original line endings in your working directory.

The message was expected and useful there.

 $ git commit -m "initial"
 [master (root-commit) c3fb5a5] initial
 2 files changed, 3 insertions(+)
 create mode 100644 .gitattributes
 create mode 100644 crlf.txt
 $ git diff --quiet
 $ touch crlf.txt
 $ git diff --quiet
 warning: CRLF will be replaced by LF in crlf.txt.
 The file will have its original line endings in your working directory.

I didn't change the file - I just touched it. Why did the message come out here?

 $ git diff --quiet

But then it didn't here. Which is correct?

 $ printf "Hello\r\nWorld\n" >crlf.txt
 $ git diff --quiet
 warning: CRLF will be replaced by LF in crlf.txt.
 The file will have its original line endings in your working directory.
 $ git diff --quiet
 warning: CRLF will be replaced by LF in crlf.txt.
 The file will have its original line endings in your working directory.

If the line endings have genuinely changed then the message comes out every
time...

 $ git add crlf.txt
 warning: CRLF will be replaced by LF in crlf.txt.
 The file will have its original line endings in your working directory.

...until the file is added to the index. That's probably the right thing to
do.

 $ git diff --quiet
 $ touch crlf.txt
 $ git diff --quiet
 warning: CRLF will be replaced by LF in crlf.txt.
 The file will have its original line endings in your working directory.
 $ git diff --quiet

But once the file has been added the previous behaviour of only emitting
the message on the first time after the touch occurs.

 $ printf "Hello\r\nWorld\n" >crlf.txt
 $ git diff --quiet
 warning: CRLF will be replaced by LF in crlf.txt.
 The file will have its original line endings in your working directory.
 $ git diff --quiet
 $ printf "Hello\r\nWorld\r\n" >crlf.txt
 $ git diff --quiet
 warning: CRLF will be replaced by LF in crlf.txt.
 The file will have its original line endings in your working directory.
 $ git diff --quiet
 warning: CRLF will be replaced by LF in crlf.txt.
 The file will have its original line endings in your working directory.

Hopefully that makes things a bit clearer.

Mike.

Re: [PATCH 5/5] ls-files: fix bug when recuring with relative pathspec

2017-03-02 Thread Brandon Williams

On 02/28, Junio C Hamano wrote:
> Brandon Williams  writes:
> 
> > Fix a bug which causes a child process for a submodule to error out when a
> > relative pathspec with a ".." is provided in the superproject.
> >
> > While at it, correctly construct the super-prefix to be used in a submodule
> > when not at the root of the repository.
> >
> > Signed-off-by: Brandon Williams 
> > ---
> >  builtin/ls-files.c | 8 ++--
> >  t/t3007-ls-files-recurse-submodules.sh | 2 +-
> >  2 files changed, 7 insertions(+), 3 deletions(-)
> >
> > diff --git a/builtin/ls-files.c b/builtin/ls-files.c
> > index 159229081..89533ab8e 100644
> > --- a/builtin/ls-files.c
> > +++ b/builtin/ls-files.c
> > @@ -194,12 +194,15 @@ static void compile_submodule_options(const struct 
> > dir_struct *dir, int show_tag
> >  static void show_gitlink(const struct cache_entry *ce)
> >  {
> > struct child_process cp = CHILD_PROCESS_INIT;
> > +   struct strbuf name = STRBUF_INIT;
> > int status;
> > int i;
> >  
> > +   quote_path_relative(ce->name, prefix, &name);
> > argv_array_pushf(&cp.args, "--super-prefix=%s%s/",
> 
> Same comment as 3/5.  quote_path is to produce c-quote and is not
> even meant for shell script quoting.  run_command() interface would
> do its own shell quoting when needed, so  I think you just want the
> exact string you want to pass here.
> 

Yeah I don't know what I was thinking when using that instead of
'relative_path()'.  Will change for this and 3/5.

-- 
Brandon Williams

Re: [PATCH 3/4] filter-branch: fix --prune-empty on parentless commits

2017-03-02 Thread Junio C Hamano

"Devin J. Pohly"  writes:

> I think your point is interesting too, though.  If a commit is also
> TREESAME to its parent(s?) in the _pre-filtered_ branch, it seems
> reasonable that someone might want to leave it in the filtered branch as
> an empty commit while pruning empt*ied* commits.  I would imagine that
> as another option (--prune-newly-empty?).

I was hoping to hear from others who may care about filter-branch to
comment on this topic to help me decide, but I haven't heard
anything, so here is my tentative thinking.

I am leaning to:

 * Take your series as-is, which would mean --prune-empty will
   change the behaviour to unconditionally lose the empty root.

 * Then, people who care deeply about it can add a new option that
   prunes commits that become empty while keeping the originally
   empty ones.

Thoughts?

Re: [PATCH] Put sha1dc on a diet

2017-03-02 Thread Linus Torvalds

On Thu, Mar 2, 2017 at 10:37 AM, Jeff Hostetler  wrote:
>>
>> Now, if your _file_ index is 300-400MB (and I do think we check the
>> SHA fingerprint on that even on just reading it - verify_hdr() in
>> do_read_index()), then that's going to be a somewhat noticeable hit on
>> every normal "git diff" etc.
>
> Yes, the .git/index is 450MB with ~3.1M entries.  verify_hdr() is called
> each time we read it into memory.

Ok. So that's really just a purely historical artifact.

The file index is actually the first part of git to have ever been
written. You can't even see it in the history, because the initial
revision from Apr 7, 2005, obviously depended on the actual object
hashing.

But the file index actually came first. You can _kind_ of see that in
the layout of the original git tree, and how the main header file is
still called "cache.h", and how the original ".git" directory was
actually called ".dircache".

And the two biggest files (by a fairly big margin) are "read-cache.c"
and "update-cache.c".

So that file index cache was in many ways _the_ central part of the
original git model. The sha1 file indexing and object database was
just the backing store for the file index.

But part of that history is then how much I worried about corruption
of that index (and, let's face it, general corruption resistance _was_
one of the primary design goals - performance was high up there too,
but safety in the face of filesystem corruption was and is a primary
issue).

But realistically, I don't think we've *ever* hit anything serious on
the index file, and it's obviously not a security issue. It also isn't
even a compatibility issue, so it would be trivial to just bump the
version header and saying that the signature changes the meaning of
the checksum.

That said:

> We have been testing a patch in GfW to run the verification in a separate 
> thread
> while the main thread parses (and mallocs) the cache_entries.  This does help
> offset the time.

Yeah, that seems an even better solution, honestly.

The patch would be cleaner without the NO_PTHREADS things.

I wonder how meaningful that thing even is today. Looking at what
seems to select NO_PTHREADS, I suspect that's all entirely historical.
For example, you'll see it for QNX etc, which seems wrong - QNX
definitely has pthreads according to their docs, for example.

 Linus

Re: [PATCH v1 1/1] git diff --quiet exits with 1 on clean tree with CRLF conversions

2017-03-02 Thread Torsten Bögershausen

On 2017-03-02 15:20, Mike Crowe wrote:
> ll the solutions presented so far do cause a small change in behaviour
> when using git diff --quiet: they may now cause warning messages like:
> 
>  warning: CRLF will be replaced by LF in crlf.txt.
>  The file will have its original line endings in your working directory.
Ah,
that is not ideal.
I can have a look at it later (or due to the weekend)

Re: [PATCH 1/3] revision: unify {tree,blob}_objects in rev_info

2017-03-02 Thread Junio C Hamano

Jeff King  writes:

> On Tue, Feb 28, 2017 at 01:42:44PM -0800, Junio C Hamano wrote:
>
>> Jonathan Tan  writes:
>> 
>> > It could be argued that in the future, Git might need to distinguish
>> > tree_objects from blob_objects - in particular, a user might want
>> > rev-list to print the trees but not the blobs. 
>> 
>> That was exactly why these bits were originally made to "appear
>> independent but in practice nobody sets only one and leaves others
>> off".  
>> 
>> And it didn't happen in the past 10 years, which tells us that we
>> should take this patch.
>
> I actually have a patch which uses the distinction. It's for
> upload-archive doing reachability checks (which seems rather familiar to
> what's going on here).

OK.  Thanks for stopping me ;-)

[PATCH v2] diff: do not short-cut CHECK_SIZE_ONLY check in diff_populate_filespec()

2017-03-02 Thread Junio C Hamano

Callers of diff_populate_filespec() can choose to ask only for the
size of the blob without grabbing the blob data, and the function,
after running lstat() when the filespec points at a working tree
file, returns by copying the value in size field of the stat
structure into the size field of the filespec when this is the case.

However, this short-cut cannot be taken if the contents from the
path needs to go through convert_to_git(), whose resulting real blob
data may be different from what is in the working tree file.

As "git diff --quiet" compares the .size fields of filespec
structures to skip content comparison, this bug manifests as a
false "there are differences" for a file that needs eol conversion,
for example.

Reported-by: Mike Crowe 
Helped-by: Torsten Bögershausen 
Signed-off-by: Junio C Hamano 
---

 * With "test size_only to avoid more expensive would_convert call"
   fix applied.  Also the new test is now in t4xxx that it belongs
   to.

 diff.c| 19 ++-
 t/t4035-diff-quiet.sh |  9 +
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/diff.c b/diff.c
index 059123c5dc..37e60ca601 100644
--- a/diff.c
+++ b/diff.c
@@ -2783,8 +2783,25 @@ int diff_populate_filespec(struct diff_filespec *s, 
unsigned int flags)
s->should_free = 1;
return 0;
}
-   if (size_only)
+
+   /*
+* Even if the caller would be happy with getting
+* only the size, we cannot return early at this
+* point if the path requires us to run the content
+* conversion.
+*/
+   if (size_only && !would_convert_to_git(s->path))
return 0;
+
+   /*
+* Note: this check uses xsize_t(st.st_size) that may
+* not be the true size of the blob after it goes
+* through convert_to_git().  This may not strictly be
+* correct, but the whole point of big_file_threshold
+* and is_binary check being that we want to avoid
+* opening the file and inspecting the contents, this
+* is probably fine.
+*/
if ((flags & CHECK_BINARY) &&
s->size > big_file_threshold && s->is_binary == -1) {
s->is_binary = 1;
diff --git a/t/t4035-diff-quiet.sh b/t/t4035-diff-quiet.sh
index 461f4bb583..2f1737fcef 100755
--- a/t/t4035-diff-quiet.sh
+++ b/t/t4035-diff-quiet.sh
@@ -152,4 +152,13 @@ test_expect_success 'git diff --quiet ignores stat-change 
only entries' '
test_expect_code 1 git diff --quiet
 '
 
+test_expect_success 'git diff --quiet on a path that need conversion' '
+   echo "crlf.txt text=auto" >.gitattributes &&
+   printf "Hello\r\nWorld\r\n" >crlf.txt &&
+   git add .gitattributes crlf.txt &&
+
+   printf "Hello\r\nWorld\n" >crlf.txt &&
+   git diff --quiet crlf.txt
+'
+
 test_done
-- 
2.12.0-352-gb05ccab5eb

Re: [PATCH v1 1/1] git diff --quiet exits with 1 on clean tree with CRLF conversions

2017-03-02 Thread Jeff King

On Thu, Mar 02, 2017 at 09:52:21AM -0800, Junio C Hamano wrote:

> >> +   * and is_binary check being that we want to avoid
> >> +   * opening the file and inspecting the contents, this
> >> +   * is probably fine.
> >> +   */
> >>if ((flags & CHECK_BINARY) &&
> >>s->size > big_file_threshold && s->is_binary == -1) {
> >>s->is_binary = 1;
> >
> > I'm trying to think how this "not strictly correct" could bite us. 
> 
> Note that the comment is just documenting what I learned and thought
> while working on an unrelated thing that happened to be sitting next
> to it.

Yeah, sorry, this is obviously not a blocker to your patch. I'm just
wondering if there is more work needed.

> To be quite honest, I do not think this code should cater to LFS or
> any other conversion hack.  They all install their own diff driver
> and they can tell diff_filespec_is_binary() if the thing is binary
> or not without falling back to this heuristics codepath.

Yeah, you're right, I was just being silly. Whatever configured the
filter already has an opportunity to give us this knowledge in a better
way, and we should rely on that.

-Peff

Re: [PATCH v1 1/1] git diff --quiet exits with 1 on clean tree with CRLF conversions

2017-03-02 Thread Junio C Hamano

Jeff King  writes:

>> diff --git a/diff.c b/diff.c
>> index 8c78fce49d..dc51dceb44 100644
>> --- a/diff.c
>> +++ b/diff.c
>> @@ -2792,8 +2792,25 @@ int diff_populate_filespec(struct diff_filespec *s, 
>> unsigned int flags)
>>  s->should_free = 1;
>>  return 0;
>>  }
>> -if (size_only)
>> +
>> +/*
>> + * Even if the caller would be happy with getting
>> + * only the size, we cannot return early at this
>> + * point if the path requires us to run the content
>> + * conversion.
>> + */
>> +if (!would_convert_to_git(s->path) && size_only)
>>  return 0;
>
> The would_convert_to_git() function is a little expensive (it may have
> to do an attribute lookup). It may be worth swapping the two halves of
> the conditional here to get the short-circuit.

Yes.  I think it makes sense.

>> +
>> +/*
>> + * Note: this check uses xsize_t(st.st_size) that may
>> + * not be the true size of the blob after it goes
>> + * through convert_to_git().  This may not strictly be
>> + * correct, but the whole point of big_file_threashold
>
> s/threashold/threshold/

Thanks.  I felt there was something wrong and looked at the line
three times but somehow failed to spot exactly what was wrong ;-)

>
>> + * and is_binary check being that we want to avoid
>> + * opening the file and inspecting the contents, this
>> + * is probably fine.
>> + */
>>  if ((flags & CHECK_BINARY) &&
>>  s->size > big_file_threshold && s->is_binary == -1) {
>>  s->is_binary = 1;
>
> I'm trying to think how this "not strictly correct" could bite us. 

Note that the comment is just documenting what I learned and thought
while working on an unrelated thing that happened to be sitting next
to it.

Nobody asks "I am OK without the contents i.e. size-only" and "Can
you see if this is binary?" at the same time (and if a caller did,
it would never have got is_binary with the original code).  s->size
is still a copy of st.size at this point of the code (we have not
actually updated it to the size of the real blob, which happens a
bit later in the flow of this codepath where we actually slurp the
thing in and run the conversion).  So with or without this patch,
there shouldn't be any change in the behaviour wrt CHECK_BINARY.

> For
> line-ending conversion, I'd say that the before/after are going to be
> approximately the same size. But what about something like LFS? If I
> have a 600MB file that convert_to_git() filters into a short LFS
> pointer, I think this changes the behavior. Before, we would diff the
> pointer file, but now we'll get "binary file changed".

To be quite honest, I do not think this code should cater to LFS or
any other conversion hack.  They all install their own diff driver
and they can tell diff_filespec_is_binary() if the thing is binary
or not without falling back to this heuristics codepath.

What is done here *is* inconsistent with what is done on the other
side of if/else, that compares big_file_threshold with in-git size,
not in-working-tree size.  That may want to be addressed, but I am
not sure if it is worth it.

[PATCH v3] Documentation: Improve description for core.quotePath

2017-03-02 Thread Andreas Heiduk

Linking the description for pathname quoting to the configuration
variable "core.quotePath" removes inconstistent and incomplete
sections while also giving two hints how to deal with it: Either with
"-c core.quotePath=false" or with "-z".

Signed-off-by: Andreas Heiduk 
---
 Documentation/config.txt  | 23 +--
 Documentation/diff-format.txt |  7 ---
 Documentation/diff-generate-patch.txt |  7 +++
 Documentation/diff-options.txt|  7 +++
 Documentation/git-apply.txt   |  7 +++
 Documentation/git-commit.txt  |  9 ++---
 Documentation/git-ls-files.txt| 10 ++
 Documentation/git-ls-tree.txt | 10 +++---
 Documentation/git-status.txt  |  7 +++
 9 files changed, 48 insertions(+), 39 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 015346c..23233d8 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -350,16 +350,19 @@ core.checkStat::
all fields, including the sub-second part of mtime and ctime.
 
 core.quotePath::
-   The commands that output paths (e.g. 'ls-files',
-   'diff'), when not given the `-z` option, will quote
-   "unusual" characters in the pathname by enclosing the
-   pathname in a double-quote pair and with backslashes the
-   same way strings in C source code are quoted.  If this
-   variable is set to false, the bytes higher than 0x80 are
-   not quoted but output as verbatim.  Note that double
-   quote, backslash and control characters are always
-   quoted without `-z` regardless of the setting of this
-   variable.
+   Commands that output paths (e.g. 'ls-files', 'diff'), will
+   quote "unusual" characters in the pathname by enclosing the
+   pathname in double-quotes and escaping those characters with
+   backslashes in the same way C escapes control characters (e.g.
+   `\t` for TAB, `\n` for LF, `\\` for backslash) or bytes with
+   values larger than 0x80 (e.g. octal `\302\265` for "micro" in
+   UTF-8).  If this variable is set to false, bytes higher than
+   0x80 are not considered "unusual" any more. Double-quotes,
+   backslash and control characters are always escaped regardless
+   of the setting of this variable.  A simple space character is
+   not considered "unusual".  Many commands can output pathnames
+   completely verbatim using the `-z` option. The default value
+   is true.
 
 core.eol::
Sets the line ending type to use in the working directory for
diff --git a/Documentation/diff-format.txt b/Documentation/diff-format.txt
index cf52626..706916c 100644
--- a/Documentation/diff-format.txt
+++ b/Documentation/diff-format.txt
@@ -78,9 +78,10 @@ Example:
 :100644 100644 5be4a4.. 00.. M file.c
 
 
-When `-z` option is not used, TAB, LF, and backslash characters
-in pathnames are represented as `\t`, `\n`, and `\\`,
-respectively.
+Without the `-z` option, pathnames with "unusual" characters are
+quoted as explained for the configuration variable `core.quotePath`
+(see linkgit:git-config[1]).  Using `-z` the filename is output
+verbatim and the line is terminated by a NUL byte.
 
 diff format for merges
 --
diff --git a/Documentation/diff-generate-patch.txt 
b/Documentation/diff-generate-patch.txt
index d2a7ff5..231105c 100644
--- a/Documentation/diff-generate-patch.txt
+++ b/Documentation/diff-generate-patch.txt
@@ -53,10 +53,9 @@ The index line includes the SHA-1 checksum before and after 
the change.
 The  is included if the file mode does not change; otherwise,
 separate lines indicate the old and the new mode.
 
-3.  TAB, LF, double quote and backslash characters in pathnames
-are represented as `\t`, `\n`, `\"` and `\\`, respectively.
-If there is need for such substitution then the whole
-pathname is put in double quotes.
+3.  Pathnames with "unusual" characters are quoted as explained for
+the configuration variable `core.quotePath` (see
+linkgit:git-config[1]).
 
 4.  All the `file1` files in the output refer to files before the
 commit, and all the `file2` files refer to files after the commit.
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index d91ddbd..89cc0f4 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -192,10 +192,9 @@ ifndef::git-log[]
given, do not munge pathnames and use NULs as output field terminators.
 endif::git-log[]
 +
-Without this option, each pathname output will have TAB, LF, double quotes,
-and backslash characters replaced with `\t`, `\n`, `\"`, and `\\`,
-respectively, and the pathname will be enclosed in double quotes if
-any of those replacements occurred.
+Without this option, pathnames with "unusual" characters are quoted as
+explained for the configuration variable `core.quotePath` (se

Re: [PATCH v1] Travis: also test on 32-bit Linux

2017-03-02 Thread Junio C Hamano

Lars Schneider  writes:

>> On 02 Mar 2017, at 12:24, Johannes Schindelin  
>> wrote:
>> 
>> Hi Lars,
>> 
>> On Thu, 2 Mar 2017, Lars Schneider wrote:
>> 
>>> The patch looks good to me in general but I want to propose the following
>>> changes:
>> 
>> I know you are using your script to generate this mail, but I would have
>> liked to see v2 in the subject ;-)
>
> Yeah, sorry. I already had a "D'oh" moment *after* I saw the email in 
> my email client. Now I am wondering... is the next version v2 or v3 :D

Another question is which v3 people mean in the discussion, when you
and Dscho work on improvements at the same time and each post the
"next" version marked as "v3", and they comment on one of them?

Between "v2" and "v3", it would not make too much difference to the
readership, when it is clear that two people are working to produce
competing improvements without much coordination (i.e. lack of "ok,
I'll send a reroll marked as vX in a minite"--"ok, I'll wait and
comment on it" exchange).  People watching from the sideline know
"ah this is v3 from Lars which is the highest numbered one from him
on this topic".  As long as you do not mark your next one "v1",
you'd be OK ;-).

Re: [PATCH] Put sha1dc on a diet

2017-03-02 Thread Jeff Hostetler




On 3/2/2017 11:35 AM, Linus Torvalds wrote:

On Thu, Mar 2, 2017 at 6:45 AM, Johannes Schindelin
 wrote:

It would probably make sense to switch the index integrity check away from
SHA-1 because we really only care about detecting bit flips there, and we
have no need for the computational overhead of using a full-blown
cryptographic hash for that purpose.

Which index do you actually see as being a problem, btw? The main file
index (.git/index) or the pack-file indexes?

We definitely don't need the checking version of sha1 for either of
those, but as Jeff already did the math, at least the pack-file index
is almost negligible, because the pack-file operations that update it
end up doing SHA1 over the objects - and the object SHA1 calculations
are much bigger.

And I don't think we even check the pack-file index hashes except on fsck.

Now, if your _file_ index is 300-400MB (and I do think we check the
SHA fingerprint on that even on just reading it - verify_hdr() in
do_read_index()), then that's going to be a somewhat noticeable hit on
every normal "git diff" etc.


Yes, the .git/index is 450MB with ~3.1M entries.  verify_hdr() is called 
each time

we read it into memory.

We have been testing a patch in GfW to run the verification in a 
separate thread
while the main thread parses (and mallocs) the cache_entries.  This does 
help

offset the time.
https://github.com/git-for-windows/git/pull/978/files


But I'd have expected the stat() calls of all the files listed by that
index to be the _much_ bigger problem in that case. Or do you just
turn those off with assume-unchanged?

Yeah, those stat calls are threaded when preloading, but even so..


Yes, the stat() calls are more significant percentage of the time (and 
having
core.fscache and core.preloadindex help that greatly), but the total 
time for a command
is just that -- the total -- so using the philosophy of "every little 
bit helps", the faster

routines help us here.


Anyway, the file index SHA1 checking could probably just be disabled
entirely (with a config flag). It's a corruption check that simply
isn't that important. So if that's your main SHA1 issue, that would be
easy to fix.


Yes, in the GVFS effort, we disabled the verification with a config 
setting and haven't

had any incidents.



Everything else - like pack-file generation etc for a big clone() may
end up using a ton of SHA1 too, but the SHA1 costs all scale with the
other costs that drown them out (ie zlib, network, etc).

I'd love to see a profile if you have one.

   Linus

Re: [PATCH v1 1/1] git diff --quiet exits with 1 on clean tree with CRLF conversions

2017-03-02 Thread Junio C Hamano

Mike Crowe  writes:

> All the solutions presented so far do cause a small change in behaviour
> when using git diff --quiet: they may now cause warning messages like:
>
>  warning: CRLF will be replaced by LF in crlf.txt.
>  The file will have its original line endings in your working directory.

That is actually a good thing, I think.  As the test modifies a file
that originally has "Hello\r\nWorld\r\n" in it to this:

>> +test_expect_success 'quiet diff works on file with line-ending change that 
>> has no effect on repository' '
>> +printf "Hello\r\nWorld\n" >crlf.txt &&

If you did "git add" at this point, you would get the same warning,
because the lack of CR on the second line could well be a mistake
you may want to notice and fix before going forward.  Otherwise
you'd be losing information that _might_ matter to you (i.e. the
fact that the first line had CRLF while the second had LF) and it is
the whole point of safe_crlf setting.

I also think it is a good thing if "git status" reported this path
as modified for the same reason (I didn't actually check if that is
the case).

Re: [PATCH v1 1/1] git diff --quiet exits with 1 on clean tree with CRLF conversions

2017-03-02 Thread Torsten Bögershausen

On 2017-03-01 22:25, Mike Crowe wrote:
> On Wednesday 01 March 2017 at 18:04:44 +0100, tbo...@web.de wrote:
>> From: Junio C Hamano 
>>
>> git diff --quiet may take a short-cut to see if a file is changed
>> in the working tree:
>> Whenever the file size differs from what is recorded in the index,
>> the file is assumed to be changed and git diff --quiet returns
>> exit with code 1
>>
>> This shortcut must be suppressed whenever the line endings are converted
>> or a filter is in use.
>> The attributes say "* text=auto" and a file has
>> "Hello\nWorld\n" in the index with a length of 12.
>> The file in the working tree has "Hello\r\nWorld\r\n" with a length of 14.
>> (Or even "Hello\r\nWorld\n").
>> In this case "git add" will not do any changes to the index, and
>> "git diff -quiet" should exit 0.
>>
>> Add calls to would_convert_to_git() before blindly saying that a different
>> size means different content.
>>
>> Reported-By: Mike Crowe 
>> Signed-off-by: Torsten Bögershausen 
>> ---
>> This is what I can come up with, collecting all the loose ends.
>> I'm not sure if Mike wan't to have the Reported-By with a
>> Signed-off-by ?
>> The other question is, if the commit message summarizes the discussion
>> well enough ?
>>
>> diff.c| 18 ++
>>  t/t0028-diff-converted.sh | 27 +++
>>  2 files changed, 41 insertions(+), 4 deletions(-)
>>  create mode 100755 t/t0028-diff-converted.sh
>>
>> diff --git a/diff.c b/diff.c
>> index 051761b..c264758 100644
>> --- a/diff.c
>> +++ b/diff.c
>> @@ -4921,9 +4921,10 @@ static int diff_filespec_check_stat_unmatch(struct 
>> diff_filepair *p)
>>   *differences.
>>   *
>>   * 2. At this point, the file is known to be modified,
>> - *with the same mode and size, and the object
>> - *name of one side is unknown.  Need to inspect
>> - *the identical contents.
>> + *with the same mode and size, the object
>> + *name of one side is unknown, or size comparison
>> + *cannot be depended upon.  Need to inspect the
>> + *contents.
>>   */
>>  if (!DIFF_FILE_VALID(p->one) || /* (1) */
>>  !DIFF_FILE_VALID(p->two) ||
>> @@ -4931,7 +4932,16 @@ static int diff_filespec_check_stat_unmatch(struct 
>> diff_filepair *p)
>>  (p->one->mode != p->two->mode) ||
>>  diff_populate_filespec(p->one, CHECK_SIZE_ONLY) ||
>>  diff_populate_filespec(p->two, CHECK_SIZE_ONLY) ||
>> -(p->one->size != p->two->size) ||
>> +
>> +/*
>> + * only if eol and other conversions are not involved,
>> + * we can say that two contents of different sizes
>> + * cannot be the same without checking their contents.
>> + */
>> +(!would_convert_to_git(p->one->path) &&
>> + !would_convert_to_git(p->two->path) &&
>> + (p->one->size != p->two->size)) ||
>> +
>>  !diff_filespec_is_identical(p->one, p->two)) /* (2) */
>>  p->skip_stat_unmatch_result = 1;
>>  return p->skip_stat_unmatch_result;
>> diff --git a/t/t0028-diff-converted.sh b/t/t0028-diff-converted.sh
>> new file mode 100755
>> index 000..3d5ab95
>> --- /dev/null
>> +++ b/t/t0028-diff-converted.sh
>> @@ -0,0 +1,27 @@
>> +#!/bin/sh
>> +#
>> +# Copyright (c) 2017 Mike Crowe
>> +#
>> +# These tests ensure that files changing line endings in the presence
>> +# of .gitattributes to indicate that line endings should be ignored
>> +# don't cause 'git diff' or 'git diff --quiet' to think that they have
>> +# been changed.
>> +
>> +test_description='git diff with files that require CRLF conversion'
>> +
>> +. ./test-lib.sh
>> +
>> +test_expect_success setup '
>> +echo "* text=auto" >.gitattributes &&
>> +printf "Hello\r\nWorld\r\n" >crlf.txt &&
>> +git add .gitattributes crlf.txt &&
>> +git commit -m "initial"
>> +'
>> +
>> +test_expect_success 'quiet diff works on file with line-ending change that 
>> has no effect on repository' '
>> +printf "Hello\r\nWorld\n" >crlf.txt &&
>> +git status &&
>> +git diff --quiet
>> +'
>> +
>> +test_done
> 
> Hi Torsten,
> 
> Thanks for investigating this.
> 
> I think that you've simplified the test to the point where it doesn't
> entirely prove the fix. Although you test the case where the file has
> changed size, you don't test the case where it hasn't.
> 
> Unfortunately that was the part of my test that could only reproduce the
> problem with the sleeps. Maybe someone who understands how the cache works
> fully could explain an alternative way to force the cache not to be used.
> 
> Also, I think I've found a behaviour change with this fix. Consider:
> 
>  echo "* text=auto" >.gitattributes
>  printf "Hello\r\nWorld\r\n" >crlf.txt
That should give
"Hello\nWorld\n" in the index:

git add .gitattributes crlf.txt
warning: CRLF will be replaced by LF in ttt/crlf.txt.
The file will have its original line endings in your working directory.
tb@mac:/tmp/t

Re: [PATCH 3/5] grep: fix bug when recuring with relative pathspec

2017-03-02 Thread Brandon Williams

On 02/28, Junio C Hamano wrote:
> Brandon Williams  writes:
> 
> > /* Add super prefix */
> > +   quote_path_relative(name, opt->prefix, &buf);
> 
> Hmph, do you want a quoted version here, not just relative_path()?
> 
> Perhaps add a test with an "unusual" byte (e.g. a double-quote) in
> the path?

You're absolutely correct.

> 
> > argv_array_pushf(&cp.args, "--super-prefix=%s%s/",
> >  super_prefix ? super_prefix : "",
> > -name);
> > +buf.buf);
> > +   strbuf_release(&buf);
> > argv_array_push(&cp.args, "grep");
> >  
> > /*
> > @@ -1199,7 +1202,8 @@ int cmd_grep(int argc, const char **argv, const char 
> > *prefix)
> >  
> > parse_pathspec(&pathspec, 0,
> >PATHSPEC_PREFER_CWD |
> > -  (opt.max_depth != -1 ? PATHSPEC_MAXDEPTH_VALID : 0),
> > +  (opt.max_depth != -1 ? PATHSPEC_MAXDEPTH_VALID : 0) |
> > +  (super_prefix ? PATHSPEC_FROMROOT : 0),
> >prefix, argv + i);
> > pathspec.max_depth = opt.max_depth;
> > pathspec.recursive = 1;
> > diff --git a/t/t7814-grep-recurse-submodules.sh 
> > b/t/t7814-grep-recurse-submodules.sh
> > index 418ba68fe..e0932b2b7 100755
> > --- a/t/t7814-grep-recurse-submodules.sh
> > +++ b/t/t7814-grep-recurse-submodules.sh
> > @@ -227,7 +227,7 @@ test_expect_success 'grep history with moved submoules' 
> > '
> > test_cmp expect actual
> >  '
> >  
> > -test_expect_failure 'grep using relative path' '
> > +test_expect_success 'grep using relative path' '
> > test_when_finished "rm -rf parent sub" &&
> > git init sub &&
> > echo "foobar" >sub/file &&

-- 
Brandon Williams

Re: [PATCH] mingw: use OpenSSL's SHA-1 routines

2017-03-02 Thread Junio C Hamano

Johannes Sixt  writes:

> Am 24.02.2017 um 22:54 schrieb Junio C Hamano:
>> Johannes Sixt  writes:
>>> I'll use the patch for daily work for a while to see whether it hurts.
>>
>> Please ping this thread again when you have something to add.  For
>> now, I'll demote this patch from 'next' to 'pu' when we rewind and
>> rebuild 'next' post 2.12 release.
>
> ... I've used the patch in
> production for some time now and did not notice any slowdowns.

Thanks.  Dscho obviously thinks this is the right thing for Windows,
and you agree.  That's more than sufficient votes to make me feel
safe ;-)

Thanks.

Re: [PATCH] Put sha1dc on a diet

2017-03-02 Thread Linus Torvalds

On Thu, Mar 2, 2017 at 6:45 AM, Johannes Schindelin
 wrote:
>
> It would probably make sense to switch the index integrity check away from
> SHA-1 because we really only care about detecting bit flips there, and we
> have no need for the computational overhead of using a full-blown
> cryptographic hash for that purpose.

Which index do you actually see as being a problem, btw? The main file
index (.git/index) or the pack-file indexes?

We definitely don't need the checking version of sha1 for either of
those, but as Jeff already did the math, at least the pack-file index
is almost negligible, because the pack-file operations that update it
end up doing SHA1 over the objects - and the object SHA1 calculations
are much bigger.

And I don't think we even check the pack-file index hashes except on fsck.

Now, if your _file_ index is 300-400MB (and I do think we check the
SHA fingerprint on that even on just reading it - verify_hdr() in
do_read_index()), then that's going to be a somewhat noticeable hit on
every normal "git diff" etc.

But I'd have expected the stat() calls of all the files listed by that
index to be the _much_ bigger problem in that case. Or do you just
turn those off with assume-unchanged?

Yeah, those stat calls are threaded when preloading, but even so..

Anyway, the file index SHA1 checking could probably just be disabled
entirely (with a config flag). It's a corruption check that simply
isn't that important. So if that's your main SHA1 issue, that would be
easy to fix.

Everything else - like pack-file generation etc for a big clone() may
end up using a ton of SHA1 too, but the SHA1 costs all scale with the
other costs that drown them out (ie zlib, network, etc).

I'd love to see a profile if you have one.

  Linus

Re: [PATCH] Put sha1dc on a diet

2017-03-02 Thread Johannes Schindelin

Hi Duy,

On Thu, 2 Mar 2017, Duy Nguyen wrote:

> On Thu, Mar 2, 2017 at 6:19 AM, Jeff King  wrote:
> > You have to remember that some of the Git for Windows users are doing
> > horrific things like using repositories with 450MB .git/index files,
> > and the speed to compute the sha1 during an update is noticeable
> > there.
> 
> We probably should separate this use case from the object hashing
> anyway. Here we need a better, more reliable crc32 basically, to detect
> bit flips. Even if we move to SHA-something, we can keep staying with
> SHA-1 here (and with the fastest implementation)

I guess it was convenient to use the same hash algorithm for all hashing
purposes in the beginning. The downside, of course, was that we kept
talking about SHA-1s instead of commit hashes and the index checksum (i.e.
using labels based on implementation details rather than semantically
meaningful names).

In the meantime, we use different hash algorithms where appropriate, of
course, and we typically encapsulate the exact hash algorithm so that it
is easy to switch when/if necessary (think the hash functions for strings
in our hashtables, and the hash functions in xdiff).

It would probably make sense to switch the index integrity check away from
SHA-1 because we really only care about detecting bit flips there, and we
have no need for the computational overhead of using a full-blown
cryptographic hash for that purpose.

Ciao,
Johannes

Re: [PATCH v1] Travis: also test on 32-bit Linux

2017-03-02 Thread Christian Couder

On Thu, Mar 2, 2017 at 3:22 PM, Johannes Schindelin
 wrote:
>
>> >> +set -e
>> >
>> > Is this really necessary? I really like to avoid `set -e`, in
>> > particular when we do pretty much everything in && chains anyway.
>>
>> Agreed, not really necessary here as we just invoke one command.  Out of
>> curiosity: Why do you try to avoid it? I set it by default in all my
>> scripts.
>
> I try to avoid it because it encourages a style that omits helpful error
> messages.

Yeah, we prefer to define and use a die() function like this:

die () {
printf >&2 '%s\n' "$*"
exit 1
}

do_something || die "meaningful error message"

Re: Rebase sequencer changes prevent exec commands from modifying the todo file?

2017-03-02 Thread Johannes Schindelin

Hi Stephen,

On Wed, 1 Mar 2017, Stephen Hicks wrote:

> I have a preferred rebase workflow wherein I generate my own TODO file
> with 'exec' lines that further modify the TODO file.  I recently noticed
> that these changes weren't sticking: they seemed to persist until the
> end of the exec'd command, and then as soon as the next command ran
> (i.e.  GIT_EDITOR=cat git rebase --edit-todo) they were gone.
> 
> I don't understand the changes that have been going through, but I suspect
> this is a result of the sequencer refactoring.  Is there a way to prevent
> git rebase from overwriting these changes?
> 
> To reproduce:
> $ git rebase -i HEAD -x "echo x false >> \"$(git rev-parse
> --git-dir)/rebase-merge/git-rebase-todo\""
> 
> This should cause the rebase to fail (and indeed, it does on 2.11.0.390)
> since it inserts an "x false".  But somewhere between there and
> 2.12.0.rc1.440, this behavior is changed.

Do you also modify the author-script file to execute arbitrary shell
commands? ;-)

Seriously again, it should not be too much of a deal to handle your use
case by re-reading the git-rebase-todo file if it has changed after an
`exec` has modified it. We already force a re-read of the index.

I won't be able to take care of that immediately, though, as I have a
pressing other patch series I need to get done.

If you want to take a crack at it in the meantime, I think this is where I
would start:

https://github.com/git-for-windows/git/blob/ce4c6ca554/sequencer.c#L2020-L2027

I would probably try to make the code smart by looking at the
timestamp/size/inode fields of the stat data of git-rebase-todo, and only
force a re-read in case those fields are different, the `res` variable is
0, and then the code would need to `continue;` in order to skip the
increment of `todo_list->current`.

And before that, I'd turn your example into a test case in
t/t3405-rebase-malformed.sh...

Ciao,
Johannes

git status reports file modified when only line-endings have changed (was git diff --quiet exits with 1 on clean tree with CRLF conversions)

2017-03-02 Thread Mike Crowe

On Tuesday 28 February 2017 at 19:06:44 +0100, Torsten Bögershausen wrote:
> My understanding is that git diff --quiet should be quiet, when
> git add will not do anything (but the file is "touched".
> The touched means that Git will detect e.g a new mtime or inode
> or file size when doing lstat().

Does the same apply to "git status"?

If so, then whilst investigating the "git diff --quiet" problems in this
thread I've found a similar bug with "git status". It reports the file has
modifications even if only the line-endings have changed, and issuing "git
add" causes the perceived modification to disappear.

It can be very confusing for users if "git status" reports a modification
but for "git diff" to claim that the files are identical.

This bug is still reproducible even with the fix from
https://public-inbox.org/git/xmqqshmyhtnu@gitster.mtv.corp.google.com/T/#m67cbfad1f2efe721f0c2afac2a1523b743bb57ca

Here's the test case. Test 3 is the part that currently fails:

commit de5f3f1d9161cdd46342689abe38a046fc71850e
Author: Mike Crowe 
Date:   Sat Feb 25 09:28:37 2017 +

status: Add tests for status output when file line endings change

diff --git a/t/t7518-status-eol-change.sh b/t/t7518-status-eol-change.sh
new file mode 100755
index 000..e18186f
--- /dev/null
+++ b/t/t7518-status-eol-change.sh
@@ -0,0 +1,37 @@
+#!/bin/sh
+#
+# Copyright (c) 2017 Mike Crowe
+#
+
+test_description='git status with files that require CRLF conversion'
+
+. ./test-lib.sh
+
+cat >expected_no_change < .gitattributes &&
+   printf "Hello\r\nWorld\r\n" > crlf.txt &&
+   printf "expected_no_change\nactual\n" > .gitignore &&
+   git add .gitignore .gitattributes crlf.txt &&
+   git commit -m "initial"
+'
+test_expect_success 'git status reports no change if file regenerated' '
+   printf "Hello\r\nWorld\r\n" > crlf.txt &&
+   git status >actual &&
+   test_cmp expected_no_change actual
+'
+test_expect_success 'git status reports no change if line endings change' '
+   printf "Hello\nWorld\n" > crlf.txt &&
+   git status >actual &&
+   test_cmp expected_no_change actual
+'
+test_expect_success 'git status reports no change if line ending change is 
staged' '
+   git add crlf.txt &&
+   git status >actual &&
+   test_cmp expected_no_change actual
+'
+test_done

Re: [PATCH v1] Travis: also test on 32-bit Linux

2017-03-02 Thread Ramsay Jones

On 02/03/17 11:24, Johannes Schindelin wrote:
> Hi Lars,
> 
> On Thu, 2 Mar 2017, Lars Schneider wrote:
> 
[snip]
>> One thing that still bugs me: In the Linux32 environment prove adds the
>> CPU times to every test run: ( 0.02 usr  0.00 sys +  0.00 cusr  0.00
>> csys ...  Has anyone an idea why that happens and how we can disable it?
> 
> I have no idea.

I have no idea either, but it is not unique to this 32bit Linux, but
rather the version of prove. For example, I am seeing this on Linux
Mint 18.1 (64bit _and_ 32bit), whereas Linux Mint 17.x did not do
this. (They used different Ubuntu LTS releases).

[Mint 18.1 'prove --version' says: TAP::Harness v3.35 and Perl v5.22.1]

ATB,
Ramsay Jones

Business proposal

2017-03-02 Thread Qatif Group of Companies>>>>>



Dear Friend,

I would like to discuss a very important issue with you. I am writing to
find out if this is your valid email. Please, let me know if this email is
valid

Kind regards
Adrien Saif
Attorney to Qatif Group of Companies

Kannst du mir antworten

2017-03-02 Thread Lori J Robinson

Re: [PATCH v1 1/1] git diff --quiet exits with 1 on clean tree with CRLF conversions

2017-03-02 Thread Mike Crowe

On Wednesday 01 March 2017 at 13:54:26 -0800, Junio C Hamano wrote:
> Now I thought about it through a bit more thoroughly, I think this
> is the right approach, so here is my (tenative) final version.
> 
> I seem to be getty really rusty---after all the codepaths involved
> are practically all my code and I should have noticed the real
> culprit during my first attempt X-<.
> 
> Thanks for helping.
> 
> -- >8 --
> Subject: [PATCH] diff: do not short-cut CHECK_SIZE_ONLY check in 
> diff_populate_filespec()
> 
> Callers of diff_populate_filespec() can choose to ask only for the
> size of the blob without grabbing the blob data, and the function,
> after running lstat() when the filespec points at a working tree
> file, returns by copying the value in size field of the stat
> structure into the size field of the filespec when this is the case.
> 
> However, this short-cut cannot be taken if the contents from the
> path needs to go through convert_to_git(), whose resulting real blob
> data may be different from what is in the working tree file.
> 
> As "git diff --quiet" compares the .size fields of filespec
> structures to skip content comparison, this bug manifests as a
> false "there are differences" for a file that needs eol conversion,
> for example.
> 
> Reported-by: Mike Crowe 
> Helped-by: Torsten Bögershausen 
> Signed-off-by: Junio C Hamano 
> ---
>  diff.c| 19 ++-
>  t/t0028-diff-converted.sh | 27 +++
>  2 files changed, 45 insertions(+), 1 deletion(-)
>  create mode 100755 t/t0028-diff-converted.sh
> 
> diff --git a/diff.c b/diff.c
> index 8c78fce49d..dc51dceb44 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -2792,8 +2792,25 @@ int diff_populate_filespec(struct diff_filespec *s, 
> unsigned int flags)
>   s->should_free = 1;
>   return 0;
>   }
> - if (size_only)
> +
> + /*
> +  * Even if the caller would be happy with getting
> +  * only the size, we cannot return early at this
> +  * point if the path requires us to run the content
> +  * conversion.
> +  */
> + if (!would_convert_to_git(s->path) && size_only)
>   return 0;
> +
> + /*
> +  * Note: this check uses xsize_t(st.st_size) that may
> +  * not be the true size of the blob after it goes
> +  * through convert_to_git().  This may not strictly be
> +  * correct, but the whole point of big_file_threashold
> +  * and is_binary check being that we want to avoid
> +  * opening the file and inspecting the contents, this
> +  * is probably fine.
> +  */
>   if ((flags & CHECK_BINARY) &&
>   s->size > big_file_threshold && s->is_binary == -1) {
>   s->is_binary = 1;

This patch solves the problem for me. Including my tests where the file
size doesn't change but the file has been touched. It also doesn't have the
side effect of failing to report the extra trailing newline that the
original fix suffered from.

All the solutions presented so far do cause a small change in behaviour
when using git diff --quiet: they may now cause warning messages like:

 warning: CRLF will be replaced by LF in crlf.txt.
 The file will have its original line endings in your working directory.

to be emitted (unless of course core.safecrlf=false.) I think this is an
unavoidable side-effect of doing the job properly but it might be worth
mentioning.

> diff --git a/t/t0028-diff-converted.sh b/t/t0028-diff-converted.sh
> new file mode 100755
> index 00..3d5ab9565b
> --- /dev/null
> +++ b/t/t0028-diff-converted.sh
> @@ -0,0 +1,27 @@
> +#!/bin/sh
> +#
> +# Copyright (c) 2017 Mike Crowe
> +#
> +# These tests ensure that files changing line endings in the presence
> +# of .gitattributes to indicate that line endings should be ignored
> +# don't cause 'git diff' or 'git diff --quiet' to think that they have
> +# been changed.
> +
> +test_description='git diff with files that require CRLF conversion'
> +
> +. ./test-lib.sh
> +
> +test_expect_success setup '
> + echo "* text=auto" >.gitattributes &&
> + printf "Hello\r\nWorld\r\n" >crlf.txt &&
> + git add .gitattributes crlf.txt &&
> + git commit -m "initial"
> +'
> +
> +test_expect_success 'quiet diff works on file with line-ending change that 
> has no effect on repository' '
> + printf "Hello\r\nWorld\n" >crlf.txt &&
> + git status &&
> + git diff --quiet
> +'
> +
> +test_done

As I said before, this doesn't actually test the case when the file sizes
match. However, given the way that the code has changed the actual file
sizes are not compared, so perhaps this doesn't matter.

Thanks for all your help investigating this.

Mike.

Re: [PATCH v1] Travis: also test on 32-bit Linux

2017-03-02 Thread Johannes Schindelin

Hi Lars,


On Thu, 2 Mar 2017, Lars Schneider wrote:

> > On 02 Mar 2017, at 12:24, Johannes Schindelin  
> > wrote:
> > 
> > On Thu, 2 Mar 2017, Lars Schneider wrote:
> > 
> >> The patch looks good to me in general but I want to propose the
> >> following changes:
> > 
> > I know you are using your script to generate this mail, but I would
> > have liked to see v2 in the subject ;-)
> 
> Yeah, sorry. I already had a "D'oh" moment *after* I saw the email in my
> email client. Now I am wondering... is the next version v2 or v3 :D

Since there was no v2, the next one should *definitely* be v2... ;-)

> >> (1) Move all the docker magic into a dedicated file
> >> "ci/run-linux-32-build.sh" This way people should be able to run this
> >> build on their local machines without TravisCI. However, I haven't
> >> tested this.
> > 
> > I considered this, but there is serious overlap: the `docker pull`
> > call and the `docker run` call *have* to refer to the same image. It's
> > very easy for them to get out of sync if you have that information in
> > two files. Maybe make that an option of the script, defaulting to
> > daald/ubuntu32:xenial?
> 
> Right. I missed that. How about something like that?
> 
>   before_install:
> - ci/run-linux32-build.sh --pull-container
>   before_script:
>   script: ci/run-linux32-build.sh

I'd prefer

before_install:
  - docker pull daald/ubuntu32:xenial
before_script:
script: ci/run-linux32-build.sh daald/ubuntu32:xenial

> > BTW speaking of Docker: it would be nicer if there was a Docker image
> > that already had the build-essentials installed, to save on startup
> > time. But I did not find any that was reasonably up-to-date.
> 
> True. But installing everything just takes a minute and we don't need to
> maintain anything...

And when there are network problems (like there were on Tuesday, right
when I developed the first v1 of this patch) then we have another set of
problems that make Travis fail. Even if the code in the PR or branch is
actually good. I'd like to avoid false positives, if possible.

> >> +set -e
> > 
> > Is this really necessary? I really like to avoid `set -e`, in
> > particular when we do pretty much everything in && chains anyway.
> 
> Agreed, not really necessary here as we just invoke one command.  Out of
> curiosity: Why do you try to avoid it? I set it by default in all my
> scripts.

I try to avoid it because it encourages a style that omits helpful error
messages.

> >> +APT_INSTALL="apt update >/dev/null && apt install -y build-essential "\
> >> +"libcurl4-openssl-dev libssl-dev libexpat-dev gettext python >/dev/null"
> >> +
> >> +TEST_GIT_ENV="DEFAULT_TEST_TARGET=$DEFAULT_TEST_TARGET "\
> >> +"GIT_PROVE_OPTS=\"$GIT_PROVE_OPTS\" "\
> >> +"GIT_TEST_OPTS=\"$GIT_TEST_OPTS\" "\
> >> +"GIT_TEST_CLONE_2GB=$GIT_TEST_CLONE_2GB"
> >> +
> >> +TEST_GIT_CMD="linux32 --32bit i386 sh -c '"\
> >> +"'$APT_INSTALL && cd /usr/src/git && $TEST_GIT_ENV make -j2 test'"
> >> +
> >> +sudo docker run \
> >> +--interactive --volume "${PWD}:/usr/src/git" \
> >> +daald/ubuntu32:xenial /bin/bash -c "$TEST_GIT_CMD"
> > 
> > Hmm. Since it is a script now, it would be more readable this way, I
> > think:
> > 
> > sudo docker run --volume "${PWD}:/usr/src/git" 
> > "${1:-daald/ubuntu32:xenial}" \
> > linux32 --32bit i386 sh -c '
> > : update packages first &&
> > apt update >/dev/null &&
> > apt install -y build-essential libcurl4-openssl-dev libssl-dev \
> > libexpat-dev gettext python >/dev/null &&
> > 
> > : now build and test &&
> > cd /usr/src/git &&
> > DEFAULT_TEST_TARGET='"$DEFAULT_TEST_TARGET"' \
> > GIT_PROVE_OPTS='"$GIT_PROVE_OPTS"' \
> > GIT_TEST_OPTS='"$GIT_TEST_OPTS"' \
> > GIT_TEST_CLONE_2GB='"$GIT_TEST_CLONE_2GB"' \
> > make -j2 test
> > '
> 
> That looks better! I'll try it!

Thanks!
Dscho

Re: [PATCH] Put sha1dc on a diet

2017-03-02 Thread Johannes Schindelin

Hi Linus,

On Wed, 1 Mar 2017, Linus Torvalds wrote:

> On Wed, Mar 1, 2017 at 2:51 PM, Johannes Schindelin
>  wrote:
> >
> > But I think bigger than just developers on Windows OS. There are many
> > developers out there working on large repositories (yes, much larger
> > than Linux). Also using Macs and Linux. I am not at all sure that we
> > want to give them an updated Git they cannot fail to notice to be much
> > slower than before.
> 
> Johannes, have you *tried* the patches?
> 
> I really don't think you have. It is completely unnoticeable in any
> normal situation. The two cases that it's noticeable is:
> 
>  - a full fsck is noticeable slower
> 
>  - a full non-local clone is slower (but not really noticeably so
> since the network traffic dominates).
> 
> In other words, I think you're making shit up. I don't think you
> understand how little the SHA1 performance actually matters. It's
> noticeable in benchmarks. It's not noticeable in any normal operation.
> 
> .. and yes, I've actually been running the patches locally since I
> posted my first version (which apparently didn't go out to the list
> because of list size limits) and now running the version in 'pu'.

If you think that the Linux repository is a big one, then your reaction is
understandable.

I have zero interest in potty language, therefore my reply is very terse:
yes, I have been looking ad SHA-1 performance, and yes, it matters. Think
an index file of 300-400MB.

Ciao,
Johannes

Re: [PATCH v1] Travis: also test on 32-bit Linux

2017-03-02 Thread Lars Schneider


> On 02 Mar 2017, at 12:24, Johannes Schindelin  
> wrote:
> 
> Hi Lars,
> 
> On Thu, 2 Mar 2017, Lars Schneider wrote:
> 
>> The patch looks good to me in general but I want to propose the following
>> changes:
> 
> I know you are using your script to generate this mail, but I would have
> liked to see v2 in the subject ;-)

Yeah, sorry. I already had a "D'oh" moment *after* I saw the email in 
my email client. Now I am wondering... is the next version v2 or v3 :D


>> (1) Move all the docker magic into a dedicated file
>> "ci/run-linux-32-build.sh" This way people should be able to run this
>> build on their local machines without TravisCI. However, I haven't
>> tested this.
> 
> I considered this, but there is serious overlap: the `docker pull` call
> and the `docker run` call *have* to refer to the same image. It's very
> easy for them to get out of sync if you have that information in two
> files. Maybe make that an option of the script, defaulting to
> daald/ubuntu32:xenial?

Right. I missed that. How about something like that?

  before_install:
- ci/run-linux32-build.sh --pull-container
  before_script:
  script: ci/run-linux32-build.sh


> BTW speaking of Docker: it would be nicer if there was a Docker image that
> already had the build-essentials installed, to save on startup time. But I
> did not find any that was reasonably up-to-date.

True. But installing everything just takes a minute and we don't need to
maintain anything...


>> +set -e
> 
> Is this really necessary? I really like to avoid `set -e`, in particular
> when we do pretty much everything in && chains anyway.

Agreed, not really necessary here as we just invoke one command.
Out of curiosity: Why do you try to avoid it? I set it by default in all 
my scripts.


>> +APT_INSTALL="apt update >/dev/null && apt install -y build-essential "\
>> +"libcurl4-openssl-dev libssl-dev libexpat-dev gettext python >/dev/null"
>> +
>> +TEST_GIT_ENV="DEFAULT_TEST_TARGET=$DEFAULT_TEST_TARGET "\
>> +"GIT_PROVE_OPTS=\"$GIT_PROVE_OPTS\" "\
>> +"GIT_TEST_OPTS=\"$GIT_TEST_OPTS\" "\
>> +"GIT_TEST_CLONE_2GB=$GIT_TEST_CLONE_2GB"
>> +
>> +TEST_GIT_CMD="linux32 --32bit i386 sh -c '"\
>> +"'$APT_INSTALL && cd /usr/src/git && $TEST_GIT_ENV make -j2 test'"
>> +
>> +sudo docker run \
>> +--interactive --volume "${PWD}:/usr/src/git" \
>> +daald/ubuntu32:xenial /bin/bash -c "$TEST_GIT_CMD"
> 
> Hmm. Since it is a script now, it would be more readable this way, I
> think:
> 
> sudo docker run --volume "${PWD}:/usr/src/git" "${1:-daald/ubuntu32:xenial}" \
> linux32 --32bit i386 sh -c '
>   : update packages first &&
>   apt update >/dev/null &&
>   apt install -y build-essential libcurl4-openssl-dev libssl-dev \
>   libexpat-dev gettext python >/dev/null &&
> 
>   : now build and test &&
>   cd /usr/src/git &&
>   DEFAULT_TEST_TARGET='"$DEFAULT_TEST_TARGET"' \
>   GIT_PROVE_OPTS='"$GIT_PROVE_OPTS"' \
>   GIT_TEST_OPTS='"$GIT_TEST_OPTS"' \
>   GIT_TEST_CLONE_2GB='"$GIT_TEST_CLONE_2GB"' \
>   make -j2 test
> '

That looks better! I'll try it!

- Lars

Re: [PATCH] Put sha1dc on a diet

2017-03-02 Thread Johannes Schindelin

Hi Peff,

On Wed, 1 Mar 2017, Jeff King wrote:

> I do think that could argue for turning on the collision detection only
> during object-write operations, which is where it matters. It would be
> really trivial to flip the "check collisions" bit on sha1dc. But I
> suspect you could go faster still by compiling against two separate
> implementations: the fast-as-possible one (which could be openssl or
> blk-sha1), and the slower-but-careful sha1dc.

Given the speed difference between OpenSSL and sha1dc, it would be a wise
thing indeed to do sha1dc only where objects enter from possibly untrusted
sources, and use OpenSSL for all other hashing.

Ciao,
Johannes

Re: BUG Report: v12.0.0 "git difftool -d" fails with "fatal: cannot create directory at '': No such file or directory"

2017-03-02 Thread Johannes Schindelin

Hi Mark,

On Thu, 2 Mar 2017, Mark Phillips wrote:

> I am building git from source using gcc 4.8.5 on 64 bit linux.
> 
> I am sorry the log info is not very helpful, please tell me how to get
> more information about what is going wrong and I will collect the info
> for you!

Well, you could provide a Minimal, Complete and Verifiable Example
(http://stackoverflow.com/help/mcve): most likely, the error depends on
particular features of your current HEAD, or environment variables.
Because i do *not* get that error if I run `git difftool -d HEAD~`.

So it is not so much about collecting info as about distilling your use
case into an example that allows other developers to reproduce the
problem, and subsequently fix it.

Ciao,
Johannes

Re: [PATCH] README: create HTTP/HTTPS links from URLs in Markdown

2017-03-02 Thread Jeff King

On Wed, Mar 01, 2017 at 10:22:04PM +, Eric Wong wrote:

> Markdown supports automatic links by surrounding URLs with
> angle brackets, as documented in
> 

One of the joys of markdown is that there are so many variants. A lot of
them (including GitHub-flavored markdown) will linkify URLs even when
they're not inside angle brackets.

So I don't mind this patch, but I'm curious what's rendering the
markdown you're seeing. I'd think online that one would either come
across the raw text, or the GFM from https://github.com/git/git.

-Peff

Re: [PATCH v5 07/24] files-backend: add and use files_refname_path()

2017-03-02 Thread Duy Nguyen

On Wed, Mar 1, 2017 at 12:41 AM, Michael Haggerty  wrote:
> On 02/22/2017 03:04 PM, Nguyễn Thái Ngọc Duy wrote:
>> Keep repo-related path handling in one place. This will make it easier
>> to add submodule/multiworktree support later.
>>
>> This automatically adds the "if submodule then use the submodule version
>> of git_path" to other call sites too. But it does not mean those
>> operations are sumodule-ready. Not yet.
>
> s/sumodule/submodule/
>
>> Signed-off-by: Nguyễn Thái Ngọc Duy 
>> ---
>>  refs/files-backend.c | 45 +
>>  1 file changed, 25 insertions(+), 20 deletions(-)
>>
>> diff --git a/refs/files-backend.c b/refs/files-backend.c
>> index 7b4ea4c56..72f4e1746 100644
>> --- a/refs/files-backend.c
>> +++ b/refs/files-backend.c
>> @@ -1180,6 +1180,18 @@ static void files_reflog_path(struct files_ref_store 
>> *refs,
>>   strbuf_git_path(sb, "logs/%s", refname);
>>  }
>>
>> +static void files_refname_path(struct files_ref_store *refs,
>> +struct strbuf *sb,
>> +const char *refname)
>> +{
>> + if (refs->submodule) {
>> + strbuf_git_path_submodule(sb, refs->submodule, "%s", refname);
>> + return;
>> + }
>> +
>> + strbuf_git_path(sb, "%s", refname);
>> +}
>
> Maybe it's just me, but I find it odd to exit early here when the first
> exit isn't due to an error. For me, structuring this like `if ()
> call1(); else call2();` would make it clearer that the two code paths
> are equally-valid alternatives, and either one or the other will be
> executed.

Its original version probably looked better. This is another case of
future patches influencing back the past ones: I structure the patch
so that in future we mostly add lines, or delete whole (in this case,
I believe), not modify a lot of lines. I think the readability does
not degrade too much though, so it's probably ok.
-- 
Duy

Re: [PATCH v5 08/24] files-backend: remove the use of git_path()

2017-03-02 Thread Duy Nguyen

On Wed, Mar 1, 2017 at 12:50 AM, Michael Haggerty  wrote:
>> @@ -995,7 +998,11 @@ static struct ref_store *files_ref_store_create(const 
>> char *submodule)
>>   return ref_store;
>>   }
>>
>> - refs->packed_refs_path = git_pathdup("packed-refs");
>> + refs->gitdir = xstrdup(gitdir);
>> + get_common_dir_noenv(&sb, gitdir);
>> + refs->gitcommondir = strbuf_detach(&sb, NULL);
>> + strbuf_addf(&sb, "%s/packed-refs", refs->gitcommondir);
>> + refs->packed_refs_path = strbuf_detach(&sb, NULL);
>
> `git_path()` and friends avoid adding an extra `/` if `git_dir()`
> already ends in a slash or if it is the empty string. Here you don't
> have that functionality. Is that intentional?

Kind of. I noticed that behavior but the thinking was, this $GIT_DIR
thing is going to get replaced soon anyway, and because the the uses
of these paths are very clear (in three groups) that avoiding
redundant slashes does not buy us anything extra, it's just more code.
-- 
Duy

Re: [PATCH v5 05/24] files-backend: move "logs/" out of TMP_RENAMED_LOG

2017-03-02 Thread Duy Nguyen

On Wed, Mar 1, 2017 at 12:19 AM, Michael Haggerty  wrote:
>> @@ -2513,7 +2513,7 @@ static int files_delete_refs(struct ref_store 
>> *ref_store,
>>   * IOW, to avoid cross device rename errors, the temporary renamed log must
>>   * live into logs/refs.
>>   */
>> -#define TMP_RENAMED_LOG  "logs/refs/.tmp-renamed-log"
>> +#define TMP_RENAMED_LOG  "refs/.tmp-renamed-log"
>
> The constant name feels a little bit misleading now that it is not the
> name of a logfile but rather a reference name. OTOH "tmp-renamed-log" is
> *in* the reference name so I guess it's not really wrong.

Heh.. I had a similar internal debate and almost renamed it to
tmp_renamed_refname. But then it's technically not a valid ref name
either (starting with a leading dot). My lazy side came in and
declared that doing nothing was always the right way.

>>  struct rename_cb {
>>   const char *tmp_renamed_log;
>> @@ -2549,7 +2549,7 @@ static int rename_tmp_log(const char *newrefname)
>>   int ret;
>>
>>   strbuf_git_path(&path, "logs/%s", newrefname);
>> - strbuf_git_path(&tmp, TMP_RENAMED_LOG);
>> + strbuf_git_path(&tmp, "logs/%s", TMP_RENAMED_LOG);
>>   cb.tmp_renamed_log = tmp.buf;
>>   ret = raceproof_create_file(path.buf, rename_tmp_log_callback, &cb);
>>   if (ret) {
>> @@ -2626,12 +2626,12 @@ static int files_rename_ref(struct ref_store 
>> *ref_store,
>>   return 1;
>>
>>   strbuf_git_path(&sb_oldref, "logs/%s", oldrefname);
>> - strbuf_git_path(&tmp_renamed_log, TMP_RENAMED_LOG);
>> + strbuf_git_path(&tmp_renamed_log, "logs/%s", TMP_RENAMED_LOG);
>>   ret = log && rename(sb_oldref.buf, tmp_renamed_log.buf);
>>   strbuf_release(&sb_oldref);
>>   strbuf_release(&tmp_renamed_log);
>>   if (ret)
>> - return error("unable to move logfile logs/%s to 
>> "TMP_RENAMED_LOG": %s",
>> + return error("unable to move logfile logs/%s to 
>> logs/"TMP_RENAMED_LOG": %s",
>>   oldrefname, strerror(errno));
>
> It seems like it would be preferable to use `sb_oldref.buf` and
> `tmp.buf` when building the error message. But I guess that `tmp.buf`
> might include some path preceding "logs/" that is unwanted in the error
> message? But it's a shame to hardcode the file naming scheme here again.
>
> Maybe we *do* want the path in the error message?

It's an error, every piece of details matters. So yeah I'm inclined we
should print full path.

> It just occurred to me: this temporary logfile lives in the main
> repository, right? What if a worktree reference is being renamed? Part
> of the advertised use of worktrees is that the worktree might live far
> from the main directory, or even on removable media. But it's not
> possible to rename files across partitions. Maybe this will come out in
> the wash once worktrees are ref_stores themselves.

The actual working directory may be separated, but all the things that
belong to .git (even of a linked worktree) stay in the main worktree's
.git directory. And I don't think we ever support having a .git
directory on multiple partitions. You can rename refs freely even when
the worktree is on a detached removable drive.

> For that matter, what if a user tries to rename a worktree ref into a
> common ref or vice versa?

Interesting. It should work, it's just a
rename(".git/worktrees/blah/refs/bisect/good",
".git/refs/heads/saved") after the path translation done by
git_path().
-- 
Duy

Re: [PATCH v5 24/24] t1406: new tests for submodule ref store

2017-03-02 Thread Duy Nguyen

On Thu, Mar 2, 2017 at 3:16 PM, Michael Haggerty  wrote:
> The for-each-ref iteration is defined to be sorted by refname, but it's
> true that the for-each-reflog order is undefined.

Thanks. I'm going to add a few lines about this near for_each_ref and
for_each_reflog() then, assuming that this is the desired behavior,
not a bug. (Another patch, because I'm not happy until it gets 30+)
-- 
Duy

Re: [PATCH v5 04/24] files-backend: convert git_path() to strbuf_git_path()

2017-03-02 Thread Duy Nguyen

On Wed, Mar 1, 2017 at 12:06 AM, Michael Haggerty  wrote:
>> @@ -2681,13 +2709,19 @@ static int files_rename_ref(struct ref_store 
>> *ref_store,
>>   log_all_ref_updates = flag;
>>
>>   rollbacklog:
>> - if (logmoved && rename(git_path("logs/%s", newrefname), 
>> git_path("logs/%s", oldrefname)))
>> + strbuf_git_path(&sb_newref, "logs/%s", newrefname);
>> + strbuf_git_path(&sb_oldref, "logs/%s", oldrefname);
>> + if (logmoved && rename(sb_newref.buf, sb_oldref.buf))
>>   error("unable to restore logfile %s from %s: %s",
>>   oldrefname, newrefname, strerror(errno));
>> + strbuf_git_path(&tmp_renamed_log, TMP_RENAMED_LOG);
>>   if (!logmoved && log &&
>> - rename(git_path(TMP_RENAMED_LOG), git_path("logs/%s", oldrefname)))
>> + rename(tmp_renamed_log.buf, sb_oldref.buf))
>>   error("unable to restore logfile %s from "TMP_RENAMED_LOG": 
>> %s",
>>   oldrefname, strerror(errno));
>
> It feels like you're writing, releasing, re-writing these strbufs more
> than necessary. Maybe it would be clearer to set them once, and on
> errors set `ret = error()` then jump to a label here...
>
>> + strbuf_release(&sb_newref);
>> + strbuf_release(&sb_oldref);
>> + strbuf_release(&tmp_renamed_log);
>>
>
> ...and change this to `return ret`?

If that's ok with you, will do. I kept the patch dumb so the reader
does not have to keep track of many things when they check if there
are any behavior changes.

>>  static int files_init_db(struct ref_store *ref_store, struct strbuf *err)
>>  {
>> + struct strbuf sb = STRBUF_INIT;
>> +
>>   /* Check validity (but we don't need the result): */
>>   files_downcast(ref_store, 0, "init_db");
>>
>>   /*
>>* Create .git/refs/{heads,tags}
>>*/
>> - safe_create_dir(git_path("refs/heads"), 1);
>> - safe_create_dir(git_path("refs/tags"), 1);
>> + strbuf_git_path(&sb, "refs/heads");
>> + safe_create_dir(sb.buf, 1);
>> + strbuf_reset(&sb);
>> + strbuf_git_path(&sb, "refs/tags");
>> + safe_create_dir(sb.buf, 1);
>> + strbuf_reset(&sb);
>>   if (get_shared_repository()) {
>> - adjust_shared_perm(git_path("refs/heads"));
>> - adjust_shared_perm(git_path("refs/tags"));
>> + strbuf_git_path(&sb, "refs/heads");
>> + adjust_shared_perm(sb.buf);
>> + strbuf_reset(&sb);
>> + strbuf_git_path(&sb, "refs/tags");
>> + adjust_shared_perm(sb.buf);
>>   }
>> + strbuf_release(&sb);
>>   return 0;
>>  }
>
> It looks to me like `safe_create_dir()` already has the ability to
> `adjust_shared_perm()`, or am I missing something? (I realize that this
> is preexisting code.)

Yeah you're right. I guess this code was written when
safe_create_dir() didn't do that. That certainly shortens this patch.
-- 
Duy

Re: [PATCH v1] Travis: also test on 32-bit Linux

2017-03-02 Thread Johannes Schindelin

Hi Lars,

On Thu, 2 Mar 2017, Lars Schneider wrote:

> The patch looks good to me in general but I want to propose the following
> changes:

I know you are using your script to generate this mail, but I would have
liked to see v2 in the subject ;-)

> (1) Move all the docker magic into a dedicated file
> "ci/run-linux-32-build.sh" This way people should be able to run this
> build on their local machines without TravisCI. However, I haven't
> tested this.

I considered this, but there is serious overlap: the `docker pull` call
and the `docker run` call *have* to refer to the same image. It's very
easy for them to get out of sync if you have that information in two
files. Maybe make that an option of the script, defaulting to
daald/ubuntu32:xenial?

BTW speaking of Docker: it would be nicer if there was a Docker image that
already had the build-essentials installed, to save on startup time. But I
did not find any that was reasonably up-to-date.

> (2) The docker build command inherits the Git test environment
> variables.  This way we use the same environment variables as in all
> other TravisCI builds (plus it would use *your* variables if you run it
> locally).

Good!

> (3) Silence the apt update/git output as is it clutters the log.  I did
> not silence stderr output!

Also good!

> (4) Remove (to my knowledge) superfluous "compiler: clang" in the
> Linux32 job.

I copied that from one of your experimental .travis.yml patches ;-)

> I added my sign-off. I hope this is the right thing to do in this "I
> took your patch and changed it to suggest an improvement" situation.

Absolutely. Thank you for taking it from here.

> One thing that still bugs me: In the Linux32 environment prove adds the
> CPU times to every test run: ( 0.02 usr  0.00 sys +  0.00 cusr  0.00
> csys ...  Has anyone an idea why that happens and how we can disable it?

I have no idea.

> diff --git a/ci/run-linux32-build.sh b/ci/run-linux32-build.sh
> new file mode 100755
> index 00..b892fbdc9e
> --- /dev/null
> +++ b/ci/run-linux32-build.sh
> @@ -0,0 +1,21 @@
> +#!/bin/sh
> +#
> +# Build and test Git in a docker container running a 32-bit Ubuntu Linux
> +#
> +
> +set -e

Is this really necessary? I really like to avoid `set -e`, in particular
when we do pretty much everything in && chains anyway.

> +APT_INSTALL="apt update >/dev/null && apt install -y build-essential "\
> +"libcurl4-openssl-dev libssl-dev libexpat-dev gettext python >/dev/null"
> +
> +TEST_GIT_ENV="DEFAULT_TEST_TARGET=$DEFAULT_TEST_TARGET "\
> +"GIT_PROVE_OPTS=\"$GIT_PROVE_OPTS\" "\
> +"GIT_TEST_OPTS=\"$GIT_TEST_OPTS\" "\
> +"GIT_TEST_CLONE_2GB=$GIT_TEST_CLONE_2GB"
> +
> +TEST_GIT_CMD="linux32 --32bit i386 sh -c '"\
> +"'$APT_INSTALL && cd /usr/src/git && $TEST_GIT_ENV make -j2 test'"
> +
> +sudo docker run \
> +--interactive --volume "${PWD}:/usr/src/git" \
> +daald/ubuntu32:xenial /bin/bash -c "$TEST_GIT_CMD"

Hmm. Since it is a script now, it would be more readable this way, I
think:

sudo docker run --volume "${PWD}:/usr/src/git" "${1:-daald/ubuntu32:xenial}" \
linux32 --32bit i386 sh -c '
: update packages first &&
apt update >/dev/null &&
apt install -y build-essential libcurl4-openssl-dev libssl-dev \
libexpat-dev gettext python >/dev/null &&

: now build and test &&
cd /usr/src/git &&
DEFAULT_TEST_TARGET='"$DEFAULT_TEST_TARGET"' \
GIT_PROVE_OPTS='"$GIT_PROVE_OPTS"' \
GIT_TEST_OPTS='"$GIT_TEST_OPTS"' \
GIT_TEST_CLONE_2GB='"$GIT_TEST_CLONE_2GB"' \
make -j2 test
'

This is completely untested (pun intended ;-))...

Ciao,
Dscho

Re: git status --> Out of memory, realloc failed

2017-03-02 Thread Duy Nguyen

On Thu, Mar 2, 2017 at 3:12 AM, Carsten Fuchs  wrote:
>>> The repository is tracking about 19000 files which together take 260 MB.
>>> The git server version is 2.7.4.1.g5468f9e (Bitbucket)
>>
>>
>> Is your repository publicly accessible?
>
>
> Unfortunately, no. There are no big secrets in there, but just a couple of
> database details so that I cannot make it universally available. I can
> gladly give you access though. (E.g. by adding your public SSH key?)

You could also try "git fast-export --anonymize" (read the doc first).
I'm not sure if it keeps blob size though, and delta chain might also
matter so an anonymized repo may or may not share the same problem,
I'm not sure.
-- 
Duy

BUG Report: v12.0.0 "git difftool -d" fails with "fatal: cannot create directory at '': No such file or directory"

2017-03-02 Thread Mark Phillips

I am building git from source using gcc 4.8.5 on 64 bit linux.

I am sorry the log info is not very helpful, please tell me how to get more 
information about what is going wrong and I will collect the info for you!


V2.12.0 Broken log
==

Export GIT_TRACE=1
git difftool -d HEAD~ 
09:11:27.797674 git.c:371   trace: built-in: git 'difftool' '-d' 
'HEAD~'
09:11:27.798255 run-command.c:369   trace: run_command: 'diff' '--raw' 
'--no-abbrev' '-z' 'HEAD~'
09:11:27.798484 exec_cmd.c:118  trace: exec: 'git' 'diff' '--raw' 
'--no-abbrev' '-z' 'HEAD~'
09:11:27.800156 git.c:371   trace: built-in: git 'diff' '--raw' 
'--no-abbrev' '-z' 'HEAD~'
fatal: cannot create directory at '': No such file or directory


Running make check gives
fixed   0
success 14475
failed  0
broken  184
total   14804


Also broken in v2.12.0-rc0, v2.12.0-rc1 and v2.12.0-rc2
-

V 2.11.1 works
===

[PATCH v1] Travis: also test on 32-bit Linux

2017-03-02 Thread Lars Schneider

From: Johannes Schindelin 

When Git v2.9.1 was released, it had a bug that showed only on Windows
and on 32-bit systems: our assumption that `unsigned long` can hold
64-bit values turned out to be wrong.

This could have been caught earlier if we had a Continuous Testing
set up that includes a build and test run on 32-bit Linux.

Let's do this (and take care of the Windows build later). This patch
asks Travis CI to install a Docker image with 32-bit libraries and then
goes on to build and test Git using this 32-bit setup.

Signed-off-by: Johannes Schindelin 
Signed-off-by: Lars Schneider 
---

Thanks for the patch Dscho!

The patch looks good to me in general but I want to propose the following
changes:

(1) Move all the docker magic into a dedicated file "ci/run-linux-32-build.sh"
This way people should be able to run this build on their local machines
without TravisCI. However, I haven't tested this.

(2) The docker build command inherits the Git test environment variables.
This way we use the same environment variables as in all other TravisCI
builds (plus it would use *your* variables if you run it locally).

(3) Silence the apt update/git output as is it clutters the log.
I did not silence stderr output!

(4) Remove (to my knowledge) superfluous "compiler: clang" in the Linux32 job.

I added my sign-off. I hope this is the right thing to do in this "I took your
patch and changed it to suggest an improvement" situation.

You can see a successful run here:
https://travis-ci.org/larsxschneider/git/jobs/206945950


One thing that still bugs me: In the Linux32 environment prove adds the
CPU times to every test run: ( 0.02 usr  0.00 sys +  0.00 cusr  0.00 csys ...
Has anyone an idea why that happens and how we can disable it?


Cheers,
Lars


Notes:
Base Ref:
Web-Diff: https://github.com/larsxschneider/git/commit/82995ed59c
Checkout: git fetch https://github.com/larsxschneider/git 
travisci/linux32-v1 && git checkout 82995ed59c

 .travis.yml |  9 +
 ci/run-linux32-build.sh | 21 +
 2 files changed, 30 insertions(+)
 create mode 100755 ci/run-linux32-build.sh

diff --git a/.travis.yml b/.travis.yml
index 9c63c8c3f6..c8c789c437 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -39,6 +39,15 @@ env:

 matrix:
   include:
+- env: Linux32
+  os: linux
+  sudo: required
+  services:
+- docker
+  before_install:
+- docker pull daald/ubuntu32:xenial
+  before_script:
+  script: ci/run-linux32-build.sh
 - env: Documentation
   os: linux
   compiler: clang
diff --git a/ci/run-linux32-build.sh b/ci/run-linux32-build.sh
new file mode 100755
index 00..b892fbdc9e
--- /dev/null
+++ b/ci/run-linux32-build.sh
@@ -0,0 +1,21 @@
+#!/bin/sh
+#
+# Build and test Git in a docker container running a 32-bit Ubuntu Linux
+#
+
+set -e
+
+APT_INSTALL="apt update >/dev/null && apt install -y build-essential "\
+"libcurl4-openssl-dev libssl-dev libexpat-dev gettext python >/dev/null"
+
+TEST_GIT_ENV="DEFAULT_TEST_TARGET=$DEFAULT_TEST_TARGET "\
+"GIT_PROVE_OPTS=\"$GIT_PROVE_OPTS\" "\
+"GIT_TEST_OPTS=\"$GIT_TEST_OPTS\" "\
+"GIT_TEST_CLONE_2GB=$GIT_TEST_CLONE_2GB"
+
+TEST_GIT_CMD="linux32 --32bit i386 sh -c "\
+"'$APT_INSTALL && cd /usr/src/git && $TEST_GIT_ENV make -j2 test'"
+
+sudo docker run \
+--interactive --volume "${PWD}:/usr/src/git" \
+daald/ubuntu32:xenial /bin/bash -c "$TEST_GIT_CMD"

base-commit: 3bc53220cb2dcf709f7a027a3f526befd021d858
--
2.11.1

[PATCH] add--interactive: fix missing file prompt for patch mode with "-i"

2017-03-02 Thread Jeff King

When invoked as "git add -i", each menu interactive menu
option prompts the user to select a list of files. This
includes the "patch" option, which gets the list before
starting the hunk-selection loop.

As "git add -p", it behaves differently, and jumps straight
to the hunk selection loop.

Since 0539d5e6d (i18n: add--interactive: mark patch prompt
for translation, 2016-12-14), the "add -i" case mistakenly
jumps to straight to the hunk-selection loop. Prior to that
commit the distinction between the two cases was managed by
the $patch_mode variable. That commit used $patch_mode for
something else, and moved the old meaning to the "$cmd"
variable.  But it forgot to update the $patch_mode check
inside patch_update_cmd() which controls the file-list
behavior.

The simplest fix would be to change that line to check $cmd.
But while we're here, let's use a less obscure name for this
flag: $patch_mode_only, a boolean which tells whether we are
in full-interactive mode or only in patch-mode.

Reported-by: Henrik Grubbström 
Signed-off-by: Jeff King 
---
 git-add--interactive.perl  |  8 
 t/t3701-add-interactive.sh | 18 ++
 2 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/git-add--interactive.perl b/git-add--interactive.perl
index 982593c89..f5c816e27 100755
--- a/git-add--interactive.perl
+++ b/git-add--interactive.perl
@@ -92,7 +92,7 @@ sub colored {
 }
 
 # command line options
-my $cmd;
+my $patch_mode_only;
 my $patch_mode;
 my $patch_mode_revision;
 
@@ -1299,7 +1299,7 @@ sub patch_update_cmd {
}
return 0;
}
-   if ($patch_mode) {
+   if ($patch_mode_only) {
@them = @mods;
}
else {
@@ -1721,7 +1721,7 @@ sub process_args {
die sprintf(__("invalid argument %s, expecting --"),
   $arg) unless $arg eq "--";
%patch_mode_flavour = %{$patch_modes{$patch_mode}};
-   $cmd = 1;
+   $patch_mode_only = 1;
}
elsif ($arg ne "--") {
die sprintf(__("invalid argument %s, expecting --"), $arg);
@@ -1758,7 +1758,7 @@ sub main_loop {
 
 process_args();
 refresh();
-if ($cmd) {
+if ($patch_mode_only) {
patch_update_cmd();
 }
 else {
diff --git a/t/t3701-add-interactive.sh b/t/t3701-add-interactive.sh
index 5ffe78e92..aaa258daa 100755
--- a/t/t3701-add-interactive.sh
+++ b/t/t3701-add-interactive.sh
@@ -394,4 +394,22 @@ test_expect_success 'diffs can be colorized' '
grep "$(printf "\\033")" output
 '
 
+test_expect_success 'patch-mode via -i prompts for files' '
+   git reset --hard &&
+
+   echo one >file &&
+   echo two >test &&
+   git add -i <<-\EOF &&
+   patch
+   test
+
+   y
+   quit
+   EOF
+
+   echo test >expect &&
+   git diff --cached --name-only >actual &&
+   test_cmp expect actual
+'
+
 test_done
-- 
2.12.0.367.gb23790f66

[PATCH v2 8/8] checkout: restrict @-expansions when finding branch

2017-03-02 Thread Jeff King

When we parse "git checkout $NAME", we try to interpret
$NAME as a local branch-name. If it is, then we point HEAD
to that branch. Otherwise, we detach the HEAD at whatever
commit $NAME points to.

We do the interpretation by calling strbuf_branchname(), and
then blindly sticking "refs/heads/" on the front. This leads
to nonsense results when expansions like "@{upstream}" or
"@" point to something besides a local branch. We end up
with a local branch name like "refs/heads/origin/master" or
"refs/heads/HEAD".

Normally this has no user-visible effect because those
branches don't exist, and so we fallback to feeding the
result to get_sha1(), which resolves them correctly.

But as the new test in t3204 shows, there are corner cases
where the effect is observable, and we check out the wrong
local branch rather than detaching to the correct one.

Signed-off-by: Jeff King 
---
 builtin/checkout.c|  2 +-
 t/t3204-branch-name-interpretation.sh | 10 ++
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index 05eefd994..81f07c3ef 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -452,7 +452,7 @@ static void setup_branch_path(struct branch_info *branch)
 {
struct strbuf buf = STRBUF_INIT;
 
-   strbuf_branchname(&buf, branch->name, 0);
+   strbuf_branchname(&buf, branch->name, INTERPRET_BRANCH_LOCAL);
if (strcmp(buf.buf, branch->name))
branch->name = xstrdup(buf.buf);
strbuf_splice(&buf, 0, 0, "refs/heads/", 11);
diff --git a/t/t3204-branch-name-interpretation.sh 
b/t/t3204-branch-name-interpretation.sh
index 05e88f92d..698d9cc4f 100755
--- a/t/t3204-branch-name-interpretation.sh
+++ b/t/t3204-branch-name-interpretation.sh
@@ -120,4 +120,14 @@ test_expect_success 'delete branch named "@"' '
expect_deleted refs/heads/@
 '
 
+test_expect_success 'checkout does not treat remote @{upstream} as a branch' '
+   git update-ref refs/remotes/origin/checkout one &&
+   git branch --set-upstream-to=origin/checkout &&
+   git update-ref refs/heads/origin/checkout two &&
+   git update-ref refs/heads/remotes/origin/checkout two &&
+
+   git checkout @{upstream} &&
+   expect_branch HEAD one
+'
+
 test_done
-- 
2.12.0.367.gb23790f66

Re: [PATCH v1 1/1] git diff --quiet exits with 1 on clean tree with CRLF conversions

2017-03-02 Thread Jeff King

On Wed, Mar 01, 2017 at 01:54:26PM -0800, Junio C Hamano wrote:

> -- >8 --
> Subject: [PATCH] diff: do not short-cut CHECK_SIZE_ONLY check in 
> diff_populate_filespec()

Thanks, this is well-explained, and the new comments in the code really
help.

I wondered if we should be checking would_convert_to_git() in
reuse_worktree_file(), but we already do. It's just that we may still
end up in this code-path when we're _actually_ diffing the working tree
file, not just trying to optimize.

> diff --git a/diff.c b/diff.c
> index 8c78fce49d..dc51dceb44 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -2792,8 +2792,25 @@ int diff_populate_filespec(struct diff_filespec *s, 
> unsigned int flags)
>   s->should_free = 1;
>   return 0;
>   }
> - if (size_only)
> +
> + /*
> +  * Even if the caller would be happy with getting
> +  * only the size, we cannot return early at this
> +  * point if the path requires us to run the content
> +  * conversion.
> +  */
> + if (!would_convert_to_git(s->path) && size_only)
>   return 0;

The would_convert_to_git() function is a little expensive (it may have
to do an attribute lookup). It may be worth swapping the two halves of
the conditional here to get the short-circuit.

It may not matter much in practice, though, because in the !size_only
case we'd make the same query lower a few lines later (and in theory
expensive bits of the attr lookup are cached).

> +
> + /*
> +  * Note: this check uses xsize_t(st.st_size) that may
> +  * not be the true size of the blob after it goes
> +  * through convert_to_git().  This may not strictly be
> +  * correct, but the whole point of big_file_threashold

s/threashold/threshold/

> +  * and is_binary check being that we want to avoid
> +  * opening the file and inspecting the contents, this
> +  * is probably fine.
> +  */
>   if ((flags & CHECK_BINARY) &&
>   s->size > big_file_threshold && s->is_binary == -1) {
>   s->is_binary = 1;

I'm trying to think how this "not strictly correct" could bite us. For
line-ending conversion, I'd say that the before/after are going to be
approximately the same size. But what about something like LFS? If I
have a 600MB file that convert_to_git() filters into a short LFS
pointer, I think this changes the behavior. Before, we would diff the
pointer file, but now we'll get "binary file changed".

I wonder if we should take the opposite approach, and ignore
big_file_threshold for converted files. One assumes that such gigantic
files are binary, and therefore do not have line endings to convert. And
any filtering has a reasonable chance of condensing them to something
much smaller.

I dunno. I'm sure somebody has some horrific 500MB-filtering example
that can prove me wrong.

-Peff

Re: [PATCH v2] fixing corner-cases with interpret_branch_name()

2017-03-02 Thread Jacob Keller

On Thu, Mar 2, 2017 at 12:21 AM, Jeff King  wrote:
> This is a re-roll of the series from:
>
>   
> http://public-inbox.org/git/20170228120633.zkwfqms57fk7d...@sigill.intra.peff.net/
>
> Thanks Junio and Jake for reviewing the original. This is mostly the
> same, but:
>
>   - it fixes the case where "branch -r @{-1}" mistakes a local branch
> for a remote (and adds a test)
>
>   - as a result of the above fix, the series needs to be applied on top
> of jk/auto-namelen-in-interpret-branch-name.
>
>   - I clarified the history in the commit message of patch 4
>
>   - the commit message for patch 4 now explicitly mentions which
> callers can be left alone (so anybody blaming the history won't
> think they were simply forgotten).
>
> With the exception of patch 6 flipping the failure/success bit on the
> new test, the rest of the patches should be identical.
>

I didn't find any comments, it was quite pleasant and well explained.

Thanks,
Jake

>   [1/8]: interpret_branch_name: move docstring to header file
>   [2/8]: strbuf_branchname: drop return value
>   [3/8]: strbuf_branchname: add docstring
>   [4/8]: interpret_branch_name: allow callers to restrict expansions
>   [5/8]: t3204: test git-branch @-expansion corner cases
>   [6/8]: branch: restrict @-expansions when deleting
>   [7/8]: strbuf_check_ref_format(): expand only local branches
>   [8/8]: checkout: restrict @-expansions when finding branch
>
>  builtin/branch.c  |   5 +-
>  builtin/checkout.c|   2 +-
>  builtin/merge.c   |   2 +-
>  cache.h   |  32 +++-
>  refs.c|   2 +-
>  revision.c|   2 +-
>  sha1_name.c   |  92 ---
>  strbuf.h  |  21 +-
>  t/t3204-branch-name-interpretation.sh | 133 
> ++
>  9 files changed, 240 insertions(+), 51 deletions(-)
>  create mode 100755 t/t3204-branch-name-interpretation.sh
>
> -Peff

Re: [PATCH] README: create HTTP/HTTPS links from URLs in Markdown

2017-03-02 Thread Eric Wong

Jeff King  wrote:
> On Thu, Mar 02, 2017 at 07:34:21AM +, Eric Wong wrote:
> > Jeff King  wrote:
> > > On Wed, Mar 01, 2017 at 10:22:04PM +, Eric Wong wrote:
> > > 
> > > > Markdown supports automatic links by surrounding URLs with
> > > > angle brackets, as documented in
> > > > 
> > > 
> > > One of the joys of markdown is that there are so many variants. A lot of
> > > them (including GitHub-flavored markdown) will linkify URLs even when
> > > they're not inside angle brackets.
> > > 
> > > So I don't mind this patch, but I'm curious what's rendering the
> > > markdown you're seeing. I'd think online that one would either come
> > > across the raw text, or the GFM from https://github.com/git/git.
> > 
> > I was using Gruber's reference implementation from Debian stable
> > (1.0.1-7).
> 
> OK. I guess my question more was "why are you doing that?". I'd expect
> people to find the GFM rendering on GitHub, or just look at the text via
> "less".

Actually, I was initially seeing the lack of trailing slash
causing unnecessary 301 redirects on public-inbox.org.

So, I added the trailing slash and ran "markdown  But it's not really my business why you would want to do it. :) It's
> reasonable for us to cater to the common subset of renderers.

:)

I figure somebody unfamiliar with Markdown editing README.md is
likely to run the original implementation locally to check their
work, as I did.


(hmm... and vger is unusually slow this week)

[PATCH v2 7/8] strbuf_check_ref_format(): expand only local branches

2017-03-02 Thread Jeff King

This function asks strbuf_branchname() to expand any @-marks
in the branchname, and then we blindly stick refs/heads/ in
front of the result. This is obviously nonsense if the
expansion is "HEAD" or a ref in refs/remotes/.

The most obvious end-user effect is that creating or
renaming a branch with an expansion may have confusing
results (e.g., creating refs/heads/origin/master from
"@{upstream}" when the operation should be disallowed).

We can fix this by telling strbuf_branchname() that we are
only interested in local expansions. Any unexpanded bits are
then fed to check_ref_format(), which either disallows them
(in the case of "@{upstream}") or lets them through
("refs/heads/@" is technically valid, if a bit silly).

Signed-off-by: Jeff King 
---
 sha1_name.c   | 2 +-
 t/t3204-branch-name-interpretation.sh | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/sha1_name.c b/sha1_name.c
index 7f754b60c..26ceec1d7 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -1319,7 +1319,7 @@ void strbuf_branchname(struct strbuf *sb, const char 
*name, unsigned allowed)
 
 int strbuf_check_branch_ref(struct strbuf *sb, const char *name)
 {
-   strbuf_branchname(sb, name, 0);
+   strbuf_branchname(sb, name, INTERPRET_BRANCH_LOCAL);
if (name[0] == '-')
return -1;
strbuf_splice(sb, 0, 0, "refs/heads/", 11);
diff --git a/t/t3204-branch-name-interpretation.sh 
b/t/t3204-branch-name-interpretation.sh
index 4f4af1fb4..05e88f92d 100755
--- a/t/t3204-branch-name-interpretation.sh
+++ b/t/t3204-branch-name-interpretation.sh
@@ -42,7 +42,7 @@ test_expect_success 'update branch via local @{upstream}' '
expect_branch local two
 '
 
-test_expect_failure 'disallow updating branch via remote @{upstream}' '
+test_expect_success 'disallow updating branch via remote @{upstream}' '
git update-ref refs/remotes/origin/remote one &&
git branch --set-upstream-to=origin/remote &&
 
@@ -109,7 +109,7 @@ test_expect_success 'disallow deleting remote branch via 
@{-1}' '
 # and not refs/heads/HEAD. These tests should not imply that refs/heads/@ is a
 # sane thing, but it _is_ technically allowed for now. If we disallow it, these
 # can be switched to test_must_fail.
-test_expect_failure 'create branch named "@"' '
+test_expect_success 'create branch named "@"' '
git branch -f @ one &&
expect_branch refs/heads/@ one
 '
-- 
2.12.0.367.gb23790f66

[PATCH v2 4/8] interpret_branch_name: allow callers to restrict expansions

2017-03-02 Thread Jeff King

The interpret_branch_name() function converts names like
@{-1} and @{upstream} into branch names. The expanded ref
names are not fully qualified, and may be outside of the
refs/heads/ namespace (e.g., "@" expands to "HEAD", and
"@{upstream}" is likely to be in "refs/remotes/").

This is OK for callers like dwim_ref() which are primarily
interested in resolving the resulting name, no matter where
it is. But callers like "git branch" treat the result as a
branch name in refs/heads/.  When we expand to a ref outside
that namespace, the results are very confusing (e.g., "git
branch @" tries to create refs/heads/HEAD, which is
nonsense).

Callers can't know from the returned string how the
expansion happened (e.g., did the user really ask for a
branch named "HEAD", or did we do a bogus expansion?). One
fix would be to return some out-parameters describing the
types of expansion that occurred. This has the benefit that
the caller can generate precise error messages ("I
understood @{upstream} to mean origin/master, but that is a
remote tracking branch, so you cannot create it as a local
name").

However, out-parameters make the function interface somewhat
cumbersome. Instead, let's do the opposite: let the caller
tell us which elements to expand. That's easier to pass in,
and none of the callers give more precise error messages
than "@{upstream} isn't a valid branch name" anyway (which
should be sufficient).

The strbuf_branchname() function needs a similar parameter,
as most of the callers access interpret_branch_name()
through it.

We can break the callers down into two groups:

  1. Callers that are happy with any kind of ref in the
 result. We pass "0" here, so they continue to work
 without restrictions. This includes merge_name(),
 the reflog handling in add_pending_object_with_path(),
 and substitute_branch_name(). This last is what powers
 dwim_ref().

  2. Callers that have funny corner cases (mostly in
 git-branch and git-checkout). These need to make use of
 the new parameter, but I've left them as "0" in this
 patch, and will address them individually in follow-on
 patches.

Signed-off-by: Jeff King 
---
 builtin/branch.c   |  2 +-
 builtin/checkout.c |  2 +-
 builtin/merge.c|  2 +-
 cache.h| 13 +--
 refs.c |  2 +-
 revision.c |  2 +-
 sha1_name.c| 68 ++
 strbuf.h   |  6 -
 8 files changed, 69 insertions(+), 28 deletions(-)

diff --git a/builtin/branch.c b/builtin/branch.c
index 94f7de7fa..cf0ece55d 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -216,7 +216,7 @@ static int delete_branches(int argc, const char **argv, int 
force, int kinds,
char *target = NULL;
int flags = 0;
 
-   strbuf_branchname(&bname, argv[i]);
+   strbuf_branchname(&bname, argv[i], 0);
free(name);
name = mkpathdup(fmt, bname.buf);
 
diff --git a/builtin/checkout.c b/builtin/checkout.c
index f174f5030..05eefd994 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -452,7 +452,7 @@ static void setup_branch_path(struct branch_info *branch)
 {
struct strbuf buf = STRBUF_INIT;
 
-   strbuf_branchname(&buf, branch->name);
+   strbuf_branchname(&buf, branch->name, 0);
if (strcmp(buf.buf, branch->name))
branch->name = xstrdup(buf.buf);
strbuf_splice(&buf, 0, 0, "refs/heads/", 11);
diff --git a/builtin/merge.c b/builtin/merge.c
index a96d4fb50..848a29855 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -438,7 +438,7 @@ static void merge_name(const char *remote, struct strbuf 
*msg)
char *found_ref;
int len, early;
 
-   strbuf_branchname(&bname, remote);
+   strbuf_branchname(&bname, remote, 0);
remote = bname.buf;
 
memset(branch_head, 0, sizeof(branch_head));
diff --git a/cache.h b/cache.h
index c67995caa..a8816c914 100644
--- a/cache.h
+++ b/cache.h
@@ -1383,8 +1383,17 @@ extern char *oid_to_hex(const struct object_id *oid);
/* same static buffer as s
  *
  * If the input was ok but there are not N branch switches in the
  * reflog, it returns 0.
- */
-extern int interpret_branch_name(const char *str, int len, struct strbuf *);
+ *
+ * If "allowed" is non-zero, it is a treated as a bitfield of allowable
+ * expansions: local branches ("refs/heads/"), remote branches
+ * ("refs/remotes/"), or "HEAD". If no "allowed" bits are set, any expansion is
+ * allowed, even ones to refs outside of those namespaces.
+ */
+#define INTERPRET_BRANCH_LOCAL (1<<0)
+#define INTERPRET_BRANCH_REMOTE (1<<1)
+#define INTERPRET_BRANCH_HEAD (1<<2)
+extern int interpret_branch_name(const char *str, int len, struct strbuf *,
+unsigned allowed);
 extern int get_oid_mb(const char *str, struct object_id *oid);
 
 extern int validate_headref(const char *ref);
diff --git a/refs.c b/refs.

[PATCH v2 5/8] t3204: test git-branch @-expansion corner cases

2017-03-02 Thread Jeff King

git-branch feeds the branch names from the command line to
strbuf_branchname(), but we do not yet tell that function
which kinds of expansions should be allowed. Let's create a
set of tests that cover both the allowed and disallowed
cases.

That shows off some breakages where we currently create or
delete the wrong ref (and will make sure that we do not
break any cases that _should_ be working when we do add more
restrictions).

Note that we check branch creation and deletion, but do not
bother with renames. Those follow the same code path as
creation.

Signed-off-by: Jeff King 
---
 t/t3204-branch-name-interpretation.sh | 123 ++
 1 file changed, 123 insertions(+)
 create mode 100755 t/t3204-branch-name-interpretation.sh

diff --git a/t/t3204-branch-name-interpretation.sh 
b/t/t3204-branch-name-interpretation.sh
new file mode 100755
index 0..e671a7a64
--- /dev/null
+++ b/t/t3204-branch-name-interpretation.sh
@@ -0,0 +1,123 @@
+#!/bin/sh
+
+test_description='interpreting exotic branch name arguments
+
+Branch name arguments are usually names which are taken to be inside of
+refs/heads/, but we interpret some magic syntax like @{-1}, @{upstream}, etc.
+This script aims to check the behavior of those corner cases.
+'
+. ./test-lib.sh
+
+expect_branch() {
+   git log -1 --format=%s "$1" >actual &&
+   echo "$2" >expect &&
+   test_cmp expect actual
+}
+
+expect_deleted() {
+   test_must_fail git rev-parse --verify "$1"
+}
+
+test_expect_success 'set up repo' '
+   test_commit one &&
+   test_commit two &&
+   git remote add origin foo.git
+'
+
+test_expect_success 'update branch via @{-1}' '
+   git branch previous one &&
+
+   git checkout previous &&
+   git checkout master &&
+
+   git branch -f @{-1} two &&
+   expect_branch previous two
+'
+
+test_expect_success 'update branch via local @{upstream}' '
+   git branch local one &&
+   git branch --set-upstream-to=local &&
+
+   git branch -f @{upstream} two &&
+   expect_branch local two
+'
+
+test_expect_failure 'disallow updating branch via remote @{upstream}' '
+   git update-ref refs/remotes/origin/remote one &&
+   git branch --set-upstream-to=origin/remote &&
+
+   test_must_fail git branch -f @{upstream} two
+'
+
+test_expect_success 'create branch with pseudo-qualified name' '
+   git branch refs/heads/qualified two &&
+   expect_branch refs/heads/refs/heads/qualified two
+'
+
+test_expect_success 'delete branch via @{-1}' '
+   git branch previous-del &&
+
+   git checkout previous-del &&
+   git checkout master &&
+
+   git branch -D @{-1} &&
+   expect_deleted previous-del
+'
+
+test_expect_success 'delete branch via local @{upstream}' '
+   git branch local-del &&
+   git branch --set-upstream-to=local-del &&
+
+   git branch -D @{upstream} &&
+   expect_deleted local-del
+'
+
+test_expect_success 'delete branch via remote @{upstream}' '
+   git update-ref refs/remotes/origin/remote-del two &&
+   git branch --set-upstream-to=origin/remote-del &&
+
+   git branch -r -D @{upstream} &&
+   expect_deleted origin/remote-del
+'
+
+# Note that we create two oddly named local branches here. We want to make
+# sure that we do not accidentally delete either of them, even if
+# shorten_unambiguous_ref() tweaks the name to avoid ambiguity.
+test_expect_failure 'delete @{upstream} expansion matches -r option' '
+   git update-ref refs/remotes/origin/remote-del two &&
+   git branch --set-upstream-to=origin/remote-del &&
+   git update-ref refs/heads/origin/remote-del two &&
+   git update-ref refs/heads/remotes/origin/remote-del two &&
+
+   test_must_fail git branch -D @{upstream} &&
+   expect_branch refs/heads/origin/remote-del two &&
+   expect_branch refs/heads/remotes/origin/remote-del two
+'
+
+test_expect_failure 'disallow deleting remote branch via @{-1}' '
+   git update-ref refs/remotes/origin/previous one &&
+
+   git checkout -b origin/previous two &&
+   git checkout master &&
+
+   test_must_fail git branch -r -D @{-1} &&
+   expect_branch refs/remotes/origin/previous one &&
+   expect_branch refs/heads/origin/previous two
+'
+
+# The thing we are testing here is that "@" is the real branch refs/heads/@,
+# and not refs/heads/HEAD. These tests should not imply that refs/heads/@ is a
+# sane thing, but it _is_ technically allowed for now. If we disallow it, these
+# can be switched to test_must_fail.
+test_expect_failure 'create branch named "@"' '
+   git branch -f @ one &&
+   expect_branch refs/heads/@ one
+'
+
+test_expect_failure 'delete branch named "@"' '
+   git update-ref refs/heads/@ two &&
+   git branch -D @ &&
+   expect_deleted refs/heads/@
+'
+
+test_done
-- 
2.12.0.367.gb23790f66

[PATCH v2 6/8] branch: restrict @-expansions when deleting

2017-03-02 Thread Jeff King

We use strbuf_branchname() to expand the branch name from
the command line, so you can delete the branch given by
@{-1}, for example.  However, we allow other nonsense like
"@", and we do not respect our "-r" flag (so we may end up
deleting an oddly-named local ref instead of a remote one).

We can fix this by passing the appropriate "allowed" flag to
strbuf_branchname().

Signed-off-by: Jeff King 
---
 builtin/branch.c  | 5 -
 t/t3204-branch-name-interpretation.sh | 6 +++---
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/builtin/branch.c b/builtin/branch.c
index cf0ece55d..291fe90de 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -191,17 +191,20 @@ static int delete_branches(int argc, const char **argv, 
int force, int kinds,
int ret = 0;
int remote_branch = 0;
struct strbuf bname = STRBUF_INIT;
+   unsigned allowed_interpret;
 
switch (kinds) {
case FILTER_REFS_REMOTES:
fmt = "refs/remotes/%s";
/* For subsequent UI messages */
remote_branch = 1;
+   allowed_interpret = INTERPRET_BRANCH_REMOTE;
 
force = 1;
break;
case FILTER_REFS_BRANCHES:
fmt = "refs/heads/%s";
+   allowed_interpret = INTERPRET_BRANCH_LOCAL;
break;
default:
die(_("cannot use -a with -d"));
@@ -216,7 +219,7 @@ static int delete_branches(int argc, const char **argv, int 
force, int kinds,
char *target = NULL;
int flags = 0;
 
-   strbuf_branchname(&bname, argv[i], 0);
+   strbuf_branchname(&bname, argv[i], allowed_interpret);
free(name);
name = mkpathdup(fmt, bname.buf);
 
diff --git a/t/t3204-branch-name-interpretation.sh 
b/t/t3204-branch-name-interpretation.sh
index e671a7a64..4f4af1fb4 100755
--- a/t/t3204-branch-name-interpretation.sh
+++ b/t/t3204-branch-name-interpretation.sh
@@ -83,7 +83,7 @@ test_expect_success 'delete branch via remote @{upstream}' '
 # Note that we create two oddly named local branches here. We want to make
 # sure that we do not accidentally delete either of them, even if
 # shorten_unambiguous_ref() tweaks the name to avoid ambiguity.
-test_expect_failure 'delete @{upstream} expansion matches -r option' '
+test_expect_success 'delete @{upstream} expansion matches -r option' '
git update-ref refs/remotes/origin/remote-del two &&
git branch --set-upstream-to=origin/remote-del &&
git update-ref refs/heads/origin/remote-del two &&
@@ -94,7 +94,7 @@ test_expect_failure 'delete @{upstream} expansion matches -r 
option' '
expect_branch refs/heads/remotes/origin/remote-del two
 '
 
-test_expect_failure 'disallow deleting remote branch via @{-1}' '
+test_expect_success 'disallow deleting remote branch via @{-1}' '
git update-ref refs/remotes/origin/previous one &&
 
git checkout -b origin/previous two &&
@@ -114,7 +114,7 @@ test_expect_failure 'create branch named "@"' '
expect_branch refs/heads/@ one
 '
 
-test_expect_failure 'delete branch named "@"' '
+test_expect_success 'delete branch named "@"' '
git update-ref refs/heads/@ two &&
git branch -D @ &&
expect_deleted refs/heads/@
-- 
2.12.0.367.gb23790f66

[PATCH v2 2/8] strbuf_branchname: drop return value

2017-03-02 Thread Jeff King

The return value from strbuf_branchname() is confusing and
useless: it's 0 if the whole name was consumed by an @-mark,
but otherwise is the length of the original name we fed.

No callers actually look at the return value, so let's just
get rid of it.

Signed-off-by: Jeff King 
---
 sha1_name.c | 5 +
 strbuf.h| 2 +-
 2 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/sha1_name.c b/sha1_name.c
index 28865b3a1..4c1e91184 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -1279,17 +1279,14 @@ int interpret_branch_name(const char *name, int 
namelen, struct strbuf *buf)
return -1;
 }
 
-int strbuf_branchname(struct strbuf *sb, const char *name)
+void strbuf_branchname(struct strbuf *sb, const char *name)
 {
int len = strlen(name);
int used = interpret_branch_name(name, len, sb);
 
-   if (used == len)
-   return 0;
if (used < 0)
used = 0;
strbuf_add(sb, name + used, len - used);
-   return len;
 }
 
 int strbuf_check_branch_ref(struct strbuf *sb, const char *name)
diff --git a/strbuf.h b/strbuf.h
index cf1b5409e..47df0500d 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -560,7 +560,7 @@ static inline void strbuf_complete_line(struct strbuf *sb)
strbuf_complete(sb, '\n');
 }
 
-extern int strbuf_branchname(struct strbuf *sb, const char *name);
+extern void strbuf_branchname(struct strbuf *sb, const char *name);
 extern int strbuf_check_branch_ref(struct strbuf *sb, const char *name);
 
 extern void strbuf_addstr_urlencode(struct strbuf *, const char *,
-- 
2.12.0.367.gb23790f66

[PATCH v2 3/8] strbuf_branchname: add docstring

2017-03-02 Thread Jeff King

This function and its companion, strbuf_check_branch_ref(),
did not have their purpose or semantics explained. Let's do
so.

Signed-off-by: Jeff King 
---
 strbuf.h | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/strbuf.h b/strbuf.h
index 47df0500d..6b51b2604 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -560,7 +560,22 @@ static inline void strbuf_complete_line(struct strbuf *sb)
strbuf_complete(sb, '\n');
 }
 
+/*
+ * Copy "name" to "sb", expanding any special @-marks as handled by
+ * interpret_branch_name(). The result is a non-qualified branch name
+ * (so "foo" or "origin/master" instead of "refs/heads/foo" or
+ * "refs/remotes/origin/master").
+ *
+ * Note that the resulting name may not be a syntactically valid refname.
+ */
 extern void strbuf_branchname(struct strbuf *sb, const char *name);
+
+/*
+ * Like strbuf_branchname() above, but confirm that the result is
+ * syntactically valid to be used as a local branch name in refs/heads/.
+ *
+ * The return value is "0" if the result is valid, and "-1" otherwise.
+ */
 extern int strbuf_check_branch_ref(struct strbuf *sb, const char *name);
 
 extern void strbuf_addstr_urlencode(struct strbuf *, const char *,
-- 
2.12.0.367.gb23790f66

[PATCH v2 1/8] interpret_branch_name: move docstring to header file

2017-03-02 Thread Jeff King

We generally put docstrings with function declarations,
because it's the callers who need to know how the function
works. Let's do so for interpret_branch_name().

Signed-off-by: Jeff King 
---
 cache.h | 21 +
 sha1_name.c | 21 -
 2 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/cache.h b/cache.h
index 80b6372cf..c67995caa 100644
--- a/cache.h
+++ b/cache.h
@@ -1363,6 +1363,27 @@ extern char *oid_to_hex_r(char *out, const struct 
object_id *oid);
 extern char *sha1_to_hex(const unsigned char *sha1);   /* static buffer 
result! */
 extern char *oid_to_hex(const struct object_id *oid);  /* same static buffer 
as sha1_to_hex */
 
+/*
+ * This reads short-hand syntax that not only evaluates to a commit
+ * object name, but also can act as if the end user spelled the name
+ * of the branch from the command line.
+ *
+ * - "@{-N}" finds the name of the Nth previous branch we were on, and
+ *   places the name of the branch in the given buf and returns the
+ *   number of characters parsed if successful.
+ *
+ * - "@{upstream}" finds the name of the other ref that
+ *is configured to merge with (missing  defaults
+ *   to the current branch), and places the name of the branch in the
+ *   given buf and returns the number of characters parsed if
+ *   successful.
+ *
+ * If the input is not of the accepted format, it returns a negative
+ * number to signal an error.
+ *
+ * If the input was ok but there are not N branch switches in the
+ * reflog, it returns 0.
+ */
 extern int interpret_branch_name(const char *str, int len, struct strbuf *);
 extern int get_oid_mb(const char *str, struct object_id *oid);
 
diff --git a/sha1_name.c b/sha1_name.c
index 9b5d14b4b..28865b3a1 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -1238,27 +1238,6 @@ static int interpret_branch_mark(const char *name, int 
namelen,
return len + at;
 }
 
-/*
- * This reads short-hand syntax that not only evaluates to a commit
- * object name, but also can act as if the end user spelled the name
- * of the branch from the command line.
- *
- * - "@{-N}" finds the name of the Nth previous branch we were on, and
- *   places the name of the branch in the given buf and returns the
- *   number of characters parsed if successful.
- *
- * - "@{upstream}" finds the name of the other ref that
- *is configured to merge with (missing  defaults
- *   to the current branch), and places the name of the branch in the
- *   given buf and returns the number of characters parsed if
- *   successful.
- *
- * If the input is not of the accepted format, it returns a negative
- * number to signal an error.
- *
- * If the input was ok but there are not N branch switches in the
- * reflog, it returns 0.
- */
 int interpret_branch_name(const char *name, int namelen, struct strbuf *buf)
 {
char *at;
-- 
2.12.0.367.gb23790f66

[PATCH v2] fixing corner-cases with interpret_branch_name()

2017-03-02 Thread Jeff King

This is a re-roll of the series from:

  
http://public-inbox.org/git/20170228120633.zkwfqms57fk7d...@sigill.intra.peff.net/

Thanks Junio and Jake for reviewing the original. This is mostly the
same, but:

  - it fixes the case where "branch -r @{-1}" mistakes a local branch
for a remote (and adds a test)

  - as a result of the above fix, the series needs to be applied on top
of jk/auto-namelen-in-interpret-branch-name.

  - I clarified the history in the commit message of patch 4

  - the commit message for patch 4 now explicitly mentions which
callers can be left alone (so anybody blaming the history won't
think they were simply forgotten).

With the exception of patch 6 flipping the failure/success bit on the
new test, the rest of the patches should be identical.

  [1/8]: interpret_branch_name: move docstring to header file
  [2/8]: strbuf_branchname: drop return value
  [3/8]: strbuf_branchname: add docstring
  [4/8]: interpret_branch_name: allow callers to restrict expansions
  [5/8]: t3204: test git-branch @-expansion corner cases
  [6/8]: branch: restrict @-expansions when deleting
  [7/8]: strbuf_check_ref_format(): expand only local branches
  [8/8]: checkout: restrict @-expansions when finding branch

 builtin/branch.c  |   5 +-
 builtin/checkout.c|   2 +-
 builtin/merge.c   |   2 +-
 cache.h   |  32 +++-
 refs.c|   2 +-
 revision.c|   2 +-
 sha1_name.c   |  92 ---
 strbuf.h  |  21 +-
 t/t3204-branch-name-interpretation.sh | 133 ++
 9 files changed, 240 insertions(+), 51 deletions(-)
 create mode 100755 t/t3204-branch-name-interpretation.sh

-Peff

1 2 >

1 - 100 of 102 matches

Mail list logo