Re: [RFC] git checkout $tree -- $path always rewrites files

2014-11-07 Thread Jeff King
On Fri, Nov 07, 2014 at 05:13:47PM +0700, Duy Nguyen wrote:

  By the way, one other thing I wondered while looking at this code: when
  we checkout a working tree file, we unlink the old one and write the new
  one in-place. Is there a particular reason we do this versus writing to
  a temporary file and renaming it into place?  That would give
  simultaneous readers a more atomic view.
 
  I suspect the answer is something like: "you cannot always do a rename,
  because you might have a typechange, a directory becoming a file, or vice
  versa; so anyone relying on an atomic view during a checkout operation
  is already Doing It Wrong. Handling a content-change of an existing
  path would complicate the code, so we do not bother."
 
 Not a confirmation, but it looks like Linus did it just to make sure
 he had new permissions right, in e447947 (Be much more liberal about
 the file mode bits. - 2005-04-16).

Thanks for digging that up. I think that only gives us half the story,
though. That explains why we would unlink/open instead of relying on
just open(O_TRUNC). But I think opening a new tempfile would work the
same as the current code in that respect.
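The tempfile-and-rename approach mentioned above can be sketched as follows. This is a minimal illustration of the general technique, not git's actual code; the helper name is invented:

```python
import os
import tempfile

def atomic_write(path, data):
    # Write to a temporary file in the same directory, then rename it
    # into place. rename() within one filesystem is atomic on POSIX,
    # so a simultaneous reader sees either the old file or the new
    # one, never a half-written mix.
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise
```

As the message notes, this still cannot help with typechanges (a path turning from a directory into a file or vice versa), which is part of why checkout does not bother.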

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] git checkout $tree -- $path always rewrites files

2014-11-07 Thread Jeff King
On Fri, Nov 07, 2014 at 09:14:42AM -0800, Junio C Hamano wrote:

 Jeff King p...@peff.net writes:
 
  Is there a reason that we don't use this diff technique for checkout?
 
 I suspect that the reasons are probably mixture of these:
 
  (1) the code path may descend from checkout-index and predates the
  in-core diff machinery;
 
  (2) in the context of checkout-index, it was more desirable to be
  able to say "I want the contents on the filesystem, even if you
  think I already have it there, as you, as new software, are
  likely to be wrong and I know better"; or
 
  (3) it was easier to code that way ;-)
 
 I do not see there is any reason not to do what you suggest.

OK. It's not very much code (and can mostly be lifted from git-reset),
so I may take a stab at it.

-Peff


Re: [PATCH 2/5] trailer: display a trailer without its trailing newline

2014-11-07 Thread Jeff King
On Fri, Nov 07, 2014 at 07:50:49PM +0100, Christian Couder wrote:

 diff --git a/trailer.c b/trailer.c
 index 761b763..f4d51ba 100644
 --- a/trailer.c
 +++ b/trailer.c
 @@ -583,8 +583,12 @@ static int parse_trailer(struct strbuf *tok, struct strbuf *val, const char *trailer)
  	strbuf_addch(&seps, '=');
  	len = strcspn(trailer, seps.buf);
  	strbuf_release(&seps);
 -	if (len == 0)
 -		return error(_("empty trailer token in trailer '%s'"), trailer);
 +	if (len == 0) {
 +		struct strbuf sb = STRBUF_INIT;
 +		strbuf_addstr(&sb, trailer);
 +		strbuf_rtrim(&sb);
 +		return error(_("empty trailer token in trailer '%s'"), sb.buf);
 +	}

Doesn't this leak sb.buf?

-Peff


Re: [RFC] git checkout $tree -- $path always rewrites files

2014-11-08 Thread Jeff King
On Fri, Nov 07, 2014 at 11:35:59PM -0800, Junio C Hamano wrote:

 I think that has direct linkage; what you have in mind I think is
 http://thread.gmane.org/gmane.comp.version-control.git/234903/focus=234935

Thanks for that link.

I did spend a few hours on this topic earlier today, and got very
confused trying to figure out what the deletion behavior _should_ be,
and whether I was breaking it.  For some reason I had zero recollection
of a conversation from last year that I was obviously a major part of. I
think I am getting old. :)

The end of that thread concludes that a diff-based approach is not going
to work, because we need to update the working tree even for files not
mentioned by the diff. I do not think that is a show-stopper, though.
It just means that we need to load the new index as one step (done now
with read_tree_recursive, but ideally using diff), and then walk over
the whole resulting index applying our pathspec again (instead of
relying on CE_UPDATE flags).

This turns out not to be a big deal, because the existing code is
already doing most of that second pathspec application anyway. It does
it because read_tree_recursive is not smart enough to update the seen
bits for the pathspec. But now we would have another reason to do it
this way. :)
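The two-phase plan described above (overlay the tree onto the index, then walk the whole resulting index re-applying the pathspec) can be modeled with a toy sketch. Everything here — the dict-based index, the function name — is invented for illustration and is not git's data model:

```python
def checkout_paths(index, tree, pathspec_match):
    # Phase 1: overlay the target tree's entries onto the index
    # (the read_tree_recursive analogue).
    merged = dict(index)
    merged.update(tree)

    # Phase 2: walk the whole resulting index, re-applying the
    # pathspec (rather than relying on per-entry update flags).
    result = {}
    for path, content in merged.items():
        if not pathspec_match(path):
            # Untouched by this checkout: keep whatever the index had.
            if path in index:
                result[path] = index[path]
        elif path in tree:
            # Matched and present in the tree: take the tree's version.
            result[path] = tree[path]
        # Matched but absent from the tree: drop it -- this is the
        # deletion behavior under discussion.
    return result
```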

So just to be clear, the behavior we want is that:

  echo foo >some-new-path
  git add some-new-path
  git checkout HEAD -- .

will delete some-new-path (whereas the current code turns it into an
untracked file). What should:

  git checkout HEAD -- some-new-path

do in that case? With the current code, it actually barfs, complaining
that nothing matched some-new-path (because it is not part of HEAD, and
therefore we don't consider it at all), and aborts the whole operation.
I think we would want to delete some-new-path in that case, too.

-Peff


Re: [RFC] git checkout $tree -- $path always rewrites files

2014-11-08 Thread Jeff King
On Sat, Nov 08, 2014 at 03:30:40AM -0500, Jeff King wrote:

 So just to be clear, the behavior we want is that:
 
    echo foo >some-new-path
   git add some-new-path
   git checkout HEAD -- .
 
 will delete some-new-path (whereas the current code turns it into an
 untracked file). What should:
 
   git checkout HEAD -- some-new-path
 
 do in that case? With the current code, it actually barfs, complaining
 that nothing matched some-new-path (because it is not part of HEAD, and
 therefore we don't consider it at all), and aborts the whole operation.
 I think we would want to delete some-new-path in that case, too.

Also, t2022.3 has me very confused.

It is explicitly checking that if we have subdir/foo unmerged in the
index, and we git checkout $tree -- subdir, and $tree does not mention
foo, that we _leave_ foo in place.

That seems very counter-intuitive to me. If you asked to make subdir
look like $tree, then we should clobber it. That change comes from
e721c15 (checkout: avoid unnecessary match_pathspec calls, 2013-03-27),
where it is mentioned as a _bugfix_. That in turn references 0a1283b
(checkout $tree $path: do not clobber local changes in $path not in
$tree, 2011-09-30), which explicitly goes against the goal we are
talking about here. It is not "make my index and working tree look like
$tree" at all.

So now I'm doubly confused about what we want to do.

If we want to retain that behavior, I think we can still cover these
cases by marking items missing from $tree as "to remove" during the
diff/update-the-index phase, and then being more gentle with "to
remove" files (e.g., not clobbering changed worktree files unless "-f"
is given).

I am not sure that provides a sane user experience, though. Why is it OK
to clobber local changes to a file if we are replacing it with other
content, but _not_ if we are replacing it with nothing?  Either the
content we are losing is valuable or not, but it has nothing to do with
what we are replacing. And Junio argued in the thread linked elsewhere
that the point of git checkout $tree -- $path is to clobber what is in
$path, which I would agree with.

I think the argument made in 0a1283b is that "git checkout $tree $path"
is not "make $path like $tree", but rather "pick bits of $path out of
$tree". Which would mean this whole deletion thing we are talking about
is completely contrary to that.

So which is it?

-Peff


Re: What is the default refspec for fetch?

2014-11-08 Thread Jeff King
On Fri, Nov 07, 2014 at 04:31:08PM +0100, Christian Halstrick wrote:

 In a repo where no remote.name.fetch config parameter is set, what
 should a git fetch do? My experiments lead me to think it's
 HEAD:FETCH_HEAD. Right?

Basically, yes. We always write FETCH_HEAD, regardless of the refspec.
We choose HEAD if no other refspec was provided. So it is really more
like

  git fetch $remote HEAD

This is what makes one-off bare-url pulls work, like:

  git pull git://...

It runs fetch under the hood, which writes into FETCH_HEAD, and then we
merge that.

-Peff


Re: [PATCH] git_connect: allow passing arguments to ssh in GIT_SSH_ARGS

2014-11-08 Thread Jeff King
On Sat, Nov 08, 2014 at 11:44:39AM +0100, Thomas Quinot wrote:

 It may be impractical to install a wrapper script for ssh
 when additional parameters need to be passed. Provide an
 alternative way of specifying these by means of the GIT_SSH_ARGS
 environment variable. Arguments are whitespace delimited following
 usual shell conventions; embedded whitespace can be enclosed
 in quotes, or escaped with a backslash.
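For reference, the splitting convention the patch describes (whitespace-delimited, with quoting and backslash escapes following usual shell rules) is essentially what Python's shlex module implements in its default POSIX mode; a quick sketch of how such a value would split:

```python
import shlex

# Whitespace-delimited splitting with shell-style double quotes and
# backslash escapes, matching the convention described above.
# The option values here are purely illustrative.
args = shlex.split(r'-p 2222 -o "ProxyCommand connect %h %p" -i my\ key')
```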

This has bugged me before, too. Almost every sub-program git invokes
is run via a shell, so you can put in arbitrary arguments or even
snippets of shell code. But not GIT_SSH.

I think the original reason was to match other programs with similar
variables. At this point, we can't just change it on a whim; the quoting
requirements are different and it would break people's setups. So your
approach of adding a new variable is good (the other option is figuring
out a long-term transition plan).

However, I'm not sure adding a separate variable for options is the best
we can do. Besides being a bit clunky, it has two big shortcomings:

  1. It's not consistent with other git variables that use the shell
 (e.g., GIT_PAGER).

  2. It's not as flexible as a shell; you can't specify $HOME/bin/ssh
 or other even more esoteric things, like dynamically tweaking
 the options based on the environment.

What do you think of adding an alternate variable that is not ssh
_arguments_, but rather just a full shell command for running ssh?
I'm not sure what it could be called (GIT_SSH_SH is probably too
confusing).

 I hope I won't stray too far away from established procedures
 with my first contribution to git. This patch adds support
 for a GIT_SSH_ARGS environment variable, providing a way
 of specifying ssh arguments without having to create a
 wrapper script.

The formatting and whatnot of your submission looks fine to me. The
patch itself looks reasonable, though I didn't look too closely after
making my comments above. :)

-Peff


Re: [PATCH 1/2] Add a few more values for receive.denyCurrentBranch

2014-11-08 Thread Jeff King
On Fri, Nov 07, 2014 at 02:58:17PM +0100, Johannes Schindelin wrote:

 Under certain circumstances, it makes a *lot* of sense to allow pushing
 into the current branch. For example, when two machines with different
 Operating Systems are required for testing, it makes much more sense to
 synchronize between working directories than having to go via a third
 server.

FWIW, I do this without a third server (and without resorting to pull),
with:

  host1$ git push host2 master:refs/remotes/host1/master
  host2$ git merge host1/master

You can even set up a push refspec to make "git push host2" do the
right thing.

That being said, I do like the premise of your patch, as it eliminates
the extra step on the remote side (which is not that big a deal in
itself, but when you realize that host2 _did_ have some changes on it,
then you end up doing the merge there, when in general I'd prefer to do
all the work on host1 via git pull).

-Peff


Re: Test failure

2014-11-08 Thread Jeff King
On Sat, Nov 08, 2014 at 11:28:32AM -0800, Michael Blume wrote:

 When I build and run tests I get
 
 [11:17][michael.blume@tcc-michael-4:~/workspace/git/t(master)]$
 ./t1410-reflog.sh

What does ./t1410-reflog.sh -v -i report?

 A quick search seems to indicate the test is pretty new?
 http://www.mail-archive.com/git@vger.kernel.org/msg60495.html

Yes, it is new. In these cases there's often some silly little
platform incompatibility in the test script, but I don't see one. So
maybe the incompatibility is in the code itself; I'm wondering if
OS X returns something besides EISDIR when trying to open a directory.

Unfortunately I don't have an OS X install handy to test on.
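The errno question above is easy to probe on whatever platform is at hand; a hedged sketch (Python surfaces the failure as IsADirectoryError, and Linux reports EISDIR — the suspicion is that OS X of that era may report something else):

```python
import errno
import tempfile

# Probe what error opening a directory for reading produces.
d = tempfile.mkdtemp()
try:
    open(d).read()
    code = None
except OSError as e:
    code = errno.errorcode[e.errno]
print(code)
```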

-Peff


[PATCH] t1410: fix breakage on case-insensitive filesystems

2014-11-08 Thread Jeff King
On Sat, Nov 08, 2014 at 08:43:54PM -0500, Jeff King wrote:

 Unfortunately I don't have an OS X install handy to test on.

I lied; it turns out I still had access to an old VM. The problem did
turn out to be rather silly. Here's a patch.

-- >8 --
Two tests recently added to t1410 create branches "a" and
"a/b" to test d/f conflicts on reflogs. Earlier, unrelated
tests in that script create the path "A/B" in the working
tree.  There's no conflict on a case-sensitive filesystem,
but on a case-insensitive one, "git log" will complain that
"a/b" is both a revision and a working tree path.

We could fix this by using a "--" to disambiguate, but we
are probably better off using names that are less confusing,
to make it more clear that they are unrelated to the working
tree files.  This patch turns "a/b" into "one/two".

Reported-by: Michael Blume blume.m...@gmail.com
Signed-off-by: Jeff King p...@peff.net
---
The line-diff is hard to read, but if anyone was looking for a chance to
test-drive contrib/diff-highlight, it does a good job of making the
change easy to see.

 t/t1410-reflog.sh | 36 ++--
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/t/t1410-reflog.sh b/t/t1410-reflog.sh
index 976c1d4..8cf4461 100755
--- a/t/t1410-reflog.sh
+++ b/t/t1410-reflog.sh
@@ -254,36 +254,36 @@ test_expect_success 'checkout should not delete log for packed ref' '
 '
 
 test_expect_success 'stale dirs do not cause d/f conflicts (reflogs on)' '
-	test_when_finished "git branch -d a || git branch -d a/b" &&
+	test_when_finished "git branch -d one || git branch -d one/two" &&
 
-	git branch a/b master &&
-	echo "a/b@{0} branch: Created from master" >expect &&
-	git log -g --format="%gd %gs" a/b >actual &&
+	git branch one/two master &&
+	echo "one/two@{0} branch: Created from master" >expect &&
+	git log -g --format="%gd %gs" one/two >actual &&
 	test_cmp expect actual &&
-	git branch -d a/b &&
+	git branch -d one/two &&
 
-	# now logs/refs/heads/a is a stale directory, but
-	# we should move it out of the way to create a reflog
-	git branch a master &&
-	echo "a@{0} branch: Created from master" >expect &&
-	git log -g --format="%gd %gs" a >actual &&
+	# now logs/refs/heads/one is a stale directory, but
+	# we should move it out of the way to create one reflog
+	git branch one master &&
+	echo "one@{0} branch: Created from master" >expect &&
+	git log -g --format="%gd %gs" one >actual &&
 	test_cmp expect actual
 '
 
 test_expect_success 'stale dirs do not cause d/f conflicts (reflogs off)' '
-	test_when_finished "git branch -d a || git branch -d a/b" &&
+	test_when_finished "git branch -d one || git branch -d one/two" &&
 
-	git branch a/b master &&
-	echo "a/b@{0} branch: Created from master" >expect &&
-	git log -g --format="%gd %gs" a/b >actual &&
+	git branch one/two master &&
+	echo "one/two@{0} branch: Created from master" >expect &&
+	git log -g --format="%gd %gs" one/two >actual &&
 	test_cmp expect actual &&
-	git branch -d a/b &&
+	git branch -d one/two &&
 
-	# same as before, but we only create a reflog for a if
+	# same as before, but we only create a reflog for one if
 	# it already exists, which it does not
-	git -c core.logallrefupdates=false branch a master &&
+	git -c core.logallrefupdates=false branch one master &&
 	: >expect &&
-	git log -g --format="%gd %gs" a >actual &&
+	git log -g --format="%gd %gs" one >actual &&
 	test_cmp expect actual
 '
 
-- 
2.1.2.596.g7379948



Re: [PATCH] replace: fix replacing object with itself

2014-11-08 Thread Jeff King
On Sun, Nov 09, 2014 at 01:05:31AM +0100, Manzur Mukhitdinov wrote:

 When an object is replaced with itself, git shows unhelpful messages
 like this (from "git log"):
 
   fatal: replace depth too high for object SHA1
 
 Prevent the user from replacing an object with itself (with a test for
 checking this case).

I thought we already did this in the last round of git-replace patches,
but it looks like we only did it for the newly added --edit and --graft
cases, not "git replace X X". I think this is probably a good step. I've
also considered that this should be another way of deleting the
replacement, but I think we decided that was too magical.

 diff --git a/builtin/replace.c b/builtin/replace.c
 index 294b61b..b7e05ad 100644
 --- a/builtin/replace.c
 +++ b/builtin/replace.c
 @@ -186,6 +186,9 @@ static int replace_object(const char *object_ref, const char *replace_ref, int force)
  	if (get_sha1(replace_ref, repl))
  		die("Failed to resolve '%s' as a valid ref.", replace_ref);
  
 +	if (!hashcmp(object, repl))
 +		return error("new object is the same as the old one: '%s'", sha1_to_hex(object));
 +
  	return replace_object_sha1(object_ref, object, replace_ref, repl, force);

I think all of the callers of replace_object_sha1 do this same check
now. Can we just move the check into that function instead of adding
another instance of it?

-Peff


Re: [RFC] git checkout $tree -- $path always rewrites files

2014-11-09 Thread Jeff King
On Sat, Nov 08, 2014 at 08:19:21AM -0800, Martin von Zweigbergk wrote:

  What should:
 
git checkout HEAD -- some-new-path
 
  do in that case? With the current code, it actually barfs, complaining
  that nothing matched some-new-path (because it is not part of HEAD, and
  therefore we don't consider it at all), and aborts the whole operation.
  I think we would want to delete some-new-path in that case, too.
 
 I don't think we'd want it to be deleted. I would view 'git reset
 --hard' as the role model here, and that command (without paths) would
 not remove the file. And applying it to a path should not change the
 behavior, just restrict it to the paths, right?

Are you sure about git reset here? If I do:

  git init
  echo content >file && git add file && git commit -m base
  echo modified >file
  echo new >some-new-path
  git add file some-new-path
  git reset --hard

then we delete some-new-path (it is not untracked, because the index
knows about it). That makes sense to me. I.e., we treat it with the same
preciousness whether it is named explicitly or not.

-Peff


Re: [PATCH v2 1/2] git_connect: set ssh shell command in GIT_SSH_CMD

2014-11-09 Thread Jeff King
On Sat, Nov 08, 2014 at 03:27:53PM +0100, Thomas Quinot wrote:

 It may be impractical to install a wrapper script for GIT_SSH
 when additional parameters need to be passed. Provide an alternative
 way of specifying a shell command to be run, including command line
 arguments, by means of the GIT_SSH_CMD environment variable, which
 behaves like GIT_SSH but is passed to the shell.

Thanks, I like this much better. The name GIT_SSH_CMD is not too bad.
Personally, of the two (GIT_SSH and GIT_SSH_CMD) I would expect the
_CMD form to be the one that does not use the shell. But I do not
really have a better suggestion for the name, so perhaps it's OK.

 Note that with this first patch only, the special processing for PuTTY/plink
 looks at the whole command in that case. I deliberately left it that way,
 even though one might imagine a case where this would cause a false positive,
 e.g. if the user's login name includes the string 'plink':
    GIT_SSH_CMD="ssh -l fooplink"
 
 The work-around in this case would be to write:
    GIT_SSH_CMD="ssh -l foop''link"

I'd be tempted to say that the plink magic does not need to kick in at
all for GIT_SSH_CMD. There are simply too many corner cases in trying to
guess what the shell is going to run. For example:

 A second patch, coming in a followup message, pre-splits $GIT_SSH_CMD
 and ensures that we look for plink/tortoiseplink only in the argv[0]
 element.

I don't think you can pre-split accurately. Since we promise to pass the
result to the shell, the user gets to write whatever they want. Even:

  GIT_SSH_CMD='f() {
    if test "$(hostname)" = some-machine; then
      some-special-ssh "$@"
    else
      ssh "$@"
    fi
  }
  f'

Parsing complications aside, you cannot even know in git which ssh is
going to be run until the shell code is executed. I think either we have
to leave the responsibility for munging "-p" into "-P" on the side of
the user's shell snippet (and remember that they can still use GIT_SSH
as usual, so we are not making anything _worse_ for them here), or we
have to provide some unambiguous way for them to signal which calling
convention they want ($GIT_SSH_CMD_IS_PLINK or something).
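To illustrate the argv[0] idea and its limits, a naive check might look like the sketch below. The function name is invented, and, as argued above, no such splitting can cope with arbitrary shell snippets like the f() definition — it only fixes the simple false positive:

```python
import os
import shlex

def looks_like_plink(ssh_cmd):
    # Split the command roughly the way a shell would, then look for
    # "plink" only in the basename of argv[0]. This avoids the false
    # positive on GIT_SSH_CMD="ssh -l fooplink", but it still cannot
    # handle shell functions or other compound snippets.
    try:
        argv = shlex.split(ssh_cmd)
    except ValueError:
        return False  # unparseable snippet; no safe guess possible
    if not argv:
        return False
    return "plink" in os.path.basename(argv[0]).lower()
```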

-Peff


Re: [PATCH] t1410: fix breakage on case-insensitive filesystems

2014-11-09 Thread Jeff King
On Sun, Nov 09, 2014 at 06:56:45PM -0800, Junio C Hamano wrote:

  We could fix this by using a "--" to disambiguate, but we
  are probably better off using names that are less confusing,
  to make it more clear that they are unrelated to the working
  tree files.  This patch turns "a/b" into "one/two".
 
 Hmph, but the branch "a" and the file "A" _do_ have names that are
 unrelated to each other, and it is only the case-insensitive fs
 that is confused ;-).  Renaming is not so bad and certainly is
 not wrong per se as a workaround, but I have this suspicion
 that it sends a wrong message to people on such a filesystem,
 namely, "you cannot use Git to manage a file called 'master'",
 or something silly like that. Disambiguation with double-dashes
 does not have such a problem, and instead shows how scripts
 that are meant to be portable are written. More importantly,
 that is more in line with the problem description (i.e. we complain
 pointing out the ambiguity, implying that everything is fine as long
 as you disambiguate).

 
 So I would rather see the workaround done that way.

My main concern is that it leaves t1410 in an "accident waiting to
happen" state, where some hapless developer will add a new test using
"a/b", not realizing they need to be careful to disambiguate. The test
will pass for them on Linux, but some luckless OS X user will end up
wasting time tracking down the error.

Or another way of looking at it: it is perfectly possible to have git
manage a file called "master" or even "HEAD". But that does not mean it
is a _good idea_, or is without annoyances. :)

 But that is only if this were before you actually wrote the patch.
 The above is not a preference strong enough to make me ask
 you to reroll ;-)

The alternate form is quite trivial. I think I still prefer the
"one/two" version, but here is the "--" patch for reference. You can
decide which to pick up.

-- >8 --
Subject: t1410: fix breakage on case-insensitive filesystems

Two tests recently added to t1410 create branches "a" and
"a/b" to test d/f conflicts on reflogs. Earlier tests in
that script create the path "A/B" in the working tree.
There's no conflict on a case-sensitive filesystem, but on a
case-insensitive one, "git log" will complain that "a/b" is
both a revision and a working tree path.

We can fix this by using "--" to disambiguate.

Signed-off-by: Jeff King p...@peff.net
---
 t/t1410-reflog.sh | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/t/t1410-reflog.sh b/t/t1410-reflog.sh
index 976c1d4..48bcd59 100755
--- a/t/t1410-reflog.sh
+++ b/t/t1410-reflog.sh
@@ -258,7 +258,7 @@ test_expect_success 'stale dirs do not cause d/f conflicts (reflogs on)' '
 
 	git branch a/b master &&
 	echo "a/b@{0} branch: Created from master" >expect &&
-	git log -g --format="%gd %gs" a/b >actual &&
+	git log -g --format="%gd %gs" a/b -- >actual &&
 	test_cmp expect actual &&
 	git branch -d a/b &&
 
@@ -266,7 +266,7 @@ test_expect_success 'stale dirs do not cause d/f conflicts (reflogs on)' '
 	# we should move it out of the way to create a reflog
 	git branch a master &&
 	echo "a@{0} branch: Created from master" >expect &&
-	git log -g --format="%gd %gs" a >actual &&
+	git log -g --format="%gd %gs" a -- >actual &&
 	test_cmp expect actual
 '
 
@@ -275,7 +275,7 @@ test_expect_success 'stale dirs do not cause d/f conflicts (reflogs off)' '
 
 	git branch a/b master &&
 	echo "a/b@{0} branch: Created from master" >expect &&
-	git log -g --format="%gd %gs" a/b >actual &&
+	git log -g --format="%gd %gs" a/b -- >actual &&
 	test_cmp expect actual &&
 	git branch -d a/b &&
 
@@ -283,7 +283,7 @@ test_expect_success 'stale dirs do not cause d/f conflicts (reflogs off)' '
 	# it already exists, which it does not
 	git -c core.logallrefupdates=false branch a master &&
 	: >expect &&
-	git log -g --format="%gd %gs" a >actual &&
+	git log -g --format="%gd %gs" a -- >actual &&
 	test_cmp expect actual
 '
 
-- 
2.1.2.596.g7379948



Re: [PATCH] t1410: Fix for case insensitive filesystems

2014-11-09 Thread Jeff King
On Sun, Nov 09, 2014 at 08:48:06PM -0500, Brian Gernhardt wrote:

 A pair of recently added tests used branches "a" and "a/b", but earlier
 tests created files "A" and "A/B".  On case-insensitive filesystems (such
 as HFS+), that causes git to complain about the name being ambiguous
 between branch and file.  Resolve by renaming the branches to "aa" and
 "aa/bb".

Already being discussed here:

  http://thread.gmane.org/gmane.comp.version-control.git/259250/focus=259253

:)

-Peff


Re: [PATCH] t1410: fix breakage on case-insensitive filesystems

2014-11-09 Thread Jeff King
On Sun, Nov 09, 2014 at 09:48:29AM -0800, Junio C Hamano wrote:

 Michael Blume blume.m...@gmail.com writes:
 
  Works for me, thanks =)
 
  I'm curious now, is there an automated build of git running on a mac
  anywhere? There's a mac mini running jenkins in my office and it's
  possible I could convince someone to let me set up a git build that'll
  e-mail me if there's a test failure.
 
 I am not aware of a Macintosh person who regularly runs tests, but
 if there were, we hopefully will hear from them soonish ;-).

I think there are several people who run the tests on OS X fairly
regularly (I note that another fix for the t1410 problem has already
materialized :) ).

However, I think it is nice when test failures are caught early, before
you have merged topics (to master or elsewhere). That helps isolate the
failures to their particular topics.

I know you run "make test" before pushing out the results of any
integration you do. And I recall that for a while (and maybe still?) you
even did so on VMs of a few common platforms. OS X is notoriously
irritating to run in a VM, but would you be interested in a
network-accessible install that you could push to and run "make test"
on as part of your routine?

If what Michael is offering cannot do that, I am sure I can get GitHub
to set something up.

-Peff


[PATCH] imap-send: avoid curl functions when not building curl support

2014-11-09 Thread Jeff King
Imap-send recently learned to conditionally compile against
and use curl for imap support. To use this feature, you must
both:

  1. Compile with USE_CURL_FOR_IMAP_SEND

  2. Specify --curl on the command line to enable it

It is OK (and even desirable) for the code checking --curl
to be compiled even if USE_CURL_FOR_IMAP_SEND is not in
effect; this lets us explain the situation to the user
(instead of saying "--curl? never heard of it").

However, the code which conditionally runs when --curl is
enabled must _not_ be compiled in this case. It references
functions which are neither declared nor defined, causing
the compiler to complain.

Signed-off-by: Jeff King p...@peff.net
---
On top of br/imap-send-via-libcurl. I needed this to compile 'pu' with
NO_CURL (which I don't usually do, but was testing on a minimal box). I
expect it can just be squashed in to the next re-roll.

Since we were talking about testing in another thread, Junio, I wonder
if it is worth having you compile your integration results against a
couple different configs (e.g., NO_CURL). Obviously that will make
things slower if you don't throw more CPU power at it, but that seems
like a problem that can be solved with build servers or similar.

 imap-send.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/imap-send.c b/imap-send.c
index ad4ac22..e0e1f09 100644
--- a/imap-send.c
+++ b/imap-send.c
@@ -1542,8 +1542,10 @@ int main(int argc, char **argv)
 	if (server.tunnel)
 		return append_msgs_to_imap(&server, all_msgs, total);
 
+#ifdef USE_CURL_FOR_IMAP_SEND
 	if (use_curl)
 		return curl_append_msgs_to_imap(&server, all_msgs, total);
+#endif
 
 	return append_msgs_to_imap(&server, all_msgs, total);
 }
-- 
2.1.2.596.g7379948


[PATCH] t4213: avoid | in sed regexp

2014-11-09 Thread Jeff King
On Sun, Nov 09, 2014 at 06:46:01PM -0800, Michael Blume wrote:

 Ok, with that I have a different test failure on the pu branch --
 please tell me if I'm spamming the list or if there's some other
 protocol I should be using to report issues on pu.

No, reporting problems to the list is exactly the right spot. It is nice
to start a new thread, though, if it is an unrelated problem. And
possibly cc folks you think might be responsible. E.g.:

  git shortlog -se origin/master..origin/pu t/t4213-*

points to Thomas. :)

-- >8 --
Many versions of sed (e.g., that found on OS X) do not understand
|-alternation, even when backslash escaped.  Some versions can turn on
extended regexps with a special option, but of course that option is not
standard, either. Let's just write out our alternates longhand.

Signed-off-by: Jeff King p...@peff.net
---
On top of the tr/remerge-diff topic.

For curiosity, it is -E on OS X and -r on GNU sed to turn on
extended regexps. But I hear that Solaris sed also does not handle \|,
and I would not be surprised to find that it has no extended regexp
support at all. :)

 t/t4213-log-remerge-diff.sh | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/t/t4213-log-remerge-diff.sh b/t/t4213-log-remerge-diff.sh
index 36ef17a..ec93b96 100755
--- a/t/t4213-log-remerge-diff.sh
+++ b/t/t4213-log-remerge-diff.sh
@@ -75,7 +75,9 @@ test_expect_success 'unrelated merge: without conflicts' '
 clean_output () {
git name-rev --name-only --stdin |
# strip away bits that aren't treated by the above
-   sed -e 's/^\(index\|Merge:\|Date:\).*/\1/'
+   sed -e 's/^\(index\).*/\1/' \
+   -e 's/^\(Merge:\).*/\1/' \
+   -e 's/^\(Date:\).*/\1/'
 }
 
cat >expected <<EOF
-- 
2.1.2.596.g7379948
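Not part of the patch, but for readers unfamiliar with the issue, here is a quick sketch of why the longhand works everywhere: each `-e` expression is applied in turn, so chained anchored substitutions have the same effect as the unportable `\|`-alternation.

```shell
# Two chained substitutions stand in for 's/^\(index\|Merge:\).*/\1/'.
printf 'index abc123\nMerge: def456\nkeep me\n' |
sed -e 's/^\(index\).*/\1/' \
    -e 's/^\(Merge:\).*/\1/'
# -> index
#    Merge:
#    keep me
```

Because each expression only rewrites lines it matches, the order of the `-e` arguments does not matter here.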



Re: [PATCH] t1410: fix breakage on case-insensitive filesystems

2014-11-09 Thread Jeff King
On Sun, Nov 09, 2014 at 10:47:52PM -0800, Junio C Hamano wrote:

 On Sun, Nov 9, 2014 at 10:30 PM, Jeff King p...@peff.net wrote:
 
  I know you make test before pushing out the results of any integration
  you do. And I recall that for a while (and maybe still?) you even did so
  on VMs of a few common platforms. OS X is notoriously irritating to run
  in a VM, but would you be interested in a network-accessible install
  that you could push to and make test on as part of your routine?
 
  If what Michael is offering cannot do that, I am sure I can get GitHub
  to set something up.
 
 Even if it were network-accessible, the latency for doing an integration
 cycle and pushing there and waiting for the tests to come back may
 make it impractical to use it as part of pre-pushout test.

I had the impression you were already doing "for i in my_vms; do git
push $i && ssh $i make test; done", so in theory this could be done in
parallel. But yeah, I definitely wouldn't want to add latency to your
workflow.

 But I would think that the project certainly would benefit if a
 post-receive hook at Github did an automated test on the tip of
 'pu' and bisected a breakage, if found, to pinpoint the patch
 that breaks. It could even send a notice to the author of the
 non-portable patch ;-).

Yeah, you're right. Testing on pu is probably enough. Coupled with
automated bisection, the pinpointing part is not that important, and
pu is often early enough to catch problems before people try to build
on top of them. So I think if Michael is still willing to set up his
build box, just pulling and building your pu automatically would be a
good start.

GitHub supports trigger hooks on pushes, but I do not think that is even
necessary here. Just polling once or twice a day would probably give us
good enough turn-around time.

I think somebody just needs to write the auto-bisect script. I suspect
the simplest form is something like:

  # fetch a new version; exit early if we already checked this one
  git fetch
  test "$(git rev-parse pu-ok)" = "$(git rev-parse origin/pu)" && exit 0
  git reset --hard origin/pu

  # quick path: all tests succeed
  if make test; then
git tag -f pu-ok HEAD
exit 0
  fi

  # slow path: bisect the actual commit. We do not need to worry about
  # cutting each bisection step down to the minimal size (e.g., finding
  # the actual failing test), because this should run only infrequently.
  git bisect start HEAD pu-ok
  git bisect run make test

The real work would be in wrapping it up in a cron-job or similar so
that it produces no output on success, and sends an email (or whatever)
with the bisect results when it fails.

-Peff


Re: [PATCH v4] git_connect: set ssh shell command in GIT_SSH_COMMAND

2014-11-09 Thread Jeff King
On Sun, Nov 09, 2014 at 11:42:32PM +0100, Thomas Quinot wrote:

 It may be impractical to install a wrapper script for GIT_SSH
 when additional parameters need to be passed. Provide an alternative
 way of specifying a shell command to be run, including command line
 arguments, by means of the GIT_SSH_COMMAND environment variable,
 which behaves like GIT_SSH but is passed to the shell.
 
 The special circuitry to modify parameters in the case of using
 PuTTY's plink/tortoiseplink is activated only when using GIT_SSH;
 in the case of using GIT_SSH_COMMAND, it is deliberately left up to
 the user to make any required parameters adaptation before calling
 the underlying ssh implementation.
 
 Signed-off-by: Thomas Quinot <tho...@quinot.org>
 ---
 
 Amended patch as per discussion with Junio: change variable name
 to GIT_SSH_COMMAND, keep plink special circuitry disabled for
 this case (leaving it enabled only when using GIT_SSH, thus
 preserving compatibility with legacy usage).

I think this version looks good. Thanks for working on it.

Two style micro-nits (that I do not think even merit a re-roll by
themselves, but Junio may want to mark up as he applies):

 +	} else {
 +		ssh = getenv("GIT_SSH");
 +		if (!ssh) ssh = "ssh";

You did not even introduce this line, but only reindented it. However,
our code style usually would write this as:

	if (!ssh)
		ssh = "ssh";

 +	putty = strcasestr(ssh, "plink") != NULL;

We would usually write this as:

	putty = !!strcasestr(ssh, "plink");
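As an aside, here is a sketch (not from the patch) of what the new variable buys you: GIT_SSH_COMMAND is handed to the shell, so arguments and even compound commands work without a wrapper script. The stub below is a made-up stand-in that merely records that it ran; the connection itself fails, as you would expect.

```shell
dir=$(mktemp -d) && cd "$dir"
# git runs the string via the shell, appending the host and remote command;
# a real setup would use e.g. GIT_SSH_COMMAND='ssh -p 2222 -i ~/.ssh/key'.
GIT_SSH_COMMAND='echo invoked >marker; false' \
	git ls-remote ssh://example.com/repo.git >/dev/null 2>&1 || true
cat marker   # -> invoked
```

Contrast with GIT_SSH, which must name a single executable and cannot carry arguments of its own.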

-Peff


Re: [PATCH] use child_process_init() to initialize struct child_process variables

2014-11-09 Thread Jeff King
On Sun, Nov 09, 2014 at 02:49:58PM +0100, René Scharfe wrote:

 -- >8 --
 Subject: [PATCH] trailer: use CHILD_PROCESS_INIT in apply_command()
 
 Initialize the struct child_process variable cp at declaration time.
 This is shorter, saves a function call and prevents using the variable
 before initialization by mistake.
 
 Suggested-by: Jeff King <p...@peff.net>
 Signed-off-by: Rene Scharfe <l@web.de>
 ---
  trailer.c | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

Thanks, both this one and the other you just sent (to use
child_process.args in more places) look good to me.

-Peff


Re: [PATCH] imap-send: avoid curl functions when not building curl support

2014-11-09 Thread Jeff King
On Mon, Nov 10, 2014 at 01:39:47AM -0500, Jeff King wrote:

 On top of br/imap-send-via-libcurl. I needed this to compile 'pu' with
 NO_CURL (which I don't usually do, but was testing on a minimal box). I
 expect it can just be squashed in to the next re-roll.

Oops, just started reading through my list backlog and realized you
already noticed and dealt with this. Sorry for the noise.

-Peff


Re: 2.2.0-rc behavior changes (1/2)

2014-11-10 Thread Jeff King
On Mon, Nov 10, 2014 at 07:47:32PM +1100, Bryan Turner wrote:

 First change: git update-ref -d /refs/heads/nonexistent
 some-valid-sha1 now produces an error about ref locking that it
 didn't produce before
 
 Git 2.1.x and prior produced this output:
 error: unable to resolve reference refs/heads/nonexistent: No such
 file or directory
 
 Now, in the 2.2.0 RCs, it says:
 error: unable to resolve reference refs/heads/nonexistent: No such
 file or directory
 error: Cannot lock the ref 'refs/heads/nonexistent'.
 
 This one feels more like a bug, but again may not be. I say it feels
 like a bug because of the order of the messages: If git has decided
 the ref doesn't exist, why is it still trying to lock it?

I don't think this is a bug. The order you see is because the code goes
something like this:

  1. the parent function calls a sub-function to lock

  2. the sub-function generates the error no such file or directory
 and returns failure to the caller

  3. the caller reports that acquiring the lock failed

The only thing that has changed between the two is step (3), but it is
not an extra lock action after the error. It is just a more verbose
report of the same error.

That being said, the sub-function (lock_ref_sha1_basic) gives a much
more useful message. So it would be a nice enhancement to make sure that
it prints something useful in every return case, and then drop the
message from the caller.

As an aside, I'm also slightly confused by your output. Are you feeding
"/refs/heads/nonexistent" (with a leading slash), or
"refs/heads/nonexistent" (no leading slash)? If the latter, then that
should silently succeed (and seems to in my tests). If the former, then
the real problem is not ENOENT, but rather EINVAL; that name is not a
valid refname.
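A quick way to see the distinction from the command line (an illustration, not from the thread) is git check-ref-format, which validates a proposed refname without touching any repository:

```shell
# A well-formed name passes (exit status 0)...
git check-ref-format refs/heads/nonexistent && echo valid
# ...but a leading slash makes the name itself invalid (EINVAL, not ENOENT):
git check-ref-format /refs/heads/nonexistent || echo invalid
# -> valid
#    invalid
```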

Older versions of git would produce:

  error: unable to resolve reference /refs/heads/nonexistent: No such file or directory

which is like the error you showed, but note that the refname is
reported with the leading slash. In v2.2.0-rc1, this is:

  error: unable to resolve reference /refs/heads/nonexistent: Invalid argument
  error: Cannot lock the ref '/refs/heads/nonexistent'.

which is more accurate. I could explain the differences in our output
as simple transcription errors made when writing your email, but I
wanted to make sure I am not missing something.

-Peff


Re: 2.2.0-rc behavior changes (2/2)

2014-11-10 Thread Jeff King
On Mon, Nov 10, 2014 at 07:51:00PM +1100, Bryan Turner wrote:

 Second change: git gc --auto now fails if there are loose empty blobs.
 
 We have a test which just touched empty files in objects/17 to trigger
 the gc --auto preconditions (Note: I realize this is completely
 invalid, and I've changed the test to no longer do this; I'm only
 surfacing the behavioral change).

This is expected. As you noticed here:

 error: object file .git/objects/17/02d54e8fba95ef9968a0c9b183fe22ec551c86 is empty
 fatal: unable to get object info for 1702d54e8fba95ef9968a0c9b183fe22ec551c86
 error: failed to run prune

the error comes from git prune failing. It is using unreachable but
recent objects as graph tips to keep around. If we can't load a tip
object, we abort the prune, because we would not want to delete objects
that might have been referenced (e.g., if it were a real corrupted
object that we could not read).

The second behavior (die on broken objects) has been around for a while.
The new change (in the commit you bisected to) is that we are
considering more objects.

I'll admit I didn't really consider the impact on sticking broken object
files into the object directory, but I think the code is doing the
sensible and safe thing.

-Peff


Re: mac test failure -- test 3301

2014-11-10 Thread Jeff King
On Tue, Nov 11, 2014 at 02:40:19AM +0100, Johan Herland wrote:

  This and all other failures are due to the output of 'wc -l', which on
 Mac is "{whitespace}1" rather than just "1" as it is on other
  platforms. fbe4f748 added quotes around the $(... | wc -l) invocation
  which caused the whitespace to be retained. A minimum fix would be to
  drop the quotes surrounding $().
 
 Ah, thanks!
 
 I thought that quoting command output was a good idea in general. Am I
 wrong, or is this just one exception to an otherwise good guideline?

It usually is a good idea to prevent whitespace splitting by the shell
(and especially with things that might unexpectedly be empty, in which
case they end up as no argument and not an empty one). So yeah, this
is an exception.
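A minimal sketch of the failure mode (the padding shows up with BSD-derived wc, e.g. on OS X; GNU wc emits a bare number when reading stdin):

```shell
count="$(echo hi | wc -l)"
# On OS X $count is "       1", so the quoted string comparison fails there:
test "$count" = 1 || echo "quoted comparison sees the padding"
# Unquoted, field splitting strips the whitespace before comparing:
test $count = 1 && echo "unquoted comparison works everywhere"
# A numeric comparison is also immune to the padding:
test "$count" -eq 1 && echo "numeric comparison works too"
```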

 Anyway, here is the minimal diff to remove quoting from the $(... | wc
 -l) commands (probably whitespace damaged by GMail - sorry). Junio:
 feel free to squash this in, or ask for a re-roll:

I think we have test_line_count exactly because of this unportability
with wc output.

You'd want:

  git ls-tree refs/notes/commits >actual &&
  test_line_count = 1 actual

or similar.

By the way, the point of that ls-tree appears to be to check the
number of total notes stored. Since notes are stored in a hashed
structure, should it be ls-tree -r? Otherwise, we see only the leading
directories; two notes whose sha1s start with the same two characters
would be considered a single entry.
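The difference is easy to demonstrate with plain paths (a sketch in a hypothetical throwaway repo, standing in for the notes fan-out):

```shell
dir=$(mktemp -d) && cd "$dir" && git init -q
mkdir ab && echo x >ab/cdef && echo y >ab/cd99
git add . && git -c user.name=t -c user.email=t@example.com commit -qm init
git ls-tree HEAD      # one line: just the "ab" tree entry
git ls-tree -r HEAD   # two lines: recurses down to both blobs
```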

-Peff


Re: [PATCH] run-command: use void to declare that functions take no parameters

2014-11-10 Thread Jeff King
On Mon, Nov 10, 2014 at 02:43:10PM -0800, Junio C Hamano wrote:

  Explicitly declare that git_atexit_dispatch() and git_atexit_clear()
  take no parameters instead of leaving their parameter list empty and
  thus unspecified.
 [...]
 
 I was kind of surprised after running a git blame to find that this
 is a recent thing, and the same patch looked quite substandard with
 numerious style violations, and I somehow managed to let them slip
 in X-.  Perhaps I was having a bad day or something...

I had always just assumed that -Wstrict-prototypes was part of -Wall,
but it is not (nor even part of -Wextra!). Maybe it is time to add it to
your integration-build flags. :)

Looks like we also need this on top of hv/submodule-config (still in pu,
so squashing is probably best):

diff --git a/submodule-config.h b/submodule-config.h
index 58afc83..9061e4e 100644
--- a/submodule-config.h
+++ b/submodule-config.h
@@ -24,6 +24,6 @@ const struct submodule *submodule_from_name(const unsigned char *commit_sha1,
 						const char *name);
 const struct submodule *submodule_from_path(const unsigned char *commit_sha1,
 						const char *path);
-void submodule_free();
+void submodule_free(void);
 
 #endif /* SUBMODULE_CONFIG_H */

-Peff


Re: [PATCH] replace: fix replacing object with itself

2014-11-10 Thread Jeff King
On Tue, Nov 11, 2014 at 12:20:56AM +0100, Manzur Mukhitdinov wrote:

 When an object is replaced with itself, git shows unhelpful messages
 like this (git log):
 fatal: replace depth too high for object SHA1
 
 Prevents the user from replacing an object with itself (with a test
 for checking this case).

Thanks, this looks good to me.

 +test_expect_success 'replacing object with itself must fail' '
 +	test_must_fail git replace $HASH1 $HASH1 &&
 +	HASH8=$(git rev-parse --verify HEAD) &&
 +	test_must_fail git replace HEAD $HASH8 &&
 +	test_must_fail git replace --graft HEAD HEAD^ &&
 +	test_must_fail env GIT_EDITOR=true git replace --edit HEAD
 +'

I think some of these overlap with earlier tests (I know that the
--edit case is checked already), but I think it is nice to keep the
related checks together here.

Should we maybe squash this in to drop the now redundant test (AFAICT,
that is the only one that overlaps)?

diff --git a/t/t6050-replace.sh b/t/t6050-replace.sh
index 98ac9dd..5f96374 100755
--- a/t/t6050-replace.sh
+++ b/t/t6050-replace.sh
@@ -369,9 +369,8 @@ test_expect_success '--edit with and without already replaced object' '
 	git cat-file commit $PARA3 | grep "A fake Thor"
 '
 
-test_expect_success '--edit and change nothing or command failed' '
+test_expect_success '--edit with failed editor' '
 	git replace -d $PARA3 &&
-	test_must_fail env GIT_EDITOR=true git replace --edit $PARA3 &&
 	test_must_fail env GIT_EDITOR="./fakeeditor;false" git replace --edit $PARA3 &&
 	GIT_EDITOR=./fakeeditor git replace --edit $PARA3 &&
 	git replace -l | grep $PARA3 &&


Re: [PATCH v2 1/2] Clean stale environment pointer in finish_command()

2014-11-10 Thread Jeff King
On Mon, Nov 10, 2014 at 01:44:24PM -0800, Junio C Hamano wrote:

 Johannes Schindelin <johannes.schinde...@gmx.de> writes:
 
  In start_command(), unset env fields are initialized via env_array. In
  finish_command(), the env_array is cleared, therefore the env field
  will point to free()d data.
 
  However, start_command() will set env to env_array.argv only if env
  was unset to begin with, and if it was already set, the caller will need
  the original value. Therefore, we need to be very careful only to reset
  env in finish_command() when it has been initialized in start_command().
 
 Hmph.  Does the same observation apply to cmd->argv, which is
 initialized to point to cmd->args.argv only when it is unset?

Yes, they behave exactly the same (I think Dscho just doesn't hit it in
his patch because he assigns argv manually).

I don't have a real problem with going in this direction as a safety
measure, but I am not sure that it is safe to reuse a child_process
after finish_command in general, without an intervening
child_process_init. For instance, calling start_command will convert a
child_process.in value of -1 into a pipe, and overwrite that -1
with the descriptor of the pipe. A subsequent use of the same
child_process struct will ask the second child to use that pipe (the
write-half of the pipe, mind you) as its stdin, which is nonsensical.

So I think you are much better off just using two child_process structs
(or a single one and reinitializing in between calls).

-Peff


Re: [PATCH v2 1/2] Clean stale environment pointer in finish_command()

2014-11-10 Thread Jeff King
On Mon, Nov 10, 2014 at 03:41:09PM +0100, Johannes Schindelin wrote:

  However, start_command() will set env to env_array.argv only if env
  was unset to begin with, and if it was already set, the caller will need
  the original value. Therefore, we need to be very careful only to reset
  env in finish_command() when it has been initialized in start_command().
 
 In case it was unclear: this is needed for the suggested switch from the
 previous method to construct the environment to the new env_array method
 to work.
 
 (The env_array method unfortunately requires the code to initialize the
 environment twice because finish_command() insists on always releasing the
 env_array, even if the caller may want to reuse the generated array).

I don't think this is unfortunately; freeing the memory was the entire
purpose in adding env_array. If you want to easily reuse the same
environment in multiple commands, it is still perfectly fine to use
env directly, like:

  struct argv_array env = ARGV_ARRAY_INIT;
  struct child_process one = CHILD_PROCESS_INIT;
  struct child_process two = CHILD_PROCESS_INIT;

  ... setup env with argv_array_push ...

  one.argv = foo;
  one.env = env.argv;
  run_command(&one);

  two.argv = bar;
  two.env = env.argv;
  run_command(&two);

  argv_array_clear(&env);

You do not get the benefit of the auto-cleanup (you have to call
argv_array_clear yourself), but that is less bad than repeating the
setup of env twice.

It may be that Documentation/technical/api-run-command.txt needs to
make this more explicit (I just read over it and it looks OK to me. But
since I am the one who designed the feature, I am not the best test of
whether it is sufficiently clear).

-Peff


Re: [PATCH 02/15] refs.c: return error instead of dying when locking fails during transaction

2014-11-11 Thread Jeff King
On Tue, Oct 21, 2014 at 01:36:47PM -0700, Ronnie Sahlberg wrote:

 commit e193c10fc4f9274d1e751cfcdcc4507818e8d498 upstream.
 
 Change lock_ref_sha1_basic to return an error instead of dying when
 we fail to lock a file during a transaction.
 This function is only called from transaction_commit() and it knows how
 to handle these failures.
 [...]
 -	else
 -		unable_to_lock_die(ref_file, errno);
 +	else {
 +		struct strbuf err = STRBUF_INIT;
 +		unable_to_lock_message(ref_file, errno, &err);
 +		error("%s", err.buf);
 +		strbuf_reset(&err);
 +		goto error_return;
 +	}

I coincidentally just wrote almost the identical patch, because this
isn't just a cleanup; it fixes a real bug. During pack_refs, we call
prune_ref to lock and delete the loose ref. If the lock fails, that's
OK; that just means somebody else is updating it at the same time, and
we can skip our pruning step. But due to the unable_to_lock_die call
here in lock_ref_sha1_basic, the pack-refs process may die prematurely.

I wonder if it is worth pulling this one out from the rest of the
series, as it has value (and can be applied) on its own. I did some
digging on the history of this, too. Here's the rationale I wrote:


lock_ref_sha1_basic: do not die on locking errors

lock_ref_sha1_basic is inconsistent about when it calls
die() and when it returns NULL to signal an error. This is
annoying to any callers that want to recover from a locking
error.

This seems to be mostly historical accident. It was added in
4bd18c4 (Improve abstraction of ref lock/write.,
2006-05-17), which returned an error in all cases except
calling safe_create_leading_directories, in which case it
died.  Later, 40aaae8 (Better error message when we are
unable to lock the index file, 2006-08-12) asked
hold_lock_file_for_update to die for us, leaving the
resolve_ref code-path the only one which returned NULL.

We tried to correct that in 5cc3cef (lock_ref_sha1(): do not
sometimes error() and sometimes die()., 2006-09-30),
by converting all of the die() calls into returns. But we
missed the die flag passed to the lock code, leaving us
inconsistent. This state persisted until e5c223e
(lock_ref_sha1_basic(): if locking fails with ENOENT, retry,
2014-01-18). Because of its retry scheme, it does not ask
the lock code to die, but instead manually dies with
unable_to_lock_die().

We can make this consistent with the other return paths by
converting this to use unable_to_lock_message(), and
returning NULL. This is safe to do because all callers
already needed to check the return value of the function,
since it could fail (and return NULL) for other reasons.

I also have some other cleanups to lock_ref_sha1_basic's error handling.
I'd be happy to take over this patch and send it along with those
cleanups as a separate series.

-Peff


Re: diff-highlight highlight words?

2014-11-11 Thread Jeff King
[+cc git@vger, since this may be of interest to others]

On Tue, Nov 11, 2014 at 02:40:59PM -0800, Scott Baker wrote:

 I'd like to recreate the github style diffs on the command line. It
 appears that your diff-highlight is very close. The current version only
 allows you to invert the colors which isn't ideal.

Yes, I never built any configurability into the script. However, you can
tweak the definitions at the top to get different effects.
Traditionally, ANSI colors on the terminal only came in two flavors:
normal and bright (which is attached to the bold attribute).
Instead of reversing video, you can switch on brightness like this:

diff --git a/contrib/diff-highlight/diff-highlight b/contrib/diff-highlight/diff-highlight
index c4404d4..c99de99 100755
--- a/contrib/diff-highlight/diff-highlight
+++ b/contrib/diff-highlight/diff-highlight
@@ -5,8 +5,8 @@ use strict;
 
 # Highlight by reversing foreground and background. You could do
 # other things like bold or underline if you prefer.
-my $HIGHLIGHT   = "\x1b[7m";
-my $UNHIGHLIGHT = "\x1b[27m";
+my $HIGHLIGHT   = "\x1b[1m";
+my $UNHIGHLIGHT = "\x1b[22m";
 my $COLOR = qr/\x1b\[[0-9;]*m/;
 my $BORING = qr/$COLOR|\s/;
 

However on most modern terminals the color difference between bright and
normal is very subtle, and this doesn't look good.
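If you want to experiment with the raw sequences yourself, printf is enough (a sketch; \033 is ESC, and how each attribute renders depends entirely on your terminal):

```shell
# SGR attributes: 7/27 toggles reverse video, 1/22 toggles bold/bright.
printf 'plain \033[7mreverse\033[27m \033[1mbright\033[22m\n'
# 256-color and 24-bit forms, for terminals that support them:
printf '\033[38;5;28m256-color green\033[0m \033[38;2;0;255;0mtruecolor green\033[0m\n'
```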

XTerm (and other modern terminals) has 256-color support, so you could
do better there (assuming your terminal supports it). The current code
builds on the existing colors produced by git (because the operations
are only "invert colors" and "uninvert colors"). Doing anything fancier
requires handling add/del differently.  That patch might look something
like this (which uses dark red/green for most of the line, and a much
brighter variant for the highlighted text):

diff --git a/contrib/diff-highlight/diff-highlight b/contrib/diff-highlight/diff-highlight
index c4404d4..4e08f3c 100755
--- a/contrib/diff-highlight/diff-highlight
+++ b/contrib/diff-highlight/diff-highlight
@@ -5,11 +5,16 @@ use strict;
 
 # Highlight by reversing foreground and background. You could do
 # other things like bold or underline if you prefer.
-my $HIGHLIGHT   = "\x1b[7m";
-my $UNHIGHLIGHT = "\x1b[27m";
 my $COLOR = qr/\x1b\[[0-9;]*m/;
 my $BORING = qr/$COLOR|\s/;
 
+# Elements:
+# 0 - highlighted text
+# 1 - unhighlighted text
+# 2 - reset to normal
+my @ADD_HIGHLIGHT = ("\x1b[38;2;100;255;100m", "\x1b[38;2;0;255;0m", "\x1b[30m");
+my @DEL_HIGHLIGHT = ("\x1b[38;2;255;100;100m", "\x1b[38;2;255;0;0m", "\x1b[30m");
+
 my @removed;
 my @added;
 my $in_hunk;
@@ -128,8 +133,8 @@ sub highlight_pair {
}
 
if (is_pair_interesting(\@a, $pa, $sa, \@b, $pb, $sb)) {
-   return highlight_line(\@a, $pa, $sa),
-  highlight_line(\@b, $pb, $sb);
+   return highlight_line(\@a, $pa, $sa, @DEL_HIGHLIGHT),
+  highlight_line(\@b, $pb, $sb, @ADD_HIGHLIGHT);
}
else {
return join('', @a),
@@ -144,15 +149,18 @@ sub split_line {
 }
 
 sub highlight_line {
-   my ($line, $prefix, $suffix) = @_;
+   my ($line, $prefix, $suffix, $highlight, $unhighlight, $reset) = @_;
 
-   return join('',
+   my $r = join('',
+   $unhighlight,
@{$line}[0..($prefix-1)],
-   $HIGHLIGHT,
+   $highlight,
@{$line}[$prefix..$suffix],
-   $UNHIGHLIGHT,
-   @{$line}[($suffix+1)..$#$line]
+   $unhighlight,
+   @{$line}[($suffix+1)..$#$line],
);
+   $r =~ s/\n$/$reset$/;
+   return $r;
 }
 
 # Pairs are interesting to highlight only if we are going to end up


The result does not look terrible to me, though I think I find the
reverse-video more obvious when scanning the diff. To look more like
GitHub's view, you could instead set the background by doing this on
top:

diff --git a/contrib/diff-highlight/diff-highlight b/contrib/diff-highlight/diff-highlight
index 4e08f3c..6f98db4 100755
--- a/contrib/diff-highlight/diff-highlight
+++ b/contrib/diff-highlight/diff-highlight
@@ -12,8 +12,8 @@ my $BORING = qr/$COLOR|\s/;
 # 0 - highlighted text
 # 1 - unhighlighted text
 # 2 - reset to normal
-my @ADD_HIGHLIGHT = ("\x1b[38;2;100;255;100m", "\x1b[38;2;0;255;0m", "\x1b[30m");
-my @DEL_HIGHLIGHT = ("\x1b[38;2;255;100;100m", "\x1b[38;2;255;0;0m", "\x1b[30m");
+my @ADD_HIGHLIGHT = ("\x1b[30;48;2;100;255;100m", "\x1b[30;48;2;0;255;0m", "\x1b[0m");
+my @DEL_HIGHLIGHT = ("\x1b[30;48;2;255;100;100m", "\x1b[30;48;2;255;0;0m", "\x1b[0m");
 
 my @removed;
 my @added;
@@ -151,14 +151,18 @@ sub split_line {
 sub highlight_line {
my ($line, $prefix, $suffix, $highlight, $unhighlight, $reset) = @_;
 
+   # strip out existing colors from git, which will clash
+   # both due to contrast and because of random ANSI resets
+   # inside the content
+   my $p = join('', @{$line}[0..($prefix-1)]);
+   my $t = join('', @{$line}[$prefix..$suffix]);
+   my $s = join('', 

Re: [PATCH v2 1/2] Clean stale environment pointer in finish_command()

2014-11-12 Thread Jeff King
On Wed, Nov 12, 2014 at 11:45:19AM +0100, Johannes Schindelin wrote:

 Okay, I have to say that I was led to believe that reusing the
 child_process struct is okay because argv_array_clear() explicitly
 reinitializes the env_array field, something that is useless churn unless
 you plan to reuse the memory.

The argv_array code prepares its data structure for reuse after freeing.
But child_process which uses it does not make any such promises. If
there were an argv_array_free(), child_process could use it. But there
is only argv_array_clear().

I don't have a problem with finish_command leaving its child_process in
a known usable state (e.g., by calling child_process_init). But I also
don't see much benefit.

 However, my personal taste says that reusing the same memory is more
 elegant than to waste extra memory unnecessarily, so I will go with the
 child_process_init() solution.

I do not mind much either way. But I doubt that a single extra struct on
the stack will break the bank, compared to the fact that we are forking
and execing a new program. I'd also not be surprised if a smart compiler
could notice that the variables are used exclusively in non-overlapping
bits of the code, and just reuse the stack space.

-Peff


Re: [PATCH v2 1/2] Clean stale environment pointer in finish_command()

2014-11-12 Thread Jeff King
On Wed, Nov 12, 2014 at 05:52:29AM -0500, Jeff King wrote:

  However, my personal taste says that reusing the same memory is more
  elegant than to waste extra memory unnecessarily, so I will go with the
  child_process_init() solution.
 
 I do not mind much either way. But I doubt that a single extra struct on
 the stack will break the bank, compared to the fact that we are forking
 and execing a new program. I'd also not be surprised if a smart compiler
 could notice that the variables are used exclusively in non-overlapping
 bits of the code, and just reuse the stack space.

Actually, I take that back. We are passing a pointer to a struct, rather
than by-value, so the compiler cannot know that the sub-function does
not store that pointer in a static variable, and reference it in the
next call. It must use two variables if it cannot see the definition of
run_command.

I still think it's pointless optimization to worry about, and you should
write whichever is the most readable and maintainable.

-Peff


Re: [PATCH] t1410: fix breakage on case-insensitive filesystems

2014-11-12 Thread Jeff King
On Wed, Nov 12, 2014 at 09:20:22PM +0100, Johannes Sixt wrote:

 On 09.11.2014 02:59, Jeff King wrote:
  test_expect_success 'stale dirs do not cause d/f conflicts (reflogs off)' '
 -	test_when_finished "git branch -d a || git branch -d a/b" &&
 +	test_when_finished "git branch -d one || git branch -d one/two" &&
  
 -	git branch a/b master &&
 -	echo "a/b@{0} branch: Created from master" >expect &&
 -	git log -g --format="%gd %gs" a/b >actual &&
 +	git branch one/two master &&
 +	echo "one/two@{0} branch: Created from master" >expect &&
 +	git log -g --format="%gd %gs" one/two >actual &&
  	test_cmp expect actual &&
 -	git branch -d a/b &&
 +	git branch -d one/two &&
  
 -	# same as before, but we only create a reflog for "a" if
 +	# same as before, but we only create a reflog for "one" if
  	# it already exists, which it does not
 -	git -c core.logallrefupdates=false branch a master &&
 +	git -c core.logallrefupdates=false branch one master &&
  	: >expect &&
 -	git log -g --format="%gd %gs" a >actual &&
 +	git log -g --format="%gd %gs" one >actual &&
  	test_cmp expect actual
  '
   '
   
 
 On Linux I observe
 
 Deleted branch one/two (was b60a214).
 warning: unable to unlink .git/logs/refs/heads/one: Is a directory
 Deleted branch one (was b60a214).
 ok 15 - stale dirs do not cause d/f conflicts (reflogs off)
 
 (notice the warning)

Yes, this is expected. The warning actually comes from the "git branch
-d" run during cleanup. Branch "one" exists but has no reflog; instead
there is a crufty directory called logs/refs/heads/one. The deletion
process blindly calls unlink on it and complains when it fails for
reasons other than ENOENT.

We could suppress that warning, but I didn't think it was really worth
the bother. This is such a funny setup (reflogs on, then off, then on,
_and_ a d/f conflict in the branches) that I doubt it will come up much.

I seem to recall that ancient versions of SunOS used to do bad things
when you called unlink on a directory, but I do not know if that is
still the case (I also doubt this is the only place in git where
we blindly do an "unlink if you can" without actually checking the file
type).
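The "blind unlink" pattern can be illustrated outside of git. A minimal Python sketch (the helper name `try_unlink` is made up for the example; the exact errno is platform-dependent, e.g. EPERM instead of EISDIR on some BSDs):

```python
import errno
import os
import tempfile

def try_unlink(path):
    """Call unlink without checking the file type first, and only
    inspect errno on failure -- the "unlink if you can" pattern
    discussed above (illustrative helper, not git's code)."""
    try:
        os.unlink(path)
        return "removed"
    except FileNotFoundError:
        return "missing"                  # ENOENT: nothing to do
    except OSError as e:
        # unlink() on a directory fails with EISDIR on Linux;
        # POSIX also permits EPERM (seen on macOS/BSD)
        return errno.errorcode[e.errno]

d = tempfile.mkdtemp()
print(try_unlink(os.path.join(d, "no-such-file")))   # missing
print(try_unlink(d) in ("EISDIR", "EPERM"))          # True
os.rmdir(d)
```

The point is that the caller never stats the path first; it lets the kernel report the problem and then decides which errors are worth a warning.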

 but on Windows the test fails with
 
 Deleted branch one/two (was b60a214).
 error: Unable to append to .git/logs/refs/heads/one: Permission denied
 fatal: Cannot update the ref 'refs/heads/one'.
 not ok 15 - stale dirs do not cause d/f conflicts (reflogs off)

That looks more like it is failing the actual test (i.e., the creation
of branch one when there is cruft in the reflog). My guess is that
calling open() on a directory is giving us EACCES instead of EISDIR. Can
you verify that?

If that is the case, then this isn't a new breakage, I think, but just
code we weren't previously exercising. It would be interesting to know
whether:

  git config core.logallrefupdates true
  git branch one/two
  git branch -d one/two
  git branch one

works (even without my patch). If so, then there's probably something
else going on.

The relevant bits are in log_ref_setup. We try to open() once, and
accept EISDIR as a hint that we may need to remove an empty directory
and try again. If Windows gives us EACCES, then we may need to have that
trigger the same code.
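The retry shape described here can be sketched as follows. This is a Python illustration under the assumption that EACCES should be treated as "maybe a directory" (as suggested for Windows); `open_logfile` is a hypothetical stand-in, not git's actual log_ref_setup:

```python
import errno
import os
import tempfile

def open_logfile(path):
    """Try to open for append; if a stale *empty* directory is in the
    way, remove it and retry. EACCES is treated like EISDIR because
    Windows reportedly reports the former for this case."""
    try:
        return open(path, "a")
    except OSError as e:
        if e.errno not in (errno.EISDIR, errno.EACCES):
            raise
        os.rmdir(path)   # fails if the dir is non-empty: a real error
        return open(path, "a")

base = tempfile.mkdtemp()
stale = os.path.join(base, "one")
os.mkdir(stale)               # crufty empty directory in the way
log = open_logfile(stale)     # removes the dir, then opens a file
print(os.path.isfile(stale))  # True
log.close()
```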

-Peff


Re: [PATCH] t1410: fix breakage on case-insensitive filesystems

2014-11-13 Thread Jeff King
On Thu, Nov 13, 2014 at 09:50:24AM +0100, Johannes Sixt wrote:

 That looks more like it is failing the actual test (i.e., the creation
 of branch one when there is cruft in the reflog). My guess is that
 calling open() on a directory is giving us EACCES instead of EISDIR. Can
 you verify that?
 
 If that is the case, then this isn't a new breakage, I think, but just
 code we weren't previously exercising. It would be interesting to know
 whether:
 
git config core.logallrefupdates true
git branch one/two
git branch -d one/two
git branch one
 
 works (even without my patch). If so, then there's probably something
 else going on.
 
 Don't know what you mean with my patch (the one I was responding to
 touches only t1410).

The patch you are responding to is a fix-up for 9233887, which tweaked
the code and added those tests in the first place (I doubt it would work
for you, though, as it has a problem on case-insensitive filesystems).

 But the sequence works as expected with a version built
 in September:

Hmph. So that would mean my theory is not right. Or maybe I am not
accounting for something else in my analysis.

I guess it is odd that the test right before the failing one passes (it
is basically that same sequence, with reflogs turned on for both
operations), which implies that we are properly getting EISDIR. The only
difference in the failing test is that reflogs are turned off for the
git branch one operation. But I cannot see why that would be broken if
the other one passes.

I wish it were easy for me to ssh into a Windows VM and run gdb. ;)

-Peff


Re: [PATCH] t5705: Use the correct file:// URL

2014-11-13 Thread Jeff King
On Thu, Nov 13, 2014 at 08:36:07AM +0100, Torsten Bögershausen wrote:

 A URL like "file://." is (no longer) supported by Git:
 Typically there is no host, and RFC1738 says that file:///path
 should be used.
 
 Update t5705 to use a working URL.

Interesting. This looks like it was unintentionally lost in c59ab2e
(connect.c: refactor url parsing, 2013-11-28). Given RFC1738, and that
this is the first notice of it (and that it is not even a real use case,
but something questionable in the test script), it's probably OK to
declare the syntax dead and not treat it like a regression.

-Peff


Re: Bug: git log showing nothing when using --since and --until flags with specific dates

2014-11-13 Thread Jeff King
On Thu, Nov 13, 2014 at 11:27:06AM +1100, Colin Smith wrote:

 Apologies if this has already been raised or PEBCAK, but I've noticed
 a bug where git log with certain date ranges breaks things. It appears
 to be any --since date with a --until date in the future between
 2014-12-01 and 2014-12-09. Dates from 2014-12-10 appear to work, and
 so does the date 2015-12-01.

Ugh. Approxidate strikes again:

  for i in 2014-11-01 2013-12-01 2014-12-01; do
./test-date approxidate $i
  done

produces:

  2014-11-01 -> 2014-11-01 09:35:19 +0000
  2013-12-01 -> 2013-12-01 09:35:19 +0000
  2014-12-01 -> 2014-01-12 09:35:19 +0000

The first two are right, but the third one is not.  It's probably
something simple and stupid.

-Peff


[PATCH 0/2] approxidate and future ISO-like times

2014-11-13 Thread Jeff King
On Thu, Nov 13, 2014 at 04:36:06AM -0500, Jeff King wrote:

 On Thu, Nov 13, 2014 at 11:27:06AM +1100, Colin Smith wrote:
 
  Apologies if this has already been raised or PEBCAK, but I've noticed
  a bug where git log with certain date ranges breaks things. It appears
  to be any --since date with a --until date in the future between
  2014-12-01 and 2014-12-09. Dates from 2014-12-10 appear to work, and
  so does the date 2015-12-01.
 
 Ugh. Approxidate strikes again:
 
   for i in 2014-11-01 2013-12-01 2014-12-01; do
 ./test-date approxidate $i
   done
 
 produces:
 
    2014-11-01 -> 2014-11-01 09:35:19 +0000
    2013-12-01 -> 2013-12-01 09:35:19 +0000
    2014-12-01 -> 2014-01-12 09:35:19 +0000
 
  The first two are right, but the third one is not.  It's probably
  something simple and stupid.

Less simple and stupid than I thought, but I think I have a fix. It is
not about December specifically, but about the date being in the future.
The first patch is a cleanup to help us more accurately test the bug;
the interesting bits are in the second one.

  [1/2]: pass TIME_DATE_NOW to approxidate future-check
  [2/2]: approxidate: allow ISO-like dates far in the future
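The before/after behavior can be modeled in miniature. The following toy resolver illustrates the ordering rules these patches are about; `guess_date` and its parameters are invented for the example and are not git's parser:

```python
from datetime import date, timedelta

def guess_date(a, b, c, today, allow_future_iso=True):
    """Toy model: three numbers that might be a date. A first number
    greater than 70 must be a year; otherwise try day-first, then
    month-first orderings."""
    def plausible(y, m, d, allow_future):
        try:
            cand = date(y, m, d)
        except ValueError:
            return None
        # the "refuse dates more than ten days in the future" rule
        if not allow_future and cand > today + timedelta(days=10):
            return None
        return cand

    if a > 70:
        # year first: yyyy-mm-dd, then yyyy-dd-mm; with the fix, these
        # ISO-like forms are allowed to point into the future
        return (plausible(a, b, c, allow_future_iso)
                or plausible(a, c, b, allow_future_iso))
    # otherwise dd/mm/yy[yy] then mm/dd/yy[yy]; future still refused
    year = c + 2000 if c < 100 else c
    return plausible(year, b, a, False) or plausible(year, a, b, False)

today = date(2014, 11, 13)
print(guess_date(2014, 12, 1, today))                          # 2014-12-01 (fixed)
print(guess_date(2014, 12, 1, today, allow_future_iso=False))  # 2014-01-12 (the old flip)
```

With the future-check disabled for year-first forms, the reported "2014-12-01 becomes 2014-01-12" misparse disappears, while dd/mm vs. mm/dd disambiguation keeps its 10-day rule.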

-Peff


[PATCH 1/2] pass TIME_DATE_NOW to approxidate future-check

2014-11-13 Thread Jeff King
The approxidate functions accept an extra now parameter to
avoid calling time() themselves. We use this in our test
suite to make sure we have a consistent time for computing
relative dates. However, deep in the bowels of approxidate,
we also call time() to check whether possible dates are far
in the future. Let's make sure that the now override makes
it to that spot, too, so we can consistently test that
feature.

Signed-off-by: Jeff King p...@peff.net
---
This code path also gets called from the more-strict parse_date (which
does not know about the current time). This continues to call time()
dynamically, but it is not clear to me if that ever actually matters in
practice.

 date.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/date.c b/date.c
index 59dfe57..e1a4d49 100644
--- a/date.c
+++ b/date.c
@@ -405,9 +405,9 @@ static int is_date(int year, int month, int day, struct tm *now_tm, time_t now,
 	return 0;
 }
 
-static int match_multi_number(unsigned long num, char c, const char *date, char *end, struct tm *tm)
+static int match_multi_number(unsigned long num, char c, const char *date,
+			      char *end, struct tm *tm, time_t now)
 {
-	time_t now;
 	struct tm now_tm;
 	struct tm *refuse_future;
 	long num2, num3;
@@ -433,7 +433,8 @@ static int match_multi_number(unsigned long num, char c, const char *date, char
 	case '-':
 	case '/':
 	case '.':
-		now = time(NULL);
+		if (!now)
+			now = time(NULL);
 		refuse_future = NULL;
 		if (gmtime_r(&now, &now_tm))
 			refuse_future = &now_tm;
@@ -513,7 +514,7 @@ static int match_digit(const char *date, struct tm *tm, int *offset, int *tm_gmt
 	case '/':
 	case '-':
 		if (isdigit(end[1])) {
-			int match = match_multi_number(num, *end, date, end, tm);
+			int match = match_multi_number(num, *end, date, end, tm, 0);
 			if (match)
 				return match;
 		}
@@ -1013,7 +1014,8 @@ static const char *approxidate_alpha(const char *date, struct tm *tm, struct tm
 	return end;
 }
 
-static const char *approxidate_digit(const char *date, struct tm *tm, int *num)
+static const char *approxidate_digit(const char *date, struct tm *tm, int *num,
+				     time_t now)
 {
 	char *end;
 	unsigned long number = strtoul(date, &end, 10);
@@ -1024,7 +1026,8 @@ static const char *approxidate_digit(const char *date, struct tm *tm, int *num)
 	case '/':
 	case '-':
 		if (isdigit(end[1])) {
-			int match = match_multi_number(number, *end, date, end, tm);
+			int match = match_multi_number(number, *end, date, end,
+						       tm, now);
 			if (match)
 				return date + match;
 		}
@@ -1087,7 +1090,7 @@ static unsigned long approxidate_str(const char *date,
 		date++;
 		if (isdigit(c)) {
 			pending_number(tm, &number);
-			date = approxidate_digit(date-1, tm, &number);
+			date = approxidate_digit(date-1, tm, &number, time_sec);
 			touched = 1;
 			continue;
 		}
-- 
2.1.2.596.g7379948



[PATCH 2/2] approxidate: allow ISO-like dates far in the future

2014-11-13 Thread Jeff King
When we are parsing approxidate strings and we find three
numbers separated by one of ":/-.", we guess that it may be a
date. We feed the numbers to match_multi_number, which
checks whether it makes sense as a date in various orderings
(e.g., dd/mm/yy or mm/dd/yy, etc).

One of the checks we do is to see whether it is a date more
than 10 days in the future. This was added in 38035cf (date
parsing: be friendlier to our European friends.,
2006-04-05), and lets us guess that if it is currently April
2014, then 10/03/2014 is probably March 10th, not October
3rd.

This has a downside, though; if you want to be overly
generous with your "--until" date specification, we may
wrongly parse "2014-12-01" as "2014-01-12" (because the
latter is an in-the-past date). If the year is a future year
(i.e., both are future dates), it gets even weirder. Due to
the vagaries of approxidate, months _after_ the current date
(no matter the year) get flipped, but ones before do not.

This patch drops the "in the future" check for dates of this
form, letting us treat them always as yyyy-mm-dd, even if
they are in the future. This does not affect the normal
dd/mm/yyyy versus mm/dd/yyyy lookup, because this code path
only kicks in when the first number is greater than 70
(i.e., it must be a year, and cannot be either a date or a
month).

The one possible casualty is that yyyy-dd-mm is less
likely to be chosen over yyyy-mm-dd. That's probably OK,
though, because:

  1. The difference happens only when the date is in the
     future. Already we prefer yyyy-mm-dd for dates in the
     past.

  2. It's unclear whether anybody even uses yyyy-dd-mm
     regularly. It does not appear in lists of common date
     formats in Wikipedia[1,2].

  3. Even if (2) is wrong, it is better to prefer ISO-like
     dates, as that is consistent with what we use elsewhere
     in git.

[1] http://en.wikipedia.org/wiki/Date_and_time_representation_by_country
[2] http://en.wikipedia.org/wiki/Calendar_date

Reported-by: Colin Smith colin.web...@gmail.com
Signed-off-by: Jeff King p...@peff.net
---
I did not single out yyyy-mm-dd here versus yyyy/mm/dd; the change
applies no matter which separator is used. If we wanted to be more
conservative, we could apply it only to the more ISO-looking form with
dashes, but I tend to think the reasoning makes sense no matter which
separator you use (i.e., I do not think any variant with year then day
is in common use).

 date.c          | 29 ++++++++++++++++-------------
 t/t0006-date.sh |  3 +++
 2 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/date.c b/date.c
index e1a4d49..b9eecfb 100644
--- a/date.c
+++ b/date.c
@@ -363,7 +363,8 @@ static int match_alpha(const char *date, struct tm *tm, int *offset)
 	return skip_alpha(date);
 }
 
-static int is_date(int year, int month, int day, struct tm *now_tm, time_t now, struct tm *tm)
+static int is_date(int year, int month, int day, struct tm *now_tm, time_t now,
+		   struct tm *tm, int allow_future)
 {
 	if (month > 0 && month < 13 && day > 0 && day < 32) {
 		struct tm check = *tm;
@@ -388,14 +389,16 @@ static int is_date(int year, int month, int day, struct tm *now_tm, time_t now,
 		if (!now_tm)
 			return 1;
 
-		specified = tm_to_time_t(r);
+		if (!allow_future) {
+			specified = tm_to_time_t(r);
 
-		/* Be it commit time or author time, it does not make
-		 * sense to specify timestamp way into the future.  Make
-		 * sure it is not later than ten days from now...
-		 */
-		if ((specified != -1) && (now + 10*24*3600 < specified))
-			return 0;
+			/* Be it commit time or author time, it does not make
+			 * sense to specify timestamp way into the future.  Make
+			 * sure it is not later than ten days from now...
+			 */
+			if ((specified != -1) && (now + 10*24*3600 < specified))
+				return 0;
+		}
 		tm->tm_mon = r->tm_mon;
 		tm->tm_mday = r->tm_mday;
 		if (year != -1)
@@ -441,10 +444,10 @@ static int match_multi_number(unsigned long num, char c, const char *date,
 
 	if (num > 70) {
 		/* yyyy-mm-dd? */
-		if (is_date(num, num2, num3, refuse_future, now, tm))
+		if (is_date(num, num2, num3, refuse_future, now, tm, 1))
 			break;
 		/* yyyy-dd-mm? */
-		if (is_date(num, num3, num2, refuse_future, now, tm))
+		if (is_date(num, num3, num2, refuse_future, now, tm, 1))
 			break;
 	}
 	/* Our eastern European friends say dd.mm.yy[yy]
@@ -452,14 +455,14 @@ static int match_multi_number

Re: t9902-completion.sh failed

2014-11-13 Thread Jeff King
On Thu, Nov 13, 2014 at 04:59:12PM +0600, Alex Kuleshov wrote:

 i just got git from master (f6f61cbbad0611e03b712cc354f1665b5d7b087e),
 built and installed it successfully, now i'm running make test and got
 following error:
 
 *** t9902-completion.sh ***
 t9902-completion.sh: 118:
 /home/shk/dev/git/t/../contrib/completion/git-completion.bash: Syntax
 error: "(" unexpected (expecting "fi")
 FATAL: Unexpected exit with code 2
 make[2]: *** [t9902-completion.sh] Error 1
 make[2]: Leaving directory `/home/shk/dev/git/t'
 make[1]: *** [test] Error 2
 make[1]: Leaving directory `/home/shk/dev/git/t'
 make: *** [test] Error 2
 
 $ bash --version
 4.3.11(1)-release (x86_64-pc-linux-gnu)

Weird. I can't reproduce here, using the version of bash from Debian
unstable (4.3.30(1)), nor compiling 4.3.11 from sources. What platform
are you on (i.e., might it be bash + some other patches installed by the
distro)?

-Peff


Re: t9902-completion.sh failed

2014-11-13 Thread Jeff King
On Thu, Nov 13, 2014 at 05:34:50PM +0600, Alex Kuleshov wrote:

 I'm using Ubuntu 14.04 x86_64 and GNU bash, version
 4.3.11(1)-release (x86_64-pc-linux-gnu).
 
 I didn't apply any patches to bash since I installed the
 system, so it's really weird.

I tried on a fresh install of Ubuntu 14.04.1, with the same version of
bash, and I still can't reproduce the problem. I'm not sure what else to
check.

-Peff


Re: [RFC] git checkout $tree -- $path always rewrites files

2014-11-13 Thread Jeff King
On Sun, Nov 09, 2014 at 09:21:49AM -0800, Junio C Hamano wrote:

 Jeff King p...@peff.net writes:
 
  On Fri, Nov 07, 2014 at 11:35:59PM -0800, Junio C Hamano wrote:
 
  I think that has direct linkage; what you have in mind I think is
  http://thread.gmane.org/gmane.comp.version-control.git/234903/focus=234935
 
  Thanks for that link.
 
 It was one of the items in the git blame leftover bits list
 (websearch for that exact phrase), so I didn't have to do any
 digging just for this thread ;-)
 
 But I made a huge typo above.  s/I think/I do not think/;

Oh. That might explain some of my confusion. :)

 The original observation you made in this thread is that when git
 checkout $tree -- $pathspec, whose definition is to grab paths in
 the $tree that match $pathspec and put them in the index, and then
 overwrite the working tree paths with the contents of these paths
 that are updated in the index with the contents taken from the
 $tree, unnecessarily rewrites the working tree paths even when they
 already had contents that were up to date.  That is what I called an
 implementation glitch.
 
 The old thread is a different topic.
 [...]

Right, I do agree that these things don't need to be linked. The reason
I ended up dealing with the deletion thing is that one obvious way to
implement do not touch entries which have not changed is by running a
diff between $tree and the index. But doing a diff means we have to
reconsider all of the code surrounding deletions (and handling of
unmerged entries in the pathspec but not in $tree). I tried a few
variants and had trouble making it work (getting caught up either in the
"make sure each pathspec matched" code, or the "treat unmerged entries
specially" behavior).

In the patch below, I ended up retaining the existing
read_tree_recursive code, and just specially handling the replacement of
index entries (which is itself sort of a diff, but it fits nicely into
the existing scheme).

 I'd prefer that these two to be treated separately.

Yeah, that makes sense after reading your emails. What I was really
unclear on was whether the handling of deletion was a bug or a design
choice, and it is the latter (if it were the former, we would not need a
transition plan :) ).

Anyway, here is the patch.

-- >8 --
Subject: checkout $tree: do not throw away unchanged index entries

When we git checkout $tree, we pull paths from $tree into
the index, and then check the resulting entries out to the
worktree. Our method for the first step is rather
heavy-handed, though; it clobbers the entire existing index
entry, even if the content is the same. This means we lose
our stat information, leading checkout_entry to later
rewrite the entire file with identical content.

Instead, let's see if we have the identical entry already in
the index, in which case we leave it in place. That lets
checkout_entry do the right thing. Our tests cover two
interesting cases:

  1. We make sure that a file which has no changes is not
 rewritten.

  2. We make sure that we do update a file that is unchanged
 in the index (versus $tree), but has working tree
 changes. We keep the old index entry, and
 checkout_entry is able to realize that our stat
 information is out of date.

Signed-off-by: Jeff King p...@peff.net
---
Note that the test refreshes the index manually (because we are tweaking
the timestamp of file2). In normal use this should not be necessary
(i.e., your entries should generally be uptodate). I did wonder if
checkout should be refreshing the index itself, but it would add a bunch
of extra lstats in the common case.

 builtin/checkout.c        | 31 +++++++++++++++++++++++++------
 t/t2022-checkout-paths.sh | 17 +++++++++++++++++
 2 files changed, 42 insertions(+), 6 deletions(-)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index 5410dac..67cab4e 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -65,21 +65,40 @@ static int post_checkout_hook(struct commit *old, struct commit *new,
 static int update_some(const unsigned char *sha1, const char *base, int baselen,
 		const char *pathname, unsigned mode, int stage, void *context)
 {
-	int len;
+	struct strbuf buf = STRBUF_INIT;
 	struct cache_entry *ce;
+	int pos;
 
 	if (S_ISDIR(mode))
 		return READ_TREE_RECURSIVE;
 
-	len = baselen + strlen(pathname);
-	ce = xcalloc(1, cache_entry_size(len));
+	strbuf_add(&buf, base, baselen);
+	strbuf_addstr(&buf, pathname);
+
+	/*
+	 * If the entry is the same as the current index, we can leave the old
+	 * entry in place. Whether it is UPTODATE or not, checkout_entry will
+	 * do the right thing.
+	 */
+	pos = cache_name_pos(buf.buf, buf.len);
+	if (pos >= 0) {
+		struct cache_entry *old = active_cache[pos];
+		if (create_ce_mode(mode) == old->ce_mode &&
+		    !hashcmp(old->sha1, sha1)) {
+			old->ce_flags

Re: [PATCH 3/4] cat-file: add --batch-disk-sizes option

2013-07-10 Thread Jeff King
On Sun, Jul 07, 2013 at 10:49:46AM -0700, Junio C Hamano wrote:

 Jeff King p...@peff.net writes:
 
  Perhaps we need
 
git cat-file --batch-format=%(disk-size) %(object)
 
  or similar.
 
 I agree with your reasoning.  It may be simpler to give an interface
 to ask for which pieces of info, e.g. --batch-cols=size,disksize,
 without giving the readers a flexible format.

I started on this, and it turned out not to really be any simpler. In
particular there is the question of whether

  git cat-file --batch-cols=size,type

is different from

  git cat-file --batch-cols=type,size

If so, then you are basically implementing the equivalent of a macro
format anyway (you have to parse it left to right to know the order).
And if not, you end up translating the column list into a bit-field, and
the boilerplate for adding a new item is about the same as for a macro
format.

So I went ahead with the full formats for my re-roll. It turned out
pretty reasonable, I think.
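Whichever option syntax wins, the core of a %(atom) formatter is a single left-to-right scan over the format string. A minimal sketch of the idea in Python (illustrative only, not cat-file's actual parser):

```python
import re

def expand(fmt, values):
    """Expand %(name) atoms left to right, looking each one up in a
    mapping. Unknown atoms are an error, mirroring how a real parser
    would reject a bad format up front."""
    def repl(m):
        atom = m.group(1)
        if atom not in values:
            raise KeyError("unknown atom: %%(%s)" % atom)
        return str(values[atom])
    return re.sub(r"%\(([^)]+)\)", repl, fmt)

print(expand("%(objectname) %(objectsize:disk)",
             {"objectname": "abc123", "objectsize:disk": 140}))
# abc123 140
```

Because the scan is ordered, a format string trivially expresses column order, which is exactly the property a comma-separated column list would have to reinvent.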

-Peff


Re: [PATCH 3/4] cat-file: add --batch-disk-sizes option

2013-07-10 Thread Jeff King
On Mon, Jul 08, 2013 at 07:07:01PM +0530, Ramkumar Ramachandra wrote:

  There's also syntax sharing. I don't think each command should have
  its own syntax. f-e-r already has %(objectsize). If we plan to have a
  common syntax, perhaps %(disk-size) should be %(objectsize:disk) or
  something.
 
 Ofcourse.  I didn't notice %(objectsize); %(objectsize[:disk]) is a
 fine suggestion.
 
  Adding formatting to cat-file --batch from scratch could be
  another big chunk of code (that also comes with bugs, usually) and may
  or may not be compatible with the common syntax because of some
  oversight.
 
 Oh, I'm proposing that Peff implements just %H and
 %(objectsize[:disk]) for _now_, because that's what he wants.  It
 should be a tiny 20-line parser that's easy to swap out.

I went with %(objectname), %(objectsize), and %(objectsize:disk). The
former match for-each-ref, and the latter extends it in a natural-ish
way (though it is arguable whether "foo:modifier" should mean "do
something with the foo value" or "this is like foo, but not quite"). In
the long run, I would like to see long names for each atom, with short
aliases (so %H and %h for %(objectname) and %(objectname:short),
available everywhere).

But I think it is OK to start without the aliases, and then pick them up
as the implementations and interfaces are unified. IOW, it is OK to say
"cat-file has not learned about %H yet" and later add it; we cannot
teach it %H now and then change our minds later.

  --batch-cols=... or --batch-disk-size would be simpler, but
  we might never be able to remove that code.
 
 Agreed.  The approach paints us into a design-corner, and must
 therefore be avoided.

I would say it is worth one of the other routes if it turned out to be
dramatically simpler. But having just done the work to add formats for
cat-file, it is really not too bad. It lacks some of the niceties of the
other formatters (e.g., colors), but again, we can always add them in
later as the implementations unify.

-Peff


[PATCHv2 00/10] cat-file formats/on-disk sizes

2013-07-10 Thread Jeff King
Here's a re-roll of the cat-file --batch-disk-sizes series.

The main change is that it replaces the --batch-disk-sizes option with a
format string for --batch-check, syntactically modeled after the
for-each-ref format string.

  [01/10]: zero-initialize object_info structs
  [02/10]: teach sha1_object_info_extended a disk_size query

These two are the same as before.

  [03/10]: t1006: modernize output comparisons

New. Cleanup since I add some related tests later on.

  [04/10]: cat-file: teach --batch to stream blob objects

New. I think this is a sane thing to be doing, and it helps reorganize
the code for later changes. But note the performance caveat in the
commit message.

  [05/10]: cat-file: refactor --batch option parsing
  [06/10]: cat-file: add --batch-check=format

These ones add the formatting code.

  [07/10]: cat-file: add %(objectsize:disk) format atom

And this is the format analog of my original 3/4.

  [08/10]: cat-file: split --batch input lines on whitespace

New. This makes certain types of analysis simpler when you pipe
rev-list --objects into cat-file --batch-check, because you can
retain the object paths through the pipe.

  [09/10]: pack-revindex: use unsigned to store number of objects

New. Addresses the 2^31 bug that Brandon noticed.

  [10/10]: pack-revindex: radix-sort the revindex

From the old 4/4, but with cleanups and addressing comments from the
list (details along with the patch).
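The technique named in 10/10 is a least-significant-digit radix sort keyed on pack offsets. A sketch of that general technique (the function and its parameters are illustrative, not the patch's code):

```python
def radix_sort_offsets(entries, bits=16):
    """Stable LSB radix sort of (offset, payload) pairs: distribute
    into 2**bits buckets by successive 16-bit digits of the offset,
    stopping once the remaining high bits are all zero."""
    mask = (1 << bits) - 1
    shift = 0
    while any(e[0] >> shift for e in entries):
        buckets = [[] for _ in range(mask + 1)]
        for e in entries:
            buckets[(e[0] >> shift) & mask].append(e)
        entries = [e for b in buckets for e in b]
        shift += bits
    return entries

offs = [(300, "c"), (0, "a"), (70000, "d"), (100, "b")]
print(radix_sort_offsets(offs))
# [(0, 'a'), (100, 'b'), (300, 'c'), (70000, 'd')]
```

The appeal over a comparison sort is that the number of passes depends on the magnitude of the largest offset, not on n log n comparisons.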

-Peff


[PATCH 01/10] zero-initialize object_info structs

2013-07-10 Thread Jeff King
The sha1_object_info_extended function expects the caller to
provide a struct object_info which contains pointers to
query items that will be filled in. The purpose of
providing pointers rather than storing the response directly
in the struct is so that callers can choose not to incur the
expense in finding particular fields that they do not care
about.

Right now the only query item is sizep, and all callers
set it explicitly to choose whether or not to query it; they
can then leave the rest of the struct uninitialized.

However, as we add new query items, each caller will have to
be updated to explicitly turn off the new ones (by setting
them to NULL).  Instead, let's teach each caller to
zero-initialize the struct, so that they do not have to
learn about each new query item added.
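The request-pointer pattern the commit message describes can be modeled compactly. A toy Python analogue (out-parameters modeled as one-element lists; purely illustrative, since git's real struct is C):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ObjectInfo:
    """Request/response struct in the style described above: callers
    set only the query slots they care about; zero/None-initialized
    slots are simply not filled in."""
    sizep: Optional[list] = None
    disk_sizep: Optional[list] = None

def object_info(oi, size, disk_size):
    # fill in only the fields the caller asked for
    if oi.sizep is not None:
        oi.sizep[0] = size
    if oi.disk_sizep is not None:
        oi.disk_sizep[0] = disk_size

oi = ObjectInfo(sizep=[0])   # new query items default to None: no opt-out needed
object_info(oi, 123, 456)
print(oi.sizep[0], oi.disk_sizep)
# 123 None
```

The zero-initialization in the patch buys exactly this: adding `disk_sizep` later costs existing callers nothing.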

Signed-off-by: Jeff King p...@peff.net
---
 sha1_file.c | 2 +-
 streaming.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 0af19c0..de06a97 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2428,7 +2428,7 @@ int sha1_object_info(const unsigned char *sha1, unsigned long *sizep)
 
 int sha1_object_info(const unsigned char *sha1, unsigned long *sizep)
 {
-	struct object_info oi;
+	struct object_info oi = {0};
 
 	oi.sizep = sizep;
 	return sha1_object_info_extended(sha1, &oi);
diff --git a/streaming.c b/streaming.c
index cabcd9d..cac282f 100644
--- a/streaming.c
+++ b/streaming.c
@@ -135,7 +135,7 @@ struct git_istream *open_istream(const unsigned char *sha1,
 					 struct stream_filter *filter)
 {
 	struct git_istream *st;
-	struct object_info oi;
+	struct object_info oi = {0};
 	const unsigned char *real = lookup_replace_object(sha1);
 	enum input_source src = istream_source(real, type, &oi);
-- 
1.8.3.rc3.24.gec82cb9



[PATCH 02/10] teach sha1_object_info_extended a disk_size query

2013-07-10 Thread Jeff King
Using sha1_object_info_extended, a caller can find out the
type of an object, its size, and information about where it
is stored. In addition to the object's true size, it can
also be useful to know the size that the object takes on
disk (e.g., to generate statistics about which refs consume
space).

This patch adds a disk_sizep field to struct object_info,
and fills it in during sha1_object_info_extended if it is
non-NULL.
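For packed objects, the patch leans on the pack reverse index: with offsets sorted, an object's on-disk size is just the gap to the next object's offset. A toy model of that calculation (assuming a flat list of offsets and a known end of the packed data; not git's code):

```python
def disk_sizes(offsets, pack_end):
    """Map each object offset to its on-disk size: the distance to the
    next offset in sorted order, or to the end of the pack data for
    the last object."""
    s = sorted(offsets) + [pack_end]
    return {off: s[i + 1] - off for i, off in enumerate(s[:-1])}

print(disk_sizes([0, 100, 250], 300))
# {0: 100, 100: 150, 250: 50}
```

This is why the revindex lookup in the diff can compute `revidx[1].offset - obj_offset` without ever inflating the object.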

Signed-off-by: Jeff King p...@peff.net
---
 cache.h     |  1 +
 sha1_file.c | 20 ++++++++++++++++----
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/cache.h b/cache.h
index dd0fb33..2d06169 100644
--- a/cache.h
+++ b/cache.h
@@ -1130,6 +1130,7 @@ struct object_info {
 struct object_info {
 	/* Request */
 	unsigned long *sizep;
+	unsigned long *disk_sizep;
 
 	/* Response */
 	enum {
diff --git a/sha1_file.c b/sha1_file.c
index de06a97..4c2365f 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1697,7 +1697,8 @@ static int packed_object_info(struct packed_git *p, off_t obj_offset,
 #define POI_STACK_PREALLOC 64
 
 static int packed_object_info(struct packed_git *p, off_t obj_offset,
-			      unsigned long *sizep, int *rtype)
+			      unsigned long *sizep, int *rtype,
+			      unsigned long *disk_sizep)
 {
 	struct pack_window *w_curs = NULL;
 	unsigned long size;
@@ -1731,6 +1732,11 @@ static int packed_object_info(struct packed_git *p, off_t obj_offset,
 		}
 	}
 
+	if (disk_sizep) {
+		struct revindex_entry *revidx = find_pack_revindex(p, obj_offset);
+		*disk_sizep = revidx[1].offset - obj_offset;
+	}
+
 	while (type == OBJ_OFS_DELTA || type == OBJ_REF_DELTA) {
 		off_t base_offset;
 		/* Push the object we're going to leave behind */
@@ -2357,7 +2363,8 @@ struct packed_git *find_sha1_pack(const unsigned char *sha1,
 
 }
 
-static int sha1_loose_object_info(const unsigned char *sha1, unsigned long *sizep)
+static int sha1_loose_object_info(const unsigned char *sha1, unsigned long *sizep,
+				  unsigned long *disk_sizep)
 {
 	int status;
 	unsigned long mapsize, size;
@@ -2368,6 +2375,8 @@ static int sha1_loose_object_info(const unsigned char *sha1, unsigned long *size
 	map = map_sha1_file(sha1, &mapsize);
 	if (!map)
 		return -1;
+	if (disk_sizep)
+		*disk_sizep = mapsize;
 	if (unpack_sha1_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0)
 		status = error("unable to unpack %s header",
 			       sha1_to_hex(sha1));
@@ -2391,13 +2400,15 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi)
 	if (co) {
 		if (oi->sizep)
 			*(oi->sizep) = co->size;
+		if (oi->disk_sizep)
+			*(oi->disk_sizep) = 0;
 		oi->whence = OI_CACHED;
 		return co->type;
 	}
 
 	if (!find_pack_entry(sha1, &e)) {
 		/* Most likely it's a loose object. */
-		status = sha1_loose_object_info(sha1, oi->sizep);
+		status = sha1_loose_object_info(sha1, oi->sizep, oi->disk_sizep);
 		if (status >= 0) {
 			oi->whence = OI_LOOSE;
 			return status;
@@ -2409,7 +2420,8 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi)
 		return status;
 	}
 
-	status = packed_object_info(e.p, e.offset, oi->sizep, &rtype);
+	status = packed_object_info(e.p, e.offset, oi->sizep, &rtype,
+				    oi->disk_sizep);
 	if (status < 0) {
 		mark_bad_packed_object(e.p, sha1);
 		status = sha1_object_info_extended(sha1, oi);
-- 
1.8.3.rc3.24.gec82cb9



[PATCH 03/10] t1006: modernize output comparisons

2013-07-10 Thread Jeff King
In modern tests, we typically put output into a file and
compare it with test_cmp. This is nicer than just comparing
via test, and much shorter than comparing via test and
printing a custom message.

Signed-off-by: Jeff King p...@peff.net
---
I didn't do the whole file, just the ones of a particular style close to
what I was touching.

 t/t1006-cat-file.sh | 61 -
 1 file changed, 18 insertions(+), 43 deletions(-)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 9cc5c6b..c2f2503 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -36,66 +36,41 @@ $content
 '
 
 test_expect_success "Type of $type is correct" '
-    test $type = "$(git cat-file -t $sha1)"
+	echo $type >expect &&
+	git cat-file -t $sha1 >actual &&
+	test_cmp expect actual
 '
 
 test_expect_success "Size of $type is correct" '
-    test $size = "$(git cat-file -s $sha1)"
+	echo $size >expect &&
+	git cat-file -s $sha1 >actual &&
+	test_cmp expect actual
 '
 
 test -z "$content" ||
 test_expect_success "Content of $type is correct" '
-	expect="$(maybe_remove_timestamp "$content" $no_ts)"
-	actual="$(maybe_remove_timestamp "$(git cat-file $type $sha1)" $no_ts)"
-
-    if test "z$expect" = "z$actual"
-	then
-		: happy
-	else
-		echo "Oops: expected $expect"
-		echo "but got $actual"
-		false
-    fi
+	maybe_remove_timestamp "$content" $no_ts >expect &&
+	maybe_remove_timestamp "$(git cat-file $type $sha1)" $no_ts >actual &&
+	test_cmp expect actual
 '
 
 test_expect_success "Pretty content of $type is correct" '
-	expect="$(maybe_remove_timestamp "$pretty_content" $no_ts)"
-	actual="$(maybe_remove_timestamp "$(git cat-file -p $sha1)" $no_ts)"
-    if test "z$expect" = "z$actual"
-	then
-		: happy
-	else
-		echo "Oops: expected $expect"
-		echo "but got $actual"
-		false
-    fi
+	maybe_remove_timestamp "$pretty_content" $no_ts >expect &&
+	maybe_remove_timestamp "$(git cat-file -p $sha1)" $no_ts >actual &&
+	test_cmp expect actual
 '
 
 test -z "$content" ||
 test_expect_success "--batch output of $type is correct" '
-	expect="$(maybe_remove_timestamp "$batch_output" $no_ts)"
-	actual="$(maybe_remove_timestamp "$(echo $sha1 | git cat-file --batch)" $no_ts)"
-    if test "z$expect" = "z$actual"
-	then
-		: happy
-	else
-		echo "Oops: expected $expect"
-		echo "but got $actual"
-		false
-    fi
+	maybe_remove_timestamp "$batch_output" $no_ts >expect &&
+	maybe_remove_timestamp "$(echo $sha1 | git cat-file --batch)" $no_ts >actual &&
+	test_cmp expect actual
 '
 
 test_expect_success "--batch-check output of $type is correct" '
-	expect="$sha1 $type $size"
-	actual="$(echo_without_newline $sha1 | git cat-file --batch-check)"
-    if test "z$expect" = "z$actual"
-	then
-		: happy
-	else
-		echo "Oops: expected $expect"
-		echo "but got $actual"
-		false
-    fi
+	echo "$sha1 $type $size" >expect &&
+	echo_without_newline $sha1 | git cat-file --batch-check >actual &&
+	test_cmp expect actual
 '
 }
 
-- 
1.8.3.rc3.24.gec82cb9



[PATCH 05/10] cat-file: refactor --batch option parsing

2013-07-10 Thread Jeff King
We currently use an int to tell us whether --batch parsing
is on, and if so, whether we should print the full object
contents. Let's instead factor this into a struct, filled in
by callback, which will make further batch-related options
easy to add.

Signed-off-by: Jeff King p...@peff.net
---
 builtin/cat-file.c | 56 --
 1 file changed, 38 insertions(+), 18 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 70dd8c8..5254fe8 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -13,9 +13,6 @@
 #include "userdiff.h"
 #include "streaming.h"
 
-#define BATCH 1
-#define BATCH_CHECK 2
-
 static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
 {
unsigned char sha1[20];
@@ -142,7 +139,12 @@ static void print_object_or_die(int fd, const unsigned char *sha1,
}
 }
 
-static int batch_one_object(const char *obj_name, int print_contents)
+struct batch_options {
+   int enabled;
+   int print_contents;
+};
+
+static int batch_one_object(const char *obj_name, struct batch_options *opt)
 {
unsigned char sha1[20];
enum object_type type = 0;
@@ -167,19 +169,19 @@ static int batch_objects(int print_contents)
 	printf("%s %s %lu\n", sha1_to_hex(sha1), typename(type), size);
 	fflush(stdout);
 
-	if (print_contents == BATCH) {
+	if (opt->print_contents) {
 		print_object_or_die(1, sha1, type, size);
 		write_or_die(1, "\n", 1);
 	}
return 0;
 }
 
-static int batch_objects(int print_contents)
+static int batch_objects(struct batch_options *opt)
 {
struct strbuf buf = STRBUF_INIT;
 
 	while (strbuf_getline(&buf, stdin, '\n') != EOF) {
-   int error = batch_one_object(buf.buf, print_contents);
+   int error = batch_one_object(buf.buf, opt);
if (error)
return error;
}
@@ -201,10 +203,28 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 	return git_default_config(var, value, cb);
 }
 
+static int batch_option_callback(const struct option *opt,
+const char *arg,
+int unset)
+{
+	struct batch_options *bo = opt->value;
+
+   if (unset) {
+   memset(bo, 0, sizeof(*bo));
+   return 0;
+   }
+
+	bo->enabled = 1;
+	bo->print_contents = !strcmp(opt->long_name, "batch");
+
+   return 0;
+}
+
 int cmd_cat_file(int argc, const char **argv, const char *prefix)
 {
-   int opt = 0, batch = 0;
+   int opt = 0;
const char *exp_type = NULL, *obj_name = NULL;
+   struct batch_options batch = {0};
 
 	const struct option options[] = {
 		OPT_GROUP(N_("<type> can be one of: blob, tree, commit, tag")),
@@ -215,12 +235,12 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 		OPT_SET_INT('p', NULL, &opt, N_("pretty-print object's content"), 'p'),
 		OPT_SET_INT(0, "textconv", &opt,
 			    N_("for blob objects, run textconv on object's content"), 'c'),
-		OPT_SET_INT(0, "batch", &batch,
-			    N_("show info and content of objects fed from the standard input"),
-			    BATCH),
-		OPT_SET_INT(0, "batch-check", &batch,
-			    N_("show info about objects fed from the standard input"),
-			    BATCH_CHECK),
+		{ OPTION_CALLBACK, 0, "batch", &batch, NULL,
+			N_("show info and content of objects fed from the standard input"),
+			PARSE_OPT_NOARG, batch_option_callback },
+		{ OPTION_CALLBACK, 0, "batch-check", &batch, NULL,
+			N_("show info about objects fed from the standard input"),
+			PARSE_OPT_NOARG, batch_option_callback },
 		OPT_END()
 	};
 
@@ -237,19 +257,19 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 	else
 		usage_with_options(cat_file_usage, options);
 	}
-	if (!opt && !batch) {
+	if (!opt && !batch.enabled) {
 		if (argc == 2) {
 			exp_type = argv[0];
 			obj_name = argv[1];
 		} else
 			usage_with_options(cat_file_usage, options);
 	}
-	if (batch && (opt || argc)) {
+	if (batch.enabled && (opt || argc)) {
 		usage_with_options(cat_file_usage, options);
 	}
 
-	if (batch)
-		return batch_objects(batch);
+	if (batch.enabled)
+		return batch_objects(&batch);
 
return cat_one_file(opt, exp_type, obj_name);
 }
-- 
1.8.3.rc3.24.gec82cb9


[PATCH 04/10] cat-file: teach --batch to stream blob objects

2013-07-10 Thread Jeff King
The regular git cat-file -p and git cat-file blob code
paths already learned to stream large blobs. Let's do the
same here.

Note that this means we look up the type and size before
making a decision of whether to load the object into memory
or stream (just like the -p code path does). That can lead
to extra work, but it should be dwarfed by the cost of
actually accessing the object itself. In my measurements,
there was a 1-2% slowdown when using --batch on a large
number of objects.

Signed-off-by: Jeff King p...@peff.net
---
 builtin/cat-file.c | 41 -
 1 file changed, 28 insertions(+), 13 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 045cee7..70dd8c8 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -117,12 +117,36 @@ static int batch_one_object(const char *obj_name, int print_contents)
return 0;
 }
 
+static void print_object_or_die(int fd, const unsigned char *sha1,
+				enum object_type type, unsigned long size)
+{
+	if (type == OBJ_BLOB) {
+		if (stream_blob_to_fd(fd, sha1, NULL, 0) < 0)
+			die("unable to stream %s to stdout", sha1_to_hex(sha1));
+	}
+	else {
+		enum object_type rtype;
+		unsigned long rsize;
+		void *contents;
+
+		contents = read_sha1_file(sha1, &rtype, &rsize);
+		if (!contents)
+			die("object %s disappeared", sha1_to_hex(sha1));
+		if (rtype != type)
+			die("object %s changed type!?", sha1_to_hex(sha1));
+		if (rsize != size)
+			die("object %s changed size!?", sha1_to_hex(sha1));
+
+		write_or_die(fd, contents, size);
+		free(contents);
+	}
+}
+
 static int batch_one_object(const char *obj_name, int print_contents)
 {
unsigned char sha1[20];
enum object_type type = 0;
unsigned long size;
-   void *contents = NULL;
 
if (!obj_name)
   return 1;
@@ -133,16 +157,10 @@ static int batch_one_object(const char *obj_name, int print_contents)
 		return 0;
 	}
 
-	if (print_contents == BATCH)
-		contents = read_sha1_file(sha1, &type, &size);
-	else
-		type = sha1_object_info(sha1, &size);
-
+	type = sha1_object_info(sha1, &size);
 	if (type <= 0) {
 		printf("%s missing\n", obj_name);
 		fflush(stdout);
-		if (print_contents == BATCH)
-			free(contents);
 		return 0;
 	}
 
@@ -150,12 +168,9 @@ static int batch_one_object(const char *obj_name, int print_contents)
 	fflush(stdout);
 
 	if (print_contents == BATCH) {
-		write_or_die(1, contents, size);
-		printf("\n");
-		fflush(stdout);
-		free(contents);
+		print_object_or_die(1, sha1, type, size);
+		write_or_die(1, "\n", 1);
 	}
-
 	return 0;
 }
 
-- 
1.8.3.rc3.24.gec82cb9



[PATCH 06/10] cat-file: add --batch-check=format

2013-07-10 Thread Jeff King
The `cat-file --batch-check` command can be used to quickly
get information about a large number of objects. However, it
provides a fixed set of information.

This patch adds an optional format option to --batch-check
to allow a caller to specify which items they are interested
in, and in which order to output them. This is not very
exciting for now, since we provide the same limited set that
you could already get. However, it opens the door to adding
new format items in the future without breaking backwards
compatibility (or forcing callers to pay the cost to
calculate uninteresting items).

Since the --batch option shares code with --batch-check, it
receives the same feature, though it is less likely to be of
interest there.

The format atom names are chosen to match their counterparts
in for-each-ref. Though we do not (yet) share any code with
for-each-ref's formatter, this keeps the interface as
consistent as possible, and may help later on if the
implementations are unified.

Signed-off-by: Jeff King p...@peff.net
---
If the 1% slowdown in the streaming blob patch is too much, the simplest
thing would be to have this formatting apply only to --batch-check, and
let --batch follow its own simpler code path. I doubt anybody really
cares about having custom formats for --batch, as it is less about
analysis and more about getting enough information to recreate the
objects.

Also note that there is no %(contents) atom that one could use to
emulate --batch via --batch-check. I don't see much point, and it would
mean we would not want to build the formatting on strbuf_expand (because
we don't want to copy a large blob into the strbuf). We can add it later
if somebody finds an actual use.

 Documentation/git-cat-file.txt |  55 -
 builtin/cat-file.c | 107 +++--
 t/t1006-cat-file.sh|   6 +++
 3 files changed, 142 insertions(+), 26 deletions(-)

diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index 30d585a..dd5d6e4 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -58,12 +58,16 @@ OPTIONS
to apply the filter to the content recorded in the index at path.
 
 --batch::
-   Print the SHA-1, type, size, and contents of each object provided on
-   stdin. May not be combined with any other options or arguments.
+--batch=<format>::
+   Print object information and contents for each object provided
+   on stdin.  May not be combined with any other options or arguments.
+   See the section `BATCH OUTPUT` below for details.
 
 --batch-check::
-	Print the SHA-1, type, and size of each object provided on stdin. May not
-   be combined with any other options or arguments.
+--batch-check=<format>::
+   Print object information for each object provided on stdin.  May
+   not be combined with any other options or arguments.  See the
+   section `BATCH OUTPUT` below for details.
 
 OUTPUT
------
@@ -78,23 +82,52 @@ each object specified on stdin:
 If type is specified, the raw (though uncompressed) contents of the object
 will be returned.
 
-If '--batch' is specified, output of the following form is printed for each
-object specified on stdin:
+BATCH OUTPUT
+------------
+
+If `--batch` or `--batch-check` is given, `cat-file` will read objects
+from stdin, one per line, and print information about them.
+
+Each line is considered as a whole object name, and is parsed as if
+given to linkgit:git-rev-parse[1].
+
+You can specify the information shown for each object by using a custom
+`<format>`. The `<format>` is copied literally to stdout for each
+object, with placeholders of the form `%(atom)` expanded, followed by a
+newline. The available atoms are:
+
+`objectname`::
+   The sha1 hash of the object.
+
+`objecttype`::
+	The type of the object (the same as `cat-file -t` reports).
+
+`objectsize`::
+   The size, in bytes, of the object (the same as `cat-file -s`
+   reports).
+
+If no format is specified, the default format is `%(objectname)
+%(objecttype) %(objectsize)`.
+
+If `--batch` is specified, the object information is followed by the
+object contents (consisting of `%(objectsize)` bytes), followed by a
+newline.
+
+For example, `--batch` without a custom format would produce:
 
 
 <sha1> SP <type> SP <size> LF
 <contents> LF
 
 
-If '--batch-check' is specified, output of the following form is printed for
-each object specified on stdin:
+Whereas `--batch-check='%(objectname) %(objecttype)'` would produce:
 
 
-<sha1> SP <type> SP <size> LF
+<sha1> SP <type> LF
 
 
-For both '--batch' and '--batch-check', output of the following form is printed
-for each object specified on stdin that does not exist in the repository:
+If a name is specified on stdin that cannot be resolved to an object in
+the repository, then `cat-file` will ignore any custom format and print:
 
 
<object> SP missing LF

[PATCH 07/10] cat-file: add %(objectsize:disk) format atom

2013-07-10 Thread Jeff King
This atom is just like %(objectsize), except that it shows
the on-disk size of the object rather than the object's true
size. In other words, it makes the disk_size query of
sha1_object_info_extended available via the command-line.

This can be used for rough attribution of disk usage to
particular refs, though see the caveats in the
documentation.

This patch does not include any tests, as the exact numbers
returned are volatile and subject to zlib and packing
decisions. We cannot even reliably guarantee that the
on-disk size is smaller than the object content (though in
general this should be the case for non-trivial objects).

Signed-off-by: Jeff King p...@peff.net
---
 Documentation/git-cat-file.txt | 18 ++
 builtin/cat-file.c |  6 ++
 2 files changed, 24 insertions(+)

diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index dd5d6e4..06bdc43 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -106,6 +106,10 @@ newline. The available atoms are:
The size, in bytes, of the object (the same as `cat-file -s`
reports).
 
+`objectsize:disk`::
+   The size, in bytes, that the object takes up on disk. See the
+   note about on-disk sizes in the `CAVEATS` section below.
+
 If no format is specified, the default format is `%(objectname)
 %(objecttype) %(objectsize)`.
 
@@ -133,6 +137,20 @@ the repository, then `cat-file` will ignore any custom format and print:
 <object> SP missing LF
 
 
+
+CAVEATS
+-------
+
+Note that the sizes of objects on disk are reported accurately, but care
+should be taken in drawing conclusions about which refs or objects are
+responsible for disk usage. The size of a packed non-delta object may be
+much larger than the size of objects which delta against it, but the
+choice of which object is the base and which is the delta is arbitrary
+and is subject to change during a repack. Note also that multiple copies
+of an object may be present in the object database; in this case, it is
+undefined which copy's size will be reported.
+
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index b43a0c5..11fa8c0 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -118,6 +118,7 @@ struct expand_data {
unsigned char sha1[20];
enum object_type type;
unsigned long size;
+   unsigned long disk_size;
 
/*
 * If mark_query is true, we do not expand anything, but rather
@@ -155,6 +156,11 @@ static void expand_atom(struct strbuf *sb, const char *atom, int len,
 			data->info.sizep = &data->size;
 		else
 			strbuf_addf(sb, "%lu", data->size);
+	} else if (is_atom("objectsize:disk", atom, len)) {
+		if (data->mark_query)
+			data->info.disk_sizep = &data->disk_size;
+		else
+			strbuf_addf(sb, "%lu", data->disk_size);
 	} else
 		die("unknown format element: %.*s", len, atom);
 }
-- 
1.8.3.rc3.24.gec82cb9



[PATCH 09/10] pack-revindex: use unsigned to store number of objects

2013-07-10 Thread Jeff King
A packfile may have up to 2^32-1 objects in it, so the
right data type to use is uint32_t. We currently use a
signed int, which means that we may behave incorrectly for
packfiles with more than 2^31-1 objects on 32-bit systems.

Nobody has noticed because having 2^31 objects is pretty
insane. The linux.git repo has on the order of 2^22 objects,
which is hundreds of times smaller than necessary to trigger
the bug.

Let's bump this up to an unsigned. On 32-bit systems, this
gives us the correct data-type, and on 64-bit systems, it is
probably more efficient to use the native unsigned than a
true uint32_t.

While we're at it, we can fix the binary search not to
overflow in such a case if our unsigned is 32 bits.

Signed-off-by: Jeff King p...@peff.net
---
I didn't look farther in the pack code to see if we have other
problematic instances. So there may be others lurking, but these ones
were close to the area I was working in.

 pack-revindex.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/pack-revindex.c b/pack-revindex.c
index 77a0465..1aa9754 100644
--- a/pack-revindex.c
+++ b/pack-revindex.c
@@ -72,8 +72,8 @@ static void create_pack_revindex(struct pack_revindex *rix)
 static void create_pack_revindex(struct pack_revindex *rix)
 {
 	struct packed_git *p = rix->p;
-	int num_ent = p->num_objects;
-	int i;
+	unsigned num_ent = p->num_objects;
+	unsigned i;
 	const char *index = p->index_data;
 
 	rix->revindex = xmalloc(sizeof(*rix->revindex) * (num_ent + 1));
@@ -114,7 +114,7 @@ struct revindex_entry *find_pack_revindex(struct packed_git *p, off_t ofs)
 struct revindex_entry *find_pack_revindex(struct packed_git *p, off_t ofs)
 {
 	int num;
-	int lo, hi;
+	unsigned lo, hi;
 	struct pack_revindex *rix;
 	struct revindex_entry *revindex;
 
@@ -132,7 +132,7 @@ struct revindex_entry *find_pack_revindex(struct packed_git *p, off_t ofs)
 	lo = 0;
 	hi = p->num_objects + 1;
 	do {
-		int mi = (lo + hi) / 2;
+		unsigned mi = lo + (hi - lo) / 2;
 		if (revindex[mi].offset == ofs) {
 			return revindex + mi;
 		} else if (ofs < revindex[mi].offset)
-- 
1.8.3.rc3.24.gec82cb9



[PATCH 10/10] pack-revindex: radix-sort the revindex

2013-07-10 Thread Jeff King
The pack revindex stores the offsets of the objects in the
pack in sorted order, allowing us to easily find the on-disk
size of each object. To compute it, we populate an array
with the offsets from the sha1-sorted idx file, and then use
qsort to order it by offsets.

That does O(n log n) offset comparisons, and profiling shows
that we spend most of our time in cmp_offset. However, since
we are sorting on a simple off_t, we can use numeric sorts
that perform better. A radix sort can run in O(k*n), where k
is the number of digits in our number. For a 64-bit off_t,
using 16-bit digits gives us k=4.

On the linux.git repo, with about 3M objects to sort, this
yields a 400% speedup. Here are the best-of-five numbers for
running "echo HEAD | git cat-file --batch-disk-size", which
is dominated by time spent building the pack revindex:

  before after
  real0m0.834s   0m0.204s
  user0m0.788s   0m0.164s
  sys 0m0.040s   0m0.036s

On a smaller repo, the radix sort would not be
as impressive (and could even be worse), as we are trading
the log(n) factor for the k=4 of the radix sort. However,
even on git.git, with 173K objects, it shows some
improvement:

  before after
  real0m0.046s   0m0.017s
  user0m0.036s   0m0.012s
  sys 0m0.008s   0m0.000s

Signed-off-by: Jeff King p...@peff.net
---
I changed a few things from the original, including:

  1. We take an unsigned number of objects to match the fix in the
 last patch.

  2. The 16-bit digit size is factored out to a single place, which
 avoids magic numbers and repeating ourselves.

  3. The digits variable is renamed to bits, which is more accurate.

  4. The outer loop condition uses the simpler "while (max >> bits)".

  5. We use memcpy instead of an open-coded loop to copy the whole array
 at the end. The individual bucket-assignment is still done by
 struct assignment. I haven't timed if memcpy would make a
 difference there.

  6. The 64K*sizeof(int) pos array is now heap-allocated, in case
 there are platforms with a small stack.

I re-ran my timings to make sure none of the above impacted them; it
turned out the same.

 pack-revindex.c | 84 +
 1 file changed, 79 insertions(+), 5 deletions(-)

diff --git a/pack-revindex.c b/pack-revindex.c
index 1aa9754..9365bc2 100644
--- a/pack-revindex.c
+++ b/pack-revindex.c
@@ -59,11 +59,85 @@ static int cmp_offset(const void *a_, const void *b_)
 	/* revindex elements are lazily initialized */
 }
 
-static int cmp_offset(const void *a_, const void *b_)
+/*
+ * This is a least-significant-digit radix sort.
+ */
+static void sort_revindex(struct revindex_entry *entries, unsigned n, off_t max)
 {
-	const struct revindex_entry *a = a_;
-	const struct revindex_entry *b = b_;
-	return (a->offset < b->offset) ? -1 : (a->offset > b->offset) ? 1 : 0;
+	/*
+	 * We use a digit size of 16 bits. That keeps our memory
+	 * usage reasonable, and we can generally (for a 4G or smaller
+	 * packfile) quit after two rounds of radix-sorting.
+	 */
+#define DIGIT_SIZE (16)
+#define BUCKETS (1 << DIGIT_SIZE)
+	/*
+	 * We want to know the bucket that a[i] will go into when we are using
+	 * the digit that is N bits from the (least significant) end.
+	 */
+#define BUCKET_FOR(a, i, bits) (((a)[(i)].offset >> (bits)) & (BUCKETS-1))
+
+	/*
+	 * We need O(n) temporary storage, so we sort back and forth between
+	 * the real array and our tmp storage. To keep them straight, we always
+	 * sort from "a" into buckets in "b".
+	 */
+	struct revindex_entry *tmp = xcalloc(n, sizeof(*tmp));
+	struct revindex_entry *a = entries, *b = tmp;
+	int bits = 0;
+	unsigned *pos = xmalloc(BUCKETS * sizeof(*pos));
+
+	while (max >> bits) {
+		struct revindex_entry *swap;
+		int i;
+
+		memset(pos, 0, BUCKETS * sizeof(*pos));
+
+		/*
+		 * We want pos[i] to store the index of the last element that
+		 * will go in bucket "i" (actually one past the last element).
+		 * To do this, we first count the items that will go in each
+		 * bucket, which gives us a relative offset from the last
+		 * bucket. We can then cumulatively add the index from the
+		 * previous bucket to get the true index.
+		 */
+		for (i = 0; i < n; i++)
+			pos[BUCKET_FOR(a, i, bits)]++;
+		for (i = 1; i < BUCKETS; i++)
+			pos[i] += pos[i-1];
+
+		/*
+		 * Now we can drop the elements into their correct buckets (in
+		 * our temporary array).  We iterate the pos counter backwards
+		 * to avoid using an extra index to count up. And since we are
+		 * going backwards there, we must also go

Re: [PATCH 10/10] pack-revindex: radix-sort the revindex

2013-07-10 Thread Jeff King
On Wed, Jul 10, 2013 at 07:55:57AM -0400, Jeff King wrote:

   5. We use memcpy instead of an open-coded loop to copy the whole array
  at the end. The individual bucket-assignment is still done by
  struct assignment. I haven't timed if memcpy would make a
  difference there.

I just timed this, and I can't measure any difference. I think the
struct assignment is the more readable option, and I do not think any
compilers should have trouble with it. But if they do, we can switch it
for a memcpy.

-Peff


Re: [PATCH 10/10] pack-revindex: radix-sort the revindex

2013-07-11 Thread Jeff King
On Wed, Jul 10, 2013 at 06:47:49PM +0530, Ramkumar Ramachandra wrote:

  For a 64-bit off_t, using 16-bit digits gives us k=4.
 
 Wait, isn't off_t a signed data type?  Did you account for that in
 your algorithm?

It is signed, but the values we are storing in the revindex are all
positive file offsets. Right-shifting a non-negative value of a signed
type is explicitly allowed in C.

  -static int cmp_offset(const void *a_, const void *b_)
  +/*
  + * This is a least-significant-digit radix sort.
  + */
 
 Any particular reason for choosing LSD, and not MSD?

Simplicity. An MSD implementation should have the same algorithmic
complexity and in theory, one can do MSD in-place. I'm happy enough with
the speedup here, but if you want to take a stab at beating my times
with MSD, please feel free.

The other usual downside of MSD is that it is typically not stable,
but we don't care about that here. We know that our sort keys are
unique.

  +#define DIGIT_SIZE (16)
  +#define BUCKETS (1 << DIGIT_SIZE)
 
 Okay, NUMBER_OF_BUCKETS = 2^RADIX, and you choose a hex radix.  Is
 off_t guaranteed to be fixed-length though?  I thought only the ones
 in stdint.h were guaranteed to be fixed-length?

I'm not sure what you mean by fixed-length. If you mean does it have the
same size on every platform, then no. It will typically be 32-bit on
platforms without largefile support, and 64-bit elsewhere. But it
shouldn't matter. We'll first sort the entries by the lower 16 bits, and
then if we have more bits, by the next 16 bits, and so on. We quit when
the maximum value to sort (which we know ahead of time from the size of
the packfile) is smaller than the 16-bits we are on. So we don't need to
know the exact size of off_t, only the maximum value in our list (which
must, by definition, be smaller than what can be represented by off_t).

  +   /*
  +* We want to know the bucket that a[i] will go into when we are 
  using
  +* the digit that is N bits from the (least significant) end.
  +*/
  +#define BUCKET_FOR(a, i, bits) (((a)[(i)].offset >> (bits)) & (BUCKETS-1))
 
 Ouch!  This is unreadable.  Just write an inline function instead?  A
 % would've been easier on the eyes, but you chose base-16.

I specifically avoided an inline function because they are subject to
compiler settings. This isn't just it would be a bit faster if this got
inlined, and OK otherwise but this would be horribly slow if not
inlined.

I'm also not sure that

  static inline unsigned bucket_for(const struct revindex *a,
unsigned i,
unsigned bits)
  {
          return a[i].offset >> bits & (BUCKETS-1);
  }

is actually any more readable.

I'm not sure what you mean by base-16. No matter the radix digit size,
as long as it is an integral number of bits, we can mask it off, which
is more efficient than modulo. A good compiler should see that it
is a constant and convert it to a bit-mask, but I'm not sure I agree
that modular arithmetic is more readable. This is fundamentally a
bit-twiddling operation, as we are shifting and masking.

I tried to explain it in the comment; suggestions to improve that are
welcome.

  +   /*
  +* We need O(n) temporary storage, so we sort back and forth between
  +* the real array and our tmp storage. To keep them straight, we 
  always
  +* sort from a into buckets in b.
  +*/
  +   struct revindex_entry *tmp = xcalloc(n, sizeof(*tmp));
 
 Shouldn't this be sizeof (struct revindex_entry), since tmp hasn't
 been declared yet?

No, the variable is declared (but uninitialized) in its initializer.
Despite its syntax, sizeof() is not a function and does not care about
the state of the variable, only its type.

 Also, s/n/revindex_nr/, and something more appropriate for tmp?

What name would you suggest be more appropriate for tmp?

  +   int bits = 0;
  +   unsigned *pos = xmalloc(BUCKETS * sizeof(*pos));
 
 sizeof(unsigned int), for clarity, if not anything else.

I disagree; in general, I prefer using sizeof(*var) rather than
sizeof(type), because it avoids repeating ourselves, and there is no
compile-time check that you have gotten it right.

In the initializer it is less important, because the type is right
there. But when you are later doing:

  memset(pos, 0, BUCKETS * sizeof(*pos));

this is much more robust. If somebody changes the type of pos, the
memset line does not need changed. If you used sizeof(unsigned), then
the code is now buggy (and the compiler cannot notice).

 You picked malloc over calloc here, because you didn't want to incur
 the extra cost of zero-initializing the memory?

Yes. We have to zero-initialize in each loop, so there is no point
spending the extra effort on calloc.

We could also xcalloc inside each loop iteration, but since we need the
same-size allocation each time, I hoisted the malloc out of the loop.

 Also, pos is the actual buckets array, I presume (hence unsigned,

Re: [PATCH 10/10] pack-revindex: radix-sort the revindex

2013-07-11 Thread Jeff King
On Wed, Jul 10, 2013 at 10:10:16AM -0700, Brandon Casey wrote:

  On the linux.git repo, with about 3M objects to sort, this
  yields a 400% speedup. Here are the best-of-five numbers for
  running "echo HEAD | git cat-file --batch-disk-size", which
  is dominated by time spent building the pack revindex:
 
before after
real0m0.834s   0m0.204s
user0m0.788s   0m0.164s
sys 0m0.040s   0m0.036s
 
  On a smaller repo, the radix sort would not be
  as impressive (and could even be worse), as we are trading
  the log(n) factor for the k=4 of the radix sort. However,
  even on git.git, with 173K objects, it shows some
  improvement:
 
before after
real0m0.046s   0m0.017s
user0m0.036s   0m0.012s
sys 0m0.008s   0m0.000s
 
 k should only be 2 for git.git.  I haven't packed in a while, but I
 think it should all fit within 4G.  :)  The pathological case would be
 a pack file with very few very very large objects, large enough to
 push the pack size over the 2^48 threshold so we'd have to do all four
 radixes.

Yeah, even linux.git fits into k=2. And that does more or less explain
the numbers in both cases.

For git.git, with 173K objects, log(n) is ~18, so regular sort is 18n.
With a radix sort of k=2, which has a constant factor of 2 (you can see
by looking at the code that we go through the list twice per radix), we
have 4n. So there should be a 4.5x speedup. We don't quite get that,
which is probably due to the extra bookkeeping on the buckets.

For linux.git, with 3M objects, log(n) is ~22, so the speedup we hope
for is 5.5x. We end up with 4x.

 It's probably worth mentioning here and/or in the code that k is
 dependent on the pack file size and that we can jump out early for
 small pack files.  That's my favorite part of this code by the way. :)

Yeah, I agree it is probably worth mentioning along with the numbers; it
is where half of our speedup is coming from. I think the max >> bits
loop condition deserves to be commented, too. I'll add that.

Also note that my commit message still refers to --batch-disk-size
which does not exist anymore. :) I didn't update the timings in the
commit message for my re-roll, but I did confirm that they are the same.

  +   /*
  +* We need O(n) temporary storage, so we sort back and forth between
  +* the real array and our tmp storage. To keep them straight, we 
  always
  +* sort from a into buckets in b.
  +*/
  +   struct revindex_entry *tmp = xcalloc(n, sizeof(*tmp));
 
 Didn't notice it the first time I read this, but do we really need
 calloc here?  Or will malloc do?

No, a malloc should be fine. I doubt it matters much, but there's no
reason not to go the cheap route.

  +   struct revindex_entry *a = entries, *b = tmp;
  +   int bits = 0;
  +   unsigned *pos = xmalloc(BUCKETS * sizeof(*pos));
  +
+   while (max >> bits) {
  +   struct revindex_entry *swap;
  +   int i;
 
 You forgot to make i unsigned.  See below too...

Oops. Thanks for catching.

  +   /*
  +* Now we can drop the elements into their correct buckets 
  (in
  +* our temporary array).  We iterate the pos counter 
  backwards
  +* to avoid using an extra index to count up. And since we 
  are
  +* going backwards there, we must also go backwards through 
  the
  +* array itself, to keep the sort stable.
  +*/
+   for (i = n - 1; i >= 0; i--)
  +   b[--pos[BUCKET_FOR(a, i, bits)]] = a[i];
 
 ...which is why the above loop still works.

Since we are iterating by ones, I guess I can just compare to UINT_MAX.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 06/10] cat-file: add --batch-check=format

2013-07-11 Thread Jeff King
On Wed, Jul 10, 2013 at 08:21:15PM +0530, Ramkumar Ramachandra wrote:

 Jeff King wrote:
  +If `--batch` or `--batch-check` is given, `cat-file` will read objects
  +from stdin, one per line, and print information about them.
  +
  +You can specify the information shown for each object by using a custom
  +`format`. The `format` is copied literally to stdout for each
  +object, with placeholders of the form `%(atom)` expanded, followed by a
  +newline. The available atoms are:
  +
  +If no format is specified, the default format is `%(objectname)
  +%(objecttype) %(objectsize)`.
  +
  +If `--batch` is specified, the object information is followed by the
  +object contents (consisting of `%(objectsize)` bytes), followed by a
  +newline.
 
 I find this slightly hideous, and would have expected an
 %(objectcontents) or similar.

I looked into doing that, but it makes the code significantly more
complicated, assuming you do not want to copy the full object contents
in memory. You cannot use strbuf_expand, and you need to worry about
buffering/flushing more (you do not want to write() each individual
item, but if you are using printf(), you need to flush before using the
unbuffered streaming interface).

My thinking was to leave it until somebody actually wants it, at which
point they can do the necessary refactoring (and hopefully this would be
part of unifying it with other format-parsers).

If we were designing from scratch and this was the difference between
having --batch-check and --batch, or having a single --batch, I'd
care more about doing %(objectcontents) right away. But because we must
support the historical --batch/--batch-check distinction anyway, I don't
think this is any worse.

-Peff


[PATCHv3 10/10] pack-revindex: radix-sort the revindex

2013-07-11 Thread Jeff King
 it by offsets.

That does O(n log n) offset comparisons, and profiling shows
that we spend most of our time in cmp_offset. However, since
we are sorting on a simple off_t, we can use numeric sorts
that perform better. A radix sort can run in O(k*n), where k
is the number of digits in our number. For a 64-bit off_t,
using 16-bit digits gives us k=4.

On the linux.git repo, with about 3M objects to sort, this
yields a 400% speedup. Here are the best-of-five numbers for
running

  echo HEAD | git cat-file --batch-check="%(objectsize:disk)"

on a fully packed repository, which is dominated by time
spent building the pack revindex:

  before after
  real0m0.834s   0m0.204s
  user0m0.788s   0m0.164s
  sys 0m0.040s   0m0.036s

This matches our algorithmic expectations. log(3M) is ~21.5,
so a traditional sort is ~21.5n. Our radix sort runs in k*n,
where k is the number of radix digits. In the worst case,
this is k=4 for a 64-bit off_t, but we can quit early when
the largest value to be sorted is smaller. For any
repository under 4G, k=2. Our algorithm makes two passes
over the list per radix digit, so we end up with 4n. That
should yield ~5.3x speedup. We see 4x here; the difference
is probably due to the extra bucket book-keeping the radix
sort has to do.

On a smaller repo, the difference is less impressive, as
log(n) is smaller. For git.git, with 173K objects (but still
k=2), we see a 2.7x improvement:

  before after
  real0m0.046s   0m0.017s
  user0m0.036s   0m0.012s
  sys 0m0.008s   0m0.000s

On even tinier repos (e.g., a few hundred objects), the
speedup goes away entirely, as the small advantage of the
radix sort gets erased by the book-keeping costs (and at
those sizes, the cost to generate the rev-index gets
lost in the noise anyway).

Signed-off-by: Jeff King p...@peff.net
---
 pack-revindex.c | 100 +---
 1 file changed, 95 insertions(+), 5 deletions(-)

diff --git a/pack-revindex.c b/pack-revindex.c
index 1aa9754..b4d2b35 100644
--- a/pack-revindex.c
+++ b/pack-revindex.c
@@ -59,11 +59,101 @@ static int cmp_offset(const void *a_, const void *b_)
/* revindex elements are lazily initialized */
 }
 
-static int cmp_offset(const void *a_, const void *b_)
+/*
+ * This is a least-significant-digit radix sort.
+ *
+ * It sorts each of the n items in entries by its offset field. The max
+ * parameter must be at least as large as the largest offset in the array,
+ * and lets us quit the sort early.
+ */
+static void sort_revindex(struct revindex_entry *entries, unsigned n, off_t max)
 {
-   const struct revindex_entry *a = a_;
-   const struct revindex_entry *b = b_;
-   return (a->offset < b->offset) ? -1 : (a->offset > b->offset) ? 1 : 0;
+   /*
+* We use a digit size of 16 bits. That keeps our memory
+* usage reasonable, and we can generally (for a 4G or smaller
+* packfile) quit after two rounds of radix-sorting.
+*/
+#define DIGIT_SIZE (16)
+#define BUCKETS (1 << DIGIT_SIZE)
+   /*
+* We want to know the bucket that a[i] will go into when we are using
+* the digit that is N bits from the (least significant) end.
+*/
+#define BUCKET_FOR(a, i, bits) (((a)[(i)].offset >> (bits)) & (BUCKETS-1))
+
+   /*
+* We need O(n) temporary storage. Rather than do an extra copy of the
+* partial results into entries, we sort back and forth between the
+* real array and temporary storage. In each iteration of the loop, we
+* keep track of them with alias pointers, always sorting from "from"
+* to "to".
+*/
+   struct revindex_entry *tmp = xmalloc(n * sizeof(*tmp));
+   struct revindex_entry *from = entries, *to = tmp;
+   int bits;
+   unsigned *pos = xmalloc(BUCKETS * sizeof(*pos));
+
+   /*
+* If (max >> bits) is zero, then we know that the radix digit we are
+* on (and any higher) will be zero for all entries, and our loop will
+* be a no-op, as everybody lands in the same zero-th bucket.
+*/
+   for (bits = 0; max >> bits; bits += DIGIT_SIZE) {
+   struct revindex_entry *swap;
+   unsigned i;
+
+   memset(pos, 0, BUCKETS * sizeof(*pos));
+
+   /*
+* We want pos[i] to store the index of the last element that
+* will go in bucket i (actually one past the last element).
+* To do this, we first count the items that will go in each
+* bucket, which gives us a relative offset from the last
+* bucket. We can then cumulatively add the index from the
+* previous bucket to get the true index.
+*/
+   for (i = 0; i  n; i++)
+   pos[BUCKET_FOR(from, i, bits)]++;
+   for (i = 1; i  BUCKETS; i++)
+   pos[i] += pos[i-1];

Re: t0008 hang on streaming test (OS X)

2013-07-11 Thread Jeff King
On Wed, Jul 10, 2013 at 12:36:40PM -0400, Brian Gernhardt wrote:

 The newest test in t0008, "streaming support for --stdin", seems to
 hang sporadically on my MacBook Pro (running 10.8.4).  The hang seems
 to be related to running it in parallel with other tests, as I can
 only reliably cause it by running with prove and -j 3.  However, once
 that has hung I am able to semi-reliably have it occur by running the
 test separately (with the test hung in the background, using separate
 trash directories via the --root option).

I can't replicate the hang here (on Linux) doing:

  for i in `seq 1 30`; do
  ./t0008-ignores.sh --root=/tmp/foo/$i 
  done

Do you know which test it is hanging on? You mentioned that you can
replicate it outside of prove; what does running with -v say?

The last test in t0008, with the fifos, would make me the most
suspicious. The way we do it _should_ be fine, but I'm wondering if the
shell is blocking in exec here:

  mkfifo in out &&
  (git check-ignore -n -v --stdin <in >out &) &&
  exec 9>in &&

That is, if the fifo is not opened for some reason by the backgrounded
process (there's a race, of course, but the outer shell should just
block until the sub-shell actually opens it). I wonder if the
descriptor-opening behavior of:

  cmd <in >out &

is different between shells (that is, if it backgrounds the opening of
in and out on some shells, but not on others). But then I would expect
it to fail consistently.

Just for fun, does switching the middle line there to:

  (sh -c 'git check-ignore -n -v --stdin' <in >out &) &&

have any effect?

-Peff


[PATCHv3 08/10] cat-file: split --batch input lines on whitespace

2013-07-11 Thread Jeff King
On Thu, Jul 11, 2013 at 07:36:53AM -0400, Jeff King wrote:

 On Wed, Jul 10, 2013 at 08:59:51PM +0530, Ramkumar Ramachandra wrote:
 
  Jeff King wrote:
 git rev-list --objects HEAD |
 git cat-file --batch-check='%(objectsize) %(text)'
  
  If anything, I would have expected %(rest), not %(text).  This atom is
  specific to commands that accept input via stdin (i.e. not log, f-e-r,
  branch, or anything else I can think of).
 
 I considered %(rest) as well. I don't have a strong preference.

Here is the patch re-rolled with s/text/rest/.

Junio, that makes this and 10/10 the only v3 updates. Let me know if
it would be simpler to just resend the whole series.

-- >8 --
Subject: [PATCH] cat-file: split --batch input lines on whitespace

If we get an input line to --batch or --batch-check that
looks like "HEAD foo bar", we will currently feed the whole
thing to get_sha1(). This means that to use --batch-check
with `rev-list --objects`, one must pre-process the input,
like:

  git rev-list --objects HEAD |
  cut -d' ' -f1 |
  git cat-file --batch-check

Besides being more typing and slightly less efficient to
invoke `cut`, the result loses information: we no longer
know which path each object was found at.

This patch teaches cat-file to split input lines at the
first whitespace. Everything to the left of the whitespace
is considered an object name, and everything to the right is
made available as the %(rest) atom. So you can now do:

  git rev-list --objects HEAD |
  git cat-file --batch-check='%(objectsize) %(rest)'

to collect object sizes at particular paths.

Even if %(rest) is not used, we always do the whitespace
split (which means you can simply eliminate the `cut`
command from the first example above).

This whitespace split is backwards compatible for any
reasonable input. Object names cannot contain spaces, so any
input with spaces would have resulted in a "missing" line.
The only input hurt is if somebody really expected input of
the form "HEAD is a fine-looking ref!" to fail; it will now
parse HEAD, and make "is a fine-looking ref!" available as
%(rest).

Signed-off-by: Jeff King p...@peff.net
---
 Documentation/git-cat-file.txt | 10 --
 builtin/cat-file.c | 20 +++-
 t/t1006-cat-file.sh|  7 +++
 3 files changed, 34 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index 06bdc43..68691d4 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -88,8 +88,10 @@ from stdin, one per line, and print information about them.
 If `--batch` or `--batch-check` is given, `cat-file` will read objects
 from stdin, one per line, and print information about them.
 
-Each line is considered as a whole object name, and is parsed as if
-given to linkgit:git-rev-parse[1].
+Each line is split at the first whitespace boundary. All characters
+before that whitespace are considered as a whole object name, and are
+parsed as if given to linkgit:git-rev-parse[1]. Characters after that
+whitespace can be accessed using the `%(rest)` atom (see below).
 
 You can specify the information shown for each object by using a custom
 `format`. The `format` is copied literally to stdout for each
@@ -110,6 +112,10 @@ newline. The available atoms are:
The size, in bytes, that the object takes up on disk. See the
note about on-disk sizes in the `CAVEATS` section below.
 
+`rest`::
+   The text (if any) found after the first run of whitespace on the
+   input line (i.e., the rest of the line).
+
 If no format is specified, the default format is `%(objectname)
 %(objecttype) %(objectsize)`.
 
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 11fa8c0..0e64b41 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -119,6 +119,7 @@ struct expand_data {
enum object_type type;
unsigned long size;
unsigned long disk_size;
+   const char *rest;
 
/*
 * If mark_query is true, we do not expand anything, but rather
@@ -161,6 +162,9 @@ static void expand_atom(struct strbuf *sb, const char *atom, int len,
data->info.disk_sizep = &data->disk_size;
else
strbuf_addf(sb, "%lu", data->disk_size);
+   } else if (is_atom("rest", atom, len)) {
+   if (!data->mark_query && data->rest)
+   strbuf_addstr(sb, data->rest);
} else
die("unknown format element: %.*s", len, atom);
 }
@@ -263,7 +267,21 @@ static int batch_objects(struct batch_options *opt)
data.mark_query = 0;
 
while (strbuf_getline(&buf, stdin, '\n') != EOF) {
-   int error = batch_one_object(buf.buf, opt, &data);
+   char *p;
+   int error;
+
+   /*
+* Split at first whitespace, tying off the beginning of the
+* string and saving the remainder (or NULL) in data.rest

[PATCH 0/7] cat-file --batch-check performance improvements

2013-07-12 Thread Jeff King
In my earlier series introducing git cat-file --batch-check=format,
found here:

  http://thread.gmane.org/gmane.comp.version-control.git/229761/focus=230041

I spent a little time optimizing revindex generation, and measured by
requesting information on a single object from a large repository. This
series takes the next logical step: requesting a large number of objects
from a large repository.

There are two major optimizations here:

  1. Avoiding extra ref lookups due to the warning in 798c35f (get_sha1:
 warn about full or short object names that look like refs,
 2013-05-29).

  2. Avoiding extra work for delta type resolution when the user has not
 asked for %(objecttype).

I prepared the series on top of jk/in-pack-size-measurement, and
certainly optimization 2 is pointless without it (before that topic,
--batch-check always printed the type).

However, the first optimization affects regular --batch-check, and
represents a much more serious performance regression. It looks like
798c35f is in master, but hasn't been released yet, so assuming these
topics graduate before the next release, it should be OK. But if not, we
should consider pulling the first patch out and applying it (or
something like it) separately.

The results for running (in linux.git):

  $ git rev-list --objects --all >objects
  $ git cat-file --batch-check='%(objectsize:disk)' <objects >/dev/null

are:

before after
  real  1m17.143s  0m7.205s
  user  0m27.684s  0m6.580s
  sys   0m49.320s  0m0.608s

Now, _most_ of that speedup is coming from the first patch, and it's
quite trivial. The rest of the patches involve a lot of refactoring, and
only manage to eke out one more second of performance, so it may not be
worth it (though I think the result actually cleans up the
sha1_object_info_extended interface a bit, and is worth it). Individual
timings are in the commit messages.

The patches are:

  [1/7]: cat-file: disable object/refname ambiguity check for batch mode

Optimization 1.

  [2/7]: sha1_object_info_extended: rename status to type
  [3/7]: sha1_loose_object_info: make type lookup optional
  [4/7]: packed_object_info: hoist delta type resolution to helper
  [5/7]: packed_object_info: make type lookup optional
  [6/7]: sha1_object_info_extended: make type calculation optional

Optimization 2.

  [7/7]: sha1_object_info_extended: pass object_info to helpers

Optional cleanup.

-Peff


[PATCH 1/7] cat-file: disable object/refname ambiguity check for batch mode

2013-07-12 Thread Jeff King
A common use of cat-file --batch-check is to feed a list
of objects from rev-list --objects or a similar command.
In this instance, all of our input objects are 40-byte sha1
ids. However, cat-file has always allowed arbitrary revision
specifiers, and feeds the result to get_sha1().

Fortunately, get_sha1() recognizes a 40-byte sha1 before
doing any hard work trying to look up refs, meaning this
scenario should end up spending very little time converting
the input into an object sha1. However, since 798c35f
(get_sha1: warn about full or short object names that look
like refs, 2013-05-29), when we encounter this case, we
spend the extra effort to do a refname lookup anyway, just
to print a warning. This is further exacerbated by ca91993
(get_packed_ref_cache: reload packed-refs file when it
changes, 2013-06-20), which makes individual ref lookup more
expensive by requiring a stat() of the packed-refs file for
each missing ref.

With no patches, this is the time it takes to run:

  $ git rev-list --objects --all >objects
  $ time git cat-file --batch-check='%(objectname)' <objects

on the linux.git repository:

  real1m13.494s
  user0m25.924s
  sys 0m47.532s

If we revert ca91993, the packed-refs up-to-date check, it
gets a little better:

  real0m54.697s
  user0m21.692s
  sys 0m32.916s

but we are still spending quite a bit of time on ref lookup
(and we would not want to revert that patch, anyway, which
has correctness issues).  If we revert 798c35f, disabling
the warning entirely, we get a much more reasonable time:

  real0m7.452s
  user0m6.836s
  sys 0m0.608s

This patch does the moral equivalent of this final case (and
gets similar speedups). We introduce a global flag that
callers of get_sha1() can use to avoid paying the price for
the warning.

Signed-off-by: Jeff King p...@peff.net
---
The solution feels a little hacky, but I'm not sure there is a better
one short of reverting the warning entirely.

We could also tie it to warn_ambiguous_refs (or add another config
variable), but I don't think that makes sense. It is not about "do I
care about ambiguities in this repository", but rather "I am going to
do a really large number of sha1 resolutions, and I do not want to pay
the price in this particular code path for the warning".

There may be other sites that resolve a large number of refs and run
into this, but I couldn't think of any. Doing for_each_ref would not
have the same problem, as we already know they are refs there.

 builtin/cat-file.c |  9 +
 cache.h|  1 +
 environment.c  |  1 +
 sha1_name.c| 14 --
 4 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 0e64b41..fe5c77f 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -266,6 +266,15 @@ static int batch_objects(struct batch_options *opt)
strbuf_expand(buf, opt-format, expand_format, data);
data.mark_query = 0;
 
+   /*
+* We are going to call get_sha1 on a potentially very large number of
+* objects. In most large cases, these will be actual object sha1s. The
+* cost to double-check that each one is not also a ref (just so we can
+* warn) ends up dwarfing the actual cost of the object lookups
+* themselves. We can work around it by just turning off the warning.
+*/
+   warn_on_object_refname_ambiguity = 0;
+
while (strbuf_getline(buf, stdin, '\n') != EOF) {
char *p;
int error;
diff --git a/cache.h b/cache.h
index 2d06169..c1fd82c 100644
--- a/cache.h
+++ b/cache.h
@@ -575,6 +575,7 @@ extern int warn_ambiguous_refs;
 extern int prefer_symlink_refs;
 extern int log_all_ref_updates;
 extern int warn_ambiguous_refs;
+extern int warn_on_object_refname_ambiguity;
 extern int shared_repository;
 extern const char *apply_default_whitespace;
 extern const char *apply_default_ignorewhitespace;
diff --git a/environment.c b/environment.c
index 0cb67b2..5398c36 100644
--- a/environment.c
+++ b/environment.c
@@ -22,6 +22,7 @@ int warn_ambiguous_refs = 1;
 int is_bare_repository_cfg = -1; /* unspecified */
 int log_all_ref_updates = -1; /* unspecified */
 int warn_ambiguous_refs = 1;
+int warn_on_object_refname_ambiguity = 1;
 int repository_format_version;
 const char *git_commit_encoding;
 const char *git_log_output_encoding;
diff --git a/sha1_name.c b/sha1_name.c
index 90419ef..6f8812a 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -452,13 +452,15 @@ static int get_sha1_basic(const char *str, int len, 
unsigned char *sha1)
int at, reflog_len, nth_prior = 0;
 
if (len == 40 && !get_sha1_hex(str, sha1)) {
-   refs_found = dwim_ref(str, len, tmp_sha1, &real_ref);
-   if (refs_found > 0 && warn_ambiguous_refs) {
-   warning(warn_msg, len, str);
-   if (advice_object_name_warning)
-   fprintf(stderr, %s\n

[PATCH 2/7] sha1_object_info_extended: rename status to type

2013-07-12 Thread Jeff King
The value we get from each low-level object_info function
(e.g., loose, packed) is actually the object type (or -1 for
error). Let's explicitly call it type, which will make
further refactorings easier to read.

Signed-off-by: Jeff King p...@peff.net
---
 sha1_file.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 4c2365f..e826066 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2394,7 +2394,7 @@ int sha1_object_info_extended(const unsigned char *sha1, 
struct object_info *oi)
 {
struct cached_object *co;
struct pack_entry e;
-   int status, rtype;
+   int type, rtype;
 
co = find_cached_object(sha1);
if (co) {
@@ -2408,23 +2408,23 @@ int sha1_object_info_extended(const unsigned char 
*sha1, struct object_info *oi)
 
if (!find_pack_entry(sha1, &e)) {
/* Most likely it's a loose object. */
-   status = sha1_loose_object_info(sha1, oi->sizep, oi->disk_sizep);
-   if (status >= 0) {
+   type = sha1_loose_object_info(sha1, oi->sizep, oi->disk_sizep);
+   if (type >= 0) {
oi->whence = OI_LOOSE;
-   return status;
+   return type;
}
 
/* Not a loose object; someone else may have just packed it. */
reprepare_packed_git();
if (!find_pack_entry(sha1, &e))
-   return status;
+   return type;
}
 
-   status = packed_object_info(e.p, e.offset, oi->sizep, &rtype,
-   oi->disk_sizep);
-   if (status < 0) {
+   type = packed_object_info(e.p, e.offset, oi->sizep, &rtype,
+ oi->disk_sizep);
+   if (type < 0) {
mark_bad_packed_object(e.p, sha1);
-   status = sha1_object_info_extended(sha1, oi);
+   type = sha1_object_info_extended(sha1, oi);
} else if (in_delta_base_cache(e.p, e.offset)) {
oi->whence = OI_DBCACHED;
} else {
@@ -2435,7 +2435,7 @@ int sha1_object_info_extended(const unsigned char *sha1, 
struct object_info *oi)
 rtype == OBJ_OFS_DELTA);
}
 
-   return status;
+   return type;
 }
 
 int sha1_object_info(const unsigned char *sha1, unsigned long *sizep)
-- 
1.8.3.rc3.24.gec82cb9



[PATCH 3/7] sha1_loose_object_info: make type lookup optional

2013-07-12 Thread Jeff King
Until recently, the only items to request from
sha1_object_info_extended were type and size. This meant
that we always had to open a loose object file to determine
one or the other.  But with the addition of the disk_size
query, it's possible that we can fulfill the query without
even opening the object file at all. However, since the
function interface always returns the type, we have no way
of knowing whether the caller cares about it or not.

This patch only modifies sha1_loose_object_info to make type
lookup optional using an out-parameter, similar to the way
the size is handled (and the return value is 0 or -1 for
success or error, respectively).

There should be no functional change yet, though, as
sha1_object_info_extended, the only caller, will always ask
for a type.

Signed-off-by: Jeff King p...@peff.net
---
Obviously the end goal is to have sha1_object_info_extended do this
optionally, too (which happens in patch 6).

I'm not too happy about the stat_sha1_file function, which is almost
identical to open_sha1_file (and which in turn is almost the same thing
as has_loose_object). They all do:

  try operation X on sha1_file_name(sha1)
  prepare_alt_odb();
  foreach alt_odb
try operation X on alt_odb/sha1_file_name(sha1)

Unfortunately it's hard to do this kind of factoring out in C, because
the argument and return types for operation X are different in these
cases; you are stuck with providing callback function that takes a void
pointer to some operation-specific data. The boilerplate ends up worse
than the repeated code.

Another solution would be to have a find the file for loose object Y
function, and then just do operation X on that. But since X is a
filesystem operation in each case, you do not want to lose the atomicity
of performing the operation directly (not to mention incurring the cost
of an extra stat() on each file).

So I am open to clever refactoring suggestions.

 sha1_file.c | 48 +++-
 1 file changed, 43 insertions(+), 5 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index e826066..39e7313 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1306,6 +1306,26 @@ static int git_open_noatime(const char *name)
}
 }
 
+static int stat_sha1_file(const unsigned char *sha1, struct stat *st)
+{
+   char *name = sha1_file_name(sha1);
+   struct alternate_object_database *alt;
+
+   if (!lstat(name, st))
+   return 0;
+
+   prepare_alt_odb();
+   errno = ENOENT;
+   for (alt = alt_odb_list; alt; alt = alt->next) {
+   name = alt->name;
+   fill_sha1_path(name, sha1);
+   if (!lstat(alt->base, st))
+   return 0;
+   }
+
+   return -1;
+}
+
 static int open_sha1_file(const unsigned char *sha1)
 {
int fd;
@@ -2363,7 +2383,9 @@ struct packed_git *find_sha1_pack(const unsigned char 
*sha1,
 
 }
 
-static int sha1_loose_object_info(const unsigned char *sha1, unsigned long 
*sizep,
+static int sha1_loose_object_info(const unsigned char *sha1,
+ enum object_type *typep,
+ unsigned long *sizep,
  unsigned long *disk_sizep)
 {
int status;
@@ -2372,6 +2394,20 @@ static int sha1_loose_object_info(const unsigned char 
*sha1, unsigned long *size
git_zstream stream;
char hdr[32];
 
+   /*
+* If we don't care about type or size, then we don't
+* need to look inside the object at all.
+*/
+   if (!typep && !sizep) {
+   if (disk_sizep) {
+   struct stat st;
+   if (stat_sha1_file(sha1, &st) < 0)
+   return -1;
+   *disk_sizep = st.st_size;
+   }
+   return 0;
+   }
+
map = map_sha1_file(sha1, mapsize);
if (!map)
return -1;
@@ -2386,7 +2422,9 @@ static int sha1_loose_object_info(const unsigned char 
*sha1, unsigned long *size
*sizep = size;
git_inflate_end(stream);
munmap(map, mapsize);
-   return status;
+   if (typep)
+   *typep = status;
+   return 0;
 }
 
 /* returns enum object_type or negative */
@@ -2408,8 +2446,8 @@ int sha1_object_info_extended(const unsigned char *sha1, 
struct object_info *oi)
 
if (!find_pack_entry(sha1, &e)) {
/* Most likely it's a loose object. */
-   type = sha1_loose_object_info(sha1, oi->sizep, oi->disk_sizep);
-   if (type >= 0) {
+   if (!sha1_loose_object_info(sha1, &type,
+   oi->sizep, oi->disk_sizep)) {
oi-whence = OI_LOOSE;
return type;
}
@@ -2417,7 +2455,7 @@ int sha1_object_info_extended(const unsigned char *sha1, 
struct object_info *oi)
/* Not a loose object; someone else may have

[PATCH 5/7] packed_object_info: make type lookup optional

2013-07-12 Thread Jeff King
Currently, packed_object_info can save some work by not
calculating the size or disk_size of the object if the
caller is not interested. However, it always calculates the
true object type, whether the caller cares or not, and only
optionally returns the easy-to-get representation type.

Let's swap these types. The function will now return the
representation type (or OBJ_BAD on failure), and will only
optionally fill in the true type.

There should be no behavior change yet, as the only caller,
sha1_object_info_extended, will always feed it a type
pointer.

Signed-off-by: Jeff King p...@peff.net
---
 sha1_file.c | 25 -
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 6f4aff3..2a1e230 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1783,7 +1783,7 @@ static int packed_object_info(struct packed_git *p, off_t 
obj_offset,
 }
 
 static int packed_object_info(struct packed_git *p, off_t obj_offset,
- unsigned long *sizep, int *rtype,
+ enum object_type *typep, unsigned long *sizep,
  unsigned long *disk_sizep)
 {
struct pack_window *w_curs = NULL;
@@ -1791,11 +1791,12 @@ static int packed_object_info(struct packed_git *p, 
off_t obj_offset,
off_t curpos = obj_offset;
enum object_type type;
 
+   /*
+* We always get the representation type, but only convert it to
+* a real type later if the caller is interested.
+*/
type = unpack_object_header(p, &w_curs, &curpos, &size);
 
-   if (rtype)
-   *rtype = type; /* representation type */
-
if (sizep) {
if (type == OBJ_OFS_DELTA || type == OBJ_REF_DELTA) {
off_t tmp_pos = curpos;
@@ -1820,7 +1821,13 @@ static int packed_object_info(struct packed_git *p, 
off_t obj_offset,
*disk_sizep = revidx[1].offset - obj_offset;
}
 
-   type = packed_to_object_type(p, obj_offset, type, &w_curs, curpos);
+   if (typep) {
+   *typep = packed_to_object_type(p, obj_offset, type, &w_curs, curpos);
+   if (*typep < 0) {
+   type = OBJ_BAD;
+   goto out;
+   }
+   }
 
 out:
unuse_pack(w_curs);
@@ -2471,11 +2478,11 @@ int sha1_object_info_extended(const unsigned char 
*sha1, struct object_info *oi)
return -1;
}
 
-	type = packed_object_info(e.p, e.offset, oi->sizep, &rtype,
-				  oi->disk_sizep);
-	if (type < 0) {
+	rtype = packed_object_info(e.p, e.offset, &type, oi->sizep,
+				   oi->disk_sizep);
+	if (rtype < 0) {
 		mark_bad_packed_object(e.p, sha1);
-		type = sha1_object_info_extended(sha1, oi);
+		return sha1_object_info_extended(sha1, oi);
} else if (in_delta_base_cache(e.p, e.offset)) {
oi-whence = OI_DBCACHED;
} else {
-- 
1.8.3.rc3.24.gec82cb9

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/7] packed_object_info: hoist delta type resolution to helper

2013-07-12 Thread Jeff King
To calculate the type of a packed object, we must walk down
its delta chain until we hit a true base object with a real
type. Most of the code in packed_object_info is for handling
this case.

Let's hoist it out into a separate helper function, which
will make it easier to make the type-lookup optional in the
future (and keep our indentation level sane).

Signed-off-by: Jeff King p...@peff.net
---
Annoyingly, since the part we are moving comprises the bulk
of the function, the diff tends to show the opposite of what
actually happened: it looks like we renamed the function to its helper
name, and moved the other parts to a new function, which happens to have
the same name as the old one. So read the diff with care. :)

 sha1_file.c | 93 +++--
 1 file changed, 53 insertions(+), 40 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 39e7313..6f4aff3 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1713,52 +1713,21 @@ static int packed_object_info(struct packed_git *p, off_t obj_offset,
return type;
 }
 
-
 #define POI_STACK_PREALLOC 64
 
-static int packed_object_info(struct packed_git *p, off_t obj_offset,
- unsigned long *sizep, int *rtype,
- unsigned long *disk_sizep)
+static enum object_type packed_to_object_type(struct packed_git *p,
+ off_t obj_offset,
+ enum object_type type,
+ struct pack_window **w_curs,
+ off_t curpos)
 {
-   struct pack_window *w_curs = NULL;
-   unsigned long size;
-   off_t curpos = obj_offset;
-   enum object_type type;
off_t small_poi_stack[POI_STACK_PREALLOC];
off_t *poi_stack = small_poi_stack;
int poi_stack_nr = 0, poi_stack_alloc = POI_STACK_PREALLOC;
 
-	type = unpack_object_header(p, &w_curs, &curpos, &size);
-
-   if (rtype)
-   *rtype = type; /* representation type */
-
-   if (sizep) {
-   if (type == OBJ_OFS_DELTA || type == OBJ_REF_DELTA) {
-   off_t tmp_pos = curpos;
-		off_t base_offset = get_delta_base(p, &w_curs, &tmp_pos,
-  type, obj_offset);
-   if (!base_offset) {
-   type = OBJ_BAD;
-   goto out;
-   }
-			*sizep = get_size_from_delta(p, &w_curs, tmp_pos);
-   if (*sizep == 0) {
-   type = OBJ_BAD;
-   goto out;
-   }
-   } else {
-   *sizep = size;
-   }
-   }
-
-   if (disk_sizep) {
-		struct revindex_entry *revidx = find_pack_revindex(p, obj_offset);
-   *disk_sizep = revidx[1].offset - obj_offset;
-   }
-
while (type == OBJ_OFS_DELTA || type == OBJ_REF_DELTA) {
off_t base_offset;
+   unsigned long size;
/* Push the object we're going to leave behind */
		if (poi_stack_nr >= poi_stack_alloc && poi_stack == small_poi_stack) {
poi_stack_alloc = alloc_nr(poi_stack_nr);
@@ -1769,11 +1738,11 @@ static int packed_object_info(struct packed_git *p, off_t obj_offset,
}
poi_stack[poi_stack_nr++] = obj_offset;
/* If parsing the base offset fails, just unwind */
-		base_offset = get_delta_base(p, &w_curs, &curpos, type, obj_offset);
+		base_offset = get_delta_base(p, w_curs, curpos, type, obj_offset);
if (!base_offset)
goto unwind;
curpos = obj_offset = base_offset;
-		type = unpack_object_header(p, &w_curs, &curpos, &size);
+		type = unpack_object_header(p, w_curs, curpos, &size);
		if (type <= OBJ_NONE) {
/* If getting the base itself fails, we first
 * retry the base, otherwise unwind */
@@ -1800,7 +1769,6 @@ out:
 out:
if (poi_stack != small_poi_stack)
free(poi_stack);
-   unuse_pack(w_curs);
return type;
 
 unwind:
@@ -1814,6 +1782,51 @@ unwind:
goto out;
 }
 
+static int packed_object_info(struct packed_git *p, off_t obj_offset,
+ unsigned long *sizep, int *rtype,
+ unsigned long *disk_sizep)
+{
+   struct pack_window *w_curs = NULL;
+   unsigned long size;
+   off_t curpos = obj_offset;
+   enum object_type type;
+
+	type = unpack_object_header(p, &w_curs, &curpos, &size);
+
+   if (rtype)
+   *rtype = type; /* representation type */
+
+   if (sizep) {
+   if (type

[PATCH 6/7] sha1_object_info_extended: make type calculation optional

2013-07-12 Thread Jeff King
Each caller of sha1_object_info_extended sets up an
object_info struct to tell the function which elements of
the object it wants to get. Until now, getting the type of
the object has always been required (and it is returned via
the return type rather than a pointer in object_info).

This can involve actually opening a loose object file to
determine its type, or following delta chains to determine a
packed file's base type. These effects produce a measurable
slow-down when doing a "cat-file --batch-check" that does
not include "%(objecttype)".

This patch adds a typep query to struct object_info, so
that it can be optionally queried just like size and
disk_size. As a result, the return type of the function is
no longer the object type, but rather 0/-1 for success/error.

As there are only three callers total, we just fix up each
caller rather than keep a compatibility wrapper:

  1. The simpler sha1_object_info wrapper continues to
 always ask for and return the type field.

  2. The istream_source function wants to know the type, and
 so always asks for it.

  3. The cat-file batch code asks for the type only when
 %(objecttype) is part of the format string.

On linux.git, the best-of-five for running:

  $ git rev-list --objects --all >objects
  $ time git cat-file --batch-check='%(objectsize:disk)' <objects

on a fully packed repository goes from:

  real0m8.680s
  user0m8.160s
  sys 0m0.512s

to:

  real0m7.205s
  user0m6.580s
  sys 0m0.608s

Signed-off-by: Jeff King p...@peff.net
---
This ends up changing the behavior of sha1_object_info_extended without
changing its function signature. Given that it is a fairly inactive area
of the code and that there are no topics in flight, I think this is OK.
But an alternative could be to add (yet another) wrapper to leave the
first two call-sites untouched.

 builtin/cat-file.c |  7 ---
 cache.h|  1 +
 sha1_file.c| 20 +---
 streaming.c|  2 +-
 4 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index fe5c77f..163ce6c 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -150,7 +150,9 @@ static void expand_atom(struct strbuf *sb, const char *atom, int len,
 		if (!data->mark_query)
 			strbuf_addstr(sb, sha1_to_hex(data->sha1));
 	} else if (is_atom("objecttype", atom, len)) {
-		if (!data->mark_query)
+		if (data->mark_query)
+			data->info.typep = &data->type;
+		else
 			strbuf_addstr(sb, typename(data->type));
 	} else if (is_atom("objectsize", atom, len)) {
 		if (data->mark_query)
@@ -229,8 +231,7 @@ static int batch_one_object(const char *obj_name, struct batch_options *opt,
return 0;
}
 
-	data->type = sha1_object_info_extended(data->sha1, &data->info);
-	if (data->type <= 0) {
+	if (sha1_object_info_extended(data->sha1, &data->info) < 0) {
 		printf("%s missing\n", obj_name);
fflush(stdout);
return 0;
diff --git a/cache.h b/cache.h
index c1fd82c..d3b770c 100644
--- a/cache.h
+++ b/cache.h
@@ -1130,6 +1130,7 @@ struct object_info {
 
 struct object_info {
/* Request */
+   enum object_type *typep;
unsigned long *sizep;
unsigned long *disk_sizep;
 
diff --git a/sha1_file.c b/sha1_file.c
index 2a1e230..52f7a1e 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2452,24 +2452,26 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi)
 {
struct cached_object *co;
struct pack_entry e;
-   int type, rtype;
+   int rtype;
 
co = find_cached_object(sha1);
if (co) {
+		if (oi->typep)
+			*(oi->typep) = co->type;
 		if (oi->sizep)
 			*(oi->sizep) = co->size;
 		if (oi->disk_sizep)
 			*(oi->disk_sizep) = 0;
 		oi->whence = OI_CACHED;
-		return co->type;
+		return 0;
 	}
 
 	if (!find_pack_entry(sha1, &e)) {
 		/* Most likely it's a loose object. */
-		if (!sha1_loose_object_info(sha1, &type,
+		if (!sha1_loose_object_info(sha1, oi->typep,
 					    oi->sizep, oi->disk_sizep)) {
 			oi->whence = OI_LOOSE;
-			return type;
+			return 0;
}
 
/* Not a loose object; someone else may have just packed it. */
@@ -2478,7 +2480,7 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi)
return -1;
}
 
-	rtype = packed_object_info(e.p, e.offset, &type, oi->sizep,
+	rtype = packed_object_info(e.p, e.offset, oi->typep, oi->sizep,
				   oi->disk_sizep);
	if (rtype < 0

[PATCH 7/7] sha1_object_info_extended: pass object_info to helpers

2013-07-12 Thread Jeff King
We take in a struct object_info which contains pointers to
storage for items the caller cares about. But then rather
than pass the whole object to the low-level loose/packed
helper functions, we pass the individual pointers.

Let's pass the whole struct instead, which will make adding
more items later easier.

Signed-off-by: Jeff King p...@peff.net
---
This one is an optional cleanup. The diff is quite noisy due to all of
the s/foo/oi-foo/, so it is arguable whether the result is nicer or
not. It would make later additions to object_info nicer, but I do not
plan to add any more.

It _would_ have been a nice cleanup to do at the beginning of the series
(and further diffs would not have to add extra parameters to the
function calls), but that would make the incremental learn to do type
optionally patches quite awkward.

So I am on the fence over this one, and do not mind too much if it gets
dropped.

 sha1_file.c | 49 ++---
 1 file changed, 22 insertions(+), 27 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 52f7a1e..563f521 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1783,8 +1783,7 @@ static int packed_object_info(struct packed_git *p, off_t obj_offset,
 }
 
 static int packed_object_info(struct packed_git *p, off_t obj_offset,
- enum object_type *typep, unsigned long *sizep,
- unsigned long *disk_sizep)
+ struct object_info *oi)
 {
struct pack_window *w_curs = NULL;
unsigned long size;
@@ -1797,7 +1796,7 @@ static int packed_object_info(struct packed_git *p, off_t obj_offset,
 */
 	type = unpack_object_header(p, &w_curs, &curpos, &size);
 
-   if (sizep) {
+	if (oi->sizep) {
if (type == OBJ_OFS_DELTA || type == OBJ_REF_DELTA) {
off_t tmp_pos = curpos;
 		off_t base_offset = get_delta_base(p, &w_curs, &tmp_pos,
@@ -1806,24 +1805,24 @@ static int packed_object_info(struct packed_git *p, off_t obj_offset,
type = OBJ_BAD;
goto out;
}
-			*sizep = get_size_from_delta(p, &w_curs, tmp_pos);
-			if (*sizep == 0) {
+			*oi->sizep = get_size_from_delta(p, &w_curs, tmp_pos);
+			if (*oi->sizep == 0) {
type = OBJ_BAD;
goto out;
}
} else {
-			*sizep = size;
+			*oi->sizep = size;
}
}
 
-	if (disk_sizep) {
+	if (oi->disk_sizep) {
 		struct revindex_entry *revidx = find_pack_revindex(p, obj_offset);
-		*disk_sizep = revidx[1].offset - obj_offset;
+		*oi->disk_sizep = revidx[1].offset - obj_offset;
}
 
-	if (typep) {
-		*typep = packed_to_object_type(p, obj_offset, type, &w_curs, curpos);
-		if (*typep < 0) {
+	if (oi->typep) {
+		*oi->typep = packed_to_object_type(p, obj_offset, type, &w_curs, curpos);
+		if (*oi->typep < 0) {
type = OBJ_BAD;
goto out;
}
@@ -2404,9 +2403,7 @@ static int sha1_loose_object_info(const unsigned char *sha1,
 }
 
 static int sha1_loose_object_info(const unsigned char *sha1,
- enum object_type *typep,
- unsigned long *sizep,
- unsigned long *disk_sizep)
+ struct object_info *oi)
 {
int status;
unsigned long mapsize, size;
@@ -2418,12 +2415,12 @@ static int sha1_loose_object_info(const unsigned char *sha1,
 * If we don't care about type or size, then we don't
 * need to look inside the object at all.
 */
-	if (!typep && !sizep) {
-		if (disk_sizep) {
+	if (!oi->typep && !oi->sizep) {
+		if (oi->disk_sizep) {
 			struct stat st;
 			if (stat_sha1_file(sha1, &st) < 0)
 				return -1;
-			*disk_sizep = st.st_size;
+			*oi->disk_sizep = st.st_size;
}
return 0;
}
@@ -2431,19 +2428,19 @@ static int sha1_loose_object_info(const unsigned char *sha1,
 	map = map_sha1_file(sha1, &mapsize);
 	if (!map)
 		return -1;
-	if (disk_sizep)
-		*disk_sizep = mapsize;
+	if (oi->disk_sizep)
+		*oi->disk_sizep = mapsize;
 	if (unpack_sha1_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0)
 		status = error("unable to unpack %s header",
			       sha1_to_hex(sha1));
	else if ((status = parse_sha1_header(hdr, &size)) < 0

Re: [PATCH 1/7] cat-file: disable object/refname ambiguity check for batch mode

2013-07-12 Thread Jeff King
On Fri, Jul 12, 2013 at 10:47:46AM +0200, Michael Haggerty wrote:

  The solution feels a little hacky, but I'm not sure there is a better
  one short of reverting the warning entirely.
  
  We could also tie it to warn_ambiguous_refs (or add another config
  variable), but I don't think that makes sense. It is not about "do I
  care about ambiguities in this repository", but rather "I am going to
  do a really large number of sha1 resolutions, and I do not want to pay
  the price in this particular code path for the warning".
  
  There may be other sites that resolve a large number of refs and run
  into this, but I couldn't think of any. Doing for_each_ref would not
  have the same problem, as we already know they are refs there.
 
 ISTM that callers of --batch-check would usually know whether they are
 feeding it SHA-1s or arbitrary object specifiers.  (And in most cases
 when there are a large number of them, they are probably all SHA-1s).

Thanks for bringing this up; I had meant to outline this option in the
cover letter, but forgot when posting.

If we were designing cat-file from scratch, I'd consider having --batch
take just 40-hex sha1s in the first place. Since we can't do that now
for backwards compatibility, having an option is the best we can do
there.

But having to use such an option to avoid a 10x slowdown just strikes me
as ugly for two reasons:

  1. It's part of the user-visible interface, and it seems like an extra
     "wtf?" moment when somebody wonders why their script is painfully
 slow, and why they need a magic option to fix it. We're cluttering
 the interface forever.

  2. It's a sign that the refname check was put in for a specific
 performance profile (resolving one or a few refs), but didn't
 consider resolving a large number. I'm wondering what other
 performance regressions it's possible to trigger.

 So instead of framing this change as "skip object refname ambiguity
 check" (i.e., sacrifice some correctness for performance), perhaps it
 would be less hacky to offer the following new feature:

I wouldn't frame it as correctness for performance at all. The check
does not impact behavior at all, and is purely informational (to catch
quite a rare situation, too). I'd argue that the check simply isn't that
interesting in this case in the first place.

 To cat-file we could add an option like --sha1-only or --literal or
 --no-dwim (... better names are failing me) which would skip *all*
 dwimming of 40-character strings.  It would also assume that any shorter
 strings are abbreviated SHA-1s and fail if they are not.  This would be
 a nice feature by itself (these are object names, dammit, and don't try
 to tell me differently!) and would have the additional small advantage
 of speeding up lookups of abbreviated SHA-1s, which (regardless of your
 patch) otherwise go through the whole DWIM process.

I can see in theory that somebody might want that, but I am having a
hard time thinking of a practical use.

 I understand that such an option alone would leave some old scripts
 suffering the performance regression caused by the check, but in most
 cases it would be easy to fix them.

I'm less worried about old scripts, and more worried about carrying
around a confusing option forever, especially when I expect most
invocations to use it (perhaps my experience is biased, but I use
cat-file --batch quite a lot in scripting, and it has almost always been
on the output of rev-list).

IOW, it seems like a poor default, and we are choosing it only because
of backwards compatibility. I guess another option is to switch the
default with the usual deprecation dance.

Yet another option is to consider what the check is doing, and
accomplish the same thing in a different way. The real pain is that we
are individually trying to resolve each object by hitting the filesystem
(and doing lots of silly checks on the refname format, when we know it
must be valid).

We don't actually care in this case if the ref list is up to date (we
are not trying to update or read a ref, but only know if it exists, and
raciness is OK). IOW, could we replace the dwim_ref call for the warning
with something that directly queries the ref cache?

-Peff


Re: [PATCH v3] config: add support for http.url.* settings

2013-07-12 Thread Jeff King
On Thu, Jul 11, 2013 at 02:50:03PM -0700, Kyle J. McKay wrote:

 The url value is considered a match to a url if the url value
 is either an exact match or a prefix of the url which ends on a
 path component boundary ('/').  So "https://example.com/test" will
 match "https://example.com/test" and "https://example.com/test/too"
 but not "https://example.com/testextra".
 
 Longer matches take precedence over shorter matches with
 environment variable settings taking precedence over all.
 
 With this configuration:
 
 [http]
 useragent = other-agent
 noEPSV = true
  [http "https://example.com"]
  useragent = example-agent
  sslVerify = false
  [http "https://example.com/path"]
 useragent = path-agent

I like the general direction you are going, and especially the path
prefix matching is something I had always meant to implement for the
credential code.

A few comments on the implementation:

 +enum http_option_type {
 + opt_post_buffer = 0,
 + opt_min_sessions,
 +#ifdef USE_CURL_MULTI
 + opt_max_requests,
 +#endif
 + opt_ssl_verify,
 + opt_ssl_try,
 + opt_ssl_cert,
 +#if LIBCURL_VERSION_NUM >= 0x070903
 + opt_ssl_key,
 +#endif
 +#if LIBCURL_VERSION_NUM >= 0x070908
 + opt_ssl_capath,
 +#endif
 + opt_ssl_cainfo,
 + opt_low_speed,
 + opt_low_time,
 + opt_no_epsv,
 + opt_http_proxy,
 + opt_cookie_file,
 + opt_user_agent,
 + opt_passwd_req,
 + opt_max
 +};

We usually put enum fields in ALL_CAPS to mark them as constants (though
you can find few exceptions in the code).

 +static size_t http_options_url_match_prefix(const char *url,
 + const char *url_prefix,
 + size_t url_prefix_len)
 +{

It looks like you're matching the URLs as raw strings, and I don't see
any canonicalization going on. What happens if I have
"https://example.com/foo+bar" in my config, but then I visit
"https://example.com/foo%20bar"?

Or what about optional components? If I have https://example.com; in my
config, but I am visiting "https://p...@example.com/", shouldn't it
match? The config spec is more general than my specific URL; you would
not want it to match in the opposite direction, though.

I think you would want to break down the URL into fields, canonicalize
each field by undoing any encoding, and then compare the broken-down
URLs, as credential_match does (adding in your prefix/boundary matching to
the path instead of a straight string comparison).

I think you could mostly reuse the code there by doing:

  1. Create a struct url that contains the broken-down form; struct
 credential would contain a url.

  2. Pull out credential_from_url into int url_parse(struct url *,
 const char *).

  3. Pull out credential_match into int url_match(struct url *, struct
 url *).

  4. Parse and compare URLs from http.$URL.* during the config read
 (similar to credential_config_callback).

That does make your by-length ordering impossible, but I don't think you
can do it in the general case. If I have:

  [http "http://p...@example.com"] foo = 1
  [http "http://example.com:8080"] foo = 2

and I visit "http://p...@example.com:8080", which one is the winner? I
don't think there is an unambiguous definition. I'd suggest instead just
using the usual last one wins strategy that our config uses. It has
the unfortunate case that:

  [http "http://example.com"]
 foo = 1
  [http]
 foo = 2

will always choose http.foo=2, but I don't think that is a big problem
in practice. You often only set one or the other, and in the cases where
you want to have a non-standard default and then override it with
another host-specific case, the last one wins rule makes it simple to
explain that you need to specify keys in most-general to most-specific
order.

  static int http_options(const char *var, const char *value, void *cb)
  {
 - if (!strcmp("http.sslverify", var)) {
 + const char *url = (const char *)cb;

No need to cast from void, this isn't C++. :)

The rest of the http_options() changes look like what I'd expect.

 @@ -344,7 +479,7 @@ void http_init(struct remote *remote, const char *url, int proactive_auth)
  
   http_is_verbose = 0;
  
 - git_config(http_options, NULL);
 + git_config(http_options, (void *)url);

No need to cast again. :)

So this is the URL that we got on the command line. Which means that if
we get a redirect, we will not re-examine the options. I think that's
OK; we do not even _see_ the redirect, as it happens internally within
curl. The credential code has the same problem, but it is not a security
issue because curl clears the credentials on redirect.

However, it may be worth mentioning in the documentation that the config
options operate on the URL you give git, _not_ necessarily on the actual
URL you are hitting at any given moment (the gitcredentials(7) page
should probably make the same distinction).

-Peff

Re: [PATCH v5 0/5] allow more sources for config values

2013-07-12 Thread Jeff King
On Thu, Jul 11, 2013 at 04:26:02PM -0700, Junio C Hamano wrote:

 Thanks.
 
 The differences since the last round I see are these.  And I think
 the series overall makes sense (I haven't look hard enough to pick
 any nits yet, though).

Both v4 and the interdiff look fine to me. I gave it one more cursory
read-through, and I don't see any problems. So:

Acked-by: Jeff King p...@peff.net

-Peff


[PATCH] t0008: avoid SIGPIPE race condition on fifo

2013-07-12 Thread Jeff King
On Thu, Jul 11, 2013 at 09:09:55AM -0700, Junio C Hamano wrote:

 Jeff King p...@peff.net writes:
 
  On Wed, Jul 10, 2013 at 12:36:40PM -0400, Brian Gernhardt wrote:
 
  The newest test in t0008, "streaming support for --stdin", seems to
  hang sporadically on my MacBook Pro (running 10.8.4).  The hang seems
  to be related to running it in parallel with other tests, as I can
  only reliably cause it by running with "prove" and "-j 3".  However, once
  that has hung I am able to semi-reliably have it occur by running the
  test separately (with the test hung in the background, using separate
  trash directories via the --root option).
 
  I can't replicate the hang here (on Linux) doing:
 
for i in `seq 1 30`; do
    ./t0008-ignores.sh --root=/tmp/foo/$i &
done
 
 It seems to hang on me with bash, but not dash, here.

Thanks, I was able to replicate it with bash, and like Brian, I saw it
hanging in the second read. strace revealed that we were blocked on
open(out).

The patch below should fix it. I'm still not sure why the choice of
shell matters; it may simply be a timing fluke that bash is more likely
to hit for some reason.

-- >8 --
Subject: [PATCH] t0008: avoid SIGPIPE race condition on fifo

To test check-ignore's --stdin feature, we use two fifos to
send and receive data. We carefully keep a descriptor to its
input open so that it does not receive EOF between input
lines. However, we do not do the same for its output. That
means there is a potential race condition in which
check-ignore has opened the output pipe once (when we read
the first line), and then writes the second line before we
have re-opened the pipe.

In that case, check-ignore gets a SIGPIPE and dies. The
outer shell then tries to open the output fifo but blocks
indefinitely, because there is no writer.  We can fix it by
keeping a descriptor open through the whole procedure.

This should also help if check-ignore dies for any other
reason (we would already have opened the fifo and would
therefore not block, but just get EOF on read).

However, we are technically still susceptible to
check-ignore dying early, before we have opened the fifo.
This is an unlikely race and shouldn't generally happen in
practice, though, so we can hopefully ignore it.

Signed-off-by: Jeff King p...@peff.net
---
 t/t0008-ignores.sh | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/t/t0008-ignores.sh b/t/t0008-ignores.sh
index a56db80..c29342d 100755
--- a/t/t0008-ignores.sh
+++ b/t/t0008-ignores.sh
@@ -697,13 +697,21 @@ test_expect_success PIPE 'streaming support for --stdin' '
# shell, and then echo to the fd. We make sure to close it at
# the end, so that the subprocess does get EOF and dies
# properly.
+   #
+	# Similarly, we must keep "out" open so that check-ignore does
+   # not ever get SIGPIPE trying to write to us. Not only would that
+   # produce incorrect results, but then there would be no writer on the
+   # other end of the pipe, and we would potentially block forever trying
+   # to open it.
 	exec 9>in &&
+	exec 8<out &&
 	test_when_finished "exec 9>&-" &&
+	test_when_finished "exec 8<&-" &&
 	echo >&9 one &&
-	read response <out &&
+	read response <&8 &&
 	echo "$response" | grep "^\.gitignore:1:one	one" &&
 	echo >&9 two &&
-	read response <out &&
+	read response <&8 &&
 	echo "$response" | grep "^::	two"
 '
 
-- 
1.8.3.rc1.30.gff0fb75



Re: [PATCH 0/7] cat-file --batch-check performance improvements

2013-07-12 Thread Jeff King
On Fri, Jul 12, 2013 at 10:23:34AM -0700, Junio C Hamano wrote:

 Jeff King p...@peff.net writes:
 
  The results for running (in linux.git):
 
   $ git rev-list --objects --all >objects
   $ git cat-file --batch-check='%(objectsize:disk)' <objects >/dev/null
 
 I can see how these patches are very logical avenue to grab only
 on-disk footprint for large number of objects, but among the type,
 payload size and on-disk footprint, I find it highly narrow niche
 that a real user or script is interested _only_ in on-disk footprint
 without even worrying about the type of object.

Yeah, I agree it is a bit of a niche. However, there are other code
paths that might want only the size and not the type (e.g., we already
know the object is a blob, but want to know size before deciding how to
handle diff). But in general, I doubt the performance impact is a big
deal there. It's only measurable when you're doing millions of objects.

-Peff


Re: [PATCH] t0008: avoid SIGPIPE race condition on fifo

2013-07-12 Thread Jeff King
On Fri, Jul 12, 2013 at 09:23:54AM -0700, Junio C Hamano wrote:

  In that case, check-ignore gets a SIGPIPE and dies. The
  outer shell then tries to open the output fifo but blocks
  indefinitely, because there is no writer.  We can fix it by
  keeping a descriptor open through the whole procedure.
 
 Ahh, figures.
 
 I wish I were smart enough to figure that out immediately after
 seeing the test that does funny things to "in" with 9.

I wish I were smart enough to have figured it out when I was writing the
funny stuff with "in" and 9 in the first place. :)

-Peff


Re: [PATCH 3/3] wt-status: use format function attribute for status_printf

2013-07-12 Thread Jeff King
On Fri, Jul 12, 2013 at 09:10:30AM -0700, Junio C Hamano wrote:

  You can fix it with -Wno-format-zero-length, so the hassle is not
  huge. But I am also inclined to just drop this one. We have lived
  without the extra safety for a long time, and list review does tend to
  catch such problems in practice.
 
 I am tempted to actually merge the original one as-is without any of
 the workaround, and just tell people to use -Wno-format-zero-length.

Yeah, I think the only downside is the cognitive burden on individual
developers who try -Wall and have to figure out that we need
-Wno-format-zero-length (and that the warnings are not interesting).

It would be nice to add it automatically to CFLAGS, but I do not know if
we can reliably detect from the Makefile that we are compiling under
gcc.

-Peff


Re: [PATCH 1/7] cat-file: disable object/refname ambiguity check for batch mode

2013-07-14 Thread Jeff King
On Sun, Jul 14, 2013 at 08:45:37PM -0700, Junio C Hamano wrote:

  To cat-file we could add an option like --sha1-only or --literal or
  --no-dwim (... better names are failing me) which would skip *all*
  dwimming of 40-character strings.  It would also assume that any shorter
  strings are abbreviated SHA-1s and fail if they are not.  This would be
  a nice feature by itself (these are object names, dammit, and don't try
  to tell me differently!) and would have the additional small advantage
  of speeding up lookups of abbreviated SHA-1s, which (regardless of your
  patch) otherwise go through the whole DWIM process.
 
  I can see in theory that somebody might want that, but I am having a
  hard time thinking of a practical use.
 
 Would it be a good alternative to call get_sha1_hex() to catch the
 most common case (reading from rev-list output, for example) and
 then let the more general get_sha1() to let extended SHA-1 to be
 handled?

For a 40-byte sha1, that _should_ be what get_sha1 does (i.e., go more
or less directly to the 40-hex code path, and return). And that's
basically what happens now, except that after we do so, we now have the
extra "oh, is it also a refname?" check.

For a shortened sha1, I don't think it would have the same behavior.
Right now, I believe the order is to treat a short sha1 as a possible
refname, and only if that fails consider it as a short sha1.

  IOW, it seems like a poor default, and we are choosing it only because
  of backwards compatibility. I guess another option is to switch the
  default with the usual deprecation dance.
 
 I agree that "did you mean the unreadable refname or 40-hex object?"
 turned on everywhere get_sha1() is called is a very poor default.  I
 wonder if we can limit it only to the end-user input somehow at the
 API level.

It is easy to do on top of my patch (just flip the default on the switch
I introduced, and turn it back on in whichever code paths are
appropriate).  But the question is: what is end-user input? Do you mean
command-line arguments to rev-list and friends?

-Peff


Re: [PATCH 1/7] cat-file: disable object/refname ambiguity check for batch mode

2013-07-14 Thread Jeff King
On Fri, Jul 12, 2013 at 12:30:07PM +0200, Michael Haggerty wrote:

 But with particular respect to git cat-file, I see problems:
 
 1. get_ref_snapshot() would have to read all loose and packed refs
 within the specified subtree, because loose refs have to be read before
 packed refs.  So the call would be expensive if there are a lot of loose
 refs.  And DWIM wouldn't know in advance where the references might be,
so it would have to set prefix="".  If many refs are looked up, then it
 would presumably be worth it.  But if only a couple of lookups are done
 and there are a lot of loose refs, then using a cache would probably
 slow things down.
 
 The slowdown could be ameliorated by adding some more intelligence, for
 example only populating the loose refs cache after a certain number of
 lookups have already been done.

I had assumed we would load the loose ref cache progressively by
directory. Of course that only helps avoiding refs/foo/* when you are
interested in refs/heads/foo. If your refs/heads/* is very large,
you have to load all of it, and at some point it is cheaper to do a few
stat() calls than to actually readdir() the whole directory. On the
other hand, that is basically how packed-refs work now (if we hit any
ref, we have to load the whole file), and repos with many refs would
typically pack them all anyway.

 2. A git cat-file --batch process can be long-lived.  What guarantees
 would users expect regarding its lookup results?

I hadn't really intended this for general lookups, but just to speed up
the refname warning, where being out-of-date is more acceptable (since
the warning is a purely informational hint).

-Peff


Re: [PATCH v3] config: add support for http.<url>.* settings

2013-07-14 Thread Jeff King
On Fri, Jul 12, 2013 at 06:07:35AM -0700, Kyle J. McKay wrote:

 It looks like you're matching the URLs as raw strings, and I don't see
 any canonicalization going on. What happens if I have
 "https://example.com/foo+bar" in my config, but then I visit
 "https://example.com/foo%20bar"?
 
 The documentation for http.<url>.* says:

Yes, I know. My question was more rhetorical, as in what _should_
happen. It seems unfriendly not to do at least some basic
canonicalization with respect to encodings.

 That does make your by-length ordering impossible, but I don't
 think you
 can do it in the general case. If I have:
 
  [http "http://p...@example.com"] foo = 1
  [http "http://example.com:8080"] foo = 2

 and I visit "http://p...@example.com:8080", which one is the winner?
 
 If I were to spilt everything out, then I would only have the second
 one match.  The first config is on a different port, quite possibly
 an entirely different service.  It shouldn't match.

Yeah, sorry, that was a bad example, because a missing port is not
do not care, but rather default port. A better example is:

  [http "http://p...@example.com/"] foo = 1
  [http "http://example.com/path/"] foo = 2

If we see the URL "http://p...@example.com/path/foo", I would expect the
second one to match (it does not under a pure-prefix scheme). If we were
to make it match, you cannot say which of the two entries is more
"specific"; they are both specific in different ways.
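
A toy illustration of the problem with a pure string-prefix match; the URLs and the helper below are made up for the example (using a plain username in place of the obfuscated one above):

```python
# Under a pure string-prefix scheme, a config key matches only if it is
# a literal prefix of the request URL. This is a sketch of the failure
# mode, not git code.
def prefix_matches(config_url, request_url):
    return request_url.startswith(config_url)

request = "https://user@example.com/path/foo"
key_user = "https://user@example.com/"   # specific in the username
key_path = "https://example.com/path/"   # specific in the path
```

The path-specific key never matches at all, because the username in the request breaks the literal prefix, even though intuitively it is the "more specific" entry.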

 I don't think it's necessary to split the URL apart though.  Whatever
 URL the user gave to git on the command line (at some point even if
 it's now stored as a remote setting in config) complete with URL-
 encoding, user names, port names, etc. is the same url, possibly
 shortened, that needs to be used for the http.<url>.option setting.

I'm not sure I agree with this. The URL handed to git is not always
typed directly by the user. E.g., it might be cut-and-paste from an
email or website, or it may even be fed by a script (e.g., the repo
tool will pull URLs from its manifest file).

 I think that's simple and very easy to explain and avoids user
 confusion and surprise while still allowing a default to be set for a
 site but easily overridden for a portion of that site without needing
 to worry about the order config files are processed or the order of
 the [http "url"] sections within them.

But the user has to worry about last-one-wins anyway for same-length
prefixes, as you noted below.

 using the usual last one wins strategy that our config uses. It has
 the unfortunate case that:
 
  [http "http://example.com"]
 foo = 1
  [http]
 foo = 2
 
 will always choose http.foo=2, but I don't think that is a big problem
 in practice.
 
 I think that violates the principle of least surprise.  In this case:
 
 [http "https://cacert.org"]
   sslVerify = false
 [http]
   sslVerify = true

Sure, I think yours is less surprising in this case. But it is more
surprising in other cases, like ones where URL encoding or optional
components are involved.  E.g., imagine your two options above were
flipped (you do not usually verify ssl, but it is very important for you
to do so against cacert.org). An automated tool like repo tries to clone
from https://u...@cacert.org, and you might be surprised that SSL
verification is not turned on.

 -   git_config(http_options, NULL);
 +   git_config(http_options, (void *)url);
 
 No need to cast again. :)
 
 Ah, but there is in order to avoid a warning:

Ah, you're right. Sorry for the noise.

-Peff


Re: [PATCH v3] config: add support for http.<url>.* settings

2013-07-14 Thread Jeff King
On Sat, Jul 13, 2013 at 12:46:17PM -0700, Kyle J. McKay wrote:

 I expect it will be easier just to normalize the URL without
 splitting.  That is, lowercase the parts that are case-insensitive
 (scheme and host name) and adjust the URL-escaping to remove URL
 escaping (%xx) from characters that don't need it but add it to any
 for which it is required that are not escaped (according to RFC
 1738).

I think you are suggesting doing better than this, but just to be clear,
we cannot treat the URL as a simple string and just decode and
re-encode.

One of the things that gets encoded are the delimiting characters. So if
I have the URL:

  https://foo%3a...@example.com

you would canonicalize it into:

  https://foo:b...@example.com

But those are two different URLs entirely; the first has the username
"foo:bar", and the second has the username "foo" and the password "bar".
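
The collapse is easy to demonstrate with a standard URL parser. The concrete username and password below are made-up stand-ins for the (obfuscated) ones above:

```python
from urllib.parse import urlsplit, unquote

# Two syntactically different URLs (illustrative values):
a = "https://foo%3Abar@example.com/"   # username "foo:bar", no password
b = "https://foo:bar@example.com/"     # username "foo", password "bar"

# urlsplit leaves percent-escapes intact, so the two parse differently:
ua, ub = urlsplit(a), urlsplit(b)
```

A naive decode of the whole string (`unquote(a)`) yields exactly `b`, silently turning one set of credentials into another; that is why decoding has to happen per-component, after splitting.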

I admit that these are unlikely to come up in practice, but I am worried
that there is some room for mischief here. For example:

  https://example.com%2ftricky.host/repo.git

If we canonicalize that into:

  https://example.com/tricky.host/repo.git

and do a lookup, we think we are hitting example.com, but we are
actually hitting example.com%2ftricky.host (i.e., that is how curl will
interpret it).  If we were deciding to use a stored credential based on
that information, it would be quite bad (we would leak credentials to
the owner of example.com%2ftricky.host). I know your patch does not impact the
credential lookup behavior, but it would be nice in the long run if the
two lookups followed the same rules.

So I think the three options are basically:

  1. No decoding, require the user to use a consistent prefix between
 config and other uses of the URL. I.e., your current patch. The
 downside is that it doesn't handle any variation of input.

  2. Full decoding into constituent parts. This handles canonicalization
 of encoding, and also allows wildcard components (e.g., a URL
 with username can match the generic "https://example.com" in the
 config). The downside is that you cannot do a "longest prefix wins"
 rule for overriding.

  3. Full decoding as in (2), but then re-assemble into a canonicalized
 encoded URL. The upside is that you get to do "longest prefix
 wins", but you can no longer have wildcard components. I think this
 is what you are suggesting in your mail.

I'm still in favor of (2), because I think the wildcard components are
important (and while I agree that "longest prefix wins" is nicer, we
already have "last one wins" for the rest of the config, including the
credential URL matcher). But I certainly think (3) is better than (1).
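
A minimal sketch of what option (2) could look like: split both URLs into components and let a missing username in the config key act as a wildcard. This is an illustration of the idea under discussion, with invented helper names, not the proposed implementation:

```python
from urllib.parse import urlsplit

def url_matches(config_key, request):
    """Does a config key match a request URL, component by component?"""
    c, r = urlsplit(config_key), urlsplit(request)
    # Scheme and host must match exactly.
    if (c.scheme, c.hostname) != (r.scheme, r.hostname):
        return False
    # A key without a username matches any user (wildcard component).
    if c.username is not None and c.username != r.username:
        return False
    # Prefix-match the path, but only at a "/" boundary.
    cpath = c.path if c.path.endswith("/") else c.path + "/"
    return (r.path + "/").startswith(cpath)
```

With this scheme, the generic `https://example.com` matches a request carrying a username, which a literal string prefix cannot do.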

-Peff


Re: [PATCH v3] config: add support for http.<url>.* settings

2013-07-14 Thread Jeff King
On Sun, Jul 14, 2013 at 09:02:19PM -0700, Junio C Hamano wrote:

  Or proceed with what's there right now (there are a few pending
  updates from reviewers) and then, as Junio says above, adjust it later
  if needed?
 
 I have been assuming that strictly textual match will be a subset
 of the matching semantics Aaron and Peff suggested.  That is, if we
 include your version in the upcoming release, the user writes the
 http.<URLpattern>.variable configuration so that the entries match
 what they want them to match, the enhanced URL matcher Aaron and
 Peff suggested will still make them match.
 
 Am I mistaken?  Will there be some URLpattern that will not match
 with the same URL literally?

I think we need to decide now, because the two schemes are not
compatible, and switching will break setups. Yes, the matcher that Aaron
and I suggest is a strict superset (i.e., we will not stop matching
things that used to match), which is good. But we would not be able to
implement "longest prefix wins" overriding anymore, which would change
the meaning of cases like:

  [http "https://example.com"] foo = 1
  [http] foo = 2

(under Kyle's scheme it is 1, and under ours 2). We can probably
come up with some clever rules for overriding a broken-down URL that
would stay backwards compatible. E.g., longest-prefix-match if there are
no wildcarded components, and last-one-wins if there are. But that is
not a rule I would want readers to have to puzzle out in the
documentation.

So I think we are much better off to decide the semantics now.

-Peff


Re: [PATCH] Fix some sparse warnings

2013-07-16 Thread Jeff King
On Tue, Jul 16, 2013 at 07:57:20AM +0200, Johannes Sixt wrote:

 Am 7/15/2013 19:31, schrieb Ramsay Jones:
  Sparse issues three "Using plain integer as NULL pointer" warnings.
  Each warning relates to the use of an '{0}' initialiser expression
  in the declaration of a 'struct object_info'.
 
 I question the value of this warning. Initialization with '= {0}' is a
 well-established idiom, and sparse should know about it. Also, plain 0
 *is* a null pointer constant.

I agree with you. It's not a bug, and I think sparse is being overly
picky here; it is missing the forest for the trees in interpreting the
idiom.

Still, it may be worth tweaking in the name of eliminating compiler
noise, since it does not cost us very much to do so (and I believe we
have done so in the past, too).

We could also ask people with sparse to turn off the "use NULL instead
of 0" warning, but I think it _is_ a useful warning elsewhere (even
though it is never a bug, it violates our style guidelines and may be an
indication of a bug). It would be nice if sparse learned to ignore the
warning in this particular idiom, but I am not going to hold my breath
for that.

-Peff


Re: [PATCH] Add the GIT_SENTINEL macro

2013-07-18 Thread Jeff King
On Thu, Jul 18, 2013 at 09:02:12PM +0100, Ramsay Jones wrote:

 The sentinel function attribute is not understood by versions of
 the gcc compiler prior to v4.0. At present, for earlier versions
 of gcc, the build issues 108 warnings related to the unknown
 attribute. In order to suppress the warnings, we conditionally
 define the GIT_SENTINEL macro to provide the sentinel attribute
 for gcc v4.0 and newer.
 
 Signed-off-by: Ramsay Jones ram...@ramsay1.demon.co.uk
 ---

Acked-by: Jeff King p...@peff.net

 -__attribute__((sentinel))
 +GIT_SENTINEL(0)
  void argv_array_pushl(struct argv_array *, ...);

We could also add GIT_SENTINEL to handle the default-zero case, but I do
not think it is worth the trouble.

-Peff


Re: Git tag output order is incorrect (IMHO)

2013-07-19 Thread Jeff King
On Fri, Jul 19, 2013 at 12:40:55PM -0700, Junio C Hamano wrote:

 Andreas Schwab sch...@linux-m68k.org writes:
 
  Rahul Bansal rahul.ban...@rtcamp.com writes:
 
  IMHO git tag is expected to show tag-list ordered by versions. 
 
  A git tag can be anything, not related to versions at all.
 
 Correct.
 
 But that does not prevent somebody to add git tag --sort=X option
 to the command, just like git for-each-ref has --sort=X option.

A while ago I started on (but did not get very far on) unifying the ref
selection code for for-each-ref, tag, and branch. It would be nice if
they all supported the same set of --contains/--points-at/--merged/--sort,
etc.

I do plan to finish it eventually, but if anyone else feels like picking
it up, I'd be glad to review patches and/or share my work-in-progress as
a starting point.

-Peff


Re: getting git from kernel.org is failing

2013-07-23 Thread Jeff King
On Tue, Jul 23, 2013 at 10:06:05PM +0200, Fredrik Gustafsson wrote:

 Confirmed (tested both with and without trailing /, IIRC there was some
 configuration change recently that effect that):
 iveqy@kolya:~/slask/git$ git clone
 https://git.kernel.org/cgit/git/git.git/
 Cloning into 'git'...
 error: Unable to get pack index
 https://git.kernel.org/cgit/git/git.git/objects/pack/pack-1e2bca8b5127039cff42b1cd3d47767fb577cd4f.idx

That's weird. We shouldn't be fetching an index at all unless dumb http
is in use. When I try to clone from the URL above, I also get shunted to
dumb-http (and the repo on the server appears to be poorly packed).

But if I go to:

  https://git.kernel.org/pub/scm/git/git.git

then smart HTTP works fine. I wonder if there is a problem in the cgit
setup on kernel.org (or if it was even intended that you could fetch
from the cgit URL at all; the cgit page shows the clone URLs in
/pub/scm/git).

-Peff


Re: [PATCH v8 4/4] config: allow http.<url>.* any user matching

2013-07-24 Thread Jeff King
On Mon, Jul 22, 2013 at 01:24:06PM -0700, Kyle J. McKay wrote:

 I am not yet convinced that the precedence rule specified in this is
 what we want (I do not have an example why it is *not* what we want,
 either).  Another definition could be if user@ is present in the
 request, give lower precedence to config entries for the site
 without user@ than entries with user@, and I do not have a strong
 opinion myself which one between the two is better (and there may be
 third and other possible rule).
 
 Comments?
 
 Consider this site:
 [...]

Thanks for explaining, and sorry I missed out on the last few rounds of
review.

I think your scheme (normalization plus special handling of the username
field) addresses my biggest concern, which is matching in the face of
optional usernames. The only two things that make me wary are:

  1. The explanation and special-casing of username is a little
 complicated to explain.

  2. The behavior for resolving the value when faced with multiple
 possibilities is completely unlike the rest of the config system
 (both dropping last-one-wins, and unlike the URL matching for
 credentials).

I think we can decide that (2) is worth it if your semantics are more
flexible in practice. It would be nice to see real-world feedback on how
people use it before setting the behavior in stone, but there's sort of
a chicken and egg problem there.

For (1), I wonder if the explanation would be simpler if the precedences
of each sub-part were simply laid out. That is, would it be correct to
say something like:

  For a config key to match a URL, each element of the config key (if
  present) is compared to that of the URL, in the following order:

1. Protocol (e.g., `https` in `https://example.com/`). This field
   must match exactly between the config key and the URL.

2. Host/domain name (e.g., `example.com` in `https://example.com/`).
   This field must match exactly between the config key and the URL.

3. Path (e.g., `repo.git` in `https://example.com/repo.git`). This
   field is prefix-matched by slash-delimited path elements, so that
   config key `foo/` matches URL `foo/bar`. Longer matches take
   precedence (so `foo/bar`, if it exists, is a better match than
   just `foo/`).

4. Username (e.g., `user` in `https://u...@example.com/repo.git`).

  The list above is ordered by decreasing precedence; a URL that matches
  a config key's path is preferred to one that matches its username.

I don't know if that is more or less clear of an explanation. It makes
more sense to me, but that is probably because I wrote it. I'm also not
100% sure it describes your implementation, but I think it is equivalent
to the prefix matching with normalization.
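
One way to read that precedence list as code: among keys that already match, rank them so that a longer path match outranks a username match. An illustrative sketch with invented names, not the patch under review:

```python
from urllib.parse import urlsplit

def precedence(config_key):
    """Sort key for matching config entries: path specificity first,
    then username presence. Higher tuples win."""
    parts = urlsplit(config_key)
    path_segments = len([s for s in parts.path.split("/") if s])
    has_user = parts.username is not None
    return (path_segments, has_user)

# Both keys are assumed to match the request URL already; the path-deeper
# one should win even though the other pins down the username.
keys = ["https://user@example.com/path", "https://example.com/path/name"]
best = max(keys, key=precedence)
```

This reproduces the example in the quoted documentation: the settings in the `https://example.com/path/name` section take precedence.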

I have a few other comments on specific patches; I'll send them
separately.

-Peff


Re: [PATCH v8 4/4] config: allow http.<url>.* any user matching

2013-07-24 Thread Jeff King
On Mon, Jul 22, 2013 at 05:56:44AM -0700, Kyle J. McKay wrote:

 + matches a url if it refers to the same scheme, host and port and the
 + path portion is an exact match or a prefix that matches at a /
 + boundary.  If url does not include a user name, it will match a url
 + with any username otherwise the user name must match as well (the
 + password part, if present in the url, is always ignored).  Longer url
 + path matches take precedence over shorter matches no matter what order
 + they occur in.  For example, if both https://u...@example.com/path; and
 + https://example.com/path/name; are used as a config url value and
 + then https://u...@example.com/path/name/here; is passed to a git
 + command, the settings in the https://example.com/path/name; section

These "https://..." should probably be `https://...`, to mark them in
asciidoc as literals.

-Peff


Re: [PATCH v8 3/4] tests: add new test for the url_normalize function

2013-07-24 Thread Jeff King
On Mon, Jul 22, 2013 at 05:56:43AM -0700, Kyle J. McKay wrote:

 In order to perform sane URL matching for http.<url>.* options,
 http.c normalizes URLs before performing matches.
 
 A new test-url-normalize test program is introduced along with
 a new t5200-url-normalize.sh script to run the tests.

While looking at test-url-normalize (and wanting to experiment some with
your series to see how it handled a few behaviors), I wondered if we
shouldn't be exposing this selection process to the user somehow via
git-config.

That is, would a shell script ever want to say what is the value of
this config variable for url $X? Certainly our test scripts want to,
and having a test-* program covers that, but might user scripts want to
do the same? Or even to introduce its own URL-matched config options?

How hard would it be to convert the -c option of test-url-normalize
into something like:

  git config --file=foo --url http noepsv $URL

which would look for http.$URL.noepsv matches.

If we want to go that route, we don't have to do it now (the test-*
interface is unpublished, and the git-config interface can simply come
later than the internal feature).  But it may be worth thinking about
now while it is in your head.

 diff --git a/test-url-normalize.c b/test-url-normalize.c
 new file mode 100644
 index 000..f18bd88
 --- /dev/null
 +++ b/test-url-normalize.c
 [...]
 + if (!strcmp("sslverify", opt_lc.buf))
 + printf("%s\n", curl_ssl_verify ? "true" : "false");
 + else if (!strcmp("sslcert", opt_lc.buf))
 + printf("%s\n", ssl_cert);
 +#if LIBCURL_VERSION_NUM >= 0x070903
 + else if (!strcmp("sslkey", opt_lc.buf))
 + printf("%s\n", ssl_key);
 +#endif
 +#if LIBCURL_VERSION_NUM >= 0x070908
 + else if (!strcmp("sslcapath", opt_lc.buf))
 + printf("%s\n", ssl_capath);
 +#endif
 [...]

Do we need to have the complete list of options here, including curl
version limitations? It seems like this will eventually get out of date
with the list of options. Would it be sufficient to test just one (or
even just handle a fake http.$URL.foo variable)?

 +#define url_normalize(u) http_options_url_normalize(u)

Does this macro do anything besides shorten the name? Is the extra
level of indirection to the reader worth it?

-Peff


Re: [PATCH v8 1/4] config: add support for http.<url>.* settings

2013-07-24 Thread Jeff King
On Mon, Jul 22, 2013 at 05:56:41AM -0700, Kyle J. McKay wrote:

 +enum http_option_type {
 + OPT_POST_BUFFER,
 + OPT_MIN_SESSIONS,
 + OPT_SSL_VERIFY,
 + OPT_SSL_TRY,
 + OPT_SSL_CERT,
 + OPT_SSL_CAINFO,
 + OPT_LOW_SPEED,
 + OPT_LOW_TIME,
 + OPT_NO_EPSV,
 + OPT_HTTP_PROXY,
 + OPT_COOKIE_FILE,
 + OPT_USER_AGENT,
 + OPT_PASSWD_REQ,
 +#ifdef USE_CURL_MULTI
 + OPT_MAX_REQUESTS,
 +#endif
 +#if LIBCURL_VERSION_NUM >= 0x070903
 + OPT_SSL_KEY,
 +#endif
 +#if LIBCURL_VERSION_NUM >= 0x070908
 + OPT_SSL_CAPATH,
 +#endif
 + OPT_MAX
 +};
 +
 +static size_t http_option_max_matched_len[OPT_MAX];

It's frustrating that we now have an extra spot to add new options to
(e.g., somebody else is adding http.savecookies in another thread, and
the merge will need to not just resolve the textual conflicts, but add
it to this other list).

Might it be simpler to just keep a dynamic list indexed by option name
(either a hash table, or a sorted string_list)? We only incur a lookup
when we find an actual config entry, so the number of lookups (and
entries) should be minuscule and not affect performance. And as a bonus,
it lets us handle arbitrary keys if we want to allow git config to
learn about url matching for user-specified keys.
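
The dynamic-list idea might look something like this sketch (illustrative Python rather than git's C; the variable name mirrors the array in the patch, but the logic is an assumption about how such a lookup could work):

```python
# Index URL-matched options by name in a dict instead of a fixed enum,
# so adding a new http.<url>.* option needs no extra list entry.
http_option_max_matched_len = {}

def record_match(option_name, matched_len):
    """Remember the longest URL match seen so far for this option name.

    Returns True if this match is the new best (its value should be
    applied), False if a longer match has already won.
    """
    prev = http_option_max_matched_len.get(option_name, -1)
    if matched_len > prev:
        http_option_max_matched_len[option_name] = matched_len
        return True
    return False
```

A lookup only happens when a matching config entry is actually found, so the cost of the dynamic structure stays negligible, and arbitrary user-specified keys fall out for free.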

-Peff


[PATCH 0/4] protocol-capabilities documentation updates

2013-07-24 Thread Jeff King
On Mon, Jul 15, 2013 at 07:25:19PM +0700, Nguyen Thai Ngoc Duy wrote:

 I noticed that quiet and agent capabilities were missing in
 protocol-capabilitities.txt. I have a rough idea what they do, but I
 think it's best to be documented by the authors. Maybe you have some
 time to make a patch?

Thanks for bringing it up; we should be more careful about documenting
these as we add them.

I went ahead and documented them both, as I also remember the rationale
for quiet. And of course I found some other inaccuracies in the
capabilities documentation while I was there. :)

  [1/4]: docs: fix 'report-status' protocol capability thinko
  [2/4]: docs: note that receive-pack knows side-band-64k capability
  [3/4]: document 'agent' protocol capability
  [4/4]: document 'quiet' receive-pack capability

-Peff


[PATCH 1/4] docs: fix 'report-status' protocol capability thinko

2013-07-24 Thread Jeff King
The report-status capability is understood by receive-pack,
not upload-pack.

Signed-off-by: Jeff King p...@peff.net
---
 Documentation/technical/protocol-capabilities.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt
index b15517f..11467ff 100644
--- a/Documentation/technical/protocol-capabilities.txt
+++ b/Documentation/technical/protocol-capabilities.txt
@@ -168,7 +168,7 @@ report-status
 report-status
-------------
 
-The upload-pack process can receive a 'report-status' capability,
+The receive-pack process can receive a 'report-status' capability,
 which tells it that the client wants a report of what happened after
 a packfile upload and reference update.  If the pushing client requests
 this capability, after unpacking and updating references the server
-- 
1.8.3.rc1.30.gff0fb75



[PATCH 2/4] docs: note that receive-pack knows side-band-64k capability

2013-07-24 Thread Jeff King
The protocol-capabilities documentation notes that any
capabilities not explicitly mentioned for receive-pack work
only for upload-pack.

Receive-pack has advertised and understood side-band-64k
since 38a81b4 (receive-pack: Wrap status reports inside
side-band-64k, 2010-02-05), but we do not mention it
explicitly. Let's do so.

Note that receive-pack does not understand side-band, which
was obsolete by that point.

Signed-off-by: Jeff King p...@peff.net
---
 Documentation/technical/protocol-capabilities.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt
index 11467ff..9bc2a10 100644
--- a/Documentation/technical/protocol-capabilities.txt
+++ b/Documentation/technical/protocol-capabilities.txt
@@ -21,8 +21,8 @@ recognized by the receive-pack (push to server) process.
 The 'report-status' and 'delete-refs' capabilities are sent and
 recognized by the receive-pack (push to server) process.
 
-The 'ofs-delta' capability is sent and recognized by both upload-pack
-and receive-pack protocols.
+The 'ofs-delta' and 'side-band-64k' capabilities are sent and recognized
+by both upload-pack and receive-pack protocols.
 
 All other capabilities are only recognized by the upload-pack (fetch
 from server) process.
-- 
1.8.3.rc1.30.gff0fb75



[PATCH 4/4] document 'quiet' receive-pack capability

2013-07-24 Thread Jeff King
This was added in c207e34 (fix push --quiet: add 'quiet'
capability to receive-pack, 2012-01-08) but never
documented.

Signed-off-by: Jeff King p...@peff.net
---
 Documentation/technical/protocol-capabilities.txt | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt
index d35159e..ec131b6 100644
--- a/Documentation/technical/protocol-capabilities.txt
+++ b/Documentation/technical/protocol-capabilities.txt
@@ -18,7 +18,7 @@ NOT advertise capabilities it does not understand.
 and server advertised.  As a consequence of these rules, server MUST
 NOT advertise capabilities it does not understand.
 
-The 'report-status' and 'delete-refs' capabilities are sent and
+The 'report-status', 'delete-refs', and 'quiet' capabilities are sent and
 recognized by the receive-pack (push to server) process.
 
 The 'ofs-delta' and 'side-band-64k' capabilities are sent and recognized
@@ -200,3 +200,13 @@ to delete references.
 value of a reference update.  It is not sent back by the client, it
 simply informs the client that it can be sent zero-id values
 to delete references.
+
+quiet
+-----
+
+If the receive-pack server advertises the 'quiet' capability, it is
+capable of silencing human-readable progress output which otherwise may
+be shown when processing the received pack. A send-pack client should
+respond with the 'quiet' capability to suppress server-side progress
+reporting if the local progress reporting is also being suppressed
+(e.g., via `push -q`, or if stderr does not go to a tty).
-- 
1.8.3.rc1.30.gff0fb75


[PATCH 3/4] document 'agent' protocol capability

2013-07-24 Thread Jeff King
This was added in ff5effd (include agent identifier in
capability string, 2012-08-03), but neither the syntax nor
the semantics were ever documented outside of the commit
message.

Signed-off-by: Jeff King p...@peff.net
---
 Documentation/technical/protocol-capabilities.txt | 17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt
index 9bc2a10..d35159e 100644
--- a/Documentation/technical/protocol-capabilities.txt
+++ b/Documentation/technical/protocol-capabilities.txt
@@ -22,7 +22,8 @@ The 'ofs-delta' and 'side-band-64k' capabilities are sent and recognized
 recognized by the receive-pack (push to server) process.
 
 The 'ofs-delta' and 'side-band-64k' capabilities are sent and recognized
-by both upload-pack and receive-pack protocols.
+by both upload-pack and receive-pack protocols.  The 'agent' capability
+may optionally be sent in both protocols.
 
 All other capabilities are only recognized by the upload-pack (fetch
 from server) process.
@@ -123,6 +124,20 @@ send/read OBJ_OFS_DELTA (aka type 6) in a packfile.
 its base by position in pack rather than by an obj-id.  That is, they can
 send/read OBJ_OFS_DELTA (aka type 6) in a packfile.
 
+agent
+-----
+
+The server may optionally send a capability of the form `agent=X` to
+notify the client that the server is running version `X`. The client may
+optionally return its own agent string by responding with an `agent=Y`
+capability (but it MUST NOT do so if the server did not mention the
+agent capability). The `X` and `Y` strings may contain any printable
+ASCII characters except space (i.e., the byte range 32 < x < 127), and
+are typically of the form "package/version" (e.g., "git/1.8.3.1"). The
+agent strings are purely informative for statistics and debugging
+purposes, and MUST NOT be used to programmatically assume the presence
+or absence of particular features.
+
 shallow
 -------
 
-- 
1.8.3.rc1.30.gff0fb75



Re: [PATCH v8 3/4] tests: add new test for the url_normalize function

2013-07-24 Thread Jeff King
On Wed, Jul 24, 2013 at 12:01:26PM -0700, Kyle J. McKay wrote:

 Right now, the values are only available as various strings, ints,
 longs etc. which have to be formatted differently for output.  The
 original string value they were converted from is gone.  The snippet
 shown above only shows some of the %s formatters.
 
 Either the original value will have to be kept around or a
 reconstituted string depending on what:
 
 git config --file=foo --url http noepsv $URL
 
 should output.  If the original value was "0" or "1", should it output
 that or "false" or "true"?  The test-url-normalize code for -c
 normalizes the output to "false" or "true" for all boolean values and
 reconverts ints/longs to strings.

I think it would be the responsibility of the caller to specify what
they are looking for. I.e., add --bool to the git-config command line
as appropriate.
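As a sketch of the normalization being discussed, git's boolean config
parsing can be modeled roughly like this (`git_config_bool` is a
hypothetical name; the real parser lives in git's config.c):

```python
def git_config_bool(value: str) -> bool:
    # Rough model of git's boolean canonicalization: "yes"/"on"/
    # "true"/"1" are true; "no"/"off"/"false"/"0" and the empty
    # string are false. Anything else is an error.
    v = value.lower()
    if v in ("yes", "on", "true", "1"):
        return True
    if v in ("no", "off", "false", "0", ""):
        return False
    raise ValueError("bad boolean config value: %r" % value)
```

With --bool on the git-config command line, the caller gets the
canonical "true"/"false" form regardless of how the value was spelled.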

-Peff


[PATCH] commit.h: drop redundant comment

2013-07-25 Thread Jeff King
We mention twice that the from_ident field of struct
pretty_print_context is internal.

The first comment was added by 10f2fbf, which prepares the
struct for internal fields, and then the second by a908047,
which actually adds such a field. This was a mistake made
when re-rolling the series on the list; the comment should
have been removed from the latter commit.

Signed-off-by: Jeff King p...@peff.net
---
Not that important, really. I only bothered tracking down the source of
the error because I was curious if it was a mis-merge or something. But
nope, just me screwing up, and missing it in review.

 commit.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/commit.h b/commit.h
index 35cc4e2..d912a9d 100644
--- a/commit.h
+++ b/commit.h
@@ -102,8 +102,6 @@ struct pretty_print_context {
 * Fields below here are manipulated internally by pp_* functions and
 * should not be counted on by callers.
 */
-
-   /* Manipulated by the pp_* functions internally. */
struct string_list in_body_headers;
 };
 
-- 
1.8.3.rc1.30.gff0fb75


Re: [RFC] Faster git grep.

2013-07-25 Thread Jeff King
On Thu, Jul 25, 2013 at 08:29:05PM +0200, Ondřej Bílka wrote:

 One solution would be to use same trick as was done in google code.
 Build and keep a database of trigraphs and which files contain how many of
 them. When a query is made, check
 only those files that have the appropriate combination of trigraphs.

That seems like a sensible approach.

 Updating the database would be relatively inexpensive compared to the
 disk access we need to do anyway.

Yes, I think it can be quite cheap, since you would only need to
re-index files that have changed (and git is very quick at telling you
what those files are).

 Space usage might be a problem, which is why I decided to go the
 option route.
 
 Comments, pointers?

I think it is a good idea, but it does not need to be part of core git.
It seems
more like you would want to glue together an existing code-indexing
solution (like codesearch) with git (which would provide the list of
files to index and to search).

If that proves useful in practice, but the interface is clunky for
whatever reason, then a good follow-on project could be to build support
for updating and using the index via the usual git grep.
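To make the shape of such an index concrete, here is a toy version of
the trigram-filtering idea (all names here are made up; a real tool
like codesearch does considerably more):

```python
from collections import defaultdict


def trigrams(text):
    # All overlapping 3-character substrings of the text.
    return {text[i:i + 3] for i in range(len(text) - 2)}


class TrigramIndex:
    def __init__(self):
        self.files = defaultdict(set)  # trigram -> set of file names

    def add(self, name, text):
        # Re-indexing only changed files keeps updates cheap.
        for t in trigrams(text):
            self.files[t].add(name)

    def candidates(self, query):
        # Only files containing every trigram of the query can match;
        # a real grep over just those files confirms actual hits.
        ts = trigrams(query)
        if not ts:
            return None  # query too short to filter; scan everything
        return set.intersection(*(self.files[t] for t in ts))
```

The candidate set is a superset of the true matches, so the final grep
still runs, but over far fewer files.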

-Peff


Re: [REQUEST 1/1] docs: update http.url.* options documentation

2013-07-25 Thread Jeff King
On Thu, Jul 25, 2013 at 03:39:13PM -0700, Kyle J. McKay wrote:

 Overhaul the text of the http.url.* options documentation
 providing a hopefully easier to understand itemized list of
 matching behavior as suggested by and including text from
 Jeff King.
 ---

Signed-off-by: Jeff King p...@peff.net

You should add your S-O-B, too, for your bits.

 +--
 +. Scheme (e.g., `https` in `https://example.com/`). This field
 +  must match exactly between the config key and the URL.
 +. Host/domain name (e.g., `example.com` in `https://example.com/`).
 +  This field must match exactly between the config key and the URL.

These look fine in the rendered manpage, but I think the source might be
a little easier to read with a blank line between items.

 +. Port number (e.g., `8080` in `http://example.com:8080/`).
 +  This field must match exactly between the config key and the URL.
 +  Omitted port numbers are automatically converted to the correct
 +  default for the scheme before matching.

Thanks, I forgot to include port number in my original suggested text.

 +. Exact user name match (e.g., `user` in 
 `https://u...@example.com/repo.git`).
 +  If the config key has a user name it must match the user name in the URL
 +  exactly.
 +. Any user name match.  If a config key does not have a user name, that 
 config
 +  key will match a URL with any user name (including none).

IMHO, this would be more clear as a single item, like:

  . User name (e.g., `user` in `https://u...@example.com/repo.git`). If
the config key has a user name it must match the user name in the
URL exactly. If the config key does not have a user name, that
config key will match a URL with any user name (including none).

 +All URLs are normalized before attempting any matching (the password part,
 +if embedded in the URL, is always ignored for matching purposes) so that
 +equivalent urls that are simply spelled differently will match properly.

And this nicely ties up the open questions I had after re-reading the
list. Good.

We could define "equivalent" here (the %-encoding thing is obvious, I
think, but which components are case-sensitive and which are not is
perhaps a bit subtle). I do not necessarily know that it is useful to
the user, though, and it is one more thing to confuse them.  And somebody
who really cares can look at the appropriate RFCs.

So this patch looks pretty good to me; the two points I raised above are
how I would have done it, but they are relatively minor if you do not
agree.
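To make the matching rules above concrete, here is a rough sketch of
the scheme/host/port/user comparison using Python's urllib (this is
not git's actual implementation, which is in C, and it omits the path
handling and normalization details):

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def url_key_matches(key_url, url):
    k, u = urlsplit(key_url), urlsplit(url)
    # Scheme and host must match exactly (hostnames compare
    # case-insensitively; urlsplit already lowercases them).
    if k.scheme != u.scheme or k.hostname != u.hostname:
        return False
    # Omitted port numbers are filled in with the scheme's default
    # before comparing.
    if (k.port or DEFAULT_PORTS.get(k.scheme)) != \
       (u.port or DEFAULT_PORTS.get(u.scheme)):
        return False
    # A user name in the key must match exactly; a key without one
    # matches any user name (including none). Passwords are ignored.
    if k.username is not None and k.username != u.username:
        return False
    return True
```

For example, a key of https://example.com matches
https://example.com:443/repo.git, since 443 is the default https port.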

-Peff


Re: [PATCH] branch: make sure the upstream remote is configured

2013-07-26 Thread Jeff King
On Fri, Jul 26, 2013 at 07:39:37PM +0200, Carlos Martín Nieto wrote:

 A command of e.g.
 
 git push --set-upstream /tmp/t master
 
 will call install_branch_config() with a remote name of /tmp/t. This
 function will set the 'branch.master.remote' key to that path, which is
 nonsensical as there is no remote by that name.

Is it nonsensical? It does not make sense for the @{upstream} magic
token, because we will not have a tracking branch under refs/remotes/
to point to. But the configuration would still affect how git pull
chooses a branch to fetch and merge.

I.e., you can currently do:

  git push --set-upstream /tmp/t master
  git pull ;# pulls from /tmp/t master
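With the current behavior, the push above leaves configuration like
this (which git pull still honors, even though @{upstream} cannot
resolve it):

```
[branch "master"]
	remote = /tmp/t
	merge = refs/heads/master
```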

-Peff

