Re: [RFC] git checkout $tree -- $path always rewrites files
On Fri, Nov 07, 2014 at 05:13:47PM +0700, Duy Nguyen wrote:

> > By the way, one other thing I wondered while looking at this code:
> > when we checkout a working tree file, we unlink the old one and write
> > the new one in-place. Is there a particular reason we do this versus
> > writing to a temporary file and renaming it into place? That would
> > give simultaneous readers a more atomic view. I suspect the answer is
> > something like: you cannot always do a rename, because you might have
> > a typechange, directory becoming a file, or vice versa; so anyone
> > relying on an atomic view during a checkout operation is already Doing
> > It Wrong. Handling a content-change of an existing path would
> > complicate the code, so we do not bother.
>
> Not a confirmation, but it looks like Linus did it just to make sure he
> had new permissions right, in e447947 (Be much more liberal about the
> file mode bits. - 2005-04-16).

Thanks for digging that up. I think that only gives us half the story, though. That explains why we would unlink/open instead of relying on just open(O_TRUNC). But I think opening a new tempfile would work the same as the current code in that respect.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
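The tempfile-and-rename approach discussed above can be sketched like this (an illustrative shell sketch only; the file name `target` and the mode are invented here, and this is not what git's checkout code actually does):

```shell
#!/bin/sh
# Illustrative sketch (not git's actual code): replace "target" by
# writing a temporary file and renaming it into place. Readers always
# see either the old or the new content, because rename(2) replaces
# the path atomically on POSIX filesystems.
set -e
echo old content >target

tmp=$(mktemp ./tmp.XXXXXX)   # temp file on the same filesystem as target
echo new content >"$tmp"
chmod 644 "$tmp"             # set the final mode before publishing
mv "$tmp" target             # atomic replacement via rename(2)

cat target
```

By contrast, an unlink-then-write sequence briefly exposes a missing or truncated file to any concurrent reader.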
Re: [RFC] git checkout $tree -- $path always rewrites files
On Fri, Nov 07, 2014 at 09:14:42AM -0800, Junio C Hamano wrote:

> Jeff King <p...@peff.net> writes:
>
> > Is there a reason that we don't use this diff technique for checkout?
>
> I suspect that the reasons are probably a mixture of these:
>
>  (1) the code path may descend from checkout-index and predates the
>      in-core diff machinery;
>
>  (2) in the context of checkout-index, it was more desirable to be able
>      to say "I want the contents on the filesystem, even if you think I
>      already have it there, as you as a new software are likely to be
>      wrong and I know better"; or
>
>  (3) it was easier to code that way ;-)
>
> I do not see there is any reason not to do what you suggest.

OK. It's not very much code (and can mostly be lifted from git-reset), so I may take a stab at it.

-Peff
Re: [PATCH 2/5] trailer: display a trailer without its trailing newline
On Fri, Nov 07, 2014 at 07:50:49PM +0100, Christian Couder wrote:

> diff --git a/trailer.c b/trailer.c
> index 761b763..f4d51ba 100644
> --- a/trailer.c
> +++ b/trailer.c
> @@ -583,8 +583,12 @@ static int parse_trailer(struct strbuf *tok, struct strbuf *val, const char *tra
>  	strbuf_addch(&seps, '=');
>  	len = strcspn(trailer, seps.buf);
>  	strbuf_release(&seps);
> -	if (len == 0)
> -		return error(_("empty trailer token in trailer '%s'"), trailer);
> +	if (len == 0) {
> +		struct strbuf sb = STRBUF_INIT;
> +		strbuf_addstr(&sb, trailer);
> +		strbuf_rtrim(&sb);
> +		return error(_("empty trailer token in trailer '%s'"), sb.buf);
> +	}

Doesn't this leak sb.buf?

-Peff
Re: [RFC] git checkout $tree -- $path always rewrites files
On Fri, Nov 07, 2014 at 11:35:59PM -0800, Junio C Hamano wrote:

> I think that has direct linkage; what you have in mind I think is
> http://thread.gmane.org/gmane.comp.version-control.git/234903/focus=234935

Thanks for that link. I did spend a few hours on this topic earlier today, and got very confused trying to figure out what the deletion behavior _should_ be, and whether I was breaking it. For some reason I had zero recollection of a conversation from last year that I was obviously a major part of. I think I am getting old. :)

The end of that thread concludes that a diff-based approach is not going to work, because we need to update the working tree even for files not mentioned by the diff. I do not think that is a show-stopper, though. It just means that we need to load the new index as one step (done now with read_tree_recursive, but ideally using diff), and then walk over the whole resulting index applying our pathspec again (instead of relying on CE_UPDATE flags).

This turns out not to be a big deal, because the existing code is already doing most of that second pathspec application anyway. It does it because read_tree_recursive is not smart enough to update the "seen" bits for the pathspec. But now we would have another reason to do it this way. :)

So just to be clear, the behavior we want is that:

  echo foo >some-new-path
  git add some-new-path
  git checkout HEAD -- .

will delete some-new-path (whereas the current code turns it into an untracked file). What should:

  git checkout HEAD -- some-new-path

do in that case? With the current code, it actually barfs, complaining that nothing matched some-new-path (because it is not part of HEAD, and therefore we don't consider it at all), and aborts the whole operation. I think we would want to delete some-new-path in that case, too.

-Peff
Re: [RFC] git checkout $tree -- $path always rewrites files
On Sat, Nov 08, 2014 at 03:30:40AM -0500, Jeff King wrote:

> So just to be clear, the behavior we want is that:
>
>   echo foo >some-new-path
>   git add some-new-path
>   git checkout HEAD -- .
>
> will delete some-new-path (whereas the current code turns it into an
> untracked file). What should:
>
>   git checkout HEAD -- some-new-path
>
> do in that case? With the current code, it actually barfs, complaining
> that nothing matched some-new-path (because it is not part of HEAD, and
> therefore we don't consider it at all), and aborts the whole operation.
> I think we would want to delete some-new-path in that case, too.

Also, t2022.3 has me very confused. It is explicitly checking that if we have subdir/foo unmerged in the index, and we "git checkout $tree -- subdir", and $tree does not mention foo, that we _leave_ foo in place. That seems very counter-intuitive to me. If you asked to make subdir look like $tree, then we should clobber it.

That change comes from e721c15 (checkout: avoid unnecessary match_pathspec calls, 2013-03-27), where it is mentioned as a _bugfix_. That in turn references 0a1283b (checkout $tree $path: do not clobber local changes in $path not in $tree, 2011-09-30), which explicitly goes against the goal we are talking about here. It is not "make my index and working tree look like $tree" at all. So now I'm doubly confused about what we want to do.

If we want to retain that behavior, I think we can still cover these cases by marking items missing from $tree as "to remove" during the diff/update-the-index phase, and then being more gentle with "to remove" files (e.g., not clobbering changed worktree files unless -f is given). I am not sure that provides a sane user experience, though. Why is it OK to clobber local changes to a file if we are replacing it with other content, but _not_ if we are replacing it with nothing? Either the content we are losing is valuable or not, but it has nothing to do with what we are replacing.
And Junio argued in the thread linked elsewhere that the point of "git checkout $tree -- $path" is to clobber what is in $path, which I would agree with. I think the argument made in 0a1283b is that "git checkout $tree $path" is not "make $path like $tree", but rather "pick bits of $path out of $tree". Which would mean this whole deletion thing we are talking about is completely contrary to that.

So which is it?

-Peff
Re: What is the default refspec for fetch?
On Fri, Nov 07, 2014 at 04:31:08PM +0100, Christian Halstrick wrote:

> In a repo where no remote.<name>.fetch config parameter is set, what
> should a "git fetch" do? My experiments let me think it's
> "HEAD:FETCH_HEAD". Right?

Basically, yes. We always write FETCH_HEAD, regardless of the refspec. We choose HEAD if no other refspec was provided. So it is really more like:

  git fetch $remote HEAD

This is what makes one-off bare-url pulls work, like:

  git pull git://...

It runs fetch under the hood, which writes into FETCH_HEAD, and then we merge that.

-Peff
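A small demonstration of the FETCH_HEAD behavior described above (repository names are invented for the example):

```shell
#!/bin/sh
# Demonstration of the behavior described above: a fetch with no
# refspec still records what it fetched in .git/FETCH_HEAD, which is
# what a subsequent merge (i.e. "git pull <url>") consumes.
# Repository names here are invented for the example.
set -e
git init -q remote-repo
git -C remote-repo -c user.name=x -c user.email=x@example.com \
	commit -q --allow-empty -m base

git init -q local-repo
cd local-repo
git fetch -q ../remote-repo       # no refspec: defaults to the remote HEAD
grep remote-repo .git/FETCH_HEAD  # the fetched tip is recorded here
```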
Re: [PATCH] git_connect: allow passing arguments to ssh in GIT_SSH_ARGS
On Sat, Nov 08, 2014 at 11:44:39AM +0100, Thomas Quinot wrote:

> It may be impractical to install a wrapper script for ssh when
> additional parameters need to be passed. Provide an alternative way of
> specifying these by means of the GIT_SSH_ARGS environment variable.
> Arguments are whitespace delimited following usual shell conventions;
> embedded whitespace can be enclosed in quotes, or escaped with a
> backslash.

This has bugged me before, too. Almost every sub-program invoked by git is done so using a shell, so you can put in arbitrary arguments or even snippets of shell code. But not GIT_SSH. I think the original reason was to match other programs with similar variables. At this point, we can't just change it on a whim; the quoting requirements are different and it would break people's setups. So your approach of adding a new variable is good (the other option is figuring out a long-term transition plan).

However, I'm not sure adding a separate variable for options is the best we can do. Besides being a bit clunky, it has two big shortcomings:

  1. It's not consistent with other git variables that use the shell
     (e.g., GIT_PAGER).

  2. It's not as flexible as a shell; you can't specify "$HOME/bin/ssh"
     or other even more esoteric things, like dynamically tweaking the
     options based on the environment.

What do you think of adding an alternate variable that is not ssh _arguments_, but rather just a full shell command for running ssh? I'm not sure what it could be called (GIT_SSH_SH is probably too confusing).

> I hope I won't stray too far away from established procedures with my
> first contribution to git. This patch adds support for a GIT_SSH_ARGS
> environment variable, providing a way of specifying ssh arguments
> without having to create a wrapper script.

The formatting and whatnot of your submission looks fine to me. The patch itself looks reasonable, though I didn't look too closely after making my comments above.
:)

-Peff
Re: [PATCH 1/2] Add a few more values for receive.denyCurrentBranch
On Fri, Nov 07, 2014 at 02:58:17PM +0100, Johannes Schindelin wrote:

> Under certain circumstances, it makes a *lot* of sense to allow pushing
> into the current branch. For example, when two machines with different
> Operating Systems are required for testing, it makes much more sense to
> synchronize between working directories than having to go via a third
> server.

FWIW, I do this without a third server (and without resorting to pull), with:

  host1$ git push host2 master:refs/remotes/host1/master
  host2$ git merge host1/master

You can even set up a push refspec to make "git push host2" do the right thing.

That being said, I do like the premise of your patch, as it eliminates the extra step on the remote side (which is not that big a deal in itself, but when you realize that host2 _did_ have some changes on it, then you end up doing the merge there, when in general I'd prefer to do all the work on host1 via git pull).

-Peff
Re: Test failure
On Sat, Nov 08, 2014 at 11:28:32AM -0800, Michael Blume wrote:

> When I build and run tests I get
>
> [11:17][michael.blume@tcc-michael-4:~/workspace/git/t(master)]$ ./t1410-reflog.sh

What does "./t1410-reflog.sh -v -i" report?

> A quick search seems to indicate the test is pretty new?
> http://www.mail-archive.com/git@vger.kernel.org/msg60495.html

Yes, it is new. In these cases there's often some silly little platform incompatibility in the test script, but I don't see one. So maybe the incompatibility is in the code itself; I'm wondering if OS X returns something besides EISDIR when trying to open a directory. Unfortunately I don't have an OS X install handy to test on.

-Peff
[PATCH] t1410: fix breakage on case-insensitive filesystems
On Sat, Nov 08, 2014 at 08:43:54PM -0500, Jeff King wrote:

> Unfortunately I don't have an OS X install handy to test on.

I lied; it turns out I still had access to an old VM. The problem did turn out to be rather silly. Here's a patch.

-- >8 --
Two tests recently added to t1410 create branches "a" and "a/b" to test d/f conflicts on reflogs. Earlier, unrelated tests in that script create the path "A/B" in the working tree. There's no conflict on a case-sensitive filesystem, but on a case-insensitive one, "git log" will complain that "a/b" is both a revision and a working tree path.

We could fix this by using a "--" to disambiguate, but we are probably better off using names that are less confusing, to make it more clear that they are unrelated to the working tree files. This patch turns "a/b" into "one/two".

Reported-by: Michael Blume blume.m...@gmail.com
Signed-off-by: Jeff King p...@peff.net
---
The line-diff is hard to read, but if anyone was looking for a chance to test-drive contrib/diff-highlight, it does a good job of making the change easy to see.
 t/t1410-reflog.sh | 36 ++++++++++++++++++------------------
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/t/t1410-reflog.sh b/t/t1410-reflog.sh
index 976c1d4..8cf4461 100755
--- a/t/t1410-reflog.sh
+++ b/t/t1410-reflog.sh
@@ -254,36 +254,36 @@ test_expect_success 'checkout should not delete log for packed ref' '
 '
 
 test_expect_success 'stale dirs do not cause d/f conflicts (reflogs on)' '
-	test_when_finished "git branch -d a || git branch -d a/b" &&
+	test_when_finished "git branch -d one || git branch -d one/two" &&
 
-	git branch a/b master &&
-	echo "a/b@{0} branch: Created from master" >expect &&
-	git log -g --format="%gd %gs" a/b >actual &&
+	git branch one/two master &&
+	echo "one/two@{0} branch: Created from master" >expect &&
+	git log -g --format="%gd %gs" one/two >actual &&
 	test_cmp expect actual &&
-	git branch -d a/b &&
+	git branch -d one/two &&
 
-	# now logs/refs/heads/a is a stale directory, but
-	# we should move it out of the way to create a reflog
-	git branch a master &&
-	echo "a@{0} branch: Created from master" >expect &&
-	git log -g --format="%gd %gs" a >actual &&
+	# now logs/refs/heads/one is a stale directory, but
+	# we should move it out of the way to create one reflog
+	git branch one master &&
+	echo "one@{0} branch: Created from master" >expect &&
+	git log -g --format="%gd %gs" one >actual &&
 	test_cmp expect actual
 '
 
 test_expect_success 'stale dirs do not cause d/f conflicts (reflogs off)' '
-	test_when_finished "git branch -d a || git branch -d a/b" &&
+	test_when_finished "git branch -d one || git branch -d one/two" &&
 
-	git branch a/b master &&
-	echo "a/b@{0} branch: Created from master" >expect &&
-	git log -g --format="%gd %gs" a/b >actual &&
+	git branch one/two master &&
+	echo "one/two@{0} branch: Created from master" >expect &&
+	git log -g --format="%gd %gs" one/two >actual &&
 	test_cmp expect actual &&
-	git branch -d a/b &&
+	git branch -d one/two &&
 
-	# same as before, but we only create a reflog for a if
+	# same as before, but we only create a reflog for one if
 	# it already exists, which it does not
-	git -c core.logallrefupdates=false branch a master &&
+	git -c core.logallrefupdates=false branch one master &&
 	: >expect &&
-	git log -g --format="%gd %gs" a >actual &&
+	git log -g --format="%gd %gs" one >actual &&
 	test_cmp expect actual
 '
-- 
2.1.2.596.g7379948
Re: [PATCH] replace: fix replacing object with itself
On Sun, Nov 09, 2014 at 01:05:31AM +0100, Manzur Mukhitdinov wrote:

> When an object is replaced with itself, git shows unhelpful messages
> like (git log):
>
>   fatal: replace depth too high for object SHA1
>
> Prevent the user from replacing an object with itself (with a test for
> checking this case).

I thought we already did this in the last round of git-replace patches, but it looks like we only did it for the newly added --edit and --graft cases, not "git replace X X". I think this is probably a good step. I've also considered that this should be another way of deleting the replacement, but I think we decided that was too magical.

> diff --git a/builtin/replace.c b/builtin/replace.c
> index 294b61b..b7e05ad 100644
> --- a/builtin/replace.c
> +++ b/builtin/replace.c
> @@ -186,6 +186,9 @@ static int replace_object(const char *object_ref, const char *replace_ref, int f
>  	if (get_sha1(replace_ref, repl))
>  		die("Failed to resolve '%s' as a valid ref.", replace_ref);
>  
> +	if (!hashcmp(object, repl))
> +		return error("new object is the same as the old one: '%s'", sha1_to_hex(object));
> +
>  	return replace_object_sha1(object_ref, object, replace_ref, repl, force);

I think all of the callers of replace_object_sha1 do this same check now. Can we just move the check into that function instead of adding another instance of it?

-Peff
Re: [RFC] git checkout $tree -- $path always rewrites files
On Sat, Nov 08, 2014 at 08:19:21AM -0800, Martin von Zweigbergk wrote:

> > What should:
> >
> >   git checkout HEAD -- some-new-path
> >
> > do in that case? With the current code, it actually barfs, complaining
> > that nothing matched some-new-path (because it is not part of HEAD,
> > and therefore we don't consider it at all), and aborts the whole
> > operation. I think we would want to delete some-new-path in that case,
> > too.
>
> I don't think we'd want it to be deleted. I would view 'git reset
> --hard' as the role model here, and that command (without paths) would
> not remove the file. And applying it to a path should not change the
> behavior, just restrict it to the paths, right?

Are you sure about git reset here? If I do:

  git init
  echo content >file
  git add file
  git commit -m base
  echo modified >file
  echo new >some-new-path
  git add file some-new-path
  git reset --hard

then we delete some-new-path (it is not untracked, because the index knows about it). That makes sense to me. I.e., we treat it with the same preciousness whether it is named explicitly or not.

-Peff
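For reference, the sequence above packaged as a runnable script, with checks spelling out the resulting state (the directory name `demo` is added here just to keep the experiment contained):

```shell
#!/bin/sh
# Runnable version of the sequence above: after "git reset --hard",
# some-new-path is gone because the index knew about it, even though
# it was never committed.
set -e
git init -q demo
cd demo
git config user.name x
git config user.email x@example.com

echo content >file
git add file
git commit -q -m base

echo modified >file
echo new >some-new-path
git add file some-new-path
git reset -q --hard

test "$(cat file)" = content   # local modification undone
test ! -e some-new-path        # staged-but-uncommitted file deleted
```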
Re: [PATCH v2 1/2] git_connect: set ssh shell command in GIT_SSH_CMD
On Sat, Nov 08, 2014 at 03:27:53PM +0100, Thomas Quinot wrote:

> It may be impractical to install a wrapper script for GIT_SSH when
> additional parameters need to be passed. Provide an alternative way of
> specifying a shell command to be run, including command line arguments,
> by means of the GIT_SSH_CMD environment variable, which behaves like
> GIT_SSH but is passed to the shell.

Thanks, I like this much better. The name GIT_SSH_CMD is not too bad. Personally, of the two (GIT_SSH and GIT_SSH_CMD) I would expect the _CMD form to be the one that does not use the shell. But I do not really have a better suggestion for the name, so perhaps it's OK.

> Note that with this first patch only, the special processing for
> PuTTY/plink looks at the whole command in that case. I deliberately
> left it that way, even though one might imagine a case where this would
> cause a false positive, e.g. if the user's login name includes the
> string 'plink':
>
>   GIT_SSH_CMD="ssh -l fooplink"
>
> The work-around in this case would be to write:
>
>   GIT_SSH_CMD="ssh -l foop''link"

I'd be tempted to say that the plink magic does not need to kick in at all for GIT_SSH_CMD. There are simply too many corner cases in trying to guess what the shell is going to run. For example:

> A second patch, coming in a followup message, pre-splits $GIT_SSH_CMD
> and ensures that we look for plink/tortoiseplink only in the argv[0]
> element.

I don't think you can pre-split accurately. Since we promise to pass the result to the shell, the user gets to write whatever they want. Even:

  GIT_SSH_CMD='f() {
	if test "$(hostname)" = some-machine; then
		some-special-ssh "$@"
	else
		ssh "$@"
	fi
  }
  f'

Parsing complications aside, you cannot even know in git which ssh is going to be run until the shell code is executed.
I think either we have to leave the responsibility for munging "-p" into "-P" on the side of the user's shell snippet (and remember that they can still use GIT_SSH as usual, so we are not making anything _worse_ for them here), or we have to provide some unambiguous way for them to signal which calling convention we want ($GIT_SSH_CMD_IS_PLINK or something).

-Peff
Re: [PATCH] t1410: fix breakage on case-insensitive filesystems
On Sun, Nov 09, 2014 at 06:56:45PM -0800, Junio C Hamano wrote:

> > We could fix this by using a "--" to disambiguate, but we are probably
> > better off using names that are less confusing, to make it more clear
> > that they are unrelated to the working tree files. This patch turns
> > "a/b" into "one/two".
>
> Hmph, but the branch "a" and the file "A" _do_ have names that are
> unrelated to each other, and it is only the case-insensitive fs that is
> confused ;-). Renaming is not so bad and certainly is not wrong per-se
> as a workaround, but I have this suspicion that it sends a wrong
> message to people on such a filesystem, namely, "you cannot use Git to
> manage a file called 'master'", or something silly like that.
>
> Disambiguation with double-dashes does not have such a problem, and
> instead shows a way how scripts that are meant to be portable is
> written. More importantly, that is more in line with the problem
> description (i.e. we complain pointing out the ambiguity, implying that
> everything is fine as long as you disambiguate). So I would rather see
> the workaround done that way.

My main concern is that it leaves t1410 in an "accident waiting to happen" state, where some hapless developer will add a new test using a/b and not realizing they need to be careful to disambiguate. The test will pass for them on Linux, but some luckless OS X user will end up wasting time tracking down the error.

Or another way of looking at it: it is perfectly possible to have git manage a file called "master" or even "HEAD". But that does not mean it is a _good idea_, or is without annoyances. :)

> But that is only if this were before you actually wrote the patch. The
> above is not a preference strong enough to make me ask you to reroll
> ;-)

The alternate form is quite trivial. I think I still prefer the one/two version, but here is the "--" patch for reference. You can decide which to pick up.
-- >8 --
Subject: t1410: fix breakage on case-insensitive filesystems

Two tests recently added to t1410 create branches "a" and "a/b" to test d/f conflicts on reflogs. Earlier tests in that script create the path "A/B" in the working tree. There's no conflict on a case-sensitive filesystem, but on a case-insensitive one, "git log" will complain that "a/b" is both a revision and a working tree path. We can fix this by using "--" to disambiguate.

Signed-off-by: Jeff King p...@peff.net
---
 t/t1410-reflog.sh | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/t/t1410-reflog.sh b/t/t1410-reflog.sh
index 976c1d4..48bcd59 100755
--- a/t/t1410-reflog.sh
+++ b/t/t1410-reflog.sh
@@ -258,7 +258,7 @@ test_expect_success 'stale dirs do not cause d/f conflicts (reflogs on)' '
 	git branch a/b master &&
 	echo "a/b@{0} branch: Created from master" >expect &&
-	git log -g --format="%gd %gs" a/b >actual &&
+	git log -g --format="%gd %gs" a/b -- >actual &&
 	test_cmp expect actual &&
 	git branch -d a/b &&
@@ -266,7 +266,7 @@ test_expect_success 'stale dirs do not cause d/f conflicts (reflogs on)' '
 	# we should move it out of the way to create a reflog
 	git branch a master &&
 	echo "a@{0} branch: Created from master" >expect &&
-	git log -g --format="%gd %gs" a >actual &&
+	git log -g --format="%gd %gs" a -- >actual &&
 	test_cmp expect actual
 '
@@ -275,7 +275,7 @@ test_expect_success 'stale dirs do not cause d/f conflicts (reflogs off)' '
 	git branch a/b master &&
 	echo "a/b@{0} branch: Created from master" >expect &&
-	git log -g --format="%gd %gs" a/b >actual &&
+	git log -g --format="%gd %gs" a/b -- >actual &&
 	test_cmp expect actual &&
 	git branch -d a/b &&
@@ -283,7 +283,7 @@ test_expect_success 'stale dirs do not cause d/f conflicts (reflogs off)' '
 	# it already exists, which it does not
 	git -c core.logallrefupdates=false branch a master &&
 	: >expect &&
-	git log -g --format="%gd %gs" a >actual &&
+	git log -g --format="%gd %gs" a -- >actual &&
 	test_cmp expect actual
 '
-- 
2.1.2.596.g7379948
Re: [PATCH] t1410: Fix for case insensitive filesystems
On Sun, Nov 09, 2014 at 08:48:06PM -0500, Brian Gernhardt wrote:

> A pair of recently added tests used branches a and a/b, but earlier
> tests created files A and A/B. On case insensitive filesystems (such as
> HFS+), that causes git to complain about the name being ambiguous
> between branch and file. Resolve by renaming the branches to aa and
> aa/bb.

Already being discussed here:

  http://thread.gmane.org/gmane.comp.version-control.git/259250/focus=259253

:)

-Peff
Re: [PATCH] t1410: fix breakage on case-insensitive filesystems
On Sun, Nov 09, 2014 at 09:48:29AM -0800, Junio C Hamano wrote:

> Michael Blume <blume.m...@gmail.com> writes:
>
> > Works for me, thanks =)
> >
> > I'm curious now, is there an automated build of git running on a mac
> > anywhere? There's a mac mini running jenkins in my office and it's
> > possible I could convince someone to let me set up a git build that'll
> > e-mail me if there's a test failure.
>
> I am not aware of a Macintosh person who regularly runs tests, but if
> there were, we hopefully will hear from them soonish ;-).

I think there are several people who run the tests on OS X fairly regularly (I note that another fix for the t1410 problem has already materialized :) ). However, I think it is nice when test failures are caught early, before you have merged topics (to master or elsewhere). That helps isolate the failures to their particular topics.

I know you "make test" before pushing out the results of any integration you do. And I recall that for a while (and maybe still?) you even did so on VMs of a few common platforms. OS X is notoriously irritating to run in a VM, but would you be interested in a network-accessible install that you could push to and "make test" on as part of your routine? If what Michael is offering cannot do that, I am sure I can get GitHub to set something up.

-Peff
[PATCH] imap-send: avoid curl functions when not building curl support
Imap-send recently learned to conditionally compile against and use curl for imap support. To use this feature, you must both:

  1. Compile with USE_CURL_FOR_IMAP_SEND.

  2. Specify --curl on the command line to enable it.

It is OK (and even desirable) for the code checking "--curl" to be compiled even if USE_CURL_FOR_IMAP_SEND is not in effect; this lets us explain the situation to the user (instead of saying "--curl? never heard of it"). However, the code which conditionally runs when --curl is enabled must _not_ be compiled in this case. It references functions which are neither declared nor defined, causing the compiler to complain.

Signed-off-by: Jeff King p...@peff.net
---
On top of br/imap-send-via-libcurl. I needed this to compile 'pu' with NO_CURL (which I don't usually do, but was testing on a minimal box). I expect it can just be squashed in to the next re-roll.

Since we were talking about testing in another thread, Junio, I wonder if it is worth having you compile your integration results against a couple of different configs (e.g., NO_CURL). Obviously that will make things slower if you don't throw more CPU power at it, but that seems like a problem that can be solved with build servers or similar.

 imap-send.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/imap-send.c b/imap-send.c
index ad4ac22..e0e1f09 100644
--- a/imap-send.c
+++ b/imap-send.c
@@ -1542,8 +1542,10 @@ int main(int argc, char **argv)
 	if (server.tunnel)
 		return append_msgs_to_imap(&server, all_msgs, total);
 
+#ifdef USE_CURL_FOR_IMAP_SEND
 	if (use_curl)
 		return curl_append_msgs_to_imap(&server, all_msgs, total);
+#endif
 
 	return append_msgs_to_imap(&server, all_msgs, total);
 }
-- 
2.1.2.596.g7379948
[PATCH] t4213: avoid | in sed regexp
On Sun, Nov 09, 2014 at 06:46:01PM -0800, Michael Blume wrote:

> Ok, with that I have a different test failure on the pu branch --
> please tell me if I'm spamming the list or if there's some other
> protocol I should be using to report issues on pu.

No, reporting problems to the list is exactly the right spot. It is nice to start a new thread, though, if it is an unrelated problem. And possibly cc folks you think might be responsible. E.g.:

  git shortlog -se origin/master..origin/pu t/t4213-*

points to Thomas. :)

-- >8 --
Many versions of sed (e.g., that found on OS X) do not understand |-alternation, even when backslash-escaped. Some versions can turn on extended regexps with a special option, but of course that option is not standard, either. Let's just write out our alternates longhand.

Signed-off-by: Jeff King p...@peff.net
---
On top of the tr/remerge-diff topic.

For curiosity's sake, it is -E on OS X and -r on GNU sed to turn on extended regexps. But I hear that Solaris sed also does not handle \|, and I would not be surprised to find that it has no extended regexp support at all. :)

 t/t4213-log-remerge-diff.sh | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/t/t4213-log-remerge-diff.sh b/t/t4213-log-remerge-diff.sh
index 36ef17a..ec93b96 100755
--- a/t/t4213-log-remerge-diff.sh
+++ b/t/t4213-log-remerge-diff.sh
@@ -75,7 +75,9 @@ test_expect_success 'unrelated merge: without conflicts' '
 clean_output () {
 	git name-rev --name-only --stdin |
 	# strip away bits that aren't treated by the above
-	sed -e 's/^\(index\|Merge:\|Date:\).*/\1/'
+	sed -e 's/^\(index\).*/\1/' \
+	    -e 's/^\(Merge:\).*/\1/' \
+	    -e 's/^\(Date:\).*/\1/'
 }
 
 cat >expected <<EOF
-- 
2.1.2.596.g7379948
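The portability fix is easy to check in isolation; this sketch (with invented sample input) runs the spelled-out expressions, which behave the same as the `\|` form does on GNU sed while staying within plain POSIX BREs:

```shell
#!/bin/sh
# Sketch of the substitution above: one POSIX BRE per prefix instead
# of a non-portable \| alternation. The sample input is invented.
printf '%s\n' \
	'index 0f1e2d3..4c5b6a7 100644' \
	'Merge: 1234567 89abcde' \
	'Date: Mon Nov 10 00:00:00 2014' \
	'unrelated line' |
sed -e 's/^\(index\).*/\1/' \
    -e 's/^\(Merge:\).*/\1/' \
    -e 's/^\(Date:\).*/\1/' >cleaned
cat cleaned   # the prefixes survive, the variable parts are stripped
```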
Re: [PATCH] t1410: fix breakage on case-insensitive filesystems
On Sun, Nov 09, 2014 at 10:47:52PM -0800, Junio C Hamano wrote:

> On Sun, Nov 9, 2014 at 10:30 PM, Jeff King <p...@peff.net> wrote:
>
> > I know you "make test" before pushing out the results of any
> > integration you do. And I recall that for a while (and maybe still?)
> > you even did so on VMs of a few common platforms. OS X is notoriously
> > irritating to run in a VM, but would you be interested in a
> > network-accessible install that you could push to and "make test" on
> > as part of your routine? If what Michael is offering cannot do that, I
> > am sure I can get GitHub to set something up.
>
> Even if it were network-accessible, the latency for doing an
> integration cycle and pushing there and waiting for the tests to come
> back may make it impractical to use it as part of a pre-pushout test.

I had the impression you were already doing "for i in my_vms; do git push $i && ssh $i make test; done", so in theory this could be done in parallel. But yeah, I definitely wouldn't want to add latency to your workflow.

> But I would think that the project certainly would benefit if a
> post-receive hook at GitHub did an automated test on the tip of 'pu'
> and bisected a breakage, if found, to pinpoint the patch that breaks.
> It could even send a notice to the author of the non-portable patch
> ;-).

Yeah, you're right. Testing on pu is probably enough. Coupled with automated bisection, the pinpointing part is not that important, and pu is often early enough to catch problems before people try to build on top of them.

So I think if Michael is still willing to set up his build box, just pulling and building your pu automatically would be a good start. GitHub supports trigger hooks on pushes, but I do not think that is even necessary here. Just polling once or twice a day would probably give us good enough turn-around time. I think somebody just needs to write the auto-bisect script.
I suspect the simplest form is something like:

  # fetch a new version; exit early if we already checked this one
  git fetch
  test "$(git rev-parse pu-ok)" = "$(git rev-parse origin/pu)" && exit 0
  git reset --hard origin/pu

  # quick path: all tests succeed
  if make test; then
	git tag -f pu-ok HEAD
	exit 0
  fi

  # slow path: bisect the actual commit. We do not need to worry about
  # cutting each bisection step down to the minimal size (e.g., finding
  # the actual failing test), because this should run only infrequently.
  git bisect start HEAD pu-ok
  git bisect run make test

The real work would be in wrapping it up in a cron-job or similar so that it produces no output on success, and sends an email (or whatever) with the bisect results when it fails.

-Peff
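Cron itself can supply the report-only-on-failure behavior, since it mails the owner whatever output a job produces. A crontab fragment along these lines (path, schedule, and script name invented for illustration) would poll twice a day:

```
# poll pu twice a day; cron mails us whatever the job prints,
# and the auto-bisect script prints nothing when the tests pass
0 6,18 * * *	cd /srv/git-pu-test && ./check-pu.sh
```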
Re: [PATCH v4] git_connect: set ssh shell command in GIT_SSH_COMMAND
On Sun, Nov 09, 2014 at 11:42:32PM +0100, Thomas Quinot wrote: It may be impractical to install a wrapper script for GIT_SSH when additional parameters need to be passed. Provide an alternative way of specifying a shell command to be run, including command line arguments, by means of the GIT_SSH_COMMAND environment variable, which behaves like GIT_SSH but is passed to the shell. The special circuitry to modify parameters in the case of using PuTTY's plink/tortoiseplink is activated only when using GIT_SSH; in the case of using GIT_SSH_COMMAND, it is deliberately left up to the user to make any required parameters adaptation before calling the underlying ssh implementation. Signed-off-by: Thomas Quinot tho...@quinot.org --- Amended patch as per discussion with Junio: change variable name to GIT_SSH_COMMAND, keep plink special circuitry disabled for this case (leaving it enabled only when using GIT_SSH, thus preserving compatibility with legacy usage). I think this version looks good. Thanks for working on it. Two style micro-nits (that I do not think even merit a re-roll by themselves, but Junio may want to mark up as he applies): + } else { + ssh = getenv(GIT_SSH); + if (!ssh) ssh = ssh; You did not even introduce this line, but only reindented it. However, our code style usually would write this as: if (!ssh) ssh = ssh; + putty = strcasestr(ssh, plink) != NULL; We would usually write this as: putty = !!strcasestr(ssh, plink); -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
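For readers wondering why a second variable was needed at all: GIT_SSH names a program that git runs directly, so it cannot carry options of its own, while GIT_SSH_COMMAND is handed to the shell. A small simulation with a stub (the fake-ssh script is invented for illustration; git's real invocation differs in detail):

```shell
# A stub "ssh" that just reports the arguments it received.
dir=$(mktemp -d)
cat >"$dir/fake-ssh" <<'EOF'
#!/bin/sh
echo "args: $*"
EOF
chmod +x "$dir/fake-ssh"

# GIT_SSH style: the value is run directly as a program name, so any
# embedded options would be taken as part of the file name and fail.
GIT_SSH=$dir/fake-ssh
"$GIT_SSH" host git-upload-pack

# GIT_SSH_COMMAND style: the value goes through the shell, so options
# like -p survive word splitting.
GIT_SSH_COMMAND="$dir/fake-ssh -p 2222"
sh -c "$GIT_SSH_COMMAND host git-upload-pack"

rm -r "$dir"
```

The first call prints only the host and remote command; the second shows the -p option arriving intact, which is exactly the case a GIT_SSH wrapper script used to be needed for.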
Re: [PATCH] use child_process_init() to initialize struct child_process variables
On Sun, Nov 09, 2014 at 02:49:58PM +0100, René Scharfe wrote: -- >8 -- Subject: [PATCH] trailer: use CHILD_PROCESS_INIT in apply_command() Initialize the struct child_process variable cp at declaration time. This is shorter, saves a function call and prevents using the variable before initialization by mistake. Suggested-by: Jeff King p...@peff.net Signed-off-by: Rene Scharfe l@web.de --- trailer.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) Thanks, both this one and the other you just sent (to use child_process.args in more places) look good to me. -Peff
Re: [PATCH] imap-send: avoid curl functions when not building curl support
On Mon, Nov 10, 2014 at 01:39:47AM -0500, Jeff King wrote: On top of br/imap-send-via-libcurl. I needed this to compile 'pu' with NO_CURL (which I don't usually do, but was testing on a minimal box). I expect it can just be squashed in to the next re-roll. Oops, just started reading through my list backlog and realized you already noticed and dealt with this. Sorry for the noise. -Peff
Re: 2.2.0-rc behavior changes (1/2)
On Mon, Nov 10, 2014 at 07:47:32PM +1100, Bryan Turner wrote: First change: git update-ref -d /refs/heads/nonexistent some-valid-sha1 now produces an error about ref locking that it didn't produce before Git 2.1.x and prior produced this output: error: unable to resolve reference refs/heads/nonexistent: No such file or directory Now, in the 2.2.0 RCs, it says: error: unable to resolve reference refs/heads/nonexistent: No such file or directory error: Cannot lock the ref 'refs/heads/nonexistent'. This one feels more like a bug, but again may not be. I say it feels like a bug because of the order of the messages: If git has decided the ref doesn't exist, why is it still trying to lock it? I don't think this is a bug. The order you see is because the code goes something like this: 1. the parent function calls a sub-function to lock 2. the sub-function generates the error no such file or directory and returns failure to the caller 3. the caller reports that acquiring the lock failed The only thing that has changed between the two is step (3), but it is not an extra lock action after the error. It is just a more verbose report of the same error. That being said, the sub-function (lock_ref_sha1_basic) gives a much more useful message. So it would be a nice enhancement to make sure that it prints something useful in every return case, and then drop the message from the caller. As an aside, I'm also slightly confused by your output. Are you feeding /refs/heads/nonexistent (with a leading slash), or refs/heads/nonexistent (no leading slash)? If the latter, then that should silently succeed (and seems to in my tests). If the former, then the real problem is not ENOENT, but rather EINVAL; that name is not a valid refname. Older versions of git would produce: error: unable to resolve reference /refs/heads/nonexistent: No such file or directory which is like the error you showed, but note that the refname is reported with the leading slash. 
In v2.2.0-rc1, this is: error: unable to resolve reference /refs/heads/nonexistent: Invalid argument error: Cannot lock the ref '/refs/heads/nonexistent'. which is more accurate. I could explain the differences in our output from some simple transcription errors when writing your email, but I wanted to make sure I am not missing something. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
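The 1-2-3 flow described above is the whole story behind the doubled message. As a hypothetical sketch (function names borrowed from the description, not git's actual C code), the same reporting pattern looks like this in shell — the inner helper names the specific cause, the caller adds its generic failure line, so one failed lock yields two errors:

```shell
# Inner helper: reports the specific cause and signals failure.
lock_ref_sha1_basic() {
	echo "error: unable to resolve reference $1: No such file or directory" >&2
	return 1
}

# Caller: on failure it adds its own, more generic report. One failing
# operation, two error lines -- but only one lock attempt.
update_ref() {
	lock_ref_sha1_basic "$1" || {
		echo "error: Cannot lock the ref '$1'." >&2
		return 1
	}
}

update_ref refs/heads/nonexistent 2>&1 || true
```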
Re: 2.2.0-rc behavior changes (2/2)
On Mon, Nov 10, 2014 at 07:51:00PM +1100, Bryan Turner wrote: Second change: git gc --auto now fails if there are loose empty blobs. We have a test which just touched empty files in objects/17 to trigger the gc --auto preconditions (Note: I realize this is completely invalid, and I've changed the test to no longer do this; I'm only surfacing the behavioral change). This is expected. As you noticed here: error: object file .git/objects/17/02d54e8fba95ef9968a0c9b183fe22ec551c86 is empty fatal: unable to get object info for 1702d54e8fba95ef9968a0c9b183fe22ec551c86 error: failed to run prune the error comes from git prune failing. It is using unreachable but recent objects as graph tips to keep around. If we can't load a tip object, we abort the prune, because we would not want to delete objects that might have been referenced (e.g., if it were a real corrupted object that we could not read). The second behavior (die on broken objects) has been around for a while. The new change (in the commit you bisected to) is that we are considering more objects. I'll admit I didn't really consider the impact on sticking broken object files into the object directory, but I think the code is doing the sensible and safe thing. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: mac test failure -- test 3301
On Tue, Nov 11, 2014 at 02:40:19AM +0100, Johan Herland wrote: This and all other failures are due to the output of 'wc -l', which on Mac is {whitespace}1 rather than just 1 as it is on other platforms. fbe4f748 added quotes around the $(... | wc -l) invocation which caused the whitespace to be retained. A minimum fix would be to drop the quotes surrounding $(). Ah, thanks! I thought that quoting command output was a good idea in general. Am I wrong, or is this just one exception to an otherwise good guideline? It usually is a good idea to prevent whitespace splitting by the shell (and especially with things that might unexpectedly be empty, in which case they end up as no argument and not an empty one). So yeah, this is an exception. Anyway, here is the minimal diff to remove quoting from the $(... | wc -l) commands (probably whitespace damaged by GMail - sorry). Junio: feel free to squash this in, or ask for a re-roll: I think we have test_line_count exactly because of this unportability with wc output. You'd want: git ls-tree refs/notes/commits actual test_line_count = 1 actual or similar. By the way, the point of that ls-tree appears to be to check the number of total notes stored. Since notes are stored in a hashed structure, should it be ls-tree -r? Otherwise, we see only the leading directories; two notes whose sha1s start with the same two characters would be considered a single entry. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
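The quoting exception is easy to demonstrate without a Mac by faking the space-padded output that BSD-flavored wc produces (the padded string below is a stand-in, not real wc output):

```shell
padded=' 1 '    # stand-in for what OS X's "wc -l" might hand back

# Quoted: the padding survives, so a plain string compare against "1" fails.
[ "$padded" = 1 ] || echo "quoted form keeps the spaces"

# Unquoted: field splitting strips the blanks before test sees the value.
[ $padded = 1 ] && echo "unquoted form compares clean"
```

In the test suite itself, test_line_count sidesteps the whole question, which is why it exists.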
Re: [PATCH] run-command: use void to declare that functions take no parameters
On Mon, Nov 10, 2014 at 02:43:10PM -0800, Junio C Hamano wrote: Explicitly declare that git_atexit_dispatch() and git_atexit_clear() take no parameters instead of leaving their parameter list empty and thus unspecified. [...] I was kind of surprised after running a git blame to find that this is a recent thing, and the same patch looked quite substandard with numerious style violations, and I somehow managed to let them slip in X-. Perhaps I was having a bad day or something... I had always just assumed that -Wstrict-prototypes was part of -Wall, but it is not (nor even part of -Wextra!). Maybe it is time to add it to your integration-build flags. :) Looks like we also need this on top of hv/submodule-config (still in pu, so squashing is probably best): diff --git a/submodule-config.h b/submodule-config.h index 58afc83..9061e4e 100644 --- a/submodule-config.h +++ b/submodule-config.h @@ -24,6 +24,6 @@ const struct submodule *submodule_from_name(const unsigned char *commit_sha1, const char *name); const struct submodule *submodule_from_path(const unsigned char *commit_sha1, const char *path); -void submodule_free(); +void submodule_free(void); #endif /* SUBMODULE_CONFIG_H */ -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] replace: fix replacing object with itself
On Tue, Nov 11, 2014 at 12:20:56AM +0100, Manzur Mukhitdinov wrote: When object is replaced with itself git shows unhelpful messages like(git log): fatal: replace depth too high for object SHA1 Prevents user from replacing object with itself(with test for checking this case). Thanks, this looks good to me. +test_expect_success 'replacing object with itself must fail' ' +test_must_fail git replace $HASH1 $HASH1 +HASH8=$(git rev-parse --verify HEAD) +test_must_fail git replace HEAD $HASH8 +test_must_fail git replace --graft HEAD HEAD^ +test_must_fail env GIT_EDITOR=true git replace --edit HEAD +' I think some of these overlap with earlier tests (I know that the --edit case is checked already), but I think it is nice to keep the related checks together here. Should we maybe squash this in to drop the now redundant test (AFAICT, that is the only one that overlaps)? diff --git a/t/t6050-replace.sh b/t/t6050-replace.sh index 98ac9dd..5f96374 100755 --- a/t/t6050-replace.sh +++ b/t/t6050-replace.sh @@ -369,9 +369,8 @@ test_expect_success '--edit with and without already replaced object' ' git cat-file commit $PARA3 | grep A fake Thor ' -test_expect_success '--edit and change nothing or command failed' ' +test_expect_success '--edit with failed editor' ' git replace -d $PARA3 - test_must_fail env GIT_EDITOR=true git replace --edit $PARA3 test_must_fail env GIT_EDITOR=./fakeeditor;false git replace --edit $PARA3 GIT_EDITOR=./fakeeditor git replace --edit $PARA3 git replace -l | grep $PARA3 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 1/2] Clean stale environment pointer in finish_command()
On Mon, Nov 10, 2014 at 01:44:24PM -0800, Junio C Hamano wrote: Johannes Schindelin johannes.schinde...@gmx.de writes: In start_command(), unset env fields are initialized via env_array. In finish_command(), the env_array is cleared, therefore the env field will point to free()d data. However, start_command() will set env to env_array.argv only if env was unset to begin with, and if it was already set, the caller will need the original value. Therefore, we need to be very careful only to reset env in finish_command() when it has been initialized in start_command(). Hmph. Does the same observation apply to cmd-argv that is initialied to point to cmd-args.argv only when it is unset? Yes, they behave exactly the same (I think Dscho just doesn't hit it in his patch because he assigns argv manually). I don't have a real problem with going in this direction as a safety measure, but I am not sure that it is safe to reuse a child_process after finish_command in general, without an intervening child_process_init. For instance, calling start_command will convert a child_process.in value of -1 instead a pipe, and overwrite that -1 with the descriptor of the pipe. A subsequent use of the same child_process struct will ask the second child to use that pipe (the write-half of the pipe, mind you) as its stdin, which is nonsensical. So I think you are much better off just using two child_process structs (or a single one and reinitializing in between calls). -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 1/2] Clean stale environment pointer in finish_command()
On Mon, Nov 10, 2014 at 03:41:09PM +0100, Johannes Schindelin wrote: However, start_command() will set env to env_array.argv only if env was unset to begin with, and if it was already set, the caller will need the original value. Therefore, we need to be very careful only to reset env in finish_command() when it has been initialized in start_command(). In case it was unclear: this is needed for the the suggested switch from the previous method to construct the environment to the new env_array method to work. (The env_array method unfortunately requires the code to initialize the environment twice because finish_command() insists on always releasing the env_array, even if the caller may want to reuse the generated array). I don't think this is unfortunately; freeing the memory was the entire purpose in adding env_array. If you want to easily reuse the same environment in multiple commands, it is still perfectly fine to use env directly, like: struct argv_array env = ARGV_ARRAY_INIT; struct child_process one = CHILD_PROCESS_INIT; struct child_process two = CHILD_PROCESS_INIT; ... setup env with argv_array_push ... one.argv = foo; one.env = env.argv; run_command(one); two.argv = bar; two.env = env.argv; run_command(two); argv_array_clear(env); You do not get the benefit of the auto-cleanup (you have to call argv_array_clear yourself), but that is less bad than repeating the setup of env twice. It may be that Documentation/technical/api-run-command.txt needs to make this more explicit (I just read over it and it looks OK to me. But since I am the one who designed the feature, I am not the best test of whether it is sufficiently clear). -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 02/15] refs.c: return error instead of dying when locking fails during transaction
On Tue, Oct 21, 2014 at 01:36:47PM -0700, Ronnie Sahlberg wrote: commit e193c10fc4f9274d1e751cfcdcc4507818e8d498 upstream. Change lock_ref_sha1_basic to return an error instead of dying when we fail to lock a file during a transaction. This function is only called from transaction_commit() and it knows how to handle these failures. [...] - else - unable_to_lock_die(ref_file, errno); + else { + struct strbuf err = STRBUF_INIT; + unable_to_lock_message(ref_file, errno, err); + error(%s, err.buf); + strbuf_reset(err); + goto error_return; + } I coincidentally just wrote almost the identical patch, because this isn't just a cleanup; it fixes a real bug. During pack_refs, we call prune_ref to lock and delete the loose ref. If the lock fails, that's OK; that just means somebody else is updating it at the same time, and we can skip our pruning step. But due to the unable_to_lock_die call here in lock_ref_sha1_basic, the pack-refs process may die prematurely. I wonder if it is worth pulling this one out from the rest of the series, as it has value (and can be applied) on its own. I did some digging on the history of this, too. Here's the rationale I wrote: lock_ref_sha1_basic: do not die on locking errors lock_ref_sha1_basic is inconsistent about when it calls die() and when it returns NULL to signal an error. This is annoying to any callers that want to recover from a locking error. This seems to be mostly historical accident. It was added in 4bd18c4 (Improve abstraction of ref lock/write., 2006-05-17), which returned an error in all cases except calling safe_create_leading_directories, in which case it died. Later, 40aaae8 (Better error message when we are unable to lock the index file, 2006-08-12) asked hold_lock_file_for_update to die for us, leaving the resolve_ref code-path the only one which returned NULL. 
We tried to correct that in 5cc3cef (lock_ref_sha1(): do not sometimes error() and sometimes die()., 2006-09-30), by converting all of the die() calls into returns. But we missed the die flag passed to the lock code, leaving us inconsistent. This state persisted until e5c223e (lock_ref_sha1_basic(): if locking fails with ENOENT, retry, 2014-01-18). Because of its retry scheme, it does not ask the lock code to die, but instead manually dies with unable_to_lock_die(). We can make this consistent with the other return paths by converting this to use unable_to_lock_message(), and returning NULL. This is safe to do because all callers already needed to check the return value of the function, since it could fail (and return NULL) for other reasons. I also have some other cleanups to lock_ref_sha1_basic's error handling. I'd be happy to take over this patch and send it along with those cleanups as a separate series. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: diff-highlight highlight words?
[+cc git@vger, since this may be of interest to others] On Tue, Nov 11, 2014 at 02:40:59PM -0800, Scott Baker wrote: I'd like to recreate the github style diffs on the command line. It appears that your diff-highlight is very close. The current version only allows you to invert the colors which isn't ideal. Yes, I never built any configurability into the script. However, you can tweak the definitions at the top to get different effects. Traditionally, ANSI colors on the terminal only came in two flavors: normal and bright (which is attached to the bold attribute). Instead of reversing video, you can switch on brightness like this: diff --git a/contrib/diff-highlight/diff-highlight b/contrib/diff-highlight/diff-highlight index c4404d4..c99de99 100755 --- a/contrib/diff-highlight/diff-highlight +++ b/contrib/diff-highlight/diff-highlight @@ -5,8 +5,8 @@ use strict; # Highlight by reversing foreground and background. You could do # other things like bold or underline if you prefer. -my $HIGHLIGHT = \x1b[7m; -my $UNHIGHLIGHT = \x1b[27m; +my $HIGHLIGHT = \x1b[1m; +my $UNHIGHLIGHT = \x1b[22m; my $COLOR = qr/\x1b\[[0-9;]*m/; my $BORING = qr/$COLOR|\s/; However on most modern terminals the color difference between bright and normal is very subtle, and this doesn't look good. XTerm (and other modern terminals) has 256-color support, so you could do better there (assuming your terminal supports it). The current code builds on the existing colors produced by git (because the operations are only invert colors and uninvert colors). Doing anything fancier requires handling add/del differently. 
That patch might look something like this (which uses dark red/green for most of the line, and a much brighter variant for the highlighted text): diff --git a/contrib/diff-highlight/diff-highlight b/contrib/diff-highlight/diff-highlight index c4404d4..4e08f3c 100755 --- a/contrib/diff-highlight/diff-highlight +++ b/contrib/diff-highlight/diff-highlight @@ -5,11 +5,16 @@ use strict; # Highlight by reversing foreground and background. You could do # other things like bold or underline if you prefer. -my $HIGHLIGHT = \x1b[7m; -my $UNHIGHLIGHT = \x1b[27m; my $COLOR = qr/\x1b\[[0-9;]*m/; my $BORING = qr/$COLOR|\s/; +# Elements: +# 0 - highlighted text +# 1 - unhighlighted text +# 2 - reset to normal +my @ADD_HIGHLIGHT = (\x1b[38;2;100;255;100m, \x1b[38;2;0;255;0m, \x1b[30m); +my @DEL_HIGHLIGHT = (\x1b[38;2;255;100;100m, \x1b[38;2;255;0;0m, \x1b[30m); + my @removed; my @added; my $in_hunk; @@ -128,8 +133,8 @@ sub highlight_pair { } if (is_pair_interesting(\@a, $pa, $sa, \@b, $pb, $sb)) { - return highlight_line(\@a, $pa, $sa), - highlight_line(\@b, $pb, $sb); + return highlight_line(\@a, $pa, $sa, @DEL_HIGHLIGHT), + highlight_line(\@b, $pb, $sb, @ADD_HIGHLIGHT); } else { return join('', @a), @@ -144,15 +149,18 @@ sub split_line { } sub highlight_line { - my ($line, $prefix, $suffix) = @_; + my ($line, $prefix, $suffix, $highlight, $unhighlight, $reset) = @_; - return join('', + my $r = join('', + $unhighlight, @{$line}[0..($prefix-1)], - $HIGHLIGHT, + $highlight, @{$line}[$prefix..$suffix], - $UNHIGHLIGHT, - @{$line}[($suffix+1)..$#$line] + $unhighlight, + @{$line}[($suffix+1)..$#$line], ); + $r =~ s/\n$/$reset$/; + return $r; } # Pairs are interesting to highlight only if we are going to end up The result does not look terrible to me, though I think I find the reverse-video more obvious when scanning the diff. 
To look more like GitHub's view, you could instead set the background by doing this on top: diff --git a/contrib/diff-highlight/diff-highlight b/contrib/diff-highlight/diff-highlight index 4e08f3c..6f98db4 100755 --- a/contrib/diff-highlight/diff-highlight +++ b/contrib/diff-highlight/diff-highlight @@ -12,8 +12,8 @@ my $BORING = qr/$COLOR|\s/; # 0 - highlighted text # 1 - unhighlighted text # 2 - reset to normal -my @ADD_HIGHLIGHT = (\x1b[38;2;100;255;100m, \x1b[38;2;0;255;0m, \x1b[30m); -my @DEL_HIGHLIGHT = (\x1b[38;2;255;100;100m, \x1b[38;2;255;0;0m, \x1b[30m); +my @ADD_HIGHLIGHT = (\x1b[30;48;2;100;255;100m, \x1b[30;48;2;0;255;0m, \x1b[0m); +my @DEL_HIGHLIGHT = (\x1b[30;48;2;255;100;100m, \x1b[30;48;2;255;0;0m, \x1b[0m); my @removed; my @added; @@ -151,14 +151,18 @@ sub split_line { sub highlight_line { my ($line, $prefix, $suffix, $highlight, $unhighlight, $reset) = @_; + # strip out existing colors from git, which will clash + # both due to contrast and because of random ANSI resets + # inside the content + my $p = join('', @{$line}[0..($prefix-1)]); + my $t = join('', @{$line}[$prefix..$suffix]); + my $s = join('',
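For reference, the escape sequences in the patches above are SGR ("Select Graphic Rendition") codes: 38;2;R;G;B picks a 24-bit foreground, 48;2;R;G;B a background, 1 and 22 toggle bold (the traditional "bright" attribute), and 0 resets everything. A quick way to eyeball them — the colors render only on a truecolor-capable terminal:

```shell
# ESC is \033; the m-terminated parameter list selects the attributes.
printf '\033[38;2;0;255;0mbright green foreground\033[0m\n'
printf '\033[30;48;2;100;255;100mblack text on light green\033[0m\n'
printf '\033[1mbold (the "bright" fallback)\033[22m\n'
```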
Re: [PATCH v2 1/2] Clean stale environment pointer in finish_command()
On Wed, Nov 12, 2014 at 11:45:19AM +0100, Johannes Schindelin wrote: Okay, I have to say that I was led to believe that reusing the child_process struct is okay because argv_array_clear() explicitly reinitializes the env_array field, something that is useless churn unless you plan to reuse the memory. The argv_array code prepares its data structure for reuse after freeing. But child_process which uses it does not make any such promises. If there were an argv_array_free(), child_process could use it. But there is only argv_array_clear(). I don't have a problem with finish_command leaving its child_process in a known usable state (e.g., by calling child_process_init). But I also don't see much benefit. However, my personal taste says that reusing the same memory is more elegant than to waste extra memory unnecessarily, so I will go with the child_process_init() solution. I do not mind much either way. But I doubt that a single extra struct on the stack will break the bank, compared to the fact that we are forking and execing a new program. I'd also not be surprised if a smart compiler could notice that the variables are used exclusively in non-overlapping bits of the code, and just reuse the stack space. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 1/2] Clean stale environment pointer in finish_command()
On Wed, Nov 12, 2014 at 05:52:29AM -0500, Jeff King wrote: However, my personal taste says that reusing the same memory is more elegant than to waste extra memory unnecessarily, so I will go with the child_process_init() solution. I do not mind much either way. But I doubt that a single extra struct on the stack will break the bank, compared to the fact that we are forking and execing a new program. I'd also not be surprised if a smart compiler could notice that the variables are used exclusively in non-overlapping bits of the code, and just reuse the stack space. Actually, I take that back. We are passing a pointer to a struct, rather than by-value, so the compiler cannot know that the sub-function does not store that pointer in a static variable, and reference it in the next call. It must use two variables if it cannot see the definition of run_command. I still think it's pointless optimization to worry about, and you should write whichever is the most readable and maintainable. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] t1410: fix breakage on case-insensitive filesystems
On Wed, Nov 12, 2014 at 09:20:22PM +0100, Johannes Sixt wrote: Am 09.11.2014 um 02:59 schrieb Jeff King: test_expect_success 'stale dirs do not cause d/f conflicts (reflogs off)' ' - test_when_finished git branch -d a || git branch -d a/b + test_when_finished git branch -d one || git branch -d one/two - git branch a/b master - echo a/b@{0} branch: Created from master expect - git log -g --format=%gd %gs a/b actual + git branch one/two master + echo one/two@{0} branch: Created from master expect + git log -g --format=%gd %gs one/two actual test_cmp expect actual - git branch -d a/b + git branch -d one/two - # same as before, but we only create a reflog for a if + # same as before, but we only create a reflog for one if # it already exists, which it does not - git -c core.logallrefupdates=false branch a master + git -c core.logallrefupdates=false branch one master : expect - git log -g --format=%gd %gs a actual + git log -g --format=%gd %gs one actual test_cmp expect actual ' On Linux I observe Deleted branch one/two (was b60a214). warning: unable to unlink .git/logs/refs/heads/one: Is a directory Deleted branch one (was b60a214). ok 15 - stale dirs do not cause d/f conflicts (reflogs off) (notice the warning) Yes, this is expected. The warning actually comes from the git branch -d run during cleanup. Branch one exists but has no reflog. Instead there is a crufty dir call logs/refs/heads/one. The deletion process blindly calls unlink on it and complains when it fails for reasons other than ENOENT. We could suppress that warning, but I didn't think it was really worth the bother. This is such a funny setup (reflogs on, then off, then on, _and_ a d/f conflict in the branches) that I doubt it will come up much. 
I seem to recall that ancient versions of SunOS used to do bad things when you called unlink on a directory, but I do not know if that is still the case (I also would doubt this is the only place in git where we blindly do an unlink if you can without actually checking the file. but on Windows the test fails with Deleted branch one/two (was b60a214). error: Unable to append to .git/logs/refs/heads/one: Permission denied fatal: Cannot update the ref 'refs/heads/one'. not ok 15 - stale dirs do not cause d/f conflicts (reflogs off) That looks more like it is failing the actual test (i.e., the creation of branch one when there is cruft in the reflog). My guess is that calling open() on a directory is giving us EACCES instead of EISDIR. Can you verify that? If that is the case, then this isn't a new breakage, I think, but just code we weren't previously exercising. It would be interesting to know whether: git config core.logallrefupdates true git branch one/two git branch -d one/two git branch one works (even without my patch). If so, then there's probably something else going on. The relevant bits are in log_ref_setup. We try to open() once, and accept EISDIR as a hint that we may need to remove an empty directory and try again. If Windows gives us EACCES, then we may need to have that trigger the same code. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] t1410: fix breakage on case-insensitive filesystems
On Thu, Nov 13, 2014 at 09:50:24AM +0100, Johannes Sixt wrote: That looks more like it is failing the actual test (i.e., the creation of branch one when there is cruft in the reflog). My guess is that calling open() on a directory is giving us EACCES instead of EISDIR. Can you verify that? If that is the case, then this isn't a new breakage, I think, but just code we weren't previously exercising. It would be interesting to know whether: git config core.logallrefupdates true git branch one/two git branch -d one/two git branch one works (even without my patch). If so, then there's probably something else going on. Don't know what you mean with my patch (the one I was responding to touches only t1410). The patch you are responding to is a fix-up for 9233887, which tweaked the code and added those tests in the first place (I doubt it would work for you, though, as it has a problem on case-insensitive filesystems). But the sequence works as expected with a version built in September: Hmph. So that would mean my theory is not right. Or maybe I am not accounting for something else in my analysis. I guess it is odd that the test right before the failing one passes (it is basically that same sequence, with reflogs turned on for both operations), which implies that we are properly getting EISDIR. The only difference in the failing test is that reflogs are turned off for the git branch one operation. But I cannot see why that would be broken if the other one passes. I wish it were easy for me to ssh into a Windows VM and run gdb. ;) -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] t5705: Use the correct file:// URL
On Thu, Nov 13, 2014 at 08:36:07AM +0100, Torsten Bögershausen wrote: A URL like file://. is (no longer) supported by Git: Typically there is no host, and RFC 1738 says that file:///path should be used. Update t5705 to use a working URL. Interesting. This looks like it was unintentionally lost in c59ab2e (connect.c: refactor url parsing, 2013-11-28). Given RFC 1738, and that this is the first notice of it (and that it is not even a real use case, but something questionable in the test script), it's probably OK to declare the syntax dead and not treat it like a regression. -Peff
Re: Bug: git log showing nothing when using --since and --until flags with specific dates
On Thu, Nov 13, 2014 at 11:27:06AM +1100, Colin Smith wrote: Apologies if this has already been raised or PEBCAK, but I've noticed a bug where git log with certain date ranges breaks things. It appears to be any --since date with a --until date in the future between 2014-12-01 and 2014-12-09. Dates from 2014-12-10 appear to work, and so does the date 2015-12-01. Ugh. Approxidate strikes again:

  for i in 2014-11-01 2013-12-01 2014-12-01; do
    ./test-date approxidate $i
  done

produces:

  2014-11-01 -> 2014-11-01 09:35:19 +0000
  2013-12-01 -> 2013-12-01 09:35:19 +0000
  2014-12-01 -> 2014-01-12 09:35:19 +0000

The first two are right, but the third one is not. It's probably something simple and stupid.

-Peff
[PATCH 0/2] approxidate and future ISO-like times
On Thu, Nov 13, 2014 at 04:36:06AM -0500, Jeff King wrote: On Thu, Nov 13, 2014 at 11:27:06AM +1100, Colin Smith wrote: Apologies if this has already been raised or PEBCAK, but I've noticed a bug where git log with certain date ranges breaks things. It appears to be any --since date with a --until date in the future between 2014-12-01 and 2014-12-09. Dates from 2014-12-10 appear to work, and so does the date 2015-12-01. Ugh. Approxidate strikes again:

  for i in 2014-11-01 2013-12-01 2014-12-01; do
    ./test-date approxidate $i
  done

produces:

  2014-11-01 -> 2014-11-01 09:35:19 +0000
  2013-12-01 -> 2013-12-01 09:35:19 +0000
  2014-12-01 -> 2014-01-12 09:35:19 +0000

The first two are right, but the third one is not. It's probably something simple and stupid. Less simple and stupid than I thought, but I think I have a fix. It is not about December specifically, but about the date being in the future. The first patch is a cleanup to help us more accurately test the bug; the interesting bits are in the second one.

  [1/2]: pass TIME_DATE_NOW to approxidate future-check
  [2/2]: approxidate: allow ISO-like dates far in the future

-Peff
[PATCH 1/2] pass TIME_DATE_NOW to approxidate future-check
The approxidate functions accept an extra now parameter to avoid calling time() themselves. We use this in our test suite to make sure we have a consistent time for computing relative dates. However, deep in the bowels of approxidate, we also call time() to check whether possible dates are far in the future. Let's make sure that the now override makes it to that spot, too, so we can consistently test that feature. Signed-off-by: Jeff King p...@peff.net --- This code path also gets called from the more-strict parse_date (which does not know about the current time). This continues to call time() dynamically, but it is not clear to me if that ever actually matters in practice. date.c | 17 ++--- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/date.c b/date.c index 59dfe57..e1a4d49 100644 --- a/date.c +++ b/date.c @@ -405,9 +405,9 @@ static int is_date(int year, int month, int day, struct tm *now_tm, time_t now, return 0; } -static int match_multi_number(unsigned long num, char c, const char *date, char *end, struct tm *tm) +static int match_multi_number(unsigned long num, char c, const char *date, + char *end, struct tm *tm, time_t now) { - time_t now; struct tm now_tm; struct tm *refuse_future; long num2, num3; @@ -433,7 +433,8 @@ static int match_multi_number(unsigned long num, char c, const char *date, char case '-': case '/': case '.': - now = time(NULL); + if (!now) + now = time(NULL); refuse_future = NULL; if (gmtime_r(now, now_tm)) refuse_future = now_tm; @@ -513,7 +514,7 @@ static int match_digit(const char *date, struct tm *tm, int *offset, int *tm_gmt case '/': case '-': if (isdigit(end[1])) { - int match = match_multi_number(num, *end, date, end, tm); + int match = match_multi_number(num, *end, date, end, tm, 0); if (match) return match; } @@ -1013,7 +1014,8 @@ static const char *approxidate_alpha(const char *date, struct tm *tm, struct tm return end; } -static const char *approxidate_digit(const char *date, struct tm *tm, int *num) +static 
const char *approxidate_digit(const char *date, struct tm *tm, int *num, +time_t now) { char *end; unsigned long number = strtoul(date, end, 10); @@ -1024,7 +1026,8 @@ static const char *approxidate_digit(const char *date, struct tm *tm, int *num) case '/': case '-': if (isdigit(end[1])) { - int match = match_multi_number(number, *end, date, end, tm); + int match = match_multi_number(number, *end, date, end, + tm, now); if (match) return date + match; } @@ -1087,7 +1090,7 @@ static unsigned long approxidate_str(const char *date, date++; if (isdigit(c)) { pending_number(tm, number); - date = approxidate_digit(date-1, tm, number); + date = approxidate_digit(date-1, tm, number, time_sec); touched = 1; continue; } -- 2.1.2.596.g7379948 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] approxidate: allow ISO-like dates far in the future
When we are parsing approxidate strings and we find three numbers separated by one of ":/-.", we guess that it may be a date. We feed the numbers to match_multi_number, which checks whether it makes sense as a date in various orderings (e.g., dd/mm/yy or mm/dd/yy, etc). One of the checks we do is to see whether it is a date more than 10 days in the future. This was added in 38035cf (date parsing: be friendlier to our European friends., 2006-04-05), and lets us guess that if it is currently April 2014, then 10/03/2014 is probably March 10th, not October 3rd.

This has a downside, though; if you want to be overly generous with your --until date specification, we may wrongly parse 2014-12-01 as 2014-01-12 (because the latter is an in-the-past date). If the year is a future year (i.e., both are future dates), it gets even weirder. Due to the vagaries of approxidate, months _after_ the current date (no matter the year) get flipped, but ones before do not.

This patch drops the "in the future" check for dates of this form, letting us treat them always as yyyy-mm-dd, even if they are in the future. This does not affect the normal dd/mm/yyyy versus mm/dd/yyyy lookup, because this code path only kicks in when the first number is greater than 70 (i.e., it must be a year, and cannot be either a date or a month).

The one possible casualty is that yyyy-dd-mm is less likely to be chosen over yyyy-mm-dd. That's probably OK, though, because:

  1. The difference happens only when the date is in the future. Already we prefer yyyy-mm-dd for dates in the past.

  2. It's unclear whether anybody even uses yyyy-dd-mm regularly. It does not appear in lists of common date formats in Wikipedia [1, 2].

  3. Even if (2) is wrong, it is better to prefer ISO-like dates, as that is consistent with what we use elsewhere in git.
[1] http://en.wikipedia.org/wiki/Date_and_time_representation_by_country [2] http://en.wikipedia.org/wiki/Calendar_date Reported-by: Colin Smith colin.web...@gmail.com Signed-off-by: Jeff King p...@peff.net --- I did not single out -mm-dd here versus /mm/dd; the change applies no matter which separator is used. If we wanted to be more conservative, we could apply it only to the more ISO-looking form with dashes, but I tend to think the reasoning makes sense no matter which separate you use (i.e., I do not think any variant with year then day is in common use). date.c | 29 - t/t0006-date.sh | 3 +++ 2 files changed, 19 insertions(+), 13 deletions(-) diff --git a/date.c b/date.c index e1a4d49..b9eecfb 100644 --- a/date.c +++ b/date.c @@ -363,7 +363,8 @@ static int match_alpha(const char *date, struct tm *tm, int *offset) return skip_alpha(date); } -static int is_date(int year, int month, int day, struct tm *now_tm, time_t now, struct tm *tm) +static int is_date(int year, int month, int day, struct tm *now_tm, time_t now, + struct tm *tm, int allow_future) { if (month 0 month 13 day 0 day 32) { struct tm check = *tm; @@ -388,14 +389,16 @@ static int is_date(int year, int month, int day, struct tm *now_tm, time_t now, if (!now_tm) return 1; - specified = tm_to_time_t(r); + if (!allow_future) { + specified = tm_to_time_t(r); - /* Be it commit time or author time, it does not make -* sense to specify timestamp way into the future. Make -* sure it is not later than ten days from now... -*/ - if ((specified != -1) (now + 10*24*3600 specified)) - return 0; + /* Be it commit time or author time, it does not make +* sense to specify timestamp way into the future. Make +* sure it is not later than ten days from now... +*/ + if ((specified != -1) (now + 10*24*3600 specified)) + return 0; + } tm-tm_mon = r-tm_mon; tm-tm_mday = r-tm_mday; if (year != -1) @@ -441,10 +444,10 @@ static int match_multi_number(unsigned long num, char c, const char *date, if (num 70) { /* -mm-dd? 
*/ - if (is_date(num, num2, num3, refuse_future, now, tm)) + if (is_date(num, num2, num3, refuse_future, now, tm, 1)) break; /* -dd-mm? */ - if (is_date(num, num3, num2, refuse_future, now, tm)) + if (is_date(num, num3, num2, refuse_future, now, tm, 1)) break; } /* Our eastern European friends say dd.mm.yy[yy] @@ -452,14 +455,14 @@ static int match_multi_number
Re: t9902-completion.sh failed
On Thu, Nov 13, 2014 at 04:59:12PM +0600, Alex Kuleshov wrote: i just got git from master (f6f61cbbad0611e03b712cc354f1665b5d7b087e), built and installed it successfully, now i'm running make test and got following error:

  *** t9902-completion.sh ***
  t9902-completion.sh: 118: /home/shk/dev/git/t/../contrib/completion/git-completion.bash: Syntax error: ( unexpected (expecting fi)
  FATAL: Unexpected exit with code 2
  make[2]: *** [t9902-completion.sh] Error 1
  make[2]: Leaving directory `/home/shk/dev/git/t'
  make[1]: *** [test] Error 2
  make[1]: Leaving directory `/home/shk/dev/git/t'
  make: *** [test] Error 2

  $ bash --version
  4.3.11(1)-release (x86_64-pc-linux-gnu)

Weird. I can't reproduce here, using the version of bash from Debian unstable (4.3.30(1)), nor compiling 4.3.11 from sources. What platform are you on (i.e., might it be bash + some other patches installed by the distro)?

-Peff
Re: t9902-completion.sh failed
On Thu, Nov 13, 2014 at 05:34:50PM +0600, Alex Kuleshov wrote: I'm using ubuntu 14.04 x86_64 and bash GNU bash, version 4.3.11(1)-release (x86_64-pc-linux-gnu). I didn't applied any patches to bash for all time since i installed system. so it reall weird I tried on a fresh install of Ubuntu 14.04.1, with the same version of bash, and I still can't reproduce the problem. I'm not sure what else to check. -Peff
Re: [RFC] git checkout $tree -- $path always rewrites files
On Sun, Nov 09, 2014 at 09:21:49AM -0800, Junio C Hamano wrote: Jeff King p...@peff.net writes: On Fri, Nov 07, 2014 at 11:35:59PM -0800, Junio C Hamano wrote: I think that has direct linkage; what you have in mind I think is http://thread.gmane.org/gmane.comp.version-control.git/234903/focus=234935 Thanks for that link. It was one of the items in the git blame leftover bits list (websearch for that exact phrase), so I didn't have to do any digging just for this thread ;-) But I made a huge typo above. s/I think/I do not think/; Oh. That might explain some of my confusion. :)

The original observation you made in this thread is that git checkout $tree -- $pathspec, whose definition is to grab paths in the $tree that match $pathspec and put them in the index, and then overwrite the working tree paths with the contents of these paths that are updated in the index with the contents taken from the $tree, unnecessarily rewrites the working tree paths even when they already had contents that were up to date. That is what I called an implementation glitch. The old thread is a different topic. [...]

Right, I do agree that these things don't need to be linked. The reason I ended up dealing with the deletion thing is that one obvious way to implement "do not touch entries which have not changed" is by running a diff between $tree and the index. But doing a diff means we have to reconsider all of the code surrounding deletions (and handling of unmerged entries in the pathspec but not in $tree). I tried a few variants and had trouble making it work (getting caught up either in the "make sure each pathspec matched" code, or the "treat unmerged entries specially" behavior). In the patch below, I ended up retaining the existing read_tree_recursive code, and just specially handling the replacement of index entries (which is itself sort of a diff, but it fits nicely into the existing scheme).

I'd prefer these two be treated separately.
Yeah, that makes sense after reading your emails. What I was really unclear on was whether the handling of deletion was a bug or a design choice, and it is the latter (if it were the former, we would not need a transition plan :) ). Anyway, here is the patch.

-- >8 --
Subject: checkout $tree: do not throw away unchanged index entries

When we git checkout $tree, we pull paths from $tree into the index, and then check the resulting entries out to the worktree. Our method for the first step is rather heavy-handed, though; it clobbers the entire existing index entry, even if the content is the same. This means we lose our stat information, leading checkout_entry to later rewrite the entire file with identical content.

Instead, let's see if we have the identical entry already in the index, in which case we leave it in place. That lets checkout_entry do the right thing. Our tests cover two interesting cases:

  1. We make sure that a file which has no changes is not rewritten.

  2. We make sure that we do update a file that is unchanged in the index (versus $tree), but has working tree changes. We keep the old index entry, and checkout_entry is able to realize that our stat information is out of date.

Signed-off-by: Jeff King p...@peff.net
---
Note that the test refreshes the index manually (because we are tweaking the timestamp of file2). In normal use this should not be necessary (i.e., your entries should generally be uptodate). I did wonder if checkout should be refreshing the index itself, but it would mean a bunch of extra lstats in the common case.
builtin/checkout.c| 31 +-- t/t2022-checkout-paths.sh | 17 + 2 files changed, 42 insertions(+), 6 deletions(-) diff --git a/builtin/checkout.c b/builtin/checkout.c index 5410dac..67cab4e 100644 --- a/builtin/checkout.c +++ b/builtin/checkout.c @@ -65,21 +65,40 @@ static int post_checkout_hook(struct commit *old, struct commit *new, static int update_some(const unsigned char *sha1, const char *base, int baselen, const char *pathname, unsigned mode, int stage, void *context) { - int len; + struct strbuf buf = STRBUF_INIT; struct cache_entry *ce; + int pos; if (S_ISDIR(mode)) return READ_TREE_RECURSIVE; - len = baselen + strlen(pathname); - ce = xcalloc(1, cache_entry_size(len)); + strbuf_add(buf, base, baselen); + strbuf_addstr(buf, pathname); + + /* +* If the entry is the same as the current index, we can leave the old +* entry in place. Whether it is UPTODATE or not, checkout_entry will +* do the right thing. +*/ + pos = cache_name_pos(buf.buf, buf.len); + if (pos = 0) { + struct cache_entry *old = active_cache[pos]; + if (create_ce_mode(mode) == old-ce_mode + !hashcmp(old-sha1, sha1)) { + old-ce_flags
Re: [PATCH 3/4] cat-file: add --batch-disk-sizes option
On Sun, Jul 07, 2013 at 10:49:46AM -0700, Junio C Hamano wrote: Jeff King p...@peff.net writes: Perhaps we need git cat-file --batch-format=%(disk-size) %(object) or similar. I agree with your reasoning. It may be simpler to give an interface to ask for which pieces of info, e.g. --batch-cols=size,disksize, without giving the readers a flexible format. I started on this, and it turned out not to really be any simpler. In particular there is the question of whether git cat-file --batch-cols=size,type is different from git cat-file --batch-cols=type,size If so, then you are basically implementing the equivalent of a macro format anyway (you have to parse it left to right to know the order). And if not, you end up translating the column list into a bit-field, and the boilerplate for adding a new item is about the same as for a macro format. So I went ahead with the full formats for my re-roll. It turned out pretty reasonable, I think. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] cat-file: add --batch-disk-sizes option
On Mon, Jul 08, 2013 at 07:07:01PM +0530, Ramkumar Ramachandra wrote: There's also syntax sharing. I don't think each command should have its own syntax. f-e-r already has %(objectsize). If we plan to have a common syntax, perhaps %(disk-size) should be %(objectsize:disk) or something. Ofcourse. I didn't notice %(objectsize); %(objectsize[:disk]) is a fine suggestion. Adding formatting to cat-file --batch from scratch could be another big chunk of code (that also comes with bugs, usually) and may or may not be compatible with the common syntax because of some oversight. Oh, I'm proposing that Peff implements just %H and %(objectsize[:disk]) for _now_, because that's what he wants. It should be a tiny 20-line parser that's easy to swap out. I went with %(objectname), %(objectsize), and %(objectsize:disk). The former match for-each-ref, and the latter extends it in a natural-ish way (though it is arguable whether foo:modifier should mean do something with the foo value or this is like foo, but not quite). In the long run, I would like to see long names for each atom, with short aliases (so %H and %h for %(objectname) and %(objectname:short), available everywhere). But I think it is OK to start without the aliases, and then pick them up as the implementations and interfaces are unified. IOW, it is OK to say cat-file has not learned about %H yet, and later add it; we cannot teach it %H now and then change our minds later. --batch-cols=... or --batch-disk-size would be simpler, but we might never be able to remove that code. Agreed. The approach paints us into a design-corner, and must therefore be avoided. I would say it is worth one of the other routes if it turned out to be dramatically simpler. But having just done the work to add formats for cat-file, it is really not too bad. It lacks some of the niceties of the other formatters (e.g., colors), but again, we can always add them in later as the implementations unify. 
-Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCHv2 00/10] cat-file formats/on-disk sizes
Here's a re-roll of the cat-file --batch-disk-sizes series. The main change is that it replaces the --batch-disk-sizes option with a format string for --batch-check, syntactically modeled after the for-each-ref format string. [01/10]: zero-initialize object_info structs [02/10]: teach sha1_object_info_extended a disk_size query These two are the same as before. [03/10]: t1006: modernize output comparisons New. Cleanup since I add some related tests later on. [04/10]: cat-file: teach --batch to stream blob objects New. I think this is a sane thing to be doing, and it helps reorganize the code for later changes. But note the performance caveat in the commit message. [05/10]: cat-file: refactor --batch option parsing [06/10]: cat-file: add --batch-check=format These ones add the formatting code. [07/10]: cat-file: add %(objectsize:disk) format atom And this is the format analog of my original 3/4. [08/10]: cat-file: split --batch input lines on whitespace New. This makes certain types of analysis simpler when you pipe rev-list --objects into cat-file --batch-check, because you can retain the object paths through the pipe. [09/10]: pack-revindex: use unsigned to store number of objects New. Addresses the 2^31 bug that Brandon noticed. [10/10]: pack-revindex: radix-sort the revindex From the old 4/4, but with cleanups and addressing comments from the list (details along with the patch). -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 01/10] zero-initialize object_info structs
The sha1_object_info_extended function expects the caller to provide a struct object_info which contains pointers to query items that will be filled in. The purpose of providing pointers rather than storing the response directly in the struct is so that callers can choose not to incur the expense in finding particular fields that they do not care about. Right now the only query item is sizep, and all callers set it explicitly to choose whether or not to query it; they can then leave the rest of the struct uninitialized. However, as we add new query items, each caller will have to be updated to explicitly turn off the new ones (by setting them to NULL). Instead, let's teach each caller to zero-initialize the struct, so that they do not have to learn about each new query item added. Signed-off-by: Jeff King p...@peff.net --- sha1_file.c | 2 +- streaming.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/sha1_file.c b/sha1_file.c index 0af19c0..de06a97 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -2428,7 +2428,7 @@ int sha1_object_info(const unsigned char *sha1, unsigned long *sizep) int sha1_object_info(const unsigned char *sha1, unsigned long *sizep) { - struct object_info oi; + struct object_info oi = {0}; oi.sizep = sizep; return sha1_object_info_extended(sha1, oi); diff --git a/streaming.c b/streaming.c index cabcd9d..cac282f 100644 --- a/streaming.c +++ b/streaming.c @@ -135,7 +135,7 @@ struct git_istream *open_istream(const unsigned char *sha1, struct stream_filter *filter) { struct git_istream *st; - struct object_info oi; + struct object_info oi = {0}; const unsigned char *real = lookup_replace_object(sha1); enum input_source src = istream_source(real, type, oi); -- 1.8.3.rc3.24.gec82cb9 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 02/10] teach sha1_object_info_extended a disk_size query
Using sha1_object_info_extended, a caller can find out the type of an object, its size, and information about where it is stored. In addition to the object's true size, it can also be useful to know the size that the object takes on disk (e.g., to generate statistics about which refs consume space). This patch adds a disk_sizep field to struct object_info, and fills it in during sha1_object_info_extended if it is non-NULL. Signed-off-by: Jeff King p...@peff.net --- cache.h | 1 + sha1_file.c | 20 2 files changed, 17 insertions(+), 4 deletions(-) diff --git a/cache.h b/cache.h index dd0fb33..2d06169 100644 --- a/cache.h +++ b/cache.h @@ -1130,6 +1130,7 @@ struct object_info { struct object_info { /* Request */ unsigned long *sizep; + unsigned long *disk_sizep; /* Response */ enum { diff --git a/sha1_file.c b/sha1_file.c index de06a97..4c2365f 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -1697,7 +1697,8 @@ static int packed_object_info(struct packed_git *p, off_t obj_offset, #define POI_STACK_PREALLOC 64 static int packed_object_info(struct packed_git *p, off_t obj_offset, - unsigned long *sizep, int *rtype) + unsigned long *sizep, int *rtype, + unsigned long *disk_sizep) { struct pack_window *w_curs = NULL; unsigned long size; @@ -1731,6 +1732,11 @@ static int packed_object_info(struct packed_git *p, off_t obj_offset, } } + if (disk_sizep) { + struct revindex_entry *revidx = find_pack_revindex(p, obj_offset); + *disk_sizep = revidx[1].offset - obj_offset; + } + while (type == OBJ_OFS_DELTA || type == OBJ_REF_DELTA) { off_t base_offset; /* Push the object we're going to leave behind */ @@ -2357,7 +2363,8 @@ struct packed_git *find_sha1_pack(const unsigned char *sha1, } -static int sha1_loose_object_info(const unsigned char *sha1, unsigned long *sizep) +static int sha1_loose_object_info(const unsigned char *sha1, unsigned long *sizep, + unsigned long *disk_sizep) { int status; unsigned long mapsize, size; @@ -2368,6 +2375,8 @@ static int sha1_loose_object_info(const 
unsigned char *sha1, unsigned long *size map = map_sha1_file(sha1, mapsize); if (!map) return -1; + if (disk_sizep) + *disk_sizep = mapsize; if (unpack_sha1_header(stream, map, mapsize, hdr, sizeof(hdr)) 0) status = error(unable to unpack %s header, sha1_to_hex(sha1)); @@ -2391,13 +2400,15 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi) if (co) { if (oi-sizep) *(oi-sizep) = co-size; + if (oi-disk_sizep) + *(oi-disk_sizep) = 0; oi-whence = OI_CACHED; return co-type; } if (!find_pack_entry(sha1, e)) { /* Most likely it's a loose object. */ - status = sha1_loose_object_info(sha1, oi-sizep); + status = sha1_loose_object_info(sha1, oi-sizep, oi-disk_sizep); if (status = 0) { oi-whence = OI_LOOSE; return status; @@ -2409,7 +2420,8 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi) return status; } - status = packed_object_info(e.p, e.offset, oi-sizep, rtype); + status = packed_object_info(e.p, e.offset, oi-sizep, rtype, + oi-disk_sizep); if (status 0) { mark_bad_packed_object(e.p, sha1); status = sha1_object_info_extended(sha1, oi); -- 1.8.3.rc3.24.gec82cb9 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 03/10] t1006: modernize output comparisons
In modern tests, we typically put output into a file and compare it with test_cmp. This is nicer than just comparing via test, and much shorter than comparing via test and printing a custom message. Signed-off-by: Jeff King p...@peff.net --- I didn't do the whole file, just the ones of a particular style close to what I was touching. t/t1006-cat-file.sh | 61 - 1 file changed, 18 insertions(+), 43 deletions(-) diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh index 9cc5c6b..c2f2503 100755 --- a/t/t1006-cat-file.sh +++ b/t/t1006-cat-file.sh @@ -36,66 +36,41 @@ $content ' test_expect_success Type of $type is correct ' -test $type = $(git cat-file -t $sha1) + echo $type expect + git cat-file -t $sha1 actual + test_cmp expect actual ' test_expect_success Size of $type is correct ' -test $size = $(git cat-file -s $sha1) + echo $size expect + git cat-file -s $sha1 actual + test_cmp expect actual ' test -z $content || test_expect_success Content of $type is correct ' - expect=$(maybe_remove_timestamp $content $no_ts) - actual=$(maybe_remove_timestamp $(git cat-file $type $sha1) $no_ts) - -if test z$expect = z$actual - then - : happy - else - echo Oops: expected $expect - echo but got $actual - false -fi + maybe_remove_timestamp $content $no_ts expect + maybe_remove_timestamp $(git cat-file $type $sha1) $no_ts actual + test_cmp expect actual ' test_expect_success Pretty content of $type is correct ' - expect=$(maybe_remove_timestamp $pretty_content $no_ts) - actual=$(maybe_remove_timestamp $(git cat-file -p $sha1) $no_ts) -if test z$expect = z$actual - then - : happy - else - echo Oops: expected $expect - echo but got $actual - false -fi + maybe_remove_timestamp $pretty_content $no_ts expect + maybe_remove_timestamp $(git cat-file -p $sha1) $no_ts actual + test_cmp expect actual ' test -z $content || test_expect_success --batch output of $type is correct ' - expect=$(maybe_remove_timestamp $batch_output $no_ts) - actual=$(maybe_remove_timestamp $(echo $sha1 | git 
cat-file --batch) $no_ts) -if test z$expect = z$actual - then - : happy - else - echo Oops: expected $expect - echo but got $actual - false -fi + maybe_remove_timestamp $batch_output $no_ts expect + maybe_remove_timestamp $(echo $sha1 | git cat-file --batch) $no_ts actual + test_cmp expect actual ' test_expect_success --batch-check output of $type is correct ' - expect=$sha1 $type $size - actual=$(echo_without_newline $sha1 | git cat-file --batch-check) -if test z$expect = z$actual - then - : happy - else - echo Oops: expected $expect - echo but got $actual - false -fi + echo $sha1 $type $size expect + echo_without_newline $sha1 | git cat-file --batch-check actual + test_cmp expect actual ' } -- 1.8.3.rc3.24.gec82cb9 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 05/10] cat-file: refactor --batch option parsing
We currently use an int to tell us whether --batch parsing is on, and if so, whether we should print the full object contents. Let's instead factor this into a struct, filled in by callback, which will make further batch-related options easy to add. Signed-off-by: Jeff King p...@peff.net --- builtin/cat-file.c | 56 -- 1 file changed, 38 insertions(+), 18 deletions(-) diff --git a/builtin/cat-file.c b/builtin/cat-file.c index 70dd8c8..5254fe8 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -13,9 +13,6 @@ #include userdiff.h #include streaming.h -#define BATCH 1 -#define BATCH_CHECK 2 - static int cat_one_file(int opt, const char *exp_type, const char *obj_name) { unsigned char sha1[20]; @@ -142,7 +139,12 @@ static void print_object_or_die(int fd, const unsigned char *sha1, } } -static int batch_one_object(const char *obj_name, int print_contents) +struct batch_options { + int enabled; + int print_contents; +}; + +static int batch_one_object(const char *obj_name, struct batch_options *opt) { unsigned char sha1[20]; enum object_type type = 0; @@ -167,19 +169,19 @@ static int batch_objects(int print_contents) printf(%s %s %lu\n, sha1_to_hex(sha1), typename(type), size); fflush(stdout); - if (print_contents == BATCH) { + if (opt-print_contents) { print_object_or_die(1, sha1, type, size); write_or_die(1, \n, 1); } return 0; } -static int batch_objects(int print_contents) +static int batch_objects(struct batch_options *opt) { struct strbuf buf = STRBUF_INIT; while (strbuf_getline(buf, stdin, '\n') != EOF) { - int error = batch_one_object(buf.buf, print_contents); + int error = batch_one_object(buf.buf, opt); if (error) return error; } @@ -201,10 +203,28 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix) return git_default_config(var, value, cb); } +static int batch_option_callback(const struct option *opt, +const char *arg, +int unset) +{ + struct batch_options *bo = opt-value; + + if (unset) { + memset(bo, 0, sizeof(*bo)); + return 0; + } + 
+ bo-enabled = 1; + bo-print_contents = !strcmp(opt-long_name, batch); + + return 0; +} + int cmd_cat_file(int argc, const char **argv, const char *prefix) { - int opt = 0, batch = 0; + int opt = 0; const char *exp_type = NULL, *obj_name = NULL; + struct batch_options batch = {0}; const struct option options[] = { OPT_GROUP(N_(type can be one of: blob, tree, commit, tag)), @@ -215,12 +235,12 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix) OPT_SET_INT('p', NULL, opt, N_(pretty-print object's content), 'p'), OPT_SET_INT(0, textconv, opt, N_(for blob objects, run textconv on object's content), 'c'), - OPT_SET_INT(0, batch, batch, - N_(show info and content of objects fed from the standard input), - BATCH), - OPT_SET_INT(0, batch-check, batch, - N_(show info about objects fed from the standard input), - BATCH_CHECK), + { OPTION_CALLBACK, 0, batch, batch, NULL, + N_(show info and content of objects fed from the standard input), + PARSE_OPT_NOARG, batch_option_callback }, + { OPTION_CALLBACK, 0, batch-check, batch, NULL, + N_(show info about objects fed from the standard input), + PARSE_OPT_NOARG, batch_option_callback }, OPT_END() }; @@ -237,19 +257,19 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix) else usage_with_options(cat_file_usage, options); } - if (!opt !batch) { + if (!opt !batch.enabled) { if (argc == 2) { exp_type = argv[0]; obj_name = argv[1]; } else usage_with_options(cat_file_usage, options); } - if (batch (opt || argc)) { + if (batch.enabled (opt || argc)) { usage_with_options(cat_file_usage, options); } - if (batch) - return batch_objects(batch); + if (batch.enabled) + return batch_objects(batch); return cat_one_file(opt, exp_type, obj_name); } -- 1.8.3.rc3.24.gec82cb9 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http
[PATCH 04/10] cat-file: teach --batch to stream blob objects
The regular git cat-file -p and git cat-file blob code paths already learned to stream large blobs. Let's do the same here. Note that this means we look up the type and size before making a decision of whether to load the object into memory or stream (just like the -p code path does). That can lead to extra work, but it should be dwarfed by the cost of actually accessing the object itself. In my measurements, there was a 1-2% slowdown when using --batch on a large number of objects. Signed-off-by: Jeff King p...@peff.net --- builtin/cat-file.c | 41 - 1 file changed, 28 insertions(+), 13 deletions(-) diff --git a/builtin/cat-file.c b/builtin/cat-file.c index 045cee7..70dd8c8 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -117,12 +117,36 @@ static int batch_one_object(const char *obj_name, int print_contents) return 0; } +static void print_object_or_die(int fd, const unsigned char *sha1, + enum object_type type, unsigned long size) +{ + if (type == OBJ_BLOB) { + if (stream_blob_to_fd(fd, sha1, NULL, 0) 0) + die(unable to stream %s to stdout, sha1_to_hex(sha1)); + } + else { + enum object_type rtype; + unsigned long rsize; + void *contents; + + contents = read_sha1_file(sha1, rtype, rsize); + if (!contents) + die(object %s disappeared, sha1_to_hex(sha1)); + if (rtype != type) + die(object %s changed type!?, sha1_to_hex(sha1)); + if (rsize != size) + die(object %s change size!?, sha1_to_hex(sha1)); + + write_or_die(fd, contents, size); + free(contents); + } +} + static int batch_one_object(const char *obj_name, int print_contents) { unsigned char sha1[20]; enum object_type type = 0; unsigned long size; - void *contents = NULL; if (!obj_name) return 1; @@ -133,16 +157,10 @@ static int batch_one_object(const char *obj_name, int print_contents) return 0; } - if (print_contents == BATCH) - contents = read_sha1_file(sha1, type, size); - else - type = sha1_object_info(sha1, size); - + type = sha1_object_info(sha1, size); if (type = 0) { printf(%s missing\n, 
obj_name); fflush(stdout); - if (print_contents == BATCH) - free(contents); return 0; } @@ -150,12 +168,9 @@ static int batch_one_object(const char *obj_name, int print_contents) fflush(stdout); if (print_contents == BATCH) { - write_or_die(1, contents, size); - printf("\n"); - fflush(stdout); - free(contents); + print_object_or_die(1, sha1, type, size); + write_or_die(1, "\n", 1); } - return 0; } -- 1.8.3.rc3.24.gec82cb9
[PATCH 06/10] cat-file: add --batch-check=format
The `cat-file --batch-check` command can be used to quickly get information about a large number of objects. However, it provides a fixed set of information. This patch adds an optional format option to --batch-check to allow a caller to specify which items they are interested in, and in which order to output them. This is not very exciting for now, since we provide the same limited set that you could already get. However, it opens the door to adding new format items in the future without breaking backwards compatibility (or forcing callers to pay the cost to calculate uninteresting items). Since the --batch option shares code with --batch-check, it receives the same feature, though it is less likely to be of interest there. The format atom names are chosen to match their counterparts in for-each-ref. Though we do not (yet) share any code with for-each-ref's formatter, this keeps the interface as consistent as possible, and may help later on if the implementations are unified. Signed-off-by: Jeff King p...@peff.net --- If the 1% slowdown in the streaming blob patch is too much, the simplest thing would be to have this formatting apply only to --batch-check, and let --batch follow its own simpler code path. I doubt anybody really cares about having custom formats for --batch, as it is less about analysis and more about getting enough information to recreate the objects. Also note that there is no %(contents) atom that one could use to emulate --batch via --batch-check. I don't see much point, and it would mean we would not want to build the formatting on strbuf_expand (because we don't want to copy a large blob into the strbuf). We can add it later if somebody finds an actual use. 
Documentation/git-cat-file.txt | 55 - builtin/cat-file.c | 107 +++-- t/t1006-cat-file.sh| 6 +++ 3 files changed, 142 insertions(+), 26 deletions(-) diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt index 30d585a..dd5d6e4 100644 --- a/Documentation/git-cat-file.txt +++ b/Documentation/git-cat-file.txt @@ -58,12 +58,16 @@ OPTIONS to apply the filter to the content recorded in the index at path. --batch:: - Print the SHA-1, type, size, and contents of each object provided on - stdin. May not be combined with any other options or arguments. +--batch=format:: + Print object information and contents for each object provided + on stdin. May not be combined with any other options or arguments. + See the section `BATCH OUTPUT` below for details. --batch-check:: - Print the SHA-1, type, and size of each object provided on stdin. May not - be combined with any other options or arguments. +--batch-check=format:: + Print object information for each object provided on stdin. May + not be combined with any other options or arguments. See the + section `BATCH OUTPUT` below for details. OUTPUT -- @@ -78,23 +82,52 @@ each object specified on stdin: If type is specified, the raw (though uncompressed) contents of the object will be returned. -If '--batch' is specified, output of the following form is printed for each -object specified on stdin: +BATCH OUTPUT + + +If `--batch` or `--batch-check` is given, `cat-file` will read objects +from stdin, one per line, and print information about them. + +Each line is considered as a whole object name, and is parsed as if +given to linkgit:git-rev-parse[1]. + +You can specify the information shown for each object by using a custom +`format`. The `format` is copied literally to stdout for each +object, with placeholders of the form `%(atom)` expanded, followed by a +newline. The available atoms are: + +`objectname`:: + The sha1 hash of the object. 
+ +`objecttype`:: + The type of the object (the same as `cat-file -t` reports). + +`objectsize`:: + The size, in bytes, of the object (the same as `cat-file -s` + reports). + +If no format is specified, the default format is `%(objectname) +%(objecttype) %(objectsize)`. + +If `--batch` is specified, the object information is followed by the +object contents (consisting of `%(objectsize)` bytes), followed by a +newline. + +For example, `--batch` without a custom format would produce: sha1 SP type SP size LF contents LF -If '--batch-check' is specified, output of the following form is printed for -each object specified on stdin: +Whereas `--batch-check='%(objectname) %(objecttype)'` would produce: -sha1 SP type SP size LF +sha1 SP type LF -For both '--batch' and '--batch-check', output of the following form is printed -for each object specified on stdin that does not exist in the repository: +If a name is specified on stdin that cannot be resolved to an object in +the repository, then `cat-file` will ignore any custom format and print: object
[PATCH 07/10] cat-file: add %(objectsize:disk) format atom
This atom is just like %(objectsize), except that it shows the on-disk size of the object rather than the object's true size. In other words, it makes the disk_size query of sha1_object_info_extended available via the command-line. This can be used for rough attribution of disk usage to particular refs, though see the caveats in the documentation. This patch does not include any tests, as the exact numbers returned are volatile and subject to zlib and packing decisions. We cannot even reliably guarantee that the on-disk size is smaller than the object content (though in general this should be the case for non-trivial objects). Signed-off-by: Jeff King p...@peff.net --- Documentation/git-cat-file.txt | 18 ++ builtin/cat-file.c | 6 ++ 2 files changed, 24 insertions(+) diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt index dd5d6e4..06bdc43 100644 --- a/Documentation/git-cat-file.txt +++ b/Documentation/git-cat-file.txt @@ -106,6 +106,10 @@ newline. The available atoms are: The size, in bytes, of the object (the same as `cat-file -s` reports). +`objectsize:disk`:: + The size, in bytes, that the object takes up on disk. See the + note about on-disk sizes in the `CAVEATS` section below. + If no format is specified, the default format is `%(objectname) %(objecttype) %(objectsize)`. @@ -133,6 +137,20 @@ the repository, then `cat-file` will ignore any custom format and print: object SP missing LF + +CAVEATS +--- + +Note that the sizes of objects on disk are reported accurately, but care +should be taken in drawing conclusions about which refs or objects are +responsible for disk usage. The size of a packed non-delta object may be +much larger than the size of objects which delta against it, but the +choice of which object is the base and which is the delta is arbitrary +and is subject to change during a repack. 
Note also that multiple copies of an object may be present in the object database; in this case, it is undefined which copy's size will be reported. + + GIT --- Part of the linkgit:git[1] suite diff --git a/builtin/cat-file.c b/builtin/cat-file.c index b43a0c5..11fa8c0 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -118,6 +118,7 @@ struct expand_data { unsigned char sha1[20]; enum object_type type; unsigned long size; + unsigned long disk_size; /* * If mark_query is true, we do not expand anything, but rather */ @@ -155,6 +156,11 @@ static void expand_atom(struct strbuf *sb, const char *atom, int len, data->info.sizep = &data->size; else strbuf_addf(sb, "%lu", data->size); + } else if (is_atom("objectsize:disk", atom, len)) { + if (data->mark_query) + data->info.disk_sizep = &data->disk_size; + else + strbuf_addf(sb, "%lu", data->disk_size); + } else die("unknown format element: %.*s", len, atom); } -- 1.8.3.rc3.24.gec82cb9
[PATCH 09/10] pack-revindex: use unsigned to store number of objects
A packfile may have up to 2^32-1 objects in it, so the right data type to use is uint32_t. We currently use a signed int, which means that we may behave incorrectly for packfiles with more than 2^31-1 objects on 32-bit systems. Nobody has noticed because having 2^31 objects is pretty insane. The linux.git repo has on the order of 2^22 objects, which is hundreds of times smaller than necessary to trigger the bug. Let's bump this up to an unsigned. On 32-bit systems, this gives us the correct data-type, and on 64-bit systems, it is probably more efficient to use the native unsigned than a true uint32_t. While we're at it, we can fix the binary search not to overflow in such a case if our unsigned is 32 bits. Signed-off-by: Jeff King p...@peff.net --- I didn't look farther in the pack code to see if we have other problematic instances. So there may be others lurking, but these ones were close to the area I was working in. pack-revindex.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/pack-revindex.c b/pack-revindex.c index 77a0465..1aa9754 100644 --- a/pack-revindex.c +++ b/pack-revindex.c @@ -72,8 +72,8 @@ static void create_pack_revindex(struct pack_revindex *rix) static void create_pack_revindex(struct pack_revindex *rix) { struct packed_git *p = rix-p; - int num_ent = p-num_objects; - int i; + unsigned num_ent = p-num_objects; + unsigned i; const char *index = p-index_data; rix-revindex = xmalloc(sizeof(*rix-revindex) * (num_ent + 1)); @@ -114,7 +114,7 @@ struct revindex_entry *find_pack_revindex(struct packed_git *p, off_t ofs) struct revindex_entry *find_pack_revindex(struct packed_git *p, off_t ofs) { int num; - int lo, hi; + unsigned lo, hi; struct pack_revindex *rix; struct revindex_entry *revindex; @@ -132,7 +132,7 @@ struct revindex_entry *find_pack_revindex(struct packed_git *p, off_t ofs) lo = 0; hi = p-num_objects + 1; do { - int mi = (lo + hi) / 2; + unsigned mi = lo + (hi - lo) / 2; if (revindex[mi].offset == ofs) { return revindex + 
mi; } else if (ofs < revindex[mi].offset) -- 1.8.3.rc3.24.gec82cb9
[PATCH 10/10] pack-revindex: radix-sort the revindex
The pack revindex stores the offsets of the objects in the pack in sorted order, allowing us to easily find the on-disk size of each object. To compute it, we populate an array with the offsets from the sha1-sorted idx file, and then use qsort to order it by offsets. That does O(n log n) offset comparisons, and profiling shows that we spend most of our time in cmp_offset. However, since we are sorting on a simple off_t, we can use numeric sorts that perform better. A radix sort can run in O(k*n), where k is the number of digits in our number. For a 64-bit off_t, using 16-bit digits gives us k=4. On the linux.git repo, with about 3M objects to sort, this yields a 400% speedup. Here are the best-of-five numbers for running echo HEAD | git cat-file --batch-disk-size, which is dominated by time spent building the pack revindex: before after real0m0.834s 0m0.204s user0m0.788s 0m0.164s sys 0m0.040s 0m0.036s On a smaller repo, the radix sort would not be as impressive (and could even be worse), as we are trading the log(n) factor for the k=4 of the radix sort. However, even on git.git, with 173K objects, it shows some improvement: before after real0m0.046s 0m0.017s user0m0.036s 0m0.012s sys 0m0.008s 0m0.000s Signed-off-by: Jeff King p...@peff.net --- I changed a few things from the original, including: 1. We take an unsigned number of objects to match the fix in the last patch. 2. The 16-bit digit size is factored out to a single place, which avoids magic numbers and repeating ourselves. 3. The digits variable is renamed to bits, which is more accurate. 4. The outer loop condition uses the simpler while (max bits). 5. We use memcpy instead of an open-coded loop to copy the whole array at the end. The individual bucket-assignment is still done by struct assignment. I haven't timed if memcpy would make a difference there. 6. The 64K*sizeof(int) pos array is now heap-allocated, in case there are platforms with a small stack. 
I re-ran my timings to make sure none of the above impacted them; it turned out the same. pack-revindex.c | 84 + 1 file changed, 79 insertions(+), 5 deletions(-) diff --git a/pack-revindex.c b/pack-revindex.c index 1aa9754..9365bc2 100644 --- a/pack-revindex.c +++ b/pack-revindex.c @@ -59,11 +59,85 @@ static int cmp_offset(const void *a_, const void *b_) /* revindex elements are lazily initialized */ } -static int cmp_offset(const void *a_, const void *b_) +/* + * This is a least-significant-digit radix sort. + */ +static void sort_revindex(struct revindex_entry *entries, unsigned n, off_t max) { - const struct revindex_entry *a = a_; - const struct revindex_entry *b = b_; - return (a-offset b-offset) ? -1 : (a-offset b-offset) ? 1 : 0; + /* +* We use a digit size of 16 bits. That keeps our memory +* usage reasonable, and we can generally (for a 4G or smaller +* packfile) quit after two rounds of radix-sorting. +*/ +#define DIGIT_SIZE (16) +#define BUCKETS (1 DIGIT_SIZE) + /* +* We want to know the bucket that a[i] will go into when we are using +* the digit that is N bits from the (least significant) end. +*/ +#define BUCKET_FOR(a, i, bits) (((a)[(i)].offset (bits)) (BUCKETS-1)) + + /* +* We need O(n) temporary storage, so we sort back and forth between +* the real array and our tmp storage. To keep them straight, we always +* sort from a into buckets in b. +*/ + struct revindex_entry *tmp = xcalloc(n, sizeof(*tmp)); + struct revindex_entry *a = entries, *b = tmp; + int bits = 0; + unsigned *pos = xmalloc(BUCKETS * sizeof(*pos)); + + while (max bits) { + struct revindex_entry *swap; + int i; + + memset(pos, 0, BUCKETS * sizeof(*pos)); + + /* +* We want pos[i] to store the index of the last element that +* will go in bucket i (actually one past the last element). +* To do this, we first count the items that will go in each +* bucket, which gives us a relative offset from the last +* bucket. 
We can then cumulatively add the index from the +* previous bucket to get the true index. +*/ + for (i = 0; i n; i++) + pos[BUCKET_FOR(a, i, bits)]++; + for (i = 1; i BUCKETS; i++) + pos[i] += pos[i-1]; + + /* +* Now we can drop the elements into their correct buckets (in +* our temporary array). We iterate the pos counter backwards +* to avoid using an extra index to count up. And since we are +* going backwards there, we must also go
Re: [PATCH 10/10] pack-revindex: radix-sort the revindex
On Wed, Jul 10, 2013 at 07:55:57AM -0400, Jeff King wrote: 5. We use memcpy instead of an open-coded loop to copy the whole array at the end. The individual bucket-assignment is still done by struct assignment. I haven't timed if memcpy would make a difference there. I just timed this, and I can't measure any difference. I think the struct assignment is the more readable option, and I do not think any compilers should have trouble with it. But if they do, we can switch it for a memcpy. -Peff
Re: [PATCH 10/10] pack-revindex: radix-sort the revindex
On Wed, Jul 10, 2013 at 06:47:49PM +0530, Ramkumar Ramachandra wrote: For a 64-bit off_t, using 16-bit digits gives us k=4. Wait, isn't off_t a signed data type? Did you account for that in your algorithm? It is signed, but the values we are storing in the revindex are all positive file offsets. Right-shifting a positive signed type is explicitly allowed in C. -static int cmp_offset(const void *a_, const void *b_) +/* + * This is a least-significant-digit radix sort. + */ Any particular reason for choosing LSD, and not MSD? Simplicity. An MSD implementation should have the same algorithmic complexity and in theory, one can do MSD in-place. I'm happy enough with the speedup here, but if you want to take a stab at beating my times with MSD, please feel free. The other usual downside of MSD is that it is typically not stable, but we don't care about that here. We know that our sort keys are unique. +#define DIGIT_SIZE (16) +#define BUCKETS (1 DIGIT_SIZE) Okay, NUMBER_OF_BUCKETS = 2^RADIX, and you choose a hex radix. Is off_t guaranteed to be fixed-length though? I thought only the ones in stdint.h were guaranteed to be fixed-length? I'm not sure what you mean by fixed-length. If you mean does it have the same size on every platform, then no. It will typically be 32-bit on platforms without largefile support, and 64-bit elsewhere. But it shouldn't matter. We'll first sort the entries by the lower 16 bits, and then if we have more bits, by the next 16 bits, and so on. We quit when the maximum value to sort (which we know ahead of time from the size of the packfile) is smaller than the 16-bits we are on. So we don't need to know the exact size of off_t, only the maximum value in our list (which must, by definition, be smaller than what can be represented by off_t). + /* +* We want to know the bucket that a[i] will go into when we are using +* the digit that is N bits from the (least significant) end. 
+*/ +#define BUCKET_FOR(a, i, bits) (((a)[(i)].offset >> (bits)) & (BUCKETS-1)) Ouch! This is unreadable. Just write an inline function instead? A % would've been easier on the eyes, but you chose base-16. I specifically avoided an inline function because they are subject to compiler settings. This isn't just "it would be a bit faster if this got inlined, and OK otherwise" but "this would be horribly slow if not inlined". I'm also not sure that static inline unsigned bucket_for(const struct revindex *a, unsigned i, unsigned bits) { return a[i].offset >> bits & (BUCKETS-1); } is actually any more readable. I'm not sure what you mean by base-16. No matter the radix digit size, as long as it is an integral number of bits, we can mask it off, which is more efficient than modulo. A good compiler should see that it is a constant and convert it to a bit-mask, but I'm not sure I agree that modular arithmetic is more readable. This is fundamentally a bit-twiddling operation, as we are shifting and masking. I tried to explain it in the comment; suggestions to improve that are welcome. + /* +* We need O(n) temporary storage, so we sort back and forth between +* the real array and our tmp storage. To keep them straight, we always +* sort from a into buckets in b. +*/ + struct revindex_entry *tmp = xcalloc(n, sizeof(*tmp)); Shouldn't this be sizeof (struct revindex_entry), since tmp hasn't been declared yet? No, the variable is declared (but uninitialized) in its initializer. Despite its syntax, sizeof() is not a function and does not care about the state of the variable, only its type. Also, s/n/revindex_nr/, and something more appropriate for tmp? What name would you suggest would be more appropriate for tmp? + int bits = 0; + unsigned *pos = xmalloc(BUCKETS * sizeof(*pos)); sizeof(unsigned int), for clarity, if not anything else.
I disagree; in general, I prefer using sizeof(*var) rather than sizeof(type), because it avoids repeating ourselves, and there is no compile-time check that you have gotten it right. In the initializer it is less important, because the type is right there. But when you are later doing: memset(pos, 0, BUCKETS * sizeof(*pos)); this is much more robust. If somebody changes the type of pos, the memset line does not need changed. If you used sizeof(unsigned), then the code is now buggy (and the compiler cannot notice). You picked malloc over calloc here, because you didn't want to incur the extra cost of zero-initializing the memory? Yes. We have to zero-initialize in each loop, so there is no point spending the extra effort on calloc. We could also xcalloc inside each loop iteration, but since we need the same-size allocation each time, I hoisted the malloc out of the loop. Also, pos is the actual buckets array, I presume (hence unsigned,
Re: [PATCH 10/10] pack-revindex: radix-sort the revindex
On Wed, Jul 10, 2013 at 10:10:16AM -0700, Brandon Casey wrote: On the linux.git repo, with about 3M objects to sort, this yields a 400% speedup. Here are the best-of-five numbers for running echo HEAD | git cat-file --batch-disk-size, which is dominated by time spent building the pack revindex: before after real0m0.834s 0m0.204s user0m0.788s 0m0.164s sys 0m0.040s 0m0.036s On a smaller repo, the radix sort would not be as impressive (and could even be worse), as we are trading the log(n) factor for the k=4 of the radix sort. However, even on git.git, with 173K objects, it shows some improvement: before after real0m0.046s 0m0.017s user0m0.036s 0m0.012s sys 0m0.008s 0m0.000s k should only be 2 for git.git. I haven't packed in a while, but I think it should all fit within 4G. :) The pathological case would be a pack file with very few very very large objects, large enough to push the pack size over the 2^48 threshold so we'd have to do all four radixes. Yeah, even linux.git fits into k=2. And that does more or less explain the numbers in both cases. For git.git, With 173K objects, log(n) is ~18, so regular sort is 18n. With a radix sort of k=2, which has a constant factor of 2 (you can see by looking at the code that we go through the list twice per radix), we have 4n. So there should be a 4.5x speedup. We don't quite get that, which is probably due to the extra bookkeeping on the buckets. For linux.git, with 3M objects, log(n) is ~22, so the speedup we hope for is 5.5x. We end up with 4x. It's probably worth mentioning here and/or in the code that k is dependent on the pack file size and that we can jump out early for small pack files. That's my favorite part of this code by the way. :) Yeah, I agree it is probably worth mentioning along with the numbers; it is where half of our speedup is coming from. I think the max bits loop condition deserves to be commented, too. I'll add that. 
Also note that my commit message still refers to --batch-disk-size which does not exist anymore. :) I didn't update the timings in the commit message for my re-roll, but I did confirm that they are the same. + /* +* We need O(n) temporary storage, so we sort back and forth between +* the real array and our tmp storage. To keep them straight, we always +* sort from a into buckets in b. +*/ + struct revindex_entry *tmp = xcalloc(n, sizeof(*tmp)); Didn't notice it the first time I read this, but do we really need calloc here? Or will malloc do? No, a malloc should be fine. I doubt it matters much, but there's no reason not to go the cheap route. + struct revindex_entry *a = entries, *b = tmp; + int bits = 0; + unsigned *pos = xmalloc(BUCKETS * sizeof(*pos)); + + while (max >> bits) { + struct revindex_entry *swap; + int i; You forgot to make i unsigned. See below too... Oops. Thanks for catching. + /* +* Now we can drop the elements into their correct buckets (in +* our temporary array). We iterate the pos counter backwards +* to avoid using an extra index to count up. And since we are +* going backwards there, we must also go backwards through the +* array itself, to keep the sort stable. +*/ + for (i = n - 1; i >= 0; i--) + b[--pos[BUCKET_FOR(a, i, bits)]] = a[i]; ...which is why the above loop still works. Since we are iterating by ones, I guess I can just compare to UINT_MAX. -Peff
Re: [PATCH 06/10] cat-file: add --batch-check=format
On Wed, Jul 10, 2013 at 08:21:15PM +0530, Ramkumar Ramachandra wrote: Jeff King wrote: +If `--batch` or `--batch-check` is given, `cat-file` will read objects +from stdin, one per line, and print information about them. + +You can specify the information shown for each object by using a custom +`format`. The `format` is copied literally to stdout for each +object, with placeholders of the form `%(atom)` expanded, followed by a +newline. The available atoms are: + +If no format is specified, the default format is `%(objectname) +%(objecttype) %(objectsize)`. + +If `--batch` is specified, the object information is followed by the +object contents (consisting of `%(objectsize)` bytes), followed by a +newline. I find this slightly hideous, and would have expected an %(objectcontents) or similar. I looked into doing that, but it makes the code significantly more complicated, assuming you do not want to copy the full object contents in memory. You cannot use strbuf_expand, and you need to worry about buffering/flushing more (you do not want to write() each individual item, but if you are using printf(), you need to flush before using the unbuffered streaming interface). My thinking was to leave it until somebody actually wants it, at which point they can do the necessary refactoring (and hopefully this would be part of unifying it with other format-parsers). If we were designing from scratch and this was the difference between having --batch-check and --batch, or having a single --batch, I'd care more about doing %(objectcontents) right away. But because we must support the historical --batch/--batch-check distinction anyway, I don't think this is any worse. -Peff
[PATCHv3 10/10] pack-revindex: radix-sort the revindex
it by offsets. That does O(n log n) offset comparisons, and profiling shows that we spend most of our time in cmp_offset. However, since we are sorting on a simple off_t, we can use numeric sorts that perform better. A radix sort can run in O(k*n), where k is the number of digits in our number. For a 64-bit off_t, using 16-bit digits gives us k=4. On the linux.git repo, with about 3M objects to sort, this yields a 400% speedup. Here are the best-of-five numbers for running echo HEAD | git cat-file --batch-check='%(objectsize:disk)' on a fully packed repository, which is dominated by time spent building the pack revindex:

        before     after
real    0m0.834s   0m0.204s
user    0m0.788s   0m0.164s
sys     0m0.040s   0m0.036s

This matches our algorithmic expectations. log(3M) is ~21.5, so a traditional sort is ~21.5n. Our radix sort runs in k*n, where k is the number of radix digits. In the worst case, this is k=4 for a 64-bit off_t, but we can quit early when the largest value to be sorted is smaller. For any repository under 4G, k=2. Our algorithm makes two passes over the list per radix digit, so we end up with 4n. That should yield ~5.3x speedup. We see 4x here; the difference is probably due to the extra bucket book-keeping the radix sort has to do. On a smaller repo, the difference is less impressive, as log(n) is smaller. For git.git, with 173K objects (but still k=2), we see a 2.7x improvement:

        before     after
real    0m0.046s   0m0.017s
user    0m0.036s   0m0.012s
sys     0m0.008s   0m0.000s

On even tinier repos (e.g., a few hundred objects), the speedup goes away entirely, as the small advantage of the radix sort gets erased by the book-keeping costs (and at those sizes, the cost to generate the rev-index gets lost in the noise anyway).
Signed-off-by: Jeff King <p...@peff.net>
---
 pack-revindex.c | 100 +---
 1 file changed, 95 insertions(+), 5 deletions(-)

diff --git a/pack-revindex.c b/pack-revindex.c
index 1aa9754..b4d2b35 100644
--- a/pack-revindex.c
+++ b/pack-revindex.c
@@ -59,11 +59,101 @@ static int cmp_offset(const void *a_, const void *b_)
 	/* revindex elements are lazily initialized */
 }
 
-static int cmp_offset(const void *a_, const void *b_)
+/*
+ * This is a least-significant-digit radix sort.
+ *
+ * It sorts each of the "n" items in "entries" by its offset field. The "max"
+ * parameter must be at least as large as the largest offset in the array,
+ * and lets us quit the sort early.
+ */
+static void sort_revindex(struct revindex_entry *entries, unsigned n, off_t max)
 {
-	const struct revindex_entry *a = a_;
-	const struct revindex_entry *b = b_;
-	return (a->offset < b->offset) ? -1 : (a->offset > b->offset) ? 1 : 0;
+	/*
+	 * We use a "digit" size of 16 bits. That keeps our memory
+	 * usage reasonable, and we can generally (for a 4G or smaller
+	 * packfile) quit after two rounds of radix-sorting.
+	 */
+#define DIGIT_SIZE (16)
+#define BUCKETS (1 << DIGIT_SIZE)
+	/*
+	 * We want to know the bucket that a[i] will go into when we are using
+	 * the digit that is N bits from the (least significant) end.
+	 */
+#define BUCKET_FOR(a, i, bits) (((a)[(i)].offset >> (bits)) & (BUCKETS-1))
+
+	/*
+	 * We need O(n) temporary storage. Rather than do an extra copy of the
+	 * partial results into "entries", we sort back and forth between the
+	 * real array and temporary storage. In each iteration of the loop, we
+	 * keep track of them with alias pointers, always sorting from "from"
+	 * to "to".
+	 */
+	struct revindex_entry *tmp = xmalloc(n * sizeof(*tmp));
+	struct revindex_entry *from = entries, *to = tmp;
+	int bits;
+	unsigned *pos = xmalloc(BUCKETS * sizeof(*pos));
+
+	/*
+	 * If (max >> bits) is zero, then we know that the radix digit we are
+	 * on (and any higher) will be zero for all entries, and our loop will
+	 * be a no-op, as everybody lands in the same zero-th bucket.
+	 */
+	for (bits = 0; max >> bits; bits += DIGIT_SIZE) {
+		struct revindex_entry *swap;
+		unsigned i;
+
+		memset(pos, 0, BUCKETS * sizeof(*pos));
+
+		/*
+		 * We want pos[i] to store the index of the last element that
+		 * will go in bucket "i" (actually one past the last element).
+		 * To do this, we first count the items that will go in each
+		 * bucket, which gives us a relative offset from the last
+		 * bucket. We can then cumulatively add the index from the
+		 * previous bucket to get the true index.
+		 */
+		for (i = 0; i < n; i++)
+			pos[BUCKET_FOR(from, i, bits)]++;
+		for (i = 1; i < BUCKETS; i++)
+			pos[i] += pos[i-1
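The sort in the patch can be exercised as a standalone sketch. The names here (`struct entry`, `sort_entries`) are illustrative stand-ins for git's revindex types, not the real ones, and the explicit `bits < 64` guard is my addition to keep the shift well-defined for keys that use the top digit; otherwise the structure mirrors the patch: count bucket occupancy, turn the counts into one-past-the-end positions, place entries walking backwards for stability, and ping-pong between the two arrays.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative stand-ins for git's revindex entries. */
struct entry {
	uint64_t offset; /* sort key, like revindex_entry.offset */
	unsigned nr;     /* payload, like revindex_entry.nr */
};

#define DIGIT_BITS 16
#define NBUCKETS (1 << DIGIT_BITS)

static void sort_entries(struct entry *entries, size_t n, uint64_t max)
{
	struct entry *tmp = malloc(n * sizeof(*tmp));
	size_t *pos = malloc(NBUCKETS * sizeof(*pos));
	struct entry *from = entries, *to = tmp;
	unsigned bits;

	/* Stop as soon as all remaining digits of the largest key are zero. */
	for (bits = 0; bits < 64 && (max >> bits); bits += DIGIT_BITS) {
		struct entry *swap;
		size_t i;

		/* Count how many keys land in each 16-bit bucket... */
		memset(pos, 0, NBUCKETS * sizeof(*pos));
		for (i = 0; i < n; i++)
			pos[(from[i].offset >> bits) & (NBUCKETS - 1)]++;
		/* ...and turn the counts into one-past-the-end positions. */
		for (i = 1; i < NBUCKETS; i++)
			pos[i] += pos[i - 1];
		/* Walk backwards so equal keys keep their order (stable). */
		for (i = n; i-- > 0; )
			to[--pos[(from[i].offset >> bits) & (NBUCKETS - 1)]] = from[i];

		swap = from;
		from = to;
		to = swap;
	}

	/* An odd number of passes leaves the result in tmp; copy it back. */
	if (from != entries)
		memcpy(entries, from, n * sizeof(*entries));

	free(tmp);
	free(pos);
}
```

With max under 4G only two of the four possible passes run, which is where the k=2 figure in the commit message comes from.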
Re: t0008 hang on streaming test (OS X)
On Wed, Jul 10, 2013 at 12:36:40PM -0400, Brian Gernhardt wrote:

> The newest test in t0008, "streaming support for --stdin", seems to
> hang sporadically on my MacBook Pro (running 10.8.4). The hang seems
> to be related to running it in parallel with other tests, as I can
> only reliably cause it by running with prove and -j 3. However, once
> that has hung I am able to semi-reliably have it occur by running the
> test separately (with the test hung in the background, using separate
> trash directories via the --root option).

I can't replicate the hang here (on Linux) doing:

  for i in `seq 1 30`; do
    ./t0008-ignores.sh --root=/tmp/foo/$i
  done

Do you know which test it is hanging on? You mentioned that you can replicate it outside of prove; what does running with -v say?

The last test in t0008, with the fifos, would make me the most suspicious. The way we do it _should_ be fine, but I'm wondering if the shell is blocking in exec here:

  mkfifo in out &&
  (git check-ignore -n -v --stdin <in >out &) &&
  exec 9>in

That is, if the fifo is not opened for some reason by the backgrounded process (there's a race, of course, but the outer shell should just block until the sub-shell actually opens it).

I wonder if the descriptor-opening behavior of:

  cmd <in >out &

is different between shells (that is, if it backgrounds the opening of "in" and "out" on some shells, but not on others). But then I would expect it to fail consistently. Just for fun, does switching the middle line there to:

  (sh -c 'git check-ignore -n -v --stdin' <in >out &)

have any effect?

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCHv3 08/10] cat-file: split --batch input lines on whitespace
On Thu, Jul 11, 2013 at 07:36:53AM -0400, Jeff King wrote:

> On Wed, Jul 10, 2013 at 08:59:51PM +0530, Ramkumar Ramachandra wrote:
>
> > Jeff King wrote:
> > >   git rev-list --objects HEAD |
> > >   git cat-file --batch-check='%(objectsize) %(text)'
> >
> > If anything, I would have expected %(rest), not %(text). This atom
> > is specific to commands that accept input via stdin (i.e. not log,
> > f-e-r, branch, or anything else I can think of).
>
> I considered %(rest) as well. I don't have a strong preference.

Here is the patch re-rolled with s/text/rest/. Junio, that makes this and 10/10 the only v3 updates. Let me know if it would be simpler to just resend the whole series.

-- >8 --
Subject: [PATCH] cat-file: split --batch input lines on whitespace

If we get an input line to --batch or --batch-check that looks like "HEAD foo bar", we will currently feed the whole thing to get_sha1(). This means that to use --batch-check with `rev-list --objects`, one must pre-process the input, like:

  git rev-list --objects HEAD |
  cut -d' ' -f1 |
  git cat-file --batch-check

Besides being more typing and slightly less efficient to invoke `cut`, the result loses information: we no longer know which path each object was found at.

This patch teaches cat-file to split input lines at the first whitespace. Everything to the left of the whitespace is considered an object name, and everything to the right is made available as the %(rest) atom. So you can now do:

  git rev-list --objects HEAD |
  git cat-file --batch-check='%(objectsize) %(rest)'

to collect object sizes at particular paths.

Even if %(rest) is not used, we always do the whitespace split (which means you can simply eliminate the `cut` command from the first example above). This whitespace split is backwards compatible for any reasonable input. Object names cannot contain spaces, so any input with spaces would have resulted in a "missing" line. The only input hurt is if somebody really expected input of the form "HEAD is a fine-looking ref!"
to fail; it will now parse HEAD, and make "is a fine-looking ref!" available as %(rest).

Signed-off-by: Jeff King <p...@peff.net>
---
 Documentation/git-cat-file.txt | 10 --
 builtin/cat-file.c             | 20 +++-
 t/t1006-cat-file.sh            |  7 +++
 3 files changed, 34 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index 06bdc43..68691d4 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -88,8 +88,10 @@ from stdin, one per line, and print information about them.
 If `--batch` or `--batch-check` is given, `cat-file` will read objects
 from stdin, one per line, and print information about them.
 
-Each line is considered as a whole object name, and is parsed as if
-given to linkgit:git-rev-parse[1].
+Each line is split at the first whitespace boundary. All characters
+before that whitespace are considered as a whole object name, and are
+parsed as if given to linkgit:git-rev-parse[1]. Characters after that
+whitespace can be accessed using the `%(rest)` atom (see below).
 
 You can specify the information shown for each object by using a custom
 `<format>`. The `<format>` is copied literally to stdout for each
@@ -110,6 +112,10 @@ newline. The available atoms are:
 	The size, in bytes, that the object takes up on disk. See the
 	note about on-disk sizes in the `CAVEATS` section below.
 
+`rest`::
+	The text (if any) found after the first run of whitespace on the
+	input line (i.e., the rest of the line).
+
 If no format is specified, the default format is `%(objectname)
 %(objecttype) %(objectsize)`.

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 11fa8c0..0e64b41 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -119,6 +119,7 @@ struct expand_data {
 	enum object_type type;
 	unsigned long size;
 	unsigned long disk_size;
+	const char *rest;
 
 	/*
 	 * If mark_query is true, we do not expand anything, but rather
@@ -161,6 +162,9 @@ static void expand_atom(struct strbuf *sb, const char *atom, int len,
 			data->info.disk_sizep = &data->disk_size;
 		else
 			strbuf_addf(sb, "%lu", data->disk_size);
+	} else if (is_atom("rest", atom, len)) {
+		if (!data->mark_query && data->rest)
+			strbuf_addstr(sb, data->rest);
 	} else
 		die("unknown format element: %.*s", len, atom);
 }
@@ -263,7 +267,21 @@ static int batch_objects(struct batch_options *opt)
 	data.mark_query = 0;
 
 	while (strbuf_getline(&buf, stdin, '\n') != EOF) {
-		int error = batch_one_object(buf.buf, opt, &data);
+		char *p;
+		int error;
+
+		/*
+		 * Split at first whitespace, tying off the beginning of the
+		 * string and saving the remainder (or NULL) in data.rest
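The splitting rule the commit message describes can be sketched in isolation. `split_rest` below is a hypothetical helper, not git's actual code (which does this inline in batch_objects using strpbrk), but it implements the same contract: NUL-terminate the object name at the first run of whitespace and return what follows, or NULL when there is none.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Terminate the object name at the first run of spaces/tabs and return
 * a pointer to the remainder (the %(rest) text), or NULL if the line
 * contained no whitespace at all.
 */
static char *split_rest(char *line)
{
	char *p = strpbrk(line, " \t");

	if (!p)
		return NULL;
	while (*p == ' ' || *p == '\t')
		*p++ = '\0';
	return p;
}
```

Feeding it the commit message's example, "HEAD is a fine-looking ref!" parses as the name "HEAD" with "is a fine-looking ref!" as the rest, while a bare sha1 line is left untouched.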
[PATCH 0/7] cat-file --batch-check performance improvements
In my earlier series introducing "git cat-file --batch-check=<format>", found here:

  http://thread.gmane.org/gmane.comp.version-control.git/229761/focus=230041

I spent a little time optimizing revindex generation, and measured by requesting information on a single object from a large repository. This series takes the next logical step: requesting a large number of objects from a large repository.

There are two major optimizations here:

  1. Avoiding extra ref lookups due to the warning in 798c35f (get_sha1:
     warn about full or short object names that look like refs,
     2013-05-29).

  2. Avoiding extra work for delta type resolution when the user has not
     asked for %(objecttype).

I prepared the series on top of jk/in-pack-size-measurement, and certainly optimization 2 is pointless without it (before that topic, --batch-check always printed the type). However, the first optimization affects regular --batch-check, and represents a much more serious performance regression.

It looks like 798c35f is in master, but hasn't been released yet, so assuming these topics graduate before the next release, it should be OK. But if not, we should consider pulling the first patch out and applying it (or something like it) separately.

The results for running (in linux.git):

  $ git rev-list --objects --all >objects
  $ git cat-file --batch-check='%(objectsize:disk)' <objects >/dev/null

are:

          before      after
  real    1m17.143s   0m7.205s
  user    0m27.684s   0m6.580s
  sys     0m49.320s   0m0.608s

Now, _most_ of that speedup is coming from the first patch, and it's quite trivial. The rest of the patches involve a lot of refactoring, and only manage to eke out one more second of performance, so it may not be worth it (though I think the result actually cleans up the sha1_object_info_extended interface a bit, and is worth it). Individual timings are in the commit messages.

The patches are:

  [1/7]: cat-file: disable object/refname ambiguity check for batch mode

    Optimization 1.

  [2/7]: sha1_object_info_extended: rename status to type
  [3/7]: sha1_loose_object_info: make type lookup optional
  [4/7]: packed_object_info: hoist delta type resolution to helper
  [5/7]: packed_object_info: make type lookup optional
  [6/7]: sha1_object_info_extended: make type calculation optional

    Optimization 2.

  [7/7]: sha1_object_info_extended: pass object_info to helpers

    Optional cleanup.

-Peff
[PATCH 1/7] cat-file: disable object/refname ambiguity check for batch mode
A common use of cat-file --batch-check is to feed a list of objects from rev-list --objects or a similar command. In this instance, all of our input objects are 40-byte sha1 ids. However, cat-file has always allowed arbitrary revision specifiers, and feeds the result to get_sha1().

Fortunately, get_sha1() recognizes a 40-byte sha1 before doing any hard work trying to look up refs, meaning this scenario should end up spending very little time converting the input into an object sha1. However, since 798c35f (get_sha1: warn about full or short object names that look like refs, 2013-05-29), when we encounter this case, we spend the extra effort to do a refname lookup anyway, just to print a warning. This is further exacerbated by ca91993 (get_packed_ref_cache: reload packed-refs file when it changes, 2013-06-20), which makes individual ref lookup more expensive by requiring a stat() of the packed-refs file for each missing ref.

With no patches, this is the time it takes to run:

  $ git rev-list --objects --all >objects
  $ time git cat-file --batch-check='%(objectname)' <objects

on the linux.git repository:

  real    1m13.494s
  user    0m25.924s
  sys     0m47.532s

If we revert ca91993, the packed-refs up-to-date check, it gets a little better:

  real    0m54.697s
  user    0m21.692s
  sys     0m32.916s

but we are still spending quite a bit of time on ref lookup (and we would not want to revert that patch, anyway, which has correctness issues). If we revert 798c35f, disabling the warning entirely, we get a much more reasonable time:

  real    0m7.452s
  user    0m6.836s
  sys     0m0.608s

This patch does the moral equivalent of this final case (and gets similar speedups). We introduce a global flag that callers of get_sha1() can use to avoid paying the price for the warning.

Signed-off-by: Jeff King <p...@peff.net>
---
The solution feels a little hacky, but I'm not sure there is a better one short of reverting the warning entirely.

We could also tie it to warn_ambiguous_refs (or add another config variable), but I don't think that makes sense. It is not about "do I care about ambiguities in this repository", but rather "I am going to do a really large number of sha1 resolutions, and I do not want to pay the price in this particular code path for the warning".

There may be other sites that resolve a large number of refs and run into this, but I couldn't think of any. Doing for_each_ref would not have the same problem, as we already know they are refs there.

 builtin/cat-file.c |  9 +
 cache.h            |  1 +
 environment.c      |  1 +
 sha1_name.c        | 14 --
 4 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 0e64b41..fe5c77f 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -266,6 +266,15 @@ static int batch_objects(struct batch_options *opt)
 	strbuf_expand(&buf, opt->format, expand_format, &data);
 	data.mark_query = 0;
 
+	/*
+	 * We are going to call get_sha1 on a potentially very large number of
+	 * objects. In most large cases, these will be actual object sha1s. The
+	 * cost to double-check that each one is not also a ref (just so we can
+	 * warn) ends up dwarfing the actual cost of the object lookups
+	 * themselves. We can work around it by just turning off the warning.
+	 */
+	warn_on_object_refname_ambiguity = 0;
+
 	while (strbuf_getline(&buf, stdin, '\n') != EOF) {
 		char *p;
 		int error;

diff --git a/cache.h b/cache.h
index 2d06169..c1fd82c 100644
--- a/cache.h
+++ b/cache.h
@@ -575,6 +575,7 @@ extern int warn_ambiguous_refs;
 extern int prefer_symlink_refs;
 extern int log_all_ref_updates;
 extern int warn_ambiguous_refs;
+extern int warn_on_object_refname_ambiguity;
 extern int shared_repository;
 extern const char *apply_default_whitespace;
 extern const char *apply_default_ignorewhitespace;

diff --git a/environment.c b/environment.c
index 0cb67b2..5398c36 100644
--- a/environment.c
+++ b/environment.c
@@ -22,6 +22,7 @@ int warn_ambiguous_refs = 1;
 int is_bare_repository_cfg = -1; /* unspecified */
 int log_all_ref_updates = -1; /* unspecified */
 int warn_ambiguous_refs = 1;
+int warn_on_object_refname_ambiguity = 1;
 int repository_format_version;
 const char *git_commit_encoding;
 const char *git_log_output_encoding;

diff --git a/sha1_name.c b/sha1_name.c
index 90419ef..6f8812a 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -452,13 +452,15 @@ static int get_sha1_basic(const char *str, int len, unsigned char *sha1)
 	int at, reflog_len, nth_prior = 0;
 
 	if (len == 40 && !get_sha1_hex(str, sha1)) {
-		refs_found = dwim_ref(str, len, tmp_sha1, &real_ref);
-		if (refs_found > 0 && warn_ambiguous_refs) {
-			warning(warn_msg, len, str);
-			if (advice_object_name_warning)
-				fprintf(stderr, "%s\n
[PATCH 2/7] sha1_object_info_extended: rename status to type
The value we get from each low-level object_info function (e.g., loose, packed) is actually the object type (or -1 for error). Let's explicitly call it "type", which will make further refactorings easier to read.

Signed-off-by: Jeff King <p...@peff.net>
---
 sha1_file.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 4c2365f..e826066 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2394,7 +2394,7 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi)
 {
 	struct cached_object *co;
 	struct pack_entry e;
-	int status, rtype;
+	int type, rtype;
 
 	co = find_cached_object(sha1);
 	if (co) {
@@ -2408,23 +2408,23 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi)
 
 	if (!find_pack_entry(sha1, &e)) {
 		/* Most likely it's a loose object. */
-		status = sha1_loose_object_info(sha1, oi->sizep, oi->disk_sizep);
-		if (status >= 0) {
+		type = sha1_loose_object_info(sha1, oi->sizep, oi->disk_sizep);
+		if (type >= 0) {
 			oi->whence = OI_LOOSE;
-			return status;
+			return type;
 		}
 
 		/* Not a loose object; someone else may have just packed it.
		 */
 		reprepare_packed_git();
 		if (!find_pack_entry(sha1, &e))
-			return status;
+			return type;
 	}
 
-	status = packed_object_info(e.p, e.offset, oi->sizep, &rtype,
-				    oi->disk_sizep);
-	if (status < 0) {
+	type = packed_object_info(e.p, e.offset, oi->sizep, &rtype,
+				  oi->disk_sizep);
+	if (type < 0) {
 		mark_bad_packed_object(e.p, sha1);
-		status = sha1_object_info_extended(sha1, oi);
+		type = sha1_object_info_extended(sha1, oi);
 	} else if (in_delta_base_cache(e.p, e.offset)) {
 		oi->whence = OI_DBCACHED;
 	} else {
@@ -2435,7 +2435,7 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi)
 			rtype == OBJ_OFS_DELTA);
 	}
 
-	return status;
+	return type;
 }
 
 int sha1_object_info(const unsigned char *sha1, unsigned long *sizep)
--
1.8.3.rc3.24.gec82cb9
[PATCH 3/7] sha1_loose_object_info: make type lookup optional
Until recently, the only items to request from sha1_object_info_extended were type and size. This meant that we always had to open a loose object file to determine one or the other. But with the addition of the disk_size query, it's possible that we can fulfill the query without even opening the object file at all. However, since the function interface always returns the type, we have no way of knowing whether the caller cares about it or not.

This patch only modifies sha1_loose_object_info to make type lookup optional using an out-parameter, similar to the way the size is handled (and the return value is 0 or -1 for success or error, respectively). There should be no functional change yet, though, as sha1_object_info_extended, the only caller, will always ask for a type.

Signed-off-by: Jeff King <p...@peff.net>
---
Obviously the end goal is to have sha1_object_info_extended do this optionally, too (which happens in patch 6).

I'm not too happy about the stat_sha1_file function, which is almost identical to open_sha1_file (and which in turn is almost the same thing as has_loose_object). They all do:

  try operation X on sha1_file_name(sha1)
  prepare_alt_odb();
  foreach alt_odb:
    try operation X on alt_odb/sha1_file_name(sha1)

Unfortunately it's hard to do this kind of factoring out in C, because the argument and return types for "operation X" are different in these cases; you are stuck with providing a callback function that takes a void pointer to some operation-specific data. The boilerplate ends up worse than the repeated code.

Another solution would be to have a "find the file for loose object Y" function, and then just do operation X on that. But since X is a filesystem operation in each case, you do not want to lose the atomicity of performing the operation directly (not to mention incurring the cost of an extra stat() on each file).

So I am open to clever refactoring suggestions.
 sha1_file.c | 48 +++-
 1 file changed, 43 insertions(+), 5 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index e826066..39e7313 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1306,6 +1306,26 @@ static int git_open_noatime(const char *name)
 	}
 }
 
+static int stat_sha1_file(const unsigned char *sha1, struct stat *st)
+{
+	char *name = sha1_file_name(sha1);
+	struct alternate_object_database *alt;
+
+	if (!lstat(name, st))
+		return 0;
+
+	prepare_alt_odb();
+	errno = ENOENT;
+	for (alt = alt_odb_list; alt; alt = alt->next) {
+		name = alt->name;
+		fill_sha1_path(name, sha1);
+		if (!lstat(alt->base, st))
+			return 0;
+	}
+
+	return -1;
+}
+
 static int open_sha1_file(const unsigned char *sha1)
 {
 	int fd;
@@ -2363,7 +2383,9 @@ struct packed_git *find_sha1_pack(const unsigned char *sha1,
 
 }
 
-static int sha1_loose_object_info(const unsigned char *sha1, unsigned long *sizep,
+static int sha1_loose_object_info(const unsigned char *sha1,
+				  enum object_type *typep,
+				  unsigned long *sizep,
 				  unsigned long *disk_sizep)
 {
 	int status;
@@ -2372,6 +2394,20 @@ static int sha1_loose_object_info(const unsigned char *sha1, unsigned long *size
 	git_zstream stream;
 	char hdr[32];
 
+	/*
+	 * If we don't care about type or size, then we don't
+	 * need to look inside the object at all.
+	 */
+	if (!typep && !sizep) {
+		if (disk_sizep) {
+			struct stat st;
+			if (stat_sha1_file(sha1, &st) < 0)
+				return -1;
+			*disk_sizep = st.st_size;
+		}
+		return 0;
+	}
+
 	map = map_sha1_file(sha1, &mapsize);
 	if (!map)
 		return -1;
@@ -2386,7 +2422,9 @@ static int sha1_loose_object_info(const unsigned char *sha1, unsigned long *size
 		*sizep = size;
 	git_inflate_end(&stream);
 	munmap(map, mapsize);
-	return status;
+	if (typep)
+		*typep = status;
+	return 0;
 }
 
 /* returns enum object_type or negative */
@@ -2408,8 +2446,8 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi)
 
 	if (!find_pack_entry(sha1, &e)) {
 		/* Most likely it's a loose object.
		 */
-		type = sha1_loose_object_info(sha1, oi->sizep, oi->disk_sizep);
-		if (type >= 0) {
+		if (!sha1_loose_object_info(sha1, &type,
+					    oi->sizep, oi->disk_sizep)) {
 			oi->whence = OI_LOOSE;
 			return type;
 		}
@@ -2417,7 +2455,7 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi)
 		/* Not a loose object; someone else may have
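For illustration, here is roughly what the callback factoring dismissed in the notes above would look like (all names here are invented, not git's): "operation X" becomes a function pointer plus a void pointer for its operation-specific data. The walker itself is short, but every caller still has to define a wrapper function and marshal its arguments through the void pointer, which is the boilerplate being objected to.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct alt_odb {
	const char *base;
	struct alt_odb *next;
};

/* "Operation X": returns 0 on success, negative on failure. */
typedef int (*odb_op_fn)(const char *path, void *data);

/* Try "op" on the primary path, then on each alternate; first hit wins. */
static int foreach_odb_path(const char *primary, struct alt_odb *alts,
			    odb_op_fn op, void *data)
{
	struct alt_odb *alt;

	if (!op(primary, data))
		return 0;
	for (alt = alts; alt; alt = alt->next)
		if (!op(alt->base, data))
			return 0;
	return -1;
}

/* A toy stand-in for an operation: succeed only on a particular path. */
static int match_op(const char *path, void *data)
{
	return strcmp(path, (const char *)data) ? -1 : 0;
}
```

The loop is trivially shared, but a real lstat() or open() operation would need a small struct just to carry its out-parameters through the void pointer, so the factoring saves fewer lines than it costs.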
[PATCH 5/7] packed_object_info: make type lookup optional
Currently, packed_object_info can save some work by not calculating the size or disk_size of the object if the caller is not interested. However, it always calculates the true object type, whether the caller cares or not, and only optionally returns the easy-to-get representation type.

Let's swap these types. The function will now return the representation type (or OBJ_BAD on failure), and will only optionally fill in the true type. There should be no behavior change yet, as the only caller, sha1_object_info_extended, will always feed it a type pointer.

Signed-off-by: Jeff King <p...@peff.net>
---
 sha1_file.c | 25 -
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 6f4aff3..2a1e230 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1783,7 +1783,7 @@ static int packed_object_info(struct packed_git *p, off_t obj_offset,
 }
 
 static int packed_object_info(struct packed_git *p, off_t obj_offset,
-			      unsigned long *sizep, int *rtype,
+			      enum object_type *typep, unsigned long *sizep,
 			      unsigned long *disk_sizep)
 {
 	struct pack_window *w_curs = NULL;
@@ -1791,11 +1791,12 @@ static int packed_object_info(struct packed_git *p, off_t obj_offset,
 	off_t curpos = obj_offset;
 	enum object_type type;
 
+	/*
+	 * We always get the representation type, but only convert it to
+	 * a real type later if the caller is interested.
+	 */
 	type = unpack_object_header(p, &w_curs, &curpos, &size);
 
-	if (rtype)
-		*rtype = type; /* representation type */
-
 	if (sizep) {
 		if (type == OBJ_OFS_DELTA || type == OBJ_REF_DELTA) {
 			off_t tmp_pos = curpos;
@@ -1820,7 +1821,13 @@ static int packed_object_info(struct packed_git *p, off_t obj_offset,
 		*disk_sizep = revidx[1].offset - obj_offset;
 	}
 
-	type = packed_to_object_type(p, obj_offset, type, &w_curs, curpos);
+	if (typep) {
+		*typep = packed_to_object_type(p, obj_offset, type, &w_curs, curpos);
+		if (*typep < 0) {
+			type = OBJ_BAD;
+			goto out;
+		}
+	}
 
 out:
 	unuse_pack(&w_curs);
@@ -2471,11 +2478,11 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi)
 		return -1;
 	}
 
-	type = packed_object_info(e.p, e.offset, oi->sizep, &rtype,
-				  oi->disk_sizep);
-	if (type < 0) {
+	rtype = packed_object_info(e.p, e.offset, &type, oi->sizep,
+				   oi->disk_sizep);
+	if (rtype < 0) {
 		mark_bad_packed_object(e.p, sha1);
-		type = sha1_object_info_extended(sha1, oi);
+		return sha1_object_info_extended(sha1, oi);
 	} else if (in_delta_base_cache(e.p, e.offset)) {
 		oi->whence = OI_DBCACHED;
 	} else {
--
1.8.3.rc3.24.gec82cb9
[PATCH 4/7] packed_object_info: hoist delta type resolution to helper
To calculate the type of a packed object, we must walk down its delta chain until we hit a true base object with a real type. Most of the code in packed_object_info is for handling this case.

Let's hoist it out into a separate helper function, which will make it easier to make the type-lookup optional in the future (and keep our indentation level sane).

Signed-off-by: Jeff King <p...@peff.net>
---
Annoyingly, since the part we are moving comprises the bulk of the function, the diff tends to show the opposite of what actually happened: it looks like we renamed the function to its helper name, and moved the other parts to a new function, which happens to have the same name as the old one. So read the diff with care. :)

 sha1_file.c | 93 +++++-
 1 file changed, 53 insertions(+), 40 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 39e7313..6f4aff3 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1713,52 +1713,21 @@ static int packed_object_info(struct packed_git *p, off_t obj_offset,
 	return type;
 }
 
-
 #define POI_STACK_PREALLOC 64
 
-static int packed_object_info(struct packed_git *p, off_t obj_offset,
-			      unsigned long *sizep, int *rtype,
-			      unsigned long *disk_sizep)
+static enum object_type packed_to_object_type(struct packed_git *p,
+					      off_t obj_offset,
+					      enum object_type type,
+					      struct pack_window **w_curs,
+					      off_t curpos)
 {
-	struct pack_window *w_curs = NULL;
-	unsigned long size;
-	off_t curpos = obj_offset;
-	enum object_type type;
 	off_t small_poi_stack[POI_STACK_PREALLOC];
 	off_t *poi_stack = small_poi_stack;
 	int poi_stack_nr = 0, poi_stack_alloc = POI_STACK_PREALLOC;
 
-	type = unpack_object_header(p, &w_curs, &curpos, &size);
-
-	if (rtype)
-		*rtype = type; /* representation type */
-
-	if (sizep) {
-		if (type == OBJ_OFS_DELTA || type == OBJ_REF_DELTA) {
-			off_t tmp_pos = curpos;
-			off_t base_offset = get_delta_base(p, &w_curs, &tmp_pos,
-							   type, obj_offset);
-			if (!base_offset) {
-				type = OBJ_BAD;
-				goto out;
-			}
-			*sizep = get_size_from_delta(p, &w_curs, tmp_pos);
-			if (*sizep == 0) {
-				type = OBJ_BAD;
-				goto out;
-			}
-		} else {
-			*sizep = size;
-		}
-	}
-
-	if (disk_sizep) {
-		struct revindex_entry *revidx = find_pack_revindex(p, obj_offset);
-		*disk_sizep = revidx[1].offset - obj_offset;
-	}
-
 	while (type == OBJ_OFS_DELTA || type == OBJ_REF_DELTA) {
 		off_t base_offset;
+		unsigned long size;
 		/* Push the object we're going to leave behind */
 		if (poi_stack_nr >= poi_stack_alloc && poi_stack == small_poi_stack) {
 			poi_stack_alloc = alloc_nr(poi_stack_nr);
@@ -1769,11 +1738,11 @@ static int packed_object_info(struct packed_git *p, off_t obj_offset,
 		}
 		poi_stack[poi_stack_nr++] = obj_offset;
 		/* If parsing the base offset fails, just unwind */
-		base_offset = get_delta_base(p, &w_curs, &curpos, type, obj_offset);
+		base_offset = get_delta_base(p, w_curs, &curpos, type, obj_offset);
 		if (!base_offset)
 			goto unwind;
 		curpos = obj_offset = base_offset;
-		type = unpack_object_header(p, &w_curs, &curpos, &size);
+		type = unpack_object_header(p, w_curs, &curpos, &size);
 		if (type <= OBJ_NONE) {
 			/* If getting the base itself fails, we first
 			 * retry the base, otherwise unwind */
@@ -1800,7 +1769,6 @@ out:
 out:
 	if (poi_stack != small_poi_stack)
 		free(poi_stack);
-	unuse_pack(&w_curs);
 	return type;
 
 unwind:
@@ -1814,6 +1782,51 @@ unwind:
 	goto out;
 }
 
+static int packed_object_info(struct packed_git *p, off_t obj_offset,
+			      unsigned long *sizep, int *rtype,
+			      unsigned long *disk_sizep)
+{
+	struct pack_window *w_curs = NULL;
+	unsigned long size;
+	off_t curpos = obj_offset;
+	enum object_type type;
+
+	type = unpack_object_header(p, &w_curs, &curpos, &size);
+
+	if (rtype)
+		*rtype = type; /* representation type */
+
+	if (sizep) {
+		if (type
[PATCH 6/7] sha1_object_info_extended: make type calculation optional
Each caller of sha1_object_info_extended sets up an object_info struct to tell the function which elements of the object it wants to get. Until now, getting the type of the object has always been required (and it is returned via the return type rather than a pointer in object_info).

This can involve actually opening a loose object file to determine its type, or following delta chains to determine a packed file's base type. These effects produce a measurable slow-down when doing a "cat-file --batch-check" that does not include %(objecttype).

This patch adds a "typep" query to struct object_info, so that it can be optionally queried just like size and disk_size. As a result, the return type of the function is no longer the object type, but rather 0/-1 for success/error.

As there are only three callers total, we just fix up each caller rather than keep a compatibility wrapper:

  1. The simpler sha1_object_info wrapper continues to always ask for
     and return the type field.

  2. The istream_source function wants to know the type, and so always
     asks for it.

  3. The cat-file batch code asks for the type only when %(objecttype)
     is part of the format string.

On linux.git, the best-of-five for running:

  $ git rev-list --objects --all >objects
  $ time git cat-file --batch-check='%(objectsize:disk)' <objects

on a fully packed repository goes from:

  real    0m8.680s
  user    0m8.160s
  sys     0m0.512s

to:

  real    0m7.205s
  user    0m6.580s
  sys     0m0.608s

Signed-off-by: Jeff King <p...@peff.net>
---
This ends up changing the behavior of sha1_object_info_extended without changing its function signature. Given that it is a fairly inactive area of the code and that there are no topics in flight, I think this is OK. But an alternative could be to add (yet another) wrapper to leave the first two call-sites untouched.
 builtin/cat-file.c |  7 ---
 cache.h            |  1 +
 sha1_file.c        | 20 +---
 streaming.c        |  2 +-
 4 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index fe5c77f..163ce6c 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -150,7 +150,9 @@ static void expand_atom(struct strbuf *sb, const char *atom, int len,
 		if (!data->mark_query)
 			strbuf_addstr(sb, sha1_to_hex(data->sha1));
 	} else if (is_atom("objecttype", atom, len)) {
-		if (!data->mark_query)
+		if (data->mark_query)
+			data->info.typep = &data->type;
+		else
 			strbuf_addstr(sb, typename(data->type));
 	} else if (is_atom("objectsize", atom, len)) {
 		if (data->mark_query)
@@ -229,8 +231,7 @@ static int batch_one_object(const char *obj_name, struct batch_options *opt,
 		return 0;
 	}
 
-	data->type = sha1_object_info_extended(data->sha1, &data->info);
-	if (data->type <= 0) {
+	if (sha1_object_info_extended(data->sha1, &data->info) < 0) {
 		printf("%s missing\n", obj_name);
 		fflush(stdout);
 		return 0;

diff --git a/cache.h b/cache.h
index c1fd82c..d3b770c 100644
--- a/cache.h
+++ b/cache.h
@@ -1130,6 +1130,7 @@ struct object_info {
 struct object_info {
 	/* Request */
+	enum object_type *typep;
 	unsigned long *sizep;
 	unsigned long *disk_sizep;

diff --git a/sha1_file.c b/sha1_file.c
index 2a1e230..52f7a1e 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2452,24 +2452,26 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi)
 {
 	struct cached_object *co;
 	struct pack_entry e;
-	int type, rtype;
+	int rtype;
 
 	co = find_cached_object(sha1);
 	if (co) {
+		if (oi->typep)
+			*(oi->typep) = co->type;
 		if (oi->sizep)
 			*(oi->sizep) = co->size;
 		if (oi->disk_sizep)
 			*(oi->disk_sizep) = 0;
 		oi->whence = OI_CACHED;
-		return co->type;
+		return 0;
 	}
 
 	if (!find_pack_entry(sha1, &e)) {
 		/* Most likely it's a loose object.
		 */
-		if (!sha1_loose_object_info(sha1, &type,
+		if (!sha1_loose_object_info(sha1, oi->typep,
 					    oi->sizep, oi->disk_sizep)) {
 			oi->whence = OI_LOOSE;
-			return type;
+			return 0;
 		}
 
 		/* Not a loose object; someone else may have just packed it. */
@@ -2478,7 +2480,7 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi)
 		return -1;
 	}
 
-	rtype = packed_object_info(e.p, e.offset, &type, oi->sizep,
+	rtype = packed_object_info(e.p, e.offset, oi->typep, oi->sizep,
 				   oi->disk_sizep);
 	if (rtype < 0
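The shape of the interface after this patch can be reduced to a small sketch (all names below are invented stand-ins, not the real object_info API): the caller requests fields by pointing at storage for them, a NULL pointer suppresses the corresponding work, and the return value only signals success or failure.

```c
#include <assert.h>
#include <stddef.h>

enum obj_type { OBJ_NONE = 0, OBJ_BLOB = 3 };

/* Request struct: non-NULL pointers name the fields the caller wants. */
struct obj_query {
	enum obj_type *typep;  /* NULL: don't resolve the type */
	unsigned long *sizep;  /* NULL: don't compute the size */
};

static int type_lookups; /* counts the (notionally expensive) resolutions */

static enum obj_type expensive_type_lookup(void)
{
	type_lookups++; /* stands in for walking a delta chain */
	return OBJ_BLOB;
}

/* Fill in only the requested fields; return 0 on success, -1 on error. */
static int obj_info(struct obj_query *q)
{
	if (q->typep)
		*q->typep = expensive_type_lookup();
	if (q->sizep)
		*q->sizep = 42; /* placeholder size */
	return 0;
}
```

A caller that only wants the size never pays for type resolution, which is the same effect the commit message's timings show for --batch-check formats that omit %(objecttype).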
[PATCH 7/7] sha1_object_info_extended: pass object_info to helpers
We take in a "struct object_info" which contains pointers to storage for items the caller cares about. But then rather than pass the whole object to the low-level loose/packed helper functions, we pass the individual pointers.

Let's pass the whole struct instead, which will make adding more items later easier.

Signed-off-by: Jeff King <p...@peff.net>
---
This one is an optional cleanup. The diff is quite noisy due to all of the s/foo/oi->foo/, so it is arguable whether the result is nicer or not. It would make later additions to object_info nicer, but I do not plan to add any more.

It _would_ have been a nice cleanup to do at the beginning of the series (and further diffs would not have to add extra parameters to the function calls), but that would make the incremental "learn to do type optionally" patches quite awkward. So I am on the fence over this one, and do not mind too much if it gets dropped.

 sha1_file.c | 49 ++---
 1 file changed, 22 insertions(+), 27 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 52f7a1e..563f521 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1783,8 +1783,7 @@ static int packed_object_info(struct packed_git *p, off_t obj_offset,
 }
 
 static int packed_object_info(struct packed_git *p, off_t obj_offset,
-			      enum object_type *typep, unsigned long *sizep,
-			      unsigned long *disk_sizep)
+			      struct object_info *oi)
 {
 	struct pack_window *w_curs = NULL;
 	unsigned long size;
@@ -1797,7 +1796,7 @@ static int packed_object_info(struct packed_git *p, off_t obj_offset,
 	 */
 	type = unpack_object_header(p, &w_curs, &curpos, &size);
 
-	if (sizep) {
+	if (oi->sizep) {
 		if (type == OBJ_OFS_DELTA || type == OBJ_REF_DELTA) {
 			off_t tmp_pos = curpos;
 			off_t base_offset = get_delta_base(p, &w_curs, &tmp_pos,
@@ -1806,24 +1805,24 @@ static int packed_object_info(struct packed_git *p, off_t obj_offset,
 				type = OBJ_BAD;
 				goto out;
 			}
-			*sizep = get_size_from_delta(p, &w_curs, tmp_pos);
-			if (*sizep == 0) {
+			*oi->sizep = get_size_from_delta(p, &w_curs, tmp_pos);
+			if (*oi->sizep == 0) {
 				type = OBJ_BAD;
 				goto out;
 			}
 		} else {
-			*sizep = size;
+			*oi->sizep = size;
 		}
 	}
 
-	if (disk_sizep) {
+	if (oi->disk_sizep) {
 		struct revindex_entry *revidx = find_pack_revindex(p, obj_offset);
-		*disk_sizep = revidx[1].offset - obj_offset;
+		*oi->disk_sizep = revidx[1].offset - obj_offset;
 	}
 
-	if (typep) {
-		*typep = packed_to_object_type(p, obj_offset, type, &w_curs, curpos);
-		if (*typep < 0) {
+	if (oi->typep) {
+		*oi->typep = packed_to_object_type(p, obj_offset, type, &w_curs, curpos);
+		if (*oi->typep < 0) {
 			type = OBJ_BAD;
 			goto out;
 		}
 	}
@@ -2404,9 +2403,7 @@ static int sha1_loose_object_info(const unsigned char *sha1,
 }
 
 static int sha1_loose_object_info(const unsigned char *sha1,
-				  enum object_type *typep,
-				  unsigned long *sizep,
-				  unsigned long *disk_sizep)
+				  struct object_info *oi)
 {
 	int status;
 	unsigned long mapsize, size;
@@ -2418,12 +2415,12 @@ static int sha1_loose_object_info(const unsigned char *sha1,
 	 * If we don't care about type or size, then we don't
 	 * need to look inside the object at all.
 	 */
-	if (!typep && !sizep) {
-		if (disk_sizep) {
+	if (!oi->typep && !oi->sizep) {
+		if (oi->disk_sizep) {
 			struct stat st;
 			if (stat_sha1_file(sha1, &st) < 0)
 				return -1;
-			*disk_sizep = st.st_size;
+			*oi->disk_sizep = st.st_size;
 		}
 		return 0;
 	}
@@ -2431,19 +2428,19 @@ static int sha1_loose_object_info(const unsigned char *sha1,
 	map = map_sha1_file(sha1, &mapsize);
 	if (!map)
 		return -1;
-	if (disk_sizep)
-		*disk_sizep = mapsize;
+	if (oi->disk_sizep)
+		*oi->disk_sizep = mapsize;
 	if (unpack_sha1_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0)
 		status = error("unable to unpack %s header", sha1_to_hex(sha1));
 	else if ((status = parse_sha1_header(hdr, &size)) < 0
Re: [PATCH 1/7] cat-file: disable object/refname ambiguity check for batch mode
On Fri, Jul 12, 2013 at 10:47:46AM +0200, Michael Haggerty wrote:

The solution feels a little hacky, but I'm not sure there is a better one short of reverting the warning entirely.

We could also tie it to warn_ambiguous_refs (or add another config variable), but I don't think that makes sense. It is not about "do I care about ambiguities in this repository", but rather "I am going to do a really large number of sha1 resolutions, and I do not want to pay the price in this particular code path for the warning".

There may be other sites that resolve a large number of refs and run into this, but I couldn't think of any. Doing for_each_ref would not have the same problem, as we already know they are refs there.

ISTM that callers of --batch-check would usually know whether they are feeding it SHA-1s or arbitrary object specifiers. (And in most cases when there are a large number of them, they are probably all SHA-1s.)

Thanks for bringing this up; I had meant to outline this option in the cover letter, but forgot when posting. If we were designing cat-file from scratch, I'd consider having --batch take just 40-hex sha1s in the first place. Since we can't do that now for backwards compatibility, having an option is the best we can do there. But having to use such an option to avoid a 10x slowdown just strikes me as ugly for two reasons:

  1. It's part of the user-visible interface, and it seems like an extra "wtf?" moment when somebody wonders why their script is painfully slow, and why they need a magic option to fix it. We're cluttering the interface forever.

  2. It's a sign that the refname check was put in for a specific performance profile (resolving one or a few refs), but didn't consider resolving a large number. I'm wondering what other performance regressions it's possible to trigger.
So instead of framing this change as "skip object refname ambiguity check" (i.e., sacrifice some correctness for performance), perhaps it would be less hacky to offer the following new feature:

I wouldn't frame it as "correctness for performance" at all. The check does not impact behavior at all, and is purely informational (to catch quite a rare situation, too). I'd argue that the check simply isn't that interesting in this case in the first place.

To cat-file we could add an option like --sha1-only or --literal or --no-dwim (... better names are failing me) which would skip *all* dwimming of 40-character strings. It would also assume that any shorter strings are abbreviated SHA-1s and fail if they are not. This would be a nice feature by itself (these are object names, dammit, and don't try to tell me differently!) and would have the additional small advantage of speeding up lookups of abbreviated SHA-1s, which (regardless of your patch) otherwise go through the whole DWIM process.

I can see in theory that somebody might want that, but I am having a hard time thinking of a practical use.

I understand that such an option alone would leave some old scripts suffering the performance regression caused by the check, but in most cases it would be easy to fix them.

I'm less worried about old scripts, and more worried about carrying around a confusing option forever, especially when I expect most invocations to use it (perhaps my experience is biased, but I use cat-file --batch quite a lot in scripting, and it has almost always been on the output of rev-list). IOW, it seems like a poor default, and we are choosing it only because of backwards compatibility. I guess another option is to switch the default with the usual deprecation dance.

Yet another option is to consider what the check is doing, and accomplish the same thing in a different way.
The real pain is that we are individually trying to resolve each object by hitting the filesystem (and doing lots of silly checks on the refname format, when we know it must be valid). We don't actually care in this case if the ref list is up to date (we are not trying to update or read a ref, but only know if it exists, and raciness is OK). IOW, could we replace the dwim_ref call for the warning with something that directly queries the ref cache? -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3] config: add support for http.<url>.* settings
On Thu, Jul 11, 2013 at 02:50:03PM -0700, Kyle J. McKay wrote:

The url value is considered a match to a url if the url value is either an exact match or a prefix of the url which ends on a path component boundary ('/'). So "https://example.com/test" will match "https://example.com/test" and "https://example.com/test/too" but not "https://example.com/testextra". Longer matches take precedence over shorter matches with environment variable settings taking precedence over all. With this configuration:

[http]
	useragent = other-agent
	noEPSV = true
[http "https://example.com"]
	useragent = example-agent
	sslVerify = false
[http "https://example.com/path"]
	useragent = path-agent

I like the general direction you are going, and especially the path prefix matching is something I had always meant to implement for the credential code. A few comments on the implementation:

+enum http_option_type {
+	opt_post_buffer = 0,
+	opt_min_sessions,
+#ifdef USE_CURL_MULTI
+	opt_max_requests,
+#endif
+	opt_ssl_verify,
+	opt_ssl_try,
+	opt_ssl_cert,
+#if LIBCURL_VERSION_NUM >= 0x070903
+	opt_ssl_key,
+#endif
+#if LIBCURL_VERSION_NUM >= 0x070908
+	opt_ssl_capath,
+#endif
+	opt_ssl_cainfo,
+	opt_low_speed,
+	opt_low_time,
+	opt_no_epsv,
+	opt_http_proxy,
+	opt_cookie_file,
+	opt_user_agent,
+	opt_passwd_req,
+	opt_max
+};

We usually put enum fields in ALL_CAPS to mark them as constants (though you can find few exceptions in the code).

+static size_t http_options_url_match_prefix(const char *url,
+					    const char *url_prefix,
+					    size_t url_prefix_len)
+{

It looks like you're matching the URLs as raw strings, and I don't see any canonicalization going on. What happens if I have "https://example.com/foo+bar" in my config, but then I visit "https://example.com/foo%20bar"? Or what about optional components? If I have "https://example.com" in my config, but I am visiting "https://p...@example.com/", shouldn't it match?
The config spec is more general than my specific URL; you would not want it to match in the opposite direction, though.

I think you would want to break down the URL into fields, canonicalize each field by undoing any encoding, and then compare the broken-down URLs, as credential_match does (adding in your prefix/boundary matching to the path instead of a straight string comparison). I think you could mostly reuse the code there by doing:

  1. Create a struct url that contains the broken-down form; struct credential would contain a url.

  2. Pull out credential_from_url into int url_parse(struct url *, const char *).

  3. Pull out credential_match into int url_match(struct url *, struct url *).

  4. Parse and compare URLs from http.$URL.* during the config read (similar to credential_config_callback).

That does make your by-length ordering impossible, but I don't think you can do it in the general case. If I have:

[http "http://p...@example.com"]
	foo = 1
[http "http://example.com:8080"]
	foo = 2

and I visit "http://p...@example.com:8080", which one is the winner? I don't think there is an unambiguous definition. I'd suggest instead just using the usual "last one wins" strategy that our config uses. It has the unfortunate case that:

[http "http://example.com"]
	foo = 1
[http]
	foo = 2

will always choose http.foo=2, but I don't think that is a big problem in practice. You often only set one or the other, and in the cases where you want to have a non-standard default and then override it with another host-specific case, the "last one wins" rule makes it simple to explain that you need to specify keys in most-general to most-specific order.

 static int http_options(const char *var, const char *value, void *cb)
 {
-	if (!strcmp("http.sslverify", var)) {
+	const char *url = (const char *)cb;

No need to cast from void, this isn't C++. :)

The rest of the http_options() changes look like what I'd expect.
@@ -344,7 +479,7 @@ void http_init(struct remote *remote, const char *url, int proactive_auth)
 
 	http_is_verbose = 0;
 
-	git_config(http_options, NULL);
+	git_config(http_options, (void *)url);

No need to cast again. :)

So this is the URL that we got on the command line. Which means that if we get a redirect, we will not re-examine the options. I think that's OK; we do not even _see_ the redirect, as it happens internally within curl. The credential code has the same problem, but it is not a security issue because curl clears the credentials on redirect. However, it may be worth mentioning in the documentation that the config options operate on the URL you give git, _not_ necessarily on the actual URL you are hitting at any given moment (the gitcredentials(7) page should probably make the same distinction).

-Peff
Re: [PATCH v5 0/5] allow more sources for config values
On Thu, Jul 11, 2013 at 04:26:02PM -0700, Junio C Hamano wrote: Thanks. The differences since the last round I see are these. And I think the series overall makes sense (I haven't looked hard enough to pick any nits yet, though). Both v4 and the interdiff look fine to me. I gave it one more cursory read-through, and I don't see any problems. So: Acked-by: Jeff King p...@peff.net -Peff
[PATCH] t0008: avoid SIGPIPE race condition on fifo
On Thu, Jul 11, 2013 at 09:09:55AM -0700, Junio C Hamano wrote: Jeff King p...@peff.net writes:

On Wed, Jul 10, 2013 at 12:36:40PM -0400, Brian Gernhardt wrote:

The newest test in t0008, "streaming support for --stdin", seems to hang sporadically on my MacBook Pro (running 10.8.4). The hang seems to be related to running it in parallel with other tests, as I can only reliably cause it by running with prove and -j 3. However, once that has hung I am able to semi-reliably have it occur by running the test separately (with the test hung in the background, using separate trash directories via the --root option).

I can't replicate the hang here (on Linux) doing:

for i in `seq 1 30`; do
	./t0008-ignores.sh --root=/tmp/foo/$i
done

It seems to hang on me with bash, but not dash, here.

Thanks, I was able to replicate it with bash, and like Brian, I saw it hanging in the second read. strace revealed that we were blocked on open("out"). The patch below should fix it. I'm still not sure why the choice of shell matters; it may simply be a timing fluke that bash is more likely to hit for some reason.

-- >8 --
Subject: [PATCH] t0008: avoid SIGPIPE race condition on fifo

To test check-ignore's --stdin feature, we use two fifos to send and receive data. We carefully keep a descriptor to its input open so that it does not receive EOF between input lines. However, we do not do the same for its output. That means there is a potential race condition in which check-ignore has opened the output pipe once (when we read the first line), and then writes the second line before we have re-opened the pipe. In that case, check-ignore gets a SIGPIPE and dies. The outer shell then tries to open the output fifo but blocks indefinitely, because there is no writer.

We can fix it by keeping a descriptor open through the whole procedure. This should also help if check-ignore dies for any other reason (we would already have opened the fifo and would therefore not block, but just get EOF on read).
However, we are technically still susceptible to check-ignore dying early, before we have opened the fifo. This is an unlikely race and shouldn't generally happen in practice, though, so we can hopefully ignore it.

Signed-off-by: Jeff King p...@peff.net
---
 t/t0008-ignores.sh | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/t/t0008-ignores.sh b/t/t0008-ignores.sh
index a56db80..c29342d 100755
--- a/t/t0008-ignores.sh
+++ b/t/t0008-ignores.sh
@@ -697,13 +697,21 @@ test_expect_success PIPE 'streaming support for --stdin' '
 	# shell, and then echo to the fd. We make sure to close it at
 	# the end, so that the subprocess does get EOF and dies
 	# properly.
+	#
+	# Similarly, we must keep "out" open so that check-ignore does
+	# not ever get SIGPIPE trying to write to us. Not only would that
+	# produce incorrect results, but then there would be no writer on the
+	# other end of the pipe, and we would potentially block forever trying
+	# to open it.
 	exec 9>in &&
+	exec 8<out &&
 	test_when_finished "exec 9>&-" &&
+	test_when_finished "exec 8<&-" &&
 	echo >&9 one &&
-	read response <out &&
+	read response <&8 &&
 	echo "$response" | grep "^\.gitignore:1:one	one" &&
 	echo >&9 two &&
-	read response <out &&
+	read response <&8 &&
 	echo "$response" | grep "^::	two"
 '
--
1.8.3.rc1.30.gff0fb75
Re: [PATCH 0/7] cat-file --batch-check performance improvements
On Fri, Jul 12, 2013 at 10:23:34AM -0700, Junio C Hamano wrote: Jeff King p...@peff.net writes:

The results for running (in linux.git):

$ git rev-list --objects --all >objects
$ git cat-file --batch-check='%(objectsize:disk)' <objects >/dev/null

I can see how these patches are very logical avenue to grab only on-disk footprint for large number of objects, but among the type, payload size and on-disk footprint, I find it highly narrow niche that a real user or script is interested _only_ in on-disk footprint without even worrying about the type of object.

Yeah, I agree it is a bit of a niche. However, there are other code paths that might want only the size and not the type (e.g., we already know the object is a blob, but want to know size before deciding how to handle diff). But in general, I doubt the performance impact is a big deal there. It's only measurable when you're doing millions of objects.

-Peff
Re: [PATCH] t0008: avoid SIGPIPE race condition on fifo
On Fri, Jul 12, 2013 at 09:23:54AM -0700, Junio C Hamano wrote:

In that case, check-ignore gets a SIGPIPE and dies. The outer shell then tries to open the output fifo but blocks indefinitely, because there is no writer. We can fix it by keeping a descriptor open through the whole procedure.

Ahh, figures. I wish I were smart enough to figure that out immediately after seeing the test that does funny things to "in" with 9.

I wish I were smart enough to have figured it out when I was writing the funny stuff with "in" and 9 in the first place. :)

-Peff
Re: [PATCH 3/3] wt-status: use format function attribute for status_printf
On Fri, Jul 12, 2013 at 09:10:30AM -0700, Junio C Hamano wrote:

You can fix it with -Wno-format-zero-length, so the hassle is not huge. But I am also inclined to just drop this one. We have lived without the extra safety for a long time, and list review does tend to catch such problems in practice.

I am tempted to actually merge the original one as-is without any of the workaround, and just tell people to use -Wno-format-zero-length.

Yeah, I think the only downside is the cognitive burden on individual developers who try -Wall and have to figure out that we need -Wno-format-zero-length (and that the warnings are not interesting). It would be nice to add it automatically to CFLAGS, but I do not know if we can reliably detect from the Makefile that we are compiling under gcc.

-Peff
Re: [PATCH 1/7] cat-file: disable object/refname ambiguity check for batch mode
On Sun, Jul 14, 2013 at 08:45:37PM -0700, Junio C Hamano wrote:

To cat-file we could add an option like --sha1-only or --literal or --no-dwim (... better names are failing me) which would skip *all* dwimming of 40-character strings. It would also assume that any shorter strings are abbreviated SHA-1s and fail if they are not. This would be a nice feature by itself (these are object names, dammit, and don't try to tell me differently!) and would have the additional small advantage of speeding up lookups of abbreviated SHA-1s, which (regardless of your patch) otherwise go through the whole DWIM process.

I can see in theory that somebody might want that, but I am having a hard time thinking of a practical use.

Would it be a good alternative to call get_sha1_hex() to catch the most common case (reading from rev-list output, for example) and then let the more general get_sha1() handle extended SHA-1 expressions?

For a 40-byte sha1, that _should_ be what get_sha1 does (i.e., go more or less directly to the 40-hex code path, and return). And that's basically what happens now, except that after we do so, we now have the extra "oh, is it also a refname?" check. For a shortened sha1, I don't think it would have the same behavior. Right now, I believe the order is to treat a short sha1 as a possible refname, and only if that fails consider it as a short sha1.

IOW, it seems like a poor default, and we are choosing it only because of backwards compatibility. I guess another option is to switch the default with the usual deprecation dance.

I agree that "did you mean the unreadable refname or 40-hex object?" turned on everywhere get_sha1() is called is a very poor default. I wonder if we can limit it only to the end-user input somehow at the API level.

It is easy to do on top of my patch (just flip the default on the switch I introduced, and turn it back on in whichever code paths are appropriate). But the question is: what is end-user input?
Do you mean command-line arguments to rev-list and friends?

-Peff
Re: [PATCH 1/7] cat-file: disable object/refname ambiguity check for batch mode
On Fri, Jul 12, 2013 at 12:30:07PM +0200, Michael Haggerty wrote:

But with particular respect to git cat-file, I see problems:

1. get_ref_snapshot() would have to read all loose and packed refs within the specified subtree, because loose refs have to be read before packed refs. So the call would be expensive if there are a lot of loose refs. And DWIM wouldn't know in advance where the references might be, so it would have to set prefix=. If many refs are looked up, then it would presumably be worth it. But if only a couple of lookups are done and there are a lot of loose refs, then using a cache would probably slow things down. The slowdown could be ameliorated by adding some more intelligence, for example only populating the loose refs cache after a certain number of lookups have already been done.

I had assumed we would load the loose ref cache progressively by directory. Of course that only helps avoiding refs/foo/* when you are interested in refs/heads/foo. If your refs/heads/* is very large, you have to load all of it, and at some point it is cheaper to do a few stat() calls than to actually readdir() the whole directory. On the other hand, that is basically how packed-refs work now (if we hit any ref, we have to load the whole file), and repos with many refs would typically pack them all anyway.

2. A git cat-file --batch process can be long-lived. What guarantees would users expect regarding its lookup results?

I hadn't really intended this for general lookups, but just to speed up the refname warning, where being out-of-date is more acceptable (since the warning is a purely informational hint).

-Peff
Re: [PATCH v3] config: add support for http.<url>.* settings
On Fri, Jul 12, 2013 at 06:07:35AM -0700, Kyle J. McKay wrote:

It looks like you're matching the URLs as raw strings, and I don't see any canonicalization going on. What happens if I have "https://example.com/foo+bar" in my config, but then I visit "https://example.com/foo%20bar"?

The documentation for http.<url>.* says:

Yes, I know. My question was more rhetorical, as in "what _should_ happen". It seems unfriendly not to do at least some basic canonicalization with respect to encodings.

That does make your by-length ordering impossible, but I don't think you can do it in the general case. If I have:

[http "http://p...@example.com"]
	foo = 1
[http "http://example.com:8080"]
	foo = 2

and I visit "http://p...@example.com:8080", which one is the winner?

If I were to split everything out, then I would only have the second one match. The first config is on a different port, quite possibly an entirely different service. It shouldn't match.

Yeah, sorry, that was a bad example, because a missing port is not "do not care", but rather "default port". A better example is:

[http "http://p...@example.com/"]
	foo = 1
[http "http://example.com/path/"]
	foo = 2

If we see the URL "http://p...@example.com/path/foo", I would expect the second one to match (it does not under a pure-prefix scheme). If we were to make it match, you cannot say which of the two entries is more specific; they are both specific in different ways.

I don't think it's necessary to split the URL apart though. Whatever URL the user gave to git on the command line (at some point even if it's now stored as a remote setting in config) complete with URL-encoding, user names, port names, etc. is the same url, possibly shortened, that needs to be used for the http.<url>.option setting.

I'm not sure I agree with this. The URL handed to git is not always typed directly by the user. E.g., it might be cut-and-paste from an email or website, or it may even be fed by a script (e.g., the repo tool will pull URLs from its manifest file).
I think that's simple and very easy to explain and avoids user confusion and surprise while still allowing a default to be set for a site but easily overridden for a portion of that site without needing to worry about the order config files are processed or the order of the [http url] sections within them.

But the user has to worry about last-one-wins anyway for same-length prefixes, as you noted below.

using the usual "last one wins" strategy that our config uses. It has the unfortunate case that:

[http "http://example.com"]
	foo = 1
[http]
	foo = 2

will always choose http.foo=2, but I don't think that is a big problem in practice.

I think that violates the principle of least surprise. In this case:

[http "https://cacert.org"]
	sslVerify = false
[http]
	sslVerify = true

Sure, I think yours is less surprising in this case. But it is more surprising in other cases, like ones where URL encoding or optional components are involved. E.g., imagine your two options above were flipped (you do not usually verify ssl, but it is very important for you to do so against cacert.org). An automated tool like repo tries to clone from https://u...@cacert.org, and you might be surprised that SSL verification is not turned on.

-	git_config(http_options, NULL);
+	git_config(http_options, (void *)url);

No need to cast again. :)

Ah, but there is in order to avoid a warning:

Ah, you're right. Sorry for the noise.

-Peff
Re: [PATCH v3] config: add support for http.<url>.* settings
On Sat, Jul 13, 2013 at 12:46:17PM -0700, Kyle J. McKay wrote:

I expect it will be easier just to normalize the URL without splitting. That is, lowercase the parts that are case-insensitive (scheme and host name) and adjust the URL-escaping to remove URL escaping (%xx) from characters that don't need it but add it to any for which it is required that are not escaped (according to RFC 1738).

I think you are suggesting doing better than this, but just to be clear, we cannot treat the URL as a simple string and just decode and re-encode. One of the things that gets encoded are the delimiting characters. So if I have the URL:

  https://foo%3a...@example.com

you would canonicalize it into:

  https://foo:b...@example.com

But those are two different URLs entirely; the first has the username "foo:bar", and the second has the username "foo" and the password "bar".

I admit that these are unlikely to come up in practice, but I am worried that there is some room for mischief here. For example:

  https://example.com%2ftricky.host/repo.git

If we canonicalize that into:

  https://example.com/tricky.host/repo.git

and do a lookup, we think we are hitting example.com, but we are actually hitting example.comtricky.host (i.e., that is how curl will interpret it). If we were deciding to use a stored credential based on that information, it would be quite bad (we would leak credentials to the owner of comtricky.host).

I know your patch does not impact the credential lookup behavior, but it would be nice in the long run if the two lookups followed the same rules. So I think the three options are basically:

  1. No decoding, require the user to use a consistent prefix between config and other uses of the URL. I.e., your current patch. The downside is that it doesn't handle any variation of input.

  2. Full decoding into constituent parts. This handles canonicalization of encoding, and also allows wildcard components (e.g., a URL with username can match the generic "https://example.com" in the config).
The downside is that you cannot do a "longest prefix wins" rule for overriding.

  3. Full decoding as in (2), but then re-assemble into a canonicalized encoded URL. The upside is that you get to do "longest prefix wins", but you can no longer have wildcard components. I think this is what you are suggesting in your mail.

I'm still in favor of (2), because I think the wildcard components are important (and while I agree that "longest prefix wins" is nicer, we already have "last one wins" for the rest of the config, including the credential URL matcher). But I certainly think (3) is better than (1).

-Peff
Re: [PATCH v3] config: add support for http.<url>.* settings
On Sun, Jul 14, 2013 at 09:02:19PM -0700, Junio C Hamano wrote:

Or proceed with what's there right now (there are a few pending updates from reviewers) and then, as Junio says above, adjust it later if needed?

I have been assuming that strictly textual match will be a subset of the matching semantics Aaron and Peff suggested. That is, if we include your version in the upcoming release, and the user writes the http.URLpattern.variable configuration so that the entries match what they want them to match, the enhanced URL matcher Aaron and Peff suggested will still make them match. Am I mistaken? Will there be some URLpattern that will not match with the same URL literally?

I think we need to decide now, because the two schemes are not compatible, and switching will break setups.

Yes, the matcher that Aaron and I suggest is a strict superset (i.e., we will not stop matching things that used to match), which is good. But we would not be able to implement "longest prefix wins" overriding anymore, which would change the meaning of cases like:

[http "https://example.com"]
	foo = 1
[http]
	foo = 2

(under Kyle's scheme it is 1, and under ours 2).

We can probably come up with some clever rules for overriding a broken-down URL that would stay backwards compatible. E.g., longest-prefix-match if there are no wildcarded components, and last-one-wins if there are. But that is not a rule I would want readers to have to puzzle out in the documentation. So I think we are much better off to decide the semantics now.

-Peff
Re: [PATCH] Fix some sparse warnings
On Tue, Jul 16, 2013 at 07:57:20AM +0200, Johannes Sixt wrote:

On 7/15/2013 19:31, Ramsay Jones wrote:

Sparse issues three "Using plain integer as NULL pointer" warnings. Each warning relates to the use of a '{0}' initialiser expression in the declaration of a 'struct object_info'.

I question the value of this warning. Initialization with '= {0}' is a well-established idiom, and sparse should know about it. Also, plain 0 *is* a null pointer constant.

I agree with you. It's not a bug, and I think sparse is being overly picky here; it is missing the forest for the trees in interpreting the idiom. Still, it may be worth tweaking in the name of eliminating compiler noise, since it does not cost us very much to do so (and I believe we have done so in the past, too).

We could also ask people with sparse to turn off the "use NULL instead of 0" warning, but I think it _is_ a useful warning elsewhere (even though it is never a bug, it violates our style guidelines and may be an indication of a bug). It would be nice if sparse learned to ignore the warning in this particular idiom, but I am not going to hold my breath for that.

-Peff
Re: [PATCH] Add the GIT_SENTINEL macro
On Thu, Jul 18, 2013 at 09:02:12PM +0100, Ramsay Jones wrote:

The sentinel function attribute is not understood by versions of the gcc compiler prior to v4.0. At present, for earlier versions of gcc, the build issues 108 warnings related to the unknown attribute. In order to suppress the warnings, we conditionally define the GIT_SENTINEL macro to provide the sentinel attribute for gcc v4.0 and newer.

Signed-off-by: Ramsay Jones ram...@ramsay1.demon.co.uk
---

Acked-by: Jeff King p...@peff.net

-__attribute__((sentinel))
+GIT_SENTINEL(0)
 void argv_array_pushl(struct argv_array *, ...);

We could also add GIT_SENTINEL to handle the default-zero case, but I do not think it is worth the trouble.

-Peff
Re: Git tag output order is incorrect (IMHO)
On Fri, Jul 19, 2013 at 12:40:55PM -0700, Junio C Hamano wrote: Andreas Schwab sch...@linux-m68k.org writes: Rahul Bansal rahul.ban...@rtcamp.com writes: IMHO git tag is expected to show tag-list ordered by versions. A git tag can be anything, not related to versions at all. Correct. But that does not prevent somebody from adding a git tag --sort=X option to the command, just like git for-each-ref has a --sort=X option. A while ago I started on (but did not get very far on) unifying the ref selection code for for-each-ref, tag, and branch. It would be nice if they all supported the same set of --contains/--points-at/--merged/--sort, etc. I do plan to finish it eventually, but if anyone else feels like picking it up, I'd be glad to review patches and/or share my work-in-progress as a starting point. -Peff
Re: getting git from kernel.org is failing
On Tue, Jul 23, 2013 at 10:06:05PM +0200, Fredrik Gustafsson wrote: Confirmed (tested both with and without trailing /, IIRC there was some configuration change recently that affected that): iveqy@kolya:~/slask/git$ git clone https://git.kernel.org/cgit/git/git.git/ Klonar till git... error: Unable to get pack index https://git.kernel.org/cgit/git/git.git/objects/pack/pack-1e2bca8b5127039cff42b1cd3d47767fb577cd4f.idx That's weird. We shouldn't be fetching an index at all unless dumb http is in use. When I try to clone from the URL above, I also get shunted to dumb-http (and the repo on the server appears to be poorly packed). But if I go to: https://git.kernel.org/pub/scm/git/git.git then smart HTTP works fine. I wonder if there is a problem in the cgit setup on kernel.org (or if it was even intended that you could fetch from the cgit URL at all; the cgit page shows the clone URLs in /pub/scm/git). -Peff
Re: [PATCH v8 4/4] config: allow http.url.* any user matching
On Mon, Jul 22, 2013 at 01:24:06PM -0700, Kyle J. McKay wrote: I am not yet convinced that the precedence rule specified in this is what we want (I do not have an example why it is *not* what we want, either). Another definition could be if user@ is present in the request, give lower precedence to config entries for the site without user@ than entries with user@, and I do not have a strong opinion myself which one between the two is better (and there may be a third and other possible rules). Comments? Consider this site: [...] Thanks for explaining, and sorry I missed out on the last few rounds of review. I think your scheme (normalization plus special handling of the username field) addresses my biggest concern, which is matching in the face of optional usernames. The only two things that make me wary are: 1. The explanation and special-casing of username is a little complicated to explain. 2. The behavior for resolving the value when faced with multiple possibilities is completely unlike the rest of the config system (both dropping last-one-wins, and unlike the URL matching for credentials). I think we can decide that (2) is worth it if your semantics are more flexible in practice. It would be nice to see real-world feedback on how people use it before setting the behavior in stone, but there's sort of a chicken and egg problem there. For (1), I wonder if the explanation would be simpler if the precedences of each sub-part were simply laid out. That is, would it be correct to say something like: For a config key to match a URL, each element of the config key (if present) is compared to that of the URL, in the following order: 1. Protocol (e.g., `https` in `https://example.com/`). This field must match exactly between the config key and the URL. 2. Host/domain name (e.g., `example.com` in `https://example.com/`). This field must match exactly between the config key and the URL. 3. Path (e.g., `repo.git` in `https://example.com/repo.git`).
This field is prefix-matched by slash-delimited path elements, so that config key `foo/` matches URL `foo/bar`. Longer matches take precedence (so `foo/bar`, if it exists, is a better match than just `foo/`). 4. Username (e.g., `user` in `https://u...@example.com/repo.git`). The list above is ordered by decreasing precedence; a URL that matches a config key's path is preferred to one that matches its username. I don't know if that is more or less clear of an explanation. It makes more sense to me, but that is probably because I wrote it. I'm also not 100% sure it describes your implementation, but I think it is equivalent to the prefix matching with normalization. I have a few other comments on specific patches; I'll send them separately. -Peff
Re: [PATCH v8 4/4] config: allow http.url.* any user matching
On Mon, Jul 22, 2013 at 05:56:44AM -0700, Kyle J. McKay wrote: + matches a url if it refers to the same scheme, host and port and the + path portion is an exact match or a prefix that matches at a / + boundary. If url does not include a user name, it will match a url + with any username otherwise the user name must match as well (the + password part, if present in the url, is always ignored). Longer url + path matches take precedence over shorter matches no matter what order + they occur in. For example, if both "https://u...@example.com/path" and + "https://example.com/path/name" are used as a config url value and + then "https://u...@example.com/path/name/here" is passed to a git + command, the settings in the "https://example.com/path/name" section These "https://..." should probably be `https://...`, to mark them in asciidoc as literals. -Peff
Re: [PATCH v8 3/4] tests: add new test for the url_normalize function
On Mon, Jul 22, 2013 at 05:56:43AM -0700, Kyle J. McKay wrote: In order to perform sane URL matching for http.url.* options, http.c normalizes URLs before performing matches. A new test-url-normalize test program is introduced along with a new t5200-url-normalize.sh script to run the tests. While looking at test-url-normalize (and wanting to experiment some with your series to see how it handled a few behaviors), I wondered if we shouldn't be exposing this selection process to the user somehow via git-config. That is, would a shell script ever want to say "what is the value of this config variable for url $X"? Certainly our test scripts want to, and having a test-* program covers that, but might user scripts want to do the same? Or even to introduce its own URL-matched config options? How hard would it be to convert the -c option of test-url-normalize into something like: git config --file=foo --url http noepsv $URL which would look for http.$URL.noepsv matches. If we want to go that route, we don't have to do it now (the test-* interface is unpublished, and the git-config interface can simply come later than the internal feature). But it may be worth thinking about now while it is in your head. diff --git a/test-url-normalize.c b/test-url-normalize.c new file mode 100644 index 000..f18bd88 --- /dev/null +++ b/test-url-normalize.c [...] + if (!strcmp("sslverify", opt_lc.buf)) + printf("%s\n", curl_ssl_verify ? "true" : "false"); + else if (!strcmp("sslcert", opt_lc.buf)) + printf("%s\n", ssl_cert); +#if LIBCURL_VERSION_NUM >= 0x070903 + else if (!strcmp("sslkey", opt_lc.buf)) + printf("%s\n", ssl_key); +#endif +#if LIBCURL_VERSION_NUM >= 0x070908 + else if (!strcmp("sslcapath", opt_lc.buf)) + printf("%s\n", ssl_capath); +#endif [...] Do we need to have the complete list of options here, including curl version limitations? It seems like this will eventually get out of date with the list of options. Would it be sufficient to test just one (or even just handle a fake http.$URL.foo variable)?
+#define url_normalize(u) http_options_url_normalize(u) Does this macro do anything besides shorten the name? Is the extra level of indirection to the reader worth it? -Peff
Re: [PATCH v8 1/4] config: add support for http.url.* settings
On Mon, Jul 22, 2013 at 05:56:41AM -0700, Kyle J. McKay wrote: +enum http_option_type { + OPT_POST_BUFFER, + OPT_MIN_SESSIONS, + OPT_SSL_VERIFY, + OPT_SSL_TRY, + OPT_SSL_CERT, + OPT_SSL_CAINFO, + OPT_LOW_SPEED, + OPT_LOW_TIME, + OPT_NO_EPSV, + OPT_HTTP_PROXY, + OPT_COOKIE_FILE, + OPT_USER_AGENT, + OPT_PASSWD_REQ, +#ifdef USE_CURL_MULTI + OPT_MAX_REQUESTS, +#endif +#if LIBCURL_VERSION_NUM >= 0x070903 + OPT_SSL_KEY, +#endif +#if LIBCURL_VERSION_NUM >= 0x070908 + OPT_SSL_CAPATH, +#endif + OPT_MAX +}; + +static size_t http_option_max_matched_len[OPT_MAX]; It's frustrating that we now have an extra spot to add new options to (e.g., somebody else is adding http.savecookies in another thread, and the merge will need to not just resolve the textual conflicts, but add it to this other list). Might it be simpler to just keep a dynamic list indexed by option name (either a hash table, or a sorted string_list)? We only incur a lookup when we find an actual config entry, so the number of lookups (and entries) should be minuscule and not affect performance. And as a bonus, it lets us handle arbitrary keys if we want to allow git config to learn about url matching for user-specified keys. -Peff
[PATCH 0/4] protocol-capabilities documentation updates
On Mon, Jul 15, 2013 at 07:25:19PM +0700, Nguyen Thai Ngoc Duy wrote: I noticed that quiet and agent capabilities were missing in protocol-capabilities.txt. I have a rough idea what they do, but I think it's best to be documented by the authors. Maybe you have some time to make a patch? Thanks for bringing it up; we should be more careful about documenting these as we add them. I went ahead and documented them both, as I also remember the rationale for quiet. And of course I found some other inaccuracies in the capabilities documentation while I was there. :) [1/4]: docs: fix 'report-status' protocol capability thinko [2/4]: docs: note that receive-pack knows side-band-64k capability [3/4]: document 'agent' protocol capability [4/4]: document 'quiet' receive-pack capability -Peff
[PATCH 1/4] docs: fix 'report-status' protocol capability thinko
The report-status capability is understood by receive-pack, not upload-pack. Signed-off-by: Jeff King p...@peff.net --- Documentation/technical/protocol-capabilities.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt index b15517f..11467ff 100644 --- a/Documentation/technical/protocol-capabilities.txt +++ b/Documentation/technical/protocol-capabilities.txt @@ -168,7 +168,7 @@ report-status report-status - -The upload-pack process can receive a 'report-status' capability, +The receive-pack process can receive a 'report-status' capability, which tells it that the client wants a report of what happened after a packfile upload and reference update. If the pushing client requests this capability, after unpacking and updating references the server -- 1.8.3.rc1.30.gff0fb75
[PATCH 2/4] docs: note that receive-pack knows side-band-64k capability
The protocol-capabilities documentation notes that any capabilities not explicitly mentioned for receive-pack work only for upload-pack. Receive-pack has advertised and understood side-band-64k since 38a81b4 (receive-pack: Wrap status reports inside side-band-64k, 2010-02-05), but we do not mention it explicitly. Let's do so. Note that receive-pack does not understand side-band, which was obsolete by that point. Signed-off-by: Jeff King p...@peff.net --- Documentation/technical/protocol-capabilities.txt | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt index 11467ff..9bc2a10 100644 --- a/Documentation/technical/protocol-capabilities.txt +++ b/Documentation/technical/protocol-capabilities.txt @@ -21,8 +21,8 @@ recognized by the receive-pack (push to server) process. The 'report-status' and 'delete-refs' capabilities are sent and recognized by the receive-pack (push to server) process. -The 'ofs-delta' capability is sent and recognized by both upload-pack -and receive-pack protocols. +The 'ofs-delta' and 'side-band-64k' capabilities are sent and recognized +by both upload-pack and receive-pack protocols. All other capabilities are only recognized by the upload-pack (fetch from server) process. -- 1.8.3.rc1.30.gff0fb75
[PATCH 4/4] document 'quiet' receive-pack capability
This was added in c207e34 (fix push --quiet: add 'quiet' capability to receive-pack, 2012-01-08) but never documented. Signed-off-by: Jeff King p...@peff.net --- Documentation/technical/protocol-capabilities.txt | 12 +++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt index d35159e..ec131b6 100644 --- a/Documentation/technical/protocol-capabilities.txt +++ b/Documentation/technical/protocol-capabilities.txt @@ -18,7 +18,7 @@ NOT advertise capabilities it does not understand. and server advertised. As a consequence of these rules, server MUST NOT advertise capabilities it does not understand. -The 'report-status' and 'delete-refs' capabilities are sent and +The 'report-status', 'delete-refs', and 'quiet' capabilities are sent and recognized by the receive-pack (push to server) process. The 'ofs-delta' and 'side-band-64k' capabilities are sent and recognized @@ -200,3 +200,13 @@ to delete references. value of a reference update. It is not sent back by the client, it simply informs the client that it can be sent zero-id values to delete references. + +quiet +- + +If the receive-pack server advertises the 'quiet' capability, it is +capable of silencing human-readable progress output which otherwise may +be shown when processing the received pack. A send-pack client should +respond with the 'quiet' capability to suppress server-side progress +reporting if the local progress reporting is also being suppressed +(e.g., via `push -q`, or if stderr does not go to a tty). -- 1.8.3.rc1.30.gff0fb75
[PATCH 3/4] document 'agent' protocol capability
This was added in ff5effd (include agent identifier in capability string, 2012-08-03), but neither the syntax nor the semantics were ever documented outside of the commit message. Signed-off-by: Jeff King p...@peff.net --- Documentation/technical/protocol-capabilities.txt | 17 - 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt index 9bc2a10..d35159e 100644 --- a/Documentation/technical/protocol-capabilities.txt +++ b/Documentation/technical/protocol-capabilities.txt @@ -22,7 +22,8 @@ The 'ofs-delta' and 'side-band-64k' capabilities are sent and recognized recognized by the receive-pack (push to server) process. The 'ofs-delta' and 'side-band-64k' capabilities are sent and recognized -by both upload-pack and receive-pack protocols. +by both upload-pack and receive-pack protocols. The 'agent' capability +may optionally be sent in both protocols. All other capabilities are only recognized by the upload-pack (fetch from server) process. @@ -123,6 +124,20 @@ send/read OBJ_OFS_DELTA (aka type 6) in a packfile. its base by position in pack rather than by an obj-id. That is, they can send/read OBJ_OFS_DELTA (aka type 6) in a packfile. +agent +- + +The server may optionally send a capability of the form `agent=X` to +notify the client that the server is running version `X`. The client may +optionally return its own agent string by responding with an `agent=Y` +capability (but it MUST NOT do so if the server did not mention the +agent capability). The `X` and `Y` strings may contain any printable +ASCII characters except space (i.e., the byte range 32 < x < 127), and +are typically of the form package/version (e.g., git/1.8.3.1). The +agent strings are purely informative for statistics and debugging +purposes, and MUST NOT be used to programmatically assume the presence +or absence of particular features.
+ shallow --- -- 1.8.3.rc1.30.gff0fb75
Re: [PATCH v8 3/4] tests: add new test for the url_normalize function
On Wed, Jul 24, 2013 at 12:01:26PM -0700, Kyle J. McKay wrote: Right now, the values are only available as various strings, ints, longs etc. which have to be formatted differently for output. The original string value they were converted from is gone. The snippet shown above only shows some of the %s formatters. Either the original value will have to be kept around or a reconstituted string depending on what: git config --file=foo --url http noepsv $URL should output. If the original value was 0 or 1, should it output that or false or true? The test-url-normalize code for -c normalizes the output to false or true for all boolean values and reconverts ints/longs to strings. I think it would be the responsibility of the caller to specify what they are looking for. I.e., add --bool to the git-config command line as appropriate. -Peff
[PATCH] commit.h: drop redundant comment
We mention twice that the from_ident field of struct pretty_print_context is internal. The first comment was added by 10f2fbf, which prepares the struct for internal fields, and then the second by a908047, which actually adds such a field. This was a mistake made when re-rolling the series on the list; the comment should have been removed from the latter commit. Signed-off-by: Jeff King p...@peff.net --- Not that important, really. I only bothered tracking down the source of the error because I was curious if it was a mis-merge or something. But nope, just me screwing up, and missing it in review. commit.h | 2 -- 1 file changed, 2 deletions(-) diff --git a/commit.h b/commit.h index 35cc4e2..d912a9d 100644 --- a/commit.h +++ b/commit.h @@ -102,8 +102,6 @@ struct pretty_print_context { * Fields below here are manipulated internally by pp_* functions and * should not be counted on by callers. */ - - /* Manipulated by the pp_* functions internally. */ struct string_list in_body_headers; }; -- 1.8.3.rc1.30.gff0fb75
Re: [RFC] Faster git grep.
On Thu, Jul 25, 2013 at 08:29:05PM +0200, Ondřej Bílka wrote: One solution would be to use the same trick as was done in Google Code Search: build and keep a database of trigrams and which files contain how many of them. When a query is made, check only those files that have the appropriate combination of trigrams. That seems like a sensible approach. Updating the database would be relatively inexpensive compared to the disk access we need to do anyway. Yes, I think it can be quite cheap, since you would only need to re-index files that have changed (and git is very quick at telling you what those files are). Space usage might be a problem, which is why I decided to go the option route. Comments, pointers? I think it is a good idea, but it does not need to be part of core git. It seems more like you would want to glue together an existing code-indexing solution (like codesearch) with git (which would provide the list of files to index and to search). If that proves useful in practice, but the interface is clunky for whatever reason, then a good follow-on project could be to build support for updating and using the index via the usual git grep. -Peff
Re: [REQUEST 1/1] docs: update http.url.* options documentation
On Thu, Jul 25, 2013 at 03:39:13PM -0700, Kyle J. McKay wrote: Overhaul the text of the http.url.* options documentation providing a hopefully easier to understand itemized list of matching behavior as suggested by and including text from Jeff King. --- Signed-off-by: Jeff King p...@peff.net You should add your S-O-B, too, for your bits. +-- +. Scheme (e.g., `https` in `https://example.com/`). This field + must match exactly between the config key and the URL. +. Host/domain name (e.g., `example.com` in `https://example.com/`). + This field must match exactly between the config key and the URL. These look fine in the rendered manpage, but I think the source might be a little easier to read with a blank line between items. +. Port number (e.g., `8080` in `http://example.com:8080/`). + This field must match exactly between the config key and the URL. + Omitted port numbers are automatically converted to the correct + default for the scheme before matching. Thanks, I forgot to include port number in my original suggested text. +. Exact user name match (e.g., `user` in `https://u...@example.com/repo.git`). + If the config key has a user name it must match the user name in the URL + exactly. +. Any user name match. If a config key does not have a user name, that config + key will match a URL with any user name (including none). IMHO, this would be more clear as a single item, like: . User name (e.g., `user` in `https://u...@example.com/repo.git`). If the config key has a user name it must match the user name in the URL exactly. If the config key does not have a user name, that config key will match a URL with any user name (including none). +All URLs are normalized before attempting any matching (the password part, +if embedded in the URL, is always ignored for matching purposes) so that +equivalent urls that are simply spelled differently will match properly. And this nicely ties up the open questions I had after re-reading the list. Good. 
We could define "equivalent" here (the %-encoding thing is obvious, I think, but which components are case-sensitive and which are not is perhaps a bit subtle). I do not necessarily know that it is useful to the user, though, and is one more thing to confuse them. And somebody who really cares can look at the appropriate RFCs. So this patch looks pretty good to me; the two points I raised above are how I would have done it, but they are relatively minor if you do not agree. -Peff
Re: [PATCH] branch: make sure the upstream remote is configured
On Fri, Jul 26, 2013 at 07:39:37PM +0200, Carlos Martín Nieto wrote: A command of e.g. git push --set-upstream /tmp/t master will call install_branch_config() with a remote name of /tmp/t. This function will set the 'branch.master.remote' key to '/tmp/t', which is nonsensical as there is no remote by that name. Is it nonsensical? It does not make sense for the @{upstream} magic token, because we will not have a tracking branch in refs/remotes to point to. But the configuration would still affect how git pull chooses a branch to fetch and merge. I.e., you can currently do: git push --set-upstream /tmp/t master git pull ;# pulls from /tmp/t master -Peff