Re: [PATCH v2 2/2] grep: use '/' delimiter for paths
On Thu, Feb 09, 2017 at 12:20:34AM -0500, Jeff King wrote: > On Wed, Feb 08, 2017 at 09:14:17PM -0800, Junio C Hamano wrote: > > Jeff King writes: > (I _do_ think Stefan's proposed direction is worth it simply because the > result is easier to read, but I agree the whole thing can be avoided by > using pathspecs, as you've noted). I won't be pushing this series further due to limited time. Stefan signature.asc Description: PGP signature
Re: [PATCH v2 2/2] grep: use '/' delimiter for paths
On Wed, Feb 08, 2017 at 09:14:17PM -0800, Junio C Hamano wrote: > Jeff King writes: > > > master:a:a:a:a:a:a:a:a:a:a:a > > > > I think there are 2^(n-1) possible paths (each colon can be a real colon > > or a slash). Though I guess if you walk the trees as you go, you only > > have to examine at most "n" paths to find the first-level tree, and then > > at most "n-1" paths at the second level, and so on. > > > > Unless you really do have ambiguous trees, in which case you have to > > walk down multiple paths. > > > > It certainly would not be the first combinatoric explosion you can > > convince Git to perform. But it does seem like a lot of complication for > > something as simple as path lookups. > > That is true, and we may want to avoid the implementation complexity > of the backtracking name resolution. If you are on the other hand > worried about the runtime cost, it will be an issue to begin with > only for those who do "git grep -e pattern HEAD:t/perf", which is an > unnatural way to do "git grep -e pattern HEAD -- t/perf", and the > output from the latter won't have such an issue, so... I thought your point was to move it into the get_sha1() parser (so that while the form is only generated by "git grep", it can be accepted by any git command). That exposes it in a lot of places, including ones which are network accessible to things like gitweb (or GitHub, of course, which is my concern). Even without the runtime cost, though, I think the general complexity makes it an ugly path to go down (e.g., handling ambiguous cases). I wouldn't want to have to write the documentation for it. :) (I _do_ think Stefan's proposed direction is worth it simply because the result is easier to read, but I agree the whole thing can be avoided by using pathspecs, as you've noted). -Peff
Re: [PATCH v2 2/2] grep: use '/' delimiter for paths
Jeff King writes: > master:a:a:a:a:a:a:a:a:a:a:a > > I think there are 2^(n-1) possible paths (each colon can be a real colon > or a slash). Though I guess if you walk the trees as you go, you only > have to examine at most "n" paths to find the first-level tree, and then > at most "n-1" paths at the second level, and so on. > > Unless you really do have ambiguous trees, in which case you have to > walk down multiple paths. > > It certainly would not be the first combinatoric explosion you can > convince Git to perform. But it does seem like a lot of complication for > something as simple as path lookups. That is true, and we may want to avoid the implementation complexity of the backtracking name resolution. If you are on the other hand worried about the runtime cost, it will be an issue to begin with only for those who do "git grep -e pattern HEAD:t/perf", which is an unnatural way to do "git grep -e pattern HEAD -- t/perf", and the output from the latter won't have such an issue, so...
Re: [PATCH v2 2/2] grep: use '/' delimiter for paths
On Tue, Feb 07, 2017 at 12:24:30PM -0800, Junio C Hamano wrote: > Having said that, I actually think "make it more convenient" without > making anything incorrect would be to teach the revision parser to > understand > > > > as an extended SHA-1 expression to name the blob or the tree at that > path in the tree-ish, e.g. if we can make the revision parser to > take this > > master:Documentation:git.txt So here I was wondering what happens when you have a filename with a colon in it, but then... > to be able to show the same thing as well. You'd need to backtrack > the parsing (e.g. attempt to find "Documentation:git.txt" in > "master", fail to find any, then fall back to find "git.txt" in > "master:Documentation", find one, and be happy, or something like > that), and define how to resolve potential ambiguity (e.g. there may > indeed be "Documentation:git.txt" and "Documentation/git.txt" in the > tree-ish "master"), though. ...you obviously did think of that. Backtracking sounds pretty nasty, though. What's the time complexity of parsing: master:a:a:a:a:a:a:a:a:a:a:a I think there are 2^(n-1) possible paths (each colon can be a real colon or a slash). Though I guess if you walk the trees as you go, you only have to examine at most "n" paths to find the first-level tree, and then at most "n-1" paths at the second level, and so on. Unless you really do have ambiguous trees, in which case you have to walk down multiple paths. It certainly would not be the first combinatoric explosion you can convince Git to perform. But it does seem like a lot of complication for something as simple as path lookups. -Peff
Re: [PATCH v2 2/2] grep: use '/' delimiter for paths
Junio C Hamano writes: Sorry, one shouldn't type while being sick and in bed X-<. > I am not sure if you are shooting for is "work correctly" to begin > with, to be honest. The current code always shows the "correct" > output which is "the tree-ish object name (expressed in a way easier > to understand by the humans), followed by a colon, followed by the > path in the tree-ish the hit lies". You are making it "incorrect > but often more convenient", and sometimes that is a worth goal, but s/worth/&y/; > for the particular use cases you presented, i.e. > > $ git grep -e "$pattern" "$commit:path" > > a more natural way to express "I want to find this pattern in the > commit under that path" exists: > > $ git grep -e "$pattern" "$commit" -- path > > and because of that, I do not think the former form of the query s/do not think/do think/ > should happen _less_ often in the first place, which would make it > "incorrect but more convenient if the user gives an unusual query". > > So I am not sure if the change to "grep" is worth it. Also, it may be fairer to do s/incorrect/inconsistent/.
Re: [PATCH v2 2/2] grep: use '/' delimiter for paths
Stefan Hajnoczi writes: > Perhaps it's better to leave this than to merge code that doesn't work > correctly 100% of the time. I am not sure if you are shooting for is "work correctly" to begin with, to be honest. The current code always shows the "correct" output which is "the tree-ish object name (expressed in a way easier to understand by the humans), followed by a colon, followed by the path in the tree-ish the hit lies". You are making it "incorrect but often more convenient", and sometimes that is a worth goal, but for the particular use cases you presented, i.e. $ git grep -e "$pattern" "$commit:path" a more natural way to express "I want to find this pattern in the commit under that path" exists: $ git grep -e "$pattern" "$commit" -- path and because of that, I do not think the former form of the query should happen _less_ often in the first place, which would make it "incorrect but more convenient if the user gives an unusual query". So I am not sure if the change to "grep" is worth it. Having said that, I actually think "make it more convenient" without making anything incorrect would be to teach the revision parser to understand as an extended SHA-1 expression to name the blob or the tree at that path in the tree-ish, e.g. if we can make the revision parser to take this master:Documentation:git.txt as the name of the blob object, then the current output is both correct and more convenient. After all, this sample string starts at "master:Documentation" (which is an extended SHA-1 expression to name a tree-ish), followed by a colon, then followed by the path "git.txt" in it, and "grep -e pattern master:Documentation" would show hits in that blob prefixed with it. I.e. T=$(git rev-parse master:Documentation) git cat-file blob $T:git.txt would give you the contents of the source to the Git manual. It is not all that unreasonable to expect git cat-file blob master:Documentation:git.txt to be able to show the same thing as well. You'd need to backtrack the parsing (e.g. attempt to find "Documentation:git.txt" in "master", fail to find any, then fall back to find "git.txt" in "master:Documentation", find one, and be happy, or something like that), and define how to resolve potential ambiguity (e.g. there may indeed be "Documentation:git.txt" and "Documentation/git.txt" in the tree-ish "master"), though.
Re: [PATCH v2 2/2] grep: use '/' delimiter for paths
On Tue, Feb 07, 2017 at 03:04:14PM +, Stefan Hajnoczi wrote: > > I assume Stefan just grabbed my naive suggestion hence why it checks > > equality with a commit. So that's my fault :) Either of these may > > not be enough though, since if you do 'git grep malloc v2.9.3^{tree}' > > with this change the output prefix is 'v2.9.3^{tree}/' instead of the > > correct prefix 'v2.9.3^{tree}:' > > I revisited this series again today and am coming to the conclusion that > forming output based on the user's rev is really hard to get right in > all cases. I don't have a good solution to the v2.9.3^{tree} problem. I think the rule you need is not "are we at a tree", but rather "did we traverse a path while resolving the name?". Only the get_sha1() parser can tell you that. I think: char delim = ':'; struct object_context oc; if (get_sha1_with_context(name, 0, sha1, &oc)) die("..."); if (oc.path[0]) delim = '/'; /* name had a partial path */ would work. Root trees via "v2.9.3^{tree}" or "v2.9.3:" would have no path, but "v2.9.3:Documentation" would. I think you'd still need to avoid duplicating a trailing delimiter, but I can't think of a case where it is wrong to do that based purely on the name. -Peff
Re: [PATCH v2 2/2] grep: use '/' delimiter for paths
On Fri, Jan 20, 2017 at 03:51:33PM -0800, Brandon Williams wrote: > On 01/20, Junio C Hamano wrote: > > Stefan Hajnoczi writes: > > > > > If the tree contains a sub-directory then git-grep(1) output contains a > > > colon character instead of a path separator: > > > > > > $ git grep malloc v2.9.3:t > > > v2.9.3:t:test-lib.sh: setup_malloc_check () { > > > $ git show v2.9.3:t:test-lib.sh > > > fatal: Path 't:test-lib.sh' does not exist in 'v2.9.3' > > > > > > This patch attempts to use the correct delimiter: > > > > > > $ git grep malloc v2.9.3:t > > > v2.9.3:t/test-lib.sh: setup_malloc_check () { > > > $ git show v2.9.3:t/test-lib.sh > > > (success) > > > > > > Signed-off-by: Stefan Hajnoczi > > > --- > > > builtin/grep.c | 4 +++- > > > t/t7810-grep.sh | 5 + > > > 2 files changed, 8 insertions(+), 1 deletion(-) > > > > > > diff --git a/builtin/grep.c b/builtin/grep.c > > > index 90a4f3d..7a7aab9 100644 > > > --- a/builtin/grep.c > > > +++ b/builtin/grep.c > > > @@ -494,7 +494,9 @@ static int grep_object(struct grep_opt *opt, const > > > struct pathspec *pathspec, > > > > > > /* Add a delimiter if there isn't one already */ > > > if (name[len - 1] != '/' && name[len - 1] != ':') { > > > - strbuf_addch(&base, ':'); > > > + /* rev: or rev:path/ */ > > > + char delim = obj->type == OBJ_COMMIT ? ':' : > > > '/'; > > > > Why check the equality with commit, rather than un-equality with > > tree? Wouldn't you want to treat $commit:path and $tag:path the > > same way? > > I assume Stefan just grabbed my naive suggestion hence why it checks > equality with a commit. So that's my fault :) Either of these may > not be enough though, since if you do 'git grep malloc v2.9.3^{tree}' > with this change the output prefix is 'v2.9.3^{tree}/' instead of the > correct prefix 'v2.9.3^{tree}:' I revisited this series again today and am coming to the conclusion that forming output based on the user's rev is really hard to get right in all cases. I don't have a good solution to the v2.9.3^{tree} problem. Perhaps it's better to leave this than to merge code that doesn't work correctly 100% of the time. Stefan signature.asc Description: PGP signature
Re: [PATCH v2 2/2] grep: use '/' delimiter for paths
On 01/20, Junio C Hamano wrote: > Stefan Hajnoczi writes: > > > If the tree contains a sub-directory then git-grep(1) output contains a > > colon character instead of a path separator: > > > > $ git grep malloc v2.9.3:t > > v2.9.3:t:test-lib.sh: setup_malloc_check () { > > $ git show v2.9.3:t:test-lib.sh > > fatal: Path 't:test-lib.sh' does not exist in 'v2.9.3' > > > > This patch attempts to use the correct delimiter: > > > > $ git grep malloc v2.9.3:t > > v2.9.3:t/test-lib.sh: setup_malloc_check () { > > $ git show v2.9.3:t/test-lib.sh > > (success) > > > > Signed-off-by: Stefan Hajnoczi > > --- > > builtin/grep.c | 4 +++- > > t/t7810-grep.sh | 5 + > > 2 files changed, 8 insertions(+), 1 deletion(-) > > > > diff --git a/builtin/grep.c b/builtin/grep.c > > index 90a4f3d..7a7aab9 100644 > > --- a/builtin/grep.c > > +++ b/builtin/grep.c > > @@ -494,7 +494,9 @@ static int grep_object(struct grep_opt *opt, const > > struct pathspec *pathspec, > > > > /* Add a delimiter if there isn't one already */ > > if (name[len - 1] != '/' && name[len - 1] != ':') { > > - strbuf_addch(&base, ':'); > > + /* rev: or rev:path/ */ > > + char delim = obj->type == OBJ_COMMIT ? ':' : > > '/'; > > Why check the equality with commit, rather than un-equality with > tree? Wouldn't you want to treat $commit:path and $tag:path the > same way? I assume Stefan just grabbed my naive suggestion hence why it checks equality with a commit. So that's my fault :) Either of these may not be enough though, since if you do 'git grep malloc v2.9.3^{tree}' with this change the output prefix is 'v2.9.3^{tree}/' instead of the correct prefix 'v2.9.3^{tree}:' > > I also think the two patches should be squashed together, as it is > only after this patch that the code does the "right thing" by > changing the delimiter depending on obj->type. > > > + strbuf_addch(&base, delim); > > } > > } > > init_tree_desc(&tree, data, size); > > diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh > > index e804a3f..8a58d5e 100755 > > --- a/t/t7810-grep.sh > > +++ b/t/t7810-grep.sh > > @@ -1445,6 +1445,11 @@ test_expect_success 'grep outputs valid : > > for HEAD:t/' ' > > test_cmp expected actual > > ' > > > > +test_expect_success 'grep outputs valid : for HEAD:t' ' > > + git grep vvv HEAD:t >actual && > > + test_cmp expected actual > > +' > > + > > Perhaps you want an annotated tag object refs/tags/v1.0 that points > at the commit at the HEAD, and then run the same grep on v1.0:t, in > addition to the above test. > > > cat >expected < > HEAD:t/a/v:vvv > > HEAD:t/v:vvv -- Brandon Williams
Re: [PATCH v2 2/2] grep: use '/' delimiter for paths
Stefan Hajnoczi writes: > If the tree contains a sub-directory then git-grep(1) output contains a > colon character instead of a path separator: > > $ git grep malloc v2.9.3:t > v2.9.3:t:test-lib.sh: setup_malloc_check () { > $ git show v2.9.3:t:test-lib.sh > fatal: Path 't:test-lib.sh' does not exist in 'v2.9.3' > > This patch attempts to use the correct delimiter: > > $ git grep malloc v2.9.3:t > v2.9.3:t/test-lib.sh: setup_malloc_check () { > $ git show v2.9.3:t/test-lib.sh > (success) > > Signed-off-by: Stefan Hajnoczi > --- > builtin/grep.c | 4 +++- > t/t7810-grep.sh | 5 + > 2 files changed, 8 insertions(+), 1 deletion(-) > > diff --git a/builtin/grep.c b/builtin/grep.c > index 90a4f3d..7a7aab9 100644 > --- a/builtin/grep.c > +++ b/builtin/grep.c > @@ -494,7 +494,9 @@ static int grep_object(struct grep_opt *opt, const struct > pathspec *pathspec, > > /* Add a delimiter if there isn't one already */ > if (name[len - 1] != '/' && name[len - 1] != ':') { > - strbuf_addch(&base, ':'); > + /* rev: or rev:path/ */ > + char delim = obj->type == OBJ_COMMIT ? ':' : > '/'; Why check the equality with commit, rather than un-equality with tree? Wouldn't you want to treat $commit:path and $tag:path the same way? I also think the two patches should be squashed together, as it is only after this patch that the code does the "right thing" by changing the delimiter depending on obj->type. > + strbuf_addch(&base, delim); > } > } > init_tree_desc(&tree, data, size); > diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh > index e804a3f..8a58d5e 100755 > --- a/t/t7810-grep.sh > +++ b/t/t7810-grep.sh > @@ -1445,6 +1445,11 @@ test_expect_success 'grep outputs valid : > for HEAD:t/' ' > test_cmp expected actual > ' > > +test_expect_success 'grep outputs valid : for HEAD:t' ' > + git grep vvv HEAD:t >actual && > + test_cmp expected actual > +' > + Perhaps you want an annotated tag object refs/tags/v1.0 that points at the commit at the HEAD, and then run the same grep on v1.0:t, in addition to the above test. > cat >expected < HEAD:t/a/v:vvv > HEAD:t/v:vvv