Re: [PATCH v2 2/2] grep: use '/' delimiter for paths

2017-02-14 Thread Stefan Hajnoczi
On Thu, Feb 09, 2017 at 12:20:34AM -0500, Jeff King wrote:
> On Wed, Feb 08, 2017 at 09:14:17PM -0800, Junio C Hamano wrote:
> > Jeff King  writes:
> (I _do_ think Stefan's proposed direction is worth it simply because the
> result is easier to read, but I agree the whole thing can be avoided by
> using pathspecs, as you've noted).

I won't be pushing this series further due to limited time.

Stefan


signature.asc
Description: PGP signature


Re: [PATCH v2 2/2] grep: use '/' delimiter for paths

2017-02-08 Thread Jeff King
On Wed, Feb 08, 2017 at 09:14:17PM -0800, Junio C Hamano wrote:

> Jeff King  writes:
> 
> >   master:a:a:a:a:a:a:a:a:a:a:a
> >
> > I think there are 2^(n-1) possible paths (each colon can be a real colon
> > or a slash). Though I guess if you walk the trees as you go, you only
> > have to examine at most "n" paths to find the first-level tree, and then
> > at most "n-1" paths at the second level, and so on.
> >
> > Unless you really do have ambiguous trees, in which case you have to
> > walk down multiple paths.
> >
> > It certainly would not be the first combinatoric explosion you can
> > convince Git to perform. But it does seem like a lot of complication for
> > something as simple as path lookups.
> 
> That is true, and we may want to avoid the implementation complexity
> of the backtracking name resolution.  If you are on the other hand
> worried about the runtime cost, it will be an issue to begin with
> only for those who do "git grep -e pattern HEAD:t/perf", which is an
> unnatural way to do "git grep -e pattern HEAD -- t/perf", and the
> output from the latter won't have such an issue, so...

I thought your point was to move it into the get_sha1() parser (so that
while the form is only generated by "git grep", it can be accepted by
any git command). That exposes it in a lot of places, including ones
which are network accessible to things like gitweb (or GitHub, of
course, which is my concern).

Even without the runtime cost, though, I think the general complexity
makes it an ugly path to go down (e.g., handling ambiguous cases). I
wouldn't want to have to write the documentation for it. :)

(I _do_ think Stefan's proposed direction is worth it simply because the
result is easier to read, but I agree the whole thing can be avoided by
using pathspecs, as you've noted).

-Peff


Re: [PATCH v2 2/2] grep: use '/' delimiter for paths

2017-02-08 Thread Junio C Hamano
Jeff King  writes:

>   master:a:a:a:a:a:a:a:a:a:a:a
>
> I think there are 2^(n-1) possible paths (each colon can be a real colon
> or a slash). Though I guess if you walk the trees as you go, you only
> have to examine at most "n" paths to find the first-level tree, and then
> at most "n-1" paths at the second level, and so on.
>
> Unless you really do have ambiguous trees, in which case you have to
> walk down multiple paths.
>
> It certainly would not be the first combinatoric explosion you can
> convince Git to perform. But it does seem like a lot of complication for
> something as simple as path lookups.

That is true, and we may want to avoid the implementation complexity
of the backtracking name resolution.  If you are on the other hand
worried about the runtime cost, it will be an issue to begin with
only for those who do "git grep -e pattern HEAD:t/perf", which is an
unnatural way to do "git grep -e pattern HEAD -- t/perf", and the
output from the latter won't have such an issue, so...




Re: [PATCH v2 2/2] grep: use '/' delimiter for paths

2017-02-08 Thread Jeff King
On Tue, Feb 07, 2017 at 12:24:30PM -0800, Junio C Hamano wrote:

> Having said that, I actually think "make it more convenient" without
> making anything incorrect would be to teach the revision parser to
> understand
> 
> 
> 
> as an extended SHA-1 expression to name the blob or the tree at that
> path in the tree-ish, e.g. if we can make the revision parser to
> take this
> 
> master:Documentation:git.txt

So here I was wondering what happens when you have a filename with a
colon in it, but then...

> to be able to show the same thing as well.  You'd need to backtrack
> the parsing (e.g. attempt to find "Documentation:git.txt" in
> "master", fail to find any, then fall back to find "git.txt" in
> "master:Documentation", find one, and be happy, or something like
> that), and define how to resolve potential ambiguity (e.g. there may
> indeed be "Documentation:git.txt" and "Documentation/git.txt" in the
> tree-ish "master"), though.

...you obviously did think of that. Backtracking sounds pretty nasty,
though. What's the time complexity of parsing:

  master:a:a:a:a:a:a:a:a:a:a:a

I think there are 2^(n-1) possible paths (each colon can be a real colon
or a slash). Though I guess if you walk the trees as you go, you only
have to examine at most "n" paths to find the first-level tree, and then
at most "n-1" paths at the second level, and so on.

Unless you really do have ambiguous trees, in which case you have to
walk down multiple paths.

It certainly would not be the first combinatoric explosion you can
convince Git to perform. But it does seem like a lot of complication for
something as simple as path lookups.

-Peff


Re: [PATCH v2 2/2] grep: use '/' delimiter for paths

2017-02-07 Thread Junio C Hamano
Junio C Hamano  writes:

Sorry, one shouldn't type while being sick and in bed X-<.

> I am not sure if you are shooting for is "work correctly" to begin
> with, to be honest.  The current code always shows the "correct"
> output which is "the tree-ish object name (expressed in a way easier
> to understand by the humans), followed by a colon, followed by the
> path in the tree-ish the hit lies".  You are making it "incorrect
> but often more convenient", and sometimes that is a worth goal, but

s/worth/&y/;

> for the particular use cases you presented, i.e.
>
> $ git grep -e "$pattern" "$commit:path"
>
> a more natural way to express "I want to find this pattern in the
> commit under that path" exists:
>
> $ git grep -e "$pattern" "$commit" -- path
>
> and because of that, I do not think the former form of the query

s/do not think/do think/

> should happen _less_ often in the first place, which would make it
> "incorrect but more convenient if the user gives an unusual query".
>
> So I am not sure if the change to "grep" is worth it.

Also, it may be fairer to do s/incorrect/inconsistent/.


Re: [PATCH v2 2/2] grep: use '/' delimiter for paths

2017-02-07 Thread Junio C Hamano
Stefan Hajnoczi  writes:

> Perhaps it's better to leave this than to merge code that doesn't work
> correctly 100% of the time.

I am not sure if you are shooting for is "work correctly" to begin
with, to be honest.  The current code always shows the "correct"
output which is "the tree-ish object name (expressed in a way easier
to understand by the humans), followed by a colon, followed by the
path in the tree-ish the hit lies".  You are making it "incorrect
but often more convenient", and sometimes that is a worth goal, but
for the particular use cases you presented, i.e.

$ git grep -e "$pattern" "$commit:path"

a more natural way to express "I want to find this pattern in the
commit under that path" exists:

$ git grep -e "$pattern" "$commit" -- path

and because of that, I do not think the former form of the query
should happen _less_ often in the first place, which would make it
"incorrect but more convenient if the user gives an unusual query".

So I am not sure if the change to "grep" is worth it.

Having said that, I actually think "make it more convenient" without
making anything incorrect would be to teach the revision parser to
understand



as an extended SHA-1 expression to name the blob or the tree at that
path in the tree-ish, e.g. if we can make the revision parser to
take this

master:Documentation:git.txt

as the name of the blob object, then the current output is both
correct and more convenient.  After all, this sample string starts
at "master:Documentation" (which is an extended SHA-1 expression to
name a tree-ish), followed by a colon, then followed by the path
"git.txt" in it, and "grep -e pattern master:Documentation" would
show hits in that blob prefixed with it.

I.e.

T=$(git rev-parse master:Documentation)
git cat-file blob $T:git.txt

would give you the contents of the source to the Git manual.  It is
not all that unreasonable to expect

git cat-file blob master:Documentation:git.txt

to be able to show the same thing as well.  You'd need to backtrack
the parsing (e.g. attempt to find "Documentation:git.txt" in
"master", fail to find any, then fall back to find "git.txt" in
"master:Documentation", find one, and be happy, or something like
that), and define how to resolve potential ambiguity (e.g. there may
indeed be "Documentation:git.txt" and "Documentation/git.txt" in the
tree-ish "master"), though.




Re: [PATCH v2 2/2] grep: use '/' delimiter for paths

2017-02-07 Thread Jeff King
On Tue, Feb 07, 2017 at 03:04:14PM +, Stefan Hajnoczi wrote:

> > I assume Stefan just grabbed my naive suggestion hence why it checks
> > equality with a commit.  So that's my fault :)  Either of these may
> > not be enough though, since if you do 'git grep malloc v2.9.3^{tree}'
> > with this change the output prefix is 'v2.9.3^{tree}/' instead of the
> > correct prefix 'v2.9.3^{tree}:'
> 
> I revisited this series again today and am coming to the conclusion that
> forming output based on the user's rev is really hard to get right in
> all cases.  I don't have a good solution to the v2.9.3^{tree} problem.

I think the rule you need is not "are we at a tree", but rather "did we
traverse a path while resolving the name?". Only the get_sha1() parser
can tell you that. I think:

  char delim = ':';
  struct object_context oc;
  if (get_sha1_with_context(name, 0, sha1, &oc))
  die("...");
  if (oc.path[0])
  delim = '/'; /* name had a partial path */

would work. Root trees via "v2.9.3^{tree}" or "v2.9.3:" would have no
path, but "v2.9.3:Documentation" would. I think you'd still need to
avoid duplicating a trailing delimiter, but I can't think of a case
where it is wrong to do that based purely on the name.

-Peff


Re: [PATCH v2 2/2] grep: use '/' delimiter for paths

2017-02-07 Thread Stefan Hajnoczi
On Fri, Jan 20, 2017 at 03:51:33PM -0800, Brandon Williams wrote:
> On 01/20, Junio C Hamano wrote:
> > Stefan Hajnoczi  writes:
> > 
> > > If the tree contains a sub-directory then git-grep(1) output contains a
> > > colon character instead of a path separator:
> > >
> > >   $ git grep malloc v2.9.3:t
> > >   v2.9.3:t:test-lib.sh:   setup_malloc_check () {
> > >   $ git show v2.9.3:t:test-lib.sh
> > >   fatal: Path 't:test-lib.sh' does not exist in 'v2.9.3'
> > >
> > > This patch attempts to use the correct delimiter:
> > >
> > >   $ git grep malloc v2.9.3:t
> > >   v2.9.3:t/test-lib.sh:   setup_malloc_check () {
> > >   $ git show v2.9.3:t/test-lib.sh
> > >   (success)
> > >
> > > Signed-off-by: Stefan Hajnoczi 
> > > ---
> > >  builtin/grep.c  | 4 +++-
> > >  t/t7810-grep.sh | 5 +
> > >  2 files changed, 8 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/builtin/grep.c b/builtin/grep.c
> > > index 90a4f3d..7a7aab9 100644
> > > --- a/builtin/grep.c
> > > +++ b/builtin/grep.c
> > > @@ -494,7 +494,9 @@ static int grep_object(struct grep_opt *opt, const 
> > > struct pathspec *pathspec,
> > >  
> > >   /* Add a delimiter if there isn't one already */
> > >   if (name[len - 1] != '/' && name[len - 1] != ':') {
> > > - strbuf_addch(&base, ':');
> > > + /* rev: or rev:path/ */
> > > + char delim = obj->type == OBJ_COMMIT ? ':' : 
> > > '/';
> > 
> > Why check the equality with commit, rather than un-equality with
> > tree?  Wouldn't you want to treat $commit:path and $tag:path the
> > same way?
> 
> I assume Stefan just grabbed my naive suggestion hence why it checks
> equality with a commit.  So that's my fault :)  Either of these may
> not be enough though, since if you do 'git grep malloc v2.9.3^{tree}'
> with this change the output prefix is 'v2.9.3^{tree}/' instead of the
> correct prefix 'v2.9.3^{tree}:'

I revisited this series again today and am coming to the conclusion that
forming output based on the user's rev is really hard to get right in
all cases.  I don't have a good solution to the v2.9.3^{tree} problem.

Perhaps it's better to leave this than to merge code that doesn't work
correctly 100% of the time.

Stefan


signature.asc
Description: PGP signature


Re: [PATCH v2 2/2] grep: use '/' delimiter for paths

2017-01-20 Thread Brandon Williams
On 01/20, Junio C Hamano wrote:
> Stefan Hajnoczi  writes:
> 
> > If the tree contains a sub-directory then git-grep(1) output contains a
> > colon character instead of a path separator:
> >
> >   $ git grep malloc v2.9.3:t
> >   v2.9.3:t:test-lib.sh: setup_malloc_check () {
> >   $ git show v2.9.3:t:test-lib.sh
> >   fatal: Path 't:test-lib.sh' does not exist in 'v2.9.3'
> >
> > This patch attempts to use the correct delimiter:
> >
> >   $ git grep malloc v2.9.3:t
> >   v2.9.3:t/test-lib.sh: setup_malloc_check () {
> >   $ git show v2.9.3:t/test-lib.sh
> >   (success)
> >
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> >  builtin/grep.c  | 4 +++-
> >  t/t7810-grep.sh | 5 +
> >  2 files changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/builtin/grep.c b/builtin/grep.c
> > index 90a4f3d..7a7aab9 100644
> > --- a/builtin/grep.c
> > +++ b/builtin/grep.c
> > @@ -494,7 +494,9 @@ static int grep_object(struct grep_opt *opt, const 
> > struct pathspec *pathspec,
> >  
> > /* Add a delimiter if there isn't one already */
> > if (name[len - 1] != '/' && name[len - 1] != ':') {
> > -   strbuf_addch(&base, ':');
> > +   /* rev: or rev:path/ */
> > +   char delim = obj->type == OBJ_COMMIT ? ':' : 
> > '/';
> 
> Why check the equality with commit, rather than un-equality with
> tree?  Wouldn't you want to treat $commit:path and $tag:path the
> same way?

I assume Stefan just grabbed my naive suggestion hence why it checks
equality with a commit.  So that's my fault :)  Either of these may
not be enough though, since if you do 'git grep malloc v2.9.3^{tree}'
with this change the output prefix is 'v2.9.3^{tree}/' instead of the
correct prefix 'v2.9.3^{tree}:'

> 
> I also think the two patches should be squashed together, as it is
> only after this patch that the code does the "right thing" by
> changing the delimiter depending on obj->type.
> 
> > +   strbuf_addch(&base, delim);
> > }
> > }
> > init_tree_desc(&tree, data, size);
> > diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
> > index e804a3f..8a58d5e 100755
> > --- a/t/t7810-grep.sh
> > +++ b/t/t7810-grep.sh
> > @@ -1445,6 +1445,11 @@ test_expect_success 'grep outputs valid : 
> > for HEAD:t/' '
> > test_cmp expected actual
> >  '
> >  
> > +test_expect_success 'grep outputs valid : for HEAD:t' '
> > +   git grep vvv HEAD:t >actual &&
> > +   test_cmp expected actual
> > +'
> > +
> 
> Perhaps you want an annotated tag object refs/tags/v1.0 that points
> at the commit at the HEAD, and then run the same grep on v1.0:t, in
> addition to the above test.
> 
> >  cat >expected < >  HEAD:t/a/v:vvv
> >  HEAD:t/v:vvv

-- 
Brandon Williams


Re: [PATCH v2 2/2] grep: use '/' delimiter for paths

2017-01-20 Thread Junio C Hamano
Stefan Hajnoczi  writes:

> If the tree contains a sub-directory then git-grep(1) output contains a
> colon character instead of a path separator:
>
>   $ git grep malloc v2.9.3:t
>   v2.9.3:t:test-lib.sh:   setup_malloc_check () {
>   $ git show v2.9.3:t:test-lib.sh
>   fatal: Path 't:test-lib.sh' does not exist in 'v2.9.3'
>
> This patch attempts to use the correct delimiter:
>
>   $ git grep malloc v2.9.3:t
>   v2.9.3:t/test-lib.sh:   setup_malloc_check () {
>   $ git show v2.9.3:t/test-lib.sh
>   (success)
>
> Signed-off-by: Stefan Hajnoczi 
> ---
>  builtin/grep.c  | 4 +++-
>  t/t7810-grep.sh | 5 +
>  2 files changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/builtin/grep.c b/builtin/grep.c
> index 90a4f3d..7a7aab9 100644
> --- a/builtin/grep.c
> +++ b/builtin/grep.c
> @@ -494,7 +494,9 @@ static int grep_object(struct grep_opt *opt, const struct 
> pathspec *pathspec,
>  
>   /* Add a delimiter if there isn't one already */
>   if (name[len - 1] != '/' && name[len - 1] != ':') {
> - strbuf_addch(&base, ':');
> + /* rev: or rev:path/ */
> + char delim = obj->type == OBJ_COMMIT ? ':' : 
> '/';

Why check the equality with commit, rather than un-equality with
tree?  Wouldn't you want to treat $commit:path and $tag:path the
same way?

I also think the two patches should be squashed together, as it is
only after this patch that the code does the "right thing" by
changing the delimiter depending on obj->type.

> + strbuf_addch(&base, delim);
>   }
>   }
>   init_tree_desc(&tree, data, size);
> diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
> index e804a3f..8a58d5e 100755
> --- a/t/t7810-grep.sh
> +++ b/t/t7810-grep.sh
> @@ -1445,6 +1445,11 @@ test_expect_success 'grep outputs valid : 
> for HEAD:t/' '
>   test_cmp expected actual
>  '
>  
> +test_expect_success 'grep outputs valid : for HEAD:t' '
> + git grep vvv HEAD:t >actual &&
> + test_cmp expected actual
> +'
> +

Perhaps you want an annotated tag object refs/tags/v1.0 that points
at the commit at the HEAD, and then run the same grep on v1.0:t, in
addition to the above test.

>  cat >expected <  HEAD:t/a/v:vvv
>  HEAD:t/v:vvv