[PATCH v2 0/2] Re: 'git grep needle rev' attempts to access 'rev:.../.gitattributes' in the worktree

2012-10-10 Thread Nguyễn Thái Ngọc Duy
Round too. .gitattributes should be respected in all cases except blob
grepping. As Johannes pointed elsewhere in this thread, if "git
diff  " looks up worktree's .gitattributes (and none in
rev1/rev2), there's no reason "git grep" should behave differently.

Nguyễn Thái Ngọc Duy (2):
  quote: let caller reset buffer for quote_path_relative()
  grep: stop looking at random places for .gitattributes

 builtin/clean.c|  2 ++
 builtin/grep.c | 24 +---
 builtin/ls-files.c |  1 +
 grep.c | 11 ---
 grep.h |  4 +++-
 quote.c|  1 -
 t/t7008-grep-binary.sh | 22 ++
 wt-status.c|  1 +
 8 files changed, 50 insertions(+), 16 deletions(-)

-- 
1.7.12.1.406.g6ab07c4

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 'git grep needle rev' attempts to access 'rev:.../.gitattributes' in the worktree

2012-10-10 Thread Nguyễn Thái Ngọc Duy
On Wed, Oct 10, 2012 at 1:59 AM, Junio C Hamano  wrote:
> Jeff King  writes:
>
>> I think we just need to have callers of grep_source_init provide us with
>> the actual pathname (or NULL, in the case of GREP_SOURCE_BUF). That is
>> where the information is lost.
>
> Yes.  I agree that is the right approach.

Here we go. No additional tests as I don't know how to write them. But
strace shows it's ok.

Currently we don't have many options to optimize attr access. We
cannot prepare a attr stack in advance for a specific directory, then
check attributes many times using the stack. We cannot prepare attr
stack from a tree object. As a result a normal "grep pattern --"
results in a lot of duplicate failed open(".../.gitattributes") calls.

Jeff King (1):
  grep: pass true path name to grep machinery

Nguyễn Thái Ngọc Duy (2):
  quote: let caller reset buffer for quote_path_relative()
  grep: stop looking at random places for .gitattributes

 Documentation/git-grep.txt |  7 +--
 builtin/clean.c|  2 ++
 builtin/grep.c | 19 ---
 builtin/ls-files.c |  1 +
 grep.c | 11 ---
 grep.h |  4 +++-
 quote.c|  1 -
 wt-status.c|  1 +
 8 files changed, 32 insertions(+), 14 deletions(-)

-- 
1.7.12.1.406.g6ab07c4

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 'git grep needle rev' attempts to access 'rev:.../.gitattributes' in the worktree

2012-10-09 Thread Nguyen Thai Ngoc Duy
On Wed, Oct 10, 2012 at 12:33 PM, Junio C Hamano  wrote:
> Nguyen Thai Ngoc Duy  writes:
>
>> On Wed, Oct 10, 2012 at 1:59 AM, Junio C Hamano  wrote:
>>> Jeff King  writes:
>>>
 I think we just need to have callers of grep_source_init provide us with
 the actual pathname (or NULL, in the case of GREP_SOURCE_BUF). That is
 where the information is lost.
>>>
>>> Yes.  I agree that is the right approach.
>>
>> Passing full path this way means prepare_attr_stack has to walk back
>> to top directory for every files (even in the same directory). If
>> .gitattributes are loaded while git-grep traverses the tree, then it
>> can preload attr once per directory. But Jeff's approach looks
>> simpler.
>
> Why can't you do both?  That is, to build a full path as you
> descend, and read per-directory .gitattributes as you go?

Hm... I need to check attr.c code but I think it means read
.gitattributes and prepare attr stack in builtin/grep.c (where we
traverse trees) and actually check the attribute in
grep_source_load_driver(), much further down the call stack. I'm not
sure how we can pass the prepared attr stack down to
grep_source_load_driver().
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 'git grep needle rev' attempts to access 'rev:.../.gitattributes' in the worktree

2012-10-09 Thread Junio C Hamano
Nguyen Thai Ngoc Duy  writes:

> On Wed, Oct 10, 2012 at 1:59 AM, Junio C Hamano  wrote:
>> Jeff King  writes:
>>
>>> I think we just need to have callers of grep_source_init provide us with
>>> the actual pathname (or NULL, in the case of GREP_SOURCE_BUF). That is
>>> where the information is lost.
>>
>> Yes.  I agree that is the right approach.
>
> Passing full path this way means prepare_attr_stack has to walk back
> to top directory for every files (even in the same directory). If
> .gitattributes are loaded while git-grep traverses the tree, then it
> can preload attr once per directory. But Jeff's approach looks
> simpler.

Why can't you do both?  That is, to build a full path as you
descend, and read per-directory .gitattributes as you go?

Confused...

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 'git grep needle rev' attempts to access 'rev:.../.gitattributes' in the worktree

2012-10-09 Thread Nguyen Thai Ngoc Duy
On Wed, Oct 10, 2012 at 1:59 AM, Junio C Hamano  wrote:
> Jeff King  writes:
>
>> I think we just need to have callers of grep_source_init provide us with
>> the actual pathname (or NULL, in the case of GREP_SOURCE_BUF). That is
>> where the information is lost.
>
> Yes.  I agree that is the right approach.

Passing full path this way means prepare_attr_stack has to walk back
to top directory for every files (even in the same directory). If
.gitattributes are loaded while git-grep traverses the tree, then it
can preload attr once per directory. But Jeff's approach looks
simpler.
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 'git grep needle rev' attempts to access 'rev:.../.gitattributes' in the worktree

2012-10-09 Thread Junio C Hamano
Jeff King  writes:

> I think we just need to have callers of grep_source_init provide us with
> the actual pathname (or NULL, in the case of GREP_SOURCE_BUF). That is
> where the information is lost.

Yes.  I agree that is the right approach.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 'git grep needle rev' attempts to access 'rev:.../.gitattributes' in the worktree

2012-10-09 Thread Jeff King
On Tue, Oct 09, 2012 at 07:01:44PM +0700, Nguyen Thai Ngoc Duy wrote:

> On Tue, Oct 09, 2012 at 04:38:32PM +0700, Nguyen Thai Ngoc Duy wrote:
> > #5  0x0815e736 in userdiff_find_by_path (path=0x820e7f0
> > "HEAD:Documentation/.gitattributes") at userdiff.c:278
> > #6  0x081058ca in grep_source_load_driver (gs=0xbfffd978) at grep.c:1504
> 
> A bandage patch may look like this. But it does not solve the real
> problem:
> 
>  - we should be able to look up in-tree .gitattributes, not just
>ignore like this patch does
> 
>  - gs->name seems to be prepared for human consumption, not for
>accessing files. grep_file() with opt->relative is true can put
>quotes in gs->name, for example.

Right. For the second, you would probably want gs->identifier in the
case of GREP_SOURCE_FILE. But that identifier information is not
available at all for GREP_SOURCE_SHA1, which is what is breaking the
first point.

> I feel like we should make this function a callback and let git-grep
> deal with driver loading itself.

I think we just need to have callers of grep_source_init provide us with
the actual pathname (or NULL, in the case of GREP_SOURCE_BUF). That is
where the information is lost.

Like this incomplete and untested patch, which should fix the quoting
problem for GREP_SOURCE_FILE, but leave the sha1 bits broken (see the
in-code comment). I'm traveling this week, so I doubt I'll have time to
look for a few more days. If you want to work on it, please do.

diff --git a/builtin/grep.c b/builtin/grep.c
index 82530a6..be602dd 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -86,7 +86,7 @@ static void add_work(struct grep_opt *opt, enum 
grep_source_type type,
 static int skip_first_line;
 
 static void add_work(struct grep_opt *opt, enum grep_source_type type,
-const char *name, const void *id)
+const char *name, const char *path, const void *id)
 {
grep_lock();
 
@@ -94,7 +94,7 @@ static void add_work(struct grep_opt *opt, enum 
grep_source_type type,
pthread_cond_wait(&cond_write, &grep_mutex);
}
 
-   grep_source_init(&todo[todo_end].source, type, name, id);
+   grep_source_init(&todo[todo_end].source, type, name, path, id);
if (opt->binary != GREP_BINARY_TEXT)
grep_source_load_driver(&todo[todo_end].source);
todo[todo_end].done = 0;
@@ -378,14 +378,21 @@ static int grep_sha1(struct grep_opt *opt, const unsigned 
char *sha1,
if (opt->relative && opt->prefix_length) {
quote_path_relative(filename + tree_name_len, -1, &pathbuf,
opt->prefix);
+   /* XXX Why do we insert here instead of just putting it in
+* first? */
strbuf_insert(&pathbuf, 0, filename, tree_name_len);
} else {
strbuf_addstr(&pathbuf, filename);
}
 
+   /* XXX We seem to get all kinds of junk via the filename field here,
+* including partial filenames, sha1:path, etc. We could parse it
+* ourselves, but that is probably insanity. We should ask the
+* caller to break it down more for us. For now, just pass NULL. */
+
 #ifndef NO_PTHREADS
if (use_threads) {
-   add_work(opt, GREP_SOURCE_SHA1, pathbuf.buf, sha1);
+   add_work(opt, GREP_SOURCE_SHA1, pathbuf.buf, NULL, sha1);
strbuf_release(&pathbuf);
return 0;
} else
@@ -394,7 +401,7 @@ static int grep_sha1(struct grep_opt *opt, const unsigned 
char *sha1,
struct grep_source gs;
int hit;
 
-   grep_source_init(&gs, GREP_SOURCE_SHA1, pathbuf.buf, sha1);
+   grep_source_init(&gs, GREP_SOURCE_SHA1, pathbuf.buf, NULL, 
sha1);
strbuf_release(&pathbuf);
hit = grep_source(opt, &gs);
 
@@ -414,7 +421,7 @@ static int grep_file(struct grep_opt *opt, const char 
*filename)
 
 #ifndef NO_PTHREADS
if (use_threads) {
-   add_work(opt, GREP_SOURCE_FILE, buf.buf, filename);
+   add_work(opt, GREP_SOURCE_FILE, buf.buf, filename, filename);
strbuf_release(&buf);
return 0;
} else
@@ -423,7 +430,7 @@ static int grep_file(struct grep_opt *opt, const char 
*filename)
struct grep_source gs;
int hit;
 
-   grep_source_init(&gs, GREP_SOURCE_FILE, buf.buf, filename);
+   grep_source_init(&gs, GREP_SOURCE_FILE, buf.buf, filename, 
filename);
strbuf_release(&buf);
hit = grep_source(opt, &gs);
 
diff --git a/grep.c b/grep.c
index edc7776..06bc1c6 100644
--- a/grep.c
+++ b/grep.c
@@ -1373,7 +1373,7 @@ int grep_buffer(struct grep_opt *opt, char *buf, unsigned 
long size)
struct grep_source gs;
int r;
 
-   grep_source_init(&gs, GREP_SOURCE_BUF, NULL, NULL);
+   grep_source_init(&gs, GREP_SOURCE_BUF, NULL, NULL, NULL);

Re: 'git grep needle rev' attempts to access 'rev:.../.gitattributes' in the worktree

2012-10-09 Thread Nguyen Thai Ngoc Duy
On Tue, Oct 09, 2012 at 04:38:32PM +0700, Nguyen Thai Ngoc Duy wrote:
> #5  0x0815e736 in userdiff_find_by_path (path=0x820e7f0
> "HEAD:Documentation/.gitattributes") at userdiff.c:278
> #6  0x081058ca in grep_source_load_driver (gs=0xbfffd978) at grep.c:1504

A bandage patch may look like this. But it does not solve the real
problem:

 - we should be able to look up in-tree .gitattributes, not just
   ignore like this patch does

 - gs->name seems to be prepared for human consumption, not for
   accessing files. grep_file() with opt->relative is true can put
   quotes in gs->name, for example.

I feel like we should make this function a callback and let git-grep
deal with driver loading itself.

-- 8< --
diff --git a/grep.c b/grep.c
index edc7776..c5ed067 100644
--- a/grep.c
+++ b/grep.c
@@ -1501,7 +1501,8 @@ void grep_source_load_driver(struct grep_source *gs)
return;
 
grep_attr_lock();
-   gs->driver = userdiff_find_by_path(gs->name);
+   if (gs->type == GREP_SOURCE_FILE)
+   gs->driver = userdiff_find_by_path(gs->name);
if (!gs->driver)
gs->driver = userdiff_find_by_name("default");
grep_attr_unlock();
-- 8< --

-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 'git grep needle rev' attempts to access 'rev:.../.gitattributes' in the worktree

2012-10-09 Thread Nguyen Thai Ngoc Duy
On Tue, Oct 9, 2012 at 4:03 PM, Johannes Sixt  wrote:
> Running 'git grep needle origin/master' on Windows gives numerous warnings
> of the kind
>
> warning: unable to access 'origin/master:Documentation/.gitattributes':
> Invalid argument

strace confirms it. Stack trace

#0  read_attr_from_file (path=0x820e818
"HEAD:Documentation/.gitattributes", macro_ok=0) at attr.c:351
#1  0x080d378d in read_attr (path=0x820e818
"HEAD:Documentation/.gitattributes", macro_ok=0) at attr.c:436
#2  0x080d3bf1 in prepare_attr_stack (path=0x820e7f0
"HEAD:Documentation/.gitattributes") at attr.c:630
#3  0x080d3f68 in collect_all_attrs (path=0x820e7f0
"HEAD:Documentation/.gitattributes") at attr.c:747
#4  0x080d3ffd in git_check_attr (path=0x820e7f0
"HEAD:Documentation/.gitattributes", num=1, check=0xbfffd848) at
attr.c:761
#5  0x0815e736 in userdiff_find_by_path (path=0x820e7f0
"HEAD:Documentation/.gitattributes") at userdiff.c:278
#6  0x081058ca in grep_source_load_driver (gs=0xbfffd978) at grep.c:1504
#7  0x08105907 in grep_source_is_binary (gs=0xbfffd978) at grep.c:1512
#8  0x08104eaa in grep_source_1 (opt=0xbfffe304, gs=0xbfffd978,
collect_hits=0) at grep.c:1180

Never touched userdiff.c before. Anybody?
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


'git grep needle rev' attempts to access 'rev:.../.gitattributes' in the worktree

2012-10-09 Thread Johannes Sixt
Running 'git grep needle origin/master' on Windows gives numerous warnings
of the kind

warning: unable to access 'origin/master:Documentation/.gitattributes':
Invalid argument

It is worrysome that it is attempted to access a file whose name is
prefixed by a revision.

-- Hannes
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html