[PATCH v2 0/2] Re: 'git grep needle rev' attempts to access 'rev:.../.gitattributes' in the worktree
Round too. .gitattributes should be respected in all cases except blob grepping. As Johannes pointed elsewhere in this thread, if "git diff " looks up worktree's .gitattributes (and none in rev1/rev2), there's no reason "git grep" should behave differently. Nguyễn Thái Ngọc Duy (2): quote: let caller reset buffer for quote_path_relative() grep: stop looking at random places for .gitattributes builtin/clean.c| 2 ++ builtin/grep.c | 24 +--- builtin/ls-files.c | 1 + grep.c | 11 --- grep.h | 4 +++- quote.c| 1 - t/t7008-grep-binary.sh | 22 ++ wt-status.c| 1 + 8 files changed, 50 insertions(+), 16 deletions(-) -- 1.7.12.1.406.g6ab07c4 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 'git grep needle rev' attempts to access 'rev:.../.gitattributes' in the worktree
On Wed, Oct 10, 2012 at 1:59 AM, Junio C Hamano wrote: > Jeff King writes: > >> I think we just need to have callers of grep_source_init provide us with >> the actual pathname (or NULL, in the case of GREP_SOURCE_BUF). That is >> where the information is lost. > > Yes. I agree that is the right approach. Here we go. No additional tests as I don't know how to write them. But strace shows it's ok. Currently we don't have many options to optimize attr access. We cannot prepare a attr stack in advance for a specific directory, then check attributes many times using the stack. We cannot prepare attr stack from a tree object. As a result a normal "grep pattern --" results in a lot of duplicate failed open(".../.gitattributes") calls. Jeff King (1): grep: pass true path name to grep machinery Nguyễn Thái Ngọc Duy (2): quote: let caller reset buffer for quote_path_relative() grep: stop looking at random places for .gitattributes Documentation/git-grep.txt | 7 +-- builtin/clean.c| 2 ++ builtin/grep.c | 19 --- builtin/ls-files.c | 1 + grep.c | 11 --- grep.h | 4 +++- quote.c| 1 - wt-status.c| 1 + 8 files changed, 32 insertions(+), 14 deletions(-) -- 1.7.12.1.406.g6ab07c4 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 'git grep needle rev' attempts to access 'rev:.../.gitattributes' in the worktree
On Wed, Oct 10, 2012 at 12:33 PM, Junio C Hamano wrote: > Nguyen Thai Ngoc Duy writes: > >> On Wed, Oct 10, 2012 at 1:59 AM, Junio C Hamano wrote: >>> Jeff King writes: >>> I think we just need to have callers of grep_source_init provide us with the actual pathname (or NULL, in the case of GREP_SOURCE_BUF). That is where the information is lost. >>> >>> Yes. I agree that is the right approach. >> >> Passing full path this way means prepare_attr_stack has to walk back >> to top directory for every files (even in the same directory). If >> .gitattributes are loaded while git-grep traverses the tree, then it >> can preload attr once per directory. But Jeff's approach looks >> simpler. > > Why can't you do both? That is, to build a full path as you > descend, and read per-directory .gitattributes as you go? Hm... I need to check attr.c code but I think it means read .gitattributes and prepare attr stack in builtin/grep.c (where we traverse trees) and actually check the attribute in grep_source_load_driver(), much further down the call stack. I'm not sure how we can pass the prepared attr stack down to grep_source_load_driver(). -- Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 'git grep needle rev' attempts to access 'rev:.../.gitattributes' in the worktree
Nguyen Thai Ngoc Duy writes: > On Wed, Oct 10, 2012 at 1:59 AM, Junio C Hamano wrote: >> Jeff King writes: >> >>> I think we just need to have callers of grep_source_init provide us with >>> the actual pathname (or NULL, in the case of GREP_SOURCE_BUF). That is >>> where the information is lost. >> >> Yes. I agree that is the right approach. > > Passing full path this way means prepare_attr_stack has to walk back > to top directory for every files (even in the same directory). If > .gitattributes are loaded while git-grep traverses the tree, then it > can preload attr once per directory. But Jeff's approach looks > simpler. Why can't you do both? That is, to build a full path as you descend, and read per-directory .gitattributes as you go? Confused... -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 'git grep needle rev' attempts to access 'rev:.../.gitattributes' in the worktree
On Wed, Oct 10, 2012 at 1:59 AM, Junio C Hamano wrote: > Jeff King writes: > >> I think we just need to have callers of grep_source_init provide us with >> the actual pathname (or NULL, in the case of GREP_SOURCE_BUF). That is >> where the information is lost. > > Yes. I agree that is the right approach. Passing full path this way means prepare_attr_stack has to walk back to top directory for every files (even in the same directory). If .gitattributes are loaded while git-grep traverses the tree, then it can preload attr once per directory. But Jeff's approach looks simpler. -- Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 'git grep needle rev' attempts to access 'rev:.../.gitattributes' in the worktree
Jeff King writes: > I think we just need to have callers of grep_source_init provide us with > the actual pathname (or NULL, in the case of GREP_SOURCE_BUF). That is > where the information is lost. Yes. I agree that is the right approach. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 'git grep needle rev' attempts to access 'rev:.../.gitattributes' in the worktree
On Tue, Oct 09, 2012 at 07:01:44PM +0700, Nguyen Thai Ngoc Duy wrote: > On Tue, Oct 09, 2012 at 04:38:32PM +0700, Nguyen Thai Ngoc Duy wrote: > > #5 0x0815e736 in userdiff_find_by_path (path=0x820e7f0 > > "HEAD:Documentation/.gitattributes") at userdiff.c:278 > > #6 0x081058ca in grep_source_load_driver (gs=0xbfffd978) at grep.c:1504 > > A bandage patch may look like this. But it does not solve the real > problem: > > - we should be able to look up in-tree .gitattributes, not just >ignore like this patch does > > - gs->name seems to be prepared for human consumption, not for >accessing files. grep_file() with opt->relative is true can put >quotes in gs->name, for example. Right. For the second, you would probably want gs->identifier in the case of GREP_SOURCE_FILE. But that identifier information is not available at all for GREP_SOURCE_SHA1, which is what is breaking the first point. > I feel like we should make this function a callback and let git-grep > deal with driver loading itself. I think we just need to have callers of grep_source_init provide us with the actual pathname (or NULL, in the case of GREP_SOURCE_BUF). That is where the information is lost. Like this incomplete and untested patch, which should fix the quoting problem for GREP_SOURCE_FILE, but leave the sha1 bits broken (see the in-code comment). I'm traveling this week, so I doubt I'll have time to look for a few more days. If you want to work on it, please do. diff --git a/builtin/grep.c b/builtin/grep.c index 82530a6..be602dd 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -86,7 +86,7 @@ static void add_work(struct grep_opt *opt, enum grep_source_type type, static int skip_first_line; static void add_work(struct grep_opt *opt, enum grep_source_type type, -const char *name, const void *id) +const char *name, const char *path, const void *id) { grep_lock(); @@ -94,7 +94,7 @@ static void add_work(struct grep_opt *opt, enum grep_source_type type, pthread_cond_wait(&cond_write, &grep_mutex); } - grep_source_init(&todo[todo_end].source, type, name, id); + grep_source_init(&todo[todo_end].source, type, name, path, id); if (opt->binary != GREP_BINARY_TEXT) grep_source_load_driver(&todo[todo_end].source); todo[todo_end].done = 0; @@ -378,14 +378,21 @@ static int grep_sha1(struct grep_opt *opt, const unsigned char *sha1, if (opt->relative && opt->prefix_length) { quote_path_relative(filename + tree_name_len, -1, &pathbuf, opt->prefix); + /* XXX Why do we insert here instead of just putting it in +* first? */ strbuf_insert(&pathbuf, 0, filename, tree_name_len); } else { strbuf_addstr(&pathbuf, filename); } + /* XXX We seem to get all kinds of junk via the filename field here, +* including partial filenames, sha1:path, etc. We could parse it +* ourselves, but that is probably insanity. We should ask the +* caller to break it down more for us. For now, just pass NULL. */ + #ifndef NO_PTHREADS if (use_threads) { - add_work(opt, GREP_SOURCE_SHA1, pathbuf.buf, sha1); + add_work(opt, GREP_SOURCE_SHA1, pathbuf.buf, NULL, sha1); strbuf_release(&pathbuf); return 0; } else @@ -394,7 +401,7 @@ static int grep_sha1(struct grep_opt *opt, const unsigned char *sha1, struct grep_source gs; int hit; - grep_source_init(&gs, GREP_SOURCE_SHA1, pathbuf.buf, sha1); + grep_source_init(&gs, GREP_SOURCE_SHA1, pathbuf.buf, NULL, sha1); strbuf_release(&pathbuf); hit = grep_source(opt, &gs); @@ -414,7 +421,7 @@ static int grep_file(struct grep_opt *opt, const char *filename) #ifndef NO_PTHREADS if (use_threads) { - add_work(opt, GREP_SOURCE_FILE, buf.buf, filename); + add_work(opt, GREP_SOURCE_FILE, buf.buf, filename, filename); strbuf_release(&buf); return 0; } else @@ -423,7 +430,7 @@ static int grep_file(struct grep_opt *opt, const char *filename) struct grep_source gs; int hit; - grep_source_init(&gs, GREP_SOURCE_FILE, buf.buf, filename); + grep_source_init(&gs, GREP_SOURCE_FILE, buf.buf, filename, filename); strbuf_release(&buf); hit = grep_source(opt, &gs); diff --git a/grep.c b/grep.c index edc7776..06bc1c6 100644 --- a/grep.c +++ b/grep.c @@ -1373,7 +1373,7 @@ int grep_buffer(struct grep_opt *opt, char *buf, unsigned long size) struct grep_source gs; int r; - grep_source_init(&gs, GREP_SOURCE_BUF, NULL, NULL); + grep_source_init(&gs, GREP_SOURCE_BUF, NULL, NULL, NULL);
Re: 'git grep needle rev' attempts to access 'rev:.../.gitattributes' in the worktree
On Tue, Oct 09, 2012 at 04:38:32PM +0700, Nguyen Thai Ngoc Duy wrote: > #5 0x0815e736 in userdiff_find_by_path (path=0x820e7f0 > "HEAD:Documentation/.gitattributes") at userdiff.c:278 > #6 0x081058ca in grep_source_load_driver (gs=0xbfffd978) at grep.c:1504 A bandage patch may look like this. But it does not solve the real problem: - we should be able to look up in-tree .gitattributes, not just ignore like this patch does - gs->name seems to be prepared for human consumption, not for accessing files. grep_file() with opt->relative is true can put quotes in gs->name, for example. I feel like we should make this function a callback and let git-grep deal with driver loading itself. -- 8< -- diff --git a/grep.c b/grep.c index edc7776..c5ed067 100644 --- a/grep.c +++ b/grep.c @@ -1501,7 +1501,8 @@ void grep_source_load_driver(struct grep_source *gs) return; grep_attr_lock(); - gs->driver = userdiff_find_by_path(gs->name); + if (gs->type == GREP_SOURCE_FILE) + gs->driver = userdiff_find_by_path(gs->name); if (!gs->driver) gs->driver = userdiff_find_by_name("default"); grep_attr_unlock(); -- 8< -- -- Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 'git grep needle rev' attempts to access 'rev:.../.gitattributes' in the worktree
On Tue, Oct 9, 2012 at 4:03 PM, Johannes Sixt wrote: > Running 'git grep needle origin/master' on Windows gives numerous warnings > of the kind > > warning: unable to access 'origin/master:Documentation/.gitattributes': > Invalid argument strace confirms it. Stack trace #0 read_attr_from_file (path=0x820e818 "HEAD:Documentation/.gitattributes", macro_ok=0) at attr.c:351 #1 0x080d378d in read_attr (path=0x820e818 "HEAD:Documentation/.gitattributes", macro_ok=0) at attr.c:436 #2 0x080d3bf1 in prepare_attr_stack (path=0x820e7f0 "HEAD:Documentation/.gitattributes") at attr.c:630 #3 0x080d3f68 in collect_all_attrs (path=0x820e7f0 "HEAD:Documentation/.gitattributes") at attr.c:747 #4 0x080d3ffd in git_check_attr (path=0x820e7f0 "HEAD:Documentation/.gitattributes", num=1, check=0xbfffd848) at attr.c:761 #5 0x0815e736 in userdiff_find_by_path (path=0x820e7f0 "HEAD:Documentation/.gitattributes") at userdiff.c:278 #6 0x081058ca in grep_source_load_driver (gs=0xbfffd978) at grep.c:1504 #7 0x08105907 in grep_source_is_binary (gs=0xbfffd978) at grep.c:1512 #8 0x08104eaa in grep_source_1 (opt=0xbfffe304, gs=0xbfffd978, collect_hits=0) at grep.c:1180 Never touched userdiff.c before. Anybody? -- Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
'git grep needle rev' attempts to access 'rev:.../.gitattributes' in the worktree
Running 'git grep needle origin/master' on Windows gives numerous warnings of the kind warning: unable to access 'origin/master:Documentation/.gitattributes': Invalid argument It is worrysome that it is attempted to access a file whose name is prefixed by a revision. -- Hannes -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html