Re: Fwd: Possibly nicer pathspec syntax?
On Thu, Feb 9, 2017 at 4:11 AM, Junio C Hamanowrote: > With or without the patch in this thread, parse_pathspec() behaves > the same way for either CWD or FULL if you feed a non-empty > pathspec with at least one positive element. IOW, if a caller feeds > a non-empty pathspec and there is no "negative" element involved, it > does not matter if we feed CWD or FULL. Yes. Now that you put it that way, it may make more sense to rename the options to PATHSPEC_DEFAULT_* instead of PATHSPEC_PREFER_*. > - builtin/checkout.c::cmd_checkout(), when argc==0, does not call >parse_pathspec(). This codepath will get affected by Linus's >change ("cd t && git checkout :\!perf" would try to check out >everything except t/perf, but what is a reasonable definition of >"everything" in the context of this command). We need to >decide, and I am leaning towards preferring CWD for this case. So far I have seen arguments with single negative pathspec as examples. What about "cd t; git checkout :^perf :^../Documentation"? CWD is probably not the best base to be excluded from. Maybe the common prefix of all negative pathspecs? But things start to get a bit unintuitive on that road. Or do will still bail out on multiple negative pathspecs with no positive one? Also, even with single negative pathspec, what about "cd t; git checkout :^../ewah"? > So, I am tempted to suggest us doing the following: > > * Leave a NEEDSWORK comment to archive.c::path_exists() that is >used for checking the validation of pathspec elements. To fix it >properly, we need to be able to skip a negative pathspec to be >passed to this function by the caller, and to do so, we need to >expose a helper from the pathspec API that gets a single string >and returns what magic it has, but that is of lower priority. Side note. There's something else I'm not happy with pathspec handling in "git archive". Try "cd t; git archive HEAD -- ../Documentation" and you'll get a very unfriendly error message. But well, no time for it. > * Retire the PATHSPEC_PREFER_CWD bit and replace its use with the >lack of the PATHSPEC_PREFER_FULL bit. > > * Keep most of the above callsites that currently do not pass >CWD/FULL as they are, except the ones that should take FULL (see >above). > > Comments? This comes from my experience reading files-backend.c. I didn't realize '0' in files_downcast(ref_store, 0, "pack_refs"); meant 'submodule not allowed' because apparently I was too lazy to read prototype. But if was files_downcast(ref_store, NO_SUBMODULE, "pack_refs"), it would save people's time. With that in mind, should we keep _CWD the name, so people can quickly see and guess that it prefers _CWD than _FULL? _CWD can be defined as zero. It there's mostly as a friendly reminder. Other than that, yes if killing one flag helps maintenance, I'm for it. PS. You may notice I favored ^ over ! already. ! was a pain point for me too but I was too lazy to bring it up and fight for it. Thanks Linus. -- Duy
Re: Fwd: Possibly nicer pathspec syntax?
On Thu, Feb 9, 2017 at 12:39 AM, Junio C Hamanowrote: > Duy Nguyen writes: > >> On Wed, Feb 8, 2017 at 12:12 PM, Linus Torvalds >> wrote: >>> Two-patch series to follow. >> >> glossary-content.txt update for both patches would be nice. > > I am no longer worried about it as I saw somebody actually sent > follow-up patches on this, but I want to pick your brain on one > thing that is related to this codepath. > > We have PATHSPEC_PREFER_CWD and PATHSPEC_PREFER_FULL bits in flags, > added at fc12261fea ("parse_pathspec: add PATHSPEC_PREFER_{CWD,FULL} > flags", 2013-07-14), and I think the intent is some commands when > given no pathspec work on all paths in the current subdirectory > while others work on the full tree, regardless of where you are. > "grep" is in the former camp, "log" is in the latter. And there is > a check to catch a bug in a caller that sets both. > > I am wondering about this hunk (this is from the original commit > that added it): > > if (!entry) { > static const char *raw[2]; > > + if (flags & PATHSPEC_PREFER_FULL) > + return; > + > + if (!(flags & PATHSPEC_PREFER_CWD)) > + die("BUG: PATHSPEC_PREFER_CWD requires arguments"); > + > pathspec->items = item = xmalloc(sizeof(*item)); > memset(item, 0, sizeof(*item)); > item->match = prefix; > ... returns a single entry pathspec to cover cwd ... > > The BUG message is given when > > - The command got no pathspec from the caller; and > - PATHSPEC_PREFER_FULL is not set; and > - PATHSPEC_PREFER_CWD is NOT set. > > but the message says that the caller must have args when it sets > prefer-cwd. Is this a simple typo? If so what should it say? > > die("BUG: one of PATHSPEC_PREFER_FULL or _CWD must be set"); Without reading through your next mail, I'd say "BUG: this command requires arguments". > Does this third possibility (i.e. a caller is allowed to pass > "flags" that does not prefer either) exist to support a command > where the caller MUST have at least one pathspec? If that were the > case, this wouldn't be a BUG but an end-user error, e.g. > > die("at least one pathspec element is required"); Or this. Yes. I might have just been defensive at then and kept the third option open. > If you know offhand which callers pass neither of the two > PATHSPEC_PREFER_* bits and remember for what purpose you allowed > them to do so, please remind me. I'll keep digging in the meantime. I don't usually remember what I ate yesterday and this commit was from 2013 :D But I'll see if your findings spark anything in my brain. -- Duy
Re: Fwd: Possibly nicer pathspec syntax?
Junio C Hamanowrites: > If you know offhand which callers pass neither of the two > PATHSPEC_PREFER_* bits and remember for what purpose you allowed > them to do so, please remind me. I'll keep digging in the meantime. Answering my own questions, here are my findings so far and a tentative conclusion. With or without the patch in this thread, parse_pathspec() behaves the same way for either CWD or FULL if you feed a non-empty pathspec with at least one positive element. IOW, if a caller feeds a non-empty pathspec and there is no "negative" element involved, it does not matter if we feed CWD or FULL. There are only a handful of callers that pass neither preference bits to parse_pathspec(). Here are my observations on them that tells me that most of them are OK if we change them to prefer either CWD or FULL: - archive.c::path_exists() feeds a pathspec with a single element to see if read_tree_recursive() finds any matching paths, to allow its caller to iterate over the original pathspec and see if there is a typo (i.e. an element that matches nothing). It should prefer FULL to match what parse_pathspec_arg(), its caller, uses. The caller probably should refrain from passing ones with negative magic. I.e. "git archive -- t :\!t/perf" errors out because checking each element independently in the loop means that ":\!t/perf" is checked alone, triggering "there is nothing to exclude from". - blame.c::find_origin() feeds a pathspec with a single element, which is a path in the history and does so as a literal, hence no room for "negative" to kick in. - builtin/check-ignore.c::check_ignore(), when argc==0, does not call parse_pathspec(). It does not take any magic other than FROMTOP, so "negative" won't come into the picture. - builtin/checkout.c::cmd_checkout(), when argc==0, does not call parse_pathspec(). This codepath will get affected by Linus's change ("cd t && git checkout :\!perf" would try to check out everything except t/perf, but what is a reasonable definition of "everything" in the context of this command). We need to decide, and I am leaning towards preferring CWD for this case. - revision.c::setup_revisions() calls parse_pathspec() only when the caller gave a non-empty pathspec. This pathspec is used for pruning log traversal (e.g. "only show commits that touch these paths") and is affected by Linus's change. It should favor FULL. - tree-diff.c::try_to_follow_renames() feeds a pathspec with a single element as a literal, hence no room for "negative" to kick in. So, I am tempted to suggest us doing the following: * Leave a NEEDSWORK comment to archive.c::path_exists() that is used for checking the validation of pathspec elements. To fix it properly, we need to be able to skip a negative pathspec to be passed to this function by the caller, and to do so, we need to expose a helper from the pathspec API that gets a single string and returns what magic it has, but that is of lower priority. * Retire the PATHSPEC_PREFER_CWD bit and replace its use with the lack of the PATHSPEC_PREFER_FULL bit. * Keep most of the above callsites that currently do not pass CWD/FULL as they are, except the ones that should take FULL (see above). Comments?
Re: Fwd: Possibly nicer pathspec syntax?
Duy Nguyenwrites: > On Wed, Feb 8, 2017 at 12:12 PM, Linus Torvalds > wrote: >> Two-patch series to follow. > > glossary-content.txt update for both patches would be nice. I am no longer worried about it as I saw somebody actually sent follow-up patches on this, but I want to pick your brain on one thing that is related to this codepath. We have PATHSPEC_PREFER_CWD and PATHSPEC_PREFER_FULL bits in flags, added at fc12261fea ("parse_pathspec: add PATHSPEC_PREFER_{CWD,FULL} flags", 2013-07-14), and I think the intent is some commands when given no pathspec work on all paths in the current subdirectory while others work on the full tree, regardless of where you are. "grep" is in the former camp, "log" is in the latter. And there is a check to catch a bug in a caller that sets both. I am wondering about this hunk (this is from the original commit that added it): if (!entry) { static const char *raw[2]; + if (flags & PATHSPEC_PREFER_FULL) + return; + + if (!(flags & PATHSPEC_PREFER_CWD)) + die("BUG: PATHSPEC_PREFER_CWD requires arguments"); + pathspec->items = item = xmalloc(sizeof(*item)); memset(item, 0, sizeof(*item)); item->match = prefix; ... returns a single entry pathspec to cover cwd ... The BUG message is given when - The command got no pathspec from the caller; and - PATHSPEC_PREFER_FULL is not set; and - PATHSPEC_PREFER_CWD is NOT set. but the message says that the caller must have args when it sets prefer-cwd. Is this a simple typo? If so what should it say? die("BUG: one of PATHSPEC_PREFER_FULL or _CWD must be set"); If that were the case, we are expressing only one bit of information (do we limit to cwd, or do we work on full-tree?), but there must have been a reason why we added two bits and made them mutually incompatible so that we can express three possibilities. Does this third possibility (i.e. a caller is allowed to pass "flags" that does not prefer either) exist to support a command where the caller MUST have at least one pathspec? If that were the case, this wouldn't be a BUG but an end-user error, e.g. die("at least one pathspec element is required"); If you know offhand which callers pass neither of the two PATHSPEC_PREFER_* bits and remember for what purpose you allowed them to do so, please remind me. I'll keep digging in the meantime. Thanks.
Re: Fwd: Possibly nicer pathspec syntax?
On Wed, Feb 8, 2017 at 12:12 PM, Linus Torvaldswrote: > Two-patch series to follow. glossary-content.txt update for both patches would be nice. -- Duy
Re: Fwd: Possibly nicer pathspec syntax?
On Tue, 7 Feb 2017, Junio C Hamano wrote: > > + // Special case alias for '!' > > /* style? */ Will fix. > I somehow do not think this is correct. I expect > > cd t && git grep -e foo -- :^perf/ > > to look into things in 't' except for things in 't/perf', but the > above will grab hits from ../Documentation/ etc. We need to pay > attention to PATHSPEC_PREFER_CWD bit in the flags word, I think. Ok, that's easy enough. Two-patch series to follow. Linus
Re: Fwd: Possibly nicer pathspec syntax?
Linus Torvaldswrites: > People don't expect set theory from their pathspecs. They expect their > pathspecs to limit the output. They've learnt that within a > subdirectory, the pathspec limits to that subdirectory. And now it > suddenly starts showing things outside the subdirectory? > > At that point no amount of "but but think about set theory" will > matter, methinks. I do not feel too strongly about it, either, but when we invoke "what would people who go down into subdirectories expect", the issue becomes a lot bigger. "git diff/log" without any pathspec in a subdirectory still shows the whole diff. "git grep" looks for hits inside that subdirectory, on the other hand. I think the former design decision is mostly a historical accident. "The project tree as the whole is what matters" was one reason, and another is that historically we didn't have ":/"--defaulting to the whole tree and telling people to give "." was easier than defaulting to the current and telling people to give "../.." after counting how many levels deep you started at. If we were designing the system today with "." and ":/", I wouldn't be surprised if we chose "always limit to cwd if there is no pathspec" for any command for the sake of consistency. We can easily say "if you want whole-tree, pass :/" instead. But we do not live in that world, and I do not think change them to make things consistent is what we are discussing in this thread. Given users accept and expect that "diff" without pathspec is whole tree, while "grep" without pathspec is limited to cwd, what is the reasonable definition of "everything from which negative ones are excluded"? That is the question I am trying to answer. As Mike mentioned earlier in the thread, if we used "." (limit to cwd) for "diff/log" when there are only negative pathspec elements, the resulting UI would become inconsistent within a single command, as in my world view, giving no pathspec means "work on everything you would ordinarily work on", a positive pathspec means "among everything you would ordinarily work on, only work on the ones that match this pattern", and giving only a negative one ought to mean "among everything you would ordinarily work on, only work on the ones that do NOT match this pattern." > pathspec.c | 20 +++- > 1 file changed, 15 insertions(+), 5 deletions(-) > > diff --git a/pathspec.c b/pathspec.c > index 7ababb315..2a91247bc 100644 > --- a/pathspec.c > +++ b/pathspec.c > @@ -224,6 +224,12 @@ static const char *parse_short_magic(unsigned *magic, > const char *elem) > char ch = *pos; > int i; > > + // Special case alias for '!' /* style? */ > + if (ch == '^') { > + *magic |= PATHSPEC_EXCLUDE; > + continue; > + } > + > if (!is_pathspec_magic(ch)) > break; > > + /* > + * If everything is an exclude pattern, add one positive pattern > + * that matches everyting. We allocated an extra one for this. > + */ > + if (nr_exclude == n) { > + init_pathspec_item(item + n, 0, "", 0, ""); > + pathspec->nr++; > + } I somehow do not think this is correct. I expect cd t && git grep -e foo -- :^perf/ to look into things in 't' except for things in 't/perf', but the above will grab hits from ../Documentation/ etc. We need to pay attention to PATHSPEC_PREFER_CWD bit in the flags word, I think. A real change probably wants to become a two-patch series (one for the new :^ alias, the other is for "everything except...") with log message ;-)
Re: Fwd: Possibly nicer pathspec syntax?
On Tue, Feb 7, 2017 at 7:12 PM, Junio C Hamanowrote: > > But that is not what I was talking about. Let's simplify. I'd say > for any command that acts on "everything" when pathspec is not > given, the two sets of actual paths affected by these two: > > git cmd -- "net/" > git cmd -- ":!net/" > > should have no overlap (obviously) and when you take union of the > two sets, that should equal to > > git cmd -- > > i.e. no pathspecs. Well, as mentioned, I won't ever care. I'm certainly ok with the "make the default positive entry be everything". I just suspect that from a user perspective that actually delves into the subdirectories, the much bigger question will be: "I gave you a pathspec, and suddenly you start giving me stuff from outside the area entirely". And then you can say "well, just add '.' to the pathspec", and you'd be right, but I still think it's not what a naive user would expect. People don't expect set theory from their pathspecs. They expect their pathspecs to limit the output. They've learnt that within a subdirectory, the pathspec limits to that subdirectory. And now it suddenly starts showing things outside the subdirectory? At that point no amount of "but but think about set theory" will matter, methinks. But I really don't feel strongly about it. The path I sent out (and the slightly modified version attached in this email) actually acts the way you suggest. It's certainly the simplest implementation. I just suspect it's not the implementation people who go down into subdirectories would want/expect. >>> 2. I am not sure what ctype.c change is about. Care to elaborate? >> >> I didn't see the need for it either until I made the rest of the >> patch, and it didn't work at all. >> >> The pathspec.c code uses "if (is_pathspec_magic(..))" to test whether >> a character is a short magiic pathspec character. But '^' wasn't in >> that set, because it was already marked as being (only) in the regex >> set. >> >> Does that whole is_pathspec_magic() thing make any sense when we have >> an array that specifies the special characters we react to? No it does >> not. >> >> But it is what the code does, and I just made that code work. > > Ah, OK. Side note: here's an alternative patch that avoids that issue entirely, and also avoids a problem with prefix_magic() being uphappy when one bit shows up multiple times in the array. It's slightly hacky in parse_short_magic(), but it's smaller and simpler, and avoids the two subtle cases. No need for ctype changes, and no issues with prefix_magic() being somewhat stupid. Linus pathspec.c | 20 +++- 1 file changed, 15 insertions(+), 5 deletions(-) diff --git a/pathspec.c b/pathspec.c index 7ababb315..2a91247bc 100644 --- a/pathspec.c +++ b/pathspec.c @@ -224,6 +224,12 @@ static const char *parse_short_magic(unsigned *magic, const char *elem) char ch = *pos; int i; + // Special case alias for '!' + if (ch == '^') { + *magic |= PATHSPEC_EXCLUDE; + continue; + } + if (!is_pathspec_magic(ch)) break; @@ -516,7 +522,7 @@ void parse_pathspec(struct pathspec *pathspec, } pathspec->nr = n; - ALLOC_ARRAY(pathspec->items, n); + ALLOC_ARRAY(pathspec->items, n+1); item = pathspec->items; prefixlen = prefix ? strlen(prefix) : 0; @@ -540,10 +546,14 @@ void parse_pathspec(struct pathspec *pathspec, pathspec->magic |= item[i].magic; } - if (nr_exclude == n) - die(_("There is nothing to exclude from by :(exclude) patterns.\n" - "Perhaps you forgot to add either ':/' or '.' ?")); - + /* +* If everything is an exclude pattern, add one positive pattern +* that matches everyting. We allocated an extra one for this. +*/ + if (nr_exclude == n) { + init_pathspec_item(item + n, 0, "", 0, ""); + pathspec->nr++; + } if (pathspec->magic & PATHSPEC_MAXDEPTH) { if (flags & PATHSPEC_KEEP_ORDER)
Re: Fwd: Possibly nicer pathspec syntax?
Linus Torvaldswrites: > No. The thing is, "git diff" is relative too - for path > specifications. And the negative entries are pathspecs - and they act > as relative ones. > > IOW, that whole > > cd drivers > git diff A..B -- net/ > > will actually show the diff for drivers/net - so the pathspec very > much acts as relative to the cwd. But that is not what I was talking about. Let's simplify. I'd say for any command that acts on "everything" when pathspec is not given, the two sets of actual paths affected by these two: git cmd -- "net/" git cmd -- ":!net/" should have no overlap (obviously) and when you take union of the two sets, that should equal to git cmd -- i.e. no pathspecs. >> 2. I am not sure what ctype.c change is about. Care to elaborate? > > I didn't see the need for it either until I made the rest of the > patch, and it didn't work at all. > > The pathspec.c code uses "if (is_pathspec_magic(..))" to test whether > a character is a short magiic pathspec character. But '^' wasn't in > that set, because it was already marked as being (only) in the regex > set. > > Does that whole is_pathspec_magic() thing make any sense when we have > an array that specifies the special characters we react to? No it does > not. > > But it is what the code does, and I just made that code work. Ah, OK.
Re: Fwd: Possibly nicer pathspec syntax?
On Tue, Feb 07, 2017 at 05:48:26PM -0800, Linus Torvalds wrote: > > > On Tue, 7 Feb 2017, Linus Torvalds wrote: > > > > [ Clarification from original message, since Junio asked: I didn't > > actually want the semantics of '.' at all, since in a subdirectory it > > limits to the current subdirectory. So I'd suggest that in the absence > > of any positive pattern, there is simply no filtering at all, so > > whenever I say '.' as a pattern, I really meant ":(top)." which is > > even more of a cumbersom syntax that the current model really > > encourages. Crazy. Since I tend to always work in the top directory, > > the two are the same for me ] > > So here's an RFC patch, and I'm quoting the above part of my thinking > because it's what the patch does, but it turns out that it's probably not > what we want, and I suspect the "." behavior (as opposed to "no filtering > at all") is actually better. > > Now _I_ don't much care, since I only work from the top level, but without > the "." behavior, you get into an odd situation that the negative match > will be relative to the current directory, but then the positive matches > will be everywhere else. > > Obviously, a negative match that has "top" set would change that logic. So > this patch is purely a request for further discussion. > > When I wrote the patch, I actually also removed the now stale entries from > the 'po' files, but I'm not including that part here because it just > distracts from the meat of it all. So this diff was actually generated > with the new syntax: > > git diff -p --stat -- :^po/ > > and the only thing even remotely subtle here is that it changes our ctype > array to make '^' be both a regex and a pathspec magic character. > > Everything else should be pretty darn obvious. > > The code *could* just track all the 'relative to top or not' bits in the > exclusion pattern, and then use whatever top-ness the exclusion patterns > have (and maybe fall back to the old warning if it had a mixture of > exclusionary patterns). I'll happily change it to act that way if people > think that makes sense. > > Comments? It seems to me that `git diff` and `git diff -- :^stuff` should have the same output if there aren't changes in stuff, and `git diff` does the same as `git diff -- :/` if you are in a subdirectory, not the same as `git diff .`. As such, the default positive match should be ':/' (which is shorter and less cumbersome than ':(top)', btw) Mike
Re: Fwd: Possibly nicer pathspec syntax?
On Tue, Feb 07, 2017 at 06:49:24PM -0800, Linus Torvalds wrote: > On Tue, Feb 7, 2017 at 6:40 PM, Mike Hommeywrote: > > > > As such, the default positive match should be ':/' (which is shorter and > > less cumbersome than ':(top)', btw) > > So that's what my patch does. > > However, it's actually very counter-intuitive in a subdirectory. > > Git doesn't do much of that, but let me give you an example from the > kernel. Again, this is not an example of anything I would do (because > I'm always at the top), but: > > [torvalds@i7 linux]$ cd drivers/ > [torvalds@i7 drivers]$ ll > > .. whee, *lots* of diorectories .. > .. lets see what happened in net/ .. > > [torvalds@i7 drivers]$ git diff -M --dirstat=1,cumulative > v4.10-rc6..v4.10-rc7 -- net/ > 7.4% drivers/net/ethernet/adaptec/ > 47.9% drivers/net/ethernet/cadence/ > 7.1% drivers/net/ethernet/emulex/benet/ > 1.1% drivers/net/ethernet/freescale/ > 3.6% drivers/net/ethernet/mellanox/mlx4/ > 23.5% drivers/net/ethernet/mellanox/mlx5/core/ > 27.2% drivers/net/ethernet/mellanox/ > 92.5% drivers/net/ethernet/ > 5.3% drivers/net/wireless/intel/iwlwifi/mvm/ > 5.9% drivers/net/wireless/intel/iwlwifi/ >100.0% drivers/net/ > > .. let's see what happened *outside* of net/ .. > > [torvalds@i7 drivers]$ git diff -M --dirstat=1,cumulative > v4.10-rc6..v4.10-rc7 -- :^net/ >2.4% arch/arm64/crypto/ >2.1% arch/powerpc/include/asm/ >1.5% arch/powerpc/kernel/ >3.9% arch/powerpc/ >3.5% arch/sparc/kernel/ >4.1% arch/sparc/ >8.3% arch/x86/events/intel/ >1.7% arch/x86/kernel/cpu/mcheck/ >1.6% arch/x86/kernel/cpu/microcode/ >3.3% arch/x86/kernel/cpu/ >3.8% arch/x86/kernel/ >1.0% arch/x86/platform/efi/ > 13.3% arch/x86/ > 24.0% arch/ >1.1% drivers/base/ >2.9% drivers/dma/ > 12.3% drivers/gpu/drm/i915/ >1.0% drivers/gpu/drm/nouveau/ > 16.2% drivers/gpu/drm/ >3.9% drivers/hid/ >1.6% drivers/iio/ >2.3% drivers/regulator/ >... > > Notice? When you say "show only the net subdirectory" it does the > obvious thing relative to the current working directory, but if you > say "show everything _but_ the net subdirectory" it suddenly starts > showing other things. I can totally see how this can be confusing, but the case can be made that the problem is that `git diff` would be showing everything if you didn't pass an exclusion list. Now, what you're suggesting introduces some inconsistency, which, in itself, can cause confusion. I'm not sure which confusion is worse. Mike
Re: Fwd: Possibly nicer pathspec syntax?
On Tue, Feb 7, 2017 at 6:42 PM, Junio C Hamanowrote: > > 1. I think some commands limit their operands to cwd and some work > on the whole tree when given no pathspec. I think the "no > positive? then let's give you everything except these you > excluded" should base the definition of "everything" to that. > IOW, "cd t && git grep -e foo" shows everything in t/ directory, > so the default you would add would be "." for "grep"; "cd t && > git diff HEAD~100 HEAD" would show everything, so you would give > ":(top)." for "diff". No. The thing is, "git diff" is relative too - for path specifications. And the negative entries are pathspecs - and they act as relative ones. IOW, that whole cd drivers git diff A..B -- net/ will actually show the diff for drivers/net - so the pathspec very much acts as relative to the cwd. So no, absolute (ie ":(top)" or ":/") doesn't actually make sense for "diff" either, even though diff by default is absolute when not given a pathname at all. But if you do cd drivers git diff A..B -- :^/arch then suddenly an absolute positive root _does_ make sense,. because now the negative pathspec was absolute.. Odd? Yes it is. But the positive pathspec rules are what they are, and they are actually what I suspect everybody really wants. The existing negative ones match the rules for the positive ones. So I suspect that the best thing is if the "implicit positive rule when there are no explicit ones" ends up matching the same semantics as the (explicit) negative entries have.. > 2. I am not sure what ctype.c change is about. Care to elaborate? I didn't see the need for it either until I made the rest of the patch, and it didn't work at all. The pathspec.c code uses "if (is_pathspec_magic(..))" to test whether a character is a short magiic pathspec character. But '^' wasn't in that set, because it was already marked as being (only) in the regex set. Does that whole is_pathspec_magic() thing make any sense when we have an array that specifies the special characters we react to? No it does not. But it is what the code does, and I just made that code work. > 3. I think our recent trend is to wean ourselves away from "an > empty element in pathspec means all paths match", and I think we > even have accepted a patch to emit a warning. Doesn't the > warning trigger for the new code below? It didn't trigger for me in my testing, I suspect the warning is at an earlier level when it walks through the argv[] array and fills in the pathspec arrays. But I didn't actually look for it. Linus
Re: Fwd: Possibly nicer pathspec syntax?
On Tue, Feb 7, 2017 at 6:40 PM, Mike Hommeywrote: > > As such, the default positive match should be ':/' (which is shorter and > less cumbersome than ':(top)', btw) So that's what my patch does. However, it's actually very counter-intuitive in a subdirectory. Git doesn't do much of that, but let me give you an example from the kernel. Again, this is not an example of anything I would do (because I'm always at the top), but: [torvalds@i7 linux]$ cd drivers/ [torvalds@i7 drivers]$ ll .. whee, *lots* of diorectories .. .. lets see what happened in net/ .. [torvalds@i7 drivers]$ git diff -M --dirstat=1,cumulative v4.10-rc6..v4.10-rc7 -- net/ 7.4% drivers/net/ethernet/adaptec/ 47.9% drivers/net/ethernet/cadence/ 7.1% drivers/net/ethernet/emulex/benet/ 1.1% drivers/net/ethernet/freescale/ 3.6% drivers/net/ethernet/mellanox/mlx4/ 23.5% drivers/net/ethernet/mellanox/mlx5/core/ 27.2% drivers/net/ethernet/mellanox/ 92.5% drivers/net/ethernet/ 5.3% drivers/net/wireless/intel/iwlwifi/mvm/ 5.9% drivers/net/wireless/intel/iwlwifi/ 100.0% drivers/net/ .. let's see what happened *outside* of net/ .. [torvalds@i7 drivers]$ git diff -M --dirstat=1,cumulative v4.10-rc6..v4.10-rc7 -- :^net/ 2.4% arch/arm64/crypto/ 2.1% arch/powerpc/include/asm/ 1.5% arch/powerpc/kernel/ 3.9% arch/powerpc/ 3.5% arch/sparc/kernel/ 4.1% arch/sparc/ 8.3% arch/x86/events/intel/ 1.7% arch/x86/kernel/cpu/mcheck/ 1.6% arch/x86/kernel/cpu/microcode/ 3.3% arch/x86/kernel/cpu/ 3.8% arch/x86/kernel/ 1.0% arch/x86/platform/efi/ 13.3% arch/x86/ 24.0% arch/ 1.1% drivers/base/ 2.9% drivers/dma/ 12.3% drivers/gpu/drm/i915/ 1.0% drivers/gpu/drm/nouveau/ 16.2% drivers/gpu/drm/ 3.9% drivers/hid/ 1.6% drivers/iio/ 2.3% drivers/regulator/ ... Notice? When you say "show only the net subdirectory" it does the obvious thing relative to the current working directory, but if you say "show everything _but_ the net subdirectory" it suddenly starts showing other things. Now, it would be easy enough to say "if you don't give a positive path, we'll just use the empty path that matches the negative paths". So if you ask for a negative relative "net" directory, we'll use the relative empty path. And if you ask for a negative absolute path, we'll use the empty absolute path. It's a couple of lines more, and I think it might avoid some confusion. And I suspect almost nobody has ever done any of this before,. because the syntax was/is so cumbersome. Linus
Re: Fwd: Possibly nicer pathspec syntax?
Linus Torvaldswrites: > So here's an RFC patch, and I'm quoting the above part of my thinking > because it's what the patch does, but it turns out that it's probably not > what we want, and I suspect the "." behavior (as opposed to "no filtering > at all") is actually better. > ... > > Comments? 1. I think some commands limit their operands to cwd and some work on the whole tree when given no pathspec. I think the "no positive? then let's give you everything except these you excluded" should base the definition of "everything" to that. IOW, "cd t && git grep -e foo" shows everything in t/ directory, so the default you would add would be "." for "grep"; "cd t && git diff HEAD~100 HEAD" would show everything, so you would give ":(top)." for "diff". 2. I am not sure what ctype.c change is about. Care to elaborate? 3. I think our recent trend is to wean ourselves away from "an empty element in pathspec means all paths match", and I think we even have accepted a patch to emit a warning. Doesn't the warning trigger for the new code below? > - if (nr_exclude == n) > - die(_("There is nothing to exclude from by :(exclude) > patterns.\n" > - "Perhaps you forgot to add either ':/' or '.' ?")); > - > + /* > + * If everything is an exclude pattern, add one positive pattern > + * that matches everyting. We allocated an extra one for this. > + */ > + if (nr_exclude == n) { > + init_pathspec_item(item + n, 0, "", 0, ""); > + pathspec->nr++; > + } > > if (pathspec->magic & PATHSPEC_MAXDEPTH) { > if (flags & PATHSPEC_KEEP_ORDER)
Re: Fwd: Possibly nicer pathspec syntax?
On Tue, 7 Feb 2017, Linus Torvalds wrote: > > [ Clarification from original message, since Junio asked: I didn't > actually want the semantics of '.' at all, since in a subdirectory it > limits to the current subdirectory. So I'd suggest that in the absence > of any positive pattern, there is simply no filtering at all, so > whenever I say '.' as a pattern, I really meant ":(top)." which is > even more of a cumbersom syntax that the current model really > encourages. Crazy. Since I tend to always work in the top directory, > the two are the same for me ] So here's an RFC patch, and I'm quoting the above part of my thinking because it's what the patch does, but it turns out that it's probably not what we want, and I suspect the "." behavior (as opposed to "no filtering at all") is actually better. Now _I_ don't much care, since I only work from the top level, but without the "." behavior, you get into an odd situation that the negative match will be relative to the current directory, but then the positive matches will be everywhere else. Obviously, a negative match that has "top" set would change that logic. So this patch is purely a request for further discussion. When I wrote the patch, I actually also removed the now stale entries from the 'po' files, but I'm not including that part here because it just distracts from the meat of it all. So this diff was actually generated with the new syntax: git diff -p --stat -- :^po/ and the only thing even remotely subtle here is that it changes our ctype array to make '^' be both a regex and a pathspec magic character. Everything else should be pretty darn obvious. The code *could* just track all the 'relative to top or not' bits in the exclusion pattern, and then use whatever top-ness the exclusion patterns have (and maybe fall back to the old warning if it had a mixture of exclusionary patterns). I'll happily change it to act that way if people think that makes sense. Comments? Linus --- ctype.c| 3 ++- pathspec.c | 15 ++- 2 files changed, 12 insertions(+), 6 deletions(-) diff --git a/ctype.c b/ctype.c index fc0225ceb..250e2ce15 100644 --- a/ctype.c +++ b/ctype.c @@ -14,6 +14,7 @@ enum { P = GIT_PATHSPEC_MAGIC, /* other non-alnum, except for ] and } */ X = GIT_CNTRL, U = GIT_PUNCT, + Y = GIT_REGEX_SPECIAL | GIT_PATHSPEC_MAGIC, Z = GIT_CNTRL | GIT_SPACE }; @@ -23,7 +24,7 @@ const unsigned char sane_ctype[256] = { S, P, P, P, R, P, P, P, R, R, G, R, P, P, R, P, /* 32.. 47 */ D, D, D, D, D, D, D, D, D, D, P, P, P, P, P, G, /* 48.. 63 */ P, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, /* 64.. 79 */ - A, A, A, A, A, A, A, A, A, A, A, G, G, U, R, P, /* 80.. 95 */ + A, A, A, A, A, A, A, A, A, A, A, G, G, U, Y, P, /* 80.. 95 */ P, A, A, A, A, A, A, A, A, A, A, A, A, A, A, A, /* 96..111 */ A, A, A, A, A, A, A, A, A, A, A, R, R, U, P, X, /* 112..127 */ /* Nothing in the 128.. range */ diff --git a/pathspec.c b/pathspec.c index 7ababb315..ef59d080d 100644 --- a/pathspec.c +++ b/pathspec.c @@ -72,6 +72,7 @@ static struct pathspec_magic { { PATHSPEC_GLOB,'\0', "glob" }, { PATHSPEC_ICASE, '\0', "icase" }, { PATHSPEC_EXCLUDE, '!', "exclude" }, + { PATHSPEC_EXCLUDE, '^', "exclude" }, }; static void prefix_magic(struct strbuf *sb, int prefixlen, unsigned magic) @@ -516,7 +517,7 @@ void parse_pathspec(struct pathspec *pathspec, } pathspec->nr = n; - ALLOC_ARRAY(pathspec->items, n); + ALLOC_ARRAY(pathspec->items, n+1); item = pathspec->items; prefixlen = prefix ? strlen(prefix) : 0; @@ -540,10 +541,14 @@ void parse_pathspec(struct pathspec *pathspec, pathspec->magic |= item[i].magic; } - if (nr_exclude == n) - die(_("There is nothing to exclude from by :(exclude) patterns.\n" - "Perhaps you forgot to add either ':/' or '.' ?")); - + /* +* If everything is an exclude pattern, add one positive pattern +* that matches everyting. We allocated an extra one for this. +*/ + if (nr_exclude == n) { + init_pathspec_item(item + n, 0, "", 0, ""); + pathspec->nr++; + } if (pathspec->magic & PATHSPEC_MAXDEPTH) { if (flags & PATHSPEC_KEEP_ORDER)
Re: Fwd: Possibly nicer pathspec syntax?
Linus Torvaldswrites: > [ Duh, I sent this just to Junio initially due to a brainfart. Here > goes the list also ] And my earlier response goes to the list ;-) Linus Torvalds writes: > Most of the time when I use pathspecs, I just use the bog-standard > normal ones, and everything works wonderfully. It is, I think, a no-brainer to lift the "you must have at least one positive". If the user says "not this and that", it is reasonable to assume that "but include everything else" is implied. As to "!" that triggers history substitution without quoting, it may be annoying and I think it is probably OK to pick a synonym letter, perhaps "^", now that the set of pathspec magics do not seem to be growing rapidly and there may not be any other existing magic that the natural meaning of "^" would match better than "negate". The primary reason why we used ! is, I think, to match patterns in the exclude files. As to the leading ":", that is shared between the ":(long form)" and the short form, I am a bit hesitant to lose it. It allows the users to be trained only once, i.e. "if you want to match a path without magic in your working tree, you need to watch out for an unusual path that begins with a colon, which may be quite minority to begin with. You just prefix ./ in front to defeat it. Everything else you can type as-is, modulo wildcard metacharacters, but you know that already." and their brains need no upgrading. Once we start accepting short forms without the ":", every time we add a short form magic, the users need to be retrained. In short, this > Or even just allowing ^ in addition to ! for negation, and otherwise > keeping the current syntax. in addition to "no positives? let's pretend you also said '.' as a positive", would not be too bad, methinks. And that allows this > git diff -M --dirstat .. -- ':!drivers' ':!arch' . to become git diff -M --dirstat .. -- :^{drivers,arch} which is a bit shorter. I personally am perfectly fine without ^, i.e. git diff -M --dirstat .. -- :\!{drivers,arch} though. By the way, I am wondering why this is private, not cc'ed to the mailing list. As messages addressed to gitster@ without git@vger bypass my Inbox and gets thrown into spam box, which I only occasionally scan to resurrect messages worth responding, and this is one of those cases ;-)
Fwd: Possibly nicer pathspec syntax?
[ Duh, I sent this just to Junio initially due to a brainfart. Here goes the list also ] Most of the time when I use pathspecs, I just use the bog-standard normal ones, and everything works wonderfully. But then occasionally I want to exclude a directory (usually "drivers/"), and it all works fine, but the syntax for that is just really cumbersome. That's due to two issues: - the magic characters seem to have been chosen to be annoying on purpose - there's an extra "you can't exclude things without having positive patterns" check and both of those are rather nasty. So to explain where I come from, during releases I do things like this: git diff -M --dirstat=2,cumulative v4.10-rc6..v4.10-rc7 and this is literaly why that "dirstat" diff exists - I find this very useful for a project like the kernel that has a good hierarchical directory structure. So the whole dirstat option came about from my statistics gathering (see more explanations in my original commit 7df7c019c ("Add "--dirstat" for some directory statistics"). However, what often happens for the kernel is that a few big subsystems end up dominating the discussion (usually drivers and architecture), and then you want to drill down into everything else to get that part. Long ago, that used to be painful, and I did things like git diff -M --dirstat ... -- $(ls | egrep -v '(drivers)|(arch)') which works, and gives me the dirstat for stuff that isn't arch or driver updates. However, git actually added exclude patterns, and I don't need to do that crazy thing with shell expansion any more. Now I can do this crazy thing with git natively instead: git diff -M --dirstat .. -- ':!drivers' ':!arch' . but honestly, the git native interface really isn't much simpler than what I used to do. Is there really any reason for requiring the '.'? [ Clarification from original message, since Junio asked: I didn't actually want the semantics of '.' at all, since in a subdirectory it limits to the current subdirectory. So I'd suggest that in the absense of any positive pattern, there is simply no filtering at all, so whenever I say '.' as a pattern, I really meant ":(top)." which is even more of a cumbersom syntax that the current model really encourages. Crazy. Since I tend to always work in the top directory, the two are the same for me ] And did we really have to pick such annoying characters that we need the shell escaping? (I never use the other ones with long forms, but they have the same issue: parenthesis need escaping too, so you have to write them as ':(exclude,icase)drivers' and you have to remember that a final colon is *not* allowed, and they still need the escaping). It really isn't all that wonderful to use from the command line. In revisions, we use "^" for negation, partly exactly because '!' is such a nasty character for shell users. With exclusion being the only case I particularly use, I'd like that for pathspecs too, but maybe others use icase etc? IOW, what I'd like to do is just git diff -M --dirstat .. ^drivers ^arch without needing the ugly quoting, and without needing the pointless positive 'match everything else' marker. Or even just allowing ^ in addition to ! for negation, and otherwise keeping the current syntax. [ Clarification from original message, since Junio asked: yes, this suggestion still assumes the "don't need to specify the positive pattern", so you could just do git diff :^drivers to avoid the drivers directory ] Comments? Other ideas? This is certainly not a high priority, but I hit it once again when doing the 4.10-rc7 statistics, which is why I bring up the discussion.. Linus