Re: More questions/comments on XCU 2.13 (sh Pattern Matching)
Robert Elz wrote:
> Date: Fri, 27 Apr 2018 15:06:52 +0200
> From: Joerg Schilling
> Message-ID: <5ae3206c.gzrnd81xboh3e0x7%joerg.schill...@fokus.fraunhofer.de>
>
> | Since bash seems to be the only shell that works this way,
>
> Until I changed the NetBSD sh (if that change is retained), yes.
>
> | I would call this a bug.
>
> Then I think it would be also a bug in POSIX (as I think it
> actually specifies this result) and a deficiency - as there
> really needs to be a way to store a pattern in a variable
> such that a pattern-magic character can be treated literally.
>
> I will leave it for Chet to say whether or not he considers this
> to be a bug in bash.

If bash passed the bosh conformance test suite, this would be a suitable proposal. Unfortunately, this is not the case, and for this reason there is no way to verify whether this deviation of bash from other implementations is unrelated to its other deviations.

Jörg

--
EMail: jo...@schily.net (home) Jörg Schilling D-13353 Berlin
joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
URL: http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/
Re: More questions/comments on XCU 2.13 (sh Pattern Matching)
Robert Elz wrote:
> Date: Fri, 27 Apr 2018 10:00:50 +0100
> From: Geoff Clare
> Message-ID: <20180427090050.GA2538@lt2.masqnet>
>
> quoting me:
> | > 4. On the question of bug 985 ... (kind of related) - if quote removal is
> | > added to case pattern processing, it makes that into a different case from all
> | > of the others. [...]
> |
> | The danger here is that there are references to quote removal elsewhere
>
> This isn't about any such potential dangers, which I don't think exist, but a
> case where it seems to make a difference.
>
> Consider this, where different shells produce different results:
>
> $SHELL -c 'LC_ALL=C; case B in ([[:"alpha":]]) printf M;; (*) printf X;; esac'
>
> bash bosh and pdksh print 'X' (fail to match), everything else I have tested
> (not posh or ksh88 - or a v7 sh) prints 'M' (matches). That includes mksh
> ksh93 and all the ash derived shells I have access to.

Since the POSIXified ksh88 prints "X", it seems that this is the result of a change in ksh93 that may not be POSIX compliant.

Jörg
Re: More questions/comments on XCU 2.13 (sh Pattern Matching)
On 4/27/18 10:02 AM, Robert Elz wrote:
> Date: Fri, 27 Apr 2018 15:06:52 +0200
> From: Joerg Schilling
> Message-ID: <5ae3206c.gzrnd81xboh3e0x7%joerg.schill...@fokus.fraunhofer.de>
>
> | Since bash seems to be the only shell that works this way,
>
> Until I changed the NetBSD sh (if that change is retained), yes.
>
> | I would call this a bug.
>
> Then I think it would be also a bug in POSIX (as I think it
> actually specifies this result) and a deficiency - as there
> really needs to be a way to store a pattern in a variable
> such that a pattern-magic character can be treated literally.
>
> I will leave it for Chet to say whether or not he considers this
> to be a bug in bash.

I don't. If a shell variable contains a literal backslash, that backslash should be treated as an escape character by the pattern matching engine. This is as the standard specifies.

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
Re: More questions/comments on XCU 2.13 (sh Pattern Matching)
Date: Fri, 27 Apr 2018 10:00:50 +0100
From: Geoff Clare
Message-ID: <20180427090050.GA2538@lt2.masqnet>

quoting me:
| > 4. On the question of bug 985 ... (kind of related) - if quote removal is
| > added to case pattern processing, it makes that into a different case from all
| > of the others. [...]
|
| The danger here is that there are references to quote removal elsewhere

This isn't about any such potential dangers, which I don't think exist, but a case where it seems to make a difference.

Consider this, where different shells produce different results:

$SHELL -c 'LC_ALL=C; case B in ([[:"alpha":]]) printf M;; (*) printf X;; esac'

bash, bosh and pdksh print 'X' (fail to match); everything else I have tested (not posh or ksh88 - or a v7 sh) prints 'M' (matches). That includes mksh, ksh93 and all the ash derived shells I have access to.

In pdksh the issue is just that char classes don't match at all (not implemented), so that one we can ignore. A true v7 sh would be the same. (In those the input word 'p]' matches - or variants of that.)

The original test had var=alpha and the pattern was [[:"$var":]], but that makes no difference at all (after expansion the two cases look the same). "No difference" means the different shells produce the same results this way as they do the other way, whether matching or not.

If either quote removal is specified to happen before pattern matching (but I really think that would break too many other cases), or if the way quoted strings are encoded in the shell is not literally as "string", then this matches (quoted "alpha" is still alpha). (Similarly if the pattern match code was "clever" about quotes in patterns, aside from \ - but it is not, in any shell, so I think that option is out of consideration.)

This works (with either the literal [[:alpha:]] or with [[:$var:]]) when the double quotes are not present (except in pdksh of course).

It does not work anywhere, and I would not really expect it to, with the pattern being [[:$var:]] (no quotes) with var='"alpha"' (though that would not be out of the question if the "clever" quotes in patterns model was adopted). (The actual test case gets a bit ugly to get the quoting right to allow that to be input, but that is not the issue.)

kre
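A minimal sketch of the variant reported above as working in every tested shell except pdksh: the class name is stored in a variable without embedded quotes and expanded inside the bracket expression, so after parameter expansion the pattern is plain `[[:alpha:]]`. The function name `classify` is hypothetical; its M/X output simply mirrors the test command in the message.

```shell
# Hypothetical helper: prints M if $1 is a single alphabetic character,
# X otherwise. The class name is stored WITHOUT embedded quotes, so the
# expanded pattern is [[:alpha:]] and no quote handling is involved.
classify() {
    cls=alpha
    case $1 in
        ([[:$cls:]]) printf 'M\n' ;;
        (*) printf 'X\n' ;;
    esac
}
```

For example, `classify B` should print `M` and `classify 7` should print `X` in a shell that implements character classes; the disputed case is only the one where the class name itself carries double quotes.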
Re: More questions/comments on XCU 2.13 (sh Pattern Matching)
Date: Fri, 27 Apr 2018 15:24:30 +0100
From: Geoff Clare
Message-ID: <20180427142430.GB9716@lt2.masqnet>

| This discussion seems to have come round to the same issue that was
| raised recently in some comments in bug 1190, specifically Stephane's
| notes 3960 and 3962 and my reply in note 3963.

Yes, I remembered seeing something like that, somewhere...

| In summary: the need for a way to store a pattern in a variable such
| that a pattern-magic character can be treated literally

Yes, that is the need.

| is a reason to keep the first paragraph of 2.13.1 as-is and say that
| shells which behave differently than bash here do not conform.

That would be nice. I was going to say that I expect that Jörg would not agree - but I see he has already done that.

For now the best that might be possible, given that almost no shells do this, would be to make it unspecified whether this works, and mark it as a future direction that a later rev will require it.

kre
Re: More questions/comments on XCU 2.13 (sh Pattern Matching)
Date: Fri, 27 Apr 2018 16:20:01 +0200
From: Joerg Schilling
Message-ID: <5ae33191.adgpivkbwgx8dc1y%joerg.schill...@fokus.fraunhofer.de>

| But you forgot that after this variable content is expanded, it is quoted in a
| way to keep the content in the final result.

I didn't forget that, because it doesn't happen. That's what the

bosh -c 'var="???";printf "%s\n" ${var}'

was meant to show. The "???" is not kept in the final result; it is expanded to produce all the 3 character filenames.

| This however requires the macro
| expansion code (parameter expansion) to quote the \ at the end of the macro
| expansion to allow the \ to be kept visible after the final quote removal.

It doesn't require anything of the kind. That \ is not subject to quote removal, as it was not part of the original word. Only quotes that were in the original word get removed.

Sure, quoting it might be one way to make that work, provided you can do it properly - but that does not duplicate the original shell. Remember, as you showed the code earlier, the original Bourne sh parsed original word quoting by setting the QUOTE bit on the quoted text. Results of expansions don't get that. Then quote removal is just clearing that bit - it is all simple (and easy to code, and small, which is why I assume it was done that way - despite all the idiotic quoting rules it has left us with).

| If this is not in the POSIX text,

It isn't, and should not be, as it is simply wrong.

The way the NetBSD sh (and original ash) copes with field splitting (and quote removal, or could, though that's actually done differently) is by remembering (and updating as it changes) offsets into the word to keep track of which chars are originals, and which are the results of expansions. The FreeBSD sh (being based upon ash) used to be the same, but they rewrote all of that part and now do it a different way (but certainly not by quoting the results of expansions).

kre
Re: More questions/comments on XCU 2.13 (sh Pattern Matching)
Joerg Schilling wrote, on 27 Apr 2018:
>
> Geoff Clare wrote:
>
> > In summary: the need for a way to store a pattern in a variable such
> > that a pattern-magic character can be treated literally is a reason to
> > keep the first paragraph of 2.13.1 as-is and say that shells which
> > behave differently than bash here do not conform.
>
> I am not convinced since _all_ other shells behave the same and since changing
> this in the shell would result in other misbehavior as well.
>
> Your wish would e.g. result in a misbehaving "case".

The comments in bug 1190 that I referred to (in the part you snipped) are about "case"!

--
Geoff Clare
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
Re: More questions/comments on XCU 2.13 (sh Pattern Matching)
Date: Fri, 27 Apr 2018 09:33:49 -0400
From: Shware Systems
Message-ID: <163074f534e-c83-4...@webjas-vaa062.srv.aolmail.net>

| For my analysis, 2.6.5 says it is results which are subject to field splitting,

Yes, but irrelevant here.

| with the parameter expansion and direct entry both being one field as the pattern to evaluate
| according to 2.6.6,

yes.

| and the treatment of the double quotes follows from 2.13.1

That is how I read the text. I kind of doubt that is how it is intended to work, but that is what it looks like to me as well.

| before removal by 2.6.7

Those quotes would not be removed by that, but that should only matter if the pattern matches no files - otherwise the pattern, and its quotes, is removed, and the file names produced appear instead.

| processing. 2.13.1 effectively has the quotes ignored,

That's how I read it. Of course, all this is based upon the (frankly bogus) specification that quoting characters in words are retained as is in the word for later processing.

| using only the chars in between (the one ?), for matching purposes.

Yes, again, that is how I would read the current text.

| 2.6.7 does not properly account for that when a pattern has been evaluated,
| the ignored quotes are required to be removed to reflect the intent of the pattern.

No, that's not what happens. If the pattern matches any files, the pattern vanishes, and the matched file names replace it (as many fields as needed). Any quote characters produced there (files that contain quote characters in their names) must be retained (I have plenty of those in my test directory). If the pattern does not match, the word will be retained unchanged, and the quotes will remain in it. That's actually useful.

| What is there now is more the requirements when set -f in effect,

No, it is not that - filename generation still happens; what's missing is any processing of the quote characters.

| and then quotes from var expansions, not being in the original input, would be
| expected to stay in the result as literals.

Yes, agreed - either when filename expansion does not happen, or when no files are matched.

kre

ps: please could you avoid top posting - my messages are long and boring enough the first time, no-one needs to get them resent in full as a part of a reply!
Re: More questions/comments on XCU 2.13 (sh Pattern Matching)
Geoff Clare wrote:
> In summary: the need for a way to store a pattern in a variable such
> that a pattern-magic character can be treated literally is a reason to
> keep the first paragraph of 2.13.1 as-is and say that shells which
> behave differently than bash here do not conform.

I am not convinced, since _all_ other shells behave the same and since changing this in the shell would result in other misbehavior as well.

Your wish would e.g. result in a misbehaving "case".

Jörg
Re: More questions/comments on XCU 2.13 (sh Pattern Matching)
Date: Fri, 27 Apr 2018 15:23:10 +0200
From: Joerg Schilling
Message-ID: <5ae3243e.8dyd5s4eftmrpyui%joerg.schill...@fokus.fraunhofer.de>

| Robert Elz wrote:
|
| > But it looked right, so I changed (not yet committed,
|
| This would be a mistake.

Perhaps.

| > Then I started pondering other quote characters, since the quote
| > characters are still in the string, that is, if the command were
| >
| > $SHELL -c 'printf "%s\n" [a-e]\?.*'
|
| This is a different example, as you here have a quoted '?' instead of a quoted
| \ as in the first example.

There was never a quoted \ (except in the assignment to var).

| > bosh -c 'printf "%s\n" [a-e]\?.*'
| > a?.??
| > b?.??
| > c?.??
| > e?.??
|
| See above, a different example results in a different behavior.

Of course, but the original example was

${SHELL} -c 'var="[a-e]\?.*";printf "%s\n" ${var}'

or

${SHELL} -c 'var="[a-e]\\?.*";printf "%s\n" ${var}'

which are identical to each other in effect. The only difference from the bosh example above is that this one has the pattern (the same pattern) in a variable, where the bosh one had it on the command line.

| > bosh -c 'var="???";printf "%s\n" ${var}' | wc -l
| > 2297
|
| I am not sure what this should point to.

It indicates that the results of a variable expansion are not "internally quoted", which is how you justified the earlier example not working. If the ${var} result was somehow quoted, the ? chars that result would be quoted, and so would not be matching characters. But they're not, so they are. This is working as it should be, and there is no "internal quoting" being performed.

kre
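The `???` demonstration above depends on the files in the author's directory. The same point can be sketched with `case`, which needs no files: the `?` characters produced by an unquoted expansion act as matching characters, so they are evidently not internally quoted. The function name `three_chars` is hypothetical; only standard `case` behaviour is assumed.

```shell
# Hypothetical helper: does $1 match the pattern held in var?
# var is expanded UNQUOTED in the case pattern, so each '?' in its
# value matches any single character rather than a literal '?'.
three_chars() {
    var='???'
    case $1 in
        $var) printf 'match\n' ;;
        *) printf 'nomatch\n' ;;
    esac
}
```

So `three_chars abc` prints `match`, while `three_chars ab` and `three_chars abcd` print `nomatch`.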
Re: More questions/comments on XCU 2.13 (sh Pattern Matching)
Robert Elz wrote, on 27 Apr 2018:
>
> Date: Fri, 27 Apr 2018 15:06:52 +0200
> From: Joerg Schilling
>
> | Since bash seems to be the only shell that works this way,
>
> Until I changed the NetBSD sh (if that change is retained), yes.
>
> | I would call this a bug.
>
> Then I think it would be also a bug in POSIX (as I think it
> actually specifies this result) and a deficiency - as there
> really needs to be a way to store a pattern in a variable
> such that a pattern-magic character can be treated literally.

This discussion seems to have come round to the same issue that was raised recently in some comments in bug 1190, specifically Stephane's notes 3960 and 3962 and my reply in note 3963.

In summary: the need for a way to store a pattern in a variable such that a pattern-magic character can be treated literally is a reason to keep the first paragraph of 2.13.1 as-is and say that shells which behave differently than bash here do not conform.
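For contrast, here is a sketch of the two forms that every POSIX shell agrees on (the function name `kind_of_match` is hypothetical): quoting the whole expansion makes every character of the stored pattern literal, and leaving it unquoted makes every magic character active. What the thread is looking for is the middle ground, where only some characters of a stored pattern are literal.

```shell
# Hypothetical helper showing the all-or-nothing behaviour of a
# pattern stored in a variable and used in a case statement.
kind_of_match() {
    pat='a*'
    case $1 in
        "$pat") printf 'literal\n' ;;   # quoted: '*' has no special meaning
        $pat)   printf 'pattern\n' ;;   # unquoted: '*' matches any string
        *)      printf 'none\n' ;;
    esac
}
```

`kind_of_match 'a*'` prints `literal`, `kind_of_match abc` prints `pattern`, and `kind_of_match b` prints `none`.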
Re: More questions/comments on XCU 2.13 (sh Pattern Matching)
Date: Fri, 27 Apr 2018 15:17:41 +0200
From: Joerg Schilling
Message-ID: <5ae322f5.uw3u84gim9o+bvrx%joerg.schill...@fokus.fraunhofer.de>

| See my recent reply, this does not result in a quoted \.

Of course it doesn't - no-one wants (or ever attempted) a quoted \, we want a quoted '?'.

kre
Re: More questions/comments on XCU 2.13 (sh Pattern Matching)
Robert Elz wrote:
> The examples with "" characters I expect will simply remain as they
> are in all shells, and the code I have been in the process of writing
> to allow that to "work" (based on the assumption that there is no reason
> why not - and even now, except that it doesn't work that way in other
> shells, I see no good reason to doubt) should just be consigned to the
> scrap heap (that code doesn't even compile yet, so no big loss.)
>
> | In your example, expand() is told to expand:
> |
> | [a-e]\\?.*
>
> No it isn't. I said the \\ was irrelevant and I meant it.
>
> In
> var="[a-e]\\?.*"
>
> which is the command that was used, the first \ is a quoting
> character, and is removed by quote removal (as are the
> enclosing "") just before the assignment to var is performed.
>
> The value assigned to var is
>
> [a-e]\?.*

But you forgot that after this variable content is expanded, it is quoted in a way to keep the content in the final result. This however requires the macro expansion code (parameter expansion) to quote the \ at the end of the macro expansion to allow the \ to be kept visible after the final quote removal.

If this is not in the POSIX text, this is a bug of the same quality as the incorrect Backus-Naur grammar for the shell in the POSIX standard text.

Jörg
Re: More questions/comments on XCU 2.13 (sh Pattern Matching)
Robert Elz wrote, on 27 Apr 2018:
>
> Date: Fri, 27 Apr 2018 10:00:50 +0100
> From: Geoff Clare
>
> | I believe the former text is misleading and should be deleted. It is
> | effectively duplicating the requirements regarding backslashes stated in
> | 2.2.1 and 2.2.3, but gets the details wrong.
>
> Except that here it is talking about quoting characters in patterns,

Oops, you're right. For some reason I had it in my head that this special pattern-matching meaning was covered elsewhere, but now that I look again I see that this is the place.
Re: More questions/comments on XCU 2.13 (sh Pattern Matching)
Date: Fri, 27 Apr 2018 15:06:52 +0200
From: Joerg Schilling
Message-ID: <5ae3206c.gzrnd81xboh3e0x7%joerg.schill...@fokus.fraunhofer.de>

| Since bash seems to be the only shell that works this way,

Until I changed the NetBSD sh (if that change is retained), yes.

| I would call this a bug.

Then I think it would be also a bug in POSIX (as I think it actually specifies this result) and a deficiency - as there really needs to be a way to store a pattern in a variable such that a pattern-magic character can be treated literally.

I will leave it for Chet to say whether or not he considers this to be a bug in bash.

| I tested Historic Bourne, ksh88, ksh92, dash, yash, mksh posh, zsh, bosh.

I agree, and the FreeBSD and currently released (and all available) NetBSD shells as well.

| BTW: with the previous example, the "expand" function is told to expand:
|
| a*"?

That's the one where I missed the closing quote (deliberately) - let's just forget that one for now until we get a real conclusion on what should happen with pairs of quotes (and more importantly, \ quoting).

The examples with "" characters I expect will simply remain as they are in all shells, and the code I have been in the process of writing to allow that to "work" (based on the assumption that there is no reason why not - and even now, except that it doesn't work that way in other shells, I see no good reason to doubt) should just be consigned to the scrap heap (that code doesn't even compile yet, so no big loss).

| In your example, expand() is told to expand:
|
| [a-e]\\?.*

No it isn't. I said the \\ was irrelevant and I meant it.

In

var="[a-e]\\?.*"

which is the command that was used, the first \ is a quoting character, and is removed by quote removal (as are the enclosing "") just before the assignment to var is performed.

The value assigned to var is

[a-e]\?.*

which is exactly the same as when the command was

var="[a-e]\?.*"

as there the \ is not a quoting character, as '?' isn't one of the magic few that \ can quote inside a double quoted string - but another \ is.

If I had used

var='[a-e]\\?.*'

that would be different: there neither \ is a quoting char, and what you said would be expanded would be correct. But that is not what was done. (I was using, as I always do when I can, single quotes around the arg to sh -c; using single quotes inside that string then gets ugly, which is bad for examples when the quoting is not the point, so I avoid that when possible. Of course, the test cases include examples like that - it doesn't matter if they're incomprehensible.)

| But:
|
| sh -c 'var="[a-e]?.*";printf "%s\n" ${var}'
| a?.??
|
| ...I have only one matching file.

This is an entirely different pattern, which matches a whole different set of files (including the ones that the other pattern matches - sometimes):

bosh -c 'var="[a-e]?.*";printf "%s\n" ${var}' | wc -l
84

Again, the wc is just because you really don't want to see the list of odd filenames that match that pattern. bosh is correct incidentally; all shells produce the same 84 files, but this is a very easy case.

The idea is to match files that contain a letter (one of the 5) followed by a literal character '?' followed by a literal character '.' followed by anything at all. And to store that pattern in a variable. The literal '.' is no problem; the question is how to encode the literal '?'. I showed one way, using pattern magic, in my reply to Geoff; the question is why not using shell quoting as well.

Note that the section in 2.13.1 (which Geoff says is the correct explanation of quoting in patterns) says:

   When pattern matching is used where shell quote removal is not
   performed [...] special characters can be escaped to remove their
   special meaning by preceding them with a <backslash> character.

"special characters" there is referring to the '*', '?' and '[' chars, and the section goes on to allow \\ for matching a literal '\'. Since 2.6.7 (Quote Removal) says ...

   The quote characters (<backslash>, single-quote, and double-quote)
   that were present in the original word shall be removed unless they
   have themselves been quoted.

which means that quote removal is not performed on text in a word that came from the results of an expansion (that's not the original word), and so one could read 2.13.1 as saying that \ quoting of special characters is available in this context, since quote removal is not performed there. (That then makes it just the same as in literal patterns in the text, though there the \ acts as a quoting character, and quotes the special characters that way.)

Now I am quite willing to admit (especially given that shells have not historically implemented this this way) that this might not be intended, and that perhaps the spec needs to be changed to make this more clear - but as it is wr
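The claim above, that the doubled backslash is irrelevant inside double quotes but not inside single quotes, can be checked directly. This sketch (function name `show_values` is hypothetical) prints the value each assignment actually stores; `printf '%s\n'` is used rather than `echo` because some `echo` implementations interpret backslashes themselves.

```shell
# Hypothetical helper: print the value stored by three assignments.
show_values() {
    a="[a-e]\\?.*"   # inside "...", \\ is a quoted backslash: stores [a-e]\?.*
    b="[a-e]\?.*"    # '?' cannot be quoted by \ inside "...", so \ is kept: [a-e]\?.*
    c='[a-e]\\?.*'   # inside '...', both backslashes are kept: [a-e]\\?.*
    printf '%s\n' "$a" "$b" "$c"
}
```

The first two lines of output are identical (`[a-e]\?.*`); only the single-quoted assignment stores two backslashes.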
Re: More questions/comments on XCU 2.13 (sh Pattern Matching)
For my analysis, 2.6.5 says it is results which are subject to field splitting, with the parameter expansion and direct entry both being one field as the pattern to evaluate according to 2.6.6, and the treatment of the double quotes follows from 2.13.1 before removal by 2.6.7 processing. 2.13.1 effectively has the quotes ignored, using only the chars in between (the one ?), for matching purposes. 2.6.7 does not properly account for that when a pattern has been evaluated, the ignored quotes are required to be removed to reflect the intent of the pattern. What is there now is more the requirements when set -f in effect, and then quotes from var expansions, not being in the original input, would be expected to stay in the result as literals.

On Friday, April 27, 2018 Robert Elz wrote:
> Date: Fri, 27 Apr 2018 11:03:57 +0200
> From: Joerg Schilling
> Message-ID: <5ae2e77d.95ubF707FXNl6/H/%joerg.schill...@fokus.fraunhofer.de>
>
> First, a (minor) apology - I should have made it clear that, yes, "set +f" was intended, and that IFS was not intended to contain any unusual values (no 'a' '*' "'"' '\' or '?' in it... ) Obviously anything like that would alter the results, and that kind of bizarreness is not what I was seeking to query - and if I was, those pre-conditions would not have been forgotten.
>
> | XCU 2.6.5 explains what happens after parameter expansion, the quoting happens
> | as the last action during parameter expansion.
>
> 2.6.5 is field splitting, which while it would normally be attempted in the example I gave, would do nothing - and we could disable it by assuming IFS='' if wanted - that should change nothing. But in any case, unless some new text has been added in the resolution of some bug that I am unaware of (which is most of them...) I see nothing in 2.6.5 which is even remotely similar to what you said. Can you cut/paste the relevant words, or quote line numbers, or if there's a change that is not yet in the published text, the bug number?
>
> | The text related to double quotes refers only to "spaces" inside the result.
>
> No, it means IFS characters - that is, something that was quoted is not subject to field splitting - that's usually white space, but doesn't have to be, but I agree, that's not relevant to anything here (since field splitting is not going to change anything anyway, we can simply disable it, with IFS='').
>
> | If you like, check:
> |
> | $shell -c "var='a*\"?\"'; echo \$var"
> |
> | all shells agree here ;-)
>
> Yes, they probably do in that case. They don't however in the case that originally caused me to start looking at this.
>
> [Aside: Martijn Dekker's modernish found some problems with NetBSD's pattern matching - minor and obscure ones - but clearly bugs, and then when I started testing, I found a few more ... so I created a large set of tests for everything obscure and weird I could think of and these messages are the result of that: before I can "fix" anything I need to understand what is the correct result, and why.]
>
> The problem case is:
>
> ${SHELL} -c 'var="[a-e]\\?.*";printf "%s\n" ${var}'
>
> There are 4 files in $PWD (when the above command is executed) with names that start with a char in [a-e] followed by a '?' followed by a '.' followed by two more '?' chars (and lots more irrelevant files).
>
> Almost all shells simply print
>
> [a-e]\?.*
>
> which is the string assigned to "var" (whether the original input has one or two \ characters makes no difference, and nor should it).
>
> But bash doesn't (the -o posix given here makes no difference):
>
> bash -o posix -c 'var="[a-e]\\?.*";printf "%s\n" ${var}'
> a?.??
> b?.??
> c?.??
> e?.??
>
> So I started wondering why, and looked at the spec, and could find nothing to suggest this should not be the result; rather, the text to me reads as if it should be. Even though nothing else I have available to test does that.
>
> But it looked right, so I changed (not yet committed, nor are the other bug fixes I have made to this) the NetBSD sh to produce the same result as bash:
>
> ${SH} -c 'var="[a-e]\\?.*";printf "%s\n" ${var}'
> a?.??
> b?.??
> c?.??
> e?.??
>
> (${SH} is the obscure pathname to the uninstalled test build of my development version of the NetBSD sh - I have it in a var because it is way too long to type...) whereas the old way:
>
> sh -c 'var="[a-e]\\?.*";printf "%s\n" ${var}'
> [a-e]\?.*
>
> the same as everyone else.
>
> Then I started pondering other quote characters, since the quote characters are still in the string, that is, if the command were
>
> $SHELL -c 'printf "%s\n" [a-e]\?.*'
>
> (here it is important that there just be one '\') all shells agree that the result where the 4 file names are printed is correct. For example:
>
> bosh -c 'printf "%s\n" [a-e]\?.*'
> a?.??
> b?.??
> c?.??
> e?.??
>
> In your earlier reply you said ...
>
> | The result of a shell macro expansion is quoted internally before quote
> | removal is applied.
>
> but I cannot find any text anywhere which mandates that, and what's more, it is nothing like what really happens:
>
> bosh -c 'var="???";printf "%s\n" ${var}' | wc
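A small harness for reproducing the problem case above under whatever shells happen to be installed. It is only a sketch: the function name `compare_shells`, the scratch directory from `mktemp -d` (widely available but not required by POSIX) and the two file names are assumptions of this example, and it deliberately does not decide which output is correct.

```shell
# Hypothetical harness: run the problem case under each shell named on
# the command line, inside a scratch directory holding one matching file.
compare_shells() {
    dir=$(mktemp -d) || return 1
    (
        cd "$dir" || exit 1
        # Only 'a?.??' has a literal '?' as its second character.
        touch 'a?.??' 'ab.??'
        for sh in "$@"; do
            command -v "$sh" >/dev/null 2>&1 || continue
            printf '%s: ' "$sh"
            # The inner shell stores [a-e]\?.* in var (\\ collapses to \
            # inside double quotes), then expands it unquoted.
            "$sh" -c 'var="[a-e]\\?.*"; printf "%s " ${var}; echo'
        done
    )
    rm -rf "$dir"
}
```

For example, `compare_shells bash dash ksh mksh yash bosh`. Going by the reports in this thread, a shell that treats the backslash produced by the expansion as an escape should print `a?.??`, while the others print the unmatched pattern `[a-e]\?.*` unchanged.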
Re: More questions/comments on XCU 2.13 (sh Pattern Matching)
Robert Elz wrote:

> But it looked right, so I changed (not yet committed, nor are the other
> bug fixes I have made to this) the NetBSD sh to produce the same
> result as bash:
>
> ${SH} -c 'var="[a-e]\\?.*";printf "%s\n" ${var}'
> a?.??
> b?.??
> c?.??
> e?.??

This would be a mistake.

> Then I started pondering other quote characters, since the quote
> characters are still in the string, that is, if the command were
>
> $SHELL -c 'printf "%s\n" [a-e]\?.*'

This is a different example, as you here have a quoted '?' instead of a quoted \ as in the first example.

> (here it is important that there just be one '\') all shells agree, that the
> result where the 4 file names are printed is correct. For example:
>
> bosh -c 'printf "%s\n" [a-e]\?.*'
> a?.??
> b?.??
> c?.??
> e?.??

See above, a different example results in a different behavior.

> In your earlier reply you said ...
>
> | The result of a shell macro expansion is quoted internally before quote
> | removal is applied.
>
> but I cannot find any text anywhere which mandates that, and what's more,
> it is nothing like what really happens:
>
> bosh -c 'var="???";printf "%s\n" ${var}' | wc -l
> 2297

I am not sure what this should point to.

Jörg
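A minimal sketch of the uncontested case (a backslash typed directly in the command text), assuming a POSIX sh with pathname expansion enabled (set +f) and a mktemp that accepts -d (widely available, but an assumption here). The file names a?.xy, ab.xy and b?.zz are made up for illustration: the backslash quotes the '?', so only names containing a literal '?' match, while an unquoted '?' matches any single character.

```sh
#!/bin/sh
# Scratch directory holding made-up file names (mktemp -d is assumed).
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1
touch 'a?.xy' 'ab.xy' 'b?.zz'

# Backslash in the original command text: '?' is quoted and only
# matches a literal '?'.
set -- [a-e]\?.*
quoted_q=$#

# Unquoted '?': matches any single character.
set -- [a-e]?.*
unquoted_q=$#

printf 'quoted ?: %s match(es), unquoted ?: %s match(es)\n' "$quoted_q" "$unquoted_q"

cd / && rm -rf "$tmp"
```

Here the first pattern should match a?.xy and b?.zz only, and the second all three names.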
Re: More questions/comments on XCU 2.13 (sh Pattern Matching)
Robert Elz wrote:

> We could require that, when stored in a variable, we quote
> things in pattern style "quoting" rather than shell style, that is,
> to take the example from my immediately previous message,
>
> $SHELL -c 'var="[a-e][?].*";printf "%s\n" ${var}'
>
> lists the 4 filenames expected, for all values of $SHELL.

See my recent reply, this does not result in a quoted \.

Jörg
Re: More questions/comments on XCU 2.13 (sh Pattern Matching)
Robert Elz wrote:

Hi, first the easy case:

> [Aside: Martijn Dekker's modernish found some problems with NetBSD's
> pattern matching - minor and obscure ones - but clearly bugs, and then
> when I started testing, I found a few more ... so I created a large set of
> tests for everything obscure and weird I could think of and these
> messages are the result of that: before I can "fix" anything I need to
> understand what is the correct result, and why.]
>
> The problem case is:
>
> ${SHELL} -c 'var="[a-e]\\?.*";printf "%s\n" ${var}'
>
> There are 4 files in $PWD (when the above command is executed)
> with names that start with a char in [a-e] followed by a '?' followed
> by a '.' followed by two more '?' chars - and lots more irrelevant files).
>
> Almost all shells simply print
> [a-e]\?.*
> which is the string assigned to "var" (whether the original input has
> one or two \ characters makes no difference, and nor should it.)
>
> But bash doesn't: (the -o posix given here makes no difference)
>
> bash -o posix -c 'var="[a-e]\\?.*";printf "%s\n" ${var}'
> a?.??
> b?.??
> c?.??
> e?.??

Since bash seems to be the only shell that works this way, I would call this a bug.

I tested Historic Bourne, ksh88, ksh92, dash, yash, mksh, posh, zsh, bosh.

BTW: with the previous example, the "expand" function is told to expand:

a*"?

In your example, expand() is told to expand:

[a-e]\\?.*

and this must not match the files you mention. The double backslash is the quoting caused at the end of the macro expansion that I mentioned before.

sh -c 'var="[a-e]\?.*";printf "%s\n" ${var}'
[a-e]\?.*

But:

sh -c 'var="[a-e]?.*";printf "%s\n" ${var}'
a?.??

...I have only one matching file.

Jörg
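The contested case can be probed rather than asserted. A sketch, assuming POSIX sh, set +f, default IFS and mktemp -d (an assumption); the file names are made up. It reports which reading the running shell takes for a backslash that arrives through an unquoted parameter expansion. If the backslash is an ordinary character, nothing matches and the pattern is left unchanged; if it quotes the '?', only a?.xy matches.

```sh
#!/bin/sh
# Scratch directory holding made-up file names (mktemp -d is assumed).
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1
touch 'a?.xy' 'ab.xy'

var='[a-e]\?.*'    # exactly one backslash in the value
set -- $var        # unquoted: the value is used as a pattern
fields=$#
result=$1

if [ "$result" = "$var" ]; then
    # No file contains a backslash, so nothing matched.
    echo 'backslash from the expansion is an ordinary character'
elif [ "$result" = 'a?.xy' ]; then
    # The backslash was taken as quoting the following '?'.
    echo 'backslash from the expansion quotes the next character'
else
    echo "unexpected result: $result"
fi

cd / && rm -rf "$tmp"
```

Going by the reports in this thread, bash (and the uncommitted NetBSD sh build) would be expected to take the second branch and the other shells tested the first; running it under each $SHELL of interest is the point.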
Re: More questions/comments on XCU 2.13 (sh Pattern Matching)
Date:Fri, 27 Apr 2018 10:00:50 +0100
From:Geoff Clare
Message-ID: <20180427090050.GA2538@lt2.masqnet>

| I believe the former text is misleading and should be deleted. It is
| effectively duplicating the requirements regarding backslashes stated in
| 2.2.1 and 2.2.3, but gets the details wrong.

Except that here it is talking about quoting characters in patterns, where different ones need to be quoted than when parsing. If we were to require that only "original" quotes can quote characters in patterns, this wouldn't matter, but if we do that, I don't think there is any way that we can (reasonably) store a pattern in a variable where the pattern is to match a literal magic char (say an asterisk, or question-mark) - that is, unless in that context we were to require only "pattern" type quoting to ever be used.

Note "eval" doesn't really help - that removes quotes, where we need to add them, and while it is possible to write a pattern in a form where it can be eval'd and produce the desired result, that isn't something that I would normally expect almost anyone to be able to work out how to do correctly (and safely - given that the entire command needs to be eval'd there's no way to do just the pattern word in question).

We could require that, when stored in a variable, we quote things in pattern style "quoting" rather than shell style, that is, to take the example from my immediately previous message,

$SHELL -c 'var="[a-e][?].*";printf "%s\n" ${var}'

lists the 4 filenames expected, for all values of $SHELL.

This means to quote a * ? or [ (and to be safe) \ outside a bracket expression, one must include it in a (single character) bracket expression, and in a bracket expression, to quote ! (or ^ if applicable) ] and '-' they need to be written in the correct magic order so their special properties are lost.
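A minimal sketch of that pattern-style quoting in POSIX sh. pat_matches is a hypothetical helper, not something from the standard or any shell; it uses the stored pattern unquoted in a case statement so that the pattern characters taken from the variable stay special. Only the uncontested bracket-expression rules are used, so the results should not depend on the backslash question.

```sh
#!/bin/sh
# Hypothetical helper: succeed if string $1 matches the pattern held in $2.
# $2 is deliberately unquoted in the case pattern so that its pattern
# characters stay special.
pat_matches() {
    case $1 in
    $2) return 0 ;;
    esac
    return 1
}

# Outside a bracket expression: a single-character bracket expression
# makes the magic character literal.
lit_q='a[?]b'      # literal '?'
lit_star='a[*]b'   # literal '*'
lit_lbr='a[[]b'    # literal '['

# Inside a bracket expression: position does the quoting.
close_br='[]x]'    # ']' first in the list is literal
bang='[x!]'        # '!' not first in the list is literal
dash='[x-]'        # '-' last in the list is literal

for s in 'a?b' 'axb'; do
    if pat_matches "$s" "$lit_q"; then
        echo "$s matches $lit_q"
    else
        echo "$s does not match $lit_q"
    fi
done
```

The loop should report that a?b matches a[?]b while axb does not.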
But I think if that is to be the solution, we will need to spell it out very clearly, and at the same time explain why a pattern in a variable has a whole set of different rules than a pattern simply written on the command line.

| > But in a pattern? Which of these two applies?
|
| Depends where the pattern is. Anywhere double quotes have an effect,
| the backslash-within-double-quotes rule applies. Elsewhere the "normal"
| rule applies.

But the backslash within double quotes only applies the \ to quote the double quote string magic chars ($ " ` \ and newline) whereas for patterns what matters is the pattern magic chars (* ? [ etc). Is that really what is supposed to happen?

| > 4. On the question of bug 985 ... (kind of related) - if quote removal is
| > added to case pattern processing, [...]
|
| The danger here is that there are references to quote removal elsewhere
| that could mean the wrong thing if case patterns are not subject to
| quote removal. You actually quoted one of these above from 2.13.1,

You could "fix" that by specifying that the pattern in a case statement be subject to quote removal after the pattern has been used to match against the word (the same way that filename patterns are subject to quote removal after they have been used to match). That would be easy to implement, as the expanded pattern is just discarded after it has failed to match (the original text remains for the next iteration of the enclosing loop or whatever, if any, but that's unchanged in all cases.)

| When pattern matching is used where shell quote removal is not
| performed, ...
|
| This would apply to case patterns if quote removal is not performed
| for them.

Yes, it would. But ...

| Okay, we could change this condition to something else but
| can we be sure there aren't other similar side effects? Are you
| willing to search through the standard for every occurrence of the
| substring "quot"?

Huh?
I'm confused - what other side effects are possible from changing the wording about how pattern matching in case statements is done? No-one is proposing altering what quote removal means, or how that is performed. Just whether it should be done in this particular case, and what that means.

But yes, I do believe that the whole of 2.13 needs extensive revision, not just fiddling here and there.

I'll leave your answer to the 2nd half of (or the addendum to) my message from this morning until you have had time to consider my reply to Jörg (and Mark), as you (more or less) said the same as Jörg.

kre
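To make the double-quote point concrete, a small sketch in POSIX sh (the variable names are made up). Inside "..." a backslash only quotes $ ` " \ and newline, so in front of a pattern character such as * it is kept as an ordinary character; outside quotes it quotes the * and is itself removed.

```sh
#!/bin/sh
dq_star="a\*b"     # '*' is not special inside "...", so the backslash stays
dq_dollar="a\$b"   # '$' is special inside "...", so the backslash is removed
uq_star=a\*b       # unquoted: the backslash quotes '*' and is removed
printf '%s\n' "$dq_star" "$dq_dollar" "$uq_star"
# a\*b
# a$b
# a*b
```

So a double-quoted pattern word can end up carrying backslashes in front of pattern characters, which is exactly where the two backslash rules in 2.13.1 pull in different directions.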
Re: More questions/comments on XCU 2.13 (sh Pattern Matching)
Date:Fri, 27 Apr 2018 11:03:57 +0200
From:Joerg Schilling
Message-ID: <5ae2e77d.95ubF707FXNl6/H/%joerg.schill...@fokus.fraunhofer.de>

First, a (minor) apology - I should have made it clear that, yes, "set +f" was intended, and that IFS was not intended to contain any unusual values (no 'a' '*' "'"' '\' or '?' in it... ). Obviously anything like that would alter the results, and that kind of bizarreness is not what I was seeking to query - and if I was, those pre-conditions would not have been forgotten.

| XCU 2.6.5 explains what happens after parameter expansion, the quoting happens
| as the last action during parameter expansion.

2.6.5 is field splitting, which while it would normally be attempted in the example I gave, would do nothing - and we could disable it by assuming IFS='' if wanted - that should change nothing.

But in any case, unless some new text has been added in the resolution of some bug that I am unaware of (which is most of them...) I see nothing in 2.6.5 which is even remotely similar to what you said. Can you cut/paste the relevant words, or quote line numbers, or if there's a change that is not yet in the published text, the bug number?

| The text related to double quotes refers only to "spaces" inside the result.

No, it means IFS characters - that is, something that was quoted is not subject to field splitting - that's usually white space, but doesn't have to be, but I agree, that's not relevant to anything here (since field splitting is not going to change anything anyway, we can simply disable it, with IFS='').

| If you like, check:
|
| $shell -c "var='a*\"?\"'; echo \$var"
|
| all shells agree here ;-)

Yes, they probably do in that case. They don't however in the case that originally caused me to start looking at this.

[Aside: Martijn Dekker's modernish found some problems with NetBSD's pattern matching - minor and obscure ones - but clearly bugs, and then when I started testing, I found a few more ...
so I created a large set of tests for everything obscure and weird I could think of and these messages are the result of that: before I can "fix" anything I need to understand what is the correct result, and why.]

The problem case is:

${SHELL} -c 'var="[a-e]\\?.*";printf "%s\n" ${var}'

There are 4 files in $PWD (when the above command is executed) with names that start with a char in [a-e] followed by a '?' followed by a '.' followed by two more '?' chars (and lots more irrelevant files).

Almost all shells simply print

[a-e]\?.*

which is the string assigned to "var" (whether the original input has one or two \ characters makes no difference, and nor should it.)

But bash doesn't: (the -o posix given here makes no difference)

bash -o posix -c 'var="[a-e]\\?.*";printf "%s\n" ${var}'
a?.??
b?.??
c?.??
e?.??

So I started wondering why, and looked at the spec, and could find nothing to suggest this should not be the result; rather, the text to me reads as if it should be. Even though nothing else I have available to test does that.

But it looked right, so I changed (not yet committed, nor are the other bug fixes I have made to this) the NetBSD sh to produce the same result as bash:

${SH} -c 'var="[a-e]\\?.*";printf "%s\n" ${var}'
a?.??
b?.??
c?.??
e?.??

(${SH} is the obscure pathname to the uninstalled test build of my development version of the NetBSD sh - I have it in a var because it is way too long to type...)

whereas the old way:

sh -c 'var="[a-e]\\?.*";printf "%s\n" ${var}'
[a-e]\?.*

the same as everyone else.

Then I started pondering other quote characters, since the quote characters are still in the string, that is, if the command were

$SHELL -c 'printf "%s\n" [a-e]\?.*'

(here it is important that there just be one '\') all shells agree that the result where the 4 file names are printed is correct. For example:

bosh -c 'printf "%s\n" [a-e]\?.*'
a?.??
b?.??
c?.??
e?.??

In your earlier reply you said ...
| The result of a shell macro expansion is quoted internally before quote
| removal is applied.

but I cannot find any text anywhere which mandates that, and what's more, it is nothing like what really happens:

bosh -c 'var="???";printf "%s\n" ${var}' | wc -l
2297

(the wc is there just because (as shown) there are way too many 3 character filenames to include the printf output directly...)

If "The result of a shell macro expansion is quoted internally" was happening, then this example would look like

bosh -c 'printf "%s\n" "???" | wc -l'
1

(the '1' being the literal string "???" of course). Instead, what we're getting is:

bosh -c 'printf "%s\n" ??? | wc -l'
2297

which shows that the results of the macro expansion are not internally quoted. All shells do
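The same observation in a self-contained form, as a sketch assuming POSIX sh, set +f, default IFS and mktemp -d (an assumption); the file names abc and xyz are made up. Counting fields instead of piping to wc shows that '???' from an unquoted expansion is still a live pattern, while the double-quoted expansion stays one literal field.

```sh
#!/bin/sh
# Scratch directory holding two made-up 3-character names.
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1
touch abc xyz

var='???'
set -- $var        # unquoted: pathname expansion applies to the value
unquoted_count=$#
set -- "$var"      # double-quoted: one literal field, no matching
quoted_count=$#

printf 'unquoted: %s field(s), quoted: %s field(s)\n' "$unquoted_count" "$quoted_count"

cd / && rm -rf "$tmp"
```

In the empty scratch directory the unquoted form should give 2 fields (abc and xyz) and the quoted form 1.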
Re: More questions/comments on XCU 2.13 (sh Pattern Matching)
Geoff Clare wrote:
> Robert Elz wrote, on 27 Apr 2018:
> >
> > Oh, one more thing about patterns - a question this time, though the
> > answer might end up suggesting more text that needs to be in
> > the standard.
> >
> > If I have
> >
> > var='a*"?"'
> >
> > and then I do
> >
> > echo $var
> >
> > what should the result be? Is this absolutely the same as
> >
> > echo a*"?"
> >
> > ?
>
> No it's not the same. The shell expands $var to all filenames that
> start with 'a' and end with double-quote, any character, double-quote.

Which is a result of the way the internal quoting is added to the parameter expansion result.

Jörg
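A sketch of the behavior Geoff describes, assuming POSIX sh, set +f, default IFS and mktemp -d (an assumption); the file names are made up. The double quotes come from the expansion, so they are ordinary characters in the pattern while * and ? stay special. This is what bash and dash are expected to do; the thread shows not every shell is in agreement on related cases, so treat it as an illustration rather than a conformance test.

```sh
#!/bin/sh
# Scratch directory holding made-up names; only the first should match.
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1
touch 'ab"x"' 'abx'

var='a*"?"'
set -- $var        # the '"' characters are data here, not quoting
count=$#
first=$1
printf '%s\n' "$@"

cd / && rm -rf "$tmp"
```

The pattern a*"?" needs a literal '"', any one character, then another literal '"' at the end, so only ab"x" should be listed.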
Re: More questions/comments on XCU 2.13 (sh Pattern Matching)
Shware Systems wrote:
> According to XCU 2.6.5, it's treated literally only when double quoted, e.g.
> "$var", otherwise quote removal should still occur on the variable's contents
> after any field splitting...

XCU 2.6.5 explains what happens after parameter expansion, the quoting happens as the last action during parameter expansion.

The text related to double quotes refers only to "spaces" inside the result.

If you like, check:

$shell -c "var='a*\"?\"'; echo \$var"

all shells agree here ;-)

Jörg
Re: More questions/comments on XCU 2.13 (sh Pattern Matching)
Robert Elz wrote, on 27 Apr 2018:
>
> 1. There is text dealing with backslash processing at 2 separate places in
> 2.13.1. First at lines 76212-3
>
> A <backslash> character shall escape the following character.
> The escaping <backslash> shall be discarded.
>
> and then at lines 76232-8 (which is on the following page)
>
> When pattern matching is used where shell quote removal is not performed
> (such as in the argument to the find -name primary when find is being called
> using one of the exec functions as defined in the System Interfaces volume
> of POSIX.1-2008, or in the pattern argument to the fnmatch() function), special
> characters can be escaped to remove their special meaning by preceding
> them with a <backslash> character. This escaping <backslash> is discarded.
> The sequence "\\" represents one literal <backslash>. All of the requirements
> and effects of quoting on ordinary, shell special, and special pattern characters
> shall apply to escaping in this context.
>
> Given the former, which is simple, and easy to follow, what is the point of
> the latter?

I believe the former text is misleading and should be deleted. It is effectively duplicating the requirements regarding backslashes stated in 2.2.1 and 2.2.3, but gets the details wrong.

> What's more, in the latter, only special characters can be
> escaped, after which the escaping \ is removed - in that version, what
> happens to a \ that is not followed by a special character?

Unspecified.

> These two are kind of like backslash quoting in unquoted shell text (where the
> \ escapes anything (ignoring the \newline for this)) and backslash quoting in
> double quoted strings, where the \ only escapes a specific set of characters,
> and other backslashes are left untouched.
>
> In parsing and processing words it is no problem, as we know if we're in a
> double quoted string or not.
>
> But in a pattern? Which of these two applies?

Depends where the pattern is.
Anywhere double quotes have an effect, the backslash-within-double-quotes rule applies. Elsewhere the "normal" rule applies.

> 2. Lines 76219-21:
>
> If any character (ordinary, shell special, or pattern special) is quoted,
> that pattern shall match the character itself.
> [that's fine]
> The shell special characters always require quoting.
> [that's nonsense].

Agreed. That sentence should be deleted.

> 3. Lines 76247-9
>
> In such patterns, each <asterisk> shall match a string of zero or more characters,
> [fine]
> matching the greatest possible number of characters that still allows the
> remainder of the pattern to match the string.
>
> the "greatest possible" is unnecessary, and in some cases, actually incorrect
> (that's an idea taken from '*' in REs where a specification of this is
> needed.)
>
> It is not generally needed, as in general, shell patterns are just match or
> no-match - it is irrelevant exactly what matched where.
>
> So given the word (or file name) abcdxbz
> the pattern
> a*b*z
> matches, but no-one cares in the slightest whether the 'b' that was
> selected was the one after a or the one before z. Which * matched the
> null string, and which matched the rest of the characters is irrelevant.
> There is no need to specify "greatest possible number" - the * just
> needs to match any number of characters that allows the remainder
> of the pattern to match.
>
> The one place where we need more than match/no-match is in the variable
> expansion substring operators (# ## % %%).
>
> There, assuming var contains the word above, we want (require) ${var#a*b}
> to match such that the 'b' that matches is the one after 'a', and ${var##a*b}
> to match so that the b that matches is the one before 'z'.
>
> In the single char substring operators we want the '*' to match the smallest,
> not greatest, possible number of chars that allows the remainder of the
> pattern to match. The only time "greatest" is relevant is for the double char
> substring operators.
All true. The descriptions of parameter expansions with %, %%, # and ## cover this, so the "greatest possible number" clause in 2.13.2 should just be deleted.

> 4. On the question of bug 985 ... (kind of related) - if quote removal is
> added to case pattern processing, it makes that into a different case from all
> of the others. In filename generation, pattern matching is done before
> quote removal, so the quotes are still there. In parameter expansion
> (substring matching) the pattern matching happens before quote removal,
> so the quotes in the pattern are still there. To be consistent, it would be
> best to leave the quotes in the pattern in a case statement, so processing of
> it is consistent with all of the others.

The danger here is that there are references to quote removal elsewhere that could mean the wrong thing if case patterns are not subject to quote removal. You actually qu
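The word used in point 3 makes the difference easy to see. A short sketch in POSIX sh: for a plain case match only match/no-match matters, while the substring operators pick the shortest (single character operator) or longest (doubled operator) match.

```sh
#!/bin/sh
var=abcdxbz

# Plain matching: which 'b' the pattern used is irrelevant.
case $var in
a*b*z) echo 'a*b*z matches' ;;
esac

printf '%s\n' "${var#a*b}"    # shortest prefix "ab" removed:      cdxbz
printf '%s\n' "${var##a*b}"   # longest prefix "abcdxb" removed:   z
printf '%s\n' "${var%b*z}"    # shortest suffix "bz" removed:      abcdx
printf '%s\n' "${var%%b*z}"   # longest suffix "bcdxbz" removed:   a
```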
Re: More questions/comments on XCU 2.13 (sh Pattern Matching)
According to XCU 2.6.5, it's treated literally only when double quoted, e.g. "$var", otherwise quote removal should still occur on the variable's contents after any field splitting...

On Friday, April 27, 2018 Joerg Schilling wrote:

Robert Elz wrote:
> Oh, one more thing about patterns - a question this time, though the
> answer might end up suggesting more text that needs to be in
> the standard.
>
> If I have
>
> var='a*"?"'
>
> and then I do
>
> echo $var
>
> what should the result be? Is this absolutely the same as
>
> echo a*"?"

No, it isn't. The result of a shell macro expansion is quoted internally before quote removal is applied. For this reason

echo $var

will print a*"?", while the latter prints a*?

Jörg
Re: More questions/comments on XCU 2.13 (sh Pattern Matching)
Robert Elz wrote:

> Oh, one more thing about patterns - a question this time, though the
> answer might end up suggesting more text that needs to be in
> the standard.
>
> If I have
>
> var='a*"?"'
>
> and then I do
>
> echo $var
>
> what should the result be? Is this absolutely the same as
>
> echo a*"?"

No, it isn't. The result of a shell macro expansion is quoted internally before quote removal is applied. For this reason

echo $var

will print a*"?", while the latter prints a*?

Jörg
Re: More questions/comments on XCU 2.13 (sh Pattern Matching)
Assuming set +f is in effect, the first 2 should expand identically, as I read XCU 2.6.5 and 2.6.6, treating the * as a glob special character and the ? as a literal. For the 3rd case the standard is silent on whether the closing " is assumed on reaching the end of the field established during token recognition, by the after the 'r' in '$var', or is a syntax error when the glob is evaluated. The text in XCU 2.13, by its generic use of 'quoting', assumes that if single quotes or double quotes are used to begin a literal pattern sequence, the application will ensure the closing quote is always present. I agree a statement should be added to XCU 2.13.1, line 76221, about what the required interpretation is. At present it only says that a trailing '\' is undefined behavior.

In a message dated 4/26/2018 8:24:33 PM Eastern Standard Time, k...@munnari.oz.au writes:

Oh, one more thing about patterns - a question this time, though the answer might end up suggesting more text that needs to be in the standard.

If I have

var='a*"?"'

and then I do

echo $var

what should the result be? Is this absolutely the same as

echo a*"?"

?

And if so, what would happen if instead I had

var='a*"?'

(and used it the same way?)

kre
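For the third (unbalanced) case, a sketch of what bash and dash are expected to do, assuming POSIX sh, set +f, default IFS and mktemp -d (an assumption); the file names are made up. Because the '"' comes from the expansion it is just data, so there is no open quote to close and no syntax error: the pattern simply needs a literal '"' followed by one more character.

```sh
#!/bin/sh
# Scratch directory holding made-up names; only the first should match.
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1
touch 'ab"x' 'abx'

var='a*"?'         # unbalanced '"', but only as data in the value
set -- $var
count=$#
first=$1
printf '%s\n' "$@"

cd / && rm -rf "$tmp"
```

Only ab"x should be listed. Whether the standard actually requires this reading is the open question above; the sketch only shows one plausible behavior.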
Re: More questions/comments on XCU 2.13 (sh Pattern Matching)
Oh, one more thing about patterns - a question this time, though the answer might end up suggesting more text that needs to be in the standard.

If I have

var='a*"?"'

and then I do

echo $var

what should the result be? Is this absolutely the same as

echo a*"?"

?

And if so, what would happen if instead I had

var='a*"?'

(and used it the same way?)

kre