By my analysis, 2.6.5 says it is the results of expansions that are subject to
field splitting, with the parameter expansion and the direct entry each
producing one field to be evaluated as a pattern according to 2.6.6, and the
treatment of the double quotes follows from 2.13.1 before their removal by
2.6.7 processing. 2.13.1 effectively has the quotes ignored, using only the
chars in between (the one ?) for matching purposes. 2.6.7 does not properly
account for the fact that, once a pattern has been evaluated, the ignored
quotes need to be removed to reflect the intent of the pattern. What is there
now reads more like the requirements when set -f is in effect, where quotes
from var expansions, not being in the original input, would be expected to
stay in the result as literals.

On Friday, April 27, 2018 Robert Elz <k...@munnari.oz.au> wrote:

Date: Fri, 27 Apr 2018 11:03:57 +0200
From: Joerg Schilling <joerg.schill...@fokus.fraunhofer.de>
Message-ID: <5ae2e77d.95ubF707FXNl6/H/%joerg.schill...@fokus.fraunhofer.de>

First, a (minor) apology - I should have made it clear that, yes, "set +f" was
intended, and that IFS was not intended to contain any unusual values (no 'a'
'*' "'"' '\' or '?' in it... ) Obviously anything like that would alter the 
results, and that kind of bizarreness is not what I was seeking to
query - and if I was, those pre-conditions would not have been forgotten.

| XCU 2.6.5 explains what happens after parameter expansion, the quoting
| happens as the last action during parameter expansion.

2.6.5 is field splitting, which while it would normally be attempted in the
example I gave, would do nothing - and we could disable it by assuming IFS=''
if wanted - that should change nothing.

But in any case, unless some new text has been added in the resolution of
some bug that I am unaware of (which is most of them...) I see nothing in 2.6.5
which is even remotely similar to what you said. Can you cut/paste the 
relevant words, or quote line numbers, or if there's a change that is not yet
in the published text, the bug number ?

| The text related to double quotes refers only to "spaces" inside the result.

No, it means IFS characters - that is, something that was quoted is not
subject to field splitting - that's usually white space, but doesn't have to
be, but I agree, that's not relevant to anything here (since field splitting is
not going to change anything anyway, we can simply disable it, with IFS='')

| If you like, check:
|
| $shell -c "var='a*\"?\"'; echo \$var"
|
| all shells agree here ;-)

Yes, they probably do in that case. They don't however in the case that
originally caused me to start looking at this.

[Aside: Martijn Dekker's modernish found some problems with NetBSD's
pattern matching - minor and obscure ones - but clearly bugs, and then
when I started testing, I found a few more ... so I created a large set of
tests for everything obscure and weird I could think of .... and these
messages are the result of that: before I can "fix" anything I need to
understand what is the correct result, and why.]

The problem case is:

${SHELL} -c 'var="[a-e]\\?.*";printf "%s\n" ${var}'

There are 4 files in $PWD (when the above command is executed)
with names that start with a char in [a-e] followed by a '?' followed
by a '.' followed by two more '?' chars (and lots more irrelevant files).

Almost all shells simply print
[a-e]\?.*
which is the string assigned to "var" (whether the original input has
one or two \ characters makes no difference, and nor should it.)

But bash doesn't: (the -o posix given here makes no difference)

bash -o posix -c 'var="[a-e]\\?.*";printf "%s\n" ${var}'
a?.??
b?.??
c?.??
e?.??

So I started wondering why, and looked at the spec, and could find
nothing to suggest this should not be the result, rather, the text to
me reads as if it should be.

Even though nothing else I have available to test does that.

But it looked right, so I changed (not yet committed, nor are the other
bug fixes I have made to this) the NetBSD sh to produce the same
result as bash:

${SH} -c 'var="[a-e]\\?.*";printf "%s\n" ${var}'
a?.??
b?.??
c?.??
e?.??

(${SH} is the obscure pathname to the uninstalled test build of my
development version of the NetBSD sh - I have it in a var because
it is way too long to type...) whereas the old way:

sh -c 'var="[a-e]\\?.*";printf "%s\n" ${var}'
[a-e]\?.*

the same as everyone else.

Then I started pondering other quote characters, since the quote
characters are still in the string, that is, if the command were

$SHELL -c 'printf "%s\n" [a-e]\?.*'

(here it is important that there just be one '\') all shells agree, that the
result where the 4 file names are printed is correct. For example:

bosh -c 'printf "%s\n" [a-e]\?.*'
a?.??
b?.??
c?.??
e?.??
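
For anyone wanting to reproduce this agreed-upon case, a minimal sketch
follows. It assumes mktemp -d is available (it is not required by POSIX), and
the names ax.yy and f?.?? are invented stand-ins for the irrelevant files.

```shell
# Scratch directory (mktemp -d is an assumption, it is not in POSIX).
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1

# The four names from the example, plus two invented non-matching ones.
touch 'a?.??' 'b?.??' 'c?.??' 'e?.??' 'ax.yy' 'f?.??'

# Pattern given directly as a word: the backslash quotes the first '?',
# so only names containing a literal '?' in that position can match.
direct=$(printf '%s\n' [a-e]\?.*)
printf '%s\n' "$direct"

# Clean up the scratch directory.
cd / && rm -rf "$dir"
```

This should list the same four names as above, with ax.yy and f?.?? left out.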

In your earlier reply you said ...

| The result of a shell macro expansion is quoted internally before quote
| removal is applied.

but I cannot find any text anywhere which mandates that, and what's more,
it is nothing like what really happens:

bosh -c 'var="???";printf "%s\n" ${var}' | wc -l
2297

(the wc is there just because (as shown) there are way too many 3 character
filenames to include the printf output directly...)

If "The result of a shell macro expansion is quoted internally" was happening,
then this example would look like

bosh -c 'printf "%s\n" "???" | wc -l'
1

(the '1' being the literal string "???" of course). Instead, what we're
getting is:

bosh -c 'printf "%s\n" ??? | wc -l'
2297

which shows that the results of the macro expansion are not internally
quoted. All shells do this, I just, for some reason, picked bosh to use
for this e-mail.
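
The same comparison can be made in a small controlled directory rather than
one with thousands of files. This is only a sketch: the file names are
invented, and mktemp -d is assumed to exist.

```shell
dir=$(mktemp -d) || exit 1       # assumption: mktemp -d is available
cd "$dir" || exit 1
touch abc xyz 123 toolong        # three 3-character names, one longer

var='???'
unquoted=$(printf '%s\n' ${var}) # expansion result is used as a pattern
quoted=$(printf '%s\n' "???")    # quoted word is left as a literal string

printf 'unquoted expansion matched %s names\n' \
    "$(printf '%s\n' "$unquoted" | wc -l | tr -d ' ')"
printf 'quoted word gave: %s\n' "$quoted"

cd / && rm -rf "$dir"
```

If the expansion result really were quoted internally, both would give the
single string ???.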

What I think you're actually referring to is 2.6.7 (Quote removal) which says
that quote removal removes the quotes that were in the original word (ie: not
any extra ones produced by one of the expansions that have been performed).

There's no need for any "quoted internally" for this - the shell just needs to
know which quotes in the input were there originally, and which were produced
by an expansion, and only remove the former.
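
A small illustration of that distinction, as a sketch only (the variable names
are made up for the example):

```shell
var='"x"'                               # the double quotes are data in var

from_expansion=$(printf '%s\n' ${var})  # quotes produced by an expansion
from_word=$(printf '%s\n' "x")          # quotes present in the original word

printf '%s\n' "$from_expansion"         # quotes kept: "x"
printf '%s\n' "$from_word"              # quotes removed: x
```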

But that's not relevant at all when we're doing pathname expansion (globbing):
quote removal doesn't come until later, so all of the quotes (those originally
in the input word, and those that come from expansions) are there, and waiting
to be used to quote the magic characters in the pattern match (which is
what 2.13 is very clear happens when the quoting character is \ - and depending
on which explanation about quoting in patterns applies, perhaps "" and ''
quoting as well.)

Now I know that shells generally don't work like this - that is because they 
don't implement the spec, which requires quote characters to be retained
unchanged in words that contain them.

That is, when we give

[a-e]\?.*

as a word on the command line, that exact word is what is made available
to the various expansions (no tilde, param, cmdsub, or arith apply here
of course, so this string goes directly to filename expansion, and produces
the list of the 4 file names in my test directory).

Then when we instead give

var=' [a-e]\?.*' ; ${var}

the expansion of $var produces the exact same string to be used later in
filename expansion, so should give the same results.

That is, if things were implemented the way the spec mandates.

Instead, when parsing, what (almost all) shells do is mark the quoted
characters somehow, so after parsing, the input

[a-e]\?.*

is represented internally as something more like (though obviously
not actually this):

[a-e]<<<QUOTED:?>>>.*

whereas

var=' [a-e]\?.*' ; ${var}

is represented as

var=<<<QUOTED:[a-e]\?.*>>>; ${var}

then when executed the assignment assigns

[a-e]\?.*

to var, then expands it, producing that same exact string, which
is not the same as what we get with that same string directly
on the command line.

Hence the different results. One how shells work, the other how
POSIX specifies that shells must work. They're just different.
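
To see which of the two a given shell does, something like the following
could be run under it. This is a sketch only: mktemp -d is assumed, and the
wording of the verdict strings is mine.

```shell
dir=$(mktemp -d) || exit 1          # assumption: mktemp -d is available
cd "$dir" || exit 1
touch 'a?.??' 'b?.??' 'c?.??' 'e?.??'

var='[a-e]\?.*'
direct=$(printf '%s\n' [a-e]\?.*)   # pattern in the original word
via_var=$(printf '%s\n' ${var})     # same characters, from an expansion

if [ "$direct" = "$via_var" ]; then
    verdict='same result both ways (the bash behaviour)'
else
    verdict='different results (what most shells do)'
fi
printf '%s\n' "$verdict"

cd / && rm -rf "$dir"
```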

This mess is what I was trying to fathom, and perhaps find some kind of
way of resolving, without needing to go back and rewrite the parsing
sections of the standard to get rid of the "leave the word exactly as
input" which is what causes this (and other) problems - as it was never
the way the shells upon which the spec was based were implemented.

Fortunately here, between the parameter/... expansions, and quote removal
(which fixes things) there are just two steps.

Field splitting is irrelevant, as it applies only to the results of the other
expansions, so can never touch whatever remains of the original word
(including any quote characters, however they are represented.)
This means it is safe to include quote characters in IFS, and the
splitting only happens on quote characters that came from an
expansion and so (in all shells) are just the characters themselves,
and it doesn't matter how "original word" quoting happens to be
represented.
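
As a quick check of that claim, here is a sketch with a double quote as the
only IFS character (nfields is a hypothetical helper, not anything standard):

```shell
nfields() { printf '%s\n' "$#"; }  # hypothetical helper: count arguments

IFS='"'                            # a quote character as the only separator
var='a"b'
from_expansion=$(nfields ${var})   # quote came from an expansion: splits
from_word=$(nfields x"y z"w)       # quotes in the original word: no split
unset IFS                          # back to default splitting behaviour

printf 'from expansion: %s, from original word: %s\n' \
    "$from_expansion" "$from_word"
```

The expansion should yield two fields (a and b), and the original word just
one (xy zw).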

Then there's pattern matching....

That's what led to this morning's (pair of) messages - to see if it is
possible to
find some sane way to specify pattern matching so that it works with quotes
the way it is supposed to work (whatever that actually is), and can be
made to work with both shells implemented the way shells are implemented,
and with shells that actually do what POSIX says they should do.

To do that, of course, we need to understand what is the way that
patterns containing quotes are actually supposed to work.

kre



