On 21/06/2019 22:22, Eric Blake wrote:
> On 6/21/19 4:00 PM, Stephane Chazelas wrote:
>> To quote two striking examples that have already been given,
>> that interpretation of the standard would mean that:
>>
>> pattern='\.'
>> grep $pattern file
>>
>> Which in all shells is documented to search for lines that
>> contain a dot in "file" would now be required to instead search
>> for lines that contain at least one character in "file", as \ is
>> now a glob quoting operator, and \. happens to match the .
>> directory entry (on those systems where . is included in the
>> result of readdir() at least and with shells that don't skip .
>> and .. in glob expansions).
>
> Where's the glob character that causes $pattern to be subjected to
> globbing?  Had there also been a '*', '[', or '?' in $pattern, I could
> (sort of) see the logic to the unquoted $pattern being subjected to use
> as a glob pattern. But when there are no globbing characters at all, why
> does \. suddenly serve to cause a glob lookup (where \ is then erased by
> the globbing procedure) and match '.' in the current directory?  (And
> yes, this one is also confusing because of the ongoing work on the other
> open bug about whether shells should be permitted to always omit '.'
> from globbing, regardless of whether readdir() omitted it)

In theory, pathname expansion is supposed to be done for every word, no matter which characters it contains. Even in something as simple as

  echo hello

both words undergo pathname expansion. The first word is supposed to be matched against the names in the current directory. If any name matches "echo", then pathname expansion produces that name. If none matches, then the word "echo" is used unmodified. The same goes for the second word.

Obviously, this means that pathname expansion produces "echo", regardless of the contents of the current directory, and shells are allowed to bypass reading the current directory as the results do not depend on it. This is strictly speaking an optimisation though, and as such the "as the results do not depend on it" is a requirement for the optimisation to be valid. If shells bypass the reading when the results *do* depend on it, then that is a bug in the shells.
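
To make the distinction concrete, here is a small sketch that expands one word without pattern characters and one with a '*', first in an empty directory and then after creating two files. The variable names and the file names "hello" and "help" are made up for illustration, and mktemp -d is assumed to be available even though POSIX does not require it.

```shell
# Sketch only: mktemp -d is an assumption, not a POSIX requirement.
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1

plain=hello      # no pattern characters
glob='hel*'      # contains a pattern character

# Unquoted expansions undergo field splitting and pathname expansion.
set -- $plain; plain_empty=$*
set -- $glob;  glob_empty=$*

# Create two hypothetical files and expand the same words again.
: > hello
: > help

set -- $plain; plain_full=$*
set -- $glob;  glob_full=$*

cd / && rm -rf "$tmp"

printf '%s\n' "plain: [$plain_empty] [$plain_full]" \
              "glob:  [$glob_empty] [$glob_full]"
```

In a POSIX shell with default options, the plain word should come out as "hello" both times, so skipping the directory scan for it is a valid optimisation. The word with '*' should change from "hel*" to "hello help", so the scan cannot be skipped for it.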

In order for unquoted \ to not require globbing, the standard would need to specify the results of pathname expansion in a way that produces the same results regardless of the contents of the current directory. That was one advantage Geoff Clare's interpretation (the one that said 2.13.1's "The escaping <backslash> shall be discarded" should override 2.13.3's "the pattern string shall be left unchanged") did have: under that interpretation, no scan of the current directory would be needed, as results would be identical regardless of the contents. That is also an advantage that many shells' behaviour of treating unquoted backslash as a literal character has. But if backslash does also function as an escape character during pattern matching, it needs to trigger globbing as well.
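
A rough way to see which of these behaviours a given shell implements is the sketch below. The pattern 'a\b' and the file name "ab" are made up for illustration, and mktemp -d is again assumed to be available. No expected output is given, because the result is exactly what differs between shells.

```shell
# Sketch only: mktemp -d is an assumption, not a POSIX requirement.
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1

pattern='a\b'    # hypothetical pattern: unquoted backslash, no *, ? or [

# Expand once with no candidate file, then again once "ab" exists.
set -- $pattern; without_file=$*
: > ab
set -- $pattern; with_file=$*

cd / && rm -rf "$tmp"

printf 'without ab: %s\nwith ab:    %s\n' "$without_file" "$with_file"
```

If both lines show a\b, the shell treats the unquoted backslash as a literal character. If both show ab, the backslash is always discarded, as under Geoff Clare's interpretation. If the first shows a\b and the second ab, the backslash acts as an escape character during matching and the result depends on the directory contents, which is the case where the scan cannot be bypassed.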

Cheers,
Harald van Dijk
