Re: [1003.1(2016)/Issue7+TC2 0001234]: in most shells, backslash doesn't have two meaning wrt pattern matching

Harald van Dijk Tue, 18 Jun 2019 11:06:54 -0700

On 18/06/2019 10:51, Geoff Clare wrote:

Harald van Dijk <a...@gigawatt.nl> wrote, on 17 Jun 2019:

In 2.13.1, "This escaping <backslash>" refers to the escaping <backslash>
defined in the previous sentence. The full previous sentence is conditional:
"When pattern matching is used where shell quote removal is not performed
(...), special characters can be escaped to remove their special meaning by
preceding them with a <backslash> character." We are talking about pattern
matching used where shell quote removal is performed, so this does not
apply.


Quote removal only removes quote characters that were in the original
word.  In the case we are discussing the backslash is not in the
original word, so is not subject to quote removal.

I suppose you could argue that quote removal is "performed" but just
doesn't do anything to that backslash.


That is clearly exactly what is meant.

                                        In which case, as you pointed
out, the earlier statement in 2.13.1 would specify the same thing
anyway.

It would be better to have a clear division between the two statements
such that one applies to shell-quoting backslash characters and the
other applies to pattern-matching backslash characters.

Assuming there needs to be a difference between the two. If not, one of
the sentences can just be dropped.

It seems to me that 2.13.1 should be interpreted as overriding 2.13.3,
as otherwise there would be no point in having that statement there.


It is what clarifies or specifies that `find . -name '\*'` looks for files
named '*', not files named '\*'.


No.  Just saying that the backslash escapes the next character is
sufficient for that, as processing of this meaning for the backslash
happens during the matching operation.  Once the matching operation
has been performed there is no point removing the backslash, as find
will not make any further use of the pattern.
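
For illustration, a minimal sketch of the find case under discussion. The
scratch directory and the three file names are made up for this example,
and `mktemp -d` is assumed to be available (it is not required by POSIX).

```shell
# Scratch directory holding a file named '*', a file named '\*',
# and an unrelated file (all hypothetical names for this sketch).
dir=$(mktemp -d)
touch "$dir/*" "$dir/\\*" "$dir/a"

# The backslash in the -name pattern escapes the '*', so only the
# file literally named '*' should be reported.
find_result=$(cd "$dir" && find . -name '\*')
printf '%s\n' "$find_result"    # prints: ./*

rm -rf "$dir"
```

The file named '\*' is not reported, which is the behaviour both sides of
this exchange agree on. The disagreement is only about which wording in
2.13.1 is needed to require it.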

I wrote "clarifies or specifies" because I was unsure whether just
saying that the backslash escapes the next character is sufficient.
Without saying the backslash is removed, it is clear that the '*' must
be taken literally, but either it would also allow the '\' to be taken
literally, or it would not be clear (other than by common sense) that
the '\' must not also be taken literally as matching a backslash
character.

Compare this to how quoting works in the shell. Given '\$', the fact
that '$' is quoted is determined early, but the '\' is supposed to be
preserved at that point (in theory, shell implementations may differ).
The '\' getting deleted is specified separately, as part of Quote
Removal. I suspect the backslash removal during pattern matching was
made to model that.

The only place that discarding the backslash makes any difference is
if the pattern is used for something after the matching operation, and
the only way for that to happen is when a shell pathname expansion does
not match any files.  But bash sticks to the 2.13.3 requirement and
uses the pattern unchanged (including the backslash).
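
A minimal sketch of the no-match case described above, assuming default
shell options (no nullglob or failglob) and an empty scratch directory.
The variable names are made up for this example, and `mktemp -d` is
assumed to be available.

```shell
# Empty scratch directory, so the pattern below cannot match anything.
dir=$(mktemp -d)

# The backslash reaches the pattern through an unquoted expansion,
# so it is not part of the original word.
pat='\*'
nomatch_result=$(cd "$dir" && set -- $pat && printf '%s\n' "$1")

# With no matching file the word is left as it was, backslash included
# (observed with bash and dash; other shells may differ).
printf '%s\n' "$nomatch_result"    # prints: \*

rm -rf "$dir"
```

Under the proposed change to 2.13.3, the escaping backslash would
instead be discarded here and the result would be a bare '*'.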

In the resolution of bug 1234 we should update 2.13.3 to say
something like "... left unchanged, except that escaping <backslash>
pattern characters in parts of the pattern that are not affected by
shell quoting shall be discarded as specified in [xref to 2.13.1]".


That seems like a bad idea.

For unquoted variables containing backslashes but no other metacharacters,
the common case will be that the backslashes are meant to be taken literally
but that filename expansion does not find any matching files. This common
case will be broken if 2.13.1 overrides 2.13.3, but will match existing
shells and user expectations if 2.13.1 is limited to the pattern matching
and 2.13.3's "unchanged" means "unchanged".

Think of something like

   var='printf %s\n hello'
   $var

Yes, this is potentially unsafe, but unless someone actually creates a file
named %sn, it will do what the user expects in all shells that I know of.
Your interpretation would require it to break.
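
The example above can be tried as follows. This is only a sketch: it runs
in an empty scratch directory (so no file named %sn exists), and
`mktemp -d` is assumed to be available.

```shell
# Empty scratch directory: nothing for the '%s\n' field to match.
dir=$(mktemp -d)

var='printf %s\n hello'

# Unquoted $var is split into three fields: printf, %s\n, hello.
# The middle field keeps its backslash, so printf sees a newline escape.
var_output=$(cd "$dir" && $var)
printf '%s\n' "$var_output"    # prints: hello

rm -rf "$dir"
```

If the backslash were discarded from the unmatched word, printf would
receive the format %sn and print "hellon" with no trailing newline.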


Yes, and that's a good thing.  It is always better for programming
errors to be discovered as soon as possible, rather than becoming
a timebomb that will go off when a particular circumstance arises
(in this case, existence of a file with a particular name).

For newly written scripts, perhaps. But changing the standard and
knowingly taking a risk of breaking scripts, scripts that currently work
in all shells and have worked in all shells for years, and are required
by the current version of POSIX to work in all shells and have been
required by older versions of POSIX to work in all shells for years, is
completely obviously not something POSIX should be doing.


Cheers,
Harald van Dijk
