2017-05-16 17:33:26 +0700, Robert Elz:
[...]
> | Or just write it as quote() (...) instead of quote() { ...;}
>
> Yes, as you would have seen later, I mentioned that in a subsequent
> message.
Sorry about that. I hadn't seen that message at the time I
replied.
[...]
> | Here, I'd fire awk and quote more than one arg at a time:
>
> Hmm - you're really aiming for maximum sluggishness... I could beat that
> by just adding a couple of sleeps ...
Depends. If quoting only a handful of arguments, then that call
to awk might cost you a couple of milliseconds indeed. But
if processing thousands, you might find that it saves a few
seconds.
> I deliberately did not do multiple arg quoting, as what you want in
> that case depends upon the application, just quoting each separately
> is not necessarily the desired result. And given the ability to quote
> a single string, adding the mechanism to quote multiple strings is
> not very hard ...(call the function over and over) and you get to
> deal with the multiple results in whatever way your application needs.
My quote() works like your quote() when passed a single
argument, but mine can also take more than one and still produce
a useful outcome (which helps with performance).
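
To make that concrete, here is a minimal sketch of a quote() that
handles all its arguments in a single awk invocation. It is not
necessarily the exact function posted earlier in this thread; the
shquote name is made up for illustration and a POSIX awk is assumed.

```shell
# Sketch: quote every argument with one awk process.
# Each argument is wrapped in '...' and every embedded ' becomes '\''.
quote() {
  LC_ALL=C awk -v q="'" -- '
    function shquote(s) {
      # replacement is: quote, backslash, quote, quote
      gsub(q, q "\\" q q, s)
      return q s q
    }
    BEGIN {
      # ARGV[1..ARGC-1] are the strings to quote; no input is read
      for (i = 1; i < ARGC; i++)
        printf "%s%s", (i > 1 ? " " : ""), shquote(ARGV[i])
      if (ARGC > 1) print ""
    }' "$@"
}
```

With a single argument the output is one quoted word; with several,
the quoted words are joined with spaces, so the result can be fed
back to eval "set -- $(quote "$@")".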
> What is slightly harder to fix, but can be done if you really wanted it,
> is to omit redundant (quoting) 's in the result, so we don't end up
> with stuff like
>
> 'a'\'''\'''\''b'
> when
> a\'\'\'b
>
> is all that is really needed... If the aim is just, as was originally
> stated (save & restore,) then it doesn't matter, but if you are ever
> going to show the result to a human, it does.
When quoting shell code, it's better to quote everything, as the
parsing depends on the locale (and to do it with single quotes,
as that's a safe character in most usable encodings).
There are shell quoting libraries out there that try to be smart
by not quoting everything but they end up introducing
vulnerabilities.
See for instance this bug in perl's String::ShellQuote
https://rt.cpan.org/Public/Bug/Display.html?id=118508
> | Using LC_ALL=C on the assumption that the encoding of ' (0x27 in
>
> This is the shell, there is exactly one single quote character, and it
> is that one. The data can be anything, the characters used in the
> syntax elements cannot. Nor do non-ascii chars ever expand to anything
> or have any meaning different from themselves as a data char.
>
> If we start having shell parsing differently depending on what locale the
> user happens to be using, we may as well all give up now, and go find
> something else to do.
Yes, as I said, single-quote is safe in all charsets on my
system. But backslash and backtick, for instance, are not.
On any given POSIX system, ` and \, being part of the portable
character set, are guaranteed to be encoded the same in every
charset supported on the system. For instance, on ASCII-based
systems (the norm nowadays), \ is 0x5c. The shell syntax
(POSIX operators and keywords) uses only characters from the
portable charset, but that is not to say that 0x5c cannot be
found in the multi-byte encoding of other characters.
For instance, the α character in BIG5-HKSCS is encoded as
0xa3 0x5c. In POSIX shells like bash or ksh93 (also zsh), in a
zh_HK.big5hkscs Hong Kong locale,

echo α

would /work/. It would not issue a PS2 prompt because of that
trailing 0x5c byte (\ in ASCII and in BIG5-HKSCS).
Yet, if you ran LC_ALL=C sed 's/\\/&&/g' on that α, it would
effectively turn it into α\, which would make things worse.
The single quote character doesn't have such a problem.
(and yes, before you mention it, I agree, all multi-byte
character sets other than UTF-8 should really be retired as
they're a source of countless issues).
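
The byte-level effect can be reproduced without a BIG5-HKSCS locale
installed. A small sketch, writing the two bytes of α with printf
octal escapes (the variable names are only for illustration):

```shell
# α in BIG5-HKSCS is the two bytes 0xa3 0x5c (octal 243 134).
alpha=$(printf '\243\134')

# Byte-wise backslash doubling in the C locale: the trailing 0x5c of
# the character is taken for a backslash and doubled.
doubled=$(printf '%s' "$alpha" | LC_ALL=C sed 's/\\/&&/g')

# Byte-wise single-quote escaping: 0x27 occurs nowhere in the
# character, so the data goes through unchanged.
escaped=$(printf '%s' "$alpha" | LC_ALL=C sed "s/'/'\\\\''/g")

printf '%s' "$alpha"   | od -An -tx1   # bytes: a3 5c
printf '%s' "$doubled" | od -An -tx1   # bytes: a3 5c 5c
printf '%s' "$escaped" | od -An -tx1   # bytes: a3 5c
```

In the Hong Kong locale, the doubled version reads as α followed by
a real backslash, which is the "worse" outcome described above.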
>
> | Also note that if $IFS was previously unset upon calling your
> | quote() (as is common when you want to restore splitting to its
> | default behaviour), it would leave it assigned an empty value
> | (which means "no splitting").
>
> Yes, mentioned that in my following message too.
Sorry again.
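
For the record, a save/restore that preserves the unset state of
$IFS could look like the sketch below. The save_ifs and restore_ifs
names and the saved_ifs* variables are made up for illustration.

```shell
# Remember whether IFS was set at all, not just its value.
save_ifs() {
  if [ "${IFS+set}" = set ]; then
    saved_ifs_set=1
    saved_ifs=$IFS
  else
    saved_ifs_set=0
  fi
}

# Restore the value if there was one, otherwise unset IFS again so
# that splitting goes back to its default behaviour.
restore_ifs() {
  if [ "$saved_ifs_set" = 1 ]; then
    IFS=$saved_ifs
  else
    unset IFS
  fi
}
```

A function would call save_ifs before changing IFS and restore_ifs
before returning, instead of unconditionally doing IFS=$saved.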
> [email protected] in another message said:
> | No, the split+glob operator that is done upon unquoted parameter expansion
> | (or command substitution or arithmetic expansion) is completely different
> | from the shell syntax parsing. It is not affected by quotes.
>
> I'm not sure what point you were making there, but all I was saying was
> that in my original (test, not in the function) I did
>
> y=$(quote $x)
>
> (by accident, I normally quote everything.) That version doesn't
> work properly at all - of course (depending upon what $x expands to
> of course). When quoted ("$x") it does work. That is, when quoted
> like this the value of $x is certainly a single arg to the quote
> function, and any glob meta chars in it will just be themselves,
> not expanded as file names, which the unquoted version would do.
[...]
Sorry, that's probably my misinterpretation of:
kre> $ y=$(quote "$x")
[...]
kre> Just remember to always quote variable references "$x" unless you are
kre> 100% certain what the content of the variable is, eg: as above with $y
kre> where we know it is the result of the quote function, so is safe.
Which I understood as you saying it was OK not to quote $y as it
was the result of the quote() function. I'm not sure what you
meant if not that though.
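
To illustrate why the distinction matters, here is a small sketch
assuming the default $IFS. The value of y is written out by hand as
what a quote() function would output for the string "a  b" (two
spaces); the *_count and eval_value names are only for illustration.

```shell
y="'a  b'"          # quoted form of the value: a  b

set -- $y           # unquoted: split+glob only, the ' are plain data
split_count=$#      # two fields: 'a and b'

eval "set -- $y"    # eval: real shell parsing, the ' are syntax
eval_count=$#       # one field: a  b
eval_value=$1

printf '%s\n' "$split_count" "$eval_count" "$eval_value"
```

So the output of quote() is safe to pass to eval, but leaving $y
unquoted elsewhere still subjects it to split+glob like any other
variable.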
--
Stephane