Date: Fri, 05 Feb 2021 21:54:52 +0100
From: Steffen Nurpmeso <stef...@sdaoden.eu>
Message-ID: <20210205205452.7tbl2%stef...@sdaoden.eu>

  | Well .. if i recall correctly quoting inside of ${xYz} has been
  | clarified not too long ago

Not the way that you seem to think.

  | |And last (for now anyway), after "set -- A B C" what's the effect of
  | |$'pfx\${@}sfx' ?
  |
  | This is interesting.  I would say it is identical to ${*} here.

In that case $'' could not be the only quoting mechanism that users use.

  | My MUA just turns it into UTF-8 (via a utf32_to_utf8 function that
  | uses the Unicode replacement character for erroneous codepoints)

The generation of the UTF-8 is not the issue, and the (relatively few)
values that are reserved can be handled.

  | You have to be careful a bit with Unicode.  There are guarantees
  | that must be fulfilled, see for example [1].  Since the shell is
  | producing UTF-8 it should ensure that no invalid UTF-8 sequences
  | are exposed to consumers.

Of course.  But: users are permitted to write $'\xfc\x13' and similar,
and no-one suggests that the shell should validate such sequences for
valid UTF-8 encoding, and nor would anyone (I hope) claim the shell
should object to $'\u0207\xfc\x13' just because it happens to have a \u
in it.  This is all just bits until it gets used somehow, at which point
if it is invalid, then so be it.

  | When a process interprets a code unit sequence which purports to
  | be in a Unicode character encoding form, it shall treat
  | ill-formed code unit sequences as an error condition and shall
  | not interpret such sequences as characters.

That has to be a requirement on the application, not upon the
programming language implementation (the shell here) - when the shell
is converting the string, it has no idea how the script will interpret
it, nothing requires that a $'\uxxxx' value ever be used as "characters"
(though that would be a common use).

kre
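
A minimal sketch of the $'\xfc\x13' point above, assuming a shell that already implements $'...' (bash, ksh93 and zsh do; not every sh does). It only shows that the escapes become raw bytes which the shell stores and passes on without any UTF-8 validation. The helper name dump_bytes is made up for illustration and is not part of any proposal.

```shell
# Hypothetical helper: print the bytes of its first argument as hex,
# squeezing od's padding and newlines down to single spaces.
dump_bytes() {
    printf '%s' "$1" | od -An -tx1 | tr -s ' \n' ' '
}

# 0xfc followed by 0x13 is not well-formed UTF-8, yet the shell
# accepts it and keeps it as a plain two-byte string.
s=$'\xfc\x13'
dump_bytes "$s"
echo
```

Whether those two bytes are ever treated as "characters" is decided by whatever consumes the string afterwards, not by the shell that produced it.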