On Fri, 2022-01-28 at 11:26 +0000, mirabilos wrote:
> Christoph Anton Mitterer via austin-group-l at The Open Group dixit:
>
> > However, that may have been crashed again by [2] respectively [3].
>
> |it actually finds "XL"? One possible behavior is to replace "XL" with
> |"<Error>" where <Error> is a replacement character such as "�"
>
> I’d say this is a bug. It should replace only X with <Error>,
> then look at the next char. If this is again X without Y,
> you get another <Error>, but it’s L so you don’t.

Hmm... not so sure.

a) For the ${var%word} forms I'd say it's rather clear that they
actually operate on characters, while it seems not so clear what
variables themselves are: bytes (except NUL) that are interpreted in
some locale when used as characters, or generally characters.
But for these forms, it clearly says that Pattern Matching Notation is
employed, and that in turn is described to work on characters.

b) If it's on characters (and unless the locale is C, which makes them
bytes again)... where is it defined how a decoder should deal with an
invalid encoding?

If there's any place in POSIX which says it should replace such with
<Error>... then your point would seem valid, but is there?

Cheers,
Chris.
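
P.S. A small sketch of why b) matters, in Python only because its
decoder makes the error policy explicit. Nothing here is taken from
POSIX. The byte 0xC3 standing in for "X" is an assumption (a UTF-8 lead
byte whose continuation byte "Y" is missing), and the variable names are
made up for illustration.

```python
# Assumption: "X" is the UTF-8 lead byte 0xC3 with its continuation byte
# "Y" missing; "L" is plain ASCII. Together they form the "XL" case.
raw = b"\xc3L"

# Bytes view (roughly how a C-locale shell would treat it): two units,
# and nothing about them is invalid.
as_bytes = list(raw)

# Strict decoding: the decoder rejects the input outright.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Replacing decoder: only the bad byte becomes U+FFFD, then decoding
# resumes at "L". This is one decoder's policy, not a POSIX requirement.
replaced = raw.decode("utf-8", errors="replace")

print(as_bytes, strict_ok, repr(replaced))
```

Python's "replace" handler happens to behave the way mirabilos
describes, but that is a choice made by this decoder. Another one may
reject the input or resynchronise differently, which is why the question
is where POSIX pins this down.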