2017-05-17 14:00:21 +0200, Steffen Nurpmeso:
[...]
> |BTW, U+00A0, should really not be a [:blank:] or [:space:].
> |That's the whole point of that "non-breaking space" character.
>
> It is clearly defined as whitespace in Unicode and thus ISO, at
> least once i last worked with the Unicode tables. Not that my MUA
> gets that right at the moment, but in theory it should. (We yet
> use a homegrown byte table for that, which was designed to deal
> with email RFCs, but i hope i soon find some time for my ctext
> Unicode thing again, finally, and in a distant future that MUA
> will get this right, too.)

It's whitespace, but it's non-breaking, as in it should *not* be
used as a delimiter. So either blank/space should not include
U+00A0, or the POSIX spec should be updated to *not* refer to
"blank" when it specifies delimiting behaviour, IMO.

Now, http://www.unicode.org/L2/L2003/03139-posix-classes.htm
recommends that nbsp be included in the POSIX "blank"/"space"
classes, so I suppose there are quite a few people who don't agree
with me on that (note that I don't object to U+00A0 being
considered a blank/space, but to it being considered a delimiter).

See also
http://www.unicode.org/L2/L2003/03139-posix-classes.htm#TR_14652
about ISO/IEC TR 14652:2004 including BS (backspace) and not nbsp.

What's the opengroup position?

> |That's a known oddity of Solaris. (that makes it the only
> |single-byte blank I'm aware of, though of course one may always
> |construct a rogue locale that has more).
>
> The only one besides U+0020 SP and U+0009 HT.
[...]

Yes, of course, that's what I meant. The only non-ASCII single-byte
character (0xa0 in many charsets, 0x9a in KOI8-U) that can cause
problems in practice with bash (or ksh88, it appears).

-- 
Stephane
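
For illustration only, here is a minimal Python sketch of the distinction drawn above between "is whitespace" and "is a delimiter". It is not taken from bash, ksh88 or any POSIX text. The helper split_fields and the choice of space, tab and newline as the delimiter set (the default shell IFS) are assumptions made for the example.

```python
import unicodedata

NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE

# Assumption for this sketch: only the default shell IFS characters
# (space, tab, newline) act as field delimiters.
DELIMITERS = " \t\n"


def split_fields(line, delimiters=DELIMITERS):
    """Split on runs of the given delimiter characters only (hypothetical helper)."""
    fields, current = [], []
    for ch in line:
        if ch in delimiters:
            # A delimiter ends the current field, if there is one.
            if current:
                fields.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        fields.append("".join(current))
    return fields


line = "foo" + NBSP + "bar baz"

# Unicode classifies U+00A0 as whitespace (general category Zs).
print(NBSP.isspace(), unicodedata.category(NBSP))

# Splitting on every whitespace character treats nbsp as a delimiter.
print(line.split())

# Splitting on the explicit delimiter set keeps nbsp inside the field.
print(split_fields(line))
```

With a delimiter set that is kept separate from the whitespace class, U+00A0 can stay in blank/space without ever splitting a field, which is the behaviour argued for above.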
