On Monday, March 9, 2015 at 12:05:05 PM UTC+5:30, Steven D'Aprano wrote:
> Chris Angelico wrote:
>
> > As to the notion of rejecting the construction of strings containing
> > these invalid codepoints, I'm not sure. Are there any languages out
> > there that have a Unicode string type that requires that all
> > codepoints be valid (no surrogates, no U+FFFE, etc)?
>
> U+FFFE and U+FFFF are *noncharacters*, not invalid. There are a total of 66
> noncharacters in Unicode, and they are legal in strings.
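The count of 66 can be checked directly. This is a minimal Python 3 sketch, assuming the standard definition of noncharacters: the block U+FDD0..U+FDEF plus the last two code points of each of the 17 planes. `noncharacters()` is a hypothetical helper written for this example, not a standard-library function.

```python
def noncharacters():
    """Return a sorted list of all Unicode noncharacter code points (hypothetical helper)."""
    # The contiguous block U+FDD0..U+FDEF: 32 code points.
    result = list(range(0xFDD0, 0xFDF0))
    # The last two code points of each of the 17 planes:
    # U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, ..., U+10FFFE, U+10FFFF: 34 code points.
    for plane in range(17):
        base = plane * 0x10000
        result.extend([base + 0xFFFE, base + 0xFFFF])
    return sorted(result)


nonchars = noncharacters()
print(len(nonchars))  # 66

# Noncharacters are ordinary code points as far as Python's str and the
# UTF-8 codec are concerned: they encode and decode without error.
s = "\uFFFE\uFFFF"
encoded = s.encode("utf-8")
print(encoded)  # b'\xef\xbf\xbe\xef\xbf\xbf'
assert encoded.decode("utf-8") == s
```

The point of the round trip is that "noncharacter" is a usage convention (not meant for open interchange), not an encoding-level prohibition.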
Interesting -- Thanks!
I wonder whether that's one more instance of the anti-pattern (other thread)?
Number that's not a number -- NaN
Pointer that points nowhere -- NULL
SQL data that's not there but there -- null

>
> http://www.unicode.org/faq/private_use.html#nonchar8
>
> I think the only illegal code points are surrogates. Surrogates should only
> appear as bytes in UTF-16 byte-strings.

Even more interesting: So there's a whole hierarchy of illegality??
Could you suggest some good reference for 'surrogate'?
--
https://mail.python.org/mailman/listinfo/python-list
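Surrogates are the code points U+D800..U+DFFF, reserved so that UTF-16 can represent code points above U+FFFF as a pair of 16-bit units. A minimal Python 3 sketch of that mechanism follows; `to_surrogate_pair` is a hypothetical helper written for illustration, not a library function. It also shows the behaviour relevant to the "illegal" remark: a Python `str` will hold a lone surrogate, but the strict UTF-8 codec refuses to encode it unless the `surrogatepass` error handler is used.

```python
def to_surrogate_pair(cp):
    """Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only supplementary code points need a surrogate pair")
    offset = cp - 0x10000             # 20-bit value, 0x00000..0xFFFFF
    high = 0xD800 + (offset >> 10)    # top 10 bits -> high surrogate U+D800..U+DBFF
    low = 0xDC00 + (offset & 0x3FF)   # low 10 bits -> low surrogate U+DC00..U+DFFF
    return high, low


# U+1F600 lies outside the BMP, so UTF-16 needs two code units for it.
high, low = to_surrogate_pair(0x1F600)
print(hex(high), hex(low))  # 0xd83d 0xde00

# The hand-computed pair matches what the UTF-16 codec emits.
utf16 = "\U0001F600".encode("utf-16-be")
assert utf16 == high.to_bytes(2, "big") + low.to_bytes(2, "big")

# A str can still contain a lone surrogate code point...
lone = "\ud83d"
try:
    lone.encode("utf-8")
    rejected = False
except UnicodeEncodeError:
    # ...but strict UTF-8 encoding rejects it.
    rejected = True
print("lone surrogate rejected by UTF-8:", rejected)  # True

# The 'surrogatepass' error handler forces it through anyway.
print(lone.encode("utf-8", "surrogatepass"))  # b'\xed\xa0\xbd'
```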