Re: The Case For Autodecode

ag0aep6g via Digitalmars-d Fri, 03 Jun 2016 14:46:23 -0700

On 06/03/2016 11:13 PM, Steven Schveighoffer wrote:

No, but I like the idea of preserving the erroneous character you tried
to convert.


Makes sense.

But is there an invalid wchar? I looked through the Wikipedia article on
UTF-16, and it didn't seem to say there was one.

If we use U+FFFD, that signifies a coding problem but is still a valid
code point. However, a wchar in the D800 - DBFF range that is not
followed by a code unit in the DC00 - DFFF range is an invalid
sequence. D throws if it encounters such a thing.
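To make the surrogate-pairing rule concrete, here is a minimal Python sketch. It is not D's implementation; the function name `find_unpaired_surrogates` and the input format (a list of 16-bit ints standing in for wchars) are assumptions for illustration. It uses the full lead-surrogate range 0xD800 - 0xDBFF and reports the index of every code unit that cannot be part of a valid pair.

```python
def find_unpaired_surrogates(units):
    """Return indices of unpaired surrogates in a list of UTF-16 code units.

    Hypothetical helper: `units` is assumed to be a list of ints in 0..0xFFFF.
    Lead (high) surrogates: 0xD800-0xDBFF; trail (low): 0xDC00-0xDFFF.
    """
    bad = []
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # A lead surrogate is only valid when a trail surrogate follows it.
            if i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2  # skip the whole valid pair
                continue
            bad.append(i)
        elif 0xDC00 <= u <= 0xDFFF:
            # A trail surrogate that was not consumed by a preceding lead.
            bad.append(i)
        i += 1
    return bad


print(find_unpaired_surrogates([0x0048, 0xD800, 0x0069]))  # [1]
```

A strict decoder would throw at the first reported index; a lenient one would substitute something there instead.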

The Unicode FAQ has an answer to this exact question, but it also only says that "[u]npaired surrogates are invalid" [1].

It also mentions "noncharacters" which are "permanently reserved [...] for internal use". "For example, they might be used internally as a particular kind of object placeholder in a string." [2] - Not too bad.
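For reference, the noncharacters are U+FDD0..U+FDEF plus the last two code points of each of the 17 planes (U+xxFFFE and U+xxFFFF), 66 in all. A minimal Python sketch of a check (the name `is_noncharacter` is hypothetical), which also shows the practical difference from a lone surrogate: a noncharacter still encodes and round-trips, while Python refuses to encode a lone surrogate.

```python
def is_noncharacter(cp):
    """Return True if code point `cp` is a Unicode noncharacter (hypothetical helper)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    # (cp & 0xFFFE) == 0xFFFE matches any code point whose low 16 bits are FFFE or FFFF.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


# A noncharacter used as a placeholder survives a UTF-16 round trip.
placeholder = "\uFDD0"
assert placeholder.encode("utf-16-le").decode("utf-16-le") == placeholder

# A lone surrogate cannot be encoded at all.
try:
    "\ud800".encode("utf-16-le")
except UnicodeEncodeError:
    print("lone surrogate rejected")
```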

And then there is the replacement character, of course. "[U]sed to replace an incoming character whose value is unknown or unrepresentable in Unicode" [3].
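As an illustration of that use, here is a minimal Python sketch of a lenient decoder that maps each unpaired surrogate to U+FFFD instead of throwing. The name `decode_utf16_lenient` and the list-of-ints input are assumptions; this is not Phobos' behavior or API.

```python
REPLACEMENT = "\uFFFD"


def decode_utf16_lenient(units):
    """Decode a list of 16-bit code units, replacing unpaired surrogates with U+FFFD."""
    out = []
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        if 0xD800 <= u <= 0xDBFF and i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
            # Valid pair: combine lead and trail into one supplementary code point.
            out.append(chr(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)))
            i += 2
            continue
        # Any remaining surrogate is unpaired; everything else maps directly.
        out.append(REPLACEMENT if 0xD800 <= u <= 0xDFFF else chr(u))
        i += 1
    return "".join(out)


assert decode_utf16_lenient([0x48, 0xD800, 0x69]) == "H\uFFFDi"
```

Note that U+FFFD discards the offending code unit, so this approach does not preserve the erroneous character.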



[1] http://www.unicode.org/faq/utf_bom.html#utf16-7
[2] http://www.unicode.org/faq/private_use.html#noncharacters
[3] http://www.fileformat.info/info/unicode/char/0fffd/index.htm
