Re: If invalid string should crash (was: string need to be robust)

Jussi Jumppanen Sun, 13 Mar 2011 23:16:17 -0700

%u Wrote:

> I agree with a), but not b). I can't find anything in the Unicode standard
> that says you can use the low surrogate like that.


According to Markus Kuhn (http://www.cl.cam.ac.uk/~mgk25/):

    According to ISO 10646-1:2000, sections D.7 and 2.3c, a device
    receiving UTF-8 shall interpret a "malformed sequence in the same way
    that it interprets a character that is outside the adopted subset" and
    "characters that are not within the adopted subset shall be indicated
    to the user" by a receiving device. A quite commonly used approach in
    UTF-8 decoders is to replace any malformed UTF-8 sequence by a
    replacement character (U+FFFD), which looks a bit like an inverted
    question mark, or a similar symbol. 

The quote above is taken from this file:

http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt
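To make the two policies in this thread concrete (reject malformed input vs. substitute U+FFFD), here is a minimal sketch using Python's built-in UTF-8 codec rather than D. The function names are hypothetical. The test bytes include 0xFF, which never occurs in UTF-8, and ED A0 80, which is the surrogate U+D800 encoded as if it were an ordinary code point; a conforming UTF-8 decoder treats that as malformed. How many U+FFFD characters a given malformed sequence becomes is a decoder policy choice, so the sketch does not rely on an exact count.

```python
# Two decoding policies for untrusted UTF-8 input, sketched with Python's
# built-in codec. decode_strict / decode_lenient are hypothetical names.

def decode_strict(data: bytes) -> str:
    """Reject malformed input: raises UnicodeDecodeError."""
    return data.decode("utf-8")  # errors="strict" is the default


def decode_lenient(data: bytes) -> str:
    """Replace malformed sequences with U+FFFD REPLACEMENT CHARACTER."""
    return data.decode("utf-8", errors="replace")


if __name__ == "__main__":
    samples = [
        "na\u00efve".encode("utf-8"),  # well-formed
        b"abc\xffdef",                 # 0xFF never occurs in UTF-8
        b"\xed\xa0\x80",               # U+D800 (a surrogate) encoded as UTF-8: malformed
    ]
    for raw in samples:
        try:
            print("strict: ", repr(decode_strict(raw)))
        except UnicodeDecodeError as exc:
            print("strict:  error:", exc.reason)
        print("lenient:", repr(decode_lenient(raw)))
```

With the lenient policy the bad bytes are "indicated to the user" as U+FFFD instead of aborting the program, which is the behaviour the quoted passage describes; the strict policy is the "crash" option.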

