Re: Handling invalid UTF sequences

Dmitry Olshansky Fri, 21 Mar 2014 10:17:47 -0700

21-Mar-2014 02:39, Walter Bright wrote:

Currently we do it by throwing a UTFException. This has problems:


1. just about anything that deals with UTF cannot be made nothrow

2. turns innocuous errors into major problems, such as DoS attack vectors
http://en.wikipedia.org/wiki/UTF-8#Invalid_byte_sequences

One option to fix this is to treat invalid sequences as:

1. the .init value (0xFF for UTF8, 0xFFFF for UTF16 and UTF32)


If we are talking about decoding, then only dchar is relevant.

If we are talking about transcoding, then emitting 0xFF makes for broken UTF-8 output, so I see no sense in going for it.


2. U+FFFD

It also has the benefit of being recommended by the standard specifically for the case of substituting for badly encoded input.
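
To make the difference concrete, here is a minimal sketch of both behaviours. It uses Python's built-in UTF-8 codec purely as an illustration and is not a statement about how std.utf works or should work; the sample input and the variable names are made up for the example.

```python
# Illustration only: Python's built-in codec stands in for a decoder here.
bad = b"a\xffb"  # 0xFF can never appear in well-formed UTF-8

# Strict decoding raises, analogous to throwing a UTFException.
try:
    bad.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Substituting U+FFFD lets decoding continue past the bad byte.
replaced = bad.decode("utf-8", errors="replace")

# U+FFFD is an ordinary code point, so transcoding back yields valid UTF-8.
reencoded = replaced.encode("utf-8")

print(strict_ok)   # False
print(replaced == "a\ufffdb")  # True
print(reencoded)   # b'a\xef\xbf\xbdb'
```

The point of the last step is that the substituted output survives a round trip, whereas writing a raw 0xFF into a UTF-8 stream would just reproduce the original problem for the next consumer.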


Details:
https://d.puremagic.com/issues/show_bug.cgi?id=12113

I kinda like option 1.


Not enough of an argument ;)


--
Dmitry Olshansky
