On Thursday, 20 March 2014 at 22:39:47 UTC, Walter Bright wrote:
> Currently we handle invalid UTF sequences by throwing a UTFException.
> This has problems:
>
> 1. Just about anything that deals with UTF cannot be made nothrow.
> 2. It turns innocuous errors into major problems, such as DoS attack
> vectors:
> http://en.wikipedia.org/wiki/UTF-8#Invalid_byte_sequences
>
> One option to fix this is to treat invalid sequences as:
>
> 1. the .init value (0xFF for UTF-8, 0xFFFF for UTF-16 and UTF-32)
> 2. U+FFFD
>
> I kinda like option 1.
>
> What do you think?
Sweeping errors under the carpet is not a good strategy. These
sequences are invalid and are doomed to explode at some point. I'm
not sure what the solution is, but the .init one does not seem
like the right one to me.
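To make the trade-off concrete, here is a rough sketch. It is written in Python rather than D because Python's stdlib lets you plug a custom error handler into its UTF-8 decoder via `codecs.register_error`, and that maps neatly onto the three behaviours: throw (what we do today), substitute a sentinel (option 1), or substitute U+FFFD (option 2). The handler name `init_sentinel` and the `decode` helper are made up for illustration. U+FFFF stands in for D's `dchar.init`. None of this is Phobos API.

```python
import codecs

# Stand-in for D's dchar.init (0xFFFF); hypothetical "option 1" sentinel.
SENTINEL = "\uffff"

def _init_sentinel(exc: UnicodeDecodeError):
    # Replace the invalid byte run with the sentinel and resume decoding
    # right after it.
    return (SENTINEL, exc.end)

# "init_sentinel" is a made-up handler name registered for this sketch.
codecs.register_error("init_sentinel", _init_sentinel)

def decode(data: bytes, policy: str) -> str:
    """Decode UTF-8 under one of three policies:
    'strict'        -> raise on invalid input (like throwing UTFException)
    'init_sentinel' -> option 1, substitute U+FFFF
    'replace'       -> option 2, substitute U+FFFD
    """
    return data.decode("utf-8", errors=policy)

bad = b"ab\xffcd"  # 0xFF can never appear in well-formed UTF-8

try:
    decode(bad, "strict")
except UnicodeDecodeError as e:
    print("strict: error at byte", e.start)

print("option 1:", ascii(decode(bad, "init_sentinel")))
print("option 2:", ascii(decode(bad, "replace")))

# The sentinel is itself a legal code point: the well-formed bytes EF BF BF
# decode to the same U+FFFF, so once decoded the two cannot be told apart.
print(decode(b"ab\xef\xbf\xbfcd", "strict") == decode(bad, "init_sentinel"))
```

The last line prints `True`, and that is the carpet problem in a nutshell. After option 1, a corrupted byte and a genuine U+FFFF in the input look identical to everything downstream. U+FFFD at least has no meaning other than "something was replaced here."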