On Thursday, 20 March 2014 at 22:39:47 UTC, Walter Bright wrote:
Currently we do it by throwing a UTFException. This has problems:

1. about anything that deals with UTF cannot be made nothrow

2. turns innocuous errors into major problems, such as DOS attack vectors
http://en.wikipedia.org/wiki/UTF-8#Invalid_byte_sequences

One option to fix this is to treat invalid sequences as:

1. the .init value (0xFF for UTF8, 0xFFFF for UTF16 and UTF32)

2. U+FFFD

I kinda like option 1.

What do you think?

Hiding errors under the carpet is not a good strategy. These sequences are invalid, and doomed to explode at some point. I'm not sure what the solution is, but the .init one do not seems like the right one to me.

Reply via email to