Re: Handling invalid UTF sequences

Nick Sabalausky Thu, 20 Mar 2014 16:01:21 -0700

On 3/20/2014 6:39 PM, Walter Bright wrote:

Currently we do it by throwing a UTFException. This has problems:


1. about anything that deals with UTF cannot be made nothrow

2. turns innocuous errors into major problems, such as DoS attack vectors
http://en.wikipedia.org/wiki/UTF-8#Invalid_byte_sequences
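
To illustrate problem 1, here is a minimal sketch, assuming the current std.utf.decode behavior of throwing UTFException on a malformed sequence. The countCodePoints helper is hypothetical and only exists for this example; it has to swallow the exception itself before the compiler will accept it as nothrow.

```d
import std.exception : assertThrown;
import std.utf : decode, UTFException;

// Hypothetical helper: counts code points up to the first invalid sequence.
// The try/catch is only there so the function can be marked nothrow.
size_t countCodePoints(const(char)[] s) nothrow
{
    size_t n, i;
    try
    {
        while (i < s.length)
        {
            decode(s, i);   // advances i, throws UTFException on bad input
            ++n;
        }
    }
    catch (Exception)
    {
        // decode threw on an invalid sequence; stop counting at that point
    }
    return n;
}

void main()
{
    // 'a', then 0xC0 followed by a non-continuation byte, then 'b'
    const(char)[] input = ['a', cast(char) 0xC0, 'b'];

    size_t i = 1;   // index of the bad byte
    assertThrown!UTFException(decode(input, i));

    assert(countCodePoints(input) == 1);   // only 'a' was decoded
}
```

Every caller that wants nothrow ends up with the same wrapper, which is roughly the cost being complained about.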

One option to fix this is to treat invalid sequences as:

1. the .init value (0xFF for UTF8, 0xFFFF for UTF16 and UTF32)

2. U+FFFD
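
The two options could look something like the sketch below. It is a hand-rolled UTF-8 decoder written only for illustration, not std.utf's implementation; decodeOrReplace and decodeAll are hypothetical names, and the choice to leave the offending byte unconsumed after a truncated sequence is an assumption of this example. Because nothing throws, both functions can be marked nothrow.

```d
// Hypothetical helper, not part of std.utf. Decodes one code point from s
// starting at index i and advances i. On an invalid sequence it returns
// `bad` instead of throwing. Requires i < s.length on entry.
dchar decodeOrReplace(const(char)[] s, ref size_t i, dchar bad = '\uFFFD')
    @safe pure nothrow @nogc
{
    immutable uint b0 = s[i++];
    if (b0 < 0x80)
        return cast(dchar) b0;              // plain ASCII

    size_t extra;   // number of continuation bytes expected
    uint c;         // accumulated code point
    uint lowest;    // smallest value allowed for this length (rejects overlongs)
    if ((b0 & 0xE0) == 0xC0)      { extra = 1; c = b0 & 0x1F; lowest = 0x80; }
    else if ((b0 & 0xF0) == 0xE0) { extra = 2; c = b0 & 0x0F; lowest = 0x800; }
    else if ((b0 & 0xF8) == 0xF0) { extra = 3; c = b0 & 0x07; lowest = 0x10000; }
    else
        return bad;                         // stray continuation or invalid lead byte

    foreach (k; 0 .. extra)
    {
        // Truncated sequence: report it, but leave the offending byte
        // unconsumed so decoding resynchronizes on it.
        if (i >= s.length || (s[i] & 0xC0) != 0x80)
            return bad;
        c = (c << 6) | (s[i++] & 0x3F);
    }

    // Reject overlong encodings, surrogates, and values past U+10FFFF.
    if (c < lowest || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return bad;
    return cast(dchar) c;
}

// Hypothetical helper: decodes a whole string, substituting `bad` as needed.
dstring decodeAll(const(char)[] s, dchar bad = '\uFFFD') @safe pure nothrow
{
    dchar[] result;
    for (size_t i = 0; i < s.length; )
        result ~= decodeOrReplace(s, i, bad);
    return result.idup;
}

void main()
{
    // 'a', then 0xC0 followed by a non-continuation byte, then 'b'
    const(char)[] input = ['a', cast(char) 0xC0, 'b'];

    // Option 2: substitute U+FFFD.
    assert(decodeAll(input) == "a\uFFFDb"d);

    // Option 1: substitute the .init value (0xFFFF when decoding to dchar).
    dchar[] expected = ['a', dchar.init, 'b'];
    assert(decodeAll(input, dchar.init) == expected);
}
```

Either way the caller gets a marker in the output rather than an exception, so the choice between the two is mostly about which sentinel downstream code should see.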

I kinda like option 1.

What do you think?

I'd have to give some thought to have an opinion on the right solution, however I do want to say the current UTFException throwing is something I've always been unhappy with. So it definitely should get addressed in some way.
