Re: Use cases for invalid-Unicode atoms

smaug Tue, 20 Mar 2018 02:05:36 -0700

On 03/19/2018 09:30 PM, Kris Maglione wrote:

On Mon, Mar 19, 2018 at 08:49:10PM +0200, Henri Sivonen wrote:

It appears that currently we allow atomicizing invalid UTF-16 string,
which are impossible to look up by UTF-8 key and we don't allow
atomicizing invalid UTF-8.


I need to change some things in this area in response to changing
error handling of UTF-8 to UTF-16 XPCOM string conversions to be more
secure, so I want to check if I should change things a bit more.

I can well imagine that the current state is exactly what we want:
Bogosity on the UTF-16 side round-trips and bogus UTF-8 doesn't
normally reach the atom machinery.

Am I correct in assuming we don't want changes here?

(One imaginable change would be replacing invalid sequences in both
UTF-16 and UTF-8 with U+FFFD and then atomicizing the result.)

Leaving aside the question of whether validation is desirable, I'd worry about the performance impact. We atomize UTF-16 strings all over the place inDOM code (and even have a main-thread pseudo-hashtable optimization for them). Validation might be relatively cheap, but I'd still expect thatrelative cheapness to add up fairly quickly.


Yeah, all the atom handling is very hot code.  Unless there is some actual 
serious bug to fix, I wouldn't
change the handling.
_______________________________________________
dev-platform mailing list
[email protected]
https://lists.mozilla.org/listinfo/dev-platform

Re: Use cases for invalid-Unicode atoms

Reply via email to