Re: Use cases for invalid-Unicode atoms

Kris Maglione Mon, 19 Mar 2018 12:30:35 -0700

On Mon, Mar 19, 2018 at 08:49:10PM +0200, Henri Sivonen wrote:

It appears that currently we allow atomicizing invalid UTF-16 strings,
which are impossible to look up by UTF-8 key and we don't allow
atomicizing invalid UTF-8.


I need to change some things in this area in response to changing
error handling of UTF-8 to UTF-16 XPCOM string conversions to be more
secure, so I want to check if I should change things a bit more.

I can well imagine that the current state is exactly what we want:
Bogosity on the UTF-16 side round-trips and bogus UTF-8 doesn't
normally reach the atom machinery.

Am I correct in assuming we don't want changes here?

(One imaginable change would be replacing invalid sequences in both
UTF-16 and UTF-8 with U+FFFD and then atomicizing the result.)
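
A rough sketch of the two behaviours being compared, in Python and purely illustrative. The names `atomize_utf16`, `lookup_utf8`, `replace_unpaired_surrogates` and `replace_invalid_utf8` are hypothetical, and a dict keyed by UTF-16 code units stands in for the real atom table. It shows that an atom holding an unpaired surrogate cannot be reached through a strictly decoded UTF-8 key, and what replacing invalid sequences with U+FFFD would yield instead.

```python
# Hypothetical model, not Gecko code: atoms are keyed by UTF-16 code units.
atoms = {}

def atomize_utf16(units):
    """Intern a sequence of UTF-16 code units as-is, valid or not."""
    key = tuple(units)
    return atoms.setdefault(key, key)

def utf8_to_utf16_units(data):
    """Strictly decode UTF-8; return UTF-16 code units, or None if invalid."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return None
    raw = text.encode("utf-16-le")
    return tuple(int.from_bytes(raw[i:i + 2], "little")
                 for i in range(0, len(raw), 2))

def lookup_utf8(data):
    """Look up an atom by UTF-8 key; invalid UTF-8 never matches."""
    units = utf8_to_utf16_units(data)
    return None if units is None else atoms.get(units)

def replace_unpaired_surrogates(units):
    """Copy `units`, replacing each unpaired surrogate with U+FFFD."""
    out, i, n = [], 0, len(units)
    while i < n:
        u = units[i]
        if 0xD800 <= u <= 0xDBFF and i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
            out.extend((u, units[i + 1]))  # well-formed pair, keep both units
            i += 2
        elif 0xD800 <= u <= 0xDFFF:
            out.append(0xFFFD)             # lone high or low surrogate
            i += 1
        else:
            out.append(u)
            i += 1
    return out

def replace_invalid_utf8(data):
    """Decode UTF-8, substituting U+FFFD for invalid sequences."""
    return data.decode("utf-8", errors="replace")

# Current state: the bogus UTF-16 atom round-trips on the UTF-16 side...
bogus = atomize_utf16([0x61, 0xD800])
assert atomize_utf16([0x61, 0xD800]) is bogus
# ...but the byte sequence that would spell it is not valid UTF-8.
assert lookup_utf8(b"a\xed\xa0\x80") is None

# Imagined change: sanitize first, then atomize the result.
clean = atomize_utf16(replace_unpaired_surrogates([0x61, 0xD800]))
assert lookup_utf8("a\ufffd".encode("utf-8")) is clean
```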

Leaving aside the question of whether validation is desirable, I'd
worry about the performance impact. We atomize UTF-16 strings all over
the place in DOM code (and even have a main-thread pseudo-hashtable
optimization for them). Validation might be relatively cheap, but I'd
still expect that relative cheapness to add up fairly quickly.
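
To make the extra work concrete: validating UTF-16 amounts to one linear pass over the code units, checking that every surrogate is part of a high/low pair. A minimal Python sketch follows, with a hypothetical `is_valid_utf16` that is not taken from Gecko. It only illustrates the shape of the check that would run on every atomization, and says nothing about its actual cost in optimized native code.

```python
def is_valid_utf16(units):
    """Return True if every surrogate in `units` belongs to a high+low pair."""
    i, n = 0, len(units)
    while i < n:
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # A high surrogate must be followed directly by a low surrogate.
            if i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            return False
        if 0xDC00 <= u <= 0xDFFF:
            # A low surrogate with no preceding high surrogate.
            return False
        i += 1
    return True

# Every code unit is inspected once, so the cost grows with string length
# and is paid on each atomization, even when the input is valid.
assert is_valid_utf16([0x64, 0x69, 0x76])   # "div"
assert not is_valid_utf16([0x61, 0xD800])   # trailing lone high surrogate
```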

_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
