On Mon, Mar 19, 2018 at 08:49:10PM +0200, Henri Sivonen wrote:
It appears that currently we allow atomicizing invalid UTF-16 strings,
which are impossible to look up by a UTF-8 key, and we don't allow
atomicizing invalid UTF-8.

I need to change some things in this area because I'm making the
error handling of UTF-8 to UTF-16 XPCOM string conversions more
secure, so I want to check whether I should change things a bit more.

I can well imagine that the current state is exactly what we want:
Bogosity on the UTF-16 side round-trips and bogus UTF-8 doesn't
normally reach the atom machinery.
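Here is a minimal Rust sketch that makes the current asymmetry concrete. It uses a hypothetical toy table (`AtomTable`, keyed by UTF-16 code units), not Gecko's real atom table. UTF-16 input is stored verbatim, so bogus UTF-16 round-trips. UTF-8 input is accepted only if it is valid. Because valid UTF-8 cannot encode a lone surrogate, a UTF-8 lookup never reaches an atom created from invalid UTF-16.

```rust
use std::collections::HashMap;

/// Hypothetical toy atom table keyed by UTF-16 code units.
/// Not Gecko's actual implementation; it only models the asymmetry.
struct AtomTable {
    atoms: HashMap<Vec<u16>, usize>,
}

impl AtomTable {
    fn new() -> Self {
        AtomTable { atoms: HashMap::new() }
    }

    /// Atomize UTF-16 as-is: unpaired surrogates are stored verbatim,
    /// so bogus UTF-16 round-trips.
    fn atomize_utf16(&mut self, s: &[u16]) -> usize {
        let next = self.atoms.len();
        *self.atoms.entry(s.to_vec()).or_insert(next)
    }

    /// Atomize UTF-8 only if it is valid; invalid input is refused.
    fn atomize_utf8(&mut self, bytes: &[u8]) -> Option<usize> {
        let s = std::str::from_utf8(bytes).ok()?;
        let units: Vec<u16> = s.encode_utf16().collect();
        Some(self.atomize_utf16(&units))
    }

    /// Look up by UTF-8 key. A &str is always valid UTF-8, so its UTF-16
    /// form never contains a lone surrogate and cannot match bogus atoms.
    fn lookup_utf8(&self, s: &str) -> Option<usize> {
        let units: Vec<u16> = s.encode_utf16().collect();
        self.atoms.get(&units).copied()
    }
}

fn main() {
    let mut table = AtomTable::new();
    // "a" followed by an unpaired high surrogate: invalid UTF-16.
    let bogus = [0x0061u16, 0xD800];
    let id = table.atomize_utf16(&bogus);
    // The same bogus UTF-16 finds the same atom again.
    assert_eq!(table.atomize_utf16(&bogus), id);
    // The closest UTF-8 spelling replaces the surrogate with U+FFFD,
    // so it does not find the atom.
    let lossy = String::from_utf16_lossy(&bogus);
    assert_eq!(table.lookup_utf8(&lossy), None);
    // Invalid UTF-8 (a lone continuation byte) is refused.
    assert_eq!(table.atomize_utf8(&[0x61, 0x80]), None);
}
```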

Am I correct in assuming we don't want changes here?

(One imaginable change would be replacing invalid sequences in both
UTF-16 and UTF-8 with U+FFFD and then atomicizing the result.)
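The alternative in the parenthetical could look roughly like the following sketch. It again uses a hypothetical toy table (`LossyAtomTable`), not real Gecko code. It relies on Rust's standard `String::from_utf16_lossy` and `String::from_utf8_lossy`, both of which replace invalid data with U+FFFD, so inputs in either encoding end up in a single well-formed key space.

```rust
use std::collections::HashMap;

/// Hypothetical atom table that stores only well-formed strings.
/// Invalid input is repaired with U+FFFD before interning.
struct LossyAtomTable {
    atoms: HashMap<String, usize>,
}

impl LossyAtomTable {
    fn new() -> Self {
        LossyAtomTable { atoms: HashMap::new() }
    }

    /// Return the existing atom for `s`, or assign the next id.
    fn intern(&mut self, s: String) -> usize {
        let next = self.atoms.len();
        *self.atoms.entry(s).or_insert(next)
    }

    /// Unpaired surrogates become U+FFFD.
    fn atomize_utf16(&mut self, units: &[u16]) -> usize {
        self.intern(String::from_utf16_lossy(units))
    }

    /// Invalid UTF-8 sequences become U+FFFD.
    fn atomize_utf8(&mut self, bytes: &[u8]) -> usize {
        self.intern(String::from_utf8_lossy(bytes).into_owned())
    }
}

fn main() {
    let mut table = LossyAtomTable::new();
    // "a" + lone high surrogate, and "a" + an invalid byte.
    let from_utf16 = table.atomize_utf16(&[0x0061, 0xD800]);
    let from_utf8 = table.atomize_utf8(&[0x61, 0xFF]);
    // Both repair to "a\u{FFFD}", so they share one atom.
    assert_eq!(from_utf16, from_utf8);
    // Trade-off: a different bogus input ("a" + lone low surrogate)
    // collapses into the same atom.
    assert_eq!(table.atomize_utf16(&[0x0061, 0xDC00]), from_utf16);
    println!("{} atom(s)", table.atoms.len());
}
```

The last assertion shows the cost of this design: distinct bogus UTF-16 inputs merge into one atom, so bogus UTF-16 would no longer round-trip.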

Leaving aside the question of whether validation is desirable, I'd worry about the performance impact. We atomize UTF-16 strings all over the place in DOM code (and even have a main-thread pseudo-hashtable optimization for them). Validation might be relatively cheap, but I'd still expect that relative cheapness to add up fairly quickly.
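To show what validation on the UTF-16 atomization path would add, here is a hand-rolled check. The function name `is_valid_utf16` is hypothetical. The check is one extra linear pass with a range test per code unit, and it is cross-checked against the standard library's `std::char::decode_utf16`. No measurements are implied. The point is only that this work is paid on every string atomized, which is the cost concern raised above.

```rust
/// Returns true if `units` is well-formed UTF-16, i.e. every surrogate
/// is part of a high/low pair. One extra pass over the input.
fn is_valid_utf16(units: &[u16]) -> bool {
    let mut i = 0;
    while i < units.len() {
        let u = units[i];
        if (0xD800..=0xDBFF).contains(&u) {
            // High surrogate: must be followed by a low surrogate.
            match units.get(i + 1) {
                Some(&next) if (0xDC00..=0xDFFF).contains(&next) => i += 2,
                _ => return false,
            }
        } else if (0xDC00..=0xDFFF).contains(&u) {
            // Low surrogate without a preceding high surrogate.
            return false;
        } else {
            i += 1;
        }
    }
    true
}

fn main() {
    let ok: Vec<u16> = "class".encode_utf16().collect();
    assert!(is_valid_utf16(&ok));
    assert!(is_valid_utf16(&[0xD83D, 0xDE00])); // U+1F600 as a surrogate pair
    assert!(!is_valid_utf16(&[0x0061, 0xD800])); // trailing lone high surrogate
    assert!(!is_valid_utf16(&[0xDC00, 0x0061])); // leading lone low surrogate
    // Cross-check against the standard library decoder.
    let std_check = std::char::decode_utf16(ok.iter().copied()).all(|r| r.is_ok());
    assert_eq!(std_check, is_valid_utf16(&ok));
}
```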
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform