On 03/19/2018 09:30 PM, Kris Maglione wrote:
On Mon, Mar 19, 2018 at 08:49:10PM +0200, Henri Sivonen wrote:
It appears that currently we allow atomicizing invalid UTF-16 string,
which are impossible to look up by UTF-8 key and we don't allow
atomicizing invalid UTF-8.

I need to change some things in this area in response to changing
error handling of UTF-8 to UTF-16 XPCOM string conversions to be more
secure, so I want to check if I should change things a bit more.

I can well imagine that the current state is exactly what we want:
Bogosity on the UTF-16 side round-trips and bogus UTF-8 doesn't
normally reach the atom machinery.

Am I correct in assuming we don't want changes here?

(One imaginable change would be replacing invalid sequences in both
UTF-16 and UTF-8 with U+FFFD and then atomicizing the result.)

Leaving aside the question of whether validation is desirable, I'd worry about the performance impact. We atomize UTF-16 strings all over the place in DOM code (and even have a main-thread pseudo-hashtable optimization for them). Validation might be relatively cheap, but I'd still expect that relative cheapness to add up fairly quickly.

Yeah, all the atom handling is very hot code.  Unless there is some actual 
serious bug to fix, I wouldn't
change the handling.
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform

Reply via email to