On 03/20/2018 06:49 AM, Henri Sivonen wrote:
> On Tue, Mar 20, 2018 at 12:44 PM, Henri Sivonen <hsivo...@hsivonen.fi> wrote:
>> On Tue, Mar 20, 2018 at 11:12 AM, Henri Sivonen <hsivo...@hsivonen.fi> wrote:
>>> OK. I'll leave the UTF-16 case unchanged and will make the minimal
>>> changes on the UTF-8 side to retain the existing outward behavior
>>> without burning the tree. Hopefully I can make the UTF-8 case faster
>>> while at it. It depended on not-so-great code.
>> I still have doubts about retaining the exact invalid-UTF-8 behavior.
>> The current behavior appears to be that if we try to atomicize an
>> invalid UTF-8 string, the returned atom is a new atom representing the
>> empty string--not the pre-existing atom for the empty string.
>>
>> Is there a reason why it's desirable behavior to potentially have
>> multiple atoms representing the empty string? Is there a reason why we
>> don't MOZ_CRASH on invalid UTF-8 if we are convinced enough that it's
>> not supposed to happen to the point that we let go of the atomicity of
>> atoms if it does happen?
> Furthermore, we validate UTF-8 strings anyway as a side effect of
> hashing them as if they were UTF-16, so if we don't want to MOZ_CRASH,
> we could at that point swap a valid string (invalid byte sequences
> replaced with U+FFFD) in the input string's place and atomicize that.
> It could be a MOZ_UNLIKELY branch on the validation result that we
> compute anyway and would avoid the weirdness of non-atomic atoms that
> have no resemblance to the input string.
>
> Thoughts?

My only thought is that the atomize-to-non-canonical-empty-string behavior sounds like a crazy footgun. But changing it also sounds scary -- currently, it sounds like valid strings produce unique atoms, and invalid strings produce unique placeholder things that will never match anything, even themselves, sort of like a string version of NaN. Is that correct?

If so, then your proposed change would not only make invalid strings compare equal to themselves, but also to other *different* invalid strings with same-length invalid byte sequences. That also seems kind of footgunny.

I don't really know how Gecko atoms work or are used, though, so I may be totally off base here.
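
To make the concern concrete, here is a rough sketch of the U+FFFD variant as I understand it. It is Python rather than Gecko code, and the AtomTable class and atomize_utf8 name are made up for illustration; I am assuming an atom is just the one canonical string object kept in a table, which may not match how Gecko actually stores atoms.

```python
# Hypothetical toy atom table for illustration only; not Gecko's implementation.
class AtomTable:
    def __init__(self):
        # Maps string contents to the single canonical object for them.
        self._atoms = {}

    def atomize_utf8(self, data: bytes) -> str:
        # Invalid byte sequences are replaced with U+FFFD rather than
        # producing a fresh placeholder atom.
        text = data.decode("utf-8", errors="replace")
        # setdefault returns the already-stored object when an equal
        # string has been atomized before, so equal contents share one atom.
        return self._atoms.setdefault(text, text)


table = AtomTable()
a = table.atomize_utf8(b"a\xffb")  # 0xFF is never valid in UTF-8
b = table.atomize_utf8(b"a\xfeb")  # neither is 0xFE
print(a is b)  # True: two different invalid inputs end up as the same atom
```

In this sketch, valid input still atomizes to a unique atom, but the two different invalid inputs collapse into the single atom for "a", U+FFFD, "b", which is the collision I am worried about.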

