On 03/20/2018 06:49 AM, Henri Sivonen wrote:
On Tue, Mar 20, 2018 at 12:44 PM, Henri Sivonen <hsivo...@hsivonen.fi> wrote:
On Tue, Mar 20, 2018 at 11:12 AM, Henri Sivonen <hsivo...@hsivonen.fi> wrote:
OK. I'll leave the UTF-16 case unchanged and will make the minimal
changes on the UTF-8 side to retain the existing outward behavior
without burning the tree. Hopefully I can make the UTF-8 case faster
while at it. It depended on not-so-great code.
I still have doubts about retaining the exact invalid-UTF-8 behavior.
The current behavior appears to be that if we try to atomicize an
invalid UTF-8 string, the returned atom is a new atom representing the
empty string--not the pre-existing atom for the empty string.
Is there a reason why it's desirable to potentially have
multiple atoms representing the empty string? If we are convinced
enough that invalid UTF-8 is not supposed to happen that we give up the
atomicity of atoms when it does, is there a reason why we don't
MOZ_CRASH on it instead?
Furthermore, we validate UTF-8 strings anyway as a side effect of
hashing them as if they were UTF-16, so if we don't want to MOZ_CRASH,
we could at that point swap a valid string (invalid byte sequences
replaced with U+FFFD) in the input string's place and atomicize that.
It could be a MOZ_UNLIKELY branch on the validation result that we
compute anyway and would avoid the weirdness of non-atomic atoms that
have no resemblance to the input string.
Thoughts?
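To make the two behaviors concrete, here is a toy model in Python. It is not Gecko code: `Atom`, `AtomTable`, `atomize_current` and `atomize_lossy` are hypothetical names invented for illustration, and Python's built-in `errors="replace"` decoder stands in for the U+FFFD substitution described above. It only sketches the difference between handing back a fresh empty atom and interning a repaired string.

```python
class Atom:
    """Toy atom: no __eq__ is defined, so equality is object identity."""
    __slots__ = ("text",)

    def __init__(self, text):
        self.text = text


class AtomTable:
    """Hypothetical interning table, for illustration only."""

    def __init__(self):
        self._atoms = {}

    def _intern(self, text):
        # Return the existing atom for this text, creating it on first use.
        atom = self._atoms.get(text)
        if atom is None:
            atom = self._atoms[text] = Atom(text)
        return atom

    def atomize_current(self, raw):
        # Models the behavior described in the thread: invalid UTF-8 yields
        # a brand-new empty atom that is never entered into the table.
        try:
            text = raw.decode("utf-8")
        except UnicodeDecodeError:
            return Atom("")
        return self._intern(text)

    def atomize_lossy(self, raw):
        # Models the proposal: replace invalid sequences with U+FFFD,
        # then intern the repaired string like any other.
        return self._intern(raw.decode("utf-8", errors="replace"))


table = AtomTable()
canonical_empty = table.atomize_current(b"")
bad = table.atomize_current(b"\xff")

# Same text, different atom: the empty string is no longer unique.
assert bad.text == "" and bad is not canonical_empty

# With replacement, the same invalid input maps to the same atom.
assert table.atomize_lossy(b"\xff") is table.atomize_lossy(b"\xff")
```

In this model the first assertion is the "non-atomic atom" case, and the second shows that atomicity is restored once the repaired string goes through the normal interning path.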
My only thought is that the atomize-to-non-canonical-empty-string
behavior sounds like a crazy footgun. But changing it also sounds scary
-- currently, it sounds like valid strings produce unique atoms, and
invalid strings produce unique placeholder things that will never match
anything, even themselves, sort of like a string version of NaN. Is that
correct?
If so, then your proposed change would not only make invalid strings
compare equal to themselves, but also to other *different* invalid
strings with same-length invalid byte sequences. That also seems kind of
footgunny.
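The collision is easy to show with any lossy decoder. The snippet below uses Python's `errors="replace"` handler as a stand-in for U+FFFD substitution; `lossy` is a hypothetical helper, and nothing here reflects how Gecko actually decodes or atomizes strings.

```python
def lossy(raw):
    # Replace each invalid UTF-8 sequence with U+FFFD (stand-in decoder).
    return raw.decode("utf-8", errors="replace")


first = b"id-\xff"   # 0xFF can never appear in valid UTF-8
second = b"id-\xfe"  # neither can 0xFE

# The raw inputs differ...
assert first != second

# ...but after replacement they are the same string, so an atom table
# keyed on the repaired text would hand back the same atom for both.
assert lossy(first) == lossy(second) == "id-\ufffd"
```

How many U+FFFD characters a given invalid run produces depends on the decoder's replacement policy, so which inputs collide would depend on that policy as well.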
I don't really know how Gecko atoms work or are used, though, so I may
be totally off base here.
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform