On Tue, Mar 20, 2018 at 12:44 PM, Henri Sivonen <hsivo...@hsivonen.fi> wrote:
> On Tue, Mar 20, 2018 at 11:12 AM, Henri Sivonen <hsivo...@hsivonen.fi> wrote:
>> OK. I'll leave the UTF-16 case unchanged and will make the minimal
>> changes on the UTF-8 side to retain the existing outward behavior
>> without burning the tree. Hopefully I can make the UTF-8 case faster
>> while at it. It depended on not-so-great code.
>
> I still have doubts about retaining the exact invalid-UTF-8 behavior.
> The current behavior appears to be that if we try to atomicize an
> invalid UTF-8 string, the returned atom is a new atom representing the
> empty string--not the pre-existing atom for the empty string.
>
> Is there a reason why it's desirable behavior to potentially have
> multiple atoms representing the empty string? Is there a reason why we
> don't MOZ_CRASH on invalid UTF-8 if we are convinced enough that it's
> not supposed to happen to the point that we let go of the atomicity of
> atoms if it does happen?

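To make the concern in the quoted text concrete, here is a toy model of
why a second empty-string atom is a problem. This is only a sketch and
not the real atom table code: `Atom`, `ToyAtomTable`,
`MakeDetachedEmptyAtom` and `SameAtom` are hypothetical names, and the
sketch assumes that callers compare atoms by pointer identity.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Toy model only: an atom is a heap-allocated string whose address
// serves as its identity.
struct Atom {
  std::string value;
};

class ToyAtomTable {
 public:
  // Normal path: equal strings always map to the same Atom object.
  const Atom* Atomize(const std::string& s) {
    auto it = table_.find(s);
    if (it == table_.end()) {
      it = table_.emplace(s, std::make_unique<Atom>(Atom{s})).first;
    }
    return it->second.get();
  }

  // Models the behavior described above for invalid UTF-8 input:
  // a fresh empty atom that is not the one registered in the table.
  const Atom* MakeDetachedEmptyAtom() {
    detached_.push_back(std::make_unique<Atom>());
    return detached_.back().get();
  }

 private:
  std::unordered_map<std::string, std::unique_ptr<Atom>> table_;
  std::vector<std::unique_ptr<Atom>> detached_;
};

// Assumption of this sketch: atom equality is pointer equality.
inline bool SameAtom(const Atom* a, const Atom* b) { return a == b; }

inline void Demo() {
  ToyAtomTable table;
  const Atom* empty = table.Atomize("");
  const Atom* bogus = table.MakeDetachedEmptyAtom();
  assert(bogus->value == empty->value);  // same contents...
  assert(!SameAtom(bogus, empty));       // ...but different identity
}
```

In this model, any code that relies on pointer comparison treats the
two empty atoms as different strings, even though their contents are
equal.
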
Furthermore, we validate UTF-8 strings anyway as a side effect of
hashing them as if they were UTF-16, so if we don't want to MOZ_CRASH,
we could at that point swap a valid string (invalid byte sequences
replaced with U+FFFD) in the input string's place and atomicize that.
It could be a MOZ_UNLIKELY branch on the validation result that we
compute anyway and would avoid the weirdness of non-atomic atoms that
have no resemblance to the input string.

Thoughts?

-- 
Henri Sivonen
hsivo...@hsivonen.fi
https://hsivonen.fi/
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
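
P.S. A rough sketch of the replacement idea, under stated assumptions.
`LossyResult`, `ToValidUtf8` and `LossyAtomTable` are hypothetical names
and this is not the actual Gecko code. The sketch emits one U+FFFD per
invalid lead byte, stray continuation byte, or truncated sequence, which
is one possible replacement granularity. It also re-scans the input for
simplicity, whereas the proposal is to reuse the validity result that
the hashing pass computes anyway and to convert only on the unlikely
invalid branch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_set>
#include <utility>

struct LossyResult {
  std::string text;  // always valid UTF-8
  bool was_valid;    // true if no replacement was needed
};

// Copies valid UTF-8 sequences through and replaces invalid ones.
inline LossyResult ToValidUtf8(const std::string& in) {
  static const char kReplacement[] = "\xEF\xBF\xBD";  // U+FFFD as UTF-8
  LossyResult out{std::string(), true};
  const std::size_t n = in.size();
  std::size_t i = 0;
  while (i < n) {
    const std::uint8_t lead = static_cast<std::uint8_t>(in[i]);
    std::size_t needed = 0;     // continuation bytes expected
    std::uint8_t lower = 0x80;  // allowed range of the next byte
    std::uint8_t upper = 0xBF;
    if (lead <= 0x7F) {
      needed = 0;
    } else if (lead >= 0xC2 && lead <= 0xDF) {
      needed = 1;
    } else if (lead >= 0xE0 && lead <= 0xEF) {
      needed = 2;
      if (lead == 0xE0) lower = 0xA0;  // rejects overlong forms
      if (lead == 0xED) upper = 0x9F;  // rejects surrogates
    } else if (lead >= 0xF0 && lead <= 0xF4) {
      needed = 3;
      if (lead == 0xF0) lower = 0x90;  // rejects overlong forms
      if (lead == 0xF4) upper = 0x8F;  // rejects values above U+10FFFF
    } else {
      // Stray continuation byte or a byte that can never be a lead.
      out.text += kReplacement;
      out.was_valid = false;
      ++i;
      continue;
    }
    std::size_t j = i + 1;
    std::size_t seen = 0;
    while (seen < needed && j < n) {
      const std::uint8_t b = static_cast<std::uint8_t>(in[j]);
      if (b < lower || b > upper) break;  // leave b to be re-examined
      lower = 0x80;
      upper = 0xBF;
      ++j;
      ++seen;
    }
    if (seen == needed) {
      out.text.append(in, i, j - i);  // complete, valid sequence
    } else {
      out.text += kReplacement;       // truncated or invalid sequence
      out.was_valid = false;
    }
    i = j;
  }
  return out;
}

// Toy atom table: the address of the interned string is the identity.
class LossyAtomTable {
 public:
  const std::string* AtomizeUtf8(const std::string& bytes) {
    LossyResult r = ToValidUtf8(bytes);
    return &*table_.insert(std::move(r.text)).first;
  }

 private:
  std::unordered_set<std::string> table_;
};

inline void Demo() {
  LossyAtomTable table;
  const std::string* a = table.AtomizeUtf8(std::string("a\xFF") + "b");
  const std::string* b =
      table.AtomizeUtf8(std::string("a\xEF\xBF\xBD") + "b");
  assert(a == b);  // invalid input shares the atom of its repaired form
}
```

With this approach the atom for an invalid input is the ordinary,
unique atom for the repaired string, so atomicity is preserved and the
result still resembles the input.
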