Re: ECMAScript collation question

Norbert Lindenberg Fri, 31 Aug 2012 09:56:43 -0700

OK, so the Unicode conformance question hinges on "must be able to do" versus 
"must do".


The question for ECMAScript then is whether we should stick with "must do" (the 
current state of the specifications) or change to "must be able to do".

The changes for "must be able to do" would be:

- In the Language specification, remove the description of 
String.prototype.localeCompare, and require implementations to follow the 
Internationalization API specification at least for this method or, better, 
to provide the complete Internationalization API. That way, localeCompare 
acquires support for the normalization property in options and the -kk- key 
in Unicode locale extensions.

- In the Internationalization API specification, make support for the 
normalization property and the -kk- key mandatory (it's currently optional), 
but drop the separate requirement that canonically equivalent strings compare 
as 0.

This would give applications control over the trade-off between performance and 
full canonical equivalence, and let implementations select the default per 
locale.
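
As a rough sketch of what that control could look like from application code: the snippet below requests normalization both through the -kk- key and through the options property. It assumes an engine that exposes Intl.Collator; makeCollator is a made-up helper name, and because support for both features is optional today, an engine may simply ignore them.

```javascript
// Sketch only: "normalization" and the "kk" key are the optional features
// discussed in this thread; an engine may silently ignore either of them.
// makeCollator is a hypothetical helper, not part of any specification.
function makeCollator(locale, normalize) {
  // Request normalization both ways: via the Unicode locale extension key
  // and via the options property.
  const tag = `${locale}-u-kk-${normalize ? "true" : "false"}`;
  return new Intl.Collator(tag, { normalization: normalize });
}

const fast = makeCollator("en", false);
const strict = makeCollator("en", true);

// Shows what the engine actually honored; the property is undefined
// where the option is unsupported.
console.log(strict.resolvedOptions().normalization);

// Ordinary comparisons behave the same with either setting.
console.log(fast.compare("a", "b") < 0, strict.compare("a", "b") < 0);
```

Whether the two collators ever disagree depends on the implementation and on the strings involved, which is exactly the trade-off in question.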

But trading off correctness for performance in this way doesn't seem quite 
right. Especially for search usage, it could mean that you're staring at a 
Vietnamese or Arabic word in a list and the search function says it's not 
there because you typed an indistinguishable but different string into the 
search box.
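
To make the search scenario concrete, here is a small illustration. The list contents and the findCanonical helper are invented for the example; the helper normalizes both sides to NFC before comparing, which is one way an application could guarantee the "must do" behavior regardless of what the collator does by default.

```javascript
// "Việt" stored precomposed (U+1EC7) versus typed as e + circumflex + dot below.
const stored = ["Hà Nội", "Vi\u1EC7t Nam"];
const typed = "Vie\u0302\u0323t Nam";

// Hypothetical helper: compare after NFC normalization so that canonically
// equivalent strings always compare as 0, whatever the collator's default is.
function findCanonical(list, query, collator = new Intl.Collator("vi")) {
  const q = query.normalize("NFC");
  return list.findIndex(
    (item) => collator.compare(item.normalize("NFC"), q) === 0
  );
}

// Code unit comparison misses the entry even though it looks identical.
console.log(stored.includes(typed));

// The normalized lookup returns the index of the equivalent entry.
console.log(findCanonical(stored, typed));
```

Normalizing on every comparison has its own cost, so this only moves the performance question from the implementation to the application.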

Thanks,
Norbert


On Aug 31, 2012, at 8:24 , Nebojša Ćirić wrote:

> This is what Markus had to say (he implemented most of the collation for ICU):
> 
> "http://www.unicode.org/reports/tr10/#Avoiding_Normalization
> 
> Step 1 of the algorithm: http://www.unicode.org/reports/tr10/#Step_1
> which has a note:
>       • Conformant implementations may skip this step in certain 
> circumstances: see Section 6.5, Avoiding Normalization for more information.
> See also http://www.unicode.org/reports/tr10/#Parametic_Tailoring
> -> attribute "normalization", see the description there
> (this whole table 14 will soon move to the LDML spec, leaving only a link in 
> this place)"
> 
> So the question is:
> 
> 1. Do we change i18n API default for normalization to always be true, with 
> some performance penalty?
> 2. Update ES 262 spec with info Markus passed (if possible)?
> 
> 
> 2012/8/30 Mark Davis ☕ <m...@macchiato.com>
> ICU is always able to compare them as being equal, just by setting the 
> parameter.
> 
> Even if the parameter isn't set, it uses an FCD sort (see 
> http://unicode.org/notes/tn5/) and canonical closure, which handles most 
> cases of canonical equivalence. The default is turned on for languages where 
> the normal+auxiliary exemplar sets contain characters that would show a 
> difference even with an FCD+closure sort, and can be turned on always if 
> desired (at some cost in performance; 30% sounds high though).
> 
> Mark
> 
> — Il meglio è l’inimico del bene —
> 
> 
> 
> On Thu, Aug 30, 2012 at 6:30 PM, Norbert Lindenberg 
> <ecmascr...@norbertlindenberg.com> wrote:
> In particular, a conformant implementation must be able to compare any two 
> canonical-equivalent strings as being equal, for all Unicode characters 
> supported by that implementation."
> 
> 
> 
> 
> -- 
> Nebojša Ćirić

_______________________________________________
es-discuss mailing list
es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss
