2013-08-20 17:09, Anne van Kesteren wrote:

On Tue, Aug 20, 2013 at 12:30 AM, Ryosuke Niwa <rn...@apple.com> wrote:
Can the specification be changed to use the number of composed character 
sequences instead of the code-unit length?

In a way I guess that's nice, but it also seems confusing that given

data:text/html,<input type=text maxlength=1>

pasting in U+0041 U+030A would give a string that's longer than 1 from
JavaScript's perspective.

Oh, right, this is a different issue from the non-BMP issue I discussed in my reply. In my opinion this case is even clearer: U+0041 U+030A is clearly two Unicode characters, not one, even though it is expected to be rendered as “Å” and even though U+00C5 is canonically equivalent to U+0041 U+030A.
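
For concreteness, here is roughly how this looks from JavaScript (a minimal sketch, assuming an engine with ES2015 String.prototype.normalize):

// One precomposed character vs. base letter plus combining ring above
var precomposed = "\u00C5";        // “Å” as a single code point
var decomposed = "\u0041\u030A";   // "A" followed by U+030A

precomposed.length                           // 1 (code units)
decomposed.length                            // 2 (code units)
decomposed.normalize("NFC") === precomposed  // true: canonically equivalent

Under the current code-unit counting, the second form already exceeds maxlength=1; counted as composed character sequences it would fit, yet its .length from JavaScript would still report 2.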

I don't think there's any place in the
platform where we measure string length other than by number of code
units at the moment.

Besides, if “character” means something other than a Unicode character (a Unicode code point assigned to a character) or, as a different concept, a Unicode code unit, then the question arises of what it does mean. For example, would a letter followed by 42 combining marks still be one character? (Such monstrosities are actually used, in an attempt to create “funny” effects.)
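
As a sketch of the grapheme-cluster interpretation in modern JavaScript, assuming an engine that supports Intl.Segmenter (which did not exist at the time of this discussion):

// A letter followed by 42 combining acute accents
var monstrosity = "a" + "\u0301".repeat(42);

monstrosity.length                           // 43 code units (also 43 code points)

var segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
[...segmenter.segment(monstrosity)].length   // 1 extended grapheme cluster

So depending on which definition “character” gets, the same string counts as 43 or as 1 against a maxlength limit.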

Yucca

