2013-08-20 17:09, Anne van Kesteren wrote:
> On Tue, Aug 20, 2013 at 12:30 AM, Ryosuke Niwa <rn...@apple.com> wrote:
>> Can the specification be changed to use the number of composed
>> character sequences instead of the code-unit length?
> In a way I guess that's nice, but it also seems confusing that given
> data:text/html,<input type=text maxlength=1>
> pasting in U+0041 U+030A would give a string that's longer than 1 from
> JavaScript's perspective.
Oh, right, this is a different issue from the non-BMP issue I discussed
in my reply. This one is even clearer in my opinion, since U+0041 U+030A
is clearly two Unicode characters, not one, even though it is expected
to be rendered as “Å” and even though U+00C5 is canonically equivalent
to U+0041 U+030A.
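To illustrate the point (a small JavaScript sketch of my own; the
variable names are just for exposition):

```javascript
// A + COMBINING RING ABOVE: two code points, renders as "Å"
const decomposed = "\u0041\u030A";
// LATIN CAPITAL LETTER A WITH RING ABOVE: one precomposed code point
const precomposed = "\u00C5";

console.log(decomposed.length);  // 2 — two characters, two UTF-16 code units
console.log(precomposed.length); // 1

// Canonical equivalence: NFC normalization composes the pair
console.log(decomposed.normalize("NFC") === precomposed); // true
```

So from JavaScript’s point of view the decomposed string really is
longer, even though the two strings are canonically equivalent.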
> I don't think there's any place in the
> platform where we measure string length other than by number of code
> units at the moment.
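Right, and in JavaScript that code-unit counting is exactly what
`.length` reports, which is also what splits a non-BMP character in two
(the issue I mentioned above). A quick sketch:

```javascript
const s = "\u{1F600}"; // U+1F600, a character outside the BMP

console.log(s.length);      // 2 — UTF-16 code units (a surrogate pair)
console.log([...s].length); // 1 — code points, via the string iterator
```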
Besides, if “character” means something other than Unicode character
(a Unicode code point assigned to a character) or, as a different
concept, Unicode code unit, then the question arises of what it does
mean. For example, would a letter followed by 42 combining marks still
be one character? (Such monstrosities are actually used, in attempts to
create “funny” effects.)
Yucca