On Tue, Jun 4, 2013 at 8:34 PM, Dmitri Gribenko <[email protected]> wrote:
>
>   Are code points the correct thing to count here?  There are combining 
> characters, there are double-width characters.  I think that the CJK number 
> tests pass for the wrong reason -- the width of characters is counted as 1, 
> not as 2, as it would be displayed on the terminal.

In my experience, there are two reasonable choices, and many bad ones.
The reasonable ones are to count bytes, or to count code points.
Attempting to count characters is doomed, and attempting to determine
the column is something that can only really be done robustly by a
program that renders text.
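
To illustrate why column counting is fragile, here is a minimal Python
sketch of the usual heuristic. The name naive_width is made up for this
example, and the rule it applies (skip combining marks, count East Asian
Wide/Fullwidth code points as two columns) is an assumption about typical
terminals, not something any renderer is required to do.

```python
import unicodedata

def naive_width(text: str) -> int:
    """Rough terminal-column estimate; a heuristic, not what a renderer does."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            # Combining marks usually attach to the previous character
            # and take no column of their own.
            continue
        # East Asian Wide ("W") and Fullwidth ("F") code points are
        # usually drawn two columns wide; everything else is assumed
        # to take one column.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width

print(naive_width("abc"))            # plain ASCII
print(naive_width("\u4e2d\u6587"))   # two CJK ideographs
print(naive_width("e\u0301"))        # 'e' + COMBINING ACUTE ACCENT
print(naive_width("a\tb"))           # tab is counted as one column here
```

Even this much gets tabs wrong, ignores East Asian Ambiguous width, and
knows nothing about the font or terminal actually in use, which is the
sense in which only the rendering program can answer the question.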

Of the two reasonable choices (bytes or code points), and assuming
Unicode, code points have the advantage of being independent of the
transformation format.  Compared to characters, they have the virtue
of being unambiguously defined.
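
A small Python sketch of the difference, assuming the text is already
decoded into a str (the helper names count_bytes and count_code_points
are hypothetical, chosen for this example). The byte count changes with
the transformation format; the code point count does not.

```python
def count_bytes(text: str, encoding: str = "utf-8") -> int:
    # The byte count depends on which transformation format is used.
    return len(text.encode(encoding))

def count_code_points(text: str) -> int:
    # A Python 3 str is a sequence of code points, so len() counts them.
    return len(text)

samples = {
    "ascii": "abc",
    "cjk": "\u4e2d\u6587",      # two CJK ideographs
    "combining": "e\u0301",    # 'e' + COMBINING ACUTE ACCENT
}

for name, text in samples.items():
    print(
        name,
        count_bytes(text, "utf-8"),
        count_bytes(text, "utf-16-le"),
        count_code_points(text),
    )
```

The last sample is two code points but would normally be read as one
character, which is the ambiguity that counting code points sidesteps.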

-- James

_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
