On Tue, Jun 4, 2013 at 8:34 PM, Dmitri Gribenko <[email protected]> wrote:
> Are code points the correct thing to count here? There are combining
> characters, there are double-width characters. I think that the CJK number
> tests pass for the wrong reason -- the width of characters is counted as 1,
> not as 2, as it would be displayed on the terminal.

In my experience, there are two reasonable choices, and many bad ones. The
reasonable ones are to count bytes, or to count code points. Attempting to
count characters is doomed, and attempting to determine the column is
something that can only really be done robustly by a program that renders
text.

Of the two reasonable choices (bytes or code points), and assuming Unicode,
code points have the advantage of being independent of the transformation
format. Compared to characters, they have the virtue of being unambiguously
defined.

-- James
_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
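
To make the bytes versus code points distinction concrete, here is a minimal C++ sketch. It is not taken from the patch under discussion: the name countCodePoints is hypothetical, and the function assumes its input is well-formed UTF-8 and does no validation.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, for illustration only.
// Counts Unicode code points in a UTF-8 string by counting every byte that
// is not a continuation byte (continuation bytes have the form 10xxxxxx).
// Assumes well-formed UTF-8; malformed input is not detected.
std::size_t countCodePoints(const std::string &utf8) {
  std::size_t count = 0;
  for (unsigned char byte : utf8) {
    if ((byte & 0xC0) != 0x80)
      ++count;
  }
  return count;
}
```

For example, "e" followed by U+0301 (combining acute accent) is 3 bytes in UTF-8 and 2 code points, yet it renders as a single character in one column. U+4E2D, a CJK ideograph, is 3 bytes and 1 code point, and typically occupies two columns on a terminal. The byte count would change under UTF-16, while the code point count would not; neither number is the display width.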
