Re: std.string will get the boot

Michel Fortin Sun, 31 Jan 2010 01:10:40 -0800

On 2010-01-30 22:06:06 -0500, Lionello Lunesu <l...@lunesu.remove.com> said:

On 30-1-2010 1:59, Andrei Alexandrescu wrote:

bearophile wrote:

Andrei Alexandrescu:

Currently arrays of characters count as random-access ranges, which
is not true for arrays of char and wchar. I plan to make std.range
aware of that and only characterize char[] and wchar[] (and their
qualified versions) as bidirectional ranges.


32 bits are not enough to represent certain "characters", they need
more than one of such dchar. So dchar too may be a bidirectional range.


[citation needed]


I also doubt 32-bit is not enough. In fact, Unicode has 0x10FFFF
as the highest code point.

32-bit is enough to cover all code points. But there are many combining code points in Unicode, allowing you to combine a diacritic with various other characters, such as an acute accent with a 'k'. Some of these combinations exist in precombined form and are considered equivalent. So if you want to count the number of characters the user actually sees instead of counting code points, then you need to take these combining code points into account.
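
To make the difference concrete, here is a minimal sketch in Python (used only for illustration; the thread is about D), relying on the standard unicodedata module. The sample strings and the helper name count_base_code_points are assumptions picked for the example, and skipping combining marks is only a rough approximation of what a user perceives as one character, not full grapheme segmentation.

```python
import unicodedata

precomposed = "\u00e9"   # e-acute stored as a single code point
combined = "e\u0301"     # 'e' followed by U+0301 COMBINING ACUTE ACCENT

# Canonical normalization maps one form onto the other,
# which is what "considered equivalent" means in practice.
nfc = unicodedata.normalize("NFC", combined)     # composed form
nfd = unicodedata.normalize("NFD", precomposed)  # decomposed form

def count_base_code_points(s):
    """Count code points that are not combining marks.

    Hypothetical helper: a rough stand-in for counting user-visible
    characters, not a complete grapheme cluster algorithm.
    """
    return sum(1 for ch in s if unicodedata.combining(ch) == 0)

# The two strings differ in code point count but not in base characters.
print(len(precomposed), len(combined))
print(count_base_code_points(precomposed), count_base_code_points(combined))
```

Comparing the strings after normalizing both to the same form is the usual way to treat the precombined and combined spellings as equal.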

But if you really wanted to iterate over "characters" instead of code points, note that it can become quite hard if you take into account double diacritics, combining diacritic signs placed across two letters. So I think it's reasonable to have dchar, a code point, as the base unit for iterating over a string.
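
As a small illustration of why double diacritics are awkward, the sketch below (Python, for illustration only) walks a string code point by code point. The sample string is an assumption: 't', then U+0361 COMBINING DOUBLE INVERTED BREVE, then 's', where the tie is drawn over both letters even though the mark sits between them in the string.

```python
import unicodedata

# 't' + U+0361 COMBINING DOUBLE INVERTED BREVE + 's'
tied = "t\u0361s"

# Iterating by code point is straightforward and unambiguous.
code_points = [f"U+{ord(ch):04X}" for ch in tied]

# Positions of combining marks (non-zero canonical combining class).
marks = [i for i, ch in enumerate(tied) if unicodedata.combining(ch) != 0]

for cp, ch in zip(code_points, tied):
    print(cp, unicodedata.name(ch))
```

The mark is attached to the 't' in the encoding, yet visually it belongs to both letters, so there is no obvious single "character" boundary to iterate on.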


http://en.wikipedia.org/wiki/Combining_character
http://en.wikipedia.org/wiki/Unicode_normalization

Another interesting case:
http://en.wikipedia.org/wiki/Combining_grapheme_joiner

Unicode, isn't it great?


--
Michel Fortin
michel.for...@michelf.com
http://michelf.com/
