Re: why a part of D community do not want go to D2 ?

Daniel Gibson Thu, 11 Nov 2010 16:06:28 -0800

Walter Bright wrote:

spir wrote:
In my view, there is a missing level of abstraction in common UString processing libs and types. How to count the "â"s in a text? How to find one? Above, indexOf fails because my editor uses a precombined code, while the source (here literal) uses another form. To be able to produce meaningful results, and to use simple routines like index, find, count..., the way we used to with single-length character sets, there should be a grouping phase on top of decoding; we would then process arrays of "stacks" representing characters, not of codes. To search, it's also necessary to have all characters in normalised form, so that both "â" would match: another phase. Unicode provides algorithms for those phases in constructing string representations -- but everyone seems to ignore the issues... s[0..1] would then return the first character, not the first code of the "stack" representing the first character.
http://www.digitalmars.com/d/2.0/phobos/std_utf.html

If I'm not mistaken, those functions don't handle these "graphemes", i.e. something that appears like one character on the screen, but consists of multiple code *points*. Like spir's "â" that, in UTF-8, is encoded with the following bytes: 0x61 (=='a'), 0xCC, 0x82. (Or \u0061\u0302 in UTF-32).
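
To make the three levels concrete, here is a rough sketch. It assumes a recent Phobos in which std.uni provides byGrapheme and normalize; those may well be missing from the Phobos version discussed in this thread, so take it as an illustration of the idea, not of what std.utf offers today. The sample word is just made up for the search test.

```d
import std.algorithm.searching : canFind;
import std.range : walkLength;
import std.uni : byGrapheme, normalize, NFC;

void main()
{
    // 'a' followed by U+0302 COMBINING CIRCUMFLEX ACCENT (decomposed form)
    string decomposed = "a\u0302";
    // U+00E2, the precombined form of the same character
    string precomposed = "\u00E2";

    assert(decomposed.length == 3);                // UTF-8 code units: 0x61 0xCC 0x82
    assert(decomposed.walkLength == 2);            // code points: U+0061, U+0302
    assert(decomposed.byGrapheme.walkLength == 1); // one character on the screen

    // The two forms look identical but do not compare equal ...
    assert(decomposed != precomposed);
    // ... until both are brought into the same normalisation form.
    assert(normalize!NFC(decomposed) == precomposed);

    // Searching shows the same effect: the text uses the precombined form,
    // the needle uses the decomposed one.
    string text = "p\u00E2te";
    assert(!text.canFind(decomposed));
    assert(normalize!NFC(text).canFind(normalize!NFC(decomposed)));
}
```

So a search only gives meaningful results if both the text and the pattern are normalised first, which is the extra phase spir describes.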

Also, a function returning the physical position (i.e. the index in the array of chars or wchars) of logical char #logPos may be useful, e.g. for fixed width printing stuff:

  size_t getPhysPos(char[] str, size_t logPos)
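
A minimal sketch of how such a function might look, assuming "logical char" means a grapheme and assuming a recent Phobos that has std.uni.graphemeStride (an assumption, it may not exist in the version discussed here). The parameter is const(char)[] instead of char[] only so that string literals can be passed. If a logical char is just a code point, std.utf.toUTFindex should already do the job.

```d
import std.uni : graphemeStride;

// Hypothetical helper: returns the index (in UTF-8 code units) at which
// grapheme number logPos starts. If str has fewer graphemes than logPos,
// str.length is returned.
size_t getPhysPos(const(char)[] str, size_t logPos)
{
    size_t pos = 0;
    while (logPos > 0 && pos < str.length)
    {
        // graphemeStride gives the number of code units occupied by
        // the grapheme that starts at index pos.
        pos += graphemeStride(str, pos);
        --logPos;
    }
    return pos;
}

void main()
{
    // "a" + combining circumflex (3 bytes in UTF-8), then 'b' (1 byte)
    string s = "a\u0302b";
    assert(getPhysPos(s, 0) == 0);
    assert(getPhysPos(s, 1) == 3); // 'b' starts after the 3-byte grapheme
    assert(getPhysPos(s, 2) == 4); // one past the end
    assert(getPhysPos(s, 5) == 4); // clamped to s.length
}
```

For fixed width printing, getPhysPos(str, n) then tells you where to cut the array so that exactly n visible characters are kept.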

Cheers,
- Daniel
