On Fri, 12 Nov 2010 01:00:18 +0100
Daniel Gibson <metalcae...@gmail.com> wrote:

> > http://www.digitalmars.com/d/2.0/phobos/std_utf.html  
> 
> If I'm not mistaken, those functions don't handle these "graphemes", i.e. 
> something that appears like one character on the screen, but consists of 
> multiple code *points*. Like spir's "â" that, in UTF-8, is encoded with the 
> following bytes: 0x61 (=='a'), 0xCC, 0x82. (Or \u0061\u0302 in UTF-32).

You are right, Daniel. As far as I understand it (only superficially, since I
haven't used it yet), the current utf library deals with the lower-level issue
of encoding code points into code units and bytes.
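
To make the distinction concrete, here is a minimal sketch (untested against
the 2010 Phobos, so take it as an illustration only). It uses toUTF16 and
toUTF32 from std.utf to count code units and code points for the "â" from the
quoted example. None of these counts is 1, which is the grapheme count a user
would expect:

```d
import std.stdio : writeln;
import std.utf : toUTF16, toUTF32;

void main()
{
    // 'a' followed by U+0302 (combining circumflex): one visible character.
    string s = "a\u0302";

    writeln(s.length);           // UTF-8 code units (bytes): 0x61 0xCC 0x82
    writeln(toUTF16(s).length);  // UTF-16 code units
    writeln(toUTF32(s).length);  // code points
}
```

This should print 3, 2 and 2: std.utf converts between encodings, but it does
not group the two code points into one grapheme.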

> Also, a function returning the physical position (i.e. pos in array of chars
> or wchars) of logical char #logPos may be useful, e.g. for fixed width
> printing stuff:
>    size_t getPhysPos(char[] str, size_t logPos)

See my reply to Walter's next post.
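
For illustration only, here is a rough sketch of what such a function could
look like if "logical char" is taken to mean code point. That reading is my
assumption, not necessarily what Daniel meant, and the body is hypothetical. It
walks the string with stride from std.utf; I believe std.utf's toUTFindex
already does essentially the same job.

```d
import std.stdio : writeln;
import std.utf : stride;

// Hypothetical sketch: returns the index (in UTF-8 code units) at which
// code point number logPos starts. If logPos is past the end of the
// string, str.length is returned.
size_t getPhysPos(const(char)[] str, size_t logPos)
{
    size_t i = 0;
    while (logPos > 0 && i < str.length)
    {
        i += stride(str, i);  // length in code units of the code point at i
        --logPos;
    }
    return i;
}

void main()
{
    // Bytes: 0x61, 0xCC 0x82, 0x62
    writeln(getPhysPos("a\u0302b", 2));  // index of 'b'
}
```

Note that this counts code points, so the "â" above still takes two logical
positions. For fixed-width printing one would want to count graphemes instead,
which this sketch does not attempt.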

Denis
-- -- -- -- -- -- --
vit esse estrany ☣

spir.wikidot.com
