On Monday, 20 April 2015 at 17:48:17 UTC, Panke wrote:
This can lead to subtle bugs, cf. length of random and e_one. You have to convert everything to dstring to get the "expected" result. However, this is not always desirable.

There are three things that you need to be aware of when handling Unicode: code units, code points and graphemes.

This is why I use a helper function that uses byCodePoint and byGrapheme. At least for my use cases it returns the correct length. However, I might think about an alternative version based on the discussion here.
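The helper itself isn't shown here, but a minimal sketch of what I mean, assuming the hypothetical name graphemeLength and std.uni's byGrapheme, would be something like:

import std.range : walkLength;
import std.uni : byGrapheme;

/// Hypothetical helper: counts user-perceived characters (graphemes)
/// instead of code units (.length) or code points (walkLength on a string).
size_t graphemeLength(S)(S str)
{
    return str.byGrapheme.walkLength;
}

unittest
{
    // "noël" written with a combining diaeresis: 6 UTF-8 code units,
    // 5 code points, but only 4 graphemes.
    auto s = "noe\u0308l";
    assert(s.length == 6);          // code units
    assert(s.walkLength == 5);      // code points (auto-decoding)
    assert(s.graphemeLength == 4);  // graphemes
}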

In general, the length of one guarantees nothing about the length of the other, except for UTF-32, where there is a 1:1 mapping between code units and code points.
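For illustration, the same character stored as string (UTF-8), wstring (UTF-16) and dstring (UTF-32) has a different .length; only for dstring does .length equal the number of code points:

import std.stdio : writeln;

void main()
{
    // U+1F600 (😀) lies outside the BMP, so it needs several code units
    // in UTF-8 and UTF-16, but exactly one in UTF-32.
    writeln("\U0001F600".length);  // 4 -- string:  UTF-8 code units
    writeln("\U0001F600"w.length); // 2 -- wstring: UTF-16 code units (surrogate pair)
    writeln("\U0001F600"d.length); // 1 -- dstring: UTF-32 code units == code points
}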

In this thread, we were discussing the relationship between code points and graphemes. Your examples, however, apply to the relationship between code units and code points.

To measure the columns needed to print a string, you'll need the number of graphemes; (d)string.length only gives you the number of code units.
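A quick way to see the difference (grapheme count used here as a rough stand-in for printed columns):

import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme;

void main()
{
    // "café" written with a combining acute accent (e + U+0301).
    auto s = "cafe\u0301";
    writeln(s.length);                // 6 -- code units, not what you want for columns
    writeln(s.byGrapheme.walkLength); // 4 -- graphemes, one per printed character
}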

If you normalize a string (in the sequence-of-characters/code-points sense, not object.string) to NFC, it will first decompose every precomposed character in the string (like é, a single code point), establish a defined order between the combining characters, and then recompose a selected few sequences (like é). This way é always ends up as a single code point in NFC. There are dozens of other combinations where you'll still have an n:1 mapping between code points and graphemes left after normalization.
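You can check this with std.uni.normalize (NFC is the default form); a small sketch:

import std.range : walkLength;
import std.stdio : writeln;
import std.uni : normalize, NFC;

void main()
{
    auto decomposed = "e\u0301";          // 'e' followed by U+0301 COMBINING ACUTE ACCENT
    auto composed   = normalize!NFC(decomposed);

    writeln(decomposed.walkLength);  // 2 -- two code points before normalization
    writeln(composed.walkLength);    // 1 -- NFC recomposes them into U+00E9 (é)
    writeln(composed == "\u00E9");   // true
}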

An example given already in this thread: putting an arrow over a Latin letter is typical in math and is always more than one code point.
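For instance, U+20D7 COMBINING RIGHT ARROW ABOVE has (as far as I know) no precomposed form with a Latin letter, so the grapheme remains two code points even after NFC:

import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme, normalize, NFC;

void main()
{
    auto vec = "x\u20D7";                    // x with a combining right arrow above
    writeln(normalize!NFC(vec).walkLength);  // 2 -- still two code points after NFC
    writeln(vec.byGrapheme.walkLength);      // 1 -- but a single grapheme
}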
