Re: Unicode handling comparison

Wyatt Wed, 27 Nov 2013 08:21:27 -0800

On Wednesday, 27 November 2013 at 15:43:11 UTC, Jakob Ovrum wrote:

The author also doesn't seem to understand the Unicode definitions of character and grapheme, which is a shame, because the difference is more or less the whole point of the post.

I agree with the assertion that people SHOULD know how Unicode works if they want to work with it, but the way our docs are now is off-putting enough that most probably won't learn anything. If they know, they know; if they don't, the wall of jargon is intimidating and hard to grasp (it needs more examples up front, of more things that you'd actually use std.uni for). Even though I'm decently familiar with Unicode, I was having trouble following all that (e.g. isn't "noe\u0308l" a grapheme cluster according to std.uni?). On the flip side, std.utf has a serious dearth of examples, and the relationship between the two isn't clear.

On that note, I tried to use std.uni to write a simple example of how to correctly handle this in D, but it became apparent that std.uni should expose something like `byGrapheme` which lazily transforms a range of code points to a range of graphemes (probably needs a `byCodePoint` to do the converse too). The two extant grapheme functions, `decodeGrapheme` and `graphemeStride`, are *awful* for string manipulation (granted, they are probably perfect for text rendering).

Yes, please. While operations on single codepoints and characters seem pretty robust (i.e. you can do lots of things with and to them), it feels like it just falls apart when you try to work with strings. It honestly surprised me how many things in std.uni don't seem to work on ranges.
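To make the request concrete, here is a rough sketch of what a lazy grapheme range could look like when built on the existing `decodeGrapheme`. The names `ByGraphemeSketch` and `countGraphemes` are made up for illustration and are not part of std.uni; the sketch assumes `decodeGrapheme` accepts a `string` by reference and consumes one grapheme cluster per call.

```d
import std.uni : Grapheme, decodeGrapheme;

/// Hypothetical lazy input range of graphemes over a string.
/// Not a std.uni API; only a sketch of the idea discussed above.
struct ByGraphemeSketch
{
    private string rest;      // text not yet decoded
    private Grapheme current; // grapheme returned by front
    private bool done;        // true once the input is exhausted

    this(string s)
    {
        rest = s;
        popFront(); // prime the first grapheme
    }

    @property bool empty() const { return done; }

    @property Grapheme front() { return current; }

    void popFront()
    {
        if (rest.length == 0)
        {
            done = true;
            return;
        }
        // decodeGrapheme takes the range by ref and advances it
        // past exactly one grapheme cluster.
        current = decodeGrapheme(rest);
    }
}

/// Hypothetical helper: counts graphemes by walking the range.
size_t countGraphemes(string s)
{
    size_t n;
    foreach (g; ByGraphemeSketch(s))
        ++n;
    return n;
}

void main()
{
    // "noe\u0308l" is five code points but four graphemes:
    // 'e' followed by U+0308 forms a single cluster.
    assert(countGraphemes("noe\u0308l") == 4);
}
```

With something like this, string-level operations (counting, reversing, slicing by grapheme) could be written against an ordinary range instead of hand-rolled loops around `decodeGrapheme` and `graphemeStride`.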


-Wyatt
