On Wednesday, 27 November 2013 at 15:43:11 UTC, Jakob Ovrum wrote:

The author also doesn't seem to understand the Unicode definitions of character and grapheme, which is a shame, because the difference is more or less the whole point of the post.

I agree with the assertion that people SHOULD know how unicode works if they want to work with it, but the way our docs are now is off-putting enough that most probably won't learn anything. If they know, they know; if they don't, the wall of jargon is intimidating and hard to grasp (more examples up front of more things that you'd actually use std.uni for). Even though I'm decently familiar with Unicode, I was having trouble following all that (e.g. Isn't "noe\u0308l" a grapheme cluster according to std.uni?). On the flip side, std.utf has a serious dearth of examples and the relationship between the two isn't clear.

On that note, I tried to use std.uni to write a simple example of how to correctly handle this in D, but it became apparent that std.uni should expose something like `byGrapheme` which lazily transforms a range of code points to a range of graphemes (probably needs a `byCodePoint` to do the converse too). The two extant grapheme functions, `decodeGrapheme` and `graphemeStride`, are *awful* for string manipulation (granted, they are probably perfect for text rendering).

Yes, please. While operations on single codepoints and characters seem pretty robust (i.e. you can do lots of things with and to them), it feels like it just falls apart when you try to work with strings. It honestly surprised me how many things in std.uni don't seem to work on ranges.

-Wyatt

Reply via email to