On Wednesday, 27 November 2013 at 15:43:11 UTC, Jakob Ovrum wrote:
> The author also doesn't seem to understand the Unicode
> definitions of character and grapheme, which is a shame,
> because the difference is more or less the whole point of the
> post.

I agree with the assertion that people SHOULD know how Unicode
works if they want to work with it, but the way our docs are now
is off-putting enough that most probably won't learn anything.
If they know, they know; if they don't, the wall of jargon is
intimidating and hard to grasp (it needs more examples up front
of the things you'd actually use std.uni for). Even though I'm
decently familiar with Unicode, I was having trouble following
all that (e.g. isn't "noe\u0308l" a grapheme cluster according to
std.uni?). On the flip side, std.utf has a serious dearth of
examples, and the relationship between the two modules isn't clear.

> On that note, I tried to use std.uni to write a simple example
> of how to correctly handle this in D, but it became apparent
> that std.uni should expose something like `byGrapheme` which
> lazily transforms a range of code points to a range of
> graphemes (probably needs a `byCodePoint` to do the converse
> too). The two extant grapheme functions, `decodeGrapheme` and
> `graphemeStride`, are *awful* for string manipulation (granted,
> they are probably perfect for text rendering).

Yes, please. While operations on single code points and
characters seem pretty robust (i.e. you can do lots of things
with and to them), it feels like it just falls apart when you try
to work with strings. It honestly surprised me how many things
in std.uni don't seem to work on ranges.
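
For what it's worth, here is a rough sketch of what counting
graphemes looks like with the existing `graphemeStride`, assuming
I'm reading the std.uni docs correctly. `countGraphemes` is a
made-up helper for illustration, not something in Phobos:

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : graphemeStride;

// Hypothetical helper (not part of std.uni): counts grapheme
// clusters by stepping through the string one cluster at a time.
size_t countGraphemes(string s)
{
    size_t count = 0;
    // graphemeStride returns the length, in code units, of the
    // grapheme cluster that starts at index i.
    for (size_t i = 0; i < s.length; i += graphemeStride(s, i))
        ++count;
    return count;
}

void main()
{
    // 'e' followed by U+0308 COMBINING DIAERESIS
    string s = "noe\u0308l";

    writeln(s.length);          // UTF-8 code units: 6
    writeln(s.walkLength);      // code points: 5
    writeln(countGraphemes(s)); // grapheme clusters: 4
}
```

Having to drive the index by hand like this, just to get a count,
is the sort of thing a lazy `byGrapheme` range would take care of.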
-Wyatt