On Wednesday, 16 October 2013 at 08:03:26 UTC, qznc wrote:
> On Tuesday, 15 October 2013 at 14:11:37 UTC, Kagamin wrote:
>> On Sunday, 13 October 2013 at 14:14:14 UTC, nickles wrote:
>>> Also, I understand that there is the std.utf.count()
>>> function, which returns the length that I was searching for.
>>> However, why - if D is so UTF-8-centric - isn't this function
>>> implemented in the core like ".length"?
>>
>> Most code doesn't need to count graphemes and lives happily
>> with just strings, that's why it's not in the core.
>
> Most code might be buggy then.
>
> An issue that often comes up is file names. A file called "bär"
> will be normalized differently depending on the operating
> system. In both cases the "ä" is one grapheme. However, on Linux
> it is one code point, but on OS X it is two code points.

Now that you mention it, I had a program that would send strings
to a socket written in D. Before I could process the strings on
OS X, I had to normalize the decomposed OS X version of the
strings to the composed form that D could handle; otherwise it
wouldn't work. I used libutf8proc for it (only one tiny little
function). Interfacing with the C library was no problem.
However, I thought it would have been nice if D could've handled
this on its own, without depending on third-party libraries.
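
To make the composed/decomposed difference concrete, here is a minimal
sketch. It is in Python rather than D, only because its standard library
makes the point in a few lines; it is not the code I actually used. The
helper name to_composed is made up for this example, and the two string
literals are my assumption of what the Linux and OS X variants of "bär"
look like.

```python
import unicodedata


def to_composed(text: str) -> str:
    """Return text in Unicode Normalization Form C (composed)."""
    return unicodedata.normalize("NFC", text)


# Assumed example data: the same visible name in two encodings.
composed = "b\u00e4r"     # 'ä' as one code point (U+00E4)
decomposed = "ba\u0308r"  # 'a' followed by combining diaeresis (U+0308)

# Same visible text, different code point and UTF-8 byte counts.
print(len(composed), len(composed.encode("utf-8")))      # 3 4
print(len(decomposed), len(decomposed.encode("utf-8")))  # 4 5

# A naive comparison fails until both sides use the same form.
print(composed == decomposed)               # False
print(composed == to_composed(decomposed))  # True
```

The point is that the normalization has to happen before any comparison
or lookup, which is the step I had to hand off to libutf8proc.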