On Wednesday, 16 October 2013 at 08:48:30 UTC, Chris wrote:
On Wednesday, 16 October 2013 at 08:03:26 UTC, qznc wrote:
On Tuesday, 15 October 2013 at 14:11:37 UTC, Kagamin wrote:
On Sunday, 13 October 2013 at 14:14:14 UTC, nickles wrote:
Also, I understand that there is the std.utf.count() function, which returns the length that I was searching for. However, why, if D is so UTF-8-centric, isn't this function implemented in the core like ".length"?

Most code doesn't need to count graphemes and lives happily with just strings, that's why it's not in the core.

Most code might be buggy then.

An issue that often comes up is file names. A file called "bär" will be normalized differently depending on the operating system. In both cases the "ä" is one grapheme. However, on Linux it is typically stored as one code point, while on OS X it is stored as two code points.
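To make the difference concrete, here is a small sketch that counts both spellings of "bär" three ways. It assumes a current Phobos with std.uni.byGrapheme available; the helper names codePoints and graphemes are made up for this example, and the two strings are written with escapes so an editor cannot silently normalize them.

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme;
import std.utf : count;

// Hypothetical helper: number of code points in a UTF-8 string.
size_t codePoints(string s)
{
    return count(s);
}

// Hypothetical helper: number of graphemes (user-perceived characters).
size_t graphemes(string s)
{
    return s.byGrapheme.walkLength;
}

void main()
{
    string composed   = "b\u00E4r";   // ä as the single code point U+00E4
    string decomposed = "ba\u0308r";  // a followed by U+0308 COMBINING DIAERESIS

    // .length counts UTF-8 code units (bytes), not characters.
    writeln(composed.length, " ", decomposed.length);                 // 4 5
    writeln(codePoints(composed), " ", codePoints(decomposed));       // 3 4
    writeln(graphemes(composed), " ", graphemes(decomposed));         // 3 3

    // Same text to a reader, but the byte sequences differ.
    writeln(composed == decomposed);                                  // false
}
```

Only the grapheme count agrees between the two forms, and a plain == comparison treats them as different strings.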

Now that you mention it, I had a program written in D that would send strings to a socket. Before I could process the strings on OS X, I had to normalize the decomposed OS X version of the strings to the composed form that D could handle, or else it wouldn't work. I used libutf8proc for it (only one tiny little function). It was no problem to interface to the C library. However, I thought it would have been nice if D could have handled this on its own without depending on third-party libraries.
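For comparison, the same composition step can be sketched with Phobos alone, using std.uni.normalize. This is not the libutf8proc code described above, just an illustration that assumes a Phobos version that ships std.uni.normalize; the helper name toComposed and the sample strings are invented for the example.

```d
import std.stdio : writeln;
import std.uni : NFC, normalize;

// Hypothetical helper: convert a string to composed form (NFC).
string toComposed(string s)
{
    return normalize!NFC(s);
}

void main()
{
    string fromOsx  = "ba\u0308r";  // decomposed: a + U+0308 COMBINING DIAERESIS
    string expected = "b\u00E4r";   // composed: U+00E4

    // Before normalization the two strings compare unequal.
    assert(fromOsx != expected);

    // After NFC normalization they are identical byte for byte.
    assert(toComposed(fromOsx) == expected);

    writeln(toComposed(fromOsx).length);  // 4 UTF-8 code units
}
```

Normalizing both sides to the same form before comparing or storing is the usual way to avoid depending on which form a given file system or peer happens to produce.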

I'm not sure this is a "D" issue though: it's a fact of Unicode
that there are two different ways to write ä.
