On Sunday, 13 October 2013 at 14:14:14 UTC, nickles wrote:
> Also, I understand that there is the std.utf.count()
> function, which returns the length that I was searching for.
> However, why - if D is so UTF-8-centric - isn't this
> function implemented in the core like ".length"?

On Tuesday, 15 October 2013 at 14:11:37 UTC, Kagamin wrote:
> Most code doesn't need to count graphemes and lives happily
> with just strings; that's why it's not in the core.

On Wednesday, 16 October 2013 at 08:03:26 UTC, qznc wrote:
> Most code might be buggy then.
> An issue that often comes up is file names. A file called
> "bär" will be normalized differently depending on the
> operating system. In both cases the "ä" is one grapheme.
> However, on Linux it is one code point, but on OS X it is
> two code points.

On Wednesday, 16 October 2013 at 08:48:30 UTC, Chris wrote:
> Now that you mention it, I had a program written in D that
> would send strings to a socket. Before I could process the
> strings on OS X, I had to normalize the decomposed OS X
> version of the strings to the composed form that D could
> handle, else it wouldn't work. I used libutf8proc for it (only
> one tiny little function). It was no problem to interface to
> the C library; however, I thought it would have been nice if
> D could've handled this on its own without depending on
> third-party libraries.

On Wednesday, 16 October 2013 at 09:00:01 UTC, monarch_dodra
wrote:
> I'm not sure this is a "D" issue though: it's a fact of Unicode
> that there are two different ways to write ä.

My point was that it would have been nice to have a native D
function that can convert between the two forms, especially
because this is a well-known issue. NSString (Cocoa / Objective-C),
for example, has things like
precomposedStringWithCompatibilityMapping etc.
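
To make the conversion concrete, here is a minimal sketch of the two forms of "bär" and of converting between them. It is written in Python rather than D, on the assumption that the behaviour is a property of Unicode and not of either language, and it uses the standard library's unicodedata.normalize. The helper names to_composed and to_decomposed are made up for this example.

```python
import unicodedata

# The same file name written in the two ways discussed above.
composed = "b\u00e4r"     # b, ä (U+00E4), r
decomposed = "ba\u0308r"  # b, a, combining diaeresis (U+0308), r


def to_composed(text: str) -> str:
    """Return the composed (NFC) form; hypothetical helper name."""
    return unicodedata.normalize("NFC", text)


def to_decomposed(text: str) -> str:
    """Return the decomposed (NFD) form; hypothetical helper name."""
    return unicodedata.normalize("NFD", text)


# Both look identical on screen but differ in code points and bytes.
print(len(composed), len(decomposed))        # 3 4 (code points)
print(len(composed.encode("utf-8")),
      len(decomposed.encode("utf-8")))       # 4 5 (UTF-8 bytes)
print(composed == decomposed)                # False

# After normalizing to a common form they compare equal.
print(to_composed(decomposed) == composed)   # True
print(to_decomposed(composed) == decomposed) # True
```

Normalizing incoming strings to one form (NFC here) before comparing or processing them is essentially what the libutf8proc call did in my case.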