On Friday, 10 January 2014 at 00:56:36 UTC, Manu wrote:
> So is it 'correct'?
Yes, with the caveat that it might find the base character of a
combining sequence (like H followed by a combining accent code
point). That's what byGrapheme is about: grouping those sequences
into single graphemes.
But meh, do you really care about that?
indexOf does correctly handle the UTF formats and returns an
index suitable for slicing (or -1).
import std.string;

auto idx = "cool".indexOf("o");
if(idx == -1)
    throw new Exception("not found");
auto before = "cool"[0 .. idx];
// + 1 is fine here because "o" is a single code unit
auto after = "cool"[idx + 1 .. $];
Code like that will always yield valid UTF strings. Again, it
*might* split a base character from its combining mark, but it
*will* correctly handle multi-byte code points... so probably good
enough for 99% of use cases.
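
To make the caveat concrete, here is a rough sketch rather than anything official. sliceAfter and graphemeCount are hypothetical helpers I made up for illustration; indexOf, byGrapheme and walkLength are the real Phobos functions. It assumes the accent is written as a separate combining code point (U+0301), not as a precomposed character.

```d
import std.range : walkLength;
import std.string : indexOf;
import std.uni : byGrapheme;

/// Hypothetical helper: number of user-perceived characters in `s`.
size_t graphemeCount(string s)
{
    return s.byGrapheme.walkLength;
}

/// Hypothetical helper: everything after the first occurrence of `needle`.
string sliceAfter(string s, string needle)
{
    auto idx = s.indexOf(needle);
    if (idx == -1)
        throw new Exception("not found");
    // Skip needle.length code units, since the needle may be multi-byte.
    return s[idx + needle.length .. $];
}

void main()
{
    // Multi-byte needle (U+00EF): the slice still starts on a code point boundary.
    assert(sliceAfter("na\u00EFve", "\u00EF") == "ve");

    // 'e' followed by U+0301 COMBINING ACUTE ACCENT.
    string s = "He\u0301llo";
    // Still valid UTF-8, but the accent has lost its base character.
    assert(sliceAfter(s, "e") == "\u0301llo");

    assert(s.length == 7);         // UTF-8 code units
    assert(s.walkLength == 6);     // code points
    assert(graphemeCount(s) == 5); // graphemes
}
```

If the orphaned accent matters for your use case, walking the string with byGrapheme instead of slicing on code units is probably the way to go.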
> Looks like bytes, but then it talks
It is bytes (UTF-8 code units) on string, and wchars (UTF-16 code
units) on wstring; it is whatever unit is correct for slicing the
type you pass it.
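
A small sketch of what that means in practice, assuming the same text stored once as string and once as wstring. The sample text is my own; U+00E9 takes two code units in UTF-8 but only one in UTF-16, so the returned index differs while the slice comes out the same.

```d
import std.string : indexOf;

void main()
{
    // Same text in UTF-8 and UTF-16; U+00E9 is e with an acute accent.
    string  s = "h\u00E9llo";
    wstring w = "h\u00E9llo"w;

    // The index counts code units of the type you passed in.
    assert(s.indexOf("l") == 3);  // 1 (h) + 2 (U+00E9 in UTF-8)
    assert(w.indexOf("l"w) == 2); // 1 (h) + 1 (U+00E9 in UTF-16)

    // Either index slices its own string correctly.
    assert(s[s.indexOf("l") .. $] == "llo");
    assert(w[w.indexOf("l"w) .. $] == "llo"w);
}
```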
> The D docs are pretty terrible, they don't do much to help you
> find what you're looking for.
I mostly agree (and this is partially why I started writing
http://dpldocs.info/, but I never finished that, so it isn't much
better). I don't notice it so much because I already know where
to look for most things, but regardless, I agree it is a pain for
anything new.