On Friday, 10 January 2014 at 00:56:36 UTC, Manu wrote:
> So is it 'correct'?

Yes, with the caveat that it might match just part of a combining sequence (like H followed by a combining accent code point). That's what byGrapheme is about: grouping those code points into a single grapheme.

But meh, do you really care about that?
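If you do care, here is a rough sketch of the difference between code units, code points and graphemes. The helper name graphemeCount is just something I made up for the example, and I'm assuming a Phobos recent enough to have std.uni.byGrapheme.

```d
import std.range : walkLength;
import std.uni : byGrapheme;

// Hypothetical helper: counts user-perceived characters (graphemes)
// rather than code units or code points.
size_t graphemeCount(string s)
{
    return s.byGrapheme.walkLength;
}

void main()
{
    // "H" followed by U+0301 COMBINING ACUTE ACCENT, then "i".
    string s = "H\u0301i";

    assert(s.length == 4);         // UTF-8 code units: 1 + 2 + 1
    assert(s.walkLength == 3);     // code points: H, U+0301, i
    assert(graphemeCount(s) == 2); // graphemes: accented H, i
}
```

So a search for "H" in that string matches at index 0 even though a reader would see an accented letter there.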

indexOf does correctly handle the UTF formats and returns an index suitable for slicing (or -1).

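import std.string : indexOf;

auto idx = "cool".indexOf("o");
if(idx == -1)
  throw new Exception("not found");

auto before = "cool"[0 .. idx];
// + 1 only because "o" is a single code unit; in general skip the needle's .length
auto after = "cool"[idx + 1 .. $];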


Code like that will always yield valid UTF strings. Again, it *might* split a base character from the combining code point that follows it, but it *will* correctly handle multi-byte code points... so probably good enough for 99% of use cases.
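To make that concrete, here is a minimal sketch with a non-ASCII needle. splitAround is a hypothetical helper of mine, not something in Phobos; the one point it tries to show is skipping needle.length code units instead of a hard-coded 1.

```d
import std.string : indexOf;
import std.utf : validate;

// Hypothetical helper: splits `haystack` around the first occurrence of `needle`.
string[2] splitAround(string haystack, string needle)
{
    auto idx = haystack.indexOf(needle);
    if (idx == -1)
        throw new Exception("not found");
    // Skip needle.length code units so multi-byte needles are handled too.
    return [haystack[0 .. idx], haystack[idx + needle.length .. $]];
}

void main()
{
    // U+00EF is two code units in UTF-8; both slices stay intact.
    auto parts = splitAround("na\u00EFve", "\u00EF");
    assert(parts[0] == "na");
    assert(parts[1] == "ve");

    // A combining sequence can still be split: the tail is valid UTF-8
    // (validate does not throw), but it now starts with a bare accent.
    auto split = splitAround("H\u0301i", "H");
    validate(split[1]);
    assert(split[1] == "\u0301i");
}
```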

> Looks like bytes, but then it talks

It is bytes (UTF-8 code units) on string, and wchars (UTF-16 code units) on wstring; it is whatever unit is correct for slicing the type you pass it.
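A quick sketch of what that means in practice, assuming the same text stored in each of the three string types. The index differs per type, but in every case it is the right one to slice with.

```d
import std.string : indexOf;

void main()
{
    // U+00E9 takes two code units in UTF-8, one in UTF-16 and UTF-32.
    string  s = "h\u00E9llo";
    wstring w = "h\u00E9llo"w;
    dstring d = "h\u00E9llo"d;

    assert(s.indexOf("l") == 3);   // counted in chars
    assert(w.indexOf("l"w) == 2);  // counted in wchars
    assert(d.indexOf("l"d) == 2);  // counted in dchars

    // Each index slices its own type correctly.
    assert(s[s.indexOf("l") .. $] == "llo");
    assert(w[w.indexOf("l"w) .. $] == "llo"w);
    assert(d[d.indexOf("l"d) .. $] == "llo"d);
}
```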

> The D docs are pretty terrible, they don't do much to help you find what you're looking for.

I mostly agree (and this is partially why I started writing http://dpldocs.info/, but I never finished that, so it isn't much better). I don't notice it so much because I already know where to look for most things, but regardless, I agree it is a pain for anything new.
