On Friday, December 09, 2011 05:58:40 bearophile wrote:
> Jonathan M Davis:
> > And as I explained in bug# 7085, reverse's behavior with regards to
> > dchar[] is completely correct. It's reversing the code points, _not_
> > the graphemes.
>
> OK. Maybe I will open a differently worded enhancement request, for a
> grapheme-aware std.string.
>
> > If you want to reverse a char[], then cast it to ubyte[] and reverse
> > that. If you want to reverse a wchar[], then cast it to ushort[] and
> > reverse that. In Phobos, strings are ranges of dchar, so reverse is
> > going to reverse code points. If you want it to reverse code units
> > instead, then you just use the appropriate cast. There's no reason to
> > have it reverse the code units and completely mess up unicode strings.
>
> I am not interested in reversing code units. Sorry if my post has led to
> this wrong idea. For this specific problem I am not going to cast to
> ubyte[] or ushort[] because it gives very wrong results.
>
> It's possible to write a "correct" (that doesn't take into account
> graphemes) reverse even if you do not use casts, keeping the array as
> char[] or wchar[], reversing the bytes, and then reversing the bytes of
> each variable-length codepoint. This is what I was asking to an in-place
> reverse().
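
For reference, the two-pass approach described in the quote above (reverse
all the code units, then reverse the units of each multi-byte sequence so
that every code point reads correctly again) could look roughly like this
for char[]. This is only a sketch, not Phobos code: reverseUtf8InPlace is a
hypothetical name, and the function assumes its input is valid UTF-8.

```d
import std.algorithm : reverse;

/// Hypothetical helper (not part of Phobos): reverses the code points of a
/// UTF-8 array in place, without decoding to dchar[] and without allocating.
/// Assumes that s holds valid UTF-8.
void reverseUtf8InPlace(char[] s)
{
    // View the same memory as raw bytes so that reverse() swaps code units
    // rather than treating the array as a range of dchar.
    auto bytes = cast(ubyte[]) s;

    // Pass 1: reverse every code unit. Each multi-byte sequence now has its
    // continuation bytes first and its lead byte last.
    reverse(bytes);

    // Pass 2: walk forward and restore the unit order inside each sequence.
    size_t start = 0;
    for (size_t i = 0; i < bytes.length; ++i)
    {
        // Continuation bytes look like 0b10xxxxxx. Anything else is an
        // ASCII byte or a lead byte, which ends the current sequence.
        if ((bytes[i] & 0xC0) != 0x80)
        {
            reverse(bytes[start .. i + 1]);
            start = i + 1;
        }
    }
}
```

Calling it on "aé€".dup should leave the array holding "€éa". The same idea
would carry over to wchar[] by swapping the two halves of each surrogate
pair in the second pass. Combining characters still end up attached to the
wrong base character, since nothing here looks at graphemes.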

I don't expect that std.string will _ever_ be grapheme-aware, or that
strings will be processed by default as ranges of graphemes. That's far too
expensive as far as performance goes. Rather, we're likely to have a wrapper
and/or a separate range type which handles graphemes. Then if you want the
extra correctness and are willing to pay the cost, you use that.

As I understand it, std.regex does have the beginnings of such support, but
we still need a range type of some variety (probably in std.utf) which fully
supports graphemes.

- Jonathan M Davis