On Friday, December 09, 2011 05:58:40 bearophile wrote:
> Jonathan M Davis:
> > And as I explained in bug# 7085, reverse's behavior with regards to
> > dchar[] is completely correct. It's reversing the code points, _not_
> > the graphemes.
> OK. Maybe I will open a differently worded enhancement request, for a
> grapheme-aware std.string.
> > If you want to reverse a char[], then cast it to ubyte[] and reverse
> > that. If you want to reverse a wchar[], then cast it to ushort[] and
> > reverse that. In Phobos, strings are ranges of dchar, so reverse is
> > going to reverse code points. If you want it to reverse code units
> > instead, then you just use the appropriate cast. There's no reason to
> > have it reverse the code units and completely mess up unicode strings.
> 
> I am not interested in reversing code units. Sorry if my post has led to
> this wrong idea. For this specific problem I am not going to cast to
> ubyte[] or ushort[] because it gives very wrong results.
> 
> It's possible to write a "correct" (that doesn't take into account
> graphemes) reverse even if you do not use casts, keeping the array as
> char[] or wchar[], reversing the bytes, and then reversing the bytes of
> each variable-length codepoint. This is what I was asking to an in-place
> reverse().
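
For reference, the two-pass reversal described in the quote above can be sketched roughly as follows. This is an illustration in Python rather than Phobos code, it assumes the buffer holds valid UTF-8, and the names reverse_utf8_in_place and _reverse_span are hypothetical helpers made up for this sketch.

```python
def _reverse_span(buf: bytearray, lo: int, hi: int) -> None:
    """Reverse buf[lo:hi] by swapping bytes, without allocating a copy."""
    hi -= 1
    while lo < hi:
        buf[lo], buf[hi] = buf[hi], buf[lo]
        lo += 1
        hi -= 1


def reverse_utf8_in_place(buf: bytearray) -> None:
    """Reverse the code points of a UTF-8 buffer (assumed valid) in place."""
    n = len(buf)
    # Pass 1: reverse every byte. Each multi-byte code point now has its
    # continuation bytes first and its lead byte last.
    _reverse_span(buf, 0, n)
    # Pass 2: walk the buffer and flip each code point back into order.
    i = 0
    while i < n:
        j = i
        # Continuation bytes have the bit pattern 10xxxxxx.
        while j < n and (buf[j] & 0xC0) == 0x80:
            j += 1
        # buf[j] is the lead byte (or an ASCII byte); include it in the span.
        j = min(j + 1, n)
        _reverse_span(buf, i, j)
        i = j


data = bytearray("añ€𝄞".encode("utf-8"))
reverse_utf8_in_place(data)
print(data.decode("utf-8"))
```

The result is still a reversal of code points, not of graphemes, so combining marks end up attached to a different base character.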

I don't expect that std.string will _ever_ be grapheme-aware, or that strings
will be processed as ranges of graphemes by default. That costs far too much
in performance. Rather, we're likely to get a wrapper and/or a separate range
type which handles graphemes. Then if you want the extra correctness and are
willing to pay the cost, you use that. As I understand it, std.regex does have
the beginnings of such support, but we still need a range type of some variety
(probably in std.utf) which fully supports graphemes.
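
To make the code point versus grapheme distinction concrete, here is a small sketch, again in Python and not Phobos code. It treats a base character plus any following combining marks as one cluster, which is only a rough approximation of real grapheme segmentation; reverse_code_points and reverse_clusters are hypothetical names used for illustration.

```python
import unicodedata


def reverse_code_points(s: str) -> str:
    """Reverse the code points, ignoring how they combine."""
    return s[::-1]


def reverse_clusters(s: str) -> str:
    """Reverse approximate grapheme clusters.

    Assumption: a cluster is a base character followed by zero or more
    combining marks. Real grapheme rules cover many more cases.
    """
    clusters = []
    for ch in s:
        if clusters and unicodedata.combining(ch):
            # Attach the combining mark to the preceding base character.
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return "".join(reversed(clusters))


text = "e\u0301a"  # 'e', COMBINING ACUTE ACCENT, 'a'
# Code point reversal moves the accent onto the 'a'.
assert reverse_code_points(text) == "a\u0301e"
# Cluster reversal keeps the accent on the 'e'.
assert reverse_clusters(text) == "ae\u0301"
```

The extra pass to find cluster boundaries is the sort of cost that you'd only want to pay when you actually ask for it.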

- Jonathan M Davis
