"Jonathan M Davis" <jmdavisp...@gmx.com> wrote in message news:mailman.2166.1335463456.4860.digitalmar...@puremagic.com... > On Thursday, April 26, 2012 13:51:17 Nick Sabalausky wrote: >> Also, keep in mind that (unless I'm mistaken) walkLength does *not* >> return >> the number of "characters" (ie, graphemes), but merely the number of code >> points - which is not the same thing (due to existence of the >> [confusingly-named] "combining characters"). > > You're not mistaken. Nothing in Phobos (save perhaps some of std.regex's > internals) deals with graphemes. It all operates on code points, and > strings > are considered to be ranges of code points, not graphemes. So, as far as > ranges go, walkLength returns the actual length of the range. That's > _usually_ > the number of characters/graphemes as well, but it's certainly not 100% > correct. We'll need further unicode facilities in Phobos to deal with that > though, and I doubt that strings will ever change to be treated as ranges > of > graphemes, since that would be incredibly expensive computationally. We > have > enough performance problems with strings as it is. What we'll probably get > is > extra functions to deal with normalization (and probably something to > count > the number of graphemes) and probably a wrapper type that does deal in > graphemes. >
Yeah, I'm not saying that walkLength should deal with graphemes. Just that if someone wants the number of "characters", then neither length *nor* walkLength is guaranteed to be correct.
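
To make that concrete, here's a minimal sketch. The sample string is just an illustration I picked (an 'e' followed by a combining acute accent), and it assumes the usual Phobos behaviour of treating a string as a range of code points:

```d
import std.range : walkLength;
import std.stdio : writeln;

void main()
{
    // 'e' followed by U+0301 (combining acute accent).
    // On screen this is a single "character" (one grapheme).
    string s = "e\u0301";

    writeln(s.length);     // 3: UTF-8 code units (1 for 'e', 2 for U+0301)
    writeln(s.walkLength); // 2: code points
    // Someone counting "characters" would expect 1; neither value matches.
}
```

So length counts code units and walkLength counts code points, and a string containing combining characters can make both differ from the grapheme count.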