Andrei Alexandrescu Wrote:
> It's no secret that string et al. are not a magic recipe for writing
> correct Unicode code. However, things are pretty good and could be
> further improved by operating the following changes in std.array and
> std.range:
>
> These changes effectively make UTF-8 and UTF-16 bidirectional ranges,
> with the quirk that you still have a sort of a random-access operator.
>
> I'm very strongly in favor of this change. Bidirectional strings allow
> beautiful correct algorithms to be written that handle encoded strings
> without any additional effort; with these changes, everything applicable
> of std.algorithm works out of the box (with the appropriate fixes here
> and there), which is really remarkable.
>
> The remaining WTF is the length property. Traditionally, a range
> offering length also implies the expectation that a range of length n
> allows you to call popFront n times and then assert that the range is
> empty. However, if you check e.g. hasLength!string it will yield false,
> although the string does have an accessible member by that name and of
> the appropriate type.
>
> Although Phobos always checks its assumptions, people might occasionally
> write code that just uses .length without checking hasLength. Then,
> they'll be annoyed when the code fails with UTF-8 and UTF-16 strings.
>
> (The "real" length of the range is not stored, but can be computed by
> using str.walkLength() in std.range.)
>
> What can be done about that? I see a number of solutions:

The underlying array of code units (bytes for UTF-8, 16-bit words for UTF-16) is an implementation detail, and hasLength is a kludge. Follow good OO design and hide the implementation details behind the standard interface! I would use a struct for UTF-8 and UTF-16 strings, and add a method to get the raw array. That allows simple, compiler-enforced usage while still allowing special-casing on the raw data. As an added bonus, this approach generalizes to other ranges with variable-width elements.
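A minimal D sketch of the struct-wrapper idea, assuming current Phobos, where the array primitives in std.range.primitives auto-decode narrow strings to dchar. The names EncodedString, Utf8String, Utf16String and raw are hypothetical, chosen only for illustration; they are not Phobos APIs. The wrapper exposes only bidirectional-range primitives plus a raw accessor, so hasLength is false by construction rather than by a special case inside the trait.

```d
import std.range.primitives; // front/back/popFront/popBack decode narrow strings; walkLength
import std.range : retro;
import std.array : array;

/// Hypothetical wrapper (not a Phobos type): a UTF-8 or UTF-16 string exposed
/// only as a bidirectional range of code points. It deliberately has no
/// `length` and no `opIndex`, so code cannot confuse code units with elements.
struct EncodedString(Char) if (is(Char == char) || is(Char == wchar))
{
    private immutable(Char)[] data;

    this(immutable(Char)[] s) { data = s; }

    // Range primitives forward to Phobos' auto-decoding array primitives.
    @property bool empty() { return data.length == 0; }
    @property dchar front() { return data.front; } // decodes the first code point
    @property dchar back() { return data.back; }   // decodes the last code point
    void popFront() { data.popFront(); }           // skips one whole code point
    void popBack() { data.popBack(); }
    @property EncodedString save() { return this; }

    /// Escape hatch: the underlying code units, for code that wants to
    /// special-case on the raw representation.
    @property immutable(Char)[] raw() { return data; }
}

alias Utf8String = EncodedString!char;
alias Utf16String = EncodedString!wchar;

// The compiler now enforces the distinction discussed in the thread.
static assert(isBidirectionalRange!Utf8String);
static assert(is(ElementType!Utf8String == dchar));
static assert(!hasLength!Utf8String);

void main()
{
    import std.stdio : writeln;

    auto s = Utf8String("héllo");
    writeln(s.raw.length);  // 6 code units (bytes)
    writeln(s.walkLength);  // 5 code points
    writeln(s.retro.array); // olléh (reversed by code point)
}
```

Because there is no length or opIndex member, code that wants the O(1) code-unit count or unit-level indexing has to ask for raw explicitly, while walkLength, retro and the rest of std.algorithm see a plain range of code points.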