Re: VLERange: a range in between BidirectionalRange and RandomAccessRange

Steven Schveighoffer Tue, 11 Jan 2011 05:35:21 -0800

On Mon, 10 Jan 2011 22:57:36 -0500, Andrei Alexandrescu <seewebsiteforem...@erdani.org> wrote:

I've been thinking on how to better deal with Unicode strings. Currently strings are formally bidirectional ranges with a surreptitious random access interface. The random access interface accesses the support of the string, which is understood to hold data in a variable-encoded format. For as long as the programmer understands this relationship, code for string manipulation can be written with relative ease. However, there is still room for writing wrong code that looks legit.
Sometimes the best way to tackle a hairy reality is to invite it to the negotiation table and offer it promotion to first-class abstraction status. In that vein I was thinking of defining a new range: VLERange, i.e. Variable Length Encoding Range. Such a range would have the power somewhere in between bidirectional and random access.
The primitives offered would include empty, access to front and back, popFront and popBack (just like BidirectionalRange), and in addition properties typical of random access ranges: indexing, slicing, and length. Note that the result of the indexing operator is not the same as the element type of the range, as it only represents the unit of encoding.
In addition to these (and connecting the two), a VLERange would offer two additional primitives:
1. size_t stepSize(size_t offset) gives the length of the step needed to skip to the next element.
2. size_t backstepSize(size_t offset) gives the size of the _backward_ step that goes to the previous element.
In both cases, offset is assumed to be at the beginning of a logical element of the range.
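To make the two primitives concrete, here is a minimal sketch of a VLERange-like wrapper over UTF-8 code units. It is written in Python rather than D. The class name Utf8Range and the snake_case method names are hypothetical stand-ins for the proposed stepSize/backstepSize and range primitives. The sketch assumes well-formed UTF-8 and does no validation. front/back decode whole code points, while len(), indexing and slicing work in code units, mirroring the note that the indexing result is the encoding unit rather than the element type.

```python
from typing import Optional


class Utf8Range:
    """Hypothetical VLERange over UTF-8 code units; assumes well-formed input.

    All offsets passed to step_size/backstep_size are absolute positions in self.data.
    """

    def __init__(self, data: bytes, lo: int = 0, hi: Optional[int] = None):
        self.data = data
        self.lo = lo  # offset of the first code unit still in the range
        self.hi = len(data) if hi is None else hi  # one past the last code unit

    def step_size(self, offset: int) -> int:
        """stepSize: code units in the element that starts at offset."""
        b = self.data[offset]
        if b < 0x80:
            return 1  # ASCII
        if b >= 0xF0:
            return 4
        if b >= 0xE0:
            return 3
        return 2

    def backstep_size(self, offset: int) -> int:
        """backstepSize: code units in the element that ends just before offset."""
        n = 1
        # Walk back over continuation bytes (bit pattern 10xxxxxx) to the lead byte.
        while self.data[offset - n] & 0xC0 == 0x80:
            n += 1
        return n

    # Bidirectional-range primitives: these deal in whole elements (code points).
    def empty(self) -> bool:
        return self.lo >= self.hi

    def front(self) -> str:
        return self.data[self.lo:self.lo + self.step_size(self.lo)].decode("utf-8")

    def back(self) -> str:
        return self.data[self.hi - self.backstep_size(self.hi):self.hi].decode("utf-8")

    def pop_front(self) -> None:
        self.lo += self.step_size(self.lo)

    def pop_back(self) -> None:
        self.hi -= self.backstep_size(self.hi)

    # Random-access-style primitives: these deal in code units, not elements.
    def __len__(self) -> int:
        return self.hi - self.lo

    def __getitem__(self, i):
        if isinstance(i, slice):
            # Slice bounds are code-unit offsets; callers should cut on element boundaries.
            start, stop, _ = i.indices(len(self))
            return Utf8Range(self.data, self.lo + start, self.lo + stop)
        # i is a non-negative code-unit offset; the result is an int, not a decoded element.
        return self.data[self.lo + i]
```

For example, Utf8Range("añ€😀".encode("utf-8")) has len() 10 (code units), while popping from the front visits four elements. r[1] is the raw byte 0xC3, the lead byte of "ñ".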
I suspect that a lot of functions in std.string can be written without Unicode-specific knowledge just by relying on such an interface. Moreover, algorithms can be generalized to other structures that use variable-length encoding, such as those used in data compression. (In that case, the support would be a bit array and the encoded type would be ubyte.)
Writing to such ranges is not addressed by this design. Ideas are welcome.
Adding VLERange would legitimize strings and would clarify their handling, at the cost of adding one additional concept that needs to be minded. Is the trade-off worthwhile?
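As a rough illustration of the generalization claim, the sketch below (Python, hypothetical function names) writes one algorithm, finding the start offset of each element, purely in terms of a step function. It then reuses that algorithm for UTF-8 and for unsigned LEB128 varints. LEB128 is byte-oriented. It is swapped in here only as a simple stand-in for the bit-array compression case mentioned above.

```python
from typing import Callable, List

# A step function reports how many units the element starting at offset occupies.
StepFn = Callable[[bytes, int], int]


def utf8_step(data: bytes, offset: int) -> int:
    """Code units in the UTF-8 sequence starting at offset (assumes valid input)."""
    b = data[offset]
    if b < 0x80:
        return 1
    if b >= 0xF0:
        return 4
    if b >= 0xE0:
        return 3
    return 2


def leb128_step(data: bytes, offset: int) -> int:
    """Bytes in the unsigned LEB128 value starting at offset.

    Every byte of a value except the last has its high bit (0x80) set.
    """
    n = 1
    while data[offset + n - 1] & 0x80:
        n += 1
    return n


def element_offsets(data: bytes, step: StepFn) -> List[int]:
    """Generic: start offset of each element, found using nothing but `step`."""
    offsets = []
    offset = 0
    while offset < len(data):
        offsets.append(offset)
        offset += step(data, offset)
    return offsets
```

Here element_offsets("añ€😀".encode("utf-8"), utf8_step) gives [0, 1, 3, 6]. element_offsets(bytes([0x01, 0xAC, 0x02, 0xE5, 0x8E, 0x26]), leb128_step) gives [0, 1, 3], the starts of the varints encoding 1, 300 and 624485. The traversal code never looks at the encoding itself.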

While this makes it possible to write algorithms that only accept VLERanges, I don't think it solves the major problem with strings -- they are treated as arrays by the compiler.

I'd also rather see an indexing operation return the element type, and have a separate function to get the encoding unit. This makes more sense for generic code IMO.
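One possible reading of that suggestion is sketched below in Python with hypothetical names. It is not the design of the string type mentioned below. Indexing still takes a code-unit offset, but it decodes and returns the whole element starting there, and a separately named method returns the raw encoding unit. Rejecting offsets that land inside a code point is an added assumption.

```python
class DecodedString:
    """Hypothetical string type: s[i] returns an element, s.code_unit(i) a raw unit."""

    def __init__(self, text: str):
        self.data = text.encode("utf-8")  # storage is UTF-8 code units

    def __getitem__(self, offset: int) -> str:
        """Decode the code point that starts at code-unit offset."""
        b = self.data[offset]
        if b & 0xC0 == 0x80:
            raise ValueError("offset %d is inside a code point" % offset)
        n = 1 if b < 0x80 else 2 if b < 0xE0 else 3 if b < 0xF0 else 4
        return self.data[offset:offset + n].decode("utf-8")

    def code_unit(self, offset: int) -> int:
        """Raw UTF-8 code unit at offset, for code that needs the encoding."""
        return self.data[offset]

    def __iter__(self):
        # Iterating yields the same element type that indexing returns.
        return iter(self.data.decode("utf-8"))
```

With s = DecodedString("añ€"), s[1] is "ñ" while s.code_unit(1) is 0xC3. s[2] raises ValueError because offset 2 falls in the middle of "ñ". Generic code that indexes or iterates sees the element type in both cases.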


I noticed you never commented on my proposed string type...

That reminds me, I should update with suggested changes and re-post it.

-Steve
