On Friday, December 03, 2010 05:13:50 Lars T. Kyllingstad wrote:
> On Thu, 02 Dec 2010 16:18:52 -0500, Steven Schveighoffer wrote:
> > On Thu, 02 Dec 2010 02:09:51 -0500, Lars T. Kyllingstad
> > <[email protected]> wrote:
> >> On Wed, 01 Dec 2010 16:44:42 -0500, Steven Schveighoffer wrote:
> >>> On Tue, 30 Nov 2010 18:34:11 -0500, Lars T. Kyllingstad
> >>> <[email protected]> wrote:
> >>>> On Tue, 30 Nov 2010 13:52:20 -0500, Steven Schveighoffer wrote:
> >>>>> On Tue, 30 Nov 2010 13:34:50 -0500, Jonathan M Davis
> >>>>> <[email protected]> wrote:
> >>>>>
> >>>>> [...]
> >>>>>
> >>>>>> 4. Indexing is no longer O(1), which violates the guarantees of
> >>>>>> the index operator.
> >>>>>
> >>>>> Indexing is still O(1).
> >>>>>
> >>>>>> 5. Slicing (other than a full slice) is no longer O(1), which
> >>>>>> violates the guarantees of the slicing operator.
> >>>>>
> >>>>> Slicing is still O(1).
> >>>>>
> >>>>> [...]
> >>>>
> >>>> It feels extremely weird that the indices refer to code units and not
> >>>> code points. If I write
> >>>>
> >>>>     auto str = mystring("hæ?");
> >>>>     writeln(str[1], " ", str[2]);
> >>>>
> >>>> I expect it to print "æ ?", not "æ æ" like it does now.
> >>>
> >>> I don't think it's possible to do that with any implementation without
> >>> making indexing not O(1). This just isn't possible, unless you want
> >>> to use dchar[].
> >>>
> >>> But your point is well taken. I think what I'm going to do is throw
> >>> an exception when accessing an invalid index. While also surprising,
> >>> it doesn't result in "extra data". I feel it's probably very rare to
> >>> just access hard-coded indexes like that unless you are sure of the
> >>> data in the string. Or to use a for-loop to access characters, etc.
> >>
> >> As soon as you add opIndex(), your interface becomes that of a
> >> random-access range, something which narrow strings are not. In fact,
> >> the distinction between random access and bidirectional range access
> >> for strings is in many ways the reason we're having this discussion.
> >>
> >> How about dropping opIndex() for UTF-8 and UTF-16 strings, and instead
> >> adding a characterAt(i) function that retrieves the i'th code point,
> >> and which is not required to be O(1)? Then, if someone wants O(1)
> >> indexing they are forced to use string_t!dchar or just plain ol'
> >> arrays, both of which have clear, predictable indexing semantics.
> >
> > Then substring (slicing) becomes an O(n) operation. It just doesn't
> > work well.
>
> What I meant was that opSlice() should be disabled in the same way as
> opIndex().

A string type without slicing (which must be O(1)) is DOA without question. Slicing is _far_ too useful to lose. Indexing in strings is fairly rare because it's generally a stupid idea, but slicing happens all the time. If nothing else, that is _the_ way to get a substring.

- Jonathan M Davis
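
For anyone following the thread, here is a minimal sketch of the trade-off being argued about, using a plain built-in UTF-8 `string` rather than the proposed string type. `characterAt` is a hypothetical helper named after the suggestion quoted above; it is not a Phobos function and not the proposed implementation. It has to decode from the start of the string, so it is O(n), whereas slicing by code-unit offsets is O(1) and copies nothing.

```d
import std.stdio : writeln;

// Hypothetical helper (name taken from the proposal in this thread):
// returns the i-th code point by decoding from the front, so it is O(n).
dchar characterAt(string s, size_t i)
{
    size_t n = 0;
    foreach (dchar c; s) // iterating as dchar decodes one code point per step
    {
        if (n == i)
            return c;
        ++n;
    }
    throw new Exception("code point index out of range");
}

void main()
{
    string s = "hæ?";

    // In UTF-8, 'h' and '?' take one code unit each and 'æ' takes two,
    // so .length (which counts code units) is 4, not 3.
    assert(s.length == 4);

    // Code-point lookup: correct, but linear in the string length.
    assert(characterAt(s, 1) == 'æ');
    assert(characterAt(s, 2) == '?');

    // Slicing by code-unit offsets: constant time, shares the same memory.
    string tail = s[1 .. $];
    assert(tail == "æ?");
    assert(tail.ptr == s.ptr + 1);

    writeln(tail);
}
```

The slice offsets have to fall on code point boundaries for the result to be valid UTF-8, which is exactly the tension between the two positions above.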
