On Thursday, August 02, 2012 08:51:26 Jacob Carlborg wrote:
> On 2012-08-02 08:26, Jonathan M Davis wrote:
> > It's really not all that hard to special case for strings, especially when
> > you're operating primarily on code units. And I think that the lexer
> > should be flexible enough to be usable with ranges other than strings.
> > We're trying to make most stuff in Phobos range-based, not string-based
> > or array-based.
> Ok. I just don't think it's worth giving up some performance or make the
> design overly complicated just to make a range interface. But if ranges
> doesn't cause these problems I'm happy.

A range-based function operating on strings without special-casing them often
_will_ harm performance. But if you special-case it for strings, then you can
avoid that performance penalty - especially if you can avoid having to decode
any characters.

The result is that using range-based functions on strings is generally correct
without the function writer (or the caller) having to worry about encodings and
the like, but if they want to eke out all of the performance that they can,
they need to go to the extra effort of special-casing the function for strings.
Like much of D, it favors correctness/safety but allows you to get full
performance if you work at it a bit harder.

In the case of the lexer, it's really not all that bad - especially since
string mixins allow me to give the operation that I need (e.g. get the first
code unit) in the correct way for that particular range type without worrying
about the details.

For instance, I have this function which I use to generate a mixin any time
that I want to get the first code unit:

    string declareFirst(R)()
        if(isForwardRange!R && is(Unqual!(ElementType!R) == dchar))
    {
        static if(isNarrowString!R)
            return "Unqual!(ElementEncodingType!R) first = range[0];";
        else
            return "dchar first = range.front;";
    }

So, every line using it becomes

    mixin(declareFirst!R());

which really isn't any worse than

    char c = str[0];

except that it works with more than just strings. Yes, it's more effort to get
the lexer working with all ranges of dchar, but I don't think that it's all
that much worse, and the result is much more flexible.

- Jonathan M Davis
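
To make the mixin approach concrete, here is a minimal self-contained sketch of how declareFirst might be used inside a range-based function. The helper startsWithDigit is hypothetical and is not taken from the lexer; the imports assume a current Phobos layout, where the range primitives live in std.range.primitives.

```d
import std.algorithm.iteration : filter;
import std.range.primitives;
import std.traits : Unqual, isNarrowString;

// Builds the declaration of `first` as a string, picking the cheapest
// correct access for the range type R.
string declareFirst(R)()
    if (isForwardRange!R && is(Unqual!(ElementType!R) == dchar))
{
    static if (isNarrowString!R)
        // Narrow strings: index the first code unit directly, no decoding.
        return "Unqual!(ElementEncodingType!R) first = range[0];";
    else
        // Any other range of dchar: front is already a full code point.
        return "dchar first = range.front;";
}

// Hypothetical helper (an assumption for illustration): reports whether
// the input begins with an ASCII digit. The mixin requires the parameter
// to be named `range`, because that name is baked into the generated code.
bool startsWithDigit(R)(R range)
    if (isForwardRange!R && is(Unqual!(ElementType!R) == dchar))
{
    if (range.empty)
        return false;
    mixin(declareFirst!R());
    return first >= '0' && first <= '9';
}

void main()
{
    assert(startsWithDigit("42abc"));    // string: first is a char
    assert(!startsWithDigit("abc"w));    // wstring: first is a wchar
    assert(startsWithDigit("7"d));       // dstring: falls back to front
    assert(!startsWithDigit(""));        // empty input is handled before the mixin

    // A non-array forward range of dchar also takes the front branch.
    assert(startsWithDigit("a1"d.filter!(c => c != 'a')));
}
```

Comparing a single code unit against ASCII values is safe here because every code unit of a multi-byte sequence in UTF-8 or UTF-16 lies outside the ASCII range, so a narrow string never needs decoding for this check.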