On 8/2/12 3:43 AM, Jonathan M Davis wrote:
A range-based function operating on strings without special-casing them often
_will_ harm performance. But if you special-case them for strings, then you
can avoid that performance penalty - especially if you can avoid having to
decode any characters.

The result is that using range-based functions on strings is generally correct
without the function writer (or the caller) having to worry about encodings
and the like, but if they want to eke out all of the performance that they
can, they need to go to the extra effort of special-casing the function for
strings. Like much of D, it favors correctness/saftey but allows you to get
full performance if you work at it a bit harder.

In the case of the lexer, it's really not all that bad - especially since
string mixins allow me to give the operation that I need (e.g. get the first
code unit) in the correct way for that particular range type without worrying
about the details.

For instance, I have this function which I use to generate a mixin any time
that I want to get the first code unit:

string declareFirst(R)()
    if (isForwardRange!R && is(Unqual!(ElementType!R) == dchar))
{
    static if (isNarrowString!R)
        return "Unqual!(ElementEncodingType!R) first = range[0];";
    else
        return "dchar first = range.front;";
}

So, every line using it becomes

mixin(declareFirst!R());

which really isn't any worse than

char c = str[0];

except that it works with more than just strings. Yes, it's more effort to get
the lexer working with all ranges of dchar, but I don't think that it's all
that much worse, and the result is much more flexible.
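To make the expansion concrete, here is a minimal, self-contained sketch of the idea (the firstOf wrapper and the test values are my own illustration, not from the post): for a narrow string the mixin declares first as the raw first code unit via range[0] with no decoding, while for any other forward range of dchar it falls back to range.front.

```d
import std.range;  // isForwardRange, isNarrowString, ElementType, ElementEncodingType
import std.traits; // Unqual

// The generator from the post: returns the declaration appropriate
// for the range type, to be pasted in with mixin().
string declareFirst(R)()
    if (isForwardRange!R && is(Unqual!(ElementType!R) == dchar))
{
    static if (isNarrowString!R)
        return "Unqual!(ElementEncodingType!R) first = range[0];";
    else
        return "dchar first = range.front;";
}

// Hypothetical wrapper showing the mixin in use. For string, `first`
// is a char read directly from range[0] (no UTF decoding); for a
// dstring or other range of dchar, it is range.front.
dchar firstOf(R)(R range)
    if (isForwardRange!R && is(Unqual!(ElementType!R) == dchar))
{
    mixin(declareFirst!R());
    return first; // char implicitly converts to dchar for the return
}

void main()
{
    assert(firstOf("hello") == 'h');   // narrow string: range[0] path
    assert(firstOf("hello"d) == 'h');  // dstring: range.front path
}
```

The point of generating the declaration as a string rather than writing two overloads is that the rest of the lexing code stays identical for both paths; only this one line differs per range type.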

I just posted in the .D forum a simple solution that is fast, uses general ranges, and lets you avoid the hecatomb of code above.

Andrei
