On 8/2/2012 12:21 AM, Jonathan M Davis wrote:
>> Because your input range is a range of dchar?
>
> I think that we're misunderstanding each other here. A typical, well-written,
> range-based function which operates on ranges of dchar will use static if or
> overloads to special-case strings. This means that it will function with any
> range of dchar, but it _also_ will be as efficient with strings as if it just
> operated on strings.

It *still* must convert the UTF-8 code units to dchar before presenting them to the consumer of the dchar elements.


> It won't decode anything in the string unless it has to.
> So, having a lexer which operates on ranges of dchar does _not_ make string
> processing less efficient. It just makes it so that it can _also_ operate on
> ranges of dchar which aren't strings.

> For instance, my lexer uses this whenever it needs to get at the first
> character in the range:
>
> static if(isNarrowString!R)
>      Unqual!(ElementEncodingType!R) first = range[0];
> else
>      dchar first = range.front;

You're requiring a random-access input range whose indexing yields something other than the range's element type?? And you're requiring isNarrowString to work on an arbitrary range?


> If I need to know the number of code units that make up the code point, I
> explicitly call decode in the case of a narrow string. In either case, code
> units are _not_ being converted to dchar unless they absolutely have to be.
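For reference, the explicit decode call mentioned above is Phobos's std.utf.decode, which takes the narrow string plus a code-unit index and advances the index past however many code units the code point occupied — a minimal sketch:

```d
import std.utf : decode;

void main()
{
    string s = "héllo"; // 'é' occupies two UTF-8 code units
    size_t i = 0;

    // decode returns the code point starting at i and advances i
    // past the code units it consumed.
    dchar c = decode(s, i);
    assert(c == 'h' && i == 1); // ASCII: one code unit

    c = decode(s, i);
    assert(c == 'é' && i == 3); // two code units consumed
}
```

So the number of code units for the current code point falls out of the index delta, without the range itself ever yielding dchar.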

Or you could do away with requiring a special range type and just have it be a UTF-8 range.

What I wasn't realizing earlier was that you were positing a range type that has two different kinds of elements. I don't think this is a proper component type.


> Yes. I understand. It has a mapping of pointers to identifiers. My point is
> that nothing but parsers will need that. From the standpoint of functionality,
> it's a parser feature, not a lexer feature. So, if it can be done just fine in
> the parser, then that's where it should be. If, on the other hand, it _needs_
> to be in the lexer for some reason (e.g. performance), then that's a reason to
> put it there.

If you take it out of the lexer, then:

1. the lexer must allocate storage for every identifier occurrence, rather than only once per unique identifier

2. and then the parser must scan the identifier string *again*

3. there must be two hash lookups of each identifier rather than one

It's a suboptimal design.
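The interning scheme implied by those three points might be sketched like this (the names here are illustrative, not taken from any actual lexer): the lexer hashes each identifier once and hands the parser a canonical copy, so duplicate occurrences cost no allocation and the parser never re-scans or re-hashes the text.

```d
// Hypothetical sketch: a string-interning table inside the lexer.
struct IdentifierTable
{
    private string[string] pool; // canonical copy, keyed by its own text

    string intern(string id)
    {
        if (auto p = id in pool)
            return *p;   // seen before: no allocation, same storage
        pool[id] = id;   // first occurrence: stored exactly once
        return id;
    }
}

void main()
{
    IdentifierTable tab;
    auto a = tab.intern("foo");
    auto b = tab.intern("foo".idup); // distinct storage going in...
    assert(a.ptr is b.ptr);          // ...same canonical storage coming out
}
```

Because every occurrence maps to the same storage, the parser can compare identifiers by pointer instead of by content, which is the payoff of doing the single hash lookup in the lexer.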
