On 04/08/2012 15:45, Dmitry Olshansky wrote:
On 04-Aug-12 15:48, Jonathan M Davis wrote:
On Saturday, August 04, 2012 15:32:22 Dmitry Olshansky wrote:
I see it as a compile-time policy that will fit nicely and solve both
issues. Just provide a template with a few hooks, and add a Noop policy
that does nothing.

It's starting to look like figuring out what should and shouldn't be
configurable and how to handle it is going to be the largest problem
in the lexer...


Let's add some meat to my post.
I've seen it mostly as follows:

// user defines a mixin template that is mixed in inside the lexer
template MyConfig()
{
    enum identifierTable = true; // means there would be calls to
                                 // table.insert on each identifier
    enum countLines = true; // adds line, column properties to the lexer/Tokens

    // statically bound callbacks; inside one can use, say:
    // skip() - to skip a char (popFront)
    // get() - to read the next char (via popFront, front)
    // line, col - as read-only properties
    // (skip & get do the counting if enabled)

    bool onError()
    {
        skip(); // the dumbest recovery: just skip a char
        return true; // go on with tokenizing; false - stop prematurely
    }

    ...
}

usage:


{
    auto my_supa_table = ...; // some kind of container (should be a set of
                              // strings and support .insert("blah");)

    auto dlex = Lexer!(MyConfig)(my_supa_table);
    auto all_tokens = array(dlex(joiner(stdin.byChunk(4096))));

    // or if we had no interest in the table but only tokens:
    auto noop = Lexer!(NoopLex)();
    ...
}


It seems way too much.

The most complex thing that is needed is a policy for allocating the identifiers stored in tokens. It can be done by passing a function that takes a string as its parameter and returns a string. The default would be the identity function.

The second parameter is a bool that says whether or not to tokenize comments. Is that enough?
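
A minimal sketch of what these two parameters could look like, assuming a toy lexer that only knows identifiers, `//` line comments, and single-character "other" tokens. The names `lex`, `Token`, `Kind`, `identity`, and `intern` are hypothetical and used only for illustration; this is not a proposed Phobos API.

```d
import std.ascii : isAlpha, isAlphaNum, isWhite;

enum Kind { identifier, comment, other }

struct Token
{
    Kind kind;
    string text;
}

// Default identifier policy: return the input slice unchanged.
string identity(string s) { return s; }

// Example of a custom policy (an assumption for illustration): intern
// identifiers so that equal identifiers share a single copy.
string[string] pool;

string intern(string s)
{
    if (auto p = s in pool)
        return *p;
    auto copy = s.idup;
    pool[copy] = copy;
    return copy;
}

// Toy lexer with the two compile-time parameters discussed above.
Token[] lex(alias allocIdentifier = identity, bool tokenizeComments = false)(string src)
{
    Token[] result;
    size_t i = 0;
    while (i < src.length)
    {
        immutable c = src[i];
        if (isWhite(c))
        {
            ++i;
        }
        else if (c == '/' && i + 1 < src.length && src[i + 1] == '/')
        {
            // Line comment: runs to the end of the line.
            size_t j = i;
            while (j < src.length && src[j] != '\n')
                ++j;
            static if (tokenizeComments)
                result ~= Token(Kind.comment, src[i .. j]);
            i = j;
        }
        else if (isAlpha(c) || c == '_')
        {
            size_t j = i + 1;
            while (j < src.length && (isAlphaNum(src[j]) || src[j] == '_'))
                ++j;
            // The policy decides how the identifier text is stored.
            result ~= Token(Kind.identifier, allocIdentifier(src[i .. j]));
            i = j;
        }
        else
        {
            result ~= Token(Kind.other, src[i .. i + 1]);
            ++i;
        }
    }
    return result;
}

void main()
{
    import std.stdio : writeln;

    auto src = "foo = bar // note";
    writeln(lex(src));                 // defaults: identity, comments dropped
    writeln(lex!(intern, true)(src));  // interned identifiers, comments kept
}
```

With the defaults, tokens simply slice the input; a caller who wants a symbol table or a different allocation strategy passes a different function.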

The onError hook looks like a typical use case for conditions, as explained in the huge thread on Exception.
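
A rough sketch of how a condition-style handler could replace onError, assuming a simple thread-local handler stack. D has no built-in condition system, so `Recovery`, `Handler`, `handlers`, `signal`, and `countLetters` are all hypothetical names made up for this example. The point is only that the code detecting the error asks a handler installed by the caller how to recover, without unwinding the stack.

```d
enum Recovery { skipChar, stop }

// A handler receives the error description and position, and picks a recovery.
alias Handler = Recovery delegate(string msg, size_t pos);

// Handler stack (thread-local by default in D); the innermost handler is last.
Handler[] handlers;

// Signal a condition: ask the innermost handler what to do.
// With no handler installed, fall back to throwing an exception.
Recovery signal(string msg, size_t pos)
{
    if (handlers.length == 0)
        throw new Exception(msg);
    return handlers[$ - 1](msg, pos);
}

// Toy scanner: counts alphabetic characters and signals on anything else.
size_t countLetters(string src)
{
    import std.ascii : isAlpha;

    size_t n, i;
    while (i < src.length)
    {
        if (isAlpha(src[i]))
        {
            ++n;
            ++i;
        }
        else
        {
            final switch (signal("unexpected character", i))
            {
                case Recovery.skipChar:
                    ++i;        // the dumbest recovery: skip one char
                    break;
                case Recovery.stop:
                    return n;   // stop prematurely
            }
        }
    }
    return n;
}

void main()
{
    import std.stdio : writeln;

    size_t errors;
    Handler skipAndCount = (string msg, size_t pos) {
        ++errors;
        return Recovery.skipChar;
    };

    // Install the handler for the duration of this scope.
    handlers ~= skipAndCount;
    scope (exit) handlers = handlers[0 .. $ - 1];

    writeln(countLetters("ab#c!"), " letters, ", errors, " errors");
}
```

The recovery policy lives entirely in the caller, so the lexer itself needs no error-handling configuration.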
