On 06-Aug-12 22:03, deadalnix wrote:
On 04/08/2012 15:45, Dmitry Olshansky wrote:
On 04-Aug-12 15:48, Jonathan M Davis wrote:
On Saturday, August 04, 2012 15:32:22 Dmitry Olshansky wrote:
I see it as a compile-time policy that will fit nicely and solve both
issues. Just provide a template with a few hooks, and add a Noop
policy that does nothing.

It's starting to look like figuring out what should and shouldn't be
configurable and how to handle it is going to be the largest problem
in the lexer...


Let's add some meat to my post.
I've seen it mostly as follows:

// user defines a mixin template that is mixed in inside the lexer
template MyConfig()
{
    // if true, table.insert is called on each identifier
    enum identifierTable = true;
    // if true, adds line and column properties to the lexer/Tokens
    enum countLines = true;

    // statically bound callbacks; inside them one can use, say:
    //   skip()    - to skip a char (popFront)
    //   get()     - to read the next char (via popFront, front)
    //   line, col - as read-only properties
    // (skip & get do the counting if enabled)

    bool onError()
    {
        skip();      // the dumbest recovery: just skip a char
        return true; // go on with tokenizing; false - stop prematurely
    }

    ...
}

usage:


{
    // some kind of container (should be a set of strings
    // and support .insert("blah"))
    auto my_supa_table = ...;

    auto dlex = Lexer!(MyConfig)(my_supa_table);
    auto all_tokens = array(dlex(joiner(stdin.byChunk(4096))));

    // or, if we had no interest in the table but only in tokens:
    auto noop = Lexer!(NoopLex)();
    ...
}


It seems way too much.

The most complex thing that is needed is the policy to allocate
identifiers in tokens.

An editor that only highlights text may choose not to build an identifier table at all. One may see it as a safe mode (low-resource mode) for a more advanced IDE.

The second parameter is a bool saying whether to tokenize comments or not.
Is that enough?
No.

And emitting comments as a special comment token is frankly a bad idea. See Walter's comments in this thread.

Also, for a compiler only the DDoc comments are ever useful, which is not the case for an IDE. Filtering them out later is inefficient; it would be far better not to create them in the first place.
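To make the "don't create them in the first place" point concrete, here is a rough sketch of a compile-time comment policy. Everything in it is made up for illustration (the Comments enum, Token, isDdoc, commentTokens), and it cheats by taking comments that are already split out; a real lexer would make the same decision while scanning. The DDoc check is also simplified and only looks at the opening characters.

```d
import std.algorithm.searching : startsWith;
import std.stdio : writeln;

// Hypothetical policy values, not part of any existing lexer.
enum Comments { none, ddocOnly, all }

struct Token { string kind; string text; }

// Simplified check: only looks at how the comment opens.
bool isDdoc(string c)
{
    return c.startsWith("/**", "/++", "///") != 0;
}

// The policy is a template parameter, so the branch is chosen at
// compile time; with Comments.none no token-building code is emitted.
Token[] commentTokens(Comments mode)(string[] rawComments)
{
    Token[] result;
    static if (mode != Comments.none)
    {
        foreach (c; rawComments)
        {
            static if (mode == Comments.ddocOnly)
            {
                if (!isDdoc(c))
                    continue; // plain comment: never becomes a token
            }
            result ~= Token("comment", c);
        }
    }
    return result;
}

void main()
{
    auto src = ["/// documented", "// plain", "/** also doc */"];
    writeln(commentTokens!(Comments.none)(src).length);
    writeln(commentTokens!(Comments.ddocOnly)(src).length);
    writeln(commentTokens!(Comments.all)(src).length);
}
```

A compiler would instantiate the ddocOnly variant and an IDE the all variant, and neither pays for a filtering pass afterwards.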

The onError hook looks like a typical use case for conditions, as explained in
the huge thread on exceptions.

Hmm, I lost track of that discussion. Either way, I see a statically bound function as a good enough hook into the process, since it can do anything useful: skip wrong chars, throw an exception, stop parsing prematurely, whatever. Pick your poison.
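A minimal sketch of what I mean by a statically bound hook, assuming the config is a mixin template that gets mixed into the lexer. MiniLexer, SkipOnError, StopOnError and countNumbers are toy names invented for this example, and the "lexer" only counts runs of digits; the point is just that onError is resolved at compile time and can call the lexer's own skip().

```d
import std.stdio : writeln;

// Hypothetical policy: skip the offending char and carry on.
mixin template SkipOnError()
{
    bool onError()
    {
        skip();      // resolves to the lexer's skip() once mixed in
        return true; // true: go on with tokenizing
    }
}

// Hypothetical policy: stop at the first error.
mixin template StopOnError()
{
    bool onError() { return false; }
}

// Toy lexer: digit runs are tokens, spaces are skipped,
// anything else is an error handed to the policy.
struct MiniLexer(alias Config)
{
    string input;
    size_t pos;
    size_t errors;

    mixin Config!(); // statically binds onError into this struct

    void skip() { if (pos < input.length) ++pos; }

    size_t countNumbers()
    {
        size_t n;
        while (pos < input.length)
        {
            immutable c = input[pos];
            if (c >= '0' && c <= '9')
            {
                while (pos < input.length
                       && input[pos] >= '0' && input[pos] <= '9')
                    ++pos;
                ++n;
            }
            else if (c == ' ')
                ++pos;
            else
            {
                ++errors;
                if (!onError())
                    break; // policy asked to stop prematurely
            }
        }
        return n;
    }
}

void main()
{
    auto a = MiniLexer!SkipOnError("12 ?? 34");
    writeln(a.countNumbers(), " numbers, ", a.errors, " errors");

    auto b = MiniLexer!StopOnError("12 ?? 34");
    writeln(b.countNumbers(), " numbers, ", b.errors, " errors");
}
```

A policy that throws instead would be a third mixin with the same signature; the lexer itself does not change.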

--
Dmitry Olshansky
