I'm still mired in lexical analysis, though nearly out of the woods; was on
vacation for a while, which slowed me down. Vacations are stressful! :-(

Traditional compilers have made a strong phase distinction between the
tokenizer and the parser. The tokenizer turns an input character stream
into a (position-annotated) token stream. The parser takes a token stream
and constructs an AST. Some, but not all, tokens have corresponding
AST node types; e.g. identifiers get an AST node. Though some tokenizers
implement a notion of a "start state" that can be changed, most tokenizers
are context-insensitive: how a token is recognized does not depend on what
the parser has seen so far.
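
To make the traditional split concrete, here is a minimal sketch in Python
of a regex-driven tokenizer that yields position-annotated tokens. The
`Token` type, the `tokenize` function, and the token kinds are all made up
for illustration; none of this is BitC's actual lexer.

```python
import re
from typing import Iterator, NamedTuple

class Token(NamedTuple):
    kind: str   # e.g. "IDENT", "INT", "PUNCT" (hypothetical kinds)
    text: str   # the matched characters
    line: int   # 1-based line number
    col: int    # 1-based column number

# One big alternation; alternatives are tried left to right.
TOKEN_RE = re.compile(r"""
    (?P<WS>      [ \t]+ )
  | (?P<NEWLINE> \n )
  | (?P<INT>     [0-9]+ )
  | (?P<IDENT>   [A-Za-z_][A-Za-z0-9_]* )
  | (?P<PUNCT>   [()+\-*/=;] )
""", re.VERBOSE)

def tokenize(src: str) -> Iterator[Token]:
    pos, line, line_start = 0, 1, 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            col = pos - line_start + 1
            raise SyntaxError(f"unexpected character {src[pos]!r} at {line}:{col}")
        kind = m.lastgroup
        if kind == "NEWLINE":
            # Track line starts so later tokens get correct columns.
            line += 1
            line_start = m.end()
        elif kind != "WS":
            yield Token(kind, m.group(), line, m.start() - line_start + 1)
        pos = m.end()
```

Note that `tokenize` never consults the parser: the only state it carries
is the current position, which is what makes it context-insensitive.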

There are numerous cases in common languages where we need limited
lookahead and/or backtracking. Historically this was best done at the
token stream layer, but with a modern, memoizing parser it isn't clear that
handling it there is still justified.
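
As a rough illustration of handling such a case in the parser itself, the
sketch below works directly on characters and uses memoized ordered choice
to tell a float like `1.5` from a range like `1..2`, a classic case that
otherwise needs extra lookahead in the tokenizer. The toy grammar and the
names `make_parser`, `digits`, `float_lit`, `number`, and `expr` are
assumptions chosen for this example, and `functools.lru_cache` stands in
for a real packrat memo table.

```python
from functools import lru_cache

def make_parser(src: str):
    # Each rule maps a start position to a result (or None on failure).
    # lru_cache memoizes by position, so re-trying a rule after
    # backtracking costs nothing the second time.

    @lru_cache(maxsize=None)
    def digits(pos):
        # Digits <- [0-9]+ ; returns the end position
        end = pos
        while end < len(src) and src[end] in "0123456789":
            end += 1
        return end if end > pos else None

    @lru_cache(maxsize=None)
    def float_lit(pos):
        # Float <- Digits '.' Digits ; returns the end position
        p = digits(pos)
        if p is None or not src.startswith(".", p):
            return None
        return digits(p + 1)

    @lru_cache(maxsize=None)
    def number(pos):
        # Number <- Float / Int  (ordered choice: fall back to Int)
        p = float_lit(pos)
        if p is not None:
            return ("float", src[pos:p]), p
        p = digits(pos)  # memo hit: float_lit already computed this
        if p is not None:
            return ("int", src[pos:p]), p
        return None

    def expr(pos=0):
        # Expr <- Number '..' Number / Number
        left = number(pos)
        if left is None:
            return None
        lval, p = left
        if src.startswith("..", p):
            right = number(p + 2)
            if right is not None:
                return ("range", lval, right[0]), right[1]
        return lval, p

    return expr
```

On `1..2`, `float_lit` fails after the first dot, `number` backtracks to
the integer alternative, and `expr` then sees the `..` operator; no
separate token-level lookahead is involved.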

LR and LL grammars are strictly more powerful than regular expressions.
There is nothing we can express in a tokenizer that cannot be expressed
just as well in a grammar.

As I have been working on the parser generator, I have realized that there
are tokenization sub-patterns that I want to re-use. E.g. character escapes
should have the same syntax inside character literals as inside strings.
This led me in the direction of a tokenizer specification that looks a lot
like a grammar specification.
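
A small sketch of what that re-use might look like, written as hand-coded
grammar-style rules in Python. The escape set, the rule names (`escape`,
`item`, `char_literal`, `string_literal`), and the PEG-like notation in
the comments are assumptions for illustration, not BitC's real literal
syntax or the parser generator's input format.

```python
from typing import Optional, Tuple

# Each rule returns (decoded value, next position), or None on failure.
Result = Optional[Tuple[str, int]]

# Hypothetical escape set, for illustration only.
SIMPLE_ESCAPES = {"n": "\n", "t": "\t", "\\": "\\", "'": "'", '"': '"'}

def escape(src: str, pos: int) -> Result:
    # Escape <- backslash followed by one of: n t backslash ' "
    # This is the shared sub-rule.
    if src.startswith("\\", pos) and pos + 1 < len(src):
        decoded = SIMPLE_ESCAPES.get(src[pos + 1])
        if decoded is not None:
            return decoded, pos + 2
    return None

def item(src: str, pos: int, quote: str) -> Result:
    # Item <- Escape / any char except the quote, backslash, or newline
    r = escape(src, pos)
    if r is not None:
        return r
    if pos < len(src) and src[pos] not in (quote, "\\", "\n"):
        return src[pos], pos + 1
    return None

def char_literal(src: str, pos: int = 0) -> Result:
    # CharLit <- ' Item '
    if not src.startswith("'", pos):
        return None
    r = item(src, pos + 1, "'")
    if r is None:
        return None
    value, pos = r
    return (value, pos + 1) if src.startswith("'", pos) else None

def string_literal(src: str, pos: int = 0) -> Result:
    # StringLit <- " Item* "
    if not src.startswith('"', pos):
        return None
    pos += 1
    parts = []
    while True:
        r = item(src, pos, '"')
        if r is None:
            break
        parts.append(r[0])
        pos = r[1]
    return ("".join(parts), pos + 1) if src.startswith('"', pos) else None
```

Both literal rules call the same `escape` rule, so a change to the escape
syntax is made in exactly one place. With one regular expression per
token kind, the escape pattern would have to be duplicated or spliced in
textually.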

All of which is leading me to the following questions:

   1. Is the tokenizer/parser phase distinction actually useful in modern
   front ends?
   2. What value is it providing, and should it be preserved?
   3. Why use two algorithms (regular expression engine, parse algorithm)
   when one will suffice?

The only place in BitC v0 where the phase distinction was important was the
layout engine, and *that* actually *does* seem difficult to express in a
conventional grammar. It *can*, however, be expressed in a predicated
grammar, given suitable predicates.
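
For readers who have not seen a layout pass, here is a minimal sketch of
one written as a stateful filter that sits between the tokenizer and the
parser. It implements a simple Python-style indentation rule, which is an
assumption made for illustration and NOT BitC v0's actual layout
algorithm; the `layout` function and the `INDENT`, `DEDENT`, and `LINE`
token names are hypothetical.

```python
from typing import Iterator, List, Tuple

def layout(lines: List[str]) -> Iterator[Tuple[str, str]]:
    stack = [0]  # indentation columns of the enclosing blocks
    for raw in lines:
        text = raw.strip()
        if not text:
            continue  # blank lines do not affect layout
        indent = len(raw) - len(raw.lstrip(" "))
        if indent > stack[-1]:
            # Deeper than the current block: open a new one.
            stack.append(indent)
            yield ("INDENT", "")
        while indent < stack[-1]:
            # Shallower: close blocks until we are back at a known column.
            stack.pop()
            yield ("DEDENT", "")
        if indent != stack[-1]:
            raise SyntaxError(f"inconsistent indentation: {raw!r}")
        yield ("LINE", text)
    # Close any blocks still open at end of input.
    while len(stack) > 1:
        stack.pop()
        yield ("DEDENT", "")
```

The indentation stack is the awkward part: deciding whether a line opens
or closes a block means comparing column numbers against remembered
state, which is roughly the job a predicate would have to do in a
predicated grammar.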

Opinions?


Jonathan
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev