I had the same problem on my first language; here's what I did.
I started off thinking that I needed a simple interpreter, so I wrote
the Bison code, and basically did all the required work in the Bison
actions. This worked well; it looks like you're at this stage.
A little later, I realised that this wasn't quite good enough. The
specific problem, I think, was that I needed some forward declarations -
i.e. sometimes code earlier in the source file needed to know about
something later in the same file. Tricky. After some thought, I
decided that I could handle this with minimal changes, just by rescanning
the input with Bison. I scanned it once, remembered the bits I
needed to know about, reset yyin to the beginning of the file, and
scanned it again properly. I now had a 2-pass interpreter, which worked
well.
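In case it's useful, the driver for that was only a few lines. This is a
minimal sketch rather than my actual code; it assumes a flex-generated
lexer (so yyrestart() exists) and a global 'pass' variable that the Bison
actions check to decide how much work to do:

/* two-pass driver: parse once to collect declarations, rewind, parse again */
#include <stdio.h>
#include <stdlib.h>

extern FILE *yyin;               /* provided by flex */
extern int   yyparse(void);      /* provided by bison */
extern void  yyrestart(FILE *);  /* provided by flex */

int pass;                        /* 1 = collect forward decls, 2 = real work */

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s file\n", argv[0]);
        return EXIT_FAILURE;
    }
    yyin = fopen(argv[1], "r");
    if (!yyin) {
        perror(argv[1]);
        return EXIT_FAILURE;
    }

    pass = 1;                    /* first pass: just remember the declarations */
    if (yyparse() != 0)
        return EXIT_FAILURE;

    rewind(yyin);                /* back to the start of the file... */
    yyrestart(yyin);             /* ...and make flex forget its buffered input */

    pass = 2;                    /* second pass: do the actual interpretation */
    if (yyparse() != 0)
        return EXIT_FAILURE;

    fclose(yyin);
    return EXIT_SUCCESS;
}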
As time went on, it became obvious that this was very limiting. I needed
to do all sorts of extra things; some of these were:
1 - More forward declarations.
2 - The language allowed constant expressions, so the semantic checking
code had to confirm that an expression was actually constant. How do you
do this in one pass? The easy answer is to have an extra pass - you scan
the code finding everything that should be a constant, evaluate any
expression you find there, replace the expression with a constant (if
you can), and let the semantic checker confirm that there's a constant
there (there's a small sketch of this after the list).
3 - As semantic checking became more complex, it became obvious that I
had to analyse *all* the code before I had enough information to analyse
*some* of the code. This will depend on your language.
4 - Some things are next to impossible to generate code for just by
scanning and rescanning the input. Since you're looking at C, what about
sequence points, and expressions with side-effects? How do you handle the
side-effects? Imagine code-generating "y = x++, c(), x" - where do you
put the 'x' increment?
5 - Optimisations? A code generator for a new target? What if you have
to scan something right-to-left? And so on.
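To make point 2 concrete, here's a small constant-folding sketch. The
node layout and names are invented just for the example - yours will
differ - but the idea is the same: walk the tree bottom-up and collapse
any operator whose operands are already constants:

/* illustrative constant-folding pass over a tiny expression tree */
#include <stdlib.h>

enum kind { CONST, VAR, ADD, MUL };

struct expr {
    enum kind    kind;
    long         value;      /* valid when kind == CONST */
    struct expr *lhs, *rhs;  /* valid for ADD and MUL */
};

/* Replace ADD/MUL nodes whose operands are both constants with a single
 * CONST node.  Afterwards the semantic checker only has to test
 * 'kind == CONST' to know an expression really was constant. */
void fold(struct expr *e)
{
    if (!e || e->kind == CONST || e->kind == VAR)
        return;

    fold(e->lhs);
    fold(e->rhs);

    if (e->lhs->kind == CONST && e->rhs->kind == CONST) {
        long v = (e->kind == ADD) ? e->lhs->value + e->rhs->value
                                  : e->lhs->value * e->rhs->value;
        free(e->lhs);
        free(e->rhs);
        e->kind  = CONST;
        e->value = v;
        e->lhs = e->rhs = NULL;
    }
}

Of course, this only really works once you have some kind of tree to walk,
which is where the next part comes in.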
Anyway, it quickly became obvious that handling anything non-trivial is
next to impossible just by scanning and rescanning the input.
Fortunately, there's a really simple answer. The lexer and parser do
very little apart from creating an AST (abstract syntax tree). The
parser is no longer the 'compiler'; it's just a very simple front-end
that creates a data
structure, the AST. The compilation process is now a standard
data-processing problem - it's nothing more than analysing,
manipulating, and transforming the AST. Each analysis or transformation
of the AST is, loosely speaking, a compiler "pass"; you can do this as
few, or as many, times as you wish. It's trivial to add a new pass when
you have some new feature or requirement. The AST *is* your program; the
scanning is basically irrelevant.
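A "pass", in this set-up, can be as small as one function that takes the
root of the AST. Here's a sketch of what the driver might look like; the
pass names are made up, and 'struct ast' stands for whatever tree type
you define:

/* sketch of a pass pipeline over the AST */
#include <stddef.h>

struct ast;                                /* your tree type */

typedef void (*pass_fn)(struct ast *root);

/* hypothetical passes - each one analyses or rewrites the tree */
extern void collect_declarations(struct ast *root);  /* forward decls */
extern void fold_constants(struct ast *root);        /* point 2 above */
extern void check_semantics(struct ast *root);       /* sees the whole program */
extern void generate_code(struct ast *root);

static const pass_fn passes[] = {
    collect_declarations,
    fold_constants,
    check_semantics,
    generate_code,
};

void compile(struct ast *root)
{
    size_t i;
    for (i = 0; i < sizeof passes / sizeof passes[0]; i++)
        passes[i](root);   /* a new feature is usually just a new entry here */
}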
And you can still have an "interpreter" at the end of it, if you want
one; as soon as you finish "code generation", you just run your "code",
or whatever it is that actually does anything. Technically, however, you
should probably call this JIT compilation, rather than interpreting.
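For example, if "code generation" produces a little in-memory bytecode
array, then running it is nothing more than a dispatch loop. The opcodes
below are invented purely for illustration:

/* toy dispatch loop over a generated bytecode array */
#include <stdio.h>

enum op { OP_PUSH, OP_ADD, OP_PRINT, OP_HALT };

static void run(const long *code)
{
    long stack[64];
    int  sp = 0;

    for (;;) {
        switch ((enum op)*code++) {
        case OP_PUSH:  stack[sp++] = *code++;            break;
        case OP_ADD:   sp--; stack[sp - 1] += stack[sp]; break;
        case OP_PRINT: printf("%ld\n", stack[--sp]);     break;
        case OP_HALT:  return;
        }
    }
}

int main(void)
{
    /* the generated "code" for evaluating and printing 2 + 3 */
    long prog[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PRINT, OP_HALT };
    run(prog);
    return 0;
}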
The bad news is that you need to know a *lot* more to do this, over and
above writing a Bison parser. The (user) parser code (i.e. your
"actions") is probably a lot less than 5% of a practical and useful
compiler. So, stick with scanning your input until you find out what the
problems are and, if it's really necessary to fix them, rewrite your
code around an AST. If you need to find out more about ASTs, look at
Antlr; you can use the Antlr library to create and maintain your AST.
Evan