I had the same problem on my first language; here's what I did.
I started off thinking that I needed a simple interpreter, so I wrote
the Bison code, and basically did all the required work in the Bison
actions. This worked well; it looks like you're at this stage.
A little later, I realised that this wasn't quite good enough. The
specific problem, I think, was that I needed some forward declarations -
i.e. sometimes code earlier in the source file needed to know about
something later in the same file. Tricky. After some thought, I
decided that I could handle this with minimal changes, just by rescanning
the input with Bison. I scanned it once, remembered the bits I
needed to know about, reset yyin to the beginning of the file, and
scanned it again properly. I now had a 2-pass interpreter, which worked
well.
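In case it's useful, the driver for that was only a few lines. This is a
minimal sketch rather than my actual code; it assumes a flex-generated
lexer (so yyrestart() exists) and a global 'pass' variable that the Bison
actions check to decide how much work to do:

/* two-pass driver: parse once to collect declarations, rewind, parse again */
#include <stdio.h>
#include <stdlib.h>

extern FILE *yyin;               /* provided by flex */
extern int   yyparse(void);      /* provided by bison */
extern void  yyrestart(FILE *);  /* provided by flex */

int pass;                        /* 1 = collect forward decls, 2 = real work */

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s file\n", argv[0]);
        return EXIT_FAILURE;
    }
    yyin = fopen(argv[1], "r");
    if (!yyin) {
        perror(argv[1]);
        return EXIT_FAILURE;
    }

    pass = 1;                    /* first pass: just remember the declarations */
    if (yyparse() != 0)
        return EXIT_FAILURE;

    rewind(yyin);                /* back to the start of the file... */
    yyrestart(yyin);             /* ...and make flex forget its buffered input */

    pass = 2;                    /* second pass: do the actual interpretation */
    if (yyparse() != 0)
        return EXIT_FAILURE;

    fclose(yyin);
    return EXIT_SUCCESS;
}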
As time went on, it became obvious that this was very limiting. I needed
to do all sorts of extra things; some of these were:
1 - More forward declarations.
2 - The language allowed constant expressions, so the semantic checking
code had to confirm that an expression was actually constant. How do you
do this in one pass? The easy answer is to have an extra pass - you scan
the code finding everything that should be a constant, evaluate any
expression you find there, replace the expression with a constant (if
you can), and let the semantic checker confirm that there's a constant
there (there's a small sketch of this after the list).
3 - As semantic checking became more complex, it became obvious that I
had to analyse *all* the code before I had enough information to analyse
*some* of the code. This will depend on your language.
4 - Some things are next to impossible to generate code for just by
scanning and rescanning the input. Since you're looking at C, what about
sequence points, and expressions with side-effects? How do you handle the
side-effects? Imagine code-generating "y = x++, c(), x" - where do you
put the 'x' increment?
5 - Optimisations? A code generator for a new target? What if you have
to scan something right-to-left? And so on.
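To make point 2 concrete, here's a small constant-folding sketch. The
node layout and names are invented just for the example - yours will
differ - but the idea is the same: walk the tree bottom-up and collapse
any operator whose operands are already constants:

/* illustrative constant-folding pass over a tiny expression tree */
#include <stdlib.h>

enum kind { CONST, VAR, ADD, MUL };

struct expr {
    enum kind    kind;
    long         value;      /* valid when kind == CONST */
    struct expr *lhs, *rhs;  /* valid for ADD and MUL */
};

/* Replace ADD/MUL nodes whose operands are both constants with a single
 * CONST node.  Afterwards the semantic checker only has to test
 * 'kind == CONST' to know an expression really was constant. */
void fold(struct expr *e)
{
    if (!e || e->kind == CONST || e->kind == VAR)
        return;

    fold(e->lhs);
    fold(e->rhs);

    if (e->lhs->kind == CONST && e->rhs->kind == CONST) {
        long v = (e->kind == ADD) ? e->lhs->value + e->rhs->value
                                  : e->lhs->value * e->rhs->value;
        free(e->lhs);
        free(e->rhs);
        e->kind  = CONST;
        e->value = v;
        e->lhs = e->rhs = NULL;
    }
}

Of course, this only really works once you have some kind of tree to walk,
which is where the next part comes in.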
Anyway, it quickly became obvious that handling anything non-trivial is
next to impossible just by scanning and rescanning the input.
Fortunately, there's a really simple answer. The lexer and parser do
very little apart from creating an AST (abstract syntax tree). The
parser is no longer the 'compiler'; it's just a very simple front-end
that creates a data
structure, the AST. The compilation process is now a standard
data-processing problem - it's nothing more than analysing,
manipulating, and transforming the AST. Each analysis or transformation
of the AST is, loosely speaking, a compiler "pass"; you can do this as
few, or as many, times as you wish. It's trivial to add a new pass when
you have some new feature or requirement. The AST *is* your program; the
scanning is basically irrelevant.
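A "pass", in this set-up, can be as small as one function that takes the
root of the AST. Here's a sketch of what the driver might look like; the
pass names are made up, and 'struct ast' stands for whatever tree type
you define:

/* sketch of a pass pipeline over the AST */
#include <stddef.h>

struct ast;                                /* your tree type */

typedef void (*pass_fn)(struct ast *root);

/* hypothetical passes - each one analyses or rewrites the tree */
extern void collect_declarations(struct ast *root);  /* forward decls */
extern void fold_constants(struct ast *root);        /* point 2 above */
extern void check_semantics(struct ast *root);       /* sees the whole program */
extern void generate_code(struct ast *root);

static const pass_fn passes[] = {
    collect_declarations,
    fold_constants,
    check_semantics,
    generate_code,
};

void compile(struct ast *root)
{
    size_t i;
    for (i = 0; i < sizeof passes / sizeof passes[0]; i++)
        passes[i](root);   /* a new feature is usually just a new entry here */
}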
And you can still have an "interpreter" at the end of it, if you want
one; as soon as you finish "code generation", you just run your "code",
or whatever it is that actually does anything. Technically, however, you
should probably call this JIT compilation, rather than interpreting.
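For example, if "code generation" produces a little in-memory bytecode
array, then running it is nothing more than a dispatch loop. The opcodes
below are invented purely for illustration:

/* toy dispatch loop over a generated bytecode array */
#include <stdio.h>

enum op { OP_PUSH, OP_ADD, OP_PRINT, OP_HALT };

static void run(const long *code)
{
    long stack[64];
    int  sp = 0;

    for (;;) {
        switch ((enum op)*code++) {
        case OP_PUSH:  stack[sp++] = *code++;            break;
        case OP_ADD:   sp--; stack[sp - 1] += stack[sp]; break;
        case OP_PRINT: printf("%ld\n", stack[--sp]);     break;
        case OP_HALT:  return;
        }
    }
}

int main(void)
{
    /* the generated "code" for evaluating and printing 2 + 3 */
    long prog[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PRINT, OP_HALT };
    run(prog);
    return 0;
}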
The bad news is that you need to know a *lot* more to do this, over and
above writing a Bison parser. The (user) parser code (i.e. your
"actions") is probably a lot less than 5% of a practical and useful
compiler. So, stick with scanning your input until you find out what the
problems are and, if it's really necessary to fix them, rewrite your
code around an AST. If you need to find out more about ASTs, look at
Antlr; you can use the Antlr library to create and maintain your AST.
Evan