On 15 Mar 2010, at 10:57, Søren Andersen wrote:
Consider a language with all the normal expressions: you can add,
subtract, multiply, etc.
Now, you'd like the user to be able to define his own operators,
for instance '+?'.
To help with ambiguities, you decide that these user-defined
operators must be at least 2 "elements" long (I'm specifically NOT
using the word "tokens" here, for reasons that will become clear).
So you'll allow '++', '-+', etc.
Now, the problem is that this still ends in shift/reduce conflicts,
mainly because of how you would naturally write it:
UserOp = PossOp PossOp*;
PossOp = '+' | '-' | '*' | ....;
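The parser then looks for a succession of tokens, so you can even write
'-' '+' as two separate tokens. But this is exactly what causes the
conflicts: with just 1 token of lookahead, the parser cannot decide
whether a '-' is an operator on its own or the first element of a
longer user-defined operator.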
What I really want is for my specification to specify *a single
token* rather than a series of tokens, which is the exact opposite
of what you usually want.
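A single token of this kind is normally produced by the lexer rather than the grammar. Below is a minimal sketch in C of a hand-written scanner doing that, assuming a hypothetical set of operator characters and hypothetical token codes TOK_NUM, TOK_OP, TOK_EOF and TOK_ERROR (with Bison these would come from the generated header). It uses longest match: a whole run of operator characters is returned as one TOK_OP token, and its text is copied out so it can be checked later.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token codes; with Bison these would come from the
   generated header instead. */
enum { TOK_EOF = 0, TOK_NUM = 258, TOK_OP, TOK_ERROR };

/* Assumed set of characters that may make up an operator. */
static int is_op_char(int c) {
    return c != '\0' && strchr("+-*/<>=!&|^%?~", c) != NULL;
}

/* Scan the next token from *p, copy its text into buf, advance *p.
   A run of operator characters is returned as ONE TOK_OP token
   (longest match), so "+?" reaches the parser as a single token. */
static int next_token(const char **p, char *buf, size_t bufsize) {
    assert(bufsize > 0);
    const char *s = *p;
    while (*s == ' ' || *s == '\t' || *s == '\n')
        s++;
    const char *start = s;
    int tok;
    if (*s == '\0') {
        tok = TOK_EOF;
    } else if (*s >= '0' && *s <= '9') {
        while (*s >= '0' && *s <= '9')
            s++;
        tok = TOK_NUM;
    } else if (is_op_char((unsigned char)*s)) {
        while (is_op_char((unsigned char)*s))
            s++;
        tok = TOK_OP;
    } else {
        s++;               /* skip one unrecognized character */
        tok = TOK_ERROR;
    }
    size_t len = (size_t)(s - start);
    if (len >= bufsize)
        len = bufsize - 1; /* truncate overlong token text */
    memcpy(buf, start, len);
    buf[len] = '\0';
    *p = s;
    return tok;
}
```

The trade-off of longest match is that `1+-2` is scanned as the operator `+-`, so a user who means `1 + (-2)` has to put a space between the two characters. Whether `+-` is actually a defined operator is then checked against the token text (in an action or a symbol table) rather than by the grammar.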
You could automatically generate all the possible tokens up to a
certain length:
'++', '+-', '+*', ...
but the set would be very large, and it can only ever cover
operators up to that fixed length.
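To put a number on "very large": if k different characters may appear in an operator and operators may be 2 to n characters long, the number of candidate tokens is

$$\sum_{i=2}^{n} k^{i} = \frac{k^{2}\left(k^{n-1}-1\right)}{k-1}$$

With k = 10, for example, lengths 2 to 3 already give 100 + 1000 = 1100 tokens, and lengths 2 to 4 give 11100.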
The typical approach is to let the lexer recognize valid operator
tokens. Then one can let the .y file recognize the operators and push
them, along with the operand values, onto a stack, which is then
sorted out by a function called from the actions that computes the
value using the operator precedences. This way the number of valid
operator tokens can even be unlimited.
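A sketch of such a function in C, under stated assumptions: the grammar collects a flat sequence operand OP operand ... OP operand (for instance with a hypothetical rule `seq: operand | seq OP operand`), and the action then passes the values and operator texts to `resolve`. The operator table, the meaning given to `+?` and all names are hypothetical. The function is a shunting-yard style evaluation, with one stack for values and one for pending operators.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical operator table entry. */
typedef struct {
    const char *name;
    int prec;         /* higher binds tighter */
    int right_assoc;  /* 1 = right-associative, 0 = left */
    long (*apply)(long, long);
} OpInfo;

static long op_add(long a, long b) { return a + b; }
static long op_sub(long a, long b) { return a - b; }
static long op_mul(long a, long b) { return a * b; }
static long op_pow(long a, long b) { long r = 1; while (b-- > 0) r *= a; return r; }
/* Hypothetical user-defined "+?": yields the larger operand. */
static long op_max(long a, long b) { return a > b ? a : b; }

/* In a real language, user fixity declarations would add entries here. */
static const OpInfo op_table[] = {
    { "+", 6, 0, op_add }, { "-", 6, 0, op_sub },
    { "*", 7, 0, op_mul }, { "**", 8, 1, op_pow },
    { "+?", 5, 0, op_max },
};

static const OpInfo *lookup_op(const char *name) {
    for (size_t i = 0; i < sizeof op_table / sizeof op_table[0]; i++)
        if (strcmp(op_table[i].name, name) == 0)
            return &op_table[i];
    return NULL;
}

#define MAX_OPERANDS 64

/* Pop one operator and two values, push the result. */
static void reduce_top(long *vstack, size_t *vtop,
                       const OpInfo **ostack, size_t *otop) {
    const OpInfo *op = ostack[--*otop];
    long b = vstack[--*vtop];
    long a = vstack[--*vtop];
    vstack[(*vtop)++] = op->apply(a, b);
}

/* Evaluate vals[0] ops[0] vals[1] ... ops[n-1] vals[n].
   Returns 0 on success, -1 if an operator is not in the table. */
static int resolve(const long *vals, const char *const *ops, size_t n, long *out) {
    long vstack[MAX_OPERANDS];
    const OpInfo *ostack[MAX_OPERANDS];
    size_t vtop = 0, otop = 0;
    assert(n + 1 <= MAX_OPERANDS);

    vstack[vtop++] = vals[0];
    for (size_t i = 0; i < n; i++) {
        const OpInfo *op = lookup_op(ops[i]);
        if (op == NULL)
            return -1;
        /* Reduce while the pending operator binds tighter, or equally
           tight and the new operator is left-associative. */
        while (otop > 0 && (ostack[otop - 1]->prec > op->prec ||
                            (ostack[otop - 1]->prec == op->prec && !op->right_assoc)))
            reduce_top(vstack, &vtop, ostack, &otop);
        ostack[otop++] = op;
        vstack[vtop++] = vals[i + 1];
    }
    while (otop > 0)
        reduce_top(vstack, &vtop, ostack, &otop);
    *out = vstack[0];
    return 0;
}
```

With this table, `1 + 2 * 3` evaluates to 7 and `2 ** 3 ** 2` to 512, since `**` is declared right-associative. Because the precedences live in a table rather than in the grammar, new operators can be added without regenerating the parser, and the number of levels is not limited to a fixed handful.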
You might check out the Haskell interpreter Hugs <http://haskell.org/hugs/>,
which has a .y file and a handwritten lexer in the file input.c.
Perhaps the lexer is handwritten to handle the layout syntax. Also
look into the file Prelude.hs to see how precedences are set. Haskell
only admits ten precedence levels (0 to 9), which is a bit too limited.
Hans
_______________________________________________
help-bison@gnu.org http://lists.gnu.org/mailman/listinfo/help-bison