I've written a rather minimal s-expression parser with PLY, but I'm
experiencing a strange bug. Since the code is rather short, I'll post
it here:
--==BEGIN lexer.py==--
import ply.lex as lex

# Token names exported by this lexer.
tokens = ('INTEGER', 'FLOAT', 'STRING', 'LPAREN', 'RPAREN',
          'IDENTIFIER', 'NEWLINE', 'RATIONAL')

# Tokens defined as plain regular-expression strings.
t_FLOAT = r'((\d*\.\d+)(E[\+-]?\d+)?|([1-9]\d*E[\+-]?\d+))'
t_STRING = r'\".*?\"'
t_LPAREN = r'\('
t_RPAREN = r'\)'
t_IDENTIFIER = r'[^0-9()][^()\ \t\n]*'
t_INTEGER = r'(-)?\d+'
t_RATIONAL = r'(-)?\d+/\d+'

# Spaces and tabs are skipped between tokens.
t_ignore = ' \t'

def t_NEWLINE(t):
    r'\n'
    # Track line numbers; nothing is returned, so the token is discarded.
    t.lexer.lineno += 1

def t_error(t):
    '''
    Houston, we have a problem.
    '''
    print("Illegal character %s" % t.value[0])
    t.lexer.skip(1)

lexer = lex.lex(optimize=0)
--==END lexer.py==--
Now, when I do this:
>>> from lexer import lexer
>>>
>>> lexer.input (' (+ 7abc 3 "xyz") ')
>>> for token in lexer:
...     print token
I get:
LexToken(LPAREN,'(',1,1)
LexToken(IDENTIFIER,'+',1,2)
LexToken(INTEGER,'7',1,4)
LexToken(IDENTIFIER,'abc',1,5)
LexToken(INTEGER,'3',1,9)
LexToken(IDENTIFIER,'"xyz"',1,11)
LexToken(RPAREN,')',1,16)
>>>
What I'd expect is an error on 7abc, since it's not a valid
identifier. What makes me suspect that this is a PLY bug rather than
a bug in my code is that pyscheme
(http://hkn.eecs.berkeley.edu/~dyoo/python/pyscheme/)
builds its lexer and parser using PLY and has the same bug. Can anyone
confirm that this is a bug in PLY, or am I doing something subtly wrong?
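
To make the question concrete, here is a minimal sketch that uses only
Python's re module, not PLY itself. It assumes a lexer that tries each
token regex at the current position and does not require a delimiter
after the match. The function name sketch_tokenize and the order of
PATTERNS are made up for this illustration and are not necessarily what
PLY does internally. Under that assumption, 7abc splits into an INTEGER
followed by an IDENTIFIER, just like the output above:

```python
import re

# Same regexes as in lexer.py; the ordering is an assumption of this sketch.
PATTERNS = [
    ('RATIONAL', r'(-)?\d+/\d+'),
    ('INTEGER', r'(-)?\d+'),
    ('IDENTIFIER', r'[^0-9()][^()\ \t\n]*'),
]

def sketch_tokenize(text):
    """Hypothetical stand-in for a lexer: match regexes at the current position."""
    pos = 0
    result = []
    while pos < len(text):
        # Skip ignored characters, like t_ignore = ' \t'.
        if text[pos] in ' \t':
            pos += 1
            continue
        for name, pattern in PATTERNS:
            m = re.compile(pattern).match(text, pos)
            if m:
                # Nothing forces the match to end at whitespace or a paren,
                # so '7' is accepted and 'abc' is left for the next round.
                result.append((name, m.group(0)))
                pos = m.end()
                break
        else:
            raise ValueError("Illegal character %r" % text[pos])
    return result

print(sketch_tokenize('7abc 3'))
# prints [('INTEGER', '7'), ('IDENTIFIER', 'abc'), ('INTEGER', '3')]
```

If PLY works along these lines, the split would follow from my regexes
rather than from a bug, but I'm not sure that it does.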
Thanks!