I'm a royal n00b to writing translators, but you have to start someplace. In my Python project, I've decided that writing the dispatch code to sit between the Glulx virtual machine and the Glk API will be best done automatically, using the handy prototypes.
Below is the prototype of the lexer, and I'd like some comments in case I'm doing something silly already. My particular concern are: The loop checks for each possible lexeme one at a time, and has rather lame error checking. I made regexes for matching a couple of really trivial cases for the sake of consistency. In general, is there a better way to use re module for lexing. Ultimately, I'm going to need to build up an AST from the lines, and use that to generate Python code to dispatch Glk functions. I realize I'm throwing away the id of the lexeme right now; suggestions on the best way to store that information are welcome. I do know of and have experimented with PyParsing, but for now I want to use the standard modules. After I understand what I'm doing, I think a PyParsing solution will be easy to write. import re def match(regex, proto, ix, lexed_line): m = regex.match(proto, ix) if m: lexed_line.append(m.group()) ix = m.end() return ix def parse(proto): """ Return a lexed version of the prototype string. See the Glk specification, 0.7.0, section 11.1.4 >>> parse('0:') ['0', ':'] >>> parse('1:Cu') ['1', ':', 'Cu'] >>> parse('2<Qb:Cn') ['2', '<', 'Qb', ':', 'Cn'] >>> parse('4Iu&#![2SF]>+Iu:Is') ['4', 'Iu', '&#!', '[', '2', 'S', 'F', ']', '>+', 'Iu', ':', 'Is'] """ arg_count = re.compile('\d+') qualifier = re.compile('[&<>][+#!]*') type_name = re.compile('I[us]|C[nus]|[SUF]|Q[a-z]') o_bracket = re.compile('\\[') c_bracket = re.compile('\\]') colon = re.compile(':') ix = 0 lexed_line = [] m = lambda regex, ix: match(regex, proto, ix, lexed_line) while ix < len(proto): old = ix ix = m(arg_count, ix) ix = m(qualifier, ix) ix = m(type_name, ix) ix = m(o_bracket, ix) ix = m(c_bracket, ix) ix = m(colon, ix) if ix == old: print "Parse error at %s of %s" % (proto[ix:], proto) ix = len(proto) return lexed_line if __name__ == "__main__": import doctest doctest.testmod() -- Neil Cerutti We dispense with accuracy --sign at New York drug store -- http://mail.python.org/mailman/listinfo/python-list