John Nagle <na...@animats.com> writes:

> A dictionary lookup (actually, several of them) for every input
> character is rather expensive. Tokenizers usually index into a table
> of character classes, then use the character class index in a switch
> statement.
>
> This is an issue that comes up whenever you have to parse some
> formal structure, from XML/HTML to Pickle to JPEG images to program
> source.
> […]
> The temptation is to write tokenizers in C, but that's an admission
> of language design failure.
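For anyone who hasn't seen the table-driven approach described in the quoted text, here is a minimal sketch of it, assuming Python 3. The class names, the `CHAR_CLASS` table, and the `classify` and `tokenize` helpers are all hypothetical names made up for illustration, and since Python has no switch statement an if/elif chain stands in for one. It is not meant as a benchmark or as anyone's actual tokenizer.

```python
# Hypothetical character classes; the names and the grouping rules are
# illustrative assumptions, not taken from any particular tokenizer.
OTHER, SPACE, DIGIT, ALPHA = range(4)

# Build the class table once: one entry per 8-bit character code.
CHAR_CLASS = [OTHER] * 256
for c in b" \t\r\n":
    CHAR_CLASS[c] = SPACE
for c in b"0123456789":
    CHAR_CLASS[c] = DIGIT
for c in b"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_":
    CHAR_CLASS[c] = ALPHA


def classify(ch):
    """Return the class of one character; codes above 255 count as OTHER."""
    code = ord(ch)
    return CHAR_CLASS[code] if code < 256 else OTHER


def tokenize(text):
    """Split text into (kind, lexeme) pairs by dispatching on character class."""
    tokens = []
    i, n = 0, len(text)
    while i < n:
        cls = classify(text[i])
        start = i
        if cls == SPACE:
            # Skip a run of whitespace without emitting a token.
            while i < n and classify(text[i]) == SPACE:
                i += 1
        elif cls == DIGIT:
            while i < n and classify(text[i]) == DIGIT:
                i += 1
            tokens.append(("NUMBER", text[start:i]))
        elif cls == ALPHA:
            # A name starts with a letter or underscore and may contain digits.
            while i < n and classify(text[i]) in (ALPHA, DIGIT):
                i += 1
            tokens.append(("NAME", text[start:i]))
        else:
            # Anything else becomes a one-character operator token.
            i += 1
            tokens.append(("OP", text[start:i]))
    return tokens
```

Calling `tokenize("x1 = 42+y")` yields a NAME, an OP, a NUMBER, an OP, and a NAME token in that order. The per-character cost is one list index plus a few integer comparisons, which is the point of the table; whether that beats dictionary lookups in practice would need measuring.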
This sounds like a job for <URL:http://pyparsing.wikispaces.com/> Pyparsing.

-- 
“Better not take a dog on the space shuttle, because if he
sticks his head out when you're coming home his face might burn
up.” —Jack Handey
Ben Finney