Davy wrote: > Hi David, > > Thank you for the generous help :) > As far as I can see, Lex and Yacc do the same thing on different > abstraction layer. > That is, Lex rule is regular expression of character, while Yacc rule > (if we use EBNF) is regular expression of token. The only difference > is Yacc rule embrace recursion.
True, if you forget about the white space that you silently throw away. Below I wrote some background information with some terms you can use as starting point for searching more information. The lack of recursion in lex makes it possible to generate a DFA (deterministic finite automaton) from the regular expression(s), which allows for *very fast* processing of the characters. The yacc parser on the other hand is slower, and more powerful than regular expressions. (Note that in compiler theory, 'regular expressions with recursion' don't exist, such grammars fall in the Chomsky hierarchy of contex-free languages. For such languages, you need to have a stack besides the automaton.) This additional power allows you to recognize nested structures correctly, for example to find the matching closing brakcet in an expression. The LALR(1) algorithm used by yacc is a compromise between maximal recognition power (enough for most languages), and not too many states (although that is of less importance nowadays). > I guess Lex parser is only a special case of Yacc parser (Lex output > is list but not recursively structure like tree). So why not merge > them into one tool? Speed and simplicity mostly. The current division of labour means you can use optimized algorithms for each part while reducing overall complexity. While writing yacc rules, you don't have to think about the representation of an integer number (was it '12345' or '0x3039' or '030071' or 'B11000000111001'?). None the less, parsers that combine both phases do exist. One example is 'sglr' (a C-based solution, I am not aware of any Python-based solutions). It uses much more powerful (and much more resource-eating) algorithms to recognize the text. Their power is useful at places where the lex/yacc algorithms are not enough, eg with ambigious grammars and/or textual input with several languages combined (eg Cobol with embedded SQL). Albert --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "ply-hack" group. To post to this group, send email to [email protected] To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/ply-hack?hl=en -~----------~----~----~----~------~----~------~--~---
