Davy wrote:
> Hi David,
> 
> Thank you for the generous help :)
> As far as I can see, Lex and Yacc do the same thing on different
> abstraction layer.
> That is, Lex rule is regular expression of character, while Yacc rule
> (if we use EBNF) is regular expression of token. The only difference
> is Yacc rule embrace recursion.

True, if you forget about the white space that you silently throw away.

Below is some background information, with terms you can use as starting 
points for finding more information.

The lack of recursion in lex makes it possible to generate a DFA 
(deterministic finite automaton) from the regular expression(s), which allows 
for *very fast* processing of the characters.
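The lexing idea can be sketched in plain Python with the standard `re` module 
(just an illustration of the principle, not how PLY or lex is actually 
implemented): each token is a non-recursive regular expression, the 
alternatives are combined into one pattern, and white space is thrown away 
as it is matched.

```python
import re

# Each token is a plain (non-recursive) regular expression.
# The combined pattern can be turned into a fast automaton.
token_spec = [
    ('NUMBER', r'\d+'),
    ('PLUS',   r'\+'),
    ('LPAREN', r'\('),
    ('RPAREN', r'\)'),
    ('SKIP',   r'\s+'),   # white space, silently thrown away
]
master = re.compile('|'.join('(?P<%s>%s)' % pair for pair in token_spec))

def tokenize(text):
    for m in master.finditer(text):
        if m.lastgroup != 'SKIP':
            yield (m.lastgroup, m.group())

print(list(tokenize('(1 + 23)')))
# -> [('LPAREN', '('), ('NUMBER', '1'), ('PLUS', '+'),
#     ('NUMBER', '23'), ('RPAREN', ')')]
```

Note that the parser downstream never sees the white space, only the token 
stream.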

The yacc parser, on the other hand, is slower but more powerful than regular 
expressions.

(Note that in compiler theory, 'regular expressions with recursion' don't 
exist; such grammars fall into the context-free languages of the Chomsky 
hierarchy. For such languages, you need a stack in addition to the automaton.)

This additional power allows you to recognize nested structures correctly, for 
example to find the matching closing bracket in an expression.
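The role of the stack can be shown with a toy bracket matcher (a hypothetical 
helper, not part of lex or yacc): a finite automaton alone cannot count 
arbitrarily deep nesting, but an explicit stack can.

```python
def brackets_match(text):
    """Check that '(' and ')' nest correctly, using an explicit
    stack -- the extra machinery a plain regular expression lacks."""
    stack = []
    for ch in text:
        if ch == '(':
            stack.append(ch)
        elif ch == ')':
            if not stack:        # closing bracket with no open one
                return False
            stack.pop()
    return not stack             # every open bracket was closed

print(brackets_match('(a + (b * c))'))  # True
print(brackets_match('(a + b))'))       # False
```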

The LALR(1) algorithm used by yacc is a compromise between maximal recognition 
power (enough for most languages), and not too many states (although that is 
of less importance nowadays).

> I guess Lex parser is only a special case of Yacc parser (Lex output
> is list but not recursively structure like tree). So why not merge
> them into one tool?

Speed and simplicity mostly. The current division of labour means you can use 
optimized algorithms for each part while reducing overall complexity.

While writing yacc rules, you don't have to think about the representation of 
an integer number (was it '12345' or '0x3039' or '030071' or 
'B11000000111001'?).
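A sketch of what that separation buys you: a hypothetical lexer helper that 
collapses every spelling into one plain integer (the 'B...' binary notation 
here is just the nonstandard spelling from the example above), so the grammar 
only ever sees a single NUMBER token.

```python
def number_value(text):
    # Hypothetical helper: normalize every spelling to one int.
    if text.startswith('B'):            # 'B...' binary spelling
        return int(text[1:], 2)
    if text.lower().startswith('0x'):   # hexadecimal
        return int(text, 16)
    if text.startswith('0') and len(text) > 1:  # C-style octal
        return int(text, 8)
    return int(text)                    # plain decimal

for spelling in ('12345', '0x3039', '030071', 'B11000000111001'):
    print(spelling, '->', number_value(spelling))
# all four print 12345
```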


Nonetheless, parsers that combine both phases do exist. One example is 
'sglr' (a C-based solution; I am not aware of any Python-based solutions). It 
uses much more powerful (and much more resource-hungry) algorithms to 
recognize the text.

Their power is useful in places where the lex/yacc algorithms are not enough, 
e.g. with ambiguous grammars and/or textual input with several languages 
combined (e.g. Cobol with embedded SQL).

Albert



--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"ply-hack" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/ply-hack?hl=en
-~----------~----~----~----~------~----~------~--~---
