Hi Albert,

Nice to read your answer:)  Perhaps I shall read some introduction
material on compiler/regular expression parser design, is there any
recommendation?

Best regards,
Davy

On Aug 21, 4:18 pm, "A.T.Hofkamp" <[EMAIL PROTECTED]> wrote:
> Davy wrote:
> > Hi David,
>
> > Thank you for the generous help :)
> > As far as I can see, Lex and Yacc do the same thing on different
> > abstraction layer.
> > That is, Lex rule is regular expression of character, while Yacc rule
> > (if we use EBNF) is regular expression of token. The only difference
> > is Yacc rule embrace recursion.
>
> True, if you forget about the white space that you silently throw away.
>
> Below I wrote some background information with some terms you can use as
> starting point for searching more information.
>
> The lack of recursion in lex makes it possible to generate a DFA
> (deterministic finite automaton) from the regular expression(s), which allows
> for *very fast* processing of the characters.
>
> The yacc parser on the other hand is slower, and more powerful than regular
> expressions.
>
> (Note that in compiler theory, 'regular expressions with recursion' don't
> exist, such grammars fall in the Chomsky hierarchy of contex-free languages.
> For such languages, you need to have a stack besides the automaton.)
>
> This additional power allows you to recognize nested structures correctly, for
> example to find the matching closing brakcet in an expression.
>
> The LALR(1) algorithm used by yacc is a compromise between maximal recognition
> power (enough for most languages), and not too many states (although that is
> of less importance nowadays).
>
> > I guess Lex parser is only a special case of Yacc parser (Lex output
> > is list but not recursively structure like tree). So why not merge
> > them into one tool?
>
> Speed and simplicity mostly. The current division of labour means you can use
> optimized algorithms for each part while reducing overall complexity.
>
> While writing  yacc rules, you don't have to think about the representation of
> an integer number (was it '12345' or '0x3039' or '030071' or 
> 'B11000000111001'?).
>
> None the less, parsers that combine both phases do exist. One example is
> 'sglr' (a C-based solution, I am not aware of any Python-based solutions). It
> uses much more powerful (and much more resource-eating) algorithms to
> recognize the text.
>
> Their power is useful at places where the lex/yacc algorithms are not enough,
> eg with ambigious grammars and/or textual input with several languages
> combined (eg Cobol with embedded SQL).
>
> Albert
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"ply-hack" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/ply-hack?hl=en
-~----------~----~----~----~------~----~------~--~---

Reply via email to