Re: Pegged, From EBNF to PEG

Dmitry Olshansky Tue, 13 Mar 2012 09:20:06 -0700

On 12.03.2012 17:45, bls wrote:

On 03/13/2012 04:28 AM, Dmitry Olshansky wrote:

On 12.03.2012 16:43, bls wrote:

On 03/10/2012 03:28 PM, Philippe Sigaud wrote:

Hello,


I created a new Github project, Pegged, a Parsing Expression Grammar
(PEG) generator in D.

https://github.com/PhilippeSigaud/Pegged

docs: https://github.com/PhilippeSigaud/Pegged/wiki


Just WOW!

Nice to have on your wiki would be an EBNF to PEG sheet.

Wirth EBNF      Pegged
A = BC.         A <- B C
A = B|C.        A <- C / C


Maybe A <- B / C. And even then it's not exactly equivalent if the
grammar was ambiguous.
Imagine: B <- a, C <- aa

PEG is pretty new to me. Can you elaborate a bit ?

PEG defines the order of alternatives, which is pretty much how a top-down recursive descent parser would parse it. Alternatives are tried from left to right: if the first one fails, it tries the next, and so on. In the example I gave, B is always picked first, so C is never ever looked at.
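To make the B/C example concrete, here is a minimal sketch of ordered choice in Python. The helper names `lit` and `choice` are made up for this illustration and are not part of Pegged; a parser here is just a function from (text, position) to the new position, or None on failure.

```python
from typing import Callable, Optional

# A parser takes (text, position) and returns the new position, or None on failure.
Parser = Callable[[str, int], Optional[int]]

def lit(s: str) -> Parser:
    """Match the literal string s at the current position."""
    def parse(text: str, pos: int) -> Optional[int]:
        return pos + len(s) if text.startswith(s, pos) else None
    return parse

def choice(*alts: Parser) -> Parser:
    """PEG ordered choice: commit to the first alternative that succeeds."""
    def parse(text: str, pos: int) -> Optional[int]:
        for alt in alts:
            end = alt(text, pos)
            if end is not None:
                return end          # later alternatives are never tried
        return None
    return parse

B = lit("a")
C = lit("aa")
A = choice(B, C)                    # A <- B / C

print(A("aa", 0))                   # 1: B matched one 'a', C was never tried
print(choice(C, B)("aa", 0))        # 2: with C first, both characters are consumed
```

With `A <- B / C` only the first `a` of the input `aa` is consumed, which is why the translation from `A = B|C.` is not exactly equivalent for an ambiguous grammar.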


Somewhat less artificial example:
Literal <- IntL / FloatL
FloatL <- [0-9]+ ('.' [0-9]+)?
IntL <- [0-9]+

If you change it to: Literal <- FloatL / IntL then integer literals would get parsed as floating point.
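The effect of swapping the two alternatives can be checked with a small sketch. Python regular expressions stand in for the PEG terminals here, and the `literal` helper is hypothetical; it only mimics ordered choice by trying the named alternatives in the order given.

```python
import re

# Regex stand-ins for the two terminals of the example grammar.
INT_L = re.compile(r"[0-9]+")
FLOAT_L = re.compile(r"[0-9]+(\.[0-9]+)?")

def literal(text, alternatives):
    """Try the alternatives in order; return (rule name, matched text) of the first hit."""
    for name, pattern in alternatives:
        m = pattern.match(text)
        if m:
            return name, m.group(0)
    return None

int_first = [("IntL", INT_L), ("FloatL", FLOAT_L)]      # Literal <- IntL / FloatL
float_first = [("FloatL", FLOAT_L), ("IntL", INT_L)]    # Literal <- FloatL / IntL

print(literal("3.14", int_first))    # ('IntL', '3'): FloatL is never reached
print(literal("3.14", float_first))  # ('FloatL', '3.14')
print(literal("42", float_first))    # ('FloatL', '42'): an integer tagged as a float
```

One possible way out, not taken from the thread, is to make the fractional part of FloatL mandatory and list FloatL first, so that plain integers fall through to IntL.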


My mistake.. cleaned up stuff..

                    Pegged          Wirth EBNF

Sequence            A <- B C        A = BC.
B or C              A <- B / C      A = B|C.
Zero or one B       A <- B?         A = [B].
Zero or more Bs     A <- B*         A = {B}.
One or more Bs      A <- B+         Not available

PEG description of EBNF

EBNF <- Production+
Production <- Identifier '=' Expression '.'
Expression <- Term ( '|' Term)*
Term <- Factor Factor*
Factor <- Identifier / Literal / '[' Expression ']' / '{' Expression '}'
/ '(' Expression ')'
lowerCase <- [a-z]
upperCase <- [A-Z]
Identifier <- (lowerCase / upperCase) (lowerCase / upperCase)*


Why not:
Identifier <- [a-zA-Z]+

Literal <- ("'" .+ "'") / ('"' .+ '"')

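This needs escaping. A plain '.+' in a pattern asks for trouble 99% of the time: PEG repetition is greedy and never backtracks, so '.+' swallows the closing quote as well and the rule can never match.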
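A rough sketch of why the plain `.+` version fails, with both rules hand-translated into Python. The function names are hypothetical, and the guarded variant, which corresponds to something like `"'" (!"'" .)+ "'"`, is just one common PEG idiom rather than the fix proposed in this thread; it does not handle escape sequences.

```python
def any_plus(text, pos):
    """PEG `.+`: greedily consume every remaining character, no backtracking."""
    return len(text) if pos < len(text) else None

def naive_literal(text, pos=0):
    """Literal <- "'" .+ "'"  evaluated the way a PEG parser would."""
    if not text.startswith("'", pos):
        return None
    pos = any_plus(text, pos + 1)
    if pos is None or not text.startswith("'", pos):
        return None                 # .+ already ate the closing quote
    return pos + 1

def guarded_literal(text, pos=0):
    """Literal <- "'" (!"'" .)+ "'"  : stop consuming before a quote."""
    if not text.startswith("'", pos):
        return None
    start = pos + 1
    pos = start
    while pos < len(text) and text[pos] != "'":
        pos += 1
    if pos == start or not text.startswith("'", pos):
        return None                 # empty body or missing closing quote
    return pos + 1

print(naive_literal("'abc'"))         # None
print(guarded_literal("'abc'"))       # 5
print(guarded_literal("'abc' rest"))  # 5
```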

Still not sure if this is correct. Especially :
Term <- Factor Factor*


Another thing I never really understood is the "production" order. In
other words: why not top down ..
Start :
lowerCase <- [a-z]
upperCase <- [A-Z]
Identifier <- (lowerCase / upperCase) (lowerCase / upperCase)*

....

End :
EBNF <- Production+

where End is Root..


In fact grammars are usually devised the other way around, e.g.
Start:
 Program -> ...

Ehm... what exactly is the whole program? OK, let it be Declaration* for now. What kind of declarations do we have? ... and so on. Later, grammars get tweaked and extended numerous times.

At any rate, production order has no effect on the grammar; it's still the same. The only thing of importance is which non-terminal is considered final (or start, if you are LL-centric).
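The point about production order can be illustrated with a tiny interpreter in which a grammar is a dictionary from rule names to expressions. Everything below is a sketch: the tuple encoding, the `parse` function and the bodies of `Declaration` and `Name` are invented for the example; only `Program <- Declaration*` comes from the text above.

```python
def parse(expr, rules, text, pos):
    """Return the end position of a match, or None. expr is a nested tuple."""
    kind = expr[0]
    if kind == "lit":
        return pos + len(expr[1]) if text.startswith(expr[1], pos) else None
    if kind == "ref":                       # rules are looked up by name
        return parse(rules[expr[1]], rules, text, pos)
    if kind == "seq":
        for sub in expr[1:]:
            pos = parse(sub, rules, text, pos)
            if pos is None:
                return None
        return pos
    if kind == "star":                      # zero or more, greedy
        while True:
            nxt = parse(expr[1], rules, text, pos)
            if nxt is None or nxt == pos:
                return pos
            pos = nxt
    raise ValueError(f"unknown expression kind: {kind}")

program = ("star", ("ref", "Declaration"))                           # Program <- Declaration*
declaration = ("seq", ("lit", "int "), ("ref", "Name"), ("lit", ";"))  # hypothetical
name = ("lit", "x")                                                   # hypothetical

# Same productions, written root-first and root-last.
root_first = {"Program": program, "Declaration": declaration, "Name": name}
root_last = {"Name": name, "Declaration": declaration, "Program": program}

text = "int x;int x;"
print(parse(("ref", "Program"), root_first, text, 0))      # 12
print(parse(("ref", "Program"), root_last, text, 0))       # 12
print(parse(("ref", "Declaration"), root_first, text, 0))  # 6
```

Reordering the dictionary changes nothing, while choosing a different start rule does.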


TIA, Bjoern



--
Dmitry Olshansky
