On 12.03.2012 17:45, bls wrote:
On 03/13/2012 04:28 AM, Dmitry Olshansky wrote:
On 12.03.2012 16:43, bls wrote:
On 03/10/2012 03:28 PM, Philippe Sigaud wrote:
Hello,

I created a new Github project, Pegged, a Parsing Expression Grammar
(PEG) generator in D.

https://github.com/PhilippeSigaud/Pegged

docs: https://github.com/PhilippeSigaud/Pegged/wiki

Just WOW!

Nice to have on your WIKI would be a EBNF to PEG sheet.

Wirth EBNF Pegged
A = BC. A <- B C
A = B|C. A <- C / C

Maybe A <- B / C. And even then it's not exactly equivalent if the
grammar was ambiguous.
Imagine: B <- a, C <- aa
PEG is pretty new to me. Can you elaborate a bit ?

PEG defines order of alternatives, that is pretty much like a top-down recursive descent parser would parse it. Alternatives are tried from left to right, if first one fails, it tries next and so on. In an example I give B is always picked first and so C is never ever looked at.

Somewhat less artificial example:
Literal <- IntL| FloatL
FloatL <- [0-9]+(.[0-9]+)?
IntL <- [0-9]+

If you change it to: Literal <- FloatL| IntL then integer literals would get parsed as floating point.



My mistake.. cleaned up stuff..

Pegged Wirth EBNF

Sequence
A <- B C A = BC.

B or C
A <- B / C A = B|C.

Zero or one B
A <- B? A = [B].

Zero or more Bs
A <- B* A = {B}.

One or more Bs
A <- B+ Not available

PEG description of EBNF

EBNF <- Procuction+
Production <- Identifier '=' Expression '.'
Expression <- Term ( '|' Term)*
Term <- Factor Factor*
Factor <- Identifier / Literal / '[' Expression ']' / '{' Expression '}'
/ '(' Expression ')'
lowerCase <- [a-z]
upperCase <- [A-Z]
Identifier <- (lowerCase / upperCase) (lowerCase / upperCase)*

Why not:
Identifier <- [a-zA-Z]+

Literal <- ("'" .+ "'") / ('"' .+ '"')

This needs escaping. Plain '.+' in pattern asks for trouble 99% of time.

Still not sure if this is correct. Especially :
Term <- Factor Factor*


Another thing I never really understand is the "production" order, In
other words : Why not top down ..
Start :
lowerCase <- [a-z]
upperCase <- [A-Z]
Identifier <- (lowerCase / upperCase) (lowerCase / upperCase)*

....

End :
EBNF <- Procuction+

where End is Root..

In fact grammars are usually devised the other way around, e.g.
Start:
 Program -> ...
Ehm... what the whole program is exactly ? Ok, let it be Declaration* for now. What kind of declarations do we have? ... and so on. Latter grammars get tweaked and extended numerous times.

At any rate production order has no effect on the grammar, it's still the same. The only thing of importance is what non-terminal considered final (or start if you are LL-centric).


TIA, Bjoern


--
Dmitry Olshansky

Reply via email to