Re: Pegged, a Parsing Expression Grammar (PEG) generator in D

2013-08-08 Thread Tom Compton

On Sunday, 11 March 2012 at 12:40:34 UTC, Alex Rønne Petersen
wrote:

Question: Are the generated parsers, AST nodes, etc classes or 
structs?


Why is this important?


Re: Pegged, a Parsing Expression Grammar (PEG) generator in D

2013-08-07 Thread Elie Morisse

Hi Philippe,

I wrote a Kate(the official text editor of KDE, also used in 
KDevelop) syntax highlighting file for Pegged:


http://www.homo-nebulus.fr/dlang/pegged.xml

Screenshot:
http://imgur.com/sb3jFqs.png

Of course for the syntax highlighting to work the grammar must be 
in a separate file and included like this in a .d file:


  mixin(grammar(import("Tales.pgd")));


Re: Pegged, a Parsing Expression Grammar (PEG) generator in D

2012-03-22 Thread Martin Nowak
On Sun, 11 Mar 2012 00:28:42 +0100, Philippe Sigaud  
 wrote:



Hello,

I created a new Github project, Pegged, a Parsing Expression Grammar  
(PEG) generator in D.


https://github.com/PhilippeSigaud/Pegged

docs: https://github.com/PhilippeSigaud/Pegged/wiki

PEG: http://en.wikipedia.org/wiki/Parsing_expression_grammar

The idea is to give the generator a PEG with the standard syntax.  From  
this grammar definition, a set of related parsers will be created, to be  
used at runtime or compile time.




Just wanted to draw your attention on a pull request of mine.
https://github.com/D-Programming-Language/dmd/pull/426

Using an added option (-M|Mf|Md) dmd will gather all mixin
strings into a file. This is very helpful for better error
messages and debugging.


Re: Pegged, a Parsing Expression Grammar (PEG) generator in D

2012-03-15 Thread Timon Gehr

On 03/15/2012 12:09 AM, Jay Norwood wrote:

On Saturday, 10 March 2012 at 23:28:42 UTC, Philippe Sigaud wrote:

* Grammars can be dumped in a file to create a D module.




In reading the D spec I've seen a few instance where there are infered
items such as auto for variables and various parameters in templates. I
presume your D grammar will have to have some rules to be able to infer
these as well.

It seems like it would not be a big step to output what are the infered
proper statements as the .capture output. Is that correct? I think that
would be helpful in some cases to be able to view the fully expanded
expression.


This is not what a parser does. Resolving symbols and assigning types to 
expressions is part of semantic analysis.


Re: Pegged, a Parsing Expression Grammar (PEG) generator in D

2012-03-14 Thread Jay Norwood

On Saturday, 10 March 2012 at 23:28:42 UTC, Philippe Sigaud wrote:

* Grammars can be dumped in a file to create a D module.




 In reading the D spec I've seen a few instance where there are 
infered items such as auto for variables and various parameters 
in templates.  I presume your D grammar will have to have some 
rules to be able to infer these as well.


It seems like it would not be a big step to output what are the 
infered proper statements as the .capture output.  Is that 
correct?  I think that would be helpful in some cases to be able 
to view the fully expanded expression.


Re: Pegged, a Parsing Expression Grammar (PEG) generator in D

2012-03-13 Thread Tobias Pankrath
I am impressed. That's a really nice showcase for the D compile 
time features.


Can I use PEG to parse languages like python and haskell where 
indention matters without preprocessing?


Will you make it work with input ranges of dchar? So that I can 
easily plug in some preprocessing steps?





Re: Pegged, a Parsing Expression Grammar (PEG) generator in D

2012-03-12 Thread Jay Norwood

On Tuesday, 13 March 2012 at 05:25:38 UTC, Jay Norwood wrote:






Admittedly I have not heard of PEGs before, so I'm curious: Is 
this powerful enough to parse a language such as C?


I've just read a few articles referenced from this page, and 
the second link was by someone who had done java 1.5, the 
second link

http://bford.info/packrat/
http://www.romanredz.se/papers/FI2007.pdf


Also in the later paper he did a C parser, so I suppose that is 
the answer ...


http://www.romanredz.se/papers/FI2008.pdf


Re: Pegged, a Parsing Expression Grammar (PEG) generator in D

2012-03-12 Thread Jay Norwood






Admittedly I have not heard of PEGs before, so I'm curious: Is 
this powerful enough to parse a language such as C?


I've just read a few articles referenced from this page, and the 
second link was by someone who had done java 1.5, the second link

http://bford.info/packrat/
http://www.romanredz.se/papers/FI2007.pdf

It is interesting but that article left me with some questions 
about the implementation in order to make it useful for my needs.


I had done an experiment with mvel expression evaluation in java 
and gotten good improvements relative to homebrew expression 
evaluators.


However, the mvel expressions are missing the ability to express 
array operations clearly, which is something that is very clear 
in D, and my particular need is to enable the user to express 
array operations.


With this pegged embedded parser, it appears to me you could 
provide a group of  context symbols as part of a language 
definition, similar to providing a list of reserved words, so 
that the parsing of the user's expression would also validate the 
symbols used.


Also, I've been reading David Simcha's parallel_algorithm.d, here:
https://github.com/dsimcha/parallel_algorithm/blob/master/parallel_algorithm.d
and in the parallelArrayOp portion, he has presented a way to 
turn the D array expressions into code that is executed in 
parallel on multicore systems. That is something I'd want to use, 
but the manual lexing requirement is a bit clunky, and the rules 
are unclear to me,  so it seems to me a combination with the 
pegged parser could make that more accessible.

Thanks,
Jay








Re: Pegged, a Parsing Expression Grammar (PEG) generator in D

2012-03-11 Thread Alex Rønne Petersen

On 12-03-2012 02:27, Alex Rønne Petersen wrote:

On 11-03-2012 00:28, Philippe Sigaud wrote:

Hello,

I created a new Github project, Pegged, a Parsing Expression Grammar
(PEG) generator in D.

https://github.com/PhilippeSigaud/Pegged

docs: https://github.com/PhilippeSigaud/Pegged/wiki

PEG: http://en.wikipedia.org/wiki/Parsing_expression_grammar

The idea is to give the generator a PEG with the standard syntax. From
this grammar definition, a set of related parsers will be created, to be
used at runtime or compile time.

Usage
-

To use Pegged, just call the `grammar` function with a PEG and mix it
in. For example:


import pegged.grammar;

mixin(grammar("
Expr <- Factor AddExpr*
AddExpr <- ('+'/'-') Factor
Factor <- Primary MulExpr*
MulExpr <- ('*'/'/') Primary
Primary <- Parens / Number / Variable / '-' Primary

Parens <- '(' Expr ')'
Number <~ [0-9]+
Variable <- Identifier
"));



This creates the `Expr`, `AddExpr`, `Factor` (and so on) parsers for
basic arithmetic expressions with operator precedence ('*' and '/' bind
stronger than '+' or '-'). `Identifier` is a pre-defined parser
recognizing your basic C-style identifier. Recursive or mutually
recursive rules are OK (no left recursion for now).

To use a parser, use the `.parse` method. It will return a parse tree
containing the calls to the different rules:

// Parsing at compile-time:
enum parseTree1 = Expr.parse("1 + 2 - (3*x-5)*6");

pragma(msg, parseTree1.capture);
writeln(parseTree1);

// And at runtime too:
auto parseTree2 = Expr.parse(" 0 + 123 - 456 ");
assert(parseTree2.capture == ["0", "+", "123", "-", "456"]);



Features


* The complete set of PEG operators are implemented
* Pegged can parse its input at compile time and generate a complete
parse tree at compile time. In a word: compile-time string (read: D
code) transformation and generation.
* You can parse at runtime also, you lucky you.
* Use a standard and readable PEG syntax as a DSL, not a bunch of
templates that hide the parser in noise.
* But you can use expression templates if you want, as parsers are all
available as such. Pegged is implemented as an expression template, and
what's good for the library writer is sure OK for the user too.
* Some useful additional operators are there too: a way to discard
matches (thus dumping them from the parse tree), to push captures on a
stack, to accept matches that are equal to another match
* Adding new parsers is easy.
* Grammars are composable: you can put different
`mixin(grammar(rules));` in a module and then grammars and rules can
refer to one another. That way, you can have utility grammars providing
their functionalities to other grammars.
* That's why Pegged comes with some pre-defined grammars (JSON, etc).
* Grammars can be dumped in a file to create a D module.

More advanced features, outside the standard PEG perimeter are there to
bring more power in the mix:

* Parametrized rules: `List(E, Sep) <- E (Sep E)*` is possible. The
previous rule defines a parametrized parser taking two other parsers
(namely, `E` and `Sep`) to match a `Sep`-separated list of `E`'s.
* Named captures: any parser can be named with the `=` operator. The
parse tree generated by the parser (so, also its matches) is delivered
to the user in the output. Other parsers in the grammar see the named
captures too.
* Semantic actions can be added to any rule in a grammar. Once a rule
has matched, its associated action is called on the rule output and
passed as final result to other parsers further up the grammar. Do what
you want to the parse tree. If the passed actions are delegates, they
can access external variables.


Philippe



Hm, since ' is used in the grammar of Pegged, how do I express it in my
grammar spec? Is there a predefined rule for it, or is \' supposed to work?



Ah, Quote it is. I also noticed there is DoubleQuote. Is " used for 
anything in the Pegged grammar?


--
- Alex


Re: Pegged, a Parsing Expression Grammar (PEG) generator in D

2012-03-11 Thread Alex Rønne Petersen

On 11-03-2012 00:28, Philippe Sigaud wrote:

Hello,

I created a new Github project, Pegged, a Parsing Expression Grammar
(PEG) generator in D.

https://github.com/PhilippeSigaud/Pegged

docs: https://github.com/PhilippeSigaud/Pegged/wiki

PEG: http://en.wikipedia.org/wiki/Parsing_expression_grammar

The idea is to give the generator a PEG with the standard syntax. From
this grammar definition, a set of related parsers will be created, to be
used at runtime or compile time.

Usage
-

To use Pegged, just call the `grammar` function with a PEG and mix it
in. For example:


import pegged.grammar;

mixin(grammar("
Expr <- Factor AddExpr*
AddExpr <- ('+'/'-') Factor
Factor <- Primary MulExpr*
MulExpr <- ('*'/'/') Primary
Primary <- Parens / Number / Variable / '-' Primary

Parens <- '(' Expr ')'
Number <~ [0-9]+
Variable <- Identifier
"));



This creates the `Expr`, `AddExpr`, `Factor` (and so on) parsers for
basic arithmetic expressions with operator precedence ('*' and '/' bind
stronger than '+' or '-'). `Identifier` is a pre-defined parser
recognizing your basic C-style identifier. Recursive or mutually
recursive rules are OK (no left recursion for now).

To use a parser, use the `.parse` method. It will return a parse tree
containing the calls to the different rules:

// Parsing at compile-time:
enum parseTree1 = Expr.parse("1 + 2 - (3*x-5)*6");

pragma(msg, parseTree1.capture);
writeln(parseTree1);

// And at runtime too:
auto parseTree2 = Expr.parse(" 0 + 123 - 456 ");
assert(parseTree2.capture == ["0", "+", "123", "-", "456"]);



Features


* The complete set of PEG operators are implemented
* Pegged can parse its input at compile time and generate a complete
parse tree at compile time. In a word: compile-time string (read: D
code) transformation and generation.
* You can parse at runtime also, you lucky you.
* Use a standard and readable PEG syntax as a DSL, not a bunch of
templates that hide the parser in noise.
* But you can use expression templates if you want, as parsers are all
available as such. Pegged is implemented as an expression template, and
what's good for the library writer is sure OK for the user too.
* Some useful additional operators are there too: a way to discard
matches (thus dumping them from the parse tree), to push captures on a
stack, to accept matches that are equal to another match
* Adding new parsers is easy.
* Grammars are composable: you can put different
`mixin(grammar(rules));` in a module and then grammars and rules can
refer to one another. That way, you can have utility grammars providing
their functionalities to other grammars.
* That's why Pegged comes with some pre-defined grammars (JSON, etc).
* Grammars can be dumped in a file to create a D module.

More advanced features, outside the standard PEG perimeter are there to
bring more power in the mix:

* Parametrized rules: `List(E, Sep) <- E (Sep E)*` is possible. The
previous rule defines a parametrized parser taking two other parsers
(namely, `E` and `Sep`) to match a `Sep`-separated list of `E`'s.
* Named captures: any parser can be named with the `=` operator. The
parse tree generated by the parser (so, also its matches) is delivered
to the user in the output. Other parsers in the grammar see the named
captures too.
* Semantic actions can be added to any rule in a grammar. Once a rule
has matched, its associated action is called on the rule output and
passed as final result to other parsers further up the grammar. Do what
you want to the parse tree. If the passed actions are delegates, they
can access external variables.


Philippe



Hm, since ' is used in the grammar of Pegged, how do I express it in my 
grammar spec? Is there a predefined rule for it, or is \' supposed to work?


--
- Alex


Re: Pegged, a Parsing Expression Grammar (PEG) generator in D

2012-03-11 Thread Andrei Alexandrescu

On 3/11/12 3:02 AM, Philippe Sigaud wrote:

There is an operator to drop unnecessary nodes (':').


Good.


Apart from that,
a Pegged grammar is a self-contained entity: it automatically cuts
nodes coming from other grammars, to simplify the tree (it keeps the
matcheds substrings, of course). I plan to code different type of
grammars, to play with the way the deal with nodes coming from outside
rules and grammar composition.


Not getting this, but picture me with an intense look and nodding.


As for the root, the PEG rule is that the first rule in the grammar is
the root. But currently, I deliberately made my grammars unsealed, in
that the user can call any rule inside the grammar to begin parsing:

mixin(grammar("
 A<- B C
 B<- ...
 C<-
"));

A.parse("some input"); // standard way to do it, the parse tree will
be rooted on an 'A' node.
B.parse("some input"); // also possible, the parse tree will be rooted
on a 'B' node.


That sounds cool, but how do you say that in the construct

Expression <- Term "+" Term

the "+" should become the root?


Andrei


Re: Pegged, a Parsing Expression Grammar (PEG) generator in D

2012-03-11 Thread Alex Rønne Petersen

On 11-03-2012 18:17, Philippe Sigaud wrote:

 > By the way, bootstrap.d seems to fail to build at the moment:
 >
 > ../pegged/utils/bootstrap.d(1433): found ':' when expecting ')'
following template argument list
 > ../pegged/utils/bootstrap.d(1433): members expected
 > ../pegged/utils/bootstrap.d(1433): { } expected following aggregate
declaration
 > ../pegged/utils/bootstrap.d(1433): semicolon expected, not '!'
 > ../pegged/utils/bootstrap.d(1433): Declaration expected, not '!'
 > ../pegged/utils/bootstrap.d(1466): unrecognized declaration

Hmm, it compiled for me a few hours ago. I'll see if I broke something
while pushing.

I'll also try to make the whole grammar-modification process easier.
Since users can modify Pegged own grammar, I might as well make that
fluid and easy to do.

I'll put the Pegged grammar as a string in a separate module and create
a function that does the rest: modify the string, it will recompile the
entire grammar for you.



Is bootstrap.d currently essential to actually use Pegged?

--
- Alex


Re: Pegged, a Parsing Expression Grammar (PEG) generator in D

2012-03-11 Thread Alex Rønne Petersen

On 11-03-2012 18:19, Philippe Sigaud wrote:

 > Hm, I don't *think* C has such ambiguities but I could well be wrong.
In any case, if it can handle the non-ambiguous case, that's enough for me.

I wanted to tackle D this week, but I might as well begin with C :)

Do you happen to have any handy and readable EBNF grammar for C? At
least for D, I have dlang.org .



Yep, see: http://ssw.jku.at/Coco/ (C.atg)

Coco/R is more or less EBNF.

--
- Alex


Re: Pegged, a Parsing Expression Grammar (PEG) generator in D

2012-03-11 Thread Philippe Sigaud
> Hm, I don't *think* C has such ambiguities but I could well be wrong. In
any case, if it can handle the non-ambiguous case, that's enough for me.

I wanted to tackle D this week, but I might as well begin with C :)

Do you happen to have any handy and readable EBNF grammar for C? At least
for D, I have dlang.org.


Re: Pegged, a Parsing Expression Grammar (PEG) generator in D

2012-03-11 Thread Philippe Sigaud
> By the way, bootstrap.d seems to fail to build at the moment:
>
> ../pegged/utils/bootstrap.d(1433): found ':' when expecting ')' following
template argument list
> ../pegged/utils/bootstrap.d(1433): members expected
> ../pegged/utils/bootstrap.d(1433): { } expected following aggregate
declaration
> ../pegged/utils/bootstrap.d(1433): semicolon expected, not '!'
> ../pegged/utils/bootstrap.d(1433): Declaration expected, not '!'
> ../pegged/utils/bootstrap.d(1466): unrecognized declaration

Hmm, it compiled for me a few hours ago. I'll see if I broke something
while pushing.

I'll also try to make the whole grammar-modification process easier. Since
users can modify Pegged own grammar, I might as well make that fluid and
easy to do.

I'll put the Pegged grammar as a string in a separate module and create a
function that does the rest: modify the string, it will recompile the
entire grammar for you.


Re: Pegged, a Parsing Expression Grammar (PEG) generator in D

2012-03-11 Thread Alex Rønne Petersen

On 11-03-2012 18:06, Philippe Sigaud wrote:

 >> On Sun, Mar 11, 2012 at 00:34, Alex Rønne
Petersenmailto:xtzgzo...@gmail.com>>  wrote:

[Parsing C?]
 >> I think so. But you'd have to do add some semantic action to deal with
 >> typedefs and macros.
 >
 >
 > Oh, I should have mentioned I only meant the actual language
(ignoring the preprocessor).

OK. I admit I downloaded the C spec online, but was a bit taken aback by
the size of it. mot of it was the definition of the standard library,
but still...

 > Why do you need semantic actions for typedefs though? Can't you defer
resolution of types until after parsing?

Yes, that the way I'd do it. But some people seem to want to do it while
parsing. Maybe it blocks some parsing, if the parser encounter an
identifier where there should be a type?



Hm, I don't *think* C has such ambiguities but I could well be wrong. In 
any case, if it can handle the non-ambiguous case, that's enough for me. :)


--
- Alex


Re: Pegged, a Parsing Expression Grammar (PEG) generator in D

2012-03-11 Thread Philippe Sigaud
> Also, I have sent a pull request to fix the build on 64-bit:
https://github.com/PhilippeSigaud/Pegged/pull/1

Merged, thanks!


Re: Pegged, a Parsing Expression Grammar (PEG) generator in D

2012-03-11 Thread Philippe Sigaud
> Quick question, you mention the ability to opt-out of the
space-insensitivity, where might one find this?

Yes, undocumented. Use the '>' operator.

You know, I introduced space-insensitivity recently, to simplify some rules
and it keeps biting me back.

For example

Line <- (!EOL .)* EOL

The catch is, the (!EOL .) sequence accepts spaces (so, line terminators)
between the !EOL and the .

Crap.

So, I keep writing

Line <- (!EOL > .)* EOL

And I'm more and more convinced that ws a bbad move on my part. Or, at
least, give the user a way to opt-out for an entire rule.


Re: Pegged, a Parsing Expression Grammar (PEG) generator in D

2012-03-11 Thread Philippe Sigaud
alex:
> Question: Are the generated parsers, AST nodes, etc classes or structs?

They are structs. See:

https://github.com/PhilippeSigaud/Pegged/wiki/Parse-Trees


Re: Pegged, a Parsing Expression Grammar (PEG) generator in D

2012-03-11 Thread Philippe Sigaud
>> On Sun, Mar 11, 2012 at 00:34, Alex Rønne Petersen
 wrote:

[Parsing C?]
>> I think so. But you'd have to do add some semantic action to deal with
>> typedefs and macros.
>
>
> Oh, I should have mentioned I only meant the actual language (ignoring
the preprocessor).

OK. I admit I downloaded the C spec online, but was a bit taken aback by
the size of it. mot of it was the definition of the standard library, but
still...

> Why do you need semantic actions for typedefs though? Can't you defer
resolution of types until after parsing?

Yes, that the way I'd do it. But some people seem to want to do it while
parsing. Maybe it blocks some parsing, if the parser encounter an
identifier where there should be a type?


Re: Pegged, a Parsing Expression Grammar (PEG) generator in D

2012-03-11 Thread Alex Rønne Petersen

On 11-03-2012 16:02, Alex Rønne Petersen wrote:

On 11-03-2012 00:28, Philippe Sigaud wrote:

Hello,

I created a new Github project, Pegged, a Parsing Expression Grammar
(PEG) generator in D.

https://github.com/PhilippeSigaud/Pegged

docs: https://github.com/PhilippeSigaud/Pegged/wiki

PEG: http://en.wikipedia.org/wiki/Parsing_expression_grammar

The idea is to give the generator a PEG with the standard syntax. From
this grammar definition, a set of related parsers will be created, to be
used at runtime or compile time.

Usage
-

To use Pegged, just call the `grammar` function with a PEG and mix it
in. For example:


import pegged.grammar;

mixin(grammar("
Expr <- Factor AddExpr*
AddExpr <- ('+'/'-') Factor
Factor <- Primary MulExpr*
MulExpr <- ('*'/'/') Primary
Primary <- Parens / Number / Variable / '-' Primary

Parens <- '(' Expr ')'
Number <~ [0-9]+
Variable <- Identifier
"));



This creates the `Expr`, `AddExpr`, `Factor` (and so on) parsers for
basic arithmetic expressions with operator precedence ('*' and '/' bind
stronger than '+' or '-'). `Identifier` is a pre-defined parser
recognizing your basic C-style identifier. Recursive or mutually
recursive rules are OK (no left recursion for now).

To use a parser, use the `.parse` method. It will return a parse tree
containing the calls to the different rules:

// Parsing at compile-time:
enum parseTree1 = Expr.parse("1 + 2 - (3*x-5)*6");

pragma(msg, parseTree1.capture);
writeln(parseTree1);

// And at runtime too:
auto parseTree2 = Expr.parse(" 0 + 123 - 456 ");
assert(parseTree2.capture == ["0", "+", "123", "-", "456"]);



Features


* The complete set of PEG operators are implemented
* Pegged can parse its input at compile time and generate a complete
parse tree at compile time. In a word: compile-time string (read: D
code) transformation and generation.
* You can parse at runtime also, you lucky you.
* Use a standard and readable PEG syntax as a DSL, not a bunch of
templates that hide the parser in noise.
* But you can use expression templates if you want, as parsers are all
available as such. Pegged is implemented as an expression template, and
what's good for the library writer is sure OK for the user too.
* Some useful additional operators are there too: a way to discard
matches (thus dumping them from the parse tree), to push captures on a
stack, to accept matches that are equal to another match
* Adding new parsers is easy.
* Grammars are composable: you can put different
`mixin(grammar(rules));` in a module and then grammars and rules can
refer to one another. That way, you can have utility grammars providing
their functionalities to other grammars.
* That's why Pegged comes with some pre-defined grammars (JSON, etc).
* Grammars can be dumped in a file to create a D module.

More advanced features, outside the standard PEG perimeter are there to
bring more power in the mix:

* Parametrized rules: `List(E, Sep) <- E (Sep E)*` is possible. The
previous rule defines a parametrized parser taking two other parsers
(namely, `E` and `Sep`) to match a `Sep`-separated list of `E`'s.
* Named captures: any parser can be named with the `=` operator. The
parse tree generated by the parser (so, also its matches) is delivered
to the user in the output. Other parsers in the grammar see the named
captures too.
* Semantic actions can be added to any rule in a grammar. Once a rule
has matched, its associated action is called on the rule output and
passed as final result to other parsers further up the grammar. Do what
you want to the parse tree. If the passed actions are delegates, they
can access external variables.


Philippe



By the way, bootstrap.d seems to fail to build at the moment:

.../pegged/utils/bootstrap.d(1433): found ':' when expecting ')'
following template argument list
.../pegged/utils/bootstrap.d(1433): members expected
.../pegged/utils/bootstrap.d(1433): { } expected following aggregate
declaration
.../pegged/utils/bootstrap.d(1433): semicolon expected, not '!'
.../pegged/utils/bootstrap.d(1433): Declaration expected, not '!'
.../pegged/utils/bootstrap.d(1466): unrecognized declaration



Also, I have sent a pull request to fix the build on 64-bit: 
https://github.com/PhilippeSigaud/Pegged/pull/1


--
- Alex


Re: Pegged, a Parsing Expression Grammar (PEG) generator in D

2012-03-11 Thread Alex Rønne Petersen

On 11-03-2012 00:28, Philippe Sigaud wrote:

Hello,

I created a new Github project, Pegged, a Parsing Expression Grammar
(PEG) generator in D.

https://github.com/PhilippeSigaud/Pegged

docs: https://github.com/PhilippeSigaud/Pegged/wiki

PEG: http://en.wikipedia.org/wiki/Parsing_expression_grammar

The idea is to give the generator a PEG with the standard syntax. From
this grammar definition, a set of related parsers will be created, to be
used at runtime or compile time.

Usage
-

To use Pegged, just call the `grammar` function with a PEG and mix it
in. For example:


import pegged.grammar;

mixin(grammar("
Expr <- Factor AddExpr*
AddExpr <- ('+'/'-') Factor
Factor <- Primary MulExpr*
MulExpr <- ('*'/'/') Primary
Primary <- Parens / Number / Variable / '-' Primary

Parens <- '(' Expr ')'
Number <~ [0-9]+
Variable <- Identifier
"));



This creates the `Expr`, `AddExpr`, `Factor` (and so on) parsers for
basic arithmetic expressions with operator precedence ('*' and '/' bind
stronger than '+' or '-'). `Identifier` is a pre-defined parser
recognizing your basic C-style identifier. Recursive or mutually
recursive rules are OK (no left recursion for now).

To use a parser, use the `.parse` method. It will return a parse tree
containing the calls to the different rules:

// Parsing at compile-time:
enum parseTree1 = Expr.parse("1 + 2 - (3*x-5)*6");

pragma(msg, parseTree1.capture);
writeln(parseTree1);

// And at runtime too:
auto parseTree2 = Expr.parse(" 0 + 123 - 456 ");
assert(parseTree2.capture == ["0", "+", "123", "-", "456"]);



Features


* The complete set of PEG operators are implemented
* Pegged can parse its input at compile time and generate a complete
parse tree at compile time. In a word: compile-time string (read: D
code) transformation and generation.
* You can parse at runtime also, you lucky you.
* Use a standard and readable PEG syntax as a DSL, not a bunch of
templates that hide the parser in noise.
* But you can use expression templates if you want, as parsers are all
available as such. Pegged is implemented as an expression template, and
what's good for the library writer is sure OK for the user too.
* Some useful additional operators are there too: a way to discard
matches (thus dumping them from the parse tree), to push captures on a
stack, to accept matches that are equal to another match
* Adding new parsers is easy.
* Grammars are composable: you can put different
`mixin(grammar(rules));` in a module and then grammars and rules can
refer to one another. That way, you can have utility grammars providing
their functionalities to other grammars.
* That's why Pegged comes with some pre-defined grammars (JSON, etc).
* Grammars can be dumped in a file to create a D module.

More advanced features, outside the standard PEG perimeter are there to
bring more power in the mix:

* Parametrized rules: `List(E, Sep) <- E (Sep E)*` is possible. The
previous rule defines a parametrized parser taking two other parsers
(namely, `E` and `Sep`) to match a `Sep`-separated list of `E`'s.
* Named captures: any parser can be named with the `=` operator. The
parse tree generated by the parser (so, also its matches) is delivered
to the user in the output. Other parsers in the grammar see the named
captures too.
* Semantic actions can be added to any rule in a grammar. Once a rule
has matched, its associated action is called on the rule output and
passed as final result to other parsers further up the grammar. Do what
you want to the parse tree. If the passed actions are delegates, they
can access external variables.


Philippe



By the way, bootstrap.d seems to fail to build at the moment:

../pegged/utils/bootstrap.d(1433): found ':' when expecting ')' 
following template argument list

../pegged/utils/bootstrap.d(1433): members expected
../pegged/utils/bootstrap.d(1433): { } expected following aggregate 
declaration

../pegged/utils/bootstrap.d(1433): semicolon expected, not '!'
../pegged/utils/bootstrap.d(1433): Declaration expected, not '!'
../pegged/utils/bootstrap.d(1466): unrecognized declaration

--
- Alex


Re: Pegged, a Parsing Expression Grammar (PEG) generator in D

2012-03-11 Thread kraybourne

On 3/11/12 24:28 , Philippe Sigaud wrote:

Hello,

I created a new Github project, Pegged, a Parsing Expression Grammar
(PEG) generator in D.


Philippe



Very cool!

Quick question, you mention the ability to opt-out of the 
space-insensitivity, where might one find this?


Thanks!


Re: Pegged, a Parsing Expression Grammar (PEG) generator in D

2012-03-11 Thread Alex Rønne Petersen

On 11-03-2012 00:28, Philippe Sigaud wrote:

Hello,

I created a new Github project, Pegged, a Parsing Expression Grammar
(PEG) generator in D.

https://github.com/PhilippeSigaud/Pegged

docs: https://github.com/PhilippeSigaud/Pegged/wiki

PEG: http://en.wikipedia.org/wiki/Parsing_expression_grammar

The idea is to give the generator a PEG with the standard syntax. From
this grammar definition, a set of related parsers will be created, to be
used at runtime or compile time.

Usage
-

To use Pegged, just call the `grammar` function with a PEG and mix it
in. For example:


import pegged.grammar;

mixin(grammar("
Expr <- Factor AddExpr*
AddExpr <- ('+'/'-') Factor
Factor <- Primary MulExpr*
MulExpr <- ('*'/'/') Primary
Primary <- Parens / Number / Variable / '-' Primary

Parens <- '(' Expr ')'
Number <~ [0-9]+
Variable <- Identifier
"));



This creates the `Expr`, `AddExpr`, `Factor` (and so on) parsers for
basic arithmetic expressions with operator precedence ('*' and '/' bind
stronger than '+' or '-'). `Identifier` is a pre-defined parser
recognizing your basic C-style identifier. Recursive or mutually
recursive rules are OK (no left recursion for now).

To use a parser, use the `.parse` method. It will return a parse tree
containing the calls to the different rules:

// Parsing at compile-time:
enum parseTree1 = Expr.parse("1 + 2 - (3*x-5)*6");

pragma(msg, parseTree1.capture);
writeln(parseTree1);

// And at runtime too:
auto parseTree2 = Expr.parse(" 0 + 123 - 456 ");
assert(parseTree2.capture == ["0", "+", "123", "-", "456"]);



Features


* The complete set of PEG operators are implemented
* Pegged can parse its input at compile time and generate a complete
parse tree at compile time. In a word: compile-time string (read: D
code) transformation and generation.
* You can parse at runtime also, you lucky you.
* Use a standard and readable PEG syntax as a DSL, not a bunch of
templates that hide the parser in noise.
* But you can use expression templates if you want, as parsers are all
available as such. Pegged is implemented as an expression template, and
what's good for the library writer is sure OK for the user too.
* Some useful additional operators are there too: a way to discard
matches (thus dumping them from the parse tree), to push captures on a
stack, to accept matches that are equal to another match
* Adding new parsers is easy.
* Grammars are composable: you can put different
`mixin(grammar(rules));` in a module and then grammars and rules can
refer to one another. That way, you can have utility grammars providing
their functionalities to other grammars.
* That's why Pegged comes with some pre-defined grammars (JSON, etc).
* Grammars can be dumped in a file to create a D module.

More advanced features, outside the standard PEG perimeter are there to
bring more power in the mix:

* Parametrized rules: `List(E, Sep) <- E (Sep E)*` is possible. The
previous rule defines a parametrized parser taking two other parsers
(namely, `E` and `Sep`) to match a `Sep`-separated list of `E`'s.
* Named captures: any parser can be named with the `=` operator. The
parse tree generated by the parser (so, also its matches) is delivered
to the user in the output. Other parsers in the grammar see the named
captures too.
* Semantic actions can be added to any rule in a grammar. Once a rule
has matched, its associated action is called on the rule output and
passed as final result to other parsers further up the grammar. Do what
you want to the parse tree. If the passed actions are delegates, they
can access external variables.


Philippe



Question: Are the generated parsers, AST nodes, etc classes or structs?

--
- Alex


Re: Pegged, a Parsing Expression Grammar (PEG) generator in D

2012-03-11 Thread Alex Rønne Petersen

On 11-03-2012 08:22, Philippe Sigaud wrote:

On Sun, Mar 11, 2012 at 00:34, Alex Rønne Petersen  wrote:


Admittedly I have not heard of PEGs before, so I'm curious: Is this powerful
enough to parse a language such as C?


I think so. But you'd have to do add some semantic action to deal with
typedefs and macros.


Oh, I should have mentioned I only meant the actual language (ignoring 
the preprocessor).


Why do you need semantic actions for typedefs though? Can't you defer 
resolution of types until after parsing?




People parsed Java and Javascript with them. I personnally never used
Pegged for more than JSON and custom formats, but I intend to try the
D grammar.


--
- Alex


Re: Pegged, a Parsing Expression Grammar (PEG) generator in D

2012-03-11 Thread Philippe Sigaud
On Sun, Mar 11, 2012 at 08:51, Andrei Alexandrescu
 wrote:

> I was thinking of ANTLR-style operators in which you say where the root
> should be and you get to drop unnecessary nodes.
>
> (Post on 2012/02/29 10:19AM GMT-0600.)

Get it, thanks for the ref!

There is an operator to drop unnecessary nodes (':'). Apart from that,
a Pegged grammar is a self-contained entity: it automatically cuts
nodes coming from other grammars, to simplify the tree (it keeps the
matcheds substrings, of course). I plan to code different type of
grammars, to play with the way the deal with nodes coming from outside
rules and grammar composition.

As for the root, the PEG rule is that the first rule in the grammar is
the root. But currently, I deliberately made my grammars unsealed, in
that the user can call any rule inside the grammar to begin parsing:

mixin(grammar("
A <- B C
B <- ...
C <-
"));

A.parse("some input"); // standard way to do it, the parse tree will
be rooted on an 'A' node.
B.parse("some input"); // also possible, the parse tree will be rooted
on a 'B' node.

I'll add what I called closed and sealed grammars. I'll also add named
grammars and such in the days to come. I can add a root note
indication ("unless told otherwise, when asked to parse with grammar
G, begin with rule R).


Re: Pegged, a Parsing Expression Grammar (PEG) generator in D

2012-03-10 Thread Philippe Sigaud
On Sun, Mar 11, 2012 at 08:53, Philippe Sigaud
 wrote:

> Anyway, the separating if done on the rule definitions  (Identifier <- ...)

Aw, I meant, the separation is done on rule definitions. Damn children
climbing on me to get more breakfast :-)


Re: Pegged, a Parsing Expression Grammar (PEG) generator in D

2012-03-10 Thread Andrei Alexandrescu

On 3/11/12 1:38 AM, Philippe Sigaud wrote:

On Sun, Mar 11, 2012 at 08:26, Andrei Alexandrescu
  wrote:


Any chance you consider adding AST generator actions as discussed in the
main forum a while ago?


The AST is automatically produced, and there are already AST actions
to simplify / guide its creation. There already is an 'ignore that
node' syntax and a 'fuse the captures here'.
Also, semantic actions are there, so I think the basis is there, I
just need to add a few tree actions : cut children, take only some
children.

I'll add it today.

But maybe you had something else in mind? I'll go and search for the
thread you refer to.


I was thinking of ANTLR-style operators in which you say where the root 
should be and you get to drop unnecessary nodes.


(Post on 2012/02/29 10:19AM GMT-0600.)


Andrei


Re: Pegged, a Parsing Expression Grammar (PEG) generator in D

2012-03-10 Thread Philippe Sigaud
On Sun, Mar 11, 2012 at 08:47, Andrei Alexandrescu
 wrote:

> Great! I think you'd be wise to add the ";", otherwise it's impossible to
> split complex rules on more than one line. Alternatively you could have the
> complement, a line continuation thingie.

Complex rules can be multi-line, I'll put more examples of this in the docs.

Now that Pegged is bootstrapped (the grammar generator was generated
by Pegged itself), it's easy to change the grammar.

Anyway, the separating if done on the rule definitions  (Identifier <- ...)

A <- B   C
 / C
/D# // OK, cut between D and E.
E <- New Def

You can insert line comments with '#'


Re: Pegged, a Parsing Expression Grammar (PEG) generator in D

2012-03-10 Thread Philippe Sigaud
On Sun, Mar 11, 2012 at 08:46, Jonathan M Davis  wrote:

>> Is there an EBNF grammar for D somewhere, I mean outside the dlang docs?

> Someone may have been trying to write one, but I don't believe that there's an
> official one, and I recall someone complaining the the BNF grammar was
> inaccurate in a number of places, but that may have been fixed now, since
> Walter did a number of documentation fixes a few weeks back.

Hmm. I'll post on D.main and see.


Re: Pegged, a Parsing Expression Grammar (PEG) generator in D

2012-03-10 Thread Andrei Alexandrescu

On 3/11/12 1:35 AM, Philippe Sigaud wrote:

On Sun, Mar 11, 2012 at 08:26, Andrei Alexandrescu
  wrote:

Splitting on ";" is trivial and makes client code considerably easier to
play with.


It's already implemented! No need for ';'


Great! I think you'd be wise to add the ";", otherwise it's impossible 
to split complex rules on more than one line. Alternatively you could 
have the complement, a line continuation thingie.



Andrei


Re: Pegged, a Parsing Expression Grammar (PEG) generator in D

2012-03-10 Thread Jonathan M Davis
On Sunday, March 11, 2012 08:39:47 Philippe Sigaud wrote:
> On Sun, Mar 11, 2012 at 08:30, Jonathan M Davis  wrote:
> > On Sunday, March 11, 2012 00:28:42 Philippe Sigaud wrote:
> >> Hello,
> >> 
> >> I created a new Github project, Pegged, a Parsing Expression
> >> Grammar (PEG) generator in D.
> > 
> > Cool! PEGs aren't used all that much for language grammars - primarily
> > because BNF and EBNF are more traditional I think - but they're
> > definitely cool. And it's great to see a D implementation of a PEG
> > parser.
> 
> Is there an EBNF grammar for D somewhere, I mean outside the dlang docs?
> 
> I vaguely remember someone trying to write one.

Someone may have been trying to write one, but I don't believe that there's an 
official one, and I recall someone complaining the the BNF grammar was 
inaccurate in a number of places, but that may have been fixed now, since 
Walter did a number of documentation fixes a few weeks back.

- Jonathan M Davis


Re: Pegged, a Parsing Expression Grammar (PEG) generator in D

2012-03-10 Thread Philippe Sigaud
On Sun, Mar 11, 2012 at 08:30, Jonathan M Davis  wrote:
> On Sunday, March 11, 2012 00:28:42 Philippe Sigaud wrote:
>> Hello,
>>
>> I created a new Github project, Pegged, a Parsing Expression
>> Grammar (PEG) generator in D.
>
> Cool! PEGs aren't used all that much for language grammars - primarily because
> BNF and EBNF are more traditional I think - but they're definitely cool. And
> it's great to see a D implementation of a PEG parser.

Is there an EBNF grammar for D somewhere, I mean outside the dlang docs?

I vaguely remember someone trying to write one.


Re: Pegged, a Parsing Expression Grammar (PEG) generator in D

2012-03-10 Thread Philippe Sigaud
On Sun, Mar 11, 2012 at 08:26, Andrei Alexandrescu
 wrote:

> Any chance you consider adding AST generator actions as discussed in the
> main forum a while ago?

The AST is automatically produced, and there are already AST actions
to simplify / guide its creation. There already is an 'ignore that
node' syntax and a 'fuse the captures here'.
Also, semantic actions are there, so I think the basis is there, I
just need to add a few tree actions : cut children, take only some
children.

I'll add it today.

But maybe you had something else in mind? I'll go and search for the
thread you refer to.


Re: Pegged, a Parsing Expression Grammar (PEG) generator in D

2012-03-10 Thread Philippe Sigaud
On Sun, Mar 11, 2012 at 08:26, Andrei Alexandrescu
 wrote:
> I, too, think this is very significant work! Suggestion for Philippe:
> instead of this:
>
>
> enum PEGCode = grammarCode!(
>     "Grammar <- S Definition+ EOI"
>    ,"Definition <- RuleName Arrow Expression"
>    ,"RuleName   <- Identifier>(ParamList?)"
>    ,"Expression <- Sequence (OR Sequence)*"
> );
>
> how about this:
>
> enum PEGCode = grammarCode!("
>    Grammar <- S Definition+ EOI;
>
>    Definition <- RuleName Arrow Expression;
>    RuleName   <- Identifier>(ParamList?);
>    Expression <- Sequence (OR Sequence)*;
> ");
>
> Splitting on ";" is trivial and makes client code considerably easier to
> play with.

It's already implemented! No need for ';'
The docs are up to date, the internal code also. It's only an old
bootstrapping file that Andrej cited that still contains multi-string
definitions. I'll correct that, as that means I didn"t push dogfooding
far enough.

Now, just use one string:

"Grammar <- ...
 Definition <- ...

"


Re: Pegged, a Parsing Expression Grammar (PEG) generator in D

2012-03-10 Thread Jonathan M Davis
On Sunday, March 11, 2012 00:28:42 Philippe Sigaud wrote:
> Hello,
> 
> I created a new Github project, Pegged, a Parsing Expression
> Grammar (PEG) generator in D.

Cool! PEGs aren't used all that much for language grammars - primarily because 
BNF and EBNF are more traditional I think - but they're definitely cool. And 
it's great to see a D implementation of a PEG parser.

- Jonathan M Davis


Re: Pegged, a Parsing Expression Grammar (PEG) generator in D

2012-03-10 Thread Andrei Alexandrescu

On 3/11/12 1:22 AM, Philippe Sigaud wrote:

On Sun, Mar 11, 2012 at 00:34, Alex Rønne Petersen  wrote:


Admittedly I have not heard of PEGs before, so I'm curious: Is this powerful
enough to parse a language such as C?


I think so. But you'd have to do add some semantic action to deal with
typedefs and macros.

People parsed Java and Javascript with them. I personnally never used
Pegged for more than JSON and custom formats, but I intend to try the
D grammar.


Any chance you consider adding AST generator actions as discussed in the 
main forum a while ago?


Andrei


Re: Pegged, a Parsing Expression Grammar (PEG) generator in D

2012-03-10 Thread Andrei Alexandrescu

On 3/10/12 5:56 PM, Andrej Mitrovic wrote:

I see you are not the only one who started writing string array
literals like this:

enum PEGCode = grammarCode!(
  "Grammar<- S Definition+ EOI"
 ,"Definition<- RuleName Arrow Expression"
 ,"RuleName<- Identifier>(ParamList?)"
 ,"Expression<- Sequence (OR Sequence)*"
);

IOW comma on the left side. I know it's not a style preference but
actually a (unfortunate but needed) technique for avoiding bugs. :)

So can you use this to actually parse D code, extract syntax trees and
stuff like that? I'm clueless about parsing but it looks very neat.
Nice work!


I, too, think this is very significant work! Suggestion for Philippe: 
instead of this:


enum PEGCode = grammarCode!(
 "Grammar <- S Definition+ EOI"
,"Definition <- RuleName Arrow Expression"
,"RuleName   <- Identifier>(ParamList?)"
,"Expression <- Sequence (OR Sequence)*"
);

how about this:

enum PEGCode = grammarCode!("
Grammar <- S Definition+ EOI;
Definition <- RuleName Arrow Expression;
RuleName   <- Identifier>(ParamList?);
Expression <- Sequence (OR Sequence)*;
");

Splitting on ";" is trivial and makes client code considerably easier to 
play with.


I'll get back to this.


Andrei


Re: Pegged, a Parsing Expression Grammar (PEG) generator in D

2012-03-10 Thread Philippe Sigaud
On Sun, Mar 11, 2012 at 00:56, Andrej Mitrovic
 wrote:
> I see you are not the only one who started writing string array
> literals like this:
>
> enum PEGCode = grammarCode!(
>     "Grammar <- S Definition+ EOI"
>    ,"Definition <- RuleName Arrow Expression"
>    ,"RuleName   <- Identifier>(ParamList?)"
>    ,"Expression <- Sequence (OR Sequence)*"
> );
>
> IOW comma on the left side. I know it's not a style preference but
> actually a (unfortunate but needed) technique for avoiding bugs. :)

Yes, I use what I call "Haskell-comma" (I saw it first on Haskell
code). But in the previous sample, that's old code. Now, that should
only be;

enum code = grammar("
Grammar <- S Definition+ EOI
Definition <- RuleName Arrow Expression
RuleName   <- Identifier>(ParamList?)
Expression <- Sequence (OR Sequence)*
");

I see I didn't update bootstrap.d, I'll do that.


> So can you use this to actually parse D code, extract syntax trees and
> stuff like that? I'm clueless about parsing but it looks very neat.

I can partially parse D code, because I only wrote part of the D
grammar (see the pegged.examples.dgrammar module), just enough to
parse most of template constraints. I intend to complete it and see if
that floats.

Yes, it extracts syntax trees, at compile-time or runtime.

> Nice work!

Thanks!


Re: Pegged, a Parsing Expression Grammar (PEG) generator in D

2012-03-10 Thread Philippe Sigaud
On Sun, Mar 11, 2012 at 00:34, Alex Rønne Petersen  wrote:

> Admittedly I have not heard of PEGs before, so I'm curious: Is this powerful
> enough to parse a language such as C?

I think so. But you'd have to do add some semantic action to deal with
typedefs and macros.

People parsed Java and Javascript with them. I personnally never used
Pegged for more than JSON and custom formats, but I intend to try the
D grammar.


Re: Pegged, a Parsing Expression Grammar (PEG) generator in D

2012-03-10 Thread bearophile
Andrej Mitrovic:

> IOW comma on the left side. I know it's not a style preference but
> actually a (unfortunate but needed) technique for avoiding bugs. :)

To avoid that bug:
http://d.puremagic.com/issues/show_bug.cgi?id=3827

Bye,
bearophile


Re: Pegged, a Parsing Expression Grammar (PEG) generator in D

2012-03-10 Thread Andrej Mitrovic
I see you are not the only one who started writing string array
literals like this:

enum PEGCode = grammarCode!(
 "Grammar <- S Definition+ EOI"
,"Definition <- RuleName Arrow Expression"
,"RuleName   <- Identifier>(ParamList?)"
,"Expression <- Sequence (OR Sequence)*"
);

IOW comma on the left side. I know it's not a style preference but
actually a (unfortunate but needed) technique for avoiding bugs. :)

So can you use this to actually parse D code, extract syntax trees and
stuff like that? I'm clueless about parsing but it looks very neat.
Nice work!


Re: Pegged, a Parsing Expression Grammar (PEG) generator in D

2012-03-10 Thread Alex Rønne Petersen

On 11-03-2012 00:28, Philippe Sigaud wrote:

Hello,

I created a new Github project, Pegged, a Parsing Expression Grammar
(PEG) generator in D.

https://github.com/PhilippeSigaud/Pegged

docs: https://github.com/PhilippeSigaud/Pegged/wiki

PEG: http://en.wikipedia.org/wiki/Parsing_expression_grammar

The idea is to give the generator a PEG with the standard syntax. From
this grammar definition, a set of related parsers will be created, to be
used at runtime or compile time.

Usage
-

To use Pegged, just call the `grammar` function with a PEG and mix it
in. For example:


import pegged.grammar;

mixin(grammar("
Expr <- Factor AddExpr*
AddExpr <- ('+'/'-') Factor
Factor <- Primary MulExpr*
MulExpr <- ('*'/'/') Primary
Primary <- Parens / Number / Variable / '-' Primary

Parens <- '(' Expr ')'
Number <~ [0-9]+
Variable <- Identifier
"));



This creates the `Expr`, `AddExpr`, `Factor` (and so on) parsers for
basic arithmetic expressions with operator precedence ('*' and '/' bind
stronger than '+' or '-'). `Identifier` is a pre-defined parser
recognizing your basic C-style identifier. Recursive or mutually
recursive rules are OK (no left recursion for now).

To use a parser, use the `.parse` method. It will return a parse tree
containing the calls to the different rules:

// Parsing at compile-time:
enum parseTree1 = Expr.parse("1 + 2 - (3*x-5)*6");

pragma(msg, parseTree1.capture);
writeln(parseTree1);

// And at runtime too:
auto parseTree2 = Expr.parse(" 0 + 123 - 456 ");
assert(parseTree2.capture == ["0", "+", "123", "-", "456"]);



Features


* The complete set of PEG operators are implemented
* Pegged can parse its input at compile time and generate a complete
parse tree at compile time. In a word: compile-time string (read: D
code) transformation and generation.
* You can parse at runtime also, you lucky you.
* Use a standard and readable PEG syntax as a DSL, not a bunch of
templates that hide the parser in noise.
* But you can use expression templates if you want, as parsers are all
available as such. Pegged is implemented as an expression template, and
what's good for the library writer is sure OK for the user too.
* Some useful additional operators are there too: a way to discard
matches (thus dumping them from the parse tree), to push captures on a
stack, to accept matches that are equal to another match
* Adding new parsers is easy.
* Grammars are composable: you can put different
`mixin(grammar(rules));` in a module and then grammars and rules can
refer to one another. That way, you can have utility grammars providing
their functionalities to other grammars.
* That's why Pegged comes with some pre-defined grammars (JSON, etc).
* Grammars can be dumped in a file to create a D module.

More advanced features, outside the standard PEG perimeter are there to
bring more power in the mix:

* Parametrized rules: `List(E, Sep) <- E (Sep E)*` is possible. The
previous rule defines a parametrized parser taking two other parsers
(namely, `E` and `Sep`) to match a `Sep`-separated list of `E`'s.
* Named captures: any parser can be named with the `=` operator. The
parse tree generated by the parser (so, also its matches) is delivered
to the user in the output. Other parsers in the grammar see the named
captures too.
* Semantic actions can be added to any rule in a grammar. Once a rule
has matched, its associated action is called on the rule output and
passed as final result to other parsers further up the grammar. Do what
you want to the parse tree. If the passed actions are delegates, they
can access external variables.


Philippe



Admittedly I have not heard of PEGs before, so I'm curious: Is this 
powerful enough to parse a language such as C?


--
- Alex


Pegged, a Parsing Expression Grammar (PEG) generator in D

2012-03-10 Thread Philippe Sigaud

Hello,

I created a new Github project, Pegged, a Parsing Expression 
Grammar (PEG) generator in D.


https://github.com/PhilippeSigaud/Pegged

docs: https://github.com/PhilippeSigaud/Pegged/wiki

PEG: http://en.wikipedia.org/wiki/Parsing_expression_grammar

The idea is to give the generator a PEG with the standard syntax. 
From this grammar definition, a set of related parsers will be 
created, to be used at runtime or compile time.


Usage
-

To use Pegged, just call the `grammar` function with a PEG and 
mix it in. For example:



import pegged.grammar;

mixin(grammar("
Expr <- Factor AddExpr*
AddExpr  <- ('+'/'-') Factor
Factor   <- Primary MulExpr*
MulExpr  <- ('*'/'/') Primary
Primary  <- Parens / Number / Variable / '-' Primary

Parens   <- '(' Expr ')'
Number   <~ [0-9]+
Variable <- Identifier
"));



This creates the `Expr`, `AddExpr`, `Factor` (and so on) parsers 
for basic arithmetic expressions with operator precedence ('*' 
and '/' bind stronger than '+' or '-'). `Identifier` is a 
pre-defined parser recognizing your basic C-style identifier. 
Recursive or mutually recursive rules are OK (no left recursion 
for now).


To use a parser, use the `.parse` method. It will return a parse 
tree containing the calls to the different rules:


// Parsing at compile-time:
enum parseTree1 = Expr.parse("1 + 2 - (3*x-5)*6");

pragma(msg, parseTree1.capture);
writeln(parseTree1);

// And at runtime too:
auto parseTree2 = Expr.parse(" 0 + 123 - 456 ");
assert(parseTree2.capture == ["0", "+", "123", "-", "456"]);



Features


* The complete set of PEG operators are implemented
* Pegged can parse its input at compile time and generate a 
complete parse tree at compile time. In a word: compile-time 
string (read: D code) transformation and generation.

* You can parse at runtime also, you lucky you.
* Use a standard and readable PEG syntax as a DSL, not a bunch of 
templates that hide the parser in noise.
* But you can use expression templates if you want, as parsers 
are all available as such. Pegged is implemented as an expression 
template, and what's good for the library writer is sure OK for 
the user too.
* Some useful additional operators are there too: a way to 
discard matches (thus dumping them from the parse tree), to push 
captures on a stack, to accept matches that are equal to another 
match

* Adding new parsers is easy.
* Grammars are composable: you can put different 
`mixin(grammar(rules));` in a module and then grammars and rules 
can refer to one another. That way, you can have utility grammars 
providing their functionalities to other grammars.
* That's why Pegged comes with some pre-defined grammars (JSON, 
etc).

* Grammars can be dumped in a file to create a D module.

More advanced features, outside the standard PEG perimeter are 
there to bring more power in the mix:


* Parametrized rules: `List(E, Sep) <- E (Sep E)*` is possible. 
The previous rule defines a parametrized parser taking two other 
parsers (namely, `E` and `Sep`) to match a `Sep`-separated list 
of `E`'s.
* Named captures: any parser can be named with the `=` operator. 
The parse tree generated by the parser (so, also its matches) is 
delivered to the user in the output. Other parsers in the grammar 
see the named captures too.
* Semantic actions can be added to any rule in a grammar. Once a 
rule has matched, its associated action is called on the rule 
output and passed as final result to other parsers further up the 
grammar. Do what you want to the parse tree. If the passed 
actions are delegates, they can access external variables.



Philippe