Re: D parsing

2015-11-15 Thread Nordlöw via Digitalmars-d

On Sunday, 15 November 2015 at 01:17:47 UTC, Bastiaan Veelo wrote:
Left-recursion for Pegged is in the works: 
https://github.com/PhilippeSigaud/Pegged/pull/164


:-)


And merged!: https://github.com/PhilippeSigaud/Pegged/pull/164

Would be nice to have an overview of which languages, with 
this change, can be parsed by a Pegged grammar.
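
For illustration, a minimal sketch of the kind of directly left-recursive 
rule the merged support is meant to accept, assuming the usual 
mixin(grammar(...)) interface; the grammar, rule names and main() below are 
invented, not taken from the pull request:

import pegged.grammar;

// A directly left-recursive rule (Postfix refers to itself on the left),
// which classic PEG machinery would loop on.
mixin(grammar(`
Chain:
    Postfix <- Postfix '.' identifier '(' Number? ')'
             / Postfix '[' Number ']'
             / identifier
    Number  <- [0-9]+
`));

void main()
{
    // The example from the thread that a plain PEG parser chokes on:
    auto tree = Chain("foo[1].bar(2).fun()");
}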


Re: D parsing

2015-11-14 Thread Bastiaan Veelo via Digitalmars-d

On Sunday, 3 November 2013 at 01:45:23 UTC, Timothee Cour wrote:

1)
The main issue I see with Pegged is that PEG grammars don't support left
recursion, so for example it will fail on foo[1].bar(2).fun().
Unless there's a plan to accommodate those, I sense a dead end.
One can eliminate left recursion, but this has issues.

2)
There is some material on extending PEG to support those, e.g. "Left
Recursion in Parsing Expression Grammars", or the code at
https://github.com/orlandohill/peg-left-recursion but I don't know how well
they work in practice.


Left-recursion for Pegged is in the works: 
https://github.com/PhilippeSigaud/Pegged/pull/164


:-)


Re: D Parsing (again)/ D grammar

2014-10-03 Thread Vladimir Kazanov via Digitalmars-d
On Friday, 3 October 2014 at 05:10:45 UTC, Philippe Sigaud via 
Digitalmars-d wrote:



Yes, Elkhound is interesting, their approach is nice. But it gave me
the impression of having been abandoned for a few years?



Don't know and don't care, really. All I know is that Scott sort 
of managed to deal with the main generalized parser application 
problems by avoiding them most of the time :)) Maybe a bad 
sign... After all, most modern parser generators and parser 
combinators do not use GLR, although it does sound interesting 
in theory.


Something tells me one has to stress this word here: *theory* :-)

By now, I have read Tomita's original GLR paper, the ANTLR ALL(*) 
paper, a few recent GLR papers, three papers on GLL and a few 
related ones. It took... a while. I sort of understand the idea, 
but still not sure about the details :-)


ALL(*) is on my todo list. I tried to implement it in spring, but got
bogged down in the details. Even the white paper has some imprecisions
when you really sit down and try to code it.
I could have a look at the ANTLR v4 source, but wanted to have a clean
start, right from the white paper.


What's the name of the paper you read? Modelling GLL Parser
Implementation?


Yes.


Scientists... The algorithm is hard to implement... Okay, let's 
invent an esoteric paper-only language to explain things to 
people :-)


Thanks a lot, by the way!

I've just skimmed through the code and the README... You did not 
use the packed forest representation, did you..?


Re: D Parsing (again)/ D grammar

2014-10-03 Thread Philippe Sigaud via Digitalmars-d
 Thanks a lot, by the way!

 I've just skimmed through the code and the README... You did not use the
 packed forest representation, did you..?

Sorry for the microscopic documentation (Pegged is more documented
than that...), it was a 'for me only' project.

The forest is packed, in the sense that common nodes are re-used and
shared among parse trees: all intermediate parse results from any
grammar part are stored and used to produce the parse nodes.

The range view gives access to parse trees one after another, but the
global parse forest is present in the grammar object (or rather,
generated and completed during the parse process: each new parse
result completes the parse forest).

It has a strange effect on how parsing time is distributed among the parse results:
If you time the different parse trees, you'll see that the first one
might take maybe 40% of the entire parsing time all by itself, because
it has to discover all parse results alone. The following trees are
very rapidly calculated, since the intermediate parse results are
already known. Of course, once the parse trees begin to deviate from
the first ones, the process slows down again since they have to
explore as-yet-unused rules and input slices.

I'm not sure the previous paragraph is clear...

Imagine you have 10 different parse trees. They could be distributed like this:

# parse result    % of global parse time
 1                40
 2                 2
 3                 3
 4                 3
 5                 5
 6                 6
 7                 8
 8                10
 9                11
10                12
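
To make the sharing concrete, here is a rough sketch (invented names and a 
faked front, not the actual code) of a range of parse trees over a shared 
memo table; the first front call pays for discovering the intermediate 
results, later ones mostly reuse them:

import std.conv : to;

struct ParseTree { string rule; size_t begin, end; ParseTree[] children; }

struct Forest
{
    string input;
    ParseTree[string] memo;  // shared intermediate results (the "packed" part)
    size_t current;
    enum total = 10;         // pretend the input has 10 ambiguous parses

    @property bool empty() const { return current >= total; }

    // A real engine would assemble a different tree per alternative here,
    // consulting and extending `memo`; this sketch only shows the memo lookup,
    // so the first call populates the entry and later calls just fetch it.
    @property ParseTree front()
    {
        auto key = "Expr:0.." ~ input.length.to!string;
        if (key !in memo)
            memo[key] = ParseTree("Expr", 0, input.length);
        return memo[key];
    }

    void popFront() { ++current; }
}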


Re: D Parsing (again)/ D grammar

2014-10-03 Thread Vladimir Kazanov via Digitalmars-d
On Friday, 3 October 2014 at 16:20:29 UTC, Philippe Sigaud via 
Digitalmars-d wrote:

Thanks a lot, by the way!

I've just skimmed through the code and the README... You did not 
use the packed forest representation, did you..?


Sorry for the microscopic documentation (Pegged is more documented 
than that...), it was a 'for me only' project.


This is perfectly fine with me: I think I should be okay with the 
theory behind the code, and your style isn't cryptic.



I'm not sure the previous paragraph is clear...


I got the point.

Imagine you have 10 different parse trees. They could be 
distributed like this:


# parse result    % of global parse time
 1                40
 2                 2
 3                 3
 4                 3
 5                 5
 6                 6
 7                 8
 8                10
 9                11
10                12


Interesting, indeed.

Anyway, thank you very much for the code. The weekend is coming 
- I'll play with your implementation and see if there are any 
improvements possible.


Re: D Parsing (again)/ D grammar

2014-10-03 Thread Philippe Sigaud via Digitalmars-d
 Anyway, thank you very much for the code. The weekend is coming - I'll play
 with your implementation and see if there are any improvements possible.

Be sure to keep me informed of any enormous mistake I made. I tried
Appender and other concatenation means, without big success.

Btw, I saw on the ML that using byKey.front() is very slow. Use
keys[0] or somesuch instead.
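
For reference, a minimal sketch of the two variants (the speed claim is the 
one from the ML, not verified here):

void main()
{
    int[string] aa = ["a": 1, "b": 2, "c": 3];

    auto k1 = aa.byKey.front;  // lazy range over the keys; its front was reported slow
    auto k2 = aa.keys[0];      // eager array of all keys; indexing is cheap once built
}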


Re: D Parsing (again)/ D grammar

2014-10-03 Thread Vladimir Kazanov via Digitalmars-d
On Friday, 3 October 2014 at 17:24:43 UTC, Philippe Sigaud via 
Digitalmars-d wrote:


Be sure to keep me informed of any enormous mistake I made. I tried
Appender and other concatenation means, without big success.


I am not sure if there are any. Maybe GLL just IS non-practical, 
after all. Right now only one thing is for sure: the generalized 
parsing fruit is not a low-hanging one.


Yeap, I will let you know. This is sort of a personal challenge 
now :)



Btw, I saw on the ML that using byKey.front() is very slow. Use
keys[0] or somesuch instead.


Hm... Funny. Did you look into it..?


D Parsing (again)/ D grammar

2014-10-02 Thread Vladimir Kazanov via Digitalmars-d

Hello, everyone!

Recently, I have been doing some experimental work related to 
generalized parser generators. It is not yet ready for a public 
release, I am not even sure it will ever be. Anyway, right now I 
am considering adding D as a target for parser generation. 
Actually, it's more than that: the grand target is to be able to 
generate a D language parser in D. With Python, for example, it 
is rather easy, as I can just mechanically transform the original 
grammar into a suitable form and be done with it.


The generator itself is quite powerful; theoretically it should 
be able to handle all context-free grammars (see 
http://dotat.at/tmp/gll.pdf for the theory).


There is a grammar page on dlang.org 
(http://dlang.org/grammar.html). I have also found a related 
discussion in the forum 
(http://forum.dlang.org/thread/bwsofbnigfbrxwoui...@forum.dlang.org). 
From the discussion I found out that the D parser is a hand-made 
RD-parser with a few tricks(c).


So, my questions:

1. How realistic is the grammar? Is it a real one or just a 
sketch of sorts? Is it something developers use to build the 
parser?


2. There's also a D grammar for Pegged. How good is it? Can it be 
used as a starting point to build a parser?


Re: D Parsing (again)/ D grammar

2014-10-02 Thread Daniel Kozak via Digitalmars-d
V Thu, 02 Oct 2014 13:49:10 +
Vladimir Kazanov via Digitalmars-d digitalmars-d@puremagic.com
napsáno:

 Hello, everyone!
 
 Recently, I have been doing some experimental work related to 
 generalized parser generators. It is not yet ready for a public 
 release, I am not even sure it will ever be. Anyway, right now I 
 am considering adding D as a target for parser generation. 
 Actually, it's more than that: the grand target is to be able to 
 generate a D language parser in D. With Python, for example, it 
 is rather easy, as I can just mechanically transform the original 
 grammar into a suitable form and be done with it.
 
 The generator itself is quite powerful, theoretically it should 
 be able to handle all context-free grammar (see 
 http://dotat.at/tmp/gll.pdf for theory).
 
 There is a grammar page on dlang.org 
 (http://dlang.org/grammar.html). I have also found a related 
 discussion in the forum 
 (http://forum.dlang.org/thread/bwsofbnigfbrxwoui...@forum.dlang.org). 
  From the discussion I found out that D parser is a hand-made 
 RD-parser with a few tricks(c).
 
 So, my questions:
 
 1. How realistic is the grammar? Is it a real one or just a 
 sketch of sorts? Is it something developers use to build the 
 parser?
 
 2. There's also a D grammar for Pegged. How good is it? Can it be 
 used as a starting point to build a parser?

https://github.com/Hackerpilot/DGrammar



Re: D Parsing (again)/ D grammar

2014-10-02 Thread via Digitalmars-d
On Thursday, 2 October 2014 at 13:49:12 UTC, Vladimir Kazanov 
wrote:
The generator itself is quite powerful, theoretically it should 
be able to handle all context-free grammar (see 
http://dotat.at/tmp/gll.pdf for theory).


Cool, GLL is the way to go IMO, but I am also looking at 
Earley parsers. What is the advantage of GLL over Earley if you 
use a parser generator? I think they are both O(n³) or something 
like that?


From the discussion I found out that D parser is a hand-made 
RD-parser with a few tricks(c).


I think D is close to LL(2) for the most part. But I suppose a 
GLL parser could allow keywords to be used as symbol names in 
most cases? That would be nice.




Re: D Parsing (again)/ D grammar

2014-10-02 Thread Vladimir Kazanov via Digitalmars-d
On Thursday, 2 October 2014 at 14:08:09 UTC, Daniel Kozak via 
Digitalmars-d wrote:




https://github.com/Hackerpilot/DGrammar


Thank you, this definitely looks like something I was looking for.

Any known pitfalls here..?


Re: D Parsing (again)/ D grammar

2014-10-02 Thread Vladimir Kazanov via Digitalmars-d
On Thursday, 2 October 2014 at 15:01:13 UTC, Ola Fosheim Grøstad 
wrote:


Cool, GLL is the way to go IMO, but I am also looking at 
Earley parsers. What is the advantage of GLL over Earley if you 
use a parser generator? I think they are both O(n³) or something 
like that?


They are somewhat similar in terms of asymptotic complexity on 
complicated examples. The constant factor is better, though. But 
there's a nice property of all generalized parsers: for the LL 
(for GLL) and LR (for GLR) parts of grammars they go almost as 
fast as LL/LR parsers do. On ambiguities they slow down, of course.


There are four properties I really like:

1. GLL should be faster than Earley's (even the modern 
incarnations of it), but this is something I have yet to test.


2. It is fully general.

3. The automatically generated code repeats the original grammar 
structure - the same way recursive descent parsers do.


4. The core parser is still that simple LL/RD parser I can 
practically debug.


This comes at a price, as usual... I would not call it obvious 
:-) But nobody can say that modern Earley flavours are trivial.


From the discussion I found out that D parser is a hand-made 
RD-parser with a few tricks(c).


I think D is close to LL(2) for the most part. But I suppose a 
GLL parser could allow keywords to be used as symbol names in 
most cases? That would be nice.


This is possible, I guess, the same way people do it in GLR 
parsers.


Re: D Parsing (again)/ D grammar

2014-10-02 Thread Cliff via Digitalmars-d
On Thursday, 2 October 2014 at 15:47:04 UTC, Vladimir Kazanov 
wrote:
On Thursday, 2 October 2014 at 15:01:13 UTC, Ola Fosheim 
Grøstad wrote:


Cool, GLL is the way to go IMO, but I am also looking at 
Earley parsers. What is the advantage of GLL over Earley if 
you use a parser generator? I think they are both O(n³) or 
something like that?


They are somewhat similar in terms of asymptotic complexity on 
complicated examples. Constant is better though. But there's a 
nice property of all generalized parsers: for LL (for GLL) and  
LR (for GLR) parts of grammars they go almost as fast as LL/LR 
parsers do. On ambiguities they slow down, of course.


There are four properties I really like:

1. GLL should be faster than Earley's (even the modern 
incarnations of it), but this is something I have yet to test.


2. It is fully general.

3. The automatically generated code repeats the original 
grammar structure - the same way recursive decent parsers do.


4. The core parser is still that simple LL/RD parser I can 
practically debug.


This comes at a price, as usual... I would not call it obvious 
:-) But nobody can say that modern Earley's flavours are 
trivial.


From the discussion I found out that D parser is a hand-made 
RD-parser with a few tricks(c).


I think D is close to LL(2) for the most part. But I suppose a 
GLL parser could allow keywords to be used as symbol names in 
most cases? That would be nice.


This is possible, I guess, the same way people do it in GLR 
parsers.


What has steered you down the path of writing your own parser 
generator as opposed to using an existing one such as ANTLR?  
Were there properties you wanted that it didn't have, or 
performance, or...?


Re: D Parsing (again)/ D grammar

2014-10-02 Thread via Digitalmars-d
On Thursday, 2 October 2014 at 15:47:04 UTC, Vladimir Kazanov 
wrote:
3. The automatically generated code repeats the original 
grammar structure - the same way recursive decent parsers do.


4. The core parser is still that simple LL/RD parser I can 
practically debug.


This sounds nice! Does this mean that it would be possible to use 
your parser generator to create a skeleton which is then 
manipulated manually, or are there non-local complexities that 
make manual edits risky?


One reason I believe GLL is the right way to go is that I think 
RD makes it easier to generate good error messages suitable for 
display (to the end user).


Re: D Parsing (again)/ D grammar

2014-10-02 Thread Vladimir Kazanov via Digitalmars-d

On Thursday, 2 October 2014 at 17:17:53 UTC, Cliff wrote:



What has steered you down the path of writing your own parser 
generator as opposed to using an existing one such as ANTLR?  
Were there properties you wanted that it didn't have, or 
performance, or...?


Like I said in the introductory post, this is a personal 
experiment of sorts. I am aware of most alternatives, such as 
ANTLR's ALL(*) and many, MANY others. :) And I would never write 
something myself as a part of my full-time job.


But right now I am writing an article on generalized parsers, 
toying with implementations I could lay my hands on, implementing 
others. GLL is a rather exotic LL flavor which looks attractive 
in theory. I want to see it in practice.


Re: D Parsing (again)/ D grammar

2014-10-02 Thread Cliff via Digitalmars-d
On Thursday, 2 October 2014 at 17:43:45 UTC, Vladimir Kazanov 
wrote:

On Thursday, 2 October 2014 at 17:17:53 UTC, Cliff wrote:



What has steered you down the path of writing your own parser 
generator as opposed to using an existing one such as ANTLR?  
Were there properties you wanted that it didn't have, or 
performance, or...?


Like I said in the introducing post, this is a personal 
experiment of sorts. I am aware of most alternatives, such as 
ANTLR's ALL(*) and many, MANY others. :) And I would never 
write something myself as a part of my full-time job.


But right now I am writing an article on generalized parsers, 
toying with implementations I could lay my hands on, 
implementing others. GLL is a rather exotic LL flavor which 
looks attractive in theory. I want to see it in practice.


Very cool - post the GitHub or equivalent when you get the chance 
(assuming you are sharing).  This is an area of interest for me 
as well.


Re: D Parsing (again)/ D grammar

2014-10-02 Thread Vladimir Kazanov via Digitalmars-d
On Thursday, 2 October 2014 at 17:39:55 UTC, Ola Fosheim Grøstad 
wrote:
On Thursday, 2 October 2014 at 15:47:04 UTC, Vladimir Kazanov 
wrote:
3. The automatically generated code repeats the original 
grammar structure - the same way recursive decent parsers do.


4. The core parser is still that simple LL/RD parser I can 
practically debug.


This sounds nice! Does this mean that it would be possible to 
use your parser generator to create a skeleton which is then 
manipulated manually or is there non-local complexities that 
makes manual edits risky?


Chances are that I will be able to get the original GLL parser 
generator from one of the algorithm's authors (Adrian Johnstone). He's 
really helpful here. From that point, I will only have to add a 
frontend for generating a concrete parser, starting with Python - 
it already has a fully working grammar. Hopefully, I will also be 
able to find a suitable grammar for D, it is always a pleasure to 
play with the language.


Otherwise, I will continue my current effort - implementing the 
GLL parser generator in D.


Anyway, right now, from what I see in the papers, it looks like 
it is quite possible, yes. Again, requires a bit of understanding 
(nothing special, really), but compared to dumb LALR-style tables 
it's nothing.


One reason I believe GLL is the right way to go is that I think 
RD makes it easier to generate good error messages suitable for 
display (to the end user).


This is one of the reasons I prefer LL/RD parsers :-) They are 
easy to follow.


Re: D Parsing (again)/ D grammar

2014-10-02 Thread Philippe Sigaud via Digitalmars-d
 Chances are that I will be able to get the original GLL parser generator
 from one of algorithm authors (Adrian Johnstone). He's really helpful here.
 From that point, I will only have to add a frontend for generating a
 concrete parser, starting with Python - it already has a fully working
 grammar. Hopefully, I will also be able to find a suitable grammar for D, it
 is always a pleasure to play with the language.

 Otherwise, I will continue my current effort - implementing the GLL parser
 generator in D.

I did that during this summer, almost to the point it was
self-sustained (that is, the GLL parser generator was used to generate
a parser for grammar files). I chose a range interface: input is a
string, the parse forest is output as a range of parse trees.

I loved to see it generate the different possible parse trees on
ambiguous grammars, and accept left- and right-recursive grammars!
GLL is a very nice algorithm.

Alas, even with some shortcuts on Kleene stars it was quite slow. I
tried to use threads (spawning worker threads on alternatives), but
that did not change the timings much.

I could make it generate a parser for JSON and compared it to the
jsonx module that Sönke presented here. Bah, it was 1000 times slower
(which is all relative: it still takes only a few ms to parse a big
JSON file. But still, far away from the microseconds it should need).
Pegged was faster than this GLL generator, but of course still far
slower than jsonx.

[Of course, Pegged can cheat: it can parse your file at compile-time,
resulting in 0-µs parsing time at runtime :-)]

Also, the way I coded it I hit CTFE limits and could not have it parse
at compile-time. A shame, really.

 This is one of the reasons I prefer LL/RD parsers :-) They are easy to
 follow.

I would like to try a LALR compile-time generator and compile-time
parser. I'm pretty sure LALR tables could be expanded directly as D
code instead of being read during parsing.
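
For what it's worth, a hedged sketch of what expanding the tables directly 
as D code could look like: one switch per LALR state instead of an 
action[state][token] array. The states, tokens and actions below are 
invented for illustration:

enum Tok { num, plus, eof }

struct Action { enum Kind { shift, reduce, accept, error } Kind kind; int arg; }

// The generator would emit this function from the grammar, so the "table"
// is ordinary code the compiler can inline and optimize.
Action action(int state, Tok tok)
{
    switch (state)
    {
        case 0:
            return tok == Tok.num ? Action(Action.Kind.shift, 2)
                                  : Action(Action.Kind.error, 0);
        case 2:
            if (tok == Tok.plus) return Action(Action.Kind.shift, 3);
            if (tok == Tok.eof)  return Action(Action.Kind.accept, 0);
            return Action(Action.Kind.error, 0);
        // ... one case per generated LALR state ...
        default:
            return Action(Action.Kind.error, 0);
    }
}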

Philippe



Re: D Parsing (again)/ D grammar

2014-10-02 Thread Vladimir Kazanov via Digitalmars-d
On Thursday, 2 October 2014 at 18:20:36 UTC, Philippe Sigaud via 
Digitalmars-d wrote:

I did that during this summer, almost to the point it was
self-sustained (that is, the GLL parser generator was used to generate
a parser for grammar files). I chose a range interface: input is a
string, the parse forest is output as a range of parse trees.


Nice! Is it public? Github?

Alas, even with some shortcuts on Kleene stars it was quite slow. I
tried to use threads (spawning worker threads on alternatives), but
that did not change the timings much.


AFAIK, multithreading is a bad idea in parsing.

Actually, in the gll-combinators Scala project they have similar 
speed problems. I don't know if it's a fundamental algorithm 
problem or implementation details that lead to slowdowns.


Re: D Parsing (again)/ D grammar

2014-10-02 Thread Philippe Sigaud via Digitalmars-d
 I did that during this summer, almost to the point it was
 self-sustained (that is, the GLL parser generator was used to generate
 a parser for grammars files). I chose a range interface: input is a
 string, the parse forest is output as a range of parse trees.


 Nice! Is it public? Github?

No github repo. I could put it alongside Pegged, I suppose. I used git
internally, though.

 I was a bit disheartened by the speed I got, so did not publish nor
announce it here.

Note also that I saw it as an alternative engine for my own Pegged
project, so I used the same way of defining grammars (some prefixes
and suffixes for dropping nodes in the parse tree and so on).

I can send you the code (well the entire repo), if you wish. I did not
touch it for the past 3 months, so I already don't remember what state
it was in :-(.
Looking at the code now, it seems I'm still using Pegged to parse the
initial grammar. Bootstrapping did not go well.

Send me an email at firstname . lastname @gmail.com

(philippe sigaud)


 Halas, even with some shortcuts on Kleene stars it was quite slow. I
 tried to use threads (spawning worker threads on alternatives), but
 that did not change the timings much.


 AFAIK, multithreading is a bad idea in parsing.

I think, as with many things in CS, that people developed parsing
algorithms before multicores existed.
Spawning threads or fibers for some alternatives (A | B) seems nice to
me. I got interesting results with a purely multithreaded parser that
spawned children for each choice.
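
As a toy illustration of the idea (invented predicate parsers, nothing from 
the actual code), one task per alternative of (A | B):

import std.parallelism : task;

bool parseA(string s) { return s.length > 0 && s[0] == 'a'; }
bool parseB(string s) { return s.length > 0 && s[0] == 'b'; }

bool parseAorB(string s)
{
    auto ta = task!parseA(s);   // one worker per alternative
    auto tb = task!parseB(s);
    ta.executeInNewThread();
    tb.executeInNewThread();
    auto ra = ta.yieldForce;    // wait for each worker, then combine
    auto rb = tb.yieldForce;
    return ra || rb;
}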


 Actually, in the gll-combinators Scala project they have similar speed
 problems. I don't know if it's a fundamental algorithm problem or
 implementation details that lead to slowdowns.

Well, when all resulting implementations have speed problems...
I found an article by the GLL authors explaining how they coded their
algorithm for more speed, but I couldn't wrap my head around it.

Philippe


Re: D Parsing (again)/ D grammar

2014-10-02 Thread Vladimir Kazanov via Digitalmars-d
On Thursday, 2 October 2014 at 20:28:47 UTC, Philippe Sigaud via 
Digitalmars-d wrote:

I did that during this summer, almost to the point it was
self-sustained (that is, the GLL parser generator was used to generate
a parser for grammar files). I chose a range interface: input is a
string, the parse forest is output as a range of parse trees.


Nice! Is it public? Github?


No github repo. I could put it alongside Pegged, I suppose. I used git
internally, though.

I was a bit disheartened by the speed I got, so did not publish nor
announce it here.

Note also that I saw it as an alternative engine for my own Pegged
project, so I used the same way of defining grammars (some prefixes
and suffixes for dropping nodes in the parse tree and so on).

I can send you the code (well the entire repo), if you wish. I did not
touch it for the past 3 months, so I already don't remember what state
it was in :-(.
Looking at the code now, it seems I'm still using Pegged to parse the
initial grammar. Bootstrapping did not go well.

Send me an email at firstname . lastname @gmail.com

(philippe sigaud)



Check the mailbox,

Thank you


Re: D Parsing (again)/ D grammar

2014-10-02 Thread Vladimir Kazanov via Digitalmars-d
On Thursday, 2 October 2014 at 20:28:47 UTC, Philippe Sigaud via 
Digitalmars-d


No github repo. I could put it alongside Pegged, I suppose. I used git
internally, though.

I was a bit disheartened by the speed I got, so did not publish nor
announce it here.


I have a huge collection of projects I never published :-) We all 
do, I guess.



I think, as with many things in CS, that people developed parsing
algorithms before multicores existed.
Spawning threads or fibers for some alternatives (A | B) seems nice to
me. I got interesting results with a purely multithreaded parser that
spawned children for each choice.


No, this is not exactly what I mean. Multithreading can be 
perfectly fine in some cases, very fruitful sometimes. Say, in 
NLP, when one has to process long or highly ambiguous strings, 
and the parse tree can become huge and is of importance in 
itself... Yes. In programming language parsing this is just a 
small step towards further stages within a longer pipeline. It 
has to be fast enough to make multithreading at this step 
overkill.



Well, when all resulting implementations have speed problems...


Well... This is why I want to study the original implementation. 
I had read the story of red-black trees some time ago - it took a 
while to get it right, more to make it elegant.


BTW, there's one interesting related work that probably should 
be taken as a positive example of generalized parsing: the 
Elkhound parser generator. It uses a hybrid LALR/GLR approach, 
thus achieving better performance. There's much more to it; I 
didn't go too deep into it.


I found an article by the GLL authors explaining how they coded 
their algorithm for more speed, but I couldn't wrap my head 
around it.


By now, I have read Tomita's original GLR paper, the ANTLR ALL(*) 
paper, a few recent GLR papers, three papers on GLL and a few 
related ones. It took... a while. I sort of understand the idea, 
but still not sure about the details :-)


What's the name of the paper you read? Modelling GLL Parser 
Implementation?


Re: D Parsing (again)/ D grammar

2014-10-02 Thread via Digitalmars-d
On Thursday, 2 October 2014 at 18:06:04 UTC, Vladimir Kazanov 
wrote:
Chances are that I will be able to get the original GLL parser 
generator from one of algorithm authors (Adrian Johnstone). 
He's really helpful here. From that point, I will only have to 
add a frontend for generating a concrete parser, starting with 
Python - it already has a fully working grammar. Hopefully, I 
will also be able to find a suitable grammar for D, it is 
always a pleasure to play with the language.


Nice! If you manage to get a GLL generator for D (or C++) going 
I'd love to try it out.


The only general parser generator I've tested so far is the 
Tomita-style GLR parser called DParser, but IIRC the 
documentation could use some improvement:


http://sourceforge.net/projects/dparser/



Re: D Parsing (again)/ D grammar

2014-10-02 Thread Vladimir Kazanov via Digitalmars-d
On Thursday, 2 October 2014 at 21:26:15 UTC, Ola Fosheim Grøstad 
wrote:




Nice! If you manage to get a GLL generator for D (or C++) going 
I'd love to try it out.


You will definitely hear about it here :-) I don't know yet if 
the parser itself will be worth the trouble, but a more or less 
complete resulting blog post with tests/results/examples will 
definitely see the light of day.


The only general parser generator I've tested so far is the 
Tomita-style GLR parser called DParser, but IIRC the 
documentation could need some improvement:


http://sourceforge.net/projects/dparser/


Yes, I know this one. A scannerless GLR parser. GLRs are trendy 
these days :) Easy to find many more, there's even an Emacs mode 
using GLR for Ada parsing.


Re: D Parsing (again)/ D grammar

2014-10-02 Thread Philippe Sigaud via Digitalmars-d
 I have a huge collection of projects I never published :-) We all do, I
 guess.

Oh, the ratio is more than 100 projects for one published...


 No, this is not exactly what I mean. Multithreading can be perfectly fine in
 some cases, very fruitful sometimes. Say, in NLP, when one has to process
 long or highly ambiguous strings, and the parse tree can become huge and is
 of importance in itself... Yes. In programming language parsing this is just
 a small step towards further stages within a longer pipeline. It has to be
 fast enough to make multithreading on this step an overkill.

I don't know. Using fibers, I'm hoping to get interesting results one
day. I got hit by a work storm before trying fibers. OS threads were a
bit too heavy and didn't work that well.


 BTW, there's one an interesting related work that probably should be taken
 as a positive example of generalized parsing: the Elkhound parser generator.
 It uses a hybrid LALR/GLR approach, thus achieving better performance.
 There's much more to it, I didn't go too deep into it.

Yes, Elkhound is interesting, their approach is nice. But it gave me
the impression of having been abandoned for a few years?


 I found an article by the GLL authors explaining how they coded their
 algorithm for more speed, but I couldn't wrap my head around it.


 By now, I have read the original Tomita's GLR paper, Antlr ALL(*) paper, a
 few recent GLR papers, three papers on GLL and a few related ones . It
 took... A while. I sort of understand the idea, but still not sure about the
 details :-)

ALL(*) is on my todo list. I tried to implement it in spring, but got
bogged down in the details. Even the white paper has some imprecisions
when you really sit down and try to code it.
I could have a look at the ANTLR v4 source, but wanted to have a clean
start, right from the white paper.

 What's the name of the paper you read? Modelling GLL Parser
 Implementation?

Yes.


Re: D Parsing (again)/ D grammar

2014-10-02 Thread Philippe Sigaud via Digitalmars-d
On Thu, Oct 2, 2014 at 11:19 PM, Vladimir Kazanov via Digitalmars-d
digitalmars-d@puremagic.com wrote:

 Check the mailbox,

 Thank you

I sent it to you. I was asleep, sorry :-)


Re: D parsing

2013-11-14 Thread Timon Gehr

On 11/12/2013 09:14 AM, Andrei Alexandrescu wrote:

On 11/12/13 12:09 AM, Jacob Carlborg wrote:

...

An argument for macros would have to do a
lot more than 'we don't need __FILE__ etc. anymore'

...


It's very simple. Timon got into that part as if it were important. I pointed out it's 
not important.

..


This is a misrepresentation. I'll just answer that post.


Re: D parsing

2013-11-14 Thread Timon Gehr

On 11/11/2013 12:37 AM, Andrei Alexandrescu wrote:

On 11/10/13 3:21 PM, Timon Gehr wrote:

On 11/10/2013 11:43 PM, Andrei Alexandrescu wrote:


It seems we could even get rid of __FILE__,__LINE__,__MODULE__ with AST
macros.


This would be a very small advantage. The special variables hardly cause
any trouble.

Andrei


The trouble is that there are too few of them to (efficiently) reflect
all the information about the calling context one may be interested in.
I don't see the point in adding them one after another in an
unprincipled ad-hoc fashion instead of actually thinking up a good API
coupled with some more expressive power.


But the API would add them in exactly the same rote manner.


No, it would be (close to) exhaustive from the beginning. That is the 
point. I assume the problem is that there have not yet been very 
specific proposals.



I really don't see an advantage here.


It's the difference between having a universal language, and a language 
that has a finite and fixed set of words (without any methods of 
combination).




An argument for macros would have to do a
lot more than 'we don't need __FILE__ etc. anymore'.
...


Obviously, but I was not making this argument, and as far as I can see 
nobody else was either.




Re: D parsing

2013-11-12 Thread Jacob Carlborg

On 2013-11-12 08:52, Andrei Alexandrescu wrote:


Fine, although a sense of futility is hard to shake seeing as we won't
replace those existing features. I think a much stronger point would be
made if the power of the feature were demonstrated on problems not
accessible with the existing ones.


You just said we shouldn't replace existing features.

The point here is that there is
significant difficulty to remove features that already exist

http://forum.dlang.org/thread/bwsofbnigfbrxwoui...@forum.dlang.org?page=9#post-l5s44b:242c36:241:40digitalmars.com


About DIP 50: I will say no but please do not take it personally. It's
great there is discussion about this, and I encourage it, but at this
time I think we should make no plans to add macros to D.


I don't think we should add macros now either. This proposal is far from 
ready. If Martin hadn't suggested I should create a DIP, I wouldn't 
have, at least not at this time.


BTW, just saying "no" doesn't help a bit. You could just have said 
"foo". That's extremely annoying. You're shooting down very many 
suggestions/proposals/ideas with just a "no", or the more elaborate 
answer "no, never going to happen".


On the other hand, when you have a proposal it should be considered 
pre-approved and is more of an FYI.


--
/Jacob Carlborg


Re: D parsing

2013-11-12 Thread Andrei Alexandrescu

On 11/11/13 11:50 PM, Jacob Carlborg wrote:

On 2013-11-12 03:35, Andrei Alexandrescu wrote:


So... there is rote addition to the context.caller structure. It's just
spelled differently. No?


Have you seen the other posts?


Yes. Don't assume whoever disagrees with you is misinformed.


Have you even read the proposal?


Yes.


If you think
that I just want a different syntax for __LINE__ and __FILE__ you're
way, way off. That's just an extremely small part of the proposal.
__LINE__ and __FILE__ would not be removed.


This is putting words in my mouth.


Andrei


Re: D parsing

2013-11-12 Thread Jacob Carlborg

On 2013-11-12 09:00, Andrei Alexandrescu wrote:


Yes. Don't assume whoever disagrees with you is misinformed.


I absolutely do not. But when I read this:

An argument for macros would have to do a
lot more than 'we don't need __FILE__ etc. anymore'

I find it hard to believe anything else.

http://forum.dlang.org/thread/bwsofbnigfbrxwoui...@forum.dlang.org?page=8#post-l5p5b8:241kju:241:40digitalmars.com

--
/Jacob Carlborg


Re: D parsing

2013-11-12 Thread Andrei Alexandrescu

On 11/12/13 12:09 AM, Jacob Carlborg wrote:

On 2013-11-12 09:00, Andrei Alexandrescu wrote:


Yes. Don't assume whoever disagrees with you is misinformed.


I absolutely do not. But when I read this:

An argument for macros would have to do a
lot more than 'we don't need __FILE__ etc. anymore'

I find it hard to believe anything else.

http://forum.dlang.org/thread/bwsofbnigfbrxwoui...@forum.dlang.org?page=8#post-l5p5b8:241kju:241:40digitalmars.com


It's very simple. Timon got into that part as if it were important. I 
pointed out it's not important.


Andrei



Re: D parsing

2013-11-12 Thread Andrei Alexandrescu

On 11/12/13 12:02 AM, Jacob Carlborg wrote:

On 2013-11-12 08:52, Andrei Alexandrescu wrote:


Fine, although a sense of futility is hard to shake seeing as we won't
replace those existing features. I think a much stronger point would be
made if the power of the feature were demonstrated on problems not
accessible with the existing ones.


You just said we shouldn't replace existing features.

The point here is that there is
significant difficulty to remove features that already exist

http://forum.dlang.org/thread/bwsofbnigfbrxwoui...@forum.dlang.org?page=9#post-l5s44b:242c36:241:40digitalmars.com


Yes. So I said. I don't get why you'd provide a link - it's in my text 
that you quote. Indeed, we shouldn't replace existing features.



About DIP 50: I will say no but please do not take it personally. It's
great there is discussion about this, and I encourage it, but at this
time I think we should make no plans to add macros to D.


I don't think we should add macros now either. This proposal is far from
ready. If Martin hadn't suggested I should create a DIP, I wouldn't
have, at least now at this time.


Fine.


BTW, just saying no doesn't help a bit. You could just have said
foo. That's extremely annoying. You're shooting down very many
suggestions/proposal/ideas with just a no, or the more elaborated
answer no, never going to happen.

On the other hand when you have a proposal it should be consider
pre-approved and is a more of a FYI.


So how could we express a "no" that doesn't annoy you in the extreme? In 
case the answer would be "you haven't explained why", allow me to retort.


I've mentioned the argument before: at this point we should focus on 
quality of implementation and making good use of the features we have. 
In fact I am repeating myself: http://goo.gl/1thq1j. As has been 
publicly known for a while now, our strategy has been to improve quality 
and to double down on the assets we have. People ask for a roadmap, and 
what's missing from a roadmap is as important as what's there.


This is a strategy that Walter and I agree with, have been transparent 
about, and that may work or not, with various degrees of success. 
Reasonable people may disagree about what the best step moving forward 
should be, but at some point some step must be made, and we can't adopt 
your strategy, with which we disagree, as our strategy, just to be nice 
and not offend your sensibility. (I'm using "we" here because Walter and 
I discussed this at large.) There must be a way to say "no" that doesn't 
offend you. Please advise what that is.



Andrei



Re: D parsing

2013-11-12 Thread Jacob Carlborg

On 2013-11-12 09:14, Andrei Alexandrescu wrote:


It's very simple. Timon got into that part as if it were important. I
pointed out it's not important.


Ok, fair enough. I'm sorry for wrongly interpreting it.

--
/Jacob Carlborg


Re: D parsing

2013-11-12 Thread Jacob Carlborg

On 2013-11-12 09:13, Andrei Alexandrescu wrote:


So how could we express a no that doesn't annoy you in the extreme? In
case the answer would be you haven't explained why, allow me to retort.

I've mentioned the argument before: at this point we should focus on
quality of implementation and making good use of the features we have.
In fact I am repeating myself: http://goo.gl/1thq1j. As has been
publicly known for a while now, our strategy has been to improve quality
and to double down on the assets we have. People ask for a roadmap, and
what's missing from a roadmap is as important as what's there.

This is a strategy that Walter and I agree with, have been transparent
about, and that may work or not, with various degrees of success.
Reasonable people may disagree what the best step moving forward should
be, but at some point some step must be made and we can't adopt your
strategy, with which we disagree, as our strategy, just to be nice and
not offend your sensibility. (I'm using we here because Walter and I
discussed this at large.) There must be a way to say no that doesn't
offend you. Please advise what that is.


Thank you for the explanation. I do understand your and Walter's 
position in this.


Just giving a short reason to accompany the "no" helps a lot. It 
doesn't need to be as long as the explanation above, just a sentence, like:


"No, at this time we don't want to make such a big change, we're trying 
to stabilize."


This is not just for me. I'm hoping proposals from others can also get a 
fair reason for the "no".


--
/Jacob Carlborg


Re: D parsing

2013-11-12 Thread Andrei Alexandrescu

On 11/12/13 12:31 AM, Jacob Carlborg wrote:

Just giving a short reason, to accompany the no with helps a lot. It
doesn't need to be as long as the explanation above, just a sentence, like:

no, at this time we don't want to make such a big change, we're trying
to stabilize.

This is not just for me. I'm hoping proposal from others also can get a
fair reason to why the no.


OK, thanks. I'll do my best to improve on that in the future.

Andrei




Re: D parsing

2013-11-12 Thread Jacob Carlborg

On 2013-11-12 09:37, Andrei Alexandrescu wrote:


OK, thanks. I'll do my best to improve on that in the future.


Thanks again. Sorry for being frustrated.

--
/Jacob Carlborg


Re: D parsing

2013-11-12 Thread Walter Bright

On 11/4/2013 11:27 PM, Brian Schott wrote:

Seriously. We don't have a real grammar for D. We have the language spec on
dlang.org, but it isn't complete, consistent, or updated when the language
changes. Want examples? I have a tracker for them here:
http://d.puremagic.com/issues/show_bug.cgi?id=10233


Thank you, Brian, for finding and reporting these issues.



Re: D parsing

2013-11-11 Thread Jacob Carlborg

On 2013-11-10 23:49, Andrei Alexandrescu wrote:


The way I see this is, these preexisting features (which shan't be
thrown away) diminish the motivation for adding another feature.


I think the opposite. It shows how powerful macros are and that they 
will most likely be able to implement other new features that haven't 
been proposed yet.


Adding a bunch of new features to the language instead of taking a step 
back and thinking whether something else can be added that implements these 
features and many more shows bad judgement.


--
/Jacob Carlborg


Re: D parsing

2013-11-11 Thread Jacob Carlborg

On 2013-11-11 00:37, Andrei Alexandrescu wrote:


But the API would add them in exactly the same rote manner. I really
don't see an advantage here. An argument for macros would have to do a
lot more than we don't need __FILE__ etc. anymore.


They do, and several examples have already been shown. I'll also add 
more as I come up with them.


--
/Jacob Carlborg


Re: D parsing

2013-11-11 Thread Jacob Carlborg

On 2013-11-11 02:54, Andrei Alexandrescu wrote:


What do you have in mind? I find it difficult to grab e.g. the file and
the line unless there's some plumbing in the language that allows you to.


The language just needs to provide a way to introspect the AST. If we're 
talking about a specific API I can think of something like this:


macro foo (Context context, Ast!(string) str)
{
    auto line = context.caller.line;
    auto file = context.caller.file;
}

foo("asd");

--
/Jacob Carlborg


Re: D parsing

2013-11-11 Thread Andrei Alexandrescu

On 11/10/13 11:55 PM, Jacob Carlborg wrote:

On 2013-11-10 23:49, Andrei Alexandrescu wrote:


The way I see this is, these preexisting features (which shan't be
thrown away) diminish the motivation for adding another feature.


I think the opposite. It shows how powerful macros are and that they
will most likely be able to implement other new features that haven't
been proposed yet.

Adding a bunch of new features to the language instead of taking a step
back and thinking if something else can be added that implement these
features and many more shows bad judgement.


No need to be snarky about it. The point here is that there is 
significant difficulty to remove features that already exist and are 
useful, which changes the game considerably.


Andrei


Re: D parsing

2013-11-11 Thread Andrei Alexandrescu

On 11/11/13 12:00 AM, Jacob Carlborg wrote:

On 2013-11-11 02:54, Andrei Alexandrescu wrote:


What do you have in mind? I find it difficult to grab e.g. the file and
the line unless there's some plumbing in the language that allows you to.


The language just need to provide a way to introspect the AST. If we're
talking about a specific API I can think of something like this:

macro foo (Context context, Ast!(string) str)
{
 auto line = context.caller.line;
 auto file = context.caller.file;
}

foo(asd);


So... there is rote addition to the context.caller structure. It's just 
spelled differently. No?


Andrei




Re: D parsing

2013-11-11 Thread deadalnix
On Tuesday, 12 November 2013 at 02:34:52 UTC, Andrei Alexandrescu 
wrote:
No need to be snarky about it. The point here is that there is 
significant difficulty to remove features that already exist 
and are useful, which changes the game considerably.


Andrei


That is true. However, please understand that it is extremely 
frustrating to have a lot of gadget features that solve a 
specific issue when much more generic solutions exist, and when 
these gadgets actually make it more difficult to introduce the 
proper features.


If we put aside the AST macro thing (granted, we have a swiss army 
knife of gadgets for many of its typical use cases), several 
other skeletons don't want to stay in the closet.


If I had to pick two:
Tail-qualified type parameters are one. It is impossible to have a 
user-defined type that behaves like slices/pointers do. That is a 
showstopper for a proper collection library. I have only ugly 
solutions to propose for that now. Lately, I was playing with things 
along the lines of struct S[T] { auto foo() const[T] { ... } } 
I'm afraid that what already exists in that area makes things 
quite difficult.


The lack of a qualifier for ownership, which makes concurrency and 
interfacing between manual memory management and the GC tedious. For 
that one, isolated seems the way to go.


These same issues continue to pop up in many different forms (for 
instance, see Kenji's DIP, where he is trying to hack around some 
gadget to make unique postblit work).


Re: D parsing

2013-11-11 Thread Andrei Alexandrescu

On 11/11/13 9:11 PM, deadalnix wrote:

On Tuesday, 12 November 2013 at 02:34:52 UTC, Andrei Alexandrescu wrote:

No need to be snarky about it. The point here is that there is
significant difficulty to remove features that already exist and are
useful, which changes the game considerably.

Andrei


That is true. However, please understand that this is extremely
frustrating to have a lot of gadget features that solve a specific issue
when much more generic solutions exists, and when these gadgets actually
makes it more difficult to introduce the proper features.


Frustration is inappropriate. Probably "bummed" would be a better fit 
for the situation. (Which I am not, but that's a different story; I 
personally don't think macros are all they're cracked up to be.)


We did not have a macro-based solution when D was being invented, and I 
don't think there's a lot of glory in seeing patterns for doing things 
differently after the language has been in use for a while. There's a 
whole lot of water under the bridge between now and then.



If we put aside the AST macro thing (granted we have a swiss army knife
of gadgets for many of its typical uses cases), several other skeleton
don't want to stay in the closet.

If I has to pick 2:
tail qualified type parameter are one. It is impossible to have a use
defined type that behave as slices/pointers. That is showstopper for a
proper collection library. I have only ugly solution to propose to that
now. Lately, I was plying with thing along the lines of struct S[T] {
auto foo() const[T] { ... } } I'm afraid that what already exist in that
area makes things quite difficult.

The lack of qualifier for ownership, which makes concurrency and
interface between manual memory management and GC tedious. For that one,
isolated seems the way to go.

These same issue continue to pop up in many different forms (for
instance, see Kenji's DIP, where he is trying to hack around some gadget
to unique postblit work).


Our goals for the time being are to improve stability, close gaps in the 
type system, and make good use of features in code. To the extent we 
can't do what we want to do, the topics you mention are fit.



Andrei



Re: D parsing

2013-11-11 Thread deadalnix
On Tuesday, 12 November 2013 at 05:58:02 UTC, Andrei Alexandrescu 
wrote:

If I had to pick two:
Tail-qualified type parameters are one. It is impossible to have a
user-defined type that behaves like slices/pointers do. That is a
showstopper for a proper collection library. I have only ugly
solutions to propose for that now. Lately, I was playing with things
along the lines of struct S[T] { auto foo() const[T] { ... } }
I'm afraid that what already exists in that area makes things quite
difficult.

The lack of a qualifier for ownership, which makes concurrency and
interfacing between manual memory management and the GC tedious. For
that one, isolated seems the way to go.

These same issues continue to pop up in many different forms (for
instance, see Kenji's DIP, where he is trying to hack around some
gadget to make unique postblit work).


Our goals for the time being are to improve stability, close 
gaps in the type system, and make good use of features in code. 
To the extent we can't do what we want to do, the topics you 
mention are fit.



Andrei


Yes, as much as I'd like to have macros, I don't think this is the 
right time now. Closing the gaps is more important.


Re: D parsing

2013-11-11 Thread Jacob Carlborg

On 2013-11-12 03:34, Andrei Alexandrescu wrote:


No need to be snarky about it. The point here is that there is
significant difficulty to remove features that already exist and are
useful, which changes the game considerably.


I'm really trying to be as humble as I can but you're not making it easy.

I have not said we should remove any of the existing features. I'm 
trying to show that several already existing features could have been 
implemented using macros had they existed early in the development. It 
should show that it is a powerful feature and can hopefully be used to 
implement new ideas with library code instead of changing the language.


--
/Jacob Carlborg


Re: D parsing

2013-11-11 Thread Andrei Alexandrescu

On 11/11/13 11:46 PM, Jacob Carlborg wrote:

On 2013-11-12 03:34, Andrei Alexandrescu wrote:


No need to be snarky about it. The point here is that there is
significant difficulty to remove features that already exist and are
useful, which changes the game considerably.


I'm really trying to be as humble as I can but you're not making it easy.

I have not said we should remove any of the existing features. I'm
trying to show that several already existing features could have been
implemented using macros if they existed early in the development. It
should show that it is a powerful feature and can hopefully be used to
solve new ides with library code instead changing the languages.


Fine, although a sense of futility is hard to shake seeing as we won't 
replace those existing features. I think a much stronger point would be 
made if the power of the feature were demonstrated on problems not 
accessible with the existing ones.


About DIP 50: I will say no but please do not take it personally. It's 
great there is discussion about this, and I encourage it, but at this 
time I think we should make no plans to add macros to D.



Andrei



Re: D parsing

2013-11-11 Thread Jacob Carlborg

On 2013-11-12 03:35, Andrei Alexandrescu wrote:


So... there is rote addition to the context.caller structure. It's just
spelled differently. No?


Have you seen the other posts? Have even read the proposal? If you think 
that I just want a different syntax for __LINE__ and __FILE__ you're 
way, way off. That's just an extremely small part of the proposal. 
__LINE__ and __FILE__ would not be removed.


--
/Jacob Carlborg


Re: D parsing

2013-11-10 Thread Martin Nowak
On Thursday, 7 November 2013 at 13:50:57 UTC, Jacob Carlborg 
wrote:

https://dl.dropboxusercontent.com/u/18386187/ast_macros.html


Can you pour that into a DIP?


Re: D parsing

2013-11-10 Thread Jacob Carlborg

On 2013-11-10 17:37, Martin Nowak wrote:


Can you pour that into a DIP?


Absolutely, I've been meaning to do that.

--
/Jacob Carlborg


Re: D parsing

2013-11-10 Thread deadalnix

On Sunday, 10 November 2013 at 17:58:35 UTC, Jacob Carlborg wrote:

On 2013-11-10 17:37, Martin Nowak wrote:


Can you pour that into a DIP?


Absolutely, I've been meaning to do that.


It requires a bit more precision. But as you know, I'm really behind 
the idea.


Re: D parsing

2013-11-10 Thread Andrei Alexandrescu

On 11/10/13 9:58 AM, Jacob Carlborg wrote:

On 2013-11-10 17:37, Martin Nowak wrote:


Can you pour that into a DIP?


Absolutely, I've been meaning to do that.


Just a thought - I think the best strategy for the time being is to 
explore gainful use of the features already present, instead of adding 
new features.


Andrei




Re: D parsing

2013-11-10 Thread deadalnix
On Sunday, 10 November 2013 at 18:31:32 UTC, Andrei Alexandrescu 
wrote:

On 11/10/13 9:58 AM, Jacob Carlborg wrote:

On 2013-11-10 17:37, Martin Nowak wrote:


Can you pour that into a DIP?


Absolutely, I've been meaning to do that.


Just a thought - I think the best strategy for the time being 
is to explore gainful use of the features already present, 
instead of adding new features.


Andrei


It is also a given that many features could have been avoided in 
the first place given that one.


Still, I'd rather see things settle down a little before 
introducing this. Fixing the existing things, then introducing new 
stuff, seems like the proper way to go.


Re: D parsing

2013-11-10 Thread Jacob Carlborg

On 2013-11-10 19:44, deadalnix wrote:


It is also a given that many feature could have been avoided in the
first place given that one.


I agree. It seems at least a couple of features could have been 
implemented using AST macros instead.


--
/Jacob Carlborg


Re: D parsing

2013-11-10 Thread Timothee Cour
On Sun, Nov 10, 2013 at 12:27 PM, Jacob Carlborg d...@me.com wrote:

 On 2013-11-10 19:44, deadalnix wrote:

  It is also a given that many feature could have been avoided in the
 first place given that one.


 I agree. It seems at least a couple of features could have been
 implemented using AST macros instead.

 --
 /Jacob Carlborg


I agree; eg, just to name 2 features that I proposed that could be simply
implemented via AST macros as Jacob said:
* [proposal: a new string litteral to embed variables in a string
http://forum.dlang.org/post/mailman.94.1383254681.9546.digitalmar...@puremagic.com]
* [Re: feature request: __ARGS__ for logging (cf __FILE__, __LINE__,
__FUNC___ http://forum.dlang.org/post/kegmp0$30dn$1...@digitalmars.com]

It seems we could even get rid of __FILE__,__LINE__,__MODULE__ with AST
macros.


Re: D parsing

2013-11-10 Thread Jacob Carlborg

On 2013-11-10 21:38, Timothee Cour wrote:


I agree; eg, just to name 2 features that I proposed that could be
simply implemented via AST macros as Jacob said:
* [proposal: a new string litteral to embed variables in a string
http://forum.dlang.org/post/mailman.94.1383254681.9546.digitalmar...@puremagic.com]
* [Re: feature request: __ARGS__ for logging (cf __FILE__, __LINE__,
__FUNC___ http://forum.dlang.org/post/kegmp0$30dn$1...@digitalmars.com]

It seems we could even get rid of __FILE__,__LINE__,__MODULE__ with AST
macros.


Let's see, what else. These could, most likely, have been implemented 
using AST macros:


* scope
* foreach
* for
* inlining delegates. I think the current solution to pass a 
delegate/lambda as an alias parameter is a bit weird


One of my favorites that is not already in the language is implementing 
database queries:


Person.where(e => e.name == "John");

will be translated to the following SQL:

select * from person where name = 'John'
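
Since AST macros don't exist in D today, a rough present-day approximation 
(invented here, not part of any library) has to take the condition as a 
compile-time string instead of inspecting a lambda's AST, which is exactly 
the part the macro version would improve on:

// Builds the SQL at compile time via CTFE; table and condition are plain strings.
string where(string table, string condition)()
{
    return "select * from " ~ table ~ " where " ~ condition;
}

static assert(where!("person", "name = 'John'")() ==
              "select * from person where name = 'John'");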

Another one is the property shortcut, see the example for attribute 
macros: http://wiki.dlang.org/DIP50#Attribute_macros


--
/Jacob Carlborg


Re: D parsing

2013-11-10 Thread Jacob Carlborg

On 2013-11-10 17:37, Martin Nowak wrote:


Can you pour that into a DIP?


Done: http://forum.dlang.org/thread/l5otb1$1dhi$1...@digitalmars.com

--
/Jacob Carlborg


Re: D parsing

2013-11-10 Thread Andrei Alexandrescu

On 11/10/13 12:38 PM, Timothee Cour wrote:

On Sun, Nov 10, 2013 at 12:27 PM, Jacob Carlborg d...@me.com wrote:

On 2013-11-10 19:44, deadalnix wrote:

It is also a given that many feature could have been avoided in the
first place given that one.


I agree. It seems at least a couple of features could have been
implemented using AST macros instead.

--
/Jacob Carlborg


I agree; eg, just to name 2 features that I proposed that could be
simply implemented via AST macros as Jacob said:
* [proposal: a new string litteral to embed variables in a string
http://forum.dlang.org/post/mailman.94.1383254681.9546.digitalmar...@puremagic.com]
* [Re: feature request: __ARGS__ for logging (cf __FILE__, __LINE__,
__FUNC___ http://forum.dlang.org/post/kegmp0$30dn$1...@digitalmars.com]

It seems we could even get rid of __FILE__,__LINE__,__MODULE__ with AST
macros.


This would be a very small advantage. The special variables hardly cause 
any trouble.


Andrei



Re: D parsing

2013-11-10 Thread Andrei Alexandrescu

On 11/10/13 1:17 PM, Jacob Carlborg wrote:

On 2013-11-10 21:38, Timothee Cour wrote:


I agree; eg, just to name 2 features that I proposed that could be
simply implemented via AST macros as Jacob said:
* [proposal: a new string litteral to embed variables in a string
http://forum.dlang.org/post/mailman.94.1383254681.9546.digitalmar...@puremagic.com]

* [Re: feature request: __ARGS__ for logging (cf __FILE__, __LINE__,
__FUNC___ http://forum.dlang.org/post/kegmp0$30dn$1...@digitalmars.com]

It seems we could even get rid of __FILE__,__LINE__,__MODULE__ with AST
macros.


Let see, what else. These could, most likely, have been implemented
using AST macros:

* scope
* foreach
* for
* inlining delegates. I think the current solution to pass a
delegate/lambda as an alias parameter is a bit weird

One of my favorites that is not already in the language is implementing
database queries:

Person.where(e => e.name == "John");

Will be translated to the following SQL:

select * from person where name = 'John'

Another one is the property shortcut, see the example for attribute
macros: http://wiki.dlang.org/DIP50#Attribute_macros


The way I see this is, these preexisting features (which shan't be 
thrown away) diminish the motivation for adding another feature.


Andrei




Re: D parsing

2013-11-10 Thread Timon Gehr

On 11/10/2013 11:43 PM, Andrei Alexandrescu wrote:


It seems we could even get rid of __FILE__,__LINE__,__MODULE__ with AST
macros.


This would be a very small advantage. The special variables hardly cause
any trouble.

Andrei


The trouble is that there are too few of them to (efficiently) reflect 
all the information about the calling context one may be interested in. 
I don't see the point in adding them one after another in an 
unprincipled ad-hoc fashion instead of actually thinking up a good API 
coupled with some more expressive power.


Re: D parsing

2013-11-10 Thread Andrei Alexandrescu

On 11/10/13 3:21 PM, Timon Gehr wrote:

On 11/10/2013 11:43 PM, Andrei Alexandrescu wrote:


It seems we could even get rid of __FILE__,__LINE__,__MODULE__ with AST
macros.


This would be a very small advantage. The special variables hardly cause
any trouble.

Andrei


The trouble is that there are too few of them to (efficiently) reflect
all the information about the calling context one may be interested in.
I don't see the point in adding them one after another in an
unprincipled ad-hoc fashion instead of actually thinking up a good API
coupled with some more expressive power.


But the API would add them in exactly the same rote manner. I really 
don't see an advantage here. An argument for macros would have to do a 
lot more than we don't need __FILE__ etc. anymore.


Andrei



Re: D parsing

2013-11-10 Thread Timothee Cour
On Sun, Nov 10, 2013 at 3:37 PM, Andrei Alexandrescu 
seewebsiteforem...@erdani.org wrote:

 On 11/10/13 3:21 PM, Timon Gehr wrote:

 On 11/10/2013 11:43 PM, Andrei Alexandrescu wrote:


 It seems we could even get rid of __FILE__,__LINE__,__MODULE__ with AST
 macros.


 This would be a very small advantage. The special variables hardly cause
 any trouble.

 Andrei


 The trouble is that there are too few of them to (efficiently) reflect
 all the information about the calling context one may be interested in.
 I don't see the point in adding them one after another in an
 unprincipled ad-hoc fashion instead of actually thinking up a good API
 coupled with some more expressive power.


 But the API would add them in exactly the same rote manner. I really don't
 see an advantage here.


All else being equal, a library solution is always preferable; and it's
easily extensible unlike what we have.


 An argument for macros would have to do a lot more than we don't need
 __FILE__ etc. anymore.


More use cases have already been provided in this thread and previously.
The fact that *one* language feature enables all of those is a very good
indication there'll be more use cases. Furthermore, writing AST macros
instead of mixins is cleaner in terms of introspection, refactoring tools,
etc.





 Andrei




Re: D parsing

2013-11-10 Thread Andrei Alexandrescu

On 11/10/13 5:03 PM, Timothee Cour wrote:

On Sun, Nov 10, 2013 at 3:37 PM, Andrei Alexandrescu
seewebsiteforem...@erdani.org
wrote:

On 11/10/13 3:21 PM, Timon Gehr wrote:

On 11/10/2013 11:43 PM, Andrei Alexandrescu wrote:


It seems we could even get rid of
__FILE__,__LINE__,__MODULE__ with AST
macros.


This would be a very small advantage. The special variables
hardly cause
any trouble.

Andrei


The trouble is that there are too few of them to (efficiently)
reflect
all the information about the calling context one may be
interested in.
I don't see the point in adding them one after another in an
unprincipled ad-hoc fashion instead of actually thinking up a
good API
coupled with some more expressive power.


But the API would add them in exactly the same rote manner. I really
don't see an advantage here.


All else being equal, a library solution is always preferable; and it's
easily extensible unlike what we have.


What do you have in mind? I find it difficult to grab e.g. the file and 
the line unless there's some plumbing in the language that allows you to.
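For reference, the plumbing in question is the default-argument rule already in
the language: __FILE__ and __LINE__ used as default arguments are evaluated at
the call site, which is what makes the current approach work at all. A minimal
sketch (log is a made-up function):

```d
import std.stdio : writefln;

// __FILE__ and __LINE__ as default arguments are filled in at the
// call site, so a library function can see where it was called from.
void log(string msg, string file = __FILE__, size_t line = __LINE__)
{
    writefln("%s(%s): %s", file, line, msg);
}

void main()
{
    log("something happened"); // prints this module's file name and this line
}
```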



An argument for macros would have to do a lot more than we don't
need __FILE__ etc. anymore.


More use cases have already been provided in this thread and previously.
The fact that *one* language feature enables all of those is a very good
indication there'll be more use cases. Furthermore, writing AST macros
instead of mixins is cleaner in terms of introspection, refactoring
tools, etc.


I am aware of the advantages. You, too, should be aware of the 
disadvantages.



Andrei




Re: D parsing

2013-11-07 Thread Jacob Carlborg

On 2013-11-06 09:34, Chad Joan wrote:


Also, IIRC, it is believed that string mixins with CTFE are potentially
more powerful.  I am under the assumption that Walter is taking the long
view and waiting for the community to furnish their own powerful AST
manipulation tools using the existing spec.  I suspect that he is
opposed to baking AST manipulation into the /language spec/, but is
perfectly accepting of the notion of using AST manipulation to generate
code, reduce boilerplate, implement exotic features and DSLs, and so
on.  Just don't complicate the core language any more than it already
is. Sorry if I misrepresented you Walter; I can only make educated
guesses ;)


The problem with this is the ugly syntax: it requires a string mixin, and
the source code needs to be embedded in strings. You also need a
complete front end that works at compile time.
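For comparison, a minimal sketch of the string-mixin style being criticized
here; the generated code is assembled inside a string, and all names below are
made up:

```d
// The "macro" body is ordinary D source built up as a string,
// which is exactly the ugliness being pointed out above.
string makeProperty(string type, string name)
{
    return type ~ " _" ~ name ~ ";\n"
         ~ "@property " ~ type ~ " " ~ name ~ "() { return _" ~ name ~ "; }\n"
         ~ "@property void " ~ name ~ "(" ~ type ~ " v) { _" ~ name ~ " = v; }";
}

struct Person
{
    mixin(makeProperty("string", "name"));
}

void main()
{
    Person p;
    p.name = "John";
    assert(p.name == "John");
}
```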


This is my vision of AST macros: 
https://dl.dropboxusercontent.com/u/18386187/ast_macros.html


--
/Jacob Carlborg


Re: D parsing

2013-11-07 Thread Jacob Carlborg

On 2013-11-07 00:23, Philippe Sigaud wrote:


Well, I remember him saying that he doesn't want everybody and their dog
to change the D syntax. If we get macros, that's exactly what we can do.
For example, telling the compiler (or another tool), to rewrite

int a := 1;

into

immutable a = 1;


It depends on what the AST macros allows you to do. I guess everyone 
here has her/his own definition.



But that means the original not-D code cannot be compiled by anyone with
a D compiler anymore (obviously). And soon everybody is transforming the
grammar according to what they think is cool.
That's his PoV IIRC.

I'm more interested in getting the ability to define my own lowerings: in
the same way that scope statements or foreach over ranges are rewritten
into lower-level D code by the compiler, I would like to be able to
define my own syntactic constructs.


My idea of AST macros[1] is quite similar to those in Scala and a bit 
similar to what you want. The most important thing to remember is that 
AST macros cannot create new syntax. It only works at the semantic level 
of the language. That means the above example you wrote doesn't
work, since that isn't legal D syntax. What would work is doing 
something like this:


auto a = Person.where(e => e.name == "John");

In this case where is a macro, which translates the given AST into the
following SQL query:


select * from person where name = 'John'

Another thing I would really like is to be able to pass a delegate to a
function after the argument list, if the function takes a delegate as its
last parameter:


void each (T) (T[] arr, void delegate (T) dg);

[1, 2, 3].each(e) {
writeln(e);
}

Of course, the compiler needs to be able to inline the delegate.
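Presumably such a call would be lowered to the ordinary form below; this is a
sketch of the assumed lowering, not anything the proposal itself specifies:

```d
import std.stdio : writeln;

void each(T)(T[] arr, void delegate(T) dg)
{
    foreach (item; arr)
        dg(item);
}

void main()
{
    // What the trailing-delegate syntax above would presumably expand to:
    [1, 2, 3].each((int e) {
        writeln(e);
    });
}
```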


Plus, everybody here knows that AST macros can cure world hunger :)
Wouldn't that be nice?


Of course :)

[1] https://dl.dropboxusercontent.com/u/18386187/ast_macros.html

--
/Jacob Carlborg


Re: D parsing

2013-11-06 Thread Chad Joan
On Tuesday, 5 November 2013 at 14:54:34 UTC, Dmitry Olshansky 
wrote:


I was also toying with the idea of exposing Builder interface 
for std.regex. But push/pop IMHO are better be implicitly 
designed-out:


auto re = 
atom('x').star(charClass(unicode.Letter),atom('y')).build();


... and letting the nesting be explicit.

Is the same as:
auto re = regex(`x(?:\p{L}y)*`);

Aimed for apps/libs that build regular expressions anyway and 
have no need in textual parser.




Interesting.  I like how it induces some amount of static 
verification, though I worry that it could harm procedural 
generation of grammars.  It would be difficult, for instance, to 
use that API to do the equivalent of pushing an atom in one 
function and popping it in another.


I wonder if we are at different levels of abstraction.  The 
example you give seems like it requires the API to remember, in a 
structured way, all of the information presented by the 
call-chaining and call-nesting.  I might implement something like 
that with a stateful builder object under the hood.  However, 
the example you give seems like it is closer to what a regex 
engine would morph an expression into, thus making it a higher 
abstraction.


That snippet would create a parser that recognizes the grammar
'x' ( 'y'? ).
The current fledgling implementation creates this parser:
http://pastebin.com/MgSqWXE2

Of course, no one would be expected to write grammars like that. It
would be the job of tools like Pegged or std.regex to package it up in
nice syntax that is easy to use.


I thought to provide some building blocks for that with new 
std.uni. Not quite everything I wanted, but now at least there 
is one set of wheels less to reinvent.




I haven't looked at std.uni earnestly yet, but if it succeeds at 
making that unicode/utf jungle manageable, then I will be 
incredibly thankful.



[snip]

Another fun thought: PEGs can have look-behind that includes 
any regular
elements without any additional algorithmic complexity. Just 
take all of
the look-behinds in the grammar, mash them together into one 
big
regular-expression using regular alternation (|), and then 
have the
resulting automaton consume in lock-step with the PEG parser.  
Whenever
the PEG parser needs to do a lookbehind, it just checks to see 
if the
companion automaton is in a matching state for the capture it 
needs.


Sounds quite slow to do it just in case. Also complete DFAs 
tend to be mm quite big.




I was envisioning it being done lazily and strictly as-needed, if 
I even got around to doing it at all.


What ANTLR does is similar technique - a regular lookahead to 
resolve ambiguity in the grammar (implicitly). A lot like LL(k) 
but with unlimited length (so called LL(*)). Of course, it 
generates LL(k) disambiguation where possible, then LL(*), 
failing that the usual backtracking.




Neat.



*sigh*, I feel like I could write a paper on this stuff if I 
were in
grad school right now.  Alas, I am stuck doing 50-60 hours a 
week of

soul-sucking business programming.


I heard Sociomantic is hiring D programmers for coding some 
awesome stuff, you may as well apply :)




Tempting.

And it seems even Facebook is joining the market now, which is 
news to me.



Well, then again, my understanding
is that even though I can think of things that seem like they 
would make
interesting topics for publishable papers, reality would have 
the profs
conscript me to do completely different things that are 
possibly just as

inane as the business programming.


Speaking for my limited experience - at times it's like that.



Yay, someone's observations corroborate mine!
...
Crap, someone observations corroborate mine :(

;)

I worry that the greater threat to good AST manipulation tools 
in D is a

lack of free time, and not the DMD bugs as much.


Good for you I guess, my developments in related area are 
blocked still :(




Well, hopefully you've got the wind at your back now.


Re: D parsing

2013-11-06 Thread Jacob Carlborg

On 2013-11-05 17:55, Philippe Sigaud wrote:


Walter is far from convinced that AST manipulation is a good thing. You
would have to convince him first. His fear is that it will lead to
unreadable code, and everyone using her own personnal version of D.
AFAICT, nothing of the sort happened in Lisp (I mean, Lispers have
balkanization, but *not* due to AST manipulation).


You know, Walter did a talk about AST macros at the first D conference. 
The idea back then was to add AST macros, hence the macro keyword.


--
/Jacob Carlborg


Re: D parsing

2013-11-06 Thread Chad Joan
On Tuesday, 5 November 2013 at 16:31:52 UTC, Philippe Sigaud 
wrote:
On Tue, Nov 5, 2013 at 5:45 AM, Chad Joan chadj...@gmail.com 
wrote:


 Use the repetition operator(s) and turn the resulting array 
into
whatever tree you like.  In practice, I have never had a 
problem with this.


I have used both Pegged and have written an SQL parser at 
work (not
exhaustive, just what's needed) that uses C macros as PEG 
elements.
 Tangent: Using C macros for this worked surprisingly well 
and allowed me
to avoid an extra step in the build process (I do not have 
access to D for
that project).  Pegged is still much more scalable, 
optimizable, and the

grammars are a lot less ugly/finicky.




That made my day, thanks!


You're very welcome!

Thank you for spending all that time to make a neat thing.

Maybe I'll rub it in:
Pegged taught me about PEGs and how to build reasonably powerful 
parsers with extremely limited tools and support.  It's like 
being able to make shelter in the woods when no one gives you a 
tent.  And my company's codebase is a jungle.  So even when D 
isn't around, it is still /useful/.  Thanks.


Re: D parsing

2013-11-06 Thread Jacob Carlborg

On 2013-11-05 18:23, Martin Nowak wrote:


Sounds promising.
I think I found a way to implement a D REPL but are lacking a reliable
way to classify code snippets (expressions, declarations, statements)
and get introduced symbol names.
Do you think it will be able to handle that (performance is not an issue)?

If there is no parser I either have to use a crude heuristic
or I could use machine learning to classify the code based on token
histograms (after all we have at least 2 D lexers).


There's a parser available, DScanner: 
https://github.com/Hackerpilot/Dscanner. It's part of DCD, autocomplete 
daemon.


--
/Jacob Carlborg


Re: D parsing

2013-11-06 Thread Chad Joan
On Tuesday, 5 November 2013 at 20:17:07 UTC, Andrei Alexandrescu 
wrote:


3. Grammar changes are the simplest ones and in a way the most 
embarrassing if they happen. The best solution I see to that is 
deriving the documentation and the actual parser from the same 
source. This is part of why I'm so keen on parser generators.




This.

This is why I have a hard time accepting any "formalize a grammar
first" arguments.  I think any grammars for D will be pure
speculation until they become executable and start to see heavy 
use in the tools that we use.


Re: D parsing

2013-11-06 Thread Chad Joan
On Wednesday, 6 November 2013 at 08:19:13 UTC, Jacob Carlborg 
wrote:

On 2013-11-05 17:55, Philippe Sigaud wrote:

Walter is far from convinced that AST manipulation is a good 
thing. You
would have to convince him first. His fear is that it will 
lead to
unreadable code, and everyone using her own personnal version 
of D.
AFAICT, nothing of the sort happened in Lisp (I mean, Lispers 
have

balkanization, but *not* due to AST manipulation).


You know, Walter did a talk about AST macros at the first D 
conference. The idea back then was to add AST macros, hence the 
macro keyword.


Also, IIRC, it is believed that string mixins with CTFE are 
potentially more powerful.  I am under the assumption that Walter 
is taking the long view and waiting for the community to furnish 
their own powerful AST manipulation tools using the existing 
spec.  I suspect that he is opposed to baking AST manipulation 
into the /language spec/, but is perfectly accepting of the 
notion of using AST manipulation to generate code, reduce 
boilerplate, implement exotic features and DSLs, and so on.  Just 
don't complicate the core language any more than it already is.  
Sorry if I misrepresented you Walter; I can only make educated 
guesses ;)


Re: D parsing

2013-11-06 Thread Chris Cain

On Wednesday, 6 November 2013 at 08:34:32 UTC, Chad Joan wrote:
Also, IIRC, it is believed that string mixins with CTFE are 
potentially more powerful.


Maybe. But first, the point that Brian Schott brought up has to 
be addressed (language spec and DMD must be in sync). Second, 
CTFE has to get more efficient. It's _very_ easy to make it run 
out of memory and it's fairly slow at executing. But yeah, I 
agree that there could be a library AST manipulator and it'd be 
pretty nice. But it'll require that two sets of parsers in two 
different languages be kept in sync, which is likely to be a 
challenge.


There's pros and cons to both of the approaches, really.


Re: D parsing

2013-11-06 Thread Chad Joan

On Wednesday, 6 November 2013 at 09:09:32 UTC, Chris Cain wrote:

On Wednesday, 6 November 2013 at 08:34:32 UTC, Chad Joan wrote:
Also, IIRC, it is believed that string mixins with CTFE are 
potentially more powerful.


Maybe. But first, the point that Brian Schott brought up has to 
be addressed (language spec and DMD must be in sync). Second, 
CTFE has to get more efficient. It's _very_ easy to make it run 
out of memory and it's fairly slow at executing. But yeah, I 
agree that there could be a library AST manipulator and it'd be 
pretty nice. But it'll require that two sets of parsers in two 
different languages be kept in sync, which is likely to be a 
challenge.


There's pros and cons to both of the approaches, really.


Right.

My thoughts are, be patient.  The slow CTFE can be fixed 
without changes to the language spec, and it has progressed 
really well over the last couple years.  The DMD vs spec thing is 
also fixable by building tools that operate on a formal grammar 
and forcing the grammar to be thoroughly explored.  At the end of 
the day, I believe these will both be pleasantly fixed, and the 
spec won't end up with a bunch of complexity that was useful 10 
years ago.  It all makes sense to me, at least.


Re: D parsing

2013-11-06 Thread Philippe Sigaud
On Wed, Nov 6, 2013 at 9:19 AM, Jacob Carlborg d...@me.com wrote:


 You know, Walter did a talk about AST macros at the first D conference.
 The idea back then was to add AST macros, hence the macro keyword.


I know. But since then, I guess he changed his mind, or thought about it a
bit more :-)


Re: D parsing

2013-11-06 Thread Philippe Sigaud
On Wed, Nov 6, 2013 at 9:34 AM, Chad Joan chadj...@gmail.com wrote:

 On Wednesday, 6 November 2013 at 08:19:13 UTC, Jacob Carlborg wrote:

 On 2013-11-05 17:55, Philippe Sigaud wrote:

  Walter is far from convinced that AST manipulation is a good thing. You
 would have to convince him first. His fear is that it will lead to
 unreadable code, and everyone using her own personnal version of D.
 AFAICT, nothing of the sort happened in Lisp (I mean, Lispers have
 balkanization, but *not* due to AST manipulation).


 You know, Walter did a talk about AST macros at the first D conference.
 The idea back then was to add AST macros, hence the macro keyword.


 Also, IIRC, it is believed that string mixins with CTFE are potentially
 more powerful.  I am under the assumption that Walter is taking the long
 view and waiting for the community to furnish their own powerful AST
 manipulation tools using the existing spec.  I suspect that he is opposed
 to baking AST manipulation into the /language spec/, but is perfectly
 accepting of the notion of using AST manipulation to generate code, reduce
 boilerplate, implement exotic features and DSLs, and so on.  Just don't
 complicate the core language any more than it already is.  Sorry if I
 misrepresented you Walter; I can only make educated guesses ;)


Well, I remember him saying that he doesn't want everybody and their dog to
change the D syntax. If we get macros, that's exactly what we can do. For
example, telling the compiler (or another tool), to rewrite

int a := 1;

into

immutable a = 1;

But that means the original not-D code cannot be compiled by anyone with a
D compiler anymore (obviously). And soon everybody is transforming the
grammar according to what they think is cool.
That's his PoV IIRC.

I'm more interested in getting the ability to define my own lowerings: in
the same way that scope statements or foreach over ranges are rewritten
into lower-level D code by the compiler, I would like to be able to define
my own syntactic constructs.
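For instance, the range foreach lowering that already exists looks roughly
like this (using std.range's iota just to have something to iterate over):

```d
import std.range : iota;
import std.stdio : writeln;

void main()
{
    // What gets written:
    foreach (e; iota(3))
        writeln(e);

    // Roughly what the compiler rewrites it into:
    for (auto r = iota(3); !r.empty; r.popFront())
    {
        auto e = r.front;
        writeln(e);
    }
}
```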

Plus, everybody here knows that AST macros can cure world hunger :) Wouldn't
that be nice?


Re: D parsing

2013-11-06 Thread Philippe Sigaud
On Tue, Nov 5, 2013 at 6:39 PM, Martin Nowak c...@dawg.eu wrote:


 Like many others I'm hoping for a nice general parser generator for quite
 some time.
 So I'm also asking specifically about your insights on PEGs.
 From what I know they do have some memory issues due to the backtrace
 memoization (only for deep parse trees though, which aren't common in
 programming languages). Also it seems that the research community has some
 issues to formalize PEGs and what grammars they are capable to handle.


IIRC, they don't fit exactly inside the Chomsky hierarchy. But then, neither
do other grammars and parsing algorithms.



 Also why is the expr grammar example so complicated?


What do you mean?


Re: D parsing

2013-11-06 Thread Philippe Sigaud
On Tue, Nov 5, 2013 at 6:45 PM, Martin Nowak c...@dawg.eu wrote:

 On 11/03/2013 02:45 AM, Timothee Cour wrote:

 1)
 The main issue I see with pegged is PEG grammars don't support left
 recursion, so for example will fail on foo[1].bar(2).fun().
 Unless there's a plan to accomodate those, I sense a dead end.
 One can eliminate left recursion but this has issues.

 2)
 There is some material on extending PEG to support those, eg Left
 Recursion in Parsing Expression Grammars, or code
 https://github.com/orlandohill/peg-left-recursion but I don't know how
 well they work in practice.

  Scala extended their PEG generator to handle left recursion.
 But then there is also a Scala GLL implementation.
 https://github.com/djspiewak/gll-combinators


I didn't know that. My 'programming in Scala' book is a bit dated.
I'll have a look, thanks.


Re: D parsing

2013-11-06 Thread Brian Schott
On Wednesday, 6 November 2013 at 08:21:57 UTC, Jacob Carlborg 
wrote:

It's part of DCD, autocomplete daemon.


Side note: DCD 0.2.0 is in beta now. There's a thread on the .ide 
list. 
http://forum.dlang.org/thread/flxjzwuaadnkrixry...@forum.dlang.org




Re: D parsing

2013-11-05 Thread Manfred Nowak
Brian Schott wrote:

 Why am I the only one who thinks this is a problem?

meetoo.

-manfred


Re: D parsing

2013-11-05 Thread Dmitry Olshansky

05-Nov-2013 10:36, Chad Joan wrote:

On Friday, 1 November 2013 at 21:22:46 UTC, Andrei Alexandrescu wrote:

...
Bugs stopping Pegged from going forward should receive high priority.
I also encourage others to participate to that and similar work.


Andrei



[snip]


I was working on a parser-engine-as-a-library that could be used as
optimized internals for Pegged, as well as any other tools that need to
recognize these common patterns.

The idea was to expose an API like this:

 string makeParser()
 {
 auto builder = new ParserBuilder!char;
 builder.pushSequence();
 builder.literal('x');
 builder.pushMaybe();
 builder.literal('y');
 builder.pop();
 builder.pop();
 return builder.toDCode(callMe);
 }


I was also toying with the idea of exposing a Builder interface for
std.regex. But push/pop IMHO are better implicitly designed out:


auto re = atom('x').star(charClass(unicode.Letter),atom('y')).build();

... and letting the nesting be explicit.

Is the same as:
auto re = regex(`x(?:\p{L}y)*`);

Aimed for apps/libs that build regular expressions anyway and have no 
need in textual parser.



That snippet would create a parser that recognizes the grammar 'x' (
'y'? ).
The current fledgling implementation creates this parser:
http://pastebin.com/MgSqWXE2

Of course, no one would be expected to write grammars like that. It
would be the job of tools like Pegged or std.regex to package it up in
nice syntax that is easy to use.


I thought to provide some building blocks for that with new std.uni. Not 
quite everything I wanted, but now at least there is one set of wheels 
less to reinvent.


[snip]


Another fun thought: PEGs can have look-behind that includes any regular
elements without any additional algorithmic complexity. Just take all of
the look-behinds in the grammar, mash them together into one big
regular-expression using regular alternation (|), and then have the
resulting automaton consume in lock-step with the PEG parser.  Whenever
the PEG parser needs to do a lookbehind, it just checks to see if the
companion automaton is in a matching state for the capture it needs.


Sounds quite slow to do it just in case. Also complete DFAs tend to be 
mm quite big.


What ANTLR does is a similar technique - a regular lookahead to resolve
ambiguity in the grammar (implicitly). A lot like LL(k) but with 
unlimited length (so called LL(*)). Of course, it generates LL(k) 
disambiguation where possible, then LL(*), failing that the usual 
backtracking.




*sigh*, I feel like I could write a paper on this stuff if I were in
grad school right now.  Alas, I am stuck doing 50-60 hours a week of
soul-sucking business programming.


I heard Sociomantic is hiring D programmers for coding some awesome 
stuff, you may as well apply :)



Well, then again, my understanding
is that even though I can think of things that seem like they would make
interesting topics for publishable papers, reality would have the profs
conscript me to do completely different things that are possibly just as
inane as the business programming.


Speaking from my limited experience - at times it's like that.


I worry that the greater threat to good AST manipulation tools in D is a
lack of free time, and not the DMD bugs as much.


Good for you I guess, my developments in related area are blocked still :(


I hope this is useful to someone!



--
Dmitry Olshansky


Re: D parsing

2013-11-05 Thread Philippe Sigaud
On Tue, Nov 5, 2013 at 5:45 AM, Chad Joan chadj...@gmail.com wrote:

  Use the repetition operator(s) and turn the resulting array into
 whatever tree you like.  In practice, I have never had a problem with this.

 I have used both Pegged and have written an SQL parser at work (not
 exhaustive, just what's needed) that uses C macros as PEG elements.
  Tangent: Using C macros for this worked surprisingly well and allowed me
 to avoid an extra step in the build process (I do not have access to D for
 that project).  Pegged is still much more scalable, optimizable, and the
 grammars are a lot less ugly/finicky.


That made my day, thanks!


Re: D parsing

2013-11-05 Thread Philippe Sigaud
On Tue, Nov 5, 2013 at 3:54 PM, Dmitry Olshansky dmitry.o...@gmail.com wrote:




 I was also toying with the idea of exposing Builder interface for
 std.regex. But push/pop IMHO are better be implicitly designed-out:

 auto re = atom('x').star(charClass(unicode.Letter),atom('y')).build();

 ... and letting the nesting be explicit.

 Is the same as:
 auto re = regex(`x(?:\p{L}y)*`);

 Aimed for apps/libs that build regular expressions anyway and have no need
 in textual parser.


Another possible advantage is to reference external names inside your
construction, thus naming other regexen or referencing external variables to
deposit backreferences inside them. All in all, to get a regex construct
that can interact with the external world.




 What ANTLR does is similar technique - a regular lookahead to resolve
 ambiguity in the grammar (implicitly). A lot like LL(k) but with unlimited
 length (so called LL(*)). Of course, it generates LL(k) disambiguation
 where possible, then LL(*), failing that the usual backtracking.


I liked that idea since the author added it to ANTLR, but I never used it
since.
I wonder whether that can be implemented inside another parser generator or
if it uses some specific-to-ANTLR internal machinery.





 *sigh*, I feel like I could write a paper on this stuff if I were in
 grad school right now.  Alas, I am stuck doing 50-60 hours a week of
 soul-sucking business programming.


 I heard Sociomantic is hiring D programmers for coding some awesome stuff,
 you may as well apply :)




Well, I do 50-60 hours a week of fulfilling and challenging work, but the
free time at the end is the same ;) I feel Chad's pain, though.




  Well, then again, my understanding
 is that even though I can think of things that seem like they would make
 interesting topics for publishable papers, reality would have the profs
 conscript me to do completely different things that are possibly just as
 inane as the business programming.

 Speaking for my limited experience - at times it's like that.


 I worry that the greater threat to good AST manipulation tools in D is a
 lack of free time, and not the DMD bugs as much.


 Good for you I guess, my developments in related area are blocked still :(



Walter is far from convinced that AST manipulation is a good thing. You
would have to convince him first. His fear is that it will lead to
unreadable code, and everyone using her own personal version of D.

AFAICT, nothing of the sort happened in Lisp (I mean, Lispers have
balkanization, but *not* due to AST manipulation).


Re: D parsing

2013-11-05 Thread Philippe Sigaud
On Tue, Nov 5, 2013 at 8:27 AM, Brian Schott briancsch...@gmail.com wrote:

 But we should take a step back first. Before we try to implement a parser
 for D's grammar, we need to figure out what exactly D's grammar is.

 Seriously. We don't have a real grammar for D. We have the language spec
 on dlang.org, but it isn't complete, consistent, or updated when the
 language changes. Want examples? I have a tracker for them here:
 http://d.puremagic.com/issues/show_bug.cgi?id=10233

 There's also my project here: https://github.com/Hackerpilot/DGrammar,
 but it's not official and I keep finding differences between it and the
 behavior of DMD.

 Why am I the only one who thinks this is a problem?


You're not alone in this, but Walter's answer is that we must make pull
requests on the website if we find errors in the online grammar. The
problem is not knowing what the intended behavior is. When the reference
compiler and the grammar disagree, who am I to say which is wrong?

We could also see if some slight modification to the grammar could push it
into LALR(1) territory.


Re: D parsing

2013-11-05 Thread Martin Nowak

On 11/02/2013 05:06 AM, Manfred Nowak wrote:

One prerequisite for every PEG-Parser is, that the language has to be
designed to be without any ambiguity.


AFAIK Parser expression grammars handle ambiguity through prioritization.
You'd still need to rewrite grammar rules that involve left-recursion or 
hidden left recursion.


Re: D parsing

2013-11-05 Thread Martin Nowak

On 11/01/2013 09:59 PM, Philippe Sigaud wrote:

The examples directory shows different grammars, from JSON to XML to C to D.


Sounds promising.
I think I found a way to implement a D REPL but am lacking a reliable
way to classify code snippets (expressions, declarations, statements) 
and get introduced symbol names.

Do you think it will be able to handle that (performance is not an issue)?

If there is no parser I either have to use a crude heuristic
or I could use machine learning to classify the code based on token 
histograms (after all we have at least 2 D lexers).


Re: D parsing

2013-11-05 Thread Martin Nowak

On 11/01/2013 09:59 PM, Philippe Sigaud wrote:


I used it with the D grammar, but the one on the website is woefully
inadequate (some mistakes, out of date compared to the compiler and
written in a somewhat convoluted style full of left-recursion). The
shortcomings are that the generated parser is quite slow compared to
other D parsers.

That comes from my coding, of course: Pegged generates a simple
recursive descent parser. I guess I could push for a better engine, but
I'm waiting for CTFE to get a bit better.

The advantages are nice, though: full parse tree, semantic actions, and
so on.


Like many others I've been hoping for a nice general parser generator for
quite some time.

So I'm also asking specifically about your insights on PEGs.
From what I know they do have some memory issues due to the backtrace 
memoization (only for deep parse trees though, which aren't common in 
programming languages). Also it seems that the research community has 
some issues formalizing PEGs and what grammars they are capable of handling.

Also why is the expr grammar example so complicated?


Re: D parsing

2013-11-05 Thread Martin Nowak

On 11/03/2013 02:45 AM, Timothee Cour wrote:

1)
The main issue I see with pegged is PEG grammars don't support left
recursion, so for example will fail on foo[1].bar(2).fun().
Unless there's a plan to accomodate those, I sense a dead end.
One can eliminate left recursion but this has issues.

2)
There is some material on extending PEG to support those, eg Left
Recursion in Parsing Expression Grammars, or code
https://github.com/orlandohill/peg-left-recursion but I don't know how
well they work in practice.


Scala extended their PEG generator to handle left recursion.
But then there is also a Scala GLL implementation.
https://github.com/djspiewak/gll-combinators



Re: D parsing

2013-11-05 Thread Manfred Nowak
Martin Nowak wrote:
 AFAIK Parser expression grammars handle ambiguity through prioritization.

My statement about ambiguity is taken from the first sentence of chapter 7
of "Parsing Expression Grammars: A Recognition-Based Syntactic Foundation"
by Brian Ford: http://pdos.csail.mit.edu/papers/parsing%3Apopl04.pdf:

| Parsing expression grammars provide a [...] foundation for [...]
| languages that are designed to be unambiguous.

-manfred


Re: D parsing

2013-11-05 Thread Martin Nowak

On 10/30/2013 11:39 PM, jgetner wrote:

Is there a function in D for evaluating expressions before compile time?


I want to throw two more ideas into the parser generator topic which
I have carried around for quite some time. I think combined they'd enable AST
generation from clean and simple grammars. Maybe someone finds this
interesting enough to give it a try; I will hardly ever get to this.


There is an interesting paper [MTP] that describes how a slight
modification of the EBNF grammar can be used to generate a usable AST
from the grammar definition. Each AST node is named like its
production. This is possible because the modified grammar disallows
mixing alternates and sequences, which forces one to name all productions.


The other paper is [GLL] parsing, which is a new general parser
generation technique. The main advantage over GLR is that it's a top-down
parser. This allows modularizing grammars, e.g. you can define a
grammar that reuses/imports an existing grammar for expressions.


The reason to choose generalized parsers instead of, say, LALR(1)
is maintainability of the grammars. The modifications needed to make
a context-free grammar LR(1)- or LALR(1)-compatible often make it
extremely hard to read and change.
Parse forests (many parse trees) are not a problem I think, because most
CFGs only have local ambiguities, so they resolve to a single parse tree.


### A motivating example (desiderata)

grammar.d
```
import number;

Identifier = [_a-zA-Z][a-zA-Z]*;

// This still needs some syntax to annotate precedence and associativity
MulExpr = Expr "*" Expr;
DivExpr = Expr "/" Expr;
AddExpr = Expr "+" Expr;
SubExpr = Expr "-" Expr;
PowExpr = Expr "^^" Expr;

Expr = MulExpr | DivExpr | AddExpr | SubExpr | PowExpr | Number | Identifier;

Arguments = Expr % ",";

ExprStmt = Expr ";";
CallStmt = Identifier "(" Arguments ")";

Stmt = ExprStmt | CallStmt;
Stmts = Stmt*;
```
```d
auto p = genParser(import("grammar.d"));
auto stmts = p.parseStmts(input);
auto expr = p.parseExpr(input);
foreach (arg; p.parseArguments(input))
{}

final switch (expr)
{
case MulExpr.tag:
case DivExpr.tag:
case AddExpr.tag:
case SubExpr.tag:
case PowExpr.tag:
}

static assert(is(typeof(MulExpr) == Tuple!(Expr, "expr0", Expr, "expr1")));
static assert(is(typeof(Stmt) == Variant!(ExprStmt, CallStmt)));
static assert(is(typeof(Identifier) == string));
```

[MTP]: http://babel.ls.fi.upm.es/research/mtp/
   http://babel.ls.fi.upm.es/~angel/papers/finalcopy-2005prole.pdf

[GLL]: http://www.rhul.ac.uk/computerscience/research/csle/gllparsers.aspx
   http://dotat.at/tmp/gll.pdf
   for implementers - Modelling GLL Parser Implementations 
(http://dx.doi.org/10.1007/978-3-642-19440-5_4)


Re: D parsing

2013-11-05 Thread Martin Nowak

On 11/04/2013 06:52 AM, Philippe Sigaud wrote:

Do you know what part of the D grammar makes it non-LALR(1)?


At least function and template function declarations are not even LR(1).
You need to look for a second left parenthesis to distinguish both.
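For concreteness, a minimal pair showing why one parenthesis of lookahead is
not enough:

```d
// Both declarations start identically; only the token after the first
// closing parenthesis tells a plain function from a function template.
int f(int x) { return x; }          // plain function declaration
int g(int x)(int y) { return y; }   // template: (int x) is the template
                                    // parameter list
```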

It's fairly easy to solve this though.
1. Add a new token r")[ ]*("
2. Add an alternate rule that never matches but forces the parser to 
look behind the closing paren of the function arguments.

...


Re: D parsing

2013-11-05 Thread Dmitry Olshansky

05-Nov-2013 20:55, Philippe Sigaud wrote:

On Tue, Nov 5, 2013 at 3:54 PM, Dmitry Olshansky dmitry.o...@gmail.com wrote:


I was also toying with the idea of exposing Builder interface for
std.regex. But push/pop IMHO are better be implicitly designed-out:

auto re =
atom('x').star(charClass(unicode.Letter),atom('y')).build();

... and letting the nesting be explicit.

Is the same as:
auto re = regex(`x(?:\p{L}y)*`);

Aimed for apps/libs that build regular expressions anyway and have
no need in textual parser.

Another possible advantage is to reference external names inside your
construction, thus naming other regexen or referencing external variables
to deposit backreferences inside them.


Actually it's a bad, bad idea. It has nice potential to destroy all 
optimization opportunities and performance guarantees of it (like being 
linear in time, and that only works today w/o funky extensions used).


After all I'm in a curious position of having to do some work at R-T as 
well where you can't always just generate some D code ;)


What would be really nice though is to let users register their own
dictionary of 'tokens' from that. Then things like an IPv4 pattern or a
domain name pattern would be as simple as the `\d` pieces they use today
(say with \i{user-defined-name}).


 All in all, to get a regex
 construct that can interact with the external world.

Well, I can think of some rather interesting ways to do it even w/o tying in
some external stuff as building blocks. It's rather making std.regex 
itself less rigid and more lean (as in cheap to invoke). Then external 
modules may slice and dice its primitives as seen fit.




What ANTLR does is similar technique - a regular lookahead to
resolve ambiguity in the grammar (implicitly). A lot like LL(k) but
with unlimited length (so called LL(*)). Of course, it generates
LL(k) disambiguation where possible, then LL(*), failing that the
usual backtracking.

I liked that idea since the author added it to ANTLR, but I never used
it since.
I wonder whether that can be implemented inside another parser generator
or if it uses some specific-to-ANTLR internal machinery.


I don't think there is much ANTLR-specific in it. You would, though, have to
accept it's no longer a PEG but rather some hybrid top-down EBNF parser
that resolves ambiguities.



I worry that the greater threat to good AST manipulation tools
in D is a
lack of free time, and not the DMD bugs as much.


Good for you I guess, my developments in related area are blocked
still :(

Walter is far from convinced that AST manipulation is a good thing. You
would have to convince him first.


I thought it was about tools that work with D code, like, say, linters,
refactoring, etc.



--
Dmitry Olshansky


Re: D parsing

2013-11-05 Thread Andrei Alexandrescu

On 11/4/13 11:27 PM, Brian Schott wrote:

On Monday, 4 November 2013 at 13:20:01 UTC, Timothee Cour wrote:

for lexing there's already dscanner we could use (while we wait for
perhaps
a autogenerated lexer);
so I think priority is on the autogenerated parser (dscanner has one but
hand designed), where it's still unknown what will work well.


Yes, that tool has two properties:
1) It works now. Not Soon(tm). You can download it, compile it, and use
it to dump the AST of your D code in just a minute or two.
2) It wasn't built THE ONE TRUE WAY.

But we should take a step back first. Before we try to implement a
parser for D's grammar, we need to figure out what exactly D's grammar is.

Seriously. We don't have a real grammar for D. We have the language spec
on dlang.org, but it isn't complete, consistent, or updated when the
language changes. Want examples? I have a tracker for them here:
http://d.puremagic.com/issues/show_bug.cgi?id=10233

There's also my project here: https://github.com/Hackerpilot/DGrammar,
but it's not official and I keep finding differences between it and the
behavior of DMD.

Why am I the only one who thinks this is a problem?


I agree it's a problem, in fact three problems in one. In decreasing 
difficulty order:


1. Semantic changes for working code (e.g. order of evaluation etc) are 
subtle enough to be very difficult to track and require sheer attention 
and careful manual verification and maintenance of documentation.


2. Semantic analysis changes (i.e. compiles/doesn't compile) are also 
difficult and require attention, but at least can be to a good extent 
verified automatically (by means of test suites and runnable examples). 
In TDPL I have two categories of examples - visible and invisible. The 
invisible ones do not occur in the printed text but are present in the 
book source and are used to check whether the claims made by the book 
are true. It would be really cool if we had something like that for the 
online documentation. We should be able to freely intersperse
documentation text with invisible unittests that ensure the
documentation is correct (a small sketch of this idea follows after point 3).


3. Grammar changes are the simplest ones and in a way the most 
embarrassing if they happen. The best solution I see to that is deriving 
the documentation and the actual parser from the same source. This is 
part of why I'm so keen on parser generators.
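The sketch promised under point 2: D's documented unittests already get part
of the way toward those invisible examples, provided the docs are generated
with the unittests present. The function below is made up:

```d
/// Returns the square of `x`.
int square(int x) { return x * x; }

/// A documented unittest right after the declaration is rendered as an
/// Example in the generated docs and fails the build if the claim above
/// stops holding.
unittest
{
    assert(square(3) == 9);
}
```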




Andrei

P.S. I haven't forgotten about the lexer - it's still on the back burner 
but I will publish it as soon as I get a chance.


Re: D parsing

2013-11-05 Thread cal

On Tuesday, 5 November 2013 at 17:23:23 UTC, Martin Nowak wrote:

On 11/01/2013 09:59 PM, Philippe Sigaud wrote:
The examples directory shows different grammars, from JSON to 
XML to C to D.


Sounds promising.
I think I found a way to implement a D REPL but are lacking a 
reliable way to classify code snippets (expressions, 
declarations, statements) and get introduced symbol names.
Do you think it will be able to handle that (performance is not 
an issue)?


I did something similar using Pegged a little while back
(https://github.com/callumenator/dabble); I tuned the PEG grammar
quite a lot, but it worked ok.
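For readers who have not seen Pegged in action, the basic usage pattern looks
roughly like this; the toy grammar below has nothing to do with dabble's
actual one:

```d
import pegged.grammar;

// The grammar is handed to Pegged as a string and mixed in at compile time.
mixin(grammar(`
Arithmetic:
    Expr   <- Term (('+' / '-') Term)*
    Term   <- Factor (('*' / '/') Factor)*
    Factor <- Number / '(' Expr ')'
    Number <- [0-9]+
`));

void main()
{
    // Calling the generated grammar returns a full parse tree.
    auto tree = Arithmetic("1+2*3");
    assert(tree.successful);
}
```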


Re: D parsing

2013-11-04 Thread Robert Schadek
On 11/04/2013 06:48 AM, Philippe Sigaud wrote:
 On Sun, Nov 3, 2013 at 7:08 PM, Timothee Cour
 thelastmamm...@gmail.com wrote:


 On Sun, Nov 3, 2013 at 1:13 AM, Philippe Sigaud
 philippe.sig...@gmail.com wrote:


 My current plan is to write different engines, and letting
 either the user select them at compile-time, or to have the
 parser decide which one to use, depending on the grammar. I'm
 pretty sure the 'Type 3' parts of a grammar (regular
 expressions) could be done by using std.regex, for example.


 even lexing can't be done with regex, eg nesting comments : /+ ... +/
 Also, although it may seem cleaner at first to combine lexing and
 parsing in 1 big grammar (as done in pegged), it usually is faster
 to feed a (separate) lexer output into a parser.


 Lexing, yes. I was imprecise: even in a context-free grammar, some
 rules are regular and could use std.regex (the ct part) as the
 underlying engine, just for that rule.
Lexing cannot be done with regex. Think of myArray[1. ]: what comes next, a
dot or a number?


Re: D parsing

2013-11-04 Thread Robert Schadek
On 11/04/2013 06:52 AM, Philippe Sigaud wrote:

 On Mon, Nov 4, 2013 at 1:55 AM, Timothee Cour
 thelastmamm...@gmail.com wrote:


 I guess I'll have to write a CT-compatible LALR(1) engine...
  
 D does not fit into LALR(1), you need glr.


 Well, too bad. GLR is also on my plate, but I fear its cubic behavior
 (IIRC, it degrades gracefully, though).
That's one part, and even worse is that you need a decider function if
more than one rule accepts.
 Do you know what part of the D grammar makes it non-LALR(1)?
I had big trouble with the IdentifierList rules.


Re: D parsing

2013-11-04 Thread Dmitry Olshansky

04-Nov-2013 12:28, Robert Schadek wrote:

On 11/04/2013 06:48 AM, Philippe Sigaud wrote:

On Sun, Nov 3, 2013 at 7:08 PM, Timothee Cour
thelastmamm...@gmail.com wrote:


On Sun, Nov 3, 2013 at 1:13 AM, Philippe Sigaud
philippe.sig...@gmail.com wrote:


My current plan is to write different engines, and letting
either the user select them at compile-time, or to have the
parser decide which one to use, depending on the grammar. I'm
pretty sure the 'Type 3' parts of a grammar (regular
expressions) could be done by using std.regex, for example.


even lexing can't be done with regex, eg nesting comments : /+ ... +/
Also, although it may seem cleaner at first to combine lexing and
parsing in 1 big grammar (as done in pegged), it usually is faster
to feed a (separate) lexer output into a parser.


Lexing, yes. I was imprecise: even in a context-free grammar, some
rules are regular and could use std.regex (the ct part) as the
underlying engine, just for that rule.

Lexing can not be done with regex. Think myArray[1. ] ! What is next a
dot or a number.


You could use lookahead extension ;)
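A small sketch of what that could look like, assuming the engine at hand
supports negative lookahead (std.regex accepts (?!...)): the float rule
refuses a dot that is the start of a '..' slice.

```d
import std.regex;

void main()
{
    // A float literal must not swallow the "1." in a slice like a[1..2]:
    // the dot is only accepted if it is not followed by another dot.
    auto floatLit = regex(`^[0-9]+\.(?!\.)[0-9]*`);

    assert(matchFirst("1..2", floatLit).empty);   // ".." belongs to the slice
    assert(!matchFirst("1.5", floatLit).empty);   // ordinary float literal
}
```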

In general I would not recommend ATM to use std.regex for 
performance-critical lexer if only because it wasn't tuned for such a 
use case yet.


I have plans for extending std.regex capabilities in many directions, 
lexing being one important use case.


--
Dmitry Olshansky


Re: D parsing

2013-11-04 Thread Timothee Cour
for lexing there's already dscanner we could use (while we wait, perhaps,
for an autogenerated lexer);
so I think the priority is on the autogenerated parser (dscanner has one, but
hand-designed), where it's still unknown what will work well.


On Mon, Nov 4, 2013 at 2:43 AM, Dmitry Olshansky dmitry.o...@gmail.com wrote:

 04-Nov-2013 12:28, Robert Schadek wrote:

 On 11/04/2013 06:48 AM, Philippe Sigaud wrote:

 On Sun, Nov 3, 2013 at 7:08 PM, Timothee Cour
 thelastmamm...@gmail.com wrote:


 On Sun, Nov 3, 2013 at 1:13 AM, Philippe Sigaud
 philippe.sig...@gmail.com wrote:



 My current plan is to write different engines, and letting
 either the user select them at compile-time, or to have the
 parser decide which one to use, depending on the grammar. I'm
 pretty sure the 'Type 3' parts of a grammar (regular
  expressions) could be done by using std.regex, for example.


 even lexing can't be done with regex, eg nesting comments : /+ ... +/
 Also, although it may seem cleaner at first to combine lexing and
 parsing in 1 big grammar (as done in pegged), it usually is faster
  to feed a (separate) lexer output into a parser.


 Lexing, yes. I was imprecise: even in a context-free grammar, some
 rules are regular and could use std.regex (the ct part) as the
 underlying engine, just for that rule.

 Lexing can not be done with regex. Think myArray[1. ] ! What is next a
 dot or a number.


 You could use lookahead extension ;)

 In general I would not recommend ATM to use std.regex for
 performance-critical lexer if only because it wasn't tuned for such a use
 case yet.

 I have plans for extending std.regex capabilities in many directions,
 lexing being one important use case.

 --
 Dmitry Olshansky



Re: D parsing

2013-11-04 Thread Chad Joan

On Sunday, 3 November 2013 at 01:45:23 UTC, Timothee Cour wrote:

1)
The main issue I see with pegged is PEG grammars don't support 
left

recursion, so for example will fail on foo[1].bar(2).fun().
Unless there's a plan to accomodate those, I sense a dead end.
One can eliminate left recursion but this has issues.



Use the repetition operator(s) and turn the resulting array into 
whatever tree you like.  In practice, I have never had a problem 
with this.
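A sketch of that rewrite, using the example from the quoted message; the rule
names are made up, and the flat suffix array that comes out is what gets
folded into a left-leaning tree afterwards:

```d
import pegged.grammar;

// A left-recursive rule such as
//     Postfix <- Postfix '.' Ident '(' Args? ')' / Primary
// is not allowed in a PEG, so the suffixes are collected with repetition
// and folded into a left-leaning chain by whoever walks the tree.
mixin(grammar(`
Chain:
    Postfix <- Primary Suffix*
    Suffix  <- '.' Ident '(' Args? ')' / '[' Args ']'
    Args    <- Primary (',' Primary)*
    Primary <- Ident / Number
    Ident   <- [a-zA-Z_][a-zA-Z_0-9]*
    Number  <- [0-9]+
`));

void main()
{
    // The example that started this subthread parses fine this way.
    assert(Chain("foo[1].bar(2).fun()").successful);
}
```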


I have used both Pegged and have written an SQL parser at work 
(not exhaustive, just what's needed) that uses C macros as PEG 
elements.  Tangent: Using C macros for this worked surprisingly 
well and allowed me to avoid an extra step in the build process 
(I do not have access to D for that project).  Pegged is still 
much more scalable, optimizable, and the grammars are a lot less 
ugly/finicky.


Re: D parsing

2013-11-04 Thread Chad Joan
On Friday, 1 November 2013 at 21:22:46 UTC, Andrei Alexandrescu 
wrote:

...
Bugs stopping Pegged from going forward should receive high 
priority. I also encourage others to participate to that and 
similar work.



Andrei


Ack!  I share Philippe's sense of timing.  This would have been 
even nicer to hear a year ago when both of us were actively 
committing ;)


I was going to close a project or two that I care deeply about 
and had started a long time ago, but I see that this might be a 
hard decision.


Nonetheless, I am really glad that you are showing this interest! 
 I like to hear stuff like this, since I too really like Pegged.



Andrei and Philippe,

I feel compelled to share some of my thoughts that I never had 
time to finish back then.


I was working on a parser-engine-as-a-library that could be used
as optimized internals for Pegged, as well as any other tools
that need to recognize these common patterns.


The idea was to expose an API like this:

string makeParser()
{
auto builder = new ParserBuilder!char;
builder.pushSequence();
builder.literal('x');
builder.pushMaybe();
builder.literal('y');
builder.pop();
builder.pop();
return builder.toDCode(callMe);
}

const foo = makeParser();

pragma(msg, foo);

mixin(foo);

That snippet would create a parser that recognizes the grammar 
'x' ( 'y'? ).

The current fledgling implementation creates this parser:
http://pastebin.com/MgSqWXE2

Of course, no one would be expected to write grammars like that.  
It would be the job of tools like Pegged or std.regex to package 
it up in nice syntax that is easy to use.


The code already takes a somewhat different strategy than 
Pegged's original strategy.  Rather than generating a bunch of 
templates that dmd then has to instantiate to actualize the 
parser, it just emits a bunch of very primitive procedural D 
code.  I suspect that this approach would mixin far faster with 
current dmd, because the deeply nested templates generated by 
Pegged seemed to be a bottleneck.  I have to hand it to Philippe 
though for coming up with a very clever way to bootstrap the 
thing: once I saw how his templates assembled together, I 
realized just how convenient that was!


(And I think my parser generator still has to be taught how to
avoid making redundant branches in its output: there's some hash 
table action that belongs in there somewhere.)


The small amount of code that I have for it is here:
https://github.com/chadjoan/xdc/blob/master/src/xdc/parser_builder/parser_builder.d

I wanted to eventually make it generic enough to recognize 
patterns in things besides strings.  Being able to write grammars 
that recognize patterns in ASTs is /useful/.  That leads into the 
whole xdc project: automate all of the tedious crud in semantic 
analysis, and thus make compiler writing, and possibly other AST 
manipulations in user code, become all much easier.


The other thing I wanted to do was to optimize it.
- I intended it to do no allocations unless the caller asks for 
it.

- I intended to do a bunch of PEG/regex hybridization.

I noticed some mathematical properties of PEGs and regular 
expressions that should allow you to mix the two as much as 
possible.  All you have to do is tell it how to behave at the 
boundaries where they meet.  And given that PEGs already define 
their own behavior pretty well, it would become possible to lower 
a lot of a PEG into regular expressions connected with a minimal 
set of PEG rules.  This would be some awesome lowering: if you 
first do a pass that inlines as many rules as possible, and then 
do a second pass that converts PEG elements into regular elements 
whenever possible, then I feel like the thing will be damned near 
optimal.  If you are wondering what these mathematical properties 
are, then I encourage you to look at this snippet where I define 
unitary and non-unitary expressions, for lack of prior terms:

http://pastebin.com/iYBypRc5

Another fun thought: PEGs can have look-behind that includes any 
regular elements without any additional algorithmic complexity.  
Just take all of the look-behinds in the grammar, mash them 
together into one big regular-expression using regular 
alternation (|), and then have the resulting automaton consume in 
lock-step with the PEG parser.  Whenever the PEG parser needs to 
do a lookbehind, it just checks to see if the companion automaton 
is in a matching state for the capture it needs.


*sigh*, I feel like I could write a paper on this stuff if I were 
in grad school right now.  Alas, I am stuck doing 50-60 hours a 
week of soul-sucking business programming.  Well, then again, my 
understanding is that even though I can think of things that seem 
like they would make interesting topics for publishable papers, 
reality would have the profs conscript me to do completely 
different things that are possibly just as inane as the business programming.