Re: D parsing
On Sunday, 15 November 2015 at 01:17:47 UTC, Bastiaan Veelo wrote: Left-recursion for Pegged is in the works: https://github.com/PhilippeSigaud/Pegged/pull/164 :-) And merged!: https://github.com/PhilippeSigaud/Pegged/pull/164 Would be nice to have an overview of which languages can, with this change, be parsed by a Pegged grammar.
Re: D parsing
On Sunday, 3 November 2013 at 01:45:23 UTC, Timothee Cour wrote: 1) The main issue I see with Pegged is that PEG grammars don't support left recursion, so for example it will fail on foo[1].bar(2).fun(). Unless there's a plan to accommodate those, I sense a dead end. One can eliminate left recursion, but this has issues. 2) There is some material on extending PEG to support those, eg "Left Recursion in Parsing Expression Grammars", or code https://github.com/orlandohill/peg-left-recursion but I don't know how well they work in practice. Left-recursion for Pegged is in the works: https://github.com/PhilippeSigaud/Pegged/pull/164 :-)
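For readers unfamiliar with the problem: a left-recursive rule makes a naive PEG/recursive-descent parser call itself without consuming input, so it loops forever. The classic elimination rewrites the rule iteratively. A minimal Python sketch of the rewritten form (a hypothetical hand-written parser handling only `.ident` chains, not Pegged itself):

```python
# Left-recursive rule:   postfix <- postfix '.' ident / primary
# A naive recursive-descent parser recurses forever on it.
# The classic left-recursion elimination rewrites it iteratively:
#   postfix <- primary ('.' ident)*

def parse_ident(s, pos):
    """Consume an identifier starting at `pos`; return (name, new_pos)."""
    start = pos
    while pos < len(s) and (s[pos].isalnum() or s[pos] == '_'):
        pos += 1
    if start == pos:
        raise SyntaxError(f"identifier expected at {pos}")
    return s[start:pos], pos

def parse_postfix(s, pos=0):
    """Parse chains like foo.bar.fun with the left-recursion-free form,
    still building the left-associative tree the original rule implies."""
    node, pos = parse_ident(s, pos)          # primary
    while pos < len(s) and s[pos] == '.':    # ('.' ident)*
        field, pos = parse_ident(s, pos + 1)
        node = ('member', node, field)       # fold to the left
    return node, pos
```

For example, `parse_postfix("foo.bar.fun")` yields the left-nested tree `('member', ('member', 'foo', 'bar'), 'fun')` that the left-recursive rule would have produced directly. The pull request above adds direct support instead, so such manual rewrites become unnecessary.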
Re: D Parsing (again)/ D grammar
On Friday, 3 October 2014 at 05:10:45 UTC, Philippe Sigaud via Digitalmars-d wrote: Yes, Elkhound is interesting, their approach is nice. But it gave me the impression of having been abandoned for a few years? Don't know and don't care, really. All I know is that Scott sort of managed to deal with the main generalized parser application problems by avoiding them most of the time :)) May be a bad sign... After all, most modern parser generators or parser combinators do not use GLRs, although they do sound interesting in theory. Something tells me one has to stress this word here: *theory* :-) By now, I have read the original Tomita's GLR paper, the ANTLR ALL(*) paper, a few recent GLR papers, three papers on GLL and a few related ones. It took... a while. I sort of understand the idea, but still not sure about the details :-) ALL(*) is on my todo list. I tried to implement it in spring, but got bogged down in the details. Even the white paper has some imprecisions when you really sit down and try to code it. I could have a look at the ANTLR v4 source, but wanted to have a clean start, right from the white paper. What's the name of the paper you read? Modelling GLL Parser Implementation? Yes. Scientists... The algorithm is hard to implement... Okay, let's invent an esoteric paper-only language to explain things to people :-) Thanks a lot, by the way! I've just skimmed through the code and the README... You did not use the packed forest representation, did you..?
Re: D Parsing (again)/ D grammar
Thanks a lot, by the way! I've just skimmed through the code and the README... You did not use the packed forest representation, did you..? Sorry for the microscopic documentation (Pegged is more documented than that...), it was a 'for me only' project. The forest is packed, in the sense that common nodes are re-used and shared among parse trees: all intermediate parse results from any grammar part are stored and used to produce the parse nodes. The range view gives access to parse trees one after another, but the global parse forest is present in the grammar object (or rather, generated and completed during the parse process: each new parse result completes the parse forest). It has a strange effect on the distribution of parsing time among the parse results: if you time the different parse trees, you'll see that the first one might take maybe 40% of the entire parsing time all by itself, because it has to discover all parse results alone. The following trees are calculated very rapidly, since the intermediate parse results are already known. Of course, once the parse trees begin to deviate from the first ones, the process slows down again since they have to explore as-yet-unused rules and input slices. I'm not sure the previous paragraph is clear... Imagine you have 10 different parse trees. They could be distributed like this:

# parse result    % of global parse time
 1                40%
 2                 2%
 3                 3%
 4                 3%
 5                 5%
 6                 6%
 7                 8%
 8                10%
 9                11%
10                12%
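The timing effect described in that post can be reproduced with a toy memoizing all-parses parser: every (symbol, position) result is computed exactly once and shared among all derivations, so the first tree pays for the discovery and later trees mostly hit the cache. This is an illustrative sketch with an invented grammar, not Philippe's actual code:

```python
# Toy memoizing parser returning *all* parses of an ambiguous grammar.
# Each (symbol, position) result is computed once and shared among all
# parse trees - which is why the first tree dominates the parse time.

def parse(sym, pos, grammar, s, memo):
    """Return all (tree, end) results for `sym` starting at `pos`."""
    if sym not in grammar:                       # terminal symbol
        return [(sym, pos + len(sym))] if s.startswith(sym, pos) else []
    key = (sym, pos)
    if key in memo:                              # shared result: cheap
        return memo[key]
    results = []
    for alt in grammar[sym]:                     # try each alternative
        seqs = [((), pos)]                       # partial sequence matches
        for part in alt:
            seqs = [(kids + (t,), e2)
                    for kids, e in seqs
                    for t, e2 in parse(part, e, grammar, s, memo)]
        results += [((sym,) + kids, e) for kids, e in seqs]
    memo[key] = results                          # first tree paid for this
    return results

# Ambiguous: "aaa" splits as a+a+a, a+aa, or aa+a -> three parse trees,
# all built from the same memoized sub-results.
G = {"S": [("A",), ("A", "S")], "A": [("a",), ("aa",)]}
trees = [t for t, e in parse("S", 0, G, "aaa", {}) if e == 3]
```

A real GLL engine uses a graph-structured stack and a shared packed parse forest rather than a plain dictionary, but the cost profile - expensive first tree, cheap subsequent trees - comes from the same sharing.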
Re: D Parsing (again)/ D grammar
On Friday, 3 October 2014 at 16:20:29 UTC, Philippe Sigaud via Digitalmars-d wrote: Thanks a lot, by the way! I've just skimmed through the code and the README... You did not use the packed forest representation, did you..? Sorry for the microscopic documentation (Pegged is more documented than that...), it was a 'for me only' project. This is perfectly fine with me: I think I should be okay with the theory behind the code, and your style isn't cryptic. I'm not sure the previous paragraph is clear... I got the point. Imagine you have 10 different parse trees. They could be distributed like this: [timing table snipped] Interesting, indeed. Anyway, thank you very much for the code. The weekend is coming - I'll play with your implementation and see if there are any improvements possible.
Re: D Parsing (again)/ D grammar
Anyway, thank you very much for the code. The weekend is coming - I'll play with your implementation and see if there are any improvements possible. Be sure to keep me informed of any enormous mistake I made. I tried Appender and other concatenation means, without big success. Btw, I saw on the ML that using byKeys.front() is very slow. Use keys[0] or somesuch instead.
Re: D Parsing (again)/ D grammar
On Friday, 3 October 2014 at 17:24:43 UTC, Philippe Sigaud via Digitalmars-d wrote: Be sure to keep me informed of any enormous mistake I made. I tried Appender and other concatenation means, without big success. I am not sure if there are any. Maybe GLL just IS non-practical, after all. Right now only one thing is for sure: the generalized parsing fruit is not a low-hanging one. Yeap, I will let you know. This is sort of a personal challenge now :) Btw, I saw on the ML that using byKeys.front() is very slow. Use keys[0] or somesuch instead. Hm... Funny. Did you investigate it..?
D Parsing (again)/ D grammar
Hello, everyone! Recently, I have been doing some experimental work related to generalized parser generators. It is not yet ready for a public release, I am not even sure it will ever be. Anyway, right now I am considering adding D as a target for parser generation. Actually, it's more than that: the grand target is to be able to generate a D language parser in D. With Python, for example, it is rather easy, as I can just mechanically transform the original grammar into a suitable form and be done with it. The generator itself is quite powerful; theoretically it should be able to handle all context-free grammars (see http://dotat.at/tmp/gll.pdf for theory). There is a grammar page on dlang.org (http://dlang.org/grammar.html). I have also found a related discussion in the forum (http://forum.dlang.org/thread/bwsofbnigfbrxwoui...@forum.dlang.org). From the discussion I found out that the D parser is a hand-made RD parser with a few tricks(c). So, my questions: 1. How realistic is the grammar? Is it a real one or just a sketch of sorts? Is it something developers use to build the parser? 2. There's also a D grammar for Pegged. How good is it? Can it be used as a starting point to build a parser?
Re: D Parsing (again)/ D grammar
On Thu, 02 Oct 2014 13:49:10, Vladimir Kazanov via Digitalmars-d digitalmars-d@puremagic.com wrote: [...] So, my questions: 1. How realistic is the grammar? Is it a real one or just a sketch of sorts? Is it something developers use to build the parser? 2. There's also a D grammar for Pegged. How good is it? Can it be used as a starting point to build a parser? https://github.com/Hackerpilot/DGrammar
Re: D Parsing (again)/ D grammar
On Thursday, 2 October 2014 at 13:49:12 UTC, Vladimir Kazanov wrote: The generator itself is quite powerful, theoretically it should be able to handle all context-free grammars (see http://dotat.at/tmp/gll.pdf for theory). Cool, GLL is the way to go IMO, but I am also looking at Earley parsers. What is the advantage of GLL over Earley if you use a parser generator? I think they both are O(n³) or something like that? From the discussion I found out that the D parser is a hand-made RD parser with a few tricks(c). I think D is close to LL(2) for the most part. But I suppose a GLL parser could allow keywords to be used as symbol names in most cases? That would be nice.
Re: D Parsing (again)/ D grammar
On Thursday, 2 October 2014 at 14:08:09 UTC, Daniel Kozak via Digitalmars-d wrote: https://github.com/Hackerpilot/DGrammar Thank you, this definitely looks like something I was looking for. Any known pitfalls here..?
Re: D Parsing (again)/ D grammar
On Thursday, 2 October 2014 at 15:01:13 UTC, Ola Fosheim Grøstad wrote: Cool, GLL is the way to go IMO, but I am also looking at Earley parsers. What is the advantage of GLL over Earley if you use a parser generator? I think they both are O(n³) or something like that? They are somewhat similar in terms of asymptotic complexity on complicated examples. The constant is better, though. But there's a nice property of all generalized parsers: for the LL (for GLL) and LR (for GLR) parts of grammars they go almost as fast as LL/LR parsers do. On ambiguities they slow down, of course. There are four properties I really like: 1. GLL should be faster than Earley's (even the modern incarnations of it), but this is something I have yet to test. 2. It is fully general. 3. The automatically generated code repeats the original grammar structure - the same way recursive descent parsers do. 4. The core parser is still that simple LL/RD parser I can practically debug. This comes at a price, as usual... I would not call it obvious :-) But nobody can say that modern Earley flavours are trivial. From the discussion I found out that the D parser is a hand-made RD parser with a few tricks(c). I think D is close to LL(2) for the most part. But I suppose a GLL parser could allow keywords to be used as symbol names in most cases? That would be nice. This is possible, I guess, the same way people do it in GLR parsers.
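For concreteness on the comparison above: both Earley and the generalized LL/LR algorithms are O(n³) in the worst case but cheap on deterministic input. A compact (deliberately naive, fixpoint-based) Earley recognizer in Python shows the chart-based machinery GLL is being weighed against; the grammar at the end is invented for illustration:

```python
def earley_recognize(grammar, start, tokens):
    """Naive Earley recognizer. `grammar` maps nonterminals to tuples of
    alternatives; anything not in `grammar` is a terminal token.
    O(n^3) worst case, near-linear on the deterministic parts."""
    n = len(tokens)
    # chart[i]: set of items (lhs, rhs, dot, origin)
    chart = [set() for _ in range(n + 1)]
    for rhs in grammar[start]:
        chart[0].add((start, rhs, 0, 0))
    for i in range(n + 1):
        changed = True
        while changed:                               # predict/complete to fixpoint
            changed = False
            for lhs, rhs, dot, origin in list(chart[i]):
                if dot < len(rhs) and rhs[dot] in grammar:      # predict
                    for prod in grammar[rhs[dot]]:
                        item = (rhs[dot], prod, 0, i)
                        if item not in chart[i]:
                            chart[i].add(item); changed = True
                elif dot == len(rhs):                            # complete
                    for l2, r2, d2, o2 in list(chart[origin]):
                        if d2 < len(r2) and r2[d2] == lhs:
                            item = (l2, r2, d2 + 1, o2)
                            if item not in chart[i]:
                                chart[i].add(item); changed = True
        if i < n:                                                # scan
            for lhs, rhs, dot, origin in chart[i]:
                if dot < len(rhs) and rhs[dot] == tokens[i]:
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
    return any(lhs == start and dot == len(rhs) and origin == 0
               for lhs, rhs, dot, origin in chart[n])

# Ambiguous expression grammar: E -> E '+' E | 'a'
G = {"E": [("E", "+", "E"), ("a",)]}
```

`earley_recognize(G, "E", "a+a")` accepts, and the ambiguous `"a+a+a"` is accepted too without blowup, which is the "fully general" property point 2 refers to; GLL gets the same coverage while keeping the recursive-descent code shape of points 3 and 4.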
Re: D Parsing (again)/ D grammar
On Thursday, 2 October 2014 at 15:47:04 UTC, Vladimir Kazanov wrote: [...] This is possible, I guess, the same way people do it in GLR parsers. What has steered you down the path of writing your own parser generator as opposed to using an existing one such as ANTLR? Were there properties you wanted that it didn't have, or performance, or...?
Re: D Parsing (again)/ D grammar
On Thursday, 2 October 2014 at 15:47:04 UTC, Vladimir Kazanov wrote: 3. The automatically generated code repeats the original grammar structure - the same way recursive descent parsers do. 4. The core parser is still that simple LL/RD parser I can practically debug. This sounds nice! Does this mean that it would be possible to use your parser generator to create a skeleton which is then manipulated manually, or are there non-local complexities that make manual edits risky? One reason I believe GLL is the right way to go is that I think RD makes it easier to generate good error messages suitable for display (to the end user).
Re: D Parsing (again)/ D grammar
On Thursday, 2 October 2014 at 17:17:53 UTC, Cliff wrote: What has steered you down the path of writing your own parser generator as opposed to using an existing one such as ANTLR? Were there properties you wanted that it didn't have, or performance, or...? Like I said in the introductory post, this is a personal experiment of sorts. I am aware of most alternatives, such as ANTLR's ALL(*) and many, MANY others. :) And I would never write something myself as a part of my full-time job. But right now I am writing an article on generalized parsers, toying with implementations I could lay my hands on, implementing others. GLL is a rather exotic LL flavor which looks attractive in theory. I want to see it in practice.
Re: D Parsing (again)/ D grammar
On Thursday, 2 October 2014 at 17:43:45 UTC, Vladimir Kazanov wrote: [...] GLL is a rather exotic LL flavor which looks attractive in theory. I want to see it in practice. Very cool - post the GitHub or equivalent when you get the chance (assuming you are sharing). This is an area of interest for me as well.
Re: D Parsing (again)/ D grammar
On Thursday, 2 October 2014 at 17:39:55 UTC, Ola Fosheim Grøstad wrote: On Thursday, 2 October 2014 at 15:47:04 UTC, Vladimir Kazanov wrote: 3. The automatically generated code repeats the original grammar structure - the same way recursive descent parsers do. 4. The core parser is still that simple LL/RD parser I can practically debug. This sounds nice! Does this mean that it would be possible to use your parser generator to create a skeleton which is then manipulated manually, or are there non-local complexities that make manual edits risky? Chances are that I will be able to get the original GLL parser generator from one of the algorithm authors (Adrian Johnstone). He's really helpful here. From that point, I will only have to add a frontend for generating a concrete parser, starting with Python - it already has a fully working grammar. Hopefully, I will also be able to find a suitable grammar for D; it is always a pleasure to play with the language. Otherwise, I will continue my current effort - implementing the GLL parser generator in D. Anyway, right now, from what I see in the papers, it looks like it is quite possible, yes. Again, it requires a bit of understanding (nothing special, really), but compared to dumb LALR-style tables it's nothing. One reason I believe GLL is the right way to go is that I think RD makes it easier to generate good error messages suitable for display (to the end user). This is one of the reasons I prefer LL/RD parsers :-) They are easy to follow.
Re: D Parsing (again)/ D grammar
Chances are that I will be able to get the original GLL parser generator from one of the algorithm authors (Adrian Johnstone). He's really helpful here. From that point, I will only have to add a frontend for generating a concrete parser, starting with Python - it already has a fully working grammar. Hopefully, I will also be able to find a suitable grammar for D, it is always a pleasure to play with the language. Otherwise, I will continue my current effort - implementing the GLL parser generator in D. I did that during this summer, almost to the point it was self-sustained (that is, the GLL parser generator was used to generate a parser for grammar files). I chose a range interface: input is a string, the parse forest is output as a range of parse trees. I loved to see it generate the different possible parse trees on ambiguous grammars, and accept left- and right-recursing grammars! GLL is a very nice algorithm. Alas, even with some shortcuts on Kleene stars it was quite slow. I tried to use threads (spawning worker threads on alternatives), but that did not change the timings much. I could make it generate a parser for JSON and compare it to the jsonx module that Sönke presented here. Bah, it was 1000 times slower (which is all relative: it still takes only a few ms to parse a big JSON file. But still, far away from the microseconds it should need). Pegged was faster than this GLL generator, but of course still far slower than jsonx. [Of course, Pegged can cheat: it can parse your file at compile-time, resulting in 0-µs parsing time at runtime :-)] Also, the way I coded it I hit CTFE limits and could not have it parse at compile-time. A shame, really. This is one of the reasons I prefer LL/RD parsers :-) They are easy to follow. I would like to try a LALR compile-time generator and compile-time parser. I'm pretty sure LALR tables could be expanded directly as D code instead of being read during parsing. Philippe
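The closing idea in that post (expanding LALR tables directly into code instead of interpreting them at parse time) can be sketched in Python: a generator emits one branch per table entry, which the host compiler can then optimize. The tiny table and names below are invented for illustration, not a real LALR table:

```python
# Sketch: turn a (state, token) -> action table into straight-line
# code, instead of looking the table up while parsing. The table
# below is made up for illustration only.
table = {
    ("s0", "a"): ("shift", "s1"),
    ("s1", "$"): ("accept", None),
}

def expand(table):
    """Emit a `step` function with one branch per table entry."""
    lines = ["def step(state, tok):"]
    for (st, tok), (action, arg) in sorted(table.items()):
        lines.append(f"    if state == {st!r} and tok == {tok!r}:")
        lines.append(f"        return ({action!r}, {arg!r})")
    lines.append("    return ('error', None)")
    return "\n".join(lines)

ns = {}
exec(expand(table), ns)          # "compile" the table into code
step = ns["step"]
```

In D the generated string would go through a string mixin at compile time, so the table itself never exists at runtime; the branches can be inlined and constant-folded like any hand-written code.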
Re: D Parsing (again)/ D grammar
On Thursday, 2 October 2014 at 18:20:36 UTC, Philippe Sigaud via Digitalmars-d wrote: I did that during this summer, almost to the point it was self-sustained (that is, the GLL parser generator was used to generate a parser for grammar files). I chose a range interface: input is a string, the parse forest is output as a range of parse trees. Nice! Is it public? Github? Alas, even with some shortcuts on Kleene stars it was quite slow. I tried to use threads (spawning worker threads on alternatives), but that did not change the timings much. AFAIK, multithreading is a bad idea in parsing. Actually, in the gll-combinators Scala project they have similar speed problems. I don't know if it's a fundamental algorithm problem or implementation details that lead to slowdowns.
Re: D Parsing (again)/ D grammar
I did that during this summer, almost to the point it was self-sustained (that is, the GLL parser generator was used to generate a parser for grammar files). I chose a range interface: input is a string, the parse forest is output as a range of parse trees. Nice! Is it public? Github? No github repo. I could put it alongside Pegged, I suppose. I used git internally, though. I was a bit disheartened by the speed I got, so did not publish nor announce it here. Note also that I saw it as an alternative engine for my own Pegged project, so I used the same way of defining grammars (some prefixes and suffixes for dropping nodes in the parse tree and so on). I can send you the code (well, the entire repo), if you wish. I did not touch it for the past 3 months, so I already don't remember what state it was in :-(. Looking at the code now, it seems I'm still using Pegged to parse the initial grammar. Bootstrapping did not go well. Send me an email at firstname . lastname @gmail.com (philippe sigaud) Alas, even with some shortcuts on Kleene stars it was quite slow. I tried to use threads (spawning worker threads on alternatives), but that did not change the timings much. AFAIK, multithreading is a bad idea in parsing. I think, as with many things in CS, that people developed parsing algorithms before multicores existed. Spawning threads or fibers for some alternatives (A | B) seems nice to me. I got interesting results with a purely multithreaded parser that spawned children for each choice. Actually, in the gll-combinators Scala project they have similar speed problems. I don't know if it's a fundamental algorithm problem or implementation details that lead to slowdowns. Well, when all resulting implementations have speed problems... I found an article by the GLL authors explaining how they coded their algorithm for more speed, but I couldn't wrap my head around it. Philippe
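The "spawn a worker per alternative (A | B)" idea from that post can be sketched as follows. This is a hypothetical illustration in Python (thread pool standing in for D's std.concurrency or fibers), not Philippe's code; in practice the per-alternative work is tiny, so scheduling overhead dominates, which matches the timings he reports:

```python
# Sketch: try each alternative of a choice (A | B) in its own worker
# thread and merge the results. Hypothetical names throughout.
from concurrent.futures import ThreadPoolExecutor

def lit(text):
    """Parser for a literal: returns [(match, end)] or [] on failure."""
    def match(s, pos):
        return [(text, pos + len(text))] if s.startswith(text, pos) else []
    return match

def choice(alternatives, s, pos, pool):
    """Run every alternative concurrently and concatenate the results,
    keeping the alternatives' declaration order."""
    futures = [pool.submit(alt, s, pos) for alt in alternatives]
    return [r for f in futures for r in f.result()]

with ThreadPoolExecutor(max_workers=4) as pool:
    # Both alternatives match "abc" at position 0, so both results survive,
    # as a generalized parser requires.
    results = choice([lit("ab"), lit("a")], "abc", 0, pool)
```

Fibers (or a work-stealing pool with coarser-grained tasks) would cut the per-alternative scheduling cost, but the synchronization on the shared result structures remains, which is presumably why the timings barely moved.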
Re: D Parsing (again)/ D grammar
On Thursday, 2 October 2014 at 20:28:47 UTC, Philippe Sigaud via Digitalmars-d wrote: [...] Send me an email at firstname . lastname @gmail.com (philippe sigaud) Check the mailbox. Thank you!
Re: D Parsing (again)/ D grammar
On Thursday, 2 October 2014 at 20:28:47 UTC, Philippe Sigaud via Digitalmars-d No github repo. I could put it alongside Pegged, I suppose. I used git internally, though. I was a bit disheartened by the speed I got, so did not publish nor announce it here. I have a huge collection of projects I never published :-) We all do, I guess. I think, as with many things in CS, that people developed parsing algorithms before multicores existed. Spawning threads or fibers for some alternatives (A | B) seems nice to me. I got interesting results with a purely multithreaded parser that spawned children for each choice. No, this is not exactly what I mean. Multithreading can be perfectly fine in some cases, very fruitful sometimes. Say, in NLP, when one has to process long or highly ambiguous strings, and the parse tree can become huge and is of importance in itself... Yes. In programming language parsing this is just a small step towards further stages within a longer pipeline. It has to be fast enough to make multithreading at this step an overkill. Well, when all resulting implementations have speed problems... Well... This is why I want to study the original implementation. I had read the story of red-black trees some time ago - it took a while to get it right, and more to make it elegant. BTW, there's an interesting related work that probably should be taken as a positive example of generalized parsing: the Elkhound parser generator. It uses a hybrid LALR/GLR approach, thus achieving better performance. There's much more to it, I didn't go too deep into it. I found an article by the GLL authors explaining how they coded their algorithm for more speed, but I couldn't wrap my head around it. By now, I have read the original Tomita's GLR paper, the ANTLR ALL(*) paper, a few recent GLR papers, three papers on GLL and a few related ones. It took... a while. I sort of understand the idea, but still not sure about the details :-) What's the name of the paper you read?
Modelling GLL Parser Implementation?
Re: D Parsing (again)/ D grammar
On Thursday, 2 October 2014 at 18:06:04 UTC, Vladimir Kazanov wrote: Chances are that I will be able to get the original GLL parser generator from one of algorithm authors (Adrian Johnstone). He's really helpful here. From that point, I will only have to add a frontend for generating a concrete parser, starting with Python - it already has a fully working grammar. Hopefully, I will also be able to find a suitable grammar for D, it is always a pleasure to play with the language. Nice! If you manage to get a GLL generator for D (or C++) going I'd love to try it out. The only general parser generator I've tested so far is the Tomita-style GLR parser called DParser, but IIRC the documentation could need some improvement: http://sourceforge.net/projects/dparser/
Re: D Parsing (again)/ D grammar
On Thursday, 2 October 2014 at 21:26:15 UTC, Ola Fosheim Grøstad wrote: Nice! If you manage to get a GLL generator for D (or C++) going I'd love to try it out. You will definitely hear about it here :-) I don't know yet if the parser itself will be worth the trouble, but a more or less complete resulting blog post with tests/results/examples will definitely see the world. The only general parser generator I've tested so far is the Tomita-style GLR parser called DParser, but IIRC the documentation could need some improvement: http://sourceforge.net/projects/dparser/ Yes, I know this one. A scannerless GLR parser. GLRs are trendy these days :) Easy to find many more, there's even an Emacs mode using GLR for Ada parsing.
Re: D Parsing (again)/ D grammar
I have a huge collection of projects I never published :-) We all do, I guess. Oh, the ratio is more than 100 projects for one published... No, this is not exactly what I mean. Multithreading can be perfectly fine in some cases, very fruitful sometimes. Say, in NLP, when one has to process long or highly ambiguous strings, and the parse tree can become huge and is of importance in itself... Yes. In programming language parsing this is just a small step towards further stages within a longer pipeline. It has to be fast enough to make multithreading at this step an overkill. I don't know. Using fibers, I'm hoping to get interesting results one day. I got hit by a workstorm before trying fibers. OS threads were a bit too heavy and didn't work that well. BTW, there's an interesting related work that probably should be taken as a positive example of generalized parsing: the Elkhound parser generator. It uses a hybrid LALR/GLR approach, thus achieving better performance. There's much more to it, I didn't go too deep into it. Yes, Elkhound is interesting, their approach is nice. But it gave me the impression of having been abandoned for a few years? I found an article by the GLL authors explaining how they coded their algorithm for more speed, but I couldn't wrap my head around it. By now, I have read the original Tomita's GLR paper, the ANTLR ALL(*) paper, a few recent GLR papers, three papers on GLL and a few related ones. It took... a while. I sort of understand the idea, but still not sure about the details :-) ALL(*) is on my todo list. I tried to implement it in spring, but got bogged down in the details. Even the white paper has some imprecisions when you really sit down and try to code it. I could have a look at the ANTLR v4 source, but wanted to have a clean start, right from the white paper. What's the name of the paper you read? Modelling GLL Parser Implementation? Yes.
Re: D Parsing (again)/ D grammar
On Thu, Oct 2, 2014 at 11:19 PM, Vladimir Kazanov via Digitalmars-d digitalmars-d@puremagic.com wrote: Check the mailbox, Thank you I sent it to you. I was asleep, sorry :-)
Re: D parsing
On 11/12/2013 09:14 AM, Andrei Alexandrescu wrote: On 11/12/13 12:09 AM, Jacob Carlborg wrote: ... An argument for macros would have to do a lot more than 'we don't need __FILE__ etc. anymore' ... It's very simple. Timon got into that part as if it were important. I pointed out it's not important. This is a misrepresentation. I'll just answer that post.
Re: D parsing
On 11/11/2013 12:37 AM, Andrei Alexandrescu wrote: On 11/10/13 3:21 PM, Timon Gehr wrote: On 11/10/2013 11:43 PM, Andrei Alexandrescu wrote: It seems we could even get rid of __FILE__,__LINE__,__MODULE__ with AST macros. This would be a very small advantage. The special variables hardly cause any trouble. Andrei The trouble is that there are too few of them to (efficiently) reflect all the information about the calling context one may be interested in. I don't see the point in adding them one after another in an unprincipled ad-hoc fashion instead of actually thinking up a good API coupled with some more expressive power. But the API would add them in exactly the same rote manner. No, it would be (close to) exhaustive from the beginning. That is the point. I assume the problem is that there have not yet been very specific proposals. I really don't see an advantage here. It's the difference between having a universal language, and a language that has a finite and fixed set of words (without any methods of combination). An argument for macros would have to do a lot more than we don't need __FILE__ etc. anymore. ... Obviously, but I was not making this argument, and as far as I can see nobody else was either.
Re: D parsing
On 2013-11-12 08:52, Andrei Alexandrescu wrote: Fine, although a sense of futility is hard to shake seeing as we won't replace those existing features. I think a much stronger point would be made if the power of the feature were demonstrated on problems not accessible with the existing ones. You just said we shouldn't replace existing features. The point here is that there is significant difficulty to remove features that already exist http://forum.dlang.org/thread/bwsofbnigfbrxwoui...@forum.dlang.org?page=9#post-l5s44b:242c36:241:40digitalmars.com About DIP 50: I will say no but please do not take it personally. It's great there is discussion about this, and I encourage it, but at this time I think we should make no plans to add macros to D. I don't think we should add macros now either. This proposal is far from ready. If Martin hadn't suggested I should create a DIP, I wouldn't have, at least not at this time. BTW, just saying no doesn't help a bit. You could just have said foo. That's extremely annoying. You're shooting down very many suggestions/proposals/ideas with just a no, or the more elaborate answer no, never going to happen. On the other hand, when you have a proposal it should be considered pre-approved and is more of an FYI. -- /Jacob Carlborg
Re: D parsing
On 11/11/13 11:50 PM, Jacob Carlborg wrote: On 2013-11-12 03:35, Andrei Alexandrescu wrote: So... there is rote addition to the context.caller structure. It's just spelled differently. No? Have you seen the other posts? Yes. Don't assume whoever disagrees with you is misinformed. Have you even read the proposal? Yes. If you think that I just want a different syntax for __LINE__ and __FILE__ you're way, way off. That's just an extremely small part of the proposal. __LINE__ and __FILE__ would not be removed. This is putting words in my mouth. Andrei
Re: D parsing
On 2013-11-12 09:00, Andrei Alexandrescu wrote: Yes. Don't assume whoever disagrees with you is misinformed. I absolutely do not. But when I read this: An argument for macros would have to do a lot more than 'we don't need __FILE__ etc. anymore' I find it hard to believe anything else. http://forum.dlang.org/thread/bwsofbnigfbrxwoui...@forum.dlang.org?page=8#post-l5p5b8:241kju:241:40digitalmars.com -- /Jacob Carlborg
Re: D parsing
On 11/12/13 12:09 AM, Jacob Carlborg wrote: On 2013-11-12 09:00, Andrei Alexandrescu wrote: Yes. Don't assume whoever disagrees with you is misinformed. I absolutely do not. But when I read this: An argument for macros would have to do a lot more than 'we don't need __FILE__ etc. anymore' I find it hard to believe anything else. http://forum.dlang.org/thread/bwsofbnigfbrxwoui...@forum.dlang.org?page=8#post-l5p5b8:241kju:241:40digitalmars.com It's very simple. Timon got into that part as if it were important. I pointed out it's not important. Andrei
Re: D parsing
On 11/12/13 12:02 AM, Jacob Carlborg wrote: On 2013-11-12 08:52, Andrei Alexandrescu wrote: Fine, although a sense of futility is hard to shake seeing as we won't replace those existing features. I think a much stronger point would be made if the power of the feature were demonstrated on problems not accessible with the existing ones. You just said we shouldn't replace existing features. The point here is that there is significant difficulty to remove features that already exist http://forum.dlang.org/thread/bwsofbnigfbrxwoui...@forum.dlang.org?page=9#post-l5s44b:242c36:241:40digitalmars.com Yes. So I said. I don't get why you'd provide a link - it's in my text that you quote. Indeed, we shouldn't replace existing features. About DIP 50: I will say no but please do not take it personally. It's great there is discussion about this, and I encourage it, but at this time I think we should make no plans to add macros to D. I don't think we should add macros now either. This proposal is far from ready. If Martin hadn't suggested I should create a DIP, I wouldn't have, at least not at this time. Fine. BTW, just saying no doesn't help a bit. You could just as well have said foo. That's extremely annoying. You're shooting down very many suggestions/proposals/ideas with just a no, or the more elaborate answer no, never going to happen. On the other hand, when you have a proposal it should be considered pre-approved and is more of an FYI. So how could we express a no that doesn't annoy you in the extreme? In case the answer would be you haven't explained why, allow me to retort. I've mentioned the argument before: at this point we should focus on quality of implementation and making good use of the features we have. In fact I am repeating myself: http://goo.gl/1thq1j. As has been publicly known for a while now, our strategy has been to improve quality and to double down on the assets we have. People ask for a roadmap, and what's missing from a roadmap is as important as what's there.
This is a strategy that Walter and I agree with, have been transparent about, and that may work or not, with various degrees of success. Reasonable people may disagree what the best step moving forward should be, but at some point some step must be made and we can't adopt your strategy, with which we disagree, as our strategy, just to be nice and not offend your sensibility. (I'm using we here because Walter and I discussed this at large.) There must be a way to say no that doesn't offend you. Please advise what that is. Andrei
Re: D parsing
On 2013-11-12 09:14, Andrei Alexandrescu wrote: It's very simple. Timon got into that part as if it were important. I pointed out it's not important. Ok, fair enough. I'm sorry for wrongly interpreting it. -- /Jacob Carlborg
Re: D parsing
On 2013-11-12 09:13, Andrei Alexandrescu wrote: So how could we express a no that doesn't annoy you in the extreme? In case the answer would be you haven't explained why, allow me to retort. I've mentioned the argument before: at this point we should focus on quality of implementation and making good use of the features we have. In fact I am repeating myself: http://goo.gl/1thq1j. As has been publicly known for a while now, our strategy has been to improve quality and to double down on the assets we have. People ask for a roadmap, and what's missing from a roadmap is as important as what's there. This is a strategy that Walter and I agree with, have been transparent about, and that may work or not, with various degrees of success. Reasonable people may disagree what the best step moving forward should be, but at some point some step must be made and we can't adopt your strategy, with which we disagree, as our strategy, just to be nice and not offend your sensibility. (I'm using we here because Walter and I discussed this at large.) There must be a way to say no that doesn't offend you. Please advise what that is. Thank you for the explanation. I do understand your and Walter's position in this. Just giving a short reason to accompany the no helps a lot. It doesn't need to be as long as the explanation above, just a sentence, like: no, at this time we don't want to make such a big change, we're trying to stabilize. This is not just for me. I'm hoping proposals from others can also get a fair reason for the no. -- /Jacob Carlborg
Re: D parsing
On 11/12/13 12:31 AM, Jacob Carlborg wrote: Just giving a short reason to accompany the no helps a lot. It doesn't need to be as long as the explanation above, just a sentence, like: no, at this time we don't want to make such a big change, we're trying to stabilize. This is not just for me. I'm hoping proposals from others can also get a fair reason for the no. OK, thanks. I'll do my best to improve on that in the future. Andrei
Re: D parsing
On 2013-11-12 09:37, Andrei Alexandrescu wrote: OK, thanks. I'll do my best to improve on that in the future. Thanks again. Sorry for being frustrated. -- /Jacob Carlborg
Re: D parsing
On 11/4/2013 11:27 PM, Brian Schott wrote: Seriously. We don't have a real grammar for D. We have the language spec on dlang.org, but it isn't complete, consistent, or updated when the language changes. Want examples? I have a tracker for them here: http://d.puremagic.com/issues/show_bug.cgi?id=10233 Thank you, Brian, for finding and reporting these issues.
Re: D parsing
On 2013-11-10 23:49, Andrei Alexandrescu wrote: The way I see this is, these preexisting features (which shan't be thrown away) diminish the motivation for adding another feature. I think the opposite. It shows how powerful macros are and that they will most likely be able to implement other new features that haven't been proposed yet. Adding a bunch of new features to the language, instead of taking a step back and thinking about whether something else can be added that implements these features and many more, shows bad judgement. -- /Jacob Carlborg
Re: D parsing
On 2013-11-11 00:37, Andrei Alexandrescu wrote: But the API would add them in exactly the same rote manner. I really don't see an advantage here. An argument for macros would have to do a lot more than we don't need __FILE__ etc. anymore. They do, and several examples have already been shown. I'll also add more as I come up with them. -- /Jacob Carlborg
Re: D parsing
On 2013-11-11 02:54, Andrei Alexandrescu wrote: What do you have in mind? I find it difficult to grab e.g. the file and the line unless there's some plumbing in the language that allows you to. The language just needs to provide a way to introspect the AST. If we're talking about a specific API, I can think of something like this:

macro foo (Context context, Ast!(string) str)
{
    auto line = context.caller.line;
    auto file = context.caller.file;
}

foo("asd");

-- /Jacob Carlborg
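For contrast, the one part of this that today's D already covers - capturing the caller's file and line - works through default arguments, which are substituted at the call site rather than the definition site. A minimal sketch (logCaller is a made-up name):

```d
import std.stdio;

// __FILE__ and __LINE__ used as default arguments take on the
// values of the *calling* location, not the declaration.
void logCaller(string msg, string file = __FILE__, size_t line = __LINE__)
{
    writefln("%s(%s): %s", file, line, msg);
}

void main()
{
    logCaller("hello"); // reports the file and line of this call
}
```

This is exactly the "rote addition" mechanism being debated: each new piece of caller context needs its own special variable, whereas a macro Context object would expose them through one API.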
Re: D parsing
On 11/10/13 11:55 PM, Jacob Carlborg wrote: On 2013-11-10 23:49, Andrei Alexandrescu wrote: The way I see this is, these preexisting features (which shan't be thrown away) diminish the motivation for adding another feature. I think the opposite. It shows how powerful macros are and that they will most likely be able to implement other new features that haven't been proposed yet. Adding a bunch of new features to the language, instead of taking a step back and thinking about whether something else can be added that implements these features and many more, shows bad judgement. No need to be snarky about it. The point here is that there is significant difficulty to remove features that already exist and are useful, which changes the game considerably. Andrei
Re: D parsing
On 11/11/13 12:00 AM, Jacob Carlborg wrote: On 2013-11-11 02:54, Andrei Alexandrescu wrote: What do you have in mind? I find it difficult to grab e.g. the file and the line unless there's some plumbing in the language that allows you to. The language just needs to provide a way to introspect the AST. If we're talking about a specific API, I can think of something like this:

macro foo (Context context, Ast!(string) str)
{
    auto line = context.caller.line;
    auto file = context.caller.file;
}

foo("asd");

So... there is rote addition to the context.caller structure. It's just spelled differently. No? Andrei
Re: D parsing
On Tuesday, 12 November 2013 at 02:34:52 UTC, Andrei Alexandrescu wrote: No need to be snarky about it. The point here is that there is significant difficulty to remove features that already exist and are useful, which changes the game considerably. Andrei That is true. However, please understand that it is extremely frustrating to have a lot of gadget features that solve a specific issue when much more generic solutions exist, and when these gadgets actually make it more difficult to introduce the proper features. If we put aside the AST macro thing (granted, we have a swiss army knife of gadgets for many of its typical use cases), several other skeletons don't want to stay in the closet. If I had to pick two: tail-qualified type parameters are one. It is impossible to have a user-defined type that behaves as slices/pointers do. That is a showstopper for a proper collection library. I have only ugly solutions to propose for that now. Lately, I was playing with things along the lines of struct S[T] { auto foo() const[T] { ... } } I'm afraid that what already exists in that area makes things quite difficult. The lack of a qualifier for ownership makes concurrency and the interface between manual memory management and the GC tedious. For that one, isolated seems the way to go. These same issues continue to pop up in many different forms (for instance, see Kenji's DIP, where he is trying to hack around some gadget to make unique postblit work).
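The tail-const gap deadalnix describes can be made concrete with a small sketch (MySlice is a hypothetical stand-in for a user-defined slice type):

```d
// Built-in slices support the tail-const conversion: a fully const
// array const(int[]) implicitly converts to const(int)[], recovering
// a mutable reference to immutable data. A user-defined type has no
// way to declare the equivalent MySlice!(const T) conversion.
struct MySlice(T)
{
    T* ptr;
    size_t length;
}

void main()
{
    const(int[]) a;
    const(int)[] b = a;          // fine: built-in tail-const conversion

    const(MySlice!int) c;
    // MySlice!(const int) d = c; // error: not expressible today
}
```

The `const[T]` member syntax in the post is deadalnix's speculative notation for letting a method return such a tail-qualified view of the containing struct.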
Re: D parsing
On 11/11/13 9:11 PM, deadalnix wrote: On Tuesday, 12 November 2013 at 02:34:52 UTC, Andrei Alexandrescu wrote: No need to be snarky about it. The point here is that there is significant difficulty to remove features that already exist and are useful, which changes the game considerably. Andrei That is true. However, please understand that it is extremely frustrating to have a lot of gadget features that solve a specific issue when much more generic solutions exist, and when these gadgets actually make it more difficult to introduce the proper features. Frustration is inappropriate. Probably bummed would be more fit for the situation. (Which I am not, but that's a different story; I personally don't think macros are all they're cracked up to be.) We did not have a macro-based solution when D was being invented, and I don't think there's a lot of glory in seeing patterns for doing things differently after the language has been in use for a while. There's a lot of water under the bridge between now and then. If we put aside the AST macro thing (granted, we have a swiss army knife of gadgets for many of its typical use cases), several other skeletons don't want to stay in the closet. If I had to pick two: tail-qualified type parameters are one. It is impossible to have a user-defined type that behaves as slices/pointers do. That is a showstopper for a proper collection library. I have only ugly solutions to propose for that now. Lately, I was playing with things along the lines of struct S[T] { auto foo() const[T] { ... } } I'm afraid that what already exists in that area makes things quite difficult. The lack of a qualifier for ownership makes concurrency and the interface between manual memory management and the GC tedious. For that one, isolated seems the way to go. These same issues continue to pop up in many different forms (for instance, see Kenji's DIP, where he is trying to hack around some gadget to make unique postblit work).
Our goals for the time being are to improve stability, close gaps in the type system, and make good use of features in code. To the extent we can't do what we want to do, the topics you mention are fit. Andrei
Re: D parsing
On Tuesday, 12 November 2013 at 05:58:02 UTC, Andrei Alexandrescu wrote: If I had to pick two: tail-qualified type parameters are one. It is impossible to have a user-defined type that behaves as slices/pointers do. That is a showstopper for a proper collection library. I have only ugly solutions to propose for that now. Lately, I was playing with things along the lines of struct S[T] { auto foo() const[T] { ... } } I'm afraid that what already exists in that area makes things quite difficult. The lack of a qualifier for ownership makes concurrency and the interface between manual memory management and the GC tedious. For that one, isolated seems the way to go. These same issues continue to pop up in many different forms (for instance, see Kenji's DIP, where he is trying to hack around some gadget to make unique postblit work). Our goals for the time being are to improve stability, close gaps in the type system, and make good use of features in code. To the extent we can't do what we want to do, the topics you mention are fit. Andrei Yes, as much as I'd like to have macros, I don't think this is the right time now. Closing the gaps is more important.
Re: D parsing
On 2013-11-12 03:34, Andrei Alexandrescu wrote: No need to be snarky about it. The point here is that there is significant difficulty to remove features that already exist and are useful, which changes the game considerably. I'm really trying to be as humble as I can, but you're not making it easy. I have not said we should remove any of the existing features. I'm trying to show that several already existing features could have been implemented using macros, had they existed early in the development. It should show that it is a powerful feature and can hopefully be used to realize new ideas with library code instead of changing the language. -- /Jacob Carlborg
Re: D parsing
On 11/11/13 11:46 PM, Jacob Carlborg wrote: On 2013-11-12 03:34, Andrei Alexandrescu wrote: No need to be snarky about it. The point here is that there is significant difficulty to remove features that already exist and are useful, which changes the game considerably. I'm really trying to be as humble as I can, but you're not making it easy. I have not said we should remove any of the existing features. I'm trying to show that several already existing features could have been implemented using macros, had they existed early in the development. It should show that it is a powerful feature and can hopefully be used to realize new ideas with library code instead of changing the language. Fine, although a sense of futility is hard to shake seeing as we won't replace those existing features. I think a much stronger point would be made if the power of the feature were demonstrated on problems not accessible with the existing ones. About DIP 50: I will say no but please do not take it personally. It's great there is discussion about this, and I encourage it, but at this time I think we should make no plans to add macros to D. Andrei
Re: D parsing
On 2013-11-12 03:35, Andrei Alexandrescu wrote: So... there is rote addition to the context.caller structure. It's just spelled differently. No? Have you seen the other posts? Have you even read the proposal? If you think that I just want a different syntax for __LINE__ and __FILE__ you're way, way off. That's just an extremely small part of the proposal. __LINE__ and __FILE__ would not be removed. -- /Jacob Carlborg
Re: D parsing
On Thursday, 7 November 2013 at 13:50:57 UTC, Jacob Carlborg wrote: https://dl.dropboxusercontent.com/u/18386187/ast_macros.html Can you pour that into a DIP?
Re: D parsing
On 2013-11-10 17:37, Martin Nowak wrote: Can you pour that into a DIP? Absolutely, I've been meaning to do that. -- /Jacob Carlborg
Re: D parsing
On Sunday, 10 November 2013 at 17:58:35 UTC, Jacob Carlborg wrote: On 2013-11-10 17:37, Martin Nowak wrote: Can you pour that into a DIP? Absolutely, I've been meaning to do that. It requires a bit more precision. But as you know, I'm really behind the idea.
Re: D parsing
On 11/10/13 9:58 AM, Jacob Carlborg wrote: On 2013-11-10 17:37, Martin Nowak wrote: Can you pour that into a DIP? Absolutely, I've been meaning to do that. Just a thought - I think the best strategy for the time being is to explore gainful use of the features already present, instead of adding new features. Andrei
Re: D parsing
On Sunday, 10 November 2013 at 18:31:32 UTC, Andrei Alexandrescu wrote: On 11/10/13 9:58 AM, Jacob Carlborg wrote: On 2013-11-10 17:37, Martin Nowak wrote: Can you pour that into a DIP? Absolutely, I've been meaning to do that. Just a thought - I think the best strategy for the time being is to explore gainful use of the features already present, instead of adding new features. Andrei It is also a given that many features could have been avoided in the first place given that one. Still, I'd rather see things settle down a little before introducing this. Fix the existing things, then introduce new stuff - that seems like the proper way to go.
Re: D parsing
On 2013-11-10 19:44, deadalnix wrote: It is also a given that many features could have been avoided in the first place given that one. I agree. It seems at least a couple of features could have been implemented using AST macros instead. -- /Jacob Carlborg
Re: D parsing
On Sun, Nov 10, 2013 at 12:27 PM, Jacob Carlborg d...@me.com wrote: On 2013-11-10 19:44, deadalnix wrote: It is also a given that many features could have been avoided in the first place given that one. I agree. It seems at least a couple of features could have been implemented using AST macros instead. -- /Jacob Carlborg I agree; e.g., just to name two features that I proposed that could simply be implemented via AST macros as Jacob said: * [proposal: a new string literal to embed variables in a string http://forum.dlang.org/post/mailman.94.1383254681.9546.digitalmar...@puremagic.com] * [Re: feature request: __ARGS__ for logging (cf __FILE__, __LINE__, __FUNC__) http://forum.dlang.org/post/kegmp0$30dn$1...@digitalmars.com] It seems we could even get rid of __FILE__, __LINE__, __MODULE__ with AST macros.
Re: D parsing
On 2013-11-10 21:38, Timothee Cour wrote: I agree; e.g., just to name two features that I proposed that could simply be implemented via AST macros as Jacob said: * [proposal: a new string literal to embed variables in a string http://forum.dlang.org/post/mailman.94.1383254681.9546.digitalmar...@puremagic.com] * [Re: feature request: __ARGS__ for logging (cf __FILE__, __LINE__, __FUNC__) http://forum.dlang.org/post/kegmp0$30dn$1...@digitalmars.com] It seems we could even get rid of __FILE__, __LINE__, __MODULE__ with AST macros. Let's see, what else. These could, most likely, have been implemented using AST macros: * scope * foreach * for * inlining delegates. I think the current solution of passing a delegate/lambda as an alias parameter is a bit weird. One of my favorites that is not already in the language is implementing database queries: Person.where(e => e.name == "John"); will be translated to the following SQL: select * from person where name = 'John' Another one is the property shortcut; see the example for attribute macros: http://wiki.dlang.org/DIP50#Attribute_macros -- /Jacob Carlborg
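Without macros, the closest one can get to the query example today is to pass the condition as a string and assemble the SQL during CTFE; a hedged sketch, with whereSql being a hypothetical helper (a real AST macro would instead inspect the lambda e => e.name == "John"):

```d
import std.string : format;

// Hypothetical CTFE helper: builds the SQL text at compile time
// from a condition supplied as a raw string, losing the type
// checking an AST macro would provide.
string whereSql(string table, string condition)()
{
    return format("select * from %s where %s", table, condition);
}

void main()
{
    enum sql = whereSql!("person", "name = 'John'");
    static assert(sql == "select * from person where name = 'John'");
}
```

The difference the macro proposal targets is precisely this: the condition would be written as checked D code, not as an embedded SQL string.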
Re: D parsing
On 2013-11-10 17:37, Martin Nowak wrote: Can you pour that into a DIP? Done: http://forum.dlang.org/thread/l5otb1$1dhi$1...@digitalmars.com -- /Jacob Carlborg
Re: D parsing
On 11/10/13 12:38 PM, Timothee Cour wrote: On Sun, Nov 10, 2013 at 12:27 PM, Jacob Carlborg d...@me.com wrote: On 2013-11-10 19:44, deadalnix wrote: It is also a given that many features could have been avoided in the first place given that one. I agree. It seems at least a couple of features could have been implemented using AST macros instead. -- /Jacob Carlborg I agree; e.g., just to name two features that I proposed that could simply be implemented via AST macros as Jacob said: * [proposal: a new string literal to embed variables in a string http://forum.dlang.org/post/mailman.94.1383254681.9546.digitalmar...@puremagic.com] * [Re: feature request: __ARGS__ for logging (cf __FILE__, __LINE__, __FUNC__) http://forum.dlang.org/post/kegmp0$30dn$1...@digitalmars.com] It seems we could even get rid of __FILE__, __LINE__, __MODULE__ with AST macros. This would be a very small advantage. The special variables hardly cause any trouble. Andrei
Re: D parsing
On 11/10/13 1:17 PM, Jacob Carlborg wrote: On 2013-11-10 21:38, Timothee Cour wrote: I agree; e.g., just to name two features that I proposed that could simply be implemented via AST macros as Jacob said: * [proposal: a new string literal to embed variables in a string http://forum.dlang.org/post/mailman.94.1383254681.9546.digitalmar...@puremagic.com] * [Re: feature request: __ARGS__ for logging (cf __FILE__, __LINE__, __FUNC__) http://forum.dlang.org/post/kegmp0$30dn$1...@digitalmars.com] It seems we could even get rid of __FILE__, __LINE__, __MODULE__ with AST macros. Let's see, what else. These could, most likely, have been implemented using AST macros: * scope * foreach * for * inlining delegates. I think the current solution of passing a delegate/lambda as an alias parameter is a bit weird. One of my favorites that is not already in the language is implementing database queries: Person.where(e => e.name == "John"); will be translated to the following SQL: select * from person where name = 'John' Another one is the property shortcut; see the example for attribute macros: http://wiki.dlang.org/DIP50#Attribute_macros The way I see this is, these preexisting features (which shan't be thrown away) diminish the motivation for adding another feature. Andrei
Re: D parsing
On 11/10/2013 11:43 PM, Andrei Alexandrescu wrote: It seems we could even get rid of __FILE__,__LINE__,__MODULE__ with AST macros. This would be a very small advantage. The special variables hardly cause any trouble. Andrei The trouble is that there are too few of them to (efficiently) reflect all the information about the calling context one may be interested in. I don't see the point in adding them one after another in an unprincipled ad-hoc fashion instead of actually thinking up a good API coupled with some more expressive power.
Re: D parsing
On 11/10/13 3:21 PM, Timon Gehr wrote: On 11/10/2013 11:43 PM, Andrei Alexandrescu wrote: It seems we could even get rid of __FILE__,__LINE__,__MODULE__ with AST macros. This would be a very small advantage. The special variables hardly cause any trouble. Andrei The trouble is that there are too few of them to (efficiently) reflect all the information about the calling context one may be interested in. I don't see the point in adding them one after another in an unprincipled ad-hoc fashion instead of actually thinking up a good API coupled with some more expressive power. But the API would add them in exactly the same rote manner. I really don't see an advantage here. An argument for macros would have to do a lot more than we don't need __FILE__ etc. anymore. Andrei
Re: D parsing
On Sun, Nov 10, 2013 at 3:37 PM, Andrei Alexandrescu seewebsiteforem...@erdani.org wrote: On 11/10/13 3:21 PM, Timon Gehr wrote: On 11/10/2013 11:43 PM, Andrei Alexandrescu wrote: It seems we could even get rid of __FILE__,__LINE__,__MODULE__ with AST macros. This would be a very small advantage. The special variables hardly cause any trouble. Andrei The trouble is that there are too few of them to (efficiently) reflect all the information about the calling context one may be interested in. I don't see the point in adding them one after another in an unprincipled ad-hoc fashion instead of actually thinking up a good API coupled with some more expressive power. But the API would add them in exactly the same rote manner. I really don't see an advantage here. All else being equal, a library solution is always preferable; and it's easily extensible unlike what we have. An argument for macros would have to do a lot more than we don't need __FILE__ etc. anymore. More use cases have already been provided in this thread and previously. The fact that *one* language feature enables all of those is a very good indication there'll be more use cases. Furthermore, writing AST macros instead of mixins is cleaner in terms of introspection, refactoring tools, etc. Andrei
Re: D parsing
On 11/10/13 5:03 PM, Timothee Cour wrote: On Sun, Nov 10, 2013 at 3:37 PM, Andrei Alexandrescu seewebsiteforem...@erdani.org mailto:seewebsiteforem...@erdani.org wrote: On 11/10/13 3:21 PM, Timon Gehr wrote: On 11/10/2013 11:43 PM, Andrei Alexandrescu wrote: It seems we could even get rid of __FILE__,__LINE__,__MODULE__ with AST macros. This would be a very small advantage. The special variables hardly cause any trouble. Andrei The trouble is that there are too few of them to (efficiently) reflect all the information about the calling context one may be interested in. I don't see the point in adding them one after another in an unprincipled ad-hoc fashion instead of actually thinking up a good API coupled with some more expressive power. But the API would add them in exactly the same rote manner. I really don't see an advantage here. All else being equal, a library solution is always preferable; and it's easily extensible unlike what we have. What do you have in mind? I find it difficult to grab e.g. the file and the line unless there's some plumbing in the language that allows you to. An argument for macros would have to do a lot more than we don't need __FILE__ etc. anymore. More use cases have already been provided in this thread and previously. The fact that *one* language feature enables all of those is a very good indication there'll be more use cases. Furthermore, writing AST macros instead of mixins is cleaner in terms of introspection, refactoring tools, etc. I am aware of the advantages. You, too, should be aware of the disadvantages. Andrei
Re: D parsing
On 2013-11-06 09:34, Chad Joan wrote: Also, IIRC, it is believed that string mixins with CTFE are potentially more powerful. I am under the assumption that Walter is taking the long view and waiting for the community to furnish their own powerful AST manipulation tools using the existing spec. I suspect that he is opposed to baking AST manipulation into the /language spec/, but is perfectly accepting of the notion of using AST manipulation to generate code, reduce boilerplate, implement exotic features and DSLs, and so on. Just don't complicate the core language any more than it already is. Sorry if I misrepresented you Walter; I can only make educated guesses ;) The problem with this is the ugly syntax: it requires a string mixin, and the source code needs to be embedded in strings. You also need a complete front end that works at compile time. This is my vision of AST macros: https://dl.dropboxusercontent.com/u/18386187/ast_macros.html -- /Jacob Carlborg
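As a concrete illustration of the string-mixin route under discussion: today's D accepts generated code only as source text, assembled by a CTFE-able function (declareCounter is a made-up example name):

```d
import std.conv : to;

// Runs at compile time via CTFE; returns a declaration as a string.
string declareCounter(string name, int start)
{
    return "int " ~ name ~ " = " ~ start.to!string ~ ";";
}

void main()
{
    mixin(declareCounter("hits", 10)); // declares: int hits = 10;
    assert(hits == 10);
}
```

This works, but the generated code lives in strings rather than in a typed AST, which is the ugliness the post complains about.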
Re: D parsing
On 2013-11-07 00:23, Philippe Sigaud wrote: Well, I remember him saying that he doesn't want everybody and their dog to change the D syntax. If we get macros, that's exactly what we can do. For example, telling the compiler (or another tool) to rewrite int a := 1; into immutable a = 1; It depends on what the AST macros allow you to do. I guess everyone here has her/his own definition. But that means the original not-D code cannot be compiled by anyone with a D compiler anymore (obviously). And soon everybody is transforming the grammar according to what they think is cool. That's his PoV IIRC. I'm more interested in getting the ability to define my own lowerings: in the same way that scope statements or foreach over ranges are rewritten into lower-level D code by the compiler, I would like to be able to define my own syntactic constructs. My idea of AST macros[1] is quite similar to those in Scala and a bit similar to what you want. The most important thing to remember is that AST macros cannot create new syntax. They only work at the semantic level of the language. That means the above example you wrote doesn't work, since that isn't legal D syntax. What would work is doing something like this: auto a = Person.where(e => e.name == "John"); In this case where is a macro, which translates the given AST into the following SQL query: select * from person where name = 'John' Another thing I would really like is to be able to pass a delegate to a function after the argument list, if the function takes a delegate as its last parameter: void each (T) (T[] arr, void delegate (T) dg); [1, 2, 3].each(e) { writeln(e); } Of course, the compiler needs to be able to inline the delegate. Plus, everybody here knows that AST macros can cure world hunger :) Wouldn't that be nice? Of course :) [1] https://dl.dropboxusercontent.com/u/18386187/ast_macros.html -- /Jacob Carlborg
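The lowerings Philippe refers to are compiler rewrites of one construct into simpler ones. For instance, scope(exit) lowers to try/finally, roughly like this (the second function is a hand-written approximation of the rewrite, not compiler output):

```d
import std.stdio;

void withScope()
{
    scope(exit) writeln("cleaned up");
    writeln("working");
}

// ...which the compiler rewrites into approximately:
void lowered()
{
    try
    {
        writeln("working");
    }
    finally
    {
        writeln("cleaned up");
    }
}

void main()
{
    withScope(); // prints "working", then "cleaned up"
}
```

User-definable lowerings would let library authors add constructs of this kind without touching the compiler.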
Re: D parsing
On Tuesday, 5 November 2013 at 14:54:34 UTC, Dmitry Olshansky wrote: I was also toying with the idea of exposing Builder interface for std.regex. But push/pop IMHO are better be implicitly designed-out: auto re = atom('x').star(charClass(unicode.Letter),atom('y')).build(); ... and letting the nesting be explicit. Is the same as: auto re = regex(`x(?:\p{L}y)*`); Aimed for apps/libs that build regular expressions anyway and have no need in textual parser. Interesting. I like how it induces some amount of static verification, though I worry that it could harm procedural generation of grammars. It would be difficult, for instance, to use that API to do the equivalent of pushing an atom in one function and popping it in another. I wonder if we are at different levels of abstraction. The example you give seems like it requires the API to remember, in a structured way, all of the information presented by the call-chaining and call-nesting. I might implement something like that with a stateful builder object under the hood. However, the example you give seems like it is closer to what a regex engine would morph an expression into, thus making it a higher abstraction. That snippet would create a parser that recognizes the grammar 'x' ( 'y'? ). The current fledgling implementation creates this parser: http://pastebin.com/MgSqWXE2 Of course, no one would be expected to write grammars like that. It would be the job of tools like Pegged or std.regex to package it up in nice syntax that is easy to use. I thought to provide some building blocks for that with new std.uni. Not quite everything I wanted, but now at least there is one set of wheels less to reinvent. I haven't looked at std.uni earnestly yet, but if it succeeds at making that unicode/utf jungle manageable, then I will be incredibly thankful. [snip] Another fun thought: PEGs can have look-behind that includes any regular elements without any additional algorithmic complexity. 
Just take all of the look-behinds in the grammar, mash them together into one big regular expression using regular alternation (|), and then have the resulting automaton consume in lock-step with the PEG parser. Whenever the PEG parser needs to do a lookbehind, it just checks to see if the companion automaton is in a matching state for the capture it needs. Sounds quite slow to do it just in case. Also, complete DFAs tend to be, mm, quite big. I was envisioning it being done lazily and strictly as-needed, if I even got around to doing it at all. What ANTLR does is a similar technique - a regular lookahead to resolve ambiguity in the grammar (implicitly). A lot like LL(k) but with unlimited length (so-called LL(*)). Of course, it generates LL(k) disambiguation where possible, then LL(*), failing that the usual backtracking. Neat. *sigh*, I feel like I could write a paper on this stuff if I were in grad school right now. Alas, I am stuck doing 50-60 hours a week of soul-sucking business programming. I heard Sociomantic is hiring D programmers for coding some awesome stuff, you may as well apply :) Tempting. And it seems even Facebook is joining the market now, which is news to me. Well, then again, my understanding is that even though I can think of things that seem like they would make interesting topics for publishable papers, reality would have the profs conscript me to do completely different things that are possibly just as inane as the business programming. Speaking from my limited experience - at times it's like that. Yay, someone's observations corroborate mine! ... Crap, someone's observations corroborate mine :( ;) I worry that the greater threat to good AST manipulation tools in D is a lack of free time, and not the DMD bugs as much. Good for you I guess, my developments in a related area are still blocked :( Well, hopefully you've got the wind at your back now.
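For reference, the textual form Dmitry quotes earlier in this exchange runs as-is with std.regex:

```d
import std.regex;

void main()
{
    // x followed by zero or more (Unicode letter, y) pairs -
    // the textual equivalent of the chained builder example.
    auto re = regex(`x(?:\p{L}y)*`);
    assert(!matchFirst("xAy", re).empty);
    assert(!matchFirst("x", re).empty); // zero repetitions also match
}
```

A Builder-style API would construct the same automaton programmatically, which is what the push/pop-versus-nesting discussion is about.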
Re: D parsing
On 2013-11-05 17:55, Philippe Sigaud wrote: Walter is far from convinced that AST manipulation is a good thing. You would have to convince him first. His fear is that it will lead to unreadable code, and everyone using her own personal version of D. AFAICT, nothing of the sort happened in Lisp (I mean, Lispers have balkanization, but *not* due to AST manipulation). You know, Walter did a talk about AST macros at the first D conference. The idea back then was to add AST macros, hence the macro keyword. -- /Jacob Carlborg
Re: D parsing
On Tuesday, 5 November 2013 at 16:31:52 UTC, Philippe Sigaud wrote: On Tue, Nov 5, 2013 at 5:45 AM, Chad Joan chadj...@gmail.com wrote: Use the repetition operator(s) and turn the resulting array into whatever tree you like. In practice, I have never had a problem with this. I have used both Pegged and have written an SQL parser at work (not exhaustive, just what's needed) that uses C macros as PEG elements. Tangent: Using C macros for this worked surprisingly well and allowed me to avoid an extra step in the build process (I do not have access to D for that project). Pegged is still much more scalable, optimizable, and the grammars are a lot less ugly/finicky. That made my day, thanks! You're very welcome! Thank you for spending all that time to make a neat thing. Maybe I'll rub it in: Pegged taught me about PEGs and how to build reasonably powerful parsers with extremely limited tools and support. It's like being able to make shelter in the woods when no one gives you a tent. And my company's codebase is a jungle. So even when D isn't around, it is still /useful/. Thanks.
Re: D parsing
On 2013-11-05 18:23, Martin Nowak wrote: Sounds promising. I think I found a way to implement a D REPL but am lacking a reliable way to classify code snippets (expressions, declarations, statements) and get introduced symbol names. Do you think it will be able to handle that (performance is not an issue)? If there is no parser I either have to use a crude heuristic or I could use machine learning to classify the code based on token histograms (after all we have at least 2 D lexers). There's a parser available, DScanner: https://github.com/Hackerpilot/Dscanner. It's part of DCD, the autocomplete daemon. -- /Jacob Carlborg
Re: D parsing
On Tuesday, 5 November 2013 at 20:17:07 UTC, Andrei Alexandrescu wrote: 3. Grammar changes are the simplest ones and in a way the most embarrassing if they happen. The best solution I see to that is deriving the documentation and the actual parser from the same source. This is part of why I'm so keen on parser generators. This. This is why I have a hard time accepting any "formalize a grammar first" arguments. I think any grammars for D will be pure speculation until they become executable and start to see heavy use in the tools that we use.
Re: D parsing
On Wednesday, 6 November 2013 at 08:19:13 UTC, Jacob Carlborg wrote: On 2013-11-05 17:55, Philippe Sigaud wrote: Walter is far from convinced that AST manipulation is a good thing. You would have to convince him first. His fear is that it will lead to unreadable code, and everyone using her own personal version of D. AFAICT, nothing of the sort happened in Lisp (I mean, Lispers have balkanization, but *not* due to AST manipulation). You know, Walter did a talk about AST macros at the first D conference. The idea back then was to add AST macros, hence the macro keyword. Also, IIRC, it is believed that string mixins with CTFE are potentially more powerful. I am under the assumption that Walter is taking the long view and waiting for the community to furnish their own powerful AST manipulation tools using the existing spec. I suspect that he is opposed to baking AST manipulation into the /language spec/, but is perfectly accepting of the notion of using AST manipulation to generate code, reduce boilerplate, implement exotic features and DSLs, and so on. Just don't complicate the core language any more than it already is. Sorry if I misrepresented you, Walter; I can only make educated guesses ;)
Re: D parsing
On Wednesday, 6 November 2013 at 08:34:32 UTC, Chad Joan wrote: Also, IIRC, it is believed that string mixins with CTFE are potentially more powerful. Maybe. But first, the point that Brian Schott brought up has to be addressed (language spec and DMD must be in sync). Second, CTFE has to get more efficient. It's _very_ easy to make it run out of memory and it's fairly slow at executing. But yeah, I agree that there could be a library AST manipulator and it'd be pretty nice. But it'll require that two sets of parsers in two different languages be kept in sync, which is likely to be a challenge. There's pros and cons to both of the approaches, really.
Re: D parsing
On Wednesday, 6 November 2013 at 09:09:32 UTC, Chris Cain wrote: On Wednesday, 6 November 2013 at 08:34:32 UTC, Chad Joan wrote: Also, IIRC, it is believed that string mixins with CTFE are potentially more powerful. Maybe. But first, the point that Brian Schott brought up has to be addressed (language spec and DMD must be in sync). Second, CTFE has to get more efficient. It's _very_ easy to make it run out of memory and it's fairly slow at executing. But yeah, I agree that there could be a library AST manipulator and it'd be pretty nice. But it'll require that two sets of parsers in two different languages be kept in sync, which is likely to be a challenge. There's pros and cons to both of the approaches, really. Right. My thoughts are, be patient. The slow CTFE can be fixed without changes to the language spec, and it has progressed really well over the last couple years. The DMD vs spec thing is also fixable by building tools that operate on a formal grammar and forcing the grammar to be thoroughly explored. At the end of the day, I believe these will both be pleasantly fixed, and the spec won't end up with a bunch of complexity that was useful 10 years ago. It all makes sense to me, at least.
Re: D parsing
On Wed, Nov 6, 2013 at 9:19 AM, Jacob Carlborg d...@me.com wrote: You know, Walter did a talk about AST macros at the first D conference. The idea back then was to add AST macros, hence the macro keyword. I know. But since then, I guess he changed his mind, or thought about it a bit more :-)
Re: D parsing
On Wed, Nov 6, 2013 at 9:34 AM, Chad Joan chadj...@gmail.com wrote: On Wednesday, 6 November 2013 at 08:19:13 UTC, Jacob Carlborg wrote: On 2013-11-05 17:55, Philippe Sigaud wrote: Walter is far from convinced that AST manipulation is a good thing. You would have to convince him first. His fear is that it will lead to unreadable code, and everyone using her own personal version of D. AFAICT, nothing of the sort happened in Lisp (I mean, Lispers have balkanization, but *not* due to AST manipulation). You know, Walter did a talk about AST macros at the first D conference. The idea back then was to add AST macros, hence the macro keyword. Also, IIRC, it is believed that string mixins with CTFE are potentially more powerful. I am under the assumption that Walter is taking the long view and waiting for the community to furnish their own powerful AST manipulation tools using the existing spec. I suspect that he is opposed to baking AST manipulation into the /language spec/, but is perfectly accepting of the notion of using AST manipulation to generate code, reduce boilerplate, implement exotic features and DSLs, and so on. Just don't complicate the core language any more than it already is. Sorry if I misrepresented you, Walter; I can only make educated guesses ;) Well, I remember him saying that he doesn't want everybody and their dog to change the D syntax. If we get macros, that's exactly what we can do. For example, telling the compiler (or another tool) to rewrite int a := 1; into immutable a = 1; But that means the original not-D code cannot be compiled by anyone with a D compiler anymore (obviously). And soon everybody is transforming the grammar according to what they think is cool. That's his PoV IIRC. I'm more interested in getting the ability to define my own lowerings: in the same way that scope statements or foreach over ranges are rewritten into lower-level D code by the compiler, I would like to be able to define my own syntactic constructs.
Plus, everybody here knows that AST macros can cure world hunger :) Wouldn't that be nice?
Re: D parsing
On Tue, Nov 5, 2013 at 6:39 PM, Martin Nowak c...@dawg.eu wrote: Like many others I'm hoping for a nice general parser generator for quite some time. So I'm also asking specifically about your insights on PEGs. From what I know they do have some memory issues due to the backtrace memoization (only for deep parse trees though, which aren't common in programming languages). Also it seems that the research community has some issues to formalize PEGs and what grammars they are capable to handle. IIRC, they don't fit exactly inside the Chomsky hierarchy. But then, so do other grammars and parsing algorithms. Also why is the expr grammar example so complicated? What do you mean?
Re: D parsing
On Tue, Nov 5, 2013 at 6:45 PM, Martin Nowak c...@dawg.eu wrote: On 11/03/2013 02:45 AM, Timothee Cour wrote: 1) The main issue I see with pegged is PEG grammars don't support left recursion, so for example will fail on foo[1].bar(2).fun(). Unless there's a plan to accommodate those, I sense a dead end. One can eliminate left recursion but this has issues. 2) There is some material on extending PEG to support those, eg Left Recursion in Parsing Expression Grammars, or code https://github.com/orlandohill/peg-left-recursion but I don't know how well they work in practice. Scala extended their PEG generator to handle left recursion. But then there is also a Scala GLL implementation. https://github.com/djspiewak/gll-combinators I didn't know that. My 'Programming in Scala' book is a bit dated. I'll have a look, thanks.
Re: D parsing
On Wednesday, 6 November 2013 at 08:21:57 UTC, Jacob Carlborg wrote: It's part of DCD, autocomplete daemon. Side note: DCD 0.2.0 is in beta now. There's a thread on the .ide list. http://forum.dlang.org/thread/flxjzwuaadnkrixry...@forum.dlang.org
Re: D parsing
Brian Schott wrote: Why am I the only one who thinks this is a problem? meetoo. -manfred
Re: D parsing
05-Nov-2013 10:36, Chad Joan writes: On Friday, 1 November 2013 at 21:22:46 UTC, Andrei Alexandrescu wrote: ... Bugs stopping Pegged from going forward should receive high priority. I also encourage others to participate in that and similar work. Andrei [snip] I was working on a parser-engine-as-a-library that could be used as optimized internals for Pegged, as well as by any other tools that need to recognize these common patterns. The idea was to expose an API like this: string makeParser() { auto builder = new ParserBuilder!char; builder.pushSequence(); builder.literal('x'); builder.pushMaybe(); builder.literal('y'); builder.pop(); builder.pop(); return builder.toDCode(callMe); } I was also toying with the idea of exposing a Builder interface for std.regex. But push/pop IMHO are better designed out: auto re = atom('x').star(charClass(unicode.Letter),atom('y')).build(); ... and letting the nesting be explicit. It is the same as: auto re = regex(`x(?:\p{L}y)*`); Aimed at apps/libs that build regular expressions anyway and have no need for a textual parser. That snippet would create a parser that recognizes the grammar 'x' ( 'y'? ). The current fledgling implementation creates this parser: http://pastebin.com/MgSqWXE2 Of course, no one would be expected to write grammars like that. It would be the job of tools like Pegged or std.regex to package it up in nice syntax that is easy to use. I thought to provide some building blocks for that with the new std.uni. Not quite everything I wanted, but now at least there is one set of wheels less to reinvent. [snip] Another fun thought: PEGs can have look-behind that includes any regular elements without any additional algorithmic complexity. Just take all of the look-behinds in the grammar, mash them together into one big regular expression using regular alternation (|), and then have the resulting automaton consume in lock-step with the PEG parser.
Whenever the PEG parser needs to do a lookbehind, it just checks to see if the companion automaton is in a matching state for the capture it needs. Sounds quite slow to do it just in case. Also, complete DFAs tend to be, mm, quite big. What ANTLR does is a similar technique - a regular lookahead to resolve ambiguity in the grammar (implicitly). A lot like LL(k) but with unlimited length (so-called LL(*)). Of course, it generates LL(k) disambiguation where possible, then LL(*), failing that the usual backtracking. *sigh*, I feel like I could write a paper on this stuff if I were in grad school right now. Alas, I am stuck doing 50-60 hours a week of soul-sucking business programming. I heard Sociomantic is hiring D programmers for coding some awesome stuff, you may as well apply :) Well, then again, my understanding is that even though I can think of things that seem like they would make interesting topics for publishable papers, reality would have the profs conscript me to do completely different things that are possibly just as inane as the business programming. Speaking from my limited experience - at times it's like that. I worry that the greater threat to good AST manipulation tools in D is a lack of free time, and not the DMD bugs as much. Good for you I guess, my developments in a related area are still blocked :( I hope this is useful to someone! -- Dmitry Olshansky
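The two builder styles under discussion can be mocked up side by side. A hypothetical Python sketch, for contrast only - neither class mirrors the real Pegged or std.regex APIs, and `ParserBuilder`, `atom`, `maybe`, and `seq` are invented names:

```python
# Two builder styles for the same tiny grammar 'x' ('y'?): the push/pop API
# keeps nesting in internal state, the fluent one makes nesting explicit in
# the expression itself. Both emit a plain regex string here just to have
# something concrete to compare.

class ParserBuilder:
    """Stateful push/pop style: nesting lives on the builder's stack."""
    def __init__(self):
        self.stack = [[]]   # one list of fragments per open group
        self.ops = []       # suffix to attach when each group closes
    def push_sequence(self):
        self.stack.append([]); self.ops.append("")
        return self
    def push_maybe(self):
        self.stack.append([]); self.ops.append("?")
        return self
    def literal(self, ch):
        self.stack[-1].append(ch)
        return self
    def pop(self):
        group = "".join(self.stack.pop())
        self.stack[-1].append("(?:" + group + ")" + self.ops.pop())
        return self
    def build(self):
        return "".join(self.stack[0])

# Fluent style: the nesting of calls *is* the nesting of the grammar.
def atom(ch): return ch
def maybe(*parts): return "(?:" + "".join(parts) + ")?"
def seq(*parts): return "(?:" + "".join(parts) + ")"

b = ParserBuilder()
pushpop = (b.push_sequence()
            .literal("x")
            .push_maybe().literal("y").pop()
            .pop()
            .build())
fluent = seq(atom("x"), maybe(atom("y")))
print(pushpop, fluent)  # (?:x(?:y)?) (?:x(?:y)?)
```

The push/pop version is easy to drive procedurally (push in one function, pop in another, as Chad's objection notes), while the fluent version gets structural checking for free from the host language's nesting.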
Re: D parsing
On Tue, Nov 5, 2013 at 5:45 AM, Chad Joan chadj...@gmail.com wrote: Use the repetition operator(s) and turn the resulting array into whatever tree you like. In practice, I have never had a problem with this. I have used both Pegged and have written an SQL parser at work (not exhaustive, just what's needed) that uses C macros as PEG elements. Tangent: Using C macros for this worked surprisingly well and allowed me to avoid an extra step in the build process (I do not have access to D for that project). Pegged is still much more scalable, optimizable, and the grammars are a lot less ugly/finicky. That made my day, thanks!
Re: D parsing
On Tue, Nov 5, 2013 at 3:54 PM, Dmitry Olshansky dmitry.o...@gmail.com wrote: I was also toying with the idea of exposing a Builder interface for std.regex. But push/pop IMHO are better designed out: auto re = atom('x').star(charClass(unicode.Letter),atom('y')).build(); ... and letting the nesting be explicit. It is the same as: auto re = regex(`x(?:\p{L}y)*`); Aimed at apps/libs that build regular expressions anyway and have no need for a textual parser. Another possible advantage is to reference external names inside your construction, thus naming other regexen or referencing external variables to deposit backreferences inside them. All in all, to get a regex construct that can interact with the external world. What ANTLR does is a similar technique - a regular lookahead to resolve ambiguity in the grammar (implicitly). A lot like LL(k) but with unlimited length (so-called LL(*)). Of course, it generates LL(k) disambiguation where possible, then LL(*), failing that the usual backtracking. I liked that idea since the author added it to ANTLR, but I never used it since. I wonder whether that can be implemented inside another parser generator or if it uses some specific-to-ANTLR internal machinery. *sigh*, I feel like I could write a paper on this stuff if I were in grad school right now. Alas, I am stuck doing 50-60 hours a week of soul-sucking business programming. I heard Sociomantic is hiring D programmers for coding some awesome stuff, you may as well apply :) Well, I do 50-60 hours a week of fulfilling and challenging work, but the free time at the end is the same ;) I feel Chad's pain, though. Well, then again, my understanding is that even though I can think of things that seem like they would make interesting topics for publishable papers, reality would have the profs conscript me to do completely different things that are possibly just as inane as the business programming. Speaking from my limited experience - at times it's like that.
I worry that the greater threat to good AST manipulation tools in D is a lack of free time, and not the DMD bugs as much. Good for you I guess, my developments in a related area are still blocked :( Walter is far from convinced that AST manipulation is a good thing. You would have to convince him first. His fear is that it will lead to unreadable code, and everyone using her own personal version of D. AFAICT, nothing of the sort happened in Lisp (I mean, Lispers have balkanization, but *not* due to AST manipulation).
Re: D parsing
On Tue, Nov 5, 2013 at 8:27 AM, Brian Schott briancsch...@gmail.com wrote: But we should take a step back first. Before we try to implement a parser for D's grammar, we need to figure out what exactly D's grammar is. Seriously. We don't have a real grammar for D. We have the language spec on dlang.org, but it isn't complete, consistent, or updated when the language changes. Want examples? I have a tracker for them here: http://d.puremagic.com/issues/show_bug.cgi?id=10233 There's also my project here: https://github.com/Hackerpilot/DGrammar, but it's not official and I keep finding differences between it and the behavior of DMD. Why am I the only one who thinks this is a problem? You're not alone in this, but Walter's answer is that we must make pull requests on the website if we find errors in the online grammar. The problem is not knowing what the intended behavior is. When the reference compiler and the grammar disagree, who am I to say which is wrong? We could also see if some slight modification to the grammar could push it into LALR(1) territory.
Re: D parsing
On 11/02/2013 05:06 AM, Manfred Nowak wrote: One prerequisite for every PEG-Parser is, that the language has to be designed to be without any ambiguity. AFAIK Parser expression grammars handle ambiguity through prioritization. You'd still need to rewrite grammar rules that involve left-recursion or hidden left recursion.
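The prioritization Martin mentions can be shown with a tiny combinator sketch, in Python for brevity (illustrative only; `lit` and `choice` are invented names):

```python
# Minimal PEG combinators showing how prioritized (ordered) choice removes
# ambiguity by construction: alternatives are tried in order and the first
# success is committed to, so no input ever has two parses for one rule.

def lit(s):
    def parse(inp, pos):
        return (pos + len(s), s) if inp.startswith(s, pos) else None
    return parse

def choice(*alts):  # PEG's e1 / e2 / ...: ordered, unlike a CFG's '|'
    def parse(inp, pos):
        for alt in alts:
            r = alt(inp, pos)
            if r is not None:
                return r        # first success wins; later alts never tried
        return None
    return parse

g = choice(lit("ab"), lit("a"))
print(g("ab", 0))   # (2, 'ab') - the earlier alternative wins
h = choice(lit("a"), lit("ab"))
print(h("ab", 0))   # (1, 'a')  - order, not length, decides
```

This is exactly why a PEG is never ambiguous but also why rule ordering matters, and why left-recursive or hidden-left-recursive rules still need rewriting, as the post says.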
Re: D parsing
On 11/01/2013 09:59 PM, Philippe Sigaud wrote: The examples directory shows different grammars, from JSON to XML to C to D. Sounds promising. I think I found a way to implement a D REPL but am lacking a reliable way to classify code snippets (expressions, declarations, statements) and get introduced symbol names. Do you think it will be able to handle that (performance is not an issue)? If there is no parser I either have to use a crude heuristic or I could use machine learning to classify the code based on token histograms (after all we have at least 2 D lexers).
Re: D parsing
On 11/01/2013 09:59 PM, Philippe Sigaud wrote: I used it with the D grammar, but the one on the website is woefully inadequate (some mistakes, out of date compared to the compiler and written in a somewhat convoluted style full of left-recursion). The shortcomings are that the generated parser is quite slow compared to other D parsers. That comes from my coding, of course: Pegged generates a simple recursive descent parser. I guess I could push for a better engine, but I'm waiting for CTFE to get a bit better. The advantages are nice, though: full parse tree, semantic actions, and so on. Like many others I'm hoping for a nice general parser generator for quite some time. So I'm also asking specifically about your insights on PEGs. From what I know they do have some memory issues due to the backtrace memoization (only for deep parse trees though, which aren't common in programming languages). Also it seems that the research community has some issues to formalize PEGs and what grammars they are capable to handle. Also why is the expr grammar example so complicated?
Re: D parsing
On 11/03/2013 02:45 AM, Timothee Cour wrote: 1) The main issue I see with pegged is PEG grammars don't support left recursion, so for example will fail on foo[1].bar(2).fun(). Unless there's a plan to accommodate those, I sense a dead end. One can eliminate left recursion but this has issues. 2) There is some material on extending PEG to support those, eg Left Recursion in Parsing Expression Grammars, or code https://github.com/orlandohill/peg-left-recursion but I don't know how well they work in practice. Scala extended their PEG generator to handle left recursion. But then there is also a Scala GLL implementation. https://github.com/djspiewak/gll-combinators
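The seed-growing technique those PEG extensions use (in the spirit of the Warth et al. approach the thread references) can be shown in miniature. A hypothetical Python sketch for the left-recursive rule `Expr <- Expr '-' Num / Num`:

```python
# Seed-growing left recursion in miniature: the left-recursive call is first
# made to fail (the memoized "seed"), then the rule body is re-run, each pass
# consuming one more "- Num", until the parse stops growing. The result is
# left-associative, which plain right-recursive rewriting would get wrong.

def num(s, pos):
    return (int(s[pos]), pos + 1) if pos < len(s) and s[pos].isdigit() else None

memo = {}  # pos -> best (value, end) found so far for Expr, or None

def expr(s, pos):
    if pos in memo:
        return memo[pos]
    memo[pos] = None                  # seed: the recursive call fails at first
    result = None
    while True:
        attempt = expr_body(s, pos)
        if attempt is None or (result and attempt[1] <= result[1]):
            break                     # parse stopped growing
        result = memo[pos] = attempt  # grow the seed and try again
    return result

def expr_body(s, pos):
    left = expr(s, pos)               # the left-recursive alternative
    if left and left[1] < len(s) and s[left[1]] == '-':
        right = num(s, left[1] + 1)
        if right:
            return (('-', left[0], right[0]), right[1])
    return num(s, pos)                # base case: Num

print(expr("1-2-3", 0))  # (('-', ('-', 1, 2), 3), 5): left-associative
```

A real implementation has to handle multiple rules, nested left recursion, and memo invalidation, which is where the "how well does it work in practice" question comes from.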
Re: D parsing
Martin Nowak wrote: AFAIK Parser expression grammars handle ambiguity through prioritization. My statement about ambiguity is taken from the first sentence of chapter 7 of Parsing Expression Grammars: A Recognition-Based Syntactic Foundation by Brian Ford: http://pdos.csail.mit.edu/papers/parsing%3Apopl04.pdf: | Parsing expression grammars provide a [...] foundation for [...] | languages that are designed to be unambiguous. -manfred
Re: D parsing
On 10/30/2013 11:39 PM, jgetner wrote: Is there a function in D for evaluating expressions before compile time? I want to throw two more ideas into the parser generator topic which I have carried around for quite some time. I think combined they'd enable AST generation from clean and simple grammars. Maybe someone finds this interesting enough to give it a try; I will hardly ever get to this. There is an interesting paper [MTP] that describes how a slight modification of the EBNF grammar can be used to generate a usable AST from the grammar definition. Each AST node is named like its production. This is possible because the modified grammar disallows mixing alternates and sequences, which forces one to name all productions. The other paper is [GLL] parsing, which is a new general parser generation technique. The main advantage over GLR is that it's a top-down parser. This allows one to modularize grammars, e.g. you can define a grammar that reuses/imports an existing grammar for expressions. The reason to choose generalized parsers instead of, say, LALR(1) is maintainability of the grammars. The needed modifications to make a context-free grammar LR(1)- or LALR(1)-compatible often make it extremely hard to read and change. Parse forests (many parse trees) are not a problem, I think, because most CFGs only have local ambiguities, so they resolve to a single parse tree.
### A motivating example (desiderata)

grammar.d
```
import number;

Identifier = [_a-zA-Z][a-zA-Z]*;

// This still needs some syntax to annotate precedence and associativity
MulExpr = Expr * Expr;
DivExpr = Expr / Expr;
AddExpr = Expr + Expr;
SubExpr = Expr - Expr;
PowExpr = Expr ^^ Expr;
Expr = MulExpr | DivExpr | AddExpr | SubExpr | PowExpr | Number | Identifier;

Arguments = Expr % ,;
ExprStmt = Expr ;;
CallStmt = Identifier ( Arguments );
Stmt = ExprStmt | CallStmt;
Stmts = Stmt*;
```

```d
auto p = genParser(import("grammar.d"));
auto stmts = p.parseStmts(input);
auto expr = p.parseExpr(input);
foreach (arg; p.parseArguments(input)) {}

final switch (expr)
{
    case MulExpr.tag:
    case DivExpr.tag:
    case AddExpr.tag:
    case SubExpr.tag:
    case PowExpr.tag:
}

static assert(is(typeof(MulExpr) == Tuple!(Expr, "expr0", Expr, "expr1")));
static assert(is(typeof(Stmt) == Variant!(ExprStmt, CallStmt)));
static assert(is(typeof(Identifier) == string));
```

[MTP]: http://babel.ls.fi.upm.es/research/mtp/ http://babel.ls.fi.upm.es/~angel/papers/finalcopy-2005prole.pdf [GLL]: http://www.rhul.ac.uk/computerscience/research/csle/gllparsers.aspx http://dotat.at/tmp/gll.pdf for implementers - Modelling GLL Parser Implementations (http://dx.doi.org/10.1007/978-3-642-19440-5_4)
Re: D parsing
On 11/04/2013 06:52 AM, Philippe Sigaud wrote: Do you know what part of the D grammar makes it non-LALR(1)? At least function and template function declarations are not even LR(1). You need to look for a second left parenthesis to distinguish the two. It's fairly easy to solve this though. 1. Add a new token r")[ ]*(" 2. Add an alternate rule that never matches but forces the parser to look past the closing paren of the function arguments. ...
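The two-parenthesis problem can be made concrete with a toy lookahead routine. A hypothetical Python sketch (token lists and names invented for illustration): a declaration like `foo(T)(T x)` is a template function and `foo(int x)` a plain one, and the parser only knows which after skipping the whole first balanced `(...)` group, an unbounded peek that LR(1)'s single token of lookahead cannot provide.

```python
# Skip the first balanced (...) group, then peek one token further:
# a second '(' means a template function declaration.

def is_template_decl(tokens, i):
    """tokens[i] must be '('; answer whether a second '(' follows the group."""
    depth = 0
    while i < len(tokens):
        if tokens[i] == '(':
            depth += 1
        elif tokens[i] == ')':
            depth -= 1
            if depth == 0:
                # This peek happens arbitrarily far from where an LR(1)
                # parser would already have had to commit to a rule.
                return i + 1 < len(tokens) and tokens[i + 1] == '('
        i += 1
    return False

plain    = ['foo', '(', 'int', 'x', ')', '{', '}']
template = ['foo', '(', 'T', ')', '(', 'T', 'x', ')', '{', '}']
print(is_template_decl(plain, 1), is_template_decl(template, 1))  # False True
```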
Re: D parsing
05-Nov-2013 20:55, Philippe Sigaud writes: On Tue, Nov 5, 2013 at 3:54 PM, Dmitry Olshansky dmitry.o...@gmail.com wrote: I was also toying with the idea of exposing a Builder interface for std.regex. But push/pop IMHO are better designed out: auto re = atom('x').star(charClass(unicode.Letter),atom('y')).build(); ... and letting the nesting be explicit. It is the same as: auto re = regex(`x(?:\p{L}y)*`); Aimed at apps/libs that build regular expressions anyway and have no need for a textual parser. Another possible advantage is to reference external names inside your construction, thus naming other regexen or referencing external variables to deposit backreferences inside them. Actually it's a bad, bad idea. It has nice potential to destroy all optimization opportunities and performance guarantees of it (like being linear in time, and that only works today w/o funky extensions used). After all, I'm in a curious position of having to do some work at R-T as well, where you can't always just generate some D code ;) What would be really nice though is to let users register their own dictionary of 'tokens' from that. Then things like an IPv4 pattern or a domain name pattern become as simple as the `\d` pieces they use today (say with \i{user-defined-name}). All in all, to get a regex construct that can interact with the external world. Well, I think of some rather interesting ways to do it even w/o tying in some external stuff as building blocks. It's rather making std.regex itself less rigid and more lean (as in cheap to invoke). Then external modules may slice and dice its primitives as seen fit. What ANTLR does is a similar technique - a regular lookahead to resolve ambiguity in the grammar (implicitly). A lot like LL(k) but with unlimited length (so-called LL(*)). Of course, it generates LL(k) disambiguation where possible, then LL(*), failing that the usual backtracking. I liked that idea since the author added it to ANTLR, but I never used it since.
I wonder whether that can be implemented inside another parser generator or if it uses some specific-to-ANTLR internal machinery. I don't think there is much ANTLR-specific in it. You would, though, have to accept it's no longer a PEG but rather some hybrid top-down EBNF parser that resolves ambiguities. I worry that the greater threat to good AST manipulation tools in D is a lack of free time, and not the DMD bugs as much. Good for you I guess, my developments in a related area are still blocked :( Walter is far from convinced that AST manipulation is a good thing. You would have to convince him first. I thought it was about tools that work with D code like, say, lints, refactoring, etc. -- Dmitry Olshansky
Re: D parsing
On 11/4/13 11:27 PM, Brian Schott wrote: On Monday, 4 November 2013 at 13:20:01 UTC, Timothee Cour wrote: for lexing there's already dscanner we could use (while we wait for perhaps a autogenerated lexer); so I think priority is on the autogenerated parser (dscanner has one but hand designed), where it's still unknown what will work well. Yes, that tool has two properties: 1) It works now. Not Soon(tm). You can download it, compile it, and use it to dump the AST of your D code in just a minute or two. 2) It wasn't built THE ONE TRUE WAY. But we should take a step back first. Before we try to implement a parser for D's grammar, we need to figure out what exactly D's grammar is. Seriously. We don't have a real grammar for D. We have the language spec on dlang.org, but it isn't complete, consistent, or updated when the language changes. Want examples? I have a tracker for them here: http://d.puremagic.com/issues/show_bug.cgi?id=10233 There's also my project here: https://github.com/Hackerpilot/DGrammar, but it's not official and I keep finding differences between it and the behavior of DMD. Why am I the only one who thinks this is a problem? I agree it's a problem, in fact three problems in one. In decreasing difficulty order: 1. Semantic changes for working code (e.g. order of evaluation etc) are subtle enough to be very difficult to track and require sheer attention and careful manual verification and maintenance of documentation. 2. Semantic analysis changes (i.e. compiles/doesn't compile) are also difficult and require attention, but at least can be to a good extent verified automatically (by means of test suites and runnable examples). In TDPL I have two categories of examples - visible and invisible. The invisible ones do not occur in the printed text but are present in the book source and are used to check whether the claims made by the book are true. It would be really cool if we had something like that for the online documentation. 
We should be able to intersperse freely documentation text with invisible unittests that ensure the documentation is correct. 3. Grammar changes are the simplest ones and in a way the most embarrassing if they happen. The best solution I see to that is deriving the documentation and the actual parser from the same source. This is part of why I'm so keen on parser generators. Andrei P.S. I haven't forgotten about the lexer - it's still on the back burner but I will publish it as soon as I get a chance.
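The "invisible unittests" idea exists elsewhere; Python's doctest is a minimal illustration of documentation examples that double as tests, shown here as an analogy, not a proposal for the D toolchain (`precedence` is an invented toy function):

```python
import doctest

# The usage examples in the docstring are executed as tests, so the prose
# cannot silently drift away from the code it documents - the same property
# Andrei wants for the online documentation's claims.

def precedence(op):
    """Return the binding power of a binary operator.

    >>> precedence('*') > precedence('+')
    True
    >>> precedence('+') == precedence('-')
    True
    """
    return {'+': 1, '-': 1, '*': 2, '/': 2}[op]

# Check the docstring's examples; prints nothing when they all pass.
doctest.run_docstring_examples(precedence, {"precedence": precedence})
```

D's documented unittests (added later to DDoc) follow the same principle: the example shown to the reader is the code the test suite runs.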
Re: D parsing
On Tuesday, 5 November 2013 at 17:23:23 UTC, Martin Nowak wrote: On 11/01/2013 09:59 PM, Philippe Sigaud wrote: The examples directory shows different grammars, from JSON to XML to C to D. Sounds promising. I think I found a way to implement a D REPL but am lacking a reliable way to classify code snippets (expressions, declarations, statements) and get introduced symbol names. Do you think it will be able to handle that (performance is not an issue)? I did something similar using Pegged a little while back (https://github.com/callumenator/dabble). I tuned the PEG grammar quite a lot, but it worked ok.
Re: D parsing
On 11/04/2013 06:48 AM, Philippe Sigaud wrote: On Sun, Nov 3, 2013 at 7:08 PM, Timothee Cour thelastmamm...@gmail.com wrote: On Sun, Nov 3, 2013 at 1:13 AM, Philippe Sigaud philippe.sig...@gmail.com wrote: My current plan is to write different engines, and letting either the user select them at compile-time, or to have the parser decide which one to use, depending on the grammar. I'm pretty sure the 'Type 3' parts of a grammar (regular expressions) could be done by using std.regex, for example. even lexing can't be done with regex, eg nesting comments: /+ ... +/ Also, although it may seem cleaner at first to combine lexing and parsing in 1 big grammar (as done in pegged), it usually is faster to feed a (separate) lexer's output into the parser. Lexing, yes. I was imprecise: even in a context-free grammar, some rules are regular and could use std.regex (the ct part) as the underlying engine, just for that rule. Lexing cannot be done with regex. Think of myArray[1. ]! Is what comes next a dot or a number?
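The `1.` ambiguity can be made concrete with a toy maximal-munch lexer. A hypothetical Python sketch (token names invented; D's real lexer has many more rules) showing why the float rule needs one character of lookahead to keep the `..` range token intact:

```python
import re

# With pure maximal munch, the float rule eats "1." out of "1..2" and leaves
# a stray "."; a one-character lookahead after "digit '.'" fixes it.

NAIVE = re.compile(
    r"(?P<FLOAT>\d+\.\d*)|(?P<INT>\d+)|(?P<RANGE>\.\.)|(?P<DOT>\.)|(?P<CH>.)")
FIXED = re.compile(  # a float's trailing dot must not start a ".."
    r"(?P<FLOAT>\d+\.(?!\.)\d*)|(?P<INT>\d+)|(?P<RANGE>\.\.)|(?P<DOT>\.)|(?P<CH>.)")

def lex(rx, s):
    return [m.lastgroup for m in rx.finditer(s)]

print(lex(NAIVE, "a[1..2]"))  # ['CH', 'CH', 'FLOAT', 'DOT', 'INT', 'CH']
print(lex(FIXED, "a[1..2]"))  # ['CH', 'CH', 'INT', 'RANGE', 'INT', 'CH']
```

So "lexing cannot be done with regex" is true for a plain longest-match rule set; with lookahead assertions (as Dmitry suggests downthread) this particular case becomes regular again, while truly nested constructs like `/+ ... +/` comments stay out of reach.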
Re: D parsing
On 11/04/2013 06:52 AM, Philippe Sigaud wrote: On Mon, Nov 4, 2013 at 1:55 AM, Timothee Cour thelastmamm...@gmail.com wrote: I guess I'll have to write a CT-compatible LALR(1) engine... D does not fit into LALR(1), you need GLR. Well, too bad. GLR is also on my plate, but I fear its cubic behavior (IIRC, it degrades gracefully, though). That's one part, and even worse is that you need a decider function if more than one rule accepts. Do you know what part of the D grammar makes it non-LALR(1)? I had big trouble with the IdentifierList rules.
Re: D parsing
04-Nov-2013 12:28, Robert Schadek writes: On 11/04/2013 06:48 AM, Philippe Sigaud wrote: On Sun, Nov 3, 2013 at 7:08 PM, Timothee Cour thelastmamm...@gmail.com wrote: On Sun, Nov 3, 2013 at 1:13 AM, Philippe Sigaud philippe.sig...@gmail.com wrote: My current plan is to write different engines, and letting either the user select them at compile-time, or to have the parser decide which one to use, depending on the grammar. I'm pretty sure the 'Type 3' parts of a grammar (regular expressions) could be done by using std.regex, for example. even lexing can't be done with regex, eg nesting comments: /+ ... +/ Also, although it may seem cleaner at first to combine lexing and parsing in 1 big grammar (as done in pegged), it usually is faster to feed a (separate) lexer's output into the parser. Lexing, yes. I was imprecise: even in a context-free grammar, some rules are regular and could use std.regex (the ct part) as the underlying engine, just for that rule. Lexing cannot be done with regex. Think of myArray[1. ]! Is what comes next a dot or a number? You could use the lookahead extension ;) In general I would not recommend using std.regex ATM for a performance-critical lexer, if only because it wasn't tuned for such a use case yet. I have plans for extending std.regex capabilities in many directions, lexing being one important use case. -- Dmitry Olshansky
Re: D parsing
for lexing there's already dscanner we could use (while we wait for, perhaps, an autogenerated lexer); so I think the priority is the autogenerated parser (dscanner has one, but hand-written), where it's still unknown what will work well. On Mon, Nov 4, 2013 at 2:43 AM, Dmitry Olshansky dmitry.o...@gmail.com wrote: ...
Re: D parsing
On Sunday, 3 November 2013 at 01:45:23 UTC, Timothee Cour wrote: 1) The main issue I see with Pegged is that PEG grammars don't support left recursion, so for example it will fail on foo[1].bar(2).fun(). Unless there's a plan to accommodate those, I sense a dead end. One can eliminate left recursion, but this has issues. Use the repetition operator(s) and turn the resulting array into whatever tree you like. In practice, I have never had a problem with this. I have used Pegged, and I have also written an SQL parser at work (not exhaustive, just what's needed) that uses C macros as PEG elements. Tangent: Using C macros for this worked surprisingly well and allowed me to avoid an extra step in the build process (I do not have access to D for that project). Pegged is still much more scalable and optimizable, and the grammars are a lot less ugly/finicky.
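The repetition-operator workaround above can be sketched as a Pegged grammar. This is an illustrative sketch assuming Pegged's mixin(grammar(...)) interface; the rule names are made up, and a real grammar would need more cases:

```d
import pegged.grammar;

// Instead of the left-recursive
//   Postfix <- Postfix '.' identifier / Primary
// write the postfix chain with repetition. The parse result is a
// Primary followed by a flat array of postfix nodes, which a later
// pass can fold into a left-leaning tree.
mixin(grammar(`
Chain:
    Postfix  <- Primary ( Index / Call / Member )*
    Index    <- '[' Number ']'
    Call     <- '(' Number? ')'
    Member   <- '.' identifier
    Primary  <- identifier
    Number   <- [0-9]+
`));
```

With this shape, the problem input foo[1].bar(2).fun() parses as Primary foo followed by the postfixes [1], .bar, (2), .fun, () with no left recursion involved.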
Re: D parsing
On Friday, 1 November 2013 at 21:22:46 UTC, Andrei Alexandrescu wrote: ... Bugs stopping Pegged from going forward should receive high priority. I also encourage others to participate in that and similar work. Andrei Ack! I share Philippe's sense of timing. This would have been even nicer to hear a year ago when both of us were actively committing ;) I was going to close a project or two that I care deeply about and had started a long time ago, but I see that this might be a hard decision. Nonetheless, I am really glad that you are showing this interest! I like to hear stuff like this, since I too really like Pegged. Andrei and Philippe, I feel compelled to share some of my thoughts that I never had time to finish back then. I was working on a parser-engine-as-a-library that could be used as optimized internals for Pegged, as well as by any other tools that need to recognize these common patterns. The idea was to expose an API like this:

    string makeParser()
    {
        auto builder = new ParserBuilder!char;
        builder.pushSequence();
            builder.literal('x');
            builder.pushMaybe();
                builder.literal('y');
            builder.pop();
        builder.pop();
        return builder.toDCode(callMe);
    }

    const foo = makeParser();
    pragma(msg, foo);
    mixin(foo);

That snippet would create a parser that recognizes the grammar 'x' ( 'y'? ). The current fledgling implementation creates this parser: http://pastebin.com/MgSqWXE2 Of course, no one would be expected to write grammars like that. It would be the job of tools like Pegged or std.regex to package it up in a nice syntax that is easy to use. The code already takes a somewhat different strategy than Pegged's original one. Rather than generating a bunch of templates that dmd then has to instantiate to actualize the parser, it just emits a bunch of very primitive procedural D code. I suspect that this approach would mixin far faster with current dmd, because the deeply nested templates generated by Pegged seemed to be a bottleneck.
I have to hand it to Philippe, though, for coming up with a very clever way to bootstrap the thing: once I saw how his templates assembled together, I realized just how convenient that was! (And I think my parser generator still has to be taught how to avoid making redundant branches in its output: there's some hash-table action that belongs in there somewhere.) The small amount of code that I have for it is here: https://github.com/chadjoan/xdc/blob/master/src/xdc/parser_builder/parser_builder.d I wanted to eventually make it generic enough to recognize patterns in things besides strings. Being able to write grammars that recognize patterns in ASTs is /useful/. That leads into the whole xdc project: automate all of the tedious crud in semantic analysis, and thus make compiler writing, and possibly other AST manipulations in user code, all much easier. The other thing I wanted to do was to optimize it.
- I intended it to do no allocations unless the caller asks for them.
- I intended to do a bunch of PEG/regex hybridization. I noticed some mathematical properties of PEGs and regular expressions that should allow you to mix the two as much as possible. All you have to do is tell it how to behave at the boundaries where they meet. And given that PEGs already define their own behavior pretty well, it becomes possible to lower a lot of a PEG into regular expressions connected by a minimal set of PEG rules.
This would be some awesome lowering: if you first do a pass that inlines as many rules as possible, and then do a second pass that converts PEG elements into regular elements whenever possible, then I feel like the thing will be damned near optimal.
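The lowering idea above can be sketched in miniature: a PEG rule whose body involves no recursion describes a regular language, so it can be handed to std.regex instead of the general PEG machinery. This is a hedged sketch; the rule, names, and the size_t-consumed convention are invented for illustration:

```d
import std.regex;

// The PEG rule
//   Number <- [0-9]+ ('.' [0-9]+)?
// is regular, so compile it once into a std.regex engine and use it
// as a leaf matcher inside an otherwise ordinary PEG driver.
auto numberRule = ctRegex!(`^[0-9]+(\.[0-9]+)?`);

// Returns the number of characters consumed, or 0 on failure --
// the shape a PEG driver typically wants from a leaf rule.
size_t matchNumber(string input)
{
    auto m = input.matchFirst(numberRule);
    return m.empty ? 0 : m.hit.length;
}

unittest
{
    assert(matchNumber("123.45 rest") == 6);
    assert(matchNumber("abc") == 0);
}
```

The inline-then-convert pass described in the post would, in this scheme, try to grow such regex-backed leaves as large as possible before falling back to PEG combinators at recursive boundaries.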
If you are wondering what these mathematical properties are, then I encourage you to look at this snippet where I define unitary and non-unitary expressions, for lack of prior terms: http://pastebin.com/iYBypRc5 Another fun thought: PEGs can have look-behind that includes any regular elements without any additional algorithmic complexity. Just take all of the look-behinds in the grammar, mash them together into one big regular expression using regular alternation (|), and then have the resulting automaton consume input in lock-step with the PEG parser. Whenever the PEG parser needs to do a look-behind, it just checks whether the companion automaton is in a matching state for the capture it needs. *sigh* I feel like I could write a paper on this stuff if I were in grad school right now. Alas, I am stuck doing 50-60 hours a week of soul-sucking business programming. Well, then again, my understanding is that even though I can think of things that seem like they would make interesting topics for publishable papers, reality would have the profs conscript me to do completely different things that are possibly just as inane.