Re: Pegged, From EBNF to PEG
On Wed, Mar 14, 2012 at 10:48, Dmitry Olshansky dmitry.o...@gmail.com wrote: That's one of the caveats on PEG. That and greedy operators. 'a*a' never succeeds because 'a*' consumes all the available a's. Hey, wait, I thought there has to be backtrack here, i.e. when second 'a' finally fails? PEG only do local backtracking in 'or' choices, not in sequences. I think that's because the original author was dealing with packrat parsing and its O(input-size) guarantee. I remember trying compile-time backtracking in another similar library in D 1-2 years ago and getting lots of pb. I might add that in Pegged, but I don't know the consequences. How do you do that in std.regex? (nice article btw, I learnt some about regexes)
Re: Pegged: Syntax Highlighting
On Wed, Mar 14, 2012 at 21:03, Andrej Mitrovic andrej.mitrov...@gmail.com wrote: how would one use a parser like Pegged for syntax highlighting? Ok, typically one would use a lexer and not a parser. But using a parser might be more interesting for creating more complex syntax highlighting. :) Actually I think I can use the new ddmd-clean port for just this purpose. Sorry for the noise. Sorry for the late reply, I was away for a few days, in a Net-forsaken place ;) If ddmd-clean is OK for you, that's cool. Keep us informed how that went. If you want to use Pegged, you'd need to enter the entire D grammar to get a correct parse tree. I just finished writing it, but I'm afraid to try and compile it :) It's one huge monster.
Re: Pegged, From EBNF to PEG
On 17.03.2012 10:59, Philippe Sigaud wrote: On Wed, Mar 14, 2012 at 10:48, Dmitry Olshanskydmitry.o...@gmail.com wrote: That's one of the caveats on PEG. That and greedy operators. 'a*a' never succeeds because 'a*' consumes all the available a's. Hey, wait, I thought there has to be backtrack here, i.e. when second 'a' finally fails? PEG only do local backtracking in 'or' choices, not in sequences. I think that's because the original author was dealing with packrat parsing and its O(input-size) guarantee. Ok, let's agree on fact that semantically a* is : As - a As / a and a*? is this: As - a / a As Now that's local ;) I remember trying compile-time backtracking in another similar library in D 1-2 years ago and getting lots of pb. I might add that in Pegged, but I don't know the consequences. How do you do that in std.regex? There are two fundamental ways to get around the problem, the easiest to implement is to use a stack to record a position in input + number of alternative/production (I call it a PC - program counter) every time there is a fork like that. Then when the current path ultimately fails pop stack, reset input and go over again until you either match or empty the stack. That's how to avoid dp recursion that happens here. The problematic thing now is combinatorial explosion of a* a* a* //simplified, but you get the idea The trick to avoid this sort of crap is to 1) agree that whatever matches first has higher priority (left-most match) that's a simple disambiguation rule. 2) record what positions + pc you place on stack e.g. use a set, or in fact, a bitmap to push every unique combination (pc,index) only once. Voila! Now you have a linear worst-case algorithm with (M*N) complexity where M is number of all possible PCs, and N is number of all possible indexes in input. Now if we put aside all crap talk about mathematical purity and monads, and you name it, a packrat is essentially this, a dynamic programming trick applied to recursive top-down parsing. The trick could be extended even more to avoid extra checks on input characters, but the essence is this memoization. (nice article btw, I learnt some about regexes) Thanks, I hope it makes them more popular. Might as well keep me busy fixing bugs :) -- Dmitry Olshansky
Re: DUnit - class MyTest { mixin TestMixin; void testMethod1() {} void testMethod2() {}}
On Monday, 20 February 2012 at 01:49:04 UTC, Juan Manuel Cabo wrote: I thought I could do a better effort to describe why DUnit is so extraordinary, for a native language, especially for those unfamiliar with xUnit frameworks This is great stuff, thanks ! Anyway, I'm not fond of your examples; so here is a silly one from me : http://lanael.free.fr/summertest.d.html
Re: Pegged: Syntax Highlighting
On 17.03.2012 08:01, Philippe Sigaud wrote: On Wed, Mar 14, 2012 at 21:03, Andrej Mitrovic andrej.mitrov...@gmail.com wrote: how would one use a parser like Pegged for syntax highlighting? Ok, typically one would use a lexer and not a parser. But using a parser might be more interesting for creating more complex syntax highlighting. :) Actually I think I can use the new ddmd-clean port for just this purpose. Sorry for the noise. Sorry for the late reply, I was away for a few days, in a Net-forsaken place ;) If ddmd-clean is OK for you, that's cool. Keep us informed how that went. If you want to use Pegged, you'd need to enter the entire D grammar to get a correct parse tree. I just finished writing it, but I'm afraid to try and compile it :) It's one huge monster. I want to use Pegged for that purpose. So go ahead an commit the D grammar ;) Would be so awesome if Pegged would be able to parse D. ~Extrawurst
Re: Pegged, From EBNF to PEG
On Sat, Mar 17, 2012 at 10:09, Dmitry Olshansky dmitry.o...@gmail.com wrote: Ok, let's agree on fact that semantically a* is : As - a As / a and a*? is this: As - a / a As Now that's local ;) It's local, yes. But the pb is with Expr - A* B C D, when D fails. PEG sequences don't backtrack. I remember trying compile-time backtracking in another similar library in D 1-2 years ago and getting lots of pb. I might add that in Pegged, but I don't know the consequences. How do you do that in std.regex? There are two fundamental ways to get around the problem (snip) Thanks for the explanations. Does std.regex implement this? Now if we put aside all crap talk about mathematical purity and monads, and you name it, a packrat is essentially this, a dynamic programming trick applied to recursive top-down parsing. The trick could be extended even more to avoid extra checks on input characters, but the essence is this memoization. I plan to store indices anyway. I might add this in the future. A read something on using PEGs to parse Java and C and there was no need to do a complete memoization: storing the last parse result as a temporary seemed to be enough. (nice article btw, I learnt some about regexes) Thanks, I hope it makes them more popular. Might as well keep me busy fixing bugs :) As people said on Reddit, you should make more whooping sounds about CT-regex. That was what wowed me and pushed me to tackle CT-parsing again.
Re: Pegged: Syntax Highlighting
On 17.03.2012 15:13, Philippe Sigaud wrote: The D grammar is a 1000-line / hundreds of rules monster. I finished writing it and am now crushing bugs. Any ETA when u gonna commit it for the public ? Wouldn't mind getting my hands dirty on it and looking for bugs too ;)
Re: Pegged, From EBNF to PEG
On 17.03.2012 18:11, Philippe Sigaud wrote: On Sat, Mar 17, 2012 at 10:09, Dmitry Olshanskydmitry.o...@gmail.com wrote: Ok, let's agree on fact that semantically a* is : As- a As / a and a*? is this: As- a / a As Now that's local ;) It's local, yes. But the pb is with Expr- A* B C D, when D fails. PEG sequences don't backtrack. I'd argue they do. As I see it as: Expr - As B C D / B C D As - A / A As (or use an epsilon production for As, is it allowed in pegged ?) I remember trying compile-time backtracking in another similar library in D 1-2 years ago and getting lots of pb. I might add that in Pegged, but I don't know the consequences. How do you do that in std.regex? There are two fundamental ways to get around the problem (snip) Thanks for the explanations. Does std.regex implement this? It does the second way that I haven't told about, and had hard time to implement in C-T :) And the simple version of what I presented (i.e. stack stuff) is used in CT-regex and as fallback in R-T. The problem with doing a bitmap memoization is that regex often used to _search_ things in large inputs. Some compromise and/or thresholds have to be estimated and found. Parsing on the contrary guaranties that the whole input is used or in error, hence bitmap is not wasted. Now if we put aside all crap talk about mathematical purity and monads, and you name it, a packrat is essentially this, a dynamic programming trick applied to recursive top-down parsing. The trick could be extended even more to avoid extra checks on input characters, but the essence is this memoization. I plan to store indices anyway. I might add this in the future. A read something on using PEGs to parse Java and C and there was no need to do a complete memoization: storing the last parse result as a temporary seemed to be enough. Well even straight no-memorization is somehow enough, it's just a way to ensure no extra work is done. C Java are not much of backtracky grammars. (nice article btw, I learnt some about regexes) Thanks, I hope it makes them more popular. Might as well keep me busy fixing bugs :) As people said on Reddit, you should make more whooping sounds about CT-regex. That was what wowed me and pushed me to tackle CT-parsing again. It's just I'm also well aware of how much luck it (still) takes to fly ;) The asserts that compare C-T vs R-T go off too often for my taste, other then this the memory usage during compile is too high. I recall once dmd had GC re-enabled for brief period, dmd used no more then ~64Mb doing real crazy ct-regex stuff. OK, back to C-T parsing, I have one crazy idea that I can't get away from - add operator precedence grammar into the mix. From what I observe it should integrate into PEG smoothly. Then it would make military-grade hybrid that uses operator precedence parsing of expressions and the like. Few more tricks and it may beat some existing parser generators. See this post, where I tried to describe that idea early on: http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.Darticle_id=159562 I might catch spare time to go about adding it myself, the only tricky thing is to embed plain semantic actions, as AST generation would be more or less the same. -- Dmitry Olshansky
Re: It's official: The D Programming Language will participate to GSoC 2012!
On 2012-03-16 19:32, Steven Schveighoffer wrote: On Fri, 16 Mar 2012 14:24:38 -0400, Andrei Alexandrescu seewebsiteforem...@erdani.org wrote: Just got the acceptance message. This is great news! If you consider being a mentor, please apply as described in http://dlang.org/gsoc2012.html. Thanks! You really think Google summer of code is going to sponsor development of iPhone compatibility? :) Not that I wouldn't welcome this with open arms... We'll see what happens. It can't hurt to have as an idea. -- /Jacob Carlborg
Re: Pegged: Syntax Highlighting
On 3/17/12 9:13 AM, Philippe Sigaud wrote: I want to use Pegged for that purpose. So go ahead an commit the D grammar ;) Would be so awesome if Pegged would be able to parse D. ~Extrawurst The D grammar is a 1000-line / hundreds of rules monster. I finished writing it and am now crushing bugs. God, that generates a 10_000 line module to parse it. I should simplify the code generator somewhat. Science is done. Welcome to implementation :o). I can't say how excited I am about this direction. I have this vision of having a D grammar published on the website that is actually it, i.e. the same exact grammar is used by a validator that goes through all of our test suite. (The validator wouldn't do any semantic checking.) The parser generator _and_ the reference D grammar would be available in Phobos, so for anyone it would be dirt cheap to parse some D code and wander through the generated AST. The availability of a reference grammar and parser would be golden to a variety of D toolchain creators. Just to gauge interest: 1. Would you consider submitting your work to Phobos? 2. Do you think your approach can generate parsers competitive with hand-written ones? If not, why? Andrei
Re: Pegged, From EBNF to PEG
On Sat, Mar 17, 2012 at 15:48, Dmitry Olshansky dmitry.o...@gmail.com wrote: PEG sequences don't backtrack. I'd argue they do. As I see it as: Expr - As B C D / B C D As - A / A As That's what people doing Regex-to-PEG translations do, yes. But it's not the spontaneous behavior of A* B C D in PEG. But that means I could add a switch to transform the expressions that way. (or use an epsilon production for As, is it allowed in pegged ?) I called it 'Eps', it's a predefined parser that always matches and consumes nothing. I used the greek epsilon at the beginning (ε) but thought that many people would shout at this :) OK, back to C-T parsing, I have one crazy idea that I can't get away from - add operator precedence grammar into the mix. From what I observe it should integrate into PEG smoothly. Then it would make military-grade hybrid that uses operator precedence parsing of expressions and the like. Few more tricks and it may beat some existing parser generators. See this post, where I tried to describe that idea early on: http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.Darticle_id=159562 I remember reading this. But I don't feel I'm up to it for now. I might catch spare time to go about adding it myself, the only tricky thing is to embed plain semantic actions, as AST generation would be more or less the same. Cool!
Re: Pegged: Syntax Highlighting
On Sat, Mar 17, 2012 at 15:44, Extrawurst s...@extrawurst.org wrote: On 17.03.2012 15:13, Philippe Sigaud wrote: The D grammar is a 1000-line / hundreds of rules monster. I finished writing it and am now crushing bugs. Any ETA when u gonna commit it for the public ? Wouldn't mind getting my hands dirty on it and looking for bugs too ;) I just pushed it on Github. pegged/examples/dgrammar.d just contains the D grammar as a string. pegged/examples/ddump.d is the generated parser family. There are no more syntax bugs, Pegged accepts the string as a correct grammar and DMD accepts to compile the resulting classes. I tested the generated parser on microscopic D files and... it sometimes works :) I made many mistakes and typos while writing the grammar. I corrected a few, but there are many more, without a doubt I'll write a wiki page on how to generate the grammar anew, if need be. Btw, the D grammar comes from the website (I didn't find the time to compare it to the grammar Rainer uses for Mono-D), and its horribly BNF-like: almost no + or * operators, etc. I tried to factor some expressions and simplify some, but it could be a bit shorter (not much, but still).
mysql tool
Hi D friends. I'd like to share with you a little tool. It allows you to execute SQL statements in your MySQL database. It displays the data in a nice data grid widget written by David Hillard. I hope you like it. Link http://my.opera.com/run3/blog/2012/03/17/mysql-tool
Re: Pegged: Syntax Highlighting
On Sat, Mar 17, 2012 at 18:11, Andrei Alexandrescu seewebsiteforem...@erdani.org wrote: The D grammar is a 1000-line / hundreds of rules monster. I finished writing it and am now crushing bugs. God, that generates a 10_000 line module to parse it. I should simplify the code generator somewhat. Science is done. Welcome to implementation :o). Hey, it's only 3.000 lines now :) Coming from a thousand-lines grammar, it's not that much an inflation. I can't say how excited I am about this direction. I have this vision of having a D grammar published on the website that is actually it, i.e. the same exact grammar is used by a validator that goes through all of our test suite. (The validator wouldn't do any semantic checking.) The parser generator _and_ the reference D grammar would be available in Phobos, so for anyone it would be dirt cheap to parse some D code and wander through the generated AST. The availability of a reference grammar and parser would be golden to a variety of D toolchain creators. Indeed, but I fear the D grammar is a bit too complex to be easily walked. Now that I read it, I realize that '1' is parsed as a 10-levels deep leaf! Compared to lisp, it's... not in the same league, to say the least. I will see to drastically simplify the parse tree. Does anyone have experience with other languages similar to D and that offer AST-walking? Doesn't C# have something like this? (I'll have a look at Scala macros) Just to gauge interest: 1. Would you consider submitting your work to Phobos? Yes, of course. It's already Boost-licensed. Seeing the review processes for other modules, it'd most certainly put the code in great shape. But then, it's far from being submittable right now. 2. Do you think your approach can generate parsers competitive with hand-written ones? If not, why? Right now, no, if only because I didn't take any step in making it fast or in limiting its RAM consumption. After applying some ideas I have, I don't know. There are many people here that are parser-aware and could help make the code faster. But at the core, to allow mutually recursive rules, the design use classes: class A : someParserCombinationThatMayUseA { ... } Which means A.parse (a static method) is just typeof(super).parse (also static, and so on). Does that entail any crippling disadvantage compared to hand-written parser? Philippe
Re: Pegged: Syntax Highlighting
On 03/17/2012 01:53 PM, Philippe Sigaud wrote: Does anyone have experience with other languages similar to D and that offer AST-walking? Doesn't C# have something like this? (I'll have a look at Scala macros) Hi Philippe. Of course the visitor pattern comes in mind. Eclipse (Java) uses a specialized visitor pattern called hierarchical visitor pattern to traverse the AST. The classic visitor pattern has the following disadvantages : -- hierarchical navigation -- the traditional Visitor Pattern has no concept of depth. As a result, visitor cannot determine if one composite is within another composite or beside it. -- conditional navigation -- the traditional Visitor Pattern does not allow branches to be skipped. As a result, visitor cannot stop, filter, or optimize traversal based on some condition. Interesting stuff at : http://c2.com/cgi/wiki?HierarchicalVisitorPattern You'll find some implementation details at the bottom of the doc. hth Bjoern
Re: D to Javascript converter (a hacked up dmd)
This is still alive: https://github.com/adamdruppe/dmd/tree/dtojs Merging in DMD changes as they happen has been fairly easy, so we have things like UFCS in there. I'm pretty happy with the core setup, though it still isn't finished. Enough of the D language works like you expect that you can do a lot of things with it naturally. Thus, I've moved on to libraries. I previously talked about the jslang and browser packages. These provide zero-overhead, direct access to javascript built-ins. You can also use a good chunk of the real Phobos in here if you want. But now, I'm adding minimal overhead nicer libraries, including a port of my dom.d, adopted to try to cover some browser incompatibilities. (The browser package, being zero overhead, makes no attempt at this. It just provides the tools to DIY.) The trick is though, doing it with as lean code as possible. Here's my domtest.d: import arsd.dom; import browser.window; void main() { Element e = document.getElementById(cool); e.addClass(sweet); e.style = font-weight: bold;; e.style.fontSize = 30px; e.innerText = e.style; window.alert(e.parentNode.tagName); e.removeFromTree(); } It generates about 2 kb of javascript, libraries included, after you strip unused functions and reduce the variable name length. (NOT gzipped!) Let's talk about one line at a time: Element e = document.getElementById(cool); In browser.document, there are JSElement and JSDocument classes defined. Here, though, I'm using a custom type: Element. The implementation for this uses zero overhead forwarding to native methods, unless I override them, in which case it is implemented as a free function. Many functions are now mixin templates in browser.document so I don't have to repeat them, even if changing types. Using alias this, these are compatible with the native types as well. e.addClass(sweet); Is a new function in class Element. It manipulates the native className property and in the resulting .js file is a free function. e.style = font-weight: bold;; The style object is defined in browser.document and has a long list of properties. It maps directly to native code, but with one addition: alias cssText this; style.cssText in Javascript is the attribute as a string. In my dom.d, Element.style can be treated as a string or a struct, and I wanted that here too. alias this accomplishes that goal with zero overhead. e.style = ; translates simply to e.style.cssText = ; e.style.fontSize = 30px; The property is a native one, nothing special here. e.innerText = e.style; innerText, while a property on some browsers, isn't present on all of them. Thus, class Element defines it itself. This generates a free function (with a mangled D name, so no conflict) that is called using D property syntax. The generated code looks like this: _D_mangled_innerText(e.style.cssText, e); This is the general pattern I'll do for browser compatibility. Use the real name in D, but let it implement as free functions with mangled names. window.alert(e.parentNode.tagName); The browser.window module defines some things that are global in JS, including alert(). e.parentNode returns an Element, thanks to a mixin that can specialize on types with zero overhead. tagName, while a native property, is actually a function here. The reason is dom.d uses lowercase tag names, but Javascript uses uppercase. Thus, this is implemented: @property string tagName() { return __js_this.tagName.toLowerCase(); } and the above line calls a D function. Some overhead, but very little. e.removeFromTree(); Finally, this is just a regular function. I'm thinking about changing it to UFCS though so it can check for null on e too. (It does: if(parentNode) parentNode.removeChild(this); the idea being to just ensure it isn't in the tree without needing an external null check. If this is null, it is also not in the tree, so its contract is satisified anyway, so it might as well succeed. Currently, though, it will not work since the Element invariant will segfault it.) But, there's a rundown of what I'm doing with libraries. Beautiful D code, compatible implementations, minimal overhead. I'll be posting the library soon, and then this will be sharable.
Re: Pegged: Syntax Highlighting
On 3/17/12 3:53 PM, Philippe Sigaud wrote: On Sat, Mar 17, 2012 at 18:11, Andrei Alexandrescu seewebsiteforem...@erdani.org wrote: The D grammar is a 1000-line / hundreds of rules monster. I finished writing it and am now crushing bugs. God, that generates a 10_000 line module to parse it. I should simplify the code generator somewhat. Science is done. Welcome to implementation :o). Hey, it's only 3.000 lines now :) Coming from a thousand-lines grammar, it's not that much an inflation. That's quite promising. Indeed, but I fear the D grammar is a bit too complex to be easily walked. Now that I read it, I realize that '1' is parsed as a 10-levels deep leaf! Compared to lisp, it's... not in the same league, to say the least. I will see to drastically simplify the parse tree. This is where custom directives for helping AST creation might help. Also, ANTLR solves that problem by allowing people to define tree walkers. They have much simpler grammars (heck, the hard job has already been done - no more ambiguities). At an extreme, languages such as ML are good at walking trees because they essentially embed a tree walker in their pattern matching grammar for function parameters. Does anyone have experience with other languages similar to D and that offer AST-walking? Doesn't C# have something like this? (I'll have a look at Scala macros) Heck, I just found this which destroys ANTLR's tree walkers: http://www.antlr.org/article/1170602723163/treewalkers.html Didn't read it yet, but clearly it's an opposing viewpoint and relevant to your work (don't forget to also read the article to which it's replying http://antlr.org/article/1100569809276/use.tree.grammars.tml). 1. Would you consider submitting your work to Phobos? Yes, of course. It's already Boost-licensed. Seeing the review processes for other modules, it'd most certainly put the code in great shape. But then, it's far from being submittable right now. Let us know how we can help. This is an important project. 2. Do you think your approach can generate parsers competitive with hand-written ones? If not, why? Right now, no, if only because I didn't take any step in making it fast or in limiting its RAM consumption. After applying some ideas I have, I don't know. There are many people here that are parser-aware and could help make the code faster. But at the core, to allow mutually recursive rules, the design use classes: class A : someParserCombinationThatMayUseA { ... } Which means A.parse (a static method) is just typeof(super).parse (also static, and so on). Does that entail any crippling disadvantage compared to hand-written parser? I'm not sure without seeing more code. Andrei