Re: Pegged, From EBNF to PEG

2012-03-17 Thread Philippe Sigaud
On Wed, Mar 14, 2012 at 10:48, Dmitry Olshansky dmitry.o...@gmail.com wrote:

 That's one of the caveats on PEG. That and greedy operators.

 'a*a' never succeeds because 'a*' consumes all the available a's.


 Hey, wait, I thought there has to be backtrack here, i.e. when second 'a'
 finally fails?

PEG only do local backtracking in 'or' choices, not in sequences. I
think that's because the original author was dealing with packrat
parsing and its O(input-size) guarantee.

I remember trying compile-time backtracking in another similar library
in D 1-2 years ago and getting lots of pb. I might add that in Pegged,
but I don't know the consequences. How do you do that in std.regex?

(nice article btw, I learnt some about regexes)


Re: Pegged: Syntax Highlighting

2012-03-17 Thread Philippe Sigaud
On Wed, Mar 14, 2012 at 21:03, Andrej Mitrovic
andrej.mitrov...@gmail.com wrote:

 how would one use a parser like Pegged for syntax
 highlighting?

 Ok, typically one would use a lexer and not a parser. But using a
 parser might be more interesting for creating more complex syntax
 highlighting. :)


 Actually I think I can use the new ddmd-clean port for just this
 purpose. Sorry for the noise.

Sorry for the late reply, I was away for a few days, in a Net-forsaken place ;)

If ddmd-clean is OK for you, that's cool. Keep us informed how that went.
If you want to use Pegged, you'd need to enter the entire D grammar to
get a correct parse tree.
I just finished writing it, but I'm afraid to try and compile it :)
It's one huge monster.


Re: Pegged, From EBNF to PEG

2012-03-17 Thread Dmitry Olshansky

On 17.03.2012 10:59, Philippe Sigaud wrote:

On Wed, Mar 14, 2012 at 10:48, Dmitry Olshanskydmitry.o...@gmail.com  wrote:


That's one of the caveats on PEG. That and greedy operators.

'a*a' never succeeds because 'a*' consumes all the available a's.



Hey, wait, I thought there has to be backtrack here, i.e. when second 'a'
finally fails?


PEG only do local backtracking in 'or' choices, not in sequences. I
think that's because the original author was dealing with packrat
parsing and its O(input-size) guarantee.



Ok, let's agree on fact that semantically
a* is :
As - a As / a

and a*? is this:
As - a / a As

Now that's local ;)



I remember trying compile-time backtracking in another similar library
in D 1-2 years ago and getting lots of pb. I might add that in Pegged,
but I don't know the consequences. How do you do that in std.regex?



There are two fundamental ways to get around the problem, the easiest
to implement is to use a stack to record a position in input + number of 
alternative/production (I call it a PC - program counter) every time 
there is a fork like that. Then when the current path ultimately fails 
pop stack, reset input and go over again until you either match or empty 
the stack. That's how to avoid dp recursion that happens here.


The problematic thing now is combinatorial explosion of
a* a* a*  //simplified, but you get the idea

The trick to avoid this sort of crap is to
1) agree that whatever matches first has higher priority (left-most 
match) that's a simple disambiguation rule.
2) record what positions + pc you place on stack e.g. use a set, or in 
fact, a bitmap to push every unique combination  (pc,index) only once.


Voila! Now you have a linear worst-case algorithm with (M*N) complexity 
where M is number of all possible PCs, and N is number of all possible 
indexes in input.


Now if we put aside all crap talk about mathematical purity and monads, 
and you name it, a packrat is essentially this, a dynamic programming 
trick applied to recursive top-down parsing. The trick could be extended 
even more to avoid extra checks on input characters, but the essence is 
this memoization.




(nice article btw, I learnt some about regexes)


Thanks, I hope it makes them more popular.
Might as well keep me busy fixing bugs :)

--
Dmitry Olshansky


Re: DUnit - class MyTest { mixin TestMixin; void testMethod1() {} void testMethod2() {}}

2012-03-17 Thread Marc P. Michel
On Monday, 20 February 2012 at 01:49:04 UTC, Juan Manuel Cabo 
wrote:
I thought I could do a better effort to describe why DUnit is 
so extraordinary,
for a native language, especially for those unfamiliar with 
xUnit frameworks


This is great stuff, thanks !

Anyway, I'm not fond of your examples; so here is a silly one 
from me :


http://lanael.free.fr/summertest.d.html




Re: Pegged: Syntax Highlighting

2012-03-17 Thread Extrawurst

On 17.03.2012 08:01, Philippe Sigaud wrote:

On Wed, Mar 14, 2012 at 21:03, Andrej Mitrovic
andrej.mitrov...@gmail.com  wrote:


how would one use a parser like Pegged for syntax
highlighting?


Ok, typically one would use a lexer and not a parser. But using a
parser might be more interesting for creating more complex syntax
highlighting. :)



Actually I think I can use the new ddmd-clean port for just this
purpose. Sorry for the noise.


Sorry for the late reply, I was away for a few days, in a Net-forsaken place ;)

If ddmd-clean is OK for you, that's cool. Keep us informed how that went.
If you want to use Pegged, you'd need to enter the entire D grammar to
get a correct parse tree.
I just finished writing it, but I'm afraid to try and compile it :)
It's one huge monster.


I want to use Pegged for that purpose. So go ahead an commit the D 
grammar ;)

Would be so awesome if Pegged would be able to parse D.

~Extrawurst


Re: Pegged, From EBNF to PEG

2012-03-17 Thread Philippe Sigaud
On Sat, Mar 17, 2012 at 10:09, Dmitry Olshansky dmitry.o...@gmail.com wrote:

 Ok, let's agree on fact that semantically
 a* is :
 As - a As / a

 and a*? is this:
 As - a / a As

 Now that's local ;)

It's local, yes. But the pb is with   Expr - A* B C D, when D fails.
PEG sequences don't backtrack.


 I remember trying compile-time backtracking in another similar library
 in D 1-2 years ago and getting lots of pb. I might add that in Pegged,
 but I don't know the consequences. How do you do that in std.regex?



 There are two fundamental ways to get around the problem
(snip)

Thanks for the explanations. Does std.regex implement this?


 Now if we put aside all crap talk about mathematical purity and monads, and
 you name it, a packrat is essentially this, a dynamic programming trick
 applied to recursive top-down parsing. The trick could be extended even more
 to avoid extra checks on input characters, but the essence is this
 memoization.

I plan to store indices anyway. I might add this in the future. A read
something on using PEGs to parse Java and C and there was no need to
do a complete memoization: storing the last parse result as a
temporary seemed to be enough.

 (nice article btw, I learnt some about regexes)


 Thanks, I hope it makes them more popular.
 Might as well keep me busy fixing bugs :)

As people said on Reddit, you should make more whooping sounds about
CT-regex. That was what wowed me and pushed me to tackle CT-parsing
again.


Re: Pegged: Syntax Highlighting

2012-03-17 Thread Extrawurst

On 17.03.2012 15:13, Philippe Sigaud wrote:

The D grammar is a 1000-line / hundreds of rules monster. I finished
writing it and am now crushing bugs.


Any ETA when u gonna commit it for the public ? Wouldn't mind getting my 
hands dirty on it and looking for bugs too ;)


Re: Pegged, From EBNF to PEG

2012-03-17 Thread Dmitry Olshansky

On 17.03.2012 18:11, Philippe Sigaud wrote:

On Sat, Mar 17, 2012 at 10:09, Dmitry Olshanskydmitry.o...@gmail.com  wrote:


Ok, let's agree on fact that semantically
a* is :
As- a As / a

and a*? is this:
As- a / a As

Now that's local ;)


It's local, yes. But the pb is with   Expr- A* B C D, when D fails.
PEG sequences don't backtrack.


I'd argue they do. As I see it as:
Expr - As B C D / B C D
As - A / A As
(or use an epsilon production for As, is it allowed in pegged ?)





I remember trying compile-time backtracking in another similar library
in D 1-2 years ago and getting lots of pb. I might add that in Pegged,
but I don't know the consequences. How do you do that in std.regex?




There are two fundamental ways to get around the problem

(snip)

Thanks for the explanations. Does std.regex implement this?



It does the second way that I haven't told about, and had hard time to 
implement in C-T :)
And the simple version of what I presented (i.e. stack stuff) is used 
in CT-regex and as fallback in R-T. The problem with doing a bitmap 
memoization is that regex often used to _search_ things in large inputs. 
Some compromise and/or thresholds have to be estimated and found.
Parsing on the contrary guaranties that the whole input is used or in 
error, hence bitmap is not wasted.





Now if we put aside all crap talk about mathematical purity and monads, and
you name it, a packrat is essentially this, a dynamic programming trick
applied to recursive top-down parsing. The trick could be extended even more
to avoid extra checks on input characters, but the essence is this
memoization.


I plan to store indices anyway. I might add this in the future. A read
something on using PEGs to parse Java and C and there was no need to
do a complete memoization: storing the last parse result as a
temporary seemed to be enough.



Well even straight no-memorization is somehow enough, it's just a way to 
ensure no extra work is done. C  Java are not much of backtracky grammars.



(nice article btw, I learnt some about regexes)



Thanks, I hope it makes them more popular.
Might as well keep me busy fixing bugs :)


As people said on Reddit, you should make more whooping sounds about
CT-regex. That was what wowed me and pushed me to tackle CT-parsing
again.


It's just I'm also well aware of how much luck it (still) takes to fly ;)
The asserts that compare C-T vs R-T go off too often for my taste, other 
then this the memory usage during compile is too high.
I recall once dmd had GC re-enabled for brief period, dmd used no more 
then ~64Mb doing real crazy ct-regex stuff.


OK, back to C-T parsing, I have one crazy idea that I can't get away 
from - add operator precedence grammar into the mix. From what I observe 
it should integrate into PEG smoothly. Then it would make 
military-grade hybrid that uses operator precedence parsing of 
expressions and the like. Few more tricks and it may beat some

existing parser generators.

See this post, where I tried to describe that idea early on:
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.Darticle_id=159562

I might catch spare time to go about adding it myself, the only tricky 
thing is to embed plain semantic actions, as AST generation would be 
more or less the same.


--
Dmitry Olshansky


Re: It's official: The D Programming Language will participate to GSoC 2012!

2012-03-17 Thread Jacob Carlborg

On 2012-03-16 19:32, Steven Schveighoffer wrote:

On Fri, 16 Mar 2012 14:24:38 -0400, Andrei Alexandrescu
seewebsiteforem...@erdani.org wrote:


Just got the acceptance message. This is great news!

If you consider being a mentor, please apply as described in
http://dlang.org/gsoc2012.html. Thanks!


You really think Google summer of code is going to sponsor development
of iPhone compatibility? :) Not that I wouldn't welcome this with open
arms...


We'll see what happens. It can't hurt to have as an idea.

--
/Jacob Carlborg


Re: Pegged: Syntax Highlighting

2012-03-17 Thread Andrei Alexandrescu

On 3/17/12 9:13 AM, Philippe Sigaud wrote:

I want to use Pegged for that purpose. So go ahead an commit the D grammar
;)
Would be so awesome if Pegged would be able to parse D.

~Extrawurst


The D grammar is a 1000-line / hundreds of rules monster. I finished
writing it and am now crushing bugs.
God, that generates a 10_000 line module to parse it. I should
simplify the code generator somewhat.


Science is done. Welcome to implementation :o).

I can't say how excited I am about this direction. I have this vision of 
having a D grammar published on the website that is actually it, i.e. 
the same exact grammar is used by a validator that goes through all of 
our test suite. (The validator wouldn't do any semantic checking.) The 
parser generator _and_ the reference D grammar would be available in 
Phobos, so for anyone it would be dirt cheap to parse some D code and 
wander through the generated AST. The availability of a reference 
grammar and parser would be golden to a variety of D toolchain creators.


Just to gauge interest:

1. Would you consider submitting your work to Phobos?

2. Do you think your approach can generate parsers competitive with 
hand-written ones? If not, why?



Andrei


Re: Pegged, From EBNF to PEG

2012-03-17 Thread Philippe Sigaud
On Sat, Mar 17, 2012 at 15:48, Dmitry Olshansky dmitry.o...@gmail.com wrote:

 PEG sequences don't backtrack.


 I'd argue they do. As I see it as:
 Expr - As B C D / B C D
 As - A / A As

That's what people doing Regex-to-PEG translations do, yes. But it's
not the spontaneous behavior of A* B C D in PEG.

But that means I could add a switch to transform the expressions that way.


 (or use an epsilon production for As, is it allowed in pegged ?)

I called it 'Eps', it's a predefined parser that always matches and
consumes nothing.
I used the greek epsilon at the beginning (ε) but thought that many
people would shout at this :)


 OK, back to C-T parsing, I have one crazy idea that I can't get away from -
 add operator precedence grammar into the mix. From what I observe it should
 integrate into PEG smoothly. Then it would make military-grade hybrid that
 uses operator precedence parsing of expressions and the like. Few more
 tricks and it may beat some
 existing parser generators.

 See this post, where I tried to describe that idea early on:
 http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.Darticle_id=159562

I remember reading this. But I don't feel I'm up to it for now.


 I might catch spare time to go about adding it myself, the only tricky thing
 is to embed plain semantic actions, as AST generation would be more or less
 the same.

Cool!


Re: Pegged: Syntax Highlighting

2012-03-17 Thread Philippe Sigaud
On Sat, Mar 17, 2012 at 15:44, Extrawurst s...@extrawurst.org wrote:
 On 17.03.2012 15:13, Philippe Sigaud wrote:

 The D grammar is a 1000-line / hundreds of rules monster. I finished
 writing it and am now crushing bugs.


 Any ETA when u gonna commit it for the public ? Wouldn't mind getting my
 hands dirty on it and looking for bugs too ;)

I just pushed it on Github.

pegged/examples/dgrammar.d just contains the D grammar as a string.
pegged/examples/ddump.d is the generated parser family.

There are no more syntax bugs, Pegged accepts the string as a correct
grammar and DMD accepts to compile the resulting classes.
I tested the generated parser on microscopic D files and... it
sometimes works :)

I made many mistakes and typos while writing the grammar. I corrected
a few, but there are many more, without a doubt

I'll write a wiki page on how to generate the grammar anew, if need be.

Btw, the D grammar comes from the website (I didn't find the time to
compare it to the grammar Rainer uses for Mono-D), and its horribly
BNF-like: almost no + or * operators, etc. I tried to factor some
expressions and simplify some, but it could be a bit shorter (not
much, but still).


mysql tool

2012-03-17 Thread dnewbie
Hi D friends.

I'd like to share with you a little tool. It allows you to execute SQL 
statements in your MySQL database.
It displays the data in a nice data grid widget written by David Hillard.

I hope you like it.

Link
http://my.opera.com/run3/blog/2012/03/17/mysql-tool



Re: Pegged: Syntax Highlighting

2012-03-17 Thread Philippe Sigaud
On Sat, Mar 17, 2012 at 18:11, Andrei Alexandrescu
seewebsiteforem...@erdani.org wrote:

 The D grammar is a 1000-line / hundreds of rules monster. I finished
 writing it and am now crushing bugs.
 God, that generates a 10_000 line module to parse it. I should
 simplify the code generator somewhat.


 Science is done. Welcome to implementation :o).

Hey, it's only 3.000 lines now :) Coming from a thousand-lines
grammar, it's not that much an inflation.


 I can't say how excited I am about this direction. I have this vision of
 having a D grammar published on the website that is actually it, i.e. the
 same exact grammar is used by a validator that goes through all of our test
 suite. (The validator wouldn't do any semantic checking.) The parser
 generator _and_ the reference D grammar would be available in Phobos, so for
 anyone it would be dirt cheap to parse some D code and wander through the
 generated AST. The availability of a reference grammar and parser would be
 golden to a variety of D toolchain creators.

Indeed, but I fear the D grammar is a bit too complex to be easily
walked. Now that I read it, I realize that '1' is parsed as a
10-levels deep leaf!
Compared to lisp, it's... not in the same league, to say the least. I
will see to drastically simplify the parse tree.

Does anyone have experience with other languages similar to D and that
offer AST-walking? Doesn't C# have something like this?
(I'll have a look at Scala macros)

 Just to gauge interest:

 1. Would you consider submitting your work to Phobos?

Yes, of course. It's already Boost-licensed.
Seeing the review processes for other modules, it'd most certainly put
the code in great shape. But then, it's far from being submittable
right now.


 2. Do you think your approach can generate parsers competitive with
 hand-written ones? If not, why?

Right now, no, if only because I didn't take any step in making it
fast or in limiting its RAM consumption.
After applying some ideas I have, I don't know. There are many people
here that are parser-aware and could help make the code faster. But at
the core, to allow mutually recursive rules, the design use classes:

class A : someParserCombinationThatMayUseA { ... }

Which means A.parse (a static method) is just typeof(super).parse
(also static, and so on). Does that entail any crippling disadvantage
compared to hand-written parser?


Philippe


Re: Pegged: Syntax Highlighting

2012-03-17 Thread bls

On 03/17/2012 01:53 PM, Philippe Sigaud wrote:

Does anyone have experience with other languages similar to D and that
offer AST-walking? Doesn't C# have something like this?
(I'll have a look at Scala macros)



Hi Philippe.
Of course the visitor pattern comes in mind.

Eclipse (Java) uses a specialized visitor pattern  called hierarchical 
visitor pattern to traverse the AST.


The classic visitor pattern has the following disadvantages :

-- hierarchical navigation -- the traditional Visitor Pattern has no 
concept of depth. As a result, visitor cannot determine if one composite 
is within another composite or beside it.


-- conditional navigation -- the traditional Visitor Pattern does not 
allow branches to be skipped. As a result, visitor cannot stop, filter, 
or optimize traversal based on some condition.


Interesting stuff at :

http://c2.com/cgi/wiki?HierarchicalVisitorPattern
You'll find some implementation details at the bottom of the doc.
hth Bjoern


Re: D to Javascript converter (a hacked up dmd)

2012-03-17 Thread Adam D. Ruppe

This is still alive:

https://github.com/adamdruppe/dmd/tree/dtojs

Merging in DMD changes as they happen has been fairly
easy, so we have things like UFCS in there.

I'm pretty happy with the core setup, though it still
isn't finished. Enough of the D language works like
you expect that you can do a lot of things with it
naturally.

Thus, I've moved on to libraries. I previously
talked about the jslang and browser packages. These
provide zero-overhead, direct access to javascript
built-ins.

You can also use a good chunk of the real Phobos in
here if you want.


But now, I'm adding minimal overhead nicer libraries,
including a port of my dom.d, adopted to try to
cover some browser incompatibilities. (The browser
package, being zero overhead, makes no attempt at
this. It just provides the tools to DIY.)


The trick is though, doing it with as lean code as
possible. Here's my domtest.d:

import arsd.dom;
import browser.window;

void main() {
Element e = document.getElementById(cool);
e.addClass(sweet);
e.style = font-weight: bold;;
e.style.fontSize = 30px;
e.innerText = e.style;
window.alert(e.parentNode.tagName);
e.removeFromTree();
}


It generates about 2 kb of javascript, libraries
included, after you strip unused functions and
reduce the variable name length. (NOT gzipped!)

Let's talk about one line at a time:

Element e = document.getElementById(cool);

In browser.document, there are JSElement and JSDocument
classes defined. Here, though, I'm using a custom
type: Element.

The implementation for this uses zero overhead forwarding
to native methods, unless I override them, in which case
it is implemented as a free function.

Many functions are now mixin templates in browser.document
so I don't have to repeat them, even if changing types.

Using alias this, these are compatible with the native
types as well.


e.addClass(sweet);

Is a new function in class Element. It manipulates the
native className property and in the resulting .js file
is a free function.

e.style = font-weight: bold;;

The style object is defined in browser.document and has
a  long list of properties. It maps directly to native
code, but with one addition:

   alias cssText this;


style.cssText in Javascript is the attribute as a string.
In my dom.d, Element.style can be treated as a string or
a struct, and I wanted that here too.

alias this accomplishes that goal with zero overhead.
e.style = ; translates simply to e.style.cssText = ;

e.style.fontSize = 30px;

The property is a native one, nothing special here.

e.innerText = e.style;

innerText, while a property on some browsers, isn't present
on all of them.

Thus, class Element defines it itself. This generates a free
function (with a mangled D name, so no conflict) that is
called using D property syntax.

The generated code looks like this:
_D_mangled_innerText(e.style.cssText, e);



This is the general pattern I'll do for browser compatibility.
Use the real name in D, but let it implement as free functions
with mangled names.

window.alert(e.parentNode.tagName);


The browser.window module defines some things that are global
in JS, including alert(). e.parentNode returns an Element,
thanks to a mixin that can specialize on types with zero overhead.


tagName, while a native property, is actually a function here.
The reason is dom.d uses lowercase tag names, but Javascript
uses uppercase.

Thus, this is implemented:

@property string tagName() { return 
__js_this.tagName.toLowerCase(); }



and the above line calls a D function. Some overhead, but very
little.

e.removeFromTree();


Finally, this is just a regular function. I'm thinking about
changing it to UFCS though so it can check for null on e too.

(It does:

 if(parentNode) parentNode.removeChild(this);

the idea being to just ensure it isn't in the tree without
needing an external null check.

If this is null, it is also not in the tree, so its contract
is satisified anyway, so it might as well succeed. Currently,
though, it will not work since the Element invariant will
segfault it.)




But, there's a rundown of what I'm doing with libraries.
Beautiful D code, compatible implementations, minimal
overhead.


I'll be posting the library soon, and then this will
be sharable.


Re: Pegged: Syntax Highlighting

2012-03-17 Thread Andrei Alexandrescu

On 3/17/12 3:53 PM, Philippe Sigaud wrote:

On Sat, Mar 17, 2012 at 18:11, Andrei Alexandrescu
seewebsiteforem...@erdani.org  wrote:


The D grammar is a 1000-line / hundreds of rules monster. I finished
writing it and am now crushing bugs.
God, that generates a 10_000 line module to parse it. I should
simplify the code generator somewhat.



Science is done. Welcome to implementation :o).


Hey, it's only 3.000 lines now :) Coming from a thousand-lines
grammar, it's not that much an inflation.


That's quite promising.


Indeed, but I fear the D grammar is a bit too complex to be easily
walked. Now that I read it, I realize that '1' is parsed as a
10-levels deep leaf!
Compared to lisp, it's... not in the same league, to say the least. I
will see to drastically simplify the parse tree.


This is where custom directives for helping AST creation might help. 
Also, ANTLR solves that problem by allowing people to define tree 
walkers. They have much simpler grammars (heck, the hard job has already 
been done - no more ambiguities). At an extreme, languages such as ML 
are good at walking trees because they essentially embed a tree walker 
in their pattern matching grammar for function parameters.



Does anyone have experience with other languages similar to D and that
offer AST-walking? Doesn't C# have something like this?
(I'll have a look at Scala macros)


Heck, I just found this which destroys ANTLR's tree walkers:

http://www.antlr.org/article/1170602723163/treewalkers.html

Didn't read it yet, but clearly it's an opposing viewpoint and relevant 
to your work (don't forget to also read the article to which it's 
replying http://antlr.org/article/1100569809276/use.tree.grammars.tml).



1. Would you consider submitting your work to Phobos?


Yes, of course. It's already Boost-licensed.
Seeing the review processes for other modules, it'd most certainly put
the code in great shape. But then, it's far from being submittable
right now.


Let us know how we can help. This is an important project.


2. Do you think your approach can generate parsers competitive with
hand-written ones? If not, why?


Right now, no, if only because I didn't take any step in making it
fast or in limiting its RAM consumption.
After applying some ideas I have, I don't know. There are many people
here that are parser-aware and could help make the code faster. But at
the core, to allow mutually recursive rules, the design use classes:

class A : someParserCombinationThatMayUseA { ... }

Which means A.parse (a static method) is just typeof(super).parse
(also static, and so on). Does that entail any crippling disadvantage
compared to hand-written parser?


I'm not sure without seeing more code.


Andrei