Re: Pegged: Syntax Highlighting

2012-03-13 Thread Andrej Mitrovic
On 3/11/12, Philippe Sigaud  wrote:
> Parsing Expression Grammar (PEG) generator in D.

Slightly off-topic and not directly aimed at Philippe, but I'm
curious, how would one use a parser like Pegged for syntax
highlighting? I know my way around painting and using style bits for
coloring text (e.g. I have a D port of Neatpad I play around with),
but I'm not sure how I would combine this with the parser. Maybe
someone more knowledgeable (Rainer Schuetze from VisualD perhaps? :) )
could chime in.

For example, I can set style bits for a certain range of text, e.g.
the style bits for this text:
x( int var);
might be (where S designates Style):
S.Text, S.OpenParen, S.Blank, S.Type, S.Type, S.Type, S.Blank...
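For illustration, here is a minimal D sketch of such a per-character style buffer. The `Style` members and the `setStyle` helper are hypothetical. The offsets for `x( int var);` are hard-coded; an editor would get them from a lexer or parser.

```d
import std.stdio : writeln;

// Hypothetical style kinds, in the spirit of the S.* names above.
enum Style : ubyte { Text, Blank, OpenParen, CloseParen, Type, Identifier, Semicolon }

// Give every character in the half-open range [begin, end) the style s.
void setStyle(Style[] styles, size_t begin, size_t end, Style s)
{
    styles[begin .. end] = s;
}

// Style buffer for "x( int var);". The offsets are hard-coded here;
// in an editor they would come from a lexer or parser.
Style[] exampleStyles()
{
    enum text = "x( int var);";
    auto styles = new Style[text.length]; // every entry starts as Style.Text
    setStyle(styles, 1, 2, Style.OpenParen);
    setStyle(styles, 2, 3, Style.Blank);
    setStyle(styles, 3, 6, Style.Type);
    setStyle(styles, 6, 7, Style.Blank);
    setStyle(styles, 7, 10, Style.Identifier);
    setStyle(styles, 10, 11, Style.CloseParen);
    setStyle(styles, 11, 12, Style.Semicolon);
    return styles;
}

void main()
{
    writeln(exampleStyles());
}
```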

Well you get the picture. To set those style bits I do need to run a
parser on the input string. So let's say Pegged gives me back this
structure (I'm just guessing what the structure looks like):

ParseTree("Function", ...)
  .children -> ParseTree("BuiltInType", ...)
  .children -> ParseTree("ParamDecl")

This is all good so far, I can extract the needed info for the style
bits, but what I don't have are the offsets into the string at which a
ParseTree starts and ends. So I wouldn't know at which offset to put a
set of style bits.

I can't tell which parts of Pegged are the API and which are internal
functions, but maybe I could use some specific functions instead of
just calling .parse(), and then walk through the string and set each
style bit.

Anyway I think this is an interesting topic worth talking about. :)


Re: Pegged: Syntax Highlighting

2012-03-14 Thread Andrej Mitrovic
On 3/14/12, Andrej Mitrovic  wrote:
> how would one use a parser like Pegged for syntax
> highlighting?

Ok, typically one would use a lexer and not a parser. But using a
parser might be more interesting for creating more complex syntax
highlighting. :)


Re: Pegged: Syntax Highlighting

2012-03-14 Thread Andrej Mitrovic
On 3/14/12, Andrej Mitrovic  wrote:
> On 3/14/12, Andrej Mitrovic  wrote:
>> how would one use a parser like Pegged for syntax
>> highlighting?
>
> Ok, typically one would use a lexer and not a parser. But using a
> parser might be more interesting for creating more complex syntax
> highlighting. :)
>

Actually I think I can use the new ddmd-clean port for just this
purpose. Sorry for the noise.


Re: Pegged: Syntax Highlighting

2012-03-17 Thread Philippe Sigaud
On Wed, Mar 14, 2012 at 21:03, Andrej Mitrovic
 wrote:

>>> how would one use a parser like Pegged for syntax
>>> highlighting?
>>
>> Ok, typically one would use a lexer and not a parser. But using a
>> parser might be more interesting for creating more complex syntax
>> highlighting. :)
>>
>
> Actually I think I can use the new ddmd-clean port for just this
> purpose. Sorry for the noise.

Sorry for the late reply, I was away for a few days, in a Net-forsaken place ;)

If ddmd-clean is OK for you, that's cool. Keep us informed how that went.
If you want to use Pegged, you'd need to enter the entire D grammar to
get a correct parse tree.
I just finished writing it, but I'm afraid to try and compile it :)
It's one huge monster.


Re: Pegged: Syntax Highlighting

2012-03-17 Thread Extrawurst

On 17.03.2012 08:01, Philippe Sigaud wrote:

> On Wed, Mar 14, 2012 at 21:03, Andrej Mitrovic
>  wrote:
>
>>>> how would one use a parser like Pegged for syntax
>>>> highlighting?
>>>
>>> Ok, typically one would use a lexer and not a parser. But using a
>>> parser might be more interesting for creating more complex syntax
>>> highlighting. :)
>>>
>>
>> Actually I think I can use the new ddmd-clean port for just this
>> purpose. Sorry for the noise.
>
> Sorry for the late reply, I was away for a few days, in a Net-forsaken place ;)
>
> If ddmd-clean is OK for you, that's cool. Keep us informed how that went.
> If you want to use Pegged, you'd need to enter the entire D grammar to
> get a correct parse tree.
> I just finished writing it, but I'm afraid to try and compile it :)
> It's one huge monster.


I want to use Pegged for that purpose. So go ahead and commit the D
grammar ;)

Would be so awesome if Pegged would be able to parse D.

~Extrawurst


Re: Pegged: Syntax Highlighting

2012-03-17 Thread Philippe Sigaud
> I want to use Pegged for that purpose. So go ahead and commit the D grammar
> ;)
> Would be so awesome if Pegged would be able to parse D.
>
> ~Extrawurst

The D grammar is a 1000-line / hundreds of rules monster. I finished
writing it and am now crushing bugs.
God, that generates a 10_000 line module to parse it. I should
simplify the code generator somewhat.


Re: Pegged: Syntax Highlighting

2012-03-17 Thread Extrawurst

On 17.03.2012 15:13, Philippe Sigaud wrote:

> The D grammar is a 1000-line / hundreds of rules monster. I finished
> writing it and am now crushing bugs.


Any ETA on when you're going to commit it for the public? Wouldn't mind getting my
hands dirty on it and looking for bugs too ;)


Re: Pegged: Syntax Highlighting

2012-03-17 Thread Andrei Alexandrescu

On 3/17/12 9:13 AM, Philippe Sigaud wrote:

>> I want to use Pegged for that purpose. So go ahead and commit the D grammar
>> ;)
>> Would be so awesome if Pegged would be able to parse D.
>>
>> ~Extrawurst
>
>
> The D grammar is a 1000-line / hundreds of rules monster. I finished
> writing it and am now crushing bugs.
> God, that generates a 10_000 line module to parse it. I should
> simplify the code generator somewhat.


Science is done. Welcome to implementation :o).

I can't say how excited I am about this direction. I have this vision of 
having a D grammar published on the website that is actually "it", i.e. 
the same exact grammar is used by a validator that goes through all of 
our test suite. (The validator wouldn't do any semantic checking.) The 
parser generator _and_ the reference D grammar would be available in 
Phobos, so for anyone it would be dirt cheap to parse some D code and 
wander through the generated AST. The availability of a reference 
grammar and parser would be golden to a variety of D toolchain creators.


Just to gauge interest:

1. Would you consider submitting your work to Phobos?

2. Do you think your approach can generate parsers competitive with 
hand-written ones? If not, why?



Andrei


Re: Pegged: Syntax Highlighting

2012-03-17 Thread Philippe Sigaud
On Sat, Mar 17, 2012 at 15:44, Extrawurst  wrote:
> On 17.03.2012 15:13, Philippe Sigaud wrote:
>>
>> The D grammar is a 1000-line / hundreds of rules monster. I finished
>> writing it and am now crushing bugs.
>
>
> Any ETA on when you're going to commit it for the public? Wouldn't mind getting my
> hands dirty on it and looking for bugs too ;)

I just pushed it on Github.

pegged/examples/dgrammar.d just contains the D grammar as a string.
pegged/examples/ddump.d is the generated parser family.

There are no more syntax bugs: Pegged accepts the string as a correct
grammar and DMD compiles the resulting classes.
I tested the generated parser on microscopic D files and... it
sometimes works :)

I made many mistakes and typos while writing the grammar. I corrected
a few, but there are many more, without a doubt.

I'll write a wiki page on how to generate the grammar anew, if need be.

Btw, the D grammar comes from the website (I didn't find the time to
compare it to the grammar Rainer uses for Mono-D), and it's horribly
BNF-like: almost no + or * operators, etc. I tried to factor some
expressions and simplify others, but it could be a bit shorter (not
much, but still).


Re: Pegged: Syntax Highlighting

2012-03-17 Thread Philippe Sigaud
On Sat, Mar 17, 2012 at 18:11, Andrei Alexandrescu
 wrote:

>> The D grammar is a 1000-line / hundreds of rules monster. I finished
>> writing it and am now crushing bugs.
>> God, that generates a 10_000 line module to parse it. I should
>> simplify the code generator somewhat.
>
>
> Science is done. Welcome to implementation :o).

Hey, it's only 3,000 lines now :) Coming from a thousand-line
grammar, it's not much of an inflation.


> I can't say how excited I am about this direction. I have this vision of
> having a D grammar published on the website that is actually "it", i.e. the
> same exact grammar is used by a validator that goes through all of our test
> suite. (The validator wouldn't do any semantic checking.) The parser
> generator _and_ the reference D grammar would be available in Phobos, so for
> anyone it would be dirt cheap to parse some D code and wander through the
> generated AST. The availability of a reference grammar and parser would be
> golden to a variety of D toolchain creators.

Indeed, but I fear the D grammar is a bit too complex to be easily
walked. Now that I read it, I realize that '1' is parsed as a leaf
10 levels deep!
Compared to Lisp, it's... not in the same league, to say the least. I
will try to drastically simplify the parse tree.
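One common way to flatten such chains is to collapse every node that has exactly one child. Here is a minimal sketch, assuming a toy `Node` struct (hypothetical, not Pegged's actual parse-tree type):

```d
import std.stdio : writeln;

// Hypothetical parse-tree node; Pegged's own parse-tree type differs.
struct Node
{
    string name;
    string text;     // matched input, meaningful for leaves
    Node[] children;
}

// Replace each chain of single-child nodes by the node at its bottom,
// so "Expr -> ... -> Number" shrinks to a lone "Number" leaf.
Node collapse(Node n)
{
    while (n.children.length == 1)
        n = n.children[0];
    Node result = n;
    result.children = null;
    foreach (child; n.children)
        result.children ~= collapse(child);
    return result;
}

// Wrap a Number leaf in `depth` single-child levels, mimicking a deep chain.
Node deepLeaf(string text, size_t depth)
{
    auto n = Node("Number", text);
    foreach (_; 0 .. depth)
        n = Node("Level", "", [n]);
    return n;
}

void main()
{
    auto tree = Node("Sum", "", [deepLeaf("1", 10), deepLeaf("2", 3)]);
    auto flat = collapse(tree);
    writeln(flat.name, ": ", flat.children[0].name, " ", flat.children[0].text,
            ", ", flat.children[1].name, " ", flat.children[1].text);
}
```

This version keeps the innermost node of each chain. Keeping the outermost rule name instead is an equally reasonable choice, depending on what the tree walker needs.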

Does anyone have experience with other languages similar to D that
offer AST-walking? Doesn't C# have something like this?
(I'll have a look at Scala macros)

> Just to gauge interest:
>
> 1. Would you consider submitting your work to Phobos?

Yes, of course. It's already Boost-licensed.
Seeing the review processes for other modules, it'd most certainly put
the code in great shape. But then, it's far from being submittable
right now.


> 2. Do you think your approach can generate parsers competitive with
> hand-written ones? If not, why?

Right now, no, if only because I didn't take any steps toward making it
fast or limiting its RAM consumption.
After applying some ideas I have, I don't know. There are many people
here who are parser-aware and could help make the code faster. But at
the core, to allow mutually recursive rules, the design uses classes:

class A : someParserCombinationThatMayUseA { ... }

Which means A.parse (a static method) is just typeof(super).parse
(also static, and so on). Does that entail any crippling disadvantage
compared to a hand-written parser?
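To make the question concrete, here is a minimal sketch of the design as described. It uses made-up combinator classes (`Lit`, `Seq`, `Or` and `Result` are illustrative, not Pegged's actual names). Each rule is a class whose static `parse` is inherited from its base, and a rule may name itself inside its own base class.

```d
import std.stdio : writeln;

struct Result { bool ok; size_t end; }

// Match the literal s at position pos.
class Lit(string s)
{
    static Result parse(string input, size_t pos)
    {
        if (input.length - pos >= s.length && input[pos .. pos + s.length] == s)
            return Result(true, pos + s.length);
        return Result(false, pos);
    }
}

// Match every rule in order; the whole sequence fails if one fails.
class Seq(Rules...)
{
    static Result parse(string input, size_t pos)
    {
        size_t p = pos;
        foreach (R; Rules)
        {
            auto r = R.parse(input, p);
            if (!r.ok)
                return Result(false, pos);
            p = r.end;
        }
        return Result(true, p);
    }
}

// PEG ordered choice: the first alternative that matches wins.
class Or(Rules...)
{
    static Result parse(string input, size_t pos)
    {
        foreach (R; Rules)
        {
            auto r = R.parse(input, pos);
            if (r.ok)
                return r;
        }
        return Result(false, pos);
    }
}

// Nested <- '(' Nested ')' / 'x'
// The rule names itself in its own base class; Nested.parse is Or!(...).parse.
class Nested : Or!(Seq!(Lit!"(", Nested, Lit!")"), Lit!"x") {}

void main()
{
    writeln(Nested.parse("((x))", 0)); // succeeds, consuming all 5 characters
    writeln(Nested.parse("((x)", 0));  // fails: the outer ')' is missing
}
```

In this sketch the classes only serve as namespaces for static functions, so every `parse` call is bound at compile time and no virtual dispatch is involved.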


Philippe


Re: Pegged: Syntax Highlighting

2012-03-17 Thread bls

On 03/17/2012 01:53 PM, Philippe Sigaud wrote:

> Does anyone have experience with other languages similar to D that
> offer AST-walking? Doesn't C# have something like this?
> (I'll have a look at Scala macros)



Hi Philippe.
Of course, the visitor pattern comes to mind.

Eclipse (Java) uses a specialized visitor pattern called the "hierarchical
visitor pattern" to traverse the AST.


The classic visitor pattern has the following disadvantages:

-- hierarchical navigation -- the traditional Visitor Pattern has no
concept of depth. As a result, the visitor cannot determine if one composite
is within another composite or beside it.


-- conditional navigation -- the traditional Visitor Pattern does not
allow branches to be skipped. As a result, the visitor cannot stop, filter,
or optimize traversal based on some condition.
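Here is a minimal D sketch of the idea, loosely modeled on the enter/leave pair described on the c2 page. `visitEnter` knows the current depth and can prune a branch, and `visitLeave` closes it. The `Node`, `Visit`, `Recorder` and `accept` names are made up for this example. The visitor is a duck-typed template parameter rather than an interface.

```d
import std.stdio : writeln;

// Hypothetical tree node, only for this sketch.
struct Node
{
    string name;
    Node[] children;
}

// One log entry: the depth at which a node was entered, and its name.
struct Visit
{
    size_t depth;
    string name;
}

// A visitor that records what it sees and prunes "Comment" branches.
struct Recorder
{
    size_t depth;
    Visit[] log;

    // Called before a node's children; returning false skips them.
    bool visitEnter(Node n)
    {
        log ~= Visit(depth, n.name);
        ++depth;
        return n.name != "Comment";
    }

    // Called after the children, or right away if they were skipped.
    void visitLeave(Node n)
    {
        --depth;
    }
}

// Traversal: works with any type V providing visitEnter/visitLeave.
void accept(V)(Node n, ref V v)
{
    if (v.visitEnter(n))
    {
        foreach (child; n.children)
            accept(child, v);
    }
    v.visitLeave(n);
}

Visit[] demo()
{
    auto tree = Node("Module", [
        Node("Import", [Node("Identifier")]),
        Node("Comment", [Node("Text")]),
        Node("Function", [Node("Type"), Node("Identifier")]),
    ]);
    Recorder r;
    accept(tree, r);
    return r.log;
}

void main()
{
    foreach (v; demo())
        writeln(v.depth, " ", v.name);
}
```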


Interesting stuff at:

http://c2.com/cgi/wiki?HierarchicalVisitorPattern
You'll find some implementation details at the bottom of the doc.
hth Bjoern


Re: Pegged: Syntax Highlighting

2012-03-17 Thread Andrei Alexandrescu

On 3/17/12 3:53 PM, Philippe Sigaud wrote:

> On Sat, Mar 17, 2012 at 18:11, Andrei Alexandrescu
>  wrote:


>>> The D grammar is a 1000-line / hundreds of rules monster. I finished
>>> writing it and am now crushing bugs.
>>> God, that generates a 10_000 line module to parse it. I should
>>> simplify the code generator somewhat.



>> Science is done. Welcome to implementation :o).


> Hey, it's only 3,000 lines now :) Coming from a thousand-line
> grammar, it's not much of an inflation.


That's quite promising.


> Indeed, but I fear the D grammar is a bit too complex to be easily
> walked. Now that I read it, I realize that '1' is parsed as a leaf
> 10 levels deep!
> Compared to Lisp, it's... not in the same league, to say the least. I
> will try to drastically simplify the parse tree.


This is where custom directives for helping AST creation might help. 
Also, ANTLR solves that problem by allowing people to define tree 
walkers. They have much simpler grammars (heck, the hard job has already 
been done - no more ambiguities). At an extreme, languages such as ML 
are good at walking trees because they essentially embed a tree walker 
in their pattern matching grammar for function parameters.



> Does anyone have experience with other languages similar to D that
> offer AST-walking? Doesn't C# have something like this?
> (I'll have a look at Scala macros)


Heck, I just found this which destroys ANTLR's tree walkers:

http://www.antlr.org/article/1170602723163/treewalkers.html

Didn't read it yet, but clearly it's an opposing viewpoint and relevant 
to your work (don't forget to also read the article to which it's 
replying http://antlr.org/article/1100569809276/use.tree.grammars.tml).



>> 1. Would you consider submitting your work to Phobos?


> Yes, of course. It's already Boost-licensed.
> Seeing the review processes for other modules, it'd most certainly put
> the code in great shape. But then, it's far from being submittable
> right now.


Let us know how we can help. This is an important project.


>> 2. Do you think your approach can generate parsers competitive with
>> hand-written ones? If not, why?


> Right now, no, if only because I didn't take any steps toward making it
> fast or limiting its RAM consumption.
> After applying some ideas I have, I don't know. There are many people
> here who are parser-aware and could help make the code faster. But at
> the core, to allow mutually recursive rules, the design uses classes:
>
> class A : someParserCombinationThatMayUseA { ... }
>
> Which means A.parse (a static method) is just typeof(super).parse
> (also static, and so on). Does that entail any crippling disadvantage
> compared to a hand-written parser?


I'm not sure without seeing more code.


Andrei


Re: Pegged: Syntax Highlighting

2012-03-27 Thread Andrej Mitrovic
On 3/17/12, Philippe Sigaud  wrote:
> If ddmd-clean is OK for you, that's cool. Keep us informed how that went.

Seems to work ok: http://i.imgur.com/qGVZD.png

I'd love to see if I can do it with Pegged too. I've yet to see how
Pegged works internally though and whether I can expose a nice API for
this one purpose.


Re: Pegged: Syntax Highlighting

2012-03-27 Thread Philippe Sigaud
On Tue, Mar 27, 2012 at 16:41, Andrej Mitrovic
 wrote:
> On 3/17/12, Philippe Sigaud  wrote:
>> If ddmd-clean is OK for you, that's cool. Keep us informed how that went.
>
> Seems to work ok: http://i.imgur.com/qGVZD.png

Nice one. Care to explain how you did it?


Re: Pegged: Syntax Highlighting

2012-03-28 Thread Andrej Mitrovic
On 3/27/12, Philippe Sigaud  wrote:
> Nice one. Care to explain how you did it?

Sure. Currently the "editor" is just a viewer (can't edit text
ironically :p), and is a port of one of the lessons of Neatpad
(http://www.catch22.net/tuts/neatpad). It's win32-specific and later
lessons cover very platform-specific Unicode stuff so I haven't really
bothered with the rest of the tutorial.

What I have is one large char[]/wchar[] buffer, I store indices to
newlines within this buffer and when I need to lex a certain line I
just pass a slice into DDMD's lexer based on the position of the
newlines. I then store the beginning of each token and its type (e.g.
{ index 5, TOK.TOKImport }) as an array for that specific line. It's
easy to paint a line this way.
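Here is a minimal sketch of the newline bookkeeping described above, with hypothetical helper names. The lexer call itself is left out, because the slice returned by `lineSlice` is simply what would be passed to it.

```d
import std.stdio : writeln;

// Offsets of every '\n' in the buffer, computed once for the whole text.
size_t[] newlineIndices(const(char)[] buf)
{
    size_t[] result;
    foreach (i, c; buf)
        if (c == '\n')
            result ~= i;
    return result;
}

// Line n (0-based) as a slice of the buffer, without its '\n' and without copying.
// This slice is what would be handed to the lexer when that line needs repainting.
const(char)[] lineSlice(const(char)[] buf, const(size_t)[] nl, size_t n)
{
    size_t begin = n == 0 ? 0 : nl[n - 1] + 1;
    size_t end = n < nl.length ? nl[n] : buf.length;
    return buf[begin .. end];
}

void main()
{
    auto buf = "import foo;\nvoid main() {}\n";
    auto nl = newlineIndices(buf);
    foreach (n; 0 .. nl.length + 1)
        writeln(n, ": [", lineSlice(buf, nl, n), "]");
}
```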

But I do have a couple of issues. One is that I have no way to tell the
blank space between tokens apart from spaces within string literals.
The DDMD API only exposes the beginning of each token and not its
length, and the lexer doesn't produce tokens for the blank space between
real tokens. So with a string like this:
import foo;

The space between 'import' and 'foo' ends up being treated as
'TOK.TOKImport'. It's not a big issue when I only have foreground
coloring (empty space won't be drawn), but when I have background
coloring I end up with this:
http://i.imgur.com/0wUcR.png
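One possible workaround is sketched below. It assumes the lexer reports only start offsets and kinds; the `TokenStart` and `Span` types are made up and are not DDMD's API. Each token extends to the next token's start, and trailing blanks are then trimmed so the gaps can get a neutral style.

```d
import std.ascii : isWhite;
import std.stdio : writeln;

// What the lexer reports: only where each token starts, and its kind.
// (`kind` is a plain string here; DDMD would use its TOK enum.)
struct TokenStart { size_t index; string kind; }

// What the painter wants: a half-open range [begin, end) per token.
struct Span { size_t begin, end; string kind; }

// Assume a token runs until the next one starts (or the line ends), then trim
// trailing blanks so the gap can be painted with a neutral background.
// Blanks inside a string literal survive, since the literal ends with its quote.
Span[] tokenSpans(const(char)[] line, const(TokenStart)[] toks)
{
    Span[] result;
    foreach (i, t; toks)
    {
        size_t end = i + 1 < toks.length ? toks[i + 1].index : line.length;
        while (end > t.index && isWhite(line[end - 1]))
            --end;
        result ~= Span(t.index, end, t.kind);
    }
    return result;
}

void main()
{
    auto line = "import foo;";
    auto toks = [TokenStart(0, "import"), TokenStart(7, "identifier"), TokenStart(10, ";")];
    foreach (s; tokenSpans(line, toks))
        writeln(s.kind, " [", s.begin, ", ", s.end, ")");
}
```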

The other issue is that DDMD explicitly takes a char[] and not just
any input range. The WinAPI text-drawing APIs require UTF16 arrays
(the Unicode-aware functions anyway), so I end up having to store two
buffers, one UTF8 and one UTF16.
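Keeping two buffers also means that token offsets from the UTF-8 lexer stop matching positions in the UTF-16 copy once the text contains non-ASCII characters. Here is a minimal sketch of the conversion using `std.utf.decode` and `std.utf.codeLength`. The `utf16Offset` helper is hypothetical, and it assumes the offset falls on a code-point boundary.

```d
import std.stdio : writeln;
import std.utf : codeLength, decode;

// Map a byte offset in UTF-8 text to the code-unit offset of the same position
// in its UTF-16 copy. Assumes byteOffset falls on a code-point boundary.
size_t utf16Offset(const(char)[] utf8, size_t byteOffset)
{
    size_t i = 0;     // position in the UTF-8 text, in bytes
    size_t units = 0; // matching position in UTF-16 code units
    while (i < byteOffset)
    {
        dchar c = decode(utf8, i);    // reads one code point and advances i
        units += codeLength!wchar(c); // 1, or 2 for a surrogate pair
    }
    return units;
}

void main()
{
    // 'a' (1 byte), U+00E9 (2 bytes), U+1D11E (4 bytes), 'b'
    auto text = "a\u00E9\U0001D11Eb";
    size_t[] offsets = [0, 1, 3, 7];
    foreach (off; offsets)
        writeln(off, " -> ", utf16Offset(text, off));
}
```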

I'm sure these issues can be fixed in DDMD though. With that being
said the paint routine only takes about ~150 microseconds to finish
which is pretty neat.

Anyway if you have a win32 box you can clone:
https://github.com/AndrejMitrovic/DNeatpad
Then run:
DNeatpad\WindowsAPI\build.bat
DNeatpad\ddmd\build.bat
DNeatpad\textview\build.bat

That last one builds the "neatpad" folder as well. Anyway, I was just
doing this for fun; I have no intention of writing text editors. :)


Re: Pegged: Syntax Highlighting

2012-03-28 Thread Andrej Mitrovic
On 3/28/12, Andrej Mitrovic  wrote:
> snip

Btw, it sometimes crashes when I open std.datetime and scroll and resize
the window. I've no idea what's causing it. I don't seem to index past
array bounds, and I'm not allocating win32 handles all the time
either. A catch(Throwable) doesn't help. Oh well..


Re: Pegged: Syntax Highlighting

2012-03-28 Thread Andrej Mitrovic
On 3/28/12, Andrej Mitrovic  wrote:
> On 3/28/12, Andrej Mitrovic  wrote:
>> snip

Accidentally left out ddmd from the repo but now it's in. I think it
should compile now. Let me know if it doesn't.


Re: Pegged: Syntax Highlighting

2012-03-28 Thread Andrej Mitrovic
On 3/27/12, Philippe Sigaud  wrote:
> snip

Philippe, your example on this wiki page doesn't seem to work:
https://github.com/PhilippeSigaud/Pegged/wiki/

import std.stdio : writeln; // needed for writeln below
import pegged.grammar;

mixin(grammar(
   "Expr <- Factor AddExpr*
AddExpr  <- ('+'/'-') Factor
Factor   <- Primary MulExpr*
MulExpr  <- ('*'/'/') Primary
Primary  <- Parens / Number / Variable / '-' Primary

Parens   <- '(' Expr ')'
Number   <~ [0-9]+
Variable <- Identifier"));

void main()
{
    auto parseTree2 = Expr.parse(" 0 + 123 - 456 ");
    writeln(parseTree2.capture);
}

["Expr failure at pos [index: 0, line: 0, col: 0]", "Factor failure at
pos [index: 0, line: 0, col: 0]", "Primary failure at pos [index: 0,
line: 0, col: 0]", "Parens failure at pos [index: 0, line: 0, col:
0]", "Lit!(() failure at pos [index: 0, line: 0, col: 0]"]


Re: Pegged: Syntax Highlighting

2012-03-28 Thread Andrej Mitrovic
On 3/28/12, Andrej Mitrovic  wrote:
> On 3/27/12, Philippe Sigaud  wrote:
>> snip
>
> Philippe your example on this wiki page doesn't seem to work:
> https://github.com/PhilippeSigaud/Pegged/wiki/

Actually it seems to work if I remove all the spaces from the input
string. Maybe the grammar is just missing another rule that allows
spaces?


Re: Pegged: Syntax Highlighting

2012-03-28 Thread Andrej Mitrovic
On 3/28/12, Andrej Mitrovic  wrote:
> On 3/28/12, Andrej Mitrovic  wrote:
>> On 3/27/12, Philippe Sigaud  wrote:
>>> snip
>>
>> Philippe your example on this wiki page doesn't seem to work:
>> https://github.com/PhilippeSigaud/Pegged/wiki/
>
> Actually it seems to work if I remove all the spaces from the input
> string. Maybe the grammar is just missing another rule that allows
> spaces?

Okay, I got it: you've recently changed some code. I can see it
mentioned in the readme:

By default, the grammars do not silently consume spaces, as this is
the standard behavior for PEGs. There is an opt-out though, with the
simple `<` arrow instead of `<-` (you can see it in the previous
example)

So yeah, if I change to '<' it works. :)
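Roughly, the difference between the two arrows is whether blanks between the elements of a sequence are skipped. Below is a hand-written sketch of the two behaviours on the wiki example's input. It only illustrates the effect and is not how Pegged implements `<`. The names `matchSequence` and `skipBlanks` are made up.

```d
import std.ascii : isWhite;
import std.stdio : writeln;

// Skip a run of blanks starting at pos.
size_t skipBlanks(string input, size_t pos)
{
    while (pos < input.length && isWhite(input[pos]))
        ++pos;
    return pos;
}

// Match the literals in order against the whole input.
// skipSpaces == false: strict PEG sequence, every character must be matched.
// skipSpaces == true: blanks are skipped before each element and at the end.
bool matchSequence(string input, string[] parts, bool skipSpaces)
{
    size_t pos = 0;
    foreach (p; parts)
    {
        if (skipSpaces)
            pos = skipBlanks(input, pos);
        if (input.length - pos < p.length || input[pos .. pos + p.length] != p)
            return false;
        pos += p.length;
    }
    if (skipSpaces)
        pos = skipBlanks(input, pos);
    return pos == input.length;
}

void main()
{
    auto parts = ["0", "+", "123", "-", "456"];
    writeln(matchSequence(" 0 + 123 - 456 ", parts, false)); // false
    writeln(matchSequence(" 0 + 123 - 456 ", parts, true));  // true
    writeln(matchSequence("0+123-456", parts, false));       // true
}
```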


Re: Pegged: Syntax Highlighting

2012-03-28 Thread Andrej Mitrovic
On 3/28/12, Andrej Mitrovic  wrote:
> snip

Ouch, DMD crashes with that autogenerated ddump D grammar file.


Re: Pegged: Syntax Highlighting

2012-03-28 Thread Andrej Mitrovic
On 3/28/12, Andrej Mitrovic  wrote:
> On 3/28/12, Andrej Mitrovic  wrote:
>> snip
>
> Ouch, DMD crashes with that autogenerated ddump D grammar file.
>

Also asModule seems to have stopped generating valid modules since the
last time I tried it. I keep getting this error when importing a
generated file:

arithmetic.d(44): Error: undefined identifier module arithmetic.empty
arithmetic.d(31):        called from here:
parse(Input(input,Pos(0u,0u,0u),AssociativeList(null)))
simpleTest.d(30):        called from here: parse("2/(8*7988+1*6196-y)")
Failed: "dmd" "-w" "-wi" "-v" "-o-" "simpleTest.d" "-I."


Re: Pegged: Syntax Highlighting

2012-03-28 Thread Philippe Sigaud
On Wed, Mar 28, 2012 at 18:06, Andrej Mitrovic
 wrote:
> Okay I got it, you've recently changed some code. I can see it
> mentioned in the readme:
>
> By default, the grammars do not silently consume spaces, as this is
> the standard behavior for PEGs. There is an opt-out though, with the
> simple `<` arrow instead of `<-` (you can see it in the previous
> example)
>
> So yeah, if I change to '<' it works. :)

Damn, I changed README.md a few days ago and forgot the equivalent
page in the wiki :( I knew that duplicating content would cause
trouble.


Re: Pegged: Syntax Highlighting

2012-03-28 Thread Philippe Sigaud
On Wed, Mar 28, 2012 at 19:08, Andrej Mitrovic
 wrote:
> On 3/28/12, Andrej Mitrovic  wrote:
>> snip
>
> Ouch, DMD crashes with that autogenerated ddump D grammar file.

Yeah, I spent two evenings trying to figure out why there is a segmentation
fault. I found some nice bugs (the rules have an internal member
called 'name' and for recursive rules it can become infinite). I still
don't get why the D grammar does this.

I'll start again, with a C grammar (I read one from the ANSI report
today). I should have done it in smaller steps. Right now, I'm more
into changing bits of the underlying code and then will code grammars
again.


Re: Pegged: Syntax Highlighting

2012-03-28 Thread Philippe Sigaud
On Wed, Mar 28, 2012 at 19:19, Andrej Mitrovic
 wrote:
> Also asModule seems to have stopped generating valid modules since the
> last time I've tried it. I keep getting this error when importing a
> generated file:
>
> arithmetic.d(44): Error: undefined identifier module arithmetic.empty
> arithmetic.d(31):        called from here:
> parse(Input(input,Pos(0u,0u,0u),AssociativeList(null)))
> simpleTest.d(30):        called from here: parse("2/(8*7988+1*6196-y)")
> Failed: "dmd" "-w" "-wi" "-v" "-o-" "simpleTest.d" "-I."

Ah, I'm preparing a future switch to ranges and changed 'arr.length ==
0' calls to 'arr.empty'. I forgot to put an 'import std.array;' at the
beginning of the code generated by 'asModule()', I guess.

OK, it's done and on Github. Thanks for the heads-up!
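The error is easy to reproduce. For arrays, `empty` is an ordinary Phobos function called through UFCS rather than a built-in property, so code that says `arr.empty` needs the import, while `arr.length == 0` needs none. A minimal sketch:

```d
// `empty` for arrays is a Phobos function called through UFCS, not a built-in
// property, so `input.empty` only compiles with an import that provides it.
// (Current Phobos defines it in std.range.primitives; std.array makes it
// available too.)
import std.array;

bool noInput(const(char)[] input)
{
    return input.empty; // same result as input.length == 0
}

void main()
{
    assert(noInput(""));
    assert(!noInput("2/(8*7988+1*6196-y)"));
}
```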


Re: Pegged: Syntax Highlighting

2012-03-28 Thread Andrej Mitrovic
On 3/28/12, Philippe Sigaud  wrote:
> OK, it's done and on Github. Thanks for the heads-up!

Cool, thanks for the quick fixes!

I see that each child in the parse tree has a begin/end position mark,
this seems to be exactly what I need for syntax highlighting. I'll try to
have some fun with it.
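With those marks, painting becomes a tree walk. Here is a minimal sketch, assuming a node type with `begin`/`end` offsets. The `Node` struct, the rule names and the one-character styles are hypothetical stand-ins for Pegged's parse tree and real style bits.

```d
import std.stdio : writeln;

// Hypothetical node carrying the matched range [begin, end) of the input;
// the names are illustrative, not Pegged's actual parse-tree fields.
struct Node
{
    string name;
    size_t begin, end;
    Node[] children;
}

// One style character per rule name; '.' means "default".
char styleFor(string rule)
{
    switch (rule)
    {
        case "Keyword":    return 'k';
        case "Identifier": return 'i';
        case "Punct":      return 'p';
        default:           return '.';
    }
}

// Depth-first: a node paints its whole range, then its children repaint
// their own sub-ranges, so the innermost rule wins.
void paint(const Node n, char[] styles)
{
    styles[n.begin .. n.end] = styleFor(n.name);
    foreach (child; n.children)
        paint(child, styles);
}

string highlight(string text, const Node root)
{
    auto styles = new char[text.length];
    styles[] = '.';
    paint(root, styles);
    return styles.idup;
}

// A hand-built tree for "import foo;" standing in for a real parse result.
Node importTree()
{
    return Node("ImportDeclaration", 0, 11, [
        Node("Keyword", 0, 6),
        Node("Identifier", 7, 10),
        Node("Punct", 10, 11),
    ]);
}

void main()
{
    writeln("import foo;");
    writeln(highlight("import foo;", importTree())); // kkkkkk.iiip
}
```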


Re: Pegged: Syntax Highlighting

2012-03-29 Thread Andrej Mitrovic
On 3/28/12, Andrej Mitrovic  wrote:
> On 3/28/12, Philippe Sigaud  wrote:
>> OK, it's done and on Github. Thanks for the heads-up!

I've more to report :)

I don't know exactly which files I need to compile Pegged as a
library. Which files are essential and which can be left out? I know
utils.manual doesn't need to be included, but there's utils.bootstrap
which has a wrong import of pegged.utils.PEgrammar (I think it should
be pegged.utils.PEGGEDgrammar), and this statement "enum PEGCode =
grammar(PEG);" should probably be  "enum PEGCode =
grammar(PEGGEDgrammar)".

Still, if I try to compile all the modules, I get an out-of-memory error from DMD.


Re: Pegged: Syntax Highlighting

2012-03-29 Thread Philippe Sigaud
On Thu, Mar 29, 2012 at 12:40, Andrej Mitrovic
 wrote:

> I don't know exactly which files I need to compile Pegged as a
> library.

I propose we move this discussion to the 'Issues' section of Pegged's Github
page. No need to pollute the announce ML.

FYI, you just need pegged.peg, pegged.grammar and
pegged.utils.associative (the latter being a transitory measure until
H. S. Teoh's AA replacement is incorporated in Phobos or someone can
confirm that D's built-in AAs work perfectly at compile time. Heck, maybe
I don't even need AAs.)

The other files are part of the bootstrapping process and some
leftovers. I'll move them to another directory and explain on the page
how to compile Pegged as a lib. I never do that while developing and
didn't think about indicating this.

Philippe