Re: [Haskell-cafe] Question about implementing an off-side rule in Parsec 2

2009-04-28 Thread Lennart Augustsson
Implementing exactly Haskell's rule for indentation is incredibly hard.
In fact, no known Haskell compiler gets it right.
But if you make a slightly simpler one, it's easy.  The simple one is
the one based only on indentation.

There are different ways you can do this.

For instance, you can preprocess the token stream from the lexer.
This prprocessor needs a little bit of parsing, e.g., if it encounters
a let token that is not followed by a { token it should insert a
{ and then a corresponding } at the right place (this requires
every token to carry its column number).

You can also integrate it more into the parser.  Make, say, a block
parsing combinator that is called after seeing a let.  If block does
not see a { it will modify the remaining token stream to insert the
} at the right place.  Nested blocks will similarely to their
modifications.

You can also imagine inserting indentation change as a new kind of
token in the token streak and then rewriting the grammar to deal with
this.

Personally, I like option two (the block parsing combinator).  I've
used it several times.

  -- Lennart

On Mon, Apr 27, 2009 at 10:41 PM, Bas van Gijzel neneko...@gmail.com wrote:
 Hello everyone,

 I'm doing a bachelor project focused on comparing parsers. One of the parser
 libraries I'm using is Parsec (2) and I'm going to implement a very small
 subset of haskell with it, with as most important feature the off-side rule
 (indentation based parsing) used in function definitions and possibly in
 where clauses.

 But I'm still a bit stuck on how to implement this cleanly. I tried to
 search for some examples on blogs but I haven't found anything yet. As far
 as I can see the way to go would be using getState and updateState methods
 defined in Parsec.Prim and to use the methods in Parsec.Pos to compare the
 difference in indendation for tokens.

 But I haven't completely wrapped my head around any state monad yet and I
 don't understand Parsec enough yet to see how to use the methods Parsec.Pos
 and state easily. Some examples or pointers to something to read would
 really be helpful.

 Thanks in advance,

 Bas van Gijzel

 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Question about implementing an off-side rule in Parsec 2

2009-04-28 Thread Neil Brown

Bas van Gijzel wrote:

Hello everyone,

I'm doing a bachelor project focused on comparing parsers. One of the 
parser libraries I'm using is Parsec (2) and I'm going to implement a 
very small subset of haskell with it, with as most important feature 
the off-side rule (indentation based parsing) used in function 
definitions and possibly in where clauses.


But I'm still a bit stuck on how to implement this cleanly. I tried to 
search for some examples on blogs but I haven't found anything yet. As 
far as I can see the way to go would be using getState and updateState 
methods defined in Parsec.Prim and to use the methods in Parsec.Pos to 
compare the difference in indendation for tokens.


But I haven't completely wrapped my head around any state monad yet 
and I don't understand Parsec enough yet to see how to use the methods 
Parsec.Pos and state easily. Some examples or pointers to something to 
read would really be helpful.

Hi,

I work on a compiler for occam-pi, which has indentation-based syntax.  
It's regular (two-spaces per indent) rather than 
different-number-of-spaces, and line continuations can only follow 
certain tokens, but perhaps our code might help you.


We use alex for tokenising and parsec for parsing.  We tokenise the 
file, and then use the source positions to create indent/outdent tokens 
in the token stream, and after that the parser parses things like a PAR 
block as: do {reserved PAR; indent; many1 subItem; outdent}.  Our code 
can be found at:


http://offog.org/darcs/tock/

Look in the frontends subdirectory, particularly at StructureOccam.hs, 
but also LexOccam.x and ParseOccam.hs.  It may not be the most elegant 
way to do things (occam has all sorts of oddities that make parsing a 
pain), but it does work :-)


Thanks,

Neil.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Question about implementing an off-side rule in Parsec 2

2009-04-28 Thread Bas van Gijzel
Hey,

Thanks for the help thusfar. These are interesting suggestions, and I think
the occam-pi compiler would help a bit as example. I'll force myself to
learn some more about the state monad, but I haven't found really good
examples except in Real World Haskell until now so I hope I'll manage. I'll
keep you posted about my further progress.

Cheers,

Bas

On Tue, Apr 28, 2009 at 2:04 PM, Neil Brown nc...@kent.ac.uk wrote:

 Bas van Gijzel wrote:

 Hello everyone,

 I'm doing a bachelor project focused on comparing parsers. One of the
 parser libraries I'm using is Parsec (2) and I'm going to implement a very
 small subset of haskell with it, with as most important feature the off-side
 rule (indentation based parsing) used in function definitions and possibly
 in where clauses.

 But I'm still a bit stuck on how to implement this cleanly. I tried to
 search for some examples on blogs but I haven't found anything yet. As far
 as I can see the way to go would be using getState and updateState methods
 defined in Parsec.Prim and to use the methods in Parsec.Pos to compare the
 difference in indendation for tokens.

 But I haven't completely wrapped my head around any state monad yet and I
 don't understand Parsec enough yet to see how to use the methods Parsec.Pos
 and state easily. Some examples or pointers to something to read would
 really be helpful.

 Hi,

 I work on a compiler for occam-pi, which has indentation-based syntax.
  It's regular (two-spaces per indent) rather than
 different-number-of-spaces, and line continuations can only follow certain
 tokens, but perhaps our code might help you.

 We use alex for tokenising and parsec for parsing.  We tokenise the file,
 and then use the source positions to create indent/outdent tokens in the
 token stream, and after that the parser parses things like a PAR block as:
 do {reserved PAR; indent; many1 subItem; outdent}.  Our code can be found
 at:

 http://offog.org/darcs/tock/

 Look in the frontends subdirectory, particularly at StructureOccam.hs, but
 also LexOccam.x and ParseOccam.hs.  It may not be the most elegant way to do
 things (occam has all sorts of oddities that make parsing a pain), but it
 does work :-)

 Thanks,

 Neil.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Question about implementing an off-side rule in Parsec 2

2009-04-28 Thread S. Doaitse Swierstra
As Lennart said, the complete offside rule as found in Haskell is  
almost impossible to get right. This is mainly due to the way in which  
it is formulated: in terms of error correction. This makes it very  
difficult to build a parser for such rules which have error correction  
built into them. We need to do a kind of open brain surgery to get  
this working. Note that the GHC treats the offside rule even a bit  
different in case it is caused by the do notation, in which case the  
indentation does not have to be greater, but has to be just at least  
as great as the previous indentation.


In the uulib package you will find a module which handles the offside  
parsing as we understand it; you may take it as an object of study. We  
use it in the UHC and we managed to compile almost all the basic  
libraries with it (with the exception of the do's mention above, which  
we had to give some extra indentation).  It basically follows the  
suggestion made by Lennart in this thread, by redefining the input  
state which is being maintained.


I understand that you try to build an Occam compiler. Fortunately the  
offside rule for Occam is much simpler, and resembles closely the  
Miranda rule.


I uploaded a new version of our parser combinators (uu-parisnglib) to  
Hackage, which is well documented in an associated tutorial. I think  
it could give you a good starting point. Note however that the library  
is far from stable, and will be extended in the near future. E.g. with  
a pBlock as we have in the uulib library to deal with the offside  
rule ;-}


 Hope you enjoy jumping into the deep,

 Doaitse Swierstra






On 28 apr 2009, at 22:03, Bas van Gijzel wrote:


Hey,

Thanks for the help thusfar. These are interesting suggestions, and  
I think the occam-pi compiler would help a bit as example. I'll  
force myself to learn some more about the state monad, but I haven't  
found really good examples except in Real World Haskell until now so  
I hope I'll manage. I'll keep you posted about my further progress.


Cheers,

Bas

On Tue, Apr 28, 2009 at 2:04 PM, Neil Brown nc...@kent.ac.uk wrote:
Bas van Gijzel wrote:
Hello everyone,

I'm doing a bachelor project focused on comparing parsers. One of  
the parser libraries I'm using is Parsec (2) and I'm going to  
implement a very small subset of haskell with it, with as most  
important feature the off-side rule (indentation based parsing) used  
in function definitions and possibly in where clauses.


But I'm still a bit stuck on how to implement this cleanly. I tried  
to search for some examples on blogs but I haven't found anything  
yet. As far as I can see the way to go would be using getState and  
updateState methods defined in Parsec.Prim and to use the methods in  
Parsec.Pos to compare the difference in indendation for tokens.


But I haven't completely wrapped my head around any state monad yet  
and I don't understand Parsec enough yet to see how to use the  
methods Parsec.Pos and state easily. Some examples or pointers to  
something to read would really be helpful.

Hi,

I work on a compiler for occam-pi, which has indentation-based  
syntax.  It's regular (two-spaces per indent) rather than different- 
number-of-spaces, and line continuations can only follow certain  
tokens, but perhaps our code might help you.


We use alex for tokenising and parsec for parsing.  We tokenise the  
file, and then use the source positions to create indent/outdent  
tokens in the token stream, and after that the parser parses things  
like a PAR block as: do {reserved PAR; indent; many1 subItem;  
outdent}.  Our code can be found at:


http://offog.org/darcs/tock/

Look in the frontends subdirectory, particularly at  
StructureOccam.hs, but also LexOccam.x and ParseOccam.hs.  It may  
not be the most elegant way to do things (occam has all sorts of  
oddities that make parsing a pain), but it does work :-)


Thanks,

Neil.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Question about implementing an off-side rule in Parsec 2

2009-04-28 Thread Bernie Pope
2009/4/28 Bas van Gijzel neneko...@gmail.com

 I'm doing a bachelor project focused on comparing parsers. One of the
 parser libraries I'm using is Parsec (2) and I'm going to implement a very
 small subset of haskell with it, with as most important feature the off-side
 rule (indentation based parsing) used in function definitions and possibly
 in where clauses.

 But I'm still a bit stuck on how to implement this cleanly. I tried to
 search for some examples on blogs but I haven't found anything yet. As far
 as I can see the way to go would be using getState and updateState methods
 defined in Parsec.Prim and to use the methods in Parsec.Pos to compare the
 difference in indendation for tokens.

 But I haven't completely wrapped my head around any state monad yet and I
 don't understand Parsec enough yet to see how to use the methods Parsec.Pos
 and state easily. Some examples or pointers to something to read would
 really be helpful.


Parsing a simple form of the offside rule is discussed in the paper:

Monadic Parser Combinators, Hutton and Meijer, 1996
http://www.cs.nott.ac.uk/~gmh/monparsing.pdf

see section 8, page 30.

Their parsers are similar in style to Parsec, but you may need to do some
translation.

I haven't thought about it hard, but I suspect their approach is not
efficient for deeply nested examples, due to repeated processing of the
token stream (but I could be wrong, and maybe it doesn't matter for what you
are trying to do).

I followed their approach in a toy language once, and the result was very
pleasing to read in code.

Cheers,
Bernie.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Question about implementing an off-side rule in Parsec 2

2009-04-27 Thread Jason Dagit
On Mon, Apr 27, 2009 at 2:41 PM, Bas van Gijzel neneko...@gmail.com wrote:
 Hello everyone,

 I'm doing a bachelor project focused on comparing parsers. One of the parser
 libraries I'm using is Parsec (2) and I'm going to implement a very small
 subset of haskell with it, with as most important feature the off-side rule
 (indentation based parsing) used in function definitions and possibly in
 where clauses.

 But I'm still a bit stuck on how to implement this cleanly. I tried to
 search for some examples on blogs but I haven't found anything yet. As far
 as I can see the way to go would be using getState and updateState methods
 defined in Parsec.Prim and to use the methods in Parsec.Pos to compare the
 difference in indendation for tokens.

I've never tried to implement a whitespace sensitive parser like you'd
need for Haskell or Python, but my thought was that maybe you use a
stack and you push/pop from it when the indentation level changes.
That way, I think you could isolate the part of the parser that
handles changes in indentation from the rest.  Maybe I haven't thought
about this enough yet because I'm not sure what I would store on the
stack.  In some sense, you want the indentation level change to
determine which production(s) of the grammar you are looking for.  You
might also use it to track scope so that you know the scope of the
names in the program source.

 But I haven't completely wrapped my head around any state monad yet and I
 don't understand Parsec enough yet to see how to use the methods Parsec.Pos
 and state easily. Some examples or pointers to something to read would
 really be helpful.

I'd start by playing with some toy examples in the State monad, then
try implementing a State monad.  Only look at the real implementation
when you get stuck.  Once you get State, then go back to Parsec which
is more complex.  This way you'll be in over your head less.

Good luck,
Jason
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Question about implementing an off-side rule in Parsec 2

2009-04-27 Thread Ryan Ingram
I don't have experience solving this problem, but I've read a few
horror stories from people who had state affect the results of parsing
in Parsec.

Haskell's layout rules replace indentation levels with braces and
semicolons; you could run an initial tokenizing parser that builds
tokens including the indentation level of each line, then convert
those tokens into the appropriate braces and semicolons (via a pure
function) before feeding the results into the real Haskell parser.

  -- ryan

On Mon, Apr 27, 2009 at 2:41 PM, Bas van Gijzel neneko...@gmail.com wrote:
 Hello everyone,

 I'm doing a bachelor project focused on comparing parsers. One of the parser
 libraries I'm using is Parsec (2) and I'm going to implement a very small
 subset of haskell with it, with as most important feature the off-side rule
 (indentation based parsing) used in function definitions and possibly in
 where clauses.

 But I'm still a bit stuck on how to implement this cleanly. I tried to
 search for some examples on blogs but I haven't found anything yet. As far
 as I can see the way to go would be using getState and updateState methods
 defined in Parsec.Prim and to use the methods in Parsec.Pos to compare the
 difference in indendation for tokens.

 But I haven't completely wrapped my head around any state monad yet and I
 don't understand Parsec enough yet to see how to use the methods Parsec.Pos
 and state easily. Some examples or pointers to something to read would
 really be helpful.

 Thanks in advance,

 Bas van Gijzel

 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe