Re: [Haskell-cafe] Re: Lazy Parsing

Malcolm Wallace Sun, 31 May 2009 04:42:04 -0700

It is my pleasure to announce that after 5 days of experimenting with uu-parsinglib I have absolutely no clue, whatsoever, on how to use it.
I do not even manage to write a parser for even a mere digit or a simple character.

I don't know whether you will be willing to change over to the polyparse library, but here are some hints about how you might use it.

Given that you want the input to be a simple character stream, rather than use a more elaborate lexer, the first thing to do is to specialise the parser type for your purposes:


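> -- Assumed imports, not shown in the original message:
> import Text.ParserCombinators.Poly.Lazy
> import qualified Data.Char as Char
> import Data.Char (digitToInt)
>
> type TextParser a = Parser Char a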

Now, to recognise a "mere digit",

> digit :: TextParser Char
> digit = satisfy Char.isDigit

and for a sequence of digits forming an unsigned integer:

> integer :: TextParser Integer
> integer = do ds <- many1 digit
>              return (foldl1 (\n d-> n*10+d)
>                             (map (fromIntegral.digitToInt) ds))
>           `adjustErr` (++("expected one or more digits"))
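
To see what the fold inside "integer" does on its own, here is a minimal standalone sketch. The helper name digitsToInteger is hypothetical and not part of polyparse; it assumes a non-empty string of decimal digits, which is what many1 guarantees in the parser above.

```haskell
import Data.Char (digitToInt)

-- Hypothetical helper, for illustration only: convert a non-empty string
-- of decimal digits to its numeric value, the same way the body of
-- `integer` does. foldl1 fails on an empty list, so the input must
-- contain at least one digit.
digitsToInteger :: String -> Integer
digitsToInteger ds =
  foldl1 (\n d -> n * 10 + d) (map (fromIntegral . digitToInt) ds)

-- digitsToInteger "2009"  evaluates to 2009
-- digitsToInteger "05"    evaluates to 5 (a leading zero contributes nothing)
```

Each step multiplies the running total by ten and adds the next digit, so "2009" is built up as 2, 20, 200, 2009.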

I mean I'd like to be able to turn "12.05.2009" into something like (12, 5, 2009) and got no clue what the code would have to look like. I do know almost every variation what the code must not look like :).


> date = do a <- integer
>           satisfy (=='.')
>           b <- integer
>           satisfy (=='.')
>           c <- integer
>           return (a,b,c)

Of course, that is just the standard (strict) monadic interface used by many combinator libraries. Your original desire was for lazy parsing, and to achieve that, you must move over to the applicative interface. The key difference is that you cannot name intermediate values, but must construct larger values directly from smaller ones by something like function application.


> lazydate = return (,,) `apply` integer `discard` dot
>                        `apply` integer `discard` dot
>                        `apply` integer
>    where dot = satisfy (=='.')

The (,,) is the constructor function for triples. The `discard` combinator ensures that its second argument parses OK, but throws away its result, keeping only the result of its first argument.
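
For readers more used to the operators in Control.Applicative, `apply` and `discard` play roughly the roles of <*> and <* respectively. The following is a minimal sketch of the same date parser in that style. It uses a deliberately tiny, strict parser type P written just for this illustration (P, satisfyP, integerP and dateP are hypothetical names, not polyparse's API), so it shows only the shape of the applicative interface, not the lazy behaviour.

```haskell
import Data.Char (isDigit, digitToInt)

-- A toy parser type for illustration only; this is NOT polyparse's Parser.
newtype P a = P { runP :: String -> Maybe (a, String) }

instance Functor P where
  fmap f (P p) = P $ \s -> case p s of
    Nothing      -> Nothing
    Just (a, s') -> Just (f a, s')

instance Applicative P where
  pure a = P $ \s -> Just (a, s)
  P pf <*> P pa = P $ \s -> case pf s of
    Nothing      -> Nothing
    Just (f, s') -> case pa s' of
      Nothing       -> Nothing
      Just (a, s'') -> Just (f a, s'')

-- Accept one character satisfying the predicate.
satisfyP :: (Char -> Bool) -> P Char
satisfyP ok = P $ \s -> case s of
  (c:cs) | ok c -> Just (c, cs)
  _             -> Nothing

-- Accept one or more digits and return their numeric value.
integerP :: P Integer
integerP = P $ \s -> case span isDigit s of
  ([], _)    -> Nothing
  (ds, rest) -> Just (foldl (\n d -> n * 10 + fromIntegral (digitToInt d)) 0 ds, rest)

-- Same structure as lazydate: (,,) applied to three integers,
-- with each dot parsed and then thrown away by <*.
dateP :: P (Integer, Integer, Integer)
dateP = (,,) <$> integerP <* dotP <*> integerP <* dotP <*> integerP
  where dotP = satisfyP (== '.')

-- runP dateP "12.05.2009"   evaluates to Just ((12,5,2009),"")
-- runP dateP "12.05..2009"  evaluates to Nothing
```

As in lazydate, no intermediate value is ever named: the triple is assembled purely by applying (,,) to the results in sequence.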

Apart from lazy space behaviour, the main observable difference between "date" and "lazydate" is when errors are reported on incorrect input. For instance:


  > fst $ runParser date "12.05..2009"
  *** Exception: In a sequence:
  Parse.satisfy: failed
  expected one or more digits

  > fst $ runParser lazydate "12.05..2009"
  (12,5,*** Exception: In a sequence:
  Parse.satisfy: failed
  expected one or more digits

Notice how the lazy parser managed to build the first two elements of the triple, whilst the strict parser gave no value at all.
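
Why a partial result can appear at all may be easier to see in a small self-contained model. The sketch below is an assumption-laden toy, not polyparse's implementation (which may differ in detail): the type LP and the names pureL, applyL, discardL, integerL, dotL and lazydateL are all hypothetical. The idea it illustrates is that a lazy parser hands back its result immediately, and a parse failure is an exception hidden inside whichever part of the result depends on it.

```haskell
import Data.Char (isDigit, digitToInt)

-- Toy lazy parser: it always returns a pair, and a failed parse is an
-- `error` thunk that is raised only when the affected value is demanded.
newtype LP a = LP { runLP :: String -> (a, String) }

pureL :: a -> LP a
pureL a = LP $ \s -> (a, s)

-- Lazy application: let-bound patterns are irrefutable, so neither
-- sub-parser runs until its part of the result is actually inspected.
applyL :: LP (a -> b) -> LP a -> LP b
applyL (LP pf) (LP pa) = LP $ \s ->
  let (f, s')  = pf s
      (a, s'') = pa s'
  in  (f a, s'')

-- Keep the first result; run the second parser only to advance the input.
discardL :: LP a -> LP b -> LP a
discardL (LP pa) (LP pb) = LP $ \s ->
  let (a, s')  = pa s
      (_, s'') = pb s'
  in  (a, s'')

integerL :: LP Integer
integerL = LP $ \s -> case span isDigit s of
  ([], _)    -> error "expected one or more digits"
  (ds, rest) -> (foldl (\n d -> n * 10 + fromIntegral (digitToInt d)) 0 ds, rest)

dotL :: LP ()
dotL = LP $ \s -> case s of
  ('.':cs) -> ((), cs)
  _        -> error "expected '.'"

lazydateL :: LP (Integer, Integer, Integer)
lazydateL = pureL (,,) `applyL` integerL `discardL` dotL
                       `applyL` integerL `discardL` dotL
                       `applyL` integerL

-- fst (runLP lazydateL "12.05.2009") evaluates to (12,5,2009).
-- On "12.05..2009" the first two components are still available;
-- only demanding the third raises the error.
```

In this model the triple constructor is returned before any input has been examined, which is why printing the result shows the first two components and only then hits the exception.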

I know that the error messages shown here are not entirely satisfactory, but they can be improved significantly just by making greater use of the `adjustErr` combinator in lots more places (it is rather like Parsec's <?>). Errors containing positional information about the input can be constructed by introducing a separate lexical tokenizer, which is also not difficult.
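
As a rough idea of what such a tokenizer could look like for the date example, here is a minimal standalone sketch. The Token type and the functions tokenize and expectInt are hypothetical names invented for this illustration; they are not part of polyparse. Each token is paired with its 1-based column so that a later parsing stage can say where things went wrong.

```haskell
import Data.Char (isDigit)

-- Hypothetical token type for the date example.
data Token = TInt Integer | TDot | TOther Char
  deriving (Eq, Show)

-- Split the input into tokens, pairing each one with its 1-based column.
tokenize :: String -> [(Int, Token)]
tokenize = go 1
  where
    go _ [] = []
    go col s@(c:cs)
      | isDigit c = let (ds, rest) = span isDigit s
                    in  (col, TInt (read ds)) : go (col + length ds) rest
      | c == '.'  = (col, TDot) : go (col + 1) cs
      | otherwise = (col, TOther c) : go (col + 1) cs

-- Expect an integer token; on failure, report the column of the
-- offending token instead of a bare "failed".
expectInt :: [(Int, Token)] -> Either String (Integer, [(Int, Token)])
expectInt ((_, TInt n) : rest) = Right (n, rest)
expectInt ((col, t) : _)       =
  Left ("column " ++ show col ++ ": expected an integer, found " ++ show t)
expectInt []                   = Left "end of input: expected an integer"

-- tokenize "12.05..2009" evaluates to
--   [(1,TInt 12),(3,TDot),(4,TInt 5),(6,TDot),(7,TDot),(8,TInt 2009)]
```

A parser specialised to such (position, token) pairs instead of Char could then include the column of the second dot in its error message.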


Regards,
    Malcolm

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
