Re: [Haskell-cafe] Lazy Parsing (ANN: vcd-0.1.4)

2010-04-28 Thread Jason Dusek
2010/04/28 S. Doaitse Swierstra :
> On 27 apr 2010, at 22:12, Jason Dusek wrote:
> > So UU parsers can construct input?
>
> The perform an editing action on the input so it becomes a
> sentence of the language recognised.

  My questions betray a fundamental misunderstanding on my part;
  reading the above, its clear the parser is constructing
  strings that match the language, not values that are of the
  type.

--
Jason Dusek
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Lazy Parsing (ANN: vcd-0.1.4)

2010-04-28 Thread S. Doaitse Swierstra

On 27 apr 2010, at 22:12, Jason Dusek wrote:

>  So UU parsers can construct input?

The perform an editing action on the input so it becomes a sentence of the 
language recognised. 

> The presence of an
>  empty list in the 2nd slot of the tuple is the only
>  indicator of errors?

The parser wants to see a natural number, whch is a non-empty list of digits. 
So it inserts a single digit, which is any character from the range '0'-'9'. 
Since no default value is given here, it takes the first one from the range: 
'0'. Furthermore you get a list of errors, which tell you which correcting 
steps were taken. There is a special combinator with which you can ask for the 
errors produced since the last time you asked, and which you can use to control 
further parsing.

> 
>  For parsing datatypes without a sensible default value,
>  what happens?

If you do nothing you get a less sensible default value; 
you may however provide (lower costs) extra alternatives which will be taken by 
the correcting process. There is a cost model which can be used to control the 
correction process. Tokens have a specific insertion cost and a specific 
deletion cost with which you can play. Usually this is not necessary. The 
typical process is that at first you do not pay attention to the correction 
process, and once you see things you really do not want, you provide an extra 
alternative, or rule out some alternatives by increasuig costs. 

In the UHC token like "if" have a high cost, since we think there is very 
little chance that people will forget to write them. A ')' can have a lower 
insertion and deletion cost, since people are more likely to have too many or 
not enough of them.



 Doaitse




> 
> --
> Jason Dusek

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Lazy Parsing (ANN: vcd-0.1.4)

2010-04-27 Thread Jason Dusek
  So UU parsers can construct input? The presence of an
  empty list in the 2nd slot of the tuple is the only
  indicator of errors?

  For parsing datatypes without a sensible default value,
  what happens?

--
Jason Dusek
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Lazy Parsing (ANN: vcd-0.1.4)

2010-04-27 Thread S. Doaitse Swierstra
How about:

import Text.ParserCombinators.UU.Parsing
import Text.ParserCombinators.UU.Examples


pDate :: Pars (Int,Int,Int)
pDate = (,,) <$> pNatural <* pDot <*> pNatural <* pDot <*> pNatural
where pDot = pSym '.'

and then:

*Main> test pDate "3.4.5"
Loading package syb-0.1.0.2 ... linking ... done.
Loading package base-3.0.3.2 ... linking ... done.
Loading package array-0.3.0.0 ... linking ... done.
Loading package filepath-1.1.0.3 ... linking ... done.
Loading package old-locale-1.0.0.2 ... linking ... done.
Loading package old-time-1.0.0.3 ... linking ... done.
Loading package unix-2.4.0.0 ... linking ... done.
Loading package directory-1.0.1.0 ... linking ... done.
Loading package process-1.0.1.2 ... linking ... done.
Loading package time-1.1.4 ... linking ... done.
Loading package random-1.0.0.2 ... linking ... done.
Loading package haskell98 ... linking ... done.
Loading package uu-parsinglib-2.3.1 ... linking ... done.
((3,4,5),[])
*Main> test pDate "3..7"
((3,0,7),[
Inserted '0' at position 2 expecting '0'..'9'])
*Main> test pDate ""
((0,0,0),[
Inserted '0' at position 0 expecting '0'..'9',
Inserted '.' at position 0 expecting one of ['0'..'9', '.'],
Inserted '0' at position 0 expecting '0'..'9',
Inserted '.' at position 0 expecting one of ['0'..'9', '.'],
Inserted '0' at position 0 expecting '0'..'9'])
*Main> test pDate "3.4.2010"
((3,4,2010),[])
*Main>

Doaitse


On 27 apr 2010, at 13:23, Tom Hawkins wrote:

> I had been using Parsec to parse VCD files, but needed to lazily parse
> streaming data.  After stumbling on this thread below, I switch to
> polyparse.
> 
> What a great library!  I was able to migrate from a strict to a
> semi-lazy parser and many of my parse reductions didn't even need to
> change.  Thanks Malcolm!
> 
> In addition to lazy VCD parsing, this version of vcd [1] also includes
> step', which forces a step regardless if variables have changed or not
> -- helpful for realtime simulation.
> 
> (BTW, parsec is a great library too.)
> 
> -Tom
> 
> [1] http://hackage.haskell.org/package/vcd-0.1.4
> 
> 
> 
> On Sun, May 31, 2009 at 6:41 AM, Malcolm Wallace
>  wrote:
>> 
>> I don't know whether you will be willing to change over to polyparse
>> library, but here are some hints about how you might use it.
>> 
>> Given that you want the input to be a simple character stream, rather than
>> use a more elaborate lexer, the first thing to do is to specialise the
>> parser type for your purposes:
>> 
>>> type TextParser a = Parser Char a
>> 
>> Now, to recognise a "mere digit",
>> 
>>> digit :: TextParser Char
>>> digit = satisfy Char.isDigit
>> 
>> and for a sequence of digits forming an unsigned integer:
>> 
>>> integer :: TextParser Integer
>>> integer = do ds <- many1 digit
>>>  return (foldl1 (\n d-> n*10+d)
>>> (map (fromIntegral.digitToInt) ds))
>>>   `adjustErr` (++("expected one or more digits"))
>> 
>>> I mean I'd like to be able to turn "12.05.2009" into something like (12,
>>> 5, 2009) and got no clue what the code would have to look like. I do know
>>> almost every variation what the code must not look like :).
>> 
>>> date = do a <- integer
>>>   satisfy (=='.')
>>>   b <- integer
>>>   satisfy (=='.')
>>>   c <- integer
>>>   return (a,b,c)
>> 
>> Of course, that is just the standard (strict) monadic interface used by many
>> combinator libraries.  Your original desire was for lazy parsing, and to
>> achieve that, you must move over to the applicative interface.  The key
>> difference is that you cannot name intermediate values, but must construct
>> larger values directly from smaller ones by something like function
>> application.
>> 
>>> lazydate = return (,,) `apply` integer `discard` dot
>>>`apply` integer `discard` dot
>>>`apply` integer
>>>where dot = satisfy (=='.')
>> 
>> The (,,) is the constructor function for triples.  The `discard` combinator
>> ensures that its second argument parses OK, but throws away its result,
>> keeping only the result of its first argument.
>> 
>> Apart from lazy space behaviour, the main observable difference between
>> "date" and "lazydate" is when errors are reported on incorrect input.  For
>> instance:
>> 
>>  > fst $ runParser date "12.05..2009"
>>  *** Exception: In a sequence:
>>  Parse.satisfy: failed
>>  expected one or more digits
>> 
>>  > fst $ runParser lazydate "12.05..2009"
>>  (12,5,*** Exception: In a sequence:
>>  Parse.satisfy: failed
>>  expected one or more digits
>> 
>> Notice how the lazy parser managed to build the first two elements of the
>> triple, whilst the strict parser gave no value at all.
>> 
>> I know that the error messages shown here are not entirely satisfactory, but
>> they can be improved significantly just by making greater use of the
>> `adjustErr` combinator in lots more places (it is rather like Parsec's ).
>>  Errors containing positional inform

[Haskell-cafe] Lazy Parsing (ANN: vcd-0.1.4)

2010-04-27 Thread Tom Hawkins
I had been using Parsec to parse VCD files, but needed to lazily parse
streaming data.  After stumbling on this thread below, I switch to
polyparse.

What a great library!  I was able to migrate from a strict to a
semi-lazy parser and many of my parse reductions didn't even need to
change.  Thanks Malcolm!

In addition to lazy VCD parsing, this version of vcd [1] also includes
step', which forces a step regardless if variables have changed or not
-- helpful for realtime simulation.

(BTW, parsec is a great library too.)

-Tom

[1] http://hackage.haskell.org/package/vcd-0.1.4



On Sun, May 31, 2009 at 6:41 AM, Malcolm Wallace
 wrote:
>
> I don't know whether you will be willing to change over to polyparse
> library, but here are some hints about how you might use it.
>
> Given that you want the input to be a simple character stream, rather than
> use a more elaborate lexer, the first thing to do is to specialise the
> parser type for your purposes:
>
>> type TextParser a = Parser Char a
>
> Now, to recognise a "mere digit",
>
>> digit :: TextParser Char
>> digit = satisfy Char.isDigit
>
> and for a sequence of digits forming an unsigned integer:
>
>> integer :: TextParser Integer
>> integer = do ds <- many1 digit
>>              return (foldl1 (\n d-> n*10+d)
>>                             (map (fromIntegral.digitToInt) ds))
>>           `adjustErr` (++("expected one or more digits"))
>
>> I mean I'd like to be able to turn "12.05.2009" into something like (12,
>> 5, 2009) and got no clue what the code would have to look like. I do know
>> almost every variation what the code must not look like :).
>
>> date = do a <- integer
>>           satisfy (=='.')
>>           b <- integer
>>           satisfy (=='.')
>>           c <- integer
>>           return (a,b,c)
>
> Of course, that is just the standard (strict) monadic interface used by many
> combinator libraries.  Your original desire was for lazy parsing, and to
> achieve that, you must move over to the applicative interface.  The key
> difference is that you cannot name intermediate values, but must construct
> larger values directly from smaller ones by something like function
> application.
>
>> lazydate = return (,,) `apply` integer `discard` dot
>>                        `apply` integer `discard` dot
>>                        `apply` integer
>>    where dot = satisfy (=='.')
>
> The (,,) is the constructor function for triples.  The `discard` combinator
> ensures that its second argument parses OK, but throws away its result,
> keeping only the result of its first argument.
>
> Apart from lazy space behaviour, the main observable difference between
> "date" and "lazydate" is when errors are reported on incorrect input.  For
> instance:
>
>  > fst $ runParser date "12.05..2009"
>  *** Exception: In a sequence:
>  Parse.satisfy: failed
>  expected one or more digits
>
>  > fst $ runParser lazydate "12.05..2009"
>  (12,5,*** Exception: In a sequence:
>  Parse.satisfy: failed
>  expected one or more digits
>
> Notice how the lazy parser managed to build the first two elements of the
> triple, whilst the strict parser gave no value at all.
>
> I know that the error messages shown here are not entirely satisfactory, but
> they can be improved significantly just by making greater use of the
> `adjustErr` combinator in lots more places (it is rather like Parsec's ).
>  Errors containing positional information about the input can be constructed
> by introducing a separate lexical tokenizer, which is also not difficult.
>
> Regards,
>    Malcolm
>
> ___
> Haskell-Cafe mailing list
> Haskell-Cafe@haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe