I had been using Parsec to parse VCD files, but needed to lazily parse streaming data. After stumbling on the thread below, I switched to polyparse.
What a great library! I was able to migrate from a strict to a semi-lazy
parser, and many of my parse reductions didn't even need to change. Thanks,
Malcolm!

In addition to lazy VCD parsing, this version of vcd [1] also includes
step', which forces a step regardless of whether any variables have changed.
That is helpful for real-time simulation.

(BTW, parsec is a great library too.)

-Tom

[1] http://hackage.haskell.org/package/vcd-0.1.4

On Sun, May 31, 2009 at 6:41 AM, Malcolm Wallace
<malcolm.wall...@cs.york.ac.uk> wrote:
>
> I don't know whether you will be willing to change over to the polyparse
> library, but here are some hints about how you might use it.
>
> Given that you want the input to be a simple character stream, rather than
> use a more elaborate lexer, the first thing to do is to specialise the
> parser type for your purposes:
>
>> type TextParser a = Parser Char a
>
> Now, to recognise a "mere digit",
>
>> digit :: TextParser Char
>> digit = satisfy Char.isDigit
>
> and for a sequence of digits forming an unsigned integer:
>
>> integer :: TextParser Integer
>> integer = do ds <- many1 digit
>>              return (foldl1 (\n d-> n*10+d)
>>                             (map (fromIntegral.digitToInt) ds))
>>           `adjustErr` (++("expected one or more digits"))
>
>> I mean I'd like to be able to turn "12.05.2009" into something like (12,
>> 5, 2009) and have no clue what the code would have to look like. I do know
>> almost every variation of what the code must not look like :).
>
>> date = do a <- integer
>>           satisfy (=='.')
>>           b <- integer
>>           satisfy (=='.')
>>           c <- integer
>>           return (a,b,c)
>
> Of course, that is just the standard (strict) monadic interface used by many
> combinator libraries. Your original desire was for lazy parsing, and to
> achieve that, you must move over to the applicative interface. The key
> difference is that you cannot name intermediate values, but must construct
> larger values directly from smaller ones by something like function
> application.
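To see why the applicative form can hand back partial results, here is a toy sketch of the `lazydate` idea in this message. It is not polyparse. It assumes a hypothetical parser type `P` that signals failure with `error`, so a failed field only surfaces as an exception when that field is demanded. Its `apply` and `discard` merely mimic the polyparse combinators of the same names. Because each combinator binds its sub-results with lazy `let` patterns, the outer tuple constructor and its first fields are available before the later fields have been parsed.

```haskell
import Data.Char (digitToInt, isDigit)

-- Hypothetical toy parser (not polyparse): consumes a String and returns a
-- result plus the remaining input. Failure is signalled with 'error', so
-- this sketch has no backtracking or alternation.
newtype P a = P (String -> (a, String))

runP :: P a -> String -> (a, String)
runP (P p) = p

-- Succeed without consuming input (plays the role of 'return' in lazydate).
pureP :: a -> P a
pureP x = P (\s -> (x, s))

-- Apply a parsed function to a parsed argument. The lazy 'let' patterns mean
-- 'f a' can be returned before the argument parser has been run.
apply :: P (a -> b) -> P a -> P b
apply (P pf) (P pa) = P $ \s ->
  let (f, s1) = pf s
      (a, s2) = pa s1
  in (f a, s2)

-- Run both parsers but keep only the first result. The second parser is
-- only run once the remaining input is demanded.
discard :: P a -> P b -> P a
discard (P pa) (P pb) = P $ \s ->
  let (a, s1) = pa s
      (_, s2) = pb s1
  in (a, s2)

satisfy :: (Char -> Bool) -> P Char
satisfy ok = P $ \s -> case s of
  (c:cs) | ok c -> (c, cs)
  _             -> error "satisfy: failed"

-- One or more digits, read as an unsigned integer.
integer :: P Integer
integer = P $ \s -> case span isDigit s of
  ([], _)    -> error "expected one or more digits"
  (ds, rest) -> ( foldl1 (\n d -> n * 10 + d)
                         (map (fromIntegral . digitToInt) ds)
                , rest )

lazydate :: P (Integer, Integer, Integer)
lazydate = pureP (,,) `apply` integer `discard` dot
                      `apply` integer `discard` dot
                      `apply` integer
  where dot = satisfy (== '.')
```

In GHCi, `let (d, m, _) = fst (runP lazydate "12.05..2009") in (d, m)` should give `(12,5)`. Forcing the third field raises the "expected one or more digits" error instead. The price of this toy's laziness is that it cannot try alternatives: it never inspects whether a sub-parse succeeded.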
>
>> lazydate = return (,,) `apply` integer `discard` dot
>>                        `apply` integer `discard` dot
>>                        `apply` integer
>>     where dot = satisfy (=='.')
>
> The (,,) is the constructor function for triples. The `discard` combinator
> ensures that its second argument parses OK, but throws away its result,
> keeping only the result of its first argument.
>
> Apart from lazy space behaviour, the main observable difference between
> "date" and "lazydate" is when errors are reported on incorrect input. For
> instance:
>
> > fst $ runParser date "12.05..2009"
> *** Exception: In a sequence:
> Parse.satisfy: failed
> expected one or more digits
>
> > fst $ runParser lazydate "12.05..2009"
> (12,5,*** Exception: In a sequence:
> Parse.satisfy: failed
> expected one or more digits
>
> Notice how the lazy parser managed to build the first two elements of the
> triple, whilst the strict parser gave no value at all.
>
> I know that the error messages shown here are not entirely satisfactory, but
> they can be improved significantly just by making greater use of the
> `adjustErr` combinator in lots more places (it is rather like Parsec's <?>).
> Errors containing positional information about the input can be constructed
> by introducing a separate lexical tokenizer, which is also not difficult.
>
> Regards,
> Malcolm
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe@haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe