Re: [Haskell-cafe] Lazy Parsing (ANN: vcd-0.1.4)
On 27 apr 2010, at 22:12, Jason Dusek wrote: So UU parsers can construct input? The perform an editing action on the input so it becomes a sentence of the language recognised. The presence of an empty list in the 2nd slot of the tuple is the only indicator of errors? The parser wants to see a natural number, whch is a non-empty list of digits. So it inserts a single digit, which is any character from the range '0'-'9'. Since no default value is given here, it takes the first one from the range: '0'. Furthermore you get a list of errors, which tell you which correcting steps were taken. There is a special combinator with which you can ask for the errors produced since the last time you asked, and which you can use to control further parsing. For parsing datatypes without a sensible default value, what happens? If you do nothing you get a less sensible default value; you may however provide (lower costs) extra alternatives which will be taken by the correcting process. There is a cost model which can be used to control the correction process. Tokens have a specific insertion cost and a specific deletion cost with which you can play. Usually this is not necessary. The typical process is that at first you do not pay attention to the correction process, and once you see things you really do not want, you provide an extra alternative, or rule out some alternatives by increasuig costs. In the UHC token like if have a high cost, since we think there is very little chance that people will forget to write them. A ')' can have a lower insertion and deletion cost, since people are more likely to have too many or not enough of them. Doaitse -- Jason Dusek ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Lazy Parsing (ANN: vcd-0.1.4)
I had been using Parsec to parse VCD files, but needed to lazily parse streaming data. After stumbling on this thread below, I switch to polyparse. What a great library! I was able to migrate from a strict to a semi-lazy parser and many of my parse reductions didn't even need to change. Thanks Malcolm! In addition to lazy VCD parsing, this version of vcd [1] also includes step', which forces a step regardless if variables have changed or not -- helpful for realtime simulation. (BTW, parsec is a great library too.) -Tom [1] http://hackage.haskell.org/package/vcd-0.1.4 On Sun, May 31, 2009 at 6:41 AM, Malcolm Wallace malcolm.wall...@cs.york.ac.uk wrote: I don't know whether you will be willing to change over to polyparse library, but here are some hints about how you might use it. Given that you want the input to be a simple character stream, rather than use a more elaborate lexer, the first thing to do is to specialise the parser type for your purposes: type TextParser a = Parser Char a Now, to recognise a mere digit, digit :: TextParser Char digit = satisfy Char.isDigit and for a sequence of digits forming an unsigned integer: integer :: TextParser Integer integer = do ds - many1 digit return (foldl1 (\n d- n*10+d) (map (fromIntegral.digitToInt) ds)) `adjustErr` (++(expected one or more digits)) I mean I'd like to be able to turn 12.05.2009 into something like (12, 5, 2009) and got no clue what the code would have to look like. I do know almost every variation what the code must not look like :). date = do a - integer satisfy (=='.') b - integer satisfy (=='.') c - integer return (a,b,c) Of course, that is just the standard (strict) monadic interface used by many combinator libraries. Your original desire was for lazy parsing, and to achieve that, you must move over to the applicative interface. The key difference is that you cannot name intermediate values, but must construct larger values directly from smaller ones by something like function application. lazydate = return (,,) `apply` integer `discard` dot `apply` integer `discard` dot `apply` integer where dot = satisfy (=='.') The (,,) is the constructor function for triples. The `discard` combinator ensures that its second argument parses OK, but throws away its result, keeping only the result of its first argument. Apart from lazy space behaviour, the main observable difference between date and lazydate is when errors are reported on incorrect input. For instance: fst $ runParser date 12.05..2009 *** Exception: In a sequence: Parse.satisfy: failed expected one or more digits fst $ runParser lazydate 12.05..2009 (12,5,*** Exception: In a sequence: Parse.satisfy: failed expected one or more digits Notice how the lazy parser managed to build the first two elements of the triple, whilst the strict parser gave no value at all. I know that the error messages shown here are not entirely satisfactory, but they can be improved significantly just by making greater use of the `adjustErr` combinator in lots more places (it is rather like Parsec's ?). Errors containing positional information about the input can be constructed by introducing a separate lexical tokenizer, which is also not difficult. Regards, Malcolm ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Lazy Parsing (ANN: vcd-0.1.4)
How about: import Text.ParserCombinators.UU.Parsing import Text.ParserCombinators.UU.Examples pDate :: Pars (Int,Int,Int) pDate = (,,) $ pNatural * pDot * pNatural * pDot * pNatural where pDot = pSym '.' and then: *Main test pDate 3.4.5 Loading package syb-0.1.0.2 ... linking ... done. Loading package base-3.0.3.2 ... linking ... done. Loading package array-0.3.0.0 ... linking ... done. Loading package filepath-1.1.0.3 ... linking ... done. Loading package old-locale-1.0.0.2 ... linking ... done. Loading package old-time-1.0.0.3 ... linking ... done. Loading package unix-2.4.0.0 ... linking ... done. Loading package directory-1.0.1.0 ... linking ... done. Loading package process-1.0.1.2 ... linking ... done. Loading package time-1.1.4 ... linking ... done. Loading package random-1.0.0.2 ... linking ... done. Loading package haskell98 ... linking ... done. Loading package uu-parsinglib-2.3.1 ... linking ... done. ((3,4,5),[]) *Main test pDate 3..7 ((3,0,7),[ Inserted '0' at position 2 expecting '0'..'9']) *Main test pDate ((0,0,0),[ Inserted '0' at position 0 expecting '0'..'9', Inserted '.' at position 0 expecting one of ['0'..'9', '.'], Inserted '0' at position 0 expecting '0'..'9', Inserted '.' at position 0 expecting one of ['0'..'9', '.'], Inserted '0' at position 0 expecting '0'..'9']) *Main test pDate 3.4.2010 ((3,4,2010),[]) *Main Doaitse On 27 apr 2010, at 13:23, Tom Hawkins wrote: I had been using Parsec to parse VCD files, but needed to lazily parse streaming data. After stumbling on this thread below, I switch to polyparse. What a great library! I was able to migrate from a strict to a semi-lazy parser and many of my parse reductions didn't even need to change. Thanks Malcolm! In addition to lazy VCD parsing, this version of vcd [1] also includes step', which forces a step regardless if variables have changed or not -- helpful for realtime simulation. (BTW, parsec is a great library too.) -Tom [1] http://hackage.haskell.org/package/vcd-0.1.4 On Sun, May 31, 2009 at 6:41 AM, Malcolm Wallace malcolm.wall...@cs.york.ac.uk wrote: I don't know whether you will be willing to change over to polyparse library, but here are some hints about how you might use it. Given that you want the input to be a simple character stream, rather than use a more elaborate lexer, the first thing to do is to specialise the parser type for your purposes: type TextParser a = Parser Char a Now, to recognise a mere digit, digit :: TextParser Char digit = satisfy Char.isDigit and for a sequence of digits forming an unsigned integer: integer :: TextParser Integer integer = do ds - many1 digit return (foldl1 (\n d- n*10+d) (map (fromIntegral.digitToInt) ds)) `adjustErr` (++(expected one or more digits)) I mean I'd like to be able to turn 12.05.2009 into something like (12, 5, 2009) and got no clue what the code would have to look like. I do know almost every variation what the code must not look like :). date = do a - integer satisfy (=='.') b - integer satisfy (=='.') c - integer return (a,b,c) Of course, that is just the standard (strict) monadic interface used by many combinator libraries. Your original desire was for lazy parsing, and to achieve that, you must move over to the applicative interface. The key difference is that you cannot name intermediate values, but must construct larger values directly from smaller ones by something like function application. lazydate = return (,,) `apply` integer `discard` dot `apply` integer `discard` dot `apply` integer where dot = satisfy (=='.') The (,,) is the constructor function for triples. The `discard` combinator ensures that its second argument parses OK, but throws away its result, keeping only the result of its first argument. Apart from lazy space behaviour, the main observable difference between date and lazydate is when errors are reported on incorrect input. For instance: fst $ runParser date 12.05..2009 *** Exception: In a sequence: Parse.satisfy: failed expected one or more digits fst $ runParser lazydate 12.05..2009 (12,5,*** Exception: In a sequence: Parse.satisfy: failed expected one or more digits Notice how the lazy parser managed to build the first two elements of the triple, whilst the strict parser gave no value at all. I know that the error messages shown here are not entirely satisfactory, but they can be improved significantly just by making greater use of the `adjustErr` combinator in lots more places (it is rather like Parsec's ?). Errors containing positional information about the input can be constructed by introducing a separate lexical tokenizer, which is also not difficult. Regards, Malcolm
Re: [Haskell-cafe] Lazy Parsing (ANN: vcd-0.1.4)
So UU parsers can construct input? The presence of an empty list in the 2nd slot of the tuple is the only indicator of errors? For parsing datatypes without a sensible default value, what happens? -- Jason Dusek ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Lazy Parsing
Hi Günther The code below should work for your simple example, provided it hasn't lost formatting when I pasted it in to the email. I was a bit surprised that there is no pSatisfy in this library, but there are parsers for digits, lower case, upper case letters etc. in the Examples module that would otherwise be achieved with pSatisfy. Best wishes Stephen {-# LANGUAGE FlexibleContexts #-} module Demo1 where import Text.ParserCombinators.UU.Examples import Text.ParserCombinators.UU.Parsing -- here's a simple character '@' parser pAtSym :: Symbol p Char Char = p Char pAtSym = pSym '@' test_simple_char = test pAtSym @ test_simple_char2 = test pAtSym @ -- pDigit is supplied in Text.ParserCombinators.UU.Examples test_any_digit= test pDigit 6 -- pNatural is supplied in Text.ParserCombinators.UU.Examples -- It looks like the most likely candidate to parse a -- sequence of digits... test_natural = test pNatural 1234 -- ... and it is! -- parse a date 12.05.2009 as a triple (Int,Int,Int) pDateTriple :: (Symbol p (Char,Char) Char, Applicative p, ExtApplicative p st, Provides st Char Char) = p (Int,Int,Int) pDateTriple = (,,) $ pNatural * pDot * pNatural * pDot * pNatural pDot :: (Symbol p Char Char, Applicative p) = p [Char] pDot = lift $ pSym '.' test_date = test pDateTriple 12.05.2009 ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Lazy Parsing
It is my pleasure to announce that after 5 days of experimenting with uu-parsinglib I have absolutely no clue, whatsoever, on how to use it. I do not even manage to write a parser for even a mere digit or a simple character. I don't know whether you will be willing to change over to polyparse library, but here are some hints about how you might use it. Given that you want the input to be a simple character stream, rather than use a more elaborate lexer, the first thing to do is to specialise the parser type for your purposes: type TextParser a = Parser Char a Now, to recognise a mere digit, digit :: TextParser Char digit = satisfy Char.isDigit and for a sequence of digits forming an unsigned integer: integer :: TextParser Integer integer = do ds - many1 digit return (foldl1 (\n d- n*10+d) (map (fromIntegral.digitToInt) ds)) `adjustErr` (++(expected one or more digits)) I mean I'd like to be able to turn 12.05.2009 into something like (12, 5, 2009) and got no clue what the code would have to look like. I do know almost every variation what the code must not look like :). date = do a - integer satisfy (=='.') b - integer satisfy (=='.') c - integer return (a,b,c) Of course, that is just the standard (strict) monadic interface used by many combinator libraries. Your original desire was for lazy parsing, and to achieve that, you must move over to the applicative interface. The key difference is that you cannot name intermediate values, but must construct larger values directly from smaller ones by something like function application. lazydate = return (,,) `apply` integer `discard` dot `apply` integer `discard` dot `apply` integer where dot = satisfy (=='.') The (,,) is the constructor function for triples. The `discard` combinator ensures that its second argument parses OK, but throws away its result, keeping only the result of its first argument. Apart from lazy space behaviour, the main observable difference between date and lazydate is when errors are reported on incorrect input. For instance: fst $ runParser date 12.05..2009 *** Exception: In a sequence: Parse.satisfy: failed expected one or more digits fst $ runParser lazydate 12.05..2009 (12,5,*** Exception: In a sequence: Parse.satisfy: failed expected one or more digits Notice how the lazy parser managed to build the first two elements of the triple, whilst the strict parser gave no value at all. I know that the error messages shown here are not entirely satisfactory, but they can be improved significantly just by making greater use of the `adjustErr` combinator in lots more places (it is rather like Parsec's ?). Errors containing positional information about the input can be constructed by introducing a separate lexical tokenizer, which is also not difficult. Regards, Malcolm ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: Lazy Parsing
Dear Doaitse, thank you very much for your help. I am curious to know what made you go wrong with the tutorial, and caused that you could not find the solution below? Well let's first agree that I'm not very bright. I hate to admit it, but it's a simple fact ;-). Second let's agree that the uu-parsinglib is a *very* sophisticated beast, I have not seen anything else like it out there, my sincere congratulations for it. Thirdly the tutorial is also a very sophisticated beast, and forthly, well just see point 1 :-). And I just figured out why I was unable to write even that simple parser. The code you sent me works just fine, I copied and pasted it, no problems. But, as soon as I comment out the main function the type checker complains, because now the ghci's type checker can no longer infer the types of pDate or pDot. And this is exactly what happened. I kept getting error messages from ghci, eventhough I had defined my parsers possible correctly, but, *minus* the type signatures *and* minus any main function that called it. In hindsight I realize that this is a trap I have walked into many times before, I guess I still have not acquired a Haskellers intuition. I promise to do better next time :) Günther ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: Lazy Parsing
Dear Malcom, thanks for helping. I had actually come to Haskell originally because of a parsing problem. I had been using Smalltalk until I started a project which required parsing files. Until then I had not done any RW parsing. Well the route was more a Parsec - Haskell, wtf is Haskell? Anyway eventually I dropped Smalltalk and got addicted to Haskell. And managed familiarize myself with Haskell and Parsec, the latter as it turned out I didn't even need to solve my original problem. Anyway polyparse certainly is an option, but there are a few things that despite my list of failures to use it give uu-parsinglib a special appeal, the breadth-first approach with choice, I find that terrible elegant. Due to some kicks in my behind it seems that I might be able to use Doaitse's combinators now, some more details on that are in another post. Günther Malcolm Wallace schrieb: It is my pleasure to announce that after 5 days of experimenting with uu-parsinglib I have absolutely no clue, whatsoever, on how to use it. I do not even manage to write a parser for even a mere digit or a simple character. I don't know whether you will be willing to change over to polyparse library, but here are some hints about how you might use it. Given that you want the input to be a simple character stream, rather than use a more elaborate lexer, the first thing to do is to specialise the parser type for your purposes: type TextParser a = Parser Char a Now, to recognise a mere digit, digit :: TextParser Char digit = satisfy Char.isDigit and for a sequence of digits forming an unsigned integer: integer :: TextParser Integer integer = do ds - many1 digit return (foldl1 (\n d- n*10+d) (map (fromIntegral.digitToInt) ds)) `adjustErr` (++(expected one or more digits)) I mean I'd like to be able to turn 12.05.2009 into something like (12, 5, 2009) and got no clue what the code would have to look like. I do know almost every variation what the code must not look like :). date = do a - integer satisfy (=='.') b - integer satisfy (=='.') c - integer return (a,b,c) Of course, that is just the standard (strict) monadic interface used by many combinator libraries. Your original desire was for lazy parsing, and to achieve that, you must move over to the applicative interface. The key difference is that you cannot name intermediate values, but must construct larger values directly from smaller ones by something like function application. lazydate = return (,,) `apply` integer `discard` dot `apply` integer `discard` dot `apply` integer where dot = satisfy (=='.') The (,,) is the constructor function for triples. The `discard` combinator ensures that its second argument parses OK, but throws away its result, keeping only the result of its first argument. Apart from lazy space behaviour, the main observable difference between date and lazydate is when errors are reported on incorrect input. For instance: fst $ runParser date 12.05..2009 *** Exception: In a sequence: Parse.satisfy: failed expected one or more digits fst $ runParser lazydate 12.05..2009 (12,5,*** Exception: In a sequence: Parse.satisfy: failed expected one or more digits Notice how the lazy parser managed to build the first two elements of the triple, whilst the strict parser gave no value at all. I know that the error messages shown here are not entirely satisfactory, but they can be improved significantly just by making greater use of the `adjustErr` combinator in lots more places (it is rather like Parsec's ?). Errors containing positional information about the input can be constructed by introducing a separate lexical tokenizer, which is also not difficult. Regards, Malcolm ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Lazy Parsing
Hi Günther I suspect the problem you were having is that there are various 'parsers' (more correctly 'parser types') defined in Text.ParserCombinators.UU.Parsing and the code you had in your running example didn't always have enough information to allow GHC to pick a particular one. The /test/ function in Examples demands the parser to be of type 'P_m state a' [1], so if you were running your parsers with /test/ in your main function this would be give the parsers a concrete, inferable type (if you hadn't given then a type signature). Once you comment out main, the parsers have a more general type than 'P_m state a', which can't be inferred due to class constraints. Maybe the 'haskeller's inituition' in this instance is to define the type signatures and the functions at the same time, admittedly this can be difficult for functions with heavy use of type classes. Best wishes Stephen [1] I'm afraid I don't know the intricacies of the particular types in the new UU parsing library, until this morning I'd only used the previous version in uulib. 2009/5/31 Guenther Schmidt gue.schm...@web.de: Dear Doaitse, thank you very much for your help. I am curious to know what made you go wrong with the tutorial, and caused that you could not find the solution below? Well let's first agree that I'm not very bright. I hate to admit it, but it's a simple fact ;-). Second let's agree that the uu-parsinglib is a *very* sophisticated beast, I have not seen anything else like it out there, my sincere congratulations for it. Thirdly the tutorial is also a very sophisticated beast, and forthly, well just see point 1 :-). And I just figured out why I was unable to write even that simple parser. The code you sent me works just fine, I copied and pasted it, no problems. But, as soon as I comment out the main function the type checker complains, because now the ghci's type checker can no longer infer the types of pDate or pDot. And this is exactly what happened. I kept getting error messages from ghci, eventhough I had defined my parsers possible correctly, but, *minus* the type signatures *and* minus any main function that called it. In hindsight I realize that this is a trap I have walked into many times before, I guess I still have not acquired a Haskellers intuition. I promise to do better next time :) Günther ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: Lazy Parsing
Dear Gunther, I am providing my solution, on which one can of course specialise in making sure that a valid date is parsed, which would be a bit more cumbersome; how should e.g. error correction be done. I prefer to test afterwards in such situations. Best, Doaitse module Guenther where import Text.ParserCombinators.UU.Parsing import Text.ParserCombinators.UU.BasicInstances import Text.ParserCombinators.UU.Examples hiding (main) import Control.Applicative hiding ((*), (*), ($)) {- The first decision we have to make is what kind of input we are providing. The simplest case is just to assume simple characters, hence for our input type we will use the standard provided stream of Characters: Str Char, so we use the type of our parsers to be the type used in the Examples module; since we do not know whether we wil be using the parsers in a monadic mode too we stay on the safe side ans use the type P_m -} type GP a = P_m (Str Char) a -- GP stands for GuenterParser {- Once we know that our input contains characters, but that in our output we what to have integer values, we start out by building a parser for a single integer , for which we use the function pNatural form the examples-} pDate = (,,) $ pNatural * pDot * pNatural * pDot * (pNatural ::GP Int) pDot = pSym '.' {- main = do print (test pDate 3.4.1900) print (test pDate 3 4 1900) print (test pDate ..1900)-} -- end of Module Guenther By playing with insertion and deletion costs (e.g. by building a more picky pNatural) one can control the error recovery. Another option to get better error recovery would be to define a specialised instance of Provides which removes spaces. You might even temporarily pSwitch to the use of this state Period. I do not even manage to write a parser for even a mere digit or a simple character. I have read the tutorial from a to a to z and from z to a and there were a few words I recognized. I mean I'd like to be able to turn 12.05.2009 into something like (12, 5, 2009) and got no clue what the code would have to look like. I do know almost every variation what the code must not look like :). I am guessing here that when one does define a parsing function, since all the parser combinators aren't function but methods, one *must* also provide a type signature so that the compiler knows the actual *instance* method? Günther ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: Lazy Parsing
Dear Doaitse, It is my pleasure to announce that after 5 days of experimenting with uu-parsinglib I have absolutely no clue, whatsoever, on how to use it. Period. I do not even manage to write a parser for even a mere digit or a simple character. I have read the tutorial from a to a to z and from z to a and there were a few words I recognized. I mean I'd like to be able to turn 12.05.2009 into something like (12, 5, 2009) and got no clue what the code would have to look like. I do know almost every variation what the code must not look like :). I am guessing here that when one does define a parsing function, since all the parser combinators aren't function but methods, one *must* also provide a type signature so that the compiler knows the actual *instance* method? Günther ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Lazy Parsing
Lazy parsing has been the default for the last ten years in uulib, and is now available in the simple uu-parsinglib (http://hackage.haskell.org/cgi-bin/hackage-scripts/package/uu-parsinglib ). The whole design of the latter in described in a technical report to which references are given on the web page. It provides also error correction, the ability to use several different kinds of input tokens, and (with some help) ambiguities. If speed is an issue you can insert extra hints which locally change the breadth-first parsing process locally into a somewhat more depth-first form. When compared with Parsec the good news is that usually you do not have to put annotations to get nice results. The older uulib version also performs an abstract interpretation which basically changes the search for which alternative to take from a linear to a logarithmic complexity, but does not provide a monadic structure, in which you use results recognised thus far to construct new parsers. Both the old uulib version and the new version have always had an applicative interface. In the near future elements of the abstract interpretation of the old uulib version will migrate into the new version. It is the advent of GADT's which made this new version feasable. An example of the error correction at work at the following example code: pa, pb, paz :: P_m (Str Char) [Char] pa = lift $ pSym 'a' pb = lift $ pSym 'b' p ++ q = (++) $ p * q pa2 = pa ++ pa pa3 = pa ++ pa2 pCount p = (\ a b - b+1) $ p * pCount p | pReturn 0 pExact 0 p = pReturn [] pExact n p = (:) $ p * pExact (n-1) p paz = pMany (pSym ('a', 'z')) paz' = pSym (\t - 'a' = t t = 'z', a .. z, 'k') main :: IO () main = do print (test pa a) print (test pa b) print (test pa2 bbab) print (test pa ba) print (test pa aa) print (test (do l - pCount pa pExact l pb) aaacabbb) print (test (amb ( (++) $ pa2 * pa3 | (++) $ pa3 * pa2)) aaabaa) print (test paz ab1z7) print (test paz' m) print (test paz' ) is loeki:~ doaitse$ ghci -package uu-parsinglib GHCi, version 6.10.1: http://www.haskell.org/ghc/ :? for help Loading package ghc-prim ... linking ... done. Loading package integer ... linking ... done. Loading package base ... linking ... done. Loading package syb ... linking ... done. Loading package array-0.2.0.0 ... linking ... done. Loading package filepath-1.1.0.1 ... linking ... done. Loading package old-locale-1.0.0.1 ... linking ... done. Loading package old-time-1.0.0.1 ... linking ... done. Loading package unix-2.3.1.0 ... linking ... done. Loading package directory-1.0.0.2 ... linking ... done. Loading package process-1.0.1.1 ... linking ... done. Loading package random-1.0.0.1 ... linking ... done. Loading package haskell98 ... linking ... done. Loading package uu-parsinglib-2.0.0 ... linking ... done. Prelude :m Text.ParserCombinators.UU.Examples Prelude Text.ParserCombinators.UU.Examples main (a,[]) (a,[ Deleted 'b' at position 0 expecting one of ['a'], Inserted 'a' at position 1 expecting one of ['a']]) (aa,[ Deleted 'b' at position 0 expecting one of ['a'], Deleted 'b' at position 1 expecting one of ['a'], Deleted 'b' at position 3 expecting one of ['a'], Inserted 'a' at position 4 expecting one of ['a']]) (a,[ Deleted 'b' at position 0 expecting one of ['a']]) (a,[ The token 'a'was not consumed by the parsing process.]) ([b,b,b,b],[ Deleted 'c' at position 3 expecting one of ['a','b'], Inserted 'b' at position 8 expecting one of ['b']]) ([a],[ Deleted 'b' at position 3 expecting one of ['a','a']]) (abz,[ Deleted '1' at position 2 expecting one of ['a'..'z'], The token '7'was not consumed by the parsing process.]) ('m',[]) ('k',[ Inserted 'k' at position 0 expecting one of [a .. z]]) Prelude Text.ParserCombinators.UU.Examples Doaitse Swierstra On 27 mei 2009, at 01:52, Günther Schmidt wrote: Hi all, is it possible to do lazy parsing with Parsec? I understand that one can do that with polyparse, don't know about uulib, but I happen to be already somewhat familiar with Parsec, so before I do switch to polyparse I rather make sure I actually have to. The files it has to parse is anywhere from 500 MB to 5 GB. Günther ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Lazy Parsing
In the uu-parsinglib we actually have two versions of parsers: lazy ones and strict ones, which have different types. So by giving a type annotation you can select the one you want. Notice that in the left- hand side of a monadic construct it does not make sense to use a lazy parser, since its result will be used as a parameter to the right-hand side operator, so in case of a monad our library system automagically selects the strict version for the left hand side. For the right hand side it depends on the type of the overall expression. Unfortunately in Haskell both the left and right hand side of a bind need the to be elements of the same monad, whereas in the case of a lazy oevrall parser this is not the case. We solve this problem by tupling the two parsers (NOT the parsing results), so still the do-notation can be used. The use of the library is free of any trickery! Doaitse Swierstra On 28 mei 2009, at 11:41, Malcolm Wallace wrote: Henning Thielemann schlepp...@henning-thielemann.de wrote: I don't think that it is in general possible to use the same parser for lazy and strict parsing, just because of the handling of parser failure. Polyparse demonstrates that you can mix-and-match lazy parsers with strict parsers in the different parts of a grammar (by choosing whether to use applicative or monadic style). You can also switch between lazy or strict interpretations of the applicative parts of your grammar (by changing the import that decides which version of the parser primitives is in scope). I also used polyparse for lazy parsing, but I found it unintuitive how to make a parser lazy. It can certainly be tricky, and requires a certain amount of experimentation. I think the difficulties are mainly due to the mix of lazy (applicative) and strict (monadic) styles in different non-terminals. A parser that you intend to be lazy, may turn out to be stricter than you hope, because of the strictness of another parser that it depends upon. Regards, Malcolm ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Lazy Parsing
Dear Doaitse, In the days since my original post I had already come to favor the uu-parsing package. I have printed the report and read it every day to figure out how to use it. I cannot follow everything yet, and also hope that won't be necessary in order to use it. :-) My progress is a bit slow, but I'm not giving up. What I do like most, over the other combinatory packages, is the approach of using breadth-first when it comes to choice, the idea is certainly enlightening. The packages capability to do online- / partial parsing is essential for me. I am a bit surprised about it's raw state. The basic combinators and primitives are there but combinators like pChain, pDigit etc. are not predefined and merely present in the examples package. I had gotten quite comfortable with parsec and need to find the right way to translate my parsec code to your package. Anyway let me thank you for your work, I really appreciate it very much. Günther S. Doaitse Swierstra schrieb: Lazy parsing has been the default for the last ten years in uulib, and is now available in the simple uu-parsinglib (http://hackage.haskell.org/cgi-bin/hackage-scripts/package/uu-parsinglib). The whole design of the latter in described in a technical report to which references are given on the web page. It provides also error correction, the ability to use several different kinds of input tokens, and (with some help) ambiguities. If speed is an issue you can insert extra hints which locally change the breadth-first parsing process locally into a somewhat more depth-first form. When compared with Parsec the good news is that usually you do not have to put annotations to get nice results. The older uulib version also performs an abstract interpretation which basically changes the search for which alternative to take from a linear to a logarithmic complexity, but does not provide a monadic structure, in which you use results recognised thus far to construct new parsers. Both the old uulib version and the new version have always had an applicative interface. In the near future elements of the abstract interpretation of the old uulib version will migrate into the new version. It is the advent of GADT's which made this new version feasable. An example of the error correction at work at the following example code: pa, pb, paz :: P_m (Str Char) [Char] pa = lift $ pSym 'a' pb = lift $ pSym 'b' p ++ q = (++) $ p * q pa2 = pa ++ pa pa3 = pa ++ pa2 pCount p = (\ a b - b+1) $ p * pCount p | pReturn 0 pExact 0 p = pReturn [] pExact n p = (:) $ p * pExact (n-1) p paz = pMany (pSym ('a', 'z')) paz' = pSym (\t - 'a' = t t = 'z', a .. z, 'k') main :: IO () main = do print (test pa a) print (test pa b) print (test pa2 bbab) print (test pa ba) print (test pa aa) print (test (do l - pCount pa pExact l pb) aaacabbb) print (test (amb ( (++) $ pa2 * pa3 | (++) $ pa3 * pa2)) aaabaa) print (test paz ab1z7) print (test paz' m) print (test paz' ) is loeki:~ doaitse$ ghci -package uu-parsinglib GHCi, version 6.10.1: http://www.haskell.org/ghc/ :? for help Loading package ghc-prim ... linking ... done. Loading package integer ... linking ... done. Loading package base ... linking ... done. Loading package syb ... linking ... done. Loading package array-0.2.0.0 ... linking ... done. Loading package filepath-1.1.0.1 ... linking ... done. Loading package old-locale-1.0.0.1 ... linking ... done. Loading package old-time-1.0.0.1 ... linking ... done. Loading package unix-2.3.1.0 ... linking ... done. Loading package directory-1.0.0.2 ... linking ... done. Loading package process-1.0.1.1 ... linking ... done. Loading package random-1.0.0.1 ... linking ... done. Loading package haskell98 ... linking ... done. Loading package uu-parsinglib-2.0.0 ... linking ... done. Prelude :m Text.ParserCombinators.UU.Examples Prelude Text.ParserCombinators.UU.Examples main (a,[]) (a,[ Deleted 'b' at position 0 expecting one of ['a'], Inserted 'a' at position 1 expecting one of ['a']]) (aa,[ Deleted 'b' at position 0 expecting one of ['a'], Deleted 'b' at position 1 expecting one of ['a'], Deleted 'b' at position 3 expecting one of ['a'], Inserted 'a' at position 4 expecting one of ['a']]) (a,[ Deleted 'b' at position 0 expecting one of ['a']]) (a,[ The token 'a'was not consumed by the parsing process.]) ([b,b,b,b],[ Deleted 'c' at position 3 expecting one of ['a','b'], Inserted 'b' at position 8 expecting one of ['b']]) ([a],[ Deleted 'b' at position 3 expecting one of ['a','a']]) (abz,[ Deleted '1' at position 2 expecting one of ['a'..'z'], The token '7'was not consumed by the parsing process.]) ('m',[]) ('k',[ Inserted 'k' at position 0 expecting one of [a .. z]]) Prelude Text.ParserCombinators.UU.Examples Doaitse Swierstra On 27 mei 2009, at 01:52, GüŸnther Schmidt wrote: Hi all, is it possible to do lazy parsing with Parsec? I understand that one can do that with polyparse, don't know about uulib, but I happen to be already
Re: [Haskell-cafe] Lazy Parsing
Henning Thielemann schlepp...@henning-thielemann.de wrote: I don't think that it is in general possible to use the same parser for lazy and strict parsing, just because of the handling of parser failure. Polyparse demonstrates that you can mix-and-match lazy parsers with strict parsers in the different parts of a grammar (by choosing whether to use applicative or monadic style). You can also switch between lazy or strict interpretations of the applicative parts of your grammar (by changing the import that decides which version of the parser primitives is in scope). I also used polyparse for lazy parsing, but I found it unintuitive how to make a parser lazy. It can certainly be tricky, and requires a certain amount of experimentation. I think the difficulties are mainly due to the mix of lazy (applicative) and strict (monadic) styles in different non-terminals. A parser that you intend to be lazy, may turn out to be stricter than you hope, because of the strictness of another parser that it depends upon. Regards, Malcolm ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Lazy Parsing
Günther Schmidt schrieb: Hi all, is it possible to do lazy parsing with Parsec? I understand that one can do that with polyparse, don't know about uulib, but I happen to be already somewhat familiar with Parsec, so before I do switch to polyparse I rather make sure I actually have to. The files it has to parse is anywhere from 500 MB to 5 GB. I don't think that it is in general possible to use the same parser for lazy and strict parsing, just because of the handling of parser failure. If parser failure is denoted by a Left constructor in (Either Reason Result) then the whole parsing process must be finished, before the parser knows whether the answer is Left or Right. I also used polyparse for lazy parsing, but I found it unintuitive how to make a parser lazy. I tried to do better in tagchup, where I make explicit in the type, whether a parser can fail or not. In the first case in cannot be lazy, in the second case it can. I also did lazy parsing in 'midi' package and in 'spreadsheet'. I liked to factor out a lazy parser library from them, but I failed to unify all these applications. At least I have factored out handling of lazy failure (aka asnychronous exceptions) in explicit-exception package. Btw. a good place to discuss such issues is our local Haskell meeting that takes place on 2009-06-12: http://iba-cg.de/hal4.html ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Lazy Parsing
On Wed, 27 May 2009, Gü?nther Schmidt wrote: is it possible to do lazy parsing with Parsec? I understand that one can do that with polyparse, don't know about uulib, but I happen to be already somewhat familiar with Parsec, so before I do switch to polyparse I rather make sure I actually have to. Also see http://www.haskell.org/haskellwiki/Maintaining_laziness___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Lazy Parsing
Hi all, is it possible to do lazy parsing with Parsec? I understand that one can do that with polyparse, don't know about uulib, but I happen to be already somewhat familiar with Parsec, so before I do switch to polyparse I rather make sure I actually have to. The files it has to parse is anywhere from 500 MB to 5 GB. Günther ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: Lazy Parsing
Brandon Michael Moore wrote: I'm wondering if there are any libraries out there for creating parsers that lazily build up their result. I know I could thread the remaining input through a parser by hand, but it seems like someone should have already done it. This turns out to be rather difficult to do in the general case (but see below -- XML is a special case). If you have type Parser sym result = [sym] - Maybe (result, [sym]) a Parser can't decide whether to return 'Just (result,rest)' or 'Nothing' until it has successfully parsed the complete result. So pattern matching on the parser's return value will force the entire production. Variations on the theme -- Either instead of Maybe, list-of-successes, continuation-passing combinators, etc -- all face a similar problem. However, if your top-level grammar is of the form: things :: empty | thing things {- == thing* -} then instead of: case runParser (pMany pThing) input of Just (result,[]) - ... you can use something like unfoldr (runParser pThing) input to build the result list incrementally. This will be less eager; instead of parsing and returning an entire list of Things, it parses one Thing at a time. Another thing to watch out for is heap drag. The list-of-successes approach tends to retain the entire input, just in case the parser needs to backtrack. Parsec [1] and UU_Parsing [?] solve this by severely restricting the amount of required lookahead. I'd like to be able to turn a stream of XML into a lazy tree of tags (probably Maybe tags, or Either errors tags), but I don't think HaXml and the like do that sort of thing. That's exactly how HXML [2] works. The parser returns a lazy list of tokens (analogous to SAX events), which are folded up into a tree by a separate function. In addition it uses a CPS parser library so (as with Parsec), there is minimal heap drag. [1] Parsec: URL: http://www.cs.ruu.nl/~daan/parsec.html [1] HXML: URL: http://www.flightlab.com/~joe/hxml (Note: HXML release 0.2 will be ready Real Soon Now, and there have been many incompatible changes since 0.1. The main thing left to be finished is the documentation, if you can live without that let me know and I'll put a snapshot up.) --Joe English [EMAIL PROTECTED] ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
Lazy Parsing
I'm wondering if there are any libraries out there for creating parsers that lazily build up their result. I know I could thread the remaining input through a parser by hand, but it seems like someone should have already done it. I'd like to be able to turn a stream of XML into a lazy tree of tags (probably Maybe tags, or Either errors tags), but I don't think HaXml and the like do that sort of thing. Branodn Moore ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: Lazy Parsing
There's a combinator which Phil Wadler called guarantee which makes a parser lazy -- guarantee p succeeds at once, with a result which will be produced, when demanded, by p. Many parsing libraries include it under one name or another... John ___ Haskell-Cafe mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell-cafe