Re: [Haskell-cafe] Troubles understanding Parsec Error Handling
On Wed, May 30, 2012 at 5:47 PM, Roman Cheplyaka r...@ro-che.info wrote: With this patch your code prints: parse error at (line 1, column 7): unexpected Hallofb, expecting one of [Hello,Hallo,Foo,HallofFame] Hi folks, Roman's patch has been included in the newly-released parsec 3.1.3: http://hackage.haskell.org/package/parsec-3.1.3 Enjoy, Antoine ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Troubles understanding Parsec Error Handling
Hello Thanks for the quick help with this. I thought about the idea that lookAhead might be the cause of the positioning bug but then discarded that idea because I thought lookAhead should never lead to an error past wherever the input position is now considering it doesn't consume any input. I am aware of the issue with the error message position and the output, I was still working on improving that when I was puzzled by the fact that the error message I specified wasn't even returned to me. As for try, I believe I need it to make sure the input I consume one character at the time, before I know if I will reach another valid match (or any at all) does not stay consumed when my parser fails. I am still very much in the experimental phase as far as writing Parsec combinators beyond very simple stuff is concerned so I am open for suggestions on how to improve it in a way that doesn't need try. Thanks again for all the help and especially for the patch. After applying it I do get the error message I specified. I noticed there are still some other problems in the code. In particular it doesn't work as intended in cases like this one: parseTest (do; r1 - anyOf [Hello, Hallo, Foo, HallofFame]; r2 - string fbla; return (r1, r2)) Hallofbla where it should (according to my goal) return no parse error but instead accept Hallo and allow the string parser to consume the rejected suffix but I will try to fix that. Matthias Hoermann ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Troubles understanding Parsec Error Handling
* Matthias Hörmann mhoerm...@gmail.com [2012-05-31 10:40:31+0200] I noticed there are still some other problems in the code. In particular it doesn't work as intended in cases like this one: parseTest (do; r1 - anyOf [Hello, Hallo, Foo, HallofFame]; r2 - string fbla; return (r1, r2)) Hallofbla where it should (according to my goal) return no parse error but instead accept Hallo and allow the string parser to consume the rejected suffix but I will try to fix that. This looks more like a job for regular expressions. E.g. using the regex-applicative package: let anyOf = foldr1 (|) . map string let re = (,) $ anyOf [Hello, Hallo, Foo, HallofFame] * string fbla Hallofbla =~ re Just (Hallo,fbla) Theoretically regular expressions also do the kind of optimization that you achieve with a trie, but this particular engine doesn't. Nevertheless, it may be a good base for your own engine. -- Roman I. Cheplyaka :: http://ro-che.info/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Troubles understanding Parsec Error Handling
I recently started writing my first application at work in Haskell and it deals with a lot of parsing. Among other things I often have to check for a lot of alternatives for fixed strings (parsing natural language text and people have a lot of ways to abbreviate the same thing in labels). So far I have been doing this basically via choice $ map (try . string) [ foo, bar, ... ] This works fine but has two disadvantages, it isn't very fast, in particular when many of the strings start with the same prefix and it also is a bit error prone since it breaks when you place a prefix of another string earlier in the list. My attempt at a solution was to use the bytestring-trie package for a little utility function that basically parses one character at a time, checks if the string parsed so far is in the trie and then calls itself recursively with the trie starting with that string. My attempt at that so far looks like this: (dependencies bytestring-trie, utf8-string and parsec 3) import qualified Data.ByteString.UTF8 as UTF8 import qualified Data.Trie as Trie import Text.Parsec import Text.Parsec.Text (GenParser) anyOf :: [String] - GenParser u String anyOf l = try $ anyOf' t where t = Trie.fromList $ zip (map UTF8.fromString l) (repeat ()) anyOf' :: Trie.Trie () - String - GenParser u String anyOf' t s = try $ do c - lookAhead $ anyChar let newS = s ++ [ c ] in case Trie.submap (UTF8.fromString newS) t of emptyT | Trie.null emptyT - case Trie.member (UTF8.fromString s) t of True - return s False - unexpected $ show newS++, expecting one of ++show l restT - do _ - anyChar try $ anyOf' restT newS A successful example usage would be: parseTest (do; r1 - anyOf [Hello, Hallo, Foo, HallofFame]; r2 - string bla; return (r1, r2)) Hallobla which results in (Hallo,bla) (the extra string parser is there so errors in parsing too much are not hidden). An error would result .e.g. from parseTest (do; r1 - anyOf [Hello, Hallo, Foo, HallofFame]; r2 - string bla; return (r1, r2)) Hallofbla which prints this: parse error at (line 1, column 8):unknown parse error And my question about this is made up of two parts 1. Why doesn't it print my unexpected message but instead says unknown parse error 2. Why is the location in the text off (I would expect it to fail at column 6 (first character beyond the result it could return) or 7 (first character that makes the string no prefix of any acceptable string) I am afraid my knowledge of Parsec internals is a bit too limited, some Google queries showed no similar problems and no obvious places in the Parsec source code to check for the answer to the first question in particular and I suspect the second is closely related to the first. Thanks for reading through my question and I hope someone knows the answer or at least some clues on where i might find it. Matthias Hoermann P.S.: I am hoping this time this works, last time it was rejected because google sends with @googlemail.com instead of @gmail.com for some reason. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Troubles understanding Parsec Error Handling
Hi Matthias, On Wed, May 30, 2012 at 1:36 PM, Matthias Hörmann mhoerm...@gmail.comwrote: parseTest (do; r1 - anyOf [Hello, Hallo, Foo, HallofFame]; r2 - string bla; return (r1, r2)) Hallofbla which prints this: parse error at (line 1, column 8):unknown parse error And my question about this is made up of two parts 1. Why doesn't it print my unexpected message but instead says unknown parse error 2. Why is the location in the text off (I would expect it to fail at column 6 (first character beyond the result it could return) or 7 (first character that makes the string no prefix of any acceptable string) What version of parsec 3 are you using? In version 3.1.1, I get (using Text.Parsec.String instead of Text.Parsec.Text): parse error at (line 1, column 1): unexpected Hallofb, expecting one of [Hello,Hallo,Foo,HallofFame] which is what I would have expected, bearing in mind that 'try p' pretends that it hasn't consumed input when 'p' fails. I don't think you need to use 'try' in your 'anyOf' function, but you'll have to change it to handle seeing the end of input if the one-character look-ahead fails. Kevin -- Kevin Charter kevin.char...@acm.org ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Troubles understanding Parsec Error Handling
On Wed, May 30, 2012 at 3:11 PM, Kevin Charter kchar...@gmail.com wrote: What version of parsec 3 are you using? In version 3.1.1, I get (using Text.Parsec.String instead of Text.Parsec.Text): Ah, answered my own question. I gather you're using 3.1.2, since it's the first and so far only version with the Text.Parsec.Text module. Kevin -- Kevin Charter kevin.char...@acm.org ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Troubles understanding Parsec Error Handling
On Wed, May 30, 2012 at 4:18 PM, Kevin Charter kchar...@gmail.com wrote: On Wed, May 30, 2012 at 3:11 PM, Kevin Charter kchar...@gmail.com wrote: What version of parsec 3 are you using? In version 3.1.1, I get (using Text.Parsec.String instead of Text.Parsec.Text): Ah, answered my own question. I gather you're using 3.1.2, since it's the first and so far only version with the Text.Parsec.Text module. We changed how 'try' handled errors in some cases in between 3.1.1 and 3.1.2. I'll take a look at this. Antoine Kevin -- Kevin Charter kevin.char...@acm.org ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Troubles understanding Parsec Error Handling
* Matthias Hörmann mhoerm...@gmail.com [2012-05-30 21:36:13+0200] And my question about this is made up of two parts 1. Why doesn't it print my unexpected message but instead says unknown parse error 2. Why is the location in the text off (I would expect it to fail at column 6 (first character beyond the result it could return) or 7 (first character that makes the string no prefix of any acceptable string) Thanks for reporting. This is a regression introduced by me in this patch: Sun Feb 20 18:24:22 EET 2011 Roman Cheplyaka r...@ro-che.info * Choose the longest match when merging error messages The source of the regression is that parsec sometimes generates dummy (aka unknown) error messages when no actual error has occurred. In your case the dummy error has a bigger position because it was generated by anyChar inside lookAhead. So, when merging errors, before simply looking at the positions we should check if one of them is dummy and just ignore it. The patch is attached. With this patch your code prints: parse error at (line 1, column 7): unexpected Hallofb, expecting one of [Hello,Hallo,Foo,HallofFame] This is probably still somewhat confusing to a user of your code (there's no Hallofb starting at column 7), but is correct from Parsec's point of view, because you generated this message while looking at the 7th character. -- Roman I. Cheplyaka :: http://ro-che.info/ 1 patch for repository http://code.haskell.org/parsec3: Thu May 31 01:38:09 EEST 2012 Roman Cheplyaka r...@ro-che.info * When merging error messages, prefer known messages to unknown ones This fixes a regression introduced by: Sun Feb 20 18:24:22 EET 2011 Roman Cheplyaka r...@ro-che.info * Choose the longest match when merging error messages The source of the regression is that parsec sometimes generates dummy (aka unknown) error messages when no actual error has occurred. So, when merging errors, before simply looking at the positions we should check if one of them is unknown and just ignore it. Reported by Matthias Hörmann. New patches: [When merging error messages, prefer known messages to unknown ones Roman Cheplyaka r...@ro-che.info**20120530223809 Ignore-this: 1cfcc0a8d1cbfd183a3897e79c320c22 This fixes a regression introduced by: Sun Feb 20 18:24:22 EET 2011 Roman Cheplyaka r...@ro-che.info * Choose the longest match when merging error messages The source of the regression is that parsec sometimes generates dummy (aka unknown) error messages when no actual error has occurred. So, when merging errors, before simply looking at the positions we should check if one of them is unknown and just ignore it. Reported by Matthias Hörmann. ] { hunk ./Text/Parsec/Error.hs 137 = ParseError pos (msg : filter (msg /=) msgs) mergeError :: ParseError - ParseError - ParseError -mergeError (ParseError pos1 msgs1) (ParseError pos2 msgs2) +mergeError e1@(ParseError pos1 msgs1) e2@(ParseError pos2 msgs2) +-- prefer meaningful errors +| null msgs2 not (null msgs1) = e1 +| null msgs1 not (null msgs2) = e2 +| otherwise = case pos1 `compare` pos2 of -- select the longest match EQ - ParseError pos1 (msgs1 ++ msgs2) hunk ./Text/Parsec/Error.hs 145 -GT - ParseError pos1 msgs1 -LT - ParseError pos2 msgs2 +GT - e1 +LT - e2 instance Show ParseError where show err } Context: [TAG 3.1.2 Antoine Latter aslat...@gmail.com**20111008182138 Ignore-this: 96361fd74cad3d51b4213e0bcd91cdf3 ] [version bump for release Antoine Latter aslat...@gmail.com**20111008181844 Ignore-this: 9c28994644744eaf375d9c5d75d2b201 ] [add Stream Text instances Antoine Latter aslat...@gmail.com**20111008181718 Ignore-this: fcf1bc6a54bae9936669e28047c4f736 ] [Fix reserved name recognition for case-insensitive languages. Antoine Latter aslat...@gmail.com**20111008180454 Ignore-this: aed4027b1f273913f7586208e5a6f82c ] [Documentation fix Roman Cheplyaka r...@ro-che.info**20111228222953 Ignore-this: 2d226ed7cde7a8322be04f5188957eb2 ] [lookAhead: do not consume input on success; update documentation Roman Cheplyaka r...@ro-che.info**20110220162920 Ignore-this: e884771490209b93e9fec044543a18ef ] [try: do not reset the error position Roman Cheplyaka r...@ro-che.info**20110220162449 Ignore-this: 8508bc41fc6dcd9b7c06aac762f12c71 ] [Choose the longest match when merging error messages Roman Cheplyaka r...@ro-che.info**20110220162422 Ignore-this: 54e2733159a1574abb229e09ff6935c1 ] [TAG 3.1.1 Antoine Latter aslat...@gmail.com**20110129160030 Ignore-this: 42ddc9e7316d68945c2c1260c2acd403 ] Patch bundle hash: 09fc71cdc9e86dc672f19bc8fb939103cae782bb ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Troubles understanding Parsec Error Handling
Hi Antoine and Roman, On Wed, May 30, 2012 at 4:14 PM, Antoine Latter aslat...@gmail.com wrote: We changed how 'try' handled errors in some cases in between 3.1.1 and 3.1.2. I'll take a look at this. Antoine Thanks for confirming -- I tried 3.1.2 and got the same result as Matthias. And Roman, thanks for the light-speed patch! I was about to say I had an example that showed the problem might actually have to do with 'lookAhead' rather than 'try', and then I saw your message. Kevin -- Kevin Charter kevin.char...@acm.org ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe