Re: [Haskell-cafe] Troubles understanding Parsec Error Handling

2012-06-11 Thread Antoine Latter
On Wed, May 30, 2012 at 5:47 PM, Roman Cheplyaka r...@ro-che.info wrote:

 With this patch your code prints:

    parse error at (line 1, column 7):
    unexpected Hallofb, expecting one of [Hello,Hallo,Foo,HallofFame]


Hi folks,

Roman's patch has been included in the newly-released parsec 3.1.3:

http://hackage.haskell.org/package/parsec-3.1.3

Enjoy,

Antoine

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Troubles understanding Parsec Error Handling

2012-05-31 Thread Matthias Hörmann
Hello

Thanks for the quick help with this. I thought about the idea that
lookAhead might be the cause of the
positioning bug but then discarded that idea because I thought lookAhead
should never lead to an error
past wherever the input position is now considering it doesn't consume any
input.

I am aware of the issue with the error message position and the output, I
was still working on improving
that when I was puzzled by the fact that the error message I specified
wasn't even returned to me.

As for try, I believe I need it to make sure the input I consume one
character at the time, before I know
if I will reach another valid match (or any at all) does not stay consumed
when my parser fails.

I am still very much in the experimental phase as far as writing Parsec
combinators beyond very simple
stuff is concerned so I am open for suggestions on how to improve it in a
way that doesn't need try.

Thanks again for all the help and especially for the patch. After applying
it I do get the error message I
specified.

I noticed there are still some other problems in the code. In particular it
doesn't work as intended in cases
like this one:

parseTest (do; r1 - anyOf [Hello, Hallo, Foo, HallofFame]; r2 -
string fbla; return (r1, r2)) Hallofbla

where it should (according to my goal) return no parse error but instead
accept Hallo and allow the string parser
to consume the rejected suffix but I will try to fix that.

Matthias Hoermann
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Troubles understanding Parsec Error Handling

2012-05-31 Thread Roman Cheplyaka
* Matthias Hörmann mhoerm...@gmail.com [2012-05-31 10:40:31+0200]
 I noticed there are still some other problems in the code. In particular it
 doesn't work as intended in cases
 like this one:
 
 parseTest (do; r1 - anyOf [Hello, Hallo, Foo, HallofFame]; r2 -
 string fbla; return (r1, r2)) Hallofbla
 
 where it should (according to my goal) return no parse error but instead
 accept Hallo and allow the string parser
 to consume the rejected suffix but I will try to fix that.

This looks more like a job for regular expressions.
E.g. using the regex-applicative package:

 let anyOf = foldr1 (|) . map string
 let re = (,) $ anyOf [Hello, Hallo, Foo, HallofFame] * string 
fbla
 Hallofbla =~ re

Just (Hallo,fbla)

Theoretically regular expressions also do the kind of optimization that
you achieve with a trie, but this particular engine doesn't. Nevertheless,
it may be a good base for your own engine.

-- 
Roman I. Cheplyaka :: http://ro-che.info/

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Troubles understanding Parsec Error Handling

2012-05-30 Thread Matthias Hörmann
I recently started writing my first application at work in Haskell and it
deals with a lot of parsing.
Among other things I often have to check for a lot of alternatives for
fixed strings (parsing natural
language text and people have a lot of ways to abbreviate the same thing in
labels). So far I have been
doing this basically via

choice $ map (try . string) [ foo, bar, ... ]

This works fine but has two disadvantages, it isn't very fast, in
particular when many of the strings
start with the same prefix and it also is a bit error prone since it breaks
when you place a prefix of
another string earlier in the list.

My attempt at a solution was to use the bytestring-trie package for a
little utility function that basically
parses one character at a time, checks if the string parsed so far is in
the trie and then calls itself recursively
with the trie starting with that string. My attempt at that so far looks
like this:

(dependencies bytestring-trie, utf8-string and parsec 3)

import qualified Data.ByteString.UTF8 as UTF8
import qualified Data.Trie as Trie
import Text.Parsec
import Text.Parsec.Text (GenParser)

anyOf :: [String] - GenParser u String
anyOf l =
  try $ anyOf' t 
where t = Trie.fromList $ zip (map UTF8.fromString l) (repeat ())
  anyOf' :: Trie.Trie () - String - GenParser u String
  anyOf' t s = try $ do
  c - lookAhead $ anyChar
  let newS = s ++ [ c ] in
case Trie.submap (UTF8.fromString newS) t of
  emptyT | Trie.null emptyT -
case Trie.member (UTF8.fromString s) t of
  True -
return s
  False -
unexpected $ show newS++, expecting one of ++show
l
  restT - do
_ - anyChar
try $ anyOf' restT newS

A successful example usage would be:

parseTest (do; r1 - anyOf [Hello, Hallo, Foo, HallofFame]; r2 -
string bla; return (r1, r2)) Hallobla

which results in

(Hallo,bla)

(the extra string parser is there so errors in parsing too much are not
hidden). An error would result .e.g. from

parseTest (do; r1 - anyOf [Hello, Hallo, Foo, HallofFame]; r2 -
string bla; return (r1, r2)) Hallofbla

which prints this:

parse error at (line 1, column 8):unknown parse error

And my question about this is made up of two parts

1. Why doesn't it print my unexpected message but instead says unknown
parse error
2. Why is the location in the text off (I would expect it to fail at column
6 (first character beyond the result it could return) or 7 (first character
that makes the string no prefix of any acceptable string)

I am afraid my knowledge of Parsec internals is a bit too limited, some
Google queries showed no similar problems and no obvious places in the
Parsec source code to check for the answer to the first question in
particular and I suspect the second is closely related to the first.

Thanks for reading through my question and I hope someone knows the answer
or at least some clues on where i might find it.

Matthias Hoermann

P.S.: I am hoping this time this works, last time it was rejected because
google sends with @googlemail.com instead of @gmail.com for some reason.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Troubles understanding Parsec Error Handling

2012-05-30 Thread Kevin Charter
Hi Matthias,

On Wed, May 30, 2012 at 1:36 PM, Matthias Hörmann mhoerm...@gmail.comwrote:

 parseTest (do; r1 - anyOf [Hello, Hallo, Foo, HallofFame]; r2 -
 string bla; return (r1, r2)) Hallofbla

 which prints this:

 parse error at (line 1, column 8):unknown parse error

 And my question about this is made up of two parts

 1. Why doesn't it print my unexpected message but instead says unknown
 parse error
 2. Why is the location in the text off (I would expect it to fail at
 column 6 (first character beyond the result it could return) or 7 (first
 character that makes the string no prefix of any acceptable string)


What version of parsec 3 are you using? In version 3.1.1, I get (using
Text.Parsec.String instead of Text.Parsec.Text):

parse error at (line 1, column 1):
unexpected Hallofb, expecting one of [Hello,Hallo,Foo,HallofFame]

which is what I would have expected, bearing in mind that 'try p' pretends
that it hasn't consumed input when 'p' fails.

I don't think you need to use 'try' in your 'anyOf' function, but you'll
have to change it to handle seeing the end of input if the one-character
look-ahead fails.

Kevin
-- 
Kevin Charter
kevin.char...@acm.org
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Troubles understanding Parsec Error Handling

2012-05-30 Thread Kevin Charter
On Wed, May 30, 2012 at 3:11 PM, Kevin Charter kchar...@gmail.com wrote:

 What version of parsec 3 are you using? In version 3.1.1, I get (using
 Text.Parsec.String instead of Text.Parsec.Text):


Ah, answered my own question. I gather you're using 3.1.2, since it's the
first and so far only version with the Text.Parsec.Text module.

Kevin
-- 
Kevin Charter
kevin.char...@acm.org
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Troubles understanding Parsec Error Handling

2012-05-30 Thread Antoine Latter
On Wed, May 30, 2012 at 4:18 PM, Kevin Charter kchar...@gmail.com wrote:
 On Wed, May 30, 2012 at 3:11 PM, Kevin Charter kchar...@gmail.com wrote:

 What version of parsec 3 are you using? In version 3.1.1, I get (using
 Text.Parsec.String instead of Text.Parsec.Text):


 Ah, answered my own question. I gather you're using 3.1.2, since it's the
 first and so far only version with the Text.Parsec.Text module.


We changed how 'try' handled errors in some cases in between 3.1.1 and
3.1.2. I'll take a look at this.

Antoine

 Kevin
 --
 Kevin Charter
 kevin.char...@acm.org

 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Troubles understanding Parsec Error Handling

2012-05-30 Thread Roman Cheplyaka
* Matthias Hörmann mhoerm...@gmail.com [2012-05-30 21:36:13+0200]
 And my question about this is made up of two parts
 
 1. Why doesn't it print my unexpected message but instead says unknown
 parse error
 2. Why is the location in the text off (I would expect it to fail at column
 6 (first character beyond the result it could return) or 7 (first character
 that makes the string no prefix of any acceptable string)

Thanks for reporting. This is a regression introduced by me in this patch:

Sun Feb 20 18:24:22 EET 2011  Roman Cheplyaka r...@ro-che.info
  * Choose the longest match when merging error messages

The source of the regression is that parsec sometimes generates dummy (aka
unknown) error messages when no actual error has occurred. In your
case the dummy error has a bigger position because it was generated
by anyChar inside lookAhead.

So, when merging errors, before simply looking at the positions we
should check if one of them is dummy and just ignore it. The patch is
attached.

With this patch your code prints:

parse error at (line 1, column 7):
unexpected Hallofb, expecting one of [Hello,Hallo,Foo,HallofFame]

This is probably still somewhat confusing to a user of your code
(there's no Hallofb starting at column 7), but is correct from
Parsec's point of view, because you generated this message while looking
at the 7th character.

-- 
Roman I. Cheplyaka :: http://ro-che.info/
1 patch for repository http://code.haskell.org/parsec3:

Thu May 31 01:38:09 EEST 2012  Roman Cheplyaka r...@ro-che.info
  * When merging error messages, prefer known messages to unknown ones
  
  This fixes a regression introduced by:
  
  Sun Feb 20 18:24:22 EET 2011  Roman Cheplyaka r...@ro-che.info
* Choose the longest match when merging error messages
  
  The source of the regression is that parsec sometimes generates dummy (aka
  unknown) error messages when no actual error has occurred.
  
  So, when merging errors, before simply looking at the positions we should 
check
  if one of them is unknown and just ignore it.
  
  Reported by Matthias Hörmann.

New patches:

[When merging error messages, prefer known messages to unknown ones
Roman Cheplyaka r...@ro-che.info**20120530223809
 Ignore-this: 1cfcc0a8d1cbfd183a3897e79c320c22
 
 This fixes a regression introduced by:
 
 Sun Feb 20 18:24:22 EET 2011  Roman Cheplyaka r...@ro-che.info
   * Choose the longest match when merging error messages
 
 The source of the regression is that parsec sometimes generates dummy (aka
 unknown) error messages when no actual error has occurred.
 
 So, when merging errors, before simply looking at the positions we should check
 if one of them is unknown and just ignore it.
 
 Reported by Matthias Hörmann.
] {
hunk ./Text/Parsec/Error.hs 137
 = ParseError pos (msg : filter (msg /=) msgs)
 
 mergeError :: ParseError - ParseError - ParseError
-mergeError (ParseError pos1 msgs1) (ParseError pos2 msgs2)
+mergeError e1@(ParseError pos1 msgs1) e2@(ParseError pos2 msgs2)
+-- prefer meaningful errors
+| null msgs2  not (null msgs1) = e1
+| null msgs1  not (null msgs2) = e2
+| otherwise
 = case pos1 `compare` pos2 of
 -- select the longest match
 EQ - ParseError pos1 (msgs1 ++ msgs2)
hunk ./Text/Parsec/Error.hs 145
-GT - ParseError pos1 msgs1
-LT - ParseError pos2 msgs2
+GT - e1
+LT - e2
 
 instance Show ParseError where
 show err
}

Context:

[TAG 3.1.2
Antoine Latter aslat...@gmail.com**20111008182138
 Ignore-this: 96361fd74cad3d51b4213e0bcd91cdf3
] 
[version bump for release
Antoine Latter aslat...@gmail.com**20111008181844
 Ignore-this: 9c28994644744eaf375d9c5d75d2b201
] 
[add Stream Text instances
Antoine Latter aslat...@gmail.com**20111008181718
 Ignore-this: fcf1bc6a54bae9936669e28047c4f736
] 
[Fix reserved name recognition for case-insensitive languages.
Antoine Latter aslat...@gmail.com**20111008180454
 Ignore-this: aed4027b1f273913f7586208e5a6f82c
] 
[Documentation fix
Roman Cheplyaka r...@ro-che.info**20111228222953
 Ignore-this: 2d226ed7cde7a8322be04f5188957eb2
] 
[lookAhead: do not consume input on success; update documentation
Roman Cheplyaka r...@ro-che.info**20110220162920
 Ignore-this: e884771490209b93e9fec044543a18ef
] 
[try: do not reset the error position
Roman Cheplyaka r...@ro-che.info**20110220162449
 Ignore-this: 8508bc41fc6dcd9b7c06aac762f12c71
] 
[Choose the longest match when merging error messages
Roman Cheplyaka r...@ro-che.info**20110220162422
 Ignore-this: 54e2733159a1574abb229e09ff6935c1
] 
[TAG 3.1.1
Antoine Latter aslat...@gmail.com**20110129160030
 Ignore-this: 42ddc9e7316d68945c2c1260c2acd403
] 
Patch bundle hash:
09fc71cdc9e86dc672f19bc8fb939103cae782bb
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Troubles understanding Parsec Error Handling

2012-05-30 Thread Kevin Charter
Hi Antoine and Roman,

On Wed, May 30, 2012 at 4:14 PM, Antoine Latter aslat...@gmail.com wrote:

 We changed how 'try' handled errors in some cases in between 3.1.1 and
 3.1.2. I'll take a look at this.

 Antoine


Thanks for confirming -- I tried 3.1.2 and got the same result as Matthias.
And Roman, thanks for the light-speed patch! I was about to say I had an
example that showed the problem might actually have to do with 'lookAhead'
rather than 'try', and then I saw your message.

Kevin
-- 
Kevin Charter
kevin.char...@acm.org
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe