Stephane Bortzmeyer wrote:
On Mon, Jul 31, 2006 at 06:51:27PM +0100,
Chris Kuklewicz <[EMAIL PROTECTED]> wrote
a message of 102 lines which said:
minilang = do
  char 'a'
  optional (try (do {comma ; char 'b'}))
  optional (do {comma ; char 'c'})
  eof
  return "OK"
I now have a new problem which was hidden beneath. If the language
authorizes "a,bb" and "a,bbc", "a,bbc" is not accepted by my parser
since it already accepted "a,bb" and the "c" which is left triggers a
syntax error.
This time, "try" believes it succeeded but should not. I need more
look-ahead but I'm not sure how?
The problem is mentioned here:
http://www.cs.uu.nl/people/daan/download/parsec/parsec.html#notFollowedBy
Your whole parser is indeed failing, and again it is because of the "failing
after consuming some input" issue. For "a,bbc" your "bb" token parser consumes
the "bb" and then the dangling "c" causes the error.
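A minimal sketch of that failure, not the exact grammar from the question: it assumes the second item is "bb" and the third is "bbc", and that `comma` is simply `char ','` (the thread never shows its definition). The names `naive` and `accepts` are made up for this illustration.

```haskell
import Text.ParserCombinators.Parsec

-- Assumed definition; `comma` is not shown in the thread.
comma :: GenParser Char st Char
comma = char ','

-- Hypothetical variant of the parser from the question:
-- "a", then optionally ",bb", then optionally ",bbc".
naive :: GenParser Char st String
naive = do
  _ <- char 'a'
  optional (try (comma >> string "bb"))
  optional (comma >> string "bbc")
  eof
  return "OK"

-- Hypothetical helper: True when the parser accepts the whole input.
accepts :: GenParser Char () a -> String -> Bool
accepts p s = either (const False) (const True) (parse p "" s)
```

With these definitions, `accepts naive "a,bb"` should be True and `accepts naive "a,bbc"` should be False: the first alternative has already consumed ",bb", so the second one sees only the leftover "c".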
So you cannot commit to consuming the "bb" unless you know the rest of the
string is okay. There are a few ways to accomplish this. The first is to
test whether "bb" is followed by "eof" or "comma" before accepting it. Another
is to try to parse what follows "bb" before accepting "bb".
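A variation on the first strategy can be sketched with the notFollowedBy combinator from the page linked earlier. This is only an illustration under assumptions: `comma` is taken to be `char ','`, and `bbToken`, `minilangNF` and `accepts` are hypothetical names. It also tests a slightly different condition, namely that no letter or digit follows "bb", rather than checking for eof or a comma.

```haskell
import Text.ParserCombinators.Parsec

-- Assumed definition; `comma` is not shown in the thread.
comma :: GenParser Char st Char
comma = char ','

-- Hypothetical token parser: accept "bb" only when no further
-- letter or digit follows, so "bbc" is left for a later alternative.
bbToken :: GenParser Char st String
bbToken = try $ do
  s <- string "bb"
  notFollowedBy alphaNum
  return s

-- Hypothetical variant of the question's parser built on bbToken.
minilangNF :: GenParser Char st String
minilangNF = do
  _ <- char 'a'
  optional (try (comma >> bbToken))
  optional (comma >> string "bbc")
  eof
  return "OK"

-- Hypothetical helper: True when the parser accepts the whole input.
accepts :: GenParser Char () a -> String -> Bool
accepts p s = either (const False) (const True) (parse p "" s)
```

The outer `try` matters here: when `bbToken` rejects "bbc", the comma has already been consumed, and `try` rewinds to before the comma so the next alternative can start from it.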
A small fix would look like:
minilang' = do
  string "a"
  optional (try $ do {comma ; string "bb"; endToken})
  optional (do {comma ; string "bbc"})
  eof
  return "OK"
  where endToken = eof <|> lookAhead (comma >> return ())
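To try the small fix as a standalone module, something like the following sketch should do. The definition of `comma` and the helper `accepts` are assumptions added for the example; the parser body is the one given in this message.

```haskell
import Text.ParserCombinators.Parsec

-- Assumed definition; `comma` is not shown in the thread.
comma :: GenParser Char st Char
comma = char ','

minilang' :: GenParser Char st String
minilang' = do
  _ <- string "a"
  -- Accept ",bb" only if end of input or another comma follows.
  optional (try (comma >> string "bb" >> endToken))
  optional (comma >> string "bbc")
  eof
  return "OK"
  where
    -- lookAhead checks for the comma without consuming it.
    endToken = eof <|> lookAhead (comma >> return ())

-- Hypothetical helper: True when the parser accepts the whole input.
accepts :: GenParser Char () a -> String -> Bool
accepts p s = either (const False) (const True) (parse p "" s)
```

For "a,bbc" the first alternative reads ",bb", finds neither end of input nor a comma, and `try` rewinds it, so the second alternative gets to parse ",bbc" from the start.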
A more general fix looks like this:
stringLang :: [String] -> GenParser Char st [String]
stringLang items = polyLang comma (map string items)
listLang :: [Char] -> GenParser Char st [Char]
listLang items = polyLang comma (map char items)
The first version of polyLang uses the "test eof or comma before accepting"
strategy:
polyLang :: (Show element,Show token) => GenParser element state ignore
         -> [GenParser element state token] -> GenParser element state [token]
polyLang _ [] = eof >> return []
polyLang separator input = (use input) <|> polyLang separator (tail input)
  where use (opX:xs) = do
          -- Commit to x only if end of input or a separator follows it.
          (x,test) <- try (do x <- opX
                              test <- more
                              when test (separator >> return ())
                              return (x,test))
          rest <- if test then (loop xs <|> unexpected ("(problem after "++show x++")"))
                  else return []
          return (x:rest)
        -- False at end of input, True otherwise.
        more = option True (eof >> return False)
        loop [] = (unexpected "cannot parse")
        loop input' = use input' <|> loop (tail input')
The second version polyLang' uses the "test rest of input before accepting"
strategy:
polyLang' :: (Show element,Show token) => GenParser element state ignore
          -> [GenParser element state token] -> GenParser element state [token]
polyLang' _ [] = eof >> return []
polyLang' separator input = (use input) <|> polyLang' separator (tail input)
  -- The whole continuation sits inside try, so x is accepted only if the
  -- rest of the input parses as well.
  where use (opX:xs) = try (do x <- opX
                               test <- more
                               rest <- if test
                                         then separator >> (loop xs <|> unexpected ("(problem after "++show x++")"))
                                         else return []
                               return (x:rest))
        -- False at end of input, True otherwise.
        more = option True (eof >> return False)
        loop [] = (unexpected "cannot parse")
        loop input' = use input' <|> loop (tail input')
It works:
*Main> run (stringLang ["a","bb","bbc"]) "a,bbc"
["a","bbc"]
The error reporting gets a bit strange, and it differs between the two
versions of polyLang:
*Main> run (polyLang comma (map string ["a","bb","bbc","dd"])) "a,bbc,bb"
parse error at (line 1, column 7):
unexpected cannot parse or (problem after "bbc")
expecting "dd"
*Main> run (polyLang' comma (map string ["a","bb","bbc","dd"])) "a,bbc,bb"
parse error at (line 1, column 1):
unexpected "c", cannot parse, (problem after "bbc"), (problem after "a") or "a"
expecting end of input, ",", "dd", "bb" or "bbc"