It occurs to me you could also use attoparsec, which is specifically optimised for bytestring processing.
sjoerd: > Hi, > > Somebody told me about Parsec 3, which uses a Stream type class so it > can parse any data type. This sounded like the right way to do > encoding independent parsing, so I decided to see how it would work to > parse UTF8 JSON. > > Sadly I could not use Text.JSON.Parsec directly, because it uses the > old Parsec CharParser type. So I copied to code, and also replaced > p_number with the "floating" parser from Text.Parsec.Token, because > Text.JSON.Parsec uses readFloat (a dirty hack imho) which works only > on String. > > If Text.JSON.Parsec was written for Parsec 3, the only thing to write > to get UTF8 JSON parsing would be: > > instance (Monad m, U.UTF8Bytes string index) => Stream (U.UTF8 string) > m Char where > uncons = return . U.uncons > > I did not do any performance measuring yet, I was glad I got it > working. Any comments on the code is appreciated! > > greetings, > Sjoerd Visscher > > {-# LANGUAGE FlexibleInstances, MultiParamTypeClasses, > UndecidableInstances #-} > import qualified Data.String.UTF8 as U > import qualified Data.ByteString as B > > import Text.Parsec hiding (many, optional, (<|>)) > import Control.Applicative > > import Text.JSON.Types > import Control.Monad > import Data.Char > import Numeric > > instance (Monad m, U.UTF8Bytes string index) => Stream (U.UTF8 string) > m Char where > uncons = return . U.uncons > > type CharParser st = Parsec (U.UTF8 B.ByteString) st > > parseFile :: FilePath -> IO (Either ParseError JSValue) > parseFile fileName = do > bs <- B.readFile fileName > return $ runParser json () fileName (U.fromRep bs) > > parseString :: String -> Either ParseError JSValue > parseString s = runParser json () "(unknown)" (U.fromString s) > > json :: CharParser () JSValue > json = spaces *> p_value > > tok :: CharParser () a -> CharParser () a > tok p = p <* spaces > > p_value :: CharParser () JSValue > p_value = (JSNull <$ p_null) > <|> (JSBool <$> p_boolean) > <|> (JSArray <$> p_array) > <|> (JSString <$> p_js_string) > <|> (JSObject <$> p_js_object) > <|> (JSRational False <$> p_number) > <?> "JSON value" > > p_null :: CharParser () () > p_null = tok (string "null") >> return () > > p_boolean :: CharParser () Bool > p_boolean = tok > ( (True <$ string "true") > <|> (False <$ string "false") > ) > > p_array :: CharParser () [JSValue] > p_array = between (tok (char '[')) (tok (char ']')) > $ p_value `sepBy` tok (char ',') > > p_string :: CharParser () String > p_string = between (tok (char '"')) (char '"') (many p_char) > where p_char = (char '\\' >> p_esc) > <|> (satisfy (\x -> x /= '"' && x /= '\\')) > > p_esc = ('"' <$ char '"') > <|> ('\\' <$ char '\\') > <|> ('/' <$ char '/') > <|> ('\b' <$ char 'b') > <|> ('\f' <$ char 'f') > <|> ('\n' <$ char 'n') > <|> ('\r' <$ char 'r') > <|> ('\t' <$ char 't') > <|> (char 'u' *> p_uni) > <?> "escape character" > > p_uni = check =<< count 4 (satisfy isHexDigit) > where check x | code <= max_char = pure (toEnum code) > | otherwise = empty > where code = fst $ head $ readHex x > max_char = fromEnum (maxBound :: Char) > > p_object :: CharParser () [(String,JSValue)] > p_object = between (tok (char '{')) (tok (char '}')) > $ p_field `sepBy` tok (char ',') > where p_field = (,) <$> (p_string <* tok (char ':')) <*> p_value > > p_number :: CharParser () Rational > p_number = tok floating where > > floating :: CharParser () Rational > floating = do{ n <- decimal > ; fract <- option 0 fraction > ; expo <- option 1 exponent' > ; return ((fromInteger n + fract)*expo) > } > > fraction = do{ char '.' > ; digits <- many1 digit <?> "fraction" > ; return (foldr op 0 digits) > } > <?> "fraction" > where > op d f = (f + fromIntegral (digitToInt d))/10 > > exponent' = do{ oneOf "eE" > ; f <- sign > ; e <- decimal <?> "exponent" > ; return (power (f e)) > } > <?> "exponent" > where > power e | e < 0 = 1/power(-e) > | otherwise = fromInteger (10^e) > > sign = (char '-' >> return negate) > <|> (char '+' >> return id) > <|> return id > > decimal = number 10 digit > > number base baseDigit > = do{ digits <- many1 baseDigit > ; let n = foldl (\x d -> base*x + toInteger (digitToInt > d)) 0 digits > ; seq n (return n) > } > > > p_js_string :: CharParser () JSString > p_js_string = toJSString <$> p_string > > p_js_object :: CharParser () (JSObject JSValue) > p_js_object = toJSObject <$> p_object > > > _______________________________________________ > Haskell-Cafe mailing list > Haskell-Cafe@haskell.org > http://www.haskell.org/mailman/listinfo/haskell-cafe _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe