[Haskell-cafe] Slow Text.JSON parser

2009-01-13 Thread Levi Greenspan
Dear list members,

I tried Text.JSON from hackage and did an initial test to see how well
it performs. I created a single JSON file of roughly 6 MB containing a
single JSON array with 30906 JSON objects and used the following code
to parse it:


module Main where

import System.IO
import Data.Time.Clock
import System.Environment
import Text.Printf
import Text.JSON

parse s = do
start <- getCurrentTime
let !len = decode s
end <- getCurrentTime
print len
printf "Elapsed time = %s\n" (show $ diffUTCTime end start)
where
decode s = case decodeStrict s of
Ok (JSArray a) -> length a
_ -> -1

main = do
file <- getArgs >>= return . head
withFile file ReadMode (\h -> hGetContents h >>= parse)



The outcome was something like:

30906
Elapsed time = 2.902755s

on my 2GHz core 2 duo.

Another Java-based JSON parser (Jackson:
http://www.cowtowncoder.com/hatchery/jackson/index.html) gives me:

30906
Elapsed time = 480 ms

Now I wonder why Text.JSON is so slow in comparison and what can be
done about it. Any ideas? Or is the test case invalid?

Thanks,
Levi

---
The Java code for the Jackson test is:

import org.codehaus.jackson.JsonParser;
import org.codehaus.jackson.JsonFactory;
import org.codehaus.jackson.map.JsonTypeMapper;
import org.codehaus.jackson.map.JsonNode;

import java.io.File;

class Test {

public static void main(String[] args) throws Exception {
final long start = System.currentTimeMillis();
final JsonTypeMapper mapper = new JsonTypeMapper();
final JsonParser parser = new
JsonFactory().createJsonParser(new File(args[0]));
final JsonNode root = mapper.read(parser);
final long end = System.currentTimeMillis();
System.out.println(root.size());
System.out.println(String.format("Elapsed time = %d ms", end - start));
}
}
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Slow Text.JSON parser

2009-01-13 Thread Ketil Malde
"Levi Greenspan"  writes:

> Now I wonder why Text.JSON is so slow in comparison and what can be
> done about it. Any ideas? Or is the test case invalid?

I haven't used JSON, but at first glance, I'd blame String IO.  Can't
you decode from ByteString?

-k
-- 
If I haven't seen further, it is by standing in the footprints of giants
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Slow Text.JSON parser

2009-01-13 Thread Don Stewart
ketil:
> "Levi Greenspan"  writes:
> 
> > Now I wonder why Text.JSON is so slow in comparison and what can be
> > done about it. Any ideas? Or is the test case invalid?
> 
> I haven't used JSON, but at first glance, I'd blame String IO.  Can't
> you decode from ByteString?
> 

Text.JSON was never optimised for performance. It was designed for small
JSON objects. For things above 1M I'd suggest using Data.Binary (or a
quick JSON encoding over bytestrings). Shouldn't be too hard to prepare.

-- Don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Slow Text.JSON parser

2009-01-13 Thread Sjoerd Visscher
JSON is a UNICODE format, like any modern format is today. ByteStrings  
are not going to work.


If everybody starts yelling "ByteString" every time String performance  
is an issue, I don't see how Haskell is ever going to be a "real world  
programming language".


On Jan 13, 2009, at 4:00 PM, Don Stewart wrote:


ketil:

"Levi Greenspan"  writes:


Now I wonder why Text.JSON is so slow in comparison and what can be
done about it. Any ideas? Or is the test case invalid?


I haven't used JSON, but at first glance, I'd blame String IO.  Can't
you decode from ByteString?



Text.JSON was never optimised for performance. It was designed for  
small

JSON objects. For things above 1M I'd suggest using Data.Binary (or a
quick JSON encoding over bytestrings). Shouldn't be too hard to  
prepare.


-- Don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


--
Sjoerd Visscher
sjo...@w3future.com



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Slow Text.JSON parser

2009-01-13 Thread Don Stewart
utf8-string allows one to decode utf8 from bytestrings. It was built so
that we could decode utf8 strings at work from bytestrings :)


http://hackage.haskell.org/packages/archive/utf8-string/0.3.3/doc/html/Data-ByteString-UTF8.html

Enjoy! Libraries win every day of the week.

-- Don

sjoerd:
> JSON is a UNICODE format, like any modern format is today. ByteStrings  
> are not going to work.
> 
> If everybody starts yelling "ByteString" every time String performance  
> is an issue, I don't see how Haskell is ever going to be a "real world  
> programming language".
> 
> On Jan 13, 2009, at 4:00 PM, Don Stewart wrote:
> 
> >ketil:
> >>"Levi Greenspan"  writes:
> >>
> >>>Now I wonder why Text.JSON is so slow in comparison and what can be
> >>>done about it. Any ideas? Or is the test case invalid?
> >>
> >>I haven't used JSON, but at first glance, I'd blame String IO.  Can't
> >>you decode from ByteString?
> >>
> >
> >Text.JSON was never optimised for performance. It was designed for  
> >small
> >JSON objects. For things above 1M I'd suggest using Data.Binary (or a
> >quick JSON encoding over bytestrings). Shouldn't be too hard to  
> >prepare.
> >
> >-- Don
> >___
> >Haskell-Cafe mailing list
> >Haskell-Cafe@haskell.org
> >http://www.haskell.org/mailman/listinfo/haskell-cafe
> 
> --
> Sjoerd Visscher
> sjo...@w3future.com
> 
> 
> 
> ___
> Haskell-Cafe mailing list
> Haskell-Cafe@haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Slow Text.JSON parser

2009-01-13 Thread Luke Palmer
On Tue, Jan 13, 2009 at 4:39 PM, Sjoerd Visscher wrote:

> JSON is a UNICODE format, like any modern format is today. ByteStrings are
> not going to work.


I don't understand this statement.  Why can one not make a parser from
ByteStrings that can decode UTF-8?

Luke


>
>
> If everybody starts yelling "ByteString" every time String performance is
> an issue, I don't see how Haskell is ever going to be a "real world
> programming language".
>
>
> On Jan 13, 2009, at 4:00 PM, Don Stewart wrote:
>
>  ketil:
>>
>>> "Levi Greenspan"  writes:
>>>
>>>  Now I wonder why Text.JSON is so slow in comparison and what can be
 done about it. Any ideas? Or is the test case invalid?

>>>
>>> I haven't used JSON, but at first glance, I'd blame String IO.  Can't
>>> you decode from ByteString?
>>>
>>>
>> Text.JSON was never optimised for performance. It was designed for small
>> JSON objects. For things above 1M I'd suggest using Data.Binary (or a
>> quick JSON encoding over bytestrings). Shouldn't be too hard to prepare.
>>
>> -- Don
>> ___
>> Haskell-Cafe mailing list
>> Haskell-Cafe@haskell.org
>> http://www.haskell.org/mailman/listinfo/haskell-cafe
>>
>
> --
> Sjoerd Visscher
> sjo...@w3future.com
>
>
>
>
> ___
> Haskell-Cafe mailing list
> Haskell-Cafe@haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Slow Text.JSON parser

2009-01-13 Thread Sjoerd Visscher
It is not impossible, but a lot of work. And if you want to do it  
correctly you would have to support UTF-16 (BE of LE) and UTF-32 (BE  
of LE) as well. You can't expect someone to start writing utf encoders  
and decoders every time he needs a fast parser.


Sjoerd

On Jan 14, 2009, at 12:42 AM, Luke Palmer wrote:

On Tue, Jan 13, 2009 at 4:39 PM, Sjoerd Visscher  
 wrote:
JSON is a UNICODE format, like any modern format is today.  
ByteStrings are not going to work.


I don't understand this statement.  Why can one not make a parser  
from ByteStrings that can decode UTF-8?


Luke



If everybody starts yelling "ByteString" every time String  
performance is an issue, I don't see how Haskell is ever going to be  
a "real world programming language".



On Jan 13, 2009, at 4:00 PM, Don Stewart wrote:

ketil:
"Levi Greenspan"  writes:

Now I wonder why Text.JSON is so slow in comparison and what can be
done about it. Any ideas? Or is the test case invalid?

I haven't used JSON, but at first glance, I'd blame String IO.  Can't
you decode from ByteString?


Text.JSON was never optimised for performance. It was designed for  
small

JSON objects. For things above 1M I'd suggest using Data.Binary (or a
quick JSON encoding over bytestrings). Shouldn't be too hard to  
prepare.


-- Don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

--
Sjoerd Visscher
sjo...@w3future.com




___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe



--
Sjoerd Visscher
sjo...@w3future.com



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Slow Text.JSON parser

2009-01-13 Thread Brandon S. Allbery KF8NH

On 2009 Jan 13, at 18:54, Sjoerd Visscher wrote:
It is not impossible, but a lot of work. And if you want to do it  
correctly you would have to support UTF-16 (BE of LE) and UTF-32 (BE  
of LE) as well. You can't expect someone to start writing utf  
encoders and decoders every time he needs a fast parser.


...whereas making a linked list of Word32 run quickly is trivial?

--
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allb...@kf8nh.com
system administrator [openafs,heimdal,too many hats] allb...@ece.cmu.edu
electrical and computer engineering, carnegie mellon universityKF8NH


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Slow Text.JSON parser

2009-01-13 Thread Brandon S. Allbery KF8NH

On 2009 Jan 13, at 22:43, Brandon S. Allbery KF8NH wrote:

On 2009 Jan 13, at 18:54, Sjoerd Visscher wrote:
It is not impossible, but a lot of work. And if you want to do it  
correctly you would have to support UTF-16 (BE of LE) and UTF-32  
(BE of LE) as well. You can't expect someone to start writing utf  
encoders and decoders every time he needs a fast parser.


...whereas making a linked list of Word32 run quickly is trivial?


Correction:  a linked list of *indirect* Word32s.

--
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allb...@kf8nh.com
system administrator [openafs,heimdal,too many hats] allb...@ece.cmu.edu
electrical and computer engineering, carnegie mellon universityKF8NH


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Slow Text.JSON parser

2009-01-14 Thread Ketil Malde
Sjoerd Visscher  writes:

> JSON is a UNICODE format, like any modern format is today. ByteStrings
> are not going to work.

Well, neither is String as used in the code I responded to.  I'm not
intimately familiar with JSON, but I believe ByteStrings would work on
UTF-8 input, and both ByteString and String would fail on UTF-16 and
UTF-32. 

> If everybody starts yelling "ByteString" every time String performance
> is an issue, I don't see how Haskell is ever going to be a "real world
> programming language".

Insisting on linked lists of 32-bit characters isn't going to help,
either.  I'm also looking forward to a fast, robust, and complete
UniCode support, but the OP asked about performance.

-k
-- 
If I haven't seen further, it is by standing in the footprints of giants
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Slow Text.JSON parser

2009-01-17 Thread Sjoerd Visscher

Hi,

Somebody told me about Parsec 3, which uses a Stream type class so it  
can parse any data type. This sounded like the right way to do  
encoding independent parsing, so I decided to see how it would work to  
parse UTF8 JSON.


Sadly I could not use Text.JSON.Parsec directly, because it uses the  
old Parsec CharParser type. So I copied to code, and also replaced  
p_number with the "floating" parser from Text.Parsec.Token, because  
Text.JSON.Parsec uses readFloat (a dirty hack imho) which works only  
on String.


If Text.JSON.Parsec was written for Parsec 3, the only thing to write  
to get UTF8 JSON parsing would be:


instance (Monad m, U.UTF8Bytes string index) => Stream (U.UTF8 string)  
m Char where

uncons = return . U.uncons

I did not do any performance measuring yet, I was glad I got it  
working. Any comments on the code is appreciated!


greetings,
Sjoerd Visscher

{-# LANGUAGE FlexibleInstances, MultiParamTypeClasses,  
UndecidableInstances #-}

import qualified Data.String.UTF8 as U
import qualified Data.ByteString as B

import Text.Parsec hiding (many, optional, (<|>))
import Control.Applicative

import Text.JSON.Types
import Control.Monad
import Data.Char
import Numeric

instance (Monad m, U.UTF8Bytes string index) => Stream (U.UTF8 string)  
m Char where

uncons = return . U.uncons

type CharParser st = Parsec (U.UTF8 B.ByteString) st

parseFile :: FilePath -> IO (Either ParseError JSValue)
parseFile fileName = do
  bs <- B.readFile fileName
  return $ runParser json () fileName (U.fromRep bs)

parseString  :: String -> Either ParseError JSValue
parseString s = runParser json () "(unknown)" (U.fromString s)

json :: CharParser () JSValue
json  = spaces *> p_value

tok  :: CharParser () a -> CharParser () a
tok p = p <* spaces

p_value  :: CharParser () JSValue
p_value   =  (JSNull  <$  p_null)
 <|> (JSBool  <$> p_boolean)
 <|> (JSArray <$> p_array)
 <|> (JSString<$> p_js_string)
 <|> (JSObject<$> p_js_object)
 <|> (JSRational False <$> p_number)
  "JSON value"

p_null   :: CharParser () ()
p_null= tok (string "null") >> return ()

p_boolean:: CharParser () Bool
p_boolean = tok
  (  (True  <$ string "true")
 <|> (False <$ string "false")
  )

p_array  :: CharParser () [JSValue]
p_array   = between (tok (char '[')) (tok (char ']'))
  $ p_value `sepBy` tok (char ',')

p_string :: CharParser () String
p_string  = between (tok (char '"')) (char '"') (many p_char)
  where p_char=  (char '\\' >> p_esc)
 <|> (satisfy (\x -> x /= '"' && x /= '\\'))

p_esc =  ('"'   <$ char '"')
 <|> ('\\'  <$ char '\\')
 <|> ('/'   <$ char '/')
 <|> ('\b'  <$ char 'b')
 <|> ('\f'  <$ char 'f')
 <|> ('\n'  <$ char 'n')
 <|> ('\r'  <$ char 'r')
 <|> ('\t'  <$ char 't')
 <|> (char 'u' *> p_uni)
  "escape character"

p_uni = check =<< count 4 (satisfy isHexDigit)
  where check x | code <= max_char  = pure (toEnum code)
| otherwise = empty
  where code  = fst $ head $ readHex x
max_char  = fromEnum (maxBound :: Char)

p_object :: CharParser () [(String,JSValue)]
p_object  = between (tok (char '{')) (tok (char '}'))
  $ p_field `sepBy` tok (char ',')
  where p_field   = (,) <$> (p_string <* tok (char ':')) <*> p_value

p_number :: CharParser () Rational
p_number  = tok floating where

floating   :: CharParser () Rational
floating= do{ n <- decimal
; fract <- option 0 fraction
; expo  <- option 1 exponent'
; return ((fromInteger n + fract)*expo)
}

fraction= do{ char '.'
; digits <- many1 digit  "fraction"
; return (foldr op 0 digits)
}
   "fraction"
where
  op d f= (f + fromIntegral (digitToInt d))/10

exponent'   = do{ oneOf "eE"
; f <- sign
; e <- decimal  "exponent"
; return (power (f e))
}
   "exponent"
where
   power e  | e < 0  = 1/power(-e)
| otherwise  = fromInteger (10^e)

sign=   (char '-' >> return negate)
<|> (char '+' >> return id)
<|> return

Re: [Haskell-cafe] Slow Text.JSON parser

2009-01-17 Thread Don Stewart
It occurs to me you could also use attoparsec, which is specifically
optimised for bytestring processing.

sjoerd:
> Hi,
> 
> Somebody told me about Parsec 3, which uses a Stream type class so it  
> can parse any data type. This sounded like the right way to do  
> encoding independent parsing, so I decided to see how it would work to  
> parse UTF8 JSON.
> 
> Sadly I could not use Text.JSON.Parsec directly, because it uses the  
> old Parsec CharParser type. So I copied to code, and also replaced  
> p_number with the "floating" parser from Text.Parsec.Token, because  
> Text.JSON.Parsec uses readFloat (a dirty hack imho) which works only  
> on String.
> 
> If Text.JSON.Parsec was written for Parsec 3, the only thing to write  
> to get UTF8 JSON parsing would be:
> 
> instance (Monad m, U.UTF8Bytes string index) => Stream (U.UTF8 string)  
> m Char where
> uncons = return . U.uncons
> 
> I did not do any performance measuring yet, I was glad I got it  
> working. Any comments on the code is appreciated!
> 
> greetings,
> Sjoerd Visscher
> 
> {-# LANGUAGE FlexibleInstances, MultiParamTypeClasses,  
> UndecidableInstances #-}
> import qualified Data.String.UTF8 as U
> import qualified Data.ByteString as B
> 
> import Text.Parsec hiding (many, optional, (<|>))
> import Control.Applicative
> 
> import Text.JSON.Types
> import Control.Monad
> import Data.Char
> import Numeric
> 
> instance (Monad m, U.UTF8Bytes string index) => Stream (U.UTF8 string)  
> m Char where
> uncons = return . U.uncons
> 
> type CharParser st = Parsec (U.UTF8 B.ByteString) st
> 
> parseFile :: FilePath -> IO (Either ParseError JSValue)
> parseFile fileName = do
>   bs <- B.readFile fileName
>   return $ runParser json () fileName (U.fromRep bs)
> 
> parseString  :: String -> Either ParseError JSValue
> parseString s = runParser json () "(unknown)" (U.fromString s)
> 
> json :: CharParser () JSValue
> json  = spaces *> p_value
> 
> tok  :: CharParser () a -> CharParser () a
> tok p = p <* spaces
> 
> p_value  :: CharParser () JSValue
> p_value   =  (JSNull  <$  p_null)
>  <|> (JSBool  <$> p_boolean)
>  <|> (JSArray <$> p_array)
>  <|> (JSString<$> p_js_string)
>  <|> (JSObject<$> p_js_object)
>  <|> (JSRational False <$> p_number)
>   "JSON value"
> 
> p_null   :: CharParser () ()
> p_null= tok (string "null") >> return ()
> 
> p_boolean:: CharParser () Bool
> p_boolean = tok
>   (  (True  <$ string "true")
>  <|> (False <$ string "false")
>   )
> 
> p_array  :: CharParser () [JSValue]
> p_array   = between (tok (char '[')) (tok (char ']'))
>   $ p_value `sepBy` tok (char ',')
> 
> p_string :: CharParser () String
> p_string  = between (tok (char '"')) (char '"') (many p_char)
>   where p_char=  (char '\\' >> p_esc)
>  <|> (satisfy (\x -> x /= '"' && x /= '\\'))
> 
> p_esc =  ('"'   <$ char '"')
>  <|> ('\\'  <$ char '\\')
>  <|> ('/'   <$ char '/')
>  <|> ('\b'  <$ char 'b')
>  <|> ('\f'  <$ char 'f')
>  <|> ('\n'  <$ char 'n')
>  <|> ('\r'  <$ char 'r')
>  <|> ('\t'  <$ char 't')
>  <|> (char 'u' *> p_uni)
>   "escape character"
> 
> p_uni = check =<< count 4 (satisfy isHexDigit)
>   where check x | code <= max_char  = pure (toEnum code)
> | otherwise = empty
>   where code  = fst $ head $ readHex x
> max_char  = fromEnum (maxBound :: Char)
> 
> p_object :: CharParser () [(String,JSValue)]
> p_object  = between (tok (char '{')) (tok (char '}'))
>   $ p_field `sepBy` tok (char ',')
>   where p_field   = (,) <$> (p_string <* tok (char ':')) <*> p_value
> 
> p_number :: CharParser () Rational
> p_number  = tok floating where
> 
> floating   :: CharParser () Rational
> floating= do{ n <- decimal
> ; fract <- option 0 fraction
> ; expo  <- option 1 exponent'
> ; return ((fromInteger n + fract)*expo)
> }
> 
> fraction= do{ char '.'
> ; digits <- many1 digit  "fraction"
> ; return (foldr op 0 digits)
> }
>"fraction"
> where
>   op d f= (f + fromIntegral (digitToInt d))/10
> 
> exponent'   = do{ oneOf "eE"
> ; f <- sign
> ; e <- decimal  "exponent"
>   

Re: [Haskell-cafe] Slow Text.JSON parser

2009-01-17 Thread Sjoerd Visscher
Attoparsec does not have something like the Stream class, so I do not  
see how I could do UTF8 parsing easily.


On Jan 17, 2009, at 11:50 PM, Don Stewart wrote:


It occurs to me you could also use attoparsec, which is specifically
optimised for bytestring processing.

sjoerd:

Hi,

Somebody told me about Parsec 3, which uses a Stream type class so it
can parse any data type. This sounded like the right way to do
encoding independent parsing, so I decided to see how it would work  
to

parse UTF8 JSON.

Sadly I could not use Text.JSON.Parsec directly, because it uses the
old Parsec CharParser type. So I copied to code, and also replaced
p_number with the "floating" parser from Text.Parsec.Token, because
Text.JSON.Parsec uses readFloat (a dirty hack imho) which works only
on String.

If Text.JSON.Parsec was written for Parsec 3, the only thing to write
to get UTF8 JSON parsing would be:

instance (Monad m, U.UTF8Bytes string index) => Stream (U.UTF8  
string)

m Char where
   uncons = return . U.uncons

I did not do any performance measuring yet, I was glad I got it
working. Any comments on the code is appreciated!

greetings,
Sjoerd Visscher

{-# LANGUAGE FlexibleInstances, MultiParamTypeClasses,
UndecidableInstances #-}
import qualified Data.String.UTF8 as U
import qualified Data.ByteString as B

import Text.Parsec hiding (many, optional, (<|>))
import Control.Applicative

import Text.JSON.Types
import Control.Monad
import Data.Char
import Numeric

instance (Monad m, U.UTF8Bytes string index) => Stream (U.UTF8  
string)

m Char where
   uncons = return . U.uncons

type CharParser st = Parsec (U.UTF8 B.ByteString) st

parseFile :: FilePath -> IO (Either ParseError JSValue)
parseFile fileName = do
 bs <- B.readFile fileName
 return $ runParser json () fileName (U.fromRep bs)

parseString  :: String -> Either ParseError JSValue
parseString s = runParser json () "(unknown)" (U.fromString s)

json :: CharParser () JSValue
json  = spaces *> p_value

tok  :: CharParser () a -> CharParser () a
tok p = p <* spaces

p_value  :: CharParser () JSValue
p_value   =  (JSNull  <$  p_null)
<|> (JSBool  <$> p_boolean)
<|> (JSArray <$> p_array)
<|> (JSString<$> p_js_string)
<|> (JSObject<$> p_js_object)
<|> (JSRational False <$> p_number)
 "JSON value"

p_null   :: CharParser () ()
p_null= tok (string "null") >> return ()

p_boolean:: CharParser () Bool
p_boolean = tok
 (  (True  <$ string "true")
<|> (False <$ string "false")
 )

p_array  :: CharParser () [JSValue]
p_array   = between (tok (char '[')) (tok (char ']'))
 $ p_value `sepBy` tok (char ',')

p_string :: CharParser () String
p_string  = between (tok (char '"')) (char '"') (many p_char)
 where p_char=  (char '\\' >> p_esc)
<|> (satisfy (\x -> x /= '"' && x /= '\\'))

   p_esc =  ('"'   <$ char '"')
<|> ('\\'  <$ char '\\')
<|> ('/'   <$ char '/')
<|> ('\b'  <$ char 'b')
<|> ('\f'  <$ char 'f')
<|> ('\n'  <$ char 'n')
<|> ('\r'  <$ char 'r')
<|> ('\t'  <$ char 't')
<|> (char 'u' *> p_uni)
 "escape character"

   p_uni = check =<< count 4 (satisfy isHexDigit)
 where check x | code <= max_char  = pure (toEnum code)
   | otherwise = empty
 where code  = fst $ head $ readHex x
   max_char  = fromEnum (maxBound :: Char)

p_object :: CharParser () [(String,JSValue)]
p_object  = between (tok (char '{')) (tok (char '}'))
 $ p_field `sepBy` tok (char ',')
 where p_field   = (,) <$> (p_string <* tok (char ':')) <*> p_value

p_number :: CharParser () Rational
p_number  = tok floating where

   floating   :: CharParser () Rational
   floating= do{ n <- decimal
   ; fract <- option 0 fraction
   ; expo  <- option 1 exponent'
   ; return ((fromInteger n + fract)*expo)
   }

   fraction= do{ char '.'
   ; digits <- many1 digit  "fraction"
   ; return (foldr op 0 digits)
   }
  "fraction"
   where
 op d f= (f + fromIntegral (digitToInt d))/10

   exponent'   = do{ oneOf "eE"
   ; f <- sign
   ; e <- decimal  "exponent"
   ; return (power (f e))
   }
  "exponent"
   where
  power e  | e < 0  = 1/pow

Re: [Haskell-cafe] Slow Text.JSON parser

2009-01-17 Thread Sigbjorn Finne


Maybe. Handling the common cases reasonably well is
probably worth doing first (+profiling) before opting for
a heart&lung transplant..

To wit, I've trivially improved the handling of string and
integer lits in version 0.4.3 (just released.) It cuts down
the running times by a factor of 2-3 on larger inputs --

http://hackage.haskell.org/cgi-bin/hackage-scripts/package/json

Not saying that there aren't additional wins to be had :)

hth
--sigbjorn

On 1/17/2009 14:50, Don Stewart wrote:

It occurs to me you could also use attoparsec, which is specifically
optimised for bytestring processing.

  




___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Slow Text.JSON parser

2009-01-17 Thread wren ng thornton

Ketil Malde wrote:

Sjoerd Visscher  writes:


JSON is a UNICODE format, like any modern format is today. ByteStrings
are not going to work.


Well, neither is String as used in the code I responded to.  I'm not
intimately familiar with JSON, but I believe ByteStrings would work on
UTF-8 input, and both ByteString and String would fail on UTF-16 and
UTF-32. 


ByteStrings can handle Unicode just fine, provided the right 
(de)serialization tools:


http://hackage.haskell.org/cgi-bin/hackage-scripts/package/utf8-light
http://hackage.haskell.org/cgi-bin/hackage-scripts/package/utf8-string


--
Live well,
~wren
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Slow Text.JSON parser

2009-01-18 Thread Levi Greenspan
On Sun, Jan 18, 2009 at 6:07 AM, Sigbjorn Finne
 wrote:
>
> Maybe. Handling the common cases reasonably well is
> probably worth doing first (+profiling) before opting for
> a heart&lung transplant..
>
> To wit, I've trivially improved the handling of string and
> integer lits in version 0.4.3 (just released.) It cuts down
> the running times by a factor of 2-3 on larger inputs --

Indeed, I have just tried version 0.4.3 and my previous test which
took about 3 seconds to run is now running in about one second. Very
nice improvement. Thanks for all your work Sigbjorn.

Cheers,
Levi
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe