On Thu, Jan 13, 2011 at 12:15 AM, Evan Laforge <qdun...@gmail.com> wrote: > Well, I tried it... and it's still slower! > > parsec2, String: (a little faster since last time since I have new computer) > total time = 9.10 secs (455 ticks @ 20 ms) > total alloc = 2,295,837,512 bytes (excludes profiling overheads) > > attoparsec-text, Data.Text: > total time = 14.72 secs (736 ticks @ 20 ms) > total alloc = 2,797,672,844 bytes (excludes profiling overheads)
Interesting. > Just in case there's some useful criticism, here's one of the busier parsers: > > p_unsigned_float :: A.Parser Double > p_unsigned_float = do > i <- A.takeWhile Char.isDigit > f <- A.option "" (A.char '.' >> A.takeWhile1 Char.isDigit) > if (Text.null i && Text.null f) then mzero else do > case (dec i, dec f) of > (Just i', Just f') -> return $ fromIntegral i' > + fromIntegral f' / fromIntegral (10 ^ (Text.length f)) > _ -> mzero > where > dec :: Text.Text -> Maybe Int > dec s > | Text.null s = Just 0 > | otherwise = case Text.Read.decimal s of > Right (d, rest) | Text.null rest -> Just d > _ -> Nothing > I've tried creating a benchmark using this code. It's on the recently created attoparsec-text darcs repo [1,2]. There is a 2.7 MiB test file with many numbers to be parsed. The attoparsec-text package was installed using -O (Cabal's default) and the test program was compiled with ghc -hide-package parsec-3.1.0 --make -O2. Using parsers that return the parsed number as a double and then sum everything up, I get the following timings: attoparsec_text_builtin 2,241,038,864 bytes allocated in the heap 46 MB total memory in use (1 MB lost due to fragmentation) MUT time 1.10s ( 1.13s elapsed) GC time 0.15s ( 0.20s elapsed) Total time 1.25s ( 1.32s elapsed) attoparsec_text_laforge 1,281,603,768 bytes allocated in the heap 101 MB total memory in use (2 MB lost due to fragmentation) MUT time 0.58s ( 0.62s elapsed) GC time 0.47s ( 0.54s elapsed) Total time 1.05s ( 1.16s elapsed) parsec_laforge 1,558,621,208 bytes allocated in the heap 47 MB total memory in use (0 MB lost due to fragmentation) MUT time 0.82s ( 0.84s elapsed) GC time 0.46s ( 0.51s elapsed) Total time 1.27s ( 1.35s elapsed) 'attoparsec_text_builtin' uses Data.Attoparsec.Text.double available on the darcs version of the library. It tries to handle more cases, like exponents, and thus it is expected to be slower than your version. 'attoparsec_text_laforge' and 'parsec_laforge' are very similar to the one you gave in your e-mail, but with some modifications (e.g. Text.Read.decimal can't be used with Strings). Using attoparsec-text is faster and allocates less, but for some reason the faster version takes up a lot more memory. As the total memory figures were strange, I created a different version that parses the input but does not create any Doubles. Instead of summing them, the number of Doubles (if they were parsed) is counted. These are the results: attoparsec_text_laforge_discarding 985,843,696 bytes allocated in the heap 25 MB total memory in use (0 MB lost due to fragmentation) MUT time 0.38s ( 0.39s elapsed) GC time 0.07s ( 0.10s elapsed) Total time 0.45s ( 0.49s elapsed) parsec_laforge_discarding double_test.txt +RTS -s 1,471,829,664 bytes allocated in the heap 28 MB total memory in use (0 MB lost due to fragmentation) MUT time 0.66s ( 0.68s elapsed) GC time 0.44s ( 0.46s elapsed) Total time 1.10s ( 1.14s elapsed) Now attoparsec-text is more than twice faster, allocates even less memory and the total memory figures seem right. Bottom line: I think this benchmark doesn't really represent the kind of workload your parser has. Can you reproduce these results on your system? Cheers! =) [1] http://patch-tag.com/r/felipe/attoparsec-text/ [2] http://patch-tag.com/r/felipe/attoparsec-text/snapshot/current/content/pretty/benchmarks/Double.hs -- Felipe. _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe