After the recent discussion on JSON parsing performance, I spent some time
doing a quick port of enough of attoparsec and aeson (Haskell libraries
providing, respectively, byte-oriented parser combinators and JSON parsing)
to do some testing.

The result: really slow. (This is where the puzzlement comes in. I'll get
back to that in just a moment.) This 1MB sample file
[https://github.com/samoconnor/LazyJSON.jl/blob/master/test/ec2-2016-11-15.normal.json]
takes about 20s to parse on my machine using my aeson-alike library. By
contrast, the same file parses in roughly 160ms with the json module that
ships with Racket.

So really, really slow.

Then I downloaded the most recent Racket CS build. My library parsed the
same file in about 3.3s. So an *enormous* improvement. (Caveat: I'm only
measuring runtime here, not compilation/expansion time.) Still, sadly, very
slow compared to the existing library.

So I tested the existing library on the file, under Racket CS. (Actually, I
think the version that came with this Racket CS build includes Matthew's
recent improvements, so it's not quite the same parser.) It parsed
the file in 45-50ms.

That's a long-winded way of saying that:
- Racket CS provides a huge performance improvement to some programs.
- My parser is sloooow.

This is the part where I express puzzlement and ask for your help.

It's not obvious to me why my parser is so slow. The Haskell library from
which it is unabashedly copied is known to be fast, and I don't think it
leans heavily on laziness anywhere. One quirk of its design: each parser is
passed a failure and a success continuation; parsers never return. So, on top
of all the normal function-upon-function layering that you get in any
parser combinator library, you get some extra here, since the continuations
are heap-allocated closures. Apparently, this was a big win for the
original library's performance
[http://www.serpentine.com/blog/2011/02/25/cps-is-great-cps-is-terrible/],
but maybe that doesn't work so well in Racket.
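
To make the continuation-passing shape concrete, here is a minimal sketch in
Python rather than Racket or Haskell. It is not the code of attoparsec, aeson,
or the port described above; the names char, seq, alt, and run are
hypothetical, and the parsers work on plain strings instead of bytes. It only
illustrates how every combinator takes a failure and a success continuation
and allocates fresh closures each time it runs.

```python
# A parser is a function (text, pos, on_fail, on_success).
# It never produces a value of its own: it always finishes by calling
# one of the two continuations. (Python has no tail-call elimination,
# so the calls do eventually return; only the shape is modelled here.)

def char(c):
    """Parser that matches the single character c (hypothetical helper)."""
    def parser(text, pos, on_fail, on_success):
        if pos < len(text) and text[pos] == c:
            return on_success(pos + 1, c)
        return on_fail(pos, f"expected {c!r}")
    return parser

def seq(p, q):
    """Run p, then q, and pass the pair of results to on_success."""
    def parser(text, pos, on_fail, on_success):
        # Two new closures are allocated every time this parser runs.
        def after_p(pos1, a):
            def after_q(pos2, b):
                return on_success(pos2, (a, b))
            return q(text, pos1, on_fail, after_q)
        return p(text, pos, on_fail, after_p)
    return parser

def alt(p, q):
    """Try p; if it fails, retry q from the original position."""
    def parser(text, pos, on_fail, on_success):
        def retry(_pos, _msg):
            return q(text, pos, on_fail, on_success)
        return p(text, pos, retry, on_success)
    return parser

def run(p, text):
    """Supply the outermost continuations, which finally build a result."""
    return p(text, 0,
             lambda pos, msg: ("fail", pos, msg),
             lambda pos, val: ("ok", pos, val))

a_then_b_or_c = seq(char("a"), alt(char("b"), char("c")))
print(run(a_then_b_or_c, "ac"))  # ('ok', 2, ('a', 'c'))
print(run(a_then_b_or_c, "ad"))  # ('fail', 1, "expected 'c'")
```

Even in this tiny example, parsing two characters allocates several closures
(after_p, after_q, retry) on top of the combinator calls themselves, which is
the extra layering described above.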

I tried profiling. It was not very illuminating, and I suspect CPS is
getting in the way there, too. (It's hard to measure the time between
function call and return if there is no return.) Any clever ideas on how to
find out where all the time is being spent?

- Jon
