After the recent discussion on JSON parsing performance, I spent some time doing a quick port of enough of attoparsec and aeson (Haskell libraries providing, respectively, byte-oriented parser combinators and JSON parsing) to do some testing.
The result: really slow. (This is where the puzzlement comes in. I'll get back to that in just a moment.)

This 1MB sample file [https://github.com/samoconnor/LazyJSON.jl/blob/master/test/ec2-2016-11-15.normal.json] takes about 20s to parse on my machine using my aeson-alike library. By contrast, the same file parses in roughly 160ms with the json module that ships with Racket. So really, really slow.

Then I downloaded the most recent racket-cs build. My library parsed the same file in about 3.3s. So an *enormous* improvement. (Caveat: I'm only measuring runtime here, not compilation/expansion time.) Still, sadly, very slow compared to the existing library.

So I tested the existing library on the file, under racket-cs. (Actually, I think the version that came with this racket-cs build includes Matthew's recent improvements, so it's not quite the same parser.) It parsed the file in 45-50ms.

That's a long-winded way of saying that:

- Racket CS provides a huge performance improvement to some programs.
- My parser is sloooow.

This is the part where I express puzzlement and ask for your help. It's not obvious to me why my parser is so slow. The Haskell library from which it is unabashedly copied is known to be fast, and I don't think it leans heavily on laziness anywhere.

One quirk of its design: each parser is passed a failure and a success continuation; they never return. So, on top of all the normal function-upon-function layering that you get in any parser combinator library, you get some extra here, since the continuations are heap-allocated closures. Apparently, this was a big win for the original library's performance [http://www.serpentine.com/blog/2011/02/25/cps-is-great-cps-is-terrible/], but maybe that doesn't work so well in Racket.

I tried profiling. It was not very illuminating, and I suspect CPS is getting in the way there, too. (It's hard to measure the time between function call and return if there is no return.)
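
For anyone who hasn't seen the continuation-passing shape described above, here is a minimal sketch of it in Python. It is only an illustration, not the Racket port and not attoparsec's actual code: the names `byte`, `then`, `either`, and `run` are made up for this example, and Python has no tail-call elimination, so its calls do return even though each parser only ever hands its result to a continuation.

```python
from typing import Any, Callable

# A parser receives the input, a position, a failure continuation and a
# success continuation. It reports its result only by calling one of them.
Failure = Callable[[int, str], Any]
Success = Callable[[int, Any], Any]
Parser = Callable[[bytes, int, Failure, Success], Any]


def byte(expected: int) -> Parser:
    """Match one specific byte (hypothetical primitive)."""
    def parse(buf, pos, fail, succeed):
        if pos < len(buf) and buf[pos] == expected:
            return succeed(pos + 1, expected)
        return fail(pos, f"expected byte {expected!r}")
    return parse


def then(first: Parser, second: Parser) -> Parser:
    """Run `first`, then `second`; succeed with a pair of both results."""
    def parse(buf, pos, fail, succeed):
        # Every sequencing step allocates fresh closures for the success path.
        def after_first(pos1, a):
            return second(buf, pos1, fail,
                          lambda pos2, b: succeed(pos2, (a, b)))
        return first(buf, pos, fail, after_first)
    return parse


def either(left: Parser, right: Parser) -> Parser:
    """Try `left`; if it fails, try `right` from the original position."""
    def parse(buf, pos, fail, succeed):
        # The failure continuation is also a closure: it remembers `pos`.
        def retry(_pos, _msg):
            return right(buf, pos, fail, succeed)
        return left(buf, pos, retry, succeed)
    return parse


def run(parser: Parser, buf: bytes):
    """Top-level driver: supplies the final two continuations."""
    return parser(
        buf, 0,
        lambda pos, msg: ("fail", pos, msg),
        lambda pos, value: ("ok", pos, value),
    )


# "ab" or "ac", built purely from combinators.
ab_or_ac = either(then(byte(ord("a")), byte(ord("b"))),
                  then(byte(ord("a")), byte(ord("c"))))
```

Even this two-byte grammar creates several closures per attempt (`after_first`, the inner `lambda`, `retry`), which is the per-step allocation overhead mentioned above. Calling `run(ab_or_ac, b"ac")` first fails inside the left branch, then backtracks through `retry` and succeeds via the right branch.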
Any clever ideas on how to find out where all the time is being spent?

- Jon