Your algorithm looks fine. The problems are entirely in your testing script. The first issue is that JSON.parse returns a Vector{Any}, which deoptimizes everything. Try:
f = open("test.json") data = convert(Vector{Float64}, JSON.parse(f)) The second issue is that you're including compilation time in your test. Try: jenks(data, 5) @time jenks(data, 5) With these changes, I get: ➜ jenks.jl git:(master) ✗ julia jenks-run.jl elapsed time: 0.013948242 seconds (199084 bytes allocated) On Tuesday, March 25, 2014 8:45:43 PM UTC-4, Shaun Walbridge wrote: > > Hello, > > I've been playing around with Julia for some data classifiers commonly > used in mapping, such as the Jenks "natural breaks" > algorithm<http://en.wikipedia.org/wiki/Jenks_natural_breaks_optimization>. > (For background and a Javascript implementation, I highly recommend Tom > MacWright's literate programming > implementation<http://www.macwright.org/2013/02/18/literate-jenks.html>.) > Recently, a friend created a Cython Python > version<https://github.com/perrygeo/jenks> of > Jenks which is very performant. I've created my Julia version largely based > on his version, and based on posts like this one on isotonic regression > in Julia <http://tullo.ch/articles/python-vs-julia/>, I had hoped the > Julia one would be in a similar ballpark in terms of performance to Cython. > Here's my very basic timing results: > > Matt Perry's Cython version: > In [15]: %timeit jenks(data, 5) > 100 loops, best of 3: 13.9 ms per loop > > My Julia version, running in 0.2.1 (running against master produced slower > results): > julia> @time jenks(data, 5) > elapsed time: 1.046356641 seconds (646397168 bytes allocated) > > In comparison, an implementation in Python which only uses > lists<https://gist.github.com/llimllib/4974446> runs > in about 2.8 seconds on my machine. So I imagine that I must be doing > something wrong, because I imagine the performance different should not be > in favor of the Cython version by a factor of 75, and should handily > dispatch the implementation which uses ill-fitting data structures. My > Julia code is a rather crude port and is by no means idiomatic, I did do > some basic profiling, and most of the runtime seems to come from the basic > math performed in each loop, I've only done some minor performance > optimization. Any and all advice appreciated on what would make this code > more performant for this particular task. > > Code and data at: > https://github.com/scw/jenks.jl > > Thanks for your help, > Shaun >