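Your algorithm looks fine. The problems are entirely in your testing 
script. The first issue is that JSON.parse returns a Vector{Any}, so the 
compiler does not know the element types and every arithmetic operation 
in your loops is dispatched dynamically on boxed values. Convert the data 
to a concrete type first: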

using JSON

f = open("test.json")
data = convert(Vector{Float64}, JSON.parse(f))  # concrete element type
close(f)
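
To see what the conversion changes without needing the JSON package or your data file, here is a minimal sketch. The hand-built Any array is only a stand-in I made up for what JSON.parse hands back; the values are arbitrary.

```julia
# Stand-in for the result of JSON.parse on a JSON array of numbers:
# the container's element type is Any, and integers stay integers.
raw = Any[1.5, 2, 3.25, 4.0]

# One conversion up front yields a concretely typed array, so code
# called with it can be compiled for Float64 specifically.
data = convert(Vector{Float64}, raw)

println(eltype(raw))    # element type before conversion
println(eltype(data))   # element type after conversion
```

The conversion copies the data once, which is cheap next to running the classifier on an untyped array.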

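The second issue is that you're including compilation time in your test. 
Julia compiles jenks the first time it is called with a given set of 
argument types, so call it once to warm up and time the second call: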

jenks(data, 5)
@time jenks(data, 5)

With these changes, I get:

➜  jenks.jl git:(master) ✗ julia jenks-run.jl 
elapsed time: 0.013948242 seconds (199084 bytes allocated)
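
The warm-up effect can be reproduced in isolation. This is only a sketch: sumsq is a hypothetical stand-in for jenks, and the random input is made up. It uses @elapsed, which returns the time in seconds instead of printing it.

```julia
# Hypothetical stand-in for jenks(data, 5); any function shows the effect.
function sumsq(xs::Vector{Float64})
    total = 0.0
    for x in xs
        total += x * x
    end
    return total
end

data = rand(100_000)

first_call  = @elapsed sumsq(data)   # includes compiling sumsq
second_call = @elapsed sumsq(data)   # already compiled: the run itself

println("first call:  ", first_call, " s")
println("second call: ", second_call, " s")
```

The second number is the one that is comparable to a %timeit result.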

On Tuesday, March 25, 2014 8:45:43 PM UTC-4, Shaun Walbridge wrote:
>
> Hello,
>
> I've been playing around with Julia for some data classifiers commonly 
> used in mapping, such as the Jenks "natural breaks" algorithm 
> <http://en.wikipedia.org/wiki/Jenks_natural_breaks_optimization>. 
> (For background and a JavaScript implementation, I highly recommend Tom 
> MacWright's literate programming implementation 
> <http://www.macwright.org/2013/02/18/literate-jenks.html>.) 
> Recently, a friend created a Cython Python version 
> <https://github.com/perrygeo/jenks> of Jenks which is very performant. 
> I've created my Julia version largely based on his version, and based on 
> posts like this one on isotonic regression in Julia 
> <http://tullo.ch/articles/python-vs-julia/>, I had hoped the Julia one 
> would be in a similar ballpark in terms of performance to Cython. 
> Here are my very basic timing results:
>
> Matt Perry's Cython version:
>   In [15]: %timeit jenks(data, 5)
>   100 loops, best of 3: 13.9 ms per loop
>
> My Julia version, running in 0.2.1 (running against master produced slower 
> results):
> julia> @time jenks(data, 5)
> elapsed time: 1.046356641 seconds (646397168 bytes allocated)
>
> In comparison, an implementation in Python which only uses lists 
> <https://gist.github.com/llimllib/4974446> runs in about 2.8 seconds on 
> my machine. So I imagine that I must be doing something wrong, because 
> the performance difference should not be in favor of the Cython version 
> by a factor of 75, and Julia should handily dispatch the implementation 
> which uses ill-fitting data structures. My Julia code is a rather crude 
> port and is by no means idiomatic. I did do some basic profiling, and 
> most of the runtime seems to come from the basic math performed in each 
> loop. I've only done some minor performance optimization. Any and all 
> advice is appreciated on what would make this code more performant for 
> this particular task. 
>
> Code and data at:
>   https://github.com/scw/jenks.jl
>
> Thanks for your help,
> Shaun
>
