I ran into strange performance issues in an algorithm I have been working 
on. 

I have a test case as well as some timing and profiler results at this 
gist: https://gist.github.com/spencerlyon2/d21d6368a2ccbf6f1e7b


I'll summarize the issue here. Consider the following code (note that I 
define `myexp` because of this 
issue: https://github.com/JuliaLang/julia/issues/11048. It turns out that 
on OS X, calling Apple's libm gives a substantial speedup -- i.e. I'm 
doing everything I can to give OS X a chance to win here).

The code:

@osx? (
         begin
             myexp(x::Float64) = ccall((:exp, :libm), Float64, (Float64,), x)
             # myexp(x::Float64) = exp(x)
         end
       : begin
             myexp(x::Float64) = exp(x)
         end
       )
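
(Aside: to see the libm vs. Base `exp` difference in isolation, a 
micro-benchmark along these lines should make it visible on OS X -- this 
is just a sketch, not the script from the gist:)

function sum_myexp(n::Int)
    s = 0.0
    for i=1:n
        s += myexp(i * 1e-6)   # exercise the ccall'd libm exp
    end
    s
end

function sum_baseexp(n::Int)
    s = 0.0
    for i=1:n
        s += exp(i * 1e-6)     # exercise Base's exp for comparison
    end
    s
end

sum_myexp(1); sum_baseexp(1)   # run each once so compilation isn't timed
@time sum_myexp(10^7)
@time sum_baseexp(10^7)

And the function I am actually timing: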
 
function test_func(data::Matrix, points::Matrix)
    # extract input dimensions
    n, d = size(data)
    n_points = size(points, 1)
 
    # transpose data and points to access columns at a time
    data = data'
    points = points'
 
    # Define constants
    hbar = n^(-1.0/(d+4.0))
    hbar2 = hbar^2
    constant = 1.0/(n*hbar^(d) * (2π)^(d/2))
 
    # allocate space
    density = Array(Float64, n_points)
    Di_min = Array(Float64, n_points)
 
    # apply formula (2)
    for i=1:n_points  # loop over all points
        dens_i = 0.0
        min_di2 = Inf
        for j=1:n_points  # loop over all other points
            d_i2_j = 0.0
            for k=1:d  # loop over d
                @inbounds d_i2_j += ((points[k, i] - data[k, j])^2)
            end
            dens_i += myexp(-0.5*d_i2_j/hbar2)
            if i != j && d_i2_j < min_di2
                min_di2 = d_i2_j
            end
        end
        density[i] = constant * dens_i
        Di_min[i] = sqrt(min_di2)
    end
 
    return density, Di_min
end
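
For context, as far as I can tell from the loop body, "formula (2)" is a 
Gaussian kernel density estimate plus a nearest-neighbour distance. In 
LaTeX notation (this is my reading of the code, not a quote from the 
reference):

    \mathrm{density}_i = \frac{1}{n \bar{h}^{d} (2\pi)^{d/2}}
        \sum_{j} \exp\!\left( -\frac{\lVert p_i - x_j \rVert^2}{2 \bar{h}^2} \right),
    \qquad \bar{h} = n^{-1/(d+4)}

    D^{\min}_i = \min_{j \neq i} \lVert p_i - x_j \rVert

where p_i is the i-th row of `points` and x_j is the j-th row of `data`.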



To test the performance of this code on Linux and OS X, I started a 
Docker image with a recent Julia (a master build about 40 days old) from 
my OS X machine and compared the timing against running the code on OS X 
directly (with a 1-day-old Julia). With `data` and `points` both set to 
`randn(9500, 2)`, the Linux version runs `test_func` in about 2.6 
seconds, whereas on OS X it takes about 9.3 seconds.
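
Concretely, the timing on each platform comes from something along these 
lines (the self-contained runnable version is in the gist):

data   = randn(9500, 2)
points = randn(9500, 2)

test_func(data, points)         # warm-up call so compilation isn't timed
@time test_func(data, points)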

I can't explain this large (almost 4x) performance hit from running the 
code on the native OS versus in the virtual machine.

More details (profiler results, timing stats, and a self-contained 
runnable example) are in the 
gist: https://gist.github.com/spencerlyon2/d21d6368a2ccbf6f1e7b


