See "method lookup hosting" and "function-valued argument inlining" in 
https://github.com/JuliaLang/julia/issues/3440. One good workaround is to 
pass a type instead of a function. If you define:

type X; end
evaluate(::X) = ...

then you can pass X() to a function, and calling evaluate(t) will be as 
fast as if you had defined the function at global scope.
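
Applied to your example, the pattern looks something like this (just a 
sketch; GaussKern and the evaluate signature are names I made up, not 
anything from a package):

using Rmath

type GaussKern; end
evaluate(::GaussKern, x::Float64, y::Float64, h::Float64) = Rmath.dnorm(x, y, h)

function eval_kern(x::Array{Float64, 1}, X::Array{Float64, 1}, h::Float64, Kern)
    N   = length(x)
    K   = length(X)
    out = zeros(N)
    for i in 1:N
        for k in 1:K
            # evaluate dispatches on the concrete type of Kern, so the
            # call is resolved at compile time rather than at run time
            out[i] += evaluate(Kern, x[i], X[k], h)
        end
    end
    return out / N
end

y_gridd = eval_kern(x_gridd, X, h, GaussKern())

Since eval_kern gets specialized on the concrete type of Kern, the evaluate 
call can be inlined, avoiding the overhead you see with a Function argument. 
Swapping kernels is then just a matter of defining another singleton type 
with its own evaluate method.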

In this particular case, you might consider just using the kde function 
from StatsBase, which uses an FFT-based algorithm that should be much 
faster.
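
Something along these lines (a sketch; I'm assuming kde chooses the grid 
and bandwidth automatically -- check the StatsBase documentation for the 
exact return value):

using StatsBase

dens = kde(X)   # FFT-based kernel density estimate of the data in X
                # (the result holds the evaluation grid and density values)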

Simon

On Wednesday, March 5, 2014 8:57:09 PM UTC-5, Antonio Linero wrote:
>
> Below is a (highly non-optimal) implementation of kernel density 
> estimation. In principle, I'd like to pass in different kernel functions 
> (e.g. use the same code with a Gaussian kernel or triweight kernel). 
> However, I get a huge performance hit if I try to pass the kernel function 
> in. The following code calculates, for the Gaussian kernel, the KDE on a 
> grid of size 1000 given 1000 training data points. The first implementation 
> gets the kernel function from the global environment while the second takes 
> the kernel function passed in as an argument.
>
> Why does this give different performance, and is there any way to repair 
> it? The version that does not pass the kernel function is (1) at least 
> three times faster and (2) allocates far less memory.
>
> using Rmath
>
>
> function kern(x::Float64, y::Float64, h::Float64)
>     return Rmath.dnorm(x, y, h)
> end
>
>
> function eval_kern(x::Array{Float64, 1}, X::Array{Float64, 1}, h::Float64)
>     N   = length(x)
>     K   = length(X)
>     out = zeros(N)
>     for i in 1:N
>         for k in 1:K
>             out[i] += kern(x[i], X[k], h)
>         end
>     end
>     return out / N
> end
>
>
> function eval_kern(x::Array{Float64, 1}, X::Array{Float64, 1}, h::Float64, 
> Kern::Function)
>     N   = length(x)
>     K   = length(X)
>     out = zeros(N)
>     for i in 1:N
>         for k in 1:K
>             out[i] += Kern(x[i], X[k], h)
>         end
>     end
>     return out / N
> end
>
> N = 1000
> X = randn(N)
> h = std(X) * N ^ (-.2) * 1.06
> x_gridd = linspace(-4.0, 4.0, 1000);
>
> @time y_gridd = eval_kern(x_gridd, X, h);
> @time y_gridd = eval_kern(x_gridd, X, h, kern);
>
> This gives the following results for me:
>
> julia> @time y_gridd = eval_kern(x_gridd, X, h);
> elapsed time: 0.062120766 seconds (422240 bytes allocated)
>
> julia> @time y_gridd = eval_kern(x_gridd, X, h, kern);
> elapsed time: 0.249457403 seconds (104151016 bytes allocated)
>
>
>
> I've experimented with a few things. Using Rmath.dnorm directly rather 
> than kern cuts down on the memory allocation for the first implementation 
> but doesn't seem to change the run time. If I use
> out[i] += Kern(x[i]::Float64, X[k]::Float64, h::Float64)::Float64
> instead, I get within a factor of 2 in terms of speed and a 50% decrease 
> in memory allocation. 
>
