One thing that's hurting your home-rolled function's performance is that .^
is allocating a copy for each 1e6-length row. I'm not sure if the
vectorized sqrt operator does something similar (I'm actually having some
trouble finding the source code for it).
Anyway, the best thing to do here seems to be to embrace loops:
julia> function f(mat)
A = zeros(Float64, size(mat, 2))
for j in 1:size(mat, 2)
for i in 1:size(mat, 1)
A[j] += mat[i, j]^2
end
A[j] = sqrt(A[j])
end
A
end
f (generic function with 2 methods)
julia> f(mat);
julia> @time f(mat);
0.009359 seconds (6 allocations: 7.630 MB)
On Wednesday, August 12, 2015 at 11:24:10 AM UTC-4, René Donner wrote:
>
> You could also try something like
>
> julia> a = rand(100,100000);
>
> julia> @time [norm(sub(a,:,i)) for i in 1:size(a,2)];
> 0.135117 seconds (1.80 M allocations: 40.381 MB, 14.14% gc time)
>
> ----
>
> As long as your slices are along the last dimension, i.e. columns in a
> matrix, you can use FunctionalData, which is optimized exactly for this:
> https://github.com/rened/FunctionalData.jl#efficiency
>
> julia> @time map(a, norm);
> 0.014685 seconds (200.02 k allocations: 3.815 MB)
>
>
>
>
> Am 12.08.2015 um 16:26 schrieb jw3126 <[email protected] <javascript:>>:
>
> > I have a matrix and I want to compute the norm of each column. I tried
> >
> > using TimeIt
> > N = 10^6
> > mat = randn(3, N)
> >
> > @timeit mapslices(norm, mat, 1)
> >
> > 1 loops, best of 3:
> > 915.13 ms per loop
> >
> >
> > Dog slow, 100times slower then numpy:
> >
> > import numpy as np
> > N = 10**6
> > mat = np.random.randn(3, N)
> > %timeit np.linalg.norm(mat, axis=0)
> >
> > 100 loops, best of 3:
> > 9.71 ms per loop
> >
> > If I understand it correctly, the reason is that mapslices can not
> really optimize on the norm argument, e.g. inline norm. I had two ideas how
> to solve this problem, both of which are problematic:
> >
> > 1) There is the NumericExtensions package, which seems to be aimed at
> this kind of problem, however this package is deprecated, so I don't want
> to use it.
> >
> > 2) Write some code by hand. This gives speedup, but is still slower
> then numpy:
> >
> > N = 10^6
> > mat = randn(3, N)
> > function f(mat)
> > return sqrt(mat[1,:].^2 + mat[2,:].^2 + mat[3,:].^2)
> > end
> >
> > @timeit f(mat)
> >
> > 10 loops, best of 3:
> > 38.51 ms per loop
> >
> >
> > So how to make this computation fast? And more generally what are ways
> speed up mapslices without using NumericExtensions?
>
>