On Fri, 2016-09-02 at 13:34, Jong Wook Kim <jongw...@nyu.edu> wrote:
> Hi Yichao, what a nice idea :)
>
> But even if I write in the C++ way, @time sqrt(1) yields 5 allocations of 176
> bytes, and in inner loops this could be a bottleneck.

Those are just allocations for the return value of sqrt. Consider:

julia> function f(n)
           out = 0.0
           for i=1:n
               out += sin(n)
           end
           out
       end
f (generic function with 1 method)

julia> @time f(10) # warmup
  0.000008 seconds (149 allocations: 10.167 KB)
-5.440211108893696

julia> @time f(10)
  0.000005 seconds (5 allocations: 176 bytes)
-5.440211108893696

julia> @time f(10000)
  0.000849 seconds (5 allocations: 176 bytes)
-3056.143888882987

> Is this an inevitable overhead of using ccall, or is it just a bogus number
> that I can ignore?
>
> Jong Wook
>
> On Sep 2, 2016, at 7:14 AM, Yichao Yu <yyc1...@gmail.com> wrote:
>
> On Fri, Sep 2, 2016 at 7:03 AM, Jong Wook Kim <ilike...@gmail.com> wrote:
>
> Hi,
>
> I'm using Julia 0.4.6 on OSX El Capitan, and was trying to normalize
> each column of a matrix, so that the norm of each column becomes 1. Below
> is an isolated and simplified version of what I'm doing:
>
> function foo1()
>     local a = rand(1000, 10000)
>     @time for i in 1:size(a, 2)
>         a[:, i] /= norm(a[:, i])
>     end
> end
>
> foo1()
> 0.165662 seconds (117.44 k allocations: 232.505 MB, 37.08% gc time)
>
> I thought maybe the array copying is the problem, but this didn't help
> much:
>
> function foo2()
>     local a = rand(1000, 10000)
>     @time for i in 1:size(a, 2)
>         a[:, i] /= norm(slice(a, :, i))
>     end
> end
>
> foo2()
> 0.131377 seconds (98.47 k allocations: 155.921 MB, 36.66% gc time)
>
> and then I figured that this ugly one runs the fastest:
>
> function foo3()
>     local a = rand(1000, 10000)
>     @time for i in 1:size(a, 2)
>         setindex!(a, norm(slice(a, :, i)), :, i)
>     end
> end
>
> foo3()
> 0.013814 seconds (49.49 k allocations: 1.365 MB, 4.86% gc time)
>
> I've heard a few times that plain for-loops are faster than
> vectorized code in Julia, and it seems it's allocating slightly less
> memory, but it's slower than the above.
>
> function foo4()
>     local a = rand(1000, 10000)
>     @time @inbounds for i in 1:size(a, 2)
>         n = norm(slice(a, :, i))
>         @inbounds for j in 1:size(a, 1)
>             a[j, i] /= n
>         end
>     end
> end
>
> foo4()
> 0.055522 seconds (30.00 k allocations: 1.068 MB, 15.14% gc time)
>
> Is there a solution that is faster and less ugly than foo3() and foo4()?
>
> Thinking of an equivalent implementation in C/C++, I should be able to
> write this logic without any heap allocation. Is it possible in Julia?
>
> You can write it in the way you'd write it in c++ and just don't use `norm`.
>
> Thanks,
> Jong Wook
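
For reference, the "write it the way you'd write it in C++" suggestion could look roughly like the sketch below. It is only an illustration, not code from this thread: the name `normalize_columns!` is made up, the argument is assumed to be a dense `Matrix{Float64}`, and no column is assumed to be all zeros. The sum of squares is accumulated by hand so that neither `norm` nor a slice is needed inside the loop.

```julia
# Hypothetical helper (not from the thread): scale every column of `a`
# to unit Euclidean norm, in place, using only scalar loops.
function normalize_columns!(a::Matrix{Float64})
    nrows, ncols = size(a)
    for j in 1:ncols
        # Accumulate the sum of squares of column j without creating a slice.
        s = 0.0
        @inbounds for i in 1:nrows
            s += a[i, j] * a[i, j]
        end
        # Assumption: the column is not all zeros, so n is nonzero.
        n = sqrt(s)
        @inbounds for i in 1:nrows
            a[i, j] /= n
        end
    end
    return a
end

a = rand(1000, 10000)
normalize_columns!(a)                     # warm-up call, triggers compilation
bytes = @allocated normalize_columns!(a)  # bytes allocated by a second call
```

The inner loops run down the rows of one column at a time, which matches Julia's column-major storage. Measuring the second call with `@allocated` (or `@time`) rather than the first keeps compilation out of the numbers, as in the warmup call above.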