Yeah, that’s what I figured, I don’t even need the sin()

julia> f() = 42
f (generic function with 1 method)

julia> @time f()
  0.001347 seconds (141 allocations: 10.266 KB)
42

julia> @time f()
  0.000002 seconds (4 allocations: 160 bytes)
42

julia> @time f()
  0.000002 seconds (4 allocations: 160 bytes)
42

Is @time counting the stack allocations as well? Otherwise I don’t see why any heap allocation is needed.

> On Sep 2, 2016, at 7:41 AM, Mauro <mauro...@runbox.com> wrote:
>
> On Fri, 2016-09-02 at 13:34, Jong Wook Kim <jongw...@nyu.edu> wrote:
>> Hi Yichao, what a nice idea :)
>>
>> But even if I write in the C++ way, @time sqrt(1) yields 5 allocations of
>> 176 bytes, and in inner loops this could be a bottleneck.
>
> Those are just allocations for the return value of sqrt. Consider:
>
> julia> function f(n)
>            out = 0.0
>            for i=1:n
>                out += sin(n)
>            end
>            out
>        end
> f (generic function with 1 method)
>
> julia> @time f(10) # warmup
>   0.000008 seconds (149 allocations: 10.167 KB)
> -5.440211108893696
>
> julia> @time f(10)
>   0.000005 seconds (5 allocations: 176 bytes)
> -5.440211108893696
>
> julia> @time f(10000)
>   0.000849 seconds (5 allocations: 176 bytes)
> -3056.143888882987
>
>
>> Is this an inevitable overhead of using ccall, or is it just a bogus number
>> that I can ignore?
>>
>> Jong Wook
>>
>>
>> On Sep 2, 2016, at 7:14 AM, Yichao Yu <yyc1...@gmail.com> wrote:
>>
>>
>>
>> On Fri, Sep 2, 2016 at 7:03 AM, Jong Wook Kim <ilike...@gmail.com> wrote:
>>
>> Hi,
>>
>> I'm using Julia 0.4.6 on OSX El Capitan, and was trying to normalize
>> each column of a matrix, so that the norm of each column becomes 1.
>> Below is an isolated and simplified version of what I'm doing:
>>
>> function foo1()
>>     local a = rand(1000, 10000)
>>     @time for i in 1:size(a, 2)
>>         a[:, i] /= norm(a[:, i])
>>     end
>> end
>>
>> foo1()
>> 0.165662 seconds (117.44 k allocations: 232.505 MB, 37.08% gc time)
>>
>> I thought maybe the array copying is the problem, but this didn't help
>> much:
>>
>> function foo2()
>>     local a = rand(1000, 10000)
>>     @time for i in 1:size(a, 2)
>>         a[:, i] /= norm(slice(a, :, i))
>>     end
>> end
>>
>> foo2()
>> 0.131377 seconds (98.47 k allocations: 155.921 MB, 36.66% gc time)
>>
>> and then I figured that this ugly one runs the fastest:
>>
>> function foo3()
>>     local a = rand(1000, 10000)
>>     @time for i in 1:size(a, 2)
>>         setindex!(a, norm(slice(a, :, i)), :, i)
>>     end
>> end
>>
>> foo3()
>> 0.013814 seconds (49.49 k allocations: 1.365 MB, 4.86% gc time)
>>
>> So I overheard a few times that plain for-loops are faster than
>> vectorized code in Julia, and it seems it's allocating slightly less
>> memory, but it's slower than the above.
>>
>> function foo4()
>>     local a = rand(1000, 10000)
>>     @time @inbounds for i in 1:size(a, 2)
>>         n = norm(slice(a, :, i))
>>         @inbounds for j in 1:size(a, 1)
>>             a[j, i] /= n
>>         end
>>     end
>> end
>>
>> foo4()
>> 0.055522 seconds (30.00 k allocations: 1.068 MB, 15.14% gc time)
>>
>> Is there a solution that is faster and less ugly than foo3() and
>> foo4()?
>>
>> Thinking of an equivalent implementation in C/C++, I should be able to
>> write this logic without any heap allocation. Is it possible in Julia?
>>
>>
>> You can write it in the way you'd write it in c++ and just don't use
>> `norm`.
>>
>>
>>
>> Thanks,
>> Jong Wook
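
For reference, the "write it the way you'd write it in C++" suggestion could look roughly like the sketch below. This is only an illustration, not code posted in the thread: the function name `normalize_columns!` is made up here, and it assumes a dense floating-point matrix with no all-zero columns. It accumulates each column's sum of squares in a scalar and then scales the column in place, so the loop body never builds a temporary array or a slice. Note also that foo3 as quoted assigns the norm to every element of the column instead of dividing by it, so its timing is not directly comparable with the other versions.

```julia
# Hypothetical helper (not from the thread): scale every column of `a`
# in place so that its Euclidean norm becomes 1.
# Assumes a floating-point matrix with no all-zero columns.
function normalize_columns!(a)
    for j in 1:size(a, 2)
        # Accumulate the squared 2-norm of column j in a scalar.
        s = zero(eltype(a))
        @inbounds for i in 1:size(a, 1)
            s += a[i, j] * a[i, j]
        end
        # Multiply by the reciprocal once instead of dividing each element.
        inv_n = 1 / sqrt(s)
        @inbounds for i in 1:size(a, 1)
            a[i, j] *= inv_n
        end
    end
    return a
end

a = rand(1000, 10000)
normalize_columns!(a)
```

Iterating over rows in the inner loop follows Julia's column-major storage. When timing it, calling the function once as a warm-up and then measuring a second call keeps compilation out of the numbers, as in the f(10) example quoted above.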