On Fri, 2016-09-02 at 13:34, Jong Wook Kim <jongw...@nyu.edu> wrote:
> Hi Yichao, what a nice idea :)
>
> But even if I write it the C++ way, @time sqrt(1) yields 5 allocations
> (176 bytes), and in inner loops this could be a bottleneck.

Those are just allocations for the return value of sqrt; they are a fixed
cost and do not grow with the number of iterations.  Consider:

julia> function f(n)
           out = 0.0
           for i=1:n
               out += sin(n)
           end
           out
       end
f (generic function with 1 method)

julia> @time f(10) # warmup
  0.000008 seconds (149 allocations: 10.167 KB)
-5.440211108893696

julia> @time f(10)
  0.000005 seconds (5 allocations: 176 bytes)
-5.440211108893696

julia> @time f(10000)
  0.000849 seconds (5 allocations: 176 bytes)
-3056.143888882987
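
For the original column-normalization problem, the "write it the C++ way"
suggestion quoted below could be sketched roughly as follows. This is only a
minimal sketch, not a tuned implementation: the name normalize_columns! is
hypothetical, it assumes a Float64 matrix with no all-zero columns, and it
accumulates a plain sum of squares, so it skips any overflow/underflow
safeguards a library norm may apply.

```julia
# Hypothetical helper (name is an assumption, not from Base):
# scale every column of `a` in place so that its 2-norm becomes 1.
function normalize_columns!(a)
    for j in 1:size(a, 2)
        # Accumulate the squared 2-norm of column j in a scalar.
        s = 0.0
        for i in 1:size(a, 1)
            s += a[i, j] * a[i, j]
        end
        n = sqrt(s)
        # Divide the column by its norm, element by element.
        # An all-zero column would give n == 0 and produce NaNs.
        for i in 1:size(a, 1)
            a[i, j] /= n
        end
    end
    return a
end
```

Both loops only touch scalars and existing elements of a, so no temporary
column copies or views are created inside the function. To measure it, call
it once as a warmup and then time a second call, as with f above; adding
@inbounds to the inner loops is an optional further step.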


> Is this an inevitable overhead of using ccall, or is it just a bogus
> measurement that I can ignore?
>
> Jong Wook
>
>
>     On Sep 2, 2016, at 7:14 AM, Yichao Yu <yyc1...@gmail.com> wrote:
>
>
>
>     On Fri, Sep 2, 2016 at 7:03 AM, Jong Wook Kim <ilike...@gmail.com> wrote:
>
>         Hi,
>
>         I'm using Julia 0.4.6 on OSX El Capitan, and was trying to normalize
>         each column of a matrix, so that the norm of each column becomes 1.
>         Below is an isolated and simplified version of what I'm doing:
>
>         function foo1()
>             local a = rand(1000, 10000)
>             @time for i in 1:size(a, 2)
>                 a[:, i] /= norm(a[:, i])
>             end
>         end
>
>         foo1()
>         0.165662 seconds (117.44 k allocations: 232.505 MB, 37.08% gc time)
>
>         I thought maybe the array copying is the problem, but this didn't help
>         much:
>
>         function foo2()
>             local a = rand(1000, 10000)
>             @time for i in 1:size(a, 2)
>                 a[:, i] /= norm(slice(a, :, i))
>             end
>         end
>
>         foo2()
>         0.131377 seconds (98.47 k allocations: 155.921 MB, 36.66% gc time)
>
>         and then I figured that this ugly one runs the fastest:
>
>         function foo3()
>             local a = rand(1000, 10000)
>             @time for i in 1:size(a, 2)
>                 setindex!(a, norm(slice(a, :, i)), :, i)
>             end
>         end
>
>         foo3()
>         0.013814 seconds (49.49 k allocations: 1.365 MB, 4.86% gc time)
>
>         I have heard a few times that plain for-loops are faster than
>         vectorized code in Julia. The version below does allocate slightly
>         less memory, but it is slower than the above:
>
>         function foo4()
>             local a = rand(1000, 10000)
>             @time @inbounds for i in 1:size(a, 2)
>                 n = norm(slice(a, :, i))
>                 @inbounds for j in 1:size(a, 1)
>                     a[j, i] /= n
>                 end
>             end
>         end
>
>         foo4()
>         0.055522 seconds (30.00 k allocations: 1.068 MB, 15.14% gc time)
>
>         Is there a solution that is faster and less ugly than foo3() and
>         foo4()?
>
>         Thinking of an equivalent implementation in C/C++, I should be able to
>         write this logic without any heap allocation. Is it possible in Julia?
>
>
>     You can write it the way you'd write it in C++; just don't use
>     `norm`.
>
>
>
>         Thanks,
>         Jong Wook
