Oh never mind - I see that you have a matrix multiply there that benefits from 
calling BLAS. If it is a matrix multiply, how come you can get away with axpy? 
Shouldn’t you need a gemm?
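
For what it's worth, a gemm!-based version of that loop could look something 
like the sketch below. This is an untested sketch: the errprop_gemm! name and 
the preallocated tmp buffer are mine, and the w[:,:,ti] and d[:,:,ti2] slices 
still allocate copies here (see the SubArray sketch below for removing those 
too).

function errprop_gemm!(w::Array{Float32,3}, d::Array{Float32,3}, deltas)
    dd = deltas.d
    fill!(dd, 0.0f0)
    # one reusable buffer for the product w[:,:,ti]' * d[:,:,ti2]
    tmp = Array(Float32, size(w,2), size(d,2))
    for ti = 1:size(w,3), ti2 = 1:size(d,3)
        # tmp = w[:,:,ti]' * d[:,:,ti2], computed in place by BLAS
        Base.LinAlg.BLAS.gemm!('T', 'N', 1.0f0, w[:,:,ti], d[:,:,ti2],
                               0.0f0, tmp)
        # accumulate into the right time slice, column-major order
        for j = 1:size(tmp,2), i = 1:size(tmp,1)
            dd[i, j, ti+ti2-1] += tmp[i, j]
        end
    end
    dd
end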

Another way to avoid creating temporary arrays with indexing is to use 
SubArrays, which the linear algebra routines can work with.
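
Concretely, something along these lines should work on 0.3 (again an untested 
sketch, with a name of my choosing). Note that deltas.d[:,:,k] returns a copy, 
so an in-place BLAS call on that slice updates the copy rather than deltas.d 
itself; the sub views below avoid both the copies and that pitfall:

function errprop_views!(w::Array{Float32,3}, d::Array{Float32,3}, deltas)
    dd = deltas.d
    fill!(dd, 0.0f0)
    for ti = 1:size(w,3), ti2 = 1:size(d,3)
        # views into the parent arrays: no copies are made
        wv = sub(w, 1:size(w,1), 1:size(w,2), ti)
        dv = sub(d, 1:size(d,1), 1:size(d,2), ti2)
        cv = sub(dd, 1:size(dd,1), 1:size(dd,2), ti+ti2-1)
        # cv += wv' * dv, accumulated in place
        Base.LinAlg.BLAS.gemm!('T', 'N', 1.0f0, wv, dv, 1.0f0, cv)
    end
    dd
end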

-viral



> On 14-Sep-2014, at 2:43 pm, Viral Shah <vi...@mayin.org> wrote:
> 
> That is great! However, by devectorizing, I meant writing the loop statement 
> itself as two more loops, so that you end up with 3 nested loops effectively. 
> You basically do not want all those w[:,:,ti] calls that create matrices 
> every time.
> 
> You could also hoist the deltas.d field access out of the loop. Try 
> something like:
> 
> 
> function errprop!(w::Array{Float32,3}, d::Array{Float32,3}, deltas)
>     deltas.d[:] = 0.
>     dd = deltas.d
>     for ti=1:size(w,3), ti2 = 1:size(d,3)
>         for i=1:size(w,1)
>             for j=1:size(w,2)
>                 dd[i,j,ti+ti2-1] += w[i,j,ti]*d[i,j,ti2]
>             end
>         end
>     end
>     deltas.d
> end
> 
> 
> -viral
> 
> 
> 
>> On 14-Sep-2014, at 12:47 pm, Michael Oliver <michael.d.oli...@gmail.com> 
>> wrote:
>> 
>> Thanks Viral for the quick reply, that's good to know. I was able to squeeze 
>> a little more performance out with axpy (see below). I tried devectorizing 
>> the inner loop, but it was much slower, I believe because it was no longer 
>> taking full advantage of MKL for the matrix multiply. So far the Julia code 
>> runs at 1.4x the speed of my Matlab version, and according to @time I still 
>> have 44.41% gc time, so 0.4 can't come soon enough! Great work guys, I'm 
>> really enjoying learning Julia.
>> 
>> function errprop!(w::Array{Float32,3}, d::Array{Float32,3}, deltas)
>>      deltas.d[:] = 0.
>>      rg = size(w,2)*size(d,2)
>>      for ti=1:size(w,3), ti2 = 1:size(d,3)
>>          Base.LinAlg.BLAS.axpy!(1, w[:,:,ti]'*d[:,:,ti2], range(1,rg),
>>                                 deltas.d[:,:,ti+ti2-1], range(1,rg))
>>      end
>>      deltas.d
>> end
>> 
>> On Saturday, September 13, 2014 10:10:25 PM UTC-7, Viral Shah wrote:
>> The garbage is generated by the indexing operations. In 0.4, we should 
>> have array views that solve this problem. For now, you can either 
>> manually devectorize the inner loop, or use the @devec macro from the 
>> Devectorize package, if it works out in this case.
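>> 
>> For reference, a minimal illustration of Devectorize usage (placeholder 
>> arrays, not from this thread; assumes the package is installed):
>> 
>> using Devectorize
>> 
>> a = rand(Float32, 1000)
>> b = rand(Float32, 1000)
>> r = zeros(Float32, 1000)
>> 
>> # r[:] = a .* b + a would allocate temporaries for a .* b and for the sum;
>> # @devec compiles the whole expression into a single allocation-free loop
>> @devec r[:] = a .* b + a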
>> 
>> -viral
>> 
>> On Sunday, September 14, 2014 10:34:45 AM UTC+5:30, Michael Oliver wrote:
>> Hi all,
>> I've implemented a time delay neural network module and am now trying to 
>> optimize it. This function propagates the error backwards through the 
>> network. deltas.d is just a container for holding the errors, so I can work 
>> in place and don't have to keep initializing arrays. w and d are collections 
>> of weights and errors, respectively, for different time lags.
>> This function gets called many, many times, and according to profiling there 
>> is a lot of garbage collection being induced by the fourth line, specifically 
>> within multidimensional.jl's getindex and setindex! and array.jl's +.
>> 
>> function errprop!(w::Array{Float32,3}, d::Array{Float32,3}, deltas)
>>      deltas.d[:] = 0.
>>      for ti=1:size(w,3), ti2 = 1:size(d,3)
>>          deltas.d[:,:,ti+ti2-1] += w[:,:,ti]'*d[:,:,ti2];
>>      end
>>      deltas.d
>> end
>> 
>> Any advice would be much appreciated!
>> Best,
>> Michael
> 
