Thanks, Viral, for the quick reply; that's good to know. I was able to 
squeeze out a little more performance with axpy (see below). I tried 
devectorizing the inner loop, but it was much slower, I believe because it 
no longer took full advantage of MKL for the matrix multiply. So far 
I've got the code running at 1.4x what I had in Matlab, and according to 
@time I still have 44.41% gc time. So 0.4 can't come soon enough! Great 
work, guys. I'm really enjoying learning Julia.

function errprop!(w::Array{Float32,3}, d::Array{Float32,3}, deltas)
deltas.d[:] = 0.
rg =size(w,2)*size(d,2);
for ti=1:size(w,3), ti2 = 1:size(d,3)

On Saturday, September 13, 2014 10:10:25 PM UTC-7, Viral Shah wrote:
> The garbage is generated from the indexing operations. In 0.4, we should 
> have array views that should solve this problem. For now, you can either 
> manually devectorize the inner loop, or use the @devectorize macros in the 
> Devectorize package, if they work out in this case.
> -viral
> On Sunday, September 14, 2014 10:34:45 AM UTC+5:30, Michael Oliver wrote:
>> Hi all,
>> I've implemented a time delay neural network module and am now trying 
>> to optimize it. This function propagates the error backwards 
>> through the network.
>> The deltas.d is just a container for holding the errors so I can do 
>> things in place and don't have to keep initializing arrays. w and d are 
>> collections of weights and errors respectively for different time lags.
>> This function gets called many, many times, and according to profiling, 
>> a lot of garbage collection is induced by the fourth line, 
>> specifically within multidimensional.jl getindex and setindex! and array.jl 
>> +
>> function errprop!(w::Array{Float32,3}, d::Array{Float32,3}, deltas)
>> deltas.d[:] = 0.
>> for ti=1:size(w,3), ti2 = 1:size(d,3)
>>     deltas.d[:,:,ti+ti2-1] += w[:,:,ti]'*d[:,:,ti2];
>> end
>> deltas.d
>> end
>> Any advice would be much appreciated!
>> Best,
>> Michael
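
For reference, the manual devectorization suggested in the thread could look roughly like the sketch below. It is only an illustration, not the code anyone in the thread actually ran: the name errprop_loops! and the plain array argument `out` (standing in for deltas.d) are hypothetical, the syntax is current Julia rather than 0.3, and it assumes size(out) == (size(w,2), size(d,2), size(w,3)+size(d,3)-1) and size(w,1) == size(d,1). It avoids the temporaries created by slicing, but it also bypasses BLAS/MKL, which is consistent with the slowdown reported above.

```julia
# Hypothetical devectorized variant; `out` stands in for `deltas.d`.
# Assumes size(out) == (size(w,2), size(d,2), size(w,3) + size(d,3) - 1)
# and size(w,1) == size(d,1).
function errprop_loops!(out::Array{Float32,3}, w::Array{Float32,3}, d::Array{Float32,3})
    fill!(out, 0f0)                      # reset the accumulator in place
    for ti in 1:size(w, 3), ti2 in 1:size(d, 3)
        t = ti + ti2 - 1
        # Equivalent to out[:, :, t] += w[:, :, ti]' * d[:, :, ti2],
        # written element by element so no slices are allocated.
        for j in 1:size(d, 2), i in 1:size(w, 2)
            acc = 0f0
            for k in 1:size(w, 1)
                acc += w[k, i, ti] * d[k, j, ti2]
            end
            out[i, j, t] += acc
        end
    end
    return out
end

# Example call with made-up sizes
w = ones(Float32, 4, 3, 2)
d = ones(Float32, 4, 5, 3)
out = zeros(Float32, 3, 5, 4)
errprop_loops!(out, w, d)
```

Whether this beats the sliced version depends on the matrix sizes; for large slices the BLAS-backed multiply is likely to win despite the garbage it generates.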
