I see your point and I definitely agree with that example. At the risk of extending this discussion beyond it’s usefulness I’ve attached a snippet of some of the code that motivated this question. The issue is the variable tmp_rtnk in the loop. I think it’s as readable as it possibly can be and I was hoping to make a small syntax change so I don’t’ rebind tmp_rtnk each each (although I’m well aware that the there are a bunch of temporarily arrays created in the right hand side of operations involving tmp_rtnk).
function Cℓbiasfun{dm}(parms::PhaseParms{dm}) m1s = ones(parms.Cϕϕk) r = √(sum([abs2(kdim) for kdim in parms.k])) rtnk = zeros(parms.x[1]) for p = 1:dm, q = 1:dm, p′ = 1:dm, q′ = 1:dm estϕC2_kℓ_pq = unnormalized_estϕkfun(parms.C2k[p,q], m1s, parms) estϕC2_kℓ_p′q′ = unnormalized_estϕkfun(parms.C2k[p′,q′], m1s, parms) for ωinx in find( r .<= 20 ) ω = Float64[parms.k[jj][ωinx] for jj = 1:dm] C2_kℓminusω_pq = shiftfk_by_ω(parms.C2k[p,q], -ω, parms) C2_kℓminusω_p′q′ = shiftfk_by_ω(parms.C2k[p′,q′], -ω, parms) estϕC2_kℓminusω_pq = unnormalized_estϕkfun(C2_kℓminusω_pq, m1s, parms) estϕC2_kℓminusω_p′q′ = unnormalized_estϕkfun(C2_kℓminusω_p′q′,m1s, parms) tmp_rtnk = shiftfk_by_ω(parms.ξk[q].*conj(parms.ξk[p′]).*parms.Cϕϕk, -ω, parms) tmp_rtnk .*= parms.ξk[p][ωinx].*conj(parms.ξk[q′][ωinx]).*parms.Cϕϕk[ωinx] tmp_rtnk .-= conj(estϕC2_kℓ_p′q′ - estϕC2_kℓminusω_p′q′) tmp_rtnk ./= estϕC2_kℓ_pq - estϕC2_kℓminusω_pq tmp_rtnk .*= (parms.deltk / 2π) ^ dm rtnk .+= real(tmp_rtnk) end end rtnk ./= abs2(invAℓfun(parms)) squash!(rtnk) return rtnk end Anyway, I think your point is that in this case just write out myupdate!(tmp_rtnk, A,B,C,D,E) for this function and replace tmp_rtnk = shiftfk_by_ω(parms.ξk[q].*conj(parms.ξk[p′]).*parms.Cϕϕk, -ω, parms) tmp_rtnk .*= parms.ξk[p][ωinx].*conj(parms.ξk[q′][ωinx]).*parms.Cϕϕk[ωinx] tmp_rtnk .-= conj(estϕC2_kℓ_p′q′ - estϕC2_kℓminusω_p′q′) tmp_rtnk ./= estϕC2_kℓ_pq - estϕC2_kℓminusω_pq tmp_rtnk .*= (parms.deltk / 2π) ^ dm with myupdate!(tmp_rtnk, shiftfk_by_ω(parms.ξk[q].*conj(parms.ξk[p′]).*parms.Cϕϕk, -ω, parms) , parms.ξk[p][ωinx].*conj(parms.ξk[q′][ωinx]).*parms.Cϕϕk[ωinx], conj(estϕC2_kℓ_p′q′ - estϕC2_kℓminusω_p′q′), estϕC2_kℓ_pq - estϕC2_kℓminusω_pq, (parms.deltk / 2π) ^ dm ) and then write a specialized myupdate! for each such function in my module. Its funny...I tend to think my question comes from the fact that I'm somewhat addicted to tinkering with Julia functions for speed/memory improvements. I probably need to learn how to stop. Anyway, I leared a lot from asking so thanks! On Friday, December 18, 2015 at 10:53:02 AM UTC-8, Steven G. Johnson wrote: > > On Friday, December 18, 2015 at 1:32:16 PM UTC-5, Ethan Anderes wrote: >> >> Ok, thanks for the info (and @inbounds does improve it a bit). I usually >> follow your advice and fuse the operations together when I need the speed, >> but since I do all manner of combinations of vectorized operations >> throughout my module I tend to prefer using .*=, ./=, etc unless I need >> it. >> > Having "all manner of combinations" of these operations is a good reason > *not* to define in-place versions of these operations. For example, > imagine the computation: > > x = x + (2y - 4z) ./ w > > > with your proposed in-place assignment operations, I guess this would > become: > > tmp = 2y > tmp .-= 4z > tmp ./= w > x .+= tmp > > > which still allocates two temporary arrays (one for tmp and one for 4z), > and involves five separate loops. Compare to: > > for i in eachindex(x) > x[i] += (2y[i] - 4z[i]) / w[i] > end > > > which involves only one loop (and probably better cache performance as a > result) and no temporary arrays. (You can add @inbounds if you want a bit > more performance and know that w/x/y/z have the same shape.) Not only is > it more efficient than a sequence of in-place assignments, but I would > argue that it is much more readable as well, despite the need for an > explicit loop. > > Alternatively, you can use the Devectorize package, and something like > > @devec x[:] = x + (2y - 4z) ./ w > > > will basically do the same thing as the loop if I understand @devec > correctly. >