I see your point and I definitely agree with that example. At the risk of 
extending this discussion beyond its usefulness, I've attached a snippet of 
some of the code that motivated this question. The issue is the variable 
tmp_rtnk in the loop. I think it's about as readable as it can possibly be, 
and I was hoping to make a small syntax change so I don't rebind tmp_rtnk 
each iteration (although I'm well aware that there are a bunch of temporary 
arrays created on the right-hand side of the operations involving tmp_rtnk). 

function Cℓbiasfun{dm}(parms::PhaseParms{dm})
    m1s   = ones(parms.Cϕϕk)
    r = √(sum([abs2(kdim) for kdim in parms.k]))
    rtnk     = zeros(parms.x[1])
    for p = 1:dm, q = 1:dm, p′ = 1:dm, q′ = 1:dm
        estϕC2_kℓ_pq   =  unnormalized_estϕkfun(parms.C2k[p,q],   m1s, parms) 
        estϕC2_kℓ_p′q′ =  unnormalized_estϕkfun(parms.C2k[p′,q′], m1s, parms) 
        for ωinx in find( r .<= 20 )
            ω = Float64[parms.k[jj][ωinx] for jj = 1:dm]
            C2_kℓminusω_pq        = shiftfk_by_ω(parms.C2k[p,q], -ω, parms)
            C2_kℓminusω_p′q′      = shiftfk_by_ω(parms.C2k[p′,q′], -ω, parms)
            estϕC2_kℓminusω_pq    =  unnormalized_estϕkfun(C2_kℓminusω_pq, m1s, parms)
            estϕC2_kℓminusω_p′q′  =  unnormalized_estϕkfun(C2_kℓminusω_p′q′, m1s, parms)
            tmp_rtnk   =  shiftfk_by_ω(parms.ξk[q].*conj(parms.ξk[p′]).*parms.Cϕϕk, -ω, parms)
            tmp_rtnk .*=  parms.ξk[p][ωinx].*conj(parms.ξk[q′][ωinx]).*parms.Cϕϕk[ωinx]
            tmp_rtnk .-=  conj(estϕC2_kℓ_p′q′ - estϕC2_kℓminusω_p′q′) 
            tmp_rtnk ./=  estϕC2_kℓ_pq - estϕC2_kℓminusω_pq
            tmp_rtnk .*=  (parms.deltk / 2π) ^ dm 
            rtnk     .+=  real(tmp_rtnk)
        end
    end
    rtnk ./= abs2(invAℓfun(parms))
    squash!(rtnk)
    return rtnk
end
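
(For what it's worth, if I take the advice in the quoted reply below all the 
way, I think the five tmp_rtnk lines in the inner loop could be fused into one 
explicit loop that accumulates straight into rtnk. This is only a rough sketch: 
the names Aω, Bω and scale are mine, it assumes rtnk and the estϕ arrays all 
share the same shape and that the ωinx-indexed factors are scalars, and the 
shiftfk_by_ω call still allocates its one array.)

            Aω    = shiftfk_by_ω(parms.ξk[q].*conj(parms.ξk[p′]).*parms.Cϕϕk, -ω, parms)
            Bω    = parms.ξk[p][ωinx] * conj(parms.ξk[q′][ωinx]) * parms.Cϕϕk[ωinx]  # scalar factor
            scale = (parms.deltk / 2π) ^ dm
            @inbounds for i in eachindex(rtnk)
                num      = Aω[i] * Bω - conj(estϕC2_kℓ_p′q′[i] - estϕC2_kℓminusω_p′q′[i])
                den      = estϕC2_kℓ_pq[i] - estϕC2_kℓminusω_pq[i]
                rtnk[i] += real(num / den * scale)  # accumulate directly; no tmp_rtnk needed
            end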

Anyway, I think your point is that in this case I should just write out a 
myupdate!(tmp_rtnk, A, B, C, D, E) for this function and replace

tmp_rtnk   =  shiftfk_by_ω(parms.ξk[q].*conj(parms.ξk[p′]).*parms.Cϕϕk, -ω, parms)
tmp_rtnk .*=  parms.ξk[p][ωinx].*conj(parms.ξk[q′][ωinx]).*parms.Cϕϕk[ωinx]
tmp_rtnk .-=  conj(estϕC2_kℓ_p′q′ - estϕC2_kℓminusω_p′q′) 
tmp_rtnk ./=  estϕC2_kℓ_pq - estϕC2_kℓminusω_pq
tmp_rtnk .*=  (parms.deltk / 2π) ^ dm

with 

myupdate!(tmp_rtnk,
        shiftfk_by_ω(parms.ξk[q].*conj(parms.ξk[p′]).*parms.Cϕϕk, -ω, parms),
        parms.ξk[p][ωinx].*conj(parms.ξk[q′][ωinx]).*parms.Cϕϕk[ωinx],
        conj(estϕC2_kℓ_p′q′ - estϕC2_kℓminusω_p′q′),
        estϕC2_kℓ_pq - estϕC2_kℓminusω_pq,
        (parms.deltk / 2π) ^ dm 
        )

and then write a specialized myupdate! for each such function in my module.
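
Just to make the idea concrete, here is roughly what I picture such a 
specialized myupdate! looking like (the argument names are placeholders 
matching the call above, and it assumes tmp_rtnk, A, C and D share the same 
shape while B and E are scalars):

function myupdate!(tmp_rtnk, A, B, C, D, E)
    # in-place version of: tmp = A; tmp .*= B; tmp .-= C; tmp ./= D; tmp .*= E
    @inbounds for i in eachindex(tmp_rtnk)
        tmp_rtnk[i] = (A[i] * B - C[i]) / D[i] * E
    end
    return tmp_rtnk
end

Of course, written the way I called it above, the C and D arguments are still 
allocated at the call site by the subtractions, so this only removes some of 
the temporaries.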

It's funny... I tend to think my question comes from the fact that I'm 
somewhat addicted to tinkering with Julia functions for speed/memory 
improvements. I probably need to learn how to stop.

Anyway, I learned a lot from asking, so thanks!

On Friday, December 18, 2015 at 10:53:02 AM UTC-8, Steven G. Johnson wrote:


>
> On Friday, December 18, 2015 at 1:32:16 PM UTC-5, Ethan Anderes wrote:
>>
>> Ok, thanks for the info (and @inbounds does improve it a bit). I usually 
>> follow your advice and fuse the operations together when I need the speed, 
>> but since I do all manner of combinations of vectorized operations 
>> throughout my module I tend to prefer using .*=, ./=, etc unless I need 
>> it.
>>
> Having "all manner of combinations" of these operations is a good reason 
> *not* to define in-place versions of these operations.  For example, 
> imagine the computation:
>
> x = x + (2y - 4z) ./ w
>
>
> with your proposed in-place assignment operations, I guess this would 
> become:
>
> tmp = 2y
> tmp .-= 4z
> tmp ./= w
> x .+= tmp
>
>
> which still allocates two temporary arrays (one for tmp and one for 4z), 
> and involves five separate loops.  Compare to:
>
> for i in eachindex(x)
>     x[i] += (2y[i] - 4z[i]) / w[i]
> end
>
>
> which involves only one loop (and probably better cache performance as a 
> result) and no temporary arrays.  (You can add @inbounds if you want a bit 
> more performance and know that w/x/y/z have the same shape.)  Not only is 
> it more efficient than a sequence of in-place assignments, but I would 
> argue that it is much more readable as well, despite the need for an 
> explicit loop.
>
> Alternatively, you can use the Devectorize package, and something like
>
> @devec x[:] = x + (2y - 4z) ./ w
>
>
> will basically do the same thing as the loop if I understand @devec 
> correctly.
>
