Put the code in a function so that you are not benchmarking in global scope:

    foo!(x,y) = x .= cos.(sin.(y))
    y = rand(1000); x = similar(y);
    @time foo!(x, y);

After running @time twice to exclude compilation time, this gives:

    0.000035 seconds (5 allocations: 176 bytes)

That is, no arrays are allocated and all the loops are fused into one. Binary operators like .+ will not be fused until Julia 0.6, however. (This is documented in the manual.)
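If you want to check the fusion claim yourself, here is a rough sketch that compares the fused version against a deliberately unfused one. The names foo_unfused! and allocated_bytes are made up for this illustration, and the exact byte counts will vary with your Julia version, so only the relative difference matters.

```julia
# Fused, in-place broadcast from the answer: a single loop, no temporary arrays.
foo!(x, y) = x .= cos.(sin.(y))

# Hypothetical unfused counterpart for comparison: every step builds a new array.
function foo_unfused!(x, y)
    tmp1 = map(sin, y)     # first temporary array
    tmp2 = map(cos, tmp1)  # second temporary array
    x .= tmp2              # copy the result into x
    return x
end

# Measure inside a function so that global-scope overhead is not counted.
allocated_bytes(f, x, y) = @allocated f(x, y)

y = rand(1000)
x = similar(y)
x2 = similar(y)

# Warm-up calls, so that compilation is excluded from the measurement.
allocated_bytes(foo!, x, y)
allocated_bytes(foo_unfused!, x2, y)

fused_bytes = allocated_bytes(foo!, x, y)
unfused_bytes = allocated_bytes(foo_unfused!, x2, y)

println("fused:   ", fused_bytes, " bytes")
println("unfused: ", unfused_bytes, " bytes")
```

The unfused version should report at least the size of two 1000-element Float64 arrays, while the fused one should report little or nothing.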