P.S. Given how strange this problem is for me, I would appreciate it if
anyone could confirm whether this is a real issue or I'm somehow being
crazy or stupid.
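
In case it helps anyone trying to reproduce this, here is a minimal
self-contained sketch of the benchmark loop around the `propagate`
function quoted below. The `Params` type, the value of Ω, the array sizes
and the initial state are placeholders made up for illustration (the real
ones are in the linked script), and the code assumes a recent Julia 1.x
rather than 0.3/0.4. It also counts subnormal entries in the result,
because arithmetic on subnormal floats is known to be slow on many x86
CPUs; whether that is what happens here is only a guess and not something
that has been verified.

```julia
# Hypothetical parameter container. The field names follow how `propagate`
# uses `P` (P.Ω, P.nele, P.nstep); the type itself is an assumption.
struct Params
    Ω::Float64
    nele::Int
    nstep::Int
end

# Same function as in the quoted message, unchanged.
function propagate(P, ψ0, ψs, eΓ)
    @inbounds for i in 1:P.nele
        ψs[1, i, 1] = ψ0[1, i]
        ψs[2, i, 1] = ψ0[2, i]
    end
    T12 = im * sin(P.Ω)
    T11 = cos(P.Ω)
    @inbounds for i in 2:(P.nstep + 1)
        for j in 1:P.nele
            ψ_e = ψs[1, j, i - 1]
            ψ_g = ψs[2, j, i - 1] * eΓ # <---- Scaling factor
            ψs[2, j, i] = T11 * ψ_e + T12 * ψ_g
            ψs[1, j, i] = T11 * ψ_g + T12 * ψ_e
        end
    end
    ψs
end

# True if either component of a complex number is a subnormal float.
has_subnormal(z) = issubnormal(real(z)) || issubnormal(imag(z))

# Time `propagate` for eΓ = 0.0, 0.1, ..., 1.0 and return a vector of
# (eΓ, seconds, number of subnormal entries in ψs) tuples.
function sweep(P)
    ψ0 = zeros(ComplexF64, 2, P.nele)
    ψ0[1, :] .= 1                      # assumed initial state
    ψs = zeros(ComplexF64, 2, P.nele, P.nstep + 1)
    propagate(P, ψ0, ψs, 1.0)          # warm-up call so compilation is not timed
    results = Tuple{Float64,Float64,Int}[]
    for eΓ in 0.0:0.1:1.0
        t = @elapsed propagate(P, ψ0, ψs, eΓ)
        push!(results, (eΓ, t, count(has_subnormal, ψs)))
    end
    results
end

# Assumed problem size; adjust to match the real script.
for (eΓ, t, n) in sweep(Params(2π / 1000, 100, 5000))
    println(eΓ, "  ", round(t * 1e3; digits=3), " ms  subnormal entries: ", n)
end
```

If the slow scaling factors turn out to be the ones with a large subnormal
count, that would point at the CPU rather than at the generated code.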



On Sun, Jul 12, 2015 at 7:30 PM, Yichao Yu <yyc1...@gmail.com> wrote:
> Hi,
>
> I've just seen a very strange (for me) performance difference for
> exactly the same code on slightly different input with no explicit
> branches.
>
> The code is available here[1]. The most relevant part is the following
> function. (All other parts of the code are for initialization and
> benchmarking.) This is a simplified version of my simulation that computes
> the next column of the array based on the previous one.
>
> The strange part is that the performance of this function can differ
> by 10x depending on the value of the scaling factor (`eΓ`, the only use
> of which is marked in the code below) even though I don't see any
> branches that depend on that value in the relevant code (unless the
> CPU is 10x less efficient for certain input values).
>
> function propagate(P, ψ0, ψs, eΓ)
>     @inbounds for i in 1:P.nele
>         ψs[1, i, 1] = ψ0[1, i]
>         ψs[2, i, 1] = ψ0[2, i]
>     end
>     T12 = im * sin(P.Ω)
>     T11 = cos(P.Ω)
>     @inbounds for i in 2:(P.nstep + 1)
>         for j in 1:P.nele
>             ψ_e = ψs[1, j, i - 1]
>             ψ_g = ψs[2, j, i - 1] * eΓ # <---- Scaling factor
>             ψs[2, j, i] = T11 * ψ_e + T12 * ψ_g
>             ψs[1, j, i] = T11 * ψ_g + T12 * ψ_e
>         end
>     end
>     ψs
> end
>
> The output of the full script is attached, and it can be clearly seen
> that for scaling factors 0.6-0.8 the function is about an order of
> magnitude slower than for the others (roughly 200-330 ms versus 16-24 ms).
>
> The assembly[2] and LLVM IR[3] for this function are also in the same
> repo. I see the same behavior on both Julia 0.3 and 0.4, with LLVM 3.3
> and LLVM 3.6, on two different x86_64 machines (my laptop and a Linode
> VPS). The only platform I've tried that doesn't show similar behavior
> is Julia 0.4 running on qemu-arm, although the performance for
> different values there still differs by ~30%, which is bigger than
> noise.
>
> This also seems to depend on the initial value.
>
> Has anyone seen similar problems before?
>
> Outputs:
>
> 325.821 milliseconds (25383 allocations: 1159 KB)
> 307.826 milliseconds (4 allocations: 144 bytes)
> 0.0
>  19.227 milliseconds (2 allocations: 48 bytes)
> 0.1
>  17.291 milliseconds (2 allocations: 48 bytes)
> 0.2
>  17.404 milliseconds (2 allocations: 48 bytes)
> 0.3
>  19.231 milliseconds (2 allocations: 48 bytes)
> 0.4
>  20.278 milliseconds (2 allocations: 48 bytes)
> 0.5
>  23.692 milliseconds (2 allocations: 48 bytes)
> 0.6
> 328.107 milliseconds (2 allocations: 48 bytes)
> 0.7
> 312.425 milliseconds (2 allocations: 48 bytes)
> 0.8
> 201.494 milliseconds (2 allocations: 48 bytes)
> 0.9
>  16.314 milliseconds (2 allocations: 48 bytes)
> 1.0
>  16.264 milliseconds (2 allocations: 48 bytes)
>
>
> [1] 
> https://github.com/yuyichao/explore/blob/e4be0151df33571c1c22f54fe044c929ca821c46/julia/array_prop/array_prop.jl
> [2] 
> https://github.com/yuyichao/explore/blob/e4be0151df33571c1c22f54fe044c929ca821c46/julia/array_prop/propagate.S
> [3] 
> https://github.com/yuyichao/explore/blob/e4be0151df33571c1c22f54fe044c929ca821c46/julia/array_prop/propagate.ll
