On Sun, Jul 12, 2015 at 7:40 PM, Yichao Yu <yyc1...@gmail.com> wrote:
> P.S. Given how strange this problem is for me, I would appreciate it if
> anyone could confirm whether this is a real issue or whether I'm somehow
> being crazy or stupid.
>

One additional strange property of this issue is that I used to have
much more costly operations, like Fourier transforms, in the outer loop
(the one that iterates over nsteps with i). However, when the scaling
factor takes one of the bad values, everything slows down (i.e. the
Fourier transforms also become ~10x slower).

>
>
> On Sun, Jul 12, 2015 at 7:30 PM, Yichao Yu <yyc1...@gmail.com> wrote:
>> Hi,
>>
>> I've just seen a very strange (for me) performance difference for
>> exactly the same code, with no explicit branches, on slightly
>> different inputs.
>>
>> The code is available here [1]. The most relevant part is the
>> following function (all other parts of the code are for initialization
>> and benchmarking). This is a simplified version of my simulation, which
>> computes the next column of the array based on the previous one.
>>
>> The strange part is that the performance of this function can differ
>> by 10x depending on the value of the scaling factor (`eΓ`, whose only
>> use is marked in the code below), even though I don't see any branches
>> that depend on that value in the relevant code (unless the CPU is 10x
>> less efficient for certain input values).
>>
>> function propagate(P, ψ0, ψs, eΓ)
>>     @inbounds for i in 1:P.nele
>>         ψs[1, i, 1] = ψ0[1, i]
>>         ψs[2, i, 1] = ψ0[2, i]
>>     end
>>     T12 = im * sin(P.Ω)
>>     T11 = cos(P.Ω)
>>     @inbounds for i in 2:(P.nstep + 1)
>>         for j in 1:P.nele
>>             ψ_e = ψs[1, j, i - 1]
>>             ψ_g = ψs[2, j, i - 1] * eΓ # <---- Scaling factor
>>             ψs[2, j, i] = T11 * ψ_e + T12 * ψ_g
>>             ψs[1, j, i] = T11 * ψ_g + T12 * ψ_e
>>         end
>>     end
>>     ψs
>> end
>>
>> The output of the full script is included below, and it clearly shows
>> that for scaling factors of 0.6-0.8 the function is roughly 10-20 times
>> slower than for the other values.
>>
>> The assembly [2] and LLVM IR [3] for this function are also in the
>> same repo. I see the same behavior on both 0.3 and 0.4, and with LLVM
>> 3.3 and LLVM 3.6, on two different x86_64 machines (my laptop and a
>> Linode VPS). The only platform I've tried that doesn't show similar
>> behavior is Julia 0.4 running on qemu-arm, although even there the
>> performance for different values differs by ~30%, which is bigger
>> than noise.
>>
>> This also seems to depend on the initial value.
>>
>> Has anyone seen similar problems before?
>>
>> Outputs:
>>
>> 325.821 milliseconds (25383 allocations: 1159 KB)
>> 307.826 milliseconds (4 allocations: 144 bytes)
>> 0.0
>>  19.227 milliseconds (2 allocations: 48 bytes)
>> 0.1
>>  17.291 milliseconds (2 allocations: 48 bytes)
>> 0.2
>>  17.404 milliseconds (2 allocations: 48 bytes)
>> 0.3
>>  19.231 milliseconds (2 allocations: 48 bytes)
>> 0.4
>>  20.278 milliseconds (2 allocations: 48 bytes)
>> 0.5
>>  23.692 milliseconds (2 allocations: 48 bytes)
>> 0.6
>> 328.107 milliseconds (2 allocations: 48 bytes)
>> 0.7
>> 312.425 milliseconds (2 allocations: 48 bytes)
>> 0.8
>> 201.494 milliseconds (2 allocations: 48 bytes)
>> 0.9
>>  16.314 milliseconds (2 allocations: 48 bytes)
>> 1.0
>>  16.264 milliseconds (2 allocations: 48 bytes)
>>
>>
>> [1] https://github.com/yuyichao/explore/blob/e4be0151df33571c1c22f54fe044c929ca821c46/julia/array_prop/array_prop.jl
>> [2] https://github.com/yuyichao/explore/blob/e4be0151df33571c1c22f54fe044c929ca821c46/julia/array_prop/propagate.S
>> [3] https://github.com/yuyichao/explore/blob/e4be0151df33571c1c22f54fe044c929ca821c46/julia/array_prop/propagate.ll
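
To make the sweep over `eΓ` easier for others to reproduce, here is a minimal sketch of a self-contained harness around the `propagate` function quoted above. The `Params` type, the `sweep` helper, the problem sizes, `Ω` and the initial state are all hypothetical stand-ins, not the ones used in array_prop.jl [1], so the timings it prints may not match the output above. It is written in current Julia (1.x) syntax, so it will not run unchanged on 0.3/0.4.

```julia
# Hypothetical parameter type: only the three fields that `propagate` reads.
# These are illustrative stand-ins, not the definitions from array_prop.jl.
struct Params
    nele::Int   # number of independent elements
    nstep::Int  # number of propagation steps
    Ω::Float64  # rotation angle used to build T11 and T12
end

# `propagate` copied unchanged from the message above.
function propagate(P, ψ0, ψs, eΓ)
    @inbounds for i in 1:P.nele
        ψs[1, i, 1] = ψ0[1, i]
        ψs[2, i, 1] = ψ0[2, i]
    end
    T12 = im * sin(P.Ω)
    T11 = cos(P.Ω)
    @inbounds for i in 2:(P.nstep + 1)
        for j in 1:P.nele
            ψ_e = ψs[1, j, i - 1]
            ψ_g = ψs[2, j, i - 1] * eΓ  # <---- Scaling factor
            ψs[2, j, i] = T11 * ψ_e + T12 * ψ_g
            ψs[1, j, i] = T11 * ψ_g + T12 * ψ_e
        end
    end
    ψs
end

# Hypothetical helper: times `propagate` once per scaling factor and
# returns the elapsed seconds for each value in `scales`.
function sweep(P; scales = 0.0:0.1:1.0)
    ψ0 = zeros(ComplexF64, 2, P.nele)
    ψ0[1, :] .= 1  # hypothetical initial state: everything in the first component
    ψs = Array{ComplexF64}(undef, 2, P.nele, P.nstep + 1)
    propagate(P, ψ0, ψs, 1.0)  # warm-up call so JIT compilation is not timed
    times = Float64[]
    for eΓ in scales
        t = @elapsed propagate(P, ψ0, ψs, eΓ)
        println(eΓ, ": ", round(t * 1000; digits = 3), " ms")
        push!(times, t)
    end
    return times
end

times = sweep(Params(1000, 1000, 0.1))  # hypothetical problem size and Ω
```

Since the slowdown also seems to depend on the initial value, changing the line that fills `ψ0` is the obvious knob to turn when comparing against the numbers above.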
