Thanks for your answer!
I posted the message you quoted prematurely by accident, so I deleted it and
re-posted a complete version. Sorry about that.
The commit in question is d9f7c2125831a16c2386888904f303846a1ced95
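
For reference, here is a minimal sketch of how the @code_llvm check suggested in the quoted reply below could be run on the loop from my post. It assumes a recent Julia (0.7 or later), where @code_llvm is provided by InteractiveUtils; that `using` line is my assumption about the environment and is not needed in the REPL.

```julia
# Sketch only: assumes Julia 0.7 or later, where @code_llvm lives in
# InteractiveUtils. The `using` line is unnecessary in the REPL.
using InteractiveUtils

n = 100000
a = zeros(Float32, n)
b = rand(Float32, n)
c = rand(Float32, n)

# Same loop as in the quoted post (the variant with four comparisons).
function test(a, b, c)
    @simd for i in 1:length(a)
        @inbounds a[i] += b[i] * c[i] * (c[i] < b[i]) * (c[i] > b[i]) *
                          (c[i] <= b[i]) * (c[i] >= b[i])
    end
end

test(a, b, c)             # run once so the method is compiled
@code_llvm test(a, b, c)  # print the LLVM IR generated for these argument types
```

In the printed IR, vector types such as `<4 x float>` or `<8 x float>` in the loop body would suggest that the loop was vectorised, while `call` instructions to Julia functions remaining inside the loop would suggest that something was not inlined.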


On Wednesday, 14 October 2015 17:09:52 UTC+2, Yichao Yu wrote:
> On Wed, Oct 14, 2015 at 10:57 AM, Damien wrote: 
> > Hi all, 
> > 
> > I'm noticing a strange performance issue with expressions such as this one: 
> > 
> > n = 100000 
> > a = zeros(Float32, n) 
> > b = rand(Float32, n) 
> > c = rand(Float32, n) 
> > 
> > function test(a, b, c) 
> >     @simd for i in 1:length(a) 
> >         @inbounds a[i] += b[i] * c[i] * (c[i] < b[i]) * (c[i] > b[i]) * (c[i] <= b[i]) * (c[i] >= b[i]) 
> >     end 
> > end 
> > 
> > 
> > The problem depends on the number of terms in the expression and on 
> > whether the comparisons are explicitly cast to Float32. 
> > 
> > In Julia 0.4-rc4, I get the following: 
> >         @inbounds a[i] += b[i] * c[i] * (c[i] < b[i]) * (c[i] > b[i]) * (c[i] <= b[i]) * (c[i] >= b[i]) 
> > 
> >> test(a, b, c) 
> >> @time test(a, b, c) 
> > 
> > 0.000143 seconds (4 allocations: 160 bytes) 
> > 
> > 
> > 
> > 
> > @inbounds a[i] += b[i] * (c[i] < b[i]) * (c[i] < b[i]) * (c[i] < b[i]) 
> > 
> >> test(a, b, c) 
> >> @time test(a, b, c) 
> > 0.000004 seconds (4 allocations: 160 bytes) 
> > 
> > 
> > Four or more, loop is NOT vectorised: 
> > @inbounds a[i] += b[i] * (c[i] < b[i]) * (c[i] < b[i]) * (c[i] < b[i]) * (c[i] < b[i]) 
> > 
> > 
> >> test(a, b, c) 
> >> @time test(a, b, c) 
> > 0.000021 seconds (204 allocations: 3.281 KB) 
> > 
> > 
> > Explicit casts, loop is vectorised again: 
> > @inbounds a[i] += b[i] * Float32(c[i] < b[i]) * Float32(c[i] < b[i]) * Float32(c[i] < b[i]) * Float32(c[i] < b[i]) 
> > 
> >> test(a, b, c) 
> >> @time test(a, b, c) 
> > 
> > 0.000003 seconds (4 allocations: 160 bytes) 
> > 
> > 
> > 
> > Julia Version 0.5.0-dev+769 
> > Commit d9f7c21* (2015-10-14 12:03 UTC) 
> > Platform Info: 
> >   System: Darwin (x86_64-apple-darwin13.4.0) 
> >   CPU: Intel(R) Core(TM) i7-2635QM CPU @ 2.00GHz 
> >   WORD_SIZE: 64 
> >   BLAS: libopenblas (DYNAMIC_ARCH NO_AFFINITY Sandybridge) 
> >   LAPACK: libopenblas 
> >   LIBM: libopenlibm 
> >   LLVM: libLLVM-3.3 
> > 
> Inlining is a little too fragile, so you should check with 
> @code_llvm whether all the functions are inlined. 
> I've also noticed that the SHA you gave doesn't seem to be a valid 
> commit on JuliaLang/julia, so I couldn't check whether the inlining fix is 
> included. 
> > 
> > 
> > 
