Try putting some extraneous parentheses around some of your operations, and 
you'll get good performance again. It's an inlining thing.
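
A minimal sketch of that suggestion, applied to the loop quoted below. Julia parses a chain such as x1 * x2 * x3 * x4 as a single n-ary call to `*`, so adding parentheses splits it into several calls with fewer arguments each. The name test_grouped and the particular grouping are illustrative assumptions; where exactly the inlining threshold lies is not confirmed here and may differ between versions.

```julia
# A chained product is one n-ary call: the `*` symbol plus four operands.
flat = :(x1 * x2 * x3 * x4)
# With parentheses it becomes a two-operand call whose operands are calls.
grouped = :((x1 * x2) * (x3 * x4))

# Same kernel as in the quoted message, with the six-factor product split
# into two parenthesized groups of three factors each (illustrative choice).
function test_grouped(a, b, c)
    @simd for i in 1:length(a)
        @inbounds a[i] += (b[i] * c[i] * (c[i] < b[i])) *
                          ((c[i] > b[i]) * (c[i] <= b[i]) * (c[i] >= b[i]))
    end
    return a
end

n = 100000
a = zeros(Float32, n)
b = rand(Float32, n)
c = rand(Float32, n)

test_grouped(a, b, c)        # first call compiles the function
@time test_grouped(a, b, c)  # compare allocations with the ungrouped version
```

If the grouped version reports only a handful of allocations while the ungrouped one reports hundreds of thousands, that points at the long `*` call rather than at the comparisons themselves.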

Please do report this as an issue: 
https://github.com/JuliaLang/julia/issues/new

--Tim

On Wednesday, October 14, 2015 08:07:11 AM Damien wrote:
> Hi all,
> 
> I'm noticing a strange performance issue with expressions such as this one:
> 
> n = 100000
> a = zeros(Float32, n)
> b = rand(Float32, n)
> c = rand(Float32, n)
> 
> function test(a, b, c)
>    @simd for i in 1:length(a)
>        @inbounds a[i] += b[i] * c[i] * (c[i] < b[i]) * (c[i] > b[i]) * (c[i] <= b[i]) * (c[i] >= b[i])
>    end
> end
> 
> The problem is that performance and successful vectorisation depend on the
> number of comparison statements in the expression and whether the
> comparisons are explicitly cast to Float32.
> 
> In Julia 0.4-rc4, I get the following:
> 
> @inbounds a[i] += b[i] * c[i] * (c[i] < b[i]) * (c[i] > b[i]) * (c[i] <= b[i])
> 
> > test(a, b, c)
> > @time test(a, b, c)
> 
> 0.000169 seconds (4 allocations: 160 bytes)
> 
> @inbounds a[i] += b[i] * c[i] * (c[i] < b[i]) * (c[i] > b[i]) * (c[i] <= b[i]) * (c[i] >= b[i])
> 
> > test(a, b, c)
> > @time test(a, b, c)
> 
> 0.007258 seconds (200.00 k allocations: 3.052 MB, 47.59% gc time)
> 
> @inbounds a[i] += b[i] * c[i] * Float32(c[i] < b[i]) * Float32(c[i] > b[i]) * Float32(c[i] <= b[i]) * Float32(c[i] <= b[i])
> 
> > test(a, b, c)
> > @time test(a, b, c)
> 
> 0.000137 seconds (4 allocations: 160 bytes)
> 
> I get a similar behavior in the current 0.5 HEAD (Commit d9f7c21* with the
> fix for issue #13553) but the threshold for the number of comparisons is
> slightly different.
> 
> (a) Is it meant to be OK to use expressions like a[i] * (c[i] < b[i]) or
> should I always cast explicitly? I really like the implicit version,
> because it is very readable and a natural translation of equations
> involving cases.
> 
> (b) What is causing the vectorisation threshold observed here?
> 
> Best,
> Damien
