Note also that:

function mynorm(x)
    s = zero(x[1]^2)
    @inbounds @simd for I in eachindex(x)
        s += x[I]^2
    end
    sqrt(s)
end
does get SIMDed. So the difference is almost surely vectorization.

--Tim

On Wednesday, November 18, 2015 01:38:38 PM Tim Holy wrote:
> On Wednesday, November 18, 2015 02:30:01 PM Stefan Karpinski wrote:
> > Those numbers don't include any compilation (the allocations are too
> > low). I'm seeing a similar thing. They're just implemented in really
> > different ways. maxabs uses mapreduce, which seems to be a chronic
> > source of less-than-optimal performance.
>
> Not the problem:
>
> julia> function mymaxabs(x)
>            s = abs(x[1])
>            @inbounds @simd for I in eachindex(x)
>                s = max(s, abs(x[I]))
>            end
>            s
>        end
> mymaxabs (generic function with 1 method)
>
> julia> x = randn(100000);
>
> # warmup suppressed
>
> julia> @time maxabs(x)
>   0.000425 seconds (5 allocations: 176 bytes)
> 4.513240114499124
>
> julia> @time mymaxabs(x)
>   0.000642 seconds (5 allocations: 176 bytes)
> 4.513240114499124
>
> (It doesn't actually get SIMDed, though.)
>
> I'm not entirely surprised. Multiplication is fast, and with 10^5 elements
> the sqrt should not be the bottleneck.
>
> --Tim
>
> > On Wed, Nov 18, 2015 at 2:12 PM, Benjamin Deonovic <bdeono...@gmail.com>
> > wrote:
> > > Does norm use maxabs? If so, this could be due to maxabs getting
> > > compiled. Try running both of the timed statements a second time.
> > >
> > > On Wednesday, November 18, 2015 at 10:41:48 AM UTC-6, Sisyphuss wrote:
> > >> Interesting phenomenon: norm() is faster than maxabs()
> > >>
> > >> x = randn(100000)
> > >> @time maxabs(x)
> > >> @time norm(x)
> > >>
> > >> 0.000108 seconds (5 allocations: 176 bytes)
> > >> 0.000040 seconds (5 allocations: 176 bytes)
> > >>
> > >> I had thought the contrary, since norm() requires N squares and 1
> > >> square root, while maxabs() requires 2N sign-bit changes and N
> > >> comparisons.
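For anyone who wants to play with the two reductions outside Julia, here is a plain-Python transliteration of the loops in the thread (a sketch for illustration only; scalar Python loops will not show the SIMD effect being discussed, and the function names are just the thread's names carried over):

```python
import math
import random

def mynorm(x):
    # Sum of squares, then one square root: N multiplies + 1 sqrt.
    s = 0.0
    for v in x:
        s += v * v
    return math.sqrt(s)

def mymaxabs(x):
    # Running maximum of absolute values: N abs calls + N comparisons.
    s = 0.0
    for v in x:
        s = max(s, abs(v))
    return s

x = [random.gauss(0.0, 1.0) for _ in range(100_000)]
print(mynorm(x), mymaxabs(x))
```

The point of the thread is that the operation counts alone are misleading: the sum-of-squares loop is a straight-line reduction the compiler can vectorize, while the max-of-abs loop (as written) was not vectorized, so norm() came out faster despite doing "more" arithmetic per element.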