Note also that:

    function mynorm(x)
        s = zero(x[1]^2)
        @inbounds @simd for I in eachindex(x)
            s += x[I]^2
        end
        sqrt(s)
    end

does get SIMDed. So the difference is almost surely vectorization.
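
One rough way to check the vectorization claim is to dump the LLVM IR for each loop and look for vector types. The sketch below assumes Julia 1.x (where `code_llvm` lives in `InteractiveUtils`) and Float64 input; `looks_vectorized` is a hypothetical helper name, and matching on `<N x double>` is only a heuristic. The result depends on the Julia version and the CPU, so no output is shown.

```julia
using InteractiveUtils  # provides code_llvm (assumes Julia 0.7 or later)

function mynorm(x)
    s = zero(x[1]^2)
    @inbounds @simd for I in eachindex(x)
        s += x[I]^2
    end
    sqrt(s)
end

function mymaxabs(x)
    s = abs(x[1])
    @inbounds @simd for I in eachindex(x)
        s = max(s, abs(x[I]))
    end
    s
end

# Hypothetical helper: returns true if the optimized LLVM IR for f mentions
# a vector-of-doubles type such as <4 x double>. This is a heuristic, not proof.
function looks_vectorized(f, argtypes)
    io = IOBuffer()
    code_llvm(io, f, argtypes)   # write the IR into the buffer instead of stdout
    ir = String(take!(io))
    occursin(r"<\d+ x double>", ir)
end

println("mynorm:   ", looks_vectorized(mynorm, (Vector{Float64},)))
println("mymaxabs: ", looks_vectorized(mymaxabs, (Vector{Float64},)))
```

Reading the full output of `@code_llvm mynorm(x)` by eye gives the same information with more detail; the helper only saves scrolling.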

--Tim


On Wednesday, November 18, 2015 01:38:38 PM Tim Holy wrote:
> On Wednesday, November 18, 2015 02:30:01 PM Stefan Karpinski wrote:
> > Those numbers don't include any compilation (the allocations are too low).
> > I'm seeing a similar thing. They're just implemented in really different
> > ways. maxabs uses mapreduce, which seems to be a chronic source of
> > less-than-optimal performance.
> 
> Not the problem:
> 
> julia> function mymaxabs(x)
>            s = abs(x[1])
>            @inbounds @simd for I in eachindex(x)
>                s = max(s, abs(x[I]))
>            end
>            s
>        end
> mymaxabs (generic function with 1 method)
> 
> julia> x = randn(100000);
> 
> # warmup suppressed
> 
> julia> @time maxabs(x)
>   0.000425 seconds (5 allocations: 176 bytes)
> 4.513240114499124
> 
> julia> @time mymaxabs(x)
>   0.000642 seconds (5 allocations: 176 bytes)
> 4.513240114499124
> 
> 
> (It doesn't actually get SIMDed, though.)
> 
> I'm not entirely surprised. Multiplication is fast, and with 10^5 elements
> the sqrt should not be the bottleneck.
> 
> --Tim
> 
> > On Wed, Nov 18, 2015 at 2:12 PM, Benjamin Deonovic <bdeono...@gmail.com> wrote:
> > > Does norm use maxabs? If so, this could be due to maxabs getting
> > > compiled. Try running both of the timed statements a second time.
> > > 
> > > On Wednesday, November 18, 2015 at 10:41:48 AM UTC-6, Sisyphuss wrote:
> > >> Interesting phenomenon: norm() is faster than maxabs()
> > >> 
> > >> x = randn(100000)
> > >> @time maxabs(x)
> > >> @time norm(x)
> > >> 
> > >> 
> > >> 0.000108 seconds (5 allocations: 176 bytes)
> > >> 0.000040 seconds (5 allocations: 176 bytes)
> > >> 
> > >> I had expected the opposite, since norm() requires N squarings and 1 square
> > >> root, while maxabs() requires 2N sign-bit changes and N comparisons.
