540 ulps does seem kind of big.

On Thu, Oct 13, 2016 at 4:21 AM, DNF <oyv...@gmail.com> wrote:
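(As a minimal sketch of where a figure like "540 ulps" can come from, assuming the two sums quoted below: for positive Float64 values in the same binade, the distance in ulps equals the difference of their integer bit patterns. The thread itself doesn't say how the number was obtained.)

```julia
# Count the ulps between the two sums from this thread.
# Both values lie in the same binade [512, 1024), so consecutive
# doubles have consecutive bit patterns, and the bit-pattern
# difference is the distance in ulps.
a = 781.4987197415827   # f(a, p)
b = 781.4987197415213   # f(reverse(a), reverse(p))
ulps = abs(reinterpret(Int64, a) - reinterpret(Int64, b))
println(ulps)           # ~540, matching the figure quoted above
```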
> That seems right:
>
> julia> f(a, p)
> 781.4987197415827
>
> julia> f(reverse(a), reverse(p))
> 781.4987197415213
>
> But I'm pretty surprised the effect is that big.
>
> On Thursday, October 13, 2016 at 9:49:00 AM UTC+2, Kristoffer Carlsson wrote:
>>
>> I think you will not add the numbers in the same order when SIMD is
>> used. Floating-point addition is not associative, so you get slightly
>> different answers.
>>
>> On Thursday, October 13, 2016 at 9:14:31 AM UTC+2, DNF wrote:
>>>
>>> This is about twice as fast with @simd:
>>>
>>> function f2(a, p)
>>>     @assert length(a) == length(p)
>>>     s = 0.0
>>>     @simd for i in eachindex(a)
>>>         @inbounds s += abs((a[i] - p[i]) / a[i])
>>>     end
>>>     return 100s / length(a)
>>> end
>>>
>>> julia> @benchmark f(a, p)
>>> BenchmarkTools.Trial:
>>>   samples:          115
>>>   evals/sample:     1
>>>   time tolerance:   5.00%
>>>   memory tolerance: 1.00%
>>>   memory estimate:  144.00 bytes
>>>   allocs estimate:  7
>>>   minimum time:     41.96 ms (0.00% GC)
>>>   median time:      42.53 ms (0.00% GC)
>>>   mean time:        43.49 ms (0.00% GC)
>>>   maximum time:     52.82 ms (0.00% GC)
>>>
>>> julia> @benchmark f2(a, p)
>>> BenchmarkTools.Trial:
>>>   samples:          224
>>>   evals/sample:     1
>>>   time tolerance:   5.00%
>>>   memory tolerance: 1.00%
>>>   memory estimate:  0.00 bytes
>>>   allocs estimate:  0
>>>   minimum time:     21.08 ms (0.00% GC)
>>>   median time:      21.86 ms (0.00% GC)
>>>   mean time:        22.38 ms (0.00% GC)
>>>   maximum time:     27.30 ms (0.00% GC)
>>>
>>> Weirdly, they give slightly different answers:
>>>
>>> julia> f(a, p)
>>> 781.4987197415827
>>>
>>> julia> f2(a, p)
>>> 781.498719741497
>>>
>>> I would like to know why that happens.
>>>
>>> On Friday, October 7, 2016 at 10:29:20 AM UTC+2, Martin Florek wrote:
>>>>
>>>> Thanks, Andrew, for the answer.
>>>> I also have the experience that eachindex() is slightly faster.
>>>> In the Performance Tips I found macros, e.g. @simd. Do you have any
>>>> experience with them?
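(Kristoffer's point about summation order can be seen with a tiny standalone example, values chosen purely for illustration:)

```julia
# Floating-point addition is commutative but NOT associative:
# regrouping a sum changes which rounding errors occur.
left  = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
println(left)           # 0.6000000000000001
println(right)          # 0.6
println(left == right)  # false

# @simd permits exactly this kind of regrouping: the compiler may
# split the loop's accumulator into several partial sums (one per
# vector lane), so f2 effectively adds the terms in a different
# order than the scalar f does, and the two results differ by a
# few hundred ulps out of ~10^15 significant parts.
```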