Thanks for the update. I read the link and did some further investigation, e.g.:

N = 5_000_000
A = rand(N)
R = UnitRange[1:round(Int, a*10) for a in A]

function testMap1(R)
    tmp = zeros(Int, length(R))
    for i in eachindex(R)
        tmp[i] = length(R[i])
    end
    sum(tmp)
end

function testMap2(R)
    sum(map(length, R))
end

# warm-up calls so that the timings below exclude JIT compilation
testMap1(R)
testMap2(R)

@time testMap1(R)
  0.572483 seconds (7 allocations: 38.147 MB, 0.23% gc time)
24992990

@time testMap2(R)
  0.279889 seconds (8 allocations: 38.147 MB, 0.62% gc time)
24992990

I wonder why map() is so efficient here, even faster than the de-vectorized
loop version in testMap1().

Regards,
Jan

On Friday, 23 October 2015 11:12:06 UTC+2, Kristoffer Carlsson wrote:
>
> This is not a new issue.
>
> You are simply bumping into the problem that passing functions as
> arguments incurs a cost every time the function is called.
>
> If you want to compare it with map! in base you should do the following:
>
> function mape4a(f, A, F)
>
>     tmp = similar(A)
>     for i in eachindex(A)
>         tmp[i] = f(A[i], F[i])
>     end
>
>     100 * sumabs(tmp) / length(A)
> end
>
>
> @time mape4a(_f, A, F)
>   0.348988 seconds (20.00 M allocations: 343.323 MB, 8.25% gc time)
>
> There are plans to fix this, see
> https://github.com/JuliaLang/julia/pull/13412
>
> On Friday, October 23, 2015 at 10:58:06 AM UTC+2, Ján Dolinský wrote:
>>
>> versioninfo()
>> Julia Version 0.4.0
>> Commit 0ff703b* (2015-10-08 06:20 UTC)
>> Platform Info:
>>   System: Linux (x86_64-linux-gnu)
>>   CPU: Intel(R) Core(TM) i5-4300U CPU @ 1.90GHz
>>   WORD_SIZE: 64
>>   BLAS: libopenblas (NO_LAPACK NO_LAPACKE DYNAMIC_ARCH NO_AFFINITY Haswell)
>>   LAPACK: liblapack.so.3
>>   LIBM: libopenlibm
>>   LLVM: libLLVM-3.3
>>
>> Hi Milan,
>>
>> The above is the versioninfo() output. I am exploring this further: using
>> map() instead of map!() gives me 3 times 5 million allocations, as opposed
>> to map!() with 4 times 5 million allocations. The "for" loop in either
>> map() or map!() should not allocate that much memory. See my devectorized
>> example in the previous post.
>>
>> Shall I file an issue? If so, please advise me on how to do it. In
>> general, I think map() and broadcast() should have about the same
>> performance in the example given at the beginning of this thread.
>>
>> Thanks,
>> Jan
>>
>> On Friday, 23 October 2015 10:44:01 UTC+2, Milan Bouchet-Valat wrote:
>>>
>>> This sounds suspicious to me. If you can file an issue with a
>>> reproducible example, you'll soon get feedback about what's going on
>>> here.
>>>
>>> Please report the output of versioninfo() there too. I assume this is
>>> on 0.4?
>>>
>>>
>>> Regards
>>>
>>> On Friday, 23 October 2015 at 00:42 -0700, Ján Dolinský wrote:
>>> > ## 2 argument
>>> > function map!{F}(f::F, dest::AbstractArray, A::AbstractArray,
>>> >                  B::AbstractArray)
>>> >     for i = 1:length(A)
>>> >         dest[i] = f(A[i], B[i])
>>> >     end
>>> >     return dest
>>> > end
>>> >
>>> > The above is the map!() implementation in abstractarray.jl. Should it
>>> > return "dest" if it is an in-place function? Is there any
>>> > fundamental difference between my mape4a() and map!() in
>>> > abstractarray.jl?
>>> >
>>> > Thanks,
>>> > Jan
>>> >
>>> > On Friday, 23 October 2015 9:30:36 UTC+2, Ján Dolinský wrote:
>>> > > Hi Glen,
>>> > >
>>> > > Thanks for the investigation. I am afraid the for loop in map!() is
>>> > > not the source of the issue.
>>> > > Consider the following:
>>> > >
>>> > > _f(a,f) = (a - f) / a
>>> > >
>>> > > function mape4(A, F)
>>> > >     # A - actual target values
>>> > >     # F - forecasts (model estimations)
>>> > >
>>> > >     tmp = similar(A)
>>> > >     map!(_f, tmp, A, F)
>>> > >     100 * sumabs(tmp) / length(A)
>>> > >
>>> > > end
>>> > >
>>> > > function mape4a(A, F)
>>> > >
>>> > >     tmp = similar(A)
>>> > >     for i in eachindex(A)
>>> > >         tmp[i] = _f(A[i], F[i])
>>> > >     end
>>> > >     100 * sumabs(tmp) / length(A)
>>> > > end
>>> > >
>>> > > @time mape4(A,F)
>>> > >   0.452273 seconds (20.00 M allocations: 343.323 MB, 9.80% gc time)
>>> > > 832.852597807525
>>> > >
>>> > > @time mape4a(A,F)
>>> > >   0.040240 seconds (7 allocations: 38.147 MB, 1.93% gc time)
>>> > > 832.852597807525
>>> > >
>>> > > The for loop in mape4a() does not do 4 * 5 million allocations, and
>>> > > neither should the loop in map!(). Is this possibly a bug?
>>> > >
>>> > > Thanks,
>>> > > Jan
>>> > >
>>> > > On Thursday, 22 October 2015 19:43:31 UTC+2, Glen O wrote:
>>> > > > I'm uncertain, but I think I may have figured out what's going
>>> > > > on.
>>> > > >
>>> > > > The hint lies in the number of allocations - map! has 20 million
>>> > > > allocations, while broadcast! has just 5. So I had a look at how
>>> > > > the two functions are implemented.
>>> > > >
>>> > > > map! is implemented in perhaps the simplest way you can think of
>>> > > > - for i=1:length(A) dest[i]=f(A[i],B[i]); end - which means that
>>> > > > it has to store four values per iteration - i, A[i], B[i], and
>>> > > > f(A[i],B[i]). Thus, 4 times 5 million allocations.
>>> > > >
>>> > > > broadcast! is using a cache to store values, instead, and I
>>> > > > believe it's generating instructions using a macro instead of a
>>> > > > regular loop, thus avoiding the assignments for i. As such, it
>>> > > > doesn't need to store anything except for the initial caches, and
>>> > > > after that it just overwrites the existing values.
>>> > > > Unfortunately, that's as much as I can figure out from
>>> > > > broadcast!, because it uses a lot of macros and a lot of
>>> > > > relatively opaque structure.
>>> > > >
>>> > > > I'm also not entirely sure how it avoids the assignments
>>> > > > necessary in the function call.
>>> > > >
>>> > > > On Friday, 23 October 2015 01:54:14 UTC+10, Ján Dolinský wrote:
>>> > > > > Hi,
>>> > > > >
>>> > > > > I am exploring Julia's map() and broadcast() functions. I did a
>>> > > > > simple implementation of MAPE (mean absolute percentage error)
>>> > > > > using broadcast() and map(). Interestingly, the difference in
>>> > > > > performance was huge.
>>> > > > >
>>> > > > > A = rand(5_000_000)
>>> > > > > F = rand(5_000_000)
>>> > > > >
>>> > > > > _f(a,f) = (a - f) / a
>>> > > > >
>>> > > > > function mape3(A, F)
>>> > > > >     # A - actual target values
>>> > > > >     # F - forecasts (model estimations)
>>> > > > >
>>> > > > >     tmp = similar(A)
>>> > > > >     broadcast!(_f, tmp, A, F)
>>> > > > >     100 * sumabs(tmp) / length(A)
>>> > > > >
>>> > > > > end
>>> > > > >
>>> > > > > function mape4(A, F)
>>> > > > >     # A - actual target values
>>> > > > >     # F - forecasts (model estimations)
>>> > > > >
>>> > > > >     tmp = similar(A)
>>> > > > >     map!(_f, tmp, A, F)
>>> > > > >     100 * sumabs(tmp) / length(A)
>>> > > > >
>>> > > > > end
>>> > > > >
>>> > > > > @time mape3(A,F) # after JIT warm-up
>>> > > > >   0.038686 seconds (8 allocations: 38.147 MB, 2.25% gc time)
>>> > > > > 876.4813057521973
>>> > > > >
>>> > > > > @time mape4(A,F) # after JIT warm-up
>>> > > > >   0.457771 seconds (20.00 M allocations: 343.323 MB, 11.29% gc time)
>>> > > > > 876.4813057521973
>>> > > > >
>>> > > > > I wonder why map() is so much slower?
>>> > > > >
>>> > > > > Thanks,
>>> > > > > Jan
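
For anyone who wants to check Kristoffer's explanation on their own Julia build, a minimal sketch follows. It compares the same loop with the element function hard-coded and with it passed as an argument. The names mape_direct and mape_generic are hypothetical and do not appear in the thread, and sum(abs, tmp) is used in place of sumabs on the assumption that it is available in the installed version. The inputs are kept small and shifted away from zero purely for illustration. No timings are quoted here, since they depend on the Julia version and on whether the change discussed in the linked pull request is present.

```julia
# Relative error of one forecast; same definition as _f in the thread.
_f(a, f) = (a - f) / a

# MAPE with the element function hard-coded in the loop body
# (hypothetical name, mirrors the two-argument mape4a above).
function mape_direct(A, F)
    tmp = similar(A)
    for i in eachindex(A)
        tmp[i] = _f(A[i], F[i])
    end
    100 * sum(abs, tmp) / length(A)
end

# MAPE with the element function passed in as an argument
# (hypothetical name, mirrors Kristoffer's three-argument mape4a).
function mape_generic(f, A, F)
    tmp = similar(A)
    for i in eachindex(A)
        tmp[i] = f(A[i], F[i])
    end
    100 * sum(abs, tmp) / length(A)
end

A = rand(1000) .+ 0.5   # actual values, shifted away from zero (assumption)
F = rand(1000)          # forecasts

# Warm-up calls so the timings below exclude JIT compilation.
mape_direct(A, F)
mape_generic(_f, A, F)

@time mape_direct(A, F)
@time mape_generic(_f, A, F)
```

If the allocation counts reported by the two @time calls differ by roughly a few per element, that would be consistent with the function-argument overhead described above; if they match, the overhead is presumably absent on that build.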