I'm not certain, but I think I may have figured out what's going on. The hint lies in the number of allocations: map! makes 20 million allocations, while broadcast! makes just 5. So I had a look at how the two functions are implemented.
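The size of those allocations is another clue. The @time figures in the quoted message are in MiB (38.147 MB is exactly 5,000,000 Float64 values × 8 bytes = 40,000,000 bytes, i.e. the tmp array). If you subtract that array from the mape4 total and divide by the allocation count, each allocation comes out at about 16 bytes:

$$
\frac{(343.323 - 38.147)\,\text{MiB} \times 2^{20}\,\tfrac{\text{B}}{\text{MiB}}}{20 \times 10^{6}} \approx \frac{320 \times 10^{6}\,\text{B}}{20 \times 10^{6}} = 16\ \text{B per allocation}
$$

As far as I know, 16 bytes is the size of a single boxed Float64 on a 64-bit build (an 8-byte type tag plus the 8-byte value), so each allocation looks like one heap-boxed number rather than anything bigger.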
map! is implemented in perhaps the simplest way you can think of, a plain loop along the lines of for i = 1:length(A); dest[i] = f(A[i], B[i]); end. The loop variable and the values A[i] and B[i] shouldn't cost anything by themselves, since plain Int and Float64 values normally stay in registers. My guess is that the allocations come from the call: map! only knows that f is some Function, so the compiler can't specialize the loop on _f or infer what it returns, and every f(A[i], B[i]) becomes a dynamic call whose Float64 arguments and result have to be boxed on the heap. If each element costs about four such boxes, that gives 4 times 5 million, i.e. the 20 million allocations.

broadcast!, instead, keeps a cache of generated code: as far as I can tell, it uses macros to build a specialized loop for each function it's called with and reuses that loop on later calls, so the code that actually runs has f built in, works on unboxed values and just overwrites the entries of tmp. Unfortunately, that's as much as I can figure out from broadcast!, because it uses a lot of macros and a lot of relatively opaque structure. I'm also not entirely sure how the generated code gets hold of f without going through a dynamic call.

On Friday, 23 October 2015 01:54:14 UTC+10, Ján Dolinský wrote:
>
> Hi,
>
> I am exploring Julia's map() and broadcast() functions. I did a simple implementation of MAPE (mean absolute percentage error) using broadcast() and map(). Interestingly, the difference in performance was huge.
>
> A = rand(5_000_000)
> F = rand(5_000_000)
>
> _f(a,f) = (a - f) / a
>
> function mape3(A, F)
>     # A - actual target values
>     # F - forecasts (model estimations)
>
>     tmp = similar(A)
>     broadcast!(_f, tmp, A, F)
>     100 * sumabs(tmp) / length(A)
>
> end
>
> function mape4(A, F)
>     # A - actual target values
>     # F - forecasts (model estimations)
>
>     tmp = similar(A)
>     map!(_f, tmp, A, F)
>     100 * sumabs(tmp) / length(A)
>
> end
>
> @time mape3(A,F) # after JIT warm-up
> 0.038686 seconds (8 allocations: 38.147 MB, 2.25% gc time)
> 876.4813057521973
>
> @time mape4(A,F) # after JIT warm-up
> 0.457771 seconds (20.00 M allocations: 343.323 MB, 11.29% gc time)
> 876.4813057521973
>
> I wonder why map() is so much slower ?
>
> Thanks,
> Jan
>
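For anyone reading this on a newer Julia: since 0.5 every function has its own type and functions passed as arguments get specialized, so I wouldn't expect the map!/broadcast! gap above to reproduce there. The sketch below is only an illustration of the boxing idea, assuming Julia 1.x. loop_map! and Opaque are hypothetical helpers, not part of Base: loop_map! is the plain loop described for map!, and Opaque hides the function behind an untyped field so the loop can't specialize on it, roughly what map! faced on 0.4. Allocations are measured with @allocated after a warm-up call.

```julia
# Sketch only (assumes Julia 1.x): mimics the 0.4 situation in which the
# loop inside map! could not be specialized on the function `f`.

_f(a, f) = (a - f) / a          # same kernel as in the quoted post

# Hypothetical helper: the plain loop that map! was described as using.
function loop_map!(f, dest, A, B)
    for i in eachindex(dest, A, B)
        dest[i] = f(A[i], B[i])
    end
    return dest
end

# Hypothetical wrapper: storing `f` in an untyped field hides its type,
# so each call in the loop becomes a dynamic dispatch and the Float64
# values passed through it have to be boxed on the heap.
struct Opaque
    f::Any
end
(o::Opaque)(a, b) = o.f(a, b)

# Bytes allocated by one call, measured after a warm-up call so that
# compilation is not counted.
function bytes_allocated(f, dest, A, B)
    loop_map!(f, dest, A, B)
    return @allocated loop_map!(f, dest, A, B)
end

A = rand(1_000_000)
F = rand(1_000_000)
tmp = similar(A)

fast = bytes_allocated(_f, tmp, A, F)          # type of f is known
slow = bytes_allocated(Opaque(_f), tmp, A, F)  # type of f is hidden
println("specialized: ", fast, " bytes; opaque: ", slow, " bytes")

# `sumabs` from the quoted post was later removed; `sum(abs, x)` replaces it.
mape = 100 * sum(abs, loop_map!(_f, tmp, A, F)) / length(A)
println("MAPE: ", mape)
```

On my understanding the specialized run should report next to nothing while the opaque run should report tens of megabytes, but the exact figures depend on the Julia version.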