> One more thing to mention on this topic.
>
> From a certain size onwards, the dot product becomes faster than sum 
> (due to parallelisation, I guess?).
>
> E.g.
> import numpy as np
>
> def dotsum(arr):
>     a = arr.reshape(1000, 100)
>     return a.dot(np.ones(100)).sum()
>
> a = np.ones(100000)
>
> In [45]: %timeit np.add.reduce(a, axis=None)
> 42.8 µs ± 2.44 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
>
> In [43]: %timeit dotsum(a)
> 26.1 µs ± 718 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
>
> But theoretically, sum should be faster than the dot product by a fair bit.
>
> Isn’t parallelisation implemented for it?

I cannot reproduce that:

In [3]: %timeit np.add.reduce(a, axis=None)
19.7 µs ± 184 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [4]: %timeit dotsum(a)
47.2 µs ± 360 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

But almost certainly it is indeed due to optimizations, since .dot uses
BLAS, which is highly optimized (at least on some platforms -- clearly
better on yours than on mine!).

I thought .sum() was optimized too, but perhaps less so?
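For anyone who wants to try this on their own machine, here is a self-contained version of the benchmark from the quoted message (the `dotsum` reshape trick turns the reduction into a BLAS matrix-vector product; the timings, of course, depend entirely on your platform and BLAS build):

```python
import timeit

import numpy as np

a = np.ones(100_000)

def dotsum(arr):
    # Reshape so the reduction becomes a matrix-vector product,
    # which dispatches to BLAS.
    m = arr.reshape(1000, 100)
    return m.dot(np.ones(100)).sum()

# Both compute the same value; only the timings differ by platform.
assert np.isclose(dotsum(a), np.add.reduce(a, axis=None))

n = 1000
t_sum = timeit.timeit(lambda: np.add.reduce(a, axis=None), number=n)
t_dot = timeit.timeit(lambda: dotsum(a), number=n)
print(f"add.reduce: {t_sum / n * 1e6:.1f} µs/loop")
print(f"dotsum:     {t_dot / n * 1e6:.1f} µs/loop")
```

Note that `dotsum` as written is hard-coded to arrays of 100,000 elements; it is just a benchmark sketch, not a general replacement for `.sum()`.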

It may be good to raise a quick issue about this!

Thanks, Marten
_______________________________________________
NumPy-Discussion mailing list -- numpy-discussion@python.org