> On 16 Feb 2024, at 2:48 am, Marten van Kerkwijk <m...@astro.utoronto.ca> 
> wrote:
> 
>> In [45]: %timeit np.add.reduce(a, axis=None)
>> 42.8 µs ± 2.44 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
>> 
>> In [43]: %timeit dotsum(a)
>> 26.1 µs ± 718 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
>> 
>> But theoretically, sum, should be faster than dot product by a fair bit.
>> 
>> Isn’t parallelisation implemented for it?
> 
> I cannot reproduce that:
> 
> In [3]: %timeit np.add.reduce(a, axis=None)
> 19.7 µs ± 184 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
> 
> In [4]: %timeit dotsum(a)
> 47.2 µs ± 360 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
> 
> But almost certainly it is indeed due to optimizations, since .dot uses
> BLAS which is highly optimized (at least on some platforms, clearly
> better on yours than on mine!).
> 
> I thought .sum() was optimized too, but perhaps less so?
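For context, `dotsum` is never shown in the thread; a plausible sketch of the
usual trick (reducing via a BLAS-backed dot product against a vector of ones,
so an optimized and possibly multithreaded BLAS does the summation) would be:

```python
import numpy as np

def dotsum(a):
    # Hypothetical reconstruction -- the thread never shows dotsum's body.
    # Flatten the array and sum it as a dot product with a ones vector,
    # which routes the reduction through BLAS instead of np.add.reduce.
    flat = np.asarray(a, dtype=np.float64).ravel()
    return flat.dot(np.ones(flat.size))

a = np.random.default_rng(0).random((100, 100))
# Agrees with the ufunc reduction up to floating-point rounding.
print(np.allclose(dotsum(a), np.add.reduce(a, axis=None)))
```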


I can confirm that, at least, it does not seem to use multithreading: with the
conda-installed numpy+BLAS I almost exactly reproduce your numbers, whereas
linked against my own OpenBLAS build I get

In [3]: %timeit np.add.reduce(a, axis=None)
19 µs ± 111 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

# OMP_NUM_THREADS=1
In [4]: %timeit dotsum(a)
20.5 µs ± 164 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

# OMP_NUM_THREADS=8
In [4]: %timeit dotsum(a)
9.84 µs ± 1.1 µs per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

add.reduce shows no difference between the two settings and always stays at
<= 100 % CPU usage.
dotsum scales even better with larger matrices, e.g. ~4x for 1000x1000.
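The scaling claim can be checked with a quick in-process timing sketch
(assuming the `dotsum` trick above; the BLAS thread count is whatever
OMP_NUM_THREADS was when NumPy was imported, so this only shows one setting):

```python
import timeit
import numpy as np

def dotsum(a):
    # Hypothetical dotsum, as discussed in the thread: sum via BLAS dot.
    flat = a.ravel()
    return flat.dot(np.ones(flat.size))

a = np.random.default_rng(0).random((1000, 1000))

# Best-of-3 timings; with a multithreaded BLAS, dotsum may pull ahead
# on large arrays while np.add.reduce stays single-threaded.
t_reduce = min(timeit.repeat(lambda: np.add.reduce(a, axis=None),
                             number=100, repeat=3))
t_dot = min(timeit.repeat(lambda: dotsum(a), number=100, repeat=3))
print(f"add.reduce: {t_reduce:.4f}s  dotsum: {t_dot:.4f}s")
```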

Cheers,
                                                        Derek
_______________________________________________
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/