On Wed, Jul 8, 2009 at 4:16 PM, Pauli Virtanen <pav...@iki.fi> wrote:
> Hi,
>
> Ticket #1143 points out that Numpy's reduction operations are not
> always cache friendly. I worked a bit on tuning them.
>
> Just to tickle some interest, a "pathological" case before optimization:
>
> In [1]: import numpy as np
>
> In [2]: x = np.zeros((80000, 256))
>
> In [3]: %timeit x.sum(axis=0)
> 10 loops, best of 3: 850 ms per loop
>
> After optimization:
>
> In [1]: import numpy as np
>
> In [2]: x = np.zeros((80000, 256))
>
> In [3]: %timeit x.sum(axis=0)
> 10 loops, best of 3: 78.5 ms per loop
>
> For comparison, a reduction operation on a contiguous array of
> the same size:
>
> In [4]: x = np.zeros((256, 80000))
>
> In [5]: %timeit x.sum(axis=1)
> 10 loops, best of 3: 88.9 ms per loop
>
> ;)
>
> Funnily enough, it's actually slower than the reduction over the
> axis with the larger stride. The improvement factor depends on
> the CPU and its cache size.

How do the benchmarks compare with just making contiguous copies? That is a sort of blocking, I suppose.

Chuck
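Chuck's question can be sketched in a few lines: instead of reducing directly over the large-stride axis, first make a contiguous copy of the transposed array and reduce over its fast axis. This is an illustration written for this thread, not the patch from ticket #1143; the array shape is shrunk from the email's 80000x256 so it runs quickly, and the exact timings will vary by machine.

```python
import numpy as np
import timeit

# Smaller stand-in for the email's np.zeros((80000, 256)); nonzero data
# so the two strategies can be checked against each other.
x = np.arange(8000 * 256, dtype=np.float64).reshape(8000, 256)

# Strategy 1: reduce directly over axis 0 (large stride between elements).
direct = x.sum(axis=0)

# Strategy 2: explicit contiguous copy first -- a crude form of blocking.
# ascontiguousarray(x.T) materializes the transpose, so the subsequent
# reduction runs over the fast (contiguous) axis.
copied = np.ascontiguousarray(x.T).sum(axis=1)

# Both strategies must compute the same column sums.
assert np.allclose(direct, copied)

# Illustrative timing comparison (analogous to the email's %timeit runs).
t_direct = timeit.timeit(lambda: x.sum(axis=0), number=20)
t_copied = timeit.timeit(lambda: np.ascontiguousarray(x.T).sum(axis=1),
                         number=20)
print(f"direct: {t_direct:.4f}s  copy-then-reduce: {t_copied:.4f}s")
```

Note the copy strategy pays for an extra full-array write, so a cache-blocked in-place reduction can beat it even when both read memory in a friendly order.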
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion