On Wed, Jul 8, 2009 at 4:16 PM, Pauli Virtanen <pav...@iki.fi> wrote:
> Hi,
>
> Ticket #1143 points out that Numpy's reduction operations are not
> always cache friendly. I worked a bit on tuning them.
>
> Just to tickle some interest, a "pathological" case before optimization:
>
> In [1]: import numpy as np
>
> In [2]: x = np.zeros((80000, 256))
>
> In [3]: %timeit x.sum(axis=0)
> 10 loops, best of 3: 850 ms per loop
>
> After optimization:
>
> In [1]: import numpy as np
>
> In [2]: x = np.zeros((80000, 256))
>
> In [3]: %timeit x.sum(axis=0)
> 10 loops, best of 3: 78.5 ms per loop
>
> For comparison, a reduction operation on a contiguous array of
> the same size:
>
> In [4]: x = np.zeros((256, 80000))
>
> In [5]: %timeit x.sum(axis=1)
> 10 loops, best of 3: 88.9 ms per loop
>
> ;)
>
> Funnily enough, it's actually slower than the reduction over the
> axis with the larger stride. The improvement factor depends on
> the CPU and its cache size.

How do the benchmarks compare with just making contiguous copies? That is a sort of blocking, I suppose.

Chuck
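Chuck's question can be sketched in a few lines: instead of reducing directly over the large-stride axis, first make a contiguous copy of the transposed array and reduce over its fast axis. This is an illustration written for this thread, not the patch from ticket #1143; the array shape is shrunk from the email's 80000x256 so it runs quickly, and the exact timings will vary by machine.

```python
import numpy as np
import timeit

# Smaller stand-in for the email's np.zeros((80000, 256)); nonzero data
# so the two strategies can be checked against each other.
x = np.arange(8000 * 256, dtype=np.float64).reshape(8000, 256)

# Strategy 1: reduce directly over axis 0 (large stride between elements).
direct = x.sum(axis=0)

# Strategy 2: explicit contiguous copy first -- a crude form of blocking.
# ascontiguousarray(x.T) materializes the transpose, so the subsequent
# reduction runs over the fast (contiguous) axis.
copied = np.ascontiguousarray(x.T).sum(axis=1)

# Both strategies must compute the same column sums.
assert np.allclose(direct, copied)

# Illustrative timing comparison (analogous to the email's %timeit runs).
t_direct = timeit.timeit(lambda: x.sum(axis=0), number=20)
t_copied = timeit.timeit(lambda: np.ascontiguousarray(x.T).sum(axis=1),
                         number=20)
print(f"direct: {t_direct:.4f}s  copy-then-reduce: {t_copied:.4f}s")
```

Note the copy strategy pays for an extra full-array write, so a cache-blocked in-place reduction can beat it even when both read memory in a friendly order.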
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion