On Thu, Feb 10, 2011 at 2:26 PM, Mark Wiebe <mwwi...@gmail.com> wrote:
> On Thu, Feb 10, 2011 at 10:31 AM, Pauli Virtanen <p...@iki.fi> wrote:
>
>> Thu, 10 Feb 2011 12:16:12 -0600, Robert Kern wrote:
>> [clip]
>> > One thing that might be worthwhile is to make
>> > implementations of sum() and cumsum() that avoid the ufunc machinery and
>> > do their iterations more quickly, at least for some common combinations
>> > of dtype and contiguity.
>>
>> I wonder what is the balance between the iterator overhead and the time
>> taken in the reduction inner loop. This should be straightforward to
>> benchmark.
>>
>> Apparently, some overhead decreased with the new iterators, since current
>> Numpy master outperforms 1.5.1 by a factor of 2 for this benchmark:
>>
>> In [8]: %timeit M.sum(1) # Numpy 1.5.1
>> 10 loops, best of 3: 85 ms per loop
>>
>> In [8]: %timeit M.sum(1) # Numpy master
>> 10 loops, best of 3: 49.5 ms per loop
>>
>> I don't think this is explainable by the new memory layout optimizations,
>> since M is C-contiguous.
>>
>> Perhaps there would be room for more optimization, even within the ufunc
>> framework?
>>
>
> I played around with this in einsum, where it's a bit easier to specialize
> this case than in the ufunc machinery. What I found made the biggest
> difference is to use SSE prefetching instructions to prepare the cache in
> advance. Here are the kind of numbers I get, all from the current Numpy
> master:
>
> In [7]: timeit M.sum(1)
> 10 loops, best of 3: 44.6 ms per loop
>
> In [8]: timeit dot(M, o)
> 10 loops, best of 3: 36.8 ms per loop
>
> In [9]: timeit einsum('ij->i', M)
> 10 loops, best of 3: 32.1 ms per loop
>
...
>

I get an even bigger speedup:

In [5]: timeit M.sum(1)
10 loops, best of 3: 19.2 ms per loop

In [6]: timeit dot(M, o)
100 loops, best of 3: 15.2 ms per loop

In [7]: timeit einsum('ij->i', M)
100 loops, best of 3: 11.4 ms per loop

<snip>

Chuck
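
For anyone who wants to repeat the comparison, here is a minimal sketch of the three row-sum variants timed above. The thread does not give the shape of M or the definition of o, so the 1000x1000 float64 array and the vector of ones below are assumptions, as is the helper name best_ms. Absolute timings will differ by machine and NumPy version, so treat the output as relative only.

```python
import timeit

import numpy as np

# Assumption: the thread never states the shape or dtype of M.
rng = np.random.default_rng(0)
M = rng.random((1000, 1000))   # C-contiguous float64 array
# Assumption: o is a vector of ones, so dot(M, o) sums each row.
o = np.ones(M.shape[1])

candidates = {
    "M.sum(1)": lambda: M.sum(1),
    "dot(M, o)": lambda: np.dot(M, o),
    "einsum('ij->i', M)": lambda: np.einsum('ij->i', M),
}

# Check that the three variants compute the same row sums before timing them.
reference = M.sum(1)
for name, fn in candidates.items():
    assert np.allclose(fn(), reference), name


def best_ms(fn, repeat=3, number=10):
    """Best-of-`repeat` time per call in milliseconds (hypothetical helper)."""
    return min(timeit.repeat(fn, repeat=repeat, number=number)) / number * 1e3


for name, fn in candidates.items():
    print(f"{name:>20}: {best_ms(fn):.2f} ms per loop")
```

Taking the minimum over repeats mirrors the "best of 3" figure that IPython's timeit reports in the numbers quoted above.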
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion