Thu, 10 Feb 2011 12:16:12 -0600, Robert Kern wrote:
[clip]
> One thing that might be worthwhile is to make
> implementations of sum() and cumsum() that avoid the ufunc machinery and
> do their iterations more quickly, at least for some common combinations
> of dtype and contiguity.
I wonder what the balance is between the iterator overhead and the time
taken in the reduction inner loop. This should be straightforward to
benchmark.

Apparently, some overhead decreased with the new iterators, since current
Numpy master outperforms 1.5.1 by nearly a factor of 2 for this benchmark:

    In [8]: %timeit M.sum(1)     # Numpy 1.5.1
    10 loops, best of 3: 85 ms per loop

    In [8]: %timeit M.sum(1)     # Numpy master
    10 loops, best of 3: 49.5 ms per loop

I don't think this is explainable by the new memory layout optimizations,
since M is C-contiguous.

Perhaps there would be room for more optimization, even within the ufunc
framework?

-- 
Pauli Virtanen

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
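
One way the iterator-overhead question above could be probed is to hold the total element count fixed and vary the row length: if the inner loop dominates, M.sum(1) should take about the same time for every shape, while per-row overhead would make short rows slower. This is only a minimal sketch. The array sizes, the float64 dtype, and the helper names best_time and bench_row_sums are assumptions for illustration; the post does not give the shape of M.

```python
import timeit
import numpy as np

def best_time(func, repeat=3, number=5):
    """Best per-call wall time in seconds over `repeat` runs."""
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number

def bench_row_sums(total=10**6, row_lengths=(1000, 100, 10)):
    """Time M.sum(1) for C-contiguous arrays that hold the same number
    of elements but have different row lengths.

    The sizes are hypothetical; they are not taken from the post.
    """
    results = {}
    for k in row_lengths:
        M = np.random.rand(total // k, k)   # C-contiguous float64
        results[k] = best_time(lambda: M.sum(1))
    return results

results = bench_row_sums()
for k, t in sorted(results.items(), reverse=True):
    print("row length %5d: %.3f ms per call" % (k, t * 1e3))
```

Running the same script under two Numpy versions would show whether the gap between long and short rows changed along with the absolute timings.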