On Wed, Jun 16, 2010 at 12:16 AM, Pauli Virtanen <p...@iki.fi> wrote:
> Tue, 2010-06-15 at 10:10 -0400, Anne Archibald wrote:
>> Correct me if I'm wrong, but this code still doesn't seem to make the
>> optimization of flattening arrays as much as possible. The array you
>> get out of np.zeros((100,100)) can be iterated over as an array of
>> shape (10000,), which should yield very substantial speedups. Since
>> most arrays one operates on are like this, there's potentially a large
>> speedup here. (On the other hand, if this optimization is being done,
>> then these tests are somewhat deceptive.)
>
> It does perform this optimization, and unravels the loop as much as
> possible. If all arrays are wholly contiguous, iterators are not even
> used in the ufunc loop. Check the part after
>
>     /* Determine how many of the trailing dimensions are contiguous */
>
> However, in practice it seems that this typically is not a significant
> win -- I don't get speedups over the unoptimized numpy code even for
> shapes
>
>     (2,)*20
>
> where you'd think that the iterator overhead could be important:
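To make the quoted optimization concrete, here is a rough Python sketch
of the trailing-contiguity check. The helper below is illustrative only
(its name and logic are mine, not the actual C implementation Pauli
points to), assuming C-order stride conventions:

import numpy as np

def collapse_trailing_contiguous(arr):
    # Walk backwards over the axes and count how many trailing
    # dimensions are packed tightly in memory, then merge them into a
    # single axis so the inner loop runs over one long 1-D stretch.
    expected = arr.itemsize
    ncontig = 0
    for ax in range(arr.ndim - 1, -1, -1):
        if arr.strides[ax] != expected:
            break
        expected *= arr.shape[ax]
        ncontig += 1
    if ncontig <= 1:
        return arr  # nothing to merge
    flat_len = int(np.prod(arr.shape[arr.ndim - ncontig:]))
    return arr.reshape(arr.shape[:arr.ndim - ncontig] + (flat_len,))

a = np.zeros((100, 100))
print(collapse_trailing_contiguous(a).shape)  # -> (10000,)

b = np.zeros((100, 100))[:, ::2]  # last axis strided, not contiguous
print(collapse_trailing_contiguous(b).shape)  # -> (100, 50), unchanged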
I unfortunately don't have much time to look into the code ATM, but
tests should be run on different CPUs. When I implemented the
neighborhood iterator, I observed significant (sometimes several tens
of %) differences - the gcc version also matters,

David
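For anyone who wants to repeat the comparison on their own machine, a
minimal timing sketch follows. The shapes are the ones discussed above;
the repeat count and output format are arbitrary choices, not from the
thread:

import timeit
import numpy as np

# Shapes from the thread: a typical 2-D array, its flattened
# equivalent, and the many-axis case where iterator overhead
# could in principle dominate.
shapes = [(100, 100), (10000,), (2,) * 20]
for shape in shapes:
    a = np.zeros(shape)
    b = np.ones(shape)
    t = timeit.timeit(lambda: np.add(a, b), number=100)
    print("%-18s %10.3f us/call" % (str(shape), t / 100 * 1e6))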