On Tue, Jun 15, 2010 at 9:37 AM, David Cournapeau <courn...@gmail.com> wrote:
> On Wed, Jun 16, 2010 at 12:16 AM, Pauli Virtanen <p...@iki.fi> wrote:
> > Tue, 2010-06-15 at 10:10 -0400, Anne Archibald wrote:
> >> Correct me if I'm wrong, but this code still doesn't seem to make the
> >> optimization of flattening arrays as much as possible. The array you
> >> get out of np.zeros((100,100)) can be iterated over as an array of
> >> shape (10000,), which should yield very substantial speedups. Since
> >> most arrays one operates on are like this, there's potentially a large
> >> speedup here. (On the other hand, if this optimization is being done,
> >> then these tests are somewhat deceptive.)
> >
> > It does perform this optimization, and unravels the loop as much as
> > possible. If all arrays are wholly contiguous, iterators are not even
> > used in the ufunc loop. Check the part after
> >
> >     /* Determine how many of the trailing dimensions are contiguous
> >      */
> >
> > However, in practice it seems that this typically is not a significant
> > win -- I don't get speedups over the unoptimized numpy code even for
> > shapes
> >
> >     (2,)*20
> >
> > where you'd think that the iterator overhead could be important:
>
> I unfortunately don't have much time to look into the code ATM, but
> tests should be run with different CPUs. When I implemented the
> neighborhood iterator, I observed significant (sometimes several tens
> of %) differences -- the gcc version also matters.
>
That's a common problem with trying to optimize at that level: things become
architecture and compiler dependent. Reminds me a bit of an experiment where
the experimenter was using genetic optimization to design a circuit on a chip
and the optimal design ended up taking advantage of some stray capacitance.

Chuck
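
[For reference, a quick Python-level sketch of what the "unravel the loop"
optimization buys -- this is not the C ufunc loop itself, and the arrays and
timing loop are purely illustrative. For a wholly C-contiguous array,
reshape(-1) returns a view, so the (100,100) case really can be walked as
shape (10000,); the second shape is the (2,)*20 case mentioned above, where
per-dimension iterator overhead could in principle show up:]

    import numpy as np
    import timeit

    a = np.zeros((100, 100))

    # For a contiguous array, reshape(-1) returns a *view*, so the
    # (100, 100) array can be iterated as if it had shape (10000,).
    flat = a.reshape(-1)
    assert flat.base is a and flat.flags.c_contiguous

    # 20 axes of length 2: same number of elements per axis count is
    # small, so iterator overhead per dimension could matter more here.
    b = np.zeros((2,) * 20)

    for arr, label in [(a, "(100,100)"), (b, "(2,)*20")]:
        t = timeit.timeit(lambda: np.add(arr, arr), number=1000)
        print("%-10s %.2f ms / 1000 calls" % (label, t * 1e3))

[Results will of course vary with CPU and compiler, which is exactly the
point raised above.]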