On Monday 10 January 2011 19:29:33 Mark Wiebe wrote:
> > so, the new code is just < 5% slower. I suppose that removing the
> > NPY_ITER_ALIGNED flag would give us a bit more performance, but
> > that's great as it is now. How did you do that? Your new_iter
> > branch in NumPy already deals with unaligned data, right?
>
> Take a look at lowlevel_strided_loops.c.src. In this case, the
> buffering setup code calls PyArray_GetDTypeTransferFunction, which
> in turn calls PyArray_GetStridedCopyFn, which on an x86 platform
> returns _aligned_strided_to_contig_size8. This function has a simple
> loop of copies using a npy_uint64 data type.
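
For readers following along, here is a rough sketch of what such an
aligned strided-to-contiguous copy of 8-byte elements might look like.
It is not the code from lowlevel_strided_loops.c.src: the function
name, signature and assertions below are hypothetical, and it assumes
both pointers are 8-byte aligned and the stride is a multiple of 8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch, not NumPy's actual implementation.
 * Gathers `count` 8-byte elements that sit `src_stride` bytes apart
 * in `src` and packs them back to back into `dst`.
 * Assumption: dst and src are 8-byte aligned and src_stride is a
 * multiple of 8, so each element moves as one 64-bit load/store.
 */
static void
sketch_strided_to_contig_size8(char *dst, const char *src,
                               ptrdiff_t src_stride, size_t count)
{
    assert(dst != NULL && src != NULL);
    assert((uintptr_t)dst % 8 == 0 && (uintptr_t)src % 8 == 0);
    assert(src_stride % 8 == 0);

    uint64_t *out = (uint64_t *)dst;
    while (count > 0) {
        /* One aligned 64-bit copy per element, no byte-wise loop. */
        *out++ = *(const uint64_t *)src;
        src += src_stride;
        --count;
    }
}
```

Calling it with a stride of 16 bytes on an array of 64-bit values
would pick every second element into the destination buffer. The
alignment assumption is what lets the loop use plain 64-bit moves;
unaligned data would need a different copy routine.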

I see. Brilliant!

> > Well, if you can support reduce operations with your patch that
> > would be extremely good news as I'm afraid that the current reduce
> > code is a bit broken in Numexpr (at least, I vaguely remember
> > seeing it working badly in some cases).
>
> Cool, I'll take a look at some point. I imagine with the most
> obvious implementation small reductions would perform poorly.

IMO, reductions like sum() or prod() are mainly limited by memory
access, so my advice would be not to over-optimize here, and just
make use of the new iterator. We can refine performance later on.

-- 
Francesc Alted
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion