On Mon, Jan 10, 2011 at 9:47 AM, Francesc Alted <fal...@pytables.org> wrote:
> <snip>
>
> so, the new code is just < 5% slower. I suppose that removing the
> NPY_ITER_ALIGNED flag would give us a bit more performance, but that's
> great as it is now. How did you do that? Your new_iter branch in NumPy
> already deals with unaligned data, right?
>

Take a look at lowlevel_strided_loops.c.src. In this case, the buffering
setup code calls PyArray_GetDTypeTransferFunction, which in turn calls
PyArray_GetStridedCopyFn, which on an x86 platform returns
_aligned_strided_to_contig_size8. This function has a simple loop of
copies using a npy_uint64 data type.

> > The new code also needs support for the reduce operation. I didn't
> > look too closely at the code for that, but a nested iteration
> > pattern is probably appropriate. If the inner loop is just allowed
> > to be one dimension, it could be done without actually creating the
> > inner iterator.
>
> Well, if you can support reduce operations with your patch that would be
> extremely good news as I'm afraid that the current reduce code is a bit
> broken in Numexpr (at least, I vaguely remember seeing it working badly
> in some cases).
>

Cool, I'll take a look at some point. I imagine with the most obvious
implementation small reductions would perform poorly.

-Mark
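
For illustration only, here is a minimal sketch of the kind of loop
described above: an aligned, strided-to-contiguous copy of 8-byte
elements done one 64-bit word at a time. It is not the code from
lowlevel_strided_loops.c.src. The function name, the signature, and the
use of uint64_t in place of npy_uint64 are assumptions made for the
example.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for an aligned strided-to-contiguous copy of
 * 8-byte elements (not the NumPy implementation).
 *
 * dst        - contiguous destination buffer, 8-byte aligned
 * src        - first source element, 8-byte aligned
 * src_stride - distance in bytes between consecutive source elements;
 *              assumed to be a multiple of 8 so every read stays aligned
 * n          - number of elements to copy
 */
static void
strided_to_contig_size8(char *dst, const char *src,
                        ptrdiff_t src_stride, size_t n)
{
    while (n > 0) {
        /* One aligned 64-bit load and store per element. */
        *(uint64_t *)dst = *(const uint64_t *)src;
        dst += sizeof(uint64_t);   /* destination is packed */
        src += src_stride;         /* source advances by its stride */
        --n;
    }
}
```

Because both pointers are assumed to be aligned, each element moves with
a single 64-bit load and store rather than a byte-wise copy, which is
presumably why keeping the aligned path costs so little here.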
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion