Thu, 10 Jun 2010 23:56:56 +0200, Sturla Molden wrote:
[clip]
> Also about array iterators in NumPy's C base (i.e. for doing something
> along an axis): we don't need those. There is a different way of coding
> which leads to faster code.
>
> 1. Collect an array of pointers to each subarray (e.g. using
>    std::vector<dtype*> or dtype**)
> 2. Dispatch on the pointer array...
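The quoted two-step scheme can be sketched in plain C. This is a minimal illustration, not NumPy code: the function name, the fixed row cap, and the 2-D row-major layout are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of the quoted scheme for a 2-D row-major array:
 * step 1: collect a pointer to the start of each subarray (the dtype** table);
 * step 2: dispatch on the pointer table, running a simple inner loop per row. */
static void scale_rows(double *base, size_t nrows, size_t ncols, double factor)
{
    double *rows[16];                  /* pointer-per-subarray table (fixed cap for the sketch) */
    assert(nrows <= 16);

    for (size_t i = 0; i < nrows; i++)
        rows[i] = base + i * ncols;    /* step 1: gather subarray pointers */

    for (size_t i = 0; i < nrows; i++)      /* step 2: dispatch on the table */
        for (size_t j = 0; j < ncols; j++)
            rows[i][j] *= factor;           /* inner loop: contiguous, unit stride */
}
```

The inner loop sees only a flat pointer and a count, so the compiler can vectorize it without knowing anything about the outer dimensions.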
This is actually what the current ufunc code does. The innermost dimension is handled via the ufunc loop, which is a simple for loop with a constant-size step and a given number of iterations. The array iterator objects are used only for stepping through the outer dimensions; that is, the iterator essentially steps through your dtype** array without explicitly constructing it. An abstraction for iterating through an array is useful, also for building the dtype** array, so we should probably retain the iterators.

For multidimensional arrays, you run here into the optimization problem of choosing the order of the axes so that the memory access pattern makes maximally efficient use of the cache. Currently, this optimization heuristic is very simple, and it makes the wrong choice even in the simple 2-D case.

We could in principle use OpenMP to parallelize the outer loop, as long as the inner ufunc part is implemented in C and does not call back into Python. But if the problem is memory-bound, it is not clear that parallelization helps.

Another optimization task would perhaps be to implement specialized loops for each operation: currently we do one function indirection per iteration, which probably costs something. But again, the operations may already be bounded by memory speed, in which case this would not improve performance.

-- 
Pauli Virtanen

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
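The ufunc inner loop described above, pointers advanced by constant byte strides for a given number of iterations, can be sketched as follows. This is only modeled loosely on the idea; the function name and argument layout here are illustrative, not NumPy's actual loop signature.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of a ufunc-style inner loop for double addition: each operand is a
 * base pointer plus a constant byte stride, and the loop runs for n steps.
 * The outer iteration machinery only needs to reposition the base pointers
 * between calls to this loop (one call per "row" of the dtype** table). */
static void add_inner_loop(char *in1, char *in2, char *out,
                           ptrdiff_t s1, ptrdiff_t s2, ptrdiff_t so, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        *(double *)out = *(double *)in1 + *(double *)in2;
        in1 += s1;      /* constant-size steps, known before the loop starts */
        in2 += s2;
        out += so;
    }
}
```

Because the strides are loop invariants, the contiguous case (stride equal to sizeof(double)) reduces to a plain vectorizable loop; the same code also handles non-contiguous axes by passing a larger stride.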