On Tue, Jan 4, 2011 at 4:34 AM, David Cournapeau <courn...@gmail.com> wrote:
>
> Ok, I took some time to look into it, but I am far from understanding
> everything yet. I will need more time.
>

Yeah, it ended up being pretty large. I think the UFunc code will shrink
substantially when it uses this iterator, which is something I was
targeting.

> One design issue which bothers me a bit is the dynamically created
> structure for the iterator - do you have some benchmarks which show
> that this design is significantly better than a plain old C data
> structure with a couple of dynamically allocated arrays ? Besides
> bypassing the compiler type checks, I am a bit worried about the
> ability to extend the iterator through "inheritance in C" like I did
> with neighborhood iterator, but maybe I should just try it.
>

I know what you mean - if I could use C++ templates the implementation
could probably have the best of both worlds, but seeing as NumPy is in C
I tried to compromise mostly towards higher performance. I don't have
benchmarks showing that the implementation is faster, but I did validate
that the compiler does the optimizations I want it to do. For example,
the specialized iternext function for 1 operand and 1 dimension, a common
case because of dimension coalescing, looks like this on my machine:

   0:   48 83 47 58 01          addq   $0x1,0x58(%rdi)
   5:   48 8b 47 60             mov    0x60(%rdi),%rax
   9:   48 01 47 68             add    %rax,0x68(%rdi)
   d:   48 8b 47 50             mov    0x50(%rdi),%rax
  11:   48 39 47 58             cmp    %rax,0x58(%rdi)
  15:   0f 9c c0                setl   %al
  18:   0f b6 c0                movzbl %al,%eax
  1b:   c3                      retq

The function has no branches and all memory accesses are directly offset
from the iter pointer %rdi, something I think is pretty good. If this
data was in separately allocated arrays, I think it would hurt locality
as well as add some more instructions.

In the implementation, I tried to structure the data access macros so
errors are easy to spot. Accessing the bufferdata and the axisdata isn't
typed, but I can think of ways to do that.

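For reference, the disassembly above reads roughly like the C below. This
is only a sketch under assumptions: the struct and its field names
(Iter1D, shape, index, stride, ptr) are hypothetical, inferred from the
order of the offsets 0x50 through 0x68, and the actual iterator reaches
its data through macros over an opaque allocation rather than a typed
struct. The sum_strided helper is likewise just an illustration of how
such a function would be driven.

```c
#include <stddef.h>

/* Hypothetical flat layout for 1 operand, 1 dimension. The field names
 * are assumptions; only their relative order follows the disassembly. */
typedef struct {
    ptrdiff_t shape;   /* number of elements along the single axis */
    ptrdiff_t index;   /* current position along the axis */
    ptrdiff_t stride;  /* byte step between consecutive elements */
    char *ptr;         /* current data pointer */
} Iter1D;

/* Advance by one element. Returns 1 while elements remain, 0 when done.
 * No branches: one increment, one pointer add, one comparison. */
static int iter1d_next(Iter1D *it)
{
    it->index += 1;
    it->ptr += it->stride;
    return it->index < it->shape;
}

/* Illustrative caller: sum `count` doubles spaced `stride_bytes` apart. */
static double sum_strided(const double *data, ptrdiff_t count,
                          ptrdiff_t stride_bytes)
{
    Iter1D it = { count, 0, stride_bytes, (char *)data };
    double total = 0.0;

    if (count <= 0) {
        return total;   /* nothing to visit */
    }
    do {
        total += *(const double *)it.ptr;
    } while (iter1d_next(&it));
    return total;
}
```

Because every field sits at a fixed offset from the one pointer, the
compiler can address all of them directly, which is what the listing
shows.
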
I was viewing the implementation as fully opaque to any non-iterator
code, even within NumPy. Do you think such access will be necessary?

> I think the code would benefit from smaller functions, too - 500+
> line functions is just too much IMO, it should be split up.
>

I definitely agree, I've been splitting things up as they got large, but
that's not finished. I also think the main iterator .c file is too large
and needs splitting up.

> To get a deeper understanding of the code, I am starting to implement
> several benchmarks to compare old and new iterator - do you already
> have some of them handy ?
>

So far I've just done timing with the Python exposure, C-based
benchmarking is welcome. Where possible, NPY_ITER_NO_INNER_ITERATION
should be used, since it exposes the possibility of longer inner loops
with no function calls. An example where this is not possible is when
coordinates are required. I should probably put together a collection of
copy/paste templates for typical use.

> Thanks for the hard work, that's a really nice piece of code,
>

Thanks for taking the time to look into it,
Mark
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion