Thu, 10 Feb 2011 20:49:28 +0000, Pauli Virtanen wrote:
[clip]
>   1. Check first if the bottleneck is in the inner reduction loop
> (function DOUBLE_add in loops.c.src:712) or in the outer iteration
> (function PyUFunc_ReductionOp in ufunc_object.c:2781).
>  2. If it's in the inner loop, some optimizations are possible, e.g. 
> specialized cases for sizeof(item) strides. Think how to add them
> cleanly.

A quick check (just replace the inner loop with a no-op) shows that for 
100 items, the bottleneck is in the inner loop. The cross-over between 
inner loop time and strided iterator overhead apparently occurs around 
~20-30 items (on the machine I used for testing).
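The crossover described above can be illustrated with a toy cost model: total reduction time is a fixed per-call overhead (the strided-iterator setup) plus a per-item inner-loop cost, and the crossover is where the two contributions are equal. The numbers below are made up for illustration, not measurements from Pauli's test machine:

```python
# Toy cost model for a ufunc reduction: the constant term stands in for
# the outer-iteration / strided-iterator overhead, the linear term for
# the inner reduction loop. The default costs are hypothetical.
def reduction_time(n, overhead_ns=100.0, per_item_ns=4.0):
    """Modelled time (ns) to reduce an n-item array."""
    return overhead_ns + n * per_item_ns

def crossover(overhead_ns=100.0, per_item_ns=4.0):
    """Item count at which inner-loop time equals the fixed overhead."""
    return overhead_ns / per_item_ns

# With these (illustrative) costs the crossover lands at 25 items,
# in the same ballpark as the observed ~20-30.
print(crossover())            # 25.0
print(reduction_time(100))    # 500.0 ns, inner loop dominates
```

Below the crossover, shaving the per-item cost barely matters because the fixed overhead dominates; above it, inner-loop optimizations pay off in proportion to n.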

Anyway, spending time optimizing the inner loop for at most a ~30% speed 
gain seems questionable...

-- 
Pauli Virtanen

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
