Dag Sverre Seljebotn wrote:
> Thanks for your input! You definitely know more about such computations
> than I do.
>
> Roland Schulz wrote:
>> Component wise operations without optimization (thus collapsing
>> d=a*b*c*d into one loop instead of 3 and not using temporary arrays)
>> does not give you any speed-up over Numpy for vectorized code with large
>> arrays.
>>
>> For vectorized Numpy code the bottleneck is not the call from Python to
>> C, but the inefficient use of cache because of the temporary arrays.
>
> I don't know enough about this, but these two paragraphs seem slightly
> contradictory to me.
Or did you mean that
optimization == collapsing d=a*b*c*d into one loop instead of 3 and not
using temporary arrays
?
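If the latter, here is a rough sketch (plain Python/NumPy, purely for
illustration; the small arrays are made up) of the difference between the
two evaluation strategies:

    import numpy as np

    a = np.arange(1, 5, dtype=np.intc)
    b = np.arange(1, 5, dtype=np.intc)
    c = np.arange(1, 5, dtype=np.intc)
    d = np.arange(1, 5, dtype=np.intc)

    # Unoptimized, NumPy-style evaluation: three separate loops over the
    # data, each materializing a full-size temporary array.
    d_naive = a * b * c * d   # tmp1 = a*b; tmp2 = tmp1*c; d_naive = tmp2*d

    # "Collapsed" evaluation: a single loop, no intermediate temporaries,
    # each input read exactly once.  (A Python loop is only for
    # illustration; the generated code would of course be C.)
    d_fused = np.empty_like(a)
    for i in range(a.shape[0]):
        d_fused[i] = a[i] * b[i] * c[i] * d[i]

    assert (d_naive == d_fused).all()
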
It is definitely the plan of CEP 517 that

    cdef int[:] a = ..., b = ..., c = ..., d = ...
    d = a * b * c * d

turns into something very similar to

    cdef size_t tmp1
    cdef int[:] tmpresult = ...  # new array of the right length
    for tmp1 in range(a.shape[0]):
        tmpresult[tmp1] = a[tmp1] * b[tmp1] * c[tmp1] * d[tmp1]
    d = tmpresult

although broadcasting should be supported, which makes it more complicated
(arrays of length 1 are repeated; if d has length 1 it must be
reallocated, and so on). With multidimensional arrays this becomes more
difficult: there are lots of ugly details concerning broadcasting and
non-contiguous arrays (where the "innermost" dimension must be found at
runtime...).
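For the 1-d case, the kind of bookkeeping the generated loop needs for a
length-1 ("broadcast") operand can be sketched roughly like this (plain
Python; the helper name is made up just for illustration):

    import numpy as np

    def broadcast_multiply_1d(x, y):
        # Hypothetical helper, only to illustrate the idea: an operand of
        # length 1 is indexed with a step of 0, i.e. its single element is
        # reused for every output position.
        if x.shape[0] != y.shape[0] and 1 not in (x.shape[0], y.shape[0]):
            raise ValueError("shapes do not broadcast")
        n = max(x.shape[0], y.shape[0])
        x_step = 0 if x.shape[0] == 1 else 1
        y_step = 0 if y.shape[0] == 1 else 1
        out = np.empty(n, dtype=np.result_type(x, y))
        for i in range(n):
            out[i] = x[i * x_step] * y[i * y_step]
        return out

    print(broadcast_multiply_1d(np.array([2]), np.array([1, 2, 3])))  # [2 4 6]

(In the multidimensional case the same trick generalizes to a stride of 0
along each broadcast axis.)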
IMPORTANT NOTE: All of this is a long way off; the only questions right
now are a coarse roadmap, and whether this is wanted at all.
--
Dag Sverre
_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev