On 10/11/2011 12:06 PM, Skipper Seabold wrote:
> On Tue, Oct 11, 2011 at 12:41 PM, Christoph Groth <c...@falma.de> wrote:
>> Skipper Seabold <jsseab...@gmail.com> writes:
>>
>>> So it's the dot function being called repeatedly on smallish arrays
>>> that's the bottleneck? I've run into this as well. See this thread
>>> [1].
>>> (...)
>> Thanks for the links.  "tokyo" is interesting, though I fear the
>> intermediate matrix-size regime where it really makes a difference will
>> be rather small.  My concern is with really tiny vectors, for which it's
>> not even worth calling BLAS.
>>
> IIUC, it's not so much the BLAS that's helpful as avoiding the
> overhead of calling numpy.dot from Cython.
>
>>> I'd be very interested to hear if you achieve a great speed-up with
>>> cython+tokyo.
>> I'll try to solve this problem one way or another.  I'll post here if I
>> end up with something interesting.
> Please do.
>
> Skipper
In the example, M is a 2-by-2 identity array. Building arrays from a tuple 
and then performing two dot operations creates a lot of overhead. But the 
tuple code is not exactly equivalent, because M is 'expanded' into a single 
dimension to avoid some of the unnecessary multiplications. The tuple code 
is thus already a different algorithm from the numpy code, so the 
comparison is not really fair.
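
A rough sketch of the overhead in question (the point values, and reducing 
the example to a single dot, are my assumptions; the original code is not 
reproduced in this thread):

    import numpy as np
    import timeit

    M = np.eye(2)  # the 2-by-2 identity array from the example

    def with_dot(x, y):
        # Build an array from a tuple, then apply M via numpy.dot.
        # For a 2-element operand the call overhead dominates the work.
        p = np.dot(M, np.array((x, y)))
        return p[0] * p[0] + p[1] * p[1]

    def without_dot(x, y):
        # M is the identity, so the dot is a no-op and can be skipped.
        return x * x + y * y

    print(timeit.timeit(lambda: with_dot(0.3, 0.7), number=100000))
    print(timeit.timeit(lambda: without_dot(0.3, 0.7), number=100000))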

All that is needed here, for each scalar value of x, y, and radius, is to 
evaluate (x*x + y*y) < radius**2. That could probably be done without any 
loop at all, using array multiplication and broadcasting, as sketched below.
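
A minimal sketch of that vectorized test, assuming arrays of coordinates 
and per-point radii (the names and data are illustrative, not taken from 
the original example):

    import numpy as np

    # Illustrative inputs: coordinates of three points and their radii.
    x = np.array([0.5, 1.5, 2.5])
    y = np.array([0.5, 0.5, 0.5])
    radius = np.array([2.0, 2.0, 2.0])

    # Elementwise multiplication and comparison replace the scalar loop;
    # broadcasting would equally allow a single shared scalar radius.
    inside = (x * x + y * y) < radius ** 2
    print(inside)  # [ True  True False]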

Bruce



