Robert Bradshaw wrote:
> On Dec 11, 2008, at 4:33 PM, Joris De Ridder wrote:
>
>> Hi,
>>
>> Below you can find a small cython function that implements a simple  
>> max-filter. That is, given an image (a 2d numpy ndarray), each  
>> pixel is replaced by the maximum value in a neighborhood
>> [-dx:dx, -dy:dy] around that pixel. For simplicity it just ignores the edges
>> of the image.
>>
>> I implemented the same function using C++ & ctypes, and the latter  
>> function turns out to be about 100 times faster than the cython  
>> one, for a 500x500 image, and dx,dy = 5,5. This surprises me,  
>> because of the simplicity of the function. I checked the C file,  
>> and also the translation of the maxfilter() function in C is quite  
>> simple. The bulk of the time is of course spent in the loops, and  
>> the only bottleneck that I can see is accessing the numpy arrays.  
>> Is there anything I can do in Cython to speed that up?
>
> Try declaring i and j to be unsigned ints, and see if that speeds  
> anything up. (Currently it checks to see if the index is negative and  
> does the Python-style indexing backwards in that case). You could  
>   
Note that you can also use the negative_indices parameter, i.e. 
ndarray[DTYPE_t, ndim=2, negative_indices=False]. This is guaranteed not 
to emit those checks.
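
To make it concrete what those checks amount to, here is a rough sketch in plain Python of the wraparound logic that is applied to an index when negative indices are allowed. The helper name resolve_index is made up for illustration and is not part of Cython or NumPy; the actual generated C code will look different.

```python
def resolve_index(i, n):
    """Map a possibly negative index onto 0..n-1, Python style.

    Hypothetical helper: it only illustrates the extra comparison
    that each array access pays for when negative indices are allowed.
    """
    if i < 0:        # the per-access check
        i += n       # count backwards from the end
    return i


row = [10, 20, 30]
# Indexing through the helper agrees with Python's own negative indexing.
assert row[resolve_index(-1, len(row))] == row[-1]
assert row[resolve_index(1, len(row))] == row[1]
```

With unsigned index variables or negative_indices=False, the branch is not needed at all, which matters in an inner loop that runs once per neighborhood element.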

Also I'd like to add these things to check:
- Make absolutely sure that your Cython-generated C code is compiled 
using -O2
- How does your profiling work? 500x500 isn't all that much, and the 
calls to "np.zeros" take some time that C++ might not take (Gabriel 
Gellner has work in progress to make this faster in Cython). Can you try 
with 5000x5000 and see if there's still the same speed difference? If 
this theory is right (i.e. you profile the entire function and not just 
the loop) then the difference in speed should converge to a ratio 
between 1 and 3 as N goes to infinity.
- Is the array statically allocated in C++? I.e. does it say "int 
data[500][500]", or is it dynamically allocated? In the former case C++ 
could theoretically be a little bit faster (but if it makes a real 
impact I'll fix Cython to make it as fast :-))
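
A minimal sketch of the profiling point above, in plain Python with only the standard library so it runs anywhere. It times the output allocation and the filter loop separately, so you can see how much of the total is allocation at each size. Assumptions: max_filter_timed is a made-up name, nested lists stand in for the NumPy arrays (the list allocation plays the role of np.zeros), and the window is taken to include both ends, i.e. offsets -dx..dx and -dy..dy. The absolute timings say nothing about Cython or C++; only the method carries over.

```python
import time


def max_filter_timed(image, dx, dy):
    """Max filter that ignores the edges, with separate timings.

    Returns (output, allocation_seconds, loop_seconds).
    """
    rows, cols = len(image), len(image[0])

    t0 = time.perf_counter()
    out = [[0] * cols for _ in range(rows)]   # stand-in for np.zeros
    t1 = time.perf_counter()

    for i in range(dx, rows - dx):
        for j in range(dy, cols - dy):
            best = image[i - dx][j - dy]
            # Scan the (2*dx+1) x (2*dy+1) window around (i, j).
            for a in range(i - dx, i + dx + 1):
                src = image[a]
                for b in range(j - dy, j + dy + 1):
                    if src[b] > best:
                        best = src[b]
            out[i][j] = best
    t2 = time.perf_counter()

    return out, t1 - t0, t2 - t1


# Compare two sizes: the allocation share should shrink as N grows.
for n in (50, 100):
    img = [[(i * 31 + j * 17) % 101 for j in range(n)] for i in range(n)]
    _, t_alloc, t_loop = max_filter_timed(img, 2, 2)
    print(n, "alloc: %.6fs" % t_alloc, "loop: %.6fs" % t_loop)
```

If the two implementations are compared on the loop time alone, the setup cost drops out of the ratio.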

A 100x difference is definitely too much; in fact I'd be worried about 
a 3x difference (if only the loop is profiled).

Dag Sverre
_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
