sigh; yet another email dropped by the list.

David Warde-Farley wrote:
> On 21-Oct-09, at 9:14 AM, Pauli Virtanen wrote:
> 
>> Since these are ufuncs, I suppose the SSE implementations could just be
>> put in a separate module, which is always compiled. Before importing the
>> module, we could simply check from Python side that the CPU supports the
>> necessary instructions. If everything is OK, the accelerated
>> implementations would then just replace the Numpy routines.
> 
> Am I mistaken or wasn't that sort of the goal of Andrew Friedley's  
> CorePy work this summer?
> 
> Looking at his slides again, the speedups are rather impressive. I  
> wonder if these could be usefully integrated into numpy itself?

Yes, my GSoC project is closely related, though I didn't do the CPU 
detection part; that would be easy to add.  Also, I wrote my code 
specifically for 64-bit x86.
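
For illustration, here's a minimal sketch of how the detect-and-swap step 
Pauli describes could look from the Python side.  The module name 
'corefunc_sse' is just a placeholder, and the /proc/cpuinfo check is 
Linux-specific; this isn't code from my project, only the general idea:

    import numpy as np

    def cpu_has_flag(flag):
        # Look for a CPU feature flag ('sse2', 'sse4_1', ...) in the
        # list the Linux kernel reports.
        try:
            with open('/proc/cpuinfo') as f:
                for line in f:
                    if line.startswith('flags'):
                        return flag in line.split()
        except IOError:
            pass
        return False

    if cpu_has_flag('sse2'):
        # Placeholder module of accelerated ufuncs, compiled unconditionally
        # but only imported when the CPU supports the instructions it uses.
        import corefunc_sse
        add = corefunc_sse.add
    else:
        add = np.add   # fall back to the stock NumPy ufunc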

I didn't focus so much on the transcendental functions, though they 
wouldn't be too hard to implement.  There's also the possibility of 
providing implementations with differing tradeoffs between accuracy and 
performance.
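
As a rough illustration of the kind of tradeoff I mean (pure NumPy here, 
just to show the accuracy side; this is not SIMD code and not part of 
CoreFunc): a low-order polynomial approximation of exp() is much cheaper 
per element than a full libm-accurate version, at the cost of error that 
grows toward the edges of the interval.

    import numpy as np

    x = np.linspace(-1.0, 1.0, 1000000)

    # Degree-5 Taylor polynomial for exp(x), evaluated with Horner's rule:
    # 1 + x + x^2/2! + x^3/3! + x^4/4! + x^5/5!
    approx = 1.0 + x*(1.0 + x*(0.5 + x*(1.0/6 + x*(1.0/24 + x*(1.0/120)))))
    exact = np.exp(x)

    rel_err = np.max(np.abs(approx - exact) / exact)
    print("max relative error on [-1, 1]:", rel_err)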

I think the blog link got posted already, but here's the relevant info:

http://numcorepy.blogspot.com
http://www.corepy.org/wiki/index.php?title=CoreFunc

I talked about this in my SciPy talk and upcoming paper as well.

Also, people have been talking only about x86 in this thread -- other 
architectures could be supported too, e.g. PPC/Altivec or even Cell SPU 
and other accelerators.  I recently wrote a quick-and-dirty implementation 
of addition and vector-normalization ufuncs for the Cell SPU.  The basic 
result is that overall performance is very roughly comparable to an x86 
chip of similar speed, but this is a huge win over just running on the 
extremely slow Cell PPC cores.

Andrew