sigh; yet another email dropped by the list.

David Warde-Farley wrote:
> On 21-Oct-09, at 9:14 AM, Pauli Virtanen wrote:
>
>> Since these are ufuncs, I suppose the SSE implementations could just be
>> put in a separate module, which is always compiled. Before importing the
>> module, we could simply check from the Python side that the CPU supports
>> the necessary instructions. If everything is OK, the accelerated
>> implementations would then just replace the Numpy routines.
>
> Am I mistaken, or wasn't that sort of the goal of Andrew Friedley's
> CorePy work this summer?
>
> Looking at his slides again, the speedups are rather impressive. I
> wonder if these could be usefully integrated into numpy itself?
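[Editor's note: the check-then-import scheme Pauli describes above can be sketched in a few lines of Python. This is an illustrative sketch only, not code from the thread: the module name `_fast_ufuncs` is hypothetical, and the flag check reads /proc/cpuinfo, so it works on Linux only.]

```python
# Sketch of the idea quoted above: detect CPU features from Python, and
# only import the always-compiled accelerated module when the CPU
# actually supports the required instructions.
#
# Assumptions (not from the thread): the module name "_fast_ufuncs" is
# hypothetical, and feature detection parses /proc/cpuinfo (Linux-only).

def cpu_has_flags(required=("sse2",)):
    """Return True if /proc/cpuinfo lists all of the required CPU flags."""
    try:
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("flags"):
                    flags = set(line.split(":", 1)[1].split())
                    return all(flag in flags for flag in required)
    except OSError:
        pass  # no /proc/cpuinfo (non-Linux) -- assume unsupported
    return False

# Only load the accelerated implementations when the CPU supports them;
# otherwise fall back to the plain NumPy routines.
_fast_ufuncs = None
if cpu_has_flags(("sse2",)):
    try:
        import _fast_ufuncs  # hypothetical compiled extension module
    except ImportError:
        _fast_ufuncs = None
```

If the import succeeds, the accelerated functions could then be swapped in for the default ufunc loops; if not, nothing changes and NumPy behaves as before.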
Yes, my GSoC project is closely related, though I didn't do the CPU detection part; that would be easy to add. Also, I wrote my code specifically for 64-bit x86. I didn't focus much on the transcendental functions, though they wouldn't be too hard to implement. There's also the possibility of providing implementations with differing trade-offs between accuracy and performance.

I think the blog link got posted already, but here's the relevant info:

http://numcorepy.blogspot.com
http://www.corepy.org/wiki/index.php?title=CoreFunc

I talked about this in my SciPy talk and upcoming paper as well.

People have only been talking about x86 in this thread, but other architectures could be supported too, e.g. PPC/AltiVec, or even the Cell SPU and other accelerators. I actually wrote a quick-and-dirty implementation of addition and vector-normalization ufuncs for the Cell SPU recently. The basic result is that overall performance is very roughly comparable to a similar-speed x86 chip, but this is a huge win over just running on the extremely slow Cell PPC cores.

Andrew

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion