James Philbin wrote:
> Personally, I think that the time would be better spent optimizing
> routines for single-threaded code and relying on BLAS and LAPACK
> libraries to use multiple cores for more complex calculations. In
> particular, doing some basic loop unrolling and SSE versions of the
> ufuncs would be beneficial. I have some experience writing SSE code
> using intrinsics and would be happy to give it a shot if people tell
> me what functions I should focus on.
>

Fabulous! This is on my Project List of todo items for NumPy. See http://projects.scipy.org/scipy/numpy/wiki/ProjectIdeas

I should spend some time refactoring the ufunc loops so that the templating does not get in the way of doing this on a case-by-case basis.
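To make the proposal concrete, here is a minimal sketch of what an unrolled SSE2 version of the `add` loop could look like for contiguous float64 arrays. The function name `add_double_sse2` is hypothetical, and this is not NumPy's actual ufunc inner-loop signature (the real loops receive `char **args` plus strides and must handle non-contiguous data). It assumes an x86-64 target, where SSE2 is always available on both AMD and Intel chips.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/* Hypothetical helper: out[i] = a[i] + b[i] for n contiguous doubles.
 * Uses unaligned loads/stores, so no 16-byte alignment is required. */
static void add_double_sse2(const double *a, const double *b,
                            double *out, size_t n)
{
    size_t i = 0;
    assert(n == 0 || (a != NULL && b != NULL && out != NULL));

    /* Unrolled main loop: 4 doubles (two 128-bit registers) per iteration. */
    for (; i + 4 <= n; i += 4) {
        __m128d a0 = _mm_loadu_pd(a + i);
        __m128d a1 = _mm_loadu_pd(a + i + 2);
        __m128d b0 = _mm_loadu_pd(b + i);
        __m128d b1 = _mm_loadu_pd(b + i + 2);
        _mm_storeu_pd(out + i,     _mm_add_pd(a0, b0));
        _mm_storeu_pd(out + i + 2, _mm_add_pd(a1, b1));
    }

    /* Scalar tail for the remaining 0-3 elements. */
    for (; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}
```

Unaligned loads keep the sketch simple. An aligned variant using `_mm_load_pd` would need a scalar "peeling" loop to reach a 16-byte boundary first. Subtract, multiply, and divide follow the same pattern with `_mm_sub_pd`, `_mm_mul_pd`, and `_mm_div_pd`.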
1) You should focus on the math operations: add, subtract, multiply, divide, and so forth.
2) Then, for "combined operations," we should expose the functionality at a high level, so that somebody could write code that takes advantage of it.

It would be easiest to use intrinsics, which would then work on both AMD and Intel processors and with multiple compilers.

-Travis O.

_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion