Anne Archibald wrote:
> On 22/03/2008, Travis E. Oliphant <[EMAIL PROTECTED]> wrote:
>
>> James Philbin wrote:
>> > Personally, I think that the time would be better spent optimizing
>> > routines for single-threaded code and relying on BLAS and LAPACK
>> > libraries to use multiple cores for more complex calculations. In
>> > particular, doing some basic loop unrolling and SSE versions of the
>> > ufuncs would be beneficial. I have some experience writing SSE code
>> > using intrinsics and would be happy to give it a shot if people tell
>> > me what functions I should focus on.
>>
>> Fabulous! This is on my Project List of todo items for NumPy. See
>> http://projects.scipy.org/scipy/numpy/wiki/ProjectIdeas I should spend
>> some time refactoring the ufunc loops so that the templating does not
>> get in the way of doing this on a case by case basis.
>>
>> 1) You should focus on the math operations: add, subtract, multiply,
>> divide, and so forth.
>> 2) Then for "combined operations" we should expose the functionality at
>> a high-level. So, that somebody could write code to take advantage of it.
>>
>> It would be easiest to use intrinsics which would then work for AMD,
>> Intel, on multiple compilers.
>>
>
> I think even heavier use of code generation would be a good idea here.
> There are so many different versions of each loop, and the fastest way
> to run each one is going to be different for different versions and
> different platforms, that a routine that assembled the code from
> chunks and picked the fastest combination for each instance might make
> a big difference - this is roughly what FFTW and ATLAS do.
>
> There are also some optimizations to be made at a higher level that
> might give these optimizations more traction. For example:
>
>     A = randn(100*100)
>     A.shape = (100,100)
>     A*A
>
> There's no reason the multiply ufunc couldn't flatten A and use a
> single unstrided loop to do the multiplication.
>
Good idea, it does already do that :-) The ufunc machinery is also a
good place for an optional thread pool.
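
For anyone who wants to convince themselves of the flattening point
from the Python side, here is a minimal sketch. It assumes NumPy is
imported as np and uses np.random.randn in place of the bare randn in
the quoted example. It only shows that the 2-D product and the product
over a flat view agree for a contiguous array; it says nothing about
how the ufunc loop is actually implemented internally.

```python
import numpy as np

# Same setup as the quoted example: 10000 samples viewed as 100 x 100.
A = np.random.randn(100 * 100)
A.shape = (100, 100)

# Reshaping a fresh 1-D array in place keeps it C-contiguous, so all
# elements sit in one unbroken block of memory.
assert A.flags["C_CONTIGUOUS"]

# Elementwise product on the 2-D array.
product_2d = A * A

# The same product over a flat 1-D view, reshaped back afterwards.
# For contiguous input, ravel() returns a view rather than a copy.
flat = A.ravel()
product_flat = (flat * flat).reshape(A.shape)

same = np.array_equal(product_2d, product_flat)
shares_memory = np.shares_memory(A, flat)
```

Here both same and shares_memory come out True. For a non-contiguous
input (say A[:, ::2]) ravel() has to copy, which is the case where a
single unstrided loop is not available.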

Perhaps we could drum up interest in a Need for Speed Sprint on NumPy
sometime over the next few months.

-Travis O.

_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion