On 02/10/06, Travis Oliphant <[EMAIL PROTECTED]> wrote:

> Perhaps those inner 1-d loops could be optimized (using prefetch or
> something) to reduce the number of cache misses on the inner
> computation, and the concept of looping over the largest dimension
> (instead of the last dimension) should be re-considered.

Cache control seems to be the main factor deciding the speed of many
algorithms. Prefetching could make a huge difference, particularly on
NUMA machines (like a dual Opteron). I think GCC has a moderately
portable way to request it (though it may only be in beta versions as
yet).
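
For concreteness, here is a rough sketch of what that could look like
in a strided inner loop. The loop shape, the function name, and the
fixed prefetch distance are just assumptions for illustration (not
numpy's actual ufunc code); __builtin_prefetch is the GCC intrinsic I
had in mind, and it degrades to a no-op on targets without a prefetch
instruction:

    /* Hypothetical strided add loop with software prefetch hints. */
    #define PREFETCH_DISTANCE 64   /* bytes ahead; would need tuning */

    static void
    add_double_loop(char *in1, long s1, char *in2, long s2,
                    char *out, long so, long n)
    {
        long i;
        for (i = 0; i < n; i++) {
            /* Ask the hardware to start fetching operands we will
             * need in a few iterations, hiding some memory latency. */
            __builtin_prefetch(in1 + PREFETCH_DISTANCE, 0, 3);
            __builtin_prefetch(in2 + PREFETCH_DISTANCE, 0, 3);

            *(double *)out = *(double *)in1 + *(double *)in2;
            in1 += s1;
            in2 += s2;
            out += so;
        }
    }

Whether this actually wins anything would have to be measured; the
right distance depends heavily on stride and cache line size.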

More generally, all the tricks that ATLAS uses to accelerate BLAS
routines would (in principle) be applicable here. The implementation
would be extremely difficult, though, even if all the basic loops
could be expressed in a few primitives.
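
As a flavour of the kind of trick involved, here is a toy
cache-blocking (tiling) example. The tile size and the transpose
kernel are assumptions chosen only to illustrate the idea, not
anything ATLAS or numpy actually does:

    /* Transpose an n x n array of doubles tile by tile, so each tile
     * of a and b stays resident in cache while it is being copied. */
    #define TILE 32   /* elements per tile side; tuned per cache size */

    static void
    transpose_blocked(const double *a, double *b, long n)
    {
        long ii, jj, i, j;
        for (ii = 0; ii < n; ii += TILE)
            for (jj = 0; jj < n; jj += TILE)
                for (i = ii; i < ii + TILE && i < n; i++)
                    for (j = jj; j < jj + TILE && j < n; j++)
                        b[j * n + i] = a[i * n + j];
    }

ATLAS goes much further (register blocking, generated kernels, empirical
tuning), which is why doing this generically for ufuncs would be hard.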

A. M. Archibald
