On Wed, Mar 10, 2010 at 03:37, Jaroslav Hajek <[email protected]> wrote:
> so it seems that block multiplication is really a big advantage (with
> larger blocks), so much that it even outweighs the extra work & memory
> needed by the m-code. It's not that surprising, considering that
> optimized BLAS also get their speed from blocked operations.
> Unfortunately, while it's relatively easy to split an arbitrary dense
> matrix into blocks, it would be quite hard to do so with a general
> sparse (CSC) matrix. That's why a separate class is needed to take
> advantage of block structure.
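For illustration, a minimal sketch of the block-sparse idea Jaroslav describes: instead of a CSC matrix, a separate structure stores only the nonzero dense b-by-b blocks, so each block can be multiplied as a small dense operation. This is not Octave's code; the dict-of-blocks layout and names are my own simplification, in Python.

```python
def bsr_matvec(blocks, n, b, x):
    """Matrix-vector product for a hypothetical block-sparse layout.

    blocks: dict mapping (block_row, block_col) -> dense b-by-b block
            (list of lists); only nonzero blocks are stored.
    n:      matrix dimension (assumed a multiple of b here)
    x:      input vector of length n
    """
    y = [0.0] * n
    for (bi, bj), blk in blocks.items():
        # Each stored block contributes a small dense matvec,
        # touching a contiguous slice of x and y.
        for i in range(b):
            s = 0.0
            row = blk[i]
            for j in range(b):
                s += row[j] * x[bj * b + j]
            y[bi * b + i] += s
    return y
```

The point of the per-block inner loops is that they run over contiguous dense data, which is exactly what general CSC traversal cannot guarantee.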
I suspect this is the issue. The optimized BLAS is very careful to use block sizes that match the cache; none of the sparse code I've seen does this. It should be feasible, though. Cache-optimized sparse code would be a nice - but tricky - research project.

--
Andy Adler <[email protected]> +1-613-520-2600x8785

_______________________________________________
Octave-dev mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/octave-dev
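A minimal sketch of the cache-blocking technique Andy refers to, for the dense case, in Python for clarity (the block size `bs` is the tunable that would be matched to cache capacity; names and structure are my own, not BLAS code):

```python
def matmul_naive(A, B):
    """Textbook triple loop: strides across all of B for every row of A."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                s += A[i][k] * B[k][j]
            C[i][j] = s
    return C

def matmul_blocked(A, B, bs):
    """Same product, computed block by block so each bs-by-bs tile of
    A, B, and C is reused while it is still resident in cache."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, bs):
        for kk in range(0, n, bs):
            for jj in range(0, n, bs):
                # Multiply the (ii,kk) tile of A by the (kk,jj) tile of B,
                # accumulating into the (ii,jj) tile of C.
                for i in range(ii, min(ii + bs, n)):
                    for k in range(kk, min(kk + bs, n)):
                        aik = A[i][k]
                        for j in range(jj, min(jj + bs, n)):
                            C[i][j] += aik * B[k][j]
    return C
```

The arithmetic is identical; only the traversal order changes, which is why optimized BLAS gets its speed this way and why doing the same for an irregular CSC structure is the hard part.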
