On May 7, 2010, at 3:27 AM, Daniel Carrera wrote:
>> The good news is that is what caches are for which is why things
>> aren't so bad for smaller matrices. This type of optimization
>> is what is needed to be done to speed up PDL's matrix multiply.
>> Since you have O(N**3) computations and O(N**2) memory accesses,
>> for large matrix multiplies you can completely hide the memory
>> access cost---if it is implemented to do that...
>
> Yeah, I think I understand now. Is the high-performance implementation
> hard to do? I don't really have a sense of whether we are talking about
> a weekend project or a major overhaul.

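For anyone following along, here is a rough sketch of the blocking idea
from the quoted text, written in plain C rather than PDL::PP. It is not
PDL's code: the function names, the row-major n x n layout, and the
`tile` parameter are all made up for illustration. The arithmetic is the
same as the plain triple loop; the loops are just reordered so that each
pass works on small tile x tile blocks that can stay in cache and be
reused, instead of streaming whole rows and columns from memory.

```c
#include <assert.h>
#include <stddef.h>

/* Reference version: the plain triple loop.
 * All matrices are n x n, stored row-major (an assumption of this sketch). */
void matmul_naive(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}

static size_t min_size(size_t x, size_t y) { return x < y ? x : y; }

/* Tiled version: the three outer loops step over blocks of side `tile`,
 * the three inner loops do the multiply-accumulate within one block.
 * min_size() clips the last block when n is not a multiple of tile. */
void matmul_tiled(size_t n, size_t tile,
                  const double *a, const double *b, double *c)
{
    assert(tile > 0);

    /* c is accumulated into, so it has to start at zero. */
    for (size_t i = 0; i < n * n; i++)
        c[i] = 0.0;

    for (size_t ii = 0; ii < n; ii += tile) {
        size_t i_end = min_size(ii + tile, n);
        for (size_t kk = 0; kk < n; kk += tile) {
            size_t k_end = min_size(kk + tile, n);
            for (size_t jj = 0; jj < n; jj += tile) {
                size_t j_end = min_size(jj + tile, n);
                for (size_t i = ii; i < i_end; i++)
                    for (size_t k = kk; k < k_end; k++) {
                        double aik = a[i * n + k];
                        /* Innermost loop walks contiguous memory in b and c. */
                        for (size_t j = jj; j < j_end; j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
            }
        }
    }
}
```

Calling matmul_tiled(n, 32, a, b, c) gives the same product as
matmul_naive(n, a, b, c). The 32 is only a guess; the tile size that
works best depends on the cache sizes of the machine and would need to
be measured.
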
It could be a weekend project for someone smart. Doing it "Right" would
require an overhaul of the PP code generator itself, which is pretty
hairy - but getting matrix multiplication, in particular, to use tiling
for better cache usage could probably be done with stupid index tricks.

_______________________________________________
Perldl mailing list
[email protected]
http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
