On May 7, 2010, at 3:27 AM, Daniel Carrera wrote:
>> The good news is that is what caches are for which is why things
>> aren't so bad for smaller matrices.  This type of optimization
>> is what is needed to be done to speed up PDL's matrix multiply.
>> Since you have O(N**3) computations and O(N**2) memory accesses,
>> for large matrix multiplies you can completely hide the memory
>> access cost---if it is implemented to do that...
>
> Yeah, I think I understand now. Is the high-performance implementation
> hard to do? I don't really have a sense of whether we are talking about
> a weekend project or a major overhaul.


It could be a weekend project for someone smart.  Doing it "Right"
would require an overhaul of the PP code generator itself, which is
pretty hairy - but making matrix multiplication, in particular, use
tiling to optimize cache usage could probably be done with stupid
index tricks.
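
To make "tiling with index tricks" concrete, here is a rough sketch in plain C. It is not PDL or PP code: the function name matmul_tiled, the row-major square layout, and the tile parameter are all assumptions made for illustration. The idea is only that the three loops are split into blocks small enough that the pieces of A, B and C being worked on stay in cache while they are reused.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper for illustration only; not part of PDL. */
static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/*
 * C = A * B for square n x n matrices stored row-major.
 * The work is done in tile x tile blocks so that each block of
 * A, B and C is reused many times while it is still in cache.
 * Assumes c does not alias a or b.
 */
void matmul_tiled(size_t n, size_t tile,
                  const double *a, const double *b, double *c)
{
    assert(tile > 0);

    /* Start from a zeroed result, since blocks accumulate into it. */
    for (size_t i = 0; i < n * n; i++)
        c[i] = 0.0;

    /* Outer three loops walk over blocks. */
    for (size_t ii = 0; ii < n; ii += tile) {
        size_t i_end = min_size(ii + tile, n);
        for (size_t kk = 0; kk < n; kk += tile) {
            size_t k_end = min_size(kk + tile, n);
            for (size_t jj = 0; jj < n; jj += tile) {
                size_t j_end = min_size(jj + tile, n);

                /* Inner three loops are an ordinary multiply
                 * restricted to the current blocks. */
                for (size_t i = ii; i < i_end; i++) {
                    for (size_t k = kk; k < k_end; k++) {
                        double aik = a[i * n + k];
                        for (size_t j = jj; j < j_end; j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
                }
            }
        }
    }
}
```

Calling matmul_tiled(n, tile, a, b, c) with tile >= n degenerates to the ordinary triple loop, so the tile size is purely a tuning knob. A reasonable starting guess is a value for which three tile x tile blocks of doubles fit in the cache level you are targeting, but the best value depends on the machine and has to be measured.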



_______________________________________________
Perldl mailing list
[email protected]
http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
