On 5/8/2010 3:44 AM, Craig DeForest wrote:
>
> Er, sorry, I was noodling around and may have jumped the gun. I just
> checked in a small speed improvement for matmult. It just evaluates
> the terms in the matrix product in tiled order (multiplying 32x32
> tiles) rather than in direct threading order; that fits each tile into
> 16k in the double-precision case, which is small enough to fit in L1
> cache of most performance CPUs. Unsurprisingly, it helps.
> Surprisingly, not so very much. On my PowerBook:
>
> perldl> $a = random(2000,2000);
> perldl> $b = random(2000,2000);
> perldl> {$t0=time; $c = $a->dummy(1)->inner($b->xchg(0,1)->dummy(2));
> ..{> $t1=time; print $t1-$t0,"\n";}
> 82
>
> perldl> {$t0=time; $d = $a x $b;
> ..{> $t1=time; print $t1-$t0,"\n";}
> 70
>
> perldl> print all($d==$c)
> 1
>
> I am a bit puzzled how these other packages manage to go so much
> faster...
The tiled calculation optimizes the memory accesses,
but, if I understand correctly, the tiled code still
uses the existing inner-product algorithm.
If so, the total number of memory operations is still
O(N**3) rather than the optimal O(N**2); the tiling has
just moved them to a faster level of the memory hierarchy.
Another possibility is that a 32x32 tile is not
big enough to hide the memory access time for the
data behind the floating point calculations.
As to the other packages' performance, I'm sure an
optimized C matrix-multiply routine that carried out
the full set of optimizations would be very fast. I
don't know how much threading would/could be supported.
--Chris
> Cheers,
> Craig
>
_______________________________________________
Perldl mailing list
[email protected]
http://mailman.jach.hawaii.edu/mailman/listinfo/perldl