The tiled implementation appeared in PDL-2.007, so if you're using an older PDL version, you'll get the extra slowdown for free! :-)
On Sun, Sep 7, 2014 at 8:45 AM, Chris Marshall <devel.chm...@gmail.com> wrote:
> Performance was also helped by the fact that the original
> inner product matmult algorithm was replaced by a tile
> based one which leads to better cache reuse,
>
> ...and you have Craig to thank for the implementation. :-)
>
> --Chris
>
>
> On Sun, Sep 7, 2014 at 7:03 AM, David Mertens <dcmertens.p...@gmail.com> wrote:
>> It's also quite likely that Intel has heuristics to pre-fetch data, and
>> they're helping out. Maybe.
>>
>> David
>>
>>
>> On Sat, Sep 6, 2014 at 10:28 PM, Craig DeForest <defor...@boulder.swri.edu> wrote:
>>>
>>> Cool! Maybe there are more cache hits than I expected (which was none)...
>>>
>>> (Mobile)
>>>
>>>
>>> > On Sep 6, 2014, at 1:54 PM, Chris Marshall <devel.chm...@gmail.com> wrote:
>>> >
>>> > On my PC (2.8GHz i7) it takes about an hour for the multiply
>>> > just using $a x $b as Craig shows. I haven't tried using the
>>> > autothreading support to see how that changes things.
>>> >
>>> > As discussed already, GPU acceleration could allow for much
>>> > faster computation. For a start, Nvidia has a cuBLAS library
>>> > which implements matrix multiply which could be used to
>>> > optimize the performance.
>>> >
>>> > --Chris

_______________________________________________
Perldl mailing list
Perldl@jach.hawaii.edu
http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
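For anyone curious what the tile-based (cache-blocked) matmult Chris mentions looks like, here's a minimal Python sketch of the technique. This is just an illustration of cache blocking, not PDL's actual C implementation; the function name and the tile size of 32 are arbitrary choices for the example.

```python
# A minimal sketch of cache-blocked (tiled) matrix multiplication.
# Illustration only -- not PDL's actual implementation.

def matmul_tiled(a, b, tile=32):
    """Multiply matrices a (n x m) and b (m x p), given as lists of lists.

    The loops walk tile x tile blocks so that each block of a, b, and c
    is reused many times while it is still hot in cache, instead of
    streaming whole rows/columns through cache on every inner product.
    """
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, m, tile):
            for jj in range(0, p, tile):
                # One tile's worth of work on the block c[ii:, jj:].
                for i in range(ii, min(ii + tile, n)):
                    for k in range(kk, min(kk + tile, m)):
                        aik = a[i][k]
                        row_b = b[k]
                        row_c = c[i]
                        for j in range(jj, min(jj + tile, p)):
                            row_c[j] += aik * row_b[j]
    return c

if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[5.0, 6.0], [7.0, 8.0]]
    print(matmul_tiled(a, b))  # -> [[19.0, 22.0], [43.0, 50.0]]
```

In compiled code the payoff is large because each tile of the operands fits in L1/L2 cache; in pure Python the interpreter overhead dominates, so treat this only as a sketch of the loop structure.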