On 5/6/2010 8:17 PM, Daniel Carrera wrote:
>
> Chris Marshall <[email protected]> wrote:
>
>> Well, if I cared about optimizing the matmul
>> performance, I would put in an optimized kernel
>> for that.
>
> I wonder if there are any PDL users who care about matrix mult. I see a
> lot of astronomy people on this list. Not a lot of matrix mult in
> astronomy AFAIK.
Any sort of algorithmic processing these days can be, and often is,
formulated in linear-algebra terms. For me, the biggest win from PDL is
the ease of developing and expressing new science/algorithms and getting
something working. However, if I end up with data that needs crunching,
I'll definitely be wanting that lost factor of 10 in performance.

That said, I'd like to see PDL ported *fully* to win32 first....

--Chris

>> Well, I took a look at our default matrix multiply
>> in Primitive.pm. This is the kernel:
>>
>>   $a->dummy(1)->inner($b->xchg(0,1)->dummy(2), $c);
>>
>> so you can see that the kernel is an inner product
>> operation, so the total op count is something like:
>>
>>   O(N**3) memory ops
>>   O(N**3) float ops
>
> Interesting. I have a couple of dumb questions:
>
> 1) Why is that O(N**3) memory ops? I'm not very versed with PDL, but it
> looks to me that ->dummy() and ->xchg() should not cost any memory. Is
> there a reason why ->inner() costs extra memory?

The inner product takes two O(N) vectors and produces an O(1) result.
The threading is over the O(N**2) elements of the output matrix, and
O(N) * O(N**2) is O(N**3).

A more complete analysis would include the lengths of the input matrix
dimensions, say [M,N] and [N,P] producing an [M,P] output matrix. The
simple analysis is to take all matrices as [N,N] and not worry about odd
O(1) terms in the noise.

--Chris

_______________________________________________
Perldl mailing list
[email protected]
http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
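To make the counting argument above concrete, here is a small sketch (in
Python rather than PDL, purely for illustration — the function name and
structure are mine, not anything from Primitive.pm). It mirrors the
threading pattern: one length-N inner product per element of the [M,P]
output, so the multiply-add count is M*P*N, which is N**3 in the square
case.

```python
def matmul_op_count(M, N, P):
    """Count multiply-adds for a naive [M,N] x [N,P] matrix product,
    structured the way the PDL kernel threads: one inner product
    per output element."""
    ops = 0
    for i in range(M):          # thread over output rows
        for j in range(P):      # thread over output columns
            for k in range(N):  # inner product along the shared dim
                ops += 1        # one multiply-add (and ~two memory reads)
    return ops

# Square case [N,N] x [N,N]: exactly N**3 multiply-adds.
print(matmul_op_count(4, 4, 4))   # 64 == 4**3
# General case: M*P inner products of length N.
print(matmul_op_count(2, 3, 5))   # 30 == 2*3*5
```

The memory-op count is the same order: ->dummy() and ->xchg() are indeed
free (they only rearrange index bookkeeping), but each of the M*P inner
products must still *read* N elements from each input, hence O(N**3)
memory ops even with no extra allocation.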
