The PDL matmult routine was originally implemented as a loop over inner products. Last year it was redone as a cache-friendly tiled implementation.
--Chris

On 5/22/11, Daniel Carrera <[email protected]> wrote:
> Hi Dima,
>
> On 05/22/2011 11:45 PM, Dima Kogan wrote:
>> The new functionality in PDL is able to distribute operations created
>> by PDL threading into separate processor threads. This takes effect
>> if, for example, you use PDL to multiply a 5000x5000x5 piddle by a
>> 5000x5000 piddle. PDL threading treats this as 5 separate
>> multiplications of 5000x5000 matrices, and the new code will
>> parallelize this. However, if you're simply multiplying two 5000x5000
>> matrices together, there is no PDL threading involved, so the new patch
>> will do nothing.
>
> Ah, thanks. That makes everything a lot clearer now.
>
>> It COULD do something if we defined matrix multiplication as a bunch of
>> matrix-vector multiplications threaded together. Then the
>> parallelization would 'just work', but we don't define matrix
>> multiplication this way. (Sorta off-topic: should we change the
>> multiplication definition to this?)
>
> This may not apply to PDL, but last year I tried something like this
> using OpenMP (i.e. threads) and Fortran, and the "parallel" code was
> actually slower.
>
> In Fortran, when I just did "matmul(A,B)" the compiler wrote a loop that
> accessed memory very efficiently, and by forcing matrix-vector products
> I ruined that optimization and made the code slower. But I have no idea
> whether this has any relevance to PDL.
>
> --
> I'm not overweight, I'm undertall.
>
> _______________________________________________
> Perldl mailing list
> [email protected]
> http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
