i mean, before we consider hardware based implementations for bigger matrices, this change seems like a very easy win
On Fri, Apr 17, 2015 at 5:26 PM, Dmitriy Lyubimov <[email protected]> wrote: > Spent an hour on this today. > > What i am doing: simply reimplementing pairwise dot-product algorithm in > stock dense matrix times(). > > However, equipping every matrix with structure "flavor" (i.e. dense(...) > reports row-wise , and dense(...).t reports column wise, dense().t.t > reports row-wise again, etc.) > > Next, wrote a binary operator that switches on combination of operand > orientation and flips misaligned operand(s) (if any) to match most "speedy" > orientation RW-CW. here are result for 300x300 dense matrix pairs: > > Ad %*% Bd: (107.125,46.375) > Ad' %*% Bd: (206.475,39.325) > Ad %*% Bd': (37.2,42.65) > Ad' %*% Bd': (100.95,38.025) > Ad'' %*% Bd'': (120.125,43.3) > > these results are for transpose combinations of original 300x300 dense > random matrices, averaged over 40 runs (so standard error should be well > controlled), in ms. First number is stock times() application (i.e. what > we'd do with %*% operator now), and second number is ms with rewriting > matrices into RW-CW orientation. > > For example, AB reorients B only, just like A''B'', AB' reorients nothing, > and worst case A'B re-orients both (I also tried to run sum of outer > products for A'B case without re-orientation -- apparently L1 misses far > outweigh costs of reorientation there, i got very bad results there for > outer product sum). > > as we can see, stock times() version does pretty bad for even dense > operands for any orientation except for the optimal. > > Given that, i am inclined just to add orientation-driven structure > optimization here and replace all stock calls with just orientation > adjustment. > > Of course i will need to append this matrix with sparse and sparse row > matrix combination (quite a bit of those i guess) and see what happens > compared to stock sparse multiplications. > > But even that seems like a big win to me (basically, just doing > reorientation optimization seems to give 3x speed up on average in > matrix-matrix multiplication in 3 cases out of 4, and ties in 1 case). >
