Spent an hour on this today.

What I am doing: simply reimplementing the pairwise dot-product algorithm in
the stock dense-matrix times().

However, I am also equipping every matrix with a structure "flavor" (i.e.,
dense(...) reports row-wise, dense(...).t reports column-wise, dense(...).t.t
reports row-wise again, etc.).
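
A minimal sketch of what I mean, with made-up names rather than the real
API: a flavor tag that transposition flips.

    // Hypothetical sketch of flavor tagging; names are made up.
    sealed trait Flavor
    case object RowWise extends Flavor
    case object ColWise extends Flavor

    case class Mat(data: Array[Array[Double]], flavor: Flavor = RowWise) {
      // Logical transpose: flip the reported flavor (a real implementation
      // would also swap the row/column index math, omitted here).
      def t: Mat = copy(flavor = if (flavor == RowWise) ColWise else RowWise)
    }

    // Mat(rows).flavor == RowWise
    // Mat(rows).t.flavor == ColWise
    // Mat(rows).t.t.flavor == RowWise again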

Next, I wrote a binary operator that switches on the combination of operand
orientations and flips the misaligned operand(s), if any, into the "speediest"
orientation, RW-CW (a sketch of that dispatch follows the results). Here are
the results for 300x300 dense matrix pairs:

Ad %*% Bd: (107.125,46.375)
Ad' %*% Bd: (206.475,39.325)
Ad %*% Bd': (37.2,42.65)
Ad' %*% Bd': (100.95,38.025)
Ad'' %*% Bd'': (120.125,43.3)

These results are for the transpose combinations of the original 300x300
dense random matrices, averaged over 40 runs (so the standard error should be
well controlled), in ms. The first number is the stock times() application
(i.e., what we'd do with the %*% operator now), and the second number is the
time with the matrices rewritten into RW-CW orientation.
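
Schematically, the dispatch looks something like the sketch below
(hypothetical code, not the actual implementation; reorient() stands for a
physical copy into the opposite layout):

    // Hypothetical sketch of the orientation-driven dispatch.
    object OrientedMul {
      sealed trait Flavor
      case object RowWise extends Flavor
      case object ColWise extends Flavor

      // data(i) is row i when RowWise, column i when ColWise.
      case class Mat(data: Array[Array[Double]], flavor: Flavor)

      // Physical copy into the opposite layout: one O(m*n) pass, paid
      // once so the O(m*n*k) multiply loops stay contiguous.
      def reorient(m: Mat): Mat =
        Mat(m.data.transpose,
            if (m.flavor == RowWise) ColWise else RowWise)

      // Switch on the operand flavor combination and flip whatever is
      // misaligned so the kernel always runs in RW-CW orientation.
      def mul(a: Mat, b: Mat): Array[Array[Double]] = {
        val aRW = if (a.flavor == RowWise) a else reorient(a) // A'B flips A
        val bCW = if (b.flavor == ColWise) b else reorient(b) // AB flips B
        // RW-CW kernel: a row of A and a column of B are both contiguous.
        aRW.data.map(row => bCW.data.map(col => dot(row, col)))
      }

      private def dot(x: Array[Double], y: Array[Double]): Double = {
        var s = 0.0; var k = 0
        while (k < x.length) { s += x(k) * y(k); k += 1 }
        s
      }
    }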

For example, AB reorients B only, just like A''B''; AB' reorients nothing;
and the worst case, A'B, reorients both. (I also tried running a sum of outer
products for the A'B case without re-orientation -- apparently L1 misses far
outweigh the cost of reorientation there; I got very bad results for the
outer-product sum, as sketched after this paragraph.)
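
Hypothetical sketch of that outer-product kernel: with A' column-wise and B
row-wise, both the k-th column of A' and the k-th row of B are contiguous, so
nothing needs reorienting -- but each of the K rank-1 updates sweeps all of C,
and a 300x300 double matrix (~720KB) is far bigger than L1, so C gets
streamed through cache K times.

    // Hypothetical sketch: C = A'B as a sum of outer products.
    // aCols(k) = k-th column of A' (contiguous in column-wise storage),
    // bRows(k) = k-th row of B (contiguous in row-wise storage).
    def outerProductSum(aCols: Array[Array[Double]],
                        bRows: Array[Array[Double]]): Array[Array[Double]] = {
      val m = aCols(0).length
      val n = bRows(0).length
      val c = Array.ofDim[Double](m, n)
      var k = 0
      while (k < aCols.length) {          // K rank-1 updates...
        val col = aCols(k); val row = bRows(k)
        var i = 0
        while (i < m) {
          val aik = col(i); val ci = c(i)
          var j = 0
          // ...each of which touches every cell of c.
          while (j < n) { ci(j) += aik * row(j); j += 1 }
          i += 1
        }
        k += 1
      }
      c
    }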

As we can see, the stock times() version does pretty badly, even for dense
operands, in every orientation except the optimal one.

Given that, I am inclined to just add orientation-driven structure
optimization here and replace all the stock calls with the orientation
adjustment alone.

Of course, I will need to extend this dispatch matrix with the sparse and
sparse-row matrix combinations (quite a few of those, I guess) and see what
happens compared to the stock sparse multiplications.

But even that seems like a big win to me: just doing the reorientation
optimization gives roughly a 3x speedup on average in matrix-matrix
multiplication in 3 cases out of 4 (about 2.3x for AB, 5.3x for A'B, and
2.7x for A'B'), and roughly ties in the AB' case.
