Mark Adams via petsc-users <[email protected]> writes: > Anyway, my data for this is in my SC 2004 paper (MakeNextMat_private in > attached, NB, this is code that I wrote in grad school). It is memory > efficient and simple, just four nested loops i,j,I,J: C(I,J) = > P(i,I)*A(i,j)*P(j,J). In eyeballing the numbers and from new data that I am > getting from my bone modeling colleagues, that use this old code on > Stampede now, the times look reasonable compared to GAMG. This is optimized > for elasticity, where I unroll loops (so it is really six nested loops).
Is the A above meant to include some ghosted rows? > In thinking about this now, I think you want to make a local copy of P with > rows (j) for every column in A that you have locally, then transpose this > local thing for the P(j,J) term. A sparse AXPY on j. (My code uses a > special tree data structure but a matrix is simpler.) Why transpose for P(j,J)?
