>> MatMult   2 1.0 1.1801e-02 1.0 1.16e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0 10  0  0  0   0 10  0  0  0   981
>> MatSOR    3 1.0 4.6818e-02 1.0 1.78e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0 16  0  0  0   0 16  0  0  0   380
>>
>> Thus we see that we save all of the MatMult time, which is 2 of the 5 work
>> units needed with SOR in terms of flops computed, so 40% of the work but
>> only 20% of the time.
>>
>> On the post-smooth of the multigrid there is a nonzero initial guess, so
>> eisenstat does
>>
>>   if (nonzero) {
>>     ierr = VecCopy(x,eis->b[pc->presolvedone-1]);CHKERRQ(ierr);
>>     ierr = MatSOR(eis->A,eis->b[pc->presolvedone-1],eis->omega,SOR_APPLY_UPPER,0.0,1,1,x);CHKERRQ(ierr);
>>
>> so an extra .5 work unit,
>>
>> while Chebychev does the matrix-vector product to get the initial residual, so
>>
>>   Eisenstat is 3 units + .5 unit + 1 unit = 4.5 units
>>   SOR       is 5 units + 1 unit          = 6 units
>>
>> so for the combined pre- and post-smooth, Eisenstat/SOR = 7.5/11 work units.
>
> I think that is right, and indeed, that looks like enough benefit to
> justify converting the matrix format.
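(For reference: the MatMult/MatSOR lines above are PETSc performance-log output (-log_view, or -log_summary in older releases) for the Chebyshev smoother preconditioned with plain SOR; the Eisenstat variant drops the separate MatMult. The exact setup isn't shown in the quote, but a minimal sketch of selecting the two variants on a PCMG level smoother, assuming the surrounding multigrid setup, the level index, and the usual ierr/CHKERRQ error handling already exist, is

  KSP smooth;
  PC  smoothpc;
  ierr = PCMGGetSmoother(pc,level,&smooth);CHKERRQ(ierr);
  ierr = KSPSetType(smooth,KSPCHEBYSHEV);CHKERRQ(ierr);
  ierr = KSPGetPC(smooth,&smoothpc);CHKERRQ(ierr);
  ierr = PCSetType(smoothpc,PCEISENSTAT);CHKERRQ(ierr);   /* or PCSOR for the baseline */

or, equivalently, -mg_levels_ksp_type chebyshev -mg_levels_pc_type eisenstat (resp. sor) on the command line.)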
Just to be clear. The current eisenstat code (MatSOR) uses a standard AIJ matrix (obviously) but applies SOR with the U or L terms and so has some logic to skip stuff (e.g., skip L+D when processing U). If we have native U and L matrices then we should be able to recover most of the ~2x performance penalty that Barry is showing. If I'm on the right page then we would probably want this new matrix to have a MatMult that applies U & D & L in one shot. (It might be good to fold this together if performance is limited by cache misses on the source vector.)
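To make the "applies U & D & L in one shot" idea concrete, here is a rough sketch of what such a format could look like: strictly lower and strictly upper triangles kept as separate CSR arrays plus an explicit diagonal, with a single fused pass for y = (L + D + U) x. This is not existing PETSc code; the names Mat_TriSplit and MatMult_TriSplit_Seq are made up for illustration.

  #include <petscsys.h>

  /* Hypothetical split storage: L and U as separate CSR triangles, D as a
     plain array.  The SOR/Eisenstat sweeps would traverse the same arrays. */
  typedef struct {
    PetscInt     n;          /* number of local rows                       */
    PetscInt    *li,*lj;     /* CSR row pointers / column indices for L    */
    PetscScalar *la;         /* values of L                                */
    PetscInt    *ui,*uj;     /* CSR row pointers / column indices for U    */
    PetscScalar *ua;         /* values of U                                */
    PetscScalar *d;          /* diagonal                                   */
  } Mat_TriSplit;

  /* y = (L + D + U) x in a single pass over the rows, so the L and U parts
     of each row are applied back to back instead of in two separate sweeps */
  static void MatMult_TriSplit_Seq(const Mat_TriSplit *a,const PetscScalar *x,PetscScalar *y)
  {
    PetscInt i,k;

    for (i=0; i<a->n; i++) {
      PetscScalar sum = a->d[i]*x[i];
      for (k=a->li[i]; k<a->li[i+1]; k++) sum += a->la[k]*x[a->lj[k]];
      for (k=a->ui[i]; k<a->ui[i+1]; k++) sum += a->ua[k]*x[a->uj[k]];
      y[i] = sum;
    }
  }

The point of the fused loop is that the full MatMult and the triangular applications (the SOR_APPLY_UPPER / SOR_APPLY_LOWER paths) would share the same storage, and the skip-the-other-triangle logic in the current AIJ MatSOR would go away; whether the fusion actually helps presumably depends on how much of the cost is cache misses on x, as noted above.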