Mark,

Contrary to what we thought, R*A*Rt turns out to be more difficult than PtAP because the sparse dot product is inefficient. Barry's cool idea only works well on A*Rt for your ex56.c thus far. We are trying to understand the results we get and are still exploring ....
R*A*Rt is not ready for you to dive into yet :-( I'll keep you informed about our progress.

Hong

On Wed, Nov 9, 2011 at 10:59 AM, Mark F. Adams <mark.adams at columbia.edu> wrote:
>
> On Nov 9, 2011, at 11:02 AM, Barry Smith wrote:
>
>>
>> On Nov 9, 2011, at 7:58 AM, Mark F. Adams wrote:
>>
>>> FYI: I appear to be getting not great flop rates out of these methods on my
>>> Mac:
>>
>>   Known issue. Do you have alternative algorithms that would crank it up?
>> This is something we are actively working on.
>>
>
> I don't know what you're doing now, Hong mentioned that you had some good
> ideas for optimizing the code from what we look at together when I was at
> Argonne.
>
> One generic idea that has come to mind since we last talked, for RAP, is
> folding the two parts: 1) T = A*P, 2) RAP = R*T, together.  I have not looked
> at this in detail but perhaps instead of computing the whole "T" here,
> compute parts (a row, an element ...) call each one T_i, and use them right
> away, RAP += R*T_i, in one big loop and throw them away.  This might improve
> cache performance because T will be high in cache.
>
> It sounds like the serial code is stable now.  I will dive into it this week,
> finally figure out what you are doing exactly, and see if I can come up with
> any ideas.  This is a hard problem and I'm not even sure what fast is here.
>
> Mark
>
>>   Barry
>>
>>>
>>> MatMatMult       2 1.0 1.1382e+00 1.0 8.48e+07 1.0 0.0e+00 0.0e+00 4.0e+00  4  1  0  0  2  12  8  0  0  2    75
>>> MatPtAPNumeric   2 1.0 4.3557e+00 1.0 7.82e+08 1.0 0.0e+00 0.0e+00 0.0e+00 14 11  0  0  0  45 74  0  0  0   180
>>> MatTrnMatMult    2 1.0 6.0777e-01 1.0 3.31e+07 1.0 0.0e+00 0.0e+00 8.0e+00  2  0  0  0  4   6  3  0  0  4    55
>>>
>>> KSPSolve         1 1.0 2.9164e+00 1.0 1.57e+09 1.0 0.0e+00 0.0e+00 0.0e+00 10 21  0  0  0 100100  0  0  0   538
>>>
>>> Mark
>>>
>>> On Nov 9, 2011, at 10:47 AM, Hong Zhang wrote:
>>>
>>>>> Hong, please update src/docs/website/documentation/changes/dev.html when
>>>>> you
>>>>> make API changes.
>>>> Done and pushed to petsc-dev.
>>>>
>>>> Hong
>>>>
>>>
>>
>>
>
>
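
For reference, the folding idea quoted above (compute one piece T_i of T = A*P, use it right away in RAP += R*T_i, then throw it away) could look roughly like the sketch below. It is only an illustration under stated assumptions: R = P^T, both matrices are in CSR form, T_i is taken to be row i of A*P, and the coarse result is small enough to store densely. The `Csr` struct and `fused_ptap_dense` are hypothetical names made up for this sketch; they are not PETSc's data structures and not the algorithm currently in petsc-dev.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical minimal CSR matrix for this sketch (not PETSc's Mat). */
typedef struct {
    int nrows, ncols;
    const int *rowptr;    /* length nrows + 1 */
    const int *colidx;    /* length rowptr[nrows] */
    const double *val;    /* length rowptr[nrows] */
} Csr;

/*
 * Computes C = P^T * A * P without ever forming T = A*P as a whole.
 * C    : dense, row-major, P->ncols x P->ncols (assumption: coarse size is small)
 * work : caller-supplied scratch of length P->ncols, holds one row T_i at a time
 */
void fused_ptap_dense(const Csr *A, const Csr *P, double *C, double *work)
{
    const int nc = P->ncols;
    memset(C, 0, (size_t)nc * (size_t)nc * sizeof(double));

    for (int i = 0; i < A->nrows; i++) {
        /* Step 1: T_i = A(i,:) * P, accumulated into the dense scratch row. */
        memset(work, 0, (size_t)nc * sizeof(double));
        for (int ja = A->rowptr[i]; ja < A->rowptr[i + 1]; ja++) {
            const int k = A->colidx[ja];
            const double a = A->val[ja];
            for (int jp = P->rowptr[k]; jp < P->rowptr[k + 1]; jp++)
                work[P->colidx[jp]] += a * P->val[jp];
        }

        /* Step 2: use T_i immediately. With R = P^T we have R(r,i) = P(i,r),
         * so every nonzero P(i,r) contributes C(r,:) += P(i,r) * T_i. */
        for (int jp = P->rowptr[i]; jp < P->rowptr[i + 1]; jp++) {
            double *crow = C + (size_t)P->colidx[jp] * (size_t)nc;
            const double r = P->val[jp];
            for (int c = 0; c < nc; c++)
                crow[c] += r * work[c];
        }
        /* T_i is overwritten on the next iteration, i.e. thrown away. */
    }
}
```

The dense scratch row and dense C keep the sketch short; a real implementation would need sparse accumulation and a symbolic phase, and whether the fusion actually helps cache behavior would have to be measured.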