Hi Mark,

Thanks for your email.

On Thu, Mar 21, 2019 at 6:39 AM Mark Adams via petsc-dev <petsc-dev@mcs.anl.gov> wrote:

> I'm probably screwing up some sort of history by jumping into dev, but
> this is a dev comment ...
>
>> (1) -matptap_via hypre: This calls the hypre package to do the PtAP through
>> an all-at-once triple product. In our experience, it is the most memory
>> efficient, but could be slow.
>
> FYI,
>
> I visited LLNL in about 1997 and told them how I did RAP: simple 4 nested
> loops. They were very interested. Clearly they did it this way after I
> talked to them. This approach came up here a while back (e.g., we should
> offer this as an option).
>
> Anecdotally, I don't see a noticeable difference in performance on my 3D
> elasticity problems between my old code (still used by the bone modeling
> people) and ex56 ...

You may not see differences when the problem is small. What I observed is
that the hypre PtAP is ten times slower than the PETSc scalable PtAP when
we ran a problem with 3 billion unknowns on 10K processor cores.

> My kernel is an unrolled dense matrix triple product. I doubt hypre did
> this. It ran at about 2x+ the flop rate of the mat-vec at scale on the SP3
> in 2004.

Could you explain this more with some small examples? I am profiling the
current PETSc algorithms on some real simulations. If the PETSc PtAP still
takes more memory than desired with my fix
(https://bitbucket.org/petsc/petsc/pull-requests/1452), I am going to
implement the all-at-once triple product, dropping all intermediate data.
If you have any documents (besides the code you posted before), they would
be a great help.

Fande

> Mark