I'm probably screwing up some sort of history by jumping into dev, but this
is a dev comment ...

> (1) -matptap_via hypre: This calls the hypre package to do the PtAP through
> an all-at-once triple product. In our experience, it is the most memory
> efficient, but could be slow.
>

FYI,

I visited LLNL in about 1997 and told them how I did RAP. Simple 4 nested
loops. They were very interested. Clearly they did it this way after I
talked to them. This approach came up here a while back (e.g., the suggestion
that we should offer this as an option).
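
For readers who have not seen it, here is a minimal sketch of one way an
all-at-once P^T A P can be written as four nested loops. It assumes CSR
storage with scalar entries. The name `ptap_all_at_once` and the dictionary
accumulator are illustrative assumptions, not the code described above and
not hypre's or PETSc's implementation.

```python
from collections import defaultdict


def ptap_all_at_once(A, P):
    """Compute C = P^T A P without forming A*P or P^T explicitly.

    A and P are CSR triples (indptr, indices, data); A is n x n, P is n x m.
    Returns a dict mapping (k, l) -> C[k, l] for the entries that were touched.
    """
    a_ptr, a_idx, a_val = A
    p_ptr, p_idx, p_val = P
    C = defaultdict(float)
    n_fine = len(a_ptr) - 1

    for i in range(n_fine):                            # loop 1: rows i of A
        for ja in range(a_ptr[i], a_ptr[i + 1]):       # loop 2: nonzeros A[i, j]
            j = a_idx[ja]
            for jp in range(p_ptr[j], p_ptr[j + 1]):   # loop 3: nonzeros P[j, l]
                l = p_idx[jp]
                a_p = a_val[ja] * p_val[jp]            # A[i, j] * P[j, l]
                for ip in range(p_ptr[i], p_ptr[i + 1]):  # loop 4: nonzeros P[i, k]
                    k = p_idx[ip]
                    # C[k, l] += P[i, k] * A[i, j] * P[j, l]
                    C[(k, l)] += p_val[ip] * a_p
    return dict(C)
```

No intermediate A*P is stored, which is where the memory saving comes from.
A real implementation would accumulate into a preallocated sparse (or
blocked) structure rather than a Python dictionary.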

Anecdotally, I don't see a noticeable difference in performance on my 3D
elasticity problems between my old code (still used by the bone modeling
people) and ex56 ...

My kernel is an unrolled dense matrix triple product. I doubt Hypre did
this. It ran at about 2x+ the flop rate of the mat-vec at scale on the SP3
in 2004.

Mark
