I installed PETSc on Perlmutter with "spack install petsc+cuda+zoltan" and loaded it with "spack load petsc/fwge6pf". I then compiled my application (a purely CPU code) and linked it against this PETSc build, hoping to get a performance improvement from the PETSc GPU backend. However, with the same number of MPI tasks, the timing was the same with and without GPU accelerators. Have I missed a step, for example setting PETSc options at runtime to select the GPU backend?
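To make the question concrete, here is a minimal sketch of the kind of runtime setup I have in mind. It is not something I have confirmed to work. It assumes the application creates its vectors and matrices through VecSetFromOptions()/MatSetFromOptions(), so that type options given at runtime are honored. The executable name "./app" and the task count are placeholders.

```shell
# Load the CUDA-enabled PETSc build from the Spack install described above
# (commented out here so the snippet can be read on any machine):
# spack load petsc/fwge6pf

# Ask PETSc to use CUDA vector and cuSPARSE matrix types, and to print a
# performance summary at exit. PETSc reads the PETSC_OPTIONS environment
# variable in addition to command-line options.
export PETSC_OPTIONS="-vec_type cuda -mat_type aijcusparse -log_view"
echo "PETSC_OPTIONS=${PETSC_OPTIONS}"

# Hypothetical launch line; "./app" and "-n 4" are placeholders:
# srun -n 4 ./app
```

If the types are created in a way that ignores these options, I would expect the -log_view summary to show no work attributed to the GPU, which would match the unchanged timings.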
Thanks, Cho