Here, except for VecNorm, the GPU is used effectively, in that most of the 
time is spent doing real work on the GPU.

VecNorm              402 1.0 4.4100e-01 6.1 1.69e+09 1.0 0.0e+00 0.0e+00 4.0e+02  0  1  0  0 20   9  1  0  0 33 30230   225393      0 0.00e+00    0 0.00e+00 100

Even the dots are very effective; only the VecNorm flop rate over the full 
time is much, much lower than that of VecDot. Is that somehow due to the use 
of GPU or CPU MPI in the allreduce?
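
To put a number on that gap, here is a rough sketch that pulls the two flop rates out of the VecNorm line above and compares them. The column names are my assumption, based on the usual -log_view layout with GPU columns (they are not shown in this excerpt), and the helper parse_event_line is hypothetical, not part of PETSc.

```python
# Sketch only: the column names below are ASSUMED from the usual PETSc
# -log_view layout with GPU columns; they are not given in this thread.
FIELDS = [
    "count_max", "count_ratio", "time_max", "time_ratio",
    "flop_max", "flop_ratio", "mess", "avg_len", "reduct",
    "global_pT", "global_pF", "global_pM", "global_pL", "global_pR",
    "stage_pT", "stage_pF", "stage_pM", "stage_pL", "stage_pR",
    "total_mflops", "gpu_mflops",
    "cpu_to_gpu_count", "cpu_to_gpu_size",
    "gpu_to_cpu_count", "gpu_to_cpu_size", "gpu_pF",
]


def parse_event_line(line):
    """Hypothetical helper: split one event line into (name, {column: value})."""
    tokens = line.split()
    name, values = tokens[0], [float(t) for t in tokens[1:]]
    if len(values) != len(FIELDS):
        raise ValueError("unexpected number of columns: %d" % len(values))
    return name, dict(zip(FIELDS, values))


# The VecNorm line from this message, rejoined onto one line.
LINE = ("VecNorm 402 1.0 4.4100e-01 6.1 1.69e+09 1.0 0.0e+00 0.0e+00 4.0e+02 "
        "0 1 0 0 20 9 1 0 0 33 30230 225393 0 0.00e+00 0 0.00e+00 100")

name, ev = parse_event_line(LINE)

# Flop rate over the whole event divided by the flop rate on the GPU alone.
fraction = ev["total_mflops"] / ev["gpu_mflops"]

print("%s: total %.0f Mflop/s, GPU %.0f Mflop/s, total/GPU = %.3f"
      % (name, ev["total_mflops"], ev["gpu_mflops"], fraction))
print("%s: max/min time ratio across ranks = %.1f" % (name, ev["time_ratio"]))
```

If those column assumptions hold, the overall rate is about 13% of the GPU rate, and the time ratio of 6.1 suggests some ranks spend most of the event waiting rather than computing, which would be consistent with the reduction being the bottleneck.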



> On Jan 24, 2022, at 12:14 PM, Mark Adams <mfad...@lbl.gov> wrote:
> 
> 
> 
> Mark, can we compare with Spock?
> 
>  Looks much better. This puts two processes/GPU because there are only 4.
> <jac_out_001_kokkos_Spock_6_1_notpl.txt>
