I think you should contact the Crusher ECP technical support team, tell them you are getting dismal performance, and ask whether you should expect better. Don't waste time flogging a dead horse.
> On Jan 24, 2022, at 2:16 PM, Matthew Knepley <knep...@gmail.com> wrote:
>
> On Mon, Jan 24, 2022 at 2:11 PM Junchao Zhang <junchao.zh...@gmail.com> wrote:
>
> On Mon, Jan 24, 2022 at 12:55 PM Mark Adams <mfad...@lbl.gov> wrote:
>
> On Mon, Jan 24, 2022 at 1:38 PM Junchao Zhang <junchao.zh...@gmail.com> wrote:
> Mark, I think you can benchmark individual vector operations, and once we get reasonable profiling results, we can move on to solvers, etc.
>
> Can you suggest a code to run, or are you suggesting writing a vector benchmark code?
> Write a vector benchmark code that tests the vector operations used in your solver.
> We can also run MatMult() to see whether the profiling result is reasonable.
> Only once we get solid results on basic operations is it useful to run big codes.
>
> So we have to write another throw-away code? Why not just look at the vector ops in Mark's actual code?
>
> Matt
>
> --Junchao Zhang
>
> On Mon, Jan 24, 2022 at 12:09 PM Mark Adams <mfad...@lbl.gov> wrote:
>
> On Mon, Jan 24, 2022 at 12:44 PM Barry Smith <bsm...@petsc.dev> wrote:
>
> Here, except for VecNorm, the GPU is used effectively, in that most of the time is spent doing real work on the GPU.
>
> VecNorm 402 1.0 4.4100e-01 6.1 1.69e+09 1.0 0.0e+00 0.0e+00 4.0e+02 0 1 0 0 20 9 1 0 0 33 30230 225393 0 0.00e+00 0 0.00e+00 100
>
> Even the dots are very effective; only the VecNorm flop rate over the full time is much, much lower than the VecDot rate. Is this due to the use of GPU or CPU MPI in the allreduce?
>
> The VecNorm GPU rate is relatively high on Crusher, and the CPU rate is about the same as for the other vec ops. I don't know what to make of that.
>
> But Crusher is clearly not crushing it.
>
> Junchao: Perhaps we should ask Kokkos if they have any experience with Crusher that they can share. They could very well find some low-level magic.
>
>> On Jan 24, 2022, at 12:14 PM, Mark Adams <mfad...@lbl.gov> wrote:
>>
>> Mark, can we compare with Spock?
>>
>> Looks much better. This puts two processes per GPU because there are only 4.
>> <jac_out_001_kokkos_Spock_6_1_notpl.txt>
>
> --
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
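The VecNorm row quoted in the thread is one event line from PETSc's `-log_view` table. The minimal Python sketch below assumes the column order of a GPU-enabled `-log_view` table (Count, Time and Flop max/ratio pairs, message and reduction counts, global and stage percentages, Total and GPU Mflop/s, CPU-to-GPU and GPU-to-CPU transfer counts and sizes, GPU %F) and splits the row into named fields. It then contrasts the two rates Barry compares. Per the `-log_view` legend, Total Mflop/s divides the flops summed over all ranks by the slowest rank's time for the event. GPU Mflop/s uses GPU time only, so a large gap between the two points to time spent outside GPU kernels, for example waiting in the reduction. The field names and the `parse_event` helper are hypothetical, and the column layout may differ between PETSc versions.

```python
# Hypothetical names for the numeric columns of one -log_view event row.
# ASSUMPTION: this order matches the GPU-enabled log that produced the row.
FIELDS = [
    "count_max", "count_ratio",
    "time_max", "time_ratio",
    "flop_max", "flop_ratio",
    "mess", "avg_len", "reduct",
    "glob_T", "glob_F", "glob_M", "glob_L", "glob_R",
    "stage_T", "stage_F", "stage_M", "stage_L", "stage_R",
    "total_mflops", "gpu_mflops",
    "cpu_to_gpu_count", "cpu_to_gpu_size",
    "gpu_to_cpu_count", "gpu_to_cpu_size",
    "gpu_pct_flop",
]


def parse_event(line: str) -> tuple[str, dict[str, float]]:
    """Split one event row into its name and a dict of named numeric fields."""
    name, *values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} numeric columns, got {len(values)}")
    return name, {key: float(val) for key, val in zip(FIELDS, values)}


row = ("VecNorm 402 1.0 4.4100e-01 6.1 1.69e+09 1.0 0.0e+00 0.0e+00 4.0e+02 "
       "0 1 0 0 20 9 1 0 0 33 30230 225393 0 0.00e+00 0 0.00e+00 100")
name, ev = parse_event(row)

# How many times faster the kernels run than the event as a whole.
gap = ev["gpu_mflops"] / ev["total_mflops"]

# Rank count implied by Total Mflop/s = (sum of flops) / (max time),
# assuming every rank did flop_max flops (flop_ratio is 1.0 here).
# Approximate, because flop_max is printed to only three significant digits.
implied_ranks = ev["total_mflops"] * 1e6 * ev["time_max"] / ev["flop_max"]

print(f"{name}: GPU rate is {gap:.1f}x the overall rate; "
      f"max/min time ratio {ev['time_ratio']}; ~{implied_ranks:.1f} ranks")
```

The 6.1 time ratio (maximum over minimum time across ranks) is consistent with some ranks waiting in the reduction, but this parse alone cannot say whether the MPI or the GPU side is responsible.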