Re: [petsc-dev] Kokkos/Crusher performance

Barry Smith Mon, 24 Jan 2022 11:18:49 -0800

  I think you should contact the Crusher ECP technical support team, tell
them you are getting dismal performance, and ask if you should expect better.
Don't waste time flogging a dead horse.


> On Jan 24, 2022, at 2:16 PM, Matthew Knepley <knep...@gmail.com> wrote:
> 
> On Mon, Jan 24, 2022 at 2:11 PM Junchao Zhang <junchao.zh...@gmail.com> wrote:
> 
> 
> On Mon, Jan 24, 2022 at 12:55 PM Mark Adams <mfad...@lbl.gov> wrote:
> 
> 
> On Mon, Jan 24, 2022 at 1:38 PM Junchao Zhang <junchao.zh...@gmail.com> wrote:
> Mark, I think you can benchmark individual vector operations, and once we get 
> reasonable profiling results, we can move to solvers etc.
> 
> Can you suggest a code to run, or are you suggesting I write a vector
> benchmark code?
> Make a vector benchmark code, testing the vector operations that would be
> used in your solver.
> Also, we can run MatMult() to see if the profiling result is reasonable.
> Only once we get solid results on basic operations is it useful to run
> big codes.
> 
> So we have to make another throw-away code? Why not just look at the vector 
> ops in Mark's actual code?
> 
>    Matt
>  
>  
> 
> --Junchao Zhang
> 
> 
> On Mon, Jan 24, 2022 at 12:09 PM Mark Adams <mfad...@lbl.gov> wrote:
> 
> 
> On Mon, Jan 24, 2022 at 12:44 PM Barry Smith <bsm...@petsc.dev> wrote:
> 
>   Here, except for VecNorm, the GPU is used effectively, in that most of the
> time is spent doing real work on the GPU.
> 
> VecNorm              402 1.0 4.4100e-01 6.1 1.69e+09 1.0 0.0e+00 0.0e+00 4.0e+02  0  1  0  0 20   9  1  0  0 33 30230   225393      0 0.00e+00    0 0.00e+00 100
> 
> Even the dots are very effective; only the VecNorm flop rate over the full
> time is much, much lower than VecDot's. Is that somehow due to whether the
> allreduce uses GPU or CPU MPI?
> 
> The VecNorm GPU rate is relatively high on Crusher, and the CPU rate is about
> the same as for the other vec ops. I don't know what to make of that.
> 
> But Crusher is clearly not crushing it. 
> 
> Junchao: Perhaps we should ask Kokkos if they have any experience with 
> Crusher that they can share. They could very well find some low level magic.
> 
> 
> 
> 
> 
>> On Jan 24, 2022, at 12:14 PM, Mark Adams <mfad...@lbl.gov> wrote:
>> 
>> 
>> 
>> Mark, can we compare with Spock?
>> 
>> Looks much better. This puts two processes per GPU because there are only 4 GPUs.
>> <jac_out_001_kokkos_Spock_6_1_notpl.txt>
> 
> 
> 
> -- 
> What most experimenters take for granted before they begin their experiments 
> is infinitely more interesting than any results to which their experiments 
> lead.
> -- Norbert Wiener
> 
> https://www.cse.buffalo.edu/~knepley/
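As a rough reading of the VecNorm line from -log_view quoted in this thread, the two rate columns can be compared directly. The sketch below is a back-of-the-envelope calculation, not PETSc code, and rests on assumptions: that the "Mflop/s" column divides the flops summed over all ranks by the slowest rank's event time, and that "GPU Mflop/s" divides the same flop count by GPU time only. The function names are hypothetical. Under those assumptions, only about 13% of VecNorm's time is GPU kernel work, and the 6.1 max/min time ratio hints that much of the rest is ranks waiting on one another in the reduction.

```python
# Values copied from the VecNorm line of -log_view quoted in this thread.
CALLS = 402
TIME_MAX_S = 4.4100e-01   # max event time over MPI ranks (seconds)
TIME_RATIO = 6.1          # max/min event time over ranks
FLOP_MAX = 1.69e09        # max flops on any one rank
MFLOPS = 30230.0          # "Mflop/s" column
GPU_MFLOPS = 225393.0     # "GPU Mflop/s" column


def gpu_time_fraction(mflops, gpu_mflops):
    """ASSUMPTION: both rate columns divide the same flop total by different
    times (event time vs. GPU time), so their ratio estimates the share of
    the event time spent in GPU kernels."""
    return mflops / gpu_mflops


def implied_ranks(mflops, time_max_s, flop_max):
    """ASSUMPTION: Mflop/s = (flops summed over ranks) / (max time over ranks)
    and flops are balanced (flop ratio 1.0), so this roughly recovers the
    number of ranks; it serves as a sanity check on the assumption."""
    return mflops * 1e6 * time_max_s / flop_max


if __name__ == "__main__":
    share = 100 * gpu_time_fraction(MFLOPS, GPU_MFLOPS)
    print(f"GPU share of VecNorm time: {share:.1f}%")
    print(f"fastest rank: {TIME_MAX_S / TIME_RATIO:.3f} s, slowest: {TIME_MAX_S:.3f} s")
    print(f"slowest-rank time per call: {1e3 * TIME_MAX_S / CALLS:.2f} ms")
    print(f"implied number of ranks: {implied_ranks(MFLOPS, TIME_MAX_S, FLOP_MAX):.1f}")
```

If the implied rank count does not match the actual job layout, the first assumption about how the Mflop/s column is computed is probably wrong, and the GPU share estimate should be discarded too.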
