> On Jan 24, 2022, at 2:46 PM, Mark Adams <mfad...@lbl.gov> wrote:
> 
> Yea, CG/Jacobi is as close to a benchmark code as we could want. I could run 
> this on one processor to get cleaner numbers.
> 
> Is there a designated ECP technical support contact?

   Mark, you've forgotten you work for DOE. There isn't a non-ECP technical 
support contact. 

   But if this is an AMD machine then maybe contact Matt's student Justin Chang?



> 
> 
> On Mon, Jan 24, 2022 at 2:18 PM Barry Smith <bsm...@petsc.dev> wrote:
> 
>   I think you should contact the Crusher ECP technical support team, tell 
> them you are getting dismal performance, and ask if you should expect better. 
> Don't waste time flogging a dead horse. 
> 
>> On Jan 24, 2022, at 2:16 PM, Matthew Knepley <knep...@gmail.com> wrote:
>> 
>> On Mon, Jan 24, 2022 at 2:11 PM Junchao Zhang <junchao.zh...@gmail.com> wrote:
>> 
>> 
>> On Mon, Jan 24, 2022 at 12:55 PM Mark Adams <mfad...@lbl.gov> wrote:
>> 
>> 
>> On Mon, Jan 24, 2022 at 1:38 PM Junchao Zhang <junchao.zh...@gmail.com> wrote:
>> Mark, I think you can benchmark individual vector operations, and once we 
>> get reasonable profiling results, we can move to solvers etc.
>> 
>> Can you suggest a code to run or are you suggesting making a vector 
>> benchmark code?
>> Make a vector benchmark code, testing vector operations that would be used 
>> in your solver.
>> Also, we can run MatMult() to see if the profiling result is reasonable.
>> Only once we get some solid results on basic operations is it useful to run 
>> big codes.
>> 
>> So we have to make another throw-away code? Why not just look at the vector 
>> ops in Mark's actual code?
>> 
>>    Matt
>>  
>>  
>> 
>> --Junchao Zhang
>> 
>> 
>> On Mon, Jan 24, 2022 at 12:09 PM Mark Adams <mfad...@lbl.gov> wrote:
>> 
>> 
>> On Mon, Jan 24, 2022 at 12:44 PM Barry Smith <bsm...@petsc.dev> wrote:
>> 
>>   Here, except for VecNorm, the GPU is used effectively in that most of the 
>> time is spent doing real work on the GPU.
>> 
>> VecNorm              402 1.0 4.4100e-01 6.1 1.69e+09 1.0 0.0e+00 0.0e+00 4.0e+02  0  1  0  0 20   9  1  0  0 33 30230   225393      0 0.00e+00    0 0.00e+00 100
>> 
>> Even the dots are very effective; only the VecNorm flop rate over the full 
>> time is much, much lower than the VecDot rate. Is that somehow due to the 
>> use of GPU or CPU MPI in the allreduce?
>> 
>> The VecNorm GPU rate is relatively high on Crusher and the CPU rate is about 
>> the same as the other vec ops. I don't know what to make of that.
>> 
>> But Crusher is clearly not crushing it. 
>> 
>> Junchao: Perhaps we should ask Kokkos if they have any experience with 
>> Crusher that they can share. They could very well find some low-level magic.
>> 
>> 
>> 
>> 
>> 
>>> On Jan 24, 2022, at 12:14 PM, Mark Adams <mfad...@lbl.gov> wrote:
>>> 
>>> 
>>> 
>>> Mark, can we compare with Spock?
>>> 
>>>  Looks much better. This puts two processes/GPU because there are only 4.
>>> <jac_out_001_kokkos_Spock_6_1_notpl.txt>
>> 
>> 
>> 
>> -- 
>> What most experimenters take for granted before they begin their experiments 
>> is infinitely more interesting than any results to which their experiments 
>> lead.
>> -- Norbert Wiener
>> 
>> https://www.cse.buffalo.edu/~knepley/
> 
