> On Jan 23, 2022, at 11:47 PM, Jed Brown <j...@jedbrown.org> wrote:
> 
> Barry Smith via petsc-dev <petsc-dev@mcs.anl.gov> writes:
> 
>>  The PetscLogGpuTimeBegin()/End was written by Hong so it works with events 
>> to get a GPU timing, it is not supposed to include the CPU kernel launch 
>> times or the time to move the scalar arguments to the GPU. It may not be 
>> perfect but it is the best we can do to capture the time the GPU is actively 
>> doing the numerics, which is what we want.
> 
> As we discussed at the time, collecting the results can be asynchronous and 
> this would be useful to reduce the negative impact of profiling on end-to-end 
> performance.
> 
> But I think what's proposed here is okay because PetscLogGpuTimeBegin() 
> starts counting when the device reaches that point, not when it's given on 
> the host.

  This is how it is supposed to work.

  We should make it easy to turn off the logging and synchronizations (from 
PetscLogGpu) for everything at the Vec level and below, or everything at the 
Mat level and below, to remove all the synchronizations needed for the 
low-level timing. I think we can do that by having PetscLogGpu take a PETSc 
class id argument.
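
  A rough sketch of what that gating might look like, in plain C with no PETSc 
dependency. Every name here is hypothetical: the SketchClass enum, its 
Vec < Mat < KSP ordering, and the cutoff variable are assumptions made for 
illustration only. The existing PetscLogGpuTimeBegin()/End() take no arguments, 
and real PETSc class ids are not ordered by level like this. The sync counter 
just stands in for the device synchronization that the End call would do.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for PETSc class ids, ordered low level to high level.
   This ordering is an assumption of the sketch, not how PETSc assigns ids. */
typedef enum { SKETCH_VEC = 0, SKETCH_MAT = 1, SKETCH_KSP = 2 } SketchClass;

/* Classes at or below the cutoff skip GPU timing; -1 means time everything. */
static int sketch_gpu_timing_cutoff = -1;

/* Counts the synchronizations that the timing calls would have triggered. */
static int sketch_sync_count = 0;

static void sketch_set_gpu_timing_cutoff(int level)
{
  assert(level >= -1 && level <= (int)SKETCH_KSP);
  sketch_gpu_timing_cutoff = level;
}

static bool sketch_gpu_timing_enabled(SketchClass c)
{
  return (int)c > sketch_gpu_timing_cutoff;
}

static void sketch_log_gpu_time_begin(SketchClass c)
{
  if (!sketch_gpu_timing_enabled(c)) return;
  /* A real implementation would record a start event on the device here. */
}

static void sketch_log_gpu_time_end(SketchClass c)
{
  if (!sketch_gpu_timing_enabled(c)) return;
  /* A real implementation would record an end event and synchronize here. */
  sketch_sync_count++;
}

/* One "Mat-level" operation built from two "Vec-level" operations.
   Returns how many synchronizations the timing calls triggered. */
static int sketch_run_matmult(void)
{
  sketch_sync_count = 0;
  sketch_log_gpu_time_begin(SKETCH_MAT);
  for (int i = 0; i < 2; i++) {
    sketch_log_gpu_time_begin(SKETCH_VEC);
    /* ... a vector kernel would be launched here ... */
    sketch_log_gpu_time_end(SKETCH_VEC);
  }
  sketch_log_gpu_time_end(SKETCH_MAT);
  return sketch_sync_count;
}
```

  With the cutoff left at -1 the Mat-level operation above pays for three 
synchronizations, with the cutoff at SKETCH_VEC it pays for one, and with the 
cutoff at SKETCH_MAT it pays for none.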
