On Tue, Apr 26, 2022 at 8:00 PM Barry Smith <bsm...@petsc.dev> wrote:
>
> The current nan output has to be replaced to get the column alignment
> correct; I just didn't feel like making that change in the same MR as well.
>
> Something like Unknown, or anything that fits in the column space, would
> be fine. It just means that for the given run the timing numbers are not
> meaningful/correct for those events.
>

Just a note: just about every event is NaN for me. My GAMG setup, which is all CPU, is NaN. High-level functions like PtAP as well.
That said, adding -log_view_gpu_time is fine. Not worth the churn.

> This is to obtain the most meaningful results for the outer events, per Jed,
> since timing the inner events accurately introduces extra time in the outer
> events. That is, it is not possible to have the most accurate times for both
> inner events and outer events in the same run. So if you want to compare
> KSPSolve timings, for example, you run as-is; if you want to examine
> low-level vector operations, also run with -log_view_gpu_time, but know that
> the KSP times are higher than need be.
>
> Sorry for the confusion.
>
> On Apr 26, 2022, at 3:49 PM, Matthew Knepley <knep...@gmail.com> wrote:
>
> On Tue, Apr 26, 2022 at 12:03 PM Mark Adams <mfad...@lbl.gov> wrote:
>
>> Well, NaNs are a clear sign that something is very wrong.
>>
>
> Barry chose them so that they could not be mistaken for an actual number.
>
> Matt
>
>> On Tue, Apr 26, 2022 at 11:52 AM Jacob Faibussowitsch <
>> jacob....@gmail.com> wrote:
>>
>>> There is an automatic warning that shows when you do run with
>>> `-log_view_gpu_time`, but perhaps there should also be an automatic warning
>>> when *not* running with it. It is unfortunate that NaN is the value printed,
>>> as this implies a bug, but AFAIK it is unavoidable (Barry can say more on
>>> this though).
>>>
>>> Best regards,
>>>
>>> Jacob Faibussowitsch
>>> (Jacob Fai - booss - oh - vitch)
>>>
>>> > On Apr 26, 2022, at 09:48, Jose E. Roman <jro...@dsic.upv.es> wrote:
>>> >
>>> > You have to add -log_view_gpu_time
>>> > See https://gitlab.com/petsc/petsc/-/merge_requests/5056
>>> >
>>> > Jose
>>> >
>>> >> On 26 Apr 2022, at 16:39, Mark Adams <mfad...@lbl.gov> wrote:
>>> >>
>>> >> I'm seeing this on Perlmutter with Kokkos-CUDA: NaNs in most log
>>> >> timing data except the two 'Solve' lines.
>>> >> Just cg/jacobi on snes/ex56.
>>> >>
>>> >> Any ideas?
>>> >>
>>> >> VecTDot            2 1.0 nan nan 1.20e+01 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100
>>> >> VecNorm            2 1.0 nan nan 1.00e+01 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100
>>> >> VecCopy            2 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0
>>> >> VecSet             5 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0
>>> >> VecAXPY            4 1.0 nan nan 2.40e+01 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100
>>> >> VecPointwiseMult   1 1.0 nan nan 3.00e+00 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100
>>> >> KSPSetUp           1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 0
>>> >> KSPSolve           1 1.0 4.0514e-04 1.0 5.50e+01 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 2 0 0 0 0 0 -nan 0 0.00e+00 0 0.00e+00 100
>>> >> SNESSolve          1 1.0 2.2128e-02 1.0 5.55e+05 1.0 0.0e+00 0.0e+00 0.0e+00 72 56 0 0 0 100100 0 0 0 25 -nan 0 0.00e+00 0 0.00e+00 0
>>> >
>>>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
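
For anyone post-processing this output: since nan here means "not timed in this run" rather than a failed computation, a script reading the log can treat it as missing data. Below is a minimal sketch, not part of PETSc. The function names `parse_event_line` and `split_timed_untimed` and the `SAMPLE` constant are hypothetical, and the column layout (event name, count, count ratio, max time, ...) is an assumption read off the rows quoted in this thread, truncated here to their leading columns.

```python
import math

# Rows copied from the log quoted in this thread, truncated to the
# leading columns: name, count, count ratio, max time, time ratio, flops, ratio.
SAMPLE = """\
VecTDot 2 1.0 nan nan 1.20e+01 1.0
VecAXPY 4 1.0 nan nan 2.40e+01 1.0
KSPSetUp 1 1.0 nan nan 0.00e+00 0.0
KSPSolve 1 1.0 4.0514e-04 1.0 5.50e+01 1.0
SNESSolve 1 1.0 2.2128e-02 1.0 5.55e+05 1.0
"""


def parse_event_line(line):
    """Return (event name, call count, max time in seconds or None).

    Assumes whitespace-separated columns with the max time in the
    fourth field, as in the rows above. A nan time becomes None.
    """
    fields = line.split()
    name, count = fields[0], int(fields[1])
    seconds = float(fields[3])  # float() parses the literal string "nan"
    return name, count, (None if math.isnan(seconds) else seconds)


def split_timed_untimed(text):
    """Split event rows into {name: seconds} and a list of untimed names."""
    timed, untimed = {}, []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, _count, seconds = parse_event_line(line)
        if seconds is None:
            untimed.append(name)
        else:
            timed[name] = seconds
    return timed, untimed


if __name__ == "__main__":
    timed, untimed = split_timed_untimed(SAMPLE)
    print("timed:", timed)
    print("not timed in this run:", untimed)
```

Keeping the untimed events in a separate list, rather than storing them as 0.0, avoids the problem Matt mentions: the placeholder cannot be mistaken for an actual number.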