Ahh, PGI compiler, that explains it :-)
Ok, thanks. Don't worry about the runs right now. We'll figure out the fix. The code is just *a = (PetscReal)strtod(name,endptr); so it could be a compiler bug.

> On Aug 14, 2019, at 9:23 PM, Mark Adams <mfad...@lbl.gov> wrote:
>
> I am getting this error with single:
>
> 22:21 /gpfs/alpine/geo127/scratch/adams$ jsrun -n 1 -a 1 -c 1 -g 1 ./ex56_single -cells 2,2,2 -ex56_dm_vec_type cuda -ex56_dm_mat_type aijcusparse -fp_trap
> [0] 81 global equations, 27 vertices
> [0]PETSC ERROR: *** unknown floating point error occurred ***
> [0]PETSC ERROR: The specific exception can be determined by running in a debugger. When the
> [0]PETSC ERROR: debugger traps the signal, the exception can be found with fetestexcept(0x3e000000)
> [0]PETSC ERROR: where the result is a bitwise OR of the following flags:
> [0]PETSC ERROR: FE_INVALID=0x20000000 FE_DIVBYZERO=0x4000000 FE_OVERFLOW=0x10000000 FE_UNDERFLOW=0x8000000 FE_INEXACT=0x2000000
> [0]PETSC ERROR: Try option -start_in_debugger
> [0]PETSC ERROR: likely location of problem given in stack below
> [0]PETSC ERROR: --------------------- Stack Frames ------------------------------------
> [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
> [0]PETSC ERROR: INSTEAD the line number of the start of the function
> [0]PETSC ERROR: is given.
> [0]PETSC ERROR: [0] PetscDefaultFPTrap line 355 /autofs/nccs-svm1_home1/adams/petsc/src/sys/error/fp.c
> [0]PETSC ERROR: [0] PetscStrtod line 1964 /autofs/nccs-svm1_home1/adams/petsc/src/sys/objects/options.c
> [0]PETSC ERROR: [0] PetscOptionsStringToReal line 2021 /autofs/nccs-svm1_home1/adams/petsc/src/sys/objects/options.c
> [0]PETSC ERROR: [0] PetscOptionsGetReal line 2321 /autofs/nccs-svm1_home1/adams/petsc/src/sys/objects/options.c
> [0]PETSC ERROR: [0] PetscOptionsReal_Private line 1015 /autofs/nccs-svm1_home1/adams/petsc/src/sys/objects/aoptions.c
> [0]PETSC ERROR: [0] KSPSetFromOptions line 329 /autofs/nccs-svm1_home1/adams/petsc/src/ksp/ksp/interface/itcl.c
> [0]PETSC ERROR: [0] SNESSetFromOptions line 869 /autofs/nccs-svm1_home1/adams/petsc/src/snes/interface/snes.c
> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [0]PETSC ERROR: Floating point exception
> [0]PETSC ERROR: trapped floating point error
> [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> [0]PETSC ERROR: Petsc Development GIT revision: v3.11.3-1685-gd3eb2e1 GIT Date: 2019-08-13 06:33:29 -0400
> [0]PETSC ERROR: ./ex56_single on a arch-summit-dbg-single-pgi-cuda named h36n11 by adams Wed Aug 14 22:21:56 2019
> [0]PETSC ERROR: Configure options --with-cc=mpicc --with-cxx=mpiCC --with-fc=mpif90 COPTFLAGS="-g -Mfcon" CXXOPTFLAGS="-g -Mfcon" FOPTFLAGS="-g -Mfcon" --with-precision=single --with-ssl=0 --with-batch=0 --with-mpiexec="jsrun -g 1" --with-cuda=1 --with-cudac=nvcc CUDAFLAGS="-ccbin pgc++" --download-metis --download-parmetis --download-fblaslapack --with-x=0 --with-64-bit-indices=0 --with-debugging=1 PETSC_ARCH=arch-summit-dbg-single-pgi-cuda
> [0]PETSC ERROR: #1 User provided function() line 0 in Unknown file
> --------------------------------------------------------------------------
>
> On Wed, Aug 14, 2019 at 9:51 PM Smith, Barry F. <bsm...@mcs.anl.gov> wrote:
>
> Oh, doesn't even have to be that large. We just need to be able to look at
> the flop rates (as a surrogate for run times) and compare with the previous
> runs. So long as the size per process is pretty much the same, that is good
> enough.
>
> Barry
>
> > On Aug 14, 2019, at 8:45 PM, Mark Adams <mfad...@lbl.gov> wrote:
> >
> > I can run single, I just can't scale up. But I can use like 1500 processors.
> >
> > On Wed, Aug 14, 2019 at 9:31 PM Smith, Barry F. <bsm...@mcs.anl.gov> wrote:
> >
> > Oh, are all your integers 8 bytes? Even on one node?
> >
> > Once Karl's new middleware is in place we should see about reducing to 4
> > bytes on the GPU.
> >
> > Barry
> >
> > > On Aug 14, 2019, at 7:44 PM, Mark Adams <mfad...@lbl.gov> wrote:
> > >
> > > OK, I'll run single. It's a bit perverse to run with 4-byte floats and 8-byte
> > > integers ... I could use 32-bit ints and just not scale out.
> > >
> > > On Wed, Aug 14, 2019 at 6:48 PM Smith, Barry F. <bsm...@mcs.anl.gov>
> > > wrote:
> > >
> > > Mark,
> > >
> > > Oh, I don't even care if it converges, just put in a fixed number of
> > > iterations. The idea is to just get a baseline of the possible
> > > improvement.
> > >
> > > ECP is literally dropping millions into research on "multi precision"
> > > computations on GPUs; we need to have some actual numbers for the best
> > > potential benefit to determine how much we invest in further
> > > investigating it, or not.
> > >
> > > I am not expressing any opinions on the approach; we are just in the
> > > fact gathering stage.
> > >
> > > Barry
> > >
> > > > On Aug 14, 2019, at 2:27 PM, Mark Adams <mfad...@lbl.gov> wrote:
> > > >
> > > > On Wed, Aug 14, 2019 at 2:35 PM Smith, Barry F. <bsm...@mcs.anl.gov>
> > > > wrote:
> > > >
> > > > Mark,
> > > >
> > > > Would you be able to make one run using single precision? Just
> > > > single everywhere since that is all we support currently?
> > > >
> > > > Experience in engineering, at least, is that single does not work for FE
> > > > elasticity. I tried it many years ago and have heard this from
> > > > others. This problem is pretty simple other than using Q2. I suppose I
> > > > could try it, but just be aware the FE people might say that single
> > > > sucks.
> > > >
> > > > The results will give us motivation (or anti-motivation) to have
> > > > support for running KSP (or PC (or Mat)) in single precision while the
> > > > simulation is double.
> > > >
> > > > Thanks.
> > > >
> > > > Barry
> > > >
> > > > For example, if the GPU speed on KSP is a factor of 3 over double on
> > > > GPUs, this is serious motivation.
> > > >
> > > > > On Aug 14, 2019, at 12:45 PM, Mark Adams <mfad...@lbl.gov> wrote:
> > > > >
> > > > > FYI, here is some scaling data of GAMG on SUMMIT. Getting about 4x
> > > > > GPU speedup with 98K dof/proc (3D Q2 elasticity).
> > > > >
> > > > > This is weak scaling of a solve. There is growth in iteration count
> > > > > folded in here. I should put rtol in the title and/or run a fixed
> > > > > number of iterations and make it clear in the title.
> > > > >
> > > > > Comments welcome.
> > > > > <out_cpu_012288> <out_cpu_001536> <out_cuda_012288> <out_cpu_000024> <out_cpu_000192> <out_cuda_001536> <out_cuda_000192> <out_cuda_000024> <weak_scaling_cpu.png> <weak_scaling_cuda.png>