The most frustrating part is that the issue is not reproducible.

Fande,
On Mon, Jul 20, 2020 at 12:36 PM Fande Kong <fdkong...@gmail.com> wrote:

> Hi Mark,
>
> Just to be clear, I do not think it is related to GAMG or PtAP. It is a
> communication issue.
>
> I reran the same code and just got:
>
> [252]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [252]PETSC ERROR: Petsc has generated inconsistent data
> [252]PETSC ERROR: Received vector entry 4469094877509280860 out of local range [255426072,256718616)]
> [252]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> [252]PETSC ERROR: Petsc Release Version 3.13.3, unknown
> [252]PETSC ERROR: ../../griffin-opt on a arch-moose named r5i4n13 by kongf Mon Jul 20 12:16:47 2020
> [252]PETSC ERROR: Configure options --download-hypre=1 --with-debugging=no
> --with-shared-libraries=1 --download-fblaslapack=1 --download-metis=1
> --download-ptscotch=1 --download-parmetis=1 --download-superlu_dist=1
> --download-mumps=1 --download-scalapack=1 --download-slepc=1 --with-mpi=1
> --with-cxx-dialect=C++11 --with-fortran-bindings=0 --with-sowing=0
> --with-64-bit-indices --download-mumps=0
> [252]PETSC ERROR: #1 VecAssemblyEnd_MPI_BTS() line 324 in /home/kongf/workhome/sawtooth/moosers/petsc/src/vec/vec/impls/mpi/pbvec.c
> [252]PETSC ERROR: #2 VecAssemblyEnd() line 171 in /home/kongf/workhome/sawtooth/moosers/petsc/src/vec/vec/interface/vector.c
> [cli_252]: aborting job:
> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 252
>
> Thanks,
>
> Fande,
>
> On Mon, Jul 20, 2020 at 12:24 PM Mark Adams <mfad...@lbl.gov> wrote:
>
>> OK, so this is happening in MatProductNumeric_PtAP. This must be in
>> constructing the coarse grid.
>>
>> GAMG sort of wants to coarsen at a rate of 30:1, but that needs to be
>> verified. With that, your index is about the size of the first coarse
>> grid. I'm trying to figure out if the index is valid. But the max-index
>> is 740521. This is about what I would guess is the size of the second
>> coarse grid.
>>
>> So it kinda looks like it has a "fine" grid index in the "coarse" grid
>> (2nd - 3rd coarse grids).
>>
>> But Chris is not using GAMG.
>>
>> Chris: It sounds like you just have one matrix that you give to MUMPS.
>> You seem to be creating a matrix in the middle of your run. Are you doing
>> dynamic adaptivity?
>>
>> I think we generate unique tags for each operation, but it sounds like
>> maybe a message is getting mixed up in some way.
>>
>> On Mon, Jul 20, 2020 at 12:35 PM Fande Kong <fdkong...@gmail.com> wrote:
>>
>>> Hi Mark,
>>>
>>> Thanks for your reply.
>>>
>>> On Mon, Jul 20, 2020 at 7:13 AM Mark Adams <mfad...@lbl.gov> wrote:
>>>
>>>> Fande,
>>>> do you know if your 45226154 was out of range in the real matrix?
>>>
>>> I do not know, since it happened while building the AMG hierarchy. The
>>> size of the original system is 1,428,284,880.
>>>
>>>> What size integers do you use?
>>>
>>> We are using 64-bit via "--with-64-bit-indices".
>>>
>>> I am trying to catch the cause of this issue by running more simulations
>>> with different configurations.
>>>
>>> Thanks,
>>>
>>> Fande,
>>>
>>>> Thanks,
>>>> Mark
>>>>
>>>> On Mon, Jul 20, 2020 at 1:17 AM Fande Kong <fdkong...@gmail.com> wrote:
>>>>
>>>>> Trace could look like this:
>>>>>
>>>>> [640]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>>>>> [640]PETSC ERROR: Argument out of range
>>>>> [640]PETSC ERROR: key 45226154 is greater than largest key allowed 740521
>>>>> [640]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>>>>> [640]PETSC ERROR: Petsc Release Version 3.13.3, unknown
>>>>> [640]PETSC ERROR: ../../griffin-opt on a arch-moose named r6i5n18 by wangy2 Sun Jul 19 17:14:28 2020
>>>>> [640]PETSC ERROR: Configure options --download-hypre=1
>>>>> --with-debugging=no --with-shared-libraries=1 --download-fblaslapack=1
>>>>> --download-metis=1 --download-ptscotch=1 --download-parmetis=1
>>>>> --download-superlu_dist=1 --download-mumps=1 --download-scalapack=1
>>>>> --download-slepc=1 --with-mpi=1 --with-cxx-dialect=C++11
>>>>> --with-fortran-bindings=0 --with-sowing=0 --with-64-bit-indices
>>>>> --download-mumps=0
>>>>> [640]PETSC ERROR: #1 PetscTableFind() line 132 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/include/petscctable.h
>>>>> [640]PETSC ERROR: #2 MatSetUpMultiply_MPIAIJ() line 33 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mmaij.c
>>>>> [640]PETSC ERROR: #3 MatAssemblyEnd_MPIAIJ() line 876 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiaij.c
>>>>> [640]PETSC ERROR: #4 MatAssemblyEnd() line 5347 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c
>>>>> [640]PETSC ERROR: #5 MatPtAPNumeric_MPIAIJ_MPIXAIJ_allatonce() line 901 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiptap.c
>>>>> [640]PETSC ERROR: #6 MatPtAPNumeric_MPIAIJ_MPIMAIJ_allatonce() line 3180 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/maij/maij.c
>>>>> [640]PETSC ERROR: #7 MatProductNumeric_PtAP() line 704 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matproduct.c
>>>>> [640]PETSC ERROR: #8 MatProductNumeric() line 759 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matproduct.c
>>>>> [640]PETSC ERROR: #9 MatPtAP() line 9199 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c
>>>>> [640]PETSC ERROR: #10 MatGalerkin() line 10236 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c
>>>>> [640]PETSC ERROR: #11 PCSetUp_MG() line 745 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/impls/mg/mg.c
>>>>> [640]PETSC ERROR: #12 PCSetUp_HMG() line 220 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/impls/hmg/hmg.c
>>>>> [640]PETSC ERROR: #13 PCSetUp() line 898 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/interface/precon.c
>>>>> [640]PETSC ERROR: #14 KSPSetUp() line 376 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c
>>>>> [640]PETSC ERROR: #15 KSPSolve_Private() line 633 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c
>>>>> [640]PETSC ERROR: #16 KSPSolve() line 853 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c
>>>>> [640]PETSC ERROR: #17 SNESSolve_NEWTONLS() line 225 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/snes/impls/ls/ls.c
>>>>> [640]PETSC ERROR: #18 SNESSolve() line 4519 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/snes/interface/snes.c
>>>>>
>>>>> On Sun, Jul 19, 2020 at 11:13 PM Fande Kong <fdkong...@gmail.com> wrote:
>>>>>
>>>>>> I am not entirely sure what is happening, but we encountered similar
>>>>>> issues recently. They were not reproducible. They might occur at
>>>>>> different stages, and the errors could be odd ones other than the
>>>>>> "ctable stuff." Our code is Valgrind clean, since every PR in moose
>>>>>> has to go through rigorous Valgrind checks before it reaches the
>>>>>> devel branch. The errors happened when we used mvapich.
>>>>>>
>>>>>> We switched to HPE-MPT (a vendor-installed MPI), and then everything
>>>>>> ran smoothly. Could you try a different MPI? It is better to try one
>>>>>> provided with the system.
>>>>>>
>>>>>> We have not gotten to the bottom of this problem yet, but we at least
>>>>>> know it is MPI-related in some way.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Fande,
>>>>>>
>>>>>> On Sun, Jul 19, 2020 at 3:28 PM Chris Hewson <ch...@resfrac.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I am hitting a bug in PETSc with the following error string:
>>>>>>>
>>>>>>> [7]PETSC ERROR: PetscTableFind() line 132 in /home/chewson/petsc-3.13.2/include/petscctable.h key 7556 is greater than largest key allowed 5693
>>>>>>>
>>>>>>> This is with petsc-3.13.2, compiled and run with mpich, using -O3
>>>>>>> with debugging turned off and tuned to the haswell architecture. The
>>>>>>> error occurs either before or during a KSPBCGS solve/setup, or during
>>>>>>> a MUMPS factorization solve (I haven't been able to replicate this
>>>>>>> issue with the same set of instructions etc.).
>>>>>>>
>>>>>>> This is a terrible way to ask a question, I know, and not very
>>>>>>> helpful from your side, but this is what I have from a user's run,
>>>>>>> and I can't reproduce it on my end (either with the optimized build
>>>>>>> or with debugging turned on). It happens after the code has run for
>>>>>>> quite some time, and only rarely.
>>>>>>>
>>>>>>> More than likely I am using a static variable (the code is written in
>>>>>>> C++) that I'm not updating when the matrix size changes, or something
>>>>>>> silly like that, but any help or guidance on this would be
>>>>>>> appreciated.
>>>>>>>
>>>>>>> *Chris Hewson*
>>>>>>> Senior Reservoir Simulation Engineer
>>>>>>> ResFrac
>>>>>>> +1.587.575.9792
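
Mark's reading of the two numbers in the PetscTableFind() error can be checked with a little arithmetic. The sketch below is only a back-of-the-envelope estimate: it assumes every level coarsens at exactly 30:1 (Mark notes that this rate still needs to be verified), it uses only numbers quoted in the thread, and the helper names `level_sizes` and `closest_level` are made up for illustration and are not PETSc API.

```python
# Rough check of the coarsening estimate discussed in the thread.
# Assumption: every level coarsens at exactly 30:1, which is unverified.

FINE_SIZE = 1_428_284_880             # size of the original system (from the thread)
COARSEN_RATIO = 30                    # assumed coarsening rate per level
BAD_KEY = 45_226_154                  # key rejected by PetscTableFind()
MAX_KEY = 740_521                     # largest key allowed in that table
BAD_VEC_ENTRY = 4469094877509280860   # entry rejected in VecAssemblyEnd()


def level_sizes(fine_size, ratio, levels):
    """Estimated global sizes of the fine grid followed by `levels` coarse grids."""
    sizes = [fine_size]
    for _ in range(levels):
        sizes.append(sizes[-1] // ratio)
    return sizes


def closest_level(value, sizes):
    """Index of the level whose estimated size is nearest to `value` as a ratio."""
    def ratio_to(i):
        big, small = max(value, sizes[i]), max(1, min(value, sizes[i]))
        return big / small
    return min(range(len(sizes)), key=ratio_to)


sizes = level_sizes(FINE_SIZE, COARSEN_RATIO, 3)
for lvl, n in enumerate(sizes):
    print(f"level {lvl}: ~{n:,} rows")

print("rejected key is closest to level", closest_level(BAD_KEY, sizes))
print("max allowed key is closest to level", closest_level(MAX_KEY, sizes))
# The rejected vector entry is far larger than the original system size.
print("vector entry exceeds fine size:", BAD_VEC_ENTRY > FINE_SIZE)
```

Under that assumption the rejected key sits near the estimated size of the first coarse grid and the table limit near the second, which matches Mark's description of a finer-level index showing up on a coarser level. It says nothing about how the index got there.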