The most frustrating part is that the issue is not reproducible.

Fande,
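
A quick sanity check on the index in the trace quoted below, as a minimal sketch rather than anything taken from PETSc itself. It assumes the vector has the global size of the original system mentioned later in this thread (1,428,284,880), which the trace does not confirm. The helper names `classify_index` and `bits_as_double` are hypothetical.

```python
import struct

# Numbers copied from the thread. GLOBAL_SIZE is an assumption: the thread
# gives it as the size of the original system, not of this particular vector.
GLOBAL_SIZE = 1_428_284_880
LOCAL_RANGE = (255_426_072, 256_718_616)   # half-open range owned by rank 252
RECEIVED = 4_469_094_877_509_280_860       # index reported in the error


def classify_index(idx, local_range, global_size):
    """Return 'local', 'off-process', or 'invalid' for a global index."""
    lo, hi = local_range
    if lo <= idx < hi:
        return "local"
    if 0 <= idx < global_size:
        return "off-process"
    return "invalid"


def bits_as_double(idx):
    """Reinterpret the 64-bit pattern of an integer index as an IEEE double."""
    return struct.unpack("<d", struct.pack("<q", idx))[0]


if __name__ == "__main__":
    print(classify_index(RECEIVED, LOCAL_RANGE, GLOBAL_SIZE))
    print(hex(RECEIVED), bits_as_double(RECEIVED))
```

Under that assumption the received index is not a valid global index at all, rather than one that merely belongs to another rank. If the reinterpreted bits look like an ordinary floating-point number, that would be consistent with, but not proof of, a message getting mixed up.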

On Mon, Jul 20, 2020 at 12:36 PM Fande Kong <fdkong...@gmail.com> wrote:

> Hi Mark,
>
> Just to be clear, I do not think it is related to GAMG or PtAP. It is a
> communication issue:
>
> I reran the same code and just got:
>
> [252]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [252]PETSC ERROR: Petsc has generated inconsistent data
> [252]PETSC ERROR: Received vector entry 4469094877509280860 out of local
> range [255426072,256718616)]
> [252]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html
> for trouble shooting.
> [252]PETSC ERROR: Petsc Release Version 3.13.3, unknown
> [252]PETSC ERROR: ../../griffin-opt on a arch-moose named r5i4n13 by kongf
> Mon Jul 20 12:16:47 2020
> [252]PETSC ERROR: Configure options --download-hypre=1 --with-debugging=no
> --with-shared-libraries=1 --download-fblaslapack=1 --download-metis=1
> --download-ptscotch=1 --download-parmetis=1 --download-superlu_dist=1
> --download-mumps=1 --download-scalapack=1 --download-slepc=1 --with-mpi=1
> --with-cxx-dialect=C++11 --with-fortran-bindings=0 --with-sowing=0
> --with-64-bit-indices --download-mumps=0
> [252]PETSC ERROR: #1 VecAssemblyEnd_MPI_BTS() line 324 in
> /home/kongf/workhome/sawtooth/moosers/petsc/src/vec/vec/impls/mpi/pbvec.c
> [252]PETSC ERROR: #2 VecAssemblyEnd() line 171 in
> /home/kongf/workhome/sawtooth/moosers/petsc/src/vec/vec/interface/vector.c
> [cli_252]: aborting job:
> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 252
>
>
> Thanks,
>
> Fande,
>
> On Mon, Jul 20, 2020 at 12:24 PM Mark Adams <mfad...@lbl.gov> wrote:
>
>> OK, so this is happening in MatProductNumeric_PtAP. This must be in
>> constructing the coarse grid.
>>
>> GAMG sort of wants to coarsen at a rate of 30:1 but that needs to be
>> verified. With that your index is at about the size of the first coarse
>> grid. I'm trying to figure out if the index is valid. But the size of the
>> max-index is 740521. This is about what I would guess is the size of the
>> second coarse grid.
>>
>> So it kinda looks like it has a "fine" grid index in the "coarse" grid
>> (2nd - 3rd coarse grids).
>>
>> But Chris is not using GAMG.
>>
>> Chris: It sounds like you just have one matrix that you give to MUMPS.
>> You seem to be creating a matrix in the middle of your run. Are you doing
>> dynamic adaptivity?
>>
>> I think we generate unique tags for each operation but it sounds like
>> maybe a message is getting mixed up in some way.
>>
>>
>>
>> On Mon, Jul 20, 2020 at 12:35 PM Fande Kong <fdkong...@gmail.com> wrote:
>>
>>> Hi Mark,
>>>
>>> Thanks for your reply.
>>>
>>> On Mon, Jul 20, 2020 at 7:13 AM Mark Adams <mfad...@lbl.gov> wrote:
>>>
>>>> Fande,
>>>> do you know if your 45226154 was out of range in the real matrix?
>>>>
>>>
>>> I do not know, since it happened while building the AMG hierarchy.  The size of
>>> the original system is 1,428,284,880.
>>>
>>>
>>>> What size integers do you use?
>>>>
>>>
>>> We are using 64-bit via "--with-64-bit-indices"
>>>
>>>
>>> I am trying to catch the cause of this issue by running more simulations
>>> with different configurations.
>>>
>>> Thanks,
>>>
>>> Fande,
>>>
>>>
>>>> Thanks,
>>>> Mark
>>>>
>>>> On Mon, Jul 20, 2020 at 1:17 AM Fande Kong <fdkong...@gmail.com> wrote:
>>>>
>>>>> The trace can look like this:
>>>>>
>>>>> [640]PETSC ERROR: --------------------- Error Message
>>>>> --------------------------------------------------------------
>>>>> [640]PETSC ERROR: Argument out of range
>>>>> [640]PETSC ERROR: key 45226154 is greater than largest key allowed
>>>>> 740521
>>>>> [640]PETSC ERROR: See
>>>>> https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble
>>>>> shooting.
>>>>> [640]PETSC ERROR: Petsc Release Version 3.13.3, unknown
>>>>> [640]PETSC ERROR: ../../griffin-opt on a arch-moose named r6i5n18 by
>>>>> wangy2 Sun Jul 19 17:14:28 2020
>>>>> [640]PETSC ERROR: Configure options --download-hypre=1
>>>>> --with-debugging=no --with-shared-libraries=1 --download-fblaslapack=1
>>>>> --download-metis=1 --download-ptscotch=1 --download-parmetis=1
>>>>> --download-superlu_dist=1 --download-mumps=1 --download-scalapack=1
>>>>> --download-slepc=1 --with-mpi=1 --with-cxx-dialect=C++11
>>>>> --with-fortran-bindings=0 --with-sowing=0 --with-64-bit-indices
>>>>> --download-mumps=0
>>>>> [640]PETSC ERROR: #1 PetscTableFind() line 132 in
>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/include/petscctable.h
>>>>> [640]PETSC ERROR: #2 MatSetUpMultiply_MPIAIJ() line 33 in
>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mmaij.c
>>>>> [640]PETSC ERROR: #3 MatAssemblyEnd_MPIAIJ() line 876 in
>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiaij.c
>>>>> [640]PETSC ERROR: #4 MatAssemblyEnd() line 5347 in
>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c
>>>>> [640]PETSC ERROR: #5 MatPtAPNumeric_MPIAIJ_MPIXAIJ_allatonce() line
>>>>> 901 in
>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiptap.c
>>>>> [640]PETSC ERROR: #6 MatPtAPNumeric_MPIAIJ_MPIMAIJ_allatonce() line
>>>>> 3180 in
>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/maij/maij.c
>>>>> [640]PETSC ERROR: #7 MatProductNumeric_PtAP() line 704 in
>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matproduct.c
>>>>> [640]PETSC ERROR: #8 MatProductNumeric() line 759 in
>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matproduct.c
>>>>> [640]PETSC ERROR: #9 MatPtAP() line 9199 in
>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c
>>>>> [640]PETSC ERROR: #10 MatGalerkin() line 10236 in
>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c
>>>>> [640]PETSC ERROR: #11 PCSetUp_MG() line 745 in
>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/impls/mg/mg.c
>>>>> [640]PETSC ERROR: #12 PCSetUp_HMG() line 220 in
>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/impls/hmg/hmg.c
>>>>> [640]PETSC ERROR: #13 PCSetUp() line 898 in
>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/interface/precon.c
>>>>> [640]PETSC ERROR: #14 KSPSetUp() line 376 in
>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c
>>>>> [640]PETSC ERROR: #15 KSPSolve_Private() line 633 in
>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c
>>>>> [640]PETSC ERROR: #16 KSPSolve() line 853 in
>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c
>>>>> [640]PETSC ERROR: #17 SNESSolve_NEWTONLS() line 225 in
>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/snes/impls/ls/ls.c
>>>>> [640]PETSC ERROR: #18 SNESSolve() line 4519 in
>>>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/snes/interface/snes.c
>>>>>
>>>>> On Sun, Jul 19, 2020 at 11:13 PM Fande Kong <fdkong...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I am not entirely sure what is happening, but we encountered similar
>>>>>> issues recently.  They were not reproducible. They might occur at
>>>>>> different stages, and the errors could be odd ones other than the
>>>>>> "ctable stuff." Our code was Valgrind clean, since every PR in moose
>>>>>> needs to go through rigorous Valgrind checks before it reaches the
>>>>>> devel branch.  The errors happened when we used mvapich.
>>>>>>
>>>>>> We changed to HPE-MPT (a vendor-installed MPI), and then everything ran
>>>>>> smoothly.  Could you try a different MPI? It is better to try one that
>>>>>> ships with the system.
>>>>>>
>>>>>> We have not gotten to the bottom of this problem yet, but we at least
>>>>>> know it is MPI-related in some way.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Fande,
>>>>>>
>>>>>>
>>>>>> On Sun, Jul 19, 2020 at 3:28 PM Chris Hewson <ch...@resfrac.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I am hitting a bug in PETSc with the return string:
>>>>>>>
>>>>>>> [7]PETSC ERROR: PetscTableFind() line 132 in
>>>>>>> /home/chewson/petsc-3.13.2/include/petscctable.h key 7556 is greater
>>>>>>> than largest key allowed 5693
>>>>>>>
>>>>>>> This is with petsc-3.13.2, compiled and run using mpich with -O3,
>>>>>>> debugging turned off, and tuned to the Haswell architecture. The error
>>>>>>> occurs either before or during a KSPBCGS solve/setup or during a MUMPS
>>>>>>> factorization solve (I haven't been able to replicate this issue with
>>>>>>> the same set of instructions etc.).
>>>>>>>
>>>>>>> This is a terrible way to ask a question, I know, and not very
>>>>>>> helpful from your side, but this is what I have from a user's run, and
>>>>>>> I can't reproduce it on my end (either with the optimized build or with
>>>>>>> debugging turned on). It happens somewhat rarely, after the code has
>>>>>>> run for quite some time.
>>>>>>>
>>>>>>> More than likely I am using a static variable (code is written in
>>>>>>> c++) that I'm not updating when the matrix size is changing or something
>>>>>>> silly like that, but any help or guidance on this would be appreciated.
>>>>>>>
>>>>>>> *Chris Hewson*
>>>>>>> Senior Reservoir Simulation Engineer
>>>>>>> ResFrac
>>>>>>> +1.587.575.9792
>>>>>>>
>>>>>>
