Re: [petsc-dev] Kokkos/Crusher performance

2022-01-26 Thread Mark Adams
Valgrind was not useful: just an MPI abort message mixed in with the output from 128 processes.
Can we merge my MR so that I can test your branch?

On Wed, Jan 26, 2022 at 2:51 PM Barry Smith  wrote:

>
>   I have added a mini-MR to print out the key so we can see if it is 0 or
> some crazy number. https://gitlab.com/petsc/petsc/-/merge_requests/4766
>
>   Note that the table data structure is not sent through MPI so if MPI is
> the culprit it is not just that MPI is putting incorrect (or no)
> information in the receive buffer; it is that MPI is seemingly messing up
> other data.
>
> On Jan 26, 2022, at 2:25 PM, Mark Adams  wrote:
>
> I have used valgrind here. I did not run it on this MPI error. I will.
>
> On Wed, Jan 26, 2022 at 10:56 AM Barry Smith  wrote:
>
>>
>>   Any way to run with valgrind (or a HIP variant of valgrind)? It looks
>> like a memory corruption issue and tracking down exactly when the
>> corruption begins is 3/4's of the way to finding the exact cause.
>>
>>   Are the crashes reproducible in the same place with identical runs?
>>
>>
>> On Jan 26, 2022, at 10:46 AM, Mark Adams  wrote:
>>
>> I think it is an MPI bug. It works with GPU aware MPI turned off.
>> I am sure Summit will be fine.
>> We have had users fix this error by switching their MPI.
>>
>> On Wed, Jan 26, 2022 at 10:10 AM Junchao Zhang 
>> wrote:
>>
>>> I don't know if this is due to bugs in petsc/kokkos backend.   See if
>>> you can run 6 nodes (48 mpi ranks).  If it fails, then run the same problem
>>> on Summit with 8 nodes to see if it still fails. If yes, it is likely a bug
>>> of our own.
>>>
>>> --Junchao Zhang
>>>
>>>
>>> On Wed, Jan 26, 2022 at 8:44 AM Mark Adams  wrote:
>>>
 I am not able to reproduce this with a small problem. 2 nodes or less
 refinement works. This is from the 8 node test, the -dm_refine 5 version.
 I see that it comes from PtAP.
 This is on the fine grid. (I was thinking it could be on a reduced grid
 with idle processors, but no)

 [15]PETSC ERROR: Argument out of range
 [15]PETSC ERROR: Key <= 0
 [15]PETSC ERROR: See https://petsc.org/release/faq/ for trouble
 shooting.
 [15]PETSC ERROR: Petsc Development GIT revision:
 v3.16.3-696-g46640c56cb  GIT Date: 2022-01-25 09:20:51 -0500
 [15]PETSC ERROR:
 /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tests/data/../ex13 on a
 arch-olcf-crusher named crusher020 by adams Wed Jan 26 08:35:47 2022
 [15]PETSC ERROR: Configure options --with-cc=cc --with-cxx=CC
 --with-fc=ftn --with-fortran-bindings=0
 LIBS="-L/opt/cray/pe/mpich/8.1.12/gtl/lib -lmpi_gtl_hsa" --with-debugging=0
 --COPTFLAGS="-g -O" --CXXOPTFLAGS="-g -O" --FOPTFLAGS=-g
 --with-mpiexec="srun -p batch -N 1 -A csc314_crusher -t 00:10:00"
 --with-hip --with-hipc=hipcc --download-hypre --with-hip-arch=gfx90a
 --download-kokkos --download-kokkos-kernels --with-kokkos-kernels-tpl=0
 --download-p4est=1
 --with-zlib-dir=/sw/crusher/spack-envs/base/opt/cray-sles15-zen3/cce-13.0.0/zlib-1.2.11-qx5p4iereg4sjvfi5uwk6jn56o6se2q4
 PETSC_ARCH=arch-olcf-crusher
 [15]PETSC ERROR: #1 PetscTableFind() at
 /gpfs/alpine/csc314/scratch/adams/petsc/include/petscctable.h:131
 [15]PETSC ERROR: #2 MatSetUpMultiply_MPIAIJ() at
 /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/mmaij.c:35
 [15]PETSC ERROR: #3 MatAssemblyEnd_MPIAIJ() at
 /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/mpiaij.c:735
 [15]PETSC ERROR: #4 MatAssemblyEnd_MPIAIJKokkos() at
 /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:14
 [15]PETSC ERROR: #5 MatAssemblyEnd() at
 /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/interface/matrix.c:5678
 [15]PETSC ERROR: #6 MatSetMPIAIJKokkosWithSplitSeqAIJKokkosMatrices()
 at
 /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:267
 [15]PETSC ERROR: #7 MatSetMPIAIJKokkosWithGlobalCSRMatrix() at
 /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:825
 [15]PETSC ERROR: #8 MatProductSymbolic_MPIAIJKokkos() at
 /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:1167
 [15]PETSC ERROR: #9 MatProductSymbolic() at
 /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/interface/matproduct.c:825
 [15]PETSC ERROR: #10 MatPtAP() at
 /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/interface/matrix.c:9656
 [15]PETSC ERROR: #11 PCGAMGCreateLevel_GAMG() at
 /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/pc/impls/gamg/gamg.c:87
 [15]PETSC ERROR: #12 PCSetUp_GAMG() at
 /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/pc/impls/gamg/gamg.c:663
 [15]PETSC ERROR: #13 PCSetUp() at
 /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/pc/interface/precon.c:1017
 [15]PETSC ERROR: #14 KSPSetUp() at
 /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/ksp/interface/itfunc.c:41

Re: [petsc-dev] Kokkos/Crusher performance

2022-01-26 Thread Barry Smith

  I have added a mini-MR to print out the key so we can see if it is 0 or some 
crazy number. https://gitlab.com/petsc/petsc/-/merge_requests/4766

  Note that the table data structure is not sent through MPI so if MPI is the 
culprit it is not just that MPI is putting incorrect (or no) information in the 
receive buffer; it is that MPI is seemingly messing up other data.
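
For anyone who wants to try that MR before it is merged, GitLab exposes merge-request heads as fetchable refs; a minimal sketch, assuming a standard petsc clone with origin pointing at gitlab.com/petsc/petsc and the arch name from the trace below (the local branch name is arbitrary):

```sh
# Fetch MR !4766 into a local branch and rebuild.
git fetch origin refs/merge-requests/4766/head:mr4766-print-key
git checkout mr4766-print-key
# If the MR only touches headers, a plain rebuild should suffice (no reconfigure).
make PETSC_DIR=$PWD PETSC_ARCH=arch-olcf-crusher all
```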

> On Jan 26, 2022, at 2:25 PM, Mark Adams  wrote:
> 
> I have used valgrind here. I did not run it on this MPI error. I will.
> 
> On Wed, Jan 26, 2022 at 10:56 AM Barry Smith wrote:
> 
>   Any way to run with valgrind (or a HIP variant of valgrind)? It looks like 
> a memory corruption issue and tracking down exactly when the corruption 
> begins is 3/4's of the way to finding the exact cause.
> 
>   Are the crashes reproducible in the same place with identical runs?
> 
> 
>> On Jan 26, 2022, at 10:46 AM, Mark Adams wrote:
>> 
>> I think it is an MPI bug. It works with GPU aware MPI turned off. 
>> I am sure Summit will be fine.
>> We have had users fix this error by switching their MPI.
>> 
>> On Wed, Jan 26, 2022 at 10:10 AM Junchao Zhang wrote:
>> I don't know if this is due to bugs in petsc/kokkos backend.   See if you 
>> can run 6 nodes (48 mpi ranks).  If it fails, then run the same problem on 
>> Summit with 8 nodes to see if it still fails. If yes, it is likely a bug of 
>> our own.
>> 
>> --Junchao Zhang
>> 
>> 
>> On Wed, Jan 26, 2022 at 8:44 AM Mark Adams wrote:
>> I am not able to reproduce this with a small problem. 2 nodes or less 
>> refinement works. This is from the 8 node test, the -dm_refine 5 version.
>> I see that it comes from PtAP.
>> This is on the fine grid. (I was thinking it could be on a reduced grid with 
>> idle processors, but no)
>> 
>> [15]PETSC ERROR: Argument out of range
>> [15]PETSC ERROR: Key <= 0
>> [15]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
>> [15]PETSC ERROR: Petsc Development GIT revision: v3.16.3-696-g46640c56cb  
>> GIT Date: 2022-01-25 09:20:51 -0500
>> [15]PETSC ERROR: 
>> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tests/data/../ex13 on a 
>> arch-olcf-crusher named crusher020 by adams Wed Jan 26 08:35:47 2022
>> [15]PETSC ERROR: Configure options --with-cc=cc --with-cxx=CC --with-fc=ftn 
>> --with-fortran-bindings=0 LIBS="-L/opt/cray/pe/mpich/8.1.12/gtl/lib 
>> -lmpi_gtl_hsa" --with-debugging=0 --COPTFLAGS="-g -O" --CXXOPTFLAGS="-g -O" 
>> --FOPTFLAGS=-g --with-mpiexec="srun -p batch -N 1 -A csc314_crusher -t 
>> 00:10:00" --with-hip --with-hipc=hipcc --download-hypre 
>> --with-hip-arch=gfx90a --download-kokkos --download-kokkos-kernels 
>> --with-kokkos-kernels-tpl=0 --download-p4est=1 
>> --with-zlib-dir=/sw/crusher/spack-envs/base/opt/cray-sles15-zen3/cce-13.0.0/zlib-1.2.11-qx5p4iereg4sjvfi5uwk6jn56o6se2q4
>>  PETSC_ARCH=arch-olcf-crusher
>> [15]PETSC ERROR: #1 PetscTableFind() at 
>> /gpfs/alpine/csc314/scratch/adams/petsc/include/petscctable.h:131
>> [15]PETSC ERROR: #2 MatSetUpMultiply_MPIAIJ() at 
>> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/mmaij.c:35
>> [15]PETSC ERROR: #3 MatAssemblyEnd_MPIAIJ() at 
>> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/mpiaij.c:735
>> [15]PETSC ERROR: #4 MatAssemblyEnd_MPIAIJKokkos() at 
>> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:14
>> [15]PETSC ERROR: #5 MatAssemblyEnd() at 
>> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/interface/matrix.c:5678
>> [15]PETSC ERROR: #6 MatSetMPIAIJKokkosWithSplitSeqAIJKokkosMatrices() at 
>> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:267
>> [15]PETSC ERROR: #7 MatSetMPIAIJKokkosWithGlobalCSRMatrix() at 
>> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:825
>> [15]PETSC ERROR: #8 MatProductSymbolic_MPIAIJKokkos() at 
>> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:1167
>> [15]PETSC ERROR: #9 MatProductSymbolic() at 
>> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/interface/matproduct.c:825
>> [15]PETSC ERROR: #10 MatPtAP() at 
>> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/interface/matrix.c:9656
>> [15]PETSC ERROR: #11 PCGAMGCreateLevel_GAMG() at 
>> /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/pc/impls/gamg/gamg.c:87
>> [15]PETSC ERROR: #12 PCSetUp_GAMG() at 
>> /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/pc/impls/gamg/gamg.c:663
>> [15]PETSC ERROR: #13 PCSetUp() at 
>> /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/pc/interface/precon.c:1017
>> [15]PETSC ERROR: #14 KSPSetUp() at 
>> /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/ksp/interface/itfunc.c:417
>> [15]PETSC ERROR: #15 KSPSolve_Private() at 
>> /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/ksp/interface/itfunc.c:863
>> [15]PET

Re: [petsc-dev] Kokkos/Crusher performance

2022-01-26 Thread Mark Adams
On Wed, Jan 26, 2022 at 2:32 PM Justin Chang  wrote:

> rocgdb requires "-ggdb" in addition to "-g"
>

Ah, OK.


>
> What happens if you lower AMD_LOG_LEVEL to something like 1 or 2? I was
> hoping AMD_LOG_LEVEL could at least give you something like a "stacktrace"
> showing what the last successful HIP/HSA call was. I believe it should also
> show line numbers in the code.
>

I get a stack trace. The failure happens in our code: we cannot find an
index that we received. The error message does not include the bad index; it
used to.
We have seen this before with buggy MPIs.


>
> On Wed, Jan 26, 2022 at 1:29 PM Mark Adams  wrote:
>
>>
>>
>> On Wed, Jan 26, 2022 at 1:54 PM Justin Chang  wrote:
>>
>>> Couple suggestions:
>>>
>>> 1. Set the environment variable "export AMD_LOG_LEVEL=3" <- this will
>>> tell you everything that's happening at the HIP level (memcpy's, mallocs,
>>> kernel execution time, etc)
>>>
>>
>> Hmm, my reproducer uses 2 nodes and 128 processes. I don't think I could
>> do much with this flood of data.
>>
>>
>>> 2. Try rocgdb, AFAIK this is the closest "HIP variant of valgrind" that
>>> we officially support.
>>>
>>
>> rocgdb just sat there reading symbols forever. I will look at your doc.
>> Valgrind seems OK here.
>>
>>
>>> There are some tricks on running this together with mpi, to which you
>>> can just google "mpi with gdb". But you can see how rocgdb works here:
>>> https://www.olcf.ornl.gov/wp-content/uploads/2021/04/rocgdb_hipmath_ornl_2021_v2.pdf
>>>
>>>
>>> On Wed, Jan 26, 2022 at 9:56 AM Barry Smith  wrote:
>>>

   Any way to run with valgrind (or a HIP variant of valgrind)? It looks
 like a memory corruption issue and tracking down exactly when the
 corruption begins is 3/4's of the way to finding the exact cause.

   Are the crashes reproducible in the same place with identical runs?


 On Jan 26, 2022, at 10:46 AM, Mark Adams  wrote:

 I think it is an MPI bug. It works with GPU aware MPI turned off.
 I am sure Summit will be fine.
 We have had users fix this error by switching their MPI.

 On Wed, Jan 26, 2022 at 10:10 AM Junchao Zhang 
 wrote:

> I don't know if this is due to bugs in petsc/kokkos backend.   See if
> you can run 6 nodes (48 mpi ranks).  If it fails, then run the same 
> problem
> on Summit with 8 nodes to see if it still fails. If yes, it is likely a 
> bug
> of our own.
>
> --Junchao Zhang
>
>
> On Wed, Jan 26, 2022 at 8:44 AM Mark Adams  wrote:
>
>> I am not able to reproduce this with a small problem. 2 nodes or less
>> refinement works. This is from the 8 node test, the -dm_refine 5 version.
>> I see that it comes from PtAP.
>> This is on the fine grid. (I was thinking it could be on a reduced
>> grid with idle processors, but no)
>>
>> [15]PETSC ERROR: Argument out of range
>> [15]PETSC ERROR: Key <= 0
>> [15]PETSC ERROR: See https://petsc.org/release/faq/ for trouble
>> shooting.
>> [15]PETSC ERROR: Petsc Development GIT revision:
>> v3.16.3-696-g46640c56cb  GIT Date: 2022-01-25 09:20:51 -0500
>> [15]PETSC ERROR:
>> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tests/data/../ex13 on a
>> arch-olcf-crusher named crusher020 by adams Wed Jan 26 08:35:47 2022
>> [15]PETSC ERROR: Configure options --with-cc=cc --with-cxx=CC
>> --with-fc=ftn --with-fortran-bindings=0
>> LIBS="-L/opt/cray/pe/mpich/8.1.12/gtl/lib -lmpi_gtl_hsa" 
>> --with-debugging=0
>> --COPTFLAGS="-g -O" --CXXOPTFLAGS="-g -O" --FOPTFLAGS=-g
>> --with-mpiexec="srun -p batch -N 1 -A csc314_crusher -t 00:10:00"
>> --with-hip --with-hipc=hipcc --download-hypre --with-hip-arch=gfx90a
>> --download-kokkos --download-kokkos-kernels --with-kokkos-kernels-tpl=0
>> --download-p4est=1
>> --with-zlib-dir=/sw/crusher/spack-envs/base/opt/cray-sles15-zen3/cce-13.0.0/zlib-1.2.11-qx5p4iereg4sjvfi5uwk6jn56o6se2q4
>> PETSC_ARCH=arch-olcf-crusher
>> [15]PETSC ERROR: #1 PetscTableFind() at
>> /gpfs/alpine/csc314/scratch/adams/petsc/include/petscctable.h:131
>> [15]PETSC ERROR: #2 MatSetUpMultiply_MPIAIJ() at
>> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/mmaij.c:35
>> [15]PETSC ERROR: #3 MatAssemblyEnd_MPIAIJ() at
>> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/mpiaij.c:735
>> [15]PETSC ERROR: #4 MatAssemblyEnd_MPIAIJKokkos() at
>> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:14
>> [15]PETSC ERROR: #5 MatAssemblyEnd() at
>> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/interface/matrix.c:5678
>> [15]PETSC ERROR: #6 MatSetMPIAIJKokkosWithSplitSeqAIJKokkosMatrices()
>> at
>> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:267
>> [15]PETSC ERROR: #7 MatSetMPIAIJKokkosWithGlobalCSRMatrix() at
>> /gpfs/alpine/csc314/scratch/a

Re: [petsc-dev] Kokkos/Crusher performance

2022-01-26 Thread Mark Adams
>
>
>   Are the crashes reproducible in the same place with identical runs?
>
>
I have not seen my reproducer succeed; it fails in MatAssemblyEnd when a
table entry is not found. I can't tell if it is the same error every time.


Re: [petsc-dev] Kokkos/Crusher performance

2022-01-26 Thread Justin Chang
rocgdb requires "-ggdb" in addition to "-g"

What happens if you lower AMD_LOG_LEVEL to something like 1 or 2? I was
hoping AMD_LOG_LEVEL could at least give you something like a "stacktrace"
showing what the last successful HIP/HSA call was. I believe it should also
show line numbers in the code.
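
Concretely, rebuilding with "-ggdb" just means adding it to the optimization-flag options already shown in the configure line in the trace; a minimal sketch, keeping all other options unchanged (the AMD_LOG_LEVEL mapping noted in the comment is an assumption):

```sh
# Sketch: same configure as in the trace, with -ggdb added so rocgdb gets full debug info.
./configure --with-cc=cc --with-cxx=CC --with-fc=ftn \
  --COPTFLAGS="-g -ggdb -O" --CXXOPTFLAGS="-g -ggdb -O" --FOPTFLAGS="-g" \
  --with-hip --with-hipc=hipcc   # ...plus the rest of the options from the trace

# Less verbose HIP runtime logging than level 3:
export AMD_LOG_LEVEL=1   # assumed mapping: 1 = errors only, 2 = warnings and below
```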

On Wed, Jan 26, 2022 at 1:29 PM Mark Adams  wrote:

>
>
> On Wed, Jan 26, 2022 at 1:54 PM Justin Chang  wrote:
>
>> Couple suggestions:
>>
>> 1. Set the environment variable "export AMD_LOG_LEVEL=3" <- this will
>> tell you everything that's happening at the HIP level (memcpy's, mallocs,
>> kernel execution time, etc)
>>
>
> Hmm, my reproducer uses 2 nodes and 128 processes. I don't think I could do
> much with this flood of data.
>
>
>> 2. Try rocgdb, AFAIK this is the closest "HIP variant of valgrind" that
>> we officially support.
>>
>
> rocgdb just sat there reading symbols forever. I will look at your doc.
> Valgrind seems OK here.
>
>
>> There are some tricks on running this together with mpi, to which you can
>> just google "mpi with gdb". But you can see how rocgdb works here:
>> https://www.olcf.ornl.gov/wp-content/uploads/2021/04/rocgdb_hipmath_ornl_2021_v2.pdf
>>
>>
>> On Wed, Jan 26, 2022 at 9:56 AM Barry Smith  wrote:
>>
>>>
>>>   Any way to run with valgrind (or a HIP variant of valgrind)? It looks
>>> like a memory corruption issue and tracking down exactly when the
>>> corruption begins is 3/4's of the way to finding the exact cause.
>>>
>>>   Are the crashes reproducible in the same place with identical runs?
>>>
>>>
>>> On Jan 26, 2022, at 10:46 AM, Mark Adams  wrote:
>>>
>>> I think it is an MPI bug. It works with GPU aware MPI turned off.
>>> I am sure Summit will be fine.
>>> We have had users fix this error by switching their MPI.
>>>
>>> On Wed, Jan 26, 2022 at 10:10 AM Junchao Zhang 
>>> wrote:
>>>
 I don't know if this is due to bugs in petsc/kokkos backend.   See if
 you can run 6 nodes (48 mpi ranks).  If it fails, then run the same problem
 on Summit with 8 nodes to see if it still fails. If yes, it is likely a bug
 of our own.

 --Junchao Zhang


 On Wed, Jan 26, 2022 at 8:44 AM Mark Adams  wrote:

> I am not able to reproduce this with a small problem. 2 nodes or less
> refinement works. This is from the 8 node test, the -dm_refine 5 version.
> I see that it comes from PtAP.
> This is on the fine grid. (I was thinking it could be on a reduced
> grid with idle processors, but no)
>
> [15]PETSC ERROR: Argument out of range
> [15]PETSC ERROR: Key <= 0
> [15]PETSC ERROR: See https://petsc.org/release/faq/ for trouble
> shooting.
> [15]PETSC ERROR: Petsc Development GIT revision:
> v3.16.3-696-g46640c56cb  GIT Date: 2022-01-25 09:20:51 -0500
> [15]PETSC ERROR:
> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tests/data/../ex13 on a
> arch-olcf-crusher named crusher020 by adams Wed Jan 26 08:35:47 2022
> [15]PETSC ERROR: Configure options --with-cc=cc --with-cxx=CC
> --with-fc=ftn --with-fortran-bindings=0
> LIBS="-L/opt/cray/pe/mpich/8.1.12/gtl/lib -lmpi_gtl_hsa" 
> --with-debugging=0
> --COPTFLAGS="-g -O" --CXXOPTFLAGS="-g -O" --FOPTFLAGS=-g
> --with-mpiexec="srun -p batch -N 1 -A csc314_crusher -t 00:10:00"
> --with-hip --with-hipc=hipcc --download-hypre --with-hip-arch=gfx90a
> --download-kokkos --download-kokkos-kernels --with-kokkos-kernels-tpl=0
> --download-p4est=1
> --with-zlib-dir=/sw/crusher/spack-envs/base/opt/cray-sles15-zen3/cce-13.0.0/zlib-1.2.11-qx5p4iereg4sjvfi5uwk6jn56o6se2q4
> PETSC_ARCH=arch-olcf-crusher
> [15]PETSC ERROR: #1 PetscTableFind() at
> /gpfs/alpine/csc314/scratch/adams/petsc/include/petscctable.h:131
> [15]PETSC ERROR: #2 MatSetUpMultiply_MPIAIJ() at
> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/mmaij.c:35
> [15]PETSC ERROR: #3 MatAssemblyEnd_MPIAIJ() at
> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/mpiaij.c:735
> [15]PETSC ERROR: #4 MatAssemblyEnd_MPIAIJKokkos() at
> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:14
> [15]PETSC ERROR: #5 MatAssemblyEnd() at
> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/interface/matrix.c:5678
> [15]PETSC ERROR: #6 MatSetMPIAIJKokkosWithSplitSeqAIJKokkosMatrices()
> at
> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:267
> [15]PETSC ERROR: #7 MatSetMPIAIJKokkosWithGlobalCSRMatrix() at
> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:825
> [15]PETSC ERROR: #8 MatProductSymbolic_MPIAIJKokkos() at
> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:1167
> [15]PETSC ERROR: #9 MatProductSymbolic() at
> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/interface/matproduct.c:825
> [15]PETSC ERROR: #10 MatPtAP() 

Re: [petsc-dev] Kokkos/Crusher performance

2022-01-26 Thread Mark Adams
On Wed, Jan 26, 2022 at 1:54 PM Justin Chang  wrote:

> Couple suggestions:
>
> 1. Set the environment variable "export AMD_LOG_LEVEL=3" <- this will tell
> you everything that's happening at the HIP level (memcpy's, mallocs, kernel
> execution time, etc)
>

Hmm, my reproducer uses 2 nodes and 128 processes. I don't think I could do
much with this flood of data.


> 2. Try rocgdb, AFAIK this is the closest "HIP variant of valgrind" that we
> officially support.
>

rocgdb just sat there reading symbols forever. I will look at your doc.
Valgrind seems OK here.


> There are some tricks on running this together with mpi, to which you can
> just google "mpi with gdb". But you can see how rocgdb works here:
> https://www.olcf.ornl.gov/wp-content/uploads/2021/04/rocgdb_hipmath_ornl_2021_v2.pdf
>
>
> On Wed, Jan 26, 2022 at 9:56 AM Barry Smith  wrote:
>
>>
>>   Any way to run with valgrind (or a HIP variant of valgrind)? It looks
>> like a memory corruption issue and tracking down exactly when the
>> corruption begins is 3/4's of the way to finding the exact cause.
>>
>>   Are the crashes reproducible in the same place with identical runs?
>>
>>
>> On Jan 26, 2022, at 10:46 AM, Mark Adams  wrote:
>>
>> I think it is an MPI bug. It works with GPU aware MPI turned off.
>> I am sure Summit will be fine.
>> We have had users fix this error by switching their MPI.
>>
>> On Wed, Jan 26, 2022 at 10:10 AM Junchao Zhang 
>> wrote:
>>
>>> I don't know if this is due to bugs in petsc/kokkos backend.   See if
>>> you can run 6 nodes (48 mpi ranks).  If it fails, then run the same problem
>>> on Summit with 8 nodes to see if it still fails. If yes, it is likely a bug
>>> of our own.
>>>
>>> --Junchao Zhang
>>>
>>>
>>> On Wed, Jan 26, 2022 at 8:44 AM Mark Adams  wrote:
>>>
 I am not able to reproduce this with a small problem. 2 nodes or less
 refinement works. This is from the 8 node test, the -dm_refine 5 version.
 I see that it comes from PtAP.
 This is on the fine grid. (I was thinking it could be on a reduced grid
 with idle processors, but no)

 [15]PETSC ERROR: Argument out of range
 [15]PETSC ERROR: Key <= 0
 [15]PETSC ERROR: See https://petsc.org/release/faq/ for trouble
 shooting.
 [15]PETSC ERROR: Petsc Development GIT revision:
 v3.16.3-696-g46640c56cb  GIT Date: 2022-01-25 09:20:51 -0500
 [15]PETSC ERROR:
 /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tests/data/../ex13 on a
 arch-olcf-crusher named crusher020 by adams Wed Jan 26 08:35:47 2022
 [15]PETSC ERROR: Configure options --with-cc=cc --with-cxx=CC
 --with-fc=ftn --with-fortran-bindings=0
 LIBS="-L/opt/cray/pe/mpich/8.1.12/gtl/lib -lmpi_gtl_hsa" --with-debugging=0
 --COPTFLAGS="-g -O" --CXXOPTFLAGS="-g -O" --FOPTFLAGS=-g
 --with-mpiexec="srun -p batch -N 1 -A csc314_crusher -t 00:10:00"
 --with-hip --with-hipc=hipcc --download-hypre --with-hip-arch=gfx90a
 --download-kokkos --download-kokkos-kernels --with-kokkos-kernels-tpl=0
 --download-p4est=1
 --with-zlib-dir=/sw/crusher/spack-envs/base/opt/cray-sles15-zen3/cce-13.0.0/zlib-1.2.11-qx5p4iereg4sjvfi5uwk6jn56o6se2q4
 PETSC_ARCH=arch-olcf-crusher
 [15]PETSC ERROR: #1 PetscTableFind() at
 /gpfs/alpine/csc314/scratch/adams/petsc/include/petscctable.h:131
 [15]PETSC ERROR: #2 MatSetUpMultiply_MPIAIJ() at
 /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/mmaij.c:35
 [15]PETSC ERROR: #3 MatAssemblyEnd_MPIAIJ() at
 /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/mpiaij.c:735
 [15]PETSC ERROR: #4 MatAssemblyEnd_MPIAIJKokkos() at
 /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:14
 [15]PETSC ERROR: #5 MatAssemblyEnd() at
 /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/interface/matrix.c:5678
 [15]PETSC ERROR: #6 MatSetMPIAIJKokkosWithSplitSeqAIJKokkosMatrices()
 at
 /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:267
 [15]PETSC ERROR: #7 MatSetMPIAIJKokkosWithGlobalCSRMatrix() at
 /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:825
 [15]PETSC ERROR: #8 MatProductSymbolic_MPIAIJKokkos() at
 /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:1167
 [15]PETSC ERROR: #9 MatProductSymbolic() at
 /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/interface/matproduct.c:825
 [15]PETSC ERROR: #10 MatPtAP() at
 /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/interface/matrix.c:9656
 [15]PETSC ERROR: #11 PCGAMGCreateLevel_GAMG() at
 /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/pc/impls/gamg/gamg.c:87
 [15]PETSC ERROR: #12 PCSetUp_GAMG() at
 /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/pc/impls/gamg/gamg.c:663
 [15]PETSC ERROR: #13 PCSetUp() at
 /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/pc/interface/precon.c:1017
 [15]PETSC E

Re: [petsc-dev] Kokkos/Crusher performance

2022-01-26 Thread Mark Adams
On Wed, Jan 26, 2022 at 2:25 PM Mark Adams  wrote:

> I have used valgrind here. I did not run it on this MPI error. I will.
>
> On Wed, Jan 26, 2022 at 10:56 AM Barry Smith  wrote:
>
>>
>>   Any way to run with valgrind (or a HIP variant of valgrind)? It looks
>> like a memory corruption issue and tracking down exactly when the
>> corruption begins is 3/4's of the way to finding the exact cause.
>>
>>   Are the crashes reproducible in the same place with identical runs?
>>
>>
>> On Jan 26, 2022, at 10:46 AM, Mark Adams  wrote:
>>
>> I think it is an MPI bug. It works with GPU aware MPI turned off.
>> I am sure Summit will be fine.
>> We have had users fix this error by switching their MPI.
>>
>> On Wed, Jan 26, 2022 at 10:10 AM Junchao Zhang 
>> wrote:
>>
>>> I don't know if this is due to bugs in petsc/kokkos backend.   See if
>>> you can run 6 nodes (48 mpi ranks).  If it fails, then run the same problem
>>> on Summit with 8 nodes to see if it still fails. If yes, it is likely a bug
>>> of our own.
>>>
>>> --Junchao Zhang
>>>
>>>
>>> On Wed, Jan 26, 2022 at 8:44 AM Mark Adams  wrote:
>>>
 I am not able to reproduce this with a small problem. 2 nodes or less
 refinement works. This is from the 8 node test, the -dm_refine 5 version.
 I see that it comes from PtAP.
 This is on the fine grid. (I was thinking it could be on a reduced grid
 with idle processors, but no)

 [15]PETSC ERROR: Argument out of range
 [15]PETSC ERROR: Key <= 0
 [15]PETSC ERROR: See https://petsc.org/release/faq/ for trouble
 shooting.
 [15]PETSC ERROR: Petsc Development GIT revision:
 v3.16.3-696-g46640c56cb  GIT Date: 2022-01-25 09:20:51 -0500
 [15]PETSC ERROR:
 /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tests/data/../ex13 on a
 arch-olcf-crusher named crusher020 by adams Wed Jan 26 08:35:47 2022
 [15]PETSC ERROR: Configure options --with-cc=cc --with-cxx=CC
 --with-fc=ftn --with-fortran-bindings=0
 LIBS="-L/opt/cray/pe/mpich/8.1.12/gtl/lib -lmpi_gtl_hsa" --with-debugging=0
 --COPTFLAGS="-g -O" --CXXOPTFLAGS="-g -O" --FOPTFLAGS=-g
 --with-mpiexec="srun -p batch -N 1 -A csc314_crusher -t 00:10:00"
 --with-hip --with-hipc=hipcc --download-hypre --with-hip-arch=gfx90a
 --download-kokkos --download-kokkos-kernels --with-kokkos-kernels-tpl=0
 --download-p4est=1
 --with-zlib-dir=/sw/crusher/spack-envs/base/opt/cray-sles15-zen3/cce-13.0.0/zlib-1.2.11-qx5p4iereg4sjvfi5uwk6jn56o6se2q4
 PETSC_ARCH=arch-olcf-crusher
 [15]PETSC ERROR: #1 PetscTableFind() at
 /gpfs/alpine/csc314/scratch/adams/petsc/include/petscctable.h:131
 [15]PETSC ERROR: #2 MatSetUpMultiply_MPIAIJ() at
 /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/mmaij.c:35
 [15]PETSC ERROR: #3 MatAssemblyEnd_MPIAIJ() at
 /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/mpiaij.c:735
 [15]PETSC ERROR: #4 MatAssemblyEnd_MPIAIJKokkos() at
 /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:14
 [15]PETSC ERROR: #5 MatAssemblyEnd() at
 /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/interface/matrix.c:5678
 [15]PETSC ERROR: #6 MatSetMPIAIJKokkosWithSplitSeqAIJKokkosMatrices()
 at
 /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:267
 [15]PETSC ERROR: #7 MatSetMPIAIJKokkosWithGlobalCSRMatrix() at
 /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:825
 [15]PETSC ERROR: #8 MatProductSymbolic_MPIAIJKokkos() at
 /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:1167
 [15]PETSC ERROR: #9 MatProductSymbolic() at
 /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/interface/matproduct.c:825
 [15]PETSC ERROR: #10 MatPtAP() at
 /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/interface/matrix.c:9656
 [15]PETSC ERROR: #11 PCGAMGCreateLevel_GAMG() at
 /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/pc/impls/gamg/gamg.c:87
 [15]PETSC ERROR: #12 PCSetUp_GAMG() at
 /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/pc/impls/gamg/gamg.c:663
 [15]PETSC ERROR: #13 PCSetUp() at
 /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/pc/interface/precon.c:1017
 [15]PETSC ERROR: #14 KSPSetUp() at
 /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/ksp/interface/itfunc.c:417
 [15]PETSC ERROR: #15 KSPSolve_Private() at
 /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/ksp/interface/itfunc.c:863
 [15]PETSC ERROR: #16 KSPSolve() at
 /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/ksp/interface/itfunc.c:1103
 [15]PETSC ERROR: #17 SNESSolve_KSPONLY() at
 /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/impls/ksponly/ksponly.c:51
 [15]PETSC ERROR: #18 SNESSolve() at
 /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/interface/snes.c:4810
 [15]PETSC ERROR: #19 main() at ex13.c:169
 [15]PETSC ERROR:

Re: [petsc-dev] Kokkos/Crusher performance

2022-01-26 Thread Mark Adams
I have used valgrind here. I did not run it on this MPI error. I will.

On Wed, Jan 26, 2022 at 10:56 AM Barry Smith  wrote:

>
>   Any way to run with valgrind (or a HIP variant of valgrind)? It looks
> like a memory corruption issue and tracking down exactly when the
> corruption begins is 3/4's of the way to finding the exact cause.
>
>   Are the crashes reproducible in the same place with identical runs?
>
>
> On Jan 26, 2022, at 10:46 AM, Mark Adams  wrote:
>
> I think it is an MPI bug. It works with GPU aware MPI turned off.
> I am sure Summit will be fine.
> We have had users fix this error by switching their MPI.
>
> On Wed, Jan 26, 2022 at 10:10 AM Junchao Zhang 
> wrote:
>
>> I don't know if this is due to bugs in petsc/kokkos backend.   See if you
>> can run 6 nodes (48 mpi ranks).  If it fails, then run the same problem on
>> Summit with 8 nodes to see if it still fails. If yes, it is likely a bug of
>> our own.
>>
>> --Junchao Zhang
>>
>>
>> On Wed, Jan 26, 2022 at 8:44 AM Mark Adams  wrote:
>>
>>> I am not able to reproduce this with a small problem. 2 nodes or less
>>> refinement works. This is from the 8 node test, the -dm_refine 5 version.
>>> I see that it comes from PtAP.
>>> This is on the fine grid. (I was thinking it could be on a reduced grid
>>> with idle processors, but no)
>>>
>>> [15]PETSC ERROR: Argument out of range
>>> [15]PETSC ERROR: Key <= 0
>>> [15]PETSC ERROR: See https://petsc.org/release/faq/ for trouble
>>> shooting.
>>> [15]PETSC ERROR: Petsc Development GIT revision: v3.16.3-696-g46640c56cb
>>>  GIT Date: 2022-01-25 09:20:51 -0500
>>> [15]PETSC ERROR:
>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tests/data/../ex13 on a
>>> arch-olcf-crusher named crusher020 by adams Wed Jan 26 08:35:47 2022
>>> [15]PETSC ERROR: Configure options --with-cc=cc --with-cxx=CC
>>> --with-fc=ftn --with-fortran-bindings=0
>>> LIBS="-L/opt/cray/pe/mpich/8.1.12/gtl/lib -lmpi_gtl_hsa" --with-debugging=0
>>> --COPTFLAGS="-g -O" --CXXOPTFLAGS="-g -O" --FOPTFLAGS=-g
>>> --with-mpiexec="srun -p batch -N 1 -A csc314_crusher -t 00:10:00"
>>> --with-hip --with-hipc=hipcc --download-hypre --with-hip-arch=gfx90a
>>> --download-kokkos --download-kokkos-kernels --with-kokkos-kernels-tpl=0
>>> --download-p4est=1
>>> --with-zlib-dir=/sw/crusher/spack-envs/base/opt/cray-sles15-zen3/cce-13.0.0/zlib-1.2.11-qx5p4iereg4sjvfi5uwk6jn56o6se2q4
>>> PETSC_ARCH=arch-olcf-crusher
>>> [15]PETSC ERROR: #1 PetscTableFind() at
>>> /gpfs/alpine/csc314/scratch/adams/petsc/include/petscctable.h:131
>>> [15]PETSC ERROR: #2 MatSetUpMultiply_MPIAIJ() at
>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/mmaij.c:35
>>> [15]PETSC ERROR: #3 MatAssemblyEnd_MPIAIJ() at
>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/mpiaij.c:735
>>> [15]PETSC ERROR: #4 MatAssemblyEnd_MPIAIJKokkos() at
>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:14
>>> [15]PETSC ERROR: #5 MatAssemblyEnd() at
>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/interface/matrix.c:5678
>>> [15]PETSC ERROR: #6 MatSetMPIAIJKokkosWithSplitSeqAIJKokkosMatrices() at
>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:267
>>> [15]PETSC ERROR: #7 MatSetMPIAIJKokkosWithGlobalCSRMatrix() at
>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:825
>>> [15]PETSC ERROR: #8 MatProductSymbolic_MPIAIJKokkos() at
>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:1167
>>> [15]PETSC ERROR: #9 MatProductSymbolic() at
>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/interface/matproduct.c:825
>>> [15]PETSC ERROR: #10 MatPtAP() at
>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/interface/matrix.c:9656
>>> [15]PETSC ERROR: #11 PCGAMGCreateLevel_GAMG() at
>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/pc/impls/gamg/gamg.c:87
>>> [15]PETSC ERROR: #12 PCSetUp_GAMG() at
>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/pc/impls/gamg/gamg.c:663
>>> [15]PETSC ERROR: #13 PCSetUp() at
>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/pc/interface/precon.c:1017
>>> [15]PETSC ERROR: #14 KSPSetUp() at
>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/ksp/interface/itfunc.c:417
>>> [15]PETSC ERROR: #15 KSPSolve_Private() at
>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/ksp/interface/itfunc.c:863
>>> [15]PETSC ERROR: #16 KSPSolve() at
>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/ksp/interface/itfunc.c:1103
>>> [15]PETSC ERROR: #17 SNESSolve_KSPONLY() at
>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/impls/ksponly/ksponly.c:51
>>> [15]PETSC ERROR: #18 SNESSolve() at
>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/interface/snes.c:4810
>>> [15]PETSC ERROR: #19 main() at ex13.c:169
>>> [15]PETSC ERROR: PETSc Option Table entries:
>>> [15]PETSC ERROR: -benchmark_it 10
>>>
>>> On Wed, Jan 26, 2022 at 7:26 AM Mark Adams  wrote:
>>>
 The GPU aware MPI 

Re: [petsc-dev] Kokkos/Crusher performance

2022-01-26 Thread Justin Chang
Couple suggestions:

1. Set the environment variable "export AMD_LOG_LEVEL=3" <- this will tell
you everything that's happening at the HIP level (memcpy's, mallocs, kernel
execution time, etc)
2. Try rocgdb, AFAIK this is the closest "HIP variant of valgrind" that we
officially support. There are some tricks on running this together with
mpi, to which you can just google "mpi with gdb". But you can see how
rocgdb works here:
https://www.olcf.ornl.gov/wp-content/uploads/2021/04/rocgdb_hipmath_ornl_2021_v2.pdf
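
As a concrete example of the "mpi with gdb" trick under Slurm, a small wrapper script can put just one rank under rocgdb while the others run normally; this is only a sketch (the rank-selection variable SLURM_PROCID and the ex13 options are assumptions about the job setup):

```sh
#!/bin/bash
# debug_rank0.sh -- run rank 0 under rocgdb, all other ranks as usual (sketch).
if [ "${SLURM_PROCID:-0}" -eq 0 ]; then
  exec rocgdb --args ./ex13 "$@"
else
  exec ./ex13 "$@"
fi
```

Launched as something like `srun -N 2 --ntasks-per-node=8 ./debug_rank0.sh -dm_refine 5`, ideally from an interactive allocation so the debugger has a terminal attached.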


On Wed, Jan 26, 2022 at 9:56 AM Barry Smith  wrote:

>
>   Any way to run with valgrind (or a HIP variant of valgrind)? It looks
> like a memory corruption issue and tracking down exactly when the
> corruption begins is 3/4's of the way to finding the exact cause.
>
>   Are the crashes reproducible in the same place with identical runs?
>
>
> On Jan 26, 2022, at 10:46 AM, Mark Adams  wrote:
>
> I think it is an MPI bug. It works with GPU aware MPI turned off.
> I am sure Summit will be fine.
> We have had users fix this error by switching their MPI.
>
> On Wed, Jan 26, 2022 at 10:10 AM Junchao Zhang 
> wrote:
>
>> I don't know if this is due to bugs in petsc/kokkos backend.   See if you
>> can run 6 nodes (48 mpi ranks).  If it fails, then run the same problem on
>> Summit with 8 nodes to see if it still fails. If yes, it is likely a bug of
>> our own.
>>
>> --Junchao Zhang
>>
>>
>> On Wed, Jan 26, 2022 at 8:44 AM Mark Adams  wrote:
>>
>>> I am not able to reproduce this with a small problem. 2 nodes or less
>>> refinement works. This is from the 8 node test, the -dm_refine 5 version.
>>> I see that it comes from PtAP.
>>> This is on the fine grid. (I was thinking it could be on a reduced grid
>>> with idle processors, but no)
>>>
>>> [15]PETSC ERROR: Argument out of range
>>> [15]PETSC ERROR: Key <= 0
>>> [15]PETSC ERROR: See https://petsc.org/release/faq/ for trouble
>>> shooting.
>>> [15]PETSC ERROR: Petsc Development GIT revision: v3.16.3-696-g46640c56cb
>>>  GIT Date: 2022-01-25 09:20:51 -0500
>>> [15]PETSC ERROR:
>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tests/data/../ex13 on a
>>> arch-olcf-crusher named crusher020 by adams Wed Jan 26 08:35:47 2022
>>> [15]PETSC ERROR: Configure options --with-cc=cc --with-cxx=CC
>>> --with-fc=ftn --with-fortran-bindings=0
>>> LIBS="-L/opt/cray/pe/mpich/8.1.12/gtl/lib -lmpi_gtl_hsa" --with-debugging=0
>>> --COPTFLAGS="-g -O" --CXXOPTFLAGS="-g -O" --FOPTFLAGS=-g
>>> --with-mpiexec="srun -p batch -N 1 -A csc314_crusher -t 00:10:00"
>>> --with-hip --with-hipc=hipcc --download-hypre --with-hip-arch=gfx90a
>>> --download-kokkos --download-kokkos-kernels --with-kokkos-kernels-tpl=0
>>> --download-p4est=1
>>> --with-zlib-dir=/sw/crusher/spack-envs/base/opt/cray-sles15-zen3/cce-13.0.0/zlib-1.2.11-qx5p4iereg4sjvfi5uwk6jn56o6se2q4
>>> PETSC_ARCH=arch-olcf-crusher
>>> [15]PETSC ERROR: #1 PetscTableFind() at
>>> /gpfs/alpine/csc314/scratch/adams/petsc/include/petscctable.h:131
>>> [15]PETSC ERROR: #2 MatSetUpMultiply_MPIAIJ() at
>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/mmaij.c:35
>>> [15]PETSC ERROR: #3 MatAssemblyEnd_MPIAIJ() at
>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/mpiaij.c:735
>>> [15]PETSC ERROR: #4 MatAssemblyEnd_MPIAIJKokkos() at
>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:14
>>> [15]PETSC ERROR: #5 MatAssemblyEnd() at
>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/interface/matrix.c:5678
>>> [15]PETSC ERROR: #6 MatSetMPIAIJKokkosWithSplitSeqAIJKokkosMatrices() at
>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:267
>>> [15]PETSC ERROR: #7 MatSetMPIAIJKokkosWithGlobalCSRMatrix() at
>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:825
>>> [15]PETSC ERROR: #8 MatProductSymbolic_MPIAIJKokkos() at
>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:1167
>>> [15]PETSC ERROR: #9 MatProductSymbolic() at
>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/interface/matproduct.c:825
>>> [15]PETSC ERROR: #10 MatPtAP() at
>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/interface/matrix.c:9656
>>> [15]PETSC ERROR: #11 PCGAMGCreateLevel_GAMG() at
>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/pc/impls/gamg/gamg.c:87
>>> [15]PETSC ERROR: #12 PCSetUp_GAMG() at
>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/pc/impls/gamg/gamg.c:663
>>> [15]PETSC ERROR: #13 PCSetUp() at
>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/pc/interface/precon.c:1017
>>> [15]PETSC ERROR: #14 KSPSetUp() at
>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/ksp/interface/itfunc.c:417
>>> [15]PETSC ERROR: #15 KSPSolve_Private() at
>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/ksp/interface/itfunc.c:863
>>> [15]PETSC ERROR: #16 KSPSolve() at
>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/ksp/interface/itfunc.c:1103
>>> [15]PETSC ERROR

Re: [petsc-dev] Kokkos/Crusher performance

2022-01-26 Thread Barry Smith

  Any way to run with valgrind (or a HIP variant of valgrind)? It looks like a 
memory corruption issue and tracking down exactly when the corruption begins is 
3/4's of the way to finding the exact cause.

  Are the crashes reproducible in the same place with identical runs?
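
One way to make valgrind usable at this scale is to give every rank its own log file, so the reports are not interleaved with the MPI abort message; a sketch, with the srun arguments loosely copied from the configure line in the trace and the ex13 options as placeholders:

```sh
# Sketch: per-rank valgrind logs (%p expands to the pid of each rank).
srun -p batch -A csc314_crusher -t 00:30:00 -N 2 --ntasks-per-node=8 \
  valgrind --track-origins=yes --num-callers=30 --log-file=valgrind.%p.log \
  ./ex13 -dm_refine 5 -benchmark_it 10
# Which ranks reported memory errors:
grep -l "Invalid read\|Invalid write" valgrind.*.log
```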


> On Jan 26, 2022, at 10:46 AM, Mark Adams  wrote:
> 
> I think it is an MPI bug. It works with GPU aware MPI turned off. 
> I am sure Summit will be fine.
> We have had users fix this error by switching their MPI.
> 
> On Wed, Jan 26, 2022 at 10:10 AM Junchao Zhang wrote:
> I don't know if this is due to bugs in petsc/kokkos backend.   See if you can 
> run 6 nodes (48 mpi ranks).  If it fails, then run the same problem on Summit 
> with 8 nodes to see if it still fails. If yes, it is likely a bug of our own.
> 
> --Junchao Zhang
> 
> 
> On Wed, Jan 26, 2022 at 8:44 AM Mark Adams wrote:
> I am not able to reproduce this with a small problem. 2 nodes or less 
> refinement works. This is from the 8 node test, the -dm_refine 5 version.
> I see that it comes from PtAP.
> This is on the fine grid. (I was thinking it could be on a reduced grid with 
> idle processors, but no)
> 
> [15]PETSC ERROR: Argument out of range
> [15]PETSC ERROR: Key <= 0
> [15]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
> [15]PETSC ERROR: Petsc Development GIT revision: v3.16.3-696-g46640c56cb  GIT 
> Date: 2022-01-25 09:20:51 -0500
> [15]PETSC ERROR: 
> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tests/data/../ex13 on a 
> arch-olcf-crusher named crusher020 by adams Wed Jan 26 08:35:47 2022
> [15]PETSC ERROR: Configure options --with-cc=cc --with-cxx=CC --with-fc=ftn 
> --with-fortran-bindings=0 LIBS="-L/opt/cray/pe/mpich/8.1.12/gtl/lib 
> -lmpi_gtl_hsa" --with-debugging=0 --COPTFLAGS="-g -O" --CXXOPTFLAGS="-g -O" 
> --FOPTFLAGS=-g --with-mpiexec="srun -p batch -N 1 -A csc314_crusher -t 
> 00:10:00" --with-hip --with-hipc=hipcc --download-hypre 
> --with-hip-arch=gfx90a --download-kokkos --download-kokkos-kernels 
> --with-kokkos-kernels-tpl=0 --download-p4est=1 
> --with-zlib-dir=/sw/crusher/spack-envs/base/opt/cray-sles15-zen3/cce-13.0.0/zlib-1.2.11-qx5p4iereg4sjvfi5uwk6jn56o6se2q4
>  PETSC_ARCH=arch-olcf-crusher
> [15]PETSC ERROR: #1 PetscTableFind() at 
> /gpfs/alpine/csc314/scratch/adams/petsc/include/petscctable.h:131
> [15]PETSC ERROR: #2 MatSetUpMultiply_MPIAIJ() at 
> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/mmaij.c:35
> [15]PETSC ERROR: #3 MatAssemblyEnd_MPIAIJ() at 
> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/mpiaij.c:735
> [15]PETSC ERROR: #4 MatAssemblyEnd_MPIAIJKokkos() at 
> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:14
> [15]PETSC ERROR: #5 MatAssemblyEnd() at 
> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/interface/matrix.c:5678
> [15]PETSC ERROR: #6 MatSetMPIAIJKokkosWithSplitSeqAIJKokkosMatrices() at 
> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:267
> [15]PETSC ERROR: #7 MatSetMPIAIJKokkosWithGlobalCSRMatrix() at 
> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:825
> [15]PETSC ERROR: #8 MatProductSymbolic_MPIAIJKokkos() at 
> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:1167
> [15]PETSC ERROR: #9 MatProductSymbolic() at 
> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/interface/matproduct.c:825
> [15]PETSC ERROR: #10 MatPtAP() at 
> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/interface/matrix.c:9656
> [15]PETSC ERROR: #11 PCGAMGCreateLevel_GAMG() at 
> /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/pc/impls/gamg/gamg.c:87
> [15]PETSC ERROR: #12 PCSetUp_GAMG() at 
> /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/pc/impls/gamg/gamg.c:663
> [15]PETSC ERROR: #13 PCSetUp() at 
> /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/pc/interface/precon.c:1017
> [15]PETSC ERROR: #14 KSPSetUp() at 
> /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/ksp/interface/itfunc.c:417
> [15]PETSC ERROR: #15 KSPSolve_Private() at 
> /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/ksp/interface/itfunc.c:863
> [15]PETSC ERROR: #16 KSPSolve() at 
> /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/ksp/interface/itfunc.c:1103
> [15]PETSC ERROR: #17 SNESSolve_KSPONLY() at 
> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/impls/ksponly/ksponly.c:51
> [15]PETSC ERROR: #18 SNESSolve() at 
> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/interface/snes.c:4810
> [15]PETSC ERROR: #19 main() at ex13.c:169
> [15]PETSC ERROR: PETSc Option Table entries:
> [15]PETSC ERROR: -benchmark_it 10
> 
> On Wed, Jan 26, 2022 at 7:26 AM Mark Adams wrote:
> The GPU aware MPI is dying going from 1 to 8 nodes, 8 processes per node.
> I will make a minimum reproducer. Start with 2 nodes, one process on each
> node.

Re: [petsc-dev] Kokkos/Crusher performance

2022-01-26 Thread Mark Adams
I think it is an MPI bug. It works with GPU aware MPI turned off.
I am sure Summit will be fine.
We have had users fix this error by switching their MPI.
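
For anyone reproducing this, the on/off comparison is just PETSc's runtime toggle (the same -use_gpu_aware_mpi option mentioned later in the thread); a sketch, with the launcher arguments assumed from the 8-node case:

```sh
# Failing configuration from the thread: 8 nodes, 8 ranks per node, GPU-aware MPI on.
srun -N 8 --ntasks-per-node=8 ./ex13 -dm_refine 5 -use_gpu_aware_mpi 1
# Same case with GPU-aware MPI off (MPI buffers staged through the host): runs clean.
srun -N 8 --ntasks-per-node=8 ./ex13 -dm_refine 5 -use_gpu_aware_mpi 0
```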

On Wed, Jan 26, 2022 at 10:10 AM Junchao Zhang 
wrote:

> I don't know if this is due to bugs in petsc/kokkos backend.   See if you
> can run 6 nodes (48 mpi ranks).  If it fails, then run the same problem on
> Summit with 8 nodes to see if it still fails. If yes, it is likely a bug of
> our own.
>
> --Junchao Zhang
>
>
> On Wed, Jan 26, 2022 at 8:44 AM Mark Adams  wrote:
>
>> I am not able to reproduce this with a small problem. 2 nodes or less
>> refinement works. This is from the 8 node test, the -dm_refine 5 version.
>> I see that it comes from PtAP.
>> This is on the fine grid. (I was thinking it could be on a reduced grid
>> with idle processors, but no)
>>
>> [15]PETSC ERROR: Argument out of range
>> [15]PETSC ERROR: Key <= 0
>> [15]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
>> [15]PETSC ERROR: Petsc Development GIT revision: v3.16.3-696-g46640c56cb
>>  GIT Date: 2022-01-25 09:20:51 -0500
>> [15]PETSC ERROR:
>> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tests/data/../ex13 on a
>> arch-olcf-crusher named crusher020 by adams Wed Jan 26 08:35:47 2022
>> [15]PETSC ERROR: Configure options --with-cc=cc --with-cxx=CC
>> --with-fc=ftn --with-fortran-bindings=0
>> LIBS="-L/opt/cray/pe/mpich/8.1.12/gtl/lib -lmpi_gtl_hsa" --with-debugging=0
>> --COPTFLAGS="-g -O" --CXXOPTFLAGS="-g -O" --FOPTFLAGS=-g
>> --with-mpiexec="srun -p batch -N 1 -A csc314_crusher -t 00:10:00"
>> --with-hip --with-hipc=hipcc --download-hypre --with-hip-arch=gfx90a
>> --download-kokkos --download-kokkos-kernels --with-kokkos-kernels-tpl=0
>> --download-p4est=1
>> --with-zlib-dir=/sw/crusher/spack-envs/base/opt/cray-sles15-zen3/cce-13.0.0/zlib-1.2.11-qx5p4iereg4sjvfi5uwk6jn56o6se2q4
>> PETSC_ARCH=arch-olcf-crusher
>> [15]PETSC ERROR: #1 PetscTableFind() at
>> /gpfs/alpine/csc314/scratch/adams/petsc/include/petscctable.h:131
>> [15]PETSC ERROR: #2 MatSetUpMultiply_MPIAIJ() at
>> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/mmaij.c:35
>> [15]PETSC ERROR: #3 MatAssemblyEnd_MPIAIJ() at
>> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/mpiaij.c:735
>> [15]PETSC ERROR: #4 MatAssemblyEnd_MPIAIJKokkos() at
>> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:14
>> [15]PETSC ERROR: #5 MatAssemblyEnd() at
>> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/interface/matrix.c:5678
>> [15]PETSC ERROR: #6 MatSetMPIAIJKokkosWithSplitSeqAIJKokkosMatrices() at
>> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:267
>> [15]PETSC ERROR: #7 MatSetMPIAIJKokkosWithGlobalCSRMatrix() at
>> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:825
>> [15]PETSC ERROR: #8 MatProductSymbolic_MPIAIJKokkos() at
>> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:1167
>> [15]PETSC ERROR: #9 MatProductSymbolic() at
>> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/interface/matproduct.c:825
>> [15]PETSC ERROR: #10 MatPtAP() at
>> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/interface/matrix.c:9656
>> [15]PETSC ERROR: #11 PCGAMGCreateLevel_GAMG() at
>> /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/pc/impls/gamg/gamg.c:87
>> [15]PETSC ERROR: #12 PCSetUp_GAMG() at
>> /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/pc/impls/gamg/gamg.c:663
>> [15]PETSC ERROR: #13 PCSetUp() at
>> /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/pc/interface/precon.c:1017
>> [15]PETSC ERROR: #14 KSPSetUp() at
>> /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/ksp/interface/itfunc.c:417
>> [15]PETSC ERROR: #15 KSPSolve_Private() at
>> /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/ksp/interface/itfunc.c:863
>> [15]PETSC ERROR: #16 KSPSolve() at
>> /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/ksp/interface/itfunc.c:1103
>> [15]PETSC ERROR: #17 SNESSolve_KSPONLY() at
>> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/impls/ksponly/ksponly.c:51
>> [15]PETSC ERROR: #18 SNESSolve() at
>> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/interface/snes.c:4810
>> [15]PETSC ERROR: #19 main() at ex13.c:169
>> [15]PETSC ERROR: PETSc Option Table entries:
>> [15]PETSC ERROR: -benchmark_it 10
>>
>> On Wed, Jan 26, 2022 at 7:26 AM Mark Adams  wrote:
>>
>>> The GPU aware MPI is dying going from 1 to 8 nodes, 8 processes per node.
>>> I will make a minimum reproducer. Start with 2 nodes, one process on
>>> each node.
>>>
>>>
>>> On Tue, Jan 25, 2022 at 10:19 PM Barry Smith  wrote:
>>>

   So the MPI is killing you in going from 8 to 64. (The GPU flop rate
 scales almost perfectly, but the overall flop rate is only half of what it
 should be at 64).

 On Jan 25, 2022, at 9:24 PM, Mark Adams  wrote:

 It looks like we have our instrumentation and job configuration in
 decent shape so on to scaling w

Re: [petsc-dev] Kokkos/Crusher performance

2022-01-26 Thread Junchao Zhang
I don't know if this is due to bugs in petsc/kokkos backend.   See if you
can run 6 nodes (48 mpi ranks).  If it fails, then run the same problem on
Summit with 8 nodes to see if it still fails. If yes, it is likely a bug of
our own.

--Junchao Zhang
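
Spelled out, the suggested bisection is two runs of the same case; the launcher flags here are assumptions based on the configure line in the trace, and the Summit step would use that machine's launcher rather than srun:

```sh
# 1) Crusher, 6 nodes x 8 ranks = 48 ranks:
srun -p batch -A csc314_crusher -N 6 --ntasks-per-node=8 ./ex13 -dm_refine 5
# 2) If that fails, run the same problem on Summit with 8 nodes (different MPI and
#    GPU stack); if it fails there too, the bug is more likely on the PETSc side.
```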


On Wed, Jan 26, 2022 at 8:44 AM Mark Adams  wrote:

> I am not able to reproduce this with a small problem. 2 nodes or less
> refinement works. This is from the 8 node test, the -dm_refine 5 version.
> I see that it comes from PtAP.
> This is on the fine grid. (I was thinking it could be on a reduced grid
> with idle processors, but no)
>
> [15]PETSC ERROR: Argument out of range
> [15]PETSC ERROR: Key <= 0
> [15]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
> [15]PETSC ERROR: Petsc Development GIT revision: v3.16.3-696-g46640c56cb
>  GIT Date: 2022-01-25 09:20:51 -0500
> [15]PETSC ERROR:
> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tests/data/../ex13 on a
> arch-olcf-crusher named crusher020 by adams Wed Jan 26 08:35:47 2022
> [15]PETSC ERROR: Configure options --with-cc=cc --with-cxx=CC
> --with-fc=ftn --with-fortran-bindings=0
> LIBS="-L/opt/cray/pe/mpich/8.1.12/gtl/lib -lmpi_gtl_hsa" --with-debugging=0
> --COPTFLAGS="-g -O" --CXXOPTFLAGS="-g -O" --FOPTFLAGS=-g
> --with-mpiexec="srun -p batch -N 1 -A csc314_crusher -t 00:10:00"
> --with-hip --with-hipc=hipcc --download-hypre --with-hip-arch=gfx90a
> --download-kokkos --download-kokkos-kernels --with-kokkos-kernels-tpl=0
> --download-p4est=1
> --with-zlib-dir=/sw/crusher/spack-envs/base/opt/cray-sles15-zen3/cce-13.0.0/zlib-1.2.11-qx5p4iereg4sjvfi5uwk6jn56o6se2q4
> PETSC_ARCH=arch-olcf-crusher
> [15]PETSC ERROR: #1 PetscTableFind() at
> /gpfs/alpine/csc314/scratch/adams/petsc/include/petscctable.h:131
> [15]PETSC ERROR: #2 MatSetUpMultiply_MPIAIJ() at
> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/mmaij.c:35
> [15]PETSC ERROR: #3 MatAssemblyEnd_MPIAIJ() at
> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/mpiaij.c:735
> [15]PETSC ERROR: #4 MatAssemblyEnd_MPIAIJKokkos() at
> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:14
> [15]PETSC ERROR: #5 MatAssemblyEnd() at
> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/interface/matrix.c:5678
> [15]PETSC ERROR: #6 MatSetMPIAIJKokkosWithSplitSeqAIJKokkosMatrices() at
> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:267
> [15]PETSC ERROR: #7 MatSetMPIAIJKokkosWithGlobalCSRMatrix() at
> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:825
> [15]PETSC ERROR: #8 MatProductSymbolic_MPIAIJKokkos() at
> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:1167
> [15]PETSC ERROR: #9 MatProductSymbolic() at
> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/interface/matproduct.c:825
> [15]PETSC ERROR: #10 MatPtAP() at
> /gpfs/alpine/csc314/scratch/adams/petsc/src/mat/interface/matrix.c:9656
> [15]PETSC ERROR: #11 PCGAMGCreateLevel_GAMG() at
> /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/pc/impls/gamg/gamg.c:87
> [15]PETSC ERROR: #12 PCSetUp_GAMG() at
> /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/pc/impls/gamg/gamg.c:663
> [15]PETSC ERROR: #13 PCSetUp() at
> /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/pc/interface/precon.c:1017
> [15]PETSC ERROR: #14 KSPSetUp() at
> /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/ksp/interface/itfunc.c:417
> [15]PETSC ERROR: #15 KSPSolve_Private() at
> /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/ksp/interface/itfunc.c:863
> [15]PETSC ERROR: #16 KSPSolve() at
> /gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/ksp/interface/itfunc.c:1103
> [15]PETSC ERROR: #17 SNESSolve_KSPONLY() at
> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/impls/ksponly/ksponly.c:51
> [15]PETSC ERROR: #18 SNESSolve() at
> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/interface/snes.c:4810
> [15]PETSC ERROR: #19 main() at ex13.c:169
> [15]PETSC ERROR: PETSc Option Table entries:
> [15]PETSC ERROR: -benchmark_it 10
>
> On Wed, Jan 26, 2022 at 7:26 AM Mark Adams  wrote:
>
>> The GPU aware MPI is dying going from 1 to 8 nodes, 8 processes per node.
>> I will make a minimum reproducer. Start with 2 nodes, one process on each
>> node.
>>
>>
>> On Tue, Jan 25, 2022 at 10:19 PM Barry Smith  wrote:
>>
>>>
>>>   So the MPI is killing you in going from 8 to 64. (The GPU flop rate
>>> scales almost perfectly, but the overall flop rate is only half of what it
>>> should be at 64).
>>>
>>> On Jan 25, 2022, at 9:24 PM, Mark Adams  wrote:
>>>
>>> It looks like we have our instrumentation and job configuration in
>>> decent shape so on to scaling with AMG.
>>> In using multiple nodes I got errors with table entries not found, which
>>> can be caused by a buggy MPI, and the problem does go away when I turn GPU
>>> aware MPI off.
>>> Jed's analysis, if I have this right, is that at *0.7T* flops we are at
>>> about 35% of theoretical peak wrt memory band

Re: [petsc-dev] Kokkos/Crusher performance

2022-01-26 Thread Mark Adams
I am not able to reproduce this with a small problem. 2 nodes or less
refinement works. This is from the 8 node test, the -dm_refine 5 version.
I see that it comes from PtAP.
This is on the fine grid. (I was thinking it could be on a reduced grid
with idle processors, but no)

[15]PETSC ERROR: Argument out of range
[15]PETSC ERROR: Key <= 0
[15]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
[15]PETSC ERROR: Petsc Development GIT revision: v3.16.3-696-g46640c56cb
 GIT Date: 2022-01-25 09:20:51 -0500
[15]PETSC ERROR:
/gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tests/data/../ex13 on a
arch-olcf-crusher named crusher020 by adams Wed Jan 26 08:35:47 2022
[15]PETSC ERROR: Configure options --with-cc=cc --with-cxx=CC --with-fc=ftn
--with-fortran-bindings=0 LIBS="-L/opt/cray/pe/mpich/8.1.12/gtl/lib
-lmpi_gtl_hsa" --with-debugging=0 --COPTFLAGS="-g -O" --CXXOPTFLAGS="-g -O"
--FOPTFLAGS=-g --with-mpiexec="srun -p batch -N 1 -A csc314_crusher -t
00:10:00" --with-hip --with-hipc=hipcc --download-hypre
--with-hip-arch=gfx90a --download-kokkos --download-kokkos-kernels
--with-kokkos-kernels-tpl=0 --download-p4est=1
--with-zlib-dir=/sw/crusher/spack-envs/base/opt/cray-sles15-zen3/cce-13.0.0/zlib-1.2.11-qx5p4iereg4sjvfi5uwk6jn56o6se2q4
PETSC_ARCH=arch-olcf-crusher
[15]PETSC ERROR: #1 PetscTableFind() at
/gpfs/alpine/csc314/scratch/adams/petsc/include/petscctable.h:131
[15]PETSC ERROR: #2 MatSetUpMultiply_MPIAIJ() at
/gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/mmaij.c:35
[15]PETSC ERROR: #3 MatAssemblyEnd_MPIAIJ() at
/gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/mpiaij.c:735
[15]PETSC ERROR: #4 MatAssemblyEnd_MPIAIJKokkos() at
/gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:14
[15]PETSC ERROR: #5 MatAssemblyEnd() at
/gpfs/alpine/csc314/scratch/adams/petsc/src/mat/interface/matrix.c:5678
[15]PETSC ERROR: #6 MatSetMPIAIJKokkosWithSplitSeqAIJKokkosMatrices() at
/gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:267
[15]PETSC ERROR: #7 MatSetMPIAIJKokkosWithGlobalCSRMatrix() at
/gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:825
[15]PETSC ERROR: #8 MatProductSymbolic_MPIAIJKokkos() at
/gpfs/alpine/csc314/scratch/adams/petsc/src/mat/impls/aij/mpi/kokkos/mpiaijkok.kokkos.cxx:1167
[15]PETSC ERROR: #9 MatProductSymbolic() at
/gpfs/alpine/csc314/scratch/adams/petsc/src/mat/interface/matproduct.c:825
[15]PETSC ERROR: #10 MatPtAP() at
/gpfs/alpine/csc314/scratch/adams/petsc/src/mat/interface/matrix.c:9656
[15]PETSC ERROR: #11 PCGAMGCreateLevel_GAMG() at
/gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/pc/impls/gamg/gamg.c:87
[15]PETSC ERROR: #12 PCSetUp_GAMG() at
/gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/pc/impls/gamg/gamg.c:663
[15]PETSC ERROR: #13 PCSetUp() at
/gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/pc/interface/precon.c:1017
[15]PETSC ERROR: #14 KSPSetUp() at
/gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/ksp/interface/itfunc.c:417
[15]PETSC ERROR: #15 KSPSolve_Private() at
/gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/ksp/interface/itfunc.c:863
[15]PETSC ERROR: #16 KSPSolve() at
/gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/ksp/interface/itfunc.c:1103
[15]PETSC ERROR: #17 SNESSolve_KSPONLY() at
/gpfs/alpine/csc314/scratch/adams/petsc/src/snes/impls/ksponly/ksponly.c:51
[15]PETSC ERROR: #18 SNESSolve() at
/gpfs/alpine/csc314/scratch/adams/petsc/src/snes/interface/snes.c:4810
[15]PETSC ERROR: #19 main() at ex13.c:169
[15]PETSC ERROR: PETSc Option Table entries:
[15]PETSC ERROR: -benchmark_it 10

On Wed, Jan 26, 2022 at 7:26 AM Mark Adams  wrote:

> The GPU aware MPI is dying going from 1 to 8 nodes, 8 processes per node.
> I will make a minimum reproducer. Start with 2 nodes, one process on each
> node.
>
>
> On Tue, Jan 25, 2022 at 10:19 PM Barry Smith  wrote:
>
>>
>>   So the MPI is killing you in going from 8 to 64. (The GPU flop rate
>> scales almost perfectly, but the overall flop rate is only half of what it
>> should be at 64).
>>
>> On Jan 25, 2022, at 9:24 PM, Mark Adams  wrote:
>>
>> It looks like we have our instrumentation and job configuration in decent
>> shape so on to scaling with AMG.
>> In using multiple nodes I got errors with table entries not found, which
>> can be caused by a buggy MPI, and the problem does go away when I turn GPU
>> aware MPI off.
>> Jed's analysis, if I have this right, is that at *0.7T* flops we are at
>> about 35% of theoretical peak wrt memory bandwidth.
>> I run out of memory with the next step in this study (7 levels of
>> refinement), with 2M equations per GPU. This seems low to me and we will
>> see if we can fix this.
>> So this 0.7Tflops is with only 1/4 M equations so 35% is not terrible.
>> Here are the solve times with 001, 008 and 064 nodes, and 5 or 6 levels
>> of refinement.
>>
>> out_001_kokkos_Crusher_5_1.txt:KSPSolve  10 1.0 1.2933e+00
>> 1.0 4.13e+10 1.1 1.8e+05 8.4e

Re: [petsc-dev] Kokkos/Crusher performance

2022-01-26 Thread Mark Adams
The GPU aware MPI is dying going from 1 to 8 nodes, 8 processes per node.
I will make a minimum reproducer. Start with 2 nodes, one process on each
node.


On Tue, Jan 25, 2022 at 10:19 PM Barry Smith  wrote:

>
>   So the MPI is killing you in going from 8 to 64. (The GPU flop rate
> scales almost perfectly, but the overall flop rate is only half of what it
> should be at 64).
>
> On Jan 25, 2022, at 9:24 PM, Mark Adams  wrote:
>
> It looks like we have our instrumentation and job configuration in decent
> shape so on to scaling with AMG.
> In using multiple nodes I got errors with table entries not found, which
> can be caused by a buggy MPI, and the problem does go away when I turn GPU
> aware MPI off.
> Jed's analysis, if I have this right, is that at *0.7T* flops we are at
> about 35% of theoretical peak wrt memory bandwidth.
> I run out of memory with the next step in this study (7 levels of
> refinement), with 2M equations per GPU. This seems low to me and we will
> see if we can fix this.
> So this 0.7Tflops is with only 1/4 M equations so 35% is not terrible.
> Here are the solve times with 001, 008 and 064 nodes, and 5 or 6 levels of
> refinement.
>
> out_001_kokkos_Crusher_5_1.txt:KSPSolve  10 1.0 1.2933e+00 1.0
> 4.13e+10 1.1 1.8e+05 8.4e+03 5.8e+02  3 87 86 78 48 100100100100100 248792
>   423857   6840 3.85e+02 6792 3.85e+02 100
> out_001_kokkos_Crusher_6_1.txt:KSPSolve  10 1.0 5.3667e+00 1.0
> 3.89e+11 1.0 2.1e+05 3.3e+04 6.7e+02  2 87 86 79 48 100100100100100 571572
>   *72*   7920 1.74e+03 7920 1.74e+03 100
> out_008_kokkos_Crusher_5_1.txt:KSPSolve  10 1.0 1.9407e+00 1.0
> 4.94e+10 1.1 3.5e+06 6.2e+03 6.7e+02  5 87 86 79 47 100100100100100 1581096
>   3034723   7920 6.88e+02 7920 6.88e+02 100
> out_008_kokkos_Crusher_6_1.txt:KSPSolve  10 1.0 7.4478e+00 1.0
> 4.49e+11 1.0 4.1e+06 2.3e+04 7.6e+02  2 88 87 80 49 100100100100100 3798162
>   5557106   9367 3.02e+03 9359 3.02e+03 100
> out_064_kokkos_Crusher_5_1.txt:KSPSolve  10 1.0 2.4551e+00 1.0
> 5.40e+10 1.1 4.2e+07 5.4e+03 7.3e+02  5 88 87 80 47 100100100100100
> 11065887   23792978   8684 8.90e+02 8683 8.90e+02 100
> out_064_kokkos_Crusher_6_1.txt:KSPSolve  10 1.0 1.1335e+01 1.0
> 5.38e+11 1.0 5.4e+07 2.0e+04 9.1e+02  4 88 88 82 49 100100100100100
> 24130606   43326249   11249 4.26e+03 11249 4.26e+03 100
>
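
For context on the ~35% figure in the quoted analysis, read as a roofline-style estimate: for the bandwidth-bound sparse kernels that dominate these solves, the attainable flop rate is roughly the achievable memory bandwidth times the arithmetic intensity, and the quoted percentage is the observed rate against that bound. A sketch, where the intensity value is an assumed figure for double-precision AIJ sparse matrix-vector products (about 2 flops per stored nonzero against an 8-byte value plus a 4-byte column index), not a measured one:

$$
F_{\mathrm{attainable}} \approx B_{\mathrm{mem}} \, I,
\qquad
I \approx \frac{2\ \text{flops}}{(8+4)\ \text{bytes}} \approx \tfrac{1}{6}\ \text{flop/byte},
\qquad
\text{efficiency} = \frac{F_{\mathrm{observed}}}{F_{\mathrm{attainable}}}.
$$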
> On Tue, Jan 25, 2022 at 1:49 PM Mark Adams  wrote:
>
>>
>>> Note that Mark's logs have been switching back and forth between
>>> -use_gpu_aware_mpi and changing number of ranks -- we won't have that
>>> information if we do manual timing hacks. This is going to be a routine
>>> thing we'll need on the mailing list and we need the provenance to go with
>>> it.
>>>
>>
>> GPU aware MPI crashes sometimes so to be safe, while debugging, I had it
>> off. It works fine here so it has been on in the last tests.
>> Here is a comparison.
>>
>>
> 
>
>
>