On May 29, 2019, at 11:51 PM, Sanjay Govindjee via petsc-users
<petsc-users@mcs.anl.gov> wrote:
I am trying to track down a memory issue with my code; apologies in advance for
the longish message.
I am solving a FEA problem with a number of load steps involving about 3000
right hand side and tangent assemblies and solves. The program is mainly
Fortran, with a C memory allocator.
When I run my code in strictly serial mode (no PETSc or MPI routines), the
memory stays constant over the whole run.
When I run it in parallel mode with PETSc solvers and num_processes=1, the
memory (max resident set size) also stays constant:
PetscMalloc = 28,976, ProgramNativeMalloc = constant, Resident Size = 24,854,528 (constant) [CG/JACOBI]
[PetscMalloc and Resident Size are as reported by PetscMallocGetCurrentUsage and
PetscMemoryGetCurrentUsage (summed across processes as needed);
ProgramNativeMalloc is as reported by the program's memory allocator.]
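For concreteness, the collection of those numbers amounts to roughly the following (a minimal sketch, not the actual code; ReportMemory is a placeholder name):

  #include <petscsys.h>

  /* Placeholder sketch: per-process usage queried from PETSc and summed
     across the ranks of the communicator. */
  PetscErrorCode ReportMemory(MPI_Comm comm)
  {
    PetscErrorCode ierr;
    PetscLogDouble mal, rss, mal_sum, rss_sum;

    ierr = PetscMallocGetCurrentUsage(&mal);CHKERRQ(ierr);  /* bytes currently obtained via PetscMalloc on this rank */
    ierr = PetscMemoryGetCurrentUsage(&rss);CHKERRQ(ierr);  /* resident set size of this process, in bytes */
    ierr = MPI_Allreduce(&mal,&mal_sum,1,MPIU_PETSCLOGDOUBLE,MPI_SUM,comm);CHKERRQ(ierr);
    ierr = MPI_Allreduce(&rss,&rss_sum,1,MPIU_PETSCLOGDOUBLE,MPI_SUM,comm);CHKERRQ(ierr);
    ierr = PetscPrintf(comm,"PetscMalloc = %.0f  Resident Size = %.0f\n",mal_sum,rss_sum);CHKERRQ(ierr);
    return 0;
  }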
When I run it in parallel mode with PETSc solvers but with num_processes=2, the
resident memory grows steadily during the run:
PetscMalloc = 3,039,072 (constant), ProgramNativeMalloc = constant, Resident Size = 24,698,880 (start) -> 31,313,920 (finish) [CG/JACOBI]
When I run it in parallel mode with PETSc solvers but with num_processes=4, the
resident memory grows steadily during the run:
PetscMalloc = 3,307,888 (constant), ProgramNativeMalloc = 1,427,584 (constant), Resident Size = 45,801,472 (start) -> 70,787,072 (finish) [CG/JACOBI]
PetscMalloc = 5,903,808 (constant), ProgramNativeMalloc = 1,427,584 (constant), Resident Size = 52,076,544 (start) -> 112,410,624 (finish) [GMRES/BJACOBI]
PetscMalloc = 3,188,944 (constant), ProgramNativeMalloc = 1,427,584 (constant), Resident Size = 381,480,960 (start) -> 712,798,208 (finish) [SUPERLU]
PetscMalloc = 6,539,408 (constant), ProgramNativeMalloc = 1,427,584 (constant), Resident Size = 278,671,360 (start) -> 591,048,704 (finish) [MUMPS]
The memory growth feels alarming, but maybe I do not understand the values in
ru_maxrss from getrusage().
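For reference, the query itself is essentially just this (a minimal sketch; as I understand it, ru_maxrss is a peak/high-water value, reported in bytes on macOS and in kilobytes on Linux):

  #include <stdio.h>
  #include <sys/resource.h>

  /* Sketch only: ru_maxrss is the peak resident set size this process has
     reached so far, so by definition it never decreases during a run. */
  int main(void)
  {
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) == 0)
      printf("peak resident set size so far: %ld\n", (long)ru.ru_maxrss);
    return 0;
  }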
Valgrind is broken on my box (a MacBook Pro), so I need to get to a system with a
functional one; that said, the code has always been Valgrind clean.
There are no Fortran pointers or Fortran allocatable arrays in the part of the
code being used. The program's C memory allocator tracks its own usage, so I do
not see the problem being there. The PETSc malloc is also steady.
Other random hints:
1) If I comment out the call to KSPSolve and the call to my MPI data-exchange
routine (which passes solution values between processes after each solve using
MPI_Isend, MPI_Recv, and MPI_BARRIER; a simplified sketch of the pattern appears
after this list), the memory growth essentially goes away.
2) If I comment out the call to my MPI data-exchange routine but leave the call
to KSPSolve, the problem remains but is substantially reduced for CG/JACOBI and
marginally reduced for the GMRES/BJACOBI, SUPERLU, and MUMPS runs.
3) If I comment out the call to KSPSolve but leave the call to my MPI
data-exchange routine, the problem remains.
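For concreteness, the data-exchange routine mentioned in 1) follows roughly this pattern (a simplified sketch with placeholder names and buffers, not the actual code):

  #include <stdlib.h>
  #include <mpi.h>

  /* Simplified sketch: each rank posts nonblocking sends of its solution
     values to its neighbors, receives the neighbors' values with blocking
     receives, completes the send requests, and then synchronizes.
     (MPI_Waitall matters: MPI_Isend requests that are never completed
     accumulate inside the MPI library from call to call.) */
  void exchange_solution(MPI_Comm comm, double *sendbuf, int nsend,
                         double *recvbuf, int nrecv,
                         const int *neighbors, int nneigh)
  {
    MPI_Request *reqs = (MPI_Request *)malloc((size_t)nneigh*sizeof(MPI_Request));

    for (int i = 0; i < nneigh; i++)
      MPI_Isend(sendbuf, nsend, MPI_DOUBLE, neighbors[i], 0, comm, &reqs[i]);
    for (int i = 0; i < nneigh; i++)
      MPI_Recv(recvbuf + (size_t)i*nrecv, nrecv, MPI_DOUBLE, neighbors[i], 0, comm,
               MPI_STATUS_IGNORE);
    MPI_Waitall(nneigh, reqs, MPI_STATUSES_IGNORE);  /* complete the Isend requests */
    MPI_Barrier(comm);
    free(reqs);
  }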
Any suggestions/hints on where to look would be great.
-sanjay