The problem seems to persist but with a different signature.  Graphs attached as before.

Totals with MPICH (NB: single run)

For the CG/Jacobi       data_exchange_total = 41,385,984; kspsolve_total = 38,289,408
For the GMRES/BJACOBI   data_exchange_total = 41,324,544; kspsolve_total = 41,324,544

Just reading the MPI docs, I am wondering if I need some sort of MPI_Wait/MPI_Waitall 
before the MPI_Barrier in the data exchange routine. I would have thought that, with the 
blocking receives and the MPI_Barrier, everything would have fully completed and been 
cleaned up before all processes exited the routine, but perhaps I am wrong about that.
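
My understanding from the docs is that each MPI_Isend returns a request that is only 
retired by an MPI_Wait/MPI_Waitall (or MPI_Request_free); MPI_Barrier synchronizes the 
ranks but does not complete or free those requests, so without a wait they could 
accumulate inside the MPI library. A rough C sketch of the pattern I mean (the buffers, 
counts, and neighbor lists are placeholders, not the actual routine in psetb.F):

  /* Rough sketch only -- names are placeholders, not the psetb.F routine. */
  #include <mpi.h>
  #include <stdlib.h>

  void exchange(double **sendbuf, double **recvbuf, const int *count,
                const int *nbr, int nnbr, MPI_Comm comm)
  {
    MPI_Request *req = (MPI_Request *) malloc(nnbr * sizeof(MPI_Request));
    int i;

    for (i = 0; i < nnbr; i++)   /* start the nonblocking sends */
      MPI_Isend(sendbuf[i], count[i], MPI_DOUBLE, nbr[i], 0, comm, &req[i]);

    for (i = 0; i < nnbr; i++)   /* blocking receives finish only the receive side */
      MPI_Recv(recvbuf[i], count[i], MPI_DOUBLE, nbr[i], 0, comm, MPI_STATUS_IGNORE);

    /* completes (and frees) the send requests; MPI_Barrier alone does not */
    MPI_Waitall(nnbr, req, MPI_STATUSES_IGNORE);

    MPI_Barrier(comm);
    free(req);
  }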

-sanjay

On 5/30/19 12:14 AM, Smith, Barry F. wrote:
   Let us know how it goes with MPICH


On May 30, 2019, at 2:01 AM, Sanjay Govindjee <s...@berkeley.edu> wrote:

I put in calls to PetscMemoryGetCurrentUsage() around KSPSolve and my data exchange 
routine.  The problem is clearly mostly in my data exchange routine.
Attached are graphs of the change in memory for each call.  Many calls show zero change, 
but at regular intervals the memory goes up after the data exchange; much less so with 
the KSPSolve calls (and then mostly on the first calls).

For the CG/Jacobi       data_exchange_total = 21,311,488; kspsolve_total = 2,625,536
For the GMRES/BJACOBI   data_exchange_total = 6,619,136;  kspsolve_total = 54,403,072 (dominated by initial calls)

I will try switching MPI implementations to see if anything changes; right now my 
configure uses --download-openmpi.
I've also attached the data exchange routine in case there is something obviously wrong.

NB: Graphs/Data are from just one run each.

-sanjay

On 5/29/19 10:17 PM, Smith, Barry F. wrote:
    This is indeed worrisome.

     Would it be possible to put PetscMemoryGetCurrentUsage() around each call 
to KSPSolve() and each call to your data exchange routine? See if they increase 
at each step.
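
     Something along these lines (a sketch in C; ksp, b, x are whatever you already 
have, and the same bracketing goes around the data exchange routine):

  #include <petscksp.h>

  /* Sketch: report the change in resident set size across one KSPSolve(). */
  static PetscErrorCode SolveWithMemoryCheck(KSP ksp, Vec b, Vec x)
  {
    PetscLogDouble before, after;
    PetscErrorCode ierr;

    PetscFunctionBegin;
    ierr = PetscMemoryGetCurrentUsage(&before);CHKERRQ(ierr);
    ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
    ierr = PetscMemoryGetCurrentUsage(&after);CHKERRQ(ierr);
    ierr = PetscPrintf(PETSC_COMM_SELF, "KSPSolve resident size change: %g bytes\n",
                       (double)(after - before));CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }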

     One thing to be aware of with "max resident set size" is that it measures the number 
of pages that have been set up, not the amount of memory allocated. So if, for example, you 
allocate a very large array but don't actually read or write the memory in that array until later 
in the code, it won't appear in the "resident set size" until you read or write the memory 
(because Unix doesn't set up pages until it needs to).
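
     A toy illustration of that point (plain C; typical Unix demand-paging behavior):

  #include <stdlib.h>
  #include <string.h>

  int main(void)
  {
    size_t n   = (size_t)100 * 1024 * 1024;   /* 100 MB */
    char  *big = (char *) malloc(n);          /* address space reserved; resident
                                                 set size barely moves yet */
    memset(big, 1, n);                        /* pages are faulted in here and the
                                                 resident set size jumps by ~100 MB */
    free(big);
    return 0;
  }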

    You should also try another MPI. Both OpenMPI and MPICH can be installed 
with brew, or you can configure with --download-mpich or --download-openmpi, to 
see whether the MPI implementation is making a difference.

     For now I would focus on the PETSc-only solvers to eliminate one variable 
from the equation; once that is understood you can go back to the question of 
the memory management of the other solvers.

   Barry


On May 29, 2019, at 11:51 PM, Sanjay Govindjee via petsc-users 
<petsc-users@mcs.anl.gov> wrote:

I am trying to track down a memory issue with my code; apologies in advance for 
the longish message.

I am solving an FEA problem over a number of load steps, involving about 3000
right-hand-side and tangent assemblies and solves.  The program is mainly 
Fortran, with a C memory allocator.

When I run my code in strictly serial mode (no PETSc or MPI routines) the 
memory stays constant over the whole run.

When I run it in parallel mode with the PETSc solvers and num_processes=1, the 
memory (max resident set size) also stays constant:

PetscMalloc = 28,976, ProgramNativeMalloc = constant, Resident Size = 24,854,528 (constant) [CG/JACOBI]

[PetscMalloc and Resident Size as reported by PetscMallocGetCurrentUsage and 
PetscMemoryGetCurrentUsage (and summed across processes as needed);
ProgramNativeMalloc reported by program memory allocator.]
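
(The "summed across processes" step is essentially a reduction of the per-process 
values; a C sketch of what I mean, with the actual bookkeeping omitted:)

  #include <petscsys.h>

  /* Sketch: sum the current resident set size over all ranks and print it
     from rank 0 (PetscLogDouble is a double, hence MPI_DOUBLE). */
  static PetscErrorCode ReportTotalResidentSize(MPI_Comm comm)
  {
    PetscLogDouble mine, total;
    PetscErrorCode ierr;

    PetscFunctionBegin;
    ierr = PetscMemoryGetCurrentUsage(&mine);CHKERRQ(ierr);
    ierr = MPI_Reduce(&mine, &total, 1, MPI_DOUBLE, MPI_SUM, 0, comm);CHKERRQ(ierr);
    ierr = PetscPrintf(comm, "Total resident size: %g bytes\n", (double)total);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }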

When I run it in parallel mode with the PETSc solvers and num_processes=2, the 
resident memory grows steadily during the run:

PetscMalloc = 3,039,072 (constant), ProgramNativeMalloc = constant, Resident Size = (start) 24,698,880 (finish) 31,313,920 [CG/JACOBI]

When I run it in parallel mode with the PETSc solvers and num_processes=4, the 
resident memory grows steadily during the run:

PetscMalloc = 3,307,888 (constant), ProgramNativeMalloc = 1,427,584 (constant), Resident Size = (start) 45,801,472 (finish) 70,787,072 [CG/JACOBI]
PetscMalloc = 5,903,808 (constant), ProgramNativeMalloc = 1,427,584 (constant), Resident Size = (start) 52,076,544 (finish) 112,410,624 [GMRES/BJACOBI]
PetscMalloc = 3,188,944 (constant), ProgramNativeMalloc = 1,427,584 (constant), Resident Size = (start) 381,480,960 (finish) 712,798,208 [SUPERLU]
PetscMalloc = 6,539,408 (constant), ProgramNativeMalloc = 1,427,584 (constant), Resident Size = (start) 278,671,360 (finish) 591,048,704 [MUMPS]

The memory growth feels alarming but maybe I do not understand the values in 
ru_maxrss from getrusage().
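
For reference on ru_maxrss: it is a high-water mark (it never decreases over the life 
of the process), and its units differ by platform (bytes on macOS, kilobytes on Linux). 
A minimal check in C:

  #include <stdio.h>
  #include <sys/resource.h>

  int main(void)
  {
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    /* peak resident set size so far: bytes on macOS, kilobytes on Linux */
    printf("ru_maxrss = %ld\n", (long)ru.ru_maxrss);
    return 0;
  }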

My box (MacBook Pro) has a broken Valgrind, so I need to get to a system with a 
functional one; notwithstanding, the code has always been Valgrind clean.
There are no Fortran pointers or Fortran allocatable arrays in the part of the 
code being used.  The program's C memory allocator keeps track of itself, so I 
do not see that the problem is there.  The PETSc malloc is also steady.

Other random hints:

1) If I comment out the call to KSPSolve and the call to my MPI data-exchange 
routine (which passes solution values between processes after each solve, using 
MPI_Isend, MPI_Recv, and MPI_Barrier), the memory growth essentially goes away.

2) If I comment out the call to my MPI data-exchange routine but leave the call 
to KSPSolve, the problem remains but is substantially reduced for CG/JACOBI and 
marginally reduced for the GMRES/BJACOBI, SUPERLU, and MUMPS runs.

3) If I comment out the call to KSPSolve but leave the call to my MPI 
data-exchange routine, the problem remains.

Any suggestions/hints on where to look would be great.

-sanjay


Attachments: cg.png, gmres.png, psetb.F
