Re: [petsc-users] PETSc/SLEPc: Memory consumption, particularly during solver initialization/solve

Matthew Knepley Thu, 04 Oct 2018 11:58:39 -0700

On Thu, Oct 4, 2018 at 1:54 PM Ale Foggia <amfog...@gmail.com> wrote:


> Thank you both for your answers :)
>
> Matt:
> - Yes, sorry, I forgot to mention it: I also call
> PetscMemorySetGetMaximumUsage() right after initializing SLEPc. I've also
> seen a strange behaviour: if I run the same code on my computer and on the
> cluster *without* the command line option -malloc_dump, on the cluster the
> output of PetscMallocGetCurrentUsage and PetscMallocGetMaximumUsage is
> always zero, but that doesn't happen on my computer.
>
> - This is the output of the code for the solving part (after EPSCreate and
> after EPSSolve). I've compared it with the output of *top* at the moments
> of peak memory consumption. *top* reports the resident set size (RES) in
> one of its columns, and the numbers are around 1 GB per process. Of the
> numbers reported by the PETSc functions, the closest one comes from
> PetscMemoryGetCurrentUsage and is only about 800 MB in the solving stage.
> Can we consider those numbers the same, plus or minus something? Is it safe
> to say that PetscMemoryGetCurrentUsage is measuring the "ru_maxrss" member
> of "rusage" (or something similar)? If that's the case, what do the other
> functions report?
>

This is a perennial problem, since RSS does not tell you what is actually
being used, only what was allocated at some point. The best tool I have
seen for this is Massif. I really recommend it:

  http://valgrind.org/docs/manual/ms-manual.html
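
For a quick cross-check of the PETSc numbers against top, here is a minimal
sketch in plain C. It is not PETSc's implementation. It assumes Linux, where
ru_maxrss is reported in kilobytes and /proc/self/statm exists; the helper
names peak_rss_bytes and current_rss_bytes are made up for this example. It
reads the peak RSS via getrusage() and the current RSS from /proc:

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>
#include <unistd.h>

/* Peak resident set size of the calling process, in bytes.
 * Assumption: Linux, where ru_maxrss is reported in kilobytes. */
long long peak_rss_bytes(void)
{
    struct rusage ru;
    int rc = getrusage(RUSAGE_SELF, &ru);
    assert(rc == 0);
    (void)rc; /* silence unused warning when built with NDEBUG */
    return (long long)ru.ru_maxrss * 1024LL;
}

/* Current resident set size in bytes, read from /proc/self/statm
 * (second field, counted in pages). Returns -1 if /proc is unavailable. */
long long current_rss_bytes(void)
{
    long long size_pages = 0, resident_pages = 0;
    FILE *f = fopen("/proc/self/statm", "r");
    if (f == NULL) return -1;
    int n = fscanf(f, "%lld %lld", &size_pages, &resident_pages);
    fclose(f);
    if (n != 2) return -1;
    return resident_pages * (long long)sysconf(_SC_PAGESIZE);
}
```

Printing both values at the points where the PETSc functions are called may
help tell whether a gap to top comes from the peak/current distinction or
from memory that is not tracked by PetscMalloc.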

  Thanks,

     Matt


> ==================== SOLVER INIT ====================
> MallocGetCurrent (init): 396096192.0 B
> MallocGetMaximum (init): 415178624.0 B
> MemoryGetCurrent (init): 624050176.0 B
> MemoryGetMaximum (init): 623775744.0 B
> ==================== SOLVER ====================
> MallocGetCurrent (solver): 560320256.0 B
> MallocGetMaximum (solver): 560333440.0 B
> MemoryGetCurrent (solver): 820961280.0 B
> MemoryGetMaximum (solver): 623775744.0 B
>
> Jose:
> - By each step I mean each of the steps the program takes in order to
> diagonalize the matrix. For me, those are: creation of basis, preallocation
> of matrix, setting values of matrix, initializing solver,
> solving/diagonalizing and cleaning. I'm only diagonalizing once.
>
> - Regarding the information provided by -log_view, it's confusing for me:
> for example, it reports the creation of Vecs scattered across the various
> stages that I've set up (with PetscLogStageRegister and
> PetscLogStagePush/Pop), but almost all the deletions are presented in the
> "Main Stage". What does that "Main Stage" cover? Why are there more
> deletions there than creations? It's not completely clear to me how things
> are presented there.
>
> - Thanks for the suggestion about the solver. Does "faster convergence"
> for Krylov-Schur mean less memory and less computation, or just less
> computation?
>
> Ale
>
>
> On Thu, 4 Oct 2018 at 13:12, Jose E. Roman (<jro...@dsic.upv.es>)
> wrote:
>
>> Regarding the SLEPc part:
>> - What do you mean by "each step"? Are you calling EPSSolve() several
>> times?
>> - Yes, the BV object is generally what takes most of the memory. It is
>> allocated at the beginning of EPSSolve(). Depending on the solver/options,
>> other memory may be allocated as well.
>> - You can also see the memory reported at the end of -log_view
>> - I would suggest using the default solver Krylov-Schur - it will do
>> Lanczos with implicit restart, which will give faster convergence than the
>> EPSLANCZOS solver.
>>
>> Jose
>>
>>
>> > On 4 Oct 2018, at 12:49, Matthew Knepley <knep...@gmail.com>
>> wrote:
>> >
>> > On Thu, Oct 4, 2018 at 4:43 AM Ale Foggia <amfog...@gmail.com> wrote:
>> > Hello all,
>> >
>> > I'm using SLEPc 3.9.2 (and PETSc 3.9.3) to get the EPS_SMALLEST_REAL of
>> a matrix with the following characteristics:
>> >
>> > * type: real, Hermitian, sparse
>> > * linear size: 2333606220
>> > * distributed in 2048 processes (64 nodes, 32 procs per node)
>> >
>> > My code first preallocates the necessary memory with
>> *MatMPIAIJSetPreallocation*, then fills it with the values and finally it
>> calls the following functions to create the solver and diagonalize the
>> matrix:
>> >
>> > EPSCreate(PETSC_COMM_WORLD, &solver);
>> > EPSSetOperators(solver,matrix,NULL);
>> > EPSSetProblemType(solver, EPS_HEP);
>> > EPSSetType(solver, EPSLANCZOS);
>> > EPSSetWhichEigenpairs(solver, EPS_SMALLEST_REAL);
>> > EPSSetFromOptions(solver);
>> > EPSSolve(solver);
>> >
>> > I want to make an estimation for larger size problems of the memory
>> used by the program (at every step) because I would like to keep it under
>> 16 GB per node. I've used the "memory usage" functions provided by PETSc,
>> but something happens during the solver stage that I can't explain. This
>> brings up two questions.
>> >
>> > 1) In each step I put a call to four memory functions and between them
>> I print the value of mem:
>> >
>> > Did you call PetscMemorySetGetMaximumUsage() first?
>> >
>> > We are computing https://en.wikipedia.org/wiki/Resident_set_size
>> however we can. Usually with getrusage().
>> > From this (
>> https://www.binarytides.com/linux-command-check-memory-usage/), it looks
>> like top also reports
>> > paged out memory.
>> >
>> >    Matt
>> >
>> > mem = 0;
>> > PetscMallocGetCurrentUsage(&mem);
>> > PetscMallocGetMaximumUsage(&mem);
>> > PetscMemoryGetCurrentUsage(&mem);
>> > PetscMemoryGetMaximumUsage(&mem);
>> >
>> > I've read some other questions on the mailing list regarding the same
>> issue but I can't fully understand this. What is the difference between all
>> of them? What information are they actually giving me? (I know this is only
>> a "per process" output.) I copy the output of two steps of the program as
>> an example:
>> >
>> > ==================== step N ====================
>> > MallocGetCurrent: 314513664.0 B
>> > MallocGetMaximum: 332723328.0 B
>> > MemoryGetCurrent: 539996160.0 B
>> > MemoryGetMaximum: 0.0 B
>> > ==================== step N+1 ====================
>> > MallocGetCurrent: 395902912.0 B
>> > MallocGetMaximum: 415178624.0 B
>> > MemoryGetCurrent: 623783936.0 B
>> > MemoryGetMaximum: 623775744.0 B
>> >
>> > 2) I was using this information to calculate the memory
>> required per node to run my problem. I'm also able to log in to the
>> computing node while running and check the memory consumption (with
>> *top*). The memory usage that I see with top is more or less the same as the
>> one reported by the PETSc functions at the beginning. But during the
>> initialization of the solver and during the solve, *top* reports a
>> consumption twice as big as the one the functions report. Is it
>> possible to know where this extra memory consumption comes from? What
>> does SLEPc allocate that needs that much memory? I've been trying to
>> do the math but I think there are things I'm missing. I thought that part
>> of it comes from the "BV" that the option -eps_view reports:
>> >
>> > BV Object: 2048 MPI processes
>> >   type: svec
>> >   17 columns of global length 2333606220
>> >   vector orthogonalization method: modified Gram-Schmidt
>> >   orthogonalization refinement: if needed (eta: 0.7071)
>> >   block orthogonalization method: GS
>> >   doing matmult as a single matrix-matrix product
>> >
>> > But "17 * 2333606220 * 8 Bytes / #nodes" only explains one third or less
>> of the "extra" memory.
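
The BV estimate above can be written out as a small calculation. This is a
sketch under stated assumptions, not how SLEPc accounts for memory: it
assumes the rows are split as evenly as possible over the processes, that
only the dense basis columns are counted, and that a scalar takes 8 bytes.
The function names bv_bytes_per_process and bv_bytes_per_node are
hypothetical.

```c
#include <assert.h>

/* Bytes needed on the most loaded process to store `ncols` dense columns of
 * global length `nglobal`, assuming rows are split as evenly as possible
 * over `nprocs` processes and each entry takes `scalar_bytes` bytes. */
long long bv_bytes_per_process(long long ncols, long long nglobal,
                               long long nprocs, long long scalar_bytes)
{
    assert(ncols > 0 && nglobal > 0 && nprocs > 0 && scalar_bytes > 0);
    long long local_rows = (nglobal + nprocs - 1) / nprocs; /* ceiling division */
    return ncols * local_rows * scalar_bytes;
}

/* Upper bound for one node: the per-process estimate times the number of
 * processes placed on that node. */
long long bv_bytes_per_node(long long ncols, long long nglobal,
                            long long nprocs, long long scalar_bytes,
                            long long procs_per_node)
{
    assert(procs_per_node > 0);
    return procs_per_node *
           bv_bytes_per_process(ncols, nglobal, nprocs, scalar_bytes);
}
```

With the values from this thread (17 columns, global length 2333606220, 2048
processes, 32 processes per node), bv_bytes_per_process(17, 2333606220LL,
2048, 8) gives 154966152 B, roughly 155 MB per process, and
bv_bytes_per_node(..., 32) gives 4958916864 B, roughly 5 GB per node. For
comparison, MallocGetCurrent in the output quoted earlier in this message
grows by 560320256 - 396096192 = 164224064 B between SOLVER INIT and SOLVER,
which is of the same order.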
>> >
>> > Ale
>> >
>> >
>> >
>> > --
>> > What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> > -- Norbert Wiener
>> >
>> > https://www.cse.buffalo.edu/~knepley/
>>
>>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/

Re: [petsc-users] PETSc/SLEPc: Memory consumption, particularly during solver initialization/solve
