Re: [petsc-users] PETSc/SLEPc: Memory consumption, particularly during solver initialization/solve

2018-10-10 Thread Matthew Knepley
On Wed, Oct 10, 2018 at 3:38 AM Ale Foggia  wrote:

> Jed, Jose and Matthew,
> I've finally managed to make massif (it gives pretty detailed information,
> I like it) work correctly on the cluster and I'm able to track down the
> memory consumption. What's more important (for me), I think I can now make
> a more accurate prediction of the memory I need for a particular problem
> size. Thank you very much for all your answers and suggestions.
>

Great! Could you tell me in one line what was taking up memory? It is always
good to hear about real applications.

  Thanks,

Matt


> On Fri, Oct 5, 2018 at 9:38, Jose E. Roman () wrote:
>
>>
>>
>> > On Oct 4, 2018, at 19:54, Ale Foggia  wrote:
>> >
>> > Jose:
> > - By each step I mean each of the steps of the program needed to
>> diagonalize the matrix. For me, those are: creation of basis, preallocation
>> of matrix, setting values of matrix, initializing solver,
>> solving/diagonalizing and cleaning. I'm only diagonalizing once.
>> >
>> > - Regarding the information provided by -log_view, it's confusing for
>> me: for example, it reports the creation of Vecs scattered across the
>> various stages that I've set up (with PetscLogStageRegister and
>> PetscLogStagePush/Pop), but almost all the deletions are presented in the
>> "Main Stage". What does that "Main Stage" consider? Why are more deletions
>> in there that creations? It's nor completely for me clear how things are
>> presented there.
>>
>> I guess deletions should match creations. Seems to be related to using
>> stages. Maybe someone from PETSc can give an explanation, but looking at a
>> PETSc example that uses stages (e.g. dm/impls/plex/examples/tests/ex1.c) it
>> seems that some destructions are counted in the main stage while the
>> creation is counted in another stage - I guess it depends on the points
>> where the stages are defined. The sum of creations matches the sum of
>> destroys.
>>
>> >
>> > - Thanks for the suggestion about the solver. Does "faster convergence"
>> for Krylov-Schur mean less memory and less computation, or just less
>> computation?
>> >
>>
>> It should be about the same memory with fewer iterations.
>>
>> Jose
>>
>>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ 


Re: [petsc-users] PETSc/SLEPc: Memory consumption, particularly during solver initialization/solve

2018-10-10 Thread Ale Foggia
Jed, Jose and Matthew,
I've finally managed to make massif (it gives pretty detailed information,
I like it) work correctly on the cluster and I'm able to track down the
memory consumption. What's more important (for me), I think I can now make
a more accurate prediction of the memory I need for a particular problem
size. Thank you very much for all your answers and suggestions.

On Fri, Oct 5, 2018 at 9:38, Jose E. Roman () wrote:

>
>
> > On Oct 4, 2018, at 19:54, Ale Foggia  wrote:
> >
> > Jose:
> > - By each step I mean each of the steps of the program needed to
> diagonalize the matrix. For me, those are: creation of basis, preallocation
> of matrix, setting values of matrix, initializing solver,
> solving/diagonalizing and cleaning. I'm only diagonalizing once.
> >
> > - Regarding the information provided by -log_view, it's confusing for
> me: for example, it reports the creation of Vecs scattered across the
> various stages that I've set up (with PetscLogStageRegister and
> PetscLogStagePush/Pop), but almost all the deletions are presented in the
> "Main Stage". What does that "Main Stage" consider? Why are more deletions
> in there that creations? It's nor completely for me clear how things are
> presented there.
>
> I guess deletions should match creations. Seems to be related to using
> stages. Maybe someone from PETSc can give an explanation, but looking at a
> PETSc example that uses stages (e.g. dm/impls/plex/examples/tests/ex1.c) it
> seems that some destructions are counted in the main stage while the
> creation is counted in another stage - I guess it depends on the points
> where the stages are defined. The sum of creations matches the sum of
> destroys.
>
> >
> > - Thanks for the suggestion about the solver. Does "faster convergence"
> for Krylov-Schur mean less memory and less computation, or just less
> computation?
> >
>
> It should be about the same memory with fewer iterations.
>
> Jose
>
>


Re: [petsc-users] PETSc/SLEPc: Memory consumption, particularly during solver initialization/solve

2018-10-05 Thread Jose E. Roman



> On Oct 4, 2018, at 19:54, Ale Foggia  wrote:
> 
> Jose: 
> - By each step I mean each of the steps of the program needed to
> diagonalize the matrix. For me, those are: creation of basis, preallocation 
> of matrix, setting values of matrix, initializing solver, 
> solving/diagonalizing and cleaning. I'm only diagonalizing once. 
> 
> - Regarding the information provided by -log_view, it's confusing for me: for 
> example, it reports the creation of Vecs scattered across the various stages 
> that I've set up (with PetscLogStageRegister and PetscLogStagePush/Pop), but 
> almost all the deletions are presented in the "Main Stage". What does that 
> "Main Stage" consider? Why are more deletions in there that creations? It's 
> nor completely for me clear how things are presented there.

I guess deletions should match creations. Seems to be related to using stages. 
Maybe someone from PETSc can give an explanation, but looking at a PETSc 
example that uses stages (e.g. dm/impls/plex/examples/tests/ex1.c) it seems 
that some destructions are counted in the main stage while the creation is 
counted in another stage - I guess it depends on the points where the stages 
are defined. The sum of creations matches the sum of destroys.
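
A minimal sketch of how this can happen (assuming only that the object
outlives the stage it was created in):

PetscLogStage stage;
Vec           x;

PetscLogStageRegister("Setup", &stage);
PetscLogStagePush(stage);
VecCreateMPI(PETSC_COMM_WORLD, PETSC_DECIDE, 100, &x);  /* logged in "Setup" */
PetscLogStagePop();
/* ... */
VecDestroy(&x);  /* no stage pushed here, so logged in the "Main Stage" */

The creation is counted in the stage active at VecCreateMPI(), while the
destruction is counted in whichever stage is active at VecDestroy().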

> 
> - Thanks for the suggestion about the solver. Does "faster convergence" for 
> Krylov-Schur mean less memory and less computation, or just less computation? 
> 

It should be about the same memory with fewer iterations.

Jose



Re: [petsc-users] PETSc/SLEPc: Memory consumption, particularly during solver initialization/solve

2018-10-04 Thread Jed Brown
Matthew Knepley  writes:

> On Thu, Oct 4, 2018 at 1:54 PM Ale Foggia  wrote:
>
>> Thank you both for your answers :)
>>
>> Matt:
>> - Yes, sorry, I forgot to tell you that, but I've also called
>> PetscMemorySetGetMaximumUsage() right after initializing SLEPc. Also I've
>> seen a strange behaviour: if I run the same code on my computer and on the
>> cluster *without* the command line option -malloc_dump, in the cluster the
>> output of PetscMallocGetCurrentUsage and PetscMallocGetMaximumUsage is
>> always zero, but that doesn't happen on my computer.
>>
>> - This is the output of the code for the solving part (after EPSCreate and
>> after EPSSolve), and I've compared it with the output of *top* during those
>> moments of peak memory consumption. *top* provides in one of the columns
>> the resident set size (RES) and the numbers are around 1 GB per process,
>> while, of the numbers reported by the PETSc functions, the closest is the
>> one given by MemoryGetCurrentUsage, which is only 800 MB in the solving
>> stage. Maybe we can consider those numbers to be the same plus/minus
>> something? Is it safe to say that MemoryGetCurrentUsage is measuring the
>> "ru_maxrss" member of "rusage" (or something similar)? If
>> that's the case, what do the other functions report?
>>
>
> This is a perennial problem, since RSS is no guarantee of stuff that is
> actually being used, only of what was allocated at some point.

No, allocation alone does not make it resident on most operating
systems.  If you run top, you see a column VIRT (memory that was
allocated/mmap'd) and RES (actually resident in physical memory).

PetscMemoryGetCurrentUsage tries to get resident memory usage via
/proc/{PID}/statm or getrusage() ru_maxrss.

PetscMallocGetMaximumUsage just says how much memory has been allocated
using PETSc's tracing malloc (off by default with an optimized build,
but you can turn it on by running with -malloc or related diagnostic
options).
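
For reference, a minimal sketch of the getrusage() route (plain POSIX, not a
PETSc call; on Linux ru_maxrss is the peak resident set size in kilobytes):

#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
  struct rusage ru;
  getrusage(RUSAGE_SELF, &ru);                 /* stats for this process */
  printf("peak RSS: %ld kB\n", ru.ru_maxrss);  /* peak, not current, residency */
  return 0;
}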

> The best tool I have seen for this is Massif. I really recommend it:
>
>   http://valgrind.org/docs/manual/ms-manual.html
>
>   Thanks,
>
>  Matt
>
>
>>  SOLVER INIT 
>> MallocGetCurrent (init): 396096192.0 B
>> MallocGetMaximum (init): 415178624.0 B
>> MemoryGetCurrent (init): 624050176.0 B
>> MemoryGetMaximum (init): 623775744.0 B
>>  SOLVER 
>> MallocGetCurrent (solver): 560320256.0 B
>> MallocGetMaximum (solver): 560333440.0 B
>> MemoryGetCurrent (solver): 820961280.0 B
>> MemoryGetMaximum (solver): 623775744.0 B
>>
>> Jose:
>> - By each step I mean each of the steps of the program needed to
>> diagonalize the matrix. For me, those are: creation of basis, preallocation
>> of matrix, setting values of matrix, initializing solver,
>> solving/diagonalizing and cleaning. I'm only diagonalizing once.
>>
>> - Regarding the information provided by -log_view, it's confusing for me:
>> for example, it reports the creation of Vecs scattered across the various
>> stages that I've set up (with PetscLogStageRegister and
>> PetscLogStagePush/Pop), but almost all the deletions are presented in the
>> "Main Stage". What does that "Main Stage" consider? Why are more deletions
>> in there that creations? It's nor completely for me clear how things are
>> presented there.
>>
>> - Thanks for the suggestion about the solver. Does "faster convergence"
>> for Krylov-Schur mean less memory and less computation, or just less
>> computation?
>>
>> Ale
>>
>>
>> On Thu, Oct 4, 2018 at 13:12, Jose E. Roman () wrote:
>>
>>> Regarding the SLEPc part:
>>> - What do you mean by "each step"? Are you calling EPSSolve() several
>>> times?
>>> - Yes, the BV object is generally what takes most of the memory. It is
>>> allocated at the beginning of EPSSolve(). Depending on the solver/options,
>>> other memory may be allocated as well.
>>> - You can also see the memory reported at the end of -log_view
>>> - I would suggest using the default solver Krylov-Schur - it will do
>>> Lanczos with implicit restart, which will give faster convergence than the
>>> EPSLANCZOS solver.
>>>
>>> Jose
>>>
>>>
>>> > On Oct 4, 2018, at 12:49, Matthew Knepley  wrote:
>>> >
>>> > On Thu, Oct 4, 2018 at 4:43 AM Ale Foggia  wrote:
>>> > Hello all,
>>> >
>>> > I'm using SLEPc 3.9.2 (and PETSc 3.9.3) to get the EPS_SMALLEST_REAL of
>>> a matrix with the following characteristics:
>>> >
>>> > * type: real, Hermitian, sparse
>>> > * linear size: 2333606220
>>> > * distributed in 2048 processes (64 nodes, 32 procs per node)
>>> >
>>> > My code first preallocates the necessary memory with
>>> *MatMPIAIJSetPreallocation*, then fills it with the values and finally it
>>> calls the following functions to create the solver and diagonalize the
>>> matrix:
>>> >
>>> > EPSCreate(PETSC_COMM_WORLD, &solver);
>>> > EPSSetOperators(solver,matrix,NULL);
>>> > EPSSetProblemType(solver, EPS_HEP);
>>> > EPSSetType(solver, EPSLANCZOS);
>>> > 

Re: [petsc-users] PETSc/SLEPc: Memory consumption, particularly during solver initialization/solve

2018-10-04 Thread Matthew Knepley
On Thu, Oct 4, 2018 at 1:54 PM Ale Foggia  wrote:

> Thank you both for your answers :)
>
> Matt:
> - Yes, sorry, I forgot to tell you that, but I've also called
> PetscMemorySetGetMaximumUsage() right after initializing SLEPc. Also I've
> seen a strange behaviour: if I run the same code on my computer and on the
> cluster *without* the command line option -malloc_dump, in the cluster the
> output of PetscMallocGetCurrentUsage and PetscMallocGetMaximumUsage is
> always zero, but that doesn't happen on my computer.
>
> - This is the output of the code for the solving part (after EPSCreate and
> after EPSSolve), and I've compared it with the output of *top* during those
> moments of peak memory consumption. *top* provides in one of the columns
> the resident set size (RES) and the numbers are around 1 GB per process,
> while, of the numbers reported by the PETSc functions, the closest is the
> one given by MemoryGetCurrentUsage, which is only 800 MB in the solving
> stage. Maybe we can consider those numbers to be the same plus/minus
> something? Is it safe to say that MemoryGetCurrentUsage is measuring the
> "ru_maxrss" member of "rusage" (or something similar)? If
> that's the case, what do the other functions report?
>

This is a perennial problem, since RSS is no guarantee of stuff that is
actually being used, only of what was allocated at some point. The best tool I
have seen for this is Massif. I really recommend it:

  http://valgrind.org/docs/manual/ms-manual.html

  Thanks,

 Matt


>  SOLVER INIT 
> MallocGetCurrent (init): 396096192.0 B
> MallocGetMaximum (init): 415178624.0 B
> MemoryGetCurrent (init): 624050176.0 B
> MemoryGetMaximum (init): 623775744.0 B
>  SOLVER 
> MallocGetCurrent (solver): 560320256.0 B
> MallocGetMaximum (solver): 560333440.0 B
> MemoryGetCurrent (solver): 820961280.0 B
> MemoryGetMaximum (solver): 623775744.0 B
>
> Jose:
> - By each step I mean each of the steps of the program needed to
> diagonalize the matrix. For me, those are: creation of basis, preallocation
> of matrix, setting values of matrix, initializing solver,
> solving/diagonalizing and cleaning. I'm only diagonalizing once.
>
> - Regarding the information provided by -log_view, it's confusing for me:
> for example, it reports the creation of Vecs scattered across the various
> stages that I've set up (with PetscLogStageRegister and
> PetscLogStagePush/Pop), but almost all the deletions are presented in the
> "Main Stage". What does that "Main Stage" consider? Why are more deletions
> in there that creations? It's nor completely for me clear how things are
> presented there.
>
> - Thanks for the suggestion about the solver. Does "faster convergence"
> for Krylov-Schur mean less memory and less computation, or just less
> computation?
>
> Ale
>
>
> On Thu, Oct 4, 2018 at 13:12, Jose E. Roman () wrote:
>
>> Regarding the SLEPc part:
>> - What do you mean by "each step"? Are you calling EPSSolve() several
>> times?
>> - Yes, the BV object is generally what takes most of the memory. It is
>> allocated at the beginning of EPSSolve(). Depending on the solver/options,
>> other memory may be allocated as well.
>> - You can also see the memory reported at the end of -log_view
>> - I would suggest using the default solver Krylov-Schur - it will do
>> Lanczos with implicit restart, which will give faster convergence than the
>> EPSLANCZOS solver.
>>
>> Jose
>>
>>
>> > On Oct 4, 2018, at 12:49, Matthew Knepley  wrote:
>> >
>> > On Thu, Oct 4, 2018 at 4:43 AM Ale Foggia  wrote:
>> > Hello all,
>> >
>> > I'm using SLEPc 3.9.2 (and PETSc 3.9.3) to get the EPS_SMALLEST_REAL of
>> a matrix with the following characteristics:
>> >
>> > * type: real, Hermitian, sparse
>> > * linear size: 2333606220
>> > * distributed in 2048 processes (64 nodes, 32 procs per node)
>> >
>> > My code first preallocates the necessary memory with
>> *MatMPIAIJSetPreallocation*, then fills it with the values and finally it
>> calls the following functions to create the solver and diagonalize the
>> matrix:
>> >
>> > EPSCreate(PETSC_COMM_WORLD, &solver);
>> > EPSSetOperators(solver,matrix,NULL);
>> > EPSSetProblemType(solver, EPS_HEP);
>> > EPSSetType(solver, EPSLANCZOS);
>> > EPSSetWhichEigenpairs(solver, EPS_SMALLEST_REAL);
>> > EPSSetFromOptions(solver);
>> > EPSSolve(solver);
>> >
>> > I want to make an estimate, for larger problem sizes, of the memory
>> used by the program (at every step) because I would like to keep it under
>> 16 GB per node. I've used the "memory usage" functions provided by PETSc,
>> but something happens during the solver stage that I can't explain. This
>> brings up two questions.
>> >
>> > 1) In each step I put a call to four memory functions and between them
>> I print the value of mem:
>> >
>> > Did you call PetscMemorySetGetMaximumUsage() first?
>> >
>> > We are computing 

Re: [petsc-users] PETSc/SLEPc: Memory consumption, particularly during solver initialization/solve

2018-10-04 Thread Ale Foggia
Thank you both for your answers :)

Matt:
- Yes, sorry, I forgot to tell you that, but I've also called
PetscMemorySetGetMaximumUsage() right after initializing SLEPc. Also I've
seen a strange behaviour: if I run the same code on my computer and on the
cluster *without* the command line option -malloc_dump, in the cluster the
output of PetscMallocGetCurrentUsage and PetscMallocGetMaximumUsage is
always zero, but that doesn't happen on my computer.

- This is the output of the code for the solving part (after EPSCreate and
after EPSSolve), and I've compared it with the output of *top* during those
moments of peak memory consumption. *top* provides in one of the columns
the resident set size (RES) and the numbers are around 1 GB per process,
while, of the numbers reported by the PETSc functions, the closest is the
one given by MemoryGetCurrentUsage, which is only 800 MB in the solving
stage. Maybe we can consider those numbers to be the same plus/minus
something? Is it safe to say that MemoryGetCurrentUsage is measuring the
"ru_maxrss" member of "rusage" (or something similar)? If
that's the case, what do the other functions report?

 SOLVER INIT 
MallocGetCurrent (init): 396096192.0 B
MallocGetMaximum (init): 415178624.0 B
MemoryGetCurrent (init): 624050176.0 B
MemoryGetMaximum (init): 623775744.0 B
 SOLVER 
MallocGetCurrent (solver): 560320256.0 B
MallocGetMaximum (solver): 560333440.0 B
MemoryGetCurrent (solver): 820961280.0 B
MemoryGetMaximum (solver): 623775744.0 B

Jose:
- By each step I mean each of the steps of the program needed to
diagonalize the matrix. For me, those are: creation of basis, preallocation
of matrix, setting values of matrix, initializing solver,
solving/diagonalizing and cleaning. I'm only diagonalizing once.

- Regarding the information provided by -log_view, it's confusing for me:
for example, it reports the creation of Vecs scattered across the various
stages that I've set up (with PetscLogStageRegister and
PetscLogStagePush/Pop), but almost all the deletions are presented in the
"Main Stage". What does that "Main Stage" consider? Why are more deletions
in there that creations? It's nor completely for me clear how things are
presented there.

- Thanks for the suggestion about the solver. Does "faster convergence" for
Krylov-Schur mean less memory and less computation, or just less
computation?

Ale


On Thu, Oct 4, 2018 at 13:12, Jose E. Roman () wrote:

> Regarding the SLEPc part:
> - What do you mean by "each step"? Are you calling EPSSolve() several
> times?
> - Yes, the BV object is generally what takes most of the memory. It is
> allocated at the beginning of EPSSolve(). Depending on the solver/options,
> other memory may be allocated as well.
> - You can also see the memory reported at the end of -log_view
> - I would suggest using the default solver Krylov-Schur - it will do
> Lanczos with implicit restart, which will give faster convergence than the
> EPSLANCZOS solver.
>
> Jose
>
>
> > On Oct 4, 2018, at 12:49, Matthew Knepley  wrote:
> >
> > On Thu, Oct 4, 2018 at 4:43 AM Ale Foggia  wrote:
> > Hello all,
> >
> > I'm using SLEPc 3.9.2 (and PETSc 3.9.3) to get the EPS_SMALLEST_REAL of
> a matrix with the following characteristics:
> >
> > * type: real, Hermitian, sparse
> > * linear size: 2333606220
> > * distributed in 2048 processes (64 nodes, 32 procs per node)
> >
> > My code first preallocates the necessary memory with
> *MatMPIAIJSetPreallocation*, then fills it with the values and finally it
> calls the following functions to create the solver and diagonalize the
> matrix:
> >
> > EPSCreate(PETSC_COMM_WORLD, &solver);
> > EPSSetOperators(solver,matrix,NULL);
> > EPSSetProblemType(solver, EPS_HEP);
> > EPSSetType(solver, EPSLANCZOS);
> > EPSSetWhichEigenpairs(solver, EPS_SMALLEST_REAL);
> > EPSSetFromOptions(solver);
> > EPSSolve(solver);
> >
> > I want to make an estimate, for larger problem sizes, of the memory used
> by the program (at every step) because I would like to keep it under 16 GB
> per node. I've used the "memory usage" functions provided by PETSc, but
> something happens during the solver stage that I can't explain. This brings
> up two questions.
> >
> > 1) In each step I put a call to four memory functions and between them I
> print the value of mem:
> >
> > Did you call PetscMemorySetGetMaximumUsage() first?
> >
> > We are computing https://en.wikipedia.org/wiki/Resident_set_size
> however we can. Usually with getrusage().
> > From this (https://www.binarytides.com/linux-command-check-memory-usage/),
> it looks like top also reports
> > paged out memory.
> >
> >Matt
> >
> > mem = 0;
> > PetscMallocGetCurrentUsage(&mem);
> > PetscMallocGetMaximumUsage(&mem);
> > PetscMemoryGetCurrentUsage(&mem);
> > PetscMemoryGetMaximumUsage(&mem);
> >
> > I've read some other questions on the mailing list regarding the same
> issue but I can't fully understand 

Re: [petsc-users] PETSc/SLEPc: Memory consumption, particularly during solver initialization/solve

2018-10-04 Thread Jose E. Roman
Regarding the SLEPc part:
- What do you mean by "each step"? Are you calling EPSSolve() several times?
- Yes, the BV object is generally what takes most of the memory. It is 
allocated at the beginning of EPSSolve(). Depending on the solver/options, 
other memory may be allocated as well.
- You can also see the memory reported at the end of -log_view
- I would suggest using the default solver Krylov-Schur - it will do Lanczos 
with implicit restart, which will give faster convergence than the EPSLANCZOS 
solver.
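
In code, that just means omitting the EPSSetType() call (Krylov-Schur is the
default), or selecting it explicitly:

EPSSetType(solver, EPSKRYLOVSCHUR);  /* or simply leave EPSSetType() out */

Since your code calls EPSSetFromOptions(), the solver can also be switched at
run time with -eps_type krylovschur.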

Jose


> On Oct 4, 2018, at 12:49, Matthew Knepley  wrote:
> 
> On Thu, Oct 4, 2018 at 4:43 AM Ale Foggia  wrote:
> Hello all,
> 
> I'm using SLEPc 3.9.2 (and PETSc 3.9.3) to get the EPS_SMALLEST_REAL of a 
> matrix with the following characteristics:
> 
> * type: real, Hermitian, sparse
> * linear size: 2333606220 
> * distributed in 2048 processes (64 nodes, 32 procs per node)
> 
> My code first preallocates the necessary memory with 
> *MatMPIAIJSetPreallocation*, then fills it with the values and finally it 
> calls the following functions to create the solver and diagonalize the matrix:
> 
> EPSCreate(PETSC_COMM_WORLD, &solver);
> EPSSetOperators(solver,matrix,NULL);
> EPSSetProblemType(solver, EPS_HEP);
> EPSSetType(solver, EPSLANCZOS);
> EPSSetWhichEigenpairs(solver, EPS_SMALLEST_REAL);
> EPSSetFromOptions(solver);
> EPSSolve(solver);
> 
> I want to make an estimate, for larger problem sizes, of the memory used by
> the program (at every step) because I would like to keep it under 16 GB per 
> node. I've used the "memory usage" functions provided by PETSc, but something 
> happens during the solver stage that I can't explain. This brings up two 
> questions.
> 
> 1) In each step I put a call to four memory functions and between them I 
> print the value of mem:
> 
> Did you call PetscMemorySetGetMaximumUsage() first?
> 
> We are computing https://en.wikipedia.org/wiki/Resident_set_size however we 
> can. Usually with getrusage().
> From this (https://www.binarytides.com/linux-command-check-memory-usage/), it 
> looks like top also reports
> paged out memory.
> 
>Matt
>  
> mem = 0;
> PetscMallocGetCurrentUsage(&mem);
> PetscMallocGetMaximumUsage(&mem);
> PetscMemoryGetCurrentUsage(&mem);
> PetscMemoryGetMaximumUsage(&mem);
> 
> I've read some other questions on the mailing list regarding the same issue
> but I can't fully understand this. What is the difference between all of 
> them? What information are they actually giving me? (I know this is only a 
> "per process" output). I copy the output of two steps of the program as an 
> example:
> 
>  step N 
> MallocGetCurrent: 314513664.0 B
> MallocGetMaximum: 332723328.0 B
> MemoryGetCurrent: 539996160.0 B
> MemoryGetMaximum: 0.0 B
>  step N+1 
> MallocGetCurrent: 395902912.0 B
> MallocGetMaximum: 415178624.0 B
> MemoryGetCurrent: 623783936.0 B
> MemoryGetMaximum: 623775744.0 B
> 
> 2) I was using this information to make the calculation of the memory 
> required per node to run my problem. Also, I'm able to login to the computing 
> node while running and I can check the memory consumption (with *top*). The 
> memory used that I see with top is more or less the same as the one reported 
> by PETSc functions at the beginning. But during the initialization of the
> solver and during the solving, *top* reports a consumption two times bigger
> than the one the functions report. Is it possible to know where this
> extra memory consumption comes from? What things does SLEPc allocate that
> need that much memory? I've been trying to do the math but I think there are 
> things I'm missing. I thought that part of it comes from the "BV" that the 
> option -eps_view reports:
> 
> BV Object: 2048 MPI processes
>   type: svec
>   17 columns of global length 2333606220
>   vector orthogonalization method: modified Gram-Schmidt
>   orthogonalization refinement: if needed (eta: 0.7071)
>   block orthogonalization method: GS
>   doing matmult as a single matrix-matrix product
> 
> But "17 * 2333606220 * 8 Bytes / #nodes" only explains on third or less of 
> the "extra" memory.
> 
> Ale
> 
> 
> 
> -- 
> What most experimenters take for granted before they begin their experiments 
> is infinitely more interesting than any results to which their experiments 
> lead.
> -- Norbert Wiener
> 
> https://www.cse.buffalo.edu/~knepley/



Re: [petsc-users] PETSc/SLEPc: Memory consumption, particularly during solver initialization/solve

2018-10-04 Thread Matthew Knepley
On Thu, Oct 4, 2018 at 4:43 AM Ale Foggia  wrote:

> Hello all,
>
> I'm using SLEPc 3.9.2 (and PETSc 3.9.3) to get the EPS_SMALLEST_REAL of a
> matrix with the following characteristics:
>
> * type: real, Hermitian, sparse
> * linear size: 2333606220
> * distributed in 2048 processes (64 nodes, 32 procs per node)
>
> My code first preallocates the necessary memory with
> *MatMPIAIJSetPreallocation*, then fills it with the values and finally it
> calls the following functions to create the solver and diagonalize the
> matrix:
>
> EPSCreate(PETSC_COMM_WORLD, &solver);
> EPSSetOperators(solver,matrix,NULL);
> EPSSetProblemType(solver, EPS_HEP);
> EPSSetType(solver, EPSLANCZOS);
> EPSSetWhichEigenpairs(solver, EPS_SMALLEST_REAL);
> EPSSetFromOptions(solver);
> EPSSolve(solver);
>
> I want to make an estimate, for larger problem sizes, of the memory used
> by the program (at every step) because I would like to keep it under 16 GB
> per node. I've used the "memory usage" functions provided by PETSc, but
> something happens during the solver stage that I can't explain. This brings
> up two questions.
>
> 1) In each step I put a call to four memory functions and between them I
> print the value of mem:
>

Did you call PetscMemorySetGetMaximumUsage() first?
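
For reference, a minimal sketch of the intended call order (assuming
SlepcInitialize() is used for startup):

SlepcInitialize(&argc, &argv, NULL, NULL);
PetscMemorySetGetMaximumUsage();  /* without this, PetscMemoryGetMaximumUsage()
                                     can report 0 */

It has to come before the code whose peak usage you want measured.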

We are computing https://en.wikipedia.org/wiki/Resident_set_size however we
can. Usually with getrusage().
From this (https://www.binarytides.com/linux-command-check-memory-usage/),
it looks like top also reports
paged out memory.

   Matt


> mem = 0;
> PetscMallocGetCurrentUsage(&mem);
> PetscMallocGetMaximumUsage(&mem);
> PetscMemoryGetCurrentUsage(&mem);
> PetscMemoryGetMaximumUsage(&mem);
>
> I've read some other questions on the mailing list regarding the same issue
> but I can't fully understand this. What is the difference between all of
> them? What information are they actually giving me? (I know this is only a
> "per process" output). I copy the output of two steps of the program as an
> example:
>
>  step N 
> MallocGetCurrent: 314513664.0 B
> MallocGetMaximum: 332723328.0 B
> MemoryGetCurrent: 539996160.0 B
> MemoryGetMaximum: 0.0 B
>  step N+1 
> MallocGetCurrent: 395902912.0 B
> MallocGetMaximum: 415178624.0 B
> MemoryGetCurrent: 623783936.0 B
> MemoryGetMaximum: 623775744.0 B
>
> 2) I was using this information to make the calculation of the memory
> required per node to run my problem. Also, I'm able to login to the
> computing node while running and I can check the memory consumption (with
> *top*). The memory used that I see with top is more or less the same as the
> one reported by PETSc functions at the beginning. But during the
> initialization of the solver and during the solving, *top* reports a
> consumption two times bigger than the one the functions report. Is it
> possible to know where this extra memory consumption comes from? What
> things does SLEPc allocate that need that much memory? I've been trying to
> do the math but I think there are things I'm missing. I thought that part
> of it comes from the "BV" that the option -eps_view reports:
>
> BV Object: 2048 MPI processes
>   type: svec
>   17 columns of global length 2333606220
>   vector orthogonalization method: modified Gram-Schmidt
>   orthogonalization refinement: if needed (eta: 0.7071)
>   block orthogonalization method: GS
>   doing matmult as a single matrix-matrix product
>
> But "17 * 2333606220 * 8 Bytes / #nodes" only explains on third or less
> of the "extra" memory.
>
> Ale
>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ 


[petsc-users] PETSc/SLEPc: Memory consumption, particularly during solver initialization/solve

2018-10-04 Thread Ale Foggia
Hello all,

I'm using SLEPc 3.9.2 (and PETSc 3.9.3) to get the EPS_SMALLEST_REAL of a
matrix with the following characteristics:

* type: real, Hermitian, sparse
* linear size: 2333606220
* distributed in 2048 processes (64 nodes, 32 procs per node)

My code first preallocates the necessary memory with
*MatMPIAIJSetPreallocation*, then fills it with the values and finally it
calls the following functions to create the solver and diagonalize the
matrix:

EPSCreate(PETSC_COMM_WORLD, &solver);
EPSSetOperators(solver,matrix,NULL);
EPSSetProblemType(solver, EPS_HEP);
EPSSetType(solver, EPSLANCZOS);
EPSSetWhichEigenpairs(solver, EPS_SMALLEST_REAL);
EPSSetFromOptions(solver);
EPSSolve(solver);
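
For context, a minimal sketch of the preallocation step mentioned above,
where d_nz and o_nz (placeholder names) are per-row estimates of the
nonzeros in the diagonal and off-diagonal blocks:

MatCreate(PETSC_COMM_WORLD, &matrix);
MatSetSizes(matrix, PETSC_DECIDE, PETSC_DECIDE, N, N);
MatSetType(matrix, MATMPIAIJ);
MatMPIAIJSetPreallocation(matrix, d_nz, NULL, o_nz, NULL);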

I want to make an estimate, for larger problem sizes, of the memory used by
the program (at every step) because I would like to keep it under 16 GB per
node. I've used the "memory usage" functions provided by PETSc, but
something happens during the solver stage that I can't explain. This brings
up two questions.

1) In each step I put a call to four memory functions and between them I
print the value of mem:

mem = 0;
PetscMallocGetCurrentUsage(&mem);  /* bytes currently allocated via PetscMalloc */
PetscMallocGetMaximumUsage(&mem);  /* peak bytes allocated via PetscMalloc */
PetscMemoryGetCurrentUsage(&mem);  /* current resident set size */
PetscMemoryGetMaximumUsage(&mem);  /* peak resident set size; needs
                                      PetscMemorySetGetMaximumUsage() first */

I've read some other questions on the mailing list regarding the same issue
but I can't fully understand this. What is the difference between all of
them? What information are they actually giving me? (I know this is only a
"per process" output). I copy the output of two steps of the program as an
example:

 step N 
MallocGetCurrent: 314513664.0 B
MallocGetMaximum: 332723328.0 B
MemoryGetCurrent: 539996160.0 B
MemoryGetMaximum: 0.0 B
 step N+1 
MallocGetCurrent: 395902912.0 B
MallocGetMaximum: 415178624.0 B
MemoryGetCurrent: 623783936.0 B
MemoryGetMaximum: 623775744.0 B

2) I was using this information to make the calculation of the memory
required per node to run my problem. Also, I'm able to login to the
computing node while running and I can check the memory consumption (with
*top*). The memory used that I see with top is more or less the same as the
one reported by PETSc functions at the beginning. But during the
initialization of the solver and during the solving, *top* reports a
consumption two times bigger than the one the functions report. Is it
possible to know where this extra memory consumption comes from? What
things does SLEPc allocate that need that much memory? I've been trying to
do the math but I think there are things I'm missing. I thought that part
of it comes from the "BV" that the option -eps_view reports:

BV Object: 2048 MPI processes
  type: svec
  17 columns of global length 2333606220
  vector orthogonalization method: modified Gram-Schmidt
  orthogonalization refinement: if needed (eta: 0.7071)
  block orthogonalization method: GS
  doing matmult as a single matrix-matrix product

But "17 * 2333606220 * 8 Bytes / #nodes" only explains on third or less of
the "extra" memory.

Ale