On Thu, Oct 4, 2018 at 1:54 PM Ale Foggia <amfog...@gmail.com> wrote:
> Thank you both for your answers :)
>
> Matt:
> - Yes, sorry, I forgot to tell you that, but I've also called
> PetscMemorySetGetMaximumUsage() right after initializing SLEPc. Also, I've
> seen a strange behaviour: if I run the same code on my computer and on the
> cluster *without* the command line option -malloc_dump, on the cluster the
> output of PetscMallocGetCurrentUsage and PetscMallocGetMaximumUsage is
> always zero, but that doesn't happen on my computer.
>
> - This is the output of the code for the solving part (after EPSCreate and
> after EPSSolve), and I've compared it with the output of *top* during those
> moments of peak memory consumption. *top* provides in one of the columns
> the resident set size (RES) and the numbers are around 1 GB per process,
> while, considering the numbers reported by the PETSc functions, the one
> that is most similar to that is given by MemoryGetCurrentUsage and is only
> 800 MB in the solving stage. Maybe we can consider that those numbers are
> the same plus/minus something? Is it safe to say that MemoryGetCurrentUsage
> is measuring the "ru_maxrss" member of "rusage" (or something similar)? If
> that's the case, what do the other functions report?
>

This is a perennial problem, since RSS does not tell you what is actually
being used, only what was allocated at some point. The best tool I have seen
for this is Massif.
I really recommend it: http://valgrind.org/docs/manual/ms-manual.html

Thanks,

Matt

> ==================== SOLVER INIT ====================
> MallocGetCurrent (init): 396096192.0 B
> MallocGetMaximum (init): 415178624.0 B
> MemoryGetCurrent (init): 624050176.0 B
> MemoryGetMaximum (init): 623775744.0 B
> ==================== SOLVER ====================
> MallocGetCurrent (solver): 560320256.0 B
> MallocGetMaximum (solver): 560333440.0 B
> MemoryGetCurrent (solver): 820961280.0 B
> MemoryGetMaximum (solver): 623775744.0 B
>
> Jose:
> - By each step I mean each of the steps of the program needed to
> diagonalize the matrix. For me, those are: creation of basis, preallocation
> of matrix, setting values of matrix, initializing solver,
> solving/diagonalizing and cleaning. I'm only diagonalizing once.
>
> - Regarding the information provided by -log_view, it's confusing for me:
> for example, it reports the creation of Vecs scattered across the various
> stages that I've set up (with PetscLogStageRegister and
> PetscLogStagePush/Pop), but almost all the deletions are presented in the
> "Main Stage". What does that "Main Stage" cover? Why are there more
> deletions in there than creations? It's not completely clear to me how
> things are presented there.
>
> - Thanks for the suggestion about the solver. Does "faster convergence"
> for Krylov-Schur mean less memory and less computation, or just less
> computation?
>
> Ale
>
>
> On Thu, Oct 4, 2018 at 13:12, Jose E. Roman (<jro...@dsic.upv.es>) wrote:
>
>> Regarding the SLEPc part:
>> - What do you mean by "each step"? Are you calling EPSSolve() several
>> times?
>> - Yes, the BV object is generally what takes most of the memory. It is
>> allocated at the beginning of EPSSolve(). Depending on the solver/options,
>> other memory may be allocated as well.
>> - You can also see the memory reported at the end of -log_view
>> - I would suggest using the default solver Krylov-Schur - it will do
>> Lanczos with implicit restart, which will give faster convergence than the
>> EPSLANCZOS solver.
>>
>> Jose
>>
>>
>> > On Oct 4, 2018, at 12:49, Matthew Knepley <knep...@gmail.com> wrote:
>> >
>> > On Thu, Oct 4, 2018 at 4:43 AM Ale Foggia <amfog...@gmail.com> wrote:
>> > Hello all,
>> >
>> > I'm using SLEPc 3.9.2 (and PETSc 3.9.3) to get the EPS_SMALLEST_REAL
>> > of a matrix with the following characteristics:
>> >
>> > * type: real, Hermitian, sparse
>> > * linear size: 2333606220
>> > * distributed over 2048 processes (64 nodes, 32 procs per node)
>> >
>> > My code first preallocates the necessary memory with
>> > *MatMPIAIJSetPreallocation*, then fills it with the values and finally
>> > it calls the following functions to create the solver and diagonalize
>> > the matrix:
>> >
>> > EPSCreate(PETSC_COMM_WORLD, &solver);
>> > EPSSetOperators(solver,matrix,NULL);
>> > EPSSetProblemType(solver, EPS_HEP);
>> > EPSSetType(solver, EPSLANCZOS);
>> > EPSSetWhichEigenpairs(solver, EPS_SMALLEST_REAL);
>> > EPSSetFromOptions(solver);
>> > EPSSolve(solver);
>> >
>> > I want to estimate the memory used by the program (at every step) for
>> > larger problem sizes, because I would like to keep it under 16 GB per
>> > node. I've used the "memory usage" functions provided by PETSc, but
>> > something happens during the solver stage that I can't explain. This
>> > brings up two questions.
>> >
>> > 1) In each step I put a call to four memory functions and between them
>> > I print the value of mem:
>> >
>> > Did you call PetscMemorySetGetMaximumUsage() first?
>> >
>> > We are computing https://en.wikipedia.org/wiki/Resident_set_size
>> > however we can. Usually with getrusage().
>> > From this
>> > (https://www.binarytides.com/linux-command-check-memory-usage/), it
>> > looks like top also reports paged out memory.
>> >
>> > Matt
>> >
>> > mem = 0;
>> > PetscMallocGetCurrentUsage(&mem);
>> > PetscMallocGetMaximumUsage(&mem);
>> > PetscMemoryGetCurrentUsage(&mem);
>> > PetscMemoryGetMaximumUsage(&mem);
>> >
>> > I've read some other questions on the mailing list regarding the same
>> > issue, but I can't fully understand this. What is the difference
>> > between all of them? What information are they actually giving me? (I
>> > know this is only a "per process" output). I copy the output of two
>> > steps of the program as an example:
>> >
>> > ==================== step N ====================
>> > MallocGetCurrent: 314513664.0 B
>> > MallocGetMaximum: 332723328.0 B
>> > MemoryGetCurrent: 539996160.0 B
>> > MemoryGetMaximum: 0.0 B
>> > ==================== step N+1 ====================
>> > MallocGetCurrent: 395902912.0 B
>> > MallocGetMaximum: 415178624.0 B
>> > MemoryGetCurrent: 623783936.0 B
>> > MemoryGetMaximum: 623775744.0 B
>> >
>> > 2) I was using this information to calculate the memory required per
>> > node to run my problem. Also, I'm able to log in to the computing node
>> > while running and I can check the memory consumption (with *top*). The
>> > memory usage that I see with top is more or less the same as the one
>> > reported by the PETSc functions at the beginning. But during the
>> > initialization of the solver and during the solving, *top* reports a
>> > consumption two times bigger than the one the functions report. Is it
>> > possible to know where this extra memory consumption comes from? What
>> > things does SLEPc allocate that need that much memory? I've been trying
>> > to do the math but I think there are things I'm missing.
>> > I thought that part of it comes from the "BV" that the option
>> > -eps_view reports:
>> >
>> > BV Object: 2048 MPI processes
>> > type: svec
>> > 17 columns of global length 2333606220
>> > vector orthogonalization method: modified Gram-Schmidt
>> > orthogonalization refinement: if needed (eta: 0.7071)
>> > block orthogonalization method: GS
>> > doing matmult as a single matrix-matrix product
>> >
>> > But "17 * 2333606220 * 8 Bytes / #nodes" only explains one third or
>> > less of the "extra" memory.
>> >
>> > Ale
>> >
>> >
>> >
>> > --
>> > What most experimenters take for granted before they begin their
>> > experiments is infinitely more interesting than any results to which
>> > their experiments lead.
>> > -- Norbert Wiener
>> >
>> > https://www.cse.buffalo.edu/~knepley/
>>

--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/