On Tue, May 28, 2013 at 5:54 AM, Fande Kong <[email protected]> wrote:
> Hi Smith,
>
> Thank you very much. Following your suggestions, I added these functions
> to my code to measure the memory usage. Now I am confused, because a small
> problem needs a large amount of memory.
>
> I added the function PetscMemorySetGetMaximumUsage() immediately after
> PetscInitialize(). Then I added the following code at several positions
> in the code (before & after setting up the unstructured mesh, before &
> after KSPSetUp(), before & after KSPSolve(), and after destroying all
> objects):
>
> PetscLogDouble space = 0;
> ierr = PetscMallocGetCurrentUsage(&space);CHKERRQ(ierr);
> ierr = PetscPrintf(comm,"Current space PetscMalloc()ed %G M\n",
>                    space/(1024*1024));CHKERRQ(ierr);
> ierr = PetscMallocGetMaximumUsage(&space);CHKERRQ(ierr);
> ierr = PetscPrintf(comm,"Max space PetscMalloced() %G M\n",
>                    space/(1024*1024));CHKERRQ(ierr);
> ierr = PetscMemoryGetCurrentUsage(&space);CHKERRQ(ierr);
> ierr = PetscPrintf(comm,"Current process memory %G M\n",
>                    space/(1024*1024));CHKERRQ(ierr);
> ierr = PetscMemoryGetMaximumUsage(&space);CHKERRQ(ierr);
> ierr = PetscPrintf(comm,"Max process memory %G M\n",
>                    space/(1024*1024));CHKERRQ(ierr);
>
>
> In order to measure the memory usage, I used only one core (mpirun -n
> 1 ./program) to solve a small problem with 12691 mesh nodes (about
> 12691*3 = 4*10^4 degrees of freedom). I solve the linear elasticity
> problem using FGMRES preconditioned by a multigrid method (PCMG). I use
> all standard PETSc routines except that I construct the coarse matrix and
> the interpolation matrix myself.
> I used the following run script to set up solver and preconditioner:
>
> mpirun -n 1 ./linearElasticity -ksp_type fgmres -pc_type mg
> -pc_mg_levels 2 -pc_mg_cycle_type v -pc_mg_type multiplicative
> -mg_levels_1_ksp_type richardson -mg_levels_1_ksp_max_it 1
> -mg_levels_1_pc_type asm -mg_levels_1_sub_ksp_type preonly
> -mg_levels_1_sub_pc_type ilu -mg_levels_1_sub_pc_factor_levels 4
> -mg_levels_1_sub_pc_factor_mat_ordering_type rcm -mg_coarse_ksp_type cg
> -mg_coarse_ksp_rtol 0.1 -mg_coarse_ksp_max_it 10 -mg_coarse_pc_type asm
> -mg_coarse_sub_ksp_type preonly -mg_coarse_sub_pc_type ilu
> -mg_coarse_sub_pc_factor_levels 2
> -mg_coarse_sub_pc_factor_mat_ordering_type rcm -ksp_view -log_summary
> -pc_mg_log
>
>
> I got the following results:
>
> (1) before setting up mesh,
>
> Current space PetscMalloc()ed 0.075882 M
> Max space PetscMalloced() 0.119675 M
> Current process memory 7.83203 M
> Max process memory 0 M
>
> (2) after setting up mesh,
>
> Current space PetscMalloc()ed 16.8411 M
> Max space PetscMalloced() 22.1353 M
> Current process memory 28.4336 M
> Max process memory 33.0547 M
>
> (3) before calling KSPSetUp()
>
> Current space PetscMalloc()ed 16.868 M
> Max space PetscMalloced() 22.1353 M
> Current process memory 28.6914 M
> Max process memory 33.0547 M
>
>
> (4) after calling KSPSetUp()
>
> Current space PetscMalloc()ed 74.3354 M
> Max space PetscMalloced() 74.3355 M
>

This makes sense. It is 20M for your mesh, 20M for the Krylov space on the
fine level, and I am guessing 35M for the Jacobian and the ILU factors.

> Current process memory 85.6953 M
> Max process memory 84.9258 M
>
> (5) before calling KSPSolve()
>
> Current space PetscMalloc()ed 74.3354 M
> Max space PetscMalloced() 74.3355 M
> Current process memory 85.8711 M
> Max process memory 84.9258 M
>
> (6) after calling KSPSolve()
>

The question is what was malloc'd here. There is no way we could tell
without seeing the code and probably running it.
I suggest using
http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PetscMallocDump.html
to see what was allocated. The solvers tend not to allocate during the
solve, as that is slow. So I would be inclined to check user code first.

   Matt

> Current space PetscMalloc()ed 290.952 M
> Max space PetscMalloced() 593.367 M
> Current process memory 306.852 M
> Max process memory 301.441 M
>
> (7) After destroying all objects
>
> Current space PetscMalloc()ed 0.331482 M
> Max space PetscMalloced() 593.367 M
> Current process memory 67.2539 M
> Max process memory 309.137 M
>
>
> So my question is why, or whether, I need so much memory (306.852 M) for
> such a small problem (4*10^4 degrees of freedom). Is this normal? Or is
> the run script I used to set up the solver unreasonable?
>
>
> Regards,
>
> Fande Kong,
>
> Department of Computer Science
> University of Colorado Boulder
>
>
> On Mon, May 27, 2013 at 9:48 PM, Barry Smith <[email protected]> wrote:
>
>>
>> There are several ways to monitor the memory usage. You can divide
>> them into two categories: those that monitor how much memory has been
>> malloced specifically by PETSc and those that monitor how much is used
>> in total by the process.
>>
>> PetscMallocGetCurrentUsage() and PetscMallocGetMaximumUsage(), which only
>> work with the command line option -malloc, report how much PETSc has
>> malloced.
>>
>> PetscMemoryGetCurrentUsage() and PetscMemoryGetMaximumUsage() (call
>> PetscMemorySetGetMaximumUsage() immediately after PetscInitialize() for
>> this one to work) report total memory usage.
>>
>> These are called on each process, so use an MPI_Reduce() to gather the
>> total memory across all processes to process 0 to print it out. I suggest
>> calling them after the mesh has been set up, then again immediately
>> before XXXSolve() is called, and then after XXXSolve() is called.
>>
>> Please let us know if you have any difficulties.
>>
>> As always we recommend you upgrade to PETSc 3.4
>>
>> Barry
>>
>>
>>
>> On May 27, 2013, at 10:22 PM, Fande Kong <[email protected]> wrote:
>>
>> > Hi all,
>> >
>> > How can I measure the memory usage of an application built on PETSc?
>> > I am solving linear elasticity equations with FGMRES preconditioned by
>> > a two-level method, that is, a multigrid method where the additive
>> > Schwarz method is used on each level. More than 1000 cores are used to
>> > solve this problem on the supercomputer. When the total number of
>> > degrees of freedom is about 60M, the application runs correctly and
>> > produces correct results. But when the total increases to 600M, the
>> > application aborts and reports that there is not enough memory (the
>> > system administrator of the supercomputer told me that my application
>> > ran out of memory).
>> >
>> > Thus, I want to monitor the memory usage dynamically while the
>> > application is running. Are there any functions or strategies that
>> > could be used for this purpose?
>> >
>> > The error information is attached.
>> >
>> > Regards,
>> > --
>> > Fande Kong
>> > Department of Computer Science
>> > University of Colorado at Boulder
>> > <solid3dcube2.o1603352><configure and make log.zip>
>>
>>
>

--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener
