The memory status after running DMMGSetSNESLocal() is
Mem: 2033752k total, 1821800k used, 211952k free, 5944k buffers
Then, when it calls DMMGSolve(), the memory gets used up... till corruption.

R

Quoting "(Rebecca) Xuefei YUAN" <xy2102 at columbia.edu>:

> Dear Matt,
>
> I ran the code on a 321*321 grid, with dof=4. The matrix is a sparse
> matrix of type aij.
>
> After the calls that set up the user-defined options, the memory status is
> Mem: 2033752k total, 607456k used, 1426296k free, 4832k buffers
>
> ierr = DMMGCreate(comm, parameters.numberOfLevels, &appCtx, &dmmg);CHKERRQ(ierr);
> ierr = DACreate2d(comm,DA_NONPERIODIC,DA_STENCIL_BOX, -5, -5, PETSC_DECIDE, PETSC_DECIDE, 4, 2, 0, 0, &da);CHKERRQ(ierr);
> ierr = DMMGSetDM(dmmg, (DM)da);CHKERRQ(ierr);
> ierr = DASetFieldName(DMMGGetDA(dmmg), 0, "phi");CHKERRQ(ierr);
> ierr = DASetFieldName(DMMGGetDA(dmmg), 1, "vz");CHKERRQ(ierr);
> ierr = DASetFieldName(DMMGGetDA(dmmg), 2, "psi");CHKERRQ(ierr);
> ierr = DASetFieldName(DMMGGetDA(dmmg), 3, "bz");CHKERRQ(ierr);
>
> Before DAGetMatrix() is called, the memory status is
> Mem: 2033752k total, 642940k used, 1390812k free, 4972k buffers
>
> ierr = DAGetMatrix(DMMGGetDA(dmmg), MATAIJ, &jacobian);CHKERRQ(ierr);
>
> In gdb, it uses around 500M of memory after DAGetMatrix(), which I do not
> think is right, since for a sparse matrix with 13 nonzeros per row,
> the memory it needs should be 321*321*4(dof)*13(nonzeros per
> row)*8(PetscReal) = 42865056 bytes ~ 40M. It is strange.
> I.e., after the DAGetMatrix() call, the memory status is
>
> Mem: 2033752k total, 1152032k used, 881720k free, 5072k buffers
>
> Then, when it reaches the call to DMMGSetSNESLocal(), I found that memory
> keeps being used until the corruption message appears.
>
> ierr = DMMGSetSNESLocal(dmmg, FormFunctionLocal, FormJacobianLocal,0,0);CHKERRQ(ierr);
>
> The memory corruption happens. The error message is:
>
> 0 SNES Function norm 4.925849247379e-03
> [0]PETSC ERROR: --------------------- Error Message ------------------------------------
> [0]PETSC ERROR: Out of memory. This could be due to allocating
> [0]PETSC ERROR: too large an object or bleeding by not properly
> [0]PETSC ERROR: destroying unneeded objects.
> [0]PETSC ERROR: Memory allocated 2134287012 Memory used by process 1630638080
> [0]PETSC ERROR: Try running with -malloc_dump or -malloc_log for info.
> [0]PETSC ERROR: Memory requested 327270824!
> [0]PETSC ERROR: ------------------------------------------------------------------------
> [0]PETSC ERROR: Petsc Release Version 3.0.0, Patch 1, Thu Jan 1 13:54:27 CST 2009
> [0]PETSC ERROR: See docs/changes/index.html for recent updates.
> [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
> [0]PETSC ERROR: See docs/index.html for manual pages.
> [0]PETSC ERROR: ------------------------------------------------------------------------
> [0]PETSC ERROR: /home/rebecca/linux/code/physics/qffxmhd/tests/qffxmhd3 on a linux-gnu named YuanWork by rebecca Mon Jul 27 17:49:53 2009
> [0]PETSC ERROR: Libraries linked from /home/rebecca/soft/petsc-3.0.0-p1/linux-gnu-c-debug/lib
> [0]PETSC ERROR: Configure run at Mon Apr 20 16:41:56 2009
> [0]PETSC ERROR: Configure options --with-blas-lapack-dir=./externalpackages/fblaslapack-3.1.1/ --download-mpich=1 --with-shared=0
> [0]PETSC ERROR: ------------------------------------------------------------------------
> [0]PETSC ERROR: PetscMallocAlign() line 61 in src/sys/memory/mal.c
> [0]PETSC ERROR: PetscTrMallocDefault() line 194 in src/sys/memory/mtr.c
> [0]PETSC ERROR: MatDuplicateNoCreate_SeqAIJ() line 3402 in src/mat/impls/aij/seq/aij.c
> [0]PETSC ERROR: MatILUFactorSymbolic_SeqAIJ() line 1241 in src/mat/impls/aij/seq/aijfact.c
> [0]PETSC ERROR: MatILUFactorSymbolic() line 5243 in src/mat/interface/matrix.c
> [0]PETSC ERROR: PCSetUp_ILU() line 293 in src/ksp/pc/impls/factor/ilu/ilu.c
> [0]PETSC ERROR: PCSetUp() line 794 in src/ksp/pc/interface/precon.c
> [0]PETSC ERROR: KSPSetUp() line 237 in src/ksp/ksp/interface/itfunc.c
> [0]PETSC ERROR: PCSetUp_MG() line 516 in src/ksp/pc/impls/mg/mg.c
> [0]PETSC ERROR: PCSetUp() line 794 in src/ksp/pc/interface/precon.c
> [0]PETSC ERROR: KSPSetUp() line 237 in src/ksp/ksp/interface/itfunc.c
> [0]PETSC ERROR: KSPSolve() line 353 in src/ksp/ksp/interface/itfunc.c
> [0]PETSC ERROR: SNES_KSPSolve() line 2899 in src/snes/interface/snes.c
> [0]PETSC ERROR: SNESSolve_LS() line 191 in src/snes/impls/ls/ls.c
> [0]PETSC ERROR: SNESSolve() line 2221 in src/snes/interface/snes.c
> [0]PETSC ERROR: DMMGSolveSNES() line 510 in src/snes/utils/damgsnes.c
> [0]PETSC ERROR: DMMGSolve() line 372 in src/snes/utils/damg.c
> [0]PETSC ERROR: Solve() line 318 in qffxmhd.c
> [0]PETSC ERROR: main() line 172 in qffxmhd.c
> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0[unset]: aborting job:
> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>
> Program exited with code 01.
>
> I thought it might be because of the unfreed memory, so I picked up ex29.c
> as a comparison.
>
> Thanks,
>
> Rebecca
>
>
> Quoting Matthew Knepley <knepley at gmail.com>:
>
>> On Mon, Jul 27, 2009 at 4:34 PM, (Rebecca) Xuefei YUAN
>> <xy2102 at columbia.edu> wrote:
>>
>>> Those unfreed bytes cause "out of memory" when it runs at bigger grid
>>> sizes. So I have to find that unfreed memory and free it... Any
>>> suggestions?
>>
>> Not from what you mailed in. On that DA line, I see PetscHeaderCreate(). Is
>> that what you see?
>>
>> Matt
>>
>>> Thanks,
>>>
>>> R
>>>
>>> Quoting Matthew Knepley <knepley at gmail.com>:
>>>
>>>> On Mon, Jul 27, 2009 at 3:51 PM, (Rebecca) Xuefei YUAN
>>>> <xy2102 at columbia.edu> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> My own code has some bytes still reachable according to valgrind. When
>>>>> I use two different versions of petsc (2.3.3-p15 and 3.0.0-p1) to compile
>>>>> and make the files, I get different numbers of bytes still reachable.
>>>>> Moreover, I picked /snes/example/tutorials/ex29.c as another example
>>>>> and found that some bytes are still reachable. What is the cause of it?
>>>>> It shows that it is from DACreate2d(), and then I use -malloc_dump to get
>>>>> the information on the unfreed memory.
>>>>>
>>>>> I understand that of those 5 loss records, the 2nd, 3rd and 4th are true
>>>>> for all examples, but where do the 1st and 5th ones come from? Also, the
>>>>> -malloc_dump information shows that there are
>>>>> "[0]Total space allocated 37780 bytes",
>>>>> but valgrind gives the information as
>>>>> "==26628== still reachable: 132,828 bytes in 323 blocks"
>>>>>
>>>>> Why is there such a big difference?
>>>>
>>>> 1 is fine.
>>>> It is from PMPI setup, which has some bytes not freed from
>>>> setting up the MPI processes. The last one looks like an unfreed
>>>> header for a DA, which is strange.
>>>>
>>>> Matt
>>>>
>>>>> Thanks very much!
>>>>>
>>>>> Rebecca
>>>>>
>>>>> Here is the message from valgrind of running ex29:
>>>>> ==26628== 32 bytes in 2 blocks are still reachable in loss record 1 of 5
>>>>> ==26628==    at 0x4022AB8: malloc (vg_replace_malloc.c:207)
>>>>> ==26628==    by 0x86F9A78: MPID_VCRT_Create (mpid_vc.c:62)
>>>>> ==26628==    by 0x86F743A: MPID_Init (mpid_init.c:116)
>>>>> ==26628==    by 0x86D040B: MPIR_Init_thread (initthread.c:288)
>>>>> ==26628==    by 0x86CFF2D: PMPI_Init (init.c:106)
>>>>> ==26628==    by 0x8613D69: PetscInitialize (pinit.c:503)
>>>>> ==26628==    by 0x804B796: main (ex29.c:139)
>>>>> ==26628==
>>>>> ==26628==
>>>>> ==26628== 156 (36 direct, 120 indirect) bytes in 1 blocks are definitely lost in loss record 2 of 5
>>>>> ==26628==    at 0x4022AB8: malloc (vg_replace_malloc.c:207)
>>>>> ==26628==    by 0x429B3E2: (within /lib/tls/i686/cmov/libc-2.7.so)
>>>>> ==26628==    by 0x429BC2D: __nss_database_lookup (in /lib/tls/i686/cmov/libc-2.7.so)
>>>>> ==26628==    by 0x4732FDB: ???
>>>>> ==26628==    by 0x473413C: ???
>>>>> ==26628==    by 0x4247D15: getpwuid_r (in /lib/tls/i686/cmov/libc-2.7.so)
>>>>> ==26628==    by 0x424765D: getpwuid (in /lib/tls/i686/cmov/libc-2.7.so)
>>>>> ==26628==    by 0x8623509: PetscGetUserName (fuser.c:68)
>>>>> ==26628==    by 0x85E0CF0: PetscErrorPrintfInitialize (errtrace.c:68)
>>>>> ==26628==    by 0x8613E23: PetscInitialize (pinit.c:518)
>>>>> ==26628==    by 0x804B796: main (ex29.c:139)
>>>>> ==26628==
>>>>> ==26628==
>>>>> ==26628== 40 bytes in 5 blocks are indirectly lost in loss record 3 of 5
>>>>> ==26628==    at 0x4022AB8: malloc (vg_replace_malloc.c:207)
>>>>> ==26628==    by 0x429AFBB: __nss_lookup_function (in /lib/tls/i686/cmov/libc-2.7.so)
>>>>> ==26628==    by 0x4732FFB: ???
>>>>> ==26628==    by 0x473413C: ???
>>>>> ==26628==    by 0x4247D15: getpwuid_r (in /lib/tls/i686/cmov/libc-2.7.so)
>>>>> ==26628==    by 0x424765D: getpwuid (in /lib/tls/i686/cmov/libc-2.7.so)
>>>>> ==26628==    by 0x8623509: PetscGetUserName (fuser.c:68)
>>>>> ==26628==    by 0x85E0CF0: PetscErrorPrintfInitialize (errtrace.c:68)
>>>>> ==26628==    by 0x8613E23: PetscInitialize (pinit.c:518)
>>>>> ==26628==    by 0x804B796: main (ex29.c:139)
>>>>> ==26628==
>>>>> ==26628==
>>>>> ==26628== 80 bytes in 5 blocks are indirectly lost in loss record 4 of 5
>>>>> ==26628==    at 0x4022AB8: malloc (vg_replace_malloc.c:207)
>>>>> ==26628==    by 0x428839B: tsearch (in /lib/tls/i686/cmov/libc-2.7.so)
>>>>> ==26628==    by 0x429AF7D: __nss_lookup_function (in /lib/tls/i686/cmov/libc-2.7.so)
>>>>> ==26628==    by 0x4732FFB: ???
>>>>> ==26628==    by 0x473413C: ???
>>>>> ==26628==    by 0x4247D15: getpwuid_r (in /lib/tls/i686/cmov/libc-2.7.so)
>>>>> ==26628==    by 0x424765D: getpwuid (in /lib/tls/i686/cmov/libc-2.7.so)
>>>>> ==26628==    by 0x8623509: PetscGetUserName (fuser.c:68)
>>>>> ==26628==    by 0x85E0CF0: PetscErrorPrintfInitialize (errtrace.c:68)
>>>>> ==26628==    by 0x8613E23: PetscInitialize (pinit.c:518)
>>>>> ==26628==    by 0x804B796: main (ex29.c:139)
>>>>> ==26628==
>>>>> ==26628==
>>>>> ==26628== 132,796 bytes in 321 blocks are still reachable in loss record 5 of 5
>>>>> ==26628==    at 0x4022AB8: malloc (vg_replace_malloc.c:207)
>>>>> ==26628==    by 0x85EF3AC: PetscMallocAlign (mal.c:40)
>>>>> ==26628==    by 0x85F049B: PetscTrMallocDefault (mtr.c:194)
>>>>> ==26628==    by 0x81BCD3F: DACreate2d (da2.c:364)
>>>>> ==26628==    by 0x804BAFB: main (ex29.c:153)
>>>>> ==26628==
>>>>> ==26628== LEAK SUMMARY:
>>>>> ==26628==    definitely lost: 36 bytes in 1 blocks.
>>>>> ==26628==    indirectly lost: 120 bytes in 10 blocks.
>>>>> ==26628==    possibly lost: 0 bytes in 0 blocks.
>>>>> ==26628==    still reachable: 132,828 bytes in 323 blocks.
>>>>> ==26628==    suppressed: 0 bytes in 0 blocks.
>>>>>
>>>>> --
>>>>> (Rebecca) Xuefei YUAN
>>>>> Department of Applied Physics and Applied Mathematics
>>>>> Columbia University
>>>>> Tel:917-399-8032
>>>>> www.columbia.edu/~xy2102
>>>>
>>>> --
>>>> What most experimenters take for granted before they begin their experiments
>>>> is infinitely more interesting than any results to which their experiments
>>>> lead.
>>>> -- Norbert Wiener

--
(Rebecca) Xuefei YUAN
Department of Applied Physics and Applied Mathematics
Columbia University
Tel:917-399-8032
www.columbia.edu/~xy2102