On Wed, Jul 22, 2015 at 4:15 PM, [email protected] < [email protected]> wrote:
> So, I tried to use MUMPS instead of SuperLU.
>
> Still some problems with running out of memory, but I think that Sherry
> and Hong have poked me in an important direction:
>
> I may have also underestimated the need for matrix reordering when
> computing the LU-factors. I have to look into this…
>
> One thing which is puzzling me now is that when I followed Hong's
> suggestion to try an iterative solver, I found that it solved my simple
> test problem after some testing with different settings.
>
> I am solving Ax = b with a sparse, indefinite, symmetric, complex matrix.
> Can anything be said about the chances of success in using an iterative
> method?

That depends entirely on your problem. If you are solving something that
looks like elasticity, then -pc_type gamg should solve it fine. However,
nothing can be said in general.

   Matt

> /Mahir
>
> *From:* Matthew Knepley [mailto:[email protected]]
> *Sent:* 22 July 2015 19:17
> *To:* Ülker-Kaustell, Mahir
> *Cc:* Barry Smith; petsc-users
> *Subject:* Re: [petsc-users] SuperLU MPI-problem
>
> On Wed, Jul 22, 2015 at 11:11 AM, [email protected] <
> [email protected]> wrote:
>
> Thank you for your reply.
>
> As you have probably figured out already, I am not a computational
> scientist. I am a researcher in civil engineering (railways for high-speed
> traffic), trying to produce some, from my perspective, fairly large
> parametric studies based on finite element discretizations.
>
> I am working in a Windows-environment and have installed PETSc through
> Cygwin.
> Apparently, there is no support for Valgrind in this OS.
>
> It is really worth any amount of time and effort to get away from Windows
> if you are doing computational science.
>
> If I have understood you correctly, the memory issues are related to
> SuperLU and given my background, there is not much I can do. Is this correct?
>
> The next step is to run the problem using MUMPS (--download-mumps
> --download-scalapack).
>
> Thanks,
>
> Matt
>
> Best regards,
> Mahir
>
> ______________________________________________
> Mahir Ülker-Kaustell, Kompetenssamordnare, Brokonstruktör, Tekn. Dr,
> Tyréns AB
> 010 452 30 82, [email protected]
> ______________________________________________
>
> -----Original Message-----
> From: Barry Smith [mailto:[email protected]]
> Sent: 22 July 2015 02:57
> To: Ülker-Kaustell, Mahir
> Cc: Xiaoye S. Li; petsc-users
> Subject: Re: [petsc-users] SuperLU MPI-problem
>
> Run the program under valgrind
> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind . When I use
> the option -mat_superlu_dist_parsymbfact I get many scary memory problems
> some involving for example ddist_psymbtonum (pdsymbfact_distdata.c:1332)
>
> Note that I consider it unacceptable for running programs to EVER use
> uninitialized values; until these are all cleaned up I won't trust any runs
> like this.
> > Barry > > > > > ==42050== Conditional jump or move depends on uninitialised value(s) > ==42050== at 0x10274C436: MPI_Allgatherv (allgatherv.c:1053) > ==42050== by 0x101557F60: get_perm_c_parmetis > (get_perm_c_parmetis.c:285) > ==42050== by 0x101501192: pdgssvx (pdgssvx.c:934) > ==42050== by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST > (superlu_dist.c:414) > ==42050== by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946) > ==42050== by 0x100F09F2C: PCSetUp_LU (lu.c:152) > ==42050== by 0x100FF9036: PCSetUp (precon.c:982) > ==42050== by 0x1010F54EB: KSPSetUp (itfunc.c:332) > ==42050== by 0x1010F7985: KSPSolve (itfunc.c:546) > ==42050== by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233) > ==42050== by 0x1011C49B7: SNESSolve (snes.c:3906) > ==42050== by 0x100001B3C: main (in ./ex19) > ==42050== Uninitialised value was created by a stack allocation > ==42050== at 0x10155751B: get_perm_c_parmetis (get_perm_c_parmetis.c:96) > ==42050== > ==42050== Conditional jump or move depends on uninitialised value(s) > ==42050== at 0x102851C61: MPIR_Allgatherv_intra (allgatherv.c:651) > ==42050== by 0x102853EC7: MPIR_Allgatherv (allgatherv.c:903) > ==42050== by 0x102853F84: MPIR_Allgatherv_impl (allgatherv.c:944) > ==42050== by 0x10274CA41: MPI_Allgatherv (allgatherv.c:1107) > ==42050== by 0x101557F60: get_perm_c_parmetis > (get_perm_c_parmetis.c:285) > ==42050== by 0x101501192: pdgssvx (pdgssvx.c:934) > ==42050== by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST > (superlu_dist.c:414) > ==42050== by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946) > ==42050== by 0x100F09F2C: PCSetUp_LU (lu.c:152) > ==42050== by 0x100FF9036: PCSetUp (precon.c:982) > ==42050== by 0x1010F54EB: KSPSetUp (itfunc.c:332) > ==42050== by 0x1010F7985: KSPSolve (itfunc.c:546) > ==42050== by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233) > ==42050== by 0x1011C49B7: SNESSolve (snes.c:3906) > ==42050== by 0x100001B3C: main (in ./ex19) > ==42050== Uninitialised value was created by a stack allocation > ==42050== at 
0x10155751B: get_perm_c_parmetis (get_perm_c_parmetis.c:96) > ==42050== > ==42049== Syscall param writev(vector[...]) points to uninitialised byte(s) > ==42049== at 0x102DA1C3A: writev (in > /usr/lib/system/libsystem_kernel.dylib) > ==42049== by 0x10296A0DC: MPL_large_writev (mplsock.c:32) > ==42049== by 0x10295F6AD: MPIDU_Sock_writev (sock_immed.i:610) > ==42049== by 0x102943FCA: MPIDI_CH3_iSendv (ch3_isendv.c:84) > ==42049== by 0x102934361: MPIDI_CH3_EagerContigIsend (ch3u_eager.c:556) > ==42049== by 0x102939531: MPID_Isend (mpid_isend.c:138) > ==42049== by 0x10277656E: MPI_Isend (isend.c:125) > ==42049== by 0x102088B66: libparmetis__gkMPI_Isend (gkmpi.c:63) > ==42049== by 0x10208140F: libparmetis__CommInterfaceData (comm.c:298) > ==42049== by 0x1020A8758: libparmetis__CompactGraph (ometis.c:553) > ==42049== by 0x1020A77BB: libparmetis__MultilevelOrder (ometis.c:225) > ==42049== by 0x1020A7493: ParMETIS_V32_NodeND (ometis.c:151) > ==42049== by 0x1020A6AFB: ParMETIS_V3_NodeND (ometis.c:34) > ==42049== by 0x101557CFC: get_perm_c_parmetis > (get_perm_c_parmetis.c:241) > ==42049== by 0x101501192: pdgssvx (pdgssvx.c:934) > ==42049== by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST > (superlu_dist.c:414) > ==42049== by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946) > ==42049== by 0x100F09F2C: PCSetUp_LU (lu.c:152) > ==42049== by 0x100FF9036: PCSetUp (precon.c:982) > ==42048== Syscall param writev(vector[...]) points to uninitialised byte(s) > ==42049== by 0x1010F54EB: KSPSetUp (itfunc.c:332) > ==42049== Address 0x105edff70 is 1,424 bytes inside a block of size > 752,720 alloc'd > ==42049== at 0x1000183B1: malloc (vg_replace_malloc.c:303) > ==42049== by 0x1020EB90C: gk_malloc (memory.c:147) > ==42049== by 0x1020EAA28: gk_mcoreCreate (mcore.c:28) > ==42048== at 0x102DA1C3A: writev (in > /usr/lib/system/libsystem_kernel.dylib) > ==42048== by 0x10296A0DC: MPL_large_writev (mplsock.c:32) > ==42049== by 0x1020BA5CF: libparmetis__AllocateWSpace (wspace.c:23) > ==42049== by 
0x1020A6E84: ParMETIS_V32_NodeND (ometis.c:98) > ==42048== by 0x10295F6AD: MPIDU_Sock_writev (sock_immed.i:610) > ==42048== by 0x102943FCA: MPIDI_CH3_iSendv (ch3_isendv.c:84) > ==42048== by 0x102934361: MPIDI_CH3_EagerContigIsend (ch3u_eager.c:556) > ==42049== by 0x1020A6AFB: ParMETIS_V3_NodeND (ometis.c:34) > ==42049== by 0x101557CFC: get_perm_c_parmetis > (get_perm_c_parmetis.c:241) > ==42049== by 0x101501192: pdgssvx (pdgssvx.c:934) > ==42048== by 0x102939531: MPID_Isend (mpid_isend.c:138) > ==42048== by 0x10277656E: MPI_Isend (isend.c:125) > ==42049== by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST > (superlu_dist.c:414) > ==42049== by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946) > ==42049== by 0x100F09F2C: PCSetUp_LU (lu.c:152) > ==42049== by 0x100FF9036: PCSetUp (precon.c:982) > ==42049== by 0x1010F54EB: KSPSetUp (itfunc.c:332) > ==42048== by 0x102088B66: libparmetis__gkMPI_Isend (gkmpi.c:63) > ==42048== by 0x10208140F: libparmetis__CommInterfaceData (comm.c:298) > ==42049== by 0x1010F7985: KSPSolve (itfunc.c:546) > ==42049== by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233) > ==42048== by 0x1020A8758: libparmetis__CompactGraph (ometis.c:553) > ==42048== by 0x1020A77BB: libparmetis__MultilevelOrder (ometis.c:225) > ==42048== by 0x1020A7493: ParMETIS_V32_NodeND (ometis.c:151) > ==42049== by 0x1011C49B7: SNESSolve (snes.c:3906) > ==42049== by 0x100001B3C: main (in ./ex19) > ==42049== Uninitialised value was created by a heap allocation > ==42049== at 0x1000183B1: malloc (vg_replace_malloc.c:303) > ==42049== by 0x1020EB90C: gk_malloc (memory.c:147) > ==42048== by 0x1020A6AFB: ParMETIS_V3_NodeND (ometis.c:34) > ==42048== by 0x101557CFC: get_perm_c_parmetis > (get_perm_c_parmetis.c:241) > ==42048== by 0x101501192: pdgssvx (pdgssvx.c:934) > ==42048== by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST > (superlu_dist.c:414) > ==42049== by 0x10211C50B: libmetis__imalloc (gklib.c:24) > ==42049== by 0x1020A8566: libparmetis__CompactGraph (ometis.c:519) > ==42049== by 
0x1020A77BB: libparmetis__MultilevelOrder (ometis.c:225) > ==42048== by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946) > ==42049== by 0x1020A7493: ParMETIS_V32_NodeND (ometis.c:151) > ==42049== by 0x1020A6AFB: ParMETIS_V3_NodeND (ometis.c:34) > ==42049== by 0x101557CFC: get_perm_c_parmetis > (get_perm_c_parmetis.c:241) > ==42049== by 0x101501192: pdgssvx (pdgssvx.c:934) > ==42049== by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST > (superlu_dist.c:414) > ==42049== by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946) > ==42048== by 0x100F09F2C: PCSetUp_LU (lu.c:152) > ==42049== by 0x100F09F2C: PCSetUp_LU (lu.c:152) > ==42049== by 0x100FF9036: PCSetUp (precon.c:982) > ==42049== by 0x1010F54EB: KSPSetUp (itfunc.c:332) > ==42049== by 0x1010F7985: KSPSolve (itfunc.c:546) > ==42049== by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233) > ==42048== by 0x100FF9036: PCSetUp (precon.c:982) > ==42048== by 0x1010F54EB: KSPSetUp (itfunc.c:332) > ==42048== Address 0x10597a860 is 1,408 bytes inside a block of size > 752,720 alloc'd > ==42049== by 0x1011C49B7: SNESSolve (snes.c:3906) > ==42049== by 0x100001B3C: main (in ./ex19) > ==42049== > ==42048== at 0x1000183B1: malloc (vg_replace_malloc.c:303) > ==42048== by 0x1020EB90C: gk_malloc (memory.c:147) > ==42048== by 0x1020EAA28: gk_mcoreCreate (mcore.c:28) > ==42048== by 0x1020BA5CF: libparmetis__AllocateWSpace (wspace.c:23) > ==42048== by 0x1020A6E84: ParMETIS_V32_NodeND (ometis.c:98) > ==42048== by 0x1020A6AFB: ParMETIS_V3_NodeND (ometis.c:34) > ==42048== by 0x101557CFC: get_perm_c_parmetis > (get_perm_c_parmetis.c:241) > ==42048== by 0x101501192: pdgssvx (pdgssvx.c:934) > ==42048== by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST > (superlu_dist.c:414) > ==42048== by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946) > ==42048== by 0x100F09F2C: PCSetUp_LU (lu.c:152) > ==42048== by 0x100FF9036: PCSetUp (precon.c:982) > ==42048== by 0x1010F54EB: KSPSetUp (itfunc.c:332) > ==42048== by 0x1010F7985: KSPSolve (itfunc.c:546) > ==42048== by 
0x10125541E: SNESSolve_NEWTONLS (ls.c:233) > ==42048== by 0x1011C49B7: SNESSolve (snes.c:3906) > ==42048== by 0x100001B3C: main (in ./ex19) > ==42048== Uninitialised value was created by a heap allocation > ==42048== at 0x1000183B1: malloc (vg_replace_malloc.c:303) > ==42048== by 0x1020EB90C: gk_malloc (memory.c:147) > ==42048== by 0x10211C50B: libmetis__imalloc (gklib.c:24) > ==42048== by 0x1020A8566: libparmetis__CompactGraph (ometis.c:519) > ==42048== by 0x1020A77BB: libparmetis__MultilevelOrder (ometis.c:225) > ==42048== by 0x1020A7493: ParMETIS_V32_NodeND (ometis.c:151) > ==42048== by 0x1020A6AFB: ParMETIS_V3_NodeND (ometis.c:34) > ==42048== by 0x101557CFC: get_perm_c_parmetis > (get_perm_c_parmetis.c:241) > ==42048== by 0x101501192: pdgssvx (pdgssvx.c:934) > ==42048== by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST > (superlu_dist.c:414) > ==42048== by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946) > ==42048== by 0x100F09F2C: PCSetUp_LU (lu.c:152) > ==42048== by 0x100FF9036: PCSetUp (precon.c:982) > ==42048== by 0x1010F54EB: KSPSetUp (itfunc.c:332) > ==42048== by 0x1010F7985: KSPSolve (itfunc.c:546) > ==42048== by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233) > ==42048== by 0x1011C49B7: SNESSolve (snes.c:3906) > ==42048== by 0x100001B3C: main (in ./ex19) > ==42048== > ==42048== Syscall param write(buf) points to uninitialised byte(s) > ==42048== at 0x102DA1C22: write (in > /usr/lib/system/libsystem_kernel.dylib) > ==42048== by 0x10295F5BD: MPIDU_Sock_write (sock_immed.i:525) > ==42048== by 0x102944839: MPIDI_CH3_iStartMsg (ch3_istartmsg.c:86) > ==42048== by 0x102933B80: MPIDI_CH3_EagerContigShortSend > (ch3u_eager.c:257) > ==42048== by 0x10293ADBA: MPID_Send (mpid_send.c:130) > ==42048== by 0x10277A1FA: MPI_Send (send.c:127) > ==42048== by 0x10155802F: get_perm_c_parmetis > (get_perm_c_parmetis.c:299) > ==42048== by 0x101501192: pdgssvx (pdgssvx.c:934) > ==42048== by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST > (superlu_dist.c:414) > ==42048== by 0x10046CC5C: 
MatLUFactorNumeric (matrix.c:2946) > ==42048== by 0x100F09F2C: PCSetUp_LU (lu.c:152) > ==42048== by 0x100FF9036: PCSetUp (precon.c:982) > ==42048== by 0x1010F54EB: KSPSetUp (itfunc.c:332) > ==42048== by 0x1010F7985: KSPSolve (itfunc.c:546) > ==42048== by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233) > ==42048== by 0x1011C49B7: SNESSolve (snes.c:3906) > ==42048== by 0x100001B3C: main (in ./ex19) > ==42048== Address 0x104810704 is on thread 1's stack > ==42048== in frame #3, created by MPIDI_CH3_EagerContigShortSend > (ch3u_eager.c:218) > ==42048== Uninitialised value was created by a heap allocation > ==42048== at 0x1000183B1: malloc (vg_replace_malloc.c:303) > ==42048== by 0x10153B704: superlu_malloc_dist (memory.c:108) > ==42048== by 0x101557AB9: get_perm_c_parmetis > (get_perm_c_parmetis.c:185) > ==42048== by 0x101501192: pdgssvx (pdgssvx.c:934) > ==42048== by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST > (superlu_dist.c:414) > ==42048== by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946) > ==42048== by 0x100F09F2C: PCSetUp_LU (lu.c:152) > ==42048== by 0x100FF9036: PCSetUp (precon.c:982) > ==42048== by 0x1010F54EB: KSPSetUp (itfunc.c:332) > ==42048== by 0x1010F7985: KSPSolve (itfunc.c:546) > ==42048== by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233) > ==42048== by 0x1011C49B7: SNESSolve (snes.c:3906) > ==42048== by 0x100001B3C: main (in ./ex19) > ==42048== > ==42050== Conditional jump or move depends on uninitialised value(s) > ==42050== at 0x102744CB8: MPI_Alltoallv (alltoallv.c:480) > ==42050== by 0x101510B3E: dist_symbLU (pdsymbfact_distdata.c:539) > ==42050== by 0x10150A5C6: ddist_psymbtonum (pdsymbfact_distdata.c:1275) > ==42050== by 0x1015018C2: pdgssvx (pdgssvx.c:1057) > ==42050== by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST > (superlu_dist.c:414) > ==42050== by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946) > ==42050== by 0x100F09F2C: PCSetUp_LU (lu.c:152) > ==42050== by 0x100FF9036: PCSetUp (precon.c:982) > ==42050== by 0x1010F54EB: KSPSetUp (itfunc.c:332) 
> ==42050== by 0x1010F7985: KSPSolve (itfunc.c:546) > ==42050== by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233) > ==42050== by 0x1011C49B7: SNESSolve (snes.c:3906) > ==42050== by 0x100001B3C: main (in ./ex19) > ==42050== Uninitialised value was created by a stack allocation > ==42050== at 0x10150E4C4: dist_symbLU (pdsymbfact_distdata.c:96) > ==42050== > ==42050== Conditional jump or move depends on uninitialised value(s) > ==42050== at 0x102744E43: MPI_Alltoallv (alltoallv.c:490) > ==42050== by 0x101510B3E: dist_symbLU (pdsymbfact_distdata.c:539) > ==42050== by 0x10150A5C6: ddist_psymbtonum (pdsymbfact_distdata.c:1275) > ==42050== by 0x1015018C2: pdgssvx (pdgssvx.c:1057) > ==42050== by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST > (superlu_dist.c:414) > ==42050== by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946) > ==42050== by 0x100F09F2C: PCSetUp_LU (lu.c:152) > ==42050== by 0x100FF9036: PCSetUp (precon.c:982) > ==42050== by 0x1010F54EB: KSPSetUp (itfunc.c:332) > ==42050== by 0x1010F7985: KSPSolve (itfunc.c:546) > ==42050== by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233) > ==42050== by 0x1011C49B7: SNESSolve (snes.c:3906) > ==42050== by 0x100001B3C: main (in ./ex19) > ==42050== Uninitialised value was created by a stack allocation > ==42050== at 0x10150E4C4: dist_symbLU (pdsymbfact_distdata.c:96) > ==42050== > ==42050== Conditional jump or move depends on uninitialised value(s) > ==42050== at 0x102744EBF: MPI_Alltoallv (alltoallv.c:497) > ==42050== by 0x101510B3E: dist_symbLU (pdsymbfact_distdata.c:539) > ==42050== by 0x10150A5C6: ddist_psymbtonum (pdsymbfact_distdata.c:1275) > ==42050== by 0x1015018C2: pdgssvx (pdgssvx.c:1057) > ==42050== by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST > (superlu_dist.c:414) > ==42050== by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946) > ==42050== by 0x100F09F2C: PCSetUp_LU (lu.c:152) > ==42050== by 0x100FF9036: PCSetUp (precon.c:982) > ==42050== by 0x1010F54EB: KSPSetUp (itfunc.c:332) > ==42050== by 0x1010F7985: KSPSolve 
(itfunc.c:546) > ==42050== by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233) > ==42050== by 0x1011C49B7: SNESSolve (snes.c:3906) > ==42050== by 0x100001B3C: main (in ./ex19) > ==42050== Uninitialised value was created by a stack allocation > ==42050== at 0x10150E4C4: dist_symbLU (pdsymbfact_distdata.c:96) > ==42050== > ==42050== Conditional jump or move depends on uninitialised value(s) > ==42050== at 0x1027450B1: MPI_Alltoallv (alltoallv.c:512) > ==42050== by 0x101510B3E: dist_symbLU (pdsymbfact_distdata.c:539) > ==42050== by 0x10150A5C6: ddist_psymbtonum (pdsymbfact_distdata.c:1275) > ==42050== by 0x1015018C2: pdgssvx (pdgssvx.c:1057) > ==42050== by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST > (superlu_dist.c:414) > ==42050== by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946) > ==42050== by 0x100F09F2C: PCSetUp_LU (lu.c:152) > ==42050== by 0x100FF9036: PCSetUp (precon.c:982) > ==42050== by 0x1010F54EB: KSPSetUp (itfunc.c:332) > ==42050== by 0x1010F7985: KSPSolve (itfunc.c:546) > ==42050== by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233) > ==42050== by 0x1011C49B7: SNESSolve (snes.c:3906) > ==42050== by 0x100001B3C: main (in ./ex19) > ==42050== Uninitialised value was created by a stack allocation > ==42050== at 0x10150E4C4: dist_symbLU (pdsymbfact_distdata.c:96) > ==42050== > ==42050== Conditional jump or move depends on uninitialised value(s) > ==42050== at 0x10283FB06: MPIR_Alltoallv_intra (alltoallv.c:92) > ==42050== by 0x1028407B6: MPIR_Alltoallv (alltoallv.c:343) > ==42050== by 0x102840884: MPIR_Alltoallv_impl (alltoallv.c:380) > ==42050== by 0x10274541B: MPI_Alltoallv (alltoallv.c:531) > ==42050== by 0x101510B3E: dist_symbLU (pdsymbfact_distdata.c:539) > ==42050== by 0x10150A5C6: ddist_psymbtonum (pdsymbfact_distdata.c:1275) > ==42050== by 0x1015018C2: pdgssvx (pdgssvx.c:1057) > ==42050== by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST > (superlu_dist.c:414) > ==42050== by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946) > ==42050== by 0x100F09F2C: PCSetUp_LU 
(lu.c:152) > ==42050== by 0x100FF9036: PCSetUp (precon.c:982) > ==42050== by 0x1010F54EB: KSPSetUp (itfunc.c:332) > ==42050== by 0x1010F7985: KSPSolve (itfunc.c:546) > ==42050== by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233) > ==42050== by 0x1011C49B7: SNESSolve (snes.c:3906) > ==42050== by 0x100001B3C: main (in ./ex19) > ==42050== Uninitialised value was created by a stack allocation > ==42050== at 0x10150E4C4: dist_symbLU (pdsymbfact_distdata.c:96) > ==42050== > ==42050== Syscall param writev(vector[...]) points to uninitialised byte(s) > ==42050== at 0x102DA1C3A: writev (in > /usr/lib/system/libsystem_kernel.dylib) > ==42050== by 0x10296A0DC: MPL_large_writev (mplsock.c:32) > ==42050== by 0x10295F6AD: MPIDU_Sock_writev (sock_immed.i:610) > ==42050== by 0x102943FCA: MPIDI_CH3_iSendv (ch3_isendv.c:84) > ==42050== by 0x102934361: MPIDI_CH3_EagerContigIsend (ch3u_eager.c:556) > ==42050== by 0x102939531: MPID_Isend (mpid_isend.c:138) > ==42050== by 0x10277656E: MPI_Isend (isend.c:125) > ==42050== by 0x101524C41: pdgstrf2_trsm (pdgstrf2.c:201) > ==42050== by 0x10151ECBF: pdgstrf (pdgstrf.c:1082) > ==42050== by 0x1015019A5: pdgssvx (pdgssvx.c:1069) > ==42050== by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST > (superlu_dist.c:414) > ==42050== by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946) > ==42050== by 0x100F09F2C: PCSetUp_LU (lu.c:152) > ==42050== by 0x100FF9036: PCSetUp (precon.c:982) > ==42050== by 0x1010F54EB: KSPSetUp (itfunc.c:332) > ==42050== by 0x1010F7985: KSPSolve (itfunc.c:546) > ==42050== by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233) > ==42050== by 0x1011C49B7: SNESSolve (snes.c:3906) > ==42050== by 0x100001B3C: main (in ./ex19) > ==42050== Address 0x1060144d0 is 1,168 bytes inside a block of size > 131,072 alloc'd > ==42050== at 0x1000183B1: malloc (vg_replace_malloc.c:303) > ==42050== by 0x10153B704: superlu_malloc_dist (memory.c:108) > ==42050== by 0x1014FD7AD: doubleMalloc_dist (dmemory.c:145) > ==42050== by 0x10151DA7D: pdgstrf (pdgstrf.c:735) > 
==42050== by 0x1015019A5: pdgssvx (pdgssvx.c:1069) > ==42050== by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST > (superlu_dist.c:414) > ==42050== by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946) > ==42050== by 0x100F09F2C: PCSetUp_LU (lu.c:152) > ==42050== by 0x100FF9036: PCSetUp (precon.c:982) > ==42050== by 0x1010F54EB: KSPSetUp (itfunc.c:332) > ==42050== by 0x1010F7985: KSPSolve (itfunc.c:546) > ==42050== by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233) > ==42050== by 0x1011C49B7: SNESSolve (snes.c:3906) > ==42050== by 0x100001B3C: main (in ./ex19) > ==42050== Uninitialised value was created by a heap allocation > ==42050== at 0x1000183B1: malloc (vg_replace_malloc.c:303) > ==42050== by 0x10153B704: superlu_malloc_dist (memory.c:108) > ==42050== by 0x1014FD7AD: doubleMalloc_dist (dmemory.c:145) > ==42050== by 0x10151DA7D: pdgstrf (pdgstrf.c:735) > ==42050== by 0x1015019A5: pdgssvx (pdgssvx.c:1069) > ==42050== by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST > (superlu_dist.c:414) > ==42050== by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946) > ==42050== by 0x100F09F2C: PCSetUp_LU (lu.c:152) > ==42050== by 0x100FF9036: PCSetUp (precon.c:982) > ==42050== by 0x1010F54EB: KSPSetUp (itfunc.c:332) > ==42050== by 0x1010F7985: KSPSolve (itfunc.c:546) > ==42050== by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233) > ==42050== by 0x1011C49B7: SNESSolve (snes.c:3906) > ==42050== by 0x100001B3C: main (in ./ex19) > ==42050== > ==42048== Conditional jump or move depends on uninitialised value(s) > ==42048== at 0x10151F141: pdgstrf (pdgstrf.c:1139) > ==42048== by 0x1015019A5: pdgssvx (pdgssvx.c:1069) > ==42048== by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST > (superlu_dist.c:414) > ==42048== by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946) > ==42048== by 0x100F09F2C: PCSetUp_LU (lu.c:152) > ==42048== by 0x100FF9036: PCSetUp (precon.c:982) > ==42048== by 0x1010F54EB: KSPSetUp (itfunc.c:332) > ==42048== by 0x1010F7985: KSPSolve (itfunc.c:546) > ==42048== by 0x10125541E: 
SNESSolve_NEWTONLS (ls.c:233) > ==42048== by 0x1011C49B7: SNESSolve (snes.c:3906) > ==42048== by 0x100001B3C: main (in ./ex19) > ==42048== Uninitialised value was created by a heap allocation > ==42048== at 0x1000183B1: malloc (vg_replace_malloc.c:303) > ==42048== by 0x10153B704: superlu_malloc_dist (memory.c:108) > ==42048== by 0x10150ABE2: ddist_psymbtonum (pdsymbfact_distdata.c:1332) > ==42048== by 0x1015018C2: pdgssvx (pdgssvx.c:1057) > ==42048== by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST > (superlu_dist.c:414) > ==42048== by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946) > ==42048== by 0x100F09F2C: PCSetUp_LU (lu.c:152) > ==42048== by 0x100FF9036: PCSetUp (precon.c:982) > ==42048== by 0x1010F54EB: KSPSetUp (itfunc.c:332) > ==42048== by 0x1010F7985: KSPSolve (itfunc.c:546) > ==42048== by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233) > ==42048== by 0x1011C49B7: SNESSolve (snes.c:3906) > ==42048== by 0x100001B3C: main (in ./ex19) > ==42048== > ==42049== Conditional jump or move depends on uninitialised value(s) > ==42049== at 0x10151F141: pdgstrf (pdgstrf.c:1139) > ==42049== by 0x1015019A5: pdgssvx (pdgssvx.c:1069) > ==42049== by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST > (superlu_dist.c:414) > ==42049== by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946) > ==42049== by 0x100F09F2C: PCSetUp_LU (lu.c:152) > ==42049== by 0x100FF9036: PCSetUp (precon.c:982) > ==42049== by 0x1010F54EB: KSPSetUp (itfunc.c:332) > ==42049== by 0x1010F7985: KSPSolve (itfunc.c:546) > ==42049== by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233) > ==42049== by 0x1011C49B7: SNESSolve (snes.c:3906) > ==42049== by 0x100001B3C: main (in ./ex19) > ==42049== Uninitialised value was created by a heap allocation > ==42049== at 0x1000183B1: malloc (vg_replace_malloc.c:303) > ==42049== by 0x10153B704: superlu_malloc_dist (memory.c:108) > ==42049== by 0x10150ABE2: ddist_psymbtonum (pdsymbfact_distdata.c:1332) > ==42049== by 0x1015018C2: pdgssvx (pdgssvx.c:1057) > ==42049== by 0x1009CFE7A: 
MatLUFactorNumeric_SuperLU_DIST > (superlu_dist.c:414) > ==42049== by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946) > ==42049== by 0x100F09F2C: PCSetUp_LU (lu.c:152) > ==42049== by 0x100FF9036: PCSetUp (precon.c:982) > ==42049== by 0x1010F54EB: KSPSetUp (itfunc.c:332) > ==42049== by 0x1010F7985: KSPSolve (itfunc.c:546) > ==42049== by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233) > ==42049== by 0x1011C49B7: SNESSolve (snes.c:3906) > ==42049== by 0x100001B3C: main (in ./ex19) > ==42049== > ==42048== Conditional jump or move depends on uninitialised value(s) > ==42048== at 0x101520054: pdgstrf (pdgstrf.c:1429) > ==42048== by 0x1015019A5: pdgssvx (pdgssvx.c:1069) > ==42048== by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST > (superlu_dist.c:414) > ==42048== by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946) > ==42048== by 0x100F09F2C: PCSetUp_LU (lu.c:152) > ==42048== by 0x100FF9036: PCSetUp (precon.c:982) > ==42048== by 0x1010F54EB: KSPSetUp (itfunc.c:332) > ==42048== by 0x1010F7985: KSPSolve (itfunc.c:546) > ==42049== Conditional jump or move depends on uninitialised value(s) > ==42048== by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233) > ==42048== by 0x1011C49B7: SNESSolve (snes.c:3906) > ==42048== by 0x100001B3C: main (in ./ex19) > ==42048== Uninitialised value was created by a heap allocation > ==42049== at 0x101520054: pdgstrf (pdgstrf.c:1429) > ==42048== at 0x1000183B1: malloc (vg_replace_malloc.c:303) > ==42048== by 0x10153B704: superlu_malloc_dist (memory.c:108) > ==42049== by 0x1015019A5: pdgssvx (pdgssvx.c:1069) > ==42049== by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST > (superlu_dist.c:414) > ==42048== by 0x10150ABE2: ddist_psymbtonum (pdsymbfact_distdata.c:1332) > ==42048== by 0x1015018C2: pdgssvx (pdgssvx.c:1057) > ==42048== by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST > (superlu_dist.c:414) > ==42049== by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946) > ==42049== by 0x100F09F2C: PCSetUp_LU (lu.c:152) > ==42048== by 0x10046CC5C: MatLUFactorNumeric 
(matrix.c:2946) > ==42048== by 0x100F09F2C: PCSetUp_LU (lu.c:152) > ==42049== by 0x100FF9036: PCSetUp (precon.c:982) > ==42049== by 0x1010F54EB: KSPSetUp (itfunc.c:332) > ==42049== by 0x1010F7985: KSPSolve (itfunc.c:546) > ==42048== by 0x100FF9036: PCSetUp (precon.c:982) > ==42048== by 0x1010F54EB: KSPSetUp (itfunc.c:332) > ==42048== by 0x1010F7985: KSPSolve (itfunc.c:546) > ==42049== by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233) > ==42049== by 0x1011C49B7: SNESSolve (snes.c:3906) > ==42048== by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233) > ==42048== by 0x1011C49B7: SNESSolve (snes.c:3906) > ==42049== by 0x100001B3C: main (in ./ex19) > ==42049== Uninitialised value was created by a heap allocation > ==42049== at 0x1000183B1: malloc (vg_replace_malloc.c:303) > ==42048== by 0x100001B3C: main (in ./ex19) > ==42048== > ==42049== by 0x10153B704: superlu_malloc_dist (memory.c:108) > ==42049== by 0x10150ABE2: ddist_psymbtonum (pdsymbfact_distdata.c:1332) > ==42049== by 0x1015018C2: pdgssvx (pdgssvx.c:1057) > ==42049== by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST > (superlu_dist.c:414) > ==42049== by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946) > ==42049== by 0x100F09F2C: PCSetUp_LU (lu.c:152) > ==42049== by 0x100FF9036: PCSetUp (precon.c:982) > ==42049== by 0x1010F54EB: KSPSetUp (itfunc.c:332) > ==42049== by 0x1010F7985: KSPSolve (itfunc.c:546) > ==42049== by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233) > ==42049== by 0x1011C49B7: SNESSolve (snes.c:3906) > ==42049== by 0x100001B3C: main (in ./ex19) > ==42049== > ==42050== Conditional jump or move depends on uninitialised value(s) > ==42050== at 0x10151FDE6: pdgstrf (pdgstrf.c:1382) > ==42050== by 0x1015019A5: pdgssvx (pdgssvx.c:1069) > ==42050== by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST > (superlu_dist.c:414) > ==42050== by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946) > ==42050== by 0x100F09F2C: PCSetUp_LU (lu.c:152) > ==42050== by 0x100FF9036: PCSetUp (precon.c:982) > ==42050== by 0x1010F54EB: KSPSetUp 
(itfunc.c:332)
> ==42050==    by 0x1010F7985: KSPSolve (itfunc.c:546)
> ==42050==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
> ==42050==    by 0x1011C49B7: SNESSolve (snes.c:3906)
> ==42050==    by 0x100001B3C: main (in ./ex19)
> ==42050==  Uninitialised value was created by a heap allocation
> ==42050==    at 0x1000183B1: malloc (vg_replace_malloc.c:303)
> ==42050==    by 0x10153B704: superlu_malloc_dist (memory.c:108)
> ==42050==    by 0x10150B241: ddist_psymbtonum (pdsymbfact_distdata.c:1389)
> ==42050==    by 0x1015018C2: pdgssvx (pdgssvx.c:1057)
> ==42050==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
> ==42050==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
> ==42050==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
> ==42050==    by 0x100FF9036: PCSetUp (precon.c:982)
> ==42050==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
> ==42050==    by 0x1010F7985: KSPSolve (itfunc.c:546)
> ==42050==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
> ==42050==    by 0x1011C49B7: SNESSolve (snes.c:3906)
> ==42050==    by 0x100001B3C: main (in ./ex19)
> ==42050==
>
> > On Jul 20, 2015, at 12:03 PM, [email protected] wrote:
> >
> > Ok. So I have been creating the full factorization on each process. That
> > gives me some hope!
> >
> > I followed your suggestion and tried to use the runtime option
> > ‘-mat_superlu_dist_parsymbfact’.
> > However, now the program crashes with:
> >
> > Invalid ISPEC at line 484 in file get_perm_c.c
> >
> > And so on…
> >
> > According to the SuperLU manual, I should give the option either YES or
> > NO. However, -mat_superlu_dist_parsymbfact YES makes the program crash in
> > the same way as above.
> > Also, I can’t find any reference to -mat_superlu_dist_parsymbfact in the
> > PETSc documentation.
> >
> > Mahir
> >
> > Mahir Ülker-Kaustell, Kompetenssamordnare, Brokonstruktör, Tekn. Dr,
> > Tyréns AB
> > 010 452 30 82, [email protected]
> >
> > From: Xiaoye S. Li [mailto:[email protected]]
> > Sent: 20 July 2015 18:12
> > To: Ülker-Kaustell, Mahir
> > Cc: Hong; petsc-users
> > Subject: Re: [petsc-users] SuperLU MPI-problem
> >
> > The default SuperLU_DIST setting is to use serial symbolic factorization.
> > Therefore, what matters is how much memory you have per MPI task.
> >
> > The code failed to malloc memory during redistribution of matrix A to
> > the {L\U} data structure (using the result of serial symbolic
> > factorization).
> >
> > You can use parallel symbolic factorization with the runtime option
> > '-mat_superlu_dist_parsymbfact'.
> >
> > Sherry Li
> >
> > On Mon, Jul 20, 2015 at 8:59 AM, [email protected] <
> > [email protected]> wrote:
> > Hong:
> >
> > Previous experiences with this equation have shown that it is very
> > difficult to solve it iteratively. Hence the use of a direct solver.
> >
> > The large test problem I am trying to solve has slightly less than 10^6
> > degrees of freedom. The matrices are derived from finite elements so they
> > are sparse.
> > The machine I am working on has 128GB RAM. I have estimated the memory
> > needed to be less than 20GB, so if the solver needs twice or even three
> > times as much, it should still work well. Or have I completely
> > misunderstood something here?
> >
> > Mahir
> >
> > From: Hong [mailto:[email protected]]
> > Sent: 20 July 2015 17:39
> > To: Ülker-Kaustell, Mahir
> > Cc: petsc-users
> > Subject: Re: [petsc-users] SuperLU MPI-problem
> >
> > Mahir:
> > Direct solvers consume a large amount of memory. I suggest trying the
> > following:
> >
> > 1. A sparse iterative solver if [-omega^2M + K] is not too
> > ill-conditioned. You may test it using the small matrix.
> >
> > 2. Incrementally increase your matrix sizes. Try different matrix
> > orderings.
> > Do you get a memory crash in the first symbolic factorization?
> > In your case, the matrix data structure stays the same when omega
> > changes, so you only need to do one symbolic factorization and reuse it.
> >
> > 3. Use a machine with more memory.
> >
> > Hong
> >
> > Dear Petsc-Users,
> >
> > I am trying to use PETSc to solve a set of linear equations arising from
> > Navier's equation (elastodynamics) in the frequency domain.
> > The frequency dependency of the problem requires that the system
> >
> > [-omega^2M + K]u = F
> >
> > where M and K are constant, square, positive definite matrices (mass and
> > stiffness respectively) is solved for each frequency omega of interest.
> > K is a complex matrix, including material damping.
> >
> > I have written a PETSc program which solves this problem for a small
> > (1000 degrees of freedom) test problem on one or several processors, but
> > it keeps crashing when I try it on my full scale (in the order of 10^6
> > degrees of freedom) problem.
> >
> > The program crashes at KSPSetUp() and from what I can see in the error
> > messages, it appears as if it consumes too much memory.
> >
> > I would guess that similar problems have occurred in this mail-list, so
> > I am hoping that someone can push me in the right direction…
> >
> > Mahir

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener
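
For readers of this thread, the frequency sweep from the original post (solving [-omega^2M + K]u = F once per omega) and Hong's remark that the matrix structure stays the same as omega changes can be sketched on a toy problem. This is only an illustration in plain Python, not PETSc code: the 3-DOF spring chain, the 5% damping factor, the chosen frequencies, and the helper names assemble, pattern, solve and residual are all assumptions made up for this example. It checks that the set of nonzero positions is identical for every omega, which is what would allow a symbolic factorization to be computed once and reused.

```python
# Toy frequency sweep for A(omega) = K - omega^2 * M.
# All values and helper names below are hypothetical, chosen only to illustrate.

def assemble(omega, K, M):
    """Return A = K - omega^2 * M as a dense list of lists."""
    n = len(K)
    return [[K[i][j] - omega**2 * M[i][j] for j in range(n)] for i in range(n)]

def pattern(A):
    """Set of (row, col) positions that hold a nonzero entry."""
    return {(i, j) for i, row in enumerate(A) for j, a in enumerate(row) if a != 0}

def solve(A, b):
    """Dense Gaussian elimination with partial pivoting (works for complex)."""
    n = len(A)
    aug = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        # Pick the row with the largest pivot magnitude in column c.
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for j in range(c, n + 1):
                aug[r][j] -= f * aug[c][j]
    # Back substitution.
    x = [0j] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def residual(A, x, b):
    """Largest absolute entry of A x - b."""
    n = len(b)
    return max(abs(sum(A[i][j] * x[j] for j in range(n)) - b[i]) for i in range(n))

k = 1000.0 * (1 + 0.05j)  # complex stiffness: assumed 5% material damping
m = 2.0                   # assumed lumped mass per degree of freedom
K = [[2 * k, -k, 0], [-k, 2 * k, -k], [0, -k, k]]
M = [[m, 0, 0], [0, m, 0], [0, 0, m]]
F = [0, 0, 1.0]

omegas = [5.0, 20.0, 40.0]
patterns, residuals = [], []
for w in omegas:
    A = assemble(w, K, M)
    u = solve(A, F)
    patterns.append(pattern(A))
    residuals.append(residual(A, u, F))
    print(f"omega = {w}: |u_tip| = {abs(u[-1]):.3e}")

# The nonzero pattern does not depend on omega, only the values do.
print("same pattern for all omega:", all(p == patterns[0] for p in patterns))
```

In a sparse direct solver the expensive ordering and symbolic analysis depend only on this pattern, so in principle they need to be done once per sweep, while the numeric factorization is repeated for each omega.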
