Venkatesh,

You may also test superlu_dist, which may use less memory.

Hong

On Mon, Jun 22, 2015 at 12:43 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:
>
> There is nothing we can really do to help on the PETSc side. I do note
> from the output
>
> REDISTRIB: TOTAL DATA LOCAL/SENT = 328575589 1437471711
> GLOBAL TIME FOR MATRIX DISTRIBUTION = 206.6792
> ** Memory relaxation parameter ( ICNTL(14) ) : 35
> ** Rank of processor needing largest memory in facto : 30
> ** Space in MBYTES used by this processor for facto : 21593
> ** Avg. Space in MBYTES per working proc during facto : 7708
>
> that some processes (like rank 30) require three times as much memory as
> other processes, so perhaps better load balancing of the matrix during the
> factorization would help with memory usage.
>
> Barry
>
> > On Jun 22, 2015, at 10:57 AM, venkatesh g <venkateshg...@gmail.com> wrote:
> >
> > Hi,
> >
> > As you suggested, I have restructured my matrix eigenvalue problem to understand why B was singular, by writing the governing equations in a different form.
> >
> > Now my matrix B is not singular. Both A and B are invertible in Ax = lambda Bx.
> >
> > I still get an error in MUMPS because it uses a large amount of memory (the error log is attached).
> >
> > I gave the command:
> >
> > aprun -n 240 -N 24 ./ex7 -f1 A100t -f2 B100t -st_type sinvert -eps_target 0.01 -st_ksp_type preonly -st_pc_type lu -st_pc_factor_mat_solver_package mumps -mat_mumps_cntl_1 1e-5 -mat_mumps_icntl_4 2 -evecs v100t
> >
> > 60% of the entries of matrix A are zeros.
> >
> > Kindly help me.
> >
> > Venkatesh
> >
> > On Sun, May 31, 2015 at 8:04 PM, Hong <hzh...@mcs.anl.gov> wrote:
> >
> > Venkatesh,
> >
> > As we discussed previously, even on smaller problems both mumps and superlu_dist failed, although mumps gave an "OOM" error in the numerical factorization.
> >
> > You acknowledged that B is singular, which may require additional reformulation of your eigenvalue problem. The option '-st_type sinvert' likely uses B^{-1} (have you read the slepc manual?), which could be the source of trouble.
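A note on the '-st_type sinvert' point above: in SLEPc's shift-and-invert spectral transformation the operator is (A - sigma*B)^{-1} B, where sigma is the shift (taken from -eps_target in the commands in this thread). So the LU factorization handed to MUMPS is of A - sigma*B, not of B itself. The sketch below illustrates that idea with NumPy, using plain power iteration instead of SLEPc's Krylov solvers. The function name shift_invert_power is hypothetical and not part of PETSc or SLEPc. It shows that a singular B is harmless as long as A - sigma*B is nonsingular.

```python
import numpy as np


def shift_invert_power(A, B, sigma, max_iter=500, tol=1e-12, seed=0):
    """Approximate the eigenpair of A x = lam B x whose eigenvalue is closest to sigma.

    Power iteration on T = (A - sigma*B)^{-1} B. If A x = lam B x, then
    T x = x / (lam - sigma), so the eigenvalue nearest sigma dominates.
    Only K = A - sigma*B must be nonsingular; B may be singular (its infinite
    eigenvalues map to 1/(lam - sigma) = 0 and never dominate).
    """
    K = A - sigma * B
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[0])
    x /= np.linalg.norm(x)
    theta = 0.0
    for _ in range(max_iter):
        # A real code factors K once (e.g. LU via MUMPS) and reuses the factors;
        # np.linalg.solve refactors on every call, which is fine for a tiny demo.
        y = np.linalg.solve(K, B @ x)
        theta_new = x @ y  # Rayleigh-quotient estimate of the dominant eigenvalue of T
        x = y / np.linalg.norm(y)
        converged = abs(theta_new - theta) <= tol * abs(theta_new)
        theta = theta_new
        if converged:
            break
    return sigma + 1.0 / theta, x


if __name__ == "__main__":
    A = np.diag([1.0, 2.0, 3.0, 4.0])
    B = np.diag([1.0, 1.0, 1.0, 0.0])  # singular B: the pencil has an infinite eigenvalue
    lam, x = shift_invert_power(A, B, sigma=2.1)
    print(lam)  # close to 2.0, the finite eigenvalue nearest the shift
```

With B = diag(1, 1, 1, 0) the pencil has the finite eigenvalues 1, 2 and 3 plus an infinite one, and the iteration picks out the eigenvalue closest to the shift even though B cannot be inverted.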
> >
> > Please investigate your model and understand why B is singular, and whether there is a way to remove the null space before submitting a large simulation.
> >
> > Hong
> >
> > On Sun, May 31, 2015 at 8:36 AM, Dave May <dave.mayhe...@gmail.com> wrote:
> >
> > It failed due to a lack of memory. "OOM" stands for "out of memory". "OOM killer terminated your job" means you ran out of memory.
> >
> > On Sunday, 31 May 2015, venkatesh g <venkateshg...@gmail.com> wrote:
> >
> > Hi all,
> >
> > I tried to run my generalized eigenproblem on 120 x 24 = 2880 cores.
> > The matrix sizes are A = 20 GB and B = 5 GB.
> >
> > It got killed after 7 hours of run time. Please see the mumps error log. Why does it fail?
> >
> > I gave the command:
> >
> > aprun -n 240 -N 24 ./ex7 -f1 a110t -f2 b110t -st_type sinvert -eps_nev 1 -log_summary -st_ksp_type preonly -st_pc_type lu -st_pc_factor_mat_solver_package mumps -mat_mumps_cntl_1 1e-2
> >
> > Kindly let me know.
> >
> > Cheers,
> > Venkatesh
> >
> > On Fri, May 29, 2015 at 10:46 PM, venkatesh g <venkateshg...@gmail.com> wrote:
> >
> > Hi Matt, users,
> >
> > Thanks for the info. Do you also use PETSc and SLEPc with MUMPS? I get a segmentation fault if I increase my matrix size.
> >
> > Can you suggest other software that provides a parallel direct QR solver, since LU may not be good for a singular B matrix in Ax = lambda Bx? I am attaching the mumps log from the working version.
> >
> > My matrix size here is around 47000x47000. If I am not wrong, the memory usage per core is 272 MB.
> >
> > Can you tell me whether I am wrong, or whether it really is light on memory for this matrix?
> >
> > Thanks,
> > Venkatesh
> >
> > On Fri, May 29, 2015 at 4:00 PM, Matt Landreman <matt.landre...@gmail.com> wrote:
> >
> > Dear Venkatesh,
> >
> > As you can see in the error log, you are now getting a segmentation fault, which is almost certainly a separate issue from the info(1)=-9 memory problem you had previously.
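Barry's reading of the MUMPS diagnostics earlier in the thread can be checked with a little arithmetic: the rank needing the most memory (21593 MB) needs about 2.8 times the average (7708 MB). Because ranks on the same node share its RAM (24 per node with -N 24), the quantity that typically matters for the OOM killer Dave mentions is the total on the busiest node, not the average per core. The helper memory_summary below is hypothetical. It takes per-rank figures you would have to collect yourself, and it assumes ranks are packed onto nodes consecutively, which depends on the launcher.

```python
def memory_summary(per_rank_mb, ranks_per_node):
    """Summarize per-rank memory needs (MB) and the busiest node.

    Assumes consecutive placement: ranks 0..ranks_per_node-1 on the first
    node, the next ranks_per_node ranks on the second node, and so on.
    """
    if not per_rank_mb or ranks_per_node <= 0:
        raise ValueError("need at least one rank and a positive ranks_per_node")
    n = len(per_rank_mb)
    avg = sum(per_rank_mb) / n
    peak = max(per_rank_mb)
    node_totals = [
        sum(per_rank_mb[i:i + ranks_per_node]) for i in range(0, n, ranks_per_node)
    ]
    return {
        "avg_mb": avg,
        "max_mb": peak,
        "imbalance": peak / avg,  # max/avg; 1.0 means perfectly balanced
        "worst_node_mb": max(node_totals),
    }


if __name__ == "__main__":
    # The MUMPS output only reports the maximum and the average, so the
    # imbalance ratio is all that can be recovered from it.
    print(round(21593 / 7708, 2))  # 2.8
    # Hypothetical per-rank numbers, just to exercise the function.
    print(memory_summary([100, 300, 200, 200], ranks_per_node=2))
```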
> > Here is one idea which may or may not help. I've used mumps on the NERSC Edison system, and I found that I sometimes get segmentation faults when using the default Intel compiler. When I switched to the Cray compiler the problem disappeared. So you could perhaps try a different compiler if one is available on your system.
> >
> > Matt
> >
> > On May 29, 2015 4:04 AM, "venkatesh g" <venkateshg...@gmail.com> wrote:
> >
> > Hi Matt,
> >
> > I did what you suggested and read the section of the manual on the CNTL parameters. I solved it with CNTL(1)=1e-4, and it works.
> >
> > But that was a test matrix of size 46000x46000. The actual matrix size is 108900x108900 and will increase in the future.
> >
> > I get a "memory allocation failed" error. The binary file of A is 20 GB and that of B is 5 GB.
> >
> > Now I submit this on 240 processors with 4 GB RAM each, and also on 128 processors with 512 GB RAM in total.
> >
> > In both cases it fails with an error saying that there is not enough memory. But the 90000x90000 case ran serially in Matlab with < 256 GB RAM.
> >
> > Kindly let me know.
> >
> > Venkatesh
> >
> > On Tue, May 26, 2015 at 8:02 PM, Matt Landreman <matt.landre...@gmail.com> wrote:
> >
> > Hi Venkatesh,
> >
> > I've struggled a bit with mumps memory allocation too. I think the behavior of mumps is roughly the following. First, in the "analysis step", mumps computes the minimum memory required based on the structure of nonzeros in the matrix. Then, when it actually goes to factorize the matrix, if it ever encounters an element smaller than CNTL(1) (default=0.01) in the diagonal of a sub-matrix it is trying to factorize, it modifies the ordering to avoid the small pivot, which increases the fill-in (and hence the memory needed). ICNTL(14) sets the margin allowed for this unanticipated fill-in. Setting ICNTL(14)=200000 as in your email is not the solution, since this means mumps asks for a huge amount of memory at the start.
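Matt's point about ICNTL(14) is easy to quantify. The MUMPS user guide describes ICNTL(14) as the percentage increase in the estimated working space, so a value of 200000 multiplies the analysis-phase estimate by 2001, while the value 35 reported in Barry's log adds 35%. The helper relaxed_workspace_mb is hypothetical, and it models only this scaling of the working-space estimate, not every array MUMPS allocates.

```python
def relaxed_workspace_mb(estimated_mb, icntl14):
    """Working space requested after applying the ICNTL(14) relaxation.

    ICNTL(14) is a percentage increase over the analysis-phase estimate,
    so the request is estimated_mb * (1 + ICNTL(14)/100).
    """
    # Multiply before dividing so integer inputs give exact results.
    return estimated_mb * (100 + icntl14) / 100


if __name__ == "__main__":
    # 35 is the value in Barry's log; 200000 is the value from the first email.
    for pct in (35, 200000):
        # Growth factor applied to each MB of the original estimate.
        print(pct, relaxed_workspace_mb(1.0, pct))
```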
> > Better would be to lower CNTL(1) or (I think) use static pivoting (CNTL(4)). Read the section in the mumps manual about these CNTL parameters. I typically set CNTL(1)=1e-6, which eliminated all the INFO(1)=-9 errors for my problem without having to modify ICNTL(14).
> >
> > Also, I recommend running with ICNTL(4)=3 to display diagnostics. Look for the line in standard output that says "TOTAL space in MBYTES for IC factorization". This is the amount of memory that mumps is trying to allocate, and for the default ICNTL(14) it should be similar to what Matlab needs.
> >
> > Hope this helps,
> > -Matt Landreman
> > University of Maryland
> >
> > On Tue, May 26, 2015 at 10:03 AM, venkatesh g <venkateshg...@gmail.com> wrote:
> >
> > I posted a while ago in the MUMPS forums but no one seems to reply.
> >
> > I am solving a large generalized eigenvalue problem.
> >
> > I am getting the following error (attached) after giving the command:
> >
> > /cluster/share/venkatesh/petsc-3.5.3/linux-gnu/bin/mpiexec -np 64 -hosts compute-0-4,compute-0-6,compute-0-7,compute-0-8 ./ex7 -f1 a72t -f2 b72t -st_type sinvert -eps_nev 3 -eps_target 0.5 -st_ksp_type preonly -st_pc_type lu -st_pc_factor_mat_solver_package mumps -mat_mumps_icntl_14 200000
> >
> > It is impossible to allocate so much memory per processor; it is asking for around 70 GB per processor.
> >
> > A serial job in MATLAB for the same matrices takes < 60 GB.
> >
> > After trying out superLU_dist, I have attached the error from there as well (a segmentation fault).
> >
> > Kindly help me.
> >
> > Venkatesh
> >
> > <mumps_error_log.txt>
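On CNTL(1): the MUMPS user guide calls it the relative threshold for numerical pivoting. Under the usual threshold partial pivoting rule, a diagonal entry a_kk is accepted as a pivot only if |a_kk| >= u * max_i |a_ik| over the fully summed entries of its column, with u = CNTL(1). Rejected pivots are postponed, and that is the unanticipated fill-in Matt describes. The test is relative to the column, not an absolute cutoff. The sketch below uses hypothetical helper names and implements only this textbook rule, not MUMPS's actual implementation, which is more involved. It shows why lowering CNTL(1) from 1e-2 to 1e-4 or 1e-6 rejects fewer pivots. The trade-off is weaker protection against growth in the factors, so checking residuals of the computed eigenpairs is advisable.

```python
def diagonal_pivot_ok(col, u):
    """Threshold partial pivoting test for one candidate pivot column.

    col[0] is the diagonal entry a_kk; col[1:] are the other fully summed
    entries a_ik of the same column. Accept if |a_kk| >= u * max_i |a_ik|.
    """
    biggest = max(abs(v) for v in col)
    return biggest > 0 and abs(col[0]) >= u * biggest


def count_rejected(columns, u):
    """Number of candidate columns whose diagonal fails the threshold u."""
    return sum(1 for col in columns if not diagonal_pivot_ok(col, u))


if __name__ == "__main__":
    # Hypothetical columns with increasingly small diagonal entries.
    cols = [[1.0, 0.5], [1e-3, 1.0], [1e-5, 2.0]]
    # The default Matt quotes, Venkatesh's 1e-4, and Matt's usual 1e-6.
    for u in (1e-2, 1e-4, 1e-6):
        print(u, count_rejected(cols, u))
```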