Hi,

The same eigenproblem runs with 120 GB RAM on a serial machine in MATLAB.
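(For context, a minimal Cray-style PBS submission script along these lines could express the run described below; this is only a sketch with assumed node layout, queue settings and walltime, not the attached script itself. The executable and file names ex7, a2, b2 and the solver options are taken from the commands quoted later in this thread.)

#!/bin/bash
#PBS -N eigen_mumps
#PBS -l select=10:ncpus=24:mpiprocs=24   # 240 MPI ranks total (assumed node layout)
#PBS -l walltime=04:00:00

cd $PBS_O_WORKDIR

# 240 ranks, roughly 4 GB of memory per rank on this allocation.
# ex7 reads the complex matrices A and B from the binary files a2 and b2.
aprun -n 240 ./ex7 -f1 a2 -f2 b2 -eps_nev 1 -st_type sinvert -eps_max_it 5000 \
    -st_ksp_type preonly -st_pc_type lu \
    -st_pc_factor_mat_solver_package mumps -mat_mumps_icntl_14 200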
On the Cray I launched it with 240 cores * 4 GB RAM in parallel, so it should fit in memory, right? Also, for small matrices I see negative scaling, i.e. the 24-core run is faster. I have attached the submission script; please take a look.

Kindly let me know.

cheers,
Venkatesh

On Sat, May 23, 2015 at 4:58 PM, Matthew Knepley <knep...@gmail.com> wrote:

> On Sat, May 23, 2015 at 2:39 AM, venkatesh g <venkateshg...@gmail.com> wrote:
>
>> Hi again,
>>
>> I have installed PETSc and SLEPc on the Cray with the Intel compilers and MUMPS.
>>
>> I am getting this error when I solve an eigenvalue problem with large matrices:
>> [201]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: Cannot allocate required memory 9632 megabytes
>
> It ran out of memory on the node.
>
>> Also, it is again not scaling well for small matrices.
>
> MUMPS strong scaling for small matrices is not very good. Weak scaling is looking at big matrices.
>
> Thanks,
>
> Matt
>
>> Kindly let me know what to do.
>>
>> cheers,
>>
>> Venkatesh
>>
>> On Tue, May 19, 2015 at 3:02 PM, Matthew Knepley <knep...@gmail.com> wrote:
>>
>>> On Tue, May 19, 2015 at 1:04 AM, venkatesh g <venkateshg...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have attached the log of the command which I gave on the master node: make streams NPMAX=32
>>>>
>>>> I don't know why it says 'It appears you have only 1 node'. But other codes run in parallel with good scaling on 8 nodes.
>>>
>>> If you look at the STREAMS numbers, you can see that your system is only able to support about 2 cores with the available memory bandwidth. Thus for bandwidth-constrained operations (almost everything in sparse linear algebra and solvers), your speedup will not be bigger than 2.
>>>
>>> Other codes may do well on this machine, but they would be compute constrained, using things like DGEMM.
>>>
>>> Thanks,
>>>
>>> Matt
>>>
>>>> Kindly let me know.
>>>>
>>>> Venkatesh
>>>>
>>>> On Mon, May 18, 2015 at 11:21 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:
>>>>
>>>>> Run the streams benchmark on this system and send the results.
>>>>> http://www.mcs.anl.gov/petsc/documentation/faq.html#computers
>>>>>
>>>>> > On May 18, 2015, at 11:14 AM, venkatesh g <venkateshg...@gmail.com> wrote:
>>>>> >
>>>>> > Hi,
>>>>> > I have emailed the mumps-user list.
>>>>> > Actually the cluster has 8 nodes with 16 cores each, and other codes scale well.
>>>>> > I wanted to ask: if this job takes much time, then when I submit on more cores I have to increase ICNTL(14), which would again take a long time.
>>>>> >
>>>>> > So is there another way?
>>>>> >
>>>>> > cheers,
>>>>> > Venkatesh
>>>>> >
>>>>> > On Mon, May 18, 2015 at 7:16 PM, Matthew Knepley <knep...@gmail.com> wrote:
>>>>> > On Mon, May 18, 2015 at 8:29 AM, venkatesh g <venkateshg...@gmail.com> wrote:
>>>>> > Hi, I have attached the performance logs for 2 jobs on different numbers of processors. I had to increase the workspace ICNTL(14) when I submit on more cores, since it fails with a small value of ICNTL(14).
>>>>> >
>>>>> > 1. performance_log1.txt is run on 8 cores (option given: -mat_mumps_icntl_14 200)
>>>>> > 2. performance_log2.txt is run on 2 cores (option given: -mat_mumps_icntl_14 85)
>>>>> >
>>>>> > 1) Your number of iterates increased from 7600 to 9600, but that is a relatively small effect.
>>>>> >
>>>>> > 2) MUMPS is just taking a lot longer to do the forward/backward solve. You might try emailing the list for them. However, I would bet that your system has enough bandwidth for 2 procs and not much more.
>>>>> >
>>>>> > Thanks,
>>>>> >
>>>>> > Matt
>>>>> >
>>>>> > Venkatesh
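(As a reference for the bandwidth discussion above, the STREAMS benchmark that Barry and Matt refer to is run from the PETSc source tree; a minimal sketch, assuming PETSC_DIR and PETSC_ARCH are already set for the installed build:)

# Run the PETSc STREAMS memory-bandwidth benchmark on up to 32 MPI processes.
# The summary shows how aggregate bandwidth grows as processes are added,
# which bounds the speedup of bandwidth-limited sparse solves.
cd "$PETSC_DIR"
make streams NPMAX=32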
>>>>> > On Sun, May 17, 2015 at 6:13 PM, Matthew Knepley <knep...@gmail.com> wrote:
>>>>> > On Sun, May 17, 2015 at 1:38 AM, venkatesh g <venkateshg...@gmail.com> wrote:
>>>>> > Hi, thanks for the information. I have now increased the workspace by adding '-mat_mumps_icntl_14 100'.
>>>>> >
>>>>> > It works. However, the problem is that if I submit on 1 core I get the answer in 200 secs, but with 4 cores and '-mat_mumps_icntl_14 100' it takes 3500 secs.
>>>>> >
>>>>> > Send the output of -log_summary for all performance queries. Otherwise we are just guessing.
>>>>> >
>>>>> > Matt
>>>>> >
>>>>> > My command line is: 'mpiexec -np 4 ./ex7 -f1 a2 -f2 b2 -eps_nev 1 -st_type sinvert -eps_max_it 5000 -st_ksp_type preonly -st_pc_type lu -st_pc_factor_mat_solver_package mumps -mat_mumps_icntl_14 100'
>>>>> >
>>>>> > Kindly let me know.
>>>>> >
>>>>> > Venkatesh
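(Following Matt's request above, the two runs being compared can be repeated with -log_summary so the timings can be diagnosed; a minimal sketch, with the log file names being placeholders:)

# Repeat the solve on 1 and 4 cores with a PETSc performance summary,
# so the time spent in the factorization and solves can be compared directly.
mpiexec -np 1 ./ex7 -f1 a2 -f2 b2 -eps_nev 1 -st_type sinvert -eps_max_it 5000 \
    -st_ksp_type preonly -st_pc_type lu -st_pc_factor_mat_solver_package mumps \
    -mat_mumps_icntl_14 100 -log_summary > log_np1.txt

mpiexec -np 4 ./ex7 -f1 a2 -f2 b2 -eps_nev 1 -st_type sinvert -eps_max_it 5000 \
    -st_ksp_type preonly -st_pc_type lu -st_pc_factor_mat_solver_package mumps \
    -mat_mumps_icntl_14 100 -log_summary > log_np4.txt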
>>>>> > On Sat, May 16, 2015 at 7:10 PM, David Knezevic <david.kneze...@akselos.com> wrote:
>>>>> > On Sat, May 16, 2015 at 8:08 AM, venkatesh g <venkateshg...@gmail.com> wrote:
>>>>> > Hi,
>>>>> > I am trying to solve the AX = lambda BX eigenvalue problem.
>>>>> >
>>>>> > A and B are of size 3600x3600.
>>>>> >
>>>>> > I run with this command:
>>>>> > 'mpiexec -np 4 ./ex7 -f1 a2 -f2 b2 -eps_nev 1 -st_type sinvert -eps_max_it 5000 -st_ksp_type preonly -st_pc_type lu -st_pc_factor_mat_solver_package mumps'
>>>>> >
>>>>> > I get this error (I get a result only when I use 1 or 2 processors):
>>>>> > Reading COMPLEX matrices from binary files...
>>>>> > [0]PETSC ERROR: --------------------- Error Message ------------------------------------
>>>>> > [0]PETSC ERROR: Error in external library!
>>>>> > [0]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFO(1)=-9, INFO(2)=2024
>>>>> >
>>>>> > The MUMPS error types are described in Chapter 7 of the MUMPS manual. In this case you have INFO(1)=-9, which is explained in the manual as:
>>>>> >
>>>>> > "-9 Main internal real/complex workarray S too small. If INFO(2) is positive, then the number of entries that are missing in S at the moment when the error is raised is available in INFO(2). If INFO(2) is negative, then its absolute value should be multiplied by 1 million. If an error -9 occurs, the user should increase the value of ICNTL(14) before calling the factorization (JOB=2) again, except if ICNTL(23) is provided, in which case ICNTL(23) should be increased."
>>>>> >
>>>>> > This says that you should use ICNTL(14) to increase the working space size:
>>>>> >
>>>>> > "ICNTL(14) is accessed by the host both during the analysis and the factorization phases. It corresponds to the percentage increase in the estimated working space. When significant extra fill-in is caused by numerical pivoting, increasing ICNTL(14) may help. Except in special cases, the default value is 20 (which corresponds to a 20% increase)."
>>>>> >
>>>>> > So, for example, you can avoid this error via the following command line argument to PETSc: "-mat_mumps_icntl_14 30", where 30 indicates that we allow a 30% increase in the workspace instead of the default 20%.
>>>>> >
>>>>> > David
>
> --
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
[Attachment: script (binary data)]