Hello,

This is the output of -log_view. I selected what I thought were the important parts. I'm not sure this is the best format to send the logs; if a text file is better, let me know.

Thanks again,
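(For reference, the "Setting Up EPS" stage that appears in the log below is a user-registered PETSc log stage. A minimal sketch of how such a stage is typically registered around EPSSetUp() — assumed illustrative code, not the actual dos.exe source — looks like this:)

/* Minimal sketch: register a log stage and push/pop it around EPSSetUp(),
 * so -log_view reports the factorization cost separately from the solve.
 * The tiny diagonal operator is a stand-in so the sketch runs on its own. */
#include <slepceps.h>

int main(int argc, char **argv)
{
  Mat            A;
  EPS            eps;
  PetscLogStage  stage;
  PetscInt       i, Istart, Iend, n = 100;   /* stand-in problem size */
  PetscErrorCode ierr;

  ierr = SlepcInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;

  /* Stand-in diagonal operator, just so the sketch is runnable */
  ierr = MatCreate(PETSC_COMM_WORLD, &A); CHKERRQ(ierr);
  ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n); CHKERRQ(ierr);
  ierr = MatSetFromOptions(A); CHKERRQ(ierr);
  ierr = MatSetUp(A); CHKERRQ(ierr);
  ierr = MatGetOwnershipRange(A, &Istart, &Iend); CHKERRQ(ierr);
  for (i = Istart; i < Iend; i++) {
    ierr = MatSetValue(A, i, i, (PetscScalar)(i + 1), INSERT_VALUES); CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);

  ierr = EPSCreate(PETSC_COMM_WORLD, &eps); CHKERRQ(ierr);
  ierr = EPSSetOperators(eps, A, NULL); CHKERRQ(ierr);
  ierr = EPSSetProblemType(eps, EPS_HEP); CHKERRQ(ierr);
  ierr = EPSSetFromOptions(eps); CHKERRQ(ierr);

  ierr = PetscLogStageRegister("Setting Up EPS", &stage); CHKERRQ(ierr);
  ierr = PetscLogStagePush(stage); CHKERRQ(ierr);
  ierr = EPSSetUp(eps); CHKERRQ(ierr);      /* logged in stage 1: factorization time shows up here */
  ierr = PetscLogStagePop(); CHKERRQ(ierr);

  ierr = EPSSolve(eps); CHKERRQ(ierr);      /* logged in stage 0: Main Stage */

  ierr = EPSDestroy(&eps); CHKERRQ(ierr);
  ierr = MatDestroy(&A); CHKERRQ(ierr);
  ierr = SlepcFinalize();
  return ierr;
}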
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./dos.exe on a named compute-0-11.local with 20 processors, by pcd Tue Nov 26 15:50:50 2019
Using Petsc Release Version 3.10.5, Mar, 28, 2019

                          Max       Max/Min     Avg       Total
Time (sec):           2.214e+03      1.000   2.214e+03
Objects:              1.370e+02      1.030   1.332e+02
Flop:                 1.967e+14      1.412   1.539e+14  3.077e+15
Flop/sec:             8.886e+10      1.412   6.950e+10  1.390e+12
MPI Messages:         1.716e+03      1.350   1.516e+03  3.032e+04
MPI Message Lengths:  2.559e+08      5.796   4.179e+04  1.267e+09
MPI Reductions:       3.840e+02      1.000

Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total    Count   %Total     Avg        %Total    Count   %Total
 0:      Main Stage: 1.0000e+02   4.5%  3.0771e+15 100.0%  3.016e+04  99.5%  4.190e+04      99.7%  3.310e+02  86.2%
 1:  Setting Up EPS: 2.1137e+03  95.5%  7.4307e+09   0.0%  1.600e+02   0.5%  2.000e+04       0.3%  4.600e+01  12.0%

------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

PetscBarrier           2 1.0 2.6554e+004632.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   3  0  0  0  0     0
BuildTwoSidedF         3 1.0 1.2021e-01672.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecDot                 8 1.0 1.1364e-02 2.3 8.00e+05 1.0 0.0e+00 0.0e+00 8.0e+00  0  0  0  0  2   0  0  0  0  2  1408
VecMDot               11 1.0 4.8588e-02 2.2 6.60e+06 1.0 0.0e+00 0.0e+00 1.1e+01  0  0  0  0  3   0  0  0  0  3  2717
VecNorm               12 1.0 5.2616e-02 4.3 1.20e+06 1.0 0.0e+00 0.0e+00 1.2e+01  0  0  0  0  3   0  0  0  0  4   456
VecScale              12 1.0 9.8681e-04 2.2 6.00e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 12160
VecCopy                3 1.0 4.1175e-04 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet               108 1.0 9.3610e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY                1 1.0 1.6284e-04 3.2 1.00e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 12282
VecMAXPY              12 1.0 7.6976e-03 1.9 7.70e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 20006
VecScatterBegin      419 1.0 4.5905e-01 3.7 0.00e+00 0.0 2.9e+04 3.7e+04 9.0e+01  0  0 96 85 23   0  0 97 85 27     0
VecScatterEnd        329 1.0 9.3328e-01 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   1  0  0  0  0     0
VecSetRandom           1 1.0 4.3299e-03 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecNormalize          12 1.0 5.3697e-02 4.2 1.80e+06 1.0 0.0e+00 0.0e+00 1.2e+01  0  0  0  0  3   0  0  0  0  4   670
MatMult              240 1.0 1.2112e-01 1.5 1.86e+07 1.0 4.4e+02 8.0e+04 0.0e+00  0  0  1  3  0   0  0  1  3  0  3071
MatSolve             101 1.0 9.3087e+01 1.0 1.97e+14 1.4 2.9e+04 3.5e+04 9.1e+01  4100 97 82 24  93100 97 82 27 33055277
MatCholFctrNum         1 1.0 1.2752e-02 2.8 5.00e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    78
MatICCFactorSym        1 1.0 4.0321e-03 3.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin       5 1.7 1.2031e-01501.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         5 1.7 6.6613e-02 2.4 0.00e+00 0.0 1.6e+02 2.0e+04 2.4e+01  0  0  1  0  6   0  0  1  0  7     0
MatGetRowIJ            1 1.0 7.1526e-06 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 1.0 1.2271e-03 3.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatLoad                3 1.0 2.8543e-01 1.0 0.00e+00 0.0 3.3e+02 5.6e+05 5.4e+01  0  0  1 15 14   0  0  1 15 16     0
MatView                2 0.0 7.4778e-02 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSetUp               2 1.0 1.3866e-0236.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve              90 1.0 9.3211e+01 1.0 1.97e+14 1.4 3.0e+04 3.6e+04 1.1e+02  4100 98 85 30  93100 99 85 34 33011509
KSPGMRESOrthog        11 1.0 5.3543e-02 2.0 1.32e+07 1.0 0.0e+00 0.0e+00 1.1e+01  0  0  0  0  3   0  0  0  0  3  4931
PCSetUp                2 1.0 1.8253e-02 2.9 5.00e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    55
PCSetUpOnBlocks        1 1.0 1.8055e-02 2.9 5.00e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    55
PCApply              101 1.0 9.3089e+01 1.0 1.97e+14 1.4 2.9e+04 3.5e+04 9.1e+01  4100 97 82 24  93100 97 82 27 33054820
EPSSolve               1 1.0 9.5183e+01 1.0 1.97e+14 1.4 2.9e+04 3.5e+04 2.4e+02  4100 97 82 63  95100 97 82 73 32327750
STApply               89 1.0 9.3107e+01 1.0 1.97e+14 1.4 2.9e+04 3.5e+04 9.1e+01  4100 97 82 24  93100 97 82 27 33048198
STMatSolve            89 1.0 9.3084e+01 1.0 1.97e+14 1.4 2.9e+04 3.5e+04 9.1e+01  4100 97 82 24  93100 97 82 27 33056525
BVCreate               2 1.0 5.0357e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00  0  0  0  0  2   0  0  0  0  2     0
BVCopy                 1 1.0 9.2030e-05 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
BVMultVec            132 1.0 7.2259e-01 1.3 5.26e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   1  0  0  0  0 14567
BVMultInPlace          1 1.0 2.2316e-01 1.1 6.40e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 57357
BVDotVec             132 1.0 1.3370e+00 1.1 5.46e+08 1.0 0.0e+00 0.0e+00 1.3e+02  0  0  0  0 35   1  0  0  0 40  8169
BVOrthogonalizeV      81 1.0 1.9413e+00 1.1 1.07e+09 1.0 0.0e+00 0.0e+00 1.3e+02  0  0  0  0 35   2  0  0  0 40 11048
BVScale               89 1.0 3.0558e-03 1.4 4.45e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 29125
BVNormVec              8 1.0 1.5073e-02 1.9 1.20e+06 1.0 0.0e+00 0.0e+00 1.0e+01  0  0  0  0  3   0  0  0  0  3  1592
BVSetRandom            1 1.0 4.3440e-03 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
DSSolve                1 1.0 2.5339e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
DSVectors             80 1.0 3.5286e-05 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
DSOther                1 1.0 6.0797e-05 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0

--- Event Stage 1: Setting Up EPS

BuildTwoSidedF         3 1.0 2.8591e-0211.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet                 4 1.0 6.1312e-03122.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatCholFctrSym         1 1.0 1.1540e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00  1  0  0  0  1   1  0  0  0 11     0
MatCholFctrNum         2 1.0 2.1019e+03 1.0 1.00e+09 4.3 0.0e+00 0.0e+00 0.0e+00 95  0  0  0  0  99100  0  0  0     4
MatCopy                1 1.0 3.3707e-02 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  1   0  0  0  0  4     0
MatConvert             1 1.0 6.1760e-03 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin       3 1.0 2.8630e-0211.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         3 1.0 3.2575e-02 1.1 0.00e+00 0.0 1.6e+02 2.0e+04 1.8e+01  0  0  1  0  5   0  0100100 39     0
MatGetRowIJ            1 1.0 1.1921e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 1.0 2.6703e-04 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatZeroEntries         1 1.0 1.0121e-03 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAXPY                2 1.0 1.1354e-01 1.1 0.00e+00 0.0 1.6e+02 2.0e+04 2.0e+01  0  0  1  0  5   0  0100100 43     0
KSPSetUp               2 1.0 2.1458e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCSetUp                2 1.0 2.1135e+03 1.0 1.00e+09 4.3 0.0e+00 0.0e+00 1.2e+01 95  0  0  0  3  100100  0  0 26     4
EPSSetUp               1 1.0 2.1137e+03 1.0 1.00e+09 4.3 1.6e+02 2.0e+04 4.6e+01 95  0  1  0 12  100100100100100     4
STSetUp                2 1.0 1.0712e+03 1.0 4.95e+08 4.3 8.0e+01 2.0e+04 2.6e+01 48  0  0  0  7   51 50 50 50 57     3
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Vector    37             50    126614208     0.
              Matrix    13             17    159831092     0.
              Viewer     6              5         4200     0.
           Index Set    12             13      2507240     0.
         Vec Scatter     5              7       128984     0.
       Krylov Solver     3              4        22776     0.
      Preconditioner     3              4         3848     0.
          EPS Solver     1              2         8632     0.
  Spectral Transform     1              2         1664     0.
       Basis Vectors     3              4        45600     0.
         PetscRandom     2              2         1292     0.
              Region     1              2         1344     0.
       Direct Solver     1              2       163856     0.

--- Event Stage 1: Setting Up EPS

              Vector    19              6       729576     0.
              Matrix    10              6     12178892     0.
           Index Set     9              8       766336     0.
         Vec Scatter     4              2         2640     0.
       Krylov Solver     1              0            0     0.
      Preconditioner     1              0            0     0.
          EPS Solver     1              0            0     0.
  Spectral Transform     1              0            0     0.
       Basis Vectors     1              0            0     0.
              Region     1              0            0     0.
       Direct Solver     1              0            0     0.
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
Average time for MPI_Barrier(): 0.000263596
Average time for zero size MPI_Send(): 5.78523e-05
#PETSc Option Table entries:
-log_view
-mat_mumps_cntl_3 1e-12
-mat_mumps_icntl_13 1
-mat_mumps_icntl_14 60
-mat_mumps_icntl_24 1
-matload_block_size 1
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --prefix=/share/apps/petsc/3.10.5 --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --with-debugging=0 COPTFLAGS="-O3 -march=native -mtune=native" CXXOPTFLAGS="-O3 -march=native -mtune=native" FOPTFLAGS="-O3 -march=native -mtune=native" --download-mpich --download-fblaslapack --download-scalapack --download-mumps

Best regards,

Perceval,

> On Mon, Nov 25, 2019 at 11:45 AM Perceval Desforges <perceval.desfor...@polytechnique.edu> wrote:
>
>> I am basically trying to solve a finite element problem, which is why in 3D I have 7 non-zero diagonals that are quite far apart from one another. In 2D I only have 5 non-zero diagonals that are less far apart. So is it normal that the setup time is around 400 times greater in the 3D case? Is there nothing to be done?
>
> No. It is almost certain that preallocation is screwed up. There is no way it can take 400x longer for a few nonzeros.
>
> In order to debug, please send the output of -log_view and indicate where the time is taken for assembly. You can usually track down bad preallocation using -info.
>
> Thanks,
>
> Matt
>
> I will try setting up only one partition.
>
> Thanks,
>
> Perceval,
>
> Probably it is not a preallocation issue, as it shows "total number of mallocs used during MatSetValues calls =0".
>
> Adding new diagonals may increase fill-in a lot, if the new diagonals are displaced with respect to the other ones.
>
> The partitions option is intended for running on several nodes. If you are using just one node, it is probably better to set one partition only.
>
> Jose
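(Jose's single-partition suggestion can be applied either with the command-line option -eps_krylovschur_partitions 1 or in code. A hedged sketch follows; the function name is illustrative and it assumes an EPS of the default EPSKRYLOVSCHUR type already set up for spectrum slicing:)

#include <slepceps.h>

/* Illustrative helper: run spectrum slicing with a single partition on one node. */
PetscErrorCode ConfigureSinglePartition(EPS eps)
{
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  /* Equivalent to the command-line option -eps_krylovschur_partitions 1 */
  ierr = EPSKrylovSchurSetPartitions(eps, 1); CHKERRQ(ierr);
  /* Keep zero detection on, as in the original option list (-eps_krylovschur_detect_zeros 1) */
  ierr = EPSKrylovSchurSetDetectZeros(eps, PETSC_TRUE); CHKERRQ(ierr);
  PetscFunctionReturn(0);
}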
> On 25 Nov 2019, at 18:25, Matthew Knepley <knep...@gmail.com> wrote:
>
> On Mon, Nov 25, 2019 at 11:20 AM Perceval Desforges <perceval.desfor...@polytechnique.edu> wrote:
>
> Hi,
>
> So I'm loading two matrices from files, both 1000000 by 1000000. I ran the program with -mat_view ::ascii_info and I got:
>
> Mat Object: 1 MPI processes
>   type: seqaij
>   rows=1000000, cols=1000000
>   total: nonzeros=7000000, allocated nonzeros=7000000
>   total number of mallocs used during MatSetValues calls =0
>     not using I-node routines
>
> 20 times, and then
>
> Mat Object: 1 MPI processes
>   type: seqaij
>   rows=1000000, cols=1000000
>   total: nonzeros=1000000, allocated nonzeros=1000000
>   total number of mallocs used during MatSetValues calls =0
>     not using I-node routines
>
> 20 times as well, and then
>
> Mat Object: 1 MPI processes
>   type: seqaij
>   rows=1000000, cols=1000000
>   total: nonzeros=7000000, allocated nonzeros=7000000
>   total number of mallocs used during MatSetValues calls =0
>     not using I-node routines
>
> 20 times as well before crashing.
>
> I realized it might be because I am setting up 20 Krylov-Schur partitions, which may be too many. I tried running the code again with only 2 partitions and now the code runs, but I have speed issues.
>
> I have one version of the code where my first matrix has 5 non-zero diagonals (so 5000000 non-zero entries), and the setup time is quite fast (8 seconds) and solving is also quite fast. The second version is the same but I have two extra non-zero diagonals (7000000 non-zero entries), and the setup time is a lot slower (2900 seconds ~ 50 minutes) and solving is also a lot slower. Is it normal that adding two extra diagonals increases setup and solve time so much?
>
> I can't see the rest of your code, but I am guessing your preallocation statement has "5", so it does no mallocs when you create your first matrix, but mallocs for every row when you create your second matrix. When you load them from disk, we do all the preallocation correctly.
>
> Thanks,
>
> Matt
>
> Thanks again,
>
> Best regards,
>
> Perceval,
>
> Then I guess it is the factorization that is failing. How many nonzero entries do you have? Run with -mat_view ::ascii_info
>
> Jose
>
> On 22 Nov 2019, at 19:56, Perceval Desforges <perceval.desfor...@polytechnique.edu> wrote:
>
> Hi,
>
> Thanks for your answer. I tried looking at the inertias before solving, but the problem is that the program crashes when I call EPSSetUp with this error:
>
> slurmstepd: error: Step 2140.0 exceeded virtual memory limit (313526508 > 107317760), being killed
>
> I get this error even when there are no eigenvalues in the interval.
>
> I've started using BVMAT instead of BVVECS by the way.
>
> Thanks,
>
> Perceval,
>
> Don't use -mat_mumps_icntl_14 to reduce the memory used by MUMPS.
>
> Most likely the problem is that the interval you gave is too large and contains too many eigenvalues (SLEPc needs to allocate at least one vector per eigenvalue). You can count the eigenvalues in the interval with the inertias, which are available at EPSSetUp (no need to call EPSSolve). See this example:
> http://slepc.upv.es/documentation/current/src/eps/examples/tutorials/ex25.c.html
> You can comment out the call to EPSSolve() and run with the option -show_inertias
> For example, the output
>   Shift 0.1   Inertia 3
>   Shift 0.35  Inertia 11
> means that the interval [0.1,0.35] contains 8 eigenvalues (=11-3).
>
> By the way, I would suggest using BVMAT instead of BVVECS (the latter is slower).
>
> Jose
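(Following Jose's pointer to ex25.c, the inertia check before committing to a full solve could look roughly like the sketch below. It assumes an EPS already configured for spectrum slicing, i.e. with operators, EPS_ALL, an interval, and the shift-and-invert ST; the function name is illustrative:)

#include <slepceps.h>

/* Illustrative helper: count the eigenvalues in the computational interval
 * using only the inertias computed at EPSSetUp(), without calling EPSSolve(). */
PetscErrorCode CountEigenvaluesInInterval(EPS eps)
{
  PetscErrorCode ierr;
  PetscInt       nshifts, i, *inertias;
  PetscReal      *shifts;

  PetscFunctionBeginUser;
  ierr = EPSSetUp(eps); CHKERRQ(ierr);   /* triggers the factorizations; no EPSSolve() needed */
  ierr = EPSKrylovSchurGetInertias(eps, &nshifts, &shifts, &inertias); CHKERRQ(ierr);
  for (i = 0; i < nshifts; i++) {
    ierr = PetscPrintf(PETSC_COMM_WORLD, "Shift %g  Inertia %D\n", (double)shifts[i], inertias[i]); CHKERRQ(ierr);
  }
  /* inertias[nshifts-1] - inertias[0] is the number of eigenvalues in the interval */
  ierr = PetscFree(shifts); CHKERRQ(ierr);
  ierr = PetscFree(inertias); CHKERRQ(ierr);
  PetscFunctionReturn(0);
}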
>
> On 21 Nov 2019, at 18:13, Perceval Desforges via petsc-users <petsc-users@mcs.anl.gov> wrote:
>
> Hello all,
>
> I am trying to obtain all the eigenvalues in a certain interval for a fairly large matrix (1000000 * 1000000). I therefore use the spectrum slicing method detailed in section 3.4.5 of the manual. The calculations are run on a processor with 20 cores and 96 GB of RAM.
>
> The options I use are:
>
> -bv_type vecs -eps_krylovschur_detect_zeros 1 -mat_mumps_icntl_13 1 -mat_mumps_icntl_24 1 -mat_mumps_cntl_3 1e-12
>
> However the program quickly crashes with this error:
>
> slurmstepd: error: Step 2115.0 exceeded virtual memory limit (312121084 > 107317760), being killed
>
> I've tried reducing the amount of memory used by MUMPS with the -mat_mumps_icntl_14 option, by setting it to -70 for example, but then I get this error:
>
> [1]PETSC ERROR: Error in external library
> [1]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=82733614
>
> which is an error due to setting the MUMPS icntl option so low, from what I've gathered.
>
> Is there any other way I can reduce memory usage?
>
> Thanks,
>
> Regards,
>
> Perceval,
>
> P.S. I sent the same email a few minutes ago but I think I made a mistake in the address; I'm sorry if I've sent it twice.

--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
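(For context, a minimal spectrum-slicing setup consistent with the options discussed in this thread might look like the sketch below. This is not Perceval's dos.exe source; the matrix file name "A.dat" and the interval endpoints are placeholders, and a generalized problem would pass a second matrix to EPSSetOperators:)

#include <slepceps.h>

int main(int argc, char **argv)
{
  Mat            A;
  EPS            eps;
  ST             st;
  KSP            ksp;
  PC             pc;
  PetscViewer    viewer;
  PetscErrorCode ierr;

  ierr = SlepcInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;

  /* Load the (symmetric) operator from a PETSc binary file (placeholder name) */
  ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD, "A.dat", FILE_MODE_READ, &viewer); CHKERRQ(ierr);
  ierr = MatCreate(PETSC_COMM_WORLD, &A); CHKERRQ(ierr);
  ierr = MatSetFromOptions(A); CHKERRQ(ierr);
  ierr = MatLoad(A, viewer); CHKERRQ(ierr);
  ierr = PetscViewerDestroy(&viewer); CHKERRQ(ierr);

  ierr = EPSCreate(PETSC_COMM_WORLD, &eps); CHKERRQ(ierr);
  ierr = EPSSetOperators(eps, A, NULL); CHKERRQ(ierr);
  ierr = EPSSetProblemType(eps, EPS_HEP); CHKERRQ(ierr);
  ierr = EPSSetType(eps, EPSKRYLOVSCHUR); CHKERRQ(ierr);
  ierr = EPSSetWhichEigenpairs(eps, EPS_ALL); CHKERRQ(ierr);          /* all eigenvalues ... */
  ierr = EPSSetInterval(eps, -1.0, 1.0); CHKERRQ(ierr);               /* ... in a placeholder interval */

  /* Spectrum slicing needs shift-and-invert with a direct Cholesky factorization */
  ierr = EPSGetST(eps, &st); CHKERRQ(ierr);
  ierr = STSetType(st, STSINVERT); CHKERRQ(ierr);
  ierr = STGetKSP(st, &ksp); CHKERRQ(ierr);
  ierr = KSPSetType(ksp, KSPPREONLY); CHKERRQ(ierr);
  ierr = KSPGetPC(ksp, &pc); CHKERRQ(ierr);
  ierr = PCSetType(pc, PCCHOLESKY); CHKERRQ(ierr);
  ierr = PCFactorSetMatSolverType(pc, MATSOLVERMUMPS); CHKERRQ(ierr); /* so -mat_mumps_* options apply */

  ierr = EPSSetFromOptions(eps); CHKERRQ(ierr);  /* picks up -bv_type, -eps_krylovschur_*, etc. */
  ierr = EPSSolve(eps); CHKERRQ(ierr);

  ierr = EPSDestroy(&eps); CHKERRQ(ierr);
  ierr = MatDestroy(&A); CHKERRQ(ierr);
  ierr = SlepcFinalize();
  return ierr;
}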