Hello,

This is the output of -log_view. I selected what I thought were the important parts. I'm not sure this is the best format to send the logs; if a text file is better, let me know.

Thanks again,
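(For reference, the "Setting Up EPS" stage that appears in the log below is a user-registered PETSc log stage. A minimal sketch of how such a stage is typically registered around EPSSetUp() — assumed illustrative code, not the actual dos.exe source — looks like this:)

/* Minimal sketch: register a log stage and push/pop it around EPSSetUp(),
 * so -log_view reports the factorization cost separately from the solve.
 * The tiny diagonal operator is a stand-in so the sketch runs on its own. */
#include <slepceps.h>

int main(int argc, char **argv)
{
  Mat            A;
  EPS            eps;
  PetscLogStage  stage;
  PetscInt       i, Istart, Iend, n = 100;   /* stand-in problem size */
  PetscErrorCode ierr;

  ierr = SlepcInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;

  /* Stand-in diagonal operator, just so the sketch is runnable */
  ierr = MatCreate(PETSC_COMM_WORLD, &A); CHKERRQ(ierr);
  ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n); CHKERRQ(ierr);
  ierr = MatSetFromOptions(A); CHKERRQ(ierr);
  ierr = MatSetUp(A); CHKERRQ(ierr);
  ierr = MatGetOwnershipRange(A, &Istart, &Iend); CHKERRQ(ierr);
  for (i = Istart; i < Iend; i++) {
    ierr = MatSetValue(A, i, i, (PetscScalar)(i + 1), INSERT_VALUES); CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);

  ierr = EPSCreate(PETSC_COMM_WORLD, &eps); CHKERRQ(ierr);
  ierr = EPSSetOperators(eps, A, NULL); CHKERRQ(ierr);
  ierr = EPSSetProblemType(eps, EPS_HEP); CHKERRQ(ierr);
  ierr = EPSSetFromOptions(eps); CHKERRQ(ierr);

  ierr = PetscLogStageRegister("Setting Up EPS", &stage); CHKERRQ(ierr);
  ierr = PetscLogStagePush(stage); CHKERRQ(ierr);
  ierr = EPSSetUp(eps); CHKERRQ(ierr);      /* logged in stage 1: factorization time shows up here */
  ierr = PetscLogStagePop(); CHKERRQ(ierr);

  ierr = EPSSolve(eps); CHKERRQ(ierr);      /* logged in stage 0: Main Stage */

  ierr = EPSDestroy(&eps); CHKERRQ(ierr);
  ierr = MatDestroy(&A); CHKERRQ(ierr);
  ierr = SlepcFinalize();
  return ierr;
}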
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./dos.exe on a named compute-0-11.local with 20 processors, by pcd Tue Nov 26 15:50:50 2019
Using Petsc Release Version 3.10.5, Mar, 28, 2019

                          Max       Max/Min     Avg       Total
Time (sec):           2.214e+03      1.000   2.214e+03
Objects:              1.370e+02      1.030   1.332e+02
Flop:                 1.967e+14      1.412   1.539e+14  3.077e+15
Flop/sec:             8.886e+10      1.412   6.950e+10  1.390e+12
MPI Messages:         1.716e+03      1.350   1.516e+03  3.032e+04
MPI Message Lengths:  2.559e+08      5.796   4.179e+04  1.267e+09
MPI Reductions:       3.840e+02      1.000

Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total    Count   %Total     Avg        %Total    Count   %Total
 0:      Main Stage: 1.0000e+02   4.5%  3.0771e+15 100.0%  3.016e+04  99.5%  4.190e+04      99.7%  3.310e+02  86.2%
 1:  Setting Up EPS: 2.1137e+03  95.5%  7.4307e+09   0.0%  1.600e+02   0.5%  2.000e+04       0.3%  4.600e+01  12.0%

------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

PetscBarrier           2 1.0 2.6554e+004632.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   3  0  0  0  0     0
BuildTwoSidedF         3 1.0 1.2021e-01672.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecDot                 8 1.0 1.1364e-02 2.3 8.00e+05 1.0 0.0e+00 0.0e+00 8.0e+00  0  0  0  0  2   0  0  0  0  2  1408
VecMDot               11 1.0 4.8588e-02 2.2 6.60e+06 1.0 0.0e+00 0.0e+00 1.1e+01  0  0  0  0  3   0  0  0  0  3  2717
VecNorm               12 1.0 5.2616e-02 4.3 1.20e+06 1.0 0.0e+00 0.0e+00 1.2e+01  0  0  0  0  3   0  0  0  0  4   456
VecScale              12 1.0 9.8681e-04 2.2 6.00e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 12160
VecCopy                3 1.0 4.1175e-04 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet               108 1.0 9.3610e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY                1 1.0 1.6284e-04 3.2 1.00e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 12282
VecMAXPY              12 1.0 7.6976e-03 1.9 7.70e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 20006
VecScatterBegin      419 1.0 4.5905e-01 3.7 0.00e+00 0.0 2.9e+04 3.7e+04 9.0e+01  0  0 96 85 23   0  0 97 85 27     0
VecScatterEnd        329 1.0 9.3328e-01 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   1  0  0  0  0     0
VecSetRandom           1 1.0 4.3299e-03 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecNormalize          12 1.0 5.3697e-02 4.2 1.80e+06 1.0 0.0e+00 0.0e+00 1.2e+01  0  0  0  0  3   0  0  0  0  4   670
MatMult              240 1.0 1.2112e-01 1.5 1.86e+07 1.0 4.4e+02 8.0e+04 0.0e+00  0  0  1  3  0   0  0  1  3  0  3071
MatSolve             101 1.0 9.3087e+01 1.0 1.97e+14 1.4 2.9e+04 3.5e+04 9.1e+01  4100 97 82 24  93100 97 82 27 33055277
MatCholFctrNum         1 1.0 1.2752e-02 2.8 5.00e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    78
MatICCFactorSym        1 1.0 4.0321e-03 3.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin       5 1.7 1.2031e-01501.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         5 1.7 6.6613e-02 2.4 0.00e+00 0.0 1.6e+02 2.0e+04 2.4e+01  0  0  1  0  6   0  0  1  0  7     0
MatGetRowIJ            1 1.0 7.1526e-06 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 1.0 1.2271e-03 3.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatLoad                3 1.0 2.8543e-01 1.0 0.00e+00 0.0 3.3e+02 5.6e+05 5.4e+01  0  0  1 15 14   0  0  1 15 16     0
MatView                2 0.0 7.4778e-02 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSetUp               2 1.0 1.3866e-0236.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve              90 1.0 9.3211e+01 1.0 1.97e+14 1.4 3.0e+04 3.6e+04 1.1e+02  4100 98 85 30  93100 99 85 34 33011509
KSPGMRESOrthog        11 1.0 5.3543e-02 2.0 1.32e+07 1.0 0.0e+00 0.0e+00 1.1e+01  0  0  0  0  3   0  0  0  0  3  4931
PCSetUp                2 1.0 1.8253e-02 2.9 5.00e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    55
PCSetUpOnBlocks        1 1.0 1.8055e-02 2.9 5.00e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    55
PCApply              101 1.0 9.3089e+01 1.0 1.97e+14 1.4 2.9e+04 3.5e+04 9.1e+01  4100 97 82 24  93100 97 82 27 33054820
EPSSolve               1 1.0 9.5183e+01 1.0 1.97e+14 1.4 2.9e+04 3.5e+04 2.4e+02  4100 97 82 63  95100 97 82 73 32327750
STApply               89 1.0 9.3107e+01 1.0 1.97e+14 1.4 2.9e+04 3.5e+04 9.1e+01  4100 97 82 24  93100 97 82 27 33048198
STMatSolve            89 1.0 9.3084e+01 1.0 1.97e+14 1.4 2.9e+04 3.5e+04 9.1e+01  4100 97 82 24  93100 97 82 27 33056525
BVCreate               2 1.0 5.0357e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00  0  0  0  0  2   0  0  0  0  2     0
BVCopy                 1 1.0 9.2030e-05 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
BVMultVec            132 1.0 7.2259e-01 1.3 5.26e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   1  0  0  0  0 14567
BVMultInPlace          1 1.0 2.2316e-01 1.1 6.40e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 57357
BVDotVec             132 1.0 1.3370e+00 1.1 5.46e+08 1.0 0.0e+00 0.0e+00 1.3e+02  0  0  0  0 35   1  0  0  0 40  8169
BVOrthogonalizeV      81 1.0 1.9413e+00 1.1 1.07e+09 1.0 0.0e+00 0.0e+00 1.3e+02  0  0  0  0 35   2  0  0  0 40 11048
BVScale               89 1.0 3.0558e-03 1.4 4.45e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 29125
BVNormVec              8 1.0 1.5073e-02 1.9 1.20e+06 1.0 0.0e+00 0.0e+00 1.0e+01  0  0  0  0  3   0  0  0  0  3  1592
BVSetRandom            1 1.0 4.3440e-03 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
DSSolve                1 1.0 2.5339e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
DSVectors             80 1.0 3.5286e-05 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
DSOther                1 1.0 6.0797e-05 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0

--- Event Stage 1: Setting Up EPS

BuildTwoSidedF         3 1.0 2.8591e-0211.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet                 4 1.0 6.1312e-03122.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatCholFctrSym         1 1.0 1.1540e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00  1  0  0  0  1   1  0  0  0 11     0
MatCholFctrNum         2 1.0 2.1019e+03 1.0 1.00e+09 4.3 0.0e+00 0.0e+00 0.0e+00 95  0  0  0  0  99100  0  0  0     4
MatCopy                1 1.0 3.3707e-02 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  1   0  0  0  0  4     0
MatConvert             1 1.0 6.1760e-03 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin       3 1.0 2.8630e-0211.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         3 1.0 3.2575e-02 1.1 0.00e+00 0.0 1.6e+02 2.0e+04 1.8e+01  0  0  1  0  5   0  0100100 39     0
MatGetRowIJ            1 1.0 1.1921e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 1.0 2.6703e-04 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatZeroEntries         1 1.0 1.0121e-03 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAXPY                2 1.0 1.1354e-01 1.1 0.00e+00 0.0 1.6e+02 2.0e+04 2.0e+01  0  0  1  0  5   0  0100100 43     0
KSPSetUp               2 1.0 2.1458e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCSetUp                2 1.0 2.1135e+03 1.0 1.00e+09 4.3 0.0e+00 0.0e+00 1.2e+01 95  0  0  0  3  100100  0  0 26     4
EPSSetUp               1 1.0 2.1137e+03 1.0 1.00e+09 4.3 1.6e+02 2.0e+04 4.6e+01 95  0  1  0 12  100100100100100     4
STSetUp                2 1.0 1.0712e+03 1.0 4.95e+08 4.3 8.0e+01 2.0e+04 2.6e+01 48  0  0  0  7   51 50 50 50 57     3
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Vector    37             50    126614208     0.
              Matrix    13             17    159831092     0.
              Viewer     6              5         4200     0.
           Index Set    12             13      2507240     0.
         Vec Scatter     5              7       128984     0.
       Krylov Solver     3              4        22776     0.
      Preconditioner     3              4         3848     0.
          EPS Solver     1              2         8632     0.
  Spectral Transform     1              2         1664     0.
       Basis Vectors     3              4        45600     0.
         PetscRandom     2              2         1292     0.
              Region     1              2         1344     0.
       Direct Solver     1              2       163856     0.

--- Event Stage 1: Setting Up EPS

              Vector    19              6       729576     0.
              Matrix    10              6     12178892     0.
           Index Set     9              8       766336     0.
         Vec Scatter     4              2         2640     0.
       Krylov Solver     1              0            0     0.
      Preconditioner     1              0            0     0.
          EPS Solver     1              0            0     0.
  Spectral Transform     1              0            0     0.
       Basis Vectors     1              0            0     0.
              Region     1              0            0     0.
       Direct Solver     1              0            0     0.
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
Average time for MPI_Barrier(): 0.000263596
Average time for zero size MPI_Send(): 5.78523e-05
#PETSc Option Table entries:
-log_view
-mat_mumps_cntl_3 1e-12
-mat_mumps_icntl_13 1
-mat_mumps_icntl_14 60
-mat_mumps_icntl_24 1
-matload_block_size 1
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --prefix=/share/apps/petsc/3.10.5 --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --with-debugging=0 COPTFLAGS="-O3 -march=native -mtune=native" CXXOPTFLAGS="-O3 -march=native -mtune=native" FOPTFLAGS="-O3 -march=native -mtune=native" --download-mpich --download-fblaslapack --download-scalapack --download-mumps

Best regards,

Perceval,

> On Mon, Nov 25, 2019 at 11:45 AM Perceval Desforges <perceval.desfor...@polytechnique.edu> wrote:
>
>> I am basically trying to solve a finite element problem, which is why in 3D I have 7 non-zero diagonals that are quite far apart from one another. In 2D I only have 5 non-zero diagonals that are less far apart. So is it normal that the setup time is around 400 times greater in the 3D case? Is there nothing to be done?
>
> No. It is almost certain that preallocation is screwed up. There is no way it can take 400x longer for a few nonzeros.
>
> In order to debug, please send the output of -log_view and indicate where the time is taken for assembly. You can usually track down bad preallocation using -info.
>
> Thanks,
>
> Matt
>
> I will try setting up only one partition.
>
> Thanks,
>
> Perceval,
>
> Probably it is not a preallocation issue, as it shows "total number of mallocs used during MatSetValues calls =0".
>
> Adding new diagonals may increase fill-in a lot, if the new diagonals are displaced with respect to the other ones.
>
> The partitions option is intended for running on several nodes. If you are using just one node, it is probably better to set one partition only.
>
> Jose
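(Jose's single-partition suggestion can be applied either with the command-line option -eps_krylovschur_partitions 1 or in code. A hedged sketch follows; the function name is illustrative and it assumes an EPS of the default EPSKRYLOVSCHUR type already set up for spectrum slicing:)

#include <slepceps.h>

/* Illustrative helper: run spectrum slicing with a single partition on one node. */
PetscErrorCode ConfigureSinglePartition(EPS eps)
{
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  /* Equivalent to the command-line option -eps_krylovschur_partitions 1 */
  ierr = EPSKrylovSchurSetPartitions(eps, 1); CHKERRQ(ierr);
  /* Keep zero detection on, as in the original option list (-eps_krylovschur_detect_zeros 1) */
  ierr = EPSKrylovSchurSetDetectZeros(eps, PETSC_TRUE); CHKERRQ(ierr);
  PetscFunctionReturn(0);
}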
> On 25 Nov 2019, at 18:25, Matthew Knepley <knep...@gmail.com> wrote:
>
> On Mon, Nov 25, 2019 at 11:20 AM Perceval Desforges <perceval.desfor...@polytechnique.edu> wrote:
>
> Hi,
>
> So I'm loading two matrices from files, both 1000000 by 1000000. I ran the program with -mat_view ::ascii_info and I got:
>
> Mat Object: 1 MPI processes
>   type: seqaij
>   rows=1000000, cols=1000000
>   total: nonzeros=7000000, allocated nonzeros=7000000
>   total number of mallocs used during MatSetValues calls =0
>     not using I-node routines
>
> 20 times, and then
>
> Mat Object: 1 MPI processes
>   type: seqaij
>   rows=1000000, cols=1000000
>   total: nonzeros=1000000, allocated nonzeros=1000000
>   total number of mallocs used during MatSetValues calls =0
>     not using I-node routines
>
> 20 times as well, and then
>
> Mat Object: 1 MPI processes
>   type: seqaij
>   rows=1000000, cols=1000000
>   total: nonzeros=7000000, allocated nonzeros=7000000
>   total number of mallocs used during MatSetValues calls =0
>     not using I-node routines
>
> 20 times as well before crashing.
>
> I realized it might be because I am setting up 20 Krylov-Schur partitions, which may be too many. I tried running the code again with only 2 partitions and now the code runs, but I have speed issues.
>
> I have one version of the code where my first matrix has 5 non-zero diagonals (so 5000000 non-zero entries), and the setup time is quite fast (8 seconds) and solving is also quite fast. The second version is the same but I have two extra non-zero diagonals (7000000 non-zero entries), and the setup time is a lot slower (2900 seconds ~ 50 minutes) and solving is also a lot slower. Is it normal that adding two extra diagonals increases setup and solve time so much?
>
> I can't see the rest of your code, but I am guessing your preallocation statement has "5", so it does no mallocs when you create your first matrix, but mallocs for every row when you create your second matrix. When you load them from disk, we do all the preallocation correctly.
>
> Thanks,
>
> Matt
>
> Thanks again,
>
> Best regards,
>
> Perceval,
>
> Then I guess it is the factorization that is failing. How many nonzero entries do you have? Run with -mat_view ::ascii_info
>
> Jose
>
> On 22 Nov 2019, at 19:56, Perceval Desforges <perceval.desfor...@polytechnique.edu> wrote:
>
> Hi,
>
> Thanks for your answer. I tried looking at the inertias before solving, but the problem is that the program crashes when I call EPSSetUp with this error:
>
> slurmstepd: error: Step 2140.0 exceeded virtual memory limit (313526508 > 107317760), being killed
>
> I get this error even when there are no eigenvalues in the interval.
>
> I've started using BVMAT instead of BVVECS by the way.
>
> Thanks,
>
> Perceval,
>
> Don't use -mat_mumps_icntl_14 to reduce the memory used by MUMPS.
>
> Most likely the problem is that the interval you gave is too large and contains too many eigenvalues (SLEPc needs to allocate at least one vector per eigenvalue). You can count the eigenvalues in the interval with the inertias, which are available at EPSSetUp (no need to call EPSSolve). See this example:
> http://slepc.upv.es/documentation/current/src/eps/examples/tutorials/ex25.c.html
> You can comment out the call to EPSSolve() and run with the option -show_inertias
> For example, the output
>   Shift 0.1   Inertia 3
>   Shift 0.35  Inertia 11
> means that the interval [0.1,0.35] contains 8 eigenvalues (=11-3).
>
> By the way, I would suggest using BVMAT instead of BVVECS (the latter is slower).
>
> Jose
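(Following Jose's pointer to ex25.c, the inertia check before committing to a full solve could look roughly like the sketch below. It assumes an EPS already configured for spectrum slicing, i.e. with operators, EPS_ALL, an interval, and the shift-and-invert ST; the function name is illustrative:)

#include <slepceps.h>

/* Illustrative helper: count the eigenvalues in the computational interval
 * using only the inertias computed at EPSSetUp(), without calling EPSSolve(). */
PetscErrorCode CountEigenvaluesInInterval(EPS eps)
{
  PetscErrorCode ierr;
  PetscInt       nshifts, i, *inertias;
  PetscReal      *shifts;

  PetscFunctionBeginUser;
  ierr = EPSSetUp(eps); CHKERRQ(ierr);   /* triggers the factorizations; no EPSSolve() needed */
  ierr = EPSKrylovSchurGetInertias(eps, &nshifts, &shifts, &inertias); CHKERRQ(ierr);
  for (i = 0; i < nshifts; i++) {
    ierr = PetscPrintf(PETSC_COMM_WORLD, "Shift %g  Inertia %D\n", (double)shifts[i], inertias[i]); CHKERRQ(ierr);
  }
  /* inertias[nshifts-1] - inertias[0] is the number of eigenvalues in the interval */
  ierr = PetscFree(shifts); CHKERRQ(ierr);
  ierr = PetscFree(inertias); CHKERRQ(ierr);
  PetscFunctionReturn(0);
}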
>
> On 21 Nov 2019, at 18:13, Perceval Desforges via petsc-users <petsc-users@mcs.anl.gov> wrote:
>
> Hello all,
>
> I am trying to obtain all the eigenvalues in a certain interval for a fairly large matrix (1000000 * 1000000). I therefore use the spectrum slicing method detailed in section 3.4.5 of the manual. The calculations are run on a processor with 20 cores and 96 GB of RAM.
>
> The options I use are:
>
> -bv_type vecs -eps_krylovschur_detect_zeros 1 -mat_mumps_icntl_13 1 -mat_mumps_icntl_24 1 -mat_mumps_cntl_3 1e-12
>
> However the program quickly crashes with this error:
>
> slurmstepd: error: Step 2115.0 exceeded virtual memory limit (312121084 > 107317760), being killed
>
> I've tried reducing the amount of memory used by MUMPS with the -mat_mumps_icntl_14 option, by setting it to -70 for example, but then I get this error:
>
> [1]PETSC ERROR: Error in external library
> [1]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=82733614
>
> which is an error due to setting the MUMPS icntl option so low, from what I've gathered.
>
> Is there any other way I can reduce memory usage?
>
> Thanks,
>
> Regards,
>
> Perceval,
>
> P.S. I sent the same email a few minutes ago but I think I made a mistake in the address; I'm sorry if I've sent it twice.

--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
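(For context, a minimal spectrum-slicing setup consistent with the options discussed in this thread might look like the sketch below. This is not Perceval's dos.exe source; the matrix file name "A.dat" and the interval endpoints are placeholders, and a generalized problem would pass a second matrix to EPSSetOperators:)

#include <slepceps.h>

int main(int argc, char **argv)
{
  Mat            A;
  EPS            eps;
  ST             st;
  KSP            ksp;
  PC             pc;
  PetscViewer    viewer;
  PetscErrorCode ierr;

  ierr = SlepcInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;

  /* Load the (symmetric) operator from a PETSc binary file (placeholder name) */
  ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD, "A.dat", FILE_MODE_READ, &viewer); CHKERRQ(ierr);
  ierr = MatCreate(PETSC_COMM_WORLD, &A); CHKERRQ(ierr);
  ierr = MatSetFromOptions(A); CHKERRQ(ierr);
  ierr = MatLoad(A, viewer); CHKERRQ(ierr);
  ierr = PetscViewerDestroy(&viewer); CHKERRQ(ierr);

  ierr = EPSCreate(PETSC_COMM_WORLD, &eps); CHKERRQ(ierr);
  ierr = EPSSetOperators(eps, A, NULL); CHKERRQ(ierr);
  ierr = EPSSetProblemType(eps, EPS_HEP); CHKERRQ(ierr);
  ierr = EPSSetType(eps, EPSKRYLOVSCHUR); CHKERRQ(ierr);
  ierr = EPSSetWhichEigenpairs(eps, EPS_ALL); CHKERRQ(ierr);          /* all eigenvalues ... */
  ierr = EPSSetInterval(eps, -1.0, 1.0); CHKERRQ(ierr);               /* ... in a placeholder interval */

  /* Spectrum slicing needs shift-and-invert with a direct Cholesky factorization */
  ierr = EPSGetST(eps, &st); CHKERRQ(ierr);
  ierr = STSetType(st, STSINVERT); CHKERRQ(ierr);
  ierr = STGetKSP(st, &ksp); CHKERRQ(ierr);
  ierr = KSPSetType(ksp, KSPPREONLY); CHKERRQ(ierr);
  ierr = KSPGetPC(ksp, &pc); CHKERRQ(ierr);
  ierr = PCSetType(pc, PCCHOLESKY); CHKERRQ(ierr);
  ierr = PCFactorSetMatSolverType(pc, MATSOLVERMUMPS); CHKERRQ(ierr); /* so -mat_mumps_* options apply */

  ierr = EPSSetFromOptions(eps); CHKERRQ(ierr);  /* picks up -bv_type, -eps_krylovschur_*, etc. */
  ierr = EPSSolve(eps); CHKERRQ(ierr);

  ierr = EPSDestroy(&eps); CHKERRQ(ierr);
  ierr = MatDestroy(&A); CHKERRQ(ierr);
  ierr = SlepcFinalize();
  return ierr;
}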