On 19 September 2016 at 21:05, David Knezevic <david.kneze...@akselos.com> wrote:
> When I use MUMPS via PETSc, one issue is that it can sometimes fail with > MUMPS error -9, which means that MUMPS didn't allocate a big enough > workspace. This can typically be fixed by increasing MUMPS icntl 14, e.g. > via the command line option -mat_mumps_icntl_14. > > However, instead of having to run several times with different command > line options, I'd like to be able to automatically increment icntl 14 value > in a loop until the solve succeeds. > > I have a saved matrix which fails when I use it for a solve with MUMPS > with 4 MPI processes and the default ictnl values, so I'm using this to > check that I can achieve the automatic icntl 14 update, as described above. > (The matrix is 14MB so I haven't attached it here, but I'd be happy to send > it to anyone else who wants to try this test case out.) > > I've pasted some test code below which provides a simple test of this idea > using two solves. The first solve uses the default value of icntl 14, which > fails, and then we update icntl 14 to 30 and solve again. The second solve > should succeed since icntl 14 of 30 is sufficient for MUMPS to succeed in > this case, but for some reason the second solve still fails. > > Below I've also pasted the output from -ksp_view, and you can see that > ictnl 14 is being updated correctly (see the ICNTL(14) lines in the > output), so it's not clear to me why the second solve fails. It seems like > MUMPS is ignoring the update to the ictnl value? > I believe this parameter is utilized during the numerical factorization phase. In your code, the operator hasn't changed, however you haven't signalled to the KSP that you want to re-perform the numerical factorization. You can do this by calling KSPSetOperators() before your second solve. I think if you do this (please try it), the factorization will be performed again and the new value of icntl will have an effect. Note this is a wild stab in the dark - I haven't dug through the petsc-mumps code in detail... Thanks, Dave > > > Thanks, > David > > ------------------------------------------------------------ > ----------------------------------------- > Test code: > > Mat A; > MatCreate(PETSC_COMM_WORLD,&A); > MatSetType(A,MATMPIAIJ); > > PetscViewer petsc_viewer; > PetscViewerBinaryOpen( PETSC_COMM_WORLD, > "matrix.dat", > FILE_MODE_READ, > &petsc_viewer); > MatLoad(A, petsc_viewer); > PetscViewerDestroy(&petsc_viewer); > > PetscInt m, n; > MatGetSize(A, &m, &n); > > Vec x; > VecCreate(PETSC_COMM_WORLD,&x); > VecSetSizes(x,PETSC_DECIDE,m); > VecSetFromOptions(x); > VecSet(x,1.0); > > Vec b; > VecDuplicate(x,&b); > > KSP ksp; > PC pc; > > KSPCreate(PETSC_COMM_WORLD,&ksp); > KSPSetOperators(ksp,A,A); > > KSPSetType(ksp,KSPPREONLY); > KSPGetPC(ksp,&pc); > > PCSetType(pc,PCCHOLESKY); > > PCFactorSetMatSolverPackage(pc,MATSOLVERMUMPS); > PCFactorSetUpMatSolverPackage(pc); > > KSPSetFromOptions(ksp); > KSPSetUp(ksp); > > KSPSolve(ksp,b,x); > > { > KSPConvergedReason reason; > KSPGetConvergedReason(ksp, &reason); > std::cout << "converged reason: " << reason << std::endl; > } > > Mat F; > PCFactorGetMatrix(pc,&F); > MatMumpsSetIcntl(F,14,30); > > KSPSolve(ksp,b,x); > > { > KSPConvergedReason reason; > KSPGetConvergedReason(ksp, &reason); > std::cout << "converged reason: " << reason << std::endl; > } > > ------------------------------------------------------------ > ----------------------------------------- > -ksp_view output (ICNTL(14) changes from 20 to 30, but we get "converged > reason: -11" for both solves) > > KSP Object: 4 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using NONE norm type for convergence test > PC Object: 4 MPI processes > type: cholesky > Cholesky: out-of-place factorization > tolerance for zero pivot 2.22045e-14 > matrix ordering: natural > factor fill ratio given 0., needed 0. > Factored matrix follows: > Mat Object: 4 MPI processes > type: mpiaij > rows=22878, cols=22878 > package used to perform factorization: mumps > total: nonzeros=3361617, allocated nonzeros=3361617 > total number of mallocs used during MatSetValues calls =0 > MUMPS run parameters: > SYM (matrix type): 2 > PAR (host participation): 1 > ICNTL(1) (output for error): 6 > ICNTL(2) (output of diagnostic msg): 0 > ICNTL(3) (output for global info): 0 > ICNTL(4) (level of printing): 0 > ICNTL(5) (input mat struct): 0 > ICNTL(6) (matrix prescaling): 7 > ICNTL(7) (sequentia matrix ordering):7 > ICNTL(8) (scalling strategy): 77 > ICNTL(10) (max num of refinements): 0 > ICNTL(11) (error analysis): 0 > ICNTL(12) (efficiency control): 0 > ICNTL(13) (efficiency control): 0 > ICNTL(14) (percentage of estimated workspace increase): 20 > ICNTL(18) (input mat struct): 3 > ICNTL(19) (Shur complement info): 0 > ICNTL(20) (rhs sparse pattern): 0 > ICNTL(21) (solution struct): 1 > ICNTL(22) (in-core/out-of-core facility): 0 > ICNTL(23) (max size of memory can be allocated locally):0 > ICNTL(24) (detection of null pivot rows): 0 > ICNTL(25) (computation of a null space basis): 0 > ICNTL(26) (Schur options for rhs or solution): 0 > ICNTL(27) (experimental parameter): -24 > ICNTL(28) (use parallel or sequential ordering): 1 > ICNTL(29) (parallel ordering): 0 > ICNTL(30) (user-specified set of entries in inv(A)): 0 > ICNTL(31) (factors is discarded in the solve phase): 0 > ICNTL(33) (compute determinant): 0 > CNTL(1) (relative pivoting threshold): 0.01 > CNTL(2) (stopping criterion of refinement): 1.49012e-08 > CNTL(3) (absolute pivoting threshold): 0. > CNTL(4) (value of static pivoting): -1. > CNTL(5) (fixation for null pivots): 0. > RINFO(1) (local estimated flops for the elimination after > analysis): > [0] 1.84947e+08 > [1] 2.42065e+08 > [2] 2.53044e+08 > [3] 2.18441e+08 > RINFO(2) (local estimated flops for the assembly after > factorization): > [0] 945938. > [1] 906795. > [2] 897815. > [3] 998840. > RINFO(3) (local estimated flops for the elimination after > factorization): > [0] 1.59835e+08 > [1] 1.50867e+08 > [2] 2.27932e+08 > [3] 1.52037e+08 > INFO(15) (estimated size of (in MB) MUMPS internal data for > running numerical factorization): > [0] 36 > [1] 37 > [2] 38 > [3] 39 > INFO(16) (size of (in MB) MUMPS internal data used during > numerical factorization): > [0] 36 > [1] 37 > [2] 38 > [3] 39 > INFO(23) (num of pivots eliminated on this processor after > factorization): > [0] 6450 > [1] 5442 > [2] 4386 > [3] 5526 > RINFOG(1) (global estimated flops for the elimination after > analysis): 8.98497e+08 > RINFOG(2) (global estimated flops for the assembly after > factorization): 3.74939e+06 > RINFOG(3) (global estimated flops for the elimination after > factorization): 6.9067e+08 > (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant): > (0.,0.)*(2^0) > INFOG(3) (estimated real workspace for factors on all > processors after analysis): 4082184 > INFOG(4) (estimated integer workspace for factors on all > processors after analysis): 231846 > INFOG(5) (estimated maximum front size in the complete > tree): 678 > INFOG(6) (number of nodes in the complete tree): 1380 > INFOG(7) (ordering option effectively use after analysis): 5 > INFOG(8) (structural symmetry in percent of the permuted > matrix after analysis): 100 > INFOG(9) (total real/complex workspace to store the matrix > factors after factorization): 3521904 > INFOG(10) (total integer space store the matrix factors > after factorization): 229416 > INFOG(11) (order of largest frontal matrix after > factorization): 678 > INFOG(12) (number of off-diagonal pivots): 0 > INFOG(13) (number of delayed pivots after factorization): 0 > INFOG(14) (number of memory compress after factorization): 0 > INFOG(15) (number of steps of iterative refinement after > solution): 0 > INFOG(16) (estimated size (in MB) of all MUMPS internal data > for factorization after analysis: value on the most memory consuming > processor): 39 > INFOG(17) (estimated size of all MUMPS internal data for > factorization after analysis: sum over all processors): 150 > INFOG(18) (size of all MUMPS internal data allocated during > factorization: value on the most memory consuming processor): 39 > INFOG(19) (size of all MUMPS internal data allocated during > factorization: sum over all processors): 150 > INFOG(20) (estimated number of entries in the factors): > 3361617 > INFOG(21) (size in MB of memory effectively used during > factorization - value on the most memory consuming processor): 35 > INFOG(22) (size in MB of memory effectively used during > factorization - sum over all processors): 136 > INFOG(23) (after analysis: value of ICNTL(6) effectively > used): 0 > INFOG(24) (after analysis: value of ICNTL(12) effectively > used): 1 > INFOG(25) (after factorization: number of pivots modified by > static pivoting): 0 > INFOG(28) (after factorization: number of null pivots > encountered): 0 > INFOG(29) (after factorization: effective number of entries > in the factors (sum over all processors)): 2931438 > INFOG(30, 31) (after solution: size in Mbytes of memory used > during solution phase): 0, 0 > INFOG(32) (after analysis: type of analysis done): 1 > INFOG(33) (value used for ICNTL(8)): 7 > INFOG(34) (exponent of the determinant if determinant is > requested): 0 > linear system matrix = precond matrix: > Mat Object: 4 MPI processes > type: mpiaij > rows=22878, cols=22878 > total: nonzeros=1219140, allocated nonzeros=1219140 > total number of mallocs used during MatSetValues calls =0 > using I-node (on process 0) routines: found 1889 nodes, limit used > is 5 > converged reason: -11 > KSP Object: 4 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using NONE norm type for convergence test > PC Object: 4 MPI processes > type: cholesky > Cholesky: out-of-place factorization > tolerance for zero pivot 2.22045e-14 > matrix ordering: natural > factor fill ratio given 0., needed 0. > Factored matrix follows: > Mat Object: 4 MPI processes > type: mpiaij > rows=22878, cols=22878 > package used to perform factorization: mumps > total: nonzeros=3361617, allocated nonzeros=3361617 > total number of mallocs used during MatSetValues calls =0 > MUMPS run parameters: > SYM (matrix type): 2 > PAR (host participation): 1 > ICNTL(1) (output for error): 6 > ICNTL(2) (output of diagnostic msg): 0 > ICNTL(3) (output for global info): 0 > ICNTL(4) (level of printing): 0 > ICNTL(5) (input mat struct): 0 > ICNTL(6) (matrix prescaling): 7 > ICNTL(7) (sequentia matrix ordering):7 > ICNTL(8) (scalling strategy): 77 > ICNTL(10) (max num of refinements): 0 > ICNTL(11) (error analysis): 0 > ICNTL(12) (efficiency control): 0 > ICNTL(13) (efficiency control): 0 > ICNTL(14) (percentage of estimated workspace increase): 30 > ICNTL(18) (input mat struct): 3 > ICNTL(19) (Shur complement info): 0 > ICNTL(20) (rhs sparse pattern): 0 > ICNTL(21) (solution struct): 1 > ICNTL(22) (in-core/out-of-core facility): 0 > ICNTL(23) (max size of memory can be allocated locally):0 > ICNTL(24) (detection of null pivot rows): 0 > ICNTL(25) (computation of a null space basis): 0 > ICNTL(26) (Schur options for rhs or solution): 0 > ICNTL(27) (experimental parameter): -24 > ICNTL(28) (use parallel or sequential ordering): 1 > ICNTL(29) (parallel ordering): 0 > ICNTL(30) (user-specified set of entries in inv(A)): 0 > ICNTL(31) (factors is discarded in the solve phase): 0 > ICNTL(33) (compute determinant): 0 > CNTL(1) (relative pivoting threshold): 0.01 > CNTL(2) (stopping criterion of refinement): 1.49012e-08 > CNTL(3) (absolute pivoting threshold): 0. > CNTL(4) (value of static pivoting): -1. > CNTL(5) (fixation for null pivots): 0. > RINFO(1) (local estimated flops for the elimination after > analysis): > [0] 1.84947e+08 > [1] 2.42065e+08 > [2] 2.53044e+08 > [3] 2.18441e+08 > RINFO(2) (local estimated flops for the assembly after > factorization): > [0] 945938. > [1] 906795. > [2] 897815. > [3] 998840. > RINFO(3) (local estimated flops for the elimination after > factorization): > [0] 1.59835e+08 > [1] 1.50867e+08 > [2] 2.27932e+08 > [3] 1.52037e+08 > INFO(15) (estimated size of (in MB) MUMPS internal data for > running numerical factorization): > [0] 36 > [1] 37 > [2] 38 > [3] 39 > INFO(16) (size of (in MB) MUMPS internal data used during > numerical factorization): > [0] 36 > [1] 37 > [2] 38 > [3] 39 > INFO(23) (num of pivots eliminated on this processor after > factorization): > [0] 6450 > [1] 5442 > [2] 4386 > [3] 5526 > RINFOG(1) (global estimated flops for the elimination after > analysis): 8.98497e+08 > RINFOG(2) (global estimated flops for the assembly after > factorization): 3.74939e+06 > RINFOG(3) (global estimated flops for the elimination after > factorization): 6.9067e+08 > (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant): > (0.,0.)*(2^0) > INFOG(3) (estimated real workspace for factors on all > processors after analysis): 4082184 > INFOG(4) (estimated integer workspace for factors on all > processors after analysis): 231846 > INFOG(5) (estimated maximum front size in the complete > tree): 678 > INFOG(6) (number of nodes in the complete tree): 1380 > INFOG(7) (ordering option effectively use after analysis): 5 > INFOG(8) (structural symmetry in percent of the permuted > matrix after analysis): 100 > INFOG(9) (total real/complex workspace to store the matrix > factors after factorization): 3521904 > INFOG(10) (total integer space store the matrix factors > after factorization): 229416 > INFOG(11) (order of largest frontal matrix after > factorization): 678 > INFOG(12) (number of off-diagonal pivots): 0 > INFOG(13) (number of delayed pivots after factorization): 0 > INFOG(14) (number of memory compress after factorization): 0 > INFOG(15) (number of steps of iterative refinement after > solution): 0 > INFOG(16) (estimated size (in MB) of all MUMPS internal data > for factorization after analysis: value on the most memory consuming > processor): 39 > INFOG(17) (estimated size of all MUMPS internal data for > factorization after analysis: sum over all processors): 150 > INFOG(18) (size of all MUMPS internal data allocated during > factorization: value on the most memory consuming processor): 39 > INFOG(19) (size of all MUMPS internal data allocated during > factorization: sum over all processors): 150 > INFOG(20) (estimated number of entries in the factors): > 3361617 > INFOG(21) (size in MB of memory effectively used during > factorization - value on the most memory consuming processor): 35 > INFOG(22) (size in MB of memory effectively used during > factorization - sum over all processors): 136 > INFOG(23) (after analysis: value of ICNTL(6) effectively > used): 0 > INFOG(24) (after analysis: value of ICNTL(12) effectively > used): 1 > INFOG(25) (after factorization: number of pivots modified by > static pivoting): 0 > INFOG(28) (after factorization: number of null pivots > encountered): 0 > INFOG(29) (after factorization: effective number of entries > in the factors (sum over all processors)): 2931438 > INFOG(30, 31) (after solution: size in Mbytes of memory used > during solution phase): 0, 0 > INFOG(32) (after analysis: type of analysis done): 1 > INFOG(33) (value used for ICNTL(8)): 7 > INFOG(34) (exponent of the determinant if determinant is > requested): 0 > linear system matrix = precond matrix: > Mat Object: 4 MPI processes > type: mpiaij > rows=22878, cols=22878 > total: nonzeros=1219140, allocated nonzeros=1219140 > total number of mallocs used during MatSetValues calls =0 > using I-node (on process 0) routines: found 1889 nodes, limit used > is 5 > converged reason: -11 > > ------------------------------------------------------------ > ----------------------------------------- >