Another guess: maybe you also need to call KSPSetUp(ksp) before the second KSPSolve(ksp,b,x).
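
A rough sketch of the retry loop David describes in the quoted message, showing the control flow only. The names solve_with_growing_workspace and try_solve are made up for illustration and are not PETSc API. In a real program, try_solve would presumably set the value with MatMumpsSetIcntl(F,14,percent), trigger a fresh numerical factorization (the step this thread has not yet pinned down), call KSPSolve, and return whether KSPGetConvergedReason reports a positive reason.

```cpp
#include <cassert>
#include <functional>

// Hypothetical helper, not part of PETSc or MUMPS.
// Calls try_solve(percent) with an increasing workspace percentage until it
// reports success or max_percent is exceeded.
// Returns the first percentage that worked, or -1 if none did.
int solve_with_growing_workspace(const std::function<bool(int)>& try_solve,
                                 int start_percent, int step, int max_percent)
{
  assert(step > 0);  // a non-positive step would never terminate
  for (int percent = start_percent; percent <= max_percent; percent += step) {
    // Assumption: try_solve applies the value (e.g. as ICNTL(14)),
    // refactors, solves, and returns true only if the solve converged.
    if (try_solve(percent)) {
      return percent;
    }
  }
  return -1;  // every attempt up to max_percent failed
}
```

With a callback that only succeeds once the percentage reaches 30, starting at 20 in steps of 5, this tries 20, 25 and 30 and returns 30.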

Murat Keceli

On Mon, Sep 19, 2016 at 8:33 PM, David Knezevic <david.kneze...@akselos.com> wrote:

> On Mon, Sep 19, 2016 at 7:26 PM, Dave May <dave.mayhe...@gmail.com> wrote:
>
>> On 19 September 2016 at 21:05, David Knezevic <david.kneze...@akselos.com> wrote:
>>
>>> When I use MUMPS via PETSc, one issue is that it can sometimes fail with
>>> MUMPS error -9, which means that MUMPS didn't allocate a big enough
>>> workspace. This can typically be fixed by increasing MUMPS icntl 14, e.g.
>>> via the command line option -mat_mumps_icntl_14.
>>>
>>> However, instead of having to run several times with different command
>>> line options, I'd like to be able to automatically increment icntl 14 value
>>> in a loop until the solve succeeds.
>>>
>>> I have a saved matrix which fails when I use it for a solve with MUMPS
>>> with 4 MPI processes and the default icntl values, so I'm using this to
>>> check that I can achieve the automatic icntl 14 update, as described above.
>>> (The matrix is 14MB so I haven't attached it here, but I'd be happy to send
>>> it to anyone else who wants to try this test case out.)
>>>
>>> I've pasted some test code below which provides a simple test of this
>>> idea using two solves. The first solve uses the default value of icntl 14,
>>> which fails, and then we update icntl 14 to 30 and solve again. The second
>>> solve should succeed since icntl 14 of 30 is sufficient for MUMPS to
>>> succeed in this case, but for some reason the second solve still fails.
>>>
>>> Below I've also pasted the output from -ksp_view, and you can see that
>>> icntl 14 is being updated correctly (see the ICNTL(14) lines in the
>>> output), so it's not clear to me why the second solve fails. It seems like
>>> MUMPS is ignoring the update to the icntl value?
>>>
>>
>> I believe this parameter is utilized during the numerical factorization
>> phase.
>> In your code, the operator hasn't changed, however you haven't signalled
>> to the KSP that you want to re-perform the numerical factorization.
>> You can do this by calling KSPSetOperators() before your second solve.
>> I think if you do this (please try it), the factorization will be
>> performed again and the new value of icntl will have an effect.
>>
>> Note this is a wild stab in the dark - I haven't dug through the
>> petsc-mumps code in detail...
>>
>
> That sounds like a plausible guess to me, but unfortunately it didn't
> work. I added KSPSetOperators(ksp,A,A); before the second solve and I got
> the same behavior as before.
>
> Thanks,
> David
>
>>> ------------------------------------------------------------
>>> Test code:
>>>
>>> Mat A;
>>> MatCreate(PETSC_COMM_WORLD,&A);
>>> MatSetType(A,MATMPIAIJ);
>>>
>>> PetscViewer petsc_viewer;
>>> PetscViewerBinaryOpen( PETSC_COMM_WORLD,
>>>                        "matrix.dat",
>>>                        FILE_MODE_READ,
>>>                        &petsc_viewer);
>>> MatLoad(A, petsc_viewer);
>>> PetscViewerDestroy(&petsc_viewer);
>>>
>>> PetscInt m, n;
>>> MatGetSize(A, &m, &n);
>>>
>>> Vec x;
>>> VecCreate(PETSC_COMM_WORLD,&x);
>>> VecSetSizes(x,PETSC_DECIDE,m);
>>> VecSetFromOptions(x);
>>> VecSet(x,1.0);
>>>
>>> Vec b;
>>> VecDuplicate(x,&b);
>>>
>>> KSP ksp;
>>> PC pc;
>>>
>>> KSPCreate(PETSC_COMM_WORLD,&ksp);
>>> KSPSetOperators(ksp,A,A);
>>>
>>> KSPSetType(ksp,KSPPREONLY);
>>> KSPGetPC(ksp,&pc);
>>>
>>> PCSetType(pc,PCCHOLESKY);
>>>
>>> PCFactorSetMatSolverPackage(pc,MATSOLVERMUMPS);
>>> PCFactorSetUpMatSolverPackage(pc);
>>>
>>> KSPSetFromOptions(ksp);
>>> KSPSetUp(ksp);
>>>
>>> KSPSolve(ksp,b,x);
>>>
>>> {
>>>   KSPConvergedReason reason;
>>>   KSPGetConvergedReason(ksp, &reason);
>>>   std::cout << "converged reason: " << reason << std::endl;
>>> }
>>>
>>> Mat F;
>>> PCFactorGetMatrix(pc,&F);
>>> MatMumpsSetIcntl(F,14,30);
>>>
>>> KSPSolve(ksp,b,x);
>>>
>>> {
>>>   KSPConvergedReason reason;
>>>   KSPGetConvergedReason(ksp, &reason);
>>>   std::cout << "converged reason: " << reason << std::endl;
>>> }
>>>
>>> ------------------------------------------------------------
>>> -ksp_view output (ICNTL(14) changes from 20 to 30, but we get "converged
>>> reason: -11" for both solves)
>>>
>>> KSP Object: 4 MPI processes
>>>   type: preonly
>>>   maximum iterations=10000, initial guess is zero
>>>   tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>>>   left preconditioning
>>>   using NONE norm type for convergence test
>>> PC Object: 4 MPI processes
>>>   type: cholesky
>>>     Cholesky: out-of-place factorization
>>>     tolerance for zero pivot 2.22045e-14
>>>     matrix ordering: natural
>>>     factor fill ratio given 0., needed 0.
>>>       Factored matrix follows:
>>>         Mat Object: 4 MPI processes
>>>           type: mpiaij
>>>           rows=22878, cols=22878
>>>           package used to perform factorization: mumps
>>>           total: nonzeros=3361617, allocated nonzeros=3361617
>>>           total number of mallocs used during MatSetValues calls =0
>>>             MUMPS run parameters:
>>>               SYM (matrix type): 2
>>>               PAR (host participation): 1
>>>               ICNTL(1) (output for error): 6
>>>               ICNTL(2) (output of diagnostic msg): 0
>>>               ICNTL(3) (output for global info): 0
>>>               ICNTL(4) (level of printing): 0
>>>               ICNTL(5) (input mat struct): 0
>>>               ICNTL(6) (matrix prescaling): 7
>>>               ICNTL(7) (sequentia matrix ordering):7
>>>               ICNTL(8) (scalling strategy): 77
>>>               ICNTL(10) (max num of refinements): 0
>>>               ICNTL(11) (error analysis): 0
>>>               ICNTL(12) (efficiency control): 0
>>>               ICNTL(13) (efficiency control): 0
>>>               ICNTL(14) (percentage of estimated workspace increase): 20
>>>               ICNTL(18) (input mat struct): 3
>>>               ICNTL(19) (Shur complement info): 0
>>>               ICNTL(20) (rhs sparse pattern): 0
>>>               ICNTL(21) (solution struct): 1
>>>               ICNTL(22) (in-core/out-of-core facility): 0
>>>               ICNTL(23) (max size of memory can be allocated locally):0
>>>               ICNTL(24) (detection of null pivot rows): 0
>>>               ICNTL(25) (computation of a null space basis): 0
>>>               ICNTL(26) (Schur options for rhs or solution): 0
>>>               ICNTL(27) (experimental parameter): -24
>>>               ICNTL(28) (use parallel or sequential ordering): 1
>>>               ICNTL(29) (parallel ordering): 0
>>>               ICNTL(30) (user-specified set of entries in inv(A)): 0
>>>               ICNTL(31) (factors is discarded in the solve phase): 0
>>>               ICNTL(33) (compute determinant): 0
>>>               CNTL(1) (relative pivoting threshold): 0.01
>>>               CNTL(2) (stopping criterion of refinement): 1.49012e-08
>>>               CNTL(3) (absolute pivoting threshold): 0.
>>>               CNTL(4) (value of static pivoting): -1.
>>>               CNTL(5) (fixation for null pivots): 0.
>>>               RINFO(1) (local estimated flops for the elimination after analysis):
>>>                 [0] 1.84947e+08
>>>                 [1] 2.42065e+08
>>>                 [2] 2.53044e+08
>>>                 [3] 2.18441e+08
>>>               RINFO(2) (local estimated flops for the assembly after factorization):
>>>                 [0] 945938.
>>>                 [1] 906795.
>>>                 [2] 897815.
>>>                 [3] 998840.
>>>               RINFO(3) (local estimated flops for the elimination after factorization):
>>>                 [0] 1.59835e+08
>>>                 [1] 1.50867e+08
>>>                 [2] 2.27932e+08
>>>                 [3] 1.52037e+08
>>>               INFO(15) (estimated size of (in MB) MUMPS internal data for running numerical factorization):
>>>                 [0] 36
>>>                 [1] 37
>>>                 [2] 38
>>>                 [3] 39
>>>               INFO(16) (size of (in MB) MUMPS internal data used during numerical factorization):
>>>                 [0] 36
>>>                 [1] 37
>>>                 [2] 38
>>>                 [3] 39
>>>               INFO(23) (num of pivots eliminated on this processor after factorization):
>>>                 [0] 6450
>>>                 [1] 5442
>>>                 [2] 4386
>>>                 [3] 5526
>>>               RINFOG(1) (global estimated flops for the elimination after analysis): 8.98497e+08
>>>               RINFOG(2) (global estimated flops for the assembly after factorization): 3.74939e+06
>>>               RINFOG(3) (global estimated flops for the elimination after factorization): 6.9067e+08
>>>               (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant): (0.,0.)*(2^0)
>>>               INFOG(3) (estimated real workspace for factors on all processors after analysis): 4082184
>>>               INFOG(4) (estimated integer workspace for factors on all processors after analysis): 231846
>>>               INFOG(5) (estimated maximum front size in the complete tree): 678
>>>               INFOG(6) (number of nodes in the complete tree): 1380
>>>               INFOG(7) (ordering option effectively use after analysis): 5
>>>               INFOG(8) (structural symmetry in percent of the permuted matrix after analysis): 100
>>>               INFOG(9) (total real/complex workspace to store the matrix factors after factorization): 3521904
>>>               INFOG(10) (total integer space store the matrix factors after factorization): 229416
>>>               INFOG(11) (order of largest frontal matrix after factorization): 678
>>>               INFOG(12) (number of off-diagonal pivots): 0
>>>               INFOG(13) (number of delayed pivots after factorization): 0
>>>               INFOG(14) (number of memory compress after factorization): 0
>>>               INFOG(15) (number of steps of iterative refinement after solution): 0
>>>               INFOG(16) (estimated size (in MB) of all MUMPS internal data for factorization after analysis: value on the most memory consuming processor): 39
>>>               INFOG(17) (estimated size of all MUMPS internal data for factorization after analysis: sum over all processors): 150
>>>               INFOG(18) (size of all MUMPS internal data allocated during factorization: value on the most memory consuming processor): 39
>>>               INFOG(19) (size of all MUMPS internal data allocated during factorization: sum over all processors): 150
>>>               INFOG(20) (estimated number of entries in the factors): 3361617
>>>               INFOG(21) (size in MB of memory effectively used during factorization - value on the most memory consuming processor): 35
>>>               INFOG(22) (size in MB of memory effectively used during factorization - sum over all processors): 136
>>>               INFOG(23) (after analysis: value of ICNTL(6) effectively used): 0
>>>               INFOG(24) (after analysis: value of ICNTL(12) effectively used): 1
>>>               INFOG(25) (after factorization: number of pivots modified by static pivoting): 0
>>>               INFOG(28) (after factorization: number of null pivots encountered): 0
>>>               INFOG(29) (after factorization: effective number of entries in the factors (sum over all processors)): 2931438
>>>               INFOG(30, 31) (after solution: size in Mbytes of memory used during solution phase): 0, 0
>>>               INFOG(32) (after analysis: type of analysis done): 1
>>>               INFOG(33) (value used for ICNTL(8)): 7
>>>               INFOG(34) (exponent of the determinant if determinant is requested): 0
>>>   linear system matrix = precond matrix:
>>>   Mat Object: 4 MPI processes
>>>     type: mpiaij
>>>     rows=22878, cols=22878
>>>     total: nonzeros=1219140, allocated nonzeros=1219140
>>>     total number of mallocs used during MatSetValues calls =0
>>>       using I-node (on process 0) routines: found 1889 nodes, limit used is 5
>>> converged reason: -11
>>> KSP Object: 4 MPI processes
>>>   type: preonly
>>>   maximum iterations=10000, initial guess is zero
>>>   tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>>>   left preconditioning
>>>   using NONE norm type for convergence test
>>> PC Object: 4 MPI processes
>>>   type: cholesky
>>>     Cholesky: out-of-place factorization
>>>     tolerance for zero pivot 2.22045e-14
>>>     matrix ordering: natural
>>>     factor fill ratio given 0., needed 0.
>>>       Factored matrix follows:
>>>         Mat Object: 4 MPI processes
>>>           type: mpiaij
>>>           rows=22878, cols=22878
>>>           package used to perform factorization: mumps
>>>           total: nonzeros=3361617, allocated nonzeros=3361617
>>>           total number of mallocs used during MatSetValues calls =0
>>>             MUMPS run parameters:
>>>               SYM (matrix type): 2
>>>               PAR (host participation): 1
>>>               ICNTL(1) (output for error): 6
>>>               ICNTL(2) (output of diagnostic msg): 0
>>>               ICNTL(3) (output for global info): 0
>>>               ICNTL(4) (level of printing): 0
>>>               ICNTL(5) (input mat struct): 0
>>>               ICNTL(6) (matrix prescaling): 7
>>>               ICNTL(7) (sequentia matrix ordering):7
>>>               ICNTL(8) (scalling strategy): 77
>>>               ICNTL(10) (max num of refinements): 0
>>>               ICNTL(11) (error analysis): 0
>>>               ICNTL(12) (efficiency control): 0
>>>               ICNTL(13) (efficiency control): 0
>>>               ICNTL(14) (percentage of estimated workspace increase): 30
>>>               ICNTL(18) (input mat struct): 3
>>>               ICNTL(19) (Shur complement info): 0
>>>               ICNTL(20) (rhs sparse pattern): 0
>>>               ICNTL(21) (solution struct): 1
>>>               ICNTL(22) (in-core/out-of-core facility): 0
>>>               ICNTL(23) (max size of memory can be allocated locally):0
>>>               ICNTL(24) (detection of null pivot rows): 0
>>>               ICNTL(25) (computation of a null space basis): 0
>>>               ICNTL(26) (Schur options for rhs or solution): 0
>>>               ICNTL(27) (experimental parameter): -24
>>>               ICNTL(28) (use parallel or sequential ordering): 1
>>>               ICNTL(29) (parallel ordering): 0
>>>               ICNTL(30) (user-specified set of entries in inv(A)): 0
>>>               ICNTL(31) (factors is discarded in the solve phase): 0
>>>               ICNTL(33) (compute determinant): 0
>>>               CNTL(1) (relative pivoting threshold): 0.01
>>>               CNTL(2) (stopping criterion of refinement): 1.49012e-08
>>>               CNTL(3) (absolute pivoting threshold): 0.
>>>               CNTL(4) (value of static pivoting): -1.
>>>               CNTL(5) (fixation for null pivots): 0.
>>>               RINFO(1) (local estimated flops for the elimination after analysis):
>>>                 [0] 1.84947e+08
>>>                 [1] 2.42065e+08
>>>                 [2] 2.53044e+08
>>>                 [3] 2.18441e+08
>>>               RINFO(2) (local estimated flops for the assembly after factorization):
>>>                 [0] 945938.
>>>                 [1] 906795.
>>>                 [2] 897815.
>>>                 [3] 998840.
>>>               RINFO(3) (local estimated flops for the elimination after factorization):
>>>                 [0] 1.59835e+08
>>>                 [1] 1.50867e+08
>>>                 [2] 2.27932e+08
>>>                 [3] 1.52037e+08
>>>               INFO(15) (estimated size of (in MB) MUMPS internal data for running numerical factorization):
>>>                 [0] 36
>>>                 [1] 37
>>>                 [2] 38
>>>                 [3] 39
>>>               INFO(16) (size of (in MB) MUMPS internal data used during numerical factorization):
>>>                 [0] 36
>>>                 [1] 37
>>>                 [2] 38
>>>                 [3] 39
>>>               INFO(23) (num of pivots eliminated on this processor after factorization):
>>>                 [0] 6450
>>>                 [1] 5442
>>>                 [2] 4386
>>>                 [3] 5526
>>>               RINFOG(1) (global estimated flops for the elimination after analysis): 8.98497e+08
>>>               RINFOG(2) (global estimated flops for the assembly after factorization): 3.74939e+06
>>>               RINFOG(3) (global estimated flops for the elimination after factorization): 6.9067e+08
>>>               (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant): (0.,0.)*(2^0)
>>>               INFOG(3) (estimated real workspace for factors on all processors after analysis): 4082184
>>>               INFOG(4) (estimated integer workspace for factors on all processors after analysis): 231846
>>>               INFOG(5) (estimated maximum front size in the complete tree): 678
>>>               INFOG(6) (number of nodes in the complete tree): 1380
>>>               INFOG(7) (ordering option effectively use after analysis): 5
>>>               INFOG(8) (structural symmetry in percent of the permuted matrix after analysis): 100
>>>               INFOG(9) (total real/complex workspace to store the matrix factors after factorization): 3521904
>>>               INFOG(10) (total integer space store the matrix factors after factorization): 229416
>>>               INFOG(11) (order of largest frontal matrix after factorization): 678
>>>               INFOG(12) (number of off-diagonal pivots): 0
>>>               INFOG(13) (number of delayed pivots after factorization): 0
>>>               INFOG(14) (number of memory compress after factorization): 0
>>>               INFOG(15) (number of steps of iterative refinement after solution): 0
>>>               INFOG(16) (estimated size (in MB) of all MUMPS internal data for factorization after analysis: value on the most memory consuming processor): 39
>>>               INFOG(17) (estimated size of all MUMPS internal data for factorization after analysis: sum over all processors): 150
>>>               INFOG(18) (size of all MUMPS internal data allocated during factorization: value on the most memory consuming processor): 39
>>>               INFOG(19) (size of all MUMPS internal data allocated during factorization: sum over all processors): 150
>>>               INFOG(20) (estimated number of entries in the factors): 3361617
>>>               INFOG(21) (size in MB of memory effectively used during factorization - value on the most memory consuming processor): 35
>>>               INFOG(22) (size in MB of memory effectively used during factorization - sum over all processors): 136
>>>               INFOG(23) (after analysis: value of ICNTL(6) effectively used): 0
>>>               INFOG(24) (after analysis: value of ICNTL(12) effectively used): 1
>>>               INFOG(25) (after factorization: number of pivots modified by static pivoting): 0
>>>               INFOG(28) (after factorization: number of null pivots encountered): 0
>>>               INFOG(29) (after factorization: effective number of entries in the factors (sum over all processors)): 2931438
>>>               INFOG(30, 31) (after solution: size in Mbytes of memory used during solution phase): 0, 0
>>>               INFOG(32) (after analysis: type of analysis done): 1
>>>               INFOG(33) (value used for ICNTL(8)): 7
>>>               INFOG(34) (exponent of the determinant if determinant is requested): 0
>>>   linear system matrix = precond matrix:
>>>   Mat Object: 4 MPI processes
>>>     type: mpiaij
>>>     rows=22878, cols=22878
>>>     total: nonzeros=1219140, allocated nonzeros=1219140
>>>     total number of mallocs used during MatSetValues calls =0
>>>       using I-node (on process 0) routines: found 1889 nodes, limit used is 5
>>> converged reason: -11
>>>
>>> ------------------------------------------------------------