Placing PCReset(pc) before the second KSPSolve might work.

Fande Kong,
On Mon, Sep 19, 2016 at 7:38 PM, murat keçeli <kec...@gmail.com> wrote:

> Another guess: maybe you also need KSPSetUp(ksp); before the second
> KSPSolve(ksp,b,x).
>
> Murat Keceli
>
> On Mon, Sep 19, 2016 at 8:33 PM, David Knezevic <david.kneze...@akselos.com> wrote:
>
>> On Mon, Sep 19, 2016 at 7:26 PM, Dave May <dave.mayhe...@gmail.com> wrote:
>>
>>> On 19 September 2016 at 21:05, David Knezevic <david.kneze...@akselos.com> wrote:
>>>
>>>> When I use MUMPS via PETSc, one issue is that it can sometimes fail
>>>> with MUMPS error -9, which means that MUMPS didn't allocate a big enough
>>>> workspace. This can typically be fixed by increasing MUMPS icntl 14, e.g.
>>>> via the command line option -mat_mumps_icntl_14.
>>>>
>>>> However, instead of having to run several times with different command
>>>> line options, I'd like to be able to automatically increment the icntl 14
>>>> value in a loop until the solve succeeds.
>>>>
>>>> I have a saved matrix which fails when I use it for a solve with MUMPS
>>>> with 4 MPI processes and the default icntl values, so I'm using this to
>>>> check that I can achieve the automatic icntl 14 update, as described above.
>>>> (The matrix is 14MB so I haven't attached it here, but I'd be happy to send
>>>> it to anyone else who wants to try this test case out.)
>>>>
>>>> I've pasted some test code below which provides a simple test of this
>>>> idea using two solves. The first solve uses the default value of icntl 14,
>>>> which fails, and then we update icntl 14 to 30 and solve again. The second
>>>> solve should succeed since icntl 14 of 30 is sufficient for MUMPS to
>>>> succeed in this case, but for some reason the second solve still fails.
>>>>
>>>> Below I've also pasted the output from -ksp_view, and you can see that
>>>> icntl 14 is being updated correctly (see the ICNTL(14) lines in the
>>>> output), so it's not clear to me why the second solve fails. It seems like
>>>> MUMPS is ignoring the update to the icntl value?
>>>>
>>>
>>> I believe this parameter is utilized during the numerical factorization
>>> phase.
>>> In your code, the operator hasn't changed, however you haven't signalled
>>> to the KSP that you want to re-perform the numerical factorization.
>>> You can do this by calling KSPSetOperators() before your second solve.
>>> I think if you do this (please try it), the factorization will be
>>> performed again and the new value of icntl will have an effect.
>>>
>>> Note this is a wild stab in the dark - I haven't dug through the
>>> petsc-mumps code in detail...
>>>
>>
>> That sounds like a plausible guess to me, but unfortunately it didn't
>> work. I added KSPSetOperators(ksp,A,A); before the second solve and I
>> got the same behavior as before.
>>
>> Thanks,
>> David
>>
>>>
>>>> -----------------------------------------------------------------------------------------------------
>>>> Test code:
>>>>
>>>> Mat A;
>>>> MatCreate(PETSC_COMM_WORLD,&A);
>>>> MatSetType(A,MATMPIAIJ);
>>>>
>>>> PetscViewer petsc_viewer;
>>>> PetscViewerBinaryOpen( PETSC_COMM_WORLD,
>>>>                        "matrix.dat",
>>>>                        FILE_MODE_READ,
>>>>                        &petsc_viewer);
>>>> MatLoad(A, petsc_viewer);
>>>> PetscViewerDestroy(&petsc_viewer);
>>>>
>>>> PetscInt m, n;
>>>> MatGetSize(A, &m, &n);
>>>>
>>>> Vec x;
>>>> VecCreate(PETSC_COMM_WORLD,&x);
>>>> VecSetSizes(x,PETSC_DECIDE,m);
>>>> VecSetFromOptions(x);
>>>> VecSet(x,1.0);
>>>>
>>>> Vec b;
>>>> VecDuplicate(x,&b);
>>>>
>>>> KSP ksp;
>>>> PC pc;
>>>>
>>>> KSPCreate(PETSC_COMM_WORLD,&ksp);
>>>> KSPSetOperators(ksp,A,A);
>>>>
>>>> KSPSetType(ksp,KSPPREONLY);
>>>> KSPGetPC(ksp,&pc);
>>>>
>>>> PCSetType(pc,PCCHOLESKY);
>>>>
>>>> PCFactorSetMatSolverPackage(pc,MATSOLVERMUMPS);
>>>> PCFactorSetUpMatSolverPackage(pc);
>>>>
>>>> KSPSetFromOptions(ksp);
>>>> KSPSetUp(ksp);
>>>>
>>>> KSPSolve(ksp,b,x);
>>>>
>>>> {
>>>>   KSPConvergedReason reason;
>>>>   KSPGetConvergedReason(ksp, &reason);
>>>>   std::cout << "converged reason: " << reason << std::endl;
>>>> }
>>>>
>>>> Mat F;
>>>> PCFactorGetMatrix(pc,&F);
>>>> MatMumpsSetIcntl(F,14,30);
>>>>
>>>> KSPSolve(ksp,b,x);
>>>>
>>>> {
>>>>   KSPConvergedReason reason;
>>>>   KSPGetConvergedReason(ksp, &reason);
>>>>   std::cout << "converged reason: " << reason << std::endl;
>>>> }
>>>>
>>>> -----------------------------------------------------------------------------------------------------
>>>> -ksp_view output (ICNTL(14) changes from 20 to 30, but we get
>>>> "converged reason: -11" for both solves)
>>>>
>>>> KSP Object: 4 MPI processes
>>>> type: preonly
>>>> maximum iterations=10000, initial guess is zero
>>>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>>>> left preconditioning
>>>> using NONE norm type for convergence test
>>>> PC Object: 4 MPI processes
>>>> type: cholesky
>>>> Cholesky: out-of-place factorization
>>>> tolerance for zero pivot 2.22045e-14
>>>> matrix ordering: natural
>>>> factor fill ratio given 0., needed 0.
>>>> Factored matrix follows:
>>>> Mat Object: 4 MPI processes
>>>> type: mpiaij
>>>> rows=22878, cols=22878
>>>> package used to perform factorization: mumps
>>>> total: nonzeros=3361617, allocated nonzeros=3361617
>>>> total number of mallocs used during MatSetValues calls =0
>>>> MUMPS run parameters:
>>>> SYM (matrix type): 2
>>>> PAR (host participation): 1
>>>> ICNTL(1) (output for error): 6
>>>> ICNTL(2) (output of diagnostic msg): 0
>>>> ICNTL(3) (output for global info): 0
>>>> ICNTL(4) (level of printing): 0
>>>> ICNTL(5) (input mat struct): 0
>>>> ICNTL(6) (matrix prescaling): 7
>>>> ICNTL(7) (sequentia matrix ordering):7
>>>> ICNTL(8) (scalling strategy): 77
>>>> ICNTL(10) (max num of refinements): 0
>>>> ICNTL(11) (error analysis): 0
>>>> ICNTL(12) (efficiency control): 0
>>>> ICNTL(13) (efficiency control): 0
>>>> ICNTL(14) (percentage of estimated workspace increase): 20
>>>> ICNTL(18) (input mat struct): 3
>>>> ICNTL(19) (Shur complement info): 0
>>>> ICNTL(20) (rhs sparse pattern): 0
>>>> ICNTL(21) (solution struct): 1
>>>> ICNTL(22) (in-core/out-of-core facility): 0
>>>> ICNTL(23) (max size of memory can be allocated locally):0
>>>> ICNTL(24) (detection of null pivot rows): 0
>>>> ICNTL(25) (computation of a null space basis): 0
>>>> ICNTL(26) (Schur options for rhs or solution): 0
>>>> ICNTL(27) (experimental parameter): -24
>>>> ICNTL(28) (use parallel or sequential ordering): 1
>>>> ICNTL(29) (parallel ordering): 0
>>>> ICNTL(30) (user-specified set of entries in inv(A)): 0
>>>> ICNTL(31) (factors is discarded in the solve phase): 0
>>>> ICNTL(33) (compute determinant): 0
>>>> CNTL(1) (relative pivoting threshold): 0.01
>>>> CNTL(2) (stopping criterion of refinement): 1.49012e-08
>>>> CNTL(3) (absolute pivoting threshold): 0.
>>>> CNTL(4) (value of static pivoting): -1.
>>>> CNTL(5) (fixation for null pivots): 0.
>>>> RINFO(1) (local estimated flops for the elimination after analysis):
>>>> [0] 1.84947e+08
>>>> [1] 2.42065e+08
>>>> [2] 2.53044e+08
>>>> [3] 2.18441e+08
>>>> RINFO(2) (local estimated flops for the assembly after factorization):
>>>> [0] 945938.
>>>> [1] 906795.
>>>> [2] 897815.
>>>> [3] 998840.
>>>> RINFO(3) (local estimated flops for the elimination after factorization):
>>>> [0] 1.59835e+08
>>>> [1] 1.50867e+08
>>>> [2] 2.27932e+08
>>>> [3] 1.52037e+08
>>>> INFO(15) (estimated size of (in MB) MUMPS internal data for running numerical factorization):
>>>> [0] 36
>>>> [1] 37
>>>> [2] 38
>>>> [3] 39
>>>> INFO(16) (size of (in MB) MUMPS internal data used during numerical factorization):
>>>> [0] 36
>>>> [1] 37
>>>> [2] 38
>>>> [3] 39
>>>> INFO(23) (num of pivots eliminated on this processor after factorization):
>>>> [0] 6450
>>>> [1] 5442
>>>> [2] 4386
>>>> [3] 5526
>>>> RINFOG(1) (global estimated flops for the elimination after analysis): 8.98497e+08
>>>> RINFOG(2) (global estimated flops for the assembly after factorization): 3.74939e+06
>>>> RINFOG(3) (global estimated flops for the elimination after factorization): 6.9067e+08
>>>> (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant): (0.,0.)*(2^0)
>>>> INFOG(3) (estimated real workspace for factors on all processors after analysis): 4082184
>>>> INFOG(4) (estimated integer workspace for factors on all processors after analysis): 231846
>>>> INFOG(5) (estimated maximum front size in the complete tree): 678
>>>> INFOG(6) (number of nodes in the complete tree): 1380
>>>> INFOG(7) (ordering option effectively use after analysis): 5
>>>> INFOG(8) (structural symmetry in percent of the permuted matrix after analysis): 100
>>>> INFOG(9) (total real/complex workspace to store the matrix factors after factorization): 3521904
>>>> INFOG(10) (total integer space store the matrix factors after factorization): 229416
>>>> INFOG(11) (order of largest frontal matrix after factorization): 678
>>>> INFOG(12) (number of off-diagonal pivots): 0
>>>> INFOG(13) (number of delayed pivots after factorization): 0
>>>> INFOG(14) (number of memory compress after factorization): 0
>>>> INFOG(15) (number of steps of iterative refinement after solution): 0
>>>> INFOG(16) (estimated size (in MB) of all MUMPS internal data for factorization after analysis: value on the most memory consuming processor): 39
>>>> INFOG(17) (estimated size of all MUMPS internal data for factorization after analysis: sum over all processors): 150
>>>> INFOG(18) (size of all MUMPS internal data allocated during factorization: value on the most memory consuming processor): 39
>>>> INFOG(19) (size of all MUMPS internal data allocated during factorization: sum over all processors): 150
>>>> INFOG(20) (estimated number of entries in the factors): 3361617
>>>> INFOG(21) (size in MB of memory effectively used during factorization - value on the most memory consuming processor): 35
>>>> INFOG(22) (size in MB of memory effectively used during factorization - sum over all processors): 136
>>>> INFOG(23) (after analysis: value of ICNTL(6) effectively used): 0
>>>> INFOG(24) (after analysis: value of ICNTL(12) effectively used): 1
>>>> INFOG(25) (after factorization: number of pivots modified by static pivoting): 0
>>>> INFOG(28) (after factorization: number of null pivots encountered): 0
>>>> INFOG(29) (after factorization: effective number of entries in the factors (sum over all processors)): 2931438
>>>> INFOG(30, 31) (after solution: size in Mbytes of memory used during solution phase): 0, 0
>>>> INFOG(32) (after analysis: type of analysis done): 1
>>>> INFOG(33) (value used for ICNTL(8)): 7
>>>> INFOG(34) (exponent of the determinant if determinant is requested): 0
>>>> linear system matrix = precond matrix:
>>>> Mat Object: 4 MPI processes
>>>> type: mpiaij
>>>> rows=22878, cols=22878
>>>> total: nonzeros=1219140, allocated nonzeros=1219140
>>>> total number of mallocs used during MatSetValues calls =0
>>>> using I-node (on process 0) routines: found 1889 nodes, limit used is 5
>>>> converged reason: -11
>>>> KSP Object: 4 MPI processes
>>>> type: preonly
>>>> maximum iterations=10000, initial guess is zero
>>>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>>>> left preconditioning
>>>> using NONE norm type for convergence test
>>>> PC Object: 4 MPI processes
>>>> type: cholesky
>>>> Cholesky: out-of-place factorization
>>>> tolerance for zero pivot 2.22045e-14
>>>> matrix ordering: natural
>>>> factor fill ratio given 0., needed 0.
>>>> Factored matrix follows:
>>>> Mat Object: 4 MPI processes
>>>> type: mpiaij
>>>> rows=22878, cols=22878
>>>> package used to perform factorization: mumps
>>>> total: nonzeros=3361617, allocated nonzeros=3361617
>>>> total number of mallocs used during MatSetValues calls =0
>>>> MUMPS run parameters:
>>>> SYM (matrix type): 2
>>>> PAR (host participation): 1
>>>> ICNTL(1) (output for error): 6
>>>> ICNTL(2) (output of diagnostic msg): 0
>>>> ICNTL(3) (output for global info): 0
>>>> ICNTL(4) (level of printing): 0
>>>> ICNTL(5) (input mat struct): 0
>>>> ICNTL(6) (matrix prescaling): 7
>>>> ICNTL(7) (sequentia matrix ordering):7
>>>> ICNTL(8) (scalling strategy): 77
>>>> ICNTL(10) (max num of refinements): 0
>>>> ICNTL(11) (error analysis): 0
>>>> ICNTL(12) (efficiency control): 0
>>>> ICNTL(13) (efficiency control): 0
>>>> ICNTL(14) (percentage of estimated workspace increase): 30
>>>> ICNTL(18) (input mat struct): 3
>>>> ICNTL(19) (Shur complement info): 0
>>>> ICNTL(20) (rhs sparse pattern): 0
>>>> ICNTL(21) (solution struct): 1
>>>> ICNTL(22) (in-core/out-of-core facility): 0
>>>> ICNTL(23) (max size of memory can be allocated locally):0
>>>> ICNTL(24) (detection of null pivot rows): 0
>>>> ICNTL(25) (computation of a null space basis): 0
>>>> ICNTL(26) (Schur options for rhs or solution): 0
>>>> ICNTL(27) (experimental parameter): -24
>>>> ICNTL(28) (use parallel or sequential ordering): 1
>>>> ICNTL(29) (parallel ordering): 0
>>>> ICNTL(30) (user-specified set of entries in inv(A)): 0
>>>> ICNTL(31) (factors is discarded in the solve phase): 0
>>>> ICNTL(33) (compute determinant): 0
>>>> CNTL(1) (relative pivoting threshold): 0.01
>>>> CNTL(2) (stopping criterion of refinement): 1.49012e-08
>>>> CNTL(3) (absolute pivoting threshold): 0.
>>>> CNTL(4) (value of static pivoting): -1.
>>>> CNTL(5) (fixation for null pivots): 0.
>>>> RINFO(1) (local estimated flops for the elimination after analysis):
>>>> [0] 1.84947e+08
>>>> [1] 2.42065e+08
>>>> [2] 2.53044e+08
>>>> [3] 2.18441e+08
>>>> RINFO(2) (local estimated flops for the assembly after factorization):
>>>> [0] 945938.
>>>> [1] 906795.
>>>> [2] 897815.
>>>> [3] 998840.
>>>> RINFO(3) (local estimated flops for the elimination after factorization):
>>>> [0] 1.59835e+08
>>>> [1] 1.50867e+08
>>>> [2] 2.27932e+08
>>>> [3] 1.52037e+08
>>>> INFO(15) (estimated size of (in MB) MUMPS internal data for running numerical factorization):
>>>> [0] 36
>>>> [1] 37
>>>> [2] 38
>>>> [3] 39
>>>> INFO(16) (size of (in MB) MUMPS internal data used during numerical factorization):
>>>> [0] 36
>>>> [1] 37
>>>> [2] 38
>>>> [3] 39
>>>> INFO(23) (num of pivots eliminated on this processor after factorization):
>>>> [0] 6450
>>>> [1] 5442
>>>> [2] 4386
>>>> [3] 5526
>>>> RINFOG(1) (global estimated flops for the elimination after analysis): 8.98497e+08
>>>> RINFOG(2) (global estimated flops for the assembly after factorization): 3.74939e+06
>>>> RINFOG(3) (global estimated flops for the elimination after factorization): 6.9067e+08
>>>> (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant): (0.,0.)*(2^0)
>>>> INFOG(3) (estimated real workspace for factors on all processors after analysis): 4082184
>>>> INFOG(4) (estimated integer workspace for factors on all processors after analysis): 231846
>>>> INFOG(5) (estimated maximum front size in the complete tree): 678
>>>> INFOG(6) (number of nodes in the complete tree): 1380
>>>> INFOG(7) (ordering option effectively use after analysis): 5
>>>> INFOG(8) (structural symmetry in percent of the permuted matrix after analysis): 100
>>>> INFOG(9) (total real/complex workspace to store the matrix factors after factorization): 3521904
>>>> INFOG(10) (total integer space store the matrix factors after factorization): 229416
>>>> INFOG(11) (order of largest frontal matrix after factorization): 678
>>>> INFOG(12) (number of off-diagonal pivots): 0
>>>> INFOG(13) (number of delayed pivots after factorization): 0
>>>> INFOG(14) (number of memory compress after factorization): 0
>>>> INFOG(15) (number of steps of iterative refinement after solution): 0
>>>> INFOG(16) (estimated size (in MB) of all MUMPS internal data for factorization after analysis: value on the most memory consuming processor): 39
>>>> INFOG(17) (estimated size of all MUMPS internal data for factorization after analysis: sum over all processors): 150
>>>> INFOG(18) (size of all MUMPS internal data allocated during factorization: value on the most memory consuming processor): 39
>>>> INFOG(19) (size of all MUMPS internal data allocated during factorization: sum over all processors): 150
>>>> INFOG(20) (estimated number of entries in the factors): 3361617
>>>> INFOG(21) (size in MB of memory effectively used during factorization - value on the most memory consuming processor): 35
>>>> INFOG(22) (size in MB of memory effectively used during factorization - sum over all processors): 136
>>>> INFOG(23) (after analysis: value of ICNTL(6) effectively used): 0
>>>> INFOG(24) (after analysis: value of ICNTL(12) effectively used): 1
>>>> INFOG(25) (after factorization: number of pivots modified by static pivoting): 0
>>>> INFOG(28) (after factorization: number of null pivots encountered): 0
>>>> INFOG(29) (after factorization: effective number of entries in the factors (sum over all processors)): 2931438
>>>> INFOG(30, 31) (after solution: size in Mbytes of memory used during solution phase): 0, 0
>>>> INFOG(32) (after analysis: type of analysis done): 1
>>>> INFOG(33) (value used for ICNTL(8)): 7
>>>> INFOG(34) (exponent of the determinant if determinant is requested): 0
>>>> linear system matrix = precond matrix:
>>>> Mat Object: 4 MPI processes
>>>> type: mpiaij
>>>> rows=22878, cols=22878
>>>> total: nonzeros=1219140, allocated nonzeros=1219140
>>>> total number of mallocs used during MatSetValues calls =0
>>>> using I-node (on process 0) routines: found 1889 nodes, limit used is 5
>>>> converged reason: -11
>>>>
>>>> -----------------------------------------------------------------------------------------------------
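
For reference, the control flow asked for in the original question (raise ICNTL(14) step by step until the solve succeeds) could be sketched roughly as below. This is only a sketch of the loop, not PETSc code: `solve_with_growing_workspace` and its `try_solve` callback are hypothetical names. The callback is assumed to set ICNTL(14) to the given percentage, do whatever is needed to make the factorization run again (which is exactly the open question in this thread), call the solve, and return true if the converged reason is positive.

```cpp
#include <cassert>
#include <functional>

// Hypothetical helper, not part of PETSc or MUMPS.
// Calls try_solve(percent) with percent = start, start + step, ... up to
// max_percent and returns the first percentage for which it reports success.
// Returns -1 if no tested value worked.
int solve_with_growing_workspace(const std::function<bool(int)>& try_solve,
                                 int start_percent,
                                 int step,
                                 int max_percent)
{
  assert(step > 0);  // a non-positive step would never terminate

  for (int percent = start_percent; percent <= max_percent; percent += step) {
    // Assumption: try_solve applies `percent` as ICNTL(14), forces a new
    // factorization, runs the solve, and checks the converged reason.
    if (try_solve(percent)) {
      return percent;
    }
  }
  return -1;
}
```

In the test case above one would start at the default of 20 and expect the loop to stop at 30, provided the callback really does trigger a new factorization with the updated value.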