On Mon, Sep 19, 2016 at 9:45 PM, Fande Kong <fdkong...@gmail.com> wrote:
> Placing PCReset(PC pc) before the second kspsolve might works. > > Fande Kong, > > On Mon, Sep 19, 2016 at 7:38 PM, murat keçeli <kec...@gmail.com> wrote: > >> Another guess: maybe you also need KSPSetUp(ksp); before the second >> KSPSolve(ksp,b,x);. >> >> Murat Keceli >> > Thanks for the suggestions. I just tried these, and they didn't work either unfortunately. David > >> >> On Mon, Sep 19, 2016 at 8:33 PM, David Knezevic < >> david.kneze...@akselos.com> wrote: >> >>> On Mon, Sep 19, 2016 at 7:26 PM, Dave May <dave.mayhe...@gmail.com> >>> wrote: >>> >>>> >>>> >>>> On 19 September 2016 at 21:05, David Knezevic < >>>> david.kneze...@akselos.com> wrote: >>>> >>>>> When I use MUMPS via PETSc, one issue is that it can sometimes fail >>>>> with MUMPS error -9, which means that MUMPS didn't allocate a big enough >>>>> workspace. This can typically be fixed by increasing MUMPS icntl 14, e.g. >>>>> via the command line option -mat_mumps_icntl_14. >>>>> >>>>> However, instead of having to run several times with different command >>>>> line options, I'd like to be able to automatically increment icntl 14 >>>>> value >>>>> in a loop until the solve succeeds. >>>>> >>>>> I have a saved matrix which fails when I use it for a solve with MUMPS >>>>> with 4 MPI processes and the default ictnl values, so I'm using this to >>>>> check that I can achieve the automatic icntl 14 update, as described >>>>> above. >>>>> (The matrix is 14MB so I haven't attached it here, but I'd be happy to >>>>> send >>>>> it to anyone else who wants to try this test case out.) >>>>> >>>>> I've pasted some test code below which provides a simple test of this >>>>> idea using two solves. The first solve uses the default value of icntl 14, >>>>> which fails, and then we update icntl 14 to 30 and solve again. The second >>>>> solve should succeed since icntl 14 of 30 is sufficient for MUMPS to >>>>> succeed in this case, but for some reason the second solve still fails. >>>>> >>>>> Below I've also pasted the output from -ksp_view, and you can see that >>>>> ictnl 14 is being updated correctly (see the ICNTL(14) lines in the >>>>> output), so it's not clear to me why the second solve fails. It seems like >>>>> MUMPS is ignoring the update to the ictnl value? >>>>> >>>> >>>> I believe this parameter is utilized during the numerical factorization >>>> phase. >>>> In your code, the operator hasn't changed, however you haven't >>>> signalled to the KSP that you want to re-perform the numerical >>>> factorization. >>>> You can do this by calling KSPSetOperators() before your second solve. >>>> I think if you do this (please try it), the factorization will be >>>> performed again and the new value of icntl will have an effect. >>>> >>>> Note this is a wild stab in the dark - I haven't dug through the >>>> petsc-mumps code in detail... >>>> >>> >>> That sounds like a plausible guess to me, but unfortunately it didn't >>> work. I added KSPSetOperators(ksp,A,A); before the second solve and I >>> got the same behavior as before. >>> >>> Thanks, >>> David >>> >>> >>> >>> >>> >>>> ------------------------------------------------------------ >>>>> ----------------------------------------- >>>>> Test code: >>>>> >>>>> Mat A; >>>>> MatCreate(PETSC_COMM_WORLD,&A); >>>>> MatSetType(A,MATMPIAIJ); >>>>> >>>>> PetscViewer petsc_viewer; >>>>> PetscViewerBinaryOpen( PETSC_COMM_WORLD, >>>>> "matrix.dat", >>>>> FILE_MODE_READ, >>>>> &petsc_viewer); >>>>> MatLoad(A, petsc_viewer); >>>>> PetscViewerDestroy(&petsc_viewer); >>>>> >>>>> PetscInt m, n; >>>>> MatGetSize(A, &m, &n); >>>>> >>>>> Vec x; >>>>> VecCreate(PETSC_COMM_WORLD,&x); >>>>> VecSetSizes(x,PETSC_DECIDE,m); >>>>> VecSetFromOptions(x); >>>>> VecSet(x,1.0); >>>>> >>>>> Vec b; >>>>> VecDuplicate(x,&b); >>>>> >>>>> KSP ksp; >>>>> PC pc; >>>>> >>>>> KSPCreate(PETSC_COMM_WORLD,&ksp); >>>>> KSPSetOperators(ksp,A,A); >>>>> >>>>> KSPSetType(ksp,KSPPREONLY); >>>>> KSPGetPC(ksp,&pc); >>>>> >>>>> PCSetType(pc,PCCHOLESKY); >>>>> >>>>> PCFactorSetMatSolverPackage(pc,MATSOLVERMUMPS); >>>>> PCFactorSetUpMatSolverPackage(pc); >>>>> >>>>> KSPSetFromOptions(ksp); >>>>> KSPSetUp(ksp); >>>>> >>>>> KSPSolve(ksp,b,x); >>>>> >>>>> { >>>>> KSPConvergedReason reason; >>>>> KSPGetConvergedReason(ksp, &reason); >>>>> std::cout << "converged reason: " << reason << std::endl; >>>>> } >>>>> >>>>> Mat F; >>>>> PCFactorGetMatrix(pc,&F); >>>>> MatMumpsSetIcntl(F,14,30); >>>>> >>>>> KSPSolve(ksp,b,x); >>>>> >>>>> { >>>>> KSPConvergedReason reason; >>>>> KSPGetConvergedReason(ksp, &reason); >>>>> std::cout << "converged reason: " << reason << std::endl; >>>>> } >>>>> >>>>> ------------------------------------------------------------ >>>>> ----------------------------------------- >>>>> -ksp_view output (ICNTL(14) changes from 20 to 30, but we get >>>>> "converged reason: -11" for both solves) >>>>> >>>>> KSP Object: 4 MPI processes >>>>> type: preonly >>>>> maximum iterations=10000, initial guess is zero >>>>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >>>>> left preconditioning >>>>> using NONE norm type for convergence test >>>>> PC Object: 4 MPI processes >>>>> type: cholesky >>>>> Cholesky: out-of-place factorization >>>>> tolerance for zero pivot 2.22045e-14 >>>>> matrix ordering: natural >>>>> factor fill ratio given 0., needed 0. >>>>> Factored matrix follows: >>>>> Mat Object: 4 MPI processes >>>>> type: mpiaij >>>>> rows=22878, cols=22878 >>>>> package used to perform factorization: mumps >>>>> total: nonzeros=3361617, allocated nonzeros=3361617 >>>>> total number of mallocs used during MatSetValues calls =0 >>>>> MUMPS run parameters: >>>>> SYM (matrix type): 2 >>>>> PAR (host participation): 1 >>>>> ICNTL(1) (output for error): 6 >>>>> ICNTL(2) (output of diagnostic msg): 0 >>>>> ICNTL(3) (output for global info): 0 >>>>> ICNTL(4) (level of printing): 0 >>>>> ICNTL(5) (input mat struct): 0 >>>>> ICNTL(6) (matrix prescaling): 7 >>>>> ICNTL(7) (sequentia matrix ordering):7 >>>>> ICNTL(8) (scalling strategy): 77 >>>>> ICNTL(10) (max num of refinements): 0 >>>>> ICNTL(11) (error analysis): 0 >>>>> ICNTL(12) (efficiency control): >>>>> 0 >>>>> ICNTL(13) (efficiency control): >>>>> 0 >>>>> ICNTL(14) (percentage of estimated workspace increase): >>>>> 20 >>>>> ICNTL(18) (input mat struct): >>>>> 3 >>>>> ICNTL(19) (Shur complement info): >>>>> 0 >>>>> ICNTL(20) (rhs sparse pattern): >>>>> 0 >>>>> ICNTL(21) (solution struct): >>>>> 1 >>>>> ICNTL(22) (in-core/out-of-core facility): >>>>> 0 >>>>> ICNTL(23) (max size of memory can be allocated >>>>> locally):0 >>>>> ICNTL(24) (detection of null pivot rows): >>>>> 0 >>>>> ICNTL(25) (computation of a null space basis): >>>>> 0 >>>>> ICNTL(26) (Schur options for rhs or solution): >>>>> 0 >>>>> ICNTL(27) (experimental parameter): >>>>> -24 >>>>> ICNTL(28) (use parallel or sequential ordering): >>>>> 1 >>>>> ICNTL(29) (parallel ordering): >>>>> 0 >>>>> ICNTL(30) (user-specified set of entries in inv(A)): >>>>> 0 >>>>> ICNTL(31) (factors is discarded in the solve phase): >>>>> 0 >>>>> ICNTL(33) (compute determinant): >>>>> 0 >>>>> CNTL(1) (relative pivoting threshold): 0.01 >>>>> CNTL(2) (stopping criterion of refinement): 1.49012e-08 >>>>> CNTL(3) (absolute pivoting threshold): 0. >>>>> CNTL(4) (value of static pivoting): -1. >>>>> CNTL(5) (fixation for null pivots): 0. >>>>> RINFO(1) (local estimated flops for the elimination >>>>> after analysis): >>>>> [0] 1.84947e+08 >>>>> [1] 2.42065e+08 >>>>> [2] 2.53044e+08 >>>>> [3] 2.18441e+08 >>>>> RINFO(2) (local estimated flops for the assembly after >>>>> factorization): >>>>> [0] 945938. >>>>> [1] 906795. >>>>> [2] 897815. >>>>> [3] 998840. >>>>> RINFO(3) (local estimated flops for the elimination >>>>> after factorization): >>>>> [0] 1.59835e+08 >>>>> [1] 1.50867e+08 >>>>> [2] 2.27932e+08 >>>>> [3] 1.52037e+08 >>>>> INFO(15) (estimated size of (in MB) MUMPS internal data >>>>> for running numerical factorization): >>>>> [0] 36 >>>>> [1] 37 >>>>> [2] 38 >>>>> [3] 39 >>>>> INFO(16) (size of (in MB) MUMPS internal data used >>>>> during numerical factorization): >>>>> [0] 36 >>>>> [1] 37 >>>>> [2] 38 >>>>> [3] 39 >>>>> INFO(23) (num of pivots eliminated on this processor >>>>> after factorization): >>>>> [0] 6450 >>>>> [1] 5442 >>>>> [2] 4386 >>>>> [3] 5526 >>>>> RINFOG(1) (global estimated flops for the elimination >>>>> after analysis): 8.98497e+08 >>>>> RINFOG(2) (global estimated flops for the assembly after >>>>> factorization): 3.74939e+06 >>>>> RINFOG(3) (global estimated flops for the elimination >>>>> after factorization): 6.9067e+08 >>>>> (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant): >>>>> (0.,0.)*(2^0) >>>>> INFOG(3) (estimated real workspace for factors on all >>>>> processors after analysis): 4082184 >>>>> INFOG(4) (estimated integer workspace for factors on all >>>>> processors after analysis): 231846 >>>>> INFOG(5) (estimated maximum front size in the complete >>>>> tree): 678 >>>>> INFOG(6) (number of nodes in the complete tree): 1380 >>>>> INFOG(7) (ordering option effectively use after >>>>> analysis): 5 >>>>> INFOG(8) (structural symmetry in percent of the permuted >>>>> matrix after analysis): 100 >>>>> INFOG(9) (total real/complex workspace to store the >>>>> matrix factors after factorization): 3521904 >>>>> INFOG(10) (total integer space store the matrix factors >>>>> after factorization): 229416 >>>>> INFOG(11) (order of largest frontal matrix after >>>>> factorization): 678 >>>>> INFOG(12) (number of off-diagonal pivots): 0 >>>>> INFOG(13) (number of delayed pivots after >>>>> factorization): 0 >>>>> INFOG(14) (number of memory compress after >>>>> factorization): 0 >>>>> INFOG(15) (number of steps of iterative refinement after >>>>> solution): 0 >>>>> INFOG(16) (estimated size (in MB) of all MUMPS internal >>>>> data for factorization after analysis: value on the most memory consuming >>>>> processor): 39 >>>>> INFOG(17) (estimated size of all MUMPS internal data for >>>>> factorization after analysis: sum over all processors): 150 >>>>> INFOG(18) (size of all MUMPS internal data allocated >>>>> during factorization: value on the most memory consuming processor): 39 >>>>> INFOG(19) (size of all MUMPS internal data allocated >>>>> during factorization: sum over all processors): 150 >>>>> INFOG(20) (estimated number of entries in the factors): >>>>> 3361617 >>>>> INFOG(21) (size in MB of memory effectively used during >>>>> factorization - value on the most memory consuming processor): 35 >>>>> INFOG(22) (size in MB of memory effectively used during >>>>> factorization - sum over all processors): 136 >>>>> INFOG(23) (after analysis: value of ICNTL(6) effectively >>>>> used): 0 >>>>> INFOG(24) (after analysis: value of ICNTL(12) >>>>> effectively used): 1 >>>>> INFOG(25) (after factorization: number of pivots >>>>> modified by static pivoting): 0 >>>>> INFOG(28) (after factorization: number of null pivots >>>>> encountered): 0 >>>>> INFOG(29) (after factorization: effective number of >>>>> entries in the factors (sum over all processors)): 2931438 >>>>> INFOG(30, 31) (after solution: size in Mbytes of memory >>>>> used during solution phase): 0, 0 >>>>> INFOG(32) (after analysis: type of analysis done): 1 >>>>> INFOG(33) (value used for ICNTL(8)): 7 >>>>> INFOG(34) (exponent of the determinant if determinant is >>>>> requested): 0 >>>>> linear system matrix = precond matrix: >>>>> Mat Object: 4 MPI processes >>>>> type: mpiaij >>>>> rows=22878, cols=22878 >>>>> total: nonzeros=1219140, allocated nonzeros=1219140 >>>>> total number of mallocs used during MatSetValues calls =0 >>>>> using I-node (on process 0) routines: found 1889 nodes, limit >>>>> used is 5 >>>>> converged reason: -11 >>>>> KSP Object: 4 MPI processes >>>>> type: preonly >>>>> maximum iterations=10000, initial guess is zero >>>>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >>>>> left preconditioning >>>>> using NONE norm type for convergence test >>>>> PC Object: 4 MPI processes >>>>> type: cholesky >>>>> Cholesky: out-of-place factorization >>>>> tolerance for zero pivot 2.22045e-14 >>>>> matrix ordering: natural >>>>> factor fill ratio given 0., needed 0. >>>>> Factored matrix follows: >>>>> Mat Object: 4 MPI processes >>>>> type: mpiaij >>>>> rows=22878, cols=22878 >>>>> package used to perform factorization: mumps >>>>> total: nonzeros=3361617, allocated nonzeros=3361617 >>>>> total number of mallocs used during MatSetValues calls =0 >>>>> MUMPS run parameters: >>>>> SYM (matrix type): 2 >>>>> PAR (host participation): 1 >>>>> ICNTL(1) (output for error): 6 >>>>> ICNTL(2) (output of diagnostic msg): 0 >>>>> ICNTL(3) (output for global info): 0 >>>>> ICNTL(4) (level of printing): 0 >>>>> ICNTL(5) (input mat struct): 0 >>>>> ICNTL(6) (matrix prescaling): 7 >>>>> ICNTL(7) (sequentia matrix ordering):7 >>>>> ICNTL(8) (scalling strategy): 77 >>>>> ICNTL(10) (max num of refinements): 0 >>>>> ICNTL(11) (error analysis): 0 >>>>> ICNTL(12) (efficiency control): >>>>> 0 >>>>> ICNTL(13) (efficiency control): >>>>> 0 >>>>> ICNTL(14) (percentage of estimated workspace increase): >>>>> 30 >>>>> ICNTL(18) (input mat struct): >>>>> 3 >>>>> ICNTL(19) (Shur complement info): >>>>> 0 >>>>> ICNTL(20) (rhs sparse pattern): >>>>> 0 >>>>> ICNTL(21) (solution struct): >>>>> 1 >>>>> ICNTL(22) (in-core/out-of-core facility): >>>>> 0 >>>>> ICNTL(23) (max size of memory can be allocated >>>>> locally):0 >>>>> ICNTL(24) (detection of null pivot rows): >>>>> 0 >>>>> ICNTL(25) (computation of a null space basis): >>>>> 0 >>>>> ICNTL(26) (Schur options for rhs or solution): >>>>> 0 >>>>> ICNTL(27) (experimental parameter): >>>>> -24 >>>>> ICNTL(28) (use parallel or sequential ordering): >>>>> 1 >>>>> ICNTL(29) (parallel ordering): >>>>> 0 >>>>> ICNTL(30) (user-specified set of entries in inv(A)): >>>>> 0 >>>>> ICNTL(31) (factors is discarded in the solve phase): >>>>> 0 >>>>> ICNTL(33) (compute determinant): >>>>> 0 >>>>> CNTL(1) (relative pivoting threshold): 0.01 >>>>> CNTL(2) (stopping criterion of refinement): 1.49012e-08 >>>>> CNTL(3) (absolute pivoting threshold): 0. >>>>> CNTL(4) (value of static pivoting): -1. >>>>> CNTL(5) (fixation for null pivots): 0. >>>>> RINFO(1) (local estimated flops for the elimination >>>>> after analysis): >>>>> [0] 1.84947e+08 >>>>> [1] 2.42065e+08 >>>>> [2] 2.53044e+08 >>>>> [3] 2.18441e+08 >>>>> RINFO(2) (local estimated flops for the assembly after >>>>> factorization): >>>>> [0] 945938. >>>>> [1] 906795. >>>>> [2] 897815. >>>>> [3] 998840. >>>>> RINFO(3) (local estimated flops for the elimination >>>>> after factorization): >>>>> [0] 1.59835e+08 >>>>> [1] 1.50867e+08 >>>>> [2] 2.27932e+08 >>>>> [3] 1.52037e+08 >>>>> INFO(15) (estimated size of (in MB) MUMPS internal data >>>>> for running numerical factorization): >>>>> [0] 36 >>>>> [1] 37 >>>>> [2] 38 >>>>> [3] 39 >>>>> INFO(16) (size of (in MB) MUMPS internal data used >>>>> during numerical factorization): >>>>> [0] 36 >>>>> [1] 37 >>>>> [2] 38 >>>>> [3] 39 >>>>> INFO(23) (num of pivots eliminated on this processor >>>>> after factorization): >>>>> [0] 6450 >>>>> [1] 5442 >>>>> [2] 4386 >>>>> [3] 5526 >>>>> RINFOG(1) (global estimated flops for the elimination >>>>> after analysis): 8.98497e+08 >>>>> RINFOG(2) (global estimated flops for the assembly after >>>>> factorization): 3.74939e+06 >>>>> RINFOG(3) (global estimated flops for the elimination >>>>> after factorization): 6.9067e+08 >>>>> (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant): >>>>> (0.,0.)*(2^0) >>>>> INFOG(3) (estimated real workspace for factors on all >>>>> processors after analysis): 4082184 >>>>> INFOG(4) (estimated integer workspace for factors on all >>>>> processors after analysis): 231846 >>>>> INFOG(5) (estimated maximum front size in the complete >>>>> tree): 678 >>>>> INFOG(6) (number of nodes in the complete tree): 1380 >>>>> INFOG(7) (ordering option effectively use after >>>>> analysis): 5 >>>>> INFOG(8) (structural symmetry in percent of the permuted >>>>> matrix after analysis): 100 >>>>> INFOG(9) (total real/complex workspace to store the >>>>> matrix factors after factorization): 3521904 >>>>> INFOG(10) (total integer space store the matrix factors >>>>> after factorization): 229416 >>>>> INFOG(11) (order of largest frontal matrix after >>>>> factorization): 678 >>>>> INFOG(12) (number of off-diagonal pivots): 0 >>>>> INFOG(13) (number of delayed pivots after >>>>> factorization): 0 >>>>> INFOG(14) (number of memory compress after >>>>> factorization): 0 >>>>> INFOG(15) (number of steps of iterative refinement after >>>>> solution): 0 >>>>> INFOG(16) (estimated size (in MB) of all MUMPS internal >>>>> data for factorization after analysis: value on the most memory consuming >>>>> processor): 39 >>>>> INFOG(17) (estimated size of all MUMPS internal data for >>>>> factorization after analysis: sum over all processors): 150 >>>>> INFOG(18) (size of all MUMPS internal data allocated >>>>> during factorization: value on the most memory consuming processor): 39 >>>>> INFOG(19) (size of all MUMPS internal data allocated >>>>> during factorization: sum over all processors): 150 >>>>> INFOG(20) (estimated number of entries in the factors): >>>>> 3361617 >>>>> INFOG(21) (size in MB of memory effectively used during >>>>> factorization - value on the most memory consuming processor): 35 >>>>> INFOG(22) (size in MB of memory effectively used during >>>>> factorization - sum over all processors): 136 >>>>> INFOG(23) (after analysis: value of ICNTL(6) effectively >>>>> used): 0 >>>>> INFOG(24) (after analysis: value of ICNTL(12) >>>>> effectively used): 1 >>>>> INFOG(25) (after factorization: number of pivots >>>>> modified by static pivoting): 0 >>>>> INFOG(28) (after factorization: number of null pivots >>>>> encountered): 0 >>>>> INFOG(29) (after factorization: effective number of >>>>> entries in the factors (sum over all processors)): 2931438 >>>>> INFOG(30, 31) (after solution: size in Mbytes of memory >>>>> used during solution phase): 0, 0 >>>>> INFOG(32) (after analysis: type of analysis done): 1 >>>>> INFOG(33) (value used for ICNTL(8)): 7 >>>>> INFOG(34) (exponent of the determinant if determinant is >>>>> requested): 0 >>>>> linear system matrix = precond matrix: >>>>> Mat Object: 4 MPI processes >>>>> type: mpiaij >>>>> rows=22878, cols=22878 >>>>> total: nonzeros=1219140, allocated nonzeros=1219140 >>>>> total number of mallocs used during MatSetValues calls =0 >>>>> using I-node (on process 0) routines: found 1889 nodes, limit >>>>> used is 5 >>>>> converged reason: -11 >>>>> >>>>> ------------------------------------------------------------ >>>>> ----------------------------------------- >>>>> >>>> >>>> >> >