Re: [petsc-users] Issue updating MUMPS ictnl after failed solve

David Knezevic Mon, 19 Sep 2016 18:53:38 -0700

On Mon, Sep 19, 2016 at 9:45 PM, Fande Kong <fdkong...@gmail.com> wrote:


> Placing PCReset(PC pc) before the second kspsolve might works.
>
> Fande Kong,
>
> On Mon, Sep 19, 2016 at 7:38 PM, murat keçeli <kec...@gmail.com> wrote:
>
>> Another guess: maybe you also need KSPSetUp(ksp); before the second
>> KSPSolve(ksp,b,x);.
>>
>> Murat Keceli
>>
>
Thanks for the suggestions. I just tried these, and they didn't work either
unfortunately.

David





> 
>>
>> On Mon, Sep 19, 2016 at 8:33 PM, David Knezevic <
>> david.kneze...@akselos.com> wrote:
>>
>>> On Mon, Sep 19, 2016 at 7:26 PM, Dave May <dave.mayhe...@gmail.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On 19 September 2016 at 21:05, David Knezevic <
>>>> david.kneze...@akselos.com> wrote:
>>>>
>>>>> When I use MUMPS via PETSc, one issue is that it can sometimes fail
>>>>> with MUMPS error -9, which means that MUMPS didn't allocate a big enough
>>>>> workspace. This can typically be fixed by increasing MUMPS icntl 14, e.g.
>>>>> via the command line option -mat_mumps_icntl_14.
>>>>>
>>>>> However, instead of having to run several times with different command
>>>>> line options, I'd like to be able to automatically increment icntl 14 
>>>>> value
>>>>> in a loop until the solve succeeds.
>>>>>
>>>>> I have a saved matrix which fails when I use it for a solve with MUMPS
>>>>> with 4 MPI processes and the default ictnl values, so I'm using this to
>>>>> check that I can achieve the automatic icntl 14 update, as described 
>>>>> above.
>>>>> (The matrix is 14MB so I haven't attached it here, but I'd be happy to 
>>>>> send
>>>>> it to anyone else who wants to try this test case out.)
>>>>>
>>>>> I've pasted some test code below which provides a simple test of this
>>>>> idea using two solves. The first solve uses the default value of icntl 14,
>>>>> which fails, and then we update icntl 14 to 30 and solve again. The second
>>>>> solve should succeed since icntl 14 of 30 is sufficient for MUMPS to
>>>>> succeed in this case, but for some reason the second solve still fails.
>>>>>
>>>>> Below I've also pasted the output from -ksp_view, and you can see that
>>>>> ictnl 14 is being updated correctly (see the ICNTL(14) lines in the
>>>>> output), so it's not clear to me why the second solve fails. It seems like
>>>>> MUMPS is ignoring the update to the ictnl value?
>>>>>
>>>>
>>>> I believe this parameter is utilized during the numerical factorization
>>>> phase.
>>>> In your code, the operator hasn't changed, however you haven't
>>>> signalled to the KSP that you want to re-perform the numerical
>>>> factorization.
>>>> You can do this by calling KSPSetOperators() before your second solve.
>>>> I think if you do this (please try it), the factorization will be
>>>> performed again and the new value of icntl will have an effect.
>>>>
>>>> Note this is a wild stab in the dark - I haven't dug through the
>>>> petsc-mumps code in detail...
>>>>
>>>
>>> That sounds like a plausible guess to me, but unfortunately it didn't
>>> work. I added KSPSetOperators(ksp,A,A); before the second solve and I
>>> got the same behavior as before.
>>>
>>> Thanks,
>>> David
>>>
>>>
>>>
>>>
>>>
>>>> ------------------------------------------------------------
>>>>> -----------------------------------------
>>>>> Test code:
>>>>>
>>>>>   Mat A;
>>>>>   MatCreate(PETSC_COMM_WORLD,&A);
>>>>>   MatSetType(A,MATMPIAIJ);
>>>>>
>>>>>   PetscViewer petsc_viewer;
>>>>>   PetscViewerBinaryOpen( PETSC_COMM_WORLD,
>>>>>                          "matrix.dat",
>>>>>                          FILE_MODE_READ,
>>>>>                          &petsc_viewer);
>>>>>   MatLoad(A, petsc_viewer);
>>>>>   PetscViewerDestroy(&petsc_viewer);
>>>>>
>>>>>   PetscInt m, n;
>>>>>   MatGetSize(A, &m, &n);
>>>>>
>>>>>   Vec x;
>>>>>   VecCreate(PETSC_COMM_WORLD,&x);
>>>>>   VecSetSizes(x,PETSC_DECIDE,m);
>>>>>   VecSetFromOptions(x);
>>>>>   VecSet(x,1.0);
>>>>>
>>>>>   Vec b;
>>>>>   VecDuplicate(x,&b);
>>>>>
>>>>>   KSP ksp;
>>>>>   PC pc;
>>>>>
>>>>>   KSPCreate(PETSC_COMM_WORLD,&ksp);
>>>>>   KSPSetOperators(ksp,A,A);
>>>>>
>>>>>   KSPSetType(ksp,KSPPREONLY);
>>>>>   KSPGetPC(ksp,&pc);
>>>>>
>>>>>   PCSetType(pc,PCCHOLESKY);
>>>>>
>>>>>   PCFactorSetMatSolverPackage(pc,MATSOLVERMUMPS);
>>>>>   PCFactorSetUpMatSolverPackage(pc);
>>>>>
>>>>>   KSPSetFromOptions(ksp);
>>>>>   KSPSetUp(ksp);
>>>>>
>>>>>   KSPSolve(ksp,b,x);
>>>>>
>>>>>   {
>>>>>     KSPConvergedReason reason;
>>>>>     KSPGetConvergedReason(ksp, &reason);
>>>>>     std::cout << "converged reason: " << reason << std::endl;
>>>>>   }
>>>>>
>>>>>   Mat F;
>>>>>   PCFactorGetMatrix(pc,&F);
>>>>>   MatMumpsSetIcntl(F,14,30);
>>>>>
>>>>>   KSPSolve(ksp,b,x);
>>>>>
>>>>>   {
>>>>>     KSPConvergedReason reason;
>>>>>     KSPGetConvergedReason(ksp, &reason);
>>>>>     std::cout << "converged reason: " << reason << std::endl;
>>>>>   }
>>>>>
>>>>> ------------------------------------------------------------
>>>>> -----------------------------------------
>>>>> -ksp_view output (ICNTL(14) changes from 20 to 30, but we get
>>>>> "converged reason: -11" for both solves)
>>>>>
>>>>> KSP Object: 4 MPI processes
>>>>>   type: preonly
>>>>>   maximum iterations=10000, initial guess is zero
>>>>>   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>>>>   left preconditioning
>>>>>   using NONE norm type for convergence test
>>>>> PC Object: 4 MPI processes
>>>>>   type: cholesky
>>>>>     Cholesky: out-of-place factorization
>>>>>     tolerance for zero pivot 2.22045e-14
>>>>>     matrix ordering: natural
>>>>>     factor fill ratio given 0., needed 0.
>>>>>       Factored matrix follows:
>>>>>         Mat Object:         4 MPI processes
>>>>>           type: mpiaij
>>>>>           rows=22878, cols=22878
>>>>>           package used to perform factorization: mumps
>>>>>           total: nonzeros=3361617, allocated nonzeros=3361617
>>>>>           total number of mallocs used during MatSetValues calls =0
>>>>>             MUMPS run parameters:
>>>>>               SYM (matrix type):                   2
>>>>>               PAR (host participation):            1
>>>>>               ICNTL(1) (output for error):         6
>>>>>               ICNTL(2) (output of diagnostic msg): 0
>>>>>               ICNTL(3) (output for global info):   0
>>>>>               ICNTL(4) (level of printing):        0
>>>>>               ICNTL(5) (input mat struct):         0
>>>>>               ICNTL(6) (matrix prescaling):        7
>>>>>               ICNTL(7) (sequentia matrix ordering):7
>>>>>               ICNTL(8) (scalling strategy):        77
>>>>>               ICNTL(10) (max num of refinements):  0
>>>>>               ICNTL(11) (error analysis):          0
>>>>>               ICNTL(12) (efficiency control):
>>>>> 0
>>>>>               ICNTL(13) (efficiency control):
>>>>> 0
>>>>>               ICNTL(14) (percentage of estimated workspace increase):
>>>>> 20
>>>>>               ICNTL(18) (input mat struct):
>>>>> 3
>>>>>               ICNTL(19) (Shur complement info):
>>>>> 0
>>>>>               ICNTL(20) (rhs sparse pattern):
>>>>> 0
>>>>>               ICNTL(21) (solution struct):
>>>>>  1
>>>>>               ICNTL(22) (in-core/out-of-core facility):
>>>>> 0
>>>>>               ICNTL(23) (max size of memory can be allocated
>>>>> locally):0
>>>>>               ICNTL(24) (detection of null pivot rows):
>>>>> 0
>>>>>               ICNTL(25) (computation of a null space basis):
>>>>>  0
>>>>>               ICNTL(26) (Schur options for rhs or solution):
>>>>>  0
>>>>>               ICNTL(27) (experimental parameter):
>>>>> -24
>>>>>               ICNTL(28) (use parallel or sequential ordering):
>>>>>  1
>>>>>               ICNTL(29) (parallel ordering):
>>>>>  0
>>>>>               ICNTL(30) (user-specified set of entries in inv(A)):
>>>>>  0
>>>>>               ICNTL(31) (factors is discarded in the solve phase):
>>>>>  0
>>>>>               ICNTL(33) (compute determinant):
>>>>>  0
>>>>>               CNTL(1) (relative pivoting threshold):      0.01
>>>>>               CNTL(2) (stopping criterion of refinement): 1.49012e-08
>>>>>               CNTL(3) (absolute pivoting threshold):      0.
>>>>>               CNTL(4) (value of static pivoting):         -1.
>>>>>               CNTL(5) (fixation for null pivots):         0.
>>>>>               RINFO(1) (local estimated flops for the elimination
>>>>> after analysis):
>>>>>                 [0] 1.84947e+08
>>>>>                 [1] 2.42065e+08
>>>>>                 [2] 2.53044e+08
>>>>>                 [3] 2.18441e+08
>>>>>               RINFO(2) (local estimated flops for the assembly after
>>>>> factorization):
>>>>>                 [0]  945938.
>>>>>                 [1]  906795.
>>>>>                 [2]  897815.
>>>>>                 [3]  998840.
>>>>>               RINFO(3) (local estimated flops for the elimination
>>>>> after factorization):
>>>>>                 [0]  1.59835e+08
>>>>>                 [1]  1.50867e+08
>>>>>                 [2]  2.27932e+08
>>>>>                 [3]  1.52037e+08
>>>>>               INFO(15) (estimated size of (in MB) MUMPS internal data
>>>>> for running numerical factorization):
>>>>>               [0] 36
>>>>>               [1] 37
>>>>>               [2] 38
>>>>>               [3] 39
>>>>>               INFO(16) (size of (in MB) MUMPS internal data used
>>>>> during numerical factorization):
>>>>>                 [0] 36
>>>>>                 [1] 37
>>>>>                 [2] 38
>>>>>                 [3] 39
>>>>>               INFO(23) (num of pivots eliminated on this processor
>>>>> after factorization):
>>>>>                 [0] 6450
>>>>>                 [1] 5442
>>>>>                 [2] 4386
>>>>>                 [3] 5526
>>>>>               RINFOG(1) (global estimated flops for the elimination
>>>>> after analysis): 8.98497e+08
>>>>>               RINFOG(2) (global estimated flops for the assembly after
>>>>> factorization): 3.74939e+06
>>>>>               RINFOG(3) (global estimated flops for the elimination
>>>>> after factorization): 6.9067e+08
>>>>>               (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant):
>>>>> (0.,0.)*(2^0)
>>>>>               INFOG(3) (estimated real workspace for factors on all
>>>>> processors after analysis): 4082184
>>>>>               INFOG(4) (estimated integer workspace for factors on all
>>>>> processors after analysis): 231846
>>>>>               INFOG(5) (estimated maximum front size in the complete
>>>>> tree): 678
>>>>>               INFOG(6) (number of nodes in the complete tree): 1380
>>>>>               INFOG(7) (ordering option effectively use after
>>>>> analysis): 5
>>>>>               INFOG(8) (structural symmetry in percent of the permuted
>>>>> matrix after analysis): 100
>>>>>               INFOG(9) (total real/complex workspace to store the
>>>>> matrix factors after factorization): 3521904
>>>>>               INFOG(10) (total integer space store the matrix factors
>>>>> after factorization): 229416
>>>>>               INFOG(11) (order of largest frontal matrix after
>>>>> factorization): 678
>>>>>               INFOG(12) (number of off-diagonal pivots): 0
>>>>>               INFOG(13) (number of delayed pivots after
>>>>> factorization): 0
>>>>>               INFOG(14) (number of memory compress after
>>>>> factorization): 0
>>>>>               INFOG(15) (number of steps of iterative refinement after
>>>>> solution): 0
>>>>>               INFOG(16) (estimated size (in MB) of all MUMPS internal
>>>>> data for factorization after analysis: value on the most memory consuming
>>>>> processor): 39
>>>>>               INFOG(17) (estimated size of all MUMPS internal data for
>>>>> factorization after analysis: sum over all processors): 150
>>>>>               INFOG(18) (size of all MUMPS internal data allocated
>>>>> during factorization: value on the most memory consuming processor): 39
>>>>>               INFOG(19) (size of all MUMPS internal data allocated
>>>>> during factorization: sum over all processors): 150
>>>>>               INFOG(20) (estimated number of entries in the factors):
>>>>> 3361617
>>>>>               INFOG(21) (size in MB of memory effectively used during
>>>>> factorization - value on the most memory consuming processor): 35
>>>>>               INFOG(22) (size in MB of memory effectively used during
>>>>> factorization - sum over all processors): 136
>>>>>               INFOG(23) (after analysis: value of ICNTL(6) effectively
>>>>> used): 0
>>>>>               INFOG(24) (after analysis: value of ICNTL(12)
>>>>> effectively used): 1
>>>>>               INFOG(25) (after factorization: number of pivots
>>>>> modified by static pivoting): 0
>>>>>               INFOG(28) (after factorization: number of null pivots
>>>>> encountered): 0
>>>>>               INFOG(29) (after factorization: effective number of
>>>>> entries in the factors (sum over all processors)): 2931438
>>>>>               INFOG(30, 31) (after solution: size in Mbytes of memory
>>>>> used during solution phase): 0, 0
>>>>>               INFOG(32) (after analysis: type of analysis done): 1
>>>>>               INFOG(33) (value used for ICNTL(8)): 7
>>>>>               INFOG(34) (exponent of the determinant if determinant is
>>>>> requested): 0
>>>>>   linear system matrix = precond matrix:
>>>>>   Mat Object:   4 MPI processes
>>>>>     type: mpiaij
>>>>>     rows=22878, cols=22878
>>>>>     total: nonzeros=1219140, allocated nonzeros=1219140
>>>>>     total number of mallocs used during MatSetValues calls =0
>>>>>       using I-node (on process 0) routines: found 1889 nodes, limit
>>>>> used is 5
>>>>> converged reason: -11
>>>>> KSP Object: 4 MPI processes
>>>>>   type: preonly
>>>>>   maximum iterations=10000, initial guess is zero
>>>>>   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>>>>   left preconditioning
>>>>>   using NONE norm type for convergence test
>>>>> PC Object: 4 MPI processes
>>>>>   type: cholesky
>>>>>     Cholesky: out-of-place factorization
>>>>>     tolerance for zero pivot 2.22045e-14
>>>>>     matrix ordering: natural
>>>>>     factor fill ratio given 0., needed 0.
>>>>>       Factored matrix follows:
>>>>>         Mat Object:         4 MPI processes
>>>>>           type: mpiaij
>>>>>           rows=22878, cols=22878
>>>>>           package used to perform factorization: mumps
>>>>>           total: nonzeros=3361617, allocated nonzeros=3361617
>>>>>           total number of mallocs used during MatSetValues calls =0
>>>>>             MUMPS run parameters:
>>>>>               SYM (matrix type):                   2
>>>>>               PAR (host participation):            1
>>>>>               ICNTL(1) (output for error):         6
>>>>>               ICNTL(2) (output of diagnostic msg): 0
>>>>>               ICNTL(3) (output for global info):   0
>>>>>               ICNTL(4) (level of printing):        0
>>>>>               ICNTL(5) (input mat struct):         0
>>>>>               ICNTL(6) (matrix prescaling):        7
>>>>>               ICNTL(7) (sequentia matrix ordering):7
>>>>>               ICNTL(8) (scalling strategy):        77
>>>>>               ICNTL(10) (max num of refinements):  0
>>>>>               ICNTL(11) (error analysis):          0
>>>>>               ICNTL(12) (efficiency control):
>>>>> 0
>>>>>               ICNTL(13) (efficiency control):
>>>>> 0
>>>>>               ICNTL(14) (percentage of estimated workspace increase):
>>>>> 30
>>>>>               ICNTL(18) (input mat struct):
>>>>> 3
>>>>>               ICNTL(19) (Shur complement info):
>>>>> 0
>>>>>               ICNTL(20) (rhs sparse pattern):
>>>>> 0
>>>>>               ICNTL(21) (solution struct):
>>>>>  1
>>>>>               ICNTL(22) (in-core/out-of-core facility):
>>>>> 0
>>>>>               ICNTL(23) (max size of memory can be allocated
>>>>> locally):0
>>>>>               ICNTL(24) (detection of null pivot rows):
>>>>> 0
>>>>>               ICNTL(25) (computation of a null space basis):
>>>>>  0
>>>>>               ICNTL(26) (Schur options for rhs or solution):
>>>>>  0
>>>>>               ICNTL(27) (experimental parameter):
>>>>> -24
>>>>>               ICNTL(28) (use parallel or sequential ordering):
>>>>>  1
>>>>>               ICNTL(29) (parallel ordering):
>>>>>  0
>>>>>               ICNTL(30) (user-specified set of entries in inv(A)):
>>>>>  0
>>>>>               ICNTL(31) (factors is discarded in the solve phase):
>>>>>  0
>>>>>               ICNTL(33) (compute determinant):
>>>>>  0
>>>>>               CNTL(1) (relative pivoting threshold):      0.01
>>>>>               CNTL(2) (stopping criterion of refinement): 1.49012e-08
>>>>>               CNTL(3) (absolute pivoting threshold):      0.
>>>>>               CNTL(4) (value of static pivoting):         -1.
>>>>>               CNTL(5) (fixation for null pivots):         0.
>>>>>               RINFO(1) (local estimated flops for the elimination
>>>>> after analysis):
>>>>>                 [0] 1.84947e+08
>>>>>                 [1] 2.42065e+08
>>>>>                 [2] 2.53044e+08
>>>>>                 [3] 2.18441e+08
>>>>>               RINFO(2) (local estimated flops for the assembly after
>>>>> factorization):
>>>>>                 [0]  945938.
>>>>>                 [1]  906795.
>>>>>                 [2]  897815.
>>>>>                 [3]  998840.
>>>>>               RINFO(3) (local estimated flops for the elimination
>>>>> after factorization):
>>>>>                 [0]  1.59835e+08
>>>>>                 [1]  1.50867e+08
>>>>>                 [2]  2.27932e+08
>>>>>                 [3]  1.52037e+08
>>>>>               INFO(15) (estimated size of (in MB) MUMPS internal data
>>>>> for running numerical factorization):
>>>>>               [0] 36
>>>>>               [1] 37
>>>>>               [2] 38
>>>>>               [3] 39
>>>>>               INFO(16) (size of (in MB) MUMPS internal data used
>>>>> during numerical factorization):
>>>>>                 [0] 36
>>>>>                 [1] 37
>>>>>                 [2] 38
>>>>>                 [3] 39
>>>>>               INFO(23) (num of pivots eliminated on this processor
>>>>> after factorization):
>>>>>                 [0] 6450
>>>>>                 [1] 5442
>>>>>                 [2] 4386
>>>>>                 [3] 5526
>>>>>               RINFOG(1) (global estimated flops for the elimination
>>>>> after analysis): 8.98497e+08
>>>>>               RINFOG(2) (global estimated flops for the assembly after
>>>>> factorization): 3.74939e+06
>>>>>               RINFOG(3) (global estimated flops for the elimination
>>>>> after factorization): 6.9067e+08
>>>>>               (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant):
>>>>> (0.,0.)*(2^0)
>>>>>               INFOG(3) (estimated real workspace for factors on all
>>>>> processors after analysis): 4082184
>>>>>               INFOG(4) (estimated integer workspace for factors on all
>>>>> processors after analysis): 231846
>>>>>               INFOG(5) (estimated maximum front size in the complete
>>>>> tree): 678
>>>>>               INFOG(6) (number of nodes in the complete tree): 1380
>>>>>               INFOG(7) (ordering option effectively use after
>>>>> analysis): 5
>>>>>               INFOG(8) (structural symmetry in percent of the permuted
>>>>> matrix after analysis): 100
>>>>>               INFOG(9) (total real/complex workspace to store the
>>>>> matrix factors after factorization): 3521904
>>>>>               INFOG(10) (total integer space store the matrix factors
>>>>> after factorization): 229416
>>>>>               INFOG(11) (order of largest frontal matrix after
>>>>> factorization): 678
>>>>>               INFOG(12) (number of off-diagonal pivots): 0
>>>>>               INFOG(13) (number of delayed pivots after
>>>>> factorization): 0
>>>>>               INFOG(14) (number of memory compress after
>>>>> factorization): 0
>>>>>               INFOG(15) (number of steps of iterative refinement after
>>>>> solution): 0
>>>>>               INFOG(16) (estimated size (in MB) of all MUMPS internal
>>>>> data for factorization after analysis: value on the most memory consuming
>>>>> processor): 39
>>>>>               INFOG(17) (estimated size of all MUMPS internal data for
>>>>> factorization after analysis: sum over all processors): 150
>>>>>               INFOG(18) (size of all MUMPS internal data allocated
>>>>> during factorization: value on the most memory consuming processor): 39
>>>>>               INFOG(19) (size of all MUMPS internal data allocated
>>>>> during factorization: sum over all processors): 150
>>>>>               INFOG(20) (estimated number of entries in the factors):
>>>>> 3361617
>>>>>               INFOG(21) (size in MB of memory effectively used during
>>>>> factorization - value on the most memory consuming processor): 35
>>>>>               INFOG(22) (size in MB of memory effectively used during
>>>>> factorization - sum over all processors): 136
>>>>>               INFOG(23) (after analysis: value of ICNTL(6) effectively
>>>>> used): 0
>>>>>               INFOG(24) (after analysis: value of ICNTL(12)
>>>>> effectively used): 1
>>>>>               INFOG(25) (after factorization: number of pivots
>>>>> modified by static pivoting): 0
>>>>>               INFOG(28) (after factorization: number of null pivots
>>>>> encountered): 0
>>>>>               INFOG(29) (after factorization: effective number of
>>>>> entries in the factors (sum over all processors)): 2931438
>>>>>               INFOG(30, 31) (after solution: size in Mbytes of memory
>>>>> used during solution phase): 0, 0
>>>>>               INFOG(32) (after analysis: type of analysis done): 1
>>>>>               INFOG(33) (value used for ICNTL(8)): 7
>>>>>               INFOG(34) (exponent of the determinant if determinant is
>>>>> requested): 0
>>>>>   linear system matrix = precond matrix:
>>>>>   Mat Object:   4 MPI processes
>>>>>     type: mpiaij
>>>>>     rows=22878, cols=22878
>>>>>     total: nonzeros=1219140, allocated nonzeros=1219140
>>>>>     total number of mallocs used during MatSetValues calls =0
>>>>>       using I-node (on process 0) routines: found 1889 nodes, limit
>>>>> used is 5
>>>>> converged reason: -11
>>>>>
>>>>> ------------------------------------------------------------
>>>>> -----------------------------------------
>>>>>
>>>>
>>>>
>>
>

Re: [petsc-users] Issue updating MUMPS ictnl after failed solve

Reply via email to