Re: [petsc-users] CPR-AMG: SNES with two cores worse than with one

2017-07-07 Thread Robert Annewandter
Thank you Hong!

I've used GMRES via

mpirun \
  -n ${NP} pflotran \
  -pflotranin ${INPUTFILE}.pflinput \
  -flow_ksp_type gmres \
  -flow_pc_type bjacobi \
  -flow_sub_pc_type lu \
  -flow_sub_pc_factor_nonzeros_along_diagonal \
  -snes_monitor

and get:

NP 1

FLOW TS BE steps = 43 newton = 43 linear = 43 cuts = 0
FLOW TS BE Wasted Linear Iterations = 0
FLOW TS BE SNES time = 197.0 seconds

NP 2

FLOW TS BE steps = 43 newton = 43 linear = 770 cuts = 0
FLOW TS BE Wasted Linear Iterations = 0
FLOW TS BE SNES time = 68.7 seconds

Which looks ok to me.

Robert



On 07/07/17 15:49, h...@aspiritech.org wrote:
> What do you get with '-ksp_type gmres' or '-ksp_type bcgs' in parallel
> runs?
> Hong
>
> On Fri, Jul 7, 2017 at 6:05 AM, Robert Annewandter
> <robert.annewand...@opengosim.com> wrote:
>
> Yes indeed, PFLOTRAN cuts the timestep after 8 failed SNES iterations.
>
> I've rerun with -snes_monitor (attached with canonical suffix);
> the -pc_type is always PCBJACOBI + PCLU (though we'd like to try
> SUPERLU in the future, however it works only with -mat_type aij..)
>
>
> I did the sequential and parallel runs with
>
> -ksp_type preonly -pc_type lu
> -pc_factor_nonzeros_along_diagonal -snes_monitor
>
> and
>
> -ksp_type preonly -pc_type bjacobi -sub_pc_type lu
> -sub_pc_factor_nonzeros_along_diagonal -snes_monitor
>
> As expected, the two sequential runs are identical, and the parallel
> run takes half the time of the sequential one.
>
>
>
>
> On 07/07/17 01:20, Barry Smith wrote:
>>Looks like PFLOTRAN has a maximum number of SNES iterations of 8 and
>> cuts the timestep if that fails.
>>
>>Please run with -snes_monitor; I don't understand the strange, densely
>> packed information that PFLOTRAN is printing.
>>
>>It looks like the linear solver is converging fine in parallel, so
>> normally there is absolutely no reason that the Newton should behave
>> differently on 2 processors than on 1, unless there is something wrong
>> with the Jacobian. What is the -pc_type for the two cases, LU or your
>> fancy thing?
>>
>>Please run sequential and parallel with -pc_type lu and also with 
>> -snes_monitor.  We need to fix all the knobs but one in order to understand 
>> what is going on.
>>
>>
>>Barry
>>
>>
>>   
>>> On Jul 6, 2017, at 5:11 PM, Robert Annewandter
>>> <robert.annewand...@opengosim.com> wrote:
>>>
>>> Thanks Barry!
>>>
>>> I've attached log files for np = 1 (SNES time: 218 s) and np = 2 (SNES 
>>> time: 600 s). PFLOTRAN final output:
>>>
>>> NP 1
>>>
>>> FLOW TS BE steps = 43 newton = 43 linear = 43 cuts = 0
>>> FLOW TS BE Wasted Linear Iterations = 0
>>> FLOW TS BE SNES time = 218.9 seconds
>>>
>>> NP 2
>>>
>>> FLOW TS BE steps = 67 newton = 176 linear = 314 cuts = 13
>>> FLOW TS BE Wasted Linear Iterations = 208
>>> FLOW TS BE SNES time = 600.0 seconds
>>>
>>>
>>> Robert
>>>
>>> On 06/07/17 21:24, Barry Smith wrote:
>>>>So on one process the outer linear solver takes a single iteration;
>>>> this is because block Jacobi with LU and one block is a direct solver.
>>>>
>>>>
>>>>> 11 KSP preconditioned resid norm 1.131868956745e+00 true resid 
>>>>> norm 1.526261825526e-05 ||r(i)||/||b|| 1.485509868409e-05
>>>>> [0] KSPConvergedDefault(): Linear solver has converged. Residual norm 
>>>>> 2.148515820410e-14 is less than relative tolerance 1.e-07 
>>>>> times initial right hand side norm 1.581814306485e-02 at iteration 1
>>>>> 1 KSP unpreconditioned resid norm 2.148515820410e-14 true resid 
>>>>> norm 2.148698024622e-14 ||r(i)||/||b|| 1.358375642332e-12
>>>>>
>>>>On two processes the outer linear solver takes a few iterations to
>>>> solve; this is to be expected.
>>>>
>>>>But what you sent doesn't give any indication about SNES not
>>>> converging. Please turn off all inner linear solver monitoring and just
>>>> run with -ksp_monitor_true_residual -snes_monitor -snes_linesearch_monitor

[petsc-users] CPR-AMG: SNES with two cores worse than with one

2017-07-06 Thread Robert Annewandter
Hi all,

I'd like to understand why the SNES of my CPR-AMG Two-Stage Preconditioner
(KSPFGMRES with a multiplicative PCComposite: first stage PCGalerkin with
KSPGMRES + BoomerAMG, second stage PCBJacobi + PCLU, on a 24,000 x 24,000
matrix) struggles to converge when using two cores instead of one. Because
of the adaptive time stepping around the Newton solve, this leads to severe
time step cuts.

This is how I run it with two cores

mpirun \
  -n 2 pflotran \
  -pflotranin het.pflinput \
  -ksp_monitor_true_residual \
  -flow_snes_view \
  -flow_snes_converged_reason \
  -flow_sub_1_pc_type bjacobi \
  -flow_sub_1_sub_pc_type lu \
  -flow_sub_1_sub_pc_factor_pivot_in_blocks true \
  -flow_sub_1_sub_pc_factor_nonzeros_along_diagonal \
  -options_left \
  -log_summary \
  -info


With one core I get (after grepping the crap away from -info):

 Step 32 Time=  1.8E+01

[...]

  0 2r: 1.58E-02 2x: 0.00E+00 2u: 0.00E+00 ir: 7.18E-03 iu: 0.00E+00
rsn:   0
[0] SNESComputeJacobian(): Rebuilding preconditioner
Residual norms for flow_ solve.
0 KSP unpreconditioned resid norm 1.581814306485e-02 true resid norm
1.581814306485e-02 ||r(i)||/||b|| 1.e+00
  Residual norms for flow_sub_0_galerkin_ solve.
  0 KSP preconditioned resid norm 5.697603110484e+07 true resid norm
5.175721849125e+03 ||r(i)||/||b|| 5.037527476892e+03
  1 KSP preconditioned resid norm 5.041509073319e+06 true resid norm
3.251596928176e+02 ||r(i)||/||b|| 3.164777657484e+02
  2 KSP preconditioned resid norm 1.043761838360e+06 true resid norm
8.957519558348e+01 ||r(i)||/||b|| 8.718349288342e+01
  3 KSP preconditioned resid norm 1.129189815646e+05 true resid norm
2.722436912053e+00 ||r(i)||/||b|| 2.649746479496e+00
  4 KSP preconditioned resid norm 8.829637298082e+04 true resid norm
8.026373593492e+00 ||r(i)||/||b|| 7.812065388300e+00
  5 KSP preconditioned resid norm 6.506021637694e+04 true resid norm
3.479889319880e+00 ||r(i)||/||b|| 3.386974527698e+00
  6 KSP preconditioned resid norm 6.392263200180e+04 true resid norm
3.819202631980e+00 ||r(i)||/||b|| 3.717228003987e+00
  7 KSP preconditioned resid norm 2.464946645480e+04 true resid norm
7.329964753388e-01 ||r(i)||/||b|| 7.134251013911e-01
  8 KSP preconditioned resid norm 2.603879153772e+03 true resid norm
2.035525412004e-02 ||r(i)||/||b|| 1.981175861414e-02
  9 KSP preconditioned resid norm 1.774410462754e+02 true resid norm
3.001214973121e-03 ||r(i)||/||b|| 2.921081026352e-03
10 KSP preconditioned resid norm 1.664227038378e+01 true resid norm
3.413136309181e-04 ||r(i)||/||b|| 3.322003855903e-04
[0] KSPConvergedDefault(): Linear solver has converged. Residual norm
1.131868956745e+00 is less than relative tolerance 1.e-07
times initial right hand side norm 2.067297386780e+07 at iteration 11
11 KSP preconditioned resid norm 1.131868956745e+00 true resid norm
1.526261825526e-05 ||r(i)||/||b|| 1.485509868409e-05
[0] KSPConvergedDefault(): Linear solver has converged. Residual norm
2.148515820410e-14 is less than relative tolerance 1.e-07
times initial right hand side norm 1.581814306485e-02 at iteration 1
1 KSP unpreconditioned resid norm 2.148515820410e-14 true resid norm
2.148698024622e-14 ||r(i)||/||b|| 1.358375642332e-12
[0] SNESSolve_NEWTONLS(): iter=0, linear solve iterations=1
[0] SNESNEWTONLSCheckResidual_Private(): ||J^T(F-Ax)||/||F-AX||
3.590873180642e-01 near zero implies inconsistent rhs
[0] SNESSolve_NEWTONLS(): fnorm=1.5818143064846742e-02,
gnorm=1.0695649833687331e-02, ynorm=4.6826522561266171e+02, lssucceed=0
[0] SNESConvergedDefault(): Converged due to small update length:
4.682652256127e+02 < 1.e-05 * 3.702480426117e+09
  1 2r: 1.07E-02 2x: 3.70E+09 2u: 4.68E+02 ir: 5.05E-03 iu: 4.77E+01
rsn: stol
Nonlinear flow_ solve converged due to CONVERGED_SNORM_RELATIVE iterations 1



But with two cores I get:


 Step 32 Time=  1.8E+01

[...]

  0 2r: 6.16E-03 2x: 0.00E+00 2u: 0.00E+00 ir: 3.63E-03 iu: 0.00E+00
rsn:   0
[0] SNESComputeJacobian(): Rebuilding preconditioner

Residual norms for flow_ solve.
0 KSP unpreconditioned resid norm 6.162760088924e-03 true resid norm
6.162760088924e-03 ||r(i)||/||b|| 1.e+00
  Residual norms for flow_sub_0_galerkin_ solve.
  0 KSP preconditioned resid norm 8.994949630499e+08 true resid norm
7.982144380936e-01 ||r(i)||/||b|| 1.e+00
  1 KSP preconditioned resid norm 8.950556502615e+08 true resid norm
1.550138696155e+00 ||r(i)||/||b|| 1.942007839218e+00
  2 KSP preconditioned resid norm 1.044849684205e+08 true resid norm
2.166193480531e+00 ||r(i)||/||b|| 2.713798920631e+00
  3 KSP preconditioned resid norm 8.209708619718e+06 true resid norm
3.076045005154e-01 ||r(i)||/||b|| 3.853657436340e-01
  4 KSP preconditioned resid norm 3.027461352422e+05 true resid norm
1.207731865714e-02 ||r(i)||/||b|| 1.513041869549e-02
  5 KSP preconditioned resid norm 1.595302164817e+04 true resid norm
4.123713694368e-04 

Re: [petsc-users] Hypre BoomerAMG has slow convergence

2017-07-04 Thread Robert Annewandter
Thank you Barry!



I believe the sub problem can be singular, because the first
preconditioner M1 in the CPR-AMG preconditioner

Mcpr^(-1) = M2^(-1) [ I - J M1^(-1) ] + M1^(-1),

where

M1^(-1) = C [ W^T J C ]^(-1) W^T,

has prolongation C and restriction W^T of size 3N x N resp. N x 3N
(where 3 is the number of primary variables). Since M1^(-1) factors
through the N-dimensional Galerkin subspace, it has rank at most N on
the full 3N-dimensional space and is therefore singular by construction.
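
Spelled out as one application to a residual r (same notation as above),
Mcpr^(-1) r is computed in two stages:

z1 = M1^(-1) r
z  = z1 + M2^(-1) ( r - J z1 )

so the second stage only sees what the first (AMG) stage left behind.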

Would that make sense?





On 04/07/17 18:24, Barry Smith wrote:
>The large preconditioned norm relative to the true residual norm is often 
> a sign that the preconditioner is not happy.
>
>   0 KSP preconditioned resid norm 2.495457360562e+08 true resid norm 
> 9.213492769259e-01 ||r(i)||/||b|| 1.e+00
>
> Is there any chance this subproblem is singular? 
>
>Run with -flow_sub_0_galerkin_ksp_view_mat binary 
> -flow_sub_0_galerkin_ksp_view_rhs binary for 18 stages. This will save the 
> matrices and the right hand sides for the 18 systems passed to hypre in a 
> file called binaryoutput. Email the file to petsc-ma...@mcs.anl.gov 
>
>Barry
>
>
>> On Jul 4, 2017, at 11:59 AM, Robert Annewandter 
>> <robert.annewand...@opengosim.com> wrote:
>>
>> Hi all,
>>
>>
>> I'm working on a CPR-AMG Two-Stage preconditioner implemented as 
>> multiplicative PCComposite with outer FGMRES, where the first PC is Hypre 
>> AMG (PCGalerkin + KSPRichardson + PCHYPRE) and the second stage is Block 
>> Jacobi with LU. The PDEs describe two-phase subsurface flow, and I kept the
>> problem small at 8000 x 8000 dofs.
>>
>> The first stage is hard-wired because of the PCGalerkin part and the second 
>> stage Block Jacobi is configured via command line (with pflotran prefix 
>> flow_):
>>
>>   -flow_sub_1_pc_type bjacobi \
>>   -flow_sub_1_sub_pc_type lu \
>>
>> With this configuration I occasionally see that Hypre struggles to
>> converge quickly:
>>
>>
>> Step 16
>>
>> 0 2r: 3.95E-03 2x: 0.00E+00 2u: 0.00E+00 ir: 2.53E-03 iu: 0.00E+00 rsn:  
>>  0
>> Residual norms for flow_ solve.
>> 0 KSP unpreconditioned resid norm 3.945216988332e-03 true resid norm 
>> 3.945216988332e-03 ||r(i)||/||b|| 1.e+00
>> Residual norms for flow_sub_0_galerkin_ solve.
>> 0 KSP preconditioned resid norm 2.495457360562e+08 true resid norm 
>> 9.213492769259e-01 ||r(i)||/||b|| 1.e+00
>> 1 KSP preconditioned resid norm 3.900401635809e+07 true resid norm 
>> 1.211813734614e-01 ||r(i)||/||b|| 1.315259874797e-01
>> 2 KSP preconditioned resid norm 7.264015944695e+06 true resid norm 
>> 2.127154159346e-02 ||r(i)||/||b|| 2.308738078618e-02
>> 3 KSP preconditioned resid norm 1.523934370189e+06 true resid norm 
>> 4.50720434e-03 ||r(i)||/||b|| 4.891961172285e-03
>> 4 KSP preconditioned resid norm 3.456355485206e+05 true resid norm 
>> 1.017486337883e-03 ||r(i)||/||b|| 1.104343774250e-03
>> 5 KSP preconditioned resid norm 8.215494701640e+04 true resid norm 
>> 2.386758602821e-04 ||r(i)||/||b|| 2.590503582729e-04
>> 6 KSP preconditioned resid norm 2.006221595869e+04 true resid norm 
>> 5.806707975375e-05 ||r(i)||/||b|| 6.302395975986e-05
>> 7 KSP preconditioned resid norm 4.975749682114e+03 true resid norm 
>> 1.457831681999e-05 ||r(i)||/||b|| 1.582279075383e-05
>> 8 KSP preconditioned resid norm 1.245359749620e+03 true resid norm 
>> 3.746721600730e-06 ||r(i)||/||b|| 4.066559441204e-06
>> 9 KSP preconditioned resid norm 3.134373137075e+02 true resid norm 
>> 9.784665277082e-07 ||r(i)||/||b|| 1.061993048904e-06
>>10 KSP preconditioned resid norm 7.917076489741e+01 true resid norm 
>> 2.582765351245e-07 ||r(i)||/||b|| 2.803242392356e-07
>>11 KSP preconditioned resid norm 2.004702594193e+01 true resid norm 
>> 6.867609287185e-08 ||r(i)||/||b|| 7.453860831257e-08
>> 1 KSP unpreconditioned resid norm 3.022346103074e-11 true resid norm 
>> 3.022346103592e-11 ||r(i)||/||b|| 7.660785484121e-09
>>   1 2r: 2.87E-04 2x: 3.70E+09 2u: 3.36E+02 ir: 1.67E-04 iu: 2.19E+01 rsn: 
>> stol
>> Nonlinear flow_ solve converged due to CONVERGED_SNORM_RELATIVE iterations 1
>>
>>
>> Step 17
>>
>>   0 2r: 3.85E-03 2x: 0.00E+00 2u: 0.00E+00 ir: 2.69E-03 iu: 0.00E+00 rsn:   0
>> Residual norms for flow_ solve.
>> 0 KSP unpreconditioned resid norm 3.846677237838e-03 true resid norm 
>> 3.846677237838e-03 ||r(i)||/||b|| 1.e+00
>> Residual norms for flow_sub_0_galerkin_ solve.
>> 0 KSP preconditioned resid norm 8.359592959751e+07 true resid norm 
>> 8.919381920269e-01 ||r(i)||/||b|| 1.e+00
>> 1 KSP preconditioned resid norm 2.04647

[petsc-users] Hypre BoomerAMG has slow convergence

2017-07-04 Thread Robert Annewandter
Hi all,


I'm working on a CPR-AMG Two-Stage preconditioner implemented as
multiplicative PCComposite with outer FGMRES, where the first PC is
Hypre AMG (PCGalerkin + KSPRichardson + PCHYPRE) and the second stage is
Block Jacobi with LU. The PDEs describe two-phase subsurface flow, and
I kept the problem small at 8000 x 8000 dofs.

The first stage is hard-wired because of the PCGalerkin part and the
second stage Block Jacobi is configured via command line (with pflotran
prefix flow_):

  -flow_sub_1_pc_type bjacobi \
  -flow_sub_1_sub_pc_type lu \

With this configuration I occasionally see that Hypre struggles to
converge quickly:


Step 16

0 2r: 3.95E-03 2x: 0.00E+00 2u: 0.00E+00 ir: 2.53E-03 iu: 0.00E+00
rsn:   0
Residual norms for flow_ solve.
0 KSP unpreconditioned resid norm 3.945216988332e-03 true resid norm
3.945216988332e-03 ||r(i)||/||b|| 1.e+00
Residual norms for flow_sub_0_galerkin_ solve.
0 KSP preconditioned resid norm 2.495457360562e+08 true resid norm
9.213492769259e-01 ||r(i)||/||b|| 1.e+00
1 KSP preconditioned resid norm 3.900401635809e+07 true resid norm
1.211813734614e-01 ||r(i)||/||b|| 1.315259874797e-01
2 KSP preconditioned resid norm 7.264015944695e+06 true resid norm
2.127154159346e-02 ||r(i)||/||b|| 2.308738078618e-02
3 KSP preconditioned resid norm 1.523934370189e+06 true resid norm
4.50720434e-03 ||r(i)||/||b|| 4.891961172285e-03
4 KSP preconditioned resid norm 3.456355485206e+05 true resid norm
1.017486337883e-03 ||r(i)||/||b|| 1.104343774250e-03
5 KSP preconditioned resid norm 8.215494701640e+04 true resid norm
2.386758602821e-04 ||r(i)||/||b|| 2.590503582729e-04
6 KSP preconditioned resid norm 2.006221595869e+04 true resid norm
5.806707975375e-05 ||r(i)||/||b|| 6.302395975986e-05
7 KSP preconditioned resid norm 4.975749682114e+03 true resid norm
1.457831681999e-05 ||r(i)||/||b|| 1.582279075383e-05
8 KSP preconditioned resid norm 1.245359749620e+03 true resid norm
3.746721600730e-06 ||r(i)||/||b|| 4.066559441204e-06
9 KSP preconditioned resid norm 3.134373137075e+02 true resid norm
9.784665277082e-07 ||r(i)||/||b|| 1.061993048904e-06
   10 KSP preconditioned resid norm 7.917076489741e+01 true resid norm
2.582765351245e-07 ||r(i)||/||b|| 2.803242392356e-07
   11 KSP preconditioned resid norm 2.004702594193e+01 true resid norm
6.867609287185e-08 ||r(i)||/||b|| 7.453860831257e-08
1 KSP unpreconditioned resid norm 3.022346103074e-11 true resid norm
3.022346103592e-11 ||r(i)||/||b|| 7.660785484121e-09
  1 2r: 2.87E-04 2x: 3.70E+09 2u: 3.36E+02 ir: 1.67E-04 iu: 2.19E+01
rsn: stol
Nonlinear flow_ solve converged due to CONVERGED_SNORM_RELATIVE iterations 1


Step 17

  0 2r: 3.85E-03 2x: 0.00E+00 2u: 0.00E+00 ir: 2.69E-03 iu: 0.00E+00
rsn:   0
Residual norms for flow_ solve.
0 KSP unpreconditioned resid norm 3.846677237838e-03 true resid norm
3.846677237838e-03 ||r(i)||/||b|| 1.e+00
Residual norms for flow_sub_0_galerkin_ solve.
0 KSP preconditioned resid norm 8.359592959751e+07 true resid norm
8.919381920269e-01 ||r(i)||/||b|| 1.e+00
1 KSP preconditioned resid norm 2.046474217608e+07 true resid norm
1.356172589724e+00 ||r(i)||/||b|| 1.520478214574e+00
2 KSP preconditioned resid norm 5.534610937223e+06 true resid norm
1.361527715124e+00 ||r(i)||/||b|| 1.526482134406e+00
3 KSP preconditioned resid norm 1.642592089665e+06 true resid norm
1.359990274368e+00 ||r(i)||/||b|| 1.524758426677e+00
4 KSP preconditioned resid norm 6.869446528993e+05 true resid norm
1.357740694885e+00 ||r(i)||/||b|| 1.522236301823e+00
5 KSP preconditioned resid norm 5.245968674991e+05 true resid norm
1.355364470917e+00 ||r(i)||/||b|| 1.519572189007e+00
6 KSP preconditioned resid norm 5.042030663187e+05 true resid norm
1.352962944308e+00 ||r(i)||/||b|| 1.516879708036e+00
7 KSP preconditioned resid norm 5.007302249221e+05 true resid norm
1.350558656878e+00 ||r(i)||/||b|| 1.514184131760e+00
8 KSP preconditioned resid norm 4.994105316949e+05 true resid norm
1.348156961110e+00 ||r(i)||/||b|| 1.511491461137e+00
9 KSP preconditioned resid norm 4.984373051647e+05 true resid norm
1.345759135434e+00 ||r(i)||/||b|| 1.508803129481e+00
   10 KSP preconditioned resid norm 4.975323739321e+05 true resid norm
1.343365479502e+00 ||r(i)||/||b|| 1.506119472750e+00
   11 KSP preconditioned resid norm 4.966432959339e+05 true resid norm
1.340976058673e+00 ||r(i)||/||b|| 1.503440564224e+00
[...]
  193 KSP preconditioned resid norm 3.591931201817e+05 true resid norm
9.698521332569e-01 ||r(i)||/||b|| 1.087353520599e+00
  194 KSP preconditioned resid norm 3.585542278288e+05 true resid norm
9.681270691497e-01 ||r(i)||/||b|| 1.085419458213e+00
  195 KSP preconditioned resid norm 3.579164717745e+05 true resid norm
9.664050733935e-01 ||r(i)||/||b|| 1.083488835922e+00
  196 KSP preconditioned resid norm 3.572798501551e+05 true resid norm
9.646861405301e-01 ||r(i)||/||b|| 

Re: [petsc-users] PCCOMPOSITE with PCBJACOBI

2017-06-28 Thread Robert Annewandter
Interesting! That would also fit into configuring PFLOTRAN via its input
decks (i.e., we could also offer ASM instead of Block Jacobi).

Thanks a lot!
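
For reference, a minimal sketch of embedding the options Barry lists below
via PetscOptionsSetValue() (an assumption on our side: a PETSc version where
PetscOptionsSetValue() takes the options database as its first argument,
with PETSC_NULL_OPTIONS selecting the global database, plus PFLOTRAN's
flow_ prefix):

call PetscOptionsSetValue(PETSC_NULL_OPTIONS, '-flow_pc_type', &
     'composite', ierr); CHKERRQ(ierr)
call PetscOptionsSetValue(PETSC_NULL_OPTIONS, '-flow_pc_composite_type', &
     'multiplicative', ierr); CHKERRQ(ierr)
call PetscOptionsSetValue(PETSC_NULL_OPTIONS, '-flow_pc_composite_pcs', &
     'galerkin,bjacobi', ierr); CHKERRQ(ierr)
! The options take effect once the solver reads the database:
call KSPSetFromOptions(solver%ksp, ierr); CHKERRQ(ierr)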


On 28/06/17 17:31, Barry Smith wrote:
>> On Jun 28, 2017, at 2:07 AM, Robert Annewandter 
>> <robert.annewand...@opengosim.com> wrote:
>>
>> Thank you Barry!
>>
>> We'd like to hard-wire it into PFLOTRAN, with CPR-AMG Block Jacobi Two-Stage
>> Preconditioning potentially becoming the standard solver strategy.
>Understood. Note that you can embed the options into the program with 
> PetscOptionsSetValue() so they don't need to be on the command line.
>
> Barry
>
>> Using the options database is a great way to start reverse-engineering the issue!
>>
>> Thanks!
>> Robert
>>
>>
>>
>>
>> On 27/06/17 23:45, Barry Smith wrote:
>>>It is difficult, if not impossible at times, to get all the options
>>> where you want them using the function call interface. On the other hand
>>> it is generally easy (if there are no inner PCSHELLs) to do this via the
>>> options database:
>>>
>>>-pc_type composite
>>>-pc_composite_type multiplicative
>>>-pc_composite_pcs galerkin,bjacobi
>>>
>>>-sub_0_galerkin_ksp_type preonly
>>>-sub_0_galerkin_pc_type none
>>>
>>>-sub_1_sub_pc_factor_shift_type inblocks
>>>-sub_1_sub_pc_factor_zero_pivot zpiv
>>>
>>>
>>>
>>>
>>>> On Jun 27, 2017, at 11:24 AM, Robert Annewandter 
>>>> <robert.annewand...@opengosim.com> wrote:
>>>>
>>>> Dear PETSc folks,
>>>>
>>>>
>>>> I want a Block Jacobi PC to be the second PC in a two-stage 
>>>> preconditioning scheme implemented via multiplicative PCCOMPOSITE, with 
>>>> the outermost KSP an FGMRES.
>>>>
>>>>
>>>> However, PCBJacobiGetSubKSP (
>>>> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCBJacobiGetSubKSP.html#PCBJacobiGetSubKSP
>>>> ) requires KSPSetUp (or PCSetUp) to be called first on its parent KSP,
>>>> which I am struggling to do. I wonder which KSP (or PC) that is.
>>>>
>>>>
>>>> This is how I attempt to do it (using PCKSP to provide a parent KSP for 
>>>> PCBJacobiGetSubKSP):
>>>>
>>>>
>>>> call KSPGetPC(solver%ksp, solver%pc, ierr); CHKERRQ(ierr)
>>>> call PCSetType(solver%pc, PCCOMPOSITE, ierr); CHKERRQ(ierr)
>>>> call PCCompositeSetType(solver%pc, PC_COMPOSITE_MULTIPLICATIVE, ierr); 
>>>> CHKERRQ(ierr)
>>>>
>>>>
>>>> ! 1st Stage 
>>>> call PCCompositeAddPC(solver%pc, PCGALERKIN, ierr); CHKERRQ(ierr)
>>>> call PCCompositeGetPC(solver%pc, 0, T1, ierr); CHKERRQ(ierr)
>>>>
>>>>
>>>> ! KSPPREONLY-PCNONE for testing
>>>> call PCGalerkinGetKSP(T1, Ap_ksp, ierr); CHKERRQ(ierr)
>>>> call KSPSetType(Ap_ksp, KSPPREONLY, ierr); CHKERRQ(ierr)
>>>> call KSPGetPC(Ap_ksp, Ap_pc, ierr); CHKERRQ(ierr)
>>>> call PCSetType(Ap_pc, PCNONE, ierr); CHKERRQ(ierr)
>>>>
>>>>
>>>> ! 2nd Stage
>>>> call PCCompositeAddPC(solver%pc, PCKSP, ierr); CHKERRQ(ierr)
>>>> call PCCompositeGetPC(solver%pc, 1, T2, ierr); CHKERRQ(ierr)
>>>> call PCKSPGetKSP(T2, BJac_ksp, ierr); CHKERRQ(ierr)
>>>> call KSPSetType(BJac_ksp, KSPPREONLY, ierr); CHKERRQ(ierr)
>>>> call KSPGetPC(BJac_ksp, BJac_pc, ierr); CHKERRQ(ierr)
>>>> call PCSetType(BJac_pc, PCBJACOBI, ierr); CHKERRQ(ierr)
>>>>
>>>>
>>>> call KSPSetUp(solver%ksp, ierr); CHKERRQ(ierr)
>>>> ! call KSPSetUp(BJac_ksp, ierr); CHKERRQ(ierr)
>>>> ! call PCSetUp(T2, ierr); CHKERRQ(ierr)
>>>> ! call PCSetUp(BJac_pc, ierr); CHKERRQ(ierr)
>>>>
>>>>
>>>> call PCBJacobiGetSubKSP(BJac_pc, nsub_ksp, first_sub_ksp, PETSC_NULL_KSP, 
>>>> ierr); CHKERRQ(ierr)
>>>> allocate(sub_ksps(nsub_ksp))
>>>> call PCBJacobiGetSubKSP(BJac_pc, nsub_ksp, first_sub_ksp, sub_ksps,ierr); 
>>>> CHKERRQ(ierr)
>>>> do i = 1, nsub_ksp
>>>>   call KSPGetPC(sub_ksps(i), BJac_pc_sub, ierr); CHKERRQ(ierr)
>>>>   call PCFactorSetShiftType(BJac_pc_sub, MAT_SHIFT_INBLOCKS, ierr); 
>>>> CHKERRQ(ierr)
>>>>   call PCFactorSetZeroPivot(BJac_pc_sub, solver%linear_zero_pivot_tol, 
>>>> ierr); CHKERRQ(ierr)
>>>

Re: [petsc-users] PCCOMPOSITE with PCBJACOBI

2017-06-28 Thread Robert Annewandter
Thank you Barry!

We'd like to hard-wire it into PFLOTRAN, with CPR-AMG Block Jacobi
Two-Stage Preconditioning potentially becoming the standard solver
strategy. Using the options database is a great way to start
reverse-engineering the issue!

Thanks!
Robert




On 27/06/17 23:45, Barry Smith wrote:
>It is difficult, if not impossible at times, to get all the options
> where you want them using the function call interface. On the other hand
> it is generally easy (if there are no inner PCSHELLs) to do this via the
> options database:
>
>-pc_type composite
>-pc_composite_type multiplicative
>-pc_composite_pcs galerkin,bjacobi
>
>-sub_0_galerkin_ksp_type preonly
>-sub_0_galerkin_pc_type none
>
>-sub_1_sub_pc_factor_shift_type inblocks
>-sub_1_sub_pc_factor_zero_pivot zpiv
>
>
>
>> On Jun 27, 2017, at 11:24 AM, Robert Annewandter 
>> <robert.annewand...@opengosim.com> wrote:
>>
>> Dear PETSc folks,
>>
>>
>> I want a Block Jacobi PC to be the second PC in a two-stage preconditioning 
>> scheme implemented via multiplicative PCCOMPOSITE, with the outermost KSP an 
>> FGMRES.
>>
>>
>> However, PCBJacobiGetSubKSP 
>> (https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCBJacobiGetSubKSP.html#PCBJacobiGetSubKSP)
>>  requires KSPSetUp (or PCSetUp) to be called first on its parent KSP,
>> which I am struggling to do. I wonder which KSP (or PC) that is.
>>
>>
>> This is how I attempt to do it (using PCKSP to provide a parent KSP for 
>> PCBJacobiGetSubKSP):
>>
>>
>> call KSPGetPC(solver%ksp, solver%pc, ierr); CHKERRQ(ierr)
>> call PCSetType(solver%pc, PCCOMPOSITE, ierr); CHKERRQ(ierr)
>> call PCCompositeSetType(solver%pc, PC_COMPOSITE_MULTIPLICATIVE, ierr); 
>> CHKERRQ(ierr)
>>
>>
>> ! 1st Stage 
>> call PCCompositeAddPC(solver%pc, PCGALERKIN, ierr); CHKERRQ(ierr)
>> call PCCompositeGetPC(solver%pc, 0, T1, ierr); CHKERRQ(ierr)
>>
>>
>> ! KSPPREONLY-PCNONE for testing
>> call PCGalerkinGetKSP(T1, Ap_ksp, ierr); CHKERRQ(ierr)
>> call KSPSetType(Ap_ksp, KSPPREONLY, ierr); CHKERRQ(ierr)
>> call KSPGetPC(Ap_ksp, Ap_pc, ierr); CHKERRQ(ierr)
>> call PCSetType(Ap_pc, PCNONE, ierr); CHKERRQ(ierr)
>>
>>
>> ! 2nd Stage
>> call PCCompositeAddPC(solver%pc, PCKSP, ierr); CHKERRQ(ierr)
>> call PCCompositeGetPC(solver%pc, 1, T2, ierr); CHKERRQ(ierr)
>> call PCKSPGetKSP(T2, BJac_ksp, ierr); CHKERRQ(ierr)
>> call KSPSetType(BJac_ksp, KSPPREONLY, ierr); CHKERRQ(ierr)
>> call KSPGetPC(BJac_ksp, BJac_pc, ierr); CHKERRQ(ierr)
>> call PCSetType(BJac_pc, PCBJACOBI, ierr); CHKERRQ(ierr)
>>
>>
>> call KSPSetUp(solver%ksp, ierr); CHKERRQ(ierr)
>> ! call KSPSetUp(BJac_ksp, ierr); CHKERRQ(ierr)
>> ! call PCSetUp(T2, ierr); CHKERRQ(ierr)
>> ! call PCSetUp(BJac_pc, ierr); CHKERRQ(ierr)
>>
>>
>> call PCBJacobiGetSubKSP(BJac_pc, nsub_ksp, first_sub_ksp, PETSC_NULL_KSP, 
>> ierr); CHKERRQ(ierr)
>> allocate(sub_ksps(nsub_ksp))
>> call PCBJacobiGetSubKSP(BJac_pc, nsub_ksp, first_sub_ksp, sub_ksps,ierr); 
>> CHKERRQ(ierr)
>> do i = 1, nsub_ksp
>>   call KSPGetPC(sub_ksps(i), BJac_pc_sub, ierr); CHKERRQ(ierr)
>>   call PCFactorSetShiftType(BJac_pc_sub, MAT_SHIFT_INBLOCKS, ierr); 
>> CHKERRQ(ierr)
>>   call PCFactorSetZeroPivot(BJac_pc_sub, solver%linear_zero_pivot_tol, 
>> ierr); CHKERRQ(ierr)
>> end do
>> deallocate(sub_ksps)
>> nullify(sub_ksps)
>>
>>
>> Is using PCKSP a good idea at all? 
>>
>>
>> With KSPSetUp(solver%ksp) -> FGMRES
>>
>> [0]PETSC ERROR: - Error Message 
>> --
>> [0]PETSC ERROR: Object is in wrong state
>> [0]PETSC ERROR: You requested a vector from a KSP that cannot provide one
>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for 
>> trouble shooting.
>> [0]PETSC ERROR: Petsc Development GIT revision: v3.7.5-3167-g03c0fad  GIT 
>> Date: 2017-03-30 14:27:53 -0500
>> [0]PETSC ERROR: pflotran on a debug_g-6.2 named mother by pujjad Tue Jun 27 
>> 16:55:14 2017
>> [0]PETSC ERROR: Configure options --download-mpich=yes --download-hdf5=yes 
>> --download-fblaslapack=yes --download-metis=yes --download-parmetis=yes 
>> --download-eigen=yes --download-hypre=yes --download-superlu_dist=yes 
>> --download-superlu=yes --with-cc=gcc-6 --with-cxx=g++-6 --with-fc=gfortran-6 
>> PETSC_ARCH=debug_g-6.2 PETSC_DIR=/home/pujjad/Repositories/petsc
>> [0]PETSC ERROR: #1 KSPCreateVecs() line 939 in 

[petsc-users] PCCOMPOSITE with PCBJACOBI

2017-06-27 Thread Robert Annewandter
Dear PETSc folks,


I want a Block Jacobi PC to be the second PC in a two-stage
preconditioning scheme implemented via multiplicative PCCOMPOSITE, with
the outermost KSP an FGMRES.


However, PCBJacobiGetSubKSP
(https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCBJacobiGetSubKSP.html#PCBJacobiGetSubKSP)
requires KSPSetUp (or PCSetUp) to be called first on its parent KSP, which
I am struggling to do. I wonder which KSP (or PC) that is.


This is how I attempt to do it (using PCKSP to provide a parent KSP for
PCBJacobiGetSubKSP):


call KSPGetPC(solver%ksp, solver%pc, ierr); CHKERRQ(ierr)
call PCSetType(solver%pc, PCCOMPOSITE, ierr); CHKERRQ(ierr)
call PCCompositeSetType(solver%pc, PC_COMPOSITE_MULTIPLICATIVE, ierr);
CHKERRQ(ierr)


! 1st Stage
call PCCompositeAddPC(solver%pc, PCGALERKIN, ierr); CHKERRQ(ierr)
call PCCompositeGetPC(solver%pc, 0, T1, ierr); CHKERRQ(ierr)


! KSPPREONLY-PCNONE for testing
call PCGalerkinGetKSP(T1, Ap_ksp, ierr); CHKERRQ(ierr)
call KSPSetType(Ap_ksp, KSPPREONLY, ierr); CHKERRQ(ierr)
call KSPGetPC(Ap_ksp, Ap_pc, ierr); CHKERRQ(ierr)
call PCSetType(Ap_pc, PCNONE, ierr); CHKERRQ(ierr)


! 2nd Stage
call PCCompositeAddPC(solver%pc, PCKSP, ierr); CHKERRQ(ierr)
call PCCompositeGetPC(solver%pc, 1, T2, ierr); CHKERRQ(ierr)
call PCKSPGetKSP(T2, BJac_ksp, ierr); CHKERRQ(ierr)
call KSPSetType(BJac_ksp, KSPPREONLY, ierr); CHKERRQ(ierr)
call KSPGetPC(BJac_ksp, BJac_pc, ierr); CHKERRQ(ierr)
call PCSetType(BJac_pc, PCBJACOBI, ierr); CHKERRQ(ierr)


call KSPSetUp(solver%ksp, ierr); CHKERRQ(ierr)
! call KSPSetUp(BJac_ksp, ierr); CHKERRQ(ierr)
! call PCSetUp(T2, ierr); CHKERRQ(ierr)
! call PCSetUp(BJac_pc, ierr); CHKERRQ(ierr)


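! First call with PETSC_NULL_KSP queries the number of local sub-KSPs;
! the second call then fills the allocated array.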
call PCBJacobiGetSubKSP(BJac_pc, nsub_ksp, first_sub_ksp,
PETSC_NULL_KSP, ierr); CHKERRQ(ierr)
allocate(sub_ksps(nsub_ksp))
call PCBJacobiGetSubKSP(BJac_pc, nsub_ksp, first_sub_ksp,
sub_ksps,ierr); CHKERRQ(ierr)
do i = 1, nsub_ksp
  call KSPGetPC(sub_ksps(i), BJac_pc_sub, ierr); CHKERRQ(ierr)
  call PCFactorSetShiftType(BJac_pc_sub, MAT_SHIFT_INBLOCKS, ierr);
CHKERRQ(ierr)
  call PCFactorSetZeroPivot(BJac_pc_sub, solver%linear_zero_pivot_tol,
ierr); CHKERRQ(ierr)
end do
deallocate(sub_ksps)
nullify(sub_ksps)


Is using PCKSP a good idea at all?


With KSPSetUp(solver%ksp) -> FGMRES

[0]PETSC ERROR: - Error Message
--
[0]PETSC ERROR: Object is in wrong state
[0]PETSC ERROR: You requested a vector from a KSP that cannot provide one
[0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html
for trouble shooting.
[0]PETSC ERROR: Petsc Development GIT revision: v3.7.5-3167-g03c0fad 
GIT Date: 2017-03-30 14:27:53 -0500
[0]PETSC ERROR: pflotran on a debug_g-6.2 named mother by pujjad Tue Jun
27 16:55:14 2017
[0]PETSC ERROR: Configure options --download-mpich=yes
--download-hdf5=yes --download-fblaslapack=yes --download-metis=yes
--download-parmetis=yes --download-eigen=yes --download-hypre=yes
--download-superlu_dist=yes --download-superlu=yes --with-cc=gcc-6
--with-cxx=g++-6 --with-fc=gfortran-6 PETSC_ARCH=debug_g-6.2
PETSC_DIR=/home/pujjad/Repositories/petsc
[0]PETSC ERROR: #1 KSPCreateVecs() line 939 in
/home/pujjad/Repositories/petsc/src/ksp/ksp/interface/iterativ.c
[0]PETSC ERROR: #2 KSPSetUp_GMRES() line 85 in
/home/pujjad/Repositories/petsc/src/ksp/ksp/impls/gmres/gmres.c
[0]PETSC ERROR: #3 KSPSetUp_FGMRES() line 41 in
/home/pujjad/Repositories/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c
[0]PETSC ERROR: #4 KSPSetUp() line 338 in
/home/pujjad/Repositories/petsc/src/ksp/ksp/interface/itfunc.c
application called MPI_Abort(MPI_COMM_WORLD, 73) - process 0
[mpiexec@mother] handle_pmi_cmd (./pm/pmiserv/pmiserv_cb.c:52):
Unrecognized PMI command: abort | cleaning up processes
[mpiexec@mother] control_cb (./pm/pmiserv/pmiserv_cb.c:289): unable to
process PMI command
[mpiexec@mother] HYDT_dmxu_poll_wait_for_event
(./tools/demux/demux_poll.c:77): callback returned error status
[mpiexec@mother] HYD_pmci_wait_for_completion
(./pm/pmiserv/pmiserv_pmci.c:181): error waiting for event
[mpiexec@mother] main (./ui/mpich/mpiexec.c:405): process manager error
waiting for completion



With KSPSetUp(BJac_ksp) -> KSPPREONLY

[0]PETSC ERROR: - Error Message
--
[0]PETSC ERROR: Arguments are incompatible
[0]PETSC ERROR: Both n and N cannot be PETSC_DECIDE
  likely a call to VecSetSizes() or MatSetSizes() is wrong.
See http://www.mcs.anl.gov/petsc/documentation/faq.html#split
[0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html
for trouble shooting.
[0]PETSC ERROR: Petsc Development GIT revision: v3.7.5-3167-g03c0fad 
GIT Date: 2017-03-30 14:27:53 -0500
[0]PETSC ERROR: pflotran on a debug_g-6.2 named mother by pujjad Tue Jun
27 16:52:57 2017
[0]PETSC ERROR: Configure options --download-mpich=yes
--download-hdf5=yes --download-fblaslapack=yes --download-metis=yes
--download-parmetis=yes 

[petsc-users] Access to 'step' in 'toil_ims.f90' and 'pm_toil_ims.f90'

2016-11-21 Thread Robert Annewandter

  
  
Hi,

Is there a way to access 'step' as used in the output log file:

 Step 35 Time=  1.0E-03 Dt=  2.73903E-05 [d]
  snes_conv_reason:   10,

from within the subroutines 'TOilImsResidual(snes,xx,r,realization,ierr)'
and 'TOilImsJacobian(snes,xx,A,B,realization,ierr)' in 'toil_ims.f90', as
well as from within
'PMTOilImsCheckUpdatePre(this,line_search,X,dX,changed,ierr)' in
'pm_toil_ims.f90'?

I'd like to make sure that J, b and dx are taken from the same step. I
could do that if the filenames of the exports reflected the step and NR
iteration. The problem is that 'TOilImsResidual()' is called more often
than 'TOilImsJacobian()', so a simple counter that is incremented every
time the subroutines are called unfortunately doesn't help.
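
For the NR iteration at least, I could imagine something like the following
(an untested sketch; 'filename' is a hypothetical local variable, and the
time step counter itself would still have to come from PFLOTRAN's own data,
e.g. via the 'realization' argument):

  PetscInt :: newton_it
  call SNESGetIterationNumber(snes, newton_it, ierr); CHKERRQ(ierr)
  write(filename, '(a,i0,a)') 'jacobian_newton_', newton_it, '.bin'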
  
Thank you for any pointers!

Robert

Re: [petsc-users] Control Output of '-ksp_view_mat binary'

2016-11-17 Thread Robert Annewandter

Thank you Matt and Barry!

On 17/11/16 16:05, Barry Smith wrote:

   You can also call MatView(jac, PETSC_VIEWER_BINARY_WORLD) at the location where you compute the Jacobian, based on the iteration number etc.
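
A minimal sketch of that approach (hypothetical names: a counter newton_it
and target iteration target_it kept by PFLOTRAN, with Jacobian A and
residual r as in its Jacobian/residual routines):

  if (newton_it == target_it) then
    call MatView(A, PETSC_VIEWER_BINARY_WORLD, ierr); CHKERRQ(ierr)
    call VecView(r, PETSC_VIEWER_BINARY_WORLD, ierr); CHKERRQ(ierr)
  end if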


  
On Nov 17, 2016, at 9:35 AM, Matthew Knepley <knep...@gmail.com> wrote:

On Thu, Nov 17, 2016 at 8:32 AM, Robert Annewandter <robert.annewand...@opengosim.com> wrote:
Hi, 

Is there a way to fine-tune '-ksp_view_mat binary' so it only gives me an output at a defined time?

I'm implementing a Two-Stage Preconditioner for PFLOTRAN (a subsurface flow and transport simulator that uses PETSc). Currently I'd like to compare convergence of the preconditioning in PFLOTRAN against a prototyped version in MATLAB. In PFLOTRAN I export the Jacobian and residual before they are handed over to PETSc; these are then imported into MATLAB to do the Two-Stage Preconditioning.

However, I know I can do exports closer to PETSc with the options '-ksp_view_mat binary' and '-ksp_view_rhs binary'. Unfortunately I get outputs for the whole simulation in one file, instead of only at a well-defined Newton-Raphson iteration (as done in PFLOTRAN).

Is there a way to tell PETSc with the above options when to export the Jacobian and residual?

Any pointers much appreciated!

It sounds like you could give the particular solve you want to look at a prefix:

  http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/KSP/KSPSetOptionsPrefix.html

and then give

  -myprefix_ksp_view_mat binary
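
A sketch of the prefix route (hypothetical prefix 'myprefix_', set on the KSP of the particular solve before it reads the options database):

  call KSPSetOptionsPrefix(ksp, 'myprefix_', ierr); CHKERRQ(ierr)
  call KSPSetFromOptions(ksp, ierr); CHKERRQ(ierr)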

  Thanks,

 Matt
 
Thank you!
Robert



-- 
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener



[petsc-users] Control Output of '-ksp_view_mat binary'

2016-11-17 Thread Robert Annewandter

Hi,

Is there a way to fine-tune '-ksp_view_mat binary' so it only gives me an
output at a defined time?

I'm implementing a Two-Stage Preconditioner for PFLOTRAN (a subsurface
flow and transport simulator that uses PETSc). Currently I'd like to
compare convergence of the preconditioning in PFLOTRAN against a
prototyped version in MATLAB. In PFLOTRAN I export the Jacobian and
residual before they are handed over to PETSc; these are then imported
into MATLAB to do the Two-Stage Preconditioning.

However, I know I can do exports closer to PETSc with the options
'-ksp_view_mat binary' and '-ksp_view_rhs binary'. Unfortunately I get
outputs for the whole simulation in one file, instead of only at a
well-defined Newton-Raphson iteration (as done in PFLOTRAN).

Is there a way to tell PETSc with the above options when to export the
Jacobian and residual?

Any pointers much appreciated!

Thank you!
Robert