Re: [petsc-users] CPR-AMG: SNES with two cores worse than with one

2017-07-07 Thread Robert Annewandter
Thank you Hong!

I've used GMRES via

mpirun \
  -n ${NP} pflotran \
  -pflotranin ${INPUTFILE}.pflinput \
  -flow_ksp_type gmres \
  -flow_pc_type bjacobi \
  -flow_sub_pc_type lu \
  -flow_sub_pc_factor_nonzeros_along_diagonal \
  -snes_monitor

and get:

NP 1

FLOW TS BE steps = 43 newton =   43 linear = 43 cuts =  0
FLOW TS BE Wasted Linear Iterations = 0
FLOW TS BE SNES time = 197.0 seconds

NP 2

FLOW TS BE steps = 43 newton =   43 linear =770 cuts =  0
FLOW TS BE Wasted Linear Iterations = 0
FLOW TS BE SNES time = 68.7 seconds

Which looks ok to me.

Robert



On 07/07/17 15:49, h...@aspiritech.org wrote:
> What do you get with '-ksp_type gmres' or '-ksp_type bcgs' in parallel
> runs?
> Hong
>
> On Fri, Jul 7, 2017 at 6:05 AM, Robert Annewandter wrote:
>
> Yes indeed, PFLOTRAN cuts timestep after 8 failed iterations of SNES.
>
> I've rerun with -snes_monitor (attached with canonical suffix);
> their -pc_type is always PCBJACOBI + PCLU (though we'd like to try
> SUPERLU in the future, however it works only with -mat_type aij..)
>
>
> The sequential and parallel runs I did  with
>  
> -ksp_type preonly -pc_type lu
> -pc_factor_nonzeros_along_diagonal -snes_monitor
>
> and
>
> -ksp_type preonly -pc_type bjacobi -sub_pc_type lu
> -sub_pc_factor_nonzeros_along_diagonal -snes_monitor
>
> As expected, the sequential runs are both identical and the parallel run
> takes half the time compared to sequential.
>
>
>
>
> On 07/07/17 01:20, Barry Smith wrote:
>>Looks like PFLOTRAN has a maximum number of SNES iterations of 8 and 
>> cuts the timestep if that fails.
>>
>>Please run with -snes_monitor. I don't understand the strange, densely 
>> packed information that PFLOTRAN is printing.
>>
>>It looks like the linear solver is converging fine in parallel; normally, 
>> then, there is absolutely no reason that the Newton should behave 
>> differently on 2 processors than 1 unless there is something wrong with the 
>> Jacobian. What is the -pc_type for the two cases, LU or your fancy thing? 
>>
>>Please run sequential and parallel with -pc_type lu and also with 
>> -snes_monitor.  We need to fix all the knobs but one in order to understand 
>> what is going on.
>>
>>
>>Barry
>>
>>
>>   
>>> On Jul 6, 2017, at 5:11 PM, Robert Annewandter wrote:
>>>
>>> Thanks Barry!
>>>
>>> I've attached log files for np = 1 (SNES time: 218 s) and np = 2 (SNES 
>>> time: 600 s). PFLOTRAN final output:
>>>
>>> NP 1
>>>
>>> FLOW TS BE steps = 43 newton =   43 linear = 43 cuts =  0
>>> FLOW TS BE Wasted Linear Iterations = 0
>>> FLOW TS BE SNES time = 218.9 seconds
>>>
>>> NP 2
>>>
>>> FLOW TS BE steps = 67 newton =  176 linear =314 cuts = 13
>>> FLOW TS BE Wasted Linear Iterations = 208
>>> FLOW TS BE SNES time = 600.0 seconds
>>>
>>>
>>> Robert
>>>
>>> On 06/07/17 21:24, Barry Smith wrote:
So on one process the outer linear solver takes a single iteration; 
 this is because the block Jacobi with LU and one block is a direct solver.


> 11 KSP preconditioned resid norm 1.131868956745e+00 true resid 
> norm 1.526261825526e-05 ||r(i)||/||b|| 1.485509868409e-05
> [0] KSPConvergedDefault(): Linear solver has converged. Residual norm 
> 2.148515820410e-14 is less than relative tolerance 1.e-07 
> times initial right hand side norm 1.581814306485e-02 at iteration 1
> 1 KSP unpreconditioned resid norm 2.148515820410e-14 true resid 
> norm 2.148698024622e-14 ||r(i)||/||b|| 1.358375642332e-12
>
On two processes the outer linear solver takes a few iterations to 
 solve; this is to be expected. 

But what you sent doesn't give any indication about SNES not 
 converging. Please turn off all inner linear solver monitoring and just 
 run with -ksp_monitor_true_residual -snes_monitor -snes_linesearch_monitor 
 -snes_converged_reason

Barry




> On Jul 6, 2017, at 2:03 PM, Robert Annewandter wrote:
>
> Hi all,
>
> I'd like to understand why the SNES of my CPR-AMG Two-Stage 
> Preconditioner (with KSPFGMRES + multipl. PCComposite (PCGalerkin with 
> KSPGMRES + BoomerAMG, PCBJacobi + PCLU init) on a 24,000 x 24,000 matrix) 
> struggles to converge when using two cores instead of one. Because of the 
> adaptive time stepping of the Newton, this leads to severe cuts in time 
> step.
>
> This is how I run it with two cores
>

Re: [petsc-users] CPR-AMG: SNES with two cores worse than with one

2017-07-07 Thread Barry Smith

   I don't have a clue. It looks like the np 2 case just takes a different 
trajectory that runs into trouble that doesn't happen with np 1. Since the linear 
solves give very good convergence for both np 2 and np 1, I don't think the 
preconditioner is really the "problem".

   I absolutely do not like the fact that the code is not using a line search. 
If you look at even the sequential case, SNES is sometimes stopping due to snorm 
even though the function norm has actually increased. Frankly, I'm very 
suspicious of the "solution" of the time integration; it is too much "hit the 
engine a few times with the hammer until it does what it wants" engineering.

   What happens if you run the 1 and 2 process case with the "default" Pflotran 
linear solver? 
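
   (A minimal sketch of what that comparison might look like: drop all the 
solver overrides so PFLOTRAN falls back to whatever linear solver it configures 
by default, and keep only the SNES monitoring. The input file name and the 
flow_ option prefix are assumptions carried over from the rest of the thread.)

mpirun -n 1 pflotran -pflotranin het.pflinput \
  -flow_snes_monitor -flow_snes_converged_reason

mpirun -n 2 pflotran -pflotranin het.pflinput \
  -flow_snes_monitor -flow_snes_converged_reason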

 
> On Jul 7, 2017, at 6:05 AM, Robert Annewandter wrote:
> 
> Yes indeed, PFLOTRAN cuts timestep after 8 failed iterations of SNES. 
> 
> I've rerun with -snes_monitor (attached with canonical suffix); their 
> -pc_type is always PCBJACOBI + PCLU (though we'd like to try SUPERLU in the 
> future, however it works only with -mat_type aij..)
> 
> 
> The sequential and parallel runs I did  with
>  
> -ksp_type preonly -pc_type lu -pc_factor_nonzeros_along_diagonal 
> -snes_monitor
> 
> and 
> 
> -ksp_type preonly -pc_type bjacobi -sub_pc_type lu 
> -sub_pc_factor_nonzeros_along_diagonal -snes_monitor
> 
> As expected, the sequential runs are both identical and the parallel run takes half the 
> time compared to sequential.
> 
> 
> 
> 
> On 07/07/17 01:20, Barry Smith wrote:
>>Looks like PFLOTRAN has a maximum number of SNES iterations of 8 and cuts 
>> the timestep if that fails.
>> 
>>Please run with -snes_monitor. I don't understand the strange, densely 
>> packed information that PFLOTRAN is printing.
>> 
>>It looks like the linear solver is converging fine in parallel; normally, 
>> then, there is absolutely no reason that the Newton should behave differently 
>> on 2 processors than 1 unless there is something wrong with the Jacobian. 
>> What is the -pc_type for the two cases, LU or your fancy thing? 
>> 
>>Please run sequential and parallel with -pc_type lu and also with 
>> -snes_monitor.  We need to fix all the knobs but one in order to understand 
>> what is going on.
>> 
>> 
>>Barry
>> 
>> 
>>   
>> 
>>> On Jul 6, 2017, at 5:11 PM, Robert Annewandter wrote:
>>> 
>>> Thanks Barry!
>>> 
>>> I've attached log files for np = 1 (SNES time: 218 s) and np = 2 (SNES 
>>> time: 600 s). PFLOTRAN final output:
>>> 
>>> NP 1
>>> 
>>> FLOW TS BE steps = 43 newton =   43 linear = 43 cuts =  0
>>> FLOW TS BE Wasted Linear Iterations = 0
>>> FLOW TS BE SNES time = 218.9 seconds
>>> 
>>> NP 2
>>> 
>>> FLOW TS BE steps = 67 newton =  176 linear =314 cuts = 13
>>> FLOW TS BE Wasted Linear Iterations = 208
>>> FLOW TS BE SNES time = 600.0 seconds
>>> 
>>> 
>>> Robert
>>> 
>>> On 06/07/17 21:24, Barry Smith wrote:
>>> 
So on one process the outer linear solver takes a single iteration; this 
 is because the block Jacobi with LU and one block is a direct solver.
 
 
 
> 11 KSP preconditioned resid norm 1.131868956745e+00 true resid norm 
> 1.526261825526e-05 ||r(i)||/||b|| 1.485509868409e-05
> [0] KSPConvergedDefault(): Linear solver has converged. Residual norm 
> 2.148515820410e-14 is less than relative tolerance 1.e-07 
> times initial right hand side norm 1.581814306485e-02 at iteration 1
> 1 KSP unpreconditioned resid norm 2.148515820410e-14 true resid norm 
> 2.148698024622e-14 ||r(i)||/||b|| 1.358375642332e-12
> 
> 
On two processes the outer linear solver takes a few iterations to 
 solve; this is to be expected. 
 
But what you sent doesn't give any indication about SNES not 
 converging. Please turn off all inner linear solver monitoring and just 
 run with -ksp_monitor_true_residual -snes_monitor -snes_linesearch_monitor 
 -snes_converged_reason
 
Barry
 
 
 
 
 
> On Jul 6, 2017, at 2:03 PM, Robert Annewandter wrote:
> 
> Hi all,
> 
> I'd like to understand why the SNES of my CPR-AMG Two-Stage Preconditioner 
> (with KSPFGMRES + multipl. PCComposite (PCGalerkin with KSPGMRES + 
> BoomerAMG, PCBJacobi + PCLU init) on a 24,000 x 24,000 matrix) struggles 
> to converge when using two cores instead of one. Because of the adaptive 
> time stepping of the Newton, this leads to severe cuts in time step.
> 
> This is how I run it with two cores
> 
> mpirun \
>   -n 2 pflotran \
>   -pflotranin het.pflinput \
>   -ksp_monitor_true_residual \
>   -flow_snes_view \
>   -flow_snes_converged_reason \
>   -flow_sub_1_pc_type bjacobi \
>   

Re: [petsc-users] CPR-AMG: SNES with two cores worse than with one

2017-07-07 Thread h...@aspiritech.org
What do you get with '-ksp_type gmres' or '-ksp_type bcgs' in parallel runs?
Hong

On Fri, Jul 7, 2017 at 6:05 AM, Robert Annewandter <
robert.annewand...@opengosim.com> wrote:

> Yes indeed, PFLOTRAN cuts timestep after 8 failed iterations of SNES.
>
> I've rerun with -snes_monitor (attached with canonical suffix); their
> -pc_type is always PCBJACOBI + PCLU (though we'd like to try SUPERLU in the
> future, however it works only with -mat_type aij..)
>
>
> The sequential and parallel runs I did  with
>
> -ksp_type preonly -pc_type lu -pc_factor_nonzeros_along_diagonal
> -snes_monitor
>
> and
>
> -ksp_type preonly -pc_type bjacobi -sub_pc_type lu
> -sub_pc_factor_nonzeros_along_diagonal -snes_monitor
>
> As expected, the sequential runs are both identical and the parallel run takes half
> the time compared to sequential.
>
>
>
>
> On 07/07/17 01:20, Barry Smith wrote:
>
>Looks like PFLOTRAN has a maximum number of SNES iterations of 8 and cuts 
> the timestep if that fails.
>
>Please run with -snes_monitor. I don't understand the strange, densely 
> packed information that PFLOTRAN is printing.
>
>It looks like the linear solver is converging fine in parallel; normally, 
> then, there is absolutely no reason that the Newton should behave differently on 
> 2 processors than 1 unless there is something wrong with the Jacobian. What 
> is the -pc_type for the two cases, LU or your fancy thing?
>
>Please run sequential and parallel with -pc_type lu and also with 
> -snes_monitor.  We need to fix all the knobs but one in order to understand 
> what is going on.
>
>
>Barry
>
>
>
>
> On Jul 6, 2017, at 5:11 PM, Robert Annewandter wrote:
>
> Thanks Barry!
>
> I've attached log files for np = 1 (SNES time: 218 s) and np = 2 (SNES time: 
> 600 s). PFLOTRAN final output:
>
> NP 1
>
> FLOW TS BE steps = 43 newton =   43 linear = 43 cuts =  0
> FLOW TS BE Wasted Linear Iterations = 0
> FLOW TS BE SNES time = 218.9 seconds
>
> NP 2
>
> FLOW TS BE steps = 67 newton =  176 linear =314 cuts = 13
> FLOW TS BE Wasted Linear Iterations = 208
> FLOW TS BE SNES time = 600.0 seconds
>
>
> Robert
>
> On 06/07/17 21:24, Barry Smith wrote:
>
>So on one process the outer linear solver takes a single iteration; this is 
> because the block Jacobi with LU and one block is a direct solver.
>
>
>
> 11 KSP preconditioned resid norm 1.131868956745e+00 true resid norm 
> 1.526261825526e-05 ||r(i)||/||b|| 1.485509868409e-05
> [0] KSPConvergedDefault(): Linear solver has converged. Residual norm 
> 2.148515820410e-14 is less than relative tolerance 1.e-07 times 
> initial right hand side norm 1.581814306485e-02 at iteration 1
> 1 KSP unpreconditioned resid norm 2.148515820410e-14 true resid norm 
> 2.148698024622e-14 ||r(i)||/||b|| 1.358375642332e-12
>
>
>On two processes the outer linear solver takes a few iterations to solve; 
> this is to be expected.
>
>But what you sent doesn't give any indication about SNES not converging. 
> Please turn off all inner linear solver monitoring and just run with 
> -ksp_monitor_true_residual -snes_monitor -snes_linesearch_monitor 
> -snes_converged_reason
>
>Barry
>
>
>
>
>
> On Jul 6, 2017, at 2:03 PM, Robert Annewandter wrote:
>
> Hi all,
>
> I'd like to understand why the SNES of my CPR-AMG Two-Stage Preconditioner 
> (with KSPFGMRES + multipl. PCComposite (PCGalerkin with KSPGMRES + BoomerAMG, 
> PCBJacobi + PCLU init) on a 24,000 x 24,000 matrix) struggles to converge 
> when using two cores instead of one. Because of the adaptive time stepping of 
> the Newton, this leads to severe cuts in time step.
>
> This is how I run it with two cores
>
> mpirun \
>   -n 2 pflotran \
>   -pflotranin het.pflinput \
>   -ksp_monitor_true_residual \
>   -flow_snes_view \
>   -flow_snes_converged_reason \
>   -flow_sub_1_pc_type bjacobi \
>   -flow_sub_1_sub_pc_type lu \
>   -flow_sub_1_sub_pc_factor_pivot_in_blocks true\
>   -flow_sub_1_sub_pc_factor_nonzeros_along_diagonal \
>   -options_left \
>   -log_summary \
>   -info
>
>
> With one core I get (after grepping the crap away from -info):
>
>  Step 32 Time=  1.8E+01
>
> [...]
>
>   0 2r: 1.58E-02 2x: 0.00E+00 2u: 0.00E+00 ir: 7.18E-03 iu: 0.00E+00 rsn:   0
> [0] SNESComputeJacobian(): Rebuilding preconditioner
> Residual norms for flow_ solve.
> 0 KSP unpreconditioned resid norm 1.581814306485e-02 true resid norm 
> 1.581814306485e-02 ||r(i)||/||b|| 1.e+00
>   Residual norms for flow_sub_0_galerkin_ solve.
>   0 KSP preconditioned resid norm 5.697603110484e+07 true resid norm 
> 5.175721849125e+03 ||r(i)||/||b|| 5.037527476892e+03
>   1 KSP preconditioned resid norm 5.041509073319e+06 true resid norm 
> 3.251596928176e+02 ||r(i)||/||b|| 3.164777657484e+02
>   2 KSP preconditioned resid 

Re: [petsc-users] CPR-AMG: SNES with two cores worse than with one

2017-07-06 Thread Barry Smith

   Looks like PFLOTRAN has a maximum number of SNES iterations of 8 and cuts 
the timestep if that fails.

   Please run with -snes_monitor. I don't understand the strange, densely packed 
information that PFLOTRAN is printing.

   It looks like the linear solver is converging fine in parallel; normally, 
then, there is absolutely no reason that the Newton should behave differently on 2 
processors than 1 unless there is something wrong with the Jacobian. What is 
the -pc_type for the two cases, LU or your fancy thing? 

   Please run sequential and parallel with -pc_type lu and also with 
-snes_monitor.  We need to fix all the knobs but one in order to understand 
what is going on.
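
   (A minimal sketch of the two fixed-knob runs being asked for here, reusing 
the input file and flow_ option prefix from elsewhere in this thread; treat the 
exact flags as an assumption, not a prescription.)

mpirun -n 1 pflotran -pflotranin het.pflinput \
  -flow_ksp_type preonly -flow_pc_type lu \
  -flow_pc_factor_nonzeros_along_diagonal -flow_snes_monitor

mpirun -n 2 pflotran -pflotranin het.pflinput \
  -flow_ksp_type preonly -flow_pc_type bjacobi -flow_sub_pc_type lu \
  -flow_sub_pc_factor_nonzeros_along_diagonal -flow_snes_monitor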


   Barry


  
> On Jul 6, 2017, at 5:11 PM, Robert Annewandter wrote:
> 
> Thanks Barry!
> 
> I've attached log files for np = 1 (SNES time: 218 s) and np = 2 (SNES time: 
> 600 s). PFLOTRAN final output:
> 
> NP 1
> 
> FLOW TS BE steps = 43 newton =   43 linear = 43 cuts =  0
> FLOW TS BE Wasted Linear Iterations = 0
> FLOW TS BE SNES time = 218.9 seconds
> 
> NP 2
> 
> FLOW TS BE steps = 67 newton =  176 linear =314 cuts = 13
> FLOW TS BE Wasted Linear Iterations = 208
> FLOW TS BE SNES time = 600.0 seconds
> 
> 
> Robert
> 
> On 06/07/17 21:24, Barry Smith wrote:
>>So on one process the outer linear solver takes a single iteration; this 
>> is because the block Jacobi with LU and one block is a direct solver.
>> 
>> 
>>> 11 KSP preconditioned resid norm 1.131868956745e+00 true resid norm 
>>> 1.526261825526e-05 ||r(i)||/||b|| 1.485509868409e-05
>>> [0] KSPConvergedDefault(): Linear solver has converged. Residual norm 
>>> 2.148515820410e-14 is less than relative tolerance 1.e-07 times 
>>> initial right hand side norm 1.581814306485e-02 at iteration 1
>>> 1 KSP unpreconditioned resid norm 2.148515820410e-14 true resid norm 
>>> 2.148698024622e-14 ||r(i)||/||b|| 1.358375642332e-12
>>> 
>> 
>>On two processes the outer linear solver takes a few iterations to 
>> solve; this is to be expected. 
>> 
>>But what you sent doesn't give any indication about SNES not converging. 
>> Please turn off all inner linear solver monitoring and just run with 
>> -ksp_monitor_true_residual -snes_monitor -snes_linesearch_monitor 
>> -snes_converged_reason
>> 
>>Barry
>> 
>> 
>> 
>> 
>>> On Jul 6, 2017, at 2:03 PM, Robert Annewandter wrote:
>>> 
>>> Hi all,
>>> 
>>> I'd like to understand why the SNES of my CPR-AMG Two-Stage Preconditioner 
>>> (with KSPFGMRES + multipl. PCComposite (PCGalerkin with KSPGMRES + 
>>> BoomerAMG, PCBJacobi + PCLU init) on a 24,000 x 24,000 matrix) struggles to 
>>> converge when using two cores instead of one. Because of the adaptive time 
>>> stepping of the Newton, this leads to severe cuts in time step.
>>> 
>>> This is how I run it with two cores
>>> 
>>> mpirun \
>>>   -n 2 pflotran \
>>>   -pflotranin het.pflinput \
>>>   -ksp_monitor_true_residual \
>>>   -flow_snes_view \
>>>   -flow_snes_converged_reason \
>>>   -flow_sub_1_pc_type bjacobi \
>>>   -flow_sub_1_sub_pc_type lu \
>>>   -flow_sub_1_sub_pc_factor_pivot_in_blocks true\
>>>   -flow_sub_1_sub_pc_factor_nonzeros_along_diagonal \
>>>   -options_left \
>>>   -log_summary \
>>>   -info 
>>> 
>>> 
>>> With one core I get (after grepping the crap away from -info):
>>> 
>>>  Step 32 Time=  1.8E+01 
>>> 
>>> [...]
>>> 
>>>   0 2r: 1.58E-02 2x: 0.00E+00 2u: 0.00E+00 ir: 7.18E-03 iu: 0.00E+00 rsn:   0
>>> [0] SNESComputeJacobian(): Rebuilding preconditioner
>>> Residual norms for flow_ solve.
>>> 0 KSP unpreconditioned resid norm 1.581814306485e-02 true resid norm 
>>> 1.581814306485e-02 ||r(i)||/||b|| 1.e+00
>>>   Residual norms for flow_sub_0_galerkin_ solve.
>>>   0 KSP preconditioned resid norm 5.697603110484e+07 true resid norm 
>>> 5.175721849125e+03 ||r(i)||/||b|| 5.037527476892e+03
>>>   1 KSP preconditioned resid norm 5.041509073319e+06 true resid norm 
>>> 3.251596928176e+02 ||r(i)||/||b|| 3.164777657484e+02
>>>   2 KSP preconditioned resid norm 1.043761838360e+06 true resid norm 
>>> 8.957519558348e+01 ||r(i)||/||b|| 8.718349288342e+01
>>>   3 KSP preconditioned resid norm 1.129189815646e+05 true resid norm 
>>> 2.722436912053e+00 ||r(i)||/||b|| 2.649746479496e+00
>>>   4 KSP preconditioned resid norm 8.829637298082e+04 true resid norm 
>>> 8.026373593492e+00 ||r(i)||/||b|| 7.812065388300e+00
>>>   5 KSP preconditioned resid norm 6.506021637694e+04 true resid norm 
>>> 3.479889319880e+00 ||r(i)||/||b|| 3.386974527698e+00
>>>   6 KSP preconditioned resid norm 6.392263200180e+04 true resid norm 
>>> 3.819202631980e+00 ||r(i)||/||b|| 3.717228003987e+00
>>>   7 KSP preconditioned resid norm 2.464946645480e+04 true resid norm 
>>> 7.329964753388e-01 ||r(i)||/||b|| 7.134251013911e-01
>>>   8 KSP preconditioned resid norm 

Re: [petsc-users] CPR-AMG: SNES with two cores worse than with one

2017-07-06 Thread Barry Smith

   So on one process the outer linear solver takes a single iteration; this is 
because the block Jacobi with LU and one block is a direct solver.

> 11 KSP preconditioned resid norm 1.131868956745e+00 true resid norm 
> 1.526261825526e-05 ||r(i)||/||b|| 1.485509868409e-05
> [0] KSPConvergedDefault(): Linear solver has converged. Residual norm 
> 2.148515820410e-14 is less than relative tolerance 1.e-07 times 
> initial right hand side norm 1.581814306485e-02 at iteration 1
> 1 KSP unpreconditioned resid norm 2.148515820410e-14 true resid norm 
> 2.148698024622e-14 ||r(i)||/||b|| 1.358375642332e-12


   On two processes the outer linear solver takes a few iterations to solve; 
this is to be expected. 

   But what you sent doesn't give any indication about SNES not converging. 
Please turn off all inner linear solver monitoring and just run with 
-ksp_monitor_true_residual -snes_monitor -snes_linesearch_monitor 
-snes_converged_reason
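
   (A minimal sketch of such a run with only the outer monitors switched on; 
the flow_ option prefix is an assumption based on the logs in this thread, and 
-snes_linesearch_monitor is the PETSc spelling of the option.)

mpirun -n 2 pflotran -pflotranin het.pflinput \
  -flow_ksp_monitor_true_residual -flow_snes_monitor \
  -flow_snes_linesearch_monitor -flow_snes_converged_reason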

   Barry



> On Jul 6, 2017, at 2:03 PM, Robert Annewandter wrote:
> 
> Hi all,
> 
> I'd like to understand why the SNES of my CPR-AMG Two-Stage Preconditioner 
> (with KSPFGMRES + multipl. PCComposite (PCGalerkin with KSPGMRES + BoomerAMG, 
> PCBJacobi + PCLU init) on a 24,000 x 24,000 matrix) struggles to converge 
> when using two cores instead of one. Because of the adaptive time stepping of 
> the Newton, this leads to severe cuts in time step.
> 
> This is how I run it with two cores
> 
> mpirun \
>   -n 2 pflotran \
>   -pflotranin het.pflinput \
>   -ksp_monitor_true_residual \
>   -flow_snes_view \
>   -flow_snes_converged_reason \
>   -flow_sub_1_pc_type bjacobi \
>   -flow_sub_1_sub_pc_type lu \
>   -flow_sub_1_sub_pc_factor_pivot_in_blocks true\
>   -flow_sub_1_sub_pc_factor_nonzeros_along_diagonal \
>   -options_left \
>   -log_summary \
>   -info 
> 
> 
> With one core I get (after grepping the crap away from -info):
> 
>  Step 32 Time=  1.8E+01 
> 
> [...]
> 
>   0 2r: 1.58E-02 2x: 0.00E+00 2u: 0.00E+00 ir: 7.18E-03 iu: 0.00E+00 rsn:   0
> [0] SNESComputeJacobian(): Rebuilding preconditioner
> Residual norms for flow_ solve.
> 0 KSP unpreconditioned resid norm 1.581814306485e-02 true resid norm 
> 1.581814306485e-02 ||r(i)||/||b|| 1.e+00
>   Residual norms for flow_sub_0_galerkin_ solve.
>   0 KSP preconditioned resid norm 5.697603110484e+07 true resid norm 
> 5.175721849125e+03 ||r(i)||/||b|| 5.037527476892e+03
>   1 KSP preconditioned resid norm 5.041509073319e+06 true resid norm 
> 3.251596928176e+02 ||r(i)||/||b|| 3.164777657484e+02
>   2 KSP preconditioned resid norm 1.043761838360e+06 true resid norm 
> 8.957519558348e+01 ||r(i)||/||b|| 8.718349288342e+01
>   3 KSP preconditioned resid norm 1.129189815646e+05 true resid norm 
> 2.722436912053e+00 ||r(i)||/||b|| 2.649746479496e+00
>   4 KSP preconditioned resid norm 8.829637298082e+04 true resid norm 
> 8.026373593492e+00 ||r(i)||/||b|| 7.812065388300e+00
>   5 KSP preconditioned resid norm 6.506021637694e+04 true resid norm 
> 3.479889319880e+00 ||r(i)||/||b|| 3.386974527698e+00
>   6 KSP preconditioned resid norm 6.392263200180e+04 true resid norm 
> 3.819202631980e+00 ||r(i)||/||b|| 3.717228003987e+00
>   7 KSP preconditioned resid norm 2.464946645480e+04 true resid norm 
> 7.329964753388e-01 ||r(i)||/||b|| 7.134251013911e-01
>   8 KSP preconditioned resid norm 2.603879153772e+03 true resid norm 
> 2.035525412004e-02 ||r(i)||/||b|| 1.981175861414e-02
>   9 KSP preconditioned resid norm 1.774410462754e+02 true resid norm 
> 3.001214973121e-03 ||r(i)||/||b|| 2.921081026352e-03
> 10 KSP preconditioned resid norm 1.664227038378e+01 true resid norm 
> 3.413136309181e-04 ||r(i)||/||b|| 3.322003855903e-04
> [0] KSPConvergedDefault(): Linear solver has converged. Residual norm 
> 1.131868956745e+00 is less than relative tolerance 1.e-07 times 
> initial right hand side norm 2.067297386780e+07 at iteration 11
> 11 KSP preconditioned resid norm 1.131868956745e+00 true resid norm 
> 1.526261825526e-05 ||r(i)||/||b|| 1.485509868409e-05
> [0] KSPConvergedDefault(): Linear solver has converged. Residual norm 
> 2.148515820410e-14 is less than relative tolerance 1.e-07 times 
> initial right hand side norm 1.581814306485e-02 at iteration 1
> 1 KSP unpreconditioned resid norm 2.148515820410e-14 true resid norm 
> 2.148698024622e-14 ||r(i)||/||b|| 1.358375642332e-12
> [0] SNESSolve_NEWTONLS(): iter=0, linear solve iterations=1
> [0] SNESNEWTONLSCheckResidual_Private(): ||J^T(F-Ax)||/||F-AX|| 
> 3.590873180642e-01 near zero implies inconsistent rhs
> [0] SNESSolve_NEWTONLS(): fnorm=1.5818143064846742e-02, 
> gnorm=1.0695649833687331e-02, ynorm=4.6826522561266171e+02, lssucceed=0
> [0] SNESConvergedDefault(): Converged due to small update length: 
> 4.682652256127e+02 < 1.e-05 * 3.702480426117e+09
>   1 2r: 1.07E-02 2x: 3.70E+09 2u: 4.68E+02 ir: 

[petsc-users] CPR-AMG: SNES with two cores worse than with one

2017-07-06 Thread Robert Annewandter
Hi all,

I'd like to understand why the SNES of my CPR-AMG Two-Stage Preconditioner
(with KSPFGMRES + multipl. PCComposite (PCGalerkin with KSPGMRES +
BoomerAMG, PCBJacobi + PCLU init) on a 24,000 x 24,000 matrix) struggles
to converge when using two cores instead of one. Because of the adaptive
time stepping of the Newton, this leads to severe cuts in time step.

This is how I run it with two cores

mpirun \
  -n 2 pflotran \
  -pflotranin het.pflinput \
  -ksp_monitor_true_residual \
  -flow_snes_view \
  -flow_snes_converged_reason \
  -flow_sub_1_pc_type bjacobi \
  -flow_sub_1_sub_pc_type lu \
  -flow_sub_1_sub_pc_factor_pivot_in_blocks true\
  -flow_sub_1_sub_pc_factor_nonzeros_along_diagonal \
  -options_left \
  -log_summary \
  -info
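
(For reference, a rough option-level sketch of the two-stage preconditioner 
described above, i.e. FGMRES wrapped around a multiplicative PCCOMPOSITE whose 
first stage is a PCGALERKIN-driven GMRES + BoomerAMG solve and whose second 
stage is PCBJACOBI + PCLU. This is only an assumed reconstruction from the 
option prefixes visible in the logs below; the Galerkin restriction and 
interpolation operators have to be set in code, e.g. via 
PCGalerkinSetRestriction() / PCGalerkinSetInterpolation(), and cannot be given 
on the command line.)

  -flow_ksp_type fgmres \
  -flow_pc_type composite \
  -flow_pc_composite_type multiplicative \
  -flow_pc_composite_pcs galerkin,bjacobi \
  -flow_sub_0_galerkin_ksp_type gmres \
  -flow_sub_0_galerkin_pc_type hypre \
  -flow_sub_0_galerkin_pc_hypre_type boomeramg \
  -flow_sub_1_pc_type bjacobi \
  -flow_sub_1_sub_pc_type lu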


With one core I get (after grepping the crap away from -info):

 Step 32 Time=  1.8E+01

[...]

  0 2r: 1.58E-02 2x: 0.00E+00 2u: 0.00E+00 ir: 7.18E-03 iu: 0.00E+00
rsn:   0
[0] SNESComputeJacobian(): Rebuilding preconditioner
Residual norms for flow_ solve.
0 KSP unpreconditioned resid norm 1.581814306485e-02 true resid norm
1.581814306485e-02 ||r(i)||/||b|| 1.e+00
  Residual norms for flow_sub_0_galerkin_ solve.
  0 KSP preconditioned resid norm 5.697603110484e+07 true resid norm
5.175721849125e+03 ||r(i)||/||b|| 5.037527476892e+03
  1 KSP preconditioned resid norm 5.041509073319e+06 true resid norm
3.251596928176e+02 ||r(i)||/||b|| 3.164777657484e+02
  2 KSP preconditioned resid norm 1.043761838360e+06 true resid norm
8.957519558348e+01 ||r(i)||/||b|| 8.718349288342e+01
  3 KSP preconditioned resid norm 1.129189815646e+05 true resid norm
2.722436912053e+00 ||r(i)||/||b|| 2.649746479496e+00
  4 KSP preconditioned resid norm 8.829637298082e+04 true resid norm
8.026373593492e+00 ||r(i)||/||b|| 7.812065388300e+00
  5 KSP preconditioned resid norm 6.506021637694e+04 true resid norm
3.479889319880e+00 ||r(i)||/||b|| 3.386974527698e+00
  6 KSP preconditioned resid norm 6.392263200180e+04 true resid norm
3.819202631980e+00 ||r(i)||/||b|| 3.717228003987e+00
  7 KSP preconditioned resid norm 2.464946645480e+04 true resid norm
7.329964753388e-01 ||r(i)||/||b|| 7.134251013911e-01
  8 KSP preconditioned resid norm 2.603879153772e+03 true resid norm
2.035525412004e-02 ||r(i)||/||b|| 1.981175861414e-02
  9 KSP preconditioned resid norm 1.774410462754e+02 true resid norm
3.001214973121e-03 ||r(i)||/||b|| 2.921081026352e-03
10 KSP preconditioned resid norm 1.664227038378e+01 true resid norm
3.413136309181e-04 ||r(i)||/||b|| 3.322003855903e-04
[0] KSPConvergedDefault(): Linear solver has converged. Residual norm
1.131868956745e+00 is less than relative tolerance 1.e-07
times initial right hand side norm 2.067297386780e+07 at iteration 11
11 KSP preconditioned resid norm 1.131868956745e+00 true resid norm
1.526261825526e-05 ||r(i)||/||b|| 1.485509868409e-05
[0] KSPConvergedDefault(): Linear solver has converged. Residual norm
2.148515820410e-14 is less than relative tolerance 1.e-07
times initial right hand side norm 1.581814306485e-02 at iteration 1
1 KSP unpreconditioned resid norm 2.148515820410e-14 true resid norm
2.148698024622e-14 ||r(i)||/||b|| 1.358375642332e-12
[0] SNESSolve_NEWTONLS(): iter=0, linear solve iterations=1
[0] SNESNEWTONLSCheckResidual_Private(): ||J^T(F-Ax)||/||F-AX||
3.590873180642e-01 near zero implies inconsistent rhs
[0] SNESSolve_NEWTONLS(): fnorm=1.5818143064846742e-02,
gnorm=1.0695649833687331e-02, ynorm=4.6826522561266171e+02, lssucceed=0
[0] SNESConvergedDefault(): Converged due to small update length:
4.682652256127e+02 < 1.e-05 * 3.702480426117e+09
  1 2r: 1.07E-02 2x: 3.70E+09 2u: 4.68E+02 ir: 5.05E-03 iu: 4.77E+01
rsn: stol
Nonlinear flow_ solve converged due to CONVERGED_SNORM_RELATIVE iterations 1
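
(The "small update length" test that fires here is the SNES stol criterion, 
||dx|| < stol * ||x||: with the numbers above, 4.68e+02 < 1.e-05 * 3.70e+09 = 
3.70e+04, so the solve is declared CONVERGED_SNORM_RELATIVE after a single 
Newton step even though the residual norm has only dropped from 1.58E-02 to 
1.07E-02.)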



But with two cores I get:


 Step 32 Time=  1.8E+01

[...]

  0 2r: 6.16E-03 2x: 0.00E+00 2u: 0.00E+00 ir: 3.63E-03 iu: 0.00E+00
rsn:   0
[0] SNESComputeJacobian(): Rebuilding preconditioner

Residual norms for flow_ solve.
0 KSP unpreconditioned resid norm 6.162760088924e-03 true resid norm
6.162760088924e-03 ||r(i)||/||b|| 1.e+00
  Residual norms for flow_sub_0_galerkin_ solve.
  0 KSP preconditioned resid norm 8.994949630499e+08 true resid norm
7.982144380936e-01 ||r(i)||/||b|| 1.e+00
  1 KSP preconditioned resid norm 8.950556502615e+08 true resid norm
1.550138696155e+00 ||r(i)||/||b|| 1.942007839218e+00
  2 KSP preconditioned resid norm 1.044849684205e+08 true resid norm
2.166193480531e+00 ||r(i)||/||b|| 2.713798920631e+00
  3 KSP preconditioned resid norm 8.209708619718e+06 true resid norm
3.076045005154e-01 ||r(i)||/||b|| 3.853657436340e-01
  4 KSP preconditioned resid norm 3.027461352422e+05 true resid norm
1.207731865714e-02 ||r(i)||/||b|| 1.513041869549e-02
  5 KSP preconditioned resid norm 1.595302164817e+04 true resid norm
4.123713694368e-04