Re: [petsc-users] CPR-AMG: SNES with two cores worse than with one

2017-07-07 Thread Robert Annewandter
Thank you Hong!

I've used GMRES via

mpirun \
  -n ${NP} pflotran \
  -pflotranin ${INPUTFILE}.pflinput \
  -flow_ksp_type gmres \
  -flow_pc_type bjacobi \
  -flow_sub_pc_type lu \
  -flow_sub_pc_factor_nonzeros_along_diagonal \
  -snes_monitor

and get:

NP 1

FLOW TS BE steps = 43 newton = 43 linear = 43 cuts = 0
FLOW TS BE Wasted Linear Iterations = 0
FLOW TS BE SNES time = 197.0 seconds

NP 2

FLOW TS BE steps = 43 newton = 43 linear = 770 cuts = 0
FLOW TS BE Wasted Linear Iterations = 0
FLOW TS BE SNES time = 68.7 seconds

Which looks ok to me.

Robert



On 07/07/17 15:49, h...@aspiritech.org wrote:
> What do you get with '-ksp_type gmres' or '-ksp_type bcgs' in parallel
> runs?
> Hong
>
> On Fri, Jul 7, 2017 at 6:05 AM, Robert Annewandter wrote:
>
> Yes indeed, PFLOTRAN cuts timestep after 8 failed iterations of SNES.
>
> I've rerun with -snes_monitor (attached with canonical suffix),
> their -pc_type is always PCBJACOBI + PCLU (though we'd like to try
> SUPERLU in the future, however it works only with -mat_type aij..)
>
>
> The sequential and parallel runs I did  with
>  
> -ksp_type preonly -pc_type lu
> -pc_factor_nonzeros_along_diagonal -snes_monitor
>
> and
>
> -ksp_type preonly -pc_type bjacobi -sub_pc_type lu
> -sub_pc_factor_nonzeros_along_diagonal -snes_monitor
>
> As expected, the sequential runs are both identical and the parallel run
> takes half the time compared to the sequential one.
>
>
>
>
> On 07/07/17 01:20, Barry Smith wrote:
>>    Looks like PFLOTRAN has a maximum number of SNES iterations of 8 and
>> cuts the timestep if that fails.
>>
>>    Please run with -snes_monitor; I don't understand the strange densely
>> packed information that PFLOTRAN is printing.
>>
>>    It looks like the linear solver is converging fine in parallel; normally
>> there is then absolutely no reason that the Newton should behave differently
>> on 2 processors than on 1 unless there is something wrong with the Jacobian.
>> What is the -pc_type for the two cases, LU or your fancy thing?
>>
>>    Please run sequentially and in parallel with -pc_type lu and also with
>> -snes_monitor. We need to fix all the knobs but one in order to understand
>> what is going on.
>>
>>
>>Barry
>>
>>
>>   
>>> On Jul 6, 2017, at 5:11 PM, Robert Annewandter wrote:
>>>
>>> Thanks Barry!
>>>
>>> I've attached log files for np = 1 (SNES time: 218 s) and np = 2 (SNES 
>>> time: 600 s). PFLOTRAN final output:
>>>
>>> NP 1
>>>
>>> FLOW TS BE steps = 43 newton = 43 linear = 43 cuts = 0
>>> FLOW TS BE Wasted Linear Iterations = 0
>>> FLOW TS BE SNES time = 218.9 seconds
>>>
>>> NP 2
>>>
>>> FLOW TS BE steps = 67 newton = 176 linear = 314 cuts = 13
>>> FLOW TS BE Wasted Linear Iterations = 208
>>> FLOW TS BE SNES time = 600.0 seconds
>>>
>>>
>>> Robert
>>>
>>> On 06/07/17 21:24, Barry Smith wrote:
So on one process the outer linear solver takes a single iteration; this is
 because the block Jacobi with LU and one block is a direct solver.


> 11 KSP preconditioned resid norm 1.131868956745e+00 true resid 
> norm 1.526261825526e-05 ||r(i)||/||b|| 1.485509868409e-05
> [0] KSPConvergedDefault(): Linear solver has converged. Residual norm 
> 2.148515820410e-14 is less than relative tolerance 1.e-07 
> times initial right hand side norm 1.581814306485e-02 at iteration 1
> 1 KSP unpreconditioned resid norm 2.148515820410e-14 true resid 
> norm 2.148698024622e-14 ||r(i)||/||b|| 1.358375642332e-12
>
On two processes the outer linear solver takes a few iterations to
 solve; this is to be expected.

But what you sent doesn't give any indication about SNES not
 converging. Please turn off all inner linear solver monitoring and just
 run with -ksp_monitor_true_residual -snes_monitor -snes_linesearch_monitor
 -snes_converged_reason

Barry




> On Jul 6, 2017, at 2:03 PM, Robert Annewandter wrote:
>
> Hi all,
>
> I'd like to understand why the SNES of my CPR-AMG two-stage
> preconditioner (with KSPFGMRES + multiplicative PCComposite (PCGalerkin with
> KSPGMRES + BoomerAMG, PCBJacobi + PCLU init) on a 24,000 x 24,000 matrix)
> struggles to converge when using two cores instead of one. Because of the
> adaptive time stepping of the Newton, this leads to severe cuts in the time
> step.
>
> This is how I run it with two cores
>

Re: [petsc-users] TS for explicit equations

2017-07-07 Thread Lisandro Dalcin
On 6 July 2017 at 17:10, Alejandro D Otero  wrote:
>
> # Set rhs function
> ts.setRHSFunction(self.evalRHSFunction, self.g)
> (where evalRHSFunction has the form f(self, ts, t, Vort, f))
>

Please double-check that the RHS function "outputs" the result in the
passed-in vector "f"; my guess is that you are just returning a new
vector that is eventually ignored.
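
For reference, a minimal sketch of the same contract in PETSc's C API (the
right-hand side computed here is a made-up placeholder, dU/dt = -U, just to
show where the result has to go):

  #include <petscts.h>

  /* Hedged sketch: TS hands the callback the output vector F and expects it
     to be filled in place; any vector the callback creates itself is ignored. */
  static PetscErrorCode MyRHSFunction(TS ts, PetscReal t, Vec U, Vec F, void *ctx)
  {
    PetscErrorCode ierr;
    ierr = VecCopy(U, F);CHKERRQ(ierr);     /* placeholder for the real RHS */
    ierr = VecScale(F, -1.0);CHKERRQ(ierr); /* F <- -U                      */
    return 0;
  }

  /* registered with, e.g., TSSetRHSFunction(ts, r, MyRHSFunction, NULL); */

In petsc4py terms, that means evalRHSFunction(self, ts, t, Vort, f) should end
by filling the passed-in f (for instance with f.setArray(...) or an equivalent
in-place update), rather than returning a newly created vector.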


-- 
Lisandro Dalcin

Research Scientist
Computer, Electrical and Mathematical Sciences & Engineering (CEMSE)
Extreme Computing Research Center (ECRC)
King Abdullah University of Science and Technology (KAUST)
http://ecrc.kaust.edu.sa/

4700 King Abdullah University of Science and Technology
al-Khawarizmi Bldg (Bldg 1), Office # 0109
Thuwal 23955-6900, Kingdom of Saudi Arabia
http://www.kaust.edu.sa

Office Phone: +966 12 808-0459


Re: [petsc-users] CPR-AMG: SNES with two cores worse than with one

2017-07-07 Thread Barry Smith

   I don't have a clue. It looks like the np 2 case just takes a different
trajectory that runs into trouble that doesn't happen with np 1. Since the linear
solves give very good convergence for both np 2 and np 1, I don't think the
preconditioner is really the "problem".

   I absolutely do not like the fact that the code is not using a line search.
If you look at even the sequential case, SNES is sometimes stopping due to snorm
even though the function norm has actually increased. Frankly, I'm very
suspicious of the "solution" of the time integration; it is too much "hit the
engine a few times with the hammer until it does what it wants" engineering.

   What happens if you run the 1 and 2 process case with the "default" Pflotran 
linear solver? 

 
> On Jul 7, 2017, at 6:05 AM, Robert Annewandter wrote:
> 
> Yes indeed, PFLOTRAN cuts timestep after 8 failed iterations of SNES. 
> 
> I've rerun with -snes_monitor (attached with canonical suffix), their 
> -pc_type is always PCBJACOBI + PCLU (though we'd like to try SUPERLU in the 
> future, however it works only with -mat_type aij..)
> 
> 
> The sequential and parallel runs I did  with
>  
> -ksp_type preonly -pc_type lu -pc_factor_nonzeros_along_diagonal 
> -snes_monitor
> 
> and 
> 
> -ksp_type preonly -pc_type bjacobi -sub_pc_type lu 
> -sub_pc_factor_nonzeros_along_diagonal -snes_monitor
> 
> As expected, the sequential runs are both identical and the parallel run takes
> half the time compared to the sequential one.
> 
> 
> 
> 
> On 07/07/17 01:20, Barry Smith wrote:
>>    Looks like PFLOTRAN has a maximum number of SNES iterations of 8 and cuts
>> the timestep if that fails.
>> 
>>    Please run with -snes_monitor; I don't understand the strange densely
>> packed information that PFLOTRAN is printing.
>> 
>>    It looks like the linear solver is converging fine in parallel; normally
>> there is then absolutely no reason that the Newton should behave differently
>> on 2 processors than on 1 unless there is something wrong with the Jacobian.
>> What is the -pc_type for the two cases, LU or your fancy thing?
>> 
>>    Please run sequentially and in parallel with -pc_type lu and also with
>> -snes_monitor. We need to fix all the knobs but one in order to understand
>> what is going on.
>> 
>> 
>>Barry
>> 
>> 
>>   
>> 
>>> On Jul 6, 2017, at 5:11 PM, Robert Annewandter wrote:
>>> 
>>> Thanks Barry!
>>> 
>>> I've attached log files for np = 1 (SNES time: 218 s) and np = 2 (SNES 
>>> time: 600 s). PFLOTRAN final output:
>>> 
>>> NP 1
>>> 
>>> FLOW TS BE steps = 43 newton = 43 linear = 43 cuts = 0
>>> FLOW TS BE Wasted Linear Iterations = 0
>>> FLOW TS BE SNES time = 218.9 seconds
>>> 
>>> NP 2
>>> 
>>> FLOW TS BE steps = 67 newton = 176 linear = 314 cuts = 13
>>> FLOW TS BE Wasted Linear Iterations = 208
>>> FLOW TS BE SNES time = 600.0 seconds
>>> 
>>> 
>>> Robert
>>> 
>>> On 06/07/17 21:24, Barry Smith wrote:
>>> 
So on one process the outer linear solver takes a single iteration; this
 is because the block Jacobi with LU and one block is a direct solver.
 
 
 
> 11 KSP preconditioned resid norm 1.131868956745e+00 true resid norm 
> 1.526261825526e-05 ||r(i)||/||b|| 1.485509868409e-05
> [0] KSPConvergedDefault(): Linear solver has converged. Residual norm 
> 2.148515820410e-14 is less than relative tolerance 1.e-07 
> times initial right hand side norm 1.581814306485e-02 at iteration 1
> 1 KSP unpreconditioned resid norm 2.148515820410e-14 true resid norm 
> 2.148698024622e-14 ||r(i)||/||b|| 1.358375642332e-12
> 
> 
On two processes the outer linear solver takes a few iterations to
 solve; this is to be expected.
 
But what you sent doesn't give any indication about SNES not
 converging. Please turn off all inner linear solver monitoring and just
 run with -ksp_monitor_true_residual -snes_monitor -snes_linesearch_monitor
 -snes_converged_reason
 
Barry
 
 
 
 
 
> On Jul 6, 2017, at 2:03 PM, Robert Annewandter wrote:
> 
> Hi all,
> 
> I'd like to understand why the SNES of my CPR-AMG two-stage preconditioner
> (with KSPFGMRES + multiplicative PCComposite (PCGalerkin with KSPGMRES +
> BoomerAMG, PCBJacobi + PCLU init) on a 24,000 x 24,000 matrix) struggles
> to converge when using two cores instead of one. Because of the adaptive
> time stepping of the Newton, this leads to severe cuts in the time step.
> 
> This is how I run it with two cores
> 
> mpirun \
>   -n 2 pflotran \
>   -pflotranin het.pflinput \
>   -ksp_monitor_true_residual \
>   -flow_snes_view \
>   -flow_snes_converged_reason \
>   -flow_sub_1_pc_type bjacobi \
>   

Re: [petsc-users] Understanding preallocation for MPI

2017-07-07 Thread Matthew Knepley
On Fri, Jul 7, 2017 at 2:49 AM, Dave May  wrote:

> On Fri, 7 Jul 2017 at 11:31, Florian Lindner  wrote:
>
>> Hello,
>>
>> I'm having some struggle understanding the preallocation for MPIAIJ
>> matrices, especially when a value is in off-diagonal
>> vs. diagonal block.
>>
>> The small example program is at https://pastebin.com/67dXnGm3
>>
>> In general it should be parallel, but right now I just run it in serial.
>
>
> When you run this code in serial, the mat type will be MATSEQAIJ. Hence,
> the call to MatMPIAIJSetPreallocation() will have no effect because the mat
> type does not match MPIAIJ. As a result, your code doesn't perform any
> preallocation for SEQAIJ matrices.
>
> In addition to calling MatMPIAIJSetPreallocation(), add a call to
> MatSeqAIJSetPreallocation().
>

To make it easier we now provide


http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatXAIJSetPreallocation.html
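
For a plain scalar AIJ matrix that call might look like the following minimal
sketch (array names are illustrative; the last two arguments, the
upper-triangular counts, only matter for SBAIJ):

  #include <petscmat.h>

  /* Hedged sketch: one preallocation call that covers SEQAIJ and MPIAIJ (and
     the blocked variants), so it works in both serial and parallel runs.    */
  static PetscErrorCode PreallocateXAIJ(Mat A, const PetscInt d_nnz[], const PetscInt o_nnz[])
  {
    PetscErrorCode ierr;
    ierr = MatXAIJSetPreallocation(A, 1, d_nnz, o_nnz, NULL, NULL);CHKERRQ(ierr);
    return 0;
  }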

Matt


> Thanks,
>   Dave
>
>
>>
>> According to my understanding of
>>
>> http://www.mcs.anl.gov/petsc/petsc-3.7/docs/manualpages/Mat/MatMPIAIJSetPreallocation.html
>>
>> an entry is in the diagonal submatrix if its row is in the OwnershipRange
>> and its column is in the OwnershipRangeColumn.
>> That also means that in a serial run, there is only a diagonal submatrix.
>>
>> However, having MAT_NEW_NONZERO_ALLOCATION_ERR set, I get an error when
>>
>> Inserting 6 elements in row 2, though I have exactly
>>
>> 2 o_nnz = 0, d_nnz = 6 (means 6 elements allocated in the diagonal
>> submatrix of row 2)
>>
>> Error is:
>>
>> [0]PETSC ERROR: Argument out of range
>> [0]PETSC ERROR: New nonzero at (2,5) caused a malloc
>> Use MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE) to turn
>> off this check
>>
>>
>> What is wrong with my understanding?
>>
>> Thanks,
>> Florian
>>
>


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

http://www.caam.rice.edu/~mk51/


Re: [petsc-users] -snes_fd for problems with residuals with non-continuous first derivative?

2017-07-07 Thread Barry Smith

  Send the output with the monitoring turned on -snes_monitor 
-snes_linesearch_monitor


> On Jul 7, 2017, at 4:54 AM, Maximilian Hartig  
> wrote:
> 
> If I set the function count higher it results in a diverged line search 
> error. And if I reduce the absolute residual tolerance it continues for one 
> or two more timesteps before it fails again.
> 
> Thanks,
> Max
> 
> 2017-07-05 20:24 GMT+02:00 Barry Smith :
> 
> >39 SNES Function norm 3.758101652998e-08
> >   0 KSP preconditioned resid norm 1.931834799209e-02 true resid norm 
> > 3.758101652998e-08 ||r(i)||/||b|| 1.e+00
> >   1 KSP preconditioned resid norm 5.958979968059e-16 true resid norm 
> > 2.588244438237e-18 ||r(i)||/||b|| 6.887105983875e-11
> 
>    The SNES function norm is pretty small here; are you sure you need it
> smaller?
> 
> > Nonlinear solve did not converge due to DIVERGED_FUNCTION_COUNT i
> 
>    You should just set the function count to, say, 1 billion so it doesn't
> stop the solution process.
> 
> 
> 
> 
> > On Jul 5, 2017, at 11:24 AM, Maximilian Hartig  
> > wrote:
> >
> > The solver goes on for some timesteps, then it gets hung up in step 10. The 
> > output is as follows:
> >
> >   28 SNES Function norm 7.825649542035e-06
> >   0 KSP preconditioned resid norm 5.335581451751e+00 true resid norm 
> > 7.825649542035e-06 ||r(i)||/||b|| 1.e+00
> >   1 KSP preconditioned resid norm 3.875747966342e-13 true resid norm 
> > 3.082821236889e-15 ||r(i)||/||b|| 3.939380648635e-10
> > Linear solve converged due to CONVERGED_RTOL iterations 1
> >29 SNES Function norm 7.825641650401e-06
> >   0 KSP preconditioned resid norm 5.335568660325e+00 true resid norm 
> > 7.825641650401e-06 ||r(i)||/||b|| 1.e+00
> >   1 KSP preconditioned resid norm 1.169064646930e-13 true resid norm 
> > 2.949237553452e-15 ||r(i)||/||b|| 3.768684646199e-10
> > Linear solve converged due to CONVERGED_RTOL iterations 1
> >30 SNES Function norm 7.825633758775e-06
> >   0 KSP preconditioned resid norm 5.335575714181e+00 true resid norm 
> > 7.825633758775e-06 ||r(i)||/||b|| 1.e+00
> >   1 KSP preconditioned resid norm 3.626528075820e-13 true resid norm 
> > 7.454935467844e-16 ||r(i)||/||b|| 9.526302530432e-11
> > Linear solve converged due to CONVERGED_RTOL iterations 1
> >31 SNES Function norm 7.825625867144e-06
> >   0 KSP preconditioned resid norm 5.335574289757e+00 true resid norm 
> > 7.825625867144e-06 ||r(i)||/||b|| 1.e+00
> >   1 KSP preconditioned resid norm 1.220021201426e-13 true resid norm 
> > 6.140701297052e-16 ||r(i)||/||b|| 7.846913973787e-11
> > Linear solve converged due to CONVERGED_RTOL iterations 1
> >32 SNES Function norm 7.825617975525e-06
> >   0 KSP preconditioned resid norm 5.335556211708e+00 true resid norm 
> > 7.825617975525e-06 ||r(i)||/||b|| 1.e+00
> >   1 KSP preconditioned resid norm 3.238615954344e-14 true resid norm 
> > 1.639170816829e-15 ||r(i)||/||b|| 2.094621564655e-10
> > Linear solve converged due to CONVERGED_RTOL iterations 1
> >33 SNES Function norm 7.825610083913e-06
> >   0 KSP preconditioned resid norm 5.335545135100e+00 true resid norm 
> > 7.825610083913e-06 ||r(i)||/||b|| 1.e+00
> >   1 KSP preconditioned resid norm 8.328043430360e-14 true resid norm 
> > 7.902728010647e-16 ||r(i)||/||b|| 1.009854557780e-10
> > Linear solve converged due to CONVERGED_RTOL iterations 1
> >34 SNES Function norm 7.825609294752e-06
> >   0 KSP preconditioned resid norm 5.335519330054e+00 true resid norm 
> > 7.825609294752e-06 ||r(i)||/||b|| 1.e+00
> >   1 KSP preconditioned resid norm 1.176339150741e-13 true resid norm 
> > 2.137240765649e-15 ||r(i)||/||b|| 2.731085446703e-10
> > Linear solve converged due to CONVERGED_RTOL iterations 1
> >35 SNES Function norm 7.825608505592e-06
> >   0 KSP preconditioned resid norm 5.335552250936e+00 true resid norm 
> > 7.825608505592e-06 ||r(i)||/||b|| 1.e+00
> >   1 KSP preconditioned resid norm 2.388837462586e-13 true resid norm 
> > 3.899327292367e-16 ||r(i)||/||b|| 4.982778386601e-11
> > Linear solve converged due to CONVERGED_RTOL iterations 1
> >36 SNES Function norm 7.825607716432e-06
> >   0 KSP preconditioned resid norm 1.017554264638e-01 true resid norm 
> > 7.825607716432e-06 ||r(i)||/||b|| 1.e+00
> >   1 KSP preconditioned resid norm 2.758703070887e-14 true resid norm 
> > 1.990491888342e-16 ||r(i)||/||b|| 2.543562059932e-11
> > Linear solve converged due to CONVERGED_RTOL iterations 1
> >37 SNES Function norm 3.815837655573e-08
> >   0 KSP preconditioned resid norm 7.197730150406e+00 true resid norm 
> > 3.815837655573e-08 ||r(i)||/||b|| 1.e+00
> >   1 KSP preconditioned resid norm 7.114599045039e-12 true resid norm 
> > 

Re: [petsc-users] CPR-AMG: SNES with two cores worse than with one

2017-07-07 Thread h...@aspiritech.org
What do you get with '-ksp_type gmres' or '-ksp_type bcgs' in parallel runs?
Hong

On Fri, Jul 7, 2017 at 6:05 AM, Robert Annewandter
<robert.annewand...@opengosim.com> wrote:

> Yes indeed, PFLOTRAN cuts timestep after 8 failed iterations of SNES.
>
> I've rerun with -snes_monitor (attached with canonical suffix), their
> -pc_type is always PCBJACOBI + PCLU (though we'd like to try SUPERLU in the
> future, however it works only with -mat_type aij..)
>
>
> The sequential and parallel runs I did  with
>
> -ksp_type preonly -pc_type lu -pc_factor_nonzeros_along_diagonal
> -snes_monitor
>
> and
>
> -ksp_type preonly -pc_type bjacobi -sub_pc_type lu
> -sub_pc_factor_nonzeros_along_diagonal -snes_monitor
>
> As expected, the sequential runs are both identical and the parallel run takes
> half the time compared to the sequential one.
>
>
>
>
> On 07/07/17 01:20, Barry Smith wrote:
>
>    Looks like PFLOTRAN has a maximum number of SNES iterations of 8 and cuts
> the timestep if that fails.
>
>    Please run with -snes_monitor; I don't understand the strange densely
> packed information that PFLOTRAN is printing.
>
>    It looks like the linear solver is converging fine in parallel; normally
> there is then absolutely no reason that the Newton should behave differently
> on 2 processors than on 1 unless there is something wrong with the Jacobian.
> What is the -pc_type for the two cases, LU or your fancy thing?
>
>    Please run sequentially and in parallel with -pc_type lu and also with
> -snes_monitor. We need to fix all the knobs but one in order to understand
> what is going on.
>
>
>Barry
>
>
>
>
> On Jul 6, 2017, at 5:11 PM, Robert Annewandter wrote:
>
> Thanks Barry!
>
> I've attached log files for np = 1 (SNES time: 218 s) and np = 2 (SNES time: 
> 600 s). PFLOTRAN final output:
>
> NP 1
>
> FLOW TS BE steps = 43 newton =   43 linear = 43 cuts =  0
> FLOW TS BE Wasted Linear Iterations = 0
> FLOW TS BE SNES time = 218.9 seconds
>
> NP 2
>
> FLOW TS BE steps = 67 newton =  176 linear =314 cuts = 13
> FLOW TS BE Wasted Linear Iterations = 208
> FLOW TS BE SNES time = 600.0 seconds
>
>
> Robert
>
> On 06/07/17 21:24, Barry Smith wrote:
>
>    So on one process the outer linear solver takes a single iteration; this is
> because the block Jacobi with LU and one block is a direct solver.
>
>
>
> 11 KSP preconditioned resid norm 1.131868956745e+00 true resid norm 
> 1.526261825526e-05 ||r(i)||/||b|| 1.485509868409e-05
> [0] KSPConvergedDefault(): Linear solver has converged. Residual norm 
> 2.148515820410e-14 is less than relative tolerance 1.e-07 times 
> initial right hand side norm 1.581814306485e-02 at iteration 1
> 1 KSP unpreconditioned resid norm 2.148515820410e-14 true resid norm 
> 2.148698024622e-14 ||r(i)||/||b|| 1.358375642332e-12
>
>
>    On two processes the outer linear solver takes a few iterations to solve;
> this is to be expected.
>
>    But what you sent doesn't give any indication about SNES not converging.
> Please turn off all inner linear solver monitoring and just run with
> -ksp_monitor_true_residual -snes_monitor -snes_linesearch_monitor
> -snes_converged_reason
>
>Barry
>
>
>
>
>
> On Jul 6, 2017, at 2:03 PM, Robert Annewandter wrote:
>
> Hi all,
>
> I'd like to understand why the SNES of my CPR-AMG two-stage preconditioner
> (with KSPFGMRES + multiplicative PCComposite (PCGalerkin with KSPGMRES + BoomerAMG,
> PCBJacobi + PCLU init) on a 24,000 x 24,000 matrix) struggles to converge
> when using two cores instead of one. Because of the adaptive time stepping of
> the Newton, this leads to severe cuts in the time step.
>
> This is how I run it with two cores
>
> mpirun \
>   -n 2 pflotran \
>   -pflotranin het.pflinput \
>   -ksp_monitor_true_residual \
>   -flow_snes_view \
>   -flow_snes_converged_reason \
>   -flow_sub_1_pc_type bjacobi \
>   -flow_sub_1_sub_pc_type lu \
>   -flow_sub_1_sub_pc_factor_pivot_in_blocks true\
>   -flow_sub_1_sub_pc_factor_nonzeros_along_diagonal \
>   -options_left \
>   -log_summary \
>   -info
>
>
> With one core I get (after grepping the crap away from -info):
>
>  Step 32 Time=  1.8E+01
>
> [...]
>
>   0 2r: 1.58E-02 2x: 0.00E+00 2u: 0.00E+00 ir: 7.18E-03 iu: 0.00E+00 rsn:   0
> [0] SNESComputeJacobian(): Rebuilding preconditioner
> Residual norms for flow_ solve.
> 0 KSP unpreconditioned resid norm 1.581814306485e-02 true resid norm 
> 1.581814306485e-02 ||r(i)||/||b|| 1.e+00
>   Residual norms for flow_sub_0_galerkin_ solve.
>   0 KSP preconditioned resid norm 5.697603110484e+07 true resid norm 
> 5.175721849125e+03 ||r(i)||/||b|| 5.037527476892e+03
>   1 KSP preconditioned resid norm 5.041509073319e+06 true resid norm 
> 3.251596928176e+02 ||r(i)||/||b|| 3.164777657484e+02
>   2 KSP preconditioned resid 

Re: [petsc-users] -snes_fd for problems with residuals with non-continuous first derivative?

2017-07-07 Thread Maximilian Hartig
We do have one second order timestepper (Newmark) put in by Lisandro

Great, I was not aware of that. I'll try to make use of it as soon as I
resolve this issue.

I have a suspicion that my problem is the fact that I'm updating my plastic
strain during the nonlinear solver iterations for the stress. I tried to
create an auxiliary field for the plastic strain and then update it at the
end of each timestep with PetscDSSetUpdate. But the update function does
not seem to get called. Do I have to update the auxiliary field in a
TSMonitor context? I couldn't find any examples of how to update an
auxiliary field.

Thanks,
Max


2017-07-05 18:46 GMT+02:00 Matthew Knepley :

> On Wed, Jul 5, 2017 at 9:39 AM, Maximilian Hartig
> <imilian.har...@gmail.com> wrote:
>
>> I do not clearly understand the discrimination between local and global
>> plasticity. I do have areas where I expect the behaviour to be elastic and
>> other areas where I expect elasto-plastic behaviour.
>> Inertia effects are of importance and hence I need second order temporal
>> derivatives of my displacements. The only way I have found to implement
>> this in Petsc is to create a separate velocity field which I use to then
>> compute ü.
>>
>
> We do have one second order timestepper (Newmark) put in by Lisandro:
>
>   http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/TS/TSALPHA2.html
>
>
>> To account for plasticity, in my understanding I need to introduce at
>> least one additional history variable. In my case this is the effective
>> plastic strain e_p. I then solve the equation of motion
>> (grad(sigma)-rho*ü+F=0) and the consistency condition (sigma_eq -
>> sigma_yield = 0) at the same time. Or try to at least.
>>
>
> Did you look at the DAMASK formulation that Luv suggested?
>
>   Thanks,
>
> Matt
>
>
>> Thanks,
>> Max
>>
>> 2017-06-30 20:49 GMT+02:00 Luv Sharma :
>>
>>> Hi Max,
>>>
>>> I do not understand the equations that you write very clearly.
>>>
>>> Are you looking to implement a “local” and “if” type of isotropic
>>> hardening plasticity? If that is the case, then in my understanding you
>>> need to solve only 1 field equation for the displacement components or for
>>> the strain components. You can look at the following code:
>>> https://github.com/tdegeus/GooseFFT/blob/master/small-strain/laminate/elasto-plasticity.py
>>>
>>> If you are looking for a PETSc based implementation for plasticity
>>> (isotropic/anisotropic) you can look at
>>> https://damask.mpie.de/
>>> I had presented a talk about the same at the PETSc User Meeting last
>>> year.
>>>
>>> As I understand it, additional field equations will only be necessary if
>>> the plasticity or elasticity were “nonlocal”. You may want to look at:
>>> On the role of moving elastic–plastic boundaries in strain gradient
>>> plasticity, R H J Peerlings
>>>
>>> Best regards,
>>> Luv
>>>
>>> On 30 Jun 2017, at 11:52, Maximilian Hartig 
>>> wrote:
>>>
>>> Hi Luv,
>>>
>>> I’m modelling linear hardening (sigma_y = sigma_y0 +
>>> K_iso*epsilon_plast_eqiv) with isotropic plasticity only. So I should not
>>> need to use an iterative method to find the point on the yield surface. I
>>> have three fields and 7 unknowns in total:
>>> Field 0: 3 displacement components
>>> Field 1: 3 velocity components
>>> Field 2: 1 equivalent plastic strain
>>>
>>> It is the solver for these three fields that is not converging. I am
>>> using PetscFE. As residuals for the plastic case (sigma_vM > sigma_yield) I
>>> have:
>>>
>>> Field 0 (displacement):
>>> f0[i] = rho*u_t[uOff[1]+i]
>>> f1[i*dim+j] = sigma_tr[i*dim+j] - 2*mu*sqrt(3/2)*u_t[uOff[2]]*N[i*dim+j]
>>>
>>> where sigma_tr is the trial stress, mu is the shear modulus and N is the
>>> unit deviator tensor normal to the yield surface.
>>>
>>> Field 1 (velocity):
>>> f0[i] = u[uOff[1]+i]-u_t[i]
>>> f1[i*dim+j] = 0
>>>
>>> Field 2 (effective plastic strain):
>>> f0[0] = ||s_tr|| -2*mu*sqrt(3/2)*u_t[uOff[2]]-sqrt(2/3)*sigma_y
>>> f1[i] = 0
>>> where ||s_tr|| is the norm of the deviator stress tensor.
>>>
>>> Field 0 residual is essentially Newton’s second law of motion and Field
>>> 2 residual should be the yield criterion. I might have just fundamentally
>>> misunderstood the equations of plasticity but I cannot seem to find my
>>> mistake.
>>>
>>> Thanks,
>>> Max
>>>
>>>
>>> On 30. Jun 2017, at 11:09, Luv Sharma  wrote:
>>>
>>> Hi Max,
>>>
>>> Is your field solver not converging or the material point solver ;)?
>>>
>>> Best regards,
>>> Luv
>>>
>>> On 30 Jun 2017, at 10:45, Maximilian Hartig 
>>> wrote:
>>>
>>> Hello,
>>>
>>> I’m trying to implement plasticity and have problems getting the Petsc
>>> SNES to converge. To check if my residual formulation is correct I tried
>>> running with -snes_fd for an easy example as the Petsc FAQ suggests. I
>>> cannot seem to get the solver to converge at any cost.
>>> I already 

Re: [petsc-users] -snes_fd for problems with residuals with non-continuous first derivative?

2017-07-07 Thread Maximilian Hartig
I have looked into several books for plasticity, but it cannot hurt to
check out another one. I will do so as soon as I can get a hold of it.
I do have a basic understanding of how to implement plasticity with the
return mapping algorithm. My main problem is that I am lost when trying to
implement it within Petsc. I just might have to create two separate SNES
solver contexts to run inside a single TS solver. In my understanding,
that is not how Petsc is designed to work, though.
I cannot seem to find a clear formulation for dynamic plasticity either. I
looked into the Cristescu book "Dynamic Plasticity", but that was not very
helpful. I have read through:
Rust, W. (2009). Nichtlineare Finite- Elemente-Berechnungen. Wiesbaden:
Springer Vieweg.
Kim, N.-H. (2015). Introduction to Nonlinear Finite Element Analysis. New
York, NY: Springer New York Heidelberg Dodrecht London.
http://doi.org/10.1007/978-1-4419-1746-1
Steinke, P. (2015). Finite-Elemente-Methode (5th ed.). Springer Vieweg.
and they all describe how to set up a stationary (or pseudo-timestepping)
solver for plasticity.
But firstly I'd like to avoid writing my own solver and secondly I need it
to be transient.
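
For what it's worth, here is a bare 1D sketch of the return-mapping update
mentioned above, for linear isotropic hardening in the spirit of Chapter 1 of
Simo & Hughes (names and material parameters are illustrative, and this is
only the local constitutive update, not a PETSc solver):

  #include <math.h>

  /* Hedged sketch: 1D radial return with linear isotropic hardening,
     sigma_y = sigma_y0 + K*alpha.  Returns the updated stress and updates the
     plastic strain *eps_p and the hardening variable *alpha in place.        */
  static double ReturnMap1D(double eps, double *eps_p, double *alpha,
                            double E, double K, double sigma_y0)
  {
    double sigma_tr = E * (eps - *eps_p);                          /* trial stress   */
    double f_tr     = fabs(sigma_tr) - (sigma_y0 + K * (*alpha));  /* trial yield fn */
    if (f_tr <= 0.0) return sigma_tr;                              /* elastic step   */
    {
      double dgamma = f_tr / (E + K);                              /* plastic multiplier */
      double sgn    = (sigma_tr >= 0.0) ? 1.0 : -1.0;
      *eps_p += dgamma * sgn;                                      /* update plastic strain */
      *alpha += dgamma;                                            /* update hardening var  */
      return sigma_tr - E * dgamma * sgn;                          /* mapped-back stress    */
    }
  }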

I have looked at the links you provided, Luv, thanks. DAMASK is too complex a
subject for me to understand quickly. I will need some time to
understand what it does and how to make use of it exactly.
Thanks,
Max

2017-07-05 20:49 GMT+02:00 Luv Sharma :

> Hi ,
>
> I agree with Sanjay and you can still look at the links that I mentioned;
> to get an understanding of different plasticity formulations.
> Honestly, I am also lost in plasticity ;)
>
> The first link is an FFT based spectral solver implementation of
> plasticity.
> I have a PETSc implementation (in python) of the same but without
> plasticity (only hyper-elasticity). I hope to make it open source soon.
> Maybe that will be helpful for you as well!
>
> Best regards,
> Luv
>
> On 5 Jul 2017, at 19:08, Sanjay Govindjee  wrote:
>
> Let me suggest that you grab a hold of Simo and Hughes, Computational
> Inelasticity, Springer-Verlag (1998).  It explains a lot about how to set
> up this problem -- in particular Chapter 1 gives a comprehensive
> one-dimensional tutorial on everything you need to know.
>
>
> On 7/5/17 9:39 AM, Maximilian Hartig wrote:
>
> I do not clearly understand the discrimination between local and global
> plasticity. I do have areas where I expect the behaviour to be elastic and
> other areas where I expect elasto-plastic behaviour.
> Inertia effects are of importance and hence I need second order temporal
> derivatives of my displacements. The only way I have found to implement
> this in Petsc is to create a separate velocity field which I use to then
> compute ü.
> To account for plasticity, in my understanding I need to introduce at
> least one additional history variable. In my case this is the effective
> plastic strain e_p. I then solve the equation of motion
> (grad(sigma)-rho*ü+F=0) and the consistency condition (sigma_eq -
> sigma_yield = 0) at the same time. Or try to at least.
>
> Thanks,
> Max
>
> 2017-06-30 20:49 GMT+02:00 Luv Sharma :
>
>> Hi Max,
>>
>> I do not understand the equations that you write very clearly.
>>
>> Are you looking to implement a “local” and “if” type of isotropic
>> hardening plasticity? If that is the case, then in my understanding you
>> need to solve only 1 field equation for the displacement components or for
>> the strain components. You can look at the following code:
>> https://github.com/tdegeus/GooseFFT/blob/master/small-strain/laminate/elasto-plasticity.py
>>
>> If you are looking for a PETSc based implementation for plasticity
>> (isotropic/anisotropic) you can look at
>> https://damask.mpie.de/
>> I had presented a talk about the same at the PETSc User Meeting last year.
>>
>> As I understand it, additional field equations will only be necessary if
>> the plasticity or elasticity were “nonlocal”. You may want to look at:
>> On the role of moving elastic–plastic boundaries in strain gradient
>> plasticity, R H J Peerlings
>>
>> Best regards,
>> Luv
>>
>> On 30 Jun 2017, at 11:52, Maximilian Hartig 
>> wrote:
>>
>> Hi Luv,
>>
>> I’m modelling linear hardening (sigma_y = sigma_y0 +
>> K_iso*epsilon_plast_eqiv) with isotropic plasticity only. So I should not
>> need to use an iterative method to find the point on the yield surface. I
>> have three fields and 7 unknowns in total:
>> Field 0: 3 displacement components
>> Field 1: 3 velocity components
>> Field 2: 1 equivalent plastic strain
>>
>> It is the solver for these three fields that is not converging. I am
>> using PetscFE. As residuals for the plastic case (sigma_vM > sigma_yield) I
>> have:
>>
>> Field 0 (displacement):
>> f0[i] = rho*u_t[uOff[1]+i]
>> f1[i*dim+j] = sigma_tr[i*dim+j] - 2*mu*sqrt(3/2)*u_t[uOff[2]]*N[i*dim+j]
>>
>> where sigma_tr is the trial 

Re: [petsc-users] -snes_fd for problems with residuals with non-continuous first derivative?

2017-07-07 Thread Maximilian Hartig
If I set the function count higher it results in a diverged line search
error. And if I reduce the absolute residual tolerance it continues for one
or two more timesteps before it fails again.

Thanks,
Max

2017-07-05 20:24 GMT+02:00 Barry Smith :

>
> >39 SNES Function norm 3.758101652998e-08
> >   0 KSP preconditioned resid norm 1.931834799209e-02 true resid norm
> 3.758101652998e-08 ||r(i)||/||b|| 1.e+00
> >   1 KSP preconditioned resid norm 5.958979968059e-16 true resid norm
> 2.588244438237e-18 ||r(i)||/||b|| 6.887105983875e-11
>
>    The SNES function norm is pretty small here; are you sure you need it
> smaller?
>
> > Nonlinear solve did not converge due to DIVERGED_FUNCTION_COUNT i
>
>    You should just set the function count to, say, 1 billion so it doesn't
> stop the solution process.
>
>
>
>
> > On Jul 5, 2017, at 11:24 AM, Maximilian Hartig 
> wrote:
> >
> > The solver goes on for some timesteps, then it gets hung up in step 10.
> The output is as follows:
> >
> >   28 SNES Function norm 7.825649542035e-06
> >   0 KSP preconditioned resid norm 5.335581451751e+00 true resid norm
> 7.825649542035e-06 ||r(i)||/||b|| 1.e+00
> >   1 KSP preconditioned resid norm 3.875747966342e-13 true resid norm
> 3.082821236889e-15 ||r(i)||/||b|| 3.939380648635e-10
> > Linear solve converged due to CONVERGED_RTOL iterations 1
> >29 SNES Function norm 7.825641650401e-06
> >   0 KSP preconditioned resid norm 5.335568660325e+00 true resid norm
> 7.825641650401e-06 ||r(i)||/||b|| 1.e+00
> >   1 KSP preconditioned resid norm 1.169064646930e-13 true resid norm
> 2.949237553452e-15 ||r(i)||/||b|| 3.768684646199e-10
> > Linear solve converged due to CONVERGED_RTOL iterations 1
> >30 SNES Function norm 7.825633758775e-06
> >   0 KSP preconditioned resid norm 5.335575714181e+00 true resid norm
> 7.825633758775e-06 ||r(i)||/||b|| 1.e+00
> >   1 KSP preconditioned resid norm 3.626528075820e-13 true resid norm
> 7.454935467844e-16 ||r(i)||/||b|| 9.526302530432e-11
> > Linear solve converged due to CONVERGED_RTOL iterations 1
> >31 SNES Function norm 7.825625867144e-06
> >   0 KSP preconditioned resid norm 5.335574289757e+00 true resid norm
> 7.825625867144e-06 ||r(i)||/||b|| 1.e+00
> >   1 KSP preconditioned resid norm 1.220021201426e-13 true resid norm
> 6.140701297052e-16 ||r(i)||/||b|| 7.846913973787e-11
> > Linear solve converged due to CONVERGED_RTOL iterations 1
> >32 SNES Function norm 7.825617975525e-06
> >   0 KSP preconditioned resid norm 5.335556211708e+00 true resid norm
> 7.825617975525e-06 ||r(i)||/||b|| 1.e+00
> >   1 KSP preconditioned resid norm 3.238615954344e-14 true resid norm
> 1.639170816829e-15 ||r(i)||/||b|| 2.094621564655e-10
> > Linear solve converged due to CONVERGED_RTOL iterations 1
> >33 SNES Function norm 7.825610083913e-06
> >   0 KSP preconditioned resid norm 5.335545135100e+00 true resid norm
> 7.825610083913e-06 ||r(i)||/||b|| 1.e+00
> >   1 KSP preconditioned resid norm 8.328043430360e-14 true resid norm
> 7.902728010647e-16 ||r(i)||/||b|| 1.009854557780e-10
> > Linear solve converged due to CONVERGED_RTOL iterations 1
> >34 SNES Function norm 7.825609294752e-06
> >   0 KSP preconditioned resid norm 5.335519330054e+00 true resid norm
> 7.825609294752e-06 ||r(i)||/||b|| 1.e+00
> >   1 KSP preconditioned resid norm 1.176339150741e-13 true resid norm
> 2.137240765649e-15 ||r(i)||/||b|| 2.731085446703e-10
> > Linear solve converged due to CONVERGED_RTOL iterations 1
> >35 SNES Function norm 7.825608505592e-06
> >   0 KSP preconditioned resid norm 5.335552250936e+00 true resid norm
> 7.825608505592e-06 ||r(i)||/||b|| 1.e+00
> >   1 KSP preconditioned resid norm 2.388837462586e-13 true resid norm
> 3.899327292367e-16 ||r(i)||/||b|| 4.982778386601e-11
> > Linear solve converged due to CONVERGED_RTOL iterations 1
> >36 SNES Function norm 7.825607716432e-06
> >   0 KSP preconditioned resid norm 1.017554264638e-01 true resid norm
> 7.825607716432e-06 ||r(i)||/||b|| 1.e+00
> >   1 KSP preconditioned resid norm 2.758703070887e-14 true resid norm
> 1.990491888342e-16 ||r(i)||/||b|| 2.543562059932e-11
> > Linear solve converged due to CONVERGED_RTOL iterations 1
> >37 SNES Function norm 3.815837655573e-08
> >   0 KSP preconditioned resid norm 7.197730150406e+00 true resid norm
> 3.815837655573e-08 ||r(i)||/||b|| 1.e+00
> >   1 KSP preconditioned resid norm 7.114599045039e-12 true resid norm
> 4.763264085142e-16 ||r(i)||/||b|| 1.248287929175e-08
> > Linear solve converged due to CONVERGED_RTOL iterations 1
> >38 SNES Function norm 3.790245121284e-08
> >   0 KSP preconditioned resid norm 7.127517791828e+00 true resid norm
> 3.790245121284e-08 ||r(i)||/||b|| 

Re: [petsc-users] Understanding preallocation for MPI

2017-07-07 Thread Dave May
On Fri, 7 Jul 2017 at 11:31, Florian Lindner  wrote:

> Hello,
>
> I'm having some struggle understanding the preallocation for MPIAIJ
> matrices, especially when a value is in off-diagonal
> vs. diagonal block.
>
> The small example program is at https://pastebin.com/67dXnGm3
>
> In general it should be parallel, but right now I just run it in serial.


When you run this code in serial, the mat type will be MATSEQAIJ. Hence,
the call to MatMPIAIJSetPreallocation() will have no effect because the mat
type does not match MPIAIJ. As a result, your code doesn't perform any
preallocation for SEQAIJ matrices.

In addition to calling MatMPIAIJSetPreallocation(), add a call to
MatSeqAIJSetPreallocation().
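
In code, that advice might look like the following minimal sketch (array
names are illustrative; each call is a no-op when the matrix type does not
match, so both can be issued unconditionally):

  #include <petscmat.h>

  /* Hedged sketch: preallocate so the code works whether A ended up as
     SEQAIJ (serial run) or MPIAIJ (parallel run).                        */
  static PetscErrorCode PreallocateAIJ(Mat A, const PetscInt d_nnz[], const PetscInt o_nnz[])
  {
    PetscErrorCode ierr;
    ierr = MatMPIAIJSetPreallocation(A, 0, d_nnz, 0, o_nnz);CHKERRQ(ierr); /* ignored for SEQAIJ */
    ierr = MatSeqAIJSetPreallocation(A, 0, d_nnz);CHKERRQ(ierr);           /* ignored for MPIAIJ */
    return 0;
  }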

Thanks,
  Dave


>
> According to my understanding of
>
>
> http://www.mcs.anl.gov/petsc/petsc-3.7/docs/manualpages/Mat/MatMPIAIJSetPreallocation.html
>
> an entry is in the diagonal submatrix if its row is in the OwnershipRange
> and its column is in the OwnershipRangeColumn.
> That also means that in a serial run, there is only a diagonal submatrix.
>
> However, having MAT_NEW_NONZERO_ALLOCATION_ERR set, I get an error when
>
> Inserting 6 elements in row 2, though I have exactly
>
> 2 o_nnz = 0, d_nnz = 6 (means 6 elements allocated in the diagonal
> submatrix of row 2)
>
> Error is:
>
> [0]PETSC ERROR: Argument out of range
> [0]PETSC ERROR: New nonzero at (2,5) caused a malloc
> Use MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE) to turn
> off this check
>
>
> What is wrong with my understanding?
>
> Thanks,
> Florian
>


[petsc-users] Understanding preallocation for MPI

2017-07-07 Thread Florian Lindner
Hello,

I'm having some trouble understanding the preallocation for MPIAIJ matrices,
especially when a value lands in the off-diagonal
vs. the diagonal block.

The small example program is at https://pastebin.com/67dXnGm3

In general it should be parallel, but right now I just run it in serial.

According to my understanding of

http://www.mcs.anl.gov/petsc/petsc-3.7/docs/manualpages/Mat/MatMPIAIJSetPreallocation.html

an entry is in the diagonal submatrix if its row is in the OwnershipRange and
its column is in the OwnershipRangeColumn.
That also means that in a serial run, there is only a diagonal submatrix.
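
As a minimal sketch of that reading of the documentation (illustrative names;
rstart is the first locally owned row, cstart/cend the local column ownership
range):

  #include <petscmat.h>

  /* Hedged sketch: for a locally owned row i, an entry in column j counts
     toward d_nnz if j falls in the local column range, otherwise o_nnz.    */
  static void CountEntry(PetscInt i, PetscInt j, PetscInt rstart,
                         PetscInt cstart, PetscInt cend,
                         PetscInt d_nnz[], PetscInt o_nnz[])
  {
    if (j >= cstart && j < cend) d_nnz[i - rstart]++;  /* diagonal block     */
    else                         o_nnz[i - rstart]++;  /* off-diagonal block */
  }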

However, having MAT_NEW_NONZERO_ALLOCATION_ERR set, I get an error when

Inserting 6 elements in row 2, though I have exactly

2 o_nnz = 0, d_nnz = 6 (means 6 elements allocated in the diagonal submatrix of 
row 2)

Error is:

[0]PETSC ERROR: Argument out of range
[0]PETSC ERROR: New nonzero at (2,5) caused a malloc
Use MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE) to turn off 
this check


What is wrong with my understanding?

Thanks,
Florian