Re: [petsc-users] Block preconditioning for 3d problem

2019-10-25 Thread Smith, Barry F. via petsc-users



> On Oct 25, 2019, at 1:12 AM, Dave Lee  wrote:
> 
> Thanks for your thoughts Barry, I will take this on board. It's not entirely 
> clear to me how to back out the dx updates for the other variables (velocity, 
> density, temperature) from the SNES at each Newton iteration (for the 
> pressure solve), though I guess I could just do these myself within the 
> SNESComputeFunction() user routine.

   I don't understand your algorithm or approach. Are you just doing a full 
nonlinear solve on (velocity, density, temperature, pressure) but inside the 
linear solve you want/need to specifically handle "pressure" as part of the 
preconditioner? 

> There is no doubt also a mechanism for doing this via FieldSplit that I'm not 
> familiar with.

   If this is the case, then FieldSplit was originally designed for this 
purpose. There are lots of examples showing how to use FieldSplit in a variety 
of ways.
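
   [A minimal sketch of such a FieldSplit setup, for readers of this archive. The
index sets vIS (velocity/density/temperature) and pIS (pressure) and the function
name are placeholders, not anything provided in this thread; error checking is
omitted as in the other snippets here.]

#include <petscksp.h>

/* Treat pressure as its own split so the preconditioner can handle it
   separately (e.g. via a Schur complement). vIS and pIS are assumed to have
   been built by the application. */
void ConfigurePressureFieldSplit(KSP ksp, IS vIS, IS pIS)
{
  PC pc;

  KSPGetPC(ksp, &pc);
  PCSetType(pc, PCFIELDSPLIT);
  PCFieldSplitSetIS(pc, "v", vIS);              /* velocity/density/temperature block */
  PCFieldSplitSetIS(pc, "p", pIS);              /* pressure block */
  PCFieldSplitSetType(pc, PC_COMPOSITE_SCHUR);  /* or another PCCompositeType */
  KSPSetFromOptions(ksp);  /* the same choices can be made with
                              -pc_type fieldsplit -pc_fieldsplit_type schur ... */
}

   [The sub-solver for each split can then be controlled from the command line
with the -fieldsplit_v_* and -fieldsplit_p_* option prefixes.]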

> 
> I've found a partial solution to my problem, doing a Gauss-Seidel two-stage 
> solve for the pressure at each Newton iteration, where I do an intermediate 
> solve based on implicit horizontal gradients, then a final solve (for the 
> current Newton iteration) based on implicit vertical gradients.
> 
> This converges well, but I think there are still time-splitting issues with 
> this uncoupled approach as the full solution goes unstable after a few time 
> steps (dt >> CFL).
> 
> I will take your advice and look at reformulating my outer problem for a SNES 
> (line search) solve.
> 
> Cheers, Dave.
> 
> On Fri, 25 Oct. 2019, 2:52 am Smith, Barry F.,  wrote:
> 
>   If you are "throwing things" away in computing the Jacobian then any 
> expectation that Newton will converge is also thrown out the window. 
> 
>   If you used the PETSc SNES nonlinear solver you would have much more space 
> to experiment in. For example you could use -snes_mf_operator to apply the 
> "true" matrix-vector product (computed with differencing) while building the 
> preconditioner by "throwing things" away and can still expect convergence of 
> the nonlinear solver.  It also has tools like -snes_jacobian_test that 
> compare your Jacobian to one computed by PETSc to see if there are bugs in 
> your Jacobian code and can compute the full Jacobian with differencing and 
> coloring pretty efficiently. It also provides a large set of line search algorithms.
> 
>   Newton is not simple (though it looks it) and you should bring powerful Newton 
> tools to the game, SNES, not home-brew Newton solvers, especially when 
> tackling problems with potentially interesting nonlinearities.
> 
>  Barry
> 
> 
> > On Oct 14, 2019, at 8:18 PM, Dave Lee  wrote:
> > 
> > Hi Barry,
> > 
> > I've replied inline:
> > 
> > On Mon, Oct 14, 2019 at 4:07 PM Smith, Barry F.  wrote:
> > 
> >   Thanks. It seems that the iterative method solves the entire linear 
> > system very accurately. For the last linear solve the initial true residual 
> > norm is around 10^6 (i.e. the norm of b) and the final one around 10^-10
> > 
> >   The initial  true residual norm on each block (part of a slab on one 
> > process) is around 10^-1 so I don't see a huge difference in scaling 
> > between the blocks. The final true residual for blocks is around 10^-17 so 
> > each block is well solved.
> > 
> > Thank you for taking the time to look at this Barry. Glad to know there's 
> > nothing too crazy going on here.
> >  
> > 
> >What do you mean by "quasi-Newton"? Are you doing any kind of line 
> > search or just using the full step from the linear solve? 
> > 
> > I'm using a full step from the linear solve. By "quasi-Newton" I just mean 
> > that the Jacobian is only approximate. I've left out and simplified some 
> > terms in order to make the Schur complement more tractable
> >  
> > 
> >When you solve with the single KSP over all slabs the convergence is 
> > determined by ||r_all||/||b_all|| < tol while slab by slab it is 
> > ||r_slab||/||b_slab|| < tol for each slab. If the slabs had hugely 
> > different scalings this could result in different "answers" but since the 
> > slabs seem to have very similar scalings the "answers" should be very 
> > similar. 
> > 
> >I would do the following. Solve the first linear system both ways and 
> > then compute the difference in the solution to the linear systems computed 
> > both ways. Relative to ||b|| the norm of the difference of the two linear 
> > systems solutions should be 10^-15 indicating both approaches produced 
> > almost identical answers. If this is the case then the problem is 
> > elsewhere, not in the solution of the linear system. On the other hand if 
> > the difference is large then likely the problem is in the formulation of 
> > the linear system; perhaps the combined linear system is not actually 
> > equivalent to the individual linear systems.
> > 
> > Thanks for the tip Barry. I tried this, and the normalised difference 
> > between the full solve and the slab-wise solve is approximately 1.0e-4 to 
> > 1.0e-3 (depending on the slab), so I guess the problem is in how I'm 
> > assembling the linear system for the full solve.

Re: [petsc-users] Block preconditioning for 3d problem

2019-10-24 Thread Smith, Barry F. via petsc-users


  If you are "throwing things" away in computing the Jacobian then any 
expectation that Newton will converge is also thrown out the window. 

  If you used the PETSc SNES nonlinear solver you would have much more space to 
experiment in. For example you could use -snes_mf_operator to apply the "true" 
matrix-vector product (computed with differencing) while building the 
preconditioner by "throwing things" away and can still expect convergence of 
the nonlinear solver.  It also has tools like -snes_jacobian_test that compare 
your Jacobian to one computed by PETSc to see if there are bugs in your 
Jacobian code and can compute the full Jacobian with differencing and coloring 
pretty efficiently. It also provides a large set of line search algorithms.
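
  [A rough sketch of the SNES setup being suggested here, for readers of this
archive. FormFunction and FormApproxJacobian are hypothetical placeholders for
the user's own residual and approximate-Jacobian routines; error checking is
omitted as in the other snippets in this thread.]

#include <petscsnes.h>

/* Hypothetical user callbacks; only the signatures matter for this sketch. */
extern PetscErrorCode FormFunction(SNES snes, Vec x, Vec f, void *ctx);
extern PetscErrorCode FormApproxJacobian(SNES snes, Vec x, Mat J, Mat P, void *ctx);

void SolveWithSNES(MPI_Comm comm, Mat P, Vec x, Vec r, void *ctx)
{
  SNES snes;

  SNESCreate(comm, &snes);
  SNESSetFunction(snes, r, FormFunction, ctx);
  /* P holds the approximate ("things thrown away") Jacobian used to build the
     preconditioner; with -snes_mf_operator the true Jacobian action is applied
     matrix-free by differencing the residual. */
  SNESSetJacobian(snes, P, P, FormApproxJacobian, ctx);
  SNESSetFromOptions(snes);  /* picks up -snes_mf_operator, line search choices,
                                monitors, and the Jacobian testing options
                                mentioned above */
  SNESSolve(snes, NULL, x);
  SNESDestroy(&snes);
}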

  Newton is not simple (though it looks it) and you should bring powerful Newton 
tools to the game, SNES, not home-brew Newton solvers, especially when tackling 
problems with potentially interesting nonlinearities.

 Barry


> On Oct 14, 2019, at 8:18 PM, Dave Lee  wrote:
> 
> Hi Barry,
> 
> I've replied inline:
> 
> On Mon, Oct 14, 2019 at 4:07 PM Smith, Barry F.  wrote:
> 
>   Thanks. It seems that the iterative method solves the entire linear 
> system very accurately. For the last linear solve the initial true residual 
> norm is around 10^6 (i.e. the norm of b) and the final one around 10^-10
> 
>   The initial  true residual norm on each block (part of a slab on one 
> process) is around 10^-1 so I don't see a huge difference in scaling between 
> the blocks. The final true residual for blocks is around 10^-17 so each block 
> is well solved.
> 
> Thank you for taking the time to look at this Barry. Glad to know there's 
> nothing too crazy going on here.
>  
> 
>What do you mean by "quasi-Newton"? Are you doing any kind of line search 
> or just using the full step from the linear solve? 
> 
> I'm using a full step from the linear solve. By "quasi-Newton" I just mean 
> that the Jacobian is only approximate. I've left out and simplified some 
> terms in order to make the Schur complement more tractable
>  
> 
>When you solve with the single KSP over all slabs the convergence is 
> determined by ||r_all||/||b_all|| < tol while slab by slab it is 
> ||r_slab||/||b_slab|| < tol for each slab. If the slabs had hugely different 
> scalings this could result in different "answers" but since the slabs seem to 
> have very similar scalings the "answers" should be very similar. 
> 
>I would do the following. Solve the first linear system both ways and then 
> compute the difference in the solution to the linear systems computed both 
> ways. Relative to ||b|| the norm of the difference of the two linear systems 
> solutions should be 10^-15 indicating both approaches produced almost 
> identical answers. If this is the case then the problem is elsewhere, not in 
> the solution of the linear system. On the other hand if the difference is 
> large then likely the problem is in the formulation of the linear system, 
> perhaps the combined linear system is not actually equivalent to the 
> individual linear systems.
> 
> Thanks for the tip Barry. I tried this, and the normalised difference between 
> the full solve and the slab-wise solve is approximately 1.0e-4 to 1.0e-3 
> (depending on the slab), so I guess the problem is in how I'm assembling the 
> linear system for the full solve.
> 
> I've had a look into this and haven't noticed anything obvious (but I will 
> keep digging). One difference between the full and slab-wise solves is that 
> for the full solve each processor has num_slabs * num_slab_local_dofs DOFs, 
> whereas for the slab-wise solves each processor has just num_slab_local_dofs, 
> so I'm wondering if the striding of the slab_local_dofs is causing issues for 
> the full solve. I'm going to try replacing the PCBJACOBI preconditioner with 
> a PCFIELDSPLIT preconditioner, and use PCFieldSplitSetIS() to specify dofs 
> within a given slab for each field, and see if that helps.
>  
> 
>   Barry
> 
> By the way, using a block Jacobi with Jacobi as the preconditioner on each 
> block  and a really tight tolerance for each block is not a practical method 
> for larger problems, though I think it is ok for this test.
> 
> Thanks Barry. I just wanted to do something simple for this test, but I take 
> your point. 
> 
> Cheers, Dave.
> 
> 
> 
> 
> 
> > On Oct 13, 2019, at 6:20 AM, Dave Lee  wrote:
> > 
> > Hi Barry,
> > 
> > I've answered your questions as follows:
> > 
> >  So each slab lives on all the processes?
> > 
> > Yes, that's correct.
> > 
> > How many processes and how many slabs are you running with?
> > 
> > For my test problem there are 6 processors and 8 slabs
> > 
> > There will be num_slabs blocks per processor times the number of 
> > processors. Depending on the layout of the unknowns each block will consist 
> > of an entire local slab or each block will consist of pieces from each slab.
> > 
> > The internal slab DOF is the minor index. Each processor is responsible for 
> > 144 DOFs per slab, so each processor is responsible for 8x144 DOFs.

Re: [petsc-users] Block preconditioning for 3d problem

2019-10-14 Thread Jed Brown via petsc-users
Dave Lee  writes:

> Thanks Jed,
>
> I will reconfigure my PETSc with MUMPS or SuperLU and see if that helps.
> (My code is configured to only run in parallel on 6*n^2 processors (n^2
> procs on each face of a cubed sphere), which is a little annoying for
> situations like these where a serial LU solver would be handy for testing.)
>
> I've tried setting a tight tolerance on the slab-wise block solve, and
> preconditioning based on the pressure, but these haven't helped.
>
> Unlike other atmosphere codes I'm using slab (layer)-minor indexing, as my
> horizontal discretisation is arbitrary order, while I'm only piecewise
> constant/linear in the vertical, so I've already got the slab-wise dofs
> close together in memory.

Being high order does not necessarily mean more tightly coupled.  A
useful test is to compute a column of the Jacobian inverse (e.g., by
solving with a right-hand side that is a column of the identity) and
plot it to see how it extends in each spatial dimension.
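
[A sketch of that test with PETSc objects, for readers of this archive: ksp is
assumed to already wrap the Jacobian for one Newton iterate, and col is a global
index chosen by the user; error checking is omitted as in the other snippets in
this thread.]

#include <petscksp.h>

/* Solve J y = e_col so that y is one column of J^{-1}; viewing y (e.g. with
   VecView) shows how far the influence of that unknown extends in each
   direction. */
void ProbeJacobianInverseColumn(KSP ksp, Mat J, PetscInt col, Vec *column)
{
  Vec e, y;

  MatCreateVecs(J, &y, &e);                 /* y: solution space, e: rhs space */
  VecSet(e, 0.0);
  VecSetValue(e, col, 1.0, INSERT_VALUES);  /* e = col-th column of the identity */
  VecAssemblyBegin(e);
  VecAssemblyEnd(e);
  KSPSolve(ksp, e, y);                      /* y = J^{-1} e_col */
  VecDestroy(&e);
  *column = y;                              /* caller views and destroys y */
}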

> I feel like maybe the construction of the Hessenberg matrix during the Arnoldi
> iteration and/or the least-squares minimisation for the GMRES solve is
> leading to additional coupling between the slabs, which is maybe degrading
> the convergence, just a thought...

GMRES does not care how you order degrees of freedom.

My guess based on what I've seen in this thread is that there is a bug
in your discretization.


Re: [petsc-users] Block preconditioning for 3d problem

2019-10-14 Thread Dave Lee via petsc-users
Hi Barry,

I've replied inline:

On Mon, Oct 14, 2019 at 4:07 PM Smith, Barry F.  wrote:

>
>   Thanks. It seems that the iterative method solves the entire linear
> system very accurately. For the last linear solve the initial true residual
> norm is around 10^6 (i.e. the norm of b) and the final one around 10^-10
>
>   The initial  true residual norm on each block (part of a slab on one
> process) is around 10^-1 so I don't see a huge difference in scaling
> between the blocks. The final true residual for blocks is around 10^-17 so
> each block is well solved.
>

Thank you for taking the time to look at this Barry. Glad to know there's
nothing too crazy going on here.


>
>What do you mean by "quasi-Newton"? Are you doing any kind of line
> search or just using the full step from the linear solve?
>

I'm using a full step from the linear solve. By "quasi-Newton" I just mean
that the Jacobian is only approximate. I've left out and simplified some
terms in order to make the Schur complement more tractable


>
>When you solve with the single KSP over all slabs the convergence is
> determined by ||r_all||/||b_all|| < tol while slab by slab it is
> ||r_slab||/||b_slab|| < tol for each slab. If the slabs had hugely
> different scalings this could result in different "answers" but since the
> slabs seem to have very similar scalings the "answers" should be very
> similar.
>
>I would do the following. Solve the first linear system both ways and
> then compute the difference in the solution to the linear systems computed
> both ways. Relative to ||b|| the norm of the difference of the two linear
> systems solutions should be 10^-15 indicating both approaches produced
> almost identical answers. If this is the case then the problem is
> elsewhere, not in the solution of the linear system. On the other hand if
> the difference is large then likely the problem is in the formulation of
> the linear system; perhaps the combined linear system is not actually
> equivalent to the individual linear systems.
>

Thanks for the tip Barry. I tried this, and the normalised difference
between the full solve and the slab-wise solve is approximately 1.0e-4 to
1.0e-3 (depending on the slab), so I guess the problem is in how I'm
assembling the linear system for the full solve.

I've had a look into this and haven't noticed anything obvious (but I will
keep digging). One difference between the full and slab-wise solves is that
for the full solve each processor has num_slabs * num_slab_local_dofs
DOFs, whereas for the slab-wise solves each processor has just
num_slab_local_dofs, so I'm wondering if the striding of the
slab_local_dofs is causing issues for the full solve. I'm going to try
replacing the PCBJACOBI preconditioner with a PCFIELDSPLIT preconditioner,
and use PCFieldSplitSetIS() to specify dofs within a given slab for each
field, and see if that helps.
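
[For concreteness, a sketch of such a per-slab PCFieldSplitSetIS() setup,
assuming the ordering described earlier in the thread (slab index major, per-slab
DOF index minor on each process). Variable names and the split naming scheme
("slab0", "slab1", ...) are illustrative only; error checking is omitted as in
the other snippets in this thread.]

#include <petscksp.h>

void ConfigureSlabFieldSplit(KSP ksp, Mat A, PetscInt num_slabs)
{
  PC       pc;
  PetscInt rstart, rend, slab_dofs, i;
  char     name[32];

  MatGetOwnershipRange(A, &rstart, &rend);
  slab_dofs = (rend - rstart) / num_slabs;   /* DOFs per slab on this process */
  KSPGetPC(ksp, &pc);
  PCSetType(pc, PCFIELDSPLIT);
  for (i = 0; i < num_slabs; i++) {
    IS is;
    /* slab i owns a contiguous block of slab_dofs unknowns on this process */
    ISCreateStride(PetscObjectComm((PetscObject)A), slab_dofs,
                   rstart + i*slab_dofs, 1, &is);
    PetscSNPrintf(name, sizeof(name), "slab%d", (int)i);
    PCFieldSplitSetIS(pc, name, is);
    ISDestroy(&is);
  }
  KSPSetFromOptions(ksp);  /* per-slab sub-solver options: -fieldsplit_slab<i>_... */
}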


>
>   Barry
>
> By the way, using a block Jacobi with Jacobi as the preconditioner on each
> block  and a really tight tolerance for each block is not a practical
> method for larger problems, though I think it is ok for this test.
>

Thanks Barry. I just wanted to do something simple for this test, but I
take your point.

Cheers, Dave.

>
>
>
>
>
> > On Oct 13, 2019, at 6:20 AM, Dave Lee  wrote:
> >
> > Hi Barry,
> >
> > I've answered your questions as follows:
> >
> >  So each slab lives on all the processes?
> >
> > Yes, that's correct.
> >
> > How many processes and how many slabs are you running with?
> >
> > For my test problem there are 6 processors and 8 slabs
> >
> > There will be num_slabs blocks per processor times the number of
> processors. Depending on the layout of the unknowns each block will consist
> of an entire local slab or each block will consist of pieces from each slab.
> >
> > The internal slab DOF is the minor index. Each processor is responsible
> for 144 DOFs per slab, so each processor is responsible for 8x144 DOFs.
> When I run with -ksp_view there are 8 blocks on each processor (48 total),
> each with 144 DOFs, so I'm assuming everything is OK there.
> >
> >  Please run the linear solver with
> >
> > -ksp_monitor_true_residual -sub_ksp_monitor_true_residual
> -ksp_converged_reason -sub_ksp_converged_reason
> >
> >
> >   Do you get huge iteration counts somewhere?
> >
> > Not that I can see, file is attached (log.gmres)
> >
> > Next run the same thing but add -ksp_type fgmres does that change
> anything?
> >
> > I don't think so. The convergence is still the same, file is attached
> (log.fgmres).
> >
> > One thing I've noticed - I've been dumping the index of the slab with
> the maximum normalised error: norm_L2(delta_x) / norm_L2(x). If I solve for
> each slab separately, then the maximum normalised errors are typically in
> slab 3 (of 0-7). This makes sense as this is probably the most dynamically
> active layer of the atmosphere. However if I solve for all the slabs in a
> single solve (as is the cases for these tests), then the greate

Re: [petsc-users] Block preconditioning for 3d problem

2019-10-13 Thread Smith, Barry F. via petsc-users


  Thanks. It seems that the iterative method solves the entire linear system 
very accurately. For the last linear solve the initial true residual norm is 
around 10^6 (i.e. the norm of b) and the final one around 10^-10

  The initial  true residual norm on each block (part of a slab on one process) 
is around 10^-1 so I don't see a huge difference in scaling between the blocks. 
The final true residual for blocks is around 10^-17 so each block is well 
solved.

   What do you mean by "quasi-Newton"? Are you doing any kind of line search or 
just using the full step from the linear solve? 

   When you solve with the single KSP over all slabs the convergence is 
determined by ||r_all||/||b_all|| < tol while slab by slab it is 
||r_slab||/||b_slab|| < tol for each slab. If the slabs had hugely different 
scalings this could result in different "answers" but since the slabs seem to 
have very similar scalings the "answers" should be very similar. 

   I would do the following. Solve the first linear system both ways and then 
compute the difference in the solution to the linear systems computed both 
ways. Relative to ||b|| the norm of the difference of the two linear systems 
solutions should be 10^-15 indicating both approaches produced almost identical 
answers. If this is the case then the problem is elsewhere, not in the solution 
of the linear system. On the other hand if the difference is large then likely 
the problem is in the formulation of the linear system; perhaps the combined 
linear system is not actually equivalent to the individual linear systems.
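
   [A sketch of that check, for readers of this archive: x_one_solve and
x_slabwise are the two solution vectors, with the gathering of the slab-by-slab
results into a single Vec left to the application; error checking is omitted as
in the other snippets in this thread.]

#include <petscvec.h>

/* Report ||x_one_solve - x_slabwise|| / ||b|| for the first linear system. */
void CompareSolves(Vec x_one_solve, Vec x_slabwise, Vec b)
{
  Vec       d;
  PetscReal nrm_d, nrm_b;

  VecDuplicate(b, &d);
  VecWAXPY(d, -1.0, x_slabwise, x_one_solve);  /* d = x_one_solve - x_slabwise */
  VecNorm(d, NORM_2, &nrm_d);
  VecNorm(b, NORM_2, &nrm_b);
  PetscPrintf(PETSC_COMM_WORLD, "||x_one - x_slab|| / ||b|| = %g\n",
              (double)(nrm_d / nrm_b));
  VecDestroy(&d);
}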

  Barry

By the way, using a block Jacobi with Jacobi as the preconditioner on each 
block  and a really tight tolerance for each block is not a practical method 
for larger problems, though I think it is ok for this test.

   



> On Oct 13, 2019, at 6:20 AM, Dave Lee  wrote:
> 
> Hi Barry,
> 
> I've answered your questions as follows:
> 
>  So each slab lives on all the processes?
> 
> Yes, that's correct.
> 
> How many processes and how many slabs are you running with?
> 
> For my test problem there are 6 processors and 8 slabs
> 
> There will be num_slabs blocks per processor times the number of processors. 
> Depending on the layout of the unknowns each block will consist of an entire 
> local slab or each block will consist of pieces from each slab.
> 
> The internal slab DOF is the minor index. Each processor is responsible for 
> 144 DOFs per slab, so each processor is responsible for 8x144 DOFs. When I 
> run with -ksp_view there are 8 blocks on each processor (48 total), each with 
> 144 DOFs, so I'm assuming everything is OK there.
> 
>  Please run the linear solver with
> 
> -ksp_monitor_true_residual -sub_ksp_monitor_true_residual 
> -ksp_converged_reason -sub_ksp_converged_reason
> 
> 
>   Do you get huge iteration counts somewhere?
> 
> Not that I can see, file is attached (log.gmres)
> 
> Next run the same thing but add -ksp_type fgmres does that change anything?
> 
> I don't think so. The convergence is still the same, file is attached 
> (log.fgmres).
> 
> One thing I've noticed - I've been dumping the index of the slab with the 
> maximum normalised error: norm_L2(delta_x) / norm_L2(x). If I solve for each 
> slab separately, then the maximum normalised errors are typically in slab 3 
> (of 0-7). This makes sense as this is probably the most dynamically active 
> layer of the atmosphere. However if I solve for all the slabs in a single 
> solve (as is the case for these tests), then the greatest error is always in 
> slab 7, the top of the atmosphere. Obviously the pressure is lowest in this 
> level, so I'm wondering if the GMRES iterations are just weighting the 
> lower layers of the atmosphere too much.
> 
> Cheers, Dave.
> 
> On Sun, Oct 13, 2019 at 3:42 AM Smith, Barry F.  wrote:
> 
> 
> > On Oct 10, 2019, at 2:58 AM, Dave Lee via petsc-users 
> >  wrote:
> > 
> > Hi PETSc,
> > 
> > I have a nonlinear 3D problem for a set of uncoupled 2D slabs. (Which I 
> > ultimately want to couple once this problem is solved).
> > 
> > When I solve the inner linear problem for each of these 2D slabs 
> > individually (using KSPGMRES) the convergence of the outer nonlinear 
> > problem is good. However when I solve the inner linear problem as a single 
> > 3D problem (with no coupling between the 2D slabs, so the matrix is 
> > effectively a set of uncoupled blocks, one for each 2D slab) the outer 
> > nonlinear convergence degrades dramatically.
> > 
> > Note that I am not using SNES, just my own quasi-Newton approach for the 
> > outer nonlinear problem.
> > 
> > I suspect that the way to recover the convergence for the 3D coupled 
> > problem is to use some sort of PCBJACOBI or PCFIELDSPLIT preconditioner, 
> > but I'm not entirely sure. I've tried following this example:
> > https://www.mcs.anl.gov/petsc/petsc-current/src/ksp/ksp/examples/tutorials/ex7.c.html
> > but without any improvement in the convergence. 
> > 
> > On each processor the slab index is the major index and the internal slab 
> > DOF index is the minor index. The parallel decomposition is within the 2D 
> > slab dimensions only, not between slabs.

Re: [petsc-users] Block preconditioning for 3d problem

2019-10-12 Thread Smith, Barry F. via petsc-users



> On Oct 10, 2019, at 2:58 AM, Dave Lee via petsc-users 
>  wrote:
> 
> Hi PETSc,
> 
> I have a nonlinear 3D problem for a set of uncoupled 2D slabs. (Which I 
> ultimately want to couple once this problem is solved).
> 
> When I solve the inner linear problem for each of these 2D slabs individually 
> (using KSPGMRES) the convergence of the outer nonlinear problem is good. 
> However when I solve the inner linear problem as a single 3D problem (with no 
> coupling between the 2D slabs, so the matrix is effectively a set of 
> uncoupled blocks, one for each 2D slab) the outer nonlinear convergence 
> degrades dramatically.
> 
> Note that I am not using SNES, just my own quasi-Newton approach for the 
> outer nonlinear problem.
> 
> I suspect that the way to recover the convergence for the 3D coupled problem 
> is to use some sort of PCBJACOBI or PCFIELDSPLIT preconditioner, but I'm not 
> entirely sure. I've tried following this example:
> https://www.mcs.anl.gov/petsc/petsc-current/src/ksp/ksp/examples/tutorials/ex7.c.html
> but without any improvement in the convergence. 
> 
> On each processor the slab index is the major index and the internal slab DOF 
> index is the minor index. The parallel decomposition is within the 2D slab 
> dimensions only, not between slabs.

  So each slab lives on all the processes?

> 
> I'm configuring the 3D inner linear solver as

  How many processes and how many slabs are you running with? 

> 
> KSPGetPC(ksp, &pc);
> PCSetType(pc, PCBJACOBI);
> PCBJacobiSetLocalBlocks(pc, num_slabs, NULL);

There will be num_slabs blocks per processor times the number of 
processors. Depending on the layout of the unknowns each block will consist of 
an entire local slab or each block will consist of pieces from each slab. 

> KSPSetUp(ksp);
> PCBJacobiGetSubKSP(pc, &nlocal, &first_local, &subksp);
> for(int ii = 0; ii < nlocal; ii++) {
> KSPGetPC(subksp[ii], &subpc);
> PCSetType(subpc, PCJACOBI);
> KSPSetType(subksp[ii], KSPGMRES);
> KSPSetTolerances(subksp[ii], 1.e-12, PETSC_DEFAULT, PETSC_DEFAULT, 
> PETSC_DEFAULT);
> }

  Please run the linear solver with 

-ksp_monitor_true_residual -sub_ksp_monitor_true_residual 
-ksp_converged_reason -sub_ksp_converged_reason 


  Do you get huge iteration counts somewhere? 

  Next run the same thing but add -ksp_type fgmres does that change anything?

  Send all the results in an attached file.

Barry



> 
> However this does not seem to change the outer nonlinear convergence. When I 
> run with ksp_view the local block sizes look correct - number of local 2D 
> slab DOFs for each block on each processor.
> 
> Any ideas on what sort of KSP/PC configuration I should be using to recover 
> the convergence of the uncoupled 2D slab linear solve for this 3D linear 
> solve? Should I be rethinking my node indexing to help with this?
> 
> Cheers, Dave.
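
[For reference, a commented version of the sub-solver configuration quoted
above, as a sketch with the same PETSc calls; error checking is omitted as in
the original snippet.]

#include <petscksp.h>

/* One block per slab on each process, GMRES + point Jacobi inside each block,
   solved to a tight tolerance. */
void ConfigureSlabBlockJacobi(KSP ksp, PetscInt num_slabs)
{
  PC       pc, subpc;
  KSP     *subksp;
  PetscInt nlocal, first_local, ii;

  KSPGetPC(ksp, &pc);
  PCSetType(pc, PCBJACOBI);
  PCBJacobiSetLocalBlocks(pc, num_slabs, NULL);  /* equal-sized blocks, one per slab */
  KSPSetUp(ksp);                                 /* the sub-KSPs exist only after setup */
  PCBJacobiGetSubKSP(pc, &nlocal, &first_local, &subksp);
  for (ii = 0; ii < nlocal; ii++) {
    KSPGetPC(subksp[ii], &subpc);
    PCSetType(subpc, PCJACOBI);                  /* diagonal preconditioning inside each block */
    KSPSetType(subksp[ii], KSPGMRES);
    KSPSetTolerances(subksp[ii], 1.e-12, PETSC_DEFAULT, PETSC_DEFAULT, PETSC_DEFAULT);
  }
}

[The monitoring options listed in this message (-ksp_monitor_true_residual,
-sub_ksp_monitor_true_residual, -ksp_converged_reason, -sub_ksp_converged_reason)
then report both the outer and the per-block convergence.]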



Re: [petsc-users] Block preconditioning for 3d problem

2019-10-11 Thread Jed Brown via petsc-users
Dave Lee  writes:

> Hey Jed,
>
> Right now they are uncoupled purely for testing purposes. I just want to
> get the uncoupled problem right before moving on to the coupled problem.

Okay, well you may not want to change your dof ordering to put slabs
together since you'll likely have some parts of your code that do
vertical processing (column integrals/physics or PDE stiffness in the
vertical).  It's common for climate/weather codes to use some horizontal
index (within a blocking scheme or an entire subdomain array index) as
the innermost to facilitate vectorization of the column processing
(which has high enough arithmetic intensity that vectorization matters).

> When I add the vertical dynamics back in both the outer nonlinear
> convergence and the inner linear convergence are roughly the same as for
> when I solve for the 2D slabs only as a 3D system.

Which you've said is terrible compared to separate 2D slabs, so we have
to get to the bottom of that.  We outlined a number of tests you can do
to understand why it isn't converging as expected, but haven't heard
back on the results so it'll be hard to advise further.

> I like your scaling idea. I tried using the pressure multiplied by the mass
> matrix as a preconditioner for the inner linear problem, but this didn't
> help. Perhaps some sort of scaling is the way to go though.
>
> Cheers, Dave.
>
> On Fri, Oct 11, 2019 at 2:45 PM Jed Brown  wrote:
>
>> Why are your slabs decoupled at present?  (Have you done a transform in
>> the vertical?)  Is the linear convergence significantly different when
>> you include the multiple layers?
>>
>> Dave Lee  writes:
>>
>> > Hi Jed and Mark,
>> >
>> > thanks for your helpful comments. Yes the nonlinear outer problem is
>> > uncoupled between the slabs, it is only the linear inner problem where
>> they
>> > are coupled.
>> >
>> > I've tried to make the slab DOFs close in memory, and also tried using a
>> > tight tolerance on the outer KSP (1.0e-20), but without success.
>> >
>> > You make some really good points about scaling issues Jed and Mark. This
>> is
>> > a solve for a global atmosphere, and each 2D slab is a horizontal layer
>> of
>> > the atmosphere, so the pressure (which the linear solve is for) will vary
>> > dramatically between slabs. Perhaps I can additionally precondition the
>> > linear problem to normalise the pressure in each slab so that they stay
>> > close to unity.
>> >
>> > Cheers, Dave.
>> >
>> > On Fri, Oct 11, 2019 at 2:52 AM Mark Adams  wrote:
>> >
>> >> I can think of a few sources of coupling in the solver: 1) line search
>> and
>> >> 2) Krylov, and 3) the residual test (scaling issues). You could turn
>> >> linesearch off and use Richardson (with a fixed number of iterations) or
>> >> exact solves as Jed suggested. As far as scaling can you use the same NL
>> >> problem on each slab? This should fix all the problems anyway. Or, on
>> the
>> >> good decoupled solves, if the true residual is of the same scale and
>> *all*
>> >> of the slabs converge well then you should be OK on scaling. If this
>> works
>> >> then start adding stuff back in and see what breaks it.
>> >>
>> >> On Thu, Oct 10, 2019 at 11:01 AM Jed Brown via petsc-users <
>> >> petsc-users@mcs.anl.gov> wrote:
>> >>
>> >>> Dave Lee via petsc-users  writes:
>> >>>
>> >>> > Hi PETSc,
>> >>> >
>> >>> > I have a nonlinear 3D problem for a set of uncoupled 2D slabs.
>> (Which I
>> >>> > ultimately want to couple once this problem is solved).
>> >>> >
>> >>> > When I solve the inner linear problem for each of these 2D slabs
>> >>> > individually (using KSPGMRES) the convergence of the outer nonlinear
>> >>> > problem is good. However when I solve the inner linear problem as a
>> >>> single
>> >>> > 3D problem (with no coupling between the 2D slabs, so the matrix is
>> >>> > effectively a set of uncoupled blocks, one for each 2D slab) the
>> outer
>> >>> > nonlinear convergence degrades dramatically.
>> >>>
>> >>> Is the nonlinear problem also decoupled between slabs?
>> >>>
>> >>> If you solve the linear problem accurately (tight tolerances on the
>> >>> outer KSP, or global direct solve), is the outer nonlinear convergence
>> >>> good again?  If not, test that your Jacobian is correct (it likely
>> isn't
>> >>> or you have inconsistent scaling leading to ill conditioning).  SNES
>> has
>> >>> automatic tests for that, but you aren't using SNES so you'd have to
>> >>> write some code.
>> >>>
>> >>> What happens if you run the 2D problem (where convergence is currently
>> >>> good) with much smaller subdomains (or -pc_type pbjacobi)?
>> >>>
>> >>> > Note that I am not using SNES, just my own quasi-Newton approach for
>> the
>> >>> > outer nonlinear problem.
>> >>> >
>> >>> > I suspect that the way to recover the convergence for the 3D coupled
>> >>> > problem is to use some sort of PCBJACOBI or PCFIELDSPLIT
>> preconditioner,
>> >>> > but I'm not entirely sure. I've tried following this example:
>> >>> >
>> >>>
>> https://www.mcs

Re: [petsc-users] Block preconditioning for 3d problem

2019-10-10 Thread Jed Brown via petsc-users
Why are your slabs decoupled at present?  (Have you done a transform in
the vertical?)  Is the linear convergence significantly different when
you include the multiple layers?

Dave Lee  writes:

> Hi Jed and Mark,
>
> thanks for your helpful comments. Yes the nonlinear outer problem is
> uncoupled between the slabs, it is only the linear inner problem where they
> are coupled.
>
> I've tried to make the slab DOFs close in memory, and also tried using a
> tight tolerance on the outer KSP (1.0e-20), but without success.
>
> You make some really good points about scaling issues Jed and Mark. This is
> a solve for a global atmosphere, and each 2D slab is a horizontal layer of
> the atmosphere, so the pressure (which the linear solve is for) will vary
> dramatically between slabs. Perhaps I can additionally precondition the
> linear problem to normalise the pressure in each slab so that they stay
> close to unity.
>
> Cheers, Dave.
>
> On Fri, Oct 11, 2019 at 2:52 AM Mark Adams  wrote:
>
>> I can think of a few sources of coupling in the solver: 1) line search and
>> 2) Krylov, and 3) the residual test (scaling issues). You could turn
>> linesearch off and use Richardson (with a fixed number of iterations) or
>> exact solves as Jed suggested. As far as scaling can you use the same NL
>> problem on each slab? This should fix all the problems anyway. Or, on the
> good decoupled solves, if the true residual is of the same scale and *all*
>> of the slabs converge well then you should be OK on scaling. If this works
>> then start adding stuff back in and see what breaks it.
>>
>> On Thu, Oct 10, 2019 at 11:01 AM Jed Brown via petsc-users <
>> petsc-users@mcs.anl.gov> wrote:
>>
>>> Dave Lee via petsc-users  writes:
>>>
>>> > Hi PETSc,
>>> >
>>> > I have a nonlinear 3D problem for a set of uncoupled 2D slabs. (Which I
>>> > ultimately want to couple once this problem is solved).
>>> >
>>> > When I solve the inner linear problem for each of these 2D slabs
>>> > individually (using KSPGMRES) the convergence of the outer nonlinear
>>> > problem is good. However when I solve the inner linear problem as a
>>> single
>>> > 3D problem (with no coupling between the 2D slabs, so the matrix is
>>> > effectively a set of uncoupled blocks, one for each 2D slab) the outer
>>> > nonlinear convergence degrades dramatically.
>>>
>>> Is the nonlinear problem also decoupled between slabs?
>>>
>>> If you solve the linear problem accurately (tight tolerances on the
>>> outer KSP, or global direct solve), is the outer nonlinear convergence
>>> good again?  If not, test that your Jacobian is correct (it likely isn't
>>> or you have inconsistent scaling leading to ill conditioning).  SNES has
>>> automatic tests for that, but you aren't using SNES so you'd have to
>>> write some code.
>>>
>>> What happens if you run the 2D problem (where convergence is currently
>>> good) with much smaller subdomains (or -pc_type pbjacobi)?
>>>
>>> > Note that I am not using SNES, just my own quasi-Newton approach for the
>>> > outer nonlinear problem.
>>> >
>>> > I suspect that the way to recover the convergence for the 3D coupled
>>> > problem is to use some sort of PCBJACOBI or PCFIELDSPLIT preconditioner,
>>> > but I'm not entirely sure. I've tried following this example:
>>> >
>>> https://www.mcs.anl.gov/petsc/petsc-current/src/ksp/ksp/examples/tutorials/ex7.c.html
>>> > but without any improvement in the convergence.
>>> >
>>> > On each processor the slab index is the major index and the internal
>>> slab
>>> > DOF index is the minor index. The parallel decomposition is within the
>>> 2D
>>> > slab dimensions only, not between slabs.
>>>
>>> For convergence, you usually want the direction of tight coupling
>>> (sounds like that is within slabs) to be close in memory.
>>>
>>> In general, use -ksp_monitor_true_residual -ksp_converged_reason.
>>>
>>> > I'm configuring the 3D inner linear solver as
>>> >
>>> > KSPGetPC(ksp, &pc);
>>> > PCSetType(pc, PCBJACOBI);
>>> > PCBJacobiSetLocalBlocks(pc, num_slabs, NULL);
>>> > KSPSetUp(ksp);
>>> > PCBJacobiGetSubKSP(pc, &nlocal, &first_local, &subksp);
>>> > for(int ii = 0; ii < nlocal; ii++) {
>>> > KSPGetPC(subksp[ii], &subpc);
>>> > PCSetType(subpc, PCJACOBI);
>>> > KSPSetType(subksp[ii], KSPGMRES);
>>> > KSPSetTolerances(subksp[ii], 1.e-12, PETSC_DEFAULT, PETSC_DEFAULT,
>>> > PETSC_DEFAULT);
>>> > }
>>> >
>>> > However this does not seem to change the outer nonlinear convergence.
>>> When
>>> > I run with ksp_view the local block sizes look correct - number of
>>> local 2D
>>> > slab DOFs for each block on each processor.
>>> >
>>> > Any ideas on what sort of KSP/PC configuration I should be using to
>>> recover
>>> > the convergence of the uncoupled 2D slab linear solve for this 3D linear
>>> > solve? Should I be rethinking my node indexing to help with this?
>>> >
>>> > Cheers, Dave.
>>>
>>


Re: [petsc-users] Block preconditioning for 3d problem

2019-10-10 Thread Dave Lee via petsc-users
Hi Jed and Mark,

thanks for your helpful comments. Yes the nonlinear outer problem is
uncoupled between the slabs, it is only the linear inner problem where they
are coupled.

I've tried to make the slab DOFs close in memory, and also tried using a
tight tolerance on the outer KSP (1.0e-20), but without success.

You make some really good points about scaling issues Jed and Mark. This is
a solve for a global atmosphere, and each 2D slab is a horizontal layer of
the atmosphere, so the pressure (which the linear solve is for) will vary
dramatically between slabs. Perhaps I can additionally precondition the
linear problem to normalise the pressure in each slab so that they stay
close to unity.
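
[One possible way to realise that kind of per-slab normalisation, sketched here
as symmetric diagonal scaling of the linear system. This is only an illustration
of the idea, not something tested in this thread; how the per-slab scaling
vector "scale" is built (e.g. from a reference pressure per slab) is application
code, and error checking is omitted as in the other snippets here.]

#include <petscksp.h>

/* Scale A x = b to (D A D)(D^{-1} x) = D b with D = diag(scale), so that all
   slabs contribute residuals of comparable size. */
void ScaleSystemBySlab(Mat A, Vec b, Vec x, Vec scale)
{
  MatDiagonalScale(A, scale, scale);  /* A <- diag(scale) A diag(scale) */
  VecPointwiseMult(b, b, scale);      /* b <- diag(scale) b */
  VecPointwiseDivide(x, x, scale);    /* initial guess in the scaled variables */
  /* after KSPSolve, recover the physical solution with
     VecPointwiseMult(x, x, scale); */
}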

Cheers, Dave.

On Fri, Oct 11, 2019 at 2:52 AM Mark Adams  wrote:

> I can think of a few sources of coupling in the solver: 1) line search and
> 2) Krylov, and 3) the residual test (scaling issues). You could turn
> linesearch off and use Richardson (with a fixed number of iterations) or
> exact solves as Jed suggested. As far as scaling can you use the same NL
> problem on each slab? This should fix all the problems anyway. Or, on the
> good decoupled solves, if the true residual is of the same scale and *all*
> of the slabs converge well then you should be OK on scaling. If this works
> then start adding stuff back in and see what breaks it.
>
> On Thu, Oct 10, 2019 at 11:01 AM Jed Brown via petsc-users <
> petsc-users@mcs.anl.gov> wrote:
>
>> Dave Lee via petsc-users  writes:
>>
>> > Hi PETSc,
>> >
>> > I have a nonlinear 3D problem for a set of uncoupled 2D slabs. (Which I
>> > ultimately want to couple once this problem is solved).
>> >
>> > When I solve the inner linear problem for each of these 2D slabs
>> > individually (using KSPGMRES) the convergence of the outer nonlinear
>> > problem is good. However when I solve the inner linear problem as a
>> single
>> > 3D problem (with no coupling between the 2D slabs, so the matrix is
>> > effectively a set of uncoupled blocks, one for each 2D slab) the outer
>> > nonlinear convergence degrades dramatically.
>>
>> Is the nonlinear problem also decoupled between slabs?
>>
>> If you solve the linear problem accurately (tight tolerances on the
>> outer KSP, or global direct solve), is the outer nonlinear convergence
>> good again?  If not, test that your Jacobian is correct (it likely isn't
>> or you have inconsistent scaling leading to ill conditioning).  SNES has
>> automatic tests for that, but you aren't using SNES so you'd have to
>> write some code.
>>
>> What happens if you run the 2D problem (where convergence is currently
>> good) with much smaller subdomains (or -pc_type pbjacobi)?
>>
>> > Note that I am not using SNES, just my own quasi-Newton approach for the
>> > outer nonlinear problem.
>> >
>> > I suspect that the way to recover the convergence for the 3D coupled
>> > problem is to use some sort of PCBJACOBI or PCFIELDSPLIT preconditioner,
>> > but I'm not entirely sure. I've tried following this example:
>> >
>> https://www.mcs.anl.gov/petsc/petsc-current/src/ksp/ksp/examples/tutorials/ex7.c.html
>> > but without any improvement in the convergence.
>> >
>> > On each processor the slab index is the major index and the internal
>> slab
>> > DOF index is the minor index. The parallel decomposition is within the
>> 2D
>> > slab dimensions only, not between slabs.
>>
>> For convergence, you usually want the direction of tight coupling
>> (sounds like that is within slabs) to be close in memory.
>>
>> In general, use -ksp_monitor_true_residual -ksp_converged_reason.
>>
>> > I'm configuring the 3D inner linear solver as
>> >
>> > KSPGetPC(ksp, &pc);
>> > PCSetType(pc, PCBJACOBI);
>> > PCBJacobiSetLocalBlocks(pc, num_slabs, NULL);
>> > KSPSetUp(ksp);
>> > PCBJacobiGetSubKSP(pc, &nlocal, &first_local, &subksp);
>> > for(int ii = 0; ii < nlocal; ii++) {
>> > KSPGetPC(subksp[ii], &subpc);
>> > PCSetType(subpc, PCJACOBI);
>> > KSPSetType(subksp[ii], KSPGMRES);
>> > KSPSetTolerances(subksp[ii], 1.e-12, PETSC_DEFAULT, PETSC_DEFAULT,
>> > PETSC_DEFAULT);
>> > }
>> >
>> > However this does not seem to change the outer nonlinear convergence.
>> When
>> > I run with ksp_view the local block sizes look correct - number of
>> local 2D
>> > slab DOFs for each block on each processor.
>> >
>> > Any ideas on what sort of KSP/PC configuration I should be using to
>> recover
>> > the convergence of the uncoupled 2D slab linear solve for this 3D linear
>> > solve? Should I be rethinking my node indexing to help with this?
>> >
>> > Cheers, Dave.
>>
>


Re: [petsc-users] Block preconditioning for 3d problem

2019-10-10 Thread Mark Adams via petsc-users
I can think of a few sources of coupling in the solver: 1) line search and
2) Krylov, and 3) the residual test (scaling issues). You could turn
linesearch off and use Richardson (with a fixed number of iterations) or
exact solves as Jed suggested. As far as scaling can you use the same NL
problem on each slab? This should fix all the problems anyway. Or, on the
good decoupled solves, if the true residual is of the same scale and *all*
of the slabs converge well then you should be OK on scaling. If this works
then start adding stuff back in and see what breaks it.
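
[A sketch of those two knobs in code, assuming a SNES outer solve; the same
choices can be made with -snes_linesearch_type basic and -ksp_type richardson
-ksp_max_it 10 on the command line. The iteration count of 10 is arbitrary, and
error checking is omitted as in the other snippets in this thread.]

#include <petscsnes.h>

/* (1) turn the line search off (take full Newton steps), and (2) run the inner
   KSP as Richardson for a fixed number of iterations rather than to a tolerance. */
void UseFullStepsAndFixedRichardson(SNES snes)
{
  SNESLineSearch ls;
  KSP            ksp;

  SNESGetLineSearch(snes, &ls);
  SNESLineSearchSetType(ls, SNESLINESEARCHBASIC);  /* accept the full Newton step */

  SNESGetKSP(snes, &ksp);
  KSPSetType(ksp, KSPRICHARDSON);
  KSPSetTolerances(ksp, PETSC_DEFAULT, PETSC_DEFAULT, PETSC_DEFAULT, 10);
  KSPSetNormType(ksp, KSP_NORM_NONE);              /* no convergence test, just 10 sweeps */
}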

On Thu, Oct 10, 2019 at 11:01 AM Jed Brown via petsc-users <
petsc-users@mcs.anl.gov> wrote:

> Dave Lee via petsc-users  writes:
>
> > Hi PETSc,
> >
> > I have a nonlinear 3D problem for a set of uncoupled 2D slabs. (Which I
> > ultimately want to couple once this problem is solved).
> >
> > When I solve the inner linear problem for each of these 2D slabs
> > individually (using KSPGMRES) the convergence of the outer nonlinear
> > problem is good. However when I solve the inner linear problem as a
> single
> > 3D problem (with no coupling between the 2D slabs, so the matrix is
> > effectively a set of uncoupled blocks, one for each 2D slab) the outer
> > nonlinear convergence degrades dramatically.
>
> Is the nonlinear problem also decoupled between slabs?
>
> If you solve the linear problem accurately (tight tolerances on the
> outer KSP, or global direct solve), is the outer nonlinear convergence
> good again?  If not, test that your Jacobian is correct (it likely isn't
> or you have inconsistent scaling leading to ill conditioning).  SNES has
> automatic tests for that, but you aren't using SNES so you'd have to
> write some code.
>
> What happens if you run the 2D problem (where convergence is currently
> good) with much smaller subdomains (or -pc_type pbjacobi)?
>
> > Note that I am not using SNES, just my own quasi-Newton approach for the
> > outer nonlinear problem.
> >
> > I suspect that the way to recover the convergence for the 3D coupled
> > problem is to use some sort of PCBJACOBI or PCFIELDSPLIT preconditioner,
> > but I'm not entirely sure. I've tried following this example:
> >
> https://www.mcs.anl.gov/petsc/petsc-current/src/ksp/ksp/examples/tutorials/ex7.c.html
> > but without any improvement in the convergence.
> >
> > On each processor the slab index is the major index and the internal slab
> > DOF index is the minor index. The parallel decomposition is within the 2D
> > slab dimensions only, not between slabs.
>
> For convergence, you usually want the direction of tight coupling
> (sounds like that is within slabs) to be close in memory.
>
> In general, use -ksp_monitor_true_residual -ksp_converged_reason.
>
> > I'm configuring the 3D inner linear solver as
> >
> > KSPGetPC(ksp, &pc);
> > PCSetType(pc, PCBJACOBI);
> > PCBJacobiSetLocalBlocks(pc, num_slabs, NULL);
> > KSPSetUp(ksp);
> > PCBJacobiGetSubKSP(pc, &nlocal, &first_local, &subksp);
> > for(int ii = 0; ii < nlocal; ii++) {
> > KSPGetPC(subksp[ii], &subpc);
> > PCSetType(subpc, PCJACOBI);
> > KSPSetType(subksp[ii], KSPGMRES);
> > KSPSetTolerances(subksp[ii], 1.e-12, PETSC_DEFAULT, PETSC_DEFAULT,
> > PETSC_DEFAULT);
> > }
> >
> > However this does not seem to change the outer nonlinear convergence.
> When
> > I run with ksp_view the local block sizes look correct - number of local
> 2D
> > slab DOFs for each block on each processor.
> >
> > Any ideas on what sort of KSP/PC configuration I should be using to
> recover
> > the convergence of the uncoupled 2D slab linear solve for this 3D linear
> > solve? Should I be rethinking my node indexing to help with this?
> >
> > Cheers, Dave.
>


Re: [petsc-users] Block preconditioning for 3d problem

2019-10-10 Thread Jed Brown via petsc-users
Dave Lee via petsc-users  writes:

> Hi PETSc,
>
> I have a nonlinear 3D problem for a set of uncoupled 2D slabs. (Which I
> ultimately want to couple once this problem is solved).
>
> When I solve the inner linear problem for each of these 2D slabs
> individually (using KSPGMRES) the convergence of the outer nonlinear
> problem is good. However when I solve the inner linear problem as a single
> 3D problem (with no coupling between the 2D slabs, so the matrix is
> effectively a set of uncoupled blocks, one for each 2D slab) the outer
> nonlinear convergence degrades dramatically.

Is the nonlinear problem also decoupled between slabs?

If you solve the linear problem accurately (tight tolerances on the
outer KSP, or global direct solve), is the outer nonlinear convergence
good again?  If not, test that your Jacobian is correct (it likely isn't
or you have inconsistent scaling leading to ill conditioning).  SNES has
automatic tests for that, but you aren't using SNES so you'd have to
write some code.

What happens if you run the 2D problem (where convergence is currently
good) with much smaller subdomains (or -pc_type pbjacobi)?

> Note that I am not using SNES, just my own quasi-Newton approach for the
> outer nonlinear problem.
>
> I suspect that the way to recover the convergence for the 3D coupled
> problem is to use some sort of PCBJACOBI or PCFIELDSPLIT preconditioner,
> but I'm not entirely sure. I've tried following this example:
> https://www.mcs.anl.gov/petsc/petsc-current/src/ksp/ksp/examples/tutorials/ex7.c.html
> but without any improvement in the convergence.
>
> On each processor the slab index is the major index and the internal slab
> DOF index is the minor index. The parallel decomposition is within the 2D
> slab dimensions only, not between slabs.

For convergence, you usually want the direction of tight coupling
(sounds like that is within slabs) to be close in memory.

In general, use -ksp_monitor_true_residual -ksp_converged_reason.

> I'm configuring the 3D inner linear solver as
>
> KSPGetPC(ksp, &pc);
> PCSetType(pc, PCBJACOBI);
> PCBJacobiSetLocalBlocks(pc, num_slabs, NULL);
> KSPSetUp(ksp);
> PCBJacobiGetSubKSP(pc, &nlocal, &first_local, &subksp);
> for(int ii = 0; ii < nlocal; ii++) {
> KSPGetPC(subksp[ii], &subpc);
> PCSetType(subpc, PCJACOBI);
> KSPSetType(subksp[ii], KSPGMRES);
> KSPSetTolerances(subksp[ii], 1.e-12, PETSC_DEFAULT, PETSC_DEFAULT,
> PETSC_DEFAULT);
> }
>
> However this does not seem to change the outer nonlinear convergence. When
> I run with ksp_view the local block sizes look correct - number of local 2D
> slab DOFs for each block on each processor.
>
> Any ideas on what sort of KSP/PC configuration I should be using to recover
> the convergence of the uncoupled 2D slab linear solve for this 3D linear
> solve? Should I be rethinking my node indexing to help with this?
>
> Cheers, Dave.