Re: [petsc-users] PETSc tutorial example generating unexpected result

2019-01-14 Thread Mark Adams via petsc-users
This code puts the error in x:

call VecAXPY(x,neg_one,u,ierr)

I suspect you printed these numbers after this statement.
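
For readers following along, here is an illustrative C analogue of the sequence in that tutorial (a sketch, not the Fortran source): the exact solution u is all ones, b is manufactured from it, and after the final VecAXPY the vector x holds the error x - u rather than the solution, which is why values near the solver tolerance are printed.

  /* Illustrative C analogue of the sequence in ex2f.F90 (not the tutorial code). */
  #include <petscksp.h>

  PetscErrorCode SolveAndCheck(KSP ksp, Mat A, Vec x, Vec b, Vec u)
  {
    PetscReal      norm;
    PetscErrorCode ierr;

    ierr = VecSet(u,1.0);CHKERRQ(ierr);            /* exact solution: all 1.0    */
    ierr = MatMult(A,u,b);CHKERRQ(ierr);           /* manufactured RHS b = A*u   */
    ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);        /* here x is ~1.0 everywhere  */
    /* printing x here (VecView) would show the expected 1.0 entries             */
    ierr = VecAXPY(x,-1.0,u);CHKERRQ(ierr);        /* x <- x - u: now the ERROR  */
    ierr = VecNorm(x,NORM_2,&norm);CHKERRQ(ierr);  /* printing x now shows ~1e-7 */
    ierr = PetscPrintf(PETSC_COMM_WORLD,"Norm of error %g\n",(double)norm);CHKERRQ(ierr);
    return 0;
  }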

On Mon, Jan 14, 2019 at 8:10 PM Maahi Talukder via petsc-users <
petsc-users@mcs.anl.gov> wrote:

> Hello all,
>
> I compiled and ran the example *ex2f.F90* located in
> */home/maahi/petsc/src/ksp/ksp/examples/tutorials*.
> As in the code the right-hand-side vector, b, was computed assuming an
> exact solution vector with all elements equal to 1.0, all the values of
> vector 'x' should have been 1.0. But after running the code as it was
> provided, the values of vector x that I get are the following:
>  2.72322e-07
> 3.81437e-07
> 1.58922e-07
> 3.81437e-07
> 2.38878e-07
> -6.65645e-07
> 1.58922e-07
> -6.65645e-07
> -2.51219e-07
>
> I am wondering what went wrong? Could you please help me with some insight
> into this?
>
> Regards,
> Maahi Talukder
> Clarkson University
>


Re: [petsc-users] MPI Iterative solver crash on HPC

2019-01-14 Thread Mark Adams via petsc-users
The memory requested is an insane number. You may need to build PETSc with
64-bit integers (a 64-bit PetscInt).
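
For context: 18446744066024411136 is 2^64 minus roughly 7.7e9, i.e. a negative (overflowed) count reinterpreted as an unsigned byte size, and the matrix's 2,122,821,366 nonzeros sit just under the 2^31 - 1 limit of a 32-bit PetscInt, so intermediate counts in the symbolic MatTransposeMatMult can easily overflow. A hedged sketch of a reconfigure follows; only --with-64-bit-indices is essential, and you would carry over the existing compiler and --download options, checking that each external package supports 64-bit indices.

  # Sketch: rebuild PETSc with 64-bit PetscInt in a fresh PETSC_ARCH,
  # then append your existing compiler and --download-* options.
  ./configure PETSC_ARCH=linux-cumulus-64idx \
              --with-64-bit-indices=1 \
              --with-scalar-type=complex \
              --with-debugging=no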

On Mon, Jan 14, 2019 at 8:06 AM Sal Am via petsc-users <
petsc-users@mcs.anl.gov> wrote:

> I ran it by: mpiexec -n 8 valgrind --tool=memcheck -q --num-callers=20
> --log-file=valgrind.log-osa.%p ./solveCSys -malloc off -ksp_type bcgs
> -pc_type gamg -mattransposematmult_via scalable -ksp_monitor -log_view
> The error:
>
> [6]PETSC ERROR: - Error Message
> --
> [6]PETSC ERROR: Out of memory. This could be due to allocating
> [6]PETSC ERROR: too large an object or bleeding by not properly
> [6]PETSC ERROR: destroying unneeded objects.
> [6]PETSC ERROR: Memory allocated 0 Memory used by process 39398023168
> [6]PETSC ERROR: Try running with -malloc_dump or -malloc_log for info.
> [6]PETSC ERROR: Memory requested 18446744066024411136
> [6]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html
> for trouble shooting.
> [6]PETSC ERROR: Petsc Release Version 3.10.2, unknown
> [6]PETSC ERROR: ./solveCSys on a linux-cumulus-debug named r02g03 by
> vef002 Mon Jan 14 08:54:45 2019
> [6]PETSC ERROR: Configure options PETSC_ARCH=linux-cumulus-debug
> --with-cc=/usr/local/depot/openmpi-3.1.1-gcc-7.3.0/bin/mpicc
> --with-fc=/usr/local/depot/openmpi-3.1.1-gcc-7.3.0/bin/mpifort
> --with-cxx=/usr/local/depot/openmpi-3.1.1-gcc-7.3.0/bin/mpicxx
> --download-parmetis --download-metis --download-ptscotch
> --download-superlu_dist --download-mumps --with-scalar-type=complex
> --with-debugging=yes --download-scalapack --download-superlu
> --download-fblaslapack=1 --download-cmake
> [6]PETSC ERROR: #1 MatTransposeMatMultSymbolic_MPIAIJ_MPIAIJ() line 1989
> in /lustre/home/vef002/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c
> [6]PETSC ERROR: #2 PetscMallocA() line 397 in
> /lustre/home/vef002/petsc/src/sys/memory/mal.c
> [6]PETSC ERROR: #3 MatTransposeMatMultSymbolic_MPIAIJ_MPIAIJ() line 1989
> in /lustre/home/vef002/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c
> [6]PETSC ERROR: #4 MatTransposeMatMult_MPIAIJ_MPIAIJ() line 1203 in
> /lustre/home/vef002/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c
> [6]PETSC ERROR: #5 MatTransposeMatMult() line 9984 in
> /lustre/home/vef002/petsc/src/mat/interface/matrix.c
> [6]PETSC ERROR: #6 PCGAMGCoarsen_AGG() line 882 in
> /lustre/home/vef002/petsc/src/ksp/pc/impls/gamg/agg.c
> [6]PETSC ERROR: #7 PCSetUp_GAMG() line 522 in
> /lustre/home/vef002/petsc/src/ksp/pc/impls/gamg/gamg.c
> [6]PETSC ERROR: #8 PCSetUp() line 932 in
> /lustre/home/vef002/petsc/src/ksp/pc/interface/precon.c
> [6]PETSC ERROR: #9 KSPSetUp() line 391 in
> /lustre/home/vef002/petsc/src/ksp/ksp/interface/itfunc.c
> [6]PETSC ERROR: #10 main() line 68 in
> /home/vef002/debugenv/tests/solveCmplxLinearSys.cpp
> [6]PETSC ERROR: PETSc Option Table entries:
> [6]PETSC ERROR: -ksp_monitor
> [6]PETSC ERROR: -ksp_type bcgs
> [6]PETSC ERROR: -log_view
> [6]PETSC ERROR: -malloc off
> [6]PETSC ERROR: -mattransposematmult_via scalable
> [6]PETSC ERROR: -pc_type gamg
> [6]PETSC ERROR: End of Error Message ---send entire
> error message to petsc-ma...@mcs.anl.gov--
> --
> MPI_ABORT was invoked on rank 6 in communicator MPI_COMM_WORLD
> with errorcode 55.
>
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
> --
>
> The "memory requested" number seems astronomical, though. This was done on a
> machine with 500 GB of memory; during my last check it was using 30 GB
> mem/process, not sure if it increased suddenly. The file size of the
> matrix is 40 GB, still the same matrix:
> 2 122 821 366 non-zero elements
> 25 947 279 x 25 947 279
>
>
>
> On Fri, Jan 11, 2019 at 5:34 PM Zhang, Hong  wrote:
>
>> Add option '-mattransposematmult_via scalable'
>> Hong
>>
>> On Fri, Jan 11, 2019 at 9:52 AM Zhang, Junchao via petsc-users <
>> petsc-users@mcs.anl.gov> wrote:
>>
>>> I saw the following error message in your first email.
>>>
>>> [0]PETSC ERROR: Out of memory. This could be due to allocating
>>> [0]PETSC ERROR: too large an object or bleeding by not properly
>>> [0]PETSC ERROR: destroying unneeded objects.
>>>
>>> Probably the matrix is too large. You can try with more compute nodes,
>>> for example, use 8 nodes instead of 2, and see what happens.
>>>
>>> --Junchao Zhang
>>>
>>>
>>> On Fri, Jan 11, 2019 at 7:45 AM Sal Am via petsc-users <
>>> petsc-users@mcs.anl.gov> wrote:
>>>
 Using a larger problem set with 2B non-zero elements and a matrix of
 25M x 25M I get the following error:
 [4]PETSC ERROR:
 
 [4]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation,
 probably memory access out of range
 

Re: [petsc-users] Example for adaptive high-order FEM in PETSc

2019-01-13 Thread Mark Adams via petsc-users
>
>
>
> > The Riemann solver computes the flux and the Jacobian is assembled in
> PETSc.
>
> If you run this example with an implicit integrator, it uses FD coloring
> to compute the Jacobian.  Manav wants an analytic Jacobian.  I was
> hoping Matt could answer this; I haven't used Plex in that way.
>

Oh right. That test is explicit.

Matt is on vacation but he is doing email.


Re: [petsc-users] Example for adaptive high-order FEM in PETSc

2019-01-13 Thread Mark Adams via petsc-users
On Sat, Jan 12, 2019 at 6:59 PM Manav Bhatia via petsc-users <
petsc-users@mcs.anl.gov> wrote:

> I have been studying the source code and the manual entries in the
> documentation and have a better understanding of setting up a high-order DG
> analysis.
>
> I have been able to identify the routines that set the order of the space
> and its continuity (CG/DG).
>
> The following is still unclear and I would appreciate some guidance:
>
> — An implicit DG solve of the inviscid Euler equations would require
> Jacobian contributions from the Riemann Solver at element interfaces. I do
> not see the ability to define Jacobian contribution in
> PetscDSSetRiemannSolver. What would be the best approach to do this?
>

See
https://www.mcs.anl.gov/petsc/petsc-current/src/ts/examples/tutorials/ex11.c.html

The Riemann solver computes the flux and the Jacobian is assembled in PETSc.


>
> — It appears that the only two choices for polynomials are Lagrange and
> “simple”. I am not clear what the “simple” basis implies. Are these just
> powers of x, y and z?
>
> — Are Legendre polynomials available elsewhere in PETSc? I saw some
> mention of it in TS, but not in FE.
>
> Thanks,
> Manav
>
>
>
> > On Jan 12, 2019, at 2:46 AM, Manav Bhatia  wrote:
> >
> > While looking through some examples I see that both PetscDSSetJacobian
> and DMPlexSNESComputeJacobianFEM are being used to set the functions to
> evaluate the Jacobian contributions on each rank.
> >
> > This is my understanding after studying these (please correct if I am
> wrong):
> >
> > — The PetscDSSetJacobian (and its corresponding residual routine) does
> not seem to have the concept of an element. Instead, it provides the
> derivatives of solution at a point and requires that the coefficients for
> each form g_0, g_1, … be evaluated. Then, PETSc combines that in the
> background.
> >
> > — DMPlexSNESComputeJacobianFEM is a method that calls PetscDSSetJacobian
> to compute the Jacobian.
> >
> >
> > If this is correct, I am a bit confused about the following:
> >
> > — For a DG solve, I can see that PetscDSSetRiemannSolver can be used to
> compute the convective flux at the interface of two elements. However, I
> don’t see how the Jacobian contribution of this flux can be computed and
> added to the system Jacobian.
> >
> > — A more conventional FEM assembly iterates over elements, then over the
> quadrature points, where the shape functions and their derivatives are
> initialized and used for computation of residual and Jacobian. Is it
> possible to follow this procedure in PETSc with DMPlex/DMForest and still
> use multigrid? Is there an example that demonstrates this?
> >
> > — Is there some form of an index for elements that exist on the local
> processor that can be used to iterate on local elements?
> >
> >
> > I have tried to look, but could not find a document that outlines these
> concepts. Is there one that exists and I have missed it?
> >
> > I would greatly appreciate some direction.
> >
> > Regards,
> > Manav
> >
> >
> >
> >
> >> On Jan 11, 2019, at 10:03 PM, Manav Bhatia 
> wrote:
> >>
> >> Hi,
> >>
> >>  I am interested in setting up a high-order DG FEM solution with PETSc
> on an octree mesh with adaptivity and geometric multigrid preconditioning.
> Is there an example that demonstrates this with either DMForest or DMPlex.
> >>
> >> Thanks,
> >> Manav
> >>
> >>
> >
>
>


Re: [petsc-users] GAMG scaling

2018-12-24 Thread Mark Adams via petsc-users
On Tue, Dec 25, 2018 at 12:10 AM Jed Brown  wrote:

> Mark Adams  writes:
>
> > On Mon, Dec 24, 2018 at 4:56 PM Jed Brown  wrote:
> >
> >> Mark Adams via petsc-users  writes:
> >>
> >> > Anyway, my data for this is in my SC 2004 paper (MakeNextMat_private
> in
> >> > attached, NB, this is code that I wrote in grad school). It is memory
> >> > efficient and simple, just four nested loops i,j,I,J: C(I,J) =
> >> > P(i,I)*A(i,j)*P(j,J). In eyeballing the numbers and from new data
> that I
> >> am
> >> > getting from my bone modeling colleagues, that use this old code on
> >> > Stampede now, the times look reasonable compared to GAMG. This is
> >> optimized
> >> > for elasticity, where I unroll loops (so it is really six nested
> loops).
> >>
> >> Is the A above meant to include some ghosted rows?
> >>
> >
> > You could but I was thinking of having i in the outer loop. In C(I,J) =
> > P(i,I)*A(i,j)*P(j,J), the iteration over 'i' need only be the local rows
> of
> > A (and the left term P).
>
> Okay, so you need to gather those rows of P referenced by the
> off-diagonal parts of A.


yes, and this looks correct ..


> Once you have them, do
>
>   for i:
>     v[:] = 0            # sparse vector
>     for j:
>       v[:] += A[i,j] * P[j,:]
>     for I:
>       C[I,:] += P[i,I] * v[:]
>
> One inefficiency is that you don't actually get "hits" on all the
> entries of C[I,:], but that much remains no matter how you reorder loops
> (unless you make I the outermost).
>

> >> > In thinking about this now, I think you want to make a local copy of P
> >> with
> >> > rows (j) for every column in A that you have locally, then transpose
> this
> >> > local thing for the P(j,J) term. A sparse AXPY on j. (My code uses a
> >> > special tree data structure but a matrix is simpler.)
> >>
> >> Why transpose for P(j,J)?
> >>
> >
> > (premature) optimization. I was thinking 'j' being in the inner loop and
> > doing sparse inner product, but now that I think about it there are other
> > options.
>
> Sparse inner products tend to be quite inefficient.  Explicit blocking
> helps some, but I would try to avoid it.
>

Yea, the design space here is non-trivial.

BTW, I have a Cal ME grad student that I've been working with on getting my
old parallel FE / Prometheus code running on Stampede for his bone modeling
problems. He started from zero in HPC but he is eager and has been picking
it up. If there is interest we could get performance data with the existing
code, as a benchmark, and we could generate matrices, if anyone wants to
look into this.


Re: [petsc-users] GAMG scaling

2018-12-24 Thread Mark Adams via petsc-users
On Mon, Dec 24, 2018 at 4:56 PM Jed Brown  wrote:

> Mark Adams via petsc-users  writes:
>
> > Anyway, my data for this is in my SC 2004 paper (MakeNextMat_private in
> > attached, NB, this is code that I wrote in grad school). It is memory
> > efficient and simple, just four nested loops i,j,I,J: C(I,J) =
> > P(i,I)*A(i,j)*P(j,J). In eyeballing the numbers and from new data that I
> am
> > getting from my bone modeling colleagues, that use this old code on
> > Stampede now, the times look reasonable compared to GAMG. This is
> optimized
> > for elasticity, where I unroll loops (so it is really six nested loops).
>
> Is the A above meant to include some ghosted rows?
>

You could but I was thinking of having i in the outer loop. In C(I,J) =
P(i,I)*A(i,j)*P(j,J), the iteration over 'i' need only be the local rows of
A (and the left term P).


>
> > In thinking about this now, I think you want to make a local copy of P
> with
> > rows (j) for every column in A that you have locally, then transpose this
> > local thing for the P(j,J) term. A sparse AXPY on j. (My code uses a
> > special tree data structure but a matrix is simpler.)
>
> Why transpose for P(j,J)?
>

(premature) optimization. I was thinking 'j' being in the inner loop and
doing sparse inner product, but now that I think about it there are other
options.


Re: [petsc-users] GAMG scaling

2018-12-22 Thread Mark Adams via petsc-users
Wow, this is an old thread.

Sorry if I sound like an old fart talking about the good old days, but I
originally did RAP in Prometheus in a non-work-optimal way that might be
of interest. Not hard to implement. I bring this up because we continue to
struggle with this damn thing. I think this approach is perfectly scalable
and pretty low overhead, and simple.

Note, I talked to the hypre people about this in like 97 when they were
implementing RAP and perhaps they are doing it this way ... the 4x slower
way.

Anyway, my data for this is in my SC 2004 paper (MakeNextMat_private in
attached, NB, this is code that I wrote in grad school). It is memory
efficient and simple, just four nested loops i,j,I,J: C(I,J) =
P(i,I)*A(i,j)*P(j,J). In eyeballing the numbers and from new data that I am
getting from my bone modeling colleagues, that use this old code on
Stampede now, the times look reasonable compared to GAMG. This is optimized
for elasticity, where I unroll loops (so it is really six nested loops).

In thinking about this now, I think you want to make a local copy of P with
rows (j) for every column in A that you have locally, then transpose this
local thing for the P(j,J) term. A sparse AXPY on j. (My code uses a
special tree data structure but a matrix is simpler.)
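
To make the loop structure concrete, here is a small dense-matrix C illustration of C(I,J) += P(i,I)*A(i,j)*P(j,J) with the local fine row i outermost. This is purely expository: a real implementation loops only over the stored nonzeros of A and of the rows P(i,:) and P(j,:), but the index pattern is the same.

  static void galerkin_triple_product(int n, int nc,
                                      const double *A,  /* n  x n , row-major */
                                      const double *P,  /* n  x nc, row-major */
                                      double       *C)  /* nc x nc, row-major */
  {
    for (int k = 0; k < nc*nc; k++) C[k] = 0.0;
    for (int i = 0; i < n; i++) {              /* locally owned fine row       */
      for (int j = 0; j < n; j++) {            /* "nonzero" columns of A(i,:)  */
        const double a = A[i*n + j];
        if (a == 0.0) continue;
        for (int I = 0; I < nc; I++) {         /* "nonzeros" of P(i,:)         */
          const double pa = P[i*nc + I] * a;
          for (int J = 0; J < nc; J++)         /* "nonzeros" of P(j,:)         */
            C[I*nc + J] += pa * P[j*nc + J];
        }
      }
    }
  }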


On Sat, Dec 22, 2018 at 3:39 AM Mark Adams  wrote:

> OK, so this thread has drifted, see title :)
>
> On Fri, Dec 21, 2018 at 10:01 PM Fande Kong  wrote:
>
>> Sorry, hit the wrong button.
>>
>>
>>
>> On Fri, Dec 21, 2018 at 7:56 PM Fande Kong  wrote:
>>
>>>
>>>
>>> On Fri, Dec 21, 2018 at 9:44 AM Mark Adams  wrote:
>>>
 Also, you mentioned that you are using 10 levels. This is very strange
 with GAMG. You can run with -info and grep on GAMG to see the sizes and the
 number of non-zeros per level. You should coarsen at a rate of about 2^D to
 3^D with GAMG (with 10 levels this would imply a very large fine grid
 problem so I suspect there is something strange going on with coarsening).
 Mark

>>>
>>> Hi Mark,
>>>
>>>
>> Thanks for your email. We did not try GAMG much for our problems since we
>> still have troubles to figure out how to effectively use GAMG so far.
>> Instead, we are building our own customized  AMG  that needs to use PtAP to
>> construct coarse matrices.  The customized AMG works pretty well for our
>> specific simulations. The bottleneck right now is that PtAP might
>> take too much memory, and the code crashes within the function "PtAP". I
>> definitely need a memory profiler to confirm my statement here.
>>
>> Thanks,
>>
>> Fande Kong,
>>
>>
>>
>>>
>>>
>>>

 On Fri, Dec 21, 2018 at 11:36 AM Zhang, Hong via petsc-users <
 petsc-users@mcs.anl.gov> wrote:

> Fande:
> I will explore it and get back to you.
> Does anyone know how to profile memory usage?
> Hong
>
> Thanks, Hong,
>>
>> I just briefly went through the code. I was wondering if it is
>> possible to destroy "c->ptap" (that caches a lot of intermediate data) to
>> release the memory after the coarse matrix is assembled. I understand you
>> may still want to reuse these data structures by default but for my
>> simulation, the preconditioner is fixed and there is no reason to keep 
>> the
>> "c->ptap".
>>
>
>> It would be great, if we could have this optional functionality.
>>
>> Fande Kong,
>>
>> On Thu, Dec 20, 2018 at 9:45 PM Zhang, Hong 
>> wrote:
>>
>>> We use nonscalable implementation as default, and switch to scalable
>>> for matrices over finer grids. You may use option '-matptap_via 
>>> scalable'
>>> to force scalable PtAP  implementation for all PtAP. Let me know if it
>>> works.
>>> Hong
>>>
>>> On Thu, Dec 20, 2018 at 8:16 PM Smith, Barry F. 
>>> wrote:
>>>

   See MatPtAP_MPIAIJ_MPIAIJ(). It switches to scalable
 automatically for "large" problems, which is determined by some 
 heuristic.

Barry


 > On Dec 20, 2018, at 6:46 PM, Fande Kong via petsc-users <
 petsc-users@mcs.anl.gov> wrote:
 >
 >
 >
 > On Thu, Dec 20, 2018 at 4:43 PM Zhang, Hong 
 wrote:
 > Fande:
 > Hong,
 > Thanks for your improvements on PtAP that is critical for MG-type
 algorithms.
 >
 > On Wed, May 3, 2017 at 10:17 AM Hong  wrote:
 > Mark,
 > Below is the copy of my email sent to you on Feb 27:
 >
 > I implemented scalable MatPtAP and did comparisons of three
 implementations using ex56.c on alcf cetus machine (this machine has 
 small
 memory, 1GB/core):
 > - nonscalable PtAP: use an array of length PN to do dense axpy
 > - scalable PtAP:   do sparse axpy without use of PN array
 >
 > What PN means here?
 > Global number of columns of P.

Re: [petsc-users] GAMG scaling

2018-12-22 Thread Mark Adams via petsc-users
OK, so this thread has drifted, see title :)

On Fri, Dec 21, 2018 at 10:01 PM Fande Kong  wrote:

> Sorry, hit the wrong button.
>
>
>
> On Fri, Dec 21, 2018 at 7:56 PM Fande Kong  wrote:
>
>>
>>
>> On Fri, Dec 21, 2018 at 9:44 AM Mark Adams  wrote:
>>
>>> Also, you mentioned that you are using 10 levels. This is very strange
>>> with GAMG. You can run with -info and grep on GAMG to see the sizes and the
>>> number of non-zeros per level. You should coarsen at a rate of about 2^D to
>>> 3^D with GAMG (with 10 levels this would imply a very large fine grid
>>> problem so I suspect there is something strange going on with coarsening).
>>> Mark
>>>
>>
>> Hi Mark,
>>
>>
> Thanks for your email. We did not try GAMG much for our problems since we
> still have troubles to figure out how to effectively use GAMG so far.
> Instead, we are building our own customized  AMG  that needs to use PtAP to
> construct coarse matrices.  The customized AMG works pretty well for our
> specific simulations. The bottleneck right now is that PtAP might
> take too much memory, and the code crashes within the function "PtAP". I
> definitely need a memory profiler to confirm my statement here.
>
> Thanks,
>
> Fande Kong,
>
>
>
>>
>>
>>
>>>
>>> On Fri, Dec 21, 2018 at 11:36 AM Zhang, Hong via petsc-users <
>>> petsc-users@mcs.anl.gov> wrote:
>>>
 Fande:
 I will explore it and get back to you.
 Does anyone know how to profile memory usage?
 Hong

 Thanks, Hong,
>
> I just briefly went through the code. I was wondering if it is
> possible to destroy "c->ptap" (that caches a lot of intermediate data) to
> release the memory after the coarse matrix is assembled. I understand you
> may still want to reuse these data structures by default but for my
> simulation, the preconditioner is fixed and there is no reason to keep the
> "c->ptap".
>

> It would be great, if we could have this optional functionality.
>
> Fande Kong,
>
> On Thu, Dec 20, 2018 at 9:45 PM Zhang, Hong 
> wrote:
>
>> We use nonscalable implementation as default, and switch to scalable
>> for matrices over finer grids. You may use option '-matptap_via scalable'
>> to force scalable PtAP  implementation for all PtAP. Let me know if it
>> works.
>> Hong
>>
>> On Thu, Dec 20, 2018 at 8:16 PM Smith, Barry F. 
>> wrote:
>>
>>>
>>>   See MatPtAP_MPIAIJ_MPIAIJ(). It switches to scalable automatically
>>> for "large" problems, which is determined by some heuristic.
>>>
>>>Barry
>>>
>>>
>>> > On Dec 20, 2018, at 6:46 PM, Fande Kong via petsc-users <
>>> petsc-users@mcs.anl.gov> wrote:
>>> >
>>> >
>>> >
>>> > On Thu, Dec 20, 2018 at 4:43 PM Zhang, Hong 
>>> wrote:
>>> > Fande:
>>> > Hong,
>>> > Thanks for your improvements on PtAP that is critical for MG-type
>>> algorithms.
>>> >
>>> > On Wed, May 3, 2017 at 10:17 AM Hong  wrote:
>>> > Mark,
>>> > Below is the copy of my email sent to you on Feb 27:
>>> >
>>> > I implemented scalable MatPtAP and did comparisons of three
>>> implementations using ex56.c on alcf cetus machine (this machine has 
>>> small
>>> memory, 1GB/core):
>>> > - nonscalable PtAP: use an array of length PN to do dense axpy
>>> > - scalable PtAP:   do sparse axpy without use of PN array
>>> >
>>> > What PN means here?
>>> > Global number of columns of P.
>>> >
>>> > - hypre PtAP.
>>> >
>>> > The results are attached. Summary:
>>> > - nonscalable PtAP is 2x faster than scalable, 8x faster than
>>> hypre PtAP
>>> > - scalable PtAP is 4x faster than hypre PtAP
>>> > - hypre uses less memory (see job.ne399.n63.np1000.sh)
>>> >
>>> > I was wondering how much more memory PETSc PtAP uses than hypre? I
>>> am implementing an AMG algorithm based on PETSc right now, and it is
>>> working well. But we find a bottleneck with PtAP. For the same P
>>> and
>>> A, PETSc PtAP fails to generate a coarse matrix due to out of memory,
>>> while
>>> hypre can still generate the coarse matrix.
>>> >
>>> > I do not want to just use the HYPRE one because we had to
>>> duplicate matrices if I used HYPRE PtAP.
>>> >
>>> > It would be nice if you guys have already done some comparisons on
>>> these implementations for the memory usage.
>>> > Do you encounter memory issue with  scalable PtAP?
>>> >
>>> > By default do we use the scalable PtAP?? Do we have to specify
>>> some options to use the scalable version of PtAP?  If so, it would be 
>>> nice
>>> to use the scalable version by default.  I am totally missing something
>>> here.
>>> >
>>> > Thanks,
>>> >
>>> > Fande
>>> >
>>> >
>>> > Karl had a student in the summer who improved MatPtAP(). Do you
>>> use the latest version of 

Re: [petsc-users] GAMG scaling

2018-12-21 Thread Mark Adams via petsc-users
Also, you mentioned that you are using 10 levels. This is very strange with
GAMG. You can run with -info and grep on GAMG to see the sizes and the
number of non-zeros per level. You should coarsen at a rate of about 2^D to
3^D with GAMG (with 10 levels this would imply a very large fine grid
problem so I suspect there is something strange going on with coarsening).
Mark

On Fri, Dec 21, 2018 at 11:36 AM Zhang, Hong via petsc-users <
petsc-users@mcs.anl.gov> wrote:

> Fande:
> I will explore it and get back to you.
> Does anyone know how to profile memory usage?
> Hong
>
> Thanks, Hong,
>>
>> I just briefly went through the code. I was wondering if it is possible
>> to destroy "c->ptap" (that caches a lot of intermediate data) to release
>> the memory after the coarse matrix is assembled. I understand you may still
>> want to reuse these data structures by default but for my simulation, the
>> preconditioner is fixed and there is no reason to keep the "c->ptap".
>>
>
>> It would be great, if we could have this optional functionality.
>>
>> Fande Kong,
>>
>> On Thu, Dec 20, 2018 at 9:45 PM Zhang, Hong  wrote:
>>
>>> We use nonscalable implementation as default, and switch to scalable for
>>> matrices over finer grids. You may use option '-matptap_via scalable' to
>>> force scalable PtAP  implementation for all PtAP. Let me know if it works.
>>> Hong
>>>
>>> On Thu, Dec 20, 2018 at 8:16 PM Smith, Barry F. 
>>> wrote:
>>>

   See MatPtAP_MPIAIJ_MPIAIJ(). It switches to scalable automatically
 for "large" problems, which is determined by some heuristic.

Barry


 > On Dec 20, 2018, at 6:46 PM, Fande Kong via petsc-users <
 petsc-users@mcs.anl.gov> wrote:
 >
 >
 >
 > On Thu, Dec 20, 2018 at 4:43 PM Zhang, Hong 
 wrote:
 > Fande:
 > Hong,
 > Thanks for your improvements on PtAP that is critical for MG-type
 algorithms.
 >
 > On Wed, May 3, 2017 at 10:17 AM Hong  wrote:
 > Mark,
 > Below is the copy of my email sent to you on Feb 27:
 >
 > I implemented scalable MatPtAP and did comparisons of three
 implementations using ex56.c on alcf cetus machine (this machine has small
 memory, 1GB/core):
 > - nonscalable PtAP: use an array of length PN to do dense axpy
 > - scalable PtAP:   do sparse axpy without use of PN array
 >
 > What PN means here?
 > Global number of columns of P.
 >
 > - hypre PtAP.
 >
 > The results are attached. Summary:
 > - nonscalable PtAP is 2x faster than scalable, 8x faster than hypre
 PtAP
 > - scalable PtAP is 4x faster than hypre PtAP
 > - hypre uses less memory (see job.ne399.n63.np1000.sh)
 >
 > I was wondering how much more memory PETSc PtAP uses than hypre? I am
 implementing an AMG algorithm based on PETSc right now, and it is working
 well. But we find a bottleneck with PtAP. For the same P and A, PETSc
 PtAP fails to generate a coarse matrix due to out of memory, while hypre
 can still generate the coarse matrix.
 >
 > I do not want to just use the HYPRE one because we had to duplicate
 matrices if I used HYPRE PtAP.
 >
 > It would be nice if you guys have already done some comparisons on
 these implementations for the memory usage.
 > Do you encounter memory issue with  scalable PtAP?
 >
 > By default do we use the scalable PtAP?? Do we have to specify some
 options to use the scalable version of PtAP?  If so, it would be nice to
 use the scalable version by default.  I am totally missing something here.
 >
 > Thanks,
 >
 > Fande
 >
 >
 > Karl had a student in the summer who improved MatPtAP(). Do you use
 the latest version of petsc?
 > HYPRE may use less memory than PETSc because it does not save and
 reuse the matrices.
 >
 > I do not understand why generating coarse matrix fails due to out of
 memory. Do you use direct solver at coarse grid?
 > Hong
 >
 > Based on above observation, I set the default PtAP algorithm as
 'nonscalable'.
 > When PN > local estimated nonzero of C=PtAP, then switch default to
 'scalable'.
 > User can overwrite default.
 >
 > For the case of np=8000, ne=599 (see job.ne599.n500.np8000.sh), I get
 > MatPtAP   3.6224e+01 (nonscalable for small mats,
 scalable for larger ones)
 > scalable MatPtAP 4.6129e+01
 > hypre1.9389e+02
 >
 > This work in on petsc-master. Give it a try. If you encounter any
 problem, let me know.
 >
 > Hong
 >
 > On Wed, May 3, 2017 at 10:01 AM, Mark Adams  wrote:
 > (Hong), what is the current state of optimizing RAP for scaling?
 >
 > Nate is driving 3D elasticity problems at scale with GAMG and we
 are working out performance problems. They are hitting problems at ~1.5B
 dof problems on a basic Cray 

Re: [petsc-users] taking over blind postdoc call for Andy Nonaka

2018-12-17 Thread Mark Adams via petsc-users
Woops petsc-users, mistake, sorry about that.

On Mon, Dec 17, 2018 at 11:29 AM Mark Adams  wrote:

> Dan and Sherry,
>
> I am taking over blind postdoc call for Andy Nonaka. Do you have any
> particular priorities that you would like me to refer good candidates for?
>
> Mark
>


[petsc-users] taking over blind postdoc call for Andy Nonaka

2018-12-17 Thread Mark Adams via petsc-users
Dan and Sherry,

I am taking over blind postdoc call for Andy Nonaka. Do you have any
particular priorities that you would like me to refer good candidates for?

Mark


Re: [petsc-users] PETSc binary write format

2018-12-05 Thread Mark Adams via petsc-users
On Wed, Dec 5, 2018 at 1:27 PM Sajid Ali via petsc-users <
petsc-users@mcs.anl.gov> wrote:

> I have created a file as per the specification as shown below
>
> [sajid@xrm temp]$ cat vector.dat
> 00010010001101001110
> 00010100
> 
> 0011
> 0100
> 01001000
> 0101
> 01010100
> 01011000
> 01011100
> 0110
> 01100010
> 01100100
> 01100110
> 01101000
> 01101010
> 01101100
> 01101110
> 0111
> 01110001
> 01110010
> 01110011
>
> How do I save it as binary so that petsc can understand it?
>
This is the format that PETSc can read. ex10 saves these .dat files and
then reads them in. Am I missing something?


> xxd -b takes each element as ascii 0 or 1 and converts that to binary and
> neither does petsc understand the file.
>
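
For reference, a minimal C sketch (illustrative function and file names) of writing a Vec in PETSc's binary format and reading it back with VecLoad, which is essentially what ex10 does:

  #include <petscvec.h>

  /* Write a Vec in PETSc binary format and read it back. */
  static PetscErrorCode WriteThenRead(Vec x, Vec *y)
  {
    PetscViewer    viewer;
    PetscErrorCode ierr;

    ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD,"vector.dat",FILE_MODE_WRITE,&viewer);CHKERRQ(ierr);
    ierr = VecView(x,viewer);CHKERRQ(ierr);             /* writes header + raw values */
    ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);

    ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD,"vector.dat",FILE_MODE_READ,&viewer);CHKERRQ(ierr);
    ierr = VecCreate(PETSC_COMM_WORLD,y);CHKERRQ(ierr);
    ierr = VecLoad(*y,viewer);CHKERRQ(ierr);
    ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);
    return 0;
  }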


Re: [petsc-users] Implementing of a variable block size BILU preconditioner

2018-12-01 Thread Mark Adams via petsc-users
There is a reason PETSc does not have variable block matrices. It is messy
and rarely a big win. It is doable, ML/Aztec does it, but it is messy.

On Sat, Dec 1, 2018 at 6:20 PM Smith, Barry F. via petsc-users <
petsc-users@mcs.anl.gov> wrote:

>
>Well, you need to start somewhere. It is going to take a combination of
> this, MatLUFactorSymbolic_SeqAIJ(), MatLUFactorSymbolic_SeqAIJ() and
> MatLUFactorNumerical_SeqAIJ() to get what you want. Note that one of the
> first things you need to do is given the MatGetVariableBlockSizes()
> determine the nonzero pattern of the "variable block" matrix from the
> nonzero pattern of the SeqAIJ matrix and then do a symbolic factorization
> on that new "reduced" sparse matrix. Them you do a numerical factorization
> using the "reduced sparse matrix" factorization.
>
> Barry
>
>
> > On Dec 1, 2018, at 4:54 PM, Ali Reza Khaz'ali 
> wrote:
> >
> > Hi,
> >
> > I want to implement a variable block size BILU preconditioner in PETSc.
> I am wondering if modifying the PCVPBJACOBI preconditioner (vpbjacobi.c),
> to get what I want, is a good idea.
> >
> > Any help is much appreciated.
> >
> > --
> > Ali Reza Khaz’ali
> > Assistant Professor of Petroleum Engineering,
> > Department of Chemical Engineering
> > Isfahan University of Technology
> > Isfahan, Iran
> >
>
>


Re: [petsc-users] Solving complex linear sparse matrix in parallel + external library

2018-11-28 Thread Mark Adams via petsc-users
On Wed, Nov 28, 2018 at 10:27 AM Sal Am via petsc-users <
petsc-users@mcs.anl.gov> wrote:

> Thank you indeed --download-mpich and using PETSC_ARCH/bin/mpiexec seems
> to work.
>
> Now I am wondering about the other problem namely getting the residual, is
> the residual only computed when using iterative solvers? Cause using
> richardson or gmres with 1 iteration while using MUMPS prints me nothing.
>

This should work. Please send us all the output and your arguments, etc.
You can also add -ksp_view to print info about the solver configuration.
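
For instance, an invocation of the kind being discussed (the executable name ./test is taken from the earlier message) that should print a residual even with MUMPS underneath:

  mpiexec -n 4 ./test -ksp_type gmres -ksp_max_it 1 \
          -pc_type lu -pc_factor_mat_solver_type mumps \
          -ksp_monitor_true_residual -ksp_converged_reason -ksp_view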


>
>
>
> On Tue, Nov 27, 2018 at 10:53 AM Matthew Knepley 
> wrote:
>
>> On Tue, Nov 27, 2018 at 4:29 AM Sal Am  wrote:
>>
>>> This can happen if you use an 'mpiexec' which is from a different MPI
 than the one you compiled PETSc with.

>>>
>>> that is odd, I tried removing the --download-mpich from the config and
>>> tried --with-mpi=1 (which should be default anyways) and retried it with
>>> --with-mpich=1.
>>>
>>
>> Your MPI is still broken. Some default MPIs, like those installed with
>> Apple, are broken. If you are on Apple
>> I would recommend using --download-mpich, but make sure you use
>> $PETSC_DIR/$PETSC_ARCH/bin/mpiexec.
>>
>> If you reconfigure, you either have to specify a different --PETSC_ARCH,
>> or delete the $PETSC_DIR/$PETSC_ARCH completely. Otherwise, you can get bad
>> interaction with the libraries already sitting there.
>>
>>   Thanks,
>>
>> Matt
>>
>>
>>> Current reconfig file:
>>> '--download-mpich',
>>> '--download-mumps',
>>> '--download-scalapack',
>>> '--download-superlu_dist',
>>> '--with-cc=gcc',
>>> '--with-clanguage=cxx',
>>> '--with-cxx=g++',
>>> '--with-debugging=no',
>>> '--with-fc=gfortran',
>>> '--with-mpi=1',
>>> '--with-mpich=1',
>>> '--with-scalar-type=complex',
>>> 'PETSC_ARCH=linux-opt'
>>> Still does not work though, I tried executing ex11 in
>>> ksp/ksp/example/tutorial/ex11 which should solve a linear system in
>>> parallel by running mpiexec -n 2 but it prints out
>>> Mat Object: 1 MPI processes
>>> ...
>>> ..
>>> twice.
>>> What am I missing?
>>>
>>> Why would you want a minimum iterations? Why not set a tolerance and a
 max? What would you achieve with a minimum?

>>>
>>> I thought PETSc might not be iterating far enough, but after having had
>>> a look at KSPSetTolerances it makes more sense.
>>>
>>> Instead of 'preonly', where you do not want a residual, use 'richardson'
 or 'gmres' with a max of 1 iterate (-ksp_max_it 1)

>>>
>>> So I tried that by executing: mpiexec -n 4 ./test -ksp_type richardson
>>> -pc_type lu -pc_factor_mat_solver_type mumps -ksp_max_it 1
>>> on the command line. However, nothing prints out on the terminal (aside
>>> from PetscPrintf if I have them enabled).
>>>
>>> Thank you.
>>>
>>> On Fri, Nov 16, 2018 at 12:00 PM Matthew Knepley 
>>> wrote:
>>>
 On Fri, Nov 16, 2018 at 4:23 AM Sal Am via petsc-users <
 petsc-users@mcs.anl.gov> wrote:

> Hi,
>
> I have a few questions:
>
> 1. The following issue/misunderstanding:
> My code reads in two files one PETSc vector and one PETSc matrix (b
> and A from Ax=b, size ~65000x65000).
> and then calls KSP solver to solve it by running the following in the
> terminal:
>
> mpiexec - n 2 ./SolveSys -ksp_type preonly -pc_type lu
> -pc_factor_mat_solver mumps
>
> Now mumps is supposed to work in parallel and complex, but the code is
> not solved in parallel it seems. It just prints the result twice. Adding
> -log_view gives me
>
> "./SolveSys on a linux-opt named F8434 with 1 processor..." printed
> twice.
>

 This can happen if you use an 'mpiexec' which is from a different MPI
 than the one you compiled PETSc with.


> 2. Using iterative solvers, I am having difficulty getting
> convergence. I found that there is a way to set the maximum number of
> iterations, but is there a minimum I can increase?
>

 Why would you want a minimum iterations? Why not set a tolerance and a
 max? What would you achieve with a minimum?


> 3. The residual is not computed when using direct external solvers.
> What is proper PETSc way of doing this?
>

 Instead of 'preonly', where you do not want a residual, use
 'richardson' or 'gmres' with a max of 1 iterate (-ksp_max_it 1)

   Thanks,

  Matt


> -The
> code---
> #include 
> #include 
> int main(int argc,char **args)
> {
>   Vec            x,b;      /* approx solution, RHS */
>   Mat            A;        /* linear system matrix */
>   KSP            ksp;      /* linear solver context */
>   PetscReal      norm;     /* norm of solution error */
>   PC             pc;
>   PetscMPIInt    rank, size;

Re: [petsc-users] error messages while using snes

2018-11-22 Thread Mark Adams via petsc-users
Note, your FormInitialGuess does not initialize bb[0] and bb[1].

On Thu, Nov 22, 2018 at 11:23 AM barry via petsc-users <
petsc-users@mcs.anl.gov> wrote:

> Hi,
>
> I want to solve a nonlinear equation, tan(x) = x, and the code I wrote
> produces an error.
>
> In the error, it says that I need to use MatAssemblyBegin/End(), but
> there isn't any Mat in my program.
>
> How can I fix it?
>
> This is what i write (major part):
>
> int main(){
>
>.
>
>VecCreate(PETSC_COMM_WORLD,&f);
>VecSetSizes(f,PETSC_DECIDE,400);
>VecSetFromOptions(f);
>VecDuplicate(f,&b);
>
>SNESCreate(PETSC_COMM_WORLD,&snes);
>SNESSetFunction(snes,f,FormFunction,NULL);
>SNESSetComputeInitialGuess(snes,FormInitialGuess,NULL);
>SNESSetFromOptions(snes);
>SNESSolve(snes,NULL,b);
>
>SNESDestroy(&snes);
>
>.
>
> }
>
> PetscErrorCode FormFunction(SNES snes,Vec b,Vec f,void *ctx)
> {
>const PetscScalar   *bb;
>PetscScalar *ff;
>PetscInt   i;
>PetscErrorCode   ierr;
>
>PetscInt   A1 = 1;
>PetscInt   A2 = 1;
>
>VecGetArrayRead(b,&bb);
>VecGetArray(f,&ff);
>
>/* Compute function */
>for (i=0; i<400; i++) ff[i] = tan(bb[i])-(A1/A2)*bb[i];
>
>VecRestoreArrayRead(b,&bb);
>VecRestoreArray(f,&ff);
>return 0;
> }
>
> PetscErrorCode FormInitialGuess(SNES snes,Vec b,void *ctx)
> {
>PetscScalar   *bb;
>PetscInt  i;
>PetscErrorCode  ierr;
>
>VecGetArray(b,&bb);
>
>/* Compute initial guess */
>for (i=2; i<400; i++)
>{
>  if  (i==0) bb[i] = 0;
>  else if (i==1) bb[i] = 4.5;
>  else   bb[i] = bb[i-1] + 3.2;
>}
>
>VecRestoreArray(b,&bb);
>return(0);
> }
>
>
> Here is the error come out:
>
> [0]PETSC ERROR: - Error Message
> --
> [0]PETSC ERROR: Object is in wrong state
> [0]PETSC ERROR: Matrix must be assembled by calls to
> MatAssemblyBegin/End();
> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html
> for trouble shooting.
> [0]PETSC ERROR: Petsc Release Version 3.10.2, unknown
> [0]PETSC ERROR: ./test2 on a arch-linux2-c-debug named G1ngy by barry
> Thu Nov 22 21:07:18 2018
> [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++
> --with-fc=gfortran --download-mpich --download-fblaslapack
> [0]PETSC ERROR: #1 MatFDColoringCreate() line 469 in
> /home/barry/petsc/src/mat/matfd/fdmatrix.c
> [0]PETSC ERROR: #2 SNESComputeJacobianDefaultColor() line 84 in
> /home/barry/petsc/src/snes/interface/snesj2.c
> [0]PETSC ERROR: #3 SNESComputeJacobian() line 2555 in
> /home/barry/petsc/src/snes/interface/snes.c
> [0]PETSC ERROR: #4 SNESSolve_NEWTONLS() line 222 in
> /home/barry/petsc/src/snes/impls/ls/ls.c
> [0]PETSC ERROR: #5 SNESSolve() line 4396 in
> /home/barry/petsc/src/snes/interface/snes.c
> [0]PETSC ERROR: #6 main() line 89 in /home/barry/Desktop/petsc/test2.c
> [0]PETSC ERROR: No PETSc Option Table entries
> [0]PETSC ERROR: End of Error Message ---send entire
> error message to petsc-ma...@mcs.anl.gov--
> application called MPI_Abort(MPI_COMM_WORLD, 73) - process 0
> [unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=73
> :
> system msg for write_line failure : Bad file descriptor
>
>
> Thank you,
>
> Barry
>
>


Re: [petsc-users] Nullspace

2018-11-20 Thread Mark Adams via petsc-users
Yes, that's right.

Also, it is best to send this to the list, like petsc-users, for a few
reasons.

It is useful for the core developers, as well as any lurkers really, to
casually see what people are doing. And the threads are archived and are a
good source of deep documentation. eg, one could search for these methods
to see any discussion on them if you want to go beyond the documentation.
Your question is a good question. Could be a FAQ (even if not that
frequent).

Mark


On Tue, Nov 20, 2018 at 10:32 PM Sanjay Govindjee  wrote:

> Thanks.  I will give this a try and see what makes best sense.  In the
> Thermo-Elastic case I think it should be easy.
> The vectors are the 3 translations, the 3 rotations (all on the first 3
> dofs), then a 'translation' on the 4th dof.
>
> I'll let you know if it works or not.
>
> -sanjay
>
> ---
> Sanjay Govindjee, PhD, PE
> Horace, Dorothy, and Katherine Johnson Professor in Engineering
>
> 779 Davis Hall
> University of California
> Berkeley, CA 94720-1710
>
> Voice:  +1 510 642 6060
> FAX:    +1 510 643 5264
> s_g@berkeley.edu
> http://faculty.ce.berkeley.edu/sanjay
> ---
>
> Books:
>
> Engineering Mechanics of Deformable Solids: http://amzn.com/0199651647
>
> Engineering Mechanics 3 (Dynamics), 2nd Edition: http://amzn.com/3642537111
>
> Engineering Mechanics 3, Supplementary Problems: Dynamics
> http://www.amzn.com/B00SOXN8JU
>
> NSF NHERI SimCenter: https://simcenter.designsafe-ci.org/
> ---
>
>
> On 11/20/18 10:39 AM, Mark Adams wrote:
>
> PCSetCoordinates is essentially deprecated for this reason. Fragile.
>
> The recommended way is to use MatSetNearNullSpace. There is a helper
> function, MatNullSpaceCreateRigidBody, that takes coordinates and is
> basically a reroll of PCSetCoordinates, but you cannot use that alone here.
>
> 1) You can create the null space yourself and give it to the matrix. Here
> is a sample code:
>
>   if (nearnulldim) {
>     MatNullSpace nullsp;
>     Vec          *nullvecs;
>     PetscInt     i;
>     ierr = PetscMalloc1(nearnulldim,&nullvecs);CHKERRQ(ierr);
>     for (i=0; i<nearnulldim; i++) {
>       ierr = VecCreate(PETSC_COMM_WORLD,&nullvecs[i]);CHKERRQ(ierr);
>       ierr = VecLoad(nullvecs[i],viewer);CHKERRQ(ierr);
>     }
>     ierr = MatNullSpaceCreate(PETSC_COMM_WORLD,PETSC_FALSE,nearnulldim,nullvecs,&nullsp);CHKERRQ(ierr);
>     ierr = MatSetNearNullSpace(A,nullsp);CHKERRQ(ierr);
>     for (i=0; i<nearnulldim; i++) {ierr = VecDestroy(&nullvecs[i]);CHKERRQ(ierr);}
>     ierr = PetscFree(nullvecs);CHKERRQ(ierr);
>     ierr = MatNullSpaceDestroy(&nullsp);CHKERRQ(ierr);
>   }
>
> You have nearnulldim = 2*D + 1. You need to replace VecLoad here with
> code that sets the null space explicitly.
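
For instance — a hypothetical sketch, not code from this thread — the seven vectors for the thermo-elastic case (3 translations and 3 rotations on the displacement dofs, plus a "translation" on the temperature dof) could be set explicitly as below. It assumes 4 consecutive global dofs per locally owned node; A, nn_local, and coords are illustrative names, and the orthonormalization that MatNullSpaceCreate expects is noted but omitted.

  #include <petscmat.h>

  /* Hypothetical: 7 near-null-space vectors for 3D thermo-elasticity, 4 dofs/node (ux,uy,uz,T). */
  static PetscErrorCode SetThermoElasticNearNullSpace(Mat A, PetscInt nn_local,
                                                      const PetscReal *coords)
  {
    MatNullSpace   nullsp;
    Vec            v[7];
    PetscInt       k, node, row0, rstart;
    PetscErrorCode ierr;

    PetscFunctionBeginUser;
    ierr = MatGetOwnershipRange(A,&rstart,NULL);CHKERRQ(ierr);
    for (k=0; k<7; k++) {
      ierr = MatCreateVecs(A,&v[k],NULL);CHKERRQ(ierr);
      ierr = VecSet(v[k],0.0);CHKERRQ(ierr);
    }
    for (node=0; node<nn_local; node++) {
      const PetscScalar x = coords[3*node], y = coords[3*node+1], z = coords[3*node+2];
      row0 = rstart + 4*node;                               /* first dof of this node */
      /* 3 translations on the displacement dofs */
      ierr = VecSetValue(v[0],row0+0,1.0,INSERT_VALUES);CHKERRQ(ierr);
      ierr = VecSetValue(v[1],row0+1,1.0,INSERT_VALUES);CHKERRQ(ierr);
      ierr = VecSetValue(v[2],row0+2,1.0,INSERT_VALUES);CHKERRQ(ierr);
      /* 3 infinitesimal rotations, also only on the displacement dofs */
      ierr = VecSetValue(v[3],row0+1,-z,INSERT_VALUES);CHKERRQ(ierr);
      ierr = VecSetValue(v[3],row0+2, y,INSERT_VALUES);CHKERRQ(ierr);
      ierr = VecSetValue(v[4],row0+0, z,INSERT_VALUES);CHKERRQ(ierr);
      ierr = VecSetValue(v[4],row0+2,-x,INSERT_VALUES);CHKERRQ(ierr);
      ierr = VecSetValue(v[5],row0+0,-y,INSERT_VALUES);CHKERRQ(ierr);
      ierr = VecSetValue(v[5],row0+1, x,INSERT_VALUES);CHKERRQ(ierr);
      /* 1 "thermal translation" on the 4th dof */
      ierr = VecSetValue(v[6],row0+3,1.0,INSERT_VALUES);CHKERRQ(ierr);
    }
    for (k=0; k<7; k++) {
      ierr = VecAssemblyBegin(v[k]);CHKERRQ(ierr);
      ierr = VecAssemblyEnd(v[k]);CHKERRQ(ierr);
    }
    /* MatNullSpaceCreate expects orthonormal vectors: orthogonalize v[3..5]
       against v[0..2] (and each other) and normalize all seven, e.g. with a
       modified Gram-Schmidt using VecDot/VecAXPY/VecNormalize (omitted here). */
    ierr = MatNullSpaceCreate(PETSC_COMM_WORLD,PETSC_FALSE,7,v,&nullsp);CHKERRQ(ierr);
    ierr = MatSetNearNullSpace(A,nullsp);CHKERRQ(ierr);
    ierr = MatNullSpaceDestroy(&nullsp);CHKERRQ(ierr);
    for (k=0; k<7; k++) {ierr = VecDestroy(&v[k]);CHKERRQ(ierr);}
    PetscFunctionReturn(0);
  }
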
> 2) Here is another way that would use PETSc's RBM constructor. Here is the
> recommended way to set the null space now with PETSc's RBMs:
>
>   if (use_nearnullspace) {
>     MatNullSpace matnull;
>     Vec          vec_coords;
>     PetscScalar  *c;
>     ierr = VecCreate(MPI_COMM_WORLD,&vec_coords);CHKERRQ(ierr);
>     ierr = VecSetBlockSize(vec_coords,3);CHKERRQ(ierr);
>     ierr = VecSetSizes(vec_coords,m,PETSC_DECIDE);CHKERRQ(ierr);
>     ierr = VecSetUp(vec_coords);CHKERRQ(ierr);
>     ierr = VecGetArray(vec_coords,&c);CHKERRQ(ierr);
>     for (i=0; i<m; i++) c[i] = coords[i]; /* Copy since Scalar type might be Complex */
>     ierr = VecRestoreArray(vec_coords,&c);CHKERRQ(ierr);
>     ierr = MatNullSpaceCreateRigidBody(vec_coords,&matnull);CHKERRQ(ierr);
>     ierr = MatSetNearNullSpace(Amat,matnull);CHKERRQ(ierr);
>     ierr = MatNullSpaceDestroy(&matnull);CHKERRQ(ierr);
>     ierr = VecDestroy(&vec_coords);CHKERRQ(ierr);
>   } else {
> Now there is a method to get the vectors out of the null space:
> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatNullSpaceGetVecs.html
>
> Now you can create the RBMs (6) with this second code and then get the
> vectors with MatNullSpaceGetVecs, copying them into a vector array of
> size 7. This will give you the 6 RBMs. Now you just need to manually
> create and set the 7th vector with a basis vector for your thermal
> dof. I assume that this is the correct null space mathematically. Then use
> that in MatNullSpaceCreate.
>
> You could use either approach.
>
> Note, I don't know if all this has been tested well with  > 6 null
> vectors, so we may need to fix bug(s).
>
> Mark
>
>
>
>
>
> On Tue, Nov 20, 2018 at 1:03 PM Sanjay Govindjee  wrote:
>
>> Mark,
>>   In parFEAP we have been using GAMG together with PCSetCoordinates.
>> This works fine for mechanical problems, but
>> one of our users wants to solve thermo-elastic problems (4 - dofs per
>> node).  The matrix now has a block size of 4
>> and GAMG is complaining:
>>
>> [0]PETSC ERROR: Petsc has generated inconsistent data
>> [0]PETSC ERROR: Don't know how to create null space for ndm=3, ndf=4.  Use 
>> MatSetNearNullSpace.
>> [0]PETSC ERROR: #1 PCSetCoordinates_AGG() line 196 in 
>> /home/user/Software/petsc-3.9.2/src/ksp/pc/impls/gamg/agg.c
>> [0]PETSC 

Re: [petsc-users] Question about DMDAGetElements

2018-11-19 Thread Mark Adams via petsc-users
The local indices of the local mesh and local vectors, which includes ghost
vertices. There are global-to-local methods to fill in ghost values and
local-to-global methods to create global vectors that you can use for
computation.
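
A minimal illustrative sketch (names are made up) of that pattern for a 1D DMDA: fill the ghost values with a global-to-local scatter, then walk the element list from DMDAGetElements in local numbering.

  #include <petscdmda.h>

  static PetscErrorCode DAElementsDemo(PetscInt M)
  {
    DM             da;
    Vec            xg, xl;                 /* global and local (ghosted) vectors    */
    PetscInt       nel, nen, e;
    const PetscInt *elems;                 /* nen local vertex indices per element  */
    PetscErrorCode ierr;

    ierr = DMDACreate1d(PETSC_COMM_WORLD,DM_BOUNDARY_NONE,M,1,1,NULL,&da);CHKERRQ(ierr);
    ierr = DMSetFromOptions(da);CHKERRQ(ierr);
    ierr = DMSetUp(da);CHKERRQ(ierr);

    ierr = DMCreateGlobalVector(da,&xg);CHKERRQ(ierr);
    ierr = DMCreateLocalVector(da,&xl);CHKERRQ(ierr);
    ierr = VecSet(xg,1.0);CHKERRQ(ierr);

    /* fill the ghost points of the local vector from the global vector */
    ierr = DMGlobalToLocalBegin(da,xg,INSERT_VALUES,xl);CHKERRQ(ierr);
    ierr = DMGlobalToLocalEnd(da,xg,INSERT_VALUES,xl);CHKERRQ(ierr);

    /* element connectivity in LOCAL numbering (includes ghost vertices) */
    ierr = DMDAGetElements(da,&nel,&nen,&elems);CHKERRQ(ierr);
    for (e=0; e<nel; e++) {
      /* elems[e*nen + 0 .. nen-1] are the local vertex indices of element e;
         use them with VecGetArray(xl,...) or MatSetValuesLocal()            */
    }
    ierr = DMDARestoreElements(da,&nel,&nen,&elems);CHKERRQ(ierr);

    ierr = VecDestroy(&xl);CHKERRQ(ierr);
    ierr = VecDestroy(&xg);CHKERRQ(ierr);
    ierr = DMDestroy(&da);CHKERRQ(ierr);
    return 0;
  }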

On Mon, Nov 19, 2018 at 5:16 PM Sajid Ali 
wrote:

> Bingo!
>
> So, DMDAGetElements gives the indices of the mesh, right ?
>
>
> Thank you !
>


Re: [petsc-users] Question about DMDAGetElements

2018-11-19 Thread Mark Adams via petsc-users
You seem to be confusing the dimension of the mesh and the "dimension" of the
matrix and vector.

A matrix is always N x M (2D if you like), a vector is always N (or 1 x N,
or 1D if you like).

The mesh in a DM or DA can be 1, 2 or 3D.

On Mon, Nov 19, 2018 at 4:44 PM Sajid Ali via petsc-users <
petsc-users@mcs.anl.gov> wrote:

> So, DMDA is used for sparse matrices arising from FD/FE and
> MatCreateMPIAIJ can be used for dense matrices (though it is strongly
> discouraged).
>
> My confusion stemmed from DMDAGetElements giving the element indices for
> the 1D mesh/vector of size N (when DMDACreate1d is used). But these
> indices were then used as local indices to set the values for a 2D matrix
> (via MatSetValuesLocal) created using the same DM (now we have NxN
> elements).
>
>


Re: [petsc-users] PETSc (3.9.0) GAMG weak scaling test issue

2018-11-19 Thread Mark Adams via petsc-users
>
>
>
> Mark would have better comments on the scalability of the setup stage.
>
>

The first thing to verify is that the algorithm is scaling. If you coarsen
too slowly then the coarse grids get large, with many non-zeros per row,
and the cost of the matrix triple product can explode. You can check this
by running with -info and grep on GAMG and then look/grep for line about
grid "Complexity". It should be well below 1.5. With a new version of PETSc
you can see this with -ksp_view. You can also look at the total number of
flops in matrix-matrix kernels like MatPtAPNumeric below. That should
increase slowly as you scale the problem size up.
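
As a concrete (illustrative) way to run the check just described — the executable name is a placeholder:

  # grep the GAMG setup output for per-level sizes, nonzeros per row, and grid complexity
  mpiexec -n 8 ./app -pc_type gamg -info | grep -i gamg
  # newer PETSc versions also report the complexity in the solver view
  mpiexec -n 8 ./app -pc_type gamg -ksp_view | grep -i complexity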

Next, the setup has two basic types of processes: 1) custom graph
processing and other kernels and 2) matrix-matrix products (like the matrix
triple product). (2) are normal, but complex, numerical kernels and you can
use standard performance debugging techniques to figure out where the time
is going. There are normal numerical loops and MPI communication. (1) is
made up of custom methods that do a lot of different types of processing,
but parallel graph processing is a big chunk of this. These processes are
very unstructured and it is hard to predict what performance to expect from
them. You really just have to dig in and look for where the time is spent.
I have added timers help with this:

PCGAMGGraph_AGG   12 1.0 3.3467e+00 1.0 4.58e+06 1.2 7.6e+06
6.7e+02 1.4e+02  5  0  2  1  6   5  0  2  1  6 22144
PCGAMGCoarse_AGG  12 1.0 8.9895e+00 1.0 1.35e+08 1.2 7.7e+07
6.2e+03 4.7e+02 14 10 16 51 18  14 10 16 51 18 233748
PCGAMGProl_AGG12 1.0 3.9328e+00 1.0 0.00e+00 0.0 9.1e+06
1.5e+03 1.9e+02  6  0  2  1  7   6  0  2  1  7 0
PCGAMGPOpt_AGG12 1.0 6.3192e+00 1.0 7.43e+07 1.1 3.9e+07
9.0e+02 5.0e+02  9  6  8  4 19   9  6  8  4 19 190048
GAMG: createProl  12 1.0 2.2585e+01 1.0 2.13e+08 1.2 1.3e+08
4.0e+03 1.3e+03 34 16 28 57 50  34 16 28 57 50 149493
  Graph   24 1.0 3.3388e+00 1.0 4.58e+06 1.2 7.6e+06
6.7e+02 1.4e+02  5  0  2  1  6   5  0  2  1  6 22196
  MIS/Agg 12 1.0 5.6411e-01 1.2 0.00e+00 0.0 4.7e+07
8.7e+02 2.3e+02  1  0 10  4  9   1  0 10  4  9 0
  SA: col data12 1.0 1.2982e+00 1.1 0.00e+00 0.0 5.6e+06
2.0e+03 4.8e+01  2  0  1  1  2   2  0  1  1  2 0
  SA: frmProl012 1.0 1.6284e+00 1.0 0.00e+00 0.0 3.6e+06
5.5e+02 9.6e+01  2  0  1  0  4   2  0  1  0  4 0
  SA: smooth  12 1.0 4.1778e+00 1.0 5.28e+06 1.2 1.4e+07
7.3e+02 1.7e+02  6  0  3  1  7   6  0  3  1  7 20483
GAMG: partLevel   12 1.0 1.3577e+01 1.0 2.90e+07 1.2 2.3e+07
2.1e+03 6.4e+02 20  2  5  5 25  20  2  5  5 25 34470
  repartition  9 1.0 1.5048e+00 1.0 0.00e+00 0.0 0.0e+00
0.0e+00 5.4e+01  2  0  0  0  2   2  0  0  0  2 0
  Invert-Sort  9 1.0 1.2282e+00 1.1 0.00e+00 0.0 0.0e+00
0.0e+00 3.6e+01  2  0  0  0  1   2  0  0  0  1 0
  Move A   9 1.0 2.8930e+00 1.0 0.00e+00 0.0 5.7e+05
8.4e+02 1.5e+02  4  0  0  0  6   4  0  0  0  6 0
  Move P   9 1.0 3.0317e+00 1.0 0.00e+00 0.0 9.4e+05
2.5e+01 1.5e+02  5  0  0  0  6   5  0  0  0  6 0


There is nothing that pops out here but you could look at the scaling of
the parts. There are no alternative implementations of this and so there is
not much that can be done about it. I have not done a deep dive into the
performance of this (1) stuff for a very long time. Note, most of this work
is amortized because you do it just once for each mesh. So this cost will
decrease relatively as you do more solves, like in a real production run.

The matrix-matrix (2) stuff is here:

MatMatMult12 1.0 3.5538e+00 1.0 4.58e+06 1.2 1.4e+07
7.3e+02 1.5e+02  5  0  3  1  6   5  0  3  1  6 20853
MatMatMultSym 12 1.0 3.3264e+00 1.0 0.00e+00 0.0 1.2e+07
6.7e+02 1.4e+02  5  0  2  1  6   5  0  2  1  6 0
MatMatMultNum 12 1.0 9.7088e-02 1.1 4.58e+06 1.2 2.5e+06
1.0e+03 0.0e+00  0  0  1  0  0   0  0  1  0  0 763319
MatPtAP   12 1.0 4.0859e+00 1.0 2.90e+07 1.2 2.2e+07
2.2e+03 1.9e+02  6  2  5  5  7   6  2  5  5  7 114537
MatPtAPSymbolic   12 1.0 2.4298e+00 1.1 0.00e+00 0.0 1.4e+07
2.4e+03 8.4e+01  4  0  3  4  3   4  0  3  4  3 0
MatPtAPNumeric12 1.0 1.7467e+00 1.1 2.90e+07 1.2 8.2e+06
1.8e+03 9.6e+01  3  2  2  2  4   3  2  2  2  4 267927
MatTrnMatMult 12 1.0 7.1406e+00 1.0 1.35e+08 1.2 1.6e+07
2.6e+04 1.9e+02 11 10  3 44  7  11 10  3 44  7 294270
MatTrnMatMultSym  12 1.0 5.6756e+00 1.0 0.00e+00 0.0 1.3e+07
1.6e+04 1.6e+02  9  0  3 23  6   9  0  3 23  6 0
MatTrnMatMultNum  12 1.0 1.5121e+00 1.0 1.35e+08 1.2 2.5e+06
7.9e+04 2.4e+01  2 10  1 21  1   2 10  1 21  1 1389611


Note, the symbolic ("Sym") times (like (1) above) are pretty large, but
again they are amortized away and tend to be slow compared to nice
numerical kernels loops.

Mark


Re: [petsc-users] Solving problem using multigrid

2018-11-18 Thread Mark Adams via petsc-users
The manual is pretty clear on this, but your code is purely algebraic, that
is you do not use a DM, and so you will need to use algebraic multigrid
(AMG). So you want to look at AMG preconditioners (-pc_type gamg [or hypre
if you configured with --download-hypre]).
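
As a concrete starting point (executable name and process count are illustrative; cg assumes the operator is symmetric positive definite, otherwise use gmres):

  mpiexec -n 4 ./poisson -ksp_type cg -pc_type gamg \
          -ksp_monitor -ksp_converged_reason -ksp_view
  # or, if PETSc was configured with --download-hypre:
  mpiexec -n 4 ./poisson -ksp_type cg -pc_type hypre -pc_hypre_type boomeramg \
          -ksp_monitor -ksp_converged_reason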

Mark

On Sun, Nov 18, 2018 at 1:28 PM Fazlul Huq via petsc-users <
petsc-users@mcs.anl.gov> wrote:

> Hello PETSc developers,
>
> I have solved a problem using ksp. Please find the problem in attachment 1
> and PETSc code for solution in attachment 2.
> Now I need to solve the same problem using multigrid.
> I never solved problem using multigrid. Can you please guide me about
> steps that I should follow
> to solve this problem using multigrid.
>
> Thanks.
> Sincerely,
> Huq
> --
>
> Fazlul Huq
> Graduate Research Assistant
> Department of Nuclear, Plasma & Radiological Engineering (NPRE)
> University of Illinois at Urbana-Champaign (UIUC)
> E-mail: huq2...@gmail.com
>


Re: [petsc-users] GAMG Parallel Performance

2018-11-15 Thread Mark Adams via petsc-users
There is a lot of load imbalance in VecMAXPY also. The partitioning could
be bad, and if not, it's the machine.

On Thu, Nov 15, 2018 at 1:56 PM Smith, Barry F. via petsc-users <
petsc-users@mcs.anl.gov> wrote:

>
> Something is odd about your configuration. Just consider the time for
> VecMAXPY which is an embarrassingly parallel operation. On 1000 MPI
> processes it produces
>
> Time
>
>   flop rate
>  VecMAXPY 575 1.0 8.4132e-01 1.5 1.36e+09 1.0 0.0e+00 0.0e+00
> 0.0e+00  0  2  0  0  0   0  2  0  0  0 1,600,021
>
> on 1500 processes it produces
>
>  VecMAXPY 583 1.0 1.0786e+00 3.4 9.38e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00  0  2  0  0  0   0  2  0  0  0 1,289,187
>
> that is it actually takes longer (the time goes from .84 seconds to 1.08
> seconds and the flop rate from 1,600,021 down to 1,289,187) You would never
> expect this kind of behavior
>
> and on 2000 processes it produces
>
> VecMAXPY 583 1.0 7.1103e-01 2.7 7.03e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00  0  2  0  0  0   0  2  0  0  0 1,955,563
>
> so it speeds up again but not by very much. This is very mysterious and
> not what you would expect.
>
>I'm inclined to believe something is out of whack on your computer, are
> you sure all nodes on the computer are equivalent? Same processors, same
> clock speeds? What happens if you run the 1000 process case several times?
> Do you get very similar numbers for VecMAXPY()? You should, but I am
> guessing you may not.
>
> Barry
>
>   Note that this performance issue doesn't really have anything to do with
> the preconditioner you are using.
>
>
>
>
>
> > On Nov 15, 2018, at 10:50 AM, Karin via petsc-users <
> petsc-users@mcs.anl.gov> wrote:
> >
> > Dear PETSc team,
> >
> > I am solving a linear transient dynamic problem, based on a
> discretization with finite elements. To do that, I am using FGMRES with
> GAMG as a preconditioner. I consider here 10 time steps.
> > The problem has round to 118e6 dof and I am running on 1000, 1500 and
> 2000 procs. So I have something like 100e3, 78e3 and 50e3 dof/proc.
> > I notice that the performance deteriorates when I increase the number of
> processes.
> > You can find as attached file the log_view of the execution and the
> detailled definition of the KSP.
> >
> > Is the problem too small to run on that number of processes or is there
> something wrong with my use of GAMG?
> >
> > I thank you in advance for your help,
> > Nicolas
> >
> 
>
>


Re: [petsc-users] [petsc-maint] Correct use of PCFactorSetMatOrderingType

2018-11-08 Thread Mark Adams via petsc-users
I am not that familiar with hypre's options. AMG is complicated and I
barely keep my options straight.

OpenFOAM seems to have highly specialized solvers, so being within 50% of them
is decent.
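
For reference, the boomeramg options that PETSc exposes can be listed at run time with -help (the executable name here is illustrative):

  ./solver -pc_type hypre -pc_hypre_type boomeramg -help | grep pc_hypre_boomeramg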


On Thu, Nov 8, 2018 at 12:03 PM Edoardo alinovi 
wrote:

> Yes, it is like you are saying. This is mostly due to the time employed by
> ksp to solve the pressure equation. However, I have worked a lot on the
> problem and I have found out that the default configuration is far from
> being the optimal one, at least in this case.
>
> Actually my CPU time has decreased by more than a factor of two with
> respect to the default configuration. I put the changes here; maybe they
> can be useful to other users in the future:
>
> -pc_hypre_boomeramg_no_CF
> -pc_hypre_boomeramg_agg_nl 1
> -pc_hypre_boomeramg_coarsen_type HMIS
> -pc_hypre_boomeramg_interp_type FF1
>
> The last two seem to be essential in enhancing performances.
>
> Do you know other configurations that are worth testing?  Also, I would like
> to know if a list of command options for hypre is available somewhere.
> Looking at hypre's docs there are a lot of options, but I do not know exactly
> which ones are available in PETSc or their names.
>
> Thank you very much for the kind support, and sorry for the long email!
>
> Il giorno gio 8 nov 2018, 17:32 Mark Adams  ha scritto:
>
>> To repeat:
>>
>> You seem to be saying that OpenFOAM solves the problem in 10 seconds and
>> PETSc solves it in 14 seconds. Is that correct?
>>
>>
>>
>> On Thu, Nov 8, 2018 at 3:42 AM Edoardo alinovi via petsc-users <
>> petsc-users@mcs.anl.gov> wrote:
>>
>>> Hello Mark,
>>>
>>> Yes, there are 5 KSP calls within a time-step (3 for the solution of
>>> momentum equation + 2 for the solution of pressure), this is the classical
>>> non iterative PISO by Issa ( the exact sequence of operations is : solve
>>> momentum implicitly, solve pressure-correction, momentum explicitly,
>>> pressure correction). The pressure correction equation ,which is something
>>> similar to a Poisson equation for incompressible flows, is the one that
>>> determines the overall performance in my code such as in the others.
>>> Usually, when the pressure is being solved for the second time, the
>>> solution is faster since there is a better input guess and, as in my case,
>>> the preconditioner is not recomputed again.
>>>
>>> Have you got some advices for the multigrid configuration in this
>>> scenario, which are not the default one, in order to increase performances?
>>>
>>> I do not know if this may impact  drastically the performance, but I am
>>> running on a E4 workstation with 16 Intel's Xeon processors (2.3GH/12MB
>>> cache)  and 128GB of RAM .
>>>
>>> Thank you very much for your helpful comments,
>>>
>>>
>>> Edoardo
>>> --
>>>
>>> Edoardo Alinovi, Ph.D.
>>>
>>> DICCA, Scuola Politecnica
>>> Universita' di Genova
>>> 1, via Montallegro
>>> 16145 Genova, Italy
>>>
>>> email: edoardo.alin...@dicca.unige.it
>>> Tel: +39 010 353 2540
>>>
>>>
>>>
>>>
>>> Il giorno mer 7 nov 2018 alle ore 17:59 Mark Adams  ha
>>> scritto:
>>>
 please respond to petsc-users.

 You are doing 5 solves here in 14 seconds. You seem to be saying that
 the two pressure solves are taking all of this time. I don't know why the
 two solves are different.

 You seem to be saying that OpenFOAM solves the problem in 10 seconds
 and PETSc solves it in 14 seconds. Is that correct? Hypre seems to be
 running fine.



 On Wed, Nov 7, 2018 at 11:24 AM Edoardo alinovi <
 edoardo.alin...@gmail.com> wrote:

> Thanks a lot Mark for your kind replay. The solver is mine and I use
> PETSc  for the solution of momentum and pressure. The first is solved very
> fast by a standard bcgs + bjacobi, but the pressure is the source of all
> evils and, unfortunately, I am pretty sure that almost all the time within
> the time-step is needed by KSP to solve the pressure (see log attached). I
> have verified this also putting a couple of mpi_wtime around the kspsolve
> call. The pressure is solved 2 times (1 prediction + 1 correction), the
> prediction takes around 11s , the correction around 4s (here I am avoiding
> to recompute the preconditioner), all the rest of the code (flux 
> assembling
> + momentum solution + others) around 1s. Openfoam does the same procedure
> with the same tolerance in 10s using its gamg version (50 it to converge).
> The number of iteration required to solve the pressure with hypre are 12.
> Gamg performs similarly to hypre in terms of speed, but with 50 iterations
> to converge. Am I missing something in the setup in your opinion?
>
> thanks a lot,
>
> Edo
>
> --
>
> Edoardo Alinovi, Ph.D.
>
> DICCA, Scuola Politecnica
> Universita' di Genova
> 1, via Montallegro
> 16145 Genova, Italy
>
> email: edoardo.alin...@dicca.unige.it
> Tel: +39 010 353 2540
>
>
>

Re: [petsc-users] [petsc-maint] Correct use of PCFactorSetMatOrderingType

2018-11-08 Thread Mark Adams via petsc-users
To repeat:

You seem to be saying that OpenFOAM solves the problem in 10 seconds and
PETSc solves it in 14 seconds. Is that correct?



On Thu, Nov 8, 2018 at 3:42 AM Edoardo alinovi via petsc-users <
petsc-users@mcs.anl.gov> wrote:

> Hello Mark,
>
> Yes, there are 5 KSP calls within a time-step (3 for the solution of
> momentum equation + 2 for the solution of pressure), this is the classical
> non iterative PISO by Issa ( the exact sequence of operations is : solve
> momentum implicitly, solve pressure-correction, momentum explicitly,
> pressure correction). The pressure correction equation, which is something
> similar to a Poisson equation for incompressible flows, is the one that
> determines the overall performance in my code, as in other codes.
> Usually, when the pressure is being solved for the second time, the
> solution is faster since there is a better input guess and, as in my case,
> the preconditioner is not recomputed again.
>
> Do you have any advice on a multigrid configuration for this
> scenario, other than the default one, in order to increase performance?
>
> I do not know if this may impact the performance drastically, but I am
> running on an E4 workstation with 16 Intel Xeon processors (2.3GHz/12MB
> cache) and 128GB of RAM.
>
> Thank you very much for your helpful comments,
>
>
> Edoardo
> --
>
> Edoardo Alinovi, Ph.D.
>
> DICCA, Scuola Politecnica
> Universita' di Genova
> 1, via Montallegro
> 16145 Genova, Italy
>
> email: edoardo.alin...@dicca.unige.it
> Tel: +39 010 353 2540
>
>
>
>
> On Wed 7 Nov 2018 at 17:59, Mark Adams  wrote:
>
>> please respond to petsc-users.
>>
>> You are doing 5 solves here in 14 seconds. You seem to be saying that the
>> two pressure solves are taking all of this time. I don't know why the two
>> solves are different.
>>
>> You seem to be saying that OpenFOAM solves the problem in 10 seconds and
>> PETSc solves it in 14 seconds. Is that correct? Hypre seems to be running
>> fine.
>>
>>
>>
>> On Wed, Nov 7, 2018 at 11:24 AM Edoardo alinovi <
>> edoardo.alin...@gmail.com> wrote:
>>
>>> Thanks a lot Mark for your kind reply. The solver is mine and I use
>>> PETSc  for the solution of momentum and pressure. The first is solved very
>>> fast by a standard bcgs + bjacobi, but the pressure is the source of all
>>> evils and, unfortunately, I am pretty sure that almost all the time within
>>> the time-step is needed by KSP to solve the pressure (see log attached). I
>>> have verified this also putting a couple of mpi_wtime around the kspsolve
>>> call. The pressure is solved 2 times (1 prediction + 1 correction), the
>>> prediction takes around 11s , the correction around 4s (here I am avoiding
>>> to recompute the preconditioner), all the rest of the code (flux assembling
>>> + momentum solution + others) around 1s. Openfoam does the same procedure
>>> with the same tolerance in 10s using its gamg version (50 it to converge).
>>> The number of iteration required to solve the pressure with hypre are 12.
>>> Gamg performs similarly to hypre in terms of speed, but with 50 iterations
>>> to converge. Am I missing something in the setup in your opinion?
>>>
>>> thanks a lot,
>>>
>>> Edo
>>>
>>> --
>>>
>>> Edoardo Alinovi, Ph.D.
>>>
>>> DICCA, Scuola Politecnica
>>> Universita' di Genova
>>> 1, via Montallegro
>>> 16145 Genova, Italy
>>>
>>> email: edoardo.alin...@dicca.unige.it
>>> Tel: +39 010 353 2540
>>>
>>>
>>>
>>>
>>> On Wed 7 Nov 2018 at 16:50, Mark Adams  wrote:
>>>
 You can try -pc_type gamg, but hypre is a pretty good solver for the
 Laplacian. If hypre is just a little faster than LU on a 3D problem (that
 takes 10 seconds to solve) then AMG is not doing well. I would expect that
 AMG is taking a lot of iterations (eg, >> 10). You can check that with
 -ksp_monitor.

 The PISO algorithm is a multistage algorithm with a pressure correction
 in it. It also has a solve for the velocity, from what I can tell. Are you
 building PISO yourself and using PETSc just for the pressure correction?
 Are you sure the time is spent in this solver? You can use -log_view to see
 performance numbers and look for KSPSolve to see how much time is spent in
 the PETSc solver.

 Mark


 On Wed, Nov 7, 2018 at 10:26 AM Zhang, Hong via petsc-maint <
 petsc-ma...@mcs.anl.gov> wrote:

> Edoardo:
> Forwarding your request to petsc-maint where you can get fast and
> expert advice. I do not have a suggestion for your application, but someone
> in our team likely will make suggestion.
> Hong
>
> Hello Hong,
>>
>> Well, using -sub_pc_type lu it is super slow. I am desperately trying
>> to enhance the performance of my code (CFD, finite volume, PISO algorithm), 
>> in
>> particular I have a strong bottleneck in the solution of pressure
>> correction equation which takes almost the 90% of computational 

Re: [petsc-users] PETSc (3.9.0) GAMG weak scaling test issue

2018-11-08 Thread Mark Adams via petsc-users
>
>
> I did not configured PETSc with ParMetis support. Should I?
>
> I figured it out when I tried to use "-pc_gamg_repartition". PETSc
> complained that it was not compiled with ParMetis support.
>

You need ParMetis, or some parallel mesh partitioner, configured to use
repartitioning. I would guess that "-pc_gamg_repartition" would not help
and might hurt, because it just does the coarse grids, not the fine grid.
But it is worth a try. Just configure with --download-parmetis
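
For example, re-running configure with your existing options plus the two download
flags should be enough (assuming you do not want to point at a system Metis/ParMetis):

  ./configure <existing options> --download-metis --download-parmetis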

The problem is that you are using space filling curves on the background
grid and are getting empty processors. Right?  The mesh setup phase is not
super optimized, but your times

And you said in your attachment that you added the near null space, but
just the constant vector. I trust you mean the three translational rigid
body modes. That is the default and so you should not see any difference.
If you added one vector of all 1s then that would be bad. You also want the
rotational rigid body modes. Now, you are converging pretty well and if
your solution does not have much rotation in it then the rotational modes
are not needed, but they are required for optimality in general.
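
For an elasticity-type problem, a minimal sketch of attaching the full set of rigid
body modes, assuming a Vec "coords" holding the interlaced nodal coordinates and the
assembled operator "A" (both names are placeholders here):

  MatNullSpace nearnull;
  /* 3 translational + 3 rotational modes in 3D, built from the coordinates */
  ierr = MatNullSpaceCreateRigidBody(coords,&nearnull);CHKERRQ(ierr);
  ierr = MatSetNearNullSpace(A,nearnull);CHKERRQ(ierr);
  ierr = MatNullSpaceDestroy(&nearnull);CHKERRQ(ierr);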


Re: [petsc-users] PETSc (3.9.0) GAMG weak scaling test issue

2018-11-07 Thread Mark Adams via petsc-users
First I would add -gamg_est_ksp_type cg

You seem to be converging well so I assume you are setting the null space
for GAMG.

Note, you should test hypre also.

You probably want a bigger "-pc_gamg_process_eq_limit 50", 200 at least, but
you should test your machine with a range of values on the largest problem. This is a
parameter for reducing the number of active processors (on coarse grids).

I would only worry about "load3". This has 16K equations per process, which
is where you start noticing "strong scaling" problems, depending on the
machine.

An important parameter is "-pc_gamg_square_graph 0". I would probably start
with infinity (eg, 10).

Now, I'm not sure about your domain, problem sizes, and thus the weak
scaling design. You seem to be scaling on the background mesh, but that may
not be a good proxy for complexity.

You can look at the number of flops and scale it appropriately by the
number of solver iterations to get a relative size of the problem. I would
recommend scaling the number of processors with this. For instance here the
MatMult line for the 4 proc and 16K proc run:


Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s

MatMult  636 1.0 1.9035e-01 1.0 3.12e+08 1.1 7.6e+03 3.0e+03
0.0e+00  0 47 62 44  0   0 47 62 44  0  6275 [2 procs]
MatMult 1416 1.0 1.9601e+00 2744.6 4.82e+08 0.0 4.3e+08 7.2e+02
0.0e+00  0 48 50 48  0   0 48 50 48  0 2757975 [16K procs]

Now, you have empty processors. See the massive load imbalance on time and
the zero on Flops. The "Ratio" is max/min and clearly min=0 so PETSc reports
a ratio of 0 (it is infinity really).

Also, weak scaling on a thin body (I don't know your domain) is a little
funny because as the problem scales up the mesh becomes more 3D and this
causes the cost per equation to go up. That is why I prefer to use the
number of non-zeros as the processor scaling function but number of
equations is easier ...

The PC setup times are large (I see 48 seconds at 16K but you report 16).
-pc_gamg_square_graph 10 should help that.

The max number of flops per processor in MatMult goes up by 50% and the max
time goes up by 10x and the number of iterations goes up by 13/8. If I put
all of this together I get that 75% of the time at 16K is in communication
at 16K. I think that and the absolute time can be improved some by
optimizing parameters as I've suggested.
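
As a rough sketch of where that estimate comes from (numbers rounded from the two
MatMult lines above): the max MatMult time goes from about 0.19 s to about 1.96 s
(10x) while the max flops per process only grow by about 1.5x (3.12e+08 -> 4.82e+08).
At the small-run flop rate the computation alone would account for roughly
0.19 x 1.5 ~ 0.3 s of the 1.96 s, leaving something like 1.7 s, i.e. most of the
MatMult time, in communication and load imbalance; folding in the larger iteration
count brings that down to roughly the 75% quoted above.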

Mark





On Wed, Nov 7, 2018 at 11:03 AM "Alberto F. Martín" via petsc-users <
petsc-users@mcs.anl.gov> wrote:

> Dear All,
>
> we are performing a weak scaling test of the PETSc (v3.9.0) GAMG
> preconditioner when applied to the linear system arising
> from the *conforming unfitted FE discretization *(using Q1 Lagrangian
> FEs) of a 3D PDE Poisson problem, where
> the boundary of the domain (a popcorn flake)  is described as a
> zero-level-set embedded within a uniform background
> (Cartesian-like) hexahedral mesh. Details underlying the FEM formulation
> can be made available on demand if you
> believe that this might be helpful, but let me just point out that it is
> designed such that it addresses the well-known
> ill-conditioning issues of unfitted FE discretizations due to the small
> cut cell problem.
>
> The weak scaling test is set up as follows. We start from a single cube
> background mesh, and refine it uniformly several
> steps, until we have approximately either 10**3 (load1), 20**3 (load2), or
> 40**3 (load3) hexahedra/MPI task when
> distributing it over 4 MPI tasks. The benchmark is scaled such that the
> next larger scale problem to be tested is obtained
> by uniformly refining the mesh from the previous scale and running it on
> 8x times the number of MPI tasks that we used
> in the previous scale.  As a result, we obtain three weak scaling curves
> for each of the three fixed loads per MPI task
> above, on the following total number of MPI tasks: 4, 32, 262, 2097,
> 16777. The underlying mesh is not partitioned among
> MPI tasks using ParMETIS (unstructured multilevel graph partitioning)  nor
> optimally by hand, but following the so-called
> z-shape space-filling curves provided by an underlying octree-like mesh
> handler (i.e., p4est library).
>
> I configured the preconditioned linear solver as follows:
>
> -ksp_type cg
> -ksp_monitor
> -ksp_rtol 1.0e-6
> -ksp_converged_reason
> -ksp_max_it 500
> -ksp_norm_type unpreconditioned
> -ksp_view
> -log_view
>
> -pc_type gamg
> -pc_gamg_type agg
> -mg_levels_esteig_ksp_type cg
> -mg_coarse_sub_pc_type cholesky
> -mg_coarse_sub_pc_factor_mat_ordering_type nd
> -pc_gamg_process_eq_limit 50
> -pc_gamg_square_graph 0
> 

Re: [petsc-users] [petsc-maint] Correct use of PCFactorSetMatOrderingType

2018-11-07 Thread Mark Adams via petsc-users
please respond to petsc-users.

You are doing 5 solves here in 14 seconds. You seem to be saying that the
two pressure solves are taking all of this time. I don't know why the two
solves are different.

You seem to be saying that OpenFOAM solves the problem in 10 seconds and
PETSc solves it in 14 seconds. Is that correct? Hypre seems to be running
fine.



On Wed, Nov 7, 2018 at 11:24 AM Edoardo alinovi 
wrote:

> Thanks a lot Mark for your kind reply. The solver is mine and I use
> PETSc  for the solution of momentum and pressure. The first is solved very
> fast by a standard bcgs + bjacobi, but the pressure is the source of all
> evils and, unfortunately, I am pretty sure that almost all the time within
> the time-step is needed by KSP to solve the pressure (see log attached). I
> have verified this also putting a couple of mpi_wtime around the kspsolve
> call. The pressure is solved 2 times (1 prediction + 1 correction), the
> prediction takes around 11s , the correction around 4s (here I am avoiding
> to recompute the preconditioner), all the rest of the code (flux assembling
> + momentum solution + others) around 1s. Openfoam does the same procedure
> with the same tolerance in 10s using its gamg version (50 it to converge).
> The number of iteration required to solve the pressure with hypre are 12.
> Gamg performs similarly to hypre in terms of speed, but with 50 iterations
> to converge. Am I missing something in the setup in your opinion?
>
> thanks a lot,
>
> Edo
>
> --
>
> Edoardo Alinovi, Ph.D.
>
> DICCA, Scuola Politecnica
> Universita' di Genova
> 1, via Montallegro
> 16145 Genova, Italy
>
> email: edoardo.alin...@dicca.unige.it
> Tel: +39 010 353 2540
>
>
>
>
> On Wed 7 Nov 2018 at 16:50, Mark Adams  wrote:
>
>> You can try -pc_type gamg, but hypre is a pretty good solver for the
>> Laplacian. If hypre is just a little faster than LU on a 3D problem (that
>> takes 10 seconds to solve) then AMG is not doing well. I would expect that
>> AMG is taking a lot of iterations (eg, >> 10). You can check that with
>> -ksp_monitor.
>>
>> The PISO algorithm is a multistage algorithm with a pressure correction
>> in it. It also has a solve for the velocity, from what I can tell. Are you
>> building PISO yourself and using PETSc just for the pressure correction?
>> Are you sure the time is spent in this solver? You can use -log_view to see
>> performance numbers and look for KSPSolve to see how much time is spent in
>> the PETSc solver.
>>
>> Mark
>>
>>
>> On Wed, Nov 7, 2018 at 10:26 AM Zhang, Hong via petsc-maint <
>> petsc-ma...@mcs.anl.gov> wrote:
>>
>>> Edoardo:
>>> Forwarding your request to petsc-maint where you can get fast and expert
>>> advice. I do not have a suggestion for your application, but someone in our
>>> team likely will make suggestion.
>>> Hong
>>>
>>> Hello Hong,

 Well, using -sub_pc_type lu it is super slow. I am desperately trying
 to enhance the performance of my code (CFD, finite volume, PISO algorithm); in
 particular I have a strong bottleneck in the solution of the pressure
 correction equation, which takes almost 90% of the computational time. Using
 multigrid as a preconditioner (hypre with default options) is slightly
 better, but comparing the results against the multigrid used in OpenFOAM,
 my code is losing 10s/iteration, which is a huge amount of time. Now, since
 all the time is spent in KSPSolve, I feel a bit powerless.  Do you
 have any helpful advice?

 Thank you very much!
 --

 Edoardo Alinovi, Ph.D.

 DICCA, Scuola Politecnica
 Universita' di Genova
 1, via Montallegro
 16145 Genova, Italy

 email: edoardo.alin...@dicca.unige.it
 Tel: +39 010 353 2540




 On Tue 6 Nov 2018 at 17:15, Zhang, Hong  wrote:

> Edoardo:
> Interesting. I thought it would not affect performance much. What
> happens if you use -sub_pc_type lu'?
> Hong
>
> Dear Hong and Matt,
>>
>> thank you for your kind reply. I have just tested your suggestions
>> and applied " -sub_pc_type ilu -sub_pc_factor_mat_ordering_type nd/rcm"
>> and, in both cases, I have found a deterioration of performance
>> with respect to doing nothing (thus just putting default PCBJACOBI). Is 
>> it
>> normal? However, I guess this is very problem dependent.
>> --
>>
>> Edoardo Alinovi, Ph.D.
>>
>> DICCA, Scuola Politecnica
>> Universita' di Genova
>> 1, via Montallegro
>> 16145 Genova, Italy
>>
>> email: edoardo.alin...@dicca.unige.it
>> Tel: +39 010 353 2540
>>
>>
>>
>>
>> On Tue 6 Nov 2018 at 16:04, Zhang, Hong <hzh...@mcs.anl.gov> wrote:
>>
>>> Edoardo:
>>> You can test runtime option '-sub_pc_factor_mat_ordering_type' and
>>> use '-log_view' to get performance on different orderings,
>>> 

Re: [petsc-users] Problems about PCtype bjacobi

2018-11-07 Thread Mark Adams via petsc-users
On Wed, Nov 7, 2018 at 10:16 AM Yingjie Wu via petsc-users <
petsc-users@mcs.anl.gov> wrote:

> Dear Petsc developer:
> Hi,
> Recently, while solving nonlinear systems of PDEs, I have
> encountered some problems with preconditioning and want to seek help.
>
> 1. I set the preconditioning matrix in SNES as MPIAIJ in the program, and then
> use the Matrix Free method to solve my problem. The log information of the
> program is as follows:
>
> SNES Object: 1 MPI processes
>   type: newtonls
>   maximum iterations=50, maximum function evaluations=1
>   tolerances: relative=1e-08, absolute=1e-50, solution=1e-08
>   total number of linear solver iterations=177
>   total number of function evaluations=371
>   norm schedule ALWAYS
>   SNESLineSearch Object: 1 MPI processes
> type: bt
>   interpolation: cubic
>   alpha=1.00e-04
> maxstep=1.00e+08, minlambda=1.00e-12
> tolerances: relative=1.00e-08, absolute=1.00e-15,
> lambda=1.00e-08
> maximum iterations=40
>   KSP Object: 1 MPI processes
> type: gmres
>   restart=30, using Classical (unmodified) Gram-Schmidt
> Orthogonalization with no iterative refinement
>   happy breakdown tolerance 1e-30
> maximum iterations=1, initial guess is zero
> tolerances:  relative=0.01, absolute=1e-50, divergence=1.
> left preconditioning
> using PRECONDITIONED norm type for convergence test
>   PC Object: 1 MPI processes
> type: bjacobi
>   number of blocks = 1
>   Local solve is same for all blocks, in the following KSP and PC
> objects:
>   KSP Object: (sub_) 1 MPI processes
> type: preonly
> maximum iterations=1, initial guess is zero
> tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
> left preconditioning
> using NONE norm type for convergence test
>   PC Object: (sub_) 1 MPI processes
> type: bjacobi
>   number of blocks = 1
>   Local solve is same for all blocks, in the following KSP and PC
> objects:
>   KSP Object: (sub_sub_) 1 MPI processes
> type: preonly
> maximum iterations=1, initial guess is zero
> tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
> left preconditioning
> using NONE norm type for convergence test
>   PC Object: (sub_sub_) 1 MPI processes
>type: ilu
>   out-of-place factorization
>   0 levels of fill
>   tolerance for zero pivot 2.22045e-14
>   matrix ordering: natural
>   factor fill ratio given 1., needed 1.
> Factored matrix follows:
>  Mat Object: 1 MPI processes
> type: seqaij
> rows=961, cols=961
>package used to perform factorization: petsc
> total: nonzeros=4129, allocated nonzeros=4129
> total number of mallocs used during MatSetValues calls
> =0
>   not using I-node routines
> linear system matrix = precond matrix:
> Mat Object: 1 MPI processes
>   type: seqaij
>   rows=961, cols=961
>   total: nonzeros=4129, allocated nonzeros=4805
>   total number of mallocs used during MatSetValues calls =0
> not using I-node routines
> linear system matrix = precond matrix:
> Mat Object: 1 MPI processes
>   type: mpiaij
>   rows=961, cols=961
>   total: nonzeros=4129, allocated nonzeros=9610
>  total number of mallocs used during MatSetValues calls =0
> not using I-node (on process 0) routines
> linear system matrix followed by preconditioner matrix:
> Mat Object: 1 MPI processes
>   type: mffd
>   rows=961, cols=961
> Matrix-free approximation:
>   err=1.49012e-08 (relative error in function evaluation)
>   Using wp compute h routine
>   Does not compute normU
> Mat Object: 1 MPI processes
>   type: mpiaij
>   rows=961, cols=961
>   total: nonzeros=4129, allocated nonzeros=9610
>   total number of mallocs used during MatSetValues calls =0
> not using I-node (on process 0) routines
>
> Although a parallel matrix is used, it runs on a single processor. Because
> a parallel matrix is used, the overall preconditioning scheme should be
> bjacobi, which then builds a KSP for each block (there is only one block in my
> program). Therefore, there should be a sub KSP object in the PC information.
> But in the above output, a sub-sub KSP is also embedded in the sub KSP
> object. I don't understand the reason for this KSP. Please help me understand.
>

It looks like you are setting the sub PC type to bjacobi and it has a sub
(sub) PC.
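
In options form, the nesting in the output above corresponds roughly to

  -pc_type bjacobi -sub_pc_type bjacobi

so the single block of the outer bjacobi is itself solved with bjacobi, and that
inner block picks up the sub_sub_ prefix and the default ILU. If the intent was the
usual one-level setup, leaving -sub_pc_type at its default (or setting
-sub_pc_type ilu explicitly) should remove the extra layer.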


>
> 2. Bjacobi is a precondition method in theory. Why is there a subsystem of
> linear equations solver 

Re: [petsc-users] DIVERGED_NANORING with PC GAMG

2018-11-05 Thread Mark Adams via petsc-users
On Mon, Nov 5, 2018 at 4:11 PM Appel, Thibaut 
wrote:

> "Local" as in serial?
>

Block Jacobi with ILU as the solver on each block. Each block corresponds
to an MPI process by default. So it is completely parallel, it is just not a
true ILU. In the limit of one equation per processor it is just (point)
Jacobi.
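
In options form this default is simply

  -pc_type bjacobi -sub_ksp_type preonly -sub_pc_type ilu

i.e. one ILU(0) factorization per MPI process, applied once as the block solve.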


>
> Thibaut
>
> On 5 Nov 2018, at 20:12, Mark Adams  wrote:
>
>
>
> On Mon, Nov 5, 2018 at 12:50 PM Thibaut Appel 
> wrote:
>
>> Hi Mark,
>>
>> Yes it doesn't seem to be usable. Unfortunately we're aiming to do 3D so
>> direct solvers are not a viable solution and PETSc' ILU is not parallel and
>> we can't use HYPRE (complex arithmetic)
>>
>
> I think SuperLU has a parallel ILU but in my opinion parallel ILU is not a
> big deal. Neither is optimal and the math win (faster convergence) with
> parallel is offset by the cost of synchronization, in some form, for a true
> parallel ILU. So I think the PETSc default gmres/(local)ILU is your
> best option.
>
>
>> Thibaut
>> On 01/11/2018 20:42, Mark Adams wrote:
>>
>>
>>
>> On Wed, Oct 31, 2018 at 8:11 PM Smith, Barry F. 
>> wrote:
>>
>>>
>>>
>>> > On Oct 31, 2018, at 5:39 PM, Appel, Thibaut via petsc-users <
>>> petsc-users@mcs.anl.gov> wrote:
>>> >
>>> > Well yes naturally for the residual but adding -ksp_true_residual just
>>> gives
>>> >
>>> >   0 KSP unpreconditioned resid norm 3.583290589961e+00 true resid norm
>>> 3.583290589961e+00 ||r(i)||/||b|| 1.e+00
>>> >   1 KSP unpreconditioned resid norm 0.e+00 true resid norm
>>> 3.583290589961e+00 ||r(i)||/||b|| 1.e+00
>>> > Linear solve converged due to CONVERGED_ATOL iterations 1
>>>
>>>Very bad stuff is happening in the preconditioner. The preconditioner
>>> must have a null space (which it shouldn't have to be a useful
>>> preconditioner).
>>>
>>
>> Yea, you are far away from an optimal preconditioner for this system. In
>> low frequency (indefinite) Helmholtz is very very hard. Now, something very
>> bad is going on here but even if you fix it standard AMG is not good for
>> these problems. I would use direct solvers or grind away at it with ILU.
>>
>>
>>>
>>> >
>>> > Mark - if that helps - a Poisson equation is used for the pressure so
>>> the Helmholtz is the same as for the velocity in the interior.
>>> >
>>> > Thibaut
>>> >
>>> >> On 31 Oct 2018, at 21:05, Mark Adams  wrote:
>>> >>
>>> >> These are indefinite (bad) Helmholtz problems. Right?
>>> >>
>>> >> On Wed, Oct 31, 2018 at 2:38 PM Matthew Knepley 
>>> wrote:
>>> >> On Wed, Oct 31, 2018 at 2:13 PM Thibaut Appel <
>>> t.appe...@imperial.ac.uk> wrote:
>>> >> Hi Mark, Matthew,
>>> >>
>>> >> Thanks for taking the time.
>>> >>
>>> >> 1) You're not suggesting having -fieldsplit_X_ksp_type fgmres for
>>> each field, are you?
>>> >>
>>> >> 2) No, the matrix has pressure in one of the fields. Here it's a 2D
>>> problem (but we're also doing 3D), the unknowns are (p,u,v) and those are
>>> my 3 fields. We are dealing with subsonic/transsonic flows so it is
>>> convection dominated indeed.
>>> >>
>>> >> 3) We are in frequency domain with respect to time, i.e.
>>> \partial{phi}/\partial{t} = -i*omega*phi.
>>> >>
>>> >> 4) Hypre is unfortunately not an option since we are in complex
>>> arithmetic.
>>> >>
>>> >>
>>> >>
>>> >>> I'm not sure about "-fieldsplit_pc_type gamg" GAMG should work on
>>> one block, and hence be a subpc. I'm not up on fieldsplit syntax.
>>> >> According to the online manual page this syntax applies the suffix to
>>> all the defined fields?
>>> >>
>>> >>
>>> >>
>>> >>> Mark is correct. I wanted you to change the smoother. He shows how
>>> to change it to Richardson (make sure you add the self-scale option), which
>>> is probably the best choice.
>>> >>>
>>> >>>   Thanks,
>>> >>>
>>> >>>  Matt
>>> >>
>>> >> You did tell me to set it to GMRES if I'm not mistaken, that's why I
>>> tried "-fieldsplit_mg_levels_ksp_type gmres" (mentioned in the email).
>>> Also, it wasn't clear whether these should be applied to each block or the
>>> whole system, as the online manual pages + .pdf manual barely mention
>>> smoothers and how to manipulate MG objects with KSP/PC, this especially
>>> with PCFIELDSPLIT where examples are scarce.
>>> >>
>>> >> From what I can gather from your suggestions I tried (lines with X
>>> are repeated for X={0,1,2})
>>> >>
>>> >> This looks good. How can an identically zero vector produce a 0
>>> residual? You should always monitor with
>>> >>
>>> >>   -ksp_monitor_true_residual.
>>> >>
>>> >>Thanks,
>>> >>
>>> >> Matt
>>> >> -ksp_view_pre -ksp_monitor -ksp_converged_reason \
>>> >> -ksp_type fgmres -ksp_rtol 1.0e-8 \
>>> >> -pc_type fieldsplit \
>>> >> -pc_fieldsplit_type multiplicative \
>>> >> -pc_fieldsplit_block_size 3 \
>>> >> -pc_fieldsplit_0_fields 0 \
>>> >> -pc_fieldsplit_1_fields 1 \
>>> >> -pc_fieldsplit_2_fields 2 \
>>> >> -fieldsplit_X_pc_type gamg \
>>> >> -fieldsplit_X_ksp_type gmres \
>>> >> -fieldsplit_X_ksp_rtol 1e-10 \
>>> >> 

Re: [petsc-users] DIVERGED_NANORING with PC GAMG

2018-11-05 Thread Mark Adams via petsc-users
On Mon, Nov 5, 2018 at 12:50 PM Thibaut Appel 
wrote:

> Hi Mark,
>
> Yes it doesn't seem to be usable. Unfortunately we're aiming to do 3D so
> direct solvers are not a viable solution and PETSc' ILU is not parallel and
> we can't use HYPRE (complex arithmetic)
>

I think SuperLU has a parallel ILU but in my opinion parallel ILU is not a
big deal. Neither is optimal and the math win (faster convergence) with
parallel is offset by the cost of synchronization, in some form, for a true
parallel ILU. So I think the PETSc default gmres/(local)ILU is your
best option.


> Thibaut
> On 01/11/2018 20:42, Mark Adams wrote:
>
>
>
> On Wed, Oct 31, 2018 at 8:11 PM Smith, Barry F. 
> wrote:
>
>>
>>
>> > On Oct 31, 2018, at 5:39 PM, Appel, Thibaut via petsc-users <
>> petsc-users@mcs.anl.gov> wrote:
>> >
>> > Well yes naturally for the residual but adding -ksp_true_residual just
>> gives
>> >
>> >   0 KSP unpreconditioned resid norm 3.583290589961e+00 true resid norm
>> 3.583290589961e+00 ||r(i)||/||b|| 1.e+00
>> >   1 KSP unpreconditioned resid norm 0.e+00 true resid norm
>> 3.583290589961e+00 ||r(i)||/||b|| 1.e+00
>> > Linear solve converged due to CONVERGED_ATOL iterations 1
>>
>>Very bad stuff is happening in the preconditioner. The preconditioner
>> must have a null space (which it shouldn't have to be a useful
>> preconditioner).
>>
>
> Yea, you are far away from an optimal preconditioner for this system. In
> low frequency (indefinite) Helmholtz is very very hard. Now, something very
> bad is going on here but even if you fix it standard AMG is not good for
> these problems. I would use direct solvers or grind away at it with ILU.
>
>
>>
>> >
>> > Mark - if that helps - a Poisson equation is used for the pressure so
>> the Helmholtz is the same as for the velocity in the interior.
>> >
>> > Thibaut
>> >
>> >> On 31 Oct 2018, at 21:05, Mark Adams  wrote:
>> >>
>> >> These are indefinite (bad) Helmholtz problems. Right?
>> >>
>> >> On Wed, Oct 31, 2018 at 2:38 PM Matthew Knepley 
>> wrote:
>> >> On Wed, Oct 31, 2018 at 2:13 PM Thibaut Appel <
>> t.appe...@imperial.ac.uk> wrote:
>> >> Hi Mark, Matthew,
>> >>
>> >> Thanks for taking the time.
>> >>
>> >> 1) You're not suggesting having -fieldsplit_X_ksp_type fgmres for each
>> field, are you?
>> >>
>> >> 2) No, the matrix has pressure in one of the fields. Here it's a 2D
>> problem (but we're also doing 3D), the unknowns are (p,u,v) and those are
>> my 3 fields. We are dealing with subsonic/transsonic flows so it is
>> convection dominated indeed.
>> >>
>> >> 3) We are in frequency domain with respect to time, i.e.
>> \partial{phi}/\partial{t} = -i*omega*phi.
>> >>
>> >> 4) Hypre is unfortunately not an option since we are in complex
>> arithmetic.
>> >>
>> >>
>> >>
>> >>> I'm not sure about "-fieldsplit_pc_type gamg" GAMG should work on one
>> block, and hence be a subpc. I'm not up on fieldsplit syntax.
>> >> According to the online manual page this syntax applies the suffix to
>> all the defined fields?
>> >>
>> >>
>> >>
>> >>> Mark is correct. I wanted you to change the smoother. He shows how to
>> change it to Richardson (make sure you add the self-scale option), which is
>> probably the best choice.
>> >>>
>> >>>   Thanks,
>> >>>
>> >>>  Matt
>> >>
>> >> You did tell me to set it to GMRES if I'm not mistaken, that's why I
>> tried "-fieldsplit_mg_levels_ksp_type gmres" (mentioned in the email).
>> Also, it wasn't clear whether these should be applied to each block or the
>> whole system, as the online manual pages + .pdf manual barely mention
>> smoothers and how to manipulate MG objects with KSP/PC, this especially
>> with PCFIELDSPLIT where examples are scarce.
>> >>
>> >> From what I can gather from your suggestions I tried (lines with X are
>> repeated for X={0,1,2})
>> >>
>> >> This looks good. How can an identically zero vector produce a 0
>> residual? You should always monitor with
>> >>
>> >>   -ksp_monitor_true_residual.
>> >>
>> >>Thanks,
>> >>
>> >> Matt
>> >> -ksp_view_pre -ksp_monitor -ksp_converged_reason \
>> >> -ksp_type fgmres -ksp_rtol 1.0e-8 \
>> >> -pc_type fieldsplit \
>> >> -pc_fieldsplit_type multiplicative \
>> >> -pc_fieldsplit_block_size 3 \
>> >> -pc_fieldsplit_0_fields 0 \
>> >> -pc_fieldsplit_1_fields 1 \
>> >> -pc_fieldsplit_2_fields 2 \
>> >> -fieldsplit_X_pc_type gamg \
>> >> -fieldsplit_X_ksp_type gmres \
>> >> -fieldsplit_X_ksp_rtol 1e-10 \
>> >> -fieldsplit_X_mg_levels_ksp_type richardson \
>> >> -fieldsplit_X_mg_levels_pc_type sor \
>> >> -fieldsplit_X_pc_gamg_agg_nsmooths 0 \
>> >> -fieldsplit_X_mg_levels_ksp_richardson_self_scale \
>> >> -log_view
>> >>
>> >> which yields
>> >>
>> >> KSP Object: 1 MPI processes
>> >>   type: fgmres
>> >> restart=30, using Classical (unmodified) Gram-Schmidt
>> Orthogonalization with no iterative refinement
>> >> happy breakdown tolerance 1e-30
>> >>   maximum iterations=1, initial guess is zero
>> >>   tolerances:  

Re: [petsc-users] Problems about Assemble DMComposite Precondition Matrix

2018-11-05 Thread Mark Adams via petsc-users
On Mon, Nov 5, 2018 at 10:37 AM Yingjie Wu  wrote:

> Thank you very much for your reply.
> My equation is a neutron diffusion equation with eigenvalues, which is why
> I use DMComposite because there is a single non-physical field variable,
> eigenvalue.
>

OK, DMComposite might be your best choice.


> I am not very familiar with FieldSplit. I will study it first.
> It does not seem to be a problem with DMComposite, because the reported error
> is that the diagonal elements of the preconditioner matrix contain
> zero entries.
> My requirement is to divide the preconditioner matrix into sub-matrices, because I
> have neutrons of multiple energy groups (each is a physical field). I want
> to assign only the diagonal block matrices, preferably using
> MatSetValuesStencil (which can simplify the assembly of the five-diagonal
> matrices).
>

So you are trying to create the identity matrix with MatSetValuesStencil
but you are getting zeros on the diagonal.
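
For comparison, a minimal sketch of what filling the diagonal with 1.0 looks like for
a matrix obtained with DMCreateMatrix() from a single 2D DMDA with one dof (the names
"da" and "P" are placeholders; the DMComposite/MatGetLocalSubMatrix path adds another
indexing layer on top of this):

  PetscInt    i,j,xs,ys,xm,ym;
  MatStencil  row = {0,0,0,0};   /* zero-initialize so unused k and c are defined */
  PetscScalar one = 1.0;

  ierr = DMDAGetCorners(da,&xs,&ys,NULL,&xm,&ym,NULL);CHKERRQ(ierr);
  for (j=ys; j<ys+ym; j++) {
    for (i=xs; i<xs+xm; i++) {
      row.i = i; row.j = j;       /* grid indices of the diagonal entry */
      ierr = MatSetValuesStencil(P,1,&row,1,&row,&one,INSERT_VALUES);CHKERRQ(ierr);
    }
  }
  ierr = MatAssemblyBegin(P,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(P,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);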

I would debug this by looking at the matrix in matlab (or Octave). You can
use code like this:

  if (PETSC_TRUE) {
    PetscViewer viewer;
    /* dump the preconditioning matrix in MATLAB/Octave format to Amat.m */
    ierr = PetscViewerASCIIOpen(comm, "Amat.m", &viewer);CHKERRQ(ierr);
    ierr = PetscViewerPushFormat(viewer, PETSC_VIEWER_ASCII_MATLAB);CHKERRQ(ierr);
    ierr = MatView(Amat, viewer);CHKERRQ(ierr);
    ierr = PetscViewerPopFormat(viewer);CHKERRQ(ierr);
    ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);
  }

Test on a small, serial problem and see where your data is actually going,
if not on the diagonal.
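(The Amat.m written above is an ordinary MATLAB/Octave script: running it and calling
spy() on the resulting matrix shows exactly which entries were set, and
full(diag(...)) shows what ended up on the diagonal.)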


> Thanks,
> Yingjie
>
> On Mon 5 Nov 2018 at 11:22 PM, Mark Adams  wrote:
>
>> DMComposite is not very mature, the last time I checked, and I don't know of
>> anyone having worked on it recently, and it is probably not what you want
>> anyway. FieldSplit is most likely what you want.
>>
>> What are your equations and discretization? eg, Stokes with cell centered
>> pressure? There are probably examples that are close to what you want and
>> it would not be hard to move your code over.
>>
>> Mark
>>
>> On Mon, Nov 5, 2018 at 10:00 AM Yingjie Wu via petsc-users <
>> petsc-users@mcs.anl.gov> wrote:
>>
>>> Dear Petsc developer:
>>> Hi,
>>> I have recently studied the preconditioner of the program, and some
>>> problems have arisen. Please help me to solve them.
>>> At present, I have written a program to solve the system of non-linear
>>> equations. The Matrix Free method has been used to calculate results. But I
>>> want to add a preconditioning matrix to it.
>>> I used the DMComposite object, which stores two sub-DM objects and a
>>> single value (two physical field variables and one variable). I want to use
>>> MatGetLocalSubMatrix to assign each physical field sub precondition matrix.
>>> At the same time, because my DM object is two-dimensional, I use
>>> MatSetValuesStencil() to fill the sub matrix.
>>> At present, I just want to fill in an identity matrix (for the global vectors) as
>>> the preconditioning matrix for the Matrix Free method (just to test whether it can
>>> be used; the identity matrix has no preconditioning effect). But the procedure
>>> went wrong.
>>>
>>>yjwu@yjwu-XPS-8910:~/petsc-3.10.1/src/snes/examples/tutorials$
>>> mpiexec -n 1 ./ex216 -f wu-readtwogroups -snes_mf_operator -snes_view
>>> -snes_converged_reason -snes_monitor -ksp_converged_reason
>>> -ksp_monitor_true_residual
>>>
>>>   0 SNES Function norm 8.235090086536e-02
>>> iter = 0, SNES Function norm 0.0823509
>>> iter = 0, Keff === 1.
>>>   Linear solve did not converge due to DIVERGED_PCSETUP_FAILED
>>> iterations 0
>>>  PCSETUP_FAILED due to FACTOR_NUMERIC_ZEROPIVOT
>>> Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE iterations
>>> 0
>>> SNES Object: 1 MPI processes
>>>   type: newtonls
>>>   maximum iterations=50, maximum function evaluations=1
>>>   tolerances: relative=1e-08, absolute=1e-50, solution=1e-08
>>>   total number of linear solver iterations=0
>>>   total number of function evaluations=1
>>>   norm schedule ALWAYS
>>>   SNESLineSearch Object: 1 MPI processes
>>> type: bt
>>>   interpolation: cubic
>>>   alpha=1.00e-04
>>> maxstep=1.00e+08, minlambda=1.00e-12
>>> tolerances: relative=1.00e-08, absolute=1.00e-15,
>>> lambda=1.00e-08
>>> maximum iterations=40
>>>   KSP Object: 1 MPI processes
>>> type: gmres
>>>   restart=30, using Classical (unmodified) Gram-Schmidt
>>> Orthogonalization with no iterative refinement
>>>   happy breakdown tolerance 1e-30
>>> maximum iterations=1, initial guess is zero
>>> tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
>>> left preconditioning
>>> using PRECONDITIONED norm type for convergence test
>>>   PC Object: 1 MPI processes
>>> type: ilu
>>>   out-of-place factorization
>>>   0 levels of fill
>>>   tolerance for zero pivot 2.22045e-14
>>>   matrix ordering: natural
>>>   factor fill ratio given 1., needed 1.
>>> Factored matrix follows:
>>>   Mat Object: 1 MPI processes
>>>

Re: [petsc-users] Problems about Assemble DMComposite Precondition Matrix

2018-11-05 Thread Mark Adams via petsc-users
DMComposite is not very mature, the last time I checked, and I don't know of
anyone having worked on it recently, and it is probably not what you want
anyway. FieldSplit is most likely what you want.

What are your equations and discretization? eg, Stokes with cell centered
pressure? There are probably examples that are close to what you want and
it would not be hard to move your code over.

Mark

On Mon, Nov 5, 2018 at 10:00 AM Yingjie Wu via petsc-users <
petsc-users@mcs.anl.gov> wrote:

> Dear Petsc developer:
> Hi,
> I have recently studied the preconditioner of the program, and some
> problems have arisen. Please help me to solve them.
> At present, I have written a program to solve the system of non-linear
> equations. The Matrix Free method has been used to calculate results. But I
> want to add a preconditioning matrix to it.
> I used the DMComposite object, which stores two sub-DM objects and a
> single value (two physical field variables and one variable). I want to use
> MatGetLocalSubMatrix to assign each physical field sub precondition matrix.
> At the same time, because my DM object is two-dimensional, I use
> MatSetValuesStencil() to fill the sub matrix.
> At present, I just want to fill in an identity matrix (for the global vectors) as
> the preconditioning matrix for the Matrix Free method (just to test whether it can
> be used; the identity matrix has no preconditioning effect). But the procedure
> went wrong.
>
>yjwu@yjwu-XPS-8910:~/petsc-3.10.1/src/snes/examples/tutorials$
> mpiexec -n 1 ./ex216 -f wu-readtwogroups -snes_mf_operator -snes_view
> -snes_converged_reason -snes_monitor -ksp_converged_reason
> -ksp_monitor_true_residual
>
>   0 SNES Function norm 8.235090086536e-02
> iter = 0, SNES Function norm 0.0823509
> iter = 0, Keff === 1.
>   Linear solve did not converge due to DIVERGED_PCSETUP_FAILED iterations 0
>  PCSETUP_FAILED due to FACTOR_NUMERIC_ZEROPIVOT
> Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE iterations 0
> SNES Object: 1 MPI processes
>   type: newtonls
>   maximum iterations=50, maximum function evaluations=1
>   tolerances: relative=1e-08, absolute=1e-50, solution=1e-08
>   total number of linear solver iterations=0
>   total number of function evaluations=1
>   norm schedule ALWAYS
>   SNESLineSearch Object: 1 MPI processes
> type: bt
>   interpolation: cubic
>   alpha=1.00e-04
> maxstep=1.00e+08, minlambda=1.00e-12
> tolerances: relative=1.00e-08, absolute=1.00e-15,
> lambda=1.00e-08
> maximum iterations=40
>   KSP Object: 1 MPI processes
> type: gmres
>   restart=30, using Classical (unmodified) Gram-Schmidt
> Orthogonalization with no iterative refinement
>   happy breakdown tolerance 1e-30
> maximum iterations=1, initial guess is zero
> tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
> left preconditioning
> using PRECONDITIONED norm type for convergence test
>   PC Object: 1 MPI processes
> type: ilu
>   out-of-place factorization
>   0 levels of fill
>   tolerance for zero pivot 2.22045e-14
>   matrix ordering: natural
>   factor fill ratio given 1., needed 1.
> Factored matrix follows:
>   Mat Object: 1 MPI processes
> type: seqaij
> rows=961, cols=961
> package used to perform factorization: petsc
> total: nonzeros=4625, allocated nonzeros=4625
> total number of mallocs used during MatSetValues calls =0
>   not using I-node routines
> linear system matrix followed by preconditioner matrix:
> Mat Object: 1 MPI processes
>   type: mffd
>   rows=961, cols=961
> Matrix-free approximation:
>   err=1.49012e-08 (relative error in function evaluation)
>   The compute h routine has not yet been set
> Mat Object: 1 MPI processes
>   type: seqaij
>   rows=961, cols=961
>   total: nonzeros=4625, allocated nonzeros=4625
>   total number of mallocs used during MatSetValues calls =0
> not using I-node routines
>
> It seems that there are elements on the diagonal that are not filled,
> but I don't understand what went wrong. Part of the code of my program is
> as follows; I added all the code in the appendix. I have tested that when
> assigning the identity matrix to the entire Jacobian (no submatrices, operating
> directly on the global preconditioning matrix), the program runs
> normally. However, since the problems I develop later may be more complex,
> involving multiple physical fields, it would be easier to work with
> submatrices.
>
> ..
>
> /* set DMComposite */
>
> ierr =
> DMDACreate2d(PETSC_COMM_WORLD,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE,DMDA_STENCIL_STAR,20,24,PETSC_DECIDE,PETSC_DECIDE,1,1,0,0,&user.da1);CHKERRQ(ierr);
>   ierr = DMSetFromOptions(user.da1);CHKERRQ(ierr);
>   ierr = DMSetUp(user.da1);CHKERRQ(ierr);
>   ierr = DMCompositeAddDM(user.packer,user.da1);CHKERRQ(ierr);
> 

Re: [petsc-users] DIVERGED_NANORING with PC GAMG

2018-11-01 Thread Mark Adams via petsc-users
On Wed, Oct 31, 2018 at 8:11 PM Smith, Barry F.  wrote:

>
>
> > On Oct 31, 2018, at 5:39 PM, Appel, Thibaut via petsc-users <
> petsc-users@mcs.anl.gov> wrote:
> >
> > Well yes naturally for the residual but adding -ksp_true_residual just
> gives
> >
> >   0 KSP unpreconditioned resid norm 3.583290589961e+00 true resid norm
> 3.583290589961e+00 ||r(i)||/||b|| 1.e+00
> >   1 KSP unpreconditioned resid norm 0.e+00 true resid norm
> 3.583290589961e+00 ||r(i)||/||b|| 1.e+00
> > Linear solve converged due to CONVERGED_ATOL iterations 1
>
>Very bad stuff is happening in the preconditioner. The preconditioner
> must have a null space (which it shouldn't have to be a useful
> preconditioner).
>

Yea, you are far away from an optimal preconditioner for this system. In
low frequency (indefinite) Helmholtz is very very hard. Now, something very
bad is going on here but even if you fix it standard AMG is not good for
these problems. I would use direct solvers or grind away at it with ILU.


>
> >
> > Mark - if that helps - a Poisson equation is used for the pressure so
> the Helmholtz is the same as for the velocity in the interior.
> >
> > Thibaut
> >
> >> On 31 Oct 2018, at 21:05, Mark Adams  wrote:
> >>
> >> These are indefinite (bad) Helmholtz problems. Right?
> >>
> >> On Wed, Oct 31, 2018 at 2:38 PM Matthew Knepley 
> wrote:
> >> On Wed, Oct 31, 2018 at 2:13 PM Thibaut Appel 
> wrote:
> >> Hi Mark, Matthew,
> >>
> >> Thanks for taking the time.
> >>
> >> 1) You're not suggesting having -fieldsplit_X_ksp_type fgmres for each
> field, are you?
> >>
> >> 2) No, the matrix has pressure in one of the fields. Here it's a 2D
> problem (but we're also doing 3D), the unknowns are (p,u,v) and those are
> my 3 fields. We are dealing with subsonic/transsonic flows so it is
> convection dominated indeed.
> >>
> >> 3) We are in frequency domain with respect to time, i.e.
> \partial{phi}/\partial{t} = -i*omega*phi.
> >>
> >> 4) Hypre is unfortunately not an option since we are in complex
> arithmetic.
> >>
> >>
> >>
> >>> I'm not sure about "-fieldsplit_pc_type gamg" GAMG should work on one
> block, and hence be a subpc. I'm not up on fieldsplit syntax.
> >> According to the online manual page this syntax applies the suffix to
> all the defined fields?
> >>
> >>
> >>
> >>> Mark is correct. I wanted you to change the smoother. He shows how to
> change it to Richardson (make sure you add the self-scale option), which is
> probably the best choice.
> >>>
> >>>   Thanks,
> >>>
> >>>  Matt
> >>
> >> You did tell me to set it to GMRES if I'm not mistaken, that's why I
> tried "-fieldsplit_mg_levels_ksp_type gmres" (mentioned in the email).
> Also, it wasn't clear whether these should be applied to each block or the
> whole system, as the online manual pages + .pdf manual barely mention
> smoothers and how to manipulate MG objects with KSP/PC, this especially
> with PCFIELDSPLIT where examples are scarce.
> >>
> >> From what I can gather from your suggestions I tried (lines with X are
> repeated for X={0,1,2})
> >>
> >> This looks good. How can an identically zero vector produce a 0
> residual? You should always monitor with
> >>
> >>   -ksp_monitor_true_residual.
> >>
> >>Thanks,
> >>
> >> Matt
> >> -ksp_view_pre -ksp_monitor -ksp_converged_reason \
> >> -ksp_type fgmres -ksp_rtol 1.0e-8 \
> >> -pc_type fieldsplit \
> >> -pc_fieldsplit_type multiplicative \
> >> -pc_fieldsplit_block_size 3 \
> >> -pc_fieldsplit_0_fields 0 \
> >> -pc_fieldsplit_1_fields 1 \
> >> -pc_fieldsplit_2_fields 2 \
> >> -fieldsplit_X_pc_type gamg \
> >> -fieldsplit_X_ksp_type gmres \
> >> -fieldsplit_X_ksp_rtol 1e-10 \
> >> -fieldsplit_X_mg_levels_ksp_type richardson \
> >> -fieldsplit_X_mg_levels_pc_type sor \
> >> -fieldsplit_X_pc_gamg_agg_nsmooths 0 \
> >> -fieldsplit_X_mg_levels_ksp_richardson_self_scale \
> >> -log_view
> >>
> >> which yields
> >>
> >> KSP Object: 1 MPI processes
> >>   type: fgmres
> >> restart=30, using Classical (unmodified) Gram-Schmidt
> Orthogonalization with no iterative refinement
> >> happy breakdown tolerance 1e-30
> >>   maximum iterations=1, initial guess is zero
> >>   tolerances:  relative=1e-08, absolute=1e-50, divergence=1.
> >>   left preconditioning
> >>   using DEFAULT norm type for convergence test
> >> PC Object: 1 MPI processes
> >>   type: fieldsplit
> >>   PC has not been set up so information may be incomplete
> >> FieldSplit with MULTIPLICATIVE composition: total splits = 3,
> blocksize = 3
> >> Solver info for each split is in the following KSP objects:
> >>   Split number 0 Fields  0
> >>   KSP Object: (fieldsplit_0_) 1 MPI processes
> >> type: preonly
> >> maximum iterations=1, initial guess is zero
> >> tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
> >> left preconditioning
> >> using DEFAULT norm type for convergence test
> >>   PC Object: (fieldsplit_0_) 1 MPI processes
> >> type not 

Re: [petsc-users] Convergence of AMG

2018-10-31 Thread Mark Adams via petsc-users
On Wed, Oct 31, 2018 at 3:43 PM Manav Bhatia  wrote:

> Here are the updates. I did not find the options to make much difference
> in the results.
>
> I noticed this message in the GAMG output for cases 2, 3:  HARD stop of
> coarsening on level 3.  Grid too small: 1 block nodes
>

Yea, this is what it looks like but I can see that there are 5 levels below.


>
> Is this implying that the mesh on level 3 could not be coarsened towards
> levels 4/5?
>
> In one of your earlier emails you had mentioned reducing the rate of
> coarsening. What is the option to do that?
>

The coarsening rate is pretty slow but you could try reducing it more with
a finite -pc_gamg_threshold val (you could try val=0.01, 0.02, 0.05).


>
> Common options:
> -pc_mg_levels 5 -mg_levels_ksp_max_it 4 -pc_gamg_square_graph 0
> -pc_gamg_threshold 0. -mg_levels_ksp_type richardson -gamg_est_ksp_type cg
>
> Case#  |  #levels   |#KSP Iters  |  Extra Options
> 
>  1  | 5|   67|
>  2  | 5|   67|
>  -pc_gamg_agg_nsmooths 2
>  3  | 5|   67|
>  -pc_gamg_agg_nsmooths 2  -mg_levels_esteig_ksp_type cg
>

I have done plates a long time ago and they worked fine, but there is
something going wrong here. Perhaps my plate formulations were different
but I don't think so (what we called Reissner-Mindlin).

You might try an easier test case if this is not a super simple test (eg,
well shaped elements).

Otherwise you can limit the number of levels that you use and use a
parallel direct solver for the (large) coarse grid. To do that use:
"-pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type lu" and
configure your PETSc with a parallel coarse grid solver like MUMPS or
SuperLU.

Then set -pc_gamg_coarse_eq_limit N with N being large. Increase it and
minimize the solve time. You will find that parallel lu is faster than MG
for "small" problems (note, small could be pretty large here). So you will
want to run LU first (ie, no AMG) to get a baseline and then try GAMG.
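
A possible starting point for that experiment, assuming PETSc was configured with
--download-mumps (the equation limit below is only a guess to be tuned, and in
PETSc 3.9 and later the factorization package should be the one selected with the
..._mat_solver_type option):

  -pc_type gamg -pc_gamg_use_parallel_coarse_grid_solver \
  -pc_gamg_coarse_eq_limit 5000 \
  -mg_coarse_pc_type lu -mg_coarse_pc_factor_mat_solver_type mumps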

Oh, and you could also try hypre (-pc_type hypre -pc_hypre_type boomeramg).
Hypre should not be great in theory but it is worth a try. Maybe there's
just something going wrong with GAMG.


Re: [petsc-users] DIVERGED_NANORING with PC GAMG

2018-10-31 Thread Mark Adams via petsc-users
These are indefinite (bad) Helmholtz problems. Right?

On Wed, Oct 31, 2018 at 2:38 PM Matthew Knepley  wrote:

> On Wed, Oct 31, 2018 at 2:13 PM Thibaut Appel 
> wrote:
>
>> Hi Mark, Matthew,
>>
>> Thanks for taking the time.
>>
>> 1) You're not suggesting having -fieldsplit_X_ksp_type *f*gmres for each
>> field, are you?
>>
>> 2) No, the matrix *has* pressure in one of the fields. Here it's a 2D
>> problem (but we're also doing 3D), the unknowns are (p,u,v) and those are
>> my 3 fields. We are dealing with subsonic/transsonic flows so it is
>> convection dominated indeed.
>>
>> 3) We are in frequency domain with respect to time, i.e.
>> \partial{phi}/\partial{t} = -i*omega*phi.
>>
>> 4) Hypre is unfortunately not an option since we are in complex
>> arithmetic.
>>
>>
>> I'm not sure about "-fieldsplit_pc_type gamg" GAMG should work on one
>> block, and hence be a subpc. I'm not up on fieldsplit syntax.
>>
>> According to the online manual page this syntax applies the suffix to all
>> the defined fields?
>>
>>
>> Mark is correct. I wanted you to change the smoother. He shows how to
>> change it to Richardson (make sure you add the self-scale option), which is
>> probably the best choice.
>>
>>   Thanks,
>>
>>  Matt
>>
>> You did tell me to set it to GMRES if I'm not mistaken, that's why I
>> tried "-fieldsplit_mg_levels_ksp_type gmres" (mentioned in the email).
>> Also, it wasn't clear whether these should be applied to each block or the
>> whole system, as the online manual pages + .pdf manual barely mention
>> smoothers and how to manipulate MG objects with KSP/PC, this especially
>> with PCFIELDSPLIT where examples are scarce.
>>
>> From what I can gather from your suggestions I tried (lines with X are
>> repeated for X={0,1,2})
>>
>> This looks good. How can an identically zero vector produce a 0 residual?
> You should always monitor with
>
>   -ksp_monitor_true_residual.
>
>Thanks,
>
> Matt
>
>> -ksp_view_pre -ksp_monitor -ksp_converged_reason \
>> -ksp_type fgmres -ksp_rtol 1.0e-8 \
>> -pc_type fieldsplit \
>> -pc_fieldsplit_type multiplicative \
>> -pc_fieldsplit_block_size 3 \
>> -pc_fieldsplit_0_fields 0 \
>> -pc_fieldsplit_1_fields 1 \
>> -pc_fieldsplit_2_fields 2 \
>> -fieldsplit_X_pc_type gamg \
>> -fieldsplit_X_ksp_type gmres \
>> -fieldsplit_X_ksp_rtol 1e-10 \
>> -fieldsplit_X_mg_levels_ksp_type richardson \
>> -fieldsplit_X_mg_levels_pc_type sor \
>> -fieldsplit_X_pc_gamg_agg_nsmooths 0 \
>> -fieldsplit_X_mg_levels_ksp_richardson_self_scale \
>> -log_view
>>
>> which yields
>>
>> KSP Object: 1 MPI processes
>>   type: fgmres
>> restart=30, using Classical (unmodified) Gram-Schmidt
>> Orthogonalization with no iterative refinement
>> happy breakdown tolerance 1e-30
>>   maximum iterations=1, initial guess is zero
>>   tolerances:  relative=1e-08, absolute=1e-50, divergence=1.
>>   left preconditioning
>>   using DEFAULT norm type for convergence test
>> PC Object: 1 MPI processes
>>   type: fieldsplit
>>   PC has not been set up so information may be incomplete
>> FieldSplit with MULTIPLICATIVE composition: total splits = 3,
>> blocksize = 3
>> Solver info for each split is in the following KSP objects:
>>   Split number 0 Fields  0
>>   KSP Object: (fieldsplit_0_) 1 MPI processes
>> type: preonly
>> maximum iterations=1, initial guess is zero
>> tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
>> left preconditioning
>> using DEFAULT norm type for convergence test
>>   PC Object: (fieldsplit_0_) 1 MPI processes
>> type not yet set
>> PC has not been set up so information may be incomplete
>>   Split number 1 Fields  1
>>   KSP Object: (fieldsplit_1_) 1 MPI processes
>> type: preonly
>> maximum iterations=1, initial guess is zero
>> tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
>> left preconditioning
>> using DEFAULT norm type for convergence test
>>   PC Object: (fieldsplit_1_) 1 MPI processes
>> type not yet set
>> PC has not been set up so information may be incomplete
>>   Split number 2 Fields  2
>>   KSP Object: (fieldsplit_2_) 1 MPI processes
>> type: preonly
>> maximum iterations=1, initial guess is zero
>> tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
>> left preconditioning
>> using DEFAULT norm type for convergence test
>>   PC Object: (fieldsplit_2_) 1 MPI processes
>> type not yet set
>> PC has not been set up so information may be incomplete
>>   linear system matrix = precond matrix:
>>   Mat Object: 1 MPI processes
>> type: seqaij
>> rows=52500, cols=52500
>> total: nonzeros=1127079, allocated nonzeros=1128624
>> total number of mallocs used during MatSetValues calls =0
>>   not using I-node routines
>>   0 KSP Residual norm 3.583290589961e+00
>>   1 KSP Residual norm 0.e+00
>> Linear solve converged due to CONVERGED_ATOL iterations 1
>>
>> so something must not be 

Re: [petsc-users] DIVERGED_NANORING with PC GAMG

2018-10-31 Thread Mark Adams via petsc-users
Again, you probably want to avoid Cheby. with  ‘-mg_levels_ksp_type
richardson -mg_levels_pc_type sor’ with the proper prefix.

I'm not sure about "-fieldsplit_pc_type gamg" GAMG should work on one
block, and hence be a subpc. I'm not up on fieldsplit syntax.

On Wed, Oct 31, 2018 at 9:22 AM Thibaut Appel via petsc-users <
petsc-users@mcs.anl.gov> wrote:

> Hi Matthew,
>
> Which database option are you referring to?
>
> I tried to add -fieldsplit_mg_levels_ksp_type gmres (and
> -fieldsplit_mg_levels_ksp_max_it 4 for another run) to my options (cf.
> below) which starts the iterations but it takes 1 hour for PETSc to do 13
> of them so it must be wrong.
>
> Reminder: my baseline database options line reads
>
> -ksp_view_pre -ksp_monitor -ksp_converged_reason \
> -ksp_rtol 1.0e-8 -ksp_gmres_restart 300 \
> -ksp_type fgmres \
> -pc_type fieldsplit \
> -pc_fieldsplit_type multiplicative \
> -pc_fieldsplit_block_size 3 \
> -pc_fieldsplit_0_fields 0   \
> -pc_fieldsplit_1_fields 1   \
> -pc_fieldsplit_2_fields 2   \
> -fieldsplit_pc_type gamg\
> -fieldsplit_ksp_type gmres  \
> -fieldsplit_ksp_rtol 1.0e-8
>
> which gives
>
> KSP Object: 1 MPI processes
>   type: fgmres
> restart=300, using Classical (unmodified) Gram-Schmidt
> Orthogonalization with no iterative refinement
> happy breakdown tolerance 1e-30
>   maximum iterations=1, initial guess is zero
>   tolerances:  relative=1e-08, absolute=1e-50, divergence=1.
>   left preconditioning
>   using DEFAULT norm type for convergence test
> PC Object: 1 MPI processes
>   type: fieldsplit
>   PC has not been set up so information may be incomplete
> FieldSplit with MULTIPLICATIVE composition: total splits = 3,
> blocksize = 3
> Solver info for each split is in the following KSP objects:
>   Split number 0 Fields  0
>   KSP Object: (fieldsplit_0_) 1 MPI processes
> type: preonly
> maximum iterations=1, initial guess is zero
> tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
> left preconditioning
> using DEFAULT norm type for convergence test
>   PC Object: (fieldsplit_0_) 1 MPI processes
> type not yet set
> PC has not been set up so information may be incomplete
>   Split number 1 Fields  1
>   KSP Object: (fieldsplit_1_) 1 MPI processes
> type: preonly
> maximum iterations=1, initial guess is zero
> tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
> left preconditioning
> using DEFAULT norm type for convergence test
>   PC Object: (fieldsplit_1_) 1 MPI processes
> type not yet set
> PC has not been set up so information may be incomplete
>   Split number 2 Fields  2
>   KSP Object: (fieldsplit_2_) 1 MPI processes
> type: preonly
> maximum iterations=1, initial guess is zero
> tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
> left preconditioning
> using DEFAULT norm type for convergence test
>   PC Object: (fieldsplit_2_) 1 MPI processes
> type not yet set
> PC has not been set up so information may be incomplete
>   linear system matrix = precond matrix:
>   Mat Object: 1 MPI processes
> type: seqaij
> rows=52500, cols=52500
> total: nonzeros=1127079, allocated nonzeros=1128624
> total number of mallocs used during MatSetValues calls =0
>   not using I-node routines
>   0 KSP Residual norm 3.583290589961e+00
> [0]PETSC ERROR: - Error Message
> --
> [0]PETSC ERROR: Petsc has generated inconsistent data
> [0]PETSC ERROR: Eigen estimator failed: DIVERGED_NANORINF at iteration
> 0[0]PETSC ERROR: Petsc Release Version 3.10.2, unknown
> [0]PETSC ERROR: Configure options --PETSC_ARCH=msi_cplx_debug
> --with-scalar-type=complex --with-precision=double --with-debugging=1
> --with-valgrind=1 --with-debugger=gdb --with-fortran-kernels=1
> --download-mpich --download-hwloc --download-fblaslapack
> --download-scalapack --download-metis --download-parmetis
> --download-ptscotch --download-mumps --download-slepc
> [0]PETSC ERROR: #1 KSPSolve_Chebyshev() line 381 in
> /home/thibaut/Packages/petsc/src/ksp/ksp/impls/cheby/cheby.c
> [0]PETSC ERROR: #2 KSPSolve() line 780 in
> /home/thibaut/Packages/petsc/src/ksp/ksp/interface/itfunc.c
> [0]PETSC ERROR: #3 PCMGMCycle_Private() line 20 in
> /home/thibaut/Packages/petsc/src/ksp/pc/impls/mg/mg.c
> [0]PETSC ERROR: #4 PCApply_MG() line 377 in
> /home/thibaut/Packages/petsc/src/ksp/pc/impls/mg/mg.c
> [0]PETSC ERROR: #5 PCApply() line 462 in
> /home/thibaut/Packages/petsc/src/ksp/pc/interface/precon.c
> [0]PETSC ERROR: #6 KSP_PCApply() line 281 in
> /home/thibaut/Packages/petsc/include/petsc/private/kspimpl.h
> [0]PETSC ERROR: #7 KSPInitialResidual() line 67 in
> /home/thibaut/Packages/petsc/src/ksp/ksp/interface/itres.c
> [0]PETSC ERROR: #8 KSPSolve_GMRES() line 233 in
> /home/thibaut/Packages/petsc/src/ksp/ksp/impls/gmres/gmres.c
> [0]PETSC ERROR: #9 KSPSolve() line 780 in

Re: [petsc-users] DIVERGED_NANORING with PC GAMG

2018-10-31 Thread Mark Adams via petsc-users
On Tue, Oct 30, 2018 at 5:23 PM Appel, Thibaut via petsc-users <
petsc-users@mcs.anl.gov> wrote:

> Dear users,
>
> Following a suggestion from Matthew Knepley I’ve been trying to apply
> fieldsplit/gamg for my set of PDEs but I’m still encountering issues
> despite various tests. pc_gamg simply won’t start.
> Note that direct solvers always yield the correct, physical result.
> Removing the fieldsplit to focus on the gamg bit and trying to solve the
> linear system on a modest size problem still gives, with
>
> '-ksp_monitor -ksp_rtol 1.0e-10 -ksp_gmres_restart 300 -ksp_type gmres
> -pc_type gamg'
>
> [3]PETSC ERROR: - Error Message
> --
> [3]PETSC ERROR: Petsc has generated inconsistent data
> [3]PETSC ERROR: Have un-symmetric graph (apparently). Use
> '-(null)pc_gamg_sym_graph true' to symetrize the graph or
> '-(null)pc_gamg_threshold -1' if the matrix is structurally symmetric.
>
> And since then, after adding '-pc_gamg_sym_graph true' I have been getting
> [0]PETSC ERROR: - Error Message
> --
> [0]PETSC ERROR: Petsc has generated inconsistent data
> [0]PETSC ERROR: Eigen estimator failed: DIVERGED_NANORINF at iteration
>
> -ksp_chebyshev_esteig_noisy 0/1 does not change anything
>
> Knowing that Chebyshev eigen estimator needs a positive spectrum I tried
> ‘-mg_levels_ksp_type gmres’ but iterations would just go on endlessly.
>

This is OK, but you need to use '-ksp_type fgmres' (note the 'f' for the
flexible variant; this could be why it is failing).
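
For example, the pairing might look something like this (illustrative values):

  -ksp_type fgmres \
  -pc_type gamg -mg_levels_ksp_type gmres -mg_levels_ksp_max_it 4

A flexible outer method is needed here because an inner GMRES smoother makes
the preconditioner change from one outer iteration to the next.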

It looks like your matrix is 1) just the velocity field and 2) very
unsymmetric (eg, convection dominated). I would start with
‘-mg_levels_ksp_type richardson -mg_levels_pc_type sor’.

I would also start with unsmoothed aggregation: '-pc_gamg_agg_nsmooths 0'
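
Putting these suggestions together, a starting point could look something
like the following (illustrative; keep your usual monitors and tolerances):

  -ksp_type fgmres -ksp_monitor -ksp_converged_reason \
  -pc_type gamg -pc_gamg_agg_nsmooths 0 -pc_gamg_sym_graph true \
  -mg_levels_ksp_type richardson -mg_levels_pc_type sor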


>
> It seems that I have indeed eigenvalues of rather high magnitude in the
> spectrum of my operator without being able to determine the reason.
> The eigenvectors look like small artifacts at the wall-inflow or
> wall-outflow corners with zero anywhere else but I do not know how to
> interpret this.
> Equations are time-harmonic linearized Navier-Stokes to which a forcing is
> applied, there’s no time-marching.
>

You mean you are in the frequency domain?


>
> Matrix is formed with a MPIAIJ type. The formulation is incompressible, in
> complex arithmetic and the 2D physical domain is mapped to a logically
> rectangular,


This somewhat messes up the null space that AMG depends on, but AMG theory is
gone for Navier-Stokes anyway.


> regular collocated grid with a high-order finite difference method.
> I determine the ownership of the rows/degrees of freedom of the matrix
> with PetscSplitOwnership and I’m not using DMDA.
>

Our iterative solvers are probably not going to work well on this but you
should test hypre also (-pc_type hypre -pc_hypre_type boomeramg). You need
to configure PETSc to download hypre.
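
A sketch of what that could look like (the configure options, application name
and process count are illustrative):

  ./configure ... --download-hypre
  mpiexec -n 4 ./yourapp -ksp_type fgmres -pc_type hypre \
-pc_hypre_type boomeramg -ksp_monitor -ksp_converged_reason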

Mark


>
> The Fortran application code is memory-leak free and has undergone a
> strict verification/validation procedure for different variations of the
> PDEs.
>
> If there’s any problem with the matrix what could help for the diagnostic?
> At this point I’m running out of ideas so I would really appreciate
> additional suggestions and discussions.
>
> Thanks for your continued support,
>
>
> Thibaut


Re: [petsc-users] Convergence of AMG

2018-10-30 Thread Mark Adams via petsc-users
And you can also always use -mg_levels_esteig_ksp_type cg
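
That is, something along these lines (an illustrative sketch; the esteig
options only matter when the level smoother is Chebyshev):

  -pc_type gamg -mg_levels_ksp_type chebyshev -mg_levels_pc_type sor \
  -mg_levels_esteig_ksp_type cg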

On Tue, Oct 30, 2018 at 2:51 PM Mark Adams  wrote:

> The last two cases were cut off but that is OK.
>
> Let's just do -pc_gamg_square_graph 0 and please add/do -ksp_view
> -mg_levels_ksp_type richardson -pc_mg_levels 5 -gamg_est_ksp_type cg
>
> Send the grep GAMG again, and the ksp_view output.
>
> And then do this again with: -pc_gamg_agg_nsmooths
>
> I think the eigen estimators are messed up on the coarse grids.
>
>
> On Tue, Oct 30, 2018 at 12:39 PM Karin  wrote:
>
>> Ok, sorry for my misunderstanding  and thank you for the clarification.
>>
>> Le mar. 30 oct. 2018 13:55, Mark Adams  a écrit :
>>
>>> Nicolas,
>>>
>>> Smoothed aggregation is fine with shells. see the original SA paper (
>>> https://link.springer.com/article/10.1007/BF02238511).
>>>
>>> The rotational modes, which are the non-trivial modes that must be
>>> supplied, are used in the interpolation.
>>>
>>> Mark
>>>
>>> On Tue, Oct 30, 2018 at 5:22 AM Karin  wrote:
>>>
>>>> Manav,
>>>>
>>>> How are interpolated the rotational degrees of freedom? AFAIK, when
>>>> using smoothed aggregation, the interpolation process tries to mimic linear
>>>> interpolation, which can be OK for the displacement DOF but is not for the
>>>> rotational DOF using some plate and shell formulations.
>>>> This can explain poor convergence of a multilevel approach, which needs
>>>> to restrict and extrapolate the unknowns. In order to check this
>>>> hypothesis, you can try a test case with zero rotations.
>>>>
>>>> Nicolas
>>>>
>>>> Le lun. 29 oct. 2018 à 22:13, Mark Adams via petsc-users <
>>>> petsc-users@mcs.anl.gov> a écrit :
>>>>
>>>>> * the two level results tell us that MG is not doing well on the
>>>>> coarse grids. So the coarse grids are the problem.
>>>>>
>>>>> * Do not worry about timing now. Get the math correct. The two level
>>>>> solve is not meant to be a solution just a diagnostic so don't try to
>>>>> optimize it by squaring the graph. Use -pc_gamg_square_graph 0.
>>>>>
>>>>> * It looks like you don't need 4 smoothing steps but lets keep it and
>>>>> we can dial it back later.
>>>>>
>>>>> * This table is interesting. First, you had about 12 iterations
>>>>> earlier and I think your rtol was tighter than the default (so the
>>>>> iteration could should go down not up). Do you know what change here?
>>>>>
>>>>> Note, even though -mg_levels_ksp_max_it is not in the ksp_view it does
>>>>> work. It is syntactic sugar to just add it to all levels like you did
>>>>> manually.
>>>>>
>>>>> Anyway, these number look reasonable. It is interesting that 3 levels
>>>>> ran well but the 4th level ran poorly. This implies we want to slow down
>>>>> coarsening on these levels, but ...
>>>>>
>>>>> First can you please rerun this experiment with -pc_gamg_square_graph
>>>>> 0.
>>>>>
>>>>> Also, please run with -info. This is very noisy but you can grep on
>>>>> "GAMG" and send that output to us (about 15 lines).
>>>>>
>>>>> Thanks,
>>>>> Mark
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Oct 29, 2018 at 3:34 PM Manav Bhatia 
>>>>> wrote:
>>>>>
>>>>>> Barry,
>>>>>>
>>>>>>Here are some quick numbers with the following options on 4 CPUs
>>>>>> and 543,606 dofs:
>>>>>>
>>>>>> -mg_levels_ksp_max_it 4 -pc_gamg_square_graph 1 -pc_gamg_threshold 0.
>>>>>>
>>>>>>  #levels   |#KSP Iters
>>>>>> ———
>>>>>>  2|   18
>>>>>>  3|   18
>>>>>>  4|   40
>>>>>>  5|   59
>>>>>>
>>>>>> -Manav
>>>>>>
>>>>>>
>>>>>> On Oct 29, 2018, at 2:06 PM, Smith, Barry F. 
>>>>>> wrote:
>>>>>>
>>>>>>
>>>>>>  Exactly how much does it increase with number of levels? Send a
>>>>>> chart number of levels and number of iterations. With say
>>>>>> -mg_levels_ksp_maxit 4
>>>>>>
>>>>>>   Thanks
>>>>>>
>>>>>>   Barry
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Oct 29, 2018, at 12:59 PM, Manav Bhatia 
>>>>>> wrote:
>>>>>>
>>>>>> Thanks for the clarification.
>>>>>>
>>>>>> I also observed that the number of KSP iterations increases with an
>>>>>> increase in the levels of AMG. Is this true, in general, for all/most
>>>>>> applications?
>>>>>>
>>>>>> -Manav
>>>>>>
>>>>>> On Oct 29, 2018, at 12:53 PM, Jed Brown  wrote:
>>>>>>
>>>>>> Manav Bhatia  writes:
>>>>>>
>>>>>> Thanks, Jed.
>>>>>>
>>>>>> The description says: “ Square the graph, ie. compute A'*A before
>>>>>> aggregating it"
>>>>>>
>>>>>> What is A here?
>>>>>>
>>>>>>
>>>>>> The original matrix, or its "graph" (your 6x6 blocks condensed to
>>>>>> scalars).
>>>>>>
>>>>>> What is the impact of setting this to 0, which led to a very
>>>>>> significant increase in the CPU time in my case?
>>>>>>
>>>>>>
>>>>>> The aggregates are formed on the connectivity of your original matrix,
>>>>>> so root nodes are aggregated only with their first neighbors,
>>>>>> resulting
>>>>>> in slower coarsening.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>


Re: [petsc-users] Convergence of AMG

2018-10-30 Thread Mark Adams via petsc-users
The last two cases were cut off but that is OK.

Let's just do -pc_gamg_square_graph 0 and please add/do -ksp_view
-mg_levels_ksp_type richardson -pc_mg_levels 5 -gamg_est_ksp_type cg

Send the grep GAMG again, and the ksp_view output.

And then do this again with: -pc_gamg_agg_nsmooths

I think the eigen estimators are messed up on the coarse grids.


On Tue, Oct 30, 2018 at 12:39 PM Karin  wrote:

> Ok, sorry for my misunderstanding  and thank you for the clarification.
>
> Le mar. 30 oct. 2018 13:55, Mark Adams  a écrit :
>
>> Nicolas,
>>
>> Smoothed aggregation is fine with shells. see the original SA paper (
>> https://link.springer.com/article/10.1007/BF02238511).
>>
>> The rotational modes, which are the non-trivial modes that must be
>> supplied, are used in the interpolation.
>>
>> Mark
>>
>> On Tue, Oct 30, 2018 at 5:22 AM Karin  wrote:
>>
>>> Manav,
>>>
>>> How are interpolated the rotational degrees of freedom? AFAIK, when
>>> using smoothed aggregation, the interpolation process tries to mimic linear
>>> interpolation, which can be OK for the displacement DOF but is not for the
>>> rotational DOF using some plate and shell formulations.
>>> This can explain poor convergence of a multilevel approach, which needs
>>> to restrict and extrapolate the unknowns. In order to check this
>>> hypothesis, you can try a test case with zero rotations.
>>>
>>> Nicolas
>>>
>>> Le lun. 29 oct. 2018 à 22:13, Mark Adams via petsc-users <
>>> petsc-users@mcs.anl.gov> a écrit :
>>>
>>>> * the two level results tell us that MG is not doing well on the coarse
>>>> grids. So the coarse grids are the problem.
>>>>
>>>> * Do not worry about timing now. Get the math correct. The two level
>>>> solve is not meant to be a solution just a diagnostic so don't try to
>>>> optimize it by squaring the graph. Use -pc_gamg_square_graph 0.
>>>>
>>>> * It looks like you don't need 4 smoothing steps but lets keep it and
>>>> we can dial it back later.
>>>>
>>>> * This table is interesting. First, you had about 12 iterations earlier
>>>> and I think your rtol was tighter than the default (so the iteration could
>>>> should go down not up). Do you know what change here?
>>>>
>>>> Note, even though -mg_levels_ksp_max_it is not in the ksp_view it does
>>>> work. It is syntactic sugar to just add it to all levels like you did
>>>> manually.
>>>>
>>>> Anyway, these number look reasonable. It is interesting that 3 levels
>>>> ran well but the 4th level ran poorly. This implies we want to slow down
>>>> coarsening on these levels, but ...
>>>>
>>>> First can you please rerun this experiment with -pc_gamg_square_graph 0.
>>>>
>>>> Also, please run with -info. This is very noisy but you can grep on
>>>> "GAMG" and send that output to us (about 15 lines).
>>>>
>>>> Thanks,
>>>> Mark
>>>>
>>>>
>>>>
>>>> On Mon, Oct 29, 2018 at 3:34 PM Manav Bhatia 
>>>> wrote:
>>>>
>>>>> Barry,
>>>>>
>>>>>Here are some quick numbers with the following options on 4 CPUs
>>>>> and 543,606 dofs:
>>>>>
>>>>> -mg_levels_ksp_max_it 4 -pc_gamg_square_graph 1 -pc_gamg_threshold 0.
>>>>>
>>>>>  #levels   |#KSP Iters
>>>>> ———
>>>>>  2|   18
>>>>>  3|   18
>>>>>  4|   40
>>>>>  5|   59
>>>>>
>>>>> -Manav
>>>>>
>>>>>
>>>>> On Oct 29, 2018, at 2:06 PM, Smith, Barry F. 
>>>>> wrote:
>>>>>
>>>>>
>>>>>  Exactly how much does it increase with number of levels? Send a chart
>>>>> number of levels and number of iterations. With say -mg_levels_ksp_maxit 4
>>>>>
>>>>>   Thanks
>>>>>
>>>>>   Barry
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Oct 29, 2018, at 12:59 PM, Manav Bhatia 
>>>>> wrote:
>>>>>
>>>>> Thanks for the clarification.
>>>>>
>>>>> I also observed that the number of KSP iterations increases with an
>>>>> increase in the levels of AMG. Is this true, in general, for all/most
>>>>> applications?
>>>>>
>>>>> -Manav
>>>>>
>>>>> On Oct 29, 2018, at 12:53 PM, Jed Brown  wrote:
>>>>>
>>>>> Manav Bhatia  writes:
>>>>>
>>>>> Thanks, Jed.
>>>>>
>>>>> The description says: “ Square the graph, ie. compute A'*A before
>>>>> aggregating it"
>>>>>
>>>>> What is A here?
>>>>>
>>>>>
>>>>> The original matrix, or its "graph" (your 6x6 blocks condensed to
>>>>> scalars).
>>>>>
>>>>> What is the impact of setting this to 0, which led to a very
>>>>> significant increase in the CPU time in my case?
>>>>>
>>>>>
>>>>> The aggregates are formed on the connectivity of your original matrix,
>>>>> so root nodes are aggregated only with their first neighbors, resulting
>>>>> in slower coarsening.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>


Re: [petsc-users] Convergence of AMG

2018-10-30 Thread Mark Adams via petsc-users
Nicolas,

Smoothed aggregation is fine with shells; see the original SA paper (
https://link.springer.com/article/10.1007/BF02238511).

The rotational modes, which are the non-trivial modes that must be
supplied, are used in the interpolation.
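
For example, the generic pattern in PETSc looks something like the sketch
below (illustrative only: the function name is made up, the application is
assumed to have filled "modes" with its rigid-body motions, orthonormalized,
and error checking is omitted):

#include <petscmat.h>

/* Attach the near null space (translations + rotations) to the operator so
   that GAMG's smoothed aggregation uses these modes in the interpolation. */
static PetscErrorCode AttachNearNullSpace(Mat A, PetscInt nmodes, Vec modes[])
{
  MatNullSpace nearnull;

  /* The mode vectors are assumed to be orthonormal. */
  MatNullSpaceCreate(PetscObjectComm((PetscObject)A), PETSC_FALSE, nmodes,
                     modes, &nearnull);
  MatSetNearNullSpace(A, nearnull); /* read by PCGAMG during setup */
  MatNullSpaceDestroy(&nearnull);
  return 0;
}

For a pure displacement problem (block size equal to the spatial dimension)
MatNullSpaceCreateRigidBody() can build these vectors directly from a Vec of
nodal coordinates; with rotational DOFs in the blocks, as for plates and
shells, the modes generally have to be assembled by the application.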

Mark

On Tue, Oct 30, 2018 at 5:22 AM Karin  wrote:

> Manav,
>
> How are interpolated the rotational degrees of freedom? AFAIK, when using
> smoothed aggregation, the interpolation process tries to mimic linear
> interpolation, which can be OK for the displacement DOF but is not for the
> rotational DOF using some plate and shell formulations.
> This can explain poor convergence of a multilevel approach, which needs to
> restrict and extrapolate the unknowns. In order to check this hypothesis,
> you can try a test case with zero rotations.
>
> Nicolas
>
> Le lun. 29 oct. 2018 à 22:13, Mark Adams via petsc-users <
> petsc-users@mcs.anl.gov> a écrit :
>
>> * the two level results tell us that MG is not doing well on the coarse
>> grids. So the coarse grids are the problem.
>>
>> * Do not worry about timing now. Get the math correct. The two level
>> solve is not meant to be a solution just a diagnostic so don't try to
>> optimize it by squaring the graph. Use -pc_gamg_square_graph 0.
>>
>> * It looks like you don't need 4 smoothing steps but lets keep it and we
>> can dial it back later.
>>
>> * This table is interesting. First, you had about 12 iterations earlier
>> and I think your rtol was tighter than the default (so the iteration could
>> should go down not up). Do you know what change here?
>>
>> Note, even though -mg_levels_ksp_max_it is not in the ksp_view it does
>> work. It is syntactic sugar to just add it to all levels like you did
>> manually.
>>
>> Anyway, these number look reasonable. It is interesting that 3 levels ran
>> well but the 4th level ran poorly. This implies we want to slow down
>> coarsening on these levels, but ...
>>
>> First can you please rerun this experiment with -pc_gamg_square_graph 0.
>>
>> Also, please run with -info. This is very noisy but you can grep on
>> "GAMG" and send that output to us (about 15 lines).
>>
>> Thanks,
>> Mark
>>
>>
>>
>> On Mon, Oct 29, 2018 at 3:34 PM Manav Bhatia 
>> wrote:
>>
>>> Barry,
>>>
>>>Here are some quick numbers with the following options on 4 CPUs and
>>> 543,606 dofs:
>>>
>>> -mg_levels_ksp_max_it 4 -pc_gamg_square_graph 1 -pc_gamg_threshold 0.
>>>
>>>  #levels   |#KSP Iters
>>> ———
>>>  2|   18
>>>  3|   18
>>>  4|   40
>>>  5|   59
>>>
>>> -Manav
>>>
>>>
>>> On Oct 29, 2018, at 2:06 PM, Smith, Barry F.  wrote:
>>>
>>>
>>>  Exactly how much does it increase with number of levels? Send a chart
>>> number of levels and number of iterations. With say -mg_levels_ksp_maxit 4
>>>
>>>   Thanks
>>>
>>>   Barry
>>>
>>>
>>>
>>>
>>> On Oct 29, 2018, at 12:59 PM, Manav Bhatia 
>>> wrote:
>>>
>>> Thanks for the clarification.
>>>
>>> I also observed that the number of KSP iterations increases with an
>>> increase in the levels of AMG. Is this true, in general, for all/most
>>> applications?
>>>
>>> -Manav
>>>
>>> On Oct 29, 2018, at 12:53 PM, Jed Brown  wrote:
>>>
>>> Manav Bhatia  writes:
>>>
>>> Thanks, Jed.
>>>
>>> The description says: “ Square the graph, ie. compute A'*A before
>>> aggregating it"
>>>
>>> What is A here?
>>>
>>>
>>> The original matrix, or its "graph" (your 6x6 blocks condensed to
>>> scalars).
>>>
>>> What is the impact of setting this to 0, which led to a very significant
>>> increase in the CPU time in my case?
>>>
>>>
>>> The aggregates are formed on the connectivity of your original matrix,
>>> so root nodes are aggregated only with their first neighbors, resulting
>>> in slower coarsening.
>>>
>>>
>>>
>>>
>>>


Re: [petsc-users] Convergence of AMG

2018-10-29 Thread Mark Adams via petsc-users
* The two-level results tell us that MG is not doing well on the coarse
grids, so the coarse grids are the problem.

* Do not worry about timing now; get the math correct. The two-level solve
is not meant to be a solution, just a diagnostic, so don't try to optimize it
by squaring the graph. Use -pc_gamg_square_graph 0.
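
A two-level diagnostic run could therefore look something like (illustrative):

  -pc_type gamg -pc_mg_levels 2 -pc_gamg_square_graph 0 \
  -mg_levels_ksp_max_it 4 -ksp_monitor -ksp_converged_reason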

* It looks like you don't need 4 smoothing steps, but let's keep it and we
can dial it back later.

* This table is interesting. First, you had about 12 iterations earlier and
I think your rtol was tighter than the default (so the iteration count
should go down, not up). Do you know what changed here?
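
(When comparing iteration counts across runs it helps to pin the tolerance
explicitly, e.g. -ksp_rtol 1.0e-8 together with -ksp_converged_reason; the
PETSc default relative tolerance is 1e-5.)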

Note that even though -mg_levels_ksp_max_it is not shown in the ksp_view
output, it does work; it is syntactic sugar that applies the option to all
levels, like you did manually.
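
In other words, both forms are accepted (illustrative):

  -mg_levels_ksp_max_it 4      (applied to the smoother on every level)
  -mg_levels_2_ksp_max_it 4    (applied to level 2 only)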

Anyway, these numbers look reasonable. It is interesting that 3 levels ran
well but the 4-level case ran poorly. This implies we want to slow down
coarsening on those levels, but ...

First can you please rerun this experiment with -pc_gamg_square_graph 0.

Also, please run with -info. This is very noisy but you can grep on "GAMG"
and send that output to us (about 15 lines).
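
For example (sketch; the executable name is illustrative):

  mpiexec -n 4 ./yourapp [your usual options] -info 2>&1 | grep GAMG

-info writes to standard output, so piping through grep keeps just the GAMG
coarsening summary lines.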

Thanks,
Mark



On Mon, Oct 29, 2018 at 3:34 PM Manav Bhatia  wrote:

> Barry,
>
>Here are some quick numbers with the following options on 4 CPUs and
> 543,606 dofs:
>
> -mg_levels_ksp_max_it 4 -pc_gamg_square_graph 1 -pc_gamg_threshold 0.
>
>  #levels   |#KSP Iters
> ———
>  2|   18
>  3|   18
>  4|   40
>  5|   59
>
> -Manav
>
>
> On Oct 29, 2018, at 2:06 PM, Smith, Barry F.  wrote:
>
>
>  Exactly how much does it increase with number of levels? Send a chart
> number of levels and number of iterations. With say -mg_levels_ksp_maxit 4
>
>   Thanks
>
>   Barry
>
>
>
>
> On Oct 29, 2018, at 12:59 PM, Manav Bhatia  wrote:
>
> Thanks for the clarification.
>
> I also observed that the number of KSP iterations increases with an
> increase in the levels of AMG. Is this true, in general, for all/most
> applications?
>
> -Manav
>
> On Oct 29, 2018, at 12:53 PM, Jed Brown  wrote:
>
> Manav Bhatia  writes:
>
> Thanks, Jed.
>
> The description says: “ Square the graph, ie. compute A'*A before
> aggregating it"
>
> What is A here?
>
>
> The original matrix, or its "graph" (your 6x6 blocks condensed to scalars).
>
> What is the impact of setting this to 0, which led to a very significant
> increase in the CPU time in my case?
>
>
> The aggregates are formed on the connectivity of your original matrix,
> so root nodes are aggregated only with their first neighbors, resulting
> in slower coarsening.
>
>
>
>
>

