Re: [petsc-users] Memory growth issue

2019-05-29 Thread Smith, Barry F. via petsc-users


   This is indeed worrisome. 

Would it be possible to put PetscMemoryGetCurrentUsage() around each call
to KSPSolve() and each call to your data exchange, to see whether the usage
increases at each step?
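
A minimal sketch of that instrumentation in C (the Fortran calls are analogous);
"exchange_data" below is only a placeholder for your MPI data-exchange routine:

    PetscLogDouble before, after;
    PetscErrorCode ierr;

    ierr = PetscMemoryGetCurrentUsage(&before);CHKERRQ(ierr);
    ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
    ierr = PetscMemoryGetCurrentUsage(&after);CHKERRQ(ierr);
    ierr = PetscPrintf(PETSC_COMM_SELF, "[solve]    resident growth: %g bytes\n",
                       (double)(after - before));CHKERRQ(ierr);

    ierr = PetscMemoryGetCurrentUsage(&before);CHKERRQ(ierr);
    exchange_data();                          /* your MPI_Isend/MPI_Recv exchange */
    ierr = PetscMemoryGetCurrentUsage(&after);CHKERRQ(ierr);
    ierr = PetscPrintf(PETSC_COMM_SELF, "[exchange] resident growth: %g bytes\n",
                       (double)(after - before));CHKERRQ(ierr);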

One thing to be aware of with "max resident set size" is that it measures 
the number of pages that have been set up, not the amount of memory allocated. 
So if, for example, you allocate a very large array but don't actually read or 
write the memory in that array until later in the code, it won't appear in the 
"resident set size" until you read or write the memory (because Unix doesn't 
set up pages until it needs to). 
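
For example, the following small stand-alone C program (not PETSc-specific;
note that ru_maxrss is reported in kilobytes on Linux and in bytes on macOS)
shows the resident set growing only when the pages are touched:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/resource.h>

    static long maxrss(void)
    {
      struct rusage ru;
      getrusage(RUSAGE_SELF, &ru);
      return ru.ru_maxrss;                     /* kB on Linux, bytes on macOS */
    }

    int main(void)
    {
      size_t n = (size_t)256 * 1024 * 1024;    /* allocate 256 MB */
      char  *a = malloc(n);
      printf("after malloc: %ld\n", maxrss()); /* typically unchanged */
      memset(a, 1, n);                         /* touch every page */
      printf("after touch:  %ld\n", maxrss()); /* now it grows */
      free(a);
      return 0;
    }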

   You should also try another MPI. Both OpenMPI and MPICH can be installed 
with brew, or you can use --download-mpich or --download-openmpi to see if the 
MPI implementation is making a difference.

For now I would focus on the PETSc-only solvers to eliminate one variable 
from the equation; once that is understood you can go back to the question of 
the memory management of the other solvers.

  Barry


> On May 29, 2019, at 11:51 PM, Sanjay Govindjee via petsc-users 
>  wrote:
> 
> I am trying to track down a memory issue with my code; apologies in advance 
> for the longish message.
> 
> I am solving an FEA problem with a number of load steps involving about 3000
> right-hand-side and tangent assemblies and solves.  The program is mainly 
> Fortran, with a C memory allocator.
> 
> When I run my code in strictly serial mode (no Petsc or MPI routines) the 
> memory stays constant over the whole run.
> 
> When I run it in parallel mode with petsc solvers with num_processes=1, the 
> memory (max resident set size) also stays constant:
> 
> PetscMalloc = 28,976, ProgramNativeMalloc = constant, Resident Size = 
> 24,854,528 (constant) [CG/JACOBI]
> 
> [PetscMalloc and Resident Size as reported by PetscMallocGetCurrentUsage and 
> PetscMemoryGetCurrentUsage (and summed across processes as needed);
> ProgramNativeMalloc reported by program memory allocator.]
> 
> When I run it in parallel mode with petsc solvers but num_processes=2, the 
> resident memory grows steadily during the run:
> 
> PetscMalloc = 3,039,072 (constant), ProgramNativeMalloc = constant, Resident 
> Size = (finish) 31,313,920 (start) 24,698,880 [CG/JACOBI]
> 
> When I run it in parallel mode with petsc solvers but num_processes=4, the 
> resident memory grows steadily during the run:
> 
> PetscMalloc = 3,307,888 (constant), ProgramNativeMalloc = 1,427,584 
> (constant), Resident Size = (finish) 70,787,072  (start) 45,801,472 
> [CG/JACOBI]
> PetscMalloc = 5,903,808 (constant), ProgramNativeMalloc = 1,427,584 
> (constant), Resident Size = (finish) 112,410,624 (start) 52,076,544 
> [GMRES/BJACOBI]
> PetscMalloc = 3,188,944 (constant), ProgramNativeMalloc = 1,427,584 
> (constant), Resident Size = (finish) 712,798,208 (start) 381,480,960 [SUPERLU]
> PetscMalloc = 6,539,408 (constant), ProgramNativeMalloc = 1,427,584 
> (constant), Resident Size = (finish) 591,048,704 (start) 278,671,360 [MUMPS]
> 
> The memory growth feels alarming but maybe I do not understand the values in 
> ru_maxrss from getrusage().
> 
> My box (MacBook Pro) has a broken Valgrind so I need to get to a system with 
> a functional one; notwithstanding, the code has always been Valgrind clean.
> There are no Fortran Pointers or Fortran Allocatable arrays in the part of 
> the code being used.  The program's C memory allocator keeps track of
> itself so I do not see that the problem is there.  The Petsc malloc is also 
> steady.
> 
> Other random hints:
> 
> 1) If I comment out the call to KSPSolve and to my MPI data-exchange routine 
> (for passing solution values between processes after each solve, using
> MPI_Isend, MPI_Recv, MPI_BARRIER) the memory growth essentially goes 
> away.
> 
> 2) If I comment out the call to my MPI data-exchange routine but leave the 
> call to KSPSolve the problem remains but is substantially reduced
> for CG/JACOBI, and is marginally reduced for the GMRES/BJACOBI, SUPERLU, and 
> MUMPS runs.
> 
> 3) If I comment out the call to KSPSolve but leave the call to my MPI 
> data-exchange routine the problem remains.
> 
> Any suggestions/hints of where to look will be great.
> 
> -sanjay
> 
> 



Re: [petsc-users] Nonzero I-j locations

2019-05-29 Thread Jed Brown via petsc-users
"Smith, Barry F."  writes:

>   Sorry, my mistake. I assumed that the naming would follow PETSc convention 
> and there would be MatGetLocalSubMatrix_something() as there is 
> MatGetLocalSubMatrix_IS() and MatGetLocalSubMatrix_Nest(). Instead 
> MatGetLocalSubMatrix() is hardwired to call MatCreateLocalRef() if the 
> method is not provided for the original matrix. 
>
>   Now interestingly MatCreateLocalRef() has its own manual page which states: 
> Most will use MatGetLocalSubMatrix(). I am not sure why MatCreateLocalRef() 
> is a public function (that is, why it would ever be called directly). Perhaps 
> a note could be added to its manual page indicating why it is public. My 
> inclination would be to make it private and call it 
> MatGetLocalSubMatrix_Basic(). There is harm in having multiple similar public 
> functions unless there is a true need for them.

I think the motivation was that it's actually a Mat implementation and
thus made sense as a Developer-level interface rather than a strictly
internal one.  I don't know if we had in mind any use cases where
it could be useful to a general caller.  I don't have a strong opinion
at the moment about whether it makes sense to keep it like this or make
it internal.


Re: [petsc-users] Nonzero I-j locations

2019-05-29 Thread Smith, Barry F. via petsc-users


  Sorry, my mistake. I assumed that the naming would follow PETSc convention 
and there would be MatGetLocalSubMatrix_something() as there is 
MatGetLocalSubMatrix_IS() and MatGetLocalSubMatrix_Nest(). Instead 
MatGetLocalSubMatrix() is hardwired to call MatCreateLocalRef() if the 
method is not provided for the original matrix. 

  Now interestingly MatCreateLocalRef() has its own manual page which states: 
Most will use MatGetLocalSubMatrix(). I am not sure why MatCreateLocalRef() is 
a public function (that is, why it would ever be called directly). Perhaps a 
note could be added to its manual page indicating why it is public. My 
inclination would be to make it private and call it 
MatGetLocalSubMatrix_Basic(). There is harm in having multiple similar public 
functions unless there is a true need for them.

  Barry

I don't remember the names of anything in PETSc; I only remember the naming 
conventions, hence when something is nonstandard I tend to get lost.


> On May 29, 2019, at 11:13 PM, Jed Brown  wrote:
> 
> "Smith, Barry F. via petsc-users"  writes:
> 
>>   This is an interesting idea, but unfortunately not directly compatible 
>> with libMesh filling up the finite element part of the matrix. Plus it 
>> appears MatGetLocalSubMatrix() is only implemented for IS and Nest matrices 
>> :-(
> 
> Maybe I'm missing something, but MatGetLocalSubMatrix *is* implemented
> for arbitrary Mats; it returns a view that allows you to set entries
> using local submatrix indexing.  That was a key feature of the MatNest
> work from so many years ago and a paper on which you're a coauthor.  ;-)



Re: [petsc-users] Nonzero I-j locations

2019-05-29 Thread Jed Brown via petsc-users
"Smith, Barry F. via petsc-users"  writes:

>This is an interesting idea, but unfortunately not directly compatible 
> with libMesh filling up the finite element part of the matrix. Plus it 
> appears MatGetLocalSubMatrix() is only implemented for IS and Nest matrices 
> :-(

Maybe I'm missing something, but MatGetLocalSubMatrix *is* implemented
for arbitrary Mats; it returns a view that allows you to set entries
using local submatrix indexing.  That was a key feature of the MatNest
work from so many years ago and a paper on which you're a coauthor.  ;-)
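
For reference, a minimal sketch of that usage pattern (assuming the parent
matrix A already has local-to-global mappings set with
MatSetLocalToGlobalMapping(), and that isrow/iscol describe one block in local
numbering; error handling abbreviated):

    Mat         Asub;
    PetscInt    i = 0, j = 0;   /* indices local to the (isrow,iscol) block */
    PetscScalar v = 1.0;

    ierr = MatGetLocalSubMatrix(A, isrow, iscol, &Asub);CHKERRQ(ierr);
    /* entries set with block-local indices are written into the parent A */
    ierr = MatSetValuesLocal(Asub, 1, &i, 1, &j, &v, ADD_VALUES);CHKERRQ(ierr);
    ierr = MatRestoreLocalSubMatrix(A, isrow, iscol, &Asub);CHKERRQ(ierr);
    ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);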


Re: [petsc-users] parallel dual porosity

2019-05-29 Thread Matthew Knepley via petsc-users
On Wed, May 29, 2019 at 10:54 PM Adrian Croucher 
wrote:

> On 30/05/19 2:45 PM, Matthew Knepley wrote:
>
>
> Hmm, I had not thought about that. It will not do that at all. We have
> never rebalanced a simulation
> using overlap cells. I would have to write the code that strips them out.
> Not hard, but more code.
> If you only plan on redistributing once, you can wait until then to add
> the overlap cells.
>
>
> So for now at least, you would suggest doing the initial distribution with
> overlap = 0, and then the redistribution with overlap = 1?
>
> (I shouldn't need to redistribute more than once.)
>

Yep. Those actions are actually separate too. All Distribute() does is
first distribute with no overlap, and then call DistributeOverlap().
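
A minimal sketch of those two separate stages (initial distribution with no
overlap, overlap added only once the final partition is settled; error
handling abbreviated):

    DM dmDist = NULL, dmOverlap = NULL;

    /* initial distribution, no overlap cells */
    ierr = DMPlexDistribute(dm, 0, NULL, &dmDist);CHKERRQ(ierr);
    if (dmDist)    { ierr = DMDestroy(&dm);CHKERRQ(ierr); dm = dmDist; }

    /* ... later, once the final partition is decided, add one level of overlap */
    ierr = DMPlexDistributeOverlap(dm, 1, NULL, &dmOverlap);CHKERRQ(ierr);
    if (dmOverlap) { ierr = DMDestroy(&dm);CHKERRQ(ierr); dm = dmOverlap; }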

  Thanks,

Matt

> - Adrian
>
> --
> Dr Adrian Croucher
> Senior Research Fellow
> Department of Engineering Science
> University of Auckland, New Zealand
> email: a.crouc...@auckland.ac.nz
> tel: +64 (0)9 923 4611
>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ 


Re: [petsc-users] parallel dual porosity

2019-05-29 Thread Adrian Croucher via petsc-users


On 30/05/19 2:45 PM, Matthew Knepley wrote:


Hmm, I had not thought about that. It will not do that at all. We have 
never rebalanced a simulation
using overlap cells. I would have to write the code that strips them 
out. Not hard, but more code.
If you only plan on redistributing once, you can wait until then to 
add the overlap cells.



So for now at least, you would suggest doing the initial distribution 
with overlap = 0, and then the redistribution with overlap = 1?


(I shouldn't need to redistribute more than once.)

- Adrian

--
Dr Adrian Croucher
Senior Research Fellow
Department of Engineering Science
University of Auckland, New Zealand
email: a.crouc...@auckland.ac.nz
tel: +64 (0)9 923 4611



Re: [petsc-users] Matrix free GMRES seems to ignore my initial guess

2019-05-29 Thread Jan Izak Cornelius Vermaak via petsc-users
Just some feedback. I found the problem. For reference my solve was called
as follows

KSPSolve(ksp,b,phi_new)

Inside my matrix operation (the "Matrix-Action" or MAT_OP_MULT) I was using
phi_new for a computation, and that overwrote my initial guess every time.
It looks like the solver still holds on to phi_new and uses it internally, so
when I modify it, it basically changes the entire behavior. Lesson
learned: inside the custom MAT_OP_MULT, do not modify b or phi_new.
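
A minimal sketch of a shell multiply that respects this rule: it only reads x
and writes y, and never touches the vectors passed to KSPSolve(). The names
here are illustrative, not taken from the actual sweep code:

    static PetscErrorCode MyMatMult(Mat A, Vec x, Vec y)
    {
      void          *ctx;
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      ierr = MatShellGetContext(A, &ctx);CHKERRQ(ierr);
      /* apply the operator using data in ctx: read from x, write into y only */
      PetscFunctionReturn(0);
    }

    /* setup */
    ierr = MatCreateShell(PETSC_COMM_WORLD, nlocal, nlocal, N, N, userctx, &A);CHKERRQ(ierr);
    ierr = MatShellSetOperation(A, MATOP_MULT, (void (*)(void))MyMatMult);CHKERRQ(ierr);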

Thanks for the help.

Regards,
Jan Vermaak

On Tue, May 28, 2019 at 1:16 AM Smith, Barry F.  wrote:

>
>
> > On May 28, 2019, at 12:41 AM, Jan Izak Cornelius Vermaak <
> janicverm...@gmail.com> wrote:
> >
> > Checking the matrix would be hard as I have a really big operator. It's
> transport sweeps.
> >
> > When I increase the restart interval the solution converges to the right
> one.
>
>   Run with -ksp_monitor_true_residual what are the true residuals being
> printed?
>
>   The GMRES code has been in continuous use for 25 years; it would be
> stunning if you suddenly found a bug in it.
>
>    How it works: within a restart, the GMRES algorithm uses a simple
> recursive formula to compute an "estimate" of the residual norm. At
> restart it actually computes the current solution and then uses that to
> compute an accurate residual norm via the formula b - A x. When the
> residual computed from b - A x is different from that computed by the
> recursive formula, it means the recursive formula has run into some
> difficulty (bad operator, bad preconditioner, null space in the operator)
> and is not computing correct values. Now if you increase the restart
> past the point where it "converges" you are hiding the incorrect
> values computed via the recursive formula.
>
>   I urge you to check the residual norm by b - A x at the end of the solve
> and double check that it is small. It seems unlikely GMRES is providing the
> correct answer for your problem.
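
A minimal sketch of that b - A x check after KSPSolve() (r is a work vector
with the same layout as b; error handling abbreviated):

    Vec       r;
    PetscReal rnorm;

    ierr = VecDuplicate(b, &r);CHKERRQ(ierr);
    ierr = MatMult(A, x, r);CHKERRQ(ierr);      /* r = A x     */
    ierr = VecAYPX(r, -1.0, b);CHKERRQ(ierr);   /* r = b - A x */
    ierr = VecNorm(r, NORM_2, &rnorm);CHKERRQ(ierr);
    ierr = PetscPrintf(PETSC_COMM_WORLD, "true residual norm %g\n", (double)rnorm);CHKERRQ(ierr);
    ierr = VecDestroy(&r);CHKERRQ(ierr);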
>
>Barry
>
>
> > Checked against a reference solution and Classic Richardson. It is
> really as if the initial guess is completely ignored.
> >
> > [0]  Computing b
> > [0]  Iteration 0 Residual 169.302
> > [0]  Iteration 1 Residual 47.582
> > [0]  Iteration 2 Residual 13.2614
> > [0]  Iteration 3 Residual 4.46795
> > [0]  Iteration 4 Residual 1.03038
> > [0]  Iteration 5 Residual 0.246807
> > [0]  Iteration 6 Residual 0.0828341
> > [0]  Iteration 7 Residual 0.0410627
> > [0]  Iteration 8 Residual 0.0243749
> > [0]  Iteration 9 Residual 0.0136067
> > [0]  Iteration 10 Residual 0.00769078
> > [0]  Iteration 11 Residual 0.00441658
> > [0]  Iteration 12 Residual 0.00240794
> > [0]  Iteration 13 Residual 0.00132048
> > [0]  Iteration 14 Residual 0.00073003
> > [0]  Iteration 15 Residual 0.000399504
> > [0]  Iteration 16 Residual 0.000217677
> > [0]  Iteration 17 Residual 0.000120408
> > [0]  Iteration 18 Residual 6.49719e-05
> > [0]  Iteration 19 Residual 3.44523e-05
> > [0]  Iteration 20 Residual 1.87909e-05
> > [0]  Iteration 21 Residual 1.02385e-05
> > [0]  Iteration 22 Residual 5.57859e-06
> > [0]  Iteration 23 Residual 3.03431e-06
> > [0]  Iteration 24 Residual 1.63696e-06
> > [0]  Iteration 25 Residual 8.78202e-07
> >
> > On Mon, May 27, 2019 at 11:55 PM Smith, Barry F. 
> wrote:
> >
> >This behavior where the residual norm jumps at restart indicates
> something is very very wrong. Run with the option
> -ksp_monitor_true_residual and I think you'll see the true residual is not
> decreasing as is the preconditioned residual. My guess is that your "action
> of the matrix" is incorrect and not actually a linear operator. Try using
> MatComputeExplicitOperator() and see what explicit matrix it produces, is
> it what you expect?
> >
> >Barry
> >
> >
> >
> >
> > > On May 27, 2019, at 11:33 PM, Jan Izak Cornelius Vermaak via
> petsc-users  wrote:
> > >
> > > Hi all,
> > >
> > > So I am faced with this debacle. I have a neutron transport solver
> with a sweep code that can compute the action of the matrix on a vector.
> > >
> > > I use a matrix shell to set up the action of the matrix. The method
> works but only if I can get the solution converged before GMRES restarts.
> It gets the right answer. Now my first problem (and I only saw this when
> I hit the first restart) is that it looks like the solver completely resets
> after the GMRES restart. Below is an iteration log with restart interval
> set to 10. At first I thought it wasn't updating the initial guess, but it
> became clear that the initial guess always had no effect. I do set
> KSPSetInitialGuessNonzero but it has no effect.
> > >
> > > Is the matrix-free business defaulting my initial guess to zero
> every time? What can I do to actually supply an initial guess? I've used
> PETSc for diffusion many times and the initial guess always works, just not
> now.
> > >
> > > [0]  Computing b
> > > [0]  Iteration 0 Residual 169.302
> > > [0]  Iteration 1 Residual 47.582
> > > [0]  Iteration 2 Residual 13.2614

Re: [petsc-users] parallel dual porosity

2019-05-29 Thread Matthew Knepley via petsc-users
On Wed, May 29, 2019 at 10:38 PM Adrian Croucher 
wrote:

> hi
> On 28/05/19 11:32 AM, Matthew Knepley wrote:
>
>
> I would not do that. It should be much easier, and better from a workflow
> standpoint,
> to just redistribute in parallel. We now have several test examples that
> redistribute
> in parallel, for example
>
>
> https://bitbucket.org/petsc/petsc/src/cd762eb66180d8d1fcc3950bd19a3c1b423f4f20/src/dm/impls/plex/examples/tests/ex1.c#lines-486
>
> Let us know if you have problems.
>
> If you use DMPlexDistribute() a second time to redistribute the mesh,
> presumably it will first delete all the partition ghost cells created from
> the initial distribution (and create new ones after redistribution).
>
> Hmm, I had not thought about that. It will not do that at all. We have
never rebalanced a simulation
using overlap cells. I would have to write the code that strips them out.
Not hard, but more code.
If you only plan on redistributing once, you can wait until then to add the
overlap cells.

> If I have also used DMPlexConstructGhostCells() to create boundary
> condition ghost cells on the distributed mesh, what happens to them when I
> redistribute? Do they get copied over to the redistributed mesh? or is it
> better not to add them until the mesh has been redistributed?
>
> Same thing here. Don't add ghost cells until after redistribution.

If you want to redistribute multiple times, again we would need to strip
out those cells.

  Thanks,

Matt

> - Adrian
>
> --
> Dr Adrian Croucher
> Senior Research Fellow
> Department of Engineering Science
> University of Auckland, New Zealand
> email: a.crouc...@auckland.ac.nz
> tel: +64 (0)9 923 4611
>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ 


Re: [petsc-users] parallel dual porosity

2019-05-29 Thread Adrian Croucher via petsc-users

hi

On 28/05/19 11:32 AM, Matthew Knepley wrote:


I would not do that. It should be much easier, and better from a 
workflow standpoint,
to just redistribute in parallel. We now have several test examples 
that redistribute

in parallel, for example

https://bitbucket.org/petsc/petsc/src/cd762eb66180d8d1fcc3950bd19a3c1b423f4f20/src/dm/impls/plex/examples/tests/ex1.c#lines-486

Let us know if you have problems.


If you use DMPlexDistribute() a second time to redistribute the mesh, 
presumably it will first delete all the partition ghost cells created 
from the initial distribution (and create new ones after redistribution).


If I have also used DMPlexConstructGhostCells() to create boundary 
condition ghost cells on the distributed mesh, what happens to them when 
I redistribute? Do they get copied over to the redistributed mesh? or is 
it better not to add them until the mesh has been redistributed?


- Adrian

--
Dr Adrian Croucher
Senior Research Fellow
Department of Engineering Science
University of Auckland, New Zealand
email: a.crouc...@auckland.ac.nz
tel: +64 (0)9 923 4611



Re: [petsc-users] Memory inquire functions

2019-05-29 Thread Smith, Barry F. via petsc-users


  They are for the given process. 
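
If a summed total is wanted, the per-process numbers can be reduced explicitly;
a minimal sketch in C (the Fortran calls are analogous):

    PetscLogDouble mem, memTotal;

    ierr = PetscMemoryGetCurrentUsage(&mem);CHKERRQ(ierr);
    ierr = MPI_Reduce(&mem, &memTotal, 1, MPIU_PETSCLOGDOUBLE, MPI_SUM, 0,
                      PETSC_COMM_WORLD);CHKERRQ(ierr);
    ierr = PetscPrintf(PETSC_COMM_WORLD, "total resident memory: %g bytes\n",
                       (double)memTotal);CHKERRQ(ierr);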


> On May 29, 2019, at 7:10 PM, Sanjay Govindjee via petsc-users 
>  wrote:
> 
> (In Fortran) do the calls
> 
> call PetscMallocGetCurrentUsage(val, ierr)
> call PetscMemoryGetCurrentUsage(val, ierr)
> 
> return the per process memory numbers? or are the returned values summed 
> across all processes?
> 
> -sanjay
> 



Re: [petsc-users] Nonzero I-j locations

2019-05-29 Thread Smith, Barry F. via petsc-users


   This is an interesting idea, but unfortunately not directly compatible with 
libMesh filling up the finite element part of the matrix. Plus it appears 
MatGetLocalSubMatrix() is only implemented for IS and Nest matrices :-(

   You could create a MATNEST reusing exactly the matrix from libMesh as the 
first block and call MatConvert() to MPIAIJ format. This is easier, I guess, than 
coding the conversion yourself. (It still has the memory and copy issues, but 
perhaps that is the best we can do.) Note that MATNEST requires that all its matrices 
live on all the ranks of the MPI_Comm, so for your A, B, and C you will need to 
declare them on the MPI_Comm with zero rows and columns for most ranks (maybe 
all but one).
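
   A minimal sketch of that route, using the notation of this thread (J_fe is
spelled Jfe; it assumes the four blocks already exist on the same communicator,
with the empty-rank caveat above, and error handling is abbreviated):

    Mat blocks[4] = {Jfe, A, B, C};   /* row-major 2x2 block layout */
    Mat nest, monolithic;

    ierr = MatCreateNest(PETSC_COMM_WORLD, 2, NULL, 2, NULL, blocks, &nest);CHKERRQ(ierr);
    ierr = MatConvert(nest, MATMPIAIJ, MAT_INITIAL_MATRIX, &monolithic);CHKERRQ(ierr);
    /* monolithic is an ordinary MPIAIJ matrix that MUMPS (via PCLU) can factor */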

   Barry






> On May 29, 2019, at 6:43 PM, Matthew Knepley  wrote:
> 
> On Wed, May 29, 2019 at 7:30 PM Manav Bhatia via petsc-users 
>  wrote:
> 
> I would not choose Nest if you want to eventually run MUMPS, since that will 
> not work. I would
> try to build your matrix using
> 
>   
> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetLocalSubMatrix.html
> 
> obtained from your bigger matrix. This is our interface to assembling into 
> portions of your matrix as if
> it's the entire matrix.
> 
>   Thanks,
> 
> Matt
>  
>   Also, I am currently using MUMPS for all my parallel solves. 
> 
>I would appreciate any advice. 
> 
> Regards,
> Manav
> 
> 
> > On May 29, 2019, at 6:07 PM, Smith, Barry F.  wrote:
> > 
> > 
> >  Manav,
> > 
> >  For parallel sparse matrices using the standard PETSc formats the matrix 
> > is stored in two parts on each process (see the details in MatCreateAIJ()) 
> > thus there is no inexpensive way to access directly the IJ locations as a 
> > single local matrix. What are you hoping to use the information for? 
> > Perhaps we have other suggestions on how to achieve the goal.
> > 
> >   Barry
> > 
> > 
> >> On May 29, 2019, at 2:27 PM, Manav Bhatia via petsc-users 
> >>  wrote:
> >> 
> >> Hi, 
> >> 
> >>   Once a MPI-AIJ matrix has been assembled, is there a method to get the 
> >> nonzero I-J locations? I see one for sequential matrices here: 
> >> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetRowIJ.html
> >>  , but not for parallel matrices. 
> >> 
> >> Regards,
> >> Manav
> >> 
> >> 
> > 
> 
> 
> 
> -- 
> What most experimenters take for granted before they begin their experiments 
> is infinitely more interesting than any results to which their experiments 
> lead.
> -- Norbert Wiener
> 
> https://www.cse.buffalo.edu/~knepley/



[petsc-users] Memory inquire functions

2019-05-29 Thread Sanjay Govindjee via petsc-users

(In Fortran) do the calls

    call PetscMallocGetCurrentUsage(val, ierr)
    call PetscMemoryGetCurrentUsage(val, ierr)

return the per process memory numbers? or are the returned values summed 
across all processes?


-sanjay



Re: [petsc-users] Nonzero I-j locations

2019-05-29 Thread Smith, Barry F. via petsc-users

  Understood. Where are you putting the "few extra unknowns" in the vector and 
matrix? On the first process, on the last process, some places in the middle of 
the matrix?

   We don't have any trivial code for copying a big matrix into an even larger 
matrix directly because we frown on doing that. It is very wasteful in time and 
memory. 

   The simplest way to do it is to call MatGetRow() twice for each row: once to 
get the nonzero locations for each row and determine the counts needed for 
preallocation, and then a second time, after the big matrix has been 
preallocated, to get the nonzero locations and numerical values for the row and 
pass them to MatSetValues() to set that row into the bigger matrix. Note, of course, 
that when you call MatSetValues() you will need to shift the row and column 
locations to take into account the new rows and columns in the bigger matrix. 
If you put the "extra unknowns" at the very end of the rows/columns on the 
last process you won't have to shift.
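
   A minimal sketch of that two-pass copy (Jfe, Jbig, and shift are placeholders
for this illustration; shift is zero if the extra unknowns are appended after
the last process, and error handling is abbreviated):

    PetscInt           rstart, rend, row, ncols;
    const PetscInt    *cols;
    const PetscScalar *vals;

    ierr = MatGetOwnershipRange(Jfe, &rstart, &rend);CHKERRQ(ierr);

    /* pass 1: count nonzeros per row of Jfe to preallocate Jbig */
    for (row = rstart; row < rend; row++) {
      ierr = MatGetRow(Jfe, row, &ncols, NULL, NULL);CHKERRQ(ierr);
      /* record ncols (plus the extra dense columns from A) in the preallocation arrays */
      ierr = MatRestoreRow(Jfe, row, &ncols, NULL, NULL);CHKERRQ(ierr);
    }
    /* ... preallocate Jbig here ... */

    /* pass 2: copy values, shifting indices into the bigger matrix */
    for (row = rstart; row < rend; row++) {
      PetscInt newrow = row + shift, j;
      ierr = MatGetRow(Jfe, row, &ncols, &cols, &vals);CHKERRQ(ierr);
      for (j = 0; j < ncols; j++) {
        PetscInt newcol = cols[j] + shift;
        ierr = MatSetValues(Jbig, 1, &newrow, 1, &newcol, &vals[j], INSERT_VALUES);CHKERRQ(ierr);
      }
      ierr = MatRestoreRow(Jfe, row, &ncols, &cols, &vals);CHKERRQ(ierr);
    }
    ierr = MatAssemblyBegin(Jbig, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(Jbig, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);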

   Note that B being dense really messes up chances for load balancing since 
its rows are dense and take a great deal of space so whatever process gets 
those rows needs to have much less of the mesh. 

  The correct long-term approach is to have libMesh provide the needed 
functionality (for continuation) for the slightly larger matrix directly, so 
huge matrices do not need to be copied.

  I noticed that libMesh has some functionality related to continuation. I do 
not know if they handle it by creating the larger matrix and vector and filling 
that up directly for finite elements. If they do, then you should definitely 
take a look at that and see if it can be extended for your case (ignore the 
continuation algorithm they may be using; that is not relevant. The question is 
whether they generate the larger matrices and whether you can leverage this).


  The ultimate hack would be to (for example) assign the extra variables to the 
end of the last process and hack libMesh a little bit so the matrix it creates 
(before it puts in the numerical values) has the extra rows and columns, which 
libMesh will not put values into but you will. Thus you get libMesh to fill 
up the true final matrix for its finite element problem (not realizing the 
matrix is a little bigger than it needs to be) directly, with no copies of the 
data needed. But this is a bit tricky: you'll need to combine libMesh's preallocation 
information with yours for the final columns and rows before you have libMesh 
put the numerical values in. Double-check whether they have any support for this 
first. 

   Barry


> On May 29, 2019, at 6:29 PM, Manav Bhatia  wrote:
> 
> Thanks, Barry. 
> 
> I am working on a FE application (involving bifurcation behavior) with 
> libMesh where I need to solve the system of equations along with a few extra 
> unknowns that are not directly related to the FE mesh. I am able to assemble 
> the n x 1 residual (R_fe) and  n x n  Jacobian (J_fe ) from my code and 
> libMesh provides me with the sparsity pattern for this. 
> 
> Next, the system of equations that I need to solve is: 
> 
> [ J_fe  A ] { dX }   { R_fe  }
> [  B    C ] { dV } = { R_ext }
> 
> where C is a dense matrix of size m x m ( m << n ), A is n x m, B is m x n, 
> R_ext is m x 1.   A, B and C are dense matrices. This comes from the bordered 
> system for my path continuation solver. 
> 
> I have implemented a solver using Schur factorization ( this is outside of 
> PETSc and does not use the FieldSplit construct ). This works well for most 
> cases, except when J_fe is close to singular. 
> 
> I am now attempting to create a monolithic matrix that solves the complete 
> system. 
> 
> Currently, the approach I am considering is to compute J_fe using my libMesh 
> application, so that I don’t have to change that. I am defining a new matrix 
> with the extra non-zero locations for A, B, C. 
> 
>  With J_fe computed, I am looking to copy its non-zero entries to this new 
> matrix. This is where I am stumbling since I don’t know how best to get the 
> non-zero locations in J_fe. Maybe there is a better approach to copy from 
> J_fe to the new matrix? 
> 
>  I have looked through the nested matrix construct, but have not given this a 
> serious consideration. Maybe I should? Note that I don’t want to solve J_fe 
> and C separately (not as separate systems), so the field-split approach will 
> not be suitable here. 
> 
>  Also, I am currently using MUMPS for all my parallel solves. 
> 
>   I would appreciate any advice. 
> 
> Regards,
> Manav
> 
> 
>> On May 29, 2019, at 6:07 PM, Smith, Barry F.  wrote:
>> 
>> 
>> Manav,
>> 
>> For parallel sparse matrices using the standard PETSc formats the matrix is 
>> stored in two parts on each process (see the details in MatCreateAIJ()) thus 
>> there is no inexpensive way to access directly the IJ locations as a single 
>> local matrix. What are you hoping to use the information for? Perhaps we 
>> have other suggestions on how to achieve 

Re: [petsc-users] Nonzero I-j locations

2019-05-29 Thread Smith, Barry F. via petsc-users


  Manav,

  For parallel sparse matrices using the standard PETSc formats the matrix is 
stored in two parts on each process (see the details in MatCreateAIJ()) thus 
there is no inexpensive way to access directly the IJ locations as a single 
local matrix. What are you hoping to use the information for? Perhaps we have 
other suggestions on how to achieve the goal.

   Barry


> On May 29, 2019, at 2:27 PM, Manav Bhatia via petsc-users 
>  wrote:
> 
> Hi, 
> 
>Once a MPI-AIJ matrix has been assembled, is there a method to get the 
> nonzero I-J locations? I see one for sequential matrices here: 
> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetRowIJ.html
>  , but not for parallel matrices. 
> 
> Regards,
> Manav
> 
> 



Re: [petsc-users] Nonzero I-j locations

2019-05-29 Thread Zhang, Junchao via petsc-users
Yes, see MatGetRow 
https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetRow.html
--Junchao Zhang


On Wed, May 29, 2019 at 2:28 PM Manav Bhatia via petsc-users 
<petsc-users@mcs.anl.gov> wrote:
Hi,

   Once a MPI-AIJ matrix has been assembled, is there a method to get the 
nonzero I-J locations? I see one for sequential matrices here: 
https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetRowIJ.html
 , but not for parallel matrices.

Regards,
Manav




Re: [petsc-users] How do I supply the compiler PIC flag via CFLAGS, CXXXFLAGS, and FCFLAGS

2019-05-29 Thread Jed Brown via petsc-users
Lisandro Dalcin  writes:

> On Tue, 28 May 2019 at 22:05, Jed Brown  wrote:
>
>>
>> Note that all of these compilers (including Sun C, which doesn't define
>> the macro) recognize -fPIC.  (Blue Gene xlc requires -qpic.)  Do we
>> still need to test the other alternatives?
>>
>>
> Well, worst case, if the configure test always fails with and without all
> the possible variants of  the PIC flag (-fPIC, -kPIC, -qpic, etc.) because
> they do not define the __PIC__ macro, then you are free to abort configure
> and ask users to pass the pic flag in CFLAGS and remove --with-pic=1 from
> the configure line, as we do today for all compilers.

Yeah, this would also need to apply to shared libraries, where need for
PIC is implied and we should proceed to confirm that linking works
(i.e., the old test) using user-provided CFLAGS.

> BTW, my trick seems to works with the Cray compiler.
>
> $ cc
> CC-2107 craycc: ERROR in command line
>   No valid filenames are specified on the command line.
>
> $ cc -c check-pic.c -fPIC
>
> $ cc -c check-pic.c
> CC-35 craycc: ERROR File = check-pic.c, Line = 2
>   #error directive: "no-PIC"
>   #error "no-PIC"
>^
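
The check-pic.c file is presumably just a guard on the __PIC__ macro, along
these lines (reconstructed from the error output above, not copied from the
actual configure test):

    #ifndef __PIC__
    #error "no-PIC"
    #endif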
>
>
> -- 
> Lisandro Dalcin
> 
> Research Scientist
> Extreme Computing Research Center (ECRC)
> King Abdullah University of Science and Technology (KAUST)
> http://ecrc.kaust.edu.sa/


Re: [petsc-users] Stop KSP if diverging

2019-05-29 Thread Smith, Barry F. via petsc-users


   Hmm, in the latest couple of releases of PETSc, KSPSolve is supposed to 
end as soon as it hits a NaN or Infinity. Is that not happening for you? If you 
run with -ksp_monitor does it print multiple lines with NaN or Inf? If so, 
please send us the -ksp_view output so we can track down which solver is not 
correctly handling the NaN or Inf.

  That said, if you call KSPSolve() multiple times in a loop or from SNESSolve(), 
each new solve may have NaN or Inf (from the previous one), but it should only do 
one iteration before exiting.

   You should always call KSPGetConvergedReason() after KSPSolve() and confirm 
that the reason is positive; if it is negative, it indicates something failed in 
the solve.
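
   A minimal sketch of that check in C (the Fortran interface is analogous,
with an extra ierr argument):

    KSPConvergedReason reason;

    ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
    ierr = KSPGetConvergedReason(ksp, &reason);CHKERRQ(ierr);
    if (reason < 0) {
      ierr = PetscPrintf(PETSC_COMM_WORLD, "KSP diverged: %s\n",
                         KSPConvergedReasons[reason]);CHKERRQ(ierr);
    }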

   Barry


> On May 29, 2019, at 2:06 AM, Edoardo alinovi via petsc-users 
>  wrote:
> 
> Dear PETSc friends,
> 
> Hope you are doing all well.
> 
> I have a quick question for you that I am not able to solve by myself. From time 
> to time, while testing new code features, it happens that KSP diverges but it 
> does not stop automatically and the iterations continue even after getting a 
> NaN. 
> 
> In the KSP setup I use the following instruction to set the divergence 
> stopping criteria (div = 1):
> 
> call KSPSetTolerances(myksp,  rel_tol,  abs_tol, div,  itmax,  ierr)
> 
> But it does not help. Looking into the documentation I have found also:
> KSPConvergedDefault(KSP ksp,PetscInt n,PetscReal rnorm,KSPConvergedReason 
> *reason,void *ctx)
> Which I am not calling in the code. Is this maybe the reason of my problem? 
> If yes how can I use KSPConvergedDefault in FORTRAN? 
> 
> Thanks,
> 
> Edo
> 
> --
> 
> Edoardo Alinovi, Ph.D.
> 
> DICCA, Scuola Politecnica,
> Universita' degli Studi di Genova,
> 1, via Montallegro,
> 16145 Genova, Italy
> 
> Email: edoardo.alin...@dicca.unige.it
> Tel: +39 010 353 2540
> Website: https://www.edoardoalinovi.com/
> 
> 



Re: [petsc-users] Bad memory scaling with PETSc 3.10

2019-05-29 Thread Myriam Peyrounette via petsc-users
Oh sorry, I missed that. That's great!

Thanks,

Myriam


On 05/29/19 at 16:55, Zhang, Hong wrote:
> Myriam:
> This branch is merged to master.
> Thanks for your work and patience. It helps us a lot. The graphs are
> very nice :-)
>
> We plan to re-organise the APIs of mat-mat opts, make them easier for
> users.
> Hong
>
> Hi,
>
> Do you have any idea when Barry's fix
> 
> (https://bitbucket.org/petsc/petsc/pull-requests/1606/change-handling-of-matptap_mpiaij_mpimaij/diff)
> will be released? I can see it has been merged to the "next"
> branch. Does it mean it will be soon available on master?
>
> +for your information, I plotted a summary of the scalings of
> interest (memory and time):
> - using petsc-3.10.2 (ref "bad" scaling)
> - using petsc-3.6.4 (ref "good" scaling)
> - using commit d330a26 + Barry's fix and different algorithms
> (none, scalable, allatonce, allatonce_merged)
>
> Best regards,
>
> Myriam
>
>
>> On 05/13/19 at 17:20, Fande Kong wrote:
>> Hi Myriam,
>>
>> Thanks for your report back.
>>
>> On Mon, May 13, 2019 at 2:01 AM Myriam Peyrounette
>> > > wrote:
>>
>> Hi all,
>>
>> I tried with 3.11.1 version and Barry's fix. The good scaling
>> is back!
>> See the green curve in the plot attached. It is even better
>> than PETSc
>> 3.6! And it runs faster (10-15s instead of 200-300s with 3.6).
>>
>>
>> We are glad your issue was resolved here. 
>>  
>>
>>
>> So you were right. It seems that not all the PtAPs used the
>> scalable
>> version.
>>
>> I was a bit confused about the options to set... I used the
>> options:
>> -matptap_via scalable and -mat_freeintermediatedatastructures
>> 1. Do you
>> think it would be even better with allatonce?
>>
>>
>> "scalable" and "allatonce" correspond to different algorithms
>> respectively. ``allatonce" should be using less memory than
>> "scalable". The "allatonce" algorithm  would be a good
>> alternative if your application is memory sensitive and the
>> problem size is large. 
>> We are definitely curious about the memory usage of ``allatonce"
>> in your test cases but don't feel obligated to do these tests
>> since your concern were resolved now. In case you are also
>> interested in how our new algorithms perform, I post petsc
>> options here that are used to 
>> choose these algorithms:
>>
>> algorithm 1: ``allatonce" 
>>
>> -matptap_via allatonce
>> -mat_freeintermediatedatastructures 1
>>
>> algorithm 2: ``allatonce_merged" 
>>
>> -matptap_via allatonce_merged
>> -mat_freeintermediatedatastructures 1
>>
>>
>> Again, thanks for your report that help us improve PETSc.
>>
>> Fande,
>>  
>>
>>
>> It is unfortunate that this fix can't be merged with the
>> master branch.
>> But the patch works well and I can consider the issue as
>> solved now.
>>
>> Thanks a lot for your time!
>>
>> Myriam
>>
>>
>> On 05/04/19 at 06:54, Smith, Barry F. wrote:
>> >    Hmm, I had already fixed this, I think,
>> >
>> >   
>> 
>> https://bitbucket.org/petsc/petsc/pull-requests/1606/change-handling-of-matptap_mpiaij_mpimaij/diff
>> >
>> >    but unfortunately our backlog of pull requests kept it
>> out of master. We are (well Satish and Jed) working on a new
>> CI infrastructure that will hopefully be more stable than the
>> current CI that we are using.
>> >
>> >    Fande,
>> >       Sorry you had to spend time on this.
>> >
>> >
>> >    Barry
>> >
>> >
>> >
>> >> On May 3, 2019, at 11:20 PM, Fande Kong via petsc-users
>> >> <petsc-users@mcs.anl.gov> wrote:
>> >>
>> >> Hi Myriam,
>> >>
>> >> I run the example you attached earlier with "-mx 48 -my 48
>> -mz 48 -levels 3 -ksp_view  -matptap_via allatonce -log_view ". 
>> >>
>> >> There are six PtAPs. Two of them are still using the
>> nonscalable version of the algorithm (this might explain why
>> the memory still exponentially increases) even though we have
>> asked PETSc to use the ``allatonce" algorithm. This is
>> happening because MATMAIJ does not honor the petsc option,
>> instead, it uses the default setting of MPIAIJ.  I have a fix
>> at
>> 
>> https://bitbucket.org/petsc/petsc/pull-requests/1623/choose-algorithms-in/diff.
>> The PR should fix the issue.
>> >>
>> >> Thanks again for your report,
>> >>
>> >> Fande,
>> >>
>> >> 
>>
>> -- 
>> Myriam Peyrounette
>> CNRS/IDRIS - HLST
>> --
>>
>>
>

Re: [petsc-users] Bad memory scaling with PETSc 3.10

2019-05-29 Thread Zhang, Hong via petsc-users
Myriam:
This branch is merged to master.
Thanks for your work and patience. It helps us a lot. The graphs are very nice 
:-)

We plan to re-organise the APIs of mat-mat opts, make them easier for users.
Hong

Hi,

Do you have any idea when Barry's fix 
(https://bitbucket.org/petsc/petsc/pull-requests/1606/change-handling-of-matptap_mpiaij_mpimaij/diff)
 will be released? I can see it has been merged to the "next" branch. Does it 
mean it will be soon available on master?

+for your information, I plotted a summary of the scalings of interest (memory 
and time):
- using petsc-3.10.2 (ref "bad" scaling)
- using petsc-3.6.4 (ref "good" scaling)
- using commit d330a26 + Barry's fix and different algorithms (none, scalable, 
allatonce, allatonce_merged)

Best regards,

Myriam

On 05/13/19 at 17:20, Fande Kong wrote:
Hi Myriam,

Thanks for your report back.

On Mon, May 13, 2019 at 2:01 AM Myriam Peyrounette 
<myriam.peyroune...@idris.fr> wrote:
Hi all,

I tried with 3.11.1 version and Barry's fix. The good scaling is back!
See the green curve in the plot attached. It is even better than PETSc
3.6! And it runs faster (10-15s instead of 200-300s with 3.6).

We are glad your issue was resolved here.


So you were right. It seems that not all the PtAPs used the scalable
version.

I was a bit confused about the options to set... I used the options:
-matptap_via scalable and -mat_freeintermediatedatastructures 1. Do you
think it would be even better with allatonce?

"scalable" and "allatonce" correspond to different algorithms respectively. 
``allatonce" should be using less memory than "scalable". The "allatonce" 
algorithm  would be a good alternative if your application is memory sensitive 
and the problem size is large.
We are definitely curious about the memory usage of ``allatonce" in your test 
cases but don't feel obligated to do these tests since your concern was 
resolved now. In case you are also interested in how our new algorithms 
perform, I post petsc options here that are used to
choose these algorithms:

algorithm 1: ``allatonce"

-matptap_via allatonce
-mat_freeintermediatedatastructures 1

algorithm 2: ``allatonce_merged"

-matptap_via allatonce_merged
-mat_freeintermediatedatastructures 1


Again, thanks for your report that help us improve PETSc.

Fande,


It is unfortunate that this fix can't be merged with the master branch.
But the patch works well and I can consider the issue as solved now.

Thanks a lot for your time!

Myriam


On 05/04/19 at 06:54, Smith, Barry F. wrote:
>Hmm, I had already fixed this, I think,
>
>
> https://bitbucket.org/petsc/petsc/pull-requests/1606/change-handling-of-matptap_mpiaij_mpimaij/diff
>
>but unfortunately our backlog of pull requests kept it out of master. We 
> are (well Satish and Jed) working on a new CI infrastructure that will 
> hopefully be more stable than the current CI that we are using.
>
>Fande,
>   Sorry you had to spend time on this.
>
>
>Barry
>
>
>
>> On May 3, 2019, at 11:20 PM, Fande Kong via petsc-users 
>> <petsc-users@mcs.anl.gov> wrote:
>>
>> Hi Myriam,
>>
>> I run the example you attached earlier with "-mx 48 -my 48 -mz 48 -levels 3 
>> -ksp_view  -matptap_via allatonce -log_view ".
>>
>> There are six PtAPs. Two of them are still using the nonscalable version of 
>> the algorithm (this might explain why the memory still exponentially 
>> increases) even though we have asked PETSc to use the ``allatonce" 
>> algorithm. This is happening because MATMAIJ does not honor the petsc 
>> option, instead, it uses the default setting of MPIAIJ.  I have a fix at 
>> https://bitbucket.org/petsc/petsc/pull-requests/1623/choose-algorithms-in/diff.
>>  The PR should fix the issue.
>>
>> Thanks again for your report,
>>
>> Fande,
>>
>>

--
Myriam Peyrounette
CNRS/IDRIS - HLST
--




--
Myriam Peyrounette
CNRS/IDRIS - HLST
--



Re: [petsc-users] Stop KSP if diverging

2019-05-29 Thread Edoardo alinovi via petsc-users
Thanks Matthew,

Yes, I will give it a try this evening.

Thank you very much!

On Wed, 29 May 2019, 11:32 Matthew Knepley,  wrote:

> On Wed, May 29, 2019 at 3:07 AM Edoardo alinovi via petsc-users <
> petsc-users@mcs.anl.gov> wrote:
>
>> Dear PETSc friends,
>>
>> Hope you are doing all well.
>>
>> I have a quick question for you that I am not able to solve by myself.
>> From time to time, while testing new code features, it happens that KSP diverges
>> but it does not stop automatically and the iterations continue even after
>> getting a NaN.
>>
>
> I think you want
>
>
> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/KSP/KSPSetErrorIfNotConverged.html
>
>   Thanks,
>
> Matt
>
>
>> In the KSP setup I use the following instruction to set the divergence
>> stopping criteria (div = 1):
>>
>> call KSPSetTolerances(myksp,  rel_tol,  abs_tol, div,  itmax,  ierr)
>>
>> But it does not help. Looking into the documentation I have found also:
>>
>> KSPConvergedDefault(KSP ksp, PetscInt n, PetscReal rnorm, KSPConvergedReason *reason, void *ctx)
>>
>> Which I am not calling in the code. Is this maybe the reason of my
>> problem? If yes how can I use KSPConvergedDefault
>> 
>>  in
>> FORTRAN?
>>
>> Thanks,
>>
>> Edo
>>
>> --
>>
>> Edoardo Alinovi, Ph.D.
>>
>> DICCA, Scuola Politecnica,
>> Universita' degli Studi di Genova,
>> 1, via Montallegro,
>> 16145 Genova, Italy
>>
>> Email: edoardo.alin...@dicca.unige.it
>> Tel: +39 010 353 2540
>> Website: https://www.edoardoalinovi.com/
>>
>>
>>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
> 
>