Re: [petsc-users] Nonzero I-j locations

2019-05-30 Thread Smith, Barry F. via petsc-users


> On May 30, 2019, at 11:08 PM, Manav Bhatia  wrote:
> 
> I managed to get this to work. 
> 
> I defined a larger matrix with the dense blocks appended to the end of the 
> matrix on the last processor. Currently, I am only running with one extra 
> unknown, so this should not be a significant penalty for load balancing. 
> 
> Since the larger matrix has the same I-j locations for the FE non-zeros, I 
> use it directly in the FE assembly. 

  Great!

> 
> I have tested with parallel MUMPS solves and it working smoothly. Also, the 
> monolithic system removes the issue with the singularity of J_fe at/near the 
> bifurcation point. 
> 
> Next, I would like to figure out if there are ways to bring in iterative 
> solvers to solve this more efficiently. My J_fe comes from a nonlinear shell 
> deformation problem with snap through response. 

   This can be a tough problem for AMG (or any) iterative method.
> 
> I am not sure if it would make sense to use an AMG solver on this monolithic 
> matrix,

   Almost surely not.

> as opposed to using it as a preconditioner for J_fe in the 
> Schur-factorization approach. The LOCA solver in Trillions was able to find 
> some success with the latter approach: 
> https://www.worldscientific.com/doi/abs/10.1142/S0218127405012508

  In theory you can use PCFIELDSPLIT to do the Schur factorization approach 
with your monothic matrix. You would create two IS, one for the FE problem, you 
can create this by using a ISCreateStride() (each process would create an IS 
for all the local variables except the last process which would skip the last 
variable) and the second IS would be of size zero on all processes except the 
last process where it would have only the last variable. 

  This would be fine for testing if Schur factorization plus AMG works in your 
case.

  The drawback is that PCFIELDSPLIT in this circumstance will pull out  of the 
big matrix (copy) the ever so slightly smaller matrix that needs to be passed 
to AMG; it needs the copy because GAMG currently needs to directly work with a 
AIJ matrix.

   Thus what you need to do is to use MatCreateNest(). You can do this and 
share almost all your current code. You would create the FE matrix (using, for 
example libmesh), this matrix is what you would have the FE assembly code fill 
in (because MATNEST does not support MatSetValues()). You would also create A, 
B, C as MPIAIJ matrices. Your Jacobian filling routine would then fill up 
separately the FE matrix, A, B, and C. For a PC you would use field split with 
the IS I indicated above. GAMG will then directly use the FE matrix with no 
copy needed inside the PCFIELDSPLIT. So the only difference in your code for 
the two cases would be

1) The creation of the matrix. 

2) The filling up of the pieces of the matrix versus just filling up the big 
matrix directly.

3) for field split you would need to create the IS and provide them to the PC.

Based on your previous rapid progress I have no doubt that you will be able to 
achieve this approach rapidly, good luck

Barry



> 
> I would appreciate any general thoughts concerning this. 
> 
> Regards,
> Manav
> 
> 
>> On May 29, 2019, at 9:11 PM, Manav Bhatia  wrote:
>> 
>> Barry, 
>> 
>>   Thanks for the detailed message. 
>> 
>>I checked libMesh’s continuation sovler and it appears to be using the 
>> same system solver without creating a larger matrix: 
>> https://github.com/libMesh/libmesh/blob/master/src/systems/continuation_system.C
>>  
>> 
>>I need to implement this in my code, MAST, for various reasons (mainly, 
>> it fits inside a bigger workflow). The current implementation implementation 
>> follows the Schur factorization approach: 
>> https://mastmultiphysics.github.io/class_m_a_s_t_1_1_continuation_solver_base.html#details
>>  
>> 
>>I will look into some solutions pertaining to the use of 
>> MatGetLocalSubMatrix or leverage some existing functionality in libMesh. 
>> 
>> Thanks,
>> Manav
>> 
>> 
>>> On May 29, 2019, at 7:04 PM, Smith, Barry F.  wrote:
>>> 
>>> 
>>>  Understood. Where are you putting the "few extra unknowns" in the vector 
>>> and matrix? On the first process, on the last process, some places in the 
>>> middle of the matrix?
>>> 
>>>   We don't have any trivial code for copying a big matrix into a even 
>>> larger matrix directly because we frown on doing that. It is very wasteful 
>>> in time and memory. 
>>> 
>>>   The simplest way to do it is call MatGetRow() twice for each row, once to 
>>> get the nonzero locations for each row to determine the numbers needed for 
>>> preallocation and then the second time after the big matrix has been 
>>> preallocated to get the nonzero locations and numerical values for the row 
>>> to call MatSetValues() with to set that row into the bigger matrix. Note of 
>>> course when you call MatSetValues() you will need to shift the rows and 
>>> column locations to take into account the new rows and columns in the 
>>> bigger matrix. If you

Re: [petsc-users] Nonzero I-j locations

2019-05-30 Thread Manav Bhatia via petsc-users
I managed to get this to work. 

I defined a larger matrix with the dense blocks appended to the end of the 
matrix on the last processor. Currently, I am only running with one extra 
unknown, so this should not be a significant penalty for load balancing. 

Since the larger matrix has the same I-j locations for the FE non-zeros, I use 
it directly in the FE assembly. 

I have tested with parallel MUMPS solves and it working smoothly. Also, the 
monolithic system removes the issue with the singularity of J_fe at/near the 
bifurcation point. 

Next, I would like to figure out if there are ways to bring in iterative 
solvers to solve this more efficiently. My J_fe comes from a nonlinear shell 
deformation problem with snap through response. 

I am not sure if it would make sense to use an AMG solver on this monolithic 
matrix, as opposed to using it as a preconditioner for J_fe in the 
Schur-factorization approach. The LOCA solver in Trillions was able to find 
some success with the latter approach: 
https://www.worldscientific.com/doi/abs/10.1142/S0218127405012508 


I would appreciate any general thoughts concerning this. 

Regards,
Manav


> On May 29, 2019, at 9:11 PM, Manav Bhatia  wrote:
> 
> Barry, 
> 
>   Thanks for the detailed message. 
> 
>I checked libMesh’s continuation sovler and it appears to be using the 
> same system solver without creating a larger matrix: 
> https://github.com/libMesh/libmesh/blob/master/src/systems/continuation_system.C
>  
> 
>  
> 
>I need to implement this in my code, MAST, for various reasons (mainly, it 
> fits inside a bigger workflow). The current implementation implementation 
> follows the Schur factorization approach: 
> https://mastmultiphysics.github.io/class_m_a_s_t_1_1_continuation_solver_base.html#details
>  
> 
>  
> 
>I will look into some solutions pertaining to the use of 
> MatGetLocalSubMatrix or leverage some existing functionality in libMesh. 
> 
> Thanks,
> Manav
> 
> 
>> On May 29, 2019, at 7:04 PM, Smith, Barry F. > > wrote:
>> 
>> 
>>  Understood. Where are you putting the "few extra unknowns" in the vector 
>> and matrix? On the first process, on the last process, some places in the 
>> middle of the matrix?
>> 
>>   We don't have any trivial code for copying a big matrix into a even larger 
>> matrix directly because we frown on doing that. It is very wasteful in time 
>> and memory. 
>> 
>>   The simplest way to do it is call MatGetRow() twice for each row, once to 
>> get the nonzero locations for each row to determine the numbers needed for 
>> preallocation and then the second time after the big matrix has been 
>> preallocated to get the nonzero locations and numerical values for the row 
>> to call MatSetValues() with to set that row into the bigger matrix. Note of 
>> course when you call MatSetValues() you will need to shift the rows and 
>> column locations to take into account the new rows and columns in the bigger 
>> matrix. If you put the "extra unknowns" at the every end of the rows/columns 
>> on the last process you won't have to shift.
>> 
>>   Note that B being dense really messes up chances for load balancing since 
>> its rows are dense and take a great deal of space so whatever process gets 
>> those rows needs to have much less of the mesh. 
>> 
>>  The correct long term approach is to have libmesh provide the needed 
>> functionality (for continuation) for the slightly larger matrix directly so 
>> huge matrices do not need to be copied.
>> 
>>  I noticed that libmesh has some functionality related to continuation. I do 
>> not know if they handle it by creating the larger matrix and vector and 
>> filling that up directly for finite elements. If they do then you should 
>> definitely take a look at that and see if it can be extended for your case 
>> (ignore the continuation algorithm they may be using, that is not relevant, 
>> the question is if they generate the larger matrices and if you can leverage 
>> this).
>> 
>> 
>>  The ultimate hack would be to (for example) assign the extra variables to 
>> the end of the last process and hack lib mesh a little bit so the matrix it 
>> creates (before it puts in the numerical values) has the extra rows and 
>> columns, that libmesh will not put the values into but you will. Thus you 
>> get libmesh to fill up the true final matrix for its finite element problem 
>> (not realizing the matrix is a little bigger then it needs) directly, no 
>> copies of the data needed. But this is bit tricky, you'll need to combine 
>> libmesh's preallocation information with yours for the final columns and 
>> rows before you have lib mesh put the numerical values in. Double check if 
>> they have any support for th

Re: [petsc-users] Nonzero I-j locations

2019-05-29 Thread Jed Brown via petsc-users
"Smith, Barry F."  writes:

>   Sorry, my mistake. I assumed that the naming would follow PETSc convention 
> and there would be MatGetLocalSubMatrix_something() as there is 
> MatGetLocalSubMatrix_IS() and MatGetLocalSubMatrix_Nest(). Instead 
> MatGetLocalSubMatrix() is hardwired to call MatCreateLocalRef() if the 
> method is not provide for the original matrix. 
>
>   Now interestingly MatCreateLocalRef() has its own manual page which states: 
> Most will use MatGetLocalSubMatrix(). I am not sure why MatCreateLocalRef() 
> is a public function (that is why it would ever be called directly). Perhaps 
> a note could be added to its manual page indicating why it is public. My 
> inclination would be to make it private and call it 
> MatGetLocalSubMatrix_Basic(). There is harm in having multiple similar public 
> functions unless there is a true need for them.

I think the motivation was that it's actually a Mat implementation and
thus made sense as Developer level interface rather than a strictly
internal interface.  I don't know if we had in mind any use cases where
it could be useful to a general caller.  I don't have a strong opinion
at the moment about whether it makes sense to keep like this or make
internal.


Re: [petsc-users] Nonzero I-j locations

2019-05-29 Thread Smith, Barry F. via petsc-users


  Sorry, my mistake. I assumed that the naming would follow PETSc convention 
and there would be MatGetLocalSubMatrix_something() as there is 
MatGetLocalSubMatrix_IS() and MatGetLocalSubMatrix_Nest(). Instead 
MatGetLocalSubMatrix() is hardwired to call MatCreateLocalRef() if the 
method is not provide for the original matrix. 

  Now interestingly MatCreateLocalRef() has its own manual page which states: 
Most will use MatGetLocalSubMatrix(). I am not sure why MatCreateLocalRef() is 
a public function (that is why it would ever be called directly). Perhaps a 
note could be added to its manual page indicating why it is public. My 
inclination would be to make it private and call it 
MatGetLocalSubMatrix_Basic(). There is harm in having multiple similar public 
functions unless there is a true need for them.

  Barry

I don't remember the names of anything in PETSc, I only remember the naming 
conventions, hence when something is nonstandard I tend to get lost.


> On May 29, 2019, at 11:13 PM, Jed Brown  wrote:
> 
> "Smith, Barry F. via petsc-users"  writes:
> 
>>   This is an interesting idea, but unfortunately not directly compatible 
>> with libMesh filling up the finite element part of the matrix. Plus it 
>> appears MatGetLocalSubMatrix() is only implemented for IS and Nest matrices 
>> :-(
> 
> Maybe I'm missing something, but MatGetLocalSubMatrix *is* implemented
> for arbitrary Mats; it returns a view that allows you to set entries
> using local submatrix indexing.  That was a key feature of the MatNest
> work from so many years ago and a paper on which you're coauthor.  ;-)



Re: [petsc-users] Nonzero I-j locations

2019-05-29 Thread Jed Brown via petsc-users
"Smith, Barry F. via petsc-users"  writes:

>This is an interesting idea, but unfortunately not directly compatible 
> with libMesh filling up the finite element part of the matrix. Plus it 
> appears MatGetLocalSubMatrix() is only implemented for IS and Nest matrices 
> :-(

Maybe I'm missing something, but MatGetLocalSubMatrix *is* implemented
for arbitrary Mats; it returns a view that allows you to set entries
using local submatrix indexing.  That was a key feature of the MatNest
work from so many years ago and a paper on which you're coauthor.  ;-)


Re: [petsc-users] Nonzero I-j locations

2019-05-29 Thread Smith, Barry F. via petsc-users


   This is an interesting idea, but unfortunately not directly compatible with 
libMesh filling up the finite element part of the matrix. Plus it appears 
MatGetLocalSubMatrix() is only implemented for IS and Nest matrices :-(

   You could create  a MATNEST reusing exactly the matrix from lib mesh as the 
first block and call MatConvert() to MPIAIJ format. This is easier I guess then 
coding the conversion yourself. (Still has the memory and copy issues but if 
that is the best we can do). Note that MatNest() requires that all its matrices 
live on all the ranks of the MPI_Comm, so for your A B and C you will need to 
declare them on the MPI_Comm with zero rows and columns for most ranks (maybe 
all but one).

   Barry






> On May 29, 2019, at 6:43 PM, Matthew Knepley  wrote:
> 
> On Wed, May 29, 2019 at 7:30 PM Manav Bhatia via petsc-users 
>  wrote:
> 
> I would not choose Nest if you want to eventually run MUMPS, since that will 
> not work. I would
> try to build your matrix using
> 
>   
> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetLocalSubMatrix.html
> 
> obtained from your bigger matrix. This is our interface to assembling into 
> portions or your matrix as if
> its the entire matrix.
> 
>   Thanks,
> 
> Matt
>  
>   Also, I am currently using MUMPS for all my parallel solves. 
> 
>I would appreciate any advice. 
> 
> Regards,
> Manav
> 
> 
> > On May 29, 2019, at 6:07 PM, Smith, Barry F.  wrote:
> > 
> > 
> >  Manav,
> > 
> >  For parallel sparse matrices using the standard PETSc formats the matrix 
> > is stored in two parts on each process (see the details in MatCreateAIJ()) 
> > thus there is no inexpensive way to access directly the IJ locations as a 
> > single local matrix. What are you hoping to use the information for? 
> > Perhaps we have other suggestions on how to achieve the goal.
> > 
> >   Barry
> > 
> > 
> >> On May 29, 2019, at 2:27 PM, Manav Bhatia via petsc-users 
> >>  wrote:
> >> 
> >> Hi, 
> >> 
> >>   Once a MPI-AIJ matrix has been assembled, is there a method to get the 
> >> nonzero I-J locations? I see one for sequential matrices here: 
> >> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetRowIJ.html
> >>  , but not for parallel matrices. 
> >> 
> >> Regards,
> >> Manav
> >> 
> >> 
> > 
> 
> 
> 
> -- 
> What most experimenters take for granted before they begin their experiments 
> is infinitely more interesting than any results to which their experiments 
> lead.
> -- Norbert Wiener
> 
> https://www.cse.buffalo.edu/~knepley/



Re: [petsc-users] Nonzero I-j locations

2019-05-29 Thread Smith, Barry F. via petsc-users

  Understood. Where are you putting the "few extra unknowns" in the vector and 
matrix? On the first process, on the last process, some places in the middle of 
the matrix?

   We don't have any trivial code for copying a big matrix into a even larger 
matrix directly because we frown on doing that. It is very wasteful in time and 
memory. 

   The simplest way to do it is call MatGetRow() twice for each row, once to 
get the nonzero locations for each row to determine the numbers needed for 
preallocation and then the second time after the big matrix has been 
preallocated to get the nonzero locations and numerical values for the row to 
call MatSetValues() with to set that row into the bigger matrix. Note of course 
when you call MatSetValues() you will need to shift the rows and column 
locations to take into account the new rows and columns in the bigger matrix. 
If you put the "extra unknowns" at the every end of the rows/columns on the 
last process you won't have to shift.

   Note that B being dense really messes up chances for load balancing since 
its rows are dense and take a great deal of space so whatever process gets 
those rows needs to have much less of the mesh. 

  The correct long term approach is to have libmesh provide the needed 
functionality (for continuation) for the slightly larger matrix directly so 
huge matrices do not need to be copied.

  I noticed that libmesh has some functionality related to continuation. I do 
not know if they handle it by creating the larger matrix and vector and filling 
that up directly for finite elements. If they do then you should definitely 
take a look at that and see if it can be extended for your case (ignore the 
continuation algorithm they may be using, that is not relevant, the question is 
if they generate the larger matrices and if you can leverage this).


  The ultimate hack would be to (for example) assign the extra variables to the 
end of the last process and hack lib mesh a little bit so the matrix it creates 
(before it puts in the numerical values) has the extra rows and columns, that 
libmesh will not put the values into but you will. Thus you get libmesh to fill 
up the true final matrix for its finite element problem (not realizing the 
matrix is a little bigger then it needs) directly, no copies of the data 
needed. But this is bit tricky, you'll need to combine libmesh's preallocation 
information with yours for the final columns and rows before you have lib mesh 
put the numerical values in. Double check if they have any support for this 
first. 

   Barry


> On May 29, 2019, at 6:29 PM, Manav Bhatia  wrote:
> 
> Thanks, Barry. 
> 
> I am working on a FE application (involving bifurcation behavior) with 
> libMesh where I need to solve the system of equations along with a few extra 
> unknowns that are not directly related to the FE mesh. I am able to assemble 
> the n x 1 residual (R_fe) and  n x n  Jacobian (J_fe ) from my code and 
> libMesh provides me with the sparsity pattern for this. 
> 
> Next, the system of equations that I need to solve is: 
> 
> [   J_fe   A ]  { dX } =  { R_fe  }
> [   B   C ]  { dV } =  {R_ext } 
> 
> Where, C is a dense matrix of size m x m ( m << n ), A is n x m, B is m x n, 
> R_ext is m x 1.   A, B and C are dense matrixes. This comes from the bordered 
> system for my path continuation solver. 
> 
> I have implemented a solver using Schur factorization ( this is outside of 
> PETSc and does not use the FieldSplit construct ). This works well for most 
> cases, except when J_fe is close to singular. 
> 
> I am now attempting to create a monolithic matrix that solves the complete 
> system. 
> 
> Currently, the approach I am considering is to compute J_fe using my libMesh 
> application, so that I don’t have to change that. I am defining a new matrix 
> with the extra non-zero locations for A, B, C. 
> 
>  With J_fe computed, I am looking to copy its non-zero entries to this new 
> matrix. This is where I am stumbling since I don’t know how best to get the 
> non-zero locations in J_fe. Maybe there is a better approach to copy from 
> J_fe to the new matrix? 
> 
>  I have looked through the nested matrix construct, but have not given this a 
> serious consideration. Maybe I should? Note that I don’t want to solve J_fe 
> and C separately (not as separate systems), so the field-split approach will 
> not be suitable here. 
> 
>  Also, I am currently using MUMPS for all my parallel solves. 
> 
>   I would appreciate any advice. 
> 
> Regards,
> Manav
> 
> 
>> On May 29, 2019, at 6:07 PM, Smith, Barry F.  wrote:
>> 
>> 
>> Manav,
>> 
>> For parallel sparse matrices using the standard PETSc formats the matrix is 
>> stored in two parts on each process (see the details in MatCreateAIJ()) thus 
>> there is no inexpensive way to access directly the IJ locations as a single 
>> local matrix. What are you hoping to use the information for? Perhaps we 
>> have other suggestions on how to achieve t

Re: [petsc-users] Nonzero I-j locations

2019-05-29 Thread Smith, Barry F. via petsc-users


  Manav,

  For parallel sparse matrices using the standard PETSc formats the matrix is 
stored in two parts on each process (see the details in MatCreateAIJ()) thus 
there is no inexpensive way to access directly the IJ locations as a single 
local matrix. What are you hoping to use the information for? Perhaps we have 
other suggestions on how to achieve the goal.

   Barry


> On May 29, 2019, at 2:27 PM, Manav Bhatia via petsc-users 
>  wrote:
> 
> Hi, 
> 
>Once a MPI-AIJ matrix has been assembled, is there a method to get the 
> nonzero I-J locations? I see one for sequential matrices here: 
> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetRowIJ.html
>  , but not for parallel matrices. 
> 
> Regards,
> Manav
> 
> 



Re: [petsc-users] Nonzero I-j locations

2019-05-29 Thread Zhang, Junchao via petsc-users
Yes, see MatGetRow 
https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetRow.html
--Junchao Zhang


On Wed, May 29, 2019 at 2:28 PM Manav Bhatia via petsc-users 
mailto:petsc-users@mcs.anl.gov>> wrote:
Hi,

   Once a MPI-AIJ matrix has been assembled, is there a method to get the 
nonzero I-J locations? I see one for sequential matrices here: 
https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetRowIJ.html
 , but not for parallel matrices.

Regards,
Manav