Hi Barry,

I have implemented my matrix-free GMRES in parallel. There is a small 
difference in the residuals (please see the figure below) when I vary the 
number of cores, but they all converge in the end. I have tried multiplying my 
shell matrix and my preconditioning matrix with a vector in parallel, and I get 
the same values as in a serial run, so I believe the halo exchange is OK for 
the matrix-vector multiplications.

I am using the ASM preconditioner with the overlap set to 1. Is this behaviour 
in parallel normal for ASM preconditioners? If it is not, then I know I need to 
find a bug in my code.

Thanks,
Feng

[Figure: GMRES residual convergence histories for runs with different numbers of cores]

________________________________
From: petsc-users <[email protected]> on behalf of feng wang 
<[email protected]>
Sent: 28 March 2021 22:05
To: Barry Smith <[email protected]>
Cc: [email protected] <[email protected]>
Subject: Re: [petsc-users] Questions on matrix-free GMRES implementation

Hi Barry,
Thanks for your comments. I will try that.
Thanks,
Feng

________________________________
From: Barry Smith <[email protected]>
Sent: 26 March 2021 23:44
To: feng wang <[email protected]>
Cc: [email protected] <[email protected]>
Subject: Re: [petsc-users] Questions on matrix-free GMRES implementation



On Mar 25, 2021, at 6:39 PM, feng wang <[email protected]> wrote:

Hi Barry,

Thanks for your comments.

I will renumber the cells in the way you recommended. I went through the 
manual again and understand how to update the halo elements for my shell-matrix 
routine "mymult(Mat m, Vec x, Vec y)". I can use the global indices of the 
ghost cells on each rank together with "Vec x" to get the ghost values via 
scattering. It should be similar to the example on page 40 of the manual.

One more question: I also have an assembled approximate Jacobian matrix for 
preconditioning GMRES. If I renumber the cells properly as you suggested, do I 
need to worry about communication, or will petsc handle it properly together 
with my shell matrix?

   If you assemble the approximate Jacobian using the "new" ordering then it 
will reflect the same function evaluation and matrix-free operator, so it 
should be ok.

  Barry


Thanks,
Feng

________________________________
From: Barry Smith <[email protected]>
Sent: 25 March 2021 0:03
To: feng wang <[email protected]>
Cc: [email protected] <[email protected]>
Subject: Re: [petsc-users] Questions on matrix-free GMRES implementation



On Mar 24, 2021, at 5:08 AM, feng wang <[email protected]> wrote:

Hi Barry,

Thanks for your comments; they are very helpful. I have a few more questions:


  1.  For your 1st comment "Yes, in some sense. So long as each process ....":
     *   If I understand it correctly (hopefully), a parallel vector in petsc 
can hold discontiguous rows of data from a global array. If this is true and I 
call "VecGetArray", would it create a copy in a contiguous space if the data is 
not contiguous, let me do some operations, and then figure out how to put the 
updated values back in the right places in the global array?
     *   This would generate an overhead. If I do the renumbering so that each 
process holds contiguous rows, can this overhead be avoided when I call 
"VecGetArray"?

     VecGetArray does nothing except return the pointer to the data in the 
vector. It does not copy or reorder anything. Whatever order the numbers are in 
the vector, they are in the same order in the array you obtain with 
VecGetArray.
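
     For example (a tiny sketch), this writes directly into the vector's own 
storage, with no copies involved:

       PetscScalar *a;
       ierr = VecGetArray(v, &a); CHKERRQ(ierr);
       a[0] = 1.0;   /* modifies the first locally owned entry of v in place */
       ierr = VecRestoreArray(v, &a); CHKERRQ(ierr);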


  2.  For your 2nd comment "The matrix and vectors the algebraic solvers see 
DO NOT have.......": for the callback function of my shell matrix "mymult(Mat m, 
Vec x, Vec y)", I need the halo elements of "x" to compute the non-linear 
function. My code will take care of the other halo exchanges, but I am not sure 
how to use petsc to get the halo elements of "x" in the shell matrix. Could you 
please elaborate on this? Some related examples or simple pseudo code would be 
great.

    Basically all the parallel code in PETSc does this. How you need to set up 
the halo communication depends on how you are managing the assignment of 
degrees of freedom on each process and between processes. VecScatterCreate() is 
the tool you will use to tell PETSc how to get the correct values from one 
process to their halo-ed locations on another process. Like everything in 
PETSc, it numbers the vector entries 0 ... n_0-1 on the first process, n_0, 
n_0+1, ... n_1-1 on the second, etc. Since you are managing the partitioning 
and distribution of the parallel data yourself, you must renumber the vector 
entries in your data structures to match the numbering shown above. Just do the 
numbering once after you have set up your distributed data and use it for the 
rest of the run. You might use the object from AOCreate() to do the renumbering 
for you.
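
    A rough sketch of the one-time setup (here nghost and ghost_global[] stand 
for the number and the global indices, in the new ordering, of the halo values 
this rank needs; they come from your data structures, and x is any global 
vector with the solver's layout):

       IS         from;
       Vec        xghost;   /* sequential work vector that will hold the halo values */
       VecScatter scatter;

       ierr = ISCreateGeneral(PETSC_COMM_SELF, nghost, ghost_global, PETSC_COPY_VALUES, &from); CHKERRQ(ierr);
       ierr = VecCreateSeq(PETSC_COMM_SELF, nghost, &xghost); CHKERRQ(ierr);
       /* NULL for the destination IS means all of xghost, in its natural order */
       ierr = VecScatterCreate(x, from, xghost, NULL, &scatter); CHKERRQ(ierr);
       ierr = ISDestroy(&from); CHKERRQ(ierr);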

  Barry



Thanks,
Feng

________________________________
From: Barry Smith <[email protected]>
Sent: 22 March 2021 1:28
To: feng wang <[email protected]>
Cc: [email protected] <[email protected]>
Subject: Re: [petsc-users] Questions on matrix-free GMRES implementation



On Mar 21, 2021, at 6:22 PM, feng wang <[email protected]> wrote:

Hi Barry,

Thanks for your help, I really appreciate it.

In the end I used a shell matrix to compute the matrix-vector product; it is 
clearer to me and there are more things under my control. I am now trying to 
do a parallel implementation, and I have some questions on setting up parallel 
matrices and vectors for a user-defined partition. Could you please provide 
some advice? Suppose I have already got a partition for 2 CPUs; each CPU is 
assigned a list of elements and also their halo elements.

  1.  The global element indices for each partition are not necessarily 
contiguous; do I have to re-order them to make them contiguous?

   Yes, in some sense. So long as each process can march over ITS elements 
computing the function and Jacobian matrix-vector product it doesn't matter how 
you have named/numbered entries. But conceptually the first process has the 
first set of vector entries and the second the second set.

  2.  When I set up the sizes of the matrix and vectors for each CPU, should I 
take into account the halo elements?

    The matrix and vectors the algebraic solvers see DO NOT include halo 
elements in their sizes. You will likely need a halo-ed work vector to do the 
matrix-free multiply from. The standard model is to use VecScatterBegin/End to 
get the values from the non-halo-ed algebraic vector that is input to MatMult 
into a halo-ed one, and then do the local product.
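
    In outline (a sketch, assuming the scatter and the halo-ed work vector 
xghost were created once during setup, as above):

       PetscErrorCode mymult(Mat m, Vec x, Vec y)
       {
          PetscErrorCode ierr;
          /* fill the halo values from the non-halo-ed input vector x */
          ierr = VecScatterBegin(scatter, x, xghost, INSERT_VALUES, SCATTER_FORWARD); CHKERRQ(ierr);
          ierr = VecScatterEnd(scatter, x, xghost, INSERT_VALUES, SCATTER_FORWARD); CHKERRQ(ierr);
          /* ... compute the locally owned entries of y from x and xghost ... */
          return 0;
       }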


  3.  In my serial version, when I initialize my RHS vector, I am not using 
VecSetValues; instead I use VecGetArray/VecRestoreArray to assign the values. 
VecAssemblyBegin()/VecAssemblyEnd() is never used. Would this still work for a 
parallel version?

   Yes, you can use Get/Restore, but the input vector x will need to be, as 
noted above, scattered into a halo-ed version to get all the entries you will 
need to do the local part of the product.


Thanks,
Feng

________________________________

From: Barry Smith <[email protected]>
Sent: 12 March 2021 23:40
To: feng wang <[email protected]>
Cc: [email protected] <[email protected]>
Subject: Re: [petsc-users] Questions on matrix-free GMRES implementation



On Mar 12, 2021, at 9:37 AM, feng wang <[email protected]> wrote:

Hi Matt,

Thanks for your prompt response.

Below are my two versions: one is buggy and the 2nd one is working. For the 
first one, I add the diagonal contribution to the true RHS (variable: rhs) and 
then set the base point; the callback function is somehow called twice 
afterwards to compute the Jacobian.

    Do you mean "to compute the Jacobian matrix-vector product?"

    Is it only in the first computation of the product (for the given base 
vector) that it calls it twice, or is it every matrix-vector product?

    It is possible there is a bug in our logic; run in the debugger with a 
break point in FormFunction_mf and, each time the function is hit in the 
debugger, type "where" or "bt" to get the stack frames from the calls. Send 
this; from it we can all see whether it is being called excessively and why.

For the 2nd one, I just call the callback function manually to recompute 
everything, and the callback function is then called once, as expected, to 
compute the Jacobian. To me, both versions should do the same thing, but I 
don't know why in the first one the callback function is called twice after I 
set the base point. What could possibly go wrong?

   The logic of how it is supposed to work is shown below.

Thanks,
Feng

//This does not work
         fld->cnsv( iqs,iqe, q, aux, csv );
         //add contribution of time-stepping
         for(iv=0; iv<nv; iv++)
         {
            for(iq=0; iq<nq; iq++)
            {
               //use conservative variables here
               rhs[iv][iq] = -rhs[iv][iq] + csv[iv][iq]*lhsa[nlhs-1][iq]/cfl;
            }
         }
         ierr = petsc_setcsv(petsc_csv); CHKERRQ(ierr);
         ierr = petsc_setrhs(petsc_baserhs); CHKERRQ(ierr);
         ierr = MatMFFDSetBase(petsc_A_mf, petsc_csv, petsc_baserhs); CHKERRQ(ierr);

//This works
         fld->cnsv( iqs,iqe, q, aux, csv );
         ierr = petsc_setcsv(petsc_csv); CHKERRQ(ierr);
         //this is my callback function, called manually here to fill petsc_baserhs
         ierr = FormFunction_mf(this, petsc_csv, petsc_baserhs); CHKERRQ(ierr);
         ierr = MatMFFDSetBase(petsc_A_mf, petsc_csv, petsc_baserhs); CHKERRQ(ierr);


    Since you provide petsc_baserhs, MatMFFD assumes (naturally) that you will 
keep the correct values in it. Hence for each new base vector YOU need to 
compute the new values in petsc_baserhs. This approach gives you a bit more 
control over reusing the information in petsc_baserhs.

    If you would prefer that MatMFFD recompute the base values as needed, then 
call MatMFFDSetBase(petsc_A_mf, petsc_csv, NULL); PETSc will allocate a vector 
and fill it up as needed by calling your FormFunction_mf(). But you then need 
to call MatAssemblyBegin/End each time the base input vector's (petsc_csv) 
values change. For example

    MatAssemblyBegin(petsc_A_mf,...)
    MatAssemblyEnd(petsc_A_mf,...)
    KSPSolve()





________________________________

From: Matthew Knepley <[email protected]>
Sent: 12 March 2021 15:08
To: feng wang <[email protected]>
Cc: Barry Smith <[email protected]>; [email protected] <[email protected]>
Subject: Re: [petsc-users] Questions on matrix-free GMRES implementation
Subject: Re: [petsc-users] Questions on matrix-free GMRES implementation

On Fri, Mar 12, 2021 at 9:55 AM feng wang <[email protected]> wrote:
Hi Mat,

Thanks for your reply. I will try the parallel implementation.

I've got a serial matrix-free GMRES working, but I would like to know why my 
initial matrix-free implementation does not work; there is still something I 
don't understand. I did some debugging and found that the callback function 
that computes the RHS for the matrix-free matrix is called twice by Petsc when 
it computes the finite-difference Jacobian, but it should only be called once. 
I don't know why. Could you please give some advice?

F is called once to calculate the base point and once to get the perturbation. 
The base point is not recalculated, so if you do many iterates, it is amortized.

  Thanks,

     Matt

Thanks,
Feng



________________________________

From: Matthew Knepley <[email protected]>
Sent: 12 March 2021 12:05
To: feng wang <[email protected]>
Cc: Barry Smith <[email protected]>; [email protected] <[email protected]>
Subject: Re: [petsc-users] Questions on matrix-free GMRES implementation
Subject: Re: [petsc-users] Questions on matrix-free GMRES implementation

On Fri, Mar 12, 2021 at 6:02 AM feng wang <[email protected]> wrote:
Hi Barry,

Thanks for your advice.

You are right about this. Somehow there was some inconsistency when I computed 
the right-hand side (true RHS + time-stepping contribution to the diagonal 
matrix) used for the finite-difference Jacobian. If I just use the callback 
function to recompute my RHS before I call MatMFFDSetBase, then it works like 
a charm. But now I end up computing my RHS three times: the 1st is to compute 
the true RHS, and the other two are for computing the finite-difference 
Jacobian.

In my previous buggy version, I only computed the RHS twice. If possible, could 
you elaborate on your comment "Also be careful about petsc_baserhs", so I can 
understand what was going on in my buggy version?

Our FD implementation is simple. It approximates the action of the Jacobian as

  J(b) v ~ (F(b + h v) - F(b)) / h

where h is a small differencing parameter (chosen using ||v||) and b is the 
base vector, namely the one that you are linearizing around. In a Newton step, 
b is the previous solution and v is the proposed solution update.

Besides, for a parallel implementation, my code already has its own 
partitioning method. Is it possible to have petsc read in a user-defined 
partition? If not, what is a better way to do this?

Sure

  
https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecSetSizes.html
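
For example (a sketch; nlocal stands for the number of unknowns your partition 
assigns to this rank):

      ierr = VecCreate(PETSC_COMM_WORLD, &v); CHKERRQ(ierr);
      ierr = VecSetSizes(v, nlocal, PETSC_DETERMINE); CHKERRQ(ierr); /* you supply the local size */
      ierr = VecSetFromOptions(v); CHKERRQ(ierr);

MatSetSizes() works the same way for your assembled preconditioning matrix.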

  Thanks,

    Matt

Many thanks,
Feng

________________________________

From: Barry Smith <[email protected]>
Sent: 11 March 2021 22:15
To: feng wang <[email protected]>
Cc: [email protected] <[email protected]>
Subject: Re: [petsc-users] Questions on matrix-free GMRES implementation


   Feng,

     The first thing to check is that for each linear solve that involves a new 
operator (values in the base vector) the MFFD matrix knows it is using a new 
operator.

The easiest way is to call MatMFFDSetBase() before each solve that involves a 
new operator (new values in the base vector). Also be careful about 
petsc_baserhs: when you change the base vector's values you also need to update 
the petsc_baserhs values to the function evaluation at that point.

If that is correct, I would check with a trivial function evaluator to make 
sure the infrastructure is all set up correctly. For example, use a 1 4 1 
operator applied matrix-free.
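
A sketch of such a test function (serial, for checking only; since this F is 
linear, its finite-difference Jacobian action should reproduce the assembled 
tridiagonal matrix almost exactly):

   PetscErrorCode FormFunction_141(void *ctx, Vec u, Vec f)
   {
      PetscErrorCode     ierr;
      PetscInt           i, n;
      const PetscScalar *ua;
      PetscScalar       *fa;

      ierr = VecGetSize(u, &n); CHKERRQ(ierr);
      ierr = VecGetArrayRead(u, &ua); CHKERRQ(ierr);
      ierr = VecGetArray(f, &fa); CHKERRQ(ierr);
      /* F(u) = A u, with A the tridiagonal 1 4 1 operator */
      for (i = 0; i < n; i++) {
         fa[i] = 4.0*ua[i] + (i > 0 ? ua[i-1] : 0.0) + (i < n-1 ? ua[i+1] : 0.0);
      }
      ierr = VecRestoreArrayRead(u, &ua); CHKERRQ(ierr);
      ierr = VecRestoreArray(f, &fa); CHKERRQ(ierr);
      return 0;
   }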

Barry


On Mar 11, 2021, at 7:35 AM, feng wang <[email protected]> wrote:

Dear All,

I am new to petsc and am trying to implement a matrix-free GMRES. I have 
assembled an approximate Jacobian matrix just for preconditioning. After 
reading some previous questions on this topic, my approach is:

The matrix-free matrix is created as:

      ierr = MatCreateMFFD(*A_COMM_WORLD, iqe*blocksize, iqe*blocksize, PETSC_DETERMINE, PETSC_DETERMINE, &petsc_A_mf); CHKERRQ(ierr);
      ierr = MatMFFDSetFunction(petsc_A_mf, FormFunction_mf, this); CHKERRQ(ierr);

The KSP linear operator is set up as:

      //petsc_A_pre is my assembled preconditioning matrix
      ierr = KSPSetOperators(petsc_ksp, petsc_A_mf, petsc_A_pre); CHKERRQ(ierr);

Before calling KSPSolve, I do:

         //petsc_csv holds the flow states; petsc_baserhs is the pre-computed right-hand side
         ierr = MatMFFDSetBase(petsc_A_mf, petsc_csv, petsc_baserhs); CHKERRQ(ierr);

The callback function is defined as:

   PetscErrorCode cFdDomain::FormFunction_mf(void *ctx, Vec in_vec, Vec out_vec)
   {
      cFdDomain *user_ctx;

      cout << "FormFunction_mf called\n";

      //in_vec: flow states
      //out_vec: right hand side + diagonal contributions from the CFL number

      user_ctx = (cFdDomain*)ctx;

      //get perturbed conservative variables from petsc
      user_ctx->petsc_getcsv(in_vec);

      //compute the new right-hand side
      user_ctx->petsc_fd_rhs();

      //copy the new right-hand side into the output vector
      user_ctx->petsc_setrhs(out_vec);

      return 0;
   }

The linear system I am solving is (J+D)x = RHS. J is the Jacobian matrix. D is 
a diagonal matrix used to stabilise the solution at the start; it is reduced 
gradually as the solution moves on, to recover Newton's method. I add D*x to 
the true right-hand side when the non-linear function is computed to work out 
the finite-difference Jacobian, so when finite differencing is used it actually 
computes (J+D)*dx.
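
In code terms, the function evaluation roughly does the following (a sketch; 
dval[iq] stands for the diagonal entries of D in my notation):

      //out_vec = F(x) + D*x, so differencing it gives (J+D)*dx
      for (iv = 0; iv < nv; iv++) {
         for (iq = 0; iq < nq; iq++) {
            out[iv][iq] = rhs[iv][iq] + dval[iq]*x[iv][iq];
         }
      }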

The code runs but diverges in the end. If I don't use matrix-free and use my 
approximate Jacobian matrix instead, GMRES works. So something is wrong with my 
matrix-free implementation. Have I missed something? Besides, is there a way to 
check whether the finite-difference Jacobian is computed correctly in a 
matrix-free implementation?

Thanks for your help in advance.
Feng



--
What most experimenters take for granted before they begin their experiments is 
infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/


