Hmm. A fix should work (almost exactly the same) with or without the block Jacobi on the subdomain level; I had assumed that Junchao's branch would handle this. Have you looked at it?
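The configuration being debated (PCBJACOBI on the outside, with PCTELESCOPE and a direct solver inside each block, as in the command lines quoted below) maps onto KSP/PC objects roughly as in the following sketch. It mirrors the PCBJacobiGetSubKSP pattern from the KSP tutorials; matrix assembly is omitted, and the call sequence is illustrative only, not code taken from ex7 or from the MR.

  #include <petscksp.h>

  int main(int argc, char **argv)
  {
    KSP            ksp, *subksp;
    PC             pc, subpc;
    Mat            A;                /* assembled elsewhere, as in the ex7 example */
    Vec            x, b;
    PetscInt       nlocal, first;
    PetscErrorCode ierr;

    ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
    /* ... create and assemble A, b, x here ... */
    ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
    ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
    ierr = KSPSetType(ksp, KSPFGMRES);CHKERRQ(ierr);            /* -ksp_type fgmres */
    ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
    ierr = PCSetType(pc, PCBJACOBI);CHKERRQ(ierr);              /* -pc_type bjacobi */
    ierr = PCBJacobiSetTotalBlocks(pc, 4, NULL);CHKERRQ(ierr);  /* -pc_bjacobi_blocks 4 */
    ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);                /* picks up -sub_* and -sub_telescope_* options */
    ierr = KSPSetUp(ksp);CHKERRQ(ierr);                         /* sub-KSPs exist only after setup */

    /* Each block lives on its own sub-communicator; its PC can be PCTELESCOPE,
       which gathers the block onto fewer ranks before the inner LU solve. */
    ierr = PCBJacobiGetSubKSP(pc, &nlocal, &first, &subksp);CHKERRQ(ierr);
    ierr = KSPGetPC(subksp[0], &subpc);CHKERRQ(ierr);
    ierr = PCSetType(subpc, PCTELESCOPE);CHKERRQ(ierr);             /* -sub_pc_type telescope */
    ierr = PCTelescopeSetReductionFactor(subpc, 8);CHKERRQ(ierr);   /* -sub_pc_telescope_reduction_factor 8 */

    ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
    ierr = PetscFinalize();
    return ierr;
  }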
  Barry

> On Oct 20, 2021, at 6:14 PM, Chang Liu <c...@pppl.gov> wrote:
>
> Hi Barry,
>
> Wait, by "branch" are you talking about the MR Junchao submitted?
>
> That fix (proposed by me) only addresses the issue of getting telescope to work on mpiaijcusparse when used outside bjacobi. It has nothing to do with the issue of telescope inside bjacobi. It does not help in my tests.
>
> If my emails made you think otherwise, I apologize for that.
>
> Regards,
>
> Chang
>
> On 10/20/21 4:40 PM, Barry Smith wrote:
>> Yes, but the branch can be used to do the telescoping inside the bjacobi as needed.
>>
>>> On Oct 20, 2021, at 2:59 PM, Junchao Zhang <junchao.zh...@gmail.com> wrote:
>>>
>>> The MR https://gitlab.com/petsc/petsc/-/merge_requests/4471 has not been merged yet.
>>>
>>> --Junchao Zhang
>>>
>>> On Wed, Oct 20, 2021 at 1:47 PM Chang Liu via petsc-users <petsc-users@mcs.anl.gov> wrote:
>>>
>>> Hi Barry,
>>>
>>> Are the fixes merged into master? I was using bjacobi as a preconditioner. Using the latest version of PETSc, I found that by calling
>>>
>>> mpiexec -n 32 --oversubscribe ./ex7 -m 1000 -ksp_view -ksp_monitor_true_residual -ksp_type fgmres -pc_type bjacobi -pc_bjacobi_blocks 4 -sub_ksp_type preonly -sub_pc_type telescope -sub_pc_telescope_reduction_factor 8 -sub_pc_telescope_subcomm_type contiguous -sub_telescope_pc_type lu -sub_telescope_ksp_type preonly -sub_telescope_pc_factor_mat_solver_type mumps -ksp_max_it 2000 -ksp_rtol 1.e-30 -ksp_atol 1.e-30
>>>
>>> the code is calling PCApply_BJacobi_Multiproc. If I use
>>>
>>> mpiexec -n 32 --oversubscribe ./ex7 -m 1000 -ksp_view -ksp_monitor_true_residual -telescope_ksp_monitor_true_residual -ksp_type preonly -pc_type telescope -pc_telescope_reduction_factor 8 -pc_telescope_subcomm_type contiguous -telescope_pc_type bjacobi -telescope_ksp_type fgmres -telescope_pc_bjacobi_blocks 4 -telescope_sub_ksp_type preonly -telescope_sub_pc_type lu -telescope_sub_pc_factor_mat_solver_type mumps -telescope_ksp_max_it 2000 -telescope_ksp_rtol 1.e-30 -telescope_ksp_atol 1.e-30
>>>
>>> the code is calling PCApply_BJacobi_Singleblock. You can test it yourself.
>>>
>>> Regards,
>>>
>>> Chang
>>>
>>> On 10/20/21 1:14 PM, Barry Smith wrote:
>>> >
>>> >> On Oct 20, 2021, at 12:48 PM, Chang Liu <c...@pppl.gov> wrote:
>>> >>
>>> >> Hi Pierre,
>>> >>
>>> >> I have another suggestion for telescope. I have achieved my goal by putting telescope outside bjacobi. But the code still does not work if I use telescope as the PC for a subblock. I think the reason is that I want to use cusparse as the solver, which can only deal with a seqaij matrix and not an mpiaij matrix.
>>> >
>>> >    This is supposed to work with the recent fixes. The telescope should produce a seq matrix and, for each solve, map the parallel vector (over the subdomain) automatically down to the one rank with the GPU to solve it on the GPU. It is not clear to me where the process is going wrong.
>>> >
>>> >    Barry
>>> >
>>> >> However, the telescope PC can put the matrix onto one MPI rank, thus making it a seqaij for the factorization stage, but after factorization it gives the data back to the original communicator. This makes the matrix mpiaij again, and then cusparse cannot solve it.
>>> >>
>>> >> I think a better option is to do the factorization on the CPU with mpiaij, and then transform the preconditioner matrix to seqaij and do the MatSolve on the GPU. But I am not sure whether that can be achieved using telescope.
>>> >>
>>> >> Regards,
>>> >>
>>> >> Chang
>>> >>
>>> >> On 10/15/21 5:29 AM, Pierre Jolivet wrote:
>>> >>> Hi Chang,
>>> >>> The output you sent with MUMPS looks alright to me; you can see that the MatType is properly set to seqaijcusparse (and not mpiaijcusparse).
>>> >>> I don’t know what is wrong with -sub_telescope_pc_factor_mat_solver_type cusparse. I don’t have a PETSc installation for testing this; hopefully Barry or Junchao can confirm this wrong behavior and get it fixed.
>>> >>> As for permuting PCTELESCOPE and PCBJACOBI, in your case the outer PC will be equivalent, yes.
>>> >>> However, it would be more efficient to do PCBJACOBI and then PCTELESCOPE.
>>> >>> PCBJACOBI prunes the operator by basically removing all coefficients outside of the diagonal blocks.
>>> >>> Then, PCTELESCOPE "groups everything together”.
>>> >>> If you do it the other way around, PCTELESCOPE will “group everything together” and then PCBJACOBI will prune the operator.
>>> >>> So the PCTELESCOPE SetUp will be costly for nothing, since some coefficients will be thrown out afterwards in the PCBJACOBI SetUp.
>>> >>> I hope I’m clear enough; otherwise I can try to draw some pictures.
>>> >>> Thanks,
>>> >>> Pierre
>>> >>>> On 15 Oct 2021, at 4:39 AM, Chang Liu <c...@pppl.gov> wrote:
>>> >>>>
>>> >>>> Hi Pierre and Barry,
>>> >>>>
>>> >>>> I think maybe I should use telescope outside bjacobi, like this:
>>> >>>>
>>> >>>> mpiexec -n 16 --hostfile hostfile --oversubscribe ./ex7 -m 400 -ksp_view -ksp_monitor_true_residual -pc_type telescope -pc_telescope_reduction_factor 4 -telescope_pc_type bjacobi -telescope_ksp_type fgmres -telescope_pc_bjacobi_blocks 4 -mat_type aijcusparse -telescope_sub_ksp_type preonly -telescope_sub_pc_type lu -telescope_sub_pc_factor_mat_solver_type cusparse -ksp_max_it 2000 -ksp_rtol 1.e-20 -ksp_atol 1.e-9
>>> >>>>
>>> >>>> But then I got the error
>>> >>>>
>>> >>>> [0]PETSC ERROR: MatSolverType cusparse does not support matrix type seqaij
>>> >>>>
>>> >>>> But the mat type should be aijcusparse. I think telescope changes the mat type.
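A concrete picture of the conversion Chang describes above (keep mpiaij for the parallel stages, but hand cusparse a sequential aijcusparse block for the solves) is sketched below. The helper name is hypothetical and PCTELESCOPE may not expose a hook for it; the point is only to show the MatConvert call such a transformation would involve.

  #include <petscmat.h>

  /* Hypothetical helper: convert a gathered sequential block to the cuSPARSE
     format so that a cusparse factorization/solve can accept it.  Whether
     PCTELESCOPE offers a place to hook this in is exactly the open question
     in this thread; only the conversion call itself is shown. */
  static PetscErrorCode ConvertBlockToCusparse(Mat *B)
  {
    PetscErrorCode ierr;
    PetscMPIInt    size;

    PetscFunctionBeginUser;
    ierr = MPI_Comm_size(PetscObjectComm((PetscObject)*B), &size);CHKERRQ(ierr);
    if (size == 1) {  /* only a sequential block can become MATSEQAIJCUSPARSE */
      ierr = MatConvert(*B, MATSEQAIJCUSPARSE, MAT_INPLACE_MATRIX, B);CHKERRQ(ierr);
    }
    PetscFunctionReturn(0);
  }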
>>> >>>> >>> >>>> Chang >>> >>>> >>> >>>> On 10/14/21 10:11 PM, Chang Liu wrote: >>> >>>>> For comparison, here is the output using mumps instead of >>> cusparse >>> >>>>> $ mpiexec -n 16 --hostfile hostfile --oversubscribe ./ex7 -m >>> 400 -ksp_view -ksp_monitor_true_residual -pc_type bjacobi >>> -pc_bjacobi_blocks 4 -ksp_type fgmres -mat_type aijcusparse >>> -sub_pc_type telescope -sub_ksp_type preonly >>> -sub_telescope_ksp_type preonly -sub_telescope_pc_type lu >>> -sub_telescope_pc_factor_mat_solver_type mumps >>> -sub_pc_telescope_reduction_factor 4 >>> -sub_pc_telescope_subcomm_type contiguous -ksp_max_it 2000 >>> -ksp_rtol 1.e-20 -ksp_atol 1.e-9 >>> >>>>> 0 KSP unpreconditioned resid norm 4.014971979977e+01 true >>> resid norm 4.014971979977e+01 ||r(i)||/||b|| 1.000000000000e+00 >>> >>>>> 1 KSP unpreconditioned resid norm 2.439995191694e+00 true >>> resid norm 2.439995191694e+00 ||r(i)||/||b|| 6.077240896978e-02 >>> >>>>> 2 KSP unpreconditioned resid norm 1.280694102588e+00 true >>> resid norm 1.280694102588e+00 ||r(i)||/||b|| 3.189795866509e-02 >>> >>>>> 3 KSP unpreconditioned resid norm 1.041100266810e+00 true >>> resid norm 1.041100266810e+00 ||r(i)||/||b|| 2.593044912896e-02 >>> >>>>> 4 KSP unpreconditioned resid norm 7.274347137268e-01 true >>> resid norm 7.274347137268e-01 ||r(i)||/||b|| 1.811805206499e-02 >>> >>>>> 5 KSP unpreconditioned resid norm 5.429229329787e-01 true >>> resid norm 5.429229329787e-01 ||r(i)||/||b|| 1.352245882876e-02 >>> >>>>> 6 KSP unpreconditioned resid norm 4.332970410353e-01 true >>> resid norm 4.332970410353e-01 ||r(i)||/||b|| 1.079203150598e-02 >>> >>>>> 7 KSP unpreconditioned resid norm 3.948206050950e-01 true >>> resid norm 3.948206050950e-01 ||r(i)||/||b|| 9.833707609019e-03 >>> >>>>> 8 KSP unpreconditioned resid norm 3.379580577269e-01 true >>> resid norm 3.379580577269e-01 ||r(i)||/||b|| 8.417444988714e-03 >>> >>>>> 9 KSP unpreconditioned resid norm 2.875593971410e-01 true >>> resid norm 2.875593971410e-01 ||r(i)||/||b|| 7.162176936105e-03 >>> >>>>> 10 KSP unpreconditioned resid norm 2.533983363244e-01 true >>> resid norm 2.533983363244e-01 ||r(i)||/||b|| 6.311335112378e-03 >>> >>>>> 11 KSP unpreconditioned resid norm 2.389169921094e-01 true >>> resid norm 2.389169921094e-01 ||r(i)||/||b|| 5.950651543793e-03 >>> >>>>> 12 KSP unpreconditioned resid norm 2.118961639089e-01 true >>> resid norm 2.118961639089e-01 ||r(i)||/||b|| 5.277649880637e-03 >>> >>>>> 13 KSP unpreconditioned resid norm 1.885892030223e-01 true >>> resid norm 1.885892030223e-01 ||r(i)||/||b|| 4.697148671593e-03 >>> >>>>> 14 KSP unpreconditioned resid norm 1.763510666948e-01 true >>> resid norm 1.763510666948e-01 ||r(i)||/||b|| 4.392336175055e-03 >>> >>>>> 15 KSP unpreconditioned resid norm 1.638219366731e-01 true >>> resid norm 1.638219366731e-01 ||r(i)||/||b|| 4.080275964317e-03 >>> >>>>> 16 KSP unpreconditioned resid norm 1.476792766432e-01 true >>> resid norm 1.476792766432e-01 ||r(i)||/||b|| 3.678214378076e-03 >>> >>>>> 17 KSP unpreconditioned resid norm 1.349906937321e-01 true >>> resid norm 1.349906937321e-01 ||r(i)||/||b|| 3.362182710248e-03 >>> >>>>> 18 KSP unpreconditioned resid norm 1.289673236836e-01 true >>> resid norm 1.289673236836e-01 ||r(i)||/||b|| 3.212159993314e-03 >>> >>>>> 19 KSP unpreconditioned resid norm 1.167505658153e-01 true >>> resid norm 1.167505658153e-01 ||r(i)||/||b|| 2.907879965230e-03 >>> >>>>> 20 KSP unpreconditioned resid norm 1.046037988999e-01 true >>> resid norm 1.046037988999e-01 ||r(i)||/||b|| 2.605343185995e-03 >>> >>>>> 21 KSP 
unpreconditioned resid norm 9.832660514331e-02 true >>> resid norm 9.832660514331e-02 ||r(i)||/||b|| 2.448998539309e-03 >>> >>>>> 22 KSP unpreconditioned resid norm 8.835618950141e-02 true >>> resid norm 8.835618950142e-02 ||r(i)||/||b|| 2.200667649539e-03 >>> >>>>> 23 KSP unpreconditioned resid norm 7.563496650115e-02 true >>> resid norm 7.563496650116e-02 ||r(i)||/||b|| 1.883823022386e-03 >>> >>>>> 24 KSP unpreconditioned resid norm 6.651291376834e-02 true >>> resid norm 6.651291376834e-02 ||r(i)||/||b|| 1.656622115921e-03 >>> >>>>> 25 KSP unpreconditioned resid norm 5.890393227906e-02 true >>> resid norm 5.890393227906e-02 ||r(i)||/||b|| 1.467106933070e-03 >>> >>>>> 26 KSP unpreconditioned resid norm 4.661992782780e-02 true >>> resid norm 4.661992782780e-02 ||r(i)||/||b|| 1.161152009536e-03 >>> >>>>> 27 KSP unpreconditioned resid norm 3.690705358716e-02 true >>> resid norm 3.690705358716e-02 ||r(i)||/||b|| 9.192356452602e-04 >>> >>>>> 28 KSP unpreconditioned resid norm 3.209680460188e-02 true >>> resid norm 3.209680460188e-02 ||r(i)||/||b|| 7.994278605666e-04 >>> >>>>> 29 KSP unpreconditioned resid norm 2.354337626000e-02 true >>> resid norm 2.354337626001e-02 ||r(i)||/||b|| 5.863895533373e-04 >>> >>>>> 30 KSP unpreconditioned resid norm 1.701296561785e-02 true >>> resid norm 1.701296561785e-02 ||r(i)||/||b|| 4.237380908932e-04 >>> >>>>> 31 KSP unpreconditioned resid norm 1.509942937258e-02 true >>> resid norm 1.509942937258e-02 ||r(i)||/||b|| 3.760780759588e-04 >>> >>>>> 32 KSP unpreconditioned resid norm 1.258274688515e-02 true >>> resid norm 1.258274688515e-02 ||r(i)||/||b|| 3.133956338402e-04 >>> >>>>> 33 KSP unpreconditioned resid norm 9.805748771638e-03 true >>> resid norm 9.805748771638e-03 ||r(i)||/||b|| 2.442295692359e-04 >>> >>>>> 34 KSP unpreconditioned resid norm 8.596552678160e-03 true >>> resid norm 8.596552678160e-03 ||r(i)||/||b|| 2.141123953301e-04 >>> >>>>> 35 KSP unpreconditioned resid norm 6.936406707500e-03 true >>> resid norm 6.936406707500e-03 ||r(i)||/||b|| 1.727635147167e-04 >>> >>>>> 36 KSP unpreconditioned resid norm 5.533741607932e-03 true >>> resid norm 5.533741607932e-03 ||r(i)||/||b|| 1.378276519869e-04 >>> >>>>> 37 KSP unpreconditioned resid norm 4.982347757923e-03 true >>> resid norm 4.982347757923e-03 ||r(i)||/||b|| 1.240942099414e-04 >>> >>>>> 38 KSP unpreconditioned resid norm 4.309608348059e-03 true >>> resid norm 4.309608348059e-03 ||r(i)||/||b|| 1.073384414524e-04 >>> >>>>> 39 KSP unpreconditioned resid norm 3.729408303186e-03 true >>> resid norm 3.729408303185e-03 ||r(i)||/||b|| 9.288753001974e-05 >>> >>>>> 40 KSP unpreconditioned resid norm 3.490003351128e-03 true >>> resid norm 3.490003351128e-03 ||r(i)||/||b|| 8.692472496776e-05 >>> >>>>> 41 KSP unpreconditioned resid norm 3.069012426454e-03 true >>> resid norm 3.069012426453e-03 ||r(i)||/||b|| 7.643919912166e-05 >>> >>>>> 42 KSP unpreconditioned resid norm 2.772928845284e-03 true >>> resid norm 2.772928845284e-03 ||r(i)||/||b|| 6.906471225983e-05 >>> >>>>> 43 KSP unpreconditioned resid norm 2.561454192399e-03 true >>> resid norm 2.561454192398e-03 ||r(i)||/||b|| 6.379756085902e-05 >>> >>>>> 44 KSP unpreconditioned resid norm 2.253662762802e-03 true >>> resid norm 2.253662762802e-03 ||r(i)||/||b|| 5.613146926159e-05 >>> >>>>> 45 KSP unpreconditioned resid norm 2.086800523919e-03 true >>> resid norm 2.086800523919e-03 ||r(i)||/||b|| 5.197546917701e-05 >>> >>>>> 46 KSP unpreconditioned resid norm 1.926028182896e-03 true >>> resid norm 1.926028182896e-03 ||r(i)||/||b|| 4.797114880257e-05 >>> >>>>> 
47 KSP unpreconditioned resid norm 1.769243808622e-03 true >>> resid norm 1.769243808622e-03 ||r(i)||/||b|| 4.406615581492e-05 >>> >>>>> 48 KSP unpreconditioned resid norm 1.656654905964e-03 true >>> resid norm 1.656654905964e-03 ||r(i)||/||b|| 4.126192945371e-05 >>> >>>>> 49 KSP unpreconditioned resid norm 1.572052627273e-03 true >>> resid norm 1.572052627273e-03 ||r(i)||/||b|| 3.915475961260e-05 >>> >>>>> 50 KSP unpreconditioned resid norm 1.454960682355e-03 true >>> resid norm 1.454960682355e-03 ||r(i)||/||b|| 3.623837699518e-05 >>> >>>>> 51 KSP unpreconditioned resid norm 1.375985053014e-03 true >>> resid norm 1.375985053014e-03 ||r(i)||/||b|| 3.427134883820e-05 >>> >>>>> 52 KSP unpreconditioned resid norm 1.269325501087e-03 true >>> resid norm 1.269325501087e-03 ||r(i)||/||b|| 3.161480347603e-05 >>> >>>>> 53 KSP unpreconditioned resid norm 1.184791772965e-03 true >>> resid norm 1.184791772965e-03 ||r(i)||/||b|| 2.950934100844e-05 >>> >>>>> 54 KSP unpreconditioned resid norm 1.064535156080e-03 true >>> resid norm 1.064535156080e-03 ||r(i)||/||b|| 2.651413662135e-05 >>> >>>>> 55 KSP unpreconditioned resid norm 9.639036688120e-04 true >>> resid norm 9.639036688117e-04 ||r(i)||/||b|| 2.400773090370e-05 >>> >>>>> 56 KSP unpreconditioned resid norm 8.632359780260e-04 true >>> resid norm 8.632359780260e-04 ||r(i)||/||b|| 2.150042347322e-05 >>> >>>>> 57 KSP unpreconditioned resid norm 7.613605783850e-04 true >>> resid norm 7.613605783850e-04 ||r(i)||/||b|| 1.896303591113e-05 >>> >>>>> 58 KSP unpreconditioned resid norm 6.681073248348e-04 true >>> resid norm 6.681073248349e-04 ||r(i)||/||b|| 1.664039819373e-05 >>> >>>>> 59 KSP unpreconditioned resid norm 5.656127908544e-04 true >>> resid norm 5.656127908545e-04 ||r(i)||/||b|| 1.408758999254e-05 >>> >>>>> 60 KSP unpreconditioned resid norm 4.850863370767e-04 true >>> resid norm 4.850863370767e-04 ||r(i)||/||b|| 1.208193580169e-05 >>> >>>>> 61 KSP unpreconditioned resid norm 4.374055762320e-04 true >>> resid norm 4.374055762316e-04 ||r(i)||/||b|| 1.089436186387e-05 >>> >>>>> 62 KSP unpreconditioned resid norm 3.874398257079e-04 true >>> resid norm 3.874398257077e-04 ||r(i)||/||b|| 9.649876204364e-06 >>> >>>>> 63 KSP unpreconditioned resid norm 3.364908694427e-04 true >>> resid norm 3.364908694429e-04 ||r(i)||/||b|| 8.380902061609e-06 >>> >>>>> 64 KSP unpreconditioned resid norm 2.961034697265e-04 true >>> resid norm 2.961034697268e-04 ||r(i)||/||b|| 7.374982221632e-06 >>> >>>>> 65 KSP unpreconditioned resid norm 2.640593092764e-04 true >>> resid norm 2.640593092767e-04 ||r(i)||/||b|| 6.576865557059e-06 >>> >>>>> 66 KSP unpreconditioned resid norm 2.423231125743e-04 true >>> resid norm 2.423231125745e-04 ||r(i)||/||b|| 6.035487016671e-06 >>> >>>>> 67 KSP unpreconditioned resid norm 2.182349471179e-04 true >>> resid norm 2.182349471179e-04 ||r(i)||/||b|| 5.435528521898e-06 >>> >>>>> 68 KSP unpreconditioned resid norm 2.008438265031e-04 true >>> resid norm 2.008438265028e-04 ||r(i)||/||b|| 5.002371809927e-06 >>> >>>>> 69 KSP unpreconditioned resid norm 1.838732863386e-04 true >>> resid norm 1.838732863388e-04 ||r(i)||/||b|| 4.579690400226e-06 >>> >>>>> 70 KSP unpreconditioned resid norm 1.723786027645e-04 true >>> resid norm 1.723786027645e-04 ||r(i)||/||b|| 4.293394913444e-06 >>> >>>>> 71 KSP unpreconditioned resid norm 1.580945192204e-04 true >>> resid norm 1.580945192205e-04 ||r(i)||/||b|| 3.937624471826e-06 >>> >>>>> 72 KSP unpreconditioned resid norm 1.476687469671e-04 true >>> resid norm 1.476687469671e-04 ||r(i)||/||b|| 3.677952117812e-06 
>>> >>>>> 73 KSP unpreconditioned resid norm 1.385018526182e-04 true >>> resid norm 1.385018526184e-04 ||r(i)||/||b|| 3.449634351350e-06 >>> >>>>> 74 KSP unpreconditioned resid norm 1.279712893541e-04 true >>> resid norm 1.279712893541e-04 ||r(i)||/||b|| 3.187351991305e-06 >>> >>>>> 75 KSP unpreconditioned resid norm 1.202010411772e-04 true >>> resid norm 1.202010411774e-04 ||r(i)||/||b|| 2.993820175504e-06 >>> >>>>> 76 KSP unpreconditioned resid norm 1.113459414198e-04 true >>> resid norm 1.113459414200e-04 ||r(i)||/||b|| 2.773268206485e-06 >>> >>>>> 77 KSP unpreconditioned resid norm 1.042523036036e-04 true >>> resid norm 1.042523036037e-04 ||r(i)||/||b|| 2.596588572066e-06 >>> >>>>> 78 KSP unpreconditioned resid norm 9.565176453232e-05 true >>> resid norm 9.565176453227e-05 ||r(i)||/||b|| 2.382376888539e-06 >>> >>>>> 79 KSP unpreconditioned resid norm 8.896901670359e-05 true >>> resid norm 8.896901670365e-05 ||r(i)||/||b|| 2.215931198209e-06 >>> >>>>> 80 KSP unpreconditioned resid norm 8.119298425803e-05 true >>> resid norm 8.119298425824e-05 ||r(i)||/||b|| 2.022255314935e-06 >>> >>>>> 81 KSP unpreconditioned resid norm 7.544528309154e-05 true >>> resid norm 7.544528309154e-05 ||r(i)||/||b|| 1.879098620558e-06 >>> >>>>> 82 KSP unpreconditioned resid norm 6.755385041138e-05 true >>> resid norm 6.755385041176e-05 ||r(i)||/||b|| 1.682548489719e-06 >>> >>>>> 83 KSP unpreconditioned resid norm 6.158629300870e-05 true >>> resid norm 6.158629300835e-05 ||r(i)||/||b|| 1.533915885727e-06 >>> >>>>> 84 KSP unpreconditioned resid norm 5.358756885754e-05 true >>> resid norm 5.358756885765e-05 ||r(i)||/||b|| 1.334693470462e-06 >>> >>>>> 85 KSP unpreconditioned resid norm 4.774852370380e-05 true >>> resid norm 4.774852370387e-05 ||r(i)||/||b|| 1.189261692037e-06 >>> >>>>> 86 KSP unpreconditioned resid norm 3.919358737908e-05 true >>> resid norm 3.919358737930e-05 ||r(i)||/||b|| 9.761858258229e-07 >>> >>>>> 87 KSP unpreconditioned resid norm 3.434042319950e-05 true >>> resid norm 3.434042319947e-05 ||r(i)||/||b|| 8.553091620745e-07 >>> >>>>> 88 KSP unpreconditioned resid norm 2.813699436281e-05 true >>> resid norm 2.813699436302e-05 ||r(i)||/||b|| 7.008017615898e-07 >>> >>>>> 89 KSP unpreconditioned resid norm 2.462248069068e-05 true >>> resid norm 2.462248069051e-05 ||r(i)||/||b|| 6.132665635851e-07 >>> >>>>> 90 KSP unpreconditioned resid norm 2.040558789626e-05 true >>> resid norm 2.040558789626e-05 ||r(i)||/||b|| 5.082373674841e-07 >>> >>>>> 91 KSP unpreconditioned resid norm 1.888523204468e-05 true >>> resid norm 1.888523204470e-05 ||r(i)||/||b|| 4.703702077842e-07 >>> >>>>> 92 KSP unpreconditioned resid norm 1.707071292484e-05 true >>> resid norm 1.707071292474e-05 ||r(i)||/||b|| 4.251763900191e-07 >>> >>>>> 93 KSP unpreconditioned resid norm 1.498636454665e-05 true >>> resid norm 1.498636454672e-05 ||r(i)||/||b|| 3.732619958859e-07 >>> >>>>> 94 KSP unpreconditioned resid norm 1.219393542993e-05 true >>> resid norm 1.219393543006e-05 ||r(i)||/||b|| 3.037115947725e-07 >>> >>>>> 95 KSP unpreconditioned resid norm 1.059996963300e-05 true >>> resid norm 1.059996963303e-05 ||r(i)||/||b|| 2.640110487917e-07 >>> >>>>> 96 KSP unpreconditioned resid norm 9.099659872548e-06 true >>> resid norm 9.099659873214e-06 ||r(i)||/||b|| 2.266431725699e-07 >>> >>>>> 97 KSP unpreconditioned resid norm 8.147347587295e-06 true >>> resid norm 8.147347587584e-06 ||r(i)||/||b|| 2.029241456283e-07 >>> >>>>> 98 KSP unpreconditioned resid norm 7.167226146744e-06 true >>> resid norm 7.167226146783e-06 ||r(i)||/||b|| 
1.785124823418e-07 >>> >>>>> 99 KSP unpreconditioned resid norm 6.552540209538e-06 true >>> resid norm 6.552540209577e-06 ||r(i)||/||b|| 1.632026385802e-07 >>> >>>>> 100 KSP unpreconditioned resid norm 5.767783600111e-06 true >>> resid norm 5.767783600320e-06 ||r(i)||/||b|| 1.436568830140e-07 >>> >>>>> 101 KSP unpreconditioned resid norm 5.261057430584e-06 true >>> resid norm 5.261057431144e-06 ||r(i)||/||b|| 1.310359688033e-07 >>> >>>>> 102 KSP unpreconditioned resid norm 4.715498525786e-06 true >>> resid norm 4.715498525947e-06 ||r(i)||/||b|| 1.174478564100e-07 >>> >>>>> 103 KSP unpreconditioned resid norm 4.380052669622e-06 true >>> resid norm 4.380052669825e-06 ||r(i)||/||b|| 1.090929822591e-07 >>> >>>>> 104 KSP unpreconditioned resid norm 3.911664470060e-06 true >>> resid norm 3.911664470226e-06 ||r(i)||/||b|| 9.742694319496e-08 >>> >>>>> 105 KSP unpreconditioned resid norm 3.652211458315e-06 true >>> resid norm 3.652211458259e-06 ||r(i)||/||b|| 9.096480564430e-08 >>> >>>>> 106 KSP unpreconditioned resid norm 3.387532128049e-06 true >>> resid norm 3.387532128358e-06 ||r(i)||/||b|| 8.437249737363e-08 >>> >>>>> 107 KSP unpreconditioned resid norm 3.234218880987e-06 true >>> resid norm 3.234218880798e-06 ||r(i)||/||b|| 8.055395895481e-08 >>> >>>>> 108 KSP unpreconditioned resid norm 3.016905196388e-06 true >>> resid norm 3.016905196492e-06 ||r(i)||/||b|| 7.514137611763e-08 >>> >>>>> 109 KSP unpreconditioned resid norm 2.858246441921e-06 true >>> resid norm 2.858246441975e-06 ||r(i)||/||b|| 7.118969836476e-08 >>> >>>>> 110 KSP unpreconditioned resid norm 2.637118810847e-06 true >>> resid norm 2.637118810750e-06 ||r(i)||/||b|| 6.568212241336e-08 >>> >>>>> 111 KSP unpreconditioned resid norm 2.494976088717e-06 true >>> resid norm 2.494976088700e-06 ||r(i)||/||b|| 6.214180574966e-08 >>> >>>>> 112 KSP unpreconditioned resid norm 2.270639574272e-06 true >>> resid norm 2.270639574200e-06 ||r(i)||/||b|| 5.655430686750e-08 >>> >>>>> 113 KSP unpreconditioned resid norm 2.104988663813e-06 true >>> resid norm 2.104988664169e-06 ||r(i)||/||b|| 5.242847707696e-08 >>> >>>>> 114 KSP unpreconditioned resid norm 1.889361127301e-06 true >>> resid norm 1.889361127526e-06 ||r(i)||/||b|| 4.705789073868e-08 >>> >>>>> 115 KSP unpreconditioned resid norm 1.732367008052e-06 true >>> resid norm 1.732367007971e-06 ||r(i)||/||b|| 4.314767367271e-08 >>> >>>>> 116 KSP unpreconditioned resid norm 1.509288268391e-06 true >>> resid norm 1.509288268645e-06 ||r(i)||/||b|| 3.759150191264e-08 >>> >>>>> 117 KSP unpreconditioned resid norm 1.359169217644e-06 true >>> resid norm 1.359169217445e-06 ||r(i)||/||b|| 3.385252062089e-08 >>> >>>>> 118 KSP unpreconditioned resid norm 1.180146337735e-06 true >>> resid norm 1.180146337908e-06 ||r(i)||/||b|| 2.939363820703e-08 >>> >>>>> 119 KSP unpreconditioned resid norm 1.067757039683e-06 true >>> resid norm 1.067757039924e-06 ||r(i)||/||b|| 2.659438335433e-08 >>> >>>>> 120 KSP unpreconditioned resid norm 9.435833073736e-07 true >>> resid norm 9.435833073736e-07 ||r(i)||/||b|| 2.350161625235e-08 >>> >>>>> 121 KSP unpreconditioned resid norm 8.749457237613e-07 true >>> resid norm 8.749457236791e-07 ||r(i)||/||b|| 2.179207546261e-08 >>> >>>>> 122 KSP unpreconditioned resid norm 7.945760150897e-07 true >>> resid norm 7.945760150444e-07 ||r(i)||/||b|| 1.979032528762e-08 >>> >>>>> 123 KSP unpreconditioned resid norm 7.141240839013e-07 true >>> resid norm 7.141240838682e-07 ||r(i)||/||b|| 1.778652721438e-08 >>> >>>>> 124 KSP unpreconditioned resid norm 6.300566936733e-07 true >>> resid norm 
6.300566936607e-07 ||r(i)||/||b|| 1.569267971988e-08 >>> >>>>> 125 KSP unpreconditioned resid norm 5.628986997544e-07 true >>> resid norm 5.628986995849e-07 ||r(i)||/||b|| 1.401999073448e-08 >>> >>>>> 126 KSP unpreconditioned resid norm 5.119018951602e-07 true >>> resid norm 5.119018951837e-07 ||r(i)||/||b|| 1.274982484900e-08 >>> >>>>> 127 KSP unpreconditioned resid norm 4.664670343748e-07 true >>> resid norm 4.664670344042e-07 ||r(i)||/||b|| 1.161818903670e-08 >>> >>>>> 128 KSP unpreconditioned resid norm 4.253264691112e-07 true >>> resid norm 4.253264691948e-07 ||r(i)||/||b|| 1.059351027394e-08 >>> >>>>> 129 KSP unpreconditioned resid norm 3.868921150516e-07 true >>> resid norm 3.868921150517e-07 ||r(i)||/||b|| 9.636234498800e-09 >>> >>>>> 130 KSP unpreconditioned resid norm 3.558445658540e-07 true >>> resid norm 3.558445660061e-07 ||r(i)||/||b|| 8.862940209315e-09 >>> >>>>> 131 KSP unpreconditioned resid norm 3.268710273840e-07 true >>> resid norm 3.268710272455e-07 ||r(i)||/||b|| 8.141302825416e-09 >>> >>>>> 132 KSP unpreconditioned resid norm 3.041273897592e-07 true >>> resid norm 3.041273896694e-07 ||r(i)||/||b|| 7.574832182794e-09 >>> >>>>> 133 KSP unpreconditioned resid norm 2.851926677922e-07 true >>> resid norm 2.851926674248e-07 ||r(i)||/||b|| 7.103229333782e-09 >>> >>>>> 134 KSP unpreconditioned resid norm 2.694708315072e-07 true >>> resid norm 2.694708309500e-07 ||r(i)||/||b|| 6.711649104748e-09 >>> >>>>> 135 KSP unpreconditioned resid norm 2.534825559099e-07 true >>> resid norm 2.534825557469e-07 ||r(i)||/||b|| 6.313432746507e-09 >>> >>>>> 136 KSP unpreconditioned resid norm 2.387342352458e-07 true >>> resid norm 2.387342351804e-07 ||r(i)||/||b|| 5.946099658254e-09 >>> >>>>> 137 KSP unpreconditioned resid norm 2.200861667617e-07 true >>> resid norm 2.200861665255e-07 ||r(i)||/||b|| 5.481636425438e-09 >>> >>>>> 138 KSP unpreconditioned resid norm 2.051415370616e-07 true >>> resid norm 2.051415370614e-07 ||r(i)||/||b|| 5.109413915824e-09 >>> >>>>> 139 KSP unpreconditioned resid norm 1.887376429396e-07 true >>> resid norm 1.887376426682e-07 ||r(i)||/||b|| 4.700845824315e-09 >>> >>>>> 140 KSP unpreconditioned resid norm 1.729743133005e-07 true >>> resid norm 1.729743128342e-07 ||r(i)||/||b|| 4.308232129561e-09 >>> >>>>> 141 KSP unpreconditioned resid norm 1.541021130781e-07 true >>> resid norm 1.541021128364e-07 ||r(i)||/||b|| 3.838186508023e-09 >>> >>>>> 142 KSP unpreconditioned resid norm 1.384631628565e-07 true >>> resid norm 1.384631627735e-07 ||r(i)||/||b|| 3.448670712125e-09 >>> >>>>> 143 KSP unpreconditioned resid norm 1.223114405626e-07 true >>> resid norm 1.223114403883e-07 ||r(i)||/||b|| 3.046383411846e-09 >>> >>>>> 144 KSP unpreconditioned resid norm 1.087313066223e-07 true >>> resid norm 1.087313065117e-07 ||r(i)||/||b|| 2.708146085550e-09 >>> >>>>> 145 KSP unpreconditioned resid norm 9.181901998734e-08 true >>> resid norm 9.181901984268e-08 ||r(i)||/||b|| 2.286915582489e-09 >>> >>>>> 146 KSP unpreconditioned resid norm 7.885850510808e-08 true >>> resid norm 7.885850531446e-08 ||r(i)||/||b|| 1.964110975313e-09 >>> >>>>> 147 KSP unpreconditioned resid norm 6.483393946950e-08 true >>> resid norm 6.483393931383e-08 ||r(i)||/||b|| 1.614804278515e-09 >>> >>>>> 148 KSP unpreconditioned resid norm 5.690132597004e-08 true >>> resid norm 5.690132577518e-08 ||r(i)||/||b|| 1.417228465328e-09 >>> >>>>> 149 KSP unpreconditioned resid norm 5.023671521579e-08 true >>> resid norm 5.023671502186e-08 ||r(i)||/||b|| 1.251234511035e-09 >>> >>>>> 150 KSP unpreconditioned resid norm 
4.625371062660e-08 true >>> resid norm 4.625371062660e-08 ||r(i)||/||b|| 1.152030720445e-09 >>> >>>>> 151 KSP unpreconditioned resid norm 4.349049084805e-08 true >>> resid norm 4.349049089337e-08 ||r(i)||/||b|| 1.083207830846e-09 >>> >>>>> 152 KSP unpreconditioned resid norm 3.932593324498e-08 true >>> resid norm 3.932593376918e-08 ||r(i)||/||b|| 9.794821474546e-10 >>> >>>>> 153 KSP unpreconditioned resid norm 3.504167649202e-08 true >>> resid norm 3.504167638113e-08 ||r(i)||/||b|| 8.727751166356e-10 >>> >>>>> 154 KSP unpreconditioned resid norm 2.892726347747e-08 true >>> resid norm 2.892726348583e-08 ||r(i)||/||b|| 7.204848160858e-10 >>> >>>>> 155 KSP unpreconditioned resid norm 2.477647033202e-08 true >>> resid norm 2.477647041570e-08 ||r(i)||/||b|| 6.171019508795e-10 >>> >>>>> 156 KSP unpreconditioned resid norm 2.128504065757e-08 true >>> resid norm 2.128504067423e-08 ||r(i)||/||b|| 5.301416991298e-10 >>> >>>>> 157 KSP unpreconditioned resid norm 1.879248809429e-08 true >>> resid norm 1.879248818928e-08 ||r(i)||/||b|| 4.680602575310e-10 >>> >>>>> 158 KSP unpreconditioned resid norm 1.673649140073e-08 true >>> resid norm 1.673649134005e-08 ||r(i)||/||b|| 4.168520085200e-10 >>> >>>>> 159 KSP unpreconditioned resid norm 1.497123388109e-08 true >>> resid norm 1.497123365569e-08 ||r(i)||/||b|| 3.728851342016e-10 >>> >>>>> 160 KSP unpreconditioned resid norm 1.315982130162e-08 true >>> resid norm 1.315982149329e-08 ||r(i)||/||b|| 3.277687007261e-10 >>> >>>>> 161 KSP unpreconditioned resid norm 1.182395864938e-08 true >>> resid norm 1.182395868430e-08 ||r(i)||/||b|| 2.944966675550e-10 >>> >>>>> 162 KSP unpreconditioned resid norm 1.070204481679e-08 true >>> resid norm 1.070204466432e-08 ||r(i)||/||b|| 2.665534085342e-10 >>> >>>>> 163 KSP unpreconditioned resid norm 9.969290307649e-09 true >>> resid norm 9.969290432333e-09 ||r(i)||/||b|| 2.483028644297e-10 >>> >>>>> 164 KSP unpreconditioned resid norm 9.134440883306e-09 true >>> resid norm 9.134440980976e-09 ||r(i)||/||b|| 2.275094577628e-10 >>> >>>>> 165 KSP unpreconditioned resid norm 8.593316427292e-09 true >>> resid norm 8.593316413360e-09 ||r(i)||/||b|| 2.140317904139e-10 >>> >>>>> 166 KSP unpreconditioned resid norm 8.042173048464e-09 true >>> resid norm 8.042173332848e-09 ||r(i)||/||b|| 2.003045942277e-10 >>> >>>>> 167 KSP unpreconditioned resid norm 7.655518522782e-09 true >>> resid norm 7.655518879144e-09 ||r(i)||/||b|| 1.906742791064e-10 >>> >>>>> 168 KSP unpreconditioned resid norm 7.210283391815e-09 true >>> resid norm 7.210283220312e-09 ||r(i)||/||b|| 1.795848951442e-10 >>> >>>>> 169 KSP unpreconditioned resid norm 6.793967416271e-09 true >>> resid norm 6.793967448832e-09 ||r(i)||/||b|| 1.692158122825e-10 >>> >>>>> 170 KSP unpreconditioned resid norm 6.249160304588e-09 true >>> resid norm 6.249160382647e-09 ||r(i)||/||b|| 1.556464257736e-10 >>> >>>>> 171 KSP unpreconditioned resid norm 5.794936438798e-09 true >>> resid norm 5.794936332552e-09 ||r(i)||/||b|| 1.443331699811e-10 >>> >>>>> 172 KSP unpreconditioned resid norm 5.222337397128e-09 true >>> resid norm 5.222337443277e-09 ||r(i)||/||b|| 1.300715788135e-10 >>> >>>>> 173 KSP unpreconditioned resid norm 4.755359110447e-09 true >>> resid norm 4.755358888996e-09 ||r(i)||/||b|| 1.184406494668e-10 >>> >>>>> 174 KSP unpreconditioned resid norm 4.317537007873e-09 true >>> resid norm 4.317537267718e-09 ||r(i)||/||b|| 1.075359252630e-10 >>> >>>>> 175 KSP unpreconditioned resid norm 3.924177535665e-09 true >>> resid norm 3.924177629720e-09 ||r(i)||/||b|| 9.773860563138e-11 >>> >>>>> 
176 KSP unpreconditioned resid norm 3.502843065115e-09 true >>> resid norm 3.502843126359e-09 ||r(i)||/||b|| 8.724452234855e-11 >>> >>>>> 177 KSP unpreconditioned resid norm 3.083873232869e-09 true >>> resid norm 3.083873352938e-09 ||r(i)||/||b|| 7.680933686007e-11 >>> >>>>> 178 KSP unpreconditioned resid norm 2.758980676473e-09 true >>> resid norm 2.758980618096e-09 ||r(i)||/||b|| 6.871730691658e-11 >>> >>>>> 179 KSP unpreconditioned resid norm 2.510978240429e-09 true >>> resid norm 2.510978327392e-09 ||r(i)||/||b|| 6.254036989334e-11 >>> >>>>> 180 KSP unpreconditioned resid norm 2.323000193205e-09 true >>> resid norm 2.323000193205e-09 ||r(i)||/||b|| 5.785844097519e-11 >>> >>>>> 181 KSP unpreconditioned resid norm 2.167480159274e-09 true >>> resid norm 2.167480113693e-09 ||r(i)||/||b|| 5.398493749153e-11 >>> >>>>> 182 KSP unpreconditioned resid norm 1.983545827983e-09 true >>> resid norm 1.983546404840e-09 ||r(i)||/||b|| 4.940374216139e-11 >>> >>>>> 183 KSP unpreconditioned resid norm 1.794576286774e-09 true >>> resid norm 1.794576224361e-09 ||r(i)||/||b|| 4.469710457036e-11 >>> >>>>> 184 KSP unpreconditioned resid norm 1.583490590644e-09 true >>> resid norm 1.583490380603e-09 ||r(i)||/||b|| 3.943963715064e-11 >>> >>>>> 185 KSP unpreconditioned resid norm 1.412659866247e-09 true >>> resid norm 1.412659832191e-09 ||r(i)||/||b|| 3.518479927722e-11 >>> >>>>> 186 KSP unpreconditioned resid norm 1.285613344939e-09 true >>> resid norm 1.285612984761e-09 ||r(i)||/||b|| 3.202047215205e-11 >>> >>>>> 187 KSP unpreconditioned resid norm 1.168115133929e-09 true >>> resid norm 1.168114766904e-09 ||r(i)||/||b|| 2.909397058634e-11 >>> >>>>> 188 KSP unpreconditioned resid norm 1.063377926053e-09 true >>> resid norm 1.063377647554e-09 ||r(i)||/||b|| 2.648530681802e-11 >>> >>>>> 189 KSP unpreconditioned resid norm 9.548967728122e-10 true >>> resid norm 9.548964523410e-10 ||r(i)||/||b|| 2.378339019807e-11 >>> >>>>> KSP Object: 16 MPI processes >>> >>>>> type: fgmres >>> >>>>> restart=30, using Classical (unmodified) Gram-Schmidt >>> Orthogonalization with no iterative refinement >>> >>>>> happy breakdown tolerance 1e-30 >>> >>>>> maximum iterations=2000, initial guess is zero >>> >>>>> tolerances: relative=1e-20, absolute=1e-09, >>> divergence=10000. >>> >>>>> right preconditioning >>> >>>>> using UNPRECONDITIONED norm type for convergence test >>> >>>>> PC Object: 16 MPI processes >>> >>>>> type: bjacobi >>> >>>>> number of blocks = 4 >>> >>>>> Local solver information for first block is in the >>> following KSP and PC objects on rank 0: >>> >>>>> Use -ksp_view ::ascii_info_detail to display >>> information for all blocks >>> >>>>> KSP Object: (sub_) 4 MPI processes >>> >>>>> type: preonly >>> >>>>> maximum iterations=10000, initial guess is zero >>> >>>>> tolerances: relative=1e-05, absolute=1e-50, >>> divergence=10000. 
>>> >>>>> left preconditioning >>> >>>>> using NONE norm type for convergence test >>> >>>>> PC Object: (sub_) 4 MPI processes >>> >>>>> type: telescope >>> >>>>> petsc subcomm: parent comm size reduction factor = 4 >>> >>>>> petsc subcomm: parent_size = 4 , subcomm_size = 1 >>> >>>>> petsc subcomm type = contiguous >>> >>>>> linear system matrix = precond matrix: >>> >>>>> Mat Object: (sub_) 4 MPI processes >>> >>>>> type: mpiaij >>> >>>>> rows=40200, cols=40200 >>> >>>>> total: nonzeros=199996, allocated nonzeros=203412 >>> >>>>> total number of mallocs used during MatSetValues calls=0 >>> >>>>> not using I-node (on process 0) routines >>> >>>>> setup type: default >>> >>>>> Parent DM object: NULL >>> >>>>> Sub DM object: NULL >>> >>>>> KSP Object: (sub_telescope_) 1 MPI processes >>> >>>>> type: preonly >>> >>>>> maximum iterations=10000, initial guess is zero >>> >>>>> tolerances: relative=1e-05, absolute=1e-50, >>> divergence=10000. >>> >>>>> left preconditioning >>> >>>>> using NONE norm type for convergence test >>> >>>>> PC Object: (sub_telescope_) 1 MPI processes >>> >>>>> type: lu >>> >>>>> out-of-place factorization >>> >>>>> tolerance for zero pivot 2.22045e-14 >>> >>>>> matrix ordering: external >>> >>>>> factor fill ratio given 0., needed 0. >>> >>>>> Factored matrix follows: >>> >>>>> Mat Object: 1 MPI processes >>> >>>>> type: mumps >>> >>>>> rows=40200, cols=40200 >>> >>>>> package used to perform factorization: mumps >>> >>>>> total: nonzeros=1849788, allocated >>> nonzeros=1849788 >>> >>>>> MUMPS run parameters: >>> >>>>> SYM (matrix type): 0 >>> >>>>> PAR (host participation): 1 >>> >>>>> ICNTL(1) (output for error): 6 >>> >>>>> ICNTL(2) (output of diagnostic msg): 0 >>> >>>>> ICNTL(3) (output for global info): 0 >>> >>>>> ICNTL(4) (level of printing): 0 >>> >>>>> ICNTL(5) (input mat struct): 0 >>> >>>>> ICNTL(6) (matrix prescaling): 7 >>> >>>>> ICNTL(7) (sequential matrix ordering):7 >>> >>>>> ICNTL(8) (scaling strategy): 77 >>> >>>>> ICNTL(10) (max num of refinements): 0 >>> >>>>> ICNTL(11) (error analysis): 0 >>> >>>>> ICNTL(12) (efficiency control): 1 >>> >>>>> ICNTL(13) (sequential factorization >>> of the root node): 0 >>> >>>>> ICNTL(14) (percentage of estimated >>> workspace increase): 20 >>> >>>>> ICNTL(18) (input mat struct): 0 >>> >>>>> ICNTL(19) (Schur complement info): >>> 0 >>> >>>>> ICNTL(20) (RHS sparse pattern): 0 >>> >>>>> ICNTL(21) (solution struct): 0 >>> >>>>> ICNTL(22) (in-core/out-of-core >>> facility): 0 >>> >>>>> ICNTL(23) (max size of memory can be >>> allocated locally):0 >>> >>>>> ICNTL(24) (detection of null pivot >>> rows): 0 >>> >>>>> ICNTL(25) (computation of a null >>> space basis): 0 >>> >>>>> ICNTL(26) (Schur options for RHS or >>> solution): 0 >>> >>>>> ICNTL(27) (blocking size for multiple >>> RHS): -32 >>> >>>>> ICNTL(28) (use parallel or sequential >>> ordering): 1 >>> >>>>> ICNTL(29) (parallel ordering): 0 >>> >>>>> ICNTL(30) (user-specified set of >>> entries in inv(A)): 0 >>> >>>>> ICNTL(31) (factors is discarded in >>> the solve phase): 0 >>> >>>>> ICNTL(33) (compute determinant): 0 >>> >>>>> ICNTL(35) (activate BLR based >>> factorization): 0 >>> >>>>> ICNTL(36) (choice of BLR >>> factorization variant): 0 >>> >>>>> ICNTL(38) (estimated compression rate >>> of LU factors): 333 >>> >>>>> CNTL(1) (relative pivoting >>> threshold): 0.01 >>> >>>>> CNTL(2) (stopping criterion of >>> refinement): 1.49012e-08 >>> >>>>> CNTL(3) (absolute pivoting >>> threshold): 0. >>> >>>>> CNTL(4) (value of static pivoting): >>> -1. 
>>> >>>>> CNTL(5) (fixation for null pivots): >>> 0. >>> >>>>> CNTL(7) (dropping parameter for >>> BLR): 0. >>> >>>>> RINFO(1) (local estimated flops for >>> the elimination after analysis): >>> >>>>> [0] 1.45525e+08 >>> >>>>> RINFO(2) (local estimated flops for >>> the assembly after factorization): >>> >>>>> [0] 2.89397e+06 >>> >>>>> RINFO(3) (local estimated flops for >>> the elimination after factorization): >>> >>>>> [0] 1.45525e+08 >>> >>>>> INFO(15) (estimated size of (in MB) >>> MUMPS internal data for running numerical factorization): >>> >>>>> [0] 29 >>> >>>>> INFO(16) (size of (in MB) MUMPS >>> internal data used during numerical factorization): >>> >>>>> [0] 29 >>> >>>>> INFO(23) (num of pivots eliminated on >>> this processor after factorization): >>> >>>>> [0] 40200 >>> >>>>> RINFOG(1) (global estimated flops for >>> the elimination after analysis): 1.45525e+08 >>> >>>>> RINFOG(2) (global estimated flops for >>> the assembly after factorization): 2.89397e+06 >>> >>>>> RINFOG(3) (global estimated flops for >>> the elimination after factorization): 1.45525e+08 >>> >>>>> (RINFOG(12) RINFOG(13))*2^INFOG(34) >>> (determinant): (0.,0.)*(2^0) >>> >>>>> INFOG(3) (estimated real workspace >>> for factors on all processors after analysis): 1849788 >>> >>>>> INFOG(4) (estimated integer workspace >>> for factors on all processors after analysis): 879986 >>> >>>>> INFOG(5) (estimated maximum front >>> size in the complete tree): 282 >>> >>>>> INFOG(6) (number of nodes in the >>> complete tree): 23709 >>> >>>>> INFOG(7) (ordering option effectively >>> used after analysis): 5 >>> >>>>> INFOG(8) (structural symmetry in >>> percent of the permuted matrix after analysis): 100 >>> >>>>> INFOG(9) (total real/complex >>> workspace to store the matrix factors after factorization): 1849788 >>> >>>>> INFOG(10) (total integer space store >>> the matrix factors after factorization): 879986 >>> >>>>> INFOG(11) (order of largest frontal >>> matrix after factorization): 282 >>> >>>>> INFOG(12) (number of off-diagonal >>> pivots): 0 >>> >>>>> INFOG(13) (number of delayed pivots >>> after factorization): 0 >>> >>>>> INFOG(14) (number of memory compress >>> after factorization): 0 >>> >>>>> INFOG(15) (number of steps of >>> iterative refinement after solution): 0 >>> >>>>> INFOG(16) (estimated size (in MB) of >>> all MUMPS internal data for factorization after analysis: value on >>> the most memory consuming processor): 29 >>> >>>>> INFOG(17) (estimated size of all >>> MUMPS internal data for factorization after analysis: sum over all >>> processors): 29 >>> >>>>> INFOG(18) (size of all MUMPS internal >>> data allocated during factorization: value on the most memory >>> consuming processor): 29 >>> >>>>> INFOG(19) (size of all MUMPS internal >>> data allocated during factorization: sum over all processors): 29 >>> >>>>> INFOG(20) (estimated number of >>> entries in the factors): 1849788 >>> >>>>> INFOG(21) (size in MB of memory >>> effectively used during factorization - value on the most memory >>> consuming processor): 26 >>> >>>>> INFOG(22) (size in MB of memory >>> effectively used during factorization - sum over all processors): 26 >>> >>>>> INFOG(23) (after analysis: value of >>> ICNTL(6) effectively used): 0 >>> >>>>> INFOG(24) (after analysis: value of >>> ICNTL(12) effectively used): 1 >>> >>>>> INFOG(25) (after factorization: >>> number of pivots modified by static pivoting): 0 >>> >>>>> INFOG(28) (after factorization: >>> number of null pivots encountered): 0 >>> >>>>> INFOG(29) (after 
factorization: >>> effective number of entries in the factors (sum over all >>> processors)): 1849788 >>> >>>>> INFOG(30, 31) (after solution: size >>> in Mbytes of memory used during solution phase): 29, 29 >>> >>>>> INFOG(32) (after analysis: type of >>> analysis done): 1 >>> >>>>> INFOG(33) (value used for ICNTL(8)): 7 >>> >>>>> INFOG(34) (exponent of the >>> determinant if determinant is requested): 0 >>> >>>>> INFOG(35) (after factorization: >>> number of entries taking into account BLR factor compression - sum >>> over all processors): 1849788 >>> >>>>> INFOG(36) (after analysis: estimated >>> size of all MUMPS internal data for running BLR in-core - value on >>> the most memory consuming processor): 0 >>> >>>>> INFOG(37) (after analysis: estimated >>> size of all MUMPS internal data for running BLR in-core - sum over >>> all processors): 0 >>> >>>>> INFOG(38) (after analysis: estimated >>> size of all MUMPS internal data for running BLR out-of-core - >>> value on the most memory consuming processor): 0 >>> >>>>> INFOG(39) (after analysis: estimated >>> size of all MUMPS internal data for running BLR out-of-core - sum >>> over all processors): 0 >>> >>>>> linear system matrix = precond matrix: >>> >>>>> Mat Object: 1 MPI processes >>> >>>>> type: seqaijcusparse >>> >>>>> rows=40200, cols=40200 >>> >>>>> total: nonzeros=199996, allocated nonzeros=199996 >>> >>>>> total number of mallocs used during >>> MatSetValues calls=0 >>> >>>>> not using I-node routines >>> >>>>> linear system matrix = precond matrix: >>> >>>>> Mat Object: 16 MPI processes >>> >>>>> type: mpiaijcusparse >>> >>>>> rows=160800, cols=160800 >>> >>>>> total: nonzeros=802396, allocated nonzeros=1608000 >>> >>>>> total number of mallocs used during MatSetValues calls=0 >>> >>>>> not using I-node (on process 0) routines >>> >>>>> Norm of error 9.11684e-07 iterations 189 >>> >>>>> Chang >>> >>>>> On 10/14/21 10:10 PM, Chang Liu wrote: >>> >>>>>> Hi Barry, >>> >>>>>> >>> >>>>>> No problem. Here is the output. It seems that the resid >>> norm calculation is incorrect. >>> >>>>>> >>> >>>>>> $ mpiexec -n 16 --hostfile hostfile --oversubscribe ./ex7 >>> -m 400 -ksp_view -ksp_monitor_true_residual -pc_type bjacobi >>> -pc_bjacobi_blocks 4 -ksp_type fgmres -mat_type aijcusparse >>> -sub_pc_type telescope -sub_ksp_type preonly >>> -sub_telescope_ksp_type preonly -sub_telescope_pc_type lu >>> -sub_telescope_pc_factor_mat_solver_type cusparse >>> -sub_pc_telescope_reduction_factor 4 >>> -sub_pc_telescope_subcomm_type contiguous -ksp_max_it 2000 >>> -ksp_rtol 1.e-20 -ksp_atol 1.e-9 >>> >>>>>> 0 KSP unpreconditioned resid norm 4.014971979977e+01 >>> true resid norm 4.014971979977e+01 ||r(i)||/||b|| 1.000000000000e+00 >>> >>>>>> 1 KSP unpreconditioned resid norm 0.000000000000e+00 >>> true resid norm 4.014971979977e+01 ||r(i)||/||b|| 1.000000000000e+00 >>> >>>>>> KSP Object: 16 MPI processes >>> >>>>>> type: fgmres >>> >>>>>> restart=30, using Classical (unmodified) Gram-Schmidt >>> Orthogonalization with no iterative refinement >>> >>>>>> happy breakdown tolerance 1e-30 >>> >>>>>> maximum iterations=2000, initial guess is zero >>> >>>>>> tolerances: relative=1e-20, absolute=1e-09, >>> divergence=10000. 
>>> >>>>>> right preconditioning >>> >>>>>> using UNPRECONDITIONED norm type for convergence test >>> >>>>>> PC Object: 16 MPI processes >>> >>>>>> type: bjacobi >>> >>>>>> number of blocks = 4 >>> >>>>>> Local solver information for first block is in the >>> following KSP and PC objects on rank 0: >>> >>>>>> Use -ksp_view ::ascii_info_detail to display >>> information for all blocks >>> >>>>>> KSP Object: (sub_) 4 MPI processes >>> >>>>>> type: preonly >>> >>>>>> maximum iterations=10000, initial guess is zero >>> >>>>>> tolerances: relative=1e-05, absolute=1e-50, >>> divergence=10000. >>> >>>>>> left preconditioning >>> >>>>>> using NONE norm type for convergence test >>> >>>>>> PC Object: (sub_) 4 MPI processes >>> >>>>>> type: telescope >>> >>>>>> petsc subcomm: parent comm size reduction factor = 4 >>> >>>>>> petsc subcomm: parent_size = 4 , subcomm_size = 1 >>> >>>>>> petsc subcomm type = contiguous >>> >>>>>> linear system matrix = precond matrix: >>> >>>>>> Mat Object: (sub_) 4 MPI processes >>> >>>>>> type: mpiaij >>> >>>>>> rows=40200, cols=40200 >>> >>>>>> total: nonzeros=199996, allocated nonzeros=203412 >>> >>>>>> total number of mallocs used during MatSetValues >>> calls=0 >>> >>>>>> not using I-node (on process 0) routines >>> >>>>>> setup type: default >>> >>>>>> Parent DM object: NULL >>> >>>>>> Sub DM object: NULL >>> >>>>>> KSP Object: (sub_telescope_) 1 MPI processes >>> >>>>>> type: preonly >>> >>>>>> maximum iterations=10000, initial guess is zero >>> >>>>>> tolerances: relative=1e-05, absolute=1e-50, >>> divergence=10000. >>> >>>>>> left preconditioning >>> >>>>>> using NONE norm type for convergence test >>> >>>>>> PC Object: (sub_telescope_) 1 MPI processes >>> >>>>>> type: lu >>> >>>>>> out-of-place factorization >>> >>>>>> tolerance for zero pivot 2.22045e-14 >>> >>>>>> matrix ordering: nd >>> >>>>>> factor fill ratio given 5., needed 8.62558 >>> >>>>>> Factored matrix follows: >>> >>>>>> Mat Object: 1 MPI processes >>> >>>>>> type: seqaijcusparse >>> >>>>>> rows=40200, cols=40200 >>> >>>>>> package used to perform factorization: >>> cusparse >>> >>>>>> total: nonzeros=1725082, allocated >>> nonzeros=1725082 >>> >>>>>> not using I-node routines >>> >>>>>> linear system matrix = precond matrix: >>> >>>>>> Mat Object: 1 MPI processes >>> >>>>>> type: seqaijcusparse >>> >>>>>> rows=40200, cols=40200 >>> >>>>>> total: nonzeros=199996, allocated nonzeros=199996 >>> >>>>>> total number of mallocs used during >>> MatSetValues calls=0 >>> >>>>>> not using I-node routines >>> >>>>>> linear system matrix = precond matrix: >>> >>>>>> Mat Object: 16 MPI processes >>> >>>>>> type: mpiaijcusparse >>> >>>>>> rows=160800, cols=160800 >>> >>>>>> total: nonzeros=802396, allocated nonzeros=1608000 >>> >>>>>> total number of mallocs used during MatSetValues calls=0 >>> >>>>>> not using I-node (on process 0) routines >>> >>>>>> Norm of error 400.999 iterations 1 >>> >>>>>> >>> >>>>>> Chang >>> >>>>>> >>> >>>>>> >>> >>>>>> On 10/14/21 9:47 PM, Barry Smith wrote: >>> >>>>>>> >>> >>>>>>> Chang, >>> >>>>>>> >>> >>>>>>> Sorry I did not notice that one. Please run that with >>> -ksp_view -ksp_monitor_true_residual so we can see exactly how >>> options are interpreted and solver used. At a glance it looks ok >>> but something must be wrong to get the wrong answer. 
>>> >>>>>>>
>>> >>>>>>>   Barry
>>> >>>>>>>
>>> >>>>>>>> On Oct 14, 2021, at 6:02 PM, Chang Liu <c...@pppl.gov> wrote:
>>> >>>>>>>>
>>> >>>>>>>> Hi Barry,
>>> >>>>>>>>
>>> >>>>>>>> That is exactly what I was doing in the second example, in which the preconditioner works but the GMRES does not.
>>> >>>>>>>>
>>> >>>>>>>> Chang
>>> >>>>>>>>
>>> >>>>>>>> On 10/14/21 5:15 PM, Barry Smith wrote:
>>> >>>>>>>>> You need to use the PCTELESCOPE inside the block Jacobi, not outside it. So something like -pc_type bjacobi -sub_pc_type telescope -sub_telescope_pc_type lu
>>> >>>>>>>>>
>>> >>>>>>>>>> On Oct 14, 2021, at 4:14 PM, Chang Liu <c...@pppl.gov> wrote:
>>> >>>>>>>>>>
>>> >>>>>>>>>> Hi Pierre,
>>> >>>>>>>>>>
>>> >>>>>>>>>> I wonder if the PCTELESCOPE trick only works for the preconditioner and not for the solver. I have done some tests and found that for solving a small matrix with -telescope_ksp_type preonly, it does work on GPU with multiple MPI processes. However, with bjacobi and gmres it does not work.
>>> >>>>>>>>>>
>>> >>>>>>>>>> The command line options I used for the small matrix are
>>> >>>>>>>>>>
>>> >>>>>>>>>> mpiexec -n 4 --oversubscribe ./ex7 -m 100 -ksp_monitor_short -pc_type telescope -mat_type aijcusparse -telescope_pc_type lu -telescope_pc_factor_mat_solver_type cusparse -telescope_ksp_type preonly -pc_telescope_reduction_factor 4
>>> >>>>>>>>>>
>>> >>>>>>>>>> which gives the correct output. For the iterative solver, I tried
>>> >>>>>>>>>>
>>> >>>>>>>>>> mpiexec -n 16 --oversubscribe ./ex7 -m 400 -ksp_monitor_short -pc_type bjacobi -pc_bjacobi_blocks 4 -ksp_type fgmres -mat_type aijcusparse -sub_pc_type telescope -sub_ksp_type preonly -sub_telescope_ksp_type preonly -sub_telescope_pc_type lu -sub_telescope_pc_factor_mat_solver_type cusparse -sub_pc_telescope_reduction_factor 4 -ksp_max_it 2000 -ksp_rtol 1.e-9 -ksp_atol 1.e-20
>>> >>>>>>>>>>
>>> >>>>>>>>>> for the large matrix. The output is
>>> >>>>>>>>>>
>>> >>>>>>>>>>   0 KSP Residual norm 40.1497
>>> >>>>>>>>>>   1 KSP Residual norm < 1.e-11
>>> >>>>>>>>>> Norm of error 400.999 iterations 1
>>> >>>>>>>>>>
>>> >>>>>>>>>> So it seems to call a direct solver instead of an iterative one.
>>> >>>>>>>>>>
>>> >>>>>>>>>> Can you please help check these options?
>>> >>>>>>>>>>
>>> >>>>>>>>>> Chang
>>> >>>>>>>>>>
>>> >>>>>>>>>> On 10/14/21 10:04 AM, Pierre Jolivet wrote:
>>> >>>>>>>>>>>> On 14 Oct 2021, at 3:50 PM, Chang Liu <c...@pppl.gov> wrote:
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> Thank you Pierre. I was not aware of PCTELESCOPE before. This sounds like exactly what I need. I wonder if PCTELESCOPE can transform an mpiaijcusparse to a seqaijcusparse? Or do I have to do it manually?
>>> >>>>>>>>>>> PCTELESCOPE uses MatCreateMPIMatConcatenateSeqMat().
>>> >>>>>>>>>>> 1) I’m not sure this is implemented for cuSparse matrices, but it should be;
>>> >>>>>>>>>>> 2) at least for the implementations MatCreateMPIMatConcatenateSeqMat_MPIBAIJ() and MatCreateMPIMatConcatenateSeqMat_MPIAIJ(), the resulting MatType is MATBAIJ (resp. MATAIJ). Constructors are usually “smart” enough to detect if the MPI communicator on which the Mat lives is of size 1 (your case), and then the resulting Mat is of type MatSeqX instead of MatMPIX, so you would not need to worry about the transformation you are mentioning.
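To make the routine Pierre mentions concrete, here is a minimal sketch of a call to it; comm and seqBlock are placeholders, and whether the cuSparse matrix types implement this path is exactly Pierre's point 1) above.

  #include <petscmat.h>

  /* Sketch only: MatCreateMPIMatConcatenateSeqMat() stitches each rank's
     sequential block into one matrix on `comm`.  When `comm` has a single
     rank (the telescoped case here), the result should come back as a Seq
     matrix, which is the property the cusparse LU path needs. */
  static PetscErrorCode ConcatenateBlocks(MPI_Comm comm, Mat seqBlock, Mat *glued)
  {
    PetscErrorCode ierr;

    PetscFunctionBeginUser;
    ierr = MatCreateMPIMatConcatenateSeqMat(comm, seqBlock, PETSC_DECIDE, MAT_INITIAL_MATRIX, glued);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }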
>>> >>>>>>>>>>> If you try this out and this does not work, please >>> provide the backtrace (probably something like “Operation XYZ not >>> implemented for MatType ABC”), and hopefully someone can add the >>> missing plumbing. >>> >>>>>>>>>>> I do not claim that this will be efficient, but I >>> think this goes in the direction of what you want to achieve. >>> >>>>>>>>>>> Thanks, >>> >>>>>>>>>>> Pierre >>> >>>>>>>>>>>> Chang >>> >>>>>>>>>>>> >>> >>>>>>>>>>>> On 10/14/21 1:35 AM, Pierre Jolivet wrote: >>> >>>>>>>>>>>>> Maybe I’m missing something, but can’t you use >>> PCTELESCOPE as a subdomain solver, with a reduction factor equal >>> to the number of MPI processes you have per block? >>> >>>>>>>>>>>>> -sub_pc_type telescope >>> -sub_pc_telescope_reduction_factor X -sub_telescope_pc_type lu >>> >>>>>>>>>>>>> This does not work with MUMPS >>> -mat_mumps_use_omp_threads because not only do the Mat needs to be >>> redistributed, the secondary processes also need to be “converted” >>> to OpenMP threads. >>> >>>>>>>>>>>>> Thus the need for specific code in mumps.c. >>> >>>>>>>>>>>>> Thanks, >>> >>>>>>>>>>>>> Pierre >>> >>>>>>>>>>>>>> On 14 Oct 2021, at 6:00 AM, Chang Liu via >>> petsc-users <petsc-users@mcs.anl.gov >>> <mailto:petsc-users@mcs.anl.gov>> wrote: >>> >>>>>>>>>>>>>> >>> >>>>>>>>>>>>>> Hi Junchao, >>> >>>>>>>>>>>>>> >>> >>>>>>>>>>>>>> Yes that is what I want. >>> >>>>>>>>>>>>>> >>> >>>>>>>>>>>>>> Chang >>> >>>>>>>>>>>>>> >>> >>>>>>>>>>>>>> On 10/13/21 11:42 PM, Junchao Zhang wrote: >>> >>>>>>>>>>>>>>> On Wed, Oct 13, 2021 at 8:58 PM Barry Smith >>> <bsm...@petsc.dev <mailto:bsm...@petsc.dev> >>> <mailto:bsm...@petsc.dev <mailto:bsm...@petsc.dev>>> wrote: >>> >>>>>>>>>>>>>>> Junchao, >>> >>>>>>>>>>>>>>> If I understand correctly Chang is >>> using the block Jacobi >>> >>>>>>>>>>>>>>> method with a single block for a number of >>> MPI ranks and a direct >>> >>>>>>>>>>>>>>> solver for each block so it uses >>> PCSetUp_BJacobi_Multiproc() which >>> >>>>>>>>>>>>>>> is code Hong Zhang wrote a number of years >>> ago for CPUs. For their >>> >>>>>>>>>>>>>>> particular problems this preconditioner works >>> well, but using an >>> >>>>>>>>>>>>>>> iterative solver on the blocks does not work >>> well. >>> >>>>>>>>>>>>>>> If we had complete MPI-GPU direct >>> solvers he could just use >>> >>>>>>>>>>>>>>> the current code with MPIAIJCUSPARSE on each >>> block but since we do >>> >>>>>>>>>>>>>>> not he would like to use a single GPU for >>> each block, this means >>> >>>>>>>>>>>>>>> that diagonal blocks of the global parallel >>> MPI matrix needs to be >>> >>>>>>>>>>>>>>> sent to a subset of the GPUs (one GPU per >>> block, which has multiple >>> >>>>>>>>>>>>>>> MPI ranks associated with the blocks). >>> Similarly for the triangular >>> >>>>>>>>>>>>>>> solves the blocks of the right hand side >>> needs to be shipped to the >>> >>>>>>>>>>>>>>> appropriate GPU and the resulting solution >>> shipped back to the >>> >>>>>>>>>>>>>>> multiple GPUs. So Chang is absolutely >>> correct, this is somewhat like >>> >>>>>>>>>>>>>>> your code for MUMPS with OpenMP. OK, I now >>> understand the background.. >>> >>>>>>>>>>>>>>> One could use PCSetUp_BJacobi_Multiproc() and >>> get the blocks on the >>> >>>>>>>>>>>>>>> MPI ranks and then shrink each block down to >>> a single GPU but this >>> >>>>>>>>>>>>>>> would be pretty inefficient, ideally one >>> would go directly from the >>> >>>>>>>>>>>>>>> big MPI matrix on all the GPUs to the sub >>> matrices on the subset of >>> >>>>>>>>>>>>>>> GPUs. 
But this may be a large coding project. >>> >>>>>>>>>>>>>>> I don't understand these sentences. Why do you say >>> "shrink"? In my mind, we just need to move each block (submatrix) >>> living over multiple MPI ranks to one of them and solve directly >>> there. In other words, we keep blocks' size, no shrinking or >>> expanding. >>> >>>>>>>>>>>>>>> As mentioned before, cusparse does not provide LU >>> factorization. So the LU factorization would be done on CPU, and >>> the solve be done on GPU. I assume Chang wants to gain from the >>> (potential) faster solve (instead of factorization) on GPU. >>> >>>>>>>>>>>>>>> Barry >>> >>>>>>>>>>>>>>> Since the matrices being factored and solved >>> directly are relatively >>> >>>>>>>>>>>>>>> large it is possible that the cusparse code >>> could be reasonably >>> >>>>>>>>>>>>>>> efficient (they are not the tiny problems one >>> gets at the coarse >>> >>>>>>>>>>>>>>> level of multigrid). Of course, this is >>> speculation, I don't >>> >>>>>>>>>>>>>>> actually know how much better the cusparse >>> code would be on the >>> >>>>>>>>>>>>>>> direct solver than a good CPU direct sparse >>> solver. >>> >>>>>>>>>>>>>>> > On Oct 13, 2021, at 9:32 PM, Chang Liu >>> <c...@pppl.gov <mailto:c...@pppl.gov> >>> >>>>>>>>>>>>>>> <mailto:c...@pppl.gov >>> <mailto:c...@pppl.gov>>> wrote: >>> >>>>>>>>>>>>>>> > >>> >>>>>>>>>>>>>>> > Sorry I am not familiar with the details >>> either. Can you please >>> >>>>>>>>>>>>>>> check the code in >>> MatMumpsGatherNonzerosOnMaster in mumps.c? >>> >>>>>>>>>>>>>>> > >>> >>>>>>>>>>>>>>> > Chang >>> >>>>>>>>>>>>>>> > >>> >>>>>>>>>>>>>>> > On 10/13/21 9:24 PM, Junchao Zhang wrote: >>> >>>>>>>>>>>>>>> >> Hi Chang, >>> >>>>>>>>>>>>>>> >> I did the work in mumps. It is easy for >>> me to understand >>> >>>>>>>>>>>>>>> gathering matrix rows to one process. >>> >>>>>>>>>>>>>>> >> But how to gather blocks (submatrices) >>> to form a large block? Can you draw a picture of that? >>> >>>>>>>>>>>>>>> >> Thanks >>> >>>>>>>>>>>>>>> >> --Junchao Zhang >>> >>>>>>>>>>>>>>> >> On Wed, Oct 13, 2021 at 7:47 PM Chang Liu >>> via petsc-users >>> >>>>>>>>>>>>>>> <petsc-users@mcs.anl.gov >>> <mailto:petsc-users@mcs.anl.gov> <mailto:petsc-users@mcs.anl.gov >>> <mailto:petsc-users@mcs.anl.gov>> >>> >>>>>>>>>>>>>>> <mailto:petsc-users@mcs.anl.gov >>> <mailto:petsc-users@mcs.anl.gov> <mailto:petsc-users@mcs.anl.gov >>> <mailto:petsc-users@mcs.anl.gov>>>> >>> >>>>>>>>>>>>>>> wrote: >>> >>>>>>>>>>>>>>> >> Hi Barry, >>> >>>>>>>>>>>>>>> >> I think mumps solver in petsc does >>> support that. 
>>> On Oct 13, 2021, at 9:32 PM, Chang Liu <c...@pppl.gov> wrote:
>>> Sorry, I am not familiar with the details either. Can you please check the
>>> code in MatMumpsGatherNonzerosOnMaster in mumps.c?
>>> Chang
>>>
>>> On 10/13/21 9:24 PM, Junchao Zhang wrote:
>>> Hi Chang,
>>> I did the work in mumps. It is easy for me to understand gathering matrix
>>> rows to one process.
>>> But how to gather blocks (submatrices) to form a large block? Can you draw
>>> a picture of that?
>>> Thanks
>>> --Junchao Zhang
>>>
>>> On Wed, Oct 13, 2021 at 7:47 PM Chang Liu via petsc-users <petsc-users@mcs.anl.gov> wrote:
>>> Hi Barry,
>>> I think the mumps solver in petsc does support that. You can check the
>>> documentation on "-mat_mumps_use_omp_threads" at
>>> https://petsc.org/release/docs/manualpages/Mat/MATSOLVERMUMPS.html
>>> and the code enclosed by #if defined(PETSC_HAVE_OPENMP_SUPPORT) in the
>>> functions MatMumpsSetUpDistRHSInfo and MatMumpsGatherNonzerosOnMaster in
>>> mumps.c.
>>> 1. I understand it is ideal to do one MPI rank per GPU. However, I am
>>> working on an existing code that was developed based on MPI, and the number
>>> of MPI ranks is typically equal to the number of CPU cores. We don't want
>>> to change the whole structure of the code.
>>> 2. What you have suggested has been coded in mumps.c. See the function
>>> MatMumpsSetUpDistRHSInfo.
>>> Regards,
>>> Chang
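For comparison, the MUMPS path Chang refers to can be requested from C roughly
as follows. The thread count of 4 and the function name are arbitrary example
values, and this assumes PETSc was configured with MUMPS and OpenMP support
(PETSC_HAVE_OPENMP_SUPPORT):

    #include <petscksp.h>

    /* Hedged sketch: LU through MUMPS with the OpenMP-threads gather path
       enabled, equivalent to -mat_mumps_use_omp_threads 4 on the command line. */
    PetscErrorCode SolveWithMumpsOmpThreads(Mat A, Vec b, Vec x)
    {
      KSP            ksp;
      PC             pc;
      PetscErrorCode ierr;

      ierr = PetscOptionsSetValue(NULL, "-mat_mumps_use_omp_threads", "4");CHKERRQ(ierr);
      ierr = KSPCreate(PetscObjectComm((PetscObject)A), &ksp);CHKERRQ(ierr);
      ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
      ierr = KSPSetType(ksp, KSPPREONLY);CHKERRQ(ierr);
      ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
      ierr = PCSetType(pc, PCLU);CHKERRQ(ierr);
      ierr = PCFactorSetMatSolverType(pc, MATSOLVERMUMPS);CHKERRQ(ierr);
      ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
      ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
      ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
      return 0;
    }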
>>> On 10/13/21 7:53 PM, Barry Smith wrote:
>>> > On Oct 13, 2021, at 3:50 PM, Chang Liu <c...@pppl.gov> wrote:
>>> > Hi Barry,
>>> > That is exactly what I want.
>>> > Back to my original question: I am looking for an approach to transfer
>>> > matrix data from many MPI processes to "master" MPI processes, each of
>>> > which takes care of one GPU, and then upload the data to the GPU to
>>> > solve. One could just grab some code from mumps.c to aijcusparse.cu.
>>> mumps.c doesn't actually do that. It never needs to copy the entire matrix
>>> to a single MPI rank.
>>> It would be possible to write such a code as you suggest, but it is not
>>> clear that it makes sense:
>>> 1) For normal PETSc GPU usage there is one GPU per MPI rank, so while your
>>> one GPU per big domain is solving its systems, the other GPUs (with the
>>> other MPI ranks that share that domain) are doing nothing.
>>> 2) For each triangular solve you would have to gather the right hand side
>>> from the multiple ranks to the single GPU to pass it to the GPU solver, and
>>> then scatter the resulting solution back to all of its subdomain ranks.
>>> What I was suggesting was to assign an entire subdomain to a single MPI
>>> rank, so that it does everything on one GPU and can use the GPU solver
>>> directly. If all the major computations of a subdomain can fit and be done
>>> on a single GPU, then you would be utilizing all the GPUs you are using
>>> effectively.
>>> Barry
>>>
>>> > Chang
>>>
>>> On 10/13/21 1:53 PM, Barry Smith wrote:
>>> Chang,
>>> You are correct, there are no MPI + GPU direct solvers that currently do
>>> the triangular solves with MPI + GPU parallelism that I am aware of. You
>>> are limited in that individual triangular solves must be done on a single
>>> GPU. I can only suggest making each subdomain as big as possible to utilize
>>> each GPU as much as possible for the direct triangular solves.
>>> Barry
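The gather/scatter in Barry's point 2 is essentially the standard VecScatter
pattern; a hedged sketch, with the solve on rank 0 left as a comment and the
function name chosen for illustration:

    #include <petscvec.h>

    /* Sketch: gather a distributed right-hand side onto rank 0 of its
       communicator, solve there (not shown), and scatter the solution back.
       The bjacobi/telescope code paths do this kind of movement internally. */
    PetscErrorCode GatherSolveScatter(Vec b, Vec x)
    {
      VecScatter     tozero;
      Vec            bseq;    /* full-length vector, nonempty only on rank 0 */
      PetscErrorCode ierr;

      ierr = VecScatterCreateToZero(b, &tozero, &bseq);CHKERRQ(ierr);
      ierr = VecScatterBegin(tozero, b, bseq, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);
      ierr = VecScatterEnd(tozero, b, bseq, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);

      /* ... on rank 0: triangular solves on the GPU, overwriting bseq with
             the solution for this block ... */

      /* send the sequential solution back to the distributed layout of x */
      ierr = VecScatterBegin(tozero, bseq, x, INSERT_VALUES, SCATTER_REVERSE);CHKERRQ(ierr);
      ierr = VecScatterEnd(tozero, bseq, x, INSERT_VALUES, SCATTER_REVERSE);CHKERRQ(ierr);

      ierr = VecScatterDestroy(&tozero);CHKERRQ(ierr);
      ierr = VecDestroy(&bseq);CHKERRQ(ierr);
      return 0;
    }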
>>> On Oct 13, 2021, at 12:16 PM, Chang Liu via petsc-users <petsc-users@mcs.anl.gov> wrote:
>>> Hi Mark,
>>> '-mat_type aijcusparse' works with mpiaijcusparse with other solvers, but
>>> with -pc_factor_mat_solver_type cusparse it will give an error.
>>> Yes, what I want is to have mumps or superlu do the factorization, and then
>>> do the rest, including the GMRES solver, on the GPU. Is that possible?
>>> I have tried to use aijcusparse with superlu_dist; it runs, but the
>>> iterative solver is still running on CPUs. I have contacted the superlu
>>> group and they confirmed that is the case right now. But if I set
>>> -pc_factor_mat_solver_type cusparse, it seems that the iterative solver is
>>> running on the GPU.
>>> Chang
>>>
>>> On 10/13/21 12:03 PM, Mark Adams wrote:
>>> On Wed, Oct 13, 2021 at 11:10 AM Chang Liu <c...@pppl.gov> wrote:
>>> > Thank you Junchao for explaining this. I guess in my case the code is
>>> > just calling a seq solver like superlu to do the factorization on GPUs.
>>> > My idea is that I want to have a traditional MPI code utilize GPUs with
>>> > cusparse. Right now cusparse does not support mpiaij matrices,
>>> Sure it does: '-mat_type aijcusparse' will give you an mpiaijcusparse
>>> matrix with more than one process. (-mat_type mpiaijcusparse might also
>>> work with more than one process.)
>>> However, I see in grepping the repo that all the mumps and superlu tests
>>> use the aij or sell matrix type.
>>> MUMPS and SuperLU provide their own solves, I assume... but you might want
>>> to do other matrix operations on the GPU. Is that the issue?
>>> Did you try -mat_type aijcusparse with MUMPS and/or SuperLU and have a
>>> problem? (There is no test with it, so it probably does not work.)
>>> Thanks,
>>> Mark
>>> > so I want the code to have an mpiaij matrix when adding all the matrix
>>> > terms, and then transform the matrix to seqaij when doing the
>>> > factorization and solve. This involves sending the data to the master
>>> > process, and I think the petsc mumps solver has something similar
>>> > already.
>>> > Chang
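A small sketch of Mark's point about '-mat_type aijcusparse': the same request
made in C resolves to the sequential or MPI cusparse type depending on the
communicator size. The matrix size n and the function name are arbitrary
example values:

    #include <petscmat.h>

    /* Sketch: MATAIJCUSPARSE is a dispatch type that becomes MATSEQAIJCUSPARSE
       on one rank and MATMPIAIJCUSPARSE on more than one. */
    PetscErrorCode CreateCusparseMat(MPI_Comm comm, PetscInt n, Mat *A)
    {
      MatType        type;
      PetscErrorCode ierr;

      ierr = MatCreate(comm, A);CHKERRQ(ierr);
      ierr = MatSetSizes(*A, PETSC_DECIDE, PETSC_DECIDE, n, n);CHKERRQ(ierr);
      ierr = MatSetType(*A, MATAIJCUSPARSE);CHKERRQ(ierr);  /* same effect as -mat_type aijcusparse */
      ierr = MatSetUp(*A);CHKERRQ(ierr);
      ierr = MatGetType(*A, &type);CHKERRQ(ierr);
      ierr = PetscPrintf(comm, "matrix type resolved to %s\n", type);CHKERRQ(ierr);
      return 0;
    }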
>>> On 10/13/21 10:18 AM, Junchao Zhang wrote:
>>> On Tue, Oct 12, 2021 at 1:07 PM Mark Adams <mfad...@lbl.gov> wrote:
>>> On Tue, Oct 12, 2021 at 1:45 PM Chang Liu <c...@pppl.gov> wrote:
>>> >> Hi Mark,
>>> >> The option I use is like
>>> >> -pc_type bjacobi -pc_bjacobi_blocks 16 -ksp_type fgmres -mat_type
>>> >> aijcusparse -sub_pc_factor_mat_solver_type cusparse -sub_ksp_type
>>> >> preonly -sub_pc_type lu -ksp_max_it 2000 -ksp_rtol 1.e-300
>>> >> -ksp_atol 1.e-300
>>> >
>>> > Note, if you use -log_view the last column (the rows are the methods,
>>> > like MatFactorNumeric) has the percent of the work done on the GPU.
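A rough C counterpart of the -pc_bjacobi_blocks/-sub_* options in that command
line, for configuring the sub-solvers programmatically. It assumes the KSP
already has its operators set, and the value 16 simply mirrors
-pc_bjacobi_blocks 16:

    #include <petscksp.h>

    /* Sketch: after the bjacobi PC is set up, each local block has its own
       KSP/PC that can be configured directly, as the -sub_* options do. */
    PetscErrorCode ConfigureBJacobiSubSolvers(KSP ksp)
    {
      PC             pc, subpc;
      KSP           *subksp;
      PetscInt       nlocal, first, i;
      PetscErrorCode ierr;

      ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
      ierr = PCSetType(pc, PCBJACOBI);CHKERRQ(ierr);
      ierr = PCBJacobiSetTotalBlocks(pc, 16, NULL);CHKERRQ(ierr);
      ierr = KSPSetUp(ksp);CHKERRQ(ierr);            /* the blocks exist only after setup */
      ierr = PCBJacobiGetSubKSP(pc, &nlocal, &first, &subksp);CHKERRQ(ierr);
      for (i = 0; i < nlocal; i++) {
        /* same as -sub_ksp_type preonly -sub_pc_type lu
                   -sub_pc_factor_mat_solver_type cusparse */
        ierr = KSPSetType(subksp[i], KSPPREONLY);CHKERRQ(ierr);
        ierr = KSPGetPC(subksp[i], &subpc);CHKERRQ(ierr);
        ierr = PCSetType(subpc, PCLU);CHKERRQ(ierr);
        ierr = PCFactorSetMatSolverType(subpc, MATSOLVERCUSPARSE);CHKERRQ(ierr);
      }
      return 0;
    }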
>>> > Junchao: *this* implies that we have a cuSparse LU factorization. Is
>>> > that correct? (I don't think we do.)
>>>
>>> No, we don't have cuSparse LU factorization. If you check
>>> MatLUFactorSymbolic_SeqAIJCUSPARSE(), you will find it calls
>>> MatLUFactorSymbolic_SeqAIJ() instead.
>>> So I don't understand Chang's idea. Do you want to make bigger blocks?
>>>
>>> >> I think this one does both factorization and solve on the GPU.
>>> >> You can check the runex72_aijcusparse.sh file in the petsc install
>>> >> directory and try it yourself (this is only the LU factorization,
>>> >> without an iterative solve).
>>> >> Chang
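Putting Junchao's description into a sketch: a direct solve of one sequential
AIJCUSPARSE block through the "cusparse" solver package, where the LU
factorization itself still runs on the CPU and the triangular solves run on
the GPU. Aseq, b, and x are assumed to live on PETSC_COMM_SELF, with Aseq of
type MATSEQAIJCUSPARSE; the query at the end just reports which package
provided the factorization:

    #include <petscksp.h>

    /* Illustrative sketch, not PETSc internals. */
    PetscErrorCode DirectSolveBlockWithCusparse(Mat Aseq, Vec b, Vec x)
    {
      KSP            ksp;
      PC             pc;
      Mat            F;
      MatSolverType  stype;
      PetscErrorCode ierr;

      ierr = KSPCreate(PETSC_COMM_SELF, &ksp);CHKERRQ(ierr);
      ierr = KSPSetOperators(ksp, Aseq, Aseq);CHKERRQ(ierr);
      ierr = KSPSetType(ksp, KSPPREONLY);CHKERRQ(ierr);
      ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
      ierr = PCSetType(pc, PCLU);CHKERRQ(ierr);
      ierr = PCFactorSetMatSolverType(pc, MATSOLVERCUSPARSE);CHKERRQ(ierr);
      ierr = KSPSetUp(ksp);CHKERRQ(ierr);
      ierr = PCFactorGetMatrix(pc, &F);CHKERRQ(ierr);          /* the factored matrix */
      ierr = MatFactorGetSolverType(F, &stype);CHKERRQ(ierr);
      ierr = PetscPrintf(PETSC_COMM_SELF, "factor package: %s\n", stype);CHKERRQ(ierr);
      ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
      ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
      return 0;
    }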
>>> On 10/12/21 1:17 PM, Mark Adams wrote:
>>> On Tue, Oct 12, 2021 at 11:19 AM Chang Liu <c...@pppl.gov> wrote:
>>> > Hi Junchao,
>>> > No, I only need it to be transferred within a node. I use the
>>> > block-Jacobi method and GMRES to solve the sparse matrix, so each direct
>>> > solver will take care of a sub-block of the whole matrix. In this way, I
>>> > can use one GPU to solve one sub-block, which is stored within one node.
>>> > It was stated in the documentation that the cusparse solver is slow.
>>> > However, in my test using ex72.c, the cusparse solver is faster than
>>> > mumps or superlu_dist on CPUs.
>>> Are we talking about the factorization, the solve, or both?
>>> We do not have an interface to cuSparse's LU factorization (I just learned
>>> that it exists a few weeks ago).
>>> Perhaps your fast "cusparse solver" is '-pc_type lu -mat_type aijcusparse'?
>>> This would be the CPU factorization, which is the dominant cost.
>>> > Chang
>>>
>>> On 10/12/21 10:24 AM, Junchao Zhang wrote:
>>> Hi, Chang,
>>> For the mumps solver, we usually transfer matrix and vector data within a
>>> compute node. For the idea you propose, it looks like we need to gather
>>> data within MPI_COMM_WORLD, right?
>>> Mark, I remember you said the cusparse solve is slow and you would rather
>>> do it on the CPU. Is that right?
>>> --Junchao Zhang
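The "within a compute node" versus MPI_COMM_WORLD distinction Junchao raises
can be made concrete with MPI's shared-memory split; a sketch (gathering one
block per GPU would instead need sub-communicators chosen per block, not per
node):

    #include <petscsys.h>

    /* Sketch: build one sub-communicator per shared-memory node, which is
       the scope the MUMPS OpenMP path gathers data over. */
    PetscErrorCode BuildNodeLocalComm(MPI_Comm *nodecomm)
    {
      PetscMPIInt rank;

      MPI_Comm_rank(PETSC_COMM_WORLD, &rank);
      /* all ranks that can share memory (same node) land in the same nodecomm */
      MPI_Comm_split_type(PETSC_COMM_WORLD, MPI_COMM_TYPE_SHARED, rank,
                          MPI_INFO_NULL, nodecomm);
      return 0;
    }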
>>> On Mon, Oct 11, 2021 at 10:25 PM Chang Liu via petsc-users <petsc-users@mcs.anl.gov> wrote:
>>> Hi,
>>> Currently, it is possible to use the mumps solver in PETSc with the
>>> -mat_mumps_use_omp_threads option, so that multiple MPI processes will
>>> transfer the matrix and rhs data to the master rank, and then the master
>>> rank will call mumps with OpenMP to solve the matrix.
>>> I wonder if someone can develop a similar option for the cusparse solver.
>>> Right now, this solver does not work with mpiaijcusparse. I think a
>>> possible workaround is to transfer all the matrix data to one MPI process,
>>> and then upload the data to the GPU to solve. In this way, one can use the
>>> cusparse solver for an MPI program.
>>> Chang
>>> --
>>> Chang Liu
>>> Staff Research Physicist
>>> +1 609 243 3438
>>> c...@pppl.gov
>>> Princeton Plasma Physics Laboratory
>>> 100 Stellarator Rd, Princeton NJ 08540, USA
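A sketch of the "upload the data to GPU" step in the workaround Chang proposes:
once a block has been gathered onto one rank as a plain SEQAIJ matrix,
converting it in place to SEQAIJCUSPARSE lets the subsequent solves go through
cusparse. seqBlock and the function name are assumptions for illustration:

    #include <petscmat.h>

    /* Sketch: in-place conversion so that the factorization/solve that
       follows uses the cusparse-backed sequential type. */
    PetscErrorCode UploadBlockToGPU(Mat *seqBlock)
    {
      PetscErrorCode ierr;

      ierr = MatConvert(*seqBlock, MATSEQAIJCUSPARSE, MAT_INPLACE_MATRIX, seqBlock);CHKERRQ(ierr);
      return 0;
    }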
> --
> Chang Liu
> Staff Research Physicist
> +1 609 243 3438
> c...@pppl.gov
> Princeton Plasma Physics Laboratory
> 100 Stellarator Rd, Princeton NJ 08540, USA