Hmm. A fix should work (almost exactly the same) with or without the block Jacobi on the subdomain level; I had assumed that Junchao's branch would handle this. Have you looked at it?
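The configuration being debated (PCBJACOBI on the outside, with PCTELESCOPE and a direct solver inside each block, as in the command lines quoted below) maps onto KSP/PC objects roughly as in the following sketch. It mirrors the PCBJacobiGetSubKSP pattern from the KSP tutorials; matrix assembly is omitted, and the call sequence is illustrative only, not code taken from ex7 or from the MR.

  #include <petscksp.h>

  int main(int argc, char **argv)
  {
    KSP            ksp, *subksp;
    PC             pc, subpc;
    Mat            A;                /* assembled elsewhere, as in the ex7 example */
    Vec            x, b;
    PetscInt       nlocal, first;
    PetscErrorCode ierr;

    ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
    /* ... create and assemble A, b, x here ... */
    ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
    ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
    ierr = KSPSetType(ksp, KSPFGMRES);CHKERRQ(ierr);            /* -ksp_type fgmres */
    ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
    ierr = PCSetType(pc, PCBJACOBI);CHKERRQ(ierr);              /* -pc_type bjacobi */
    ierr = PCBJacobiSetTotalBlocks(pc, 4, NULL);CHKERRQ(ierr);  /* -pc_bjacobi_blocks 4 */
    ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);                /* picks up -sub_* and -sub_telescope_* options */
    ierr = KSPSetUp(ksp);CHKERRQ(ierr);                         /* sub-KSPs exist only after setup */

    /* Each block lives on its own sub-communicator; its PC can be PCTELESCOPE,
       which gathers the block onto fewer ranks before the inner LU solve. */
    ierr = PCBJacobiGetSubKSP(pc, &nlocal, &first, &subksp);CHKERRQ(ierr);
    ierr = KSPGetPC(subksp[0], &subpc);CHKERRQ(ierr);
    ierr = PCSetType(subpc, PCTELESCOPE);CHKERRQ(ierr);             /* -sub_pc_type telescope */
    ierr = PCTelescopeSetReductionFactor(subpc, 8);CHKERRQ(ierr);   /* -sub_pc_telescope_reduction_factor 8 */

    ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
    ierr = PetscFinalize();
    return ierr;
  }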
  Barry

> On Oct 20, 2021, at 6:14 PM, Chang Liu <c...@pppl.gov> wrote:
>
> Hi Barry,
>
> Wait, by "branch" are you talking about the MR Junchao submitted?
>
> That fix (proposed by me) only addresses the issue of getting telescope to work on mpiaijcusparse when used outside bjacobi. It has nothing to do with the issue of telescope inside bjacobi. It does not help in my tests.
>
> If my emails made you think otherwise, I apologize for that.
>
> Regards,
>
> Chang
>
> On 10/20/21 4:40 PM, Barry Smith wrote:
>> Yes, but the branch can be used to do the telescoping inside the bjacobi as needed.
>>
>>> On Oct 20, 2021, at 2:59 PM, Junchao Zhang <junchao.zh...@gmail.com> wrote:
>>>
>>> The MR https://gitlab.com/petsc/petsc/-/merge_requests/4471 has not been merged yet.
>>>
>>> --Junchao Zhang
>>>
>>> On Wed, Oct 20, 2021 at 1:47 PM Chang Liu via petsc-users <petsc-users@mcs.anl.gov> wrote:
>>>
>>> Hi Barry,
>>>
>>> Are the fixes merged into master? I was using bjacobi as a preconditioner. Using the latest version of PETSc, I found that by calling
>>>
>>> mpiexec -n 32 --oversubscribe ./ex7 -m 1000 -ksp_view -ksp_monitor_true_residual -ksp_type fgmres -pc_type bjacobi -pc_bjacobi_blocks 4 -sub_ksp_type preonly -sub_pc_type telescope -sub_pc_telescope_reduction_factor 8 -sub_pc_telescope_subcomm_type contiguous -sub_telescope_pc_type lu -sub_telescope_ksp_type preonly -sub_telescope_pc_factor_mat_solver_type mumps -ksp_max_it 2000 -ksp_rtol 1.e-30 -ksp_atol 1.e-30
>>>
>>> the code is calling PCApply_BJacobi_Multiproc. If I use
>>>
>>> mpiexec -n 32 --oversubscribe ./ex7 -m 1000 -ksp_view -ksp_monitor_true_residual -telescope_ksp_monitor_true_residual -ksp_type preonly -pc_type telescope -pc_telescope_reduction_factor 8 -pc_telescope_subcomm_type contiguous -telescope_pc_type bjacobi -telescope_ksp_type fgmres -telescope_pc_bjacobi_blocks 4 -telescope_sub_ksp_type preonly -telescope_sub_pc_type lu -telescope_sub_pc_factor_mat_solver_type mumps -telescope_ksp_max_it 2000 -telescope_ksp_rtol 1.e-30 -telescope_ksp_atol 1.e-30
>>>
>>> the code is calling PCApply_BJacobi_Singleblock. You can test it yourself.
>>>
>>> Regards,
>>>
>>> Chang
>>>
>>> On 10/20/21 1:14 PM, Barry Smith wrote:
>>> >
>>> >> On Oct 20, 2021, at 12:48 PM, Chang Liu <c...@pppl.gov> wrote:
>>> >>
>>> >> Hi Pierre,
>>> >>
>>> >> I have another suggestion for telescope. I have achieved my goal by putting telescope outside bjacobi. But the code still does not work if I use telescope as the PC for a subblock. I think the reason is that I want to use cusparse as the solver, which can only deal with a seqaij matrix and not an mpiaij matrix.
>>> >
>>> >    This is supposed to work with the recent fixes. The telescope should produce a seq matrix and, for each solve, map the parallel vector (over the subdomain) automatically down to the one rank with the GPU to solve it on the GPU. It is not clear to me where the process is going wrong.
>>> >
>>> >    Barry
>>> >
>>> >> However, the telescope PC can put the matrix onto one MPI rank, thus making it a seqaij for the factorization stage, but after factorization it gives the data back to the original communicator. This makes the matrix mpiaij again, and then cusparse cannot solve it.
>>> >>
>>> >> I think a better option is to do the factorization on the CPU with mpiaij, and then transform the preconditioner matrix to seqaij and do the MatSolve on the GPU. But I am not sure whether that can be achieved using telescope.
>>> >>
>>> >> Regards,
>>> >>
>>> >> Chang
>>> >>
>>> >> On 10/15/21 5:29 AM, Pierre Jolivet wrote:
>>> >>> Hi Chang,
>>> >>> The output you sent with MUMPS looks alright to me; you can see that the MatType is properly set to seqaijcusparse (and not mpiaijcusparse).
>>> >>> I don’t know what is wrong with -sub_telescope_pc_factor_mat_solver_type cusparse. I don’t have a PETSc installation for testing this; hopefully Barry or Junchao can confirm this wrong behavior and get it fixed.
>>> >>> As for permuting PCTELESCOPE and PCBJACOBI, in your case the outer PC will be equivalent, yes.
>>> >>> However, it would be more efficient to do PCBJACOBI and then PCTELESCOPE.
>>> >>> PCBJACOBI prunes the operator by basically removing all coefficients outside of the diagonal blocks.
>>> >>> Then, PCTELESCOPE "groups everything together”.
>>> >>> If you do it the other way around, PCTELESCOPE will “group everything together” and then PCBJACOBI will prune the operator.
>>> >>> So the PCTELESCOPE SetUp will be costly for nothing, since some coefficients will be thrown out afterwards in the PCBJACOBI SetUp.
>>> >>> I hope I’m clear enough; otherwise I can try to draw some pictures.
>>> >>> Thanks,
>>> >>> Pierre
>>> >>>> On 15 Oct 2021, at 4:39 AM, Chang Liu <c...@pppl.gov> wrote:
>>> >>>>
>>> >>>> Hi Pierre and Barry,
>>> >>>>
>>> >>>> I think maybe I should use telescope outside bjacobi, like this:
>>> >>>>
>>> >>>> mpiexec -n 16 --hostfile hostfile --oversubscribe ./ex7 -m 400 -ksp_view -ksp_monitor_true_residual -pc_type telescope -pc_telescope_reduction_factor 4 -telescope_pc_type bjacobi -telescope_ksp_type fgmres -telescope_pc_bjacobi_blocks 4 -mat_type aijcusparse -telescope_sub_ksp_type preonly -telescope_sub_pc_type lu -telescope_sub_pc_factor_mat_solver_type cusparse -ksp_max_it 2000 -ksp_rtol 1.e-20 -ksp_atol 1.e-9
>>> >>>>
>>> >>>> But then I got the error
>>> >>>>
>>> >>>> [0]PETSC ERROR: MatSolverType cusparse does not support matrix type seqaij
>>> >>>>
>>> >>>> But the mat type should be aijcusparse. I think telescope changes the mat type.
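A concrete picture of the conversion Chang describes above (keep mpiaij for the parallel stages, but hand cusparse a sequential aijcusparse block for the solves) is sketched below. The helper name is hypothetical and PCTELESCOPE may not expose a hook for it; the point is only to show the MatConvert call such a transformation would involve.

  #include <petscmat.h>

  /* Hypothetical helper: convert a gathered sequential block to the cuSPARSE
     format so that a cusparse factorization/solve can accept it.  Whether
     PCTELESCOPE offers a place to hook this in is exactly the open question
     in this thread; only the conversion call itself is shown. */
  static PetscErrorCode ConvertBlockToCusparse(Mat *B)
  {
    PetscErrorCode ierr;
    PetscMPIInt    size;

    PetscFunctionBeginUser;
    ierr = MPI_Comm_size(PetscObjectComm((PetscObject)*B), &size);CHKERRQ(ierr);
    if (size == 1) {  /* only a sequential block can become MATSEQAIJCUSPARSE */
      ierr = MatConvert(*B, MATSEQAIJCUSPARSE, MAT_INPLACE_MATRIX, B);CHKERRQ(ierr);
    }
    PetscFunctionReturn(0);
  }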
>>> >>>> >>> >>>> Chang >>> >>>> >>> >>>> On 10/14/21 10:11 PM, Chang Liu wrote: >>> >>>>> For comparison, here is the output using mumps instead of >>> cusparse >>> >>>>> $ mpiexec -n 16 --hostfile hostfile --oversubscribe ./ex7 -m >>> 400 -ksp_view -ksp_monitor_true_residual -pc_type bjacobi >>> -pc_bjacobi_blocks 4 -ksp_type fgmres -mat_type aijcusparse >>> -sub_pc_type telescope -sub_ksp_type preonly >>> -sub_telescope_ksp_type preonly -sub_telescope_pc_type lu >>> -sub_telescope_pc_factor_mat_solver_type mumps >>> -sub_pc_telescope_reduction_factor 4 >>> -sub_pc_telescope_subcomm_type contiguous -ksp_max_it 2000 >>> -ksp_rtol 1.e-20 -ksp_atol 1.e-9 >>> >>>>> 0 KSP unpreconditioned resid norm 4.014971979977e+01 true >>> resid norm 4.014971979977e+01 ||r(i)||/||b|| 1.000000000000e+00 >>> >>>>> 1 KSP unpreconditioned resid norm 2.439995191694e+00 true >>> resid norm 2.439995191694e+00 ||r(i)||/||b|| 6.077240896978e-02 >>> >>>>> 2 KSP unpreconditioned resid norm 1.280694102588e+00 true >>> resid norm 1.280694102588e+00 ||r(i)||/||b|| 3.189795866509e-02 >>> >>>>> 3 KSP unpreconditioned resid norm 1.041100266810e+00 true >>> resid norm 1.041100266810e+00 ||r(i)||/||b|| 2.593044912896e-02 >>> >>>>> 4 KSP unpreconditioned resid norm 7.274347137268e-01 true >>> resid norm 7.274347137268e-01 ||r(i)||/||b|| 1.811805206499e-02 >>> >>>>> 5 KSP unpreconditioned resid norm 5.429229329787e-01 true >>> resid norm 5.429229329787e-01 ||r(i)||/||b|| 1.352245882876e-02 >>> >>>>> 6 KSP unpreconditioned resid norm 4.332970410353e-01 true >>> resid norm 4.332970410353e-01 ||r(i)||/||b|| 1.079203150598e-02 >>> >>>>> 7 KSP unpreconditioned resid norm 3.948206050950e-01 true >>> resid norm 3.948206050950e-01 ||r(i)||/||b|| 9.833707609019e-03 >>> >>>>> 8 KSP unpreconditioned resid norm 3.379580577269e-01 true >>> resid norm 3.379580577269e-01 ||r(i)||/||b|| 8.417444988714e-03 >>> >>>>> 9 KSP unpreconditioned resid norm 2.875593971410e-01 true >>> resid norm 2.875593971410e-01 ||r(i)||/||b|| 7.162176936105e-03 >>> >>>>> 10 KSP unpreconditioned resid norm 2.533983363244e-01 true >>> resid norm 2.533983363244e-01 ||r(i)||/||b|| 6.311335112378e-03 >>> >>>>> 11 KSP unpreconditioned resid norm 2.389169921094e-01 true >>> resid norm 2.389169921094e-01 ||r(i)||/||b|| 5.950651543793e-03 >>> >>>>> 12 KSP unpreconditioned resid norm 2.118961639089e-01 true >>> resid norm 2.118961639089e-01 ||r(i)||/||b|| 5.277649880637e-03 >>> >>>>> 13 KSP unpreconditioned resid norm 1.885892030223e-01 true >>> resid norm 1.885892030223e-01 ||r(i)||/||b|| 4.697148671593e-03 >>> >>>>> 14 KSP unpreconditioned resid norm 1.763510666948e-01 true >>> resid norm 1.763510666948e-01 ||r(i)||/||b|| 4.392336175055e-03 >>> >>>>> 15 KSP unpreconditioned resid norm 1.638219366731e-01 true >>> resid norm 1.638219366731e-01 ||r(i)||/||b|| 4.080275964317e-03 >>> >>>>> 16 KSP unpreconditioned resid norm 1.476792766432e-01 true >>> resid norm 1.476792766432e-01 ||r(i)||/||b|| 3.678214378076e-03 >>> >>>>> 17 KSP unpreconditioned resid norm 1.349906937321e-01 true >>> resid norm 1.349906937321e-01 ||r(i)||/||b|| 3.362182710248e-03 >>> >>>>> 18 KSP unpreconditioned resid norm 1.289673236836e-01 true >>> resid norm 1.289673236836e-01 ||r(i)||/||b|| 3.212159993314e-03 >>> >>>>> 19 KSP unpreconditioned resid norm 1.167505658153e-01 true >>> resid norm 1.167505658153e-01 ||r(i)||/||b|| 2.907879965230e-03 >>> >>>>> 20 KSP unpreconditioned resid norm 1.046037988999e-01 true >>> resid norm 1.046037988999e-01 ||r(i)||/||b|| 2.605343185995e-03 >>> >>>>> 21 KSP 
unpreconditioned resid norm 9.832660514331e-02 true >>> resid norm 9.832660514331e-02 ||r(i)||/||b|| 2.448998539309e-03 >>> >>>>> 22 KSP unpreconditioned resid norm 8.835618950141e-02 true >>> resid norm 8.835618950142e-02 ||r(i)||/||b|| 2.200667649539e-03 >>> >>>>> 23 KSP unpreconditioned resid norm 7.563496650115e-02 true >>> resid norm 7.563496650116e-02 ||r(i)||/||b|| 1.883823022386e-03 >>> >>>>> 24 KSP unpreconditioned resid norm 6.651291376834e-02 true >>> resid norm 6.651291376834e-02 ||r(i)||/||b|| 1.656622115921e-03 >>> >>>>> 25 KSP unpreconditioned resid norm 5.890393227906e-02 true >>> resid norm 5.890393227906e-02 ||r(i)||/||b|| 1.467106933070e-03 >>> >>>>> 26 KSP unpreconditioned resid norm 4.661992782780e-02 true >>> resid norm 4.661992782780e-02 ||r(i)||/||b|| 1.161152009536e-03 >>> >>>>> 27 KSP unpreconditioned resid norm 3.690705358716e-02 true >>> resid norm 3.690705358716e-02 ||r(i)||/||b|| 9.192356452602e-04 >>> >>>>> 28 KSP unpreconditioned resid norm 3.209680460188e-02 true >>> resid norm 3.209680460188e-02 ||r(i)||/||b|| 7.994278605666e-04 >>> >>>>> 29 KSP unpreconditioned resid norm 2.354337626000e-02 true >>> resid norm 2.354337626001e-02 ||r(i)||/||b|| 5.863895533373e-04 >>> >>>>> 30 KSP unpreconditioned resid norm 1.701296561785e-02 true >>> resid norm 1.701296561785e-02 ||r(i)||/||b|| 4.237380908932e-04 >>> >>>>> 31 KSP unpreconditioned resid norm 1.509942937258e-02 true >>> resid norm 1.509942937258e-02 ||r(i)||/||b|| 3.760780759588e-04 >>> >>>>> 32 KSP unpreconditioned resid norm 1.258274688515e-02 true >>> resid norm 1.258274688515e-02 ||r(i)||/||b|| 3.133956338402e-04 >>> >>>>> 33 KSP unpreconditioned resid norm 9.805748771638e-03 true >>> resid norm 9.805748771638e-03 ||r(i)||/||b|| 2.442295692359e-04 >>> >>>>> 34 KSP unpreconditioned resid norm 8.596552678160e-03 true >>> resid norm 8.596552678160e-03 ||r(i)||/||b|| 2.141123953301e-04 >>> >>>>> 35 KSP unpreconditioned resid norm 6.936406707500e-03 true >>> resid norm 6.936406707500e-03 ||r(i)||/||b|| 1.727635147167e-04 >>> >>>>> 36 KSP unpreconditioned resid norm 5.533741607932e-03 true >>> resid norm 5.533741607932e-03 ||r(i)||/||b|| 1.378276519869e-04 >>> >>>>> 37 KSP unpreconditioned resid norm 4.982347757923e-03 true >>> resid norm 4.982347757923e-03 ||r(i)||/||b|| 1.240942099414e-04 >>> >>>>> 38 KSP unpreconditioned resid norm 4.309608348059e-03 true >>> resid norm 4.309608348059e-03 ||r(i)||/||b|| 1.073384414524e-04 >>> >>>>> 39 KSP unpreconditioned resid norm 3.729408303186e-03 true >>> resid norm 3.729408303185e-03 ||r(i)||/||b|| 9.288753001974e-05 >>> >>>>> 40 KSP unpreconditioned resid norm 3.490003351128e-03 true >>> resid norm 3.490003351128e-03 ||r(i)||/||b|| 8.692472496776e-05 >>> >>>>> 41 KSP unpreconditioned resid norm 3.069012426454e-03 true >>> resid norm 3.069012426453e-03 ||r(i)||/||b|| 7.643919912166e-05 >>> >>>>> 42 KSP unpreconditioned resid norm 2.772928845284e-03 true >>> resid norm 2.772928845284e-03 ||r(i)||/||b|| 6.906471225983e-05 >>> >>>>> 43 KSP unpreconditioned resid norm 2.561454192399e-03 true >>> resid norm 2.561454192398e-03 ||r(i)||/||b|| 6.379756085902e-05 >>> >>>>> 44 KSP unpreconditioned resid norm 2.253662762802e-03 true >>> resid norm 2.253662762802e-03 ||r(i)||/||b|| 5.613146926159e-05 >>> >>>>> 45 KSP unpreconditioned resid norm 2.086800523919e-03 true >>> resid norm 2.086800523919e-03 ||r(i)||/||b|| 5.197546917701e-05 >>> >>>>> 46 KSP unpreconditioned resid norm 1.926028182896e-03 true >>> resid norm 1.926028182896e-03 ||r(i)||/||b|| 4.797114880257e-05 >>> >>>>> 
47 KSP unpreconditioned resid norm 1.769243808622e-03 true >>> resid norm 1.769243808622e-03 ||r(i)||/||b|| 4.406615581492e-05 >>> >>>>> 48 KSP unpreconditioned resid norm 1.656654905964e-03 true >>> resid norm 1.656654905964e-03 ||r(i)||/||b|| 4.126192945371e-05 >>> >>>>> 49 KSP unpreconditioned resid norm 1.572052627273e-03 true >>> resid norm 1.572052627273e-03 ||r(i)||/||b|| 3.915475961260e-05 >>> >>>>> 50 KSP unpreconditioned resid norm 1.454960682355e-03 true >>> resid norm 1.454960682355e-03 ||r(i)||/||b|| 3.623837699518e-05 >>> >>>>> 51 KSP unpreconditioned resid norm 1.375985053014e-03 true >>> resid norm 1.375985053014e-03 ||r(i)||/||b|| 3.427134883820e-05 >>> >>>>> 52 KSP unpreconditioned resid norm 1.269325501087e-03 true >>> resid norm 1.269325501087e-03 ||r(i)||/||b|| 3.161480347603e-05 >>> >>>>> 53 KSP unpreconditioned resid norm 1.184791772965e-03 true >>> resid norm 1.184791772965e-03 ||r(i)||/||b|| 2.950934100844e-05 >>> >>>>> 54 KSP unpreconditioned resid norm 1.064535156080e-03 true >>> resid norm 1.064535156080e-03 ||r(i)||/||b|| 2.651413662135e-05 >>> >>>>> 55 KSP unpreconditioned resid norm 9.639036688120e-04 true >>> resid norm 9.639036688117e-04 ||r(i)||/||b|| 2.400773090370e-05 >>> >>>>> 56 KSP unpreconditioned resid norm 8.632359780260e-04 true >>> resid norm 8.632359780260e-04 ||r(i)||/||b|| 2.150042347322e-05 >>> >>>>> 57 KSP unpreconditioned resid norm 7.613605783850e-04 true >>> resid norm 7.613605783850e-04 ||r(i)||/||b|| 1.896303591113e-05 >>> >>>>> 58 KSP unpreconditioned resid norm 6.681073248348e-04 true >>> resid norm 6.681073248349e-04 ||r(i)||/||b|| 1.664039819373e-05 >>> >>>>> 59 KSP unpreconditioned resid norm 5.656127908544e-04 true >>> resid norm 5.656127908545e-04 ||r(i)||/||b|| 1.408758999254e-05 >>> >>>>> 60 KSP unpreconditioned resid norm 4.850863370767e-04 true >>> resid norm 4.850863370767e-04 ||r(i)||/||b|| 1.208193580169e-05 >>> >>>>> 61 KSP unpreconditioned resid norm 4.374055762320e-04 true >>> resid norm 4.374055762316e-04 ||r(i)||/||b|| 1.089436186387e-05 >>> >>>>> 62 KSP unpreconditioned resid norm 3.874398257079e-04 true >>> resid norm 3.874398257077e-04 ||r(i)||/||b|| 9.649876204364e-06 >>> >>>>> 63 KSP unpreconditioned resid norm 3.364908694427e-04 true >>> resid norm 3.364908694429e-04 ||r(i)||/||b|| 8.380902061609e-06 >>> >>>>> 64 KSP unpreconditioned resid norm 2.961034697265e-04 true >>> resid norm 2.961034697268e-04 ||r(i)||/||b|| 7.374982221632e-06 >>> >>>>> 65 KSP unpreconditioned resid norm 2.640593092764e-04 true >>> resid norm 2.640593092767e-04 ||r(i)||/||b|| 6.576865557059e-06 >>> >>>>> 66 KSP unpreconditioned resid norm 2.423231125743e-04 true >>> resid norm 2.423231125745e-04 ||r(i)||/||b|| 6.035487016671e-06 >>> >>>>> 67 KSP unpreconditioned resid norm 2.182349471179e-04 true >>> resid norm 2.182349471179e-04 ||r(i)||/||b|| 5.435528521898e-06 >>> >>>>> 68 KSP unpreconditioned resid norm 2.008438265031e-04 true >>> resid norm 2.008438265028e-04 ||r(i)||/||b|| 5.002371809927e-06 >>> >>>>> 69 KSP unpreconditioned resid norm 1.838732863386e-04 true >>> resid norm 1.838732863388e-04 ||r(i)||/||b|| 4.579690400226e-06 >>> >>>>> 70 KSP unpreconditioned resid norm 1.723786027645e-04 true >>> resid norm 1.723786027645e-04 ||r(i)||/||b|| 4.293394913444e-06 >>> >>>>> 71 KSP unpreconditioned resid norm 1.580945192204e-04 true >>> resid norm 1.580945192205e-04 ||r(i)||/||b|| 3.937624471826e-06 >>> >>>>> 72 KSP unpreconditioned resid norm 1.476687469671e-04 true >>> resid norm 1.476687469671e-04 ||r(i)||/||b|| 3.677952117812e-06 
>>> >>>>> 73 KSP unpreconditioned resid norm 1.385018526182e-04 true >>> resid norm 1.385018526184e-04 ||r(i)||/||b|| 3.449634351350e-06 >>> >>>>> 74 KSP unpreconditioned resid norm 1.279712893541e-04 true >>> resid norm 1.279712893541e-04 ||r(i)||/||b|| 3.187351991305e-06 >>> >>>>> 75 KSP unpreconditioned resid norm 1.202010411772e-04 true >>> resid norm 1.202010411774e-04 ||r(i)||/||b|| 2.993820175504e-06 >>> >>>>> 76 KSP unpreconditioned resid norm 1.113459414198e-04 true >>> resid norm 1.113459414200e-04 ||r(i)||/||b|| 2.773268206485e-06 >>> >>>>> 77 KSP unpreconditioned resid norm 1.042523036036e-04 true >>> resid norm 1.042523036037e-04 ||r(i)||/||b|| 2.596588572066e-06 >>> >>>>> 78 KSP unpreconditioned resid norm 9.565176453232e-05 true >>> resid norm 9.565176453227e-05 ||r(i)||/||b|| 2.382376888539e-06 >>> >>>>> 79 KSP unpreconditioned resid norm 8.896901670359e-05 true >>> resid norm 8.896901670365e-05 ||r(i)||/||b|| 2.215931198209e-06 >>> >>>>> 80 KSP unpreconditioned resid norm 8.119298425803e-05 true >>> resid norm 8.119298425824e-05 ||r(i)||/||b|| 2.022255314935e-06 >>> >>>>> 81 KSP unpreconditioned resid norm 7.544528309154e-05 true >>> resid norm 7.544528309154e-05 ||r(i)||/||b|| 1.879098620558e-06 >>> >>>>> 82 KSP unpreconditioned resid norm 6.755385041138e-05 true >>> resid norm 6.755385041176e-05 ||r(i)||/||b|| 1.682548489719e-06 >>> >>>>> 83 KSP unpreconditioned resid norm 6.158629300870e-05 true >>> resid norm 6.158629300835e-05 ||r(i)||/||b|| 1.533915885727e-06 >>> >>>>> 84 KSP unpreconditioned resid norm 5.358756885754e-05 true >>> resid norm 5.358756885765e-05 ||r(i)||/||b|| 1.334693470462e-06 >>> >>>>> 85 KSP unpreconditioned resid norm 4.774852370380e-05 true >>> resid norm 4.774852370387e-05 ||r(i)||/||b|| 1.189261692037e-06 >>> >>>>> 86 KSP unpreconditioned resid norm 3.919358737908e-05 true >>> resid norm 3.919358737930e-05 ||r(i)||/||b|| 9.761858258229e-07 >>> >>>>> 87 KSP unpreconditioned resid norm 3.434042319950e-05 true >>> resid norm 3.434042319947e-05 ||r(i)||/||b|| 8.553091620745e-07 >>> >>>>> 88 KSP unpreconditioned resid norm 2.813699436281e-05 true >>> resid norm 2.813699436302e-05 ||r(i)||/||b|| 7.008017615898e-07 >>> >>>>> 89 KSP unpreconditioned resid norm 2.462248069068e-05 true >>> resid norm 2.462248069051e-05 ||r(i)||/||b|| 6.132665635851e-07 >>> >>>>> 90 KSP unpreconditioned resid norm 2.040558789626e-05 true >>> resid norm 2.040558789626e-05 ||r(i)||/||b|| 5.082373674841e-07 >>> >>>>> 91 KSP unpreconditioned resid norm 1.888523204468e-05 true >>> resid norm 1.888523204470e-05 ||r(i)||/||b|| 4.703702077842e-07 >>> >>>>> 92 KSP unpreconditioned resid norm 1.707071292484e-05 true >>> resid norm 1.707071292474e-05 ||r(i)||/||b|| 4.251763900191e-07 >>> >>>>> 93 KSP unpreconditioned resid norm 1.498636454665e-05 true >>> resid norm 1.498636454672e-05 ||r(i)||/||b|| 3.732619958859e-07 >>> >>>>> 94 KSP unpreconditioned resid norm 1.219393542993e-05 true >>> resid norm 1.219393543006e-05 ||r(i)||/||b|| 3.037115947725e-07 >>> >>>>> 95 KSP unpreconditioned resid norm 1.059996963300e-05 true >>> resid norm 1.059996963303e-05 ||r(i)||/||b|| 2.640110487917e-07 >>> >>>>> 96 KSP unpreconditioned resid norm 9.099659872548e-06 true >>> resid norm 9.099659873214e-06 ||r(i)||/||b|| 2.266431725699e-07 >>> >>>>> 97 KSP unpreconditioned resid norm 8.147347587295e-06 true >>> resid norm 8.147347587584e-06 ||r(i)||/||b|| 2.029241456283e-07 >>> >>>>> 98 KSP unpreconditioned resid norm 7.167226146744e-06 true >>> resid norm 7.167226146783e-06 ||r(i)||/||b|| 
1.785124823418e-07 >>> >>>>> 99 KSP unpreconditioned resid norm 6.552540209538e-06 true >>> resid norm 6.552540209577e-06 ||r(i)||/||b|| 1.632026385802e-07 >>> >>>>> 100 KSP unpreconditioned resid norm 5.767783600111e-06 true >>> resid norm 5.767783600320e-06 ||r(i)||/||b|| 1.436568830140e-07 >>> >>>>> 101 KSP unpreconditioned resid norm 5.261057430584e-06 true >>> resid norm 5.261057431144e-06 ||r(i)||/||b|| 1.310359688033e-07 >>> >>>>> 102 KSP unpreconditioned resid norm 4.715498525786e-06 true >>> resid norm 4.715498525947e-06 ||r(i)||/||b|| 1.174478564100e-07 >>> >>>>> 103 KSP unpreconditioned resid norm 4.380052669622e-06 true >>> resid norm 4.380052669825e-06 ||r(i)||/||b|| 1.090929822591e-07 >>> >>>>> 104 KSP unpreconditioned resid norm 3.911664470060e-06 true >>> resid norm 3.911664470226e-06 ||r(i)||/||b|| 9.742694319496e-08 >>> >>>>> 105 KSP unpreconditioned resid norm 3.652211458315e-06 true >>> resid norm 3.652211458259e-06 ||r(i)||/||b|| 9.096480564430e-08 >>> >>>>> 106 KSP unpreconditioned resid norm 3.387532128049e-06 true >>> resid norm 3.387532128358e-06 ||r(i)||/||b|| 8.437249737363e-08 >>> >>>>> 107 KSP unpreconditioned resid norm 3.234218880987e-06 true >>> resid norm 3.234218880798e-06 ||r(i)||/||b|| 8.055395895481e-08 >>> >>>>> 108 KSP unpreconditioned resid norm 3.016905196388e-06 true >>> resid norm 3.016905196492e-06 ||r(i)||/||b|| 7.514137611763e-08 >>> >>>>> 109 KSP unpreconditioned resid norm 2.858246441921e-06 true >>> resid norm 2.858246441975e-06 ||r(i)||/||b|| 7.118969836476e-08 >>> >>>>> 110 KSP unpreconditioned resid norm 2.637118810847e-06 true >>> resid norm 2.637118810750e-06 ||r(i)||/||b|| 6.568212241336e-08 >>> >>>>> 111 KSP unpreconditioned resid norm 2.494976088717e-06 true >>> resid norm 2.494976088700e-06 ||r(i)||/||b|| 6.214180574966e-08 >>> >>>>> 112 KSP unpreconditioned resid norm 2.270639574272e-06 true >>> resid norm 2.270639574200e-06 ||r(i)||/||b|| 5.655430686750e-08 >>> >>>>> 113 KSP unpreconditioned resid norm 2.104988663813e-06 true >>> resid norm 2.104988664169e-06 ||r(i)||/||b|| 5.242847707696e-08 >>> >>>>> 114 KSP unpreconditioned resid norm 1.889361127301e-06 true >>> resid norm 1.889361127526e-06 ||r(i)||/||b|| 4.705789073868e-08 >>> >>>>> 115 KSP unpreconditioned resid norm 1.732367008052e-06 true >>> resid norm 1.732367007971e-06 ||r(i)||/||b|| 4.314767367271e-08 >>> >>>>> 116 KSP unpreconditioned resid norm 1.509288268391e-06 true >>> resid norm 1.509288268645e-06 ||r(i)||/||b|| 3.759150191264e-08 >>> >>>>> 117 KSP unpreconditioned resid norm 1.359169217644e-06 true >>> resid norm 1.359169217445e-06 ||r(i)||/||b|| 3.385252062089e-08 >>> >>>>> 118 KSP unpreconditioned resid norm 1.180146337735e-06 true >>> resid norm 1.180146337908e-06 ||r(i)||/||b|| 2.939363820703e-08 >>> >>>>> 119 KSP unpreconditioned resid norm 1.067757039683e-06 true >>> resid norm 1.067757039924e-06 ||r(i)||/||b|| 2.659438335433e-08 >>> >>>>> 120 KSP unpreconditioned resid norm 9.435833073736e-07 true >>> resid norm 9.435833073736e-07 ||r(i)||/||b|| 2.350161625235e-08 >>> >>>>> 121 KSP unpreconditioned resid norm 8.749457237613e-07 true >>> resid norm 8.749457236791e-07 ||r(i)||/||b|| 2.179207546261e-08 >>> >>>>> 122 KSP unpreconditioned resid norm 7.945760150897e-07 true >>> resid norm 7.945760150444e-07 ||r(i)||/||b|| 1.979032528762e-08 >>> >>>>> 123 KSP unpreconditioned resid norm 7.141240839013e-07 true >>> resid norm 7.141240838682e-07 ||r(i)||/||b|| 1.778652721438e-08 >>> >>>>> 124 KSP unpreconditioned resid norm 6.300566936733e-07 true >>> resid norm 
6.300566936607e-07 ||r(i)||/||b|| 1.569267971988e-08 >>> >>>>> 125 KSP unpreconditioned resid norm 5.628986997544e-07 true >>> resid norm 5.628986995849e-07 ||r(i)||/||b|| 1.401999073448e-08 >>> >>>>> 126 KSP unpreconditioned resid norm 5.119018951602e-07 true >>> resid norm 5.119018951837e-07 ||r(i)||/||b|| 1.274982484900e-08 >>> >>>>> 127 KSP unpreconditioned resid norm 4.664670343748e-07 true >>> resid norm 4.664670344042e-07 ||r(i)||/||b|| 1.161818903670e-08 >>> >>>>> 128 KSP unpreconditioned resid norm 4.253264691112e-07 true >>> resid norm 4.253264691948e-07 ||r(i)||/||b|| 1.059351027394e-08 >>> >>>>> 129 KSP unpreconditioned resid norm 3.868921150516e-07 true >>> resid norm 3.868921150517e-07 ||r(i)||/||b|| 9.636234498800e-09 >>> >>>>> 130 KSP unpreconditioned resid norm 3.558445658540e-07 true >>> resid norm 3.558445660061e-07 ||r(i)||/||b|| 8.862940209315e-09 >>> >>>>> 131 KSP unpreconditioned resid norm 3.268710273840e-07 true >>> resid norm 3.268710272455e-07 ||r(i)||/||b|| 8.141302825416e-09 >>> >>>>> 132 KSP unpreconditioned resid norm 3.041273897592e-07 true >>> resid norm 3.041273896694e-07 ||r(i)||/||b|| 7.574832182794e-09 >>> >>>>> 133 KSP unpreconditioned resid norm 2.851926677922e-07 true >>> resid norm 2.851926674248e-07 ||r(i)||/||b|| 7.103229333782e-09 >>> >>>>> 134 KSP unpreconditioned resid norm 2.694708315072e-07 true >>> resid norm 2.694708309500e-07 ||r(i)||/||b|| 6.711649104748e-09 >>> >>>>> 135 KSP unpreconditioned resid norm 2.534825559099e-07 true >>> resid norm 2.534825557469e-07 ||r(i)||/||b|| 6.313432746507e-09 >>> >>>>> 136 KSP unpreconditioned resid norm 2.387342352458e-07 true >>> resid norm 2.387342351804e-07 ||r(i)||/||b|| 5.946099658254e-09 >>> >>>>> 137 KSP unpreconditioned resid norm 2.200861667617e-07 true >>> resid norm 2.200861665255e-07 ||r(i)||/||b|| 5.481636425438e-09 >>> >>>>> 138 KSP unpreconditioned resid norm 2.051415370616e-07 true >>> resid norm 2.051415370614e-07 ||r(i)||/||b|| 5.109413915824e-09 >>> >>>>> 139 KSP unpreconditioned resid norm 1.887376429396e-07 true >>> resid norm 1.887376426682e-07 ||r(i)||/||b|| 4.700845824315e-09 >>> >>>>> 140 KSP unpreconditioned resid norm 1.729743133005e-07 true >>> resid norm 1.729743128342e-07 ||r(i)||/||b|| 4.308232129561e-09 >>> >>>>> 141 KSP unpreconditioned resid norm 1.541021130781e-07 true >>> resid norm 1.541021128364e-07 ||r(i)||/||b|| 3.838186508023e-09 >>> >>>>> 142 KSP unpreconditioned resid norm 1.384631628565e-07 true >>> resid norm 1.384631627735e-07 ||r(i)||/||b|| 3.448670712125e-09 >>> >>>>> 143 KSP unpreconditioned resid norm 1.223114405626e-07 true >>> resid norm 1.223114403883e-07 ||r(i)||/||b|| 3.046383411846e-09 >>> >>>>> 144 KSP unpreconditioned resid norm 1.087313066223e-07 true >>> resid norm 1.087313065117e-07 ||r(i)||/||b|| 2.708146085550e-09 >>> >>>>> 145 KSP unpreconditioned resid norm 9.181901998734e-08 true >>> resid norm 9.181901984268e-08 ||r(i)||/||b|| 2.286915582489e-09 >>> >>>>> 146 KSP unpreconditioned resid norm 7.885850510808e-08 true >>> resid norm 7.885850531446e-08 ||r(i)||/||b|| 1.964110975313e-09 >>> >>>>> 147 KSP unpreconditioned resid norm 6.483393946950e-08 true >>> resid norm 6.483393931383e-08 ||r(i)||/||b|| 1.614804278515e-09 >>> >>>>> 148 KSP unpreconditioned resid norm 5.690132597004e-08 true >>> resid norm 5.690132577518e-08 ||r(i)||/||b|| 1.417228465328e-09 >>> >>>>> 149 KSP unpreconditioned resid norm 5.023671521579e-08 true >>> resid norm 5.023671502186e-08 ||r(i)||/||b|| 1.251234511035e-09 >>> >>>>> 150 KSP unpreconditioned resid norm 
4.625371062660e-08 true >>> resid norm 4.625371062660e-08 ||r(i)||/||b|| 1.152030720445e-09 >>> >>>>> 151 KSP unpreconditioned resid norm 4.349049084805e-08 true >>> resid norm 4.349049089337e-08 ||r(i)||/||b|| 1.083207830846e-09 >>> >>>>> 152 KSP unpreconditioned resid norm 3.932593324498e-08 true >>> resid norm 3.932593376918e-08 ||r(i)||/||b|| 9.794821474546e-10 >>> >>>>> 153 KSP unpreconditioned resid norm 3.504167649202e-08 true >>> resid norm 3.504167638113e-08 ||r(i)||/||b|| 8.727751166356e-10 >>> >>>>> 154 KSP unpreconditioned resid norm 2.892726347747e-08 true >>> resid norm 2.892726348583e-08 ||r(i)||/||b|| 7.204848160858e-10 >>> >>>>> 155 KSP unpreconditioned resid norm 2.477647033202e-08 true >>> resid norm 2.477647041570e-08 ||r(i)||/||b|| 6.171019508795e-10 >>> >>>>> 156 KSP unpreconditioned resid norm 2.128504065757e-08 true >>> resid norm 2.128504067423e-08 ||r(i)||/||b|| 5.301416991298e-10 >>> >>>>> 157 KSP unpreconditioned resid norm 1.879248809429e-08 true >>> resid norm 1.879248818928e-08 ||r(i)||/||b|| 4.680602575310e-10 >>> >>>>> 158 KSP unpreconditioned resid norm 1.673649140073e-08 true >>> resid norm 1.673649134005e-08 ||r(i)||/||b|| 4.168520085200e-10 >>> >>>>> 159 KSP unpreconditioned resid norm 1.497123388109e-08 true >>> resid norm 1.497123365569e-08 ||r(i)||/||b|| 3.728851342016e-10 >>> >>>>> 160 KSP unpreconditioned resid norm 1.315982130162e-08 true >>> resid norm 1.315982149329e-08 ||r(i)||/||b|| 3.277687007261e-10 >>> >>>>> 161 KSP unpreconditioned resid norm 1.182395864938e-08 true >>> resid norm 1.182395868430e-08 ||r(i)||/||b|| 2.944966675550e-10 >>> >>>>> 162 KSP unpreconditioned resid norm 1.070204481679e-08 true >>> resid norm 1.070204466432e-08 ||r(i)||/||b|| 2.665534085342e-10 >>> >>>>> 163 KSP unpreconditioned resid norm 9.969290307649e-09 true >>> resid norm 9.969290432333e-09 ||r(i)||/||b|| 2.483028644297e-10 >>> >>>>> 164 KSP unpreconditioned resid norm 9.134440883306e-09 true >>> resid norm 9.134440980976e-09 ||r(i)||/||b|| 2.275094577628e-10 >>> >>>>> 165 KSP unpreconditioned resid norm 8.593316427292e-09 true >>> resid norm 8.593316413360e-09 ||r(i)||/||b|| 2.140317904139e-10 >>> >>>>> 166 KSP unpreconditioned resid norm 8.042173048464e-09 true >>> resid norm 8.042173332848e-09 ||r(i)||/||b|| 2.003045942277e-10 >>> >>>>> 167 KSP unpreconditioned resid norm 7.655518522782e-09 true >>> resid norm 7.655518879144e-09 ||r(i)||/||b|| 1.906742791064e-10 >>> >>>>> 168 KSP unpreconditioned resid norm 7.210283391815e-09 true >>> resid norm 7.210283220312e-09 ||r(i)||/||b|| 1.795848951442e-10 >>> >>>>> 169 KSP unpreconditioned resid norm 6.793967416271e-09 true >>> resid norm 6.793967448832e-09 ||r(i)||/||b|| 1.692158122825e-10 >>> >>>>> 170 KSP unpreconditioned resid norm 6.249160304588e-09 true >>> resid norm 6.249160382647e-09 ||r(i)||/||b|| 1.556464257736e-10 >>> >>>>> 171 KSP unpreconditioned resid norm 5.794936438798e-09 true >>> resid norm 5.794936332552e-09 ||r(i)||/||b|| 1.443331699811e-10 >>> >>>>> 172 KSP unpreconditioned resid norm 5.222337397128e-09 true >>> resid norm 5.222337443277e-09 ||r(i)||/||b|| 1.300715788135e-10 >>> >>>>> 173 KSP unpreconditioned resid norm 4.755359110447e-09 true >>> resid norm 4.755358888996e-09 ||r(i)||/||b|| 1.184406494668e-10 >>> >>>>> 174 KSP unpreconditioned resid norm 4.317537007873e-09 true >>> resid norm 4.317537267718e-09 ||r(i)||/||b|| 1.075359252630e-10 >>> >>>>> 175 KSP unpreconditioned resid norm 3.924177535665e-09 true >>> resid norm 3.924177629720e-09 ||r(i)||/||b|| 9.773860563138e-11 >>> >>>>> 
176 KSP unpreconditioned resid norm 3.502843065115e-09 true >>> resid norm 3.502843126359e-09 ||r(i)||/||b|| 8.724452234855e-11 >>> >>>>> 177 KSP unpreconditioned resid norm 3.083873232869e-09 true >>> resid norm 3.083873352938e-09 ||r(i)||/||b|| 7.680933686007e-11 >>> >>>>> 178 KSP unpreconditioned resid norm 2.758980676473e-09 true >>> resid norm 2.758980618096e-09 ||r(i)||/||b|| 6.871730691658e-11 >>> >>>>> 179 KSP unpreconditioned resid norm 2.510978240429e-09 true >>> resid norm 2.510978327392e-09 ||r(i)||/||b|| 6.254036989334e-11 >>> >>>>> 180 KSP unpreconditioned resid norm 2.323000193205e-09 true >>> resid norm 2.323000193205e-09 ||r(i)||/||b|| 5.785844097519e-11 >>> >>>>> 181 KSP unpreconditioned resid norm 2.167480159274e-09 true >>> resid norm 2.167480113693e-09 ||r(i)||/||b|| 5.398493749153e-11 >>> >>>>> 182 KSP unpreconditioned resid norm 1.983545827983e-09 true >>> resid norm 1.983546404840e-09 ||r(i)||/||b|| 4.940374216139e-11 >>> >>>>> 183 KSP unpreconditioned resid norm 1.794576286774e-09 true >>> resid norm 1.794576224361e-09 ||r(i)||/||b|| 4.469710457036e-11 >>> >>>>> 184 KSP unpreconditioned resid norm 1.583490590644e-09 true >>> resid norm 1.583490380603e-09 ||r(i)||/||b|| 3.943963715064e-11 >>> >>>>> 185 KSP unpreconditioned resid norm 1.412659866247e-09 true >>> resid norm 1.412659832191e-09 ||r(i)||/||b|| 3.518479927722e-11 >>> >>>>> 186 KSP unpreconditioned resid norm 1.285613344939e-09 true >>> resid norm 1.285612984761e-09 ||r(i)||/||b|| 3.202047215205e-11 >>> >>>>> 187 KSP unpreconditioned resid norm 1.168115133929e-09 true >>> resid norm 1.168114766904e-09 ||r(i)||/||b|| 2.909397058634e-11 >>> >>>>> 188 KSP unpreconditioned resid norm 1.063377926053e-09 true >>> resid norm 1.063377647554e-09 ||r(i)||/||b|| 2.648530681802e-11 >>> >>>>> 189 KSP unpreconditioned resid norm 9.548967728122e-10 true >>> resid norm 9.548964523410e-10 ||r(i)||/||b|| 2.378339019807e-11 >>> >>>>> KSP Object: 16 MPI processes >>> >>>>> type: fgmres >>> >>>>> restart=30, using Classical (unmodified) Gram-Schmidt >>> Orthogonalization with no iterative refinement >>> >>>>> happy breakdown tolerance 1e-30 >>> >>>>> maximum iterations=2000, initial guess is zero >>> >>>>> tolerances: relative=1e-20, absolute=1e-09, >>> divergence=10000. >>> >>>>> right preconditioning >>> >>>>> using UNPRECONDITIONED norm type for convergence test >>> >>>>> PC Object: 16 MPI processes >>> >>>>> type: bjacobi >>> >>>>> number of blocks = 4 >>> >>>>> Local solver information for first block is in the >>> following KSP and PC objects on rank 0: >>> >>>>> Use -ksp_view ::ascii_info_detail to display >>> information for all blocks >>> >>>>> KSP Object: (sub_) 4 MPI processes >>> >>>>> type: preonly >>> >>>>> maximum iterations=10000, initial guess is zero >>> >>>>> tolerances: relative=1e-05, absolute=1e-50, >>> divergence=10000. 
>>> >>>>> left preconditioning >>> >>>>> using NONE norm type for convergence test >>> >>>>> PC Object: (sub_) 4 MPI processes >>> >>>>> type: telescope >>> >>>>> petsc subcomm: parent comm size reduction factor = 4 >>> >>>>> petsc subcomm: parent_size = 4 , subcomm_size = 1 >>> >>>>> petsc subcomm type = contiguous >>> >>>>> linear system matrix = precond matrix: >>> >>>>> Mat Object: (sub_) 4 MPI processes >>> >>>>> type: mpiaij >>> >>>>> rows=40200, cols=40200 >>> >>>>> total: nonzeros=199996, allocated nonzeros=203412 >>> >>>>> total number of mallocs used during MatSetValues calls=0 >>> >>>>> not using I-node (on process 0) routines >>> >>>>> setup type: default >>> >>>>> Parent DM object: NULL >>> >>>>> Sub DM object: NULL >>> >>>>> KSP Object: (sub_telescope_) 1 MPI processes >>> >>>>> type: preonly >>> >>>>> maximum iterations=10000, initial guess is zero >>> >>>>> tolerances: relative=1e-05, absolute=1e-50, >>> divergence=10000. >>> >>>>> left preconditioning >>> >>>>> using NONE norm type for convergence test >>> >>>>> PC Object: (sub_telescope_) 1 MPI processes >>> >>>>> type: lu >>> >>>>> out-of-place factorization >>> >>>>> tolerance for zero pivot 2.22045e-14 >>> >>>>> matrix ordering: external >>> >>>>> factor fill ratio given 0., needed 0. >>> >>>>> Factored matrix follows: >>> >>>>> Mat Object: 1 MPI processes >>> >>>>> type: mumps >>> >>>>> rows=40200, cols=40200 >>> >>>>> package used to perform factorization: mumps >>> >>>>> total: nonzeros=1849788, allocated >>> nonzeros=1849788 >>> >>>>> MUMPS run parameters: >>> >>>>> SYM (matrix type): 0 >>> >>>>> PAR (host participation): 1 >>> >>>>> ICNTL(1) (output for error): 6 >>> >>>>> ICNTL(2) (output of diagnostic msg): 0 >>> >>>>> ICNTL(3) (output for global info): 0 >>> >>>>> ICNTL(4) (level of printing): 0 >>> >>>>> ICNTL(5) (input mat struct): 0 >>> >>>>> ICNTL(6) (matrix prescaling): 7 >>> >>>>> ICNTL(7) (sequential matrix ordering):7 >>> >>>>> ICNTL(8) (scaling strategy): 77 >>> >>>>> ICNTL(10) (max num of refinements): 0 >>> >>>>> ICNTL(11) (error analysis): 0 >>> >>>>> ICNTL(12) (efficiency control): 1 >>> >>>>> ICNTL(13) (sequential factorization >>> of the root node): 0 >>> >>>>> ICNTL(14) (percentage of estimated >>> workspace increase): 20 >>> >>>>> ICNTL(18) (input mat struct): 0 >>> >>>>> ICNTL(19) (Schur complement info): >>> 0 >>> >>>>> ICNTL(20) (RHS sparse pattern): 0 >>> >>>>> ICNTL(21) (solution struct): 0 >>> >>>>> ICNTL(22) (in-core/out-of-core >>> facility): 0 >>> >>>>> ICNTL(23) (max size of memory can be >>> allocated locally):0 >>> >>>>> ICNTL(24) (detection of null pivot >>> rows): 0 >>> >>>>> ICNTL(25) (computation of a null >>> space basis): 0 >>> >>>>> ICNTL(26) (Schur options for RHS or >>> solution): 0 >>> >>>>> ICNTL(27) (blocking size for multiple >>> RHS): -32 >>> >>>>> ICNTL(28) (use parallel or sequential >>> ordering): 1 >>> >>>>> ICNTL(29) (parallel ordering): 0 >>> >>>>> ICNTL(30) (user-specified set of >>> entries in inv(A)): 0 >>> >>>>> ICNTL(31) (factors is discarded in >>> the solve phase): 0 >>> >>>>> ICNTL(33) (compute determinant): 0 >>> >>>>> ICNTL(35) (activate BLR based >>> factorization): 0 >>> >>>>> ICNTL(36) (choice of BLR >>> factorization variant): 0 >>> >>>>> ICNTL(38) (estimated compression rate >>> of LU factors): 333 >>> >>>>> CNTL(1) (relative pivoting >>> threshold): 0.01 >>> >>>>> CNTL(2) (stopping criterion of >>> refinement): 1.49012e-08 >>> >>>>> CNTL(3) (absolute pivoting >>> threshold): 0. >>> >>>>> CNTL(4) (value of static pivoting): >>> -1. 
>>> >>>>> CNTL(5) (fixation for null pivots): >>> 0. >>> >>>>> CNTL(7) (dropping parameter for >>> BLR): 0. >>> >>>>> RINFO(1) (local estimated flops for >>> the elimination after analysis): >>> >>>>> [0] 1.45525e+08 >>> >>>>> RINFO(2) (local estimated flops for >>> the assembly after factorization): >>> >>>>> [0] 2.89397e+06 >>> >>>>> RINFO(3) (local estimated flops for >>> the elimination after factorization): >>> >>>>> [0] 1.45525e+08 >>> >>>>> INFO(15) (estimated size of (in MB) >>> MUMPS internal data for running numerical factorization): >>> >>>>> [0] 29 >>> >>>>> INFO(16) (size of (in MB) MUMPS >>> internal data used during numerical factorization): >>> >>>>> [0] 29 >>> >>>>> INFO(23) (num of pivots eliminated on >>> this processor after factorization): >>> >>>>> [0] 40200 >>> >>>>> RINFOG(1) (global estimated flops for >>> the elimination after analysis): 1.45525e+08 >>> >>>>> RINFOG(2) (global estimated flops for >>> the assembly after factorization): 2.89397e+06 >>> >>>>> RINFOG(3) (global estimated flops for >>> the elimination after factorization): 1.45525e+08 >>> >>>>> (RINFOG(12) RINFOG(13))*2^INFOG(34) >>> (determinant): (0.,0.)*(2^0) >>> >>>>> INFOG(3) (estimated real workspace >>> for factors on all processors after analysis): 1849788 >>> >>>>> INFOG(4) (estimated integer workspace >>> for factors on all processors after analysis): 879986 >>> >>>>> INFOG(5) (estimated maximum front >>> size in the complete tree): 282 >>> >>>>> INFOG(6) (number of nodes in the >>> complete tree): 23709 >>> >>>>> INFOG(7) (ordering option effectively >>> used after analysis): 5 >>> >>>>> INFOG(8) (structural symmetry in >>> percent of the permuted matrix after analysis): 100 >>> >>>>> INFOG(9) (total real/complex >>> workspace to store the matrix factors after factorization): 1849788 >>> >>>>> INFOG(10) (total integer space store >>> the matrix factors after factorization): 879986 >>> >>>>> INFOG(11) (order of largest frontal >>> matrix after factorization): 282 >>> >>>>> INFOG(12) (number of off-diagonal >>> pivots): 0 >>> >>>>> INFOG(13) (number of delayed pivots >>> after factorization): 0 >>> >>>>> INFOG(14) (number of memory compress >>> after factorization): 0 >>> >>>>> INFOG(15) (number of steps of >>> iterative refinement after solution): 0 >>> >>>>> INFOG(16) (estimated size (in MB) of >>> all MUMPS internal data for factorization after analysis: value on >>> the most memory consuming processor): 29 >>> >>>>> INFOG(17) (estimated size of all >>> MUMPS internal data for factorization after analysis: sum over all >>> processors): 29 >>> >>>>> INFOG(18) (size of all MUMPS internal >>> data allocated during factorization: value on the most memory >>> consuming processor): 29 >>> >>>>> INFOG(19) (size of all MUMPS internal >>> data allocated during factorization: sum over all processors): 29 >>> >>>>> INFOG(20) (estimated number of >>> entries in the factors): 1849788 >>> >>>>> INFOG(21) (size in MB of memory >>> effectively used during factorization - value on the most memory >>> consuming processor): 26 >>> >>>>> INFOG(22) (size in MB of memory >>> effectively used during factorization - sum over all processors): 26 >>> >>>>> INFOG(23) (after analysis: value of >>> ICNTL(6) effectively used): 0 >>> >>>>> INFOG(24) (after analysis: value of >>> ICNTL(12) effectively used): 1 >>> >>>>> INFOG(25) (after factorization: >>> number of pivots modified by static pivoting): 0 >>> >>>>> INFOG(28) (after factorization: >>> number of null pivots encountered): 0 >>> >>>>> INFOG(29) (after 
factorization: >>> effective number of entries in the factors (sum over all >>> processors)): 1849788 >>> >>>>> INFOG(30, 31) (after solution: size >>> in Mbytes of memory used during solution phase): 29, 29 >>> >>>>> INFOG(32) (after analysis: type of >>> analysis done): 1 >>> >>>>> INFOG(33) (value used for ICNTL(8)): 7 >>> >>>>> INFOG(34) (exponent of the >>> determinant if determinant is requested): 0 >>> >>>>> INFOG(35) (after factorization: >>> number of entries taking into account BLR factor compression - sum >>> over all processors): 1849788 >>> >>>>> INFOG(36) (after analysis: estimated >>> size of all MUMPS internal data for running BLR in-core - value on >>> the most memory consuming processor): 0 >>> >>>>> INFOG(37) (after analysis: estimated >>> size of all MUMPS internal data for running BLR in-core - sum over >>> all processors): 0 >>> >>>>> INFOG(38) (after analysis: estimated >>> size of all MUMPS internal data for running BLR out-of-core - >>> value on the most memory consuming processor): 0 >>> >>>>> INFOG(39) (after analysis: estimated >>> size of all MUMPS internal data for running BLR out-of-core - sum >>> over all processors): 0 >>> >>>>> linear system matrix = precond matrix: >>> >>>>> Mat Object: 1 MPI processes >>> >>>>> type: seqaijcusparse >>> >>>>> rows=40200, cols=40200 >>> >>>>> total: nonzeros=199996, allocated nonzeros=199996 >>> >>>>> total number of mallocs used during >>> MatSetValues calls=0 >>> >>>>> not using I-node routines >>> >>>>> linear system matrix = precond matrix: >>> >>>>> Mat Object: 16 MPI processes >>> >>>>> type: mpiaijcusparse >>> >>>>> rows=160800, cols=160800 >>> >>>>> total: nonzeros=802396, allocated nonzeros=1608000 >>> >>>>> total number of mallocs used during MatSetValues calls=0 >>> >>>>> not using I-node (on process 0) routines >>> >>>>> Norm of error 9.11684e-07 iterations 189 >>> >>>>> Chang >>> >>>>> On 10/14/21 10:10 PM, Chang Liu wrote: >>> >>>>>> Hi Barry, >>> >>>>>> >>> >>>>>> No problem. Here is the output. It seems that the resid >>> norm calculation is incorrect. >>> >>>>>> >>> >>>>>> $ mpiexec -n 16 --hostfile hostfile --oversubscribe ./ex7 >>> -m 400 -ksp_view -ksp_monitor_true_residual -pc_type bjacobi >>> -pc_bjacobi_blocks 4 -ksp_type fgmres -mat_type aijcusparse >>> -sub_pc_type telescope -sub_ksp_type preonly >>> -sub_telescope_ksp_type preonly -sub_telescope_pc_type lu >>> -sub_telescope_pc_factor_mat_solver_type cusparse >>> -sub_pc_telescope_reduction_factor 4 >>> -sub_pc_telescope_subcomm_type contiguous -ksp_max_it 2000 >>> -ksp_rtol 1.e-20 -ksp_atol 1.e-9 >>> >>>>>> 0 KSP unpreconditioned resid norm 4.014971979977e+01 >>> true resid norm 4.014971979977e+01 ||r(i)||/||b|| 1.000000000000e+00 >>> >>>>>> 1 KSP unpreconditioned resid norm 0.000000000000e+00 >>> true resid norm 4.014971979977e+01 ||r(i)||/||b|| 1.000000000000e+00 >>> >>>>>> KSP Object: 16 MPI processes >>> >>>>>> type: fgmres >>> >>>>>> restart=30, using Classical (unmodified) Gram-Schmidt >>> Orthogonalization with no iterative refinement >>> >>>>>> happy breakdown tolerance 1e-30 >>> >>>>>> maximum iterations=2000, initial guess is zero >>> >>>>>> tolerances: relative=1e-20, absolute=1e-09, >>> divergence=10000. 
>>> >>>>>> right preconditioning >>> >>>>>> using UNPRECONDITIONED norm type for convergence test >>> >>>>>> PC Object: 16 MPI processes >>> >>>>>> type: bjacobi >>> >>>>>> number of blocks = 4 >>> >>>>>> Local solver information for first block is in the >>> following KSP and PC objects on rank 0: >>> >>>>>> Use -ksp_view ::ascii_info_detail to display >>> information for all blocks >>> >>>>>> KSP Object: (sub_) 4 MPI processes >>> >>>>>> type: preonly >>> >>>>>> maximum iterations=10000, initial guess is zero >>> >>>>>> tolerances: relative=1e-05, absolute=1e-50, >>> divergence=10000. >>> >>>>>> left preconditioning >>> >>>>>> using NONE norm type for convergence test >>> >>>>>> PC Object: (sub_) 4 MPI processes >>> >>>>>> type: telescope >>> >>>>>> petsc subcomm: parent comm size reduction factor = 4 >>> >>>>>> petsc subcomm: parent_size = 4 , subcomm_size = 1 >>> >>>>>> petsc subcomm type = contiguous >>> >>>>>> linear system matrix = precond matrix: >>> >>>>>> Mat Object: (sub_) 4 MPI processes >>> >>>>>> type: mpiaij >>> >>>>>> rows=40200, cols=40200 >>> >>>>>> total: nonzeros=199996, allocated nonzeros=203412 >>> >>>>>> total number of mallocs used during MatSetValues >>> calls=0 >>> >>>>>> not using I-node (on process 0) routines >>> >>>>>> setup type: default >>> >>>>>> Parent DM object: NULL >>> >>>>>> Sub DM object: NULL >>> >>>>>> KSP Object: (sub_telescope_) 1 MPI processes >>> >>>>>> type: preonly >>> >>>>>> maximum iterations=10000, initial guess is zero >>> >>>>>> tolerances: relative=1e-05, absolute=1e-50, >>> divergence=10000. >>> >>>>>> left preconditioning >>> >>>>>> using NONE norm type for convergence test >>> >>>>>> PC Object: (sub_telescope_) 1 MPI processes >>> >>>>>> type: lu >>> >>>>>> out-of-place factorization >>> >>>>>> tolerance for zero pivot 2.22045e-14 >>> >>>>>> matrix ordering: nd >>> >>>>>> factor fill ratio given 5., needed 8.62558 >>> >>>>>> Factored matrix follows: >>> >>>>>> Mat Object: 1 MPI processes >>> >>>>>> type: seqaijcusparse >>> >>>>>> rows=40200, cols=40200 >>> >>>>>> package used to perform factorization: >>> cusparse >>> >>>>>> total: nonzeros=1725082, allocated >>> nonzeros=1725082 >>> >>>>>> not using I-node routines >>> >>>>>> linear system matrix = precond matrix: >>> >>>>>> Mat Object: 1 MPI processes >>> >>>>>> type: seqaijcusparse >>> >>>>>> rows=40200, cols=40200 >>> >>>>>> total: nonzeros=199996, allocated nonzeros=199996 >>> >>>>>> total number of mallocs used during >>> MatSetValues calls=0 >>> >>>>>> not using I-node routines >>> >>>>>> linear system matrix = precond matrix: >>> >>>>>> Mat Object: 16 MPI processes >>> >>>>>> type: mpiaijcusparse >>> >>>>>> rows=160800, cols=160800 >>> >>>>>> total: nonzeros=802396, allocated nonzeros=1608000 >>> >>>>>> total number of mallocs used during MatSetValues calls=0 >>> >>>>>> not using I-node (on process 0) routines >>> >>>>>> Norm of error 400.999 iterations 1 >>> >>>>>> >>> >>>>>> Chang >>> >>>>>> >>> >>>>>> >>> >>>>>> On 10/14/21 9:47 PM, Barry Smith wrote: >>> >>>>>>> >>> >>>>>>> Chang, >>> >>>>>>> >>> >>>>>>> Sorry I did not notice that one. Please run that with >>> -ksp_view -ksp_monitor_true_residual so we can see exactly how >>> options are interpreted and solver used. At a glance it looks ok >>> but something must be wrong to get the wrong answer. 
>>> >>>>>>>
>>> >>>>>>>   Barry
>>> >>>>>>>
>>> >>>>>>>> On Oct 14, 2021, at 6:02 PM, Chang Liu <c...@pppl.gov> wrote:
>>> >>>>>>>>
>>> >>>>>>>> Hi Barry,
>>> >>>>>>>>
>>> >>>>>>>> That is exactly what I was doing in the second example, in which the preconditioner works but the GMRES does not.
>>> >>>>>>>>
>>> >>>>>>>> Chang
>>> >>>>>>>>
>>> >>>>>>>> On 10/14/21 5:15 PM, Barry Smith wrote:
>>> >>>>>>>>> You need to use the PCTELESCOPE inside the block Jacobi, not outside it. So something like -pc_type bjacobi -sub_pc_type telescope -sub_telescope_pc_type lu
>>> >>>>>>>>>
>>> >>>>>>>>>> On Oct 14, 2021, at 4:14 PM, Chang Liu <c...@pppl.gov> wrote:
>>> >>>>>>>>>>
>>> >>>>>>>>>> Hi Pierre,
>>> >>>>>>>>>>
>>> >>>>>>>>>> I wonder if the PCTELESCOPE trick only works for the preconditioner and not for the solver. I have done some tests and found that for solving a small matrix with -telescope_ksp_type preonly, it does work on GPU with multiple MPI processes. However, with bjacobi and gmres it does not work.
>>> >>>>>>>>>>
>>> >>>>>>>>>> The command line options I used for the small matrix are
>>> >>>>>>>>>>
>>> >>>>>>>>>> mpiexec -n 4 --oversubscribe ./ex7 -m 100 -ksp_monitor_short -pc_type telescope -mat_type aijcusparse -telescope_pc_type lu -telescope_pc_factor_mat_solver_type cusparse -telescope_ksp_type preonly -pc_telescope_reduction_factor 4
>>> >>>>>>>>>>
>>> >>>>>>>>>> which gives the correct output. For the iterative solver, I tried
>>> >>>>>>>>>>
>>> >>>>>>>>>> mpiexec -n 16 --oversubscribe ./ex7 -m 400 -ksp_monitor_short -pc_type bjacobi -pc_bjacobi_blocks 4 -ksp_type fgmres -mat_type aijcusparse -sub_pc_type telescope -sub_ksp_type preonly -sub_telescope_ksp_type preonly -sub_telescope_pc_type lu -sub_telescope_pc_factor_mat_solver_type cusparse -sub_pc_telescope_reduction_factor 4 -ksp_max_it 2000 -ksp_rtol 1.e-9 -ksp_atol 1.e-20
>>> >>>>>>>>>>
>>> >>>>>>>>>> for the large matrix. The output is
>>> >>>>>>>>>>
>>> >>>>>>>>>>   0 KSP Residual norm 40.1497
>>> >>>>>>>>>>   1 KSP Residual norm < 1.e-11
>>> >>>>>>>>>> Norm of error 400.999 iterations 1
>>> >>>>>>>>>>
>>> >>>>>>>>>> So it seems to call a direct solver instead of an iterative one.
>>> >>>>>>>>>>
>>> >>>>>>>>>> Can you please help check these options?
>>> >>>>>>>>>>
>>> >>>>>>>>>> Chang
>>> >>>>>>>>>>
>>> >>>>>>>>>> On 10/14/21 10:04 AM, Pierre Jolivet wrote:
>>> >>>>>>>>>>>> On 14 Oct 2021, at 3:50 PM, Chang Liu <c...@pppl.gov> wrote:
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> Thank you Pierre. I was not aware of PCTELESCOPE before. This sounds like exactly what I need. I wonder if PCTELESCOPE can transform an mpiaijcusparse to a seqaijcusparse? Or do I have to do it manually?
>>> >>>>>>>>>>> PCTELESCOPE uses MatCreateMPIMatConcatenateSeqMat().
>>> >>>>>>>>>>> 1) I’m not sure this is implemented for cuSparse matrices, but it should be;
>>> >>>>>>>>>>> 2) at least for the implementations MatCreateMPIMatConcatenateSeqMat_MPIBAIJ() and MatCreateMPIMatConcatenateSeqMat_MPIAIJ(), the resulting MatType is MATBAIJ (resp. MATAIJ). Constructors are usually “smart” enough to detect if the MPI communicator on which the Mat lives is of size 1 (your case), and then the resulting Mat is of type MatSeqX instead of MatMPIX, so you would not need to worry about the transformation you are mentioning.
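To make the routine Pierre mentions concrete, here is a minimal sketch of a call to it; comm and seqBlock are placeholders, and whether the cuSparse matrix types implement this path is exactly Pierre's point 1) above.

  #include <petscmat.h>

  /* Sketch only: MatCreateMPIMatConcatenateSeqMat() stitches each rank's
     sequential block into one matrix on `comm`.  When `comm` has a single
     rank (the telescoped case here), the result should come back as a Seq
     matrix, which is the property the cusparse LU path needs. */
  static PetscErrorCode ConcatenateBlocks(MPI_Comm comm, Mat seqBlock, Mat *glued)
  {
    PetscErrorCode ierr;

    PetscFunctionBeginUser;
    ierr = MatCreateMPIMatConcatenateSeqMat(comm, seqBlock, PETSC_DECIDE, MAT_INITIAL_MATRIX, glued);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }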
>>> >>>>>>>>>>> If you try this out and this does not work, please >>> provide the backtrace (probably something like “Operation XYZ not >>> implemented for MatType ABC”), and hopefully someone can add the >>> missing plumbing. >>> >>>>>>>>>>> I do not claim that this will be efficient, but I >>> think this goes in the direction of what you want to achieve. >>> >>>>>>>>>>> Thanks, >>> >>>>>>>>>>> Pierre >>> >>>>>>>>>>>> Chang >>> >>>>>>>>>>>> >>> >>>>>>>>>>>> On 10/14/21 1:35 AM, Pierre Jolivet wrote: >>> >>>>>>>>>>>>> Maybe I’m missing something, but can’t you use >>> PCTELESCOPE as a subdomain solver, with a reduction factor equal >>> to the number of MPI processes you have per block? >>> >>>>>>>>>>>>> -sub_pc_type telescope >>> -sub_pc_telescope_reduction_factor X -sub_telescope_pc_type lu >>> >>>>>>>>>>>>> This does not work with MUMPS >>> -mat_mumps_use_omp_threads because not only do the Mat needs to be >>> redistributed, the secondary processes also need to be “converted” >>> to OpenMP threads. >>> >>>>>>>>>>>>> Thus the need for specific code in mumps.c. >>> >>>>>>>>>>>>> Thanks, >>> >>>>>>>>>>>>> Pierre >>> >>>>>>>>>>>>>> On 14 Oct 2021, at 6:00 AM, Chang Liu via >>> petsc-users <petsc-users@mcs.anl.gov >>> <mailto:petsc-users@mcs.anl.gov>> wrote: >>> >>>>>>>>>>>>>> >>> >>>>>>>>>>>>>> Hi Junchao, >>> >>>>>>>>>>>>>> >>> >>>>>>>>>>>>>> Yes that is what I want. >>> >>>>>>>>>>>>>> >>> >>>>>>>>>>>>>> Chang >>> >>>>>>>>>>>>>> >>> >>>>>>>>>>>>>> On 10/13/21 11:42 PM, Junchao Zhang wrote: >>> >>>>>>>>>>>>>>> On Wed, Oct 13, 2021 at 8:58 PM Barry Smith >>> <bsm...@petsc.dev <mailto:bsm...@petsc.dev> >>> <mailto:bsm...@petsc.dev <mailto:bsm...@petsc.dev>>> wrote: >>> >>>>>>>>>>>>>>> Junchao, >>> >>>>>>>>>>>>>>> If I understand correctly Chang is >>> using the block Jacobi >>> >>>>>>>>>>>>>>> method with a single block for a number of >>> MPI ranks and a direct >>> >>>>>>>>>>>>>>> solver for each block so it uses >>> PCSetUp_BJacobi_Multiproc() which >>> >>>>>>>>>>>>>>> is code Hong Zhang wrote a number of years >>> ago for CPUs. For their >>> >>>>>>>>>>>>>>> particular problems this preconditioner works >>> well, but using an >>> >>>>>>>>>>>>>>> iterative solver on the blocks does not work >>> well. >>> >>>>>>>>>>>>>>> If we had complete MPI-GPU direct >>> solvers he could just use >>> >>>>>>>>>>>>>>> the current code with MPIAIJCUSPARSE on each >>> block but since we do >>> >>>>>>>>>>>>>>> not he would like to use a single GPU for >>> each block, this means >>> >>>>>>>>>>>>>>> that diagonal blocks of the global parallel >>> MPI matrix needs to be >>> >>>>>>>>>>>>>>> sent to a subset of the GPUs (one GPU per >>> block, which has multiple >>> >>>>>>>>>>>>>>> MPI ranks associated with the blocks). >>> Similarly for the triangular >>> >>>>>>>>>>>>>>> solves the blocks of the right hand side >>> needs to be shipped to the >>> >>>>>>>>>>>>>>> appropriate GPU and the resulting solution >>> shipped back to the >>> >>>>>>>>>>>>>>> multiple GPUs. So Chang is absolutely >>> correct, this is somewhat like >>> >>>>>>>>>>>>>>> your code for MUMPS with OpenMP. OK, I now >>> understand the background.. >>> >>>>>>>>>>>>>>> One could use PCSetUp_BJacobi_Multiproc() and >>> get the blocks on the >>> >>>>>>>>>>>>>>> MPI ranks and then shrink each block down to >>> a single GPU but this >>> >>>>>>>>>>>>>>> would be pretty inefficient, ideally one >>> would go directly from the >>> >>>>>>>>>>>>>>> big MPI matrix on all the GPUs to the sub >>> matrices on the subset of >>> >>>>>>>>>>>>>>> GPUs. 
But this may be a large coding project. >>> >>>>>>>>>>>>>>> I don't understand these sentences. Why do you say >>> "shrink"? In my mind, we just need to move each block (submatrix) >>> living over multiple MPI ranks to one of them and solve directly >>> there. In other words, we keep blocks' size, no shrinking or >>> expanding. >>> >>>>>>>>>>>>>>> As mentioned before, cusparse does not provide LU >>> factorization. So the LU factorization would be done on CPU, and >>> the solve be done on GPU. I assume Chang wants to gain from the >>> (potential) faster solve (instead of factorization) on GPU. >>> >>>>>>>>>>>>>>> Barry >>> >>>>>>>>>>>>>>> Since the matrices being factored and solved >>> directly are relatively >>> >>>>>>>>>>>>>>> large it is possible that the cusparse code >>> could be reasonably >>> >>>>>>>>>>>>>>> efficient (they are not the tiny problems one >>> gets at the coarse >>> >>>>>>>>>>>>>>> level of multigrid). Of course, this is >>> speculation, I don't >>> >>>>>>>>>>>>>>> actually know how much better the cusparse >>> code would be on the >>> >>>>>>>>>>>>>>> direct solver than a good CPU direct sparse >>> solver. >>> >>>>>>>>>>>>>>> > On Oct 13, 2021, at 9:32 PM, Chang Liu >>> <c...@pppl.gov <mailto:c...@pppl.gov> >>> >>>>>>>>>>>>>>> <mailto:c...@pppl.gov >>> <mailto:c...@pppl.gov>>> wrote: >>> >>>>>>>>>>>>>>> > >>> >>>>>>>>>>>>>>> > Sorry I am not familiar with the details >>> either. Can you please >>> >>>>>>>>>>>>>>> check the code in >>> MatMumpsGatherNonzerosOnMaster in mumps.c? >>> >>>>>>>>>>>>>>> > >>> >>>>>>>>>>>>>>> > Chang >>> >>>>>>>>>>>>>>> > >>> >>>>>>>>>>>>>>> > On 10/13/21 9:24 PM, Junchao Zhang wrote: >>> >>>>>>>>>>>>>>> >> Hi Chang, >>> >>>>>>>>>>>>>>> >> I did the work in mumps. It is easy for >>> me to understand >>> >>>>>>>>>>>>>>> gathering matrix rows to one process. >>> >>>>>>>>>>>>>>> >> But how to gather blocks (submatrices) >>> to form a large block? Can you draw a picture of that? >>> >>>>>>>>>>>>>>> >> Thanks >>> >>>>>>>>>>>>>>> >> --Junchao Zhang >>> >>>>>>>>>>>>>>> >> On Wed, Oct 13, 2021 at 7:47 PM Chang Liu >>> via petsc-users >>> >>>>>>>>>>>>>>> <petsc-users@mcs.anl.gov >>> <mailto:petsc-users@mcs.anl.gov> <mailto:petsc-users@mcs.anl.gov >>> <mailto:petsc-users@mcs.anl.gov>> >>> >>>>>>>>>>>>>>> <mailto:petsc-users@mcs.anl.gov >>> <mailto:petsc-users@mcs.anl.gov> <mailto:petsc-users@mcs.anl.gov >>> <mailto:petsc-users@mcs.anl.gov>>>> >>> >>>>>>>>>>>>>>> wrote: >>> >>>>>>>>>>>>>>> >> Hi Barry, >>> >>>>>>>>>>>>>>> >> I think mumps solver in petsc does >>> support that. 
>>> On Oct 13, 2021, at 9:32 PM, Chang Liu <c...@pppl.gov> wrote:
>>> Sorry, I am not familiar with the details either. Can you please check the
>>> code in MatMumpsGatherNonzerosOnMaster in mumps.c?
>>> Chang
>>>
>>> On 10/13/21 9:24 PM, Junchao Zhang wrote:
>>> Hi Chang,
>>> I did the work in mumps. It is easy for me to understand gathering matrix
>>> rows to one process.
>>> But how to gather blocks (submatrices) to form a large block? Can you draw
>>> a picture of that?
>>> Thanks
>>> --Junchao Zhang
>>>
>>> On Wed, Oct 13, 2021 at 7:47 PM Chang Liu via petsc-users <petsc-users@mcs.anl.gov> wrote:
>>> Hi Barry,
>>> I think the mumps solver in petsc does support that. You can check the
>>> documentation on "-mat_mumps_use_omp_threads" at
>>> https://petsc.org/release/docs/manualpages/Mat/MATSOLVERMUMPS.html
>>> and the code enclosed by #if defined(PETSC_HAVE_OPENMP_SUPPORT) in the
>>> functions MatMumpsSetUpDistRHSInfo and MatMumpsGatherNonzerosOnMaster in
>>> mumps.c.
>>> 1. I understand it is ideal to do one MPI rank per GPU. However, I am
>>> working on an existing code that was developed based on MPI, and the number
>>> of MPI ranks is typically equal to the number of CPU cores. We don't want
>>> to change the whole structure of the code.
>>> 2. What you have suggested has been coded in mumps.c. See the function
>>> MatMumpsSetUpDistRHSInfo.
>>> Regards,
>>> Chang
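For comparison, the MUMPS path Chang refers to can be requested from C roughly
as follows. The thread count of 4 and the function name are arbitrary example
values, and this assumes PETSc was configured with MUMPS and OpenMP support
(PETSC_HAVE_OPENMP_SUPPORT):

    #include <petscksp.h>

    /* Hedged sketch: LU through MUMPS with the OpenMP-threads gather path
       enabled, equivalent to -mat_mumps_use_omp_threads 4 on the command line. */
    PetscErrorCode SolveWithMumpsOmpThreads(Mat A, Vec b, Vec x)
    {
      KSP            ksp;
      PC             pc;
      PetscErrorCode ierr;

      ierr = PetscOptionsSetValue(NULL, "-mat_mumps_use_omp_threads", "4");CHKERRQ(ierr);
      ierr = KSPCreate(PetscObjectComm((PetscObject)A), &ksp);CHKERRQ(ierr);
      ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
      ierr = KSPSetType(ksp, KSPPREONLY);CHKERRQ(ierr);
      ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
      ierr = PCSetType(pc, PCLU);CHKERRQ(ierr);
      ierr = PCFactorSetMatSolverType(pc, MATSOLVERMUMPS);CHKERRQ(ierr);
      ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
      ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
      ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
      return 0;
    }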
>>> On 10/13/21 7:53 PM, Barry Smith wrote:
>>> > On Oct 13, 2021, at 3:50 PM, Chang Liu <c...@pppl.gov> wrote:
>>> > Hi Barry,
>>> > That is exactly what I want.
>>> > Back to my original question: I am looking for an approach to transfer
>>> > matrix data from many MPI processes to "master" MPI processes, each of
>>> > which takes care of one GPU, and then upload the data to the GPU to
>>> > solve. One could just grab some code from mumps.c to aijcusparse.cu.
>>> mumps.c doesn't actually do that. It never needs to copy the entire matrix
>>> to a single MPI rank.
>>> It would be possible to write such a code as you suggest, but it is not
>>> clear that it makes sense:
>>> 1) For normal PETSc GPU usage there is one GPU per MPI rank, so while your
>>> one GPU per big domain is solving its systems, the other GPUs (with the
>>> other MPI ranks that share that domain) are doing nothing.
>>> 2) For each triangular solve you would have to gather the right hand side
>>> from the multiple ranks to the single GPU to pass it to the GPU solver, and
>>> then scatter the resulting solution back to all of its subdomain ranks.
>>> What I was suggesting was to assign an entire subdomain to a single MPI
>>> rank, so that it does everything on one GPU and can use the GPU solver
>>> directly. If all the major computations of a subdomain can fit and be done
>>> on a single GPU, then you would be utilizing all the GPUs you are using
>>> effectively.
>>> Barry
>>>
>>> > Chang
>>>
>>> On 10/13/21 1:53 PM, Barry Smith wrote:
>>> Chang,
>>> You are correct, there are no MPI + GPU direct solvers that currently do
>>> the triangular solves with MPI + GPU parallelism that I am aware of. You
>>> are limited in that individual triangular solves must be done on a single
>>> GPU. I can only suggest making each subdomain as big as possible to utilize
>>> each GPU as much as possible for the direct triangular solves.
>>> Barry
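The gather/scatter in Barry's point 2 is essentially the standard VecScatter
pattern; a hedged sketch, with the solve on rank 0 left as a comment and the
function name chosen for illustration:

    #include <petscvec.h>

    /* Sketch: gather a distributed right-hand side onto rank 0 of its
       communicator, solve there (not shown), and scatter the solution back.
       The bjacobi/telescope code paths do this kind of movement internally. */
    PetscErrorCode GatherSolveScatter(Vec b, Vec x)
    {
      VecScatter     tozero;
      Vec            bseq;    /* full-length vector, nonempty only on rank 0 */
      PetscErrorCode ierr;

      ierr = VecScatterCreateToZero(b, &tozero, &bseq);CHKERRQ(ierr);
      ierr = VecScatterBegin(tozero, b, bseq, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);
      ierr = VecScatterEnd(tozero, b, bseq, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);

      /* ... on rank 0: triangular solves on the GPU, overwriting bseq with
             the solution for this block ... */

      /* send the sequential solution back to the distributed layout of x */
      ierr = VecScatterBegin(tozero, bseq, x, INSERT_VALUES, SCATTER_REVERSE);CHKERRQ(ierr);
      ierr = VecScatterEnd(tozero, bseq, x, INSERT_VALUES, SCATTER_REVERSE);CHKERRQ(ierr);

      ierr = VecScatterDestroy(&tozero);CHKERRQ(ierr);
      ierr = VecDestroy(&bseq);CHKERRQ(ierr);
      return 0;
    }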
>>> On Oct 13, 2021, at 12:16 PM, Chang Liu via petsc-users <petsc-users@mcs.anl.gov> wrote:
>>> Hi Mark,
>>> '-mat_type aijcusparse' works with mpiaijcusparse with other solvers, but
>>> with -pc_factor_mat_solver_type cusparse it will give an error.
>>> Yes, what I want is to have mumps or superlu do the factorization, and then
>>> do the rest, including the GMRES solver, on the GPU. Is that possible?
>>> I have tried to use aijcusparse with superlu_dist; it runs, but the
>>> iterative solver is still running on CPUs. I have contacted the superlu
>>> group and they confirmed that is the case right now. But if I set
>>> -pc_factor_mat_solver_type cusparse, it seems that the iterative solver is
>>> running on the GPU.
>>> Chang
>>>
>>> On 10/13/21 12:03 PM, Mark Adams wrote:
>>> On Wed, Oct 13, 2021 at 11:10 AM Chang Liu <c...@pppl.gov> wrote:
>>> > Thank you Junchao for explaining this. I guess in my case the code is
>>> > just calling a seq solver like superlu to do the factorization on GPUs.
>>> > My idea is that I want to have a traditional MPI code utilize GPUs with
>>> > cusparse. Right now cusparse does not support mpiaij matrices,
>>> Sure it does: '-mat_type aijcusparse' will give you an mpiaijcusparse
>>> matrix with more than one process. (-mat_type mpiaijcusparse might also
>>> work with more than one process.)
>>> However, I see in grepping the repo that all the mumps and superlu tests
>>> use the aij or sell matrix type.
>>> MUMPS and SuperLU provide their own solves, I assume... but you might want
>>> to do other matrix operations on the GPU. Is that the issue?
>>> Did you try -mat_type aijcusparse with MUMPS and/or SuperLU and have a
>>> problem? (There is no test with it, so it probably does not work.)
>>> Thanks,
>>> Mark
>>> > so I want the code to have an mpiaij matrix when adding all the matrix
>>> > terms, and then transform the matrix to seqaij when doing the
>>> > factorization and solve. This involves sending the data to the master
>>> > process, and I think the petsc mumps solver has something similar
>>> > already.
>>> > Chang
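A small sketch of Mark's point about '-mat_type aijcusparse': the same request
made in C resolves to the sequential or MPI cusparse type depending on the
communicator size. The matrix size n and the function name are arbitrary
example values:

    #include <petscmat.h>

    /* Sketch: MATAIJCUSPARSE is a dispatch type that becomes MATSEQAIJCUSPARSE
       on one rank and MATMPIAIJCUSPARSE on more than one. */
    PetscErrorCode CreateCusparseMat(MPI_Comm comm, PetscInt n, Mat *A)
    {
      MatType        type;
      PetscErrorCode ierr;

      ierr = MatCreate(comm, A);CHKERRQ(ierr);
      ierr = MatSetSizes(*A, PETSC_DECIDE, PETSC_DECIDE, n, n);CHKERRQ(ierr);
      ierr = MatSetType(*A, MATAIJCUSPARSE);CHKERRQ(ierr);  /* same effect as -mat_type aijcusparse */
      ierr = MatSetUp(*A);CHKERRQ(ierr);
      ierr = MatGetType(*A, &type);CHKERRQ(ierr);
      ierr = PetscPrintf(comm, "matrix type resolved to %s\n", type);CHKERRQ(ierr);
      return 0;
    }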
>>> On 10/13/21 10:18 AM, Junchao Zhang wrote:
>>> On Tue, Oct 12, 2021 at 1:07 PM Mark Adams <mfad...@lbl.gov> wrote:
>>> On Tue, Oct 12, 2021 at 1:45 PM Chang Liu <c...@pppl.gov> wrote:
>>> >> Hi Mark,
>>> >> The option I use is like
>>> >> -pc_type bjacobi -pc_bjacobi_blocks 16 -ksp_type fgmres -mat_type
>>> >> aijcusparse -sub_pc_factor_mat_solver_type cusparse -sub_ksp_type
>>> >> preonly -sub_pc_type lu -ksp_max_it 2000 -ksp_rtol 1.e-300
>>> >> -ksp_atol 1.e-300
>>> >
>>> > Note, if you use -log_view the last column (the rows are the methods,
>>> > like MatFactorNumeric) has the percent of the work done on the GPU.
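A rough C counterpart of the -pc_bjacobi_blocks/-sub_* options in that command
line, for configuring the sub-solvers programmatically. It assumes the KSP
already has its operators set, and the value 16 simply mirrors
-pc_bjacobi_blocks 16:

    #include <petscksp.h>

    /* Sketch: after the bjacobi PC is set up, each local block has its own
       KSP/PC that can be configured directly, as the -sub_* options do. */
    PetscErrorCode ConfigureBJacobiSubSolvers(KSP ksp)
    {
      PC             pc, subpc;
      KSP           *subksp;
      PetscInt       nlocal, first, i;
      PetscErrorCode ierr;

      ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
      ierr = PCSetType(pc, PCBJACOBI);CHKERRQ(ierr);
      ierr = PCBJacobiSetTotalBlocks(pc, 16, NULL);CHKERRQ(ierr);
      ierr = KSPSetUp(ksp);CHKERRQ(ierr);            /* the blocks exist only after setup */
      ierr = PCBJacobiGetSubKSP(pc, &nlocal, &first, &subksp);CHKERRQ(ierr);
      for (i = 0; i < nlocal; i++) {
        /* same as -sub_ksp_type preonly -sub_pc_type lu
                   -sub_pc_factor_mat_solver_type cusparse */
        ierr = KSPSetType(subksp[i], KSPPREONLY);CHKERRQ(ierr);
        ierr = KSPGetPC(subksp[i], &subpc);CHKERRQ(ierr);
        ierr = PCSetType(subpc, PCLU);CHKERRQ(ierr);
        ierr = PCFactorSetMatSolverType(subpc, MATSOLVERCUSPARSE);CHKERRQ(ierr);
      }
      return 0;
    }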
>>> > Junchao: *this* implies that we have a cuSparse LU factorization. Is
>>> > that correct? (I don't think we do.)
>>>
>>> No, we don't have cuSparse LU factorization. If you check
>>> MatLUFactorSymbolic_SeqAIJCUSPARSE(), you will find it calls
>>> MatLUFactorSymbolic_SeqAIJ() instead.
>>> So I don't understand Chang's idea. Do you want to make bigger blocks?
>>>
>>> >> I think this one does both factorization and solve on the GPU.
>>> >> You can check the runex72_aijcusparse.sh file in the petsc install
>>> >> directory and try it yourself (this is only the LU factorization,
>>> >> without an iterative solve).
>>> >> Chang
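Putting Junchao's description into a sketch: a direct solve of one sequential
AIJCUSPARSE block through the "cusparse" solver package, where the LU
factorization itself still runs on the CPU and the triangular solves run on
the GPU. Aseq, b, and x are assumed to live on PETSC_COMM_SELF, with Aseq of
type MATSEQAIJCUSPARSE; the query at the end just reports which package
provided the factorization:

    #include <petscksp.h>

    /* Illustrative sketch, not PETSc internals. */
    PetscErrorCode DirectSolveBlockWithCusparse(Mat Aseq, Vec b, Vec x)
    {
      KSP            ksp;
      PC             pc;
      Mat            F;
      MatSolverType  stype;
      PetscErrorCode ierr;

      ierr = KSPCreate(PETSC_COMM_SELF, &ksp);CHKERRQ(ierr);
      ierr = KSPSetOperators(ksp, Aseq, Aseq);CHKERRQ(ierr);
      ierr = KSPSetType(ksp, KSPPREONLY);CHKERRQ(ierr);
      ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
      ierr = PCSetType(pc, PCLU);CHKERRQ(ierr);
      ierr = PCFactorSetMatSolverType(pc, MATSOLVERCUSPARSE);CHKERRQ(ierr);
      ierr = KSPSetUp(ksp);CHKERRQ(ierr);
      ierr = PCFactorGetMatrix(pc, &F);CHKERRQ(ierr);          /* the factored matrix */
      ierr = MatFactorGetSolverType(F, &stype);CHKERRQ(ierr);
      ierr = PetscPrintf(PETSC_COMM_SELF, "factor package: %s\n", stype);CHKERRQ(ierr);
      ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
      ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
      return 0;
    }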
>>> On 10/12/21 1:17 PM, Mark Adams wrote:
>>> On Tue, Oct 12, 2021 at 11:19 AM Chang Liu <c...@pppl.gov> wrote:
>>> > Hi Junchao,
>>> > No, I only need it to be transferred within a node. I use the
>>> > block-Jacobi method and GMRES to solve the sparse matrix, so each direct
>>> > solver will take care of a sub-block of the whole matrix. In this way, I
>>> > can use one GPU to solve one sub-block, which is stored within one node.
>>> > It was stated in the documentation that the cusparse solver is slow.
>>> > However, in my test using ex72.c, the cusparse solver is faster than
>>> > mumps or superlu_dist on CPUs.
>>> Are we talking about the factorization, the solve, or both?
>>> We do not have an interface to cuSparse's LU factorization (I just learned
>>> that it exists a few weeks ago).
>>> Perhaps your fast "cusparse solver" is '-pc_type lu -mat_type aijcusparse'?
>>> This would be the CPU factorization, which is the dominant cost.
>>> > Chang
>>>
>>> On 10/12/21 10:24 AM, Junchao Zhang wrote:
>>> Hi, Chang,
>>> For the mumps solver, we usually transfer matrix and vector data within a
>>> compute node. For the idea you propose, it looks like we need to gather
>>> data within MPI_COMM_WORLD, right?
>>> Mark, I remember you said the cusparse solve is slow and you would rather
>>> do it on the CPU. Is that right?
>>> --Junchao Zhang
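The "within a compute node" versus MPI_COMM_WORLD distinction Junchao raises
can be made concrete with MPI's shared-memory split; a sketch (gathering one
block per GPU would instead need sub-communicators chosen per block, not per
node):

    #include <petscsys.h>

    /* Sketch: build one sub-communicator per shared-memory node, which is
       the scope the MUMPS OpenMP path gathers data over. */
    PetscErrorCode BuildNodeLocalComm(MPI_Comm *nodecomm)
    {
      PetscMPIInt rank;

      MPI_Comm_rank(PETSC_COMM_WORLD, &rank);
      /* all ranks that can share memory (same node) land in the same nodecomm */
      MPI_Comm_split_type(PETSC_COMM_WORLD, MPI_COMM_TYPE_SHARED, rank,
                          MPI_INFO_NULL, nodecomm);
      return 0;
    }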
>>> On Mon, Oct 11, 2021 at 10:25 PM Chang Liu via petsc-users <petsc-users@mcs.anl.gov> wrote:
>>> Hi,
>>> Currently, it is possible to use the mumps solver in PETSc with the
>>> -mat_mumps_use_omp_threads option, so that multiple MPI processes will
>>> transfer the matrix and rhs data to the master rank, and then the master
>>> rank will call mumps with OpenMP to solve the matrix.
>>> I wonder if someone can develop a similar option for the cusparse solver.
>>> Right now, this solver does not work with mpiaijcusparse. I think a
>>> possible workaround is to transfer all the matrix data to one MPI process,
>>> and then upload the data to the GPU to solve. In this way, one can use the
>>> cusparse solver for an MPI program.
>>> Chang
>>> --
>>> Chang Liu
>>> Staff Research Physicist
>>> +1 609 243 3438
>>> c...@pppl.gov
>>> Princeton Plasma Physics Laboratory
>>> 100 Stellarator Rd, Princeton NJ 08540, USA
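A sketch of the "upload the data to GPU" step in the workaround Chang proposes:
once a block has been gathered onto one rank as a plain SEQAIJ matrix,
converting it in place to SEQAIJCUSPARSE lets the subsequent solves go through
cusparse. seqBlock and the function name are assumptions for illustration:

    #include <petscmat.h>

    /* Sketch: in-place conversion so that the factorization/solve that
       follows uses the cusparse-backed sequential type. */
    PetscErrorCode UploadBlockToGPU(Mat *seqBlock)
    {
      PetscErrorCode ierr;

      ierr = MatConvert(*seqBlock, MATSEQAIJCUSPARSE, MAT_INPLACE_MATRIX, seqBlock);CHKERRQ(ierr);
      return 0;
    }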
> --
> Chang Liu
> Staff Research Physicist
> +1 609 243 3438
> c...@pppl.gov
> Princeton Plasma Physics Laboratory
> 100 Stellarator Rd, Princeton NJ 08540, USA