> On 14 Dec 2023, at 8:02 PM, Sreeram R Venkat <srven...@utexas.edu> wrote:
>
> Hello Pierre,
>
> Thank you for your reply. I tried out the HPDDM CG as you said, and it seems to be doing the batched solves, but the KSP is not converging due to a NaN or Inf being generated. I also noticed there are a lot of host-to-device and device-to-host copies of the matrices (the non-batched KSP solve did not have any memcopies). I have attached dump.0 again. Could you please take a look?
Yes, but you’d need to send me something I can run with your set of options (if you are more comfortable doing this in private, you can remove the list from Cc). Not all BoomerAMG smoothers handle blocks of right-hand sides, and there is not much error checking, so instead of erroring out, this may be why you are getting garbage.

Thanks,
Pierre

> Thanks,
> Sreeram
>
> On Thu, Dec 14, 2023 at 12:42 AM Pierre Jolivet <pie...@joliv.et> wrote:
>> Hello Sreeram,
>> KSPCG (the PETSc implementation of CG) does not handle solves with multiple columns at once.
>> There is only a single native PETSc KSP implementation which handles solves with multiple columns at once: KSPPREONLY.
>> If you use --download-hpddm, you can use a CG (or GMRES, or more advanced methods) implementation which handles solves with multiple columns at once (via -ksp_type hpddm -ksp_hpddm_type cg, or KSPSetType(ksp, KSPHPDDM); KSPHPDDMSetType(ksp, KSP_HPDDM_TYPE_CG);).
>> I’m the main author of HPDDM; there is preliminary support for device matrices, but if it’s not working as intended/not faster than column by column, I’d be happy to have a deeper look (maybe in private), because most (if not all) of my users interested in (pseudo-)block Krylov solvers (i.e., solvers that treat right-hand sides in a single go) are using plain host matrices.
>>
>> Thanks,
>> Pierre
>>
>> PS: you could have a look at https://www.sciencedirect.com/science/article/abs/pii/S0898122121000055 to understand the philosophy behind block iterative methods in PETSc (and in HPDDM). src/mat/tests/ex237.c, the benchmark I mentioned earlier, was developed in the context of this paper to produce Figures 2-3. Note that this paper is now slightly outdated; since then, PCHYPRE and PCMG (among others) have been made “PCMatApply()-ready”.
>>
>>> On 13 Dec 2023, at 11:05 PM, Sreeram R Venkat <srven...@utexas.edu> wrote:
>>>
>>> Hello Pierre,
>>>
>>> I am trying out KSPMatSolve with the BoomerAMG preconditioner. However, I am noticing that it is still solving column by column (this is stated explicitly in the attached info dump). I looked at the code for KSPMatSolve_Private() and saw that as long as ksp->ops->matsolve is true, it should do the batched solve, though I'm not sure where that gets set.
>>>
>>> I am using the options -pc_type hypre -pc_hypre_type boomeramg when running the code.
>>>
>>> Can you please help me with this?
>>>
>>> Thanks,
>>> Sreeram
>>>
>>> On Thu, Dec 7, 2023 at 4:04 PM Mark Adams <mfad...@lbl.gov> wrote:
>>>> N.B., the AMGX interface is a bit experimental.
>>>> Mark
>>>>
>>>> On Thu, Dec 7, 2023 at 4:11 PM Sreeram R Venkat <srven...@utexas.edu> wrote:
>>>>> Oh, in that case I will try out BoomerAMG. Getting AMGX to build correctly was also tricky, so hopefully the HYPRE build will be easier.
>>>>>
>>>>> Thanks,
>>>>> Sreeram
>>>>>
>>>>> On Thu, Dec 7, 2023, 3:03 PM Pierre Jolivet <pie...@joliv.et> wrote:
>>>>>>
>>>>>>> On 7 Dec 2023, at 9:37 PM, Sreeram R Venkat <srven...@utexas.edu> wrote:
>>>>>>>
>>>>>>> Thank you Barry and Pierre; I will proceed with the first option.
>>>>>>>
>>>>>>> I want to use the AMGX preconditioner for the KSP. I will try it out and see how it performs.
>>>>>>
>>>>>> Just FYI, AMGX does not handle systems with multiple RHS, and thus has no PCMatApply() implementation.
>>>>>> BoomerAMG does, and there is a PCMatApply_HYPRE_BoomerAMG() implementation.
>>>>>> But let us know if you need assistance figuring things out.
>>>>>>
>>>>>> Thanks,
>>>>>> Pierre
>>>>>>
>>>>>>> Thanks,
>>>>>>> Sreeram
>>>>>>>
>>>>>>> On Thu, Dec 7, 2023 at 2:02 PM Pierre Jolivet <pie...@joliv.et> wrote:
>>>>>>>> To expand on Barry’s answer, we have observed repeatedly that MatMatMult with MatAIJ performs better than MatMult with MatMAIJ; you can reproduce this on your own with https://petsc.org/release/src/mat/tests/ex237.c.html.
>>>>>>>> Also, I’m guessing you are using some sort of preconditioner within your KSP.
>>>>>>>> Not all are “KSPMatSolve-ready”, i.e., they may treat blocks of right-hand sides column by column, which is very inefficient.
>>>>>>>> You could run your code with -info dump and send us dump.0 to see what needs to be done on our end to make things more efficient, should you not be satisfied with the current performance of the code.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Pierre
>>>>>>>>
>>>>>>>>> On 7 Dec 2023, at 8:34 PM, Barry Smith <bsm...@petsc.dev> wrote:
>>>>>>>>>
>>>>>>>>>> On Dec 7, 2023, at 1:17 PM, Sreeram R Venkat <srven...@utexas.edu> wrote:
>>>>>>>>>>
>>>>>>>>>> I have 2 sequential matrices M and R (both MATSEQAIJCUSPARSE, of size n x n) and a vector v of size n*m. v = [v_1, v_2, ..., v_m], where each v_i has size n. The data for v can be stored either in column-major or row-major order. Now, I want to do 2 types of operations:
>>>>>>>>>>
>>>>>>>>>> 1. Matvecs of the form M*v_i = w_i, for i = 1..m.
>>>>>>>>>> 2. KSPSolves of the form R*x_i = v_i, for i = 1..m.
>>>>>>>>>>
>>>>>>>>>> From what I have read in the documentation, I can think of 2 approaches.
>>>>>>>>>>
>>>>>>>>>> 1. Get the pointer to the data in v (column-major) and use it to create a dense matrix V. Then do a MatMatMult with M*V = W, and take the data pointer of W to create the vector w. For KSPSolves, use KSPMatSolve with R and V.
>>>>>>>>>>
>>>>>>>>>> 2. Create a MATMAIJ using M/R and use that for matvecs directly with the vector v. I don't know if KSPSolve with the MATMAIJ will know that it is a multiple-RHS system and act accordingly.
>>>>>>>>>>
>>>>>>>>>> Which would be the more efficient option?
>>>>>>>>>
>>>>>>>>> Use 1.
>>>>>>>>>
>>>>>>>>>> As a side note, I am also wondering if there is a way to use row-major storage of the vector v.
>>>>>>>>>
>>>>>>>>> No.
>>>>>>>>>
>>>>>>>>>> The reason is that this could allow for more coalesced memory access when doing matvecs.
>>>>>>>>>
>>>>>>>>> PETSc matrix-vector products use BLAS GEMV for the computation, so in theory they should already be well optimized.
>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Sreeram
>>>>>>>>
>>>>>>
>>> <dump.0>
>>
> <dump.0>
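For readers following the thread, here is a minimal sketch of option 1 as discussed above (wrap the column-major data of v as a dense matrix, use MatMatMult for the batched matvecs, and KSPMatSolve for the multiple right-hand-side solves). The names M, R, v, n, m follow the original question; the function name, the sequential communicator, and the use of host arrays are assumptions, not code from the thread. For data living on the GPU (MATSEQAIJCUSPARSE), one would presumably switch to MatCreateDenseCUDA()/VecCUDAGetArray() to avoid host copies.

    #include <petscksp.h>

    /* Sketch only: assumes M, R (MATSEQAIJCUSPARSE, n x n) and v (size n*m,
       column-major) already exist, and that "ksp" has been set up with
       KSPSetOperators(ksp, R, R). */
    PetscErrorCode SolveBlocks(Mat M, KSP ksp, Vec v, PetscInt n, PetscInt m, Mat *W, Mat *X)
    {
      PetscScalar *varr;
      Mat          V;

      PetscFunctionBeginUser;
      /* Wrap the existing storage of v as an n x m dense matrix (no copy). */
      PetscCall(VecGetArray(v, &varr));
      PetscCall(MatCreateDense(PETSC_COMM_SELF, n, m, n, m, varr, &V));

      /* 1. All m matvecs at once: W = M*V */
      PetscCall(MatMatMult(M, V, MAT_INITIAL_MATRIX, PETSC_DEFAULT, W));

      /* 2. All m solves at once: R*X = V (done column by column only if the
         KSP/PC pair is not "KSPMatSolve-ready") */
      PetscCall(MatDuplicate(V, MAT_DO_NOT_COPY_VALUES, X));
      PetscCall(KSPMatSolve(ksp, V, *X));

      PetscCall(MatDestroy(&V));
      PetscCall(VecRestoreArray(v, &varr));
      PetscFunctionReturn(PETSC_SUCCESS);
    }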
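The KSP passed to KSPMatSolve above would be configured with the block-ready Krylov method and preconditioner named in the thread: KSPHPDDM's CG (which treats all right-hand sides in a single go) and BoomerAMG, whose PCMatApply_HYPRE_BoomerAMG() handles blocks of right-hand sides. A sketch of the in-code equivalent of the options quoted above, assuming ksp is an existing KSP and PETSc was configured with --download-hpddm --download-hypre:

    PC pc;
    PetscCall(KSPSetType(ksp, KSPHPDDM));
    PetscCall(KSPHPDDMSetType(ksp, KSP_HPDDM_TYPE_CG));
    PetscCall(KSPGetPC(ksp, &pc));
    PetscCall(PCSetType(pc, PCHYPRE));
    PetscCall(PCHYPRESetType(pc, "boomeramg"));
    /* Equivalent command-line options:
       -ksp_type hpddm -ksp_hpddm_type cg -pc_type hypre -pc_hypre_type boomeramg */
    PetscCall(KSPSetFromOptions(ksp));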