Hello Sreeram,
KSPCG (the PETSc implementation of CG) does not handle solves with multiple 
columns at once.
There is only a single native PETSc KSP implementation that handles solves 
with multiple columns at once: KSPPREONLY.
If you configure PETSc with --download-hpddm, you can use a CG (or GMRES, or 
more advanced) implementation that handles solves with multiple columns at 
once (via -ksp_type hpddm -ksp_hpddm_type cg, or, in code, KSPSetType(ksp, 
KSPHPDDM); KSPHPDDMSetType(ksp, KSP_HPDDM_TYPE_CG);).
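
In case it helps, here is a minimal, untested sketch of what that looks like in 
code; it assumes a recent PETSc configured with --download-hpddm, and the 1D 
Laplacian plus random right-hand sides are just placeholders for your actual 
operator and data:

#include <petscksp.h>

int main(int argc, char **argv)
{
  Mat      A, B, X;
  KSP      ksp;
  PetscInt n = 10, nrhs = 4, Istart, Iend, i;

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  /* placeholder SPD operator: 1D Laplacian */
  PetscCall(MatCreateAIJ(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, n, n, 3, NULL, 1, NULL, &A));
  PetscCall(MatGetOwnershipRange(A, &Istart, &Iend));
  for (i = Istart; i < Iend; i++) {
    PetscCall(MatSetValue(A, i, i, 2.0, INSERT_VALUES));
    if (i > 0) PetscCall(MatSetValue(A, i, i - 1, -1.0, INSERT_VALUES));
    if (i < n - 1) PetscCall(MatSetValue(A, i, i + 1, -1.0, INSERT_VALUES));
  }
  PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
  PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));
  /* dense blocks of right-hand sides and solutions, same row layout as A;
     an existing column-major data pointer could be passed instead of NULL */
  PetscCall(MatCreateDense(PETSC_COMM_WORLD, Iend - Istart, PETSC_DECIDE, n, nrhs, NULL, &B));
  PetscCall(MatCreateDense(PETSC_COMM_WORLD, Iend - Istart, PETSC_DECIDE, n, nrhs, NULL, &X));
  PetscCall(MatSetRandom(B, NULL));
  PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
  PetscCall(KSPSetOperators(ksp, A, A));
  PetscCall(KSPSetType(ksp, KSPHPDDM));
  PetscCall(KSPHPDDMSetType(ksp, KSP_HPDDM_TYPE_CG));
  PetscCall(KSPSetFromOptions(ksp)); /* the type can still be overridden on the command line */
  PetscCall(KSPMatSolve(ksp, B, X)); /* all nrhs columns treated in a single (pseudo-)block solve */
  PetscCall(KSPDestroy(&ksp));
  PetscCall(MatDestroy(&X));
  PetscCall(MatDestroy(&B));
  PetscCall(MatDestroy(&A));
  PetscCall(PetscFinalize());
  return 0;
}

With a PCMatApply()-ready preconditioner selected through -pc_type, the same 
KSPMatSolve() call will also apply the preconditioner to all columns at once.
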
I’m the main author of HPDDM. There is preliminary support for device matrices, 
but if it’s not working as intended or is not faster than solving column by 
column, I’d be happy to take a deeper look (maybe in private), because most (if 
not all) of my users interested in (pseudo-)block Krylov solvers (i.e., solvers 
that treat all right-hand sides in a single go) are using plain host matrices.

Thanks,
Pierre

PS: you could have a look at 
https://www.sciencedirect.com/science/article/abs/pii/S0898122121000055 to 
understand the philosophy behind block iterative methods in PETSc (and in 
HPDDM). src/mat/tests/ex237.c, the benchmark I mentioned earlier, was developed 
in the context of this paper to produce Figures 2-3. Note that the paper is now 
slightly outdated: since then, PCHYPRE and PCMG (among others) have been made 
“PCMatApply()-ready”.
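
For completeness, here is a rough, untested sketch of what applying BoomerAMG 
to a whole block of vectors through PCMatApply() could look like; it assumes 
PETSc was configured with HYPRE, and A, B, and Y are placeholders for your 
operator and for dense input/output blocks with the same row layout as A:

#include <petscksp.h>

static PetscErrorCode ApplyBoomerAMGToBlock(Mat A, Mat B, Mat Y)
{
  PC pc;

  PetscFunctionBeginUser;
  PetscCall(PCCreate(PetscObjectComm((PetscObject)A), &pc));
  PetscCall(PCSetOperators(pc, A, A));
  PetscCall(PCSetType(pc, PCHYPRE));
  PetscCall(PCHYPRESetType(pc, "boomeramg"));
  PetscCall(PCSetFromOptions(pc));
  PetscCall(PCSetUp(pc));
  /* all columns of B are preconditioned in one pass when the PC is "PCMatApply()-ready" */
  PetscCall(PCMatApply(pc, B, Y));
  PetscCall(PCDestroy(&pc));
  PetscFunctionReturn(PETSC_SUCCESS);
}

This is roughly what a KSPMatSolve() run with -pc_type hypre -pc_hypre_type 
boomeramg does under the hood once the PC implements PCMatApply().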

> On 13 Dec 2023, at 11:05 PM, Sreeram R Venkat <srven...@utexas.edu> wrote:
> 
> Hello Pierre,
> 
> I am trying out KSPMatSolve with the BoomerAMG preconditioner. However, I am 
> noticing that it is still solving column by column (this is stated explicitly 
> in the attached info dump). I looked at the code for KSPMatSolve_Private() and 
> saw that as long as ksp->ops->matsolve is true, it should do the batched 
> solve, though I'm not sure where that gets set.
> 
> I am using the options -pc_type hypre -pc_hypre_type boomeramg when running 
> the code.
> 
> Can you please help me with this?
> 
> Thanks,
> Sreeram
> 
> 
> On Thu, Dec 7, 2023 at 4:04 PM Mark Adams <mfad...@lbl.gov 
> <mailto:mfad...@lbl.gov>> wrote:
>> N.B., the AMGX interface is a bit experimental.
>> Mark
>> 
>> On Thu, Dec 7, 2023 at 4:11 PM Sreeram R Venkat <srven...@utexas.edu 
>> <mailto:srven...@utexas.edu>> wrote:
>>> Oh, in that case I will try out BoomerAMG. Getting AMGX to build correctly 
>>> was also tricky, so hopefully the HYPRE build will be easier.
>>> 
>>> Thanks,
>>> Sreeram
>>> 
>>> On Thu, Dec 7, 2023, 3:03 PM Pierre Jolivet <pie...@joliv.et 
>>> <mailto:pie...@joliv.et>> wrote:
>>>> 
>>>> 
>>>>> On 7 Dec 2023, at 9:37 PM, Sreeram R Venkat <srven...@utexas.edu 
>>>>> <mailto:srven...@utexas.edu>> wrote:
>>>>> 
>>>>> Thank you Barry and Pierre; I will proceed with the first option. 
>>>>> 
>>>>> I want to use the AMGX preconditioner for the KSP. I will try it out and 
>>>>> see how it performs.
>>>> 
>>>> Just FYI, AMGX does not handle systems with multiple RHS, and thus has no 
>>>> PCMatApply() implementation.
>>>> BoomerAMG does, and there is a PCMatApply_HYPRE_BoomerAMG() implementation.
>>>> But let us know if you need assistance figuring things out.
>>>> 
>>>> Thanks,
>>>> Pierre
>>>> 
>>>>> Thanks,
>>>>> Sreeram
>>>>> 
>>>>> On Thu, Dec 7, 2023 at 2:02 PM Pierre Jolivet <pie...@joliv.et 
>>>>> <mailto:pie...@joliv.et>> wrote:
>>>>>> To expand on Barry’s answer, we have observed repeatedly that MatMatMult 
>>>>>> with MatAIJ performs better than MatMult with MatMAIJ; you can reproduce 
>>>>>> this on your own with 
>>>>>> https://petsc.org/release/src/mat/tests/ex237.c.html.
>>>>>> Also, I’m guessing you are using some sort of preconditioner within your 
>>>>>> KSP.
>>>>>> Not all are “KSPMatSolve-ready”, i.e., they may treat blocks of 
>>>>>> right-hand sides column by column, which is very inefficient.
>>>>>> You could run your code with -info dump and send us dump.0 to see what 
>>>>>> needs to be done on our end to make things more efficient, should you 
>>>>>> not be satisfied with the current performance of the code.
>>>>>> 
>>>>>> Thanks,
>>>>>> Pierre
>>>>>> 
>>>>>>> On 7 Dec 2023, at 8:34 PM, Barry Smith <bsm...@petsc.dev 
>>>>>>> <mailto:bsm...@petsc.dev>> wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> On Dec 7, 2023, at 1:17 PM, Sreeram R Venkat <srven...@utexas.edu 
>>>>>>>> <mailto:srven...@utexas.edu>> wrote:
>>>>>>>> 
>>>>>>>> I have 2 sequential matrices M and R (both MATSEQAIJCUSPARSE of size n 
>>>>>>>> x n) and a vector v of size n*m. v = [v_1 , v_2 ,... , v_m] where v_i 
>>>>>>>> has size n. The data for v can be stored either in column-major or 
>>>>>>>> row-major order.  Now, I want to do 2 types of operations:
>>>>>>>> 
>>>>>>>> 1. Matvecs of the form M*v_i = w_i, for i = 1..m. 
>>>>>>>> 2. KSPSolves of the form R*x_i = v_i, for i = 1..m.
>>>>>>>> 
>>>>>>>> From what I have read in the documentation, I can think of 2 approaches.
>>>>>>>> 
>>>>>>>> 1. Get the pointer to the data in v (column-major) and use it to 
>>>>>>>> create a dense matrix V. Then do a MatMatMult with M*V = W, and take 
>>>>>>>> the data pointer of W to create the vector w. For KSPSolves, use 
>>>>>>>> KSPMatSolve with R and V.
>>>>>>>> 
>>>>>>>> 2. Create a MATMAIJ using M/R and use that for matvecs directly with 
>>>>>>>> the vector v. I don't know if KSPSolve with the MATMAIJ will know that 
>>>>>>>> it is a multiple RHS system and act accordingly.
>>>>>>>> 
>>>>>>>> Which would be the more efficient option?
>>>>>>> 
>>>>>>> Use 1. 
>>>>>>>> 
>>>>>>>> As a side-note, I am also wondering if there is a way to use row-major 
>>>>>>>> storage of the vector v.
>>>>>>> 
>>>>>>> No
>>>>>>> 
>>>>>>>> The reason is that this could allow for more coalesced memory access 
>>>>>>>> when doing matvecs.
>>>>>>> 
>>>>>>>   PETSc matrix-vector products use BLAS GEMV matrix-vector products for 
>>>>>>> the computation, so in theory they should already be well optimized.
>>>>>>> 
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Sreeram
>>>>>> 
>>>> 
> <dump.0>
