Hi Sherry,

A PETSc user wants to call MatSolve(mat, b, x) multiple times with different b on GPUs. In PETSc, the code looks like:

    PetscScalar *bptr = NULL;
    VecGetArray(b, &bptr);
    pdgssvx3d(.., bptr, ..);

Note that VecGetArray() returns a host pointer. If vector b's latest data is on the GPU, PETSc must do a device-to-host memory copy first. We want to avoid this copy and instead pass b's device pointer (obtained via VecCUDAGetArray()) directly to superlu_dist. However, I did not find a mechanism to tell superlu_dist that bptr is a device pointer. Do you have suggestions?

Thanks

On Thu, Oct 6, 2022 at 2:32 PM Sajid Ali <sajidsyed2...@u.northwestern.edu> wrote:
> Hi PETSc-developers,
>
> Does PETSc currently provide (either native or third-party) support for
> MatSolve that can be performed entirely on a GPU given a factored matrix?
> I.e., a direct solver that would store the factors L and U on the device and
> use the GPU to solve the linear system. It does not matter if the GPU is
> not used for the factorization, as we intend to solve the same linear system
> for 100s of iterations and thus try to prevent GPU->CPU transfers for the
> MatSolve phase.
>
> Currently, I've built PETSc@main (commit 9c433d, 10/03) with
> superlu-dist@develop, both of which are configured with CUDA. With this,
> I'm seeing that each call to PCApply/MatSolve involves one GPU->CPU
> transfer. Is it possible to avoid this?
>
> Thank You,
> Sajid Ali (he/him) | Research Associate
> Scientific Computing Division
> Fermi National Accelerator Laboratory
> s-sajid-ali.github.io
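To make the request concrete, here is a minimal sketch of the PETSc side of the proposed change, assuming a PETSc build configured with CUDA. The superlu_dist call is deliberately left as a commented placeholder, since (as noted above) pdgssvx3d currently has no flag or mechanism indicating that the right-hand-side buffer is a device pointer; the function name SolveWithDevicePointer is hypothetical.

```c
/* Sketch only, not a working implementation: it contrasts the current
 * host-pointer path with the desired device-pointer path.  Assumes PETSc
 * was configured --with-cuda. */
#include <petscvec.h>

PetscErrorCode SolveWithDevicePointer(Vec b)
{
  PetscScalar *bptr = NULL;

  PetscFunctionBegin;
  /* Current path: VecGetArray() returns a HOST pointer, which forces a
   * device-to-host copy whenever b's up-to-date data lives on the GPU:
   *
   *   PetscCall(VecGetArray(b, &bptr));
   */

  /* Desired path: VecCUDAGetArray() returns the DEVICE pointer directly,
   * with no copy. */
  PetscCall(VecCUDAGetArray(b, &bptr));

  /* pdgssvx3d(.., bptr, ..);   <-- would require some way to tell
   * superlu_dist that bptr is a device pointer, which is exactly what
   * is being asked for in this thread. */

  PetscCall(VecCUDARestoreArray(b, &bptr));
  PetscFunctionReturn(0);
}
```

The key point is that the two Get/Restore pairs have the same call shape, so the only missing piece is on the superlu_dist side.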