Sherry, sorry to ping you again for this issue.

--Junchao Zhang
On Tue, Oct 11, 2022 at 11:04 AM Junchao Zhang <junchao.zh...@gmail.com> wrote:

> Hi, Sherry,
> A petsc user wants to call MatSolve(mat, b, x) multiple times with
> different b on GPUs. In petsc, the code is like
>
>   PetscScalar *bptr = NULL;
>   VecGetArray(b, &bptr)
>   pdgssvx3d(.., bptr, ..);
>
> Note VecGetArray() returns a host pointer. If vector b's latest data is on
> GPU, PETSc needs to do a device to host memory copy. Now we want to save
> this memory copy and directly pass b's device pointer (obtained
> via VecCUDAGetArray()) to superlu_dist. But I did not find a mechanism for
> me to tell superlu_dist that bptr is a device pointer.
> Do you have suggestions?
>
> Thanks
>
> On Thu, Oct 6, 2022 at 2:32 PM Sajid Ali <sajidsyed2...@u.northwestern.edu>
> wrote:
>
>> Hi PETSc-developers,
>>
>> Does PETSc currently provide (either native or third party support) for
>> MatSolve that can be performed entirely on a GPU given a factored matrix?
>> i.e. a direct solver that would store the factors L and U on the device and
>> use the GPU to solve the linear system. It does not matter if the GPU is
>> not used for the factorization as we intend to solve the same linear system
>> for 100s of iterations and thus try to prevent GPU->CPU transfers for the
>> MatSolve phase.
>>
>> Currently, I've built PETSc@main (commit 9c433d, 10/03) with
>> superlu-dist@develop, both of which are configured with CUDA. With this,
>> I'm seeing that each call to PCApply/MatSolve involves one GPU->CPU
>> transfer. Is it possible to avoid this?
>>
>> Thank You,
>> Sajid Ali (he/him) | Research Associate
>> Scientific Computing Division
>> Fermi National Accelerator Laboratory
>> s-sajid-ali.github.io