Hi, Maruthi,

I could run your example on my machine. BTW, I added these at the end of main() to free the PETSc objects:

  VecDestroy(&vout);
  VecDestroy(&x);
  VecDestroy(&b);
  VecDestroy(&u);
  MatDestroy(&A);
  VecScatterDestroy(&ctx);
  KSPDestroy(&ksp);
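With error checking, and followed by PetscFinalize(), the tail of main() would then look roughly like this (a minimal sketch only, reusing the object names above):

  PetscCall(VecScatterDestroy(&ctx));
  PetscCall(VecDestroy(&vout));
  PetscCall(VecDestroy(&x));
  PetscCall(VecDestroy(&b));
  PetscCall(VecDestroy(&u));
  PetscCall(MatDestroy(&A));
  PetscCall(KSPDestroy(&ksp));
  PetscCall(PetscFinalize()); /* frees PETSc's internal objects; prints -log_view output if requested */
  return 0;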
If you use cuda-12.2, maybe the problem is already fixed by MR
https://gitlab.com/petsc/petsc/-/merge_requests/6828
You can try the petsc/main branch. Note your petsc version is from 2023-08-13.

Thanks.
--Junchao Zhang

On Mon, Sep 11, 2023 at 12:10 PM Maruthi NH <maruth...@gmail.com> wrote:

> Hi Barry Smith,
>
> Thanks for the quick response.
>
> Here is the code I used to test PETSc on GPU.
> This is the command I used to run it:
>
> mpiexec.hydra -n 1 ./heat_diff_cu -Nx 10000000 -ksp_type gmres
> -mat_type aijcusparse -vec_type cuda -use_gpu_aware_mpi 0
> -pc_type gamg -ksp_converged_reason
>
> Regards,
> Maruthi
>
>
> On Sun, Sep 10, 2023 at 11:37 PM Barry Smith <bsm...@petsc.dev> wrote:
>
>> On Sep 10, 2023, at 5:54 AM, Maruthi NH <maruth...@gmail.com> wrote:
>>
>> Hi all,
>>
>> I am trying to accelerate the linear solver with the PETSc GPU backend.
>> For testing I have a simple 1D heat diffusion solver; here are some
>> observations.
>>
>> 1. If I use -pc_type gamg, it throws the following error:
>>
>> ** On entry to cusparseCreateCsr() parameter number 5 (csrRowOffsets)
>>    had an illegal value: NULL pointer
>>
>> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>> [0]PETSC ERROR: GPU error
>> [0]PETSC ERROR: cuSPARSE errorcode 3 (CUSPARSE_STATUS_INVALID_VALUE) : invalid value
>> [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
>> [0]PETSC ERROR: Petsc Development GIT revision: v3.19.4-959-g92f1e92e88  GIT Date: 2023-08-13 19:43:04 +0000
>>
>>    Can you share the code that triggers this?
>>
>> 2. The default pc ilu takes about 1.2 seconds on a single CPU and about
>> 105.9 seconds on a GPU. Similar observations with -pc_type asm.
>> I have an NVIDIA RTX A2000 8GB Laptop GPU.
>>
>>    This is expected. The triangular solves serialize on the GPU, so they
>>    are naturally extremely slow since they cannot take advantage of the
>>    massive parallelism of the GPU.
>>
>> 3. What could I be missing? Also, are there any general guidelines for
>> better GPU performance using PETSc?
>>
>> Regards,
>> Maruthi
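PS: for anyone following the thread without the attachment, below is a minimal, self-contained sketch of this kind of test (a 1D Laplacian solve; this is not Maruthi's actual code, and the names and default size are made up). The *SetFromOptions() calls are what let -mat_type aijcusparse, -vec_type cuda, -ksp_type and -pc_type take effect at runtime:

  #include <petscksp.h>

  int main(int argc, char **argv)
  {
    Mat      A;
    Vec      x, b;
    KSP      ksp;
    PetscInt N = 100, rstart, rend;

    PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
    PetscCall(PetscOptionsGetInt(NULL, NULL, "-Nx", &N, NULL));

    /* 1D Laplacian; MatSetFromOptions() picks up -mat_type aijcusparse */
    PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
    PetscCall(MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, N, N));
    PetscCall(MatSetFromOptions(A));
    PetscCall(MatSetUp(A));
    PetscCall(MatGetOwnershipRange(A, &rstart, &rend));
    for (PetscInt i = rstart; i < rend; i++) {
      if (i > 0) PetscCall(MatSetValue(A, i, i - 1, -1.0, INSERT_VALUES));
      if (i < N - 1) PetscCall(MatSetValue(A, i, i + 1, -1.0, INSERT_VALUES));
      PetscCall(MatSetValue(A, i, i, 2.0, INSERT_VALUES));
    }
    PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
    PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));

    /* Vectors created from A inherit its (GPU) type; a standalone vector would
       instead use VecCreate() + VecSetSizes() + VecSetFromOptions() to honor -vec_type cuda */
    PetscCall(MatCreateVecs(A, &x, &b));
    PetscCall(VecSet(b, 1.0));

    /* KSPSetFromOptions() picks up -ksp_type gmres, -pc_type gamg, -ksp_converged_reason, ... */
    PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
    PetscCall(KSPSetOperators(ksp, A, A));
    PetscCall(KSPSetFromOptions(ksp));
    PetscCall(KSPSolve(ksp, b, x));

    PetscCall(VecDestroy(&x));
    PetscCall(VecDestroy(&b));
    PetscCall(MatDestroy(&A));
    PetscCall(KSPDestroy(&ksp));
    PetscCall(PetscFinalize());
    return 0;
  }

Built against a CUDA-enabled PETSc, it can then be run with options like those in the command above, e.g. -Nx 10000000 -ksp_type gmres -mat_type aijcusparse -vec_type cuda -pc_type gamg -ksp_converged_reason.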