Currently we are forming the sparse matrices explicitly, but I think the goal is to move toward matrix-free methods and use a stencil, which I suppose is well suited to GPUs and more efficient. On the other hand, I've also read in the manual about matrix-free operations on CPUs alone. Would there still be any benefit to switching to GPU? (Matrix-free in PETSc looks rather straightforward to use, whereas writing the kernel function for a GPU stencil would require quite a lot of work.)
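For reference, the matrix-free route in PETSc goes through a MATSHELL, where you register your own matrix-vector product instead of storing a sparse matrix. Below is a minimal serial sketch under stated assumptions: the operator is a hypothetical 1-D three-point stencil, and the names `StencilMult` and `MakeShell` are illustrative, not from any existing code. (A real PETSc build is required to compile this, and a parallel version would additionally need ghost-point communication.)

```c
#include <petscmat.h>

/* Hypothetical matrix-free mat-vec: apply a 1-D three-point
   second-difference stencil to x, storing the result in y.
   No matrix entries are ever stored.                         */
static PetscErrorCode StencilMult(Mat A, Vec x, Vec y)
{
  PetscInt           i, n;
  const PetscScalar *xa;
  PetscScalar       *ya;
  PetscErrorCode     ierr;

  ierr = VecGetSize(x, &n); CHKERRQ(ierr);
  ierr = VecGetArrayRead(x, &xa); CHKERRQ(ierr);
  ierr = VecGetArray(y, &ya); CHKERRQ(ierr);
  for (i = 0; i < n; i++) {
    PetscScalar left  = (i > 0)     ? xa[i-1] : 0.0;
    PetscScalar right = (i < n - 1) ? xa[i+1] : 0.0;
    ya[i] = 2.0*xa[i] - left - right;  /* stencil applied on the fly */
  }
  ierr = VecRestoreArrayRead(x, &xa); CHKERRQ(ierr);
  ierr = VecRestoreArray(y, &ya); CHKERRQ(ierr);
  return 0;
}

/* Create the shell matrix and register the product; the Mat can
   then be handed to a KSP like any explicitly formed matrix.     */
static PetscErrorCode MakeShell(MPI_Comm comm, PetscInt n, Mat *A)
{
  PetscErrorCode ierr;
  ierr = MatCreateShell(comm, PETSC_DECIDE, PETSC_DECIDE, n, n, NULL, A); CHKERRQ(ierr);
  ierr = MatShellSetOperation(*A, MATOP_MULT, (void (*)(void))StencilMult); CHKERRQ(ierr);
  return 0;
}
```

This is exactly the routine Barry refers to below: for a GPU matrix-free solve, the body of `StencilMult` is what would have to be rewritten as a CUDA kernel.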
Thanks!
Yuyun

From: Smith, Barry F. <bsm...@mcs.anl.gov>
Sent: Friday, March 15, 2019 5:43:23 PM
To: Yuyun Yang
Cc: Matthew Knepley; petsc-users@mcs.anl.gov
Subject: Re: [petsc-users] Using PETSc with GPU

> On Mar 15, 2019, at 7:33 PM, Yuyun Yang via petsc-users <petsc-users@mcs.anl.gov> wrote:
>
> Thanks Matt, I've seen that page, but there isn't that much documentation, and there is only one CUDA example, so I wanted to check if there may be more references or examples somewhere else. We have very large linear systems that need to be solved every time step, and which involve matrix-matrix multiplications,

   Where do these matrix-matrix multiplications appear? Are you providing a "matrix-free" operator for your linear system, where you apply matrix-vector operations via a subroutine call? Or are you explicitly forming sparse matrices and using them to define the operator?

> so we thought GPU could have some benefits, but we are unsure how difficult it is to migrate parts of the code to GPU with PETSc. From that webpage it seems like we only need to specify the Vec / Mat option on the command line and maybe change a few functions to use CUDA? The CUDA example, however, also involves using Thrust and programming a kernel function, so I want to make sure I know how this works before trying to implement.

   How much, if any, CUDA/GPU code you have to write depends on what you want to have done on the GPU. If you provide a sparse matrix and only want the system solve to take place on the GPU, then you don't need to write any CUDA/GPU code; you just use the "CUDA" vector and matrix classes. If you are doing "matrix-free" solves and you provide the routine that performs the matrix-vector product, then you need to write/optimize that routine for CUDA/GPU.
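To illustrate the "no CUDA code" path Barry describes: with an explicitly formed sparse matrix, the solve can be moved to the GPU purely through runtime options, along the lines of (`./myapp` is a placeholder for your executable; exact type names come from the PETSc GPU page and can differ between PETSc versions):

```
./myapp -vec_type cuda -mat_type aijcusparse -ksp_type cg -pc_type jacobi
```

The assembly code is unchanged; the `cuda` vector and `aijcusparse` matrix classes make the KSP iterations run on the GPU.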
  Barry

> Thanks a lot,
> Yuyun
>
> From: Matthew Knepley <knep...@gmail.com>
> Sent: Friday, March 15, 2019 2:54:02 PM
> To: Yuyun Yang
> Cc: petsc-users@mcs.anl.gov
> Subject: Re: [petsc-users] Using PETSc with GPU
>
> On Fri, Mar 15, 2019 at 5:30 PM Yuyun Yang via petsc-users <petsc-users@mcs.anl.gov> wrote:
>> Hello team,
>>
>> Our group is thinking of using GPUs for the linear solves in our code, which is written in PETSc. I was reading the 2013 book chapter on the implementation of PETSc using GPUs, but wonder if there is a more updated reference I could check out? I also saw one example CUDA code online (using Thrust), but would like to check with you whether there is more complete documentation of how the GPU implementation is done?
>
> Have you seen this page? https://www.mcs.anl.gov/petsc/features/gpus.html
>
> Also, before using GPUs, I would take some time to understand what you think the possible benefit can be. For example, there is almost no benefit if you use BLAS1, and you would have a huge maintenance burden with a different toolchain. This is also largely true for SpMV, since the bandwidth difference between CPUs and GPUs is now not much. So you really should have some kind of flop-intensive (BLAS3-like) work in there somewhere, or it's hard to see your motivation.
>
>   Thanks,
>
>     Matt
>
>> Thanks very much!
>>
>> Best regards,
>> Yuyun
>
> --
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/