On Fri, Mar 18, 2022 at 10:28 AM Sajid Ali Syed <sas...@fnal.gov> wrote:
> Hi Matt/Mark,
>
> I'm working on a Poisson solver for a distributed PIC code, where the
> particles are distributed over MPI ranks rather than the grid. Prior to the
> solve, all particles are deposited onto a (DMDA) grid.
>
> The current prototype I have is that each rank holds a full-size DMDA
> vector and particles on that rank are deposited into it. Then, the data
> from all the local vectors is combined into multiple distributed DMDA
> vectors via VecScatters, and this is followed by solving the Poisson
> equation. The need to have multiple subcomms, each solving the same
> equation, is due to the fact that the grid size is too small to use all
> the MPI ranks (beyond the strong scaling limit). The solution is then
> scattered back to each MPI rank via VecScatters.
>
> This first local-to-(multi)global transfer required the use of multiple
> VecScatters as there is no one-to-multiple scatter capability in SF. This
> works and is already giving a large speedup over the allreduce baseline
> currently used (which transfers more data than is necessary).
>
> I was wondering if, within each subcommunicator, I could directly write to
> the DMDA vector via VecSetValues and PETSc would take care of stashing the
> values on the GPU until I call VecAssemblyBegin. Since this would be from
> within a Kokkos parallel_for operation, there would be multiple (probably
> ~1e3) simultaneous writes that the stashing mechanism would have to
> support. Currently, we use Kokkos-ScatterView to do this.
>

VecSetValues() only supports host data. I was wondering: to provide a
VecSetValues() that you could call inside a Kokkos parallel_for, does it
have to be a device function?

> Thank You,
> Sajid Ali (he/him) | Research Associate
> Scientific Computing Division
> Fermi National Accelerator Laboratory
> s-sajid-ali.github.io
>
> ------------------------------
> *From:* Matthew Knepley <knep...@gmail.com>
> *Sent:* Thursday, March 17, 2022 7:25 PM
> *To:* Mark Adams <mfad...@lbl.gov>
> *Cc:* Sajid Ali Syed <sas...@fnal.gov>; petsc-users@mcs.anl.gov
> <petsc-users@mcs.anl.gov>
> *Subject:* Re: [petsc-users] Regarding the status of VecSetValues(Blocked)
> for GPU vectors
>
> On Thu, Mar 17, 2022 at 8:19 PM Mark Adams <mfad...@lbl.gov> wrote:
>
> LocalToGlobal is a DM thing.
> Sajid, do you use DM?
> If you need to add off-processor entries, then DM could give you a local
> vector, as Matt said, that you can add to for off-processor values, and
> then you could use the CPU communication in DM.
>
> It would be GPU communication, not CPU.
>
>    Matt
>
> On Thu, Mar 17, 2022 at 7:19 PM Matthew Knepley <knep...@gmail.com> wrote:
>
> On Thu, Mar 17, 2022 at 4:46 PM Sajid Ali Syed <sas...@fnal.gov> wrote:
>
> Hi PETSc-developers,
>
> Is it possible to use VecSetValues with distributed-memory CUDA & Kokkos
> vectors from the device, i.e. can I call VecSetValues with GPU memory
> pointers and expect PETSc to figure out how to stash it on the device until
> I call VecAssemblyBegin (at which point PETSc could use GPU-aware MPI to
> populate off-process values)?
>
> If this is not currently supported, is supporting this on the roadmap?
> Thanks in advance!
>
> VecSetValues() will fall back to the CPU vector, so I do not think this
> will work on device.
>
> Usually, our assembly computes all values and puts them in a "local"
> vector, which you can access explicitly as Mark said. Then we call
> LocalToGlobal() to communicate the values, which does work directly on
> device using specialized code in VecScatter/PetscSF.
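
For concreteness, the local-vector + LocalToGlobal workflow described above
might look roughly like the following on a 2D DMDA. This is a minimal,
untested sketch: the DM da, the vector names, the per-particle cell index and
weight arrays are placeholders, and recent PetscCall() error checking is
assumed.

  #include <petscdmda.h>

  /* Sketch: deposit into a ghosted DMDA local vector on each rank, then let
     DMLocalToGlobal(ADD_VALUES) sum the ghost/off-process contributions into
     the distributed global vector. celli/cellj hold global DMDA indices that
     must lie inside this rank's ghosted region. */
  static PetscErrorCode DepositAndAssemble(DM da, Vec rho_global, PetscInt nlocal,
                                           const PetscInt *celli, const PetscInt *cellj,
                                           const PetscReal *weight)
  {
    Vec           rho_local;
    PetscScalar **rho; /* a 2D DMDA is assumed here */

    PetscFunctionBeginUser;
    PetscCall(DMGetLocalVector(da, &rho_local));
    PetscCall(VecZeroEntries(rho_local));
    PetscCall(VecZeroEntries(rho_global));

    /* Deposit particle charge into the local array, ghost cells included.
       In a GPU version this loop would become a Kokkos parallel_for writing
       to the device-side array of a VECKOKKOS local vector. */
    PetscCall(DMDAVecGetArray(da, rho_local, &rho));
    for (PetscInt p = 0; p < nlocal; ++p) rho[cellj[p]][celli[p]] += weight[p];
    PetscCall(DMDAVecRestoreArray(da, rho_local, &rho));

    /* ADD_VALUES sums ghost-cell contributions into the owning ranks; with
       device vectors this communication goes through VecScatter/PetscSF. */
    PetscCall(DMLocalToGlobalBegin(da, rho_local, ADD_VALUES, rho_global));
    PetscCall(DMLocalToGlobalEnd(da, rho_local, ADD_VALUES, rho_global));
    PetscCall(DMRestoreLocalVector(da, &rho_local));
    PetscFunctionReturn(0);
  }

With a device vector type for the DM (e.g. -dm_vec_type kokkos or cuda), the
LocalToGlobal communication is the part Matt notes can run directly on device.
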
>
> What are you trying to do?
>
>   Thanks,
>
>      Matt
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
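
For reference, the Kokkos-ScatterView deposition mentioned earlier follows
roughly this pattern. Again a minimal, untested sketch: the flattened 1D grid
view, the per-particle cell index map, and the weights are assumptions for
illustration.

  #include <Kokkos_Core.hpp>
  #include <Kokkos_ScatterView.hpp>

  // Sketch of ScatterView-based charge deposition on the device.
  void deposit(Kokkos::View<double *> grid,
               Kokkos::View<const int *> cell,      // cell index of each particle
               Kokkos::View<const double *> weight, // charge weight of each particle
               int nparticles)
  {
    Kokkos::deep_copy(grid, 0.0);
    // ScatterView resolves the races from many particles updating one cell
    auto scatter = Kokkos::Experimental::create_scatter_view(grid);

    Kokkos::parallel_for("deposit", nparticles, KOKKOS_LAMBDA(const int p) {
      auto access = scatter.access();
      access(cell(p)) += weight(p);
    });

    // Combine the per-thread (or atomic) partial sums back into grid
    Kokkos::Experimental::contribute(grid, scatter);
  }

A device-callable VecSetValues(), if provided, would presumably take the place
of the access(cell(p)) += weight(p) update, with PETSc stashing any
off-process entries until VecAssemblyBegin/End.
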