On Fri, Mar 18, 2022 at 10:28 AM Sajid Ali Syed <sas...@fnal.gov> wrote:
> Hi Matt/Mark,
>
> I'm working on a Poisson solver for a distributed PIC code, where the
> particles are distributed over MPI ranks rather than the grid. Prior to the
> solve, all particles are deposited onto a (DMDA) grid.
>
> The current prototype I have is that each rank holds a full-size DMDA
> vector and particles on that rank are deposited into it. Then, the data
> from all the local vectors is combined into multiple distributed DMDA
> vectors via VecScatters, and this is followed by solving the Poisson
> equation. The need to have multiple subcomms, each solving the same
> equation, is due to the fact that the grid size is too small to use all
> the MPI ranks (beyond the strong scaling limit). The solution is then
> scattered back to each MPI rank via VecScatters.
>
> This first local-to-(multi)global transfer required the use of multiple
> VecScatters as there is no one-to-multiple scatter capability in SF. This
> works and is already giving a large speedup over the allreduce baseline
> currently used (which transfers more data than is necessary).
>
> I was wondering if, within each subcommunicator, I could directly write to
> the DMDA vector via VecSetValues and PETSc would take care of stashing the
> values on the GPU until I call VecAssemblyBegin. Since this would be from
> within a Kokkos parallel_for operation, there would be multiple (probably
> ~1e3) simultaneous writes that the stashing mechanism would have to
> support. Currently, we use Kokkos-ScatterView to do this.
>

VecSetValues() only supports host data. I was wondering: to provide a
VecSetValues() that you could call inside a Kokkos parallel_for, does it
have to be a device function?

> Thank You,
> Sajid Ali (he/him) | Research Associate
> Scientific Computing Division
> Fermi National Accelerator Laboratory
> s-sajid-ali.github.io
>
> ------------------------------
> *From:* Matthew Knepley <knep...@gmail.com>
> *Sent:* Thursday, March 17, 2022 7:25 PM
> *To:* Mark Adams <mfad...@lbl.gov>
> *Cc:* Sajid Ali Syed <sas...@fnal.gov>; petsc-users@mcs.anl.gov
> <petsc-users@mcs.anl.gov>
> *Subject:* Re: [petsc-users] Regarding the status of VecSetValues(Blocked)
> for GPU vectors
>
> On Thu, Mar 17, 2022 at 8:19 PM Mark Adams <mfad...@lbl.gov> wrote:
>
> LocalToGlobal is a DM thing.
> Sajid, do you use DM?
> If you need to add off-processor entries, then DM could give you a local
> vector, as Matt said, that you can add to for off-processor values, and
> then you could use the CPU communication in DM.
>
> It would be GPU communication, not CPU.
>
>    Matt
>
> On Thu, Mar 17, 2022 at 7:19 PM Matthew Knepley <knep...@gmail.com> wrote:
>
> On Thu, Mar 17, 2022 at 4:46 PM Sajid Ali Syed <sas...@fnal.gov> wrote:
>
> Hi PETSc-developers,
>
> Is it possible to use VecSetValues with distributed-memory CUDA & Kokkos
> vectors from the device, i.e. can I call VecSetValues with GPU memory
> pointers and expect PETSc to figure out how to stash it on the device until
> I call VecAssemblyBegin (at which point PETSc could use GPU-aware MPI to
> populate off-process values)?
>
> If this is not currently supported, is supporting this on the roadmap?
> Thanks in advance!
>
> VecSetValues() will fall back to the CPU vector, so I do not think this
> will work on device.
>
> Usually, our assembly computes all values and puts them in a "local"
> vector, which you can access explicitly as Mark said. Then we call
> LocalToGlobal() to communicate the values, which does work directly on
> device using specialized code in VecScatter/PetscSF.
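
For concreteness, the local-vector + LocalToGlobal workflow described above
might look roughly like the following on a 2D DMDA. This is a minimal,
untested sketch: the DM da, the vector names, the per-particle cell index and
weight arrays are placeholders, and recent PetscCall() error checking is
assumed.

  #include <petscdmda.h>

  /* Sketch: deposit into a ghosted DMDA local vector on each rank, then let
     DMLocalToGlobal(ADD_VALUES) sum the ghost/off-process contributions into
     the distributed global vector. celli/cellj hold global DMDA indices that
     must lie inside this rank's ghosted region. */
  static PetscErrorCode DepositAndAssemble(DM da, Vec rho_global, PetscInt nlocal,
                                           const PetscInt *celli, const PetscInt *cellj,
                                           const PetscReal *weight)
  {
    Vec           rho_local;
    PetscScalar **rho; /* a 2D DMDA is assumed here */

    PetscFunctionBeginUser;
    PetscCall(DMGetLocalVector(da, &rho_local));
    PetscCall(VecZeroEntries(rho_local));
    PetscCall(VecZeroEntries(rho_global));

    /* Deposit particle charge into the local array, ghost cells included.
       In a GPU version this loop would become a Kokkos parallel_for writing
       to the device-side array of a VECKOKKOS local vector. */
    PetscCall(DMDAVecGetArray(da, rho_local, &rho));
    for (PetscInt p = 0; p < nlocal; ++p) rho[cellj[p]][celli[p]] += weight[p];
    PetscCall(DMDAVecRestoreArray(da, rho_local, &rho));

    /* ADD_VALUES sums ghost-cell contributions into the owning ranks; with
       device vectors this communication goes through VecScatter/PetscSF. */
    PetscCall(DMLocalToGlobalBegin(da, rho_local, ADD_VALUES, rho_global));
    PetscCall(DMLocalToGlobalEnd(da, rho_local, ADD_VALUES, rho_global));
    PetscCall(DMRestoreLocalVector(da, &rho_local));
    PetscFunctionReturn(0);
  }

With a device vector type for the DM (e.g. -dm_vec_type kokkos or cuda), the
LocalToGlobal communication is the part Matt notes can run directly on device.
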
>
> What are you trying to do?
>
>   Thanks,
>
>      Matt
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
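
For reference, the Kokkos-ScatterView deposition mentioned earlier follows
roughly this pattern. Again a minimal, untested sketch: the flattened 1D grid
view, the per-particle cell index map, and the weights are assumptions for
illustration.

  #include <Kokkos_Core.hpp>
  #include <Kokkos_ScatterView.hpp>

  // Sketch of ScatterView-based charge deposition on the device.
  void deposit(Kokkos::View<double *> grid,
               Kokkos::View<const int *> cell,      // cell index of each particle
               Kokkos::View<const double *> weight, // charge weight of each particle
               int nparticles)
  {
    Kokkos::deep_copy(grid, 0.0);
    // ScatterView resolves the races from many particles updating one cell
    auto scatter = Kokkos::Experimental::create_scatter_view(grid);

    Kokkos::parallel_for("deposit", nparticles, KOKKOS_LAMBDA(const int p) {
      auto access = scatter.access();
      access(cell(p)) += weight(p);
    });

    // Combine the per-thread (or atomic) partial sums back into grid
    Kokkos::Experimental::contribute(grid, scatter);
  }

A device-callable VecSetValues(), if provided, would presumably take the place
of the access(cell(p)) += weight(p) update, with PETSc stashing any
off-process entries until VecAssemblyBegin/End.
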