Hi Matt/Mark,

I'm working on a Poisson solver for a distributed PIC code, where the particles 
are distributed over MPI ranks rather than the grid. Prior to the solve, all 
particles are deposited onto a (DMDA) grid.

The current prototype has each rank hold a full-size DMDA vector into which the 
particles on that rank are deposited. The data from all the local vectors is 
then combined into multiple distributed DMDA vectors via VecScatters, after 
which the Poisson equation is solved. The need for multiple subcomms, each 
solving the same equation, arises because the grid is too small to use all the 
MPI ranks (we are beyond the strong-scaling limit). The solution is then 
scattered back to each MPI rank via VecScatters.
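
For concreteness, a minimal sketch of one such transfer within a single 
subcommunicator (the names rho_rank, rho_dmda and the size N are hypothetical, 
and the DMDA natural-vs-PETSc ordering is glossed over): each rank's full-size 
sequential vector is accumulated into the subcomm's distributed DMDA vector by 
a VecScatter in ADD_VALUES mode, and the solution comes back with the same kind 
of scatter run in reverse.

/* Sketch only: combine the rank-local full-size vectors into the
   subcommunicator's distributed DMDA vector. */
IS         ix;
VecScatter scatter;

PetscCall(ISCreateStride(PETSC_COMM_SELF, N, 0, 1, &ix));   /* all N grid points */
PetscCall(VecScatterCreate(rho_rank, ix, rho_dmda, ix, &scatter));
PetscCall(VecScatterBegin(scatter, rho_rank, rho_dmda, ADD_VALUES, SCATTER_FORWARD));
PetscCall(VecScatterEnd(scatter, rho_rank, rho_dmda, ADD_VALUES, SCATTER_FORWARD));
/* ... KSPSolve on the subcomm ... then VecScatterBegin/End with the vector
   arguments reversed, INSERT_VALUES and SCATTER_REVERSE, returns the full
   solution to every rank. */
PetscCall(VecScatterDestroy(&scatter));
PetscCall(ISDestroy(&ix));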

This first local-to-(multi)global transfer requires multiple VecScatters, since 
there is no one-to-multiple scatter capability in PetscSF. This works and 
already gives a large speedup over the allreduce baseline currently in use 
(which transfers more data than necessary).

I was wondering if, within each subcommunicator, I could write directly into 
the DMDA vector via VecSetValues and have PETSc take care of stashing the 
values on the GPU until I call VecAssemblyBegin. Since this would happen from 
within a Kokkos parallel_for, the stashing mechanism would have to support many 
(probably ~1e3) simultaneous writes. Currently we use a Kokkos ScatterView to 
do this, as in the sketch below.
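
For reference, a minimal sketch (with hypothetical names) of how the current 
deposition resolves those concurrent writes with a Kokkos ScatterView before 
the result is handed to PETSc:

#include <Kokkos_Core.hpp>
#include <Kokkos_ScatterView.hpp>

// Sketch only: deposit each particle's charge into the rank-local grid view.
void deposit(Kokkos::View<double*> grid,           // rank-local full-size grid
             Kokkos::View<const int*> cell_of,     // cell index of each particle
             Kokkos::View<const double*> charge)   // charge of each particle
{
  auto scatter = Kokkos::Experimental::create_scatter_view(grid);
  Kokkos::parallel_for("deposit", charge.extent(0), KOKKOS_LAMBDA(const int p) {
    auto access = scatter.access();                // per-thread accessor
    access(cell_of(p)) += charge(p);               // atomic/duplicated add
  });
  Kokkos::Experimental::contribute(grid, scatter); // reduce back into grid
}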

Thank You,
Sajid Ali (he/him) | Research Associate
Scientific Computing Division
Fermi National Accelerator Laboratory
s-sajid-ali.github.io

________________________________
From: Matthew Knepley <[email protected]>
Sent: Thursday, March 17, 2022 7:25 PM
To: Mark Adams <[email protected]>
Cc: Sajid Ali Syed <[email protected]>; [email protected] 
<[email protected]>
Subject: Re: [petsc-users] Regarding the status of VecSetValues(Blocked) for 
GPU vectors

On Thu, Mar 17, 2022 at 8:19 PM Mark Adams <[email protected]> wrote:
LocalToGlobal is a DM thing.
Sajid, do you use DM?
If you need to add off-processor entries, then DM could give you a local vector, 
as Matt said, that you can add the off-processor values to, and then you could 
use the CPU communication in DM.

It would be GPU communication, not CPU.

   Matt

On Thu, Mar 17, 2022 at 7:19 PM Matthew Knepley <[email protected]> wrote:
On Thu, Mar 17, 2022 at 4:46 PM Sajid Ali Syed <[email protected]> wrote:
Hi PETSc-developers,

Is it possible to use VecSetValues with distributed-memory CUDA & Kokkos 
vectors from the device, i.e. can I call VecSetValues with GPU memory pointers 
and expect PETSc to figure out how to stash the values on the device until I 
call VecAssemblyBegin (at which point PETSc could use GPU-aware MPI to populate 
off-process values)?

If this is not currently supported, is support for it on the roadmap? Thanks 
in advance!

VecSetValues() will fall back to the CPU vector, so I do not think this will 
work on device.

Usually, our assembly computes all values and puts them in a "local" vector, 
which you can access explicitly as Mark said. Then
we call LocalToGlobal() to communicate the values, which does work directly on 
device using specialized code in VecScatter/PetscSF.
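
A minimal sketch of that pattern, with hypothetical names da, rho_local, 
rho_global (deposit into the DM's local, ghosted vector; DMLocalToGlobal with 
ADD_VALUES then folds the off-process ghost contributions into the owning 
ranks):

/* Sketch only: names da, rho_local, rho_global are hypothetical. */
Vec rho_local, rho_global;

PetscCall(DMGetLocalVector(da, &rho_local));
PetscCall(VecZeroEntries(rho_local));
/* ... deposit particles into rho_local on the device, ghost region included,
   e.g. via a Kokkos ScatterView over its device array ... */
PetscCall(DMGetGlobalVector(da, &rho_global));
PetscCall(DMLocalToGlobalBegin(da, rho_local, ADD_VALUES, rho_global));
PetscCall(DMLocalToGlobalEnd(da, rho_local, ADD_VALUES, rho_global));
PetscCall(DMRestoreLocalVector(da, &rho_local));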

What are you trying to do?

  Thanks,

      Matt

Thank You,
Sajid Ali (he/him) | Research Associate
Scientific Computing Division
Fermi National Accelerator Laboratory
s-sajid-ali.github.io



--
What most experimenters take for granted before they begin their experiments is 
infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
