I would like, as much as possible, to pass the CUDA and HIP streams to Kokkos, since I can then handle much of the annoyance of wrangling multiple streams and stream objects externally. Last I checked, Kokkos was moving towards allowing streams to be associated with functions, but admittedly that was a while back.
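
For concreteness, here is a rough sketch of the kind of handle described in my earlier message quoted below: one object holding a pointer per stream implementation plus a toggleable type. Every name in it (StreamHandle, StreamType, FakeCudaStream, FakeHipStream, describe) is made up for illustration and is not PETSc or Kokkos API; the two Fake structs are stand-ins for cudaStream_t and hipStream_t so the sketch compiles without either toolkit.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for cudaStream_t / hipStream_t (assumption: real code
// would take these types from the vendor headers instead).
struct FakeCudaStream { int id; };
struct FakeHipStream  { int id; };

enum class StreamType { None, Cuda, Hip };

// One handle that holds a pointer per backend plus a toggleable type tag.
// Illustrative only; not the actual PetscStream design.
class StreamHandle {
public:
  void setCuda(FakeCudaStream *s) { cuda_ = s; }
  void setHip(FakeHipStream *s) { hip_ = s; }

  // Switch the active backend. Refuses to activate a backend that has no
  // stream attached, and leaves the current type unchanged in that case.
  bool setType(StreamType t) {
    if (t == StreamType::Cuda && !cuda_) return false;
    if (t == StreamType::Hip && !hip_) return false;
    type_ = t;
    return true;
  }
  StreamType type() const { return type_; }

  // Raw backend stream to hand to an external library; nullptr unless that
  // backend is the active one.
  FakeCudaStream *cuda() const { return type_ == StreamType::Cuda ? cuda_ : nullptr; }
  FakeHipStream *hip() const { return type_ == StreamType::Hip ? hip_ : nullptr; }

private:
  StreamType type_ = StreamType::None;
  FakeCudaStream *cuda_ = nullptr;
  FakeHipStream *hip_ = nullptr;
};

// Example consumer that dispatches on the active type.
std::string describe(const StreamHandle &h) {
  switch (h.type()) {
    case StreamType::Cuda:
      assert(h.cuda() != nullptr);
      return "cuda:" + std::to_string(h.cuda()->id);
    case StreamType::Hip:
      assert(h.hip() != nullptr);
      return "hip:" + std::to_string(h.hip()->id);
    default:
      return "none";
  }
}
```

The raw-pointer accessors are the part that matters for Kokkos: whatever the external library expects, the handle only needs to expose the underlying backend stream for the active type.
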

Best regards,

Jacob Faibussowitsch
(Jacob Fai - booss - oh - vitch)
Cell: (312) 694-3391

> On Jan 10, 2021, at 13:10, Mark Adams <mfad...@lbl.gov> wrote:
>
> On Sat, Jan 9, 2021 at 7:37 PM Jacob Faibussowitsch <jacob....@gmail.com> wrote:
> It is a single object that holds a pointer to every stream implementation and
> toggleable type so it can be universally passed around. Currently has a
> cudaStream and a hipStream but this is easily extendable to any other stream
> implementation.
>
> Do you have any thoughts on how this would work with Kokkos?
>
> Would you want to feed Kokkos your Cuda/Hip, etc, stream or add a Kokkos
> backend to your object?
>
> Junchao might be the person to ask. I would guess Kokkos View (vector)
> objects carry a stream because they block on a "deep_copy", that moves data
> to/from the GPU, and it is blocking.
>
> Thanks,
> Mark
>
> Best regards,
>
> Jacob Faibussowitsch
> (Jacob Fai - booss - oh - vitch)
> Cell: +1 (312) 694-3391
>
>> On Jan 9, 2021, at 18:19, Mark Adams <mfad...@lbl.gov> wrote:
>>
>> Is this stream object going to have Cuda, Kokkos, etc., implementations?
>>
>> On Sat, Jan 9, 2021 at 4:09 PM Jacob Faibussowitsch <jacob....@gmail.com> wrote:
>> I’m currently working on an implementation of a general PetscStream object.
>> Currently it only supports Vector ops and has a proof of concept KSPCG, but
>> should be extensible to other objects when finished. Junchao is also
>> indirectly working on pipeline support in his NVSHMEM MR. Take a look at
>> either MR, it would be very useful to get your input, as tailoring either of
>> these approaches for pipelined algorithms is key.
>>
>> Best regards,
>>
>> Jacob Faibussowitsch
>> (Jacob Fai - booss - oh - vitch)
>> Cell: (312) 694-3391
>>
>>> On Jan 9, 2021, at 15:01, Mark Adams <mfad...@lbl.gov> wrote:
>>>
>>> I would like to put a non-overlapping ASM solve on the GPU. It's not clear
>>> that we have a model for this.
>>>
>>> PCApply_ASM currently pipelines the scatter with the subdomain solves. I
>>> think we would want to change this and do a 1) scatter begin loop, 2)
>>> scatter end and non-blocking solve loop, 3) solve-wait and scatter begin
>>> loop and 4) scatter end loop.
>>>
>>> I'm not sure how to go about doing this.
>>> * Should we make a new PCApply_ASM_PARALLEL or dump this pipelining
>>> algorithm and rewrite PCApply_ASM?
>>> * Add a solver-wait method to KSP?
>>>
>>> Thoughts?
>>>
>>> Mark
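
The four-loop ordering proposed in the oldest message above (scatter begin, scatter end plus non-blocking solve, solve-wait plus scatter begin, scatter end) could be sketched roughly as follows. This is a toy under loud assumptions, not PETSc code: Subdomain, SolveHandle, applyFourPhase and the scatter helpers are hypothetical names, each subdomain "solve" is just division by a scalar, and the "non-blocking" launch only records work that runs when wait() is called, standing in for a kernel enqueued on a device stream.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical non-overlapping subdomain owning entries [start, start + n).
struct Subdomain {
  std::size_t start;
  std::size_t n;
  double diag;                // toy local operator: a scalar diagonal
  std::vector<double> local;  // local right-hand side, then local solution
};

// Hypothetical handle for a launched solve. Assumption: launching only
// records the work and wait() executes it, mimicking enqueue-then-sync.
struct SolveHandle {
  std::function<void()> work;
  bool done;
  void wait() { if (!done) { work(); done = true; } }
};

// Forward scatter (global -> local), split into a begin that prepares the
// local buffer and an end that completes the copy.
void forwardScatterBegin(Subdomain &s) { s.local.assign(s.n, 0.0); }
void forwardScatterEnd(Subdomain &s, const std::vector<double> &x) {
  for (std::size_t i = 0; i < s.n; ++i) s.local[i] = x[s.start + i];
}

// "Non-blocking" solve: returns immediately with a handle.
SolveHandle solveBegin(Subdomain &s) {
  return SolveHandle{[&s]() { for (double &v : s.local) v /= s.diag; }, false};
}

// Reverse scatter (local -> global); begin is a no-op in this toy.
void reverseScatterBegin(const Subdomain &) {}
void reverseScatterEnd(const Subdomain &s, std::vector<double> &y) {
  for (std::size_t i = 0; i < s.n; ++i) y[s.start + i] = s.local[i];
}

std::vector<double> applyFourPhase(std::vector<Subdomain> &subs,
                                   const std::vector<double> &x) {
  std::vector<double> y(x.size(), 0.0);
  std::vector<SolveHandle> pending;
  pending.reserve(subs.size());

  // 1) scatter begin loop
  for (Subdomain &s : subs) forwardScatterBegin(s);
  // 2) scatter end and non-blocking solve loop
  for (Subdomain &s : subs) {
    forwardScatterEnd(s, x);
    pending.push_back(solveBegin(s));
  }
  // 3) solve-wait and (reverse) scatter begin loop
  for (std::size_t k = 0; k < subs.size(); ++k) {
    pending[k].wait();
    reverseScatterBegin(subs[k]);
  }
  // 4) scatter end loop
  for (const Subdomain &s : subs) reverseScatterEnd(s, y);
  return y;
}
```

The point of the ordering is that every solve is launched before any of them is waited on, so with a real asynchronous backend the subdomain solves could overlap. The wait() call in loop 3 plays the role of the solver-wait method suggested for KSP.
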