The COO assembly is entirely based on Thrust primitives. I don't have enough experience to say whether we would get a serious speedup by writing our own kernels, but it is definitely worth a try if we end up adopting COO as the entry point for irregular assembly on GPUs. Jed, you mentioned BDDC deluxe; what do you mean by that? Porting the setup/application of deluxe scaling to the GPU?
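For anyone not familiar with the pattern, here is a rough sketch of what a split-phase, sort/reduce-by-key COO assembly looks like. It is plain Python standing in for the GPU primitives, not the actual PETSc or Thrust code; the function names `coo_symbolic` and `coo_numeric` are hypothetical and the tiny mesh is made up for illustration.

```python
# Illustrative only: a pure-Python stand-in for a sort/reduce-by-key COO assembly.
# coo_symbolic and coo_numeric are hypothetical names, not PETSc or Thrust API.

def coo_symbolic(rows, cols):
    """Phase 1: analyze the (row, col) pattern once."""
    # Order the input positions by their (row, col) key, like a sort-by-key.
    perm = sorted(range(len(rows)), key=lambda k: (rows[k], cols[k]))
    keys = []               # unique (row, col) pairs, in sorted order
    slot = [0] * len(rows)  # slot[k]: unique entry that input k contributes to
    for k in perm:
        key = (rows[k], cols[k])
        if not keys or keys[-1] != key:
            keys.append(key)
        slot[k] = len(keys) - 1
    return keys, slot


def coo_numeric(slot, nnz, values):
    """Phase 2: sum duplicate contributions, like a reduce-by-key."""
    out = [0.0] * nnz
    for k, v in enumerate(values):
        out[slot[k]] += v
    return out


# Two 1D linear elements sharing node 1, so both contribute to entry (1, 1).
rows = [0, 0, 1, 1, 1, 1, 2, 2]
cols = [0, 1, 0, 1, 1, 2, 1, 2]
vals = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0]

keys, slot = coo_symbolic(rows, cols)
assembled = coo_numeric(slot, len(keys), vals)
```

The point of the split is that the first phase depends only on the index pattern and can be reused, while the second phase is repeated whenever new values arrive.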
The timing is not so bad for me to join the hackathon.

> On Mar 13, 2021, at 8:17 AM, Barry Smith <bsm...@petsc.dev> wrote:
>
>> On Mar 12, 2021, at 10:49 PM, Jed Brown <j...@jedbrown.org> wrote:
>>
>> Barry Smith <bsm...@petsc.dev> writes:
>>
>>>> On Mar 12, 2021, at 6:58 PM, Jed Brown <j...@jedbrown.org> wrote:
>>>>
>>>> Barry Smith <bsm...@petsc.dev> writes:
>>>>
>>>>> I think we should start porting the PetscFE infrastructure, numerical
>>>>> integrations, vector and matrix assembly to GPUs soon. It is dog slow on
>>>>> CPUs and should be able to deliver higher performance on GPUs.
>>>>
>>>> IMO, this comes via interfaces to libCEED, not rolling yet another way to
>>>> invoke quadrature routines on GPUs.
>>>
>>> I am not talking about matrix-free stuff, that definitely belongs in
>>> libCEED, no reason to rewrite.
>>>
>>> But does libCEED also support the traditional finite element construction
>>> process where the matrices are built explicitly? Or does it provide some of
>>> the code, integration points, integration formula etc. that could be shared
>>> and used as a starting point? If it includes all of these "traditional"
>>> things then we should definitely get it all hooked into PetscFE/DMPLEX and
>>> go to town. (But yes not so much need for the GPU hackathon since it is
>>> wiring more than GPU code). The way I have always heard about libCEED was
>>> as a matrix-free engine, so I may have miss understood. It is definitely
>>> not my intention to start a project that reproduces functionality that we
>>> can just use.
>>
>> MFEM wants this too and it's in a draft libCEED PR right now. My intent is
>> to ensure it's compatible with Stefano's split-phase COO assembly.
>
> Cool, would this be something that, in combination with perhaps some libCEED
> folk, could be incorporated in the Hackathon? Anyone can join our group
> Hackathon group, they don't have to have any financial connection with
> "PETSc".
>
>>
>>> We do need solid support for traditional finite element assembly on GPUs,
>>> matrix-free finite elements alone is not enough.
>>
>> Agreed, and while libCEED could be further optimized for lowest order, even
>> naive assembly will be faster than what's in DMPlex.