The COO assembly is entirely based on Thrust primitives. I don't have enough 
experience to say whether we would get a serious speedup by writing our own 
kernels, but it is definitely worth a try if we end up adopting COO as the 
entry point for GPU irregular assembly.
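For concreteness, here is a minimal serial CPU sketch of the split-phase idea 
(the names COOPlan, coo_setup, and coo_assemble are hypothetical, not the PETSc 
API): a setup phase sorts the (i,j) pairs once and records, for each input 
triple, which unique slot it lands in; an assembly phase then only scatter-adds 
values, and can be repeated with new values. On the GPU, the two phases map 
naturally onto Thrust's sort_by_key/reduce_by_key primitives.

```c
#include <stdlib.h>

/* Hypothetical split-phase COO assembly sketch (serial CPU). */
typedef struct {
  int  n, nnz;
  int *perm;       /* perm[k] = unique-slot index of input triple k */
  int *ui, *uj;    /* row/col indices of each unique slot           */
} COOPlan;

static const int *gi, *gj;        /* helpers for qsort on an index array */
static int cmp(const void *a, const void *b) {
  int x = *(const int *)a, y = *(const int *)b;
  if (gi[x] != gi[y]) return gi[x] < gi[y] ? -1 : 1;
  if (gj[x] != gj[y]) return gj[x] < gj[y] ? -1 : 1;
  return 0;
}

/* Phase 1 (setup): analyze the sparsity pattern once. */
static COOPlan coo_setup(int n, const int *i, const int *j) {
  COOPlan p = { n, 0, malloc(n * sizeof(int)),
                malloc(n * sizeof(int)), malloc(n * sizeof(int)) };
  int *order = malloc(n * sizeof(int));
  for (int k = 0; k < n; k++) order[k] = k;
  gi = i; gj = j;
  qsort(order, n, sizeof(int), cmp);     /* sort triples by (i,j) */
  for (int k = 0; k < n; k++) {          /* assign slot ids, merging duplicates */
    int t = order[k];
    if (k == 0 || i[t] != p.ui[p.nnz-1] || j[t] != p.uj[p.nnz-1]) {
      p.ui[p.nnz] = i[t]; p.uj[p.nnz] = j[t]; p.nnz++;
    }
    p.perm[t] = p.nnz - 1;
  }
  free(order);
  return p;
}

/* Phase 2 (assembly): sum values into the slots; repeatable per solve. */
static void coo_assemble(const COOPlan *p, const double *v, double *out) {
  for (int s = 0; s < p->nnz; s++) out[s] = 0.0;
  for (int k = 0; k < p->n; k++) out[p->perm[k]] += v[k];
}
```

If I recall correctly, the corresponding entry points in PETSc are 
MatSetPreallocationCOO() and MatSetValuesCOO(), which keep the pattern analysis 
out of the per-assembly hot path for exactly this reason.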
Jed, you mentioned BDDC deluxe; what do you mean by that? Porting the 
setup/application of deluxe scaling to the GPU?

The timing is not so bad for me to join the hackathon. 

> On Mar 13, 2021, at 8:17 AM, Barry Smith <bsm...@petsc.dev> wrote:
> 
> 
> 
>> On Mar 12, 2021, at 10:49 PM, Jed Brown <j...@jedbrown.org> wrote:
>> 
>> Barry Smith <bsm...@petsc.dev> writes:
>> 
>>>> On Mar 12, 2021, at 6:58 PM, Jed Brown <j...@jedbrown.org> wrote:
>>>> 
>>>> Barry Smith <bsm...@petsc.dev> writes:
>>>> 
>>>>>    I think we should start porting the PetscFE infrastructure, numerical 
>>>>> integrations, vector and matrix assembly to GPUs soon. It is dog slow on 
>>>>> CPUs and should be able to deliver higher performance on GPUs. 
>>>> 
>>>> IMO, this comes via interfaces to libCEED, not rolling yet another way to 
>>>> invoke quadrature routines on GPUs.
>>> 
>>>  I am not talking about matrix-free stuff; that definitely belongs in 
>>> libCEED, and there is no reason to rewrite it. 
>>> 
>>>  But does libCEED also support the traditional finite element construction 
>>> process where the matrices are built explicitly? Or does it provide some of 
>>> the code, integration points, integration formulas, etc. that could be 
>>> shared and used as a starting point? If it includes all of these 
>>> "traditional" things then we should definitely get it all hooked into 
>>> PetscFE/DMPLEX and go to town. (But then there is not so much need for the 
>>> GPU hackathon, since it is more wiring than GPU code.) The way I have 
>>> always heard libCEED described was as a matrix-free engine, so I may have 
>>> misunderstood. It is definitely not my intention to start a project that 
>>> reproduces functionality that we can just use. 
>> 
>> MFEM wants this too and it's in a draft libCEED PR right now. My intent is 
>> to ensure it's compatible with Stefano's split-phase COO assembly. 
> 
>  Cool, would this be something that, in combination with perhaps some libCEED 
> folks, could be incorporated in the Hackathon? Anyone can join our Hackathon 
> group; they don't have to have any financial connection with "PETSc". 
> 
>> 
>>>  We do need solid support for traditional finite element assembly on GPUs; 
>>> matrix-free finite elements alone are not enough.
>> 
>> Agreed, and while libCEED could be further optimized for lowest order, even 
>> naive assembly will be faster than what's in DMPlex.
