> On Aug 14, 2019, at 5:58 PM, Jed Brown <j...@jedbrown.org> wrote:
>
> "Smith, Barry F." <bsm...@mcs.anl.gov> writes:
>
>>> On Aug 14, 2019, at 2:37 PM, Jed Brown <j...@jedbrown.org> wrote:
>>>
>>> Mark Adams via petsc-dev <petsc-dev@mcs.anl.gov> writes:
>>>
>>>> On Wed, Aug 14, 2019 at 2:35 PM Smith, Barry F. <bsm...@mcs.anl.gov> wrote:
>>>>
>>>>> Mark,
>>>>>
>>>>> Would you be able to make one run using single precision? Just single
>>>>> everywhere, since that is all we support currently?
>>>>
>>>> Experience in engineering, at least, is that single does not work for FE
>>>> elasticity. I tried it many years ago and have heard the same from others.
>>>> This problem is pretty simple other than using Q2. I suppose I could try
>>>> it, but be aware that the FE people might say that single sucks.
>>>
>>> When they say that single sucks, is it for the definition of the
>>> operator or for the preconditioner?
>>>
>>> As a point of reference, we can apply Q2 elasticity operators in double
>>> precision at nearly a billion dofs/second per GPU.
>>
>> And in single you get what?
>
> I don't have exact numbers, but <2x faster on V100, and it sort of
> doesn't matter because preconditioning cost will dominate.

When using block formats, a much higher percentage of the bandwidth goes to moving the double-precision matrix entries, so switching to single could conceivably yield a benefit of up to almost a factor of two. Depending on the matrix structure, the column indices could perhaps be handled by a shift and short j indices, or two shifts and two sets of j indices.

> The big win
> of single is on consumer-grade GPUs, which DOE doesn't install and
> NVIDIA forbids to be used in data centers (because they're so
> cost-effective ;-)).

DOE LCFs are not our only customers. Cheap-o engineering professors might stack a bunch of consumer-grade cards in their lab; would they benefit? Satish's basement could hold a great many consumer-grade cards.
>>> I'm skeptical of big wins in preconditioning (especially setup) due to
>>> the cost and irregularity of indexing being large compared to the
>>> bandwidth cost of the floating point values.
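The up-to-2x claim and the "shift and short j indices" idea above can be sanity-checked with back-of-envelope arithmetic. The sketch below is not PETSc code; it just counts the matrix bytes moved per stored nonzero in a CSR-like sparse matrix-vector product (one value plus one column index), under the assumption that the SpMV is bandwidth-bound and vector traffic is secondary.

```python
def bytes_per_nonzero(value_bytes, index_bytes):
    """Matrix traffic per stored nonzero: one value plus one column index."""
    return value_bytes + index_bytes

# Classic CSR: double-precision values with 32-bit column indices.
double_int32 = bytes_per_nonzero(8, 4)   # 12 bytes/nonzero

# Single-precision values, still 32-bit indices: indices now eat a third
# of the traffic, so the speedup is well short of 2x.
single_int32 = bytes_per_nonzero(4, 4)   # 8 bytes/nonzero

# Single-precision values plus compressed indices (a per-row/block shift
# and 16-bit "short j" offsets, as suggested above).
single_int16 = bytes_per_nonzero(4, 2)   # 6 bytes/nonzero

print(double_int32 / single_int32)   # 1.5
print(double_int32 / single_int16)   # 2.0
```

So halving only the value precision buys about 1.5x on matrix traffic, and the index-compression trick is what pushes the bound toward 2x; block formats help similarly by amortizing one index over a whole block of values.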
Re: [petsc-dev] [petsc-maint] running CUDA on SUMMIT
Smith, Barry F. via petsc-dev Wed, 14 Aug 2019 16:11:43 -0700