> On Aug 14, 2019, at 5:58 PM, Jed Brown <j...@jedbrown.org> wrote:
>
> "Smith, Barry F." <bsm...@mcs.anl.gov> writes:
>
>>> On Aug 14, 2019, at 2:37 PM, Jed Brown <j...@jedbrown.org> wrote:
>>>
>>> Mark Adams via petsc-dev <petsc-dev@mcs.anl.gov> writes:
>>>
>>>> On Wed, Aug 14, 2019 at 2:35 PM Smith, Barry F. <bsm...@mcs.anl.gov> wrote:
>>>>
>>>>> Mark,
>>>>>
>>>>> Would you be able to make one run using single precision? Just single
>>>>> everywhere, since that is all we support currently?
>>>>
>>>> Experience in engineering, at least, is that single does not work for FE
>>>> elasticity. I tried it many years ago and have heard the same from others.
>>>> This problem is pretty simple other than using Q2. I suppose I could try
>>>> it, but be aware that the FE people might say that single sucks.
>>>
>>> When they say that single sucks, is it for the definition of the
>>> operator or for the preconditioner?
>>>
>>> As a point of reference, we can apply Q2 elasticity operators in double
>>> precision at nearly a billion dofs/second per GPU.
>>
>> And in single you get what?
>
> I don't have exact numbers, but <2x faster on V100, and it sort of
> doesn't matter because preconditioning cost will dominate.

When using block formats, a much higher percentage of the bandwidth goes to moving the double-precision matrix entries, so switching to single could conceivably yield a benefit of up to almost a factor of two. Depending on the matrix structure, the column indices could perhaps be handled by a shift and short j indices, or two shifts and two sets of j indices.

> The big win
> of single is on consumer-grade GPUs, which DOE doesn't install and
> NVIDIA forbids to be used in data centers (because they're so
> cost-effective ;-)).

DOE LCFs are not our only customers. Cheap-o engineering professors might stack a bunch of consumer-grade cards in their lab; would they benefit? Satish's basement could hold a great many consumer-grade cards.
>>> I'm skeptical of big wins in preconditioning (especially setup) due to
>>> the cost and irregularity of indexing being large compared to the
>>> bandwidth cost of the floating point values.
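The up-to-2x claim and the "shift and short j indices" idea above can be sanity-checked with back-of-envelope arithmetic. The sketch below is not PETSc code; it just counts the matrix bytes moved per stored nonzero in a CSR-like sparse matrix-vector product (one value plus one column index), under the assumption that the SpMV is bandwidth-bound and vector traffic is secondary.

```python
def bytes_per_nonzero(value_bytes, index_bytes):
    """Matrix traffic per stored nonzero: one value plus one column index."""
    return value_bytes + index_bytes

# Classic CSR: double-precision values with 32-bit column indices.
double_int32 = bytes_per_nonzero(8, 4)   # 12 bytes/nonzero

# Single-precision values, still 32-bit indices: indices now eat a third
# of the traffic, so the speedup is well short of 2x.
single_int32 = bytes_per_nonzero(4, 4)   # 8 bytes/nonzero

# Single-precision values plus compressed indices (a per-row/block shift
# and 16-bit "short j" offsets, as suggested above).
single_int16 = bytes_per_nonzero(4, 2)   # 6 bytes/nonzero

print(double_int32 / single_int32)   # 1.5
print(double_int32 / single_int16)   # 2.0
```

So halving only the value precision buys about 1.5x on matrix traffic, and the index-compression trick is what pushes the bound toward 2x; block formats help similarly by amortizing one index over a whole block of values.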
Re: [petsc-dev] [petsc-maint] running CUDA on SUMMIT
Smith, Barry F. via petsc-dev Wed, 14 Aug 2019 16:11:43 -0700