Oh, are all your integers 8 bytes? Even on one node?

  Once Karl's new middleware is in place we should see about reducing to 4 
bytes on the GPU.
   
   Barry
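A rough way to see why 4-byte floats with 8-byte integers is lopsided: in a
CSR (AIJ-style) matrix every stored nonzero carries one value and one column
index, and sparse matrix-vector products are usually limited by memory
bandwidth. The sketch below assumes SpMV time is simply proportional to the
bytes read per nonzero and ignores row pointers and vector traffic; the
function names are hypothetical and are not PETSc API.

```python
def csr_bytes_per_nonzero(scalar_bytes: int, index_bytes: int) -> int:
    """Bytes read per stored nonzero in a CSR matrix: one value plus one
    column index. Row pointers and vector entries are ignored (assumption)."""
    return scalar_bytes + index_bytes


def bandwidth_bound_speedup(scalar_bytes: int, index_bytes: int,
                            base_scalar: int = 8, base_index: int = 8) -> float:
    """Upper-bound SpMV speedup versus a baseline (default: 8-byte scalars,
    8-byte indices), assuming time is proportional to bytes moved."""
    return (csr_bytes_per_nonzero(base_scalar, base_index)
            / csr_bytes_per_nonzero(scalar_bytes, index_bytes))


if __name__ == "__main__":
    # (label, scalar bytes, index bytes)
    configs = [
        ("double, 64-bit ints", 8, 8),
        ("single, 64-bit ints", 4, 8),
        ("double, 32-bit ints", 8, 4),
        ("single, 32-bit ints", 4, 4),
    ]
    for label, s, i in configs:
        print(f"{label:22s} {csr_bytes_per_nonzero(s, i):2d} B/nnz  "
              f"max speedup {bandwidth_bound_speedup(s, i):.2f}x")
```

Under this model, dropping only the scalars to single buys at most about
1.33x (16 to 12 bytes per nonzero), while 4-byte scalars together with 4-byte
indices cap out at 2x (16 to 8 bytes).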


> On Aug 14, 2019, at 7:44 PM, Mark Adams <mfad...@lbl.gov> wrote:
> 
> OK, I'll run single. It's a bit perverse to run with 4-byte floats and 8-byte
> integers ... I could use 32-bit ints and just not scale out.
> 
> On Wed, Aug 14, 2019 at 6:48 PM Smith, Barry F. <bsm...@mcs.anl.gov> wrote:
> 
>  Mark,
> 
>    Oh, I don't even care if it converges, just put in a fixed number of 
> iterations. The idea is to just get a baseline of the possible improvement. 
> 
>     ECP is literally dropping millions into research on "multi precision" 
> computations on GPUs, we need to have some actual numbers for the best 
> potential benefit to determine how much we invest in further investigating 
> it, or not.
> 
>     I am not expressing any opinions on the approach; we are just in the
> fact-gathering stage.
> 
> 
>    Barry
> 
> 
> > On Aug 14, 2019, at 2:27 PM, Mark Adams <mfad...@lbl.gov> wrote:
> > 
> > 
> > 
> > On Wed, Aug 14, 2019 at 2:35 PM Smith, Barry F. <bsm...@mcs.anl.gov> wrote:
> > 
> >   Mark,
> > 
> >    Would you be able to make one run using single precision? Just single 
> > everywhere since that is all we support currently? 
> > 
> > 
> > Experience in engineering, at least, is that single does not work for FE
> > elasticity. I tried it many years ago and have heard the same from others.
> > This problem is pretty simple other than using Q2. I suppose I could try
> > it, but just be aware the FE people might say that single sucks.
> >  
> >    The results will give us motivation (or anti-motivation) to add support
> > for running KSP (or PC, or Mat) in single precision while the simulation
> > is in double.
> > 
> >    Thanks.
> > 
> >      Barry
> > 
> > For example, if KSP on the GPU runs a factor of 3 faster in single than in
> > double, that is serious motivation.
> > 
> > 
> > > On Aug 14, 2019, at 12:45 PM, Mark Adams <mfad...@lbl.gov> wrote:
> > > 
> > > FYI, here is some scaling data for GAMG on SUMMIT. Getting about 4x GPU
> > > speedup with 98K dof/proc (3D Q2 elasticity).
> > > 
> > > This is weak scaling of a solve. There is growth in iteration count 
> > > folded in here. I should put rtol in the title and/or run a fixed number 
> > > of iterations and make it clear in the title.
> > > 
> > > Comments welcome.
> > > <out_cpu_012288><out_cpu_001536><out_cuda_012288><out_cpu_000024><out_cpu_000192><out_cuda_001536><out_cuda_000192><out_cuda_000024><weak_scaling_cpu.png><weak_scaling_cuda.png>
> > 
> 
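One possible scheme for running the solver in single while the simulation
stays in double is classical mixed-precision iterative refinement: compute the
residual in double, solve for the correction in single, and accumulate the
correction in double. The sketch below is a NumPy illustration under stated
assumptions, with a dense numpy.linalg.solve standing in for KSP/PC; it is not
how PETSc does or would do this, and mixed_precision_refine is a hypothetical
name.

```python
import numpy as np


def mixed_precision_refine(A, b, iters=5):
    """Mixed-precision iterative refinement (illustrative sketch).

    A, b are float64. The inner solve runs in float32 as a stand-in for a
    single-precision KSP/PC; residuals and the solution stay in float64.
    """
    A32 = A.astype(np.float32)              # single-precision copy of operator
    x = np.zeros_like(b, dtype=np.float64)  # double-precision solution
    for _ in range(iters):
        r = b - A @ x                       # residual in double
        # Correction solved in single. np.linalg.solve refactors A32 on every
        # call; a real code would factor / set up the preconditioner once.
        d = np.linalg.solve(A32, r.astype(np.float32))
        x += d.astype(np.float64)           # accumulate in double
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 50
    M = rng.standard_normal((n, n))
    A = M @ M.T + n * np.eye(n)             # well-conditioned SPD test matrix
    b = rng.standard_normal(n)
    x = mixed_precision_refine(A, b)
    print("relative residual:", np.linalg.norm(b - A @ x) / np.linalg.norm(b))
```

For a well-conditioned system like this one, each single-precision
correction removes most of the remaining error, so a few outer iterations
reach roughly double-precision residuals; for ill-conditioned FE elasticity
the inner single solve may converge poorly, which is the concern raised above.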
