Thanks all for the comments.

--- On Thu, 7/9/09, Matthew Knepley <knepley at gmail.com> wrote:
> From: Matthew Knepley <knepley at gmail.com>
> Subject: Re: GPU related stuff
> To: "For users of the development version of PETSc" <petsc-dev at mcs.anl.gov>
> Date: Thursday, July 9, 2009, 5:09 PM
>
> On Thu, Jul 9, 2009 at 7:31 AM, Jed Brown <jed at 59a2.org> wrote:
>
> > Matthew Knepley wrote:
> >
> > > PCs which have high flop to memory access ratios look good. No
> > > surprise there.
> >
> > My concern here is that almost all "good" preconditioners are
> > multiplicative in the fine-grained kernels or do significant work on
> > coarse levels. Both of these are very bad for putting on a GPU.
> > Switching from SOR or ILU to Jacobi or red-black GS will greatly improve
> > the throughput on a GPU, but is normally much less effective. Since the
> > GPU typically needs thousands of threads to attain high performance,
> > it's really hard to use on all but the finest level.
>
> I agree with all these comments. I have no idea how to make those PCs
> work. I am counting on Barry's genius here.
>
> > One of the more interesting preconditioners would be 3-level balancing
> > or overlapping DD with very small subdomains (like thousands of
> > subdomains per process). There would then be 1 subregion per process
> > and a global coarse level. This would allow the PC to be additive with
> > chunks of the right block size, while keeping a minimal amount of work
> > on the coarser levels (which are handled by the CPU). (It's really hard
> > to get multigrid to coarsen this rapidly, as in 1M dofs to 10 dofs in 2
> > levels.) Unfortunately, this sort of scheme is rather problem- and
> > discretization-dependent, as well as rather complex to implement.
>
> With regard to targets, my strategy is to implement things that I can
> prove work well on a GPU. For starters, we have FMM. We have done a
> complete computational model and can prove that this will scale almost
> indefinitely.
> The first paper is out, and the other 2 are almost done. We are
> also implementing wavelets, since the structure and proofs are very
> similar to FMM.
>
> The strategy is to use FMM/wavelets for problems they can solve to
> precondition more complex problems. The prototype is Stokes
> preconditioning variable-viscosity Stokes, which I am working on with
> Dave May and Dave Yuen.
>
> > I'll be interested to see what sort of performance you can get for real
> > preconditioners on a GPU.
>
> Felipe Cruz has preliminary numbers for FMM: 500 GFlop/s on a single
> Tesla C1060! That is probably 10 times what you can hope to achieve
> with traditional relaxation (I think).
>
>    Matt
>
> > Jed
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which
> their experiments lead.
> -- Norbert Wiener
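Jed's point about multiplicative versus additive fine-grained kernels can be illustrated with a minimal Python sketch. It assumes a 1D Poisson model problem, A = tridiag(-1, 2, -1) with zero Dirichlet boundary values, chosen only for illustration; the helpers `jacobi_sweep`, `gauss_seidel_sweep`, `red_black_sweep` and `exact_solution` are hypothetical and are not PETSc API. The sketch only shows which updates inside one sweep are independent of each other, which is what would let them be spread across many GPU threads.

```python
"""Data dependences of three relaxation sweeps on a 1D Poisson model problem.

Assumed model (illustration only): A = tridiag(-1, 2, -1) acting on n
interior unknowns with zero Dirichlet boundary values, so row i reads
2*x[i] - x[i-1] - x[i+1] = b[i].
"""


def _neighbors(x, i):
    """Return (x[i-1], x[i+1]), treating the boundary values as zero."""
    left = x[i - 1] if i > 0 else 0.0
    right = x[i + 1] if i < len(x) - 1 else 0.0
    return left, right


def jacobi_sweep(x, b):
    """Additive: every update reads only the OLD iterate, so all n updates
    are independent and could run concurrently (e.g. one thread per point)."""
    old = list(x)
    for i in range(len(x)):
        left, right = _neighbors(old, i)
        x[i] = (b[i] + left + right) / 2.0
    return x


def gauss_seidel_sweep(x, b, order=None):
    """Multiplicative: each update reads neighbors that may already have been
    overwritten in this sweep. In lexicographic order x[i] needs the new
    x[i-1], so the updates form a sequential chain."""
    for i in (range(len(x)) if order is None else order):
        left, right = _neighbors(x, i)
        x[i] = (b[i] + left + right) / 2.0
    return x


def red_black_sweep(x, b):
    """Gauss-Seidel with a red-black ordering. Even indices ("red") couple
    only to odd indices ("black") and vice versa, so within one color phase
    the updates are independent, as in Jacobi; the two phases still run one
    after the other."""
    n = len(x)
    gauss_seidel_sweep(x, b, range(0, n, 2))  # red phase
    gauss_seidel_sweep(x, b, range(1, n, 2))  # black phase
    return x


def exact_solution(n):
    """Discrete solution for b = 1: x_k = (k + 1)(n - k) / 2, k = 0..n-1."""
    return [(k + 1) * (n - k) / 2.0 for k in range(n)]


if __name__ == "__main__":
    n, sweeps = 15, 300
    b = [1.0] * n
    x_star = exact_solution(n)
    for name, sweep in [("Jacobi", jacobi_sweep),
                        ("Gauss-Seidel", gauss_seidel_sweep),
                        ("red-black GS", red_black_sweep)]:
        x = [0.0] * n
        for _ in range(sweeps):
            sweep(x, b)
        err = max(abs(xi - si) for xi, si in zip(x, x_star))
        print(f"{name:>13}: max error after {sweeps} sweeps = {err:.2e}")
```

In the Jacobi sweep, and within each red-black color phase, every update reads only values that the phase does not modify, so the order of the updates is irrelevant and they can run concurrently; the lexicographic Gauss-Seidel sweep is a chain of n dependent updates. The sketch says nothing about how effective each method is as a preconditioner on realistic problems.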