On Thu, Jul 9, 2009 at 7:31 AM, Jed Brown <jed at 59a2.org> wrote:

> Matthew Knepley wrote:
> >
> > PCs which have high flop to memory access ratios look good. No
> > surprise there.
>
> My concern here is that almost all "good" preconditioners are
> multiplicative in the fine-grained kernels or do significant work on
> coarse levels. Both of these are very bad for putting on a GPU.
> Switching from SOR or ILU to Jacobi or red-black GS will greatly improve
> the throughput on a GPU, but is normally much less effective. Since the
> GPU typically needs thousands of threads to attain high performance,
> it's really hard to use on all but the finest level.

I agree with all these comments. I have no idea how to make those PCs work.
I am counting on Barry's genius here.

> One of the more interesting preconditioners would be 3-level balancing
> or overlapping DD with very small subdomains (like thousands of
> subdomains per process). There would then be 1 subregion per process
> and a global coarse level. This would allow the PC to be additive with
> chunks of the right block size, while keeping a minimal amount of work
> on the coarser levels (which are handled by the CPU). (It's really hard
> to get multigrid to coarsen this rapidly, as in 1M dofs to 10 dofs in 2
> levels.) Unfortunately, this sort of scheme is rather problem- and
> discretization-dependent, as well as rather complex to implement.

With regard to targets, my strategy is to implement things that I can prove
work well on a GPU. For starters, we have FMM. We have done a complete
computational model and can prove that this will scale almost indefinitely.
The first paper is out, and the other 2 are almost done. We are also
implementing wavelets, since the structure and proofs are very similar to
FMM. The strategy is to use FMM/Wavelets for problems they can solve to
precondition more complex problems. The prototype is Stokes preconditioning
variable viscosity Stokes, which I am working on with Dave May and Dave Yuen.

> I'll be interested to see what sort of performance you can get for real
> preconditioners on a GPU.

Felipe Cruz has preliminary numbers for FMM: 500 GF on a single 1060C! That
is probably 10 times what you can hope to achieve with traditional
relaxation (I think).

   Matt

>
> Jed
>
-- 
What most experimenters take for granted before they begin their experiments
is infinitely more interesting than any results to which their experiments
lead.
-- Norbert Wiener
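
Illustrative note (not part of the original message): the additive versus
multiplicative distinction discussed above can be sketched on a toy problem.
The code below is a minimal sketch, assuming a 1D Poisson model problem
(-u'' = f with zero Dirichlet ends) and plain Python lists; the function
names are hypothetical and nothing here is PETSc or GPU code. In the Jacobi
sweep every update reads only the old iterate, so all points could in
principle be updated by independent threads. In the Gauss-Seidel sweep each
update reads the value just written to its left neighbour, which serializes
the loop but usually reduces the residual faster per sweep.

```python
def jacobi_sweep(u, f, h):
    """One Jacobi sweep: every update reads only the old iterate u."""
    new = u[:]  # boundary entries are copied and never modified
    for i in range(1, len(u) - 1):
        # No dependence between iterations of this loop (additive update).
        new[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
    return new


def gauss_seidel_sweep(u, f, h):
    """One lexicographic Gauss-Seidel sweep: updates are used immediately."""
    new = u[:]
    for i in range(1, len(u) - 1):
        # new[i - 1] was written in the previous iteration of this loop,
        # so the sweep is inherently sequential (multiplicative update).
        new[i] = 0.5 * (new[i - 1] + new[i + 1] + h * h * f[i])
    return new


def residual_norm(u, f, h):
    """Max-norm of f - A u for the standard 3-point Laplacian."""
    return max(
        abs(f[i] - (2.0 * u[i] - u[i - 1] - u[i + 1]) / (h * h))
        for i in range(1, len(u) - 1)
    )


def run(sweep, n=33, iters=200):
    """Apply `iters` sweeps starting from u = 0 with f = 1; return residual."""
    h = 1.0 / (n - 1)
    f = [1.0] * n
    u = [0.0] * n
    for _ in range(iters):
        u = sweep(u, f, h)
    return residual_norm(u, f, h)


if __name__ == "__main__":
    print("Jacobi residual:      ", run(jacobi_sweep))
    print("Gauss-Seidel residual:", run(gauss_seidel_sweep))
```

On this toy problem the Gauss-Seidel residual should come out smaller than
the Jacobi residual after the same number of sweeps, which is the
effectiveness gap mentioned in the thread; the Jacobi loop is the one that
maps naturally onto thousands of independent threads.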