On Fri, Nov 25, 2011 at 16:48, Matthew Knepley <knepley at gmail.com> wrote:
> Synopsis of what I said before to elicit comment:
>
> 1) I think the only thing we can learn from Brook, CUDA, OpenCL is that
> you identify threads by a grid ID.
>
> 2) Things like BLAS are so easy that you can move up to the streaming
> model, but this does not work for
>
>   - FD and FEM residual evaluation (Jed has an FD example with Aron, SNES
>     ex52 is my FEM example)
>
>   - FD and FEM Jacobian evaluation
>

I think these are also probably too simple. Discontinuous Galerkin with
overlapped flux computations and interior integration would be a somewhat
better model problem. Nonlinear Gauss-Seidel in a multigrid context would be
another.

> 3) If you look at ex52 I do a "thread transposition" meaning threads start
> working on different areas of memory which looks like a transpose on a
> 2D grid. I can do this using shared memory for the vector group.
>
> The API is very simple. Give grid indices to the thread, and its done in
> CUDA and OpenCL essentially the same way.
>

As is, this seems to assume a flat memory model and the memory access only
appears in how the kernel uses threadIdx to determine what memory to operate
on. If we could say something about this up-front, then the library could
schedule tasks relative to memory and perhaps handle some updates for
distributed memory. Can we have a way to specify the required memory access
before launching the kernels?
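
To make the last question concrete, here is a rough sketch of what "declare the memory access up front" might look like. It is only a toy model in Python, not an existing PETSc, CUDA, or OpenCL interface: the names Access, KernelTask, conflicts, and schedule are all hypothetical, and index ranges stand in for whatever a real description of the access pattern would be. Each task lists the ranges it reads and writes, and a greedy scheduler groups tasks with no read/write or write/write overlap into the same wave.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Access:
    """Hypothetical declaration: one contiguous index range of one array."""
    buffer: str   # name of the array being touched
    start: int    # first index touched (inclusive)
    stop: int     # one past the last index touched
    mode: str     # "r" for read, "w" for write


@dataclass
class KernelTask:
    """Hypothetical kernel launch that states its memory access in advance."""
    name: str
    accesses: List[Access]


def overlaps(a: Access, b: Access) -> bool:
    # Two declarations overlap if they name the same array
    # and their half-open index ranges intersect.
    return a.buffer == b.buffer and a.start < b.stop and b.start < a.stop


def conflicts(t1: KernelTask, t2: KernelTask) -> bool:
    # Tasks conflict when some overlapping pair involves at least one write.
    return any(
        overlaps(a, b) and "w" in (a.mode, b.mode)
        for a in t1.accesses
        for b in t2.accesses
    )


def schedule(tasks: List[KernelTask]) -> List[List[KernelTask]]:
    # Greedy, order-preserving: each task goes into the first wave that
    # comes after every earlier task it conflicts with.
    waves: List[List[KernelTask]] = []
    for task in tasks:
        level = 0
        for i, wave in enumerate(waves):
            if any(conflicts(task, other) for other in wave):
                level = i + 1
        if level == len(waves):
            waves.append([])
        waves[level].append(task)
    return waves


# Made-up DG-like example: interior integration and flux computation touch
# disjoint parts of the residual r, so they can overlap; the update reads
# all of r and must wait for both.
tasks = [
    KernelTask("interior", [Access("u", 0, 80, "r"), Access("r", 0, 80, "w")]),
    KernelTask("flux", [Access("u", 80, 100, "r"), Access("r", 80, 100, "w")]),
    KernelTask("update", [Access("r", 0, 100, "r"), Access("u", 0, 100, "w")]),
]
waves = schedule(tasks)
print([[t.name for t in wave] for wave in waves])
# prints [['interior', 'flux'], ['update']]
```

With that information available before launch, a library could in principle overlap the interior and flux kernels and place any distributed-memory update between the two waves. How the ranges would be expressed for real unstructured access is left open here.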
