On Thu, Nov 24, 2011 at 4:49 PM, Jed Brown <jedbrown at mcs.anl.gov> wrote:
> On Thu, Nov 24, 2011 at 16:41, Matthew Knepley <knepley at gmail.com> wrote:
>>
>> Let's start with the "lowest" level, or at least the smallest. I think
>> the only sane way to program for portable performance here
>> is using CUDA-type vectorization. This SIMT style is explained well here
>> http://www.yosefk.com/blog/simd-simt-smt-parallelism-in-nvidia-gpus.html
>> I think this is much easier and more portable than the intrinsics for
>> Intel, and more performant and less error prone than threads.
>> I think you can show that it will accomplish anything we want to do.
>> OpenCL seems to have capitulated on this point. Do we agree
>> here?
>>
>
> Moving from the other thread, I asked how far we could get with an API for
> high-level data movement combined with CUDA/OpenCL kernels. Matt wrote
>
> *I think it will get you quite far, and the point for me will be*
> *how will the user describe a communication pattern, and how will we
> automate the generation of MPI*
> *from that specification. Sieve has an attempt to do this buried in it
> inspired by the "manifold" idea.*
>
> Now that CUDA supports function pointers and similar, we can write real
> code in it. Whenever OpenCL gets around to supporting them, we'll be able
> to write real code for multicore and see how it performs. To unify the
> distributed and manycore aspects, we need some sort of hierarchical
> abstraction for NUMA and a communicator-like object to maintain scope.
> After applying a local-distribution filter, we might be able to express
> this using coloring plus the parallel primitives that I have been
> suggesting in the other thread.
>

One key operation which has not yet been discussed is the "push forward" of a
mapping as Dmitry put it. Here is a scenario: We understand a matching of mesh
points between processes. In order to construct a ghost communication
(VecScatter), I need to compose the mapping between mesh points and the
mapping of mesh points to data.
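
A minimal sketch of the composition in the scenario above, written in plain
Python with no MPI. Everything here is an assumption made for illustration:
the name push_forward, the dictionary layouts, and the (offset, ndof)
encoding of the point-to-data map are hypothetical and are not PETSc or Sieve
API. It also assumes the neighbor's point-to-data map is already available
locally; in a real code that information would itself have to be communicated.

```python
# Hypothetical sketch: names and data layouts are illustrative only,
# not taken from PETSc or Sieve.

def push_forward(point_matching, local_section, remote_sections):
    """Compose a mesh-point matching with point-to-data maps.

    point_matching:  {local_point: (remote_rank, remote_point)}
    local_section:   {point: (offset, ndof)} on this process
    remote_sections: {rank: {point: (offset, ndof)}} for neighbor processes
                     (assumed already known here)

    Returns one (local_index, remote_rank, remote_index) triple per
    degree of freedom, i.e. the index pairs a ghost scatter would need.
    """
    scatter = []
    for lpoint, (rank, rpoint) in sorted(point_matching.items()):
        loff, lndof = local_section[lpoint]
        roff, rndof = remote_sections[rank][rpoint]
        if lndof != rndof:
            raise ValueError(
                f"point {lpoint}: local ndof {lndof} != remote ndof {rndof}"
            )
        # Matching points carry the same number of dofs, so pair them up.
        for d in range(lndof):
            scatter.append((loff + d, rank, roff + d))
    return scatter


# Process 0 ghosts two mesh points owned by process 1.
matching = {4: (1, 0), 5: (1, 2)}
local_section = {4: (8, 2), 5: (10, 1)}
remote_sections = {1: {0: (0, 2), 1: (2, 1), 2: (3, 1)}}

ghost_scatter = push_forward(matching, local_section, remote_sections)
```

The point matching never mentions data, and the point-to-data maps never
mention processes. Only their composition yields the index pairs from which a
scatter could be built.
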
I think this operation is generic and important. For example, it turns a mesh
point partition into a topology distribution, or if you like, a row partition
into a matrix distribution. I think this might be the right operation to take
any partition to a data distribution algorithm.

   Matt

> I'll think more on this and see if I can put together a concrete API
> proposal.
>

--
What most experimenters take for granted before they begin their experiments
is infinitely more interesting than any results to which their experiments
lead.
-- Norbert Wiener