On Thu, Nov 24, 2011 at 4:09 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
>   Jed,
>
>   Let's stop arguing about whether MPI is or is not a good base for the
> next generation of HPC software but instead start a new conversation on
> what API (implemented on top of or not on top of MPI/pthreads etc etc) we
> want to build PETSc on to scale PETSc up to millions of cores with large
> NUMA nodes and GPU like accelerators.
>
>   What do you want in the API?

Let's start with the "lowest" level, or at least the smallest. I think the
only sane way to program for portable performance here is using CUDA-type
vectorization. This SIMT style is explained well here:

  http://www.yosefk.com/blog/simd-simt-smt-parallelism-in-nvidia-gpus.html

I think this is much easier and more portable than the intrinsics for Intel,
and more performant and less error prone than threads. I think you can show
that it will accomplish anything we want to do. OpenCL seems to have
capitulated on this point. Do we agree here?

    Matt

>
>    Barry
>
>

-- 
What most experimenters take for granted before they begin their experiments
is infinitely more interesting than any results to which their experiments
lead.
-- Norbert Wiener