Aron,

1. It's all NUMA.
2. You don't get to repartition the matrix because that is unnatural and not a local optimization.
3. Because of 2, the algorithms are different, so direct comparison is not meaningful, but I do not buy that you can get the same throughput on the kernel that is natural and makes sense as a local optimization.

Jed

On Nov 12, 2010 7:02 PM, "Aron Ahmadia" <aron.ahmadia at kaust.edu.sa> wrote:
> A partial counter-point is that MatSolve with OpenMP is unlikely to be near the throughput of MPI-...
>
> I am just going to throw down the gauntlet here and say that I can reproduce or beat in (my choice of) OpenMP or pthreads on a reasonably UMA multi-core processor anything that can be implemented in MPI.
>
> -Aron