See pp. 38-40 of the slides linked below: MVAPICH2 outperforms Open MPI in every test. Is that because MVAPICH2 includes CUDA/GPU optimizations that Open MPI lacks, or was MVAPICH2 specifically tuned for these benchmarks to make it shine?

http://hpcadvisorycouncil.com/events/2012/Israel-Workshop/Presentations/7_OSU.pdf

The benchmark package:
http://mvapich.cse.ohio-state.edu/benchmarks/

Rayson

=================================
Open Grid Scheduler / Grid Engine
http://gridscheduler.sourceforge.net/

Scalable Grid Engine Support Program
http://www.scalablelogic.com/