Dear Sir or Madam,

I'm running an application on SMP clusters and want to get good performance for collective communication by taking advantage of shared memory. I browsed the official Open MPI website and saw that Open MPI can automatically select the best network for the hardware architecture. For example, on an SMP cluster it chooses shared memory for intra-node communication and sockets for inter-node communication.
Now I would like to know: is the optimization for collective communication built on top of point-to-point communication, or is it a separate part? Could you give me some details about the optimization of shared memory collectives?

Best regards,
Shigang Li