Federico, unless you already took care of that, I would guess all 16 orteds bound their child MPI tasks to socket 0.
Can you try
mpirun --bind-to none ...

Btw, is your benchmark application CPU bound? Memory bound? MPI bound?

Cheers,

Gilles

On Monday, January 25, 2016, Federico Reghenzani <[email protected]> wrote:
> Hello,
>
> we have executed a benchmark (SkaMPI) on the same machine (32-core Intel
> Xeon x86_64) with these two configurations:
> - 1 orted with 16 processes, with the BTL forced to TCP (--mca btl self,tcp)
> - 16 orteds with 1 process each (also using TCP)
>
> We use a custom RAS to allow multiple orteds on the same machine (I know
> it seems nonsensical to run multiple orteds on the same machine for the
> same application, but we are doing some experiments with migration).
>
> Initially we expected approximately the same performance in both cases
> (we have 16 processes communicating via TCP either way), but we see a
> degradation of 50%, and we are sure it is not overhead from orted
> initialization.
>
> Do you have any idea how multiple orteds can influence process
> performance?
>
> Cheers,
> Federico
> __
> Federico Reghenzani
> M.Eng. Student @ Politecnico di Milano
> Computer Science and Engineering
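As a sketch of how to check Gilles' hypothesis: Open MPI's mpirun supports a `--report-bindings` option that prints, for each rank, which cores/sockets it was bound to. If each of the 16 orteds independently binds its single local rank, you would expect to see all ranks reporting cores on socket 0. The binary name `./skampi` below is a placeholder for the actual benchmark executable:

```shell
# Show where each rank gets bound; with 16 single-rank orteds, watch for
# every rank landing on socket 0 only:
mpirun --report-bindings --mca btl self,tcp -np 16 ./skampi

# Then disable binding entirely, as suggested above, and compare timings:
mpirun --bind-to none --mca btl self,tcp -np 16 ./skampi
```

If the 50% gap disappears with `--bind-to none`, the slowdown was oversubscription of socket 0 rather than anything TCP-related.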
