Federico,

unless you already took care of that, I would guess all 16 orteds
bound their child MPI tasks to socket 0

can you try
mpirun --bind-to none ...
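
A sketch of how you could check this (the `--report-bindings` flag prints each rank's binding to stderr; `./skampi` here stands in for whatever benchmark binary you actually launch):

```shell
# Assumption: the slowdown comes from all 16 orteds binding their ranks
# to socket 0. Disable binding and print what mpirun actually did:
mpirun --bind-to none --report-bindings -np 16 --mca btl self,tcp ./skampi
```

If the reported bindings in your current 16-orted run all show socket 0, that would explain the degradation.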

btw, is your benchmark application CPU bound? memory bound? MPI bound?

Cheers,

Gilles

On Monday, January 25, 2016, Federico Reghenzani <
federico1.reghenz...@mail.polimi.it> wrote:

> Hello,
>
> we have executed a benchmark (SkaMPI) on the same machine (32-core Intel
> Xeon x86_64) with these two configurations:
> - 1 orted with 16 processes, with the BTL forced to TCP (--mca btl self,tcp)
> - 16 orteds, each with 1 process (using TCP)
>
> We use a custom RAS to allow multiple orteds on the same machine (I know
> it seems nonsensical to have multiple orteds on the same machine for the
> same application, but we are doing some experiments with migration).
>
> Initially we expected approximately the same performance in both cases
> (16 processes communicating via TCP either way), but we see a 50%
> degradation, and we are sure it is not overhead from orted
> initialization.
>
> Do you have any idea how multiple orteds can influence the processes'
> performance?
>
>
> Cheers,
> Federico
> __
> Federico Reghenzani
> M.Eng. Student @ Politecnico di Milano
> Computer Science and Engineering
>
>
>