Though I did not repeat it, I assumed --mca btl tcp,self is always used, as
described in the initial email.

Cheers,

Gilles

On Monday, January 25, 2016, Ralph Castain <r...@open-mpi.org> wrote:

> I believe the performance penalty will still always be greater than zero,
> however, as the TCP stack is smart enough to take an optimized path for
> loopback as opposed to inter-node communication.
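>
> A quick way to see that effect (a sketch: nodeA/nodeB are placeholder
> hostnames, and osu_latency stands in for any ping-pong benchmark, here
> from the OSU micro-benchmarks) is to run the same TCP test once within a
> node and once across two nodes:
>
>   # both ranks on one node: TCP takes the loopback fast path
>   mpirun -np 2 --host nodeA,nodeA --mca btl tcp,self osu_latency
>   # one rank per node: TCP goes over the wire
>   mpirun -np 2 --host nodeA,nodeB --mca btl tcp,self osu_latency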
>
>
> On Mon, Jan 25, 2016 at 4:28 AM, Gilles Gouaillardet <
> gilles.gouaillar...@gmail.com> wrote:
>
>> Federico,
>>
>> I did not expect 0% degradation, since you are now comparing two
>> different cases:
>> - 1 orted means tasks are bound to sockets
>> - 16 orteds means tasks are not bound
>>
>> a quick way to improve things is to use a wrapper that binds the MPI tasks:
>> mpirun --bind-to none wrapper.sh skampi
>>
>> wrapper.sh can use an environment variable to retrieve the rank id
>> (PMI_RANK or PMIX_RANK, iirc) and then bind the task with taskset or the
>> hwloc utils
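>>
>> a minimal wrapper.sh sketch (assuming one of those rank variables is
>> set; adjust the variable name and the core count to your setup):
>>
>>   #!/bin/sh
>>   # rank id exported by the launcher; the variable name varies
>>   # (PMIX_RANK / PMI_RANK here, per the note above; adjust if needed)
>>   rank=${PMIX_RANK:-$PMI_RANK}
>>   # bind this task to one core, round-robin over the 32 cores of the node
>>   exec taskset -c $((rank % 32)) "$@"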
>>
>> mpirun --tag-output grep Cpus_allowed_list /proc/self/status
>> with 1 orted should return the same output as
>> mpirun --tag-output --bind-to none wrapper.sh grep Cpus_allowed_list
>> /proc/self/status
>> with 16 orteds
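>>
>> concretely (a sketch; bound.txt/nobind.txt are placeholder file names,
>> and the sort is just to line the ranks up for the diff):
>>
>>   # run this one with 1 orted
>>   mpirun --tag-output grep Cpus_allowed_list /proc/self/status | sort > bound.txt
>>   # and this one with 16 orteds
>>   mpirun --tag-output --bind-to none wrapper.sh grep Cpus_allowed_list /proc/self/status | sort > nobind.txt
>>   # no diff output means the wrapper reproduces the same bindings
>>   diff bound.txt nobind.txt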
>>
>> once wrapper.sh works fine, the skampi degradation should be smaller with
>> 16 orteds
>>
>> Cheers,
>>
>> Gilles
>>
>> On Monday, January 25, 2016, Federico Reghenzani <
>> federico1.reghenz...@mail.polimi.it> wrote:
>>
>>> Thank you Gilles, you're right: with --bind-to none we have ~15%
>>> degradation rather than 50%.
>>>
>>> It's much better now, but I think it should be (in theory) around 0%.
>>> The benchmark is MPI bound (the standard benchmark provided with
>>> SkaMPI); it tests these functions: MPI_Bcast, MPI_Barrier, MPI_Reduce,
>>> MPI_Allreduce, MPI_Alltoall, MPI_Gather, MPI_Scatter, MPI_Scan,
>>> MPI_Send/Recv.
>>>
>>> Cheers,
>>> Federico
>>> __
>>> Federico Reghenzani
>>> M.Eng. Student @ Politecnico di Milano
>>> Computer Science and Engineering
>>>
>>>
>>>
>>> 2016-01-25 12:17 GMT+01:00 Gilles Gouaillardet <
>>> gilles.gouaillar...@gmail.com>:
>>>
>>>> Federico,
>>>>
>>>> unless you already took care of that, I would guess all 16 orteds
>>>> bound their child MPI tasks to socket 0
>>>>
>>>> can you try
>>>> mpirun --bind-to none ...
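>>>>
>>>> you can also double check the actual bindings with the standard
>>>> --report-bindings option (mpirun prints one line per task with the
>>>> cores it is bound to):
>>>>   mpirun --report-bindings -np 16 skampi ...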
>>>>
>>>> btw, is your benchmark application CPU bound? memory bound? MPI bound?
>>>>
>>>> Cheers,
>>>>
>>>> Gilles
>>>>
>>>>
>>>> On Monday, January 25, 2016, Federico Reghenzani <
>>>> federico1.reghenz...@mail.polimi.it> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> we have executed a benchmark (SkaMPI) on the same machine (32-core
>>>>> Intel Xeon x86_64) with these two configurations:
>>>>> - 1 orted with 16 processes, with the BTL forced to TCP (--mca btl self,tcp)
>>>>> - 16 orteds with 1 process each (also using TCP)
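>>>>>
>>>>> For reference, the single-orted configuration is launched roughly like
>>>>> this (a sketch; the SkaMPI input/output file names are placeholders):
>>>>>   mpirun -np 16 --mca btl self,tcp skampi -i skampi.ski -o skampi.sko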
>>>>>
>>>>> We use a custom RAS to allow multiple orteds on the same machine (I
>>>>> know it seems nonsense to have multiple orteds on the same machine for
>>>>> the same application, but we are doing some experiments with
>>>>> migration).
>>>>>
>>>>> Initially we expected approximately the same performance in both
>>>>> cases (we have 16 processes communicating via TCP either way), but we
>>>>> see a degradation of 50%, and we are sure it is not an overhead due to
>>>>> orted initialization.
>>>>>
>>>>> Do you have any idea how multiple orteds can influence the processes'
>>>>> performance?
>>>>>
>>>>>
>>>>> Cheers,
>>>>> Federico
>>>>> __
>>>>> Federico Reghenzani
>>>>> M.Eng. Student @ Politecnico di Milano
>>>>> Computer Science and Engineering
>>>>>
>>>>>
>>>>>