Though I did not repeat it, I assumed --mca btl tcp,self is always used, as described in the initial email.
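
In case it is useful, here is a minimal sketch of the wrapper.sh mentioned further down in the thread. It is only an illustration: it assumes one rank per core, that taskset is available, and that the runtime exports one of OMPI_COMM_WORLD_RANK / PMIX_RANK / PMI_RANK (which one is set depends on the Open MPI version), and it simply pins rank i to core i:

#!/bin/sh
# wrapper.sh - bind the calling MPI rank to one core, then exec the real program.
# Pick up the rank id from whichever variable the runtime exports.
rank=${OMPI_COMM_WORLD_RANK:-${PMIX_RANK:-${PMI_RANK:?no rank variable found}}}
# Pin rank i to core i (adjust the mapping for your topology, e.g. per socket),
# then replace this shell with the benchmark so signals reach it directly.
exec taskset -c "$rank" "$@"

It would be launched as
mpirun --bind-to none ./wrapper.sh ./skampi ...
Across several nodes the local rank (e.g. OMPI_COMM_WORLD_LOCAL_RANK) would be the one to use, but on a single 32-core box the world rank is enough.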
Cheers,

Gilles

On Monday, January 25, 2016, Ralph Castain <[email protected]> wrote:
> I believe the performance penalty will still always be greater than zero,
> however, as the TCP stack is smart enough to take an optimized path when
> doing a loopback as opposed to inter-node communication.
>
> On Mon, Jan 25, 2016 at 4:28 AM, Gilles Gouaillardet <[email protected]> wrote:
>> Federico,
>>
>> I did not expect 0% degradation, since you are now comparing two
>> different cases:
>> - 1 orted means tasks are bound to sockets
>> - 16 orteds means tasks are not bound.
>>
>> A quick way to improve things is to use a wrapper that binds the MPI tasks:
>> mpirun --bind-to none wrapper.sh skampi
>>
>> wrapper.sh can use an environment variable to retrieve the rank id
>> (PMI(X)_RANK iirc) and then bind the task with taskset or the hwloc utils.
>>
>> mpirun --tag-output grep Cpus_allowed_list /proc/self/status
>> with 1 orted should return the same output as
>> mpirun --tag-output --bind-to none wrapper.sh grep Cpus_allowed_list /proc/self/status
>> with 16 orteds.
>>
>> When wrapper.sh works fine, the SkaMPI degradation should be smaller with
>> 16 orteds.
>>
>> Cheers,
>>
>> Gilles
>>
>> On Monday, January 25, 2016, Federico Reghenzani <[email protected]> wrote:
>>> Thank you Gilles, you're right: with --bind-to none we have ~15%
>>> degradation rather than 50%.
>>>
>>> It's much better now, but I think it should be (in theory) around 0%.
>>> The benchmark is MPI bound (the standard benchmark provided with
>>> SkaMPI); it tests these functions: MPI_Bcast, MPI_Barrier, MPI_Reduce,
>>> MPI_Allreduce, MPI_Alltoall, MPI_Gather, MPI_Scatter, MPI_Scan,
>>> MPI_Send/Recv.
>>>
>>> Cheers,
>>> Federico
>>> __
>>> Federico Reghenzani
>>> M.Eng. Student @ Politecnico di Milano
>>> Computer Science and Engineering
>>>
>>> 2016-01-25 12:17 GMT+01:00 Gilles Gouaillardet <[email protected]>:
>>>> Federico,
>>>>
>>>> Unless you already took care of that, I would guess all 16 orteds
>>>> bound their children MPI tasks to socket 0.
>>>>
>>>> Can you try
>>>> mpirun --bind-to none ...
>>>>
>>>> Btw, is your benchmark application CPU bound? memory bound? MPI bound?
>>>>
>>>> Cheers,
>>>>
>>>> Gilles
>>>>
>>>> On Monday, January 25, 2016, Federico Reghenzani <[email protected]> wrote:
>>>>> Hello,
>>>>>
>>>>> we have executed a benchmark (SkaMPI) on the same machine (32-core
>>>>> Intel Xeon x86_64) with these two configurations:
>>>>> - 1 orted with 16 processes, with the BTL forced to TCP (--mca btl self,tcp)
>>>>> - 16 orteds with 1 process each (also using TCP)
>>>>>
>>>>> We use a custom RAS to allow multiple orteds on the same machine (I
>>>>> know that it seems nonsense to have multiple orteds on the same machine
>>>>> for the same application, but we are doing some experiments on
>>>>> migration).
>>>>>
>>>>> Initially we expected approximately the same performance in both
>>>>> cases (we have 16 processes communicating via TCP in both cases), but we
>>>>> see a degradation of 50%, and we are sure it is not an overhead due to
>>>>> the orteds' initialization.
>>>>>
>>>>> Do you have any idea how multiple orteds can influence the processes'
>>>>> performance?
>>>>>
>>>>> Cheers,
>>>>> Federico
>>>>> __
>>>>> Federico Reghenzani
>>>>> M.Eng. Student @ Politecnico di Milano
>>>>> Computer Science and Engineering
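
For reference, the single-orted configuration discussed above boils down to an ordinary invocation along these lines (the binary name and input file are placeholders, and the 16-orted case cannot be reproduced with a stock mpirun since it relies on the custom RAS):

mpirun -np 16 --mca btl self,tcp ./skampi <skampi input file>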
