Folks,

I ran some more tests and found the following with both master and v2.x:
mpirun --host n0:16,n1:16 -np 4 --tag-output hostname | sort
[1,0]<stdout>:n0
[1,1]<stdout>:n0
[1,2]<stdout>:n0
[1,3]<stdout>:n0
The output is the same with the --map-by socket option.
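For reference, the --map-by socket variant would have been run along these lines (same hosts and slot counts as above; I am only spelling out the command for clarity):
mpirun --host n0:16,n1:16 -np 4 --tag-output --map-by socket hostname | sort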
Now, without specifying the number of slots per host, and with the --oversubscribe option (which is mandatory for v2.x):

v2.x:
mpirun --host n0,n1 -np 4 --tag-output --oversubscribe hostname | sort
[1,0]<stdout>:n0
[1,1]<stdout>:n0
[1,2]<stdout>:n1
[1,3]<stdout>:n1
master:
mpirun --host n0,n1 -np 4 --tag-output --oversubscribe hostname | sort
[1,0]<stdout>:n0
[1,1]<stdout>:n0
[1,2]<stdout>:n0
[1,3]<stdout>:n0
There is no change if the --map-by socket option is used.
My observation is that the hardware topology is not retrieved when the number of slots is specified (on both v2.x and master). In that case the default policy is --map-by slot, *and* an explicit --map-by socket option seems to be ignored. Should we abort instead of silently ignoring this option?

When the number of slots is not specified (and --oversubscribe is used), the hardware topology seems to be retrieved on v2.x, but not on master; master only retrieves the number of slots and uses that.

From an end-user point of view, the default mapping policy is therefore --map-by socket on v2.x and --map-by slot on master, and --map-by socket seems to be ignored on master.
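A quick way to double check which policy is actually applied would be to add --display-map to the same command line and compare what each branch prints just before launch (a sketch only; the exact fields in the printed map may differ between branches):
mpirun --display-map --host n0,n1 -np 4 --oversubscribe --map-by socket hostname
Based on the hostname outputs above, I would expect the map to show two procs per node on v2.x, and all four procs on n0 on master.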
I re-read the previous discussions, and I do not think this level of detail was ever covered.
FWIW, the --map-by node option is correctly interpreted on both master and v2.x:
mpirun --host n0,n1 -np 4 --tag-output --oversubscribe --map-by node hostname | sort
[1,0]<stdout>:n0
[1,1]<stdout>:n1
[1,2]<stdout>:n0
[1,3]<stdout>:n1
Also, I can get the mapping I wished for/expected with --map-by ppr:2:node.
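For completeness, that run would look something like this (host list and --oversubscribe carried over from the runs above); the output shown is the mapping I was after, i.e. the first 2 tasks on n0 and the last 2 on n1:
mpirun --host n0,n1 -np 4 --tag-output --oversubscribe --map-by ppr:2:node hostname | sort
[1,0]<stdout>:n0
[1,1]<stdout>:n0
[1,2]<stdout>:n1
[1,3]<stdout>:n1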
Bottom line:
1) Should we abort if the number of slots is explicitly specified and --map-by socket (or a similar option) is requested?
2) On master only, when the number of slots per host is not specified, should we retrieve the hardware topology instead of the number of slots? If not, should we abort if --map-by socket is specified?
If there is a consensus and changes are desired, I am fine with trying to implement them.
Cheers,
Gilles
On 5/17/2016 11:01 AM, Gilles Gouaillardet wrote:
Folks,
Currently, the default mapping policy on master is different from the one on v2.x.
My preliminary question is: when will the master mapping policy land in the release branch? v2.0.0? v2.x? v3.0.0?
Here are some commands and their output (both n0 and n1 have 16 cores each; mpirun runs on n0).
First, let's force 2 slots per node via the --host parameter and play with the mapping:
[gilles@n0 ~]$ mpirun --tag-output --host n0:2,n1:2 -np 4 hostname | sort
[1,0]<stdout>:n0
[1,1]<stdout>:n0
[1,2]<stdout>:n1
[1,3]<stdout>:n1
[gilles@n0 ~]$ mpirun --tag-output --host n0:2,n1:2 -np 4 --map-by socket hostname | sort
[1,0]<stdout>:n0
[1,1]<stdout>:n0
[1,2]<stdout>:n1
[1,3]<stdout>:n1
/* so far so good: the default mapping is --map-by socket, and the mapping looks correct to me */
[gilles@n0 ~]$ mpirun --tag-output --host n0:2,n1:2 -np 4 --map-by node hostname | sort
[1,0]<stdout>:n0
[1,1]<stdout>:n1
[1,2]<stdout>:n0
[1,3]<stdout>:n1
/* mapping looks correct to me too */
Now let's force 4 slots per node:
[gilles@n0 ~]$ mpirun --tag-output --host n0:4,n1:4 -np 4 --map-by node hostname | sort
[1,0]<stdout>:n0
[1,1]<stdout>:n1
[1,2]<stdout>:n0
[1,3]<stdout>:n1
/* same output as previously, looks correct to me */
[gilles@n0 ~]$ mpirun --tag-output --host n0:4,n1:4 -np 4 --map-by socket hostname | sort
[1,0]<stdout>:n0
[1,1]<stdout>:n0
[1,2]<stdout>:n0
[1,3]<stdout>:n0
/* all tasks run on n0 even though I explicitly requested --map-by socket; that looks wrong to me */
[gilles@n0 ~]$ mpirun --tag-output --host n0:4,n1:4 -np 4 hostname | sort
[1,0]<stdout>:n0
[1,1]<stdout>:n0
[1,2]<stdout>:n0
[1,3]<stdout>:n0
/* same output as previously, which makes sense to me since the default mapping policy is --map-by socket, but all tasks run on n0, which still looks wrong to me */
If I do not force the number of slots, I get the same output (16 cores are detected on each node) regardless of the --map-by socket option. It seems --map-by core is used, regardless of what we pass on the command line.
In the last cases, is running all tasks on one node the intended behavior? If yes, which mapping option can be used to run the first 2 tasks on the first node and the last 2 tasks on the second node?
Cheers,
Gilles