I guess you want process #1 to have cores 0 and 1 bound to it, process #2 to have cores 2 and 3, and so on?
I can do this with (I do this with 1.8.4; I do not think it works with 1.6.x):

    --map-by ppr:4:socket:span:pe=2

where:

    ppr    = processes per resource
    socket = the resource
    span   = load-balance the processes across the sockets
    pe     = the number of processing elements to bind to each process

This should launch 8 processes (you have 2 sockets), each with 2 processing elements bound to it. You can check the resulting bindings with --report-bindings; a full command sketch follows below.
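On your 2-socket, 8-cores-per-socket node the full command would look something like this (untested sketch; keep your usual --prefix and --mca options, and {model args} is the same placeholder from your own command line):

    mpirun -np 8 \
        --map-by ppr:4:socket:span:pe=2 \
        --report-bindings \
        {model args}

That should place 4 ranks on each socket with 2 cores bound to each rank; --report-bindings makes mpirun print every rank's actual binding at startup, so you can compare it directly against your 1.4.2 process map.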
2015-04-10 15:16 GMT+02:00 <twu...@goodyear.com>:

> We can't seem to get "processor affinity" using 1.6.4 or newer Open MPI.
>
> Note this is a 2-socket machine with 8 cores per socket.
>
> We had compiled Open MPI 1.4.2 with the following configure options:
>
> ===========================================================================
> export CC=/apps/share/intel/v14.0.4.211/bin/icc
> export CXX=/apps/share/intel/v14.0.4.211/bin/icpc
> export FC=/apps/share/intel/v14.0.4.211/bin/ifort
>
> version=1.4.2.I1404211
>
> ./configure \
>   --prefix=/apps/share/openmpi/$version \
>   --disable-shared \
>   --enable-static \
>   --enable-shared=no \
>   --with-openib \
>   --with-libnuma=/usr \
>   --enable-mpirun-prefix-by-default \
>   --with-memory-manager=none \
>   --with-tm=/apps/share/TORQUE/current/Linux
> ===========================================================================
>
> and then used this mpirun command (where we used 8 cores):
>
> ===========================================================================
> /apps/share/openmpi/1.4.2.I1404211/bin/mpirun \
>   --prefix /apps/share/openmpi/1.4.2.I1404211 \
>   --mca mpi_paffinity_alone 1 \
>   --mca btl openib,tcp,sm,self \
>   -x LD_LIBRARY_PATH \
>   {model args}
> ===========================================================================
>
> And when we checked the process map, it looks like this:
>
>   PID  COMMAND  CPUMASK   TOTAL  [      N0      N1  N2  N3  N4  N5 ]
> 22232  prog1          0  469.9M  [  469.9M       0   0   0   0   0 ]
> 22233  prog1          1  479.0M  [    4.0M  475.0M   0   0   0   0 ]
> 22234  prog1          2  516.7M  [  516.7M       0   0   0   0   0 ]
> 22235  prog1          3  485.4M  [    8.0M  477.4M   0   0   0   0 ]
> 22236  prog1          4  482.6M  [  482.6M       0   0   0   0   0 ]
> 22237  prog1          5  486.6M  [    6.0M  480.6M   0   0   0   0 ]
> 22238  prog1          6  481.3M  [  481.3M       0   0   0   0   0 ]
> 22239  prog1          7  419.4M  [    8.0M  411.4M   0   0   0   0 ]
>
> Now with 1.6.4 and higher, we did the following:
>
> ===========================================================================
> export CC=/apps/share/intel/v14.0.4.211/bin/icc
> export CXX=/apps/share/intel/v14.0.4.211/bin/icpc
> export FC=/apps/share/intel/v14.0.4.211/bin/ifort
>
> version=1.6.4.I1404211
>
> ./configure \
>   --disable-vt \
>   --prefix=/apps/share/openmpi/$version \
>   --disable-shared \
>   --enable-static \
>   --with-verbs \
>   --enable-mpirun-prefix-by-default \
>   --with-memory-manager=none \
>   --with-hwloc \
>   --enable-mpi-ext \
>   --with-tm=/apps/share/TORQUE/current/Linux
> ===========================================================================
>
> We've tried the same mpirun command, with -bind-to-core, with
> -bind-to-core -bycore, etc., and I can't seem to get the right
> combination of args to get the same behavior as 1.4.2.
>
> We get the following process map (this output is with mpirun args
> --bind-to-socket --mca mpi_paffinity_alone 1):
>
>   PID  COMMAND  CPUMASK                                      TOTAL  [     N0  N1  N2  N3  N4  N5 ]
> 24176  prog1    0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30   60.2M  [  60.2M   0   0   0   0   0 ]
> 24177  prog1    0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30   60.5M  [  60.5M   0   0   0   0   0 ]
> 24178  prog1    0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30   60.5M  [  60.5M   0   0   0   0   0 ]
> 24179  prog1    0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30   60.5M  [  60.5M   0   0   0   0   0 ]
> 24180  prog1    0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30   60.5M  [  60.5M   0   0   0   0   0 ]
> 24181  prog1    0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30   60.5M  [  60.5M   0   0   0   0   0 ]
> 24182  prog1    0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30   60.5M  [  60.5M   0   0   0   0   0 ]
> 24183  prog1    0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30   60.5M  [  60.5M   0   0   0   0   0 ]
>
> Here is the map using just --mca mpi_paffinity_alone 1:
>
>   PID  COMMAND  CPUMASK  TOTAL  [     N0  N1  N2  N3  N4  N5 ]
> 25846  prog1       0,16  60.6M  [  60.6M   0   0   0   0   0 ]
> 25847  prog1       2,18  60.6M  [  60.6M   0   0   0   0   0 ]
> 25848  prog1       4,20  60.6M  [  60.6M   0   0   0   0   0 ]
> 25849  prog1       6,22  60.6M  [  60.6M   0   0   0   0   0 ]
> 25850  prog1       8,24  60.6M  [  60.6M   0   0   0   0   0 ]
> 25851  prog1      10,26  60.6M  [  60.6M   0   0   0   0   0 ]
> 25852  prog1      12,28  60.6M  [  60.6M   0   0   0   0   0 ]
> 25853  prog1      14,30  60.6M  [  60.6M   0   0   0   0   0 ]
>
> I figure I am compiling incorrectly or using the wrong mpirun args.
>
> Can someone tell me how to duplicate the behavior of 1.4.2 regarding
> binding the processes to cores?
>
> Any help appreciated.
>
> thanks
>
> tom
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2015/04/17205.php

--
Kind regards
Nick
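P.S. If you want to double-check the result independently of --report-bindings, here is a quick Linux-only sketch (assuming the binary is still called prog1, as in your process maps) that prints the CPU list the kernel allows each process to run on:

    # For every running prog1 process, show its PID and allowed-CPU list
    for pid in $(pgrep prog1); do
        printf '%s: ' "$pid"
        grep Cpus_allowed_list /proc/$pid/status
    done

If the binding worked, each rank should report the CPUs of exactly two cores, much like the per-process CPUMASK column in your 1.4.2 map.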