A bug in my previous mail: it should be "span,pe=2".
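For reference, a sketch of the full corrected invocation under the same assumptions as below (Open MPI 1.8.4, two 8-core sockets), with --report-bindings added so you can verify the result; the executable name is just a placeholder:

  mpirun --map-by ppr:4:socket:span,pe=2 --report-bindings ./prog1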
2015-04-10 15:28 GMT+02:00 Nick Papior Andersen <nickpap...@gmail.com>:

> I guess you want process #1 to have core 0 and core 1 bound to it, and
> process #2 to have core 2 and core 3 bound?
>
> I can do this with (I do this with 1.8.4; I do not think it works with
> 1.6.x):
>
>   --map-by ppr:4:socket:span:pe=2
>
>   ppr    = processes per resource
>   socket = the resource
>   span   = load-balance the processes
>   pe     = bind processing elements to each process
>
> This should launch 8 processes (you have 2 sockets), and each process
> should have 2 processing elements bound to it.
> You can check with --report-bindings to see each process's bindings.
>
> 2015-04-10 15:16 GMT+02:00 <twu...@goodyear.com>:
>
>> We can't seem to get "processor affinity" using 1.6.4 or newer Open MPI.
>>
>> Note this is a 2-socket machine with 8 cores per socket.
>>
>> We had compiled Open MPI 1.4.2 with the following configure options:
>>
>> ===========================================================================
>> export CC=/apps/share/intel/v14.0.4.211/bin/icc
>> export CXX=/apps/share/intel/v14.0.4.211/bin/icpc
>> export FC=/apps/share/intel/v14.0.4.211/bin/ifort
>>
>> version=1.4.2.I1404211
>>
>> ./configure \
>>   --prefix=/apps/share/openmpi/$version \
>>   --disable-shared \
>>   --enable-static \
>>   --enable-shared=no \
>>   --with-openib \
>>   --with-libnuma=/usr \
>>   --enable-mpirun-prefix-by-default \
>>   --with-memory-manager=none \
>>   --with-tm=/apps/share/TORQUE/current/Linux
>> ===========================================================================
>>
>> and then used this mpirun command (where we used 8 cores):
>>
>> ===========================================================================
>> /apps/share/openmpi/1.4.2.I1404211/bin/mpirun \
>>   --prefix /apps/share/openmpi/1.4.2.I1404211 \
>>   --mca mpi_paffinity_alone 1 \
>>   --mca btl openib,tcp,sm,self \
>>   -x LD_LIBRARY_PATH \
>>   {model args}
>> ===========================================================================
>>
>> And when we checked the process map, it looked like this:
>>
>>   PID COMMAND CPUMASK   TOTAL [     N0     N1 N2 N3 N4 N5 ]
>> 22232 prog1   0        469.9M [ 469.9M      0  0  0  0  0 ]
>> 22233 prog1   1        479.0M [   4.0M 475.0M  0  0  0  0 ]
>> 22234 prog1   2        516.7M [ 516.7M      0  0  0  0  0 ]
>> 22235 prog1   3        485.4M [   8.0M 477.4M  0  0  0  0 ]
>> 22236 prog1   4        482.6M [ 482.6M      0  0  0  0  0 ]
>> 22237 prog1   5        486.6M [   6.0M 480.6M  0  0  0  0 ]
>> 22238 prog1   6        481.3M [ 481.3M      0  0  0  0  0 ]
>> 22239 prog1   7        419.4M [   8.0M 411.4M  0  0  0  0 ]
>>
>> Now with 1.6.4 and higher, we did the following:
>>
>> ===========================================================================
>> export CC=/apps/share/intel/v14.0.4.211/bin/icc
>> export CXX=/apps/share/intel/v14.0.4.211/bin/icpc
>> export FC=/apps/share/intel/v14.0.4.211/bin/ifort
>>
>> version=1.6.4.I1404211
>>
>> ./configure \
>>   --disable-vt \
>>   --prefix=/apps/share/openmpi/$version \
>>   --disable-shared \
>>   --enable-static \
>>   --with-verbs \
>>   --enable-mpirun-prefix-by-default \
>>   --with-memory-manager=none \
>>   --with-hwloc \
>>   --enable-mpi-ext \
>>   --with-tm=/apps/share/TORQUE/current/Linux
>> ===========================================================================
>>
>> We've tried the same mpirun command, with -bind-to-core, with
>> -bind-to-core -bycore, etc., and I can't seem to get the right
>> combination of args to get the same behavior as 1.4.2.
>> We get the following process map (this output is with mpirun args
>> --bind-to-socket --mca mpi_paffinity_alone 1):
>>
>>   PID COMMAND CPUMASK                                    TOTAL [    N0 N1 N2 N3 N4 N5 ]
>> 24176 prog1   0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30 60.2M [ 60.2M  0  0  0  0  0 ]
>> 24177 prog1   0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30 60.5M [ 60.5M  0  0  0  0  0 ]
>> 24178 prog1   0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30 60.5M [ 60.5M  0  0  0  0  0 ]
>> 24179 prog1   0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30 60.5M [ 60.5M  0  0  0  0  0 ]
>> 24180 prog1   0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30 60.5M [ 60.5M  0  0  0  0  0 ]
>> 24181 prog1   0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30 60.5M [ 60.5M  0  0  0  0  0 ]
>> 24182 prog1   0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30 60.5M [ 60.5M  0  0  0  0  0 ]
>> 24183 prog1   0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30 60.5M [ 60.5M  0  0  0  0  0 ]
>>
>> Here is the map using just --mca mpi_paffinity_alone 1:
>>
>>   PID COMMAND CPUMASK TOTAL [    N0 N1 N2 N3 N4 N5 ]
>> 25846 prog1   0,16    60.6M [ 60.6M  0  0  0  0  0 ]
>> 25847 prog1   2,18    60.6M [ 60.6M  0  0  0  0  0 ]
>> 25848 prog1   4,20    60.6M [ 60.6M  0  0  0  0  0 ]
>> 25849 prog1   6,22    60.6M [ 60.6M  0  0  0  0  0 ]
>> 25850 prog1   8,24    60.6M [ 60.6M  0  0  0  0  0 ]
>> 25851 prog1   10,26   60.6M [ 60.6M  0  0  0  0  0 ]
>> 25852 prog1   12,28   60.6M [ 60.6M  0  0  0  0  0 ]
>> 25853 prog1   14,30   60.6M [ 60.6M  0  0  0  0  0 ]
>>
>> I figure I am compiling incorrectly or using the wrong mpirun args.
>>
>> Can someone tell me how to duplicate the behavior of 1.4.2 regarding
>> binding the processes to cores?
>>
>> Any help appreciated.
>>
>> Thanks,
>>
>> Tom
>>
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2015/04/17205.php
>
>
> --
> Kind regards Nick

--
Kind regards Nick