Bug: it should be "span,pe=2", i.e. the full option is --map-by ppr:4:socket:span,pe=2.
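
To spell it out, a minimal sketch of the corrected command line (assuming an Open MPI 1.8.4 mpirun on your PATH; {model args} is the application placeholder from your original post):

mpirun \
    --map-by ppr:4:socket:span,pe=2 \
    --report-bindings \
    {model args}

On your 2-socket, 8-cores-per-socket node this should start 8 ranks with 2 cores bound to each, and --report-bindings prints what was actually bound.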

2015-04-10 15:28 GMT+02:00 Nick Papior Andersen <nickpap...@gmail.com>:

> I guess you want process #1 to have core 0 and core 1 bound to it, and
> process #2 to have core 2 and core 3 bound?
>
> I can do this as follows (I use 1.8.4; I do not think it works with
> 1.6.x):
> --map-by ppr:4:socket:span:pe=2
> ppr = processes per resource.
> socket = the resource
> span = load balance the processes across the sockets
> pe = the number of processing elements to bind to each process
>
> This should launch 8 processes (you have 2 sockets). Each process should
> have 2 processing elements bound to it.
> You can check with --report-bindings to see the actual bindings of the
> launched processes.
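
A side note: as a cross-check of what --report-bindings prints, the affinity mask the kernel actually applied can be read back for a running rank with standard tools (<pid> is just a placeholder for one of the launched processes):

taskset -cp <pid>    # shows the list of cores the process is allowed to run on
hwloc-ps             # lists bound processes and their bindings (from the hwloc tools)
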
>
> 2015-04-10 15:16 GMT+02:00 <twu...@goodyear.com>:
>
>>
>> We can't seem to get "processor affinity" using 1.6.4 or newer OpenMPI.
>>
>> Note this is a 2-socket machine with 8 cores per socket.
>>
>> We had compiled OpenMPI 1.4.2 with the following configure options:
>>
>>
>> ===========================================================================
>> export CC=/apps/share/intel/v14.0.4.211/bin/icc
>> export CXX=/apps/share/intel/v14.0.4.211/bin/icpc
>> export FC=/apps/share/intel/v14.0.4.211/bin/ifort
>>
>> version=1.4.2.I1404211
>>
>> ./configure \
>>     --prefix=/apps/share/openmpi/$version \
>>     --disable-shared \
>>     --enable-static \
>>     --enable-shared=no \
>>     --with-openib \
>>     --with-libnuma=/usr \
>>     --enable-mpirun-prefix-by-default \
>>     --with-memory-manager=none \
>>     --with-tm=/apps/share/TORQUE/current/Linux
>>
>> ===========================================================================
>>
>> and then used this mpirun command (where we used 8 cores):
>>
>>
>> ===========================================================================
>> /apps/share/openmpi/1.4.2.I1404211/bin/mpirun \
>> --prefix /apps/share/openmpi/1.4.2.I1404211 \
>> --mca mpi_paffinity_alone 1 \
>> --mca btl openib,tcp,sm,self \
>> --x LD_LIBRARY_PATH \
>> {model args}
>>
>> ===========================================================================
>>
>> And when we checked the process map, it looked like this:
>>
>>   PID COMMAND  CPUMASK   TOTAL  [     N0     N1     N2     N3     N4     N5 ]
>> 22232 prog1          0  469.9M  [ 469.9M      0      0      0      0      0 ]
>> 22233 prog1          1  479.0M  [   4.0M 475.0M      0      0      0      0 ]
>> 22234 prog1          2  516.7M  [ 516.7M      0      0      0      0      0 ]
>> 22235 prog1          3  485.4M  [   8.0M 477.4M      0      0      0      0 ]
>> 22236 prog1          4  482.6M  [ 482.6M      0      0      0      0      0 ]
>> 22237 prog1          5  486.6M  [   6.0M 480.6M      0      0      0      0 ]
>> 22238 prog1          6  481.3M  [ 481.3M      0      0      0      0      0 ]
>> 22239 prog1          7  419.4M  [   8.0M 411.4M      0      0      0      0 ]
>>
>> Now with 1.6.4 and higher, we did the following:
>>
>> ===========================================================================
>> export CC=/apps/share/intel/v14.0.4.211/bin/icc
>> export CXX=/apps/share/intel/v14.0.4.211/bin/icpc
>> export FC=/apps/share/intel/v14.0.4.211/bin/ifort
>>
>> version=1.6.4.I1404211
>>
>> ./configure \
>>     --disable-vt \
>>     --prefix=/apps/share/openmpi/$version \
>>     --disable-shared \
>>     --enable-static \
>>     --with-verbs \
>>     --enable-mpirun-prefix-by-default \
>>     --with-memory-manager=none \
>>     --with-hwloc \
>>     --enable-mpi-ext \
>>     --with-tm=/apps/share/TORQUE/current/Linux
>>
>> ===========================================================================
>>
>> We've tried the same mpirun command, with -bind-to-core, with
>> -bind-to-core -bycore, etc., and we can't seem to find the right
>> combination of arguments to get the same behavior as 1.4.2.
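
For the 1.6 series (where, as far as I know, --map-by is not available) the closest I can suggest, untested on my side, is the old-style mapping and binding flags, something like:

mpirun -np 8 \
    --bycore --bind-to-core \
    --report-bindings \
    --mca btl openib,tcp,sm,self \
    {model args}

which should map rank i to core i and bind it there, similar to what mpi_paffinity_alone gave you in 1.4.2; --report-bindings will tell you whether the binding actually took effect.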
>>
>> We get the following process map (this output is with the mpirun args
>> --bind-to-socket --mca mpi_paffinity_alone 1):
>>
>>   PID COMMAND  CPUMASK                                      TOTAL  [     N0     N1     N2     N3     N4     N5 ]
>> 24176 prog1    0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30   60.2M  [  60.2M      0      0      0      0      0 ]
>> 24177 prog1    0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30   60.5M  [  60.5M      0      0      0      0      0 ]
>> 24178 prog1    0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30   60.5M  [  60.5M      0      0      0      0      0 ]
>> 24179 prog1    0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30   60.5M  [  60.5M      0      0      0      0      0 ]
>> 24180 prog1    0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30   60.5M  [  60.5M      0      0      0      0      0 ]
>> 24181 prog1    0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30   60.5M  [  60.5M      0      0      0      0      0 ]
>> 24182 prog1    0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30   60.5M  [  60.5M      0      0      0      0      0 ]
>> 24183 prog1    0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30   60.5M  [  60.5M      0      0      0      0      0 ]
>>
>> Here is the map using just --mca mpi_paffinity_alone 1:
>>
>>   PID COMMAND  CPUMASK   TOTAL  [     N0     N1     N2     N3     N4     N5 ]
>> 25846 prog1       0,16   60.6M  [  60.6M      0      0      0      0      0 ]
>> 25847 prog1       2,18   60.6M  [  60.6M      0      0      0      0      0 ]
>> 25848 prog1       4,20   60.6M  [  60.6M      0      0      0      0      0 ]
>> 25849 prog1       6,22   60.6M  [  60.6M      0      0      0      0      0 ]
>> 25850 prog1       8,24   60.6M  [  60.6M      0      0      0      0      0 ]
>> 25851 prog1      10,26   60.6M  [  60.6M      0      0      0      0      0 ]
>> 25852 prog1      12,28   60.6M  [  60.6M      0      0      0      0      0 ]
>> 25853 prog1      14,30   60.6M  [  60.6M      0      0      0      0      0 ]
>>
>> I figure I am compiling incorrectly or using the wrong mpirun args.
>>
>> Can someone tell me how to duplicate the behavior of 1.4.2 regarding
>> binding the processes to cores?
>>
>> Any help is appreciated.
>>
>> thanks
>>
>> tom
>>
>
>
>
> --
> Kind regards Nick
>



-- 
Kind regards Nick
