Here is what I see on my machine:

07:59:55  (v1.8) /home/common/openmpi/ompi-release$ mpirun -np 8 --display-devel-map --report-bindings --map-by core -host bend001 --bind-to core hostname
 Data for JOB [45531,1] offset 0

 Mapper requested: NULL  Last mapper: round_robin  Mapping policy: BYCORE  Ranking policy: CORE
 Binding policy: CORE  Cpu set: NULL  PPR: NULL  Cpus-per-rank: 1
        Num new daemons: 0      New daemon starting vpid INVALID
        Num nodes: 1

 Data for node: bend001         Launch id: -1   State: 2
        Daemon: [[45531,0],0]   Daemon launched: True
        Num slots: 12   Slots in use: 8 Oversubscribed: FALSE
        Num slots allocated: 12 Max slots: 0
        Username on node: NULL
        Num procs: 8    Next node_rank: 8
        Data for proc: [[45531,1],0]
                Pid: 0  Local rank: 0   Node rank: 0    App rank: 0
                State: INITIALIZED      Restarts: 0     App_context: 0  Locale: 0,12    Bind location: 0,12     Binding: 0,12
        Data for proc: [[45531,1],1]
                Pid: 0  Local rank: 1   Node rank: 1    App rank: 1
                State: INITIALIZED      Restarts: 0     App_context: 0  Locale: 2,14    Bind location: 2,14     Binding: 2,14
        Data for proc: [[45531,1],2]
                Pid: 0  Local rank: 2   Node rank: 2    App rank: 2
                State: INITIALIZED      Restarts: 0     App_context: 0  Locale: 4,16    Bind location: 4,16     Binding: 4,16
        Data for proc: [[45531,1],3]
                Pid: 0  Local rank: 3   Node rank: 3    App rank: 3
                State: INITIALIZED      Restarts: 0     App_context: 0  Locale: 6,18    Bind location: 6,18     Binding: 6,18
        Data for proc: [[45531,1],4]
                Pid: 0  Local rank: 4   Node rank: 4    App rank: 4
                State: INITIALIZED      Restarts: 0     App_context: 0  Locale: 8,20    Bind location: 8,20     Binding: 8,20
        Data for proc: [[45531,1],5]
                Pid: 0  Local rank: 5   Node rank: 5    App rank: 5
                State: INITIALIZED      Restarts: 0     App_context: 0  Locale: 10,22   Bind location: 10,22    Binding: 10,22
        Data for proc: [[45531,1],6]
                Pid: 0  Local rank: 6   Node rank: 6    App rank: 6
                State: INITIALIZED      Restarts: 0     App_context: 0  Locale: 1,13    Bind location: 1,13     Binding: 1,13
        Data for proc: [[45531,1],7]
                Pid: 0  Local rank: 7   Node rank: 7    App rank: 7
                State: INITIALIZED      Restarts: 0     App_context: 0  Locale: 3,15    Bind location: 3,15     Binding: 3,15
[bend001:15493] MCW rank 0 bound to socket 0[core 0[hwt 0-1]]: [BB/../../../../..][../../../../../..]
[bend001:15493] MCW rank 1 bound to socket 0[core 1[hwt 0-1]]: [../BB/../../../..][../../../../../..]
[bend001:15493] MCW rank 2 bound to socket 0[core 2[hwt 0-1]]: [../../BB/../../..][../../../../../..]
[bend001:15493] MCW rank 3 bound to socket 0[core 3[hwt 0-1]]: [../../../BB/../..][../../../../../..]
[bend001:15493] MCW rank 4 bound to socket 0[core 4[hwt 0-1]]: [../../../../BB/..][../../../../../..]
[bend001:15493] MCW rank 5 bound to socket 0[core 5[hwt 0-1]]: [../../../../../BB][../../../../../..]
[bend001:15493] MCW rank 6 bound to socket 1[core 6[hwt 0-1]]: [../../../../../..][BB/../../../../..]
[bend001:15493] MCW rank 7 bound to socket 1[core 7[hwt 0-1]]: [../../../../../..][../BB/../../../..]


I have HT enabled on my box, so the devel-map shows Locale, Bind location, and 
Binding as the logical HT numbers (i.e., the PUs) for each proc. As you can see 
in the --report-bindings output, things are indeed going where they should go.

The numbering in the devel-map always looks a little funny because it depends 
on how the BIOS numbered the CPUs. Contrary to what you might expect, those 
numbers tend to bounce around. On my box, for example, the BIOS assigned all 
the even-numbered PUs to the HTs and cores on the first socket, and all the 
odd-numbered PUs to the second socket. In other words, it assigned the PUs 
round-robin across the sockets instead of sequentially within each socket.
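
If you want to see exactly how the firmware on a given box numbered things, a 
few lines of hwloc will print it. This is only a rough, untested sketch against 
the 1.x-era hwloc API (the generation the 1.8 series bundles; newer hwloc calls 
the socket object a "package"). It lists each PU's hwloc logical index (L#) 
next to its OS/physical index (P#), along with the socket and core it sits under:

#include <stdio.h>
#include <hwloc.h>

int main(void)
{
    hwloc_topology_t topo;
    int i, n;

    hwloc_topology_init(&topo);
    hwloc_topology_load(topo);

    n = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_PU);
    for (i = 0; i < n; i++) {
        hwloc_obj_t pu   = hwloc_get_obj_by_type(topo, HWLOC_OBJ_PU, i);
        hwloc_obj_t core = hwloc_get_ancestor_obj_by_type(topo, HWLOC_OBJ_CORE, pu);
        hwloc_obj_t sock = hwloc_get_ancestor_obj_by_type(topo, HWLOC_OBJ_SOCKET, pu);
        /* logical_index is hwloc's topology-ordered numbering;
         * os_index is whatever the BIOS/OS handed out */
        printf("socket L#%u core L#%u: PU L#%u = P#%u\n",
               sock ? sock->logical_index : 0,
               core ? core->logical_index : 0,
               pu->logical_index, pu->os_index);
    }

    hwloc_topology_destroy(topo);
    return 0;
}

On a box numbered like mine, the P# column is presumably where the even/odd 
split per socket shows up, while the L# column stays sequential within each 
socket.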

<shrug> Every BIOS does it differently, so there is no way to provide a 
standardized output. This is why we have --report-bindings to tell the user 
where the processes actually wound up.
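
And if there is ever any doubt about the printout itself, each rank can report 
its own affinity mask straight from the kernel, and you can compare that 
against what --report-bindings claimed. A minimal, Linux-only sketch (the 
output format here is made up; note that sched_getaffinity reports OS cpu 
numbers, i.e. the P# values):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, cpu, len = 0;
    char buf[1024] = "";
    cpu_set_t mask;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    CPU_ZERO(&mask);
    if (sched_getaffinity(0, sizeof(mask), &mask) == 0) {
        /* collect the OS cpu numbers this rank is allowed to run on */
        for (cpu = 0; cpu < CPU_SETSIZE && len < (int)sizeof(buf) - 8; cpu++)
            if (CPU_ISSET(cpu, &mask))
                len += snprintf(buf + len, sizeof(buf) - len, "%d ", cpu);
    }
    printf("rank %d: OS cpus %s\n", rank, buf);

    MPI_Finalize();
    return 0;
}

Build it with mpicc and launch it with the same mpirun options; checking its 
output against the --report-bindings masks (and against lstopo's P# numbering) 
makes it easy to tell whether the binding itself or just the print code is off.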

HTH
Ralph


> On Apr 21, 2015, at 7:54 AM, Devendar Bureddy <deven...@mellanox.com> wrote:
> 
> I agree.   
> 
> -----Original Message-----
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Jeff Squyres 
> (jsquyres)
> Sent: Tuesday, April 21, 2015 7:17 AM
> To: Open MPI Developers List
> Subject: Re: [OMPI devel] binding output error
> 
> +1
> 
> Devendar, you seem to be reporting a different issue than Elena...?  FWIW: 
> Open MPI has always used logical CPU numbering.  As far as I can tell from 
> your output, it looks like Open MPI did the Right Thing with your examples.
> 
> Elena's example seemed to show conflicting cpu numbering -- where OMPI said 
> it would bind a process and then where it actually bound it.  Ralph mentioned 
> to me that he would look at this as soon as he could; he thinks it might just 
> be an error in the printf output (and that the binding is actually occurring 
> in the right location).
> 
> 
> 
>> On Apr 20, 2015, at 9:48 PM, tmish...@jcity.maeda.co.jp wrote:
>> 
>> Hi Devendar,
>> 
>> As far as I know, the --report-bindings option shows the logical cpu order. On the other hand, you are talking about the physical one, I guess.
>> 
>> Regards,
>> Tetsuya Mishima
>> 
>> On 2015/04/21 9:04:37, "devel" wrote in "Re: [OMPI devel] binding output error":
>>> HT is not enabled. All nodes have the same topo. This is reproducible even on a single node.
>>> 
>>> 
>>> 
>>> I ran osu_latency to see whether it really is mapped to the other socket or not with --map-by socket. It looks like the mapping is correct as per the latency test.
>>> 
>>> 
>>> 
>>> $mpirun -np 2 -report-bindings -map-by socket /hpc/local/benchmarks/hpc-stack-icc/install/ompi-mellanox-v1.8/tests/osu-micro-benchmarks-4.4.1/osu_latency
>>> 
>>> [clx-orion-001:10084] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././././././././././.][./././././././././././././.]
>>> 
>>> [clx-orion-001:10084] MCW rank 1 bound to socket 1[core 14[hwt 0]]: [./././././././././././././.][B/././././././././././././.]
>>> 
>>> # OSU MPI Latency Test v4.4.1
>>> 
>>> # Size          Latency (us)
>>> 
>>> 0                       0.50
>>> 
>>> 1                       0.50
>>> 
>>> 2                       0.50
>>> 
>>> 4                       0.49
>>> 
>>> 
>>> 
>>> 
>>> 
>>> $mpirun -np 2 -report-bindings -cpu-set 1,7 /hpc/local/benchmarks/hpc-stack-icc/install/ompi-mellanox-v1.8/tests/osu-micro-benchmarks-4.4.1/osu_latency
>>> 
>>> [clx-orion-001:10155] MCW rank 0 bound to socket 0[core 1[hwt 0]]: [./B/./././././././././././.][./././././././././././././.]
>>> 
>>> [clx-orion-001:10155] MCW rank 1 bound to socket 0[core 7[hwt 0]]: [./././././././B/./././././.][./././././././././././././.]
>>> 
>>> # OSU MPI Latency Test v4.4.1
>>> 
>>> # Size          Latency (us)
>>> 
>>> 0                       0.23
>>> 
>>> 1                       0.24
>>> 
>>> 2                       0.23
>>> 
>>> 4                       0.22
>>> 
>>> 8                       0.23
>>> 
>>> 
>>> 
>>> Both hwloc and /proc/cpuinfo indicate the following cpu numbering:
>>> 
>>> socket 0 cpus: 0 1 2 3 4 5 6 14 15 16 17 18 19 20
>>> 
>>> socket 1 cpus: 7 8 9 10 11 12 13 21 22 23 24 25 26 27
>>> 
>>> 
>>> 
>>> $hwloc-info -f
>>> 
>>> Machine (256GB)
>>> 
>>>  NUMANode L#0 (P#0 128GB) + Socket L#0 + L3 L#0 (35MB)
>>> 
>>>    L2 L#0 (256KB) + L1 L#0 (32KB) + Core L#0 + PU L#0 (P#0)
>>> 
>>>    L2 L#1 (256KB) + L1 L#1 (32KB) + Core L#1 + PU L#1 (P#1)
>>> 
>>>    L2 L#2 (256KB) + L1 L#2 (32KB) + Core L#2 + PU L#2 (P#2)
>>> 
>>>    L2 L#3 (256KB) + L1 L#3 (32KB) + Core L#3 + PU L#3 (P#3)
>>> 
>>>    L2 L#4 (256KB) + L1 L#4 (32KB) + Core L#4 + PU L#4 (P#4)
>>> 
>>>    L2 L#5 (256KB) + L1 L#5 (32KB) + Core L#5 + PU L#5 (P#5)
>>> 
>>>    L2 L#6 (256KB) + L1 L#6 (32KB) + Core L#6 + PU L#6 (P#6)
>>> 
>>>    L2 L#7 (256KB) + L1 L#7 (32KB) + Core L#7 + PU L#7 (P#14)
>>> 
>>>    L2 L#8 (256KB) + L1 L#8 (32KB) + Core L#8 + PU L#8 (P#15)
>>> 
>>>    L2 L#9 (256KB) + L1 L#9 (32KB) + Core L#9 + PU L#9 (P#16)
>>> 
>>>    L2 L#10 (256KB) + L1 L#10 (32KB) + Core L#10 + PU L#10 (P#17)
>>> 
>>>    L2 L#11 (256KB) + L1 L#11 (32KB) + Core L#11 + PU L#11 (P#18)
>>> 
>>>    L2 L#12 (256KB) + L1 L#12 (32KB) + Core L#12 + PU L#12 (P#19)
>>> 
>>>    L2 L#13 (256KB) + L1 L#13 (32KB) + Core L#13 + PU L#13 (P#20)
>>> 
>>>  NUMANode L#1 (P#1 128GB) + Socket L#1 + L3 L#1 (35MB)
>>> 
>>>    L2 L#14 (256KB) + L1 L#14 (32KB) + Core L#14 + PU L#14 (P#7)
>>> 
>>>    L2 L#15 (256KB) + L1 L#15 (32KB) + Core L#15 + PU L#15 (P#8)
>>> 
>>>    L2 L#16 (256KB) + L1 L#16 (32KB) + Core L#16 + PU L#16 (P#9)
>>> 
>>>    L2 L#17 (256KB) + L1 L#17 (32KB) + Core L#17 + PU L#17 (P#10)
>>> 
>>>    L2 L#18 (256KB) + L1 L#18 (32KB) + Core L#18 + PU L#18 (P#11)
>>> 
>>>    L2 L#19 (256KB) + L1 L#19 (32KB) + Core L#19 + PU L#19 (P#12)
>>> 
>>>    L2 L#20 (256KB) + L1 L#20 (32KB) + Core L#20 + PU L#20 (P#13)
>>> 
>>>    L2 L#21 (256KB) + L1 L#21 (32KB) + Core L#21 + PU L#21 (P#21)
>>> 
>>>    L2 L#22 (256KB) + L1 L#22 (32KB) + Core L#22 + PU L#22 (P#22)
>>> 
>>>    L2 L#23 (256KB) + L1 L#23 (32KB) + Core L#23 + PU L#23 (P#23)
>>> 
>>>    L2 L#24 (256KB) + L1 L#24 (32KB) + Core L#24 + PU L#24 (P#24)
>>> 
>>>    L2 L#25 (256KB) + L1 L#25 (32KB) + Core L#25 + PU L#25 (P#25)
>>> 
>>>    L2 L#26 (256KB) + L1 L#26 (32KB) + Core L#26 + PU L#26 (P#26)
>>> 
>>>    L2 L#27 (256KB) + L1 L#27 (32KB) + Core L#27 + PU L#27 (P#27)
>>> 
>>> 
>>> 
>>> 
>>> 
>>> So, does --report-bindings show one more level of logical CPU numbering?
>>> 
>>> 
>>> 
>>> 
>>> 
>>> -Devendar
>>> 
>>> 
>>> 
>>> 
>>> 
>>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain
>>> Sent: Monday, April 20, 2015 3:52 PM
>>> To: Open MPI Developers
>>> Subject: Re: [OMPI devel] binding output error
>>> 
>>> 
>>> 
>>> Also, was this with HTs enabled? I'm wondering if the print code is incorrectly computing the core because it isn't correctly accounting for HT cpus.
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On Mon, Apr 20, 2015 at 3:49 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
>>> 
>>> Ralph's the authority on this one, but just to be sure: are all nodes the same topology? E.g., does adding "--hetero-nodes" to the mpirun command line fix the problem?
>>> 
>>> 
>>> 
>>>> On Apr 20, 2015, at 9:29 AM, Elena Elkina <elena.elk...@itseez.com> wrote:
>>>> 
>>>> Hi guys,
>>>> 
>>>> I ran into an issue on our cluster related to the mapping & binding policies on 1.8.5.
>>>> 
>>>> The matter is that the --report-bindings output doesn't correspond to the locale. It looks like there is a mistake in the output itself, because it just prints the sequential core number while that core can be on another socket. For example,
>>>> 
>>>> mpirun -np 2 --display-devel-map --report-bindings --map-by socket hostname
>>>>  Data for JOB [43064,1] offset 0
>>>> 
>>>>  Mapper requested: NULL  Last mapper: round_robin  Mapping policy: BYSOCKET  Ranking policy: SOCKET
>>>>  Binding policy: CORE  Cpu set: NULL  PPR: NULL  Cpus-per-rank: 1
>>>>       Num new daemons: 0      New daemon starting vpid INVALID
>>>>       Num nodes: 1
>>>> 
>>>>  Data for node: clx-orion-001         Launch id: -1   State: 2
>>>>       Daemon: [[43064,0],0]   Daemon launched: True
>>>>       Num slots: 28   Slots in use: 2 Oversubscribed: FALSE
>>>>       Num slots allocated: 28 Max slots: 0
>>>>       Username on node: NULL
>>>>       Num procs: 2    Next node_rank: 2
>>>>       Data for proc: [[43064,1],0]
>>>>               Pid: 0  Local rank: 0   Node rank: 0    App rank: 0
>>>>               State: INITIALIZED      Restarts: 0     App_context: 0  Locale: 0-6,14-20       Bind location: 0        Binding: 0
>>>>       Data for proc: [[43064,1],1]
>>>>               Pid: 0  Local rank: 1   Node rank: 1    App rank: 1
>>>>               State: INITIALIZED      Restarts: 0     App_context: 0  Locale: 7-13,21-27      Bind location: 7        Binding: 7
>>>> [clx-orion-001:26951] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././././././././././.][./././././././././././././.]
>>>> [clx-orion-001:26951] MCW rank 1 bound to socket 1[core 14[hwt 0]]: [./././././././././././././.][B/././././././././././././.]
>>>> 
>>>> The second process should be bound to core 7 (not core 14).
>>>> 
>>>> 
>>>> Another example:
>>>> mpirun -np 8 --display-devel-map --report-bindings --map-by core hostname
>>>>  Data for JOB [43202,1] offset 0
>>>> 
>>>>  Mapper requested: NULL  Last mapper: round_robin  Mapping policy: BYCORE  Ranking policy: CORE
>>>>  Binding policy: CORE  Cpu set: NULL  PPR: NULL  Cpus-per-rank: 1
>>>>       Num new daemons: 0      New daemon starting vpid INVALID
>>>>       Num nodes: 1
>>>> 
>>>>  Data for node: clx-orion-001         Launch id: -1   State: 2
>>>>       Daemon: [[43202,0],0]   Daemon launched: True
>>>>       Num slots: 28   Slots in use: 8 Oversubscribed: FALSE
>>>>       Num slots allocated: 28 Max slots: 0
>>>>       Username on node: NULL
>>>>       Num procs: 8    Next node_rank: 8
>>>>       Data for proc: [[43202,1],0]
>>>>               Pid: 0  Local rank: 0   Node rank: 0    App rank: 0
>>>>               State: INITIALIZED      Restarts: 0     App_context: 0  Locale: 0       Bind location: 0        Binding: 0
>>>>       Data for proc: [[43202,1],1]
>>>>               Pid: 0  Local rank: 1   Node rank: 1    App rank: 1
>>>>               State: INITIALIZED      Restarts: 0     App_context: 0  Locale: 1       Bind location: 1        Binding: 1
>>>>       Data for proc: [[43202,1],2]
>>>>               Pid: 0  Local rank: 2   Node rank: 2    App rank: 2
>>>>               State: INITIALIZED      Restarts: 0     App_context: 0  Locale: 2       Bind location: 2        Binding: 2
>>>>       Data for proc: [[43202,1],3]
>>>>               Pid: 0  Local rank: 3   Node rank: 3    App rank: 3
>>>>               State: INITIALIZED      Restarts: 0     App_context: 0  Locale: 3       Bind location: 3        Binding: 3
>>>>       Data for proc: [[43202,1],4]
>>>>               Pid: 0  Local rank: 4   Node rank: 4    App rank: 4
>>>>               State: INITIALIZED      Restarts: 0     App_context: 0  Locale: 4       Bind location: 4        Binding: 4
>>>>       Data for proc: [[43202,1],5]
>>>>               Pid: 0  Local rank: 5   Node rank: 5    App rank: 5
>>>>               State: INITIALIZED      Restarts: 0     App_context: 0  Locale: 5       Bind location: 5        Binding: 5
>>>>       Data for proc: [[43202,1],6]
>>>>               Pid: 0  Local rank: 6   Node rank: 6    App rank: 6
>>>>               State: INITIALIZED      Restarts: 0     App_context: 0  Locale: 6       Bind location: 6        Binding: 6
>>>>       Data for proc: [[43202,1],7]
>>>>               Pid: 0  Local rank: 7   Node rank: 7    App rank: 7
>>>>               State: INITIALIZED      Restarts: 0     App_context: 0  Locale: 14      Bind location: 14       Binding: 14
>>>> [clx-orion-001:27069] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././././././././././.][./././././././././././././.]
>>>> [clx-orion-001:27069] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./././././././././././.][./././././././././././././.]
>>>> [clx-orion-001:27069] MCW rank 2 bound to socket 0[core 2[hwt 0]]: [././B/././././././././././.][./././././././././././././.]
>>>> [clx-orion-001:27069] MCW rank 3 bound to socket 0[core 3[hwt 0]]: [./././B/./././././././././.][./././././././././././././.]
>>>> [clx-orion-001:27069] MCW rank 4 bound to socket 0[core 4[hwt 0]]: [././././B/././././././././.][./././././././././././././.]
>>>> [clx-orion-001:27069] MCW rank 5 bound to socket 0[core 5[hwt 0]]: [./././././B/./././././././.][./././././././././././././.]
>>>> [clx-orion-001:27069] MCW rank 6 bound to socket 0[core 6[hwt 0]]: [././././././B/././././././.][./././././././././././././.]
>>>> [clx-orion-001:27069] MCW rank 7 bound to socket 0[core 7[hwt 0]]: [./././././././B/./././././.][./././././././././././././.]
>>>> 
>>>> Rank 7 should be bound to core 14 instead of core 7, since core 7 is on another socket.
>>>> 
>>>> Best regards,
>>>> Elena
>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Jeff Squyres
>>> jsquy...@cisco.com
>>> For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
>>> 
>>> 
> 
> 
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
