What is the output of /sbin/lspci -tv?

On Aug 31, 2015, at 4:06 PM, Rolf vandeVaart <rvandeva...@nvidia.com> wrote:
> There was a problem reported on the users list about Open MPI always picking
> one Mellanox card when there were two in the machine:
>
> http://www.open-mpi.org/community/lists/users/2015/08/27507.php
>
> We dug a little deeper and I think this has to do with how hwloc is figuring
> out where one of the cards is located. This verbose output (with some extra
> printfs) shows that it cannot figure out which NUMA node mlx4_0 is closest
> to. It can only determine that the device is located on HWLOC_OBJ_SYSTEM,
> and therefore Open MPI assumes a distance of 0.0. Because of this (smaller
> is better), the Open MPI library always picks mlx4_0 for all sockets. I am
> trying to figure out whether this is a hwloc bug or an Open MPI bug. Any
> thoughts on this?
>
> [node1.local:05821] Checking distance for device=mlx4_1
> [node1.local:05821] hwloc_distances->nbobjs=4
> [node1.local:05821] hwloc_distances->latency[0]=1.000000
> [node1.local:05821] hwloc_distances->latency[1]=2.100000
> [node1.local:05821] hwloc_distances->latency[2]=2.100000
> [node1.local:05821] hwloc_distances->latency[3]=2.100000
> [node1.local:05821] hwloc_distances->latency[4]=2.100000
> [node1.local:05821] hwloc_distances->latency[5]=1.000000
> [node1.local:05821] hwloc_distances->latency[6]=2.100000
> [node1.local:05821] hwloc_distances->latency[7]=2.100000
> [node1.local:05821] ibv_obj->type = 4
> [node1.local:05821] ibv_obj->logical_index=1
> [node1.local:05821] my_obj->logical_index=0
> [node1.local:05821] Proc is bound: distance=2.100000
>
> [node1.local:05821] Checking distance for device=mlx4_0
> [node1.local:05821] hwloc_distances->nbobjs=4
> [node1.local:05821] hwloc_distances->latency[0]=1.000000
> [node1.local:05821] hwloc_distances->latency[1]=2.100000
> [node1.local:05821] hwloc_distances->latency[2]=2.100000
> [node1.local:05821] hwloc_distances->latency[3]=2.100000
> [node1.local:05821] hwloc_distances->latency[4]=2.100000
> [node1.local:05821] hwloc_distances->latency[5]=1.000000
> [node1.local:05821] hwloc_distances->latency[6]=2.100000
> [node1.local:05821] hwloc_distances->latency[7]=2.100000
> [node1.local:05821] ibv_obj->type = 1   <--------------------- HWLOC_OBJ_MACHINE
> [node1.local:05821] ibv_obj->type set to NULL
> [node1.local:05821] Proc is bound: distance=0.000000
> [node1.local:05821] [rank=0] openib: skipping device mlx4_1; it is too far away
> [node1.local:05821] [rank=0] openib: using port mlx4_0:1
> [node1.local:05821] [rank=0] openib: using port mlx4_0:2
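
Independent of the Open MPI code path, it may be worth checking what hwloc
itself reports for the two devices' ancestors. Below is a minimal standalone
sketch against the hwloc 1.x API (assuming a build with I/O discovery enabled
via HWLOC_TOPOLOGY_FLAG_IO_DEVICES; the file name and output format are just
for illustration):

/* locality.c - print the nearest non-I/O ancestor of each OpenFabrics
 * OS device known to hwloc.  Build with something like:
 *   cc locality.c -o locality $(pkg-config --cflags --libs hwloc)
 */
#include <stdio.h>
#include <hwloc.h>

int main(void)
{
    hwloc_topology_t topo;
    hwloc_obj_t osdev = NULL;

    hwloc_topology_init(&topo);
    /* I/O objects (PCI devices, OS devices) are not discovered by default. */
    hwloc_topology_set_flags(topo, HWLOC_TOPOLOGY_FLAG_IO_DEVICES);
    hwloc_topology_load(topo);

    while ((osdev = hwloc_get_next_osdev(topo, osdev)) != NULL) {
        hwloc_obj_t ancestor, numa;

        if (osdev->attr->osdev.type != HWLOC_OBJ_OSDEV_OPENFABRICS)
            continue;   /* only mlx4_0, mlx4_1, ... */

        /* Nearest non-I/O object above the device, and (if any) the
         * NUMANode above it. */
        ancestor = hwloc_get_non_io_ancestor_obj(topo, osdev);
        numa = hwloc_get_ancestor_obj_by_type(topo, HWLOC_OBJ_NODE, osdev);

        if (numa != NULL)
            printf("%s: attached under NUMANode L#%u\n",
                   osdev->name, numa->logical_index);
        else
            printf("%s: no NUMANode ancestor, nearest non-I/O ancestor is %s\n",
                   osdev->name, hwloc_obj_type_string(ancestor->type));
    }

    hwloc_topology_destroy(topo);
    return 0;
}

If that prints a NUMANode for mlx4_1 but only Machine (or System) for mlx4_0,
then the locality is already missing from what hwloc discovers on this box,
rather than being lost in how Open MPI walks the tree.
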
> Machine (1024GB)
>   NUMANode L#0 (P#0 256GB) + Socket L#0 + L3 L#0 (30MB)
>     L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
>     L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1)
>     L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#2)
>     L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#3)
>     L2 L#4 (256KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4 + PU L#4 (P#4)
>     L2 L#5 (256KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5 + PU L#5 (P#5)
>     L2 L#6 (256KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6 + PU L#6 (P#6)
>     L2 L#7 (256KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7 + PU L#7 (P#7)
>     L2 L#8 (256KB) + L1d L#8 (32KB) + L1i L#8 (32KB) + Core L#8 + PU L#8 (P#8)
>     L2 L#9 (256KB) + L1d L#9 (32KB) + L1i L#9 (32KB) + Core L#9 + PU L#9 (P#9)
>     L2 L#10 (256KB) + L1d L#10 (32KB) + L1i L#10 (32KB) + Core L#10 + PU L#10 (P#10)
>     L2 L#11 (256KB) + L1d L#11 (32KB) + L1i L#11 (32KB) + Core L#11 + PU L#11 (P#11)
>   NUMANode L#1 (P#1 256GB)
>     Socket L#1 + L3 L#1 (30MB)
>       L2 L#12 (256KB) + L1d L#12 (32KB) + L1i L#12 (32KB) + Core L#12 + PU L#12 (P#12)
>       L2 L#13 (256KB) + L1d L#13 (32KB) + L1i L#13 (32KB) + Core L#13 + PU L#13 (P#13)
>       L2 L#14 (256KB) + L1d L#14 (32KB) + L1i L#14 (32KB) + Core L#14 + PU L#14 (P#14)
>       L2 L#15 (256KB) + L1d L#15 (32KB) + L1i L#15 (32KB) + Core L#15 + PU L#15 (P#15)
>       L2 L#16 (256KB) + L1d L#16 (32KB) + L1i L#16 (32KB) + Core L#16 + PU L#16 (P#16)
>       L2 L#17 (256KB) + L1d L#17 (32KB) + L1i L#17 (32KB) + Core L#17 + PU L#17 (P#17)
>       L2 L#18 (256KB) + L1d L#18 (32KB) + L1i L#18 (32KB) + Core L#18 + PU L#18 (P#18)
>       L2 L#19 (256KB) + L1d L#19 (32KB) + L1i L#19 (32KB) + Core L#19 + PU L#19 (P#19)
>       L2 L#20 (256KB) + L1d L#20 (32KB) + L1i L#20 (32KB) + Core L#20 + PU L#20 (P#20)
>       L2 L#21 (256KB) + L1d L#21 (32KB) + L1i L#21 (32KB) + Core L#21 + PU L#21 (P#21)
>       L2 L#22 (256KB) + L1d L#22 (32KB) + L1i L#22 (32KB) + Core L#22 + PU L#22 (P#22)
>       L2 L#23 (256KB) + L1d L#23 (32KB) + L1i L#23 (32KB) + Core L#23 + PU L#23 (P#23)
>     HostBridge L#5
>       PCIBridge
>         PCI 15b3:1003
>           Net L#7 "ib2"
>           Net L#8 "ib3"
>           OpenFabrics L#9 "mlx4_1"
>   NUMANode L#2 (P#2 256GB) + Socket L#2 + L3 L#2 (30MB)
>     L2 L#24 (256KB) + L1d L#24 (32KB) + L1i L#24 (32KB) + Core L#24 + PU L#24 (P#24)
>     L2 L#25 (256KB) + L1d L#25 (32KB) + L1i L#25 (32KB) + Core L#25 + PU L#25 (P#25)
>     L2 L#26 (256KB) + L1d L#26 (32KB) + L1i L#26 (32KB) + Core L#26 + PU L#26 (P#26)
>     L2 L#27 (256KB) + L1d L#27 (32KB) + L1i L#27 (32KB) + Core L#27 + PU L#27 (P#27)
>     L2 L#28 (256KB) + L1d L#28 (32KB) + L1i L#28 (32KB) + Core L#28 + PU L#28 (P#28)
>     L2 L#29 (256KB) + L1d L#29 (32KB) + L1i L#29 (32KB) + Core L#29 + PU L#29 (P#29)
>     L2 L#30 (256KB) + L1d L#30 (32KB) + L1i L#30 (32KB) + Core L#30 + PU L#30 (P#30)
>     L2 L#31 (256KB) + L1d L#31 (32KB) + L1i L#31 (32KB) + Core L#31 + PU L#31 (P#31)
>     L2 L#32 (256KB) + L1d L#32 (32KB) + L1i L#32 (32KB) + Core L#32 + PU L#32 (P#32)
>     L2 L#33 (256KB) + L1d L#33 (32KB) + L1i L#33 (32KB) + Core L#33 + PU L#33 (P#33)
>     L2 L#34 (256KB) + L1d L#34 (32KB) + L1i L#34 (32KB) + Core L#34 + PU L#34 (P#34)
>     L2 L#35 (256KB) + L1d L#35 (32KB) + L1i L#35 (32KB) + Core L#35 + PU L#35 (P#35)
>   NUMANode L#3 (P#3 256GB) + Socket L#3 + L3 L#3 (30MB)
>     L2 L#36 (256KB) + L1d L#36 (32KB) + L1i L#36 (32KB) + Core L#36 + PU L#36 (P#36)
>     L2 L#37 (256KB) + L1d L#37 (32KB) + L1i L#37 (32KB) + Core L#37 + PU L#37 (P#37)
>     L2 L#38 (256KB) + L1d L#38 (32KB) + L1i L#38 (32KB) + Core L#38 + PU L#38 (P#38)
>     L2 L#39 (256KB) + L1d L#39 (32KB) + L1i L#39 (32KB) + Core L#39 + PU L#39 (P#39)
>     L2 L#40 (256KB) + L1d L#40 (32KB) + L1i L#40 (32KB) + Core L#40 + PU L#40 (P#40)
>     L2 L#41 (256KB) + L1d L#41 (32KB) + L1i L#41 (32KB) + Core L#41 + PU L#41 (P#41)
>     L2 L#42 (256KB) + L1d L#42 (32KB) + L1i L#42 (32KB) + Core L#42 + PU L#42 (P#42)
>     L2 L#43 (256KB) + L1d L#43 (32KB) + L1i L#43 (32KB) + Core L#43 + PU L#43 (P#43)
>     L2 L#44 (256KB) + L1d L#44 (32KB) + L1i L#44 (32KB) + Core L#44 + PU L#44 (P#44)
>     L2 L#45 (256KB) + L1d L#45 (32KB) + L1i L#45 (32KB) + Core L#45 + PU L#45 (P#45)
>     L2 L#46 (256KB) + L1d L#46 (32KB) + L1i L#46 (32KB) + Core L#46 + PU L#46 (P#46)
>     L2 L#47 (256KB) + L1d L#47 (32KB) + L1i L#47 (32KB) + Core L#47 + PU L#47 (P#47)
>   HostBridge L#0
>     PCIBridge
>       PCI 8086:1528
>         Net L#0 "eth0"
>       PCI 8086:1528
>         Net L#1 "eth1"
>     PCIBridge
>       PCI 1000:005d
>         Block L#2 "sda"
>     PCIBridge
>       PCI 15b3:1003
>         Net L#3 "ib0"
>         Net L#4 "ib1"
>         OpenFabrics L#5 "mlx4_0"
>     PCIBridge
>       PCI 102b:0522
>       PCI 19a2:0800
>     PCI 8086:1d02
>       Block L#6 "sr0"
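
For reference, the hwloc_distances->latency[] values in the verbose output
above come from hwloc's NUMANode distance matrix. Here is a small sketch of
dumping that matrix directly with the hwloc 1.x distances API, so it can be
compared with what the extra printfs show (file name is just for
illustration):

/* numadist.c - print the NUMANode-to-NUMANode latency matrix. */
#include <stdio.h>
#include <hwloc.h>

int main(void)
{
    hwloc_topology_t topo;
    const struct hwloc_distances_s *dist;
    unsigned i, j;

    hwloc_topology_init(&topo);
    hwloc_topology_load(topo);

    /* Flat nbobjs*nbobjs array of relative latencies, indexed by the
     * NUMANodes' logical indexes. */
    dist = hwloc_get_whole_distance_matrix_by_type(topo, HWLOC_OBJ_NODE);
    if (dist == NULL) {
        fprintf(stderr, "no NUMA distance matrix reported by hwloc\n");
    } else {
        for (i = 0; i < dist->nbobjs; i++) {
            for (j = 0; j < dist->nbobjs; j++)
                printf(" %.1f", dist->latency[i * dist->nbobjs + j]);
            printf("\n");
        }
    }

    hwloc_topology_destroy(topo);
    return 0;
}

Since that matrix is indexed by NUMANode logical index, a device that only
resolves to the Machine/System object has no entry to look up, which is
presumably where the 0.0 fallback (and, with "smaller is better", always
choosing mlx4_0) comes from.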