On 12/03/13 23:27, Jeff Squyres (jsquyres) wrote:
On Nov 22, 2013, at 1:19 PM, Paul Kapinos <kapi...@rz.rwth-aachen.de> wrote:

Well, I've tried this patch on the current 1.7.3 (where the code has moved some 12 lines, now beginning at 2700). !! - no output "skipping device"! The same happens when starting the main processes with -bind-to-socket. What I see is

[cluster.rz.RWTH-Aachen.DE:43670] btl:usnic: found: device mlx4_1, port 1
[cluster.rz.RWTH-Aachen.DE:43670] btl:usnic: this is not a usnic-capable device
[cluster.rz.RWTH-Aachen.DE:43670] btl:usnic: found: device mlx4_0, port 1
[cluster.rz.RWTH-Aachen.DE:43670] btl:usnic: this is not a usnic-capable device

That's actually ok -- that's from the usnic BTL, not the openib BTL. The usnic BTL is the Cisco UD verbs component, and it only works with Cisco UCS servers and VICs; it will not work with generic IB cards. Hence, these messages are telling you that the usnic BTL is disqualifying itself because the ibv devices it found are not Cisco UCS VICs.
Argh - what a shame not to see "btl:usnic" :-|
Look for the openib messages, not the usnic messages.
Well, as I said, there were *no messages* from the patch you provided in http://www.open-mpi.org/community/lists/devel/2013/06/12472.php

I've attached the output of a run with a single process per node on nodes with 2 NICs; maybe you can see what goes wrong.
Best,
Paul

--
Dipl.-Inform. Paul Kapinos - High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23, D 52074 Aachen (Germany)
Tel: +49 241/80-24915
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_register: registering btl components
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_register: found loaded component self
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_register: component self register function successful
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_register: found loaded component sm
--------------------------------------------------------------------------
WARNING: A user-supplied value attempted to override the default-only MCA variable named "btl_sm_use_knem". The user-supplied value was ignored.
--------------------------------------------------------------------------
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_register: component sm register function successful
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_register: found loaded component openib
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_register: component openib register function successful
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_register: found loaded component usnic
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_register: component usnic register function successful
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_open: opening btl components
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_open: found loaded component self
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_open: component self open function successful
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_open: found loaded component sm
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_open: component sm open function successful
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_open: found loaded component openib
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_open: component openib open function successful
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_open: found loaded component usnic
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_open: component usnic open function successful
[cluster-linux.rz.RWTH-Aachen.DE:19324] select: initializing btl component self
[cluster-linux.rz.RWTH-Aachen.DE:19324] select: init of component self returned success
[cluster-linux.rz.RWTH-Aachen.DE:19324] select: initializing btl component sm
[cluster-linux.rz.RWTH-Aachen.DE:19324] select: init of component sm returned success
[cluster-linux.rz.RWTH-Aachen.DE:19324] select: initializing btl component openib
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_register: registering btl components
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_register: found loaded component self
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_register: component self register function successful
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_register: found loaded component sm
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_register: component sm register function successful
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_register: found loaded component openib
[cluster-linux.rz.RWTH-Aachen.DE:19324] openib BTL: oob CPC available for use on mlx4_1:1
[cluster-linux.rz.RWTH-Aachen.DE:19324] openib BTL: rdmacm IP address not found on port
[cluster-linux.rz.RWTH-Aachen.DE:19324] openib BTL: rdmacm CPC unavailable for use on mlx4_1:1; skipped
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_register: component openib register function successful
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_register: found loaded component usnic
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_register: component usnic register function successful
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_open: opening btl components
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_open: found loaded component self
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_open: component self open function successful
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_open: found loaded component sm
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_open: component sm open function successful
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_open: found loaded component openib
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_open: component openib open function successful
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_open: found loaded component usnic
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_open: component usnic open function successful
[cluster.rz.RWTH-Aachen.DE:64279] select: initializing btl component self
[cluster.rz.RWTH-Aachen.DE:64279] select: init of component self returned success
[cluster.rz.RWTH-Aachen.DE:64279] select: initializing btl component sm
[cluster.rz.RWTH-Aachen.DE:64279] select: init of component sm returned success
[cluster.rz.RWTH-Aachen.DE:64279] select: initializing btl component openib
[cluster-linux.rz.RWTH-Aachen.DE:19324] openib BTL: oob CPC available for use on mlx4_0:1
[cluster-linux.rz.RWTH-Aachen.DE:19324] openib BTL: rdmacm CPC available for use on mlx4_0:1
[cluster-linux.rz.RWTH-Aachen.DE:19324] select: init of component openib returned success
[cluster-linux.rz.RWTH-Aachen.DE:19324] select: initializing btl component usnic
[cluster-linux.rz.RWTH-Aachen.DE:19324] found 2 verbs interfaces
[cluster-linux.rz.RWTH-Aachen.DE:19324] examining verbs interface: mlx4_1
[cluster-linux.rz.RWTH-Aachen.DE:19324] found acceptable verbs interface mlx4_1:1
[cluster-linux.rz.RWTH-Aachen.DE:19324] examining verbs interface: mlx4_0
[cluster-linux.rz.RWTH-Aachen.DE:19324] found acceptable verbs interface mlx4_0:1
[cluster-linux.rz.RWTH-Aachen.DE:19324] btl:usnic: found: device mlx4_1, port 1
[cluster-linux.rz.RWTH-Aachen.DE:19324] btl:usnic: this is not a usnic-capable device
[cluster-linux.rz.RWTH-Aachen.DE:19324] btl:usnic: found: device mlx4_0, port 1
[cluster-linux.rz.RWTH-Aachen.DE:19324] btl:usnic: this is not a usnic-capable device
[cluster-linux.rz.RWTH-Aachen.DE:19324] btl:usnic: returning 0 modules
[cluster-linux.rz.RWTH-Aachen.DE:19324] select: init of component usnic returned failure
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: close: component usnic closed
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: close: unloading component usnic
[cluster.rz.RWTH-Aachen.DE:64279] openib BTL: oob CPC available for use on mlx4_1:1
[cluster.rz.RWTH-Aachen.DE:64279] openib BTL: rdmacm IP address not found on port
[cluster.rz.RWTH-Aachen.DE:64279] openib BTL: rdmacm CPC unavailable for use on mlx4_1:1; skipped
[cluster.rz.RWTH-Aachen.DE:64279] openib BTL: oob CPC available for use on mlx4_0:1
[cluster.rz.RWTH-Aachen.DE:64279] openib BTL: rdmacm CPC available for use on mlx4_0:1
[cluster.rz.RWTH-Aachen.DE:64279] select: init of component openib returned success
[cluster.rz.RWTH-Aachen.DE:64279] select: initializing btl component usnic
[cluster.rz.RWTH-Aachen.DE:64279] found 2 verbs interfaces
[cluster.rz.RWTH-Aachen.DE:64279] examining verbs interface: mlx4_1
[cluster.rz.RWTH-Aachen.DE:64279] found acceptable verbs interface mlx4_1:1
[cluster.rz.RWTH-Aachen.DE:64279] examining verbs interface: mlx4_0
[cluster.rz.RWTH-Aachen.DE:64279] found acceptable verbs interface mlx4_0:1
[cluster.rz.RWTH-Aachen.DE:64279] btl:usnic: found: device mlx4_1, port 1
[cluster.rz.RWTH-Aachen.DE:64279] btl:usnic: this is not a usnic-capable device
[cluster.rz.RWTH-Aachen.DE:64279] btl:usnic: found: device mlx4_0, port 1
[cluster.rz.RWTH-Aachen.DE:64279] btl:usnic: this is not a usnic-capable device
[cluster.rz.RWTH-Aachen.DE:64279] btl:usnic: returning 0 modules
[cluster.rz.RWTH-Aachen.DE:64279] select: init of component usnic returned failure
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: close: component usnic closed
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: close: unloading component usnic
Prozessor 1 on Host: cluster-linux.rz.RWTH-Aachen.DE
Prozessor 0 on Host: cluster.rz.RWTH-Aachen.DE
0 --> 1 Latenz: 0.009 ms, Bandbreite: 1804.681 Mbyte/s
Fertig-ID 0 on Host: cluster.rz.RWTH-Aachen.DE
1 --> 0 Latenz: 0.149 ms, Bandbreite: 1998.477 Mbyte/s
Fertig-ID 1 on Host: cluster-linux.rz.RWTH-Aachen.DE
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: close: component self closed
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: close: unloading component self
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: close: component sm closed
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: close: unloading component sm
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: close: component openib closed
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: close: unloading component openib
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: close: component self closed
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: close: unloading component self
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: close: component sm closed
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: close: unloading component sm
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: close: component openib closed
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: close: unloading component openib
[cluster.rz.RWTH-Aachen.DE:64273] 1 more process has sent help message help-mca-var.txt / default-only-param-set
[cluster.rz.RWTH-Aachen.DE:64273] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
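
Since the output of the two processes is interleaved, and the advice in the thread is to look at the openib messages rather than the usnic ones, a small filter can make the log easier to read. The following is only a sketch, not part of Open MPI: the function name `group_by_process`, the `needle` parameter, and the sample lines are illustrative assumptions, and it relies solely on the "[host:pid] message" prefix visible in the attached output.

```python
import re
from collections import defaultdict

# Matches the "[host:pid] message" prefix seen in the verbose output.
LINE_RE = re.compile(r"^\[(?P<host>[^\]:]+):(?P<pid>\d+)\]\s*(?P<msg>.*)$")


def group_by_process(lines, needle="openib"):
    """Group messages containing `needle` by (host, pid), keeping log order.

    Hypothetical helper for reading interleaved logs; lines without a
    "[host:pid]" prefix (banners, benchmark output) are skipped.
    """
    grouped = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        if needle in match.group("msg"):
            key = (match.group("host"), int(match.group("pid")))
            grouped[key].append(match.group("msg"))
    return dict(grouped)


# A few lines copied from the attached output, used as sample input.
sample = [
    "[cluster.rz.RWTH-Aachen.DE:64279] openib BTL: oob CPC available for use on mlx4_1:1",
    "[cluster.rz.RWTH-Aachen.DE:64279] btl:usnic: this is not a usnic-capable device",
    "[cluster-linux.rz.RWTH-Aachen.DE:19324] openib BTL: rdmacm CPC available for use on mlx4_0:1",
    "Prozessor 0 on Host: cluster.rz.RWTH-Aachen.DE",
]

if __name__ == "__main__":
    for (host, pid), messages in group_by_process(sample).items():
        print(f"{host}:{pid}")
        for message in messages:
            print("   ", message)
```

Passing a different `needle` (for example "skipping device") would show at a glance whether a given message was printed by any process at all.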