On 12/03/13 23:27, Jeff Squyres (jsquyres) wrote:
On Nov 22, 2013, at 1:19 PM, Paul Kapinos <kapi...@rz.rwth-aachen.de> wrote:

Well, I've tried this patch on the current 1.7.3 (where the code has moved by 
some 12 lines and now begins at line 2700).
!! - no "skipping device" output! The same happens when the main processes are 
started with -bind-to-socket. What I see is
[cluster.rz.RWTH-Aachen.DE:43670] btl:usnic: found: device mlx4_1, port 1
[cluster.rz.RWTH-Aachen.DE:43670] btl:usnic: this is not a usnic-capable device
[cluster.rz.RWTH-Aachen.DE:43670] btl:usnic: found: device mlx4_0, port 1
[cluster.rz.RWTH-Aachen.DE:43670] btl:usnic: this is not a usnic-capable device

That's actually ok -- that's from the usnic BTL, not the openib BTL.

The usnic BTL is the Cisco UD verbs component, and it only works with Cisco UCS 
servers and VICs; it will not work with generic IB cards.  Hence, these 
messages are telling you that the usnic BTL is disqualifying itself because the 
ibv devices it found are not Cisco UCS VICs.


Argh - shame on me for overlooking the "btl:usnic" prefix  :-|



Look for the openib messages, not the usnic messages.
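
One way to do that on saved output is a small filter. This is only a sketch in Python and not part of Open MPI: the function names are made up for illustration, and it assumes that every verbose line starts with a "[host:pid] " prefix, that the openib lines carry the "openib BTL:" marker as in the output in this thread, and that any line without a prefix is the wrapped tail of the line before it.

```python
import re

# Assumed prefix shape, taken from the output in this thread: "[host:pid] message".
PREFIX = re.compile(r"^\[([^\]:]+):(\d+)\] (.*)$")


def unwrap(lines):
    """Join wrapped continuation lines onto the prefixed line before them."""
    out = []
    for raw in lines:
        line = raw.rstrip("\n")
        if PREFIX.match(line) or not out:
            out.append(line)
        else:
            # No "[host:pid]" prefix: treat it as the tail of the previous line.
            out[-1] = out[-1].rstrip() + " " + line.strip()
    return out


def openib_lines(lines):
    """Keep only the messages emitted by the openib BTL."""
    return [line for line in unwrap(lines) if "openib BTL:" in line]


if __name__ == "__main__":
    import sys

    for line in openib_lines(sys.stdin):
        print(line)
```

Piping the saved output of a verbose run through this script prints only the openib lines. Note that program output without a prefix would also be glued onto the preceding line, so this is best applied to the BTL messages alone.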

Well, as I said, there were *no messages* from the patch you provided in
http://www.open-mpi.org/community/lists/devel/2013/06/12472.php

I've attached the output of a run with a single process per node on nodes with 2 NICs; maybe you can see what goes wrong.
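
The attached output interleaves the lines of both processes, which makes it hard to follow. A rough Python sketch for splitting it per process is given here; the function name is hypothetical, and it assumes the "[host:pid] " prefix seen in the output and that a line without a prefix continues the most recent prefixed line.

```python
import re
from collections import defaultdict

# Assumed prefix shape, taken from the attached output: "[host:pid] message".
PREFIX = re.compile(r"^\[([^\]:]+):(\d+)\] ?(.*)$")


def split_by_process(lines):
    """Group interleaved verbose output by (host, pid)."""
    per_proc = defaultdict(list)
    current = None
    for raw in lines:
        line = raw.rstrip()
        match = PREFIX.match(line)
        if match:
            current = (match.group(1), int(match.group(2)))
            per_proc[current].append(match.group(3))
        elif current is not None and line:
            # Wrapped tail: glue it onto the last message of the current process.
            tail = line.strip()
            per_proc[current][-1] = (per_proc[current][-1] + " " + tail).strip()
    return dict(per_proc)
```

Unprefixed program output (for example the latency and bandwidth lines) ends up attached to whichever process printed last, so only the prefixed messages should be trusted after splitting.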

Best

Paul


--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_register: 
registering btl components
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_register: found 
loaded component self
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_register: 
component self register function successful
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_register: found 
loaded component sm
--------------------------------------------------------------------------
WARNING: A user-supplied value attempted to override the default-only MCA
variable named "btl_sm_use_knem".

The user-supplied value was ignored.
--------------------------------------------------------------------------
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_register: 
component sm register function successful
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_register: found 
loaded component openib
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_register: 
component openib register function successful
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_register: found 
loaded component usnic
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_register: 
component usnic register function successful
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_open: opening btl 
components
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_open: found 
loaded component self
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_open: component 
self open function successful
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_open: found 
loaded component sm
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_open: component 
sm open function successful
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_open: found 
loaded component openib
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_open: component 
openib open function successful
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_open: found 
loaded component usnic
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_open: component 
usnic open function successful
[cluster-linux.rz.RWTH-Aachen.DE:19324] select: initializing btl component self
[cluster-linux.rz.RWTH-Aachen.DE:19324] select: init of component self returned 
success
[cluster-linux.rz.RWTH-Aachen.DE:19324] select: initializing btl component sm
[cluster-linux.rz.RWTH-Aachen.DE:19324] select: init of component sm returned 
success
[cluster-linux.rz.RWTH-Aachen.DE:19324] select: initializing btl component 
openib
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_register: registering 
btl components
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_register: found loaded 
component self
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_register: component 
self register function successful
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_register: found loaded 
component sm
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_register: component sm 
register function successful
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_register: found loaded 
component openib
[cluster-linux.rz.RWTH-Aachen.DE:19324] openib BTL: oob CPC available for use 
on mlx4_1:1
[cluster-linux.rz.RWTH-Aachen.DE:19324] openib BTL: rdmacm IP address not found 
on port
[cluster-linux.rz.RWTH-Aachen.DE:19324] openib BTL: rdmacm CPC unavailable for 
use on mlx4_1:1; skipped
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_register: component 
openib register function successful
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_register: found loaded 
component usnic
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_register: component 
usnic register function successful
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_open: opening btl 
components
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_open: found loaded 
component self
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_open: component self 
open function successful
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_open: found loaded 
component sm
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_open: component sm open 
function successful
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_open: found loaded 
component openib
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_open: component openib 
open function successful
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_open: found loaded 
component usnic
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_open: component usnic 
open function successful
[cluster.rz.RWTH-Aachen.DE:64279] select: initializing btl component self
[cluster.rz.RWTH-Aachen.DE:64279] select: init of component self returned 
success
[cluster.rz.RWTH-Aachen.DE:64279] select: initializing btl component sm
[cluster.rz.RWTH-Aachen.DE:64279] select: init of component sm returned success
[cluster.rz.RWTH-Aachen.DE:64279] select: initializing btl component openib
[cluster-linux.rz.RWTH-Aachen.DE:19324] openib BTL: oob CPC available for use 
on mlx4_0:1
[cluster-linux.rz.RWTH-Aachen.DE:19324] openib BTL: rdmacm CPC available for 
use on mlx4_0:1
[cluster-linux.rz.RWTH-Aachen.DE:19324] select: init of component openib 
returned success
[cluster-linux.rz.RWTH-Aachen.DE:19324] select: initializing btl component usnic
[cluster-linux.rz.RWTH-Aachen.DE:19324] found 2 verbs interfaces
[cluster-linux.rz.RWTH-Aachen.DE:19324] examining verbs interface: mlx4_1
[cluster-linux.rz.RWTH-Aachen.DE:19324] found acceptable verbs interface 
mlx4_1:1
[cluster-linux.rz.RWTH-Aachen.DE:19324] examining verbs interface: mlx4_0
[cluster-linux.rz.RWTH-Aachen.DE:19324] found acceptable verbs interface 
mlx4_0:1
[cluster-linux.rz.RWTH-Aachen.DE:19324] btl:usnic: found: device mlx4_1, port 1
[cluster-linux.rz.RWTH-Aachen.DE:19324] btl:usnic: this is not a usnic-capable 
device
[cluster-linux.rz.RWTH-Aachen.DE:19324] btl:usnic: found: device mlx4_0, port 1
[cluster-linux.rz.RWTH-Aachen.DE:19324] btl:usnic: this is not a usnic-capable 
device
[cluster-linux.rz.RWTH-Aachen.DE:19324] btl:usnic: returning 0 modules
[cluster-linux.rz.RWTH-Aachen.DE:19324] select: init of component usnic 
returned failure
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: close: component usnic closed
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: close: unloading component 
usnic
[cluster.rz.RWTH-Aachen.DE:64279] openib BTL: oob CPC available for use on 
mlx4_1:1
[cluster.rz.RWTH-Aachen.DE:64279] openib BTL: rdmacm IP address not found on 
port
[cluster.rz.RWTH-Aachen.DE:64279] openib BTL: rdmacm CPC unavailable for use on 
mlx4_1:1; skipped
[cluster.rz.RWTH-Aachen.DE:64279] openib BTL: oob CPC available for use on 
mlx4_0:1
[cluster.rz.RWTH-Aachen.DE:64279] openib BTL: rdmacm CPC available for use on 
mlx4_0:1
[cluster.rz.RWTH-Aachen.DE:64279] select: init of component openib returned 
success
[cluster.rz.RWTH-Aachen.DE:64279] select: initializing btl component usnic
[cluster.rz.RWTH-Aachen.DE:64279] found 2 verbs interfaces
[cluster.rz.RWTH-Aachen.DE:64279] examining verbs interface: mlx4_1
[cluster.rz.RWTH-Aachen.DE:64279] found acceptable verbs interface mlx4_1:1
[cluster.rz.RWTH-Aachen.DE:64279] examining verbs interface: mlx4_0
[cluster.rz.RWTH-Aachen.DE:64279] found acceptable verbs interface mlx4_0:1
[cluster.rz.RWTH-Aachen.DE:64279] btl:usnic: found: device mlx4_1, port 1
[cluster.rz.RWTH-Aachen.DE:64279] btl:usnic: this is not a usnic-capable device
[cluster.rz.RWTH-Aachen.DE:64279] btl:usnic: found: device mlx4_0, port 1
[cluster.rz.RWTH-Aachen.DE:64279] btl:usnic: this is not a usnic-capable device
[cluster.rz.RWTH-Aachen.DE:64279] btl:usnic: returning 0 modules
[cluster.rz.RWTH-Aachen.DE:64279] select: init of component usnic returned 
failure
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: close: component usnic closed
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: close: unloading component usnic
 Prozessor            1  on Host: cluster-linux.rz.RWTH-Aachen.DE
 Prozessor            0  on Host: cluster.rz.RWTH-Aachen.DE
    0 -->     1 Latenz:    0.009 ms, Bandbreite: 1804.681 Mbyte/s
 Fertig-ID            0  on Host: cluster.rz.RWTH-Aachen.DE
    1 -->     0 Latenz:    0.149 ms, Bandbreite: 1998.477 Mbyte/s
 Fertig-ID            1  on Host: cluster-linux.rz.RWTH-Aachen.DE
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: close: component self closed
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: close: unloading component self
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: close: component sm closed
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: close: unloading component sm
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: close: component openib closed
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: close: unloading component openib
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: close: component self closed
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: close: unloading component 
self
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: close: component sm closed
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: close: unloading component sm
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: close: component openib 
closed
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: close: unloading component 
openib
[cluster.rz.RWTH-Aachen.DE:64273] 1 more process has sent help message 
help-mca-var.txt / default-only-param-set
[cluster.rz.RWTH-Aachen.DE:64273] Set MCA parameter "orte_base_help_aggregate" 
to 0 to see all help / error messages
