I saw the notes about opensm not working on the 2/19 build, which is what I was using. I currently see the following behavior:
I have 3 blades with QDR HCAs and 2 QDR switches. The 2nd switch only connects to the blades; when nodes reboot they come up to INIT, so the switch is not managed as I thought.

Two of the blades are installed with RHEL 5.3 + OFED 1.4.1. When the third blade is installed with OFED 1.5, the three systems all see each other and work correctly. When the third blade is installed with 1.5.1, it does not come ACTIVE.

When you run ibnetdiscover from blade 3 you get:

[r...@blade3 ~]# ibnetdiscover -P 2
ibwarn: [4064] mad_rpc: _do_madrpc failed; dport (DR path slid 0; dlid 0; 0,2)
src/ibnetdisc.c:457; Query remote node (DR path slid 0; dlid 0; 0,2) failed, skipping port
#
# Topology file: generated on Sat Feb 20 15:27:20 2010
#
# Initiated from node 0002c9030004de20 port 0002c9030004de22

vendid=0x2c9
devid=0x673c
sysimgguid=0x2c9030004de23
caguid=0x2c9030004de20
Ca 2 "H-0002c9030004de20" # "blade3 HCA-1"

When you run ibnetdiscover on another blade you see:

[r...@blade1 xrdma]# ibnetdiscover -P 2
ibwarn: [8307] mad_rpc: _do_madrpc failed; dport (DR path slid 0; dlid 0; 0,2,19)
ibwarn: [8307] handle_port: NodeInfo on DR path slid 0; dlid 0; 0,2,19 failed, skipping port
#
# Topology file: generated on Sat Feb 20 15:28:19 2010
#
# Max of 2 hops discovered
# Initiated from node 0002c9030004dc58 port 0002c9030004dc5a

vendid=0x8f1
devid=0x5a5e
sysimgguid=0x8f1050038014d
switchguid=0x8f1050038014c(8f1050038014c)
Switch 36 "S-0008f1050038014c" # "IBM HSSM" enhanced port 0 lid 8 lmc 0
[18] "H-0002c9030004db1c"[2](2c9030004db1e) # "blade2 HCA-1" lid 6 4xQDR
[17] "H-0002c9030004dc58"[2](2c9030004dc5a) # "blade1" lid 5 4xQDR

vendid=0x2c9
devid=0x673c
sysimgguid=0x2c9030004db1f
caguid=0x2c9030004db1c
Ca 2 "H-0002c9030004db1c" # "blade2 HCA-1"
[2](2c9030004db1e) "S-0008f1050038014c"[18] # lid 6 lmc 0 "IBM HSSM" lid 8 4xQDR

vendid=0x2c9
devid=0x673c
sysimgguid=0x2c9030004dc5b
caguid=0x2c9030004dc58
Ca 2 "H-0002c9030004dc58" # "blade1"
[2](2c9030004dc5a) "S-0008f1050038014c"[17] # lid 5 lmc 0 "IBM HSSM" lid 8 4xQDR

Apparently the blades do NOT work together when one is at the 1.5.1 2/19 build and another is at an earlier release. All of the nodes are at 2.6.818 firmware, which seems to be the most recent for IBM.
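
To compare what each blade discovers, something like the following could pull the node header lines out of saved ibnetdiscover output. This is only a rough sketch: list_nodes and NODE_RE are hypothetical names, not part of OFED, and the regex assumes the Ca/Switch header-line layout seen in the ibnetdiscover output in this report.

```python
import re

# Matches node header lines such as:
#   Ca 2 "H-0002c9030004de20" # "blade3 HCA-1"
#   Switch 36 "S-0008f1050038014c" # "IBM HSSM" enhanced port 0 lid 8 lmc 0
# (assumed layout; port lines starting with "[" are deliberately ignored)
NODE_RE = re.compile(r'^(Ca|Switch)\s+(\d+)\s+"([^"]+)"\s*#\s*"([^"]*)"')

def list_nodes(topology_text):
    """Return (type, port_count, node_id, description) for each node header."""
    nodes = []
    for line in topology_text.splitlines():
        m = NODE_RE.match(line.strip())
        if m:
            nodes.append((m.group(1), int(m.group(2)), m.group(3), m.group(4)))
    return nodes

# Sample lines copied from the blade1 output in this report.
sample = '''\
Switch 36 "S-0008f1050038014c" # "IBM HSSM" enhanced port 0 lid 8 lmc 0
[18] "H-0002c9030004db1c"[2](2c9030004db1e) # "blade2 HCA-1" lid 6 4xQDR
Ca 2 "H-0002c9030004db1c" # "blade2 HCA-1"
Ca 2 "H-0002c9030004dc58" # "blade1"
'''

for node in list_nodes(sample):
    print(node)
```

Running it against the output saved from each blade and comparing the two lists would show which nodes are missing from one view, such as blade3 being absent when discovery starts from blade1.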
