I hope this is the correct place to get help with a problem I am having. I have an IB fabric running on a Cisco SFS switch with a 7000D as the subnet manager. The whole thing has been running great for well over a year, but today I noticed that after any node is rebooted its IB link does not initialize. This has happened on 4 hosts so far. What I see is as follows:

[r...@compute-2-7 ~]# ibstat
CA 'mthca0'
        CA type: MT25204
        Number of ports: 1
        Firmware version: 1.2.917
        Hardware version: 20
        Node GUID: 0x0005ad00000c0990
        System image GUID: 0x0005ad000100d050
        Port 1:
                State: Initializing
                Physical state: LinkUp
                Rate: 20
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x02510a68
                Port GUID: 0x0005ad00000c0991

I don't know much about subnet managers, since ours is in hardware and we've never had to configure anything on it, but I can log in to the device and it isn't showing any errors. On a node that hasn't been rebooted recently and is still working, I can see what appears to be a working subnet manager:

[r...@compute-2-10 ~]# sminfo
sminfo: sm lid 2 sm guid 0x5ad00001df2a0, activity count 2146213408 priority 10 state 3 SMINFO_MASTER

The same command on a non-working node shows this:

[r...@compute-2-7 ~]# sminfo
sminfo: sm lid 0 sm guid 0x0, activity count 0 priority 0 state 2 SMINFO_STANDBY

So far I have reseated all the cables involved on both ends, and I have moved the cables on the switch end to new ports. None of that has made a difference, even after reboots. I am hoping to find a node that I can take offline tomorrow so I can actually test the cables, but since this seems to be happening to any host that reboots, it doesn't appear to be a cabling problem.

Can anybody suggest where I should go from here? Is there anything I can do from a working or non-working host to diagnose the problem? Should I try rebooting the subnet manager switch? Will that affect the rest of the fabric?

Thanks,
Mike Robbert
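
For anyone who wants to check several nodes for the same symptom, here is a rough sketch in Python that scans ibstat text for ports in the state shown above: physical state LinkUp, logical state Initializing, and base LID 0. That combination means the link has trained but no subnet manager has assigned the port a LID yet. The helper names parse_ibstat and ports_waiting_for_sm are made up for this illustration, and the parser assumes the "key: value" layout of the ibstat output in this message; other ibstat versions may print things differently.

```python
import re

# Sample text copied from the ibstat output of the non-working node.
SAMPLE = """\
CA 'mthca0'
        CA type: MT25204
        Number of ports: 1
        Port 1:
                State: Initializing
                Physical state: LinkUp
                Rate: 20
                Base lid: 0
                LMC: 0
                SM lid: 0
                Port GUID: 0x0005ad00000c0991
"""


def parse_ibstat(text):
    """Return a list of (ca_name, port_number, fields) tuples.

    Hypothetical helper: assumes each CA starts with a "CA 'name'" line
    and each port with a "Port N:" line, as in the sample above.
    """
    ports = []
    ca = None
    fields = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = re.match(r"CA '([^']+)'$", line)
        if m:
            ca = m.group(1)
            fields = None  # CA-level fields are ignored in this sketch
            continue
        m = re.match(r"Port (\d+):$", line)
        if m:
            fields = {}
            ports.append((ca, int(m.group(1)), fields))
            continue
        if fields is not None and ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return ports


def ports_waiting_for_sm(ports):
    """Return (ca, port) pairs whose link is up but which have no LID yet."""
    stuck = []
    for ca, number, fields in ports:
        if (fields.get("State") == "Initializing"
                and fields.get("Physical state") == "LinkUp"
                and fields.get("Base lid") == "0"):
            stuck.append((ca, number))
    return stuck


print(ports_waiting_for_sm(parse_ibstat(SAMPLE)))
```

On a real node the text would come from running ibstat and capturing its output instead of the SAMPLE string. A port that shows up in the result has a good physical link, so the thing to look at is the subnet manager rather than the cable.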