I hope this is the right place to ask for help with this problem. I have an 
IB fabric running on a Cisco SFS switch with a 7000D as the subnet manager. The 
whole thing has been running great for well over a year, but today I noticed 
that after any node is rebooted, its IB link doesn't initialize. This has 
happened on 4 hosts so far. Here is what I see on an affected node:

[r...@compute-2-7 ~]# ibstat
CA 'mthca0'
       CA type: MT25204
       Number of ports: 1
       Firmware version: 1.2.917
       Hardware version: 20
       Node GUID: 0x0005ad00000c0990
       System image GUID: 0x0005ad000100d050
       Port 1:
               State: Initializing
               Physical state: LinkUp
               Rate: 20
               Base lid: 0
               LMC: 0
               SM lid: 0
               Capability mask: 0x02510a68
               Port GUID: 0x0005ad00000c0991
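
In case it helps anyone check many nodes at once, here is a minimal sketch that 
scans ibstat text for ports in the condition shown above: physical link up, but 
port state not Active and base LID still 0. It assumes output shaped like the 
listing above; the names parse_ibstat and unconfigured_ports are hypothetical 
helpers of mine, not part of any IB tool.

```python
import re

# Trimmed copy of the ibstat output from the affected node.
SAMPLE = """\
CA 'mthca0'
       CA type: MT25204
       Number of ports: 1
       Port 1:
               State: Initializing
               Physical state: LinkUp
               Rate: 20
               Base lid: 0
               LMC: 0
               SM lid: 0
"""


def parse_ibstat(text):
    """Return one dict per port holding the 'Key: value' fields under it."""
    ports = []
    ca_name = None
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        ca = re.match(r"CA '([^']+)'$", line)
        if ca:
            # A new adapter starts; fields that follow belong to the CA
            # until the next "Port N:" header.
            ca_name = ca.group(1)
            current = None
            continue
        port = re.match(r"Port (\d+):$", line)
        if port:
            current = {"ca": ca_name, "port": int(port.group(1))}
            ports.append(current)
            continue
        if current is not None and ": " in line:
            key, value = line.split(": ", 1)
            current[key] = value.strip()
    return ports


def unconfigured_ports(ports):
    """List (ca, port) pairs whose link is up but which have no LID yet."""
    return [
        (p["ca"], p["port"])
        for p in ports
        if p.get("Physical state") == "LinkUp"
        and p.get("State") != "Active"
        and p.get("Base lid") == "0"
    ]


if __name__ == "__main__":
    for ca, port in unconfigured_ports(parse_ibstat(SAMPLE)):
        print(f"{ca} port {port}: link is up but no LID has been assigned")
```

To use it on a live node, the SAMPLE string would be replaced with the captured 
output of ibstat.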

I don't know much about subnet managers, since ours is in hardware and we've 
never had to configure anything on it, but I can log in to the device and it 
isn't showing any errors. On a node that hasn't been rebooted recently and is 
still working, I can see what appears to be a working subnet manager:

[r...@compute-2-10 ~]# sminfo 
sminfo: sm lid 2 sm guid 0x5ad00001df2a0, activity count 2146213408 priority 10 state 3 SMINFO_MASTER

The same command on a non-working node shows this:

[r...@compute-2-7 ~]# sminfo 
sminfo: sm lid 0 sm guid 0x0, activity count 0 priority 0 state 2 SMINFO_STANDBY
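
For comparing nodes in bulk, a rough sketch that pulls the fields out of an 
sminfo line and tells apart the two replies shown above. The regular expression 
is fitted only to these two sample lines, and parse_sminfo and sees_master are 
hypothetical names I made up for illustration.

```python
import re

# Pattern fitted to the two sminfo lines quoted above; whitespace is matched
# loosely because mail clients may wrap the line.
SMINFO_RE = re.compile(
    r"sm lid (?P<lid>\d+)\s+sm guid (?P<guid>0x[0-9a-fA-F]+),\s+"
    r"activity count (?P<activity>\d+)\s+priority (?P<priority>\d+)\s+"
    r"state (?P<state>\d+)\s+(?P<name>SMINFO_\w+)"
)

WORKING = ("sminfo: sm lid 2 sm guid 0x5ad00001df2a0, activity count "
           "2146213408 priority 10 \nstate 3 SMINFO_MASTER")
BROKEN = ("sminfo: sm lid 0 sm guid 0x0, activity count 0 priority 0 "
          "state 2 SMINFO_STANDBY")


def parse_sminfo(text):
    """Return the sminfo fields as a dict, or None if the text doesn't match."""
    m = SMINFO_RE.search(text)
    if m is None:
        return None
    return {
        "lid": int(m.group("lid")),
        "guid": int(m.group("guid"), 16),
        "activity": int(m.group("activity")),
        "priority": int(m.group("priority")),
        "state": int(m.group("state")),
        "name": m.group("name"),
    }


def sees_master(info):
    """True when the reply names a master SM with a non-zero LID."""
    return (
        info is not None
        and info["lid"] != 0
        and info["name"] == "SMINFO_MASTER"
    )


if __name__ == "__main__":
    for label, text in (("working", WORKING), ("non-working", BROKEN)):
        print(label, "sees master SM:", sees_master(parse_sminfo(text)))
```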

So far I have reseated all the cables involved on both ends and moved the 
cables on the switch end to new ports. None of that has made a difference, 
even after reboots. I am hoping to find a node that I can take offline 
tomorrow so I can actually test the cables, but since this seems to happen to 
any host that reboots, it doesn't look like a cabling problem. Can anybody 
suggest where I should go from here? Is there anything I can do from a working 
or non-working host to diagnose the problem? Should I try rebooting the subnet 
manager switch, and if so, will that affect the rest of the fabric? 

Thanks,
Mike Robbert
