We just had an "internal parity error" on a Mellanox HCA.  The HCA recovered.
However, IPoIB did not fare as well.  We are not sure of the details.  What I
have on the console is:

2006-11-09 15:20:05 ib_mthca 0000:07:00.0: Catastrophic error detected: internal parity error
2006-11-09 15:20:05 ib_mthca 0000:07:00.0:   buf[00]: 05000014
2006-11-09 15:20:05 ib_mthca 0000:07:00.0:   buf[01]: 00000000
2006-11-09 15:20:05 ib_mthca 0000:07:00.0:   buf[02]: 00196240
2006-11-09 15:20:05 ib_mthca 0000:07:00.0:   buf[03]: 00126618
2006-11-09 15:20:05 ib_mthca 0000:07:00.0:   buf[04]: 00206128
2006-11-09 15:20:05 ib_mthca 0000:07:00.0:   buf[05]: 001d6ff8
2006-11-09 15:20:05 ib_mthca 0000:07:00.0:   buf[06]: ffffffff
2006-11-09 15:20:05 ib_mthca 0000:07:00.0:   buf[07]: 00000000
2006-11-09 15:20:05 ib_mthca 0000:07:00.0:   buf[08]: 00000000
2006-11-09 15:20:05 ib_mthca 0000:07:00.0:   buf[09]: 00000000
2006-11-09 15:20:05 ib_mthca 0000:07:00.0:   buf[0a]: 00000000
2006-11-09 15:20:05 ib_mthca 0000:07:00.0:   buf[0b]: 00000000
2006-11-09 15:20:05 ib_mthca 0000:07:00.0:   buf[0c]: 00000000
2006-11-09 15:20:05 ib_mthca 0000:07:00.0:   buf[0d]: 00000000
2006-11-09 15:20:05 ib_mthca 0000:07:00.0:   buf[0e]: 00000000
2006-11-09 15:20:05 ib_mthca 0000:07:00.0:   buf[0f]: 00000000
2006-11-09 15:20:05 divert: no divert_blk to free, ib0 not ethernet
2006-11-09 15:20:05 divert: no divert_blk to free, ib1 not ethernet
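
For comparing dumps like the one above across incidents, here is a minimal
Python sketch that pulls the buf[..] words out of console lines.  It is not
part of ib_mthca or any OpenIB tool; the names parse_error_buffer and
nonzero_words are hypothetical, and it makes no attempt to interpret the
words, since the layout of the error buffer is not given here.

```python
import re

# A few lines copied from the console output above (sample input only).
SAMPLE = """\
2006-11-09 15:20:05 ib_mthca 0000:07:00.0:   buf[00]: 05000014
2006-11-09 15:20:05 ib_mthca 0000:07:00.0:   buf[01]: 00000000
2006-11-09 15:20:05 ib_mthca 0000:07:00.0:   buf[06]: ffffffff
2006-11-09 15:20:05 ib_mthca 0000:07:00.0:   buf[0f]: 00000000
"""

# Matches "buf[XX]: YYYYYYYY" where XX and YYYYYYYY are lowercase hex.
BUF_RE = re.compile(r"buf\[([0-9a-f]{2})\]:\s*([0-9a-f]{8})")


def parse_error_buffer(text):
    """Return {index: word} for every buf[..] entry found in text."""
    words = {}
    for match in BUF_RE.finditer(text):
        index = int(match.group(1), 16)
        words[index] = int(match.group(2), 16)
    return words


def nonzero_words(words):
    """Keep only the entries whose value is not zero, ordered by index."""
    return {i: w for i, w in sorted(words.items()) if w != 0}


if __name__ == "__main__":
    for index, word in nonzero_words(parse_error_buffer(SAMPLE)).items():
        print(f"buf[{index:02x}] = {word:08x}")
```

Feeding it the full console capture instead of SAMPLE gives one entry per
buf[..] line, which makes it easier to diff two catastrophic-error dumps.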


ifconfig showed ib0 as "gone" (as in not listed).  We tried to ifup ib0 and got:

# zeus64 /root > ifup ib0
ib_ipoib
ib_ipoib device ib0 does not seem to be present, delaying initialization.


I then tried to unload the ib_ipoib module, and that has been hung for the
last 15 minutes.

I have run ibv_rc_pingpong and ib_rdma_bw through the node fine.  ibstat and 
ibstatus and the switch show the link to be up.  So it appears as though the 
card recovered fine.

What can we do?

:-/

Thanks,
Ira

_______________________________________________
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general
