We just had an "internal parity error" on a Mellanox HCA. The HCA recovered. However, IPoIB did not fare as well. We are not sure of the details. What I have on the console is:

2006-11-09 15:20:05 ib_mthca 0000:07:00.0: Catastrophic error detected: internal parity error
2006-11-09 15:20:05 ib_mthca 0000:07:00.0:   buf[00]: 05000014
2006-11-09 15:20:05 ib_mthca 0000:07:00.0:   buf[01]: 00000000
2006-11-09 15:20:05 ib_mthca 0000:07:00.0:   buf[02]: 00196240
2006-11-09 15:20:05 ib_mthca 0000:07:00.0:   buf[03]: 00126618
2006-11-09 15:20:05 ib_mthca 0000:07:00.0:   buf[04]: 00206128
2006-11-09 15:20:05 ib_mthca 0000:07:00.0:   buf[05]: 001d6ff8
2006-11-09 15:20:05 ib_mthca 0000:07:00.0:   buf[06]: ffffffff
2006-11-09 15:20:05 ib_mthca 0000:07:00.0:   buf[07]: 00000000
2006-11-09 15:20:05 ib_mthca 0000:07:00.0:   buf[08]: 00000000
2006-11-09 15:20:05 ib_mthca 0000:07:00.0:   buf[09]: 00000000
2006-11-09 15:20:05 ib_mthca 0000:07:00.0:   buf[0a]: 00000000
2006-11-09 15:20:05 ib_mthca 0000:07:00.0:   buf[0b]: 00000000
2006-11-09 15:20:05 ib_mthca 0000:07:00.0:   buf[0c]: 00000000
2006-11-09 15:20:05 ib_mthca 0000:07:00.0:   buf[0d]: 00000000
2006-11-09 15:20:05 ib_mthca 0000:07:00.0:   buf[0e]: 00000000
2006-11-09 15:20:05 ib_mthca 0000:07:00.0:   buf[0f]: 00000000
2006-11-09 15:20:05 divert: no divert_blk to free, ib0 not ethernet
2006-11-09 15:20:05 divert: no divert_blk to free, ib1 not ethernet

ifconfig showed ib0 as "gone" (as in not listed). We tried to ifup ib0 and got:

# zeus64 /root > ifup ib0
ib_ipoib ib_ipoib device ib0 does not seem to be present, delaying initialization.

I then tried to unload the ib_ipoib module, and that has been hung for the last 15 minutes.

I have run ibv_rc_pingpong and ib_rdma_bw through the node fine. ibstat, ibstatus, and the switch all show the link to be up. So it appears as though the card itself recovered fine.

What can we do? :-/

Thanks,
Ira