Hi,

While working on getting the Linux bonding driver to run on top of IPoIB, I have run into LOC_QP_OP_ERR with vendor error 62 (Mellanox PCI-X HCA):

ib0: failed send event (status=2, wrid=52 vend_err 62)

What does this vendor error mean? This is the same system on which I saw the QP modify error.

There are some more problematic prints here, and I would be happy to get some idea of their meaning:

ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet

???

ib1: timing out; will leak address handles
ib1: ib_dealloc_pd failed

(The PD dealloc failure is a consequence of the AH leak, but what causes the leak???)

Below is a more detailed snapshot from the time the problems occurred. I was playing with this HCA's two IB links, taking one of them down for about 45 seconds (by some instrumentation of the SM), then the other, and so on.

The IPoIB code is unchanged (other than adding the "ipoib_set_mcast_list called" print). The bonding code was changed not to set the slave MAC address but rather to use the MAC address of the active slave, and also to override the ether_setup() settings with those of the active slave.

One thing I think I see is that IPoIB attempts to join the IPv4 broadcast group even when the port's IB link is down. Am I correct? If so, would it be easy to fix this?

Or.

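For anyone reading along, here is a minimal sketch (not part of the original report) of how the "failed send event" line quoted above could be decoded. The status table holds only the first few values of the kernel's enum ib_wc_status, where 2 is IB_WC_LOC_QP_OP_ERR. The helper name decode_send_failure is hypothetical, and the vendor error is deliberately kept as raw text because its meaning is HCA specific and is exactly what is being asked about here.

```python
import re

# First few values of the kernel's enum ib_wc_status (include/rdma/ib_verbs.h).
IB_WC_STATUS = {
    0: "IB_WC_SUCCESS",
    1: "IB_WC_LOC_LEN_ERR",
    2: "IB_WC_LOC_QP_OP_ERR",
    3: "IB_WC_LOC_EEC_OP_ERR",
    4: "IB_WC_LOC_PROT_ERR",
    5: "IB_WC_WR_FLUSH_ERR",
}

# Matches e.g. "ib0: failed send event (status=2, wrid=52 vend_err 62)".
SEND_FAIL = re.compile(
    r"(?P<ifname>\w+): failed send event "
    r"\(status=(?P<status>\d+), wrid=(?P<wrid>\d+) "
    r"vend_err (?P<vend_err>[0-9a-fA-F]+)\)"
)


def decode_send_failure(line):
    """Hypothetical helper: return the fields of a failed-send line, or None."""
    m = SEND_FAIL.search(line)
    if m is None:
        return None
    status = int(m.group("status"))
    return {
        "ifname": m.group("ifname"),
        "status": status,
        "status_name": IB_WC_STATUS.get(status, "unknown"),
        "wrid": int(m.group("wrid")),
        # Kept as text: no number base or meaning is assumed for the vendor code.
        "vend_err": m.group("vend_err"),
    }


if __name__ == "__main__":
    print(decode_send_failure(
        "ib0: failed send event (status=2, wrid=52 vend_err 62)"))
```

This only names the generic completion status; it says nothing about what vendor error 62 means on this HCA.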
1 ib0: leaving MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
2 ib0: deleting multicast group ff12:401b:ffff:0000:0000:0000:ffff:ffff
3 ib0: starting multicast thread
4 ib1: stopping multicast thread
5 ib1: waiting for MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
6 ib1: flushing multicast list
7 ib1: leaving MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
8 ib1: deleting multicast group ff12:401b:ffff:0000:0000:0000:ffff:ffff
9 ib1: starting multicast thread
10 ib0: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
11 ib1: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
12 ib1: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status 0)
13 ib1: MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff AV ffff810033c103c0, LID 0xc000, SL 0
14 ib1: successfully joined all multicast groups
15 bonding: bond0: link status definitely down for interface ib0, disabling it
16 bonding: bond0: making interface ib1 the new active one.
17 ib0: ipoib_set_mcast_list called
18 ib1: ipoib_set_mcast_list called
19 ib0: restarting multicast task
20 ib0: stopping multicast thread
21 ib0: waiting for MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
22 ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status -4)
23 ib0: starting multicast thread
24 ib1: restarting multicast task
25 ib1: stopping multicast thread
26 ib1: waiting for MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
27 ib1: adding multicast entry for mgid ff12:401b:ffff:0000:0000:0000:0000:0001
28 ib1: starting multicast thread
29 ib0: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
30 ib1: joining MGID ff12:401b:ffff:0000:0000:0000:0000:0001
31 ib1: join completion for ff12:401b:ffff:0000:0000:0000:0000:0001 (status 0)
32 ib1: MGID ff12:401b:ffff:0000:0000:0000:0000:0001 AV ffff810037f91d00, LID 0xc001, SL 0
33 ib1: successfully joined all multicast groups
34 ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status -110)
35 ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -110
36 ib0: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
37 ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status -110)
38 ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -110
39 ib0: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
40 ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status -110)
41 ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -110
42 ib0: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
43 ib0: stopping multicast thread
44 ib0: waiting for MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
45 ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status -4)
46 ib0: flushing multicast list
47 ib0: deleting multicast group ff12:401b:ffff:0000:0000:0000:ffff:ffff
48 ib0: starting multicast thread
49 ib1: stopping multicast thread
50 ib1: waiting for MGID ff12:401b:ffff:0000:0000:0000:0000:0001
51 ib1: flushing multicast list
52 ib1: leaving MGID ff12:401b:ffff:0000:0000:0000:0000:0001
53 ib1: deleting multicast group ff12:401b:ffff:0000:0000:0000:0000:0001
54 ib1: leaving MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
55 ib1: deleting multicast group ff12:401b:ffff:0000:0000:0000:ffff:ffff
56 ib1: starting multicast thread
57 ib0: stopping multicast thread
58 ib0: flushing multicast list
59 ib0: starting multicast thread
60 ib0: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
61 ib1: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
62 bonding: bond0: link status definitely down for interface ib1, disabling it
63 ib1: ipoib_set_mcast_list called
64 bonding: bond0: now running without any active interface !
65 ib1: restarting multicast task
66 ib1: stopping multicast thread
67 ib1: waiting for MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
68 ib1: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status -4)
69 ib1: starting multicast thread
70 ib1: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
71 ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status 0)
72 ib0: MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff AV ffff810033c10d80, LID 0xc000, SL 0
73 ib0: successfully joined all multicast groups
74 ib1: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status 0)
75 ib1: MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff AV ffff81000b8453c0, LID 0xc000, SL 0
76 ib1: successfully joined all multicast groups
77 ib1: dev_queue_xmit failed to requeue packet
78 ib1: dev_queue_xmit failed to requeue packet
79 bonding: bond0: link status definitely up for interface ib0.
80 bonding: bond0: link status definitely up for interface ib1.
81 bonding: bond0: making interface ib0 the new active one.
82 ib0: ipoib_set_mcast_list called
83 bonding: bond0: first active interface up!
84 ib0: restarting multicast task
85 ib0: stopping multicast thread
86 ib0: waiting for MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
87 ib0: adding multicast entry for mgid ff12:401b:ffff:0000:0000:0000:0000:0001
88 ib0: starting multicast thread
89 ib0: joining MGID ff12:401b:ffff:0000:0000:0000:0000:0001
90 ib0: join completion for ff12:401b:ffff:0000:0000:0000:0000:0001 (status 0)
91 ib0: MGID ff12:401b:ffff:0000:0000:0000:0000:0001 AV ffff81000099c340, LID 0xc001, SL 0
92 ib0: successfully joined all multicast groups
93 ib0: failed send event (status=2, wrid=52 vend_err 62)
94 ib0: ipoib_set_mcast_list called
95 ib0: restarting multicast task
96 ib0: stopping multicast thread
97 ib0: waiting for MGID ff12:401b:ffff:0000:0000:0000:0000:0001
98 ib0: deleting multicast group ff12:401b:ffff:0000:0000:0000:0000:0001
99 ib0: leaving MGID ff12:401b:ffff:0000:0000:0000:0000:0001
100 ib0: deleting multicast group ff12:401b:ffff:0000:0000:0000:0000:0001
101 ib0: starting multicast thread
102 ib0: successfully joined all multicast groups
103 ib0: stopping multicast thread
104 ib0: flushing multicast list
105 ib0: leaving MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
106 ib0: deleting multicast group ff12:401b:ffff:0000:0000:0000:ffff:ffff
107 ib1: stopping multicast thread
108 ib1: waiting for MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
109 ib1: flushing multicast list
110 ib1: leaving MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
111 ib1: deleting multicast group ff12:401b:ffff:0000:0000:0000:ffff:ffff
112 ib1: timing out; will leak address handles
113 bonding: bond0: released all slaves
114 ib0: stopping multicast thread
115 ib0: flushing multicast list
116 ib1: stopping multicast thread
117 ib1: flushing multicast list
118 ib1: ib_dealloc_pd failed
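
To make the pattern of join failures in the log above easier to see, the "join completion" entries can be tallied per interface and status. The sketch below is an illustration only and not part of the original report: summarize_joins is a hypothetical helper, the sample text is four entries copied from the log, and the status names assume the usual Linux errno values (4 is EINTR, 110 is ETIMEDOUT).

```python
import re
from collections import Counter

# Assumption: negative statuses are negated Linux errno values.
STATUS_NAMES = {0: "success", -4: "-EINTR", -110: "-ETIMEDOUT"}

# Matches "ibN: join completion for <mgid> (status S)", with or without
# a leading entry number, and also in text where the newlines were lost.
JOIN_DONE = re.compile(
    r"(?P<ifname>ib\d+): join completion for "
    r"(?P<mgid>[0-9a-f:]+) \(status (?P<status>-?\d+)\)"
)


def summarize_joins(log_text):
    """Hypothetical helper: count join completions per (interface, status)."""
    counts = Counter()
    for m in JOIN_DONE.finditer(log_text):
        status = int(m.group("status"))
        name = STATUS_NAMES.get(status, str(status))
        counts[(m.group("ifname"), name)] += 1
    return counts


# Four entries copied from the log in this message.
sample = """\
22 ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status -4)
31 ib1: join completion for ff12:401b:ffff:0000:0000:0000:0000:0001 (status 0)
34 ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status -110)
37 ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status -110)
"""

if __name__ == "__main__":
    for (ifname, status), n in sorted(summarize_joins(sample).items()):
        print(ifname, status, n)
```

Run over the full log, a tally like this would show on which interface the joins time out while its link is down, which is the behaviour the question about joining with the link down refers to.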