Hi,

While working on getting the Linux bonding driver to run on top of IPoIB,
I ran into a LOC_QP_OP_ERR completion with vendor error 62 (Mellanox PCI-X HCA):

        ib0: failed send event (status=2, wrid=52 vend_err 62)

What does this vendor error mean? This is the same system on which I saw the
QP modify error.
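
For anyone reading along, here is a minimal sketch (not part of the original report) of how the fields in that message can be pulled apart. The status names are the first few values of enum ib_wc_status as I understand them from the kernel's ib_verbs.h; the table is deliberately partial, the function name decode_send_error is hypothetical, and the vendor error is kept as raw text because its meaning is HCA-specific and not known here.

```python
import re

# First few values of enum ib_wc_status (partial on purpose; assumed to
# match the kernel's ib_verbs.h ordering).
IB_WC_STATUS = {
    0: "IB_WC_SUCCESS",
    1: "IB_WC_LOC_LEN_ERR",
    2: "IB_WC_LOC_QP_OP_ERR",
    3: "IB_WC_LOC_EEC_OP_ERR",
    4: "IB_WC_LOC_PROT_ERR",
    5: "IB_WC_WR_FLUSH_ERR",
}

# Matches the IPoIB "failed send event" print quoted in the message.
SEND_ERR = re.compile(
    r"(?P<dev>ib\d+): failed send event "
    r"\(status=(?P<status>\d+), wrid=(?P<wrid>\d+) "
    r"vend_err (?P<vend>[0-9a-fA-F]+)\)"
)


def decode_send_error(line):
    """Return the decoded fields of a 'failed send event' line, or None."""
    m = SEND_ERR.search(line)
    if m is None:
        return None
    status = int(m.group("status"))
    return {
        "dev": m.group("dev"),
        # Fall back to the raw number for values outside the partial table.
        "status": IB_WC_STATUS.get(status, "status %d" % status),
        "wrid": int(m.group("wrid")),
        # Vendor-specific code: kept verbatim, not interpreted.
        "vend_err": m.group("vend"),
    }


if __name__ == "__main__":
    print(decode_send_error(
        "ib0: failed send event (status=2, wrid=52 vend_err 62)"))
```

This only confirms that status=2 is the LOC_QP_OP_ERR named above; it says nothing about what vendor error 62 itself means.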

I also see some other worrying prints here, and I would be glad to get
some idea of what they mean:

 ib1: dev_queue_xmit failed to requeue packet
 ib1: dev_queue_xmit failed to requeue packet

 ???

 ib1: timing out; will leak address handles
 ib1: ib_dealloc_pd failed

(The PD dealloc failure is a consequence of the AH leak.) But what causes the leak?

Below is a more detailed snapshot from the time the problems occurred. I was
playing with the two IB links of this HCA, taking one of them down for about
45 seconds (by some instrumentation of the SM), then the other, and so on.

The ipoib code is unchanged (other than adding the "ipoib_set_mcast_list
called" print).

The bonding code was changed not to set the slave MAC address but rather to
use the MAC address of the active slave, and also to override the
ether_setup() settings with those of the active slave.

One thing I think I see is that IPoIB attempts to join the IPv4 broadcast
group even when the port's IB link is down. Am I correct? If so, would it be
easy to fix this?

Or.

     1  ib0: leaving MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
     2  ib0: deleting multicast group ff12:401b:ffff:0000:0000:0000:ffff:ffff
     3  ib0: starting multicast thread
     4  ib1: stopping multicast thread
     5  ib1: waiting for MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
     6  ib1: flushing multicast list
     7  ib1: leaving MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
     8  ib1: deleting multicast group ff12:401b:ffff:0000:0000:0000:ffff:ffff
     9  ib1: starting multicast thread
    10  ib0: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
    11  ib1: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
    12  ib1: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status 0)
    13  ib1: MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff AV ffff810033c103c0, LID 0xc000, SL 0
    14  ib1: successfully joined all multicast groups
    15  bonding: bond0: link status definitely down for interface ib0, disabling it
    16  bonding: bond0: making interface ib1 the new active one.
    17  ib0: ipoib_set_mcast_list called
    18  ib1: ipoib_set_mcast_list called
    19  ib0: restarting multicast task
    20  ib0: stopping multicast thread
    21  ib0: waiting for MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
    22  ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status -4)
    23  ib0: starting multicast thread
    24  ib1: restarting multicast task
    25  ib1: stopping multicast thread
    26  ib1: waiting for MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
    27  ib1: adding multicast entry for mgid ff12:401b:ffff:0000:0000:0000:0000:0001
    28  ib1: starting multicast thread
    29  ib0: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
    30  ib1: joining MGID ff12:401b:ffff:0000:0000:0000:0000:0001
    31  ib1: join completion for ff12:401b:ffff:0000:0000:0000:0000:0001 (status 0)
    32  ib1: MGID ff12:401b:ffff:0000:0000:0000:0000:0001 AV ffff810037f91d00, LID 0xc001, SL 0
    33  ib1: successfully joined all multicast groups
    34  ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status -110)
    35  ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -110
    36  ib0: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
    37  ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status -110)
    38  ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -110
    39  ib0: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
    40  ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status -110)
    41  ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -110
    42  ib0: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
    43  ib0: stopping multicast thread
    44  ib0: waiting for MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
    45  ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status -4)
    46  ib0: flushing multicast list
    47  ib0: deleting multicast group ff12:401b:ffff:0000:0000:0000:ffff:ffff
    48  ib0: starting multicast thread
    49  ib1: stopping multicast thread
    50  ib1: waiting for MGID ff12:401b:ffff:0000:0000:0000:0000:0001
    51  ib1: flushing multicast list
    52  ib1: leaving MGID ff12:401b:ffff:0000:0000:0000:0000:0001
    53  ib1: deleting multicast group ff12:401b:ffff:0000:0000:0000:0000:0001
    54  ib1: leaving MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
    55  ib1: deleting multicast group ff12:401b:ffff:0000:0000:0000:ffff:ffff
    56  ib1: starting multicast thread
    57  ib0: stopping multicast thread
    58  ib0: flushing multicast list
    59  ib0: starting multicast thread
    60  ib0: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
    61  ib1: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
    62  bonding: bond0: link status definitely down for interface ib1, disabling it
    63  ib1: ipoib_set_mcast_list called
    64  bonding: bond0: now running without any active interface !
    65  ib1: restarting multicast task
    66  ib1: stopping multicast thread
    67  ib1: waiting for MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
    68  ib1: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status -4)
    69  ib1: starting multicast thread
    70  ib1: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
    71  ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status 0)
    72  ib0: MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff AV ffff810033c10d80, LID 0xc000, SL 0
    73  ib0: successfully joined all multicast groups
    74  ib1: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status 0)
    75  ib1: MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff AV ffff81000b8453c0, LID 0xc000, SL 0
    76  ib1: successfully joined all multicast groups
    76  ib1: successfully joined all multicast groups
    77  ib1: dev_queue_xmit failed to requeue packet
    78  ib1: dev_queue_xmit failed to requeue packet
    79  bonding: bond0: link status definitely up for interface ib0.
    80  bonding: bond0: link status definitely up for interface ib1.
    81  bonding: bond0: making interface ib0 the new active one.
    82  ib0: ipoib_set_mcast_list called
    83  bonding: bond0: first active interface up!
    84  ib0: restarting multicast task
    85  ib0: stopping multicast thread
    86  ib0: waiting for MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
    87  ib0: adding multicast entry for mgid ff12:401b:ffff:0000:0000:0000:0000:0001
    88  ib0: starting multicast thread
    89  ib0: joining MGID ff12:401b:ffff:0000:0000:0000:0000:0001
    90  ib0: join completion for ff12:401b:ffff:0000:0000:0000:0000:0001 (status 0)
    91  ib0: MGID ff12:401b:ffff:0000:0000:0000:0000:0001 AV ffff81000099c340, LID 0xc001, SL 0
    92  ib0: successfully joined all multicast groups
    93  ib0: failed send event (status=2, wrid=52 vend_err 62)
    94  ib0: ipoib_set_mcast_list called
    95  ib0: restarting multicast task
    96  ib0: stopping multicast thread
    97  ib0: waiting for MGID ff12:401b:ffff:0000:0000:0000:0000:0001
    98  ib0: deleting multicast group ff12:401b:ffff:0000:0000:0000:0000:0001
    99  ib0: leaving MGID ff12:401b:ffff:0000:0000:0000:0000:0001
   100  ib0: deleting multicast group ff12:401b:ffff:0000:0000:0000:0000:0001
   101  ib0: starting multicast thread
   102  ib0: successfully joined all multicast groups
   103  ib0: stopping multicast thread
   104  ib0: flushing multicast list
   105  ib0: leaving MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
   106  ib0: deleting multicast group ff12:401b:ffff:0000:0000:0000:ffff:ffff
   107  ib1: stopping multicast thread
   108  ib1: waiting for MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
   109  ib1: flushing multicast list
   110  ib1: leaving MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
   111  ib1: deleting multicast group ff12:401b:ffff:0000:0000:0000:ffff:ffff
   112  ib1: timing out; will leak address handles
   113  bonding: bond0: released all slaves
   114  ib0: stopping multicast thread
   115  ib0: flushing multicast list
   116  ib1: stopping multicast thread
   117  ib1: flushing multicast list
   118  ib1: ib_dealloc_pd failed
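
A note for reading the trace: the negative status values on the "join completion" lines look like Linux errno codes, where 4 is EINTR and 110 is ETIMEDOUT in the generic Linux numbering. The following is a minimal sketch, not part of the original report, that tallies those completions per interface. The function name tally_join_status and the short SAMPLE excerpt are illustrative only; the excerpt is copied from the trace, including the mail-wrapped continuation lines.

```python
import re
from collections import Counter

# Linux errno values seen in the trace (generic Linux numbering).
ERRNO_NAMES = {0: "OK", 4: "EINTR", 110: "ETIMEDOUT"}

# One "join completion" record: interface, MGID, signed status.
JOIN_DONE = re.compile(
    r"(ib\d+): join completion for (\S+)\s+\(status (-?\d+)\)")


def tally_join_status(trace):
    """Count join completions per (interface, errno name)."""
    # The mailed trace is line-wrapped, so collapse all whitespace first;
    # this rejoins "(status N)" with the line it belongs to.
    flat = " ".join(trace.split())
    counts = Counter()
    for dev, _mgid, status in JOIN_DONE.findall(flat):
        code = abs(int(status))
        counts[(dev, ERRNO_NAMES.get(code, "errno %d" % code))] += 1
    return counts


# Short excerpt from the trace, kept wrapped as it appeared in the mail.
SAMPLE = """
    22  ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff
(status -4)
    34  ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff
(status -110)
    71  ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff
(status 0)
"""

if __name__ == "__main__":
    for (dev, name), n in sorted(tally_join_status(SAMPLE).items()):
        print(dev, name, n)
```

Run over the whole trace, a tally like this makes it easier to see on which interface the timed-out joins cluster, which bears on the question of joins being attempted while the link is down.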
