> Quoting Moni Shoua <[EMAIL PROTECTED]>: > Subject: Re: [RFC] [PATCH v2] IB/ipoib: Add bonding support to IPoIB > > > Thanks for the comments > > >> To fix it, this patch adds a dev field to struct ipoib_neigh which is used > >> instead of the struct neighbour dev one. > > > > It seems that in this design, if multiple ipoib interfaces are present, we > > might > > get an skb such that skb->dev will be different from the new dev field in > > struct > > ipoib_neigh. > > > > It seems that the result will be that the packet will be sent on a wrong > > interface. > > Right? > > > I don't see how. The field dev in ipoib_neigh doesn't take part in interface > selection. > As I see it, skb travels this path: > 1. Passed to bond_dev->hard_start_xmit > 2. bond_dev->hard_start_xmit chooses the current active interface, changes > skb->dev and enqueues it back for xmittig.
ipoib_neigh ah field includes struct ib_ah *. This selects important parameters which depend on both packet source and destination interfaces. I think the right thing might be to compare ipoib_neigh dev and skb->dev, and destroy ipoib_neigh if these do not match. > >> In addition, if an IPoIB device is removed before bonding is unloaded it > >> may > >> cause bond0 neighbours (neighbours that point to bond0) to exist after the > >> IPoIB > >> device no longer exist. This is why a neighbour cleanup is required during > >> device > >> cleanup. This cleanup scans the arp cache and the ndisc cache to find > >> there > >> neighbours of bond0 which refer also to the relevant ibX. Also, when > >> ib_ipoib module is > >> unloaded, the neighbour destructor must be set to NULL because the > >> neighbour function is in > >> ib_ipoib. > >> For this neigh table cleanup, it is required to export the symbol nd_tbl > >> just like the symbol arp_tbl is. > > > > I wonder about this: is it really true that any allocated neighbour is > > always in > > either arp_tbl or nd_tbl? For example, could some code have called > > neigh_hold > > and retained a neighbour that is not in either one of these tables? > > > I got the assumption about neighbours living in one of these 2 tables from > observation and code reading. I preferred that that on keeping track of all > ipoib_neighs and putting them in a list. However, I could do that instead of > neigh_table scanning. Do you think it's better? If some neighbours are not on any tables, it seems using our own lists (e.g. lists we have in ipoib_path) is the only option, no? > For the example... I didn't > understand it. Could you please explain? grep for neigh_hold. neighbour is only destroyed when ref count goes to 0. If some code does neigh_hold, it seems neighbour could be removed from table but destructor not yet called. > >> During my tests I found that when running > >> > >> 1. modprobe -r ib_mthca (to delete IPoIB interfaces) > >> 2. ping somewhere on the subnet of bond0 > >> > >> I get this stack dump (which ends with kernel death) > >> [<ffffffff8037ff32>] skb_under_panic+0x5c/0x60 > >> [<ffffffff882e00c2>] :ib_ipoib:ipoib_hard_header+0xa6/0xc0 > >> [<ffffffff803c3c98>] arp_create+0x120/0x226 > >> [<ffffffff803c3dc3>] arp_send+0x25/0x3b > >> [<ffffffff803c466a>] arp_solicit+0x186/0x195 > >> [<ffffffff8038c0ac>] neigh_timer_handler+0x2b5/0x309 > >> [<ffffffff8038bdf7>] neigh_timer_handler+0x0/0x309 > >> [<ffffffff80239599>] run_timer_softirq+0x130/0x19e > >> [<ffffffff80235fcc>] __do_softirq+0x55/0xc3 > >> [<ffffffff8020acac>] call_softirq+0x1c/0x28 > >> [<ffffffff8020c02b>] do_softirq+0x2c/0x7d > >> [<ffffffff8021864a>] smp_apic_timer_interrupt+0x57/0x6a > >> [<ffffffff80208e19>] mwait_idle+0x0/0x45 > >> [<ffffffff8020a756>] apic_timer_interrupt+0x66/0x70 > >> <EOI> [<ffffffff80208e5b>] mwait_idle+0x42/0x45 > >> [<ffffffff80208db1>] cpu_idle+0x8b/0xae > >> [<ffffffff80217d60>] start_secondary+0x47f/0x48f > >> > >> The only way I found to avoid this (for now) is to check skb headroom in > >> ipoib_hard_header. I guess that this safety check doesn't harm regular > >> IPoIB > >> operation and it seems to solve my problem. However, I would be happy to > >> hear what > >> others think of this last issue. > > > > As I said, this seems to indicate a problem in the bonding code. > > But what will happen after you error out in ipoib_hard_header? > > Is the packet dropped? What might break as a result? > > > I will check the hard_header_len issue in the bonding code more carefully. > From first look it seems that bonding does borrow the hard_header_len. So where does a shorter message come from? > Also, > my checks show that it is safe to return with error from > hard_header(). For example, in neigh_connected_output: > > err = dev->hard_header(skb, dev, ntohs(skb->protocol), > neigh->ha, NULL, skb->len); > read_unlock_bh(&neigh->lock); > if (err >= 0) > err = neigh->ops->queue_xmit(skb); > else { > err = -EINVAL; > kfree_skb(skb); > > >> I would really appreciate comments. > >> > >> thanks > >> > >> -MoniS > > -- MST _______________________________________________ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general