Re: [openib-general] [PATCH] IB/ipoib get net_device from ipoib_neigh instead of linux neighbour
Michael S. Tsirkin wrote: >> Quoting Moni Shoua <[EMAIL PROTECTED]>: >> Subject: Re: [PATCH] IB/ipoib get net_device from ipoib_neigh instead of >> linux neighbour >> >> >>> Another concern: assume that one device goes away (e.g. hotplug). >>> It seems that neighbours whose dev field point to another device, will not >>> be destroyed. >>> Correct? >> I agree. >> >>> Therefore in your design, it seems that to_ipoib_neigh()->dev >>> will get us a pointer to device that has been removed already. >>> >> I agree that this is a problem. > > I think we can solve this if we track all ipoib neighbours, like we do for > old kernels, > and then flush ipoib neighbours on any hotplug event. > Roland, does this sound too awful? > >> It think it would be best to prevent an IPoIB device >> from disappearing or from ib_ipoib from being unloaded as long as IPoIB >> device is a slave. Unfortunately, I don't see how this can be done just >> by fixing something in bonding or IPoIB. > > So hotplug is blocked potentially forever? > This does not sound good. OK, so I'm dropping this thought. > >> However, any slave knows he has a master (dev->master). >> What do you think about a solution where IPoIB first tries to clean up the >> neighbours that belong to it's master before deleting the IPoIB device? > > How? Let me know what do you think about that. I hope this makes sense. in IPoIB, before calling unregister_netdev do for each kernel neighbour n if n->dev == ib_dev->master delete n Michael, as I see it we have to deal with 2 cases. 1. IPoIB device is deleted (unregister_netdev) - IPoIB netdev in not in the kernel's address space. we have to make sure that no one holds a pointer to it after it is deleted. 2 ib_ipoib module is unloaded (modprobe -r) - the ipoib_neigh_destructor is not in the kernel's address space. we have to make sure no one calls to it after the module is unloaded. I think that if nothing prevents the execution of the "code" above it serves both cases. Do you see any problem with that? Do I have to maintain my own list of neighbours or use the kernel's arp table for that? I am trying to study the neighbour cleanup function and do something like that but I would be happy to learn from others as well. Furthermore, bond_setup_by_slave is called only for non Ethernet devices (we consider to change the logic to "called only for IPoIB devices just for safety). >>> Why is this necessary, BTW? >>> >> If we don't do that, we get a memory leak because the neigh destructor will >> never be called for non IPoIB devices although they carry ipoib_neigh >> with them. > > How can this happen? If it does, I think we are back to where we started: > to_ipoib_neigh is broken for non-IPoIB device. > I thought you said only devices of the same type can be paired? > > The scenario is: 1. kernel allocates a neighbour structure for bond0, puts it on a skb and passed it to bond xmit function. 2. bond0 passes the skb to ipoib 3. ipoib allocates ipoib_neigh and hangs it on linux neighbour. 4. a while after that, the kernel wants to destroy the neighbour (cleanup) but doesn't call ipoib_neigh_destructor because it the neigh setup registered the destructor for ibX device. ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] [PATCH] IB/ipoib get net_device from ipoib_neigh instead of linux neighbour
> Quoting Moni Shoua <[EMAIL PROTECTED]>: > Subject: Re: [PATCH] IB/ipoib get net_device from ipoib_neigh instead of > linux neighbour > > > > Another concern: assume that one device goes away (e.g. hotplug). > > It seems that neighbours whose dev field point to another device, will not > > be destroyed. > > Correct? > > I agree. > > > Therefore in your design, it seems that to_ipoib_neigh()->dev > > will get us a pointer to device that has been removed already. > > > I agree that this is a problem. I think we can solve this if we track all ipoib neighbours, like we do for old kernels, and then flush ipoib neighbours on any hotplug event. Roland, does this sound too awful? > It think it would be best to prevent an IPoIB device > from disappearing or from ib_ipoib from being unloaded as long as IPoIB > device is a slave. Unfortunately, I don't see how this can be done just > by fixing something in bonding or IPoIB. So hotplug is blocked potentially forever? This does not sound good. > However, any slave knows he has a master (dev->master). > What do you think about a solution where IPoIB first tries to clean up the > neighbours that belong to it's master before deleting the IPoIB device? How? > >> Furthermore, bond_setup_by_slave is called only for non > >> Ethernet devices (we consider to change the logic to "called only for > >> IPoIB devices just for safety). > > > > Why is this necessary, BTW? > > > If we don't do that, we get a memory leak because the neigh destructor will > never be called for non IPoIB devices although they carry ipoib_neigh > with them. How can this happen? If it does, I think we are back to where we started: to_ipoib_neigh is broken for non-IPoIB device. I thought you said only devices of the same type can be paired? -- MST ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] [PATCH] IB/ipoib get net_device from ipoib_neigh instead of linux neighbour
> Another concern: assume that one device goes away (e.g. hotplug). > It seems that neighbours whose dev field point to another device, will not be > destroyed. > Correct? I agree. > > Therefore in your design, it seems that to_ipoib_neigh()->dev > will get us a pointer to device that has been removed already. > I agree that this is a problem. It think it would be best to prevent an IPoIB device from disappearing or from ib_ipoib from being unloaded as long as IPoIB device is a slave. Unfortunately, I don't see how this can be done just by fixing something in bonding or IPoIB. However, any slave knows he has a master (dev->master). What do you think about a solution where IPoIB first tries to clean up the neighbours that belong to it's master before deleting the IPoIB device? >> Furthermore, bond_setup_by_slave is called only for non >> Ethernet devices (we consider to change the logic to "called only for >> IPoIB devices just for safety). > > Why is this necessary, BTW? > If we don't do that, we get a memory leak because the neigh destructor will never be called for non IPoIB devices although they carry ipoib_neigh with them. ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] [PATCH] IB/ipoib get net_device from ipoib_neigh instead of linux neighbour
> >>-- > >>IPoIB uses a two layer neighboring scheme, such that for each struct > >>neighbour > >>whose device is an ipoib one, there is a struct ipoib_neigh buddy which is > >>created on demand at the tx flow by an > >>ipoib_neigh_alloc(skb->dst->neighbour) > >>call. > >> > >>When using the bonding driver, neighbours are created by the net stack on > >>behalf > >>of the bonding (master) device. On the tx flow the bonding code gets an skb > >>such > >>that skb->dev points to the master device, it changes this skb to point on > >>the > >>slave device and calls the slave hard_start_xmit function. > >> > >>Combing these two flows, there is a hole if some code at ipoib > >>(ipoib_neigh_destructor) assumes that for each struct neighbour it gets, > >>n->dev > >>is an ipoib device so for example netdev_priv(n->dev) would be of type > >>struct > >>ipoib_dev_priv. > > > > > > Could you plese elaborate how ipoib_neigh_destructor comes to be called at > > all? > > At what point does ipoib_neigh_setup_dev get called? > > > > > The bond device uses its slave's neigh_setup function. > Please look at line 19 below from the bonding code. > static void bond_setup_by_slave(struct net_device *bond_dev, > 11 + struct net_device *slave_dev) > 12 +{ > 13 + bond_dev->hard_header = slave_dev->hard_header; > 14 + bond_dev->rebuild_header= slave_dev->rebuild_header; > 15 + bond_dev->hard_header_cache = slave_dev->hard_header_cache; > 16 + bond_dev->header_cache_update = slave_dev->header_cache_update; > 17 + bond_dev->hard_header_parse = slave_dev->hard_header_parse; > 18 + > 19 + bond_dev->neigh_setup = slave_dev->neigh_setup; > 20 + > 21 + bond_dev->type = slave_dev->type; > 22 + bond_dev->hard_header_len = slave_dev->hard_header_len; > 23 + bond_dev->addr_len = slave_dev->addr_len; > 24 + > 25 + memcpy(bond_dev->broadcast, slave_dev->broadcast, > 26 + slave_dev->addr_len); > 27 +} Another concern: assume that one device goes away (e.g. hotplug). It seems that neighbours whose dev field point to another device, will not be destroyed. Correct? Therefore in your design, it seems that to_ipoib_neigh()->dev will get us a pointer to device that has been removed already. > >>To fix it, this patch adds a dev field to struct ipoib_neigh which is used > >>instead of the struct neighbour dev one. > > > > > > What I am concerned with is - if the master is not an IPoIB device, > > what guarantee do we have that to_ipoib_neigh will return 0 > > and not part of an actual hardware address? > > > > Without bonding, the reason is that dev points to an ipoib device, > > so we know hw address is 20 bytes. > > > > I guess you meant "if the slave is not an IPoIB device"... Yes. > The bond device doesn't allow devices of different types to be grouped > together as its slaves. I see. > Furthermore, bond_setup_by_slave is called only for non > Ethernet devices (we consider to change the logic to "called only for > IPoIB devices just for safety). Why is this necessary, BTW? -- MST ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] [PATCH] IB/ipoib get net_device from ipoib_neigh instead of linux neighbour
Michael S. Tsirkin wrote: >>-- >>IPoIB uses a two layer neighboring scheme, such that for each struct neighbour >>whose device is an ipoib one, there is a struct ipoib_neigh buddy which is >>created on demand at the tx flow by an ipoib_neigh_alloc(skb->dst->neighbour) >>call. >> >>When using the bonding driver, neighbours are created by the net stack on >>behalf >>of the bonding (master) device. On the tx flow the bonding code gets an skb >>such >>that skb->dev points to the master device, it changes this skb to point on the >>slave device and calls the slave hard_start_xmit function. >> >>Combing these two flows, there is a hole if some code at ipoib >>(ipoib_neigh_destructor) assumes that for each struct neighbour it gets, >>n->dev >>is an ipoib device so for example netdev_priv(n->dev) would be of type struct >>ipoib_dev_priv. > > > Could you plese elaborate how ipoib_neigh_destructor comes to be called at > all? > At what point does ipoib_neigh_setup_dev get called? > > The bond device uses its slave's neigh_setup function. Please look at line 19 below from the bonding code. static void bond_setup_by_slave(struct net_device *bond_dev, 11 + struct net_device *slave_dev) 12 +{ 13 + bond_dev->hard_header = slave_dev->hard_header; 14 + bond_dev->rebuild_header= slave_dev->rebuild_header; 15 + bond_dev->hard_header_cache = slave_dev->hard_header_cache; 16 + bond_dev->header_cache_update = slave_dev->header_cache_update; 17 + bond_dev->hard_header_parse = slave_dev->hard_header_parse; 18 + 19 + bond_dev->neigh_setup = slave_dev->neigh_setup; 20 + 21 + bond_dev->type = slave_dev->type; 22 + bond_dev->hard_header_len = slave_dev->hard_header_len; 23 + bond_dev->addr_len = slave_dev->addr_len; 24 + 25 + memcpy(bond_dev->broadcast, slave_dev->broadcast, 26 + slave_dev->addr_len); 27 +} >>To fix it, this patch adds a dev field to struct ipoib_neigh which is used >>instead of the struct neighbour dev one. > > > What I am concerned with is - if the master is not an IPoIB device, > what guarantee do we have that to_ipoib_neigh will return 0 > and not part of an actual hardware address? > > Without bonding, the reason is that dev points to an ipoib device, > so we know hw address is 20 bytes. > I guess you meant "if the slave is not an IPoIB device"... The bond device doesn't allow devices of different types to be grouped together as its slaves. Furthermore, bond_setup_by_slave is called only for non Ethernet devices (we consider to change the logic to "called only for IPoIB devices just for safety). ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] [PATCH] IB/ipoib get net_device from ipoib_neigh instead of linux neighbour
> -- > IPoIB uses a two layer neighboring scheme, such that for each struct neighbour > whose device is an ipoib one, there is a struct ipoib_neigh buddy which is > created on demand at the tx flow by an ipoib_neigh_alloc(skb->dst->neighbour) > call. > > When using the bonding driver, neighbours are created by the net stack on > behalf > of the bonding (master) device. On the tx flow the bonding code gets an skb > such > that skb->dev points to the master device, it changes this skb to point on the > slave device and calls the slave hard_start_xmit function. > > Combing these two flows, there is a hole if some code at ipoib > (ipoib_neigh_destructor) assumes that for each struct neighbour it gets, > n->dev > is an ipoib device so for example netdev_priv(n->dev) would be of type struct > ipoib_dev_priv. Could you plese elaborate how ipoib_neigh_destructor comes to be called at all? At what point does ipoib_neigh_setup_dev get called? > To fix it, this patch adds a dev field to struct ipoib_neigh which is used > instead of the struct neighbour dev one. What I am concerned with is - if the master is not an IPoIB device, what guarantee do we have that to_ipoib_neigh will return 0 and not part of an actual hardware address? Without bonding, the reason is that dev points to an ipoib device, so we know hw address is 20 bytes. -- MST ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] [PATCH] IB/ipoib get net_device from ipoib_neigh instead of linux neighbour
Michael, Roland, I'd appreciate if you take a look at this and give your comments. The patch here refers to this thread about adding bonding support for IPoIB interfaces and is necessary for it to work properly. http://openib.org/pipermail/openib-general/2007-January/031934.html The patch here is for upstream kernel while there is a version of the patch for OFED as well (for kernels up to 2.6.16) http://openib.org/pipermail/openib-general/2007-January/031935.html thanks - MoniS -- IPoIB uses a two layer neighboring scheme, such that for each struct neighbour whose device is an ipoib one, there is a struct ipoib_neigh buddy which is created on demand at the tx flow by an ipoib_neigh_alloc(skb->dst->neighbour) call. When using the bonding driver, neighbours are created by the net stack on behalf of the bonding (master) device. On the tx flow the bonding code gets an skb such that skb->dev points to the master device, it changes this skb to point on the slave device and calls the slave hard_start_xmit function. Combing these two flows, there is a hole if some code at ipoib (ipoib_neigh_destructor) assumes that for each struct neighbour it gets, n->dev is an ipoib device so for example netdev_priv(n->dev) would be of type struct ipoib_dev_priv. To fix it, this patch adds a dev field to struct ipoib_neigh which is used instead of the struct neighbour dev one. Signed-off-by: Moni Shoua <[EMAIL PROTECTED]> Signed-off-by: Or Gerlitz <[EMAIL PROTECTED]> --- ipoib.h |4 +++- ipoib_main.c | 23 +-- ipoib_multicast.c |2 +- 3 files changed, 17 insertions(+), 12 deletions(-) Index: infiniband/drivers/infiniband/ulp/ipoib/ipoib.h === --- infiniband.orig/drivers/infiniband/ulp/ipoib/ipoib.h2007-01-22 12:11:25.0 +0200 +++ infiniband/drivers/infiniband/ulp/ipoib/ipoib.h 2007-01-22 12:18:06.101698456 +0200 @@ -216,6 +216,7 @@ struct ipoib_neigh { struct sk_buff_head queue; struct neighbour *neighbour; + struct net_device *dev; struct list_headlist; }; @@ -232,7 +233,8 @@ static inline struct ipoib_neigh **to_ip INFINIBAND_ALEN, sizeof(void *)); } -struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh); +struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh, + struct net_device *dev); void ipoib_neigh_free(struct net_device *dev, struct ipoib_neigh *neigh); extern struct workqueue_struct *ipoib_workqueue; Index: infiniband/drivers/infiniband/ulp/ipoib/ipoib_main.c === --- infiniband.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-01-22 12:11:33.0 +0200 +++ infiniband/drivers/infiniband/ulp/ipoib/ipoib_main.c2007-01-22 12:34:57.599156580 +0200 @@ -490,7 +490,7 @@ static void neigh_add_path(struct sk_buf struct ipoib_path *path; struct ipoib_neigh *neigh; - neigh = ipoib_neigh_alloc(skb->dst->neighbour); + neigh = ipoib_neigh_alloc(skb->dst->neighbour, skb->dev); if (!neigh) { ++priv->stats.tx_dropped; dev_kfree_skb_any(skb); @@ -769,32 +769,34 @@ static void ipoib_set_mcast_list(struct static void ipoib_neigh_destructor(struct neighbour *n) { struct ipoib_neigh *neigh; - struct ipoib_dev_priv *priv = netdev_priv(n->dev); + struct ipoib_dev_priv *priv; unsigned long flags; struct ipoib_ah *ah = NULL; - ipoib_dbg(priv, - "neigh_destructor for %06x " IPOIB_GID_FMT "\n", - IPOIB_QPN(n->ha), - IPOIB_GID_RAW_ARG(n->ha + 4)); - - spin_lock_irqsave(&priv->lock, flags); neigh = *to_ipoib_neigh(n); if (neigh) { + priv = netdev_priv(neigh->dev); + ipoib_dbg(priv, + "neigh_destructor for %06x " IPOIB_GID_FMT "\n", + IPOIB_QPN(n->ha), + IPOIB_GID_RAW_ARG(n->ha + 4)); + + spin_lock_irqsave(&priv->lock, flags); if (neigh->ah) ah = neigh->ah; list_del(&neigh->list); ipoib_neigh_free(n->dev, neigh); + spin_unlock_irqrestore(&priv->lock, flags); } - spin_unlock_irqrestore(&priv->lock, flags); if (ah) ipoib_put_ah(ah); } -struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour) +struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour, + struct net_device *dev) { struct ipoib_neigh *neigh; @@ -803,6 +805,7 @@ struct ipoib_neigh *ipoib_neigh_alloc(st return NULL; neigh->neighbour = neighbour;