Re: [openib-general] [PATCH] IB/ipoib get net_device from ipoib_neigh instead of linux neighbour

2007-02-08 Thread Moni Shoua
Michael S. Tsirkin wrote:
>> Quoting Moni Shoua <[EMAIL PROTECTED]>:
>> Subject: Re: [PATCH] IB/ipoib get net_device from ipoib_neigh instead of 
>> linux neighbour
>>
>>
>>> Another concern: assume that one device goes away (e.g. hotplug).
>>> It seems that neighbours whose dev field point to another device, will not 
>>> be destroyed.
>>> Correct?
>> I agree.
>>
>>> Therefore in your design, it seems that to_ipoib_neigh()->dev
>>> will get us a pointer to device that has been removed already.
>>>
>> I agree that this is a problem.
> 
> I think we can solve this if we track all ipoib neighbours, like we do for 
> old kernels,
> and then flush ipoib neighbours on any hotplug event.
> Roland, does this sound too awful?
> 
>> It think it would be best to prevent an IPoIB device
>> from disappearing or from ib_ipoib from being unloaded as long as IPoIB
>> device is a slave. Unfortunately, I don't see how this can be done just
>> by fixing something in bonding or IPoIB. 
> 
> So hotplug is blocked potentially forever?
> This does not sound good.
OK, so I'm dropping this thought.
> 
>> However, any slave knows he has a master (dev->master). 
>> What do you think about a solution where IPoIB first tries to clean up the
>> neighbours that belong to it's master before deleting the IPoIB device?
> 
> How?
Let me know what do you think about that. I hope this makes sense.
in IPoIB, before calling unregister_netdev do
for each kernel neighbour n
if  n->dev == ib_dev->master
delete n

Michael, as I see it we have to deal with 2 cases.
1. IPoIB device is deleted (unregister_netdev) - IPoIB netdev in not in the 
kernel's address space.
we have to make sure that no one holds a pointer to it after it is 
deleted.
2 ib_ipoib module is unloaded (modprobe -r) - the ipoib_neigh_destructor is not 
in the kernel's address space.
we have to make sure no one calls to it after the module is unloaded.
I think that if nothing prevents the execution of the "code" above it serves 
both cases.
Do you see any problem with that?
Do I have to maintain my own list of neighbours or use the kernel's arp table 
for that?

I am trying to study the neighbour cleanup function and do something like that 
but
I would be happy to learn from others as well.


 Furthermore, bond_setup_by_slave is called only for non
 Ethernet devices (we consider to change the logic to "called only for
 IPoIB devices just for safety).
>>> Why is this necessary, BTW?
>>>
>> If we don't do that, we get a memory leak because the neigh destructor will
>> never be called for non IPoIB devices although they carry ipoib_neigh
>> with them.
> 
> How can this happen? If it does, I think we are back to where we started:
> to_ipoib_neigh is broken for non-IPoIB device.
> I thought you said only devices of the same type can be paired?
> 
> 
The scenario is:
1. kernel allocates a neighbour structure for bond0, puts it on a skb and 
passed it to bond xmit function.
2. bond0 passes the skb to ipoib
3. ipoib allocates ipoib_neigh and hangs it on linux neighbour. 
4. a while after that, the kernel wants to destroy the neighbour (cleanup) but 
doesn't call ipoib_neigh_destructor because it the neigh setup registered the 
destructor for ibX device.





___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH] IB/ipoib get net_device from ipoib_neigh instead of linux neighbour

2007-02-07 Thread Michael S. Tsirkin
> Quoting Moni Shoua <[EMAIL PROTECTED]>:
> Subject: Re: [PATCH] IB/ipoib get net_device from ipoib_neigh instead of 
> linux neighbour
> 
> 
> > Another concern: assume that one device goes away (e.g. hotplug).
> > It seems that neighbours whose dev field point to another device, will not 
> > be destroyed.
> > Correct?
>
> I agree.
>
> > Therefore in your design, it seems that to_ipoib_neigh()->dev
> > will get us a pointer to device that has been removed already.
> > 
> I agree that this is a problem.

I think we can solve this if we track all ipoib neighbours, like we do for old 
kernels,
and then flush ipoib neighbours on any hotplug event.
Roland, does this sound too awful?

> It think it would be best to prevent an IPoIB device
> from disappearing or from ib_ipoib from being unloaded as long as IPoIB
> device is a slave. Unfortunately, I don't see how this can be done just
> by fixing something in bonding or IPoIB. 

So hotplug is blocked potentially forever?
This does not sound good.

> However, any slave knows he has a master (dev->master). 
> What do you think about a solution where IPoIB first tries to clean up the
> neighbours that belong to it's master before deleting the IPoIB device?

How?

> >> Furthermore, bond_setup_by_slave is called only for non
> >> Ethernet devices (we consider to change the logic to "called only for
> >> IPoIB devices just for safety).
> > 
> > Why is this necessary, BTW?
> > 
> If we don't do that, we get a memory leak because the neigh destructor will
> never be called for non IPoIB devices although they carry ipoib_neigh
> with them.

How can this happen? If it does, I think we are back to where we started:
to_ipoib_neigh is broken for non-IPoIB device.
I thought you said only devices of the same type can be paired?


-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH] IB/ipoib get net_device from ipoib_neigh instead of linux neighbour

2007-02-07 Thread Moni Shoua

> Another concern: assume that one device goes away (e.g. hotplug).
> It seems that neighbours whose dev field point to another device, will not be 
> destroyed.
> Correct?
I agree.
> 
> Therefore in your design, it seems that to_ipoib_neigh()->dev
> will get us a pointer to device that has been removed already.
> 
I agree that this is a problem. It think it would be best to prevent an IPoIB 
device
from disappearing or from ib_ipoib from being unloaded as long as IPoIB
device is a slave. Unfortunately, I don't see how this can be done just
by fixing something in bonding or IPoIB. 
However, any slave knows he has a master (dev->master). 
What do you think about a solution where IPoIB first tries to clean up the
neighbours that belong to it's master before deleting the IPoIB device?

>> Furthermore, bond_setup_by_slave is called only for non
>> Ethernet devices (we consider to change the logic to "called only for
>> IPoIB devices just for safety).
> 
> Why is this necessary, BTW?
> 
If we don't do that, we get a memory leak because the neigh destructor will
never be called for non IPoIB devices although they carry ipoib_neigh
with them.



___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH] IB/ipoib get net_device from ipoib_neigh instead of linux neighbour

2007-02-06 Thread Michael S. Tsirkin
> >>--
> >>IPoIB uses a two layer neighboring scheme, such that for each struct 
> >>neighbour
> >>whose device is an ipoib one, there is a struct ipoib_neigh buddy which is
> >>created on demand at the tx flow by an 
> >>ipoib_neigh_alloc(skb->dst->neighbour)
> >>call.
> >>
> >>When using the bonding driver, neighbours are created by the net stack on 
> >>behalf
> >>of the bonding (master) device. On the tx flow the bonding code gets an skb 
> >>such
> >>that skb->dev points to the master device, it changes this skb to point on 
> >>the
> >>slave device and calls the slave hard_start_xmit function.
> >>
> >>Combing these two flows, there is a hole if some code at ipoib
> >>(ipoib_neigh_destructor) assumes that for each struct neighbour it gets, 
> >>n->dev
> >>is an ipoib device so for example netdev_priv(n->dev) would be of type 
> >>struct
> >>ipoib_dev_priv.
> > 
> > 
> > Could you plese elaborate how ipoib_neigh_destructor comes to be called at 
> > all?
> > At what point does ipoib_neigh_setup_dev get called?
> > 
> > 
> The bond device uses its slave's neigh_setup function.
> Please look at line 19 below from the bonding code.
> static void bond_setup_by_slave(struct net_device *bond_dev,
>  11 +   struct net_device *slave_dev)
>  12 +{
>  13 +   bond_dev->hard_header   = slave_dev->hard_header;
>  14 +   bond_dev->rebuild_header= slave_dev->rebuild_header;
>  15 +   bond_dev->hard_header_cache = slave_dev->hard_header_cache;
>  16 +   bond_dev->header_cache_update   = slave_dev->header_cache_update;
>  17 +   bond_dev->hard_header_parse = slave_dev->hard_header_parse;
>  18 +
>  19 +   bond_dev->neigh_setup   = slave_dev->neigh_setup;
>  20 +
>  21 +   bond_dev->type  = slave_dev->type;
>  22 +   bond_dev->hard_header_len   = slave_dev->hard_header_len;
>  23 +   bond_dev->addr_len  = slave_dev->addr_len;
>  24 +
>  25 +   memcpy(bond_dev->broadcast, slave_dev->broadcast,
>  26 +   slave_dev->addr_len);
>  27 +}

Another concern: assume that one device goes away (e.g. hotplug).
It seems that neighbours whose dev field point to another device, will not be 
destroyed.
Correct?

Therefore in your design, it seems that to_ipoib_neigh()->dev
will get us a pointer to device that has been removed already.

> >>To fix it, this patch adds a dev field to struct ipoib_neigh which is used
> >>instead of the struct neighbour dev one.
> > 
> > 
> > What I am concerned with is - if the master is not an IPoIB device,
> > what guarantee do we have that to_ipoib_neigh will return 0
> > and not part of an actual hardware address?
> > 
> > Without bonding, the reason is that dev points to an ipoib device,
> > so we know hw address is 20 bytes.
> > 
> 
> I guess you meant "if the slave is not an IPoIB device"...

Yes.

> The bond device doesn't allow devices of different types to be grouped
> together as its slaves.

I see.

> Furthermore, bond_setup_by_slave is called only for non
> Ethernet devices (we consider to change the logic to "called only for
> IPoIB devices just for safety).

Why is this necessary, BTW?

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH] IB/ipoib get net_device from ipoib_neigh instead of linux neighbour

2007-02-06 Thread Moni Shoua
Michael S. Tsirkin wrote:
>>--
>>IPoIB uses a two layer neighboring scheme, such that for each struct neighbour
>>whose device is an ipoib one, there is a struct ipoib_neigh buddy which is
>>created on demand at the tx flow by an ipoib_neigh_alloc(skb->dst->neighbour)
>>call.
>>
>>When using the bonding driver, neighbours are created by the net stack on 
>>behalf
>>of the bonding (master) device. On the tx flow the bonding code gets an skb 
>>such
>>that skb->dev points to the master device, it changes this skb to point on the
>>slave device and calls the slave hard_start_xmit function.
>>
>>Combing these two flows, there is a hole if some code at ipoib
>>(ipoib_neigh_destructor) assumes that for each struct neighbour it gets, 
>>n->dev
>>is an ipoib device so for example netdev_priv(n->dev) would be of type struct
>>ipoib_dev_priv.
> 
> 
> Could you plese elaborate how ipoib_neigh_destructor comes to be called at 
> all?
> At what point does ipoib_neigh_setup_dev get called?
> 
> 
The bond device uses its slave's neigh_setup function.
Please look at line 19 below from the bonding code.
static void bond_setup_by_slave(struct net_device *bond_dev,
 11 +   struct net_device *slave_dev)
 12 +{
 13 +   bond_dev->hard_header   = slave_dev->hard_header;
 14 +   bond_dev->rebuild_header= slave_dev->rebuild_header;
 15 +   bond_dev->hard_header_cache = slave_dev->hard_header_cache;
 16 +   bond_dev->header_cache_update   = slave_dev->header_cache_update;
 17 +   bond_dev->hard_header_parse = slave_dev->hard_header_parse;
 18 +
 19 +   bond_dev->neigh_setup   = slave_dev->neigh_setup;
 20 +
 21 +   bond_dev->type  = slave_dev->type;
 22 +   bond_dev->hard_header_len   = slave_dev->hard_header_len;
 23 +   bond_dev->addr_len  = slave_dev->addr_len;
 24 +
 25 +   memcpy(bond_dev->broadcast, slave_dev->broadcast,
 26 +   slave_dev->addr_len);
 27 +}
>>To fix it, this patch adds a dev field to struct ipoib_neigh which is used
>>instead of the struct neighbour dev one.
> 
> 
> What I am concerned with is - if the master is not an IPoIB device,
> what guarantee do we have that to_ipoib_neigh will return 0
> and not part of an actual hardware address?
> 
> Without bonding, the reason is that dev points to an ipoib device,
> so we know hw address is 20 bytes.
> 

I guess you meant "if the slave is not an IPoIB device"...

The bond device doesn't allow devices of different types to be grouped
together as its slaves. Furthermore, bond_setup_by_slave is called only for non
Ethernet devices (we consider to change the logic to "called only for
IPoIB devices just for safety).


___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] [PATCH] IB/ipoib get net_device from ipoib_neigh instead of linux neighbour

2007-02-06 Thread Michael S. Tsirkin
> --
> IPoIB uses a two layer neighboring scheme, such that for each struct neighbour
> whose device is an ipoib one, there is a struct ipoib_neigh buddy which is
> created on demand at the tx flow by an ipoib_neigh_alloc(skb->dst->neighbour)
> call.
> 
> When using the bonding driver, neighbours are created by the net stack on 
> behalf
> of the bonding (master) device. On the tx flow the bonding code gets an skb 
> such
> that skb->dev points to the master device, it changes this skb to point on the
> slave device and calls the slave hard_start_xmit function.
> 
> Combing these two flows, there is a hole if some code at ipoib
> (ipoib_neigh_destructor) assumes that for each struct neighbour it gets, 
> n->dev
> is an ipoib device so for example netdev_priv(n->dev) would be of type struct
> ipoib_dev_priv.

Could you plese elaborate how ipoib_neigh_destructor comes to be called at all?
At what point does ipoib_neigh_setup_dev get called?

> To fix it, this patch adds a dev field to struct ipoib_neigh which is used
> instead of the struct neighbour dev one.

What I am concerned with is - if the master is not an IPoIB device,
what guarantee do we have that to_ipoib_neigh will return 0
and not part of an actual hardware address?

Without bonding, the reason is that dev points to an ipoib device,
so we know hw address is 20 bytes.

-- 
MST

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



[openib-general] [PATCH] IB/ipoib get net_device from ipoib_neigh instead of linux neighbour

2007-02-06 Thread Moni Shoua
Michael, Roland,

I'd appreciate if you take a look at this and give your comments.

The patch here refers to this thread about adding bonding 
support for IPoIB interfaces and is necessary for it to work properly.
http://openib.org/pipermail/openib-general/2007-January/031934.html

The patch here is for upstream kernel while there is a version of the patch 
for OFED as well (for kernels up to 2.6.16)
http://openib.org/pipermail/openib-general/2007-January/031935.html

thanks
- MoniS

--
IPoIB uses a two layer neighboring scheme, such that for each struct neighbour
whose device is an ipoib one, there is a struct ipoib_neigh buddy which is
created on demand at the tx flow by an ipoib_neigh_alloc(skb->dst->neighbour)
call.

When using the bonding driver, neighbours are created by the net stack on behalf
of the bonding (master) device. On the tx flow the bonding code gets an skb such
that skb->dev points to the master device, it changes this skb to point on the
slave device and calls the slave hard_start_xmit function.

Combing these two flows, there is a hole if some code at ipoib
(ipoib_neigh_destructor) assumes that for each struct neighbour it gets, n->dev
is an ipoib device so for example netdev_priv(n->dev) would be of type struct
ipoib_dev_priv.

To fix it, this patch adds a dev field to struct ipoib_neigh which is used
instead of the struct neighbour dev one.

Signed-off-by: Moni Shoua <[EMAIL PROTECTED]>
Signed-off-by: Or Gerlitz <[EMAIL PROTECTED]>
---
 ipoib.h   |4 +++-
 ipoib_main.c  |   23 +--
 ipoib_multicast.c |2 +-
 3 files changed, 17 insertions(+), 12 deletions(-)

Index: infiniband/drivers/infiniband/ulp/ipoib/ipoib.h
===
--- infiniband.orig/drivers/infiniband/ulp/ipoib/ipoib.h2007-01-22 
12:11:25.0 +0200
+++ infiniband/drivers/infiniband/ulp/ipoib/ipoib.h 2007-01-22 
12:18:06.101698456 +0200
@@ -216,6 +216,7 @@ struct ipoib_neigh {
struct sk_buff_head queue;
 
struct neighbour   *neighbour;
+   struct net_device *dev;
 
struct list_headlist;
 };
@@ -232,7 +233,8 @@ static inline struct ipoib_neigh **to_ip
 INFINIBAND_ALEN, sizeof(void *));
 }
 
-struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh);
+struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh,
+ struct net_device *dev);
 void ipoib_neigh_free(struct net_device *dev, struct ipoib_neigh *neigh);
 
 extern struct workqueue_struct *ipoib_workqueue;
Index: infiniband/drivers/infiniband/ulp/ipoib/ipoib_main.c
===
--- infiniband.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c   2007-01-22 
12:11:33.0 +0200
+++ infiniband/drivers/infiniband/ulp/ipoib/ipoib_main.c2007-01-22 
12:34:57.599156580 +0200
@@ -490,7 +490,7 @@ static void neigh_add_path(struct sk_buf
struct ipoib_path *path;
struct ipoib_neigh *neigh;
 
-   neigh = ipoib_neigh_alloc(skb->dst->neighbour);
+   neigh = ipoib_neigh_alloc(skb->dst->neighbour, skb->dev);
if (!neigh) {
++priv->stats.tx_dropped;
dev_kfree_skb_any(skb);
@@ -769,32 +769,34 @@ static void ipoib_set_mcast_list(struct 
 static void ipoib_neigh_destructor(struct neighbour *n)
 {
struct ipoib_neigh *neigh;
-   struct ipoib_dev_priv *priv = netdev_priv(n->dev);
+   struct ipoib_dev_priv *priv;
unsigned long flags;
struct ipoib_ah *ah = NULL;
 
-   ipoib_dbg(priv,
- "neigh_destructor for %06x " IPOIB_GID_FMT "\n",
- IPOIB_QPN(n->ha),
- IPOIB_GID_RAW_ARG(n->ha + 4));
-
-   spin_lock_irqsave(&priv->lock, flags);
 
neigh = *to_ipoib_neigh(n);
if (neigh) {
+   priv = netdev_priv(neigh->dev);
+   ipoib_dbg(priv,
+ "neigh_destructor for %06x " IPOIB_GID_FMT "\n",
+ IPOIB_QPN(n->ha),
+ IPOIB_GID_RAW_ARG(n->ha + 4));
+
+   spin_lock_irqsave(&priv->lock, flags);
if (neigh->ah)
ah = neigh->ah;
list_del(&neigh->list);
ipoib_neigh_free(n->dev, neigh);
+   spin_unlock_irqrestore(&priv->lock, flags);
}
 
-   spin_unlock_irqrestore(&priv->lock, flags);
 
if (ah)
ipoib_put_ah(ah);
 }
 
-struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour)
+struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour,
+ struct net_device *dev)
 {
struct ipoib_neigh *neigh;
 
@@ -803,6 +805,7 @@ struct ipoib_neigh *ipoib_neigh_alloc(st
return NULL;
 
neigh->neighbour = neighbour;