Re: [PATCH] net: take care of bonding in build_skb_flow_key (v4)
在 2016年01月22日 14:52, Jiri Pirko 写道: Fri, Jan 22, 2016 at 05:21:28AM CET, wen.gang.w...@oracle.com wrote: 在 2016年01月21日 16:35, Jiri Pirko 写道: Thu, Jan 21, 2016 at 06:32:58AM CET, wen.gang.w...@oracle.com wrote: In a bonding setting, we determines fragment size according to MTU and PMTU associated to the bonding master. If the slave finds the fragment size is too big, it drops the fragment and calls ip_rt_update_pmtu(), passing _skb_ and _pmtu_, trying to update the path MTU. Problem is that the target device that function ip_rt_update_pmtu actually tries to update is the slave (skb->dev), not the master. Thus since no PMTU change happens on master, the fragment size for later packets doesn't change so all later fragments/packets are dropped too. The fix is letting build_skb_flow_key() take care of the transition of device index from bonding slave to the master. That makes the master become the target device that ip_rt_update_pmtu tries to update PMTU to. Signed-off-by: Wengang Wang --- net/ipv4/route.c | 9 + 1 file changed, 9 insertions(+) diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 85f184e..7e766b5 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -524,10 +524,19 @@ static void build_skb_flow_key(struct flowi4 *fl4, const struct sk_buff *skb, { const struct iphdr *iph = ip_hdr(skb); int oif = skb->dev->ifindex; + struct net_device *master; u8 tos = RT_TOS(iph->tos); u8 prot = iph->protocol; u32 mark = skb->mark; + if (netif_is_bond_slave(skb->dev)) { + rcu_read_lock(); + master = netdev_master_upper_dev_get_rcu(skb->dev); + if (master) + oif = master->ifindex; + rcu_read_unlock(); + } This is certainly not correct as it should not be bond-specific but rather generic. Then what you would suggest to fix it? Note that you may have bond over bond or bridge over bond or other scenarios, which this patch ignores. I don't think bond over bond is a good configuration. Do you have a real use case for that configuration? Stacking of multiple master devices is absolutelly common. You have to go in the upper tree all the way up, for all master device types. Yep, to make code better. I can do it. thanks, wengang thanks, wengang
Re: [PATCH] net: take care of bonding in build_skb_flow_key (v4)
在 2016年01月21日 16:35, Jiri Pirko 写道: Thu, Jan 21, 2016 at 06:32:58AM CET, wen.gang.w...@oracle.com wrote: In a bonding setting, we determines fragment size according to MTU and PMTU associated to the bonding master. If the slave finds the fragment size is too big, it drops the fragment and calls ip_rt_update_pmtu(), passing _skb_ and _pmtu_, trying to update the path MTU. Problem is that the target device that function ip_rt_update_pmtu actually tries to update is the slave (skb->dev), not the master. Thus since no PMTU change happens on master, the fragment size for later packets doesn't change so all later fragments/packets are dropped too. The fix is letting build_skb_flow_key() take care of the transition of device index from bonding slave to the master. That makes the master become the target device that ip_rt_update_pmtu tries to update PMTU to. Signed-off-by: Wengang Wang --- net/ipv4/route.c | 9 + 1 file changed, 9 insertions(+) diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 85f184e..7e766b5 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -524,10 +524,19 @@ static void build_skb_flow_key(struct flowi4 *fl4, const struct sk_buff *skb, { const struct iphdr *iph = ip_hdr(skb); int oif = skb->dev->ifindex; + struct net_device *master; u8 tos = RT_TOS(iph->tos); u8 prot = iph->protocol; u32 mark = skb->mark; + if (netif_is_bond_slave(skb->dev)) { + rcu_read_lock(); + master = netdev_master_upper_dev_get_rcu(skb->dev); + if (master) + oif = master->ifindex; + rcu_read_unlock(); + } This is certainly not correct as it should not be bond-specific but rather generic. Then what you would suggest to fix it? Note that you may have bond over bond or bridge over bond or other scenarios, which this patch ignores. I don't think bond over bond is a good configuration. Do you have a real use case for that configuration? thanks, wengang
[PATCH] net: take care of bonding in build_skb_flow_key (v4)
In a bonding setting, we determines fragment size according to MTU and PMTU associated to the bonding master. If the slave finds the fragment size is too big, it drops the fragment and calls ip_rt_update_pmtu(), passing _skb_ and _pmtu_, trying to update the path MTU. Problem is that the target device that function ip_rt_update_pmtu actually tries to update is the slave (skb->dev), not the master. Thus since no PMTU change happens on master, the fragment size for later packets doesn't change so all later fragments/packets are dropped too. The fix is letting build_skb_flow_key() take care of the transition of device index from bonding slave to the master. That makes the master become the target device that ip_rt_update_pmtu tries to update PMTU to. Signed-off-by: Wengang Wang --- net/ipv4/route.c | 9 + 1 file changed, 9 insertions(+) diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 85f184e..7e766b5 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -524,10 +524,19 @@ static void build_skb_flow_key(struct flowi4 *fl4, const struct sk_buff *skb, { const struct iphdr *iph = ip_hdr(skb); int oif = skb->dev->ifindex; + struct net_device *master; u8 tos = RT_TOS(iph->tos); u8 prot = iph->protocol; u32 mark = skb->mark; + if (netif_is_bond_slave(skb->dev)) { + rcu_read_lock(); + master = netdev_master_upper_dev_get_rcu(skb->dev); + if (master) + oif = master->ifindex; + rcu_read_unlock(); + } + __build_flow_key(fl4, sk, iph, oif, tos, prot, mark, 0); } -- 2.1.0
Re: [PATCH] net: take care of bonding in build_skb_flow_key (v3)
在 2016年01月21日 12:05, Jay Vosburgh 写道: Wengang Wang wrote: [...] For ipip, yes seems update_pmtu is called in line for each call of queue_xmit. Do you know if it's a good configuration for ipip + bonding? Yes, it is. Other's comment and suggestion? I agree with Sabrina Dubroca 's suggestions from yesterday. Thank you! I will follow. thanks, wengang -J --- -Jay Vosburgh, jay.vosbu...@canonical.com
Re: [PATCH] net: take care of bonding in build_skb_flow_key (v3)
在 2016年01月20日 23:18, Sabrina Dubroca 写道: 2016-01-20, 13:32:13 +0800, Wengang Wang wrote: In a bonding setting, we determines fragment size according to MTU and PMTU associated to the bonding master. If the slave finds the fragment size is too big, it drops the fragment and calls ip_rt_update_pmtu(), passing _skb_ and _pmtu_, trying to update the path MTU. Problem is that the target device that function ip_rt_update_pmtu actually tries to update is the slave (skb->dev), not the master. Thus since no PMTU change happens on master, the fragment size for later packets doesn't change so all later fragments/packets are dropped too. The fix is letting build_skb_flow_key() take care of the transition of device index from bonding slave to the master. That makes the master become the target device that ip_rt_update_pmtu tries to update PMTU to. Signed-off-by: Wengang Wang --- net/ipv4/route.c | 13 - 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 85f184e..c59fb0d 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -523,10 +523,21 @@ static void build_skb_flow_key(struct flowi4 *fl4, const struct sk_buff *skb, const struct sock *sk) { const struct iphdr *iph = ip_hdr(skb); - int oif = skb->dev->ifindex; + struct net_device *master = NULL; u8 tos = RT_TOS(iph->tos); u8 prot = iph->protocol; u32 mark = skb->mark; + int oif; + + if (skb->dev->flags & IFF_SLAVE) { Maybe use netif_is_bond_slave here instead, since you have this problem with bonding slaves? + rtnl_lock(); + master = netdev_master_upper_dev_get(skb->dev); + rtnl_unlock(); + } As zhuyj said, this is called from dev_queue_xmit, so you cannot take rtnl_lock here. + if (master) + oif = master->ifindex; You cannot dereference master after you release the rtnl lock. So it would probably be best to use netdev_master_upper_dev_get_rcu, as zhuyj suggested earlier, and make sure that you only use the result between rcu_read_lock()/rcu_read_unlock(): rcu_read_lock(); master = netdev_master_upper_dev_get_rcu(skb->dev); if (master) oif = master->ifindex; rcu_read_unlock(); OK, thanks for advising. thanks, wengang Thanks,
Re: [PATCH] net: take care of bonding in build_skb_flow_key (v3)
在 2016年01月20日 17:56, zhuyj 写道: On 01/20/2016 05:47 PM, Wengang Wang wrote: 在 2016年01月20日 15:54, zhuyj 写道: On 01/20/2016 03:38 PM, Wengang Wang wrote: 在 2016年01月20日 14:24, zhuyj 写道: On 01/20/2016 01:32 PM, Wengang Wang wrote: In a bonding setting, we determines fragment size according to MTU and PMTU associated to the bonding master. If the slave finds the fragment size is too big, it drops the fragment and calls ip_rt_update_pmtu(), passing _skb_ and _pmtu_, trying to update the path MTU. Problem is that the target device that function ip_rt_update_pmtu actually tries to update is the slave (skb->dev), not the master. Thus since no PMTU change happens on master, the fragment size for later packets doesn't change so all later fragments/packets are dropped too. The fix is letting build_skb_flow_key() take care of the transition of device index from bonding slave to the master. That makes the master become the target device that ip_rt_update_pmtu tries to update PMTU to. Signed-off-by: Wengang Wang --- net/ipv4/route.c | 13 - 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 85f184e..c59fb0d 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -523,10 +523,21 @@ static void build_skb_flow_key(struct flowi4 *fl4, const struct sk_buff *skb, const struct sock *sk) { const struct iphdr *iph = ip_hdr(skb); -int oif = skb->dev->ifindex; +struct net_device *master = NULL; u8 tos = RT_TOS(iph->tos); u8 prot = iph->protocol; u32 mark = skb->mark; +int oif; + +if (skb->dev->flags & IFF_SLAVE) { +rtnl_lock(); +master = netdev_master_upper_dev_get(skb->dev); +rtnl_unlock(); update_pmtu is called very frequently. Is it appropriate to use rtnl_lock here? By "very frequently", how frequently it is expected? And what situation can cause that? For my case, the update_pmtu is called only once. ip_tunnel_xmit Can you please explain with more details? dev_queue_xmit->ipip_tunnel_xmit->ip_tunnel_xmit->tnl_update_pmtu-> skb_dst(skb)->ops->update_pmtu For ipip, yes seems update_pmtu is called in line for each call of queue_xmit. Do you know if it's a good configuration for ipip + bonding? Other's comment and suggestion? thanks, wengang
Re: [PATCH] net: take care of bonding in build_skb_flow_key
Please see if the V2 is clear. thanks, wengang 在 2016年01月06日 16:18, zhuyj 写道: On 01/06/2016 04:14 PM, Wengang Wang wrote: 在 2016年01月06日 16:00, zhuyj 写道: On 01/06/2016 03:45 PM, Wengang Wang wrote: 在 2016年01月06日 15:35, zhuyj 写道: IMHO, "The path MTU is set to the active slave device, not the bonding master." Can we set PMTU to bonding master when path MTU is set to the active slave device? Actually the route is set on bonding master, not on any slave, the trying to set PMTU to the active slave failed(it didn't found a route on slave). In your commit log: " A problem is found that we are looking for route basing a bonding device and deal with path MTU there: The path MTU is set to the active slave device, not the bonding master. " and "the trying to set PMTU to the active slave failed" I am confused. Maybe changing "The path MTU is set to the active slave device, not the bonding master" to "The path MTU is tried to set to the active slave device, not to the bonding master" is better. Maybe you should explain your problem clearly. Zhu Yanjun It tried to change the PMTU to the slave. Whether the setting can succeed depends: if the route is there(on slave), it goes successfully; if not no route found, it goes unsuccessfully. For the no-route case, your suggestion breaks. thanks, wengang Zhu Yanjun So "set PMTU to bonding master when path MTU is set to the active slave device" is lacking a base. thanks, wengang If not appropriate, you can ignore it. Best Regards! Zhu Yanjun On 01/06/2016 03:06 PM, Wengang Wang wrote: Hi Yanjun, Thanks for your review. Master MTU is same as that for slaves. Maybe fixing in bonding driver is a good idea, but I don't find a good place to do that. Let's go through the simplified follow: ... 1) Fragmentation. --This is is done is against the bonding master device(device MTU and path MTU) 2) bond_start_xmit 3) ipoib_start_xmit(slaves are IPoIB interfaces) For the first send 1) fragment size is 7000(in my case) 2) bond_start_xmit its self is fine 3) ipoib_start_xmit sees the packet size 7000 is larger than the internal limit 2044, drops the packet and try to update PMTU. without the patch, it tried update PMTU on slave device(no changes to master). the seconds send comes, since no changes happen on bonding master(PMTU), the fragment size is still 7000 and the behavior is just the same as the first send. With the patch, the bonding master PMTU is changed to 2044 after the first send(hopefully), for the seconds send the fragment size is set to 2044. To fix in bonding code, I don't find where we can. thanks, wengang 在 2016年01月06日 14:19, zhuyj 写道: IMHO, this should fix in bonding driver because the active slave mtu should be the same with the master. bonding master's mtu is changed to path MTU, then slave dev's MTU should be changed, too. Zhu Yanjun On 01/06/2016 01:49 PM, Wengang Wang wrote: A problem is found that we are looking for route basing a bonding device and deal with path MTU there: The path MTU is set to the active slave device, not the bonding master. The patch tries to fix the issue by letting build_skb_flow_key() take care of the transition of device index from bonding slave to the master. Signed-off-by: Wengang Wang --- net/ipv4/route.c | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 85f184e..3053f10 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -523,11 +523,20 @@ static void build_skb_flow_key(struct flowi4 *fl4, const struct sk_buff *skb, const struct sock *sk) { const struct iphdr *iph = ip_hdr(skb); -int oif = skb->dev->ifindex; +int oif; +struct net_device *master = NULL; + u8 tos = RT_TOS(iph->tos); u8 prot = iph->protocol; u32 mark = skb->mark; +if (skb->dev->flags & IFF_SLAVE) +master = netdev_master_upper_dev_get(skb->dev); +if (master) +oif = master->ifindex; +else +oif = skb->dev->ifindex; + __build_flow_key(fl4, sk, iph, oif, tos, prot, mark, 0); } -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] net: take care of bonding in build_skb_flow_key (v2)
In a bonding setting, we determines fragment size according to MTU and PMTU associated to the bonding master. If the slave finds the fragment size is too big, it drops the fragment and calls ip_rt_update_pmtu(), passing _skb_ and _pmtu_, trying to update the path MTU. Problem is that the target device that function ip_rt_update_pmtu actually tries to update is the slave (skb->dev), not the master. Thus since no PMTU change happens on master, the fragment size for later packets doesn't change so all later fragments/packets are dropped too. The fix is letting build_skb_flow_key() take care of the transition of device index from bonding slave to the master. That makes the master become the target device that ip_rt_update_pmtu tries to update PMTU to. Signed-off-by: Wengang Wang --- net/ipv4/route.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 85f184e..fffc7e6 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -523,10 +523,18 @@ static void build_skb_flow_key(struct flowi4 *fl4, const struct sk_buff *skb, const struct sock *sk) { const struct iphdr *iph = ip_hdr(skb); - int oif = skb->dev->ifindex; u8 tos = RT_TOS(iph->tos); + struct net_device *master; u8 prot = iph->protocol; u32 mark = skb->mark; + int oif; + + if (skb->dev->flags & IFF_SLAVE) { + master = netdev_master_upper_dev_get(skb->dev); + oif = master->ifindex; + } else { + oif = skb->dev->ifindex; + } __build_flow_key(fl4, sk, iph, oif, tos, prot, mark, 0); } -- 2.1.0 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] net: take care of bonding in build_skb_flow_key
在 2016年01月06日 16:00, zhuyj 写道: On 01/06/2016 03:45 PM, Wengang Wang wrote: 在 2016年01月06日 15:35, zhuyj 写道: IMHO, "The path MTU is set to the active slave device, not the bonding master." Can we set PMTU to bonding master when path MTU is set to the active slave device? Actually the route is set on bonding master, not on any slave, the trying to set PMTU to the active slave failed(it didn't found a route on slave). In your commit log: " A problem is found that we are looking for route basing a bonding device and deal with path MTU there: The path MTU is set to the active slave device, not the bonding master. " and "the trying to set PMTU to the active slave failed" I am confused. Maybe changing "The path MTU is set to the active slave device, not the bonding master" to "The path MTU is tried to set to the active slave device, not to the bonding master" is better. It tried to change the PMTU to the slave. Whether the setting can succeed depends: if the route is there(on slave), it goes successfully; if not no route found, it goes unsuccessfully. For the no-route case, your suggestion breaks. thanks, wengang Zhu Yanjun So "set PMTU to bonding master when path MTU is set to the active slave device" is lacking a base. thanks, wengang If not appropriate, you can ignore it. Best Regards! Zhu Yanjun On 01/06/2016 03:06 PM, Wengang Wang wrote: Hi Yanjun, Thanks for your review. Master MTU is same as that for slaves. Maybe fixing in bonding driver is a good idea, but I don't find a good place to do that. Let's go through the simplified follow: ... 1) Fragmentation. --This is is done is against the bonding master device(device MTU and path MTU) 2) bond_start_xmit 3) ipoib_start_xmit(slaves are IPoIB interfaces) For the first send 1) fragment size is 7000(in my case) 2) bond_start_xmit its self is fine 3) ipoib_start_xmit sees the packet size 7000 is larger than the internal limit 2044, drops the packet and try to update PMTU. without the patch, it tried update PMTU on slave device(no changes to master). the seconds send comes, since no changes happen on bonding master(PMTU), the fragment size is still 7000 and the behavior is just the same as the first send. With the patch, the bonding master PMTU is changed to 2044 after the first send(hopefully), for the seconds send the fragment size is set to 2044. To fix in bonding code, I don't find where we can. thanks, wengang 在 2016年01月06日 14:19, zhuyj 写道: IMHO, this should fix in bonding driver because the active slave mtu should be the same with the master. bonding master's mtu is changed to path MTU, then slave dev's MTU should be changed, too. Zhu Yanjun On 01/06/2016 01:49 PM, Wengang Wang wrote: A problem is found that we are looking for route basing a bonding device and deal with path MTU there: The path MTU is set to the active slave device, not the bonding master. The patch tries to fix the issue by letting build_skb_flow_key() take care of the transition of device index from bonding slave to the master. Signed-off-by: Wengang Wang --- net/ipv4/route.c | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 85f184e..3053f10 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -523,11 +523,20 @@ static void build_skb_flow_key(struct flowi4 *fl4, const struct sk_buff *skb, const struct sock *sk) { const struct iphdr *iph = ip_hdr(skb); -int oif = skb->dev->ifindex; +int oif; +struct net_device *master = NULL; + u8 tos = RT_TOS(iph->tos); u8 prot = iph->protocol; u32 mark = skb->mark; +if (skb->dev->flags & IFF_SLAVE) +master = netdev_master_upper_dev_get(skb->dev); +if (master) +oif = master->ifindex; +else +oif = skb->dev->ifindex; + __build_flow_key(fl4, sk, iph, oif, tos, prot, mark, 0); } -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] net: take care of bonding in build_skb_flow_key
在 2016年01月06日 15:35, zhuyj 写道: IMHO, "The path MTU is set to the active slave device, not the bonding master." Can we set PMTU to bonding master when path MTU is set to the active slave device? Actually the route is set on bonding master, not on any slave, the trying to set PMTU to the active slave failed(it didn't found a route on slave). So "set PMTU to bonding master when path MTU is set to the active slave device" is lacking a base. thanks, wengang If not appropriate, you can ignore it. Best Regards! Zhu Yanjun On 01/06/2016 03:06 PM, Wengang Wang wrote: Hi Yanjun, Thanks for your review. Master MTU is same as that for slaves. Maybe fixing in bonding driver is a good idea, but I don't find a good place to do that. Let's go through the simplified follow: ... 1) Fragmentation. --This is is done is against the bonding master device(device MTU and path MTU) 2) bond_start_xmit 3) ipoib_start_xmit(slaves are IPoIB interfaces) For the first send 1) fragment size is 7000(in my case) 2) bond_start_xmit its self is fine 3) ipoib_start_xmit sees the packet size 7000 is larger than the internal limit 2044, drops the packet and try to update PMTU. without the patch, it tried update PMTU on slave device(no changes to master). the seconds send comes, since no changes happen on bonding master(PMTU), the fragment size is still 7000 and the behavior is just the same as the first send. With the patch, the bonding master PMTU is changed to 2044 after the first send(hopefully), for the seconds send the fragment size is set to 2044. To fix in bonding code, I don't find where we can. thanks, wengang 在 2016年01月06日 14:19, zhuyj 写道: IMHO, this should fix in bonding driver because the active slave mtu should be the same with the master. bonding master's mtu is changed to path MTU, then slave dev's MTU should be changed, too. Zhu Yanjun On 01/06/2016 01:49 PM, Wengang Wang wrote: A problem is found that we are looking for route basing a bonding device and deal with path MTU there: The path MTU is set to the active slave device, not the bonding master. The patch tries to fix the issue by letting build_skb_flow_key() take care of the transition of device index from bonding slave to the master. Signed-off-by: Wengang Wang --- net/ipv4/route.c | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 85f184e..3053f10 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -523,11 +523,20 @@ static void build_skb_flow_key(struct flowi4 *fl4, const struct sk_buff *skb, const struct sock *sk) { const struct iphdr *iph = ip_hdr(skb); -int oif = skb->dev->ifindex; +int oif; +struct net_device *master = NULL; + u8 tos = RT_TOS(iph->tos); u8 prot = iph->protocol; u32 mark = skb->mark; +if (skb->dev->flags & IFF_SLAVE) +master = netdev_master_upper_dev_get(skb->dev); +if (master) +oif = master->ifindex; +else +oif = skb->dev->ifindex; + __build_flow_key(fl4, sk, iph, oif, tos, prot, mark, 0); } -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] net: take care of bonding in build_skb_flow_key
Hmm, we are not changing device MTU but PMTU... thanks, wengang 在 2016年01月06日 14:44, zhuyj 写道: IMHO, the following comments will help us all. case NETDEV_CHANGEMTU: /* TODO: Should slaves be allowed to * independently alter their MTU? For * an active-backup bond, slaves need * not be the same type of device, so * MTUs may vary. For other modes, * slaves arguably should have the * same MTUs. To do this, we'd need to * take over the slave's change_mtu * function for the duration of their * servitude. */ break; Best Regards! Zhu Yanjun On 01/06/2016 02:32 PM, Wengang Wang wrote: 在 2016年01月06日 14:18, David Miller 写道: From: Wengang Wang Date: Wed, 6 Jan 2016 13:49:28 +0800 @@ -523,11 +523,20 @@ static void build_skb_flow_key(struct flowi4 *fl4, const struct sk_buff *skb, const struct sock *sk) { const struct iphdr *iph = ip_hdr(skb); -int oif = skb->dev->ifindex; +int oif; +struct net_device *master = NULL; + u8 tos = RT_TOS(iph->tos); u8 prot = iph->protocol; u32 mark = skb->mark; Please fix the stlye of these variable declarations: 1) Order them from longest line to shortest line, also known as "reverse christmas tree" layout. 2) Do not add an empty line in the middle of these variable declarations. OK, will do in second drop. thanks, wengang -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] net: take care of bonding in build_skb_flow_key
Hi Yanjun, Thanks for your review. Master MTU is same as that for slaves. Maybe fixing in bonding driver is a good idea, but I don't find a good place to do that. Let's go through the simplified follow: ... 1) Fragmentation. --This is is done is against the bonding master device(device MTU and path MTU) 2) bond_start_xmit 3) ipoib_start_xmit(slaves are IPoIB interfaces) For the first send 1) fragment size is 7000(in my case) 2) bond_start_xmit its self is fine 3) ipoib_start_xmit sees the packet size 7000 is larger than the internal limit 2044, drops the packet and try to update PMTU. without the patch, it tried update PMTU on slave device(no changes to master). the seconds send comes, since no changes happen on bonding master(PMTU), the fragment size is still 7000 and the behavior is just the same as the first send. With the patch, the bonding master PMTU is changed to 2044 after the first send(hopefully), for the seconds send the fragment size is set to 2044. To fix in bonding code, I don't find where we can. thanks, wengang 在 2016年01月06日 14:19, zhuyj 写道: IMHO, this should fix in bonding driver because the active slave mtu should be the same with the master. bonding master's mtu is changed to path MTU, then slave dev's MTU should be changed, too. Zhu Yanjun On 01/06/2016 01:49 PM, Wengang Wang wrote: A problem is found that we are looking for route basing a bonding device and deal with path MTU there: The path MTU is set to the active slave device, not the bonding master. The patch tries to fix the issue by letting build_skb_flow_key() take care of the transition of device index from bonding slave to the master. Signed-off-by: Wengang Wang --- net/ipv4/route.c | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 85f184e..3053f10 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -523,11 +523,20 @@ static void build_skb_flow_key(struct flowi4 *fl4, const struct sk_buff *skb, const struct sock *sk) { const struct iphdr *iph = ip_hdr(skb); -int oif = skb->dev->ifindex; +int oif; +struct net_device *master = NULL; + u8 tos = RT_TOS(iph->tos); u8 prot = iph->protocol; u32 mark = skb->mark; +if (skb->dev->flags & IFF_SLAVE) +master = netdev_master_upper_dev_get(skb->dev); +if (master) +oif = master->ifindex; +else +oif = skb->dev->ifindex; + __build_flow_key(fl4, sk, iph, oif, tos, prot, mark, 0); } -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] net: take care of bonding in build_skb_flow_key
在 2016年01月06日 14:18, David Miller 写道: From: Wengang Wang Date: Wed, 6 Jan 2016 13:49:28 +0800 @@ -523,11 +523,20 @@ static void build_skb_flow_key(struct flowi4 *fl4, const struct sk_buff *skb, const struct sock *sk) { const struct iphdr *iph = ip_hdr(skb); - int oif = skb->dev->ifindex; + int oif; + struct net_device *master = NULL; + u8 tos = RT_TOS(iph->tos); u8 prot = iph->protocol; u32 mark = skb->mark; Please fix the stlye of these variable declarations: 1) Order them from longest line to shortest line, also known as "reverse christmas tree" layout. 2) Do not add an empty line in the middle of these variable declarations. OK, will do in second drop. thanks, wengang -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] net: take care of bonding in build_skb_flow_key
A problem is found that we are looking for route basing a bonding device and deal with path MTU there: The path MTU is set to the active slave device, not the bonding master. The patch tries to fix the issue by letting build_skb_flow_key() take care of the transition of device index from bonding slave to the master. Signed-off-by: Wengang Wang --- net/ipv4/route.c | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 85f184e..3053f10 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -523,11 +523,20 @@ static void build_skb_flow_key(struct flowi4 *fl4, const struct sk_buff *skb, const struct sock *sk) { const struct iphdr *iph = ip_hdr(skb); - int oif = skb->dev->ifindex; + int oif; + struct net_device *master = NULL; + u8 tos = RT_TOS(iph->tos); u8 prot = iph->protocol; u32 mark = skb->mark; + if (skb->dev->flags & IFF_SLAVE) + master = netdev_master_upper_dev_get(skb->dev); + if (master) + oif = master->ifindex; + else + oif = skb->dev->ifindex; + __build_flow_key(fl4, sk, iph, oif, tos, prot, mark, 0); } -- 2.1.0 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] ip: find correct route for socket which is not bound (v2)
Hi, Any comment on this patch? thanks, wengang 在 2015年09月25日 09:52, Wengang Wang 写道: This is the v2, comparing the v1, the changes is: * for loopback outbound device, it continue skipping cached route; for others, it goes through the cached route. For multicast, we should find valid route(thus get the meaniful pmtu) for the packet on the socket which is not bound to a device(sk_bound_dev_if being 0) too. From man page of socket(7) SO_BINDTODEVICE Bind this socket to a particular device like “eth0”, as specified in the passed interface name. If the name is an empty string or the option length is zero, the socket device binding is removed. The passed option is a variable-length null-terminated interface name string with the maximum size of IFNAMSIZ. If a socket is bound to an interface, only packets received from that particular interface are processed by the socket. Note that this works only for some socket types, particularly AF_INET sockets. It is not supported for packet sockets (use normal bind(2) there). The man page doesn't say when socket not bound packets won't be routed. A problem is hit that all multicast packets dropped by kernel(from sender host). The lower layer is IPoIB with MTU being 7000. And I was sending 4096 length multicast packets. Inside IPoIB the first send is dropped because is exeeding the internal packet size limitation mcast_mtu which is 2044. So IPoIB calls ip_rt_update_pmtu (indirectly) trying to set path mtu. A correct route is configured for the multicast, so the setting of pmtu cucceeded and the next multicast packet(to the same target) is expected to succeed(it would be well fragmented accroding to the pmtu I just set). But actually the second and later multicast packets got dropped too. And the reason is that the neighor looking up(fib_lookup) is skipped because of the socket is not bound to device(sk_bound_dev_if being 0). After applied the patch I proposed here, it works fine. Signed-off-by: Wengang Wang --- net/ipv4/route.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 5f4a556..c0534c2 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -2097,7 +2097,10 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4) */ fl4->flowi4_oif = dev_out->ifindex; - goto make_route; + if (dev_out->flags & IFF_LOOPBACK) + goto make_route; + else + goto lookup; } if (!(fl4->flowi4_flags & FLOWI_FLAG_ANYSRC)) { @@ -2153,6 +2156,7 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4) goto make_route; } +lookup: if (fib_lookup(net, fl4, &res, 0)) { res.fi = NULL; res.table = NULL; -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] ip: find correct route for socket which is not bound (v2)
This is the v2, comparing the v1, the changes is: * for loopback outbound device, it continue skipping cached route; for others, it goes through the cached route. For multicast, we should find valid route(thus get the meaniful pmtu) for the packet on the socket which is not bound to a device(sk_bound_dev_if being 0) too. >From man page of socket(7) SO_BINDTODEVICE Bind this socket to a particular device like “eth0”, as specified in the passed interface name. If the name is an empty string or the option length is zero, the socket device binding is removed. The passed option is a variable-length null-terminated interface name string with the maximum size of IFNAMSIZ. If a socket is bound to an interface, only packets received from that particular interface are processed by the socket. Note that this works only for some socket types, particularly AF_INET sockets. It is not supported for packet sockets (use normal bind(2) there). The man page doesn't say when socket not bound packets won't be routed. A problem is hit that all multicast packets dropped by kernel(from sender host). The lower layer is IPoIB with MTU being 7000. And I was sending 4096 length multicast packets. Inside IPoIB the first send is dropped because is exeeding the internal packet size limitation mcast_mtu which is 2044. So IPoIB calls ip_rt_update_pmtu (indirectly) trying to set path mtu. A correct route is configured for the multicast, so the setting of pmtu cucceeded and the next multicast packet(to the same target) is expected to succeed(it would be well fragmented accroding to the pmtu I just set). But actually the second and later multicast packets got dropped too. And the reason is that the neighor looking up(fib_lookup) is skipped because of the socket is not bound to device(sk_bound_dev_if being 0). After applied the patch I proposed here, it works fine. Signed-off-by: Wengang Wang --- net/ipv4/route.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 5f4a556..c0534c2 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -2097,7 +2097,10 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4) */ fl4->flowi4_oif = dev_out->ifindex; - goto make_route; + if (dev_out->flags & IFF_LOOPBACK) + goto make_route; + else + goto lookup; } if (!(fl4->flowi4_flags & FLOWI_FLAG_ANYSRC)) { @@ -2153,6 +2156,7 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4) goto make_route; } +lookup: if (fib_lookup(net, fl4, &res, 0)) { res.fi = NULL; res.table = NULL; -- 2.1.0 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] ip: find correct route for socket which is not bound (v2)
Hi David, 在 2015年09月25日 05:22, David Miller 写道: From: Wengang Wang Date: Mon, 21 Sep 2015 16:00:09 +0800 This is the v2, comparing the v1, the changes is: * for loopback outbound device, it continue skipping cached route; for others, it goes through the cached route. There are many things I want you to clean up in your commit message. For multi-cast, we should find valid route(thus get the meaniful pmtu) for the package on the socket which is not bound to a device(sk_bound_dev_if being 0) too. Please use the term "packet" rather than "package". Gifts you receive from a friend might be in a "package", but data is sent onto a network inside of "packets". A problem is hit that all multi-cast packages dropped by kernel(from sender "multicast" is a single word. host). The lower layer is IPoIB with MTU being 7000. And I was sending 4096 length multi-cast package. In side IPoIB the first send is dropped because "Inside" is one word. Accepted, sorry for my bad English. I will re-post as you suggested. thanks, wengang -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] ip: find correct route for socket which is not bound (v2)
This is the v2, comparing the v1, the changes is: * for loopback outbound device, it continue skipping cached route; for others, it goes through the cached route. For multi-cast, we should find valid route(thus get the meaniful pmtu) for the package on the socket which is not bound to a device(sk_bound_dev_if being 0) too. >From man page of socket(7) SO_BINDTODEVICE Bind this socket to a particular device like “eth0”, as specified in the passed interface name. If the name is an empty string or the option length is zero, the socket device binding is removed. The passed option is a variable-length null-terminated interface name string with the maximum size of IFNAMSIZ. If a socket is bound to an interface, only packets received from that particular interface are processed by the socket. Note that this works only for some socket types, particularly AF_INET sockets. It is not supported for packet sockets (use normal bind(2) there). The man page doesn't say when socket not bound packages won't be routed. A problem is hit that all multi-cast packages dropped by kernel(from sender host). The lower layer is IPoIB with MTU being 7000. And I was sending 4096 length multi-cast package. In side IPoIB the first send is dropped because is exeeding the internal package size limitation mcast_mtu which is 2044. So IPoIB calls ip_rt_update_pmtu (indirectly) trying to set path mtu. A correct route is configured for the multi-cast, so the setting of pmtu cucceeded and the next multi-cast package(to the same target) is expected to succeed(it would be well fragmented accroding to the pmtu I just set). But actually the second and later multi-cast packages got dropped too. And the reason is that the neighor looking up(fib_lookup) is skipped because of the socket is not bound to device(sk_bound_dev_if being 0). After applied the patch I proposed here, it works fine. Signed-off-by: Wengang Wang --- net/ipv4/route.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 5f4a556..c0534c2 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -2097,7 +2097,10 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4) */ fl4->flowi4_oif = dev_out->ifindex; - goto make_route; + if (dev_out->flags & IFF_LOOPBACK) + goto make_route; + else + goto lookup; } if (!(fl4->flowi4_flags & FLOWI_FLAG_ANYSRC)) { @@ -2153,6 +2156,7 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4) goto make_route; } +lookup: if (fib_lookup(net, fl4, &res, 0)) { res.fi = NULL; res.table = NULL; -- 2.1.0 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] ip: find correct route for socket which is not bound to a device
Hi David, Thanks for your review on this. 在 2015年09月18日 12:20, David Miller 写道: From: Wengang Wang Date: Wed, 16 Sep 2015 14:34:15 +0800 For multi-cast, we should find valid route(thus get the meaniful pmtu) for the package on the socket which is not bound to a device(sk_bound_dev_if being 0) too. Your patch breaks exactly the situation explained in full detail in the huge comment about the first change you are making: /* Special hack: user can direct multicasts and limited broadcast via necessary interface without fiddling with IP_MULTICAST_IF or IP_PKTINFO. This hack is not just for fun, it allows vic,vat and friends to work. They bind socket to loopback, set ttl to zero and expect that it will work. From the viewpoint of routing cache they are broken, because we are not allowed to build multicast path with loopback source addr (look, routing cache cannot know, that ttl is zero, so that packet will not leave this host and route is valid). Luckily, this hack is good workaround. */ That situation will now fail after your patch. Yes, I noticed the above comment before I made the patch. I have question regarding that comment: Seems it's talking about the loopback source address(loopback device interface is with the index of 1). Do you think we can make something specific to loopback device and let the package with other outbound devices go through the routing cache? So I cannot apply this patch, sorry. I know what you want, you want to end up with a cached route that will track the PMTU. Yes, that's exactly what I want. Actually the user space can work fine by adding a syscall setsockopt(..., SO_BINDTODEVICE, ..) to bind socket to the device. Well, the problem here is that there are a lot of user space applications which don't bind sockets and they are working fine with old kernel(2.6.39). When moving to new kernel(4.x), the problem appeared. And seems it's hard for force all of them change to bind sockets and recompile. -- that's the pain. Do you have any idea which won't break the existing hack but can help with my case? thanks, wengang -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] ip: find correct route for socket which is not bound to a device
For multi-cast, we should find valid route(thus get the meaniful pmtu) for the package on the socket which is not bound to a device(sk_bound_dev_if being 0) too. >From man page of socket(7) SO_BINDTODEVICE Bind this socket to a particular device like “eth0”, as specified in the passed interface name. If the name is an empty string or the option length is zero, the socket device binding is removed. The passed option is a variable-length null-terminated interface name string with the maximum size of IFNAMSIZ. If a socket is bound to an interface, only packets received from that particular interface are processed by the socket. Note that this works only for some socket types, particularly AF_INET sockets. It is not supported for packet sockets (use normal bind(2) there). The man page doesn't say when socket not bound packages won't be routed. A problem is hit that all multi-cast packages dropped by kernel(from sender host). The lower layer is IPoIB with MTU being 7000. And I was sending 4096 length multi-cast package. In side IPoIB the first send is dropped because is exeeding the internal package size limitation mcast_mtu which is 2044. So IPoIB calls ip_rt_update_pmtu (indirectly) trying to set path mtu. A correct route is configured for the multi-cast, so the setting of pmtu cucceeded and the next multi-cast package(to the same target) is expected to succeed(it would be well fragmented accroding to the pmtu I just set). But actually the second and later multi-cast packages got dropped too. And the reason is that the neighor looking up(fib_lookup) is skipped because of the socket is not bound to device(sk_bound_dev_if being 0). After applied the patch I proposed here, it works fine. Signed-off-by: Wengang Wang --- net/ipv4/route.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 5f4a556..032481a 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -2097,7 +2097,7 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4) */ fl4->flowi4_oif = dev_out->ifindex; - goto make_route; + goto lookup; } if (!(fl4->flowi4_flags & FLOWI_FLAG_ANYSRC)) { @@ -2153,6 +2153,7 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4) goto make_route; } +lookup: if (fib_lookup(net, fl4, &res, 0)) { res.fi = NULL; res.table = NULL; -- 2.1.0 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html