Re: Minimum MTU Mess
On Mon, Sep 12, 2016 at 11:59:41AM +0900, YOSHIFUJI Hideaki wrote: > > > Jarod Wilson wrote: > > On Tue, Sep 06, 2016 at 04:55:29PM -0700, David Miller wrote: > >> From: Jarod Wilson> >> Date: Fri, 2 Sep 2016 13:07:42 -0400 > >> > >>> In any case, the number of "mtu < 68" and "#define FOO_MIN_MTU 68", or > >>> variations thereof, under drivers/net/ is kind of crazy. > >> > >> Agreed, we can have a default and let the different cases provide > >> overrides. > >> > >> Mostly what to do here is a function of the hardware though. > > > > So I've been tinkering with this some, and it looks like having both > > centralized min and max checking could be useful here. I'm hacking away at > > drivers now, but the basis of all this would potentially look about like > > the patch below, and each device would have to set dev->m{in,ax}_mtu one > > way or another. Drivers using alloc_etherdev and/or ether_setup would get > > the "default" values, and then they can be overridden. Probably need > > something to make sure dev->max_mtu isn't set to 0 though... > > > > Possibly on the right track here, or might there be a better way to > > approach this? > > > > diff --git a/include/uapi/linux/if_ether.h b/include/uapi/linux/if_ether.h > > index 117d02e..864d6f2 100644 > > --- a/include/uapi/linux/if_ether.h > > +++ b/include/uapi/linux/if_ether.h > > @@ -35,6 +35,8 @@ > > #define ETH_FRAME_LEN 1514/* Max. octets in frame sans > > FCS */ > > #define ETH_FCS_LEN4 /* Octets in the FCS > > */ > > > > +#define ETH_MIN_MTU68 /* Min IPv4 MTU per RFC791 > > */ > > + > > /* > > * These are the defined Ethernet Protocol ID's. > > */ > > Why don't we disable IPv4 if the MTU is lower than this value > as we do for IPv6? What will you be left with that is actually usable? Quite a few NIC drivers already enforce this as a minimum MTU, and for drivers that really want to allow less, they just set min_mtu to whatever they like. I'm actually aiming to be 100% functionally identical wrt all existing minimum mtu checks already in existence, just trying to improve how they're done. -- Jarod Wilson ja...@redhat.com
Re: Minimum MTU Mess
On Mon, Sep 12, 2016 at 04:41:40AM +0200, Andrew Lunn wrote: > > Actually breaking this up into easily digestable/mergeable chunks is going > > to be kind of entertaining... Suggestions welcomed on that. First up is > > obviously the core change, which touches just net/ethernet/eth.c, > > net/core/dev.c, include/linux/netdevice.h and > > include/uapi/linux/if_ether.h, and should let existing code continue to > > Just Work(tm), though devices using ether_setup() that had no MTU range > > checking (or one or the other missing) will wind up with new bounds. > > Hi Jarod > > Did you find any drivers which support jumbo packets, but don't have > checks? These drivers, if there are any, need handling first, before > this core change is made. Otherwise you introduce regressions. Surprisingly, very few. There was dvb_net using eth_change_mtu and it's max of 1500 while setting it's initial mtu to 4096, and I swear there was at least one or two drivers that had no upper bounds checking at all that I set max_mtu to IP_MAX_MTU (65535), but it's possible I missed something. At the moment, things are roughly chopped up into: 1) core change 2) deprecate eth_change_mtu and remove in-kernel users 3) set m{in,ax}_mtu in ethernet drivers 4) set m{in,ax}_mtu in wireless drivers 5) set m{in,ax}_mtu in wan drivers 6) set m{in,ax}_mtu in usb ethernet drivers 7) set m{in,ax}_mtu in net infra 8) set m{in,ax}_mtu in virt drivers 9) set m{in,ax}_mtu in misc drivers The ethernet drivers one is by far the largest, and was thinking I'd start breaking that up next. -- Jarod Wilson ja...@redhat.com
Re: Minimum MTU Mess
Jarod Wilson wrote: > On Tue, Sep 06, 2016 at 04:55:29PM -0700, David Miller wrote: >> From: Jarod Wilson>> Date: Fri, 2 Sep 2016 13:07:42 -0400 >> >>> In any case, the number of "mtu < 68" and "#define FOO_MIN_MTU 68", or >>> variations thereof, under drivers/net/ is kind of crazy. >> >> Agreed, we can have a default and let the different cases provide >> overrides. >> >> Mostly what to do here is a function of the hardware though. > > So I've been tinkering with this some, and it looks like having both > centralized min and max checking could be useful here. I'm hacking away at > drivers now, but the basis of all this would potentially look about like > the patch below, and each device would have to set dev->m{in,ax}_mtu one > way or another. Drivers using alloc_etherdev and/or ether_setup would get > the "default" values, and then they can be overridden. Probably need > something to make sure dev->max_mtu isn't set to 0 though... > > Possibly on the right track here, or might there be a better way to > approach this? > > diff --git a/include/uapi/linux/if_ether.h b/include/uapi/linux/if_ether.h > index 117d02e..864d6f2 100644 > --- a/include/uapi/linux/if_ether.h > +++ b/include/uapi/linux/if_ether.h > @@ -35,6 +35,8 @@ > #define ETH_FRAME_LEN1514/* Max. octets in frame sans > FCS */ > #define ETH_FCS_LEN 4 /* Octets in the FCS */ > > +#define ETH_MIN_MTU 68 /* Min IPv4 MTU per RFC791 */ > + > /* > * These are the defined Ethernet Protocol ID's. > */ Why don't we disable IPv4 if the MTU is lower than this value as we do for IPv6? -- Hideaki Yoshifuji Technical Division, MIRACLE LINUX CORPORATION
Re: Minimum MTU Mess
> Actually breaking this up into easily digestable/mergeable chunks is going > to be kind of entertaining... Suggestions welcomed on that. First up is > obviously the core change, which touches just net/ethernet/eth.c, > net/core/dev.c, include/linux/netdevice.h and > include/uapi/linux/if_ether.h, and should let existing code continue to > Just Work(tm), though devices using ether_setup() that had no MTU range > checking (or one or the other missing) will wind up with new bounds. Hi Jarod Did you find any drivers which support jumbo packets, but don't have checks? These drivers, if there are any, need handling first, before this core change is made. Otherwise you introduce regressions. Andrew
Re: Minimum MTU Mess
On Thu, Sep 08, 2016 at 03:24:13AM +0200, Andrew Lunn wrote: > > This is definitely going to require a few passes... (Working my way > > through every driver with an ndo_change_mtu wired up right now to > > see just how crazy this might get). > > It might be something Coccinelle can help you with. Try describing the > transformation you want to do, to their mailing list, and they might > come up with a script for you. >From looking everything over, I'd be very surprised if they could. The places where things need changing vary quite wildly by driver, but I've actually got a full set of compiling changes with a cumulative diffstat of: 153 files changed, 599 insertions(+), 1002 deletions(-) Actually breaking this up into easily digestable/mergeable chunks is going to be kind of entertaining... Suggestions welcomed on that. First up is obviously the core change, which touches just net/ethernet/eth.c, net/core/dev.c, include/linux/netdevice.h and include/uapi/linux/if_ether.h, and should let existing code continue to Just Work(tm), though devices using ether_setup() that had no MTU range checking (or one or the other missing) will wind up with new bounds. For the most part, after the initial patch, very few of the others would have any direct interaction with any others, so they could all be singletons, or small batches per-vendor, or whatever. Full diffstat for the aid of discussion on how to break it up: drivers/char/pcmcia/synclink_cs.c | 1 - drivers/firewire/net.c | 14 ++--- drivers/infiniband/hw/nes/nes.c| 1 - drivers/infiniband/hw/nes/nes.h| 4 +- drivers/infiniband/hw/nes/nes_nic.c| 7 +-- drivers/misc/sgi-xp/xpnet.c| 21 ++-- drivers/net/ethernet/agere/et131x.c| 7 +-- drivers/net/ethernet/altera/altera_tse.h | 1 - drivers/net/ethernet/altera/altera_tse_main.c | 12 ++--- drivers/net/ethernet/amd/amd8111e.c| 5 +- drivers/net/ethernet/atheros/alx/hw.h | 1 - drivers/net/ethernet/atheros/alx/main.c| 9 +--- drivers/net/ethernet/atheros/atl1c/atl1c_main.c| 41 +- drivers/net/ethernet/atheros/atl1e/atl1e_main.c| 11 ++-- drivers/net/ethernet/atheros/atlx/atl1.c | 15 +++--- drivers/net/ethernet/atheros/atlx/atl2.c | 14 +++-- drivers/net/ethernet/broadcom/b44.c| 5 +- drivers/net/ethernet/broadcom/bcm63xx_enet.c | 30 +++ drivers/net/ethernet/broadcom/bnx2.c | 8 ++- drivers/net/ethernet/broadcom/bnx2.h | 6 +-- drivers/net/ethernet/broadcom/bnx2x/bnx2x.h| 2 +- drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c| 8 +-- drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c | 22 +++- drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 4 ++ drivers/net/ethernet/broadcom/bnxt/bnxt.c | 7 +-- drivers/net/ethernet/broadcom/tg3.c| 7 +-- drivers/net/ethernet/brocade/bna/bnad.c| 7 +-- drivers/net/ethernet/cadence/macb.c| 17 +++--- drivers/net/ethernet/calxeda/xgmac.c | 18 ++- drivers/net/ethernet/cavium/liquidio/lio_main.c| 15 ++ .../net/ethernet/cavium/liquidio/octeon_network.h | 2 +- drivers/net/ethernet/cavium/octeon/octeon_mgmt.c | 5 +- drivers/net/ethernet/cavium/thunder/nicvf_main.c | 10 ++-- drivers/net/ethernet/chelsio/cxgb/cxgb2.c | 2 - drivers/net/ethernet/cisco/enic/enic_main.c| 7 +-- drivers/net/ethernet/cisco/enic/enic_res.h | 2 +- drivers/net/ethernet/dlink/dl2k.c | 22 ++-- drivers/net/ethernet/dlink/sundance.c | 6 ++- drivers/net/ethernet/freescale/gianfar.c | 9 ++-- drivers/net/ethernet/hisilicon/hns/hns_enet.c | 4 -- drivers/net/ethernet/ibm/ehea/ehea_main.c | 13 ++--- drivers/net/ethernet/ibm/emac/core.c | 7 +-- drivers/net/ethernet/intel/e100.c | 9 drivers/net/ethernet/intel/e1000/e1000_main.c | 12 ++--- drivers/net/ethernet/intel/e1000e/netdev.c | 14 +++-- drivers/net/ethernet/intel/fm10k/fm10k_netdev.c| 15 ++ drivers/net/ethernet/intel/i40e/i40e_main.c| 10 ++-- drivers/net/ethernet/intel/i40evf/i40evf_main.c| 8 +-- drivers/net/ethernet/intel/igb/e1000_defines.h | 3 +- drivers/net/ethernet/intel/igb/igb_main.c | 16 ++ drivers/net/ethernet/intel/igbvf/defines.h | 3 +- drivers/net/ethernet/intel/igbvf/netdev.c | 14 ++--- drivers/net/ethernet/intel/ixgb/ixgb_main.c| 16 ++ drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 11 ++-- drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 33 ++-- drivers/net/ethernet/marvell/mvneta.c | 36 - drivers/net/ethernet/marvell/mvpp2.c | 36 -
Re: Minimum MTU Mess
> This is definitely going to require a few passes... (Working my way > through every driver with an ndo_change_mtu wired up right now to > see just how crazy this might get). It might be something Coccinelle can help you with. Try describing the transformation you want to do, to their mailing list, and they might come up with a script for you. Andrew
Re: Minimum MTU Mess
On Wed, Sep 07, 2016 at 01:35:35PM -0700, Stephen Hemminger wrote: > On Wed, 7 Sep 2016 15:53:56 -0400 > Jarod Wilsonwrote: > > > --- a/net/core/dev.c > > +++ b/net/core/dev.c > > @@ -6466,9 +6466,17 @@ int dev_set_mtu(struct net_device *dev, int new_mtu) > > if (new_mtu == dev->mtu) > > return 0; > > > > - /* MTU must be positive.*/ > > - if (new_mtu < 0) > > + if (new_mtu < dev->min_mtu) { > > + netdev_err(dev, "Invalid MTU %d requested, hw min %d\n", > > + new_mtu, dev->min_mtu); > > return -EINVAL; > > + } > > + > > + if (new_mtu > dev->max_mtu) { > > + netdev_err(dev, "Invalid MTU %d requested, hw max %d\n", > > + new_mtu, dev->min_mtu); > > + return -EINVAL; > > + } > > > > Maybe don't log something that can be triggered from a user program. > Or at least rate limit it. Yeah, I was a little bit on the fence on whether to log anything, make it netdev_err, netdev_dbg, or what. Quite a few drivers have a netdev_err for failed MTU changes, while others also have netdev_info spew for successful MTU changes. Maybe a rate-limited netdev_err is the way to go here. -- Jarod Wilson ja...@redhat.com
Re: Minimum MTU Mess
On Wed, Sep 07, 2016 at 10:31:12PM +0200, Andrew Lunn wrote: > Hi Jarod > > > - /* MTU must be positive.*/ > > - if (new_mtu < 0) > > + if (new_mtu < dev->min_mtu) { > > + netdev_err(dev, "Invalid MTU %d requested, hw min %d\n", > > + new_mtu, dev->min_mtu); > > return -EINVAL; > > + } > > + > > + if (new_mtu > dev->max_mtu) { > > + netdev_err(dev, "Invalid MTU %d requested, hw max %d\n", > > + new_mtu, dev->min_mtu); > > + return -EINVAL; > > + } > > I doubt you can make such a big change like this in one go. Can you > really guarantee all interfaces, of what ever type, will have some > value for dev->min_mtu and dev->max_mtu? What may fly is something > more like: > > > + if (dev->max_mtu && new_mtu > dev->max_mtu) { > > + netdev_err(dev, "Invalid MTU %d requested, hw max %d\n", > > + new_mtu, dev->min_mtu); > > + return -EINVAL; > > + } > > Maybe in a few cycles you can add a WARN_ON(!dev->max_mtu), and a few > cycles after that go with (new_mtu > dev->max_mtu). My local tree actually has if (dev->max_mtu > 0 && new_mtu > dev->max_mtu) since just after I'd sent my mail for exactly that reason, though looking at alloc_netdev_mqs(), it does seem we're at least guaranteed the value will be 0 if not otherwise initialized, so your version looks perfectly fine, and in the min_mtu case, without any additional handling, things behave exactly as they did before. This is definitely going to require a few passes... (Working my way through every driver with an ndo_change_mtu wired up right now to see just how crazy this might get). -- Jarod Wilson ja...@redhat.com
Re: Minimum MTU Mess
On Wed, 7 Sep 2016 15:53:56 -0400 Jarod Wilsonwrote: > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -6466,9 +6466,17 @@ int dev_set_mtu(struct net_device *dev, int new_mtu) > if (new_mtu == dev->mtu) > return 0; > > - /* MTU must be positive.*/ > - if (new_mtu < 0) > + if (new_mtu < dev->min_mtu) { > + netdev_err(dev, "Invalid MTU %d requested, hw min %d\n", > +new_mtu, dev->min_mtu); > return -EINVAL; > + } > + > + if (new_mtu > dev->max_mtu) { > + netdev_err(dev, "Invalid MTU %d requested, hw max %d\n", > +new_mtu, dev->min_mtu); > + return -EINVAL; > + } > Maybe don't log something that can be triggered from a user program. Or at least rate limit it.
Re: Minimum MTU Mess
Hi Jarod > - /* MTU must be positive.*/ > - if (new_mtu < 0) > + if (new_mtu < dev->min_mtu) { > + netdev_err(dev, "Invalid MTU %d requested, hw min %d\n", > +new_mtu, dev->min_mtu); > return -EINVAL; > + } > + > + if (new_mtu > dev->max_mtu) { > + netdev_err(dev, "Invalid MTU %d requested, hw max %d\n", > +new_mtu, dev->min_mtu); > + return -EINVAL; > + } I doubt you can make such a big change like this in one go. Can you really guarantee all interfaces, of what ever type, will have some value for dev->min_mtu and dev->max_mtu? What may fly is something more like: > + if (dev->max_mtu && new_mtu > dev->max_mtu) { > + netdev_err(dev, "Invalid MTU %d requested, hw max %d\n", > +new_mtu, dev->min_mtu); > + return -EINVAL; > + } Maybe in a few cycles you can add a WARN_ON(!dev->max_mtu), and a few cycles after that go with (new_mtu > dev->max_mtu). Andrew
Re: Minimum MTU Mess
On Tue, Sep 06, 2016 at 04:55:29PM -0700, David Miller wrote: > From: Jarod Wilson> Date: Fri, 2 Sep 2016 13:07:42 -0400 > > > In any case, the number of "mtu < 68" and "#define FOO_MIN_MTU 68", or > > variations thereof, under drivers/net/ is kind of crazy. > > Agreed, we can have a default and let the different cases provide > overrides. > > Mostly what to do here is a function of the hardware though. So I've been tinkering with this some, and it looks like having both centralized min and max checking could be useful here. I'm hacking away at drivers now, but the basis of all this would potentially look about like the patch below, and each device would have to set dev->m{in,ax}_mtu one way or another. Drivers using alloc_etherdev and/or ether_setup would get the "default" values, and then they can be overridden. Probably need something to make sure dev->max_mtu isn't set to 0 though... Possibly on the right track here, or might there be a better way to approach this? diff --git a/include/uapi/linux/if_ether.h b/include/uapi/linux/if_ether.h index 117d02e..864d6f2 100644 --- a/include/uapi/linux/if_ether.h +++ b/include/uapi/linux/if_ether.h @@ -35,6 +35,8 @@ #define ETH_FRAME_LEN 1514/* Max. octets in frame sans FCS */ #define ETH_FCS_LEN4 /* Octets in the FCS */ +#define ETH_MIN_MTU68 /* Min IPv4 MTU per RFC791 */ + /* * These are the defined Ethernet Protocol ID's. */ diff --git a/net/core/dev.c b/net/core/dev.c index dd6ce59..d20fd5d 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -6466,9 +6466,17 @@ int dev_set_mtu(struct net_device *dev, int new_mtu) if (new_mtu == dev->mtu) return 0; - /* MTU must be positive.*/ - if (new_mtu < 0) + if (new_mtu < dev->min_mtu) { + netdev_err(dev, "Invalid MTU %d requested, hw min %d\n", + new_mtu, dev->min_mtu); return -EINVAL; + } + + if (new_mtu > dev->max_mtu) { + netdev_err(dev, "Invalid MTU %d requested, hw max %d\n", + new_mtu, dev->min_mtu); + return -EINVAL; + } if (!netif_device_present(dev)) return -ENODEV; diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c index 66dff5e..2be0203 100644 --- a/net/ethernet/eth.c +++ b/net/ethernet/eth.c @@ -322,8 +322,6 @@ EXPORT_SYMBOL(eth_mac_addr); */ int eth_change_mtu(struct net_device *dev, int new_mtu) { - if (new_mtu < 68 || new_mtu > ETH_DATA_LEN) - return -EINVAL; dev->mtu = new_mtu; return 0; } @@ -357,6 +355,8 @@ void ether_setup(struct net_device *dev) dev->type = ARPHRD_ETHER; dev->hard_header_len= ETH_HLEN; dev->mtu= ETH_DATA_LEN; + dev->min_mtu= ETH_MIN_MTU; + dev->max_mtu= ETH_DATA_LEN; dev->addr_len = ETH_ALEN; dev->tx_queue_len = 1000; /* Ethernet wants good queues */ dev->flags = IFF_BROADCAST|IFF_MULTICAST; -- Jarod Wilson ja...@redhat.com
Re: Minimum MTU Mess
From: Jarod WilsonDate: Fri, 2 Sep 2016 13:07:42 -0400 > In any case, the number of "mtu < 68" and "#define FOO_MIN_MTU 68", or > variations thereof, under drivers/net/ is kind of crazy. Agreed, we can have a default and let the different cases provide overrides. Mostly what to do here is a function of the hardware though.
Minimum MTU Mess
So... I had a bug reported, about a NIC that ceased to work, if it's MTU was set to 0, then back to it's original value (1500). This got me thinking... What does an MTU of 0 even mean? Why should it be allowed? As it turns out, most (but not all) network drivers have a check in their ndo_change_mtu function that returns -EINVAL if new_mtu < $magic_number, which is often 68 (per page 25 of RFC 791), but is sometimes 64, or 60, or 46... Sometimes other manipulations are done. But the short version is that it seems there's an almost universal need to check for a minimum MTU. There's just some disagreement on what that minimum is. So, rather than having nearly every ndo_change_mut callback do the exact same thing, would it make sense to settle on one minimum MTU value, and check that unilaterally (at least for certain netdev types) in net/core/dev.c's dev_set_mtu()? Or is intentionally left vague, because it's really up to the hardware to care? Alternatively, perhaps each driver should set a netdev->min_mtu value, and net/core/dev.c could check against that. Could even have a centralized IP_MIN_MTU of 68 that all devices using ether_setup() and/or alloc_etherdev() used by default. In any case, the number of "mtu < 68" and "#define FOO_MIN_MTU 68", or variations thereof, under drivers/net/ is kind of crazy. In the interim, I think I'll just do a 3-line patch for this one driver that mirrors all the existing drivers to keep it from going splat... -- Jarod Wilson ja...@redhat.com