Re: [PATCH v4 1/5] geneve: Add geneve udp port offload for ethernet devices
Resending the series with a cover-letter that got missed. Thanks Anjali On 12/14/2015 11:57 AM, Anjali Singhai Jain wrote: Add ndo_ops to add/del UDP ports to a device that supports geneve offload. v2: Comment fix. Signed-off-by: Anjali Singhai Jain Signed-off-by: Kiran Patil --- drivers/net/geneve.c | 23 +++ include/linux/netdevice.h | 20 +++- 2 files changed, 42 insertions(+), 1 deletion(-) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 1/4] geneve: Add geneve udp port offload for ethernet devices
On 12/11/2015 7:11 PM, Tom Herbert wrote: On Tue, Dec 8, 2015 at 10:12 AM, Anjali Singhai Jain wrote: Add ndo_ops to add/del UDP ports to a device that supports geneve offload. v3: Add some more comments about the use of the new ndo ops. Signed-off-by: Anjali Singhai Jain Signed-off-by: Kiran Patil --- drivers/net/geneve.c | 23 +++ include/linux/netdevice.h | 21 - 2 files changed, 43 insertions(+), 1 deletion(-) diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c index de5c30c..b43fd56 100644 --- a/drivers/net/geneve.c +++ b/drivers/net/geneve.c @@ -371,8 +371,11 @@ static struct socket *geneve_create_sock(struct net *net, bool ipv6, static void geneve_notify_add_rx_port(struct geneve_sock *gs) { + struct net_device *dev; struct sock *sk = gs->sock->sk; + struct net *net = sock_net(sk); sa_family_t sa_family = sk->sk_family; + __be16 port = inet_sk(sk)->inet_sport; int err; if (sa_family == AF_INET) { @@ -381,6 +384,14 @@ static void geneve_notify_add_rx_port(struct geneve_sock *gs) pr_warn("geneve: udp_add_offload failed with status %d\n", err); } + + rcu_read_lock(); + for_each_netdev_rcu(net, dev) { + if (dev->netdev_ops->ndo_add_geneve_port) + dev->netdev_ops->ndo_add_geneve_port(dev, sa_family, +port); + } + rcu_read_unlock(); What about IPv6 case? The driver still gets add port calls for IPv6 and can decide to offload L4 RX checksum if the HW is capable. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 2/4] i40e: geneve tunnel offload support
On 12/8/2015 10:36 AM, Jesse Gross wrote: On Tue, Dec 8, 2015 at 10:20 AM, Alexei Starovoitov wrote: On Tue, Dec 08, 2015 at 10:12:12AM -0800, Anjali Singhai Jain wrote: +/** + * i40e_add_geneve_port - Get notifications about GENEVE ports that come up + * @netdev: This physical port's netdev + * @sa_family: Socket Family that GENEVE is notifying us about + * @port: New UDP port number that GENEVE started listening to + **/ +static void i40e_add_geneve_port(struct net_device *netdev, + sa_family_t sa_family, __be16 port) +{ +#if IS_ENABLED(CONFIG_GENEVE) ... + /* New port: add it and mark its index in the bitmap */ + pf->udp_ports[next_idx].index = port; + pf->udp_ports[next_idx].type = I40E_AQC_TUNNEL_TYPE_NGE; the function suppose to deal with geneve but tunnel type is NGE ?! NGE is an old name for Geneve: "Next Generation Encapsulation" Yes and that is why the old name in the SW/FW header files. -#define I40E_MAX_TUNNEL_HDR_LEN 80 +/* Hardware supports L4 tunnel length of 128B (=2^7) which includes + * inner mac plus all inner ethertypes. + */ +#define I40E_MAX_TUNNEL_HDR_LEN 128 so the driver lied about actual hw capabilities earlier or it needs firmware update to work this way? I'm pretty sure that this is just making the calculation match the hardware more accurately. If you look at the code below it, it is now calculating the length from a different place. It is making the code match the HW more accurately, which did support the 128 size all along but till we enabled geneve support in the SW and FW, the earlier setting was the right value. - if (!(tx_flags & I40E_TX_FLAGS_VXLAN_TUNNEL)) { + if (!(tx_flags & I40E_TX_FLAGS_UDP_TUNNEL)) { ... @@ -163,7 +163,7 @@ enum i40e_dyn_idx_t { #define I40E_TX_FLAGS_FSOBIT(7) #define I40E_TX_FLAGS_TSYN BIT(8) #define I40E_TX_FLAGS_FD_SB BIT(9) -#define I40E_TX_FLAGS_VXLAN_TUNNEL BIT(10) +#define I40E_TX_FLAGS_UDP_TUNNEL BIT(10) these changes implying that HW actually doesn't have special 'geneve or vxlan' hard coded logic and it's generic enough to understand most of udp tunnels. Then why you cannot generalize this whole things as generic udp tunnel offload and do not add any protocol specific hooks and ndos. These are transmit flags but the issue of specialization relates to receive. Second what Jesse said. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
On 12/1/2015 8:08 AM, John W. Linville wrote: On Tue, Dec 01, 2015 at 04:49:28PM +0100, Hannes Frederic Sowa wrote: On Tue, Dec 1, 2015, at 16:44, John W. Linville wrote: On Mon, Nov 30, 2015 at 09:26:51PM -0800, Tom Herbert wrote: On Mon, Nov 30, 2015 at 5:28 PM, Jesse Gross wrote: Based on what we can do today, I see only two real choices: do some refactoring to clean up the stack a bit or remove the existing VXLAN offloading altogether. I think this series is trying to do the former and the result is that the stack is cleaner after than before. That seems like a good thing. There is a third choice which is to do nothing. Creating an infrastructure that claims to "Generalize udp based tunnel offload" but actually doesn't generalize the mechanism is nothing more than window dressing-- this does nothing to help with the VXLAN to VXLAN-GPE transition for instance. If geneve specific offload is really needed now then that can be should with another ndo function, or alternatively ntuple filter with a device specific action would at least get the stack out of needing to be concerned with that. Regardless, we will work optimize the rest of the stack for devices that implement protocol agnostic mechanisms. Is there no concern about NDO proliferation? Does the size of the netdev_ops structure matter? Beyond that, I can see how a single entry point with an enum specifying the offload type isn't really any different in the grand scheme of things than having multiple NDOs, one per offload. Given the need to live with existing hardware offloads, I would lean toward a consolidated NDO. But if a different NDO per tunnel type is preferred, I can be satisified with that. Having per-offloading NDOs helps the stack to gather further information what kind of offloads the driver has even maybe without trying to call down into the layer (just by comparing to NULL). Checking this inside the driver offload function clearly does not have this feature. So we finally can have "ip tunnel please-recommend-type" feature. :) That is a valuable insight! Maybe the per-offload NDO isn't such a bad idea afterall... :-) John This helps me understand why having a separate ndo op might still be ok. Thanks for the feedback. I will go back to that model. Also I think I did finally understand the discussion on using a single 2's compliment checksum method for future silicon. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v1 1/6] net: Generalize udp based tunnel offload
-Original Message- From: Tom Herbert [mailto:t...@herbertland.com] Sent: Monday, November 30, 2015 8:36 AM To: Singhai, Anjali Cc: Linux Kernel Network Developers ; Jesse Gross ; Patil, Kiran Subject: Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload On Mon, Nov 23, 2015 at 1:02 PM, Anjali Singhai Jain wrote: > Replace add/del ndo ops for vxlan_port with tunnel_port so that all > UDP based tunnels can use the same ndo op. Add a parameter to pass > tunnel type to the ndo_op. > Please consider using RX ntuple filters for this instead of a new ndo op. The vxlan ndo op essentailly implements a limited filter with a rule to match a destination UDP port and the the action of processing the packet as vxlan. ntuple filters generalizes that so that the filtering becomes arbitrary. We'll need the ability to filter on 4-tuple when we implement tunnels to go through firewalls or for offloading other UDP protocols such SPUD or QUIC. Tom - Tom I am not sure I agree with this suggestion. The easiest way to let the hardware know about port to protocol mapping in case of udp-based tunnels is when we add udp offloads for the ports aka gro etc in the stack. This way the user gets benefit of tunnel offloads from the HWs that support it without having to do any extra filter setups from ethtool. Just like ip/tcp/udp checksum and TSO support, the user does not have to turn this ON specifically if they plan to use those protocols (of course they can turn it off). Besides these are not true filters in that sense, they are not used to guide packets to any particular destination in this case, rather used to identify packets for checksum and TSO purpose. And I agree with your patch series that reduces protocol ossification of the stack and driver interface. My point is this set of patches help with that goal and not really hurt because any new tunnel support would mean no change in the interface and just a new type in the enum and then the drivers can decide to do the magic setup in the HW in their driver based on this new type without ever having to touch the interface. So try to explain to me why this is causing protocol ossification because I don't believe so. And I think the ntupe interface should remain for the purpose of filters which are used to route packet or drop them. Not for packet identification and checksum offload support. > Change all drivers to use the generalized udp tunnel offload > > Patch was compile tested with x86_64_defconfig. > > Signed-off-by: Kiran Patil > Signed-off-by: Anjali Singhai Jain > --- > drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 15 ++--- > drivers/net/ethernet/broadcom/bnxt/bnxt.c| 13 +--- > drivers/net/ethernet/emulex/benet/be_main.c | 14 +--- > drivers/net/ethernet/intel/fm10k/fm10k_netdev.c | 27 > drivers/net/ethernet/intel/i40e/i40e_main.c | 41 > +--- > drivers/net/ethernet/intel/ixgbe/ixgbe_main.c| 17 +++--- > drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 21 > drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c | 17 +++--- > drivers/net/vxlan.c | 23 +++-- > include/linux/netdevice.h| 34 ++-- > include/net/udp_tunnel.h | 6 > 11 files changed, 157 insertions(+), 71 deletions(-) > > diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c > b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c > index 2273576..ad2782f 100644 > --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c > +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c > @@ -47,6 +47,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -10124,11 +10125,14 @@ static void __bnx2x_add_vxlan_port(struct > bnx2x *bp, u16 port) } > > static void bnx2x_add_vxlan_port(struct net_device *netdev, > -sa_family_t sa_family, __be16 port) > +sa_family_t sa_family, __be16 port, > +u32 type) > { > struct bnx2x *bp = netdev_priv(netdev); > u16 t_port = ntohs(port); > > + if (type != UDP_TUNNEL_VXLAN) > + return; > __bnx2x_add_vxlan_port(bp, t_port); } > > @@ -10152,11 +10156,14 @@ static void __bnx2x_del_vxlan_port(struct > bnx2x *bp, u16 port) } > > static void bnx2x_del_vxlan_port(struct net_device *netdev, > -sa_family_t sa_family, __be16 port) > +sa_family_t sa_family, __be16 port, > +u32 type) > { > struct bnx2x *bp = netdev_priv(netdev); > u16 t_port =
RE: [PATCH v1 1/6] net: Generalize udp based tunnel offload
-Original Message- From: David Miller [mailto:da...@davemloft.net] Sent: Sunday, November 29, 2015 7:23 PM To: t...@herbertland.com Cc: Brandeburg, Jesse ; Singhai, Anjali ; je...@kernel.org; netdev@vger.kernel.org; Patil, Kiran Subject: Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload From: Tom Herbert Date: Tue, 24 Nov 2015 09:32:11 -0800 >> >> FWIW, I've brought the issue to the attention of the architects here, >> and we will likely be able to make changes in this space. Intel >> hardware (as demonstrated by your patches) already is able to deal >> with this de-ossification on transmit. Receive is a whole different beast. >> > Please provide the specifics on why "Receive is a whole different > beast.". Generic receive checksum is already a subset of the > functionality that you must have implement to support the protocol > specific offloads. All the hardware needs to do is calculate the 1's > complement checksum of the packet and return the value on the to the > host with that packet. That's it. No parsing of headers, no worrying > about the pseudo header, no dealing with any encapsulation. Just do > the calculation, return the result to the host and the driver converts > this to CHECKSUM_COMPLETE. I find it very hard to believe that this is > any harder than specific support the next protocol du jour. The reason for receive being different than transmit is, on TX side driver can provide the meta data for where the checksum field is and what is the length that needs to be check summed to the HW on a per packet basis. On Rx the HW parser has to parse the packet to identify the tunnel type and based on that figure out the checksum locations and length in the packet, so definitely HW has to parse the packet and it can parse only based on next header type information or in case of udp tunnels based on udp port mapping to a particular protocol. I am not sure why you say it doesn't need to parse the packet, maybe I am miss- understanding something. Although it's not difficult to reduce protocol ossification on the RX side but it is certainly different and particularly in case of udp-tunnels it needs the port to protocol mapping. +1 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v1 1/6] net: Generalize udp based tunnel offload
-Original Message- From: David Miller [mailto:da...@davemloft.net] Sent: Sunday, November 29, 2015 7:22 PM To: t...@herbertland.com Cc: Singhai, Anjali ; netdev@vger.kernel.org; je...@kernel.org; Patil, Kiran Subject: Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload From: Tom Herbert Date: Mon, 23 Nov 2015 13:53:44 -0800 > The bad effect of this model is that it is encourages HW vendors to > continue implement HW protocol specific support for encapsulations, we > get so much more benefit if they implement protocol generic > mechanisms. Dave, at least Intel parts have a protocol generic model for tunneled packet offloads and hence we are able to extend our support to newer tunnel types. We do not have protocol specific support in the HW, but since the udp based tunnels do not have a packet type for the tunnel header, the HW needs to know which udp port should be mapped to which specific encapsulation. Otherwise encapsulated types like NVGRE we can identify through packet type and program the HW to account for the header. The newer patches for sure reduce the protocol ossification since in communalizes all the different tunnels into one interface so that any further support to a newer udp tunnel type requires just a type definition and if the driver/HW can support it, minor driver changes to set the right bits for HW. No interface change for sure. And I think that is definitely a step in the right direction. +1 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v1 2/6] net: Add a generic udp_offload_get_port function
> -Original Message- > From: Alexander Duyck [mailto:alexander.du...@gmail.com] > Sent: Monday, November 23, 2015 10:38 PM > To: Singhai, Anjali; netdev@vger.kernel.org > Cc: je...@kernel.org; Patil, Kiran > Subject: Re: [PATCH v1 2/6] net: Add a generic udp_offload_get_port > function > > On 11/23/2015 01:02 PM, Anjali Singhai Jain wrote: > > The new function udp_offload_get_port replaces vxlan_get_rx_port(). > > This is a generic function that will help replay all udp tunnel ports > > irrespective of tunnel type. > > This way when new udp tunnels get added this function need not change. > > > > Note: Drivers besides i40e are compile tested with this change. > > > > Signed-off-by: Anjali Singhai Jain > > Signed-off-by: Kiran Patil > > --- > > [...] > > > diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c index > > f938616..8597020 100644 > > --- a/net/ipv4/udp_offload.c > > +++ b/net/ipv4/udp_offload.c > > @@ -290,6 +290,33 @@ unlock: > > } > > EXPORT_SYMBOL(udp_del_offload); > > > > +void udp_offload_get_port(struct net_device *dev) { > > + struct udp_offload_priv __rcu **head; > > + struct udp_offload_priv *uo_priv; > > + struct udp_offload *uo; > > + > > + if (udp_offload_base) > > + head = &udp_offload_base; > > + else > > + return; > > + > > + spin_lock(&udp_offload_lock); > > + uo_priv = udp_deref_protected(*head); > > + for (; uo_priv != NULL; uo_priv = udp_deref_protected(*head)) { > > + /* call the right add port */ > > + uo = uo_priv->offload; > > + if (uo && dev->netdev_ops->ndo_add_udp_tunnel_port) > > + dev->netdev_ops->ndo_add_udp_tunnel_port(dev, > > + uo->family, > > + uo->port, > > + uo->tunnel_type); > > + head = &uo_priv->next; > > + } > > + spin_unlock(&udp_offload_lock); > > +} > > +EXPORT_SYMBOL(udp_offload_get_port); > > + > > struct sk_buff **udp_gro_receive(struct sk_buff **head, struct sk_buff > *skb, > > struct udphdr *uh) > > { > > > > So when I got to patch 5 I realized this approach is horribly broken for > IPv6 tunnels. The udp_offload_base is only populated if the family is > AF_INET. What do you guys plan to do to get support for AF_INET6? > > You probably ought to look at something like what ended up being done for > the IOAT stuff. What you end up needing is to support the drivers querying > for what ports are active, and receiving notifications of tunnel updates, and > the tunnel side that will register some functionality allowing the active > ports > for a given tunnel type to be queried. > > - Alex Alex you are right about the IPv6 handling and I will update the patch series to help fix that. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v1 1/6] net: Generalize udp based tunnel offload
> -Original Message- > From: Jesse Gross [mailto:je...@kernel.org] > Sent: Monday, November 23, 2015 2:50 PM > To: Tom Herbert > Cc: Singhai, Anjali; Linux Kernel Network Developers; Patil, Kiran > Subject: Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload > > On Mon, Nov 23, 2015 at 1:53 PM, Tom Herbert > wrote: > >> diff --git a/include/net/udp_tunnel.h b/include/net/udp_tunnel.h > >> index cb2f89f..72415aa 100644 > >> --- a/include/net/udp_tunnel.h > >> +++ b/include/net/udp_tunnel.h > >> @@ -9,6 +9,12 @@ > >> #include > >> #endif > >> > >> +enum udp_tunnel_type { > >> + UDP_TUNNEL_UNSPEC, > >> + UDP_TUNNEL_VXLAN, > >> + UDP_TUNNEL_GENEVE, > >> +}; > >> + > > > > Sorry, I still don't like this. Grant it least it gets rid of of VXLAN > > specific ops, but the problem is there no such things as a common set > > of encapsulations in the kernel (e.g. foo-over-udp adds a bunch of > > encapsulations not represented here), no defined common set of device > > functionality that needs this, and this precludes the use of the RX > > accelerations to be available from a userpsace implementation. > > Regardless, I think this is at least a good cleanup of what is already > there compared to having VXLAN-specific NDOs. We can always add > additional things in the future. Agreed with Jesse that this will help not hurt, when we are ready to cross the bridge for removing RX side Protocol ossification.