Re: [PATCH] xfrm: don't segment UFO packets
On Thu, Mar 17, 2016 at 01:03:59PM +0800, Herbert Xu wrote: > On Wed, Mar 16, 2016 at 05:00:26PM +0100, Jiri Bohac wrote: > > Prevent xfrm_output() from segmenting UFO packets so that they will be > > fragmented after the xfrm transforms. > > Fair enough. But I wonder if this is enough. Wouldn't UDP notice > that we're doing IPsec and prefragment the packet anyway? So I think > this check may also be needed in the UDP output path. Fixes my broken case. Ftracing a sendmsg call that sends ~8k of data over the UDP socket, I see a single 8k skb travel through the stack all the way to xfrm_output(). MTU is 1500. That's the whole poing of fragmentation offloading, to pass all the data at once as far as we can, isn't it? What UDP code did you think would "notice and prefragment"? Thanks, -- Jiri Bohac SUSE Labs, SUSE CZ
[PATCH] xfrm: don't segment UFO packets
xfrm_output() will segment GSO packets, including UDP (UFO) packets. this is wrong per RFC4303, section 3.3.4. Fragmentation: If necessary, fragmentation is performed after ESP processing within an IPsec implementation. Thus, transport mode ESP is applied only to whole IP datagrams (not to IP fragments). Prevent xfrm_output() from segmenting UFO packets so that they will be fragmented after the xfrm transforms. Signed-off-by: Jiri Bohac diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 4355129..6f3e814 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -3501,6 +3501,12 @@ static inline bool skb_is_gso_v6(const struct sk_buff *skb) return skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6; } +/* Note: Should be called only if skb_is_gso(skb) is true */ +static inline bool skb_is_ufo(const struct sk_buff *skb) +{ + return skb_shinfo(skb)->gso_type & SKB_GSO_UDP; +} + void __skb_warn_lro_forwarding(const struct sk_buff *skb); static inline bool skb_warn_if_lro(const struct sk_buff *skb) diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c index cc3676e..c52cc8b 100644 --- a/net/xfrm/xfrm_output.c +++ b/net/xfrm/xfrm_output.c @@ -197,8 +197,12 @@ int xfrm_output(struct sock *sk, struct sk_buff *skb) struct net *net = dev_net(skb_dst(skb)->dev); int err; - if (skb_is_gso(skb)) - return xfrm_output_gso(net, sk, skb); + if (skb_is_gso(skb)) { + if (skb_is_ufo(skb)) + return xfrm_output2(net, sk, skb); + else + return xfrm_output_gso(net, sk, skb); + } if (skb->ip_summed == CHECKSUM_PARTIAL) { err = skb_checksum_help(skb); -- Jiri Bohac SUSE Labs, SUSE CZ
Re: [PATCH] xfrm: don't segment UFO packets
On Thu, Mar 17, 2016 at 11:24:59AM +0100, Steffen Klassert wrote: > In IPv6 this check is missing, so this could be the > problem if this is IPv6. indeed, this patch also fixes my problem: --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -1353,6 +1353,7 @@ emsgsize: (skb && skb_is_gso(skb))) && (sk->sk_protocol == IPPROTO_UDP) && (rt->dst.dev->features & NETIF_F_UFO) && + !rt->dst.header_len && (sk->sk_type == SOCK_DGRAM)) { err = ip6_ufo_append_data(sk, queue, getfrag, from, length, hh_len, fragheaderlen, I can't say which is better. Herbert originally seemed to like the fix inside xfrm_output(). The IPv4 part is fixed by commit c146066ab80267c3305de5dda6a4083f06df9265 (ipv4: Don't use ufo handling on later transformed packets) Thanks, -- Jiri Bohac SUSE Labs, SUSE CZ
Re: [PATCH] xfrm: don't segment UFO packets
On Thu, Mar 17, 2016 at 11:49:53AM +0100, Jiri Bohac wrote: > On Thu, Mar 17, 2016 at 11:24:59AM +0100, Steffen Klassert wrote: > > > > On Wed, Mar 16, 2016 at 05:00:26PM +0100, Jiri Bohac wrote: > > > Fixes my broken case. > > > > Is this IPv4 or IPv6? IPv4 should not create a GSO skb > > if IPsec is done. It checks for rt->dst.header_len > > in __ip_append_data() and does a fallback to the > > standard case if rt->dst.header_len is non zero. > > It's IPv6. > > > In IPv6 this check is missing, so this could be the > > problem if this is IPv6. > > Doesn't the check do exactly the opposite of what the RFC says? > The RFC wants ESP to be performed first and fragmentation after > that. UDPv4 currently seems to be doing the opposite. No, __ip_append_data() only prepares the packets for fragmentation and enqueues them. Then __ip_make_skb() dequeues and builds one skb with a fraglist. Then the xfrm layer is called, so esp linearizes (unfortunately) the skb and applies the transformation. Fragmentation happens after that.
Re: [PATCH] xfrm: don't segment UFO packets
On Thu, Mar 17, 2016 at 11:24:59AM +0100, Steffen Klassert wrote: > > > On Wed, Mar 16, 2016 at 05:00:26PM +0100, Jiri Bohac wrote: > > Fixes my broken case. > > Is this IPv4 or IPv6? IPv4 should not create a GSO skb > if IPsec is done. It checks for rt->dst.header_len > in __ip_append_data() and does a fallback to the > standard case if rt->dst.header_len is non zero. It's IPv6. > In IPv6 this check is missing, so this could be the > problem if this is IPv6. Doesn't the check do exactly the opposite of what the RFC says? The RFC wants ESP to be performed first and fragmentation after that. UDPv4 currently seems to be doing the opposite. Well at least it works, unlike in the IPv6 case, where the packet is fragmented, but not enough space is reserved, so after adding the ESP headers, it is fragmented once more. (Details can be found in my first e-mail in this thread, I now replied into the old thread after >1 month, sorry for that: http://thread.gmane.org/gmane.linux.network/396952 ) -- Jiri Bohac SUSE Labs, SUSE CZ
Re: [PATCH] xfrm: don't segment UFO packets
On Thu, Mar 17, 2016 at 06:08:55PM +0100, Jiri Bohac wrote: > On Thu, Mar 17, 2016 at 11:24:59AM +0100, Steffen Klassert wrote: > > In IPv6 this check is missing, so this could be the > > problem if this is IPv6. > > indeed, this patch also fixes my problem: Hmm, is this what you really want? If I understood you correctly, you want the fragmentation to occur after IPsec. So while this might generate the same output, it is still going to prefragment the data, only to merge them back for IPsec and then refragment again. Cheers, -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH] xfrm: don't segment UFO packets
On Thu, Mar 17, 2016 at 10:41:15AM +0100, Jiri Bohac wrote: > On Thu, Mar 17, 2016 at 01:03:59PM +0800, Herbert Xu wrote: > > On Wed, Mar 16, 2016 at 05:00:26PM +0100, Jiri Bohac wrote: > > > Prevent xfrm_output() from segmenting UFO packets so that they will be > > > fragmented after the xfrm transforms. > > > > Fair enough. But I wonder if this is enough. Wouldn't UDP notice > > that we're doing IPsec and prefragment the packet anyway? So I think > > this check may also be needed in the UDP output path. > > Fixes my broken case. Is this IPv4 or IPv6? IPv4 should not create a GSO skb if IPsec is done. It checks for rt->dst.header_len in __ip_append_data() and does a fallback to the standard case if rt->dst.header_len is non zero. In IPv6 this check is missing, so this could be the problem if this is IPv6.
Re: [PATCH] xfrm: don't segment UFO packets
On Wed, Mar 16, 2016 at 05:00:26PM +0100, Jiri Bohac wrote: > xfrm_output() will segment GSO packets, including UDP (UFO) packets. > this is wrong per RFC4303, section 3.3.4. Fragmentation: > >If necessary, fragmentation is performed after ESP >processing within an IPsec implementation. Thus, >transport mode ESP is applied only to whole IP >datagrams (not to IP fragments). > > Prevent xfrm_output() from segmenting UFO packets so that they will be > fragmented after the xfrm transforms. > > Signed-off-by: Jiri Bohac Fair enough. But I wonder if this is enough. Wouldn't UDP notice that we're doing IPsec and prefragment the packet anyway? So I think this check may also be needed in the UDP output path. Thanks, -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH] xfrm: don't segment UFO packets
On Fri, Mar 18, 2016 at 10:36:53AM +0800, Herbert Xu wrote: > On Thu, Mar 17, 2016 at 06:08:55PM +0100, Jiri Bohac wrote: > > On Thu, Mar 17, 2016 at 11:24:59AM +0100, Steffen Klassert wrote: > > > In IPv6 this check is missing, so this could be the > > > problem if this is IPv6. > > > > indeed, this patch also fixes my problem: > > Hmm, is this what you really want? If I understood you correctly, > you want the fragmentation to occur after IPsec. The main problem was probably that UFO handling does not work at all on IPv6 IPsec. > So while this > might generate the same output, it is still going to prefragment > the data, only to merge them back for IPsec and then refragment > again. This is far away from being optimal, but this is what usually happens if a local application sends data that we need to fragment. We currently work on avoiding the linearization on IPsec, but having a skb with a fraglist is really the worst case.