Re: [PATCH net v3] ipv4: allow local fragmentation in ip_finish_output_gso()

2016-11-04 Thread Lance Richardson
> From: "Shmulik Ladkani" 
> To: "Lance Richardson" 
> Cc: netdev@vger.kernel.org, f...@strlen.de, jtl...@redhat.com, 
> han...@stressinduktion.org
> Sent: Friday, November 4, 2016 5:24:09 AM
> Subject: Re: [PATCH net v3] ipv4: allow local fragmentation in 
> ip_finish_output_gso()
> 
> Hi,
> 
> On Thu, 3 Nov 2016 09:06:27 -0400 (EDT) Lance Richardson
>  wrote:
> > I'm not sure what could be added that would help, was there something
> > specific you had in mind?
> 
> How about something like this (preliminary, feel free to massage):
> 
> @@ -248,10 +248,16 @@ static int ip_finish_output_gso(struct net *net, struct
> sock *sk,
>  
>   /* Slowpath -  GSO segment length is exceeding the dst MTU.
>*
> -  * This can happen in two cases:
> -  * 1) TCP GRO packet, DF bit not set
> -  * 2) skb arrived via virtio-net, we thus get TSO/GSO skbs directly
> -  * from host network stack.
> +  * This can happen in several cases:
> +  *  - Forwarding of TCP GRO packet, DF bit not set
> +  *  - Forwarding of skb arrived in a virtualization environment (from
> +  *virtio-net/vhost/tap) with TSO/GSO size set by other's network
> +  *stack
> +  *  - Local GSO skb xmitted on an NETIF_F_TSO tunnel stacked over an
> +  *interface with a smaller mtu
> +  *  - Arriving GRO skb (or GSO skb in a virtualized env) that gets L2
> +  *bridged to a NETIF_F_TSO tunnel stacked over an interface with an
> +  *insufficent mtu
>*/
>   features = netif_skb_features(skb);
>   BUILD_BUG_ON(sizeof(*IPCB(skb)) > SKB_SGO_CB_OFFSET);
> 

Thanks, that looks good to me. I can send a follow-up patch with this change,
if you like (there seems to be agreement that the original patch is OK).

   Lance


Re: [PATCH net v3] ipv4: allow local fragmentation in ip_finish_output_gso()

2016-11-04 Thread Lance Richardson
> From: "Shmulik Ladkani" 
> To: "Hannes Frederic Sowa" 
> Cc: "Lance Richardson" , f...@strlen.de, 
> netdev@vger.kernel.org, jtl...@redhat.com
> Sent: Friday, November 4, 2016 5:40:14 AM
> Subject: Re: [PATCH net v3] ipv4: allow local fragmentation in 
> ip_finish_output_gso()
> 
> On Thu, 3 Nov 2016 22:34:34 +0100 Hannes Frederic Sowa
>  wrote:
> > Correct, but we should maybe redefine the code a bit. From my
> > understanding we can now create an ICMP storm in case every fragment gets.
> 
> Yes, you are right.
> 
> Each segment gets into ip_fragment, and due to outer DF being set,
> ICMP_FRAG_NEEDED is sent per segment.
> 
> BTW, suppose GRO is off, and sender actually did send a burst of
> (non-gso) packets with outer DF set, and each was tunnel encapsulated,
> resulting in oversized frames.
> 
> Would'nt the stack just send the ICMP_FRAG_NEEDED per encapsulated
> frame?
> 
> If so, then the GRO behaviour is aligned, and there's nothing to fix.
> 

Agree.

> Best,
> Shmulik
> 


Re: [PATCH net v3] ipv4: allow local fragmentation in ip_finish_output_gso()

2016-11-04 Thread Shmulik Ladkani
On Thu, 3 Nov 2016 22:34:34 +0100 Hannes Frederic Sowa 
 wrote:
> Correct, but we should maybe redefine the code a bit. From my
> understanding we can now create an ICMP storm in case every fragment gets.

Yes, you are right.

Each segment gets into ip_fragment, and due to outer DF being set,
ICMP_FRAG_NEEDED is sent per segment.

BTW, suppose GRO is off, and sender actually did send a burst of
(non-gso) packets with outer DF set, and each was tunnel encapsulated,
resulting in oversized frames.

Would'nt the stack just send the ICMP_FRAG_NEEDED per encapsulated
frame?

If so, then the GRO behaviour is aligned, and there's nothing to fix.

Best,
Shmulik


Re: [PATCH net v3] ipv4: allow local fragmentation in ip_finish_output_gso()

2016-11-04 Thread Shmulik Ladkani
Hi,

On Thu, 3 Nov 2016 09:06:27 -0400 (EDT) Lance Richardson  
wrote:
> I'm not sure what could be added that would help, was there something
> specific you had in mind?

How about something like this (preliminary, feel free to massage):

@@ -248,10 +248,16 @@ static int ip_finish_output_gso(struct net *net, struct 
sock *sk,
 
/* Slowpath -  GSO segment length is exceeding the dst MTU.
 *
-* This can happen in two cases:
-* 1) TCP GRO packet, DF bit not set
-* 2) skb arrived via virtio-net, we thus get TSO/GSO skbs directly
-* from host network stack.
+* This can happen in several cases:
+*  - Forwarding of TCP GRO packet, DF bit not set
+*  - Forwarding of skb arrived in a virtualization environment (from
+*virtio-net/vhost/tap) with TSO/GSO size set by other's network
+*stack
+*  - Local GSO skb xmitted on an NETIF_F_TSO tunnel stacked over an
+*interface with a smaller mtu
+*  - Arriving GRO skb (or GSO skb in a virtualized env) that gets L2
+*bridged to a NETIF_F_TSO tunnel stacked over an interface with an
+*insufficent mtu
 */
features = netif_skb_features(skb);
BUILD_BUG_ON(sizeof(*IPCB(skb)) > SKB_SGO_CB_OFFSET);




Re: [PATCH net v3] ipv4: allow local fragmentation in ip_finish_output_gso()

2016-11-04 Thread Shmulik Ladkani
Hi,

On Thu, 3 Nov 2016 17:05:54 -0400 (EDT) Lance Richardson  
wrote:
> 
> I'm still digesting the patchwork history, but it seems to me:
> 
>1) If we call skb_gso_validate_mtu() and it returns true, 
> ip_finish_output2() will
>   be called, just as before, so nothing changes.
> 
>2) If we were to avoid calling skb_gso_validate_mtu() when it would have 
> returned
>   false and call ip_finish_output2() without performing fragmentation, we
>   would transmit (or attempt to transmit) a packet that exceeds the 
> configured
>   MTU.

Yes, this seemed strange to me as well.

I proposed your exact same patch, but Hannes wanted to limit its
behavior, expressing the following concerns (see [1]), quote:

"The reason why I rather don't want to see a general solution is that
currently gre and ipip are pmtu-safe and I rather would like to keep the
old behavior that gre and ipip packets don't get treated alike. I
couldn't check the current throughly but it seems they would also be
affected by this change.

My idea was to put those IPSKB_FORWARD just into the vxlan and geneve
endpoints which could be seen as UDP endpoints and don't forward and
care about pmtu events."

and

"My argumentation is that vxlan and geneve can be seen as endpoints and
don't have proper pmtu handling. Thus I would be fine with adding hacks
to just make it working. For GRE I would like to keep strict internet
pmtu handling and also keep the normal ethernet semantics in place, that
we should drop packets if another interface did transmit on an ethernet
bridge with a too big frame size."

Point is, I have no objection to the suggested patch as a whole.
I think (and thought) it is generally correct, as it fixes the anomalies
happending with GSO skbs not getting same treatment as non-gso skbs.

Just want to check whether Hannes' concerns are still valid, and if so,
refine the patch as needed.

Best,
Shmulik

[1] https://patchwork.ozlabs.org/patch/646683/


Re: [PATCH net v3] ipv4: allow local fragmentation in ip_finish_output_gso()

2016-11-03 Thread Hannes Frederic Sowa
On 03.11.2016 22:05, Lance Richardson wrote:
>> From: "Shmulik Ladkani" 
>> To: "Lance Richardson" , f...@strlen.de, 
>> han...@stressinduktion.org
>> Cc: netdev@vger.kernel.org, jtl...@redhat.com
>> Sent: Thursday, November 3, 2016 4:27:51 PM
>> Subject: Re: [PATCH net v3] ipv4: allow local fragmentation in 
>> ip_finish_output_gso()
>>
>> Hi Hannes, Lance,
>>
>> On Wed,  2 Nov 2016 16:36:17 -0400 Lance Richardson 
>> wrote:
>>>  
>>> -   if (skb_iif && !(df & htons(IP_DF))) {
>>> -   /* Arrived from an ingress interface, got encapsulated, with
>>> -* fragmentation of encapulating frames allowed.
>>> -* If skb is gso, the resulting encapsulated network segments
>>> -* may exceed dst mtu.
>>> -* Allow IP Fragmentation of segments.
>>> -*/
>>> -   IPCB(skb)->flags |= IPSKB_FRAG_SEGS;
>>> -   }
>>
>> Thinking this over, I'm concerned of this change.
>>
>> Few months back, we discussed this and got to the conclusion that in the
>> "ingress,tunnel,egress" scenario, segments are allowed to be
>> fragmented if the original inner ip packet does NOT have the DF.



>>
>> See
>>   https://patchwork.ozlabs.org/patch/657132/
>>   https://patchwork.ozlabs.org/patch/661219/
>>
>> I think you expressed that those tunneled skbs already having DF set
>> should go through pmtu discovery.
>>
>> Suggested patch unconditionally calls skb_gso_validate_mtu().
>>
>> Thus we're changing behavior for "ingress,tunnel,egress" scenario of
>> the tunneled packets having DF set in the inner iph.
>>
>> WDYT?
>>
> 
> I'm still digesting the patchwork history, but it seems to me:
> 
>1) If we call skb_gso_validate_mtu() and it returns true, 
> ip_finish_output2() will
>   be called, just as before, so nothing changes.
> 
>2) If we were to avoid calling skb_gso_validate_mtu() when it would have 
> returned
>   false and call ip_finish_output2() without performing fragmentation, we
>   would transmit (or attempt to transmit) a packet that exceeds the 
> configured
>   MTU.
> 
>3) If we do call skb_gso_validate_mtu() and it returns false, 
> ip_finish_output_gso()
>   will call ip_fragment() to perform needed fragmentation. Whether 
> fragmentation
>   actually occurs at this point depends on the value of the DF flag in the
>   IP header (and perhaps skb->ignore_df, frag_max_size, etc.).
> 
> Is the issue you're pointing out about cases in which the inner IP header has 
> DF set
> but the tunnel header does not?

Correct, but we should maybe redefine the code a bit. From my
understanding we can now create an ICMP storm in case every fragment gets.

I think for net this is currently fine and we certainly don't want to
generate oversized datagrams.

I fear the more special case logic we add the sooner or later it will
bite us again. :/

Right now, I see the problem we might end up generating lots of error
callbacks for large gso segments. Maybe we can also just abort after
fragmenting the first frame generated an error. Florian? Or just
overoptimize and jump to the last one, which could have a different size. :)

Anyway, the best thing would be that vxlan etc. inherits the inner DF bit.

The problem to solve here is that for some time we generated oversized
packets on the wire, which is absolutely a no-go. If we now start to
break things for people with "wrong" configuration, we could also get
more complains. Currently I think this is the only way out, unfortunately.

Any other ideas?


Re: [PATCH net v3] ipv4: allow local fragmentation in ip_finish_output_gso()

2016-11-03 Thread Lance Richardson
> From: "Shmulik Ladkani" 
> To: "Lance Richardson" , f...@strlen.de, 
> han...@stressinduktion.org
> Cc: netdev@vger.kernel.org, jtl...@redhat.com
> Sent: Thursday, November 3, 2016 4:27:51 PM
> Subject: Re: [PATCH net v3] ipv4: allow local fragmentation in 
> ip_finish_output_gso()
> 
> Hi Hannes, Lance,
> 
> On Wed,  2 Nov 2016 16:36:17 -0400 Lance Richardson 
> wrote:
> >  
> > -   if (skb_iif && !(df & htons(IP_DF))) {
> > -   /* Arrived from an ingress interface, got encapsulated, with
> > -* fragmentation of encapulating frames allowed.
> > -* If skb is gso, the resulting encapsulated network segments
> > -* may exceed dst mtu.
> > -* Allow IP Fragmentation of segments.
> > -*/
> > -   IPCB(skb)->flags |= IPSKB_FRAG_SEGS;
> > -   }
> 
> Thinking this over, I'm concerned of this change.
> 
> Few months back, we discussed this and got to the conclusion that in the
> "ingress,tunnel,egress" scenario, segments are allowed to be
> fragmented if the original inner ip packet does NOT have the DF.
> 
> See
>   https://patchwork.ozlabs.org/patch/657132/
>   https://patchwork.ozlabs.org/patch/661219/
> 
> I think you expressed that those tunneled skbs already having DF set
> should go through pmtu discovery.
> 
> Suggested patch unconditionally calls skb_gso_validate_mtu().
> 
> Thus we're changing behavior for "ingress,tunnel,egress" scenario of
> the tunneled packets having DF set in the inner iph.
> 
> WDYT?
> 

I'm still digesting the patchwork history, but it seems to me:

   1) If we call skb_gso_validate_mtu() and it returns true, 
ip_finish_output2() will
  be called, just as before, so nothing changes.

   2) If we were to avoid calling skb_gso_validate_mtu() when it would have 
returned
  false and call ip_finish_output2() without performing fragmentation, we
  would transmit (or attempt to transmit) a packet that exceeds the 
configured
  MTU.

   3) If we do call skb_gso_validate_mtu() and it returns false, 
ip_finish_output_gso()
  will call ip_fragment() to perform needed fragmentation. Whether 
fragmentation
  actually occurs at this point depends on the value of the DF flag in the
  IP header (and perhaps skb->ignore_df, frag_max_size, etc.).

Is the issue you're pointing out about cases in which the inner IP header has 
DF set
but the tunnel header does not?

Thanks,

   Lance


Re: [PATCH net v3] ipv4: allow local fragmentation in ip_finish_output_gso()

2016-11-03 Thread David Miller
From: Shmulik Ladkani 
Date: Thu, 3 Nov 2016 22:40:52 +0200

> On Thu, 03 Nov 2016 16:12:44 -0400 (EDT) David Miller  
> wrote:
>> Applied and queued up for -stable.
> 
> Dave, my response lagged your "Applied" by few minutes ;)
> 
> This seems to deserve some more thought to make sure nothing got broken,
> as expressed last in https://patchwork.ozlabs.org/patch/690594/

Feel free to send a followup or a revert if necessary.


Re: [PATCH net v3] ipv4: allow local fragmentation in ip_finish_output_gso()

2016-11-03 Thread Shmulik Ladkani
On Thu, 03 Nov 2016 16:12:44 -0400 (EDT) David Miller  
wrote:
> Applied and queued up for -stable.

Dave, my response lagged your "Applied" by few minutes ;)

This seems to deserve some more thought to make sure nothing got broken,
as expressed last in https://patchwork.ozlabs.org/patch/690594/

Best,
Shmulik


Re: [PATCH net v3] ipv4: allow local fragmentation in ip_finish_output_gso()

2016-11-03 Thread Shmulik Ladkani
Hi Hannes, Lance,

On Wed,  2 Nov 2016 16:36:17 -0400 Lance Richardson  wrote:
>  
> - if (skb_iif && !(df & htons(IP_DF))) {
> - /* Arrived from an ingress interface, got encapsulated, with
> -  * fragmentation of encapulating frames allowed.
> -  * If skb is gso, the resulting encapsulated network segments
> -  * may exceed dst mtu.
> -  * Allow IP Fragmentation of segments.
> -  */
> - IPCB(skb)->flags |= IPSKB_FRAG_SEGS;
> - }

Thinking this over, I'm concerned of this change.

Few months back, we discussed this and got to the conclusion that in the
"ingress,tunnel,egress" scenario, segments are allowed to be
fragmented if the original inner ip packet does NOT have the DF.

See 
  https://patchwork.ozlabs.org/patch/657132/
  https://patchwork.ozlabs.org/patch/661219/

I think you expressed that those tunneled skbs already having DF set
should go through pmtu discovery.

Suggested patch unconditionally calls skb_gso_validate_mtu().

Thus we're changing behavior for "ingress,tunnel,egress" scenario of
the tunneled packets having DF set in the inner iph.

WDYT?


Re: [PATCH net v3] ipv4: allow local fragmentation in ip_finish_output_gso()

2016-11-03 Thread David Miller
From: Lance Richardson 
Date: Wed,  2 Nov 2016 16:36:17 -0400

> Some configurations (e.g. geneve interface with default
> MTU of 1500 over an ethernet interface with 1500 MTU) result
> in the transmission of packets that exceed the configured MTU.
> While this should be considered to be a "bad" configuration,
> it is still allowed and should not result in the sending
> of packets that exceed the configured MTU.
> 
> Fix by dropping the assumption in ip_finish_output_gso() that
> locally originated gso packets will never need fragmentation.
> Basic testing using iperf (observing CPU usage and bandwidth)
> have shown no measurable performance impact for traffic not
> requiring fragmentation.
> 
> Fixes: c7ba65d7b649 ("net: ip: push gso skb forwarding handling down the 
> stack")
> Reported-by: Jan Tluka 
> Signed-off-by: Lance Richardson 

Applied and queued up for -stable.


Re: [PATCH net v3] ipv4: allow local fragmentation in ip_finish_output_gso()

2016-11-03 Thread Lance Richardson
> From: "Shmulik Ladkani" 
> To: "Lance Richardson" 
> Cc: netdev@vger.kernel.org, f...@strlen.de, jtl...@redhat.com, 
> han...@stressinduktion.org
> Sent: Thursday, November 3, 2016 3:42:39 AM
> Subject: Re: [PATCH net v3] ipv4: allow local fragmentation in 
> ip_finish_output_gso()
> 
> On Wed,  2 Nov 2016 16:36:17 -0400
> Lance Richardson  wrote:
> 
> > -   /* common case: fragmentation of segments is not allowed,
> > -* or seglen is <= mtu
> > +   /* common case: seglen is <= mtu
> >  */
> > -   if (((IPCB(skb)->flags & IPSKB_FRAG_SEGS) == 0) ||
> > - skb_gso_validate_mtu(skb, mtu))
> > +   if (skb_gso_validate_mtu(skb, mtu))
> > return ip_finish_output2(net, sk, skb);
> 
> OK.
> Resembles https://patchwork.ozlabs.org/patch/644724/ ;)
> 
> > -   if (skb_iif && !(df & htons(IP_DF))) {
> > -   /* Arrived from an ingress interface, got encapsulated, with
> > -* fragmentation of encapulating frames allowed.
> > -* If skb is gso, the resulting encapsulated network segments
> > -* may exceed dst mtu.
> > -* Allow IP Fragmentation of segments.
> > -*/
> > -   IPCB(skb)->flags |= IPSKB_FRAG_SEGS;
> > -   }
> 
> The only potential issue I see removing this, is that the KNOWLEDGE of
> the forwarding/tunnel-bridging usecases (that require a
> skb_gso_validate_mtu check) is lost.
> 
> Meaning, if one decides (as claimed in the past) that locally generated
> traffic must NOT go through fragmentation validation, then no one
> remembers those other valid usecases (which are very specific and not
> trivial to imagine) that MUST go through it; Therefore it can easily get
> broken.
> 

Hi Shmulik,

It is my understanding that the test to avoid fragmentation validation
for locally originated packets was a performance optimization which was
based on the premise that the IP stack will never produce locally originated
packets with an MTU mismatch. Tunneling encapsulation breaks this premise.
For correctness (honoring configured MTU), it seems we cannot avoid calling
skb_gso_validate_mtu().

> Maybe we can elaborate in the comment above the call to
> skb_gso_validate_mtu in ip_finish_output_gso?

I'm not sure what could be added that would help, was there something
specific you had in mind?

Thanks for reviewing this.

Regards,

   Lance
> 
> Best,
> Shmulik
> 


Re: [PATCH net v3] ipv4: allow local fragmentation in ip_finish_output_gso()

2016-11-03 Thread Hannes Frederic Sowa
On 03.11.2016 08:42, Shmulik Ladkani wrote:
> On Wed,  2 Nov 2016 16:36:17 -0400
> Lance Richardson  wrote:
> 
>> -/* common case: fragmentation of segments is not allowed,
>> - * or seglen is <= mtu
>> +/* common case: seglen is <= mtu
>>   */
>> -if (((IPCB(skb)->flags & IPSKB_FRAG_SEGS) == 0) ||
>> -  skb_gso_validate_mtu(skb, mtu))
>> +if (skb_gso_validate_mtu(skb, mtu))
>>  return ip_finish_output2(net, sk, skb);
> 
> OK.
> Resembles https://patchwork.ozlabs.org/patch/644724/ ;)
> 
>> -if (skb_iif && !(df & htons(IP_DF))) {
>> -/* Arrived from an ingress interface, got encapsulated, with
>> - * fragmentation of encapulating frames allowed.
>> - * If skb is gso, the resulting encapsulated network segments
>> - * may exceed dst mtu.
>> - * Allow IP Fragmentation of segments.
>> - */
>> -IPCB(skb)->flags |= IPSKB_FRAG_SEGS;
>> -}
> 
> The only potential issue I see removing this, is that the KNOWLEDGE of
> the forwarding/tunnel-bridging usecases (that require a
> skb_gso_validate_mtu check) is lost.
> 
> Meaning, if one decides (as claimed in the past) that locally generated
> traffic must NOT go through fragmentation validation, then no one
> remembers those other valid usecases (which are very specific and not
> trivial to imagine) that MUST go through it; Therefore it can easily get
> broken.

I agree but I think the git changelog reassembles the changes in this
area in good enough detail and it can be added if there is need for that
currently I don't see a reason why to leave this code there.

I am a bit sad to remove this condition, but the alternative, to
generate oversized frames, is absolutely not acceptable. :/

Thanks,
Hannes



Re: [PATCH net v3] ipv4: allow local fragmentation in ip_finish_output_gso()

2016-11-03 Thread Hannes Frederic Sowa
On 02.11.2016 21:36, Lance Richardson wrote:
> Some configurations (e.g. geneve interface with default
> MTU of 1500 over an ethernet interface with 1500 MTU) result
> in the transmission of packets that exceed the configured MTU.
> While this should be considered to be a "bad" configuration,
> it is still allowed and should not result in the sending
> of packets that exceed the configured MTU.
> 
> Fix by dropping the assumption in ip_finish_output_gso() that
> locally originated gso packets will never need fragmentation.
> Basic testing using iperf (observing CPU usage and bandwidth)
> have shown no measurable performance impact for traffic not
> requiring fragmentation.
> 
> Fixes: c7ba65d7b649 ("net: ip: push gso skb forwarding handling down the 
> stack")
> Reported-by: Jan Tluka 
> Signed-off-by: Lance Richardson 

Acked-by: Hannes Frederic Sowa 

Thanks Florian for remembering that we can remove IPSKB_FRAG_SEGS now
again. :)




Re: [PATCH net v3] ipv4: allow local fragmentation in ip_finish_output_gso()

2016-11-03 Thread Shmulik Ladkani
On Wed,  2 Nov 2016 16:36:17 -0400
Lance Richardson  wrote:

> - /* common case: fragmentation of segments is not allowed,
> -  * or seglen is <= mtu
> + /* common case: seglen is <= mtu
>*/
> - if (((IPCB(skb)->flags & IPSKB_FRAG_SEGS) == 0) ||
> -   skb_gso_validate_mtu(skb, mtu))
> + if (skb_gso_validate_mtu(skb, mtu))
>   return ip_finish_output2(net, sk, skb);

OK.
Resembles https://patchwork.ozlabs.org/patch/644724/ ;)

> - if (skb_iif && !(df & htons(IP_DF))) {
> - /* Arrived from an ingress interface, got encapsulated, with
> -  * fragmentation of encapulating frames allowed.
> -  * If skb is gso, the resulting encapsulated network segments
> -  * may exceed dst mtu.
> -  * Allow IP Fragmentation of segments.
> -  */
> - IPCB(skb)->flags |= IPSKB_FRAG_SEGS;
> - }

The only potential issue I see removing this, is that the KNOWLEDGE of
the forwarding/tunnel-bridging usecases (that require a
skb_gso_validate_mtu check) is lost.

Meaning, if one decides (as claimed in the past) that locally generated
traffic must NOT go through fragmentation validation, then no one
remembers those other valid usecases (which are very specific and not
trivial to imagine) that MUST go through it; Therefore it can easily get
broken.

Maybe we can elaborate in the comment above the call to
skb_gso_validate_mtu in ip_finish_output_gso?

Best,
Shmulik