Hello,

On Wed, 2015-07-08 at 19:17 +0300, Timo Teras wrote:
> On Wed, 08 Jul 2015 17:52:32 +0200
> Hannes Frederic Sowa <han...@stressinduktion.org> wrote:
>
> > On Wed, 2015-07-08 at 16:30 +0300, Timo Teras wrote:
> > > This probably is due to the way the xfrm+gre work together. On
> > > first packet, the gre tunnel driver updates pmtu for the inner flow,
> > > which is expected to be honored always. And if the 'ttl' value is
> > > set for gre tunnel, no re-fragmentation is allowed as the inner flow
> > > should know better. This does have the side effect that if the very
> > > first packet is large, it'll be dropped to 'learn' the pmtu.
> > >
> > > It's probably not possible to detect this kind of target easily, as
> > > the xfrm can be applied or not even on per inner target IP basis (as
> > > the tunnel destination IP can be dynamic for nbma tunnels).
> >
> > I am currently not sure if we actually have resolved the xfrm path at
> > the time we enter ip_forward, I actually thought we do. In this case
> > we should be able to use skb_dst->dst->path->header_len and subtract
> > it before using it to fragment the packets. I hope it is so easy... :)
>
> It is not. The inner skb just knows that it's going from ethX -> greX.
> And that's what contains the path MTU, and that's what ip_forward will
> use.
>
> Only on gre_xmit it is resolved where the tunnel packet goes, and the
> xfrm resolved. Thus the update_pmtu works fully internally here.
Oh, yes, sorry, gre is not xfrm and doesn't propagate the information
towards the first routing lookup.

> > I would actually avoid telling anyone to enable using the path mtu
> > information in forwarding ever again.
>
> The problem here is that pmtu framework is used internally to relay the
> trusted stacking pmtu in addition to the from-the-wire learned pmtu.

Yes, and it is not easy to propagate this trusted state across all the
different mtu storage locations we have (metrics, fnhe, etc...). I don't
know if it is worth the effort.

> > > So I wonder if ip_gre driver can workaround this somehow, by e.g.
> > > refragmenting if necessary. Or if we just should update the sysctl's
> > > help text to say that this is another scenario where it needs to be
> > > turned on.
> >
> > If above idea does not work, we could simply add an option to gre
> > driver to set skb->ignore_df, but I don't like that much.
>
> This is not acceptable. The gre driver has two operating modes: DF and
> non-DF mode (which is triggered by 'ttl inherit' or 'ttl <number>'
> option on tunnel creation). The DF mode intentionally sets DF on all
> tunnel packets so the pmtu is learned and relayed up the stack. In
> non-DF mode the tunnel packet's DF is derived from encapsulated packet.
>
> Basically this info could be used. If the target is gre1 in DF mode, we
> should be trusting the pmtu. Otherwise the existing internal mechanism
> breaks.
>
> Thoughts?

At least we know which interface the packet would leave. Should we
override this behavior on a per-interface basis? (Although I am in favor
of admins just correcting the mtu by hand and documenting this as you
proposed earlier. I really don't know if it is worth the effort to
propagate that information.)

Thanks,
Hannes
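
The per-interface idea above can be sketched in userspace C. This is only
an illustration under assumptions: `struct out_dev`, its fields, and
`forward_mtu()` are hypothetical names, not kernel structures or
functions, and `global_use_pmtu` merely stands in for the forwarding
sysctl mentioned in the thread. The sketch trusts a cached path MTU when
either the global knob is on or the output device is a tunnel in DF mode,
and otherwise falls back to the device MTU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of an output device (not a kernel struct). */
struct out_dev {
    unsigned int link_mtu;  /* configured device MTU */
    bool is_tunnel;         /* e.g. a GRE device */
    bool tunnel_df_mode;    /* tunnel sets DF on all outer packets */
};

/*
 * Hypothetical helper: choose the MTU used when forwarding to 'dev'.
 * cached_pmtu == 0 means no path MTU has been learned yet.
 */
static unsigned int forward_mtu(const struct out_dev *dev,
                                unsigned int cached_pmtu,
                                bool global_use_pmtu)
{
    assert(dev != NULL);

    /* Trust the cached value globally, or per interface for DF-mode tunnels. */
    bool trust = global_use_pmtu || (dev->is_tunnel && dev->tunnel_df_mode);

    /* Only ever lower the MTU; never raise it above the device MTU. */
    if (trust && cached_pmtu != 0 && cached_pmtu < dev->link_mtu)
        return cached_pmtu;

    return dev->link_mtu;
}
```

With a DF-mode tunnel device of MTU 1476 and a cached path MTU of 1400,
the helper returns 1400 even with the global knob off. For a plain
Ethernet device of MTU 1500 it returns 1500 unless the global knob is
set.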