On 11/24/25 9:22 PM, Mike Pattrick via dev wrote:
> Add support in the userspace datapath for PATH MTU on tunnel interfaces.
> 
> This feature allows users to configure an MTU on tunnel ports. If set,
> when the userspace datapath attempts to encapsulate a packet that
> exceeds the tunnels MTU, OVS will generate and send an ICMP
> Fragmentation Needed or Packet Too Big message back to the source host.
> 
> If an MTU is not set on the tunnel interface, there is no change in
> behaviour.
> 
> Reported-at: https://issues.redhat.com/browse/FDP-256
> Signed-off-by: Mike Pattrick <[email protected]>
> ---

Hi, Mike.  Thanks for working on this.  Though I'm not sure the current
patch is enough.  There are a few issues with the implementation itself:

- The ICMP error packet only has an input port initialized in the metadata.
This is not sufficient to understand where this error is coming from, as
a single tunnel port, even at the OpenFlow level, can handle a lot of
different destinations.  Flow-based tunnels can have every aspect of the
tunnel header come from OpenFlow rules, including source and destination IPs.
The tunnel metadata must be populated, i.e. we need to parse the tunnel
header and populate the packet md with the values from it.  That includes
IPs, ports and TLVs as well (a rough sketch follows the notes below).
For example, OVN will just throw away packets coming from a tunnel that
do not have OVN metadata in the TLVs.

Note: While we can reverse the addresses and ports from the header when
populating the metadata, we can't do the same for Geneve TLVs, the VxLAN
VNI and other fields like that, as we simply have no idea what should be
in there.  The kernel has the same problem, but, IIRC, it just keeps the
exact TLVs of the original packet in the metadata of the ICMP error packet.
OVN, IIRC, is aware of this and handles them properly.

Note2: Since the port number is not enough to determine where the packet
is coming from anyway, I'm not sure if the previous patch from this set
is needed.  It makes the behavior of the userspace and kernel datapaths
different, which is not a good thing.
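
To illustrate the reversal I mean in the first note, here is a minimal
sketch (not actual OVS code: the struct is a simplified stand-in for
struct flow_tnl and the helper name is made up):

    #include <stdint.h>
    #include <string.h>

    struct tnl_md {               /* Simplified stand-in for flow_tnl.  */
        uint32_t ip_src, ip_dst;  /* Outer IPv4 addresses (net order).  */
        uint16_t tp_src, tp_dst;  /* Outer UDP ports (net order).       */
        uint64_t tun_id;          /* VNI / tunnel key.                  */
    };

    /* Build metadata for the generated ICMP error from the tunnel header
     * the packet was about to be encapsulated with: addresses and ports
     * are swapped so the error looks like it arrived from the remote
     * endpoint.  The key/TLVs can't be reversed meaningfully, so here
     * the key is simply copied as-is (best effort). */
    static void
    tnl_md_for_icmp_error(const struct tnl_md *orig, struct tnl_md *err)
    {
        memset(err, 0, sizeof *err);
        err->ip_src = orig->ip_dst;
        err->ip_dst = orig->ip_src;
        err->tp_src = orig->tp_dst;
        err->tp_dst = orig->tp_src;
        err->tun_id = orig->tun_id;
    }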

- Normally, the main idea of PMTU discovery is that we do not actually
know the MTU of the fabric.  The MTU of the tunnel interface or of the
physical port attached to OVS is not really a problem in the vast majority
of cases, as all CMSes set the MTU on all the ports they know about.  This
includes eth0 inside containers and VMs: the first is set directly by the
CNI, the second is handled by DHCP replies received by the VM.

The main problem is when we have some routes in the physical network with
an MTU lower than the MTU of our physical interface.  This is what PMTUD
is mostly needed for.  But this case is not covered by this change, as we'd
need to capture ICMP errors coming towards our tunnel port and adjust the
MTU dynamically to have support for this.
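
Just for illustration, extracting the next-hop MTU from such an ICMPv4
error is simple (a self-contained sketch, not existing OVS code; per
RFC 1191 the value sits in bytes 6..7 of the ICMP header):

    #include <arpa/inet.h>
    #include <stdint.h>
    #include <string.h>

    #define ICMP_TYPE_DEST_UNREACH 3
    #define ICMP_CODE_FRAG_NEEDED  4

    /* Returns nonzero and sets '*mtu' if 'icmp' (length 'len') is an
     * ICMPv4 "fragmentation needed" error carrying a non-zero
     * next-hop MTU. */
    static int
    icmp_frag_needed_mtu(const uint8_t *icmp, size_t len, uint16_t *mtu)
    {
        uint16_t nbo;

        if (len < 8 || icmp[0] != ICMP_TYPE_DEST_UNREACH
            || icmp[1] != ICMP_CODE_FRAG_NEEDED) {
            return 0;
        }
        memcpy(&nbo, icmp + 6, sizeof nbo);  /* Next-hop MTU, RFC 1191. */
        *mtu = ntohs(nbo);
        return *mtu != 0;
    }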

More likely, we'd need to track the MTU at the route level, as the kernel
does, creating route cache entries with MTU values and using those values
while resolving routes for packets after encapsulation.  The MTU would
likely need to become part of the tunnel_push() actions, so we can update
them during revalidation in case we get a route cache update, since the
route lookup is only performed during flow translation.
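
As a very rough sketch of that direction (all names here are hypothetical,
not existing OVS structures):

    #include <stdint.h>
    #include <time.h>

    struct route_cache_entry {
        uint32_t dst;          /* Destination prefix (IPv4, simplified). */
        uint32_t nexthop;      /* Resolved next hop.                     */
        uint16_t mtu;          /* Learned path MTU, 0 if unknown.        */
        time_t   mtu_expires;  /* Learned MTU ages out, as in the kernel. */
    };

    /* Effective MTU to enforce on encapsulation for this route: the
     * learned path MTU if present, fresh and smaller, otherwise the
     * egress interface MTU. */
    static uint16_t
    route_effective_mtu(const struct route_cache_entry *rt,
                        uint16_t ifmtu, time_t now)
    {
        if (rt->mtu && now < rt->mtu_expires && rt->mtu < ifmtu) {
            return rt->mtu;
        }
        return ifmtu;
    }

The value returned by something like this is what would end up as part of
tunnel_push() and get refreshed on revalidation.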

Having an MTU configuration on a tunnel port may be useful in a very narrow
set of use cases where the guest is misconfigured and is sending over-MTU
packets, but I'm not sure if it's worth adding so much code to support this.

Note for the tests: It would be great to see some more complex tests, e.g.
a test with conntrack that correctly identifies the ICMP error as related
to an existing NATed connection, allows forwarding it to the source and
un-NATs the inner packet correctly.
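
Something along these lines (purely illustrative flows, not a ready-made
test):

    table=0, ip, ct_state=-trk, actions=ct(table=1,nat)
    table=1, ip, ct_state=+trk+est, actions=NORMAL
    table=1, icmp, ct_state=+trk+rel, actions=NORMAL

where the +rel flow is the one that should see the generated ICMP error
with its inner packet already un-NATed by ct(nat).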

Best regards, Ilya Maximets.