Conceptually this is right.

And I'm 100% fine with a dev MTU change triggering a PMTU decrease.

I'm not so sold on the PMTU increase.

PMTUD is one of those things that never ever works right in practice.
There are too many ICMP blackholes, rate limits, overloaded management
CPUs in switches, misconfigurations, missing TCP MSS clamps, ICMPs
routed differently than the flows due to ECMP hashing, middleboxes
that don't affect the ICMP but change the TCP stream, etc.

In particular, there's a lot of routing hardware that can handle
gigabits or terabits of traffic, but can generate only tens to hundreds
of Packet Too Big messages per second (i.e. a tiny fraction of line-rate
pps). Worse yet, under overload it often falls back to simply
dropping and generating no ICMP errors.
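
To put a rough number on "tiny fraction", here is a back-of-the-envelope
sketch. The link speed, packet size and ICMP rate are illustrative
assumptions, not measurements from any particular device, and framing
overhead is ignored. The function names are made up for this example.

```python
def line_rate_pps(link_bps: float, packet_bytes: int) -> float:
    """Packets per second at line rate, ignoring framing overhead."""
    return link_bps / (packet_bytes * 8)


def ptb_coverage(link_bps: float, packet_bytes: int, ptb_per_sec: float) -> float:
    """Fraction of line-rate packets that could each get a Packet Too Big reply."""
    return min(1.0, ptb_per_sec / line_rate_pps(link_bps, packet_bytes))


# Illustrative numbers only (assumptions, not measurements):
# a 100 Gbit/s port, 1500-byte packets, 100 ICMP errors per second.
pps = line_rate_pps(100e9, 1500)
coverage = ptb_coverage(100e9, 1500, 100)
print(f"line rate: {pps:,.0f} pps, PTB coverage: {coverage:.6%}")
```

With those assumed numbers the port moves roughly 8.3 million packets per
second, so 100 errors per second covers about 0.0012% of them.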

I spend a significant fraction of my time making sure we never rely on PMTUD.

Debugging MTU-related blackholes is a constant bane of my existence.

[BTW, we're considering adding a hack to always fragment UDP to
min(1280, dev/route/path MTU)...]
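
A minimal sketch of that clamp, just to make the min() concrete. This is
not existing kernel code; the function name and parameters are hypothetical,
and I'm assuming "dev/route/path mtu" means "whichever of those are known".
1280 happens to be the IPv6 minimum link MTU.

```python
from typing import Optional

# 1280 is the IPv6 minimum link MTU; used here as the fixed ceiling.
IPV6_MIN_MTU = 1280


def udp_fragment_size(dev_mtu: int,
                      route_mtu: Optional[int] = None,
                      path_mtu: Optional[int] = None) -> int:
    """Hypothetical helper: size to fragment UDP to, never above 1280.

    Unknown route/path MTUs are passed as None and skipped.
    """
    known = [m for m in (dev_mtu, route_mtu, path_mtu) if m is not None]
    return min(IPV6_MIN_MTU, *known)


# A jumbo-frame device with no learned path MTU still fragments at 1280.
print(udp_fragment_size(9000))
# A smaller learned path MTU wins over the 1280 ceiling.
print(udp_fragment_size(1500, route_mtu=1400, path_mtu=1200))
```

The point of the fixed ceiling is that a larger dev or path MTU never raises
the fragment size; only a smaller value can lower it.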

Basically: lower is always better because it's more likely to work...
