On Mon, Jul 17, 2023 at 12:47:30AM +0200, Juliusz Chroboczek wrote:
> >> IP does not support variable MTU links.
> >
> > Excuse me, but that's plain false. IP was designed in an environment where
> > (non-ethernet) networks with various MTU standards were commonplace
>
> Sorry, I wasn't clear. IP requires every link to have a well-defined
> MTU: all the nodes connected to a link must agree on the link's MTU.
I don't think that can be true either. PMTU can vary and paths can be
asymmetric, so two nodes could very well see different MTUs across the
internet. There are just not many ASes that run with less than 1500 MTU :)

Do you have a reference for this "MTU well-definedness" criterion? I don't
think I have ever heard of it.

> > There is a way: My routing protocol just has to stop picking links that
> > are obviously going to cause a problem.
>
> Could you please describe the problem in detail? Because I'm probably
> missing something.

Let me try to give some more context:

My mesh network deploys two wg tunnels per node, one wg-over-v6 and one
wg-over-v4 tunnel, to support dual-stack, v4-only and v6-only underlay
networks. Nodes run babel over all wg interfaces and will receive a default
route covering the wg-over-v6 tunnel endpoint addresses.

Some nodes are served by IPv6 routers that are themselves part of the wg
mesh network and only have v6 connectivity via wg-over-v4. This can cause
wg-over-v6 tunnels on such nodes to want to cross a wg-over-v4 tunnel.

All wg interfaces have MTU 1420 configured, which is the worst case across
wg-over-v6 and wg-over-v4 (on an underlay with MTU 1500). In the
wg-over-wg-over-v4 case this results in packets that are too big for the
v4 underlay network (1420+80+60=1560). WireGuard drops packets when they
exceed the underlay network's MTU. When this happens, no PTB ICMP errors
are generated by WireGuard inside the tunnel; packets are simply dropped,
and TCP applications running on the overlay IPv6 network break badly as no
ICMP errors reach the sender.

This can be avoided by simply ignoring the wg-over-v6 tunnel, which only
exists for deployment consistency, since a wg-over-v4 tunnel with (actual)
1440 MTU is available too and can reach the entire network.

Worth mentioning: the reason I have to run two wg tunnels per node to begin
with is that WireGuard's strategy for dual-stack support is that it doesn't
have one.
It supports only one endpoint address per tunnel (well, wg peer really),
and if you pick wrong because, say, IPv6 addresses are available but don't
work, the tunnel simply blackholes everything. Yay, joy is me.

> If Wireguard implements RFC 4459 Section 3.2, then pushing a too large
> packet over the tunnel, then Wireguard should synthesise an ICMP "packet
> too large", which will cause the sender to retry with a smaller packet.
> Is that not the case?

Yeah, having wg forward PTB errors from the underlay to inside the tunnel
was something I considered for fixing this, but I believe that would be
called "insecure" by the wg project since the ICMP errors aren't
authenticated like normal WireGuard packets. So what happens when an
attacker sends spoofed PTB with MTU=0 etc.? ;)

Furthermore, on IPv4, which unfortunately is the underlay in my network
more often than not, ICMP blackholes are very common, so breakage could
ensue again. This really is just putting lipstick on a pig. It would
"work", I suppose, but I don't want my network to use these paths because
the double encapsulation is just plain inefficient! Prune thy inefficient
paths, I say :]

> I'm not opposed to your probing idea, but I'd really prefer to fully
> understand the problem first.

Sure thing, I'm not opposed to working the problem. I've just been dealing
with this problem (and duct-tape "solutions" surrounding it) for a while
now, and I just want to get this squared away so I can go back to my
(mostly) IPv6-only bliss :D

I think RFC 4459 simply didn't consider L3 routing protocol based
solutions, probably since the usual network vendor suspects would never
implement something uncouth like this. But we need not be constrained by
the inefficiencies of the commercial world in the free software community,
now do we? :)

Speaking of which, I'm working on a babeld patch to see if my idea works.
Just have to dig through the kernel code first to figure out which one of
the amazingly (badly) named IP_PMTUDISC_* options I want to use to force
it to neither do fragmentation nor attempt PMTU discovery for the babel
socket.

Thanks,
--Daniel

_______________________________________________
Babel-users mailing list
Babel-users@alioth-lists.debian.net
https://alioth-lists.debian.net/cgi-bin/mailman/listinfo/babel-users