Hi all!

The Linux kernel has been audited with regard to its handling of fragmentation and path MTUs. Here is an overview:

===

Hardening the fragmentation cache against Hash-DoS (actually only important for IPv6, but IPv4 is protected, too):
https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=5a3da1fe9561828d0ca7eca664b16ec2b9bf0055

===

Fixed: IP_MTU_DISCOVER and IPV6_MTU_DISCOVER were not respected if the outgoing interface had UDP Fragmentation Offloading enabled or the socket was corked:
* https://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=daba287b299ec7a2c61ae3a714920e90e8396ad5
* https://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=4df98e76cde7c64b5606d82584c65dda4151bd6a

===

New IP_MTU_DISCOVER/IPV6_MTU_DISCOVER mode IP_PMTUDISC_INTERFACE/IPV6_PMTUDISC_INTERFACE:
* https://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=482fc6094afad572a4ea1fd722e7b11ca72022a0
* https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=93b36cf3425b9bd9c56df7680fb237686b9c82ae

===

New no_pmtu_disc modes:
* https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=188b04d580ab7acf11eb77cb564692050c10edfe
* https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=cd174e67a6b312fce9bab502ba2b0583e11f537f

Per-namespace no_pmtu_disc mode:
* https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=974eda11c54290a1be8f8b155edae7d791e5ce57

Hardened PMTU mode (per namespace, imitates the PMTU logic from FreeBSD):
* https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=8ed1dc44d3e9e8387a104b1ae8f92e9a3fbf1b1e

===

Protect the forwarding path against malicious path MTU information:
* https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=f87c10a8aa1e82498c42d0335524d6ae7cf5a52b
* https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=0954cf9c6141d597929a292b93a2dca2c1f29159

===

Networking PRNG improvements (used for UDP bind(0)/autobind and TCP bind(0) port allocation).
This is probably not that important, since I think name servers do their own randomization, but the glibc resolver seems to depend on it.
* https://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=a98814cef87946d2708812ad9f8b1e03b8366b6f
* https://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=6d31920246a9fc80be4f16acd27c0bbe8d7b8494
* https://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=4af712e8df998475736f3e2727701bd31e3751a9

===

Fully randomized port mapping in the NAT code:
https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=34ce324019e76f6d93768d68343a0e78f464d754

===

All those changes will be available in Linux 3.14. There were some more changes to how hash table secret initialisation is done. Some races regarding the seeding of secure sequential port allocation with netfilter, as well as UDP/TCP socket initialisation races, were also fixed.

Future patches I am working on deal with matching ICMP payloads (if the idea works out). Some fixes may be required on routers with tunnel interfaces (I am not really sure how to deal with this) and maybe when using generic segmentation offloading (e.g. virtualization).

The patches listed here are only the major ones, and I don't recommend blindly backporting them, because some of them depend on other changes.

This now makes it possible to ensure that packets from a Linux box do not get fragmented if the maximum UDP size is limited by a name server. Otherwise attackers could spray ICMP "fragmentation needed" packets (carrying ICMP or UDP payloads) at a box to reduce the path MTU. Linux would then start fragmenting packets even though the name server software tried to avoid that by specifying a maximum UDP packet size. A name server can now specifically request the generation of non-fragmented packets, with no DF bit set, by using IP_PMTUDISC_INTERFACE. For IPv4 this will already be available in v3.13; a man-page update is pending.

For details you could also have a look at the commit messages.
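
As a rough illustration of how a name server could request the new mode on an IPv4 UDP socket, here is a minimal sketch. The numeric constants are copied from the Linux header <linux/in.h> because Python's socket module may not export these names; the helper name open_udp_socket_ignoring_pmtu is made up for this example, and the fallback for kernels that reject the mode is only an assumption about how a caller might want to handle that.

```python
import socket
import sys

# Numeric values taken from the Linux UAPI header <linux/in.h>. Python's
# socket module may not export these names, so they are spelled out here.
IP_MTU_DISCOVER = getattr(socket, "IP_MTU_DISCOVER", 10)
IP_PMTUDISC_INTERFACE = 4


def open_udp_socket_ignoring_pmtu():
    """Return (sock, enabled) for a new IPv4 UDP socket.

    `enabled` is True if the kernel accepted IP_PMTUDISC_INTERFACE and
    False otherwise (non-Linux system, or a kernel that rejects the mode).
    The helper name is hypothetical.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    if not sys.platform.startswith("linux"):
        # The option number means something else on other systems.
        return sock, False
    try:
        sock.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER,
                        IP_PMTUDISC_INTERFACE)
        enabled = True
    except OSError:
        # Kernel does not know this mode; the socket keeps its default.
        enabled = False
    return sock, enabled


if __name__ == "__main__":
    sock, enabled = open_udp_socket_ignoring_pmtu()
    print("IP_PMTUDISC_INTERFACE enabled:", enabled)
    sock.close()
```

The IPv6 counterpart would use IPV6_MTU_DISCOVER with IPV6_PMTUDISC_INTERFACE on an AF_INET6 socket; it is left out here to keep the sketch short.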

Would it be of interest to expose the fragmentation state of incoming datagrams, e.g. via ancillary data on recvmsg, so that resolvers could check whether an incoming packet was fragmented and then drop it and retry if it was below a certain size?

Many thanks to Daniel Borkmann, Florian Weimer and Haya Shulman for helping and for standing by for questions or patches. Special thanks to Florian Weimer, who got this ball rolling.

Looking for feedback and suggestions, thank you!

Greetings,

  Hannes
