Package: linux-image-5.10.0-19-amd64
Version: 5.10.149-2
Severity: important

Dear Maintainer,

Starting with linux-image-5.10.0-15-amd64 (5.10.120-1), the kernel appears to reuse ephemeral TCP ports too quickly, even if net.ipv4.tcp_tw_reuse is set to 0. linux-image-5.10.0-14-amd64 (5.10.113-1) and all earlier versions did not show that behaviour. The behaviour is the same for IPv4 and IPv6.

* What led up to the situation?

I have a couple of moderately to fairly busy web servers that open TCP sessions (~15-20 new connections per second) to a dedicated port on a backend server. The connections are short-lived and are terminated by the backend server after 1 second on average. This setup has been working for many years through many Debian releases and kernel versions.

On July 2, 2022 I updated the systems (apt update), which upgraded the Linux kernel image from 5.10.0-14 to 5.10.0-15. Shortly afterwards I noticed an increasing number of connection errors (timeouts) being reported by the web servers.

Further analysis (mostly with tcpdump) showed that the web servers had started reusing ephemeral TCP ports as little as 30 seconds after their last use. At that point (30 seconds) the backend server (which also runs Debian) still had the corresponding sockets in the TIME_WAIT state and replied to the new SYN packet with an ACK instead of a SYN-ACK (this is of course normal behaviour, since the socket was still open). The web server did not expect the ACK and discarded it, occasionally resending the SYN, until a timeout occurred.

The choice of ephemeral source ports appeared quite erratic. For some seconds they were chosen in ascending order as expected, then jumped back to some lower position and proceeded in ascending order from there, then jumped back up to the higher position where they had left off before, and so on.

* What exactly did you do (or not do) that was effective (or ineffective)?

I first raised the port range for the ephemeral ports by setting net.ipv4.ip_local_port_range=1024 60999 (from the default 32768 60999).
This alleviated the situation (so that the timeouts became less frequent), but did not solve the problem. I then set net.ipv4.tcp_tw_reuse = 0 (from the default 2), which did not change anything (as is expected in this case).

* What was the outcome of this action?

None of the measures I took proved effective. So I downgraded the kernel to 5.10.0-14, and the problem immediately went away. The web servers now cycle through the available ~60000 ephemeral ports and come around to reusing them long after the socket on the backend server has been closed.

I am opening this bug here because I am not knowledgeable enough about the Debian kernel patches to decide whether or not this issue is already present in the upstream vanilla kernel.

Thank you for looking into this.

Best regards
Markus Wernig

-- System Information:
Debian Release: 11.5
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable-security'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 5.10.0-14-amd64 (SMP w/4 CPU threads)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8) (ignored: LC_ALL set to en_US.utf8), LANGUAGE=en_US:en
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages linux-image-5.10.0-14-amd64 depends on:
ii  initramfs-tools [linux-initramfs-tool]  0.140
ii  kmod                                    28-1
ii  linux-base                              4.6

Versions of packages linux-image-5.10.0-14-amd64 recommends:
ii  apparmor             2.13.6-10
ii  firmware-linux-free  20200122-1

Versions of packages linux-image-5.10.0-14-amd64 suggests:
pn  debian-kernel-handbook  <none>
ii  grub-pc                 2.06-3~deb11u2
pn  linux-doc-5.10          <none>
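
For anyone who wants to check another system for the early port reuse described in this report, here is a minimal sketch in Python. It is not the tooling used for the analysis above (that was tcpdump); the function names, the 60-second default window (chosen to match the Linux TIME_WAIT duration) and the host/port in the usage note are assumptions made for illustration only.

```python
import socket
import time
from typing import Iterable, List, Tuple


def find_quick_reuse(
    observations: Iterable[Tuple[float, int]], window: float = 60.0
) -> List[Tuple[int, float]]:
    """Return (port, gap_seconds) for each port seen again within `window`.

    `observations` is a time-ordered sequence of (timestamp, source_port)
    pairs, e.g. extracted from a packet capture or from the sampler below.
    """
    last_seen = {}
    reused = []
    for timestamp, port in observations:
        previous = last_seen.get(port)
        if previous is not None and timestamp - previous < window:
            reused.append((port, timestamp - previous))
        last_seen[port] = timestamp
    return reused


def sample_source_ports(
    host: str, port: int, count: int, delay: float = 0.05, timeout: float = 5.0
) -> List[Tuple[float, int]]:
    """Open `count` short-lived connections and record each local source port."""
    observations = []
    for _ in range(count):
        with socket.create_connection((host, port), timeout=timeout) as conn:
            observations.append((time.monotonic(), conn.getsockname()[1]))
            try:
                # Wait for the peer to close first, as the backend does in
                # this report, so that TIME_WAIT ends up on the remote side.
                while conn.recv(4096):
                    pass
            except OSError:
                pass  # timed out or reset; we close from our side instead
        time.sleep(delay)
    return observations
```

A run against a test service could look like `find_quick_reuse(sample_source_ports("backend.example", 9000, 2000))`, where the host name and port are placeholders. A non-empty result would mean a source port was chosen again inside the window.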