[Kernel-packages] [Bug 2037335] Re: kernel leaking TCP_MEM
Status changed to 'Confirmed' because the bug affects multiple users.

** Changed in: linux-meta-aws-6.2 (Ubuntu)
   Status: New => Confirmed

-- 
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-meta-aws-6.2 in Ubuntu.
https://bugs.launchpad.net/bugs/2037335

Title:
  kernel leaking TCP_MEM

Status in linux-meta-aws-6.2 package in Ubuntu:
  Confirmed

Bug description:
  We are running our Kafka brokers on Jammy on ARM64. Previously they were on kernel version 5.15.0-1028-aws, but a few weeks ago we built a new AMI and it picked up 6.2.0-1009-aws. We have also upgraded to 6.2.0-1012-aws and found the same problem.

  What we expected to happen:
  TCP memory (TCP_MEM) to fluctuate but stay relatively low (on a busy production broker running 5.15.0-1028-aws, we average 1900 pages over a 24-hour period).

  What happened instead:
  TCP memory (TCP_MEM) continues to rise until hitting the limit (1.5 million pages as currently configured). At this point, the broker is no longer able to properly create new connections and we start seeing "kernel: TCP: out of memory -- consider tuning tcp_mem" in dmesg output. If allowed to continue, the broker will eventually isolate itself from the rest of the cluster since it can't talk to the other brokers.

  Attached is a graph of the average TCP memory usage per kernel version for our production environment over the past 24 hours.
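For anyone wanting to track the same numbers, here is a minimal sketch of how the page counts could be sampled. It assumes TCP memory is read from the `mem` field of the `TCP:` line in /proc/net/sockstat and the limit from the third value of /proc/sys/net/ipv4/tcp_mem (both in pages); the report does not say which tool the reporter used, and the function names below are hypothetical.

```python
from pathlib import Path
from typing import Tuple


def parse_tcp_mem_pages(sockstat_text: str) -> int:
    """Return the 'mem' field (in pages) from the 'TCP:' line of sockstat output."""
    for line in sockstat_text.splitlines():
        if line.startswith("TCP:"):
            fields = line.split()[1:]  # alternating key/value tokens after "TCP:"
            stats = dict(zip(fields[0::2], fields[1::2]))
            return int(stats["mem"])
    raise ValueError("no 'TCP:' line found")


def parse_tcp_mem_limits(tcp_mem_text: str) -> Tuple[int, int, int]:
    """Return the (low, pressure, high) thresholds, in pages, from the tcp_mem sysctl."""
    low, pressure, high = (int(value) for value in tcp_mem_text.split())
    return low, pressure, high


def usage_fraction(used_pages: int, high_pages: int) -> float:
    """Fraction of the 'high' tcp_mem limit currently in use."""
    return used_pages / high_pages


def report() -> str:
    """Read the live values on a Linux host and format a one-line summary."""
    used = parse_tcp_mem_pages(Path("/proc/net/sockstat").read_text())
    _, _, high = parse_tcp_mem_limits(Path("/proc/sys/net/ipv4/tcp_mem").read_text())
    return f"TCP mem: {used} pages ({usage_fraction(used, high):.2%} of limit {high})"
```

Calling `report()` periodically (for example from a cron job or a metrics exporter) and graphing the result per kernel version should show whether the page count keeps climbing toward the configured limit.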
  ProblemType: Bug
  DistroRelease: Ubuntu 22.04
  Package: linux-aws 6.2.0.1012.12~22.04.1
  ProcVersionSignature: Ubuntu 6.2.0-1012.12~22.04.1-aws 6.2.16
  Uname: Linux 6.2.0-1012-aws aarch64
  ApportVersion: 2.20.11-0ubuntu82.5
  Architecture: arm64
  CasperMD5CheckResult: unknown
  CloudArchitecture: aarch64
  CloudID: aws
  CloudName: aws
  CloudPlatform: ec2
  CloudRegion: us-east-1
  CloudSubPlatform: metadata (http://169.254.169.254)
  Date: Mon Sep 25 20:56:02 2023
  Ec2AMI: ami-0b9c5aafc5b2a4725
  Ec2AMIManifest: (unknown)
  Ec2Architecture: arm64
  Ec2AvailabilityZone: us-east-1b
  Ec2Imageid: ami-0b9c5aafc5b2a4725
  Ec2InstanceType: im4gn.4xlarge
  Ec2Instancetype: im4gn.4xlarge
  Ec2Kernel: unavailable
  Ec2Ramdisk: unavailable
  Ec2Region: us-east-1
  ProcEnviron:
   TERM=xterm-256color
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=
   LANG=C.UTF-8
   SHELL=/bin/bash
  SourcePackage: linux-meta-aws-6.2
  UpgradeStatus: No upgrade log present (probably fresh install)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-meta-aws-6.2/+bug/2037335/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 2037335] Re: kernel leaking TCP_MEM
Thank you, that is great to hear!
[Kernel-packages] [Bug 2037335] Re: kernel leaking TCP_MEM
Hello there,

I think this may be the same issue as https://bugs.launchpad.net/ubuntu/+source/linux-signed-aws-6.2/+bug/2045560

I believe this might be related to the following kernel bug, which impacts Linux 6.0.0+:
https://lore.kernel.org/netdev/vi1pr01mb42407d7947b2ea448f1e04efd1...@vi1pr01mb4240.eurprd01.prod.exchangelabs.com/

A patch has been produced which fixes this issue (but has not yet made it into a Linux release):
https://lore.kernel.org/all/20240421175248.1692552-1-eduma...@google.com/

Hope this helps!
Jonathan
[Kernel-packages] [Bug 2037335] Re: kernel leaking TCP_MEM
Rebuilt the AMI with 5.15.0-1045-aws and the problem is gone.