[Kernel-packages] [Bug 2037335] Re: kernel leaking TCP_MEM

2024-04-24 Thread Launchpad Bug Tracker
Status changed to 'Confirmed' because the bug affects multiple users.

** Changed in: linux-meta-aws-6.2 (Ubuntu)
   Status: New => Confirmed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-meta-aws-6.2 in Ubuntu.
https://bugs.launchpad.net/bugs/2037335

Title:
  kernel leaking TCP_MEM

Status in linux-meta-aws-6.2 package in Ubuntu:
  Confirmed

Bug description:
  We are running our Kafka brokers on Jammy on ARM64. Previously they were
  on kernel version 5.15.0-1028-aws, but a few weeks ago we built a new
  AMI and it picked up 6.2.0-1009-aws, and we have also upgraded to
  6.2.0-1012-aws and found the same problem.

  What we expected to happen:
  TCP memory (TCP_MEM) to fluctuate but stay relatively low (on a busy
  production broker running 5.15.0-1028-aws, we average 1900 pages over a
  24 hour period)

  What happened instead:
  TCP memory (TCP_MEM) continues to rise until hitting the limit (1.5
  million pages as configured currently). At this point, the broker is no
  longer able to properly create new connections and we start seeing
  "kernel: TCP: out of memory -- consider tuning tcp_mem" in dmesg output.
  If allowed to continue, the broker will eventually isolate itself from
  the rest of the cluster since it can't talk to the other brokers.
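
  For anyone wanting to watch the same numbers on their own hosts, here is
  a minimal sketch of how the page counts above could be sampled. It
  assumes the usual Linux layout, where /proc/net/sockstat has a "TCP:"
  line with a "mem" field (in pages) and /proc/sys/net/ipv4/tcp_mem holds
  three page counts (low, pressure, high). The function names are made up
  for illustration and are not taken from the reporter's tooling.

```python
import os
from pathlib import Path


def parse_sockstat_tcp_mem(text: str) -> int:
    """Return the TCP 'mem' value (in pages) from /proc/net/sockstat text."""
    for line in text.splitlines():
        if line.startswith("TCP:"):
            # The line looks like: "TCP: inuse 8 orphan 0 tw 0 alloc 10 mem 1"
            fields = line.split()[1:]
            stats = dict(zip(fields[0::2], fields[1::2]))
            return int(stats["mem"])
    raise ValueError("no 'TCP:' line found in sockstat text")


def parse_tcp_mem_limits(text: str) -> tuple:
    """Return (low, pressure, high) page limits from tcp_mem sysctl text."""
    low, pressure, high = (int(value) for value in text.split())
    return low, pressure, high


def pages_to_bytes(pages: int, page_size: int) -> int:
    """Convert a page count to bytes for a given page size."""
    return pages * page_size


def report() -> str:
    """Read the live values on a Linux host and summarise them (hypothetical helper)."""
    used = parse_sockstat_tcp_mem(Path("/proc/net/sockstat").read_text())
    _, _, high = parse_tcp_mem_limits(
        Path("/proc/sys/net/ipv4/tcp_mem").read_text()
    )
    page_size = os.sysconf("SC_PAGE_SIZE")  # do not assume 4 KiB on arm64
    used_bytes = pages_to_bytes(used, page_size)
    return f"TCP mem: {used} of {high} pages ({used_bytes} bytes in use)"
```

  Calling report() periodically and graphing the first number should be
  enough to tell a value that fluctuates from one that climbs steadily
  toward the high limit.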

  Attached is a graph of the average TCP memory usage per kernel version
  for our production environment over the past 24 hours.

  ProblemType: Bug
  DistroRelease: Ubuntu 22.04
  Package: linux-aws 6.2.0.1012.12~22.04.1
  ProcVersionSignature: Ubuntu 6.2.0-1012.12~22.04.1-aws 6.2.16
  Uname: Linux 6.2.0-1012-aws aarch64
  ApportVersion: 2.20.11-0ubuntu82.5
  Architecture: arm64
  CasperMD5CheckResult: unknown
  CloudArchitecture: aarch64
  CloudID: aws
  CloudName: aws
  CloudPlatform: ec2
  CloudRegion: us-east-1
  CloudSubPlatform: metadata (http://169.254.169.254)
  Date: Mon Sep 25 20:56:02 2023
  Ec2AMI: ami-0b9c5aafc5b2a4725
  Ec2AMIManifest: (unknown)
  Ec2Architecture: arm64
  Ec2AvailabilityZone: us-east-1b
  Ec2Imageid: ami-0b9c5aafc5b2a4725
  Ec2InstanceType: im4gn.4xlarge
  Ec2Instancetype: im4gn.4xlarge
  Ec2Kernel: unavailable
  Ec2Ramdisk: unavailable
  Ec2Region: us-east-1
  ProcEnviron:
   TERM=xterm-256color
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=
   LANG=C.UTF-8
   SHELL=/bin/bash
  SourcePackage: linux-meta-aws-6.2
  UpgradeStatus: No upgrade log present (probably fresh install)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-meta-aws-6.2/+bug/2037335/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 2037335] Re: kernel leaking TCP_MEM

2024-04-24 Thread Terra Field
Thank you, that is great to hear!



[Kernel-packages] [Bug 2037335] Re: kernel leaking TCP_MEM

2024-04-24 Thread Jonathan Heathcote
Hello there,

I think this may be the same issue as
https://bugs.launchpad.net/ubuntu/+source/linux-signed-aws-6.2/+bug/2045560

I believe this might be related to the following kernel bug which
impacts Linux 6.0.0+:

https://lore.kernel.org/netdev/vi1pr01mb42407d7947b2ea448f1e04efd1...@vi1pr01mb4240.eurprd01.prod.exchangelabs.com/

A patch has been produced which fixes this issue (but has not yet made
it into a Linux release):

https://lore.kernel.org/all/20240421175248.1692552-1-eduma...@google.com/

Hope this helps!

Jonathan



[Kernel-packages] [Bug 2037335] Re: kernel leaking TCP_MEM

2023-09-26 Thread Terra Field
Rebuilt the AMI with 5.15.0-1045-aws and the problem is gone.
