This bug is missing log files that will aid in diagnosing the problem.
While running an Ubuntu kernel (not a mainline or third-party kernel)
please enter the following command in a terminal window:

apport-collect 2023143

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable
to run this command, please add a comment stating that fact and change
the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the
Ubuntu Kernel Team.

** Changed in: linux (Ubuntu)
       Status: New => Incomplete

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2023143

Title:
  Memory leak on large server

Status in linux package in Ubuntu:
  Incomplete

Bug description:
  Hi,
  We are trying to diagnose a kernel memory leak on a production Ubuntu 22.04.2 LTS server.
  We have tried several official Ubuntu kernels: 5.15aws, 5.19aws and now even 6.2.0-1004-aws (all Ubuntu signed):
  ```
  # cat /proc/version_signature
  Ubuntu 6.2.0-1004.4-aws 6.2.6
  ```

  This is a production server, so we'll appreciate any and all help diagnosing and solving this issue!
   
  The server is a u-12tb1.112xlarge instance with 12TB RAM, and is losing 1TB+ of memory a day to a kernel leak.
  For example, currently with an uptime of 3.5 days, we have only 1.8Ti available, yet RSS+slabs accounts for only 4.1TB.

  All active processes together take about 4TB of RAM (`ps -eo rss | awk 'BEGIN {x=0} {x = x + $1} END {print x}'` gives 4088636708 kB).
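
As a unit check, `ps` reports RSS in KiB, so the sum above can be converted with a minimal Python sketch like the one below. The helper name `kib_to_tib` is illustrative and not part of the original report. Note that summing RSS counts shared pages once per process, so the figure is best read as an upper bound on process memory.

```python
KIB_PER_TIB = 2**30  # 1 TiB = 2**30 KiB


def kib_to_tib(kib: int) -> float:
    """Convert a size in KiB (the unit `ps -o rss` reports) to TiB."""
    return kib / KIB_PER_TIB


if __name__ == "__main__":
    rss_sum_kib = 4088636708  # RSS sum quoted in the report
    print(f"{kib_to_tib(rss_sum_kib):.2f} TiB")
```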

  From slabtop we see that about 100GB is consumed by slab (`slabtop -o -s t | head`):
  ```
   Active / Total Objects (% used)    : 303580174 / 332642344 (91.3%)
   Active / Total Slabs (% used)      : 6697552 / 6697552 (100.0%)
   Active / Total Caches (% used)     : 158 / 215 (73.5%)
   Active / Total Size (% used)       : 112801663.93K / 121442845.45K (92.9%)
   Minimum / Average / Maximum Object : 0.01K / 0.36K / 16.00K

    OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
  67537280 59696907  88%    0.03K 527635      128   2110540K kmalloc-32
  65247564 65241398  99%    0.31K 1279364       51  20469824K arc_buf_hdr_t_full
  58270446 58040685  99%    0.10K 747057       78   5976456K abd_t
  16697268 13731405  82%    0.38K 397554       42   6360864K dmu_buf_impl_t
  15982912 10366686  64%    0.50K 249733       64   7991456K kmalloc-512
  14975616 11605380  77%    0.06K 233994       64    935976K kmalloc-64
  ```

  In /proc/meminfo:
  ```
  MemTotal:       12656421408 kB
  MemFree:        1975976204 kB
  MemAvailable:   1968415088 kB
  Buffers:         1087956 kB
  Cached:         101168004 kB
  SwapCached:     17912340 kB
  Active:         101022084 kB
  Inactive:       4129984264 kB
  Active(anon):   94623216 kB
  Inactive(anon): 4104673512 kB
  Active(file):    6398868 kB
  Inactive(file): 25310752 kB
  Unevictable:      338908 kB
  Mlocked:          332132 kB
  SwapTotal:      4294967292 kB
  SwapFree:       3500705532 kB
  Zswap:                 0 kB
  Zswapped:              0 kB
  Dirty:              2908 kB
  Writeback:             0 kB
  AnonPages:      4123489132 kB
  Mapped:          3761620 kB
  Shmem:          70756156 kB
  KReclaimable:   10319220 kB
  Slab:           122355620 kB
  SReclaimable:   10319220 kB
  SUnreclaim:     112036400 kB
  KernelStack:     1793296 kB
  PageTables:     21748556 kB
  SecPageTables:         0 kB
  NFS_Unstable:          0 kB
  Bounce:                0 kB
  WritebackTmp:          0 kB
  CommitLimit:    10623177996 kB
  Committed_AS:   6775476544 kB
  VmallocTotal:   34359738367 kB
  VmallocUsed:    296984480 kB
  VmallocChunk:          0 kB
  Percpu:          1326080 kB
  HardwareCorrupted:     0 kB
  AnonHugePages:  1630980096 kB
  ShmemHugePages:        0 kB
  ShmemPmdMapped:        0 kB
  FileHugePages:         0 kB
  FilePmdMapped:         0 kB
  HugePages_Total:       0
  HugePages_Free:        0
  HugePages_Rsvd:        0
  HugePages_Surp:        0
  Hugepagesize:       2048 kB
  Hugetlb:               0 kB
  DirectMap4k:     2056036 kB
  DirectMap2M:    40935424 kB
  DirectMap1G:    12814647296 kB
  ```
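
One rough way to quantify the gap is to subtract the consumers that `/proc/meminfo` does report from the memory in use. The Python sketch below does that for the values above. The choice of fields in `ACCOUNTED_FIELDS` is an assumption made for illustration, not an official accounting formula, and some categories may overlap, so the result is only an estimate.

```python
import re

# Fields treated as "explained" memory. This selection is an assumption made
# for illustration; some categories can overlap, so the result is approximate.
ACCOUNTED_FIELDS = (
    "AnonPages", "Cached", "Buffers", "SwapCached",
    "Slab", "KernelStack", "PageTables", "Percpu",
)


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: integer value}."""
    info = {}
    for line in text.splitlines():
        match = re.match(r"\s*([^:\s]+):\s+(\d+)", line)
        if match:
            info[match.group(1)] = int(match.group(2))
    return info


def unaccounted_kb(info: dict) -> int:
    """Return used memory (MemTotal - MemFree) minus the accounted fields, in kB."""
    used = info["MemTotal"] - info["MemFree"]
    explained = sum(info.get(name, 0) for name in ACCOUNTED_FIELDS)
    return used - explained


# Subset of the /proc/meminfo values quoted in the report.
SAMPLE = """
MemTotal:       12656421408 kB
MemFree:        1975976204 kB
Buffers:         1087956 kB
Cached:         101168004 kB
SwapCached:     17912340 kB
AnonPages:      4123489132 kB
Slab:           122355620 kB
KernelStack:     1793296 kB
PageTables:     21748556 kB
Percpu:          1326080 kB
"""

if __name__ == "__main__":
    gap_kb = unaccounted_kb(parse_meminfo(SAMPLE))
    print(f"unaccounted: {gap_kb} kB ({gap_kb / 2**30:.2f} TiB)")
```

Under these assumptions, roughly 5.9 TiB of the memory in use is not explained by the listed fields. Running the same functions on the live `/proc/meminfo` contents at intervals would show whether that gap keeps growing.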

  It's not a tmpfs/shm fs issue either:
  ```
  df -h | grep -E 'tmpfs|shm'
  tmpfs                                               256G   70G  187G  27% /dev/shm
  tmpfs                                               256G  3.4M  256G   1% /run
  tmpfs                                               5.0M     0  5.0M   0% /run/lock
  tmpfs                                               8.0G   24K  8.0G   1% /run/user/10102
  tmpfs                                               8.0G   24K  8.0G   1% /run/user/1002
  tmpfs                                               8.0G   24K  8.0G   1% /run/user/10030
  tmpfs                                               8.0G   24K  8.0G   1% /run/user/10194
  tmpfs                                               8.0G   24K  8.0G   1% /run/user/10200
  tmpfs                                               8.0G   24K  8.0G   1% /run/user/10136
  tmpfs                                               8.0G   24K  8.0G   1% /run/user/10198
  tmpfs                                               8.0G   24K  8.0G   1% /run/user/10143
  tmpfs                                               8.0G   24K  8.0G   1% /run/user/10188
  tmpfs                                               8.0G   24K  8.0G   1% /run/user/10124
  tmpfs                                               8.0G   24K  8.0G   1% /run/user/10174
  tmpfs                                               8.0G   24K  8.0G   1% /run/user/10165
  tmpfs                                               8.0G   24K  8.0G   1% /run/user/10197
  tmpfs                                               8.0G   24K  8.0G   1% /run/user/10183
  tmpfs                                               8.0G   24K  8.0G   1% /run/user/10033
  tmpfs                                               8.0G   24K  8.0G   1% /run/user/10023
  tmpfs                                               8.0G   24K  8.0G   1% /run/user/10133
  tmpfs                                               8.0G   24K  8.0G   1% /run/user/10185
  tmpfs                                               8.0G   24K  8.0G   1% /run/user/10201
  tmpfs                                               8.0G   24K  8.0G   1% /run/user/1004
  tmpfs                                               8.0G   24K  8.0G   1% /run/user/10014
  ```
  --- 
  ProblemType: Bug
  AlsaDevices: Error: command ['ls', '-l', '/dev/snd/'] failed with exit code 2: ls: cannot access '/dev/snd/': No such file or directory
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
  ApportVersion: 2.20.11-0ubuntu82.5
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
  CRDA: N/A
  CasperMD5CheckResult: unknown
  DistroRelease: Ubuntu 22.04
  Ec2AMI: ami-08c40ec9ead489470
  Ec2AMIManifest: (unknown)
  Ec2AvailabilityZone: us-east-1d
  Ec2InstanceType: u-12tb1.112xlarge
  Ec2Kernel: unavailable
  Ec2Ramdisk: unavailable
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
  Lspci: Error: [Errno 2] No such file or directory: 'lspci'
  Lspci-vt: Error: [Errno 2] No such file or directory: 'lspci'
  Lsusb: Error: command ['lsusb'] failed with exit code 1:
  Lsusb-t:
   
  Lsusb-v: Error: command ['lsusb', '-v'] failed with exit code 1:
  MachineType: Amazon EC2 u-12tb1.112xlarge
  NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
  Package: linux (not installed)
  PciMultimedia:
   
  ProcEnviron:
   LC_CTYPE=C.UTF-8
   TERM=xterm-256color
   PATH=(custom, no user)
   LANG=C.UTF-8
   SHELL=/bin/bash
  ProcFB:
   
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-6.2.0-1004-aws root=PARTUUID=cbb5015f-ca94-467b-91ae-cce97828a042 ro quiet mitigations=off console=tty1 console=ttyS0 nvme_core.io_timeout=4294967295 panic=-1
  ProcVersionSignature: Ubuntu 6.2.0-1004.4-aws 6.2.6
  RelatedPackageVersions:
   linux-restricted-modules-6.2.0-1004-aws N/A
   linux-backports-modules-6.2.0-1004-aws  N/A
   linux-firmware                          N/A
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
  Tags:  jammy ec2-images
  Uname: Linux 6.2.0-1004-aws x86_64
  UnreportableReason: This report is about a package that is not installed.
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups: N/A
  _MarkForUpload: False
  dmi.bios.date: 10/16/2017
  dmi.bios.release: 1.0
  dmi.bios.vendor: Amazon EC2
  dmi.bios.version: 1.0
  dmi.board.asset.tag: i-0b8914fe51e3d7555
  dmi.board.vendor: Amazon EC2
  dmi.chassis.asset.tag: Amazon EC2
  dmi.chassis.type: 1
  dmi.chassis.vendor: Amazon EC2
  dmi.modalias: dmi:bvnAmazonEC2:bvr1.0:bd10/16/2017:br1.0:svnAmazonEC2:pnu-12tb1.112xlarge:pvr:rvnAmazonEC2:rn:rvr:cvnAmazonEC2:ct1:cvr:sku:
  dmi.product.name: u-12tb1.112xlarge
  dmi.sys.vendor: Amazon EC2

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2023143/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
