Curious: I have a very similar case at the moment, with a slab of ~155GB out of ~190GB of installed RAM.

I am not sure yet what causes it, but the things I plan to investigate are:

* a hanging NFS mount
* a (PVE) service possibly starting too many threads; I plan to restart each service in turn and check the memory / slab usage afterwards.
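
A minimal sketch of the "restart a service, then check the slab usage" step, assuming the standard Slab, SReclaimable and SUnreclaim fields of /proc/meminfo. The function names are hypothetical and only for illustration:

```python
def parse_meminfo(text):
    """Return {field: value} for the 'Name:   12345 kB' lines of /proc/meminfo."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only lines of the form "Name: <integer> [unit]".
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def slab_summary(text):
    """Pick the three slab-related counters (in kB) out of a meminfo dump."""
    info = parse_meminfo(text)
    return {key: info.get(key, 0)
            for key in ("Slab", "SReclaimable", "SUnreclaim")}


def slab_delta(before, after):
    """Change in kB per counter; negative means memory was released."""
    return {key: after[key] - before[key] for key in before}
```

Calling slab_summary(open("/proc/meminfo").read()) before and after restarting a service, and passing both results to slab_delta, shows how much slab memory (if any) that restart released.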



On 9/20/19 2:31 PM, Chris Hofstaedtler | Deduktiva wrote:
Hi,

I'm seeing a very interesting problem on PVE6: one of our machines
appears to leak kernel memory over time, up to the point where only
a reboot helps. Shutting down all KVM VMs does not release this
memory.

I'll attach some information below, because I just couldn't figure
out what this memory is used for: one snapshot from before shutting
down the VMs, and one from after. I had to reboot the PVE host now,
but I guess in a few days it will be at least noticeable again.

This machine has the same (except CPU) hardware as the box next to
it; however this one was freshly installed with PVE6, the other one
is an upgrade from PVE5 and doesn't exhibit this problem. It's quite
puzzling because I haven't seen this symptom at any of the
customer installations.

Here are some graphs showing the memory consumption over time:
   http://zeha.at/~ch/T/20190920-pve6_meminfo_0.png
   http://zeha.at/~ch/T/20190920-pve6_meminfo_1.png

Looking forward to any debug help, suggestions, ...

Chris


*** Almost out of memory, before VM shutdown: ***

top - 10:24:19 up 22 days, 22:29,  1 user,  load average: 1.85, 1.57, 1.32
Tasks: 530 total,   1 running, 529 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1.8 us,  0.4 sy,  0.0 ni, 97.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  80413.1 total,    509.9 free,  70879.7 used,   9023.5 buff/cache
MiB Swap:  20480.0 total,   6516.6 free,  13963.4 used.   8699.0 avail Mem

     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
    3183 root      20   0   10.6g   6.0g   2960 S   8.7   7.6   5861:52 /usr/bin/kvm -id 103 -name puppet -chardev socket,id=qmp,path=/var/run/qemu-server/103.qmp,server,nowait -mon chardev=qmp,mode=control -chardev socket,id=qmp-event+
    3349 root      20   0 9266032   4.3g   2972 S   6.8   5.4   3834:41 /usr/bin/kvm -id 2017 -name go-test-srv01 -chardev socket,id=qmp,path=/var/run/qemu-server/2017.qmp,server,nowait -mon chardev=qmp,mode=control -chardev socket,id=+
    3068 root      20   0 5060928   3.7g   2900 S   6.8   4.7   3110:01 /usr/bin/kvm -id 101 -name backup -chardev socket,id=qmp,path=/var/run/qemu-server/101.qmp,server,nowait -mon chardev=qmp,mode=control -chardev socket,id=qmp-event+
    3399 root      20   0 5094772   2.3g   2944 S  50.5   2.9  10780:07 /usr/bin/kvm -id 3002 -name monitor01 -chardev socket,id=qmp,path=/var/run/qemu-server/3002.qmp,server,nowait -mon chardev=qmp,mode=control -chardev socket,id=qmp-+
    3254 root      20   0   32.8g   1.9g   3040 S   1.0   2.4 490:39.29 /usr/bin/kvm -id 2005 -name debbuild -chardev socket,id=qmp,path=/var/run/qemu-server/2005.qmp,server,nowait -mon chardev=qmp,mode=control -chardev socket,id=qmp-e+
    2994 root      20   0 2656268 658428   2980 S   9.7   0.8   2895:15 /usr/bin/kvm -id 100 -name pbx -chardev socket,id=qmp,path=/var/run/qemu-server/100.qmp,server,nowait -mon chardev=qmp,mode=control -chardev socket,id=qmp-event,pa+
    2927 root      20   0 2664232 479372   2944 S   6.8   0.6   2343:43 /usr/bin/kvm -id 102 -name ns1 -chardev socket,id=qmp,path=/var/run/qemu-server/102.qmp,server,nowait -mon chardev=qmp,mode=control -chardev socket,id=qmp-event,pa+
    2417 root      rt   0  606912 211336  51444 S   1.9   0.3 613:27.87 /usr/sbin/corosync -f
 2023020 root      20   0  246556  98020  97044 S   0.0   0.1  15:47.80 /lib/systemd/systemd-journald
    1806 root      20   0  967944  32724  23612 S   0.0   0.0  53:49.62 /usr/bin/pmxcfs
    2801 root      20   0  314488  32428   6464 S   0.0   0.0 322:58.23 pvestatd +
 3771741 root      20   0  150776  31728   3700 S   0.0   0.0   0:12.81 /opt/puppetlabs/puppet/bin/ruby /opt/puppetlabs/puppet/bin/puppet agent --no-daemonize
    2799 root      20   0  316056  27452   5656 S   0.0   0.0  95:49.25 pve-firewall +
    2909 root      20   0  325248  12684   5268 S   1.0   0.0   7:03.91 pve-ha-lrm +
  868033 ch        20   0   21660   9104   7280 S   0.0   0.0   0:00.12 /lib/systemd/systemd --user
  868009 root      20   0   16912   7988   6856 S   0.0   0.0   0:00.03 sshd: ch [priv]
       1 root      20   0  171820   7640   5032 S   0.0   0.0  19:58.80 /lib/systemd/systemd --system --deserialize 37
    2876 root      20   0  325544   7124   4988 S   0.0   0.0   4:18.16 pve-ha-crm +
    1654 Debian-+  20   0   40488   7096   2864 S   0.0   0.0  77:37.18 /usr/sbin/snmpd -Lsd -Lf /dev/null -u Debian-snmp -g Debian-snmp -I -smux mteTrigger mteTriggerConf -f -p /run/snmpd.pid
  868045 ch        20   0   10240   5404   3996 S   0.0   0.0   0:00.11 -zsh
  868044 ch        20   0   16912   4636   3492 S   0.0   0.0   0:00.02 sshd: ch@pts/0
    1644 root      20   0   29608   4520   3496 S   0.0   0.0   4:59.62 /usr/bin/python3 /usr/share/unattended-upgrades/unattended-upgrade-shutdown --wait-for-signal
  868336 root      20   0    7716   4372   3092 S   0.0   0.0   0:00.03 -bash
 1761096 root      20   0  351564   4180   3336 S   0.0   0.0   1:12.83 pvedaemon worker +
 1776171 root      20   0  351696   4076   3352 S   0.0   0.0   1:18.27 pvedaemon worker +
  868370 root      20   0   11680   4016   2964 R   2.9   0.0   0:00.68 top
 1780591 root      20   0  351696   4008   3248 S   0.0   0.0   1:11.73 pvedaemon worker +
    1086 root      20   0   19540   3984   3720 S   0.0   0.0   3:11.21 /lib/systemd/systemd-logind
  868335 root      20   0   10156   3788   3364 S   0.0   0.0   0:00.01 sudo -i
    2899 www-data  20   0  121256   3412   3080 S   0.0   0.0   0:33.99 spiceproxy +
 2000791 www-data  20   0  344932   3412   2604 S   0.0   0.0   1:16.39 pveproxy worker +
 2000792 www-data  20   0  344932   3348   2604 S   0.0   0.0   1:07.07 pveproxy worker +
    1251 root      20   0  225816   3296   2424 S   0.0   0.0   9:47.44 /usr/sbin/rsyslogd -n -iNONE
    1258 message+  20   0    9212   3268   2820 S   0.0   0.0   6:41.36 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only

root@vn03:~# uname -a
Linux vn03 5.0.21-1-pve #1 SMP PVE 5.0.21-1 (Tue, 20 Aug 2019 17:16:32 +0200) x86_64 GNU/Linux
root@vn03:~# free -m
               total        used        free      shared  buff/cache   available
Mem:          80413       70877         515         101        9019        8708
Swap:         20479       13963        6516
root@vn03:~# dpkg -l pve\*
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                    Version      Architecture Description
+++-=======================-============-============-======================================================
ii  pve-cluster             6.0-5        amd64        Cluster Infrastructure for Proxmox Virtual Environment
ii  pve-container           3.0-5        all          Proxmox VE Container management tool
ii  pve-docs                6.0-4        all          Proxmox VE Documentation
ii  pve-edk2-firmware       2.20190614-1 all          edk2 based firmware modules for virtual machines
ii  pve-firewall            4.0-7        amd64        Proxmox VE Firewall
ii  pve-firmware            3.0-2        all          Binary firmware code for the pve-kernel
ii  pve-ha-manager          3.0-2        amd64        Proxmox VE HA Manager
ii  pve-i18n                2.0-2        all          Internationalization support for Proxmox VE
un  pve-kernel              <none>       <none>       (no description available)
ii  pve-kernel-5.0          6.0-7        all          Latest Proxmox VE Kernel Image
ii  pve-kernel-5.0.15-1-pve 5.0.15-1     amd64        The Proxmox PVE Kernel Image
ii  pve-kernel-5.0.18-1-pve 5.0.18-3     amd64        The Proxmox PVE Kernel Image
ii  pve-kernel-5.0.21-1-pve 5.0.21-1     amd64        The Proxmox PVE Kernel Image
ii  pve-kernel-helper       6.0-7        all          Function for various kernel maintenance tasks.
un  pve-kvm                 <none>       <none>       (no description available)
ii  pve-manager             6.0-6        amd64        Proxmox Virtual Environment Management Tools
ii  pve-qemu-kvm            4.0.0-5      amd64        Full virtualization on x86 hardware
un  pve-qemu-kvm-2.6.18     <none>       <none>       (no description available)
ii  pve-xtermjs             3.13.2-1     all          HTML/JS Shell client
root@vn03:~# slabtop -o | head -50
  Active / Total Objects (% used)    : 205425461 / 212231433 (96.8%)
  Active / Total Slabs (% used)      : 4949759 / 4949759 (100.0%)
  Active / Total Caches (% used)     : 114 / 161 (70.8%)
  Active / Total Size (% used)       : 60112896.56K / 60714678.54K (99.0%)
  Minimum / Average / Maximum Object : 0.01K / 0.29K / 16.62K

   OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
43583592 43542487  99%    0.20K 1117528       39   8940224K vm_area_struct
26520256 26518592  99%    0.06K 414379       64   1657516K anon_vma_chain
16788000 16434450  97%    0.25K 524625       32   4197000K filp
13079680 13078464  99%    0.03K 102185      128    408740K kmalloc-32
11544320 5261058  45%    0.06K 180380       64    721520K dmaengine-unmap-2
10128740 10127452  99%    0.09K 220190       46    880760K anon_vma
9602484 9602484 100%    0.04K  94142      102    376568K pde_opener
7442736 7442572  99%    0.19K 177208       42   1417664K cred_jar
7213200 7209695  99%    0.13K 240440       30    961760K kernfs_node_cache
6023850 5992341  99%    0.19K 143425       42   1147400K dentry
5704350 5704350 100%    0.08K 111850       51    447400K task_delay_info
5054066 5054066 100%    0.69K 109871       46   3515872K files_cache
4664512 4664481  99%    0.12K 145766       32    583064K pid
4591440 4591440 100%    1.06K 153048       30   4897536K mm_struct
4207445 4203908  99%    0.58K  76499       55   2447968K inode_cache
4104480 4104291  99%    0.62K  80480       51   2575360K sock_inode_cache
3901440 3900588  99%    0.06K  60960       64    243840K kmalloc-64
3856230 3856160  99%    1.06K 128541       30   4113312K signal_cache
3423826 3417982  99%    0.65K  69874       49   2235968K proc_inode_cache
3139584 3138382  99%    0.01K   6132      512     24528K kmalloc-8
2983344 2983255  99%    0.19K  71032       42    568256K kmalloc-192
2426976 2426413  99%    1.00K  75843       32   2426976K kmalloc-1k
1939854 1931355  99%    0.09K  46187       42    184748K kmalloc-96
1649895 1649895 100%    2.06K 109993       15   3519776K sighand_cache
1280544 1280544 100%    1.00K  40017       32   1280544K UNIX
1052928 1050819  99%    0.50K  32904       32    526464K kmalloc-512
1029792 1029312  99%    0.25K  32181       32    257448K skbuff_head_cache
940624 940559  99%    4.00K 117578        8   3762496K kmalloc-4k
799895 787069  98%    5.75K 159979        5   5119328K task_struct
735696 724643  98%    0.10K  18864       39     75456K buffer_head
525504 525378  99%    2.00K  32844       16   1051008K kmalloc-2k
433024 426780  98%    0.06K   6766       64     27064K kmem_cache_node
310710 301758  97%    1.05K  10357       30    331424K ext4_inode_cache
292340 290078  99%    0.68K   6220       47    199040K shmem_inode_cache
215250 214814  99%    0.38K   5125       42     82000K kmem_cache
212296 196761  92%    0.57K   7582       28    121312K radix_tree_node
158464 158464 100%    0.02K    619      256      2476K kmalloc-16
149925 149925 100%    1.25K   5997       25    191904K UDPv6
  71424  71140  99%    0.12K   2232       32      8928K kmalloc-128
  70020  70020 100%    0.16K   1376       51     11008K kvm_mmu_page_header
  40032  40009  99%    0.25K   1251       32     10008K kmalloc-256
  34944  33823  96%    0.09K    832       42      3328K kmalloc-rcl-96
  34816  32567  93%    0.06K    544       64      2176K kmalloc-rcl-64
root@vn03:~# pct list
root@vn03:~# qm list
       VMID NAME                 STATUS     MEM(MB)    BOOTDISK(GB) PID
        100 pbx                  running    2048              16.00 2994
        101 backup               running    4096              32.00 3068
        102 ns1                  running    2048              32.00 2927
        103 puppet               running    10240             16.00 3183
       2005 debbuild             running    32768             40.00 3254
       2017 go-test-srv01        running    8192              20.00 3349
       3002 monitor01            running    4096              32.00 3399
       5001 salsa-runner-01      stopped    16384             32.00 0
       6001 deduktiva-runner-01  stopped    32768             32.00 0
       6901 mac                  stopped    4096               0.25 0
root@vn03:~# sysctl -a | grep hugepages
vm.nr_hugepages = 0
vm.nr_hugepages_mempolicy = 0
vm.nr_overcommit_hugepages = 0


*** After shutdown of all VMs: ***

top - 10:39:56 up 22 days, 22:44,  2 users,  load average: 0.83, 1.84, 1.88
Tasks: 491 total,   1 running, 490 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.1 us,  0.0 sy,  0.0 ni, 99.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  80413.1 total,  18276.4 free,  52704.9 used,   9431.8 buff/cache
MiB Swap:  20480.0 total,  19393.6 free,   1086.4 used.  26801.1 avail Mem

     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
    2417 root      rt   0  606908 211332  51444 S   1.0   0.3 613:46.50 /usr/sbin/corosync -f
    2878 www-data  20   0  344800 133424  21784 S   0.0   0.2   0:36.09 pveproxy +
  883317 www-data  20   0  361776 133084  11056 S   0.0   0.2   0:01.04 pveproxy worker +
    2836 root      20   0  343228 132060  21764 S   0.0   0.2   0:38.88 pvedaemon +
  883319 www-data  20   0  360688 130992  11148 S   1.0   0.2   0:01.26 pveproxy worker +
  883318 www-data  20   0  358056 128864  11148 S   0.0   0.2   0:01.75 pveproxy worker +
  883166 root      20   0  351912 121884  10220 S   0.0   0.1   0:00.96 pvedaemon worker +
  883165 root      20   0  351848 121584   9952 S   0.0   0.1   0:00.40 pvedaemon worker +
  883164 root      20   0  351712 121560  10060 S   0.0   0.1   0:00.65 pvedaemon worker +
    2801 root      20   0  307252  92952  20996 S   0.0   0.1 323:07.31 pvestatd +
 2023020 root      20   0  267408  90508  89344 S   0.0   0.1  15:48.85 /lib/systemd/systemd-journald
    2899 www-data  20   0  121260  59804  12212 S   0.0   0.1   0:34.77 spiceproxy +
  883544 www-data  20   0  121500  51260   3448 S   0.0   0.1   0:00.05 spiceproxy worker +
  876236 root      20   0  524564  50188  37612 S   0.0   0.1   0:01.90 /usr/bin/pmxcfs
 3771741 root      20   0  150776  30880   3264 S   0.0   0.0   0:12.86 /opt/puppetlabs/puppet/bin/ruby /opt/puppetlabs/puppet/bin/puppet agent --no-daemonize
    2799 root      20   0  316112  28352   5840 S   0.0   0.0  95:51.91 pve-firewall +
    2909 root      20   0  325212  14196   5404 S   0.0   0.0   7:04.14 pve-ha-lrm +
    2876 root      20   0  325564   9600   5224 S   0.0   0.0   4:18.33 pve-ha-crm +
  868033 ch        20   0   21660   8844   7020 S   0.0   0.0   0:00.14 /lib/systemd/systemd --user

root@vn03:~# free -m
               total        used        free      shared  buff/cache   available
Mem:          80413       52700       18281         115        9431       26805
Swap:         20479        1086       19393
root@vn03:~# slabtop -o | head -50
  Active / Total Objects (% used)    : 199865696 / 200976971 (99.4%)
  Active / Total Slabs (% used)      : 4771440 / 4771440 (100.0%)
  Active / Total Caches (% used)     : 114 / 161 (70.8%)
  Active / Total Size (% used)       : 59688763.91K / 59945034.02K (99.6%)
  Minimum / Average / Maximum Object : 0.01K / 0.30K / 16.62K

   OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
43540380 43499279  99%    0.20K 1116420       39   8931360K vm_area_struct
26459776 26457217  99%    0.06K 413434       64   1653736K anon_vma_chain
16782720 16429406  97%    0.25K 524460       32   4195680K filp
13075712 13074728  99%    0.03K 102154      128    408616K kmalloc-32
10104728 10103625  99%    0.09K 219668       46    878672K anon_vma
9599628 9599628 100%    0.04K  94114      102    376456K pde_opener
7442106 7442024  99%    0.19K 177193       42   1417544K cred_jar
7211280 7207550  99%    0.13K 240376       30    961504K kernfs_node_cache
5999322 5970370  99%    0.19K 142841       42   1142728K dentry
5691447 5691447 100%    0.08K 111597       51    446388K task_delay_info
5052594 5052594 100%    0.69K 109839       46   3514848K files_cache
4657408 4657315  99%    0.12K 145544       32    582176K pid
4590750 4590721  99%    1.06K 153025       30   4896800K mm_struct
4206400 4202839  99%    0.58K  76480       55   2447360K inode_cache
4091424 4091235  99%    0.62K  80224       51   2567168K sock_inode_cache
3903104 3901440  99%    0.06K  60986       64    243944K kmalloc-64
3855600 3855530  99%    1.06K 128520       30   4112640K signal_cache
3416133 3410170  99%    0.65K  69717       49   2230944K proc_inode_cache
3124224 3123017  99%    0.01K   6102      512     24408K kmalloc-8
2982840 2982826  99%    0.19K  71020       42    568160K kmalloc-192
2425760 2424977  99%    1.00K  75805       32   2425760K kmalloc-1k
1940694 1932266  99%    0.09K  46207       42    184828K kmalloc-96
1649415 1649346  99%    2.06K 109961       15   3518752K sighand_cache
1279520 1279520 100%    1.00K  39985       32   1279520K UNIX
1043392 1040142  99%    0.50K  32606       32    521696K kmalloc-512
1021152 1020672  99%    0.25K  31911       32    255288K skbuff_head_cache
938880 938777  99%    4.00K 117360        8   3755520K kmalloc-4k
797715 784886  98%    5.75K 159543        5   5105376K task_struct
713388 699031  97%    0.10K  18292       39     73168K buffer_head
643008  73139  11%    0.06K  10047       64     40188K dmaengine-unmap-2
525520 525326  99%    2.00K  32845       16   1051040K kmalloc-2k
432768 426806  98%    0.06K   6762       64     27048K kmem_cache_node
308100 298326  96%    1.05K  10270       30    328640K ext4_inode_cache
292387 289915  99%    0.68K   6221       47    199072K shmem_inode_cache
215250 214971  99%    0.38K   5125       42     82000K kmem_cache
212380 180327  84%    0.57K   7585       28    121360K radix_tree_node
157952 157952 100%    0.02K    617      256      2468K kmalloc-16
150150 150150 100%    1.25K   6006       25    192192K UDPv6
  71008  70660  99%    0.12K   2219       32      8876K kmalloc-128
  40064  40056  99%    0.25K   1252       32     10016K kmalloc-256
  34986  34259  97%    0.09K    833       42      3332K kmalloc-rcl-96
  34368  32733  95%    0.06K    537       64      2148K kmalloc-rcl-64
  33660  33300  98%    0.05K    396       85      1584K ftrace_event_field
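
To compare the two slabtop listings without eyeballing them, a minimal sketch along these lines may help. It assumes the eight-column `slabtop -o` row layout shown above; the function names are hypothetical, and the sample rows are copied from the two listings in this mail:

```python
def parse_slabtop(text):
    """Map cache name -> CACHE SIZE in KiB for each data row of `slabtop -o`."""
    caches = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows: OBJS ACTIVE USE OBJ_SIZE SLABS OBJ/SLAB CACHE_SIZE NAME
        if len(fields) == 8 and fields[0].isdigit() and fields[6].endswith("K"):
            caches[fields[7]] = int(fields[6][:-1])
    return caches


def diff_caches(before, after):
    """Return (name, before, after, delta) rows, largest shrinkage first."""
    names = set(before) | set(after)
    rows = [(n, before.get(n, 0), after.get(n, 0),
             after.get(n, 0) - before.get(n, 0)) for n in names]
    return sorted(rows, key=lambda row: (row[3], row[0]))


# Three rows from each listing above (before / after VM shutdown).
BEFORE = """\
43583592 43542487  99%    0.20K 1117528       39   8940224K vm_area_struct
11544320 5261058  45%    0.06K 180380       64    721520K dmaengine-unmap-2
799895 787069  98%    5.75K 159979        5   5119328K task_struct
"""
AFTER = """\
43540380 43499279  99%    0.20K 1116420       39   8931360K vm_area_struct
797715 784886  98%    5.75K 159543        5   5105376K task_struct
643008  73139  11%    0.06K  10047       64     40188K dmaengine-unmap-2
"""

if __name__ == "__main__":
    for name, b, a, delta in diff_caches(parse_slabtop(BEFORE), parse_slabtop(AFTER)):
        print(f"{name:20s} {b:>10d}K {a:>10d}K {delta:>+10d}K")
```

On these three rows only dmaengine-unmap-2 shrinks substantially (by about 681000K); vm_area_struct and task_struct change by less than 14000K each.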



typical VM config:

balloon: 0
bootdisk: virtio0
cores: 2
cpu: Haswell-noTSX
ide2: none,media=cdrom
memory: 4096
name: backup
net0: virtio=52:54:00:b7:e0:ba,bridge=vmbr100
numa: 0
onboot: 1
ostype: l26
scsihw: virtio-scsi-pci
serial0: socket
smbios1: uuid=39d362a5-6bae-41b7-9803-b76279e2280f
sockets: 1
virtio0: datastore:vm-101-disk-1,cache=writeback,size=32G
virtio1: datastore:vm-101-disk-2,cache=writeback,size=100G



_______________________________________________
pve-user mailing list
pve-user@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user