Patches submitted: https://lists.ubuntu.com/archives/kernel-team/2021-April/119661.html
** Changed in: linux (Ubuntu Bionic)
Assignee: (unassigned) => Tim Gardner (timg-tpi)
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1926081
Title:
nr_writeback memory leak in kernel 4.15.0-137+
Status in linux package in Ubuntu:
In Progress
Status in linux source package in Bionic:
In Progress
Bug description:
SRU Justification
[Impact]
Ubuntu 18.04.5 4.15.0 LTS kernels at version 4.15.0-137 and above
contain a memory leak because a patch from the upstream kernel was
included without the follow-up fix for that patch, which was released
later.
Bad patch in bionic:linux 2c17fa778db85644458b52a7df8eacc402cbc1ef mm:
memcontrol: fix excessive complexity in memory.stat reporting
This issue manifests itself as an increasing amount of memory used by
the writeback queue, which never returns to zero. This can be seen
either as the value of `nr_writeback` in /proc/vmstat, or the value of
`Writeback` in /proc/meminfo.
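Both counters can be read with a couple of small helpers. This is only
a sketch: the function names `vmstat_value` and `meminfo_writeback_kb`
are made up for illustration, and they read from stdin so they can be
pointed at a saved copy of the files as well as the live ones.

```shell
# Print the value of one counter from /proc/vmstat-style input
# ("name value" per line). The match is exact, so "nr_writeback"
# does not also match "nr_writeback_temp".
vmstat_value() {
    awk -v key="$1" '$1 == key { print $2 }'
}

# Print the Writeback figure in kB from /proc/meminfo-style input
# ("Writeback:   <value> kB").
meminfo_writeback_kb() {
    awk '$1 == "Writeback:" { print $2 }'
}

# Usage on a live system:
#   vmstat_value nr_writeback < /proc/vmstat
#   meminfo_writeback_kb < /proc/meminfo
```

On a healthy idle system both commands should print a value at or
near zero.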
Ordinarily these values should be at or around zero, but on our
servers we observe the `nr_writeback` value increasing to over 8
million pages (about 32GB of memory at 4KiB per page), at which point
it isn't long before system IO slows to a crawl (tens of Kb/s). Our
servers have 256GB of memory and perform many CI-related activities.
The issue appears to be related to concurrent writing to disk, and
can be demonstrated with a simple testcase (see later).
On our heavily used systems this memory leak can result in an unstable
server after 2-3 days, requiring a reboot to fix it.
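To relate an `nr_writeback` page count to an amount of memory, the
count is multiplied by the page size. The helper below is an
illustrative sketch (the name `pages_to_mib` is hypothetical) and
assumes 4096-byte pages unless told otherwise; `getconf PAGESIZE`
reports the real value on a given machine.

```shell
# Convert a page count to whole MiB (integer division, rounds down).
# $1 = number of pages, $2 = page size in bytes (assumed 4096 if omitted).
pages_to_mib() {
    pages=$1
    page_size=${2:-4096}
    echo $(( pages * page_size / 1048576 ))
}

# Usage:
#   pages_to_mib 8000000                       # the figure quoted above
#   pages_to_mib "$(awk '$1 == "nr_writeback" { print $2 }' /proc/vmstat)" \
#                "$(getconf PAGESIZE)"
```

For 8,000,000 pages of 4096 bytes this gives 31250 MiB, which is
32,768,000,000 bytes, consistent with the roughly 32GB quoted above.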
After much investigation, the cause appears to be that the patch
"mm: memcontrol: fix excessive complexity in memory.stat reporting"
was brought into the 4.15.0-137 Ubuntu kernel (see
https://launchpad.net/ubuntu/+source/linux/4.15.0-137.141) as part of
"Bionic update: upstream stable patchset 2021-01-25 (LP: #1913214)".
However, the mainline kernel later received a follow-up patch because
this initial patch introduced concurrency issues. That follow-up, "mm:
memcontrol: fix NR_WRITEBACK leak in memcg and system stats", is
required and should be brought into the Ubuntu packaged kernel to fix
the issues reported.
The required patch is here:
https://github.com/torvalds/linux/commit/c3cc39118c3610eb6ab4711bc624af7fc48a35fe
and was committed a few weeks after the original (broken) patch:
https://github.com/torvalds/linux/commit/a983b5ebee57209c99f68c8327072f25e0e6e3da
I have checked the release notes for Ubuntu versions -137 to -143, and
none include this second patch that should fix the issue. (I checked
https://people.canonical.com/~kernel/info/kernel-version-map.html for
all the kernel versions, and then visited each changelog page in turn,
e.g. https://launchpad.net/ubuntu/+source/linux/4.15.0-143.147 looking
for "mm: memcontrol: fix NR_WRITEBACK leak in memcg and system
stats").
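The manual changelog check described above could be scripted roughly
as follows. This is a sketch under assumptions: `changelog_has_fix`
is a hypothetical helper, and `changelog.txt` stands for changelog
text you have already saved locally (for example, copied from one of
the Launchpad changelog pages).

```shell
# Subject line of the follow-up fix, as quoted in this report.
FIX_SUBJECT='mm: memcontrol: fix NR_WRITEBACK leak in memcg and system stats'

# Succeed (exit 0) if the changelog text on stdin mentions the fix.
# -F treats the subject as a fixed string, -q suppresses output.
changelog_has_fix() {
    grep -qF -- "$FIX_SUBJECT"
}

# Usage:
#   if changelog_has_fix < changelog.txt; then
#       echo "fix present"
#   else
#       echo "fix missing"
#   fi
```

A changelog entry that wraps the subject across two lines would not
match, so a negative result is worth confirming by eye.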
We do not observe this on the 5.4.0 kernel (the supported HWE kernel on
18.04.5), which includes this second patch. That kernel may also
include other patches, so we do not know whether any other fixes are
also required, but the one linked above appears to be needed and
matches our symptoms.
[Test Plan]
Testcase:
The following is enough to permanently increase the value of
`nr_writeback` on our systems (by about 2000 during most executions):
```
# Record the start time and the starting nr_writeback value.
date
grep nr_writeback /proc/vmstat
mkdir -p /docker/testfiles/{1..5}
# Start five concurrent writers, each creating 100000 files of 40KiB.
seq -w 1 100000 | xargs -n1 -I% sh -c 'dd if=/dev/urandom of=/docker/testfiles/1/file.% bs=4k count=10 status=none' &
seq -w 1 100000 | xargs -n1 -I% sh -c 'dd if=/dev/urandom of=/docker/testfiles/2/file.% bs=4k count=10 status=none' &
seq -w 1 100000 | xargs -n1 -I% sh -c 'dd if=/dev/urandom of=/docker/testfiles/3/file.% bs=4k count=10 status=none' &
seq -w 1 100000 | xargs -n1 -I% sh -c 'dd if=/dev/urandom of=/docker/testfiles/4/file.% bs=4k count=10 status=none' &
seq -w 1 100000 | xargs -n1 -I% sh -c 'dd if=/dev/urandom of=/docker/testfiles/5/file.% bs=4k count=10 status=none' &
# Wait for all writers to finish, then record the final value.
wait $(jobs -p)
grep nr_writeback /proc/vmstat
date
```
Subsequent iterations of the test raise it further, and on a system
doing a lot of writing from a lot of different processes, it can rise
quickly.
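To put a number on the rise across one or more iterations, a small
wrapper can snapshot `nr_writeback` before and after a command. This
is a sketch: `measure_writeback` and the `VMSTAT_FILE` override are
hypothetical names introduced here, and `./testcase.sh` in the usage
line stands for the testcase saved to a file.

```shell
# Run a command and print how much nr_writeback changed across it.
# VMSTAT_FILE can point at another file for dry runs; it defaults to
# /proc/vmstat.
measure_writeback() {
    vmstat_file=${VMSTAT_FILE:-/proc/vmstat}
    before=$(awk '$1 == "nr_writeback" { print $2 }' "$vmstat_file")
    # Send the command's own output to stderr so that stdout carries
    # only the delta.
    "$@" >&2
    after=$(awk '$1 == "nr_writeback" { print $2 }' "$vmstat_file")
    echo $(( after - before ))
}

# Usage:
#   measure_writeback bash ./testcase.sh
```

Because some writeback may legitimately still be in flight when the
command returns, it is worth waiting a little and re-reading the
counter before treating a positive delta as leaked.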
System details:
lsb_release -rd
Description: Ubuntu 18.04.5 LTS
Release: 18.04
Affected kernel: 4.15.0-137 onwards (current latest version tried was
4.15.0-142)
e.g.
apt-cache policy linux-image-4.15.0-141-generic
linux-image-4.15.0-141-generic:
Installed: 4.15.0-141.145
Candidate: 4.15.0-141.145
Version table:
*** 4.15.0-141.145 500
500 http://mirrors.service.networklayer.com/ubuntu bionic-updates/main amd64 Packages
500 http://mirrors.service.networklayer.com/ubuntu bionic-security/main amd64 Packages
100 /var/lib/dpkg/status
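A quick way to flag hosts running a kernel in the affected range is
to compare the ABI number in the release string. This is a sketch
only: `is_affected_release` is a hypothetical helper, and it assumes
every 4.15.0 kernel from -137 upwards is affected, which held for the
versions checked in this report but will need an upper bound once a
fixed kernel is published.

```shell
# Succeed if the given release string (e.g. "4.15.0-141-generic" or
# "4.15.0-141.145") is a 4.15.0 kernel with ABI number >= 137.
is_affected_release() {
    case "$1" in
        4.15.0-*) ;;
        *) return 1 ;;
    esac
    abi=${1#4.15.0-}       # "141-generic" or "141.145"
    abi=${abi%%[-.]*}      # keep only the leading ABI number, "141"
    case "$abi" in
        ''|*[!0-9]*) return 1 ;;   # not a plain number
    esac
    [ "$abi" -ge 137 ]
}

# Usage:
#   if is_affected_release "$(uname -r)"; then
#       echo "kernel is in the affected range"
#   fi
```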
According to https://wiki.ubuntu.com/KernelTeam/KernelTeamBugPolicies
I should include additional information from the server, but at this
stage we have upgraded all our affected systems to 5.4.0, and
therefore the kernel versions do not match those with this issue.
We likely have other, less heavily loaded servers in other services
that have not been as affected by this issue, and I may be able to get
the equivalent diagnostics from there after confirming that they
demonstrate the same issue with my testcase.
Workaround:
After several weeks narrowing this down, our only option was to
upgrade our servers to the 5.4 kernel, which is included as the HWE
kernel in 18.04.5:
apt update && apt install --install-recommends -y linux-generic-hwe-18.04
We have now upgraded most of our heavily used systems, where this is a
major issue, to the 5.4.0 kernel. Many of our colleagues are not able
to do the same, and the issue affects them to varying degrees
depending on the nature of their workloads.
---
ProblemType: Bug
AlsaDevices:
total 0
crw-rw---- 1 root audio 116, 1 Apr 27 04:12 seq
crw-rw---- 1 root audio 116, 33 Apr 27 04:12 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
ApportVersion: 2.20.9-0ubuntu7.23
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
DistroRelease: Ubuntu 18.04
HibernationDevice: RESUME=UUID=e38970cc-bdc9-406f-9f41-e8b02cfa48d7
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
MachineType: Supermicro PIO-848B-TRF4T-ST031
Package: linux (not installed)
PciMultimedia:
ProcFB: 0 astdrmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.15.0-141-generic root=UUID=102d359f-6a99-403b-ac57-ff2a5fc1246a ro
ProcVersionSignature: Ubuntu 4.15.0-141.145-generic 4.15.18
RelatedPackageVersions:
linux-restricted-modules-4.15.0-141-generic N/A
linux-backports-modules-4.15.0-141-generic N/A
linux-firmware 1.173.20
RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
Tags: bionic
Uname: Linux 4.15.0-141-generic x86_64
UnreportableReason: This report is about a package that is not installed.
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:
WifiSyslog:
_MarkForUpload: False
dmi.bios.date: 10/18/2016
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 2.1
dmi.board.asset.tag: IBM SoftLayer
dmi.board.name: X10QBi
dmi.board.vendor: Supermicro
dmi.board.version: 1.01A
dmi.chassis.asset.tag: IBM SoftLayer
dmi.chassis.type: 1
dmi.chassis.vendor: Supermicro
dmi.chassis.version: To Be Filled By O.E.M.
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr2.1:bd10/18/2016:svnSupermicro:pnPIO-848B-TRF4T-ST031:pvr123456789:rvnSupermicro:rnX10QBi:rvr1.01A:cvnSupermicro:ct1:cvrToBeFilledByO.E.M.:
dmi.product.family: SMC X10
dmi.product.name: PIO-848B-TRF4T-ST031
dmi.product.version: 123456789
dmi.sys.vendor: Supermicro
[Where problems could occur]
Memory leakage could continue. The new spinlocks could cause some
performance degradation.
[Other Info]
These patches have been accepted to v4.14.y
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1926081/+subscriptions