Your message dated Mon, 19 Feb 2024 10:47:05 +0100
with message-id <[email protected]>
and subject line Re: Bug #1019503: postgresql-13: memory leak with JIT compilation
has caused the Debian Bug report #1019503,
regarding postgresql-13: memory leak with JIT compilation
to be marked as done.
This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.
(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact [email protected]
immediately.)
--
1019503: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1019503
Debian Bug Tracking System
Contact [email protected] with problems
--- Begin Message ---
Package: postgresql-13
Version: 13.7-0+deb11u1
Severity: important
We have found severe regressions when upgrading from buster to
bullseye on two of our PostgreSQL servers.
It seems like, in busy workloads, the JIT actually leaks memory, and a
lot of it. In this screenshot of a yearly Grafana dashboard, you can see
memory usage is fairly regular until the upgrade (early May), at which
point the server starts regularly swapping and eventually OOM'ing:
https://gitlab.torproject.org/tpo/tpa/team/uploads/41f8850ecc4b4170f56901b4018a9870/image.png
The internal ticket we filed about this has all the gory details,
which are probably too much for this bug report:
https://gitlab.torproject.org/tpo/tpa/team/-/issues/40815
We also had issues on other servers; more examples:
https://gitlab.torproject.org/tpo/tpa/team/-/issues/40814
While this may seem like a one-off thing that affects only certain
workloads (we certainly have other PostgreSQL servers that do not suffer
from this problem), when it *does* affect the workload, it's pretty
catastrophic. Hence the "important" severity ("major effect on the
usability of a package, without rendering it completely unusable to
everyone").
Also, it took us a long time to track down this problem... it's
basically only because the release notes of an unrelated project
(PuppetDB) happened to feature a similar bug report that we got the
hint this could be a problem:
https://tickets.puppetlabs.com/browse/PDB-5452
... which makes me think this problem might be more widespread than a
few workloads. It seems like DSA also had problems with the upgrade on
the sources.debian.org server, which, granted, is a huge server as
well, but I don't see why a large server should necessarily have this
kind of problem with PostgreSQL...
Past PostgreSQL upgrades have been basically flawless for us: the
procedure is a little disruptive (basically a dump/restore), but
apart from that, we have never seen such a huge regression in
performance. So I figured it was worth at least a bug report.
I'm not sure what should come out of this; I can't help but think this
is a bug in the JIT, but it's far beyond my capacity to even start
debugging this specifically. So maybe this could be forwarded
upstream? But in the meantime, maybe this could be fixed "simply" by
adding a note to the Debian bullseye release notes.
One should also check whether this behavior occurs in newer releases: we
briefly considered upgrading to 14 to see if this was still happening,
before finding the JIT trick, but have not done so (yet?).
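The report does not spell out "the JIT trick". Assuming it means turning JIT compilation off entirely through the `jit` server setting, here is a minimal Python sketch that prints the SQL a superuser could run in psql to apply it. The helper names and the example database name are hypothetical, chosen only for illustration.

```python
from typing import List, Optional


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def disable_jit_statements(database: Optional[str] = None) -> List[str]:
    """Return SQL statements that turn JIT compilation off (hypothetical helper).

    Without a database name, the setting is persisted cluster-wide with
    ALTER SYSTEM (which writes postgresql.auto.conf) and the configuration
    is reloaded, so no restart is needed. With a database name, only new
    sessions connecting to that database pick up the setting.
    """
    if database is None:
        return [
            "ALTER SYSTEM SET jit = off;",
            "SELECT pg_reload_conf();",
        ]
    return [f"ALTER DATABASE {quote_ident(database)} SET jit = off;"]


if __name__ == "__main__":
    # "puppetdb" is a hypothetical database name used only for illustration.
    for stmt in disable_jit_statements() + disable_jit_statements("puppetdb"):
        print(stmt)
```

The per-database variant limits the change to the affected workload, at the cost of leaving JIT enabled elsewhere in the cluster.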
Thank you for your attention,
a.
-- System Information:
Debian Release: 11.4
APT prefers stable-updates
APT policy: (500, 'stable-updates'), (500, 'stable-security'), (500, 'stable')
Architecture: amd64 (x86_64)
Kernel: Linux 5.10.0-17-amd64 (SMP w/2 CPU threads)
Locale: LANG=C.UTF-8, LC_CTYPE=C.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
Versions of packages postgresql-13 depends on:
ii  debconf [debconf-2.0]  1.5.77
ii  libc6                  2.31-13+deb11u3
ii  libgcc-s1              10.2.1-6
ii  libgssapi-krb5-2       1.18.3-6+deb11u1
ii  libicu67               67.1-7
ii  libldap-2.4-2          2.4.57+dfsg-3+deb11u1
ii  libllvm11              1:11.0.1-2
ii  libpam0g               1.4.0-9+deb11u1
ii  libpq5                 13.7-0+deb11u1
ii  libselinux1            3.1-3
ii  libssl1.1              1.1.1n-0+deb11u3
ii  libstdc++6             10.2.1-6
ii  libsystemd0            247.3-7
ii  libuuid1               2.36.1-8+deb11u1
ii  libxml2                2.9.10+dfsg-6.7+deb11u2
ii  libxslt1.1             1.1.34-4+deb11u1
ii  locales                2.31-13+deb11u3
ii  locales-all            2.31-13+deb11u3
ii  postgresql-client-13   13.7-0+deb11u1
ii  postgresql-common      225
ii  ssl-cert               1.1.0+nmu1
ii  tzdata                 2021a-1+deb11u5
ii  zlib1g                 1:1.2.11.dfsg-2+deb11u2
Versions of packages postgresql-13 recommends:
pn sysstat <none>
postgresql-13 suggests no packages.
-- debconf information:
postgresql-13/postrm_purge_data: true
--- End Message ---
--- Begin Message ---
Version: 13.14-0+deb11u1
On Wed, Jan 03, 2024 at 10:06:49PM +0100, Michael Banck wrote:
> On Sat, Sep 10, 2022 at 01:29:22PM -0400, Antoine Beaupre wrote:
> > We have found severe regressions when upgrading from buster to
> > bullseye on two of our PostgreSQL servers.
>
> Note that a work-around for this is (AFAIK) to set jit_inline_above_cost
> to -1, to avoid JIT inlining.
>
> In any case, a fix for this (supposedly) landed in master a few months
> back and got backpatched to all active branches some time ago, so it
> will be in the next round of Postgres stable releases, scheduled for
> mid-February.
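To see why the exact value of jit_inline_above_cost matters, here is a rough Python model of how the planner decides, per query, whether to JIT-compile, inline, and optimize. It is based on our reading of the PostgreSQL documentation for `jit`, `jit_above_cost`, `jit_inline_above_cost` and `jit_optimize_above_cost`. Per that documentation, -1 disables a step, while 0 makes every JIT-compiled query eligible for it. The class and function names are illustrative, not PostgreSQL APIs, and the model ignores details such as whether an LLVM JIT provider is installed.

```python
from dataclasses import dataclass


@dataclass
class JitSettings:
    """Illustrative container; defaults follow the documented PostgreSQL 12+ values."""
    jit: bool = True
    jit_above_cost: float = 100_000
    jit_inline_above_cost: float = 500_000
    jit_optimize_above_cost: float = 500_000


def jit_decision(total_cost: float, s: JitSettings) -> dict:
    """Simplified model of the planner's JIT choices for a plan of this cost.

    A negative threshold (-1) disables the corresponding step; inlining and
    optimization are only considered once JIT itself has been chosen.
    """
    perform = s.jit and s.jit_above_cost >= 0 and total_cost > s.jit_above_cost
    inline = perform and s.jit_inline_above_cost >= 0 and total_cost > s.jit_inline_above_cost
    optimize = perform and s.jit_optimize_above_cost >= 0 and total_cost > s.jit_optimize_above_cost
    return {"jit": perform, "inline": inline, "optimize": optimize}


if __name__ == "__main__":
    cost = 150_000  # hypothetical plan cost, above the default jit_above_cost
    print(jit_decision(cost, JitSettings(jit_inline_above_cost=0)))
    # {'jit': True, 'inline': True, 'optimize': False}
    print(jit_decision(cost, JitSettings(jit_inline_above_cost=-1)))
    # {'jit': True, 'inline': False, 'optimize': False}
```

In this model, -1 keeps JIT compilation itself but skips inlining, which is the narrower workaround; turning `jit` off skips all three steps.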
This has now been released, so closing this report. Note that I have not
tested this on oldoldstable, which uses a different LLVM version, but at
least on oldstable I checked and the memory leak is gone.
Michael
--- End Message ---