Your message dated Mon, 19 Feb 2024 10:47:05 +0100
with message-id <[email protected]>
and subject line Re: Bug #1019503: postgresql-13: memory leak with JIT 
compilation
has caused the Debian Bug report #1019503,
regarding postgresql-13: memory leak with JIT compilation
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact [email protected]
immediately.)


-- 
1019503: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1019503
Debian Bug Tracking System
Contact [email protected] with problems
--- Begin Message ---
Package: postgresql-13
Version: 13.7-0+deb11u1
Severity: important

We have found severe regressions when upgrading from buster to
bullseye on two of our PostgreSQL servers.

It seems like, in busy workloads, the JIT actually leaks memory. Like
a lot. In this screenshot of a yearly Grafana dashboard, you can see
memory usage is fairly regular until the upgrade (early May) at which
point the server starts regularly swapping and eventually OOM'ing:

https://gitlab.torproject.org/tpo/tpa/team/uploads/41f8850ecc4b4170f56901b4018a9870/image.png

The internal ticket we filed about this has all the gory details,
which are probably too much for this bug report:

https://gitlab.torproject.org/tpo/tpa/team/-/issues/40815

We also had issues on other servers, more examples:

https://gitlab.torproject.org/tpo/tpa/team/-/issues/40814

While this may seem like a one-off thing that affects only certain
workloads (we certainly have other PostgreSQL servers that do not suffer
from this problem), when it *does* affect the workload, it's pretty
catastrophic. Hence the "important" severity ("major effect on the
usability of a package, without rendering it completely unusable to
everyone").

Also, it took us a long time to track down this problem... it's
basically only because the release notes of an unrelated project
(PuppetDB) happened to feature a similar bug report that we got a
hint that this could be the problem:

https://tickets.puppetlabs.com/browse/PDB-5452

... which makes me think this problem might be more widespread than a
few workloads. It seems like DSA also had problems with the upgrade on
the sources.debian.org server which, granted, is a huge server as
well, but I don't see why that should necessarily be a problem with
PostgreSQL...

Past PostgreSQL upgrades have been basically without flaw for us: the
procedure is a little disruptive (e.g. dump/restore, basically) but
apart from that, we have never seen such a huge regression in
performance. So I figured it was worth at least a bug report.

I'm not sure what should come out of this; I can't help but think this
is a bug in the JIT, but it's far beyond my capacity to even start
debugging this specifically. So maybe this could be forwarded
upstream? But in the meantime, maybe this could be fixed "simply" by
adding a note to the Debian bullseye release notes.

It would also be worth checking whether this behavior occurs in newer
releases: we briefly considered upgrading to 14 to see if this was still
happening, before finding the JIT trick, but have not done so (yet?).

Thank you for your attention,

a.

-- System Information:
Debian Release: 11.4
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable-security'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 5.10.0-17-amd64 (SMP w/2 CPU threads)
Locale: LANG=C.UTF-8, LC_CTYPE=C.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages postgresql-13 depends on:
ii  debconf [debconf-2.0]  1.5.77
ii  libc6                  2.31-13+deb11u3
ii  libgcc-s1              10.2.1-6
ii  libgssapi-krb5-2       1.18.3-6+deb11u1
ii  libicu67               67.1-7
ii  libldap-2.4-2          2.4.57+dfsg-3+deb11u1
ii  libllvm11              1:11.0.1-2
ii  libpam0g               1.4.0-9+deb11u1
ii  libpq5                 13.7-0+deb11u1
ii  libselinux1            3.1-3
ii  libssl1.1              1.1.1n-0+deb11u3
ii  libstdc++6             10.2.1-6
ii  libsystemd0            247.3-7
ii  libuuid1               2.36.1-8+deb11u1
ii  libxml2                2.9.10+dfsg-6.7+deb11u2
ii  libxslt1.1             1.1.34-4+deb11u1
ii  locales                2.31-13+deb11u3
ii  locales-all            2.31-13+deb11u3
ii  postgresql-client-13   13.7-0+deb11u1
ii  postgresql-common      225
ii  ssl-cert               1.1.0+nmu1
ii  tzdata                 2021a-1+deb11u5
ii  zlib1g                 1:1.2.11.dfsg-2+deb11u2

Versions of packages postgresql-13 recommends:
pn  sysstat  <none>

postgresql-13 suggests no packages.

-- debconf information:
  postgresql-13/postrm_purge_data: true

--- End Message ---
--- Begin Message ---
Version: 13.14-0+deb11u1

On Wed, Jan 03, 2024 at 10:06:49PM +0100, Michael Banck wrote:
> On Sat, Sep 10, 2022 at 01:29:22PM -0400, Antoine Beaupre wrote:
> > We have found severe regressions when upgrading from buster to
> > bullseye on two of our PostgreSQL servers.
> 
> Note that a work-around for this is (AFAIK) to set jit_inline_above_cost
> to -1, to avoid JIT inlining.
> 
> In any case, a fix for this (supposedly) landed in master a few months
> back and got backpatched to all active branches some time ago, so it
> will be in the next round of Postgres stable releases, scheduled for
> mid-February.

This has now been released, so closing this report. Note that I have not
tested this on oldoldstable, which uses a different LLVM version, but at
least on oldstable I checked and the memory leak is gone.
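
For anyone still on an unfixed version, the jit_inline_above_cost
work-around mentioned in this thread could be applied roughly as below.
This is only a sketch, assuming superuser access and a cluster-wide
change; it has not been tested as part of this report. Per the PostgreSQL
documentation, -1 is the value that disables inlining for this setting
(the default is 500000).

```sql
-- Sketch only: disable JIT inlining for the whole cluster (needs superuser).
-- -1 turns inlining off; other JIT stages are left untouched.
ALTER SYSTEM SET jit_inline_above_cost = -1;

-- Ask the server to reload its configuration; this setting does not
-- require a restart.
SELECT pg_reload_conf();

-- Check the value the current session sees.
SHOW jit_inline_above_cost;
```

The setting can also be changed per session with SET, which may be a
less invasive way to check whether a given workload is affected.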


Michael

--- End Message ---
