> On Mar 13, 2023, at 9:36 AM, Joe Conway <m...@joeconway.com> wrote:
>
> On 3/13/23 13:21, Israel Brewster wrote:
>> I’m running a postgresql 13 database on an Ubuntu 20.04 VM that is a bit
>> more memory constrained than I would like, such that every week or so the
>> various processes running on the machine will align badly and the OOM killer
>> will kick in, killing off postgresql, as per the following journalctl output:
>>
>> Mar 12 04:04:23 novarupta systemd[1]: postgresql@13-main.service: A process of this unit has been killed by the OOM killer.
>> Mar 12 04:04:32 novarupta systemd[1]: postgresql@13-main.service: Failed with result 'oom-kill'.
>> Mar 12 04:04:32 novarupta systemd[1]: postgresql@13-main.service: Consumed 5d 17h 48min 24.509s CPU time.
>>
>> And the service is no longer running.
>>
>> When this happens, I go in and restart the postgresql service, and
>> everything is happy again for the next week or two.
>>
>> Obviously this is not a good situation. Which leads to two questions:
>>
>> 1) Is there some tweaking I can do in the postgresql config itself to
>> prevent the situation from occurring in the first place?
>>
>> 2) My first thought was to simply have systemd restart postgresql whenever
>> it is killed like this, which is easy enough. Then I looked at the default
>> unit file, and found these lines:
>>
>> # prevent OOM killer from choosing the postmaster (individual backends will
>> # reset the score to 0)
>> OOMScoreAdjust=-900
>> # restarting automatically will prevent "pg_ctlcluster ... stop" from working,
>> # so we disable it here. Also, the postmaster will restart by itself on most
>> # problems anyway, so it is questionable if one wants to enable external
>> # automatic restarts.
>> #Restart=on-failure
>>
>> This seems to imply that the OOM killer should only be killing off
>> individual backends, not the entire cluster, to begin with - which should be
>> fine. It also suggests that adding the Restart=on-failure option is probably
>> not the greatest idea. Which makes me wonder what is really going on?
>
> First, are you running with a cgroup memory.limit set (e.g. in a container)?
Not sure, actually. I *think* I set it up as a full VM, though, not a container. I’ll have to double-check that.

> Assuming no, see:
>
> https://www.postgresql.org/docs/current/kernel-resources.html#LINUX-MEMORY-OVERCOMMIT
>
> That will tell you:
>
> 1/ Turn off memory overcommit: "Although this setting will not prevent the
> OOM killer from being invoked altogether, it will lower the chances
> significantly and will therefore lead to more robust system behavior."
>
> 2/ Set /proc/self/oom_score_adj to -1000 rather than -900
> (OOMScoreAdjust=-1000): the value -1000 is important as it is a "magic" value
> which prevents the process from being selected by the OOM killer (see:
> https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/oom.h#L6)
> whereas -900 just makes it less likely.

...and that answers the question I just sent about the above linked page 😄 Thanks!

> All that said, even if the individual backend gets killed, the postmaster
> will still go into crash recovery. So while technically postgres does not
> restart, the effect is much the same. So see #1 above as your best protection.

Interesting. Makes sense though. Thanks!

---
Israel Brewster
Software Engineer
Alaska Volcano Observatory
Geophysical Institute - UAF
2156 Koyukuk Drive
Fairbanks AK 99775-7320
Work: 907-474-5172
cell: 907-328-9145

> HTH,
>
> Joe
>
> --
> Joe Conway
> PostgreSQL Contributors Team
> RDS Open Source Databases
> Amazon Web Services: https://aws.amazon.com
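
P.S. For my own notes (and anyone else who hits this later), here is roughly what those two suggestions translate to on a stock Ubuntu/Debian install. This is just a sketch: the sysctl file name is arbitrary, and the unit name is simply what my 13/main cluster uses, so adjust as needed.

  # 1/ Turn off memory overcommit, persistently (per the linked kernel-resources
  #    page, vm.overcommit_ratio may also need tuning for the RAM/swap on the box)
  echo "vm.overcommit_memory = 2" | sudo tee /etc/sysctl.d/90-pg-overcommit.conf
  sudo sysctl --system

  # 2/ Raise the postmaster's protection to the "magic" -1000 via a systemd
  #    drop-in instead of editing the packaged unit file
  sudo systemctl edit postgresql@13-main.service
  #    ...then add in the editor:
  #    [Service]
  #    OOMScoreAdjust=-1000
  sudo systemctl daemon-reload
  sudo systemctl restart postgresql@13-main.service

And if it turns out this really is a container with a cgroup memory limit, I gather the host-wide overcommit setting won't do much against a per-cgroup limit, so I'll check that first.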