Re: Properly handle OOM death?

Israel Brewster Mon, 13 Mar 2023 10:37:01 -0700

> On Mar 13, 2023, at 9:28 AM, Adrian Klaver <adrian.kla...@aklaver.com> wrote:
> 
> On 3/13/23 10:21 AM, Israel Brewster wrote:
>> I’m running a PostgreSQL 13 database on an Ubuntu 20.04 VM that is a bit 
>> more memory constrained than I would like, such that every week or so the 
>> various processes running on the machine will align badly and the OOM killer 
>> will kick in, killing off PostgreSQL, as per the following journalctl output:
>> Mar 12 04:04:23 novarupta systemd[1]: postgresql@13-main.service: A process of this unit has been killed by the OOM killer.
>> Mar 12 04:04:32 novarupta systemd[1]: postgresql@13-main.service: Failed with result 'oom-kill'.
>> Mar 12 04:04:32 novarupta systemd[1]: postgresql@13-main.service: Consumed 5d 17h 48min 24.509s CPU time.
>> And the service is no longer running.
>> When this happens, I go in and restart the postgresql service, and 
>> everything is happy again for the next week or two.
>> Obviously this is not a good situation. Which leads to two questions:
>> 1) is there some tweaking I can do in the postgresql config itself to 
>> prevent the situation from occurring in the first place?
>> 2) My first thought was to simply have systemd restart postgresql whenever 
>> it is killed like this, which is easy enough. Then I looked at the default 
>> unit file, and found these lines:
>> # prevent OOM killer from choosing the postmaster (individual backends will
>> # reset the score to 0)
>> OOMScoreAdjust=-900
>> # restarting automatically will prevent "pg_ctlcluster ... stop" from working,
>> # so we disable it here. Also, the postmaster will restart by itself on most
>> # problems anyway, so it is questionable if one wants to enable external
>> # automatic restarts.
>> #Restart=on-failure
>> Which seems to imply that the OOM killer should only be killing off 
>> individual backends, not the entire cluster to begin with - which should be 
>> fine. And also that enabling the Restart=on-failure option is probably not the 
>> greatest idea. Which makes me wonder what is really going on?
> 
> You might want to read:
> 
> https://www.postgresql.org/docs/current/kernel-resources.html#LINUX-MEMORY-OVERCOMMIT


Good information, thanks. One thing there confuses me though. It says:

Another approach, which can be used with or without altering 
vm.overcommit_memory, is to set the process-specific OOM score adjustment value 
for the postmaster process to -1000, thereby guaranteeing it will not be 
targeted by the OOM killer.

Isn’t that exactly what the "OOMScoreAdjust=-900" line in the unit file does 
though (except with a score of -900 rather than -1000)?
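
One way to check what that setting actually results in on a running system 
is to read /proc/<pid>/oom_score_adj for the postmaster and its child 
processes. Below is a minimal Python sketch, assuming Linux procfs. The 
function names are my own (hypothetical), and the postmaster PID has to be 
supplied by hand (it is the first line of postmaster.pid in the data 
directory). If the unit file comment holds, I would expect the postmaster to 
show -900 and the backends to show 0.

```python
from pathlib import Path


def read_oom_score_adj(pid, proc_root="/proc"):
    """Return the oom_score_adj value of a process as an int."""
    path = Path(proc_root) / str(pid) / "oom_score_adj"
    return int(path.read_text().strip())


def child_pids(parent_pid, proc_root="/proc"):
    """Yield the PIDs whose parent, per /proc/<pid>/stat, is parent_pid."""
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            stat = (entry / "stat").read_text()
        except OSError:
            continue  # the process exited while we were scanning
        # The command name (field 2) is wrapped in parentheses and may itself
        # contain spaces or ')', so split on the LAST ')' before parsing.
        fields = stat.rsplit(")", 1)[1].split()
        # fields[0] is the state, fields[1] is the parent PID.
        if int(fields[1]) == parent_pid:
            yield int(entry.name)


def oom_report(postmaster_pid, proc_root="/proc"):
    """Map the postmaster and each direct child to its oom_score_adj."""
    report = {postmaster_pid: read_oom_score_adj(postmaster_pid, proc_root)}
    for pid in child_pids(postmaster_pid, proc_root):
        try:
            report[pid] = read_oom_score_adj(pid, proc_root)
        except OSError:
            pass  # child went away between the scan and the read
    return report
```

Calling oom_report() with the postmaster PID returns a dict of PID to 
adjustment value, which makes it easy to see whether the -900 is really in 
effect and whether the backends have reset their own score.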

---
Israel Brewster
Software Engineer
Alaska Volcano Observatory 
Geophysical Institute - UAF 
2156 Koyukuk Drive 
Fairbanks AK 99775-7320
Work: 907-474-5172
cell:  907-328-9145
> 
>> Thanks.
>> ---
>> Israel Brewster
>> Software Engineer
>> Alaska Volcano Observatory
>> Geophysical Institute - UAF
>> 2156 Koyukuk Drive
>> Fairbanks AK 99775-7320
>> Work: 907-474-5172
>> cell:  907-328-9145
> 
> 
> -- 
> Adrian Klaver
> adrian.kla...@aklaver.com <mailto:adrian.kla...@aklaver.com>
