Re: Postgres Out Of Memory Crash
On Thu, Nov 2, 2023 at 4:13 AM Avi Weinberg wrote:
> Hi experts,
>
> I'm using Patroni Postgres installation and noticed that twice already
> postgres crashed due to out of memory. I'm using logical replication with
> around 30-40 active subscribers on this machine. The machine has 128GB but
> only 32GB is allocated to Postgres. How can I know what is actually
> causing the out of memory issue? Is it caused by not optimal postgres
> configuration or something else?
>
> /usr/lib/postgresql/13/bin/postgres -D /home/postgres/pgdata/pgroot/data
> --config-file=/home/postgres/pgdata/pgroot/data/postgresql.conf --port=5432
> --cluster_name=postgres-cluster --wal_level=logical --hot_standby=on
> --max_connections=533 --max_wal_senders=90 --max_prepared_transactions=0
> --max_locks_per_transaction=64 --track_commit_timestamp=on
> --max_replication_slots=90 --max_worker_processes=30 --wal_log_hints=on

A couple of things here. You don't really allocate memory to Postgres; rather, you reserve memory for certain kinds of buffering operations via shared_buffers and other, less important settings. That memory is carefully managed and is likely not what is underneath your OOM condition. Ironically, increasing shared_buffers can make your problem more likely to occur, since you are taking memory away from other tasks.

> --max_connections=533

Your problem is probably at least partially here. This number looks specifically chosen, implying you were running out of connections and had to crank the value up. If you have a lot of idle connections sitting around (say, if you have several application servers managing connection pools), a sudden spike in memory load and/or connection utilization can cause this to occur. I suggest one or more of:

* lowering shared_buffers
* implementing pgbouncer and lowering max_connections
* increasing physical memory

Also:

> I'm using logical replication with around 30-40 active subscribers on this machine.

Hm. Have you considered a streaming replica setup, so that you can attach read-only processes to the standby?

merlin
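(A quick way to check whether idle connections are the culprit, as Merlin suggests: count backends by state in pg_stat_activity. This is a sketch against a stock PostgreSQL 13 catalog; many 'idle' rows would argue for a pooler such as pgbouncer plus a lower max_connections.)

```sql
-- Count backends by state; lots of 'idle' rows suggest pooling would help.
SELECT state, count(*)
FROM pg_stat_activity
WHERE pid <> pg_backend_pid()   -- exclude this session itself
GROUP BY state
ORDER BY count(*) DESC;
```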
Re: Postgres Out Of Memory Crash
On Thu, 2023-11-02 at 09:12 +, Avi Weinberg wrote:
> I'm using Patroni Postgres installation and noticed that twice already postgres
> crashed due to out of memory. I'm using logical replication with around 30-40
> active subscribers on this machine. The machine has 128GB but only 32GB is allocated
> to Postgres. How can I know what is actually causing the out of memory issue?
> Is it caused by not optimal postgres configuration or something else?

You should look into the PostgreSQL log. That should show a message like

  LOG:  server process (PID 16024) was terminated by signal 9: Killed
  DETAIL:  Failed process was running: SELECT ...

It is not certain, but often that statement is the one that used up all that memory. At least it is a starting point for your investigation.

Yours,
Laurenz Albe
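(If it is unclear where the server is writing its log, PostgreSQL can be asked directly. pg_current_logfile() exists since version 10, but note it returns NULL when the logging collector is not in use, e.g. when logging goes to the container's stderr as is common under Patroni/Kubernetes.)

```sql
-- Locate the current server log file, then search it around the crash
-- time for "terminated by signal 9".
SELECT pg_current_logfile();
```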
Postgres Out Of Memory Crash
Hi experts,

I'm using Patroni Postgres installation and noticed that twice already postgres crashed due to out of memory. I'm using logical replication with around 30-40 active subscribers on this machine. The machine has 128GB but only 32GB is allocated to Postgres. How can I know what is actually causing the out of memory issue? Is it caused by not optimal postgres configuration or something else?

/usr/lib/postgresql/13/bin/postgres -D /home/postgres/pgdata/pgroot/data
--config-file=/home/postgres/pgdata/pgroot/data/postgresql.conf --port=5432
--cluster_name=postgres-cluster --wal_level=logical --hot_standby=on
--max_connections=533 --max_wal_senders=90 --max_prepared_transactions=0
--max_locks_per_transaction=64 --track_commit_timestamp=on
--max_replication_slots=90 --max_worker_processes=30 --wal_log_hints=on

Oct 27 07:05:31 node2 kernel: postgres invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=993
Oct 27 07:05:31 node2 kernel: postgres cpuset=docker-6ae67e04710619972d3b1ab5d4c69c318d001c2da47fecee121cdc60279a14a0.scope mems_allowed=0
Oct 27 07:05:31 node2 kernel: CPU: 6 PID: 15536 Comm: postgres Kdump: loaded Tainted: G T 3.10.0-1160.el7.x86_64 #1
Oct 27 07:05:31 node2 kernel: Hardware name: Kontron MSP8040/4008, BIOS Core: 5.11, MSP804x: 1.57.0943FC77 05/06/2020
Oct 27 07:05:31 node2 kernel: Call Trace:
Oct 27 07:05:31 node2 kernel: [] dump_stack+0x19/0x1b
Oct 27 07:05:31 node2 kernel: [] dump_header+0x90/0x229
Oct 27 07:05:31 node2 kernel: [] ? ep_poll_callback+0xf8/0x220
Oct 27 07:05:31 node2 kernel: [] ? find_lock_task_mm+0x56/0xc0
Oct 27 07:05:31 node2 kernel: [] ? try_get_mem_cgroup_from_mm+0x28/0x60
Oct 27 07:05:31 node2 kernel: [] oom_kill_process+0x2cd/0x490
Oct 27 07:05:31 node2 kernel: [] mem_cgroup_oom_synchronize+0x55c/0x590
Oct 27 07:05:31 node2 kernel: [] ? mem_cgroup_charge_common+0xc0/0xc0
Oct 27 07:05:31 node2 kernel: [] pagefault_out_of_memory+0x14/0x90
Oct 27 07:05:31 node2 kernel: [] mm_fault_error+0x6a/0x157
Oct 27 07:05:31 node2 kernel: [] __do_page_fault+0x491/0x500
Oct 27 07:05:31 node2 kernel: [] do_page_fault+0x35/0x90
Oct 27 07:05:31 node2 kernel: [] page_fault+0x28/0x30
Oct 27 07:05:31 node2 kernel: Task in /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod5984f099_30b0_4506_8730_6c72e7c02b78.slice/docker-6ae67e0471061997$
Oct 27 07:05:31 node2 kernel: memory: usage 32768000kB, limit 32768000kB, failcnt 144867
Oct 27 07:05:31 node2 kernel: memory+swap: usage 32768000kB, limit 9007199254740988kB, failcnt 0
Oct 27 07:05:31 node2 kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Oct 27 07:05:31 node2 kernel: Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod5984f099_30b0_4506_8730_6c72e7c02b78.slice: cache:$
Oct 27 07:05:31 node2 kernel: Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod5984f099_30b0_4506_8730_6c72e7c02b78.slice/docker-$
Oct 27 07:05:31 node2 kernel: Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod5984f099_30b0_4506_8730_6c72e7c02b78.slice/docker-$
Oct 27 07:05:31 node2 kernel: [ pid ]   uid  tgid total_vm    rss nr_ptes swapents oom_score_adj name
Oct 27 07:05:31 node2 kernel: [13159]     0 13159     2391        3       0     -998 pause
Oct 27 07:05:31 node2 kernel: [13322]     0 13322     1095       97       8        0   993 dumb-init
Oct 27 07:05:31 node2 kernel: [13335]     0 13335     1156      171       8        0   993 sh
Oct 27 07:05:31 node2 kernel: [13411]     0 13411     1137       98       8        0   993 runsvdir
Oct 27 07:05:31 node2 kernel: [13438]     0 13438     1099       98       7        0   993 runsv
Oct 27 07:05:31 node2 kernel: [13439]     0 13439     1099       98       7        0   993 runsv
Oct 27 07:05:31 node2 kernel: [13440]   101 13440    27026     1186      54        0   993 pgqd
Oct 27 07:05:31 node2 kernel: [13441]   101 13441   155215     8237     101        0   993 patroni
Oct 27 07:05:31 node2 kernel: [19532]   101 19532  1740033    46817     171        0   993 postgres
Oct 27 07:05:31 node2 kernel: [19542]   101 19542  1767874     6713     121        0   993 postgres
Oct 27 07:05:31 node2 kernel: [19546]   101 19546  1740173 14450313166        0   993 postgres
Oct 27 07:05:31 node2 kernel: [19547]   101 19547  1740069    20060     171        0   993 postgres
Oct 27 07:05:31 node2 kernel: [19548]   101 19548  1740027     4821      86        0   993 postgres
Oct 27 07:05:31 node2 kernel: [19549]   101 19549  1740283     1011      91        0   993 postgres
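(For readers hitting a similar OOM with many logical subscribers: in PostgreSQL 13, each walsender doing logical decoding can buffer changes up to logical_decoding_work_mem, 64MB by default, and an inactive slot also forces WAL retention. With max_replication_slots=90 this is worth inspecting. A sketch using standard catalog views and functions:)

```sql
-- Show each replication slot, whether it is in use, and how much WAL
-- it is holding back; inactive slots with large retained WAL are suspect.
SELECT slot_name, plugin, active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn))
         AS retained_wal
FROM pg_replication_slots
ORDER BY pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) DESC;
```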