Re: Postgres Out Of Memory Crash

2023-11-06 Thread Merlin Moncure
On Thu, Nov 2, 2023 at 4:13 AM Avi Weinberg  wrote:

> Hi experts,
>
>
>
> I'm using Patroni Postgres installation and noticed that twice already
> postgres crashed due to out of memory.  I'm using logical replication with
> around 30-40 active subscribers on this machine.  The machine has 128GB but
> only 32GB is allocated to Postgres.  How can I know what is actually
> causing the out of memory issue?  Is it caused by not optimal postgres
> configuration or something else?
>
>
>
> /usr/lib/postgresql/13/bin/postgres -D
> /home/postgres/pgdata/pgroot/data
> --config-file=/home/postgres/pgdata/pgroot/data/postgresql.conf --port=5432
> --cluster_name=postgres-cluster --wal_level=logical --hot_standby=on
> --max_connections=533 --max_wal_senders=90 --max_prepared_transactions=0
> --max_locks_per_transaction=64 --track_commit_timestamp=on
> --max_replication_slots=90 --max_worker_processes=30 --wal_log_hints=on
>

A couple of things here.  You don't really allocate memory to postgres;
rather, you reserve memory for certain kinds of buffering operations via
shared_buffers and other, less important settings.  That memory is
carefully managed and is probably not what is underneath your OOM
condition.  Ironically, increasing shared_buffers can make the problem
more likely to occur, since you are taking memory away from other tasks.
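
For reference, a quick way to see what you actually have reserved, and
the per-backend settings that get allocated on top of it (a minimal,
untested sketch; the psql connection options are assumptions for your
environment):

# inspect the memory-related settings the running server is using
psql -U postgres -c "SHOW shared_buffers;"
psql -U postgres -c "SHOW work_mem;"
psql -U postgres -c "SHOW maintenance_work_mem;"
psql -U postgres -c "SHOW max_connections;"

Keep in mind that work_mem is a per-operation limit, so many busy
connections can each allocate several multiples of it on top of
shared_buffers.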

>  --max_connections=533

Your problem is probably at least partially here.  This number was
clearly set deliberately, which implies you were running out of
connections and had to crank the value up.

If you have a lot of idle connections sitting around (say, if you have
several application servers managing connection pools), a sudden spike
in memory load and/or connection utilization can cause this to occur.
I'd suggest one or more of the following (see the quick check sketched
after this list):
* lowering shared buffers
* implementing pgbouncer and lowering max_connections
* increasing physical memory
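
As a quick check for the first two points, something like this shows
how many of those 533 slots are actually doing work (minimal sketch;
connection options are assumptions):

# count backends by state; lots of 'idle' rows suggests pooling will help
psql -U postgres -c "SELECT state, count(*) FROM pg_stat_activity GROUP BY state ORDER BY count(*) DESC;"

If most of them come back idle, pgbouncer in transaction pooling mode
with a much lower max_connections is usually the easy win.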

Also,
> I'm using logical replication with around 30-40 active subscribers on
> this machine.

Hm.  Have you considered a streaming replica setup, so that you can
attach read-only processes to the standby?
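
Roughly, that looks like this (a minimal sketch only; host, user and
target directory are assumptions, and Patroni normally creates and
manages standbys for you):

# clone the primary and write the standby configuration automatically
pg_basebackup -h primary-host -U replicator -D /home/postgres/pgdata/standby -R -X stream
# start postgres on that data directory; with hot_standby=on (which you
# already have) it accepts read-only connections for reporting load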

merlin


Re: Postgres Out Of Memory Crash

2023-11-02 Thread Laurenz Albe
On Thu, 2023-11-02 at 09:12 +, Avi Weinberg wrote:
> I'm using Patroni Postgres installation and noticed that twice already 
> postgres
> crashed due to out of memory.  I'm using logical replication with around 30-40
> active subscribers on this machine.  The machine has 128GB but only 32GB is 
> allocated
> to Postgres.  How can I know what is actually causing the out of memory issue?
> Is it caused by not optimal postgres configuration or something else?

You should look into the PostgreSQL log.  That should show a message like

  LOG:  server process (PID 16024) was terminated by signal 9: Killed
  DETAIL:  Failed process was running: SELECT ...

It is not certain, but often that statement is the one that used up
all that memory.  At least it is a starting point for your investigation.
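
If you are not sure where that message ends up, something along these
lines usually finds it (sketch only; the log location is an assumption
based on your data directory and the default log_directory):

# find backends killed by the OOM killer, plus the DETAIL line after each hit
grep -n -A1 "terminated by signal 9" /home/postgres/pgdata/pgroot/data/log/postgresql-*.log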

Yours,
Laurenz Albe




Postgres Out Of Memory Crash

2023-11-02 Thread Avi Weinberg
Hi experts,

I'm using Patroni Postgres installation and noticed that twice already postgres 
crashed due to out of memory.  I'm using logical replication with around 30-40 
active subscribers on this machine.  The machine has 128GB but only 32GB is 
allocated to Postgres.  How can I know what is actually causing the out of 
memory issue?  Is it caused by not optimal postgres configuration or something 
else?

/usr/lib/postgresql/13/bin/postgres -D /home/postgres/pgdata/pgroot/data 
--config-file=/home/postgres/pgdata/pgroot/data/postgresql.conf --port=5432 
--cluster_name=postgres-cluster --wal_level=logical --hot_standby=on 
--max_connections=533 --max_wal_senders=90 --max_prepared_transactions=0 
--max_locks_per_transaction=64 --track_commit_timestamp=on 
--max_replication_slots=90 --max_worker_processes=30 --wal_log_hints=on


Oct 27 07:05:31 node2 kernel: postgres invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=993
Oct 27 07:05:31 node2 kernel: postgres cpuset=docker-6ae67e04710619972d3b1ab5d4c69c318d001c2da47fecee121cdc60279a14a0.scope mems_allowed=0
Oct 27 07:05:31 node2 kernel: CPU: 6 PID: 15536 Comm: postgres Kdump: loaded Tainted: G    T 3.10.0-1160.el7.x86_64 #1
Oct 27 07:05:31 node2 kernel: Hardware name: Kontron MSP8040/4008, BIOS Core: 5.11, MSP804x: 1.57.0943FC77 05/06/2020
Oct 27 07:05:31 node2 kernel: Call Trace:
Oct 27 07:05:31 node2 kernel: [] dump_stack+0x19/0x1b
Oct 27 07:05:31 node2 kernel: [] dump_header+0x90/0x229
Oct 27 07:05:31 node2 kernel: [] ? ep_poll_callback+0xf8/0x220
Oct 27 07:05:31 node2 kernel: [] ? find_lock_task_mm+0x56/0xc0
Oct 27 07:05:31 node2 kernel: [] ? try_get_mem_cgroup_from_mm+0x28/0x60
Oct 27 07:05:31 node2 kernel: [] oom_kill_process+0x2cd/0x490
Oct 27 07:05:31 node2 kernel: [] mem_cgroup_oom_synchronize+0x55c/0x590
Oct 27 07:05:31 node2 kernel: [] ? mem_cgroup_charge_common+0xc0/0xc0
Oct 27 07:05:31 node2 kernel: [] pagefault_out_of_memory+0x14/0x90
Oct 27 07:05:31 node2 kernel: [] mm_fault_error+0x6a/0x157
Oct 27 07:05:31 node2 kernel: [] __do_page_fault+0x491/0x500
Oct 27 07:05:31 node2 kernel: [] do_page_fault+0x35/0x90
Oct 27 07:05:31 node2 kernel: [] page_fault+0x28/0x30
Oct 27 07:05:31 node2 kernel: Task in /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod5984f099_30b0_4506_8730_6c72e7c02b78.slice/docker-6ae67e0471061997$
Oct 27 07:05:31 node2 kernel: memory: usage 32768000kB, limit 32768000kB, failcnt 144867
Oct 27 07:05:31 node2 kernel: memory+swap: usage 32768000kB, limit 9007199254740988kB, failcnt 0
Oct 27 07:05:31 node2 kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Oct 27 07:05:31 node2 kernel: Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod5984f099_30b0_4506_8730_6c72e7c02b78.slice: cache:$
Oct 27 07:05:31 node2 kernel: Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod5984f099_30b0_4506_8730_6c72e7c02b78.slice/docker-$
Oct 27 07:05:31 node2 kernel: Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod5984f099_30b0_4506_8730_6c72e7c02b78.slice/docker-$
Oct 27 07:05:31 node2 kernel: [ pid ]   uid  tgid total_vm  rss nr_ptes swapents oom_score_adj name
Oct 27 07:05:31 node2 kernel: [13159] 0 13159  2391   3    0  -998 pause
Oct 27 07:05:31 node2 kernel: [13322] 0 13322 1095   97   8    0   993 dumb-init
Oct 27 07:05:31 node2 kernel: [13335] 0 13335 1156  171   8    0   993 sh
Oct 27 07:05:31 node2 kernel: [13411] 0 13411 1137   98   8    0   993 runsvdir
Oct 27 07:05:31 node2 kernel: [13438] 0 13438 1099   98   7    0   993 runsv
Oct 27 07:05:31 node2 kernel: [13439] 0 13439 1099   98   7    0   993 runsv
Oct 27 07:05:31 node2 kernel: [13440]   101 1344027026 1186  54    0   993 pgqd
Oct 27 07:05:31 node2 kernel: [13441]   101 13441   155215 8237 101    0   993 patroni
Oct 27 07:05:31 node2 kernel: [19532]   101 19532  174003346817 171    0   993 postgres
Oct 27 07:05:31 node2 kernel: [19542]   101 19542  1767874 6713 121    0   993 postgres
Oct 27 07:05:31 node2 kernel: [19546]   101 19546  1740173  14450313166    0   993 postgres
Oct 27 07:05:31 node2 kernel: [19547]   101 19547  174006920060 171    0   993 postgres
Oct 27 07:05:31 node2 kernel: [19548]   101 19548  1740027 4821  86    0   993 postgres
Oct 27 07:05:31 node2 kernel: [19549]   101 19549  1740283 1011  91    0   993 postgres
Oct 27 07:05:31 node2 kernel: [19549]   101 19549  174028