Re: [GENERAL] checkpoint and recovering process use too much memory
Thank you, Justin Pryzby. I reset shared_buffers to 16GB, and the memory usage
of the checkpoint and recovering processes stayed at 16GB:

   PID USER     PR NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
192956 postgres 20  0 18.5g  16g  16g S  1.3 25.9 19:44.69 postgres: startup process recovering 000408A30035
192960 postgres 20  0 18.5g  16g  16g S  0.7 25.8 11:13.79 postgres: checkpointer process
192951 postgres 20  0 18.5g 1.9g 1.9g S  0.0  3.1  0:01.75 /usr/pgsql-9.6/bin/postmaster -D /data/pgdata

Thank you again for your help.

On 11/03/2017 10:21 AM, Justin Pryzby wrote:
> On Fri, Nov 03, 2017 at 01:43:32AM +, tao tony wrote:
>> I had an asynchronous streaming replication HA cluster. Each node had 64G
>> memory. pg is 9.6.2 and deployed on centos 6.
>>
>> Last month the database was killed by the OS kernel for OOM; the checkpoint
>> process was killed.
> If you still have logs, was it killed during a large query? Perhaps one using
> a hash aggregate?
>
>> I noticed the checkpoint process occupied more than 20GB of memory, and it
>> was growing every day. On the hot-standby node, the recovering process
>> occupied as much memory as the checkpoint process.
> "Resident" RAM of a postgres subprocess is often just the fraction of
> shared_buffers it has read/written. The checkpointer must necessarily read
> all dirty pages from s-b and write them out to disk (by way of the page
> cache), so that's why its RSS is nearly 32GB. And the recovery process is
> continuously writing into s-b.
>
>> Now on the standby node, the checkpoint and recovering processes use more
>> than 50GB of memory as below, and I worry that someday the cluster will be
>> killed by the OS again.
>>
>>    PID USER     PR NI  VIRT RES SHR S %CPU %MEM    TIME+ COMMAND
>> 167158 postgres 20  0 34.9g 25g 25g S  0.0 40.4 46:36.86 postgres: startup process recovering 00040855004B
>> 167162 postgres 20  0 34.9g 25g 25g S  0.0 40.2 17:58.38 postgres: checkpointer process
>>
>> shared_buffers = 32GB
> Also, what is work_mem?
>
> Justin

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
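[Editor's note: the point above about RSS can be checked directly from the top output. Private (non-shared) memory is roughly RES minus SHR; a minimal arithmetic sketch, using the 16g/16g values from the snapshot above:]

```shell
# Private (non-shared) memory is approximately RES - SHR.
# Values below are the checkpointer's 16g RES and 16g SHR, in KiB.
res_kib=$((16 * 1024 * 1024))
shr_kib=$((16 * 1024 * 1024))
echo "approx. private memory: $((res_kib - shr_kib)) KiB"
# prints "approx. private memory: 0 KiB" -- essentially all of the
# checkpointer's resident set is shared_buffers pages it has touched.
```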
Re: [GENERAL] checkpoint and recovering process use too much memory
Hi,

On 2017-11-03 01:43:32 +, tao tony wrote:
> I had an asynchronous streaming replication HA cluster. Each node had 64G
> memory. pg is 9.6.2 and deployed on centos 6.
>
> Last month the database was killed by the OS kernel for OOM; the checkpoint
> process was killed.
>
> I noticed the checkpoint process occupied more than 20GB of memory, and it
> was growing every day. On the hot-standby node, the recovering process
> occupied as much memory as the checkpoint process.
>
> I tuned the checkpoint parameters, but it didn't work.
>
> Now on the standby node, the checkpoint and recovering processes use more
> than 50GB of memory as below, and I worry that someday the cluster will be
> killed by the OS again.
>
>    PID USER     PR NI  VIRT RES SHR S %CPU %MEM    TIME+ COMMAND
> 167158 postgres 20  0 34.9g 25g 25g S  0.0 40.4 46:36.86 postgres: startup process recovering 00040855004B
> 167162 postgres 20  0 34.9g 25g 25g S  0.0 40.2 17:58.38 postgres: checkpointer process
>
> shared_buffers = 32GB

I think it's more likely that what you're seeing is just that both the
checkpointer and the startup process touch most shared buffers, and thus show
up as having touched all that memory.

Regards,

Andres Freund
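[Editor's note: on Linux, the shared/private split of a process's RSS can be verified from /proc/&lt;pid&gt;/smaps. A sketch of the bookkeeping, fed a small hypothetical smaps snippet here; on a live system you would point the same awk at /proc/&lt;checkpointer_pid&gt;/smaps:]

```shell
# Sum shared vs. private resident memory from smaps-format input.
# The snippet below is hypothetical sample data, not from the thread.
summary=$(awk '
  /^Shared_(Clean|Dirty):/  { shared  += $2 }
  /^Private_(Clean|Dirty):/ { private += $2 }
  END { printf "shared: %d kB, private: %d kB", shared, private }
' <<'EOF'
Shared_Clean:          0 kB
Shared_Dirty:    1048576 kB
Private_Clean:         0 kB
Private_Dirty:      2048 kB
EOF
)
echo "$summary"
# prints "shared: 1048576 kB, private: 2048 kB"
```

A large shared total with a small private total confirms the diagnosis above: the RSS is mostly shared_buffers, counted once per process that touches it.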
Re: [GENERAL] checkpoint and recovering process use too much memory
On Fri, Nov 03, 2017 at 01:43:32AM +, tao tony wrote:
> I had an asynchronous streaming replication HA cluster. Each node had 64G
> memory. pg is 9.6.2 and deployed on centos 6.
>
> Last month the database was killed by the OS kernel for OOM; the checkpoint
> process was killed.

If you still have logs, was it killed during a large query? Perhaps one using
a hash aggregate?

> I noticed the checkpoint process occupied more than 20GB of memory, and it
> was growing every day. On the hot-standby node, the recovering process
> occupied as much memory as the checkpoint process.

"Resident" RAM of a postgres subprocess is often just the fraction of
shared_buffers it has read/written. The checkpointer must necessarily read all
dirty pages from s-b and write them out to disk (by way of the page cache), so
that's why its RSS is nearly 32GB. And the recovery process is continuously
writing into s-b.

> Now on the standby node, the checkpoint and recovering processes use more
> than 50GB of memory as below, and I worry that someday the cluster will be
> killed by the OS again.
>
>    PID USER     PR NI  VIRT RES SHR S %CPU %MEM    TIME+ COMMAND
> 167158 postgres 20  0 34.9g 25g 25g S  0.0 40.4 46:36.86 postgres: startup process recovering 00040855004B
> 167162 postgres 20  0 34.9g 25g 25g S  0.0 40.2 17:58.38 postgres: checkpointer process
>
> shared_buffers = 32GB

Also, what is work_mem?

Justin
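[Editor's note: the work_mem question matters because work_mem is a per-sort/per-hash limit, not a global one, so worst-case memory on top of shared_buffers scales with connection count. A rough back-of-the-envelope sketch, with all three numbers hypothetical (none are from the thread):]

```shell
# Worst-case memory beyond shared_buffers if every backend runs a query
# with several concurrent sort/hash nodes, each using up to work_mem.
# All values below are hypothetical examples.
work_mem_mb=64
max_connections=200
nodes_per_query=2   # assumed concurrent sort/hash nodes per query
echo "worst case beyond shared_buffers: $((work_mem_mb * max_connections * nodes_per_query)) MB"
# prints "worst case beyond shared_buffers: 25600 MB"
```

This is why an OOM kill can happen even when shared_buffers fits comfortably in RAM: a single oversized work_mem combined with many connections (or a hash aggregate that misestimates) can blow past the remaining memory.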
[GENERAL] checkpoint and recovering process use too much memory
hi dears,

I had an asynchronous streaming replication HA cluster. Each node had 64G
memory. pg is 9.6.2 and deployed on centos 6.

Last month the database was killed by the OS kernel for OOM; the checkpoint
process was killed.

I noticed the checkpoint process occupied more than 20GB of memory, and it was
growing every day. On the hot-standby node, the recovering process occupied as
much memory as the checkpoint process.

I tuned the checkpoint parameters, but it didn't work.

Now on the standby node, the checkpoint and recovering processes use more than
50GB of memory as below, and I worry that someday the cluster will be killed
by the OS again.

   PID USER     PR NI  VIRT RES SHR S %CPU %MEM    TIME+ COMMAND
167158 postgres 20  0 34.9g 25g 25g S  0.0 40.4 46:36.86 postgres: startup process recovering 00040855004B
167162 postgres 20  0 34.9g 25g 25g S  0.0 40.2 17:58.38 postgres: checkpointer process

shared_buffers = 32GB

and this is my checkpoint configuration:

checkpoint_timeout = 5min            # range 30s-1d
max_wal_size = 3GB
min_wal_size = 1GB
checkpoint_completion_target = 0.6   # checkpoint target duration, 0.0 - 1.0
checkpoint_flush_after = 1MB         # 0 disables,
                                     # default is 256kB on linux, 0 otherwise
checkpoint_warning = 300s            # 0 disables

In the log file, checkpoints started every 5 minutes:
2017-11-02 16:55:14.063 CST,,,758087,,59e5b9c2.b9147,2498,,2017-10-17 16:05:22 CST,,0,LOG,0,"checkpoint complete: wrote 29416 buffers (0.7%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=269.904 s, sync=0.017 s, total=269.930 s; sync files=86, longest=0.006 s, average=0.000 s; distance=235633 kB, estimate=816034 kB""LogCheckpointEnd, xlog.c:8121",""
2017-11-02 16:55:44.093 CST,,,758087,,59e5b9c2.b9147,2499,,2017-10-17 16:05:22 CST,,0,LOG,0,"checkpoint starting: time""LogCheckpointStart, xlog.c:8039",""
2017-11-02 16:59:13.824 CST,,,758087,,59e5b9c2.b9147,2500,,2017-10-17 16:05:22 CST,,0,LOG,0,"checkpoint complete: wrote 2091 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 29 recycled; write=209.585 s, sync=0.072 s, total=209.731 s; sync files=46, longest=0.024 s, average=0.001 s; distance=11700 kB, estimate=735601 kB""LogCheckpointEnd, xlog.c:8121",""
2017-11-02 17:00:44.903 CST,,,758087,,59e5b9c2.b9147,2501,,2017-10-17 16:05:22 CST,,0,LOG,0,"checkpoint starting: time""LogCheckpointStart, xlog.c:8039",""
2017-11-02 17:05:14.249 CST,,,758087,,59e5b9c2.b9147,2502,,2017-10-17 16:05:22 CST,,0,LOG,0,"checkpoint complete: wrote 93078 buffers (2.2%); 0 transaction log file(s) added, 23 removed, 44 recycled; write=269.104 s, sync=0.023 s, total=269.346 s; sync files=109, longest=0.010 s, average=0.000 s; distance=480607 kB, estimate=710102 kB""LogCheckpointEnd, xlog.c:8121",""
2017-11-02 17:05:44.269 CST,,,758087,,59e5b9c2.b9147,2503,,2017-10-17 16:05:22 CST,,0,LOG,0,"checkpoint starting: time""LogCheckpointStart, xlog.c:8039",""
2017-11-02 17:10:14.215 CST,,,758087,,59e5b9c2.b9147,2504,,2017-10-17 16:05:22 CST,,0,LOG,0,"checkpoint complete: wrote 177599 buffers (4.2%); 0 transaction log file(s) added, 0 removed, 53 recycled; write=269.728 s, sync=0.010 s, total=269.945 s; sync files=88, longest=0.004 s, average=0.000 s; distance=1140361 kB, estimate=1140361 kB""LogCheckpointEnd, xlog.c:8121",""

Please kindly let me know how I can reduce the memory usage of these two
processes. Many thanks.
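[Editor's note: the "checkpoint complete" lines above can be summarized mechanically. A sketch that pulls the buffers-written count and total duration out of one such line; the sample line is copied from the log above, and in practice you would pipe the real log through the same sed:]

```shell
# Extract buffers written and total checkpoint duration from a
# "checkpoint complete" log line (sample copied from the log above).
line='checkpoint complete: wrote 93078 buffers (2.2%); 0 transaction log file(s) added, 23 removed, 44 recycled; write=269.104 s, sync=0.023 s, total=269.346 s'
summary=$(printf '%s\n' "$line" |
  sed -n 's/.*wrote \([0-9]*\) buffers.*total=\([0-9.]*\) s.*/buffers=\1 total_s=\2/p')
echo "$summary"
# prints "buffers=93078 total_s=269.346"
```

Note the scale: 93078 buffers at 8 kB each is roughly 727 MB, i.e. about 2.2% of the 32GB shared_buffers per cycle, which is consistent with the replies above that the large RSS is simply touched shared memory rather than a leak.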