Re: [GENERAL] checkpoint and recovering process use too much memory

2017-11-05 Thread tao tony
Thank you,  Justin Pryzby.
I reset shared_buffer to 16GB,and the  memory usage of  checkpoint and 
recovering just stayed at 16GB.

   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
192956 postgres  20   0 18.5g  16g  16g S  1.3 25.9  19:44.69 postgres: 
startup process   recovering 000408A30035
192960 postgres  20   0 18.5g  16g  16g S  0.7 25.8  11:13.79 postgres: 
checkpointer process
192951 postgres  20   0 18.5g 1.9g 1.9g S  0.0  3.1   0:01.75 
/usr/pgsql-9.6/bin/postmaster -D /data/pgdata

Thank you again for your help.

On 11/03/2017 10:21 AM, Justin Pryzby wrote:
> On Fri, Nov 03, 2017 at 01:43:32AM +, tao tony wrote:
>> I had an asynchronous steaming replication HA cluster.Each node had 64G 
>> memory.pg is 9.6.2 and deployed on centos 6.
>>
>> Last month the database was killed by OS kernel for OOM,the checkpoint 
>> process was killed.
> If you still have logs, was it killed during a large query?  Perhaps one using
> a hash aggregate?
>
>> I noticed checkpoint process occupied memory for more than 20GB,and it was 
>> growing everyday.In the hot-standby node,the recovering process occupied 
>> memory as big as checkpoint process.
> "resident" RAM of a postgres subprocess is often just be the fraction of
> shared_buffers it's read/written.  checkpointer must necessarily read all 
> dirty
> pages from s-b and write out to disk (by way of page cache), so that's why its
> RSS is nearly 32GB.  And the recovery process is continuously writing into 
> s-b.
>
>> Now In the standby node,checkpoint and recovering process  used more then 
>> 50GB memory as below,and I worried someday the cluster would be killed by OS 
>> again.
>>
>> PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
>> 167158 postgres  20   0 34.9g  25g  25g S  0.0 40.4  46:36.86 postgres: 
>> startup process   recovering 00040855004B
>> 167162 postgres  20   0 34.9g  25g  25g S  0.0 40.2  17:58.38 postgres: 
>> checkpointer process
>>
>> shared_buffers = 32GB
> Also, what is work_mem ?
>
> Justin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] checkpoint and recovering process use too much memory

2017-11-02 Thread Andres Freund
Hi,

On 2017-11-03 01:43:32 +, tao tony wrote:
> I had an asynchronous steaming replication HA cluster.Each node had 64G 
> memory.pg is 9.6.2 and deployed on centos 6.
> 
> 
> Last month the database was killed by OS kernel for OOM,the checkpoint 
> process was killed.
> 
> 
> I noticed checkpoint process occupied memory for more than 20GB,and it was 
> growing everyday.In the hot-standby node,the recovering process occupied 
> memory as big as checkpoint process.
> 
> 
> I turned the checkpoint parameters,but it didn't worked.
> 
> Now In the standby node,checkpoint and recovering process  used more then 
> 50GB memory as below,and I worried someday the cluster would be killed by OS 
> again.
> 
>PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
> 167158 postgres  20   0 34.9g  25g  25g S  0.0 40.4  46:36.86 postgres: 
> startup process   recovering 00040855004B
> 167162 postgres  20   0 34.9g  25g  25g S  0.0 40.2  17:58.38 postgres: 
> checkpointer process
> 
> 
> shared_buffers = 32GB

I think it's more likely that what you're seeing is just that both
checkpointer and startup process touch most shared buffers and thus show
up as having touched all that memory.

Regards,

Andres Freund


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] checkpoint and recovering process use too much memory

2017-11-02 Thread Justin Pryzby
On Fri, Nov 03, 2017 at 01:43:32AM +, tao tony wrote:
> I had an asynchronous steaming replication HA cluster.Each node had 64G 
> memory.pg is 9.6.2 and deployed on centos 6.
> 
> Last month the database was killed by OS kernel for OOM,the checkpoint 
> process was killed.

If you still have logs, was it killed during a large query?  Perhaps one using
a hash aggregate?

> I noticed checkpoint process occupied memory for more than 20GB,and it was 
> growing everyday.In the hot-standby node,the recovering process occupied 
> memory as big as checkpoint process.

"resident" RAM of a postgres subprocess is often just be the fraction of
shared_buffers it's read/written.  checkpointer must necessarily read all dirty
pages from s-b and write out to disk (by way of page cache), so that's why its
RSS is nearly 32GB.  And the recovery process is continuously writing into s-b.

> Now In the standby node,checkpoint and recovering process  used more then 
> 50GB memory as below,and I worried someday the cluster would be killed by OS 
> again.
> 
>PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
> 167158 postgres  20   0 34.9g  25g  25g S  0.0 40.4  46:36.86 postgres: 
> startup process   recovering 00040855004B
> 167162 postgres  20   0 34.9g  25g  25g S  0.0 40.2  17:58.38 postgres: 
> checkpointer process
> 
> shared_buffers = 32GB

Also, what is work_mem ?

Justin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general