Hi,

I think I found the cause: I tried to clear the RAM cache on the slave by running "sync; echo 3 > /proc/sys/vm/drop_caches", and that made the "receiver" process on the slave go away.

ps -ef | grep receiver
postgres  6182  6178  0 12:11 ?        00:00:06 postgres: wal receiver process   streaming D/FB8DA000

sync; echo 3 > /proc/sys/vm/drop_caches

ps -ef | grep receiver
root      8804 30447  0 12:29 pts/2    00:00:00 grep --color=auto receiver

regards

On 6 Aug 2013, at 10:44 AM, ascot.m...@gmail.com wrote:

> Hi,
>
> I am doing some stress tests to a pair of PG servers to monitor the
> pg_stat_replication, during the test, the pg_stat_replication suddenly became
> empty.
>
> PG version: 9.2.4
> O/S: Ubuntu: 12.04
>
> Since I need to monitor the replication lag from time to time, if the
> pg_stat_replication becomes empty, the lag calculation in the slave will be
> wrong.
> Please advise if this is a bug.
>
> regards
>
>
> How to reproduce:
>
> session 1: Master server - try to insert a large number of records into a
> test table
> postgres=# drop table test;CREATE TABLE test (id INTEGER PRIMARY KEY); INSERT
> INTO test VALUES (generate_series(1,100000000)); EXPLAIN ANALYZE SELECT
> COUNT(*) FROM test;
>
> 2) session 2: Master server - check the byte_lag from time to time
> postgres=# SELECT
>     sent_offset - (
>         replay_offset - (sent_xlog - replay_xlog) * 255 * 16 ^ 6 ) AS byte_lag
> FROM (
>     SELECT
>         client_addr,
>         ('x' || lpad(split_part(sent_location,   '/', 1), 8, '0'))::bit(32)::bigint AS sent_xlog,
>         ('x' || lpad(split_part(replay_location, '/', 1), 8, '0'))::bit(32)::bigint AS replay_xlog,
>         ('x' || lpad(split_part(sent_location,   '/', 2), 8, '0'))::bit(32)::bigint AS sent_offset,
>         ('x' || lpad(split_part(replay_location, '/', 2), 8, '0'))::bit(32)::bigint AS replay_offset
>     FROM pg_stat_replication
> ) AS s;
>  byte_lag
> ----------
>   2097216
> (1 row)
>
> postgres=# SELECT
>     sent_offset - (
>         replay_offset - (sent_xlog - replay_xlog) * 255 * 16 ^ 6 ) AS byte_lag
> FROM (
>     SELECT
>         client_addr,
>         ('x' || lpad(split_part(sent_location,   '/', 1), 8, '0'))::bit(32)::bigint AS sent_xlog,
>         ('x' || lpad(split_part(replay_location, '/', 1), 8, '0'))::bit(32)::bigint AS replay_xlog,
>         ('x' || lpad(split_part(sent_location,   '/', 2), 8, '0'))::bit(32)::bigint AS sent_offset,
>         ('x' || lpad(split_part(replay_location, '/', 2), 8, '0'))::bit(32)::bigint AS replay_offset
>     FROM pg_stat_replication
> ) AS s;
>  byte_lag
> ----------
> (0 rows)
>
>
> 3) session 3: Slave server -
> postgres=# SELECT CASE WHEN pg_last_xlog_receive_location() =
> pg_last_xlog_replay_location() THEN 0 ELSE EXTRACT (EPOCH FROM now() -
> pg_last_xact_replay_timestamp()) END AS log_delay;
>  log_delay
> -----------
>          0
> (1 row)
>
> postgres=# SELECT CASE WHEN pg_last_xlog_receive_location() =
> pg_last_xlog_replay_location() THEN 0 ELSE EXTRACT (EPOCH FROM now() -
> pg_last_xact_replay_timestamp()) END AS log_delay;
>  log_delay
> -----------
>   4.873282
> (1 row)
>
> .
> .
> .
> postgres=# SELECT CASE WHEN pg_last_xlog_receive_location() =
> pg_last_xlog_replay_location() THEN 0 ELSE EXTRACT (EPOCH FROM now() -
> pg_last_xact_replay_timestamp()) END AS log_delay;
>   log_delay
> -------------
>  4070.325329
> (1 row)
>
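
For monitoring, the byte_lag arithmetic from the quoted query can also be done on the client side, which makes the empty-result case explicit. Below is a minimal Python sketch, not run against a live server. The function names (parse_location, byte_lag, lag_or_none) are hypothetical, the 255 * 16 ^ 6 constant is simply copied from the query, so it assumes the same 9.2-style location format, and the caller is assumed to have already fetched (sent_location, replay_location) pairs from pg_stat_replication.

```python
# Bytes per logical xlog file, copied from the quoted query (255 * 16 ^ 6).
# Assumption: same location layout as the 9.2 servers in this thread.
XLOG_FILE_BYTES = 255 * 16 ** 6


def parse_location(location):
    """Split an 'X/Y' location string into (xlog id, offset) as integers."""
    xlog_hex, offset_hex = location.split("/")
    return int(xlog_hex, 16), int(offset_hex, 16)


def byte_lag(sent_location, replay_location):
    """Same arithmetic as the byte_lag expression in the quoted query."""
    sent_xlog, sent_offset = parse_location(sent_location)
    replay_xlog, replay_offset = parse_location(replay_location)
    return sent_offset - (
        replay_offset - (sent_xlog - replay_xlog) * XLOG_FILE_BYTES
    )


def lag_or_none(rows):
    """Return one lag per row, or None when pg_stat_replication had no rows.

    `rows` is a hypothetical list of (sent_location, replay_location) tuples.
    """
    if not rows:
        # No walsender rows: the standby is not streaming, so lag is unknown.
        return None
    return [byte_lag(sent, replay) for sent, replay in rows]
```

With this, an empty result (as happens once the receiver on the slave has gone away) comes back as None, so a monitoring script can raise an alert instead of treating the lag as zero.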