[GENERAL] Detect streaming replication failure

2014-07-17 Thread Lists

For reference:
https://wiki.postgresql.org/wiki/Streaming_Replication

Assume a master-slave streaming replication configuration on PostgreSQL 9.2.
Assume that the master has been chugging away, but the slave PG service has
been offline for a while, and the WAL archive has moved on far enough that
the slave cannot catch up.


When I start the slave PG instance, PG launches and runs but doesn't
update. It also doesn't seem to throw any errors. The only outward sign
that I can see that anything is wrong is that
pg_last_xlog_replay_location() doesn't update. I can look in
/var/lib/pgsql/9.2/data/pg_log/postgresql-Thu.csv and see errors there, e.g.:


2014-07-17 22:38:23.851 UTC,,,21310,,53c8505f.533e,2,,2014-07-17 22:38:23 UTC,,0,FATAL,XX000,could not receive data from WAL stream: FATAL:  requested WAL segment 000700050071 has already been removed
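
For the log-parsing route, a minimal sketch in Python of how such a row could be picked out of the CSV log. It assumes the standard csvlog column order, in which the severity is the 12th field and the message the 14th (this matches the line above); the function name and the marker string are illustrative choices, not part of PostgreSQL.

```python
import csv
import io

# 0-based column positions, assuming the standard csvlog layout.
SEVERITY_COL = 11
MESSAGE_COL = 13
# Assumed marker: the tail of the error message seen in the log above.
MARKER = "has already been removed"


def find_removed_segment_errors(log_text):
    """Return messages of FATAL rows that report a removed WAL segment."""
    hits = []
    for row in csv.reader(io.StringIO(log_text)):
        if len(row) <= MESSAGE_COL:
            continue  # skip short or malformed rows
        if row[SEVERITY_COL] == "FATAL" and MARKER in row[MESSAGE_COL]:
            hits.append(row[MESSAGE_COL])
    return hits
```

A monitoring job could read the current day's log file, pass its contents to this function, and alert when the returned list is not empty.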


Is that the only way to detect this condition? I guess I'm looking for 
something like


select * from pg_is_replicating_ok();
1

on the slave. At the moment, it appears that I can either parse the log
file or check whether pg_last_xact_replay_timestamp() is more than an
acceptable threshold of minutes in the past.
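
The threshold check could look roughly like the following Python sketch. The query uses pg_is_in_recovery() and pg_last_xact_replay_timestamp(), both available in 9.2; the function name, the five-minute default, and the idea of passing in the query results as arguments are assumptions made for illustration. One caveat: if the master is idle, the replay timestamp also stops advancing, so this check alone can raise false alarms.

```python
from datetime import datetime, timedelta, timezone

# SQL one might run on the slave to obtain the two inputs below.
LAG_QUERY = "SELECT pg_is_in_recovery(), pg_last_xact_replay_timestamp()"


def replay_is_stale(in_recovery, last_replay, now,
                    max_lag=timedelta(minutes=5)):
    """Return True when the last replayed commit is older than max_lag."""
    if not in_recovery:
        return False  # not a standby, so replay lag does not apply
    if last_replay is None:
        return True   # nothing has been replayed during this recovery
    return now - last_replay > max_lag
```

The values returned by LAG_QUERY would be passed as in_recovery and last_replay, with now taken from datetime.now(timezone.utc).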


http://www.postgresql.org/docs/9.2/static/functions-admin.html

Thanks,

Ben


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Detect streaming replication failure

2014-07-17 Thread wd
You can run select * from pg_stat_replication on the master to check the
status of all the slaves.
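
A slave that cannot connect or stream has no row in that view, so one way to turn it into a check is to compare the rows against a list of slaves you expect to see. A minimal Python sketch, assuming the rows have already been fetched as dictionaries keyed by the view's column names (client_addr and state exist in 9.2); the function name and the example addresses are hypothetical.

```python
def missing_standbys(rows, expected_addrs):
    """Return expected slave addresses that have no row in 'streaming' state."""
    streaming = {r["client_addr"] for r in rows if r["state"] == "streaming"}
    return sorted(set(expected_addrs) - streaming)


# Example with made-up addresses: 10.0.0.3 is expected but absent.
rows = [{"client_addr": "10.0.0.2", "state": "streaming"}]
print(missing_standbys(rows, ["10.0.0.2", "10.0.0.3"]))
```

An empty result means every expected slave is currently streaming from the master.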

