Re: Monitoring PITR recovery progress

2019-01-23 Thread Alan Hodgson
On Wed, 2019-01-23 at 18:58 +0100, Ivan Voras wrote:
> And, the actual question: how to monitor the WAL replay process?
> Currently, the recovery.conf file is sitting there, with the database
> running, but pg processes are idle, and pg_stat_activity doesn't list
> anything which appears to be related to the recovery process.
> 
> 
> 

The server logs each wal segment that gets processed during recovery.
And you would definitely see a busy high-I/IO process applying the
recovery.
It also logs when the recovery is complete. And I'm pretty sure it
renames recovery.conf to recovery.done or something when it's done.

Monitoring PITR recovery progress

2019-01-23 Thread Ivan Voras
Hello,

I got a sort of POLA violation moment today - a colleague has restored a
PITR archive up to a point in time, and when the developers looked at the
data, it looked wrong - as it if wasn't from that particular time. Later,
he told me he got an error trying to use pg_dump to extract the desired
tables to restore:

Dumping the contents of table "xxx" failed: PQgetResult() failed.
pg_dump: Error message from server: ERROR: canceling statement due to
conflict with recovery
DETAIL: User query might have needed to see row versions that must be
removed.

This could be the source of data problems the devs are seeing. Offhand,
this looks like it's indicating a conflict between the PITR recovery
process and pg_dump. But, something's puzzling to me: why did he manage to
connect to the db at all?

And, the actual question: how to monitor the WAL replay process? Currently,
the recovery.conf file is sitting there, with the database running, but pg
processes are idle, and pg_stat_activity doesn't list anything which
appears to be related to the recovery process.