Re: [GENERAL] PG 9.0 EBS Snapshot Backups on Slave

2012-01-24 Thread Robert Treat
On Mon, Jan 23, 2012 at 8:02 PM, Alan Hodgson ahodg...@simkin.ca wrote:
> On Monday, January 23, 2012 07:54:16 PM Andrew Hannon wrote:
>> It is worth noting that, the slave (seemingly) catches up eventually,
>> recovering later log files with streaming replication current. Can I trust
>> this state?
>
> Should be able to. The master will also actually retry the logs and eventually
> ship them all too, in my experience.

Right, as long as the failure case is temporary, the master should
retry, and things should work themselves out. It's good to have some
level of monitoring in place for such operations to make sure replay
doesn't get stalled.
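
As a rough illustration of that kind of monitoring, here is a minimal
sketch in Python. It assumes you sample pg_last_xlog_replay_location() on
the slave twice and pg_current_xlog_location() on the master once, and
pass the resulting "X/Y" strings in. The helper names are made up for
this example, and how you run the queries is left out.

```python
def parse_xlog_location(text):
    """Parse an 'X/Y' xlog location string into a comparable (hi, lo) tuple."""
    hi, lo = text.strip().split("/")
    # Both halves are hexadecimal; tuples compare in the same order as locations.
    return (int(hi, 16), int(lo, 16))


def replay_is_stalled(previous_replay, current_replay, master_location):
    """Return True when replay made no progress while the master is ahead.

    previous_replay and current_replay are two samples taken on the slave
    some time apart; master_location is sampled on the master.
    """
    prev = parse_xlog_location(previous_replay)
    cur = parse_xlog_location(current_replay)
    master = parse_xlog_location(master_location)
    return cur <= prev and master > cur
```

This only compares positions for ordering, so it makes no claim about how
many bytes the slave is behind. A check like
replay_is_stalled("0/10", "0/10", "0/20") reports a stall, while a slave
that is idle because it is fully caught up does not.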

That said, have you tested this backup? I'm a little concerned you'll
have ended up with something unusable because you aren't saving the xlog
files that are written while the snapshot is being taken. It's possible
that you won't need them in most cases (we have a script called
zbackup[1] which does similar motions using zfs, though on zfs the
snapshot really is instantaneous), and I can't remember a time when we
got stuck by that, but that might just be faulty memory. A better
approach would probably be to take the omnipitr code [2], which
already has provisions for taking backups from slaves and catching the
appropriate WAL files, and rewrite the rsync bits to use snapshots
instead, which would give you some assurances against possibly missing
files.

[1] this script is old and crufty, but provides a good example:
http://labs.omniti.com/labs/pgtreats/browser/trunk/tools/zbackup.sh

[2] https://github.com/omniti-labs/omnipitr


Robert Treat
conjecture: xzilla.net
consulting: omniti.com

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] PG 9.0 EBS Snapshot Backups on Slave

2012-01-23 Thread Andrew Hannon
Hello,

I am playing with a script that implements physical backups by snapshotting the 
EBS-backed software RAID. My basic workflow is this:

1. Stop PG on the slave
2. pg_start_backup on the master
3. On the slave:
   A. unmount the PG RAID
   B. snapshot each disk in the raid
   C. mount the PG RAID 
4. pg_stop_backup
5. Restart PG on the slave
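
To make the ordering above concrete, here is a minimal sketch in Python.
Every argument is a hypothetical callable standing in for whatever
actually performs that step (init scripts, psql calls, the EBS API); none
of them are real library functions. The only point it tries to show is
that each step that was started gets undone even if a later step fails.

```python
def snapshot_backup(stop_slave, start_backup, unmount, snapshot_disks,
                    mount, stop_backup, start_slave):
    """Run the snapshot steps in order, always undoing what was started."""
    stop_slave()                    # step 1: stop PG on the slave
    try:
        start_backup()              # step 2: pg_start_backup on the master
        try:
            unmount()               # step 3A: unmount the PG RAID
            try:
                snapshot_disks()    # step 3B: snapshot each disk
            finally:
                mount()             # step 3C: remount even if a snapshot failed
        finally:
            stop_backup()           # step 4: pg_stop_backup on the master
    finally:
        start_slave()               # step 5: restart PG on the slave
```

With this shape, a failed snapshot still leaves the RAID mounted, the
master out of backup mode, and the slave running, and the exception is
passed on to the caller.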

Step 3 is actually quite fast; however, on the master I end up seeing the 
following warning:

WARNING:  transaction log file 000100CC0076 could not be archived: too many failures

I am guessing (I will confirm with timestamps later) that this warning happens 
during steps 3A-3C; however, my questions below stand regardless of when this 
failure occurs.

It is worth noting that the slave (seemingly) catches up eventually, 
recovering later log files, and streaming replication is current. Can I trust 
this state?

Should I be concerned about this warning? Is it a simple blip that can safely 
be ignored, or have I lost data? From googling, it looks like the number of 
retry attempts is not configurable (it appears to have retried a handful of 
times).

If this is indeed a real problem, am I best off changing my archive_command to 
retain logs in a transient location when I am in snapshot mode, and then ship 
them in bulk once the snapshot has completed? Are there any other remedies that 
I am missing?
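
For what it is worth, the transient-location idea could look roughly like
the Python sketch below. It assumes a wrapper that archive_command calls
with %p and %f, a flag file that the snapshot script creates while it is
in snapshot mode, and local spool and destination directories. All of
those names and paths are hypothetical, and shipping to a remote host is
left out.

```python
import os
import shutil


def archive_wal(wal_path, wal_name, dest_dir, spool_dir, flag_file):
    """Copy one WAL file; return 0 on success and 1 on failure."""
    # While the flag file exists (snapshot mode), park the file in the spool.
    target_dir = spool_dir if os.path.exists(flag_file) else dest_dir
    target = os.path.join(target_dir, wal_name)
    if os.path.exists(target):
        return 1                      # never overwrite an archived file
    try:
        tmp = target + ".tmp"
        shutil.copyfile(wal_path, tmp)
        os.rename(tmp, target)        # publish only a complete copy
    except OSError:
        return 1
    return 0


def flush_spool(dest_dir, spool_dir):
    """Move spooled WAL files to the destination in name order."""
    moved = []
    for name in sorted(os.listdir(spool_dir)):
        if name.endswith(".tmp"):
            continue                  # skip partial copies
        dst = os.path.join(dest_dir, name)
        if os.path.exists(dst):
            continue                  # leave it spooled for manual inspection
        shutil.move(os.path.join(spool_dir, name), dst)
        moved.append(name)
    return moved
```

One caveat with this approach: once the wrapper returns 0, the master
treats the file as archived and may recycle it, so the spool directory
has to be as trustworthy as the real archive until flush_spool has run.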

Thank you very much for your time,

Andrew Hannon  


Re: [GENERAL] PG 9.0 EBS Snapshot Backups on Slave

2012-01-23 Thread Alan Hodgson
On Monday, January 23, 2012 07:54:16 PM Andrew Hannon wrote:
> It is worth noting that, the slave (seemingly) catches up eventually,
> recovering later log files with streaming replication current. Can I trust
> this state?

Should be able to. The master will also actually retry the logs and eventually 
ship them all too, in my experience.

