[GENERAL] Trouble with replication

David Greco Wed, 05 Jun 2013 13:41:37 -0700

I've setup two 9.2.4 servers to serve as master-slave in a streaming 
replication scenario. I started with a  fresh database on the master, setup the 
replication, then imported using pg_restore about 30GB of data. The master and 
slave are geographically separated, so replication of this amount of data 
can/should take hours.  I saw from pg_last_xlog_receive_location and 
pg_last_xlog_replay_location that the slave began to receive the replication 
information, it eventually quit with the following errors in the log:


2013-06-05 16:28:43.198 EDT,,,19978,,51af9f7a.4e0a,2,,2013-06-05 16:28:42 
EDT,,0,FATAL,XX000,"could not receive data from WAL stream: FATAL:  requested 
WAL segment 000000010000000000000022 has already been removed
",,,,,,,,,""

Checking the master, I see that file has in fact been removed from the pg_xlog 
directory. The master has archive_command setup to ship the wal files to the 
slave, and the slave is setup with a recovery_command to read them from that 
directory. In fact, that WAL segment exists in the slave's pg_xlog directory as 
well.

Now, from what I can tell, the master archived this wal file out of its xlog 
directory (based on the keep wal segments setting). Then, why did the slave not 
pick it up from the directory that it was archived to? It is my understanding 
that the log shipping via archive_command from master to slave is precisely 
there to prevent this scenario. What am I doing wrong? Below are some of the 
pertinent settings.

On Master:
wal_level  = hot_standby
archive_mode = on
archive_command = 'rsync -zaq %p 
postgres@pg-slave:/var/lib/pgsql/wal_restore/%f && test ! -f 
/var/lib/pgsql/backup/wal_archive/%f && cp %p 
/var/lib/pgsql/backup/wal_archive/'
archive_timeout = 300
max_wal_senders = 5
wal_keep_segments = 0   # not sure why I've set it to this?

On Slave:
wal_level = hot_standby
archive_mode = on
archive_command = 'test ! -f /var/lib/pgsql/backup/wal_archive/%f && cp -i %p 
/var/lib/pgsql/backup/wal_archive/%f < /dev/null'
hot_standby = on
wal_keep_segments = 1

On slave - recovery.conf:
standby_mode = 'on'
primary_conninfo = 'host=pg-master port=5432 user=replicator'
restore_command = 'cp /var/lib/psql/wal_restore/%f %p'
archive_cleanup_command = 'pg_archivecleanup /var/lib/pgsql/wal_restore/ %r'

[GENERAL] Trouble with replication

Reply via email to