Hi,

I have reverted to cp as the archive command, but now under heavy load (> 150 WAL segments per minute) some WAL segments get corrupted:

postgres@lemur:~/9.1/main/pg_xlog$ md5sum 000000010000001000000049
f1906d2745224430f811496df466203f  000000010000001000000049
postgres@lemur:~/9.1/main/pg_xlog$ md5sum ~/backups/wal/000000010000001000000049
7e73fe759e41e427497360a815f9d3e1  /var/lib/postgresql/backups/wal/000000010000001000000049

On Fri, Apr 26, 2013 at 10:55 AM, Albe Laurenz <laurenz.a...@wien.gv.at> wrote:

> German Becker wrote:
> > Here is the archive part of the config:
> >
> > archive_mode = on               # allows archiving to be done
> >                                 # (change requires restart)
> > archive_command = '/var/lib/postgresql/scripts/archive_copy.sh %p %f'   # command to use to archive a logfile segment
> > #archive_timeout = 0            # force a logfile segment switch after this
> >                                 # number of seconds; 0 disables
>
> So the problem might be in that script.
>
> > The archive command makes a local copy and then copies it to the backup server via ssh. Both copies
> > are md5-checked and retried up to 3 times in case of failure.
>
> archive_command should not retry the operation, but rather
> return a non-zero return code.
>
> > I have seen under heavy load that some WALs are skipped, some are smaller in size, some are corrupted (i.e.,
> > the loop fails 3 times).
> > I'm not sure about the return value (checking it). What is the expected behaviour of the archiver?
> > Will it retry the archive if the archive command returns something other than 0? Will it retain the WAL segment
> > until it is successfully archived?
>
> See
> http://www.postgresql.org/docs/current/static/continuous-archiving.html#BACKUP-ARCHIVING-WAL
>
> archive_command should exit with zero only if the
> WAL segment was archived successfully.
> PostgreSQL will retry and retain the WAL segment until
> archival succeeds.
>
> Yours,
> Laurenz Albe
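
For reference, the advice quoted above (no retry loop inside the script, exit with zero only when the segment is really archived) could be sketched roughly as below. This is not the archive_copy.sh from this thread, which was never posted; the function name archive_wal and the ARCHIVE_DIR variable are made up for illustration, and the default path is only taken from the md5sum output above. It handles the local copy only (no ssh step) and uses cmp for a byte-for-byte comparison in place of the md5 check.

```shell
#!/bin/sh
# Hypothetical sketch, not the script discussed in this thread.
# ARCHIVE_DIR is an assumed setting; override it in the environment.
ARCHIVE_DIR="${ARCHIVE_DIR:-/var/lib/postgresql/backups/wal}"

# archive_wal SRC NAME
#   SRC  is the path PostgreSQL passes as %p
#   NAME is the file name PostgreSQL passes as %f
# Returns 0 only if the segment is in place under its final name.
# There is no retry loop: on any failure it returns non-zero and
# leaves the retry to PostgreSQL, which keeps the segment until then.
archive_wal() {
    src="$1"
    name="$2"
    dest="$ARCHIVE_DIR/$name"
    tmp="$dest.tmp.$$"

    # Refuse to overwrite a segment that is already archived.
    [ ! -e "$dest" ] || return 1

    # Copy under a temporary name so a partial copy never carries
    # the final segment name.
    cp "$src" "$tmp" || { rm -f "$tmp"; return 1; }

    # Compare the copy with the source byte for byte.
    if ! cmp -s "$src" "$tmp"; then
        rm -f "$tmp"
        return 1
    fi

    # A rename within one directory is atomic on POSIX filesystems.
    mv "$tmp" "$dest"
}
```

To use it as a script, the file would end with a call such as archive_wal "$1" "$2" and be referenced from archive_command with %p %f as in the config above. Whether the local comparison catches the corruption seen here depends on where it happens, so treat this as a starting point rather than a fix.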