On Thu, Jun 10, 2010 at 12:01 PM, Heikki Linnakangas
<heikki.linnakan...@enterprisedb.com> wrote:
> We're talking about a corrupt record (incorrect CRC, incorrect backlink
> etc.), not errors within redo functions. During crash recovery, a corrupt
> record means you've reached end of WAL. In standby mode, when streaming WAL
> from master, that shouldn't happen, and it's not clear what to do if it
> does. PANIC is not a good idea, at least if the server uses hot standby,
> because that only makes the situation worse from availability point of view.
> So we log the error as a WARNING, and keep retrying. It's unlikely that the
> problem will just go away, but we keep retrying anyway in the hope that it
> does. However, it seems that we're too aggressive with the retries.
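
To make the trade-off concrete: a less aggressive policy could back off
between attempts and give up after a bounded number of failures at the
same WAL position. The following is a minimal sketch in C, purely for
illustration. The function names, the constants, and the give-up
threshold are all hypothetical and do not correspond to anything in the
PostgreSQL source.

```c
#include <assert.h>

/* Hypothetical policy knobs; none of these names exist in PostgreSQL. */
#define RETRY_BASE_DELAY_MS  100    /* delay before the first retry */
#define RETRY_MAX_DELAY_MS   5000   /* upper bound on any single delay */
#define RETRY_MAX_ATTEMPTS   10     /* consecutive failures tolerated */

typedef enum
{
    RETRY_WAIT,     /* sleep for retry_delay_ms() and try again */
    RETRY_GIVE_UP   /* stop retrying; leave it to the administrator */
} RetryAction;

/*
 * Delay before retry number `attempt` (0-based). The delay doubles on
 * each attempt and is capped at RETRY_MAX_DELAY_MS.
 */
int
retry_delay_ms(int attempt)
{
    int delay = RETRY_BASE_DELAY_MS;

    assert(attempt >= 0);

    while (attempt-- > 0 && delay < RETRY_MAX_DELAY_MS)
        delay *= 2;

    return (delay > RETRY_MAX_DELAY_MS) ? RETRY_MAX_DELAY_MS : delay;
}

/*
 * Decide what to do after the same WAL position has failed validation
 * `failures` times in a row.
 */
RetryAction
retry_policy(int failures)
{
    return (failures >= RETRY_MAX_ATTEMPTS) ? RETRY_GIVE_UP : RETRY_WAIT;
}
```

With these assumed values the delays run 100 ms, 200 ms, 400 ms and so
on up to the 5 second cap, and the tenth consecutive failure at the same
position yields RETRY_GIVE_UP instead of another retry.
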
You can reproduce this problem by doing the following.

1. initdb
2. edit postgresql.conf, set wal_level=hot_standby, max_wal_senders=1;
   edit pg_hba.conf, trust local replication connections
3. pg_ctl start
4. make some changes to the database
5. take a hot backup to another directory (call it pgslave)
6. create pgslave/recovery.conf with standby_mode='on',
   primary_conninfo=whatever; edit pgslave/postgresql.conf, change the
   port number, set hot_standby=on
7. pg_ctl start -D pgslave

At this point you should have a working HS/SR setup. Now:

8. shut the slave down
9. move recovery.conf out of the way
10. restart the slave - it will do recovery and enter normal running
11. make some database changes
12. stop the slave
13. put recovery.conf back
14. restart the slave
15. make a bunch of changes on the master

When the slave then tries to replay, you get something like:

WARNING:  invalid record length at 0/4005330
WARNING:  invalid record length at 0/4005330
WARNING:  invalid record length at 0/4005330

...ad infinitum.

Obviously there are other ways this could occur - the WAL could really
be corrupted, for example - but the current handling is not too
graceful. I'm actually thinking it might be better to trigger a
shutdown if this happens. Probably something has gone haywire and
manual intervention is required. Retrying when there's no hope of
success isn't really that helpful.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company