On Sun, Oct 23, 2011 at 9:48 PM, Daniel Farina <dan...@heroku.com> wrote:
> On Mon, Oct 17, 2011 at 11:30 PM, Chris Redekop <ch...@replicon.com> wrote:
>> Well, on the other hand maybe there is something wrong with the data.
>> Here's the test/steps I just did -
>> 1. I do the pg_basebackup when the master is under load, hot slave now will
>> not start up but warm slave will.
>> 2. I start a warm slave and let it catch up to current
>> 3. On the slave I change 'hot_standby=on' and do a 'service postgresql
>> restart'
>> 4. The postgres fails to restart with the same error.
>> 5. I turn hot_standby back off and postgres starts back up fine as a warm
>> slave
>> 6. I then turn off the load, the slave is all caught up, master and slave
>> are both sitting idle
>> 7. I, again, change 'hot_standby=on' and do a service restart
>> 8. Again it fails, with the same error, even though there is no longer any
>> load.
>> 9. I repeat this warmstart/hotstart cycle a couple more times until to my
>> surprise, instead of failing, it successfully starts up as a hot standby
>> (this is after maybe 5 minutes or so of sitting idle)
>> So...given that it continued to fail even after the load had been turned off,
>> that makes me believe that the data which was copied over was invalid in
>> some way. And when a checkpoint/logrotation/somethingelse occurred when not
>> under load it cleared itself up....I'm shooting in the dark here
>> Anyone have any suggestions/ideas/things to try?
>
> Having dug at this a little -- but not too much -- the problem
> seems to be that postgres is reading the commit logs way, way too
> early, that is to say, before it has played enough WAL to be
> 'consistent' (the WAL between pg_start and pg_stop backup). I have
> not been able to reproduce this problem (I think) after the message
> from postgres suggesting it has reached a consistent state; at that
> time I am able to go into hot-standby mode.
>
> The message is like: "consistent recovery state reached at %X/%X".
> (this is the errmsg)
>
> It doesn't seem meaningful for StartupCLOG (or, indeed, any of the
> hot-standby path functionality) to be called before that code is
> executed, but it is anyway right now. I'm not sure if this oversight
> is simply an oversight, or indicative of a misplaced assumption
> somewhere. Basically, my thoughts for a fix are to suppress
> hot_standby = on (in spirit) before the consistent recovery state is
> reached.

Not sure about that, but I'll look at where this comes from.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
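
For anyone trying the workaround Daniel describes in the meantime (leaving hot_standby off until the "consistent recovery state reached at %X/%X" message has appeared), here is a minimal Python sketch of the log check. It is only an illustration: the function names are hypothetical, it assumes the log text passed in covers just the current startup of the standby, and it rests on Daniel's observation rather than on a confirmed fix.

```python
import re

# Matches the server log message quoted in the thread:
# "consistent recovery state reached at %X/%X" (two hexadecimal numbers).
CONSISTENT_RE = re.compile(
    r"consistent recovery state reached at ([0-9A-Fa-f]+)/([0-9A-Fa-f]+)"
)


def consistent_recovery_location(log_text):
    """Return the last reported location as a (high, low) tuple, or None.

    Hypothetical helper: assumes log_text holds only the log output of the
    current standby startup, so an older message is not picked up by mistake.
    """
    last = None
    for match in CONSISTENT_RE.finditer(log_text):
        last = (int(match.group(1), 16), int(match.group(2), 16))
    return last


def safe_to_enable_hot_standby(log_text):
    """Hypothetical helper: True once the consistency message has been logged."""
    return consistent_recovery_location(log_text) is not None
```

The check would be run against the warm standby's log before changing hot_standby to on and restarting; if it returns False, the standby is left in warm mode a while longer.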