Hello list,

A little while ago I posted about how my ... exciting ... backup procedure caused occasional problems at startup because the clog was not big enough. (http://archives.postgresql.org/pgsql-hackers/2011-04/msg01148.php) I recently had a reproduction and a little bit of luck, and I think I have a slightly better idea of what may be causing this.

The first fact is that turning off hot standby will let the cluster start up, but only after seeing a spate of messages like these (a dozen or dozens, not thousands):

2011-06-09 08:02:32 UTC LOG: restored log file "000000020000002C000000C0" from archive
2011-06-09 08:02:33 UTC WARNING: xlog min recovery request 2C/C1F09658 is past current point 2C/C037B278
2011-06-09 08:02:33 UTC CONTEXT: writing block 0 of relation base/16385/16784_vm
        xlog redo insert: rel 1663/16385/128029; tid 114321/63
2011-06-09 08:02:33 UTC LOG: restartpoint starting: xlog

Most importantly, *all* such messages are in visibility map forks (_vm). I am reasonably confident that my code does not start reading data until pg_start_backup() has returned, and that it blocks on pg_stop_backup() only after having read all the data.

Also, the mailing list correspondence at http://archives.postgresql.org/pgsql-hackers/2010-11/msg02034.php suggests that the visibility map is not flushed at checkpoints, so perhaps with some poor timing an old page can wander onto disk even after the checkpoint barrier that pg_start_backup() waits for. (I have not yet found the critical section that makes visibility map buffers immune to checkpoints, though.)

Given all that, if the smgr's generic read path that checks the LSN and possibly the clog (but apparently only in hot standby mode, since pre-hot-standby the clog's intermediate states were not so interesting...) has a problem with such uncheckpointed pages, then it would seem reasonable that the system now refuses to start, unlike the way it once behaved.

FWIW, letting recovery run without hot standby for a little while, canceling, and then starting again after the danger zone had passed would allow recovery to proceed correctly, as one might expect.

Thoughts?

--
fdr
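
For anyone who wants to check their own recovery logs for the same pattern, here is a minimal Python sketch. It pairs each "xlog min recovery request" WARNING with the CONTEXT line that follows it, reports how far the requested LSN is ahead of the current point, and flags whether the relation is a visibility map fork. It assumes the log format shown in the excerpt above and that the CONTEXT line comes after its WARNING; the function and field names are hypothetical, chosen only for illustration.

```python
import re

# Sample lines copied from the log excerpt in this message.
SAMPLE_LOG = """\
2011-06-09 08:02:32 UTC LOG: restored log file "000000020000002C000000C0" from archive
2011-06-09 08:02:33 UTC WARNING: xlog min recovery request 2C/C1F09658 is past current point 2C/C037B278
2011-06-09 08:02:33 UTC CONTEXT: writing block 0 of relation base/16385/16784_vm
        xlog redo insert: rel 1663/16385/128029; tid 114321/63
2011-06-09 08:02:33 UTC LOG: restartpoint starting: xlog
"""

WARNING_RE = re.compile(
    r"WARNING:\s+xlog min recovery request ([0-9A-Fa-f]+/[0-9A-Fa-f]+) "
    r"is past current point ([0-9A-Fa-f]+/[0-9A-Fa-f]+)"
)
CONTEXT_RE = re.compile(r"CONTEXT:\s+writing block (\d+) of relation (\S+)")


def parse_lsn(text):
    """Turn an LSN printed as 'hi/lo' (two hex numbers) into one integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def find_min_recovery_warnings(log_text):
    """Return one dict per WARNING/CONTEXT pair found in log_text."""
    results = []
    pending = None  # LSNs from the most recent unmatched WARNING
    for line in log_text.splitlines():
        warning = WARNING_RE.search(line)
        if warning:
            pending = (warning.group(1), warning.group(2))
            continue
        context = CONTEXT_RE.search(line)
        if context and pending is not None:
            requested, current = pending
            relation = context.group(2)
            results.append({
                "requested": requested,
                "current": current,
                "bytes_ahead": parse_lsn(requested) - parse_lsn(current),
                "block": int(context.group(1)),
                "relation": relation,
                "is_vm_fork": relation.endswith("_vm"),
            })
            pending = None
    return results


if __name__ == "__main__":
    rows = find_min_recovery_warnings(SAMPLE_LOG)
    for row in rows:
        print(row["relation"], row["bytes_ahead"], row["is_vm_fork"])
    # True only if every warning found was raised while writing a _vm fork.
    print("all in _vm forks:", all(row["is_vm_fork"] for row in rows))
```

Running it over a full recovery log (read the file and pass its contents to find_min_recovery_warnings) would show whether any warning ever names something other than a _vm fork.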