On 8.12.2012 03:08, Jeff Janes wrote:
> On Thu, Dec 6, 2012 at 3:52 PM, Tomas Vondra <t...@fuzzy.cz> wrote:
>> Hi,
>>
>> On 6.12.2012 23:45, MauMau wrote:
>>> From: "Tom Lane" <t...@sss.pgh.pa.us>
>>>> Well, that's unfortunate, but it's not clear that automatic recovery is
>>>> possible. The only way out of it would be if an undamaged copy of the
>>>> segment was in pg_xlog/ ... but if I recall the logic correctly, we'd
>>>> not even be trying to fetch from the archive if we had a local copy.
>>>
>>> No, PG will try to fetch the WAL file from pg_xlog when it cannot get it
>>> from archive. XLogFileReadAnyTLI() does that. Also, PG manual contains
>>> the following description:
>>>
>>> http://www.postgresql.org/docs/9.1/static/continuous-archiving.html#BACKUP-ARCHIVING-WAL
>>>
>>> WAL segments that cannot be found in the archive will be sought in
>>> pg_xlog/; this allows use of recent un-archived segments. However,
>>> segments that are available from the archive will be used in preference
>>> to files in pg_xlog/.
>>
>> So why don't you use an archive command that does not create such
>> incomplete files? I mean something like this:
>>
>> archive_command = 'cp %p /arch/%f.tmp && mv /arch/%f.tmp /arch/%f'
>>
>> Until the file is renamed, it's considered 'incomplete'.
>
> Wouldn't having the incomplete file be preferable over having none of it at
> all?
>
> It seems to me you need considerable expertise to figure out how to do
> optimal recovery (i.e. losing the least transactions) in this
> situation, and that that expertise cannot be automated. Do you trust
> a partial file from a good hard drive, or a complete file from a
> partially melted pg_xlog?

It clearly is a rather complex issue, no doubt about that. And yes, the
reliability of the devices holding pg_xlog is an important detail. Although
if the WAL is not written in a reliable way, you're hosed anyway, I guess.

The recommended archive command is based on the assumption that the local
pg_xlog is intact (e.g. because it's located on a reliable RAID1 array),
which seems to be the assumption of the OP too.

In my opinion, you're more likely to run into an incomplete copy of a WAL
segment in the archive than a corrupted local WAL file. And if the local
file really is corrupted, that would be detected during replay.

Tomas
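
To make the copy-then-rename idea from the archive_command above a bit more concrete, here is a minimal Python sketch of the same technique. It is only an illustration under assumptions, not something PostgreSQL ships: the function name archive_segment and the script name archive_wal.py are hypothetical, it assumes a POSIX filesystem where a rename within one directory is atomic, and it adds two things the one-liner does not do (refusing to overwrite an existing archived segment, and calling fsync before and after the rename).

```python
import os
import shutil
import sys


def archive_segment(src, dest):
    """Copy src to dest so that dest only ever appears as a complete file.

    Hypothetical helper: returns True on success, False if dest already
    exists (an already archived segment is never overwritten).
    """
    if os.path.exists(dest):
        return False

    tmp = dest + ".tmp"
    with open(src, "rb") as fin, open(tmp, "wb") as fout:
        shutil.copyfileobj(fin, fout)
        fout.flush()
        os.fsync(fout.fileno())  # force the copied data to disk first

    # Atomic on POSIX when tmp and dest are in the same directory:
    # readers see either no dest at all or the complete file.
    os.rename(tmp, dest)

    # Assumption: POSIX-style directory fsync, to make the rename durable.
    dir_fd = os.open(os.path.dirname(dest) or ".", os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
    return True


# Runs only when called with exactly two arguments: source and destination.
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(0 if archive_segment(sys.argv[1], sys.argv[2]) else 1)
```

If saved as, say, /usr/local/bin/archive_wal.py (again a made-up path), it could be wired in as archive_command = 'python3 /usr/local/bin/archive_wal.py %p /arch/%f'. A non-zero exit status tells the server the segment was not archived, so it will retry later. A crash mid-copy leaves only a .tmp file behind, which recovery never looks for under the segment's real name.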