On 10/30/2014 09:19 PM, Andres Freund wrote:
> Some things I noticed while reading the patch:

A lot of good comments, but let me pick up just two that are related:

> * There's a couple record types (e.g. XLOG_SMGR_TRUNCATE) that only
>    refer to the relation, but not to the block number. These still log
>    their rnode manually. Shouldn't we somehow deal with those in a
>    similar way explicit block references are dealt with?
>
> * Hm. At least WriteMZeroPageXlogRec (and probably the same for all the
>    other slru stuff) doesn't include a reference to the page. Isn't that
>    bad? Shouldn't we make XLogRegisterBlock() usable for that case?
>    Otherwise I fail to see how pg_rewind like tools can sanely deal with this?

Yeah, there are still operations that modify relation pages but don't store the information about the modified pages in the standard format. That includes XLOG_SMGR_TRUNCATE, which you spotted, as well as XLOG_SMGR_CREATE and XLOG_DBASE_CREATE/DROP. Then there are updates to non-relation files, like all the slru stuff, relcache init files, etc. Updates to the FSM and VM bypass the full-page write mechanism, too.

To play it safe, pg_rewind copies all non-relation files as is. That includes all SLRUs, FSM and VM files, and everything else whose filename doesn't match that of (the main fork of) a relation file. Of course, that's a fair amount of copying to do, so we might want to optimize that in the future, but I want to nail the relation files first. They are usually an order of magnitude larger than the other files, after all.
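
To illustrate the filename rule, here is a rough sketch in Python (not pg_rewind's actual code) of splitting files into "main fork of a relation" and "everything else, copy as is". The function name and the regular expression are made up for this example, and the pattern is a simplification: it assumes relation files live under global/, base/<dboid>/ or pg_tblspc/<spcoid>/<version dir>/<dboid>/, and that a main-fork file is named by digits with an optional .<segment> suffix.

```python
import re

# Assumed, simplified layout: a main-fork relation file is
# <relfilenode>[.<segno>] inside global/, base/<dboid>/ or
# pg_tblspc/<spcoid>/<version dir>/<dboid>/.
_REL_MAIN_FORK = re.compile(
    r"^(?:global|base/\d+|pg_tblspc/\d+/[^/]+/\d+)/\d+(?:\.\d+)?$"
)


def is_main_fork_relation_file(path: str) -> bool:
    """Return True if path (relative to the data directory) looks like
    the main fork of a relation file under the assumptions above."""
    return _REL_MAIN_FORK.match(path) is not None


def files_to_copy_as_is(paths):
    """Everything that is not a main-fork relation file gets copied whole."""
    return [p for p in paths if not is_main_fork_relation_file(p)]
```

With this rule, base/16384/16385 and base/16384/16385.1 are treated as relation files, while base/16384/16385_fsm, base/16384/16385_vm, global/pg_control and the SLRU segments all land in the copy-as-is list.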

Unfortunately, pg_rewind still needs to recognize and parse the special WAL records, like XLOG_SMGR_CREATE/TRUNCATE, that modify relation files outside the normal block registration system. I've been thinking that we should add another flag to the WAL record format to mark such records. pg_rewind will still need to understand the record format of such records, but the flag will help to catch bugs of omission. If pg_rewind or another such tool sees a record that's flagged as "special", but doesn't recognize the record type, it can throw an error.
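
A rough sketch of how a tool could use such a flag, in Python rather than C to keep it short. Everything here is hypothetical: the flag does not exist yet, and WalRecord, is_special, KNOWN_SPECIAL and the exception are names invented for the example. Keying on the record type name alone is also a simplification, since real info codes are only unique within a resource manager.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WalRecord:
    record_type: str   # e.g. "XLOG_SMGR_TRUNCATE"
    is_special: bool   # hypothetical flag: modifies relation files
                       # outside the block registration system


# Special record types this (hypothetical) tool knows how to parse.
KNOWN_SPECIAL = {"XLOG_SMGR_CREATE", "XLOG_SMGR_TRUNCATE"}


class UnrecognizedSpecialRecord(Exception):
    """A record is flagged special but the tool cannot interpret it."""


def classify(rec: WalRecord) -> str:
    """Return how the tool should handle rec, or raise if it cannot."""
    if not rec.is_special:
        # Registered block references are enough to find modified pages.
        return "standard"
    if rec.record_type in KNOWN_SPECIAL:
        # Needs record-type-specific parsing.
        return "special"
    # Bug of omission: fail loudly instead of silently missing changes.
    raise UnrecognizedSpecialRecord(rec.record_type)
```

The point of the last branch is that a newly added special record type makes the tool error out until someone teaches it the new format, rather than letting it skip the record unnoticed.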

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)