Thanks for the review!

Simon Riggs wrote:
Can I check some aspects of this related to Hot Standby? Some of them
sound obvious, but worth double checking.

* There will be no need to read FSM by any normal operation of a
read-only transaction, so locking correctness considerations can
possibly be ignored during recovery.

Correct.

A HOT prune operation doesn't currently update the FSM, but if it did, that would be a case of a read-only transaction updating the FSM. But we can't prune anyway in a hot standby.

 pg_freespacemap exists though:
would we need to prevent that from executing during recovery, or will
the FSM be fully readable? i.e. does redo take appropriate locks already
(I don't see any Cleanup locks being required).

pg_freespacemap, the contrib module? Yeah, the FSM should be fully readable.

During normal operation, when a bottom level page is updated, and the update needs to be bubbled up, the upper level page is pinned and locked before the lock on the lower level page is released. That interlocking is not done during WAL replay, and the lock on the lower level page is released before locking the upper page. It's not required during WAL replay, as there's no concurrent updates to the FSM.

* FSM will be continuously maintained during recovery, so FSM will now
be correct and immediately available when recovery completes?

Correct,

* There are no cases where a screwed-up FSM will crash either recovery
(FATAL+) or halt normal operation (PANIC)?

Hmm, there's no explicit elog(FATAL/PANIC) calls, but if the FSM is really corrupt, you can probably get a segfault. That should be fixable by adding more sanity checks, though.

* incomplete action cleanup is fairly cheap and doesn't rely on the FSM
being searchable to correct the error? This last is a hard one...

Correct.

Do we have the concept of a invalid/corrupt FSM?  What happens if the
logic goes wrong and we have a corrupt page? Will that mean we can't
> complete actions against the heap?

Some scenarios I can think of:

Scenario: The binary tree on a page is corrupt, so that the value of an upper node is < Max(leftchild, rightchild). Consequence: Searchers won't see the free space below that node, and will look elsewhere.

Scenario: The binary tree on a page is corrupt, so that the value of an upper node is > Max(leftchild, rightchild). Consequence: Searchers will notice the corruption while trying to traverse down that path, and throw an elog(WARNING) in search_avail(). fsm_search will retry the search, and will in worst case go into an infinite loop. That's obviously not good. We could automatically fix the upper nodes of the tree, but that would wipe evidence that would be useful in debugging.

Scenario: An upper level page is corrupt, claiming that there's no free space on a lower level page, while there actually is. (the opposite, where an upper level page claims that there *is* free space on a lower level page, while there actually isn't, is now normal. The searcher will update the upper level page in that case)
Consequence: A searcher won't see that free space, and will look elsewhere.

Scenario: An upper level page is corrupt, claiming that there is free space on a lower level page that doesn't exist. Consequence: Searchers will elog(ERROR), trying to read the non-existent FSM page.

The 3rd scenario would lead to heap inserts/updates failing. We could avoid that by checking that the page exists with RelationGetNumberOfBlocks(), but I'm not sure if it's worth the cost.

Are there really any changes to these files?
src/include/storage/bufmgr.h
src/include/postmaster/bgwriter.h

Hmm, apparently not.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to