Hi.

I've been struggling with understanding all the necessary pieces for
streaming replication. So I put down the pieces as I did understand them,
and would appreciate if you guys could point out any of the stuff I
understood or have done wrong.

The set up is pgpool + streaming replication + hot stand by. No load
balancing, stand-by nodes will not receive any application queries (I don't
have that big of a query load, and I don't want to risk inconsistent
reads). There are no shared file systems, but there is a way to rsync/scp
files between nodes. Fail-over is automatic, and should kick in within
reasonably small period after master failure.

1. Archiving. Should be turned on on all the nodes. The archive command
should copy the archive file to the local archive directory, and rsync
archive directory between all the nodes. My understanding is that archiving
is necessary if a stand-by node ever "missed" enough WAL updates to need an
old enough WAL that might have been removed from pg_xlog.

QUESTION: After the failover, the new master will start archiving its WAL
files. These archived WALs will not collide in any way with the archived
WALs generated by previous master(s)?

QUESTION: What is a good policy for archive clean up? From the perspective
to only remove archive files that are guaranteed to never be required by
any nodes.

2. Failover. On master failure, pgpool will automatically select a new
master, and degenerate all other nodes. The cluster is now in the emergency
state and requires manual intervention for reconfiguration and recovery.
pgpool executes a script to promote a node, that script will create a
trigger file on a newly selected master node, and postgres will exist
stand-by mode.

QUESTION: If multiple pgpools are running, and if there are no network
problems, and configuration files are identical, is there any guarantee
that the same stand-by node will be selected for promotion? Concern here is
that with configuration of (M-SB0-SB1) one pgpool decides to promote SB0
and another - SB1, causing both of them to enter master mode, and splitting
the cluster. It does look that pgpool will always select next "alive" node
for promotion, but I couldn't find a definitive statement on that.

3. Recovery. That part is a bit confusing. The majority of the
documentation says that in this case, the node should be re-loaded from the
base backup, obtained from the master. I'm not sure why this is necessary,
if there are enough archived WALs.

QUESTION: Is there any metric to understand whether hauling base will be
slower/faster than replaying missed WALs? Anyway, pgpool only has one
recovery mechanism, and it does invoke a base restore from whatever current
master is.

PROBLEM: This I see as a problem. The only way that I see to re-attach a
node to the pgpool, short of restarting it, is to call pcp_recovery_node.
This will make the master take a base back up, push it to the stand-by that
needs recovery, and re-start the stand-by node. I am not sure if there is a
good way to check if that node has already been recovered. That because if
there are more than 2 pgpools, they both will attempt to recover the same
stand-by, and this will probably get ugly.

Thank you,
  Pawel.

Reply via email to