Re: [Pgpool-general] diagnosing BackendError from pcp_recovery_node
On 10/11/11 17:48, Tatsuo Ishii wrote: No. It's just a matter of when 'shutdown host pgpool(5432)' message was issued. Actual shutdown (set down status in shared memory area) was done befoere follow command was executed. I think real problem here is, when the private backend status (copy of shared memory area, which pcp process looks into) gets updated. Currently at the very end of failover process, pgpool main sends SIGUSR1 to pcp process requesting a restart. So it might be possible for pcp process to look up outdated status if the follow command starts too early. Attached is a patch to solve the problem. With the patch pcp process refesh the copy of backend status whenever it's idle, rather than after receiving SIGUSR1. Please try. So far in testing, this patch appears to resolve the problem. Thanks much! Any chance the follow_master_command could get an escape code indicating what the new primary node has become? -- Jeff Frost CTO, PostgreSQL Experts, Inc. Phone: 1-888-PG-EXPRT x506 FAX: 415-762-5122 http://www.pgexperts.com/ ___ Pgpool-general mailing list Pgpool-general@pgfoundry.org http://pgfoundry.org/mailman/listinfo/pgpool-general
Re: [Pgpool-general] difference between master and primary node in streaming mode
On 10/05/11 16:09, Tatsuo Ishii wrote: >> What's the difference between the master and primary node in streaming >> replication mode? >> >> > The master node is the node which is alive and has the least node > id. Not important in streaming replication mode(important in pgpool-II > native replication mode, though). > -- Ah! Thank you! So, Is it safe to assume that in streaming replication mode, the %H special value for follow_master_command will actually be the hostname of the new primary node and not the hostname of the new master node? -- Jeff Frost CTO, PostgreSQL Experts, Inc. Phone: 1-888-PG-EXPRT x506 FAX: 415-762-5122 http://www.pgexperts.com/ ___ Pgpool-general mailing list Pgpool-general@pgfoundry.org http://pgfoundry.org/mailman/listinfo/pgpool-general
[Pgpool-general] diagnosing BackendError from pcp_recovery_node
I have a follow_master_command set which calls the pcp_recovery_node script after editing a config file on the replica which has been degenerated. Occasionally this fails with BackendError. It looks like the node doesn't get degenerated fast enough and so it's marked as still alive, thus cancelling the recovery that I just requested: BTW, my backend hostnames are: backend_hostname0: pgpool backend_hostname1: node1 backend_hostname2: node2 2011-10-05 14:30:02 LOG: pid 31743: execute command: /var/lib/postgresql/scripts/failover 0 0 node1 Warning: Permanently added 'node1,216.121.61.228' (RSA) to the list of known hosts. Warning: Permanently added 'node1,216.121.61.228' (RSA) to the list of known hosts. 2011-10-05 14:30:14 LOG: pid 31743: find_primary_node_repeatedly: waiting for finding a primary node 2011-10-05 14:30:14 LOG: pid 31743: find_primary_node: primary node id is 1 2011-10-05 14:30:14 LOG: pid 31743: starting follow degeneration. shutdown host pgpool(5432) 2011-10-05 14:30:14 LOG: pid 31743: starting follow degeneration. shutdown host node2(5432) 2011-10-05 14:30:14 LOG: pid 31743: failover: 2 follow backends have been degenerated 2011-10-05 14:30:14 LOG: pid 31743: failover: set new primary node: 1 2011-10-05 14:30:14 LOG: pid 31743: failover: set new master node: 1 2011-10-05 14:30:14 LOG: pid 2941: start triggering follow command. 2011-10-05 14:30:14 LOG: pid 2941: execute command: /var/lib/postgresql/scripts/pgpool_remaster_replica pgpool node1 0 2011-10-05 14:30:14 LOG: pid 31743: failover done. shutdown host pgpool(5432) DEBUG: send: tos="R", len=44 DEBUG: recv: tos="r", len=21, data=AuthenticationOK 2011-10-05 14:30:15 LOG: pid 2536: starting recovering node 0 2011-10-05 14:30:15 ERROR: pid 2536: start_recovery: backend node 0 is alive DEBUG: send: tos="D", len=6 DEBUG: recv: tos="e", len=20, data=recovery failed DEBUG: command failed. reason=recovery failed DEBUG: send: tos="X", len=4 BackendError Notice that the 'shutdown host pgpool(5432)' message was logged AFTER my follow command was executed, so does this: ERROR: pid 2536: start_recovery: backend node 0 is alive mean that the node is not detached or that postgresql is still running? or something else I'm not thinking of? Looks like this is the bit of code in question: pool_log("starting recovering node %d", recovery_node); if (VALID_BACKEND(recovery_node)) { pool_error("start_recovery: backend node %d is alive", recovery_node); return 1; } So, how does it define a VALID_BACKEND and how can I modify my follow_master_command to allow the pcp_recovery_node to finish successfully? If I run the pcp_recovery_node again by hand afterward, it always works fine. I thought perhaps adding a pcp_detach_node before the pcp_recovery_node would do the trick, but it does not. /var/lib/postgresql/scripts/pgpool_remaster_replica: #!/bin/bash CONF_FILE=/var/lib/postgresql/scripts/pitr-replication.conf . ${CONF_FILE} SCRIPT_DIR=$(dirname ${CONF_FILE}) echo "$0 $* run on $(hostname)" >> ${REPLICATION_LOG} REPLICA=$1 MASTER=$2 NODEID=$3 # Update the MASTER value in the replication config file ${SSH} ${SSH_OPT} ${REPLICA} /bin/sh -c "'sed -e \"s/MASTER=.*$/MASTER=${MASTER}/\" ${CONF_FILE} > ${CONF_FILE}.swap && mv ${CONF_FILE}.swap ${CONF_FILE}'" ${SSH} ${SSH_OPT} -T ${REPLICA} $PGCTL ${PGCTLOPTIONS} -m fast -w -D ${PGDATA} stop 2>/dev/null 1>/dev/null < /dev/null & #/usr/local/bin/pcp_detach_node --debug ${PCPTIMEOUT} ${PGPOOLHOST} ${PCPPORT} ${PGPOOLUSER} ${PGPOOLPASS} ${NODEID} /usr/local/bin/pcp_recovery_node --debug ${PCPTIMEOUT} ${PGPOOLHOST} ${PCPPORT} ${PGPOOLUSER} ${PGPOOLPASS} ${NODEID} -- Jeff Frost CTO, PostgreSQL Experts, Inc. Phone: 1-888-PG-EXPRT x506 FAX: 415-762-5122 http://www.pgexperts.com/ ___ Pgpool-general mailing list Pgpool-general@pgfoundry.org http://pgfoundry.org/mailman/listinfo/pgpool-general
[Pgpool-general] difference between master and primary node in streaming mode
What's the difference between the master and primary node in streaming replication mode? Occasionally, in my testing, I see this: Oct 5 11:36:08 31214-1-807393 pgpool[28871]: find_primary_node_repeatedly: waiting for finding a primary node Oct 5 11:36:08 31214-1-807393 pgpool[28871]: last message repeated 10 times Oct 5 11:36:08 31214-1-807393 pgpool[28871]: find_primary_node: 0 node is standby Oct 5 11:36:08 31214-1-807393 pgpool[28871]: last message repeated 10 times Oct 5 11:36:08 31214-1-807393 pgpool[28871]: find_primary_node: primary node id is 1 Oct 5 11:36:08 31214-1-807393 pgpool[28871]: failover: set new primary node: 1 Oct 5 11:36:08 31214-1-807393 pgpool[28871]: failover: set new master node: 0 Note that the primary and master are different. I understand that the primary node is considered whichever node is not in recovery mode, but what is the master? -- Jeff Frost CTO, PostgreSQL Experts, Inc. Phone: 1-888-PG-EXPRT x506 FAX: 415-762-5122 http://www.pgexperts.com/ ___ Pgpool-general mailing list Pgpool-general@pgfoundry.org http://pgfoundry.org/mailman/listinfo/pgpool-general
Re: [Pgpool-general] how to find node role and order of scripts being run
On Oct 4, 2011, at 5:42 PM, Tatsuo Ishii wrote: >>> In Streaming replication mode, how do I find which node is currently which >>> role? That is, I'd like to know which node is considered the primary and >>> which are replicas. > > You can use standard PostgreSQL function pg_is_in_recover() to get the > information. It's not necessarily identical to what pgpool-II thinks, > but since pgpool-II also uses the function to judge which is which, > you could assume that what pg_is_in_recover() returns is what > pgpool-II thinks in real world. > > It would be nice if pcp command returns that info directly though. Ah, ok, so no way to find out what pgpool thinks internally. Drat! I ask because I'm trying to troubleshoot some scripts and with 3 active nodes, detaching node 0 causes the system to failover properly to node 1, but if I recover node 0, then detach node 1, the log indicates it can't find a primary node and the failover script eventually gets a -1 as the 'new master node'. I'll try comparing the recovery state to see if that disagrees with what I think is actually happening. How does pgpool use the pg_is_in_recovery() function? --- Jeff Frost CTO, PostgreSQL Experts, Inc. Phone: 1-888-PG-EXPRT x506 FAX: 415-762-5122 http://www.pgexperts.com/ ___ Pgpool-general mailing list Pgpool-general@pgfoundry.org http://pgfoundry.org/mailman/listinfo/pgpool-general
Re: [Pgpool-general] how to find node role and order of scripts being run
On Oct 4, 2011, at 4:29 PM, Jeff Frost wrote: > In Streaming replication mode, how do I find which node is currently which > role? That is, I'd like to know which node is considered the primary and > which are replicas. > > Also, what is the failback command expected to accomplish? Should have mentioned: I'm specifically asking about version 3.1.0. --- Jeff Frost CTO, PostgreSQL Experts, Inc. Phone: 1-888-PG-EXPRT x506 FAX: 415-762-5122 http://www.pgexperts.com/ ___ Pgpool-general mailing list Pgpool-general@pgfoundry.org http://pgfoundry.org/mailman/listinfo/pgpool-general
[Pgpool-general] how to find node role and order of scripts being run
In Streaming replication mode, how do I find which node is currently which role? That is, I'd like to know which node is considered the primary and which are replicas. Also, what is the failback command expected to accomplish? -- Jeff Frost CTO, PostgreSQL Experts, Inc. Phone: 1-888-PG-EXPRT x506 FAX: 415-762-5122 http://www.pgexperts.com/ ___ Pgpool-general mailing list Pgpool-general@pgfoundry.org http://pgfoundry.org/mailman/listinfo/pgpool-general