Hi, I've found something critical into the way that find_primary_node() detect if a node is a standby host and by deduction the primary node in streaming replication mode.
When it call "SELECT pg_is_in_recovery() AND pgpool_walrecrunning()" it always return false if pg_is_in_recovery = 'true' and pgpool_walrecrunning = 'false'. That mean that the host can be promoted as master, the situation will going worse if a failover occurs and PgPool takes this fake host as the master. This is not a problem that can occurs easily but let check the following: - Master has failover to node 1. - Node 0 is reconstructed but the primary_conninfo host in recovery.conf is wrong/broken for some reason. At this stage, both nodes are running with PgPool status 2 and currently the only problem is that node 0 is not replicated from the master node. Then if a failover from node 1 occurs, PgPool will try to find a new master by querying "SELECT pg_is_in_recovery() AND pgpool_walrecrunning()" to all nodes, but as node 0 will return false because the wal receiver is not running, PgPool will take it as the new master, and very bad annoying things will happen from this point. What I suggest is to try to find the master by finding a node that is not in recovery mode and where wal receiver is not running. This will allow a simple fix by changing the SQL query : SELECT pg_is_in_recovery() = 'f' AND pgpool_walrecrunning() = 'f'; or SELECT not pg_is_in_recovery() AND not pgpool_walrecrunning(); What's your opinion, if every one is ok with that fix I already have a patch for that. Regards, -- Gilles Darold http://dalibo.com - http://dalibo.org _______________________________________________ Pgpool-hackers mailing list [email protected] http://pgfoundry.org/mailman/listinfo/pgpool-hackers
