>> BTW, after applying the patches I got the following errors while doing
>> online recovery. In my testing node 0 is in down status and is the
>> recovery target. Node 1 is up and running as primary node. This worked
>> perfectly before applying your patches. Thoughts?
>>
>> 2011-03-09 08:37:17 ERROR: pid 15531: health check failed. 0 th host /tmp at port 5433 is down
>> 2011-03-09 08:37:17 LOG: pid 15531: set 0 th backend down status
>> 2011-03-09 08:37:17 LOG: pid 15531: starting degeneration. shutdown host /tmp(5433)
>> 2011-03-09 08:37:17 LOG: pid 15531: execute command: /usr/local/etc/failover.sh 0 "/tmp" 5433 /usr/local/pgsql/data 1 0 "/tmp" 0
>> 2011-03-09 08:37:17 LOG: pid 15531: find_primary_node: 1 node is standby
>> 2011-03-09 08:37:17 LOG: pid 15531: find_primary_node: no primary node found
>> 2011-03-09 08:37:17 LOG: pid 15531: Primary node id saved: -1
>> 2011-03-09 08:37:17 LOG: pid 15531: failover done. shutdown host /tmp(5433)
>> 2011-03-09 08:37:34 LOG: pid 15566: starting recovering node 0
>> 2011-03-09 08:37:34 ERROR: pid 15566: start_recover: could not connect master node.
>
> Have you applied the entire patch? Any rejects?
No.

> I mean this error
> appears when the change in find_primary_node() has not been done. Please
> take a look, you must have:
>
> SELECT pg_is_in_recovery() AND pgpool_walrecrunning()
>
> replaced by:
>
> SELECT not pg_is_in_recovery() AND not pgpool_walrecrunning()
>
> and the response comparison: strcmp(res->data[0], "t") replaced by
> strcmp(res->data[0], "f")
>
> Could you please check that? I will check again on my side to see if I
> forgot something in the patch.

All of the above seems to be fine. On further testing, it seems the error
occurs when online recovery is repeated two or more times. This time I got:

2011-03-09 18:13:04 ERROR: pid 13569: health check failed. 1 th host /tmp at port 5434 is down
2011-03-09 18:13:04 LOG: pid 13569: set 1 th backend down status
2011-03-09 18:13:04 LOG: pid 13569: starting degeneration. shutdown host /tmp(5434)
2011-03-09 18:13:04 LOG: pid 13569: execute command: /usr/local/etc/failover.sh 1 "/tmp" 5434 /usr/local/pgsql/standby 0 1 "/tmp" 1
2011-03-09 18:13:05 LOG: pid 13569: find_primary_node: 0 node is standby
2011-03-09 18:13:05 LOG: pid 13569: find_primary_node: no primary node found
2011-03-09 18:13:05 LOG: pid 13569: Primary node id saved: -1
2011-03-09 18:13:05 LOG: pid 13569: failover done. shutdown host /tmp(5434)
2011-03-09 18:13:18 LOG: pid 13604: starting recovering node 1
2011-03-09 18:13:18 ERROR: pid 13604: start_recover: could not connect master node.

I did the testing in the following sequence:

1) node 0 down, node 1 primary
2) recover node 0 (fine)
3) node 0 standby, node 1 primary
4) node 1 down, node 0 promotes to primary
5) recover node 1 and got the above errors
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
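
For reference, the primary check discussed above can be modelled with a small C sketch. This is not pgpool's code: node_state, looks_like_primary and find_primary_sketch are hypothetical names, and the per-node flags simply stand in for what pg_is_in_recovery() and pgpool_walrecrunning() would return. It only illustrates that if the surviving node still reports "in recovery" at the moment the check runs (an assumption, not something the logs confirm), the search ends with -1, as in "Primary node id saved: -1".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one backend; not pgpool's real data structure. */
typedef struct {
    bool up;             /* backend is attached and reachable */
    bool in_recovery;    /* stand-in for pg_is_in_recovery() */
    bool walrec_running; /* stand-in for pgpool_walrecrunning() */
} node_state;

/* Mirrors: SELECT not pg_is_in_recovery() AND not pgpool_walrecrunning() */
static bool looks_like_primary(const node_state *n)
{
    return !n->in_recovery && !n->walrec_running;
}

/* Return the index of the first up node that looks like a primary, or -1. */
static int find_primary_sketch(const node_state *nodes, size_t count)
{
    assert(nodes != NULL);
    for (size_t i = 0; i < count; i++) {
        if (nodes[i].up && looks_like_primary(&nodes[i]))
            return (int)i;
    }
    return -1;
}
```

With node 1 down and node 0 still flagged as in recovery, find_primary_sketch() returns -1; once node 0's in_recovery flag is cleared it returns 0. Whether this timing is what actually happens in step 4 would need to be confirmed against the real find_primary_node().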
