Hi all,

I'm using the pgsql resource agent (resource-agents-3.9.5-9) on Fedora 20.
I'm testing various failure patterns in a pgsql replicated cluster using it.

I think that if the MASTER PostgreSQL process is suspended for a long time, the resource monitor and the demote operation both time out, and the cluster cannot fail over until the process is resumed.

-----the Cluster status after master demotion timed out.-----
Online: [ server1 server2 ]

 Master/Slave Set: msPostgresql [pgsql]
     pgsql      (ocf::heartbeat:pgsql): FAILED server2
     Stopped: [ server1 ]
 Clone Set: ping-gw-rsc-clone [ping-gw-rsc]
     Started: [ server1 server2 ]

Node Attributes:
* Node server1:
    + master-pgsql                      : -INFINITY
    + pgsql-data-status                 : STREAMING|SYNC
    + pgsql-status                      : STOP
    + ping-gw1                          : 100
* Node server2:
    + master-pgsql                      : -INFINITY
    + pgsql-data-status                 : LATEST
    + pgsql-status                      : PRI
    + ping-gw1                          : 100

Migration summary:
* Node server1:
* Node server2:
   pgsql: migration-threshold=1 fail-count=2 last-failure='Fri Apr 11 14:07:43 2014'

Failed actions:
    pgsql_demote_0 on server2 'unknown error' (1): call=77, status=Timed Out, last-rc-change='Fri Apr 11 14:06:43 2014', queued=1ms, exec=60001ms
-------------------------------------------------------

I think pgsql_real_stop() should send SIGKILL to PostgreSQL when the immediate shutdown (-m i) command has timed out.

What do you think?

Regards,
Naoya

---
Naoya Anzai
Engineering Department
NEC Solution Innovators, Ltd.
E-Mail: anzai-na...@mxu.nes.nec.co.jp
---
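
P.S. To make the suggestion concrete, here is a rough sketch of the escalation I have in mind. It is not the actual pgsql_real_stop() code: stop_or_kill is a hypothetical helper name, it takes the postmaster PID and a grace period in seconds as arguments, and it sends SIGQUIT directly as a stand-in for "pg_ctl stop -m i" (SIGQUIT is the signal PostgreSQL treats as an immediate shutdown request). It also assumes the suspension behaves like SIGSTOP, where ordinary signals stay pending until the process is resumed but SIGKILL still takes effect.

```shell
# Hypothetical helper, NOT part of the pgsql resource agent.
#   $1 = PID of the postmaster
#   $2 = seconds to wait for the immediate shutdown before escalating
stop_or_kill() {
    pid=$1
    grace=$2

    # Stand-in for "pg_ctl stop -m i": request an immediate shutdown.
    kill -s QUIT "$pid" 2>/dev/null

    # Poll once per second until the process is gone or the grace period ends.
    waited=0
    while [ "$waited" -lt "$grace" ]; do
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        waited=$((waited + 1))
    done

    # Already gone: nothing more to do.
    kill -0 "$pid" 2>/dev/null || return 0

    # Still alive after the grace period (for example because it is
    # suspended and never saw SIGQUIT): escalate to SIGKILL, which a
    # stopped process cannot defer.
    kill -s KILL "$pid" 2>/dev/null
}
```

In the agent this would presumably be called with the PID read from the first line of $PGDATA/postmaster.pid. One known drawback: the PostgreSQL documentation advises against SIGKILL because the postmaster then cannot release its shared memory and semaphores, so this should stay a last resort after the normal stop has timed out.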