[Pacemaker] pgsql troubles.
Good afternoon,

I am having a lot of trouble with pacemaker/corosync/postgres, and the symptoms are hard to pin down. The main one is that postgres starts as a slave on both nodes.

I have tested the pgsql RA start/stop/status/monitor actions and they work from the command line once I set up the environment. I have not been able to get promote/demote to work; there are issues with NODENAME not being defined. I am able to run postgres in master/slave mode outside of pacemaker.

I can provide additional logs, but here is a start.

Distributor ID: Ubuntu
Description:    Ubuntu 12.04.3 LTS
Release:        12.04
Codename:       precise

Latest version of the pgsql RA (as of yesterday).

pacemaker        1.1.6-2ubuntu3.1    HA cluster resource manager
corosync         1.4.2-2             Standards-based cluster framework (daemon and module
resource-agents  1:3.9.2-5ubuntu4.1  Cluster Resource Agents

I have upgraded the pgsql RA to the latest from git.

Last updated: Wed Nov 26 13:55:59 2014
Last change: Wed Nov 26 13:55:58 2014 via crm_attribute on tstdb04
Stack: openais
Current DC: tstdb04 - partition with quorum
Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
2 Nodes configured, 2 expected votes
4 Resources configured.

Online: [ tstdb03 tstdb04 ]

Full list of resources:

 Resource Group: master-group
     vip-master (ocf::heartbeat:IPaddr2): Stopped
     vip-rep    (ocf::heartbeat:IPaddr2): Stopped
 Master/Slave Set: msPostgresql [pgsql]
     Slaves: [ tstdb04 ]
     Stopped: [ pgsql:0 ]

Node Attributes:
* Node tstdb03:
    + master-pgsql:0: -INFINITY
    + pgsql-data-status : DISCONNECT
* Node tstdb04:
    + master-pgsql:1: -INFINITY
    + pgsql-data-status : DISCONNECT

Migration summary:
* Node tstdb04:
* Node tstdb03:
   pgsql:0: migration-threshold=1 fail-count=100

Failed actions:
    pgsql:0_start_0 (node=tstdb03, call=5, rc=1, status=complete): unknown error

config:

property \
    no-quorum-policy="ignore" \
    stonith-enabled="false" \
    crmd-transition-delay="0"
rsc_defaults \
    resource-stickiness="INFINITY" \
    migration-threshold="1"
group master-group \
    vip-master \
    vip-rep
primitive vip-master ocf:heartbeat:IPaddr2 \
    params \
        ip="10.132.101.95" \
        nic="eth0" \
        cidr_netmask="24" \
    op start   timeout="60s" interval="0"   on-fail="restart" \
    op monitor timeout="60s" interval="10s" on-fail="restart" \
    op stop    timeout="60s" interval="0"   on-fail="block"
primitive vip-rep ocf:heartbeat:IPaddr2 \
    params \
        ip="10.132.101.96" \
        nic="eth0" \
        cidr_netmask="24" \
    meta \
        migration-threshold="0" \
    op start   timeout="60s" interval="0"   on-fail="stop" \
    op monitor timeout="60s" interval="10s" on-fail="restart" \
    op stop    timeout="60s" interval="0"   on-fail="ignore"
master msPostgresql pgsql \
    meta \
        master-max="1" \
        master-node-max="1" \
        clone-max="2" \
        clone-node-max="1" \
        notify="true"
primitive pgsql ocf:heartbeat:pgsql \
    params \
        pgctl="/usr/bin/pg_ctl" \
        psql="/usr/bin/psql" \
        pgdata="/database/9.3" \
        config="/etc/postgresql/9.3/main/postgresql.conf" \
        socketdir=/var/run/postgresql \
        rep_mode="sync" \
        node_list="tstdb03 tstdb04" \
        restore_command="cp /database/archive/%f %p" \
        primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 keepalives_count=5" \
        master_ip="10.132.101.95" \
        restart_on_promote="true" \
        logfile=/var/log/postgresql/postgresql-9.3-main.log \
    op start   timeout="60s" interval="0"  on-fail="restart" \
    op monitor timeout="60s" interval="4s" on-fail="restart" \
    op monitor timeout="60s" interval="3s" on-fail="restart" role="Master" \
    op promote timeout="60s" interval="0"  on-fail="restart" \
    op demote  timeout="60s" interval="0"  on-fail="stop" \
    op stop    timeout="60s" interval="0"  on-fail="block" \
    op notify  timeout="60s" interval="0"
#colocation rsc_colocation-1 inf: vip-master msPostgresql:Master
#order rsc_order-1 0: msPostgresql:promote vip-master:start symmetrical=false
#order rsc_order-2 0: msPostgresql:demote vip-rep:stop symmetrical=false
colocation rsc_colocation-1 inf: master-group msPostgresql:Master
order rsc_order-1 0: msPostgresql:promote master-group:start symmetrical=false
order rsc_order-2 0: msPostgresql:demote master-group:stop symmetrical=false

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
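
Editor's sketch, not part of the original post: the node attributes in the status output above show a master score of -INFINITY on both nodes, and Pacemaker does not promote an instance whose master score is -INFINITY, which is consistent with neither node becoming master. The short Python example below parses that attribute section and lists the nodes that could be promoted. The helper names `parse_node_attributes` and `promotable_nodes` are hypothetical, and the input is simply the text copied from the post, not live `crm_mon` output.

```python
import re

# Node attributes copied from the status output in the post above.
STATUS = """\
Node Attributes:
* Node tstdb03:
    + master-pgsql:0: -INFINITY
    + pgsql-data-status : DISCONNECT
* Node tstdb04:
    + master-pgsql:1: -INFINITY
    + pgsql-data-status : DISCONNECT
"""


def parse_node_attributes(text):
    """Return {node: {attribute: value}} from a 'Node Attributes' section."""
    nodes = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        node = re.match(r"\*\s+Node\s+(\S+):", line)
        if node:
            current = nodes.setdefault(node.group(1), {})
            continue
        # Attribute lines look like "+ name : value" or "+ name:0: value".
        attr = re.match(r"\+\s+(.+?)\s*:\s+(\S+)$", line)
        if attr and current is not None:
            current[attr.group(1)] = attr.group(2)
    return nodes


def promotable_nodes(nodes):
    """Hypothetical helper: nodes whose master score is not -INFINITY."""
    result = []
    for name, attrs in nodes.items():
        scores = [v for k, v in attrs.items() if k.startswith("master-")]
        if scores and all(score != "-INFINITY" for score in scores):
            result.append(name)
    return result


nodes = parse_node_attributes(STATUS)
print(promotable_nodes(nodes))
```

With the attributes from the post, no node qualifies, so the list is empty. This only restates the symptom; it does not say why the resource agent set those scores.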
Re: [Pacemaker] Failed-over incomplete
Sorry for my mistyping, it's res.vBKN6.

--teenigma

On Thu, Dec 4, 2014 at 4:23 PM, Andrei Borzenkov wrote:
> On Thu, Dec 4, 2014 at 9:52 AM, Teerapatr Kittiratanachai wrote:
>> Dear Andrei,
>>
>> Since the failover did not complete, not all of the resources were
>> failed over to the other node.
>>
>> I think this happened because res.vBKN went into the unmanaged state.
>>
>
> There is no resource res.vBKN in your logs or in the configuration
> snippet you have shown.
>
>> But why? No configuration was changed.
>>
>> --teenigma
>>
>> On Thu, Dec 4, 2014 at 1:41 PM, Andrei Borzenkov wrote:
>>> On Thu, Dec 4, 2014 at 4:56 AM, Teerapatr Kittiratanachai wrote:
>>>> Dear List,
>>>>
>>>> We are using Pacemaker and Corosync with CMAN as our HA software,
>>>> in the versions below.
>>>>
>>>> OS:             CentOS release 6.5 (Final) 64-bit
>>>> Pacemaker:      pacemaker.x86_64 1.1.10-14.el6_5.3
>>>> Corosync:       corosync.x86_64 1.4.1-17.el6_5.1
>>>> CMAN:           cman.x86_64 3.0.12.1-59.el6_5.2
>>>> Resource-Agent: resource-agents.x86_64 3.9.5-3.12
>>>>
>>>> Topology:       2 nodes in an Active/Standby model. (MySQL is
>>>>                 Active/Active by clone)
>>>>
>>>> All packages are installed from the official CentOS repository; the
>>>> Resource-Agent is the only one installed from the OpenSUSE repository
>>>> (http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-6/).
>>>>
>>>> The system worked normally for a few months until yesterday morning,
>>>> around 03:35 UTC+0700, when we found that one of the resources had
>>>> gone into the UNMANAGED state without any configuration change. After
>>>> another resource failed, Pacemaker tried to fail the resources over
>>>> to the other node, but the failover did not complete once it reached
>>>> this resource.
>>>>
>>>> The configuration of some resources is below, and the log during the
>>>> event is in the attached file.
>>>>
>>>
>>> The log just covers the resource monitor failure and the stopping of
>>> resources. It does not contain any event related to starting resources
>>> on other nodes.
>>>
>>> You would need to collect a crm_report with a start time before the
>>> resource failed and a stop time after the resources were started on
>>> the other node.
>>>
>>>> primitive res.vBKN6 IPv6addr \
>>>>     params ipv6addr="2001:db8:0:f::61a" cidr_netmask=64 nic=eth0 \
>>>>     op monitor interval=10s
>>>>
>>>> primitive res.vDMZ6 IPv6addr \
>>>>     params ipv6addr="2001:db8:0:9::61a" cidr_netmask=64 nic=eth1 \
>>>>     op monitor interval=10s
>>>>
>>>> group gr.mainService res.vDMZ4 res.vDMZ6 res.vBKN4 res.vBKN6 res.http res.ftp
>>>>
>>>> rsc_defaults rsc_defaults-options: \
>>>>     migration-threshold=1
>>>>
>>>> Please help me to solve this problem.
>>>>
>>>> --teenigma
Re: [Pacemaker] Failed-over incomplete
On Thu, Dec 4, 2014 at 9:52 AM, Teerapatr Kittiratanachai wrote:
> Dear Andrei,
>
> Since the failover did not complete, not all of the resources were
> failed over to the other node.
>
> I think this happened because res.vBKN went into the unmanaged state.
>

There is no resource res.vBKN in your logs or in the configuration snippet you have shown.

> But why? No configuration was changed.
>
> --teenigma
>
> On Thu, Dec 4, 2014 at 1:41 PM, Andrei Borzenkov wrote:
>> On Thu, Dec 4, 2014 at 4:56 AM, Teerapatr Kittiratanachai wrote:
>>> Dear List,
>>>
>>> We are using Pacemaker and Corosync with CMAN as our HA software,
>>> in the versions below.
>>>
>>> OS:             CentOS release 6.5 (Final) 64-bit
>>> Pacemaker:      pacemaker.x86_64 1.1.10-14.el6_5.3
>>> Corosync:       corosync.x86_64 1.4.1-17.el6_5.1
>>> CMAN:           cman.x86_64 3.0.12.1-59.el6_5.2
>>> Resource-Agent: resource-agents.x86_64 3.9.5-3.12
>>>
>>> Topology:       2 nodes in an Active/Standby model. (MySQL is
>>>                 Active/Active by clone)
>>>
>>> All packages are installed from the official CentOS repository; the
>>> Resource-Agent is the only one installed from the OpenSUSE repository
>>> (http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-6/).
>>>
>>> The system worked normally for a few months until yesterday morning,
>>> around 03:35 UTC+0700, when we found that one of the resources had
>>> gone into the UNMANAGED state without any configuration change. After
>>> another resource failed, Pacemaker tried to fail the resources over
>>> to the other node, but the failover did not complete once it reached
>>> this resource.
>>>
>>> The configuration of some resources is below, and the log during the
>>> event is in the attached file.
>>>
>>
>> The log just covers the resource monitor failure and the stopping of
>> resources. It does not contain any event related to starting resources
>> on other nodes.
>>
>> You would need to collect a crm_report with a start time before the
>> resource failed and a stop time after the resources were started on
>> the other node.
>>
>>> primitive res.vBKN6 IPv6addr \
>>>     params ipv6addr="2001:db8:0:f::61a" cidr_netmask=64 nic=eth0 \
>>>     op monitor interval=10s
>>>
>>> primitive res.vDMZ6 IPv6addr \
>>>     params ipv6addr="2001:db8:0:9::61a" cidr_netmask=64 nic=eth1 \
>>>     op monitor interval=10s
>>>
>>> group gr.mainService res.vDMZ4 res.vDMZ6 res.vBKN4 res.vBKN6 res.http res.ftp
>>>
>>> rsc_defaults rsc_defaults-options: \
>>>     migration-threshold=1
>>>
>>> Please help me to solve this problem.
>>>
>>> --teenigma
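
Editor's sketch, not part of the original message: the point above is that the name res.vBKN does not match any resource in the quoted configuration exactly. A small Python check makes that concrete by looking the name up among the group members from the quoted `group gr.mainService` line and listing near matches. The `lookup` helper and the 0.8 similarity cutoff are assumptions chosen for illustration; nothing here queries a real cluster.

```python
import difflib

# Group members copied from the configuration snippet quoted in the thread.
GROUP_MEMBERS = [
    "res.vDMZ4", "res.vDMZ6", "res.vBKN4", "res.vBKN6", "res.http", "res.ftp",
]


def lookup(name, known):
    """Return (exact_match, close_matches) for a resource name.

    close_matches is only filled in when there is no exact match.
    """
    exact = name in known
    close = [] if exact else difflib.get_close_matches(name, known, n=3, cutoff=0.8)
    return exact, close


exact, close = lookup("res.vBKN", GROUP_MEMBERS)
print("exact match:", exact)
print("candidates:", sorted(close))
```

For the mistyped name there is no exact match, and both res.vBKN4 and res.vBKN6 come back as candidates, which is why the full resource name matters when reading logs.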
Re: [Pacemaker] Avoid monitoring of resources on nodes
Andrew Beekhof writes:

> What version of pacemaker is this?
> Some very old versions wanted the agent to be installed on all nodes.

It's 1.1.10+git20130802-1ubuntu2.1 on Trusty Tahr.

Regards.
--
Daniel Dehennin
Get my GPG key: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF