[Pacemaker] Slave does not start after failover: Mysql circular replication and master-slave resources
Hi All,

Some time ago I exchanged a couple of posts with you here regarding MySQL active-active HA. The best solution I found so far was MySQL multi-master replication, also referred to as circular replication. Basically I set up two nodes, both capable of the master role, and changes were immediately propagated to the other node.

Still, I wanted to have an M/S approach, with a RW master and a RO slave - mainly because I prefer to have a single master VIP that my apps can connect to. (In the first approach I configured a two-node clone, and the master IP was always bound to one of the nodes.)

I applied the following configuration:

node db1 \
        attributes IP="10.100.1.31" \
        attributes standby="off" db2-log-file-db-mysql="mysql-bin.21" db2-log-pos-db-mysql="40730"
node db2 \
        attributes IP="10.100.1.32" \
        attributes standby="off"
primitive db-ip-master ocf:heartbeat:IPaddr2 \
        params lvs_support="true" ip="10.100.1.30" cidr_netmask="8" broadcast="10.255.255.255" \
        op monitor interval="20s" timeout="20s" \
        meta target-role="Started"
primitive db-mysql ocf:heartbeat:mysql \
        params binary="/usr/bin/mysqld_safe" config="/etc/mysql/my.cnf" datadir="/var/lib/mysql" \
        user="mysql" pid="/var/run/mysqld/mysqld.pid" socket="/var/run/mysqld/mysqld.sock" \
        test_passwd="X" test_table="replicatest.connectioncheck" test_user="slave_user" \
        replication_user="slave_user" replication_passwd="X" additional_parameters="--skip-slave-start" \
        op start interval="0" timeout="120s" \
        op stop interval="0" timeout="120s" \
        op monitor interval="30" timeout="30s" OCF_CHECK_LEVEL="1" \
        op promote interval="0" timeout="120" \
        op demote interval="0" timeout="120"
ms db-ms-mysql db-mysql \
        meta notify="true" master-max="1" clone-max="2" target-role="Started"
colocation db-ip-with-master inf: db-ip-master db-ms-mysql:Master
property $id="cib-bootstrap-options" \
        dc-version="1.1.5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
        resource-stickiness="0"

The setup works under basic conditions:

* After the "first" startup, the nodes start up as slaves, and shortly after, one of them is promoted to master.
* Updates to the master are replicated properly to the slave.
* The slave accepts updates, which is wrong, but I can live with this - I will allow connections to the master VIP only.
* If I stop the slave for some time and restart it, it catches up with the master shortly and gets back into sync.

I have, however, a serious issue:

* If I stop the current master, the slave is promoted, accepts RW queries, and the master IP is bound to it - ALL fine.
* BUT - when I want to bring the other node back online, it simply shows: Stopped (not installed)

Online: [ db1 db2 ]

 db-ip-master   (ocf::heartbeat:IPaddr2):       Started db1
 Master/Slave Set: db-ms-mysql [db-mysql]
     Masters: [ db1 ]
     Stopped: [ db-mysql:1 ]

Node Attributes:
* Node db1:
    + IP                      : 10.100.1.31
    + db2-log-file-db-mysql   : mysql-bin.21
    + db2-log-pos-db-mysql    : 40730
    + master-db-mysql:0       : 3601
* Node db2:
    + IP                      : 10.100.1.32

Failed actions:
    db-mysql:0_monitor_3 (node=db2, call=58, rc=5, status=complete): not installed

I checked the logs and could not find a reason why the slave at db2 is not started.

Any ideas, anyone?

Thanks,
Attila

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
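[For reference when reading the failed-actions line above: rc=5 is OCF_ERR_INSTALLED, i.e. the agent itself reported that something it needs is missing or unusable on db2, which is why Pacemaker will not start that clone instance there. A small sketch to decode the numeric OCF return codes; the mapping is the standard OCF one, the helper name is made up here:]

```shell
#!/bin/sh
# Decode a numeric OCF return code into its symbolic name, so a
# "rc=5" in a crm_mon failed-actions line can be read at a glance.
ocf_rc_name() {
    case "$1" in
        0) echo OCF_SUCCESS ;;
        1) echo OCF_ERR_GENERIC ;;
        2) echo OCF_ERR_ARGS ;;
        3) echo OCF_ERR_UNIMPLEMENTED ;;
        4) echo OCF_ERR_PERM ;;
        5) echo OCF_ERR_INSTALLED ;;
        6) echo OCF_ERR_CONFIGURED ;;
        7) echo OCF_NOT_RUNNING ;;
        8) echo OCF_RUNNING_MASTER ;;
        9) echo OCF_FAILED_MASTER ;;
        *) echo UNKNOWN ;;
    esac
}

ocf_rc_name 5   # prints "OCF_ERR_INSTALLED"
```

[Once the underlying cause is fixed, the recorded failure can be cleared with crmsh, e.g. `crm resource cleanup db-ms-mysql`, so the policy engine retries the start.]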
Re: [Pacemaker] Slave does not start after failover: Mysql circular replication and master-slave resources
Hello Attila,

... see below ...

On 12/15/2011 02:42 PM, Attila Megyeri wrote:
> [snip]
>
> Failed actions:
>
>     db-mysql:0_monitor_3 (node=db2, call=58, rc=5, status=complete): not installed

Looking at the RA (latest from git) I'd say the problem is somewhere in the check_slave() function: either the check for replication errors or the one for a too-high slave lag ... though on both errors you should see log entries.

Regards,
Andreas

--
Need help with Pacemaker?
http://www.hastexo.com/now
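[The two conditions mentioned above can be reproduced by hand on the slave. The sketch below parses `SHOW SLAVE STATUS\G` output for the same two fields check_slave() inspects, Last_Error and Seconds_Behind_Master; the lag limit and the piping of `mysql` output into the helper are assumptions for illustration, not the RA's exact code:]

```shell
#!/bin/sh
# Sketch of a manual slave health check: reads "SHOW SLAVE STATUS\G"
# output on stdin and fails on a replication error, a stopped slave
# thread (lag = NULL), or lag above the given limit in seconds.
check_slave_output() {
    max_lag="$1"
    awk -v max="$max_lag" '
        /Last_Error:/            { sub(/^[ \t]*Last_Error:[ \t]*/, ""); err = $0 }
        /Seconds_Behind_Master:/ { lag = $NF }
        END {
            if (err != "")         { print "replication error: " err; exit 1 }
            if (lag == "NULL")     { print "slave thread not running"; exit 1 }
            if (lag + 0 > max + 0) { print "lag " lag "s exceeds " max "s"; exit 1 }
            print "slave healthy, lag " lag "s"
        }'
}

# Intended use (credentials as in the configuration above):
#   mysql -u slave_user -pX -e 'SHOW SLAVE STATUS\G' | check_slave_output 3600
```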
Re: [Pacemaker] Slave does not start after failover: Mysql circular replication and master-slave resources
Hi Andreas,

The slave lag cannot be high, as the slave was restarted within 1-2 minutes and there are no active users on the system yet.

I did not find anything at all in the logs.

I will double-check that the RA is the latest.

Thanks,
Attila

-----Original Message-----
From: Andreas Kurz [mailto:andr...@hastexo.com]
Sent: 2011. december 16. 1:50
To: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] Slave does not start after failover: Mysql circular replication and master-slave resources

Hello Attila,

... see below ...

On 12/15/2011 02:42 PM, Attila Megyeri wrote:
> [snip]
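[On the "is the RA the latest" question: resource agent scripts do not always carry a usable version string, so one rough way to compare the installed agent against a copy taken from the resource-agents git tree is to diff the lists of shell functions each defines. A sketch; the install path is the common default and an assumption, and the pattern misses functions written as `name () {` with a space:]

```shell
#!/bin/sh
# Compare an installed OCF resource agent with a freshly fetched copy
# by listing the shell function names each script defines.
RA=/usr/lib/ocf/resource.d/heartbeat/mysql   # common default path (assumption)

ra_functions() {
    # Print the function names defined in a shell script, sorted.
    # Only matches the "name() {" style with no space before "()".
    grep -Eo '^[A-Za-z_][A-Za-z0-9_]*\(\)' "$1" | sort -u
}

# Typical use (second file fetched from the resource-agents repository):
#   diff <(ra_functions "$RA") <(ra_functions ./mysql.from-git)
```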