[Pacemaker] Slave does not start after failover: Mysql circular replication and master-slave resources

2011-12-15 Thread Attila Megyeri
Hi All,

Some time ago I exchanged a couple of posts with you here regarding MySQL 
active-active HA.
The best solution I have found so far is MySQL multi-master replication, also 
referred to as circular replication.

Basically, I set up two nodes, both capable of the master role, and changes 
were immediately propagated to the other node.

But I still wanted an M/S approach, with a RW master and a RO slave, mainly 
because I prefer to have a single master VIP that my apps can connect to.

(In the first approach I configured a two-node clone, and the master IP was 
always bound to one of the nodes.)

I applied the following configuration:

node db1 \
    attributes IP="10.100.1.31" \
    attributes standby="off" db2-log-file-db-mysql="mysql-bin.21" \
        db2-log-pos-db-mysql="40730"
node db2 \
    attributes IP="10.100.1.32" \
    attributes standby="off"
primitive db-ip-master ocf:heartbeat:IPaddr2 \
    params lvs_support="true" ip="10.100.1.30" cidr_netmask="8" \
        broadcast="10.255.255.255" \
    op monitor interval="20s" timeout="20s" \
    meta target-role="Started"
primitive db-mysql ocf:heartbeat:mysql \
    params binary="/usr/bin/mysqld_safe" config="/etc/mysql/my.cnf" \
        datadir="/var/lib/mysql" user="mysql" pid="/var/run/mysqld/mysqld.pid" \
        socket="/var/run/mysqld/mysqld.sock" test_passwd="X" \
        test_table="replicatest.connectioncheck" test_user="slave_user" \
        replication_user="slave_user" replication_passwd="X" \
        additional_parameters="--skip-slave-start" \
    op start interval="0" timeout="120s" \
    op stop interval="0" timeout="120s" \
    op monitor interval="30" timeout="30s" OCF_CHECK_LEVEL="1" \
    op promote interval="0" timeout="120" \
    op demote interval="0" timeout="120"
ms db-ms-mysql db-mysql \
    meta notify="true" master-max="1" clone-max="2" target-role="Started"
colocation db-ip-with-master inf: db-ip-master db-ms-mysql:Master
property $id="cib-bootstrap-options" \
    dc-version="1.1.5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
    cluster-infrastructure="openais" \
    expected-quorum-votes="2" \
    stonith-enabled="false" \
    no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
    resource-stickiness="0"


The setup works under basic conditions:

* After the "first" startup, both nodes start as slaves, and shortly 
after, one of them is promoted to master.

* Updates to the master are replicated properly to the slave.

* The slave accepts updates, which is wrong, but I can live with this - I 
will allow connections to the master VIP only.

* If I stop the slave for some time and restart it, it catches up with 
the master shortly and gets back in sync.

I have, however, a serious issue:

* If I stop the current master, the slave is promoted, accepts RW 
queries, and the master IP is bound to it - all fine.

* BUT - when I bring the other node back online, its resource simply 
shows: Stopped (not installed)

Online: [ db1 db2 ]

db-ip-master (ocf::heartbeat:IPaddr2):   Started db1
Master/Slave Set: db-ms-mysql [db-mysql]
    Masters: [ db1 ]
    Stopped: [ db-mysql:1 ]

Node Attributes:
* Node db1:
    + IP                    : 10.100.1.31
    + db2-log-file-db-mysql : mysql-bin.21
    + db2-log-pos-db-mysql  : 40730
    + master-db-mysql:0     : 3601
* Node db2:
    + IP                    : 10.100.1.32

Failed actions:
    db-mysql:0_monitor_3 (node=db2, call=58, rc=5, status=complete): not installed
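
For readers decoding the failed action above: the "not installed" text is how Pacemaker prints the OCF return code 5 (OCF_ERR_INSTALLED), so it reflects what the resource agent returned from its monitor operation and does not by itself prove that MySQL is missing on db2. A minimal sketch in Python that extracts and names the return code from such a line; the helper name `decode_failed_action` and the regular expression are assumptions made for illustration, and only the numeric codes come from the OCF resource agent API:

```python
import re

# Return codes defined by the OCF resource agent API.
OCF_RETURN_CODES = {
    0: "OCF_SUCCESS",
    1: "OCF_ERR_GENERIC",
    2: "OCF_ERR_ARGS",
    3: "OCF_ERR_UNIMPLEMENTED",
    4: "OCF_ERR_PERM",
    5: "OCF_ERR_INSTALLED",
    6: "OCF_ERR_CONFIGURED",
    7: "OCF_NOT_RUNNING",
    8: "OCF_RUNNING_MASTER",
    9: "OCF_FAILED_MASTER",
}

# Pattern for one "Failed actions" entry in the layout shown in this post.
# (Assumption: other Pacemaker versions may print a different layout.)
FAILED_ACTION = re.compile(
    r"(?P<op>\S+)\s+\(node=(?P<node>[^,]+), call=(?P<call>\d+), "
    r"rc=(?P<rc>\d+), status=(?P<status>[^)]+)\)"
)


def decode_failed_action(line):
    """Hypothetical helper: describe one failed action, or return None."""
    match = FAILED_ACTION.search(line)
    if match is None:
        return None
    rc = int(match.group("rc"))
    return {
        "operation": match.group("op"),
        "node": match.group("node"),
        "rc": rc,
        "rc_name": OCF_RETURN_CODES.get(rc, "unknown"),
        "status": match.group("status"),
    }


line = ("db-mysql:0_monitor_3 (node=db2, call=58, rc=5, "
        "status=complete): not installed")
print(decode_failed_action(line))
```

Because the code is returned by the agent, the agent's own log messages on db2 are the place to look for the reason.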


I checked the logs and could not find a reason why the slave on db2 does not 
start.
Any ideas, anyone?


Thanks,
Attila
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Slave does not start after failover: Mysql circular replication and master-slave resources

2011-12-15 Thread Andreas Kurz
Hello Attila,

... see below ...

On 12/15/2011 02:42 PM, Attila Megyeri wrote:
> Failed actions:
> db-mysql:0_monitor_3 (node=db2, call=58, rc=5, status=complete): not installed
>

Looking at the RA (latest from git), I'd say the problem is somewhere in
the check_slave() function: either the check for replication errors or
the check for too high a slave lag ... though for both errors you should
see log entries.
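
To check both conditions by hand on db2, one can look at the output of SHOW SLAVE STATUS. The following is a rough Python sketch of that kind of check, not the resource agent's actual code: the function names, the 3600-second threshold, and the sample values are assumptions made up for illustration. Only the field names (Slave_IO_Running, Slave_SQL_Running, Last_Errno, Seconds_Behind_Master) are standard MySQL output.

```python
def parse_slave_status(text):
    """Parse 'Key: value' lines of vertical SHOW SLAVE STATUS output."""
    status = {}
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition(":")
        # Skip blank lines and the '*** 1. row ***' separator.
        if sep and not key.startswith("*"):
            status[key.strip()] = value.strip()
    return status


def slave_problems(status, max_lag_seconds=3600):
    """Return a list of human-readable replication problems (hypothetical)."""
    problems = []
    for thread in ("Slave_IO_Running", "Slave_SQL_Running"):
        if status.get(thread) != "Yes":
            problems.append(f"{thread} is {status.get(thread)}")
    if status.get("Last_Errno", "0") != "0":
        problems.append(f"replication error {status['Last_Errno']}")
    lag = status.get("Seconds_Behind_Master", "NULL")
    if lag == "NULL":
        # MySQL reports NULL when the lag cannot be determined.
        problems.append("slave lag unknown (NULL)")
    elif int(lag) > max_lag_seconds:
        problems.append(f"slave lag {lag}s exceeds {max_lag_seconds}s")
    return problems


# Invented sample output, NOT taken from the cluster in this thread.
SAMPLE = """
*************************** 1. row ***************************
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
                   Last_Errno: 1062
        Seconds_Behind_Master: NULL
"""

for problem in slave_problems(parse_slave_status(SAMPLE)):
    print(problem)
```

An empty result from such a check would suggest looking elsewhere in the agent for the source of the error.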

Regards,
Andreas

-- 
Need help with Pacemaker?
http://www.hastexo.com/now



Re: [Pacemaker] Slave does not start after failover: Mysql circular replication and master-slave resources

2011-12-16 Thread Attila Megyeri
Hi Andreas,

The slave lag cannot be high, as the slave was restarted within 1-2 minutes 
and there are no active users on the system yet.
I did not find anything at all in the logs.

I will double-check whether the RA is the latest version.

Thanks,

Attila

