Re: [Linux-ha-dev] Patch to mysql RA for replication

2012-05-01 Thread Yves Trudeau
Hi Keisuke,
it was only half implemented, so I went for the smallest change, and I 
admit I was not aware of this impact. I can add it back.

Regards,

Yves

On 2012-05-01 01:35, Keisuke MORI wrote:
 Hi Yves,

 2012/4/19 Yves Trudeau y.trud...@videotron.ca:
 - cleanup loglevel

 Why did you remove all the loglevel code?
 Was there anything wrong with it?

 After your patch, the RA will generate inappropriate ERROR logs
 whenever it starts/stops/probes even though they're all _expected_
 results and nothing to worry about.

 It's confusing for users, and we have been trying to eliminate such
 confusing ERROR logs as much as possible. The loglevel code is intended to use
 INFO level when the result is expected, and ERROR level only
 when it's considered a failure.
 https://github.com/ClusterLabs/resource-agents/commit/72952904b67b85e1809f90255a55ce39eb2a8922

 I would like to revert those changes.

 Thanks,

 Hi Dejan,
   here's my patch to the mysql agent, based on commit 4c18035. Sorry for
 being inept with git.

 Included here:

 - attribute for replication_info
 - error code 1040 put in a variable
 - the long crm_attribute call for replication_info put in a variable
 - cleanup loglevel
 - defined a value for DEBUG_LOG

 As I wrote before, I have not yet found a way to remove the per-node IP
 attribute.  Using a replication_VIP breaks the operation of
 the agent, as it removes the easy way to add new nodes (or rejoin existing ones).

 Regards,

 Yves

 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/






Re: [Linux-ha-dev] Patch to mysql RA for replication

2012-04-30 Thread Keisuke MORI
Hi Yves,

2012/4/19 Yves Trudeau y.trud...@videotron.ca:
 - cleanup loglevel

Why did you remove all the loglevel code?
Was there anything wrong with it?

After your patch, the RA will generate inappropriate ERROR logs
whenever it starts/stops/probes even though they're all _expected_
results and nothing to worry about.

It's confusing for users, and we have been trying to eliminate such
confusing ERROR logs as much as possible. The loglevel code is intended to use
INFO level when the result is expected, and ERROR level only
when it's considered a failure.
https://github.com/ClusterLabs/resource-agents/commit/72952904b67b85e1809f90255a55ce39eb2a8922
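
The intent described above (INFO for an expected "not running" result, ERROR only for a real failure) can be sketched roughly as follows. This is a minimal illustration and not the code from the linked commit: status_with_loglevel and its pid-file check are hypothetical, and ocf_log is stubbed so the snippet runs outside a cluster (the real function comes from ocf-shellfuncs).

```shell
# Stand-in for ocf_log from ocf-shellfuncs, so the sketch runs anywhere.
ocf_log() {
    level=$1; shift
    printf '%s: %s\n' "$level" "$*"
}

# Hypothetical helper: report "not running" at a caller-chosen level.
# $1 = log level to use when mysqld is not running ("info" or "err")
# $2 = path to the pid file
status_with_loglevel() {
    loglevel=$1
    pidfile=$2
    if [ ! -e "$pidfile" ]; then
        ocf_log "$loglevel" "MySQL is not running"
        return 7    # OCF_NOT_RUNNING
    fi
    return 0        # OCF_SUCCESS
}
```

A probe or a stop would call status_with_loglevel with info, because "not running" is the expected answer there; a monitor of a resource that is supposed to be up would pass err.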

I would like to revert those changes.

Thanks,

 Hi Dejan,
  here's my patch to the mysql agent, based on commit 4c18035. Sorry for
 being inept with git.

 Included here:

 - attribute for replication_info
 - error code 1040 put in a variable
 - the long crm_attribute call for replication_info put in a variable
 - cleanup loglevel
 - defined a value for DEBUG_LOG

 As I wrote before, I have not yet found a way to remove the per-node IP
 attribute.  Using a replication_VIP breaks the operation of
 the agent, as it removes the easy way to add new nodes (or rejoin existing ones).

 Regards,

 Yves





-- 
Keisuke MORI


[Linux-ha-dev] Patch to mysql RA for replication

2012-04-18 Thread Yves Trudeau

Hi Dejan,
  here's my patch to the mysql agent, based on commit 4c18035. 
Sorry for being inept with git.


Included here:

- attribute for replication_info
- error code 1040 put in a variable
- the long crm_attribute call for replication_info put in a variable
- cleanup loglevel
- defined a value for DEBUG_LOG
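
For readers following the patch below: the replication_info value is split with cut -d'|', so it appears to be a pipe-separated string. A minimal sketch of that parsing, assuming the layout is "IP|log file|log position" (fields 1 and 2 are visible in the patch; field 3 for the position is an assumption, as are the sample value and the parse_repl_info helper):

```shell
# Hypothetical sample value; in the agent it would come from crm_attribute.
repl_info='10.0.0.5|mysql-bin.000042|107'

# $1 = attribute value, $2 = field number (1 = IP, 2 = log file, 3 = log position)
parse_repl_info() {
    printf '%s\n' "$1" | cut -d'|' -f"$2"
}

new_master_IP=$(parse_repl_info "$repl_info" 1)
master_log_file=$(parse_repl_info "$repl_info" 2)
master_log_pos=$(parse_repl_info "$repl_info" 3)   # field 3 is assumed
```

Keeping the crm_attribute call in one variable, as the patch does, means the attribute name only has to change in one place.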

As I wrote before, I have not yet found a way to remove the per-node IP 
attribute.  Using a replication_VIP breaks the operation 
of the agent, as it removes the easy way to add new nodes (or rejoin existing ones).


Regards,

Yves
--- mysql.old   2012-04-13 03:19:42.058422681 -0400
+++ mysql   2012-04-18 18:16:46.898421243 -0400
@@ -79,6 +79,7 @@
 OCF_RESKEY_max_slave_lag_default=3600
 OCF_RESKEY_evict_outdated_slaves_default=false
 OCF_RESKEY_reader_attribute_default=readable
+OCF_RESKEY_replication_info_attribute_default=replication_info
 
 : ${OCF_RESKEY_binary=${OCF_RESKEY_binary_default}}
 MYSQL_BINDIR=`dirname ${OCF_RESKEY_binary}`
@@ -109,7 +110,8 @@
 : ${OCF_RESKEY_max_slave_lag=${OCF_RESKEY_max_slave_lag_default}}
 : ${OCF_RESKEY_evict_outdated_slaves=${OCF_RESKEY_evict_outdated_slaves_default}}
 
-: ${OCF_RESKEY_reader_attribute=${OCF_RESKEY_evict_reader_attribute_default}}
+: ${OCF_RESKEY_reader_attribute=${OCF_RESKEY_reader_attribute_default}}
+: ${OCF_RESKEY_replication_info_attribute=${OCF_RESKEY_replication_info_attribute_default}}
 
 ###
 
@@ -328,7 +330,19 @@
 </longdesc>
 <shortdesc lang="en">Sets the node attribute that determines
 whether a node is usable for clients to read from.</shortdesc>
-<content type="boolean" default="${OCF_RESKEY_reader_attribute_default}" />
+<content type="string" default="${OCF_RESKEY_reader_attribute_default}" />
+</parameter>
+
+<parameter name="replication_info_attribute" unique="1" required="0">
+<longdesc lang="en">
+An attribute that stores the current master IP, replication file and position. 
+This is queried by the agent in the post-promote notification
+to reconnect the slaves to the new master.
+
+This parameter is only meaningful in master/slave set configurations.
+</longdesc>
+<shortdesc lang="en">Cluster attribute storing replication information</shortdesc>
+<content type="string" default="${OCF_RESKEY_replication_info_attribute_default}" />
 </parameter>
 </parameters>
 
@@ -355,10 +369,12 @@
 MYSQL_OPTIONS_LOCAL="-S $OCF_RESKEY_socket --connect_timeout=10"
 MYSQL_OPTIONS_REPL="$MYSQL_OPTIONS_LOCAL --user=$OCF_RESKEY_replication_user --password=$OCF_RESKEY_replication_passwd"
 MYSQL_OPTIONS_TEST="$MYSQL_OPTIONS_LOCAL --user=$OCF_RESKEY_test_user --password=$OCF_RESKEY_test_passwd"
+MYSQL_TOO_MANY_CONN_ERR=1040
 
 CRM_MASTER="${HA_SBIN_DIR}/crm_master -l reboot "
 HOSTNAME=`uname -n`
 CRM_ATTR="${HA_SBIN_DIR}/crm_attribute -N $HOSTNAME "
+CRM_ATTR_REPL_INFO="${HA_SBIN_DIR}/crm_attribute --type crm_config --name ${OCF_RESKEY_replication_info_attribute} -s mysql_replication --query  -q"
 INSTANCE_ATTR_NAME=`echo ${OCF_RESOURCE_INSTANCE}| awk -F : '{print $1}'`
 
 ###
@@ -468,7 +484,7 @@
 
 if [ $rc -eq 0 ]; then
 # Did we receive an error other than max_connections?
-if [ $last_errno -ne 0 -a $last_errno -ne 1040 ]; then
+if [ $last_errno -ne 0 -a $last_errno -ne $MYSQL_TOO_MANY_CONN_ERR ]; then
 # Whoa. Replication ran into an error. This slave has
 # diverged from its master. Make sure this resource
 # doesn't restart in place.
@@ -484,7 +500,7 @@
 fi
 
 # If we got max_connections, let's remove the vip
-if [ $last_errno -eq 1040 ]; then
+if [ $last_errno -eq $MYSQL_TOO_MANY_CONN_ERR ]; then
 set_reader_attr 0
 exit $OCF_SUCCESS
 fi
@@ -496,7 +512,7 @@
 ocf_log warn "MySQL Slave IO threads currently not running."
 
 # Sanity check, are we at least on the right master
-new_master_IP=`${HA_SBIN_DIR}/crm_attribute --type crm_config --name replication_info -s mysql_replication --query  -q | cut -d'|' -f1`
+new_master_IP=`$CRM_ATTR_REPL_INFO | cut -d'|' -f1`
 
 if [ "$master_host" != "$new_master_IP" ]; then
# Not pointing to the right master, not good, removing the VIPs
@@ -573,7 +589,7 @@
 local new_master_IP master_log_file master_log_pos
 local master_params
 
-new_master_IP=`${HA_SBIN_DIR}/crm_attribute --type crm_config --name replication_info -s mysql_replication --query  -q | cut -d'|' -f1`
+new_master_IP=`$CRM_ATTR_REPL_INFO | cut -d'|' -f1`
 
 # Keep replication position
 get_slave_info
@@ -585,8 +601,8 @@
 rm -f $tmpfile
 return
 else
-master_log_file=`${HA_SBIN_DIR}/crm_attribute --type crm_config --name replication_info -s mysql_replication --query  -q | cut -d'|' -f2`
-master_log_pos=`${HA_SBIN_DIR}/crm_attribute --type crm_config --name replication_info -s mysql_replication 

Re: [Linux-ha-dev] Patch for mysql RA

2010-11-07 Thread Marek Marczykowski
On 25.10.2010 03:10, Marek Marczykowski wrote:
 On 19.08.2010 21:34, Florian Haas wrote:
 Marek,

 I've finally found time to look into this. Sorry about the delay. So to
 recap, based on the patches list in
 http://marmarek.w.staszic.waw.pl/patches/ha-mysql-ra/,
 
 I also have trouble finding free time :/
 
 05 has gone in but I think it ought to be reverted and replaced with a
 change in functionality, not documentation. Why not check whether the
 resource is configured as a M/S, and if yes, actually _start_ mysqld
 with --skip-slave-start rather than expecting the user to add this to
 the config?
 
 Fixed; new patches are attached and also on the website. I am also thinking of passing
 the --read_only option instead of starting in read-write mode and setting
 read_only right after start (when in M/S, of course). What do you think?
 
 06 has not gone in, but I'm generally OK with it. But, please, START
 SLAVE, not SLAVE START. And on a different style note, no reason for
 trailing semicolons after ocf_log and return.
 
 Fixed.
 
 08 has not gone in. It's nice but I hate the way it's implemented with a
 state file. Why not use crm_attribute and stick transient attributes
 onto nodes? If we could get that patch rewritten to use transient node
 attributes I'd like to see this go in. But here too: STOP SLAVE
 please, not SLAVE STOP.
 
 Changed. I've used persistent node attributes so that the replication state
 is kept even across a reboot.
 
 09: not in, comments on 06 and 05 apply here too.
 
 Fixed (these semicolons were from patch 06...).
 
 I've made some new patches:
 
 10_mysql-ra-monitor-ms-get-ro-state.patch: In the monitor action, check whether
 this instance is running as master based on the read_only mysql variable.
 This is better than the CRM variables because it represents the real state, not the
 desired one.
 
 11_mysql-ra-use-monitor-to-check-start.patch: Call the detailed
 (OCF_CHECK_LEVEL=10) monitor action to check whether mysql is really working
 (in the start action). This helps when the database is broken (and automatic
 recovery failed): do not try to restart it, fail immediately.


I've written some additional patches (mostly changes in my previous code):
12_mysql-ra-replication-fail-code.patch: fix error code indicating
failed replication

13_mysql-ra-repl-info-loglevel.patch: log replication state with 'info'
loglevel instead of 'debug'

14_mysql-ra-repl-state-forget-after-start-slave.patch: forget the
replication state only on START SLAVE, not just on CHANGE MASTER TO

15_mysql-ra-replication-state-for-more-than-two-nodes.patch: store
replication state in separate attributes for each master (>2 node support)

Everything on web: http://marmarek.w.staszic.waw.pl/patches/ha-mysql-ra/

-- 
Best Regards,
Marek Marczykowski  |   gg:2873965  | RLU #390519
marmarek at staszic waw pl  | xmpp:marmarek at staszic waw pl





Re: [Linux-ha-dev] Patch for mysql RA

2010-08-19 Thread Florian Haas
Marek,

I've finally found time to look into this. Sorry about the delay. So to
recap, based on the patches list in
http://marmarek.w.staszic.waw.pl/patches/ha-mysql-ra/,

01 has gone in.

02 has gone in.

03 has gone in.

04 has gone in.

05 has gone in but I think it ought to be reverted and replaced with a
change in functionality, not documentation. Why not check whether the
resource is configured as a M/S, and if yes, actually _start_ mysqld
with --skip-slave-start rather than expecting the user to add this to
the config?
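
One way the suggestion for 05 could look, as a minimal sketch rather than the agent's actual code: decide from the meta attributes Pacemaker passes whether the resource runs as a master/slave set, and only then add the option. The helper names is_master_slave and mysqld_extra_opts are hypothetical; the check on OCF_RESKEY_CRM_meta_master_max is an assumption about how M/S configuration can be detected.

```shell
# Hypothetical helper: true when a positive master_max meta attribute is set,
# which is taken here as the sign of a master/slave configuration.
is_master_slave() {
    [ -n "${OCF_RESKEY_CRM_meta_master_max:-}" ] &&
        [ "$OCF_RESKEY_CRM_meta_master_max" -gt 0 ]
}

# Print the extra mysqld options the start action would append.
mysqld_extra_opts() {
    if is_master_slave; then
        echo "--skip-slave-start"
    fi
}
```

With this shape the user's my.cnf needs no change: a plain primitive starts as before, and an M/S resource gets --skip-slave-start from the agent.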

06 has not gone in, but I'm generally OK with it. But, please, START
SLAVE, not SLAVE START. And on a different style note, no reason for
trailing semicolons after ocf_log and return.

07 has gone in.

08 has not gone in. It's nice but I hate the way it's implemented with a
state file. Why not use crm_attribute and stick transient attributes
onto nodes? If we could get that patch rewritten to use transient node
attributes I'd like to see this go in. But here too: STOP SLAVE
please, not SLAVE STOP.

09: not in, comments on 06 and 05 apply here too.

What are your thoughts on this?

Cheers,
Florian





Re: [Linux-ha-dev] Patch for mysql RA

2010-07-27 Thread Marek Marczykowski
On Fri, Jul 02, 2010 at 12:16:03AM +0200, Marek Marczykowski wrote:
 On Thu, Jul 01, 2010 at 05:42:00PM +0200, Dejan Muhamedagic wrote:
  On Tue, Jun 29, 2010 at 06:53:23PM +0200, Marek Marczykowski wrote:
   --- mysql-05  2010-06-29 18:23:43.806105971 +0200
   +++ mysql 2010-06-29 18:23:55.882104463 +0200
   @@ -730,6 +730,19 @@
 # don't know what master to replicate from), we simply start
 # in read only mode.
 set_read_only on
   +
   + master_host=`echo $OCF_RESKEY_CRM_meta_notify_master_uname|tr -d " "`
   + if [ "$master_host" -a "$master_host" != `uname -n` ]; then
   + ocf_log info "Changing MySQL configuration to replicate from $master_host."
   + set_master $master_host
   + ocf_run $MYSQL $MYSQL_OPTIONS_LOCAL $MYSQL_OPTIONS_REPL \
   + -e "SLAVE START"
   + if [ $? -ne 0 ]; then
   + ocf_log err "Failed to start slave";
   + return $OCF_ERR_GENERIC;
   + fi
   + fi
   +
 # We also need to set a master preference, otherwise Pacemaker
 # won't ever promote us in the absence of any explicit
 # preference set by the administrator. We choose a low
  
  This part I don't understand. set_master does:
  
  ocf_run $MYSQL $MYSQL_OPTIONS_LOCAL $MYSQL_OPTIONS_REPL ...
 
 It only does CHANGE MASTER TO ..., which sets the master to replicate from but
 does not start replication. When called from pre-promote, we start replication from
 post-promote, so this function cannot START SLAVE.
 
  and then you have the same thing repeated afterwards:
  
   + ocf_run $MYSQL $MYSQL_OPTIONS_LOCAL $MYSQL_OPTIONS_REPL ...
   + -e "SLAVE START"
  
  Otherwise, set_master is invoked in pre-promote(). This patch
  invokes it from the start operation. Is that a duplicate?
 
 No. This is used when a new slave starts and a master already exists. In this
 situation it doesn't receive a pre-promote notification. Or maybe I'm
 wrong?
 
  I also don't understand why that function is called set_master
  when it's all about the slave replication.
 
 This function tells the slave which master to replicate from.

In addition to the patch above ([1]), replication also needs to be disabled
after mysql starts when there is no master node (only slaves). Without
this, the monitor action fails (replication is started, but not connected to any
master). The patch is attached (and also available at [2]).

From a comment in the mysql script:
# Since we can't start as a MySQL slave (we
# don't know what master to replicate from), we simply start
# in read only mode.

We do know the master (if one exists). It is available in
$OCF_RESKEY_CRM_meta_notify_master_uname, and this patch ([1]) uses that
knowledge.
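
The decision described above can be sketched in isolation as follows. This is an illustration under assumptions, not the patch itself: pick_master_host is a hypothetical helper, and the raw value is passed in as an argument instead of being read from $OCF_RESKEY_CRM_meta_notify_master_uname so that the snippet runs outside a cluster.

```shell
# Print the master host to replicate from, or nothing if there is no master
# or if this node is the master itself.
# $1 = raw master uname value (may carry a trailing space), $2 = local node name
pick_master_host() {
    master_host=$(printf '%s' "$1" | tr -d ' ')
    if [ -n "$master_host" ] && [ "$master_host" != "$2" ]; then
        printf '%s\n' "$master_host"
    fi
}
```

An empty result corresponds to the "no master present" case, where patch [2] clears the replication state instead of starting the slave threads.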

[1]
http://marmarek.w.staszic.waw.pl/patches/ha-mysql-ra/06_mysql-ra-slave-start-replication.patch
[2]
http://marmarek.w.staszic.waw.pl/patches/ha-mysql-ra/09_mysql-ra-disable-slave-on-no-master.patch

-- 
Best Regards,
Marek Marczykowski  |   gg:2873965  | RLU #390519
marmarek at staszic waw pl  | xmpp:marmarek at staszic waw pl

--- /usr/lib/ocf/resource.d/heartbeat/mysql-repl.orig   2010-07-20 04:32:01.681369222 +0200
+++ /usr/lib/ocf/resource.d/heartbeat/mysql-repl        2010-07-20 04:33:06.374301483 +0200
@@ -802,6 +802,9 @@
ocf_log err "Failed to start slave";
return $OCF_ERR_GENERIC;
fi
+   else 
+   ocf_log info "No MySQL master present - clearing replication state"
+   unset_master
fi
 
# We also need to set a master preference, otherwise Pacemaker




Re: [Linux-ha-dev] Patch for mysql RA

2010-07-01 Thread Marek Marczykowski
On Thu, Jul 01, 2010 at 05:42:00PM +0200, Dejan Muhamedagic wrote:
 On Tue, Jun 29, 2010 at 06:53:23PM +0200, Marek Marczykowski wrote:
  On Tue, Jun 29, 2010 at 05:41:08PM +0200, Dejan Muhamedagic wrote:
  --- mysql-02    2010-06-29 18:17:59.214106092 +0200
  +++ mysql   2010-06-29 18:18:26.238105571 +0200
  @@ -618,8 +618,13 @@
  fi
   fi
   
  -ocf_log info "MySQL monitor succeeded";
  -return $OCF_SUCCESS
  +if [ $OCF_RESKEY_CRM_meta_role = Master ]; then
  +   ocf_log info "MySQL monitor succeeded (master)";
  +   return $OCF_RUNNING_MASTER
  +else
  +   ocf_log info "MySQL monitor succeeded";
  +   return $OCF_SUCCESS
  +fi
   }
   
   mysql_start() {
 
 This seems to be rather serious. I wonder if the RA could've been
 used in MS mode at all.

I think the slave should also check whether it is connected to the master
(if any)... Calling is_slave when a master exists should be enough. Or
just use the condition "if is_slave; then" from the beginning of this function.
I've added this to my todo list...
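
The return-code change in the hunk quoted above can be sketched on its own like this. It is a minimal illustration: monitor_rc is a hypothetical helper, the role is passed as an argument rather than read from $OCF_RESKEY_CRM_meta_role, and the two constants are defined locally with the values the OCF return-code convention assigns them (normally they come from ocf-shellfuncs).

```shell
OCF_SUCCESS=0
OCF_RUNNING_MASTER=8

# $1 = role as Pacemaker would pass it in OCF_RESKEY_CRM_meta_role
monitor_rc() {
    if [ "$1" = "Master" ]; then
        return $OCF_RUNNING_MASTER
    fi
    return $OCF_SUCCESS
}
```

Without the distinct code for the master, the cluster manager cannot tell a healthy master from a healthy slave during monitoring.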

   mysql_demote() {
  -set_read_only on || return $OCF_ERR_GENERIC
  +if ( ! mysql_status ); then
 
 () removed because they are unnecessary.
 
  +   return $OCF_NOT_RUNNING
  +fi
 
 This test is a bit more than logging change.

Yes, but it basically only improves error reporting :)

  +
  +set_read_only on
  +if [ $? -ne 0 ]; then
  +   ocf_log err "Failed to set read-only";
  +   return $OCF_ERR_GENERIC;
  +fi
   
   # Return master preference to default, so the cluster manager gets
   # a chance to select a new master
 
 OK.
 
  --- mysql-05    2010-06-29 18:23:43.806105971 +0200
  +++ mysql   2010-06-29 18:23:55.882104463 +0200
  @@ -730,6 +730,19 @@
  # don't know what master to replicate from), we simply start
  # in read only mode.
  set_read_only on
  +
  +   master_host=`echo $OCF_RESKEY_CRM_meta_notify_master_uname|tr -d " "`
  +   if [ "$master_host" -a "$master_host" != `uname -n` ]; then
  +   ocf_log info "Changing MySQL configuration to replicate from $master_host."
  +   set_master $master_host
  +   ocf_run $MYSQL $MYSQL_OPTIONS_LOCAL $MYSQL_OPTIONS_REPL \
  +   -e "SLAVE START"
  +   if [ $? -ne 0 ]; then
  +   ocf_log err "Failed to start slave";
  +   return $OCF_ERR_GENERIC;
  +   fi
  +   fi
  +
  # We also need to set a master preference, otherwise Pacemaker
  # won't ever promote us in the absence of any explicit
  # preference set by the administrator. We choose a low
 
 This part I don't understand. set_master does:
 
 ocf_run $MYSQL $MYSQL_OPTIONS_LOCAL $MYSQL_OPTIONS_REPL ...

It only does CHANGE MASTER TO ..., which sets the master to replicate from but
does not start replication. When called from pre-promote, we start replication from
post-promote, so this function cannot START SLAVE.

 and then you have the same thing repeated afterwards:
 
  +   ocf_run $MYSQL $MYSQL_OPTIONS_LOCAL $MYSQL_OPTIONS_REPL ...
  +   -e "SLAVE START"
 
 Otherwise, set_master is invoked in pre-promote(). This patch
 invokes it from the start operation. Is that a duplicate?

No. This is used when a new slave starts and a master already exists. In this
situation it doesn't receive a pre-promote notification. Or maybe I'm
wrong?

 I also don't understand why that function is called set_master
 when it's all about the slave replication.

This function tells the slave which master to replicate from.

  --- mysql-06    2010-06-29 18:24:48.166105436 +0200
  +++ mysql   2010-06-29 18:32:18.666101597 +0200
  @@ -371,7 +371,7 @@
   
   tmpfile=`mktemp ${HA_RSCTMP}/is_slave.${OCF_RESOURCE_INSTANCE}.XX`
   
  -mysql_options="$MYSQL_OPTIONS_LOCAL --user=$OCF_RESKEY_test_user --password=$OCF_RESKEY_test_passwd"
  +mysql_options="$MYSQL_OPTIONS_LOCAL --user=$OCF_RESKEY_replication_user --password=$OCF_RESKEY_replication_passwd"
   
   $MYSQL $mysql_options \
   -e 'SHOW SLAVE STATUS\G' > $tmpfile
  @@ -396,7 +396,7 @@
   rc=1
   tmpfile=`mktemp ${HA_RSCTMP}/check_slave.${OCF_RESOURCE_INSTANCE}.XX`
   
  -mysql_options="$MYSQL_OPTIONS_LOCAL --user=$OCF_RESKEY_test_user --password=$OCF_RESKEY_test_passwd"
  +mysql_options="$MYSQL_OPTIONS_LOCAL --user=$OCF_RESKEY_replication_user --password=$OCF_RESKEY_replication_passwd"
   
   $MYSQL $mysql_options \
   -e 'SHOW SLAVE STATUS\G' > $tmpfile
 
 Wouldn't this work with the test_user as well?

This user needs the REPLICATION CLIENT privilege. The replication_user must
have it to manipulate replication ;) but test_user can be less
powerful. The test_user should have access to test_table (and needs only
the SELECT privilege), so why force additional privileges on it? In my case
I'm using the application's own database user, so I don't
want to give it rights to manipulate replication. Of course, if you use
root in both cases it will work, but I don't like such a configuration
on production servers.

  --- mysql 

Re: [Linux-ha-dev] Patch for mysql RA

2010-06-29 Thread Dejan Muhamedagic
Hi,

On Tue, Jun 29, 2010 at 12:45:47AM +0200, Marek Marczykowski wrote:
 Hello,
 
 I'm implementing an HA solution using pacemaker and I've made some
 changes to the mysql RA. Maybe you will be interested in some of them. Patch
 attached. List of changes:
  * [bugfix] monitor returns $OCF_RUNNING_MASTER on master
  * [bugfix] slave info collected with replication user
  * [bugfix] cut trailing space from OCF_* host lists
  * [doc] suggest --skip-slave-start option
  * [feature] detailed logging on errors
  * [feature] setup replication on late slave start
  * [feature] another concept of M/S replication - try to keep state

Could we please have this split into as many patches as there are
unrelated changes (looks like there should be 7). Otherwise it's
going to be difficult to see what's affected by which part of the
patch.

 Some explanation about the last one: in a dual-node mysql setup there is
 no need to reset the master after every topology change. You need to store
 the last log_file and log_pos, and when the master demotes (or is started as a slave)
 it can continue from the last position (as the slave, now the new master, was in
 read-only mode). This also helps to avoid losing data, e.g. in this scenario:
  - slave maintenance shutdown 
  - some time later master reboot (yes - no mysql left for a moment)
    - this resets master
  - slave startup
    - replication starts from the point of the master reboot, losing data from
      the time after the slave shutdown and before the master reboot
 
 The main concept is to NOT use RESET MASTER. When the positions
 desynchronise, manual intervention is needed. In the original version
 you may not even notice that some inserts are missing on the replicas!

That sounds quite bad.

 This concept works only in a dual-node mysql setup (namely, when a slave
 has only one choice of master). A setup with more nodes will need manual
 synchronization (or the original version with RESET MASTER, with the warning
 above).

OK. Does anybody out there running mysql want to comment?

Many thanks for sharing your work!

Cheers,

Dejan


Re: [Linux-ha-dev] Patch for mysql RA

2010-06-29 Thread Marek Marczykowski
On Tue, Jun 29, 2010 at 05:41:08PM +0200, Dejan Muhamedagic wrote:
 Hi,

Hi,

 On Tue, Jun 29, 2010 at 12:45:47AM +0200, Marek Marczykowski wrote:
  I'm implementing an HA solution using pacemaker and I've made some
  changes to the mysql RA. Maybe you will be interested in some of them. Patch
  attached. List of changes:
   * [bugfix] monitor returns $OCF_RUNNING_MASTER on master
   * [bugfix] slave info collected with replication user
   * [bugfix] cut trailing space from OCF_* host lists
   * [doc] suggest --skip-slave-start option
   * [feature] detailed logging on errors
   * [feature] setup replication on late slave start
   * [feature] another concept of M/S replication - try to keep state
 
 Could we please have this split into as many patches as there are
 unrelated changes (looks like there should be 7). Otherwise it's
 going to be difficult to see what's affected by which part of the
 patch.

OK, I've split it into 8 patches (a typo fix was missing from the list
above). The patches are attached and also uploaded here:
http://marmarek.w.staszic.waw.pl/patches/ha-mysql-ra

-- 
Best Regards,
Marek Marczykowski  |   gg:2873965  | RLU #390519
marmarek at staszic waw pl  | xmpp:marmarek at staszic waw pl

--- mysql.orig  2010-06-29 17:45:14.390077677 +0200
+++ mysql   2010-06-29 18:07:44.399141260 +0200
@@ -458,7 +458,7 @@
master_pref=$((${OCF_RESKEY_max_slave_lag}-${secs_behind}))
if [ $master_pref -lt 0 ]; then
# Sanitize a below-zero preference to just zero
-   $master_pref=0
+   master_pref=0
fi
$CRM_MASTER -v $master_pref
fi
--- mysql-01    2010-06-29 18:13:35.738106899 +0200
+++ mysql   2010-06-29 18:16:52.899101522 +0200
@@ -811,7 +811,7 @@
# connect to it and wait for it to start replicating.
local master_host
local master_status
-   master_host=$OCF_RESKEY_CRM_meta_notify_promote_uname
+   master_host=`echo $OCF_RESKEY_CRM_meta_notify_promote_uname|tr -d " "`
 
if ( ! mysql_status ); then
return $OCF_NOT_RUNNING
@@ -834,7 +834,8 @@
# The master has completed its promotion. Now is a good
# time to check whether our replication slave is working
# correctly.
-   if [ $OCF_RESKEY_CRM_meta_notify_promote_uname = `uname -n` ]; then
+   master_host=`echo $OCF_RESKEY_CRM_meta_notify_promote_uname|tr -d " "`
+   if [ $master_host = `uname -n` ]; then
ocf_log info "Ignoring post-promote notification for my own promotion."
return $OCF_SUCCESS
fi
@@ -842,10 +843,12 @@
-e 'START SLAVE';
;;
'post-demote')
-   if [ $OCF_RESKEY_CRM_meta_notify_demote_uname = `uname -n` ]; then
+   demote_host=`echo $OCF_RESKEY_CRM_meta_notify_demote_uname|tr -d " "`
+   if [ $demote_host = `uname -n` ]; then
ocf_log info "Ignoring post-demote notification for my own demotion."
return $OCF_SUCCESS
fi
+   ocf_log info "post-demote notification for $demote_host."
# The former master has just been gracefully demoted.
unset_master
;;
--- mysql-02    2010-06-29 18:17:59.214106092 +0200
+++ mysql   2010-06-29 18:18:26.238105571 +0200
@@ -618,8 +618,13 @@
fi
 fi
 
-ocf_log info "MySQL monitor succeeded";
-return $OCF_SUCCESS
+if [ $OCF_RESKEY_CRM_meta_role = Master ]; then
+   ocf_log info "MySQL monitor succeeded (master)";
+   return $OCF_RUNNING_MASTER
+else
+   ocf_log info "MySQL monitor succeeded";
+   return $OCF_SUCCESS
+fi
 }
 
 mysql_start() {
--- mysql-03    2010-06-29 18:19:06.698076674 +0200
+++ mysql   2010-06-29 18:21:29.398105235 +0200
@@ -511,7 +511,12 @@
 # First, stop the slave I/O thread and wait for relay log
 # processing to complete
 ocf_run $MYSQL $mysql_options \
-   -e "STOP SLAVE IO_THREAD" || exit $OCF_ERR_GENERIC
+   -e "STOP SLAVE IO_THREAD"
+if [ $? -gt 0 ]; then
+   ocf_log err "Error stopping slave IO thread"
+   exit $OCF_ERR_GENERIC
+fi
+
 while true; do
$MYSQL $mysql_options \
-e 'SHOW PROCESSLIST\G' > $tmpfile
@@ -526,9 +531,18 @@
 
 # Now, stop all slave activity and unset the master host
 ocf_run $MYSQL $mysql_options \
-   -e "STOP SLAVE" || exit $OCF_ERR_GENERIC
+   -e "STOP SLAVE"
+if [ $? -gt 0 ]; then
+   ocf_log err "Error stopping rest slave threads"
+   exit $OCF_ERR_GENERIC
+fi
+
 ocf_run $MYSQL $mysql_options \
-   -e "CHANGE MASTER TO MASTER_HOST=''" || exit $OCF_ERR_GENERIC
+   -e "CHANGE MASTER TO MASTER_HOST=''" 
+if [ $? -gt 0 ]; then
+   ocf_log err "Failed to set master"
+   exit $OCF_ERR_GENERIC
+fi
 }
 
 ###
@@