On Thu, Jul 01, 2010 at 05:42:00PM +0200, Dejan Muhamedagic wrote:
> On Tue, Jun 29, 2010 at 06:53:23PM +0200, Marek Marczykowski wrote:
> > On Tue, Jun 29, 2010 at 05:41:08PM +0200, Dejan Muhamedagic wrote:
> > --- mysql-02        2010-06-29 18:17:59.214106092 +0200
> > +++ mysql   2010-06-29 18:18:26.238105571 +0200
> > @@ -618,8 +618,13 @@
> >     fi
> >      fi
> >  
> > -    ocf_log info "MySQL monitor succeeded";
> > -    return $OCF_SUCCESS
> > +    if [ "$OCF_RESKEY_CRM_meta_role" = "Master" ]; then
> > +       ocf_log info "MySQL monitor succeeded (master)";
> > +       return $OCF_RUNNING_MASTER
> > +    else
> > +       ocf_log info "MySQL monitor succeeded";
> > +       return $OCF_SUCCESS
> > +    fi
> >  }
> >  
> >  mysql_start() {
> 
> This seems to be rather serious. I wonder if the RA could've been
> used in MS mode at all.

I think the monitor should also check, on a slave, whether it is
connected to the master (if there is one). Calling "is_slave" when a
master exists should be enough. Or just reuse the "if is_slave; then"
condition from the beginning of this function. I've added this to my
todo list.

> >  mysql_demote() {
> > -    set_read_only on || return $OCF_ERR_GENERIC
> > +    if ( ! mysql_status ); then
> 
> () removed because they are unnecessary.
> 
> > +   return $OCF_NOT_RUNNING
> > +    fi
> 
> This test is a bit more than logging change.

Yes, but basically it only improves error reporting :)

> > +
> > +    set_read_only on
> > +    if [ $? -ne 0 ]; then
> > +   ocf_log err "Failed to set read-only";
> > +   return $OCF_ERR_GENERIC;
> > +    fi
> >  
> >      # Return master preference to default, so the cluster manager gets
> >      # a chance to select a new master
> 
> OK.
> 
> > --- mysql-05        2010-06-29 18:23:43.806105971 +0200
> > +++ mysql   2010-06-29 18:23:55.882104463 +0200
> > @@ -730,6 +730,19 @@
> >     # don't know what master to replicate from), we simply start
> >     # in read only mode.
> >     set_read_only on
> > +
> > +   master_host=`echo $OCF_RESKEY_CRM_meta_notify_master_uname|tr -d " "`
> > +   if [ "$master_host" -a "$master_host" != `uname -n` ]; then
> > +       ocf_log info "Changing MySQL configuration to replicate from $master_host."
> > +       set_master $master_host
> > +       ocf_run $MYSQL $MYSQL_OPTIONS_LOCAL $MYSQL_OPTIONS_REPL \
> > +           -e "SLAVE START"
> > +       if [ $? -ne 0 ]; then
> > +           ocf_log err "Failed to start slave";
> > +           return $OCF_ERR_GENERIC;
> > +       fi
> > +   fi
> > +
> >     # We also need to set a master preference, otherwise Pacemaker
> >     # won't ever promote us in the absence of any explicit
> >     # preference set by the administrator. We choose a low
> 
> This part I don't understand. set_master does:
> 
> ocf_run $MYSQL $MYSQL_OPTIONS_LOCAL $MYSQL_OPTIONS_REPL ...

It only does "CHANGE MASTER TO ...", i.e. it sets which master to
replicate from, but it does not start replication. When called from
pre-promote, we start replication later from post-promote, so this
function cannot issue "START SLAVE" itself.

> and then you have the same thing repeated afterwards:
> 
> > +       ocf_run $MYSQL $MYSQL_OPTIONS_LOCAL $MYSQL_OPTIONS_REPL ...
> > +           -e "SLAVE START"
> 
> Otherwise, set_master is invoked in pre-promote(). This patch
> invokes it from the start operation. Is that a duplicate?

No. This is used when a new slave starts and a master already exists. In
this situation the slave doesn't receive a pre-promote notification. Or
maybe I'm wrong?
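
For reference, the check added to the start path can be sketched in
isolation. This is only an illustration: replication_source is a
hypothetical helper name that does not exist in the RA, and whether
OCF_RESKEY_CRM_meta_notify_master_uname is actually populated during a
plain start is exactly the open question above.

```shell
# Hypothetical helper, not part of the mysql RA.
# $1: value of OCF_RESKEY_CRM_meta_notify_master_uname (may be empty or
#     padded with spaces)
# $2: name of the local node (the patch uses `uname -n`)
# Prints the master to replicate from and returns 0, or returns 1 when
# there is no master yet or when this node is the master itself.
replication_source() {
    rs_master=$(printf '%s' "$1" | tr -d ' ')
    [ -n "$rs_master" ] || return 1
    [ "$rs_master" != "$2" ] || return 1
    printf '%s\n' "$rs_master"
}
```

In the start action this would be called as
replication_source "$OCF_RESKEY_CRM_meta_notify_master_uname" "$(uname -n)",
and only on success would set_master and "SLAVE START" follow.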

> I also don't understand why that function is called set_master
> when it's all about the slave replication.

This function tells the slave which master to replicate from.

> > --- mysql-06        2010-06-29 18:24:48.166105436 +0200
> > +++ mysql   2010-06-29 18:32:18.666101597 +0200
> > @@ -371,7 +371,7 @@
> >  
> >      tmpfile=`mktemp ${HA_RSCTMP}/is_slave.${OCF_RESOURCE_INSTANCE}.XXXXXX`
> >  
> > -    mysql_options="$MYSQL_OPTIONS_LOCAL --user=$OCF_RESKEY_test_user --password=$OCF_RESKEY_test_passwd"
> > +    mysql_options="$MYSQL_OPTIONS_LOCAL --user=$OCF_RESKEY_replication_user --password=$OCF_RESKEY_replication_passwd"
> >  
> >      $MYSQL $mysql_options \
> >          -e 'SHOW SLAVE STATUS\G' > $tmpfile
> > @@ -396,7 +396,7 @@
> >      rc=1
> >      tmpfile=`mktemp ${HA_RSCTMP}/check_slave.${OCF_RESOURCE_INSTANCE}.XXXXXX`
> >  
> > -    mysql_options="$MYSQL_OPTIONS_LOCAL --user=$OCF_RESKEY_test_user --password=$OCF_RESKEY_test_passwd"
> > +    mysql_options="$MYSQL_OPTIONS_LOCAL --user=$OCF_RESKEY_replication_user --password=$OCF_RESKEY_replication_passwd"
> >  
> >      $MYSQL $mysql_options \
> >          -e 'SHOW SLAVE STATUS\G' > $tmpfile
> 
> Wouldn't this work with the test_user as well?

This user needs the "REPLICATION CLIENT" privilege. The replication_user
must have it to manipulate replication ;) but test_user can be less
powerful. The test_user should have access to test_table (and needs only
the SELECT privilege), so why force additional privileges on it? In my
case I'm using the account of the application working on this database,
so I don't want to give it rights to manipulate replication. Of course,
if you use "root" in both cases it will work, but I don't like such a
configuration on production servers.
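
To make the split concrete, here is a sketch of the grants I have in
mind. The account names, the 'localhost' host part and the table name
are placeholders (assumptions), and print_grants is a hypothetical
helper that only prints the SQL; none of it comes from the RA.

```shell
# Hypothetical helper: prints example GRANT statements, does not run them.
# $1: monitoring account (what OCF_RESKEY_test_user would name)
# $2: table read by the monitor, written as db.table
# $3: account used for "SHOW SLAVE STATUS" (OCF_RESKEY_replication_user)
print_grants() {
    cat <<EOF
GRANT SELECT ON $2 TO '$1'@'localhost';
GRANT REPLICATION CLIENT ON *.* TO '$3'@'localhost';
EOF
}
```

For example, print_grants app mydb.test repl prints one statement per
account. Note that the account a slave uses to connect to its master
additionally needs REPLICATION SLAVE; that is left out here.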

> > --- mysql   2010-06-29 18:32:18.666101597 +0200
> > +++ mysql-07        2010-06-29 18:30:40.254105999 +0200

(...)

> > @@ -401,23 +410,36 @@
> >      $MYSQL $mysql_options \
> >          -e 'SHOW SLAVE STATUS\G' > $tmpfile
> >  
> > -    local master_host
> > -    local master_user
> > -    local master_port
> > -    local slave_sql
> > -    local slave_io
> > -    local last_errno
> > -    local secs_behind
> 
> Why remove the local statements? I see now, you want to use them
> in more functions.

Yes. Is there any other solution for this? The worst part is that
$tmpfile is not cleaned up, but it is needed for debugging: in case of an
error the message contains the path to the file, so the file is left in
place. Do you have a better idea?

> >      if [ -s $tmpfile ]; then
> > -   master_host=`sed -ne 's/^.*Master_Host: \(.*\)$/\1/p' < $tmpfile`
> > -   master_user=`sed -ne 's/^.*Master_User: \(.*\)$/\1/p' < $tmpfile`
> > -   master_port=`sed -ne 's/^.*Master_Port: \(.*\)$/\1/p' < $tmpfile`
> > -   slave_sql=`sed -ne 's/^.*Slave_SQL_Running: \(.*\)$/\1/p' < $tmpfile`
> > -   slave_io=`sed -ne 's/^.*Slave_IO_Running: \(.*\)$/\1/p' < $tmpfile`
> > -   last_errno=`sed -ne 's/^.*Last_Errno: \(.*\)$/\1/p' < $tmpfile`
> > -   secs_behind=`sed -ne 's/^.*Seconds_Behind_Master: \(.*\)$/\1/p' < $tmpfile`
> > +   master_host=`sed -ne 's/^.* Master_Host: \(.*\)$/\1/p' < $tmpfile`
> > +   master_user=`sed -ne 's/^.* Master_User: \(.*\)$/\1/p' < $tmpfile`
> > +   master_port=`sed -ne 's/^.* Master_Port: \(.*\)$/\1/p' < $tmpfile`
> > +   master_log_file=`sed -ne 's/^.* Master_Log_File: \(.*\)$/\1/p' < $tmpfile`
> > +   master_log_pos=`sed -ne 's/^.* Read_Master_Log_Pos: \(.*\)$/\1/p' < $tmpfile`
> > +   slave_sql=`sed -ne 's/^.* Slave_SQL_Running: \(.*\)$/\1/p' < $tmpfile`
> > +   slave_io=`sed -ne 's/^.* Slave_IO_Running: \(.*\)$/\1/p' < $tmpfile`
> > +   last_errno=`sed -ne 's/^.* Last_Errno: \(.*\)$/\1/p' < $tmpfile`
> > +   secs_behind=`sed -ne 's/^.* Seconds_Behind_Master: \(.*\)$/\1/p' < $tmpfile`
> 
> Ugh, this code is ugly. One and the same sed pattern is repeated
> many times, it should move to a function.

Right... A new version of this patch is attached.

> It seems like you introduced two new variables. For the others
> you updated the search pattern by prefixing strings with a space.
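
The space prefix matters once Master_Log_File is parsed: without it,
the pattern also matches the Relay_Master_Log_File line. A small sketch
showing the difference follows. The sample text and its values are
invented for illustration, and parse_loose / parse_field are
hypothetical names; parse_field uses the same expression as
parse_slave_info in the attached patch, but reads stdin.

```shell
# Invented sample in the layout of "SHOW SLAVE STATUS\G" (values made up).
sample='                  Master_Host: node1
              Master_Log_File: mysql-bin.000003
          Read_Master_Log_Pos: 1307
        Relay_Master_Log_File: mysql-bin.000002'

# Old-style pattern: nothing required before the field name.
parse_loose() {
    sed -ne "s/^.*$1: \(.*\)$/\1/p"
}

# New-style pattern: the field name must be preceded by a space.
parse_field() {
    sed -ne "s/^.* $1: \(.*\)$/\1/p"
}

# Two lines: both Master_Log_File and Relay_Master_Log_File match.
printf '%s\n' "$sample" | parse_loose Master_Log_File
# One line: only the real Master_Log_File field matches.
printf '%s\n' "$sample" | parse_field Master_Log_File
```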

(...)

> does this obliterate the earlier replication? If so, is that
> because you think this method better? You even mentioned that
> some data may be lost otherwise, right?

Yes. Using "RESET MASTER" on a working replication setup is very
dangerous... This method fixes that for a dual-node setup. I have no idea
how to do it _automatically_ with three or more nodes.

In short: the master records changes in its binlog. A slave must know
the position in this log from which it should start replicating (or at
which it currently is). The original version of the script ran "RESET
MASTER" on each promote action, which deletes all binlogs, so slaves
start replication from the beginning. But in some cases the deleted logs
can contain changes that the slaves have not applied yet.
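
As a rough sketch of what the patch does instead: when the slave already
knows a position on the same master, that position is passed along in
"CHANGE MASTER TO" rather than resetting the master's binlogs.
change_master_sql is a hypothetical helper that only builds the
statement (user and password options are left out for brevity); the
host, file name and position in the usage line are made up.

```shell
# Hypothetical helper, not part of the RA: builds the statement only.
# $1: master host
# $2: binlog file name on that master (optional)
# $3: position in that binlog file (optional)
change_master_sql() {
    cm_sql="CHANGE MASTER TO MASTER_HOST='$1'"
    # Append the coordinates only when both parts are known.
    if [ -n "$2" ] && [ -n "$3" ]; then
        cm_sql="$cm_sql, MASTER_LOG_FILE='$2', MASTER_LOG_POS=$3"
    fi
    printf '%s\n' "$cm_sql"
}
```

For example, change_master_sql node1 mysql-bin.000003 1307 yields a
statement that resumes from that position, while change_master_sql node1
leaves the coordinates out.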

Maybe a better solution is to detect this and refuse to start the slave
until someone intervenes manually? I don't know how to detect it
reliably, and it is a somewhat different concept anyway. The downside is
that when you have no binlogs you must synchronize the database manually
(dump + restore), which may not be easy on a production database (the
dump locks tables).

> Florian, can you please take a look when you have time. Anybody
> out there with enough mysql expertise, please comment too.
> 
> To recap, patches 1-5 are in. Patch 6 needs more clarification,
> patch 7 seems to be unnecessary, and patch 8 adds a new feature
> on which I can't comment.

Thanks!

-- 
Best Regards,
Marek Marczykowski          |   gg:2873965      | RLU #390519
marmarek at staszic waw pl  | xmpp:marmarek at staszic waw pl

--- mysql-07    2010-07-01 23:35:41.088934132 +0200
+++ mysql       2010-07-02 00:07:30.132981006 +0200
@@ -76,6 +76,7 @@
 OCF_RESKEY_replication_port_default="3306"
 OCF_RESKEY_max_slave_lag_default="3600"
 OCF_RESKEY_evict_outdated_slaves_default="false"
+OCF_RESKEY_state_default=${HA_RSCTMP}/Mysql-repl-${OCF_RESOURCE_INSTANCE}.state
 
 : ${OCF_RESKEY_binary=${OCF_RESKEY_binary_default}}
 MYSQL_BINDIR=`dirname ${OCF_RESKEY_binary}`
@@ -106,6 +107,8 @@
 : ${OCF_RESKEY_max_slave_lag=${OCF_RESKEY_max_slave_lag_default}}
 : ${OCF_RESKEY_evict_outdated_slaves=${OCF_RESKEY_evict_outdated_slaves_default}}
 
+: ${OCF_RESKEY_state=${OCF_RESKEY_state_default}}
+
 #######################################################################
 
 usage() {
@@ -308,6 +311,14 @@
 <content type="boolean" default="${OCF_RESKEY_evict_outdated_slaves_default}" />
 </parameter>
 
+<parameter name="state" unique="1">
+<longdesc lang="en">
+Location to store the mysql replication state in.
+</longdesc>
+<shortdesc lang="en">State file</shortdesc>
+<content type="string" default="${OCF_RESKEY_state_default}" />
+</parameter>
+
 </parameters>
 
 <actions>
@@ -387,13 +398,16 @@
     return 1
 }
 
-check_slave() {
-    # Checks slave status
-    local rc
-    local tmpfile
+parse_slave_info() {
+    # Extracts field $1 from result of "SHOW SLAVE STATUS\G" from file $2
+    sed -ne "s/^.* $1: \(.*\)$/\1/p" < $2
+}
+
+get_slave_info() {
+    # Warning: this sets $tmpfile and LEAVE this file! You must delete it after use!
+
     local mysql_options
 
-    rc=1
     tmpfile=`mktemp ${HA_RSCTMP}/check_slave.${OCF_RESOURCE_INSTANCE}.XXXXXX`
 
     mysql_options="$MYSQL_OPTIONS_LOCAL --user=$OCF_RESKEY_replication_user --password=$OCF_RESKEY_replication_passwd"
@@ -401,23 +415,36 @@
     $MYSQL $mysql_options \
         -e 'SHOW SLAVE STATUS\G' > $tmpfile
 
-    local master_host
-    local master_user
-    local master_port
-    local slave_sql
-    local slave_io
-    local last_errno
-    local secs_behind
-
     if [ -s $tmpfile ]; then
-       master_host=`sed -ne 's/^.*Master_Host: \(.*\)$/\1/p' < $tmpfile`
-       master_user=`sed -ne 's/^.*Master_User: \(.*\)$/\1/p' < $tmpfile`
-       master_port=`sed -ne 's/^.*Master_Port: \(.*\)$/\1/p' < $tmpfile`
-       slave_sql=`sed -ne 's/^.*Slave_SQL_Running: \(.*\)$/\1/p' < $tmpfile`
-       slave_io=`sed -ne 's/^.*Slave_IO_Running: \(.*\)$/\1/p' < $tmpfile`
-       last_errno=`sed -ne 's/^.*Last_Errno: \(.*\)$/\1/p' < $tmpfile`
-       secs_behind=`sed -ne 's/^.*Seconds_Behind_Master: \(.*\)$/\1/p' < $tmpfile`
+       master_host=`parse_slave_info Master_Host $tmpfile`
+       master_user=`parse_slave_info Master_User $tmpfile`
+       master_port=`parse_slave_info Master_Port $tmpfile`
+       master_log_file=`parse_slave_info Master_Log_File $tmpfile`
+       master_log_pos=`parse_slave_info Read_Master_Log_Pos $tmpfile`
+       slave_sql=`parse_slave_info Slave_SQL_Running $tmpfile`
+       slave_io=`parse_slave_info Slave_IO_Running $tmpfile`
+       last_errno=`parse_slave_info Last_Errno $tmpfile`
+       secs_behind=`parse_slave_info Seconds_Behind_Master $tmpfile`
+
+        ocf_log debug "MySQL instance running as a replication slave"
+    else
+        # Instance produced an empty "SHOW SLAVE STATUS" output --
+        # instance is not a slave
+       ocf_log err "check_slave invoked on an instance that is not a replication slave."
+       return $OCF_ERR_GENERIC
+    fi
+
+    return $OCF_SUCCESS
+}
+
+check_slave() {
+    # Checks slave status
+    local rc
 
+    get_slave_info
+    rc=$?
+
+    if [ $rc -eq 0 ]; then
        if [ $last_errno -ne 0 ]; then
            # Whoa. Replication ran into an error. This slave has
            # diverged from its master. Make sure this resource
@@ -476,18 +503,42 @@
 }
 
 set_master() {
+    local new_master_host
+    local master_params
+
+    new_master_host=$1
+
+    # Keep replication position
+    get_slave_info
+
+    if [ "$master_log_file" -a "$new_master_host" = "$master_host" ]; then
+        master_params=", MASTER_LOG_FILE='$master_log_file', \
+                         MASTER_LOG_POS=$master_log_pos"
+        ocf_log debug "Kept master pos for $master_host : $master_log_file:$master_log_pos"
+    elif [ -r "$OCF_RESKEY_state" ]; then
+        master_host=
+        . $OCF_RESKEY_state
+        if [ "$new_master_host" = "$master_host" ]; then
+                master_params=", MASTER_LOG_FILE='$master_log_file', \
+                                 MASTER_LOG_POS=$master_log_pos"
+                 ocf_log debug "Restored master pos for $master_host : $master_log_file:$master_log_pos"
+        fi
+     fi
+
     # Informs the MySQL server of the master to replicate
     # from. Accepts one mandatory argument which must contain the host
     # name of the new master host. The master must either be unchanged
     # from the laste master the slave replicated from, or freshly
     # reset with RESET MASTER.
-    local master_host
-    master_host=$1
 
     ocf_run $MYSQL $MYSQL_OPTIONS_LOCAL $MYSQL_OPTIONS_REPL \
-       -e "CHANGE MASTER TO MASTER_HOST='$master_host', \
+       -e "CHANGE MASTER TO MASTER_HOST='$new_master_host', \
                              MASTER_USER='$OCF_RESKEY_replication_user', \
-                             MASTER_PASSWORD='$OCF_RESKEY_replication_passwd'"
+                             MASTER_PASSWORD='$OCF_RESKEY_replication_passwd' $master_params"
+
+    # Remove state file - it will be invalid after SLAVE START
+    rm -f $OCF_RESKEY_state
+    rm -f $tmpfile
 }
 
 unset_master(){
@@ -537,7 +588,16 @@
        ocf_log err "Error stopping rest slave threads"
        exit $OCF_ERR_GENERIC
     fi
-    
+
+       #Save current state
+       get_slave_info
+       cat <<END > $OCF_RESKEY_state
+master_host="$master_host"
+master_log_file="$master_log_file"
+master_log_pos="$master_log_pos"
+END
+       rm -f $tmpfile
+
     ocf_run $MYSQL $mysql_options \
        -e "CHANGE MASTER TO MASTER_HOST=''" 
     if [ $? -gt 0 ]; then
@@ -805,6 +865,8 @@
     if ( ! mysql_status ); then
        return $OCF_NOT_RUNNING
     fi
+    ocf_run $MYSQL $MYSQL_OPTIONS_LOCAL $MYSQL_OPTIONS_REPL \
+       -e "SLAVE STOP"
     set_read_only off || return $OCF_ERR_GENERIC
 
     # Existing master gets a higher-than-default master preference, so
@@ -863,9 +925,7 @@
            fi
 
            if [ $master_host = `uname -n` ]; then
-               ocf_log info "Resetting MySQL replication configuration on new master $master_host"
-               ocf_run $MYSQL $MYSQL_OPTIONS_LOCAL $MYSQL_OPTIONS_REPL \
-                   -e 'RESET MASTER'
+               ocf_log info "This will be new master"
            else
                ocf_log info "Changing MySQL configuration to replicate from $master_host"
                set_master $master_host


_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
