On 11/03/11 19:05, Raoul Bhatia [IPAX] wrote:
> hi,
>
> disclaimer: it has been some time since i thoroughly studied
> the inner workings of mysql replication, but i think that my
> current knowledge is sufficient to diagnose this problem.
> perhaps i'm lacking skills to find the most decent solution ;)
>
>
> PROBLEM DESCRIPTION:
>
> i use the mysql ra rev 5ffc6ee2fae98adae8f2 from git [2]
> i have my mysql m/s up and running:
>> root@wdb01 ~ # crm status
> ...
>> Online: [ wdb01 wdb02 ]
> ...
>> Master/Slave Set: ms-wdb-mysql
>>     Masters: [ wdb01 ]
>>     Slaves: [ wdb02 ]
>
> my databases are in sync and i truncate the test table:
>> # ssh wdb01c "mysql nagiostest -Be 'truncate table test_table'"
>> # ssh wdb01c "mysql nagiostest -Be 'select * from test_table'"
>> # ssh wdb02c "mysql nagiostest -Be 'select * from test_table'"
>
> inserting some values and query from both nodes to verify that
> everything is in sync:
>> # ssh wdb01c "mysql nagiostest -Be 'insert into test_table values (1),(2)'"
>> # ssh wdb01c "mysql nagiostest -Be 'select * from test_table'"
>> id
>> 1
>> 2
>> # ssh wdb02c "mysql nagiostest -Be 'select * from test_table'"
>> id
>> 1
>> 2
>
> great, now i do a migration that affects ms-wdb-mysql
>> # crm resource migrate wdb-ip0 PT1M
>> Migration will take effect until: 2011-03-11 17:21:50Z
>> WARNING: Creating rsc_location constraint 'cli-standby-wdb-ip0' with a score
>> of -INFINITY for resource wdb-ip0 on wdb01.
>> This will prevent wdb-ip0 from running on wdb01 until the constraint
>> is removed using the 'crm_resource -U' command or manually with cibadmin
>> This will be the case even if wdb01 is the last node in the cluster
>> This message can be disabled with -Q
>
> checking the status:
>> # crm status
> ...
>> Online: [ wdb01 wdb02 ]
> ...
>> Master/Slave Set: ms-wdb-mysql
>>     Masters: [ wdb02 ]
>>     Slaves: [ wdb01 ]
>
> checking the data:
>> # ssh wdb01c "mysql nagiostest -e 'select * from test_table'"
>> id
>> 1
>> 2
>> # ssh wdb02c "mysql nagiostest -e 'select * from test_table'"
>> id
>> 1
>> 2
>
> -> *everything is fine*
> -> insert more values on the new master wdb02:
>> # ssh wdb02c "mysql nagiostest -Be 'insert into test_table values
>> (3),(4),(5)'"
>> # ssh wdb02c "mysql nagiostest -e 'select * from test_table'"
>> id
>> 1
>> 2
>> 3
>> 4
>> 5
>> root@wdb01 ~ # ssh wdb01c "mysql nagiostest -e 'select * from test_table'"
>> id
>> 1
>> 2
>> 3
>> 4
>> 5
>
> -> great!
> -> another migration:
>> root@wdb01 ~ # crm resource migrate wdb-ip0 PT1M
>> Migration will take effect until: 2011-03-11 17:27:13Z
>> WARNING: Creating rsc_location constraint 'cli-standby-wdb-ip0' with a score
>> of -INFINITY for resource wdb-ip0 on wdb02.
>> This will prevent wdb-ip0 from running on wdb02 until the constraint
>> is removed using the 'crm_resource -U' command or manually with cibadmin
>> This will be the case even if wdb02 is the last node in the cluster
>> This message can be disabled with -Q
>
> check status:
>> # crm status
> ...
>> Online: [ wdb01 wdb02 ]
> ...
>>     Masters: [ wdb01 ]
>>     Slaves: [ wdb02 ]
>
> -> fine
> -> insert another value on wdb01 and query:
>> # ssh wdb01c "mysql nagiostest -Be 'insert into test_table values (6)'"
>> # ssh wdb01c "mysql nagiostest -e 'select * from test_table'"
>> id
>> 1
>> 2
>> 3
>> 4
>> 5
>> 6
> -> fine, now query wdb02:
>
>> root@wdb01 ~ # ssh wdb02c "mysql nagiostest -e 'select * from test_table'"
>> id
>> 1
>> 2
>> 6
>
> *error*
>
> things get worse for the next failover, and so on :)
>
>
> MY CONCLUSION:
> the source and some thinking lead me to the following conclusion:
>
> 1. when a new master is promoted, its slave binlogs do not get purged.
>    this is no problem in case of *one* failover, because
>
>    a. the slave will (hopefully) be consistent
>    b. gets promoted to master and
>    c. the old master (with no slave replication logs) will happily start
>       the replication starting with an empty replication log.
>
>
> *but*
> 2. when the roles switch,
>    a. the consistent slave is promoted
>    b. the now new slave will use set_master() from the ra [1]
>    ->
>> ocf_run $MYSQL $MYSQL_OPTIONS_LOCAL $MYSQL_OPTIONS_REPL \
>>     -e "CHANGE MASTER TO MASTER_HOST='$master_host', \
>>         MASTER_USER='$OCF_RESKEY_replication_user', \
>>         MASTER_PASSWORD='$OCF_RESKEY_replication_passwd'"
>
>    c. and will then start the replication, starting with
>       the default RELAY_LOG_FILE and RELAY_LOG_POS=0, as well as
>       the MASTER_LOG_FILE and MASTER_LOG_POS=0 information from
>       master.info/relay-log.info, leading to:
>
>    d. truncate table, insert 1,2 and finally insert 6
>
>
> SOLVING THIS ISSUE?
> i thought about how to solve this problem, and i currently came up with
> the following ideas:
>
> 1. besides using reset master upon failover, also use reset slave to
>    purge any existing relay logs upon changing the master.
>
> 2. maybe it is possible to save and communicate the correct
>    RELAY_LOG_FILE, RELAY_LOG_POS and/or MASTER_LOG_FILE, MASTER_LOG_POS
>    information in the cib?
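Just to make idea 1 concrete: on the node that is demoted to slave, the sequence would roughly look like the following. This is only a sketch; the host, user, password and binlog coordinates are made-up placeholders, and in the RA they would come from the resource parameters and from SHOW MASTER STATUS on the newly promoted node:

    STOP SLAVE;
    RESET SLAVE;  -- throw away the old relay logs and relay-log.info
    CHANGE MASTER TO
        MASTER_HOST='wdb02',                  -- placeholder for $master_host
        MASTER_USER='repl',                   -- placeholder for $OCF_RESKEY_replication_user
        MASTER_PASSWORD='secret',             -- placeholder for $OCF_RESKEY_replication_passwd
        MASTER_LOG_FILE='mysql-bin.000042',   -- placeholder, taken from the new master
        MASTER_LOG_POS=107;                   -- placeholder, taken from the new master
    START SLAVE;

With explicit MASTER_LOG_FILE/MASTER_LOG_POS the slave no longer depends on whatever is left over in master.info/relay-log.info.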
I don't know if this is enough, but my version saves MASTER_LOG_FILE and
MASTER_LOG_POS. See http://marmarek.w.staszic.waw.pl/patches/ha-mysql-ra/
especially:
http://marmarek.w.staszic.waw.pl/patches/ha-mysql-ra/15_mysql-ra-replication-state-for-more-than-two-nodes.patch

--
Best Regards,
Marek Marczykowski          |  gg:2873965  |  RLU #390519
marmarek at staszic waw pl  |  xmpp:marmarek at staszic waw pl
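PS: regarding idea 2 (communicating the position through the cib), a rough sketch of how the coordinates could be published and read back with crm_attribute. The attribute name and the values are only placeholders, and this is not necessarily what the patch above does:

    # on the node that was just promoted, after reading SHOW MASTER STATUS
    crm_attribute --type crm_config --name mysql_repl_info \
        --update "wdb02|mysql-bin.000042|107"

    # on the node that is being reconfigured as slave
    crm_attribute --type crm_config --name mysql_repl_info --query --quiet

The slave side can then split the value and pass MASTER_LOG_FILE/MASTER_LOG_POS to CHANGE MASTER TO instead of relying on master.info.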