Hi guys, I'm am experiencing strange behaviour with the replication. Our replication Setup looks like this:
Master1(5.1.46-log) =================Master2(5.1.46-log) / | \ Slave1(5.1.46-log) Slave2(5.1.52) Slave3(5.1.52) Yesterday I noticed that 2 of the slaves start lagging behind the master. The load on the slaves Machines was about 1.0, top = mysqld 100% SHOW SLAVE STATUS: Slave_IO_Running: Yes Slave_SQL_Running: Yes Relay_Log_Pos: 670375858 without any progress for 4 hours Exec_Master_Log_Pos: without any progress for 4 hours Seconds_Behind_Master: steady growing No Errors. Relay log at the Relay log position was something like this: # at 670375858 #120419 6:22:57 server id 5 end_log_pos 670375922 Query thread_id=48477609 exec_time=8 error_code=0 SET TIMESTAMP=1334830977/*!*/; SET @@session.auto_increment_increment=10, @@session.auto_increment_offset=5/*!*/; BEGIN /*!*/; # at 670375922 # at 670376015 # at 670377033 # at 670378042 # at 670379050 # at 670380055 ...... ...... # at 678710787 # at 678711799 (8,257 rows like that..) #120419 6:22:57 server id 5 end_log_pos 670376015 Table_map: `fw4`.`newsletter_campaigns_recipients_type_regular` mapped to number 10591074 #120419 6:22:57 server id 5 end_log_pos 670377033 Delete_rows: table id 10591074 #120419 6:22:57 server id 5 end_log_pos 670378042 Delete_rows: table id 10591074 #120419 6:22:57 server id 5 end_log_pos 670379050 Delete_rows: table id 10591074 ................ ................ #120419 6:22:57 server id 5 end_log_pos 678711799 Delete_rows: table id 10591074 #120419 6:22:57 server id 5 end_log_pos 678712260 Delete_rows: table id 10591074 flags: STMT_END_F (8,257 rows like that..) BINLOG ' geePTxMFAAAAXQAAAE8g9ScAAGKboQAAAAEAA2Z3NAAsbmV3c2xldHRlcl9jYW1wYWlnbnNfcmVj aXBpZW50c190eXBlX3JlZ3VsYXIABgMDAw8PDwYQAHwAIAAI geePTxkFAAAA+gMAAEkk9ScAAGKboQAAAAAABv/AlWdgIAEAAABnNAUAA01ycwRGVzI0IDkwODcx ZGIxMDEwOTUyNWM3NTJmODYwYjY4YmNkMjdlwItnYCABAAAAZzQFAAJEcgRGVzI0IDYxOTQxZTMw ZGU2MDNlYWRmOTBmZTMyMGZiODA5OGNhwIFnYCABAAAAZzQFAAJNcwRGVzI0IGMxMTIzYWZlNThh NDczMmQ0ODE1Yzk3ZDUwNmEyMTdhwHdnYCABAAAAZzQFAAJNcwRGVzI0IGQ5YmY3YTJjZDAyNzNl M2Y0MmNmYzI3MDliOTJmNzc0wG1nYCABAAAAZzQFAAJNcgRGVzI0IGJmZjY4OTlhNjUwZjdlYmE3 NjY2YzZjMjkyNzJkZGIzwGNnYCABAAAAZzQFAAJNcwlFeGVjdXRpdmUgMTE0ZDlmNWY4ZDU2ZGIy ZmJiNmRiMWY0OTExZmZkNjTAWWdgIAEAAABnNAUAAkRyBEZXMjQgMGMxMGMzMTA3Y2MxMGJhNmE1 ZjhkYzlmMGI1NTM1MDHAT2dgIAEAAABnNAUAAk1zBEZXMjQgNzBkOGFjNmE5MmU3ZDE3MDc5OTEx NmVmOTE3OTg2OTHARWdgIAEAAABnNAUAAkRyCUV4ZWN1dGl2ZSA4ZjlmMmZiMzc3ZWYwOTFjZDc0 ZWJkNGZmOTdmYzVkZsA7Z2AgAQAAAGc0BQACRHIERlcyNCA2OTQwMGZhNzUyNTg5NmM2Mjc4ZDI1 (1 686 969: 125Meg Rows of Rubbish like that ) After 4Hours time the Slaves started processing the Replication logs and gradually catching up with the masters. No Slave errors btw yet. Those deletes haven't been processed on slaves and we have now 500 000 rows difference with the masters. I'm trying to understand what's caused this issue and what actually happened here. So far I found : binlog_format different: Masters (MIXED) Slaves (STATEMENT) sync_binlog different on one of the slaves: 1024 against 0 on the rest. Does any of you experienced similar problems? I would really appreciate any suggestions on the issue. Many thanks, Egor