Hello Egor,
On 4/20/2012 7:17 AM, nixofortune wrote:
Hi guys,
I'm am experiencing strange behaviour with the replication.
Our replication Setup looks like this:
Master1(5.1.46-log)
=================Master2(5.1.46-log)
/ | \
Slave1(5.1.46-log) Slave2(5.1.52) Slave3(5.1.52)
Yesterday I noticed that 2 of the slaves start lagging behind the master.
The load on the slaves Machines was about 1.0, top = mysqld 100%
SHOW SLAVE STATUS:
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Relay_Log_Pos: 670375858 without any progress for 4 hours
Exec_Master_Log_Pos: without any progress for 4 hours
Seconds_Behind_Master: steady growing
No Errors.
Relay log at the Relay log position was something like this:
# at 670375858
#120419 6:22:57 server id 5 end_log_pos 670375922 Query
thread_id=48477609 exec_time=8 error_code=0
SET TIMESTAMP=1334830977/*!*/;
SET @@session.auto_increment_increment=10,
@@session.auto_increment_offset=5/*!*/;
BEGIN
/*!*/;
# at 670375922
# at 670376015
# at 670377033
# at 670378042
# at 670379050
# at 670380055
......
......
# at 678710787
# at 678711799
(8,257 rows like that..)
This is a representation of the replication stream using ROW formatting.
#120419 6:22:57 server id 5 end_log_pos 670376015 Table_map:
`fw4`.`newsletter_campaigns_recipients_type_regular` mapped to number
10591074
#120419 6:22:57 server id 5 end_log_pos 670377033 Delete_rows: table id
10591074
#120419 6:22:57 server id 5 end_log_pos 670378042 Delete_rows: table id
10591074
#120419 6:22:57 server id 5 end_log_pos 670379050 Delete_rows: table id
10591074
................
................
#120419 6:22:57 server id 5 end_log_pos 678711799 Delete_rows: table id
10591074
#120419 6:22:57 server id 5 end_log_pos 678712260 Delete_rows: table id
10591074 flags: STMT_END_F
(8,257 rows like that..)
Those are descriptions of the actual ROW events being executed. It
appears you are trying to delete 8257 rows from the table 10591074 which
has been mapped to the name
`fw4`.`newsletter_campaigns_recipients_type_regular`
BINLOG '
geePTxMFAAAAXQAAAE8g9ScAAGKboQAAAAEAA2Z3NAAsbmV3c2xldHRlcl9jYW1wYWlnbnNfcmVj
aXBpZW50c190eXBlX3JlZ3VsYXIABgMDAw8PDwYQAHwAIAAI
geePTxkFAAAA+gMAAEkk9ScAAGKboQAAAAAABv/AlWdgIAEAAABnNAUAA01ycwRGVzI0IDkwODcx
ZGIxMDEwOTUyNWM3NTJmODYwYjY4YmNkMjdlwItnYCABAAAAZzQFAAJEcgRGVzI0IDYxOTQxZTMw
ZGU2MDNlYWRmOTBmZTMyMGZiODA5OGNhwIFnYCABAAAAZzQFAAJNcwRGVzI0IGMxMTIzYWZlNThh
NDczMmQ0ODE1Yzk3ZDUwNmEyMTdhwHdnYCABAAAAZzQFAAJNcwRGVzI0IGQ5YmY3YTJjZDAyNzNl
M2Y0MmNmYzI3MDliOTJmNzc0wG1nYCABAAAAZzQFAAJNcgRGVzI0IGJmZjY4OTlhNjUwZjdlYmE3
NjY2YzZjMjkyNzJkZGIzwGNnYCABAAAAZzQFAAJNcwlFeGVjdXRpdmUgMTE0ZDlmNWY4ZDU2ZGIy
ZmJiNmRiMWY0OTExZmZkNjTAWWdgIAEAAABnNAUAAkRyBEZXMjQgMGMxMGMzMTA3Y2MxMGJhNmE1
ZjhkYzlmMGI1NTM1MDHAT2dgIAEAAABnNAUAAk1zBEZXMjQgNzBkOGFjNmE5MmU3ZDE3MDc5OTEx
NmVmOTE3OTg2OTHARWdgIAEAAABnNAUAAkRyCUV4ZWN1dGl2ZSA4ZjlmMmZiMzc3ZWYwOTFjZDc0
ZWJkNGZmOTdmYzVkZsA7Z2AgAQAAAGc0BQACRHIERlcyNCA2OTQwMGZhNzUyNTg5NmM2Mjc4ZDI1
(1 686 969: 125Meg Rows of Rubbish like that )
This is an actual ROW event. It is a base64 encoded representation of
the binary information that represents both the values of the original
row and the values that you want that row to become.
After 4Hours time the Slaves started processing the Replication logs and
gradually catching up with the masters.
No Slave errors btw yet.
Those deletes haven't been processed on slaves and we have now 500 000 rows
difference with the masters.
I'm trying to understand what's caused this issue and what actually
happened here.
> ...
The most common mistake when using ROW or MIXED is the failure to verify
that every table you want to replicate has a PRIMARY KEY on it. This is
a mistake because when a ROW event (such as the one documented above) is
sent to the slave and neither the master's copy nor the slave's copy of
the table has a PRIMARY KEY on the table, there is no way to easily
identify which unique row you want replication to change.
Replication solves this problem by scanning your entire table and
choosing an appropriately matching row to the 'before' image embedded in
the ROW event. Multiply the number of rows you are trying to change by
the number of rows you need to scan and this can quickly become a
process that takes a long time to complete. A numerical example would
look like this:
* 50000 rows in a table without a PRIMARY KEY
* You delete 5000 of those on the master.
* The 5000 deletion events are written to the Binary Log in ROW formatting.
When the slave gets this event, it must process those 5000 row events by
scanning the full table 5000 times. This means that the slave needs to
resolve 5000*50000=250000000 (250 million) full-row comparisons in order
to process this DELETE event.
The lesson here: Always assign a PRIMARY KEY to your tables if you are
going to use MIXED or ROW values for --binlog-format
--
Shawn Green
MySQL Principal Technical Support Engineer
Oracle USA, Inc. - Hardware and Software, Engineered to Work Together.
Office: Blountville, TN
--
Shawn Green
MySQL Principal Technical Support Engineer
Oracle USA, Inc. - Hardware and Software, Engineered to Work Together.
Office: Blountville, TN
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql