Re: Maser-Slave replication is playing up

nixofortune Fri, 20 Apr 2012 05:19:19 -0700

Hi Shawn,
Great! This is most likely what happened.
I saw yesterday when the drama developed, that mysql does huge amount of
select queries, but couldn't identified the source. the was no long running
threads, no slow logs entries as well.
Now I understand what happened.


This is a table definition:
+-------------------------+--------------+------+-----+---------+-------+
| Field                   | Type         | Null | Key | Default | Extra |
+-------------------------+--------------+------+-----+---------+-------+
| recipient_id            | int(11)      | NO   |     | NULL    |       |
| newsletter_type_id      | int(11)      | NO   |     | NULL    |       |
| recipient_newsletter_id | int(11)      | NO   | MUL | NULL    |       |
| recipient_salutation    | varchar(16)  | YES  |     | NULL    |       |
| recipient_role          | varchar(124) | NO   |     | NULL    |       |
| recipient_userhash      | varchar(32)  | NO   |     | NULL    |       |
+-------------------------+--------------+------+-----+---------+-------+
No PRI key!

Now. the next step will be to Alter the table and create PRI key
on recipient_id luckily all ID's are distinct there.
ALTER TABLE tb_name add PRIMARY KEY (recipient_id);
That should fix it.
Many thanks,
Egor






On 20 April 2012 12:51, shawn green <shawn.l.gr...@oracle.com> wrote:

> Hello Egor,
>
>
> On 4/20/2012 7:17 AM, nixofortune wrote:
>
>> Hi guys,
>> I'm am experiencing strange behaviour with the replication.
>> Our replication Setup looks like this:
>>
>>                        Master1(5.1.46-log)
>> =================Master2(5.1.**46-log)
>>                        /                  |                \
>> Slave1(5.1.46-log) Slave2(5.1.52) Slave3(5.1.52)
>>
>>
>>
>> Yesterday I noticed that 2 of the slaves start lagging behind the master.
>> The load on the slaves Machines was about 1.0, top = mysqld 100%
>>
>> SHOW SLAVE STATUS:
>> Slave_IO_Running: Yes
>> Slave_SQL_Running: Yes
>> Relay_Log_Pos: 670375858  without any progress for 4 hours
>> Exec_Master_Log_Pos: without any progress for 4 hours
>> Seconds_Behind_Master: steady growing
>> No Errors.
>>
>> Relay log at the Relay log position was something like this:
>>
>> # at 670375858
>> #120419  6:22:57 server id 5  end_log_pos 670375922  Query
>> thread_id=48477609 exec_time=8 error_code=0
>> SET TIMESTAMP=1334830977/*!*/;
>> SET @@session.auto_increment_**increment=10,
>> @@session.auto_increment_**offset=5/*!*/;
>> BEGIN
>> /*!*/;
>> # at 670375922
>> # at 670376015
>> # at 670377033
>> # at 670378042
>> # at 670379050
>> # at 670380055
>> ......
>> ......
>> # at 678710787
>> # at 678711799
>> (8,257 rows like that..)
>>
>>
> This is a representation of the replication stream using ROW formatting.
>
>
>  #120419  6:22:57 server id 5  end_log_pos 670376015  Table_map:
>> `fw4`.`newsletter_campaigns_**recipients_type_regular` mapped to number
>> 10591074
>> #120419  6:22:57 server id 5  end_log_pos 670377033  Delete_rows: table id
>> 10591074
>> #120419  6:22:57 server id 5  end_log_pos 670378042  Delete_rows: table id
>> 10591074
>> #120419  6:22:57 server id 5  end_log_pos 670379050  Delete_rows: table id
>> 10591074
>> ................
>> ................
>> #120419  6:22:57 server id 5  end_log_pos 678711799  Delete_rows: table id
>> 10591074
>> #120419  6:22:57 server id 5  end_log_pos 678712260  Delete_rows: table id
>> 10591074 flags: STMT_END_F
>> (8,257 rows like that..)
>>
>>
> Those are descriptions of the actual ROW events being executed. It appears
> you are trying to delete 8257 rows from the table 10591074 which has been
> mapped to the name `fw4`.`newsletter_campaigns_**recipients_type_regular`
>
>
>  BINLOG '
>> geePTxMFAAAAXQAAAE8g9ScAAGKboQ**AAAAEAA2Z3NAAsbmV3c2xldHRlcl9j**
>> YW1wYWlnbnNfcmVj
>> aXBpZW50c190eXBlX3JlZ3VsYXIABg**MDAw8PDwYQAHwAIAAI
>> geePTxkFAAAA+**gMAAEkk9ScAAGKboQAAAAAABv/**AlWdgIAEAAABnNAUAA01ycwRGVzI0I
>> **DkwODcx
>> ZGIxMDEwOTUyNWM3NTJmODYwYjY4Ym**NkMjdlwItnYCABAAAAZzQFAAJEcgRG**
>> VzI0IDYxOTQxZTMw
>> ZGU2MDNlYWRmOTBmZTMyMGZiODA5OG**NhwIFnYCABAAAAZzQFAAJNcwRGVzI0**
>> IGMxMTIzYWZlNThh
>> NDczMmQ0ODE1Yzk3ZDUwNmEyMTdhwH**dnYCABAAAAZzQFAAJNcwRGVzI0IGQ5**
>> YmY3YTJjZDAyNzNl
>> M2Y0MmNmYzI3MDliOTJmNzc0wG1nYC**ABAAAAZzQFAAJNcgRGVzI0IGJmZjY4**
>> OTlhNjUwZjdlYmE3
>> NjY2YzZjMjkyNzJkZGIzwGNnYCABAA**AAZzQFAAJNcwlFeGVjdXRpdmUgMTE0**
>> ZDlmNWY4ZDU2ZGIy
>> ZmJiNmRiMWY0OTExZmZkNjTAWWdgIA**EAAABnNAUAAkRyBEZXMjQgMGMxMGMz**
>> MTA3Y2MxMGJhNmE1
>> ZjhkYzlmMGI1NTM1MDHAT2dgIAEAAA**BnNAUAAk1zBEZXMjQgNzBkOGFjNmE5**
>> MmU3ZDE3MDc5OTEx
>> NmVmOTE3OTg2OTHARWdgIAEAAABnNA**UAAkRyCUV4ZWN1dGl2ZSA4ZjlmMmZi**
>> Mzc3ZWYwOTFjZDc0
>> ZWJkNGZmOTdmYzVkZsA7Z2AgAQAAAG**c0BQACRHIERlcyNCA2OTQwMGZhNzUy**
>> NTg5NmM2Mjc4ZDI1
>>
>> (1 686 969: 125Meg Rows of Rubbish like that )
>>
>>
> This is an actual ROW event. It is a base64 encoded representation of the
> binary information that represents both the values of the original row and
> the values that you want that row to become.
>
>
>
>  After 4Hours time the Slaves started processing the Replication logs and
>> gradually catching up with the masters.
>> No Slave errors btw yet.
>> Those deletes haven't been processed on slaves and we have now 500 000
>> rows
>> difference with the masters.
>> I'm trying to understand what's caused this issue and what actually
>> happened here.
>>
> > ...
>
> The most common mistake when using ROW or MIXED is the failure to verify
> that every table you want to replicate has a PRIMARY KEY on it. This is a
> mistake because when a ROW event (such as the one documented above) is sent
> to the slave and neither the master's copy nor the slave's copy of the
> table has a PRIMARY KEY on the table, there is no way to easily identify
> which unique row you want replication to change.
>
> Replication solves this problem by scanning your entire table and choosing
> an appropriately matching row to the 'before' image embedded in the ROW
> event.  Multiply the number of rows you are trying to change by the number
> of rows you need to scan and this can quickly become a process that takes a
> long time to complete. A numerical example would look like this:
>
> * 50000 rows in a table without a PRIMARY KEY
> * You delete 5000 of those on the master.
> * The 5000 deletion events are written to the Binary Log in ROW formatting.
>
> When the slave gets this event, it must process those 5000 row events by
> scanning the full table 5000 times. This means that the slave needs to
> resolve 5000*50000=250000000 (250 million) full-row comparisons in order to
> process this DELETE event.
>
> The lesson here: Always assign a PRIMARY KEY to your tables if you are
> going to use MIXED or ROW values for --binlog-format
>
> --
> Shawn Green
> MySQL Principal Technical Support Engineer
> Oracle USA, Inc. - Hardware and Software, Engineered to Work Together.
> Office: Blountville, TN
>
>
> --
> Shawn Green
> MySQL Principal Technical Support Engineer
> Oracle USA, Inc. - Hardware and Software, Engineered to Work Together.
> Office: Blountville, TN
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:    http://lists.mysql.com/mysql
>
>

Re: Maser-Slave replication is playing up

Reply via email to