Hi all,

I have a replication problem that I can reproduce very easily, but I
cannot come up with a way for others to reproduce it without all of our
tables and binlogs (tens of gigabytes). So I'm simply going to describe
it here and see whether anyone else has experienced anything similar or
has some suggestions.

After thinking about using replication for what seems like forever, I
finally got around to setting it up. Both the master and the slave run
v4.0.10. I started it up and all seemed to work well for a while, maybe
a few hours.

Then I found that a table got corrupted on the slave:

ERROR: 1034  Incorrect key file for table: 'forums_posts_new_0'. Try to
repair it
030215 10:01:12  Slave: error 'Incorrect key file for table:
'forums_posts_new_0'. Try to repair it' on query 'insert into
forums_posts_new_0...
Error running query, slave SQL thread aborted. Fix the problem, and
restart the slave SQL thread with "SLAVE START". We stopped at log
'binlog.003' position 97273308

At this point the slave SQL thread stopped. The IO thread continued.

A couple of days later I noticed the error, repaired the table and
restarted the slave SQL thread. Because the IO thread was so far ahead,
the SQL thread could pump through the queries much faster. Now it takes
only 3-4 minutes before another table gets corrupted.

However, it is not just any table. Tables 'forums_posts_new_0' through
'forums_posts_new_9' hold messages, and out of all our tables, only
these get corrupted.

If I repair the table and then start the slave, it runs for 5-15
minutes until another table is corrupted. Repeat. Repeat. Repeat.

I checked the drives and the file system for errors and found no
problems. The slave machine is also used for data-warehouse and FTS
operations; its databases see lots of disk access and have never shown
errors. I have tried stopping the data-warehouse and FTS operations
while the slave runs, but it makes no difference.

BTW: Sometimes the slave crashes while replicating (and in the
following example, while doing nothing but replication). Here is an
example backtrace:

0x806f53b handle_segfault + 447
0x826ae18 pthread_sighandler + 184
0x8296b07 memcpy + 39
0x823703d _mi_balance_page + 649
0x8236994 _mi_insert + 392
0x82367d2 w_search + 518
0x8236793 w_search + 455
0x8236793 w_search + 455
0x8236793 w_search + 455
0x8236793 w_search + 455
0x8236482 _mi_ck_write_btree + 142
0x82363e9 _mi_ck_write + 65
0x823602f mi_write + 591
0x80c257d write_row__9ha_myisamPc + 101
0x80a17f5 write_record__FP8st_tableP12st_copy_info + 513
0x80a110d mysql_insert__FP3THDP13st_table_listRt4List1Z4ItemRt4List1Zt4List1Z4Item15enum_duplicates + 1129
0x807ad7a mysql_execute_command__Fv + 6598
0x807d226 mysql_parse__FP3THDPcUi + 146
0x80add97 exec_event__15Query_log_eventP17st_relay_log_info + 427
0x80e3faa exec_relay_log_event__FP3THDP17st_relay_log_info + 542
0x80e4aca handle_slave_sql + 602
0x82685cc pthread_start_thread + 220
0x829dd8a thread_start + 4

After the above crash, mysqld restarted and the slave continued to run
for a while without errors. Weird. But only for a while...

-steve-
