RE: Problem with replication and corrupting tables
Quick question: Are the binlog and relaylog files the same format? Initial tests seem to indicate that they are the same.

Can I use

  mysqlbinlog -o Relay_Log_Pos Relay_Log_File | mysql

to get the slave more up to date (without having the slave SQL thread running)? I tried the above but the Relay_Log_Pos from 'show slave status' seemed way past the end of the file as it returned no results. :( How do I get a proper offset from which to start? Being able to do this would isolate the issue squarely at the slave SQL thread if the above had no issues.

Also, I uploaded a small trace file that shows the corruption. It is the smallest I was able to make last night (about 72MB -- 6MB gzipped). It is in the secret folder. Hopefully it will help.

-steve-

- Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
RE: Problem with replication and corrupting tables
Hi,

  mysqlbinlog -j Relay_Log_Pos Relay_Log_File | mysql

works fine. I used -o instead of -j before. So I answered my last question.

When doing this:

  mysqlbinlog -j Relay_Log_Pos Relay_Log_File | more

I see that it had advanced to the query after the one with the problem in the trace file. In fact, the query succeeded and was there after a REPAIR TABLE .. USE_FRM.

Now that I got the above to work, I ran it. And I found a surprising result (to me): It still failed. So the problem is not with the replication code per se.

So maybe I can make a test case
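For anyone wanting to script the replay described above, here is a minimal sketch in Python. It only builds the command line from the two SHOW SLAVE STATUS fields named in this thread; it does not run anything. The sample status text, the relay log file name and the position are made up for illustration. The comment on -o reflects my understanding of mysqlbinlog (it skips a number of log entries rather than seeking to a byte position), so please check that against the manual for your version.

```python
import re

def relay_replay_command(slave_status_text):
    """Build a mysqlbinlog argument list from vertical SHOW SLAVE STATUS output."""
    # Collect "Name: value" pairs, one per line.
    fields = dict(
        re.findall(r"^[ \t]*(\w+):[ \t]*(.*)$", slave_status_text, re.MULTILINE)
    )
    relay_file = fields["Relay_Log_File"].strip()
    relay_pos = int(fields["Relay_Log_Pos"])
    # -j starts reading at a byte position, as used in this thread.
    # -o (my understanding) skips that many entries instead, which would
    # explain why a large byte position returned nothing.
    return ["mysqlbinlog", "-j", str(relay_pos), relay_file]

# Hypothetical excerpt; the file name and position are invented.
SAMPLE_STATUS = """
 Relay_Log_File: slave-relay-bin.002
  Relay_Log_Pos: 4711
"""

print(" ".join(relay_replay_command(SAMPLE_STATUS)))
```

The resulting list can be handed to a process launcher and piped into mysql, with the slave SQL thread stopped, which is the same thing as the one-liner above.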
RE: Problem with replication and corrupting tables
Hi,

And fixed. Sorry for the waste of time. Only 4 days before I was set to replace the disk the database was on, and it is going bad. :(

-steve-
RE: Problem with replication and corrupting tables
An update. I'm now running the debug version on the slave. I could not trace out 'info' since it wrote way too much to the trace file. What I did find that was unique when the table crashed is this:

handle_slave_sql: query: insert into forums_posts_new_3 (
w_search: error: Got errno: 0 from key_cache_read
mi_write: error: Got error: 126 on write
my_message_sql: error: Message: 'Incorrect key file for table: 'forums_posts_new_3'. Try to repair it'
sql_print_error: error: Slave: error 'Incorrect key file for table: 'forums_posts_new_3'. Try to repair it' on query 'insert into forums_posts_new_3

I don't know why it had a problem with error zero from key_cache_read; that seems to be the oddest thing in the log. It appears nowhere else. I'll keep digging.

Does no one else have a problem with a slave stopping and corrupting its own tables?

The only thing about the insert query that may be seen as odd is that it has binary data in it. That is, one of the fields looks like this:

'xÚÍTKoã6^P¾÷W~L}ÉÅÕf~K^B^EÚÝ,~\®~Qm7Û\0qÐ [...] '

I'm starting to run out of ideas...

-steve-
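Since the full trace is too large to read by eye, a small filter helps. The following is a rough sketch in Python that assumes each trace entry has the shape "function: keyword: text", as in the excerpt above, and pairs every 'error' entry with the most recent 'query' entry. The function and variable names are my own, and the sample lines are copied from the excerpt rather than from a complete trace file.

```python
import re

# One trace entry: "function: keyword: text"
TRACE_LINE = re.compile(r"^(?P<func>[~\w()]+): (?P<kind>\w+): (?P<text>.*)$")

def error_entries(trace_lines):
    """Yield (function, text, last_query) for every 'error' entry."""
    last_query = None
    for line in trace_lines:
        m = TRACE_LINE.match(line.strip())
        if not m:
            continue  # not a recognizable trace entry
        if m.group("kind") == "query":
            last_query = m.group("text")
        elif m.group("kind") == "error":
            yield m.group("func"), m.group("text"), last_query

SAMPLE_TRACE = [
    "handle_slave_sql: query: insert into forums_posts_new_3 (",
    "w_search: error: Got errno: 0 from key_cache_read",
    "mi_write: error: Got error: 126 on write",
]

for func, text, query in error_entries(SAMPLE_TRACE):
    print(func, "|", text, "|", query)
```

Run over a real trace, you would pass the open file object instead of SAMPLE_TRACE, so the file is read line by line rather than loaded whole.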
RE: Problem with replication and corrupting tables
Below is a trace (--debug=d,enter,exit,info,error,query,general,where:O,/tmp/mysqld.trace) of the slave thread. This is the best I can do as far as a bug report.

No other queries were running and the slave I/O thread was idle (I firewalled its connection to the master/rest of the world). Without the SQL slave thread all is OK. This server can do any number of normal operations without error. The IO slave works fine. The SQL slave normally causes corruption, but has also caused a crash (a backtrace is in the first message of this thread).

The error in this more detailed log seems different than in the previous log. But both point to the key cache. Why the SQL slave thread would cause something bad to happen in the key cache is beyond me.

Another day...

Very tired,
-steve-

my_b_seek: enter: pos: 0
my_malloc: exit: ptr: 84dc248
my_malloc: exit: ptr: 84bffd8
my_malloc: exit: ptr: 8525b18
handle_slave_sql: query: insert into forums_posts_new_0 ( forumid, messageid, parent, title, author, message, approved, email, ip, rootmessageid,loginid,autorespond,user_id ) values (32380, 1045077656, 0, 'Faculty experts available to discuss issues involving Korea', 'UM', 'http://www.umich.edu/news/Releases/2003/Feb03/r020703a.html', 'yes', '', inet_aton('144.118.132.197'), 1045077656, 0,'no','4a119100a6134a6dee9964dc257ea582' )
my_malloc: exit: ptr: 8522f60
set_lock_for_tables: enter: lock_type: 7 for_update: 1
check_access: enter: want_access: 2 master_access: 4294967295
hash_search: exit: found key at 26
my_malloc: exit: ptr: 8512f48
mi_get_status: info: key_file: 302662656 data_file: 1911596088
mi_write: enter: isam: 56 data: 57
_mi_make_key: exit: keynr: 0
w_search: enter: page: 64677888
key_cache_read: enter: file 56, filepos 64677888, length 1024
find_key_block: enter: file 56, filepos 64677888
_mi_bin_search: exit: flag: 1 keypos: 2
w_search: enter: page: 12455936
key_cache_read: enter: file 56, filepos 12455936, length 1024
find_key_block: enter: file 56, filepos 12455936
_mi_bin_search: exit: flag: 1 keypos: 4
w_search: enter: page: 8588288
key_cache_read: enter: file 56, filepos 8588288, length 1024
find_key_block: enter: file 56, filepos 8588288
_mi_bin_search: exit: flag: 1 keypos: 31
w_search: enter: page: 8554496
key_cache_read: enter: file 56, filepos 8554496, length 1024
find_key_block: enter: file 56, filepos 8554496
_mi_bin_search: exit: flag: 1 keypos: 28
_mi_insert: enter: key_pos: bfefc8ae
key_cache_write: enter: file 56, filepos 8554496, length 1024
find_key_block: enter: file 56, filepos 8554496
_mi_make_key: exit: keynr: 1
w_search: enter: page: 118468608
key_cache_read: enter: file 56, filepos 118468608, length 1024
find_key_block: enter: file 56, filepos 118468608
_mi_bin_search: exit: flag: 1 keypos: 1
w_search: enter: page: 7552
key_cache_read: enter: file 56, filepos 7552, length 1024
find_key_block: enter: file 56, filepos 7552
_mi_bin_search: exit: flag: 1 keypos: 23
w_search: enter: page: 71856128
key_cache_read: enter: file 56, filepos 71856128, length 1024
find_key_block: enter: file 56, filepos 71856128
_mi_bin_search: exit: flag: 1 keypos: 11
w_search: enter: page: 71792640
key_cache_read: enter: file 56, filepos 71792640, length 1024
find_key_block: enter: file 56, filepos 71792640
w_search: error: page 71792640 had wrong page length: 26656
w_search: exit: Error: 126
mi_write: error: Got error: 126 on write
print_error: enter: error: 126
my_message_sql: error: Message: 'Incorrect key file for table: 'forums_posts_new_0'. Try to repair it'
thr_unlock: info: updating status: key_file: 302662656 data_file: 1911596088
flush_key_blocks_int: enter: file: 56 blocks_used: 8647 blocks_changed: 1
send_error: enter: sql_errno: 0 err: Incorrect key file for table: 'forums_posts_new_0'. Try to repair it
close_thread_tables: info: thd->open_tables=0x84f4fc0
mi_extra: enter: function: 2
sql_print_error: error: Slave: error 'Incorrect key file for table: 'forums_posts_new_0'. Try to repair it' on query 'insert into forums_posts_new_0 ( forumid, messageid, parent, title, author, message, approved, email, ip, rootmessageid,loginid,autorespond,user_id ) values (32380, 1045077656, 0, 'Faculty experts available to discuss issues involving Korea', 'UM', 'http://www.umich.edu/news/Releases/2003/Feb03/r020703a.html', 'yes', '', inet_aton('144.118.132.197'), 1045077656, 0,'no','4a119100a6134a6dee9964dc257ea586' )', error_code=1034
sql_print_error: error: Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with SLAVE START. We stopped at log 'binlog.004' position 116581764
~THD(): info: freeing host
my_malloc: exit: ptr: 84aa508
hash_init: enter: hash: 84aa9b0 size: 16
my_malloc: exit: ptr: 84c74b8
vio_new: enter: sd=90
my_malloc:
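To make the "wrong page length: 26656" line easier to interpret, here is a small Python sketch of the sanity check that I believe produces it. The layout is an assumption on my part, not something taken from this trace: the first two bytes of a MyISAM key page are read high byte first, the top bit marks a node page, and the low 15 bits hold the used length, which should lie between 4 and the key block length (1024 in the key_cache_read lines above). The function names are mine.

```python
import struct

def decode_keypage_header(first_two_bytes):
    """Split the 2-byte page header into (is_node, used_length).

    Assumed layout: big-endian 16-bit value, top bit = node flag,
    low 15 bits = number of bytes used in the page.
    """
    (raw,) = struct.unpack(">H", first_two_bytes)
    return bool(raw & 0x8000), raw & 0x7FFF

def page_length_ok(used_length, block_length=1024):
    """A page cannot use more bytes than the block that holds it."""
    return 4 <= used_length <= block_length

# 26656 is the length reported in the trace; as two bytes it is 0x68 0x20.
is_node, used = decode_keypage_header(b"\x68\x20")
print(is_node, used, page_length_ok(used))
```

Under those assumptions, a length of 26656 in a 1024-byte block fails the check, which matches the error 126 that follows in the trace. It says nothing about how the bad bytes got into the page.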
Problem with replication and corrupting tables
Hi all,

I have a problem with replication that, while very easy for me to repeat, I cannot find a way for others to repeat without all our tables and binlogs (tens of gigabytes). So I'm simply going to describe things here and see if anyone else has experienced anything similar or might have some suggestions.

After thinking about using replication for what seems like forever, I finally got around to it. Both the master and the slave are v4.0.10. I started it up and all seemed to work well for a while. Maybe a few hours. Then I found that a table got corrupted on the slave:

ERROR: 1034 Incorrect key file for table: 'forums_posts_new_0'. Try to repair it
030215 10:01:12 Slave: error 'Incorrect key file for table: 'forums_posts_new_0'. Try to repair it' on query 'insert into forums_posts_new_0...
Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with SLAVE START. We stopped at log 'binlog.003' position 97273308

At this point the slave SQL thread stopped. The IO thread continued. A couple of days later I noticed the error, repaired the table and started the slave thread again. With the IO thread so far ahead, the SQL thread could pump through the queries much faster. Now it only takes 3-4 minutes before another table gets corrupted.

However, it is not just any table. I have tables 'forums_posts_new_0' to 'forums_posts_new_9' that hold messages. Out of all the tables, only these get corrupted. If I repair the table, then start the slave it will work for 5-15 minutes until another table is corrupted. Repeat. Repeat. Repeat.

I checked the drives and the file system for errors and found no problems. The machine that acts as a slave to the master is also used for data-warehouse and FTS operations, has lots of disk access on its database and has no errors. I have tried stopping data warehouse and FTS operations while the slave runs, but it makes no difference.
BTW: Sometimes the slave crashes when doing replication (and in the following example, only replication). Example of a backtrace:

0x806f53b handle_segfault + 447
0x826ae18 pthread_sighandler + 184
0x8296b07 memcpy + 39
0x823703d _mi_balance_page + 649
0x8236994 _mi_insert + 392
0x82367d2 w_search + 518
0x8236793 w_search + 455
0x8236793 w_search + 455
0x8236793 w_search + 455
0x8236793 w_search + 455
0x8236482 _mi_ck_write_btree + 142
0x82363e9 _mi_ck_write + 65
0x823602f mi_write + 591
0x80c257d write_row__9ha_myisamPc + 101
0x80a17f5 write_record__FP8st_tableP12st_copy_info + 513
0x80a110d mysql_insert__FP3THDP13st_table_listRt4List1Z4ItemRt4List1Zt4List1Z4Item15enum_duplicates + 1129
0x807ad7a mysql_execute_command__Fv + 6598
0x807d226 mysql_parse__FP3THDPcUi + 146
0x80add97 exec_event__15Query_log_eventP17st_relay_log_info + 427
0x80e3faa exec_relay_log_event__FP3THDP17st_relay_log_info + 542
0x80e4aca handle_slave_sql + 602
0x82685cc pthread_start_thread + 220
0x829dd8a thread_start + 4

After the above crash, mysqld restarted and the slave continued to run for a while without error. Weird. For a while...

-steve-
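Because the stopped SQL thread went unnoticed for a couple of days, it may be worth scanning the slave's error log for the abort message. The Python sketch below pulls the binlog name and position out of the "We stopped at log ..." text quoted earlier in this post. The regular expression is based only on that one message, so treat the exact wording as an assumption for other versions; the function name is mine.

```python
import re

# Matches the tail of the "slave SQL thread aborted" message quoted above.
STOP_RE = re.compile(r"We stopped at log '(?P<log>[^']+)' position (?P<pos>\d+)")

def stop_position(error_log_text):
    """Return (binlog_name, position) from the abort message, or None."""
    m = STOP_RE.search(error_log_text)
    if m is None:
        return None
    return m.group("log"), int(m.group("pos"))

SAMPLE_LOG = (
    "Error running query, slave SQL thread aborted. Fix the problem, and "
    "restart the slave SQL thread with SLAVE START. "
    "We stopped at log 'binlog.003' position 97273308"
)

print(stop_position(SAMPLE_LOG))
```

A cron job could run this over the tail of the error log and send mail whenever it returns something other than None.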