RE: Problem with replication and corrupting tables

2003-02-19 Thread Steven Roussey
Quick question: Are the binlog and relaylog files the same format?
Initial tests seem to indicate that they are the same. Can I use

mysqlbinlog -o Relay_Log_Pos Relay_Log_File | mysql

to get the slave more up to date (without having the slave SQL thread
running)? I tried the above but the Relay_Log_Pos from 'show slave
status' seemed way past the end of the file as it returned no results.
:(

How do I get a proper offset from which to start?

Being able to do this would isolate the issue squarely at the slave SQL
thread if the above had no issues.

Also, I uploaded a small trace file that shows the corruption. It is the
smallest I was able to make last night (about 72MB -- 6MB gzipped). It
is in the secret folder. Hopefully it will help.

-steve-



-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




RE: Problem with replication and corrupting tables

2003-02-19 Thread Steven Roussey
Hi,

mysqlbinlog -j Relay_Log_Pos Relay_Log_File | mysql

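works fine. I used -o instead of -j before: -o (--offset) skips that many
log entries, while -j (--position) starts reading at a byte position,
which is what Relay_Log_Pos holds. So I answered my last question. When
doing this: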

mysqlbinlog -j Relay_Log_Pos Relay_Log_File | more

I see that it had advanced to the query after the one with the problem
in the trace file. In fact, the query succeeded and was there after a
REPAIR TABLE .. USE_FRM.
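
For anyone repeating this, here is a minimal sketch of building that
pipeline from the two values reported by 'show slave status'. The helper
name and the relay log file name in the usage line are made up for
illustration; only the -j option and the "| mysql" pipe come from the
commands above.

```python
import shlex


def replay_command(relay_log_file, relay_log_pos):
    """Build the shell pipeline that replays a relay log from a byte position.

    relay_log_file and relay_log_pos are the Relay_Log_File and
    Relay_Log_Pos values from 'show slave status'.
    """
    if relay_log_pos < 0:
        raise ValueError("position must be non-negative")
    # -j takes a byte position in the log file (not an entry count like -o).
    dump = ["mysqlbinlog", "-j", str(relay_log_pos), relay_log_file]
    return " ".join(shlex.quote(arg) for arg in dump) + " | mysql"


# Hypothetical file name, for illustration only.
print(replay_command("slave-relay-bin.002", 4))
```

Swapping "| mysql" for "| more" gives the read-only variant used above to
inspect the next queries without applying them.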

Now that I got the above to work, I ran it.

And I found a surprising result (to me): It still failed.

So the problem is not with the replication code per se.

So maybe I can make a test case.







RE: Problem with replication and corrupting tables

2003-02-19 Thread Steven Roussey
Hi,

And fixed.

Sorry for the waste of time. The disk the database is on is going bad,
only 4 days before I was set to replace it. :(

-steve-







RE: Problem with replication and corrupting tables

2003-02-18 Thread Steven Roussey
An update. I'm now running the debug version on the slave. I could not trace
out 'info' since it wrote way too much to the trace file.

What I did find that was unique when the table crashed is this:

handle_slave_sql: query: insert into forums_posts_new_3 ( 
w_search: error: Got errno: 0 from key_cache_read
mi_write: error: Got error: 126 on write
my_message_sql: error: Message: 'Incorrect key file for table:
'forums_posts_new_3'. Try to repair it'
sql_print_error: error: Slave: error 'Incorrect key file for table:
'forums_posts_new_3'. Try to repair it' on query 'insert into
forums_posts_new_3 

I don't know why it had a problem with error zero from key_cache_read --
that seems to be the oddest thing in the log. It appears nowhere else.

I'll keep digging. Does no one else have a problem with a slave stopping and
corrupting its own tables?
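
Since the full trace is far too large to read by eye, here is a rough
sketch of pulling out only the error-tagged lines. It assumes every trace
line has the "function: keyword: message" shape seen in the excerpt above;
the function name is invented for this example.

```python
def error_lines(trace_lines):
    """Return (function, message) pairs for lines tagged with 'error'."""
    hits = []
    for line in trace_lines:
        # Split into at most three parts: function, keyword, message.
        parts = line.rstrip("\n").split(": ", 2)
        if len(parts) == 3 and parts[1] == "error":
            hits.append((parts[0], parts[2]))
    return hits


# Sample lines copied from the traces in this thread.
sample = [
    "w_search: enter: page: 8554496",
    "w_search: error: Got errno: 0 from key_cache_read",
    "mi_write: error: Got error: 126 on write",
]

for func, message in error_lines(sample):
    print(func, "->", message)
```

On a real file, passing the open file object works the same way, since it
yields one line at a time.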

The only thing about the insert query that may be seen as odd is that it has
binary data in it. That is, one of the fields looks like this:

'xÚÍTKoã6^P¾÷W~L}ÉÅÕf~K^B^EÚÝ,~\®~Qm7Û\0qÐ`~O#~S~R^HQ¤AÒQÕ_ßoH%V^N=ôP`~Q~Ge~
J3ó=ffKIó¡Ó~AZ~Ò$^O~MiRGm`¥é`9FbGø0­^[´K?Ó~MN~T:m^B^]9à$RòHc­~\^NÄ^T~S^O^S~M
^Fi~X^F^_Ø^Rç^PÒN~QoȤ~J^^ðÕé¿^R)~^ò»Þ¨H^G^^4Õ|èQ^TW^]¾LåC·~@±(~DòRÇèX½»¾¿ú.
ÿûÌ©~[(²Q^[Z^?~Y¨Áw^].@^A~Oa\0;I;~B¨°e²
^^ݶß[~^~Lk¡~@~K^UÝ¡b2~@2ÊoÀ^Wîå­?^E¹~\^Y^O^\z(a
^N^PGyö.3i~Bw~I¢æ~\\\N~NæÐ~_~N4\{®n~R^D~]~F£~DË~EàYel\\Ø~U*#Ô¥Æf`ò²^N¾×ùi^P
w
®~N±Z~_ù¯·B¯ã^DÊ~R¥~H?~Cȶ|X~S`Uùdö~!àú£w^W~I~N§~T~QL/~D~A3Û~P~I®~V%ÿÔ0»õ^mÖ
Yù^?Éü~GxmMJVÓíé0Q`^S~A~Cá~E+ÜgßîP³áÁØ~IÐ_³qq^CÍ}E×@^F^Es0^]:è
èñ¡ð^UÍ~K1^U=~JÍR@ù¿¡{¦ágc7P?^U?^P7áb^BÞÜ^ZÞYá~C~FÅKk~^æ21C~K*­Ø~Y87»^A~U
¥;ø^SL˽ÀÃ~IɶN^WA?×zåÛ^C^NiäܨÆÍ^e~]*úÝwÎM~R_Ò·n¹^A^T±~CJ^_~Vɾj|
^Yø~B~G^Ua^H^T+%7-ÚF/Así!Â0ÑV
^?áÐê~Jö`8^C±¦í^Rf¶^E\~L~N:^EiÄ~O:ê~Ph~O^TCÖD~VÆÑ2P~Kj~]^L·$^Qí^A¯fP~W:þ
÷´Ó¨~Wt^HF l^zÎ^Q¥tís~_|
qGÀíµ\\^D#^H*UN®~D^A2~DL^BÚWU%)T`×g~WæÀç~Y^\±e~T^_^]E~OõC£?Y%vÕAs~_·~QË^XÀC¡
^OB¦^Oç`~T*hÞ^^S^HÞ^_caÚ^[k^QóÓåiCÙ}9t~A¹t^A~J^CcQ¤A^W\4ÅÕ^YÀ~\ì~GKé3ý:^
S¸¢{ç^MQ[Ù^Ay%ü\^G¯ÂS¦^R^S^P~W^TbR͹û~\ÂÎ| E¿Ü`xq^A}åâr~H;^O~[^Z#éÊ8c8e¹|
HÙãe^R^T®ç~~_^[M~WÅ!kp@^^ÜÏ~Ijû²~B^Vµ÷i^ÿ~_$~LŦ^\wnÉ2¥à~F~]Vk*´¡lãå\0ÝÞå~_
^U½3CK1^\Þ¯ß~X~A[^]ßt|NUk~Z58~J­ïßþ^HMTêðpõÿ^G~\AþçRtïc\\Ä×á*ÿ}s^\éáÓ~N®w;º
ÙíöôøÛí-mo^_·_÷t^?÷ëçÕê~[õå^_-:^G^W'

I'm starting to run out of ideas...

-steve-







RE: Problem with replication and corrupting tables

2003-02-18 Thread Steven Roussey
Below is a trace
(--debug=d,enter,exit,info,error,query,general,where:O,/tmp/mysqld.trace)
of the slave thread. This is the best I can do as
far as a bug report. No other queries were running and the slave I/O
thread was idle (I firewalled its connection to the master/rest of the
world).

Without the SQL slave thread all is OK. This server can do any number of
normal operations without error. The IO slave works fine. The SQL slave
normally causes corruption, but has also caused a crash (a backtrace is
in the first message of this thread). The error in this more detailed
log seems different than in the previous log. But both point to the key
cache. Why the SQL slave thread would cause something bad to happen in
the key cache is beyond me. Another day...

Very tired,
-steve-


my_b_seek: enter: pos: 0
my_malloc: exit: ptr: 84dc248
my_malloc: exit: ptr: 84bffd8
my_malloc: exit: ptr: 8525b18
handle_slave_sql: query: insert into forums_posts_new_0 ( forumid,
messageid, parent, title, author, message, approved, email, ip,
rootmessageid,loginid,autorespond,user_id )
values
(32380, 1045077656, 0,
'Faculty experts available to discuss issues involving Korea', 'UM',
'http://www.umich.edu/news/Releases/2003/Feb03/r020703a.html', 'yes',
'', inet_aton('144.118.132.197'), 1045077656,
0,'no','4a119100a6134a6dee9964dc257ea582' )
my_malloc: exit: ptr: 8522f60
set_lock_for_tables: enter: lock_type: 7  for_update: 1
check_access: enter: want_access: 2  master_access: 4294967295
hash_search: exit: found key at 26
my_malloc: exit: ptr: 8512f48
mi_get_status: info: key_file: 302662656  data_file: 1911596088
mi_write: enter: isam: 56  data: 57
_mi_make_key: exit: keynr: 0
w_search: enter: page: 64677888
key_cache_read: enter: file 56, filepos 64677888, length 1024
find_key_block: enter: file 56, filepos 64677888
_mi_bin_search: exit: flag: 1  keypos: 2
w_search: enter: page: 12455936
key_cache_read: enter: file 56, filepos 12455936, length 1024
find_key_block: enter: file 56, filepos 12455936
_mi_bin_search: exit: flag: 1  keypos: 4
w_search: enter: page: 8588288
key_cache_read: enter: file 56, filepos 8588288, length 1024
find_key_block: enter: file 56, filepos 8588288
_mi_bin_search: exit: flag: 1  keypos: 31
w_search: enter: page: 8554496
key_cache_read: enter: file 56, filepos 8554496, length 1024
find_key_block: enter: file 56, filepos 8554496
_mi_bin_search: exit: flag: 1  keypos: 28
_mi_insert: enter: key_pos: bfefc8ae
key_cache_write: enter: file 56, filepos 8554496, length 1024
find_key_block: enter: file 56, filepos 8554496
_mi_make_key: exit: keynr: 1
w_search: enter: page: 118468608
key_cache_read: enter: file 56, filepos 118468608, length 1024
find_key_block: enter: file 56, filepos 118468608
_mi_bin_search: exit: flag: 1  keypos: 1
w_search: enter: page: 7552
key_cache_read: enter: file 56, filepos 7552, length 1024
find_key_block: enter: file 56, filepos 7552
_mi_bin_search: exit: flag: 1  keypos: 23
w_search: enter: page: 71856128
key_cache_read: enter: file 56, filepos 71856128, length 1024
find_key_block: enter: file 56, filepos 71856128
_mi_bin_search: exit: flag: 1  keypos: 11
w_search: enter: page: 71792640
key_cache_read: enter: file 56, filepos 71792640, length 1024
find_key_block: enter: file 56, filepos 71792640
w_search: error: page 71792640 had wrong page length: 26656
w_search: exit: Error: 126
mi_write: error: Got error: 126 on write
print_error: enter: error: 126
my_message_sql: error: Message: 'Incorrect key file for table:
'forums_posts_new_0'. Try to repair it'
thr_unlock: info: updating status:  key_file: 302662656  data_file:
1911596088
flush_key_blocks_int: enter: file: 56  blocks_used: 8647
blocks_changed: 1
send_error: enter: sql_errno: 0  err: Incorrect key file for table:
'forums_posts_new_0'. Try to repair it
close_thread_tables: info: thd-open_tables=0x84f4fc0
mi_extra: enter: function: 2
sql_print_error: error: Slave: error 'Incorrect key file for table:
'forums_posts_new_0'. Try to repair it' on query 'insert into
forums_posts_new_0 ( forumid, messageid, parent, title, author, message,
approved, email, ip, rootmessageid,loginid,autorespond,user_id )
values
(32380, 1045077656, 0,
'Faculty experts available to discuss issues involving Korea', 'UM',
'http://www.umich.edu/news/Releases/2003/Feb03/r020703a.html', 'yes',
'', inet_aton('144.118.132.197'), 1045077656,
0,'no','4a119100a6134a6dee9964dc257ea586' )', error_code=1034
sql_print_error: error: Error running query, slave SQL thread aborted.
Fix the problem, and restart the slave SQL thread with SLAVE START. We
stopped at log 'binlog.004' position 116581764
~THD(): info: freeing host
my_malloc: exit: ptr: 84aa508
hash_init: enter: hash: 84aa9b0  size: 16
my_malloc: exit: ptr: 84c74b8
vio_new: enter: sd=90
my_malloc: 
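
In the trace above, every key_cache_read asks for 1024 bytes, yet the page
at 71792640 reports a length of 26656, which is presumably why w_search
rejects it. A small sketch for spotting such pages in a longer trace
follows. The regular expressions are fitted to the line shapes shown above
and the function name is invented for this example.

```python
import re

# Patterns fitted to the trace lines in this thread (an assumption).
READ = re.compile(r"key_cache_read: enter: file (\d+), filepos (\d+), length (\d+)")
BAD = re.compile(r"w_search: error: page (\d+) had wrong page length: (\d+)")


def find_bad_pages(trace_lines):
    """Return (page, reported_length, block_length) for each rejected page."""
    block_len = {}  # filepos -> length requested from the key cache
    bad = []
    for line in trace_lines:
        m = READ.search(line)
        if m:
            block_len[int(m.group(2))] = int(m.group(3))
            continue
        m = BAD.search(line)
        if m:
            page, reported = int(m.group(1)), int(m.group(2))
            # block_length is None if no read of that page was seen first.
            bad.append((page, reported, block_len.get(page)))
    return bad


# Sample lines copied from the trace above.
sample = [
    "key_cache_read: enter: file 56, filepos 71856128, length 1024",
    "_mi_bin_search: exit: flag: 1  keypos: 11",
    "key_cache_read: enter: file 56, filepos 71792640, length 1024",
    "w_search: error: page 71792640 had wrong page length: 26656",
]

for page, reported, block in find_bad_pages(sample):
    print("page", page, "reports", reported, "bytes, block is", block)
```

A reported length larger than the block that was read points at the bytes
of the index page itself being wrong, rather than at the query.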

Problem with replication and corrupting tables

2003-02-15 Thread Steven Roussey
Hi all,

I have a problem with replication that is very easy for me to reproduce,
but I cannot come up with a way for others to reproduce it without all
our tables and binlogs (tens of gigabytes). So I'm simply going to
describe things here and see if anyone else has experienced anything
similar or might have some suggestions.

After thinking about using replication for what seems like forever, I
finally got around to it. Both the master and the slave are v4.0.10. I
started it up and all seemed to work well for a while, maybe a few
hours.

Then I found that a table got corrupted on the slave:

ERROR: 1034  Incorrect key file for table: 'forums_posts_new_0'. Try to
repair it
030215 10:01:12  Slave: error 'Incorrect key file for table:
'forums_posts_new_0'. Try to repair it' on query 'insert into
forums_posts_new_0...
Error running query, slave SQL thread aborted. Fix the problem, and
restart the slave SQL thread with SLAVE START. We stopped at log
'binlog.003' position 97273308

At this point the slave SQL thread stopped. The IO thread continued.

A couple of days later I noticed the error, repaired the table and
started the slave thread again. With the IO thread so far ahead, the SQL
thread could pump through the queries much faster. Now it only takes 3-4
minutes before another table gets corrupted. 

However, it is not just any table. I have tables 'forums_posts_new_0' to
'forums_posts_new_9' that hold messages. Out of all the tables, only
these get corrupted.

If I repair the table, then start the slave it will work for 5-15
minutes until another table is corrupted. Repeat. Repeat. Repeat.
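
To see which of the ten tables is damaged before each restart, one option
is to check them all in one pass. This is only a sketch: it generates the
CHECK TABLE statements for the table names given above, and the helper
name is invented for this example. Feeding the output to the mysql client
is left to the reader.

```python
def check_statements(prefix="forums_posts_new_", count=10):
    """Return one CHECK TABLE statement per table, prefix0 .. prefix(count-1)."""
    return ["CHECK TABLE `%s%d`;" % (prefix, i) for i in range(count)]


for statement in check_statements():
    print(statement)
```

Any table that does not come back OK can then be repaired before the slave
SQL thread is started again.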

I checked the drives and the file system for errors and found no
problems. The machine that acts as a slave to the master is also used
for data-warehouse and FTS operations, has lots of disk access on its
database and has no errors. I have tried stopping data warehouse and FTS
operations while the slave runs, but it makes no difference. 

BTW: Sometimes the slave crashes when doing replication (and in the
following example, only replication). Example of a backtrace:

0x806f53b handle_segfault + 447
0x826ae18 pthread_sighandler + 184
0x8296b07 memcpy + 39
0x823703d _mi_balance_page + 649
0x8236994 _mi_insert + 392
0x82367d2 w_search + 518
0x8236793 w_search + 455
0x8236793 w_search + 455
0x8236793 w_search + 455
0x8236793 w_search + 455
0x8236482 _mi_ck_write_btree + 142
0x82363e9 _mi_ck_write + 65
0x823602f mi_write + 591
0x80c257d write_row__9ha_myisamPc + 101
0x80a17f5 write_record__FP8st_tableP12st_copy_info + 513
0x80a110d
mysql_insert__FP3THDP13st_table_listRt4List1Z4ItemRt4List1Zt4List1Z4Item
15enum_duplicates + 1129
0x807ad7a mysql_execute_command__Fv + 6598
0x807d226 mysql_parse__FP3THDPcUi + 146
0x80add97 exec_event__15Query_log_eventP17st_relay_log_info + 427
0x80e3faa exec_relay_log_event__FP3THDP17st_relay_log_info + 542
0x80e4aca handle_slave_sql + 602
0x82685cc pthread_start_thread + 220
0x829dd8a thread_start + 4

After the above crash, mysqld restarted and the slave continued to run
for a while without error. Weird. For a while...

-steve-


