I have a setup with a master and a bunch of slaves on my LAN, as well as one external slave that runs on a Xen server on the internet. All servers run Debian Linux with MySQL version 5.0.32. The binlogs amount to around 2 GB per day. I have no trouble at all with my local slaves, but the external one hangs once every two days. As this server has no other problems (crashing programs, kernel panics, corrupted files or the like), I am pretty sure that the hardware is OK.
The slave's log:

Apr 15 06:39:19 db-extern mysqld[24884]: 080415 6:39:19 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
Apr 15 06:39:19 db-extern mysqld[24884]: 080415 6:39:19 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.045709' position 7334981
Apr 15 06:39:19 db-extern mysqld[24884]: 080415 6:39:19 [Note] Slave: connected to master '[EMAIL PROTECTED]:1234',replication resumed in log 'mysql-bin.045709' at position 7334981
Apr 15 06:39:20 db-extern mysqld[24884]: 080415 6:39:20 [ERROR] Error in Log_event::read_log_event(): 'Event too big', data_len: 503316507, event_type: 16
Apr 15 06:39:20 db-extern mysqld[24884]: 080415 6:39:20 [ERROR] Error reading relay log event: slave SQL thread aborted because of I/O error
Apr 15 06:39:20 db-extern mysqld[24884]: 080415 6:39:20 [ERROR] Slave: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave. Error_code: 0
Apr 15 06:39:20 db-extern mysqld[24884]: 080415 6:39:20 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.045709' position 172
Apr 15 06:40:01 db-extern mysqld[24884]: 080415 6:40:01 [Note] Slave I/O thread killed while reading event
Apr 15 06:40:01 db-extern mysqld[24884]: 080415 6:40:01 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.045709', position 23801854
Apr 15 06:40:01 db-extern mysqld[24884]: 080415 6:40:01 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.045709' at position 172, relay log './db-extern-relay-bin.000001' position: 4
Apr 15 06:40:01 db-extern mysqld[24884]: 080415 6:40:01 [Note] Slave I/O thread: connected to master '[EMAIL PROTECTED]:1234', replication started in log 'mysql-bin.045709' at position 172
Apr 15 06:40:01 db-extern mysqld[24884]: 080415 6:40:01 [ERROR] Error reading packet from server: error reading log entry ( server_errno=1236)
Apr 15 06:40:01 db-extern mysqld[24884]: 080415 6:40:01 [ERROR] Got fatal error 1236: 'error reading log entry' from master when reading data from binary log
Apr 15 06:40:01 db-extern mysqld[24884]: 080415 6:40:01 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.045709', position 172

A plain "slave start;" doesn't help. Neither does

  slave stop; reset slave; change master to master_log_file="mysql-bin.045709", master_log_pos=172; slave start;

The only way to get this up and running again is to do a

  change master to master_log_file="mysql-bin.045709", master_log_pos=0

and use sql_slave_skip_counter whenever I get duplicate key errors. This sucks. When this problem occurs, the reported log position is always a small number, I would say less than 500.
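The manual skip step can be scripted. What follows is a minimal bash sketch, not a tested tool: it assumes the `mysql` client can log in without prompting (for example via `~/.my.cnf`), and the function names `slave_field` and `skip_duplicate_key_errors` are hypothetical. It only skips while the SQL thread is stopped with error 1062 (duplicate entry), and it caps the number of skips so a runaway loop cannot silently discard a whole binlog.

```bash
# Hypothetical helpers; assumes `mysql` logs in without prompting (e.g. ~/.my.cnf).
MYSQL=${MYSQL:-mysql}        # client command, overridable for testing
SKIP_SLEEP=${SKIP_SLEEP:-1}  # seconds to wait after each START SLAVE

# slave_field NAME: print the value of one field from SHOW SLAVE STATUS\G.
slave_field() {
  $MYSQL -e 'SHOW SLAVE STATUS\G' |
    awk -v f="$1" '{ sub(/^[ \t]+/, "") }
      index($0, f ": ") == 1 { print substr($0, length(f) + 3); exit }'
}

# skip_duplicate_key_errors [MAX]: while the SQL thread is stopped on
# error 1062 (duplicate entry), skip one event and restart the slave.
# Prints the number of events skipped; gives up after MAX skips (default 1000).
skip_duplicate_key_errors() {
  local max=${1:-1000} n=0
  while [ "$n" -lt "$max" ] &&
        [ "$(slave_field Slave_SQL_Running)" = "No" ] &&
        [ "$(slave_field Last_Errno)" = "1062" ]; do
    $MYSQL -e 'SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1; START SLAVE;'
    n=$((n + 1))
    sleep "$SKIP_SLEEP"  # let the SQL thread run into the next error, if any
  done
  echo "$n"
}
```

For example, after the `change master to ... master_log_pos=0` step, `skip_duplicate_key_errors 500` would print how many events it skipped. Each skipped event is a change the slave never applies, so it is worth keeping the cap low.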
I also get connection errors in the log from time to time, but replication recovers by itself:

Apr 14 22:27:17 db-extern mysqld[24884]: 080414 22:27:17 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
Apr 14 22:27:17 db-extern mysqld[24884]: 080414 22:27:17 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.045705' position 34671615
Apr 14 22:27:17 db-extern mysqld[24884]: 080414 22:27:17 [Note] Slave: connected to master '[EMAIL PROTECTED]:1234',replication resumed in log 'mysql-bin.045705' at position 34671615

Sometimes I get:

Apr 13 23:22:04 db-extern mysqld[24884]: 080413 23:22:04 [ERROR] Slave: Error 'You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '^\' at line 1' on query.
Apr 13 23:22:04 db-extern mysqld[24884]: 080413 23:22:04 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.045699' position 294101453

But in this case

  slave stop; reset slave; change master to master_log_file="mysql-bin.045699", master_log_pos=294101453; slave start;

helps!

On the master:

master# mysqlbinlog --position=172 mysql-bin.045709
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
ERROR: Error in Log_event::read_log_event(): 'read error', data_len: 543519343, event_type: 116
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET [EMAIL PROTECTED]/;

So the position "172" seems to be wrong?

master# mysqlbinlog mysql-bin.045709 >/dev/null
master#

The binlog on the master is OK (as I said, all other slaves replicate without any problems...). Any suggestions?
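One way to test the suspicion that 172 is not an event boundary: `mysqlbinlog` prints a `# at <position>` comment before every event, so its full text output lists every valid start offset in the file. A minimal bash sketch under that assumption (the function names `event_starts_around` and `is_event_boundary` are hypothetical):

```bash
# event_starts_around POS: reads mysqlbinlog text output on stdin and prints
# the last event start at or before POS and the first event start after POS.
event_starts_around() {
  awk -v p="$1" '
    /^# at [0-9]+$/ {
      pos = $3 + 0
      if (pos <= p) before = pos
      else if (after == "") after = pos
    }
    END { print before, after }'
}

# is_event_boundary POS: succeeds only if an event starts exactly at POS.
is_event_boundary() {
  grep -qx "# at $1"
}
```

On the master, `mysqlbinlog mysql-bin.045709 | event_starts_around 172` would print the nearest event start at or before 172 and the next one after it. If the first number is not 172, the slave was asked to resume in the middle of an event.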
I now have cron jobs running that read the output of "show slave status" and, if necessary, run queries like the one above:

  slave stop; reset slave; change master to master_log_file="mysql-bin.045699", master_log_pos=294101453;

Every second day I run

  change master to master_log_file="abc", master_log_pos=0; slave start

in the console and start a "sql_slave_skip_counter" loop in bash until everything is running without errors again.

BTW: Although the master binlog position the slave reports (in this case 172) is a relatively low number, I have to send at least a few dozen "sql_slave_skip_counter" queries. So the problem seems to be that the "172" should be something in the tens of thousands or more...

Has anybody seen something like this?

Jan

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]
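A sketch of the cron-side check, assuming the job pipes `mysql -e 'SHOW SLAVE STATUS\G'` into it (the function name `repoint_statement` is hypothetical). It reads `Slave_SQL_Running`, `Relay_Master_Log_File` and `Exec_Master_Log_Pos` and, only when the SQL thread has stopped, prints the same stop/reset/change master/start sequence, pointed at the last master position the SQL thread executed. That is presumably the same file and position the error log reports as "We stopped at", so this mirrors the re-pointing that helps in the SQL-syntax case; in the "position 172" case it would most likely just reproduce the failing CHANGE MASTER.

```bash
# repoint_statement: reads SHOW SLAVE STATUS\G output on stdin. If the SQL
# thread is stopped, prints a statement sequence that discards the relay logs
# and re-fetches from the master position the SQL thread last executed.
repoint_statement() {
  awk '
    { sub(/^[ \t]+/, "") }                    # strip the \G indentation
    /^Slave_SQL_Running: /     { running = $2 }
    /^Relay_Master_Log_File: / { file = $2 }
    /^Exec_Master_Log_Pos: /   { pos = $2 }
    END {
      if (running == "No" && file != "" && pos != "")
        printf "STOP SLAVE; RESET SLAVE; CHANGE MASTER TO MASTER_LOG_FILE=\"%s\", MASTER_LOG_POS=%s; START SLAVE;\n", file, pos
    }'
}
```

A cron line could then be something like `mysql -e 'SHOW SLAVE STATUS\G' | repoint_statement | mysql`; when the SQL thread is running, nothing is printed and nothing is executed.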