I have a setup with a master and a bunch of slaves in my LAN, as well as
one external slave that is running on a Xen server on the internet.
All servers run Debian Linux with its MySQL version 5.0.32.
Binlogs are around 2 GB per day. I have no trouble at all with my local
slaves, but the external one hangs once every two days.
As this server has no "other" problems like crashing programs, kernel
panics, corrupted files or such, I am pretty sure that the hardware is OK.

The slave's log:

Apr 15 06:39:19 db-extern mysqld[24884]: 080415  6:39:19 [ERROR] Error
reading packet from server: Lost connection to MySQL server during query
( server_errno=2013)
Apr 15 06:39:19 db-extern mysqld[24884]: 080415  6:39:19 [Note] Slave
I/O thread: Failed reading log event, reconnecting to retry, log
'mysql-bin.045709' position 7334981
Apr 15 06:39:19 db-extern mysqld[24884]: 080415  6:39:19 [Note] Slave:
connected to master '[EMAIL PROTECTED]:1234',replication resumed in log
'mysql-bin.045709' at position 7334981
Apr 15 06:39:20 db-extern mysqld[24884]: 080415  6:39:20 [ERROR] Error
in Log_event::read_log_event(): 'Event too big', data_len: 503316507,
event_type: 16
Apr 15 06:39:20 db-extern mysqld[24884]: 080415  6:39:20 [ERROR] Error
reading relay log event: slave SQL thread aborted because of I/O error
Apr 15 06:39:20 db-extern mysqld[24884]: 080415  6:39:20 [ERROR] Slave:
Could not parse relay log event entry. The possible reasons are: the
master's binary log is corrupted (you can check this by running
'mysqlbinlog' on the binary log), the slave's relay log is corrupted
(you can check this by running 'mysqlbinlog' on the relay log), a
network problem, or a bug in the master's
or slave's MySQL code. If you want to check the master's binary log or
slave's relay log, you will be able to know their names by issuing 'SHOW
SLAVE STATUS' on this slave. Error_code: 0
Apr 15 06:39:20 db-extern mysqld[24884]: 080415  6:39:20 [ERROR] Error
running query, slave SQL thread aborted. Fix the problem, and restart
the slave SQL thread with "SLAVE START". We stopped at log
'mysql-bin.045709' position 172
Apr 15 06:40:01 db-extern mysqld[24884]: 080415  6:40:01 [Note] Slave
I/O thread killed while reading event
Apr 15 06:40:01 db-extern mysqld[24884]: 080415  6:40:01 [Note] Slave
I/O thread exiting, read up to log 'mysql-bin.045709', position 23801854
Apr 15 06:40:01 db-extern mysqld[24884]: 080415  6:40:01 [Note] Slave
SQL thread initialized, starting replication in log 'mysql-bin.045709'
at position 172, relay log './db-extern-relay-bin.000001' position: 4
Apr 15 06:40:01 db-extern mysqld[24884]: 080415  6:40:01 [Note] Slave
I/O thread: connected to master '[EMAIL PROTECTED]:1234',  replication
started in log 'mysql-bin.045709' at position 172
Apr 15 06:40:01 db-extern mysqld[24884]: 080415  6:40:01 [ERROR] Error
reading packet from server: error reading log entry ( server_errno=1236)
Apr 15 06:40:01 db-extern mysqld[24884]: 080415  6:40:01 [ERROR] Got
fatal error 1236: 'error reading log entry' from master when reading
data from binary log
Apr 15 06:40:01 db-extern mysqld[24884]: 080415  6:40:01 [Note] Slave
I/O thread exiting, read up to log 'mysql-bin.045709', position 172
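
To pull the position at which the SQL thread stopped out of log lines like
the ones above, something along these lines might work. It is only a sketch
in Python: the name find_stop_positions is made up for illustration, and it
assumes the message text looks exactly like the excerpt above.

```python
import re

# Matches the "We stopped at log '<file>' position <n>" part of the
# slave SQL thread error message shown in the log above.
STOP_RE = re.compile(r"We stopped at log\s+'([^']+)'\s+position\s+(\d+)")


def find_stop_positions(log_text):
    """Return (binlog file, position) pairs for every SQL-thread stop."""
    # Long syslog lines get wrapped in mail, so collapse all whitespace
    # (including newlines) to single spaces before matching.
    flat = " ".join(log_text.split())
    return [(name, int(pos)) for name, pos in STOP_RE.findall(flat)]


sample = """Apr 15 06:39:20 db-extern mysqld[24884]: 080415  6:39:20 [ERROR] Error
running query, slave SQL thread aborted. Fix the problem, and restart
the slave SQL thread with "SLAVE START". We stopped at log
'mysql-bin.045709' position 172"""

print(find_stop_positions(sample))
```

The sample string is copied from the log above; anything that feeds whole
log files into this is left out.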

Running
slave start;
doesn't help.

Neither does
slave stop; reset slave; change master to
master_log_file="mysql-bin.045709", master_log_pos=172; slave start;

The only way to get this up and running again is to do a change master
to master_log_file="mysql-bin.045709", master_log_pos=0 and use
sql_slave_skip_counter when I get duplicate key errors. This sucks.
When this problem occurs, the log position is always a small number, I
would say less than 500.

I also get connection errors in the log from time to time, but
replication recovers by itself:
Apr 14 22:27:17 db-extern mysqld[24884]: 080414 22:27:17 [ERROR] Error
reading packet from server: Lost connection to MySQL server during query
( server_errno=2013)
Apr 14 22:27:17 db-extern mysqld[24884]: 080414 22:27:17 [Note] Slave
I/O thread: Failed reading log event, reconnecting to retry, log
'mysql-bin.045705' position 34671615
Apr 14 22:27:17 db-extern mysqld[24884]: 080414 22:27:17 [Note] Slave:
connected to master '[EMAIL PROTECTED]:1234',replication resumed in log
'mysql-bin.045705' at position 34671615

Sometimes I get:
Apr 13 23:22:04 db-extern mysqld[24884]: 080413 23:22:04 [ERROR] Slave:
Error 'You have an error in your SQL syntax; check the manual that
corresponds to your MySQL server version for the right syntax to use
near '^\' at line 1' on query.
Apr 13 23:22:04 db-extern mysqld[24884]: 080413 23:22:04 [ERROR] Error
running query, slave SQL thread aborted. Fix the problem, and restart
the slave SQL thread with "SLAVE START". We stopped at log
'mysql-bin.045699' position 294101453
But this time
slave stop; reset slave; change master to
master_log_file="mysql-bin.045699", master_log_pos=294101453; slave start;
helps!

master# mysqlbinlog --position=172 mysql-bin.045709
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;

ERROR: Error in Log_event::read_log_event(): 'read error', data_len:
543519343, event_type: 116
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;

So the position "172" seems to be wrong?
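
One way to look at the numbers in these errors: if a position is not the
start of an event, the "header" fields are just whatever bytes happen to be
there. A small Python sketch of that check follows. The helper name as_bytes
is hypothetical, and it assumes the length field is a 4-byte little-endian
integer.

```python
def as_bytes(value):
    """Show a 32-bit header field as the raw little-endian bytes it came from."""
    return value.to_bytes(4, "little")


# data_len and event_type reported by mysqlbinlog --position=172
print(as_bytes(543519343), chr(116))

# data_len from the slave's 'Event too big' error
print(as_bytes(503316507))
```

The first length decodes to the ASCII text "ore " and event type 116 is the
letter "t", which would fit reading from the middle of some query text
rather than from an event header. The second length decodes to the bytes
1b 00 00 1e, which is not text. This is a hint, not proof.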

master# mysqlbinlog mysql-bin.045709  >/dev/null
master#

The binlog on the master is OK (as I said, all other slaves replicate
without any problems...)

Any suggestions? I have cron jobs running now that read the output of
"show slave status" and, if necessary, run queries like the above
slave stop; reset slave; change master to
master_log_file="mysql-bin.045699", master_log_pos=294101453; slave start.
Every second day I do a change master to
master_log_file="abc", master_log_pos=0; slave start in the console and
start a "sql_slave_skip_counter" loop in bash until everything is
running without errors again.
Btw: although the master binlog position the slave tells me (in this
case 172) is a relatively low number, I have to send at least a few
dozen "sql_slave_skip_counter" queries. So the problem seems to be
that the "172" should be something in the tens of thousands or more...
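
The cron check described above could be sketched roughly like this in
Python. The function names parse_slave_status and build_restart_sql are
hypothetical, the sample status text is a trimmed, made-up illustration
using values from this post, and actually fetching the status and sending
the statement to the server is left out.

```python
def parse_slave_status(text):
    """Parse vertical SHOW SLAVE STATUS output into a dict of strings."""
    status = {}
    for line in text.splitlines():
        # Split on the first colon only, so values containing colons survive.
        key, sep, value = line.partition(":")
        if sep:
            status[key.strip()] = value.strip()
    return status


def build_restart_sql(status):
    """Return statements to re-point the slave, or None if both threads run."""
    if (status.get("Slave_IO_Running") == "Yes"
            and status.get("Slave_SQL_Running") == "Yes"):
        return None
    log_file = status["Relay_Master_Log_File"]
    log_pos = int(status["Exec_Master_Log_Pos"])
    return (
        "STOP SLAVE; RESET SLAVE; "
        f"CHANGE MASTER TO MASTER_LOG_FILE='{log_file}', "
        f"MASTER_LOG_POS={log_pos}; "
        "START SLAVE;"
    )


# Illustrative sample only: a few fields, with values taken from this post.
sample = """\
*************************** 1. row ***************************
        Slave_IO_Running: Yes
       Slave_SQL_Running: No
   Relay_Master_Log_File: mysql-bin.045699
     Exec_Master_Log_Pos: 294101453
"""

print(build_restart_sql(parse_slave_status(sample)))
```

This only covers the case where restarting at the executed position helps.
It does nothing about the case where the reported position (172) itself
looks wrong.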

Has anybody seen something like this?

Jan


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql