Re: problems w/ Replication over the Internet
Hmmm... no more ideas or suggestions, anybody? :(

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]
Re: problems w/ Replication over the Internet
Eric Bergen wrote:
> TCP checksums aren't as strong as encryption. It's rare but corruption can happen.

But it happens every other day? That means at least one error in every 4 GB of data (I have around 2 GB of binlogs per day). Statistically, wouldn't every DVD ISO you download be corrupt?

> Where are you reading the positions from and how are you taking the snapshot to restore the slave?

From the log file:

080415 6:39:20 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with SLAVE START. We stopped at log 'mysql-bin.045709' position 172

I use rsync to set up the slave...

> On Mon, Apr 21, 2008 at 12:30 AM, Jan Kirchhoff [EMAIL PROTECTED] wrote:
> > [...]
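On the question of where the positions come from: the slave's error log in the original post records two different coordinates in mysql-bin.045709. The SQL thread stopped at position 172, while the I/O thread had already read up to position 23801854. SHOW SLAVE STATUS exposes the same pair as Exec_Master_Log_Pos and Read_Master_Log_Pos. Below is a minimal Python sketch that pulls both out of error-log lines. It assumes the exact message wording shown in this thread, and POSITION_RE and positions() are hypothetical names, not part of any MySQL tool.

```python
import re

# Assumes the message wording seen in this thread's error log.
POSITION_RE = re.compile(
    r"(?P<what>We stopped at log|read up to log) "
    r"'(?P<file>[^']+)',? position (?P<pos>\d+)"
)


def positions(lines):
    """Return (thread, binlog file, position) tuples found in error-log lines."""
    found = []
    for line in lines:
        m = POSITION_RE.search(line)
        if m:
            # "We stopped at log" comes from the SQL thread,
            # "read up to log" from the I/O thread.
            thread = "SQL" if m.group("what") == "We stopped at log" else "I/O"
            found.append((thread, m.group("file"), int(m.group("pos"))))
    return found


log = [
    "080415 6:39:20 [ERROR] Error running query, slave SQL thread aborted. "
    "Fix the problem, and restart the slave SQL thread with SLAVE START. "
    "We stopped at log 'mysql-bin.045709' position 172",
    "080415 6:40:01 [Note] Slave I/O thread exiting, read up to log "
    "'mysql-bin.045709', position 23801854",
]

for thread, binlog, pos in positions(log):
    print(thread, binlog, pos)
```

Seeing both numbers side by side makes it clear that 172 is the SQL thread's execution position, not how far the data had been transferred.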
Re: problems w/ Replication over the Internet
Eric Bergen wrote:
> Hi Jan,
>
> You have two separate issues here. First the issue with the link between the external slave and the master. Running mysql through something like stunnel may help with the connection and data loss issues.

I wonder how any corruption could happen on a TCP connection, as TCP has its own checksums and a connection would break down in case of a missing packet?

> The second problem is that your slave is corrupt. Duplicate key errors are sometimes caused by a corrupt table but more often by restarting replication from an incorrect binlog location. Try recloning the slave and starting replication again through stunnel.

The duplicate key errors happen after I start at the beginning of a log file (master_log_pos=0), when the position that mysql reports as its last position does not work.

I think I have 2 issues:

#1: How can this kind of binlog corruption happen on a TCP link, although TCP has its checksums and resends lost packets?

#2: Why does mysql report a master log position that is obviously wrong? mysql reports log position 172, which does not work at all in a CHANGE MASTER TO command. My only option is to start with master_log_pos=0, and the number of duplicate key errors and the like that I have to skip after starting from master_log_pos=0 shows me that the real position where mysql stopped processing the binlog must be in the thousands or tens of thousands, not 172?!

Jan
Re: problems w/ Replication over the Internet
TCP checksums aren't as strong as encryption. It's rare, but corruption can happen.

Where are you reading the positions from, and how are you taking the snapshot to restore the slave?

On Mon, Apr 21, 2008 at 12:30 AM, Jan Kirchhoff [EMAIL PROTECTED] wrote:
> [...]

-- 
high performance mysql consulting. http://provenscaling.com
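Eric's point about checksum strength can be made concrete. TCP uses the 16-bit ones' complement Internet checksum described in RFC 1071. That checksum is a plain sum of 16-bit words, so some kinds of damage, such as two words trading places, leave it unchanged. The Python sketch below implements only the RFC 1071 sum over a payload. The real TCP checksum also covers a pseudo-header, which is omitted here, and the payload string is a hypothetical example.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum over data, as described in RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]     # add the next 16-bit word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return ~total & 0xFFFF


# Hypothetical payload, not taken from the actual binlog.
original = b"INSERT INTO t VALUES (1, 2);"
# Corrupt it by swapping the first two 16-bit words ("IN" and "SE").
swapped = original[2:4] + original[0:2] + original[4:]

print(original != swapped)                                        # True
print(internet_checksum(original) == internet_checksum(swapped))  # True
```

Because the sum ignores word order, a segment damaged this way would pass TCP's check. TLS, which stunnel uses, authenticates each record with a MAC and would tear down the connection instead.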
Re: problems w/ Replication over the Internet
Hi Jan,

You have two separate issues here. First, the issue with the link between the external slave and the master: running mysql through something like stunnel may help with the connection and data loss issues.

The second problem is that your slave is corrupt. Duplicate key errors are sometimes caused by a corrupt table, but more often by restarting replication from an incorrect binlog location. Try recloning the slave and starting replication again through stunnel.

-Eric

On Tue, Apr 15, 2008 at 1:11 AM, Jan Kirchhoff [EMAIL PROTECTED] wrote:
> I have a setup with a master and a bunch of slaves in my LAN as well as one external slave that is running on a Xen-Server on the internet.
> [...]
problems w/ Replication over the Internet
I have a setup with a master and a bunch of slaves in my LAN, as well as one external slave that is running on a Xen server on the internet. All servers run Debian Linux with MySQL 5.0.32. Binlogs are around 2 GB per day.

I have no trouble at all with my local slaves, but the external one hangs once every two days. As this server has no other problems like crashing programs, kernel panics, corrupted files or such, I am pretty sure that the hardware is OK.

The slave's log:

Apr 15 06:39:19 db-extern mysqld[24884]: 080415 6:39:19 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
Apr 15 06:39:19 db-extern mysqld[24884]: 080415 6:39:19 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.045709' position 7334981
Apr 15 06:39:19 db-extern mysqld[24884]: 080415 6:39:19 [Note] Slave: connected to master '[EMAIL PROTECTED]:1234',replication resumed in log 'mysql-bin.045709' at position 7334981
Apr 15 06:39:20 db-extern mysqld[24884]: 080415 6:39:20 [ERROR] Error in Log_event::read_log_event(): 'Event too big', data_len: 503316507, event_type: 16
Apr 15 06:39:20 db-extern mysqld[24884]: 080415 6:39:20 [ERROR] Error reading relay log event: slave SQL thread aborted because of I/O error
Apr 15 06:39:20 db-extern mysqld[24884]: 080415 6:39:20 [ERROR] Slave: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave. Error_code: 0
Apr 15 06:39:20 db-extern mysqld[24884]: 080415 6:39:20 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with SLAVE START. We stopped at log 'mysql-bin.045709' position 172
Apr 15 06:40:01 db-extern mysqld[24884]: 080415 6:40:01 [Note] Slave I/O thread killed while reading event
Apr 15 06:40:01 db-extern mysqld[24884]: 080415 6:40:01 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.045709', position 23801854
Apr 15 06:40:01 db-extern mysqld[24884]: 080415 6:40:01 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.045709' at position 172, relay log './db-extern-relay-bin.01' position: 4
Apr 15 06:40:01 db-extern mysqld[24884]: 080415 6:40:01 [Note] Slave I/O thread: connected to master '[EMAIL PROTECTED]:1234', replication started in log 'mysql-bin.045709' at position 172
Apr 15 06:40:01 db-extern mysqld[24884]: 080415 6:40:01 [ERROR] Error reading packet from server: error reading log entry ( server_errno=1236)
Apr 15 06:40:01 db-extern mysqld[24884]: 080415 6:40:01 [ERROR] Got fatal error 1236: 'error reading log entry' from master when reading data from binary log
Apr 15 06:40:01 db-extern mysqld[24884]: 080415 6:40:01 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.045709', position 172

"slave start;" doesn't help. Neither does

slave stop;
reset slave;
change master to master_log_file='mysql-bin.045709', master_log_pos=172;
slave start;

The only way to get this up and running again is to do

change master to master_log_file='mysql-bin.045709', master_log_pos=0;

and use sql_slave_skip_counter whenever I get duplicate key errors. This sucks.

When this problem occurs, the log positions are always small numbers, I would say less than 500.
I also get connection errors in the log from time to time, but replication recovers by itself:

Apr 14 22:27:17 db-extern mysqld[24884]: 080414 22:27:17 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
Apr 14 22:27:17 db-extern mysqld[24884]: 080414 22:27:17 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.045705' position 34671615
Apr 14 22:27:17 db-extern mysqld[24884]: 080414 22:27:17 [Note] Slave: connected to master '[EMAIL PROTECTED]:1234',replication resumed in log 'mysql-bin.045705' at position 34671615

Sometimes I get:

Apr 13 23:22:04 db-extern mysqld[24884]: 080413 23:22:04 [ERROR] Slave: Error 'You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '^\' at line 1' on query.
Apr 13 23:22:04 db-extern mysqld[24884]: 080413 23:22:04 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with SLAVE START. We stopped at log 'mysql-bin.045699' position 294101453

But this time

slave stop;
reset slave;
change master to master_log_file='mysql-bin.045699', master_log_pos=294101453;
slave start;

helps!

master# mysqlbinlog --position=172 mysql-bin.045709
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
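The "Event too big" entry reports data_len 503316507 for event_type 16. In MySQL 5.0, type 16 is XID_EVENT, which is 27 bytes long: a 19-byte common header plus an 8-byte transaction id. The Python sketch below unpacks the v4 binlog common header and prints that length in hex. It is a rough approximation and not the server's or mysqlbinlog's actual code. parse_common_header(), plausible(), and the 16 MiB max_event_size are hypothetical, and every header field except the type and length is a placeholder.

```python
import struct

# Binlog format v4 common event header: 19 bytes, little-endian:
# timestamp(4) type_code(1) server_id(4) event_length(4) next_position(4) flags(2)
COMMON_HEADER = struct.Struct("<IBIIIH")

XID_EVENT = 16           # event_type reported in the "Event too big" entry
XID_EVENT_SIZE = 19 + 8  # common header + 8-byte transaction id


def parse_common_header(raw: bytes) -> dict:
    """Unpack the first 19 bytes of a v4 binlog event."""
    ts, type_code, server_id, length, next_pos, flags = COMMON_HEADER.unpack_from(raw)
    return {"timestamp": ts, "type_code": type_code, "server_id": server_id,
            "event_length": length, "next_position": next_pos, "flags": flags}


def plausible(header: dict, max_event_size: int) -> bool:
    """Hypothetical sanity check: length must cover the header and stay under a limit."""
    return COMMON_HEADER.size <= header["event_length"] <= max_event_size


data_len = 503316507  # value from Log_event::read_log_event() in the slave's log
print(hex(data_len))                        # 0x1e00001b
print(data_len & 0xFFFFFF, data_len >> 24)  # 27 30

# Rebuild a header carrying that length; timestamp, server_id,
# next_position and flags are placeholders, not real data.
raw = COMMON_HEADER.pack(0, XID_EVENT, 1, data_len, 0, 0)
hdr = parse_common_header(raw)
print(raw[9:13].hex())                                  # 1b00001e
print(plausible(hdr, max_event_size=16 * 1024 * 1024))  # False
```

The low three bytes of the reported length are exactly 27, the size of an XID event. Only the high byte (0x1E) is out of place. This is consistent with a single damaged byte in the relay log, but it does not prove it. Walking a real relay log would repeat this unpacking at each event offset. The server and mysqlbinlog also handle the format description event and per-event post-headers, which this sketch does not model.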