Re: problems w/ Replication over the Internet

2008-04-25 Thread Jan Kirchhoff
Hmmm...
No more ideas or suggestions, anybody? :(

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: problems w/ Replication over the Internet

2008-04-22 Thread Jan Kirchhoff
Eric Bergen wrote:
> TCP checksums aren't as strong as encryption. It's rare but corruption
> can happen.

But it happens every other day? That would mean at least one error per
4 GB of data (I have around 2 GB of binlogs per day). Statistically,
every DVD ISO you download would then be corrupt.
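
The estimate above can be written out as a short sketch. It assumes
1 GB = 10^9 bytes, that each failure corresponds to exactly one corrupted
bit slipping past the checksum, and a nominal 4.7 GB DVD image; all three
are simplifications, and the variable names are only illustrative.

```python
BYTES_PER_GB = 10**9          # assumption: decimal gigabytes

binlog_gb_per_day = 2         # from the thread: about 2 GB of binlogs per day
days_between_failures = 2     # from the thread: it breaks every other day

# Data transferred between two failures, in bytes and in bits.
bytes_per_failure = binlog_gb_per_day * days_between_failures * BYTES_PER_GB
bits_per_failure = bytes_per_failure * 8

# Implied rate of corrupted bits that the checksum did not catch.
implied_bit_error_rate = 1 / bits_per_failure

# Same rate applied to a DVD image (4.7 GB is an assumed nominal size).
dvd_bits = 4.7 * BYTES_PER_GB * 8
expected_bad_bits_per_dvd = dvd_bits * implied_bit_error_rate

print(f"data per failure: {bytes_per_failure / BYTES_PER_GB:.0f} GB")
print(f"implied undetected bit error rate: {implied_bit_error_rate:.3e}")
print(f"expected bad bits per DVD image: {expected_bad_bits_per_dvd:.2f}")
```

Under these assumptions a DVD-sized download would contain slightly more
than one bad bit on average, which is the point of the objection.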

> Where are you reading the positions from and how are you taking the
> snapshot to restore the slave?

From the log file:

080415  6:39:20 [ERROR] Error
running query, slave SQL thread aborted. Fix the problem, and restart
the slave SQL thread with SLAVE START. We stopped at log
'mysql-bin.045709' position 172
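
Reading the stop position out of the error log by eye is error-prone, so
here is a minimal sketch that extracts it with a regular expression. The
pattern is written against the message text quoted above and may not
match other server versions; the function name is made up for the
example. The values shown by SHOW SLAVE STATUS (Relay_Master_Log_File and
Exec_Master_Log_Pos) are another source that could be compared with it.

```python
import re

# Matches the "We stopped at log '<file>' position <n>" message, even when
# the syslog line wrapping puts the file name on the next line.
STOPPED_AT = re.compile(
    r"We stopped at log\s+'(?P<file>[^']+)'\s+position\s+(?P<pos>\d+)"
)

# Log excerpt copied from the message above.
log_text = """080415  6:39:20 [ERROR] Error
running query, slave SQL thread aborted. Fix the problem, and restart
the slave SQL thread with SLAVE START. We stopped at log
'mysql-bin.045709' position 172"""


def last_stop_position(text):
    """Return (log_file, position) of the last stop message, or None."""
    matches = list(STOPPED_AT.finditer(text))
    if not matches:
        return None
    last = matches[-1]
    return last.group("file"), int(last.group("pos"))


print(last_stop_position(log_text))
```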


I use rsync to set up the slave...






Re: problems w/ Replication over the Internet

2008-04-21 Thread Jan Kirchhoff
Eric Bergen wrote:
> Hi Jan,
>
> You have two separate issues here. First the issue with the link
> between the external slave and the master. Running mysql through
> something like stunnel may help with the connection and data loss
> issues.

I wonder how any corruption could happen on a TCP connection, as TCP has
its own checksums and a connection would break down in case of a missing
packet?

> The second problem is that your slave is corrupt. Duplicate key errors
> are sometimes caused by a corrupt table but more often by restarting
> replication from an incorrect binlog location. Try recloning the slave
> and starting replication again through stunnel.

The duplicate key errors happen after I start at the beginning of a
log file (master_log_pos=0), which I have to do because the position
that mysql reports as its last position does not work.

I think I have 2 issues:
#1: How can this kind of binlog corruption happen on a TCP link although
TCP has its checksums and resends lost packets?

#2: Why does mysql report a master log position that is obviously wrong?
mysql reports log position 172, which does not work at all in a "change
master to" command. My only option is to start with master_log_pos=0,
and the number of duplicate key errors and such that I have to skip
after starting from master_log_pos=0 shows me that the real position at
which mysql stopped processing the binlog must be somewhere in the
thousands or tens of thousands, and not 172?!

Jan




Re: problems w/ Replication over the Internet

2008-04-21 Thread Eric Bergen
TCP checksums aren't as strong as encryption. It's rare but corruption
can happen.
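
To illustrate why the TCP checksum is considered weak: it is a 16-bit
ones'-complement sum (RFC 1071), so any corruption that leaves that sum
unchanged goes undetected. The sketch below only demonstrates the
arithmetic and is not a claim about what happened on this particular
link. It checksums a bare payload and ignores the TCP pseudo-header; the
function name internet_checksum and the sample statement are made up for
the example.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with one zero byte
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    # Fold any carries above 16 bits back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Hypothetical payload, 24 bytes long (an even number of bytes).
original = b"INSERT INTO t VALUES (1)"

# Swap the first two 16-bit words: "IN" and "SE" trade places.
reordered = original[2:4] + original[0:2] + original[4:]

# Flip a single bit in the first byte for comparison.
one_bit_flipped = bytes([original[0] ^ 0x01]) + original[1:]

print(reordered)
print(internet_checksum(original) == internet_checksum(reordered))
print(internet_checksum(original) == internet_checksum(one_bit_flipped))
```

The reordered payload has the same checksum as the original, while the
single-bit flip is detected. Because the sum does not depend on the order
of the 16-bit words, reordered or mutually compensating changes can pass
unnoticed.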

Where are you reading the positions from and how are you taking the
snapshot to restore the slave?






-- 
high performance mysql consulting.
http://provenscaling.com




Re: problems w/ Replication over the Internet

2008-04-20 Thread Eric Bergen
Hi Jan,

You have two separate issues here. First the issue with the link
between the external slave and the master. Running mysql through
something like stunnel may help with the connection and data loss
issues.

The second problem is that your slave is corrupt. Duplicate key errors
are sometimes caused by a corrupt table but more often by restarting
replication from an incorrect binlog location. Try recloning the slave
and starting replication again through stunnel.

-Eric


problems w/ Replication over the Internet

2008-04-15 Thread Jan Kirchhoff
I have a setup with a master and a bunch of slaves in my LAN, as well as
one external slave that is running on a Xen server on the internet.
All servers run Debian Linux and its MySQL version, 5.0.32.
Binlogs are around 2 GB per day. I have no trouble at all with my local
slaves, but the external one hangs once every two days.
As this server has no other problems like crashing programs, kernel
panics, corrupted files or such, I am pretty sure that the hardware is OK.

the slave's log:

Apr 15 06:39:19 db-extern mysqld[24884]: 080415  6:39:19 [ERROR] Error
reading packet from server: Lost connection to MySQL server during query
( server_errno=2013)
Apr 15 06:39:19 db-extern mysqld[24884]: 080415  6:39:19 [Note] Slave
I/O thread: Failed reading log event, reconnecting to retry, log
'mysql-bin.045709' position 7334981
Apr 15 06:39:19 db-extern mysqld[24884]: 080415  6:39:19 [Note] Slave:
connected to master '[EMAIL PROTECTED]:1234',replication resumed in log
'mysql-bin.045709' at position 7334981
Apr 15 06:39:20 db-extern mysqld[24884]: 080415  6:39:20 [ERROR] Error
in Log_event::read_log_event(): 'Event too big', data_len: 503316507,
event_type: 16
Apr 15 06:39:20 db-extern mysqld[24884]: 080415  6:39:20 [ERROR] Error
reading relay log event: slave SQL thread aborted because of I/O error
Apr 15 06:39:20 db-extern mysqld[24884]: 080415  6:39:20 [ERROR] Slave:
Could not parse relay log event entry. The possible reasons are: the
master's binary log is corrupted (you can check this by running
'mysqlbinlog' on the binary log), the slave's relay log is corrupted
(you can check this by running 'mysqlbinlog' on the relay log), a
network problem, or a bug in the master's
or slave's MySQL code. If you want to check the master's binary log or
slave's relay log, you will be able to know their names by issuing 'SHOW
SLAVE STATUS' on this slave. Error_code: 0
Apr 15 06:39:20 db-extern mysqld[24884]: 080415  6:39:20 [ERROR] Error
running query, slave SQL thread aborted. Fix the problem, and restart
the slave SQL thread with SLAVE START. We stopped at log
'mysql-bin.045709' position 172
Apr 15 06:40:01 db-extern mysqld[24884]: 080415  6:40:01 [Note] Slave
I/O thread killed while reading event
Apr 15 06:40:01 db-extern mysqld[24884]: 080415  6:40:01 [Note] Slave
I/O thread exiting, read up to log 'mysql-bin.045709', position 23801854
Apr 15 06:40:01 db-extern mysqld[24884]: 080415  6:40:01 [Note] Slave
SQL thread initialized, starting replication in log 'mysql-bin.045709'
at position 172, relay log './db-extern-relay-bin.01' position: 4
Apr 15 06:40:01 db-extern mysqld[24884]: 080415  6:40:01 [Note] Slave
I/O thread: connected to master '[EMAIL PROTECTED]:1234',  replication
started in log 'mysql-bin.045709' at position 172
Apr 15 06:40:01 db-extern mysqld[24884]: 080415  6:40:01 [ERROR] Error
reading packet from server: error reading log entry ( server_errno=1236)
Apr 15 06:40:01 db-extern mysqld[24884]: 080415  6:40:01 [ERROR] Got
fatal error 1236: 'error reading log entry' from master when reading
data from binary log
Apr 15 06:40:01 db-extern mysqld[24884]: 080415  6:40:01 [Note] Slave
I/O thread exiting, read up to log 'mysql-bin.045709', position 172
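
The data_len value in the 'Event too big' line above is easier to read
in hexadecimal. The sketch below only does the arithmetic; the
interpretation in the comments is a guess. It assumes that the length
field of a binlog event header is a 4-byte little-endian integer, that
event type 16 is XID_EVENT, and that an XID event in 5.0 is 27 bytes long
(19-byte header plus an 8-byte id). None of this is stated in the log
itself.

```python
import struct

data_len = 503316507      # from the 'Event too big' log line
event_type = 16           # from the same log line

# The four bytes as they would sit in a little-endian length field.
raw = struct.pack("<I", data_len)

low_three_bytes = data_len & 0xFFFFFF   # value of the three low-order bytes
high_byte = data_len >> 24              # value of the most significant byte

print(hex(data_len))       # 0x1e00001b
print(raw.hex(" "))        # 1b 00 00 1e
print(low_three_bytes)     # 27
print(hex(high_byte))      # 0x1e

# Assumed size of an XID event in the 5.0 binlog format (see lead-in).
ASSUMED_XID_EVENT_LEN = 19 + 8
print(low_three_bytes == ASSUMED_XID_EVENT_LEN)
```

If those assumptions hold, the low three bytes already give a plausible
length of 27 and only the top byte (0x1e) is off. That would be
consistent with a small corruption of the event header rather than a
genuinely huge event, but it does not say where the bad byte came from.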

slave start;
doesn't help.

slave stop; reset slave; change master to
master_log_file='mysql-bin.045709', master_log_pos=172; slave start;
does not help either.

The only way to get this up and running again is to do a change master
to master_log_file='mysql-bin.045709', master_log_pos=0 and use
sql_slave_skip_counter when I get duplicate key errors. This sucks.
When this problem occurs, the reported log position is always a small
number, I would say less than 500.
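
One way to check whether a reported position such as 172 is a real event
boundary is to walk the event headers from the start of the binlog. The
sketch below assumes the version 4 binlog layout: a 4-byte magic number
followed by events that each begin with a 19-byte little-endian header
(timestamp, type code, server id, event length, next position, flags).
That layout is an assumption to verify against the MySQL internals
documentation. The code runs on a synthetic file built in memory; the
helper names and event lengths are made up, so the result says nothing
about the real mysql-bin.045709.

```python
import struct

BINLOG_MAGIC = b"\xfebin"
# timestamp, type code, server id, event length, next position, flags
HEADER = struct.Struct("<IBIIIH")  # 19 bytes, no padding


def event_boundaries(binlog: bytes) -> list:
    """Return the start offset of every event found by walking the headers."""
    if binlog[:4] != BINLOG_MAGIC:
        raise ValueError("missing binlog magic number")
    positions = []
    pos = len(BINLOG_MAGIC)
    while pos + HEADER.size <= len(binlog):
        _ts, _type, _server, event_len, _next, _flags = HEADER.unpack_from(binlog, pos)
        if event_len < HEADER.size:
            raise ValueError(f"implausible event length {event_len} at {pos}")
        positions.append(pos)
        pos += event_len
    return positions


def make_event(type_code: int, body: bytes, start: int) -> bytes:
    """Build one fake event (hypothetical helper, for the demo only)."""
    event_len = HEADER.size + len(body)
    return HEADER.pack(0, type_code, 1, event_len, start + event_len, 0) + body


# Synthetic binlog: three events with arbitrary body lengths.
fake = BINLOG_MAGIC
for type_code, body_len in [(15, 75), (2, 55), (16, 8)]:
    fake += make_event(type_code, b"\x00" * body_len, len(fake))

boundaries = event_boundaries(fake)
print(boundaries)
print(172 in boundaries, 173 in boundaries)
```

Against a real file the same walk could be fed the bytes of a copy of the
master's binlog; a position that is not in the returned list cannot be
used as a start position for replication.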

I also get connection errors in the log from time to time, but the
slave recovers by itself:
Apr 14 22:27:17 db-extern mysqld[24884]: 080414 22:27:17 [ERROR] Error
reading packet from server: Lost connection to MySQL server during query
( server_errno=2013)
Apr 14 22:27:17 db-extern mysqld[24884]: 080414 22:27:17 [Note] Slave
I/O thread: Failed reading log event, reconnecting to retry, log
'mysql-bin.045705' position 34671615
Apr 14 22:27:17 db-extern mysqld[24884]: 080414 22:27:17 [Note] Slave:
connected to master '[EMAIL PROTECTED]:1234',replication resumed in log
'mysql-bin.045705' at position 34671615

Sometimes I get:
Apr 13 23:22:04 db-extern mysqld[24884]: 080413 23:22:04 [ERROR] Slave:
Error 'You have an error in your SQL syntax; check the manual that
corresponds to your MySQL server version for the right syntax to use
near '^\' at line 1' on query.
Apr 13 23:22:04 db-extern mysqld[24884]: 080413 23:22:04 [ERROR] Error
running query, slave SQL thread aborted. Fix the problem, and restart
the slave SQL thread with SLAVE START. We stopped at log
'mysql-bin.045699' position 294101453
But this time
slave stop; reset slave; change master to
master_log_file='mysql-bin.045699', master_log_pos=294101453; slave start;
helps!

master# mysqlbinlog --position=172 mysql-bin.045709
/*!40019 SET @@session.max_insert_delayed_threads=0*/;