Re: Replication suddenly stops on mysql 4.1.7 with Slave_IO_Running: No

2005-02-07 Thread Jan Kirchhoff
We've had very good performance with the official MySQL icc binaries, so 
I upgraded to 4.1.8 last weekend, since there is no official 4.1.9 binary 
on mysql.com...

It didn't help with my problems: replication still crashes almost 
every other hour. I put a fresh snapshot from the master onto the slave, 
but that didn't help either :(
A simple SLAVE START helps, so I have a cron job running right now that 
checks the replication status and issues a SLAVE START if necessary.
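
For illustration only, a cron-driven check of the kind described here might look roughly like the sketch below. It is not the script actually used in this thread. It assumes the `mysql` command-line client is on the PATH and picks up credentials from an option file; the function names are made up for the example. `SLAVE START` is the spelling used in this thread; the manual documents the same statement as `START SLAVE`.

```python
import subprocess


def parse_slave_status(text):
    """Parse the `Key: value` lines of SHOW SLAVE STATUS\\G output into a dict."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # the "1. row" separator line has no colon and is skipped
            status[key.strip()] = value.strip()
    return status


def needs_restart(status):
    """True when a configured slave reports a replication thread as stopped."""
    if not status:
        # Empty output: this server is not set up as a slave, so do nothing.
        return False
    return (status.get("Slave_IO_Running") != "Yes"
            or status.get("Slave_SQL_Running") != "Yes")


def run_mysql(statement):
    """Run one statement through the mysql client (assumed to be on PATH)."""
    result = subprocess.run(["mysql", "-e", statement],
                            capture_output=True, text=True, check=True)
    return result.stdout


def check_once(run=run_mysql):
    """One cron tick: inspect the slave and restart replication if needed."""
    status = parse_slave_status(run("SHOW SLAVE STATUS\\G"))
    if needs_restart(status):
        run("SLAVE START")
        return True
    return False
```

Calling `check_once()` from a small script that cron runs every minute or so would reproduce the workaround; passing a different `run` callable makes the logic testable without a server.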

I have no other idea than to try the gcc build of 4.1.9 in about 3 weeks; 
I can't take the master database down any time before that :(

Re: Replication suddenly stops on mysql 4.1.7 with Slave_IO_Running: No

2005-02-01 Thread Jan Kirchhoff
Gleb Paharenko wrote:
> Hello.
>
> I've looked through the bug database, and the only thing
> that I've found was an already-closed bug:
>  http://bugs.mysql.com/bug.php?id=6148

I had been looking through the changelogs, but I had not found that one. 
Sounds pretty much like my problem :(
But I use 4.1.7, not 4.0.21 ...weird.

> Check that your server passes rpl_relayspace.test. Go to the mysql-test
> directory and execute:
>  ./mysql-test-run t/rpl_relayspace.test

This one runs without errors on the master and the slave:
hostname:/usr/local/mysql-standard-4.1.7-pc-linux-i686-icc-glibc23/mysql-test# 
./mysql-test-run t/rpl_relayspace.test  
Installing Test Databases
Removing Stale Files
Installing Master Databases
running  ../bin/mysqld --no-defaults --bootstrap --skip-grant-tables 
--basedir=.. --datadir=mysql-test/var/master-data --skip-innodb 
--skip-ndbcluster --skip-bdb
Installing Slave Databases
running  ../bin/mysqld --no-defaults --bootstrap --skip-grant-tables 
--basedir=.. --datadir=mysql-test/var/slave-data --skip-innodb 
--skip-ndbcluster --skip-bdb
Manager disabled, skipping manager start.
Loading Standard Test Databases
Starting Tests

TESTRESULT
---
rpl_relayspace [ pass ]  
---

Ending Tests
Shutting-down MySQL daemon
Master shutdown finished
Slave shutdown finished
All 1 tests were successful.

I can't swap the MySQL binary itself (I use the icc build) for a gcc 
build, or upgrade to 4.1.9, within the next 2-3 weeks. And judging by the 
changelogs on mysql.com, I don't think it would change anything...
Hasn't anybody else had such problems with 4.1.x?

hostname:/usr/local/mysql-standard-4.1.7-pc-linux-i686-icc-glibc23/bin# 
./mysqld --version
./mysqld  Ver 4.1.7-standard for pc-linux on i686 (Official 
MySQL-standard binary)

(More detailed information on my systems is in my initial mail from 2005-01-27.)
BTW: I also ran mysqlcheck -q and mysqlcheck -o on all tables last week 
to make sure the tables are OK...

 




--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]


Re: Replication suddenly stops on mysql 4.1.7 with Slave_IO_Running: No

2005-02-01 Thread Gleb Paharenko
Hello.

> But I use 4.1.7, not 4.0.21 ...weird.

As stated at:
  http://dev.mysql.com/doc/mysql/en/news-4-1-8.html

"Fixed a bug which caused a crash when only the slave I/O thread was 
stopped and started. (Bug #6148)"

I suggest you upgrade to the latest release (4.1.9 now).


-- 
For technical support contracts, goto https://order.mysql.com/?ref=ensita
This email is sponsored by Ensita.NET http://www.ensita.net/
   __  ___ ___   __
  /  |/  /_ __/ __/ __ \/ /Gleb Paharenko
 / /|_/ / // /\ \/ /_/ / /__   [EMAIL PROTECTED]
/_/  /_/\_, /___/\___\_\___/   MySQL AB / Ensita.NET
   ___/   www.mysql.com







Re: Replication suddenly stops on mysql 4.1.7 with Slave_IO_Running: No

2005-01-31 Thread Jan Kirchhoff
Hi,
My problem persists. After hitting it 2 more times 
within 1 day, I decided to redo the replication (copy the whole 
database onto the slave with rsync and reset master and slave). That 
lasted only a little more than 1 day before I ended up with the same error:

Could not parse relay log event entry. The possible reasons are: the 
master's binary log is corrupted (you can check this by running 
'mysqlbinlog' on the binary log), the slave's relay log is corrupted 
(you can check this by running 'mysqlbinlog' on the relay log), a 
network problem, or a bug in the master's or slave's MySQL code. If you 
want to check the master's binary log or slave's relay log, you will be 
able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.

I can look at the binlog with mysqlbinlog on the master and the slave; 
no errors or problems.
After a simple SLAVE START, without having made any changes to the 
database, the slave thread started again and caught up with the master.
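
As a rough sketch of the check the error message suggests, the log names can be pulled from the SHOW SLAVE STATUS output and handed to mysqlbinlog. The helper names below are invented for this example, and the data directory `/var/lib/mysql` is an assumption taken from the setup described in the original post; the logs may live elsewhere.

```python
import os


def logs_to_inspect(status_text):
    """Return (master_binlog, relay_log) names from SHOW SLAVE STATUS\\G output."""
    fields = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    # Relay_Master_Log_File is the master binlog the SQL thread is working from;
    # Relay_Log_File is the relay log it is reading on the slave.
    return fields.get("Relay_Master_Log_File"), fields.get("Relay_Log_File")


def mysqlbinlog_command(log_name, datadir="/var/lib/mysql"):
    """Argument list that dumps one log with mysqlbinlog (datadir is assumed)."""
    return ["mysqlbinlog", os.path.join(datadir, log_name)]
```

The first name is checked on the master and the second on the slave. Each command can be run with `subprocess.run(...)`, and its output read for errors.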

I've been using mysql's replication-feature since it first came up in 
1999 or 2000 and dealt with lots of problems and workarounds, but this 
one is weird. Any ideas anybody?

Jan

Re: Replication suddenly stops on mysql 4.1.7 with Slave_IO_Running: No

2005-01-31 Thread Gleb Paharenko
Hello.

I've looked through the bug database, and the only thing
that I've found was an already-closed bug:
  http://bugs.mysql.com/bug.php?id=6148

Check that your server passes rpl_relayspace.test. Go to the mysql-test
directory and execute:
  ./mysql-test-run t/rpl_relayspace.test


Replication suddenly stops on mysql 4.1.7 with Slave_IO_Running: No

2005-01-27 Thread Jan Kirchhoff
Hello,
I have a replication setup on two Linux boxes (Debian woody, kernel 2.4.21-xfs, 
MySQL 4.1.7-standard, the official Intel-compiler binary from mysql.com).

master:~# mysqladmin status
Uptime: 464848  Threads: 10  Questions: 296385136  Slow queries: 1752  Opens: 
2629  Flush tables: 1  Open tables: 405  Queries per second avg: 637.596

slave:~# mysqladmin  status
Uptime: 463460  Threads: 2  Questions: 292885156  Slow queries: 6  Opens: 2510 
 Flush tables: 1  Open tables: 327  Queries per second avg: 631.953

Both systems have identical hardware (P4 2.4GHz, 3GB RAM, SCSI hardware RAID); 
the connection is gigabit Ethernet.

Everything used to work fine, but I wanted to get rid of InnoDB, since I 
only used it for very big tables containing historical data, and those tables 
were moved to another server. I ran out of disk space; InnoDB data files can 
only grow, not shrink, and I didn't need InnoDB anyway, so it had to go.
I stopped the slave, converted all remaining InnoDB tables to MyISAM, added 
skip-innodb to my.cnf on the master and the slave, restarted the server, and 
renewed the replication the classical way: FLUSH TABLES WITH READ LOCK, copy 
/var/lib/mysql to the slave (not much, just around 20GB), RESET MASTER, 
UNLOCK TABLES. Then start the slave mysqld, RESET SLAVE, SLAVE START.
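
The order of those steps matters, so here is a small sketch that simply writes the sequence down and checks one property of it. It only restates the procedure from the paragraph above. The function names are made up, and the default path is the one mentioned in this mail. One detail worth keeping in mind: the read lock only lasts as long as the session that took it stays connected, so the copy has to finish before that session ends.

```python
def reseed_plan(datadir="/var/lib/mysql"):
    """Ordered (host, action) steps for the snapshot procedure described above."""
    return [
        ("master", "FLUSH TABLES WITH READ LOCK"),   # keep this session open
        ("master", f"copy {datadir} to the slave"),  # file copy while locked
        ("master", "RESET MASTER"),                  # start from a fresh binlog
        ("master", "UNLOCK TABLES"),
        ("slave", "start mysqld"),
        ("slave", "RESET SLAVE"),                    # forget the old positions
        ("slave", "SLAVE START"),
    ]


def copy_is_under_lock(plan):
    """Check that the file copy happens between taking and releasing the lock."""
    actions = [action for _host, action in plan]
    lock = actions.index("FLUSH TABLES WITH READ LOCK")
    unlock = actions.index("UNLOCK TABLES")
    copy = next(i for i, a in enumerate(actions) if a.startswith("copy "))
    return lock < copy < unlock
```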

Everything was fine and very fast for 4 days (from Saturday until Wednesday 
afternoon), then the slave suddenly stopped.
This is where the weird stuff starts:
SHOW SLAVE STATUS tells me everything is fine, except that 
Slave_IO_Running is No.
After typing SLAVE START, it says Slave_IO_Running: Yes and 
Slave_SQL_Running: No. Very strange. Then I did a SLAVE STOP; SLAVE START; 
and everything was fine again: the slave caught up and carried on. Today 
(Thursday afternoon) the same thing happened and was again fixed by 
SLAVE STOP; SLAVE START;. It then happened once more around 10pm, and again 
the stop-start trick got it working.

I've added the output of my mysql shell below.
Can anybody help me with that?
This is a production system under heavy load, and I can't play around with 
different MySQL versions and such...
If I don't find a solution really quickly, I'll have to make do with some 
shell-script daemon that checks whether replication is running and issues 
STOP SLAVE; START SLAVE commands otherwise... not really the way it should be :(

Thanks
Jan
SLAVE:
slave:~# cat /proc/cpuinfo
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 15
model   : 2
model name  : Intel(R) Pentium(R) 4 CPU 2.40GHz
stepping: 7
cpu MHz : 2392.077
cache size  : 512 KB
fdiv_bug: no
hlt_bug : no
f00f_bug: no
coma_bug: no
fpu : yes
fpu_exception   : yes
cpuid level : 2
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips: 4771.02

slave:~# free
             total       used       free     shared    buffers     cached
Mem:       3105104    2355364     749740          0        440    1514104
-/+ buffers/cache:     840820    2264284
Swap:       779144     428072     351072
MASTER
master:~# cat /proc/cpuinfo
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 15
model   : 2
model name  : Intel(R) Pentium(R) 4 CPU 2.40GHz
stepping: 7
cpu MHz : 2392.163
cache size  : 512 KB
fdiv_bug: no
hlt_bug : no
f00f_bug: no
coma_bug: no
fpu : yes
fpu_exception   : yes
cpuid level : 2
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips: 4771.02

master:~# free
             total       used       free     shared    buffers     cached
Mem:       3105104    3096016       9088          0        648    2087780
-/+ buffers/cache:    1007588    2097516
Swap:       779144     391732     387412

Slave shell:
wpdb2:~# mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 23083 to server version: 4.1.7-standard
Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
wpdb2 mysql> show slave status\G
*************************** 1. row ***************************
             Slave_IO_State:
                Master_Host: 192.168.10.26
                Master_User: repl
                Master_Port: 3306
              Connect_Retry: 10
            Master_Log_File: mysql-bin.000210
        Read_Master_Log_Pos: 146168522
             Relay_Log_File: wpdb2-relay-bin.000210
              Relay_Log_Pos: 146168608
      Relay_Master_Log_File: mysql-bin.000210
           Slave_IO_Running: No
          Slave_SQL_Running: Yes
            Replicate_Do_DB:
        Replicate_Ignore_DB:
         Replicate_Do_Table:
     Replicate_Ignore_Table:
    Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table: