Problems with Replication in 4.0.17

Neil Gunton Tue, 13 Jan 2004 11:26:29 -0800

I am using 4.0.17 rpm on Red Hat 7.3 (fully updated). I have a server
colocated at my local ISP, and my workstation is on ADSL behind a Netsys
router (the ADSL ISP uses PPPoE, don't know if that's relevant or not).
The server has RAID 1, and has always been 100% reliable (up since
2000). I have been using MySQL for over four years now, and have never
had any problems until recently, when I tried using replication.


I wanted to mirror the database to my workstation over the DSL
connection. I got it working correctly, but quickly found that the slave
would just stop replicating if I went away and left it for a while
(hours). It would be fine while I sat there, but overnight or after a
couple of hours away from my workstation, I would return and it had just
stopped. There were no errors in the log on either end. It just wasn't
updating. Restarting the slave would quickly bring things up to date
again. Eventually I tried lowering the master-connect-retry to 10
seconds, and slave-net-timeout to 60 seconds. This seemed to fix this
particular problem. Overnight I could come back and everything was still
synced up. I don't know why an unattended connection would stall, since
I keep long-lived ssh connections to my server all day long without
problem.
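
For reference, the two settings mentioned above might look like this in
the slave's my.cnf. This is only a sketch: the [mysqld] section placement
is an assumption, and the values are the ones quoted above.

```ini
[mysqld]
# Seconds the slave waits before retrying a lost connection to the master
master-connect-retry = 10
# Seconds without data from the master before the slave treats the
# connection as broken and tries to reconnect
slave-net-timeout    = 60
```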

I have also noticed other problems - most worrying of which is that
records inserted into the master database have actually disappeared
completely from the master and slave. My website has message boards, and
on two occasions now I have posted a message, seen it in the database
(i.e. read the website) and then come back to see that the new message
is just gone. These boards have been in operation for years, and are
extremely reliable. Never have messages simply vanished. The first time
this happened, it only took a few seconds to go away. The second time,
it was overnight. This is extremely scary behaviour.

Also, in multiple unrelated instances, one of the master index files
has become corrupted and had to be repaired using myisamchk. All my
tables are MyISAM. The same corruption has also happened on the slave. I
have never had corrupted tables before now.

The other thing that keeps happening is that the slave seems to get out
of sync somehow with the master - I came in this morning to find that it
had choked on a duplicate primary key. I made the slave skip 2 and it
recovered itself, but this has happened a number of times now. There is
no work being done on the slave version of the database, no possible way
that it would get out of sync as a result of changes on the workstation.
I am the only user, and there are no processes doing anything with the
database. It is a pure slave. Yet, somehow, it ends up with a duplicate
key.
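
For anyone comparing notes, skipping past the failed statement can be
sketched roughly as below. This assumes the skip was done with
SQL_SLAVE_SKIP_COUNTER, which the post does not actually state; the count
of 2 is the figure mentioned above.

```sql
-- Run on the slave, whose SQL thread has already stopped on the error.
SHOW SLAVE STATUS;                       -- inspect the reported error first
SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 2;   -- skip the next 2 events from the master
START SLAVE;                             -- resume replication
```

Skipping events lets the slave continue, but it does not explain how the
duplicate key got there in the first place.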

I am worried enough about all this that I have disabled replication for
the time being. 

Has anyone else experienced missing updates and/or table index
corruption as a result of enabling replication? Simply having
replication active should surely "do no harm" to the master, yet harm is
clearly being done. I am fairly sure that this is a bug, but since it is
so sporadic and non-repeatable, it's very hard to say what could be
causing it. I should make clear that I am fairly
certain that replication is set up correctly - it replicates very well
in normal circumstances. Updates on the server appear on the slave
almost instantaneously.

If anyone else has any insight or similar experiences, please let me
know. I would like to know if this is a "known bug" or something that
hasn't been nailed down yet.

I should finally say that I've always been 100% happy with the
robustness of MySQL, so this was a little shocking to me! I think MySQL
is an extremely useful database system, and I plan to continue using it.
Hopefully all this is just an obscure bug.

Thanks,

-Neil Gunton
