I want to fix a replication issue with a two-node cluster (one active, one passive) that uses Heartbeat for failover. The nodes are in a Master-Master configuration (that is, each is both the master and the slave of the other).
I have several other hosts that are replication slaves of the active node. They connect to MySQL via TCP over SSH tunnels. When failover occurs, the passive node becomes the active node. However, the replication slaves then stop replicating.

The error from a log on one of the slaves is:

Jul 15 07:43:32 <host> mysqld[1339]: 090715  7:43:32 [Note] Slave I/O thread: connected to master '<user>@127.0.0.1:3307', replication started in log 'mysql-bin.000978' at position 23923243
Jul 15 07:43:32 <host> mysqld[1339]: 090715  7:43:32 [ERROR] Error reading packet from server: Could not find first log file name in binary log index file ( server_errno=1236)
Jul 15 07:43:32 <host> mysqld[1339]: 090715  7:43:32 [ERROR] Got fatal error 1236: 'Could not find first log file name in binary log index file' from master when reading data from binary log
Jul 15 07:43:32 <host> mysqld[1339]: 090715  7:43:32 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.000978', position 23923243

I do not think this is an SSH tunnel issue. I believe it is caused by inconsistent binary log file names and positions between the two nodes, probably because one of the nodes has been in operation a lot longer than the other. After failover, the slaves ask the newly active node for 'mysql-bin.000978', a file name that presumably exists only on the previously active node.

At the moment I have to get replication going again by dumping the master databases, re-importing them on the slave hosts and bootstrapping the slaves.

What is the best way to make this consistent and ensure that replication continues smoothly after a failover (and failback) event?

Thank you,
Imran Chaudhry
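
P.S. To make my suspicion concrete, here is a minimal Python sketch of what I think the newly active node is doing when it rejects the slave. It is only an illustration, not MySQL's actual code: the function names are my own, and the two index listings are made-up examples of what the binary log index file on each node might contain. Only the log line is taken from the error output above.

```python
import os
import re

# Pulls the binlog file name and position out of the slave's
# "[Note] Slave I/O thread: connected to master ..." log line.
START_RE = re.compile(r"replication started in log '([^']+)' at position (\d+)")


def parse_start_coordinates(log_line):
    """Return (binlog_file, position) from a slave log line, or None."""
    match = START_RE.search(log_line)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def index_contains(index_text, binlog_file):
    """True if binlog_file is listed in the text of a binary log index file.

    The index file lists one binlog path per line; only the base name
    is compared here.
    """
    names = {
        os.path.basename(line.strip())
        for line in index_text.splitlines()
        if line.strip()
    }
    return binlog_file in names


# Log line as reported by the slave (from the error output above).
LOG_LINE = (
    "[Note] Slave I/O thread: connected to master '<user>@127.0.0.1:3307', "
    "replication started in log 'mysql-bin.000978' at position 23923243"
)

# Hypothetical index contents: the long-running node has reached file 978,
# while the newer node is still at a much lower number.
OLD_ACTIVE_INDEX = "./mysql-bin.000977\n./mysql-bin.000978\n"
NEW_ACTIVE_INDEX = "./mysql-bin.000041\n./mysql-bin.000042\n"

if __name__ == "__main__":
    binlog_file, position = parse_start_coordinates(LOG_LINE)
    for label, index_text in (("old active", OLD_ACTIVE_INDEX),
                              ("new active", NEW_ACTIVE_INDEX)):
        found = index_contains(index_text, binlog_file)
        print(f"{label}: {binlog_file} listed in index: {found}")
```

If this picture is right, the slaves' saved coordinates only make sense on the node they were originally replicating from, so after a failover they would need coordinates that are valid on the other node.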