Re: Replication still stopping...
> A couple of thoughts. Do you have slaves with duplicated server IDs? That seems most likely to me.

Nope. I've got one master and one slave. The server ID is set to 1 on the master, and it's set to 2 on the slave.

> If that's not it, is the max_packet_size mismatched on the master and slave?

I don't find max_packet_size in the My.ini file on either server, and when I do a SHOW VARIABLES on both, max_packet_size is not listed on either of them.

> Can you connect to the master and view the binary log event at the position it's trying to read, with SHOW BINLOG EVENTS?

That's where things get squirrelly. The position it reports always seems to be incorrect. For instance, when this was happening previously, I know that it had made it to a later position in the log. However, when replication stopped, it reported a position earlier in the file. This one, for instance, reports position 195. The nearest event I have starts at position 98 and ends at position 1032. This is an UPDATE statement. If my logic is not flawed, I'm thinking that I should follow it starting at 98 until I get to position 195. When I do that, I come to: RegOpenDate = '2007-11-05 00:00:00', which is part of the UPDATE statement. This appears normal to me. I've checked, and it is a DateTime field, and it is exactly the same on both the master and slave.

> Can you use the mysqlbinlog tool to verify that the binary log isn't corrupted on the master?

I've dumped the log to a text file. What, exactly, should I look for? The only suspicious thing I see is the first entry:

# at 4
#071020 15:45:34 server id 1 end_log_pos 98 Start: binlog v 4, server v 5.0.17-nt-log created 071020 15:45:34 at startup
# Warning: this binlog was not closed properly. Most probably mysqld crashed writing it.
ROLLBACK;

I don't know why it would do this. However, I set the master_log_pos to 98 before restarting the slave after resetting it last time.
Thanks,
Jesse

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]
Re: [OT] Memory Usage on Windows? Re: Replication still stopping...
> As I can see, you are running MySQL on Windows. If I start my DB server (5.0.45/InnoDB/Win2k), the server uses about 80K handles (as seen in Task Manager) and memory usage grows to around 1 GB. Task Manager says that there is some swapping (the box has only 1 GB of RAM). The DB itself is small (~50 MB or so). My question is, did you see the same things on your box? Did you have performance issues resulting from the memory usage?

I can't even keep it running for longer than 24 hours, and I don't know why, so I haven't even started looking into memory issues or performance. When it is running, as a test, I change a record on the master, and almost immediately the same change is made on the slave. It works perfectly for a few hours, then it just stops working. It almost appears to be a network-related issue, but I can't seem to track it down.

Jesse
Re: Replication still stopping...
Jesse wrote:
>> A couple of thoughts. Do you have slaves with duplicated server IDs? That seems most likely to me.
>
> Nope. I've got one master and one slave. The server ID is set to 1 on the master, and it's set to 2 on the slave.
>
>> If that's not it, is the max_packet_size mismatched on the master and slave?
>
> I don't find max_packet_size in the My.ini file on either server, and when I do a SHOW VARIABLES on both, max_packet_size is not listed on either of them.

Whoops, I got the name wrong:

mysql> show variables like '%packet%';
+--------------------+----------+
| Variable_name      | Value    |
+--------------------+----------+
| max_allowed_packet | 16776192 |
+--------------------+----------+
1 row in set (0.00 sec)

>> Can you connect to the master and view the binary log event at the position it's trying to read, with SHOW BINLOG EVENTS?
>
> That's where things get squirrelly. The position it reports always seems to be incorrect. For instance, when this was happening previously, I know that it had made it to a later position in the log. However, when replication stopped, it reported a position earlier in the file. This one, for instance, reports position 195. The nearest event I have starts at position 98 and ends at position 1032. This is an UPDATE statement. If my logic is not flawed, I'm thinking that I should follow it starting at 98 until I get to position 195. When I do that, I come to: RegOpenDate = '2007-11-05 00:00:00', which is part of the UPDATE statement. This appears normal to me. I've checked, and it is a DateTime field, and it is exactly the same on both the master and slave.

That's strange. I'm not sure I understand what's happening there. Check the packet size, and let's come back to this if that's not the problem.

>> Can you use the mysqlbinlog tool to verify that the binary log isn't corrupted on the master?
>
> I've dumped the log to a text file. What, exactly, should I look for? The only suspicious thing I see is the first entry:
>
> # at 4
> #071020 15:45:34 server id 1 end_log_pos 98 Start: binlog v 4, server v 5.0.17-nt-log created 071020 15:45:34 at startup
> # Warning: this binlog was not closed properly. Most probably mysqld crashed writing it.
> ROLLBACK;

That's fine; it just means the log is still open. (It is still open, right?) If you run this on a log other than the newest one, you shouldn't see that. If there was corruption, the mysqlbinlog tool would have crashed.

Baron
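One way to check the "position 195" puzzle mechanically is to read the event boundaries out of the mysqlbinlog text dump and see where the reported position lands. The sketch below is only an illustration: it assumes the dump uses the usual "# at N" line followed by a header line containing "end_log_pos M", and the second event header in SAMPLE (its timestamp and thread_id) is made up for the example; only the positions 4, 98, 195 and 1032 come from this thread. Replication positions normally point at the start of an event, so a position that falls inside an event suggests the dump may be of a different file than the one named in the error log (the slave stopped in 'mysql-bin.07', while it was started in 'mysql-bin.06').

```python
import re

AT = re.compile(r"^# at (\d+)\s*$")          # "# at 98"
END = re.compile(r"end_log_pos (\d+)")       # "... end_log_pos 1032 ..."

# Hypothetical excerpt of a mysqlbinlog text dump. The first event is the one
# quoted in the thread; the second header line is invented for illustration.
SAMPLE = "\n".join([
    "# at 4",
    "#071020 15:45:34 server id 1  end_log_pos 98 Start: binlog v 4, server v 5.0.17-nt-log created 071020 15:45:34 at startup",
    "# Warning: this binlog was not closed properly. Most probably mysqld crashed writing it.",
    "ROLLBACK;",
    "# at 98",
    "#071020 15:46:10 server id 1  end_log_pos 1032 Query thread_id=5 exec_time=0 error_code=0",
])


def event_ranges(dump_text):
    """Return a list of (start, end) byte positions, one per event in the dump."""
    ranges = []
    start = None
    for line in dump_text.splitlines():
        m = AT.match(line)
        if m:
            start = int(m.group(1))
            continue
        m = END.search(line)
        if m and start is not None and line.startswith("#"):
            ranges.append((start, int(m.group(1))))
            start = None
    return ranges


def locate(position, ranges):
    """Classify a position as an event boundary, inside an event, or unknown."""
    for start, end in ranges:
        if position == start:
            return "boundary", (start, end)
        if start < position < end:
            return "inside", (start, end)
    if ranges and position == ranges[-1][1]:
        return "end of log", None
    return "not found", None


if __name__ == "__main__":
    print(locate(195, event_ranges(SAMPLE)))
```

To use it on a real dump, read the text file produced by mysqlbinlog and pass its contents to event_ranges() instead of SAMPLE.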
Replication still stopping...
I tried posting this on the Replication list, and got no response. Maybe someone here can help...

OK. Still battling this issue after weeks of working with it. I'm racking my brains. I reset the slave again on Saturday, and got replication started again. It was working fine until some time this afternoon. Before starting things up, I cleaned the error log out completely, so it would be clean before I started. Here is my error log in total:

071020 14:43:51 InnoDB: Started; log sequence number 0 142497221
071020 14:43:51 [Note] C:\Program Files\MySQL\MySQL Server 5.0\bin\mysqld-nt: ready for connections.
Version: '5.0.45-community-nt' socket: '' port: 3306 MySQL Community Edition (GPL)
071020 14:43:51 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.06' at position 98, relay log 'C:\Program Files\MySQL\MySQL Server 5.0\Data\dlgsrv-relay-bin.02' position: 235
071020 14:43:52 [Note] Slave I/O thread: connected to master '[EMAIL PROTECTED]:3306', replication started in log 'mysql-bin.06' at position 98
071020 15:43:32 [Note] Slave: received end packet from server, apparent master shutdown:
071020 15:43:32 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.06' position 98
071020 15:43:33 [ERROR] Slave I/O thread: error reconnecting to master '[EMAIL PROTECTED]:3306': Error: 'Can't connect to MySQL server on 'webserver' (10061)' errno: 2003 retry-time: 60 retries: 86400
071020 15:45:56 [Note] Slave: connected to master '[EMAIL PROTECTED]:3306',replication resumed in log 'mysql-bin.06' at position 98
071021 15:02:21 [Note] Slave SQL thread exiting, replication stopped in log 'mysql-bin.07' at position 195

I checked periodically on the server, and everything seemed to be working. The last time I checked was this morning, sometime around 8:00 or so. Still running. As you can see, however, it just stopped processing at 15:02:21 this afternoon. The master server was not down. I was in and out of web sites that use the MySQL database on the master several times, and it always worked just fine and never gave me an error.

It almost appears as though the slave cannot communicate with the master. It looks like it tried 86,400 times, which I guess took almost a day to do, and just gave up. Why would it be able to connect initially to the server, then suddenly not be able to connect any more?

Any help or suggestions anyone can offer would be greatly appreciated!

Jesse
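A small sketch for reading the timing out of the error log above. It rests on one assumption that nobody in this thread confirms: that "retry-time: 60 retries: 86400" in the [ERROR] line reports the configured retry interval in seconds and the maximum number of retries, not a count of attempts already made. Under that reading, the log shows one failed reconnect followed by a successful one a couple of minutes later, and the full retry budget would span far longer than a day. The function names are made up for this example; the three log lines are copied from the post.

```python
import re
from datetime import datetime

# Three entries copied from the error log in the post (master address is
# redacted in the archive and left as-is here).
LOG_LINES = [
    "071020 15:43:32 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.06' position 98",
    "071020 15:43:33 [ERROR] Slave I/O thread: error reconnecting to master '[EMAIL PROTECTED]:3306': Error: 'Can't connect to MySQL server on 'webserver' (10061)' errno: 2003 retry-time: 60 retries: 86400",
    "071020 15:45:56 [Note] Slave: connected to master '[EMAIL PROTECTED]:3306',replication resumed in log 'mysql-bin.06' at position 98",
]

# Timestamp prefix "YYMMDD HH:MM:SS". Entries with a space-padded single-digit
# hour are not handled by this sketch.
STAMP = re.compile(r"^(\d{6} \d{2}:\d{2}:\d{2}) ")
RETRY = re.compile(r"retry-time: (\d+)\s+retries: (\d+)")


def parse_stamp(line):
    """Parse the leading timestamp of an error log line."""
    m = STAMP.match(line)
    return datetime.strptime(m.group(1), "%y%m%d %H:%M:%S")


def outage_seconds(lines):
    """Seconds between the failed reconnect and the successful one."""
    failed = next(l for l in lines if "error reconnecting" in l)
    resumed = next(l for l in lines if "replication resumed" in l)
    return int((parse_stamp(resumed) - parse_stamp(failed)).total_seconds())


def retry_settings(lines):
    """Return (retry_time_seconds, max_retries) from the first matching line."""
    for l in lines:
        m = RETRY.search(l)
        if m:
            return int(m.group(1)), int(m.group(2))
    return None


def retry_window_days(retry_time, retries):
    """Total time covered if every retry were used, in days."""
    return retry_time * retries / 86400


if __name__ == "__main__":
    print(outage_seconds(LOG_LINES), "seconds between failure and reconnect")
    print(retry_window_days(*retry_settings(LOG_LINES)), "days of retries configured")
```

If that reading is right, the I/O thread did not give up after exhausting its retries; the last log entry is the SQL thread exiting a day later, which would be a separate event.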
Re: Replication still stopping...
Hi Jesse,

Jesse wrote:
> I tried posting this on the Replication list, and got no response. Maybe someone here can help...
>
> OK. Still battling this issue after weeks of working with it. I'm racking my brains. I reset the slave again on Saturday, and got replication started again. It was working fine until some time this afternoon. Before starting things up, I cleaned the error log out completely, so it would be clean before I started. Here is my error log in total:
>
> 071020 14:43:51 InnoDB: Started; log sequence number 0 142497221
> 071020 14:43:51 [Note] C:\Program Files\MySQL\MySQL Server 5.0\bin\mysqld-nt: ready for connections.
> Version: '5.0.45-community-nt' socket: '' port: 3306 MySQL Community Edition (GPL)
> 071020 14:43:51 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.06' at position 98, relay log 'C:\Program Files\MySQL\MySQL Server 5.0\Data\dlgsrv-relay-bin.02' position: 235
> 071020 14:43:52 [Note] Slave I/O thread: connected to master '[EMAIL PROTECTED]:3306', replication started in log 'mysql-bin.06' at position 98
> 071020 15:43:32 [Note] Slave: received end packet from server, apparent master shutdown:
> 071020 15:43:32 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.06' position 98
> 071020 15:43:33 [ERROR] Slave I/O thread: error reconnecting to master '[EMAIL PROTECTED]:3306': Error: 'Can't connect to MySQL server on 'webserver' (10061)' errno: 2003 retry-time: 60 retries: 86400
> 071020 15:45:56 [Note] Slave: connected to master '[EMAIL PROTECTED]:3306',replication resumed in log 'mysql-bin.06' at position 98
> 071021 15:02:21 [Note] Slave SQL thread exiting, replication stopped in log 'mysql-bin.07' at position 195
>
> I checked periodically on the server, and everything seemed to be working. The last time I checked was this morning, sometime around 8:00 or so. Still running. As you can see, however, it just stopped processing at 15:02:21 this afternoon. The master server was not down. I was in and out of web sites that use the MySQL database on the master several times, and it always worked just fine and never gave me an error.
>
> It almost appears as though the slave cannot communicate with the master. It looks like it tried 86,400 times, which I guess took almost a day to do, and just gave up. Why would it be able to connect initially to the server, then suddenly not be able to connect any more?

A couple of thoughts. Do you have slaves with duplicated server IDs? That seems most likely to me. If that's not it, is the max_packet_size mismatched on the master and slave? Can you connect to the master and view the binary log event at the position it's trying to read, with SHOW BINLOG EVENTS? Can you use the mysqlbinlog tool to verify that the binary log isn't corrupted on the master?

Baron
[OT] Memory Usage on Windows? Re: Replication still stopping...
Hi Jesse,

> 071020 14:43:51 InnoDB: Started; log sequence number 0 142497221
> 071020 14:43:51 [Note] C:\Program Files\MySQL\MySQL Server 5.0\bin\mysqld-nt: ready for connections.

As I can see, you are running MySQL on Windows. If I start my DB server (5.0.45/InnoDB/Win2k), the server uses about 80K handles (as seen in Task Manager) and memory usage grows to around 1 GB. Task Manager says that there is some swapping (the box has only 1 GB of RAM). The DB itself is small (~50 MB or so).

My question is, did you see the same things on your box? Did you have performance issues resulting from the memory usage?

Thanks,
Ralf