2015-06-25 14:02 GMT+02:00 wagnerbianchi.com <m...@wagnerbianchi.com>:
> Some additional information here, just my 2 cents.
>
> (...)
>
> Just checking in: using two servers in replication, idle servers, on the
> slave side I configured globally the slave_net_timeout=1 and log_warnings=2,
> as I'm using 5.6 for these tests. The interest here is to check the
> reconnection made by the slave and with that, the restart of Binlog Dump
> Thread on the master. Looking at the MySQL Error log...
>
> #: slave error log - reported every 5 secs
> 2015-06-25 11:38:21 2598 [Warning] Storing MySQL user name or password
> information in the master info repository is not secure and is therefore not
> recommended. Please consider using the USER and PASSWORD connection options
> for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more
> information.
> 2015-06-25 11:38:26 2598 [Warning] Storing MySQL user name or password
> information in the master info repository is not secure and is therefore not
> recommended. Please consider using the USER and PASSWORD connection options
> for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more
> information.
> 2015-06-25 11:38:31 2598 [Warning] Storing MySQL user name or password
> information in the master info repository is not secure and is therefore not
> recommended. Please consider using the USER and PASSWORD connection options
> for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more
> information.
>
> Here we can see clearly that the slave is reconnecting every sec, since
> those messages appearing on the error log are showing us the same behavior
> that happens when one issues a START SLAVE (with no user and password and
> even SSL) using e.g. the mysql client.
>
> #: master side error log - reported every 3 secs
> 2015-06-25 11:41:05 2648 [Note] Start binlog_dump to master_thread_id(120)
> slave_server(256380), pos(mysql-bin.000002, 120)
> 2015-06-25 11:41:08 2648 [Note] Start binlog_dump to master_thread_id(121)
> slave_server(256380), pos(mysql-bin.000002, 120)
> 2015-06-25 11:41:11 2648 [Note] Start binlog_dump to master_thread_id(122)
> slave_server(256380), pos(mysql-bin.000002, 120)
>
> We can see that the master Binlog Dump Thread is re-initialized as well when
> the Slave I/O Thread reconnects.
>
> BTW, SHOW PROCESSLIST takes at least 10 seconds to report that a slave has
> died, while we can see that a new connection is made by observing the
> increment of the thread id.
>
> (...)
>
> Even with the slave reconnecting every second, the slave error log
> reports that reconnection every 5 secs, SHOW PROCESSLIST reports a new
> thread id every 10 secs, and the master reports the start of the Binlog Dump
> Thread every 3 secs.
>
> From here, we need to investigate more...
>
Wagner, thank you very much for your experience.
And when you make your network link die (remove cable / ifconfig down /
iptables...) between your master and your slave, how long does your
Binlog Dump process stay up?

> 2015-06-25 2:48 GMT-03:00 Ben RUBSON <ben.rub...@gmail.com>:
>>
>> 2015-06-22 13:45 GMT+02:00 Ben RUBSON <ben.rub...@gmail.com>:
>>
>> > 2015-06-19 12:08 GMT+02:00 Ben RUBSON <ben.rub...@gmail.com>:
>> >>
>> >> 2015-06-18 22:52 GMT+02:00 shawn l.green <shawn.l.gr...@oracle.com>:
>> >>>
>> >>> On 6/18/2015 2:10 PM, Ben RUBSON wrote:
>> >>>>
>> >>>> Hello,
>> >>>>
>> >>>> In order for the slave to quickly show a communication issue between
>> >>>> the master and the slave, I set slave_net_timeout to 10.
>> >>>> "show slave status" then quickly updates, perfect.
>> >>>>
>> >>>> I would also like the master to quickly show when the slave is no
>> >>>> longer reachable.
>> >>>>
>> >>>> However, "show processlist" and "show slave hosts" take a very long
>> >>>> time to update their status when the slave has gone.
>> >>>> Is there any way to have a refresh rate of about 10 seconds, as I did
>> >>>> on the slave side?
>> >>>
>> >>> There are two situations to consider
>> >>>
>> >>> 1) The slave is busy re-trying. It will do this a number of times then
>> >>> eventually disconnect itself. If it does disconnect itself, the
>> >>> processlist report will show it as soon as that happens.
>> >>
>> >> Yes, I confirm.
>> >>
>> >>> 2) The connection between the master and slave died (or the slave
>> >>> itself is lost). In this case, the server did not receive any "I am
>> >>> going to disconnect" message from its client (the slave). So as far as
>> >>> the server is concerned, it is simply sitting in a wait expecting the
>> >>> client to eventually send in a new command packet.
>> >>>
>> >>> That wait is controlled by --wait-timeout.
>> >>> Once an idle client connection hits that limit, the server is
>> >>> programmed to think "the idiot on the other end of this call has hung
>> >>> up on me" so it simply closes its end of the socket. There are
>> >>> actually two different timers that could be used, --wait-timeout or
>> >>> --interactive-timeout, and which one is used to monitor the idle socket
>> >>> depends entirely on whether the client did or did not set the
>> >>> 'interactive flag' when it formed the connection. MySQL slaves do not
>> >>> use that flag.
>> >>>
>> >>> Now, if the line between the two systems died in the middle of a
>> >>> conversation (an actual data transfer) then a shorter
>> >>> --net-write-timeout or --net-read-timeout would expire and the session
>> >>> would die then.
>> >>
>> >> This is the interesting part yes, when the connection dies (whatever
>> >> the link status is at this moment, idle or not).
>> >> So I set wait_timeout=10.
>> >>
>> >> When the link is up, we clearly see that the idle connection is reset
>> >> every 10 seconds: the "show processlist" clearly shows that the slave
>> >> TCP source port changes, and time is reset from 10 to 0.
>> >> Perfect.
>> >
>> > Well this behavior is due to slave_net_timeout, not to wait_timeout.
>> > So none of wait_timeout, interactive_timeout (expected),
>> > net_read_timeout or net_write_timeout helped.
>> >
>> >> However, when the link dies, the "Binlog Dump" process stays in the
>> >> "show processlist", I have to wait more than 1000 seconds for it to
>> >> disappear.
>> >> I made tests adding interactive_timeout=10, net_read_timeout=10 and
>> >> net_write_timeout=10, however the behavior is the same.
>> >>
>> >> Did I miss something?
>> >>
>> >> Of course the goal is to monitor replication, from the slave (done and
>> >> working thanks to slave_net_timeout), but from the master too (some
>> >> more tuning needed), as we never know which one will be able to
>> >> transmit the alert properly.
>> >>
>> >> Thank you very much Shawn.
>>
>> Hello,
>>
>> Would you have any further advice on this topic please?
>>
>> Thank you again,
>>
>> Best regards,
>>
>> Ben

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/mysql
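
For reference, the restart cadence visible in the master-side log quoted in
this thread can be checked mechanically. The following is a minimal Python
sketch and is not part of the original messages: it parses the three
"Start binlog_dump" lines as quoted (with the mail wrapping undone) and
computes the gap between consecutive Binlog Dump starts. The regular
expression and the helper names binlog_dump_starts and restart_intervals are
assumptions written against those three lines only; other MySQL versions or
log_warnings settings may format the line differently.

```python
import re
from datetime import datetime

# Master error-log lines copied from the thread. Assumption: each entry was
# a single line in the real log and was only wrapped by the mail client.
MASTER_LOG = """\
2015-06-25 11:41:05 2648 [Note] Start binlog_dump to master_thread_id(120) slave_server(256380), pos(mysql-bin.000002, 120)
2015-06-25 11:41:08 2648 [Note] Start binlog_dump to master_thread_id(121) slave_server(256380), pos(mysql-bin.000002, 120)
2015-06-25 11:41:11 2648 [Note] Start binlog_dump to master_thread_id(122) slave_server(256380), pos(mysql-bin.000002, 120)
"""

# Hypothetical pattern, derived only from the three lines above.
DUMP_START = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \d+ \[Note\] "
    r"Start binlog_dump to master_thread_id\((\d+)\) slave_server\((\d+)\)"
)


def binlog_dump_starts(log_text):
    """Return (timestamp, master_thread_id, slave_server_id) per matching line."""
    events = []
    for line in log_text.splitlines():
        match = DUMP_START.match(line)
        if match:
            ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            events.append((ts, int(match.group(2)), int(match.group(3))))
    return events


def restart_intervals(events):
    """Seconds elapsed between consecutive Binlog Dump starts."""
    return [
        int((later[0] - earlier[0]).total_seconds())
        for earlier, later in zip(events, events[1:])
    ]


events = binlog_dump_starts(MASTER_LOG)
intervals = restart_intervals(events)
thread_ids = [thread_id for _, thread_id, _ in events]
print(intervals)
print(thread_ids)
```

With the quoted lines this prints [3, 3] and [120, 121, 122], which matches
the 3-second cadence and the incrementing thread id described in the thread.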