On Monday 29 April 2002 04:39 am, David Harper wrote:
> >Description:
> I'm running the pre-compiled Compaq Alpha (OSF1) version of MySQL 3.23.49
> with master/slave replication.
>
> The master mysqld is running on one machine, the slave on another.
> Everything works fine until I shutdown the master server. The slave
> then immediately crashes due to a segmentation violation fault. Here
> are the lines from the mysqld_multi.log file:
>
> /nfs/pathsoft/external/mysql-3.23.49/libexec/mysqld: ready for connections
> 020426 11:26:15 Slave: connected to master 'slave@babel:14641', replication started in log 'mysql.001' at position 156
> 020426 11:27:03 Slave: received 0 length packet from server, apparent master shutdown: (0)
> 020426 11:27:03 Slave: Failed reading log event, reconnecting to retry, log 'mysql.002' position 73
> mysqld got signal 11;
>
> I can make the slave server crash *every* time it loses its connection
> to the master server.
>
> It's not a hardware problem on one machine, because I have run the master
> and slave servers on several combinations of machines and the slave
> crashes *every* time.
>
> It might help you to know that when I run a slave server on an i386 Linux
> machine, it survives when the master server on the Alpha machine is shut down,
> and it happily reconnects when I restart the master server.
>
> This leads me to think that the problem is in the slave code, and is specific
> to the build for Compaq Alphas.
>
> I built mysqld from the source code with the --with-debug option specified
> to the configure script. Then I duplicated the slave server crash and found
> that the problem is in the code which tried to re-connect to the master.
> Specifically, the SEGV fault occurs within call to gethostbyname_r.
> Here is the debugger traceback:
>
> (ladebug) where
> >0 0x12025a538 in __nxm_thread_kill(0x20000f3f8c8, 0xb, 0x1, 0x1, 0x25, 0x20000f3f600) in /nfs/pathsoft/external/mysql-3.23.49-src/libexec/mysqld
> #1 0x120242ac4 in pthread_kill(0x20000f3f8c8, 0xb, 0x1, 0x1, 0x25, 0x20000f3f600) in /nfs/pathsoft/external/mysql-3.23.49-src/libexec/mysqld
> #2 0x1201b48d4 in write_core(sig=11) "stacktrace.c":220
> #3 0x120103f48 in handle_segfault(sig=11) "mysqld.cc":1287
> #4 0x120287bcc in __sigtramp(0x20000f3f8c8, 0xb, 0x1, 0x1, 0x25, 0x20000f3f600) in /nfs/pathsoft/external/mysql-3.23.49-src/libexec/mysqld
> #5 0x1202b2b48 in rewind(0x20000000199, 0x20000f3b418, 0x20000f3b318, 0x20000f3b418, 0x0, 0x1) in /nfs/pathsoft/external/mysql-3.23.49-src/libexec/mysqld
> #6 0x1202928e4 in UnknownProcedure1FromFile1780(0x20000000199, 0x20000f3b418, 0x20000f3b318, 0x20000f3b418, 0x0, 0x1) in /nfs/pathsoft/external/mysql-3.23.49-src/libexec/mysqld
> #7 0x120293330 in UnknownProcedure13FromFile1780(0x20000000199, 0x20000f3b418, 0x20000f3b318, 0x20000f3b418, 0x0, 0x1) in /nfs/pathsoft/external/mysql-3.23.49-src/libexec/mysqld
> #8 0x1202953e0 in __gethostbyname_r(0x20000000199, 0x20000f3b418, 0x20000f3b318, 0x20000f3b418, 0x0, 0x1) in /nfs/pathsoft/external/mysql-3.23.49-src/libexec/mysqld
> #9 0x12021c6e4 in my_gethostbyname_r(name=0x1400788d8="babel", result=0x20000f3b318, buffer=0x20000f3b418="", buflen=8840, h_errnop=0x20000f3b338) "my_pthread.c":440
> #10 0x1201b27ec in mc_mysql_connect(mysql=0x20000f3d6e8, host=0x1400788d8="babel", user=0x1400994a0="slave", passwd=0x140098320="mylittlesecret", db=0x0, port=14641, unix_socket=0x0, client_flag=133) "mini_client.cc":622
> #11 0x1201b20ac in mc_mysql_reconnect(mysql=0x14009fb00) "mini_client.cc":416
> #12 0x1201ae040 in safe_reconnect(thd=0x140079400, mysql=0x14009fb00, mi=0x14005bb20) "slave.cc":1517
> #13 0x1201adae8 in handle_slave(arg=0x0) "slave.cc":1384
> #14 0x12023f648 in __thdBase(0x20000000199, 0x20000f3b418, 0x20000f3b318, 0x20000f3b418, 0x0, 0x1) in /nfs/pathsoft/external/mysql-3.23.49-src/libexec/mysqld
David:

At this point, I have two theories:

a) There is something wrong with your gethostbyname_r function.

b) MySQL has a subtle buffer overrun (probably only a couple of bytes) that in your case happens to corrupt some critical structures in __gethostbyname_r.

If a) is the case, I would first test how well gethostbyname_r handles sequences of repeated calls. I would imagine the bug will manifest only with a certain name resolution setup.

As a temporary workaround, I would suggest trying a numeric IP, or a different name resolution configuration (e.g. put the master in /etc/hosts instead of the name server, or vice versa).

--
MySQL Development Team
For technical support contracts, visit https://order.mysql.com/?ref=mspa

Sasha Pachev <[EMAIL PROTECTED]>
MySQL AB, http://www.mysql.com/
Provo, Utah, USA
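
P.S. To check theory a), repeated calls to gethostbyname_r can be exercised outside of mysqld with something like the rough sketch below. It assumes the glibc-style six-argument gethostbyname_r; other platforms, possibly including the OSF1 system in this report, declare the function with different arguments, so the call may need adapting. The helper name stress_resolve and the 8192-byte buffer are my own illustrative choices, not anything taken from the MySQL source.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* exposes gethostbyname_r() in glibc's <netdb.h> */
#endif
#include <assert.h>
#include <netdb.h>
#include <stdio.h>

/* Hypothetical helper: resolve `name` `iterations` times in a row with the
   glibc-style gethostbyname_r() and return how many lookups succeeded.
   A crash or a sudden run of failures partway through the loop would point
   at the resolver rather than at mysqld. */
static int stress_resolve(const char *name, int iterations)
{
    int ok = 0;
    int i;

    assert(name != NULL && iterations >= 0);

    for (i = 0; i < iterations; i++) {
        struct hostent he;               /* filled in by the resolver */
        struct hostent *result = NULL;   /* set to &he on success, NULL otherwise */
        char buf[8192];                  /* scratch space (size is an assumption) */
        int herr = 0;                    /* receives the h_errno-style error code */

        int rc = gethostbyname_r(name, &he, buf, sizeof buf, &result, &herr);
        if (rc == 0 && result != NULL)
            ok++;
        else
            fprintf(stderr, "lookup %d of '%s' failed: rc=%d herr=%d\n",
                    i, name, rc, herr);
    }
    return ok;
}
```

Calling stress_resolve("babel", 1000) from a small main(), once with the master listed in /etc/hosts and once with it resolved through the name server, should show whether the failure depends on the name resolution setup.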