A NOTE has been added to this issue. ====================================================================== http://www.dbmail.org/mantis/view.php?id=879 ====================================================================== Reported By: jpecar Assigned To: ====================================================================== Project: DBMail Issue ID: 879 Category: Database layer Reproducibility: always Severity: major Priority: normal Status: new target: ====================================================================== Date Submitted: 31-Jan-11 16:21 CET Last Modified: 15-Feb-11 16:27 CET ====================================================================== Summary: login hangs if DB unavailable Description: I'm setting up a HA environment of dbmail instances over replicated mysql. I want to monitor health by connecting to pop port and issuing login statements, fully expecting some kind of error msg if db has gone away. However, I see that login hangs indefinitely after pass is given, without any feedback. ======================================================================
---------------------------------------------------------------------- (0003156) paul (administrator) - 31-Jan-11 16:33 http://www.dbmail.org/mantis/view.php?id=879#c3156 ---------------------------------------------------------------------- I'm not seeing this behaviour on git-master. I'm seeing immediate network disconnects. Yes, providing an informative error would be nicer, but terminating the connection at least will tell you something is amiss. ---------------------------------------------------------------------- (0003158) jpecar (reporter) - 01-Feb-11 12:04 http://www.dbmail.org/mantis/view.php?id=879#c3158 ---------------------------------------------------------------------- Just compiled fresh code from git and it behaves the same here. Here's the strace output: .... [pid 3306] read(14, "user test2\r\n", 4095) = 12 [pid 3306] read(14, 0x7fffdd61be40, 4095) = -1 EAGAIN (Resource temporarily unavailable) [pid 3306] write(14, "+OK Password required for test2\r\n", 33) = 33 [pid 3306] epoll_ctl(8, EPOLL_CTL_MOD, 14, {EPOLLIN|EPOLLOUT, {u32=14, u64=14}}) = 0 [pid 3306] epoll_wait(8, {{EPOLLOUT, {u32=14, u64=14}}}, 32, 53841) = 1 [pid 3306] clock_gettime(CLOCK_MONOTONIC, {1616277, 212805117}) = 0 [pid 3306] epoll_ctl(8, EPOLL_CTL_MOD, 14, {EPOLLIN, {u32=14, u64=14}}) = 0 [pid 3306] epoll_wait(8, {{EPOLLIN, {u32=14, u64=14}}}, 32, 53834) = 1 [pid 3306] clock_gettime(CLOCK_MONOTONIC, {1616279, 477993117}) = 0 [pid 3306] read(14, "pass test2\r\n", 4095) = 12 [pid 3306] read(14, 0x7fffdd61be40, 4095) = -1 EAGAIN (Resource temporarily unavailable) [pid 3306] poll([{fd=3, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout) [pid 3306] write(3, "\1\0\0\0\16", 5) = 5 [pid 3306] read(3, <unfinished ...> [pid 3308] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) [pid 3308] futex(0x1a57c3d8, FUTEX_WAIT_PRIVATE, 2, NULL And lsof confirms that fd3 is indeed tcp connection to mysql. Telnet session to dbmail-pop3d returns nothing after pass is given, just sits there. One thing I need to verify - I have dbmail compiled against libmysqlclient15 from mysql 5.0, while I have mysq 5.1 on server with client version 16. ---------------------------------------------------------------------- (0003159) jpecar (reporter) - 01-Feb-11 13:19 http://www.dbmail.org/mantis/view.php?id=879#c3159 ---------------------------------------------------------------------- I've recompiled libzdb to link against mysqlclient 16, but I'm still observing the same behaviour: .... [pid 3306] 13:02:55 read(4, "user test2\r\n", 4095) = 12 [pid 3306] 13:02:55 read(4, 0x7fffdd61be40, 4095) = -1 EAGAIN (Resource temporarily unavailable) [pid 3306] 13:02:55 write(4, "+OK Password required for test2\r\n", 33) = 33 [pid 3306] 13:02:55 epoll_ctl(8, EPOLL_CTL_MOD, 4, {EPOLLIN|EPOLLOUT, {u32=4, u64=4}}) = 0 [pid 3306] 13:02:55 epoll_wait(8, {{EPOLLOUT, {u32=4, u64=4}}}, 32, 43147) = 1 [pid 3306] 13:02:55 clock_gettime(CLOCK_MONOTONIC, {1620237, 726346117}) = 0 [pid 3306] 13:02:55 epoll_ctl(8, EPOLL_CTL_MOD, 4, {EPOLLIN, {u32=4, u64=4}}) = 0 [pid 3306] 13:02:55 epoll_wait(8, {{EPOLLIN, {u32=4, u64=4}}}, 32, 43140) = 1 [pid 3306] 13:02:57 clock_gettime(CLOCK_MONOTONIC, {1620239, 367304117}) = 0 [pid 3306] 13:02:57 read(4, "pass test2\r\n", 4095) = 12 [pid 3306] 13:02:57 read(4, 0x7fffdd61be40, 4095) = -1 EAGAIN (Resource temporarily unavailable) [pid 3306] 13:02:57 poll([{fd=3, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout) [pid 3306] 13:02:57 write(3, "\1\0\0\0\16", 5) = 5 [pid 3306] 13:02:57 read(3, <unfinished ...> [pid 3308] 13:03:19 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) [pid 3308] 13:03:19 futex(0x1a57c3d8, FUTEX_WAIT_PRIVATE, 2, NULL waiting ... and then I unblock mysql in iptables: <unfinished ...> [pid 3306] 13:12:27 <... read resumed> "\7\0\0\1\0\0\0\2\0\0\0", 16384) = 11 [pid 3306] 13:12:27 futex(0x1a57c3d8, FUTEX_WAKE_PRIVATE, 1 <unfinished ....> [pid 3308] 13:12:27 <... futex resumed> ) = 0 [pid 3306] 13:12:27 <... futex resumed> ) = 1 [pid 3308] 13:12:27 futex(0x1a57c3d8, FUTEX_WAKE_PRIVATE, 1 <unfinished ....> [pid 3306] 13:12:27 poll([{fd=3, events=POLLIN|POLLPRI}], 1, 0 <unfinished ...> [pid 3308] 13:12:27 <... futex resumed> ) = 0 [pid 3306] 13:12:27 <... poll resumed> ) = 0 (Timeout) [pid 3308] 13:12:27 clock_gettime(CLOCK_REALTIME, <unfinished ...> [pid 3306] 13:12:27 write(3, "0\0\0\0\26SELECT 1=1 FROM dbmail_usermap LIMIT 1 OFFSET 0", 52 <unfinished ...> [pid 3308] 13:12:27 <... clock_gettime resumed> {1296562347, 953367000}) = 0 [pid 3306] 13:12:27 <... write resumed> ) = 52 [pid 3308] 13:12:27 futex(0x1a57c3ac, FUTEX_WAIT_PRIVATE, 49, {59, 46633000} <unfinished ...> [pid 3306] 13:12:27 read(3, "\f\0\0\1\0\f\0\0\0\1\0\0\0\0\0\0\31\0\0\2\3def\0\0\0\0031=1\0\f?\0\1\0\0\0\10\201\0\0\0\0\5\0\0\3\376\0\0\2\0", 16384) = 54 [pid 3306] 13:12:27 poll([{fd=3, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout) [pid 3306] 13:12:27 write(3, "\n\0\0\0\27\f\0\0\0\1\1\0\0\0", 14) = 14 [pid 3306] 13:12:27 read(3, "\1\0\0\1\1\34\0\0\2\3def\0\0\0\0031=1\0031=1\f?\0\1\0\0\0\3\1\0\0\0\0\5\0\0\3\376\0\0b\0", 16384) = 46 [pid 3306] 13:12:27 poll([{fd=3, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout) [pid 3306] 13:12:27 write(3, "\5\0\0\0\31\f\0\0\0", 9) = 9 [pid 3306] 13:12:27 poll([{fd=3, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout) [pid 3306] 13:12:27 write(3, "\1\0\0\0\16", 5) = 5 [pid 3306] 13:12:27 read(3, "\7\0\0\1\0\0\0\2\0\0\0", 16384) = 11 [pid 3306] 13:12:27 poll([{fd=3, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout) [pid 3306] 13:12:27 write(3, "x\0\0\0\26SELECT login, sock_allow, sock_deny, userid FROM dbmail_usermap WHERE login in (?,'ANY') ORDER BY sock_allow, sock_deny", 124) = 124 .... and am of course greeted with "-ERR I'm leaving, you're too slow" :) Now I'd like to figure out what's going on here. Can you point me to the code I shoud start reading? Is it in dbmail or in libzdb? ---------------------------------------------------------------------- (0003160) paul (administrator) - 01-Feb-11 14:01 http://www.dbmail.org/mantis/view.php?id=879#c3160 ---------------------------------------------------------------------- Looks like imap is doing Connection_ping, but none of the other daemons is. Will fix. ---------------------------------------------------------------------- (0003161) paul (administrator) - 01-Feb-11 14:11 http://www.dbmail.org/mantis/view.php?id=879#c3161 ---------------------------------------------------------------------- I've pushed a partial fix for this. It doesn't yet try to reconnect when the database becomes reachable again. Feel free to improve this. ---------------------------------------------------------------------- (0003162) jpecar (reporter) - 07-Feb-11 13:57 http://www.dbmail.org/mantis/view.php?id=879#c3162 ---------------------------------------------------------------------- The fix doesn't really achieve what we intend. Now when the access to db is rejected with iptables, connection to pop3d hangs even before displaying the pop3 banner. I guess logging at 511 is more readable than strace: Feb 7 13:42:48 email-test-01 dbmail/pop3d[6638]: [0x706e3c0] Debug:[server] _sock_cb(+467): 12 2, 0x70c3e80, ssl:N Feb 7 13:42:48 email-test-01 dbmail/pop3d[6638]: [0x706e3c0] Info:[server] _sock_cb(+507): connection accepted Feb 7 13:42:48 email-test-01 dbmail/pop3d[6638]: [0x706e3c0] Debug:[clientbase] client_init(+162): saddr [0x70c8980] sa_family [2] len [16] Feb 7 13:42:48 email-test-01 dbmail/pop3d[6638]: [0x706e3c0] Notice:[clientbase] client_init(+167): incoming connection on [127.0.0.1:110] Feb 7 13:42:48 email-test-01 dbmail/pop3d[6638]: [0x706e3c0] Debug:[clientbase] client_init(+170): caddr [0x70c88f0] sa_family [2] len [16] Feb 7 13:42:48 email-test-01 dbmail/pop3d[6638]: [0x706e3c0] Notice:[clientbase] client_init(+185): incoming connection from [127.0.0.1:35941] Feb 7 13:42:48 email-test-01 dbmail/pop3d[6638]: [0x706e3c0] Debug:[misc] create_unique_id(+127): created: 2786c3a51cbdaee3090727751bf1a186 Feb 7 13:42:48 email-test-01 dbmail/pop3d[6638]: [0x706e3c0] Debug:[clientbase] ci_uncork(+212): [0x70cead0] And then nothing happens until I unblock db access: Feb 7 13:50:18 email-test-01 dbmail/pop3d[6638]: [0x706e3c0] Database:[db] db_con_get(+256): [0x7090610] connection from pool Feb 7 13:50:18 email-test-01 dbmail/pop3d[6638]: [0x706e3c0] Database:[db] db_con_close(+274): [0x7090610] connection to pool Feb 7 13:50:18 email-test-01 dbmail/pop3d[6638]: [0x706e3c0] Info:[clientbase] ci_write(+311): [0x70cead0] S > [87/87:+OK DBMAIL pop3 server ready to rock <2786c3a51cbdaee3090727751bf1a ]6@email-test-01> Feb 7 13:50:18 email-test-01 dbmail/pop3d[6638]: [0x706e3c0] Debug:[clientsession] socket_write_cb(+161): [0x70c8ad0] state: [0] Feb 7 13:50:18 email-test-01 dbmail/pop3d[6638]: [0x706e3c0] Debug:[pop3] pop3_cb_write(+244): [0x70c8ad0] state: [0] Feb 7 13:50:18 email-test-01 dbmail/pop3d[6638]: [0x706e3c0] Debug:[clientsession] socket_write_cb(+169): reset timeout [60] Feb 7 13:50:18 email-test-01 dbmail/pop3d[6638]: [0x706e3c0] Info:[clientbase] ci_write(+311 ] [0x70cead0] S > [35/35:-ERR I'm leaving, you're too slow Feb 7 13:50:18 email-test-01 dbmail/pop3d[6638]: [0x706e3c0] Debug:[clientsession] socket_write_cb(+161): [0x70c8ad0] state: [5] Feb 7 13:50:18 email-test-01 dbmail/pop3d[6638]: [0x706e3c0] Debug:[pop3] pop3_cb_write(+244): [0x70c8ad0] state: [5] Feb 7 13:50:18 email-test-01 dbmail/pop3d[6638]: [0x706e3c0] Debug:[clientsession] client_session_bailout(+108): [0x70c8ad0] Feb 7 13:50:18 email-test-01 dbmail/pop3d[6638]: [0x706e3c0] Debug:[clientbase] ci_close(+491): closing clientbase [0x70cead0] Feb 7 13:50:18 email-test-01 dbmail/pop3d[6638]: [0x706e3c0] Debug:[clientbase] ci_cork(+205): [0x70cead0] Do you have a picture of your state diagram or something like that that would speed up the thinking where to strategicaly place dm_db_ping checks? ---------------------------------------------------------------------- (0003163) jpecar (reporter) - 15-Feb-11 16:27 http://www.dbmail.org/mantis/view.php?id=879#c3163 ---------------------------------------------------------------------- I'm proceeding to write my monitoring scripts around the behaviour that we have now - no banner if no db connection. So please note in changelog if/when that will change. Thanks. Issue History Date Modified Username Field Change ====================================================================== 31-Jan-11 16:21 jpecar New Issue 31-Jan-11 16:33 paul Note Added: 0003156 01-Feb-11 12:04 jpecar Note Added: 0003158 01-Feb-11 13:19 jpecar Note Added: 0003159 01-Feb-11 14:01 paul Note Added: 0003160 01-Feb-11 14:11 paul Note Added: 0003161 07-Feb-11 13:57 jpecar Note Added: 0003162 15-Feb-11 16:27 jpecar Note Added: 0003163 ====================================================================== _______________________________________________ Dbmail-dev mailing list [email protected] http://mailman.fastxs.nl/cgi-bin/mailman/listinfo/dbmail-dev
