The following issue has been RESOLVED.
======================================================================
http://www.dbmail.org/mantis/view.php?id=1043
======================================================================
Reported By:                jamesgreig
Assigned To:
======================================================================
Project:                    DBMail
Issue ID:                   1043
Category:                   POP3 daemon
Reproducibility:            random
Severity:                   crash
Priority:                   normal
Status:                     resolved
target:
Resolution:                 fixed
Fixed in Version:
======================================================================
Date Submitted:             11-Feb-14 11:23 CET
Last Modified:              17-Feb-14 16:53 CET
======================================================================
Summary:                    exited on signal 11
Description:
Since rebuilding a number of dbmail machines onto FreeBSD 9.1 and 9.2 with dbmail 3.1.10 we are seeing random crashes of the pop3d daemon as below:-

Feb 11 09:09:11 redacted-db-4a kernel: pid 1577 (dbmail-pop3d), uid 65534: exited on signal 11

Each time it fails on signal 11.
======================================================================

----------------------------------------------------------------------
 (0003637) jamesgreig (reporter) - 11-Feb-14 12:51
 http://www.dbmail.org/mantis/view.php?id=1043#c3637
----------------------------------------------------------------------
Currently using libzdb from ports in freebsd version:-

libzdb-2.11.3

and libevent2 from ports -

libevent-2.0.21-stable

----------------------------------------------------------------------
 (0003638) jamesgreig (reporter) - 11-Feb-14 17:00
 http://www.dbmail.org/mantis/view.php?id=1043#c3638
----------------------------------------------------------------------
This should actually be a crash but I failed to mark it as a crash when I submitted it.

----------------------------------------------------------------------
 (0003639) alan (reporter) - 12-Feb-14 13:23
 http://www.dbmail.org/mantis/view.php?id=1043#c3639
----------------------------------------------------------------------
Alas with my limited testing I'm unable to reproduce this bug.

Would it be feasible to get a backtrace after building a debug version? (adding WITH_DEBUG_PORTS=mail/dbmail to /etc/make.conf)

----------------------------------------------------------------------
 (0003640) jamesgreig (reporter) - 12-Feb-14 13:50
 http://www.dbmail.org/mantis/view.php?id=1043#c3640
----------------------------------------------------------------------
Hi Alan,

Getting it on 2 machines. Unfortunately they're live so it's a bit tricky but will see what I can get. I've enabled debugging in dbmail itself to see what I can get. If it fails again I'll rebuild one of them as above.

----------------------------------------------------------------------
 (0003641) jamesgreig (reporter) - 12-Feb-14 13:54
 http://www.dbmail.org/mantis/view.php?id=1043#c3641
----------------------------------------------------------------------
Presume this isn't relevant:-

http://www.gossamer-threads.com/lists/dbmail/users/34008

though admittedly libzdb in this instance is version 2.11.3

----------------------------------------------------------------------
 (0003642) paul (administrator) - 12-Feb-14 13:55
 http://www.dbmail.org/mantis/view.php?id=1043#c3642
----------------------------------------------------------------------
James,

the thread you mention only affects imapd. Getting a debug trace on the crash would help.

----------------------------------------------------------------------
 (0003643) jamesgreig (reporter) - 12-Feb-14 14:12
 http://www.dbmail.org/mantis/view.php?id=1043#c3643
----------------------------------------------------------------------
Oddly the last 2 deaths on one machine were the following (different signals):-

Feb 11 22:13:45 mail4-db-3a kernel: pid 78837 (dbmail-imapd), uid 65534: exited on signal 6
Feb 12 11:54:32 mail4-db-3a kernel: pid 2346 (dbmail-pop3d), uid 65534: exited on signal 10

This particular machine was running fine with 3.0.2

----------------------------------------------------------------------
 (0003644) paul (administrator) - 13-Feb-14 09:56
 http://www.dbmail.org/mantis/view.php?id=1043#c3644
----------------------------------------------------------------------
signal 6 is SIGABRT resulting from an assert statement. Those are used quite extensively so without knowing where it was raised, it's impossible to tell what's going on.

signal(7) mentions sig-10 twice (SIGUSR1 and SIGBUS) so given the crash I assume it's a SIGBUS.

Running either process at logging_levels=255 would provide a good indication of where in the code it happens.

Another approach would be to set up your environment to generate core dumps. How to do that depends, but maybe this link can help:

http://stackoverflow.com/questions/16610626/forcing-program-to-create-coredump-on-freebsd

----------------------------------------------------------------------
 (0003645) jamesgreig (reporter) - 14-Feb-14 12:18
 http://www.dbmail.org/mantis/view.php?id=1043#c3645
----------------------------------------------------------------------
Hi Paul,

I've not yet had a chance to rebuild with the make options, however, it has just died with debugging enabled in the conf reporting the following:-

Feb 14 11:03:59 mail4-db-3a dbmail/pop3d[68120]: Debug:[pop3] pop3(+383): incoming buffer: [DELE 3366]
Feb 14 11:03:59 mail4-db-3a dbmail/pop3d[68120]: Debug:[pop3] pop3(+404): state[2], command issued :cmd [DELE], value [3366]
Feb 14 11:03:59 mail4-db-3a dbmail/pop3d[68120]: Debug:[pop3] pop3(+416): command looked up as commandtype 6
Feb 14 11:03:59 mail4-db-3a dbmail/pop3d[68120]: Debug:[clientbase] ci_write(+343): [0x804cca000] S > [22/22:-ERR too many errors^M ]
Feb 14 11:03:59 mail4-db-3a dbmail/pop3d[68120]: Debug:[clientbase] ci_cork(+205): [0x804cca000]
Feb 14 11:03:59 mail4-db-3a dbmail/pop3d[68120]: Debug:[clientsession] client_session_bailout(+149): [0x80441b700]
Feb 14 11:03:59 mail4-db-3a dbmail/pop3d[68120]: Debug:[clientbase] ci_close(+517): closing clientbase [0x804cca000] [19] [19]
Feb 14 11:03:59 mail4-db-3a dbmail/pop3d[68120]: Debug:[clientbase] ci_cork(+205): [0x804cca000]
Feb 14 11:03:59 mail4-db-3a dbmail/pop3d[68120]: Debug:[clientbase] ci_cork(+205): [0x804cca000]
Feb 14 11:03:59 mail4-db-3a dbmail/pop3d[68120]: Debug:[clientsession] client_session_bailout(+149): [0x80441b700]
Feb 14 11:03:59 mail4-db-3a dbmail/pop3d[68120]: Debug:[clientbase] ci_close(+517): closing clientbase [0x804cca000] [-1] [-1]
Feb 14 11:03:59 mail4-db-3a dbmail/pop3d[68120]: Debug:[clientbase] ci_cork(+205): [0x804cca000]
Feb 14 11:03:59 mail4-db-3a dbmail/pop3d[68120]: Debug:[clientbase] ci_close(+531): [Bad file descriptor]
pid 68120 (dbmail-pop3d), uid 65534: exited on signal 11

(I then restarted it here)

Feb 14 11:15:18 mail4-db-3a dbmail/pop3d[18986]: Debug:[server] server_config_load(+1009): max_db_connections [10]

I have more of the debug log if it helps at all

----------------------------------------------------------------------
 (0003646) jamesgreig (reporter) - 14-Feb-14 13:40
 http://www.dbmail.org/mantis/view.php?id=1043#c3646
----------------------------------------------------------------------
30 minutes later I got the same result:-

Feb 14 11:36:24 mail4-db-3a dbmail/pop3d[18988]: Debug:[clientbase] ci_write(+343): [0x80544a000] S > [3/3:.^M ]
Feb 14 11:36:24 mail4-db-3a dbmail/pop3d[18988]: Debug:[clientbase] ci_uncork(+212): [0x80544a000]
Feb 14 11:36:24 mail4-db-3a dbmail/pop3d[18988]: Debug:[clientsession] socket_write_cb(+283): reset timeout [300]
Feb 14 11:36:24 mail4-db-3a dbmail/pop3d[18988]: Debug:[clientbase] ci_read_cb(+376): [0x804629000] [11]
Feb 14 11:36:24 mail4-db-3a dbmail/pop3d[18988]: Debug:[clientbase] ci_read_cb(+376): [0x804629000] [-1]
Feb 14 11:36:24 mail4-db-3a dbmail/pop3d[18988]: Debug:[clientbase] ci_cork(+205): [0x804629000]
Feb 14 11:36:24 mail4-db-3a dbmail/pop3d[18988]: Debug:[pop3] pop3(+383): incoming buffer: [DELE 1498]
Feb 14 11:36:24 mail4-db-3a dbmail/pop3d[18988]: Debug:[pop3] pop3(+404): state[2], command issued :cmd [DELE], value [1498]
Feb 14 11:36:24 mail4-db-3a dbmail/pop3d[18988]: Debug:[pop3] pop3(+416): command looked up as commandtype 6
Feb 14 11:36:24 mail4-db-3a dbmail/pop3d[18988]: Debug:[clientbase] ci_write(+343): [0x804629000] S > [22/22:-ERR too many errors^M ]
Feb 14 11:36:24 mail4-db-3a dbmail/pop3d[18988]: Debug:[clientbase] ci_cork(+205): [0x804629000]
Feb 14 11:36:24 mail4-db-3a dbmail/pop3d[18988]: Debug:[clientsession] client_session_bailout(+149): [0x80441b500]
Feb 14 11:36:24 mail4-db-3a dbmail/pop3d[18988]: Debug:[clientbase] ci_close(+517): closing clientbase [0x804629000] [15] [15]
Feb 14 11:36:24 mail4-db-3a dbmail/pop3d[18988]: Debug:[clientbase] ci_cork(+205): [0x804629000]
Feb 14 11:36:24 mail4-db-3a dbmail/pop3d[18988]: Debug:[clientbase] ci_cork(+205): [0x804629000]
Feb 14 11:36:24 mail4-db-3a dbmail/pop3d[18988]: Debug:[clientsession] client_session_bailout(+149): [0x80441b500]
Feb 14 11:36:24 mail4-db-3a dbmail/pop3d[18988]: Debug:[clientbase] ci_close(+517): closing clientbase [0x804629000] [-1] [-1]
Feb 14 11:36:24 mail4-db-3a dbmail/pop3d[18988]: Debug:[clientbase] ci_cork(+205): [0x804629000]
Feb 14 11:36:24 mail4-db-3a dbmail/pop3d[18988]: Debug:[clientbase] ci_close(+531): [Bad file descriptor]

----------------------------------------------------------------------
 (0003647) alan (reporter) - 17-Feb-14 14:53
 http://www.dbmail.org/mantis/view.php?id=1043#c3647
----------------------------------------------------------------------
This appears to be caused by client_session_bailout being called twice after 'too many errors', the first time succeeding 'closing clientbase [0x804629000] [15] [15]', the second failing 'closing clientbase [0x804629000] [-1] [-1]' with EBADF 'Bad file descriptor'.

I've been unable to spot where the second call to client_session_bailout might be coming from; perhaps the call to shutdown(client->sock->sock, SHUT_RDWR) might benefit from testing client->[tx|rx] for >-1?

----------------------------------------------------------------------
 (0003648) paul (administrator) - 17-Feb-14 16:53
 http://www.dbmail.org/mantis/view.php?id=1043#c3648
----------------------------------------------------------------------
I've just pushed

http://git.dbmail.eu/paul/dbmail/commit/?h=dbmail_3_1&id=d08e9e57cdda0759016b488d923bd1390ce4348a

which fixes this issue.

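Editorial illustration of the guard suggested in note 0003647 (testing the descriptors for > -1 before closing again). This is only a rough sketch in Python, not dbmail's C code and not the change in the commit above: the Client class, its rx/tx fields and bail_out_twice() are hypothetical names made up for the example, and a pipe stands in for the client socket. It shows that a second, unguarded close on descriptors already set to -1 fails with EBADF, while a guarded close simply does nothing the second time.

```python
import errno
import os


class Client:
    """Hypothetical stand-in for a client connection (not dbmail's struct)."""

    def __init__(self):
        # A pipe gives us two real descriptors to play the rx/tx roles.
        self.rx, self.tx = os.pipe()

    def close_unguarded(self):
        # Closes whatever rx/tx hold, even if they are already -1.
        for fd in (self.rx, self.tx):
            os.close(fd)
        self.rx = self.tx = -1

    def close_guarded(self):
        # Only touch descriptors that are still valid (> -1).
        if self.rx < 0 and self.tx < 0:
            return False  # already closed, nothing to do
        for fd in (self.rx, self.tx):
            if fd > -1:
                os.close(fd)
        self.rx = self.tx = -1
        return True


def bail_out_twice():
    """Close each client twice, as the double bailout in the log does."""
    plain = Client()
    plain.close_unguarded()
    try:
        plain.close_unguarded()  # second call sees rx == tx == -1
        second_errno = 0
    except OSError as exc:
        second_errno = exc.errno

    safe = Client()
    return second_errno, safe.close_guarded(), safe.close_guarded()


if __name__ == "__main__":
    err, first, second = bail_out_twice()
    print(errno.errorcode.get(err), first, second)
```

In the sketch the guard makes the close idempotent, which is the property the second client_session_bailout call in the log appears to lack; whether the real fix takes this shape is not stated in the thread.
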
Issue History
Date Modified    Username       Field                    Change
======================================================================
11-Feb-14 11:23  jamesgreig     New Issue
11-Feb-14 12:51  jamesgreig     Note Added: 0003637
11-Feb-14 17:00  jamesgreig     Note Added: 0003638
12-Feb-14 13:23  alan           Note Added: 0003639
12-Feb-14 13:50  jamesgreig     Note Added: 0003640
12-Feb-14 13:54  jamesgreig     Note Added: 0003641
12-Feb-14 13:55  paul           Note Added: 0003642
12-Feb-14 13:56  paul           Severity                 minor => crash
12-Feb-14 13:56  paul           Status                   new => acknowledged
12-Feb-14 14:12  jamesgreig     Note Added: 0003643
13-Feb-14 09:56  paul           Note Added: 0003644
14-Feb-14 12:17  jamesgreig     Note Added: 0003645
14-Feb-14 12:18  jamesgreig     Note Edited: 0003645
14-Feb-14 13:40  jamesgreig     Note Added: 0003646
17-Feb-14 14:53  alan           Note Added: 0003647
17-Feb-14 16:53  paul           Note Added: 0003648
17-Feb-14 16:53  paul           Status                   acknowledged => resolved
17-Feb-14 16:53  paul           Resolution               open => fixed
======================================================================