The following issue has been RESOLVED. 
====================================================================== 
http://www.dbmail.org/mantis/view.php?id=1043 
====================================================================== 
Reported By:                jamesgreig
Assigned To:                
====================================================================== 
Project:                    DBMail
Issue ID:                   1043
Category:                   POP3 daemon
Reproducibility:            random
Severity:                   crash
Priority:                   normal
Status:                     resolved
target:                      
Resolution:                 fixed
Fixed in Version:           
====================================================================== 
Date Submitted:             11-Feb-14 11:23 CET
Last Modified:              17-Feb-14 16:53 CET
====================================================================== 
Summary:                    exited on signal 11
Description: 
Since rebuilding a number of dbmail machines onto FreeBSD 9.1 and 9.2 with
dbmail 3.1.10 we are seeing random crashes of the pop3d daemon as below:-

Feb 11 09:09:11 redacted-db-4a kernel: pid 1577 (dbmail-pop3d), uid 65534:
exited on signal 11

Each time it fails on signal 11.  
====================================================================== 

---------------------------------------------------------------------- 
 (0003637) jamesgreig (reporter) - 11-Feb-14 12:51
 http://www.dbmail.org/mantis/view.php?id=1043#c3637 
---------------------------------------------------------------------- 
Currently using libzdb from FreeBSD ports (libzdb-2.11.3) and libevent2
from ports (libevent-2.0.21-stable).

---------------------------------------------------------------------- 
 (0003638) jamesgreig (reporter) - 11-Feb-14 17:00
 http://www.dbmail.org/mantis/view.php?id=1043#c3638 
---------------------------------------------------------------------- 
This should actually have been filed with severity crash, but I failed to
mark it as a crash when I submitted it.

---------------------------------------------------------------------- 
 (0003639) alan (reporter) - 12-Feb-14 13:23
 http://www.dbmail.org/mantis/view.php?id=1043#c3639 
---------------------------------------------------------------------- 
Alas with my limited testing I'm unable to reproduce this bug.
Would it be feasible to get a backtrace after building a debug version?
(adding WITH_DEBUG_PORTS=mail/dbmail to /etc/make.conf) 

---------------------------------------------------------------------- 
 (0003640) jamesgreig (reporter) - 12-Feb-14 13:50
 http://www.dbmail.org/mantis/view.php?id=1043#c3640 
---------------------------------------------------------------------- 
Hi Alan,

Getting it on 2 machines.  Unfortunately they're live so it's a bit tricky,
but I will see what I can get.  I've enabled debugging in dbmail itself in
the meantime.  If it fails again I'll rebuild one of them as above.

---------------------------------------------------------------------- 
 (0003641) jamesgreig (reporter) - 12-Feb-14 13:54
 http://www.dbmail.org/mantis/view.php?id=1043#c3641 
---------------------------------------------------------------------- 
Presume this isn't relevant:-
http://www.gossamer-threads.com/lists/dbmail/users/34008  though admittedly
libzdb in this instance is version 2.11.3 

---------------------------------------------------------------------- 
 (0003642) paul (administrator) - 12-Feb-14 13:55
 http://www.dbmail.org/mantis/view.php?id=1043#c3642 
---------------------------------------------------------------------- 
James, the thread you mention only affects imapd.

Getting a debug trace on the crash would help. 

---------------------------------------------------------------------- 
 (0003643) jamesgreig (reporter) - 12-Feb-14 14:12
 http://www.dbmail.org/mantis/view.php?id=1043#c3643 
---------------------------------------------------------------------- 
Oddly the last 2 deaths on one machine were the following (different
signals):-

Feb 11 22:13:45 mail4-db-3a kernel: pid 78837 (dbmail-imapd), uid 65534:
exited on signal 6
Feb 12 11:54:32 mail4-db-3a kernel: pid 2346 (dbmail-pop3d), uid 65534:
exited on signal 10


This particular machine was running fine with 3.0.2

---------------------------------------------------------------------- 
 (0003644) paul (administrator) - 13-Feb-14 09:56
 http://www.dbmail.org/mantis/view.php?id=1043#c3644 
---------------------------------------------------------------------- 
signal 6 is SIGABRT resulting from an assert statement. Those are used
quite extensively so without knowing where it was raised, it's impossible
to tell what's going on.

signal(7) mentions sig-10 twice (SIGUSR1 and SIGBUS) so given the crash I
assume it's a SIGBUS.
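
Signal numbers are not identical across platforms, so it can help to check
them on the affected host rather than rely on a man page from another
system. The following is only a minimal sketch and is not taken from dbmail:
the function names are made up for illustration. It prints the numbers the
local headers assign to the signals discussed here, and confirms that a
failed assert() ends the process with SIGABRT.

```c
#undef NDEBUG            /* make sure assert() is active in this sketch */
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Print the numeric values this platform assigns to the signals of interest. */
void print_signal_numbers(void)
{
    printf("SIGABRT=%d SIGBUS=%d SIGSEGV=%d SIGUSR1=%d\n",
           SIGABRT, SIGBUS, SIGSEGV, SIGUSR1);
}

/*
 * Fork a child that deliberately trips an assert and return the signal
 * that terminated it, or -1 if it did not die from a signal.
 */
int signal_from_failed_assert(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        assert(0 && "deliberate failure");  /* calls abort() */
        _exit(0);                            /* not reached */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

Calling print_signal_numbers() on the FreeBSD machine shows which name
belongs to signal 10 there; signal_from_failed_assert() should return the
value of SIGABRT.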

Running either process at logging_levels=255 would provide a good
indication of where in the code it happens.

Another approach would be to set up your environment to generate core
dumps. How to do that depends on the system, but maybe this link can help:

http://stackoverflow.com/questions/16610626/forcing-program-to-create-coredump-on-freebsd
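
For reference, a process can also raise its own core-file limit at startup
with the standard getrlimit()/setrlimit() calls. This is a minimal sketch
under the assumption that the limit is what prevents the dump; dbmail is not
known to do this itself, and enable_core_dumps() is a hypothetical helper
name. On FreeBSD the kern.corefile and kern.sugid_coredump sysctls may also
matter for a daemon that changes its uid.

```c
#include <assert.h>
#include <sys/resource.h>

/*
 * Raise the soft core-file size limit to the hard limit so the kernel
 * is allowed to write a core file when the process crashes.
 * Returns 0 on success, -1 on failure (errno is set by the failing call).
 */
int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;

    rl.rlim_cur = rl.rlim_max;   /* soft limit may go up to the hard limit */
    return setrlimit(RLIMIT_CORE, &rl);
}
```

The shell equivalent is to raise the limit with ulimit -c in the environment
that starts the daemon.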


---------------------------------------------------------------------- 
 (0003645) jamesgreig (reporter) - 14-Feb-14 12:18
 http://www.dbmail.org/mantis/view.php?id=1043#c3645 
---------------------------------------------------------------------- 
Hi Paul,

I've not yet had a chance to rebuild with the make options, however, it
has just died with debugging enabled in the conf reporting the following:-

Feb 14 11:03:59 mail4-db-3a dbmail/pop3d[68120]: Debug:[pop3] pop3(+383):
incoming buffer: [DELE 3366]
Feb 14 11:03:59 mail4-db-3a dbmail/pop3d[68120]: Debug:[pop3] pop3(+404):
state[2], command issued :cmd [DELE], value [3366]
Feb 14 11:03:59 mail4-db-3a dbmail/pop3d[68120]: Debug:[pop3] pop3(+416):
command looked up as commandtype 6
Feb 14 11:03:59 mail4-db-3a dbmail/pop3d[68120]: Debug:[clientbase]
ci_write(+343): [0x804cca000] S > [22/22:-ERR too many errors^M ]
Feb 14 11:03:59 mail4-db-3a dbmail/pop3d[68120]: Debug:[clientbase]
ci_cork(+205): [0x804cca000]
Feb 14 11:03:59 mail4-db-3a dbmail/pop3d[68120]: Debug:[clientsession]
client_session_bailout(+149): [0x80441b700]
Feb 14 11:03:59 mail4-db-3a dbmail/pop3d[68120]: Debug:[clientbase]
ci_close(+517): closing clientbase [0x804cca000] [19] [19]
Feb 14 11:03:59 mail4-db-3a dbmail/pop3d[68120]: Debug:[clientbase]
ci_cork(+205): [0x804cca000]
Feb 14 11:03:59 mail4-db-3a dbmail/pop3d[68120]: Debug:[clientbase]
ci_cork(+205): [0x804cca000]
Feb 14 11:03:59 mail4-db-3a dbmail/pop3d[68120]: Debug:[clientsession]
client_session_bailout(+149): [0x80441b700]
Feb 14 11:03:59 mail4-db-3a dbmail/pop3d[68120]: Debug:[clientbase]
ci_close(+517): closing clientbase [0x804cca000] [-1] [-1]
Feb 14 11:03:59 mail4-db-3a dbmail/pop3d[68120]: Debug:[clientbase]
ci_cork(+205): [0x804cca000]
Feb 14 11:03:59 mail4-db-3a dbmail/pop3d[68120]: Debug:[clientbase]
ci_close(+531): [Bad file descriptor]

pid 68120 (dbmail-pop3d), uid 65534: exited on signal 11

(I then restarted it here)

Feb 14 11:15:18 mail4-db-3a dbmail/pop3d[18986]: Debug:[server]
server_config_load(+1009): max_db_connections [10]


I have more of the debug log if it helps at all

 

---------------------------------------------------------------------- 
 (0003646) jamesgreig (reporter) - 14-Feb-14 13:40
 http://www.dbmail.org/mantis/view.php?id=1043#c3646 
---------------------------------------------------------------------- 
30 minutes later I got the same result:-

Feb 14 11:36:24 mail4-db-3a dbmail/pop3d[18988]: Debug:[clientbase]
ci_write(+343): [0x80544a000] S > [3/3:.^M ]
Feb 14 11:36:24 mail4-db-3a dbmail/pop3d[18988]: Debug:[clientbase]
ci_uncork(+212): [0x80544a000]
Feb 14 11:36:24 mail4-db-3a dbmail/pop3d[18988]: Debug:[clientsession]
socket_write_cb(+283): reset timeout [300]
Feb 14 11:36:24 mail4-db-3a dbmail/pop3d[18988]: Debug:[clientbase]
ci_read_cb(+376): [0x804629000] [11]
Feb 14 11:36:24 mail4-db-3a dbmail/pop3d[18988]: Debug:[clientbase]
ci_read_cb(+376): [0x804629000] [-1]
Feb 14 11:36:24 mail4-db-3a dbmail/pop3d[18988]: Debug:[clientbase]
ci_cork(+205): [0x804629000]
Feb 14 11:36:24 mail4-db-3a dbmail/pop3d[18988]: Debug:[pop3] pop3(+383):
incoming buffer: [DELE 1498]
Feb 14 11:36:24 mail4-db-3a dbmail/pop3d[18988]: Debug:[pop3] pop3(+404):
state[2], command issued :cmd [DELE], value [1498]
Feb 14 11:36:24 mail4-db-3a dbmail/pop3d[18988]: Debug:[pop3] pop3(+416):
command looked up as commandtype 6
Feb 14 11:36:24 mail4-db-3a dbmail/pop3d[18988]: Debug:[clientbase]
ci_write(+343): [0x804629000] S > [22/22:-ERR too many errors^M ]
Feb 14 11:36:24 mail4-db-3a dbmail/pop3d[18988]: Debug:[clientbase]
ci_cork(+205): [0x804629000]
Feb 14 11:36:24 mail4-db-3a dbmail/pop3d[18988]: Debug:[clientsession]
client_session_bailout(+149): [0x80441b500]
Feb 14 11:36:24 mail4-db-3a dbmail/pop3d[18988]: Debug:[clientbase]
ci_close(+517): closing clientbase [0x804629000] [15] [15]
Feb 14 11:36:24 mail4-db-3a dbmail/pop3d[18988]: Debug:[clientbase]
ci_cork(+205): [0x804629000]
Feb 14 11:36:24 mail4-db-3a dbmail/pop3d[18988]: Debug:[clientbase]
ci_cork(+205): [0x804629000]
Feb 14 11:36:24 mail4-db-3a dbmail/pop3d[18988]: Debug:[clientsession]
client_session_bailout(+149): [0x80441b500]
Feb 14 11:36:24 mail4-db-3a dbmail/pop3d[18988]: Debug:[clientbase]
ci_close(+517): closing clientbase [0x804629000] [-1] [-1]
Feb 14 11:36:24 mail4-db-3a dbmail/pop3d[18988]: Debug:[clientbase]
ci_cork(+205): [0x804629000]
Feb 14 11:36:24 mail4-db-3a dbmail/pop3d[18988]: Debug:[clientbase]
ci_close(+531): [Bad file descriptor] 

---------------------------------------------------------------------- 
 (0003647) alan (reporter) - 17-Feb-14 14:53
 http://www.dbmail.org/mantis/view.php?id=1043#c3647 
---------------------------------------------------------------------- 
This appears to be caused by client_session_bailout being called twice
after 'too many errors': the first call succeeds ('closing clientbase
[0x804629000] [15] [15]'), while the second fails ('closing clientbase
[0x804629000] [-1] [-1]') with EBADF 'Bad file descriptor'.

I've been unable to spot where the second call to client_session_bailout
might be coming from; perhaps the call to shutdown(client->sock->sock,
SHUT_RDWR) might benefit from testing client->[tx|rx] for >-1? 
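
To illustrate the kind of guard meant here, the sketch below makes the close
path idempotent by testing the descriptors before touching them. It is not
the code from dbmail and not necessarily what any fix looks like: struct
client_fds and client_close_once() are hypothetical names, and only the
rx/tx fields mirror the names mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical stand-in for the client's descriptors; not dbmail's struct. */
struct client_fds {
    int rx;   /* read descriptor, -1 once closed */
    int tx;   /* write descriptor, -1 once closed */
};

/*
 * Shut down and close the descriptors at most once.
 * Returns true if something was closed, false if the client was
 * already closed, so a second bailout becomes a harmless no-op.
 */
bool client_close_once(struct client_fds *c)
{
    bool closed = false;

    if (c->tx >= 0) {
        shutdown(c->tx, SHUT_RDWR);   /* errors ignored in this sketch */
        if (c->tx != c->rx)
            close(c->tx);             /* separate write descriptor */
        closed = true;
    }
    if (c->rx >= 0) {
        close(c->rx);
        closed = true;
    }

    c->rx = c->tx = -1;               /* mark as closed for later calls */
    return closed;
}
```

With a guard like this, the second call seen in the log would return early
instead of calling shutdown() and close() on -1.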

---------------------------------------------------------------------- 
 (0003648) paul (administrator) - 17-Feb-14 16:53
 http://www.dbmail.org/mantis/view.php?id=1043#c3648 
---------------------------------------------------------------------- 
I've just pushed 

http://git.dbmail.eu/paul/dbmail/commit/?h=dbmail_3_1&id=d08e9e57cdda0759016b488d923bd1390ce4348a

which fixes this issue. 

Issue History 
Date Modified    Username       Field                    Change               
====================================================================== 
11-Feb-14 11:23  jamesgreig     New Issue                                    
11-Feb-14 12:51  jamesgreig     Note Added: 0003637                          
11-Feb-14 17:00  jamesgreig     Note Added: 0003638                          
12-Feb-14 13:23  alan           Note Added: 0003639                          
12-Feb-14 13:50  jamesgreig     Note Added: 0003640                          
12-Feb-14 13:54  jamesgreig     Note Added: 0003641                          
12-Feb-14 13:55  paul           Note Added: 0003642                          
12-Feb-14 13:56  paul           Severity                 minor => crash      
12-Feb-14 13:56  paul           Status                   new => acknowledged 
12-Feb-14 14:12  jamesgreig     Note Added: 0003643                          
13-Feb-14 09:56  paul           Note Added: 0003644                          
14-Feb-14 12:17  jamesgreig     Note Added: 0003645                          
14-Feb-14 12:18  jamesgreig     Note Edited: 0003645                         
14-Feb-14 13:40  jamesgreig     Note Added: 0003646                          
17-Feb-14 14:53  alan           Note Added: 0003647                          
17-Feb-14 16:53  paul           Note Added: 0003648                          
17-Feb-14 16:53  paul           Status                   acknowledged => resolved
17-Feb-14 16:53  paul           Resolution               open => fixed       
======================================================================

_______________________________________________
Dbmail-dev mailing list
Dbmail-dev@dbmail.org
http://mailman.fastxs.nl/cgi-bin/mailman/listinfo/dbmail-dev
