A BUGNOTE has been added to this bug.
======================================================================
http://www.dbmail.org/mantis/bug_view_advanced_page.php?bug_id=0000162
======================================================================
Reported By:                xing
Assigned To:
======================================================================
Project:                    DBMail
Bug ID:                     162
Category:                   POP3 daemon
Reproducibility:            always
Severity:                   major
Priority:                   normal
Status:                     new
======================================================================
Date Submitted:             18-Jan-05 01:44 CET
Last Modified:              11-May-05 18:46 CEST
======================================================================
Summary:                    dbmail-pop3d zombies galore..
Description:
I believe this problem started with 2.0.3.

dbmail-pop3d is creating a bunch of dbmail-pop3d zombie processes that
must be killed via kill -9. I see a lot of the following in my mail log:

serverchild.c,CreateChild: child_register failed

Jan 17 16:29:16 mail dbmail/pop3d[19630]: serverchild.c,CreateChild: child_register failed

as shown in ps:

19624 ?        Z      0:00 [dbmail-pop3d] <defunct>
19625 ?        Z      0:00 [dbmail-pop3d] <defunct>
19626 ?        Z      0:00 [dbmail-pop3d] <defunct>

I have 144 of these zombies at this very moment even though I just
killed them all and restarted the pop3d daemon a minute ago.

Important Note: Setting trace=5 for pop3d ALLEVIATES the problem! Thus I
cannot provide trace info here. Weird. I duplicated this many times on
my end before submitting this report.

Here are my relevant dbmail.conf entries:

[DBMAIL]
# Database settings
host=localhost
user=postfix
pass=postfix
db=dbmail
sqlsocket=/tmp/mysql.sock

# trace level for dbmail-maintenance
TRACE_LEVEL=1

[POP]
EFFECTIVE_USER=postfix   # the user that dbmail-pop3d will run as (need to be root to bind to a port<1024)
EFFECTIVE_GROUP=postfix  # the group that dbmail-pop3d will run as
BINDIP=*                 # the ipaddress the dbmail-pop3d server has to bind to, * for all addresses
PORT=110                 # the port number the dbmail-pop3d server has to bind to.
NCHILDREN=5              # default number of POP3 handlers (each is a process)
MAXCHILDREN=20           # mac. number of POP3 handlers
MAXCONNECTS=10000        # the maximum number of connections a default childs makes
TIMEOUT=31               # the time (s) before the dbmail-pop3d should shutdown a connection which is being idle.
RESOLVE_IP=no            # if yes, the pop daemon resolves IP numbers to DNS names in the log
POP_BEFORE_SMTP=no
TRACE_LEVEL=1
======================================================================

----------------------------------------------------------------------
paul - 18-Jan-05 09:25 CET
----------------------------------------------------------------------
Xing,

I recently changed the manage_stop_children code to fix bug
http://www.dbmail.org/mantis/bug_view_advanced_page.php?bug_id=0000158.
Could you please test the current 2.0 cvs code to check if that also
helps in your case?

----------------------------------------------------------------------
xing - 18-Jan-05 11:47 CET
----------------------------------------------------------------------
Checked out the CVS branch and still have the exact same problem.

Again, the weird thing here is that the bug is completely gone when
trace is set to 5 for the pop daemon in dbmail.conf.

My only theory, based on the trace level difference, is that trace=5
perhaps produces noticeable "delays" between thread/process forking
which allow the system to work? Without the verbose trace, the server is
trying to spawn way too fast? Just a wild guess.

Extra info:

I can reproduce this bug with trace=1 almost immediately upon pop3d
startup each time. However, sometimes the startup is fine, but after
3-5 minutes all the children get unregistered and the
registering/failed attempts create the same zombie pool. So the problem
is not only related to startup.

edited on: 18-Jan-05 11:47

----------------------------------------------------------------------
sersop - 26-Jan-05 11:39 CET
----------------------------------------------------------------------
The same problem occurs for dbmail-pop3d and dbmail-lmtpd on a high-load
system:

Fedora Core 2
Linux 2.6.10 #1 SMP Mon Jan 24 14:01:32 YEKT 2005 i686 i686 i386 GNU/Linux

----------------------------------------------------------------------
xing - 31-Jan-05 08:12 CET
----------------------------------------------------------------------
Running the trace=5 workaround has so far eliminated the pop3d errors
for the past week, but today my 2.0.3 dbmail-pop3d servers completely
locked up. They will not accept any new connections, yet they are
running. I feel this is related to the zombie problem, as far as the
server thread starting and killing child processes.

----------------------------------------------------------------------
paul - 28-Mar-05 12:52 CEST
----------------------------------------------------------------------
Just an idea: I don't see any MINSPARECHILDREN/MAXSPARECHILDREN settings
in your config. Not that such should really matter, but please check
whether that makes a difference...

----------------------------------------------------------------------
xing - 08-Apr-05 04:12 CEST
----------------------------------------------------------------------
Paul,

I had been running with trace=2 for both pop/imap daemons to avoid the
zombie problem. For whatever reason, the extra logging stopped the
runaway processes.

Just tried your advice of adding:

MINSPARECHILDREN=2
MAXSPARECHILDREN=4

to both my imap/pop config sections in dbmail.conf, and so far it has
been running zombie free on trace=1 for 48 hours. Can't say it's fixed
for sure, but it looks like it. The zombie problem usually manifests
itself within minutes under high load.
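
For anyone checking their own setup against the suggestion above, here is a minimal sketch in Python that reports which daemon sections lack the two spare-children settings. It assumes dbmail.conf can be read as a plain INI file with `#` inline comments, which matches the snippets in this report but is not confirmed by the DBMail sources. The function name `missing_spare_settings` and the sample text are hypothetical.

```python
import configparser

# Settings suggested in the bugnote above.
REQUIRED = ("MINSPARECHILDREN", "MAXSPARECHILDREN")


def missing_spare_settings(conf_text, sections=("POP", "IMAP")):
    """Return {section: [missing keys]} for the given daemon sections.

    Assumption: the config is INI-like, with '#' starting inline comments.
    """
    parser = configparser.ConfigParser(
        inline_comment_prefixes=("#",),
        interpolation=None,  # treat '%' in values literally
        strict=False,        # tolerate repeated keys
    )
    parser.optionxform = str.upper  # compare key names case-insensitively
    parser.read_string(conf_text)

    missing = {}
    for section in sections:
        if not parser.has_section(section):
            continue  # daemon not configured at all, nothing to report
        absent = [key for key in REQUIRED if not parser.has_option(section, key)]
        if absent:
            missing[section] = absent
    return missing


# Hypothetical sample modelled on the configs quoted in this report.
sample = """
[POP]
NCHILDREN=5        # default number of POP3 handlers
MAXCHILDREN=20
[IMAP]
NCHILDREN=2
MINSPARECHILDREN=2
MAXSPARECHILDREN=4
"""

print(missing_spare_settings(sample))
```

With the sample above, only the [POP] section is reported, since [IMAP] already carries both keys. This only checks for the presence of the settings; it says nothing about whether they fix the zombie problem.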

----------------------------------------------------------------------
tukon - 20-Apr-05 00:13 CEST
----------------------------------------------------------------------
Hi,

I set up a mailserver using postfix->dbmail-lmtpd->dbmail-imapd with a
postgresql 8 backend and it's been creating zombie processes much like
the ones described in this bugnote. I'm guessing it's due to some sort
of misconfiguration, but perhaps someone could help me out here.
Thanks.

DBMail version: dbmail_2_0_branch_26_Mar_2005.tgz
FreeBSD 5-STABLE from March 25, 2005

dbmail.conf
---begin--------------------------------------------------------
# $Id: dbmail.conf 1539 2004-12-27 21:41:07Z paul $
# (c) 2000-2002 IC&S, The Netherlands
#
# Configuration file for DBMAIL
# This configuration file needs to be run through dbmail-config to be effective
# after that, changes are effective inmediatly

[DBMAIL]
# Database settings
host=localhost          # host for database, set to localhost if database is om
                        # the same host as dbmail and you want to use a local socket
                        # for connecting.
sqlport=                # if you want to use TCP/IP for connecting to the database,
                        # and have the database running on a non-standard port.
sqlsocket=.s.PGSQL.5432 # when using a local socket connection to the database, fill
                        # in the path to the socket here (e.g. /var/run/mysql.sock)
user=dbmail             # user to connect as to database
pass=                   # password for user to database
db=dbmail               # name of database
POSTMASTER=             # postmaster's email address.

# trace level for dbmail-util
TRACE_LEVEL=5

[SMTP]
SENDMAIL=/usr/sbin/sendmail     # sendmail executable for forwarding mail
AUTO_NOTIFY=no
AUTO_REPLY=no
TRACE_LEVEL=3

[LMTP]
EFFECTIVE_USER=dbmail   # the user that dbmail-lmtpd will run as (need to be root to bind to a port<1024)
EFFECTIVE_GROUP=dbmail  # the group that dbmail-lmtpd will run as
BINDIP=*                # the ipaddress the dbmail-lmtpd server has to bind
                        # to, * for all adresses. Use 127.0.0.1 to only
                        # bind to localhost.
PORT=24                 # the port number the dbmail-lmtpd server has to bind to.
NCHILDREN=2             # default number of LMTP handlers (each is a process)
MAXCHILDREN=10          # max. number of LMTP handlers
MINSPARECHILDREN=2
MAXSPARECHILDREN=4
MAXCONNECTS=10000       # the maximum number of connections a default childs makes
TIMEOUT=300             # the time (s) before the dbmail-lmtpd should shutdown a connection which is being idle.
RESOLVE_IP=no           # if yes, the lmtp daemon resolves IP numbers to DNS names in the log
TRACE_LEVEL=5
MAX_ERRORS=500

[POP]
EFFECTIVE_USER=dbmail   # the user that dbmail-pop3d will run as (need to be root to bind to a port<1024)
EFFECTIVE_GROUP=dbmail  # the group that dbmail-pop3d will run as
BINDIP=*                # the ipaddress the dbmail-pop3d server has to bind to, * for all addresses
PORT=110                # the port number the dbmail-pop3d server has to bind to.
NCHILDREN=5             # default number of POP3 handlers (each is a process)
MAXCHILDREN=200         # mac. number of POP3 handlers
MINSPARECHILDREN=2
MAXSPARECHILDREN=4
MAXCONNECTS=10000       # the maximum number of connections a default childs makes
TIMEOUT=300             # the time (s) before the dbmail-pop3d should shutdown a connection which is being idle.
RESOLVE_IP=no           # if yes, the pop daemon resolves IP numbers to DNS names in the log
POP_BEFORE_SMTP=no
TRACE_LEVEL=5

[IMAP]
EFFECTIVE_USER=dbmail
EFFECTIVE_GROUP=dbmail
BINDIP=*
PORT=143
NCHILDREN=2
MAXCHILDREN=200         # mac. number of IMAPD handlers
MINSPARECHILDREN=2
MAXSPARECHILDREN=4
MAXCONNECTS=10000       # the maximum number of connections a default childs makes
TIMEOUT=4000            # the time (s) before the dbmail-imapd should shutdown a connection which is being idle.
RESOLVE_IP=no           # if yes, the imap daemon resolves IP numbers to DNS names in the log
IMAP_BEFORE_SMTP=no
TRACE_LEVEL=5

# end of configuration file
---end-----------------------------------------------------------

"ps -aux | grep defunct" lines:
---begin---------------------------------------------------------
dbmail 57638  0.0  0.0     0     0  ??  Z    12:46PM   0:00.13 <defunct>
dbmail 57639  0.0  0.0     0     0  ??  Z    12:46PM   0:00.08 <defunct>
dbmail 57644  0.0  0.0     0     0  ??  Z    12:46PM   0:01.39 <defunct>
dbmail 57741  0.0  0.0     0     0  ??  Z     1:04PM   0:00.58 <defunct>
dbmail 57743  0.0  0.0     0     0  ??  Z     1:04PM   0:00.29 <defunct>
dbmail 58106  0.0  0.0     0     0  ??  Z     2:41PM   0:00.01 <defunct>
---end-----------------------------------------------------------

"cat /var/log/maillog | grep 57638" pertinent lines:
---begin---------------------------------------------------------
Apr 19 12:51:52 router dbmail/imap4d[57638]: IMAPClientHandler(): Finished command close
Apr 19 12:51:52 router dbmail/imap4d[57638]: IMAPClientHandler(): line read for PID 57638
Apr 19 12:51:52 router dbmail/imap4d[57638]: COMMAND: [13 logout]
Apr 19 12:51:52 router dbmail/imap4d[57638]: IMAPClientHandler(): Executing command logout...
Apr 19 12:51:52 router dbmail/imap4d[57638]: _ic_logout(): user (id:3) logging out @ [2005-04-19 12:51:52]
Apr 19 12:51:52 router dbmail/imap4d[57638]: IMAPClientHandler(): Finished command logout
Apr 19 12:51:52 router dbmail/imap4d[57638]: IMAPClientHandler(): Closing connection for client from IP [x.x.x.x]
Apr 19 12:51:52 router dbmail/imap4d[57638]: PerformChildTask(): client handling complete, closing streams
Apr 19 12:51:52 router dbmail/imap4d[57638]: serverchild.c,client_close: closing write stream
Apr 19 12:51:52 router dbmail/imap4d[57638]: serverchild.c,client_close: closing read stream
Apr 19 12:51:52 router dbmail/imap4d[57638]: PerformChildTask(): connection closed
Apr 19 12:51:52 router dbmail/imap4d[57638]: PerformChildTask(): waiting for connection
Apr 19 12:51:52 router dbmail/imap4d[57638]: pool.c,child_reg_disconnected: [57638]
Apr 19 12:51:57 router dbmail/imap4d[57630]: pool.c,manage_spare_children: killing overcomplete spare [57638]
Apr 19 12:51:57 router dbmail/imap4d[57638]: serverchild.c,active_child_sig_handler: got signal [15]
Apr 19 12:51:57 router dbmail/imap4d[57638]: serverchild.c,active_child_sig_handler: setting stop request
Apr 19 12:51:57 router dbmail/imap4d[57638]: PerformChildTask(): accept failed
Apr 19 12:51:57 router dbmail/imap4d[57638]: PerformChildTask(): stop requested
Apr 19 12:51:57 router dbmail/imap4d[57638]: pool.c,child_reg_disconnected: [57638]
Apr 19 12:51:57 router dbmail/imap4d[57638]: serverchild.c,disconnect_all: database connection still open, closing
Apr 19 12:51:57 router dbmail/imap4d[57638]: pool.c,child_unregister: child [57638] unregistered
---end------------------------------------------------------------

----------------------------------------------------------------------
tukon - 11-May-05 18:46 CEST
----------------------------------------------------------------------
Looks like the issue is fixed. Now running 2.0-branch from May 10, 2005
without any issues. Been up for about a day with no zombie processes.

Probably dupe of bug 199:
http://www.dbmail.org/mantis/bug_view_advanced_page.php?bug_id=0000199

Bug History
Date Modified    Username  Field                    Change
======================================================================
18-Jan-05 01:44  xing      New Bug
18-Jan-05 09:25  paul      Bugnote Added: 0000539
18-Jan-05 11:42  xing      Bugnote Added: 0000540
18-Jan-05 11:47  xing      Bugnote Edited: 0000540
26-Jan-05 11:39  sersop    Bugnote Added: 0000569
31-Jan-05 08:12  xing      Bugnote Added: 0000572
28-Mar-05 12:52  paul      Bugnote Added: 0000637
08-Apr-05 04:12  xing      Bugnote Added: 0000653
20-Apr-05 00:13  tukon     Bugnote Added: 0000660
11-May-05 18:46  tukon     Bugnote Added: 0000696
======================================================================
