A BUGNOTE has been added to this bug.
======================================================================
http://www.dbmail.org/mantis/bug_view_advanced_page.php?bug_id=0000162
======================================================================
Reported By:                xing
Assigned To:                
======================================================================
Project:                    DBMail
Bug ID:                     162
Category:                   POP3 daemon
Reproducibility:            always
Severity:                   major
Priority:                   normal
Status:                     new
======================================================================
Date Submitted:             18-Jan-05 01:44 CET
Last Modified:              20-Apr-05 00:13 CEST
======================================================================
Summary:                    dbmail-pop3d zombies galore..
Description: 
I believe this problem started with 2.0.3.

dbmail-pop3d is creating a bunch of dbmail-pop3d zombie processes that must
be killed with kill -9.

I see a lot of the following in my mail log. 

serverchild.c,CreateChild: child_register failed
Jan 17 16:29:16 mail dbmail/pop3d[19630]: serverchild.c,CreateChild:
child_register failed

as shown in ps:

19624 ?        Z      0:00 [dbmail-pop3d] <defunct>
19625 ?        Z      0:00 [dbmail-pop3d] <defunct>
19626 ?        Z      0:00 [dbmail-pop3d] <defunct>

I have 144 of these zombies at this very moment even though I just killed
them all and restarted pop3d daemon a minute ago.

Important Note: Setting trace=5 for pop3d ALLEVIATES the problem! Thus I
cannot provide trace info here. Weird. I have duplicated this many times
on my end before submitting this report.

Here are my relevant dbmail.conf entries:
[DBMAIL]
# Database settings
host=localhost
user=postfix
pass=postfix
db=dbmail
sqlsocket=/tmp/mysql.sock
# trace level for dbmail-maintenance
TRACE_LEVEL=1


[POP]
EFFECTIVE_USER=postfix            # the user that dbmail-pop3d will run as (need to be root to bind to a port<1024)
EFFECTIVE_GROUP=postfix           # the group that dbmail-pop3d will run as
BINDIP=*                          # the ipaddress the dbmail-pop3d server has to bind to, * for all addresses
PORT=110                          # the port number the dbmail-pop3d server has to bind to.
NCHILDREN=5                       # default number of POP3 handlers (each is a process)
MAXCHILDREN=20                    # max. number of POP3 handlers
MAXCONNECTS=10000                 # the maximum number of connections a default child makes
TIMEOUT=31                        # the time (s) before the dbmail-pop3d should shutdown a connection which is being idle.
RESOLVE_IP=no                     # if yes, the pop daemon resolves IP numbers to DNS names in the log
POP_BEFORE_SMTP=no
TRACE_LEVEL=1




======================================================================

----------------------------------------------------------------------
 paul - 18-Jan-05 09:25 CET 
----------------------------------------------------------------------
Xing,

I recently changed the manage_stop_children code to fix bug 
http://www.dbmail.org/mantis/bug_view_advanced_page.php?bug_id=0000158. Could
you please test the current 2.0 cvs code to check if that also helps in
your case?

----------------------------------------------------------------------
 xing - 18-Jan-05 11:47 CET 
----------------------------------------------------------------------
Checked out the CVS branch and still have the exact same problem.

Again, the weird thing here is that the bug is completely gone when trace
is set to 5 for the pop daemon in dbmail.conf.

My only theory, based on the trace level difference, is that trace=5
perhaps produces noticeable "delays" between thread/process forks, which
allow the system to work. Without the verbose trace, is the server trying
to spawn way too fast? Just a wild guess.

Extra info:

I can reproduce this bug with trace=1 almost immediately upon pop3d
startup each time. However, sometimes the startup is fine, but after
3-5 minutes all the children get unregistered and the failed registration
attempts create the same zombie pool. So the problem is not only related
to startup.

edited on: 18-Jan-05 11:47

----------------------------------------------------------------------
 sersop - 26-Jan-05 11:39 CET 
----------------------------------------------------------------------
The same problem occurs with dbmail-pop3d and dbmail-lmtpd on a high-load
system.

Fedora Core 2
Linux  2.6.10 #1 SMP Mon Jan 24 14:01:32 YEKT 2005 i686 i686 i386 GNU/Linux

----------------------------------------------------------------------
 xing - 31-Jan-05 08:12 CET 
----------------------------------------------------------------------
Running the trace=5 workaround has so far eliminated the pop3d errors for
the past week, but today my 2.0.3 dbmail-pop3d servers completely locked
up. They will not accept any new connections, yet they are still running.
I suspect this is related to the zombie problem, in how the server starts
and kills child processes.

----------------------------------------------------------------------
 paul - 28-Mar-05 12:52 CEST 
----------------------------------------------------------------------
Just an idea: I don't see any MINSPARECHILDREN/MAXSPARECHILDREN settings in
your config. 

Not that this should really matter, but please check whether it makes a
difference...

----------------------------------------------------------------------
 xing - 08-Apr-05 04:12 CEST 
----------------------------------------------------------------------
Paul,

I had been running with trace=2 for both pop/imap daemons to avoid the
zombie problem. For whatever reason, the extra logging stopped the runaway
processes. Just tried your advice of adding:

MINSPARECHILDREN=2
MAXSPARECHILDREN=4

to both my imap/pop config sections in dbmail.conf and so far it has been
running zombie-free on trace=1 for 48 hours. Can't say it's fixed for sure,
but it looks like it. The zombie problem usually manifests itself within
minutes under high load.
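
To make the role of these two settings concrete, here is a rough sketch of how a prefork-style pool could use minimum and maximum spare counts to decide whether to start or stop children. This is an assumption about the general technique, not a description of what dbmail's pool.c actually does. The struct names, field names and spare_adjustment are all hypothetical.

```c
#include <assert.h>

/* Hypothetical snapshot of the pool: all children, and those currently idle. */
struct pool_state {
    int total;
    int idle;
};

/* Hypothetical limits, mirroring MAXCHILDREN, MINSPARECHILDREN, MAXSPARECHILDREN. */
struct pool_limits {
    int max_children;
    int min_spare;
    int max_spare;
};

/* Returns how many children to start (> 0) or stop (< 0); 0 means no change. */
int spare_adjustment(const struct pool_state *s, const struct pool_limits *l)
{
    assert(l->min_spare <= l->max_spare);

    if (s->idle < l->min_spare) {
        /* Too few idle children: start more, but never exceed max_children. */
        int want = l->min_spare - s->idle;
        int room = l->max_children - s->total;

        if (room < 0)
            room = 0;
        return want < room ? want : room;
    }
    if (s->idle > l->max_spare) {
        /* Too many idle children: stop the surplus. */
        return -(s->idle - l->max_spare);
    }
    return 0;
}
```

With MAXCHILDREN=20, MINSPARECHILDREN=2 and MAXSPARECHILDREN=4, a pool of 5 children with none idle would start 2 more under this sketch, and a pool of 10 with 7 idle would stop 3. Every child stopped this way still has to be waited for by the parent, or it stays in the process table as a zombie.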

----------------------------------------------------------------------
 tukon - 20-Apr-05 00:13 CEST 
----------------------------------------------------------------------
Hi,

I set up a mailserver using postfix->dbmail-lmtpd->dbmail-imapd with a
postgresql 8 backend and it's been creating zombie processes much like the
ones described in this bugnote. I'm guessing it's due to some sort of
misconfiguration, but perhaps someone could help me out here. Thanks.

DBMail version: dbmail_2_0_branch_26_Mar_2005.tgz
FreeBSD 5-STABLE from March 25, 2005

dbmail.conf
---begin--------------------------------------------------------
# $Id: dbmail.conf 1539 2004-12-27 21:41:07Z paul $
# (c) 2000-2002 IC&S, The Netherlands 
#
# Configuration file for DBMAIL 
# This configuration file needs to be run through dbmail-config to be effective
# after that, changes are effective immediately


[DBMAIL] 
# Database settings
host=localhost          # host for database, set to localhost if database is on
                        # the same host as dbmail and you want to use a local socket
                        # for connecting.
sqlport=                # if you want to use TCP/IP for connecting to the database,
                        # and have the database running on a non-standard port.
sqlsocket=.s.PGSQL.5432 # when using a local socket connection to the database, fill
                        # in the path to the socket here (e.g. /var/run/mysql.sock)
user=dbmail                   # user to connect as to database
pass=                  # password for user to database
db=dbmail                     # name of database
POSTMASTER=             # postmaster's email address.
# trace level for dbmail-util   
TRACE_LEVEL=5      

[SMTP]
SENDMAIL=/usr/sbin/sendmail     # sendmail executable for forwarding mail
AUTO_NOTIFY=no
AUTO_REPLY=no
TRACE_LEVEL=3

[LMTP]
EFFECTIVE_USER=dbmail             # the user that dbmail-lmtpd will run as (need to be root to bind to a port<1024)
EFFECTIVE_GROUP=dbmail            # the group that dbmail-lmtpd will run as
BINDIP=*                          # the ipaddress the dbmail-lmtpd server has to bind
                                  # to, * for all addresses. Use 127.0.0.1 to only
                                  # bind to localhost.
PORT=24                           # the port number the dbmail-lmtpd server has to bind to.
NCHILDREN=2                       # default number of LMTP handlers (each is a process)
MAXCHILDREN=10                    # max. number of LMTP handlers
MINSPARECHILDREN=2
MAXSPARECHILDREN=4
MAXCONNECTS=10000                 # the maximum number of connections a default child makes
TIMEOUT=300                       # the time (s) before the dbmail-lmtpd should shutdown a connection which is being idle.
RESOLVE_IP=no                     # if yes, the lmtp daemon resolves IP numbers to DNS names in the log
TRACE_LEVEL=5
MAX_ERRORS=500

[POP]
EFFECTIVE_USER=dbmail             # the user that dbmail-pop3d will run as (need to be root to bind to a port<1024)
EFFECTIVE_GROUP=dbmail            # the group that dbmail-pop3d will run as
BINDIP=*                          # the ipaddress the dbmail-pop3d server has to bind to, * for all addresses
PORT=110                          # the port number the dbmail-pop3d server has to bind to.
NCHILDREN=5                       # default number of POP3 handlers (each is a process)
MAXCHILDREN=200                   # max. number of POP3 handlers
MINSPARECHILDREN=2
MAXSPARECHILDREN=4
MAXCONNECTS=10000                 # the maximum number of connections a default child makes
TIMEOUT=300                       # the time (s) before the dbmail-pop3d should shutdown a connection which is being idle.
RESOLVE_IP=no                     # if yes, the pop daemon resolves IP numbers to DNS names in the log
POP_BEFORE_SMTP=no
TRACE_LEVEL=5

[IMAP]
EFFECTIVE_USER=dbmail
EFFECTIVE_GROUP=dbmail
BINDIP=*
PORT=143
NCHILDREN=2
MAXCHILDREN=200                   # max. number of IMAPD handlers
MINSPARECHILDREN=2
MAXSPARECHILDREN=4
MAXCONNECTS=10000                 # the maximum number of connections a default child makes
TIMEOUT=4000                      # the time (s) before the dbmail-imapd should shutdown a connection which is being idle.
RESOLVE_IP=no                     # if yes, the imap daemon resolves IP numbers to DNS names in the log
IMAP_BEFORE_SMTP=no
TRACE_LEVEL=5

# end of configuration file
---end-----------------------------------------------------------

"ps -aux | grep defunct" lines:
---begin---------------------------------------------------------
dbmail  57638  0.0  0.0     0     0  ??  Z    12:46PM   0:00.13 <defunct>
dbmail  57639  0.0  0.0     0     0  ??  Z    12:46PM   0:00.08 <defunct>
dbmail  57644  0.0  0.0     0     0  ??  Z    12:46PM   0:01.39 <defunct>
dbmail  57741  0.0  0.0     0     0  ??  Z     1:04PM   0:00.58 <defunct>
dbmail  57743  0.0  0.0     0     0  ??  Z     1:04PM   0:00.29 <defunct>
dbmail  58106  0.0  0.0     0     0  ??  Z     2:41PM   0:00.01 <defunct>
---end-----------------------------------------------------------

"cat /var/log/maillog | grep 57638" pertinent lines:
---begin---------------------------------------------------------
Apr 19 12:51:52 router dbmail/imap4d[57638]: IMAPClientHandler(): Finished command close
Apr 19 12:51:52 router dbmail/imap4d[57638]: IMAPClientHandler(): line read for PID 57638
Apr 19 12:51:52 router dbmail/imap4d[57638]: COMMAND: [13 logout]
Apr 19 12:51:52 router dbmail/imap4d[57638]: IMAPClientHandler(): Executing command logout...
Apr 19 12:51:52 router dbmail/imap4d[57638]: _ic_logout(): user (id:3) logging out @ [2005-04-19 12:51:52]
Apr 19 12:51:52 router dbmail/imap4d[57638]: IMAPClientHandler(): Finished command logout
Apr 19 12:51:52 router dbmail/imap4d[57638]: IMAPClientHandler(): Closing connection for client from IP [x.x.x.x]
Apr 19 12:51:52 router dbmail/imap4d[57638]: PerformChildTask(): client handling complete, closing streams
Apr 19 12:51:52 router dbmail/imap4d[57638]: serverchild.c,client_close: closing write stream
Apr 19 12:51:52 router dbmail/imap4d[57638]: serverchild.c,client_close: closing read stream
Apr 19 12:51:52 router dbmail/imap4d[57638]: PerformChildTask(): connection closed
Apr 19 12:51:52 router dbmail/imap4d[57638]: PerformChildTask(): waiting for connection
Apr 19 12:51:52 router dbmail/imap4d[57638]: pool.c,child_reg_disconnected: [57638]
Apr 19 12:51:57 router dbmail/imap4d[57630]: pool.c,manage_spare_children: killing overcomplete spare [57638]
Apr 19 12:51:57 router dbmail/imap4d[57638]: serverchild.c,active_child_sig_handler: got signal [15]
Apr 19 12:51:57 router dbmail/imap4d[57638]: serverchild.c,active_child_sig_handler: setting stop request
Apr 19 12:51:57 router dbmail/imap4d[57638]: PerformChildTask(): accept failed
Apr 19 12:51:57 router dbmail/imap4d[57638]: PerformChildTask(): stop requested
Apr 19 12:51:57 router dbmail/imap4d[57638]: pool.c,child_reg_disconnected: [57638]
Apr 19 12:51:57 router dbmail/imap4d[57638]: serverchild.c,disconnect_all: database connection still open, closing
Apr 19 12:51:57 router dbmail/imap4d[57638]: pool.c,child_unregister: child [57638] unregistered
---end------------------------------------------------------------

Bug History
Date Modified    Username       Field                    Change
======================================================================
18-Jan-05 01:44  xing           New Bug
18-Jan-05 09:25  paul           Bugnote Added: 0000539
18-Jan-05 11:42  xing           Bugnote Added: 0000540
18-Jan-05 11:47  xing           Bugnote Edited: 0000540
26-Jan-05 11:39  sersop         Bugnote Added: 0000569
31-Jan-05 08:12  xing           Bugnote Added: 0000572
28-Mar-05 12:52  paul           Bugnote Added: 0000637
08-Apr-05 04:12  xing           Bugnote Added: 0000653
20-Apr-05 00:13  tukon          Bugnote Added: 0000660
======================================================================
