The following issue has been REOPENED.
======================================================================
http://www.dbmail.org/mantis/view.php?id=256
======================================================================
Reported By:                idk
Assigned To:                paul
======================================================================
Project:                    DBMail
Issue ID:                   256
Category:                   General
Reproducibility:            always
Severity:                   major
Priority:                   normal
Status:                     feedback
======================================================================
Date Submitted:             20-Aug-05 23:54 CEST
Last Modified:              23-Aug-05 18:24 CEST
======================================================================
Summary:                    Invalid child management after database restart etc.
Description:
After the mysql service is stopped, all children are killed by the pool
manager (pool.c, manage_stop_children: "General stop requested. Killing
children.."). After the mysql service is started again, only
MINSPARECHILDREN children are started, and no further children are
started even when they are requested.
======================================================================

----------------------------------------------------------------------
 idk - 21-Aug-05 00:06
----------------------------------------------------------------------
My suggestions are:
1) call manage_start_children() instead of manage_spare_children() after
   the database resumes
2) after the db connection resumes, call alarm(10) to recover the alarm
   timer (I have not tested whether this is adequate)
3) correct the LIFO and the infinite loop described above

----------------------------------------------------------------------
 paul - 22-Aug-05 10:15
----------------------------------------------------------------------
I've fixed this problem. There was some faulty logic in
manage_spare_children, the alarm is reset after the database resumes,
and the missing breaks were added.

Thanks a lot for working on this. Please test the latest svn code.

----------------------------------------------------------------------
 idk - 23-Aug-05 18:24
----------------------------------------------------------------------
I added

trace(1, "spare: %d %d", count_children(), count_spare_children());

to manage_spare_children() just before the first loop, started the
daemon, and then made MAX+ connections. This was logged (maillog.txt is
attached, I hope):

Aug 23 17:13:21 start
Aug 23 17:13:21 spare: 5 5
Aug 23 17:13:51 spare: 5 5 - 20s, not 10
Aug 23 17:14:00 connect
Aug 23 17:14:01 spare: 5 4
Aug 23 17:14:05 disconnect
Aug 23 17:14:11 spare: 5 5
Aug 23 17:14:21 spare: 5 5
Aug 23 17:14:31 spare: 5 5
Aug 23 17:14:33+ connect 5 times
Aug 23 17:14:41 spare: 5 0 - last trace of this message, so the alarm stopped in 17:14:41-17:14:50
Aug 23 17:14:41 register children 5-19
Aug 23 17:14:41 child_register failed (21st, ok)
no more messages (alarm)
killall
Aug 23 17:23:31 got signal [15]
Aug 23 17:23:31 stop requested
Aug 23 17:23:31 child [19785] unregistered

All three messages were logged 20 times, but ps ax shows many zombies:

19782 ?  S  0:00 /_/dbmail/dbmail-2.0/.libs/lt-dbmail-imapd
19783 ?  S  0:00 /_/dbmail/dbmail-2.0/.libs/lt-dbmail-imapd
19785 ?  Z  0:00 [lt-dbmail-imapd] <defunct>
19787 ?  Z  0:00 [lt-dbmail-imapd] <defunct>
19789 ?  Z  0:00 [lt-dbmail-imapd] <defunct>
19791 ?  Z  0:00 [lt-dbmail-imapd] <defunct>
19793 ?  Z  0:00 [lt-dbmail-imapd] <defunct>
20075 ?  Z  0:00 [lt-dbmail-imapd] <defunct>
20077 ?  Z  0:00 [lt-dbmail-imapd] <defunct>
20079 ?  Z  0:00 [lt-dbmail-imapd] <defunct>
20081 ?  Z  0:00 [lt-dbmail-imapd] <defunct>
20083 ?  Z  0:00 [lt-dbmail-imapd] <defunct>
20085 ?  Z  0:00 [lt-dbmail-imapd] <defunct>
20087 ?  Z  0:00 [lt-dbmail-imapd] <defunct>
20089 ?  Z  0:00 [lt-dbmail-imapd] <defunct>
20091 ?  Z  0:00 [lt-dbmail-imapd] <defunct>
20093 ?  Z  0:00 [lt-dbmail-imapd] <defunct>
20095 ?  Z  0:00 [lt-dbmail-imapd] <defunct>
20097 ?  Z  0:00 [lt-dbmail-imapd] <defunct>
20099 ?  Z  0:00 [lt-dbmail-imapd] <defunct>
20101 ?  Z  0:00 [lt-dbmail-imapd] <defunct>
20103 ?  Z  0:00 [lt-dbmail-imapd] <defunct>

Killing them one by one by pid had no effect. I tried to start a new
instance, but

Aug 23 17:23:39 File [/var/run/dbmail-imapd.pid] exists

So I deleted the file:

Aug 23 17:25:35 could not bind address to socket

Sorry, I only have this production server (no test servers) and had to
restart it immediately (because of the zombies). I cannot test this
issue now; maybe later (tonight UTC+0200, or at the weekend).

Issue History
Date Modified   Username       Field                    Change
======================================================================
20-Aug-05 23:54 idk            New Issue
21-Aug-05 00:06 idk            Note Added: 0000848
22-Aug-05 10:15 paul           Status                   new => resolved
22-Aug-05 10:15 paul           Resolution               open => fixed
22-Aug-05 10:15 paul           Assigned To               => paul
22-Aug-05 10:15 paul           Note Added: 0000849
23-Aug-05 18:24 idk            Status                   resolved => feedback
23-Aug-05 18:24 idk            Resolution               fixed => reopened
23-Aug-05 18:24 idk            Note Added: 0000872
======================================================================
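
For reference, the two mechanisms discussed in the notes above (a periodic alarm() tick and the <defunct> children) can be sketched in plain POSIX C. This is only an illustration, not DBMail's code and not a diagnosis of this issue: the names alarm_fired, on_alarm, install_alarm_handler and reap_exited_children are hypothetical, and the 10 second interval is taken from suggestion 2. The sketch assumes the parent owns the children directly. alarm() is one-shot, so the handler has to schedule the next tick itself, and a child stays a zombie until its parent collects it with waitpid().

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical flag: set by the handler, polled by the parent's main loop. */
volatile sig_atomic_t alarm_fired = 0;

/* Hypothetical SIGALRM handler. alarm() is one-shot, so the handler
 * re-arms it; otherwise the periodic tick silently stops. */
static void on_alarm(int signo)
{
    (void)signo;
    alarm_fired = 1;
    alarm(10); /* assumed interval, taken from the suggestion in the notes */
}

/* Install the handler with sigaction() and start the first tick.
 * Returns 0 on success, -1 on failure. */
int install_alarm_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART; /* restart interrupted system calls */
    if (sigaction(SIGALRM, &sa, NULL) == -1)
        return -1;
    alarm(10);
    return 0;
}

/* Collect every child that has already exited, without blocking.
 * Returns the number of children reaped. */
int reap_exited_children(void)
{
    int reaped = 0;
    int status;
    pid_t pid;

    for (;;) {
        pid = waitpid(-1, &status, WNOHANG);
        if (pid > 0) {
            reaped++;          /* one zombie removed from the process table */
            continue;
        }
        if (pid == -1 && errno == EINTR)
            continue;          /* interrupted by a signal, try again */
        break;                 /* 0: children still running; ECHILD: none left */
    }
    return reaped;
}
```

In such a design the parent would call reap_exited_children() from its main loop (for example whenever alarm_fired is set) rather than from inside a signal handler, so that the pool bookkeeping is never modified asynchronously.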
