A NOTE has been added to this issue.
======================================================================
http://www.dbmail.org/mantis/view.php?id=370
======================================================================
Reported By:                ryo
Assigned To:
======================================================================
Project:                    DBMail
Issue ID:                   370
Category:                   General
Reproducibility:            sometimes
Severity:                   minor
Priority:                   normal
Status:                     feedback
target:                     2.1.7
======================================================================
Date Submitted:             21-Jun-06 13:57 CEST
Last Modified:              22-Jun-06 06:12 CEST
======================================================================
Summary:                    waitpid() in ParentSigHander() function should be called multiple times for zombies.
Description:
This issue is related to http://www.dbmail.org/mantis/view.php?id=361 (already resolved).
I updated to svn2184, but I still had some zombies. I think that when the parent process receives multiple SIGCHLD signals at the same time, the signal handler is called only once on UNIX systems. So waitpid() in the ParentSigHander() function should be called multiple times to reap all zombies.

In addition, it seems that waitpid() is not called (after the call to kill()) in the manage_stop_chldren function.

I created a patch. Please see the attached file (dbmail-nozombie.patch).
======================================================================

----------------------------------------------------------------------
 paul - 21-Jun-06 16:30
----------------------------------------------------------------------
patch accepted

----------------------------------------------------------------------
 aaron - 21-Jun-06 18:09
----------------------------------------------------------------------
Wait, is this guaranteed not to deadlock? I think I'd rather have a for loop with a finite number of attempts, followed by some ominous trace message that there's a zombie on the loose.

----------------------------------------------------------------------
 ryo - 22-Jun-06 04:19
----------------------------------------------------------------------
According to the man page of waitpid, the return value of the waitpid function is as follows.

RETURN VALUE
       The process ID of the child which exited, or zero if WNOHANG
       was used and no child was available, or -1 on error (in
       which case errno is set to an appropriate value).

If no zombie process exists, the waitpid function returns zero or -1. So the program should not get stuck in the while loop of my patch.

If you are still worried about the while loop locking up, you can limit the number of iterations (e.g., to HARD_MAX_CHILDREN). For example:

    #include <pool.h>
       :
    int cnt = 0;
       :
    while((chpid = waitpid(-1,&sig,WNOHANG)) > 0 && (cnt++ < HARD_MAX_CHILDREN))
            scoreboard_release(chpid);

----------------------------------------------------------------------
 aaron - 22-Jun-06 06:12
----------------------------------------------------------------------
Oh wait, sorry, I misread the patch. The situation that might lead to an infinite loop is if there is some child that is not a zombie and yet does not terminate. Is there a possibility of this occurring?

Issue History
Date Modified   Username       Field                    Change
======================================================================
21-Jun-06 13:57 ryo            New Issue
21-Jun-06 13:57 ryo            File Added: dbmail-nozombie.patch
21-Jun-06 16:30 paul           target                    => 2.1.7
21-Jun-06 16:30 paul           Note Added: 0001256
21-Jun-06 16:30 paul           Status                   new => resolved
21-Jun-06 16:30 paul           Resolution               open => fixed
21-Jun-06 16:30 paul           Fixed in Version          => SVN Trunk
21-Jun-06 18:09 aaron          Note Added: 0001261
22-Jun-06 04:19 ryo            Status                   resolved => feedback
22-Jun-06 04:19 ryo            Resolution               fixed => reopened
22-Jun-06 04:19 ryo            Note Added: 0001263
22-Jun-06 06:12 aaron          Note Added: 0001264
======================================================================
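
For reference, the behaviour discussed in the notes above can be reproduced outside DBMail. The following is a minimal sketch, not the attached patch: reap_zombies(), demo_reap() and MAX_REAP are hypothetical names (MAX_REAP stands in for HARD_MAX_CHILDREN), and the scoreboard_release() call is reduced to a counter. It checks the cap before calling waitpid(), so a child is never reaped and then left unhandled, and it shows that a WNOHANG loop returns while a non-zombie child is still running, because waitpid() then returns 0.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical cap, standing in for HARD_MAX_CHILDREN from pool.h. */
#define MAX_REAP 64

/*
 * Reap every child that has already exited, without blocking.
 * Returns the number of children reaped.
 */
static int reap_zombies(void)
{
    int reaped = 0;
    int status;
    pid_t pid;

    /* Test the cap first, so no pid is reaped and then dropped. */
    while (reaped < MAX_REAP) {
        pid = waitpid(-1, &status, WNOHANG);
        if (pid <= 0)
            break;      /* 0: children alive, none exited; -1: error or no children */
        reaped++;       /* real code would release the scoreboard slot here */
    }
    return reaped;
}

/*
 * Demo: fork n children that exit at once and one that stays alive,
 * then reap. Returns the number reaped, or -1 if fork() fails.
 */
static int demo_reap(int n)
{
    pid_t quick[MAX_REAP];
    pid_t slow;
    siginfo_t info;
    int i, reaped;

    assert(n >= 0 && n <= MAX_REAP);

    for (i = 0; i < n; i++) {
        quick[i] = fork();
        if (quick[i] == 0)
            _exit(0);                   /* child: exit immediately */
        if (quick[i] < 0)
            return -1;
    }

    slow = fork();
    if (slow == 0) {
        pause();                        /* child: stay alive until signalled */
        _exit(0);
    }
    if (slow < 0)
        return -1;

    /* Block until each quick child is a zombie; WNOWAIT leaves it unreaped. */
    for (i = 0; i < n; i++)
        waitid(P_PID, (id_t)quick[i], &info, WEXITED | WNOWAIT);

    reaped = reap_zombies();            /* returns although `slow` still runs */

    kill(slow, SIGKILL);
    waitpid(slow, NULL, 0);             /* clean up the long-running child */
    return reaped;
}
```

Calling demo_reap(3) from a process with no other children should reap exactly the three exited children and return while the fourth is still alive. A later reap_zombies() call then finds no children at all and returns 0.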