Re: Problems with cyrus-imapd 2.1.11 under Solaris 8
Henrique de Moraes Holschuh wrote:

> There is a more complete solution to the SIGCHLD problems in master that
> fixes all the race conditions that cause the process count to be lost. I
> call it the pid morgue :-) It is in the bugzilla, and it has been used in
> production by the fastmail.fm people AND all Debian users without a glitch
> for a long while now...

Is there any reason why this SIGCHLD patch should not be applied to 2.1.11 under Solaris 8? Why is this not in recent releases of cyrus-imapd? The patch dates to May 02.

http://bugzilla.andrew.cmu.edu/show_bug.cgi?id=1261

--
Stephen Grier
Systems Developer
Computing Services
Queen Mary, University of London
Re: Problems with cyrus-imapd 2.1.11 under Solaris 8
--On Monday, December 09, 2002 4:39 PM + Stephen Grier [EMAIL PROTECTED] wrote:

> Henrique de Moraes Holschuh wrote:
>> There is a more complete solution to the SIGCHLD problems in master that
>> fixes all the race conditions that cause the process count to be lost. I
>> call it the pid morgue :-) It is in the bugzilla, and it has been used in
>> production by the fastmail.fm people AND all Debian users without a glitch
>> for a long while now...
>
> Is there any reason why this SIGCHLD patch should not be applied to 2.1.11
> under Solaris 8? Why is this not in recent releases of cyrus-imapd? The
> patch dates to May 02.

Because making changes to master like this is fraught with race conditions. I'm not going to apply the patch until I've done a very careful review, and I just haven't had the chance to do one. It hasn't been all that important for us, since our master processes don't lose track of the number of children.

Larry
Re: Problems with cyrus-imapd 2.1.11 under Solaris 8
I think master should check the exit status of its children and decrement the number of ready_workers.

Regards
Nicola Ranaldo

> When a pop3d process dies from a signal (e.g. SIGTERM), all incoming
> connections to the corresponding address:port hang. For example, if I have
> pop3d running on 192.168.0.1:110 and issue the command
>
> $ kill PID_OF_THIS_POP3D
>
> and then
>
> $ telnet 192.168.0.1 110
>
> I don't see the pop3d banner after a successful connection. It looks like
> all incoming connections to this address are held in the kernel queue and
> never reach accept(). The last message in the log is "process PID exited,
> signaled to death by 15". At the same time, all connections to
> 192.168.0.2:110 complete successfully and I can see the standard pop3d
> banner. Any ideas?
Re: Problems with cyrus-imapd 2.1.11 under Solaris 8
Nicola Ranaldo wrote:

> I think master should check the exit status of its children and decrement
> the number of ready_workers.

It seems you are perfectly right. I wrote a quick fix (I would be very thankful if you checked it and corrected me if I am wrong) and it works :-) The attachment contains this fix.

*** master.c.orig  Fri Nov  1 19:44:33 2002
--- master.c       Fri Dec  6 13:43:22 2002
***************
*** 720,728 ****
      if (c && c->pid == pid) {
          /* first thing in the linked list */

          /* decrement active count for service */
          if (c->s) c->s->nactive--;
!
          ctable[pid % child_table_size] = c->next;
          c->next = cfreelist;
          cfreelist = c;
--- 720,733 ----
      if (c && c->pid == pid) {
          /* first thing in the linked list */

+         /* decrement workers count if process not exited correctly */
+         if (!(WIFEXITED(status)) && c->s) {
+             c->s->ready_workers--;
+         }
+
          /* decrement active count for service */
          if (c->s) c->s->nactive--;
!
          ctable[pid % child_table_size] = c->next;
          c->next = cfreelist;
          cfreelist = c;
***************
*** 737,742 ****
--- 742,753 ----
              struct centry *t;

              t = c->next;
+
+             /* decrement workers count if process not exited correctly */
+             if (!(WIFEXITED(status)) && t->s) {
+                 t->s->ready_workers--;
+             }
+
              /* decrement active count for service */
              if (t->s) t->s->nactive--;
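For readers who want to see the rule in the patch in isolation, here is a minimal sketch. It is not code from master.c: struct service is a hypothetical stand-in that keeps only the two counters the patch touches, and account_child_exit() and spawn_and_kill() are names invented for this illustration. It assumes, as the patch does, that a child killed by a signal was still counted as a ready worker; whether that holds in master depends on state this sketch does not model.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for master's per-service bookkeeping. */
struct service {
    int ready_workers;  /* children assumed idle, waiting for a connection */
    int nactive;        /* all live children of this service */
};

/*
 * Update the counters for one reaped child, following the rule in the
 * patch: any child that did not terminate through a normal exit also
 * gives up its slot in ready_workers.
 */
static void account_child_exit(struct service *s, int status)
{
    assert(s != NULL);
    if (!WIFEXITED(status))
        s->ready_workers--;
    s->nactive--;
}

/*
 * Fork an idle child, kill it with the given signal and return the wait
 * status that waitpid() reports, or -1 if fork/kill/waitpid failed.
 */
static int spawn_and_kill(int sig)
{
    int status = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        for (;;)
            pause();            /* child: idle until a signal arrives */
    }
    if (kill(pid, sig) < 0)
        return -1;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return status;
}
```

Passing the status returned by spawn_and_kill(SIGTERM) to account_child_exit() takes a service that starts with one ready worker and one active child down to zero for both, which is the case the thread describes: a worker killed with signal 15 while idle.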
Re: Problems with cyrus-imapd 2.1.11 under Solaris 8
On Fri, 06 Dec 2002, Oleg Derevenetz wrote:

> Nicola Ranaldo wrote:
>> I think master should check the exit status of its children and decrement
>> the number of ready_workers.
>
> It seems you are perfectly right. I wrote a quick fix (I would be very
> thankful if you checked it and corrected me if I am wrong) and it works :-)
> The attachment contains this fix.

There is a more complete solution to the SIGCHLD problems in master that fixes all the race conditions that cause the process count to be lost. I call it the pid morgue :-) It is in the bugzilla, and it has been used in production by the fastmail.fm people AND all Debian users without a glitch for a long while now... You may want to have a look at that stuff...

--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh
Re: Problems with cyrus-imapd 2.1.11 under Solaris 8
--On Thursday, December 05, 2002 10:22 PM +0300 Oleg Derevenetz [EMAIL PROTECTED] wrote:

> When a pop3d process dies from a signal (e.g. SIGTERM), all incoming
> connections to the corresponding address:port hang. For example, if I have
> pop3d

I can confirm that the same bug exists under Solaris 8 x86 (fully patched) with imapd. To reproduce:

- Start master
- Connect to imapd
- Kill the imapd process

No further imapd processes will be spawned. This is reliable, not a race condition. I'll see if I can figure out what died in the code.

--
Carson
Re: Problems with cyrus-imapd 2.1.11 under Solaris 8
On Thu, 5 Dec 2002, Carson Gaspar wrote:

>> When a pop3d process dies from a signal (e.g. SIGTERM), all incoming
>> connections to the corresponding address:port hang. For example, if I
>> have pop3d
>
> I can confirm that the same bug exists under Solaris 8 x86 (fully patched)
> with imapd. To reproduce:
>
> - Start master
> - Connect to imapd
> - Kill the imapd process
>
> No further imapd processes will be spawned. This is reliable, not a race
> condition. I'll see if I can figure out what died in the code.

This isn't enough for me to reproduce it. I have tried both with and without preforking, and I cannot get 2.1.11 to behave like this on Solaris 8. Master hasn't changed since 2.1.10, so I don't know what this could be.

-Rob

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Rob Siemborski * Andrew Systems Group * Cyert Hall 207 * 412-268-7456
Research Systems Programmer * /usr/contributed Gatekeeper
Re: Problems with cyrus-imapd 2.1.11 under Solaris 8
--On Thursday, December 05, 2002 4:58 PM -0500 Rob Siemborski [EMAIL PROTECTED] wrote:

> This isn't enough for me to reproduce it. I have tried both with and
> without preforking, and I cannot get 2.1.11 to behave like this on
> Solaris 8. Master hasn't changed since 2.1.10, so I don't know what this
> could be.

And of course now I can't, either. Odd...

--
Carson
Re: Problems with cyrus-imapd 2.1.11 under Solaris 8
Carson Gaspar wrote:

> --On Thursday, December 05, 2002 10:22 PM +0300 Oleg Derevenetz [EMAIL PROTECTED] wrote:
>
>> When a pop3d process dies from a signal (e.g. SIGTERM), all incoming
>> connections to the corresponding address:port hang. For example, if I
>> have pop3d
>
> I can confirm that the same bug exists under Solaris 8 x86 (fully patched)
> with imapd. To reproduce:
>
> - Start master
> - Connect to imapd
> - Kill the imapd process
>
> No further imapd processes will be spawned. This is reliable, not a race
> condition. I'll see if I can figure out what died in the code.

I'm pretty sure that this is a case of master losing count of the number of available pop3d processes. When you kill a pop3d with SIGTERM, master never gets a SIGCHLD and so never decrements its counter.

--
Kenneth Murchison     Oceana Matrix Ltd.
Software Engineer     21 Princeton Place
716-662-8973 x26      Orchard Park, NY 14127
--PGP Public Key--    http://www.oceana.com/~ken/ksm.pgp
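As general background on how a SIGCHLD-driven counter can drift (this is not a claim about what master.c does, nor about the pid morgue patch): standard signals are not queued, so several children exiting close together may produce only one SIGCHLD delivery. A handler that reaps a single child per signal can therefore miss some. A common defence is to loop on waitpid() with WNOHANG until nothing more is pending. The sketch below assumes nothing beyond POSIX; reap_all_children() is a name invented for this illustration.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Reap every child that has already terminated and return how many were
 * collected. Intended to be called once per SIGCHLD notification, since
 * one notification may stand for several exited children.
 */
static int reap_all_children(void)
{
    int reaped = 0;
    int status;
    pid_t pid;

    for (;;) {
        pid = waitpid(-1, &status, WNOHANG);
        if (pid > 0) {
            /* per-child bookkeeping on pid and status would go here */
            reaped++;
            continue;
        }
        if (pid < 0 && errno == EINTR)
            continue;           /* interrupted by a signal: retry */
        break;                  /* 0: none exited yet; -1: no children left */
    }
    return reaped;
}
```

The return value is only a count; a real server would update its per-service counters at the marked point inside the loop, using the pid to find the right entry.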