Re: Problems with cyrus-imapd 2.1.11 under Solaris 8

2002-12-09 Thread Stephen Grier
Henrique de Moraes Holschuh wrote:
> There is a more complete solution to the SIGCHLD problems in master, that
> fixes all the race conditions that cause the process count to be lost. I
> call it the pid morgue :-)
>
> It is in the bugzilla, and it is being used in production by the fastmail.fm
> people, AND all Debian users without a glitch for a long while now...

Is there any reason why this SIGCHLD patch should not be applied to
2.1.11 under Solaris 8? Why is this not in recent releases of
cyrus-imapd? The patch dates to May 02.

http://bugzilla.andrew.cmu.edu/show_bug.cgi?id=1261

-- 

Stephen Grier
Systems Developer
Computing Services
Queen Mary, University of London



Re: Problems with cyrus-imapd 2.1.11 under Solaris 8

2002-12-09 Thread Lawrence Greenfield
--On Monday, December 09, 2002 4:39 PM + Stephen Grier 
[EMAIL PROTECTED] wrote:

> Henrique de Moraes Holschuh wrote:
>
>> There is a more complete solution to the SIGCHLD problems in master,
>> that fixes all the race conditions that cause the process count to be
>> lost. I call it the pid morgue :-)
>>
>> It is in the bugzilla, and it is being used in production by the
>> fastmail.fm people, AND all Debian users without a glitch for a long
>> while now...
>
> Is there any reason why this SIGCHLD patch should not be applied to
> 2.1.11 under Solaris 8? Why is this not in recent releases of
> cyrus-imapd? The patch dates to May 02.


Because making changes to master like this is fraught with race conditions. 
I'm not going to apply the patch until I've done a very careful review and 
I just haven't had the chance to do a very careful review.

It hasn't been all that important for us since our master processes don't 
lose track of the number of children.

Larry



Re: Problems with cyrus-imapd 2.1.11 under Solaris 8

2002-12-06 Thread Nicola Ranaldo
I think master should check the exit status of its children and decrement the
number of ready_workers.

Regards

Nicola Ranaldo


> When a pop3d dies from a signal (e.g. SIGTERM), all incoming
> connections to the corresponding address:port hang. For example, if I
> have pop3d running on 192.168.0.1:110, and issue the command:
>
> $ kill PID_OF_THIS_POP3D
>
> and then
>
> $ telnet 192.168.0.1 110
>
> I don't see the pop3d banner after a successful connection. It looks like
> all incoming connections to this address are held in the kernel queue and
> don't reach accept(). The last message in the log is "process PID exited,
> signaled to death by 15".
>
> At the same time, all connections to 192.168.0.2:110 complete
> successfully and I can see the standard pop3d banner.
>
> Any ideas?






Re: Problems with cyrus-imapd 2.1.11 under Solaris 8

2002-12-06 Thread Oleg Derevenetz
Nicola Ranaldo wrote:

> I think master should check the exit status of its children and decrement the
> number of ready_workers.

It seems you are perfectly right. I wrote a quick fix (I would be very
thankful if you could check my fix and correct me if I am wrong) and it works
:-) The attachment contains this fix.
*** master.c.orig   Fri Nov  1 19:44:33 2002
--- master.c        Fri Dec  6 13:43:22 2002
***************
*** 720,728 ****
      if (c && c->pid == pid) {
          /* first thing in the linked list */
  
          /* decrement active count for service */
          if (c->s) c->s->nactive--;
! 
          ctable[pid % child_table_size] = c->next;
          c->next = cfreelist;
          cfreelist = c;
--- 720,733 ----
      if (c && c->pid == pid) {
          /* first thing in the linked list */
  
+         /* decrement workers count if process not exited correctly */
+         if (!(WIFEXITED(status)) && c->s) {
+             c->s->ready_workers--;
+         }
+ 
          /* decrement active count for service */
          if (c->s) c->s->nactive--;
!         
          ctable[pid % child_table_size] = c->next;
          c->next = cfreelist;
          cfreelist = c;
***************
*** 737,742 ****
--- 742,753 ----
          struct centry *t;
  
          t = c->next;
+ 
+         /* decrement workers count if process not exited correctly */
+         if (!(WIFEXITED(status)) && t->s) {
+             t->s->ready_workers--;
+         }
+ 
          /* decrement active count for service */
          if (t->s) t->s->nactive--;
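
The accounting in the patch above can be tried in isolation with a small sketch. It is not code from master.c: `struct service`, `account_for_exit()` and `spawn_and_reap()` are hypothetical names made up for illustration, and only the `ready_workers` and `nactive` counters are borrowed from the patch. Like the patch, it assumes that a child which dies abnormally was still counted as a ready worker; if such a child was busy serving a client when it was killed, the decrement would be one too many.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for master's per-service counters (not the real struct). */
struct service {
    int ready_workers;  /* children idle and waiting for a connection */
    int nactive;        /* all live children of this service */
};

/* Mirror the patch: on any abnormal exit, also give back a ready slot. */
void account_for_exit(struct service *svc, int status)
{
    assert(svc != NULL);
    if (!WIFEXITED(status)) {
        svc->ready_workers--;   /* assumes the child was counted as ready */
    }
    svc->nactive--;
}

/*
 * Fork one child and reap it. If sig is non-zero the child idles until the
 * parent kills it with that signal; otherwise it exits normally at once.
 * Returns 0 and stores the wait status on success, -1 on failure.
 */
int spawn_and_reap(int sig, int *status)
{
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        if (sig != 0) {
            for (;;) pause();   /* wait here until the signal arrives */
        }
        _exit(0);
    }
    if (sig != 0 && kill(pid, sig) != 0) return -1;
    return waitpid(pid, status, 0) == pid ? 0 : -1;
}
```

For a child killed with SIGTERM, `WIFSIGNALED(status)` is true and `WTERMSIG(status)` equals SIGTERM, which corresponds to the "signaled to death by 15" log line quoted earlier in the thread. Calling `account_for_exit()` with that status lowers both counters, while a normal exit lowers only `nactive`.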
  



Re: Problems with cyrus-imapd 2.1.11 under Solaris 8

2002-12-06 Thread Henrique de Moraes Holschuh
On Fri, 06 Dec 2002, Oleg Derevenetz wrote:
> Nicola Ranaldo wrote:
>> I think master should check the exit status of its children and decrement
>> the number of ready_workers.
>
> It seems you are perfectly right. I wrote a quick fix (I would be very
> thankful if you could check my fix and correct me if I am wrong) and it works
> :-) The attachment contains this fix.

There is a more complete solution to the SIGCHLD problems in master, that
fixes all the race conditions that cause the process count to be lost. I
call it the pid morgue :-)

It is in the bugzilla, and it is being used in production by the fastmail.fm
people, AND all Debian users without a glitch for a long while now...

You may want to have a look at that stuff...

-- 
  One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie. -- The Silicon Valley Tarot
  Henrique Holschuh



Re: Problems with cyrus-imapd 2.1.11 under Solaris 8

2002-12-05 Thread Carson Gaspar


--On Thursday, December 05, 2002 10:22 PM +0300 Oleg Derevenetz 
[EMAIL PROTECTED] wrote:

> When a pop3d dies from a signal (e.g. SIGTERM), all incoming connections
> to the corresponding address:port hang. For example, if I have pop3d


I can confirm that the same bug exists under Solaris 8 x86 (fully patched) 
with imapd. To reproduce:

- Start master
- connect to imapd
- kill the imapd process

No further imapd processes will be spawned. This is reliable - not a race 
condition. I'll see if I can figure out what died in the code.

--
Carson



Re: Problems with cyrus-imapd 2.1.11 under Solaris 8

2002-12-05 Thread Rob Siemborski
On Thu, 5 Dec 2002, Carson Gaspar wrote:

>> When a pop3d dies from a signal (e.g. SIGTERM), all incoming
>> connections to the corresponding address:port hang. For example, if I
>> have pop3d
>
> I can confirm that the same bug exists under Solaris 8 x86 (fully patched)
> with imapd. To reproduce:
>
> - Start master
> - connect to imapd
> - kill the imapd process
>
> No further imapd processes will be spawned. This is reliable - not a race
> condition. I'll see if I can figure out what died in the code.

This isn't good enough for me to reproduce it.

I have tried both with preforking and without preforking.

I cannot get 2.1.11 to behave like this on Solaris 8.

Master didn't change since 2.1.10 so I don't know what this could be.

-Rob

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Rob Siemborski * Andrew Systems Group * Cyert Hall 207 * 412-268-7456
Research Systems Programmer * /usr/contributed Gatekeeper




Re: Problems with cyrus-imapd 2.1.11 under Solaris 8

2002-12-05 Thread Carson Gaspar


--On Thursday, December 05, 2002 4:58 PM -0500 Rob Siemborski 
[EMAIL PROTECTED] wrote:

> This isn't good enough for me to reproduce it.
>
> I have tried both with preforking and without preforking.
>
> I cannot get 2.1.11 to behave like this on Solaris 8.
>
> Master didn't change since 2.1.10 so I don't know what this could be.


And of course now I can't, either. Odd...

--
Carson




Re: Problems with cyrus-imapd 2.1.11 under Solaris 8

2002-12-05 Thread Ken Murchison


Carson Gaspar wrote:
 
> --On Thursday, December 05, 2002 10:22 PM +0300 Oleg Derevenetz
> [EMAIL PROTECTED] wrote:
>
>> When a pop3d dies from a signal (e.g. SIGTERM), all incoming connections
>> to the corresponding address:port hang. For example, if I have pop3d
>
> I can confirm that the same bug exists under Solaris 8 x86 (fully patched)
> with imapd. To reproduce:
>
> - Start master
> - connect to imapd
> - kill the imapd process
>
> No further imapd processes will be spawned. This is reliable - not a race
> condition. I'll see if I can figure out what died in the code.

I'm pretty sure that this is a case of master losing count of the number
of available pop3d processes.  When you kill a pop3d with SIGTERM,
master never gets a SIGCHLD, so it never decrements its counter.
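
One general way a parent can lose count is that ordinary signals such as SIGCHLD are not queued: if several children exit close together, the parent may see only one delivery. The sketch below shows the usual defence, a handler that only sets a flag and a reaping loop that calls waitpid() with WNOHANG until no exited child is left. It is an illustration under those assumptions, not the code in master.c and not the "pid morgue" patch; `install_sigchld_handler()`, `spawn_short_lived()` and `reap_all_exited()` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set by the handler; a main loop would check it and then reap. */
volatile sig_atomic_t got_sigchld = 0;

static void on_sigchld(int signo)
{
    (void)signo;
    got_sigchld = 1;    /* only set a flag: async-signal-safe */
}

/* Install the flag-setting SIGCHLD handler. Returns 0 on success. */
int install_sigchld_handler(void)
{
    struct sigaction sa;
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Hypothetical helper: fork n children that exit at once. Returns count forked. */
int spawn_short_lived(int n)
{
    int forked = 0;
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == 0) _exit(0);
        if (pid > 0) forked++;
    }
    return forked;
}

/*
 * Reap every child that has already exited, decrementing *live once per
 * child rather than once per signal. Returns the number reaped.
 */
int reap_all_exited(int *live)
{
    int reaped = 0, status;
    assert(live != NULL);
    got_sigchld = 0;    /* clear before reaping so a new signal is not missed */
    while (waitpid(-1, &status, WNOHANG) > 0) {
        (*live)--;
        reaped++;
    }
    return reaped;
}
```

A caller would run `reap_all_exited()` whenever `got_sigchld` is set. Because the counter is tied to each successful waitpid() rather than to each signal, it still reaches zero when several exits are reported by a single SIGCHLD.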

-- 
Kenneth Murchison Oceana Matrix Ltd.
Software Engineer 21 Princeton Place
716-662-8973 x26  Orchard Park, NY 14127
--PGP Public Key--http://www.oceana.com/~ken/ksm.pgp