[courier-users] Re: SHUTDOWN: respawnhi limit reached.

2001-12-15 Thread Sam Varshavchik

Johannes Erdfelt writes: 

 I see later:
  
 Dec  7 22:42:35 quattro courierd: started,id=00077C44.3C118997.6D32,from=,module=esmtp,host=ofr.pm0.net,addr=[EMAIL PROTECTED]
  
 I don't see any other messages for id 00077C44.3C118997.6D32.
  
 An strace on the aforementioned process resulted in:
  
 quattro:~# strace -p 28027
 read(6, 
  
 Which just sits there.

And what is file descriptor 6? 
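One way to answer that on Linux is to look at the /proc/<pid>/fd/<n> symlink of the stuck process (from a shell, "ls -l /proc/28027/fd/6" as root or as the process owner). A minimal C sketch of the same check follows. It assumes a Linux /proc filesystem, and describe_fd and fd_is_socket are hypothetical helper names. If the stuck read() is on the SMTP connection, the link would be expected to read something like socket:[inode].

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>      /* snprintf */
#include <string.h>     /* strncmp */
#include <sys/types.h>  /* pid_t, ssize_t */
#include <unistd.h>     /* readlink */

/* Linux-specific: /proc/<pid>/fd/<fd> is a symlink naming what the
 * descriptor refers to, e.g. "socket:[123456]" or "pipe:[7890]".
 * Writes the target into buf (NUL-terminated) and returns its length,
 * or -1 on error (no such process or descriptor, permission denied, ...). */
ssize_t describe_fd(pid_t pid, int fd, char *buf, size_t len)
{
    char path[64];
    ssize_t n;

    if (len == 0)
        return -1;
    snprintf(path, sizeof path, "/proc/%ld/fd/%d", (long)pid, fd);
    n = readlink(path, buf, len - 1);   /* readlink() does not add a NUL */
    if (n < 0)
        return -1;
    buf[n] = '\0';
    return n;
}

/* Returns 1 if the descriptor is a socket (such as an SMTP connection),
 * 0 if it is something else, -1 if it could not be inspected. */
int fd_is_socket(pid_t pid, int fd)
{
    char target[256];

    if (describe_fd(pid, fd, target, sizeof target) < 0)
        return -1;
    return strncmp(target, "socket:", 7) == 0;
}
```

For the case in this thread, fd_is_socket(28027, 6) returning 1 would point at the network connection rather than a local file.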

-- 
Sam

___
courier-users mailing list
[EMAIL PROTECTED]
Unsubscribe: https://lists.sourceforge.net/lists/listinfo/courier-users



Re: [courier-users] Re: SHUTDOWN: respawnhi limit reached.

2001-12-14 Thread Aly S.P Dharshi

 daemon   28027  0.0  0.0  1916 1008 ?        S    Dec07   0:00 courieresmtp 0 ofr.pm0.net

Just a BTW, not related to this topic: it seems that the domain pm0.net is
regarded as a spam site and has been blocked by a great many sites recently.
It is currently being discussed on the Exim mailing list.





[courier-users] Re: SHUTDOWN: respawnhi limit reached.

2001-12-07 Thread Sam Varshavchik

Johannes Erdfelt writes: 

 On Thu, Dec 06, 2001, Gordon Messmer [EMAIL PROTECTED] wrote:
 On Thu, 6 Dec 2001, Johannes Erdfelt wrote:
  The mail server is busy much of the time, but I don't think it's busy
  enough to naturally hit the respawnhi timeout. It looks like somehow
  courier missed that a child finished and that's why it hit the respawnhi
  timeout. 
 
 I was wrong about that.  The child processes are still legitimately 
 running.  As fate would have it just as I started this email, I was pulled 
 in to some mail server issues and noticed that the respawnhi thing had 
 happened again.  All of the courieresmtp processes were stuck in a read() 
 system call on fd 5.  I have the control file from a couple, and there are 
 lots of DNS failures recorded. 
 
 It's much too late to do any debugging right now, but I'll be all over this 
 tomorrow.  In any case, it's not that courierd isn't harvesting children,
 it's that the children are blocking on an unprotected read().  (I thought
 they all had alarms in place...  /me shrugs)
 
 I checked for any running processes, but I couldn't find any. I do have
 lots of Courier-related processes running (authdaemon, pop and imap), so I
 may have missed one. 
 
 Either way, my system sat for 6 hours or so doing nothing. If you're
 right that there was a process still running, something is missing a
 timeout. 
 
 I wonder what the longest timeout is. Presumably, if respawnhi is reached
 right after a legitimate process is spawned, and that process then has to
 wait for a connection to time out, there will always be a chance that
 Courier just stops delivering email for a while. 
 
 respawnhi seems to need some sort of timeout, even if it's extremely
 long.
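A rough, purely hypothetical sketch of the kind of bounded wait suggested here: the parent polls its children with waitpid(..., WNOHANG) until a deadline, then kills and reaps whatever is still running. This is not how courierd works; reap_with_deadline and its max_wait parameter are illustrative names, and the sketch assumes a POSIX system. SIGKILL is used so that the final blocking waitpid() cannot hang on a child that ignores SIGTERM.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>     /* kill, SIGKILL */
#include <sys/types.h>  /* pid_t */
#include <sys/wait.h>   /* waitpid, WNOHANG */
#include <time.h>       /* time */
#include <unistd.h>     /* sleep */

/* Hypothetical helper, not part of Courier: wait for the child processes
 * in pids[] to exit, polling once a second for at most max_wait seconds.
 * Anything still running after that is killed with SIGKILL and reaped.
 * Each slot in pids[] is set to 0 once handled.
 * Returns the number of children that had to be killed. */
int reap_with_deadline(pid_t *pids, int n, time_t max_wait)
{
    time_t deadline = time(NULL) + max_wait;
    int remaining = 0;
    int killed = 0;
    int i;

    for (i = 0; i < n; i++)
        if (pids[i] > 0)
            remaining++;

    while (remaining > 0 && time(NULL) < deadline) {
        for (i = 0; i < n; i++) {
            pid_t r;

            if (pids[i] <= 0)
                continue;
            r = waitpid(pids[i], NULL, WNOHANG);
            if (r == pids[i] || r < 0) {   /* exited, or no longer our child */
                pids[i] = 0;
                remaining--;
            }
        }
        if (remaining > 0)
            sleep(1);
    }

    /* Deadline reached: kill and reap any child that is still running. */
    for (i = 0; i < n; i++) {
        if (pids[i] <= 0)
            continue;
        if (waitpid(pids[i], NULL, WNOHANG) == 0) {   /* 0 means still running */
            kill(pids[i], SIGKILL);
            waitpid(pids[i], NULL, 0);
            killed++;
        }
        pids[i] = 0;
    }
    return killed;
}
```

The trade-off is the one raised in this thread: a very long max_wait keeps the server from stalling forever, but any delivery still in progress at the deadline would be cut off.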

The server is designed to restart itself only when no mail is pending. 

The problem is that the client should not be stuck like that.  There's a 
select() before every read from the socket, so if anything, it should be 
stuck in a select(). 
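The pattern described here, waiting in select() with a timeout before calling read(), can be sketched in C roughly as follows. This is not Courier's actual code; read_with_timeout is a hypothetical name, and the sketch assumes a POSIX system and a descriptor below FD_SETSIZE. A client written this way that receives no data waits in select() and then gives up, rather than blocking in read() forever.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>       /* errno, EINVAL, ETIMEDOUT */
#include <sys/select.h>  /* select, fd_set, struct timeval */
#include <sys/types.h>   /* ssize_t */
#include <unistd.h>      /* read */

/* Hypothetical helper, not Courier's code: wait up to timeout_sec seconds
 * for fd to become readable, then read() at most len bytes from it.
 * Returns the byte count (0 at end of file), or -1 with errno set;
 * errno is ETIMEDOUT when nothing arrived before the deadline. */
ssize_t read_with_timeout(int fd, void *buf, size_t len, long timeout_sec)
{
    fd_set readable;
    struct timeval tv;
    int rc;

    if (fd < 0 || fd >= FD_SETSIZE) {   /* select() cannot watch this fd */
        errno = EINVAL;
        return -1;
    }

    FD_ZERO(&readable);
    FD_SET(fd, &readable);
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;

    rc = select(fd + 1, &readable, NULL, NULL, &tv);
    if (rc < 0)
        return -1;              /* select() failed; errno already set (e.g. EINTR) */
    if (rc == 0) {
        errno = ETIMEDOUT;      /* peer went quiet: give up instead of hanging */
        return -1;
    }
    return read(fd, buf, len);  /* fd is readable, so this should not block */
}
```

A caller would typically treat ETIMEDOUT as a failed delivery attempt and close the connection.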

Get the date of the stuck message, and review your logs to see if there are 
any errors in syslog around that time, or a little bit later. 

-- 
Sam 

