On 4/11/2013 9:07 AM, Millsap, James wrote:
> Unfortunately, that is difficult: this machine is critical to our
> operations, so I don't have much time to troubleshoot before I must
> have it up and running again. It usually takes around two days for this
> issue to come up. -TERM will kill it; there is no need to use -KILL.
> This is built from source, so no Red Hat packages. This is what I have
> in the qrunner log.
> 
[...]
> Apr 10 10:01:08 2013 (17611) OutgoingRunner qrunner caught SIGTERM.  Stopping.
[...]
> Apr 10 10:01:08 2013 (17604) Master watcher caught SIGTERM.  Exiting.
[...]
> Apr 10 10:01:37 2013 (17604) Master watcher caught SIGTERM.  Exiting.
> Apr 10 10:01:37 2013 (17611) OutgoingRunner qrunner caught SIGTERM.  Stopping.
> Apr 10 10:01:37 2013 (17611) OutgoingRunner qrunner exiting.
> Apr 10 10:01:38 2013 (17604) Master qrunner detected subprocess exit
> (pid: 17611, sig: None, sts: 15, class: OutgoingRunner, slice: 1/1)
[...]


Interesting that OutgoingRunner wouldn't exit until it was SIGTERMed a
second time. It seems highly likely that it is waiting on something
that is not interruptible, and that this is why it stops processing in
the first place and is reluctant to die.
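
As a generic illustration (this is not Mailman's code, and it does not
explain why the second SIGTERM succeeded), here is a minimal Python 3
sketch of a process whose SIGTERM handler only sets a flag. The names
on_sigterm, run and stop_requested are made up for the example, and
time.sleep() stands in for whatever blocking call the runner is stuck
in. The process "catches" the signal at once but cannot stop until the
blocking call returns.

```python
import os
import signal
import threading
import time

stop_requested = False


def on_sigterm(signum, frame):
    # Only record the request; the main loop has to notice it later.
    global stop_requested
    stop_requested = True


def run(work_seconds=0.5):
    """Do one blocking unit of work per pass; check the flag between passes."""
    passes = 0
    while not stop_requested:
        # Stand-in for a blocking call. On Python 3.5+ the sleep is
        # resumed after the handler returns, so it still runs to the end.
        time.sleep(work_seconds)
        passes += 1
    return passes


# Must be called from the main thread.
signal.signal(signal.SIGTERM, on_sigterm)
```

If a SIGTERM arrives 0.1 seconds into a 0.5 second pass, run() still
returns only after the full pass has finished.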

The real question is what it is waiting on, and why. Without an answer
to that, or at least some further clue, I can't say what the cause is.

Check the MTA logs for the period between the time OutgoingRunner 'hung'
and the time it was SIGTERMed. Also consider enabling smtplib debug
logging (see <http://wiki.list.org/x/-IA9>).
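
For reference, this is roughly what smtplib's own debug switch looks
like in isolation. It is a Python 3 sketch, not the Mailman change
described at the link above; the helper name open_debug_connection, the
local_hostname value and the 30 second timeout are assumptions made for
the example.

```python
import smtplib


def open_debug_connection(host="", port=0, timeout=30):
    """Return an smtplib.SMTP object with protocol tracing switched on.

    With the default host="" no connection is attempted.
    """
    # local_hostname is set explicitly (an assumption for this sketch) so
    # that creating the object does not depend on a hostname lookup.
    conn = smtplib.SMTP(local_hostname="localhost", timeout=timeout)
    # At debug level 1, smtplib writes each command it sends and each
    # reply it receives to stderr.
    conn.set_debuglevel(1)
    if host:
        conn.connect(host, port)
    return conn
```

With tracing on, the last command or reply written before the output
stops shows where in the SMTP dialogue the connection stalled.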

-- 
Mark Sapiro <[email protected]>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan