Unfortunately, that is difficult. This machine is critical to our operations, so I 
don't have much time to troubleshoot before I must have it up and 
running again. It usually takes around two days for this issue to come up. kill -TERM 
will kill it; there is no need to use -KILL. This is built from source, so no Red Hat 
packages are involved. This is what I have in the qrunner log:

Apr 10 10:01:08 2013 (17606) ArchRunner qrunner caught SIGTERM.  Stopping.
Apr 10 10:01:08 2013 (17606) ArchRunner qrunner exiting.
Apr 10 10:01:08 2013 (17611) OutgoingRunner qrunner caught SIGTERM.  Stopping.
Apr 10 10:01:08 2013 (17612) VirginRunner qrunner caught SIGTERM.  Stopping.
Apr 10 10:01:08 2013 (17612) VirginRunner qrunner exiting.
Apr 10 10:01:08 2013 (17607) BounceRunner qrunner caught SIGTERM.  Stopping.
Apr 10 10:01:08 2013 (17608) CommandRunner qrunner caught SIGTERM.  Stopping.
Apr 10 10:01:08 2013 (17608) CommandRunner qrunner exiting.
Apr 10 10:01:08 2013 (17609) IncomingRunner qrunner caught SIGTERM.  Stopping.
Apr 10 10:01:08 2013 (17609) IncomingRunner qrunner exiting.
Apr 10 10:01:08 2013 (17610) NewsRunner qrunner caught SIGTERM.  Stopping.
Apr 10 10:01:08 2013 (17610) NewsRunner qrunner exiting.
Apr 10 10:01:08 2013 (17613) RetryRunner qrunner caught SIGTERM.  Stopping.
Apr 10 10:01:08 2013 (17613) RetryRunner qrunner exiting.
Apr 10 10:01:08 2013 (17604) Master watcher caught SIGTERM.  Exiting.
Apr 10 10:01:08 2013 (17604) Master qrunner detected subprocess exit
(pid: 17606, sig: None, sts: 15, class: ArchRunner, slice: 1/1)
Apr 10 10:01:08 2013 (17604) Master qrunner detected subprocess exit
(pid: 17608, sig: None, sts: 15, class: CommandRunner, slice: 1/1)
Apr 10 10:01:08 2013 (17604) Master qrunner detected subprocess exit
(pid: 17609, sig: None, sts: 15, class: IncomingRunner, slice: 1/1)
Apr 10 10:01:08 2013 (17604) Master qrunner detected subprocess exit
(pid: 17610, sig: None, sts: 15, class: NewsRunner, slice: 1/1)
Apr 10 10:01:08 2013 (17604) Master qrunner detected subprocess exit
(pid: 17612, sig: None, sts: 15, class: VirginRunner, slice: 1/1)
Apr 10 10:01:08 2013 (17604) Master qrunner detected subprocess exit
(pid: 17613, sig: None, sts: 15, class: RetryRunner, slice: 1/1)
Apr 10 10:01:08 2013 (17607) BounceRunner qrunner exiting.
Apr 10 10:01:08 2013 (17604) Master qrunner detected subprocess exit
(pid: 17607, sig: None, sts: 15, class: BounceRunner, slice: 1/1)
Apr 10 10:01:37 2013 (17611) OutgoingRunner qrunner caught SIGTERM.  Stopping.
Apr 10 10:01:37 2013 (17604) Master watcher caught SIGTERM.  Exiting.
Apr 10 10:01:37 2013 (17611) OutgoingRunner qrunner caught SIGTERM.  Stopping.
Apr 10 10:01:37 2013 (17611) OutgoingRunner qrunner exiting.
Apr 10 10:01:38 2013 (17604) Master qrunner detected subprocess exit
(pid: 17611, sig: None, sts: 15, class: OutgoingRunner, slice: 1/1)
Apr 10 10:01:58 2013 (15858) CommandRunner qrunner started.
Apr 10 10:01:59 2013 (15859) IncomingRunner qrunner started.
Apr 10 10:01:59 2013 (15856) ArchRunner qrunner started.
Apr 10 10:01:59 2013 (15857) BounceRunner qrunner started.
Apr 10 10:01:59 2013 (15862) VirginRunner qrunner started.
Apr 10 10:01:59 2013 (15860) NewsRunner qrunner started.
Apr 10 10:01:59 2013 (15863) RetryRunner qrunner started.
Apr 10 10:01:59 2013 (15861) OutgoingRunner qrunner started.

-----Original Message-----
From: Mark Sapiro [mailto:[email protected]] 
Sent: Wednesday, April 10, 2013 3:59 PM
To: Millsap, James
Cc: [email protected]
Subject: Re: [Mailman-Users] mailman 2.1.14 stops sending mail

On 4/10/2013 8:43 AM, Millsap, James wrote:
> 
> mailman  15854     1  0 10:01 ?        00:00:00 /usr/bin/python 
> /usr/local/mailman/bin/mailmanctl -s start
> mailman  15861 15854  0 10:01 ?        00:00:06 /usr/bin/python 
> /usr/local/mailman/bin/qrunner --runner=OutgoingRunner:0:1 -s
> 
> I have to kill the outgoingrunner specifically.  The only thing I see in the 
> logs is a lack of logging.  It has been running with stunning reliability on 
> this machine for the last few years, so I am not sure what is going on.  
> Perhaps one of redhat's patches killed it.


Can you kill -TERM it or do you need to kill -KILL it?

Are you sure there's nothing relevant in Mailman's qrunner log 
(/var/log/mailman/qrunner if a rhel packaged Mailman)? Is there a current .bak 
file in the out queue (/var/spool/mailman/out/)?

What does 'lsof' show for the process? You might be able to get something 
useful from 'gdb' or maybe see something like 
<http://stackoverflow.com/questions/132058/showing-the-stack-trace-from-a-running-python-application>.
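As a rough illustration of the stack-trace idea, here is a minimal sketch of a signal handler that writes the stack of every thread when the process receives a signal. The function names and the choice of SIGUSR1 are assumptions made for this example, not anything Mailman provides, and the handler has to be installed in the process before it hangs. It uses only the standard library (signal, sys, traceback).

```python
import signal
import sys
import traceback


def format_all_stacks():
    """Return the current stack of every thread as one string."""
    chunks = []
    # sys._current_frames() maps thread id -> topmost frame of that thread.
    for thread_id, frame in sys._current_frames().items():
        chunks.append("Thread %s:\n" % thread_id)
        chunks.extend(traceback.format_stack(frame))
    return "".join(chunks)


def install_stack_dump_handler(signum=signal.SIGUSR1, stream=None):
    """On signum, write all thread stacks to stream (stderr by default).

    SIGUSR1 is an arbitrary choice for this sketch.
    """
    def handler(sig, frame):
        out = stream if stream is not None else sys.stderr
        out.write(format_all_stacks())
        out.flush()

    signal.signal(signum, handler)
```

With the handler installed, 'kill -USR1 <pid>' should make the process report where each thread currently is. For a process that is already hung and has no such handler, attaching with gdb remains the option.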

If I had to guess, I'd guess it gets hung waiting for an SMTP response from the 
MTA.
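To make that guess concrete: a client that reads from a socket with no timeout will block indefinitely if the server accepts the connection and then never answers. The sketch below is a standalone probe, not Mailman code. It is written for Python 3, and the function name and the default timeout are assumptions for illustration. It passes a timeout to smtplib.SMTP so that a silent MTA produces an error instead of a hang.

```python
import smtplib
import socket


def probe_smtp(host, port, timeout=30.0):
    """Return (ok, detail): did the MTA greet us and answer NOOP in time?"""
    errors = (socket.timeout, smtplib.SMTPException, OSError)
    try:
        # The constructor connects and waits for the server greeting;
        # the timeout applies to the connect and to each later read.
        conn = smtplib.SMTP(host, port, timeout=timeout)
    except errors as exc:
        return False, "%s: %s" % (type(exc).__name__, exc)
    try:
        code, _ = conn.noop()
        return code == 250, "NOOP reply code %d" % code
    except errors as exc:
        return False, "%s: %s" % (type(exc).__name__, exc)
    finally:
        conn.close()
```

For example, probe_smtp("localhost", 25, timeout=10.0) returns a pair such as (False, "...") after about ten seconds if the MTA accepts the connection but sends nothing back.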

-- 
Mark Sapiro <[email protected]>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
