Unfortunately, that is difficult: this machine is critical to our operations, so I don't have much time to troubleshoot before I must have it up and running again. It usually takes around two days for the problem to come up. -TERM will kill it; there is no need to use -KILL. Mailman is built from source, so no Red Hat packages are involved. This is what I have in the qrunner log:
Apr 10 10:01:08 2013 (17606) ArchRunner qrunner caught SIGTERM. Stopping.
Apr 10 10:01:08 2013 (17606) ArchRunner qrunner exiting.
Apr 10 10:01:08 2013 (17611) OutgoingRunner qrunner caught SIGTERM. Stopping.
Apr 10 10:01:08 2013 (17612) VirginRunner qrunner caught SIGTERM. Stopping.
Apr 10 10:01:08 2013 (17612) VirginRunner qrunner exiting.
Apr 10 10:01:08 2013 (17607) BounceRunner qrunner caught SIGTERM. Stopping.
Apr 10 10:01:08 2013 (17608) CommandRunner qrunner caught SIGTERM. Stopping.
Apr 10 10:01:08 2013 (17608) CommandRunner qrunner exiting.
Apr 10 10:01:08 2013 (17609) IncomingRunner qrunner caught SIGTERM. Stopping.
Apr 10 10:01:08 2013 (17609) IncomingRunner qrunner exiting.
Apr 10 10:01:08 2013 (17610) NewsRunner qrunner caught SIGTERM. Stopping.
Apr 10 10:01:08 2013 (17610) NewsRunner qrunner exiting.
Apr 10 10:01:08 2013 (17613) RetryRunner qrunner caught SIGTERM. Stopping.
Apr 10 10:01:08 2013 (17613) RetryRunner qrunner exiting.
Apr 10 10:01:08 2013 (17604) Master watcher caught SIGTERM. Exiting.
Apr 10 10:01:08 2013 (17604) Master qrunner detected subprocess exit (pid: 17606, sig: None, sts: 15, class: ArchRunner, slice: 1/1)
Apr 10 10:01:08 2013 (17604) Master qrunner detected subprocess exit (pid: 17608, sig: None, sts: 15, class: CommandRunner, slice: 1/1)
Apr 10 10:01:08 2013 (17604) Master qrunner detected subprocess exit (pid: 17609, sig: None, sts: 15, class: IncomingRunner, slice: 1/1)
Apr 10 10:01:08 2013 (17604) Master qrunner detected subprocess exit (pid: 17610, sig: None, sts: 15, class: NewsRunner, slice: 1/1)
Apr 10 10:01:08 2013 (17604) Master qrunner detected subprocess exit (pid: 17612, sig: None, sts: 15, class: VirginRunner, slice: 1/1)
Apr 10 10:01:08 2013 (17604) Master qrunner detected subprocess exit (pid: 17613, sig: None, sts: 15, class: RetryRunner, slice: 1/1)
Apr 10 10:01:08 2013 (17607) BounceRunner qrunner exiting.
Apr 10 10:01:08 2013 (17604) Master qrunner detected subprocess exit (pid: 17607, sig: None, sts: 15, class: BounceRunner, slice: 1/1)
Apr 10 10:01:37 2013 (17611) OutgoingRunner qrunner caught SIGTERM. Stopping.
Apr 10 10:01:37 2013 (17604) Master watcher caught SIGTERM. Exiting.
Apr 10 10:01:37 2013 (17611) OutgoingRunner qrunner caught SIGTERM. Stopping.
Apr 10 10:01:37 2013 (17611) OutgoingRunner qrunner exiting.
Apr 10 10:01:38 2013 (17604) Master qrunner detected subprocess exit (pid: 17611, sig: None, sts: 15, class: OutgoingRunner, slice: 1/1)
Apr 10 10:01:58 2013 (15858) CommandRunner qrunner started.
Apr 10 10:01:59 2013 (15859) IncomingRunner qrunner started.
Apr 10 10:01:59 2013 (15856) ArchRunner qrunner started.
Apr 10 10:01:59 2013 (15857) BounceRunner qrunner started.
Apr 10 10:01:59 2013 (15862) VirginRunner qrunner started.
Apr 10 10:01:59 2013 (15860) NewsRunner qrunner started.
Apr 10 10:01:59 2013 (15863) RetryRunner qrunner started.
Apr 10 10:01:59 2013 (15861) OutgoingRunner qrunner started.

-----Original Message-----
From: Mark Sapiro [mailto:[email protected]]
Sent: Wednesday, April 10, 2013 3:59 PM
To: Millsap, James
Cc: [email protected]
Subject: Re: [Mailman-Users] mailman 2.1.14 stops sending mail

On 4/10/2013 8:43 AM, Millsap, James wrote:
>
> mailman 15854 1 0 10:01 ? 00:00:00 /usr/bin/python
> /usr/local/mailman/bin/mailmanctl -s start
> mailman 15861 15854 0 10:01 ? 00:00:06 /usr/bin/python
> /usr/local/mailman/bin/qrunner --runner=OutgoingRunner:0:1 -s
>
> I have to kill the outgoingrunner specifically. The only thing I see in the
> logs is a lack of logging. It has been running with stunning reliability on
> this machine for the last few years, so I am not sure what is going on.
> Perhaps one of redhat's patches killed it.

Can you kill -TERM it or do you need to kill -KILL it?

Are you sure there's nothing relevant in Mailman's qrunner log
(/var/log/mailman/qrunner if a rhel packaged Mailman)?
Is there a current .bak file in the out queue (/var/spool/mailman/out/)?

What does 'lsof' show for the process?

You might be able to get something useful from 'gdb' or maybe see something like
<http://stackoverflow.com/questions/132058/showing-the-stack-trace-from-a-running-python-application>.

If I had to guess, I'd guess it gets hung waiting for an SMTP response from the MTA.

--
Mark Sapiro <[email protected]>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan

------------------------------------------------------
Mailman-Users mailing list [email protected]
http://mail.python.org/mailman/listinfo/mailman-users
Mailman FAQ: http://wiki.list.org/x/AgA3
Security Policy: http://wiki.list.org/x/QIA9
Searchable Archives: http://www.mail-archive.com/mailman-users%40python.org/
Unsubscribe: http://mail.python.org/mailman/options/mailman-users/archive%40jab.org
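Mark's pointer to getting a stack trace from a running Python application can be made concrete with a small signal handler, along the lines of the approach discussed in that Stack Overflow question. This is a sketch under several assumptions, not part of Mailman: the helper names `format_all_stacks` and `install_stack_dumper`, the dump path `/tmp/stacks.<pid>.txt` and the choice of SIGUSR2 are all hypothetical. qrunner already installs handlers for some signals, so check bin/qrunner before assuming SIGUSR2 is free, and something (for example a local patch to qrunner) would have to call `install_stack_dumper()` inside the OutgoingRunner process.

```python
import os
import signal
import sys
import threading
import traceback


def format_all_stacks():
    """Return the current Python stack of every thread in this process as text."""
    names = dict((t.ident, t.name) for t in threading.enumerate())
    chunks = []
    for thread_id, frame in sys._current_frames().items():
        chunks.append("# Thread %s (%s)\n" % (names.get(thread_id, "?"), thread_id))
        chunks.append("".join(traceback.format_stack(frame)))
    return "".join(chunks)


def install_stack_dumper(path_template="/tmp/stacks.%d.txt", sig=signal.SIGUSR2):
    """Hypothetical helper: on `sig`, append all thread stacks to a per-pid file.

    The process keeps running after the dump; the handler only observes it.
    SIGUSR2 is an assumed-free signal, not something Mailman defines.
    """
    def handler(signum, frame):
        with open(path_template % os.getpid(), "a") as fh:
            fh.write(format_all_stacks())
            fh.write("\n")

    signal.signal(sig, handler)
```

With the handler installed, `kill -USR2 <pid>` against the stuck OutgoingRunner (15861 in the ps output above, though the pid changes on every restart) would append its stack to the dump file, which should show whether it is sitting inside an smtplib call. One caveat: on Python 2, which Mailman 2.1 runs on, a signal that arrives during a blocking socket call can make that call fail with EINTR after the handler returns, so the dump may disturb the delivery in progress.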

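Mark's guess that the runner hangs waiting for an SMTP response from the MTA can be checked independently of Mailman by timing a short SMTP conversation with the same MTA. The sketch uses only the standard `smtplib` module; the function name `probe_smtp`, the default host `localhost`, port 25 and the 30-second timeout are assumptions, so substitute whatever SMTPHOST/SMTPPORT this installation presumably has in mm_cfg.py. Because `timeout` is passed, the probe reports a stall instead of blocking forever the way a runner with no timeout might.

```python
import smtplib
import socket
import time


def probe_smtp(host="localhost", port=25, timeout=30.0):
    """Time a minimal SMTP exchange: greeting, NOOP, QUIT.

    Returns (ok, elapsed_seconds, detail). With `timeout` set, a silent
    MTA makes this return ok=False instead of hanging indefinitely.
    """
    start = time.time()
    try:
        # The constructor connects and reads the 220 greeting.
        server = smtplib.SMTP(host, port, timeout=timeout)
        try:
            code, _ = server.noop()
        finally:
            try:
                server.quit()
            except (smtplib.SMTPException, socket.error):
                server.close()
        ok = code == 250
        detail = "NOOP returned %d" % code
    except (smtplib.SMTPException, socket.error) as exc:
        # socket.timeout is a subclass of socket.error (OSError on Python 3).
        ok = False
        detail = "%s: %s" % (type(exc).__name__, exc)
    return ok, time.time() - start, detail
```

Running it periodically (from cron, say) and logging `elapsed` would show whether slow or missing replies from the MTA coincide with the runner going quiet. A successful probe does not rule the theory out, since a hang could occur later in a transaction (for example after DATA), which this minimal check does not exercise.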