On 4/11/2013 9:07 AM, Millsap, James wrote:
> Unfortunately It is difficult as this machine is critical to our operations, I don't have a whole lot of time to troubleshoot, before I must have it up and running. It usually takes around two days for this issue to come up. -TERM will kill it, no need to use --KILL. This is built from source so no redhat packages. This is what I have in the qrunner log.
>
[...]
> Apr 10 10:01:08 2013 (17611) OutgoingRunner qrunner caught SIGTERM. Stopping.
[...]
> Apr 10 10:01:08 2013 (17604) Master watcher caught SIGTERM. Exiting.
[...]
> Apr 10 10:01:37 2013 (17604) Master watcher caught SIGTERM. Exiting.
> Apr 10 10:01:37 2013 (17611) OutgoingRunner qrunner caught SIGTERM. Stopping.
> Apr 10 10:01:37 2013 (17611) OutgoingRunner qrunner exiting.
> Apr 10 10:01:38 2013 (17604) Master qrunner detected subprocess exit
> (pid: 17611, sig: None, sts: 15, class: OutgoingRunner, slice: 1/1)
[...]

Interesting that OutgoingRunner wouldn't exit until it was SIGTERMed a second time.

It seems highly likely that it is waiting on something uninterruptible, which would explain both why it stops processing in the first place and why it is reluctant to die. The real question is what it is waiting on, and why. Without an answer, or at least some further clue, I don't know what to suggest.

Check the MTA logs from the time OutgoingRunner 'hung' and from the time it was SIGTERMed. Also consider enabling smtplib debug logging (see <http://wiki.list.org/x/-IA9>).

-- 
Mark Sapiro <[email protected]>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
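
For illustration only, here is a minimal standalone sketch of what smtplib debug tracing looks like in plain Python. It is not Mailman's own delivery code and not the procedure on the wiki page above; the helper name make_debug_smtp, the 30 second timeout, and the localhost:25 address are all assumptions made for the example. It also assumes, without evidence from the logs, that the wait happens in the SMTP conversation; the socket timeout is there only to show how a blocking SMTP call can be bounded.

```python
import smtplib


def make_debug_smtp(timeout=30.0):
    """Return an unconnected SMTP object with tracing and a socket timeout.

    Hypothetical helper for illustration; not part of Mailman.
    """
    # No host is passed, so the constructor does not open a connection yet.
    conn = smtplib.SMTP(timeout=timeout)
    # Debug level 1 makes smtplib print each command and reply to stderr.
    conn.set_debuglevel(1)
    return conn


if __name__ == "__main__":
    conn = make_debug_smtp()
    # Assumed address; substitute the MTA that Mailman delivers to.
    conn.connect("localhost", 25)
    conn.noop()  # harmless round trip that shows up in the trace
    conn.quit()
```

Run by hand against the MTA, the trace on stderr shows the last command sent before any stall, which can then be matched against the MTA logs for the same timestamps.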
