On 01/06/2014 05:31 PM, Chuck Weinstock wrote:
> Thanks!
> 
> Yes to the stale lock problem. Regarding the other problem…the last time it 
> shut down was January 1. Here are some of the qrunner log entries just prior 
> to that:
> 
>> Dec 30 18:17:20 2013 (8351) Master qrunner detected subprocess exit
>> (pid: 2209, sig: 9, sts: None, class: ArchRunner, slice: 1/1) [restarting]
>> Dec 30 18:17:23 2013 (16892) ArchRunner qrunner started.
>> Dec 31 00:21:05 2013 (8351) Master qrunner detected subprocess exit
>> (pid: 16892, sig: 9, sts: None, class: ArchRunner, slice: 1/1) [restarting]
>> Dec 31 00:21:10 2013 (31527) ArchRunner qrunner started.
>> Dec 31 06:25:01 2013 (8351) Master qrunner detected subprocess exit
>> (pid: 15347, sig: 9, sts: None, class: IncomingRunner, slice: 1/1) 
>> [restarting]
>> Dec 31 06:25:04 2013 (13794) IncomingRunner qrunner started.
>> Dec 31 12:28:51 2013 (8351) Master qrunner detected subprocess exit
>> (pid: 13794, sig: 9, sts: None, class: IncomingRunner, slice: 1/1) 
>> [restarting]
>> Dec 31 12:28:53 2013 (28877) IncomingRunner qrunner started.
>> Dec 31 18:32:44 2013 (8351) Master qrunner detected subprocess exit
>> (pid: 31527, sig: 9, sts: None, class: ArchRunner, slice: 1/1) [restarting]
>> Dec 31 18:32:46 2013 (10916) ArchRunner qrunner started.
>> Jan 01 00:36:02 2014 (8351) Master qrunner detected subprocess exit
>> (pid: 12268, sig: 9, sts: None, class: OutgoingRunner, slice: 1/1) 
>> [restarting]
>> Jan 01 00:36:04 2014 (25317) OutgoingRunner qrunner started.
>> Jan 01 12:43:48 2014 (8351) Master qrunner detected subprocess exit
>> (pid: 10916, sig: 9, sts: None, class: ArchRunner, slice: 1/1) [restarting]
>> Jan 01 12:43:50 2014 (22804) ArchRunner qrunner started.
>> Jan 01 15:22:22 2014 (8351) Master qrunner detected subprocess exit
>> (pid: 28877, sig: 9, sts: None, class: IncomingRunner, slice: 1/1) 
>> [restarting]
>> Jan 01 15:22:22 2014 (8351) Qrunner IncomingRunner reached maximum restart 
>> limit of 10, not restarting.


All of the above are signal 9 (SIGKILL). Do you have some cron or other
process that's SIGKILLing the qrunners in an attempt to keep them small
or for some other reason? See the FAQ at <http://wiki.list.org/x/94A9>.
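
If it helps to see the pattern across the whole log, here is a rough sketch that tallies the "subprocess exit" entries by runner class and signal and lists the gaps between them. It assumes the raw qrunner log (without the ">>" quoting) and the entry layout shown in the excerpt above; the function names are made up for illustration and are not part of Mailman.

```python
import re
from collections import Counter
from datetime import datetime

# Pattern based only on the entries quoted above; the details line may
# follow on the same line or on the next one, so allow any whitespace.
EXIT_RE = re.compile(
    r"(\w{3} \d{2} \d{2}:\d{2}:\d{2} \d{4}) \(\d+\) "
    r"Master qrunner detected subprocess exit\s*"
    r"\(pid: (\d+), sig: (\w+), sts: (\w+), class: (\w+), slice: (\d+)/(\d+)\)"
)


def parse_exits(log_text):
    """Return one dict per subprocess-exit entry found in log_text."""
    exits = []
    for m in EXIT_RE.finditer(log_text):
        exits.append({
            # Month abbreviations are parsed in the current locale.
            "when": datetime.strptime(m.group(1), "%b %d %H:%M:%S %Y"),
            "pid": int(m.group(2)),
            "sig": m.group(3),
            "cls": m.group(5),
        })
    return exits


def tally_by_class_and_signal(exits):
    """Count exits per (runner class, signal) pair."""
    return Counter((e["cls"], e["sig"]) for e in exits)


def seconds_between_exits(exits):
    """Gaps in seconds between consecutive exits, in time order."""
    times = sorted(e["when"] for e in exits)
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]
```

A tally that is all signal 9, with roughly regular gaps, would point toward something scheduled outside of Mailman rather than the runners exiting on their own.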


> Also there are no errors in the error log around the same time. I am seeing a 
> bunch of errors (now) like:
> 
>> Jan 05 20:46:54 2014 (1522) Uncaught runner exception: [Errno 2] No such 
>> file or directory: 
>> '/usr/local/mailman/qfiles/in/1388971910.759851+b18e7af8cb0632a2d9f551c9e39053510b278e9a.pck'
>> Jan 05 20:46:54 2014 (1522) Traceback (most recent call last):
>>   File "/usr/local/mailman/Mailman/Queue/Runner.py", line 99, in _oneloop
>>     msg, msgdata = self._switchboard.dequeue(filebase)
>>   File "/usr/local/mailman/Mailman/Queue/Switchboard.py", line 154, in 
>> dequeue
>>     fp = open(filename)
>> IOError: [Errno 2] No such file or directory: 
>> '/usr/local/mailman/qfiles/in/1388971910.759851+b18e7af8cb0632a2d9f551c9e39053510b278e9a.pck'
>>
>> Jan 05 20:46:54 2014 (1522) Skipping and preserving unparseable message: 
>> 1388971910.759851+b18e7af8cb0632a2d9f551c9e39053510b278e9a
>> Jan 05 20:46:54 2014 (1522) Failed to unlink/preserve backup file: 
>> /usr/local/mailman/qfiles/in/1388971910.759851+b18e7af8cb0632a2d9f551c9e39053510b278e9a.bak
>> [Errno 2] No such file or directory
> 
> 
> I think these are related to some pck files that I hand deleted because I 
> thought they were causing the stale lock problem.


I think these occur because more than one qrunner is processing the
same slice of the same queue, so one runner tries to open an entry that
another has already dequeued. See the FAQ at <http://wiki.list.org/x/_4A9>.
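
As a rough way to check for that, the sketch below scans process-list text for qrunner command lines and reports any runner/slice that shows up under more than one pid. It assumes input in the form produced by something like `ps -eo pid,args`, and that each qrunner command line carries a `--runner=Class:slice:total` argument; adjust the pattern if your installation's command lines look different. The function name is hypothetical.

```python
import re
from collections import defaultdict

# Assumed line layout: leading pid, then a command line containing
# --runner=<Class>:<slice>:<total>.
RUNNER_RE = re.compile(r"^\s*(\d+)\s.*--runner=(\w+):(\d+):(\d+)")


def find_duplicate_runners(ps_text):
    """Map (class, slice, total) to its pids when more than one pid has it."""
    pids_by_slice = defaultdict(list)
    for line in ps_text.splitlines():
        m = RUNNER_RE.match(line)
        if m:
            pid, cls, slc, total = m.groups()
            pids_by_slice[(cls, int(slc), int(total))].append(int(pid))
    # Keep only the runner/slice combinations that appear more than once.
    return {key: pids for key, pids in pids_by_slice.items() if len(pids) > 1}
```

An empty result means each runner/slice appears only once in the listing; any entry returned is a candidate for the duplicate-qrunner situation the FAQ describes.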

-- 
Mark Sapiro <m...@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
