Mailman 2.1.5 on Solaris 8, with Python 2.3.3. I was getting the following errors in logs/error and logs/qrunner:
error:

    Aug 13 15:16:53 2004 qrunner(7657): Traceback (most recent call last):
    Aug 13 15:16:53 2004 qrunner(7657):   File "/usr/local/mailman/bin/qrunner", line 270, in ?
    Aug 13 15:16:53 2004 qrunner(7657):     main()
    Aug 13 15:16:53 2004 qrunner(7657):   File "/usr/local/mailman/bin/qrunner", line 230, in main
    Aug 13 15:16:53 2004 qrunner(7657):     qrunner.run()
    Aug 13 15:16:53 2004 qrunner(7657):   File "/usr/local/mailman/Mailman/Queue/Runner.py", line 70, in run
    Aug 13 15:16:53 2004 qrunner(7657):     filecnt = self._oneloop()
    Aug 13 15:16:53 2004 qrunner(7657):   File "/usr/local/mailman/Mailman/Queue/Runner.py", line 99, in _oneloop
    Aug 13 15:16:53 2004 qrunner(7657):     msg, msgdata = self._switchboard.dequeue(filebase)
    Aug 13 15:16:53 2004 qrunner(7657):   File "/usr/local/mailman/Mailman/Queue/Switchboard.py", line 144, in dequeue
    Aug 13 15:16:53 2004 qrunner(7657):     os.unlink(filename)
    Aug 13 15:16:53 2004 qrunner(7657): OSError : [Errno 2] No such file or directory: '/var/priv/mail/mailman/qfiles/out/1092428211.4786341+bad1265375ae36cc455fc7e521e9c39c09a29558.pck'

qrunner:

    Aug 13 15:16:53 2004 (29188) Master qrunner detected subprocess exit (pid: 7657, sig: None, sts: 1, class: OutgoingRunner, slice: 3/4) [restarting]
    Aug 13 15:16:54 2004 (7005) OutgoingRunner qrunner started.

with

    Aug 08 05:35:34 2004 (716) Qrunner OutgoingRunner reached maximum restart limit of 10, not restarting.

showing up eventually, followed by mail building up in the outgoing queue.
I was running four OutgoingRunner instances, set in mm_cfg.py with:

    QRUNNERS = [
        ('ArchRunner',     1), # messages for the archiver
        ('BounceRunner',   1), # for processing the qfile/bounces directory
        ('CommandRunner',  1), # commands and bounces from the outside world
        ('IncomingRunner', 1), # posts from the outside world
        ('NewsRunner',     1), # outgoing messages to the nntpd
        ('OutgoingRunner', 4), # outgoing messages to the smtpd
        ('VirginRunner',   1), # internally crafted (virgin birth) messages
        ('RetryRunner',    1), # retry temporarily failed deliveries
        ]

The cause is a logic error in mailman/Mailman/Queue/Switchboard.py, and it is fixed with a one-line patch.

In Switchboard.py:__init__, the lower and upper bounds (self.__lower and self.__upper) are both set to None if there is only a single instance of the qrunner class in question, and to the lower and upper bounds of each subslice if there is more than one.

In Switchboard.py:files (which returns a list of all files in the queue directory that this qrunner instance is to process), the statement that rejects files that are not within the bounds of this qrunner instance has a logic error. The line in question:

    if not lower or (lower <= long(digest, 16) < upper):

can be read as "If this is a single-instance qrunner (because lower is None, and therefore false), or if the file is within the lower and upper bounds of this instance, add it to the list of files."

The problem is that the first slice of any multi-slice qrunner has a lower bound of 0, and (not 0) evaluates as true. As a result, slice 0 of any multi-slice qrunner tries to grab files from the entire queue rather than just its assigned portion. That creates a race condition: when slice 0 and slice n both pick up the same file, one of them unlinks it first and the other crashes with the OSError shown above.
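To make the overlap concrete, here is a rough, standalone sketch of the two tests side by side. It is not Mailman code: slice_bounds() is a hypothetical stand-in for what Switchboard.py:__init__ computes, and the even split of the 160-bit digest space is an assumption made for illustration. It is written for a current Python, so long(digest, 16) becomes int(digest, 16).

```python
# Sketch only, not Mailman code. slice_bounds() is a hypothetical stand-in
# for the bounds set up in Switchboard.__init__; the even split of the
# 160-bit digest space is an assumption made for illustration.
DIGEST_SPACE = 1 << 160  # the digests in the queue file names are 40 hex digits


def slice_bounds(slice_no, numslices):
    """Return (lower, upper), or (None, None) for a single-instance qrunner."""
    if numslices == 1:
        return None, None
    lower = DIGEST_SPACE * slice_no // numslices
    upper = DIGEST_SPACE * (slice_no + 1) // numslices
    return lower, upper


def accepts_original(lower, upper, digest):
    # Original test: "not lower" is true for None, but also for 0.
    return bool(not lower or (lower <= int(digest, 16) < upper))


def accepts_patched(lower, upper, digest):
    # Patched test: lower == upper only when both are None.
    return bool((lower == upper) or (lower <= int(digest, 16) < upper))


def claiming_slices(accepts, digest, numslices=4):
    """List every slice whose filter would pick up a file with this digest."""
    claimed = []
    for slice_no in range(numslices):
        lower, upper = slice_bounds(slice_no, numslices)
        if accepts(lower, upper, digest):
            claimed.append(slice_no)
    return claimed


# Digest taken from the file name in the traceback.
digest = "bad1265375ae36cc455fc7e521e9c39c09a29558"
print(claiming_slices(accepts_original, digest))  # [0, 2]
print(claiming_slices(accepts_patched, digest))   # [2]
```

Under these assumptions the original test lets two slices claim the same file, while the (lower == upper) test leaves exactly one owner and still accepts everything in the single-instance case, where both bounds are None.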
Patch:

*** Switchboard.py	Fri Aug 13 16:43:12 2004
--- Switchboard.py_new	Fri Aug 13 16:43:48 2004
***************
*** 164,170 ****
              when, digest = filebase.split('+')
              # Throw out any files which don't match our bitrange.  BAW: test
              # performance and end-cases of this algorithm.
!             if not lower or (lower <= long(digest, 16) < upper):
                  times[float(when)] = filebase
          # FIFO sort
          keys = times.keys()
--- 164,170 ----
              when, digest = filebase.split('+')
              # Throw out any files which don't match our bitrange.  BAW: test
              # performance and end-cases of this algorithm.
!             if (lower == upper) or (lower <= long(digest, 16) < upper):
                  times[float(when)] = filebase
          # FIFO sort
          keys = times.keys()

Thanks!

Brian.

-- 
Brian Greenberg
[EMAIL PROTECTED]