Re: [Mailman-Users] Messages got stuck in in queue due to one bad message
Xueshan Feng wrote:
> We are running Mailman 2.1.9. Recently we had a problem where Mailman's Incoming qrunner died hard, which caused messages to accumulate under the qfiles/in directory. Restarting Mailman didn't help. We finally identified a message at the top of the queue, moved it aside and restarted Mailman again, and then mail started to flow. Putting that message back into the in queue triggers the problem again. The sender actually tried to send it again, and it caused the same problem.
>
> Before I get into the details of the Mailman logs, I'd like to ask:
>
> 1. What's the best way to detect a stalled condition like this? We do monitor the qrunner processes, but in this case that didn't help because the qrunners were still running. Now I have put in a script that monitors the number of files in the in queue and sends an alert if it reaches a threshold.

Actually, if I understand the situation correctly, IncomingRunner will not be running. It will die and be restarted, but after the 10th restart/die, it won't restart again (but this seems incorrect - see below).

> 2. Why didn't this message get moved to the shunt directory?

I don't know why it didn't shunt. From the traceback, the message should have shunted and that should have been the end of it.

Also, the problem with decode_rfc2231 in email.Utils that seems to be the underlying issue here was fixed in email 2.5.8, which shipped with Mailman 2.1.9 and should be in Mailman's pythonlib directory. Your traceback says you are getting the email library from your Python 2.3 installation. This is not correct and should not happen if Mailman is properly installed.

If I understand the actual problem with the message, it is caused by an error in email 2.5.7 and earlier and is precipitated by a message with an apostrophe (') in the subject.

> 3. Can Mailman recover by itself without human intervention?

I don't understand why the message didn't shunt. I also don't understand why IncomingRunner was restarted more than 10 times. Have you changed MAX_RESTARTS = 10 in bin/mailmanctl?
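Returning to question 1, the stop-gap Xueshan describes (alert on the number of files in the in queue) can be sketched as below. This is a minimal sketch, not part of Mailman: the queue path /var/lib/mailman/qfiles/in, the assumption that queued entries end in .pck, and both thresholds are placeholders to adjust for your own install. Besides the file count, it checks the age of the oldest entry, because a runner that keeps dying on one message leaves that entry sitting at the head of the queue even while the qrunner processes still appear to be running.

```python
import os
import time

# Assumptions, not Mailman settings: adjust the path and limits for your site.
QUEUE_DIR = "/var/lib/mailman/qfiles/in"
MAX_FILES = 200            # hypothetical alert threshold (file count)
MAX_AGE_SECONDS = 15 * 60  # hypothetical alert threshold (oldest entry)


def queue_status(queue_dir):
    """Return (count, age_of_oldest_seconds) for *.pck entries in queue_dir."""
    now = time.time()
    count = 0
    oldest = None
    for name in os.listdir(queue_dir):
        if not name.endswith(".pck"):
            continue
        try:
            mtime = os.path.getmtime(os.path.join(queue_dir, name))
        except OSError:
            # The runner may dequeue the file between listdir() and stat().
            continue
        count += 1
        if oldest is None or mtime < oldest:
            oldest = mtime
    age = 0.0 if oldest is None else max(0.0, now - oldest)
    return count, age


def check_queue(queue_dir=QUEUE_DIR, max_files=MAX_FILES,
                max_age=MAX_AGE_SECONDS):
    """Return (ok, message); ok is False when either threshold is reached."""
    count, age = queue_status(queue_dir)
    problems = []
    if count >= max_files:
        problems.append("%d files queued in %s" % (count, queue_dir))
    if age >= max_age:
        problems.append("oldest entry is %d seconds old" % age)
    if problems:
        return False, "ALERT: " + "; ".join(problems)
    return True, "OK: %d files, oldest %d seconds" % (count, age)
```

A cron job could call check_queue() and mail the returned message, or exit non-zero for a monitoring system, whenever ok is False.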
Was there a second traceback with only one date/time header following the one you report? If so, what was it?

Have you changed Mailman/Queue/Runner.py in any way?

> Details follow:

More comments below:

> The following error message repeated a couple of times in the error log when this message was processed:
>
> Feb 20 07:55:53 2007 qrunner(25722): Traceback (most recent call last):
> Feb 20 07:55:53 2007 qrunner(25722):   File "/var/lib/mailman/bin/qrunner", line 278, in ?
> Feb 20 07:55:53 2007 qrunner(25722):     main()
> Feb 20 07:55:53 2007 qrunner(25722):   File "/var/lib/mailman/bin/qrunner", line 238, in main
> Feb 20 07:55:53 2007 qrunner(25722):     qrunner.run()
> Feb 20 07:55:53 2007 qrunner(25722):   File "/usr/lib/mailman/Mailman/Queue/Runner.py", line 71, in run
> Feb 20 07:55:53 2007 qrunner(25722):     filecnt = self._oneloop()
> Feb 20 07:55:53 2007 qrunner(25722):   File "/usr/lib/mailman/Mailman/Queue/Runner.py", line 100, in _oneloop
> Feb 20 07:55:53 2007 qrunner(25722):     msg, msgdata = self._switchboard.dequeue(filebase)
> Feb 20 07:55:53 2007 qrunner(25722):   File "/usr/lib/mailman/Mailman/Queue/Switchboard.py", line 164, in dequeue
> Feb 20 07:55:53 2007 qrunner(25722):     msg = email.message_from_string(msg, Message.Message)
> Feb 20 07:55:53 2007 qrunner(25722):   File "/usr/lib/python2.3/email/__init__.py", line 52, in message_from_string

This and subsequent email modules should come from /usr/lib/mailman/pythonlib/email, not from /usr/lib/python2.3/email.
> Feb 20 07:55:53 2007 qrunner(25722):     return Parser(_class, strict=strict).parsestr(s)
> Feb 20 07:55:53 2007 qrunner(25722):   File "/usr/lib/python2.3/email/Parser.py", line 75, in parsestr
> Feb 20 07:55:53 2007 qrunner(25722):     return self.parse(StringIO(text), headersonly=headersonly)
> Feb 20 07:55:53 2007 qrunner(25722):   File "/usr/lib/python2.3/email/Parser.py", line 64, in parse
> Feb 20 07:55:53 2007 qrunner(25722):     self._parsebody(root, fp, firstbodyline)
> Feb 20 07:55:53 2007 qrunner(25722):   File "/usr/lib/python2.3/email/Parser.py", line 240, in _parsebody
> Feb 20 07:55:53 2007 qrunner(25722):     msgobj = self.parsestr(part)
> Feb 20 07:55:53 2007 qrunner(25722):   File "/usr/lib/python2.3/email/Parser.py", line 75, in parsestr
> Feb 20 07:55:53 2007 qrunner(25722):     return self.parse(StringIO(text), headersonly=headersonly)
> Feb 20 07:55:53 2007 qrunner(25722):   File "/usr/lib/python2.3/email/Parser.py", line 64, in parse
> Feb 20 07:55:53 2007 qrunner(25722):     self._parsebody(root, fp, firstbodyline)
> Feb 20 07:55:53 2007 qrunner(25722):   File "/usr/lib/python2.3/email/Parser.py", line 146, in _parsebody
> Feb 20 07:55:53 2007 qrunner(25722):     boundary = container.get_boundary()
> Feb 20 07:55:53 2007 qrunner(25722):   File "/usr/lib/python2.3/email/Message.py", line 743, in get_boundary
> Feb 20 07:55:53 2007 qrunner(25722):     boundary = self.get_param('boundary', missing)
> Feb 20 07:55:53 2007 qrunner(25722):   File
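Mark's diagnosis hinges on which copy of the email package the qrunners actually import. One way to check this is sketched below, under assumptions: the helper spawns a given Python interpreter (pass the one Mailman runs under, such as the Python 2.3 seen in the traceback), puts <prefix>/pythonlib at the front of sys.path, imports email and reports where it was loaded from. Putting pythonlib first mimics the precedence Mark describes (pythonlib/email should win over the system copy); the default prefix /usr/lib/mailman is taken from the traceback, and the function names are hypothetical. The helper itself needs Python 2.7 or later, while the one-line probe it runs is written so that old interpreters accept it too.

```python
import os
import subprocess
import sys

# Probe executed inside the interpreter under test; kept to syntax that
# old Python 2 releases and Python 3 both accept.
PROBE = ("import sys; sys.path.insert(0, sys.argv[1]); "
         "import email; sys.stdout.write(email.__file__)")


def email_origin(python, pythonlib):
    """Return email.__file__ as seen by `python` with `pythonlib` first."""
    out = subprocess.check_output([python, "-c", PROBE, pythonlib])
    return out.decode("utf-8", "replace").strip()


def uses_pythonlib(python, prefix="/usr/lib/mailman"):
    """Return (ok, origin): ok is True if email loads from <prefix>/pythonlib."""
    pythonlib = os.path.realpath(os.path.join(prefix, "pythonlib"))
    origin = os.path.realpath(email_origin(python, pythonlib))
    return origin.startswith(pythonlib + os.sep), origin
```

For example, uses_pythonlib("/usr/bin/python2.3") would be expected to return ok as True on a correctly installed Mailman 2.1.9; on the mis-packaged install in this thread it would presumably have reported the /usr/lib/python2.3/email copy instead.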
Re: [Mailman-Users] Messages got stuck in in queue due to one bad message
Mark Sapiro writes:
> > 2. Why didn't this message get moved to the shunt directory?
>
> I don't know why it didn't shunt.

At least in earlier implementations of the email lib and Mailman, the original parse of the message was not enclosed in the shunt mechanism, so the exception got caught by the catch-all handler, not the shunt handler. As of 2.1.5, the I18N stuff made assumptions that certain things were ASCII, and that it was handling MIME conversions correctly. These assumptions have been regularly violated, and patches have been applied piecemeal, but AFAICS (from a 2.1.5 vs. 2.1.8 diff I did a while back) the fundamental architectural issue was not addressed. It would appear that that still hasn't been done. It's not obvious to me that that's a change that's appropriate for 2.1.

-- 
Mailman-Users mailing list
Mailman-Users@python.org
http://mail.python.org/mailman/listinfo/mailman-users
Mailman FAQ: http://www.python.org/cgi-bin/faqw-mm.py
Searchable Archives: http://www.mail-archive.com/mailman-users%40python.org/
Unsubscribe: http://mail.python.org/mailman/options/mailman-users/archive%40jab.org
Security Policy: http://www.python.org/cgi-bin/faqw-mm.py?req=show&file=faq01.027.htp
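To make the shunt-versus-catch-all point above concrete, here is a toy model, not Mailman's code: every name is hypothetical, toy_parse() stands in for email.message_from_string(), and supervise() only mimics the MAX_RESTARTS limit discussed in the thread. When the parse happens before the per-message try/except (in the traceback it happens inside Switchboard.dequeue()), the bad message is never shunted: the runner dies, is restarted, hits the same message again, and repeats until the restart limit is used up, with everything behind it stuck. Moving the parse inside the handler lets the bad message be shunted and the rest of the queue drain.

```python
MAX_RESTARTS = 10  # mirrors the bin/mailmanctl value discussed in the thread


def toy_parse(raw):
    """Hypothetical stand-in for email.message_from_string().

    It raises on one specific input, the way email 2.5.7 and earlier
    reportedly failed on the problem message."""
    if "BAD" in raw:
        raise ValueError("cannot parse %r" % raw)
    return raw.lower()


def run_queue(queue, done, shunted, parse_inside_try):
    """Drain `queue` (raw messages, oldest first), one message at a time."""
    while queue:
        raw = queue[0]
        if not parse_inside_try:
            # Parse before the per-message handler: an exception here
            # escapes to the caller and `raw` stays at the head of the queue.
            msg = toy_parse(raw)
        try:
            if parse_inside_try:
                msg = toy_parse(raw)
            done.append(msg)          # stand-in for real processing
        except Exception:
            shunted.append(raw)       # "shunt": set the message aside
        queue.pop(0)                  # the message leaves the queue only now


def supervise(queue, parse_inside_try):
    """Restart the runner after each crash; give up after MAX_RESTARTS."""
    done, shunted = [], []
    restarts = 0
    while True:
        try:
            run_queue(queue, done, shunted, parse_inside_try)
            return restarts, done, shunted
        except Exception:
            if restarts == MAX_RESTARTS:
                return restarts, done, shunted
            restarts += 1
```

With the queue ["a", "BAD", "c"], supervise(queue, parse_inside_try=False) gives up after MAX_RESTARTS restarts with only "a" handled and ["BAD", "c"] still queued, while supervise(queue, parse_inside_try=True) returns (0, ["a", "c"], ["BAD"]).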
Re: [Mailman-Users] Messages got stuck in in queue due to one bad message
On Tue, 2007-02-20 at 15:37 -0800, Mark Sapiro wrote:
> Xueshan Feng wrote:
> > We are running Mailman 2.1.9. Recently we had a problem where Mailman's Incoming qrunner died hard, which caused messages to accumulate under the qfiles/in directory. Restarting Mailman didn't help. We finally identified a message at the top of the queue, moved it aside and restarted Mailman again, and then mail started to flow. Putting that message back into the in queue triggers the problem again. The sender actually tried to send it again, and it caused the same problem.
> >
> > Before I get into the details of the Mailman logs, I'd like to ask:
> >
> > 1. What's the best way to detect a stalled condition like this? We do monitor the qrunner processes, but in this case that didn't help because the qrunners were still running. Now I have put in a script that monitors the number of files in the in queue and sends an alert if it reaches a threshold.
>
> Actually, if I understand the situation correctly, IncomingRunner will not be running. It will die and be restarted, but after the 10th restart/die, it won't restart again (but this seems incorrect - see below).

The qrunner log shows they died, but if I do ps -ef | grep IncomingRunner, they are still there.

> > 2. Why didn't this message get moved to the shunt directory?
>
> I don't know why it didn't shunt. From the traceback, the message should have shunted and that should have been the end of it.
>
> Also, the problem with decode_rfc2231 in email.Utils that seems to be the underlying issue here was fixed in email 2.5.8, which shipped with Mailman 2.1.9 and should be in Mailman's pythonlib directory. Your traceback says you are getting the email library from your Python 2.3 installation. This is not correct and should not happen if Mailman is properly installed.

You nailed it! We indeed re-packaged Mailman with a lot of Stanford's own patches. When we upgraded from 2.1.8 to 2.1.9, I missed the installation of Mailman's own pythonlib/email in our Debian rules file. I have just re-packaged it and tested the new package.
The message that caused the problem is now accepted without trouble!

> > 3. Can Mailman recover by itself without human intervention?
>
> I don't understand why the message didn't shunt. I also don't understand why IncomingRunner was restarted more than 10 times. Have you changed MAX_RESTARTS = 10 in bin/mailmanctl?

No, that's not changed. It is still 10.

> Was there a second traceback with only one date/time header following the one you report? If so, what was it?

Yes, there were more similar tracebacks in the logs; 10, in fact.

> Have you changed Mailman/Queue/Runner.py in any way?

Not that program.

> > Details follow:
>
> More comments below:
>
> > The following error message repeated a couple of times in the error log when this message was processed:
> >
> > Feb 20 07:55:53 2007 qrunner(25722): Traceback (most recent call last):
> > Feb 20 07:55:53 2007 qrunner(25722):   File "/var/lib/mailman/bin/qrunner", line 278, in ?
> > Feb 20 07:55:53 2007 qrunner(25722):     main()
> > Feb 20 07:55:53 2007 qrunner(25722):   File "/var/lib/mailman/bin/qrunner", line 238, in main
> > Feb 20 07:55:53 2007 qrunner(25722):     qrunner.run()
> > Feb 20 07:55:53 2007 qrunner(25722):   File "/usr/lib/mailman/Mailman/Queue/Runner.py", line 71, in run
> > Feb 20 07:55:53 2007 qrunner(25722):     filecnt = self._oneloop()
> > Feb 20 07:55:53 2007 qrunner(25722):   File "/usr/lib/mailman/Mailman/Queue/Runner.py", line 100, in _oneloop
> > Feb 20 07:55:53 2007 qrunner(25722):     msg, msgdata = self._switchboard.dequeue(filebase)
> > Feb 20 07:55:53 2007 qrunner(25722):   File "/usr/lib/mailman/Mailman/Queue/Switchboard.py", line 164, in dequeue
> > Feb 20 07:55:53 2007 qrunner(25722):     msg = email.message_from_string(msg, Message.Message)
> > Feb 20 07:55:53 2007 qrunner(25722):   File "/usr/lib/python2.3/email/__init__.py", line 52, in message_from_string
>
> This and subsequent email modules should come from /usr/lib/mailman/pythonlib/email, not from /usr/lib/python2.3/email.
> > Feb 20 07:55:53 2007 qrunner(25722):     return Parser(_class, strict=strict).parsestr(s)
> > Feb 20 07:55:53 2007 qrunner(25722):   File "/usr/lib/python2.3/email/Parser.py", line 75, in parsestr
> > Feb 20 07:55:53 2007 qrunner(25722):     return self.parse(StringIO(text), headersonly=headersonly)
> > Feb 20 07:55:53 2007 qrunner(25722):   File "/usr/lib/python2.3/email/Parser.py", line 64, in parse
> > Feb 20 07:55:53 2007 qrunner(25722):     self._parsebody(root, fp, firstbodyline)
> > Feb 20 07:55:53 2007 qrunner(25722):   File "/usr/lib/python2.3/email/Parser.py", line 240, in _parsebody
> > Feb 20 07:55:53 2007 qrunner(25722):     msgobj = self.parsestr(part)
> > Feb 20 07:55:53 2007 qrunner(25722):   File "/usr/lib/python2.3/email/Parser.py", line 75, in parsestr
> > Feb 20 07:55:53 2007 qrunner(25722):     return self.parse(StringIO(text), headersonly=headersonly)
> > Feb 20 07:55:53 2007 qrunner(25722):   File "/usr/lib/python2.3/email/Parser.py", line