Re: [Mailman-Users] Messages got stuck in in queue due to one badmessage

2007-02-20 Thread Mark Sapiro
Xueshan Feng wrote:

We are running mailman 2.1.9. Recently we had a problem that mailman's
Incoming qrunner died hard, which caused messages accumulated under
qfiles/in directory.  Restarting mailman didn't help. We finally
identified a message on top of the queue, moved it aside, restarted
mailman again, then mail will start flow. 

Putting that message back to the in queue will trigger the problem. The
sender actually tried to send it again, and it caused the same problem.

Before I got into the details of mailman logs, I'd like to ask:

1. What's the best way to detect stalled condition like this? We do
monitor qrunner processes but in this case it didn't help because the
qrunners were still running. Now I put in a script to monitor the number
of files in in queue, and if it reaches a threshold, sends alert. 


Actually, If I understand the situation correctly, IncomingRunner will
not be running. It will die and be restarted, but after the 10th
restart/die, it won't restart again (but this seems incorrect - see
below).


2. Why this message didn't get moved to shunt directory?


I don't know why it didn't shunt. From the traceback, the message
should have shunted and that should have been the end of it. Also, the
problem with decode_rfc2231 in email.Utils that seems to be the
underlying issue here was fixed in email 2.5.8 which shipped with
Mailman 2.1.9 and should be in Mailman's pythonlib directory. Your
traceback says you are getting the email library from your Python 2.3
installation. This is not correct and should not happen if Mailman is
properly installed.

If I understand the actual problem with the message, it is caused by an
error in email 2.5.7 and earlier and is precipitated by a message with
an apostrophe (') in the subject.


3. Can the mailman recover itself without human intervention? 


I don't understand why the message didn't shunt. I also don't
understand why IncomingRunner was restarted more than 10 times. Have
you changed

MAX_RESTARTS = 10

in bin/mailmanctl?

Was there a second traceback with only one date/time header following
the one you report? If so, what was it?

Have you changed Mailman/Queue/Runner.py in any way?


Details are followed:


More comments below:


The following error message will repeat a couple of times in error log
when this message was processed: 


Feb 20 07:55:53 2007 qrunner(25722): Traceback (most recent call last):
Feb 20 07:55:53 2007 qrunner(25722):   File
/var/lib/mailman/bin/qrunner, line 278, in ?
Feb 20 07:55:53 2007 qrunner(25722):  main()
Feb 20 07:55:53 2007 qrunner(25722):   File
/var/lib/mailman/bin/qrunner, line 238, in main
Feb 20 07:55:53 2007 qrunner(25722):  qrunner.run()
Feb 20 07:55:53 2007 qrunner(25722):   File
/usr/lib/mailman/Mailman/Queue/Runner.py, line 71, in run
Feb 20 07:55:53 2007 qrunner(25722):  filecnt = self._oneloop()
Feb 20 07:55:53 2007 qrunner(25722):   File
/usr/lib/mailman/Mailman/Queue/Runner.py, line 100, in _oneloop
Feb 20 07:55:53 2007 qrunner(25722):  msg, msgdata =
self._switchboard.dequeue(filebase)
Feb 20 07:55:53 2007 qrunner(25722):   File
/usr/lib/mailman/Mailman/Queue/Switchboard.py, line 164, in dequeue
Feb 20 07:55:53 2007 qrunner(25722):  msg =
email.message_from_string(msg, Message.Message)
Feb 20 07:55:53 2007 qrunner(25722):   File
/usr/lib/python2.3/email/__init__.py, line 52, in message_from_string


This and subsequent email modules should come from
/usr/lib/mailman/pythonlib/email, not from /usr/lib/python2.3/email.


Feb 20 07:55:53 2007 qrunner(25722):  return Parser(_class,
strict=strict).parsestr(s)
Feb 20 07:55:53 2007 qrunner(25722):   File
/usr/lib/python2.3/email/Parser.py, line 75, in parsestr
Feb 20 07:55:53 2007 qrunner(25722):  return
self.parse(StringIO(text), headersonly=headersonly)
Feb 20 07:55:53 2007 qrunner(25722):   File
/usr/lib/python2.3/email/Parser.py, line 64, in parse
Feb 20 07:55:53 2007 qrunner(25722):  self._parsebody(root, fp,
firstbodyline)
Feb 20 07:55:53 2007 qrunner(25722):   File
/usr/lib/python2.3/email/Parser.py, line 240, in _parsebody
Feb 20 07:55:53 2007 qrunner(25722):  msgobj = self.parsestr(part)
Feb 20 07:55:53 2007 qrunner(25722):   File
/usr/lib/python2.3/email/Parser.py, line 75, in parsestr
Feb 20 07:55:53 2007 qrunner(25722):  return
self.parse(StringIO(text), headersonly=headersonly)
Feb 20 07:55:53 2007 qrunner(25722):   File
/usr/lib/python2.3/email/Parser.py, line 64, in parse
Feb 20 07:55:53 2007 qrunner(25722):  self._parsebody(root, fp,
firstbodyline)
Feb 20 07:55:53 2007 qrunner(25722):   File
/usr/lib/python2.3/email/Parser.py, line 146, in _parsebody
Feb 20 07:55:53 2007 qrunner(25722):  boundary =
container.get_boundary()
Feb 20 07:55:53 2007 qrunner(25722):   File
/usr/lib/python2.3/email/Message.py, line 743, in get_boundary
Feb 20 07:55:53 2007 qrunner(25722):  boundary =
self.get_param('boundary', missing)
Feb 20 07:55:53 2007 qrunner(25722):   File

Re: [Mailman-Users] Messages got stuck in in queue due to one badmessage

2007-02-20 Thread Stephen J. Turnbull
Mark Sapiro writes:

  2. Why this message didn't get moved to shunt directory?
  
  I don't know why it didn't shunt.

At least in earlier implementations of the email lib and Mailman, the
original parse of the message was not enclosed in the shunt mechanism,
so the exception got caught by the catchall handler, not the shunt
handler.

As of 2.1.5, the I18N stuff made assumptions that certain things were
ASCII, and that it was handling MIME conversions correctly.  These
assumptions have been regularly violated, and patches have been
applied piecemeal, but AFAICS (from a 2.1.5 vs. 2.1.8 diff I did a
while back) the fundamental architectural issue was not addressed.  It
would appear that that still hasn't been done.

It's not obvious to me that that's a change that's appropriate for
2.1.

--
Mailman-Users mailing list
Mailman-Users@python.org
http://mail.python.org/mailman/listinfo/mailman-users
Mailman FAQ: http://www.python.org/cgi-bin/faqw-mm.py
Searchable Archives: http://www.mail-archive.com/mailman-users%40python.org/
Unsubscribe: 
http://mail.python.org/mailman/options/mailman-users/archive%40jab.org

Security Policy: 
http://www.python.org/cgi-bin/faqw-mm.py?req=showamp;file=faq01.027.htp


Re: [Mailman-Users] Messages got stuck in in queue due to one badmessage

2007-02-20 Thread Xueshan Feng
On Tue, 2007-02-20 at 15:37 -0800, Mark Sapiro wrote:
 Xueshan Feng wrote:
 
 We are running mailman 2.1.9. Recently we had a problem that mailman's
 Incoming qrunner died hard, which caused messages accumulated under
 qfiles/in directory.  Restarting mailman didn't help. We finally
 identified a message on top of the queue, moved it aside, restarted
 mailman again, then mail will start flow. 
 
 Putting that message back to the in queue will trigger the problem. The
 sender actually tried to send it again, and it caused the same problem.
 
 Before I got into the details of mailman logs, I'd like to ask:
 
 1. What's the best way to detect stalled condition like this? We do
 monitor qrunner processes but in this case it didn't help because the
 qrunners were still running. Now I put in a script to monitor the number
 of files in in queue, and if it reaches a threshold, sends alert. 
 
 
 Actually, If I understand the situation correctly, IncomingRunner will
 not be running. It will die and be restarted, but after the 10th
 restart/die, it won't restart again (but this seems incorrect - see
 below).

The qrunner log shows they died, but if I do ps -ef |grep
IncomingRunner, they are still there. 


 
 
 2. Why this message didn't get moved to shunt directory?
 
 
 I don't know why it didn't shunt. From the traceback, the message
 should have shunted and that should have been the end of it. Also, the
 problem with decode_rfc2231 in email.Utils that seems to be the
 underlying issue here was fixed in email 2.5.8 which shipped with
 Mailman 2.1.9 and should be in Mailman's pythonlib directory. Your
 traceback says you are getting the email library from your Python 2.3
 installation. This is not correct and should not happen if Mailman is
 properly installed.

You nailed it! 

We indeed re-packaged Mailman with a lot of Stanford's own patches. When
we upgrade from 2.1.8 to 2.1.9, I missed  Mailman's own pythonlib/email
installation in Debian rule file.

I just re-packaged it and tested the new package. The message that
caused the problem now is accepted without a problem!


 
 3. Can the mailman recover itself without human intervention? 

 I don't understand why the message didn't shunt. I also don't
 understand why IncomingRunner was restarted more than 10 times. Have
 you changed
 
 MAX_RESTARTS = 10
 
 in bin/mailmanctl?

No that's not changed. It is still 10. 


 
 Was there a second traceback with only one date/time header following
 the one you report? If so, what was it?

Yes, there were more similar traceback in the logs. There are 10
actually.

 
 Have you changed Mailman/Queue/Runner.py in any way?

Not that program.


 
 
 Details are followed:
 
 
 More comments below:
 
 
 The following error message will repeat a couple of times in error log
 when this message was processed: 
 
 
 Feb 20 07:55:53 2007 qrunner(25722): Traceback (most recent call last):
 Feb 20 07:55:53 2007 qrunner(25722):   File
 /var/lib/mailman/bin/qrunner, line 278, in ?
 Feb 20 07:55:53 2007 qrunner(25722):  main()
 Feb 20 07:55:53 2007 qrunner(25722):   File
 /var/lib/mailman/bin/qrunner, line 238, in main
 Feb 20 07:55:53 2007 qrunner(25722):  qrunner.run()
 Feb 20 07:55:53 2007 qrunner(25722):   File
 /usr/lib/mailman/Mailman/Queue/Runner.py, line 71, in run
 Feb 20 07:55:53 2007 qrunner(25722):  filecnt = self._oneloop()
 Feb 20 07:55:53 2007 qrunner(25722):   File
 /usr/lib/mailman/Mailman/Queue/Runner.py, line 100, in _oneloop
 Feb 20 07:55:53 2007 qrunner(25722):  msg, msgdata =
 self._switchboard.dequeue(filebase)
 Feb 20 07:55:53 2007 qrunner(25722):   File
 /usr/lib/mailman/Mailman/Queue/Switchboard.py, line 164, in dequeue
 Feb 20 07:55:53 2007 qrunner(25722):  msg =
 email.message_from_string(msg, Message.Message)
 Feb 20 07:55:53 2007 qrunner(25722):   File
 /usr/lib/python2.3/email/__init__.py, line 52, in message_from_string
 
 
 This and subsequent email modules should come from
 /usr/lib/mailman/pythonlib/email, not from /usr/lib/python2.3/email.
 
 
 Feb 20 07:55:53 2007 qrunner(25722):  return Parser(_class,
 strict=strict).parsestr(s)
 Feb 20 07:55:53 2007 qrunner(25722):   File
 /usr/lib/python2.3/email/Parser.py, line 75, in parsestr
 Feb 20 07:55:53 2007 qrunner(25722):  return
 self.parse(StringIO(text), headersonly=headersonly)
 Feb 20 07:55:53 2007 qrunner(25722):   File
 /usr/lib/python2.3/email/Parser.py, line 64, in parse
 Feb 20 07:55:53 2007 qrunner(25722):  self._parsebody(root, fp,
 firstbodyline)
 Feb 20 07:55:53 2007 qrunner(25722):   File
 /usr/lib/python2.3/email/Parser.py, line 240, in _parsebody
 Feb 20 07:55:53 2007 qrunner(25722):  msgobj = self.parsestr(part)
 Feb 20 07:55:53 2007 qrunner(25722):   File
 /usr/lib/python2.3/email/Parser.py, line 75, in parsestr
 Feb 20 07:55:53 2007 qrunner(25722):  return
 self.parse(StringIO(text), headersonly=headersonly)
 Feb 20 07:55:53 2007 qrunner(25722):   File
 /usr/lib/python2.3/email/Parser.py, line