Re: [Mailman-Users] is driving me crazy

2006-01-22 Thread Mark Sapiro
ArteryPlanet.Net :: Manuel Kissoyan wrote:
>>So VirginRunner isn't (wasn't on Jan 18) running. Are there any error
>>log entries from Jan 18 coincident with these subprocess exits?
>The following are the complete January 18 error logs:

>Jan 18 17:29:18 2006 qrunner(11252): Traceback (most recent call last):
>Jan 18 17:29:18 2006 qrunner(11252):   File 
>"/usr/local/cpanel/3rdparty/mailman/bin/qrunner", line 270, in ?
>Jan 18 17:29:18 2006 qrunner(11252):  main()
>Jan 18 17:29:18 2006 qrunner(11252):   File 
>"/usr/local/cpanel/3rdparty/mailman/bin/qrunner", line 230, in main
>Jan 18 17:29:18 2006 qrunner(11252):
>Jan 18 17:29:18 2006 qrunner(11252):   File 
>"/usr/local/cpanel/3rdparty/mailman/Mailman/Queue/", line 70, in 
>Jan 18 17:29:18 2006 qrunner(11252):  filecnt = self._oneloop()
>Jan 18 17:29:18 2006 qrunner(11252):   File 
>"/usr/local/cpanel/3rdparty/mailman/Mailman/Queue/", line 99, in 
>Jan 18 17:29:18 2006 qrunner(11252):  msg, msgdata = 
>Jan 18 17:29:18 2006 qrunner(11252):   File 
>"/usr/local/cpanel/3rdparty/mailman/Mailman/Queue/", line 143, 
>in dequeue
>Jan 18 17:29:18 2006 qrunner(11252):  fp = open(filename)
>Jan 18 17:29:18 2006 qrunner(11252): IOError :  [Errno 2] No such file or 

The above is the error that caused VirginRunner to die and be restarted
at 17:29:19 (from your previous post).

>Jan 18 18:15:50 2006 qrunner(23055): Traceback (most recent call last):
>Jan 18 18:15:50 2006 qrunner(23055):   File 
>"/usr/local/cpanel/3rdparty/mailman/bin/qrunner", line 270, in ?
>Jan 18 18:15:51 2006 qrunner(23055):  main()
>Jan 18 18:15:51 2006 qrunner(23055):   File 
>"/usr/local/cpanel/3rdparty/mailman/bin/qrunner", line 230, in main
>Jan 18 18:15:51 2006 qrunner(23055):
>Jan 18 18:15:51 2006 qrunner(23055):   File 
>"/usr/local/cpanel/3rdparty/mailman/Mailman/Queue/", line 70, in 
>Jan 18 18:15:51 2006 qrunner(23055):  filecnt = self._oneloop()
>Jan 18 18:15:51 2006 qrunner(23055):   File 
>"/usr/local/cpanel/3rdparty/mailman/Mailman/Queue/", line 99, in 
>Jan 18 18:15:51 2006 qrunner(23055):  msg, msgdata = 
>Jan 18 18:15:51 2006 qrunner(23055):   File 
>"/usr/local/cpanel/3rdparty/mailman/Mailman/Queue/", line 144, 
>in dequeue
>Jan 18 18:15:51 2006 qrunner(23055):  os.unlink(filename)
>Jan 18 18:15:51 2006 qrunner(23055): OSError :  [Errno 2] No such file or 

And the above caused the exit at 18:15:51

>Jan 18 18:16:03 2006 qrunner(23105): Traceback (most recent call last):
>Jan 18 18:16:03 2006 qrunner(23105):   File 
>"/usr/local/cpanel/3rdparty/mailman/bin/qrunner", line 270, in ?
>Jan 18 18:16:03 2006 qrunner(23105):  main()
>Jan 18 18:16:03 2006 qrunner(23105):   File 
>"/usr/local/cpanel/3rdparty/mailman/bin/qrunner", line 230, in main
>Jan 18 18:16:03 2006 qrunner(23105):
>Jan 18 18:16:03 2006 qrunner(23105):   File 
>"/usr/local/cpanel/3rdparty/mailman/Mailman/Queue/", line 70, in 
>Jan 18 18:16:03 2006 qrunner(23105):  filecnt = self._oneloop()
>Jan 18 18:16:03 2006 qrunner(23105):   File 
>"/usr/local/cpanel/3rdparty/mailman/Mailman/Queue/", line 99, in 
>Jan 18 18:16:03 2006 qrunner(23105):  msg, msgdata = 
>Jan 18 18:16:03 2006 qrunner(23105):   File 
>"/usr/local/cpanel/3rdparty/mailman/Mailman/Queue/", line 143, 
>in dequeue
>Jan 18 18:16:03 2006 qrunner(23105):  fp = open(filename)
>Jan 18 18:16:03 2006 qrunner(23105): IOError :  [Errno 2] No such file or 

And the above caused the exit at 18:16:03 from which it didn't restart

>I just shut down Mailman. There was more than one process running, so I 
>killed them and restarted it. I'll wait another week to see what happens.

It looks like you've had multiple copies of some runners running for
some time; however, this is probably the result of attempts to correct
the problems, not the cause of the problems. I.e., in the above case,
it looks like there were two VirginRunners running until 18:16:03 on
Jan 18, at which point one died and didn't restart, perhaps leaving only one.
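Two copies of the same runner would also explain the tracebacks above. A minimal sketch, assuming two runners both list the same queue directory and then each try to read and delete the same file; `dequeue()` and the file name are hypothetical stand-ins, not Mailman's actual Switchboard code. The runner that loses the race gets ENOENT, the "[Errno 2] No such file or directory" in the error log. If both happen to open the file before either deletes it, the loser would instead fail in `os.unlink()`, which would account for the OSError variant at 18:15:51.

```python
import errno
import os
import tempfile


def dequeue(path):
    """Hypothetical stand-in for a runner's dequeue: read a queue file, then delete it."""
    with open(path, "rb") as fp:   # cf. 'fp = open(filename)' in the traceback
        data = fp.read()
    os.unlink(path)                # cf. 'os.unlink(filename)'
    return data


def demo():
    """Two 'runners' see the same queue file; the slower one gets ENOENT."""
    with tempfile.TemporaryDirectory() as qdir:
        # Made-up queue entry name, for illustration only.
        with open(os.path.join(qdir, "entry.pck"), "wb") as fp:
            fp.write(b"queued message")
        seen_by_a = os.listdir(qdir)   # both runners scan the queue...
        seen_by_b = os.listdir(qdir)   # ...before either has processed it
        dequeue(os.path.join(qdir, seen_by_a[0]))       # runner A wins
        try:
            dequeue(os.path.join(qdir, seen_by_b[0]))   # runner B loses
        except OSError as exc:   # IOError under the Python 2.2 on the server
            return exc.errno
    return None


if __name__ == "__main__":
    print(errno.errorcode[demo()])  # ENOENT
```

With a single copy of each runner there is nobody to lose the race, which is why getting back to exactly one of each matters more than the errors themselves.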

In the future, try to get some information when the list is not
working. If some, but not all, qrunners have died, check the qrunner
and error logs. Then if not all qrunners are running, first do
'bin/mailmanctl stop' to stop what's running, and only do
'bin/mailmanctl start' to resume after checking with ps to be sure al

Re: [Mailman-Users] is driving me crazy

2006-01-22 Thread Mark Sapiro
ArteryPlanet.Net :: Manuel Kissoyan wrote:

>I remember I saw qrunners in the server's process list when the mailing list was 
>down, but somehow when I restart Mailman it looks like the whole list was down, 
>because it starts sending all the list's queued mail.

This only means that perhaps one runner was down, and starting Mailman
anew created multiple copies of some runners. Hint: only run 'bin/mailmanctl -s
start' in an init file that runs on system (re)boot. Under normal
circumstances when running bin/mailmanctl manually, don't use '-s'.

>The qrunner log entries from the hour when the list went down are:
>Jan 18 17:29:19 2006 (23105) VirginRunner qrunner started.
>Jan 18 18:15:51 2006 (10568) Master qrunner detected subprocess exit
>(pid: 23055, sig: None, sts: 1, class: VirginRunner, slice: 1/1) 
>Jan 18 18:15:51 2006 (11632) VirginRunner qrunner started.
>Jan 18 18:16:03 2006 (480) Master qrunner detected subprocess exit
>(pid: 23105, sig: None, sts: 1, class: VirginRunner, slice: 1/1) 
>Jan 18 18:16:03 2006 (480) Qrunner VirginRunner reached maximum restart 
>limit of 10, not restarting.

So VirginRunner isn't (wasn't on Jan 18) running. Are there any error
log entries from Jan 18 coincident with these subprocess exits?
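One way to answer that question is to join the two logs on the process id: the pid in each "detected subprocess exit" entry in the qrunner log reappears as qrunner(PID) in the error log. A minimal sketch, assuming each log entry sits on a single line in the real files (the wrapping here comes from the email). The regular expressions are written against the entries quoted in this thread, not against any documented log format, and the sample lines are those entries rejoined.

```python
import re

# Master's "detected subprocess exit" entries, as quoted in this thread.
EXIT_RE = re.compile(
    r"^(?P<when>\w{3} +\d{1,2} [\d:]{8} \d{4}) \(\d+\) "
    r"Master qrunner detected subprocess exit\s*"
    r"\(pid: (?P<pid>\d+), sig: \w+, sts: -?\w+, "
    r"class: (?P<cls>\w+), slice: \d+/\d+\)"
)

# Final exception line of a qrunner traceback in the error log.
ERR_RE = re.compile(r"qrunner\((?P<pid>\d+)\): (?P<exc>\w+Error)\s*:")


def runner_exits(qrunner_lines):
    """Return (timestamp, pid, runner class) for every runner exit."""
    exits = []
    for line in qrunner_lines:
        m = EXIT_RE.match(line)
        if m:
            exits.append((m.group("when"), int(m.group("pid")), m.group("cls")))
    return exits


def errors_by_pid(error_lines):
    """Map each qrunner pid to the exception that ended its traceback."""
    found = {}
    for line in error_lines:
        m = ERR_RE.search(line)
        if m:
            found[int(m.group("pid"))] = m.group("exc")
    return found


# Entries quoted in this thread, each rejoined onto one line.
qrunner_log = [
    "Jan 18 17:29:19 2006 (23105) VirginRunner qrunner started.",
    "Jan 18 18:15:51 2006 (10568) Master qrunner detected subprocess exit "
    "(pid: 23055, sig: None, sts: 1, class: VirginRunner, slice: 1/1)",
    "Jan 18 18:15:51 2006 (11632) VirginRunner qrunner started.",
    "Jan 18 18:16:03 2006 (480) Master qrunner detected subprocess exit "
    "(pid: 23105, sig: None, sts: 1, class: VirginRunner, slice: 1/1)",
]
error_log = [
    "Jan 18 18:15:51 2006 qrunner(23055): OSError :  [Errno 2] No such file or",
    "Jan 18 18:16:03 2006 qrunner(23105): IOError :  [Errno 2] No such file or",
]

errors = errors_by_pid(error_log)
for when, pid, cls in runner_exits(qrunner_log):
    print(when, cls, pid, errors.get(pid, "no matching traceback"))
```

On the real logs you would read the two files line by line instead of using the inline samples.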

>These are the latest lines in the error log right now:

>Jan 22 23:51:22 2006 qrunner(1449): OSError :  [Errno 2] No such file or 

>Jan 22 23:51:23 2006 qrunner(20729): OSError :  [Errno 2] No such file or 

>Jan 22 23:51:23 2006 qrunner(20713): IOError :  [Errno 2] No such file or 

>Jan 22 23:52:41 2006 qrunner(21835): IOError :  [Errno 2] No such file or 

So, there are probably multiple copies of at least IncomingRunner,
OutgoingRunner, ArchRunner and BounceRunner.

Do "ps -fAw | grep 'python'" or however you spell the ps options on
your system to get all processes including command lines. There should
be exactly one each of mailmanctl, ArchRunner, BounceRunner,
CommandRunner, IncomingRunner, NewsRunner, OutgoingRunner,
VirginRunner and RetryRunner, except in the unlikely case that you are
processing your queues in slices in which case there should be one of
each runner for each unique slice.

If there are more, first do "bin/mailmanctl stop". Then if there are
any left, send them SIGTERM until they're all gone. Then start mailman
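The count described above can be automated. A minimal sketch, assuming each qrunner's command line contains `--runner=NAME:SLICE:COUNT`, which as far as I know is how Mailman 2.1's mailmanctl launches them (check your own ps output), and that you feed it the lines printed by the ps command above. The sample lines and pids are made up for illustration. It reports how many mailmanctl masters are running, which runner/slice pairs appear more than once, and which expected runners are absent.

```python
import re
from collections import Counter

# Runners that should each appear exactly once per slice.
EXPECTED = {"ArchRunner", "BounceRunner", "CommandRunner", "IncomingRunner",
            "NewsRunner", "OutgoingRunner", "VirginRunner", "RetryRunner"}

# Assumption: qrunner command lines carry --runner=NAME:SLICE:COUNT.
RUNNER_RE = re.compile(r"--runner=(?P<name>\w+):(?P<slice>\d+):\d+")


def audit(ps_lines):
    """Return (mailmanctl masters, duplicated (runner, slice) pairs, missing runners)."""
    masters = 0
    per_slice = Counter()
    for line in ps_lines:
        if "bin/mailmanctl" in line:
            masters += 1
            continue
        m = RUNNER_RE.search(line)
        if m:
            per_slice[(m.group("name"), int(m.group("slice")))] += 1
    dupes = sorted(key for key, n in per_slice.items() if n > 1)
    missing = sorted(EXPECTED - {name for name, _ in per_slice})
    return masters, dupes, missing


# Made-up, trimmed ps lines: two masters and a duplicated VirginRunner.
PY = "/usr/bin/python2 /usr/local/cpanel/3rdparty/mailman/bin"
sample = [
    f"mailman 1001    1 {PY}/mailmanctl -s start",
    f"mailman 1002    1 {PY}/mailmanctl -s start",
    f"mailman 2001 1001 {PY}/qrunner --runner=VirginRunner:0:1 -s",
    f"mailman 2002 1002 {PY}/qrunner --runner=VirginRunner:0:1 -s",
    f"mailman 2003 1002 {PY}/qrunner --runner=IncomingRunner:0:1 -s",
    f"mailman 2004 1002 {PY}/qrunner --runner=OutgoingRunner:0:1 -s",
]

masters, dupes, missing = audit(sample)
print("masters:", masters)
print("duplicated:", dupes)
print("missing:", missing)
```

On the server you could pass it `subprocess.run(["ps", "-fAw"], capture_output=True, text=True).stdout.splitlines()`. Counting per (runner, slice) rather than per runner keeps the check correct if you do process queues in slices.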

>About "Where are the posts going, i.e. which qfiles/* directories have 
>entries.", could you please clarify? The following are the directories in 
>drwxrwsr-x   11 mailman  mailman  4096 May 28  2004 ./
>drwxrwsr-x   22 mailman  mailman  4096 Jul 20  2005 ../
>drwxrws---    2 mailman  mailman  4096 Jan 22 23:51 archive/
>drwxrws---    2 mailman  mailman  4096 Jan 22 23:52 bounces/
>drwxrws---    2 mailman  mailman  4096 Jan 19 01:41 commands/
>drwxrws---    2 mailman  mailman  8192 Jan 22 23:51 in/
>drwxrws---    2 mailman  mailman  4096 May 28  2004 news/
>drwxrws---    2 mailman  mailman 53248 Jan 22 23:51 out/
>drwxrws---    2 mailman  mailman  4096 Jan 22 22:37 retry/
>drwxrws---    2 mailman  mailman  8192 Dec 17 04:39 shunt/
>drwxrws---    2 mailman  mailman 36864 Jan 22 23:44 virgin/

And what is in the archive/, bounces/, etc. directories? More
importantly, when things are not working, in which of those 9
directories (queues) are the messages getting stuck?
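To answer that at a glance, here is a sketch that counts the files in each queue directory. It assumes only that the queues are the subdirectories of qfiles/ shown in the listing. The default path is a guess based on the install prefix in the tracebacks, so pass the real one as an argument. Running it while things are stuck, and again a few minutes later, shows which queue keeps growing.

```python
import os
import sys
import tempfile

# Guessed from the install prefix in the tracebacks; pass the real path instead.
DEFAULT_QFILES = "/usr/local/cpanel/3rdparty/mailman/qfiles"


def queue_depths(qfiles_dir):
    """Return {queue name: number of files} for each subdirectory of qfiles_dir."""
    depths = {}
    for entry in sorted(os.scandir(qfiles_dir), key=lambda e: e.name):
        if entry.is_dir():
            depths[entry.name] = sum(1 for f in os.scandir(entry.path) if f.is_file())
    return depths


def demo():
    """Build a throwaway qfiles-like tree (made-up file names) and count it."""
    with tempfile.TemporaryDirectory() as root:
        for queue, nfiles in (("in", 2), ("out", 0), ("virgin", 1)):
            os.mkdir(os.path.join(root, queue))
            for i in range(nfiles):
                open(os.path.join(root, queue, f"{i}.pck"), "w").close()
        return queue_depths(root)


if __name__ == "__main__":
    qdir = sys.argv[1] if len(sys.argv) > 1 else DEFAULT_QFILES
    if os.path.isdir(qdir):
        for name, count in queue_depths(qdir).items():
            print(f"{name:10} {count}")
    else:
        print(f"{qdir} not found; pass the qfiles directory as an argument")
```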

>About "Also, what happens if you move the lists/LIST_NAME/digest.mbox file 
>aside? Does that help?"
>You mean delete that file? Remember, we already removed this list and 
>re-created it, so that file was created fresh before the list went down.

The lists/LIST_NAME/digest.mbox file is where posts are collected for
an eventual digest. When it reaches digest_size_threshold size or when
cron/senddigests runs if digest_send_periodic is yes, it is used to
create the digest and then removed. I.e., under usual circumstances,
it is removed by Mailman every day.
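The lifecycle described above can be modeled in a few lines. This is a toy model, not Mailman's digest code: the class, its methods and parameters are hypothetical, the threshold is assumed to be in kilobytes, and "sending" only counts the digest before removing the file. It illustrates that digest.mbox goes away only when one of the two triggers fires, so anything written into it stays there until then.

```python
import os
import tempfile


class DigestSketch:
    """Hypothetical toy model of the digest.mbox lifecycle."""

    def __init__(self, list_dir, threshold_kb, send_periodic):
        self.mbox = os.path.join(list_dir, "digest.mbox")
        self.threshold_kb = threshold_kb      # assumed to be in KB
        self.send_periodic = send_periodic
        self.sent = 0

    def _send_and_remove(self):
        # The real code would build and mail a digest here; the sketch just counts it.
        self.sent += 1
        os.remove(self.mbox)

    def add_post(self, text):
        # Each post is appended; crossing the size threshold triggers a digest.
        with open(self.mbox, "a") as fp:
            fp.write(text)
        if os.path.getsize(self.mbox) / 1024.0 >= self.threshold_kb:
            self._send_and_remove()

    def cron_senddigests(self):
        # The periodic trigger: only for lists that send digests periodically.
        if self.send_periodic and os.path.exists(self.mbox):
            self._send_and_remove()


def demo():
    with tempfile.TemporaryDirectory() as d:
        dl = DigestSketch(d, threshold_kb=1, send_periodic=True)
        dl.add_post("x" * 600)
        after_one = os.path.exists(dl.mbox)   # 600 bytes: still collecting
        dl.add_post("x" * 600)
        after_two = os.path.exists(dl.mbox)   # 1200 bytes: sent and removed
        dl.add_post("short post")
        dl.cron_senddigests()                 # periodic run sends the rest
        return after_one, after_two, os.path.exists(dl.mbox), dl.sent
```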

The issue is that there are known cases when a somehow malformed or
badly encoded post has been posted and saved to digest.mbox, and this
has stopped processing for that list. This is why I suggested moving
it aside - i.e. moving it out of the lists/LIST_NAME/ directory to see
if that allows the list's processing to resume. This would indicate
the problem is a 'bad' post in digest.mbox. If moving the file aside
didn't help, then the problem is elsewhere.
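Moving the file aside is not the same as deleting it: the collected posts are kept and can be inspected or put back. A minimal sketch, assuming `list_dir` is the list's lists/LIST_NAME/ directory and `backup_dir` is any directory outside it (both hypothetical parameters); the timestamped name keeps repeated moves from overwriting each other.

```python
import os
import shutil
import tempfile
import time


def move_digest_aside(list_dir, backup_dir):
    """Move list_dir/digest.mbox into backup_dir under a timestamped name.

    Returns the new path, or None if there was no digest.mbox to move.
    """
    src = os.path.join(list_dir, "digest.mbox")
    if not os.path.exists(src):
        return None
    os.makedirs(backup_dir, exist_ok=True)
    dst = os.path.join(backup_dir, "digest.mbox." + time.strftime("%Y%m%d-%H%M%S"))
    shutil.move(src, dst)  # posts are kept, just no longer where the list looks
    return dst


def demo():
    with tempfile.TemporaryDirectory() as root:
        list_dir = os.path.join(root, "lists", "mylist")   # hypothetical list
        os.makedirs(list_dir)
        with open(os.path.join(list_dir, "digest.mbox"), "w") as fp:
            fp.write("From nobody\n\npending post\n")
        dst = move_digest_aside(list_dir, os.path.join(root, "held"))
        with open(dst) as fp:
            kept = fp.read()
        return os.path.exists(os.path.join(list_dir, "digest.mbox")), kept
```

If processing resumes after the move, the moved file is where to look for the bad post.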

Mark Sapiro <[EMAIL PROTECTED]>   The highway is for gamblers,
San Francisco Bay Area, California    bet

Re: [Mailman-Users] is driving me crazy

2006-01-22 Thread ArteryPlanet.Net :: Manuel Kissoyan
/usr/local/cpanel/3rdparty/mailman/bin/qrunner", line 270, in ?
Jan 22 23:52:41 2006 qrunner(21835):  main()
Jan 22 23:52:41 2006 qrunner(21835):   File 
"/usr/local/cpanel/3rdparty/mailman/bin/qrunner", line 230, in main
Jan 22 23:52:41 2006 qrunner(21835):
Jan 22 23:52:41 2006 qrunner(21835):   File 
"/usr/local/cpanel/3rdparty/mailman/Mailman/Queue/", line 70, in 
Jan 22 23:52:41 2006 qrunner(21835):  filecnt = self._oneloop()
Jan 22 23:52:41 2006 qrunner(21835):   File 
"/usr/local/cpanel/3rdparty/mailman/Mailman/Queue/", line 99, in 
Jan 22 23:52:41 2006 qrunner(21835):  msg, msgdata = 
Jan 22 23:52:41 2006 qrunner(21835):   File 
"/usr/local/cpanel/3rdparty/mailman/Mailman/Queue/", line 143, 
in dequeue
Jan 22 23:52:41 2006 qrunner(21835):  fp = open(filename)
Jan 22 23:52:41 2006 qrunner(21835): IOError :  [Errno 2] No such file or 

About "Where are the posts going, i.e. which qfiles/* directories have 
entries.", could you please clarify? The following are the directories in 

drwxrwsr-x   11 mailman  mailman  4096 May 28  2004 ./
drwxrwsr-x   22 mailman  mailman  4096 Jul 20  2005 ../
drwxrws---    2 mailman  mailman  4096 Jan 22 23:51 archive/
drwxrws---    2 mailman  mailman  4096 Jan 22 23:52 bounces/
drwxrws---    2 mailman  mailman  4096 Jan 19 01:41 commands/
drwxrws---    2 mailman  mailman  8192 Jan 22 23:51 in/
drwxrws---    2 mailman  mailman  4096 May 28  2004 news/
drwxrws---    2 mailman  mailman 53248 Jan 22 23:51 out/
drwxrws---    2 mailman  mailman  4096 Jan 22 22:37 retry/
drwxrws---    2 mailman  mailman  8192 Dec 17 04:39 shunt/
drwxrws---    2 mailman  mailman 36864 Jan 22 23:44 virgin/

About "Also, what happens if you move the lists/LIST_NAME/digest.mbox file 
aside? Does that help?"

You mean delete that file? Remember, we already removed this list and 
re-created it, so that file was created fresh before the list went down.

Thank you very much for the help!

- Original Message - 
From: "Mark Sapiro" <[EMAIL PROTECTED]>
To: "ArteryPlanet.Net :: Manuel Kissoyan" <[EMAIL PROTECTED]>; 
"mailman mailing list" 
Sent: Sunday, January 22, 2006 8:40 PM
Subject: Re: [Mailman-Users] is driving me crazy


Re: [Mailman-Users] is driving me crazy

2006-01-22 Thread Mark Sapiro
ArteryPlanet.Net :: Manuel Kissoyan wrote:

>We moved this client from one server to another because his mailing list was 
>going down every week; in fact, for some reason it shuts down the whole 
>Mailman installation. I hope someone can help us with this. We reinstalled 
>Mailman and also deleted and re-created the list; it ran for a month, and now 
>it is going down every week again. Just a note: before we moved this list, 
>Mailman was working without problems on this server, so it is something 
>specific to this list. It's crazy... any help?

I don't think these log entries/error reports are relevant to the
issue. See below.

First, see

That said, in order to help, we need more specific information about
the problem. I.e., at this point, is it just the list or the whole
Mailman server that's down? If the whole server, which, if any, queue
runners are still running? What's in the 'qrunner' log? What current
entries are in the 'error' log? Where are the posts going, i.e. which
qfiles/* directories have entries?

If it's only the one list, presumably the qrunners are OK, but the
other questions apply. Also, what happens if you move the
lists/LIST_NAME/digest.mbox file aside? Does that help?

Also see

>I found some logs that could probably help.
>Jan 18 17:59:46 2006 admin(4202):  
>admin(4202): [- Mailman Version: 2.1.6 -] 
>admin(4202): [- Traceback --] 
>admin(4202): Traceback (most recent call last):
>admin(4202):   File "/usr/local/cpanel/3rdparty/mailman/scripts/driver", line 
>109, in run_main
>admin(4202): sys.stdout.write(tempstdout.getvalue())
>admin(4202): IOError: [Errno 32] Broken pipe
>admin(4202): [- Python Information -] 
>admin(4202): sys.version =   2.2.3 (#1, Feb  2 2005, 12:20:51) 
>[GCC 3.2.3 20030502 (Red Hat Linux 3.2.3-49)] 
>admin(4202): sys.executable  =   /usr/bin/python2 
>admin(4202): sys.prefix  =   /usr 
>admin(4202): sys.exec_prefix =   /usr 
>admin(4202): sys.path=   /usr 
>admin(4202): sys.platform=   linux2 
>admin(4202): [- Environment Variables -] 
>admin(4202):PATH_INFO: / 
>admin(4202):SERVER_SOFTWARE: Apache 
>admin(4202):PYTHONPATH: /usr/local/cpanel/3rdparty/mailman 
>admin(4202):SCRIPT_NAME: /mailman/admindb 
>admin(4202):REQUEST_METHOD: GET 
>admin(4202):HTTP_KEEP_ALIVE: 300 
>admin(4202):SERVER_PROTOCOL: HTTP/1.1 
>admin(4202):REQUEST_URI: /mailman/admindb/ 
>admin(4202):HTTP_ACCEPT_CHARSET: ISO-8859-1,utf-8;q=0.7,*;q=0.7 
>admin(4202):HTTP_USER_AGENT: Mozilla/5.0 (Windows; U; Windows NT 5.1; 
>en-US; rv:1.7.2) Gecko/20040804 Netscape/7.2 (ax) 
>admin(4202):HTTP_CONNECTION: keep-alive 
>admin(4202):REMOTE_PORT: 3450 
>admin(4202):HTTP_ACCEPT_LANGUAGE: en-us,en;q=0.5 
>admin(4202):SERVER_PORT: 80 
>admin(4202):GATEWAY_INTERFACE: CGI/1.1 
>admin(4202):HTTP_ACCEPT_ENCODING: gzip,deflate 
>admin(4202):DOCUMENT_ROOT: /home/okiebenz/public_html 

This indicates a user has log-in cookies as the list-admin for the
'banned' and 'mercedes' lists and is going to the admindb page for
mercedes and has possibly quit or stopped the browser before the
requested page was returned.

The actual error trace is not relevant (it just indicates that the
script driver is trying to write to Apache, which has already closed
the pipe), but the fact that there may have been a long delay in
building the page may indicate a problem with the list's request.pck or other
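The Broken pipe error itself is easy to reproduce: writing to a pipe after its reading end has been closed fails with errno 32 (EPIPE) on Linux, which is what happens when Apache has dropped the connection before the CGI driver writes its output. A minimal sketch for a Unix-like system; the function name is hypothetical, and Python 3 reports the error as BrokenPipeError (an OSError subclass) where the Python 2.2 in the log reported IOError.

```python
import errno
import os


def write_after_reader_closed():
    """Write to a pipe whose read end is already closed; return the errno."""
    r, w = os.pipe()
    os.close(r)   # the reader (Apache, in the log above) has gone away
    try:
        os.write(w, b"<html>...</html>")
    except OSError as exc:   # BrokenPipeError; Python ignores SIGPIPE by default
        return exc.errno
    finally:
        os.close(w)
    return None


if __name__ == "__main__":
    print(errno.errorcode[write_after_reader_closed()])  # EPIPE
```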

>Also, when I logged in to the shell, and in fact right now, after I restarted 
>Mailman with the shell open, I am getting the following messages:
>Traceback (most recent call last):
>  File "/usr/local/cpanel/3rdparty/mailman/bin/qrunner", line 270, in ?
>  File "/usr/local/cpanel/3

[Mailman-Users] is driving me crazy

2006-01-22 Thread ArteryPlanet.Net :: Manuel Kissoyan
We moved this client from one server to another because his mailing list was 
going down every week; in fact, for some reason it shuts down the whole 
Mailman installation. I hope someone can help us with this. We reinstalled 
Mailman and also deleted and re-created the list; it ran for a month, and now 
it is going down every week again. Just a note: before we moved this list, 
Mailman was working without problems on this server, so it is something 
specific to this list. It's crazy... any help?

I found some logs that could probably help.

Jan 18 17:59:46 2006 admin(4202):
admin(4202): [- Mailman Version: 2.1.6 -]
admin(4202): [- Traceback --]
admin(4202): Traceback (most recent call last):
admin(4202):   File "/usr/local/cpanel/3rdparty/mailman/scripts/driver", line 109, in run_main
admin(4202): sys.stdout.write(tempstdout.getvalue())
admin(4202): IOError: [Errno 32] Broken pipe
admin(4202): [- Python Information -]
admin(4202): sys.version =   2.2.3 (#1, Feb  2 2005, 12:20:51)
[GCC 3.2.3 20030502 (Red Hat Linux 3.2.3-49)]
admin(4202): sys.executable  =   /usr/bin/python2
admin(4202): sys.prefix  =   /usr
admin(4202): sys.exec_prefix =   /usr
admin(4202): sys.path=   /usr
admin(4202): sys.platform=   linux2
admin(4202): [- Environment Variables -]
admin(4202):PATH_INFO: /
admin(4202):SERVER_SOFTWARE: Apache
admin(4202):PYTHONPATH: /usr/local/cpanel/3rdparty/mailman
admin(4202):SCRIPT_NAME: /mailman/admindb
admin(4202):REQUEST_METHOD: GET
admin(4202):HTTP_KEEP_ALIVE: 300
admin(4202):SERVER_PROTOCOL: HTTP/1.1
admin(4202):REQUEST_URI: /mailman/admindb/
admin(4202):HTTP_ACCEPT_CHARSET: ISO-8859-1,utf-8;q=0.7,*;q=0.7
admin(4202):HTTP_USER_AGENT: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.2) Gecko/20040804 Netscape/7.2 (ax)
admin(4202):HTTP_CONNECTION: keep-alive
admin(4202):REMOTE_PORT: 3450
admin(4202):HTTP_ACCEPT_LANGUAGE: en-us,en;q=0.5
admin(4202):SERVER_PORT: 80
admin(4202):GATEWAY_INTERFACE: CGI/1.1
admin(4202):HTTP_ACCEPT_ENCODING: gzip,deflate
admin(4202):DOCUMENT_ROOT: /home/okiebenz/public_html

Also, when I logged in to the shell, and in fact right now, after I restarted 
Mailman with the shell open, I am getting the following messages:

Traceback (most recent call last):
  File "/usr/local/cpanel/3rdparty/mailman/bin/qrunner", line 270, in ?
  File "/usr/local/cpanel/3rdparty/mailman/bin/qrunner", line 230, in main
  File "/usr/local/cpanel/3rdparty/mailman/Mailman/Queue/", line 70, in run
    filecnt = self._oneloop()
  File "/usr/local/cpanel/3rdparty/mailman/Mailman/Queue/", line 99, in _oneloop
    msg, msgdata = self._switchboard.dequeue(filebase)
  File "/usr/local/cpanel/3rdparty/mailman/Mailman/Queue/", line 143, in dequeue
    fp = open(filename)
IOError: [Errno 2] No such file or directory: 
Traceback (most recent call last):
  File "/usr/local/cpanel/3rdparty/mailman/bin/qrunner", line 270, in ?
  File "/usr/local/cpanel/3rdparty/mailman/bin/qrunner", line 230, in main
  File "/usr/local/cpanel/3rdparty/mailman/Mailman/Queue/", line 70, in run
    filecnt = self._oneloop()
  File "/usr/local/cpanel/3rdparty/mailman/Mailman/Queue/", line 99, in _oneloop
    msg, msgdata = self._switchboard.dequeue(filebase)
  File "/usr/local/cpanel/3rdparty/mailman/Mailman/Queue/", line 143, in dequeue
    fp = open(filename)
IOError: [Errno 2] No such file or directory: 

Thank you in advance!

Mailman-Users mailing list