Re: [Mailman-Users] Mailman / Sendmail horriblyslowperformance..suggestions please?!?

2003-08-25 Thread Richard Barrett
for closure of this topic:

Begin forwarded message:

From: "Andrew Falgout" <[EMAIL PROTECTED]>
Date: Mon Aug 25, 2003  8:22:05  pm Europe/London
To: "Richard Barrett" <[EMAIL PROTECTED]>
Subject: Re: [Mailman-Users] Mailman / Sendmail horribly  
slowperformance..suggestions please?!?
Reply-To: "Andrew Falgout" <[EMAIL PROTECTED]>

Problem solved.  It was archiving.  That's all I needed was to figure  
out
what it was pausing on.  I turned off archival and informed my manager  
that
the overhead associated with it would be too much for the system to  
bear.
It is FLYING through the queued mails now and delivering them now.

Been a lifesaver, and THANKS!!!

- Original Message -
From: "Richard Barrett" <[EMAIL PROTECTED]>
To: "Andrew Falgout" <[EMAIL PROTECTED]>
Sent: Monday, August 25, 2003 12:11 PM
Subject: Re: [Mailman-Users] Mailman / Sendmail horribly
slowperformance..suggestions please?!?

On Monday, August 25, 2003, at 04:22  pm, Andrew Falgout wrote:

I have not modified the default values in the SMTPDirect module or
Defaults.py.  If I could get qrunner to tell me anything about what
message(s) it is working I could tell you how many recipients were on
that
list.  Each list in the current system should be no more than 2-5
people.
We moved over a small subset of mailing lists from the older  
majordomo
server.  However, there are some lists with several hundres  
recipients
on it
that will be moving in the future when these issues get worked out.

Orphaned locks?
(ocean) root $ ls -lt
(~mailman/locks)
total 16
-rw-rw-r--2 mailman  mailman55 Aug 25  2003 qrunner.lock
-rw-rw-r--2 mailman  mailman55 Aug 25  2003
qrunner.lock.ocean.otr.usm.edu.16302
-rw-rw-r--2 mailman  mailman62 Aug 25  2003
gifted.studies.lock
-rw-rw-r--2 mailman  mailman62 Aug 25  2003
gifted.studies.lock.ocean.otr.usm.edu.16302
smtp log
Aug 25 10:16:47 2003 (16302) smtp for 1 recips, completed in 1.400
seconds
Aug 25 10:16:47 2003 (16302) smtp for 1 recips, completed in 3.799
seconds
Aug 25 10:16:49 2003 (16302) smtp for 1 recips, completed in 1.968
seconds
Aug 25 10:16:49 2003 (16302) smtp for 1 recips, completed in 1.990
seconds
I'm guessing this would mean that there are not any orphaned locks.
When I
restart qrunner by hand, I kill the proces and remove all the locks
from the
locks directory.
I thought the 3.799 seconds in the smtp log was qrunner telling you
how long
it took to send the mail through the MTA.
Mailman's qrunner process  each message through a pipeline.

Attached is a modified version of the file
$prefix/Mailman/Handlers/HandlerAPI.py
This logs extra information about the progress of each message through
the pipeline into the smtp log. This may give some insight as to where
the qrunner is spending its time.
The changes in the revised file are as follows:

/Users/richard/development/mailman-htdig/mailman-2.0.13/Mailman/ 
Handlers
bash-2.05a$ diff -u HandlerAPI.py.orig HandlerAPI.py
--- HandlerAPI.py.orig  Mon Aug 25 17:58:17 2003
+++ HandlerAPI.py   Mon Aug 25 18:06:01 2003
@@ -73,10 +73,12 @@



  def do_pipeline(mlist, msg, msgdata, pipeline):
+t0x = t1x = t2x = time.time()
  while pipeline:
  modname = pipeline.pop(0)
  mod = __import__('Mailman.Handlers.' + modname)
  func = getattr(getattr(getattr(mod, 'Handlers'), modname),
'process')
+t1x = time.time()
  try:
  pid = os.getpid()
  func(mlist, msg, msgdata)
@@ -138,6 +140,10 @@
  traceback.print_exc(file=s)
  syslog('error', s.getvalue())
  break
+t2x = time.time()
+syslog('smtp', 'do_pipeline: list %s, modname %s, startat
%.3f, durn %.3f' %
+   (mlist.internal_name(), modname, (t1x-t0x),  
(t2x-t1x)))
+t1x = t2x
  return pipeline

The revised files is:




--- 
-





About the best I can do to help.
- Original Message -
From: "Richard Barrett" <[EMAIL PROTECTED]>
To: "Andrew Falgout" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Monday, August 25, 2003 9:52 AM
Subject: Re: [Mailman-Users] Mailman / Sendmail horribly
slowperformance..suggestions please?!?

On Monday, August 25, 2003, at 02:28  pm, Andrew Falgout wrote:

The error log is at 0 bytes.  I have not overridden the Defaults.py
which
shows Defaults.py:MTA_ALIASES_STYLE = 'sendmail'.  So I assume that
means
that sendmail is the MTA.
You will be using SMTP to the local MTA (which you say is Sendmail)  
if
the following MM config variables are set as follows in either
mm_cfg.py or the Defaults.py file. The following are th standard MM
2.0.13 default values:

DELIVERY_MODULE = 'SMTPDirect'
SMTPHOST = 'localhost'
Also are the following default values still in place:

SMTP_MAX_RCPTS = 500
MAX_DELIVERY_THREADS = 0
Aug 25 08:08:31 2003 (27798) smtp for 2 recips, completed in 1.045
seconds
Aug 25 08:08:31 2003 (27798) 

Re: [Mailman-Users] Mailman / Sendmail horriblyslowperformance..suggestions please?!?

2003-08-25 Thread Andrew Falgout
I have not modified the default values in the SMTPDirect module or
Defaults.py.  If I could get qrunner to tell me anything about what
message(s) it is working I could tell you how many recipients were on that
list.  Each list in the current system should be no more than 2-5 people.
We moved over a small subset of mailing lists from the older majordomo
server.  However, there are some lists with several hundres recipients on it
that will be moving in the future when these issues get worked out.

Orphaned locks?
(ocean) root $ ls -lt
(~mailman/locks)
total 16
-rw-rw-r--2 mailman  mailman55 Aug 25  2003 qrunner.lock
-rw-rw-r--2 mailman  mailman55 Aug 25  2003
qrunner.lock.ocean.otr.usm.edu.16302
-rw-rw-r--2 mailman  mailman62 Aug 25  2003 gifted.studies.lock
-rw-rw-r--2 mailman  mailman62 Aug 25  2003
gifted.studies.lock.ocean.otr.usm.edu.16302

smtp log
Aug 25 10:16:47 2003 (16302) smtp for 1 recips, completed in 1.400 seconds
Aug 25 10:16:47 2003 (16302) smtp for 1 recips, completed in 3.799 seconds
Aug 25 10:16:49 2003 (16302) smtp for 1 recips, completed in 1.968 seconds
Aug 25 10:16:49 2003 (16302) smtp for 1 recips, completed in 1.990 seconds

I'm guessing this would mean that there are not any orphaned locks.  When I
restart qrunner by hand, I kill the proces and remove all the locks from the
locks directory.

I thought the 3.799 seconds in the smtp log was qrunner telling you how long
it took to send the mail through the MTA.

- Original Message - 
From: "Richard Barrett" <[EMAIL PROTECTED]>
To: "Andrew Falgout" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Monday, August 25, 2003 9:52 AM
Subject: Re: [Mailman-Users] Mailman / Sendmail horribly
slowperformance..suggestions please?!?


>
> On Monday, August 25, 2003, at 02:28  pm, Andrew Falgout wrote:
>
> > The error log is at 0 bytes.  I have not overridden the Defaults.py
> > which
> > shows Defaults.py:MTA_ALIASES_STYLE = 'sendmail'.  So I assume that
> > means
> > that sendmail is the MTA.
> You will be using SMTP to the local MTA (which you say is Sendmail) if
> the following MM config variables are set as follows in either
> mm_cfg.py or the Defaults.py file. The following are th standard MM
> 2.0.13 default values:
>
> DELIVERY_MODULE = 'SMTPDirect'
> SMTPHOST = 'localhost'
>
> Also are the following default values still in place:
>
> SMTP_MAX_RCPTS = 500
> MAX_DELIVERY_THREADS = 0
>
> >
> > Aug 25 08:08:31 2003 (27798) smtp for 2 recips, completed in 1.045
> > seconds
> > Aug 25 08:08:31 2003 (27798) smtp for 2 recips, completed in 1.149
> > seconds
> > Aug 25 08:12:47 2003 (27798) smtp for 1 recips, completed in 0.208
> > seconds
> > Aug 25 08:12:47 2003 (27798) smtp for 1 recips, completed in 0.241
> > seconds
> > Aug 25 08:14:25 2003 (31184) smtp for 1 recips, completed in 0.701
> > seconds
> > Aug 25 08:14:25 2003 (31184) smtp for 1 recips, completed in 0.716
> > seconds
> > Aug 25 08:17:21 2003 (31184) smtp for 1 recips, completed in 0.167
> > seconds
> > Aug 25 08:17:21 2003 (31184) smtp for 1 recips, completed in 0.179
> > seconds
> > Aug 25 08:19:34 2003 (31184) smtp for 1 recips, completed in 3.962
> > seconds
> > Aug 25 08:19:34 2003 (31184) smtp for 1 recips, completed in 4.234
> > seconds
> >
> > This is just a few of the lines in my /var/log/mailman/smtp file.
> > There are
> > a few bounces from time to time, but not on great frequency.  I did
> > send an
> > email to a test list, and attempted to time when I would see my email
> > address in the post log, but after 10 minutes I have not seen it.  This
> > really confuses me, because the mail system on this server delivers
> > mail
> > directly to a mailbox in seconds.  Why the delay?
> >
> I am going to assume that you had not made the edit to
> $prefix/Mailman/Handlers/SMTPDirect.py I suggested in one of my
> response and that these log entries are the regular ones?
>
> Bear in mind that the number in brackets in these log entries - (27798)
> for instance - is the process id of the qurnner handling the message
> and the same qrunner is handling multiple messages.
>
> The smtp log entries are interesting if you have a large backlog of
> messages in the qfiles directory because:
>
> 1. only a few recipients are being delivered for each log entry. Are
> your mailing list that sparely populated for subscribers? Do these smtp
> log entries tie in with post entries for lists with few members and
> /var/log/mail entries for outgoing mail through the MTA?
>
> 2. quite long periods - 2, 3, 4 minutes in the sample log entries you
> give - elapse between log entries. But the log entries are for the same
> process so it has been continuing to run during these pregnant pauses.
> But what is it doing?
>
> Some thoughts. Mailman's qrunner can get slow if it is:
>
> a. archiving and
>
> b. the messages for a list are large messages and/or
>
> c.  the number of messages in the current archive of a list is large.
>
> On the other hand,

Re: [Mailman-Users] Mailman / Sendmail horriblyslowperformance..suggestions please?!?

2003-08-25 Thread Andrew Falgout
I've included some of the diagnostics that was requested.  As of this moment
there are 7423 messages in the qfiles directory for mailman.  And it
continues to grow.  Please let me know if my thinking is flawed on how mails
flow through the system.

1. Message to a list hits sendmail initially and goes into the mqueue
directory for sendmail for processing.
2. Sendmail delivers the message to the mailman wrapper which places the
message in the qfiles directory for qrunner to process.
3. qrunner picks up these messages one by one, and delivers the message to
each of the recpts listed for the list.

How does qrunner decide which email to process?  Is there a way to have it
process a certain message in the qfiles directory?  Are there any utilities
for determining what is in the queue or why it was queued by qrunner.

I really do appreciate all the help.

Yes.. I am using sendmail as my MTA.
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
220 ocean.otr.usm.edu ESMTP Sendmail 8.11.6/8.11.6; Mon, 25 Aug 2003
09:21:56 -0500

(ocean) root $ time host python.org
DNS lookups are almost instantaneous.
python.org has address 194.109.137.226
host python.org  0.04s user 0.01s system 97% cpu 0.051 total

(ocean) root $ dig -t python.org
(/var/spool/mailman/qfiles)
;; Warning, ignoring invalid type python.org

; <<>> DiG 9.2.1 <<>> -t python.org
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 19709
;; flags: qr rd ra; QUERY: 1, ANSWER: 13, AUTHORITY: 0, ADDITIONAL: 13

;; QUESTION SECTION:
;.  IN  NS

;; ANSWER SECTION:
.   5402IN  NS  G.ROOT-SERVERS.NET.
.   5402IN  NS  H.ROOT-SERVERS.NET.
.   5402IN  NS  I.ROOT-SERVERS.NET.
.   5402IN  NS  J.ROOT-SERVERS.NET.
.   5402IN  NS  K.ROOT-SERVERS.NET.
.   5402IN  NS  L.ROOT-SERVERS.NET.
.   5402IN  NS  M.ROOT-SERVERS.NET.
.   5402IN  NS  A.ROOT-SERVERS.NET.
.   5402IN  NS  B.ROOT-SERVERS.NET.
.   5402IN  NS  C.ROOT-SERVERS.NET.
.   5402IN  NS  D.ROOT-SERVERS.NET.
.   5402IN  NS  E.ROOT-SERVERS.NET.
.   5402IN  NS  F.ROOT-SERVERS.NET.

;; ADDITIONAL SECTION:
A.ROOT-SERVERS.NET. 600537  IN  A   198.41.0.4
B.ROOT-SERVERS.NET. 600537  IN  A   128.9.0.107
C.ROOT-SERVERS.NET. 510409  IN  A   192.33.4.12
D.ROOT-SERVERS.NET. 510409  IN  A   128.8.10.90
E.ROOT-SERVERS.NET. 602151  IN  A   192.203.230.10
F.ROOT-SERVERS.NET. 510409  IN  A   192.5.5.241
G.ROOT-SERVERS.NET. 602151  IN  A   192.112.36.4
H.ROOT-SERVERS.NET. 602151  IN  A   128.63.2.53
I.ROOT-SERVERS.NET. 510408  IN  A   192.36.148.17
J.ROOT-SERVERS.NET. 602151  IN  A   192.58.128.30
K.ROOT-SERVERS.NET. 510408  IN  A   193.0.14.129
L.ROOT-SERVERS.NET. 510408  IN  A   198.32.64.12
M.ROOT-SERVERS.NET. 510409  IN  A   202.12.27.33

;; Query time: 3 msec
;; SERVER: 131.95.84.2#53(131.95.84.2)
;; WHEN: Mon Aug 25 09:24:48 2003
;; MSG SIZE  rcvd: 436



- Original Message - 
From: "Jon Carnes" <[EMAIL PROTECTED]>
To: "Andrew Falgout" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>; "Richard Barrett" <[EMAIL PROTECTED]>
Sent: Monday, August 25, 2003 9:09 AM
Subject: Re: [Mailman-Users] Mailman / Sendmail horribly
slowperformance..suggestions please?!?


> The description of the problem fits right in with the primary DNS
> resolver not working and the server having to fall back to the secondary
> for each look up.
>
> That might not be your problem, but in anycase, you should run a
> local-caching Nameserver on your mailserver. Then check to see how long
> it takes to do dns lookups:
>   host -t mx some.host.com
>   dig -t mx some.host.com
>
> The "dig" command will tell you how long it took to get the dns
> information.  The "host" command will simply give you the information
> you requested - which in this case is a listing of the Mailservers for
> the domain some.host.com
>
> You might also want to tweak your sendmail configuration:
> http://www.trilug.org/~jonc/mailserver/PartIII.html
>
> BTW: the many MTA's use the "sendmail aliases style", the fact that
> Mailman is using it to communicate with your MTA does not insure that
> you are using Sendmail as your MTA.
>
> To find out which MTA you are using:
>   telnet localhost 25
>
> Read the header it displays.  It should indicate what service is acting
> as your MTA. As an example, here is what I get:
>  220 Anncons4.nc.rr.com ESMTP Sendmail 8.12.8/8.12.8; Mon, 25 Aug 2003
> 10:07:09 -0400
>
> As you can see, I'm run