Hello world,

if the host running my Mailman installation is experiencing a high load (i.e., high CPU usage, many CPU cycles waiting for I/O), posts to some lists are sometimes sent out with about 1 in 300 subscribers missing.

First of all, I verified that the recipients who didn't receive posts didn't have mail delivery suspended - they didn't. Then I looked into the Mailman logs, which contain the following entries:

#v+
/var/log/mailman/post:
Jun 19 11:13:07 2008 (3760) post to listname from [EMAIL PROTECTED] size=42101, message-id=<[EMAIL PROTECTED]>, success
#v-

#v+
/var/log/mailman/smtp:
Jun 19 11:13:09 2008 (3760) <[EMAIL PROTECTED]> smtp to listname for 226 recips, completed in 1.179 seconds
#v-

Notice the 226 recipients and the long time it takes to submit the messages (1.179 seconds) - as I said, the system is somewhat "congested" at that time.

Let's have a look at the corresponding Postfix log entries:

#v+
Jun 19 11:13:07 mout03 postfix/smtpd[7065]: connect from localhost[127.0.0.1]
Jun 19 11:13:07 mout03 postfix/smtpd[7065]: B251578003: client=localhost[127.0.0.1]
Jun 19 11:13:07 mout03 postfix/cleanup[7062]: B251578003: message-id=<[EMAIL PROTECTED]>
Jun 19 11:13:07 mout03 postfix/qmgr[3962]: B251578003: from=<[EMAIL PROTECTED]>, size=42602, nrcpt=22 (queue active)
Jun 19 11:13:07 mout03 postfix/smtpd[7065]: B630678004: client=localhost[127.0.0.1]
Jun 19 11:13:07 mout03 postfix/cleanup[7062]: B630678004: message-id=<[EMAIL PROTECTED]>
Jun 19 11:13:07 mout03 postfix/qmgr[3962]: B630678004: from=<[EMAIL PROTECTED]>, size=42627, nrcpt=26 (queue active)
Jun 19 11:13:07 mout03 postfix/smtpd[7065]: C041B78005: client=localhost[127.0.0.1]
Jun 19 11:13:07 mout03 postfix/cleanup[7062]: C041B78005: message-id=<[EMAIL PROTECTED]>
Jun 19 11:13:08 mout03 postfix/qmgr[3962]: C041B78005: from=<[EMAIL PROTECTED]>, size=42419, nrcpt=19 (queue active)
Jun 19 11:13:08 mout03 postfix/smtpd[7065]: 2DD0D78003: client=localhost[127.0.0.1]
Jun 19 11:13:08 mout03 postfix/cleanup[7062]: 2DD0D78003: message-id=<[EMAIL PROTECTED]>
Jun 19 11:13:08 mout03 postfix/qmgr[3962]: 2DD0D78003: from=<[EMAIL PROTECTED]>, size=42153, nrcpt=158 (queue active)
Jun 19 11:13:09 mout03 postfix/smtpd[7065]: disconnect from localhost[127.0.0.1]
#v-

Now, 22+26+19+158 equals 225 and not 226 - no rejected mails, no NOQUEUE entries. Either Postfix or Mailman is lying. How can I find out which one it is, aside from running ngrep/tcpdump? Which additional configuration data do I have to provide to help debug this remotely?

Ciao
Stefan

--
Stefan Förster http://www.incertum.net/ Public Key: 0xBBE2A9E9
FdI #68: WWW - World Wide Waiting
------------------------------------------------------
Mailman-Users mailing list
Mailman-Users@python.org
http://mail.python.org/mailman/listinfo/mailman-users
Mailman FAQ: http://www.python.org/cgi-bin/faqw-mm.py
Searchable Archives: http://www.mail-archive.com/mailman-users%40python.org/
Unsubscribe: http://mail.python.org/mailman/options/mailman-users/archive%40jab.org
Security Policy: http://www.python.org/cgi-bin/faqw-mm.py?req=show&file=faq01.027.htp
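
The recipient arithmetic in the post above (22+26+19+158 versus 226) could also be checked mechanically instead of by hand. What follows is only a minimal Python sketch under stated assumptions: the log lines are the ones quoted above, the regular expressions are guesses based on exactly those lines, and the function names `postfix_recipients` and `mailman_recipients` are hypothetical helpers, not part of Mailman or Postfix. It only counts what the logs report; it cannot tell which program dropped the recipient.

```python
import re

# Sample data: the qmgr lines and the Mailman smtp line quoted in the post.
POSTFIX_LOG = """\
Jun 19 11:13:07 mout03 postfix/qmgr[3962]: B251578003: from=<[EMAIL PROTECTED]>, size=42602, nrcpt=22 (queue active)
Jun 19 11:13:07 mout03 postfix/qmgr[3962]: B630678004: from=<[EMAIL PROTECTED]>, size=42627, nrcpt=26 (queue active)
Jun 19 11:13:08 mout03 postfix/qmgr[3962]: C041B78005: from=<[EMAIL PROTECTED]>, size=42419, nrcpt=19 (queue active)
Jun 19 11:13:08 mout03 postfix/qmgr[3962]: 2DD0D78003: from=<[EMAIL PROTECTED]>, size=42153, nrcpt=158 (queue active)
"""
MAILMAN_SMTP_LINE = (
    "Jun 19 11:13:09 2008 (3760) <[EMAIL PROTECTED]> smtp to listname "
    "for 226 recips, completed in 1.179 seconds"
)

# Assumed patterns, derived only from the lines shown above.
NRCPT_RE = re.compile(r"postfix/qmgr\[\d+\]: ([0-9A-F]+): from=<[^>]*>, size=\d+, nrcpt=(\d+)")
RECIPS_RE = re.compile(r"smtp to (\S+) for (\d+) recips")


def postfix_recipients(log_text):
    """Return a list of (queue_id, nrcpt) pairs, one per matching qmgr line.

    A list is used rather than a dict because a queue ID may show up more
    than once in a longer log (for example, if a message is re-activated).
    """
    return [(m.group(1), int(m.group(2))) for m in NRCPT_RE.finditer(log_text)]


def mailman_recipients(line):
    """Return the recipient count Mailman reports in one smtp log line."""
    m = RECIPS_RE.search(line)
    if m is None:
        raise ValueError("not a Mailman smtp log line: %r" % line)
    return int(m.group(2))


if __name__ == "__main__":
    per_message = postfix_recipients(POSTFIX_LOG)
    accepted = sum(n for _, n in per_message)
    claimed = mailman_recipients(MAILMAN_SMTP_LINE)
    for queue_id, n in per_message:
        print("%-12s nrcpt=%d" % (queue_id, n))
    print("Postfix total: %d, Mailman claims: %d, difference: %d"
          % (accepted, claimed, claimed - accepted))
```

On a real system the two strings would be replaced by the relevant slices of the mail log and /var/log/mailman/smtp, restricted to the smtpd session in question; matching sessions by timestamp and client is left out of this sketch.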