Re: Huge active queue and system idle, not delivering

2010-01-11 Thread Wietse Venema

Patrick Chemla:
Wietse:
  OK, so you can turn back on that connection caching. Note that
  qmail creates and destroys two processes per SMTP session, so
  reusing a session is also a win from a CPU resource point of view.

Patrick:
 If I do so, will postfix open more than one connexion to each qmail for 
 parallel deliveries?

Of course. Connection caching is a performance IMPROVEMENT feature.

However, some qmail implementations are patched and turn on
TARPIT delays when the client sends many messages or recipients
over the same SMTP connection.

Wietse

Re: Huge active queue and system idle, not delivering

2010-01-10 Thread Patrick Chemla


Wietse,

Please try the following, as asked half a week ago:

 postconf -e smtp_connection_cache_on_demand=no
 postfix reload

and report if this makes a difference.
Wietse
 

I have tested this since yesterday night.

I got some problems with Linux per user number of processes limit. I 
fixed it. I also increased some delivery concurrency  figures, and now I 
can see up to 1300 processes delivering emails to the qmail servers.
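
For readers who hit the same per-user process limit, here is a minimal Python sketch for inspecting it from inside a process. It assumes a Linux host where `resource.RLIMIT_NPROC` is available. How the limit was actually raised on this system is not stated in the thread, and the helper names are illustrative.

```python
import resource


def nproc_limit():
    """Return the (soft, hard) per-user process limit of the calling process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return soft, hard


def describe(value):
    # RLIM_INFINITY means that no limit is enforced.
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


soft, hard = nproc_limit()
print(f"max user processes: soft={describe(soft)} hard={describe(hard)}")
```

The check only tells you something if it runs under the same account as the delivery processes, since the limit is per user.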


Today I had a burst of a few minutes at a rate of 6300 emails per minute, 
and I ran a full hour at 180,000 emails per hour. The outbound line was 
saturated.


CPU is about 30% loaded, no Wait I/O, no swap, memory is large.

I think I will reach about 600,000 emails per hour if I fix some timeout 
on the qmails (replace by postfix?). Maybe I could reach 1 million?


The full architecture that I plan will include 2 to 3 clustered postfix 
relays and 50 2nd level qmails(or postfix) delivery servers, each with 3 
to 5 IP addresses, and upgraded outbound internet connection.


With your help, I now better understand the impact of the timeout and 
concurrency parameters. In fact, delivery was blocked because Postfix 
was trying to reuse connections, so it waited for each email to complete 
before sending the next one. Also, because hundreds of processes were 
created at start time to handle inbound messages, there were no slots 
left to fork processes to deliver messages. The same problem caused the 
very slow DNS and EHLO, because no slots were available to fork.


Of course, if you want me to post my conf, I will with pleasure.

Many thanks to you, to Victor and Stan.

Patrick

Re: Huge active queue and system idle, not delivering

2010-01-10 Thread Stan Hoeppner

Patrick Chemla put forth on 1/10/2010 3:00 PM:
 Wietse,
 Please try the following, as asked half a week ago:

  postconf -e smtp_connection_cache_on_demand=no
  postfix reload

 and report if this makes a difference.
 Wietse
  
 I have tested this since yesterday night.
 
 I got some problems with Linux per user number of processes limit. I
 fixed it. I also increased some delivery concurrency  figures, and now I
 can see up to 1300 processes delivering emails to the qmail servers.
 
 I had a few minutes shot today at a rate of 6300 emails per minute. I
 ran a full hour at 180,000 emails per hour. The outbound line was
 saturated.
 
 CPU is about 30% loaded, no Wait I/O, no swap, memory is large.
 
 I think I will reach about 600,000 emails per hour if I fix some timeout
 on the qmails (replace by postfix?). Maybe I could reach 1 million?
 
 The full architecture that I plan will include 2 to 3 clustered postfix
 relays and 50 2nd level qmails(or postfix) delivery servers, each with 3
 to 5 IP addresses, and upgraded outbound internet connection.
 
 With your help, I better understand now the impact of timeout and
 concurrency parameters. In fact, delivery was blocked because postfix
 was trying to reuse connections, so was waiting each email to complete
 to send the next one. Also, because hundreds processes were created at
 start time to manage inbound messages, there were no slots to fork
 processes to deliver messages on the other hand. Same problem caused
 very slow DNS and EHLO, because no available slots to fork.
 
 Of course, if you want me to post my conf, I will with pleasure.
 
 Many thanks to you, to Victor and Stan.
 
 Patrick

On a technical level I'm happy you got it working.  Just please tell us you're
not sending mass spam with this setup.

--
Stan

Re: Huge active queue and system idle, not delivering

2010-01-10 Thread Wietse Venema

Patrick Chemla:
 Wietse,
  Please try the following, as asked half a week ago:
 
   postconf -e smtp_connection_cache_on_demand=no
   postfix reload
 
  and report if this makes a difference.
 Wietse
   
 I have tested this since yesterday night.
 
 I got some problems with Linux per user number of processes limit. I 
 fixed it. I also increased some delivery concurrency  figures, and now I 
 can see up to 1300 processes delivering emails to the qmail servers.
 
 I had a few minutes shot today at a rate of 6300 emails per minute. I 
 ran a full hour at 180,000 emails per hour. The outbound line was saturated.
 
 CPU is about 30% loaded, no Wait I/O, no swap, memory is large.
 
 I think I will reach about 600,000 emails per hour if I fix some timeout 
 on the qmails (replace by postfix?). Maybe I could reach 1 million?

OK, so you can turn back on that connection caching. Note that
qmail creates and destroys two processes per SMTP session, so
reusing a session is also a win from a CPU resource point of view.

1M/hour, or less than 300/s, should be possible (you mention the
queue is on a solid-state disk). Barring brain damage such as
synchronous syslogging by default on some Linux boxes, borked DNS,
process/file/etc. resource limits, etc.
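
The rates mentioned in this thread are easier to compare in the same unit. A small Python sketch of the arithmetic follows; it uses only the figures quoted above (6300/minute, 180,000/hour, 600,000/hour, 1M/hour), and the function names are illustrative.

```python
SECONDS_PER_HOUR = 3600
MINUTES_PER_HOUR = 60


def per_second_from_hourly(per_hour):
    """Convert a messages-per-hour rate to messages per second."""
    return per_hour / SECONDS_PER_HOUR


def per_hour_from_minutely(per_minute):
    """Convert a messages-per-minute rate to messages per hour."""
    return per_minute * MINUTES_PER_HOUR


for label, per_hour in [
    ("sustained hour", 180_000),
    ("short burst", per_hour_from_minutely(6300)),
    ("next goal", 600_000),
    ("1M target", 1_000_000),
]:
    rate = per_second_from_hourly(per_hour)
    print(f"{label}: {per_hour}/hour = {rate:.1f}/s")
```

So the 1M/hour target is about 278 messages per second, which is the "less than 300/s" figure, and the short burst at 6300/minute corresponds to 378,000/hour.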

Perhaps this is a good time to mention that SecurityFocus was
ezmlm-qmail based, and that they switched to Postfix for outbound
deliveries, because qmail simply could not keep up with the volume:

inbound mail - qmail - ezmlm - multiple postfix MTAs - internet

That was 2001 when I added QMQP support to Postfix, and this is
still what they appear to be using now, if I must believe their
own Received:  message headers.

Received: from lists2.securityfocus.com (lists2.securityfocus.com 
[205.206.231.20])
by outgoing2.securityfocus.com (Postfix) with QMQP
id 8AC0814370A; Thu,  7 Jan 2010 14:11:35 -0700 (MST)

My very first qmail/Postfix benchmarks showed that qmail was up to
three times slower as a transit MTA, simply because qmail creates
three queue files where Postfix creates one. Creating/deleting
files involves more disk access operations than reading/writing
files, and that hurts especially with small email messages.

Wietse

Re: Huge active queue and system idle, not delivering

2010-01-10 Thread Patrick Chemla


On 10/01/2010 23:58, Stan Hoeppner wrote:

On a technical level I'm happy you got it working.  Just please tell us you're
not sending mass spam with this setup.

--
Stan
   


I have to do it for a customer who sends, as he said, only opt-in mass 
emails. He has a big database of blacklisted addresses where he keeps 
all unsubscribe requests. He said he has the right filters to avoid 
sending unwanted emails.




Thanks
Patrick

Re: Huge active queue and system idle, not delivering

2010-01-10 Thread Patrick Chemla


On 11/01/2010 01:13, Wietse Venema wrote:

Patrick Chemla:
   

Wietse,
 

Please try the following, as asked half a week ago:

  postconf -e smtp_connection_cache_on_demand=no
  postfix reload

and report if this makes a difference.
Wietse

 

I have tested this since yesterday night.

I got some problems with Linux per user number of processes limit. I
fixed it. I also increased some delivery concurrency  figures, and now I
can see up to 1300 processes delivering emails to the qmail servers.

I had a few minutes shot today at a rate of 6300 emails per minute. I
ran a full hour at 180,000 emails per hour. The outbound line was saturated.

CPU is about 30% loaded, no Wait I/O, no swap, memory is large.

I think I will reach about 600,000 emails per hour if I fix some timeout
on the qmails (replace by postfix?). Maybe I could reach 1 million?
 

OK, so you can turn back on that connection caching. Note that
qmail creates and destroys two processes per SMTP session, so
reusing a session is also a win from a CPU resource point of view.



Wietse
   
If I do so, will Postfix open more than one connection to each qmail 
server for parallel deliveries?
I am afraid that if we use connection caching, this will create a single 
queue on each qmail server. As long as I have resources available, I 
think I prefer parallel deliveries.


Patrick

Re: Huge active queue and system idle, not delivering

2010-01-10 Thread Stan Hoeppner

Patrick Chemla put forth on 1/11/2010 1:02 AM:
 On 10/01/2010 23:58, Stan Hoeppner wrote:
 On a technical level I'm happy you got it working.  Just please tell
 us you're
 not sending mass spam with this setup.

 -- 
 Stan

 
 I have to do it for a customer who send as he said, only opt-in mass
 emails. He has a big blacklisted email database where he keeps all
 unsubscribe messages. He said he has the right filters not to send
 unwanted emails.

Sigh...  This doesn't pass the sniff test.  I fear we've helped enable the
sending of mass UBE.  Patrick would you mind providing the IP netblock(s) you
will be sending these mass mailings from?  Or provide them to me off list
please?  Thanks.

--
Stan

Re: Huge active queue and system idle, not delivering

2010-01-10 Thread Patrick Chemla


On 11/01/2010 09:27, Stan Hoeppner wrote:

Patrick Chemla put forth on 1/11/2010 1:02 AM:
   

On 10/01/2010 23:58, Stan Hoeppner wrote:
 

On a technical level I'm happy you got it working.  Just please tell
us you're
not sending mass spam with this setup.

--
Stan

   

I have to do it for a customer who send as he said, only opt-in mass
emails. He has a big blacklisted email database where he keeps all
unsubscribe messages. He said he has the right filters not to send
unwanted emails.
 

Sigh...  This doesn't pass the sniff test.  I fear we've helped enable the
sending of mass UBE.  Patrick would you mind providing the IP netblock(s) you
will be sending these mass mailings from?  Or provide them to me off list
please?  Thanks.

--
Stan
   
Don't be afraid, Stan. They work only on the French market, and perhaps 
also with French people who have a mailbox overseas. There is a very, 
very low chance that you will be affected.

Patrick

Re: Huge active queue and system idle, not delivering

2010-01-09 Thread Patrick Chemla


Hi,

I will try all your advice, but something is still very strange to me:

The Postfix logs show that the EHLO step is very slow through 
Postfix, but it is very fast by hand. I have even recorded it with 
tcpdump/Wireshark, and I can see that messages are sent very 
quickly, in about 1 second.


But messages are still sent at a rate of a dozen every 10 seconds. That 
means that messages are sent one by one.


Whether the connections to the qmail servers are slow, or the qmail 
servers are misconfigured, too slow, or anything else, when I run 
netstat -apn |grep :25 I see only a few connections from the Postfix 
server to the qmail servers. Even if DNS+EHLO is slow, and all the more 
so because DNS+EHLO seems to be slow, why don't I see hundreds of TCP 
connections in the ESTABLISHED state?
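
A rough way to turn that netstat check into per-server numbers is sketched below in Python. It assumes the usual Linux `netstat -apn` column layout (proto, recv-q, send-q, local address, foreign address, state, pid/program); the function name is illustrative, and output from other platforms may need a different parser.

```python
from collections import Counter


def count_established(netstat_text, port=25):
    """Count ESTABLISHED outbound TCP connections per remote host on `port`."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Assumed columns: proto recv-q send-q local foreign state [pid/program]
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        foreign, state = fields[4], fields[5]
        host, _, remote_port = foreign.rpartition(":")
        if state == "ESTABLISHED" and remote_port == str(port):
            counts[host] += 1
    return counts
```

Feed it the captured text of `netstat -apn`. Matching on the foreign address keeps only connections from the Postfix machine to port 25 elsewhere and ignores inbound sessions to the local smtpd.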


I expected that Postfix would deliver to 30 qmail servers at the same 
time and would manage hundreds of parallel deliveries over hundreds of 
parallel connections. Is there some parameter or design rule that 
prevents it from doing so?


I expected that Postfix would either fill up its own CPU/memory creating 
these parallel delivery processes or wait for the qmail servers, but for 
all servers at the same time, on multiple connections to each one.


Am I correct? Or am I dreaming of another mail transport package?

Patrick

Re: Huge active queue and system idle, not delivering

2010-01-09 Thread Patrick Chemla


Hi all,

I got these statistics:

Jan  9 19:15:21 postfix postfix/scache[18038]: statistics: start 
interval Jan  9 19:09:03
Jan  9 19:15:21 postfix postfix/scache[18038]: statistics: domain lookup 
hits=110 miss=89 success=55%
Jan  9 19:15:21 postfix postfix/scache[18038]: statistics: address 
lookup hits=0 miss=2492 success=0%
Jan  9 19:15:21 postfix postfix/scache[18038]: statistics: max 
simultaneous domains=1 addresses=4 connection=4



What do miss=89 success=55% and miss=2492 success=0% mean?

Thanks
Patrick

Re: Huge active queue and system idle, not delivering

2010-01-09 Thread Stan Hoeppner

Patrick Chemla put forth on 1/9/2010 11:17 AM:
 Hi all,
 
 I got these statistics:
 
 Jan  9 19:15:21 postfix postfix/scache[18038]: statistics: start
 interval Jan  9 19:09:03
 Jan  9 19:15:21 postfix postfix/scache[18038]: statistics: domain lookup
 hits=110 miss=89 success=55%
 Jan  9 19:15:21 postfix postfix/scache[18038]: statistics: address
 lookup hits=0 miss=2492 success=0%
 Jan  9 19:15:21 postfix postfix/scache[18038]: statistics: max
 simultaneous domains=1 addresses=4 connection=4
 
 
 What means miss=89 success=55%, miss=2492 success=0%?

http://www.postfix.com/CONNECTION_CACHE_README.html

--
Stan

Re: Huge active queue and system idle, not delivering

2010-01-09 Thread Patrick Chemla


Hi Stan,

Thanks for your interest.

On 09/01/2010 20:21, Stan Hoeppner wrote:

Patrick Chemla put forth on 1/9/2010 11:17 AM:
   

Hi all,

I got these statistics:

Jan  9 19:15:21 postfix postfix/scache[18038]: statistics: start
interval Jan  9 19:09:03
Jan  9 19:15:21 postfix postfix/scache[18038]: statistics: domain lookup
hits=110 miss=89 success=55%
Jan  9 19:15:21 postfix postfix/scache[18038]: statistics: address
lookup hits=0 miss=2492 success=0%
Jan  9 19:15:21 postfix postfix/scache[18038]: statistics: max
simultaneous domains=1 addresses=4 connection=4


What means miss=89 success=55%, miss=2492 success=0%?
 

http://www.postfix.com/CONNECTION_CACHE_README.html

   
I went there but did not find an explanation of missed address lookups 
or missed domain lookups.
While I have 122,000 messages in the active queue, I still don't 
understand why the statistics show max simultaneous domains=1. It should 
be dozens, or hundreds.


Patrick


--
Stan

Re: Huge active queue and system idle, not delivering

2010-01-09 Thread Stan Hoeppner

Patrick Chemla put forth on 1/9/2010 11:07 AM:
 Hi,
 
 I will try all your advises, but something still very strange for me:
 
 We see that postfix logs show that ehlo process is very slow through
 postfix but very fast by hand. Even I have recorded through
 tcpdump/WireShark and I can see that messages are sent very very very
 quickly in about 1 second.
 
 But still messages are sent at a rate of a dozen in 10 seconds. That
 means that messages are sent 1 by one.
 
 If connexion to qmail servers are slow, or if qmails are mis-parameted,
 too slow or anything else, When I do netstat -apn |grep :25 I get only a
 few connexions from postfix server to qmail servers. Even if DNS+EHLO
 are slow, and more, because DNS+EHLO seem to be slow, why I don't see
 hundreds TCP connexions ESTABLISHED ?

This behavior is likely a result of the connection cache:
http://www.postfix.com/CONNECTION_CACHE_README.html

If one has a large amount of mail destined for a single host, it is inefficient
to open dozens or hundreds of TCP connections and SMTP connections due to the
additional overhead of process/thread count and memory consumption.  It is much
more efficient to pipeline all the mail through a single connection.  One can
only pump so many bits down the wire between two hosts.  If you can fill the
pipe to near capacity with one TCP/SMTP stream, why open 100s of connections to
do the same?  I believe this is why you are not seeing dozens or hundreds of TCP
connections.  Postfix is intelligently designed to avoid this inefficiency.

 I expected that postfix will deliver on 30 qmail servers at the same
 time, and should manage hundreds parallel deliveries, hundreds parallel
 connexions. Is there some parameter or some conception rule that refrain
 him to do so?
 
 I expected that postfix will full up his own CPU/memory creating these
 parallel delivery processes or/and will wait after the qmail servers,
 but on all servers at the same time, on multiple connections to each one.
 
 Am I correct ? or I am dreaming of another mail transport package?
 
 Patrick

As Victor and others have already stated:

1.  In your previous configuration, you had multiple thousands of unique IP
addresses (your customers) connecting directly to your 30 qmail servers to relay
their mail.  qmail performed fine with this configuration because no one qmail
server was seeing thousands of delivery attempts per minute from any one single
IP address.

2.  In your current Postfix configuration, your qmail servers are seeing a
single unique IP address attempting to send multiple thousands of messages per
minute, and qmail is reacting with rate limiting countermeasures because of 
this.

You need to figure out what settings in the qmail configuration are controlling
this rate throttling and in what way.  Once you find this and change it, you
should see a dramatic improvement in Postfix's ability to quickly move the mail
out of the queue to the 30 qmail servers, most likely using a single or only a
few TCP connections to each qmail server.

--
Stan

Re: Huge active queue and system idle, not delivering

2010-01-09 Thread Stan Hoeppner

Patrick Chemla put forth on 1/9/2010 12:37 PM:

 I went there but did not find explanations about miss address lookup or
 miss domain lookup.
 While I have 122,000 messages in active queue I still don't understand
 why statistics show max simultaneous domains=1. It should be dozens , or
 hundreds.

Those are statistics relating to scache performance.  It tells you how many
domains or addresses were able to be delivered via scache reuse.  I.e. how many
emails Postfix was able to send through an already open SMTP connection to a
given host.

Since all of your qmail hosts are configured identically, and should be able to
relay mail bound for any destination on the internet, you should never see
anything less than ~100% in those statistics, _unless_ there is some other kind
of problem.

If your qmail servers are rate limiting via any method, and Postfix is
attempting to send 2000 emails per minute down that one SMTP connection, when
qmail blocks individual deliveries for any reason, those scache failure
statistics will increase.
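
The logged percentages are consistent with success = hits / (hits + miss). A minimal Python sketch that parses the scache statistics lines quoted in this thread and recomputes the percentage follows. Truncating to a whole percent is an assumption (rounding gives the same result for these values), and the function names are illustrative.

```python
import re

STATS_RE = re.compile(
    r"(domain|address) lookup hits=(\d+) miss=(\d+) success=(\d+)%"
)


def success_percent(hits, miss):
    """Share of cache lookups that found a reusable connection, in percent."""
    total = hits + miss
    return 0 if total == 0 else int(100 * hits / total)


def parse_scache_stats(log_text):
    """Map lookup kind to (hits, miss, logged success %), tolerating wrapped lines."""
    flat = " ".join(log_text.split())  # undo mail-client line wrapping
    return {
        kind: (int(hits), int(miss), int(pct))
        for kind, hits, miss, pct in STATS_RE.findall(flat)
    }
```

For the records above this gives 110 / (110 + 89) = 55% for domain lookups and 0 / 2492 = 0% for address lookups.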

--
Stan

Re: Huge active queue and system idle, not delivering

2010-01-09 Thread Patrick Chemla


On 09/01/2010 20:54, Stan Hoeppner wrote:

Patrick Chemla put forth on 1/9/2010 12:37 PM:

   

I went there but did not find explanations about miss address lookup or
miss domain lookup.
While I have 122,000 messages in active queue I still don't understand
why statistics show max simultaneous domains=1. It should be dozens , or
hundreds.
 

Those are statistics relating to scache performance.  It tells you how many
domains or addresses were able to be delivered via scache reuse.  I.e. how many
emails Postfix was able to send through an already open SMTP connection to a
given host.

Since all of your qmail hosts are configured identically, and should be able to
relay mail bound for any destination on the internet, you should never see
anything less than ~100% in those statistics, _unless_ there is some other kind
of problem.

   


You mean 100% success?

If your qmail servers are rate limiting via any method, and Postfix is
attempting to send 2000 emails per minute down that one SMTP connection, when
qmail blocks individual deliveries for any reason, those scache failure
statistics will increase.

   
Before I set up the Postfix relay to load balance between 30 qmail 
servers, each of them was able to accept hundreds of thousands of 
emails in its own queue. Emails were sent in campaigns of thousands, 
balanced across 3 qmail servers, each one fully loaded in CPU/memory 
and working hard to deliver.


Instead of sending each campaign through only 3 qmail servers, I thought 
that by sending each campaign through 30 I would cut the load on each 
one by ten and speed up deliveries. But now Postfix is retaining the 
emails in its own queue, not pushing the queue down to the qmail servers.


The Postfix server and the qmail servers all have about 90% of CPU free. 
Only 1 to 9 connections exist at a time from Postfix to the qmail servers.


This is exactly what I would like to happen: instead of a queue of 
122,000 on Postfix, I expect each qmail server to have a queue of 4000.


The qmail servers did this before I set up Postfix.

Patrick


--
Stan

Re: Huge active queue and system idle, not delivering

2010-01-09 Thread Stan Hoeppner

Patrick Chemla put forth on 1/9/2010 1:08 PM:

 You mean 100% success?

Yes.

 Before I set up the postfix relay to load balance between 30 qmail
 servers, each of them was able to accept in his own queue hundreds
 thousands email. Email were sent by campaigns of thousands balanced on 3
 qmails servers, each one full in CPU/memory working hard to deliver.
 
 Instead of sending each campaign on only 3 qmails, I thought that by
 sending each campaign on 30 qmails I will cut each one load by ten and
 speed up deliveries. But now, postfix is retaining the emails in his own
 queue, not pushing the queue down to the qmails.

An admirable technical goal.  Can you elaborate on these campaigns?  You said
previously that you had hundreds of thousands of customers whose email you were
relaying, as if you are an ISP.  Now you are saying the mail load is generated
by campaigns.  What exactly are these campaigns?

 Postfix server and qmail servers are all about 90%cpu free. only 1 to 9
 connexions exist at a time from postfix to qmails.

This is because the qmail servers won't let the postfix server send any faster.
 We've been over this multiple times now.  Multiple people have told you the
same thing.  For this to work correctly, you need to figure out why the qmail
servers are rate limiting the postfix server deliveries.

 This is exactly what I would like to happen: Instead of a queue of
 122,000 on postfix, I expect to have each qmail with a queue of 4000.
 
 Qmails did this before I set up postfix.

All MTAs have unique performance characteristics.  You've changed one of the
MTAs in your architecture.  Now you must re-tune your qmail farm servers to work
with the new MTA, postfix, which you have introduced.

This is kinda IT 101 stuff.  You can't automatically assume the problem lies
with the new thing you introduced.  Often, the new thing exposes problems or
weaknesses that already existed in the old stuff.

--
Stan

Re: Huge active queue and system idle, not delivering

2010-01-09 Thread Wietse Venema

Patrick Chemla:
 Hi all,
 
 I got these statistics:
 
 Jan  9 19:15:21 postfix postfix/scache[18038]: statistics: start 
 interval Jan  9 19:09:03
 Jan  9 19:15:21 postfix postfix/scache[18038]: statistics: domain lookup 
 hits=110 miss=89 success=55%
 Jan  9 19:15:21 postfix postfix/scache[18038]: statistics: address 
 lookup hits=0 miss=2492 success=0%
 Jan  9 19:15:21 postfix postfix/scache[18038]: statistics: max 
 simultaneous domains=1 addresses=4 connection=4

Please try the following, as asked half a week ago:

postconf -e smtp_connection_cache_on_demand=no
postfix reload

and report if this makes a difference.

Wietse

Re: Huge active queue and system idle, not delivering

2010-01-09 Thread Wietse Venema

Wietse Venema:
 Patrick Chemla:
  Hi all,
  
  I got these statistics:
  
  Jan  9 19:15:21 postfix postfix/scache[18038]: statistics: start 
  interval Jan  9 19:09:03
  Jan  9 19:15:21 postfix postfix/scache[18038]: statistics: domain lookup 
  hits=110 miss=89 success=55%
  Jan  9 19:15:21 postfix postfix/scache[18038]: statistics: address 
  lookup hits=0 miss=2492 success=0%
  Jan  9 19:15:21 postfix postfix/scache[18038]: statistics: max 
  simultaneous domains=1 addresses=4 connection=4
 
 Please try the following, as asked half a week ago:
 
 postconf -e smtp_connection_cache_on_demand=no
 postfix reload
 
 and report if this makes a difference.

Oh, and please limit the discussion to people who understand the
hard technical internals of Postfix.  Other people please stay out
of the way.

Wietse

Re: Huge active queue and system idle, not delivering

2010-01-08 Thread Patrick Chemla


On 08/01/2010 03:03, Wietse Venema wrote:

Patrick Chemla:
   

But the CPU of the box is idle more than 80%. It is clear that it is not a
matter of CPU, nor memory, nor disk. Something in the number of
processes/users/simultaneous tasks is blocking.
 

Indeed, the symptom of blocking is in the third field of
the Postfix delays logging.

The format of the delays=a/b/c/d logging is as follows:

o  a = time from message arrival to last active queue entry

o  b = time from last active queue entry to connection setup

o  c = time in connection setup, including DNS, EHLO and TLS

o  d = time in message transmission

In your case, it takes a minute or more to set up the connection
including DNS lookup and EHLO handshake. That is holding up your mail.
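
To look at the delays fields across many log records, something like the following Python sketch can help. It follows the a/b/c/d description above; the dictionary keys are illustrative names rather than Postfix terminology, and the regular expression assumes records shaped like the ones quoted in this thread.

```python
import re

DELAYS_RE = re.compile(
    r"delay=([\d.]+), delays=([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+)"
)


def parse_delays(log_record):
    """Split a Postfix delivery record into its total delay and a/b/c/d parts."""
    match = DELAYS_RE.search(log_record)
    if match is None:
        return None
    total, a, b, c, d = (float(value) for value in match.groups())
    return {
        "total": total,
        "before_active_queue": a,  # a: arrival to last active queue entry
        "active_queue_wait": b,    # b: active queue entry to connection setup
        "connection_setup": c,     # c: DNS, EHLO and TLS
        "transmission": d,         # d: message transmission
    }
```

For a record containing `delay=61550, delays=17019/44435/96/0.17`, the four parts add up to the total delay, and the connection setup field is 96 seconds while the transmission itself takes 0.17 seconds.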

- Check if the qmail servers are responsive (telnet hostname 25).

   
The qmail servers are responsive. I made some changes to my DNS. DNS is 
better now, but connection setup is still very slow. This morning I saw 
c=285.

- Check if your Postfix needs a /var/spool/postfix/etc/resolv.conf
   file, and if that file is consistent with /etc/resolv.conf. If
   Postfix needs /var/spool/postfix/etc/resolv.conf and the file
   is missing or contains a bogus server, that will add time to
   your deliveries.
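
A minimal Python sketch of that consistency check follows. The two paths are the ones named above. Whether the copy under /var/spool/postfix is needed at all depends on whether the Postfix services run chrooted (the chroot column in master.cf), which this sketch does not inspect. The function names and return strings are illustrative.

```python
from pathlib import Path


def nameservers(path):
    """Return the nameserver entries of a resolv.conf file, or None if absent."""
    file = Path(path)
    if not file.is_file():
        return None
    servers = []
    for line in file.read_text().splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def compare_resolv_conf(system="/etc/resolv.conf",
                        chroot="/var/spool/postfix/etc/resolv.conf"):
    """Compare the system resolver configuration with the chroot copy."""
    chroot_servers = nameservers(chroot)
    if chroot_servers is None:
        return "chroot copy missing"
    if nameservers(system) != chroot_servers:
        return "mismatch"
    return "consistent"
```

A result of "chroot copy missing" is only a problem when the smtp service is actually chrooted.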

   

Hi Wietse,
How do I know if Postfix needs a /var/spool/postfix/etc/resolv.conf file?
The directory /var/spool/postfix/etc doesn't exist.



- If they aren't, increase the concurrency on the qmail side.

   

concurrency = 100. That is already a large number, but I can increase it.

Wietse
   

Thanks
Patrick

Re: Huge active queue and system idle, not delivering

2010-01-08 Thread Patrick Chemla


On 08/01/2010 00:43, Victor Duchovni wrote:

On Fri, Jan 08, 2010 at 12:30:34AM +0200, Patrick Chemla wrote:

   

Jan  7 22:02:57 postfix postfix/qmgr[26441]: 5B91F873F6: removed
Jan  7 22:02:57 postfix postfix/smtp[27180]: 375DDD5923:
to=lexoti...@gmail.com, relay=a139.localpc2105.com[10.0.0.139]:25,
conn_use=59, delay=61550, delays=17019/44435/96/0.17, dsn=2.0.0,
status=sent (250 ok 1262894577 qp 12113)
 

This recipient does not match the destination that is clogging the
queue. Is the queue clogged with postmaster notices? I never enable
any postmaster notices, they don't scale.

notify_classes =

   

done, no change.

This said, the 96 seconds of connection setup latency is an obvious and
severe problem. Why on earth does it take 96 seconds to complete a HELO
handshake with a139.localpc2105.com? You are not going to get much
mail out if each delivery takes 96 seconds...
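
The effect of a 96-second setup time on throughput can be estimated with simple arithmetic: the sustained rate is at most the number of concurrent deliveries divided by the time each delivery takes. A small Python sketch follows. The concurrency of 100 used in the example is a hypothetical figure for illustration, not a value taken from this system's configuration.

```python
def max_throughput(concurrency, seconds_per_delivery):
    """Upper bound on deliveries per second for a given number of parallel deliveries."""
    if seconds_per_delivery <= 0:
        raise ValueError("seconds_per_delivery must be positive")
    return concurrency / seconds_per_delivery


def required_concurrency(target_per_second, seconds_per_delivery):
    """Parallel deliveries needed to sustain a target rate."""
    return target_per_second * seconds_per_delivery


# Hypothetical: 100 parallel deliveries, each stuck for 96 s in connection setup.
print(f"{max_throughput(100, 96):.2f} deliveries/s")
# Parallel deliveries needed for 50/s (180,000/hour) at 96 s each.
print(required_concurrency(50, 96))
```

At 96 seconds per delivery, even 100 parallel deliveries yield only about one message per second, so the setup time has to come down before concurrency can help.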

Is your Postfix server's IP address resolvable on the qmail systems?
   

Should it be? qmail accepts every client on the local network as a relay client.

Are they doing some sort of pre-banner delay? ...


   

When I do  telnet a139.localpc2105.com 25, I get immediate response.

Jan  7 22:02:58 postfix postfix/smtp[27070]: 7F0F2943B3:
to=gpo...@wanadoo.fr, relay=a70.localpc2105.com[10.0.0.70]:25,
conn_use=10, delay=73795, delays=29264/44481/50/0.21, dsn=2.0.0,
status=sent (250 ok 1262894577 qp 23067)
 

Once again, 50 seconds is severely crippled.

   

When I telnet a70.localpc2105.com 25 I get an immediate response.

I have checked my local DNS. There were some problems, and I made some 
improvements. I now have 2 local caching DNS servers that respond fast. 
All qmail server addresses are in /etc/hosts on the Postfix machine to 
avoid IP lookups.
I have checked the qmail servers. Nothing has changed since they were 
able to hold a queue of 200,000 messages, but now they hold only a few 
hundred.
I have calculated the average time to complete HELO. All qmail servers 
are in the same range, around 2 minutes. None is better than the others. 
Again, each one was handling a queue of hundreds of thousands before I 
set up the Postfix relay to load balance.

I really don't have a clue. I don't know where to look.

Jan  7 22:02:58 postfix postfix/smtp[27050]: 32BB182182:
to=gmarin-jardins-lois...@wanadoo.fr,
relay=a139.localpc2105.com[10.0.0.139]:25, conn_use=48, delay=73799,
delays=29268/44466/65/0.28, dsn=2.0.0, status=sent (250 ok 1262894578 qp
12121)
 

This is enough. Fix this.

   

How can I fix it if it works fine through telnet?

Where are the deliveries to the clogged destination???

   

Sorry, I don't understand this question. Please be clear.

Patrick

Re: Huge active queue and system idle, not delivering

2010-01-08 Thread Wietse Venema

Patrick Chemla:
 On 08/01/2010 00:43, Victor Duchovni wrote:
  On Fri, Jan 08, 2010 at 12:30:34AM +0200, Patrick Chemla wrote:
 
 
  Jan  7 22:02:57 postfix postfix/qmgr[26441]: 5B91F873F6: removed
  Jan  7 22:02:57 postfix postfix/smtp[27180]: 375DDD5923:
  to=lexoti...@gmail.com, relay=a139.localpc2105.com[10.0.0.139]:25,
  conn_use=59, delay=61550, delays=17019/44435/96/0.17, dsn=2.0.0,
 ^^^

Note that this connection has been reused multiple times (see below
for what this means in Postfix).

Why does it take 96 seconds to initialize a reused SMTP connection?

What happens when you set

smtp_connection_cache_on_demand=no 

in main.cf (and do postfix reload)?

If this makes a difference, then

a) you have a problem with smtp-scache communication.

b) qmail does not like RSET commands

c) Your machine is running low on memory and swapping out the scache
process.

d) something else.

Wietse

Under high load, smtp(8) processes give their open connections to
scache(8). Later, they ask scache(8) for an open connection to a
specific destination.  Once an smtp(8) client retrieves an open
connection, it sends RSET to the remote server and waits for a 250
reply (i.e. the server is still happy). According to the logfile
record this lookup/rset/reply sequence is taking 96 seconds.

Re: Huge active queue and system idle, not delivering

2010-01-08 Thread Stan Hoeppner


On Fri, 08 Jan 2010 15:24:25 +0200, Patrick Chemla

 When I telnet a70.localpc2105.com 25 I get an immediate response.

I assume you are telnet'ing from the Postfix server with the queue delay
problem.  At this point, after you receive the 220, type:

ehlo your.postfix-server.tld <Enter>

and time the delay of the 250 responses.  Continue to do a complete manual
mail transaction through telnet, and time each smtp command completion
(wall clock is fine).  Post the results here please.
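
If timing by hand is awkward, a minimal Python sketch along these lines can time the banner, EHLO, RSET and QUIT replies. It stops short of MAIL FROM, RCPT TO and DATA so that it never submits mail. The default `helo_name` is a placeholder, and the function names are illustrative.

```python
import socket
import time


def read_reply(reader):
    """Read one complete SMTP reply from a binary file-like object.

    Continuation lines have '-' after the 3-digit code ('250-...');
    the final line has a space instead ('250 ...').
    """
    lines = []
    while True:
        raw = reader.readline()
        if not raw:
            raise ConnectionError("server closed the connection")
        line = raw.decode("ascii", "replace").rstrip("\r\n")
        lines.append(line)
        if len(line) < 4 or line[3] != "-":
            return lines


def time_smtp_steps(host, port=25, helo_name="client.example", timeout=300):
    """Return [(step, seconds, last reply line)] for connect, banner, EHLO, RSET, QUIT."""
    results = []
    started = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        results.append(("connect", time.monotonic() - started, ""))
        reader = sock.makefile("rb")
        steps = (("banner", None), ("ehlo", "EHLO " + helo_name),
                 ("rset", "RSET"), ("quit", "QUIT"))
        for step, command in steps:
            started = time.monotonic()
            if command is not None:
                sock.sendall(command.encode("ascii") + b"\r\n")
            reply = read_reply(reader)
            results.append((step, time.monotonic() - started, reply[-1]))
    return results
```

Each step is timed until the last line of a multi-line reply arrives, so a banner or EHLO response that trickles in slowly shows up in the numbers.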

--
Stan

Re: Huge active queue and system idle, not delivering

2010-01-08 Thread Wietse Venema

Wietse Venema:
 Patrick Chemla:
  On 08/01/2010 00:43, Victor Duchovni wrote:
   On Fri, Jan 08, 2010 at 12:30:34AM +0200, Patrick Chemla wrote:
  
  
   Jan  7 22:02:57 postfix postfix/qmgr[26441]: 5B91F873F6: removed
   Jan  7 22:02:57 postfix postfix/smtp[27180]: 375DDD5923:
   to=lexoti...@gmail.com, relay=a139.localpc2105.com[10.0.0.139]:25,
   conn_use=59, delay=61550, delays=17019/44435/96/0.17, dsn=2.0.0,
  ^^^
 
 Note that this connection has been reused multiple times (see below
 for what this means in Postfix).
 
 Why does it take 96 seconds to initialize a reused SMTP connection?
 
 What happens when you set
 
 smtp_connection_cache_on_demand=no 
 
 in main.cf (and do postfix reload)?
 
 If this makes a difference, then

Check your qmail configuration for tarpit options. There exist
qmail patches that will slow down the qmail SMTP server when the
client sends lots of email, or lots of recipients.

Wietse

 a) you have a problem with smtp-scache communication.
 
 b) qmail does not like RSET commands
 
 c) Your machine is running low on memory and swapping out the scache
 process.
 
 d) something else.
 
   Wietse
 
 Under high load, smtp(8) processes give their open connections to
 scache(8). Later, they ask scache(8) for an open connection to a
 specific destination.  Once an smtp(8) client retrieves an open
 connection, it sends RSET to the remote server and waits for a 250
 reply (i.e. the server is still happy). According to the logfile
 record this lookup/rset/reply sequence is taking 96 seconds.

Re: Huge active queue and system idle, not delivering

2010-01-08 Thread Victor Duchovni

On Fri, Jan 08, 2010 at 03:24:25PM +0200, Patrick Chemla wrote:

 When I do  telnet a139.localpc2105.com 25, I get immediate response.

What does "response" mean? Immediate connection completion means
nothing. Do you get a 220 banner right away? Do you get all of
it or just the first line in a multi-line banner, with the
rest arriving later?

 I have checked my local DNS. There were some problems, and I made some
 improvements. I now have 2 local caching DNS servers responding fast. All
 qmail server addresses are in the Postfix /etc/hosts to avoid IP lookups.
 I have checked the qmail servers; nothing has changed since they were able
 to hold a queue of 200,000 messages, but they now hold only a few hundred.
 I have calculated average times to complete HELO. All qmail servers are in
 the same range, around 2 minutes.

Why the heck does it take 2 minutes to respond to HELO??? There's
your problem. Fix the qmail servers, to ensure that it takes a few
milliseconds to respond to HELO. Right now your HELO response is 3-4
orders of magnitude too slow.

The fix probably involves removing counter-productive tweaks as much
as adding specific changes to address the problem (disable any rate
controls that deliberately slow the sender, or constrain resources).

Start debugging on the qmail side, find out what it is doing for 2
minutes...

-- 
Viktor.

Disclaimer: off-list followups get on-list replies or get ignored.
Please do not ignore the Reply-To header.

To unsubscribe from the postfix-users list, visit
http://www.postfix.org/lists.html or click the link below:
mailto:majord...@postfix.org?body=unsubscribe%20postfix-users

If my response solves your problem, the best way to thank me is to not
send an "it worked, thanks" follow-up. If you must respond, please put
"It worked, thanks" in the Subject so I can delete these quickly.

Re: Huge active queue and system idle, not delivering

2010-01-07 Thread Wietse Venema

Patrick Chemla:
 Hi,
 
 I am running Postfix 2.5.6 on a Fedora 11 Linux system. The hardware
 is an Intel i5-750 quad core with 8 GB memory and a 160 GB SSD.
 
 Incoming messages arrive very fast (500 smtp processes declared),
 and the active queue currently holds 2 million messages waiting for
 delivery.
 
 All messages should be delivered through a farm of 30 MX servers
 for the domain localpc2105.com, load balanced through DNS resolution.
 The DNS server is of course local. All 30 MX servers are running qmail,
 and all of them are more than 90% idle. Before I set up my Postfix
 server, email was sent directly to the qmail servers, and qmail was
 running at full CPU, so I am sure that qmail can handle much more. I
 set up the Postfix server to balance the load across all 30 qmail
 servers, to avoid a situation where some run at full load while
 others sit idle.

http://www.postfix.org/DEBUG_README.html#logging

Wietse

Re: Huge active queue and system idle, not delivering

2010-01-07 Thread Barney Desmond

2010/1/8 Patrick Chemla patrick.che...@perfaction.net
 Incoming messages arrive very fast (500 smtp processes declared), and
 the active queue currently holds 2 million messages waiting for delivery.
 snip
 here is my main.cf file:

That's some very thorough information, you've provided plenty of
context and clear description, which is great. While I lack sufficient
knowledge to provide thoughts on the bottlenecking, I *do* expect that
people will want to see the output of `postconf -n`, instead of your
main.cf (to ensure we see what postfix actually sees and uses).

Can you clarify what you mean by "500 smtp processes declared"? A
sample output from qshape wouldn't go astray either
(http://www.postfix.org/qshape.1.html). You've provided some
proportional figures (percentages), but some solid throughput numbers
would be good too. E.g. "We're injecting 2 million messages to the
postfix box, we expect to enqueue them in X hrs, but it takes Y hrs,
and they're only leaving the postfix box at Z messages/sec". I see you
said "I just found that Postfix could send 1 million emails per hour
when I send less than a half million in 24 hours", but I can't make
sense of that, sorry.

Re: Huge active queue and system idle, not delivering

2010-01-07 Thread Patrick Chemla


On 07/01/2010 20:03, Barney Desmond wrote:

2010/1/8 Patrick Chemla patrick.che...@perfaction.net
   

Incoming messages arrive very fast (500 smtp processes declared), and
the active queue currently holds 2 million messages waiting for delivery.
snip
here is my main.cf file:
 

That's some very thorough information, you've provided plenty of
context and clear description, which is great. While I lack sufficient
knowledge to provide thoughts on the bottlenecking, I *do* expect that
people will want to see the output of `postconf -n`, instead of your
main.cf (to ensure we see what postfix actually sees and uses).

   

Here is postconf -n
alias_database = hash:/etc/aliases
alias_maps = hash:/etc/aliases
command_directory = /usr/sbin
config_directory = /etc/postfix
daemon_directory = /usr/libexec/postfix
data_directory = /var/lib/postfix
debug_peer_level = 8
debug_peer_list = orange.fr
default_delivery_slot_cost = 30
default_delivery_slot_discount = 100
default_destination_concurrency_failed_cohort_limit = 10
default_destination_concurrency_limit = 500
default_destination_recipient_limit = 200
default_minimum_delivery_slots = 30
default_process_limit = 1000
default_recipient_limit = 200
html_directory = no
inet_interfaces = all
inet_protocols = all
initial_destination_concurrency = 100
lmtp_destination_concurrency_limit = $default_destination_concurrency_limit
local_destination_concurrency_limit = 50
local_destination_recipient_limit = 500
mail_owner = postfix
mailbox_size_limit = 512000
mailq_path = /usr/bin/mailq.postfix
manpage_directory = /usr/share/man
max_use = 1000
mime_nesting_limit = 100
mydestination = $myhostname, localhost.$mydomain, localhost
mydomain = localpc2105.com
myhostname = postfix.proacti5.net
mynetworks = 172.27.27.0/24, 10.0.0.0/24, 127.0.0.0/24
newaliases_path = /usr/bin/newaliases.postfix
qmgr_fudge_factor = 200
qmgr_message_active_limit = 200
qmgr_message_recipient_limit = 200
queue_directory = /var/spool/postfix
queue_file_attribute_count_limit = 250
readme_directory = /usr/share/doc/postfix-2.5.6/README_FILES
relay_destination_concurrency_limit = $default_destination_concurrency_limit
relayhost = $mydomain
sample_directory = /usr/share/doc/postfix-2.5.6/samples
sendmail_path = /usr/sbin/sendmail.postfix
setgid_group = postdrop
smtp_connect_timeout = 10s
smtp_data_done_timeout = 10s
smtp_destination_concurrency_limit = $default_destination_concurrency_limit
smtp_mail_timeout = 5s
smtpd_history_flush_threshold = 100
smtpd_junk_command_limit = 100
smtpd_peername_lookup = no
unknown_local_recipient_reject_code = 550



Can you clarify what you mean by "500 smtp processes declared"? A
sample output from qshape wouldn't go astray either
(http://www.postfix.org/qshape.1.html).
Here is qshape:

                    T  5 10 20 40 80  160   320   640  1280 1280+
         TOTAL 133000  0  0  0  0  0 2470 40538 80844  7167  1981
    wanadoo.fr  61955  0  0  0  0  0 2469 26830 31340  1260    56
     orange.fr   4171  0  0  0  0  0    0  1176  2144   540   311
     skynet.be   3286  0  0  0  0  0    0     1  3284     0     1
  aliceadsl.fr   3259  0  0  0  0  0    0    54  3169    25    11
       aol.com   3150  0  0  0  0  0    0  1545  1524    40    41
       free.fr   2138  0  0  0  0  0    0   453  1561    89    35
        sfr.fr    840  0  0  0  0  0    0    23   816     1     0
    hotmail.fr    679  0  0  0  0  0    0   150   420    12    97
    telenet.be    658  0  0  0  0  0    0     0   658     0     0
     gmail.com    358  0  0  0  0  0    0   157   145    11    45
   hotmail.com    325  0  0  0  0  0    0    44   220    20    41
       neuf.fr    252  0  0  0  0  0    0    62   176    14     0
    9online.fr    250  0  0  0  0  0    0     6   244     0     0
   cegetel.net    195  0  0  0  0  0    0    26   155     4    10
   laposte.net    183  0  0  0  0  0    0    51    93    15    24
      swing.be    141  0  0  0  0  0    0     2   139     0     0
  9business.fr    111  0  0  0  0  0    0    23    85     3     0
    sonepar.fr    107  0  0  0  0  0    0    33    72     2     0
        axa.fr    103  0  0  0  0  0    0    30    67     1     5


most of the messages stay in the queue for hours.



You're provided some
proportional figures (percentages), but some solid throughput numbers
would be good too. Eg. We're injecting 2 million messages to the
postfix box, we expect to enqueue them in X hrs, but it takes Y hrs,
and they're only leaving the postfix box at Z messages/sec. I see you
said I just found that Postfix could send 1 million emails per hour
when I send less than a half million in 24 hours, but I can't make
sense of

Re: Huge active queue and system idle, not delivering

2010-01-07 Thread Patrick Chemla


On 07/01/2010 20:00, Wietse Venema wrote:

Patrick Chemla:
   

Hi,

I am running Postfix 2.5.6 on a Fedora 11 Linux system. The hardware
is an Intel i5-750 quad core with 8 GB memory and a 160 GB SSD.

Incoming messages arrive very fast (500 smtp processes declared),
and the active queue currently holds 2 million messages waiting for
delivery.

All messages should be delivered through a farm of 30 MX servers
for the domain localpc2105.com, load balanced through DNS resolution.
The DNS server is of course local. All 30 MX servers are running qmail,
and all of them are more than 90% idle. Before I set up my Postfix
server, email was sent directly to the qmail servers, and qmail was
running at full CPU, so I am sure that qmail can handle much more. I
set up the Postfix server to balance the load across all 30 qmail
servers, to avoid a situation where some run at full load while
others sit idle.
 

http://www.postfix.org/DEBUG_README.html#logging

Wietse
   


Here are the logs:

Jan  6 23:12:48 postfix postfix/qmgr[31260]: warning: to turn off these 
warnings specify: qmgr_clog_warn_time = 0
Jan  6 23:19:39 postfix postfix/qmgr[31260]: warning: mail for 
localpc2105.com is using up 461335 of 461335 active queue entries
Jan  6 23:19:39 postfix postfix/qmgr[31260]: warning: you may need to 
increase the main.cf smtp_destination_concurrency_limit from 100
Jan  6 23:19:39 postfix postfix/qmgr[31260]: warning: please avoid 
flushing the whole queue when you have
Jan  6 23:19:39 postfix postfix/qmgr[31260]: warning: lots of deferred 
mail, that is bad for performance
Jan  6 23:19:39 postfix postfix/qmgr[31260]: warning: to turn off these 
warnings specify: qmgr_clog_warn_time = 0
Jan  6 23:24:51 postfix postfix/qmgr[31260]: warning: mail for 
localpc2105.com is using up 461086 of 461086 active queue entries
Jan  6 23:24:51 postfix postfix/qmgr[31260]: warning: you may need to 
increase the main.cf smtp_destination_concurrency_limit from 100
Jan  6 23:24:51 postfix postfix/qmgr[31260]: warning: please avoid 
flushing the whole queue when you have
Jan  6 23:24:51 postfix postfix/qmgr[31260]: warning: lots of deferred 
mail, that is bad for performance
Jan  6 23:24:51 postfix postfix/qmgr[31260]: warning: to turn off these 
warnings specify: qmgr_clog_warn_time = 0
Jan  6 23:29:51 postfix postfix/qmgr[31260]: warning: mail for 
localpc2105.com is using up 460872 of 460872 active queue entries
Jan  6 23:29:51 postfix postfix/qmgr[31260]: warning: you may need to 
increase the main.cf smtp_destination_concurrency_limit from 100
Jan  6 23:29:51 postfix postfix/qmgr[31260]: warning: please avoid 
flushing the whole queue when you have
Jan  6 23:29:51 postfix postfix/qmgr[31260]: warning: lots of deferred 
mail, that is bad for performance
Jan  6 23:29:51 postfix postfix/qmgr[31260]: warning: to turn off these 
warnings specify: qmgr_clog_warn_time = 0
Jan  6 23:35:51 postfix postfix/qmgr[31260]: warning: mail for 
localpc2105.com is using up 460025 of 460025 active queue entries
Jan  6 23:35:51 postfix postfix/qmgr[31260]: warning: you may need to 
increase the main.cf smtp_destination_concurrency_limit from 100
Jan  6 23:35:51 postfix postfix/qmgr[31260]: warning: please avoid 
flushing the whole queue when you have
Jan  6 23:35:51 postfix postfix/qmgr[31260]: warning: lots of deferred 
mail, that is bad for performance
Jan  6 23:35:51 postfix postfix/qmgr[31260]: warning: to turn off these 
warnings specify: qmgr_clog_warn_time = 0
Jan  6 23:40:51 postfix postfix/qmgr[31260]: warning: mail for 
localpc2105.com is using up 460283 of 460283 active queue entries
Jan  6 23:40:51 postfix postfix/qmgr[31260]: warning: you may need to 
increase the main.cf smtp_destination_concurrency_limit from 100
Jan  6 23:40:51 postfix postfix/qmgr[31260]: warning: please avoid 
flushing the whole queue when you have
Jan  6 23:40:51 postfix postfix/qmgr[31260]: warning: lots of deferred 
mail, that is bad for performance
Jan  6 23:40:51 postfix postfix/qmgr[31260]: warning: to turn off these 
warnings specify: qmgr_clog_warn_time = 0
Jan  6 23:47:21 postfix postfix/qmgr[31260]: warning: mail for 
localpc2105.com is using up 459714 of 459714 active queue entries
Jan  6 23:47:21 postfix postfix/qmgr[31260]: warning: you may need to 
increase the main.cf smtp_destination_concurrency_limit from 100
Jan  6 23:47:21 postfix postfix/qmgr[31260]: warning: please avoid 
flushing the whole queue when you have
Jan  6 23:47:21 postfix postfix/qmgr[31260]: warning: lots of deferred 
mail, that is bad for performance
Jan  6 23:47:21 postfix postfix/qmgr[31260]: warning: to turn off these 
warnings specify: qmgr_clog_warn_time = 0
Jan  6 23:52:21 postfix postfix/qmgr[31260]: warning: mail for 
localpc2105.com is using up 459491 of 459491 active queue entries
Jan  6 23:52:21 postfix postfix/qmgr[31260]: warning: you may need to 
increase the main.cf smtp_destination_concurrency_limit from 100
Jan

Re: Huge active queue and system idle, not delivering

2010-01-07 Thread Victor Duchovni

On Thu, Jan 07, 2010 at 07:43:55PM +0200, Patrick Chemla wrote:

 CPU is more than 85% idle on my postfix I5/750 box, but the outbound queue 
 is very very slow.

Throughput == Concurrency / Latency
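
As a rough illustration of this formula, here is a minimal Python sketch. The concurrency values 100 and 500 and the 96-second latency appear elsewhere in this thread (the qmgr warning, the postconf output, and the delays logging); the 0.5-second latency is an assumed target for comparison, not a measured value.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def throughput_per_second(concurrency, latency_seconds):
    """Throughput == Concurrency / Latency, in deliveries per second."""
    return concurrency / latency_seconds


# (parallel deliveries, seconds per delivery); the last latency is hypothetical.
scenarios = [(100, 96.0), (500, 96.0), (500, 0.5)]

for concurrency, latency in scenarios:
    rate = throughput_per_second(concurrency, latency)
    print(f"{concurrency} parallel deliveries at {latency} s each: "
          f"{rate:.2f} msg/s, roughly {rate * SECONDS_PER_DAY:,.0f} msg/day")
```

Under these assumptions, 500 parallel deliveries at 96 seconds each come to about 450,000 messages per day, which is close to the ceiling of roughly 500,000 per 24 hours reported in this thread.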

What destination are most of the messages in the queue going to?

What is the associated transport?

Are you using any content filters?

What is the destination concurrency limit for that transport?

What is the delivery latency to that transport? Show a/b/c/d data
averaged (mean, median, stddev) over a bunch of log entries.
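
One possible way to produce these averages is sketched below in Python. The sample lines are copied from the delivery log posted in this thread; the function name `delay_stats` is made up for this example, and a real run would read the mail log file instead (its location depends on the syslog setup).

```python
import re
import statistics

# Matches the four fields of "delays=a/b/c/d" in Postfix smtp(8) logging.
DELAYS_RE = re.compile(r"delays=([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+)")


def delay_stats(log_text):
    """Return {field: (mean, median, stdev)} for the a/b/c/d delay fields."""
    # Mail clients wrap long log lines, so search the text as a whole.
    rows = [tuple(float(x) for x in m.groups())
            for m in DELAYS_RE.finditer(log_text)]
    stats = {}
    for name, values in zip("abcd", zip(*rows)):
        stdev = statistics.stdev(values) if len(values) > 1 else 0.0
        stats[name] = (statistics.mean(values), statistics.median(values), stdev)
    return stats


sample = """
conn_use=59, delay=61550, delays=17019/44435/96/0.17, dsn=2.0.0,
conn_use=10, delay=73795, delays=29264/44481/50/0.21, dsn=2.0.0,
delays=29268/44466/65/0.28, dsn=2.0.0, status=sent (250 ok 1262894578 qp
"""

for field, (mean, median, stdev) in delay_stats(sample).items():
    print(f"{field}: mean={mean:.2f} median={median:.2f} stdev={stdev:.2f}")
```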

 It seems that something keeps qmgr from working at full capacity, despite
 the parameters

Most of your parameter tweaks are counter-productive. Do not tweak
anything other than the destination concurrency limit for a transport
that delivers to a high capacity destination you control, say:

# The 200 is not a golden value, start at 50 and raise only
# if throughput improves as a result...
#
relay_destination_concurrency_limit = 200

 myhostname = postfix.proacti5.net
 mydomain = localpc2105.com
 inet_interfaces = all
 mydestination = $myhostname, localhost.$mydomain, localhost
 unknown_local_recipient_reject_code = 550
 mynetworks = 172.27.27.0/24, 10.0.0.0/24, 127.0.0.0/24
 relayhost = $mydomain

With relayhost set, all remote mail goes to the MX hosts for $mydomain,
so in this case, you can also raise:

# The 200 is not a golden value, start at 50 and raise only
# if throughput improves as a result...
#
smtp_destination_concurrency_limit = 200

if necessary.

 local_destination_recipient_limit = 500

Terrible idea.

 local_destination_concurrency_limit = 50

Terrible idea.

 debug_peer_level = 8

Absurd.

 debug_peer_list = orange.fr

I hope very little mail goes there...

 default_process_limit = 1000

Raise just the master.cf limits for the smtpd(8) and smtp(8)
services. You don't need 1000 of everything.

 initial_destination_concurrency = 100

Too high.

 transport_initial_destination_concurrency = 100

You misunderstood the docs, this is useless.

 default_destination_concurrency_failed_cohort_limit = 10

Should not be necessary.

 default_destination_recipient_limit = 200

OK.

 transport_destination_recipient_limit = 100

You misunderstood the docs, this is useless.

 default_delivery_slot_cost = 30
 default_minimum_delivery_slots = 30
 default_delivery_slot_discount = 100
 qmgr_fudge_factor = 200

Don't mess with the nqmgr tunables, they are too subtle for mortals.

 smtpd_peername_lookup = no

When output is starved, why make the input even faster?

 default_recipient_limit = 200
 qmgr_message_active_limit = 200
 qmgr_message_recipient_limit = 200

The Postfix queue does not scale to arbitrarily large sizes,
at some point, there is more to do than available capacity to
process the backlog. 2 million active messages may be OK for
a mass-mail engine that fires up periodically, and works as fast
as it can, but it is terrible for a mail forwarding relay. Which
use-case are you in?

 mailbox_size_limit = 512000

Why does this machine have any mailboxes at all? Isn't it a relay?
What software performs well with 5GB mailboxes?

 default_destination_concurrency_limit = 500

Better to specify smtp, relay or both, but not default.

 lmtp_destination_concurrency_limit = $default_destination_concurrency_limit
 smtp_destination_concurrency_limit = $default_destination_concurrency_limit
 relay_destination_concurrency_limit = $default_destination_concurrency_limit
 mime_nesting_limit = 100

These are default settings, don't add them to main.cf

 max_use = 1000

Fine.

 queue_file_attribute_count_limit = 250
 smtpd_history_flush_threshold = 100

Why???

 smtpd_junk_command_limit = 100

Why so generous to the input side?

 smtp_connect_timeout = 10s

Reasonable for a large nearby  MX pool, you can even use 1s if you want.

 smtp_data_done_timeout = 10s

Really not a good idea.

 smtp_mail_timeout = 5s

A bit aggressive...

 smtp      inet  n       -       n       -       -       smtpd

Tune the process limit here

 qmgr      fifo  n       -       n       30      1       qmgr

Why re-scan the incoming queue every 30 seconds? The default is fine.

 smtp      unix  -       -       n       -       -       smtp

Adjust the process limit here to the right number of smtp(8)
delivery agents.

 relay     unix  -       -       n       -       -       smtp
   -o smtp_fallback_relay=

Adjust this process limit if you service any relay domains.

 I tried many combinations to speed up the delivery. Nothing has helped so far.

LOGS!!!

 I just found that Postfix could send 1 million emails per hour when I send 
 less than a half million in 24 hours.

LOGS!!!

-- 
Viktor.


Re: Huge active queue and system idle, not delivering

2010-01-07 Thread Victor Duchovni

On Thu, Jan 07, 2010 at 08:29:44PM +0200, Patrick Chemla wrote:

 Here are the logs:

This is just the qmgr(8) warnings about a clogged queue. Other than
telling us that all the mail is going to localpc2105.com, this
is not very useful. Where are the logs from smtp(8)?

What transport is localpc2105.com destined for? Any earlier
logging about actual delivery attempts for this destination?

-- 
Viktor.


Re: Huge active queue and system idle, not delivering

2010-01-07 Thread Stefan Caunter

On Thu, Jan 7, 2010 at 1:25 PM, Patrick Chemla
patrick.che...@perfaction.net wrote:

 said I just found that Postfix could send 1 million emails per hour
 when I send less than a half million in 24 hours, but I can't make
 sense of that, sorry.


 I have to inject 2 to 4 million emails into the Postfix box in 24 hours, and
 I expect to deliver them within the same time frame.
 Currently, I can't deliver more than 500,000 per 24 hours.

It could be viewed that half a million delivered in 24 hours is fine.
Are you signing the mail? This can help with delivery rates to the
large webmailer mx destinations.

Stef

 But the CPU of the box is idle more than 80%. It is clear that it is not a
 matter of CPU, nor memory, nor disk. Something in the number of
 processes/users/simultaneous tasks is blocking.

Re: Huge active queue and system idle, not delivering

2010-01-07 Thread Victor Duchovni

On Thu, Jan 07, 2010 at 04:47:14PM -0500, Stefan Caunter wrote:

 
  I have to inject 2 to 4 million emails into the Postfix box in 24 hours, and
  I expect to deliver them within the same time frame.
  Currently, I can't deliver more than 500,000 per 24 hours.
 
 It could be viewed that half a million delivered in 24 hours is fine.

No, it is too slow, when there is no content inspection involved,
especially with a nearby farm of relayhosts.

 Are you signing the mail? This can help with delivery rates to the
 large webmailer mx destinations.

This is unrelated to the OP's problem.

-- 
Viktor.


Re: Huge active queue and system idle, not delivering

2010-01-07 Thread Ralf Hildebrandt

* Stefan Caunter s...@caunter.ca:

 It could be viewed that half a million delivered in 24 hours is fine.
 Are you signing the mail? This can help with delivery rates to the
 large webmailer mx destinations.

There are many things to consider:

* DKIM signing - which is the prerequisite for getting into feedback
  loops at major email providers
* get into the feedback loops at major email providers
* SPF
* good reputation (e.g. SenderBase, senderscore)

-- 
Ralf Hildebrandt
  Geschäftsbereich IT | Abteilung Netzwerk
  Charité - Universitätsmedizin Berlin
  Campus Benjamin Franklin
  Hindenburgdamm 30 | D-12203 Berlin
  Tel. +49 30 450 570 155 | Fax: +49 30 450 570 962
  ralf.hildebra...@charite.de | http://www.charite.de

Re: Huge active queue and system idle, not delivering

2010-01-07 Thread Victor Duchovni

On Thu, Jan 07, 2010 at 10:54:15PM +0100, Ralf Hildebrandt wrote:

  It could be viewed that half a million delivered in 24 hours is fine.
  Are you signing the mail? This can help with delivery rates to the
  large webmailer mx destinations.
 
 There are many things to consider:
 
 * DKIM signing - which is the prerequisite for getting into feedback
   loops at major email providers
 * get into the feedback loops at major email providers
 * SPF
 * good reputation (e.g. SenderBase, senderscore)

None of these apply to the OP's problem. He is sending mail to a pool
of 30 qmail hosts.

-- 
Viktor.


Re: Huge active queue and system idle, not delivering

2010-01-07 Thread Patrick Chemla


On 07/01/2010 23:47, Stefan Caunter wrote:

On Thu, Jan 7, 2010 at 1:25 PM, Patrick Chemla
patrick.che...@perfaction.net  wrote:

   

said I just found that Postfix could send 1 million emails per hour
when I send less than a half million in 24 hours, but I can't make
sense of that, sorry.

   

I have to inject 2 to 4 million emails into the Postfix box in 24 hours, and
I expect to deliver them within the same time frame.
Currently, I can't deliver more than 500,000 per 24 hours.
 

It could be viewed that half a million delivered in 24 hours is fine.
Are you signing the mail? This can help with delivery rates to the
large webmailer mx destinations.

Stef

   


Half a million is four times less than what we achieved with the qmail
servers alone. The emails are signed, but not by Postfix. Postfix only
has to relay mail from clients to the local MXs; those local MXs handle
delivery to the outside. The mail queue should live on those MXs, because
queueing depends on the final destinations.

But the CPU of the box is idle more than 80%. It is clear that it is not a
matter of CPU, nor memory, nor disk. Something in the number of
processes/users/simultaneous tasks is blocking.

Re: Huge active queue and system idle, not delivering

2010-01-07 Thread Patrick Chemla


On 07/01/2010 20:37, Victor Duchovni wrote:

On Thu, Jan 07, 2010 at 08:29:44PM +0200, Patrick Chemla wrote:

   

Here are the logs:
 

This is just the qmgr(8) warnings about a clogged queue. Other than
telling us that all the mail is going to localpc2105.com, this
is not very useful. Where are the logs from smtp(8)?

What transport is localpc2105.com destined for? Any earlier
logging about actual delivery attempts for this destination?

   


Victor, thank you for your interest.

Daily logs are huge.

Here is a sample of deliveries:
Jan  7 22:02:57 postfix postfix/qmgr[26441]: 5B91F873F6: removed
Jan  7 22:02:57 postfix postfix/smtp[27180]: 375DDD5923: 
to=lexoti...@gmail.com, relay=a139.localpc2105.com[10.0.0.139]:25, 
conn_use=59, delay=61550, delays=17019/44435/96/0.17, dsn=2.0.0, 
status=sent (250 ok 1262894577 qp 12113)

Jan  7 22:02:57 postfix postfix/qmgr[26441]: 375DDD5923: removed
Jan  7 22:02:58 postfix postfix/smtp[27070]: 7F0F2943B3: 
to=gpo...@wanadoo.fr, relay=a70.localpc2105.com[10.0.0.70]:25, 
conn_use=10, delay=73795, delays=29264/44481/50/0.21, dsn=2.0.0, 
status=sent (250 ok 1262894577 qp 23067)

Jan  7 22:02:58 postfix postfix/qmgr[26441]: 7F0F2943B3: removed
Jan  7 22:02:58 postfix postfix/smtp[27050]: 32BB182182: 
to=gmarin-jardins-lois...@wanadoo.fr, 
relay=a139.localpc2105.com[10.0.0.139]:25, conn_use=48, delay=73799, 
delays=29268/44466/65/0.28, dsn=2.0.0, status=sent (250 ok 1262894578 qp 
12121)

Jan  7 22:02:58 postfix postfix/qmgr[26441]: 32BB182182: removed
Jan  7 22:02:58 postfix postfix/smtp[26758]: 577D6C7F7D: 
to=gerardtremb...@vinsdusiecle.com, 
relay=a139.localpc2105.com[10.0.0.139]:25, conn_use=60, delay=68451, 
delays=23920/44481/50/0.29, dsn=2.0.0, status=sent (250 ok 1262894578 qp 
12122)

Jan  7 22:02:58 postfix postfix/qmgr[26441]: 577D6C7F7D: removed
Jan  7 22:02:58 postfix postfix/smtp[26935]: CDCE074F53: 
to=christian.lebe...@arcelor.com, 
relay=a139.localpc2105.com[10.0.0.139]:25, conn_use=49, delay=104597, 
delays=60065/44421/110/0.3, dsn=2.0.0, status=sent (250 ok 1262894578 qp 
12135)

Jan  7 22:02:58 postfix postfix/qmgr[26441]: CDCE074F53: removed
Jan  7 22:02:58 postfix postfix/smtp[26708]: 4B0B6E77FD: 
to=m...@metaproductique.com, 
relay=a139.localpc2105.com[10.0.0.139]:25, conn_use=61, delay=46137, 
delays=1606/44461/70/0.31, dsn=2.0.0, status=sent (250 ok 1262894578 qp 
12136)

Jan  7 22:02:58 postfix postfix/qmgr[26441]: 4B0B6E77FD: removed
Jan  7 22:02:58 postfix postfix/smtp[26794]: D2CB5DC84C: 
to=secretar...@mairie-charly.fr, 
relay=a70.localpc2105.com[10.0.0.70]:25, conn_use=11, delay=58160, 
delays=13628/44481/50/0.23, dsn=2.0.0, status=sent (250 ok 1262894578 qp 
23076)

Jan  7 22:02:58 postfix postfix/qmgr[26441]: D2CB5DC84C: removed
Jan  7 22:02:58 postfix postfix/smtp[26968]: 1A651E17E0: 
to=davau.br...@orange.fr, relay=a74.localpc2105.com[10.0.0.74]:25, 
conn_use=2, delay=54426, delays=9894/44462/69/0.27, dsn=2.0.0, 
status=sent (250 ok 1262894578 qp 7411)

Jan  7 22:02:58 postfix postfix/qmgr[26441]: 1A651E17E0: removed
Jan  7 22:02:58 postfix postfix/smtp[27037]: 4CCC486B55: 
to=lenaerts.natuurst...@pandora.be, 
relay=a139.localpc2105.com[10.0.0.139]:25, conn_use=50, delay=45538, 
delays=1005/44407/125/0.17, dsn=2.0.0, status=sent (250 ok 1262894578 qp 
12150)

Jan  7 22:02:58 postfix postfix/qmgr[26441]: 4CCC486B55: removed
Jan  7 22:02:58 postfix postfix/smtp[27188]: D130997201: 
to=cont...@afcmecanum.com, relay=a74.localpc2105.com[10.0.0.74]:25, 
conn_use=2, delay=71536, delays=27004/8/84/0.28, dsn=2.0.0, 
status=sent (250 ok 1262894578 qp 7412)

Jan  7 22:02:58 postfix postfix/qmgr[26441]: D130997201: removed
Jan  7 22:02:59 postfix postfix/smtp[27033]: 6BD743906A: 
to=copyboli...@orange.fr, relay=a139.localpc2105.com[10.0.0.139]:25, 
conn_use=62, delay=81473, delays=36941/44467/65/0.24, dsn=2.0.0, 
status=sent (250 ok 1262894579 qp 12157)

Jan  7 22:02:59 postfix postfix/qmgr[26441]: 6BD743906A: removed
Jan  7 22:02:59 postfix postfix/smtp[26793]: 84947C14B2: 
to=wgall...@saemshema.com, relay=a70.localpc2105.com[10.0.0.70]:25, 
conn_use=12, delay=69401, delays=24868/44469/63/0.2, dsn=2.0.0, 
status=sent (250 ok 1262894578 qp 23084)

Jan  7 22:02:59 postfix postfix/qmgr[26441]: 84947C14B2: removed
Jan  7 22:02:59 postfix postfix/smtp[26737]: 6023552F52: 
to=cont...@installation-spa-gard.com, 
relay=a139.localpc2105.com[10.0.0.139]:25, conn_use=51, delay=96132, 
delays=51599/8/84/0.3, dsn=2.0.0, status=sent (250 ok 1262894579 qp 
12158)

Jan  7 22:02:59 postfix postfix/qmgr[26441]: 6023552F52: removed
Jan  7 22:02:59 postfix postfix/smtp[27134]: connect to 
a132.localpc2105.com[10.0.0.132]:25: Connection timed out
Jan  7 22:02:59 postfix postfix/smtp[26717]: 96A447C426: 
to=alain.perignon.aulnaysousb...@reseau.renault.fr, 
relay=a139.localpc2105.com[10.0.0.139]:25, conn_use=63, delay=103800, 
delays=59267/44433/99/0.27, dsn=2.0.0, status=sent (250 ok 1262894579 qp 
12166)

Jan  7 22:02:59 postfix postfix/qmgr[26441]:

Re: Huge active queue and system idle, not delivering

2010-01-07 Thread Victor Duchovni

On Fri, Jan 08, 2010 at 12:30:34AM +0200, Patrick Chemla wrote:

 Jan  7 22:02:57 postfix postfix/qmgr[26441]: 5B91F873F6: removed
 Jan  7 22:02:57 postfix postfix/smtp[27180]: 375DDD5923: 
 to=lexoti...@gmail.com, relay=a139.localpc2105.com[10.0.0.139]:25, 
 conn_use=59, delay=61550, delays=17019/44435/96/0.17, dsn=2.0.0, 
 status=sent (250 ok 1262894577 qp 12113)

This recipient does not match the destination that is clogging the
queue. Is the queue clogged with postmaster notices? I never enable
any postmaster notices, they don't scale.

notify_classes =

This said, the 96 seconds of connection setup latency is an obvious and
severe problem. Why on earth does it take 96 seconds to complete a HELO
handshake with a139.localpc2105.com? You are not going to get much
mail out if each delivery takes 96 seconds...

Is your Postfix server's IP address resolvable on the qmail systems?
Are they doing some sort of pre-banner delay? ...


 Jan  7 22:02:58 postfix postfix/smtp[27070]: 7F0F2943B3: 
 to=gpo...@wanadoo.fr, relay=a70.localpc2105.com[10.0.0.70]:25, 
 conn_use=10, delay=73795, delays=29264/44481/50/0.21, dsn=2.0.0, 
 status=sent (250 ok 1262894577 qp 23067)

Once again, 50 seconds is severely crippled.

 Jan  7 22:02:58 postfix postfix/smtp[27050]: 32BB182182: 
 to=gmarin-jardins-lois...@wanadoo.fr, 
 relay=a139.localpc2105.com[10.0.0.139]:25, conn_use=48, delay=73799, 
 delays=29268/44466/65/0.28, dsn=2.0.0, status=sent (250 ok 1262894578 qp 
 12121)

This is enough. Fix this.

Where are the deliveries to the clogged destination???

-- 
Viktor.


Re: Huge active queue and system idle, not delivering

2010-01-07 Thread Wietse Venema

Patrick Chemla:
  But the CPU of the box is idle more than 80%. It is clear that it is not a
  matter of CPU, nor memory, nor disk. Something in the number of
  processes/users/simultaneous tasks is blocking.

Indeed, the symptom of blocking is in the third field of
the Postfix delays logging.

   The format of the delays=a/b/c/d logging is as follows:

   o  a = time from message arrival to last active queue entry

   o  b = time from last active queue entry to connection setup

   o  c = time in connection setup, including DNS, EHLO and TLS

   o  d = time in message transmission

In your case, it takes a minute or more to set up the connection
including DNS lookup and EHLO handshake. That is holding up your mail.
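
To make the four fields concrete, here is a small Python sketch that splits one delays value from this thread into named parts and shows each part's share of the total. The field names are invented for this example and are not Postfix terminology.

```python
from typing import NamedTuple


class Delays(NamedTuple):
    queued_before_active: float    # a: arrival to last active queue entry
    active_before_connect: float   # b: last active queue entry to connection setup
    connection_setup: float        # c: connection setup, incl. DNS, EHLO and TLS
    transmission: float            # d: message transmission


def parse_delays(value):
    """Split the value of delays=a/b/c/d into named fields (seconds)."""
    a, b, c, d = (float(part) for part in value.split("/"))
    return Delays(a, b, c, d)


record = parse_delays("17019/44435/96/0.17")
total = sum(record)
for name, seconds in record._asdict().items():
    print(f"{name:>22}: {seconds:>10.2f} s ({seconds / total:6.2%} of total)")
print(f"sum of fields: {total:.2f} s (the same log record reports delay=61550)")
```

Most of the elapsed time is queue wait (a and b), but the third field is the one that limits how many deliveries each smtp(8) process can complete.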

- Check if the qmail servers are responsive (telnet hostname 25).
  If they aren't, increase the concurrency on the qmail side.

- Check if your Postfix needs a /var/spool/postfix/etc/resolv.conf
  file, and if that file is consistent with /etc/resolv.conf. If
  Postfix needs /var/spool/postfix/etc/resolv.conf and the file
  is missing or contains a bogus server, that will add time to
  your deliveries.
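
The resolv.conf consistency check can be sketched in Python as follows. The two default paths are the ones named above; the function names are hypothetical, and the comparison only looks at nameserver lines, which is an assumption about what matters here.

```python
from pathlib import Path


def nameservers(path):
    """Return the nameserver addresses listed in a resolv.conf-style file."""
    servers = []
    for line in Path(path).read_text().splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def compare_resolv_conf(system_path="/etc/resolv.conf",
                        chroot_path="/var/spool/postfix/etc/resolv.conf"):
    """Describe how the chroot copy differs from the system file."""
    if not Path(system_path).exists():
        return "system file is missing"
    if not Path(chroot_path).exists():
        return "chroot copy is missing"
    system, chroot = nameservers(system_path), nameservers(chroot_path)
    if system == chroot:
        return "consistent"
    return f"differs: system={system} chroot={chroot}"
```

Calling `compare_resolv_conf()` on the Postfix server reports whether the two files list the same name servers. A missing chroot copy only matters if Postfix actually runs chrooted.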

Wietse
