On Fri, Mar 19, 2010 at 05:28:07PM +0100, Attila Nagy wrote:

> On 03/19/10 16:13, Victor Duchovni wrote:
>> Forward mail for this domain to a separate queue (Postfix instance)
>> that handles mail for this---and perhaps some other similar---domains.
>> The slow domain will no longer clog your primary queue.
>>    
> You are right that this will solve the problem, but isn't it more correct to 
> do this automatically? I mean, this seems to be so basic that I don't 
> understand why postfix doesn't include a mechanism to overcome it.
> Currently the only sane thing seems to be to raise the active queue limit 
> to the size of (or near to) the incoming queue, which makes the delivery 
> for other domains scream.
> But that's lame, and needs a lot of ram for a problem, which could be 
> easily solved in other ways.

If your input rate permanently exceeds your output rate and the input
rate is out of your hands, there is no solution other than negotiating
a higher delivery rate to the slow destination with the admins of that
destination. The mail will continue to pile up on your server.

If the input arrives in huge bursts separated by prolonged inactivity,
then indeed build systems with more RAM and increase the active queue
size. Another way to absorb the bursts is to field more servers to
queue this traffic.
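
The difference between the two cases can be illustrated with a minimal
sketch. The arrival and drain numbers are made up for illustration, and
the one-tick model is an assumption, not a description of how the queue
manager actually schedules mail:

```python
def backlog_over_time(arrivals, drain_per_tick):
    """Return the queue backlog after each tick.

    arrivals: messages arriving in each tick (hypothetical numbers).
    drain_per_tick: most messages the slow destination accepts per tick.
    """
    backlog = 0
    history = []
    for arrived in arrivals:
        # The backlog grows by what arrived and shrinks by what was delivered,
        # but it can never go below zero.
        backlog = max(0, backlog + arrived - drain_per_tick)
        history.append(backlog)
    return history


# Sustained overload: 120 in, 100 out per tick, so the backlog never recovers.
steady = backlog_over_time([120] * 5, drain_per_tick=100)

# Bursty input: one burst of 500, then quiet ticks let the queue drain.
bursty = backlog_over_time([500, 0, 0, 0, 0, 0], drain_per_tick=100)

print(steady)  # [20, 40, 60, 80, 100]
print(bursty)  # [400, 300, 200, 100, 0, 0]
```

In the first case no amount of RAM helps for long; in the second, the
queue only has to be large enough to hold the peak of the burst.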

>> The latency of "0.33" seconds is not unreasonably high. Is this typical
>> for deliveries to this domain? With a concurrency of 20, you should be
>> able to deliver ~60 messages per second to this destination. Can you
>> compute a "smoothed" latency for this destination?
>>    
> I've only written this, because I was sure that somebody would miss it.
> This destination is not slow because of slow delivery times on the already 
> open connections, but because of connection timeouts (I can observe this on 
> other, mostly silent systems, which send only few messages there) and 
> artificial limits on the recipient side.

Well, the "connection timeouts" lead to a high "c" value, so that would
show up in the numbers. How long is your timeout?  Connection caching
should compensate for high connection set-up costs; why is that failing
for you? The conn_use=76 from your log message suggests that connection
caching is working reasonably well. Perhaps a dedicated transport with
a lower smtp_connect_timeout is the answer... You can also use the
new 2.5 scheduler controls to reduce the impact of negative feedback...
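
To put rough numbers on the connection caching point, here is a
back-of-the-envelope sketch. It assumes a 30 second connect timeout,
that one timed-out attempt is paid before each successful connection,
and that the conn_use=76 figure from the log is representative of how
many deliveries share one connection. All three are assumptions made
for illustration only:

```python
def amortized_setup_cost(setup_seconds, deliveries_per_connection):
    """Spread a one-time connection set-up cost over the deliveries that reuse it."""
    return setup_seconds / deliveries_per_connection


# Assumed: one 30s timed-out attempt per connection, reused for 76 deliveries.
with_caching = amortized_setup_cost(30.0, 76)

# Assumed: the same reuse, but with a dedicated transport timing out after 5s.
with_short_timeout = amortized_setup_cost(5.0, 76)

print(round(with_caching, 2))        # 0.39
print(round(with_short_timeout, 2))  # 0.07
```

Under those assumptions a timeout costs a fraction of a second per
delivery rather than 30 seconds, and a shorter timeout shrinks it
further.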

>>    
> egrep 'to.*citromail\.hu.*status=sent' maillog | egrep -o 
> '[0-9]+/[0-9]+/[0-9]+/[0-9]+' | awk -F '/' '{a=$1;b=$2;c=$3;d=$4; 
> lavg=lavg*0.95+(c+d)*0.05; count=count+1; if (count % 100 == 0) print 
> lavg}'

The smoothed latencies look acceptable...
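
For reference, the same smoothing can be sketched in Python. This is an
illustration, not the command that was run: it averages the "c"
(connection set-up) and "d" (message transmission) fields of delays=
with the same 0.95/0.05 weights, but unlike the egrep pattern it also
accepts fractional values. The function names and the domain filter are
hypothetical:

```python
import re

# delays=a/b/c/d, where each field may be an integer or a decimal.
DELAYS_RE = re.compile(
    r"delays=(\d+(?:\.\d+)?)/(\d+(?:\.\d+)?)/(\d+(?:\.\d+)?)/(\d+(?:\.\d+)?)"
)


def smoothed_latencies(lines, domain, alpha=0.05, every=100):
    """Exponentially smooth c+d for sent mail to `domain`; sample every N lines."""
    lavg = 0.0
    count = 0
    samples = []
    for line in lines:
        if domain not in line or "status=sent" not in line:
            continue
        match = DELAYS_RE.search(line)
        if match is None:
            continue
        c = float(match.group(3))  # connection set-up time
        d = float(match.group(4))  # message transmission time
        lavg = lavg * (1 - alpha) + (c + d) * alpha
        count += 1
        if count % every == 0:
            samples.append(lavg)
    return samples


def estimated_rate(concurrency, latency_seconds):
    """Rough deliveries per second if every slot is always busy."""
    return concurrency / latency_seconds


print(round(estimated_rate(20, 0.33), 1))  # 60.6
```

The last line reproduces the earlier estimate: 20 concurrent deliveries
at 0.33 seconds each is roughly 60 messages per second.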

>> How many concurrent connections do you have for this destination?
>> What is the destination concurrency limit?
>>    
> foreach i (`jot 8`)
> foreach? netstat -a | egrep 'citroma.*ESTAB' | wc -l

The connection count looks good.

What is the input rate (messages with recipients in this domain per minute, 
with each 50 recipients of a single message counting as a single message,
so that a 200 recipient message is 4 logical messages, if you have not
changed the smtp_destination_recipient_limit)?

What is the output rate (envelopes delivered to this domain per minute,
counting deliveries to multiple recipients of a single message as one
delivery when the delivery agent pid, queue-id, delays, dsn and remote
reply are identical)?
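
The two counting rules can be sketched as follows. The record layout
(dictionaries with pid, queue_id, delays, dsn and reply keys) is a
hypothetical representation of parsed log lines, not something Postfix
provides, and the limit of 50 is the unchanged
smtp_destination_recipient_limit mentioned above:

```python
import math


def logical_messages(recipient_counts, recipient_limit=50):
    """Count each message once per started chunk of `recipient_limit` recipients."""
    return sum(math.ceil(n / recipient_limit) for n in recipient_counts)


def distinct_deliveries(records):
    """Count log records once when pid, queue-id, delays, dsn and reply all match."""
    keys = {
        (r["pid"], r["queue_id"], r["delays"], r["dsn"], r["reply"])
        for r in records
    }
    return len(keys)


# A 200-recipient message counts as 4 logical messages.
print(logical_messages([200]))  # 4

# Two recipients delivered in one transaction, plus one separate delivery.
records = [
    {"pid": 101, "queue_id": "A1", "delays": "0/0/0.1/0.2", "dsn": "2.0.0", "reply": "250 ok"},
    {"pid": 101, "queue_id": "A1", "delays": "0/0/0.1/0.2", "dsn": "2.0.0", "reply": "250 ok"},
    {"pid": 102, "queue_id": "B2", "delays": "0/0/0.3/0.1", "dsn": "2.0.0", "reply": "250 ok"},
]
print(distinct_deliveries(records))  # 2
```

Dividing each count by the length of the observation window in minutes
gives the input and output rates to compare.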

> I know where the problem is (so do you :), I just don't understand why it 
> is good to have this feature in postfix.

Explaining the entire design will take too long. Suffice it to say that
the trade-offs made were decided carefully.

-- 
        Viktor.

P.S. Morgan Stanley is looking for a New York City based, Senior Unix
system/email administrator to architect and sustain our perimeter email
environment.  If you are interested, please drop me a note.
