On Thu, Nov 01, 2012 at 12:38:26AM +0800, Fred Ho wrote:

> OK, smtpd[9647]: disconnect happened at 13:04:23. There's no
> smtpd[9647] entries after 10:15:16.
> 
> Oct 31 10:15:16 mailgate2 postfix/smtpd[9647]:
>   connect from ttsenmta2.ttasia.com[210.17.183.8]
> Oct 31 10:15:17 mailgate2 postfix/smtpd[9647]: 4803D7F20:
>   client=ttsenmta2.ttasia.com[210.17.183.8]
> Oct 31 10:15:17 mailgate2 postfix/cleanup[9648]: 4803D7F20:
>   message-id=<002601cdb70d$8e5363e0$aafa2ba0$@com.tw>
> Oct 31 13:04:23 mailgate2 postfix/smtpd[9647]:
>   disconnect from ttsenmta2.ttasia.com[210.17.183.8]
> Oct 31 13:04:23 mailgate2 postfix/qmgr[4869]: 4803D7F20:
>   from=<v...@otogarment.com>, size=595890, nrcpt=3 (queue active)
> [...]

As suspected the disconnect coincides with the start of delivery
processing. There are two possibilities:

  1. The smtpd(8) server did not see the ".<CRLF>" from the client
     until the client disconnected. This could be a bug in the
     client PIPELINING implementation, or some firewall between the
     client and SMTP server.

  2. The ".<CRLF>" was received in a timely manner, but smtpd(8)
     and/or cleanup(8) took a long time to commit the message to
     the incoming queue, by which time the client had long disconnected,
     but this fact was only discovered once the message processing was
     completed.

One possibility for case 2 is very slow processing of virtual alias
expansion or very slow transport resolution, ... which block the
cleanup(8) service.

Does the problem happen intermittently with all senders, or just
with some senders and not others?

You can add one of the more frequent problem senders to "debug_peer_list"
and we'll see when the SMTP server receives ".<CRLF>". Otherwise you
need to capture a tcpdump "pcap" file of the problem transactions and
look at packet timestamps.


> Here's the postconf -n output:

> default_destination_concurrency_limit = 100

This is a bad idea.

> default_destination_recipient_limit = 300

This violates RFC limits and is also a bad idea.

> message_size_limit = 838860800

I'd recommend something <= 100 MB, the default is 10 MB.

> relay_domains = $mydestination

Make "relay_domains" empty instead. If you want to relay specific
sub-domains of domains listed in mydestination, list them explicitly,
and set "parent_domain_matches_subdomains" to just list
"smtpd_access_maps", and omit "relay_domains".

With local delivery disabled, you should probably also have
mydestination empty!

> smtp_connect_timeout = 300s

This is too long the default is probably better.

> smtp_data_init_timeout = 300s
> smtp_data_xfer_timeout = 240s

You probably don't need to tune these.

> smtpd_policy_service_timeout = 300s
> smtpd_proxy_timeout = 300s

Are you using a proxy filter? Your logs don't show this. Could
it be responsible for the delay?

> smtpd_timeout = 600s

You probably don't need to change this, and Postfix by default
makes this stress-dependent, which is a good idea.

There's no evidence of any slow lookup mechanisms used for
address rewriting or transport lookups, so firewall issues
or pipelining bugs in clients are the most likely.

-- 
        Viktor.

Reply via email to