Hello all,


Thanks for the suggestions.



We changed smtp_starttls_timeout to 10s and now the messages are in the
queue exactly 10 seconds when next hop is O365. There is no issue with
other servers.



During the investigation we found out that the OpenSSL version had been
updated from 1.0.0 to 1.0.1e and due to that the TLS connection is done via
TLSv1.2 instead of TLSv1.



The suggested smtp_skip_quit_response is set to yes (from the default
options).

Turning on debug_peerlist did’t yield anything significant as there is no
information on the TLS connection closing:



Jun 17 15:19:13 mx1 postfix/smtpd[17002]: 0053C18DF:
client=relay.host.tld[10.ip.ip.ip]

Jun 17 15:19:13 mx1 postfix/cleanup[17234]: 0053C18DF: message-id=<
ms...@otherhost.tld>

Jun 17 15:19:13 mx1 postfix/qmgr[8901]: 0053C18DF: from=<u...@otherhost.tld>,
size=12334, nrcpt=1 (queue active)

Jun 17 15:19:13 mx1 postfix/smtpd[17002]: disconnect from
relay.host.tld[10.ip.ip.ip]

Jun 17 15:19:13 mx1 postfix/smtp[9553]: <
host.prod.outlook.com[ip.ip.ip.ip]:25:
220 host.protection.outlook.com Microsoft ESMTP MAIL Service ready at Mon,
17 Jun 2019

Jun 17 15:19:13 mx1 postfix/smtp[9553]: >
host.prod.outlook.com[ip.ip.ip.ip]:25:
EHLO mx1.host.tld

Jun 17 15:19:13 mx1 postfix/smtp[9553]: <
host.prod.outlook.com[ip.ip.ip.ip]:25:
250-host.protection.outlook.com Hello [ip.ip.ip.ip]

Jun 17 15:19:13 mx1 postfix/smtp[9553]: <
host.prod.outlook.com[ip.ip.ip.ip]:25:
250-SIZE 157286400

Jun 17 15:19:13 mx1 postfix/smtp[9553]: <
host.prod.outlook.com[ip.ip.ip.ip]:25:
250-PIPELINING

Jun 17 15:19:13 mx1 postfix/smtp[9553]: <
host.prod.outlook.com[ip.ip.ip.ip]:25:
250-DSN

Jun 17 15:19:13 mx1 postfix/smtp[9553]: <
host.prod.outlook.com[ip.ip.ip.ip]:25:
250-ENHANCEDSTATUSCODES

Jun 17 15:19:13 mx1 postfix/smtp[9553]: <
host.prod.outlook.com[ip.ip.ip.ip]:25:
250-STARTTLS

Jun 17 15:19:13 mx1 postfix/smtp[9553]: <
host.prod.outlook.com[ip.ip.ip.ip]:25:
250-8BITMIME

Jun 17 15:19:13 mx1 postfix/smtp[9553]: <
host.prod.outlook.com[ip.ip.ip.ip]:25:
250-BINARYMIME

Jun 17 15:19:13 mx1 postfix/smtp[9553]: <
host.prod.outlook.com[ip.ip.ip.ip]:25:
250-CHUNKING

Jun 17 15:19:13 mx1 postfix/smtp[9553]: <
host.prod.outlook.com[ip.ip.ip.ip]:25:
250 SMTPUTF8

Jun 17 15:19:13 mx1 postfix/smtp[9553]: server features: 0x901f size
157286400

Jun 17 15:19:13 mx1 postfix/smtp[9553]: Using ESMTP PIPELINING, TCP send
buffer size is 4096

Jun 17 15:19:13 mx1 postfix/smtp[9553]: >
host.prod.outlook.com[ip.ip.ip.ip]:25:
STARTTLS

Jun 17 15:19:13 mx1 postfix/smtp[9553]: <
host.prod.outlook.com[ip.ip.ip.ip]:25:
220 2.0.0 SMTP server ready

Jun 17 15:19:13 mx1 postfix/smtp[9553]: setting up TLS connection to
host.prod.outlook.com[ip.ip.ip.ip]:25

Jun 17 15:19:13 mx1 postfix/smtp[9553]: auto_clnt_open: connected to
private/tlsmgr

Jun 17 15:19:13 mx1 postfix/smtp[9553]: send attr request = seed

Jun 17 15:19:13 mx1 postfix/smtp[9553]: send attr size = 32

Jun 17 15:19:13 mx1 postfix/smtp[9553]: private/tlsmgr: wanted attribute:
status

Jun 17 15:19:13 mx1 postfix/smtp[9553]: input attribute name: status

Jun 17 15:19:13 mx1 postfix/smtp[9553]: input attribute value: 0

Jun 17 15:19:13 mx1 postfix/smtp[9553]: private/tlsmgr: wanted attribute:
seed

Jun 17 15:19:13 mx1 postfix/smtp[9553]: input attribute name: seed

Jun 17 15:19:13 mx1 postfix/smtp[9553]: input attribute value:
0wn/teYsBxIEcaOstQNWf3R6Tn/T8bF1UO0MhfUV7X4=

Jun 17 15:19:13 mx1 postfix/smtp[9553]: private/tlsmgr: wanted attribute:
(list terminator)

Jun 17 15:19:13 mx1 postfix/smtp[9553]: input attribute name: (end)

Jun 17 15:19:13 mx1 postfix/smtp[9553]: Trusted TLS connection established
to host.prod.outlook.com[ip.ip.ip.ip]:25: TLSv1.2 with cipher
ECDHE-RSA-AES256-SHA384 (256/256 bits)

Jun 17 15:19:13 mx1 postfix/smtp[9553]: >
host.prod.outlook.com[ip.ip.ip.ip]:25:
EHLO mx1.host.tld

Jun 17 15:19:13 mx1 postfix/smtp[9553]: <
host.prod.outlook.com[ip.ip.ip.ip]:25:
250-host.protection.outlook.com Hello [ip.ip.ip.ip]

Jun 17 15:19:13 mx1 postfix/smtp[9553]: <
host.prod.outlook.com[ip.ip.ip.ip]:25:
250-SIZE 157286400

Jun 17 15:19:13 mx1 postfix/smtp[9553]: <
host.prod.outlook.com[ip.ip.ip.ip]:25:
250-PIPELINING

Jun 17 15:19:13 mx1 postfix/smtp[9553]: <
host.prod.outlook.com[ip.ip.ip.ip]:25:
250-DSN

Jun 17 15:19:13 mx1 postfix/smtp[9553]: <
host.prod.outlook.com[ip.ip.ip.ip]:25:
250-ENHANCEDSTATUSCODES

Jun 17 15:19:13 mx1 postfix/smtp[9553]: <
host.prod.outlook.com[ip.ip.ip.ip]:25:
250-8BITMIME

Jun 17 15:19:13 mx1 postfix/smtp[9553]: <
host.prod.outlook.com[ip.ip.ip.ip]:25:
250-BINARYMIME

Jun 17 15:19:13 mx1 postfix/smtp[9553]: <
host.prod.outlook.com[ip.ip.ip.ip]:25:
250-CHUNKING

Jun 17 15:19:13 mx1 postfix/smtp[9553]: <
host.prod.outlook.com[ip.ip.ip.ip]:25:
250 SMTPUTF8

Jun 17 15:19:13 mx1 postfix/smtp[9553]: server features: 0x900f size
157286400

Jun 17 15:19:13 mx1 postfix/smtp[9553]: Using ESMTP PIPELINING, TCP send
buffer size is 4096

Jun 17 15:19:13 mx1 postfix/smtp[9553]: >
host.prod.outlook.com[ip.ip.ip.ip]:25:
MAIL FROM:<u...@otherhost.tld> SIZE=12334

Jun 17 15:19:13 mx1 postfix/smtp[9553]: >
host.prod.outlook.com[ip.ip.ip.ip]:25:
RCPT TO:<u...@onmicrosoft.com> ORCPT=rfc822;u...@host.tld

Jun 17 15:19:13 mx1 postfix/smtp[9553]: >
host.prod.outlook.com[ip.ip.ip.ip]:25:
DATA

Jun 17 15:19:13 mx1 postfix/smtp[9553]: <
host.prod.outlook.com[ip.ip.ip.ip]:25:
250 2.1.0 Sender OK

Jun 17 15:19:13 mx1 postfix/smtp[9553]: <
host.prod.outlook.com[ip.ip.ip.ip]:25:
250 2.1.5 Recipient OK

Jun 17 15:19:13 mx1 postfix/smtp[9553]: <
host.prod.outlook.com[ip.ip.ip.ip]:25:
354 Start mail input; end with <CRLF>.<CRLF>

Jun 17 15:19:13 mx1 postfix/smtp[9553]: >
host.prod.outlook.com[ip.ip.ip.ip]:25:
.

Jun 17 15:19:13 mx1 postfix/smtp[9553]: >
host.prod.outlook.com[ip.ip.ip.ip]:25:
QUIT

Jun 17 15:19:14 mx1 postfix/smtp[9553]: <
host.prod.outlook.com[ip.ip.ip.ip]:25:
250 2.6.0 <u...@otherhost.tld> [InternalId=123456, Hostname=AM

Jun 17 15:19:14 mx1 postfix/smtp[9553]: 0053C18DF: to=<u...@onmicrosoft.com>,
relay=host.prod.outlook.com[ip.ip.ip.ip]:25, delay=1.4,
delays=0.01/0/0.44/0.93, dsn=2.6.0, status=sent (250 2.6.0 <
ms...@otherhost.tld> [InternalId=123456, Hostname=host.prod.outlook.com]
21277 bytes in 0.206, 100.666 KB/sec Queued mail for delivery)

Jun 17 15:19:14 mx1 postfix/smtp[9553]: name_mask: resource

Jun 17 15:19:14 mx1 postfix/smtp[9553]: name_mask: software

Jun 17 15:24:14 mx1 postfix/qmgr[8901]: 0053C18DF: removed



Maybe we should enable more TLS logging? Currently it is smtp_tls_loglevel
= 1. However, I am afraid that it could overwhelm the server as the
messages are a lot.

Let me know if I am missing something.


BR,

Toncho

On Fri, Jun 14, 2019 at 9:12 PM Viktor Dukhovni <postfix-us...@dukhovni.org>
wrote:

> On Fri, Jun 14, 2019 at 03:17:44PM +0300, anzelmooo . wrote:
>
> > Jun 14 09:36:26 mx1 postfix/smtp[6409]: 488845290: to=<
> us...@onmicrosoft.com>,
> >   relay=host.mail.protection.outlook.com[ip.ip.ip.ip]:25, delay=1.7,
> >   delays=0.01/0/0.34/1.4, dsn=2.6.0, status=sent (250 2.6.0 <msgid2>
> >   [InternalId=1234, Hostname=host.prod.outlook.com] 111697 bytes in
> 0.722,
> >   151.051 KB/sec Queued mail for delivery)
> > Jun 14 09:41:26 mx1 postfix/qmgr[2699]: 488845290: removed
>
> Note the 5 minute (300s) delay.
>
> On Fri, Jun 14, 2019 at 09:25:01AM -0400, Wietse Venema wrote:
>
> > If you want to debug this, you can log the SMTP session (debug_peer_list
> > = example.com) and see if the remote server delays its QUIT response.
> > That could explain the 5s delay.
>
> That was 5 minutes, and given the Postfix default:
>
>     smtp_quit_timeout = 300s
>
> the QUIT hypothesis is in the right ballpark.
>
> > Such delays should not be a problem with the default setting
> > "smtp_skip_quit_response = yes", but maybe RedHat has changed that.
>
> But I doubt Redhat changed this.  Another plausible hypothesis is:
>
>     smtp_starttls_timeout = 300s
>
> which 2.6 also applies to the 2-way TLS session shutdown.  Perhaps
> after "QUIT", the TCP session appears to stay up with no close
> notify from the server before the timeout.
>
> There may be relevant changes in the RedHat SSL stack.  It would
> be good to see the TLS log messages (which don't have the queue-id,
> but have the same process-id as the smtp delivery agent) and are
> logged shortly before delivery success.  This would confirm use
> of TLS and protocol details.
>
> By the way, already in RF2246 (TLS 1.0), there is no requirement
> to wait for the peer's close-notify.  If the application protocol
> provides sufficient session termination signalling, then there's
> no need to duplicate that at the TLS close-notify layer.
>
>    https://tools.ietf.org/html/rfc2246#section-7.2.1
>    https://tools.ietf.org/html/rfc4346#section-7.2.1
>    https://tools.ietf.org/html/rfc5246#section-7.2.1
>
>    Unless some other fatal alert has been transmitted, each party
>    is required to send a close_notify alert before closing the write
>    side of the connection.  The other party MUST respond with a
>    close_notify alert of its own and close down the connection
>    immediately, discarding any pending writes.  It is not required
>    for the initiator of the close to wait for the responding
>    close_notify alert before closing the read side of the connection.
>
> Therefore, we should probably eliminate the second tls_bio_shutdown()
> call from tls_session_stop().
>
> --
>         Viktor.
>
> --- src/tls/tls_session.c
> +++ src/tls/tls_session.c
> @@ -95,18 +95,13 @@ void    tls_session_stop(TLS_APPL_STATE *unused_ctx,
> VSTREAM *stream, int timeou
>         msg_panic("%s: stream has no active TLS context", myname);
>
>      /*
> -     * Perform SSL_shutdown() twice, as the first attempt will send out
> the
> -     * shutdown alert but it will not wait for the peer's shutdown alert.
> -     * Therefore, when we are the first party to send the alert, we must
> call
> -     * SSL_shutdown() again. On failure we don't want to resume the
> session,
> -     * so we will not perform SSL_shutdown() and the session will be
> removed
> -     * as being bad.
> +     * Perform SSL_shutdown(), sending out the shutdown alert without
> waiting
> +     * for the peer's shutdown alert.  On failure we don't want to resume
> the
> +     * session, so we will not perform SSL_shutdown() and the session
> will be
> +     * removed as being bad.
>       */
> -    if (!failure) {
> -       retval = tls_bio_shutdown(vstream_fileno(stream), timeout,
> TLScontext);
> -       if (retval == 0)
> -           tls_bio_shutdown(vstream_fileno(stream), timeout, TLScontext);
> -    }
> +    if (!failure)
> +       (void) tls_bio_shutdown(vstream_fileno(stream), timeout,
> TLScontext);
>      tls_free_context(TLScontext);
>      tls_stream_stop(stream);
>  }
>
>

Reply via email to