Hello all,
Thanks for the suggestions. We changed smtp_starttls_timeout to 10s and now the messages are in the queue exactly 10 seconds when next hop is O365. There is no issue with other servers. During the investigation we found out that the OpenSSL version had been updated from 1.0.0 to 1.0.1e and due to that the TLS connection is done via TLSv1.2 instead of TLSv1. The suggested smtp_skip_quit_response is set to yes (from the default options). Turning on debug_peerlist did’t yield anything significant as there is no information on the TLS connection closing: Jun 17 15:19:13 mx1 postfix/smtpd[17002]: 0053C18DF: client=relay.host.tld[10.ip.ip.ip] Jun 17 15:19:13 mx1 postfix/cleanup[17234]: 0053C18DF: message-id=< ms...@otherhost.tld> Jun 17 15:19:13 mx1 postfix/qmgr[8901]: 0053C18DF: from=<u...@otherhost.tld>, size=12334, nrcpt=1 (queue active) Jun 17 15:19:13 mx1 postfix/smtpd[17002]: disconnect from relay.host.tld[10.ip.ip.ip] Jun 17 15:19:13 mx1 postfix/smtp[9553]: < host.prod.outlook.com[ip.ip.ip.ip]:25: 220 host.protection.outlook.com Microsoft ESMTP MAIL Service ready at Mon, 17 Jun 2019 Jun 17 15:19:13 mx1 postfix/smtp[9553]: > host.prod.outlook.com[ip.ip.ip.ip]:25: EHLO mx1.host.tld Jun 17 15:19:13 mx1 postfix/smtp[9553]: < host.prod.outlook.com[ip.ip.ip.ip]:25: 250-host.protection.outlook.com Hello [ip.ip.ip.ip] Jun 17 15:19:13 mx1 postfix/smtp[9553]: < host.prod.outlook.com[ip.ip.ip.ip]:25: 250-SIZE 157286400 Jun 17 15:19:13 mx1 postfix/smtp[9553]: < host.prod.outlook.com[ip.ip.ip.ip]:25: 250-PIPELINING Jun 17 15:19:13 mx1 postfix/smtp[9553]: < host.prod.outlook.com[ip.ip.ip.ip]:25: 250-DSN Jun 17 15:19:13 mx1 postfix/smtp[9553]: < host.prod.outlook.com[ip.ip.ip.ip]:25: 250-ENHANCEDSTATUSCODES Jun 17 15:19:13 mx1 postfix/smtp[9553]: < host.prod.outlook.com[ip.ip.ip.ip]:25: 250-STARTTLS Jun 17 15:19:13 mx1 postfix/smtp[9553]: < host.prod.outlook.com[ip.ip.ip.ip]:25: 250-8BITMIME Jun 17 15:19:13 mx1 postfix/smtp[9553]: < host.prod.outlook.com[ip.ip.ip.ip]:25: 250-BINARYMIME Jun 17 15:19:13 mx1 postfix/smtp[9553]: < host.prod.outlook.com[ip.ip.ip.ip]:25: 250-CHUNKING Jun 17 15:19:13 mx1 postfix/smtp[9553]: < host.prod.outlook.com[ip.ip.ip.ip]:25: 250 SMTPUTF8 Jun 17 15:19:13 mx1 postfix/smtp[9553]: server features: 0x901f size 157286400 Jun 17 15:19:13 mx1 postfix/smtp[9553]: Using ESMTP PIPELINING, TCP send buffer size is 4096 Jun 17 15:19:13 mx1 postfix/smtp[9553]: > host.prod.outlook.com[ip.ip.ip.ip]:25: STARTTLS Jun 17 15:19:13 mx1 postfix/smtp[9553]: < host.prod.outlook.com[ip.ip.ip.ip]:25: 220 2.0.0 SMTP server ready Jun 17 15:19:13 mx1 postfix/smtp[9553]: setting up TLS connection to host.prod.outlook.com[ip.ip.ip.ip]:25 Jun 17 15:19:13 mx1 postfix/smtp[9553]: auto_clnt_open: connected to private/tlsmgr Jun 17 15:19:13 mx1 postfix/smtp[9553]: send attr request = seed Jun 17 15:19:13 mx1 postfix/smtp[9553]: send attr size = 32 Jun 17 15:19:13 mx1 postfix/smtp[9553]: private/tlsmgr: wanted attribute: status Jun 17 15:19:13 mx1 postfix/smtp[9553]: input attribute name: status Jun 17 15:19:13 mx1 postfix/smtp[9553]: input attribute value: 0 Jun 17 15:19:13 mx1 postfix/smtp[9553]: private/tlsmgr: wanted attribute: seed Jun 17 15:19:13 mx1 postfix/smtp[9553]: input attribute name: seed Jun 17 15:19:13 mx1 postfix/smtp[9553]: input attribute value: 0wn/teYsBxIEcaOstQNWf3R6Tn/T8bF1UO0MhfUV7X4= Jun 17 15:19:13 mx1 postfix/smtp[9553]: private/tlsmgr: wanted attribute: (list terminator) Jun 17 15:19:13 mx1 postfix/smtp[9553]: input attribute name: (end) Jun 17 15:19:13 mx1 postfix/smtp[9553]: Trusted TLS connection established to host.prod.outlook.com[ip.ip.ip.ip]:25: TLSv1.2 with cipher ECDHE-RSA-AES256-SHA384 (256/256 bits) Jun 17 15:19:13 mx1 postfix/smtp[9553]: > host.prod.outlook.com[ip.ip.ip.ip]:25: EHLO mx1.host.tld Jun 17 15:19:13 mx1 postfix/smtp[9553]: < host.prod.outlook.com[ip.ip.ip.ip]:25: 250-host.protection.outlook.com Hello [ip.ip.ip.ip] Jun 17 15:19:13 mx1 postfix/smtp[9553]: < host.prod.outlook.com[ip.ip.ip.ip]:25: 250-SIZE 157286400 Jun 17 15:19:13 mx1 postfix/smtp[9553]: < host.prod.outlook.com[ip.ip.ip.ip]:25: 250-PIPELINING Jun 17 15:19:13 mx1 postfix/smtp[9553]: < host.prod.outlook.com[ip.ip.ip.ip]:25: 250-DSN Jun 17 15:19:13 mx1 postfix/smtp[9553]: < host.prod.outlook.com[ip.ip.ip.ip]:25: 250-ENHANCEDSTATUSCODES Jun 17 15:19:13 mx1 postfix/smtp[9553]: < host.prod.outlook.com[ip.ip.ip.ip]:25: 250-8BITMIME Jun 17 15:19:13 mx1 postfix/smtp[9553]: < host.prod.outlook.com[ip.ip.ip.ip]:25: 250-BINARYMIME Jun 17 15:19:13 mx1 postfix/smtp[9553]: < host.prod.outlook.com[ip.ip.ip.ip]:25: 250-CHUNKING Jun 17 15:19:13 mx1 postfix/smtp[9553]: < host.prod.outlook.com[ip.ip.ip.ip]:25: 250 SMTPUTF8 Jun 17 15:19:13 mx1 postfix/smtp[9553]: server features: 0x900f size 157286400 Jun 17 15:19:13 mx1 postfix/smtp[9553]: Using ESMTP PIPELINING, TCP send buffer size is 4096 Jun 17 15:19:13 mx1 postfix/smtp[9553]: > host.prod.outlook.com[ip.ip.ip.ip]:25: MAIL FROM:<u...@otherhost.tld> SIZE=12334 Jun 17 15:19:13 mx1 postfix/smtp[9553]: > host.prod.outlook.com[ip.ip.ip.ip]:25: RCPT TO:<u...@onmicrosoft.com> ORCPT=rfc822;u...@host.tld Jun 17 15:19:13 mx1 postfix/smtp[9553]: > host.prod.outlook.com[ip.ip.ip.ip]:25: DATA Jun 17 15:19:13 mx1 postfix/smtp[9553]: < host.prod.outlook.com[ip.ip.ip.ip]:25: 250 2.1.0 Sender OK Jun 17 15:19:13 mx1 postfix/smtp[9553]: < host.prod.outlook.com[ip.ip.ip.ip]:25: 250 2.1.5 Recipient OK Jun 17 15:19:13 mx1 postfix/smtp[9553]: < host.prod.outlook.com[ip.ip.ip.ip]:25: 354 Start mail input; end with <CRLF>.<CRLF> Jun 17 15:19:13 mx1 postfix/smtp[9553]: > host.prod.outlook.com[ip.ip.ip.ip]:25: . Jun 17 15:19:13 mx1 postfix/smtp[9553]: > host.prod.outlook.com[ip.ip.ip.ip]:25: QUIT Jun 17 15:19:14 mx1 postfix/smtp[9553]: < host.prod.outlook.com[ip.ip.ip.ip]:25: 250 2.6.0 <u...@otherhost.tld> [InternalId=123456, Hostname=AM Jun 17 15:19:14 mx1 postfix/smtp[9553]: 0053C18DF: to=<u...@onmicrosoft.com>, relay=host.prod.outlook.com[ip.ip.ip.ip]:25, delay=1.4, delays=0.01/0/0.44/0.93, dsn=2.6.0, status=sent (250 2.6.0 < ms...@otherhost.tld> [InternalId=123456, Hostname=host.prod.outlook.com] 21277 bytes in 0.206, 100.666 KB/sec Queued mail for delivery) Jun 17 15:19:14 mx1 postfix/smtp[9553]: name_mask: resource Jun 17 15:19:14 mx1 postfix/smtp[9553]: name_mask: software Jun 17 15:24:14 mx1 postfix/qmgr[8901]: 0053C18DF: removed Maybe we should enable more TLS logging? Currently it is smtp_tls_loglevel = 1. However, I am afraid that it could overwhelm the server as the messages are a lot. Let me know if I am missing something. BR, Toncho On Fri, Jun 14, 2019 at 9:12 PM Viktor Dukhovni <postfix-us...@dukhovni.org> wrote: > On Fri, Jun 14, 2019 at 03:17:44PM +0300, anzelmooo . wrote: > > > Jun 14 09:36:26 mx1 postfix/smtp[6409]: 488845290: to=< > us...@onmicrosoft.com>, > > relay=host.mail.protection.outlook.com[ip.ip.ip.ip]:25, delay=1.7, > > delays=0.01/0/0.34/1.4, dsn=2.6.0, status=sent (250 2.6.0 <msgid2> > > [InternalId=1234, Hostname=host.prod.outlook.com] 111697 bytes in > 0.722, > > 151.051 KB/sec Queued mail for delivery) > > Jun 14 09:41:26 mx1 postfix/qmgr[2699]: 488845290: removed > > Note the 5 minute (300s) delay. > > On Fri, Jun 14, 2019 at 09:25:01AM -0400, Wietse Venema wrote: > > > If you want to debug this, you can log the SMTP session (debug_peer_list > > = example.com) and see if the remote server delays its QUIT response. > > That could explain the 5s delay. > > That was 5 minutes, and given the Postfix default: > > smtp_quit_timeout = 300s > > the QUIT hypothesis is in the right ballpark. > > > Such delays should not be a problem with the default setting > > "smtp_skip_quit_response = yes", but maybe RedHat has changed that. > > But I doubt Redhat changed this. Another plausible hypothesis is: > > smtp_starttls_timeout = 300s > > which 2.6 also applies to the 2-way TLS session shutdown. Perhaps > after "QUIT", the TCP session appears to stay up with no close > notify from the server before the timeout. > > There may be relevant changes in the RedHat SSL stack. It would > be good to see the TLS log messages (which don't have the queue-id, > but have the same process-id as the smtp delivery agent) and are > logged shortly before delivery success. This would confirm use > of TLS and protocol details. > > By the way, already in RF2246 (TLS 1.0), there is no requirement > to wait for the peer's close-notify. If the application protocol > provides sufficient session termination signalling, then there's > no need to duplicate that at the TLS close-notify layer. > > https://tools.ietf.org/html/rfc2246#section-7.2.1 > https://tools.ietf.org/html/rfc4346#section-7.2.1 > https://tools.ietf.org/html/rfc5246#section-7.2.1 > > Unless some other fatal alert has been transmitted, each party > is required to send a close_notify alert before closing the write > side of the connection. The other party MUST respond with a > close_notify alert of its own and close down the connection > immediately, discarding any pending writes. It is not required > for the initiator of the close to wait for the responding > close_notify alert before closing the read side of the connection. > > Therefore, we should probably eliminate the second tls_bio_shutdown() > call from tls_session_stop(). > > -- > Viktor. > > --- src/tls/tls_session.c > +++ src/tls/tls_session.c > @@ -95,18 +95,13 @@ void tls_session_stop(TLS_APPL_STATE *unused_ctx, > VSTREAM *stream, int timeou > msg_panic("%s: stream has no active TLS context", myname); > > /* > - * Perform SSL_shutdown() twice, as the first attempt will send out > the > - * shutdown alert but it will not wait for the peer's shutdown alert. > - * Therefore, when we are the first party to send the alert, we must > call > - * SSL_shutdown() again. On failure we don't want to resume the > session, > - * so we will not perform SSL_shutdown() and the session will be > removed > - * as being bad. > + * Perform SSL_shutdown(), sending out the shutdown alert without > waiting > + * for the peer's shutdown alert. On failure we don't want to resume > the > + * session, so we will not perform SSL_shutdown() and the session > will be > + * removed as being bad. > */ > - if (!failure) { > - retval = tls_bio_shutdown(vstream_fileno(stream), timeout, > TLScontext); > - if (retval == 0) > - tls_bio_shutdown(vstream_fileno(stream), timeout, TLScontext); > - } > + if (!failure) > + (void) tls_bio_shutdown(vstream_fileno(stream), timeout, > TLScontext); > tls_free_context(TLScontext); > tls_stream_stop(stream); > } > >