On Wed, Mar 27, 2019 at 03:36:28PM +0100, Juliana Rodrigueiro wrote:

> However, during a benchmark, we realized 3.3.2 was 5 times slower than the 
> version before.

This is misleading.  Postfix is not 5 times slower, your benchmark
appears to be measuring the LMTP delivery rate to a single sink
mailbox, and this artificial work-load may be slower under some
adverse conditions (below...).

> With version >= 2.11.1, the first emails are received and processed somewhat 
> instantly,

How many?  Are those connections closed or kept open in scache?

> Mar 27 14:46:50 localdomain postfix/qmgr[24339]: 7079468: 
> from=<d...@localdomain.com>, size=356, nrcpt=1 (queue active)
> Mar 27 14:46:50 localdomain postfix/lmtp[24750]: 6CEFF61: 
> to=<d...@localdomain.com>, orig_to=<dst@localhost>, 
> relay=localdomain.com[/var/
> imap/socket/lmtp], delay=0.02, delays=0.01/0/0.01/0, dsn=2.1.5, status=sent 
> (250 2.1.5 Ok SESSIONID=<localdomain.com-24421-1553694410-1>)

> Mar 27 14:46:50 localdomain postfix/lmtp[24752]: 6E8CE63: 
> to=<d...@localdomain.com>, orig_to=<dst@localhost>, 
> relay=localdomain.com[/var/
> imap/socket/lmtp], delay=0.02, delays=0/0/0.01/0.01, dsn=2.1.5, status=sent 
> (250 2.1.5 Ok SESSIONID=<localdomain.com-24753-1553694410-1>)

> Mar 27 14:46:52 localdomain postfix/lmtp[24752]: 764A06A: 
> to=<d...@localdomain.com>, orig_to=<dst@localhost>, 
> relay=localdomain.com[/var/
> imap/socket/lmtp], delay=2, delays=0/2/0/0, dsn=2.1.5, status=sent (250 2.1.5 
> Ok SESSIONID=<localdomain.com-24757-1553694412-1>)

This message waited 2s in the active queue, before it was its turn
to go, this suggests some other deliveries were still in progress...

> Mar 27 14:46:54 localdomain postfix/lmtp[24750]: 778DC61: 
> to=<d...@localdomain.com>, orig_to=<dst@localhost>, 
> relay=localdomain.com[/var/
> imap/socket/lmtp], delay=4, delays=0.01/2/2/0.02, dsn=2.1.5, status=sent (250 

This delivery took 2 seconds to complete a connection, why?

> Mar 27 14:46:58 localdomain postfix/lmtp[24750]: 7CFDE72: 
> to=<d...@localdomain.com>, orig_to=<dst@localhost>, 
> relay=localdomain.com[/var/
> imap/socket/lmtp], delay=8.1, delays=0/6/2/0.02, dsn=2.1.5, status=sent (250 

This delivery spent 6s waiting in the active queue and 2s to complete
a connection.  Why are connections to the LMTP server so expensive?
Is your syslog service logging synchronously?  Perhaps logging of
connection establishment (in the LMTP server) is causing congestion
at LMTP connect?  What is the LMTP server doing in connect and LHLO
processing?

> Mar 27 15:00:35 localdomain postfix/lmtp[26451]: AB58261: 
> to=<d...@localdomain.com>, orig_to=<dst@localhost>, 
> relay=localdomain.com[/var/
> imap/socket/lmtp], conn_use=2, delay=0.01, delays=0/0/0/0, dsn=2.1.5, 
> status=sent (250 2.1.5 Ok SESSIONID=<localdomain.com-24769-1553695235-7>)

This shows LMTP connection re-use, but that should not make much
difference with unix-domain sockets.  Local connections are supposed
to be fast.

> Mar 27 15:00:35 localdomain postfix/lmtp[26453]: AC80663: 
> to=<d...@localdomain.com>, orig_to=<dst@localhost>, 
> relay=localdomain.com[/var/
> imap/socket/lmtp], conn_use=2, delay=0.01, delays=0/0/0/0, dsn=2.1.5, 
> status=sent (250 2.1.5 Ok SESSIONID=<localdomain.com-26455-1553695235-5>)

Again re-use, but the next connection is fresh:

> Mar 27 15:00:35 localdomain postfix/lmtp[26451]: AD4EB67: 
> to=<d...@localdomain.com>, orig_to=<dst@localhost>, 
> relay=localdomain.com[/var/
> imap/socket/lmtp], delay=0.01, delays=0/0/0/0, dsn=2.1.5, status=sent (250 
> 2.1.5 Ok SESSIONID=<localdomain.com-26452-1553695235-7>)

> Going deeper with bisecting the code, I found out that the following line 
> created this behaviour change:
> 
> -#define HAVE_NEXTHOP_STATE(state) (STR((state)->iterator->request_nexthop) 
> != 
> 0)
> +#define HAVE_NEXTHOP_STATE(state) 
> (STR((state)->iterator->request_nexthop)[0] != 0)

That's a necessary bugfix, without it the test is always true.

> Questions:
> 
> (1) Does it mean that the lmtp connections via socket are never cached or 
> reused?

There is no caching by nexthop domain, since there is no nexthop
domain for unix-domain LMTP sockets.  All there is is a nexthop
socket address.  So session caching, if any, would naturally be
just by endpoint address.  However, in 2.11 the smtp_connect_local()
function also changed:

    
https://github.com/vdukhovni/postfix/commit/072fa2d1f7beaf41756c30a69ef96cb611461cec#diff-24650cb98191c311688117b61626fffbR506

to use smtp_reuse_nexthop() rather than smtp_reuse_addr(), so that
SASL authenticated connections might be re-used, but this looks
like a mistake, since we're therefore creating cached sessions that
will never be re-used, but that tie up LMTP server resources,
possibly leading to subsequent congestion under load.

> (2) Is this expected behaviour? Is this a bug?

There's likely a bug.  We should either simulate a synthetic nexthop
($myhostname?) for unix-domain destinations, and then do nexthop
reuse (and perhaps do no caching by endpoint address for unix-domain
destinations), or go back to using re-use by endpoint address.
Currently there's a mismatch, and we're keeping connections alive
that can't be reused.

The fix will be easy, once we decide on the fine details.

-- 
        Viktor.

Reply via email to