On Wed, Mar 27, 2019 at 03:36:28PM +0100, Juliana Rodrigueiro wrote: > However, during a benchmark, we realized 3.3.2 was 5 times slower than the > version before.
This is misleading. Postfix is not 5 times slower, your benchmark appears to be measuring the LMTP delivery rate to a single sink mailbox, and this artificial work-load may be slower under some adverse conditions (below...). > With version >= 2.11.1, the first emails are received and processed somewhat > instantly, How many? Are those connections closed or kept open in scache? > Mar 27 14:46:50 localdomain postfix/qmgr[24339]: 7079468: > from=<d...@localdomain.com>, size=356, nrcpt=1 (queue active) > Mar 27 14:46:50 localdomain postfix/lmtp[24750]: 6CEFF61: > to=<d...@localdomain.com>, orig_to=<dst@localhost>, > relay=localdomain.com[/var/ > imap/socket/lmtp], delay=0.02, delays=0.01/0/0.01/0, dsn=2.1.5, status=sent > (250 2.1.5 Ok SESSIONID=<localdomain.com-24421-1553694410-1>) > Mar 27 14:46:50 localdomain postfix/lmtp[24752]: 6E8CE63: > to=<d...@localdomain.com>, orig_to=<dst@localhost>, > relay=localdomain.com[/var/ > imap/socket/lmtp], delay=0.02, delays=0/0/0.01/0.01, dsn=2.1.5, status=sent > (250 2.1.5 Ok SESSIONID=<localdomain.com-24753-1553694410-1>) > Mar 27 14:46:52 localdomain postfix/lmtp[24752]: 764A06A: > to=<d...@localdomain.com>, orig_to=<dst@localhost>, > relay=localdomain.com[/var/ > imap/socket/lmtp], delay=2, delays=0/2/0/0, dsn=2.1.5, status=sent (250 2.1.5 > Ok SESSIONID=<localdomain.com-24757-1553694412-1>) This message waited 2s in the active queue, before it was its turn to go, this suggests some other deliveries were still in progress... > Mar 27 14:46:54 localdomain postfix/lmtp[24750]: 778DC61: > to=<d...@localdomain.com>, orig_to=<dst@localhost>, > relay=localdomain.com[/var/ > imap/socket/lmtp], delay=4, delays=0.01/2/2/0.02, dsn=2.1.5, status=sent (250 This delivery took 2 seconds to complete a connection, why? > Mar 27 14:46:58 localdomain postfix/lmtp[24750]: 7CFDE72: > to=<d...@localdomain.com>, orig_to=<dst@localhost>, > relay=localdomain.com[/var/ > imap/socket/lmtp], delay=8.1, delays=0/6/2/0.02, dsn=2.1.5, status=sent (250 This delivery spent 6s waiting in the active queue and 2s to complete a connection. Why are connections to the LMTP server so expensive? Is your syslog service logging synchronously? Perhaps logging of connection establishment (in the LMTP server) is causing congestion at LMTP connect? What is the LMTP server doing in connect and LHLO processing? > Mar 27 15:00:35 localdomain postfix/lmtp[26451]: AB58261: > to=<d...@localdomain.com>, orig_to=<dst@localhost>, > relay=localdomain.com[/var/ > imap/socket/lmtp], conn_use=2, delay=0.01, delays=0/0/0/0, dsn=2.1.5, > status=sent (250 2.1.5 Ok SESSIONID=<localdomain.com-24769-1553695235-7>) This shows LMTP connection re-use, but that should not make much difference with unix-domain sockets. Local connections are supposed to be fast. > Mar 27 15:00:35 localdomain postfix/lmtp[26453]: AC80663: > to=<d...@localdomain.com>, orig_to=<dst@localhost>, > relay=localdomain.com[/var/ > imap/socket/lmtp], conn_use=2, delay=0.01, delays=0/0/0/0, dsn=2.1.5, > status=sent (250 2.1.5 Ok SESSIONID=<localdomain.com-26455-1553695235-5>) Again re-use, but the next connection is fresh: > Mar 27 15:00:35 localdomain postfix/lmtp[26451]: AD4EB67: > to=<d...@localdomain.com>, orig_to=<dst@localhost>, > relay=localdomain.com[/var/ > imap/socket/lmtp], delay=0.01, delays=0/0/0/0, dsn=2.1.5, status=sent (250 > 2.1.5 Ok SESSIONID=<localdomain.com-26452-1553695235-7>) > Going deeper with bisecting the code, I found out that the following line > created this behaviour change: > > -#define HAVE_NEXTHOP_STATE(state) (STR((state)->iterator->request_nexthop) > != > 0) > +#define HAVE_NEXTHOP_STATE(state) > (STR((state)->iterator->request_nexthop)[0] != 0) That's a necessary bugfix, without it the test is always true. > Questions: > > (1) Does it mean that the lmtp connections via socket are never cached or > reused? There is no caching by nexthop domain, since there is no nexthop domain for unix-domain LMTP sockets. All there is is a nexthop socket address. So session caching, if any, would naturally be just by endpoint address. However, in 2.11 the smtp_connect_local() function also changed: https://github.com/vdukhovni/postfix/commit/072fa2d1f7beaf41756c30a69ef96cb611461cec#diff-24650cb98191c311688117b61626fffbR506 to use smtp_reuse_nexthop() rather than smtp_reuse_addr(), so that SASL authenticated connections might be re-used, but this looks like a mistake, since we're therefore creating cached sessions that will never be re-used, but that tie up LMTP server resources, possibly leading to subsequent congestion under load. > (2) Is this expected behaviour? Is this a bug? There's likely a bug. We should either simulate a synthetic nexthop ($myhostname?) for unix-domain destinations, and then do nexthop reuse (and perhaps do no caching by endpoint address for unix-domain destinations), or go back to using re-use by endpoint address. Currently there's a mismatch, and we're keeping connections alive that can't be reused. The fix will be easy, once we decide on the fine details. -- Viktor.