Hi all!

We used to have postfix 2.6.11 in our systems, which was then updated with no 
problems to 3.3.2.
However, during a benchmark, we realized 3.3.2 was 5 times slower than the 
version before. Even after disabling all mail filters the slowdown was still 
the same.

The benchmark consists of a large amount of emails being fetched and entering 
postfix via the smtpd daemon and then handed over to cyrus via lmtp and socket 
to be delivered mostly to the same recipient.

At first, we thought maybe the concurrency limits have changed, or maybe now 
there are less processes running, but all the configuration variables 
regarding this have not changed.

The benchmark uses 30000 emails, but generating 20 emails is sufficient to see 
the bug.

After trials and errors and running out of ideas, I decided to bisect the code 
to see what made this change.

It happens that version 2.11.1 behaves as slow as 3.3.2, and version 2.11.0 as 
fast as 2.6.11. All with the same hardware, same configuration, same load.

With version >= 2.11.1, the first emails are received and processed somewhat 
instantly, after some time it seems that lmtp is not processing anything and 
just waiting while all emails are fetched, so afterwards it starts delivering 
to the recipients at the rate of about 2 per second. (See excerpt of the log 
file)

Excerpt of maillog version > 2.11.1:

Mar 27 14:46:50 localdomain postfix/qmgr[24339]: 7079468: 
from=<d...@localdomain.com>, size=356, nrcpt=1 (queue active)
Mar 27 14:46:50 localdomain postfix/lmtp[24750]: 6CEFF61: 
to=<d...@localdomain.com>, orig_to=<dst@localhost>, relay=localdomain.com[/var/
imap/socket/lmtp], delay=0.02, delays=0.01/0/0.01/0, dsn=2.1.5, status=sent 
(250 2.1.5 Ok SESSIONID=<localdomain.com-24421-1553694410-1>)
Mar 27 14:46:50 localdomain postfix/smtpd[24746]: 718F46A: 
client=localdomain.com[127.0.0.1]
Mar 27 14:46:50 localdomain postfix/qmgr[24339]: 6CEFF61: removed
Mar 27 14:46:50 localdomain postfix/cleanup[24748]: 718F46A: message-
id=<20190327134650.718f...@localdomain.com>
Mar 27 14:46:50 localdomain postfix/qmgr[24339]: 718F46A: 
from=<d...@localdomain.com>, size=356, nrcpt=1 (queue active)
Mar 27 14:46:50 localdomain postfix/smtpd[24746]: 7273461: 
client=localdomain.com[127.0.0.1]
Mar 27 14:46:50 localdomain postfix/lmtp[24752]: 6E8CE63: 
to=<d...@localdomain.com>, orig_to=<dst@localhost>, relay=localdomain.com[/var/
imap/socket/lmtp], delay=0.02, delays=0/0/0.01/0.01, dsn=2.1.5, status=sent 
(250 2.1.5 Ok SESSIONID=<localdomain.com-24753-1553694410-1>)
Mar 27 14:46:50 localdomain postfix/qmgr[24339]: 6E8CE63: removed
...
...
Mar 27 14:46:50 localdomain postfix/cleanup[24748]: 764A06A: message-
id=<20190327134650.764a...@localdomain.com>
Mar 27 14:46:50 localdomain postfix/qmgr[24339]: 764A06A: 
from=<d...@localdomain.com>, size=356, nrcpt=1 (queue active)
Mar 27 14:46:50 localdomain postfix/smtpd[24746]: 778DC61: 
client=localdomain.com[127.0.0.1]
Mar 27 14:46:50 localdomain postfix/cleanup[24758]: 778DC61: message-
id=<20190327134650.778d...@localdomain.com>
Mar 27 14:46:50 localdomain postfix/qmgr[24339]: 778DC61: 
from=<d...@localdomain.com>, size=356, nrcpt=1 (queue active)
Mar 27 14:46:50 localdomain postfix/smtpd[24746]: 786A96B: 
client=localdomain.com[127.0.0.1]
Mar 27 14:46:50 localdomain postfix/cleanup[24758]: 786A96B: message-
id=<20190327134650.786a...@localdomain.com>
...
...
Mar 27 14:46:52 localdomain postfix/lmtp[24752]: 764A06A: 
to=<d...@localdomain.com>, orig_to=<dst@localhost>, relay=localdomain.com[/var/
imap/socket/lmtp], delay=2, delays=0/2/0/0, dsn=2.1.5, status=sent (250 2.1.5 
Ok SESSIONID=<localdomain.com-24757-1553694412-1>)
Mar 27 14:46:52 localdomain postfix/qmgr[24339]: 764A06A: removed
Mar 27 14:46:54 localdomain postfix/lmtp[24750]: 778DC61: 
to=<d...@localdomain.com>, orig_to=<dst@localhost>, relay=localdomain.com[/var/
imap/socket/lmtp], delay=4, delays=0.01/2/2/0.02, dsn=2.1.5, status=sent (250 
2.1.5 Ok SESSIONID=<localdomain.com-24421-1553694414-1>)
Mar 27 14:46:54 localdomain postfix/qmgr[24339]: 778DC61: removed
...
...
Mar 27 14:46:58 localdomain postfix/lmtp[24750]: 7CFDE72: 
to=<d...@localdomain.com>, orig_to=<dst@localhost>, relay=localdomain.com[/var/
imap/socket/lmtp], delay=8.1, delays=0/6/2/0.02, dsn=2.1.5, status=sent (250 
2.1.5 Ok SESSIONID=<localdomain.com-24421-1553694418-1>)
Mar 27 14:46:58 localdomain postfix/qmgr[24339]: 7CFDE72: removed



With version <= 2.11.0, the messages are delivered as soon as they arrive.

Excerpt of maillog version <= 2.11.0:

...
...
Mar 27 15:00:35 localdomain postfix/qmgr[26443]: AA6CC67: removed
Mar 27 15:00:35 localdomain postfix/smtpd[26446]: AC80663: 
client=localdomain.com[127.0.0.1]
Mar 27 15:00:35 localdomain postfix/cleanup[26456]: AC80663: message-
id=<20190327140035.ac80...@localdomain.com>
Mar 27 15:00:35 localdomain postfix/qmgr[26443]: AC80663: 
from=<d...@localdomain.com>, size=356, nrcpt=1 (queue active)
Mar 27 15:00:35 localdomain postfix/lmtp[26451]: AB58261: 
to=<d...@localdomain.com>, orig_to=<dst@localhost>, relay=localdomain.com[/var/
imap/socket/lmtp], conn_use=2, delay=0.01, delays=0/0/0/0, dsn=2.1.5, 
status=sent (250 2.1.5 Ok SESSIONID=<localdomain.com-24769-1553695235-7>)
Mar 27 15:00:35 localdomain postfix/smtpd[26446]: AD4EB67: 
client=localdomain.com[127.0.0.1]
Mar 27 15:00:35 localdomain postfix/qmgr[26443]: AB58261: removed
Mar 27 15:00:35 localdomain postfix/cleanup[26456]: AD4EB67: message-
id=<20190327140035.ad4e...@localdomain.com>
Mar 27 15:00:35 localdomain postfix/qmgr[26443]: AD4EB67: 
from=<d...@localdomain.com>, size=356, nrcpt=1 (queue active)
Mar 27 15:00:35 localdomain postfix/smtpd[26446]: disconnect from 
localdomain.com[127.0.0.1]
Mar 27 15:00:35 localdomain postfix/lmtp[26453]: AC80663: 
to=<d...@localdomain.com>, orig_to=<dst@localhost>, relay=localdomain.com[/var/
imap/socket/lmtp], conn_use=2, delay=0.01, delays=0/0/0/0, dsn=2.1.5, 
status=sent (250 2.1.5 Ok SESSIONID=<localdomain.com-26455-1553695235-5>)
Mar 27 15:00:35 localdomain postfix/qmgr[26443]: AC80663: removed
Mar 27 15:00:35 localdomain postfix/lmtp[26451]: AD4EB67: 
to=<d...@localdomain.com>, orig_to=<dst@localhost>, relay=localdomain.com[/var/
imap/socket/lmtp], delay=0.01, delays=0/0/0/0, dsn=2.1.5, status=sent (250 
2.1.5 Ok SESSIONID=<localdomain.com-26452-1553695235-7>)
Mar 27 15:00:35 localdomain postfix/qmgr[26443]: AD4EB67: removed



Going deeper with bisecting the code, I found out that the following line 
created this behaviour change:

diff --git a/postfix/src/smtp/smtp.h b/postfix/src/smtp/smtp.h
index 336a4f47..5437088e 100644
--- a/postfix/src/smtp/smtp.h
+++ b/postfix/src/smtp/smtp.h
@@ -195,7 +195,7 @@ typedef struct SMTP_STATE {
        STR((state)->iterator->request_nexthop)[0] = 0; \
     }
 
-#define HAVE_NEXTHOP_STATE(state) (STR((state)->iterator->request_nexthop) != 
0)
+#define HAVE_NEXTHOP_STATE(state) (STR((state)->iterator->request_nexthop)[0] 
!= 0)
 
 
  /*

  
This changes the result of the conditions in a few places throughout the smtp 
code. Also, "request_nexthop" can only acquire a value different than 0 during 
the method "smtp_connect_inet", which probably shouldn't run during a unix 
socket connection. 

Questions:

(1) Does it mean that the lmtp connections via socket are never cached or 
reused?
(2) Is this expected behaviour? Is this a bug?



Other configs and outputs:


Excerpt from main.cf: (only the configs that I believe are pertinent)

mailbox_transport = lmtp:unix:/var/imap/socket/lmtp
lmtp_destination_recipient_limit = 1
local_destination_concurrency_limit = 2


Output from scache after 20 emails:

version <= 2.11.0:

postfix/scache[26740]: statistics: start interval Mar 27 15:20:50
postfix/scache[26740]: statistics: domain lookup hits=7 miss=8 success=46%
postfix/scache[26740]: statistics: max simultaneous domains=1 addresses=1 
connection=2


version >= 2.11.1:

postfix/scache[29823]: statistics: start interval Mar 27 15:24:59
postfix/scache[29823]: statistics: domain lookup hits=0 miss=17 success=0%
postfix/scache[29823]: statistics: max simultaneous domains=0 addresses=1 
connection=4


Some help to make sense of all of it would be really high appreciated.

Julie.


Reply via email to