There's a lot to consider here.
First up, you could do with figuring out the rate at which Exim is
dumping mail into the queue, then throttling it using Exim to see if the
problem goes away. That way you'll know whether the problem is with the
rate or with the number/type of email. Look at things like smtp_accept_max
or maybe queue_smtp_domains to make deliveries go through the queue rather
than opening up a new SMTP thread for every message.
The size of the emails and their encoding/attachments will likely make
things take longer too.
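To get a rough number for that rate before throttling anything, something
like the following Python sketch could tally Exim's mainlog per minute. It
assumes the default mainlog layout (timestamp, an optional [pid] if
log_selector includes +pid, the message ID, then the flag: <= arrival,
=> delivery, == deferral, ** failure). The script and its names are only
for illustration; adjust the regex to match your own log_selector settings.

```python
import re
import sys
from collections import Counter, defaultdict
from typing import Dict, Iterable

# Assumed default Exim mainlog layout: "YYYY-MM-DD HH:MM:SS [pid] <msgid> <flag> ..."
# The "[pid]" part only appears when log_selector includes +pid.
LINE_RE = re.compile(
    r"^(?P<minute>\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}\s+"
    r"(?:\[\d+\]\s+)?"
    r"\S+\s+(?P<flag><=|=>|==|\*\*)\s"
)

FLAG_NAMES = {"<=": "arrived", "=>": "delivered", "==": "deferred", "**": "failed"}


def rates_per_minute(lines: Iterable[str]) -> Dict[str, Counter]:
    """Count arrivals, deliveries, deferrals and failures per minute."""
    per_minute: Dict[str, Counter] = defaultdict(Counter)
    for line in lines:
        m = LINE_RE.match(line)
        if m:  # lines such as "Completed" or daemon notices are skipped
            per_minute[m.group("minute")][FLAG_NAMES[m.group("flag")]] += 1
    return dict(per_minute)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for minute, counts in sorted(rates_per_minute(log).items()):
            print(minute, dict(counts))
```

Run it against the mainlog (on Ubuntu's exim4 packages that is usually
/var/log/exim4/mainlog). A run of deferrals straight after a burst of
arrivals would point at the rate rather than the content.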
Secondly, you need to turn on debugging to find out whether something is
happening that is causing ASSP to take a long time to handle messages.
Thirdly, resources. I've had a maximum of 112 concurrent connections
showing on the stats page, though it is only 43 since the last restart on
Sunday, so the general average is lower. I have two VMs running Ubuntu
with 16 vCPUs: 12GB of RAM on the primary and 16GB on the secondary, as
that one runs the rebuild. MySQL is on a separate machine again, with
16 vCPUs and 8GB RAM.
So ASSP can easily handle the throughput you're looking at and more; you
need to look for bottlenecks and other errors. The actual issue will
have occurred at least 30s before the logs you have posted, i.e. at or
before 08:27:17, as that is when the timeout counter that expired at
08:27:47 started.
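The arithmetic is just the expiry time minus ConnectionTransferTimeOut
(08:27:47 - 30s = 08:27:17). Here is a small Python sketch that pulls that
window, plus a margin, out of an ASSP log. The timestamp format is taken
from the quoted log lines; the helper names and the 60 second margin are
arbitrary choices for illustration. Unstamped continuation lines are kept
with the stamped line before them.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Optional

ASSP_TS_FORMAT = "%b-%d-%y %H:%M:%S"  # e.g. "Jul-07-17 08:27:47"
TS_LEN = len("Jul-07-17 08:27:47")


def parse_ts(line: str) -> Optional[datetime]:
    """Return the leading ASSP timestamp of a log line, or None if it has none."""
    try:
        return datetime.strptime(line[:TS_LEN], ASSP_TS_FORMAT)
    except ValueError:
        return None


def window_start(expired_at: str, timeout_s: int = 30) -> datetime:
    """When the timer that expired at `expired_at` must have started."""
    return datetime.strptime(expired_at, ASSP_TS_FORMAT) - timedelta(seconds=timeout_s)


def lines_before_timeout(lines: Iterable[str], expired_at: str,
                         timeout_s: int = 30, margin_s: int = 60) -> List[str]:
    """Select lines from (timer start - margin) up to and including the expiry."""
    start = window_start(expired_at, timeout_s) - timedelta(seconds=margin_s)
    end = datetime.strptime(expired_at, ASSP_TS_FORMAT)
    selected: List[str] = []
    keep = False
    for line in lines:
        ts = parse_ts(line)
        if ts is not None:  # continuation lines inherit the previous decision
            keep = start <= ts <= end
        if keep:
            selected.append(line)
    return selected
```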
"Cannot pack NaN" makes me suspicious as well, so the usual applies: check
that all Perl modules and ancillary files are up to date, as well as the
main assp.pl. Something isn't right.
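For context on the NaN: Perl's pack with 'C' expects an unsigned 8-bit
integer, so that message suggests a non-numeric value reached the octet
arithmetic in ipNetwork. As a rough Python analogue only (this is not
ASSP's code, and network_bytes is a hypothetical helper), validating the
address before packing turns the same kind of mistake into a clear error
instead of a bad value:

```python
import ipaddress
import struct


def network_bytes(ip: str, prefix: int) -> bytes:
    """Hypothetical helper: pack the IPv4 network address of ip/prefix as 4 unsigned bytes."""
    # IPv4Network raises ValueError (AddressValueError/NetmaskValueError) on malformed input,
    # so nothing non-numeric ever reaches struct.pack.
    net = ipaddress.IPv4Network(f"{ip}/{prefix}", strict=False)
    octets = [int(o) for o in str(net.network_address).split(".")]
    return struct.pack("4B", *octets)  # "B" is unsigned char, like Perl's 'C'
```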
Then there's another question about the config. Is there a
particular reason the Exim server needs to run through ASSP? All my
servers accept email and then hand off to Exim for delivery. There are
plenty of servers that use ASSP as a smart host, but I'd question
putting a server that dumps mail like that through it. The reason is the
types of emails and their effect on the corpus. If you're dumping a
mailing list through, then you're going to affect the Bayes/HMM
database. You could redlist it, but then why waste the resources and not
just have Exim send direct?
I know it's been a week or so since you posted; hopefully you've done
some or all of that by now, as it is fairly standard troubleshooting
rather than anything specific to ASSP. If you've confirmed your setup is
in order and can pull some logs that show ASSP actually causing a
problem, then that's what the list is for.
All the best,
Colin.
On 07/07/2017 16:39, MK wrote:
Using ASSP CVS 2.5.6/17184.
I have a server that pumps about 1800 messages into a queue, and Exim
on that server makes connections to ASSP to forward the mail.
Basically, ASSP is the outgoing mail server.
It gets through about 140 messages, at which point the SMTP connections
time out (per Exim's logs). I'm not sure what concurrency it generates
to do so, but the connections to the proxy SMTP server it sends to
get to about 40 right away and then drop off (so I assume that means
my concurrency is about 40).
Meanwhile, ASSP shows:
...[all is fine to here]...
Jul-07-17 08:27:46 [Main_Thread] Info: unable to detect any running worker for a new connection - wait (max 30 seconds)
...[repeated]...
Jul-07-17 08:27:47 [Main_Thread] Info: unable to detect any running worker for a new connection - wait (max 30 seconds)
Jul-07-17 08:27:47 [Main_Thread] Info: ConnectionTransferTimeOut (30 seconds) is now reached
Jul-07-17 08:27:47 [Main_Thread] Warning: Main_Thread is unable to transfer connection to any worker - try again!
Jul-07-17 08:27:47 [Main_Thread] Error: Main_Thread is unable to transfer connection to any worker within 120 seconds - restart ASSP !
Jul-07-17 08:27:47 [Main_Thread] Initializing shutdown sequence
Jul-07-17 08:27:47 [Shutdown] Info: removing all SMTP and Proxy listeners
Jul-07-17 08:27:47 [Worker_4] Info: shutdown: Worker_4: Cannot pack NaN with 'C' at sub main::ipNetwork line 11.
Jul-07-17 08:27:47 [Worker_3] Info: shutdown: Worker_3: Cannot pack NaN with 'C' at sub main::ipNetwork line 11.
Jul-07-17 08:27:47 [Worker_5] Info: shutdown: Worker_5: Cannot pack NaN with 'C' at sub main::ipNetwork line 11.
Jul-07-17 08:27:47 [Worker_3] Worker_3 finished
Jul-07-17 08:27:47 [Worker_4] Worker_4 finished
Jul-07-17 08:27:47 [Worker_5] Worker_5 finished
Jul-07-17 08:27:47 [Worker_2] Info: shutdown: Worker_2: Cannot pack NaN with 'C' at sub main::ipNetwork line 11.
Jul-07-17 08:27:47 [Worker_2] Worker_2 finished
Jul-07-17 08:27:47 [Shutdown] Waiting for all SMTP-Workers to be finished
Jul-07-17 08:27:47 [Worker_1] Info: shutdown: Worker_1: Cannot pack NaN with 'C' at sub main::ipNetwork line 11.
Once ASSP restarts and the retry interval is reached, ASSP tries
again, makes it through about 200 messages, and then the same thing
happens. Of course, what it's doing is flooding ASSP with SMTP connections.
The host is in acceptAllMail (yes, I know we're not using relayPort, but
we need to make sure the SMTP server can handle a flood of connections
gracefully).
maxSMTPSessions is 64 and maxSMTPipSessions is 15, and given the
status of the workers, I don't think it's hitting those limits.
NumComWorkers (SMTP threads) is 5, EnableHighPerformance is off,
ThreadCycleTime is 3000, and the IO::Poll engine is in use.
We're using a local BIND resolver, which shows nothing strange.
Any thoughts?
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Assp-test mailing list
Assp-test@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/assp-test