I can confirm quite easily that this number is artificially high.
I actually have a third ASSP instance. This one is specially configured to
only accept emails from Office 365 and acts as an outbound relay purely to
gather messages for the corpus. This one does not experience the problem
with no running workers being available.

Its stats haven't been reset for a long time and show an average of 2901
per day. Concurrent SMTP sessions since last restart are 0 (4 max).

The last message showing in the logs says:

2016-10-05 14:43:57 [Main_Thread] Info: Main_Thread got connection request
2016-10-05 14:43:57 [Main_Thread] Info: Main_Thread looks up the best Worker for new connection - 75
2016-10-05 14:43:57 [Main_Thread] Info: Main_Thread will wait (max 30 s) for the answer of Worker_7 which handles 0 sockets
2016-10-05 14:43:57 [Worker_7] Worker_7 wakes up
2016-10-05 14:43:57 [Main_Thread] Info: Main_Thread freed by idle Worker_7 in 0.005 seconds - got (ok)
2016-10-05 14:43:57 [Worker_7] Info: Worker_7 got connection from MainThread - 75/75
2016-10-05 14:43:57 [Worker_7] Info: Worker_7 freed Main_Thread - 84
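
For anyone wanting to compare these counters across handoffs, the relevant numbers can be pulled out of the log mechanically. What follows is a minimal Python sketch, not part of ASSP: the regular expressions are assumptions based only on the line layout in the excerpt above, and `summarize_handoff` is a hypothetical helper name.

```python
import re

# Sample lines copied from the excerpt above, with each wrapped entry
# joined back onto a single line.
SAMPLE_LOG = """\
2016-10-05 14:43:57 [Main_Thread] Info: Main_Thread looks up the best Worker for new connection - 75
2016-10-05 14:43:57 [Main_Thread] Info: Main_Thread will wait (max 30 s) for the answer of Worker_7 which handles 0 sockets
2016-10-05 14:43:57 [Main_Thread] Info: Main_Thread freed by idle Worker_7 in 0.005 seconds - got (ok)
2016-10-05 14:43:57 [Worker_7] Info: Worker_7 freed Main_Thread - 84
"""

# Patterns are assumptions derived from the sample lines only.
LOOKUP_RE = re.compile(r"looks up the best Worker for new connection - (\d+)")
SOCKETS_RE = re.compile(r"answer of (Worker_\d+) which handles (\d+) sockets")
FREED_RE = re.compile(r"Main_Thread freed by \w+ (Worker_\d+) in ([\d.]+) seconds")


def summarize_handoff(log_text):
    """Collect the counters logged while one connection is handed to a worker."""
    summary = {}
    for line in log_text.splitlines():
        if (m := LOOKUP_RE.search(line)):
            # Number logged after "new connection -"
            summary["reported_count"] = int(m.group(1))
        elif (m := SOCKETS_RE.search(line)):
            summary["worker"] = m.group(1)
            summary["worker_sockets"] = int(m.group(2))
        elif (m := FREED_RE.search(line)):
            # Time the main thread waited for the worker
            summary["handoff_seconds"] = float(m.group(2))
    return summary


if __name__ == "__main__":
    print(summarize_handoff(SAMPLE_LOG))
```

On the sample this returns the logged count (75), the worker's socket count (0) and the handoff time (0.005 s), which makes the mismatch between the first two easy to spot.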

No other mail is in flight at this time. There is just the one connection
and it still says 75. This server is firewalled and the only messages it
ever sees are from known Office 365 IP addresses along with the internal
test server that sends 1 message every 60 seconds.
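
A rough sanity check on why 75 looks too high for this instance, sketched in Python. It uses Little's law (average concurrency = arrival rate x average lifetime). The assumptions are mine: the 4 second average connection lifetime is borrowed from Thomas's estimate quoted below, the 2901-per-day average is treated as connections per day, and both function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 3600


def expected_concurrent(per_day, avg_lifetime_s):
    """Average number of simultaneous connections for a given daily volume."""
    return per_day * avg_lifetime_s / SECONDS_PER_DAY


def implied_per_day(concurrent, avg_lifetime_s):
    """Daily volume needed to sustain a given number of simultaneous connections."""
    return concurrent / avg_lifetime_s * SECONDS_PER_DAY


if __name__ == "__main__":
    # Relay instance: 2901 per day, assumed 4 s lifetime
    print(round(expected_concurrent(2901, 4), 3))   # 0.134
    # Volume that 75 genuinely concurrent connections would imply
    print(implied_per_day(75, 4))                   # 1620000.0
    # Thomas's figure from the quoted message: 70 / 4 * 3600 * 24
    print(implied_per_day(70, 4))                   # 1512000.0
```

Under those assumptions this relay should average well under one open connection, so a count of 75 cannot be explained by its real traffic.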

On Wed, Oct 5, 2016 at 2:33 PM, cw <colin.war...@gmail.com> wrote:

> Hi Thomas,
>
> Thank you for the rundown. The load is exceptionally high at the moment
> because everything backs up every time the mail servers stop. Having said
> that, I've just watched it happen again on one of the servers. It only had
> 7 messages showing in the shutdown_list when it happened. Looking at the
> logs I see:
>
> 2016-10-05 14:23:50 [Worker_4] Info: Worker_4 got connection from MainThread - 78/78
> 2016-10-05 14:23:50 [Main_Thread] Info: Main_Thread freed by interrupted Worker_4 in 10.758 seconds - got (ok)
> 2016-10-05 14:23:50 [Worker_4] Info: Worker_4 freed Main_Thread - 101
>
> So something is artificially inflating the connection count. My best guess
> ties in with something else I was looking at where a remote server was
> opening two connections that never got answered. It only got through on the
> third connection and that could be replicated each time. I put it down to
> some odd firewall on their end but maybe I was wrong in doing that.
>
> The actual stats on one mailserver are:
>
> Last reset 8 days, 0 hours, 6 mins
> Messages processed 165413 (20664 per day)
> SMTP Connections received 113278
>
> The other is:
> Last reset 8 days, 0 hours, 18 mins
> Messages processed 131720 (16438 per day)
> SMTP Connections received 93088
>
> The stats reset themselves during one of the ASSP upgrades, not sure why.
>
> Each server has 16 CPU cores and 12GB RAM. The DB is on a separate MySQL box,
> though the MTA and Clam run locally. URIBL is disabled. I don't use OCR, DCC or
> Razor. I tried to get OCR working once and all hell broke loose and I
> haven't been back since.
>
> I've been monitoring the performance of the servers all along and not
> seeing any bottlenecks in CPU, RAM or network. When the problem started the
> mail servers had 8 and 10GB RAM respectively (10GB for the one running
> rebuild).
>
> All the best,
> Colin
>
>
>
> On Wed, Oct 5, 2016 at 2:07 PM, Thomas Eckardt <thomas.ecka...@thockar.com> wrote:
>
>> It seems that this system is overloaded with active connections.
>>
>> >2016-10-05 11:47:41 [Main_Thread] Info: Main_Thread freed by interrupted
>> Worker_3 in 2.972 seconds - got (ok)
>>
>> Even this time value is ten times higher than expected.
>>
>> >2016-10-05 11:47:41 [Worker_3] Info: Worker_3 freed Main_Thread - 170
>>
>> The '170' is the filehandle number in Perl (counted from 1...), which
>> implies 70 or more active connections.
>>
>> >2016-10-05 11:47:38 [Main_Thread] Info: Main_Thread will wait (max 30 s)
>> for the answer of Worker_3 which handles 12 sockets
>>
>> This implies the same, if the equal balancing of all connections over all
>> seven workers works correctly.
>>
>> >2016-10-05 11:47:43 [Worker_3] 109.168.50.75 disconnected:
>> session:7FF3AA6E9A28 109.168.50.75 - processing time 2 seconds
>> >2016-10-05 11:47:43 [Worker_3] SC-Time Worker_3: 0.103252172470093
>> >2016-10-05 11:47:43 [Worker_3] SC-Time Worker_3: 0.0278699398040771
>> >2016-10-05 11:47:43 [Worker_3] SC-Time Worker_3: 0.0423040390014648
>>
>> The processing time itself seems to be normal. SC-Time is the time
>> required to process one loop (read + queue or read + check + queue) for
>> one active connection.
>>
>> Assuming an average lifetime of 4 seconds per connection, the workload
>> per day would be 1.5 million (70 / 4 * 3600 * 24) connections based on the
>> provided output.
>> Configured in ISP mode, the average maximum for a single assp instance is
>> 800,000 connections. To be able to handle workload peaks, the
>> configuration should not allow more than 400,000 connections a day.
>>
>> ISP mode:
>> - 16 CPU cores
>> - 16 Workers
>> - 24 GB RAM
>> - dedicated systems for MTA, enterprise database, ClamAV
>> - HMM and spamDB held unshared in RAM
>> - high performance DNS servers
>> - limited DNS usage (disable the most DNS-intensive checks, e.g. URIBL)
>> - no OCR
>> - no DCC
>> - no Razor2
>>
>> >2016-10-05 11:48:13 [Main_Thread] Info: Main_Thread freed by interrupted
>> Worker_3 in 31.940 seconds - got (ok)
>>
>> This time value is near the end of the line. At this point a small number of
>> additional connections can lead to an assp shutdown, because the
>> workers don't accept new connections.
>>
>> Thomas
>>
>>
>>
>> From:    cw <colin.war...@gmail.com>
>> To:      ASSP development mailing list <assp-test@lists.sourceforge.net>
>> Date:    2016-10-05 13:15
>> Subject: Re: [Assp-test] unable to detect any running worker
>>
>>
>>
>> Thanks.
>>
>> Both servers have hit "unable to detect any running worker" again since I
>> cleared out all the files suggested. So I'm getting 16279 running now.
>>
>> I noticed the startup with those files removed was really quick. Starting
>> back up the second time took several minutes, so presumably reading those
>> files during startup takes a little while versus creating them fresh.
>>
>> I've caught one already and traced it through the new logs:
>>
>> 2016-10-05 11:47:38 [Main_Thread] Info: Main_Thread got connection request
>> 2016-10-05 11:47:38 [Main_Thread] Info: Main_Thread looks up the best Worker for new connection - 73
>> 2016-10-05 11:47:38 [Main_Thread] Info: try to interrupt worker Worker_3 (12) for new connection
>> 2016-10-05 11:47:38 [Main_Thread] Info: Main_Thread interrupted Worker_3 (12) to submit the connection
>> 2016-10-05 11:47:38 [Main_Thread] Info: Main_Thread will wait (max 30 s) for the answer of Worker_3 which handles 12 sockets
>> 2016-10-05 11:47:41 [Worker_3] SC-Time Worker_3: 0.0462169647216797
>> 2016-10-05 11:47:41 [Worker_3] Info: Worker_3 got connection from MainThread - 73/73
>> 2016-10-05 11:47:41 [Worker_3] Info: Worker_3 freed Main_Thread - 170
>> 2016-10-05 11:47:41 [Main_Thread] Info: Main_Thread freed by interrupted Worker_3 in 2.972 seconds - got (ok)
>> 2016-10-05 11:47:41 [Worker_3] Connected: session:7FF3AA6E9A28 109.168.50.75:41612 > 92.63.138.65:25 > 127.0.0.1:125
>> 2016-10-05 11:47:43 [Worker_3] SC-Time Worker_3: 0.103252172470093
>> 2016-10-05 11:47:43 [Worker_3] SC-Time Worker_3: 0.0278699398040771
>> 2016-10-05 11:47:43 [Worker_3] 109.168.50.75 [SMTP Reply] 220 mail2.smtphost.co.uk ESMTP Exim 4.86_2 Ubuntu Wed, 05 Oct 2016 11:47:41 +0100
>> 2016-10-05 11:47:43 [Worker_3] SC-Time Worker_3: 0.0423040390014648
>> 2016-10-05 11:47:43 [Worker_3] 109.168.50.75 SC-Time Worker_3: 0.0296478271484375
>> 2016-10-05 11:47:43 [Worker_3] 109.168.50.75 disconnected: session:7FF3AA6E9A28 109.168.50.75 - processing time 2 seconds
>> 2016-10-05 11:48:13 [Main_Thread] Info: Main_Thread freed by interrupted Worker_3 in 31.940 seconds - got (ok)
>>
>> In this case, it looks like the connection ended without any actual data.
>> There is lots of activity reported by Worker_3 in between 11:47:43 and
>> 11:48:13 but this all pertains to other connections that were already in
>> progress.
>>
>> That is a bit different to the earlier one where a message was received and
>> the message completed within the 30s window.
>>
>> I'm not seeing anything to help me figure out why though and I don't want
>> to simply post a big excerpt of the maillog.txt.
>>
>> On Wed, Oct 5, 2016 at 11:02 AM, Thomas Eckardt <thomas.ecka...@thockar.com> wrote:
>>
>> > > I looked at SF but only see 16275 in test (updated 3 days ago) so maybe it
>> > > hasn't made its way live yet.
>> >
>> > Sorry, my background CVS sync was not running - update is done.
>> >
>> > Thomas
>> >
>> >
>> >
>> >
>> > From:    cw <colin.war...@gmail.com>
>> > To:      ASSP development mailing list <assp-test@lists.sourceforge.net>
>> > Date:    2016-10-05 11:48
>> > Subject: Re: [Assp-test] unable to detect any running worker
>> >
>> >
>> >
>> > Thank you Thomas.
>> >
>> > useDB4IntCache - already set to off
>> > I've set WorkerLog to diagnostic and done the other steps.
>> > I don't have anything in CorrectASSPcfg.pm so as part of the
>> > troubleshooting I have previously deleted it and downloaded a fresh copy
>> > from SourceForge.
>> >
>> > I have to say that ASSP started up within seconds after clearing those
>> > files out - it normally takes several minutes and has always done.
>> >
>> > I looked at SF but only see 16275 in test (updated 3 days ago) so maybe it
>> > hasn't made its way live yet.
>> >
>> > On Wed, Oct 5, 2016 at 10:33 AM, Thomas Eckardt <thomas.ecka...@thockar.com> wrote:
>> >
>> > > I've provided an updated assp.pl (2.5.4 16279) in CVS /test. This version
>> > > shows some more information, if 'WorkerLog' is set to diagnostic.
>> > >
>> > > Thomas
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > From:    cw <colin.war...@gmail.com>
>> > > To:      ASSP development mailing list <assp-test@lists.sourceforge.net>
>> > > Date:    2016-10-05 10:10
>> > > Subject: Re: [Assp-test] unable to detect any running worker
>> > >
>> > >
>> > >
>> > > Hi Thomas,
>> > >
>> > > Thanks for chipping in. All modules are installed by running the latest
>> > > mod_inst.pl. Crypt::GOST on Ubuntu actually has a bug in it so it requires
>> > > a minor edit to the code to get it to build. So that module was installed
>> > > by switching to /root/.cpan/build/Crypt-GOST-x-x-x and running:
>> > > make clean
>> > > perl Makefile.PL
>> > > make
>> > > make test
>> > > make install
>> > >
>> > > I run cpan-outdated -p|cpanm from time to time in order to keep modules up
>> > > to date as well so things don't stay stuck on old versions.
>> > >
>> > > Everything that is installed now is a completely fresh build as the upgrade
>> > > to 16.04 replaced perl 5.18 with 5.22 and all the modules therefore had to
>> > > be installed from scratch.
>> > >
>> > > I bypassed the issue by truncating the tables and setting up the users
>> > > again. With there only being one user it was easier to do that than muck
>> > > about with it - especially with the other bigger issue at hand.
>> > >
>> > > On Wed, Oct 5, 2016 at 8:59 AM, Thomas Eckardt <thomas.ecka...@thockar.com> wrote:
>> > >
>> > > > > to replace the corrupted encrypted
>> > > > > strings with the correct values
>> > > >
>> > > > Was the 'Crypt::GOST' module from the SF download page in use on the old
>> > > > assp instance?
>> > > > If it was, did you install the 'Crypt::GOST' module from the SF download
>> > > > page before you started the new assp instance?
>> > > >
>> > > > https://sourceforge.net/projects/assp/files/ASSP%20V2%20multithreading/ASSP%20V2%20module%20installation/Crypt-GOST/
>> > > >
>> > > > Thomas
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > > From:    cw <colin.war...@gmail.com>
>> > > > To:      ASSP development mailing list <assp-test@lists.sourceforge.net>
>> > > > Date:    2016-10-05 09:52
>> > > > Subject: Re: [Assp-test] unable to detect any running worker
>> > > >
>> > > >
>> > > >
>> > > > Cheers for the reply.
>> > > >
>> > > > I don't think backups are an option seeing as I've moved completely from
>> > > > Ubuntu 14.04 to Ubuntu 16.04.
>> > > >
>> > > > Also, this issue has been around for months. It was causing a handful of
>> > > > shutdowns a week with the occasional spate of more frequent shutdowns. It
>> > > > is entirely possible that the errors are behaviour related and nothing to
>> > > > do with the upgrade, and that current email behaviour is triggering it big
>> > > > style.
>> > > >
>> > > > I'm not convinced though. Unfortunately I'm not convinced of anything else
>> > > > either, hence not having much to go on. I can't see any consistencies in
>> > > > the behaviour leading up to the events.
>> > > >
>> > > > I don't think it is database related. It happened on one of the mail
>> > > > servers during the upgrade to 16.04 when ASSP had just started but I had
>> > > > not yet got into the web interface to replace the corrupted encrypted
>> > > > strings with the correct values, so all database connections were in error.
>> > > >
>> > > > The problem has already started again this morning so I can see this being
>> > > > another fun day that either leads to a fix or having to put something else
>> > > > in place.
>> > > >
>> > > > On Wed, Oct 5, 2016 at 4:34 AM, K Post <nntp.p...@gmail.com> wrote:
>> > > >
>> > > > > I've been reading here, but I haven't had anything to suggest. All seems
>> > > > > quite odd if it was working prior to upgrading and downgrading didn't work.
>> > > > >
>> > > > > Could you spin up a backup of the installation after copying the current
>> > > > > data?  Sure you'd have an older corpus, but I'd think you could add the new
>> > > > > files if necessary, manually replace whitelist etc.
>> > > > >
>> > > > >
>> > > > > On Tue, Oct 4, 2016 at 5:48 PM, cw <colin.war...@gmail.com> wrote:
>> > > > >
>> > > > > > Further development on this today, very little.
>> > > > > > I have moved both servers onto Ubuntu 16.04 LTS which means going from perl
>> > > > > > 5.18 to 5.22 and rebuilding all perl modules from scratch.
>> > > > > >
>> > > > > > The admin user db did not work after the upgrade so I had to empty the
>> > > > > > tables before it would come back online.
>> > > > > >
>> > > > > > I'm still getting delayed emails and assp shutting down telling me it is
>> > > > > > unable to detect any running worker.
>> > > > > >
>> > > > > > If this goes on much longer the MD will pull the plug and we'll end up
>> > > > > > moving to a third party solution which is not something I want but if I
>> > > > > > can't fix it I can't defend it :/
>> > > > > >
>> > > > > > ------------------------------------------------------------
>> > > > > > ------------------
>> > > > > > Check out the vibrant tech community on one of the world's most
>> > > > > > engaging tech sites, SlashDot.org! http://sdm.link/slashdot
>> > > > > > _______________________________________________
>> > > > > > Assp-test mailing list
>> > > > > > Assp-test@lists.sourceforge.net
>> > > > > > https://lists.sourceforge.net/lists/listinfo/assp-test
>> > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > > > DISCLAIMER:
>> > > > *******************************************************
>> > > > This email and any files transmitted with it may be confidential,
>> > > legally
>> > > > privileged and protected in law and are intended solely for the use
>> of
>> > > the
>> > > >
>> > > > individual to whom it is addressed.
>> > > > This email was multiple times scanned for viruses. There should be
>> no
>> > > > known virus in this email!
>> > > > *******************************************************