I saw the same issue with stuck workers, and also went back to b14097. PTROK was where I last saw them as well.
My signal ALRM error was different from yours though:

May-02-14 00:05:27 [Main_Thread] Warning: got unexpected signal ALRM in Main_Thread: package - IO::Socket::SSL, file - /opt/perl-5.14.3/lib/site_perl/5.14.3/IO/Socket/SSL.pm, line - 757!

-C

Colin Waring said the following on 5/2/2014 2:40 AM:
> Primary mailserver just crashed out twice and restarted ASSP:
>
> 2014-05-02 09:51:10 [Main_Thread] Info: Main_Thread freed by interrupted Worker_6 in 0.851 seconds - got (ok)
> 2014-05-02 09:51:10 [Worker_6] 1.1.1.1 IP 1.1.1.1 matches acceptAllMail - with 1.1.1.1/32
> 2014-05-02 09:51:10 [Worker_6] Connected: session:9D3C0E50 1.1.1.1:37891 > 195.88.101.110:25 > 127.0.0.1:125
> 2014-05-02 09:51:11 [Main_Thread] Info: Main_Thread got connection request
> 2014-05-02 09:51:11 m1-20662-07189 [Worker_6] 2.2.2.2 <[email protected]> to: [email protected] [Plugin] calling plugin ASSP_AFC
> 2014-05-02 09:51:16 [Worker_6] ClamAv Down
> 2014-05-02 09:51:16 [Worker_10000] Info: Name Server 194.168.4.123: ResponseTime = 16 ms for sourceforge.net
> 2014-05-02 09:51:33 [Worker_6] ClamAv Up
> 2014-05-02 09:51:41 [Main_Thread] Warning: got unexpected signal ALRM in Main_Thread: package - main, file - sub main::ThreadYield, line - 2!
> 2014-05-02 09:51:42 [Main_Thread] Info: unable to detect any running worker for a new connection - wait (max 30 seconds)
> 2014-05-02 09:51:42 [Main_Thread] Info: unable to detect any running worker for a new connection - wait (max 30 seconds)
> 2014-05-02 09:51:42 [Main_Thread] Info: unable to detect any running worker for a new connection - wait (max 30 seconds)
> 2014-05-02 09:51:42 [Main_Thread] Info: unable to detect any running worker for a new connection - wait (max 30 seconds)
> 2014-05-02 09:51:42 [Main_Thread] Info: unable to detect any running worker for a new connection - wait (max 30 seconds)
> 2014-05-02 09:51:42 [Main_Thread] Info: unable to detect any running worker for a new connection - wait (max 30 seconds)
> 2014-05-02 09:51:42 [Main_Thread] Info: unable to detect any running worker for a new connection - wait (max 30 seconds)
>
> 2014-05-02 10:24:53 [Main_Thread] Warning: got unexpected signal ALRM in Main_Thread: package - main, file - sub main::ThreadYield, line - 15!
> 2014-05-02 10:25:17 [Worker_10000] Info: Name Server 194.168.4.123: ResponseTime = 30 ms for sourceforge.net
> 2014-05-02 10:25:43 [Worker_10000] Info: synchronizing all BerkeleyDB hashes to disk
> 2014-05-02 10:25:43 [Worker_10000] Info: compacting all BerkeleyDB hashes on disk
> 2014-05-02 10:25:45 [Main_Thread] Warning: Main_Thread is unable to transfer connection to any worker - try again!
> 2014-05-02 10:25:48 [Worker_10000] SSLfailedCache: cleaning cache finished: IP's before=2, deleted=0
> 2014-05-02 10:25:48 [Worker_10000] LocalFrequency: cleaning cache finished: addresses's before=4, deleted=3
> 2014-05-02 10:25:48 [Worker_10000] SubjectFrequency: cleaning cache finished: subjects before=44, deleted=28
> 2014-05-02 10:26:16 [Main_Thread] Info: Loop in Worker_6 was not active for 206 seconds
> 2014-05-02 10:26:16 [Main_Thread] Info: Worker_6 : last sigoff in main, sub main::SPFok, 7, main::SPFok_Run, 1, , , at 14-2-4 10:2251 1399022571.24198 - 78
> 2014-05-02 10:26:16 [Main_Thread] Info: Worker_6 : last sigon in main, sub main::SMTPTraffic, 13, main::sigonTry, 1, , , at 14-2-4 10:2250 1399022570.72899 - 13
> 2014-05-02 10:26:16 [Main_Thread] Info: Worker_6 : last action was : SPF2
> 2014-05-02 10:26:16 [Main_Thread] Warning: try to terminate inactive/stucking Worker_6
> 2014-05-02 10:26:16 [Main_Thread] Warning: Main_Thread is unable to transfer connection to any worker - try again!
> 2014-05-02 10:26:16 [Main_Thread] Info: unable to detect any running worker for a new connection - wait (max 30 seconds)
>
> I logged on and noticed that the load was over 7, perl process from top:
>
>   PID USER PR NI VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
> 27003 root 20  0 609m 516m 5568 S  385 17.1 50:41.47 perl
>
> I've been watching the worker status page. What happens is one by one the
> workers go to PTROK (stuck). Immediately before I have seen the workers say
> both SPF2 (stuck) and MXsomething (stuck).
>
> I've had to roll back to 14097 unfortunately.
>
> All the best,
>
> Colin Waring.
>
> _______________________________________________
> Assp-test mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/assp-test
