I saw the same issue with stuck workers and also went back to b14097.
PTROK was the last state I saw them in as well.

My ALRM signal warning was different from yours, though:

May-02-14 00:05:27 [Main_Thread] Warning: got unexpected signal ALRM in 
Main_Thread: package - IO::Socket::SSL, file - 
/opt/perl-5.14.3/lib/site_perl/5.14.3/IO/Socket/SSL.pm, line - 757!
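
In case it helps compare reports, here is a rough sketch (Python, not part of ASSP) that tallies where these "unexpected signal" warnings land. It assumes each log entry sits on a single line in the log file, unlike the wrapped copies in this mail, and that the warning text keeps the format shown above. The names SIGNAL_RE and signal_sites are made up for this example.

```python
import re
from collections import Counter

# Matches the warning format seen in the log excerpts in this thread.
# Assumption: one log entry per line, not wrapped as in the mail.
SIGNAL_RE = re.compile(
    r"got unexpected signal (?P<sig>\w+) in (?P<thread>\S+): "
    r"package - (?P<package>\S+), file - (?P<file>.+?), line - (?P<line>\d+)!"
)


def signal_sites(lines):
    """Count (signal, package, file, line) occurrences across log lines."""
    sites = Counter()
    for entry in lines:
        match = SIGNAL_RE.search(entry)
        if match:
            key = (
                match["sig"],
                match["package"],
                match["file"],
                int(match["line"]),
            )
            sites[key] += 1
    return sites
```

Passing it an open log file (for example `signal_sites(open(path))`, where `path` points at your own log) and printing `most_common()` on the result should show whether the ALRM keeps arriving in IO::Socket::SSL, in main::ThreadYield, or somewhere else.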

-C



Colin Waring said the following on 5/2/2014 2:40 AM:
> Primary mailserver just crashed out twice and restarted ASSP:
>
>   
>
> 2014-05-02 09:51:10 [Main_Thread] Info: Main_Thread freed by interrupted
> Worker_6 in 0.851 seconds - got (ok)
>
> 2014-05-02 09:51:10 [Worker_6] 1.1.1.1 IP 1.1.1.1 matches acceptAllMail -
> with 1.1.1.1/32
>
> 2014-05-02 09:51:10 [Worker_6] Connected: session:9D3C0E50 1.1.1.1:37891 >
> 195.88.101.110:25 > 127.0.0.1:125
>
> 2014-05-02 09:51:11 [Main_Thread] Info: Main_Thread got connection request
>
> 2014-05-02 09:51:11 m1-20662-07189 [Worker_6] 2.2.2.2 <[email protected]> to:
> [email protected] [Plugin] calling plugin ASSP_AFC
>
> 2014-05-02 09:51:16 [Worker_6] ClamAv Down
>
> 2014-05-02 09:51:16 [Worker_10000] Info: Name Server 194.168.4.123:
> ResponseTime = 16 ms for sourceforge.net
>
> 2014-05-02 09:51:33 [Worker_6] ClamAv Up
>
> 2014-05-02 09:51:41 [Main_Thread] Warning: got unexpected signal ALRM in
> Main_Thread: package - main, file - sub main::ThreadYield, line - 2!
>
> 2014-05-02 09:51:42 [Main_Thread] Info: unable to detect any running worker
> for a new connection - wait (max 30 seconds)
>
> 2014-05-02 09:51:42 [Main_Thread] Info: unable to detect any running worker
> for a new connection - wait (max 30 seconds)
>
> 2014-05-02 09:51:42 [Main_Thread] Info: unable to detect any running worker
> for a new connection - wait (max 30 seconds)
>
> 2014-05-02 09:51:42 [Main_Thread] Info: unable to detect any running worker
> for a new connection - wait (max 30 seconds)
>
> 2014-05-02 09:51:42 [Main_Thread] Info: unable to detect any running worker
> for a new connection - wait (max 30 seconds)
>
> 2014-05-02 09:51:42 [Main_Thread] Info: unable to detect any running worker
> for a new connection - wait (max 30 seconds)
>
> 2014-05-02 09:51:42 [Main_Thread] Info: unable to detect any running worker
> for a new connection - wait (max 30 seconds)
>
>   
>
> 2014-05-02 10:24:53 [Main_Thread] Warning: got unexpected signal ALRM in
> Main_Thread: package - main, file - sub main::ThreadYield, line - 15!
>
> 2014-05-02 10:25:17 [Worker_10000] Info: Name Server 194.168.4.123:
> ResponseTime = 30 ms for sourceforge.net
>
> 2014-05-02 10:25:43 [Worker_10000] Info: synchronizing all BerkeleyDB hashes
> to disk
>
> 2014-05-02 10:25:43 [Worker_10000] Info: compacting all BerkeleyDB hashes on
> disk
>
> 2014-05-02 10:25:45 [Main_Thread] Warning: Main_Thread is unable to transfer
> connection to any worker - try again!
>
> 2014-05-02 10:25:48 [Worker_10000] SSLfailedCache: cleaning cache finished:
> IP's before=2, deleted=0
>
> 2014-05-02 10:25:48 [Worker_10000] LocalFrequency: cleaning cache finished:
> addresses's before=4, deleted=3
>
> 2014-05-02 10:25:48 [Worker_10000] SubjectFrequency: cleaning cache
> finished: subjects before=44, deleted=28
>
> 2014-05-02 10:26:16 [Main_Thread] Info: Loop in Worker_6 was not active for
> 206 seconds
>
> 2014-05-02 10:26:16 [Main_Thread] Info: Worker_6 : last sigoff in main, sub
> main::SPFok, 7, main::SPFok_Run, 1, , ,  at 14-2-4 10:2251 1399022571.24198
> - 78
>
> 2014-05-02 10:26:16 [Main_Thread] Info: Worker_6 : last sigon in main, sub
> main::SMTPTraffic, 13, main::sigonTry, 1, , ,  at 14-2-4 10:2250
> 1399022570.72899 - 13
>
> 2014-05-02 10:26:16 [Main_Thread] Info: Worker_6 : last action was : SPF2
>
> 2014-05-02 10:26:16 [Main_Thread] Warning: try to terminate
> inactive/stucking Worker_6
>
> 2014-05-02 10:26:16 [Main_Thread] Warning: Main_Thread is unable to transfer
> connection to any worker - try again!
>
> 2014-05-02 10:26:16 [Main_Thread] Info: unable to detect any running worker
> for a new connection - wait (max 30 seconds)
>
>   
>
> I logged on and noticed that the load was over 7, perl process from top:
>
>   
>
>    PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>
> 27003 root      20   0  609m 516m 5568 S  385 17.1  50:41.47 perl
>
>   
>
> I've been watching the worker status page. What happens is one by one the
> workers go to PTROK (stuck). Immediately before I have seen the workers say
> both SPF2 (stuck) and MXsomething (stuck).
>
>   
>
> I've had to roll back to 14097 unfortunately.
>
>   
>
> All the best,
>
> Colin Waring.
>
> ------------------------------------------------------------------------------
> "Accelerate Dev Cycles with Automated Cross-Browser Testing - For FREE
> Instantly run your Selenium tests across 300+ browser/OS combos.  Get
> unparalleled scalability from the best Selenium testing platform available.
> Simple to use. Nothing to install. Get started now for free."
> http://p.sf.net/sfu/SauceLabs
> _______________________________________________
> Assp-test mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/assp-test
>

