Hi Brian,

Could you post your conf/httpd.conf.d/httpd.webservices please? 

Also, post the output of this:

# cat /proc/$(pgrep -u pf -nf webservices)/limits

Google suggests you may be hitting a limit somewhere.
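If the limit in question turns out to be open files (only a guess; the limits output may point elsewhere), one quick check is to compare how many descriptors the daemon holds with its soft limit. A minimal bash sketch; the helper names are made up for illustration, and reading another user's /proc/<pid>/fd generally requires root:

```shell
#!/usr/bin/env bash
# Hypothetical helpers: compare a process's open descriptors with its soft limit.

# Print the soft "Max open files" value from a file in /proc/<pid>/limits format.
# Row layout is "Max open files  <soft>  <hard>  files", so the soft value is field 4.
soft_nofile() {
  awk '/^Max open files/ { print $4 }' "$1"
}

# Count the entries in a /proc/<pid>/fd style directory (one entry per open descriptor).
count_fds() {
  # $(( )) normalizes any padding that some wc implementations add
  echo $(( $(find "$1" -mindepth 1 -maxdepth 1 | wc -l) ))
}
```

Usage, with the same pgrep selection as above:

# pid=$(pgrep -u pf -nf webservices)
# echo "open: $(count_fds /proc/$pid/fd)  soft limit: $(soft_nofile /proc/$pid/limits)"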

--
Louis Munro
[email protected]  ::  www.inverse.ca 
+1.514.447.4918 x125  :: +1 (866) 353-6153 x125
Inverse inc. :: Leaders behind SOGo (www.sogo.nu) and PacketFence 
(www.packetfence.org)

On 2014-10-02, at 23:15 , Brian Lucas <[email protected]> wrote:

> Well, the problem has resurfaced.  The relevant log entries are:
> 
> [Thu Oct 02 22:04:07 2014] [emerg] (4)Interrupted system call: couldn't grab the accept mutex
> [Thu Oct 02 22:04:07 2014] [alert] Child 4973 returned a Fatal error... Apache is exiting!
> 
> 
> 
> I can't figure this out. Any help would be appreciated. The webservices daemon
> is definitely what is crashing. I find it a bit odd that packetfence.log still
> gets entries attributed to httpd.webservices after this crash, and that it
> takes some time after the crash for everything to fall apart. pfmon doesn't
> seem to restart it successfully either.
> 
> Oct 02 22:10:13 pfcmd.pl(5590) INFO: Daemon httpd.webservices took 7.219 seconds to start. (pf::services::manager::launchService)
> 
> That makes it appear as if it started back up, but it did not, and there is
> no relevant information in httpd.webservices.error.
> 
> 
> 
> I am at a loss :(
> 
> 
> 
> 
> On Wed, Oct 1, 2014 at 10:00 AM, Brian Lucas <[email protected]> wrote:
> Thanks for following along, everyone. The problem appears to be resolved. I'm
> guessing it was mostly the two processes per interface causing deadlocks, but
> cleaning the database sure didn't hurt! Cheers!
> 
> On Sep 30, 2014 4:30 PM, "Brian Lucas" <[email protected]> wrote:
> Further research shows that I had two pfdhcplisteners running on each
> interface (a bad shutdown somewhere along the way?). From reading past posts,
> that can cause some database issues as well. Here's hoping.
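One way to spot duplicate listeners is to list the running pfdhcplistener command lines and flag any that appear more than once. A rough bash sketch; it assumes each listener's command line names its interface, so two identical lines would mean two listeners on the same interface (if that does not hold on your version, compare the number of listeners with the number of interfaces instead). The function name is hypothetical:

```shell
#!/usr/bin/env bash
# Read one command line per line on stdin and print every line that occurs
# more than once, prefixed by its count (uniq -c output, filtered to counts > 1).
duplicate_cmdlines() {
  sort | uniq -c | awk '$1 > 1'
}
```

Usage (the [p] bracket keeps grep from matching its own command line):

# ps -eo args= | grep '[p]fdhcplistener' | duplicate_cmdlines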
> 
> On Tue, Sep 30, 2014 at 2:52 PM, Brian Lucas <[email protected]> wrote:
> I'm going to have to wait for traffic to get back to normal to see if we're
> okay, but it looks like the database maintenance script had not been running
> since the update because the MySQL password needed to be re-entered. The
> radacct table MAY have grown out of hand, causing a slowdown that in turn
> crashed the webservices and brought everything down. I fixed the password and
> cleaned the database, and will post back with results once my users are back.
> 
> Brian
> 
> On Tue, Sep 30, 2014 at 10:19 AM, Brian Lucas <[email protected]> wrote:
> There hasn't been enough usage on the network to trigger the crash again
> yet, but I thought this might help diagnose the problem. Here is a snippet of
> httpd.webservices.error around the time of the crashes.
> 
> 
> [Sun Sep 28 14:01:16 2014] [notice] Apache/2.2.15 (Unix) mod_ssl/2.2.15 OpenSSL/1.0.1e-fips mod_perl/2.0.4 Perl/v5.10.1 configured -- resuming normal operations
> [Sun Sep 28 14:02:21 2014] [error] server reached MaxClients setting, consider raising the MaxClients setting
> [Sun Sep 28 14:06:24 2014] [emerg] (4)Interrupted system call: couldn't grab the accept mutex
> [Sun Sep 28 14:06:24 2014] [alert] Child 11638 returned a Fatal error... Apache is exiting!
> [Sun Sep 28 14:06:46 2014] [emerg] (4)Interrupted system call: couldn't grab the accept mutex
> [Sun Sep 28 14:06:47 2014] [emerg] (4)Interrupted system call: couldn't grab the accept mutex
> [Sun Sep 28 14:17:24 2014] [emerg] (4)Interrupted system call: couldn't grab the accept mutex
> [Sun Sep 28 14:17:25 2014] [emerg] (4)Interrupted system call: couldn't grab the accept mutex
> 
> 
> 
> Seeing the "server reached MaxClients setting" error, as well as the huge
> number of connections per minute in the httpd.webservices.access file, makes
> me wonder if this is the problem.
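To put numbers on that, the access log can be bucketed by minute and by call. A minimal bash/awk sketch, assuming the common/combined log format seen in the access log snippet in this message: the timestamp is the fourth whitespace-separated field, and the third field (the user) appears to name the webservice call. Internal dummy connections, which log "-" as the user, are skipped. The function names are hypothetical. Keep in mind that Apache's %t timestamp records when a request was received, while the line is written when the request completes, so an entry stamped 14:01:37 sitting among 14:03 entries suggests that request took roughly 90 seconds to finish.

```shell
#!/usr/bin/env bash
# Count non-dummy requests per minute from Apache access-log lines on stdin.
# $4 looks like "[28/Sep/2014:14:03:15"; characters 2-18 are "28/Sep/2014:14:03".
requests_per_minute() {
  awk '$3 != "-" { print substr($4, 2, 17) }' | sort | uniq -c
}

# Count non-dummy requests per call name (third field), busiest first.
requests_per_call() {
  awk '$3 != "-" { print $3 }' | sort | uniq -c | sort -rn
}
```

Usage, run against wherever the log lives:

# requests_per_minute < httpd.webservices.access
# requests_per_call < httpd.webservices.access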
> 
> 
> 
> Here is a snippet of that log around the same time frame:
> 
> 
> 
> 127.0.0.1 - - [28/Sep/2014:14:03:17 -0500] "OPTIONS * HTTP/1.0" 200 - "-" "Apache (internal dummy connection)"
> 127.0.0.1 - radius_accounting [28/Sep/2014:14:03:15 -0500] "POST / HTTP/1.1" 200 35 "-" "-"
> 127.0.0.1 - radius_accounting [28/Sep/2014:14:03:15 -0500] "POST / HTTP/1.1" 200 35 "-" "-"
> 127.0.0.1 - radius_authorize [28/Sep/2014:14:03:03 -0500] "POST / HTTP/1.1" 200 66 "-" "-"
> 127.0.0.1 - radius_authorize [28/Sep/2014:14:02:03 -0500] "POST / HTTP/1.1" 200 66 "-" "-"
> 127.0.0.1 - radius_authorize [28/Sep/2014:14:02:19 -0500] "POST / HTTP/1.1" 200 66 "-" "-"
> 127.0.0.1 - radius_accounting [28/Sep/2014:14:03:15 -0500] "POST / HTTP/1.1" 200 35 "-" "-"
> 127.0.0.1 - radius_accounting [28/Sep/2014:14:03:15 -0500] "POST / HTTP/1.1" 200 35 "-" "-"
> 127.0.0.1 - radius_authorize [28/Sep/2014:14:01:37 -0500] "POST / HTTP/1.1" 200 66 "-" "-"
> 127.0.0.1 - radius_authorize [28/Sep/2014:14:03:06 -0500] "POST / HTTP/1.1" 200 66 "-" "-"
> 127.0.0.1 - radius_authorize [28/Sep/2014:14:03:04 -0500] "POST / HTTP/1.1" 200 66 "-" "-"
> 127.0.0.1 - radius_authorize [28/Sep/2014:14:03:04 -0500] "POST / HTTP/1.1" 200 66 "-" "-"
> 127.0.0.1 - update_iplog [28/Sep/2014:14:03:17 -0500] "POST / HTTP/1.1" 204 - "-" "-"
> 127.0.0.1 - trigger_violation [28/Sep/2014:14:03:17 -0500] "POST / HTTP/1.1" 204 - "-" "-"
> 127.0.0.1 - trigger_violation [28/Sep/2014:14:03:17 -0500] "POST / HTTP/1.1" 204 - "-" "-"
> 127.0.0.1 - radius_authorize [28/Sep/2014:14:01:37 -0500] "POST / HTTP/1.1" 200 66 "-" "-"
> 127.0.0.1 - - [28/Sep/2014:14:03:18 -0500] "OPTIONS * HTTP/1.0" 200 - "-" "Apache (internal dummy connection)"
> 127.0.0.1 - - [28/Sep/2014:14:03:19 -0500] "OPTIONS * HTTP/1.0" 200 - "-" "Apache (internal dummy connection)"
> 127.0.0.1 - radius_authorize [28/Sep/2014:14:03:04 -0500] "POST / HTTP/1.1" 200 66 "-" "-"
> 127.0.0.1 - update_iplog [28/Sep/2014:14:03:19 -0500] "POST / HTTP/1.1" 204 - "-" "-"
> 127.0.0.1 - update_iplog [28/Sep/2014:14:03:19 -0500] "POST / HTTP/1.1" 204 - "-" "-"
> 127.0.0.1 - radius_authorize [28/Sep/2014:14:03:17 -0500] "POST / HTTP/1.1" 200 66 "-" "-"
> 127.0.0.1 - radius_authorize [28/Sep/2014:14:03:17 -0500] "POST / HTTP/1.1" 200 66 "-" "-"
> 127.0.0.1 - - [28/Sep/2014:14:03:20 -0500] "OPTIONS * HTTP/1.0" 200 - "-" "Apache (internal dummy connection)"
> 127.0.0.1 - - [28/Sep/2014:14:03:21 -0500] "OPTIONS * HTTP/1.0" 200 - "-" "Apache (internal dummy connection)"
> 127.0.0.1 - - [28/Sep/2014:14:03:22 -0500] "OPTIONS * HTTP/1.0" 200 - "-" "Apache (internal dummy connection)"
> 127.0.0.1 - trigger_violation [28/Sep/2014:14:03:22 -0500] "POST / HTTP/1.1" 204 - "-" "-"
> 127.0.0.1 - trigger_violation [28/Sep/2014:14:03:22 -0500] "POST / HTTP/1.1" 204 - "-" "-"
> 127.0.0.1 - - [28/Sep/2014:14:03:23 -0500] "OPTIONS * HTTP/1.0" 200 - "-" "Apache (internal dummy connection)"
> 127.0.0.1 - - [28/Sep/2014:14:03:24 -0500] "OPTIONS * HTTP/1.0" 200 - "-" "Apache (internal dummy connection)"
> 127.0.0.1 - update_iplog [28/Sep/2014:14:03:24 -0500] "POST / HTTP/1.1" 204 - "-" "-"
> 127.0.0.1 - - [28/Sep/2014:14:03:25 -0500] "OPTIONS * HTTP/1.0" 200 - "-" "Apache (internal dummy connection)"
> 127.0.0.1 - - [28/Sep/2014:14:03:26 -0500] "OPTIONS * HTTP/1.0" 200 - "-" "Apache (internal dummy connection)"
> 127.0.0.1 - - [28/Sep/2014:14:03:27 -0500] "OPTIONS * HTTP/1.0" 200 - "-" "Apache (internal dummy connection)"
> 127.0.0.1 - - [28/Sep/2014:14:03:28 -0500] "OPTIONS * HTTP/1.0" 200 - "-" "Apache (internal dummy connection)"
> 127.0.0.1 - - [28/Sep/2014:14:03:29 -0500] "OPTIONS * HTTP/1.0" 200 - "-" "Apache (internal dummy connection)"
> 
> 
> 
> 
> 
> 
> On Mon, Sep 29, 2014 at 6:09 PM, Brian Lucas <[email protected]> wrote:
> Will do and post back. Right now I have all services restarting once an hour
> from cron to avoid disrupting my users, so I will have to wait until an odd
> hour.
> 
> Brian
> 
> On Sep 29, 2014 3:22 PM, "Louis Munro" <[email protected]> wrote:
> 
> 
> On 2014-09-26, at 19:19 , Brian Lucas <[email protected]> wrote:
> 
>> All,
>> 
>> I'm seeing thousands of the following error per hour on our setup since the
>> update to 4.4. A restart of all services clears it up for a while, but it
>> comes back. Any suggestions as to the problem?
>> 
>> Fri Sep 26 18:16:45 2014 : Error: rlm_perl: An error occurred while processing the authorize RPC request: An error occured while sending a MessagePack request: 7 Couldn't connect to server couldn't connect to host at /usr/local/pf/lib//pf/radius/rpc.pm line 52.
>> 
>> 
> 
> Hi Brian,
> 
> Next time this happens, try to see why radius could not connect to the 
> webservice.
> 
> By default, the webservice runs on port 9090 on localhost.
> If you try to connect to it yourself, does it work?
> 
> e.g. run this command:
> 
> # curl -kvLI http://localhost:9090
>  
> And see what it says.
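Because the failure is intermittent, it may be easier to leave a small watcher running that records when the webservice stops accepting connections than to catch it by hand. A rough bash sketch using bash's built-in /dev/tcp redirection and coreutils `timeout`; it only checks that a TCP connection to the port opens (which is what the "Couldn't connect to server" error is about), not that the webservice answers correctly. The function names and the 60-second default interval are arbitrary choices for illustration:

```shell
#!/usr/bin/env bash
# Succeed if a TCP connection to host $1, port $2 opens within 3 seconds.
# Host and port are passed to the inner bash as $0 and $1 to avoid quoting issues.
port_open() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Print a timestamped up/DOWN line for host $1, port $2 every $3 seconds (default 60).
watch_port() {
  local state
  while true; do
    if port_open "$1" "$2"; then state=up; else state=DOWN; fi
    echo "$(date '+%F %T') $1:$2 $state"
    sleep "${3:-60}"
  done
}
```

Usage, against the default webservice port mentioned above:

# watch_port localhost 9090 60 | tee -a webservices-watch.log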
> 
> 
> 
> _______________________________________________
> PacketFence-users mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/packetfence-users
> 
> 
> 
> 
> 

