Thanks for following along, everyone. The problem appears to be resolved.
My guess is that the two processes per interface were causing deadlocks, but
cleaning the database sure didn't hurt! Cheers!
On Sep 30, 2014 4:30 PM, "Brian Lucas" <[email protected]> wrote:
> Further research shows that I had two pfdhcplistener processes running on
> each interface... bad shutdown somewhere along the way? Reading through
> past posts suggests that this can cause database issues as well. Here's
> hoping.
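
One way to spot the duplicate-listener condition described above is to count listener processes per interface. The sketch below is only an illustration: it assumes the process title embeds the interface as `pfdhcplistener_<interface>`, which may not match a given installation, and `duplicate_listeners` is a hypothetical helper name, not part of PacketFence.

```python
import re
from collections import Counter

# Assumption: the interface is embedded in the process title,
# e.g. "pfdhcplistener_eth0". Adjust the pattern to your `ps` output.
LISTENER = re.compile(r"pfdhcplistener_(\S+)")


def duplicate_listeners(ps_lines):
    """Return {interface: count} for interfaces with more than one listener."""
    counts = Counter()
    for line in ps_lines:
        match = LISTENER.search(line)
        if match:
            counts[match.group(1)] += 1
    return {iface: n for iface, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Hypothetical sample in the shape of `ps -eo pid,args` output.
    sample = [
        "1201 pfdhcplistener_eth0",
        "1202 pfdhcplistener_eth0",
        "1300 pfdhcplistener_eth1",
    ]
    print(duplicate_listeners(sample))
```

Feeding it the lines of `ps -eo pid,args` should list any interface that has more than one listener attached.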
>
> On Tue, Sep 30, 2014 at 2:52 PM, Brian Lucas <[email protected]> wrote:
>
>> I'll have to wait for traffic to return to normal to see whether
>> we're okay, but it looks like the database maintenance script has not
>> run since the update because the MySQL password needed to be
>> re-entered. The radacct table MAY have grown out of hand, causing
>> a slowdown that in turn crashed the webservices and brought
>> everything down. I fixed the password and cleaned the database. I will
>> post back with results once my users are back.
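
To illustrate the kind of cleanup that keeps an accounting table from growing without bound, here is a minimal sketch. It is not the PacketFence maintenance script. It uses an in-memory SQLite table as a stand-in for MySQL, keeps only a simplified `radacct` with an `acctstoptime` column, and the 30-day retention window and the `prune_closed_sessions` name are assumptions chosen for the example.

```python
import sqlite3
from datetime import datetime, timedelta


def prune_closed_sessions(conn, now, keep_days=30):
    """Delete accounting rows whose session ended more than keep_days ago.

    Rows with a NULL stop time (sessions still open) are left alone.
    Returns the number of rows deleted.
    """
    cutoff = (now - timedelta(days=keep_days)).strftime("%Y-%m-%d %H:%M:%S")
    cur = conn.execute(
        "DELETE FROM radacct "
        "WHERE acctstoptime IS NOT NULL AND acctstoptime < ?",
        (cutoff,),
    )
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    # Simplified stand-in table; a real radacct has many more columns.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE radacct (radacctid INTEGER PRIMARY KEY, acctstoptime TEXT)"
    )
    conn.executemany(
        "INSERT INTO radacct (acctstoptime) VALUES (?)",
        [("2014-08-01 00:00:00",), ("2014-09-29 12:00:00",), (None,)],
    )
    print(prune_closed_sessions(conn, datetime(2014, 9, 30)))
```

With a MySQL driver the placeholder style would differ (typically `%s` instead of `?`), and on a very large table it is usually safer to delete in batches.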
>>
>> Brian
>>
>> On Tue, Sep 30, 2014 at 10:19 AM, Brian Lucas <[email protected]> wrote:
>>
>>> There hasn't been enough usage on the network to trigger the crash
>>> again yet, but I thought this might help diagnose the problem. Here
>>> is a snippet of httpd.webservices.error from around the time of the
>>> crashes.
>>>
>>>
>>> [Sun Sep 28 14:01:16 2014] [notice] Apache/2.2.15 (Unix) mod_ssl/2.2.15
>>> OpenSSL/1.0.1e-fips mod_perl/2.0.4 Perl/v5.10.1 configured -- resuming
>>> normal operations
>>>
>>> [Sun Sep 28 14:02:21 2014] [error] server reached MaxClients setting,
>>> consider raising the MaxClients setting
>>>
>>> [Sun Sep 28 14:06:24 2014] [emerg] (4)Interrupted system call: couldn't
>>> grab the accept mutex
>>>
>>> [Sun Sep 28 14:06:24 2014] [alert] Child 11638 returned a Fatal error...
>>> Apache is exiting!
>>>
>>> [Sun Sep 28 14:06:46 2014] [emerg] (4)Interrupted system call: couldn't
>>> grab the accept mutex
>>>
>>> [Sun Sep 28 14:06:47 2014] [emerg] (4)Interrupted system call: couldn't
>>> grab the accept mutex
>>>
>>> [Sun Sep 28 14:17:24 2014] [emerg] (4)Interrupted system call: couldn't
>>> grab the accept mutex
>>>
>>> [Sun Sep 28 14:17:25 2014] [emerg] (4)Interrupted system call: couldn't
>>> grab the accept mutex
>>>
>>>
>>> Seeing the "server reached MaxClients setting" error, together with
>>> the very high number of connections per minute in the
>>> httpd.webservices.access file, makes me wonder if this is the problem.
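
To put numbers on the error log rather than eyeballing it, a small tally like the one below can help. It is a sketch that assumes the Apache 2.2 error-log layout shown in the excerpt above (`[timestamp] [level] message`) and that wrapped lines have been rejoined first. The category names and `summarize_errors` are made up for the example.

```python
import re
from collections import Counter

# Matches "[Sun Sep 28 14:02:21 2014] [error] message..."
ERROR_LINE = re.compile(r"^\[(?P<ts>[^\]]+)\] \[(?P<level>\w+)\] (?P<msg>.*)$")


def summarize_errors(lines):
    """Count error-log entries by a few coarse, hand-picked categories."""
    counts = Counter()
    for line in lines:
        match = ERROR_LINE.match(line.strip())
        if not match:
            continue  # skip blank or unparseable lines
        msg = match.group("msg")
        if "MaxClients" in msg:
            counts["maxclients"] += 1
        elif "accept mutex" in msg:
            counts["accept_mutex"] += 1
        elif "Apache is exiting" in msg:
            counts["fatal_exit"] += 1
        else:
            counts["other"] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "[Sun Sep 28 14:02:21 2014] [error] server reached MaxClients setting, "
        "consider raising the MaxClients setting",
        "[Sun Sep 28 14:06:24 2014] [emerg] (4)Interrupted system call: "
        "couldn't grab the accept mutex",
    ]
    print(summarize_errors(sample))
```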
>>>
>>>
>>> Here is a snippet of that log from around the same time frame:
>>>
>>>
>>> 127.0.0.1 - - [28/Sep/2014:14:03:17 -0500] "OPTIONS * HTTP/1.0" 200 -
>>> "-" "Apache (internal dummy connection)"
>>>
>>> 127.0.0.1 - radius_accounting [28/Sep/2014:14:03:15 -0500] "POST /
>>> HTTP/1.1" 200 35 "-" "-"
>>>
>>> 127.0.0.1 - radius_accounting [28/Sep/2014:14:03:15 -0500] "POST /
>>> HTTP/1.1" 200 35 "-" "-"
>>>
>>> 127.0.0.1 - radius_authorize [28/Sep/2014:14:03:03 -0500] "POST /
>>> HTTP/1.1" 200 66 "-" "-"
>>>
>>> 127.0.0.1 - radius_authorize [28/Sep/2014:14:02:03 -0500] "POST /
>>> HTTP/1.1" 200 66 "-" "-"
>>>
>>> 127.0.0.1 - radius_authorize [28/Sep/2014:14:02:19 -0500] "POST /
>>> HTTP/1.1" 200 66 "-" "-"
>>>
>>> 127.0.0.1 - radius_accounting [28/Sep/2014:14:03:15 -0500] "POST /
>>> HTTP/1.1" 200 35 "-" "-"
>>>
>>> 127.0.0.1 - radius_accounting [28/Sep/2014:14:03:15 -0500] "POST /
>>> HTTP/1.1" 200 35 "-" "-"
>>>
>>> 127.0.0.1 - radius_authorize [28/Sep/2014:14:01:37 -0500] "POST /
>>> HTTP/1.1" 200 66 "-" "-"
>>>
>>> 127.0.0.1 - radius_authorize [28/Sep/2014:14:03:06 -0500] "POST /
>>> HTTP/1.1" 200 66 "-" "-"
>>>
>>> 127.0.0.1 - radius_authorize [28/Sep/2014:14:03:04 -0500] "POST /
>>> HTTP/1.1" 200 66 "-" "-"
>>>
>>> 127.0.0.1 - radius_authorize [28/Sep/2014:14:03:04 -0500] "POST /
>>> HTTP/1.1" 200 66 "-" "-"
>>>
>>> 127.0.0.1 - update_iplog [28/Sep/2014:14:03:17 -0500] "POST / HTTP/1.1"
>>> 204 - "-" "-"
>>>
>>> 127.0.0.1 - trigger_violation [28/Sep/2014:14:03:17 -0500] "POST /
>>> HTTP/1.1" 204 - "-" "-"
>>>
>>> 127.0.0.1 - trigger_violation [28/Sep/2014:14:03:17 -0500] "POST /
>>> HTTP/1.1" 204 - "-" "-"
>>>
>>> 127.0.0.1 - radius_authorize [28/Sep/2014:14:01:37 -0500] "POST /
>>> HTTP/1.1" 200 66 "-" "-"
>>>
>>> 127.0.0.1 - - [28/Sep/2014:14:03:18 -0500] "OPTIONS * HTTP/1.0" 200 -
>>> "-" "Apache (internal dummy connection)"
>>>
>>> 127.0.0.1 - - [28/Sep/2014:14:03:19 -0500] "OPTIONS * HTTP/1.0" 200 -
>>> "-" "Apache (internal dummy connection)"
>>>
>>> 127.0.0.1 - radius_authorize [28/Sep/2014:14:03:04 -0500] "POST /
>>> HTTP/1.1" 200 66 "-" "-"
>>>
>>> 127.0.0.1 - update_iplog [28/Sep/2014:14:03:19 -0500] "POST / HTTP/1.1"
>>> 204 - "-" "-"
>>>
>>> 127.0.0.1 - update_iplog [28/Sep/2014:14:03:19 -0500] "POST / HTTP/1.1"
>>> 204 - "-" "-"
>>>
>>> 127.0.0.1 - radius_authorize [28/Sep/2014:14:03:17 -0500] "POST /
>>> HTTP/1.1" 200 66 "-" "-"
>>>
>>> 127.0.0.1 - radius_authorize [28/Sep/2014:14:03:17 -0500] "POST /
>>> HTTP/1.1" 200 66 "-" "-"
>>>
>>> 127.0.0.1 - - [28/Sep/2014:14:03:20 -0500] "OPTIONS * HTTP/1.0" 200 -
>>> "-" "Apache (internal dummy connection)"
>>>
>>> 127.0.0.1 - - [28/Sep/2014:14:03:21 -0500] "OPTIONS * HTTP/1.0" 200 -
>>> "-" "Apache (internal dummy connection)"
>>>
>>> 127.0.0.1 - - [28/Sep/2014:14:03:22 -0500] "OPTIONS * HTTP/1.0" 200 -
>>> "-" "Apache (internal dummy connection)"
>>>
>>> 127.0.0.1 - trigger_violation [28/Sep/2014:14:03:22 -0500] "POST /
>>> HTTP/1.1" 204 - "-" "-"
>>>
>>> 127.0.0.1 - trigger_violation [28/Sep/2014:14:03:22 -0500] "POST /
>>> HTTP/1.1" 204 - "-" "-"
>>>
>>> 127.0.0.1 - - [28/Sep/2014:14:03:23 -0500] "OPTIONS * HTTP/1.0" 200 -
>>> "-" "Apache (internal dummy connection)"
>>>
>>> 127.0.0.1 - - [28/Sep/2014:14:03:24 -0500] "OPTIONS * HTTP/1.0" 200 -
>>> "-" "Apache (internal dummy connection)"
>>>
>>> 127.0.0.1 - update_iplog [28/Sep/2014:14:03:24 -0500] "POST / HTTP/1.1"
>>> 204 - "-" "-"
>>>
>>> 127.0.0.1 - - [28/Sep/2014:14:03:25 -0500] "OPTIONS * HTTP/1.0" 200 -
>>> "-" "Apache (internal dummy connection)"
>>>
>>> 127.0.0.1 - - [28/Sep/2014:14:03:26 -0500] "OPTIONS * HTTP/1.0" 200 -
>>> "-" "Apache (internal dummy connection)"
>>>
>>> 127.0.0.1 - - [28/Sep/2014:14:03:27 -0500] "OPTIONS * HTTP/1.0" 200 -
>>> "-" "Apache (internal dummy connection)"
>>>
>>> 127.0.0.1 - - [28/Sep/2014:14:03:28 -0500] "OPTIONS * HTTP/1.0" 200 -
>>> "-" "Apache (internal dummy connection)"
>>>
>>> 127.0.0.1 - - [28/Sep/2014:14:03:29 -0500] "OPTIONS * HTTP/1.0" 200 -
>>> "-" "Apache (internal dummy connection)"
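
The "connections per minute" impression from the access log can be checked with a short count per minute and per caller. The sketch below assumes the log layout shown in the excerpt, with wrapped lines rejoined, and groups on the third field, which in this excerpt carries names such as `radius_authorize`. The function name `requests_per_minute` is hypothetical.

```python
import re
from collections import Counter

# host ident user [timestamp] "request" ...
ACCESS_LINE = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<ts>[^\]]+)\] "(?P<request>[^"]*)"'
)


def requests_per_minute(lines):
    """Count requests keyed by (minute, third log field)."""
    counts = Counter()
    for line in lines:
        match = ACCESS_LINE.match(line.strip())
        if match:
            # "28/Sep/2014:14:03:15 -0500" -> "28/Sep/2014:14:03"
            minute = match.group("ts")[:17]
            counts[(minute, match.group("user"))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        '127.0.0.1 - radius_accounting [28/Sep/2014:14:03:15 -0500] '
        '"POST / HTTP/1.1" 200 35 "-" "-"',
        '127.0.0.1 - - [28/Sep/2014:14:03:17 -0500] "OPTIONS * HTTP/1.0" '
        '200 - "-" "Apache (internal dummy connection)"',
    ]
    for key, n in sorted(requests_per_minute(sample).items()):
        print(key, n)
```

Entries whose third field is `-` are the "internal dummy connection" requests in this excerpt, so they can be separated from the RADIUS-driven POSTs when judging load.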
>>>
>>>
>>>
>>>
>>> On Mon, Sep 29, 2014 at 6:09 PM, Brian Lucas <[email protected]> wrote:
>>>
>>>> Will do and post back. Right now I have all services restarting once an
>>>> hour from cron to avoid disruption to my users. I will have to wait
>>>> until an odd hour.
>>>>
>>>> Brian
>>>> On Sep 29, 2014 3:22 PM, "Louis Munro" <[email protected]> wrote:
>>>>
>>>>>
>>>>>
>>>>> On 2014-09-26, at 19:19 , Brian Lucas <[email protected]> wrote:
>>>>>
>>>>> All,
>>>>>
>>>>> I'm seeing thousands of the following error per hour on our setup
>>>>> since the update to 4.4. A restart of all services clears it up for a
>>>>> time, but it comes back. Any suggestions as to the problem?
>>>>>
>>>>> Fri Sep 26 18:16:45 2014 : Error: rlm_perl: An error occurred while
>>>>> processing the authorize RPC request: An error occured while sending a
>>>>> MessagePack request: 7 Couldn't connect to server couldn't connect to host
>>>>> at /usr/local/pf/lib//pf/radius/rpc.pm line 52.
>>>>>
>>>>>
>>>>> Hi Brian,
>>>>>
>>>>> Next time this happens, try to see why radius could not connect to the
>>>>> webservice.
>>>>>
>>>>> By default, the webservice runs on port 9090 on localhost.
>>>>> If you try to connect to it yourself, does it work?
>>>>>
>>>>> e.g. run this command:
>>>>>
>>>>> # curl -kvLI http://localhost:9090
>>>>>
>>>>> And see what it says.
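
For monitoring the same thing from a script, a plain TCP connect is enough to reproduce the "couldn't connect to host" condition (curl error 7). The sketch below checks only that something accepts connections on the port. It does not send an HTTP request the way the curl command does, and `can_connect` is a hypothetical helper name.

```python
import socket


def can_connect(host="localhost", port=9090, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    print("webservice reachable:", can_connect())
```

Run from cron or a monitoring check, a `False` result at the moment the RADIUS errors appear would point at the webservice being down or saturated rather than at RADIUS itself.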
>>>>>
>>>>> --
>>>>> Louis Munro
>>>>> [email protected] :: www.inverse.ca
>>>>> +1.514.447.4918 x125 :: +1 (866) 353-6153 x125
>>>>> Inverse inc. :: Leaders behind SOGo (www.sogo.nu) and PacketFence (
>>>>> www.packetfence.org)
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> PacketFence-users mailing list
>>>>> [email protected]
>>>>> https://lists.sourceforge.net/lists/listinfo/packetfence-users
>>>>>
>>>>>
>>>
>>
>