I recently upgraded our freeradius servers to 2.1.8 and over the past
month it has died on one of the servers two times (spaced about two
weeks apart I think). So fairly infrequently.

A bit of background, We use this server predominantly to proxy
requests. Every day for about 15 minutes, the two main home servers we
proxy to stop responding (they are doing backups or maintenance during
this time) so for those 15 minutes our clients (LNS/NAS) would be
sending a very large number of accounting interim packets and some
stop packets and would be resending these while the home servers are
down.

Some relevant proxy home server settings we currently use for the main
home servers we proxy to:

        response_window = 14
        zombie_period = 40
        status_check = status-server
        check_interval = 30
        num_answers_to_alive = 3


The times that freeradius has died has been near the end of the 15
minutes of the home servers downtime.

The last time this happened, I noticed logs in the attached file. Ones
that sound relevant as follows:

Sun Mar 14 17:30:15 2010 : Proxy: Marking home server 10.0.1.48
port 1646 as zombie (it looks like it is dead).
Sun Mar 14 17:30:16 2010 : Proxy: Marking home server 10.0.1.47
port 1646 as zombie (it looks like it is dead).
Sun Mar 14 17:30:19 2010 : Proxy: Marking home server 10.0.1.47
port 1645 as zombie (it looks like it is dead).
Sun Mar 14 17:30:19 2010 : Error: No response to status check 903535
for home server 10.0.1.48 port 1646
Sun Mar 14 17:30:20 2010 : Error: No response to status check 903536
for home server 10.0.1.47 port 1646
...
Sun Mar 14 17:30:32 2010 : Error: Internal sanity check failed for
child state
...
Fri Mar 19 17:30:54 2010 : Proxy: Failed to create a new socket for
proxying requests.
Fri Mar 19 17:30:54 2010 : Proxy: Failed to create a new socket for
proxying requests.
Fri Mar 19 17:30:54 2010 : Proxy: Failed to create a new socket for
proxying requests.
...
Fri Mar 19 17:30:56 2010 : Error: ASSERT FAILED event.c[1084]:
home->ev != NULL


That last one is where it dies I think.

That last error seems a bit similar (but a bit different) to the
following thread:

http://www.mail-archive.com/freeradius-users@lists.freeradius.org/msg58052.html

">Re: ASSERT FAILED event.c in 2.1.7
>Alan DeKok
>Fri, 25 Sep 2009 01:19:40 -0700
>
>Maja Wolniewicz wrote:
>> After the upgrade from 2.1.6 to 2.1.7 my two servers died 3-4 times
>> daily with the following error:
>>
>> Thu Sep 24 19:07:13 2009 : Error: Received conflicting packet from
>> client AP-8 port 32777 - ID: 240 due to unfinished request 2396.  Giving
>> up on old request.
>> Thu Sep 24 19:07:13 2009 : Error: ASSERT FAILED event.c[2682]:
>> request->ev != NULL
>>
>> I have to return to 2.1.6, which works smoothly.
>
 > The simplest thing to do in the short term is to delete the assertion.
>
> Alan DeKok.
"

That one was found to be a bug and was fixed - I don't know if my case
is a bug though.


Another thread that sounds useful for this is:

http://www.mail-archive.com/freeradius-users@lists.freeradius.org/msg59985.html

"...
>Until then, configuring status-checks && local detail files will
>definitely help. I would recommend doing that *anyways* for network
>stability.
>
>Alan DeKok."


I don't currently use the robust proxy accounting that that thread
suggests. I expect that would probably work around the issue of
freeradius crashing in this case and I will give that a go. Just
posting this to let you know that it _might_ be a bug and to ask for
advice about whether you think this is a bug or not, and if I should
follow up on that, or if you think it is just my configuration that
needs some changes and what areas I should concentrate on if that is
the case?

Regards,
Anthony
Sun Mar 14 17:30:15 2010 : Proxy: Marking home server 10.0.1.48 
port 1646 as zombie (it looks like it is dead).
Sun Mar 14 17:30:16 2010 : Proxy: Marking home server 10.0.1.47 
port 1646 as zombie (it looks like it is dead).
Sun Mar 14 17:30:19 2010 : Proxy: Marking home server 10.0.1.47 
port 1645 as zombie (it looks like it is dead).
Sun Mar 14 17:30:19 2010 : Error: No response to status check 903535 
for home server 10.0.1.48 port 1646
Sun Mar 14 17:30:20 2010 : Error: No response to status check 903536 
for home server 10.0.1.47 port 1646
Sun Mar 14 17:30:23 2010 : Error: No response to status check 61094 
for home server 10.0.1.47 port 1645
Sun Mar 14 17:30:31 2010 : Error: rlm_radutmp: Logout entry for NAS 
lns02 port 2520 has wrong ID
Sun Mar 14 17:30:32 2010 : Error: Internal sanity check failed for 
child state
Sun Mar 14 17:30:32 2010 : Error: Reply from home server 10.0.1.48 
port 1646  - ID: 224 arrived too late for request 903469. Try 
increasing 'retry_delay' or 'max_request_time'
Sun Mar 14 17:30:33 2010 : Proxy: Marking home server 10.0.1.48 
port 1645 as zombie (it looks like it is dead).
Sun Mar 14 17:30:34 2010 : Error: Internal sanity check failed for 
child state
Sun Mar 14 17:30:34 2010 : Error: Reply from home server 10.0.1.48 
port 1646  - ID: 25 arrived too late for request 903472. Try 
increasing 'retry_delay' or 'max_request_time'
...

Sun Mar 14 17:30:34 2010 : Error: Internal sanity check failed for 
child state
Sun Mar 14 17:30:34 2010 : Error: Reply from home server 10.0.1.48 
port 1646  - ID: 170 arrived too late for request 903473. Try 
increasing 'retry_delay' or 'max_request_time'
Sun Mar 14 17:30:34 2010 : Proxy: Received response to status check 
61098 (4 in current sequence)
Sun Mar 14 17:30:34 2010 : Error: Internal sanity check failed for 
child state
Sun Mar 14 17:30:34 2010 : Error: Reply from home server 10.0.1.47 
port 1646  - ID: 66 arrived too late for request 903474. Try 
increasing 'retry_delay' or 'max_request_time'
Sun Mar 14 17:30:34 2010 : Proxy: No outstanding request was found 
for reply from host 10.0.1.47 port 1645 - ID 192
Sun Mar 14 17:30:34 2010 : Error: Internal sanity check failed for 
child state
Sun Mar 14 17:30:34 2010 : Error: Reply from home server 10.0.1.48 
port 1646  - ID: 28 arrived too late for request 903475. Try 
increasing 'retry_delay' or 'max_request_time'
Sun Mar 14 17:30:34 2010 : Error: Internal sanity check failed for 
child state
...


Fri Mar 19 17:30:45 2010 : Proxy: Failed to create a new socket for 
proxying requests.
Fri Mar 19 17:30:45 2010 : Proxy: Failed to create a new socket for 
proxying requests.
Fri Mar 19 17:30:46 2010 : Proxy: Failed to create a new socket for 
proxying requests.
Fri Mar 19 17:30:46 2010 : Proxy: Failed to create a new socket for 
proxying requests.
Fri Mar 19 17:30:46 2010 : Proxy: Failed to create a new socket for 
proxying requests.Fri Mar 19 17:30:46 2010 : Proxy: Failed to create 
a new socket for proxying requests.
Fri Mar 19 17:30:46 2010 : Proxy: Failed to create a new socket for 
proxying requests.
Fri Mar 19 17:30:47 2010 : Proxy: Failed to create a new socket for 
proxying requests.
Fri Mar 19 17:30:47 2010 : Proxy: Failed to create a new socket for 
proxying requests.
Fri Mar 19 17:30:47 2010 : Proxy: Failed to create a new socket for 
proxying requests.
Fri Mar 19 17:30:48 2010 : Proxy: Failed to create a new socket for 
proxying requests.
Fri Mar 19 17:30:48 2010 : Proxy: Failed to create a new socket for 
proxying requests.
Fri Mar 19 17:30:48 2010 : Proxy: Failed to create a new socket for 
proxying requests.Fri Mar 19 17:30:48 2010 : Proxy: Failed to create 
a new socket for proxying requests.
Fri Mar 19 17:30:49 2010 : Proxy: Failed to create a new socket for 
proxying requests.Fri Mar 19 17:30:50 2010 : Proxy: Failed to create 
a new socket for proxying requests.
Fri Mar 19 17:30:50 2010 : Proxy: Failed to create a new socket for 
proxying requests.
Fri Mar 19 17:30:50 2010 : Proxy: Failed to create a new socket for 
proxying requests.
Fri Mar 19 17:30:50 2010 : Proxy: Failed to create a new socket for 
proxying requests.
Fri Mar 19 17:30:50 2010 : Proxy: Failed to create a new socket for 
proxying requests.
Fri Mar 19 17:30:50 2010 : Proxy: Failed to create a new socket for 
proxying requests.
Fri Mar 19 17:30:50 2010 : Proxy: Failed to create a new socket for 
proxying requests.
Fri Mar 19 17:30:50 2010 : Error: No response to status check 890812 
for home server 10.0.1.48 port 1646
Fri Mar 19 17:30:53 2010 : Error: rlm_radutmp: Logout entry for NAS 
lns02 port 465 has wrong ID
Fri Mar 19 17:30:53 2010 : Error: rlm_radutmp: Logout entry for NAS 
lns02 port 763 has wrong ID
Fri Mar 19 17:30:53 2010 : Proxy: Failed to create a new socket for 
proxying requests.
Fri Mar 19 17:30:53 2010 : Error: Discarding duplicate request from 
client lns02 port 1645 - ID: 222 due to unfinished request 90512
Fri Mar 19 17:30:53 2010 : Proxy: Failed to create a new socket for 
proxying requests.
Fri Mar 19 17:30:54 2010 : Proxy: Failed to create a new socket for 
proxying requests.
Fri Mar 19 17:30:54 2010 : Proxy: Failed to create a new socket for 
proxying requests.
Fri Mar 19 17:30:54 2010 : Proxy: Failed to create a new socket for 
proxying requests.
Fri Mar 19 17:30:54 2010 : Proxy: Failed to create a new socket for 
proxying requests.
Fri Mar 19 17:30:55 2010 : Error: No response to status check 90519 
for home server 10.0.1.48 port 1645
Fri Mar 19 17:30:56 2010 : Proxy: Failed to create a new socket for 
proxying requests.
Fri Mar 19 17:30:56 2010 : Proxy: Marking home server 10.0.1.47 
port 1646 as dead.
Fri Mar 19 17:30:56 2010 : Error: ASSERT FAILED event.c[1084]: 
home->ev != NULL
-
List info/subscribe/unsubscribe? See http://www.freeradius.org/list/users.html

Reply via email to