I recently upgraded our FreeRADIUS servers to 2.1.8, and over the past month the daemon has died twice on one of the servers (about two weeks apart, I think), so fairly infrequently.
A bit of background: we use this server predominantly to proxy requests. Every day, for about 15 minutes, the two main home servers we proxy to stop responding (they are doing backups or maintenance during this time). For those 15 minutes our clients (LNS/NAS) send a very large number of accounting interim packets and some stop packets, and keep resending them while the home servers are down.

Some relevant proxy settings we currently use for the main home servers we proxy to:

response_window = 14
zombie_period = 40
status_check = status-server
check_interval = 30
num_answers_to_alive = 3

Each time freeradius has died, it has been near the end of the 15 minutes of home server downtime. The last time this happened, I noticed the logs in the attached file. The ones that sound relevant are as follows:

Sun Mar 14 17:30:15 2010 : Proxy: Marking home server 10.0.1.48 port 1646 as zombie (it looks like it is dead).
Sun Mar 14 17:30:16 2010 : Proxy: Marking home server 10.0.1.47 port 1646 as zombie (it looks like it is dead).
Sun Mar 14 17:30:19 2010 : Proxy: Marking home server 10.0.1.47 port 1645 as zombie (it looks like it is dead).
Sun Mar 14 17:30:19 2010 : Error: No response to status check 903535 for home server 10.0.1.48 port 1646
Sun Mar 14 17:30:20 2010 : Error: No response to status check 903536 for home server 10.0.1.47 port 1646
...
Sun Mar 14 17:30:32 2010 : Error: Internal sanity check failed for child state
...
Fri Mar 19 17:30:54 2010 : Proxy: Failed to create a new socket for proxying requests.
Fri Mar 19 17:30:54 2010 : Proxy: Failed to create a new socket for proxying requests.
Fri Mar 19 17:30:54 2010 : Proxy: Failed to create a new socket for proxying requests.
...
Fri Mar 19 17:30:56 2010 : Error: ASSERT FAILED event.c[1084]: home->ev != NULL

That last one is where it dies, I think.
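For context, here is a rough sketch of how the settings listed above sit in a home_server block in proxy.conf. The server name and the secret are placeholders rather than our real values, and I am assuming port 1646 is the accounting port; only the five timing and status-check values are taken from our configuration. The comments are my understanding of each setting, not an authoritative description.

```
# proxy.conf (sketch; "home1_acct" and the secret are placeholders)
home_server home1_acct {
	# Assumption: 1646 is the accounting port on the home server.
	type = acct
	ipaddr = 10.0.1.47
	port = 1646
	# Placeholder, not the real shared secret.
	secret = changeme

	# Seconds to wait for a reply before the server is suspected
	# of being down and marked zombie.
	response_window = 14

	# Seconds a zombie server may stay silent before it is marked dead.
	zombie_period = 40

	# Probe the home server with Status-Server packets.
	status_check = status-server

	# Seconds between status checks.
	check_interval = 30

	# Status-check replies required before the server is marked alive again.
	num_answers_to_alive = 3
}
```

There is an equivalent block for each address and port combination that shows up in the logs (10.0.1.47 and 10.0.1.48, ports 1645 and 1646).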
That last error seems similar (though not identical) to the one in the following thread:

http://www.mail-archive.com/freeradius-users@lists.freeradius.org/msg58052.html

>Re: ASSERT FAILED event.c in 2.1.7
>Alan DeKok
>Fri, 25 Sep 2009 01:19:40 -0700
>
>Maja Wolniewicz wrote:
>> After the upgrade from 2.1.6 to 2.1.7 my two servers died 3-4 times
>> daily with the following error:
>>
>> Thu Sep 24 19:07:13 2009 : Error: Received conflicting packet from
>> client AP-8 port 32777 - ID: 240 due to unfinished request 2396. Giving
>> up on old request.
>> Thu Sep 24 19:07:13 2009 : Error: ASSERT FAILED event.c[2682]:
>> request->ev != NULL
>>
>> I have to return to 2.1.6, which works smoothly.
>
> The simplest thing to do in the short term is to delete the assertion.
>
> Alan DeKok.

That one was found to be a bug and was fixed. I don't know whether my case is a bug, though.

Another thread that sounds useful for this is:

http://www.mail-archive.com/freeradius-users@lists.freeradius.org/msg59985.html

...
>Until then, configuring status-checks && local detail files will
>definitely help. I would recommend doing that *anyways* for network
>stability.
>
>Alan DeKok.

I don't currently use the robust proxy accounting that thread suggests. I expect it would probably work around freeradius crashing in this case, and I will give it a go.

I am posting this to let you know that it _might_ be a bug, and to ask for advice: do you think this is a bug that I should follow up on, or is it just my configuration that needs some changes? If it is the configuration, which areas should I concentrate on?

Regards,
Anthony
Sun Mar 14 17:30:15 2010 : Proxy: Marking home server 10.0.1.48 port 1646 as zombie (it looks like it is dead).
Sun Mar 14 17:30:16 2010 : Proxy: Marking home server 10.0.1.47 port 1646 as zombie (it looks like it is dead).
Sun Mar 14 17:30:19 2010 : Proxy: Marking home server 10.0.1.47 port 1645 as zombie (it looks like it is dead).
Sun Mar 14 17:30:19 2010 : Error: No response to status check 903535 for home server 10.0.1.48 port 1646
Sun Mar 14 17:30:20 2010 : Error: No response to status check 903536 for home server 10.0.1.47 port 1646
Sun Mar 14 17:30:23 2010 : Error: No response to status check 61094 for home server 10.0.1.47 port 1645
Sun Mar 14 17:30:31 2010 : Error: rlm_radutmp: Logout entry for NAS lns02 port 2520 has wrong ID
Sun Mar 14 17:30:32 2010 : Error: Internal sanity check failed for child state
Sun Mar 14 17:30:32 2010 : Error: Reply from home server 10.0.1.48 port 1646 - ID: 224 arrived too late for request 903469. Try increasing 'retry_delay' or 'max_request_time'
Sun Mar 14 17:30:33 2010 : Proxy: Marking home server 10.0.1.48 port 1645 as zombie (it looks like it is dead).
Sun Mar 14 17:30:34 2010 : Error: Internal sanity check failed for child state
Sun Mar 14 17:30:34 2010 : Error: Reply from home server 10.0.1.48 port 1646 - ID: 25 arrived too late for request 903472. Try increasing 'retry_delay' or 'max_request_time'
...
Sun Mar 14 17:30:34 2010 : Error: Internal sanity check failed for child state
Sun Mar 14 17:30:34 2010 : Error: Reply from home server 10.0.1.48 port 1646 - ID: 170 arrived too late for request 903473. Try increasing 'retry_delay' or 'max_request_time'
Sun Mar 14 17:30:34 2010 : Proxy: Received response to status check 61098 (4 in current sequence)
Sun Mar 14 17:30:34 2010 : Error: Internal sanity check failed for child state
Sun Mar 14 17:30:34 2010 : Error: Reply from home server 10.0.1.47 port 1646 - ID: 66 arrived too late for request 903474. Try increasing 'retry_delay' or 'max_request_time'
Sun Mar 14 17:30:34 2010 : Proxy: No outstanding request was found for reply from host 10.0.1.47 port 1645 - ID 192
Sun Mar 14 17:30:34 2010 : Error: Internal sanity check failed for child state
Sun Mar 14 17:30:34 2010 : Error: Reply from home server 10.0.1.48 port 1646 - ID: 28 arrived too late for request 903475. Try increasing 'retry_delay' or 'max_request_time'
Sun Mar 14 17:30:34 2010 : Error: Internal sanity check failed for child state
...
Fri Mar 19 17:30:45 2010 : Proxy: Failed to create a new socket for proxying requests.
Fri Mar 19 17:30:45 2010 : Proxy: Failed to create a new socket for proxying requests.
Fri Mar 19 17:30:46 2010 : Proxy: Failed to create a new socket for proxying requests.
Fri Mar 19 17:30:46 2010 : Proxy: Failed to create a new socket for proxying requests.
Fri Mar 19 17:30:46 2010 : Proxy: Failed to create a new socket for proxying requests.
Fri Mar 19 17:30:46 2010 : Proxy: Failed to create a new socket for proxying requests.
Fri Mar 19 17:30:46 2010 : Proxy: Failed to create a new socket for proxying requests.
Fri Mar 19 17:30:47 2010 : Proxy: Failed to create a new socket for proxying requests.
Fri Mar 19 17:30:47 2010 : Proxy: Failed to create a new socket for proxying requests.
Fri Mar 19 17:30:47 2010 : Proxy: Failed to create a new socket for proxying requests.
Fri Mar 19 17:30:48 2010 : Proxy: Failed to create a new socket for proxying requests.
Fri Mar 19 17:30:48 2010 : Proxy: Failed to create a new socket for proxying requests.
Fri Mar 19 17:30:48 2010 : Proxy: Failed to create a new socket for proxying requests.
Fri Mar 19 17:30:48 2010 : Proxy: Failed to create a new socket for proxying requests.
Fri Mar 19 17:30:49 2010 : Proxy: Failed to create a new socket for proxying requests.
Fri Mar 19 17:30:50 2010 : Proxy: Failed to create a new socket for proxying requests.
Fri Mar 19 17:30:50 2010 : Proxy: Failed to create a new socket for proxying requests.
Fri Mar 19 17:30:50 2010 : Proxy: Failed to create a new socket for proxying requests.
Fri Mar 19 17:30:50 2010 : Proxy: Failed to create a new socket for proxying requests.
Fri Mar 19 17:30:50 2010 : Proxy: Failed to create a new socket for proxying requests.
Fri Mar 19 17:30:50 2010 : Proxy: Failed to create a new socket for proxying requests.
Fri Mar 19 17:30:50 2010 : Proxy: Failed to create a new socket for proxying requests.
Fri Mar 19 17:30:50 2010 : Error: No response to status check 890812 for home server 10.0.1.48 port 1646
Fri Mar 19 17:30:53 2010 : Error: rlm_radutmp: Logout entry for NAS lns02 port 465 has wrong ID
Fri Mar 19 17:30:53 2010 : Error: rlm_radutmp: Logout entry for NAS lns02 port 763 has wrong ID
Fri Mar 19 17:30:53 2010 : Proxy: Failed to create a new socket for proxying requests.
Fri Mar 19 17:30:53 2010 : Error: Discarding duplicate request from client lns02 port 1645 - ID: 222 due to unfinished request 90512
Fri Mar 19 17:30:53 2010 : Proxy: Failed to create a new socket for proxying requests.
Fri Mar 19 17:30:54 2010 : Proxy: Failed to create a new socket for proxying requests.
Fri Mar 19 17:30:54 2010 : Proxy: Failed to create a new socket for proxying requests.
Fri Mar 19 17:30:54 2010 : Proxy: Failed to create a new socket for proxying requests.
Fri Mar 19 17:30:54 2010 : Proxy: Failed to create a new socket for proxying requests.
Fri Mar 19 17:30:55 2010 : Error: No response to status check 90519 for home server 10.0.1.48 port 1645
Fri Mar 19 17:30:56 2010 : Proxy: Failed to create a new socket for proxying requests.
Fri Mar 19 17:30:56 2010 : Proxy: Marking home server 10.0.1.47 port 1646 as dead.
Fri Mar 19 17:30:56 2010 : Error: ASSERT FAILED event.c[1084]: home->ev != NULL