Re: freeradius 2.1.8 dies Error: ASSERT FAILED event.c[1084]: home->ev != NULL
fab junkmail wrote:
>> Why is it running out of sockets?  This shouldn't happen.
>
> Not sure but there is a _lot_ of attempted proxying going on - maybe
> it just went over the system limits like open file limits or
> something? In any case it probably won't be a problem when I implement
> the robust-proxy-accounting.

  Likely, yes.  If the server is overloaded and unable to proxy
packets... who knows what can happen.  The *intent* is to have it still
work, but it's a poorly tested code path.

  Alan DeKok.

-
List info/subscribe/unsubscribe? See http://www.freeradius.org/list/users.html
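Since the thread only speculates that radiusd hit the per-process open-file limit while opening proxy sockets, a quick sanity check is to look at the limits the daemon actually runs under. A sketch (the commented-out `/proc` lookup assumes Linux and a running `radiusd`; the variable names are illustrative):

```shell
# Each proxied request may need a new proxy socket, and sockets count
# against the open-file limit.  Show the limits for the current shell;
# the daemon inherits similar limits unless its init script raises them.
soft=$(ulimit -Sn)
hard=$(ulimit -Hn)
echo "open-file limits: soft=$soft hard=$hard"

# For a running daemon (Linux /proc; PID lookup is illustrative):
# grep 'open files' "/proc/$(pidof radiusd)/limits"
```

If the soft limit is low (e.g. 1024), raising it in the init script before starting radiusd is one possible mitigation, separate from fixing the proxying architecture.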
Re: freeradius 2.1.8 dies Error: ASSERT FAILED event.c[1084]: home->ev != NULL
Hi Alan,

Thanks for your response.

Alan DeKok wrote:
> You can configure the proxy to log accounting packets to disk when the
> home server is down.  See raddb/sites-available/robust-proxy-accounting

Ok I will definitely do this then.

>> Fri Mar 19 17:30:54 2010 : Proxy: Failed to create a new socket for
>> proxying requests.
>
> Why is it running out of sockets?  This shouldn't happen.

Not sure but there is a _lot_ of attempted proxying going on - maybe it
just went over the system limits like open file limits or something? In
any case it probably won't be a problem when I implement the
robust-proxy-accounting.

> You have a NAS which is sending large amounts of traffic to a proxy
> when the home server is down.  The proxy isn't configured to do
> anything useful with the packets.  This is a bug in the *architecture*.

Understood. Thanks for your help Alan.

Regards,
Anthony
Re: freeradius 2.1.8 dies Error: ASSERT FAILED event.c[1084]: home->ev != NULL
fab junkmail wrote:
> I recently upgraded our freeradius servers to 2.1.8 and over the past
> month it has died on one of the servers two times (spaced about two
> weeks apart I think). So fairly infrequently.

  OK.

> A bit of background, We use this server predominantly to proxy
> requests. Every day for about 15 minutes, the two main home servers we
> proxy to stop responding (they are doing backups or maintenance during
> this time) so for those 15 minutes our clients (LNS/NAS) would be
> sending a very large number of accounting interim packets and some
> stop packets and would be resending these while the home servers are
> down.

  You can configure the proxy to log accounting packets to disk when the
home server is down.  See raddb/sites-available/robust-proxy-accounting

> Sun Mar 14 17:30:15 2010 : Proxy: Marking home server 10.0.1.48
> port 1646 as zombie (it looks like it is dead).
> Sun Mar 14 17:30:16 2010 : Proxy: Marking home server 10.0.1.47
> port 1646 as zombie (it looks like it is dead).
> Sun Mar 14 17:30:19 2010 : Proxy: Marking home server 10.0.1.47
> port 1645 as zombie (it looks like it is dead).
> Sun Mar 14 17:30:19 2010 : Error: No response to status check 903535
> for home server 10.0.1.48 port 1646
> Sun Mar 14 17:30:20 2010 : Error: No response to status check 903536
> for home server 10.0.1.47 port 1646
> ...
> Sun Mar 14 17:30:32 2010 : Error: Internal sanity check failed for
> child state

  Hmm... that's not good.

> Fri Mar 19 17:30:54 2010 : Proxy: Failed to create a new socket for
> proxying requests.

  Why is it running out of sockets?  This shouldn't happen.

> Fri Mar 19 17:30:54 2010 : Proxy: Failed to create a new socket for
> proxying requests.
> Fri Mar 19 17:30:54 2010 : Proxy: Failed to create a new socket for
> proxying requests.
> ...
> Fri Mar 19 17:30:56 2010 : Error: ASSERT FAILED event.c[1084]:
> home->ev != NULL

  Well... after all of the previous errors, it's not surprising that
something *worse* eventually goes wrong.  It's like driving your car
for 45 minutes after the tires are flat: not a good idea.

> That last one is where it dies I think.

  Yes.

> That one was found to be a bug and was fixed - I don't know if my case
> is a bug though.

  It's a bug, but the other problems you're seeing should be fixed, too.

> I don't currently use the robust proxy accounting that that thread
> suggests. I expect that would probably work around the issue of
> freeradius crashing in this case and I will give that a go.

  Yes.

> Just posting this to let you know that it _might_ be a bug and to ask
> for advice about whether you think this is a bug or not, and if I
> should follow up on that, or if you think it is just my configuration
> that needs some changes and what areas I should concentrate on if that
> is the case?

  You have a NAS which is sending large amounts of traffic to a proxy
when the home server is down.  The proxy isn't configured to do anything
useful with the packets.  This is a bug in the *architecture*.

  Alan DeKok.
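The robust-accounting mechanism Alan refers to works in two pieces: accounting packets are written to a local detail file instead of being proxied directly, and a detail-file listener re-reads the file and feeds the entries back through the server for proxying, retrying while the home server is down. A minimal sketch of the idea, assuming 2.1.x syntax (instance names and paths here are illustrative; the shipped raddb/sites-available/robust-proxy-accounting file is the authoritative version):

```
# 1. A detail-module instance that logs accounting packets to disk.
#    Call it from the accounting {} section instead of proxying directly.
detail detail.proxy {
	filename = ${radacctdir}/detail-proxy
}

# 2. A detail listener that reads the file back and re-injects the
#    packets, which the server then proxies.  If the home server is
#    down, the reader simply falls behind and retries later, so no
#    accounting data is lost and no proxy sockets pile up.
listen {
	type = detail
	filename = ${radacctdir}/detail-proxy
	load_factor = 10
}
```

The net effect is that the local disk, rather than an ever-growing pile of in-memory proxied requests, absorbs the 15-minute daily outage described in the original post.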
freeradius 2.1.8 dies Error: ASSERT FAILED event.c[1084]: home->ev != NULL
I recently upgraded our freeradius servers to 2.1.8 and over the past
month it has died on one of the servers two times (spaced about two
weeks apart I think). So fairly infrequently.

A bit of background: we use this server predominantly to proxy requests.
Every day for about 15 minutes, the two main home servers we proxy to
stop responding (they are doing backups or maintenance during this
time), so for those 15 minutes our clients (LNS/NAS) would be sending a
very large number of accounting interim packets and some stop packets,
and would be resending these while the home servers are down.

Some relevant proxy home server settings we currently use for the main
home servers we proxy to:

response_window = 14
zombie_period = 40
status_check = status-server
check_interval = 30
num_answers_to_alive = 3

The times that freeradius has died have been near the end of the 15
minutes of home server downtime. The last time this happened, I noticed
logs in the attached file. The ones that sound relevant are as follows:

Sun Mar 14 17:30:15 2010 : Proxy: Marking home server 10.0.1.48
port 1646 as zombie (it looks like it is dead).
Sun Mar 14 17:30:16 2010 : Proxy: Marking home server 10.0.1.47
port 1646 as zombie (it looks like it is dead).
Sun Mar 14 17:30:19 2010 : Proxy: Marking home server 10.0.1.47
port 1645 as zombie (it looks like it is dead).
Sun Mar 14 17:30:19 2010 : Error: No response to status check 903535
for home server 10.0.1.48 port 1646
Sun Mar 14 17:30:20 2010 : Error: No response to status check 903536
for home server 10.0.1.47 port 1646
...
Sun Mar 14 17:30:32 2010 : Error: Internal sanity check failed for
child state
...
Fri Mar 19 17:30:54 2010 : Proxy: Failed to create a new socket for
proxying requests.
Fri Mar 19 17:30:54 2010 : Proxy: Failed to create a new socket for
proxying requests.
Fri Mar 19 17:30:54 2010 : Proxy: Failed to create a new socket for
proxying requests.
...
Fri Mar 19 17:30:56 2010 : Error: ASSERT FAILED event.c[1084]:
home->ev != NULL

That last one is where it dies, I think. That last error seems similar
(but a bit different) to the following thread:
http://www.mail-archive.com/freeradius-users@lists.freeradius.org/msg58052.html

">Re: ASSERT FAILED event.c in 2.1.7
>Alan DeKok
>Fri, 25 Sep 2009 01:19:40 -0700
>
>Maja Wolniewicz wrote:
>> After the upgrade from 2.1.6 to 2.1.7 my two servers died 3-4 times
>> daily with the following error:
>>
>> Thu Sep 24 19:07:13 2009 : Error: Received conflicting packet from
>> client AP-8 port 32777 - ID: 240 due to unfinished request 2396.
>> Giving up on old request.
>> Thu Sep 24 19:07:13 2009 : Error: ASSERT FAILED event.c[2682]:
>> request->ev != NULL
>>
>> I have to return to 2.1.6, which works smoothly.
>
>  The simplest thing to do in the short term is to delete the assertion.
>
>  Alan DeKok."

That one was found to be a bug and was fixed - I don't know if my case
is a bug though.

Another thread that sounds useful for this is:
http://www.mail-archive.com/freeradius-users@lists.freeradius.org/msg59985.html

"...
>Until then, configuring status-checks && local detail files will
>definitely help.  I would recommend doing that *anyways* for network
>stability.
>
>Alan DeKok."

I don't currently use the robust proxy accounting that that thread
suggests. I expect it would probably work around the issue of freeradius
crashing in this case, and I will give that a go.

Just posting this to let you know that it _might_ be a bug, and to ask
for your advice: do you think this is a bug that I should follow up on,
or is it just my configuration that needs some changes - and if so, what
areas should I concentrate on?

Regards,
Anthony


Sun Mar 14 17:30:15 2010 : Proxy: Marking home server 10.0.1.48
port 1646 as zombie (it looks like it is dead).
Sun Mar 14 17:30:16 2010 : Proxy: Marking home server 10.0.1.47
port 1646 as zombie (it looks like it is dead).
Sun Mar 14 17:30:19 2010 : Proxy: Marking home server 10.0.1.47
port 1645 as zombie (it looks like it is dead).
Sun Mar 14 17:30:19 2010 : Error: No response to status check 903535
for home server 10.0.1.48 port 1646
Sun Mar 14 17:30:20 2010 : Error: No response to status check 903536
for home server 10.0.1.47 port 1646
Sun Mar 14 17:30:23 2010 : Error: No response to status check 61094
for home server 10.0.1.47 port 1645
Sun Mar 14 17:30:31 2010 : Error: rlm_radutmp: Logout entry for NAS
lns02 port 2520 has wrong ID
Sun Mar 14 17:30:32 2010 : Error: Internal sanity check failed for
child state
Sun Mar 14 17:30:32 2010 : Error: Reply from home server 10.0.1.48
port 1646 - ID: 224 arrived too late for request 903469. Try increasing
'retry_delay' or 'max_request_time'
Sun Mar 14 17:30:33 2010 : Proxy: Marking home server 10.0.1.48
port 1645 as zombie (it looks like it is dead).
Sun Mar 14 17:30:34 2010 : Error: Internal sanity check failed for
child state
Sun Mar 14 17:30:34 2010 : Error:
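For context, the timing values quoted in the original post would sit in a home_server block in raddb/proxy.conf. A sketch, assuming FreeRADIUS 2.1.x syntax, with the address and port taken from the logs above and the name and secret being placeholders:

```
# Illustrative home_server entry using the poster's settings.
home_server acct-10.0.1.48 {
	type = acct
	ipaddr = 10.0.1.48
	port = 1646
	secret = changeme              # placeholder

	response_window = 14           # seconds to wait for a proxy reply
	zombie_period = 40             # silent for this long => marked zombie
	status_check = status-server   # probe with Status-Server packets
	check_interval = 30            # seconds between status checks
	num_answers_to_alive = 3       # checks answered before marked alive
}
```

With these values, a dead server is marked zombie after 40 seconds and needs three answered Status-Server checks at 30-second intervals to come back, which matches the roughly 15-minute windows of queued and retransmitted accounting traffic described in the post.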