Quick bit of background. We're using FreeRADIUS with rlm_perl for network access control at our site. Everything ran fine on FreeBSD 8.0 with FreeRADIUS 2.1.8 built from ports and Perl 5.8 built without threads or multiplicity. We then got new, higher-spec servers from Dell, and sadly the only *nix OS that currently supports their RAID card seems to be Ubuntu Server 10.04, so we're now on that.
Perl there is compiled (as it is by default on Ubuntu) with threads and multiplicity. The last time we tried a threaded Perl with rlm_perl, which I believe was on FreeRADIUS 2.1.3, it would eventually lock up. This time we tested by hammering the server from several machines using radclient and test packets read from a file, and it appeared stable.

It ran fine for a week or so, but in the last few days it has crashed twice, both times with the same message. The logs initially fill with messages of this sort:

    Sat Jul 24 01:05:08 2010 : Error: WARNING: Unresponsive child for request 128145, in module perl component accounting
    Sat Jul 24 01:05:08 2010 : Info: WARNING: Child is hung for request 128145.

We end up with 32 messages of the first type (we're currently running with 32 threads configured in the thread pool section of radiusd.conf), interspersed with the second type, followed by a stack of the second type on its own. These are always for the same set of requests, i.e. the ones we got the initial error for, and the second message repeats multiple times for a given request. Then eventually we get:

    Sat Jul 24 01:05:27 2010 : Error: ASSERT FAILED threads.c[406]: (*request)->magic == REQUEST_MAGIC

All our accounting module does in Perl is convert the incoming RADIUS hash to YAML and then attempt to write it to a database with a timestamp. I strongly suspect that the initial problem lies in threading combined with the DBD::mysql module in Perl or the MySQL client library, rather than in FreeRADIUS, despite our testing seeming to show it was OK. But I do not think the final ASSERT FAILED error should be generated as a result of that.

I am trying to understand what is going on. Is FreeRADIUS attempting to kill the deadlocked threads and being unable to do so?

Thanks
Dan
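
P.S. For context, the settings involved live in radiusd.conf and look roughly like the sketch below. Only the figure of 32 threads comes from our setup; I'm assuming here that it corresponds to max_servers, and every other value is the stock-style default shown for illustration, not necessarily what we run.

```
# Sketch only: values other than the 32-thread limit are illustrative.

# As I understand it, a request still running in a module after this
# many seconds is what gets reported as an "Unresponsive child".
max_request_time = 30

thread pool {
        start_servers = 5
        # Assumption: this is the "32 threads" mentioned above, which
        # would match the 32 "Unresponsive child" messages we see.
        max_servers = 32
        min_spare_servers = 3
        max_spare_servers = 10
        max_requests_per_server = 0
}
```

If that reading is right, the 32 initial errors would mean every worker thread in the pool is stuck inside the perl accounting call at the same time.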