Hello list,

I'm currently testing freeradius-snapshot-20020114 (configured as a proxy only) on Solaris 8 and am running into a problem.
radiusd runs for a short, seemingly random period of time (anywhere from about 10 to 30 seconds), happily processing requests, until it simply dies with signal 9 (SIGKILL??) and dumps core.

The problem also seems to be related to the load on the server. At 4 or 5 requests per second, it starts crashing within about 30 seconds. At 2 or 3 requests per second, it takes several minutes for the problem to appear. So far the problem has only appeared for accounting requests. Could there be a timing issue somewhere? Accounting requests take that much longer to process, because my proxy has to wait for a response to come back from the second RADIUS server.

Using gdb, it appears that radiusd crashes in at least a few different places. That is not very helpful, and it seems to suggest that this may not be an actual bug in FreeRADIUS? Here are three backtraces that I captured:

    #0 0x188b4 in proxy_send (request=0x9cb18) at proxy.c:317
    317 request->proxy->timestamp = request->timestamp;
    (gdb) bt
    #0 0x188b4 in proxy_send (request=0x9cb18) at proxy.c:317
    #1 0x15480 in rad_respond (request=0x9cb18, fun=0x170a0 <rad_accounting>) at radiusd.c:1527
    #2 0x1ecf8 in request_handler_thread (arg=0x98110) at threads.c:169

----------------------------------------------------------------------------

    #0 0xff141da4 in t_delete () from /usr/lib/libc.so.1
    (gdb) bt
    #0 0xff141da4 in t_delete () from /usr/lib/libc.so.1
    #1 0xff141998 in realfree () from /usr/lib/libc.so.1
    #2 0xff14226c in cleanfree () from /usr/lib/libc.so.1
    #3 0xff1413a0 in _malloc_unlocked () from /usr/lib/libc.so.1
    #4 0xff141294 in malloc () from /usr/lib/libc.so.1
    #5 0x22538 in rad_decode (packet=0xa00f8, original=0xa3b68, secret=0x98dec "gloople") at radius.c:1060
    #6 0x15208 in rad_respond (request=0x98da0, fun=0x170a0 <rad_accounting>) at radiusd.c:1437
    #7 0x1ecf8 in request_handler_thread (arg=0x982f0) at threads.c:169
----------------------------------------------------------------------------

    #0 0x23220 in pairfind (first=0x190, attr=41) at valuepair.c:97
    97 first = first->next;
    (gdb) bt
    #0 0x23220 in pairfind (first=0x190, attr=41) at valuepair.c:97
    #1 0x1888c in proxy_send (request=0x9d728) at proxy.c:312
    #2 0x15480 in rad_respond (request=0x9d728, fun=0x170a0 <rad_accounting>) at radiusd.c:1527
    #3 0x1ecf8 in request_handler_thread (arg=0xa70f0) at threads.c:169

This appears to point back to the threading, but I'm not really sure whether it is a Solaris issue or a FreeRADIUS issue. The log files don't appear (to me) to give a definitive answer as to what is happening, except that at the time of the "crash" I'm getting incomplete attribute logging such as:

    Thread 2 handling request 167, (17 handled so far)
        Proxy-State = 0x313639
    Sending Accounting-Response of id 169 to 203.108.109.27:62729
    Finished request 167
    Going to the next request
    Thread 2 waiting to be assigned a request
        NAS-IP-Address = 203.108.109.27
        = 1
        = Async
        = Start
        = "123"
        Proxy-State = "169"
        = UNKNOWN-TYPE

When I run the server with the "-s" option, it seems to run fine and does not exhibit this behaviour. Once again, this appears to point towards a problem with threading?

I know there are at least a couple of people running FreeRADIUS on Solaris 8. I'm wondering whether anyone can point me in a direction to start looking for the problem, or whether there is a known issue with Solaris that requires a patch or something similar.

Many thanks for your time and effort,
Michael
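The three backtraces are consistent with, though they do not prove, one thread touching a request's proxy data after another thread has freed or reused it. The first trace crashes dereferencing request->proxy at proxy.c:317. The third shows pairfind() walking a list from an implausible pointer (first=0x190). The second crashes inside libc's malloc free-list code (t_delete), which is a typical symptom of heap corruption such as a double free or a write through a freed pointer. The sketch below illustrates that general pattern and its usual cure, a mutex around the shared pointer that is set to NULL when the memory is released. The types and functions (request, proxy_packet, stamp_proxy, release_proxy, run_demo) are hypothetical simplifications and are not FreeRADIUS's actual code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical, heavily simplified stand-ins; NOT FreeRADIUS's real types. */
typedef struct proxy_packet {
    time_t timestamp;
} proxy_packet;

typedef struct request {
    pthread_mutex_t lock;   /* guards every read/write of ->proxy */
    time_t timestamp;
    proxy_packet *proxy;    /* shared: another thread may free it */
} request;

/* Allocate the proxy packet and initialise the lock; 0 on success. */
int request_init(request *r, time_t now)
{
    r->timestamp = now;
    r->proxy = calloc(1, sizeof *r->proxy);
    if (r->proxy == NULL)
        return -1;
    if (pthread_mutex_init(&r->lock, NULL) != 0) {
        free(r->proxy);
        r->proxy = NULL;
        return -1;
    }
    return 0;
}

/* Analogue of proxy.c:317 ("request->proxy->timestamp = request->timestamp"),
 * except the pointer is checked and used only while the lock is held.
 * Returns 1 if the proxy packet was stamped, 0 if it was already freed. */
int stamp_proxy(request *r)
{
    int stamped = 0;

    assert(r != NULL);
    pthread_mutex_lock(&r->lock);
    if (r->proxy != NULL) {
        r->proxy->timestamp = r->timestamp;
        stamped = 1;
    }
    pthread_mutex_unlock(&r->lock);
    return stamped;
}

/* Analogue of a cleanup path that may run in a different thread. */
void release_proxy(request *r)
{
    pthread_mutex_lock(&r->lock);
    free(r->proxy);
    r->proxy = NULL;        /* later users see NULL, not freed memory */
    pthread_mutex_unlock(&r->lock);
}

/* Worker that keeps stamping the shared request. */
static void *stamp_thread(void *arg)
{
    for (int i = 0; i < 100000; i++)
        stamp_proxy(arg);
    return NULL;
}

/* Stamp from one thread while the calling thread releases the proxy data.
 * Returns 0 if the request ends in the expected state (proxy released,
 * further stamps refused), nonzero otherwise. */
int run_demo(void)
{
    request r;
    pthread_t stamper;
    int result;

    if (request_init(&r, 1000) != 0)
        return -1;
    if (pthread_create(&stamper, NULL, stamp_thread, &r) != 0) {
        release_proxy(&r);
        pthread_mutex_destroy(&r.lock);
        return -1;
    }
    release_proxy(&r);      /* races with stamp_thread, but safely */
    pthread_join(stamper, NULL);

    result = (r.proxy == NULL && stamp_proxy(&r) == 0) ? 0 : 1;
    pthread_mutex_destroy(&r.lock);
    return result;
}
```

Without the lock and the NULL assignment, stamp_thread could write through r->proxy after free(), and that kind of error tends to surface later and far from its cause, for example inside malloc(). Which thread owns and frees the proxied request in the real server, and therefore where such a lock would belong, cannot be determined from these backtraces alone.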