Hi all, I'm at a bit of a loss. I'm currently trying to load test the authentication proxy performance of freeRADIUS 1.0.1 in preparation for a deployment this weekend.
Unfortunately, I'm running into this error: "Error: FATAL! Server is too busy to process requests".

My scenario is: an Authentication Request comes in, we look the username up in openLDAP running on the same server, and if the user doesn't exist, we proxy the request by setting Proxy-To-Realm using the attribute rewrite module. I have made changes to the rlm_ldap module for local requirements, and I'm also running a couple of custom modules, so it's not pure 1.0.1. The error still occurs when I disable all of my custom modules (except rlm_ldap, of course).

Interestingly, this error doesn't seem to occur when the openLDAP server is running on a different server; however, the rate of requests that I can push through the server is also a lot lower in that case (about 25%). Oh, and finally, this is running Solaris 9 on a V240.

From what I can tell, it doesn't seem to be related to thread starvation, and the time it takes to reach this error is somewhat variable. The CPU (according to prstat) doesn't need to be at 100% for this to occur either. However, when it does occur, radiusd is typically using all or close to all of one of the CPUs.

It also doesn't happen when I run the server with -xx. Presumably this is because the extra output slows the server down enough that it's not hitting whatever barrier is causing this.

To do the testing I'm using radclient, sending from multiple threads. The number of radclient threads does seem to have a bearing, and I can stop the error from happening by reducing the number of threads. Once again, I presume this is due to the reduced throughput of requests.

I've just worked 11 days straight, averaging at least 15 hours a day, and at the moment I'm just too tired to trace right through the code to see what might be causing the issue. It is 3am here, after all. Any advice or help is really appreciated at this stage.

What might be the cause of (*request)->child_pid != NO_SUCH_CHILD_PID in request_dequeue?
Is there anything I should look at, or tune, to reduce the likelihood of this occurring?

It seems that I can also resolve the issue (at least for the same request rate) by looping at the "select" in request_dequeue 20 times instead of 10. What risk does this present? I then get errors like:

  Fri Feb 17 03:10:54 2006 : Error: Dropping conflicting packet from client dbst1:63628 - ID: 198 due to unfinished request 44357

Which is better (to me) than the server stopping. ;-)

Thank you kindly for taking the time to read this email. Sorry it was so long-winded. Hopefully you will be able to offer some advice!

kind regards,
Mike
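
To make the "loop 20 times instead of 10" question concrete, here is a minimal Python sketch of a bounded wait loop. It is not the FreeRADIUS source: the names wait_until_free, is_busy, max_tries and delay are hypothetical, and time.sleep stands in for a select() call used purely as a timeout. The point is only that the worst-case stall of the waiting thread is roughly max_tries * delay, so doubling the count doubles how long that thread can sit blocked before giving up.

```python
import time


def wait_until_free(is_busy, max_tries=10, delay=0.001):
    """Poll is_busy() at most max_tries times.

    Sleeps `delay` seconds after each poll that still reports busy
    (a stand-in for a select() used as a timeout).
    Returns True as soon as a poll reports "not busy", or False if
    every poll reported busy. Worst-case wait is about max_tries * delay.
    """
    for _ in range(max_tries):
        if not is_busy():
            return True
        time.sleep(delay)
    return False


def make_busy_for(n_polls):
    """Hypothetical test helper: reports busy for the first n_polls polls."""
    state = {"calls": 0}

    def is_busy():
        state["calls"] += 1
        return state["calls"] <= n_polls

    return is_busy
```

With a resource that stays busy for 15 polls, wait_until_free(make_busy_for(15), max_tries=10) gives up, while max_tries=20 succeeds. That would be consistent with the larger count hiding the failure at the same request rate, at the cost of a longer stall per attempt.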