On Thu, Nov 26, 2009 at 06:17:29PM +0100, Alan DeKok wrote:
> Josip Rodin wrote:
> > I upgraded one of our proxy servers from 2.0.4 to 2.1.7, and noticed that
> > the proxying changed in a way that "status_check = request" logic started
> > being critical, so this kind of stuff:
> >
> > Sun Nov 22 09:25:56 2009 : Error: Rejecting request 70011 due to lack of
> > any response from home server X port 1812
> >
> > ...was replaced, without a change in home server configuration, with:
>
> It wasn't replaced, it just happens less often.
>
> > It was unclear to me why didn't FreeRADIUS notice this as soon as it first
> > happened, and when it eventually happened, why didn't it explicate the
> > rationale. So I looked and found these in src/main/event.c:
>
> Odds are your config handles the "no response" packets. So the above
> message happens less often.

Returning to the original problem: in my pool of two fail-over home servers
I now have both of them set up with "status_check = none". My upstream proxy
maintainers refuse to implement decent status checks, so I'm forced to do
this for now. I can do a status check with an entry from a particular HL
RADIUS that I happen to control, but that just creates a daisy-chain of
SPoFs. :/ They insist that I not do anything like this, and that I instead
set up my server so that, for each request, it stubbornly tries their first
server and then, if that fails, their second server.

Now, when a request comes through that gets discarded by the first proxy
(because it itself times out on a random HL RADIUS), the first proxy gets
marked as a zombie. Strangely enough, my server keeps it marked as a zombie
even after several minutes (long past any of the zombie_period and
revive_interval periods I've kept in the configuration). My server keeps
talking only with the second server, which is in the 'alive' state, and
ignores the zombie.

After re-reading the proxy.conf comments, this actually looks logical: there
is no status check of any kind that would unmark it as a zombie.
revive_interval can resurrect it from the 'dead' state, but not from the
zombie state.

Also, this part of the revive_interval comment is a bit confusing:

	# As a result, we recommend enabling status checks, and
	# we do NOT recommend using "revive_interval".
	#
	# The "revive_interval" is used ONLY if the "status_check"
	# entry below is not "none". Otherwise, it will not be used,
	# and should be deleted.

So it's supposed to be a crutch only for people who *have* status checks,
but not a crutch for those of us who do *not* have status checks.

What is a crutch for this situation? A cron job that keeps doing
radmin -e 'set home_server state X Y alive'? :)
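
For illustration only, here is a minimal sketch of what that cron-job crutch could look like as a small Python wrapper. It assumes that radmin is on the PATH, that the control socket is enabled with write access, and that X and Y in the command above stand for the home server's IP address and port. The function names and the 192.0.2.1 address are hypothetical placeholders, and I have not verified that forcing 'alive' this way actually clears the zombie state.

```python
import subprocess


def build_radmin_command(ipaddr, port, state="alive", radmin="radmin"):
    """Build the argv list for: radmin -e 'set home_server state X Y alive'.

    ipaddr and port are assumed to be the X and Y of that command.
    """
    if state not in ("alive", "dead"):
        raise ValueError("state must be 'alive' or 'dead'")
    port = int(port)
    if not 0 < port < 65536:
        raise ValueError("port out of range")
    return [radmin, "-e", f"set home_server state {ipaddr} {port} {state}"]


def revive_home_server(ipaddr, port, runner=subprocess.run):
    """Ask radmin to mark one home server alive; True if radmin exited with 0.

    The runner argument exists only so the call can be replaced in a dry run.
    """
    cmd = build_radmin_command(ipaddr, port, "alive")
    result = runner(cmd, capture_output=True, text=True, timeout=10)
    return result.returncode == 0
```

A cron entry would then call something like revive_home_server("192.0.2.1", 1812) every few minutes. Note that this blindly marks the server alive whether or not it is actually answering, so it only restores the "stubbornly try the first server" behaviour and is in no way a status check.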