Sorry for the late answer, but most of my primary site was really down today when I noticed and posted this, which is what drew my attention ;) Now I've looked into it somewhat deeper; answers and findings below:

> Kostas Kalevras
> Sent: Tuesday, December 14, 2004 4:31 PM
> To: [EMAIL PROTECTED]
> Subject: RE: replication with radrelay: Failed to aquire filelock
>
> On Tue, 14 Dec 2004, Michael Markstaller wrote:
>
> > My setup: Running FreeRADIUS 1.0.1 on Debian sarge
> > server2 (secondary) -> detail-relay/radrelay -> server1 (primary) ->
> > mysql
> > The servers are far away from being under (dual Xeon 2,8, 1GB, SCSI 15k
> > etc)
> >
> > As long as the primary runs and is reachable, everything is fine but
> > whenever the secondary server comes into action due to the primary being
> > unreachable,
> > I see this "Error: rlm_detail: Failed to aquire filelock
> > for...<detail-relay>" frequently. (~1000 acct per hour with ~25 "Error:
> > rlm_detail: " messages per hour)
> >
> > Is there really nothing that can be done about this because I'm
> > concerned to loose some accounting as whenever this happens the primary
> > is most likely down ?
>
> Does this happen when the primary server comes back up or while it is
> down? For instance does the detail file get larger when these messages
> are printed?
For the first occurrences (when I wrote this post) I couldn't tell for sure, as the lines/routing to the primary were flapping. But now that everything is back up, I see that it no longer happened once the primary was "totally" unreachable, which makes sense to me: as soon as radrelay cannot relay anymore (and therefore doesn't lock the detail-relay file for updates), the problem disappears.

I also think the reason I'm hit by this is that the NASes do not fall back to the primary immediately, while radrelay already starts relaying to it. Even more unfortunate, the primary being reachable from the secondary's point of view doesn't mean it's reachable from the NAS in my multi-failover mess ;)
This dead time is configured on my NASes to prevent problems/timeouts in exactly such flapping scenarios, and it's very likely that upstream/downstream RADIUS servers act like this too, as FreeRADIUS does with dead_time in proxy.conf.

> radrelay should not create any problem especially in this case (where the
> target radius server is down), since it will fill up it's accounting slots
> and not read the detail file untill the corresponding packets have been
> acknowledged by the primary radius server.

That seems to be exactly what happens: as long as the primary is totally unreachable, things are fine, but when radrelay starts working and relaying the details, the log starts spitting out the error:
"Error: rlm_detail: Failed to aquire filelock for /var/...detail-relay, giving up"

Now I understand this is neither a problem that occurs daily (hopefully, depends on MCI ;) nor really extremely bad, but I'd sleep much better being 100% sure that all three A's of my two RADIUS servers work smoothly whatever mess, like flapping lines, happens.
Maybe there's some way to tune either radrelay or rlm_detail to work around this better, as it most likely happens every time after short outages.

Michael

-
List info/subscribe/unsubscribe? See http://www.freeradius.org/list/users.html
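
To illustrate the kind of contention described above, here is a minimal Python sketch, not the actual rlm_detail or radrelay code. It assumes a writer that tries a non-blocking exclusive lock a bounded number of times before giving up, and a reader that holds the lock while it processes the file. The names `append_with_lock` and `demo`, the retry count, the delay, and the use of `flock` are all assumptions made for the example (Unix only).

```python
import fcntl
import os
import tempfile
import time


def append_with_lock(path, record, max_tries=5, delay=0.01):
    """Append record under an exclusive lock.

    Returns False if the lock could not be acquired within max_tries
    attempts (hypothetical values, not FreeRADIUS defaults).
    """
    for _ in range(max_tries):
        fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
        try:
            # Non-blocking attempt: fails at once if someone else holds the lock.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            os.close(fd)
            time.sleep(delay)
            continue
        try:
            os.write(fd, record.encode())
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
            os.close(fd)
        return True
    return False  # "giving up": this record was not written


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "detail-relay")
        # No reader active: the write succeeds.
        ok_idle = append_with_lock(path, "acct 1\n")
        # A reader (standing in for a relay process) holds the lock.
        reader = os.open(path, os.O_RDONLY)
        fcntl.flock(reader, fcntl.LOCK_EX)
        blocked = append_with_lock(path, "acct 2\n")
        fcntl.flock(reader, fcntl.LOCK_UN)
        os.close(reader)
        # Reader gone again: writes succeed once more.
        ok_after = append_with_lock(path, "acct 3\n")
        with open(path) as fh:
            lines = fh.read().splitlines()
        return ok_idle, blocked, ok_after, lines
```

In this sketch the second record is simply dropped once the retries run out, which is the accounting loss the post is worried about. Raising `max_tries` or `delay` only widens the window the writer is willing to wait; it does not remove the race.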