RE: replication with radrelay: Failed to aquire filelock

Michael Markstaller Tue, 14 Dec 2004 16:51:13 -0800

Sorry for the late answer, but most of my primary site was really down
today when I noticed and posted this, which is what drew my attention ;)
Now I've looked into it a bit deeper; answers and findings below:
 
> Kostas Kalevras
> Sent: Tuesday, December 14, 2004 4:31 PM
> To: [EMAIL PROTECTED]
> Subject: RE: replication with radrelay: Failed to aquire filelock
> 
> On Tue, 14 Dec 2004, Michael Markstaller wrote:
> 
> > My setup: Running FreeRADIUS 1.0.1 on Debian sarge
> > server2 (secondary) -> detail-relay/radrelay -> server1 (primary) ->
> > mysql
> > The servers are far away from being under (dual Xeon 2,8, 1GB,
> > SCSI 15k etc)
> >
> > As long as the primary runs and is reachable, everything is fine but
> > whenever the secondary server comes into action due to the primary
> > being unreachable, I see this "Error: rlm_detail: Failed to aquire
> > filelock for...<detail-relay>" frequently. (~1000 acct per hour with
> > ~25 "Error: rlm_detail: " messages per hour)
> >
> > Is there really nothing that can be done about this because I'm
> > concerned to lose some accounting as whenever this happens the
> > primary is most likely down ?
> 
> Does this happen when the primary server comes back up or while it is
> down? For instance does the detail file get larger when these messages
> are printed?


For the first occurrences (when I wrote this post) I couldn't tell for
sure, as the lines/routing to the primary were flapping.
But now that everything is back up, I see it didn't happen anymore
once the primary was "totally" unreachable, which makes sense to me:
as soon as radrelay cannot relay anymore (and therefore doesn't lock
the detail-relay file for updates), the problem disappears.
I also think the reason I'm hit by this is that the NASes do not fall
back to the primary immediately, while radrelay already starts relaying
to the primary. Even more unfortunate, the primary being reachable from
the secondary's view doesn't mean it's reachable from the NAS in my
multi-failover mess ;)

This dead-time is configured on my NASes to prevent problems/timeouts
in exactly such flapping scenarios. Up/downstream RADIUS servers very
likely act like this too, as FreeRADIUS does with dead_time in
proxy.conf.
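
To illustrate what I mean by dead-time, here is a minimal Python sketch
of the idea: once a server is marked dead, the client skips it until the
dead-time has passed, even if the server is reachable again in the
meantime. The names (ServerState, pick_server) and the 120 second value
are made up for illustration; this is not FreeRADIUS or NAS code.

```python
import time


class ServerState:
    """Tracks whether a RADIUS server is temporarily considered dead."""

    def __init__(self, name, dead_time=120.0, clock=time.monotonic):
        self.name = name
        self.dead_time = dead_time   # seconds to avoid the server (illustrative)
        self._clock = clock          # injectable clock, handy for testing
        self._dead_until = None      # None means "never marked dead"

    def mark_dead(self):
        """Call after a timeout: avoid this server for dead_time seconds."""
        self._dead_until = self._clock() + self.dead_time

    def is_usable(self):
        return self._dead_until is None or self._clock() >= self._dead_until


def pick_server(servers):
    """Return the first usable server in priority order."""
    for server in servers:
        if server.is_usable():
            return server
    # Everything is marked dead: fall back to the highest-priority server.
    return servers[0]
```

With a list of [primary, secondary], the NAS keeps using the secondary
for the whole dead-time after one timeout on the primary, which is the
window in which radrelay may already be relaying again.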

> radrelay should not create any problem especially in this case (where
> the target radius server is down), since it will fill up its
> accounting slots and not read the detail file until the corresponding
> packets have been acknowledged by the primary radius server.
> 

That seems to be exactly what happens: as long as the primary is totally
unreachable things are fine, but when radrelay starts working and
relaying the details, the log starts spitting out the error:
"Error: rlm_detail: Failed to aquire filelock for /var/...detail-relay,
giving up"
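
As a rough picture of how I understand the contention, here is a small
Python sketch of a writer that tries a non-blocking exclusive lock a few
times and then gives up while another party holds the lock. It uses
flock() for simplicity, and the function name, attempt count and delay
are my own assumptions, not the values or calls rlm_detail actually uses.

```python
import fcntl
import os
import time


def append_with_lock(path, record, attempts=5, delay=0.01):
    """Append record under an exclusive lock; return False if we give up."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
    try:
        for _ in range(attempts):
            try:
                # Non-blocking: fails at once if someone else holds the lock.
                fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            except BlockingIOError:
                time.sleep(delay)    # lock is busy, wait a bit and retry
                continue
            try:
                os.write(fd, record.encode())
            finally:
                fcntl.flock(fd, fcntl.LOCK_UN)
            return True
        # All attempts failed: the record is NOT written ("giving up").
        return False
    finally:
        os.close(fd)
```

If the reader holds the lock longer than attempts * delay, the writer
returns False and that record is lost, which is the case I'm worried
about for accounting.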

Now I understand this is neither a daily occurring problem (hopefully,
depends on MCI ;) nor really extremely bad, but I'd sleep much better
being 100% sure that all three A's of my two RADIUS servers work
smoothly whatever mess, like flapping lines, happens.

Maybe someone has an idea how to tune either radrelay or rlm_detail to
work around this better, as this most likely happens every time after a
short outage.


Michael


-
List info/subscribe/unsubscribe? See http://www.freeradius.org/list/users.html