Quoting Dustin Doris <[EMAIL PROTECTED]>:

> >
> > Hello folks,
> >
> > We have been using FreeRADIUS since 0.8.x, and since 0.9.x we have used
> > the rlm_sql (MySQL) module to store accounting data. We now also store
> > our users in the MySQL db, so authentication and authorization go
> > through the rlm_sql module as well.
> >
> > We had miscellaneous problems with that module from the beginning, but
> > when we switched to version 1.0.0 everything started to work better.
> > Good improvement and nice work.
> >
> > So what is the problem we are having? We are still trying to identify
> > it. That is not easy, because it has happened only 4 times since
> > September (when we started using 1.0.1), at random. Last night was the
> > most recent occurrence.
> >
> > The radius server just stops responding and dies without any abnormal
> > log entry; the process simply ends. If you start it again, it starts
> > and logs as usual, but our users can't connect. No matter how many
> > times you restart the service, it never serves requests. But if you
> > start it in debug mode (-X) to see whether anything goes wrong, and
> > then restart it as usual (without debug, because nothing abnormal
> > showed up in debug mode), everything starts to work as it should and
> > our users can connect again.
> >
> > My guess is that it is related to rlm_sql, but that is only a guess.
> >
> > The radius server is fairly busy: we have 3 Cisco AS units (2 AS5400
> > and 1 AS5300) providing 720 lines, of which between 500 and 600 are in
> > use all the time.
> >
> > As I said, last night both of our servers died around the same time,
> > which is very strange.
> >
> > The environment is:
> > OS: WhiteBox3 (RHEL3 clone) with all the updates
> > freeradius rebuilt from the latest SRPM provided by RH (1.0.1-1) (we
> > need the experimental modules: sqlcounter)
> >
> > Has anybody had this experience?
> >
> > Thanks very much,
> > roger
> > PS: I apologize for my bad English.
> >
> >
>
> The fact that you say the two servers died around the same time is
> interesting.  I would set up a packet sniffer on those machines and
> capture the radius traffic going to the box, hoping to capture the
> traffic that hits the machine the next time it goes down.  Of
> course this may not help, but it might be worth giving it a shot.
>

Yes, it is.
But one of the servers (the secondary) logged this:

Mon Jan 10 21:33:09 2005 : Error: Assertion failed in modcall.c, line 68

That was its last log entry.
The first radius server didn't log anything abnormal.
This probably doesn't mean anything, but maybe it does :-) (there is always
hope :-) )

If this problem hit both servers at the same time, it must be related to
something common to them.
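
To check how close together the two servers really stopped, the last entry of
each radius.log can be compared. The following is only a rough sketch: it
assumes every log line has the same "timestamp : Level: message" shape as the
assertion line quoted above, and the function names are made up for
illustration.

```python
from datetime import datetime

# Timestamp layout taken from the quoted line: "Mon Jan 10 21:33:09 2005"
LOG_TIME_FORMAT = "%a %b %d %H:%M:%S %Y"


def parse_log_line(line):
    """Split 'timestamp : Level: message' into (datetime, level, message).

    Returns None for lines that do not match the assumed layout.
    """
    stamp, sep, rest = line.strip().partition(" : ")
    if not sep:
        return None
    try:
        when = datetime.strptime(stamp, LOG_TIME_FORMAT)
    except ValueError:
        return None
    level, sep, message = rest.partition(": ")
    if not sep:
        return None
    return when, level, message


def last_entry(lines):
    """Return the last parseable entry from an iterable of log lines."""
    last = None
    for line in lines:
        parsed = parse_log_line(line)
        if parsed is not None:
            last = parsed
    return last


def seconds_apart(lines_a, lines_b):
    """Absolute gap in seconds between the last entries of two logs."""
    a, b = last_entry(lines_a), last_entry(lines_b)
    if a is None or b is None:
        return None
    return abs((a[0] - b[0]).total_seconds())


if __name__ == "__main__":
    sample = "Mon Jan 10 21:33:09 2005 : Error: Assertion failed in modcall.c, line 68"
    print(parse_log_line(sample))
```

Feeding the lines of both servers' radius.log files to seconds_apart() would
show whether "around the same time" means seconds or many minutes, which may
help decide whether a shared cause is likely.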

Those servers have two things in common:

1- the clients
2- the accounting db (mysql)

mysqld didn't go down; only the radiusd server did.

The last time was exceptional because we had changed the design of our
dial-up service. On the previous occasions each radius server had its own db
(we synchronized the dbs with our own method, but that proved to be weak). So
we changed to a setup with a master/principal radius server using the master
MySQL db, and a secondary radius server using a slave MySQL db, except that
the secondary FR server connects to the master MySQL server for accounting
only (authentication and authorization go to the slave MySQL db).

It is very interesting that the last time radiusd went down, when I tried to
restart it everything looked fine, but only one AS could provide connectivity
to the remote users. The other two AS couldn't, even though I saw events from
those two AS in the radius.log file.
As usual, when I started radius with the -X command line option everything
looked fine. After that (starting radiusd -X) I started radiusd the normal
way, and then all of our AS started providing connectivity.

On the previous occasions (the first 3), radius just couldn't connect to the
MySQL server. Again, starting radius as radiusd -X solved the situation.

Every time, it looks like -X cleans up some environment. Very weird to me.

> As this packet capture may get huge, you will probably want to stop it and
> start over every day if your servers don't go down.  The easiest way would
> be a tcpdump outputting to a file and then use ethereal to analyze it.
>
Yes, that is the problem: the sniffer will capture a __lot__ of traffic,
because the problem only appears from time to time (about once a month) and
our radius servers handle a lot of traffic (about 27000 connections per
weekday).
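
One possible way to keep such a long-running capture bounded is to let
tcpdump rotate through a fixed set of files instead of restarting it every
day. The sketch below only builds the command line; it assumes a tcpdump
build that supports -C (file size in millions of bytes) together with -W
(number of files in the ring), and the interface name, output prefix and UDP
ports 1812/1813 are placeholders that would need to match the real setup
(some NAS configurations use 1645/1646 instead).

```python
import shlex


def build_capture_command(interface="eth0",
                          out_prefix="/var/tmp/radius",
                          file_mb=100,
                          file_count=20,
                          ports=(1812, 1813)):
    """Build a tcpdump argument list that writes a ring of capture files.

    All defaults are illustrative assumptions, not values from this thread.
    """
    # Capture only RADIUS traffic to keep the files small
    port_filter = " or ".join(f"udp port {p}" for p in ports)
    return [
        "tcpdump",
        "-n",                    # no name resolution
        "-i", interface,         # capture interface
        "-s", "0",               # full packets, not just headers
        "-C", str(file_mb),      # rotate after this many million bytes
        "-W", str(file_count),   # keep at most this many files
        "-w", f"{out_prefix}.pcap",
        port_filter,
    ]


def max_disk_megabytes(file_mb=100, file_count=20):
    """Upper bound on disk usage of the ring, in millions of bytes."""
    return file_mb * file_count


if __name__ == "__main__":
    cmd = build_capture_command()
    # Print a copy-pasteable shell command
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

With a ring like this the oldest file is overwritten once the limit is
reached, so the capture can run for a month and still contain the traffic
from just before the next failure, ready to be opened in ethereal.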

I know I need to do more troubleshooting, but it is difficult because I don't
have a clue about what the cause is.

thanks anyway for your reply

roger

----------------------------------------------------------------------
Central node of the Infomed network            (http://www.sld.cu)
Linux user: 97152                              (http://counter.li.org)
Member of the LinuxCuba coordination group     (http://www.linux.cu)

"Whatever you do will be insignificant, but it is very important
 that you do it."
                       Gandhi
----------------------------------------------------------------------