>
> hello folks
>
> We have been using FreeRADIUS since 0.8.x, and since 0.9.x we have used
> the rlm_sql (MySQL) module to store accounting. Now we also use the
> MySQL database to store users, so authentication and authorization also
> go through the rlm_sql module.
>
> We have had miscellaneous problems with that module since the
> beginning, but when we switched to version 1.0.0 everything started to
> work better. Good improvements and nice work.
>
> So what is the problem we are having? We are still trying to identify
> it. That is not easy, because it has only happened four times since
> September (when we started using 1.0.1), completely at random. The last
> time was last night.
>
> The RADIUS server just stops responding and dies without any abnormal
> log entry; the process simply ends. If you start it again, it starts and
> logs as usual, but our users can't connect. No matter how many times you
> restart the service, it never serves requests. But if you start it in
> debug mode (-X) to see whether anything goes wrong, and then restart it
> normally (without debug, because nothing abnormal showed up in debug
> mode), everything works as it should and our users start to connect.
>
> My guess is that it is related to rlm_sql, but it is just a guess.
>
> The RADIUS server is fairly busy: we have three Cisco access servers
> (two AS5400s and one AS5300) providing 720 lines, of which between 500
> and 600 are in use all the time.
>
> As I said before, last night both of our servers died at around the
> same time, which is very strange.
>
> The environment is:
> OS: WhiteBox 3 (RHEL3 clone) with all updates applied
> FreeRADIUS: rebuilt from the latest SRPM provided by Red Hat (1.0.1-1),
> because we need the experimental modules (sqlcounter)
>
> Has anybody had this experience?
>
> Thanks very much,
> Roger
> PS: I apologize for my bad English.
>
>

The fact that both servers died at around the same time is interesting.
I would set up a packet sniffer on those machines to capture the RADIUS
traffic going to them, in the hope of catching whatever hits the machine
the next time it goes down. Of course this may not help, but it might be
worth a shot.

As this packet capture may get huge, you will probably want to stop it and
start over every day that your servers don't go down. The easiest way is
to have tcpdump write to a file and then use Ethereal to analyze it.
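
One way to set up that rotating capture is sketched below in Python, assuming a tcpdump new enough to support the `-G` rotation flag (older releases lack it; there you would restart tcpdump from cron instead). The helper name `build_tcpdump_cmd`, the interface `eth0` and the directory `/var/tmp/radius` are hypothetical. The filter covers both the RFC ports 1812/1813 and the older 1645/1646 that many Cisco NAS configurations use, and `-s 0` keeps whole packets so RADIUS attributes are not truncated.

```python
import shlex

# Hypothetical port list: standard RADIUS auth/acct (1812/1813) plus the
# legacy ports (1645/1646) many Cisco access servers still send to.
RADIUS_PORTS = (1812, 1813, 1645, 1646)


def build_tcpdump_cmd(interface, outdir, rotate_seconds=86400):
    """Return an argv list for a rotating, full-packet RADIUS capture."""
    bpf = " or ".join("udp port %d" % p for p in RADIUS_PORTS)
    return [
        "tcpdump",
        "-i", interface,             # capture interface (assumed name)
        "-s", "0",                   # full packets, so attributes aren't cut off
        "-G", str(rotate_seconds),   # start a new file every rotate_seconds
        # tcpdump expands strftime patterns in the -w name when -G is used
        "-w", outdir.rstrip("/") + "/radius-%Y%m%d-%H%M%S.pcap",
        bpf,                         # BPF filter passed as the final argument
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(build_tcpdump_cmd("eth0", "/var/tmp/radius")))
```

The rotated files accumulate, so old ones still need to be deleted on days when nothing goes wrong.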

If you are lucky enough for it to happen again while the capture is
running, you can use radclient to resend those packets to a development
machine running in debug mode. You will then see whether there is
anything unusual about the SQL queries generated by those RADIUS
requests.
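
Turning a captured packet into radclient input can be sketched in Python as follows. The helpers `parse_radius` and `to_radclient_lines` are hypothetical, and the sketch decodes only a handful of RFC 2865/2866 attributes from a raw RADIUS UDP payload (for example, bytes exported from Ethereal) into `Attribute = value` lines. User-Password is deliberately skipped because it is obfuscated with the NAS shared secret, so replaying an authentication request would need a known test password added by hand.

```python
import struct

# Small, assumed subset of RFC 2865/2866 attribute numbers; extend as needed.
ATTR_NAMES = {
    1: "User-Name",
    4: "NAS-IP-Address",
    5: "NAS-Port",
    30: "Called-Station-Id",
    31: "Calling-Station-Id",
    40: "Acct-Status-Type",
    44: "Acct-Session-Id",
}
STRING_ATTRS = {1, 30, 31, 44}
INTEGER_ATTRS = {5, 40}
IPADDR_ATTRS = {4}


def parse_radius(payload):
    """Split a raw RADIUS payload into (code, identifier, [(type, value)])."""
    if len(payload) < 20:
        raise ValueError("packet shorter than the 20-byte RADIUS header")
    code, ident, length = struct.unpack("!BBH", payload[:4])
    if length > len(payload):
        raise ValueError("length field exceeds captured bytes")
    attrs = []
    pos = 20  # skip code, identifier, length and the 16-byte authenticator
    while pos + 2 <= length:
        atype, alen = payload[pos], payload[pos + 1]
        if alen < 2 or pos + alen > length:
            raise ValueError("malformed attribute at offset %d" % pos)
        attrs.append((atype, payload[pos + 2:pos + alen]))
        pos += alen
    return code, ident, attrs


def to_radclient_lines(attrs):
    """Render known attributes as 'Name = value' lines for radclient."""
    lines = []
    for atype, value in attrs:
        name = ATTR_NAMES.get(atype)
        if name is None:
            continue  # unknown, or secret-dependent such as User-Password (2)
        if atype in STRING_ATTRS:
            lines.append('%s = "%s"' % (name, value.decode("latin-1")))
        elif atype in INTEGER_ATTRS:
            lines.append("%s = %d" % (name, struct.unpack("!I", value)[0]))
        elif atype in IPADDR_ATTRS:
            lines.append("%s = %s" % (name, ".".join(str(b) for b in value)))
    return lines
```

The resulting lines could be saved to a file and replayed with something like `radclient -f packets.txt devbox:1812 auth testing123`, where the host name and shared secret are placeholders for your development server.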

I hope that's helpful; it's just a troubleshooting suggestion. I've had to
do similar things before, mostly with BIND. It turned out that a DNS query
coming from a Windows machine was taking down our servers. We would never
have figured that out without the packet capture, as the logs showed
nothing wrong.

- 
List info/subscribe/unsubscribe? See http://www.freeradius.org/list/users.html
