Message quoted by Dustin Doris <[EMAIL PROTECTED]>:

> > hello folks
> >
> > we have been using FreeRADIUS since 0.8.x, and since 0.9.x we have used
> > the rlm_sql (MySQL) module to store accounting. now we also use the
> > MySQL db to store the users, so auth and autz also go through rlm_sql.
> >
> > we had miscellaneous problems with that module from the beginning, but
> > when we switched to version 1.0.0 everything started to work better.
> > good improvement and nice work.
> >
> > so, what is the problem we are having? well, we are trying to identify
> > it. it is not easy, because it has only happened 4 times since
> > September (when we started using 1.0.1), very randomly. last night was
> > the latest time.
> >
> > the radius server just stops responding and dies, without any abnormal
> > log; the process ends. if you start it again, it starts and logs as
> > usual, but our users can't connect. it doesn't matter how many times
> > you restart the service, it never serves requests. but if you start it
> > in debug mode (-X), to see if anything goes wrong, and then restart it
> > as usual (without debug, because you didn't see anything abnormal in
> > debug mode), everything starts to work as it should and our users
> > start to connect.
> >
> > my guess is that it is something related to rlm_sql, but it is just a
> > guess.
> >
> > the radius server is a little busy: we have 3 Cisco access servers
> > (2 AS5400 and 1 AS5300) that give us 720 lines, of which between 500
> > and 600 are in use all the time.
> >
> > as I said before, last night both of our servers died at around the
> > same time. very strange.
> >
> > the environment is:
> > OS: WhiteBox 3 (RHEL3 clone) with all the updates
> > freeradius rebuilt from the latest SRPM provided by RH (1.0.1-1)
> > (we need the experimental modules: sqlcounter)
> >
> > has anybody had this experience?
> >
> > thanks very much
> > roger
> > PS: my apologies for my bad English
>
> The fact that you say the two servers died around the same time is an
> interesting fact. I would set up a packet sniffer on those machines and
> capture the radius traffic going to the box, and hope to capture the
> traffic that is hitting the machine the next time it goes down. Of
> course this may not help, but it might be worth giving it a shot.
>
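The sniffer suggestion above could be sketched roughly as follows. This is only a sketch: the interface name `eth0`, the output path, and the 1812/1813 ports are assumptions (older NAS equipment often sends to 1645/1646 instead), and the ring-buffer sizes are arbitrary. The `-C`/`-W` options keep an always-on capture from filling the disk, which addresses the "capture may get huge" concern below.

```shell
# Capture only RADIUS traffic, with full packets (-s 0), into a ring of
# ten 100 MB files (radius.pcap0 .. radius.pcap9) that tcpdump reuses,
# so the capture can run indefinitely without exhausting disk space.
# eth0 and ports 1812/1813 are assumptions; adjust for your setup
# (legacy NASes may use 1645/1646).
tcpdump -i eth0 -n -s 0 -C 100 -W 10 -w /var/tmp/radius.pcap \
    udp port 1812 or udp port 1813
```

After a crash, the most recently written file of the ring can be opened in Ethereal to look at the packets that arrived just before the server died.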
yes, it is. but one of the servers (the secondary) logged this:

Mon Jan 10 21:33:09 2005 : Error: Assertion failed in modcall.c, line 68

it was the last log entry. the first radius server didn't log anything
abnormal. probably this doesn't mean anything, but maybe it does :-)
(there is always hope :-) )

if this problem happens to both servers at the same time, it must be
related to something they have in common. those servers have two things
in common:
1- the clients
2- the accounting db (MySQL)

the mysqld didn't go down; it was just the radiusd server.

the last time was exceptional because we had changed the design of the
dialup setup. on the other occasions when we had the problem, each radius
server had its own db (we synchronised the dbs with our own method, but
that proved to be weak). so we changed to a setup where we have a
master/principal radius server with the master MySQL db, and a secondary
radius server with a slave MySQL db, but with the secondary FR server
connecting to the master MySQL server just for accounting (auth and autz
go to the slave MySQL db).

it is very interesting that when this happened (the last time radiusd
went down), if I tried to restart it everything looked fine, but only
one AS could provide connectivity to the remote users; the other two AS
couldn't, even though I had events from those two AS in the radius.log
file as usual. when I started radius with the -X command line option,
everything looked fine. after that (starting radiusd -X), I went and
started radiusd as usual, and after that all of our AS started providing
connectivity.

on the previous occasions with the problem (the first 3 of them), the
radius server just couldn't connect to the MySQL server. again, starting
it as radiusd -X solved the situation every time. it looks like -X cleans
up some environment; very weird to me.

> As this packet capture may get huge, you will probably want to stop it
> and start over every day if your servers don't go down.
> The easiest way would be a tcpdump outputting to a file, and then use
> ethereal to analyze it.
>

yes, that is the problem: the sniffer will get a __lot__ of traffic,
because the problem only appears from time to time (like once a month)
and our radius server has a lot of traffic (about 27,000 connections per
day on weekdays). I know I need to do more troubleshooting, but it is
difficult because I don't have a clue about what the cause is.

thanks anyway for your reply
roger

----------------------------------------------------------------------
Central node of the Infomed network (http://www.sld.cu)
Linux user: 97152 (http://counter.li.org)
Member of the LinuxCuba coordination group (http://www.linux.cu)
"Whatever you do will be insignificant, but it is very important that
you do it." Gandhi
----------------------------------------------------------------------
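The master/slave split described above (accounting to the master MySQL server, auth/autz to the local slave) can be expressed in FreeRADIUS 1.0.x by instantiating the sql module twice under different names. This is only a sketch of the idea, not the poster's actual configuration: the instance names, hostnames, and credentials below are invented for illustration, and the real sql sections carry many more options (connection pool size, queries, etc.) that are omitted here.

```
# Hypothetical radiusd.conf fragment (FreeRADIUS 1.0.x style).
# All names and hosts below are placeholders.
modules {
	sql sql_master {
		driver    = "rlm_sql_mysql"
		server    = "master.example.com"   # master MySQL, accounting only
		login     = "radius"
		password  = "secret"
		radius_db = "radius"
	}
	sql sql_slave {
		driver    = "rlm_sql_mysql"
		server    = "localhost"            # local replication slave, auth/autz
		login     = "radius"
		password  = "secret"
		radius_db = "radius"
	}
}

authorize {
	preprocess
	sql_slave       # read users from the slave db
}

accounting {
	detail
	sql_master      # write accounting to the master db
}
```

With a layout like this, each sql instance keeps its own pool of MySQL connections, which is one reason a problem with one database server need not affect the other path.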