Re: Freeradius process dies with some (bad?!) EAP requests

2009-01-07 Thread Alan DeKok
Alexander Clouter wrote:
> From what I can remember, I think the segfault for us was in the GNU 
> regexp library itself.

  Yes.  glibc was segfaulting on internal functions.  The only solution
is to upgrade glibc to a version that works.

  Alan DeKok.
-
List info/subscribe/unsubscribe? See http://www.freeradius.org/list/users.html


Re: Freeradius process dies with some (bad?!) EAP requests

2009-01-06 Thread Alexander Clouter
Nelson Vale  wrote:
> 
> We have several machines running freeradius 2.0.2 as authentication server,
> and we're facing a strange and very critical problem.
> Occasionally radius server just dies with no apparent reason. When I look at
> the logs, the last lines I see before it happens are like:
> 
> "...
> Error: rlm_eap: No EAPsession matching the State variable.
> Error: rlm_eap: Either EAP-request timed out OR EAP-response to an unknown
> EAP-request
> ..."
> 
> I've googled for problems like this and I found some similar occurrences,
> but no solutions provided...
> 
> Is this a known problem that is fixed in a recent version?
> 
> This is a big issue for us because this causes several thousands of users to
> complain about it, so we would appreciate your help...
> 
For us it was not FreeRADIUS at fault, it was glibc.  As we *ran* Debian 
'stable' the version of libc6 was really really old (2.3.6) and it just 
kept randomly segfaulting (turned out it was in libc6).  The EAP 
session timeouts looked like a possible clue but in the end I discarded 
it as a red herring.

In the end bumping to Debian lenny (currently 'testing' but soon to be 
stable) fixed all our problems and I have not had *any* reliability 
issues since...all down to the libc6 version (now 2.7).  An added 
bonus (this was just before etchandahalf) was that I could start using 
gdb on FreeRADIUS, as a kernel bug[1] (in kernels earlier than 2.6.23) 
had prevented it from functioning.
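
A quick way to see whether a machine is in the same situation is to print
its glibc and kernel versions.  The following is only a sketch: it reads
the post as meaning that kernels older than 2.6.23 are the ones affected
by the gdb bug, and version_lt is a helper name made up for this example.
getconf GNU_LIBC_VERSION only gives an answer on glibc-based systems.

```shell
#!/bin/sh
# Hypothetical helper (not part of FreeRADIUS or Debian): succeeds if
# version $1 is lower than version $2, comparing dot-separated numeric
# fields.  Trailing text such as "-6-686" in a kernel release is ignored
# because awk converts only the leading digits of each field.
version_lt() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x, ".")
        m = split(b, y, ".")
        max = (n > m) ? n : m
        for (i = 1; i <= max; i++) {
            xi = (i <= n) ? x[i] + 0 : 0
            yi = (i <= m) ? y[i] + 0 : 0
            if (xi < yi) exit 0
            if (xi > yi) exit 1
        }
        exit 1
    }'
}

# Report the C library version (glibc systems only).
echo "libc:   $(getconf GNU_LIBC_VERSION 2>/dev/null || echo unknown)"

# Report the running kernel and compare it with the 2.6.23 threshold
# mentioned in the post (assumption: older means affected).
kernel="$(uname -r)"
echo "kernel: $kernel"
if version_lt "$kernel" 2.6.23; then
    echo "kernel is older than 2.6.23: gdb may not work, see [1]"
else
    echo "kernel is 2.6.23 or newer"
fi
```

On Debian, 'dpkg -s libc6' reports the packaged libc6 version as well.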

When reliability problems hit you, it's good to learn briefly how to use gdb.  
Very simply[2]:
 * log in as root
 * open a screen session[3]
 * make sure FreeRADIUS is not running
 * make sure you have all the debug symbols about, or a debuggable 
version installed
 * configure screen to log to a file; 'Ctrl-A H'
 * type 'gdb /usr/sbin/freeradius'
 * in gdb type 'run -X'
 * detach from screen 'Ctrl-A D'
 * when you notice FreeRADIUS has died, reconnect to your screen session
 * at the gdb prompt type 'where' or for *lots* of info try
'thread apply all bt full'[4]
 * tell screen to stop logging, 'Ctrl-A H'
 * log out of screen

Means you can run FreeRADIUS and get the debugging you need to either 
blame the OS or Alan :)
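
The steps above can also be scripted.  This is a minimal sketch rather
than a tested recipe: the binary path is the one from the list above,
while CMD_FILE and SESSION are names made up for the example.  It only
writes the gdb command file and prints the command to run; it starts
nothing itself.

```shell
#!/bin/sh
# Sketch of a scripted version of the manual screen + gdb steps.
# FR_BIN is the path from the post; CMD_FILE and SESSION are assumptions.
FR_BIN="${FR_BIN:-/usr/sbin/freeradius}"
CMD_FILE="${CMD_FILE:-/tmp/freeradius-gdb.cmds}"
SESSION="${SESSION:-fr-gdb}"

# Print the gdb commands: start the server in debug mode, then dump
# backtraces once the process stops (for example on a segfault).
gdb_commands() {
    cat <<'EOF'
run -X
where
thread apply all bt full
EOF
}

# Print the command line that would start gdb inside a detached screen
# session with logging enabled (-L logs the window, -dmS starts the
# session detached under the given name).
launch_command() {
    printf 'screen -L -dmS %s gdb -batch -x %s %s\n' \
        "$SESSION" "$CMD_FILE" "$FR_BIN"
}

gdb_commands > "$CMD_FILE"
echo "gdb commands written to $CMD_FILE"
echo "To start (as root, with FreeRADIUS stopped): $(launch_command)"
```

Unlike the interactive session, gdb in batch mode exits after the last
command, so any stop (not only a crash) ends the run; the backtrace ends
up in the screen log file in the directory where screen was started.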

From what I can remember, I think the segfault for us was in the GNU 
regexp library itself.

Cheers

Alex

[1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=449181
[2] a different approach to the one on http://bugs.freeradius.org/
[3] http://blogamundo.net/code/screen/
[4] http://wiki.debian.org/HowToGetABacktrace

-- 
Alexander Clouter
.sigmonster says:   "The jig's up, Elman."
"Which jig?"
-- Jeff Elman



Re: Freeradius process dies with some (bad?!) EAP requests

2009-01-06 Thread A . L . M . Buxey
Hi,

> and we're facing a strange and very critical problem.
> Occasionally radius server just dies with no apparent reason. When I look at

I've had similar issues and would recommend upgrading to
the latest release - many, many EAP issues were addressed
during the move to 2.1.x

alan


Freeradius process dies with some (bad?!) EAP requests

2009-01-06 Thread Nelson Vale
Hi all,

We have several machines running freeradius 2.0.2 as authentication server,
and we're facing a strange and very critical problem.
Occasionally radius server just dies with no apparent reason. When I look at
the logs, the last lines I see before it happens are like:

"...
Error: rlm_eap: No EAPsession matching the State variable.
Error: rlm_eap: Either EAP-request timed out OR EAP-response to an unknown
EAP-request
..."

I've googled for problems like this and I found some similar occurrences,
but no solutions provided...

Is this a known problem that is fixed in a recent version?

This is a big issue for us because this causes several thousands of users to
complain about it, so we would appreciate your help...



Thx,


Nelson Vale