You must realize that "gdb" by itself is an answer of very little use.
While I am aware that gdb is the GNU Debugger, you had no way of knowing that
I am, and you gave no context or other information that would help me use
gdb to gather anything.

So let me be more clear:

What EXACTLY do I need to do to get more information about this phenomenon, and 
under what circumstances do I need to do it, and once I have some output, what 
should I be looking for in it?  Running production RADIUS servers with "strace 
radiusd -X" is probably impractical (and highly insecure), and may even alter 
the runtime environment such that the fatal event never occurs.  I've never 
observed the failure in either of the two test servers I run, and their 
configurations are identical, so I must assume that radiusd dies after 
receiving some sort of improper/unexpected data, or when it gets into some 
weird state, or other such thing.

But it can't be fixed if I can't figure out how to reproduce it.  It'll happen 
eventually, but a server that is no longer running doesn't tell me much either.
How is gdb going to help me figure out why something isn't working any more?

--J

________________________________
From: freeradius-users-bounces+mcnuttj=missouri....@lists.freeradius.org 
[mailto:freeradius-users-bounces+mcnuttj=missouri....@lists.freeradius.org] On 
Behalf Of Gary Gatten
Sent: Tuesday, March 08, 2011 5:06 PM
To: 'freeradius-users@lists.freeradius.org'
Subject: Re: FR 2.1.7 Exits for no reason

Gdb

From: McNutt, Justin M. [mailto:mcnu...@missouri.edu]
Sent: Tuesday, March 08, 2011 04:59 PM
To: freeradius-users@lists.freeradius.org 
<freeradius-users@lists.freeradius.org>
Subject: FR 2.1.7 Exits for no reason

Hey all,

So the host-based auth stuff is working well now, but we've discovered another 
problem.

We have four FR 2.1.7 servers running on RHEL 5 (fully patched).  Every now and 
then, for no apparent reason, radiusd just stops.  It exits with "Exiting 
normally." to syslog.  They don't all exit at the same time.  Since there are 
four of them behind a load balancer, it usually doesn't result in a service 
outage, and we've been lucky so far that only a couple of them have been down 
at once.  But it's still disconcerting.

The servers are all started within seconds of one another, since I make
changes to Server #1, and then use an rsync script to replicate /etc/raddb to
the other servers and restart them.  This week, Server #3 stopped about 8
hours after being started (it ran from 1130 to 1930).  Server #1 failed last
week at 2330.  Server #4 hasn't failed yet.  It's very odd.

Any ideas on how I can troubleshoot this?

Thanks!

Justin McNutt
Network Systems Analyst - Ninja
DNPS, Mizzou Telecom
(573) 882-5183

"Do you have a concussion?"

Ping is NOT a service.  You don't need it.  Use a real test.


