Re: SEVERE! radiusd 2.0 and 1.1.4 dying! Segmentation fault

2007-01-28 Thread Alan DeKok
Guilherme Franco wrote:
 The strange thing is that even when accounting is off (and the load
 is therefore low) the error appears randomly.
 
 Also, if the proxy realm dies the problem occurs too.
...
 I can't give you a gdb because the server is running fine now, but who
 knows when it may happen...

  Then please set up a test server, with a test DB so that you can
reproduce the problem, and not affect your users.

  If you're running Linux, valgrind is a good tool.

  Alan DeKok.
--
  http://deployingradius.com   - The web site of the book
  http://deployingradius.com/blog/ - The blog
- 
List info/subscribe/unsubscribe? See http://www.freeradius.org/list/users.html


Re: SEVERE! radiusd 2.0 and 1.1.4 dying! Segmentation fault

2007-01-27 Thread Phil Mayers

Guilherme Franco wrote:

 Hi,

 Freeradius 2.0 alpha was working correctly since November 1st.

 Then, this month, suddenly the server started to die, complaining of
 Info: rlm_sql (sql): There are no DB handles to use! skipped 0, tried
 to connect 0.


This normally means your database is slow. Clean out old accounting 
records (maybe move them to another table) and execute a vacuum analyze.




 The server runs threaded with max_servers = 32 and num_sql_socks = 32
 (there are 5 reqs per second, no more than that).

 OK, so I tried running it single-threaded (-X), but then it is slow
 and it misses some access requests because it is busy processing the
 accounting.


...indicating a high load, supporting the hypothesis.
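
The figures quoted above (32 SQL handles, about 5 requests per second) allow a rough sanity check using Little's law, which says the average number of busy handles equals the arrival rate times the average time each request holds a handle. The sketch below is only a back-of-the-envelope model: the function names are made up for illustration, and it assumes each request holds exactly one handle and that the load is steady.

```python
def busy_handles(requests_per_second: float, mean_hold_seconds: float) -> float:
    """Little's law: average handles in use = arrival rate * mean hold time."""
    if requests_per_second < 0 or mean_hold_seconds < 0:
        raise ValueError("rate and hold time must be non-negative")
    return requests_per_second * mean_hold_seconds


def hold_time_to_exhaust(pool_size: int, requests_per_second: float) -> float:
    """Mean hold time (seconds) at which the whole pool is busy on average."""
    if requests_per_second <= 0:
        raise ValueError("rate must be positive")
    return pool_size / requests_per_second


if __name__ == "__main__":
    # Values taken from the thread: num_sql_socks = 32, about 5 requests/second.
    pool_size = 32
    rate = 5.0
    print("hold time needed to exhaust the pool (s):",
          hold_time_to_exhaust(pool_size, rate))
    # Hypothetical healthy case: each query holds a handle for half a second.
    print("handles busy at 0.5 s per query:", busy_handles(rate, 0.5))
```

Under these assumptions, each request would have to hold a handle for roughly 6.4 seconds on average before all 32 were busy at once, so "no DB handles" at this request rate suggests queries that are very slow or handles that are never returned.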



 I've uninstalled it and installed 1.1.4, but the same occurs!

 Restarting radiusd when it fails gives another 15 minutes before it dies
 again.

 Also, disabling accounting helps prolong the server lifetime.


Probably because it reduces the load, again supporting the hypothesis.

However - you also say it is segfaulting? Which I would not expect.

I don't really understand the format of the crash dump - can you supply 
one from gdb as documented in doc/bugs?


Re: SEVERE! radiusd 2.0 and 1.1.4 dying! Segmentation fault

2007-01-27 Thread Guilherme Franco

Thanks Mr. Mayers,

The database is Oracle on a powerful machine which only does acct/auth.
All the relevant auth/accounting queries are indexed to speed things
up.

There's a PostgreSQL database to take care of the sqlippool module.

The strange thing is that even when accounting is off (and the load
is therefore low) the error appears randomly.

Also, if the proxy realm dies the problem occurs too.

That segfault was captured by running radiusd -xxx, which points
to an Oracle OCI error in this case (with acct on).

I can't give you a gdb because the server is running fine now, but who
knows when it may happen...

That setup was running fine for almost 3 months. Everything indicates a
resource starvation problem, but the load is low :(

Thank you very much.