Re: Multiple accounting requests crash the server

2008-01-25 Thread tnt
>
>#1:
>
>rad_recv: Accounting-Request packet from host X.X.X.X:46641, id=184,
>length=302
> User-Name = "blah"
> NAS-Port = 2
> NAS-Port-Type = Wireless-802.11
> NAS-Identifier = "XX"
> NAS-IP-Address = X.X.X.X
> Acct-Status-Type = Stop
> Calling-Station-Id = "MAC"
> Called-Station-Id = "MAC"
> Event-Timestamp = "Jan 25 2008 10:49:13 CET"
> Acct-Delay-Time = 955
> Acct-Session-Id = "59fdb3fd"
> Acct-Authentic = RADIUS
> Acct-Session-Time = 343
> Acct-Input-Octets = 48365
> Acct-Input-Gigawords = 0
> Acct-Input-Packets = 199
> Acct-Output-Octets = 42343
> Acct-Output-Gigawords = 0
> Acct-Output-Packets = 248
> Acct-Terminate-Cause = Idle-Timeout
> Framed-IP-Address = 192.168.51.186
> WISPr-Location-Name = "Address"
> Vendor-18529-Attr-42 = 0x48455336373131
> Attr-103 = 0x4799b09a
> Attr-103 = 0x4799b09a
> WISPr-Location-ID = "isocc=es,cc=34,ac=08080,network=SSID"
>UPDATE query (fails)
>INSERT query (seems to take place)
>rlm_sql (sql): Released sql socket id: 3
>   modcall[accounting]: module "sql" returns ok for request 0
>modcall: leaving group accounting (returns ok) for request 0
>Sending Accounting-Response of id 184 to X.X.X.X port 46641
>Finished request 0
>

Did you find out why did the UPDATE query fail. That's where the trouble
started. INSERT takes much longer to do ...

Ivan Kalik
Kalik Informatika ISP

-
List info/subscribe/unsubscribe? See http://www.freeradius.org/list/users.html


Re: Multiple accounting requests crash the server

2008-01-25 Thread Mother

Hi Alan,

Thanks for your answers, mine inline below:

Alan DeKok wrote:

Mother wrote:

I am seeing a strange situation. I receive an accounting-stop request
from a NAS, and FreeRADIUS (1.1.7 against Oracle) updates the
corresponding radacct record. However, the NAS is not receiving the ack,
and thus re-sends the stop request. On the second request, FreeRADIUS
tries to do an update query again, and then, an insert query, with the
stop message details (i.e. only a stop time, reason idle-timeout, etc.),
which fails.


  Why?


As I mention in my update, an accounting-stop that uses the alternate 
default query will attempt an insert, not an update (I have a few new 
records in radacct with only stop time after this episode, all with the 
same session ID). The reason the NAS was resending was that it was not 
receiving the confirmation back from FreeRADIUS, most likely due to a 
temporary network failure on a segment along the way.



After the third request from the NAS (and corresponding
update followed by insert), Oracle throws a unique constraint violation
error, and the server freezes.


  Weird.


Indeed - why it throws this after the third query is what gets me, as 
the unique constraint should have been hit right after the first update 
was done. I will investigate the timeline of events with more detail.



Questions:

1. Why is FreeRADIUS failing to see that this request was already
acknowledged, i.e. it has been updated on the database, and just sends
an ACK, rather than trying to insert a new record?


  Because RADIUS doesn't work like that.  Accounting requests are never
re-sent, so *all* accounting requests are "brand new", and have to be
treated that way.


They are never re-sent? Then please tell one of the major European 
wireless ISP about this, as they were hammering my server :) This is 
what they saw:


Fri Jan 25 10:31:10 2008: INFO: AuthRADIUS: No reply after 2
retransmissions to 80.33.137.68:1813 for [EMAIL PROTECTED]  (25). Now have
1 consecutive failures over 0 seconds. Backing off for 30 seconds

Note that they are *not* using FreeRADIUS. In any case, I will adapt the 
queries as needed so that they don't hit a brick wall if I find 
conditions like this.



2. Why does FreeRADIUS freeze on an SQL error from Oracle? Should it not
just log the error and carry on about its business? I am finding my
server freezing every few days due to these issues, for example, if a
query takes too long to run, or a trigger fails to execute. Is
FreeRADIUS against Oracle more fragile than say MySQL?


  I think that fewer people are using it that way.


Yes, I understand Oracle penetration in FreeRADIUS is tiny, but still.


  Perhaps you could try posting the error message, or run it under "gdb"
to see why it's freezing.


These are the sanitized logs, with irrelevant crud cut out:

#1:

rad_recv: Accounting-Request packet from host X.X.X.X:46641, id=184, 
length=302

User-Name = "blah"
NAS-Port = 2
NAS-Port-Type = Wireless-802.11
NAS-Identifier = "XX"
NAS-IP-Address = X.X.X.X
Acct-Status-Type = Stop
Calling-Station-Id = "MAC"
Called-Station-Id = "MAC"
Event-Timestamp = "Jan 25 2008 10:49:13 CET"
Acct-Delay-Time = 955
Acct-Session-Id = "59fdb3fd"
Acct-Authentic = RADIUS
Acct-Session-Time = 343
Acct-Input-Octets = 48365
Acct-Input-Gigawords = 0
Acct-Input-Packets = 199
Acct-Output-Octets = 42343
Acct-Output-Gigawords = 0
Acct-Output-Packets = 248
Acct-Terminate-Cause = Idle-Timeout
Framed-IP-Address = 192.168.51.186
WISPr-Location-Name = "Address"
Vendor-18529-Attr-42 = 0x48455336373131
Attr-103 = 0x4799b09a
Attr-103 = 0x4799b09a
WISPr-Location-ID = "isocc=es,cc=34,ac=08080,network=SSID"
UPDATE query (fails)
INSERT query (seems to take place)
rlm_sql (sql): Released sql socket id: 3
  modcall[accounting]: module "sql" returns ok for request 0
modcall: leaving group accounting (returns ok) for request 0
Sending Accounting-Response of id 184 to X.X.X.X port 46641
Finished request 0


#2:

rad_recv: Accounting-Request packet from host X.X.X.X:46641, id=185, 
length=302

User-Name = "blah"
NAS-Port = 2
NAS-Port-Type = Wireless-802.11
NAS-Identifier = "XXX"
NAS-IP-Address = X.X.X.X
Acct-Status-Type = Stop
Calling-Station-Id = "MAC"
Called-Station-Id = "MAC"
Event-Timestamp = "Jan 25 2008 10:49:13 CET"
Acct-Delay-Time = 955
Acct-Session-Id = "59fdb3fd"
Acct-Authentic = RADIUS
Acct-Session-Time = 343
Acct-Input-Octets = 48365
Acct-Input-Gigawords = 0
Acct-Input-Packets = 199
Acct-Output-Octets = 42343
Acct-Output-Gigawords = 0
Acct-Output-Packets = 248
Acct-Terminate-Cause = Idle-Timeout
Framed-IP-Address = 192.168.51.186
WISPr-Location-Name = "Address"
Vendor-18529-Attr-42 = 0x48455336373131
Attr-103 = 0x4799b05a
Attr-103 = 0x4799b09b
WISPr-Location-ID = "isocc=es,cc=34,ac=08080,networ

Re: Multiple accounting requests crash the server

2008-01-25 Thread Alan DeKok
Mother wrote:
> I am seeing a strange situation. I receive an accounting-stop request
> from a NAS, and FreeRADIUS (1.1.7 against Oracle) updates the
> corresponding radacct record. However, the NAS is not receiving the ack,
> and thus re-sends the stop request. On the second request, FreeRADIUS
> tries to do an update query again, and then, an insert query, with the
> stop message details (i.e. only a stop time, reason idle-timeout, etc.),
> which fails.

  Why?

> After the third request from the NAS (and corresponding
> update followed by insert), Oracle throws a unique constraint violation
> error, and the server freezes.

  Weird.

> Questions:
> 
> 1. Why is FreeRADIUS failing to see that this request was already
> acknowledged, i.e. it has been updated on the database, and just sends
> an ACK, rather than trying to insert a new record?

  Because RADIUS doesn't work like that.  Accounting requests are never
re-sent, so *all* accounting requests are "brand new", and have to be
treated that way.

> 2. Why does FreeRADIUS freeze on an SQL error from Oracle? Should it not
> just log the error and carry on about its business? I am finding my
> server freezing every few days due to these issues, for example, if a
> query takes too long to run, or a trigger fails to execute. Is
> FreeRADIUS against Oracle more fragile than say MySQL?

  I think that fewer people are using it that way.

  Perhaps you could try posting the error message, or run it under "gdb"
to see why it's freezing.

  Alan DeKok.
-
List info/subscribe/unsubscribe? See http://www.freeradius.org/list/users.html


Re: Multiple accounting requests crash the server [update]

2008-01-25 Thread Mother
Update to the problem: the accounting-stop alternate query is actually 
an INSERT, not an UPDATE, by default, which actually surprises me, as in 
case of a duplicate packet, an INSERT into a properly unique-indexed 
table is doomed. I have now simply changed the -alt into an UPDATE 
query, so it would also not affect any record, but at least will not 
cause Oracle to balk.


My only problem now is how to make sure FreeRADIUS does not freeze upon 
an SQL error.


Cheers,

Mike


Mother wrote:

Hi all,

I am seeing a strange situation. I receive an accounting-stop request 
from a NAS, and FreeRADIUS (1.1.7 against Oracle) updates the 
corresponding radacct record. However, the NAS is not receiving the ack, 
and thus re-sends the stop request. On the second request, FreeRADIUS 
tries to do an update query again, and then, an insert query, with the 
stop message details (i.e. only a stop time, reason idle-timeout, etc.), 
which fails. After the third request from the NAS (and corresponding 
update followed by insert), Oracle throws a unique constraint violation 
error, and the server freezes.


Questions:

1. Why is FreeRADIUS failing to see that this request was already 
acknowledged, i.e. it has been updated on the database, and just sends 
an ACK, rather than trying to insert a new record?


2. Why does FreeRADIUS freeze on an SQL error from Oracle? Should it not 
just log the error and carry on about its business? I am finding my 
server freezing every few days due to these issues, for example, if a 
query takes too long to run, or a trigger fails to execute. Is 
FreeRADIUS against Oracle more fragile than say MySQL?


Cheers,

Mike

-
List info/subscribe/unsubscribe? See 
http://www.freeradius.org/list/users.html



-
List info/subscribe/unsubscribe? See http://www.freeradius.org/list/users.html


Multiple accounting requests crash the server

2008-01-25 Thread Mother

Hi all,

I am seeing a strange situation. I receive an accounting-stop request 
from a NAS, and FreeRADIUS (1.1.7 against Oracle) updates the 
corresponding radacct record. However, the NAS is not receiving the ack, 
and thus re-sends the stop request. On the second request, FreeRADIUS 
tries to do an update query again, and then, an insert query, with the 
stop message details (i.e. only a stop time, reason idle-timeout, etc.), 
which fails. After the third request from the NAS (and corresponding 
update followed by insert), Oracle throws a unique constraint violation 
error, and the server freezes.


Questions:

1. Why is FreeRADIUS failing to see that this request was already 
acknowledged, i.e. it has been updated on the database, and just sends 
an ACK, rather than trying to insert a new record?


2. Why does FreeRADIUS freeze on an SQL error from Oracle? Should it not 
just log the error and carry on about its business? I am finding my 
server freezing every few days due to these issues, for example, if a 
query takes too long to run, or a trigger fails to execute. Is 
FreeRADIUS against Oracle more fragile than say MySQL?


Cheers,

Mike

-
List info/subscribe/unsubscribe? See http://www.freeradius.org/list/users.html