Re: Indeterministic EAP error

2012-10-05 Thread Alan DeKok
Matthias Nagel wrote:
>> Anyway, first things - check your eap {} module config, specifically
>> ensure that max_sessions is high enough to support your load, that
>> timer_expire isn't too low, and if applicable, that your TLS session
>> caching is ok (size, particularly).
>
> I did not find max_sessions anywhere in the config files. Where is it
> supposed to be set and what is the default if not set?

  It is in the eap module configuration.  You were told this.  Go read
raddb/eap.conf.

  And as before, the issue is not FreeRADIUS.  No amount of poking the
FreeRADIUS configuration will fix AP or WiFi problems.

  Alan DeKok.
-
List info/subscribe/unsubscribe? See http://www.freeradius.org/list/users.html


Indeterministic EAP error

2012-10-04 Thread Matthias Nagel
Hello,

sometimes I get the error

WARNING: !! EAP session for state 0xABCDEFGHIJKLMNOP did not finish!

in my log files / debug output. Before anybody says that a look at

http://deployingradius.com/documents/configuration/eap-problems.html

will help, please read on: I have already done that, and I believe the
problem is a little bit trickier.

I support PEAP+MSCHAPv2 only, and 90% of the time it just works. I am pretty
sure that the certificate is all right. If anybody wants to check it, it can
be found here:

https://freeradius:eaper...@www.stud.uni-karlsruhe.de/~uzbii/hekauth-certs.pem

The certificate file includes all intermediate issuers and the trusted CA. The
CA is Germany's biggest telco, so most OSes ship with it by default. The
certificate also includes the X509v3 Extended Key Usage values TLS Web Client
Authentication and TLS Web Server Authentication in order to satisfy Windows
clients.

My radius config looks like this:

certdir = ${confdir}/certs
cadir = ${confdir}/certs
private_key_file = ${certdir}/hekauth-key.pem
certificate_file = ${certdir}/hekauth-certs.pem
# CA_file = 
CA_path = ${certdir}/empty-by-purpose/


If a new client connects for the very first time, most OSes automatically
detect the correct authentication scheme, ask for username and password,
present the certificate for confirmation, and it works out of the box. (No
errors on either the client or the server side.)

Randomly, I get this error message even though the respective client normally
works. In that case the client just restarts the authentication and then
succeeds on the second attempt. Hence the only difference the user might
notice is an authentication that takes some milliseconds longer.

During the last four days there have been 1278 such errors, 2519 sessions and
9651 successful authentication attempts (i.e. each session triggered
approximately 3.8 re-authentications), spread over 93 different clients and at
least 6 different OSes.
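
The ratios implied by these counts can be checked with a few lines of Python. This is only a sketch: the function and variable names are made up for illustration, and treating errors / (errors + successes) as the share of unfinished attempts is an assumption about how the counts relate, not something the logs confirm.

```python
def ratios(errors, sessions, successes):
    """Return (successful authentications per session, share of unfinished attempts)."""
    auths_per_session = successes / sessions
    # Assumption: every attempt ends either as a success or as a "did not finish" warning.
    unfinished_share = errors / (errors + successes)
    return auths_per_session, unfinished_share


# Counts quoted in the message above (four days of logs).
auths_per_session, unfinished_share = ratios(errors=1278, sessions=2519, successes=9651)
print(f"{auths_per_session:.1f} successful authentications per session")
print(f"{unfinished_share:.1%} of attempts did not finish")
```

Under that assumption the unfinished share comes out at a little under 12%, which is in line with the "90% of the time it just works" estimate.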

I cannot find any pattern, so I do not believe it to be a client side issue.

Of course, one can argue to ignore the warning as it works most of the time, 
but I do not like indeterministically behaving IT systems, hence it preys on my 
mind.

Has anybody an idea what the reason might be? If anybody wants to see a full 
debug output or a tcpdump, I can provide you with plenty of that. But I could 
not find anything.

Yours, Matthias

--
Matthias Nagel
Willy-Andreas-Allee 1, Zimmer 506
76131 Karlsruhe

Telefon: +49-721-8695-1506
Mobil: +49-151-15998774
e-Mail: matthias.h.na...@gmail.com
ICQ: 499797758
Skype: nagmat84



Re: Indeterministic EAP error

2012-10-04 Thread Phil Mayers

On 04/10/12 16:45, Matthias Nagel wrote:


> I cannot find any pattern, so I do not believe it to be a client side
> issue.
>
> Of course, one can argue to ignore the warning as it works most of
> the time, but I do not like indeterministically behaving IT systems,
> hence it preys on my mind.
>
> Has anybody an idea what the reason might be? If anybody wants to see
> a full debug output or a tcpdump, I can provide you with plenty of
> that. But I could not find anything.


One thing: that logging only happens in debug mode. Most people don't 
run in debug mode all the time, so as far as I know, it could be normal 
- maybe everyone sees failure rates of that order?



Anyway, first things - check your eap {} module config, specifically 
ensure that max_sessions is high enough to support your load, that 
timer_expire isn't too low, and if applicable, that your TLS session 
caching is ok (size, particularly).


Otherwise - I assume you are authenticating wireless clients?

Unfortunately, wireless is funky. Clients can stop doing the EAP 
exchange for all sorts of reasons - interference / packet loss, signal 
strength issues (they moved to a different AP), prompting the user for 
password / cert issuance, etc.


Are you able to determine where the EAP sessions have got to before they 
hang up? Are they still in TLS setup, or inner-tunnel? Does it hang up 
after e.g. the EAP-MSCHAP challenge?


Regrettably the "session did not finish" logging isn't great, so
determining this is hard - I keep meaning to see if it can be improved
e.g. log some attributes from the original packet, log the state of the
EAP session, etc.



Re: Indeterministic EAP error

2012-10-04 Thread Alan DeKok
Matthias Nagel wrote:
> I cannot find any pattern, so I do not believe it to be a client side issue.

  It's always an issue with the client, WiFi, or AP.  It's not an issue
with FreeRADIUS.

  Why?  All of the EAP is driven by the client.

> Of course, one can argue to ignore the warning as it works most of the time,
> but I do not like indeterministically behaving IT systems, hence it preys on
> my mind.
>
> Has anybody an idea what the reason might be? If anybody wants to see a full
> debug output or a tcpdump, I can provide you with plenty of that. But I could
> not find anything.

  You won't see it in a tcpdump.  The *non* continuance of the EAP
session is what FreeRADIUS is complaining about.  tcpdump won't show you
any more.

  Look on the client and/or the AP for the problem.

  Alan DeKok.


Re: Indeterministic EAP error

2012-10-04 Thread Matthias Nagel
Hello,

On Thursday, 04 October 2012, 17:09:35, Phil Mayers wrote:
> On 04/10/12 16:45, Matthias Nagel wrote:
>
>> I cannot find any pattern, so I do not believe it to be a client side
>> issue.
>
>> Has anybody an idea what the reason might be? If anybody wants to see
>> a full debug output or a tcpdump, I can provide you with plenty of
>> that. But I could not find anything.
>
> One thing: that logging only happens in debug mode. Most people don't
> run in debug mode all the time, so as far as I know, it could be normal
> - maybe everyone sees failure rates of that order?

That would be nice, indeed. But if the reason is the signal strength of the
WiFi, then the numbers heavily depend on your WiFi coverage. So it is
difficult to compare.

> Anyway, first things - check your eap {} module config, specifically
> ensure that max_sessions is high enough to support your load, that
> timer_expire isn't too low, and if applicable, that your TLS session
> caching is ok (size, particularly).

I did not find max_sessions anywhere in the config files. Where is it
supposed to be set and what is the default if not set?  timer_expire is 60
seconds. The cache size for session resumption is set to 0. I read somewhere
that this means unlimited. I see a lot of session resumptions that work.

I found the entry
#  fragment_size = 1024
to be commented out. Does anybody have experience with HP E-MSM 430 APs?
This is probably a dumb question: I always believed that the smallest MTU
that must be supported by an Ethernet device is 1500. Are there really APs
that support less? I did not find anything on that in the specifications of my
AP. And a second question: Does a wrong value for fragment_size always fail? Or
to state it conversely: If a default fragment size of 1024 works most of the
time (as it does for me), can it still be a reason for the failure if it is
too high?


> Otherwise - I assume you are authenticating wireless clients?

Half and half. It is an HP 5412 chassis solution with an integrated MSM 765zl
WiFi controller. Most clients are wired (desktop PCs) and some clients
(smartphones, tablets, laptops) are wireless. But yes, if I (hopefully
correctly) link the error message to the corresponding Access-Challenge, most
errors are from wireless sessions.

> Are you able to determine where the EAP sessions have got to before they
> hang up? Are they still in TLS setup, or inner-tunnel? Does it hang up
> after e.g. the EAP-MSCHAP challenge?

I am not sure whether I link the error message to the Access-Challenge
correctly. But if I do, there is no particular point at which the sessions
hang up.

> Regrettably the "session did not finish" logging isn't great, so
> determining this is hard - I keep meaning to see if it can be improved
> e.g. log some attributes from the original packet, log the state of the
> EAP session, etc.

At the moment I do the following: I pick the hex number from the error message
and look for an Access-Challenge that has the same number in its State AVP.
If this is the wrong way to do it, then everything I said before is nonsense.

Matthias



Re: Indeterministic EAP error

2012-10-04 Thread Phil Mayers

On 04/10/12 18:10, Matthias Nagel wrote:


> That would be nice, indeed. But if the reason is the signal strength of the
> WiFi, then the numbers heavily depend on your WiFi coverage. So it is
> difficult to compare.


Sure.

As Alan says, it's the client that's going away.

Maybe search the logs of your wireless kit for radio-layer events.

To be honest, the rest of my suggestions are unlikely to help - it's 
probably just wifi packet loss. We see this a lot. EAP seems to be 
particularly susceptible to being interrupted, because it runs in 
lockstep and upper-layer retransmits are simpler than something like TCP.
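
A back-of-the-envelope model shows why a lockstep exchange is so sensitive to loss. The sketch below is illustrative only: it assumes independent packet loss, a fixed number of round trips, and a fixed retry budget per round trip. The 5% loss rate, the 12 round trips and the function name are all hypothetical values picked for the example, not measurements or FreeRADIUS settings.

```python
def completion_probability(loss, round_trips, retries):
    """Chance that every round trip of a lockstep exchange eventually succeeds.

    Simplifying assumptions: each packet is lost independently with
    probability `loss`, and each round trip may be retried `retries` times.
    """
    if not 0.0 <= loss <= 1.0:
        raise ValueError("loss must be between 0 and 1")
    attempt_ok = (1.0 - loss) ** 2  # request and response both have to arrive
    trip_ok = 1.0 - (1.0 - attempt_ok) ** (retries + 1)
    return trip_ok ** round_trips  # lockstep: one failed trip kills the session


for retries in (0, 2):
    p = completion_probability(loss=0.05, round_trips=12, retries=retries)
    print(f"retries={retries}: {p:.1%} of exchanges complete")
```

In this toy model, most exchanges fail at 5% loss without retransmits, and a couple of retries per round trip recovers nearly all of them. That is why retransmit settings on the client and AP side matter more than anything in the RADIUS server.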




> I did not find max_sessions anywhere in the config files. Where is


https://github.com/FreeRADIUS/freeradius-server/blob/v2.1.x/raddb/eap.conf#L61


> of my AP. And second question: Does a wrong value for fragment_size
> always fail? Or to state it conversely: If a default fragment size of
> 1024 works most of the time (as it does with me), can this still be a
> reason for the failure, if it is too high?



I doubt it. I think it's set to 1024 to be safe and handle things like 
weird IPSec tunnel MTUs, etc.



> At the moment I do the following: I pick the hex number from the
> error message and look for an Access-Challenge that has the same
> number in its State AVP. If this is the wrong way to do it, then
> everything I said before is nonsense.


That's right. The hex number in the message is the State value.
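
The matching described above can be automated. The following is a minimal sketch, not a supported tool. It assumes debug output in which a sent packet starts with a line like "Sending Access-Challenge of id N ..." followed by tab-indented attribute lines, and it assumes the warning may print only the leading part of the State value, so it matches by prefix. The regular expressions, the function name and the sample log are all illustrative and may need adjusting to the actual output.

```python
import re

WARNING_RE = re.compile(r"EAP session for state (0x[0-9a-fA-F]+) did not finish")
SENDING_RE = re.compile(r"^Sending Access-Challenge of id (\d+)")
STATE_RE = re.compile(r"^\s+State = (0x[0-9a-fA-F]+)")


def match_unfinished(lines):
    """Map each unfinished-session state to the ids of earlier challenges carrying it."""
    challenges = []    # (packet id, State value) for every Access-Challenge seen
    current_id = None  # id of the Access-Challenge whose attributes are being read
    unfinished = {}
    for line in lines:
        warning = WARNING_RE.search(line)
        if warning:
            state = warning.group(1).lower()
            unfinished[state] = [pid for pid, s in challenges if s.startswith(state)]
            current_id = None
            continue
        sent = SENDING_RE.match(line)
        if sent:
            current_id = int(sent.group(1))
            continue
        attr = STATE_RE.match(line)
        if attr and current_id is not None:
            challenges.append((current_id, attr.group(1).lower()))
        elif not line[:1].isspace():
            current_id = None  # a non-indented line ends the attribute block
    return unfinished


# Hypothetical log excerpt, made up for this example.
sample = """\
Sending Access-Challenge of id 7 to 10.0.0.2 port 1645
\tEAP-Message = 0x0102
\tState = 0x0a1b2c3d4e5f60718293a4b5c6d7e8f9
rad_recv: Access-Request packet from host 10.0.0.2 port 1645, id=8, length=120
\tState = 0xffffffffffffffffffffffffffffffff
WARNING: !! EAP session for state 0x0a1b2c3d4e5f6071 did not finish!
""".splitlines()

print(match_unfinished(sample))
```

The last Access-Challenge listed for a state shows how far the exchange got before the client went away.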


Re: Indeterministic EAP error

2012-10-04 Thread alan buxey
Hi,

> I cannot find any pattern, so I do not believe it to be a client side
> issue.

[snip]

> One thing: that logging only happens in debug mode. Most people
> don't run in debug mode all the time, so as far as I know, it could
> be normal - maybe everyone sees failure rates of that order?

[snip]


as Phil says, that message only appears in debug mode... and debug mode runs
in a single thread and slows the whole process down. if you have multiple
clients trying to connect when in this state, and your server cannot deal with
the clients fast enough, then you run into timing issues... et voila, plenty
of errors, "did not finish" errors etc.

ensure your main EAP method is first in the list. use the caching feature so
the clients don't have to go through the whole 12 trips etc.

...and, as Phil says, with wireless you are dealing with the whole PHY issue -
packets sent may have got scrambled, needed resending... if the air is 'busy'
with duty cycles the client may not be able to transmit in a timely fashion -
got 802.11b clients around?

alan


Re: Indeterministic EAP error

2012-10-04 Thread alan buxey
Hi,

> I found the entry
> #  fragment_size = 1024
> to be commented out. Does anybody have experience with HP E-MSM 430 APs?
> This is probably a dumb question: I always believed that the smallest MTU
> that must be supported by an Ethernet device is 1500. Are there really APs
> that support less? I did not find anything on that in the specifications of
> my AP. And a second question: Does a wrong value for fragment_size always
> fail? Or to state it conversely: If a default fragment size of 1024 works
> most of the time (as it does for me), can it still be a reason for the
> failure if it is too high?

actually, wifi has bigger MTUs than that - around 2304 for payload - the
problem is ethernet... which is USUALLY 1500.

if you DON'T set this, then the RADIUS server will cram as much as possible
into each packet... and thus your certificate, its intermediates and CA root
are all shoved through some rather large packets... if you set this value -
e.g. to 1024 - then those packets are nice and tight.
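
To get a feel for the trade-off, here is a rough sketch of how many fragments a certificate chain needs at a given fragment_size. The 5000-byte chain size is a made-up figure and the function name is hypothetical. The estimate ignores EAP and TLS header overhead. In EAP-TLS based methods each fragment is acknowledged by the peer before the next one is sent, so more fragments means more round trips.

```python
import math


def tls_fragments(payload_bytes, fragment_size=1024):
    """Estimate how many EAP fragments carry payload_bytes of TLS data.

    Simplification: header overhead per fragment is ignored.
    """
    if fragment_size <= 0:
        raise ValueError("fragment_size must be positive")
    return math.ceil(payload_bytes / fragment_size)


chain_bytes = 5000  # hypothetical: server cert + intermediates + root
for size in (1024, 1400):
    print(f"fragment_size={size}: {tls_fragments(chain_bytes, size)} fragments")
```

Smaller fragments are less likely to run into MTU trouble on the path between AP and server, at the cost of a few extra round trips over the air.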


alan


Re: Indeterministic EAP error

2012-10-04 Thread Matthew Newton
On Thu, Oct 04, 2012 at 05:45:30PM +0200, Matthias Nagel wrote:
> WARNING: !! EAP session for state 0xABCDEFGHIJKLMNOP did not finish!
...
> Has anybody an idea what the reason might be?

We see it a lot less since we tweaked the EAP timers on our Cisco
Wireless Controller. You don't say what APs or system you're
using, but for example if it's the Cisco WLCs see
https://supportforums.cisco.com/docs/DOC-12110

The issue would go /something/ like (I forget the precise details):

  User clicks connect

  (*) Types in username and password slowly

  EAP Identity Request would time out (20s or so)

  EAP session would get closed - client & controller would give up -
  error above

  User clicks login

  EAP session starts again

  either a) EAP completes and client connects

  or b) client realises that their EAP session got broken, and
  prompts the user for their password again - go back to '*'.

Then... after a couple of times, the controller might figure
that the client has done some bad authentications, and ban them
for a minute or so.

We tweaked the timers to make the Identity Request time + max
retries longer, and disabled the automatic banning of clients from
invalid authentications. Generally now the only time we see that
error is if we restart FreeRADIUS (in which case, EAP sessions in
transit get broken, so it's the sort of thing I expect).
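
The timing argument can be made concrete with a small sketch. It assumes a simplified model in which the controller sends the Identity Request once and then resends it a fixed number of times, waiting the same timeout each time. The function names and the numbers (45 s of typing, 20 s and 30 s timeouts) are hypothetical, and real controllers name and combine these settings differently.

```python
def identity_window(timeout_s, max_retries):
    """Seconds until the controller gives up on the Identity Request.

    Assumed model: one initial request plus max_retries resends,
    each waiting timeout_s for an answer.
    """
    return timeout_s * (1 + max_retries)


def user_finishes_in_time(typing_s, timeout_s, max_retries):
    """True if the user submits credentials before the window closes."""
    return typing_s <= identity_window(timeout_s, max_retries)


# A slow typist (45 s) against a short window and a longer, retried one.
print(user_finishes_in_time(typing_s=45, timeout_s=20, max_retries=0))
print(user_finishes_in_time(typing_s=45, timeout_s=30, max_retries=2))
```

In this model, lengthening the timeout or adding retries widens the window enough that slow credential entry no longer leaves a half-finished EAP session behind on the server.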

You still sometimes see it if a client is on the edge of a radio
cell, and moves out of range whilst connecting, for example, but
it's nothing like as often as it used to be.

In short, it's a client/NAS issue, as already stated.

Hope that helps,

Matthew


-- 
Matthew Newton, Ph.D. m...@le.ac.uk

Systems Architect (UNIX and Networks), Network Services,
I.T. Services, University of Leicester, Leicester LE1 7RH, United Kingdom

For IT help contact helpdesk extn. 2253, ith...@le.ac.uk