Re: Indeterministic EAP error
Matthias Nagel wrote:
> > Anyway, first things - check your eap {} module config, specifically ensure that max_sessions is high enough to support your load, that timer_expire isn't too low, and if applicable, that your TLS session caching is ok (size, particularly).
>
> I did not find max_sessions anywhere in the config files. Where is it supposed to be set and what is the default if not set?

It is in the eap module configuration. You were told this. Go read raddb/eap.conf.

And as before, the issue is not FreeRADIUS. No amount of poking the FreeRADIUS configuration will fix AP or WiFi problems.

Alan DeKok.
-
List info/subscribe/unsubscribe? See http://www.freeradius.org/list/users.html
Indeterministic EAP error
Hello,

sometimes I get the error

    WARNING: !! EAP session for state 0xABCDEFGHIJKLMNOP did not finish!

in my log files / debug output. Before anybody says "have a look at http://deployingradius.com/documents/configuration/eap-problems.html, that will help", please read on, because I have already done that and I believe the problem is a little trickier.

I support PEAP+MSCHAPv2 only, and 90% of the time it just works. I am pretty sure that the certificate is all right. If anybody wants to check it, it can be found here: https://freeradius:eaper...@www.stud.uni-karlsruhe.de/~uzbii/hekauth-certs.pem

The certificate file includes all intermediate issuers and the trusted CA. The CA is Germany's biggest telco, so most OSes ship with it by default. The certificate also includes the X509v3 Extended Key Usage values "TLS Web Client Authentication" and "TLS Web Server Authentication" in order to satisfy Windows clients.

My RADIUS config looks like this:

    certdir = ${confdir}/certs
    cadir = ${confdir}/certs
    private_key_file = ${certdir}/hekauth-key.pem
    certificate_file = ${certdir}/hekauth-certs.pem
    # CA_file =
    CA_path = ${certdir}/empty-by-purpose/

If a new client connects for the very first time, most OSes automatically detect the correct authentication scheme, ask for username and password, present the certificate for confirmation, and it works out of the box. (No errors on either the client or the server side.)

Randomly, I get this error message although the respective client normally works. In that case the client just restarts the authentication and then succeeds on the second attempt. Hence the only difference the user might notice is an authentication that takes some milliseconds longer.

During the last four days there have been 1278 such errors, 2519 sessions, 9651 successful authentication attempts (i.e. each session triggered approximately 3.8 re-authentications), 93 different clients and at least 6 different OSes.

I cannot find any pattern, so I do not believe it to be a client side issue. Of course, one can argue that the warning should be ignored as it works most of the time, but I do not like IT systems that behave non-deterministically, hence it preys on my mind.

Does anybody have an idea what the reason might be? If anybody wants to see a full debug output or a tcpdump, I can provide plenty of that. But I could not find anything in it.

Yours, Matthias

--
Matthias Nagel
Willy-Andreas-Allee 1, Zimmer 506
76131 Karlsruhe
Telefon: +49-721-8695-1506
Mobil: +49-151-15998774
e-Mail: matthias.h.na...@gmail.com
ICQ: 499797758
Skype: nagmat84
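For reference, the counts quoted in the message above can be checked with a few lines of Python. This is only a rough sketch: it assumes that every "did not finish" warning corresponds to exactly one abandoned attempt, which the post does not state.

```python
# Counts quoted in the message (four days of logs).
did_not_finish = 1278  # "did not finish" warnings
sessions = 2519
successes = 9651       # successful authentication attempts

# Successful authentications per session.
auths_per_session = successes / sessions

# Share of all attempts that ended in the warning.
# Assumption (not from the post): one warning == one abandoned attempt.
abandoned_share = did_not_finish / (did_not_finish + successes)

print(f"{auths_per_session:.2f} successful authentications per session")
print(f"{abandoned_share:.1%} of attempts abandoned")
```

Under that assumption roughly one attempt in nine is abandoned, which is in line with the "90% of the time it just works" estimate.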
Re: Indeterministic EAP error
On 04/10/12 16:45, Matthias Nagel wrote:
> I cannot find any pattern, so I do not believe it to be a client side issue. Of course, one can argue to ignore the warning as it works most of the time, but I do not like indeterministically behaving IT systems, hence it preys on my mind.
>
> Has anybody an idea what the reason might be? If anybody wants to see a full debug output or a tcpdump, I can provide you with plenty of that. But I could not find anything.

One thing: that logging only happens in debug mode. Most people don't run in debug mode all the time, so as far as I know, it could be normal - maybe everyone sees failure rates of that order?

Anyway, first things - check your eap {} module config, specifically ensure that max_sessions is high enough to support your load, that timer_expire isn't too low, and if applicable, that your TLS session caching is ok (size, particularly).

Otherwise - I assume you are authenticating wireless clients? Unfortunately, wireless is funky. Clients can stop doing the EAP exchange for all sorts of reasons - interference / packet loss, signal strength issues (they moved to a different AP), prompting the user for password / cert issuance, etc.

Are you able to determine where the EAP sessions have got to before they hang up? Are they still in TLS setup, or inner-tunnel? Does it hang up after e.g. the EAP-MSCHAP challenge?

Regrettably the "session did not finish" logging isn't great, so determining this is hard - I keep meaning to see if it can be improved, e.g. log some attributes from the original packet, log the state of the EAP session, etc.
Re: Indeterministic EAP error
Matthias Nagel wrote:
> I cannot find any pattern, so I do not believe it to be a client side issue.

It's always an issue with the client, WiFi, or AP. It's not an issue with FreeRADIUS. Why? All of the EAP is driven by the client.

> Of course, one can argue to ignore the warning as it works most of the time, but I do not like indeterministically behaving IT systems, hence it preys on my mind. Has anybody an idea what the reason might be? If anybody wants to see a full debug output or a tcpdump, I can provide you with plenty of that. But I could not find anything.

You won't see it in a tcpdump. The *non* continuance of the EAP session is what FreeRADIUS is complaining about. tcpdump won't show you any more.

Look on the client and/or the AP for the problem.

Alan DeKok.
Re: Indeterministic EAP error
Hello,

On Thursday, 04 October 2012, 17:09:35, Phil Mayers wrote:
> On 04/10/12 16:45, Matthias Nagel wrote:
> > I cannot find any pattern, so I do not believe it to be a client side issue. Has anybody an idea what the reason might be? If anybody wants to see a full debug output or a tcpdump, I can provide you with plenty of that. But I could not find anything.
>
> One thing: that logging only happens in debug mode. Most people don't run in debug mode all the time, so as far as I know, it could be normal - maybe everyone sees failure rates of that order?

That would be nice, indeed. But if the reason is the signal strength of the WiFi, then the numbers depend heavily on your WiFi coverage. So it is difficult to compare.

> Anyway, first things - check your eap {} module config, specifically ensure that max_sessions is high enough to support your load, that timer_expire isn't too low, and if applicable, that your TLS session caching is ok (size, particularly).

I did not find max_sessions anywhere in the config files. Where is it supposed to be set and what is the default if not set? timer_expire is 60 seconds. The cache size for session resumption is set to 0. I read somewhere that this means unlimited. I see a lot of session resumptions that work.

I found the entry "# fragment_size = 1024", i.e. it is commented out. Does anybody have experience with HP E-MSM 430 APs? This is probably a dumb question: I always believed that the smallest MTU that an Ethernet device must support is 1500. Are there really APs that support less? I did not find anything on that in the specifications of my AP. And a second question: does a wrong value for fragment_size always fail? Or, to state it conversely: if a default fragment size of 1024 works most of the time (as it does for me), can it still be a reason for the failure if it is too high?

> Otherwise - I assume you are authenticating wireless clients?

Half-half. It is an HP 5412 chassis solution with an integrated MSM 765zl WiFi controller. Most clients are wired (desktop PCs) and some clients (smartphones, tablets, laptops) are wireless. But yes, if I (hopefully correctly) link the error message to the corresponding access challenge, most errors are from wireless sessions.

> Are you able to determine where the EAP sessions have got to before they hang up? Are they still in TLS setup, or inner-tunnel? Does it hang up after e.g. the EAP-MSCHAP challenge?

I am not sure whether I link the error message and the access challenge correctly. But if I do, there is no particular point at which the sessions stop.

> Regrettably the session did not finish logging isn't great, so determining this is hard - I keep meaning to see if it can be improved e.g. log some attributes from the original packet, log the state of the EAP session, etc.

At the moment I do the following: I pick the hex number from the error message and look for an access challenge that has the same number in its State AVP. If this is the wrong way to do it, then all I said before is nonsense.

Matthias
Re: Indeterministic EAP error
On 04/10/12 18:10, Matthias Nagel wrote:
> That would be nice, indeed. But if the reason is signal strength of a WiFi, then the numbers heavily depend on your WiFi coverage. So it is difficult to compare.

Sure. As Alan says, it's the client that's going away. Maybe search the logs of your wireless kit for radio-layer events.

To be honest, the rest of my suggestions are unlikely to help - it's probably just wifi packet loss. We see this a lot. EAP seems to be particularly susceptible to being interrupted, because it runs in lockstep and upper-layer retransmits are simpler than something like TCP.

> I did not find max_sessions anywhere in the config files. Where is

https://github.com/FreeRADIUS/freeradius-server/blob/v2.1.x/raddb/eap.conf#L61

> of my AP. And second question: Does a wrong value for fragment_size always fail? Or to state it conversely: If a default fragment size of 1024 works most of the time (as it does with me), can this still be a reason for the failure, if it is too high?

I doubt it. I think it's set to 1024 to be safe and handle things like weird IPSec tunnel MTUs, etc.

> At the moment I do the following: I pick the hex number from the error message and look for an access challenge, that has the same number in its State AVP. If this is the wrong way to do, then all I said before is non-sense.

That's right. The hex number in the message is the State value.
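The lookup confirmed above (take the hex State value from the warning, find the Access-Challenge that carried it) could be scripted roughly as follows. This is a sketch only: the sample log is made up, and the assumed layout (a "Sending Access-Challenge" header line followed by indented attribute lines such as "State = 0x...") should be checked against your own debug output. The function names are hypothetical and not part of FreeRADIUS.

```python
import re

# The warning text comes from the thread; the attribute layout is an assumption.
WARNING_RE = re.compile(r"EAP session for state (0x[0-9a-fA-F]+) did not finish")
STATE_RE = re.compile(r"^\s+State = (0x[0-9a-fA-F]+)")


def unfinished_states(lines):
    """Return the State values named in 'did not finish' warnings, in order."""
    return [m.group(1).lower() for line in lines if (m := WARNING_RE.search(line))]


def challenges_by_state(lines):
    """Map each State value to the header line of the Access-Challenge that sent it."""
    found = {}
    header = None
    for line in lines:
        if line.startswith("Sending Access-Challenge"):
            header = line.strip()
        elif not line[:1].isspace():
            header = None  # any non-indented line ends the attribute list
        elif header and (m := STATE_RE.match(line)):
            found[m.group(1).lower()] = header
    return found


# Made-up sample in the assumed layout, for illustration only.
SAMPLE_LOG = """\
Sending Access-Challenge of id 17 to 192.0.2.10 port 1645
\tEAP-Message = 0x0102
\tState = 0x1a2b3c4d
Finished request 5.
WARNING: !! EAP session for state 0x1a2b3c4d did not finish!
""".splitlines()

challenges = challenges_by_state(SAMPLE_LOG)
for state in unfinished_states(SAMPLE_LOG):
    print(state, "->", challenges.get(state, "no matching Access-Challenge found"))
```

Counting how many challenges preceded each abandoned State would also show how far the session got before the client went away, which is the question raised earlier in the thread.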
Re: Indeterministic EAP error
Hi,

> > I cannot find any pattern, so I do not believe it to be a client side issue.
[snip]
> One thing: that logging only happens in debug mode. Most people don't run in debug mode all the time, so as far as I know, it could be normal - maybe everyone sees failure rates of that order?
[snip]

as Phil says, that message only appears in debug mode... and debug mode runs in a single thread and slows the whole process down. if you have multiple clients trying to connect when in this state, and your server cannot deal with the clients fast enough, then you run into timing issues... et voila, plenty of errors, "did not finish" errors etc.

ensure your main EAP method is first in the list. use the caching feature so the clients don't have to go through the whole 12 trips etc.

and, as Phil says, with wireless you are dealing with the whole PHY issue - packets sent may have got scrambled and needed resending... if the air is 'busy' with duty cycles the client may not be able to transmit in a timely fashion - got 802.11b clients around?

alan
Re: Indeterministic EAP error
Hi,

> I found the entry # fragment_size = 1024 to be commented out. Does anybody has experiences with HP E-MSM 430 APs? Probably, this is a dummy question: I always believed that the smallest MTU that must be supported by an ethernet devices is 1500. Are there really APs that support less? I did not find anything on that in the specifications of my AP. And second question: Does a wrong value for fragment_size always fail? Or to state it conversely: If a default fragment size of 1024 works most of the time (as it does with me), can this still be a reason for the failure, if it is too high?

actually, wifi has bigger MTUs than that - around 2304 for payload - the problem is ethernet... which is USUALLY 1500. if you DON'T set this, then the RADIUS server will cram as much as possible into each packet... and thus your certificate, its intermediates and CA root are all shoved through some rather large packets... if you set this value - e.g. to 1024 - then those packets are nice and tight.

alan
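As a rough illustration of the trade-off behind fragment_size: smaller fragments mean smaller packets, but more of them, and since EAP runs in lockstep each fragment is one more exchange that can be lost over the air. The sketch below just does the division; the 4500-byte chain size is a made-up figure (not measured from the certificate file in this thread), and EAP/TLS header overhead is ignored.

```python
import math


def fragments_needed(payload_bytes, fragment_size):
    """Number of fragments needed to carry payload_bytes (header overhead ignored)."""
    return math.ceil(payload_bytes / fragment_size)


# Hypothetical size of server certificate + intermediates + root, in bytes.
CHAIN_BYTES = 4500

for size in (1024, 1400):
    print(f"fragment_size={size}: {fragments_needed(CHAIN_BYTES, size)} fragments")
```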
Re: Indeterministic EAP error
On Thu, Oct 04, 2012 at 05:45:30PM +0200, Matthias Nagel wrote:
> WARNING: !! EAP session for state 0xABCDEFGHIJKLMNOP did not finish!
...
> Has anybody an idea what the reason might be?

We see it a lot less since we tweaked the EAP timers on our Cisco Wireless Controller. You don't say what APs or system you're using, but for example if it's the Cisco WLCs see https://supportforums.cisco.com/docs/DOC-12110

The issue would go /something/ like this (I forget the precise details):

  - User clicks connect (*)
  - Types in username and password slowly
  - EAP Identity Request would time out (20s or so)
  - EAP session would get closed - client controller would give up - error above
  - User clicks login
  - EAP session starts again
  - either
    a) EAP completes and client connects, or
    b) client realises that their EAP session got broken, and prompts the user for their password again - go back to '*'.

Then... after a couple of times, the controller might figure that the client has done some bad authentications, and ban them for a minute or so.

We tweaked the timers to make the Identity Request time + max retries longer, and disabled the automatic banning of clients from invalid authentications.

Generally now the only time we see that error is if we restart FreeRADIUS (in which case, EAP sessions in transit get broken, so it's the sort of thing I expect). You still sometimes see it if a client is on the edge of a radio cell, and moves out of range whilst connecting, for example, but it's nothing like as often as it used to be.

In short, it's a client/NAS issue, as already stated.

Hope that helps,

Matthew

--
Matthew Newton, Ph.D. m...@le.ac.uk
Systems Architect (UNIX and Networks), Network Services, I.T. Services, University of Leicester, Leicester LE1 7RH, United Kingdom
For IT help contact helpdesk extn. 2253, ith...@le.ac.uk