Review of draft-ietf-hokey-erx-09

I have reviewed this document as part of the Operations and Management
directorate effort.  These comments were primarily written for the
benefit of the O&M area directors.  Document editors and WG chairs
should treat these comments just like any other last call comments. 

Detailed review comments are available here:
http://www.drizzle.com/~aboba/EAP/erx-review.txt

An answer to typical O&M issues is included below:

1. Is the specification complete?  Can multiple interoperable implementations
be built based on the specification? 

There are a few areas of the document which are unclear to me, such as how
AAA routing is accomplished, whether and when peers require the local realm
and, if so, how it is to be obtained.  Also, clarity with respect to algorithm
agility could be improved.  There are also some issues with respect to the
required behavior of ERX peers and servers (use of normative language). 

There are also situations in which multiple approaches can be chosen (such as
the various bootstrap options), without one being chosen as mandatory or
default.  Choosing one approach would seem to be better.   

In my judgement, addressing these issues would improve the likelihood of
being able to build multiple interoperable implementations. 

2. Is the proposed specification deployable?  If not, how could it be
improved? 

Based on my reading of the document, it would appear that the ERX proposal
requires changes to EAP peers, authenticators and servers, as well as RADIUS 
clients, proxies and servers.  It also appears possible that changes to the
lower layer protocols will be required in at least some cases, such as to
make the local domain available to the peer. 

Given my experience in designing and operating wireless networks, deployments
requiring changes only to peers and authenticators (but not servers or RADIUS
infrastructure) can take as long as 3-5 years to complete.  For example,
WPA2 is still not universally deployed, even though the specification was
finished in 2004. 

By also requiring changes to AAA infrastructure, it seems to me that ERP
deployment will be made more difficult than upgrades to the lower layer
(such as IEEE 802.11r), which appear to achieve a similar objective.  
This puts the ERX proposal at a competitive disadvantage, and makes it
unlikely that it will be widely deployed in its current form.

3.  Does the proposed approach have any scaling issues that could affect
usability for large scale operation? 

The proposed approach introduces state into NASes, as well as RADIUS
proxies and servers.  This state is typically of two types:  routing
state and key state.  In terms of key state storage, it would appear
that the RADIUS server needs to store key state for each authenticated
user within the Session-Id lifetime, regardless of where they are
located.  Local ERX servers store state for all local users, regardless
of their home realms.  

In order to scale to handle a large user population, additional RADIUS
servers are typically deployed, all drawing on a replicated backend
store (such as an LDAP directory).  Similarly, additional RADIUS
proxies are deployed to handle the forwarding load. 

In conventional RADIUS deployments, proxies act much like routers,
so that the failure of a RADIUS proxy will not necessarily result in
failure of an EAP authentication in progress.  For example, a NAS
could switch over from use of one proxy to another one and as long
as the same RADIUS server remained reachable, the conversation could
complete normally.  

Similarly, while failure of a RADIUS server during a conversation will
require re-starting the EAP conversation, that conversation could 
complete normally if restarted with a new server, since all servers
presumably have access to the same backend credential store. 

Some of these assumptions no longer apply with ERX, since RADIUS
proxies and servers now store key state which is not replicated
between them.  Therefore RADIUS failover would disrupt the functioning
of ERX in a way that it does not disrupt operation of RADIUS today. 

For example, if a RADIUS proxy or server goes down, all key state at that
proxy/server may be lost (the document does not talk about use of stable
storage to preserve keys), and therefore ERP requests will fail.   
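
The failover concern can be illustrated with a toy model.  This is a
sketch only, assuming that each proxy or server holds its key cache in
memory with no replication; the KeyCache class, the key name and the
reauthenticate function are hypothetical and do not come from the document.

```python
class KeyCache:
    """Toy in-memory key cache held by a single RADIUS proxy or server."""

    def __init__(self):
        self._keys = {}

    def store(self, key_name, key):
        self._keys[key_name] = key

    def lookup(self, key_name):
        return self._keys.get(key_name)


def reauthenticate(cache, key_name):
    """Return True if this node holds the key needed to serve the request."""
    return cache.lookup(key_name) is not None


primary = KeyCache()
backup = KeyCache()  # separate cache; nothing is replicated to it

# Key state is created on the primary during the initial authentication.
primary.store("user-key-1", b"\x00" * 32)

served_by_primary = reauthenticate(primary, "user-key-1")
# After failover the request reaches the backup, which has no key state.
served_after_failover = reauthenticate(backup, "user-key-1")
```

In this model the request succeeds on the primary and fails after failover,
whereas a stateless proxy would simply forward the packet.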

With respect to the resources required to store key state,
I believe that they are manageable for the most part. 
Typically RADIUS servers have substantial resources
associated with them, so that they are more capable of handling this kind
of state than NASes which are embedded devices. In terms of NAS state,
it would appear to me that the proposed approach scales better than 
existing proposals such as IEEE 802.11r, since an authenticator will only
hold state for connected devices, as opposed to devices that *might* 
connect in the future. 

My only concern would be about RADIUS proxies.  In my experience, 
proxies are often installed in co-location facilities where repairs
can be expensive and difficult, and so they are often installed on
stripped-down hardware;  with the current move toward flash, they
may not even have a hard disk in the near future.  Such stripped-down
boxes may not be capable of maintaining large key caches.  

4. Are there any backward compatibility issues? 

There seem to be some issues with respect to backward
compatibility with EAP as defined in RFC 3748 and RFC 4137.  For example,
the document appears to enable two packets to be in flight at the same
time, and there seems to be an assumption that ERP implementations will not
respond to EAP-Request/Identity packets.  

A bigger problem may exist with respect to RFC 2284 implementations
which represent the bulk of existing EAP deployments.  Since RFC 2284
does not specify how peers and servers behave when encountering new
EAP message types or peer-initiated messages, the behavior in the
field will be implementation dependent.

Hopefully, this does not include unanticipated ill effects (crashes,
security compromises) but it's not possible to rule this out without
testing. 

There also may be issues with respect to compatibility with existing
EAP lower layers.  For example, it would appear to me that IEEE 802.1X-2001
(which represents the bulk of existing 802.1X deployments) does not support
peer-initiated messages. 

In order to minimize the backward compatibility issues, it probably makes
sense for the peer not to utilize ERP unless it has an indication that it
is supported on a given network and AAA server (e.g. based on 
pre-configuration).  Currently the document does not require this.  

Sections of the document relating to AAA packet routing are somewhat
unclear, and may introduce changes to the way that RADIUS
clients route packets.  However, discussion of AAA routing seems somewhat
orthogonal to the purpose of this document, so one way forward would be to
move this material to the RADIUS ERP document instead.  

5. Do you anticipate any manageability issues with the specification? 

In today's carrier deployments, we are seeing the need for facilities
such as "Hotlining", which require the ability to modify authorizations
or remove key state created by a user session. 

RFC 5137 typically uses the User-Name as the key which the NAS uses in 
order to locate the state which is to be affected.  However, ERP 
introduces state within the local ERX server as well as on the NAS,
and it is not clear how this state can be removed.  For example, the
local ERX server may not have access to the actual User-Name, since
this could be hidden within the EAP conversation.  As a result,
I think that there is an implication that a user identifier such 
as the CUI is used to identify key state on the ERX server; however,
this is not stated. 
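
To make the implication concrete, here is a minimal sketch of a key store
on the local ERX server that is indexed by CUI, so that a request to remove
a user's state can locate it without the User-Name.  This is an assumption
about one possible design, not something the document specifies; the
ErxKeyStore class, its methods and the sample identifiers are hypothetical.

```python
from collections import defaultdict


class ErxKeyStore:
    """Hypothetical key store on a local ERX server, indexed by CUI."""

    def __init__(self):
        # CUI -> {key name -> key material}
        self._by_cui = defaultdict(dict)

    def add(self, cui, key_name, key):
        self._by_cui[cui][key_name] = key

    def remove_by_cui(self, cui):
        """Delete all key state for one user; return the number of keys removed."""
        return len(self._by_cui.pop(cui, {}))

    def has_state(self, cui):
        return cui in self._by_cui


store = ErxKeyStore()
store.add("cui-1234", "key-a", b"\x01" * 32)
store.add("cui-1234", "key-b", b"\x02" * 32)
store.add("cui-5678", "key-c", b"\x03" * 32)

# Remove every key belonging to one user, leaving other users untouched.
removed = store.remove_by_cui("cui-1234")
```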

6. Does the specification introduce new potential security risks or
avenues for fraud?

One of the issues introduced by "fast handoff" specifications that
bypass the AAA server is that this can result in accounting packets
being sent without corresponding evidence of user presence.  For 
example, when the user is required to authenticate at each authenticator,
the home server has evidence that the user was in fact present at
those locations and times, even though the session times could be
inflated.

With ERP, the user is required to authenticate only once within the
local domain, and the resulting keys then remain usable there until they expire. 
This could involve a continuous session, or the user could go to
another domain and come back without having to re-authenticate. 

To some extent, the risk can be controlled by the home server
administrator by changing the key lifetime so as to require 
re-authentication within a given time frame.  However, the document
does not describe how rIK key lifetime will relate to other lifetimes
such as the Session-Id in order to accomplish this. 
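
As an illustration of the kind of relationship that could be specified,
the sketch below caps the key lifetime at the shortest of the applicable
limits.  This is only one possible policy, assumed here for the sake of
example; the document does not define it, and the function name, parameter
names and sample values are hypothetical.

```python
def effective_key_lifetime(key_lifetime_s, session_lifetime_s, max_reauth_interval_s):
    """Return the lifetime (in seconds) after which full re-authentication is forced.

    Assumed policy: the key may never outlive the session, nor the
    administrator's maximum interval between full authentications.
    """
    return min(key_lifetime_s, session_lifetime_s, max_reauth_interval_s)


# A 24-hour key, an 8-hour session and a 1-hour administrative limit.
lifetime = effective_key_lifetime(86400, 28800, 3600)
```

Under this policy the administrative limit of 3600 seconds governs, so the
home server sees evidence of user presence at least once per hour.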

A more serious issue appears to arise in the "implicit bootstrap" exchange,
where the DSRK request is inserted by the local ERX server in a normal
EAP conversation.  As specified in the document, the AAA server does
not appear to have the ability to verify this request.  For example,
there is no requirement that the "local domain" correspond to the 
domain that would be returned from a PTR RR query on the NAS-IP-Address. 
This would seem to imply that any intermediate proxy can obtain a 
DSRK, and with it, the ability to submit unverifiable accounting
records.  

This would seem to introduce a fraud risk that is not
present in existing fast handoff proposals. 
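
The kind of check mentioned above (comparing the claimed local domain with
the result of a PTR query on the NAS-IP-Address) could be sketched as
follows.  This is an illustration under stated assumptions, not a mechanism
defined by the document: the function names are hypothetical, the suffix
matching rule is an assumption, and the resolver is injectable so that the
logic can be exercised without a network.  socket.gethostbyaddr is the
standard library call for a reverse lookup.

```python
import socket


def normalize(name):
    """Lower-case a DNS name and drop any trailing dot."""
    return name.rstrip(".").lower()


def domain_matches(claimed_domain, ptr_hostname):
    """True if the PTR hostname is the claimed domain or a host within it."""
    claimed = normalize(claimed_domain)
    host = normalize(ptr_hostname)
    return bool(claimed) and (host == claimed or host.endswith("." + claimed))


def verify_local_domain(claimed_domain, nas_ip, resolver=socket.gethostbyaddr):
    """Check the claimed local domain against a reverse lookup of the NAS address."""
    try:
        hostname, _aliases, _addresses = resolver(nas_ip)
    except OSError:
        # No PTR record or lookup failure: treat the claim as unverified.
        return False
    return domain_matches(claimed_domain, hostname)
```

A server applying such a check would refuse to issue a DSRK when
verify_local_domain returns False.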