Hi Alan,

Thanks for the updates in the -07; they largely address my concerns. The main point I see as open is nailing down the TLS-PSK situation for EAP-TLS (noting in particular that if we use a well-known PSK value then TLS-PSK does not provide server authentication).

Trimming some bits but otherwise inline...

On Tue, Jun 03, 2025 at 01:09:54PM -0400, Alan DeKok wrote:
> On May 16, 2025, at 3:24 PM, Benjamin Kaduk via Datatracker <[email protected]> wrote:
>
> > ## Discuss
> >
> > ### (Non-)Permanence of domain registrations
> >
> > Section 6.6 (and others) describes self-assignment of identifiers under the "v." subdomain, with an organization being able to use a FQDN they have registered as the domain prefix. But such domain registrations are not permanent, and implementations using such names in software may persist after the registration has lapsed. I think we should have some text in the document discussing this mismatch in timescales, which might entail guidance to domain owners to ensure they keep the domain registered or some guidance to implementors/users that such self-registrations may become stale if the domain ownership changes (or some other solution, of course). (For example, the claim in §3.2 that such self-assigned identifiers "cannot conflict with other identifiers" is not true if the domain name used to construct the identifiers gets reassigned.)
>
> I'll add some text to address this issue.

I think that looks good.

> > ### authenticate the server or not
> >
> > It looks like there's some internal inconsistency in what we expect to happen for using EAP-TLS for provisioning w.r.t. server authentication. In toplevel §5 we say that EAP-TLS has the advantage of authenticating the EAP server, but in §5.1 we say that the device "SHOULD ignore" the server certificate (but that the device likely has web CAs present and could use those to authenticate the EAP server). Is there some subtlety I'm missing that makes these cases different?
> > If not, it seems like we need to have a consistent message on what EAP-TLS for provisioning is supposed to provide (and if there is a subtle distinction, we should call it out clearly).
>
> I've added some text to address this issue, based on similar comments from others. The text better explain when / why the EAP server is authenticated, and what impact that has.

Yes, that text is helpful -- thanks!

> > If we do end up keeping the statement that peers could use web CAs to authenticate the EAP server, I would strongly recommend providing some commentary about when it would or would not be a good idea to actually do so, or what factors would come into play in deciding whether or not to do so.
>
> I would leave that as something to be defined by the specific provisioning method. I'll add a note to this effect.

With the added text we should be ok.

> > ### Does TLS-PSK need to be handled separately from regular EAP-TLS?
> >
> > The final paragraph of §5.1 mentions that TLS-PSK can technically be used with EAP-TLS for provisioning purposes, but in all the TLS stacks I know of, using TLS-PSK is effectively a distinct operation than doing a certificate-based handshake, and I would not generally expect either peers or servers to be prepared to handle both for the same TLS connection (i.e., letting the other endpoint pick which to use).
>
> I'm not sure what that means. The text in the document isn't suggesting that it's using both certificates and PSK at the same time, for the same connection. I'll add some text making that clearer.

I was trying to say that a given TLS server "usually" (there is a good bit of handwaving here) expects all of its clients to either use a certificate to authenticate the server or use PSK; the situation for TLS clients talking to a specific server is even more strongly so, with it being a very strange (and often difficult) flow for a client to say "talk to server X, I will accept either a certificate chaining to these CAs or successful proof of knowledge of this PSK as server authentication". That is, whether or not PSK should be used on a given connection is "usually" (again, hand-waving) something that is known out of band vs negotiated on a per-connection basis. In that regard there is some reason (not necessarily a conclusive reason) to want a different EPI for "EAP-TLS with certificate" and "EAP-TLS with PSK" ...

> If instead we expect that one EAP server handles both certificates and PSKs at the same time, then that should be possible. There are already multiple EAP server implementations which support this.

... and you seem to be saying that existing EAP servers will do ok with this, and that (by implication) EAP peers will have the advance knowledge of when to (not) try TLS-PSK that they need for success.

> > To me, that suggests that interoperability would benefit from defining a distinct provisioning NAI to indicate that TLS-PSK should be used with EAP-TLS, leaving [email protected] for certificate-based (server) authentication. Do we have reason to believe that the current specification will be interoperable in the face of peers/servers that do and do not want to attempt TLS-PSK "authentication"?
>
> I think it's really a decision for either a local deployment, or for a particular provisioning method.

I thought we were trying to claim that EPI was 1:1 with provisioning method, such that EAP-TLS provisioning behavior would need to be fully specified in this document (to the extent that it interacts with the TLS handshake used for EAP). If we can defer things to the specific technique used to provision credentials after EAP-TLS establishment, that would be ok, but I don't think the current specification gives us rope to make the TLS handshake behavior of the EAP-TLS connection depend on a provisioning mechanism when the choice of provisioning mechanism is not indicated in the EPI.

> > I would probably also say something to clarify that the (lowercase!) raw ASCII byte string of the NAI name is used directly as the PSK, without other processing, but that's just at a comment level.
>
> The text already says that the PSK identity must be the same as the EAP Identifier (i.e. NAI), and also used as the PSK. I'll add some clarifying text.

(As noted above) using a well-known value as the PSK itself cannot provide server authentication. If we're going to say that "all provisioning methods [...] MUST Define a way to authenticate the server. This authentication can happen either at the EAP layer (as with EAP-TLS), [...]" then I think we need EAP-TLS with TLS-PSK to authenticate the server as well. Am I missing something?

(It looks like we have some text in §5 that mentions TTLS and PEAP as using the EPI for identity+password that also impacts their ability to perform server authentication if TLS-PSK is used.)

> > ### NAIs for TLS-based EAP methods
> >
> > The rules for the registry seem to say that there must be a 1:1 correspondence (or at least N:1) between provisioning NAI and EAP method. So I'm really confused at why we have any discussion of TTLS and PEAP (in §5.2) but say to use the same NAI ([email protected]) as for EAP-TLS. Why do we not need to define distinct NAIs to provide the semantics indicated here?
>
> Yes.
> I think it's simplest to just delete the text mentioned TTLS and PEAP.

Sure, that works fine.

> > ## Comments
> >
> > ### division of responsibility between this doc and provisioning methods
> >
> > In §5 we have some discussion about how our predefined provisioning NAIs will interact with existing EAP types, including a statement that where TLS-based methods have inner identity/authentication, those credentials "MUST be the provisioning identifier", among other requirements. I'm not sure I understand why we need to tie our hands so strongly in this document, when any given provisioning identifier is going to be specific to a single EAP method (per §6.2 and 3.4.1). Why is it necessary for the core protocol framework specifically to impose this requirement, vs the individual provisioning methods doing so (with guidance from the framework as a useful default)?
>
> I don't see why provisioning methods would need to define per-method credentials.

I don't either, but that doesn't mean I can prove they would never need to do so.

> > I do see that the registration procedure is merely "expert review" so there may not always be a document that would be able to hold such a requirement. But it seems like we could say "unless otherwise specified, assume that the password is the provisioning identifier" and leave room for future evolution.
>
> I can make it a SHOULD.

It looks like you changed §5¶2 but we also have a MUST in §5¶3 that talks about inner identity/password.

> > ### Direct configuration of NAI
> >
> > In §3.4.1 we say:
> >
> >> EAP peers MUST NOT allow these NAIs to be configured directly by end users. Instead the user (or some other process) chooses a provisioning method, and the peer then chooses a predefined NAI which matches that provisioning method.
> >
> > I agree with the goal here, but are there or could there be existing situations where implementations already allow the user to directly enter the NAI (along with the associated credentials)?
>
> Yes.
>
> > If so, we probably want some discussion about what might happen if a user (maliciously?) enters a predefined NAI in such a way, along with guidance that implementations that do allow this behavior need to check for eap.arpa entries and reject them.
>
> The user will either not get authenticated (if the server doesn't support provisioning), or the user will be placed into a captive portal, where nothing will happen. I'll add some text to clarify this.

Thanks for adding this text (I cannot refute the reasoning you give that the implementation SHOULD NOT specifically check for EPIs).

> > ### Allow for server upgrades
> >
> > In §3.4.1:
> >
> >> There are a number of ways in which provisioning can fail. One way is when the server does not implement the provisioning method. EAP peers therefore MUST track which provisioning methods have been tried, and not repeat the same method to the same EAP server when receiving a an EAP Nak. EAP peers MUST rate limit attempts at provisioning, in order to avoid overloading the server.
> >
> > We may want to saay something about the not repeating being bound to some large-ish but not-infinite timeframe, to allow for another attempt much later to succeed if the server has been upgraded in the interim. (We also don't want requirements on peers to have unbounded local storage requirements!)
>
> Do you have suggested text / timers?

I think it would be safe to forget this state after a month (which would be a somewhat optimistic timescale for deploying software updates to an EAP server, I suspect, but is also long enough to present only a minimal amount of additional load on the server).
But maybe it only makes sense to preserve the state for the current network attachment, which would give rather different properties. I'm not actually sure.

If we like the time-based guidance, it could look like:

% EAP peers therefore MUST track which provisioning methods have been tried,
% and not repeat the same method to the same EAP server when receiving an
% EAP Nak. EAP peers MAY retry a given provisioning method after a
% sufficiently long interval that the EAP server might have implemented the
% provisioning method, e.g., a month.

but maybe we want to think about the "current network attachment" formulation, e.g.:

% EAP peers therefore MUST track which provisioning methods have been tried,
% and not repeat the same method to the same EAP server when receiving an
% EAP Nak, within the scope of the current network attachment. EAP peers SHOULD
% retain this state for at least a day, but MAY discard state after such a
% delay, allowing them to retry the provisioning method at a much later
% time, which allows for the possibility that the server has implemented
% the provisioning method in question.

> > (We could also give some guidance on what good rate limiting might look like, even if that takes the form of factors to consider rather than specific values. Note that rate limiting also comes up in §3.4.2.)
>
> Suggested text would be welcome here.

Mentioning jitter and backoff (as you now do) is itself a big improvement ... I do not think I have the domain knowledge to supply any appreciably better guidance, myself.

> > ### Large amounts of data and PQC
> >
> > In §3.4.2:
> >
> >> A limited network SHOULD also limit the amount of data being transferred by devices being provisioned, and SHOULD limit the network services which are available to those devices. The provisioning process generally does not need to download large amounts of data, and similarly does not need access to a large number of services.
> >
> > Do you have a sense for what people might take "large amounts of data" to mean? As we start transitioning to post-quantum cryptography with its larger key sizes, it would be unfortunate if the total data limit for provisioning was too small to admit transfer of credentials using PQC algorithms (but I'm not sure if we actually need so say something, if the limits in practice will be fine without us doing so). (There is some related discussion in §5.1 that might want a section reference back to any new content added here.)
>
> I have no quantitative numbers for this. Ideally not gigabytes.

Ok. I don't have great thoughts, either, so maybe we just leave this as-is unless someone else has better ideas.

> > ### Rationale
> >
> > We have §4.2.1 to give a rationale for provisioning inside EAP, but no corresponding section with a rationale for provisioning inside a captive portal, yet we do not specifically recommend provisioning inside EAP. This leaves me unsure what the purpose of the section is, if we're going to spend time justifying something that's just one option to choose from with no other special status. (I can infer that using a captive portal facilitiates reuse of existing provisioning protocols and/or deployments, but the document doesn't tell me that.)
> >
> > ### EAP-TLS clarifications
> >
> > The final sentence of toplevel §5 provides some commentary about what EAP-TLS allows, but I find myself unclear both about why this information is being added and what scenarios are being described.
> > My current theory is that it's saying that an EAP peer can use EAP-TLS-based provisioning via captive portal with only a small amount of pre-provisioned or factory-provisioned information (the CAs that are locally configured), and we're mentioning this to support our argument that using EAP-TLS for provisioning (whether with in-EAP provisioning or captive-portal provisioning) provides advantages and is generally recommended. Is that correct?
> >
> > ### guidance to the experts
> >
> > Generally we treat "SHOULD NOT" as "MUST NOT, with exceptions". If NAIs in the registry SHOULD NOT contain more than one subdomain, what kind of exceptions might make sense?
>
> I'm not sure. I'll try to think of something.
>
> > Relatedly, I think the guidance should say that NAIs with any "v." subdomain, leading or otherwise, MUST NOT be retistered, in order to preserve the purpose of that prefix.
>
> That makes sense.
>
> > Do we need to specifically include in this section the content from §6.4 that the Method Type must provide MSK and EMSK?
>
> Yes. I'll add a note.

(I am not seeing this one.)

Thanks again,

Ben

_______________________________________________
Emu mailing list -- [email protected]
To unsubscribe send an email to [email protected]
