Thanks JB, I do feel like the discussion around OAuth2, SigV4, etc. is a big enough topic that we wouldn't want to bundle it with other proposed changes. I think the discussion around both what is included in the spec and what the reference implementations will be for each of these protocols will be a rather large topic.
In general, we find fewer, more focused proposals allow for better discussion and faster resolution. Can you split that section out into a separate document and create an issue for the auth changes? Thanks, -Dan On Thu, May 30, 2024 at 4:55 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote: > Hi Jack, > > Here's my comments: > > 1. I don't think we should remove the oauth2 endpoint directly like > this. I would first deprecate the endpoint and plan the removal in the > spec v2. > 4. I agree, and it has to be pluggable. > > I updated the REST Spec v2 proposal including first steps on v1: > > https://docs.google.com/document/d/1JUtFpdEoa6IAKt1EzJi_re0PUbh56XnfUtRe5WAfl0s/edit?usp=sharing > > As already shared on the mailing list, I'm working on a PR to have > interfaces with JAXRS/Swagger annotations to generate OpenAPI > JSON/YAML with the swagger-gradle-plugin. > > Thanks, > Regards > JB > > On Wed, May 29, 2024 at 8:03 PM Jack Ye <yezhao...@gmail.com> wrote: > > > > Just to reiterate my points discussed in the community sync here: the > more I think about it the more I agree the OAuth endpoint should be removed > from the REST spec. Even though the endpoint is optional, and even if we do > not care about the security concerns, it still provides users an impression > that the endpoint "should" be implemented, or "is the preferred > authentication mechanism". And as we have found out, the server capability > proposal does not cover this case since this is the first endpoint to hit > before the GetConfig endpoint. > > > > As Ryan said, if we want to do that we need an alternative plan. I don't > have anything concrete, but here is my line of thought: > > > > 1. remove OAuth2 endpoint from the "REST OpenAPI spec" > > > > 2. create a client-side interface (in each language) that different > authentication mechanisms can be plugged in to talk to the REST catalog > > > > 3. refactor and make OAuth2 an implementation of that interface. I can > also help with doing the same for AWS Sigv4, and the community can further > support some additional ones like Kerberos, SAML, Google SSO, etc. based on > the individual use cases. > > > > 4. turn 2 + 3 into a "REST catalog authentication spec" that documents > about all the supported authentication mechanisms and their defaults. For > OAuth2, the default is to have the auth server at the same endpoint as the > resource server for backwards compatibility, but that is a configurable > property, and we could recommend not to do that based on security concerns. > > > > Best, > > Jack Ye > > > > On Wed, May 29, 2024 at 10:28 AM Steven Wu <stevenz...@gmail.com> wrote: > >> > >> Wondering if the auth endpoints can be separated out to a separate > OpenAPI spec file. Then we still have some reference for interactions with > auth server and make it clear it is not required as part of the REST > catalog server. In most enterprise environments, auth server is likely a > separate server. > >> > >> On Tue, May 28, 2024 at 1:25 PM Alex Dutra > <alex.du...@dremio.com.invalid> wrote: > >>> > >>> Hi, > >>> > >>>> > >>>> On point 4, isn't that possible today, Can't that be achieved with > the current token exchange approach, and the internal implementation of the > endpoint? > >>> > >>> > >>> Unfortunately, no. Token exchange is not widely adopted yet: for > example, Keycloak has only partial support for it, and Authelia, or > Authentik, have no support for it at all. > >>> > >>> This, and a few other technical issues with the current internals of > the REST client, makes it nearly impossible to achieve a good integration > of Iceberg REST with the majority of popular OSS authorization servers. > >>> > >>> I am planning to start another email thread to discuss these > practicalities, but let's first reach consensus on the broader security > issues voiced here, before we tackle the details. > >>> > >>> Thanks, > >>> > >>> Alex Dutra > >>> > >>> On Tue, May 28, 2024 at 8:41 PM Amogh Jahagirdar <am...@tabular.io> > wrote: > >>>> > >>>> I disagree with removing "/v1/oauth/tokens" and I think I also > disagree with the premise that implementing that endpoint is required, but > I can understand how that's not clear in the spec. I think we can address > the required vs non-required discussion with the capabilities PR. > >>>> > >>>> It seems like another part of what's driving this discussion is some > concern around how do we enforce REST catalog implementations which do > implement this endpoint to make sure that the implementation is secure (for > example to avoid the MITM example that was brought up). This is ultimately > a runtime detail. To me it seems like if we make it clear that such an > endpoint should be implemented respecting OAuth2 standards, and we know > that OAuth2 compliance requires avoiding that MITM situation, then runtime > implementations should just follow the spec there > >>>> > >>>> >3. Enable flexibility for Iceberg REST servers to opt for other > >>>> authorization mechanisms than OAuth 2.0. > >>>> >4. Enable REST servers to opt for integrating with any standard > OAuth2 / > >>>> OIDC provider (e.g. Okta, Keycloak, Authelia). > >>>> > >>>> I agree with both of these points; again I don't think the intention > is Oauth2 is the only way, but I think the capabilities PR will make that > even more clear. > >>>> On point 4, isn't that possible today, Can't that be achieved with > the current token exchange approach, and the internal implementation of the > endpoint? Sorry if I missed that explanation. > >>>> > >>>> Thanks, > >>>> > >>>> Amogh Jahagirdar > >>>> > >>>> On Tue, May 28, 2024 at 11:13 AM Yufei Gu <flyrain...@gmail.com> > wrote: > >>>>> > >>>>> Not an expert on authentication, but reading from the context, I > agree that it’s not a good practice to use a resource server as a token > server. The resource server would need to securely handle and store > credentials or tokens, increasing the risk of credential theft or leakage. > Making the token endpoint optional will mitigate the issue a bit. But if we > want to disable it completely, it's better to do it now to prevent any > issues and migration costs in the future. Can we have a consensus on it? > >>>>> > >>>>> > >>>>> I would prefer to deprecate it to prevent any intentional and > unintentional misuse. We will also need to change the clients since it > connects to the endpoint by default. > >>>>> > >>>>> > >>>>> Yufei > >>>>> > >>>>> > >>>>> On Tue, May 28, 2024 at 8:27 AM Jack Ye <yezhao...@gmail.com> wrote: > >>>>>> > >>>>>> Sounds like we should try to finalize a consensus around > https://github.com/apache/iceberg/pull/9940, so that we make it very > clear what APIs/features are optional. > >>>>>> > >>>>>> -Jack > >>>>>> > >>>>>> On Tue, May 28, 2024 at 7:25 AM Fokko Driesprong <fo...@apache.org> > wrote: > >>>>>>> > >>>>>>> Hey Robert, > >>>>>>> > >>>>>>> Sorry for the late reply as I was out last week. I'm not an OAuth > guru either, but some context from my end. > >>>>>>> > >>>>>>>> * Credentials (for example username/password) must _never_ be > sent to > >>>>>>>> the resource server, only to the authorization server. > >>>>>>> > >>>>>>> > >>>>>>> In an earlier discussion, it was agreed that the resource server > can also function as the authorization server. But the roles can also be > separate. > >>>>>>> > >>>>>>>> 1.2. As long as OAuth2 is the only mechanism supported by the > Iceberg > >>>>>>>> client, make the existing client parameter “oauth2-server-uri” > >>>>>>>> mandatory. The Iceberg REST catalog must fail to initialize if the > >>>>>>>> “oauth2-server-uri” parameter is not defined. > >>>>>>> > >>>>>>> > >>>>>>> It can also be that there is no authentication in the case of an > internal REST catalog. For example, the iceberg-rest-image that we use for > integration tests in PyIceberg. > >>>>>>> > >>>>>>>> We think that Apache Iceberg REST Catalog spec should not mandate > that a > >>>>>>>> catalog implementation responds to requests to produce Auth Tokens > >>>>>>>> (since the REST spec v1 defines a /v1/tokens endpoint, current > >>>>>>>> implementations have to take deliberate actions when responding > to those > >>>>>>>> requests, whether with successful token responses or with “access > >>>>>>>> denied” or “unsupported” responses). > >>>>>>> > >>>>>>> The `/v1/tokens` endpoint is optional. > >>>>>>> > >>>>>>>> * Credentials (for example username/password) must _never_ be > sent to > >>>>>>>> the resource server, only to the authorization server. > >>>>>>> > >>>>>>> > >>>>>>> I fully agree! > >>>>>>> > >>>>>>>> Even if an Iceberg REST server does not implement the > ‘/v1/oauth/tokens’ > >>>>>>>> endpoint, it can still receive requests to ‘/v1/oauth/tokens’ > containing > >>>>>>>> clear text credentials, if clients are misconfigured (humans do > make > >>>>>>>> mistakes) - it’s a non-zero risk - bad actors can > implement/intercept > >>>>>>>> that ‘/v1/oauth/tokens’ endpoint and just wait for misconfigured > >>>>>>>> clients to send credentials. > >>>>>>> > >>>>>>> > >>>>>>> I think the wording is chosen badly. It should not send any > credentials, but the code (as in this example by GCS). > >>>>>>> > >>>>>>>> I think Jack makes a good point with AWS SigV4 Authentication. I > suppose, in REST Catalog implementations that support that auth method, the > /v1/oauth/token Catalog REST endpoint is redundant. > >>>>>>> > >>>>>>> > >>>>>>> There are other cloud providers next to AWS. > >>>>>>> > >>>>>>> Kind regards, > >>>>>>> Fokko > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> Op do 23 mei 2024 om 15:49 schreef Dmitri Bourlatchkov > <dmitri.bourlatch...@dremio.com.invalid>: > >>>>>>>> > >>>>>>>> I think Jack makes a good point with AWS SigV4 Authentication. I > suppose, in REST Catalog implementations that support that auth method, the > /v1/oauth/token Catalog REST endpoint is redundant. > >>>>>>>> > >>>>>>>> Cheers, > >>>>>>>> Dmitri. > >>>>>>>> > >>>>>>>> On Thu, May 23, 2024 at 9:20 AM Jack Ye <yezhao...@gmail.com> > wrote: > >>>>>>>>> > >>>>>>>>> I do not know enough details about OAuth to make comments about > this issue, but just regarding the statement "OAuth2 is the only mechanism > supported by the Iceberg client", AWS Sigv4 auth is also supported, at > least in the Java client implementation. It would be nice if we formalize > that in the spec, at least define it as a generic authorization header. > >>>>>>>>> > >>>>>>>>> Best, > >>>>>>>>> Jack Ye > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> On Thu, May 23, 2024 at 2:51 AM Robert Stupp <sn...@snazy.de> > wrote: > >>>>>>>>>> > >>>>>>>>>> Hi all, > >>>>>>>>>> > >>>>>>>>>> Iceberg REST implementations, either accessible on the public > internet > >>>>>>>>>> or inside an organization, are usually being secured using > appropriate > >>>>>>>>>> authorization mechanisms. The Nessie team is looking at > implementing the > >>>>>>>>>> Iceberg REST specification and have some questions around the > security > >>>>>>>>>> endpoint(s) defined in the spec. > >>>>>>>>>> > >>>>>>>>>> TL;DR we have questions (potentially concerns) about having the > >>>>>>>>>> ‘/v1/oauth/tokens’ endpoint, for the reasons explained below. > We think > >>>>>>>>>> that ‘/v1/oauth/tokens’ poses potential security and OAuth2 > compliance > >>>>>>>>>> issues, and imposes how authorization should be implemented. > >>>>>>>>>> * As an open table format, it would be good for Iceberg to > focus on the > >>>>>>>>>> table format / catalog and not how authorization is > implemented. The > >>>>>>>>>> existence of an OAuth endpoint pushes implementations to adopt > >>>>>>>>>> authorization using only OAuth, whereas the implementers might > choose > >>>>>>>>>> several other ways to implement authorization (e.g. SAML). In > our > >>>>>>>>>> opinion the spec should leave it open to the implementation to > decide > >>>>>>>>>> how authorization will be implemented. > >>>>>>>>>> * The existence of that endpoint also pushes operators of > Iceberg REST > >>>>>>>>>> endpoints into the authorization service business. > >>>>>>>>>> * Clients might expose their clear-text credentials to the wrong > >>>>>>>>>> service, if the (correct) OAuth endpoint is not configured > (humans do > >>>>>>>>>> make mistakes). > >>>>>>>>>> * (Naive) Iceberg REST servers may proxy requests received for > >>>>>>>>>> ‘/v1/oauth/tokens’ - and effectively become a > “man-in-the-middle”, which > >>>>>>>>>> is not fully compliant with the OAuth 2.0 specification. > >>>>>>>>>> > >>>>>>>>>> Our goals with this discussion are: > >>>>>>>>>> 1. Secure the Iceberg REST specification by preventing > accidental > >>>>>>>>>> misuse/misimplementation. > >>>>>>>>>> 2. Prevent that Iceberg REST to get into dictating the > “authorization > >>>>>>>>>> server specifics”. > >>>>>>>>>> 3. Enable flexibility for Iceberg REST servers to opt for other > >>>>>>>>>> authorization mechanisms than OAuth 2.0. > >>>>>>>>>> 4. Enable REST servers to opt for integrating with any standard > OAuth2 / > >>>>>>>>>> OIDC provider (e.g. Okta, Keycloak, Authelia). > >>>>>>>>>> > >>>>>>>>>> OAuth 2.0 [1] is one of the common standards accepted in the > industry. > >>>>>>>>>> It defines a secure mechanism to access resources (here: > Iceberg REST > >>>>>>>>>> endpoints). The most important aspect for OAuth 2.0 resources > is that > >>>>>>>>>> (Iceberg REST) servers do not (have to) support password > authentication, > >>>>>>>>>> especially considering the security weaknesses inherent in > passwords. > >>>>>>>>>> Compromised passwords result in compromised data protected by > that password. > >>>>>>>>>> > >>>>>>>>>> Therefore OAuth 2.0 defines a set of strict rules. Some of > these are: > >>>>>>>>>> * Credentials (for example username/password) must _never_ be > sent to > >>>>>>>>>> the resource server, only to the authorization server. > >>>>>>>>>> * OAuth 2.0 refresh tokens must _never_ be sent to the resource > server, > >>>>>>>>>> only to the authorization server. (“Unlike access tokens, > refresh tokens > >>>>>>>>>> are intended for use only with authorization servers and are > never sent > >>>>>>>>>> to resource servers.”, cite from section 1.5 of the OAuth RFC > 6749.) > >>>>>>>>>> * While the OAuth RFC states "The authorization server may be > the same > >>>>>>>>>> server as the resource server or a separate entity", this > should not be > >>>>>>>>>> mandated. i.e the spec should be open to supporting > implementations that > >>>>>>>>>> have the authorization server and resource server co-located as > well as > >>>>>>>>>> separate. > >>>>>>>>>> > >>>>>>>>>> The Iceberg PR 4771 [2] added the OpenAPI path > ‘/v1/oauth/tokens’, > >>>>>>>>>> intentionally marked to “To exchange client credentials (client > ID and > >>>>>>>>>> secret) for an access token. This uses the client credentials > flow.” > >>>>>>>>>> [3]. Technically: client ID and secret are submitted using a > HTTP POST > >>>>>>>>>> request to that Iceberg REST endpoint. > >>>>>>>>>> > >>>>>>>>>> Having ‘/v1/oauth/tokens’ in the Iceberg REST specification can > easily > >>>>>>>>>> be seen as a hard requirement. In order to implement this in > compliance > >>>>>>>>>> with the OAuth 2.0 spec, that ‘/v1/oauth/tokens’ MUST be the > >>>>>>>>>> authorization server. If users do not (want to) implement an > >>>>>>>>>> authorization server, the only way to implement this > ‘/v1/oauth/tokens’ > >>>>>>>>>> endpoint would be to proxy ‘/v1/oauth/tokens’ to the actual > >>>>>>>>>> authorization server, which means, that this proxy technically > becomes a > >>>>>>>>>> “man in the middle” - knowing both all credentials and all > involved tokens. > >>>>>>>>>> > >>>>>>>>>> Even if an Iceberg REST server does not implement the > ‘/v1/oauth/tokens’ > >>>>>>>>>> endpoint, it can still receive requests to ‘/v1/oauth/tokens’ > containing > >>>>>>>>>> clear text credentials, if clients are misconfigured (humans do > make > >>>>>>>>>> mistakes) - it’s a non-zero risk - bad actors can > implement/intercept > >>>>>>>>>> that ‘/v1/oauth/tokens’ endpoint and just wait for > misconfigured > >>>>>>>>>> clients to send credentials. > >>>>>>>>>> > >>>>>>>>>> Further usages of the REST Catalog API path ‘/v1/oauth/tokens’ > are “To > >>>>>>>>>> exchange a client token and an identity token for a more > specific access > >>>>>>>>>> token. This uses the token exchange flow.” and “To exchange an > access > >>>>>>>>>> token for one with the same claims and a refreshed expiration > period > >>>>>>>>>> This uses the token exchange flow.” Both usages should and can > be > >>>>>>>>>> implemented differently. > >>>>>>>>>> > >>>>>>>>>> Apache Iceberg, as a table format project, should recommend > protecting > >>>>>>>>>> sensitive information. But Iceberg should not mandate _how_ that > >>>>>>>>>> protection is implemented - but the Iceberg REST specification > does > >>>>>>>>>> effectively mandate OAuth 2.0, because other Iceberg REST > endpoints do > >>>>>>>>>> refer/require OAuth 2.0 specifics. Users that want to use other > >>>>>>>>>> mechanisms, because they are forced to do so by their > organization, > >>>>>>>>>> would be locked out of Iceberg REST. > >>>>>>>>>> > >>>>>>>>>> Apache Iceberg should not mandate OAuth 2.0 as the only option > - for the > >>>>>>>>>> sake of openness for the project and flexibility for the server > >>>>>>>>>> implementations. > >>>>>>>>>> > >>>>>>>>>> We think that Apache Iceberg REST Catalog spec should not > mandate that a > >>>>>>>>>> catalog implementation responds to requests to produce Auth > Tokens > >>>>>>>>>> (since the REST spec v1 defines a /v1/tokens endpoint, current > >>>>>>>>>> implementations have to take deliberate actions when responding > to those > >>>>>>>>>> requests, whether with successful token responses or with > “access > >>>>>>>>>> denied” or “unsupported” responses). > >>>>>>>>>> > >>>>>>>>>> We propose the following actions: > >>>>>>>>>> 1. Immediate mitigation: > >>>>>>>>>> 1.1. Remove the ‘/v1/oauth/tokens’ endpoint entirely from the > Iceberg’s > >>>>>>>>>> OpenAPI spec w/o replacement. > >>>>>>>>>> 1.2. As long as OAuth2 is the only mechanism supported by the > Iceberg > >>>>>>>>>> client, make the existing client parameter “oauth2-server-uri” > >>>>>>>>>> mandatory. The Iceberg REST catalog must fail to initialize if > the > >>>>>>>>>> “oauth2-server-uri” parameter is not defined. > >>>>>>>>>> 1.3. Remove all fallbacks to the ‘/v1/oauth/tokens’ endpoint. > >>>>>>>>>> 1.4. Forbid or discourage the communication of tokens from any > Iceberg > >>>>>>>>>> REST Catalog endpoint, both via the "token" property or with > any of the > >>>>>>>>>> "urn:ietf:params:oauth:token-type:*" properties. > >>>>>>>>>> 2. As a follow up: We’d propose a couple of implementation > fixes and > >>>>>>>>>> changes and test improvements. > >>>>>>>>>> 3. As a follow up: Define a discovery mechanism for both the > Iceberg > >>>>>>>>>> REST base URI and OAuth 2.0 endpoints/discovery, which allows > users to > >>>>>>>>>> use a single URI to securely access Iceberg REST endpoints. > >>>>>>>>>> 4. As a follow up: Not new, but we also want to improve the > Iceberg REST > >>>>>>>>>> specification via the “new” REST proposal. > >>>>>>>>>> > >>>>>>>>>> We do not think that adding recommendations to > inline-documentation is > >>>>>>>>>> enough to fully mitigate the above concerns. > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> References: > >>>>>>>>>> > >>>>>>>>>> [1] RFC 6749 - The OAuth 2.0 Authorization Framework, > >>>>>>>>>> https://datatracker.ietf.org/doc/html/rfc6749 > >>>>>>>>>> [2] Iceberg pull request 4771 - Core: Add OAuth2 to REST > catalog spec - > >>>>>>>>>> https://github.com/apache/iceberg/pull/4771 > >>>>>>>>>> [3] Iceberg pull request 4843 - Spec: Add more context about > OAuth2 to > >>>>>>>>>> the REST spec - https://github.com/apache/iceberg/pull/4843 > >>>>>>>>>> > >>>>>>>>>> -- > >>>>>>>>>> Robert Stupp > >>>>>>>>>> @snazy > >>>>>>>>>> >