To reiterate the points I made in the community sync: the more I think
about it, the more I agree that the OAuth endpoint *should be removed from
the REST spec*. Even though the endpoint is optional, and even if we set
the security concerns aside, it still gives users the impression that the
endpoint "should" be implemented, or that it "is the preferred
authentication mechanism". And as we have found out, the server capability
proposal does not cover this case, because this is the first endpoint a
client hits, before the GetConfig endpoint.

As Ryan said, if we want to do that, we need an alternative plan. I don't
have anything concrete, but here is my line of thought:

1. Remove the OAuth2 endpoint from the REST OpenAPI spec.

2. Create a client-side interface (in each language) into which different
authentication mechanisms can be plugged to talk to the REST catalog.

3. Refactor OAuth2 into an implementation of that interface. I can also
help do the same for AWS SigV4, and the community can add further
mechanisms such as Kerberos, SAML, or Google SSO based on individual use
cases.

4. Turn 2 + 3 into a "REST catalog authentication spec" that documents all
the supported authentication mechanisms and their defaults. For OAuth2, the
default is to have the auth server at the same endpoint as the resource
server for backwards compatibility, but that is a configurable property, and
we could recommend against it because of the security concerns.
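As a rough illustration of points 2 to 4, a client-side interface might look like the Python sketch below. It is only a sketch under stated assumptions: `AuthProvider`, `NoAuth`, `OAuth2Provider`, `PROVIDERS`, `create_auth_provider`, and the `auth.type` property are hypothetical names, not an existing PyIceberg or Iceberg API. The `uri`, `credential` (assumed to be `client_id:client_secret`), and `oauth2-server-uri` property names are assumed to mirror the existing REST catalog client properties. The HTTP call that actually fetches the token is deliberately left out.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Mapping, Optional


class AuthProvider(ABC):
    """Hypothetical pluggable interface: one implementation per auth mechanism."""

    @abstractmethod
    def headers(self) -> Dict[str, str]:
        """Return the headers to attach to every REST catalog request."""


class NoAuth(AuthProvider):
    """No authentication, e.g. for an internal catalog used in tests."""

    def headers(self) -> Dict[str, str]:
        return {}


class OAuth2Provider(AuthProvider):
    """OAuth2 client credentials flow; the token HTTP request itself is omitted."""

    def __init__(self, props: Mapping[str, str]) -> None:
        # Assumed "client_id:client_secret" format for the credential property.
        self.client_id, _, self.client_secret = props["credential"].partition(":")
        base = props["uri"].rstrip("/")
        # Backwards-compatible default: auth server co-located with the catalog.
        # Setting "oauth2-server-uri" points the client at a separate auth server.
        self.token_endpoint = props.get("oauth2-server-uri", base + "/v1/oauth/tokens")
        self.access_token: Optional[str] = None

    def token_request_form(self) -> Dict[str, str]:
        """Form fields to POST to token_endpoint (RFC 6749, sections 4.4.2 and 2.3.1)."""
        return {
            "grant_type": "client_credentials",
            "client_id": self.client_id,
            "client_secret": self.client_secret,
        }

    def headers(self) -> Dict[str, str]:
        if self.access_token is None:
            raise RuntimeError("no access token yet; POST token_request_form() first")
        return {"Authorization": "Bearer " + self.access_token}


# Hypothetical registry keyed by "auth.type"; SigV4, Kerberos, etc. would add entries.
PROVIDERS: Dict[str, Callable[[Mapping[str, str]], AuthProvider]] = {
    "none": lambda props: NoAuth(),
    "oauth2": OAuth2Provider,
}


def create_auth_provider(props: Mapping[str, str]) -> AuthProvider:
    # Default to OAuth2 only when a credential is configured, otherwise no auth.
    default = "oauth2" if "credential" in props else "none"
    auth_type = props.get("auth.type", default)
    factory = PROVIDERS.get(auth_type)
    if factory is None:
        raise ValueError(f"unsupported auth.type: {auth_type}")
    return factory(props)
```

With only `uri` and `credential` set, the sketch keeps today's co-located `/v1/oauth/tokens` endpoint as a default. Setting `oauth2-server-uri` sends credentials to a separate authorization server instead. Other mechanisms would register their own factories in the same table rather than going through the catalog's token endpoint.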

Best,
Jack Ye

On Wed, May 29, 2024 at 10:28 AM Steven Wu <stevenz...@gmail.com> wrote:

> I wonder if the auth endpoints could be moved into a separate OpenAPI
> spec file. Then we would still have a reference for interactions with the
> auth server, while making it clear that it is not a required part of the
> REST catalog server. In most enterprise environments, the auth server is
> likely a separate server.
>
> On Tue, May 28, 2024 at 1:25 PM Alex Dutra <alex.du...@dremio.com.invalid>
> wrote:
>
>> Hi,
>>
>>
>>> On point 4, isn't that possible today? Can't that be achieved with the
>>> current token exchange approach and the internal implementation of the
>>> endpoint?
>>
>>
>> Unfortunately, no. Token exchange is not widely adopted yet: for example,
>> Keycloak has only partial support for it, and Authelia and Authentik have
>> no support for it at all.
>>
>> This and a few other technical issues with the current internals of the
>> REST client make it nearly impossible to achieve a good integration of
>> Iceberg REST with the majority of popular OSS authorization servers.
>>
>> I am planning to start another email thread to discuss these
>> practicalities, but let's first reach consensus on the broader security
>> issues voiced here, before we tackle the details.
>>
>> Thanks,
>>
>> Alex Dutra
>>
>> On Tue, May 28, 2024 at 8:41 PM Amogh Jahagirdar <am...@tabular.io>
>> wrote:
>>
>>> I disagree with removing "/v1/oauth/tokens", and I also disagree with
>>> the premise that implementing that endpoint is required, though I can
>>> understand why that is not clear in the spec. I think we can address the
>>> required vs. non-required question with the capabilities PR.
>>> <https://github.com/apache/iceberg/pull/9940>
>>>
>>> Another part of what's driving this discussion seems to be concern
>>> about how to make sure that REST catalog implementations which do
>>> implement this endpoint do so securely (for example, avoiding the MITM
>>> scenario that was brought up). This is ultimately a runtime detail. To me,
>>> if we make it clear that such an endpoint should be implemented according
>>> to the OAuth2 standards, and we know that OAuth2 compliance requires
>>> avoiding that MITM situation, then runtime implementations should simply
>>> follow the spec there.
>>>
>>> > 3. Enable flexibility for Iceberg REST servers to opt for authorization
>>> > mechanisms other than OAuth 2.0.
>>> > 4. Enable REST servers to opt for integrating with any standard OAuth2 /
>>> > OIDC provider (e.g. Okta, Keycloak, Authelia).
>>>
>>> I agree with both of these points; again, I don't think the intention is
>>> that OAuth2 is the only way, and the capabilities PR should make that even
>>> clearer.
>>> On point 4, isn't that possible today? Can't that be achieved with the
>>> current token exchange approach and the internal implementation of the
>>> endpoint? Sorry if I missed that explanation.
>>>
>>> Thanks,
>>>
>>> Amogh Jahagirdar
>>>
>>> On Tue, May 28, 2024 at 11:13 AM Yufei Gu <flyrain...@gmail.com> wrote:
>>>
>>>> I'm not an expert on authentication, but from the context I agree that
>>>> it's not good practice to use a resource server as a token server. The
>>>> resource server would need to securely handle and store credentials or
>>>> tokens, increasing the risk of credential theft or leakage. Making the
>>>> token endpoint optional mitigates the issue somewhat. But if we want to
>>>> disable it completely, it's better to do it now to avoid problems and
>>>> migration costs in the future. Can we reach consensus on this?
>>>>
>>>>
>>>> I would prefer to deprecate it to prevent both intentional and
>>>> unintentional misuse. We will also need to change the clients, since they
>>>> connect to the endpoint by default.
>>>>
>>>>
>>>> Yufei
>>>>
>>>>
>>>> On Tue, May 28, 2024 at 8:27 AM Jack Ye <yezhao...@gmail.com> wrote:
>>>>
>>>>> It sounds like we should try to reach consensus on
>>>>> https://github.com/apache/iceberg/pull/9940, so that it is very clear
>>>>> which APIs/features are optional.
>>>>>
>>>>> -Jack
>>>>>
>>>>> On Tue, May 28, 2024 at 7:25 AM Fokko Driesprong <fo...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Hey Robert,
>>>>>>
>>>>>> Sorry for the late reply; I was out last week. I'm not an OAuth guru
>>>>>> either, but here is some context from my end.
>>>>>>
>>>>>>> * Credentials (for example username/password) must _never_ be sent to
>>>>>>> the resource server, only to the authorization server.
>>>>>>
>>>>>>
>>>>>> In an earlier discussion
>>>>>> <https://github.com/apache/iceberg/pull/8976>, it was agreed that
>>>>>> the resource server can also function as the authorization server. But
>>>>>> the roles can also be separate.
>>>>>>
>>>>>>> 1.2. As long as OAuth2 is the only mechanism supported by the Iceberg
>>>>>>> client, make the existing client parameter “oauth2-server-uri”
>>>>>>> mandatory. The Iceberg REST catalog must fail to initialize if the
>>>>>>> “oauth2-server-uri” parameter is not defined.
>>>>>>
>>>>>>
>>>>>> It can also be that there is no authentication at all, as with an
>>>>>> internal REST catalog. One example is the iceberg-rest-image
>>>>>> <https://github.com/tabular-io/iceberg-rest-image> that we use for
>>>>>> integration tests in PyIceberg.
>>>>>>
>>>>>>> We think that the Apache Iceberg REST Catalog spec should not mandate
>>>>>>> that a catalog implementation responds to requests to produce Auth
>>>>>>> Tokens (since the REST spec v1 defines a /v1/tokens endpoint, current
>>>>>>> implementations have to take deliberate actions when responding to
>>>>>>> those requests, whether with successful token responses or with
>>>>>>> “access denied” or “unsupported” responses).
>>>>>>
>>>>>> The `/v1/tokens` endpoint is optional
>>>>>> <https://github.com/apache/iceberg-python/blob/756ae625a2ea0f9c12df78430512ce991f6a1976/pyiceberg/catalog/rest.py#L488-L489>.
>>>>>>
>>>>>>> * Credentials (for example username/password) must _never_ be sent to
>>>>>>> the resource server, only to the authorization server.
>>>>>>
>>>>>>
>>>>>> I fully agree!
>>>>>>
>>>>>>> Even if an Iceberg REST server does not implement the
>>>>>>> ‘/v1/oauth/tokens’ endpoint, it can still receive requests to
>>>>>>> ‘/v1/oauth/tokens’ containing clear-text credentials if clients are
>>>>>>> misconfigured (humans do make mistakes). This is a non-zero risk: bad
>>>>>>> actors can implement or intercept that ‘/v1/oauth/tokens’ endpoint and
>>>>>>> just wait for misconfigured clients to send credentials.
>>>>>>
>>>>>>
>>>>>> I think the wording was poorly chosen. The client should not send any
>>>>>> credentials, but rather the authorization code (as in this example
>>>>>> <https://developers.google.com/identity/protocols/oauth2#installed> by
>>>>>> Google).
>>>>>>
>>>>>>> I think Jack makes a good point about AWS SigV4 Authentication. I
>>>>>>> suppose that in REST Catalog implementations that support that auth
>>>>>>> method, the /v1/oauth/tokens Catalog REST endpoint is redundant.
>>>>>>>
>>>>>>
>>>>>> There are other cloud providers besides AWS.
>>>>>>
>>>>>> Kind regards,
>>>>>> Fokko
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, May 23, 2024 at 15:49 Dmitri Bourlatchkov
>>>>>> <dmitri.bourlatch...@dremio.com.invalid> wrote:
>>>>>>
>>>>>>> I think Jack makes a good point about AWS SigV4 Authentication. I
>>>>>>> suppose that in REST Catalog implementations that support that auth
>>>>>>> method, the /v1/oauth/tokens Catalog REST endpoint is redundant.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Dmitri.
>>>>>>>
>>>>>>> On Thu, May 23, 2024 at 9:20 AM Jack Ye <yezhao...@gmail.com> wrote:
>>>>>>>
>>>>>>>> I don't know enough about OAuth to comment on this issue, but
>>>>>>>> regarding the statement "OAuth2 is the only mechanism supported by the
>>>>>>>> Iceberg client": AWS SigV4 auth is also supported, at least in the Java
>>>>>>>> client implementation
>>>>>>>> <https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/rest/HTTPClient.java#L72>.
>>>>>>>> It would be nice to formalize that in the spec, or at least define it
>>>>>>>> as a generic authorization header.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Jack Ye
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, May 23, 2024 at 2:51 AM Robert Stupp <sn...@snazy.de>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> Iceberg REST implementations, whether accessible on the public
>>>>>>>>> internet or inside an organization, are usually secured using
>>>>>>>>> appropriate authorization mechanisms. The Nessie team is looking at
>>>>>>>>> implementing the Iceberg REST specification and has some questions
>>>>>>>>> about the security endpoint(s) defined in the spec.
>>>>>>>>>
>>>>>>>>> TL;DR: we have questions (potentially concerns) about the
>>>>>>>>> ‘/v1/oauth/tokens’ endpoint, for the reasons explained below. We
>>>>>>>>> think that ‘/v1/oauth/tokens’ poses potential security and OAuth2
>>>>>>>>> compliance issues, and dictates how authorization should be
>>>>>>>>> implemented.
>>>>>>>>> * As an open table format, Iceberg should focus on the table format /
>>>>>>>>> catalog and not on how authorization is implemented. The existence of
>>>>>>>>> an OAuth endpoint pushes implementations to adopt OAuth as their only
>>>>>>>>> authorization mechanism, whereas implementers might choose one of
>>>>>>>>> several other ways to implement authorization (e.g. SAML). In our
>>>>>>>>> opinion, the spec should leave it to the implementation to decide
>>>>>>>>> how authorization is implemented.
>>>>>>>>> * The existence of that endpoint also pushes operators of Iceberg
>>>>>>>>> REST
>>>>>>>>> endpoints into the authorization service business.
>>>>>>>>> * Clients might expose their clear-text credentials to the wrong
>>>>>>>>> service, if the (correct) OAuth endpoint is not configured (humans
>>>>>>>>> do
>>>>>>>>> make mistakes).
>>>>>>>>> * (Naive) Iceberg REST servers may proxy requests received for
>>>>>>>>> ‘/v1/oauth/tokens’ and thereby effectively become a
>>>>>>>>> “man-in-the-middle”, which is not fully compliant with the OAuth 2.0
>>>>>>>>> specification.
>>>>>>>>>
>>>>>>>>> Our goals with this discussion are:
>>>>>>>>> 1. Secure the Iceberg REST specification by preventing accidental
>>>>>>>>> misuse/misimplementation.
>>>>>>>>> 2. Prevent Iceberg REST from dictating “authorization server
>>>>>>>>> specifics”.
>>>>>>>>> 3. Enable flexibility for Iceberg REST servers to opt for
>>>>>>>>> authorization mechanisms other than OAuth 2.0.
>>>>>>>>> 4. Enable REST servers to opt for integrating with any standard
>>>>>>>>> OAuth2 / OIDC provider (e.g. Okta, Keycloak, Authelia).
>>>>>>>>>
>>>>>>>>> OAuth 2.0 [1] is one of the common standards accepted in the
>>>>>>>>> industry. It defines a secure mechanism to access resources (here:
>>>>>>>>> Iceberg REST endpoints). The most important aspect for OAuth 2.0
>>>>>>>>> resources is that the (Iceberg REST) servers do not have to support
>>>>>>>>> password authentication, especially considering the security
>>>>>>>>> weaknesses inherent in passwords: a compromised password means
>>>>>>>>> compromised data protected by that password.
>>>>>>>>>
>>>>>>>>> Therefore, OAuth 2.0 defines a set of strict rules. Some of these are:
>>>>>>>>> * Credentials (for example username/password) must _never_ be sent
>>>>>>>>> to the resource server, only to the authorization server.
>>>>>>>>> * OAuth 2.0 refresh tokens must _never_ be sent to the resource
>>>>>>>>> server, only to the authorization server. (“Unlike access tokens,
>>>>>>>>> refresh tokens are intended for use only with authorization servers
>>>>>>>>> and are never sent to resource servers.”, quoted from section 1.5 of
>>>>>>>>> RFC 6749.)
>>>>>>>>> * While the OAuth RFC states "The authorization server may be the
>>>>>>>>> same server as the resource server or a separate entity", this
>>>>>>>>> should not be mandated, i.e. the spec should be open to supporting
>>>>>>>>> implementations that have the authorization server and resource
>>>>>>>>> server either co-located or separate.
>>>>>>>>>
>>>>>>>>> Iceberg PR 4771 [2] added the OpenAPI path ‘/v1/oauth/tokens’,
>>>>>>>>> explicitly documented as “To exchange client credentials (client ID
>>>>>>>>> and secret) for an access token. This uses the client credentials
>>>>>>>>> flow.” [3]. Technically, the client ID and secret are submitted in an
>>>>>>>>> HTTP POST request to that Iceberg REST endpoint.
>>>>>>>>>
>>>>>>>>> Having ‘/v1/oauth/tokens’ in the Iceberg REST specification can
>>>>>>>>> easily be seen as a hard requirement. To implement it in compliance
>>>>>>>>> with the OAuth 2.0 spec, the server behind ‘/v1/oauth/tokens’ MUST be
>>>>>>>>> the authorization server. If users do not (want to) implement an
>>>>>>>>> authorization server, the only way to implement the
>>>>>>>>> ‘/v1/oauth/tokens’ endpoint would be to proxy it to the actual
>>>>>>>>> authorization server, which means that this proxy technically
>>>>>>>>> becomes a “man in the middle”, knowing all credentials and all
>>>>>>>>> involved tokens.
>>>>>>>>>
>>>>>>>>> Even if an Iceberg REST server does not implement the
>>>>>>>>> ‘/v1/oauth/tokens’ endpoint, it can still receive requests to
>>>>>>>>> ‘/v1/oauth/tokens’ containing clear-text credentials if clients are
>>>>>>>>> misconfigured (humans do make mistakes). This is a non-zero risk: bad
>>>>>>>>> actors can implement or intercept that ‘/v1/oauth/tokens’ endpoint and
>>>>>>>>> just wait for misconfigured clients to send credentials.
>>>>>>>>>
>>>>>>>>> Other uses of the REST Catalog API path ‘/v1/oauth/tokens’ are
>>>>>>>>> “To exchange a client token and an identity token for a more
>>>>>>>>> specific access token. This uses the token exchange flow.” and “To
>>>>>>>>> exchange an access token for one with the same claims and a
>>>>>>>>> refreshed expiration period. This uses the token exchange flow.”
>>>>>>>>> Both usages can and should be implemented differently.
>>>>>>>>>
>>>>>>>>> Apache Iceberg, as a table format project, should recommend
>>>>>>>>> protecting sensitive information, but it should not mandate _how_
>>>>>>>>> that protection is implemented. Yet the Iceberg REST specification
>>>>>>>>> does effectively mandate OAuth 2.0, because other Iceberg REST
>>>>>>>>> endpoints refer to or require OAuth 2.0 specifics. Users who want to
>>>>>>>>> use other mechanisms, because their organization forces them to,
>>>>>>>>> would be locked out of Iceberg REST.
>>>>>>>>>
>>>>>>>>> Apache Iceberg should not mandate OAuth 2.0 as the only option, for
>>>>>>>>> the sake of the project's openness and the flexibility of server
>>>>>>>>> implementations.
>>>>>>>>>
>>>>>>>>> We think that the Apache Iceberg REST Catalog spec should not
>>>>>>>>> mandate that a catalog implementation responds to requests to
>>>>>>>>> produce Auth Tokens (since the REST spec v1 defines a /v1/tokens
>>>>>>>>> endpoint, current implementations have to take deliberate actions
>>>>>>>>> when responding to those requests, whether with successful token
>>>>>>>>> responses or with “access denied” or “unsupported” responses).
>>>>>>>>>
>>>>>>>>> We propose the following actions:
>>>>>>>>> 1. Immediate mitigation:
>>>>>>>>> 1.1. Remove the ‘/v1/oauth/tokens’ endpoint entirely from Iceberg’s
>>>>>>>>> OpenAPI spec without replacement.
>>>>>>>>> 1.2. As long as OAuth2 is the only mechanism supported by the
>>>>>>>>> Iceberg client, make the existing client parameter
>>>>>>>>> “oauth2-server-uri” mandatory. The Iceberg REST catalog must fail to
>>>>>>>>> initialize if the “oauth2-server-uri” parameter is not defined.
>>>>>>>>> 1.3. Remove all fallbacks to the ‘/v1/oauth/tokens’ endpoint.
>>>>>>>>> 1.4. Forbid or discourage the communication of tokens from any
>>>>>>>>> Iceberg REST Catalog endpoint, whether via the "token" property or
>>>>>>>>> via any of the "urn:ietf:params:oauth:token-type:*" properties.
>>>>>>>>> 2. As a follow-up: we would propose a few implementation fixes and
>>>>>>>>> changes, as well as test improvements.
>>>>>>>>> 3. As a follow-up: define a discovery mechanism for both the
>>>>>>>>> Iceberg REST base URI and OAuth 2.0 endpoints/discovery, which
>>>>>>>>> allows users to use a single URI to securely access Iceberg REST
>>>>>>>>> endpoints.
>>>>>>>>> 4. As a follow-up: not new, but we also want to improve the Iceberg
>>>>>>>>> REST specification via the “new” REST proposal.
>>>>>>>>>
>>>>>>>>> We do not think that adding recommendations to the inline
>>>>>>>>> documentation is enough to fully mitigate the above concerns.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> References:
>>>>>>>>>
>>>>>>>>> [1] RFC 6749 - The OAuth 2.0 Authorization Framework -
>>>>>>>>> https://datatracker.ietf.org/doc/html/rfc6749
>>>>>>>>> [2] Iceberg pull request 4771 - Core: Add OAuth2 to REST catalog
>>>>>>>>> spec - https://github.com/apache/iceberg/pull/4771
>>>>>>>>> [3] Iceberg pull request 4843 - Spec: Add more context about OAuth2
>>>>>>>>> to the REST spec - https://github.com/apache/iceberg/pull/4843
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Robert Stupp
>>>>>>>>> @snazy
>>>>>>>>>
>>>>>>>>>
