Summary of the REST-sync meeting yesterday, comments welcome:


There were no objections to the PR for M1 (deprecation of the endpoint in the spec). The only comment was to deprecate the oauth schema types as well. This has been integrated in the PR.


A concern was raised that it's convenient for users who generate code wrt oauth. However, the types and endpoints are defined by the oauth specs and depend on the oauth (and possibly other) specs. Consensus in the sync was to remove everything that's oauth specific.


A concern about backwards-compatibility (when the tokens-endpoint is removed from the spec) was raised. The path proposed in the improvement-doc is that clients can and should configure the currently existing endpoint explicitly. If a service wants the current tokens-endpoint, it can still be used.




On 28.06.24 20:44, Robert Stupp wrote:

It's been a while since the discussion started. There were no objections to the proposal, so I went ahead and started to implement "M1" of the document [1].


[1] PR for M1: https://github.com/apache/iceberg/pull/10603


Robert



On 19.06.24 15:11, Robert Stupp wrote:

Alright, I've just created the issue: https://github.com/apache/iceberg/issues/10537

The proposal document is here: https://docs.google.com/document/d/1Xi5MRk8WdBWFC3N_eSmVcrLhk3yu5nJ9x_wC0ec6kVQ/

Robert


On 29.05.24 22:01, Robert Stupp wrote:

Jack's proposal is pretty much along what we've been thinking of.

We can help with OAuth 2 client implementations for Java that support bearer, client-credentials, password as well as authorization code and device code flows (see docs [1] and implementation [2]. All implementations are built via NessieAutnenticationProvider [3] implementations. I'm sure we can figure out a way to use the code both in Nessie and Iceberg (it's about 7k lines of code just for oauth2 including tests). Technically those are all HTTP "request/response interceptors".

For 1), I went ahead and opened a draft PR [4]

Regarding the "backwards compatibility", I think it is better to be explicit and always require the "oauth-server-uri" parameter. We can make the implementation deal with relative URIs, which would be resolved against the given Iceberg-REST base "uri" parameter value. This prevents Iceberg-REST users from accidentally exposing their client-ID & client-secret to Iceberg-REST service providers.

IMO we should also make sure that whatever is implemented works against major OAuth2 providers/servers, for example Keycloak, Authelia, Authentik, Okta, and have integration tests to validate the implementations.

Also +1 on having a "REST catalog authentication spec". We can list all known auth mechanisms there, including recommended config option keys and possible values. However, implementations may differ across languages (Java, Python, Rust, Go), so that might only serve as a guideline/recommendation, not as a strict requirement. This can also serve as documentation/reference of what each client (language/version) supports and how.

Robert


[1] https://projectnessie.org/nessie-latest/client_config/#oauth2-settings

[2] https://github.com/projectnessie/nessie/tree/main/api/client/src/main/java/org/projectnessie/client/auth/oauth2

[3] https://github.com/projectnessie/nessie/blob/main/api/client/src/main/java/org/projectnessie/client/auth/NessieAuthenticationProvider.java

[4] https://github.com/apache/iceberg/pull/10398

On 29.05.24 20:03, Jack Ye wrote:
Just to reiterate my points discussed in the community sync here: the more I think about it the more I agree the OAuth endpoint *should be removed from the REST spec*. Even though the endpoint is optional, and even if we do not care about the security concerns, it still provides users an impression that the endpoint "should" be implemented, or "is the preferred authentication mechanism". And as we have found out, the server capability proposal does not cover this case since this is the first endpoint to hit before the GetConfig endpoint.

As Ryan said, if we want to do that we need an alternative plan. I don't have anything concrete, but here is my line of thought:

1. remove OAuth2 endpoint from the "REST OpenAPI spec"

2. create a client-side interface (in each language) that different authentication mechanisms can be plugged in to talk to the REST catalog

3. refactor and make OAuth2 an implementation of that interface. I can also help with doing the same for AWS Sigv4, and the community can further support some additional ones like Kerberos, SAML, Google SSO, etc. based on the individual use cases.

4. turn 2 + 3 into a "REST catalog authentication spec" that documents about all the supported authentication mechanisms and their defaults. For OAuth2, the default is to have the auth server at the same endpoint as the resource server for backwards compatibility, but that is a configurable property, and we could recommend not to do that based on security concerns.

Best,
Jack Ye

On Wed, May 29, 2024 at 10:28 AM Steven Wu <stevenz...@gmail.com> wrote:

    Wondering if the auth endpoints can be separated out to a
    separate OpenAPI spec file. Then we still have some reference
    for interactions with auth server and make it clear it is not
    required as part of the REST catalog server. In most enterprise
    environments, auth server is likely a separate server.

    On Tue, May 28, 2024 at 1:25 PM Alex Dutra
    <alex.du...@dremio.com.invalid> wrote:

        Hi,

            On point 4, isn't that possible today, Can't that be
            achieved with the current token exchange approach, and
            the internal implementation of the endpoint?


        Unfortunately, no. Token exchange is not widely adopted
        yet: for example, Keycloak has only partial support for it,
        and Authelia, or Authentik, have no support for it at all.

        This, and a few other technical issues with the current
        internals of the REST client, makes it nearly impossible to
        achieve a good integration of Iceberg REST with the
        majority of popular OSS authorization servers.

        I am planning to start another email thread to discuss
        these practicalities, but let's first reach consensus on
        the broader security issues voiced here, before we tackle
        the details.

        Thanks,

        Alex Dutra

        On Tue, May 28, 2024 at 8:41 PM Amogh Jahagirdar
        <am...@tabular.io> wrote:

            I disagree with removing "/v1/oauth/tokens" and I think
            I also disagree with the premise that implementing that
            endpoint is required, but I can understand how that's
            not clear in the spec. I think we can address the
            required vs non-required discussion with the
            capabilities PR.
            <https://github.com/apache/iceberg/pull/9940>

            It seems like another part of what's driving this
            discussion is some concern around how do we enforce
            REST catalog implementations which do implement this
            endpoint to make sure that the implementation is secure
            (for example to avoid the MITM example that was brought
            up). This is ultimately a runtime detail. To me it
            seems like if we make it clear that such an endpoint
            should be implemented respecting OAuth2 standards, and
            we know that OAuth2 compliance requires avoiding that
            MITM situation, then runtime implementations should
            just follow the spec there

            >3. Enable flexibility for Iceberg REST servers to opt
            for other
            authorization mechanisms than OAuth 2.0.
            >4. Enable REST servers to opt for integrating with any
            standard OAuth2 /
            OIDC provider (e.g. Okta, Keycloak, Authelia).

            I agree with both of these points; again I don't think
            the intention is Oauth2 is the only way, but I think
            the capabilities PR will make that even more clear.
            On point 4, isn't that possible today, Can't that be
            achieved with the current token exchange approach, and
            the internal implementation of the endpoint? Sorry if I
            missed that explanation.

            Thanks,

            Amogh Jahagirdar

            On Tue, May 28, 2024 at 11:13 AM Yufei Gu
            <flyrain...@gmail.com> wrote:

                Not an expert on authentication, but reading from
                the context, I agree that it’s not a good practice
                to use a resource server as a token server. The
                resource server would need to securely handle and
                store credentials or tokens, increasing the risk of
                credential theft or leakage. Making the token
                endpoint optional will mitigate the issue a bit.
                But if we want to disable it completely, it's
                better to do it now to prevent any issues and
                migration costs in the future. Can we have a
                consensus on it?


                I would prefer to deprecate it to prevent any
                intentional and unintentional misuse. We will also
                need to change the clients since it connects to the
                endpoint by default.


                Yufei


                On Tue, May 28, 2024 at 8:27 AM Jack Ye
                <yezhao...@gmail.com> wrote:

                    Sounds like we should try to finalize a
                    consensus around
                    https://github.com/apache/iceberg/pull/9940, so
                    that we make it very clear what APIs/features
                    are optional.

                    -Jack

                    On Tue, May 28, 2024 at 7:25 AM Fokko
                    Driesprong <fo...@apache.org> wrote:

                        Hey Robert,

                        Sorry for the late reply as I was out last
                        week. I'm not an OAuth guru either, but
                        some context from my end.

                            * Credentials (for example
                            username/password) must _never_ be sent to
                            the resource server, only to the
                            authorization server.


                        In an earlier discussion
                        <https://github.com/apache/iceberg/pull/8976>,
                        it was agreed that the resource server can
                        also function as the authorization server.
                        But the roles can also be separate.

                            1.2. As long as OAuth2 is the only
                            mechanism supported by the Iceberg
                            client, make the existing client
                            parameter “oauth2-server-uri”
                            mandatory. The Iceberg REST catalog
                            must fail to initialize if the
                            “oauth2-server-uri” parameter is not
                            defined.

                        It can also be that there is no
                        authentication in the case of an internal
                        REST catalog. For example, the
                        iceberg-rest-image
                        <https://github.com/tabular-io/iceberg-rest-image>
                        that we use for integration tests in PyIceberg.

                            We think that Apache Iceberg REST
                            Catalog spec should not mandate that a
                            catalog implementation responds to
                            requests to produce Auth Tokens
                            (since the REST spec v1 defines a
                            /v1/tokens endpoint, current
                            implementations have to take deliberate
                            actions when responding to those
                            requests, whether with successful token
                            responses or with “access
                            denied” or “unsupported” responses).

                        The `/v1/tokens` endpoint is optional
                        
<https://github.com/apache/iceberg-python/blob/756ae625a2ea0f9c12df78430512ce991f6a1976/pyiceberg/catalog/rest.py#L488-L489>.

                            * Credentials (for example
                            username/password) must _never_ be sent to
                            the resource server, only to the
authorization server.

                        I fully agree!

                            Even if an Iceberg REST server does not
                            implement the ‘/v1/oauth/tokens’
                            endpoint, it can still receive requests
                            to ‘/v1/oauth/tokens’ containing
                            clear text credentials, if clients are
                            misconfigured (humans do make
                            mistakes) - it’s a non-zero risk - bad
                            actors can implement/intercept
                            that  ‘/v1/oauth/tokens’ endpoint and
                            just wait for misconfigured
                            clients to send credentials.


                        I think the wording is chosen badly. It
                        should not send any credentials, but the
                        code (as in this example
                        
<https://developers.google.com/identity/protocols/oauth2#installed> by
                        GCS).

                            I think Jack makes a good point with
                            AWS SigV4 Authentication. I suppose, in
                            REST Catalog implementations that
                            support that auth method, the
                            /v1/oauth/token Catalog REST endpoint
                            is redundant.


                        There are other cloud providers next to AWS.

                        Kind regards,
                        Fokko



                        Op do 23 mei 2024 om 15:49 schreef Dmitri
                        Bourlatchkov
                        <dmitri.bourlatch...@dremio.com.invalid>:

                            I think Jack makes a good point with
                            AWS SigV4 Authentication. I suppose, in
                            REST Catalog implementations that
                            support that auth method, the
                            /v1/oauth/token Catalog REST endpoint
                            is redundant.

                            Cheers,
                            Dmitri.

                            On Thu, May 23, 2024 at 9:20 AM Jack Ye
                            <yezhao...@gmail.com> wrote:

                                I do not know enough details about
                                OAuth to make comments about this
                                issue, but just regarding the
                                statement "OAuth2 is the only
                                mechanism supported by the Iceberg
                                client", AWS Sigv4 auth is also
                                supported, at least in the Java
                                client implementation
                                
<https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/rest/HTTPClient.java#L72>.
                                It would be nice if we formalize
                                that in the spec, at least define
                                it as a generic authorization header.

                                Best,
                                Jack Ye



                                On Thu, May 23, 2024 at 2:51 AM
                                Robert Stupp <sn...@snazy.de> wrote:

                                    Hi all,

                                    Iceberg REST implementations,
                                    either accessible on the public
                                    internet
                                    or inside an organization, are
                                    usually being secured using
                                    appropriate
                                    authorization mechanisms. The
                                    Nessie team is looking at
                                    implementing the
                                    Iceberg REST specification and
                                    have some questions around the
                                    security
                                    endpoint(s) defined in the spec.

                                    TL;DR we have questions
                                    (potentially concerns) about
                                    having the
                                    ‘/v1/oauth/tokens’ endpoint,
                                    for the reasons explained
                                    below. We think
                                    that ‘/v1/oauth/tokens’ poses
                                    potential security and OAuth2
                                    compliance
                                    issues, and imposes how
                                    authorization should be
                                    implemented.
                                    * As an open table format, it
                                    would be good for Iceberg to
                                    focus on the
                                    table format / catalog and not
                                    how authorization is
                                    implemented. The
                                    existence of an OAuth endpoint
                                    pushes implementations to adopt
                                    authorization using only OAuth,
                                    whereas the implementers might
                                    choose
                                    several other ways to implement
                                    authorization (e.g. SAML). In our
                                    opinion the spec should leave
                                    it open to the implementation
                                    to decide
                                    how authorization will be
                                    implemented.
                                    * The existence of that
                                    endpoint also pushes operators
                                    of Iceberg REST
                                    endpoints into the
                                    authorization service business.
                                    * Clients might expose their
                                    clear-text credentials to the
                                    wrong
                                    service, if the (correct) OAuth
                                    endpoint is not configured
                                    (humans do
                                    make mistakes).
                                    * (Naive) Iceberg REST servers
                                    may proxy requests received for
                                    ‘/v1/oauth/tokens’ - and
                                    effectively become a
                                    “man-in-the-middle”, which
                                    is not fully compliant with the
                                    OAuth 2.0 specification.

                                    Our goals with this discussion are:
                                    1. Secure the Iceberg REST
                                    specification by preventing
                                    accidental
                                    misuse/misimplementation.
                                    2. Prevent that Iceberg REST to
                                    get into dictating the
                                    “authorization
                                    server specifics”.
                                    3. Enable flexibility for
                                    Iceberg REST servers to opt for
                                    other
                                    authorization mechanisms than
                                    OAuth 2.0.
                                    4. Enable REST servers to opt
                                    for integrating with any
                                    standard OAuth2 /
                                    OIDC provider (e.g. Okta,
                                    Keycloak, Authelia).

                                    OAuth 2.0 [1] is one of the
                                    common standards accepted in
                                    the industry.
                                    It defines a secure mechanism
                                    to access resources (here:
                                    Iceberg REST
                                    endpoints). The most important
                                    aspect for OAuth 2.0 resources
                                    is that
                                    (Iceberg REST) servers do not
                                    (have to) support password
                                    authentication,
                                    especially considering the
                                    security weaknesses inherent in
                                    passwords.
                                    Compromised passwords result in
                                    compromised data protected by
                                    that password.

                                    Therefore OAuth 2.0 defines a
                                    set of strict rules. Some of
                                    these are:
                                    * Credentials (for example
                                    username/password) must _never_
                                    be sent to
                                    the resource server, only to
                                    the authorization server.
                                    * OAuth 2.0 refresh tokens must
                                    _never_ be sent to the resource
                                    server,
                                    only to the authorization
                                    server. (“Unlike access tokens,
                                    refresh tokens
                                    are intended for use only with
                                    authorization servers and are
                                    never sent
                                    to resource servers.”, cite
                                    from section 1.5 of the OAuth
                                    RFC 6749.)
                                    * While the OAuth RFC states
                                    "The authorization server may
                                    be the same
                                    server as the resource server
                                    or a separate entity", this
                                    should not be
                                    mandated. i.e the spec should
                                    be open to supporting
                                    implementations that
                                    have the authorization server
                                    and resource server co-located
                                    as well as
                                    separate.

                                    The Iceberg PR 4771 [2] added
                                    the OpenAPI path
                                    ‘/v1/oauth/tokens’,
                                    intentionally marked to “To
                                    exchange client credentials
                                    (client ID and
                                    secret) for an access token.
                                    This uses the client
                                    credentials flow.”
                                    [3]. Technically: client ID and
                                    secret are submitted using a
                                    HTTP POST
                                    request to that Iceberg REST
                                    endpoint.

                                    Having ‘/v1/oauth/tokens’ in
                                    the Iceberg REST specification
                                    can easily
                                    be seen as a hard requirement.
                                    In order to implement this in
                                    compliance
                                    with the OAuth 2.0 spec, that
                                    ‘/v1/oauth/tokens’ MUST be the
                                    authorization server. If users
                                    do not (want to) implement an
                                    authorization server, the only
                                    way to implement this
                                    ‘/v1/oauth/tokens’
                                    endpoint would be to proxy
                                    ‘/v1/oauth/tokens’ to the actual
                                    authorization server, which
                                    means, that this proxy
                                    technically becomes a
                                    “man in the middle” - knowing
                                    both all credentials and all
                                    involved tokens.

                                    Even if an Iceberg REST server
                                    does not implement the
                                    ‘/v1/oauth/tokens’
                                    endpoint, it can still receive
                                    requests to ‘/v1/oauth/tokens’
                                    containing
                                    clear text credentials, if
                                    clients are misconfigured
                                    (humans do make
                                    mistakes) - it’s a non-zero
                                    risk - bad actors can
                                    implement/intercept
                                    that ‘/v1/oauth/tokens’
                                    endpoint and just wait for
                                    misconfigured
                                    clients to send credentials.

                                    Further usages of the REST
                                    Catalog API path
                                    ‘/v1/oauth/tokens’ are “To
                                    exchange a client token and an
                                    identity token for a more
                                    specific access
                                    token. This uses the token
                                    exchange flow.” and “To
                                    exchange an access
                                    token for one with the same
                                    claims and a refreshed
                                    expiration period
                                    This uses the token exchange
                                    flow.” Both usages should and
                                    can be
                                    implemented differently.

                                    Apache Iceberg, as a table
                                    format project, should
                                    recommend protecting
                                    sensitive information. But
                                    Iceberg should not mandate
                                    _how_ that
                                    protection is implemented - but
                                    the Iceberg REST specification
                                    does
                                    effectively mandate OAuth 2.0,
                                    because other Iceberg REST
                                    endpoints do
                                    refer/require OAuth 2.0
                                    specifics. Users that want to
                                    use other
                                    mechanisms, because they are
                                    forced to do so by their
                                    organization,
                                    would be locked out of Iceberg
                                    REST.

                                    Apache Iceberg should not
                                    mandate OAuth 2.0 as the only
                                    option - for the
                                    sake of openness for the
                                    project and flexibility for the
                                    server
                                    implementations.

                                    We think that Apache Iceberg
                                    REST Catalog spec should not
                                    mandate that a
                                    catalog implementation responds
                                    to requests to produce Auth Tokens
                                    (since the REST spec v1 defines
                                    a /v1/tokens endpoint, current
                                    implementations have to take
                                    deliberate actions when
                                    responding to those
                                    requests, whether with
                                    successful token responses or
                                    with “access
                                    denied” or “unsupported”
                                    responses).

                                    We propose the following actions:
                                    1. Immediate mitigation:
                                    1.1. Remove the
                                    ‘/v1/oauth/tokens’ endpoint
                                    entirely from the Iceberg’s
                                    OpenAPI spec w/o replacement.
                                    1.2. As long as OAuth2 is the
                                    only mechanism supported by the
                                    Iceberg
                                    client, make the existing
                                    client parameter
                                    “oauth2-server-uri”
                                    mandatory. The Iceberg REST
                                    catalog must fail to initialize
                                    if the
                                    “oauth2-server-uri” parameter
                                    is not defined.
                                    1.3. Remove all fallbacks to
                                    the ‘/v1/oauth/tokens’ endpoint.
                                    1.4. Forbid or discourage the
                                    communication of tokens from
                                    any Iceberg
                                    REST Catalog endpoint, both via
                                    the "token" property or with
                                    any of the
                                    "urn:ietf:params:oauth:token-type:*"
                                    properties.
                                    2. As a follow up: We’d propose
                                    a couple of implementation
                                    fixes and
                                    changes and test improvements.
                                    3. As a follow up: Define a
                                    discovery mechanism for both
                                    the Iceberg
                                    REST base URI and OAuth 2.0
                                    endpoints/discovery, which
                                    allows users to
                                    use a single URI to securely
                                    access Iceberg REST endpoints.
                                    4. As a follow up: Not new, but
                                    we also want to improve the
                                    Iceberg REST
                                    specification via the “new”
                                    REST proposal.

                                    We do not think that adding
                                    recommendations to
                                    inline-documentation is
                                    enough to fully mitigate the
                                    above concerns.


                                    References:

                                    [1] RFC 6749 - The OAuth 2.0
                                    Authorization Framework,
                                    
https://datatracker.ietf.org/doc/html/rfc6749
                                    [2] Iceberg pull request 4771 -
                                    Core: Add OAuth2 to REST
                                    catalog spec -
                                    https://github.com/apache/iceberg/pull/4771
                                    [3] Iceberg pull request 4843 -
                                    Spec: Add more context about
                                    OAuth2 to
                                    the REST spec -
                                    https://github.com/apache/iceberg/pull/4843

-- Robert Stupp
                                    @snazy

--
Robert Stupp
@snazy
--
Robert Stupp
@snazy
--
Robert Stupp
@snazy

--
Robert Stupp
@snazy

Reply via email to