Forgot the links:

[1] https://docs.databricks.com/aws/en/external-access/admin#grant-a-principal-unity-catalog-privileges
[2] https://docs.databricks.com/aws/en/dev-tools/auth/#databricks-authorization-methods
[3] https://docs.snowflake.com/en/user-guide/opencatalog/query-table-using-third-party-engine
[4] https://docs.snowflake.com/en/user-guide/opencatalog/configure-service-connection

On Tue, 9 Sept 2025 at 13:13, Christian Thiel <[email protected]> wrote:
> Dear all,
>
> Thanks for pushing this topic forward.
>
> Regarding @Dan’s first question on which flows to include: I am in favor of removing ROPC to keep the initial contribution streamlined. Similarly, I suggest we focus on a single Human-to-Machine (U2M) flow for now, preferably Device Code flow, in addition to Token Exchange. Device Code flow is simpler to implement than Authorization Code and doesn’t require opening a socket for server callbacks.
>
> I’ll contribute use case documentation to the original proposal this week.
>
> I want to take a strong stance on including U2M flows in Iceberg for the first time. Currently, it’s not possible to deploy a secure exploration or development environment for users, simply because we are missing proper authentication. This is a significant limitation, especially as permission management increasingly shifts to the catalog.
>
> Client Credential flow, like Username/Password and Basic Auth, is unsuitable for users. It lacks support for 2FA, which is standard for human authentication. Consequently, most identity providers do not allow client credentials to be issued for human identities. This forces users to create secondary machine identities for their query engines, which must then be granted the same permissions as their human identities. This duplication is not only cumbersome but also introduces security and governance challenges.
>
> We see this issue broadly in the Lakekeeper community, but it’s also reflected in the documentation of major platforms. If the point is clear, feel free to stop reading here.
> Otherwise, here are two real-world examples that illustrate the problem:
>
> - Databricks: [1] shows that OAuth and PATs are supported for authentication. However, since identity providers don’t support client credentials for users, PATs become the fallback. Yet just one link further, in [2], PATs are discouraged due to their inherent insecurity.
> - Snowflake: To connect a third-party query engine, Snowflake Open Catalog requires a service connection [3]. As documented in the service connection setup [4], this can only be done by administrators. This means that even users who already have access to data can't access it with external query engines. Instead, they must involve an admin for every query engine they wish to use. The admin must then duplicate permissions across both the human and machine identities, at least by allowing both identities to assume the same roles. The resulting client credentials are just as unsuitable for human authentication as PATs.
>
> It has been surprisingly quiet in this thread; I would be interested in opinions from the broader community on this topic as well, especially on the need for U2M flows.
>
> Regarding your second question @Dan, I am not deep enough into Java to have a well-founded opinion. I hope that stripping down the flows for now helps.
>
> Best,
>
> Christian
>
> On Tue, 2 Sept 2025 at 20:08, Daniel Weeks <[email protected]> wrote:
>> Hey Alex,
>>
>> Thanks for the update, and I'm glad to see the reduction in overall size.
>>
>> I feel like the discussion we had focused more on integrating and having a single OAuth2Manager solution, as opposed to the direction of Option 1 (separate module), which would create two different flavors of OAuth2. I took a quick look at the latest from the project and there are a lot of parallels between the proposal and what's currently in the baseline from an implementation perspective.
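Christian's message above prefers the Device Code flow because, unlike Authorization Code, the client never opens a local socket to receive a redirect: it only polls the token endpoint while the user approves the request in a browser, possibly on another device. The following is a minimal Java sketch of that polling loop as described in RFC 8628, not code from the donated AuthManager. The token endpoint is abstracted as a function from form parameters to a parsed response, and `DeviceCodePoller` and its `Sleeper` hook are hypothetical names introduced only for this illustration; a real client would POST the form over HTTPS and parse the JSON reply.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of the RFC 8628 (Device Authorization Grant) polling loop. */
final class DeviceCodePoller {

  static final String DEVICE_CODE_GRANT = "urn:ietf:params:oauth:grant-type:device_code";

  /** Hook around Thread.sleep so the loop can be exercised without real waiting. */
  interface Sleeper {
    void sleep(long seconds);
  }

  // Form parameters in, parsed token-endpoint response out (stand-in for an HTTPS POST).
  private final Function<Map<String, String>, Map<String, String>> tokenEndpoint;
  private final Sleeper sleeper;

  DeviceCodePoller(
      Function<Map<String, String>, Map<String, String>> tokenEndpoint, Sleeper sleeper) {
    this.tokenEndpoint = tokenEndpoint;
    this.sleeper = sleeper;
  }

  /**
   * Polls the token endpoint until the user approves the request, the device
   * code expires, or the server returns a terminal error.
   */
  String poll(String clientId, String deviceCode, long intervalSeconds, long expiresInSeconds) {
    Map<String, String> form = new HashMap<>();
    form.put("grant_type", DEVICE_CODE_GRANT);
    form.put("device_code", deviceCode);
    form.put("client_id", clientId);

    long interval = intervalSeconds;
    long elapsed = 0;
    // Only poll while the next attempt still falls within the device code's lifetime.
    while (elapsed + interval <= expiresInSeconds) {
      sleeper.sleep(interval);
      elapsed += interval;
      Map<String, String> response = tokenEndpoint.apply(form);
      if (response.containsKey("access_token")) {
        return response.get("access_token");
      }
      String error = response.getOrDefault("error", "unknown_error");
      if ("slow_down".equals(error)) {
        interval += 5; // RFC 8628, section 3.5: add 5 seconds for this and all later requests
      } else if (!"authorization_pending".equals(error)) {
        // access_denied, expired_token, etc. are terminal
        throw new IllegalStateException("Device authorization failed: " + error);
      }
      // authorization_pending: the user has not approved yet, so keep polling
    }
    throw new IllegalStateException("Device code expired before the user approved it");
  }
}
```

With a stubbed endpoint that answers `authorization_pending`, then `slow_down`, then a token, this poller sleeps 5, 5 and 10 seconds before returning the token, because `slow_down` permanently adds 5 seconds to the interval.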
>>
>> I feel like there are two things we need to address:
>>
>> 1) What flows do we want to support (e.g. ROPC is discouraged)? We probably also need to document the use cases (U2M vs. M2M flows).
>> 2) How can we go about integrating these new flows into the current implementation?
>>
>> I'm not sure I follow some of the items you have listed above (like the Spark/Flink runtimes), but maybe these are addressed by having the implementation included in core.
>>
>> We can follow up in the next sync as well,
>> -Dan
>>
>> On Mon, Sep 1, 2025 at 8:15 AM Alexandre Dutra <[email protected]> wrote:
>>> Hi all,
>>>
>>> Bumping this thread again with a few updates from the AuthManager project:
>>>
>>> The project recently adopted the Nimbus OIDC Java SDK. This decision was made after carefully weighing its pros and cons; we concluded it was the best choice for the project's continued growth.
>>>
>>> This move addresses two prior concerns about the donation:
>>>
>>> - The absence of a recognized OAuth2 library as the project's foundation.
>>> - The volume of code to be donated: Nimbus incidentally reduced the number of Java production classes by half (from about 90 to 45).
>>>
>>> With the above in mind, I've simplified my previous PR breakdown proposal to align with the current codebase. The updated version is as follows:
>>>
>>> 1. Project setup
>>> 2. OAuth2Manager
>>> 3. Token Exchange
>>> 4. Authorization Code
>>> 5. Device Code
>>> 6. Resource Owner Password
>>> 7. Integration & Stress tests
>>> 8. Spark & Flink runtimes
>>> 9. Advanced authentication
>>> 10. Documentation generator tool
>>>
>>> Note: the above plan assumes Option 1 from my previous email (donation within the Apache Iceberg repository, as a separate module).
>>>
>>> Let me know what you all think of this revised plan.
>>>
>>> Thanks,
>>> Alex
>>>
>>> On Thu, Jul 31, 2025 at 7:29 PM Alexandre Dutra <[email protected]> wrote:
>>> >
>>> > Hi all,
>>> >
>>> > Thanks for the productive discussion yesterday! Since there isn't a recording available yet, I wanted to summarize the key outcomes and next steps:
>>> >
>>> > Our main questions revolved around "where" and "what": where to host the donated code, and what features to accept.
>>> >
>>> > I believe we should start by focusing on the first question: the code's location. During our discussion, we explored a few options:
>>> >
>>> > 1) Within the Apache Iceberg repository, as a separate module
>>> > 2) In a new repository under Apache Iceberg governance
>>> > 3) In a new repository under Apache Polaris governance
>>> >
>>> > Each option has its pros and cons:
>>> >
>>> > - Option 1: This offers a better user experience, as it makes the new manager readily available for Spark and Flink, and simplifies its integration into the Iceberg connector for Trino. The main drawback is the increased maintenance burden.
>>> >
>>> > - Options 2 and 3: These options would require users to adjust their habits by adding more JARs or packages. Releases would also follow a separate cadence. However, the code would be better confined within its own repository, which facilitates maintenance.
>>> >
>>> > I have already voiced my preference for Option 1, but I don't have any strong objections to the others.
>>> >
>>> > I would love to hear the opinions of all those involved.
>>> >
>>> > Thanks,
>>> > Alex
>>> >
>>> > On Wed, Jul 30, 2025 at 2:41 PM Alexandre Dutra <[email protected]> wrote:
>>> > >
>>> > > Hi Ryan,
>>> > >
>>> > > Great idea! I will add this topic to the agenda today.
>>> > >
>>> > > I also prepared a proposal document to facilitate the discussion:
>>> > >
>>> > > https://docs.google.com/document/d/1ZcZ5VrXZZOgYllPI9-HTZt8986kBJTMQwFHT_-ASgj0/edit?usp=sharing
>>> > >
>>> > > Thanks,
>>> > > Alex
>>> > >
>>> > > On Wed, Jul 30, 2025 at 1:23 AM Ryan Blue <[email protected]> wrote:
>>> > > >
>>> > > > Hi Alex, I think it's a great idea to break down contributions like this into smaller PRs. It's probably good to discuss this at tomorrow's catalog sync to prioritize the functionality you want to add and figure out the best way to fit it in.
>>> > > >
>>> > > > On Tue, Jul 29, 2025 at 11:33 AM Alex Dutra <[email protected]> wrote:
>>> > > >>
>>> > > >> Dear Community,
>>> > > >>
>>> > > >> I would like to revive this discussion regarding the potential donation of Dremio's Auth Manager.
>>> > > >>
>>> > > >> Over the past few days, I have explored the suggestion of dividing the contribution into smaller parts. I am pleased to report that I have successfully broken down the features into approximately 15 pull requests, targeting the main Iceberg repository.
>>> > > >>
>>> > > >> While these pull requests are all rather substantial, I think that they remain within a manageable size for reviewers.
>>> > > >>
>>> > > >> Would this approach be a good path forward? If so, I can share more details about the timeline and roadmap I have in mind, and of course, I am prepared to begin the donation as soon as I have the Community's green light.
>>> > > >>
>>> > > >> Thanks,
>>> > > >> Alex Dutra
>>> > > >>
>>> > > >> On Wed, Jun 25, 2025 at 9:57 AM Alex Dutra <[email protected]> wrote:
>>> > > >>>
>>> > > >>> Hi Daniel, hi all,
>>> > > >>>
>>> > > >>> Sorry for the late reply.
>>> > > >>> Here are some answers to your questions:
>>> > > >>>
>>> > > >>> > I was under the impression that the AuthManager implementation was relatively small (based on the recent work for the GCP AuthManager)
>>> > > >>>
>>> > > >>> These are not comparable. The GCP AuthManager is small because it only works for GCP, and thus can leverage Google auth libraries (more specifically, it uses the google-auth-library-oauth2-http artifact; and since this artifact is already a required dependency for iceberg-gcp, it doesn't bring in any extra dependency).
>>> > > >>>
>>> > > >>> Conversely, this AuthManager is a general-purpose AuthManager that can work with any IDP.
>>> > > >>>
>>> > > >>> > The broader community wasn't involved in decisions made about the implementation
>>> > > >>>
>>> > > >>> That’s exactly the purpose of this donation.
>>> > > >>>
>>> > > >>> > "impersonation flow" which I'm not familiar with
>>> > > >>>
>>> > > >>> This is a feature where the manager can dynamically fetch the subject token for a token exchange, thus managing both the catalog's token and the user's token, facilitating impersonation (and delegation) use cases. Hence the name (admittedly a bit confusing). This feature is still evolving, but we received positive feedback from users and we believe it brings a lot of value, and it is not something that a third-party library could do.
>>> > > >>>
>>> > > >>> > we need to break it into smaller contributions and figure out the appropriate way to review and assimilate the functionality
>>> > > >>>
>>> > > >>> While we are open to this option, we are concerned about how long it could take to complete. In the interim, users have expressed a need for improved OAuth2 support.
>>> > > >>> Would it be possible to gain some clarity regarding the timeline for a review of this initiative? Perhaps an initial review of the current codebase could help identify and address any potential roadblocks? I can also schedule a demo of the new auth manager, if that helps.
>>> > > >>>
>>> > > >>> > how well the community understands the behaviors.
>>> > > >>>
>>> > > >>> While OAuth2 may not be familiar or palatable to most Iceberg contributors, I am confident that some of them possess the expertise to effectively review and assess the donation.
>>> > > >>>
>>> > > >>> > The main competency of this project isn't to implement security protocols
>>> > > >>>
>>> > > >>> This may be true for the GCP auth manager or for the SigV4 one: these are vendor-specific and can leverage the respective vendor's SDK. But how would we support OAuth2 in a generic way otherwise? Or Kerberos? Whether this is a competency of the project or not is debatable. Managing HTTP requests is not a main competency of this project either, and yet we have one RESTClient interface, one HTTPClient implementation, and lots of JSON parsers.
>>> > > >>>
>>> > > >>> The RESTClient in its current form already implies using some authentication protocol. The simple case of using static tokens (provided via configuration) does not cover real-world cases that users have expressed interest in. Accepting the Auth Manager will certainly require some extra attention to security protocols from Iceberg maintainers, but it will allow the project to support more advanced use cases. Additionally, the Auth Manager provides a path for users of the existing, deprecated “/token” endpoint to migrate to standard RFC-based OAuth flows.
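The "impersonation flow" Alex describes above builds on the OAuth 2.0 Token Exchange grant (RFC 8693). The sketch below only shows the shape of such a request body and is not taken from the donated code; `TokenExchangeForm` and `build` are hypothetical names, and it assumes the manager has already obtained the user's token (the subject token) by some other grant. In RFC 8693's model, also sending an `actor_token` (for example, the catalog client's own token) lets the authorization server issue a delegation token, while omitting it reads as impersonation; how strictly a given IdP honors that distinction varies.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper that builds an RFC 8693 token exchange request body. */
final class TokenExchangeForm {

  static final String GRANT_TYPE = "urn:ietf:params:oauth:grant-type:token-exchange";
  static final String ACCESS_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:access_token";

  /**
   * @param subjectToken token of the party being represented (e.g. the user)
   * @param actorToken optional token of the acting party (e.g. the catalog client); null to omit
   * @param scope optional space-delimited scopes; null to omit
   */
  static String build(String subjectToken, String actorToken, String scope) {
    Map<String, String> params = new LinkedHashMap<>(); // keeps parameter order stable
    params.put("grant_type", GRANT_TYPE);
    params.put("subject_token", subjectToken);
    params.put("subject_token_type", ACCESS_TOKEN_TYPE);
    if (actorToken != null) {
      // An actor token signals delegation rather than plain impersonation.
      params.put("actor_token", actorToken);
      params.put("actor_token_type", ACCESS_TOKEN_TYPE);
    }
    if (scope != null) {
      params.put("scope", scope);
    }
    // application/x-www-form-urlencoded encoding of every name and value
    StringJoiner body = new StringJoiner("&");
    params.forEach(
        (name, value) ->
            body.add(
                URLEncoder.encode(name, StandardCharsets.UTF_8)
                    + "="
                    + URLEncoder.encode(value, StandardCharsets.UTF_8)));
    return body.toString();
  }
}
```

The resulting string would be POSTed to the IdP's token endpoint with `Content-Type: application/x-www-form-urlencoded`, authenticated as the client.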
>>> > > >>>
>>> > > >>> > Was there any exploration of leveraging other standard implementations like Apache Oltu, Nimbus, etc. to build the implementation off of?
>>> > > >>>
>>> > > >>> Yes, we considered that and decided not to go down that route, for a few reasons:
>>> > > >>>
>>> > > >>> 1. Most OAuth libraries provide building blocks to create clients, but they are not fully fledged clients; you still need to write code in order to glue things together [1].
>>> > > >>>
>>> > > >>> 2. These libraries usually have (too?) many dependencies [2]; some of them have not been maintained for a while, and Apache Oltu is retired. In contrast, our Auth Manager only has one small dependency: auth0-jwt.
>>> > > >>>
>>> > > >>> 3. If you delegate to a third-party library, then you cannot share the catalog's RESTClient or Executor. The library is going to maintain its own HTTP client and executor, leading to increased resource consumption.
>>> > > >>>
>>> > > >>> 4. Nothing precludes us from switching to a third-party library later on (it's an implementation detail). We thought it best to start with a self-contained project.
>>> > > >>>
>>> > > >>> Thanks,
>>> > > >>> Alex
>>> > > >>>
>>> > > >>> [1]: https://connect2id.com/products/nimbus-oauth-openid-connect-sdk/guides/oauth-client-server-development
>>> > > >>> [2]: For Nimbus: https://central.sonatype.com/artifact/com.nimbusds/oauth2-oidc-sdk/11.26/dependencies
>>> > > >>>
>>> > > >>> On Thu, Jun 19, 2025 at 5:58 PM Daniel Weeks <[email protected]> wrote:
>>> > > >>> >
>>> > > >>> > I hadn't seen this thread before we discussed it yesterday, but since then I've taken a look and have some reservations.
>>> > > >>> >
>>> > > >>> > I was under the impression that the AuthManager implementation was relatively small (based on the recent work for the GCP AuthManager), but after taking a look at the repo, this is far from a small contribution.
>>> > > >>> >
>>> > > >>> > I strongly support more robust security support (especially for OAuth2/OIDC), but I don't feel this is going to be a small effort to introduce. The broader community wasn't involved in decisions made about the implementation, and I see elements that give me pause (like "impersonation flow" which I'm not familiar with and implementation details like extensions to immutables that aren't consistent with the broader codebase).
>>> > > >>> >
>>> > > >>> > If we decide that we want to take this on, I feel like we need to break it into smaller contributions and figure out the appropriate way to review and assimilate the functionality in a way that's consistent with the rest of the project. Because this is security related, we should take extra precautions around what this introduces and how well the community understands the behaviors.
>>> > > >>> >
>>> > > >>> > However, looking at the complexity here relative to the approach taken for the GCP AuthManager, I have to question whether this is the right path overall. The main competency of this project isn't to implement security protocols, so it's a lot to say we want a full and complete (possibly with extensions) native implementation of the OAuth2 specification (there are whole projects built around that alone).
>>> > > >>> >
>>> > > >>> > Was there any exploration of leveraging other standard implementations like Apache Oltu, Nimbus, etc. to build the implementation off of?
>>> > > >>> >
>>> > > >>> > -Dan
>>> > > >>> >
>>> > > >>> > On Thu, Jun 19, 2025 at 5:33 AM Alex Dutra <[email protected]> wrote:
>>> > > >>> >>
>>> > > >>> >> Hi Ryan & JB, hi all,
>>> > > >>> >>
>>> > > >>> >> I think it would be easier to introduce this new manager as an alternative manager. This would make the migration smoother, as it would give users time to migrate at their convenience. Besides, the new manager has the notion of "dialects", and can be configured to behave exactly like the current one (honoring the same config options), making the migration even easier.
>>> > > >>> >>
>>> > > >>> >> > Why not contribute the functionality directly to the AuthManager already in Iceberg? Is this incompatible or is there a reason the current one can't be extended through contributions?
>>> > > >>> >>
>>> > > >>> >> There are a few reasons why I believe it's not possible to extend the current manager indefinitely:
>>> > > >>> >>
>>> > > >>> >> 1. The current auth manager lives in iceberg-core; as we introduce more features, it will become impractical to keep it there, especially since some of the features will require third-party dependencies. As a data point: the new manager contains almost 100 Java production classes (not counting test classes and build scripts).
>>> > > >>> >> 2. The current auth manager has some well-known shortcomings, notably around token refreshes. It's not possible to fix them without introducing regressions and potentially breaking many catalog clients already in production.
>>> > > >>> >> 3. As we introduce features like Authorization Code grant support, interactions with the IDP will become more complex than just a request-response cycle.
>>> > > >>> >> Since most of the current logic resides in the OAuth2Util class, which is entirely public, it won't be an easy task to introduce support for such complex flows while avoiding binary incompatibilities.
>>> > > >>> >>
>>> > > >>> >> Thanks,
>>> > > >>> >> Alex
>>> > > >>> >>
>>> > > >>> >> On Wed, Jun 18, 2025 at 11:35 PM Jean-Baptiste Onofré <[email protected]> wrote:
>>> > > >>> >> >
>>> > > >>> >> > Hi
>>> > > >>> >> >
>>> > > >>> >> > I think it makes sense to add it directly to the AuthManager. I don't see blockers (with some adaptations). Alex?
>>> > > >>> >> >
>>> > > >>> >> > From a donation process standpoint (if accepted), I'm happy to help with the SGA and IP Clearance.
>>> > > >>> >> >
>>> > > >>> >> > Regards
>>> > > >>> >> > JB
>>> > > >>> >> >
>>> > > >>> >> > On Wed, Jun 18, 2025 at 9:15 PM Ryan Blue <[email protected]> wrote:
>>> > > >>> >> > >
>>> > > >>> >> > > I think it would be great to bring this functionality into Iceberg. I'm curious about your plan for getting it in. It sounds like you're suggesting adding the Dremio project to the Iceberg repo and making it optional. Why not contribute the functionality directly to the AuthManager already in Iceberg? Is this incompatible or is there a reason the current one can't be extended through contributions?
>>> > > >>> >> > >
>>> > > >>> >> > > On Tue, Jun 17, 2025 at 11:23 AM Christian Thiel <[email protected]> wrote:
>>> > > >>> >> > >>
>>> > > >>> >> > >> Hey Alex,
>>> > > >>> >> > >>
>>> > > >>> >> > >> Thanks for the initiative. I really appreciate the effort here!
>>> > > >>> >> > >>
>>> > > >>> >> > >> Having good auth compatibility in the Catalog ecosystem is key to establishing secure standards by making them easy to use.
>>> > > >>> >> > >> While Iceberg should stay open to other means of authentication, OAuth2 is the most widely adopted interoperable auth standard, and its role in Iceberg REST reflects that. But with human-centric flows like Auth Code (with PKCE 😉) and Device Code missing from most standard clients, users often default to handing out personal Client ID/secret pairs, which is really bad from a security perspective.
>>> > > >>> >> > >>
>>> > > >>> >> > >> While I can’t speak to the Java details, I fully support bringing the functionality into Iceberg. I have successfully tested the proposed code with Spark and different IdPs, including Auth Code & Device Code flows with token refresh, as well as token refresh for Client Credential flows.
>>> > > >>> >> > >>
>>> > > >>> >> > >> Thanks!
>>> > > >>> >> > >>
>>> > > >>> >> > >> Christian
>>> > > >>> >> > >>
>>> > > >>> >> > >> On Mon, 16 Jun 2025 at 20:33, Alex Dutra <[email protected]> wrote:
>>> > > >>> >> > >>>
>>> > > >>> >> > >>> Hi all,
>>> > > >>> >> > >>>
>>> > > >>> >> > >>> Dremio recently open-sourced a new implementation of the Auth Manager API for OAuth2:
>>> > > >>> >> > >>>
>>> > > >>> >> > >>> https://github.com/dremio/iceberg-auth-manager
>>> > > >>> >> > >>>
>>> > > >>> >> > >>> I wrote a blog post about it a while ago [1].
>>> > > >>> >> > >>>
>>> > > >>> >> > >>> Built on top of the Auth Manager API introduced in Iceberg 1.9.0, this project provides a more flexible and extensible OAuth2 manager compared to the built-in equivalent in Iceberg Core. It follows OAuth2 standards strictly, but also provides compatibility with any existing Apache Iceberg REST catalog, and contains no Dremio-specific functionality.
>>> > > >>> >> > >>> To date, this is the only OAuth2 manager fully compliant with external identity providers.
>>> > > >>> >> > >>>
>>> > > >>> >> > >>> Dremio would like to contribute this code to the Apache Iceberg project. I am therefore initiating this discussion to determine the community's interest in accepting this donation.
>>> > > >>> >> > >>>
>>> > > >>> >> > >>> This project is beneficial to the community because it addresses well-known limitations, such as token refresh problems [2][3][4], and also because it introduces highly anticipated features like Authorization Code grant support [5]. Fixing these limitations or adding support for such large features in the built-in manager, while avoiding any risk of regressions, would have been a lot harder.
>>> > > >>> >> > >>>
>>> > > >>> >> > >>> Also worth mentioning: this project adheres to the "Iceberg OAuth2 Client Authentication Guide", proposed by Christian Thiel [6].
>>> > > >>> >> > >>>
>>> > > >>> >> > >>> This project could initially serve as a runtime-selectable alternative to the current built-in implementation. Upon reaching sufficient maturity, however, it could potentially replace the existing manager.
>>> > > >>> >> > >>>
>>> > > >>> >> > >>> Please share your thoughts by replying to this email. Alternatively, we can discuss this topic at the Catalog Sync meeting this Wednesday, June 18th, if that is a more comfortable option for everyone.
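Christian's message above mentions Authorization Code "with PKCE". For readers unfamiliar with it, PKCE (RFC 7636) binds the authorization request to the later token request: the client keeps a random `code_verifier` secret and sends only its SHA-256 hash as the `code_challenge`. The following is a minimal Java sketch of the S256 method using only JDK classes; the `Pkce` class and its method names are hypothetical and not part of either manager.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.util.Base64;

/** Hypothetical helper implementing the RFC 7636 "S256" code challenge method. */
final class Pkce {

  private static final SecureRandom RANDOM = new SecureRandom();
  private static final Base64.Encoder BASE64URL = Base64.getUrlEncoder().withoutPadding();

  /** 32 random bytes encode to a 43-character base64url verifier (RFC 7636 allows 43 to 128). */
  static String newCodeVerifier() {
    byte[] bytes = new byte[32];
    RANDOM.nextBytes(bytes);
    return BASE64URL.encodeToString(bytes);
  }

  /** code_challenge = BASE64URL(SHA-256(ASCII(code_verifier))). */
  static String s256Challenge(String codeVerifier) {
    try {
      byte[] digest =
          MessageDigest.getInstance("SHA-256")
              .digest(codeVerifier.getBytes(StandardCharsets.US_ASCII));
      return BASE64URL.encodeToString(digest);
    } catch (NoSuchAlgorithmException e) {
      // Every Java platform is required to support SHA-256, so this should not happen.
      throw new IllegalStateException("SHA-256 not available", e);
    }
  }
}
```

The challenge goes into the authorization request together with `code_challenge_method=S256`; the verifier is sent only in the final token request, so an intercepted authorization code is useless without it. The test vector from RFC 7636 Appendix B (verifier `dBjftJeZ4CVP-mB92K27uhbUJU1p1r_wW1gFWFOEjXk`, challenge `E9Melhoa2OwvFrEMTJguCHaoeK1t8URWbuGJSstw-cM`) can be used to check an implementation.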
>>> > > >>> >> > >>>
>>> > > >>> >> > >>> Thanks,
>>> > > >>> >> > >>>
>>> > > >>> >> > >>> Alex
>>> > > >>> >> > >>>
>>> > > >>> >> > >>> [1]: https://medium.com/data-engineering-with-dremio/introducing-dremio-auth-manager-for-apache-iceberg-223827342d19
>>> > > >>> >> > >>> [2]: https://github.com/apache/iceberg/issues/12196
>>> > > >>> >> > >>> [3]: https://github.com/apache/iceberg/issues/12363
>>> > > >>> >> > >>> [4]: https://github.com/apache/iceberg/issues/13030
>>> > > >>> >> > >>> [5]: https://github.com/apache/iceberg/issues/10677
>>> > > >>> >> > >>> [6]: https://docs.google.com/document/d/1buW9PCNoHPeP7Br5_vZRTU-_3TExwLx6bs075gi94xc/edit?tab=t.0#heading=h.hufqidg1ij89
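Several messages in this thread cite token refresh problems in the built-in manager without going into detail. As a generic illustration only, and not a description of how either manager actually schedules refreshes, a client commonly refreshes proactively, some safety window before the access token expires, with a floor on the delay so that a short-lived or already-expired token does not cause a tight retry loop. `RefreshScheduler` and its parameters are hypothetical names for this sketch.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper: how long to wait before proactively refreshing a token. */
final class RefreshScheduler {

  /**
   * Returns the remaining lifetime of a token expiring at {@code expiresAt}, minus a
   * safety window, but never less than {@code minDelay}.
   */
  static Duration delayUntilRefresh(
      Instant now, Instant expiresAt, Duration safetyWindow, Duration minDelay) {
    Duration delay = Duration.between(now, expiresAt).minus(safetyWindow);
    // Short-lived or already-expired tokens yield a zero or negative delay;
    // clamping avoids hammering the token endpoint in a tight loop.
    return delay.compareTo(minDelay) < 0 ? minDelay : delay;
  }
}
```

For a token valid for one hour, a 60-second window gives a refresh after 59 minutes. The delay could then be handed to a shared `ScheduledExecutorService` via `schedule(task, delay.toMillis(), TimeUnit.MILLISECONDS)`, which is the kind of executor reuse Alex's point 3 about third-party libraries refers to.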
