I’m okay with moving forward using the catalog-only planning mode, but I’d still like a clean, official way to export or deregister a table from the catalog. I don’t want to block progress here, so I’ll open a separate thread soon to discuss this as a potential new feature.
Thanks,
Peter

On Fri, Feb 6, 2026, 23:43 Prashant Singh <[email protected]> wrote:

> Thank you for the feedback, everyone. We discussed this topic in the
> community sync [1], and I wanted to share a couple of things we discussed:
> 1/ Keep only the client and catalog modes for now. We can park the
> preference mode, since a catalog can always send back a 429 if it is under
> load (I have updated the PR accordingly [2]).
> 2/ We discussed the benefits of scan planning, i.e. sharing / cross-format
> interop / governance, along with being able to support clients with a low
> memory budget, and walked through some existing products that are based on
> or used early versions of remote scan planning, such as SageMaker
> Lakehouse [3]. We also discussed how read-only tables are handled as of
> today.
> 3/ We brought up ways to export the table to another catalog and
> touched upon:
> a. using the metadata.json in the loadTable response and using that in the
> register table call
> b. having a dedicated endpoint for export
>
> Please let us know whether we are fine to proceed or need more discussion,
> as we ran short on time while discussing 3 (this topic alone took ~40
> minutes). Happy to facilitate that.
>
> [1] https://docs.google.com/document/d/1iPGVCIcr-M0XtAiudOguWAvmqIdVgpYN5vz5ohO8PKw/edit?tab=t.0#bookmark=id.ishstxjeoz1z
> [2] https://github.com/apache/iceberg/pull/14867
> [3] https://docs.aws.amazon.com/sagemaker-lakehouse-architecture/latest/userguide/rms-integration.html
>
> Best,
> Prashant Singh
>
> On Wed, Feb 4, 2026 at 1:19 AM Péter Váry <[email protected]> wrote:
>
>> > I'm a little concerned about using the REST spec as a means to force
>> portability on implementations. I feel that level of requirement could
>> result in a reluctance to provide interoperability which would limit access
>> to data or normalize non-compliance with the spec.
Ultimately, I feel user >> demand will drive the goals of openness and portability, which is a trend >> we see across the ecosystem and continues to drive interest in open formats >> and standards. >> >> If we feel strongly about this, we could define the deregister/export >> operation as an optional endpoint. Catalog implementations could choose >> whether to support it, allowing users to make informed decisions based on >> feature availability when selecting a catalog. Once the feature becomes >> broadly adopted, we could move the endpoint into the set of required table >> endpoints. >> >> >> Daniel Weeks <[email protected]> ezt írta (időpont: 2026. jan. 28., Sze, >> 20:38): >> >>> I think there's good reason to consider a "deregister" or "export" like >>> functionality given that there isn't a clear path to hand off ownership of >>> a table between catalogs. This is a slightly different motivation for >>> similar functionality, but shares the same underlying goal of improving >>> portability. >>> >>> Even without this, there are ways to capture the metadata (e.g. persist >>> the json response and use that as the metadata reference for registering), >>> so I don't think the absence of a physical json file is really a blocker. >>> We originally wanted to preserve the physical representation to both adhere >>> to the spec language regarding how commits are effected and to ensure >>> access for older clients that do not support the REST Catalog. At this >>> point, REST support is nearing ubiquity and the metadata representation is >>> still available in some form (though less convenient for direct file >>> reference). >>> >>> I'm a little concerned about using the REST spec as a means to force >>> portability on implementations. I feel that level of requirement could >>> result in a reluctance to provide interoperability which would limit access >>> to data or normalize non-compliance with the spec. 
Ultimately, I feel user >>> demand will drive the goals of openness and portability, which is a trend >>> we see across the ecosystem and continues to drive interest in open formats >>> and standards. >>> >>> -Dan >>> >>> On Wed, Jan 28, 2026 at 7:55 AM Russell Spitzer < >>> [email protected]> wrote: >>> >>>> Prior to the introduction of CATALOG_ONLY tables, reading a table >>>>> implicitly required that the full table metadata be accessible to readers. >>>>> This made it possible to migrate a table between catalog implementations >>>>> by >>>>> simply pointing */v1/{prefix}/namespaces/{namespace}/register* to the >>>>> existing metadata.json, assuming the appropriate user privileges were in >>>>> place. >>>> >>>> >>>> This actually hasn’t been the case for quite a while across several >>>> vendors (though not the one I work at — we still expose full metadata). >>>> There’s nothing preventing, and in fact several vendors are already, >>>> shipping Iceberg metadata that does not strictly represent the table. >>>> Properties, snapshots, or even the table itself can redirect to another >>>> representation of the same table, leaving no way to recover a true “ground >>>> truth” view via the REST API. I’m also aware of folks shipping different >>>> versions of the metadata or exposing what is essentially a read-only >>>> metadata.json layered on top of a table in another format. So I think >>>> the ship has largely sailed on relying on metadata as a guaranteed >>>> canonical view. >>>> >>>> I do think it’s still important to preserve *portability*, or at least >>>> to make it clear to end users whether or not their tables will be portable. >>>> With that in mind, I was wondering if we should introduce an explicit >>>> catalog export command that is essentially the inverse of register. >>>> Unlike loadTable, it would be required to produce the path of a >>>> metadata.json that represents the entire Iceberg table without >>>> modification. 
>>>> >>>> That would give catalogs a clear way to signal whether they support >>>> “unregistering” a table in a way that lets it be used in another system. We >>>> could also scope permissions for this functionality so that only specific >>>> users are allowed to perform an export. >>>> >>>> >>>> >>>> On Wed, Jan 28, 2026 at 5:42 AM Péter Váry <[email protected]> >>>> wrote: >>>> >>>>> > I am not sure about the concern for lock-in. Users are free to adopt >>>>> any catalog that is spec compliant. Catalog-only tables are not the >>>>> choices >>>>> of the catalog vendor/provider, it is the choice of the table owner by >>>>> users for access control. >>>>> >>>>> Prior to the introduction of CATALOG_ONLY tables, reading a table >>>>> implicitly required that the full table metadata be accessible to readers. >>>>> This made it possible to migrate a table between catalog implementations >>>>> by >>>>> simply pointing */v1/{prefix}/namespaces/{namespace}/register* to the >>>>> existing metadata.json, assuming the appropriate user privileges were in >>>>> place. >>>>> >>>>> With CATALOG_ONLY tables, this implicit requirement is removed, and no >>>>> alternative requirement is introduced. As a result, migrating the complete >>>>> history of a table may become impossible without performing a manual >>>>> traversal of the plan(s) and metadata. >>>>> >>>>> What I am suggesting is that the ability to re‑register an Iceberg >>>>> table with a different catalog should be an explicit requirement for a >>>>> spec‑compliant catalog. >>>>> >>>>> > Also this proposal doesn't say that the write path shouldn't produce >>>>> the metadata.json file, which is still required today to be spec >>>>> compliant. >>>>> >>>>> The Iceberg table specification describes metadata.json and manifest >>>>> files, but after this change a catalog could be fully compliant with the >>>>> Iceberg REST Catalog specification while still not exposing these files in >>>>> a way that is accessible to users. 
This would effectively prevent use >>>>> cases >>>>> such as migrating tables between catalogs. >>>>> >>>>> >>>>> Steven Wu <[email protected]> ezt írta (időpont: 2026. jan. 26., >>>>> H, 20:33): >>>>> >>>>>> catching up on this thread. >>>>>> >>>>>> I am not sure about the concern for lock-in. Users are free to adopt >>>>>> any catalog that is spec compliant. Catalog-only tables are not the >>>>>> choices >>>>>> of the catalog vendor/provider, it is the choice of the table owner by >>>>>> users for access control. >>>>>> >>>>>> Also this proposal doesn't say that the write path shouldn't produce >>>>>> the metadata.json file, which is still required today to be spec >>>>>> compliant. >>>>>> It is just that clients may not need to load the metadata.json (and >>>>>> manifest list, manifest files) directly for client-side scan planning. >>>>>> >>>>>> I also like Dan's suggestion of not including client >>>>>> preference/config in the spec. >>>>>> >>>>>> > I want to highlight that introducing "CATALOG_ONLY" planners >>>>>> implicitly creates a new requirement for all compliant engines. Without >>>>>> support for this, engines would be unable to read these new tables. This >>>>>> seems like a significant change that we should call out explicitly. >>>>>> >>>>>> Agree with Peter that this is a significant new requirement for >>>>>> engines. Iceberg libraries (Java or other languages) can probably hide it >>>>>> internally in the scan planning implementation. Some engines may not use >>>>>> Iceberg libraries. This would be a new requirement. 
>>>>>> >>>>>> >>>>>> >>>>>> On Tue, Jan 20, 2026 at 4:55 PM Prashant Singh < >>>>>> [email protected]> wrote: >>>>>> >>>>>>> Thank you Peter, I will go ahead and find a slot that works for most >>>>>>> of the folks interested in the discussion and put it in dev calendar ~ >>>>>>> >>>>>>> Regarding Agenda : I would request to keep the discussion contained >>>>>>> in context of what does this mean to have a mode of planning like >>>>>>> catalog_only its use cases >>>>>>> and side effects, for example READ only tables is something that can >>>>>>> be done as of today, infacts folks use this in production, for example: >>>>>>> tools such as Apache Xtable (incubating) or Uniform where one generates >>>>>>> iceberg metadata on top of >>>>>>> existing data files, having CATALOG_ONLY doesn't change much except >>>>>>> the fact that now that fake metadata doesn't need to be written, but it >>>>>>> was >>>>>>> fake in the first place as an iceberg client didn't generate it and >>>>>>> catalog >>>>>>> is already fully capable of doing that. >>>>>>> >>>>>>> With that being said, I will definitely put all your suggestions on >>>>>>> the agenda, let's discuss this more in depth, to understand the feedback >>>>>>> better. I also wanna include the types of mode discussion. Maybe we >>>>>>> should >>>>>>> just keep client_only and catalog_only for now ? since preference is too >>>>>>> much for the first phase ? >>>>>>> >>>>>>> Please let me circle back with concrete time, meeting links etc, i >>>>>>> will post it here ! >>>>>>> >>>>>>> Best, >>>>>>> Prashant Singh >>>>>>> >>>>>>> On Sat, Jan 17, 2026 at 11:28 PM Péter Váry < >>>>>>> [email protected]> wrote: >>>>>>> >>>>>>>> Hi Prashant, >>>>>>>> >>>>>>>> I agree that having a dedicated sync makes a lot of sense. I’d >>>>>>>> suggest the following agenda items: >>>>>>>> >>>>>>>> 1. 
*Read-only tables* >>>>>>>> During the early discussions around the File Format API, I >>>>>>>> suggested starting with the read path, as this would allow us to >>>>>>>> integrate >>>>>>>> new data sources more quickly. At the time, there were strong >>>>>>>> objections, >>>>>>>> with the argument that every Iceberg table should be fully readable and >>>>>>>> writable through Iceberg in order to be considered a “real” Iceberg >>>>>>>> table. >>>>>>>> I’m interested to understand whether this position has changed since >>>>>>>> then. >>>>>>>> >>>>>>>> 2. *Table migration* >>>>>>>> I see clear benefits in generating table metadata on the fly (e.g., >>>>>>>> easier integration with fast-changing systems, stricter security >>>>>>>> models, >>>>>>>> and potential performance gains). My concern is that, if we allow this >>>>>>>> without constraints, a fully compliant Iceberg catalog could choose >>>>>>>> not to >>>>>>>> materialize metadata at all. This would make migration to another >>>>>>>> compliant >>>>>>>> Iceberg catalog much harder. Openness and easy migration are major >>>>>>>> selling >>>>>>>> points of Iceberg, and I think we should continue to enforce those >>>>>>>> values. >>>>>>>> >>>>>>>> 3. *Engine compatibility* >>>>>>>> I want to highlight that introducing "CATALOG_ONLY" planners >>>>>>>> implicitly creates a new requirement for all compliant engines. Without >>>>>>>> support for this, engines would be unable to read these new tables. >>>>>>>> This >>>>>>>> seems like a significant change that we should call out explicitly. >>>>>>>> >>>>>>>> 4. *CATALOG_ONLY tables* >>>>>>>> If we reach agreement on the points above, I think the decision on >>>>>>>> this topic will naturally follow. >>>>>>>> >>>>>>>> My current perspective on these topics: >>>>>>>> >>>>>>>> 1. *Read-only tables* >>>>>>>> I like this idea, as it would allow Iceberg catalogs to more easily >>>>>>>> expose external databases such as Delta, Lance, and others. 
My main >>>>>>>> hesitation is that I’ve proposed this before and it was strongly >>>>>>>> rejected >>>>>>>> by the community. >>>>>>>> >>>>>>>> 2. *Table migration* >>>>>>>> My concern is that we may be taking incremental steps away from >>>>>>>> Iceberg’s original position of full compliance, easy migration, and >>>>>>>> broad >>>>>>>> compatibility, toward a more closed, catalog-bounded model. I’d like >>>>>>>> us to >>>>>>>> step back and clearly define our core values, then enforce them in the >>>>>>>> specification. This could be as simple as a few sentences in the >>>>>>>> "LoadTableResponse" description requiring a way (for some users) to >>>>>>>> obtain >>>>>>>> the full metadata JSON along with the corresponding manifest and data >>>>>>>> files, or perhaps a dedicated migration endpoint that allows one >>>>>>>> catalog to >>>>>>>> take over a table from another. >>>>>>>> >>>>>>>> 3. *Engine compatibility* >>>>>>>> I have the sense that this “small” enum change actually introduces >>>>>>>> a fairly large new requirement for engines, and I want to make sure we >>>>>>>> explicitly highlight that. >>>>>>>> >>>>>>>> 4. *CATALOG_ONLY tables* >>>>>>>> As above, I think our answers to the earlier questions will >>>>>>>> effectively determine our position here. >>>>>>>> >>>>>>>> Overall, I like your proposal, but in a few areas it seems to move >>>>>>>> us in a different direction from what we previously agreed on. I’d >>>>>>>> like to >>>>>>>> understand whether the community is aligned with this new direction. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Peter >>>>>>>> >>>>>>>> >>>>>>>> On Thu, Jan 15, 2026, 20:34 Prashant Singh < >>>>>>>> [email protected]> wrote: >>>>>>>> >>>>>>>>> Thank you for the discussion everyone, >>>>>>>>> really appreciate all of you taking time ! 
>>>>>>>>> >>>>>>>>> Unfortunately we were not able to discuss this in the catalog sync >>>>>>>>> this week, since we ran out of time, I was wondering if all the >>>>>>>>> interested >>>>>>>>> folks would be open to a discussion. >>>>>>>>> I can go ahead and request one in the iceberg calendar. >>>>>>>>> >>>>>>>>> Peter : >>>>>>>>> >>>>>>>>> > With the introduction of CATALOG_ONLY tables, storing Iceberg >>>>>>>>> metadata files is no longer required for any operation >>>>>>>>> >>>>>>>>> I am not sure if i fully get the concern here, the client still >>>>>>>>> writes the manifests and manifest lists to the tables which are given >>>>>>>>> to >>>>>>>>> the catalog where it creates / tracks the metadata.json, for writes >>>>>>>>> we need >>>>>>>>> to have hold of these manifests specially for cases such as >>>>>>>>> validating no >>>>>>>>> new data has been inserted to the table (conflict detection) >>>>>>>>> please ref validateAddedDataFiles [1], this can't be achieved by >>>>>>>>> scan planning at least not without breaking the existing iceberg >>>>>>>>> clients as >>>>>>>>> these validations are client side based on the isolation level, which >>>>>>>>> would >>>>>>>>> make these tables unusable with client if we want to write. >>>>>>>>> >>>>>>>>> For the tables which are read only, I am not sure if those tables >>>>>>>>> are sufficient for enforcing vendor lock in, in addition to what can >>>>>>>>> be >>>>>>>>> achieved today, I believe this would be circumvented though if we >>>>>>>>> clarify / >>>>>>>>> tighten the metadata location expectation in the spec, that it should >>>>>>>>> exactly state the state of the table as committed by clients >>>>>>>>> i.e it should have precise references to the manifest and manifest >>>>>>>>> list that the client created ? 
>>>>>>>>> >>>>>>>>> With that being said, I request everyone interested in this thread >>>>>>>>> please let me know if you all are open for a dedicated community >>>>>>>>> discussion >>>>>>>>> for this, happy to brainstorm together and reach consensus. >>>>>>>>> >>>>>>>>> [1] >>>>>>>>> https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java#L377 >>>>>>>>> >>>>>>>>> Best, >>>>>>>>> Prashant Singh >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Wed, Jan 14, 2026 at 7:38 AM Péter Váry < >>>>>>>>> [email protected]> wrote: >>>>>>>>> >>>>>>>>>> Hi Dan, >>>>>>>>>> >>>>>>>>>> > While it is possible and may feel like it would prevent >>>>>>>>>> interoperability, that would be easily circumvented by just copying >>>>>>>>>> the >>>>>>>>>> entire contents of the table through scan/plan. >>>>>>>>>> >>>>>>>>>> This enables the user to recreate a snapshot of the table, but it >>>>>>>>>> does not provide the full history or complete table metadata. It is >>>>>>>>>> also >>>>>>>>>> significantly more involved than simply calling the register table >>>>>>>>>> operation. >>>>>>>>>> >>>>>>>>>> > REST Catalog implementations have always been able to restrict >>>>>>>>>> access to physical storage regardless of whether a client could load >>>>>>>>>> the >>>>>>>>>> table metadata or not. >>>>>>>>>> >>>>>>>>>> Previously, this was primarily a matter of gaining access to the >>>>>>>>>> underlying storage. With the introduction of CATALOG_ONLY tables, >>>>>>>>>> storing >>>>>>>>>> Iceberg metadata files is no longer required for any operation. >>>>>>>>>> >>>>>>>>>> > there are lots of different ways closed systems can restrict >>>>>>>>>> access already (e.g. jdbc only or proprietary APIs), so I don't feel >>>>>>>>>> like >>>>>>>>>> this is changing that dynamic. >>>>>>>>>> >>>>>>>>>> I’m not sure I understand this. Could you please provide more >>>>>>>>>> details? 
>>>>>>>>>> >>>>>>>>>> The goal, as I understand it, is that if a Catalog implements the >>>>>>>>>> Iceberg specification, migration to and from this Catalog should be >>>>>>>>>> possible with any other Catalog that adheres to the same >>>>>>>>>> specification. >>>>>>>>>> Introducing CATALOG_ONLY tables, however, feels like another step >>>>>>>>>> away from >>>>>>>>>> interoperability. >>>>>>>>>> >>>>>>>>>> > I think the motivation behind catalog only mode is more for >>>>>>>>>> cases where the underlying data is either in a different >>>>>>>>>> representation or >>>>>>>>>> is being adapted on-the-fly. For example, if you wanted to expose a >>>>>>>>>> table >>>>>>>>>> from a database that can export data to parquet, but doesn't natively >>>>>>>>>> support Iceberg as a format, you can hide that behind scan plan >>>>>>>>>> interfaces. >>>>>>>>>> >>>>>>>>>> Using the Scan Planning interface has been optional until now, >>>>>>>>>> but with the introduction of CATALOG_ONLY tables, it becomes >>>>>>>>>> mandatory. As >>>>>>>>>> a result, compliant engines will need to implement it. >>>>>>>>>> >>>>>>>>>> > There may not be a full representation of the table metadata >>>>>>>>>> but using a subset of Iceberg primitives, you can still achieve >>>>>>>>>> interoperability (at least for read). >>>>>>>>>> >>>>>>>>>> In earlier discussions, we agreed that tables should not >>>>>>>>>> implement only a subset of the Iceberg specification. This proposal >>>>>>>>>> seems >>>>>>>>>> to move in a different direction. While I’m not opposed to the >>>>>>>>>> feature and >>>>>>>>>> recognize the benefits of integrating non-Iceberg tables into Iceberg >>>>>>>>>> catalogs and making them queryable by compatible engines, I believe >>>>>>>>>> it >>>>>>>>>> would be useful to clarify our current understanding of the >>>>>>>>>> boundaries and >>>>>>>>>> the level of feature parity we aim to maintain. 
Establishing this >>>>>>>>>> would >>>>>>>>>> provide a consistent framework for evaluating similar proposals going >>>>>>>>>> forward. >>>>>>>>>> >>>>>>>>>> This seems like a good candidate for today’s catalog sync >>>>>>>>>> discussion. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Peter >>>>>>>>>> >>>>>>>>>> Daniel Weeks <[email protected]> ezt írta (időpont: 2026. jan. >>>>>>>>>> 14., Sze, 0:23): >>>>>>>>>> >>>>>>>>>>> I don't feel we should be too concerned about catalogs switching >>>>>>>>>>> to a "catalog only" mode and not providing direct access. While it >>>>>>>>>>> is >>>>>>>>>>> possible and may feel like it would prevent interoperability, that >>>>>>>>>>> would be >>>>>>>>>>> easily circumvented by just copying the entire contents of the table >>>>>>>>>>> through scan/plan. I wouldn't agree there was implied access just >>>>>>>>>>> by >>>>>>>>>>> having a metadata-location field either. REST Catalog >>>>>>>>>>> implementations have >>>>>>>>>>> always been able to restrict access to physical storage regardless >>>>>>>>>>> of >>>>>>>>>>> whether a client could load the table metadata or not. I >>>>>>>>>>> understand the >>>>>>>>>>> concern about lock-in, but there are lots of different ways closed >>>>>>>>>>> systems >>>>>>>>>>> can restrict access already (e.g. jdbc only or proprietary APIs), >>>>>>>>>>> so I >>>>>>>>>>> don't feel like this is changing that dynamic. >>>>>>>>>>> >>>>>>>>>>> I think the motivation behind catalog only mode is more for >>>>>>>>>>> cases where the underlying data is either in a different >>>>>>>>>>> representation or >>>>>>>>>>> is being adapted on-the-fly. For example, if you wanted to expose >>>>>>>>>>> a table >>>>>>>>>>> from a database that can export data to parquet, but doesn't >>>>>>>>>>> natively >>>>>>>>>>> support Iceberg as a format, you can hide that behind scan plan >>>>>>>>>>> interfaces. 
There may not be a full representation of the table >>>>>>>>>>> metadata >>>>>>>>>>> but using a subset of Iceberg primitives, you can still achieve >>>>>>>>>>> interoperability (at least for read). >>>>>>>>>>> >>>>>>>>>>> Introducing modes just is a way to express the >>>>>>>>>>> intent/availability for the scan plan and coordinate between the >>>>>>>>>>> client and >>>>>>>>>>> server, but I don't think it really affects whether a client could >>>>>>>>>>> be >>>>>>>>>>> prevented from reading table data directly (a catalog can do that >>>>>>>>>>> regardless). >>>>>>>>>>> >>>>>>>>>>> I would add that I don't think the spec should include anything >>>>>>>>>>> about the client modes (I added a comment to the PR on this). The >>>>>>>>>>> spec >>>>>>>>>>> should only indicate what the server can return and what the >>>>>>>>>>> expectations >>>>>>>>>>> should be for a client. What a client implements and what >>>>>>>>>>> configurations >>>>>>>>>>> it exposes is more of a client-side implementation detail and >>>>>>>>>>> should not be >>>>>>>>>>> part of the spec. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -Dan >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Tue, Jan 13, 2026 at 11:07 AM Prashant Singh < >>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hello Peter, >>>>>>>>>>>> Thank you for the feedback. >>>>>>>>>>>> >>>>>>>>>>>> IIUC, you mean to say an interpretation, could be a dummy file >>>>>>>>>>>> which would in worst case simply not exist ? sure i believe we can >>>>>>>>>>>> be >>>>>>>>>>>> explicit there to avoid this. >>>>>>>>>>>> Note: this is predating this proposal though and happy to take >>>>>>>>>>>> a stab in being explicit here. >>>>>>>>>>>> >>>>>>>>>>>> > users were required to have direct read access to the >>>>>>>>>>>> metadata files in order to plan queries on the table. 
That implied >>>>>>>>>>>> an >>>>>>>>>>>> access requirement, even though it was never explicitly documented >>>>>>>>>>>> >>>>>>>>>>>> while the requirement is true but it's not like every user >>>>>>>>>>>> would get credentials to do so, it was strictly based on if the >>>>>>>>>>>> user is >>>>>>>>>>>> authorized to read the table based on the privileges defined in the >>>>>>>>>>>> catalog, loadTable's credential was optional meaning if a catalog >>>>>>>>>>>> wants it >>>>>>>>>>>> could very well not vend any credentials despite the client >>>>>>>>>>>> sending X-Iceberg-Access-Delegation due to this [1] and hence >>>>>>>>>>>> they can >>>>>>>>>>>> cut off any client if they want to. I believe the flexibility >>>>>>>>>>>> is there because we don't define authorization in IRC spec. As >>>>>>>>>>>> i said the admin is the one who had given the access to storage to >>>>>>>>>>>> the >>>>>>>>>>>> catalog in the first place so it can very well revoke that access >>>>>>>>>>>> to >>>>>>>>>>>> storage and migrate if the catalog is misbehaving by calling every >>>>>>>>>>>> table to >>>>>>>>>>>> itself to do planning and can move to a different catalog if the >>>>>>>>>>>> culprit >>>>>>>>>>>> catalog doesn't fix it. >>>>>>>>>>>> >>>>>>>>>>>> > Maybe we add a sentence in the spec to enforce that there >>>>>>>>>>>> should be some users where the catalog MUST provide access to the >>>>>>>>>>>> metadata >>>>>>>>>>>> files. >>>>>>>>>>>> >>>>>>>>>>>> Regarding the original feedback, there will always be an ADMIN >>>>>>>>>>>> user who has configured the catalog in the first place with the >>>>>>>>>>>> storage >>>>>>>>>>>> permission (lets say proving the IAM and establishing the trust >>>>>>>>>>>> relationship) who can get hold of the storage directly and access >>>>>>>>>>>> those >>>>>>>>>>>> metadata files directly from storage. So some are implicit in that >>>>>>>>>>>> sense. 
>>>>>>>>>>>> >>>>>>>>>>>> I believe by introducing CATALOG only mode for planning on >>>>>>>>>>>> existing assumptions we are not introducing new ways to trap end >>>>>>>>>>>> users in >>>>>>>>>>>> getting into vendor lock-in and like always existed a user has a >>>>>>>>>>>> way to >>>>>>>>>>>> walk out of it with the constructs. >>>>>>>>>>>> >>>>>>>>>>>> Please let me know what WDYT is considering above ? >>>>>>>>>>>> >>>>>>>>>>>> [1] >>>>>>>>>>>> https://github.com/apache/iceberg/blob/fc434997fbc63a3f1f47481c0878073b1ccf6359/open-api/rest-catalog-open-api.yaml#L1886-L1887 >>>>>>>>>>>> >>>>>>>>>>>> Best, >>>>>>>>>>>> Prashant Singh >>>>>>>>>>>> >>>>>>>>>>>> On Tue, Jan 13, 2026 at 6:11 AM Péter Váry < >>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi Prashant, >>>>>>>>>>>>> >>>>>>>>>>>>> The specification states: >>>>>>>>>>>>> >>>>>>>>>>>>>> The corresponding file location of table metadata should be >>>>>>>>>>>>>> returned in the `metadata-location` field >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> However, it does not specify that this location must be >>>>>>>>>>>>> readable by any users. (Perhaps this is something we should >>>>>>>>>>>>> revisit and >>>>>>>>>>>>> clarify going forward.) >>>>>>>>>>>>> >>>>>>>>>>>>> Before the introduction of CATALOG_ONLY tables, users were >>>>>>>>>>>>> required to have direct read access to the metadata files in >>>>>>>>>>>>> order to plan >>>>>>>>>>>>> queries on the table. That implied an access requirement, even >>>>>>>>>>>>> though it >>>>>>>>>>>>> was never explicitly documented. With the introduction of >>>>>>>>>>>>> CATALOG_ONLY, >>>>>>>>>>>>> this implicit requirement no longer applies, and we currently do >>>>>>>>>>>>> not have >>>>>>>>>>>>> an explicit requirement defined in the specification either. >>>>>>>>>>>>> >>>>>>>>>>>>> Prashant Singh <[email protected]> ezt írta (időpont: >>>>>>>>>>>>> 2026. jan. 12., H, 23:33): >>>>>>>>>>>>> >>>>>>>>>>>>>> Thank you for the feedback everyone ! 
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Eduard: I am open to naming it _ENFORCED, or even to not
>>>>>>>>>>>>>> having _ONLY or _ENFORCED in the first place, as Dan suggested
>>>>>>>>>>>>>> here [1]. Please let me know if you are OK with that.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Amogh: Thank you for the feedback on the _preference mode. I
>>>>>>>>>>>>>> tried to document some concrete use cases that could benefit
>>>>>>>>>>>>>> from it [2], as I believe it can give the catalog and the client
>>>>>>>>>>>>>> some options to negotiate when they are open to it. Please let
>>>>>>>>>>>>>> me know what you think.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Peter: I believe this kind of vendor lock-in would not be
>>>>>>>>>>>>>> possible with the model we are going after: in the loadTable
>>>>>>>>>>>>>> response itself we get back the metadata pointer, which is
>>>>>>>>>>>>>> self-describing and can be used to register the table in a new
>>>>>>>>>>>>>> catalog. Also, the way the catalog (IRC) has been laid out, it
>>>>>>>>>>>>>> decouples compute from storage, so in the end it is the admin
>>>>>>>>>>>>>> user who gave the catalog its admin credentials, which get
>>>>>>>>>>>>>> scoped down based on the grants defined in the catalog. The
>>>>>>>>>>>>>> admin can simply revoke the catalog's access, or configure a
>>>>>>>>>>>>>> new catalog with different admin storage credentials.
>>>>>>>>>>>>>> I tried to elaborate on this in the PR feedback too [3].
>>>>>>>>>>>>>> Please let me know what you think.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I will stay on top of both the PR and the thread moving forward!
>>>>>>>>>>>>>> Appreciate all your feedback.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [1] https://github.com/apache/iceberg/pull/14867#discussion_r2673087002
>>>>>>>>>>>>>> [2] https://github.com/apache/iceberg/pull/14867#discussion_r2678941794
>>>>>>>>>>>>>> [3] https://github.com/apache/iceberg/pull/14867#discussion_r2678376025
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>> Prashant Singh
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Jan 9, 2026 at 10:34 PM Péter Váry <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have a concern about some catalogs starting to make every
>>>>>>>>>>>>>>> table `CATALOG_ONLY`, which would essentially lock users into
>>>>>>>>>>>>>>> the catalog without providing a way to migrate the data to
>>>>>>>>>>>>>>> another catalog.
>>>>>>>>>>>>>>> Maybe we add a sentence in the spec to enforce that there
>>>>>>>>>>>>>>> should be some users where the catalog MUST provide access to
>>>>>>>>>>>>>>> the metadata files.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> WDYT?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Jan 8, 2026, 18:38 Amogh Jahagirdar <[email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I did a pass over the PR, but I guess I'm a little skeptical
>>>>>>>>>>>>>>>> about what the notion of "preferences" truly gets us in the
>>>>>>>>>>>>>>>> protocol. In case the endpoint is available but not enforced,
>>>>>>>>>>>>>>>> my mental model is to just let the client make whatever
>>>>>>>>>>>>>>>> choice it wants. If a server really thinks it's advantageous
>>>>>>>>>>>>>>>> to use remote planning, I'd think it'd just say server-side
>>>>>>>>>>>>>>>> planning is enforced. For the "momentary load" case, all a
>>>>>>>>>>>>>>>> client would need to do is handle the server throttling and
>>>>>>>>>>>>>>>> fall back to client-side planning (I don't think the protocol
>>>>>>>>>>>>>>>> needs to expand just for that).
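The fallback described above for the "momentary load" case can be sketched in a few lines. This is only an illustration, not the reference implementation: `CatalogThrottled`, `plan_scan`, and the two planner callables are hypothetical names, and the sketch assumes the client surfaces an HTTP 429 from the catalog as an exception.

```python
from typing import Callable, List


class CatalogThrottled(Exception):
    """Hypothetical error: the catalog answered a plan request with HTTP 429."""


def plan_scan(
    plan_remotely: Callable[[], List[str]],
    plan_locally: Callable[[], List[str]],
) -> List[str]:
    """Try catalog-side planning first; fall back to client-side planning on throttling."""
    try:
        return plan_remotely()
    except CatalogThrottled:
        # The catalog is under load, so plan on the client instead.
        return plan_locally()


def throttled_remote_plan() -> List[str]:
    # Stand-in for a remote planning call that gets throttled.
    raise CatalogThrottled("429 Too Many Requests")


def local_plan() -> List[str]:
    # Stand-in for client-side planning; the path is a placeholder.
    return ["s3://bucket/data/file-a.parquet"]


tasks = plan_scan(throttled_remote_plan, local_plan)
```

The fallback only makes sense when the table is not restricted to catalog-side planning; in that case the client would have to retry or fail instead.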
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Wed, Jan 7, 2026 at 11:28 AM Russell Spitzer < >>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I'm in agreement with Prashant's current plan; I have no >>>>>>>>>>>>>>>>> preference on naming of Only vs Enforced. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Wed, Jan 7, 2026 at 4:42 AM Eduard Tudenhöfner < >>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Instead of calling it "ONLY", maybe "ENFORCED" would be a >>>>>>>>>>>>>>>>>> better term? I think that would more naturally express the >>>>>>>>>>>>>>>>>> behavior without >>>>>>>>>>>>>>>>>> having to define what "ONLY" really means. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Wed, Dec 24, 2025 at 12:05 AM Prashant Singh < >>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> *Hi everyone,* >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> *JB:* Mostly yes, but it's more about what the server >>>>>>>>>>>>>>>>>>> wants the client to do. The server can indicate if it >>>>>>>>>>>>>>>>>>> supports a mode or >>>>>>>>>>>>>>>>>>> not via the /v1/config endpoint at this point. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> *Russell:* Thank you for the thorough feedback! I think >>>>>>>>>>>>>>>>>>> it is a great idea to break the optional mode into *Prefer >>>>>>>>>>>>>>>>>>> Client | Prefer Catalog*; it really opens up a lot of >>>>>>>>>>>>>>>>>>> interesting use cases. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> For example, the server might support planning but, due >>>>>>>>>>>>>>>>>>> to momentary load, wants the client to see if it's open to >>>>>>>>>>>>>>>>>>> planning on the >>>>>>>>>>>>>>>>>>> client side. Similarly, an argument can be made that if the >>>>>>>>>>>>>>>>>>> server has a >>>>>>>>>>>>>>>>>>> table cached in memory, it would prefer the client comes to >>>>>>>>>>>>>>>>>>> the server. 
>>>>>>>>>>>>>>>>>>> Earlier, with just the optional value, we were simply >>>>>>>>>>>>>>>>>>> falling back to >>>>>>>>>>>>>>>>>>> server or client side planning based on whether the server >>>>>>>>>>>>>>>>>>> supported scan >>>>>>>>>>>>>>>>>>> planning. Now, the client can express its own overrides via >>>>>>>>>>>>>>>>>>> catalog configs >>>>>>>>>>>>>>>>>>> as well. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Based on our offline discussion, I have incorporated the >>>>>>>>>>>>>>>>>>> feedback into the updated matrix [1] to document what the >>>>>>>>>>>>>>>>>>> planning modes >>>>>>>>>>>>>>>>>>> would be based on the server response and client overrides: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> - >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> *CLIENT_ONLY + CATALOG_ONLY* = FAIL >>>>>>>>>>>>>>>>>>> - >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> *One "ONLY" + opposite "PREFERRED"* = ONLY wins >>>>>>>>>>>>>>>>>>> - >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> *Both "PREFERRED"* = Client config wins >>>>>>>>>>>>>>>>>>> - >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> *Client not configured* = Use server config or >>>>>>>>>>>>>>>>>>> default >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I will update the reference implementation soon based on >>>>>>>>>>>>>>>>>>> this. I would love to know what other folks think! 
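The resolution matrix above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the reference implementation from the PR. The enum spellings, the `resolve` function, and the `CLIENT_PREFERRED` default are hypothetical, and the case where only the client is configured is not covered by the matrix (here the client value is simply used).

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    # Illustrative spellings; the actual property values are defined in the PR.
    CLIENT_ONLY = "client-only"
    CLIENT_PREFERRED = "client-preferred"
    CATALOG_PREFERRED = "catalog-preferred"
    CATALOG_ONLY = "catalog-only"

    @property
    def is_only(self) -> bool:
        return self in (Mode.CLIENT_ONLY, Mode.CATALOG_ONLY)

    @property
    def side(self) -> str:
        client_modes = (Mode.CLIENT_ONLY, Mode.CLIENT_PREFERRED)
        return "client" if self in client_modes else "catalog"


def resolve(
    server: Optional[Mode],
    client: Optional[Mode],
    default: Mode = Mode.CLIENT_PREFERRED,  # assumed default, not taken from the thread
) -> Mode:
    """Combine the server response and the client override into one planning mode."""
    if client is None:
        # Client not configured: use the server config, or the default.
        return server if server is not None else default
    if server is None:
        # Assumption: with no server opinion, the client override applies.
        return client
    if server.is_only and client.is_only and server.side != client.side:
        # CLIENT_ONLY + CATALOG_ONLY cannot be reconciled.
        raise ValueError(f"conflicting planning modes: {server.name} vs {client.name}")
    if server.is_only:
        return server  # an "ONLY" mode beats a "PREFERRED" one
    if client.is_only:
        return client
    return client  # both "PREFERRED": the client config wins


# Usage: the server insists on catalog planning, the client merely prefers local planning.
chosen = resolve(Mode.CATALOG_ONLY, Mode.CLIENT_PREFERRED)
```

Returning the winning `Mode` rather than just a side keeps the "ONLY" information available to the caller, which matters if it later has to decide whether a fallback is allowed.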
>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Prashant Singh >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> [1] >>>>>>>>>>>>>>>>>>> https://github.com/apache/iceberg/pull/14867#issuecomment-3683989832 >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Sat, Dec 20, 2025 at 1:26 PM Russell Spitzer < >>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> I can imagine one more >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> (None - I would rename this) ClientOnly - Client can >>>>>>>>>>>>>>>>>>>> use Catalog Planning or Local Planning >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> PreferClient - Client should use local planning, but >>>>>>>>>>>>>>>>>>>> the plan api is available for this table — I can only >>>>>>>>>>>>>>>>>>>> imagine this would be >>>>>>>>>>>>>>>>>>>> useful for a scenario where most clients are heavy and >>>>>>>>>>>>>>>>>>>> have the resources >>>>>>>>>>>>>>>>>>>> to do local planning (or engine distributed planning) but >>>>>>>>>>>>>>>>>>>> you still want to >>>>>>>>>>>>>>>>>>>> support lightweight clients which can’t really do planning >>>>>>>>>>>>>>>>>>>> themselves. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> PreferCatalog - Client should use the plan API, but >>>>>>>>>>>>>>>>>>>> credentials have been provided to enable local planning — >>>>>>>>>>>>>>>>>>>> This is probably >>>>>>>>>>>>>>>>>>>> a transitional state as we move from clients that only >>>>>>>>>>>>>>>>>>>> support local >>>>>>>>>>>>>>>>>>>> planning to those which can use the plan api. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> CatalogOnly - Clients are not provided with the >>>>>>>>>>>>>>>>>>>> credentials required to read the table from the >>>>>>>>>>>>>>>>>>>> Metadata.json alone. 
If >>>>>>>>>>>>>>>>>>>> they do not implement the scan plan API they should fail >>>>>>>>>>>>>>>>>>>> fast; otherwise >>>>>>>>>>>>>>>>>>>> they will fail when they attempt to load a manifest_list >>>>>>>>>>>>>>>>>>>> file. This is >>>>>>>>>>>>>>>>>>>> used in circumstances where the catalog is giving either >>>>>>>>>>>>>>>>>>>> file-specific >>>>>>>>>>>>>>>>>>>> credentials or is protecting the delivered files in some >>>>>>>>>>>>>>>>>>>> way such that >>>>>>>>>>>>>>>>>>>> their contents have been specially redacted or something >>>>>>>>>>>>>>>>>>>> like that. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> I assume most catalogs will start with “ClientOnly” or >>>>>>>>>>>>>>>>>>>> “None” >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Then as Catalogs begin to support the planning API we will >>>>>>>>>>>>>>>>>>>> see most tables move to >>>>>>>>>>>>>>>>>>>> PreferCatalog with some perhaps extremely heavy or >>>>>>>>>>>>>>>>>>>> large tables staying as PreferClient or ClientOnly. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Then catalogs with special protections may have some >>>>>>>>>>>>>>>>>>>> tables return CatalogOnly so they can either scope >>>>>>>>>>>>>>>>>>>> credentials more >>>>>>>>>>>>>>>>>>>> tightly or manipulate the files that the client actually >>>>>>>>>>>>>>>>>>>> has access to in >>>>>>>>>>>>>>>>>>>> some way. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Sat, Dec 20, 2025 at 1:09 AM Jean-Baptiste Onofré < >>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Hi Prashant >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> It makes sense to me. I guess we are using Catalog >>>>>>>>>>>>>>>>>>>>> properties to indicate what the REST server supports to >>>>>>>>>>>>>>>>>>>>> the client, right? >>>>>>>>>>>>>>>>>>>>> I will take a look at the PR, but I like the idea. 
>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Regards >>>>>>>>>>>>>>>>>>>>> JB >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Sat, Dec 20, 2025 at 12:53 AM Prashant Singh < >>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Hey All, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> I wanted to start a discussion on introducing the >>>>>>>>>>>>>>>>>>>>>> concept of a REST scan planning mode, which would let the >>>>>>>>>>>>>>>>>>>>>> server instruct >>>>>>>>>>>>>>>>>>>>>> the client on how to plan the table, via the >>>>>>>>>>>>>>>>>>>>>> loadTableResponse or a >>>>>>>>>>>>>>>>>>>>>> table-level config override. >>>>>>>>>>>>>>>>>>>>>> There are three possible values one could think >>>>>>>>>>>>>>>>>>>>>> of: >>>>>>>>>>>>>>>>>>>>>> 1. *None*: plan on the client side; this may >>>>>>>>>>>>>>>>>>>>>> be because the table is too small and the additional REST >>>>>>>>>>>>>>>>>>>>>> request would add more >>>>>>>>>>>>>>>>>>>>>> overhead than benefit. >>>>>>>>>>>>>>>>>>>>>> 2. *Optional*: the client can choose to plan either >>>>>>>>>>>>>>>>>>>>>> locally or via server-side planning. >>>>>>>>>>>>>>>>>>>>>> 3. *Required*: the client MUST use server-side planning. >>>>>>>>>>>>>>>>>>>>>> The server could suggest this if it has better-indexed >>>>>>>>>>>>>>>>>>>>>> Iceberg metadata, >>>>>>>>>>>>>>>>>>>>>> if the client is running on low resources, or if the table is >>>>>>>>>>>>>>>>>>>>>> protected. The server MAY >>>>>>>>>>>>>>>>>>>>>> choose whatever way is required to ensure the client can't >>>>>>>>>>>>>>>>>>>>>> bypass this; for >>>>>>>>>>>>>>>>>>>>>> example, it could withhold credentials in loadTable >>>>>>>>>>>>>>>>>>>>>> and only mint them >>>>>>>>>>>>>>>>>>>>>> as part of planning completion; this would mean if the >>>>>>>>>>>>>>>>>>>>>> client doesn't call plan >>>>>>>>>>>>>>>>>>>>>> table . 
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> I have proactively created a pull request [1] and would >>>>>>>>>>>>>>>>>>>>>> love to hear your feedback either here or in the PR >>>>>>>>>>>>>>>>>>>>>> directly! >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Wishing you all very happy holidays; it has been great >>>>>>>>>>>>>>>>>>>>>> working with you all. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> [1] https://github.com/apache/iceberg/pull/14867 >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>>>>>>> Prashant Singh >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>
