I'm concerned that some catalogs may start marking every table `CATALOG_ONLY`, which would essentially lock users into that catalog without giving them a way to migrate the data to another catalog. Maybe we should add a sentence to the spec requiring that there be some users to whom the catalog MUST provide access to the metadata files.
WDYT?

On Thu, Jan 8, 2026, 18:38 Amogh Jahagirdar <[email protected]> wrote:

> I did a pass over the PR, but I guess I'm a little skeptical about what the
> notion of "preferences" truly gets us in the protocol. In case the endpoint
> is available but not enforced, my mental model is to just let the client
> make whatever choice it wants. If a server really thinks it's advantageous
> to use remote planning, I'd think it'd just say server-side planning is
> enforced. For the "momentary load" case, all a client would need to do is
> handle the server throttling and fall back to client-side planning (I don't
> think the protocol needs to expand just for that).
>
> On Wed, Jan 7, 2026 at 11:28 AM Russell Spitzer <[email protected]>
> wrote:
>
>> I'm in agreement with Prashant's current plan. I have no preference on
>> the naming of "Only" vs "Enforced".
>>
>> On Wed, Jan 7, 2026 at 4:42 AM Eduard Tudenhöfner <[email protected]>
>> wrote:
>>
>>> Instead of calling it "ONLY", maybe "ENFORCED" would be a better term? I
>>> think that would more naturally express the behavior without having to
>>> define what "ONLY" really means.
>>>
>>> On Wed, Dec 24, 2025 at 12:05 AM Prashant Singh <[email protected]>
>>> wrote:
>>>
>>>> *Hi everyone,*
>>>>
>>>> *JB:* Mostly yes, but it's more about what the server wants the client
>>>> to do. The server can indicate whether it supports a mode via the
>>>> /v1/config endpoint at this point.
>>>>
>>>> *Russell:* Thank you for the thorough feedback! I think it is a great
>>>> idea to break the optional mode into *Prefer Client | Prefer Catalog*;
>>>> it really opens up a lot of interesting use cases.
>>>>
>>>> For example, the server might support planning but, due to momentary
>>>> load, want the client to see if it's open to planning on the client
>>>> side. Similarly, an argument can be made that if the server has a table
>>>> cached in memory, it would prefer that the client come to the server.
>>>> Earlier, with just the optional value, we were simply falling back to
>>>> server-side or client-side planning based on whether the server
>>>> supported scan planning. Now, the client can express its own overrides
>>>> via catalog configs as well.
>>>>
>>>> Based on our offline discussion, I have incorporated the feedback into
>>>> the updated matrix [1] to document what the planning mode would be based
>>>> on the server response and client overrides:
>>>>
>>>> - *CLIENT_ONLY + CATALOG_ONLY* = FAIL
>>>> - *One "ONLY" + opposite "PREFERRED"* = ONLY wins
>>>> - *Both "PREFERRED"* = Client config wins
>>>> - *Client not configured* = Use server config or default
>>>>
>>>> I will update the reference implementation soon based on this. I would
>>>> love to know what other folks think!
>>>>
>>>> Best,
>>>>
>>>> Prashant Singh
>>>>
>>>> [1] https://github.com/apache/iceberg/pull/14867#issuecomment-3683989832
>>>>
>>>> On Sat, Dec 20, 2025 at 1:26 PM Russell Spitzer <[email protected]>
>>>> wrote:
>>>>
>>>>> I can imagine one more:
>>>>>
>>>>> (None - I would rename this) ClientOnly - Client can use Catalog
>>>>> Planning or Local Planning.
>>>>>
>>>>> PreferClient - Client should use local planning, but the plan API is
>>>>> available for this table. I can only imagine this would be useful for a
>>>>> scenario where most clients are heavy and have the resources to do
>>>>> local planning (or engine-distributed planning) but you still want to
>>>>> support lightweight clients which can’t really do planning themselves.
>>>>>
>>>>> PreferCatalog - Client should use the plan API, but credentials have
>>>>> been provided to enable local planning. This is probably a transitional
>>>>> state as we move from clients that only support local planning to those
>>>>> which can use the plan API.
>>>>>
>>>>> CatalogOnly - Clients are not provided with the credentials required
>>>>> to read the table from the metadata.json alone. If they do not
>>>>> implement the scan plan API they should fail fast; otherwise they will
>>>>> fail when they attempt to load a manifest_list file. This is used in
>>>>> circumstances where the catalog is giving either file-specific
>>>>> credentials or is protecting the delivered files in some way, such that
>>>>> their contents have been specially redacted or something like that.
>>>>>
>>>>> I assume most catalogs will start with “ClientOnly” or “None”.
>>>>>
>>>>> Then, as catalogs begin to support the planning API, we will see most
>>>>> tables move to PreferCatalog, with some perhaps extremely heavy or
>>>>> large tables staying as PreferClient or ClientOnly.
>>>>>
>>>>> Then catalogs with special protections may have some tables return
>>>>> CatalogOnly so they can either scope credentials more tightly or
>>>>> manipulate the files that the client actually has access to in some
>>>>> way.
>>>>>
>>>>> On Sat, Dec 20, 2025 at 1:09 AM Jean-Baptiste Onofré <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi Prashant,
>>>>>>
>>>>>> It makes sense to me. I guess we are using catalog properties to
>>>>>> indicate to the client what the REST server supports, right?
>>>>>> I will take a look at the PR, but I like the idea.
>>>>>>
>>>>>> Regards
>>>>>> JB
>>>>>>
>>>>>> On Sat, Dec 20, 2025 at 12:53 AM Prashant Singh <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hey All,
>>>>>>>
>>>>>>> I wanted to bring up the discussion of introducing the concept of a
>>>>>>> REST scan planning mode, which would let the server instruct the
>>>>>>> client on how to plan the table, via loadTableResponse or a
>>>>>>> table-level config override.
>>>>>>> There are three possible values one could think of:
>>>>>>>
>>>>>>> 1. *None*: i.e., plan it on the client side. This may be because the
>>>>>>>    table is too small and the additional REST request would add more
>>>>>>>    overhead than benefit.
>>>>>>> 2. *Optional*: the client can choose to plan either locally or by
>>>>>>>    triggering server-side planning.
>>>>>>> 3. *Required*: the client MUST do server-side planning. The server
>>>>>>>    could suggest this if it has better indexed the Iceberg metadata,
>>>>>>>    the client is running on low resources, or the table is protected.
>>>>>>>    The server MAY choose whatever way is required to ensure the client
>>>>>>>    can't bypass this, for example by not vending credentials as part
>>>>>>>    of loadTable and only minting them as part of planning completion;
>>>>>>>    this would mean if the client doesn't call plan table .
>>>>>>>
>>>>>>> I have proactively created a pull request [1] and would love to hear
>>>>>>> all your feedback, either here or in the PR directly!
>>>>>>>
>>>>>>> Wishing you all very happy holidays; it has been great working with
>>>>>>> you all.
>>>>>>>
>>>>>>> [1] https://github.com/apache/iceberg/pull/14867
>>>>>>>
>>>>>>> Best,
>>>>>>> Prashant Singh
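
For anyone who wants the matrix from Prashant's Dec 24 message in executable form, here is a minimal sketch of how a client might resolve the effective planning mode from its own override and the server's response. This is not the reference implementation from the PR. The value names `CLIENT_PREFERRED` and `CATALOG_PREFERRED`, the function name `resolve_planning_mode`, the `CLIENT_PREFERRED` default, and the behavior when only the client is configured are assumptions made for illustration; the thread only fixes the four rules in the matrix.

```python
from enum import Enum
from typing import Optional


class PlanningMode(Enum):
    # CLIENT_ONLY / CATALOG_ONLY appear in the thread; the two
    # *_PREFERRED names are assumed spellings of "PREFERRED".
    CLIENT_ONLY = "CLIENT_ONLY"
    CLIENT_PREFERRED = "CLIENT_PREFERRED"
    CATALOG_PREFERRED = "CATALOG_PREFERRED"
    CATALOG_ONLY = "CATALOG_ONLY"

    @property
    def is_only(self) -> bool:
        return self.name.endswith("_ONLY")

    @property
    def side(self) -> str:
        # "CLIENT" or "CATALOG"
        return self.name.split("_")[0]


def resolve_planning_mode(
    client: Optional[PlanningMode],
    server: Optional[PlanningMode],
    default: PlanningMode = PlanningMode.CLIENT_PREFERRED,  # assumed default
) -> PlanningMode:
    """Apply the matrix: client override vs. server response."""
    # Client not configured = use server config or default.
    if client is None:
        return server if server is not None else default
    # Assumption (not in the matrix): a lone client override is used as-is.
    if server is None:
        return client
    # CLIENT_ONLY + CATALOG_ONLY = FAIL.
    if client.is_only and server.is_only and client.side != server.side:
        raise ValueError(f"conflicting planning modes: {client.name} vs {server.name}")
    # One "ONLY" + opposite "PREFERRED" = ONLY wins.
    if client.is_only:
        return client
    if server.is_only:
        return server
    # Both "PREFERRED" = client config wins.
    return client
```

For example, `resolve_planning_mode(PlanningMode.CLIENT_PREFERRED, PlanningMode.CATALOG_ONLY)` yields `CATALOG_ONLY`, while a client configured as `CLIENT_ONLY` against a `CATALOG_ONLY` table raises an error instead of silently picking a side.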
