Thanks Ryan and Russell: That's a wonderful suggestion, I don't imagine we need such "set function" at other places so let me see how I can achieve this directly in registerTable at the catalog.
Cheers, Hongyue On Wed, Jul 9, 2025 at 2:07 PM Russell Spitzer <russell.spit...@gmail.com> wrote: > I think that was my request before as well :) I want a Catalog api for > "register" directly then each implementation can decide how that gets > applied. > > On Wed, Jul 9, 2025 at 4:13 PM Ryan Blue <rdb...@gmail.com> wrote: > >> Thanks for bringing this up, Hongyue. I think the logic here makes sense >> and that `commit(base, new)` probably isn't a good API to use for >> `registerTable`. But my main objection is that I don't think that it makes >> sense to use `TableOperations` for this. Adding a `set` method is awkward >> because the table may not already exist. >> >> Why not have catalogs implement `registerTable` directly? For instance, >> JDBC could run an INSERT query. >> >> Ryan >> >> On Tue, Jul 8, 2025 at 4:42 PM Steve <hongyue.apa...@gmail.com> wrote: >> >>> Hey Iceberg devs: >>> >>> While implementing the overwrite option for registering an external >>> table (see PR12228), I realized we might want to evaluate the option to add >>> a new method *set(metadata)* on TableOperations interfaces for >>> unconditionally set latest table metadata. After some discussions with >>> Steven and Russell, I want to seek opinions from the community. >>> >>> Today, the register-table under the hood use the commit API from >>> TableOperations, but it might not work well with overwrite registration due >>> to given assumptions >>> >>> 1. >>> >>> commit (base, new) is designed to avoid overwriting updates and >>> mandate the provided base metadata is the same as the current metadata of >>> the table. However for overwrite registration, the end goal is to reset >>> table metadata to a desired state, so we do not want to retry on failure >>> even if the base table state changes. >>> 2. >>> >>> commit(base, new) will always write new metadata.json files in case >>> of successful commit. For a successful registration. The file that >>> is being passed is the file that should be used for registration. >>> Previous workarounds (e.g., PR6591) reused user-provided metadata only >>> for >>> new tables, but cannot generalize to the overwrite case as it cannot >>> differentiate normal update and overwrite registration at the >>> TableOperations layer. >>> 3. >>> >>> When committing fails, the system attempts to delete the new >>> metadata file, presuming it was authored by the committer. This is not >>> appropriate for registration scenarios where the metadata file might have >>> been generated elsewhere (see PR13169) >>> >>> >>> One alternative—dropping and recreating the table—raises concerns about >>> atomicity, since it can leave the table in an invalid intermediate state if >>> a failure occurs between drop and creation. >>> >>> Given this, I would like to share proposed new API to be add: >>> >>> * /*** >>> >>> * * *Atomically set the provided table metadata as current, bypassing >>> base state checks. >>> >>> * * @param metadata *the new table metadata to make current >>> >>> * */* >>> >>> * void set(TableMetadata metadata);* >>> >>> >>> PR12228 Add overwrite option when register external table to catalog: >>> https://github.com/apache/iceberg/pull/12228 >>> >>> PR6591 Avoid creating new metadata file when registerTable API is used: >>> https://github.com/apache/iceberg/pull/6591 >>> PR13169 Only remove metadata files if TableOp created a new one: >>> github.com/apache/iceberg/pull/13169 >>> >>> Thanks, >>> >>> Hongyue Zhang >>> >>> >>>