There is already a Catalog#registerTable method: https://github.com/apache/iceberg/blob/main/api/src/main/java/org/apache/iceberg/catalog/Catalog.java#L346-L348
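
As a rough illustration of the "let each catalog implement register directly" idea discussed below, here is a minimal sketch. It is not Iceberg code: `InMemoryCatalogSketch`, its string-based signature, and the `overwrite` flag are all hypothetical stand-ins (the real `Catalog#registerTable` takes a `TableIdentifier` and a metadata file location, and the overwrite option is only what PR12228 proposes). It only shows that a catalog can swap its own pointer to a user-provided metadata file without going through `commit(base, new)`, so no base check runs and no new metadata file is written or cleaned up.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified stand-in for a catalog; not one of Iceberg's classes.
final class InMemoryCatalogSketch {
  // Table name -> location of the metadata.json the catalog currently points at.
  private final Map<String, String> pointers = new ConcurrentHashMap<>();

  /**
   * Registers an existing metadata file as the table's current metadata.
   * The file is used as-is: nothing is rewritten and nothing is deleted on failure.
   * The overwrite flag is an assumption modelled on the option proposed in the thread.
   */
  String registerTable(String name, String metadataLocation, boolean overwrite) {
    if (overwrite) {
      // Unconditional, atomic pointer swap: no base-state check and no retry loop.
      pointers.put(name, metadataLocation);
    } else {
      // Plain registration: atomically fail if the table is already present.
      String existing = pointers.putIfAbsent(name, metadataLocation);
      if (existing != null) {
        throw new IllegalStateException("Table already exists: " + name);
      }
    }
    return metadataLocation;
  }

  /** Returns the registered metadata location, or null if the table is unknown. */
  String currentMetadataLocation(String name) {
    return pointers.get(name);
  }
}
```

A JDBC-backed catalog would presumably do the equivalent with a single INSERT or UPDATE statement, as Ryan suggests below; the map here just stands in for that row.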

On Wed, Jul 9, 2025 at 2:41 PM Steve <hongyue.apa...@gmail.com> wrote:

> Thanks Ryan and Russell:
>
> That's a wonderful suggestion. I don't imagine we need such a
> "set function" at other places, so let me see how I can achieve this
> directly in registerTable at the catalog.
>
> Cheers,
> Hongyue
>
> On Wed, Jul 9, 2025 at 2:07 PM Russell Spitzer <russell.spit...@gmail.com>
> wrote:
>
>> I think that was my request before as well :) I want a Catalog API for
>> "register" directly; then each implementation can decide how that gets
>> applied.
>>
>> On Wed, Jul 9, 2025 at 4:13 PM Ryan Blue <rdb...@gmail.com> wrote:
>>
>>> Thanks for bringing this up, Hongyue. I think the logic here makes sense
>>> and that `commit(base, new)` probably isn't a good API to use for
>>> `registerTable`. But my main objection is that I don't think that it makes
>>> sense to use `TableOperations` for this. Adding a `set` method is awkward
>>> because the table may not already exist.
>>>
>>> Why not have catalogs implement `registerTable` directly? For instance,
>>> JDBC could run an INSERT query.
>>>
>>> Ryan
>>>
>>> On Tue, Jul 8, 2025 at 4:42 PM Steve <hongyue.apa...@gmail.com> wrote:
>>>
>>>> Hey Iceberg devs:
>>>>
>>>> While implementing the overwrite option for registering an external
>>>> table (see PR12228), I realized we might want to evaluate the option to
>>>> add a new method *set(metadata)* on the TableOperations interface to
>>>> unconditionally set the latest table metadata. After some discussions
>>>> with Steven and Russell, I want to seek opinions from the community.
>>>>
>>>> Today, register-table uses the commit API from TableOperations under
>>>> the hood, but it might not work well with overwrite registration due to
>>>> the following assumptions:
>>>>
>>>> 1. commit(base, new) is designed to avoid overwriting updates and
>>>>    mandates that the provided base metadata is the same as the current
>>>>    metadata of the table. However, for overwrite registration, the end
>>>>    goal is to reset table metadata to a desired state, so we do not
>>>>    want to retry on failure even if the base table state changes.
>>>>
>>>> 2. commit(base, new) always writes a new metadata.json file on a
>>>>    successful commit. For a successful registration, the file that is
>>>>    passed in is the file that should be used. Previous workarounds
>>>>    (e.g., PR6591) reused user-provided metadata only for new tables,
>>>>    but cannot generalize to the overwrite case because they cannot
>>>>    differentiate a normal update from an overwrite registration at the
>>>>    TableOperations layer.
>>>>
>>>> 3. When a commit fails, the system attempts to delete the new metadata
>>>>    file, presuming it was authored by the committer. This is not
>>>>    appropriate for registration scenarios where the metadata file might
>>>>    have been generated elsewhere (see PR13169).
>>>>
>>>> One alternative, dropping and recreating the table, raises concerns
>>>> about atomicity, since it can leave the table in an invalid
>>>> intermediate state if a failure occurs between drop and creation.
>>>>
>>>> Given this, I would like to share the proposed new API to be added:
>>>>
>>>>   /**
>>>>    * Atomically set the provided table metadata as current,
>>>>    * bypassing base state checks.
>>>>    *
>>>>    * @param metadata the new table metadata to make current
>>>>    */
>>>>   void set(TableMetadata metadata);
>>>>
>>>> PR12228 Add overwrite option when register external table to catalog:
>>>> https://github.com/apache/iceberg/pull/12228
>>>>
>>>> PR6591 Avoid creating new metadata file when registerTable API is used:
>>>> https://github.com/apache/iceberg/pull/6591
>>>>
>>>> PR13169 Only remove metadata files if TableOp created a new one:
>>>> https://github.com/apache/iceberg/pull/13169
>>>>
>>>> Thanks,
>>>>
>>>> Hongyue Zhang