Thanks for bringing this up, Hongyue. I think the logic here makes sense and that `commit(base, new)` probably isn't a good API to use for `registerTable`. But my main objection is that I don't think that it makes sense to use `TableOperations` for this. Adding a `set` method is awkward because the table may not already exist.
Why not have catalogs implement `registerTable` directly? For instance, JDBC could run an INSERT query. Ryan On Tue, Jul 8, 2025 at 4:42 PM Steve <hongyue.apa...@gmail.com> wrote: > Hey Iceberg devs: > > While implementing the overwrite option for registering an external > table (see PR12228), I realized we might want to evaluate the option to add > a new method *set(metadata)* on TableOperations interfaces for > unconditionally set latest table metadata. After some discussions with > Steven and Russell, I want to seek opinions from the community. > > Today, the register-table under the hood use the commit API from > TableOperations, but it might not work well with overwrite registration due > to given assumptions > > 1. > > commit (base, new) is designed to avoid overwriting updates and > mandate the provided base metadata is the same as the current metadata of > the table. However for overwrite registration, the end goal is to reset > table metadata to a desired state, so we do not want to retry on failure > even if the base table state changes. > 2. > > commit(base, new) will always write new metadata.json files in case of > successful commit. For a successful registration. The file that is > being passed is the file that should be used for registration. > Previous workarounds (e.g., PR6591) reused user-provided metadata only for > new tables, but cannot generalize to the overwrite case as it cannot > differentiate normal update and overwrite registration at the > TableOperations layer. > 3. > > When committing fails, the system attempts to delete the new metadata > file, presuming it was authored by the committer. This is not appropriate > for registration scenarios where the metadata file might have been > generated elsewhere (see PR13169) > > > One alternative—dropping and recreating the table—raises concerns about > atomicity, since it can leave the table in an invalid intermediate state if > a failure occurs between drop and creation. > > Given this, I would like to share proposed new API to be add: > > * /*** > > * * *Atomically set the provided table metadata as current, bypassing > base state checks. > > * * @param metadata *the new table metadata to make current > > * */* > > * void set(TableMetadata metadata);* > > > PR12228 Add overwrite option when register external table to catalog: > https://github.com/apache/iceberg/pull/12228 > > PR6591 Avoid creating new metadata file when registerTable API is used: > https://github.com/apache/iceberg/pull/6591 > PR13169 Only remove metadata files if TableOp created a new one: > github.com/apache/iceberg/pull/13169 > > Thanks, > > Hongyue Zhang > > >