I think that was my request before as well :) I want a Catalog api for "register" directly then each implementation can decide how that gets applied.
On Wed, Jul 9, 2025 at 4:13 PM Ryan Blue <rdb...@gmail.com> wrote: > Thanks for bringing this up, Hongyue. I think the logic here makes sense > and that `commit(base, new)` probably isn't a good API to use for > `registerTable`. But my main objection is that I don't think that it makes > sense to use `TableOperations` for this. Adding a `set` method is awkward > because the table may not already exist. > > Why not have catalogs implement `registerTable` directly? For instance, > JDBC could run an INSERT query. > > Ryan > > On Tue, Jul 8, 2025 at 4:42 PM Steve <hongyue.apa...@gmail.com> wrote: > >> Hey Iceberg devs: >> >> While implementing the overwrite option for registering an external >> table (see PR12228), I realized we might want to evaluate the option to add >> a new method *set(metadata)* on TableOperations interfaces for >> unconditionally set latest table metadata. After some discussions with >> Steven and Russell, I want to seek opinions from the community. >> >> Today, the register-table under the hood use the commit API from >> TableOperations, but it might not work well with overwrite registration due >> to given assumptions >> >> 1. >> >> commit (base, new) is designed to avoid overwriting updates and >> mandate the provided base metadata is the same as the current metadata of >> the table. However for overwrite registration, the end goal is to reset >> table metadata to a desired state, so we do not want to retry on failure >> even if the base table state changes. >> 2. >> >> commit(base, new) will always write new metadata.json files in case >> of successful commit. For a successful registration. The file that is >> being passed is the file that should be used for registration. >> Previous workarounds (e.g., PR6591) reused user-provided metadata only for >> new tables, but cannot generalize to the overwrite case as it cannot >> differentiate normal update and overwrite registration at the >> TableOperations layer. >> 3. >> >> When committing fails, the system attempts to delete the new metadata >> file, presuming it was authored by the committer. This is not appropriate >> for registration scenarios where the metadata file might have been >> generated elsewhere (see PR13169) >> >> >> One alternative—dropping and recreating the table—raises concerns about >> atomicity, since it can leave the table in an invalid intermediate state if >> a failure occurs between drop and creation. >> >> Given this, I would like to share proposed new API to be add: >> >> * /*** >> >> * * *Atomically set the provided table metadata as current, bypassing >> base state checks. >> >> * * @param metadata *the new table metadata to make current >> >> * */* >> >> * void set(TableMetadata metadata);* >> >> >> PR12228 Add overwrite option when register external table to catalog: >> https://github.com/apache/iceberg/pull/12228 >> >> PR6591 Avoid creating new metadata file when registerTable API is used: >> https://github.com/apache/iceberg/pull/6591 >> PR13169 Only remove metadata files if TableOp created a new one: >> github.com/apache/iceberg/pull/13169 >> >> Thanks, >> >> Hongyue Zhang >> >> >>