Hey Iceberg devs: While implementing the overwrite option for registering an external table (see PR12228), I realized we might want to evaluate the option to add a new method *set(metadata)* on TableOperations interfaces for unconditionally set latest table metadata. After some discussions with Steven and Russell, I want to seek opinions from the community.
Today, the register-table under the hood use the commit API from TableOperations, but it might not work well with overwrite registration due to given assumptions 1. commit (base, new) is designed to avoid overwriting updates and mandate the provided base metadata is the same as the current metadata of the table. However for overwrite registration, the end goal is to reset table metadata to a desired state, so we do not want to retry on failure even if the base table state changes. 2. commit(base, new) will always write new metadata.json files in case of successful commit. For a successful registration. The file that is being passed is the file that should be used for registration. Previous workarounds (e.g., PR6591) reused user-provided metadata only for new tables, but cannot generalize to the overwrite case as it cannot differentiate normal update and overwrite registration at the TableOperations layer. 3. When committing fails, the system attempts to delete the new metadata file, presuming it was authored by the committer. This is not appropriate for registration scenarios where the metadata file might have been generated elsewhere (see PR13169) One alternative—dropping and recreating the table—raises concerns about atomicity, since it can leave the table in an invalid intermediate state if a failure occurs between drop and creation. Given this, I would like to share proposed new API to be add: * /*** * * *Atomically set the provided table metadata as current, bypassing base state checks. * * @param metadata *the new table metadata to make current * */* * void set(TableMetadata metadata);* PR12228 Add overwrite option when register external table to catalog: https://github.com/apache/iceberg/pull/12228 PR6591 Avoid creating new metadata file when registerTable API is used: https://github.com/apache/iceberg/pull/6591 PR13169 Only remove metadata files if TableOp created a new one: github.com/apache/iceberg/pull/13169 Thanks, Hongyue Zhang