Hey Iceberg devs:

  While implementing the overwrite option for registering an external table
(see PR12228), I realized we might want to evaluate the option to add a new
method *set(metadata)* on TableOperations interfaces for unconditionally
set latest table metadata. After some discussions with Steven and Russell,
I want to seek opinions from the community.

Today, the register-table under the hood use the commit API from
TableOperations, but it might not work well with overwrite registration due
to given assumptions

   1.

   commit (base, new) is designed to avoid overwriting updates and mandate
   the provided base metadata is the same as the current metadata of the
   table. However for overwrite registration, the end goal is to reset table
   metadata to a desired state, so we do not want to retry on failure even if
   the base table state changes.
   2.

   commit(base, new) will always write new metadata.json files in case of
   successful commit. For a successful registration. The file that is being
   passed is the file that should be used for registration. Previous
   workarounds (e.g., PR6591) reused user-provided metadata only for new
   tables, but cannot generalize to the overwrite case as it cannot
   differentiate normal update and overwrite registration at the
   TableOperations layer.
   3.

   When committing fails, the system attempts to delete the new metadata
   file, presuming it was authored by the committer. This is not appropriate
   for registration scenarios where the metadata file might have been
   generated elsewhere (see PR13169)


One alternative—dropping and recreating the table—raises concerns about
atomicity, since it can leave the table in an invalid intermediate state if
a failure occurs between drop and creation.

Given this, I would like to share proposed new API to be add:

*  /***

*   * *Atomically set the provided table metadata as current, bypassing
base state checks.

*   * @param metadata *the new table metadata to make current

*   */*

*  void set(TableMetadata metadata);*


PR12228 Add overwrite option when register external table to catalog:
https://github.com/apache/iceberg/pull/12228

PR6591 Avoid creating new metadata file when registerTable API is used:
https://github.com/apache/iceberg/pull/6591
PR13169 Only remove metadata files if TableOp created a new one:
github.com/apache/iceberg/pull/13169

Thanks,

Hongyue Zhang

Reply via email to