I think that was my request before as well :) I want a Catalog api for
"register" directly then each implementation can decide how that gets
applied.

On Wed, Jul 9, 2025 at 4:13 PM Ryan Blue <rdb...@gmail.com> wrote:

> Thanks for bringing this up, Hongyue. I think the logic here makes sense
> and that `commit(base, new)` probably isn't a good API to use for
> `registerTable`. But my main objection is that I don't think that it makes
> sense to use `TableOperations` for this. Adding a `set` method is awkward
> because the table may not already exist.
>
> Why not have catalogs implement `registerTable` directly? For instance,
> JDBC could run an INSERT query.
>
> Ryan
>
> On Tue, Jul 8, 2025 at 4:42 PM Steve <hongyue.apa...@gmail.com> wrote:
>
>> Hey Iceberg devs:
>>
>>   While implementing the overwrite option for registering an external
>> table (see PR12228), I realized we might want to evaluate the option to add
>> a new method *set(metadata)* on TableOperations interfaces for
>> unconditionally set latest table metadata. After some discussions with
>> Steven and Russell, I want to seek opinions from the community.
>>
>> Today, the register-table under the hood use the commit API from
>> TableOperations, but it might not work well with overwrite registration due
>> to given assumptions
>>
>>    1.
>>
>>    commit (base, new) is designed to avoid overwriting updates and
>>    mandate the provided base metadata is the same as the current metadata of
>>    the table. However for overwrite registration, the end goal is to reset
>>    table metadata to a desired state, so we do not want to retry on failure
>>    even if the base table state changes.
>>    2.
>>
>>    commit(base, new) will always write new metadata.json files in case
>>    of successful commit. For a successful registration. The file that is
>>    being passed is the file that should be used for registration.
>>    Previous workarounds (e.g., PR6591) reused user-provided metadata only for
>>    new tables, but cannot generalize to the overwrite case as it cannot
>>    differentiate normal update and overwrite registration at the
>>    TableOperations layer.
>>    3.
>>
>>    When committing fails, the system attempts to delete the new metadata
>>    file, presuming it was authored by the committer. This is not appropriate
>>    for registration scenarios where the metadata file might have been
>>    generated elsewhere (see PR13169)
>>
>>
>> One alternative—dropping and recreating the table—raises concerns about
>> atomicity, since it can leave the table in an invalid intermediate state if
>> a failure occurs between drop and creation.
>>
>> Given this, I would like to share proposed new API to be add:
>>
>> *  /***
>>
>> *   * *Atomically set the provided table metadata as current, bypassing
>> base state checks.
>>
>> *   * @param metadata *the new table metadata to make current
>>
>> *   */*
>>
>> *  void set(TableMetadata metadata);*
>>
>>
>> PR12228 Add overwrite option when register external table to catalog:
>> https://github.com/apache/iceberg/pull/12228
>>
>> PR6591 Avoid creating new metadata file when registerTable API is used:
>> https://github.com/apache/iceberg/pull/6591
>> PR13169 Only remove metadata files if TableOp created a new one:
>> github.com/apache/iceberg/pull/13169
>>
>> Thanks,
>>
>> Hongyue Zhang
>>
>>
>>

Reply via email to