Hey Maninder,

Thanks for raising this. For the issue that you're hitting
<https://github.com/apache/iceberg/issues/12738>, I have a PR that fixes
that <https://github.com/apache/iceberg/pull/11868>, and it would be great
to get in (maybe for 1.10.0? :).

I do see the confusion with the approach that Iceberg is taking, but my
main concern is that it essentially keeps time travel possible. It could be
that a client is still reading an older version of the table, so we need to
keep the metadata. Or that a process is reading a specific tag or branch of
the table, which would be deleted with the approach you're suggesting.

As suggested in the prior discussion
<https://lists.apache.org/thread/s0c2wjdztzsh7nf8wf570kycoxxpnql3>, I think
it would be good to add this to the implementation notes.

Kind regards,
Fokko

Op do 26 jun 2025 om 05:18 schreef Maninderjit Singh <
parmar.maninder...@gmail.com>:

> Hi dev community,
>
> I'd like to open a discussion around the current REPLACE TABLE
> transaction semantics, particularly how it's implemented via the REST
> catalog, and propose clarifying the spec to allow more flexible semantics
> to allow broader alignment with database and catalog vendors.
>
> Today, the replace table implementation
> <https://github.com/apache/iceberg/blob/2cdff366982b30685f6410c290cbd16aed274caf/core/src/main/java/org/apache/iceberg/rest/RESTTableOperations.java#L129>
>  for
> REST catalog uses the updateTable
> <https://github.com/apache/iceberg/blob/2cdff366982b30685f6410c290cbd16aed274caf/open-api/rest-catalog-open-api.yaml#L997>
> API which retains the old table metadata (including UUID, schema,
> properties, and snapshot), while resetting the current snapshot to -1.
> From SQL point of view, this acts as a (TRUNCATE [1
> <https://spark.apache.org/docs/latest/sql-ref-syntax-ddl-truncate-table.html>
> ][2 <https://docs.snowflake.com/en/sql-reference/sql/truncate-table>][3
> <https://trino.io/docs/current/sql/truncate.html>][4
> <https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/truncate/>]
> + Optional Schema evolution) rather than a commonly understood CREATE OR
> REPLACE table behavior used in other database systems which define it as
> drop and recreation of the table (eg: DuckDb
> <https://duckdb.org/docs/stable/sql/statements/create_table.html#:~:text=The%20CREATE%20OR%20REPLACE%20syntax,then%20creating%20the%20new%20one.>,
> Flink
> <https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/create/>,
> Snowflake <https://docs.snowflake.com/en/sql-reference/sql/create-table>,
> BigQuery
> <https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language#create_table_statement>,
> ClickHouse
> <https://clickhouse.com/docs/sql-reference/statements/create/table> etc).
> Similarly, CREATE OR REPLACE for other entities like views typically
> implies dropping and recreating the view [1
> <https://spark.apache.org/docs/latest/sql-ref-syntax-ddl-create-view.html>
> ][2 <https://docs.snowflake.com/en/sql-reference/sql/create-view>][3
> <https://trino.io/docs/current/sql/create-view.html>]
>
> There has been prior discussion
> <https://lists.apache.org/thread/bz4ohd1jbzctzdjlkn8d8fol3hzcfr29> regarding
> this issue where iceberg's REST catalog's behavior has been defined as
> "CREATE AND UPDATE" but there wasn't a clear consensus on whether this
> should be the only allowed behavior. There were also requests to allow
> additional flexibility since the current behavior is non-obvious to many
> database vendors and introduces more complexity than required for the REST
> catalog. For example, this github issue
> <https://github.com/apache/iceberg/issues/12738> is a good example of how
> subtle bugs can still exist with this implementation. I was able to refine
> this to create a 3 simple statement example that would break existing
> implementations.
>
> My ask from the community is to consider allowing more permissible
> definition of replace table where databases vendors could choose to
> implement it as drop + create. In future, we can consider creating a
> separate endpoint to do drop+create atomically in the REST catalog. This
> behavior has few advantages since it is very easy to explain to the end
> user, easier to implement for all the cases and is more aligned with many
> other database systems. For users that want to retain their snapshot
> history, they can still consider using TRUNCATE to get the desired behavior.
>
> I’d love to hear the community’s thoughts on whether we can update the
> implementation notes in the spec to allow having drop+create as an
> acceptable behavior.
>
> Thanks,
> Maninder
>
>
>
>
>

Reply via email to