Hi Team,

I found that Iceberg's "overwrite" behaves differently from Spark's built-in
sources like Parquet. Iceberg's "overwrite" semantics look more like an
"upsert": partitions that the new data does not touch are kept rather than
deleted.

I would like to know the purpose of this design choice. Also, if I want
Spark Parquet's "overwrite" semantics with an Iceberg table, how can I get
that behavior?

Warning

*Spark does not define the behavior of DataFrame overwrite*. Like most
sources, Iceberg will dynamically overwrite partitions when the dataframe
contains rows in a partition. Unpartitioned tables are completely
overwritten.
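
For reference, with Spark 3's DataFrameWriterV2 the two intents can be
spelled out explicitly instead of relying on mode("overwrite"). A sketch,
again using the hypothetical table name "db.events":

import org.apache.spark.sql.functions.lit

// Dynamic overwrite: replace only the partitions that have rows in the
// incoming dataframe (the behavior the warning above describes).
df.writeTo("db.events").overwritePartitions()

// Full overwrite: lit(true) matches every existing row, so the whole
// table is replaced, mirroring the built-in Parquet source's behavior.
df.writeTo("db.events").overwrite(lit(true))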

Best regards,
Saisai
