Hi Team,

I found that Iceberg's "overwrite" behaves differently from Spark's built-in
sources like Parquet. Iceberg's "overwrite" semantics look more like an
"upsert": partitions that the new data does not touch are kept rather than
deleted.

I would like to know the purpose of this design choice. Also, if I want
Spark Parquet's "overwrite" semantics with an Iceberg table, how can I get
that behavior?

Warning

*Spark does not define the behavior of DataFrame overwrite*. Like most
sources, Iceberg will dynamically overwrite partitions when the dataframe
contains rows in a partition. Unpartitioned tables are completely
overwritten.
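
For reference, with Spark 3's DataFrameWriterV2 the two intents can be
spelled out explicitly instead of relying on mode("overwrite"). A sketch,
again using the hypothetical table name "db.events":

import org.apache.spark.sql.functions.lit

// Dynamic overwrite: replace only the partitions that have rows in the
// incoming dataframe (the behavior the warning above describes).
df.writeTo("db.events").overwritePartitions()

// Full overwrite: lit(true) matches every existing row, so the whole
// table is replaced, mirroring the built-in Parquet source's behavior.
df.writeTo("db.events").overwrite(lit(true))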

Best regards,
Saisai
