Hey everyone,

I'm currently working on the INSERT OVERWRITE statement for Iceberg tables
in Impala.

Seems like ReplacePartitions is the perfect interface for this job:
https://github.infra.cloudera.com/CDH/iceberg/blob/cdpd-master/api/src/main/java/org/apache/iceberg/ReplacePartitions.java

IIUC Spark also uses this interface for dynamic overwrites. However, the
class comment says that this interface is not recommended and that
OverwriteFiles should be used instead. OverwriteFiles is more generic and
more explicit, but for this task it would need extra boilerplate code, which
leaves more room to do something wrong.

So my question is: is there any problem with using ReplacePartitions for
dynamic overwrites? I see that it only replaces partitions that use the
current partition spec, but that's probably fine, since handling partition
layout evolution together with dynamic inserts would be complicated and
error-prone anyway.
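
To be explicit about the semantics I have in mind for a dynamic overwrite,
here is a toy model in plain Java. It does not use the Iceberg API at all:
the class name, the method name and the partition-to-files map are made up
for illustration only. The idea is that every partition that receives new
files is replaced as a whole, and every other partition is left untouched.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: partitions are strings, data files are path strings.
// This is NOT Iceberg code, just the semantics I assume for dynamic overwrite.
public class DynamicOverwriteSketch {

    // Returns a new table state in which every partition present in 'added'
    // is fully replaced and all other partitions are copied unchanged.
    static Map<String, List<String>> replacePartitions(
            Map<String, List<String>> current,
            Map<String, List<String>> added) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : current.entrySet()) {
            result.put(e.getKey(), new ArrayList<>(e.getValue()));
        }
        // put() overwrites the old file list of a touched partition.
        for (Map.Entry<String, List<String>> e : added.entrySet()) {
            result.put(e.getKey(), new ArrayList<>(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> current = new LinkedHashMap<>();
        current.put("p=1", List.of("a.parquet"));
        current.put("p=2", List.of("b.parquet"));

        Map<String, List<String>> added = new LinkedHashMap<>();
        added.put("p=1", List.of("c.parquet"));
        added.put("p=3", List.of("d.parquet"));

        // p=1 is replaced, p=2 is kept, p=3 is newly created.
        System.out.println(replacePartitions(current, added));
    }
}
```

With OverwriteFiles, I assume we would have to build the equivalent of the
"touched partitions" set ourselves as a row filter, which is the extra
boilerplate I mentioned above.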

Apart from which interface to use, is there anything I should be aware of?
E.g. I guess we don't want to allow dynamic overwrites if the table is
partitioned by the BUCKET transform.

Thanks,
    Zoltan
