Thanks for your answer, Ryan.
In the short term we'll only have INSERT INTO/OVERWRITE for Icebeg tables
in Impala (so the way to go is AppendFiles and ReplacePartitions
accordingly).
But I agree that a MERGE INTO statement would be super useful. Hopefully
we'll add support for it as well in the not too distant future.

Thanks again,
Zoltan



On Fri, Jan 29, 2021 at 8:19 PM Ryan Blue <rb...@netflix.com.invalid> wrote:

> Zoltan,
>
> The warning is that dynamic overwrites in general aren't recommended.
> ReplacePartitions is the right operation to use for dynamic overwrite, we
> just want to steer users away from dynamic overwrites in general.
>
> The problem with dynamic overwrite is that its behavior depends on the
> underlying physical structure of the table. If your table is partitioned by
> days, then it will overwrite by day. If it is partitioned by hours, it will
> overwrite by hour. That means that the behavior changes when table
> partitioning changes. That's a bad thing because we want SQL operations to
> be logical operations that do not depend on the table layout.
>
> OverwriteFiles is a better option because what gets overwritten is
> explicit. If you want to overwrite a day, you pass a filter for that day.
> Another way around this problem is to support MERGE INTO, which will detect
> the files that need to be changed and correctly rewrite them, wherever they
> are in the table.
>
> rb
>
> On Fri, Jan 29, 2021 at 10:14 AM Zoltán Borók-Nagy
> <borokna...@cloudera.com.invalid> wrote:
>
>> Hey everyone,
>>
>> I'm currently working on the INSERT OVERWRITE statement for Iceberg
>> tables in Impala.
>>
>> Seems like ReplacePartitions is the perfect interface for this job:
>> https://github.infra.cloudera.com/CDH/iceberg/blob/cdpd-master/api/src/main/java/org/apache/iceberg/ReplacePartitions.java
>>
>> IIUC Spark also uses this interface for dynamic overwrites. Though the
>> class comment says that this interface is not recommended to use, and use
>> OverwriteFiles instead. OverwriteFiles is more generic and more explicit,
>> but for this task it would need extra boilerplate code, therefore more
>> possibilities to do stg wrong.
>>
>> So my question is, is there any problem with using ReplacePartitions for
>> dynamic overwrites? I see that it only replaces partitions with the current
>> partition spec, but that's probably fine. Otherwise handling partition
>> layout evolution and dynamic inserts can be complicated and error-prone
>> anyway.
>>
>> Apart from which interface to use, is there anything I should be aware
>> of? E.g. I guess we don't want to allow dynamic overwrites if the table is
>> partitioned by the BUCKET transform.
>>
>> Thanks,
>>     Zoltan
>>
>>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>

Reply via email to