Thanks. Compaction is https://github.com/apache/iceberg/pull/2303, and it is
currently blocked by https://github.com/apache/iceberg/issues/2308, correct?

On Mon, May 17, 2021 at 6:17 PM OpenInx <[email protected]> wrote:

> Hi Huadong
>
> From the perspective of the Iceberg developers, we don't expose format v2
> to end users yet because there is still other work that needs to be done;
> as you can see, there are still some unfinished issues in your link.
> As for whether v2 will cause data loss: from my perspective as a designer,
> semantics and correctness are handled rigorously as long as no compaction
> is involved. Once we introduce the compaction action, we run into this
> issue: https://github.com/apache/iceberg/issues/2308. We have proposed a
> solution but have not yet reached agreement in the community. I would
> suggest using v2 in production only after we resolve at least that issue.
>
> On Sat, May 15, 2021 at 8:01 AM Huadong Liu <[email protected]> wrote:
>
>> Hi iceberg-dev,
>>
>> I tried v2 row-level deletion by committing equality delete files after
>> *upgradeToFormatVersion(2)*, and it worked well. I know that Spark actions
>> to compact delete files and data files
>> <https://github.com/apache/iceberg/milestone/4> etc. are in progress. I
>> currently use the Java API to update, query, and run maintenance ops. I am
>> not using Flink at the moment, and I will definitely pick up the Spark
>> actions when they are completed. Deletions can be scheduled in batches
>> (e.g. weekly) to control the volume of delete files. I want to get a sense
>> of the risk of losing data at some point due to v2 spec/API changes if I
>> start using the v2 format now. It is not an easy question; any input is
>> appreciated.
>>
>> --
>> Huadong
>>
>
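For reference, the flow Huadong describes (upgrading a table to format v2 and then committing equality delete files through the Java API) can be sketched roughly as below. This is a non-authoritative sketch against the Iceberg Java API of that era; the catalog, table identifier, and delete-file construction are hypothetical placeholders, not a complete runnable program.

```java
// Sketch only: catalog/tableIdentifier are assumed to exist in scope.
import org.apache.iceberg.BaseTable;
import org.apache.iceberg.DeleteFile;
import org.apache.iceberg.Table;
import org.apache.iceberg.TableMetadata;
import org.apache.iceberg.TableOperations;

// 1) Upgrade an existing v1 table to format version 2.
Table table = catalog.loadTable(tableIdentifier);  // hypothetical catalog
TableOperations ops = ((BaseTable) table).operations();
TableMetadata current = ops.current();
ops.commit(current, current.upgradeToFormatVersion(2));

// 2) Commit previously written equality delete files as a row delta.
DeleteFile eqDeletes = ...;  // e.g. built with an equality delete writer
table.newRowDelta()
    .addDeletes(eqDeletes)
    .commit();
```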
