Hi Huadong

From the perspective of the Iceberg developers, we don't expose format v2
to end users yet because we think there is still other work that needs to
be done. As you can see, there are still some unfinished issues in your
link. As for whether v2 will cause data loss: from my perspective as a
designer, semantics and correctness are handled very rigorously as long as
we don't do any compaction. Once we introduce the compaction action, we
run into this issue: https://github.com/apache/iceberg/issues/2308. We've
proposed a solution, but the community has not yet reached an agreement. I
would suggest using v2 in production only after we resolve at least this
issue.

On Sat, May 15, 2021 at 8:01 AM Huadong Liu <huadong...@gmail.com> wrote:

> Hi iceberg-dev,
>
> I tried v2 row-level deletion by committing equality delete files after
> *upgradeToFormatVersion(2)*. It worked well. I know that Spark actions to
> compact delete files and data files
> <https://github.com/apache/iceberg/milestone/4> etc. are in progress. I
> currently use the Java API to update, query, and do maintenance ops. I am
> not using Flink at the moment and I will definitely pick up Spark actions
> when they are completed. Deletions can be scheduled in batches (e.g.
> weekly) to control the volume of delete files. I want to get a sense of the
> risk level of losing data at some point because of v2 Spec/API changes if I
> start to use the v2 format now. It is not an easy question. Any input is
> appreciated.
>
> --
> Huadong
>
