Thanks. To confirm: compaction is https://github.com/apache/iceberg/pull/2303, and it is currently blocked by https://github.com/apache/iceberg/issues/2308?
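For context, the upgrade-then-delete flow Huadong describes below can be sketched with the Iceberg Java API roughly as follows. This is a minimal, untested sketch: the table loading, the delete-file path, field id, and file statistics are placeholder assumptions, not details from this thread.

```java
import org.apache.iceberg.BaseTable;
import org.apache.iceberg.DeleteFile;
import org.apache.iceberg.FileMetadata;
import org.apache.iceberg.Table;
import org.apache.iceberg.TableMetadata;
import org.apache.iceberg.TableOperations;

// Assumes `table` has already been loaded from a catalog elsewhere.
Table table = /* catalog.loadTable(...) */ null;

// Upgrade the table metadata from format v1 to v2.
TableOperations ops = ((BaseTable) table).operations();
TableMetadata base = ops.current();
ops.commit(base, base.upgradeToFormatVersion(2));

// Build and commit an equality delete file (path, field id, and
// size/record counts below are placeholders for illustration).
DeleteFile deletes = FileMetadata.deleteFileBuilder(table.spec())
    .ofEqualityDeletes(1)  // field id(s) of the equality column(s)
    .withPath("s3://bucket/table/data/deletes-00000.parquet")
    .withFileSizeInBytes(1024L)
    .withRecordCount(10L)
    .build();
table.newRowDelta().addDeletes(deletes).commit();
```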
On Mon, May 17, 2021 at 6:17 PM OpenInx <[email protected]> wrote:

> Hi Huadong,
>
> From the perspective of the Iceberg developers, we don't expose format v2
> to end users yet because there is still other work that needs to be done;
> as you can see, some issues from your link remain unfinished. As for
> whether v2 will cause data loss: from my perspective as a designer,
> semantics and correctness are handled very rigorously as long as no
> compaction is involved. Once we introduce the compaction action, we will
> encounter this issue: https://github.com/apache/iceberg/issues/2308.
> We've proposed a solution but have not yet reached agreement in the
> community. I would suggest using v2 in production only after we resolve
> at least this issue.
>
> On Sat, May 15, 2021 at 8:01 AM Huadong Liu <[email protected]> wrote:
>
>> Hi iceberg-dev,
>>
>> I tried v2 row-level deletion by committing equality delete files after
>> *upgradeToFormatVersion(2)*, and it worked well. I know that Spark
>> actions to compact delete files and data files
>> <https://github.com/apache/iceberg/milestone/4> etc. are in progress. I
>> currently use the Java API for updates, queries, and maintenance
>> operations. I am not using Flink at the moment, and I will definitely
>> pick up the Spark actions when they are completed. Deletions can be
>> scheduled in batches (e.g. weekly) to control the volume of delete
>> files. I want to get a sense of the risk of losing data at some point
>> due to v2 spec/API changes if I start using the v2 format now. It is not
>> an easy question; any input is appreciated.
>>
>> --
>> Huadong
