I had a few follow-up points.

1. *"(1) for hot partitions, users can try to only perform Convert and Rewrite to keep delete file sizes and count manageable, until the partition becomes cold and a Merge can be performed safely."*: I believe for the CDC use case it is hard to guarantee that partitions will turn cold and can be merged without conflicts, as 'hotness' is a factor of the mutation rate in the source DB, and perhaps some partitions are always "hot". So in essence the following seems important: *"Merge strategy that does not do any bin-packing, and only merges the delete files for each data file and writes it back. The new data file will have the same sequence number as the old file before Merge"*. Though as a follow-up, I am wondering why this strategy can't do bin-packing or sorting as well, if that is required, as long as the sequence number is not updated.
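To make the sequence-number concern concrete, here is a toy, self-contained sketch (plain Java; not the actual Iceberg implementation, and the file names and numbers are made up) of why a Merge that keeps the old sequence number stays compatible with equality deletes committed while the compaction is running, whereas a Merge whose output takes a new, higher sequence number would stop those deletes from applying:

```java
public class SequenceNumberExample {

    // Simplified stand-ins for Iceberg content files; not the real API.
    record DataFile(String path, long sequenceNumber) {}
    record EqualityDeleteFile(String path, long sequenceNumber) {}

    // Per the Iceberg v2 spec, an equality delete applies to a data file
    // only when the delete's sequence number is strictly greater than the
    // data file's sequence number.
    static boolean applies(EqualityDeleteFile delete, DataFile data) {
        return delete.sequenceNumber() > data.sequenceNumber();
    }

    public static void main(String[] args) {
        DataFile original = new DataFile("data-00.parquet", 5);

        // An equality delete committed by the CDC stream while the Merge is running.
        EqualityDeleteFile concurrent = new EqualityDeleteFile("eq-delete-07.parquet", 7);

        // Merge output that keeps the old sequence number: the concurrent delete still applies.
        DataFile mergedSameSeq = new DataFile("data-00-merged.parquet", 5);

        // Merge output stamped with a new, higher sequence number: the concurrent
        // delete no longer applies, so rows deleted by the stream would reappear.
        DataFile mergedNewSeq = new DataFile("data-00-merged.parquet", 9);

        System.out.println(applies(concurrent, original));      // true
        System.out.println(applies(concurrent, mergedSameSeq)); // true  -> still safe
        System.out.println(applies(concurrent, mergedNewSeq));  // false -> deletes silently lost
    }
}
```

My (unconfirmed) reading is that the constraint is about not advancing the sequence number rather than about the physical layout, so bin-packing or sorting a group of files that all share the same sequence number would seem to stay safe; packing files with different sequence numbers into one output is where a single output sequence number could no longer be correct for all of its rows.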
2. *"During the commit validation phase of a Merge operation, we need to verify that for each data file that would be removed, there are no new position deletes with higher sequence number added."*: Just to be clear, for tables only being written into by a Flink CDC pipeline this should not happen, as position deletes are only created for in-progress (uncommitted) data files, correct?

Thanks and regards,
- Puneet

On Thu, Oct 21, 2021 at 10:54 PM Jack Ye <[email protected]> wrote:

> Had some offline discussions on Slack and WeChat.
>
> For Russell's point, we are reconfirming with related people on Slack, and will post updates once we have an agreement.
>
> Regarding point 6, for Flink CDC the data file flushed to disk might be associated with position deletes, but after the flush all deletes will be equality deletes, so 6-2 still works. In any case, as long as data files for position deletes are not removed, the process should be able to succeed with optimistic retry. Currently we are missing the following pieces that need to be worked on to resolve the CDC performance issue:
> 1. We need to support setting the sequence number for individual content files.
> 2. During the commit validation phase of a Merge operation, we need to verify that for each data file that would be removed, there are no new position deletes with higher sequence number added. If detected, the merge of that file has to be completely retried (we can support incremental progress for this).
>
> -Jack
>
> On Thu, Oct 21, 2021 at 7:58 PM Russell Spitzer <[email protected]> wrote:
>
>> I think I understood the Rewrite strategy discussion a little differently.
>>
>> BinPackStrategy and SortStrategy each get a new flag which lets you pick files based on their number of delete files. So basically you can set a variety of parameters: small files, large files, files with deletes, etc.
>>
>> A new strategy is added which determines which files to rewrite by looking for all files currently touched by delete files. Instead of looking through files with X deletes, we look up all files affected by deletes and rewrite them. Although now as I write this, it's basically running the above strategies with number of delete files >= 1 and files per group at 1. So maybe it doesn't need another strategy?
>>
>> But maybe I got that wrong ...
>>
>> On Thu, Oct 21, 2021 at 8:39 PM Jack Ye <[email protected]> wrote:
>>
>>> Thanks to everyone who came to the meeting.
>>>
>>> Here is the full meeting recording I made:
>>> https://drive.google.com/file/d/1yuBFlNn9nkMlH9TIut2H8CXmJGLd18Sa/view?usp=sharing
>>>
>>> Here are some key takeaways:
>>>
>>> 1. We generally agreed upon the division of compactions into Rewrite, Convert and Merge.
>>>
>>> 2. Merge will be implemented through RewriteDataFiles as proposed in https://github.com/apache/iceberg/pull/3207, but instead as a new strategy extending the existing BinPackStrategy. For users who would also like to run a sort during Merge, we will have another delete strategy that extends the SortStrategy.
>>>
>>> 3. Merge can have an option that allows users to set the minimum number of delete files needed to trigger a compaction. However, that would result in very frequent compaction of the full partition if people add many global delete files. A Convert of global equality deletes to partition position deletes while maintaining the same sequence number can be used to solve the issue.
>>> Currently there is no way to write files with a custom sequence number; this functionality needs to be added.
>>>
>>> 4. We generally agreed upon the APIs for Rewrite and Convert at https://github.com/apache/iceberg/pull/2841.
>>>
>>> 5. We had some discussion around the separation of row- and partition-level filters. The general direction in the meeting is to just have a single filter method. We will sync offline to reach an agreement.
>>>
>>> 6. People raised the issue that if new delete files are added to a data file while a Merge is going on, then the Merge would fail. That causes huge performance issues for CDC streaming use cases and makes Merge very hard to succeed. There are 2 proposed solutions:
>>> (1) for hot partitions, users can try to only perform Convert and Rewrite to keep delete file sizes and count manageable, until the partition becomes cold and a Merge can be performed safely.
>>> (2) it looks like we need a Merge strategy that does not do any bin-packing, and only merges the delete files for each data file and writes it back. The new data file will have the same sequence number as the old file before Merge. By doing so, new delete files can still be applied safely and the compaction can succeed without concerns around conflicts. The caveat is that this does not work for position deletes, because the row positions change for each file after Merge. But for the CDC streaming use case it is acceptable to only write equality deletes, so this looks like a feasible approach.
>>>
>>> 7. People raised the concern about the memory consumption issue for the is_deleted metadata column. We ran out of time and will continue the discussion offline on Slack.
>>>
>>> Best,
>>> Jack Ye
>>>
>>> On Mon, Oct 18, 2021 at 7:50 PM Jack Ye <[email protected]> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> We are planning to have a meeting to discuss the design of Iceberg delete compaction on Thursday 5-6pm PDT. The meeting link is https://meet.google.com/nxx-nnvj-omx.
>>>>
>>>> We have also created the channel #compaction on Slack; please join the channel for daily discussions if you are interested in the progress.
>>>>
>>>> Best,
>>>> Jack Ye
>>>>
>>>> On Tue, Sep 28, 2021 at 10:23 PM Jack Ye <[email protected]> wrote:
>>>>
>>>>> Hi everyone,
>>>>>
>>>>> As more and more people are adopting the v2 spec, we are seeing an increasing number of requests for delete compaction support.
>>>>>
>>>>> Here is a document discussing the use cases and basic interface design, to get the community aligned around what compactions we would offer and how the interfaces would be divided:
>>>>> https://docs.google.com/document/d/1-EyKSfwd_W9iI5jrzAvomVw3w1mb_kayVNT7f2I-SUg
>>>>>
>>>>> Any feedback would be appreciated!
>>>>>
>>>>> Best,
>>>>> Jack Ye
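
P.S. To spell out the commit-time validation Jack describes above (his item 2), here is a rough, purely illustrative sketch in plain Java with made-up types; the real check would have to run against Iceberg's manifest metadata during the Merge commit rather than over in-memory collections:

```java
import java.util.Collection;
import java.util.List;
import java.util.Set;

public class MergeValidationSketch {

    // Simplified stand-ins for Iceberg metadata; not the real API.
    record DataFile(String path, long sequenceNumber) {}
    record PositionDeleteFile(String path, long sequenceNumber, Set<String> referencedDataFiles) {}

    // Data files selected for Merge that have gained a position delete with a
    // higher sequence number since the rewrite was planned; these cannot be
    // committed as-is and their groups would have to be retried.
    static List<DataFile> conflictingFiles(Collection<DataFile> filesToRemove,
                                           Collection<PositionDeleteFile> deletesAddedSincePlanning) {
        return filesToRemove.stream()
                .filter(data -> deletesAddedSincePlanning.stream()
                        .anyMatch(del -> del.referencedDataFiles().contains(data.path())
                                && del.sequenceNumber() > data.sequenceNumber()))
                .toList();
    }
}
```

If this turns up any files, those groups would need to be re-planned and retried, which I assume is where the incremental-progress support he mentions would help.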

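Similarly, Russell's description of selecting rewrite candidates by delete-file count boils down to something like the following toy filter (again plain Java with made-up types; the real strategies operate on Iceberg's scan task metadata rather than a plain map):

```java
import java.util.List;
import java.util.Map;

public class DeleteAwareSelectionSketch {

    record DataFile(String path, long sizeInBytes) {}

    // Pick files referenced by at least `minDeleteFiles` delete files. With
    // minDeleteFiles = 1 (and one file per group) this is effectively "rewrite
    // everything currently touched by deletes", which is Russell's observation
    // that a separate strategy may not be needed.
    static List<DataFile> selectForRewrite(Map<DataFile, Integer> deleteFileCounts, int minDeleteFiles) {
        return deleteFileCounts.entrySet().stream()
                .filter(entry -> entry.getValue() >= minDeleteFiles)
                .map(Map.Entry::getKey)
                .toList();
    }
}
```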