I had a few follow-up points.

1. *"(1) for hot partitions, users can try to only perform Convert and Rewrite to keep delete file sizes and count manageable, until the partition becomes cold and a Merge can be performed safely."*: I believe for the CDC use case it is hard to guarantee that partitions will turn cold and can be merged without conflicts, as 'hotness' is a factor of the mutation rate in the source DB, and perhaps some partitions are always "hot". So in essence the following seems important: *"Merge strategy that does not do any bin-packing, and only merges the delete files for each data file and writes it back. The new data file will have the same sequence number as the old file before Merge"*. Though as a follow-up, I am wondering why this strategy can't do bin-packing or sorting as well, if that is required, as long as the sequence number is not updated.
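To make the sequence-number concern concrete, here is a toy, self-contained sketch (plain Java; not the actual Iceberg implementation, and the file names and numbers are made up) of why a Merge that keeps the old sequence number stays compatible with equality deletes committed while the compaction is running, whereas a Merge whose output takes a new, higher sequence number would stop those deletes from applying:

```java
public class SequenceNumberExample {

    // Simplified stand-ins for Iceberg content files; not the real API.
    record DataFile(String path, long sequenceNumber) {}
    record EqualityDeleteFile(String path, long sequenceNumber) {}

    // Per the Iceberg v2 spec, an equality delete applies to a data file
    // only when the delete's sequence number is strictly greater than the
    // data file's sequence number.
    static boolean applies(EqualityDeleteFile delete, DataFile data) {
        return delete.sequenceNumber() > data.sequenceNumber();
    }

    public static void main(String[] args) {
        DataFile original = new DataFile("data-00.parquet", 5);

        // An equality delete committed by the CDC stream while the Merge is running.
        EqualityDeleteFile concurrent = new EqualityDeleteFile("eq-delete-07.parquet", 7);

        // Merge output that keeps the old sequence number: the concurrent delete still applies.
        DataFile mergedSameSeq = new DataFile("data-00-merged.parquet", 5);

        // Merge output stamped with a new, higher sequence number: the concurrent
        // delete no longer applies, so rows deleted by the stream would reappear.
        DataFile mergedNewSeq = new DataFile("data-00-merged.parquet", 9);

        System.out.println(applies(concurrent, original));      // true
        System.out.println(applies(concurrent, mergedSameSeq)); // true  -> still safe
        System.out.println(applies(concurrent, mergedNewSeq));  // false -> deletes silently lost
    }
}
```

My (unconfirmed) reading is that the constraint is about not advancing the sequence number rather than about the physical layout, so bin-packing or sorting a group of files that all share the same sequence number would seem to stay safe; packing files with different sequence numbers into one output is where a single output sequence number could no longer be correct for all of its rows.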
2. *"During the commit validation phase of a Merge operation, we need to verify that for each data file that would be removed, there are no new position deletes with higher sequence number added."*: Just to be clear, for tables only being written into by a Flink CDC pipeline this should not happen, as position deletes are only created for in-progress (uncommitted) data files, correct?

Thanks and regards,
- Puneet

On Thu, Oct 21, 2021 at 10:54 PM Jack Ye <[email protected]> wrote:

> Had some offline discussions on Slack and WeChat.
>
> For Russell's point, we are reconfirming with related people on Slack, and will post updates once we have an agreement.
>
> Regarding point 6, for Flink CDC the data file flushed to disk might be associated with position deletes, but after the flush all deletes will be equality deletes, so 6-2 still works. In any case, as long as data files for position deletes are not removed, the process should be able to succeed with optimistic retry. Currently we are missing the following pieces that need to be worked on to resolve the CDC performance issue:
> 1. We need to support setting the sequence number for individual content files.
> 2. During the commit validation phase of a Merge operation, we need to verify that for each data file that would be removed, there are no new position deletes with higher sequence number added. If detected, the merge of that file has to be completely retried (we can support incremental progress for this).
>
> -Jack
>
> On Thu, Oct 21, 2021 at 7:58 PM Russell Spitzer <[email protected]> wrote:
>
>> I think I understood the Rewrite strategy discussion a little differently.
>>
>> BinPackStrategy and SortStrategy each get a new flag which lets you pick files based on their number of delete files. So basically you can set a variety of parameters: small files, large files, files with deletes, etc.
>>
>> A new strategy is added which determines which files to rewrite by looking for all files currently touched by delete files. Instead of looking through files with X deletes, we look up all files affected by deletes and rewrite them. Although now as I write this, it's basically running the above strategies with number of delete files >= 1 and files per group at 1. So maybe it doesn't need another strategy?
>>
>> But maybe I got that wrong ...
>>
>> On Thu, Oct 21, 2021 at 8:39 PM Jack Ye <[email protected]> wrote:
>>
>>> Thanks to everyone who came to the meeting.
>>>
>>> Here is the full meeting recording I made:
>>> https://drive.google.com/file/d/1yuBFlNn9nkMlH9TIut2H8CXmJGLd18Sa/view?usp=sharing
>>>
>>> Here are some key takeaways:
>>>
>>> 1. We generally agreed upon the division of compactions into Rewrite, Convert and Merge.
>>>
>>> 2. Merge will be implemented through RewriteDataFiles as proposed in https://github.com/apache/iceberg/pull/3207, but instead as a new strategy extending the existing BinPackStrategy. For users who would also like to run a sort during Merge, we will have another delete strategy that extends the SortStrategy.
>>>
>>> 3. Merge can have an option that allows users to set the minimum number of delete files needed to trigger a compaction. However, that would result in very frequent compaction of the full partition if people add many global delete files. A Convert of global equality deletes to partition position deletes while maintaining the same sequence number can be used to solve the issue.
>>> Currently there is no way to write files with a custom sequence number; this functionality needs to be added.
>>>
>>> 4. We generally agreed upon the APIs for Rewrite and Convert at https://github.com/apache/iceberg/pull/2841.
>>>
>>> 5. We had some discussion around the separation of row- and partition-level filters. The general direction in the meeting is to just have a single filter method. We will sync offline to reach an agreement.
>>>
>>> 6. People raised the issue that if new delete files are added to a data file while a Merge is going on, then the Merge would fail. That causes huge performance issues for CDC streaming use cases and makes Merge very hard to succeed. There are 2 proposed solutions:
>>> (1) for hot partitions, users can try to only perform Convert and Rewrite to keep delete file sizes and count manageable, until the partition becomes cold and a Merge can be performed safely.
>>> (2) it looks like we need a Merge strategy that does not do any bin-packing, and only merges the delete files for each data file and writes it back. The new data file will have the same sequence number as the old file before Merge. By doing so, new delete files can still be applied safely and the compaction can succeed without concerns around conflicts. The caveat is that this does not work for position deletes, because the row positions change for each file after Merge. But for the CDC streaming use case it is acceptable to only write equality deletes, so this looks like a feasible approach.
>>>
>>> 7. People raised the concern about the memory consumption issue for the is_deleted metadata column. We ran out of time and will continue the discussion offline on Slack.
>>>
>>> Best,
>>> Jack Ye
>>>
>>> On Mon, Oct 18, 2021 at 7:50 PM Jack Ye <[email protected]> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> We are planning to have a meeting to discuss the design of Iceberg delete compaction on Thursday 5-6pm PDT. The meeting link is https://meet.google.com/nxx-nnvj-omx.
>>>>
>>>> We have also created the channel #compaction on Slack; please join the channel for daily discussions if you are interested in the progress.
>>>>
>>>> Best,
>>>> Jack Ye
>>>>
>>>> On Tue, Sep 28, 2021 at 10:23 PM Jack Ye <[email protected]> wrote:
>>>>
>>>>> Hi everyone,
>>>>>
>>>>> As more and more people are adopting the v2 spec, we are seeing an increasing number of requests for delete compaction support.
>>>>>
>>>>> Here is a document discussing the use cases and basic interface design, to get the community aligned around what compactions we would offer and how the interfaces would be divided:
>>>>> https://docs.google.com/document/d/1-EyKSfwd_W9iI5jrzAvomVw3w1mb_kayVNT7f2I-SUg
>>>>>
>>>>> Any feedback would be appreciated!
>>>>>
>>>>> Best,
>>>>> Jack Ye
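
P.S. To spell out the commit-time validation Jack describes above (his item 2), here is a rough, purely illustrative sketch in plain Java with made-up types; the real check would have to run against Iceberg's manifest metadata during the Merge commit rather than over in-memory collections:

```java
import java.util.Collection;
import java.util.List;
import java.util.Set;

public class MergeValidationSketch {

    // Simplified stand-ins for Iceberg metadata; not the real API.
    record DataFile(String path, long sequenceNumber) {}
    record PositionDeleteFile(String path, long sequenceNumber, Set<String> referencedDataFiles) {}

    // Data files selected for Merge that have gained a position delete with a
    // higher sequence number since the rewrite was planned; these cannot be
    // committed as-is and their groups would have to be retried.
    static List<DataFile> conflictingFiles(Collection<DataFile> filesToRemove,
                                           Collection<PositionDeleteFile> deletesAddedSincePlanning) {
        return filesToRemove.stream()
                .filter(data -> deletesAddedSincePlanning.stream()
                        .anyMatch(del -> del.referencedDataFiles().contains(data.path())
                                && del.sequenceNumber() > data.sequenceNumber()))
                .toList();
    }
}
```

If this turns up any files, those groups would need to be re-planned and retried, which I assume is where the incremental-progress support he mentions would help.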

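Similarly, Russell's description of selecting rewrite candidates by delete-file count boils down to something like the following toy filter (again plain Java with made-up types; the real strategies operate on Iceberg's scan task metadata rather than a plain map):

```java
import java.util.List;
import java.util.Map;

public class DeleteAwareSelectionSketch {

    record DataFile(String path, long sizeInBytes) {}

    // Pick files referenced by at least `minDeleteFiles` delete files. With
    // minDeleteFiles = 1 (and one file per group) this is effectively "rewrite
    // everything currently touched by deletes", which is Russell's observation
    // that a separate strategy may not be needed.
    static List<DataFile> selectForRewrite(Map<DataFile, Integer> deleteFileCounts, int minDeleteFiles) {
        return deleteFileCounts.entrySet().stream()
                .filter(entry -> entry.getValue() >= minDeleteFiles)
                .map(Map.Entry::getKey)
                .toList();
    }
}
```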