Re: [Proposal] Add support for Flink Maintenance in Iceberg

Péter Váry Wed, 08 May 2024 22:38:18 -0700

Thanks everyone for the discussion and the votes!

With my +1, the result is:
+1: 4 (binding), 4 (non-binding)
+0: 0 (binding), 0 (non-binding)
-1: 0 (binding), 0 (non-binding)


Since the proposal has passed, let us continue with the next steps:
- Rod is already working on the SinkV2 implementation
  (https://github.com/apache/iceberg/pull/10179)
- Monitor Source
- Trigger Manager
- Commit Converter

Thanks,
Peter

On Tue, May 7, 2024, 18:13 Péter Váry <[email protected]> wrote:

> Thanks Dan for your support.
>
> There was a longer discussion around what is more important/useful:
>
>    - Some of the commenters were concerned about the resource usage and
>    effects of the maintenance task on Flink job checkpointing. These users
>    prefer the `Separate Maintenance Job` solution, which better isolates
>    resource usage and keeps concerns separate.
>    - Some of the commenters were planning to use it for well-maintained
>    tables where the expected resource usage is less of a concern. These users
>    prefer the `Post Commit Maintenance` approach, with which they could reuse
>    resources from the job doing the actual writes.
>
> Since both of the solutions are using the same building blocks, I have
> incorporated both approaches into the document, and plan to implement both
> of them.
> So I think we have a consensus here.
>
> In Flink, even if we reach consensus during the discussion, a vote is still
> required. I am not sure what the Iceberg approach is here, but I
> think it is important to have a final validation for the framework before
> we start adding code.
>
> Thanks,
> Peter
>
> Daniel Weeks <[email protected]> wrote (on Tue, May 7, 2024, 17:36):
>
>> +1 for adding more maintenance support in Flink
>>
>> Peter, just wondering if there is really any known opposition/dissenting
>> opinions or if you're just looking for general agreement on the path
>> forward?
>>
>> I would also agree with the single pipeline / post commit approach as
>> having to configure multiple jobs or scheduling is a lot of additional
>> infrastructure work to set up, so single feels like it provides the most
>> immediate value for the larger community.
>>
>> -Dan
>>
>> On Tue, May 7, 2024 at 6:32 AM Zhu Zhu <[email protected]> wrote:
>>
>>> +1
>>>
>>> Thanks,
>>> Zhu
>>>
>>> Jean-Baptiste Onofré <[email protected]> wrote on Tue, May 7, 2024 at 16:17:
>>>
>>>> +1
>>>>
>>>> Regards
>>>> JB
>>>>
>>>> On Fri, May 3, 2024 at 8:30 PM Péter Váry <[email protected]>
>>>> wrote:
>>>> >
>>>> > Hi everyone,
>>>> >
>>>> > I would like to make a proposal [1] to support Flink Table Maintenance
>>>> > in Iceberg. The main goal is to have a solution where Flink can execute
>>>> > the Maintenance Tasks as part of the streaming job, especially Rewrite
>>>> > Data Files, Rewrite Manifest Files and Expire Snapshots.
>>>> > The secondary goal is to provide building blocks for Flink batch jobs
>>>> > to execute the Maintenance Tasks independently, where the scheduling is
>>>> > done outside of Flink.
>>>> >
>>>> > This proposal is the outcome of extensive community discussions on
>>>> > the mailing list [2, 3].
>>>> >
>>>> > Please respond with your recommendation:
>>>> > +1 if you support moving forward with the two separate objects model.
>>>> > 0 if you are neutral.
>>>> > -1 if you disagree with the two separate objects model.
>>>> >
>>>> > Thanks,
>>>> > Peter
>>>> >
>>>> > [1] https://github.com/apache/iceberg/issues/10264
>>>> > [2] https://lists.apache.org/thread/yjcwbf1037jdq4prty6rtrrqmjzc71o0
>>>> > [3] https://lists.apache.org/thread/10mdf9zo6pn0dfq791nf4w1m7jh9k3sl
>>>>
>>>
