Thanks everyone for the discussion and the votes!

With my +1, the result is:
+1: 4 (binding), 4 (non-binding)
+0: 0 (binding), 0 (non-binding)
-1: 0 (binding), 0 (non-binding)
Since the proposal has passed, let us continue with the next steps:
- Rod is already working on the SinkV2 implementation (https://github.com/apache/iceberg/pull/10179)
- Monitor Source
- Trigger Manager
- Commit Converter

Thanks,
Peter

On Tue, May 7, 2024, 18:13 Péter Váry <peter.vary.apa...@gmail.com> wrote:

> Thanks Dan for your support.
>
> There was a longer discussion around what is more important/useful:
>
> - Some of the commenters were concerned about the resource usage and
> effects of the maintenance task to the Flink job checkpointing. These users
> prefer the `Separate Maintenance Job` solution, which is better at
> separating resource usage and separation of concerns.
> - Some of the commenters were planning to use it for well maintained
> tables where the expected resource usage is less of a concern. These users
> prefer the `Post Commit Maintenance` approach, with which they could reuse
> resources from the job doing the actual writes.
>
> Since both of the solutions are using the same building blocks, I have
> incorporated both approaches into the document, and plan to implement both
> of them.
> So I think we have a consensus here.
>
> At Flink, even if we have consensus during the discussion, it is required
> to start a vote. I am not sure what is the Iceberg approach here, but I
> think it is important to have a final validation for the framework before
> we start adding code.
>
> Thanks,
> Peter
>
> Daniel Weeks <dwe...@apache.org> wrote (on Tue, May 7, 2024, 17:36):
>
>> +1 for supporting more maintenance support in Flink
>>
>> Peter, just wondering if there is really any known opposition/dissenting
>> opinions or if you're just looking for general agreement on the path
>> forward?
>>
>> I would also agree with the single pipeline / post commit approach as
>> having to configure multiple jobs or scheduling is a lot of additional
>> infrastructure work to set up, so single feels like it provides the most
>> immediate value for the larger community.
>>
>> -Dan
>>
>> On Tue, May 7, 2024 at 6:32 AM Zhu Zhu <reed...@gmail.com> wrote:
>>
>>> +1
>>>
>>> Thanks,
>>> Zhu
>>>
>>> Jean-Baptiste Onofré <j...@nanthrax.net> wrote on Tue, May 7, 2024 at 16:17:
>>>
>>>> +1
>>>>
>>>> Regards
>>>> JB
>>>>
>>>> On Fri, May 3, 2024 at 8:30 PM Péter Váry <peter.vary.apa...@gmail.com>
>>>> wrote:
>>>> >
>>>> > Hi everyone,
>>>> >
>>>> > I would like to make a proposal [1] to support Flink Table
>>>> > Maintenance in Iceberg. The main goal is to have a solution where Flink can
>>>> > execute the Maintenance Tasks as part of the streaming job. Especially
>>>> > Rewrite Data Files, Rewrite Manifest Files and Expire Snapshots.
>>>> > The secondary goal is to provide building blocks for Flink batch jobs
>>>> > to execute the Maintenance Tasks independently, where the scheduling is
>>>> > done outside of Flink.
>>>> >
>>>> > This proposal is the outcome of extensive community discussions on
>>>> > the mailing list [2, 3].
>>>> >
>>>> > Please respond with your recommendation:
>>>> > +1 if you support moving forward with the two separate objects model.
>>>> > 0 if you are neutral.
>>>> > -1 if you disagree with the two separate objects model.
>>>> >
>>>> > Thanks,
>>>> > Peter
>>>> >
>>>> > [1] https://github.com/apache/iceberg/issues/10264
>>>> > [2] https://lists.apache.org/thread/yjcwbf1037jdq4prty6rtrrqmjzc71o0
>>>> > [3] https://lists.apache.org/thread/10mdf9zo6pn0dfq791nf4w1m7jh9k3sl
>>>>
>>>