Hi, Sharing more details, the OperatorConrdinator is the part of the new Data Source API(Beta) involved in the Flink 1.11's release note[1].
Flink 1.11 was released only about half a year ago. The design of RFC-13 began at the end of 2019, and most of the implementation was completed when Flink 1.11 was released. I believe that the production environment of many large companies has not been upgraded so quickly (As far as our company is concerned, we still have some jobs running on flink release packages below 1.9). So, maybe we need to find a mechanism to benefit both new and old users. [1]: https://flink.apache.org/news/2020/07/06/release-1.11.0.html#new-data-source-api-beta Best, Vino vino yang <[email protected]> 于2021年1月5日周二 下午12:30写道: > Hi, > > +1, thank you Danny for introducing this new feature > (OperatorCoordinator)[1] of Flink in the recently latest version. > This feature is very helpful for improving the implementation mechanism of > Flink write-client. > > But this feature is only available after Flink 1.11. Before that, there > was no good way to realize the mechanism of task upstream and downstream > coordination through the public API provided by Flink. > I just have a concern, whether we need to take into account the users of > earlier versions (less than Flink 1.11). > > [1]: https://issues.apache.org/jira/browse/FLINK-15099 > > Best, > Vino > > Gary Li <[email protected]> 于2021年1月5日周二 上午10:40写道: > >> Hi Danny, >> >> Thanks for the proposal. I'd recommend starting a new RFC. RFC-13 was >> done and including some work about the refactoring so we should mark it as >> completed. Looking forward to having further discussion on the RFC. >> >> Best, >> Gary Li >> ________________________________ >> From: Danny Chan <[email protected]> >> Sent: Tuesday, January 5, 2021 10:22 AM >> To: [email protected] <[email protected]> >> Subject: Re: [DISCUSS] New Flink Writer Proposal >> >> Sure, i can update the RFC-13 cwiki if you agree with that. >> >> Vinoth Chandar <[email protected]> 于2021年1月5日周二 上午2:58写道: >> >> > Overall +1 on the idea. >> > >> > Danny, could we move this to the apache cwiki if you don't mind? >> > That's what we have been using for other RFC discussions. >> > >> > On Mon, Jan 4, 2021 at 1:22 AM Danny Chan <[email protected]> wrote: >> > >> > > The RFC-13 Flink writer has some bottlenecks that make it hard to >> adapter >> > > to production: >> > > >> > > - The InstantGeneratorOperator is parallelism 1, which is a limit for >> > > high-throughput consumption; because all the split inputs drain to a >> > single >> > > thread, the network IO would gains pressure too >> > > - The WriteProcessOperator handles inputs by partition, that means, >> > within >> > > each partition write process, the BUCKETs are written one by one, the >> > FILE >> > > IO is limit to adapter to high-throughput inputs >> > > - It buffers the data by checkpoints, which is too hard to be robust >> for >> > > production, the checkpoint function is blocking and should not have IO >> > > operations. >> > > - The FlinkHoodieIndex is only valid for a per-job scope, it does not >> > work >> > > for existing bootstrap data or for different Flink jobs >> > > >> > > Thus, here I propose a new design for the Flink writer to solve these >> > > problems[1]. Overall, the new design tries to remove the single >> > parallelism >> > > operators and make the index more powerful and scalable. >> > > >> > > I plan to solve these bottlenecks incrementally (4 steps), there are >> > > already some local POCs for these proposals. >> > > >> > > I'm looking forward to your feedback. Any suggestions are appreciated >> ~ >> > > >> > > [1] >> > > >> > > >> > >> https://apac01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdocs.google.com%2Fdocument%2Fd%2F1oOcU0VNwtEtZfTRt3v9z4xNQWY-Hy5beu7a1t5B-75I%2Fedit%3Fusp%3Dsharing&data=04%7C01%7C%7Cd256cf75a4f14db4c7f608d8b120d69c%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637454101880191121%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=Ecw3TcwsVPFFG74scaE7KhMsIryhVRn9g40B0yMQvfc%3D&reserved=0 >> > > >> > >> >
