Sorry ~ I forgot to mention that my Confluence ID is danny0405.
It would be nice if any of you can help with this.

Best,
Danny Chan

Danny Chan <[email protected]> wrote on Wed, Jan 6, 2021 at 12:00 PM:

> Hi, can someone give me the CWIKI permission so that I can update the
> design details there (maybe as a new RFC though ~).
>
> wangxianghu <[email protected]> wrote on Tue, Jan 5, 2021 at 2:43 PM:
>
>> +1, Thanks Danny!
>> I believe the new OperatorCoordinator feature in flink-1.11 will help
>> improve the current implementation.
>>
>> Best,
>>
>> XianghuWang
>>
>> At 2021-01-05 14:17:37, "vino yang" <[email protected]> wrote:
>> >Hi,
>> >
>> >Sharing more details: the OperatorCoordinator is part of the new Data
>> >Source API (Beta) covered in the Flink 1.11 release notes[1].
>> >
>> >Flink 1.11 was released only about half a year ago. The design of RFC-13
>> >began at the end of 2019, and most of the implementation had been
>> >completed by the time Flink 1.11 was released.
>> >
>> >I believe the production environments of many large companies have not
>> >been upgraded that quickly (at our company, we still have some jobs
>> >running on Flink releases below 1.9).
>> >
>> >So, maybe we need to find a mechanism to benefit both new and old users.
>> >
>> >[1]:
>> >https://flink.apache.org/news/2020/07/06/release-1.11.0.html#new-data-source-api-beta
>> >
>> >Best,
>> >Vino
>> >
>> >vino yang <[email protected]> wrote on Tue, Jan 5, 2021 at 12:30 PM:
>> >
>> >> Hi,
>> >>
>> >> +1, thank you Danny for introducing this new feature
>> >> (OperatorCoordinator)[1] from a recent Flink version.
>> >> This feature is very helpful for improving the implementation
>> >> mechanism of the Flink write client.
>> >>
>> >> But this feature is only available from Flink 1.11 onward. Before
>> >> that, there was no good way to coordinate upstream and downstream
>> >> tasks through the public API provided by Flink.
>> >>
>> >> I just have one concern: whether we need to take into account the
>> >> users of earlier versions (before Flink 1.11).
>> >>
>> >> [1]: https://issues.apache.org/jira/browse/FLINK-15099
>> >>
>> >> Best,
>> >> Vino
>> >>
>> >> Gary Li <[email protected]> wrote on Tue, Jan 5, 2021 at 10:40 AM:
>> >>
>> >>> Hi Danny,
>> >>>
>> >>> Thanks for the proposal. I'd recommend starting a new RFC. RFC-13 is
>> >>> done and included some refactoring work, so we should mark it as
>> >>> completed. Looking forward to further discussion on the RFC.
>> >>>
>> >>> Best,
>> >>> Gary Li
>> >>> ________________________________
>> >>> From: Danny Chan <[email protected]>
>> >>> Sent: Tuesday, January 5, 2021 10:22 AM
>> >>> To: [email protected] <[email protected]>
>> >>> Subject: Re: [DISCUSS] New Flink Writer Proposal
>> >>>
>> >>> Sure, I can update the RFC-13 cwiki if you agree with that.
>> >>>
>> >>> Vinoth Chandar <[email protected]> wrote on Tue, Jan 5, 2021 at 2:58 AM:
>> >>>
>> >>> > Overall +1 on the idea.
>> >>> >
>> >>> > Danny, could we move this to the apache cwiki if you don't mind?
>> >>> > That's what we have been using for other RFC discussions.
>> >>> >
>> >>> > On Mon, Jan 4, 2021 at 1:22 AM Danny Chan <[email protected]> wrote:
>> >>> >
>> >>> > > The RFC-13 Flink writer has some bottlenecks that make it hard to
>> >>> > > adapt to production:
>> >>> > >
>> >>> > > - The InstantGeneratorOperator has parallelism 1, which limits
>> >>> > > high-throughput consumption; because all the split inputs drain
>> >>> > > to a single thread, the network IO comes under pressure too.
>> >>> > > - The WriteProcessOperator handles inputs by partition, which
>> >>> > > means that within each partition write process the BUCKETs are
>> >>> > > written one by one, so the file IO is too limited to keep up
>> >>> > > with high-throughput inputs.
>> >>> > > - It buffers the data by checkpoints, which is hard to make
>> >>> > > robust for production; the checkpoint function is blocking and
>> >>> > > should not have IO operations.
>> >>> > > - The FlinkHoodieIndex is only valid for a per-job scope; it does
>> >>> > > not work for existing bootstrap data or for different Flink jobs.
>> >>> > >
>> >>> > > Thus, here I propose a new design for the Flink writer to solve
>> >>> > > these problems[1]. Overall, the new design tries to remove the
>> >>> > > single-parallelism operators and make the index more powerful
>> >>> > > and scalable.
>> >>> > >
>> >>> > > I plan to solve these bottlenecks incrementally (in 4 steps);
>> >>> > > there are already some local POCs for these proposals.
>> >>> > >
>> >>> > > I'm looking forward to your feedback. Any suggestions are
>> >>> > > appreciated ~
>> >>> > >
>> >>> > > [1]
>> >>> > > https://apac01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdocs.google.com%2Fdocument%2Fd%2F1oOcU0VNwtEtZfTRt3v9z4xNQWY-Hy5beu7a1t5B-75I%2Fedit%3Fusp%3Dsharing&data=04%7C01%7C%7Cd256cf75a4f14db4c7f608d8b120d69c%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637454101880191121%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=Ecw3TcwsVPFFG74scaE7KhMsIryhVRn9g40B0yMQvfc%3D&reserved=0
