Hi Danny, You should have cwiki edit permission now. Any problems let me know.
Best, Vino Danny Chan <danny0...@apache.org> 于2021年1月6日周三 下午12:05写道: > Sorry ~ > > Forget to say that my Confluence ID is danny0405. > > It would be nice if any of you can help on this. > > Best, > Danny Chan > > Danny Chan <danny0...@apache.org> 于2021年1月6日周三 下午12:00写道: > > > Hi, can someone give me the CWIKI permission so that i can update the > > design details to that (maybe as a new RFC though ~). > > > > wangxianghu <wxhj...@126.com> 于2021年1月5日周二 下午2:43写道: > > > >> + 1, Thanks Danny! > >> I believe this new feature OperatorConrdinator in flink-1.11 will help > >> improve the current implementation > >> > >> Best, > >> > >> XianghuWang > >> > >> At 2021-01-05 14:17:37, "vino yang" <yanghua1...@gmail.com> wrote: > >> >Hi, > >> > > >> >Sharing more details, the OperatorConrdinator is the part of the new > Data > >> >Source API(Beta) involved in the Flink 1.11's release note[1]. > >> > > >> >Flink 1.11 was released only about half a year ago. The design of > RFC-13 > >> >began at the end of 2019, and most of the implementation was completed > >> when > >> >Flink 1.11 was released. > >> > > >> >I believe that the production environment of many large companies has > not > >> >been upgraded so quickly (As far as our company is concerned, we still > >> have > >> >some jobs running on flink release packages below 1.9). > >> > > >> >So, maybe we need to find a mechanism to benefit both new and old > users. > >> > > >> >[1]: > >> > > >> > https://flink.apache.org/news/2020/07/06/release-1.11.0.html#new-data-source-api-beta > >> > > >> >Best, > >> >Vino > >> > > >> >vino yang <yanghua1...@gmail.com> 于2021年1月5日周二 下午12:30写道: > >> > > >> >> Hi, > >> >> > >> >> +1, thank you Danny for introducing this new feature > >> >> (OperatorCoordinator)[1] of Flink in the recently latest version. > >> >> This feature is very helpful for improving the implementation > >> mechanism of > >> >> Flink write-client. > >> >> > >> >> But this feature is only available after Flink 1.11. Before that, > there > >> >> was no good way to realize the mechanism of task upstream and > >> downstream > >> >> coordination through the public API provided by Flink. > >> >> I just have a concern, whether we need to take into account the users > >> of > >> >> earlier versions (less than Flink 1.11). > >> >> > >> >> [1]: https://issues.apache.org/jira/browse/FLINK-15099 > >> >> > >> >> Best, > >> >> Vino > >> >> > >> >> Gary Li <garyli1...@outlook.com> 于2021年1月5日周二 上午10:40写道: > >> >> > >> >>> Hi Danny, > >> >>> > >> >>> Thanks for the proposal. I'd recommend starting a new RFC. RFC-13 > was > >> >>> done and including some work about the refactoring so we should mark > >> it as > >> >>> completed. Looking forward to having further discussion on the RFC. > >> >>> > >> >>> Best, > >> >>> Gary Li > >> >>> ________________________________ > >> >>> From: Danny Chan <danny0...@apache.org> > >> >>> Sent: Tuesday, January 5, 2021 10:22 AM > >> >>> To: dev@hudi.apache.org <dev@hudi.apache.org> > >> >>> Subject: Re: [DISCUSS] New Flink Writer Proposal > >> >>> > >> >>> Sure, i can update the RFC-13 cwiki if you agree with that. > >> >>> > >> >>> Vinoth Chandar <vin...@apache.org> 于2021年1月5日周二 上午2:58写道: > >> >>> > >> >>> > Overall +1 on the idea. > >> >>> > > >> >>> > Danny, could we move this to the apache cwiki if you don't mind? > >> >>> > That's what we have been using for other RFC discussions. > >> >>> > > >> >>> > On Mon, Jan 4, 2021 at 1:22 AM Danny Chan <danny0...@apache.org> > >> wrote: > >> >>> > > >> >>> > > The RFC-13 Flink writer has some bottlenecks that make it hard > to > >> >>> adapter > >> >>> > > to production: > >> >>> > > > >> >>> > > - The InstantGeneratorOperator is parallelism 1, which is a > limit > >> for > >> >>> > > high-throughput consumption; because all the split inputs drain > >> to a > >> >>> > single > >> >>> > > thread, the network IO would gains pressure too > >> >>> > > - The WriteProcessOperator handles inputs by partition, that > >> means, > >> >>> > within > >> >>> > > each partition write process, the BUCKETs are written one by > one, > >> the > >> >>> > FILE > >> >>> > > IO is limit to adapter to high-throughput inputs > >> >>> > > - It buffers the data by checkpoints, which is too hard to be > >> robust > >> >>> for > >> >>> > > production, the checkpoint function is blocking and should not > >> have IO > >> >>> > > operations. > >> >>> > > - The FlinkHoodieIndex is only valid for a per-job scope, it > does > >> not > >> >>> > work > >> >>> > > for existing bootstrap data or for different Flink jobs > >> >>> > > > >> >>> > > Thus, here I propose a new design for the Flink writer to solve > >> these > >> >>> > > problems[1]. Overall, the new design tries to remove the single > >> >>> > parallelism > >> >>> > > operators and make the index more powerful and scalable. > >> >>> > > > >> >>> > > I plan to solve these bottlenecks incrementally (4 steps), there > >> are > >> >>> > > already some local POCs for these proposals. > >> >>> > > > >> >>> > > I'm looking forward to your feedback. Any suggestions are > >> appreciated > >> >>> ~ > >> >>> > > > >> >>> > > [1] > >> >>> > > > >> >>> > > > >> >>> > > >> >>> > >> > https://apac01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdocs.google.com%2Fdocument%2Fd%2F1oOcU0VNwtEtZfTRt3v9z4xNQWY-Hy5beu7a1t5B-75I%2Fedit%3Fusp%3Dsharing&data=04%7C01%7C%7Cd256cf75a4f14db4c7f608d8b120d69c%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637454101880191121%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=Ecw3TcwsVPFFG74scaE7KhMsIryhVRn9g40B0yMQvfc%3D&reserved=0 > >> >>> > > > >> >>> > > >> >>> > >> >> > >> > > >