Hi Danny, Thanks for the proposal. I'd recommend starting a new RFC. RFC-13 was done and including some work about the refactoring so we should mark it as completed. Looking forward to having further discussion on the RFC.
Best, Gary Li ________________________________ From: Danny Chan <[email protected]> Sent: Tuesday, January 5, 2021 10:22 AM To: [email protected] <[email protected]> Subject: Re: [DISCUSS] New Flink Writer Proposal Sure, i can update the RFC-13 cwiki if you agree with that. Vinoth Chandar <[email protected]> 于2021年1月5日周二 上午2:58写道: > Overall +1 on the idea. > > Danny, could we move this to the apache cwiki if you don't mind? > That's what we have been using for other RFC discussions. > > On Mon, Jan 4, 2021 at 1:22 AM Danny Chan <[email protected]> wrote: > > > The RFC-13 Flink writer has some bottlenecks that make it hard to adapter > > to production: > > > > - The InstantGeneratorOperator is parallelism 1, which is a limit for > > high-throughput consumption; because all the split inputs drain to a > single > > thread, the network IO would gains pressure too > > - The WriteProcessOperator handles inputs by partition, that means, > within > > each partition write process, the BUCKETs are written one by one, the > FILE > > IO is limit to adapter to high-throughput inputs > > - It buffers the data by checkpoints, which is too hard to be robust for > > production, the checkpoint function is blocking and should not have IO > > operations. > > - The FlinkHoodieIndex is only valid for a per-job scope, it does not > work > > for existing bootstrap data or for different Flink jobs > > > > Thus, here I propose a new design for the Flink writer to solve these > > problems[1]. Overall, the new design tries to remove the single > parallelism > > operators and make the index more powerful and scalable. > > > > I plan to solve these bottlenecks incrementally (4 steps), there are > > already some local POCs for these proposals. > > > > I'm looking forward to your feedback. Any suggestions are appreciated ~ > > > > [1] > > > > > https://apac01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdocs.google.com%2Fdocument%2Fd%2F1oOcU0VNwtEtZfTRt3v9z4xNQWY-Hy5beu7a1t5B-75I%2Fedit%3Fusp%3Dsharing&data=04%7C01%7C%7Cd256cf75a4f14db4c7f608d8b120d69c%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637454101880191121%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=Ecw3TcwsVPFFG74scaE7KhMsIryhVRn9g40B0yMQvfc%3D&reserved=0 > > >
