Re: Re: [DISCUSS] New Flink Writer Proposal

vino yang Tue, 05 Jan 2021 21:38:39 -0800

Hi Danny,

You should have cwiki edit permission now.
Any problems let me know.


Best,
Vino

Danny Chan <[email protected]> 于2021年1月6日周三 下午12:05写道：

> Sorry ~
>
> Forget to say that my Confluence ID is danny0405.
>
> It would be nice if any of you can help on this.
>
> Best,
> Danny Chan
>
> Danny Chan <[email protected]> 于2021年1月6日周三 下午12:00写道：
>
> > Hi, can someone give me the CWIKI permission so that i can update the
> > design details to that (maybe as a new RFC though ~).
> >
> > wangxianghu <[email protected]> 于2021年1月5日周二 下午2:43写道：
> >
> >> + 1, Thanks Danny!
> >> I believe this new feature OperatorConrdinator in flink-1.11 will help
> >> improve the current implementation
> >>
> >> Best,
> >>
> >> XianghuWang
> >>
> >> At 2021-01-05 14:17:37, "vino yang" <[email protected]> wrote:
> >> >Hi,
> >> >
> >> >Sharing more details, the OperatorConrdinator is the part of the new
> Data
> >> >Source API(Beta) involved in the Flink 1.11's release note[1].
> >> >
> >> >Flink 1.11 was released only about half a year ago. The design of
> RFC-13
> >> >began at the end of 2019, and most of the implementation was completed
> >> when
> >> >Flink 1.11 was released.
> >> >
> >> >I believe that the production environment of many large companies has
> not
> >> >been upgraded so quickly (As far as our company is concerned, we still
> >> have
> >> >some jobs running on flink release packages below 1.9).
> >> >
> >> >So, maybe we need to find a mechanism to benefit both new and old
> users.
> >> >
> >> >[1]:
> >> >
> >>
> https://flink.apache.org/news/2020/07/06/release-1.11.0.html#new-data-source-api-beta
> >> >
> >> >Best,
> >> >Vino
> >> >
> >> >vino yang <[email protected]> 于2021年1月5日周二 下午12:30写道：
> >> >
> >> >> Hi,
> >> >>
> >> >> +1, thank you Danny for introducing this new feature
> >> >> (OperatorCoordinator)[1] of Flink in the recently latest version.
> >> >> This feature is very helpful for improving the implementation
> >> mechanism of
> >> >> Flink write-client.
> >> >>
> >> >> But this feature is only available after Flink 1.11. Before that,
> there
> >> >> was no good way to realize the mechanism of task upstream and
> >> downstream
> >> >> coordination through the public API provided by Flink.
> >> >> I just have a concern, whether we need to take into account the users
> >> of
> >> >> earlier versions (less than Flink 1.11).
> >> >>
> >> >> [1]: https://issues.apache.org/jira/browse/FLINK-15099
> >> >>
> >> >> Best,
> >> >> Vino
> >> >>
> >> >> Gary Li <[email protected]> 于2021年1月5日周二 上午10:40写道：
> >> >>
> >> >>> Hi Danny,
> >> >>>
> >> >>> Thanks for the proposal. I'd recommend starting a new RFC. RFC-13
> was
> >> >>> done and including some work about the refactoring so we should mark
> >> it as
> >> >>> completed. Looking forward to having further discussion on the RFC.
> >> >>>
> >> >>> Best,
> >> >>> Gary Li
> >> >>> ________________________________
> >> >>> From: Danny Chan <[email protected]>
> >> >>> Sent: Tuesday, January 5, 2021 10:22 AM
> >> >>> To: [email protected] <[email protected]>
> >> >>> Subject: Re: [DISCUSS] New Flink Writer Proposal
> >> >>>
> >> >>> Sure, i can update the RFC-13 cwiki if you agree with that.
> >> >>>
> >> >>> Vinoth Chandar <[email protected]> 于2021年1月5日周二 上午2:58写道：
> >> >>>
> >> >>> > Overall +1 on the idea.
> >> >>> >
> >> >>> > Danny, could we move this to the apache cwiki if you don't mind?
> >> >>> > That's what we have been using for other RFC discussions.
> >> >>> >
> >> >>> > On Mon, Jan 4, 2021 at 1:22 AM Danny Chan <[email protected]>
> >> wrote:
> >> >>> >
> >> >>> > > The RFC-13 Flink writer has some bottlenecks that make it hard
> to
> >> >>> adapter
> >> >>> > > to production:
> >> >>> > >
> >> >>> > > - The InstantGeneratorOperator is parallelism 1, which is a
> limit
> >> for
> >> >>> > > high-throughput consumption; because all the split inputs drain
> >> to a
> >> >>> > single
> >> >>> > > thread, the network IO would gains pressure too
> >> >>> > > - The WriteProcessOperator handles inputs by partition, that
> >> means,
> >> >>> > within
> >> >>> > > each partition write process, the BUCKETs are written one by
> one,
> >> the
> >> >>> > FILE
> >> >>> > > IO is limit to adapter to high-throughput inputs
> >> >>> > > - It buffers the data by checkpoints, which is too hard to be
> >> robust
> >> >>> for
> >> >>> > > production, the checkpoint function is blocking and should not
> >> have IO
> >> >>> > > operations.
> >> >>> > > - The FlinkHoodieIndex is only valid for a per-job scope, it
> does
> >> not
> >> >>> > work
> >> >>> > > for existing bootstrap data or for different Flink jobs
> >> >>> > >
> >> >>> > > Thus, here I propose a new design for the Flink writer to solve
> >> these
> >> >>> > > problems[1]. Overall, the new design tries to remove the single
> >> >>> > parallelism
> >> >>> > > operators and make the index more powerful and scalable.
> >> >>> > >
> >> >>> > > I plan to solve these bottlenecks incrementally (4 steps), there
> >> are
> >> >>> > > already some local POCs for these proposals.
> >> >>> > >
> >> >>> > > I'm looking forward to your feedback. Any suggestions are
> >> appreciated
> >> >>> ~
> >> >>> > >
> >> >>> > > [1]
> >> >>> > >
> >> >>> > >
> >> >>> >
> >> >>>
> >>
> https://apac01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdocs.google.com%2Fdocument%2Fd%2F1oOcU0VNwtEtZfTRt3v9z4xNQWY-Hy5beu7a1t5B-75I%2Fedit%3Fusp%3Dsharing&amp;data=04%7C01%7C%7Cd256cf75a4f14db4c7f608d8b120d69c%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637454101880191121%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=Ecw3TcwsVPFFG74scaE7KhMsIryhVRn9g40B0yMQvfc%3D&amp;reserved=0
> >> >>> > >
> >> >>> >
> >> >>>
> >> >>
> >>
> >
>

Re: Re: [DISCUSS] New Flink Writer Proposal

Reply via email to