Sure, I can update the RFC-13 cwiki if you agree with that.

On Tue, Jan 5, 2021 at 2:58 AM Vinoth Chandar <vin...@apache.org> wrote:

> Overall +1 on the idea.
>
> Danny, could we move this to the apache cwiki if you don't mind?
> That's what we have been using for other RFC discussions.
>
> On Mon, Jan 4, 2021 at 1:22 AM Danny Chan <danny0...@apache.org> wrote:
>
> > The RFC-13 Flink writer has some bottlenecks that make it hard to adapt
> > to production:
> >
> > - The InstantGeneratorOperator has parallelism 1, which limits
> > high-throughput consumption; because all the split inputs drain to a
> > single thread, the network IO comes under pressure too
> > - The WriteProcessOperator handles inputs by partition, which means that
> > within each partition write process the BUCKETs are written one by one,
> > so the file IO is too limited to keep up with high-throughput inputs
> > - It buffers the data by checkpoints, which is hard to make robust for
> > production; the checkpoint function is blocking and should not perform
> > IO operations
> > - The FlinkHoodieIndex is only valid within a per-job scope; it does not
> > work for existing bootstrap data or for different Flink jobs
> >
> > Thus, here I propose a new design for the Flink writer to solve these
> > problems [1]. Overall, the new design tries to remove the
> > single-parallelism operators and make the index more powerful and
> > scalable.
> >
> > I plan to solve these bottlenecks incrementally (in 4 steps); there are
> > already some local POCs for these proposals.
> >
> > I'm looking forward to your feedback. Any suggestions are appreciated ~
> >
> > [1]
> > https://docs.google.com/document/d/1oOcU0VNwtEtZfTRt3v9z4xNQWY-Hy5beu7a1t5B-75I/edit?usp=sharing
> >
>
