Re: [DISCUSS] Adaptive Parallelism of Job Vertex

Bo WANG Wed, 17 Apr 2019 04:55:14 -0700

Thanks Till for the comments.
We will implement a new scheduler that supports adaptive parallelism in the
new scheduling framework. Based on the new scheduling interfaces, we could
do this work in parallel.


On Tue, Apr 16, 2019 at 11:18 PM Till Rohrmann <[email protected]> wrote:

> Hi Bo Wang,
>
> thanks for proposing this design document. I think it is an interesting
> idea to improve Flink's execution efficiency.
>
> At the moment, the community is actively working on making Flink's
> scheduler pluggable. Once this is possible, we could try this feature out
> by implementing a scheduler which supports adaptive parallelism without
> affecting the existing code. I think this would be a nice approach to
> further evaluate and benchmark the implications of such a strategy. What do
> you think?
>
> Cheers,
> Till
>
> On Mon, Apr 8, 2019 at 10:28 AM Bo WANG <[email protected]> wrote:
>
> > Hi all,
> > In distributed computing systems, execution parallelism is vital for both
> > resource efficiency and execution performance. In Flink, execution
> > parallelism is a pre-specified parameter, which is usually an empirical
> > value and thus might not be optimal for the varying amount of data
> > processed by each task.
> >
> > Furthermore, a fixed parallelism cannot scale to varying data sizes, which
> > are common in production clusters, since we may not frequently change the
> > cluster configuration.
> >
> > Thus, we propose to adaptively determine the execution parallelism of each
> > vertex at runtime based on the actual input data size and an ideal data
> > size processed by each task. The ideal data size is a pre-specified
> > parameter according to the property of the operator.
> >
> > The design doc is ready:
> >
> > https://docs.google.com/document/d/1ZxnoJ3SOxUk1PL2xC1t-kepq28Pg20IL6eVKUUOWSKY/edit?usp=sharing
> >
> > Any comments are highly appreciated.
> >
>
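
For illustration only, the rule described in the proposal (parallelism derived from the actual input data size and an ideal data size per task) could be sketched roughly as below. This is not taken from the design doc or from Flink: the class name `AdaptiveParallelism`, the method `decideParallelism`, the `maxParallelism` cap, and the rounding-up behaviour are all assumptions made for this sketch.

```java
/**
 * Hypothetical sketch of an adaptive parallelism decision.
 * Not Flink API; all names here are assumptions for illustration.
 */
public final class AdaptiveParallelism {

    private AdaptiveParallelism() {}

    /**
     * Returns ceil(inputBytes / idealBytesPerTask), clamped to [1, maxParallelism].
     *
     * @param inputBytes        actual input data size of the vertex, known at runtime
     * @param idealBytesPerTask pre-specified ideal data size processed by each task
     * @param maxParallelism    assumed upper bound (not part of the original proposal text)
     */
    public static int decideParallelism(long inputBytes, long idealBytesPerTask, int maxParallelism) {
        if (inputBytes < 0 || idealBytesPerTask <= 0 || maxParallelism < 1) {
            throw new IllegalArgumentException("invalid sizes or parallelism bound");
        }
        // Ceiling division written to avoid overflow for large inputs.
        long tasks = inputBytes / idealBytesPerTask
                + (inputBytes % idealBytesPerTask == 0 ? 0 : 1);
        // At least one task, even for empty input; never more than the cap.
        return (int) Math.max(1L, Math.min(tasks, (long) maxParallelism));
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        // 10000 MiB of input with an ideal 256 MiB per task.
        System.out.println(decideParallelism(10_000 * mib, 256 * mib, 128));
    }
}
```

With these example numbers the sketch yields 40 tasks (10000 / 256 rounded up). How the ideal size would be chosen per operator is left to the design doc.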
