Thanks Till for the comments. We will implement a new adaptive parallelism supported scheduler in the new schedule framework. Based on these schedule interfaces, we could do the work in parallel.
On Tue, Apr 16, 2019 at 11:18 PM Till Rohrmann <trohrm...@apache.org> wrote: > Hi Bo Wang, > > thanks for proposing this design document. I think it is an interesting > idea to improve Flink's execution efficiency. > > At the moment, the community is actively working on making Flink's > scheduler pluggable. Once this is possible, we could try this feature out > by implementing a scheduler which supports adaptive parallelism without > affecting the existing code. I think this would be a nice approach to > further evaluate and benchmark the implications of such a strategy. What do > you think? > > Cheers, > Till > > On Mon, Apr 8, 2019 at 10:28 AM Bo WANG <wbeaglewatc...@gmail.com> wrote: > > > Hi all, > > In distribution computing system, execution parallelism is vital for both > > resource efficiency and execution performance. In Flink, execution > > parallelism is a pre-specified parameter, which is usually an empirical > > value and thus might not be optimal on the various amount of data > processed > > by each task. > > > > Furthermore, a fixed parallelism cannot scale to varying data size, which > > is common in production cluster, since we may not frequently change the > > cluster configuration. > > > > Thus, we propose adaptively determine the execution parallelism of each > > vertex at runtime based on the actual input data size and an ideal data > > size processed by each task. The ideal data size is a pre-specified > > parameter according to the property of the operator. > > > > The design doc is ready: > > > > > https://docs.google.com/document/d/1ZxnoJ3SOxUk1PL2xC1t-kepq28Pg20IL6eVKUUOWSKY/edit?usp=sharing > > , > > any comments are highly appreciated. > > >