[ https://issues.apache.org/jira/browse/FLINK-10240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhu Zhu updated FLINK-10240:
----------------------------
    Summary: Pluggable scheduling strategy for batch job  (was: Flexible scheduling strategy is needed for batch job)

> Pluggable scheduling strategy for batch job
> -------------------------------------------
>
>                 Key: FLINK-10240
>                 URL: https://issues.apache.org/jira/browse/FLINK-10240
>             Project: Flink
>          Issue Type: New Feature
>          Components: Distributed Coordination
>            Reporter: Zhu Zhu
>            Priority: Major
>              Labels: scheduling
>
> Currently, batch jobs are scheduled with the LAZY_FROM_SOURCES strategy: source
> tasks are scheduled at the beginning, and other tasks are scheduled once
> their input data is consumable.
> However, input data being consumable does not always mean the task can start
> working right away.
>
> One example is the hash join operation, where the operator first consumes one
> side (we call it the build side) to set up a hash table, then consumes the other
> side (we call it the probe side) to do the actual join work. If the probe side is
> started early, it just gets stuck on back pressure, because the join operator
> will not consume data from it until the build stage is done, which wastes
> resources.
> If the probe side task is started only after the build stage is done, both
> the build and probe sides can get more computing resources, since they are
> staggered.
>
> That's why we think a flexible scheduling strategy is needed, allowing job
> owners to customize the vertex scheduling order and constraints. Better
> resource utilization usually means better performance.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
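The build-before-probe constraint from the issue description can be illustrated with a minimal Java sketch under a toy model. All names here (`SchedulingSketch`, `SchedulingStrategy`, `canSchedule`, `startAfterFinished`, `schedulable`, and the vertex names `buildSource`, `probeSource`, `join`) are hypothetical and are not Flink's scheduler API. The sketch simplifies LAZY_FROM_SOURCES to "sources start at once, other vertices start when any input has consumable data", and approximates "the build stage is done" as "the build-side source vertex has FINISHED".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model, not Flink's actual scheduler API.
public class SchedulingSketch {

    enum State { CREATED, RUNNING, FINISHED }

    // A job vertex plus the names of the vertices it consumes from.
    static final class Vertex {
        final String name;
        final List<String> inputs;
        State state = State.CREATED;
        boolean hasConsumableOutput = false; // true once downstream tasks could read its data

        Vertex(String name, String... inputs) {
            this.name = name;
            this.inputs = Arrays.asList(inputs);
        }
    }

    // The pluggable part: may this not-yet-started vertex be scheduled now?
    interface SchedulingStrategy {
        boolean canSchedule(Vertex v, Map<String, Vertex> graph);
    }

    // Simplified LAZY_FROM_SOURCES: sources start immediately, other vertices
    // as soon as any of their inputs has consumable data.
    static final SchedulingStrategy LAZY_FROM_SOURCES = (v, g) ->
            v.inputs.isEmpty()
                    || v.inputs.stream().anyMatch(in -> g.get(in).hasConsumableOutput);

    // Wraps a base strategy with a user constraint:
    // `vertex` may only start after `mustFinishFirst` has FINISHED.
    static SchedulingStrategy startAfterFinished(
            SchedulingStrategy base, String vertex, String mustFinishFirst) {
        return (v, g) -> base.canSchedule(v, g)
                && (!v.name.equals(vertex) || g.get(mustFinishFirst).state == State.FINISHED);
    }

    // Names of CREATED vertices the strategy would deploy now, in graph order.
    static List<String> schedulable(SchedulingStrategy s, Map<String, Vertex> graph) {
        List<String> result = new ArrayList<>();
        for (Vertex v : graph.values()) {
            if (v.state == State.CREATED && s.canSchedule(v, graph)) {
                result.add(v.name);
            }
        }
        return result;
    }

    // Hash join: one task reads the build side first, then the probe side.
    static Map<String, Vertex> hashJoinGraph() {
        Map<String, Vertex> g = new LinkedHashMap<>();
        for (Vertex v : Arrays.asList(new Vertex("buildSource"), new Vertex("probeSource"),
                new Vertex("join", "buildSource", "probeSource"))) {
            g.put(v.name, v);
        }
        return g;
    }

    public static void main(String[] args) {
        Map<String, Vertex> g = hashJoinGraph();
        SchedulingStrategy staggered =
                startAfterFinished(LAZY_FROM_SOURCES, "probeSource", "buildSource");

        // The lazy strategy starts both sources; the constrained one holds back the probe side.
        System.out.println(schedulable(LAZY_FROM_SOURCES, g)); // [buildSource, probeSource]
        System.out.println(schedulable(staggered, g));         // [buildSource]

        // Build side running with consumable output: the join may start, the probe side still waits.
        g.get("buildSource").state = State.RUNNING;
        g.get("buildSource").hasConsumableOutput = true;
        System.out.println(schedulable(staggered, g));         // [join]

        // Build side finished: only now is the probe side started, so it does not sit on back pressure.
        g.get("join").state = State.RUNNING;
        g.get("buildSource").state = State.FINISHED;
        System.out.println(schedulable(staggered, g));         // [probeSource]
    }
}
```

The design choice here is to keep the base strategy unchanged and layer the job owner's ordering constraint on top as a wrapper, which is one way the "customize the vertex scheduling order and constraints" idea from the issue could be expressed; the real Flink design may differ.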