[ 
https://issues.apache.org/jira/browse/FLINK-10240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-10240:
----------------------------
    Description: 
Currently in Flink we have 2 schedule mode:

1. EAGER mode starts all tasks at once, mainly for streaming job

2. LAZY_FROM_SOURCES mode starts a task once its input data is consumable

 

However, in batch job, input data ready does not always mean the task can work 
at once. 

One example is the hash join operation, where the operator first consumes one 
side(we call it build side) to setup a table, then consumes the other side(we 
call it probe side) to do the real join work. If the probe side is started 
early, it just get stuck on back pressure as the join operator will not consume 
data from it before the building stage is done, causing a waste of resources.

If we have the probe side task started after the build stage is done, both the 
build and probe side can have more computing resources as they are staggered.

 

That's why we think a flexible scheduling strategy is needed, allowing job 
owners to customize the vertex schedule order and constraints. Better resource 
utilization usually means better performance.

  was:
Currently in Flink we have 2 schedule mode:

1. EAGER mode starts all tasks at once, mainly for streaming job

2. LAZY_FROM_SOURCES mode starts a task once its input data is consumable

 

However, in batch job, input data ready does not always mean the task can work 
at once. 

One example is the hash join operation, where the operator first consumes one 
side(we call it build side) to setup a table, then consumes the other side(we 
call it probe side) to do the real join work. If the probe side is started 
early, it just get stuck on back pressure as the join operator will not consume 
data from it before the building stage is done, causing a waste of resources.

If we have the probe side task start after the build stage is done, both the 
build and probe side can have more computing resources as they are staggered.

 

That's way we think a flexible scheduling strategy is needed, allowing job 
owners to customize the vertex schedule order and constraints. Better resource 
utilization usually means better performance.


> Flexible scheduling strategy is needed for batch job
> ----------------------------------------------------
>
>                 Key: FLINK-10240
>                 URL: https://issues.apache.org/jira/browse/FLINK-10240
>             Project: Flink
>          Issue Type: New Feature
>          Components: Distributed Coordination
>            Reporter: Zhu Zhu
>            Priority: Major
>              Labels: scheduling
>
> Currently in Flink we have 2 schedule mode:
> 1. EAGER mode starts all tasks at once, mainly for streaming job
> 2. LAZY_FROM_SOURCES mode starts a task once its input data is consumable
>  
> However, in batch job, input data ready does not always mean the task can 
> work at once. 
> One example is the hash join operation, where the operator first consumes one 
> side(we call it build side) to setup a table, then consumes the other side(we 
> call it probe side) to do the real join work. If the probe side is started 
> early, it just get stuck on back pressure as the join operator will not 
> consume data from it before the building stage is done, causing a waste of 
> resources.
> If we have the probe side task started after the build stage is done, both 
> the build and probe side can have more computing resources as they are 
> staggered.
>  
> That's why we think a flexible scheduling strategy is needed, allowing job 
> owners to customize the vertex schedule order and constraints. Better 
> resource utilization usually means better performance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to