[jira] [Closed] (FLINK-28131) FLIP-168: Speculative Execution for Batch Job

2022-09-26 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-28131.
---
Release Note: 
Speculative execution (FLIP-168) is introduced in Flink 1.16 to mitigate batch 
job slowness caused by problematic nodes. A problematic node may have hardware 
problems, accidental I/O busyness, or high CPU load. These problems can make 
the tasks hosted on it run much slower than tasks on other nodes and lengthen 
the overall execution time of a batch job.

When speculative execution is enabled, Flink continuously detects slow tasks. 
Once slow tasks are detected, the nodes on which they are located are 
identified as problematic nodes and blocked via the blocklist mechanism 
(FLIP-224). The scheduler creates new attempts for the slow tasks and deploys 
them to nodes that are not blocked, while the existing attempts keep running. 
The new attempts process the same input data and produce the same data as the 
original attempt. The first attempt to finish is admitted as the only finished 
attempt of the task, and the remaining attempts of the task are canceled.
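
The "first attempt to finish is admitted, the rest are canceled" rule can be 
illustrated with a small standalone Java sketch. It is not Flink's scheduler 
code: the class and method names are hypothetical, the attempts are simulated 
with sleeps, and the JDK's ExecutorService.invokeAny stands in for the 
admit-and-cancel step.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SpeculativeAttemptsSketch {

    /** Simulates one attempt that needs {@code millis} to produce the task's result. */
    public static Callable<String> attempt(String name, long millis) {
        return () -> {
            Thread.sleep(millis); // interrupted if another attempt finishes first
            return name;
        };
    }

    /** Runs all attempts concurrently and returns the name of the admitted one. */
    public static String runAttempts(List<Callable<String>> attempts) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(attempts.size());
        try {
            // invokeAny returns the result of one attempt that completed
            // successfully and cancels the attempts that have not completed.
            return pool.invokeAny(attempts);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        String admitted = runAttempts(List.of(
                attempt("original-attempt-on-slow-node", 2000),
                attempt("speculative-attempt", 50)));
        System.out.println("admitted attempt: " + admitted);
    }
}
```

Both attempts compute the same result here, which mirrors the requirement that 
every attempt of a task produces the same data; only the finishing order 
decides which one is kept.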

Most existing sources can work with speculative execution (FLIP-245). A source 
needs changes only if it uses SourceEvent, in which case it must implement 
SupportsHandleExecutionAttemptSourceEvent to support speculative execution. 
Sinks do not support speculative execution yet, so speculative execution will 
not happen on sinks at the moment.
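
With several attempts of one subtask running at once, an event handler can no 
longer identify a sender by subtask index alone. The sketch below illustrates 
that idea with local stand-in types only: AttemptAwareHandler and Enumerator 
are hypothetical names, events are plain strings, and nothing here is Flink's 
actual interface or enumerator code.

```java
import java.util.HashMap;
import java.util.Map;

public class AttemptAwareEventsSketch {

    /** Hypothetical stand-in for an attempt-aware event callback; not a Flink type. */
    public interface AttemptAwareHandler {
        void handleSourceEvent(int subtaskId, int attemptNumber, String event);
    }

    /** Hypothetical enumerator-side state, keyed by (subtask, attempt). */
    public static class Enumerator implements AttemptAwareHandler {
        private final Map<String, String> lastEvent = new HashMap<>();

        @Override
        public void handleSourceEvent(int subtaskId, int attemptNumber, String event) {
            // Keep events from concurrent attempts of the same subtask apart.
            lastEvent.put(key(subtaskId, attemptNumber), event);
        }

        /** Returns the last event seen from the given attempt, or null if none. */
        public String lastEventOf(int subtaskId, int attemptNumber) {
            return lastEvent.get(key(subtaskId, attemptNumber));
        }

        private static String key(int subtaskId, int attemptNumber) {
            return subtaskId + "#" + attemptNumber;
        }
    }

    public static void main(String[] args) {
        Enumerator enumerator = new Enumerator();
        enumerator.handleSourceEvent(0, 0, "progress from original attempt");
        enumerator.handleSourceEvent(0, 1, "progress from speculative attempt");
        System.out.println(enumerator.lastEventOf(0, 0));
        System.out.println(enumerator.lastEventOf(0, 1));
    }
}
```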

The Web UI and REST API are also improved (FLIP-249) to display multiple 
concurrent attempts of tasks and blocked task managers.
  Resolution: Done

> FLIP-168: Speculative Execution for Batch Job
> -
>
> Key: FLINK-28131
> URL: https://issues.apache.org/jira/browse/FLINK-28131
> Project: Flink
> Issue Type: New Feature
> Components: Runtime / Coordination
> Reporter: Zhu Zhu
> Assignee: Zhu Zhu
> Priority: Major
> Fix For: 1.16.0
>
>
> Speculative execution helps mitigate slow tasks caused by problematic 
> nodes. The basic idea is to start mirror tasks on other nodes when a slow 
> task is detected. The mirror task processes the same input data and 
> produces the same data as the original task. 
> More details can be found in 
> [FLIP-168|https://cwiki.apache.org/confluence/display/FLINK/FLIP-168%3A+Speculative+Execution+for+Batch+Job].
>  
> This is the umbrella ticket to track all the changes of this feature.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-28131) FLIP-168: Speculative Execution for Batch Job

2022-09-18 Thread Xingbo Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xingbo Huang closed FLINK-28131.

Resolution: Done



