[ https://issues.apache.org/jira/browse/FLINK-21439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17346655#comment-17346655 ]

Matthias edited comment on FLINK-21439 at 5/18/21, 7:59 AM:
------------------------------------------------------------

Hi [~bytesandwich],
the {{AdaptiveScheduler}} does not support task failures yet, i.e. there is 
no dedicated task information provided from which a task name could be 
derived.

For now, only failures causing a full restart of the {{ExecutionGraph}} can 
occur. You might want to compare the error handling of the {{DefaultScheduler}} 
with that of the {{AdaptiveScheduler}}. The {{FailureHandlingResult}} created 
for such a failure in the {{DefaultScheduler}} does not have an 
{{ExecutionVertexID}} referring to the {{Execution}} causing the error. This 
{{FailureHandlingResult}} is passed into the factory method in 
[DefaultScheduler:255|https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/DefaultScheduler.java#L255], 
and that specific case is then handled in 
[FailureHandlingResultSnapshot:66|https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/exceptionhistory/FailureHandlingResultSnapshot.java#L66].
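To illustrate why no task name is available, here is a minimal sketch using hypothetical, heavily simplified stand-ins ({{VertexId}}, {{FailureResult}}, {{Snapshot}}, {{snapshotOf}}) for {{ExecutionVertexID}}, {{FailureHandlingResult}} and {{FailureHandlingResultSnapshot}}. These are not Flink's classes or signatures; the sketch only assumes that the failed vertex is optional and absent for failures that restart the whole graph. It uses Java records (Java 16+).

```java
import java.util.Optional;

// Hypothetical, heavily simplified stand-ins for Flink's FailureHandlingResult and
// FailureHandlingResultSnapshot. These are NOT the real Flink classes or signatures.
public class FailureSnapshotSketch {

    // Simplified stand-in for ExecutionVertexID: identifies the task whose Execution failed.
    record VertexId(String taskName) {}

    // Stand-in for FailureHandlingResult: the failed vertex is optional because
    // failures that restart the whole ExecutionGraph are not tied to a single task.
    record FailureResult(Throwable cause, Optional<VertexId> failedVertex, long timestamp) {

        static FailureResult global(Throwable cause, long timestamp) {
            return new FailureResult(cause, Optional.empty(), timestamp);
        }

        static FailureResult forTask(Throwable cause, VertexId vertex, long timestamp) {
            return new FailureResult(cause, Optional.of(vertex), timestamp);
        }
    }

    // Stand-in for FailureHandlingResultSnapshot: one entry of the exception history.
    record Snapshot(Throwable rootCause, Optional<String> taskName, long timestamp) {}

    // A task name can only be derived when the result points at a failed vertex.
    static Snapshot snapshotOf(FailureResult result) {
        Optional<String> taskName = result.failedVertex().map(VertexId::taskName);
        return new Snapshot(result.cause(), taskName, result.timestamp());
    }

    public static void main(String[] args) {
        Snapshot global = snapshotOf(FailureResult.global(new RuntimeException("boom"), 1L));
        Snapshot task = snapshotOf(
                FailureResult.forTask(new RuntimeException("boom"), new VertexId("Map"), 2L));
        System.out.println(global.taskName().orElse("<no task>")); // prints "<no task>"
        System.out.println(task.taskName().orElse("<no task>"));   // prints "Map"
    }
}
```

Modelling the vertex as an {{Optional}} makes the "no task information" case explicit at the type level instead of relying on null checks.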


was (Author: mapohl):
Hi [~bytesandwich],
the {{AdaptiveScheduler}} does not support task failures yet, i.e. there is 
no dedicated task information provided from which a task name could be 
derived.

For now, only global failures can occur. You might want to compare the error 
handling of the {{DefaultScheduler}} with that of the {{AdaptiveScheduler}}. 
The {{FailureHandlingResult}} created for such a failure in the 
{{DefaultScheduler}} does not have an {{ExecutionVertexID}} referring to the 
{{Execution}} causing the error. This {{FailureHandlingResult}} is passed into 
the factory method in 
[DefaultScheduler:255|https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/DefaultScheduler.java#L255], 
and that specific case is then handled in 
[FailureHandlingResultSnapshot:66|https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/exceptionhistory/FailureHandlingResultSnapshot.java#L66].

> Adaptive Scheduler: Add support for exception history
> -----------------------------------------------------
>
>                 Key: FLINK-21439
>                 URL: https://issues.apache.org/jira/browse/FLINK-21439
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination
>    Affects Versions: 1.13.0
>            Reporter: Matthias
>            Assignee: John Phelan
>            Priority: Major
>              Labels: pull-request-available, reactive
>          Time Spent: 3h
>  Remaining Estimate: 0h
>
> {{SchedulerNG.requestJob}} returns an {{ExecutionGraphInfo}} that was 
> introduced in FLINK-21188. This {{ExecutionGraphInfo}} holds the information 
> about the {{ArchivedExecutionGraph}} and exception history information. 
> Currently, it's a list of {{ErrorInfos}}. This might change due to ongoing 
> work in FLINK-21190, where we might introduce a wrapper class with more 
> information on the failure.
> The goal of this ticket is to implement the exception history for the 
> {{AdaptiveScheduler}}, i.e. collecting the exceptions that caused restarts. 
> This collection of failures should be forwarded through 
> {{SchedulerNG.requestJob}}.
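The goal described above (collect the exceptions that caused restarts and expose them through {{SchedulerNG.requestJob}}) can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual implementation: {{FailureEntry}}, {{JobInfo}}, {{Scheduler}} and {{onRestartingFailure}} are hypothetical names standing in for {{ErrorInfo}}, {{ExecutionGraphInfo}} and the scheduler's bookkeeping, and the job state is reduced to a plain string. It uses Java records (Java 16+).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the names below are illustrative and are NOT Flink's API.
public class ExceptionHistorySketch {

    // Stand-in for ErrorInfo: the failure cause plus when it happened.
    record FailureEntry(Throwable cause, long timestamp) {}

    // Stand-in for ExecutionGraphInfo: job state plus the exception history.
    record JobInfo(String jobState, List<FailureEntry> exceptionHistory) {}

    // Stand-in for the adaptive scheduler's failure bookkeeping.
    static final class Scheduler {
        private final List<FailureEntry> history = new ArrayList<>();
        private String state = "RUNNING";

        // Record every failure that leads to a restart.
        void onRestartingFailure(Throwable cause, long timestamp) {
            history.add(new FailureEntry(cause, timestamp));
            state = "RESTARTING";
        }

        // Analogue of SchedulerNG.requestJob: hand out an immutable copy so that
        // callers never observe failures recorded after the request.
        JobInfo requestJob() {
            return new JobInfo(state, List.copyOf(history));
        }
    }

    public static void main(String[] args) {
        Scheduler scheduler = new Scheduler();
        scheduler.onRestartingFailure(new RuntimeException("TaskManager lost"), 1_000L);
        scheduler.onRestartingFailure(new RuntimeException("Checkpoint failed"), 2_000L);
        JobInfo info = scheduler.requestJob();
        System.out.println(info.exceptionHistory().size()); // prints 2
    }
}
```

Returning a copy rather than the live list keeps each {{requestJob}} result a consistent snapshot, which matters because the history keeps growing while the job restarts.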



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
