[ 
https://issues.apache.org/jira/browse/FLINK-12662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860537#comment-16860537
 ] 

vinoyang commented on FLINK-12662:
----------------------------------

Hi [~till.rohrmann], after reading your idea, I'd like to propose a new 
approach. {{ExecutionGraph}}, {{AccessExecutionGraph}} and 
{{ArchiveExecutionGraph}} all expose the same method:
{code:java}
ErrorInfo getFailureInfo()
{code}
which means they can only return the latest {{ErrorInfo}} of the running job 
instance. Based on this, we can add two fields to {{ExecutionGraph}}:
 * List<ErrorInfo> attemptFailures;
 * long globalRestartTimes;

We also need to provide two new methods for {{ExecutionGraph}}, 
{{AccessExecutionGraph}} and {{ArchiveExecutionGraph}}:
 * {{getAttemptFailureInfos}}, named to distinguish it from the existing {{getFailureInfo}}
 * {{getGlobalRestartTimes}}
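
A minimal sketch of how the two proposed fields and accessors could fit together. It is not Flink code: {{ErrorInfoSketch}} is a hypothetical stand-in for Flink's {{ErrorInfo}}, {{FailureHistorySketch}} stands in for the relevant part of {{ExecutionGraph}}, and the {{recordFailure}} / {{recordGlobalRestart}} hooks are assumptions made only for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for Flink's ErrorInfo, reduced to a message and a timestamp. */
final class ErrorInfoSketch {
    private final String exceptionAsString;
    private final long timestamp;

    ErrorInfoSketch(String exceptionAsString, long timestamp) {
        this.exceptionAsString = exceptionAsString;
        this.timestamp = timestamp;
    }

    String getExceptionAsString() { return exceptionAsString; }

    long getTimestamp() { return timestamp; }
}

/** Hypothetical holder for the two proposed fields; not the real ExecutionGraph. */
final class FailureHistorySketch {
    // Proposed field: one entry per failed attempt, oldest first.
    private final List<ErrorInfoSketch> attemptFailures = new ArrayList<>();
    // Proposed field: number of global restarts seen so far.
    private long globalRestartTimes;

    // Assumed hook, called whenever an attempt fails.
    void recordFailure(ErrorInfoSketch failure) {
        attemptFailures.add(failure);
    }

    // Assumed hook, called whenever the job is restarted globally.
    void recordGlobalRestart() {
        globalRestartTimes++;
    }

    // Existing behaviour: only the latest failure, or null if none occurred.
    ErrorInfoSketch getFailureInfo() {
        return attemptFailures.isEmpty()
                ? null
                : attemptFailures.get(attemptFailures.size() - 1);
    }

    // Proposed accessor: read-only view of every recorded failure.
    List<ErrorInfoSketch> getAttemptFailureInfos() {
        return Collections.unmodifiableList(attemptFailures);
    }

    // Proposed accessor: the global restart counter.
    long getGlobalRestartTimes() {
        return globalRestartTimes;
    }
}
```

With this shape, {{getFailureInfo}} keeps its current meaning (latest failure only), while the history server could read the full list through {{getAttemptFailureInfos}}.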

In addition, I have three further questions:
 # Shall we introduce a new data structure, named e.g. 
{{ExecutionGraphAttemptHistory}}? If we have it, we can also encapsulate the 
{{attemptStart}} and {{attemptEnd}} fields.
 # Shall we consider the failover strategy (region recovery)?
 # Maybe we also need to consider how to show the restart info in the Flink 
web UI. Of course, this could be tracked in a separate issue.
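
For the first question, one possible shape of such a structure is sketched below. Everything beyond the {{attemptStart}} and {{attemptEnd}} names is an assumption: the class name carries a {{Sketch}} suffix to mark it as hypothetical, the failure cause is kept as a plain {{String}} instead of Flink's {{ErrorInfo}}, and {{getDurationMillis}} is an illustrative helper.

```java
/** Hypothetical per-attempt record; an illustration only, not an existing Flink class. */
final class ExecutionGraphAttemptHistorySketch {
    private final long attemptStart;   // epoch millis when the attempt started
    private final long attemptEnd;     // epoch millis when the attempt failed or finished
    private final String failureCause; // stand-in for ErrorInfo; null if the attempt did not fail

    ExecutionGraphAttemptHistorySketch(long attemptStart, long attemptEnd, String failureCause) {
        if (attemptEnd < attemptStart) {
            throw new IllegalArgumentException("attemptEnd must not precede attemptStart");
        }
        this.attemptStart = attemptStart;
        this.attemptEnd = attemptEnd;
        this.failureCause = failureCause;
    }

    long getAttemptStart() { return attemptStart; }

    long getAttemptEnd() { return attemptEnd; }

    String getFailureCause() { return failureCause; }

    // Illustrative helper: how long the attempt ran before it ended.
    long getDurationMillis() { return attemptEnd - attemptStart; }
}
```

A list of such records would carry the same information as {{attemptFailures}} plus the timing of each attempt.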

 

 

> show jobs failover in history server as well
> --------------------------------------------
>
>                 Key: FLINK-12662
>                 URL: https://issues.apache.org/jira/browse/FLINK-12662
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / REST
>            Reporter: Su Ralph
>            Assignee: vinoyang
>            Priority: Major
>
> Currently 
> [https://ci.apache.org/projects/flink/flink-docs-release-1.8/monitoring/historyserver.html]
>  only shows completed jobs (completed, cancelled, failed). It does not show any 
> intermediate failover, 
> which makes it hard for the cluster administrator/developer to find the first one if 
> two failovers happen. The feature request is to 
> - record each failover in the history server as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
