JIN SUN created FLINK-10298:
-------------------------------

             Summary: Batch Job Failover Strategy
                 Key: FLINK-10298
                 URL: https://issues.apache.org/jira/browse/FLINK-10298
             Project: Flink
          Issue Type: Sub-task
          Components: JobManager
            Reporter: JIN SUN
            Assignee: JIN SUN


The new failover strategy needs to handle failures differently depending on 
the failure type. It orchestrates all of the logic described in this 
[document|https://docs.google.com/document/d/1FdZdcA63tPUEewcCimTFy9Iz2jlVlMRANZkO4RngIuk/edit#].
We can put this logic in the onTaskFailure method of the FailoverStrategy 
interface, as outlined inline:
{code:java}
public void onTaskFailure(Execution taskExecution, Throwable cause) {
    // 1. Get the throwable type.

    // 2. If the type is NonrecoverableType, fail the job.

    // 3. If the type is PartitionDataMissingError, do revocation.

    // 4. If the type is EnvironmentError, check the blacklist.

    // 5. Other failure types are recoverable, but we need to remember the
    //    failure count; if it exceeds the threshold, fail the job.
}
{code}
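
To make the outline above more concrete, here is a minimal, standalone sketch of how the dispatch could look. It is only an illustration under assumptions: the class BatchFailoverSketch, the FailureType and Action enums, the three marker exception classes, and the maxFailures threshold are all hypothetical names and not existing Flink APIs. The sketch also assumes a single job-wide failure counter, which the description above does not specify, and it returns the chosen action instead of performing it.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of the dispatch logic; none of these names are Flink APIs. */
class BatchFailoverSketch {

    /** Hypothetical failure categories mirroring the cases in the outline. */
    enum FailureType { NON_RECOVERABLE, PARTITION_DATA_MISSING, ENVIRONMENT, RECOVERABLE }

    /** Hypothetical decisions the strategy can make for a failed task. */
    enum Action { FAIL_JOB, REVOKE_PRODUCER, CHECK_BLACKLIST, RESTART_TASK }

    // Hypothetical marker exceptions used only to classify failures in this sketch.
    static class NonRecoverableException extends RuntimeException {
        NonRecoverableException(String message) { super(message); }
    }

    static class PartitionDataMissingException extends RuntimeException {
        PartitionDataMissingException(String message) { super(message); }
    }

    static class EnvironmentException extends RuntimeException {
        EnvironmentException(String message) { super(message); }
    }

    // Assumption: one job-wide threshold and counter for recoverable failures.
    private final int maxFailures;
    private final AtomicInteger failureCount = new AtomicInteger();

    BatchFailoverSketch(int maxFailures) {
        this.maxFailures = maxFailures;
    }

    /** Step 1: derive the failure type by walking the cause chain. */
    static FailureType classify(Throwable cause) {
        for (Throwable t = cause; t != null; t = t.getCause()) {
            if (t instanceof NonRecoverableException) {
                return FailureType.NON_RECOVERABLE;
            }
            if (t instanceof PartitionDataMissingException) {
                return FailureType.PARTITION_DATA_MISSING;
            }
            if (t instanceof EnvironmentException) {
                return FailureType.ENVIRONMENT;
            }
        }
        return FailureType.RECOVERABLE;
    }

    /** Steps 2 to 5: map the failure type to an action. */
    Action onTaskFailure(Throwable cause) {
        switch (classify(cause)) {
            case NON_RECOVERABLE:
                return Action.FAIL_JOB;
            case PARTITION_DATA_MISSING:
                return Action.REVOKE_PRODUCER;
            case ENVIRONMENT:
                return Action.CHECK_BLACKLIST;
            default:
                // Recoverable: count the failure and fail the job once the
                // count exceeds the threshold.
                return failureCount.incrementAndGet() > maxFailures
                        ? Action.FAIL_JOB
                        : Action.RESTART_TASK;
        }
    }
}
```

Returning an Action keeps the classification testable without a running job. A real implementation would instead trigger the job failure, the producer revocation, or the blacklist check from inside onTaskFailure.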



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
