[ 
https://issues.apache.org/jira/browse/GOBBLIN-661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-661.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 0.15.0

Issue resolved by pull request #2532
[https://github.com/apache/incubator-gobblin/pull/2532]

> Prevent jobs resubmission after manager failure
> -----------------------------------------------
>
>                 Key: GOBBLIN-661
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-661
>             Project: Apache Gobblin
>          Issue Type: Improvement
>            Reporter: Kuai Yu
>            Assignee: Kuai Yu
>            Priority: Major
>             Fix For: 0.15.0
>
>
> In a Gobblin cluster, if the manager fails and is relaunched, all the jobs 
> persisted in the job catalog are relaunched. This can cause a few issues:
> 1) Scalability: the unfinished jobs may originally have been submitted at 
> different points in time, so submitting all of them at once can cause a 
> performance issue.
> 2) Wasted effort: because each unfinished job now has to be deleted, we have 
> to kill the existing running job and resubmit it.
>  
> This change improves both 1) and 2):
> 1) In taskdriver mode, we delete the job spec once the job is submitted to 
> Helix. We believe Helix is durable and that submitted jobs won't be lost, so 
> the job specs can be deleted safely. After the next reboot, the manager won't 
> see the deleted job specs, so no resubmission is needed. 
> 2) In taskdriver mode, we clean up running Helix jobs. If a job is a 
> planning job, we don't delete it. Instead, we let it run to completion.
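
The two behaviors described above can be sketched roughly as follows. This is
only an illustration under stated assumptions: JobCatalog, DurableJobStore,
submitAndDeleteSpec and cleanUpRunningJobs are hypothetical names invented for
this sketch, not Gobblin or Helix classes, and the in-memory maps merely stand
in for the job catalog and for Helix's durable job state. The description does
not say how the two steps interact, so they are shown as independent methods.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; none of these types exist in Gobblin or Helix. */
final class ManagerRestartSketch {

    /** Stand-in for the job catalog: job name -> persisted job spec. */
    static final class JobCatalog {
        private final Map<String, String> specs = new LinkedHashMap<>();
        void put(String jobName, String spec) { specs.put(jobName, spec); }
        void remove(String jobName) { specs.remove(jobName); }
        List<String> jobNames() { return new ArrayList<>(specs.keySet()); }
    }

    /** Stand-in for the durable store of submitted jobs (Helix, in the issue). */
    static final class DurableJobStore {
        // job name -> true if the job is a planning job
        private final Map<String, Boolean> running = new LinkedHashMap<>();
        void submit(String jobName, boolean isPlanningJob) { running.put(jobName, isPlanningJob); }
        void delete(String jobName) { running.remove(jobName); }
        Map<String, Boolean> runningJobs() { return new LinkedHashMap<>(running); }
    }

    /** Step 1: submit the job, then drop its spec so a restarted manager won't resubmit it. */
    static void submitAndDeleteSpec(JobCatalog catalog, DurableJobStore store,
                                    String jobName, boolean isPlanningJob) {
        store.submit(jobName, isPlanningJob);
        // Deleting the spec is safe only under the assumption that the store is durable.
        catalog.remove(jobName);
    }

    /** Step 2: delete running jobs, but leave planning jobs to run to completion. */
    static List<String> cleanUpRunningJobs(DurableJobStore store) {
        List<String> deleted = new ArrayList<>();
        // runningJobs() returns a copy, so deleting while iterating is safe.
        for (Map.Entry<String, Boolean> job : store.runningJobs().entrySet()) {
            if (!job.getValue()) {
                store.delete(job.getKey());
                deleted.add(job.getKey());
            }
        }
        return deleted;
    }
}
```

In this sketch, a job submitted through submitAndDeleteSpec no longer appears
in catalog.jobNames(), and cleanUpRunningJobs returns only the names of the
non-planning jobs it removed.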



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
