[ https://issues.apache.org/jira/browse/GOBBLIN-661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hung Tran resolved GOBBLIN-661.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 0.15.0

Issue resolved by pull request #2532
[https://github.com/apache/incubator-gobblin/pull/2532]

> Prevent jobs resubmission after manager failure
> -----------------------------------------------
>
>                 Key: GOBBLIN-661
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-661
>             Project: Apache Gobblin
>          Issue Type: Improvement
>            Reporter: Kuai Yu
>            Assignee: Kuai Yu
>            Priority: Major
>             Fix For: 0.15.0
>
>
> In a Gobblin cluster, if the manager fails and is relaunched, all the jobs
> persisted in the job catalog are relaunched. This can cause a few issues:
> 1) Scalability: the unfinished jobs were originally submitted at different
> points in time; if all of them are now submitted at the same time, it can
> cause a performance issue.
> 2) Wasted effort: because each unfinished job now needs to be deleted, we
> have to kill the existing running job and resubmit it.
>
> This change improves both 1) and 2):
> 1) In taskdriver mode, we delete the job spec once the job is submitted to
> Helix. We believe Helix is durable and submitted jobs won't be lost, so the
> job specs can be safely deleted. After the next restart, the manager won't
> see the deleted job specs, so no resubmission is needed.
> 2) In taskdriver mode, we clean up running Helix jobs. If it is a planning
> job, we won't delete it. Instead, we let it run to the end.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
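
The delete-after-submit behavior described in point 1) of the change can be sketched roughly as follows. This is a minimal illustration and not the code from pull request #2532: `JobCatalog`, `DurableQueue`, and `Manager` are hypothetical in-memory stand-ins for the Gobblin job catalog, Helix, and the cluster manager, and no real Gobblin or Helix API is used.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ResubmissionSketch {

    /** Hypothetical stand-in for the persisted job catalog. */
    static class JobCatalog {
        private final Map<String, String> specs = new LinkedHashMap<>();

        void put(String jobName, String spec) { specs.put(jobName, spec); }
        void remove(String jobName) { specs.remove(jobName); }
        int size() { return specs.size(); }

        /** Copy of the current specs, so callers can remove while iterating. */
        Map<String, String> snapshot() { return new LinkedHashMap<>(specs); }
    }

    /** Hypothetical stand-in for Helix, assumed to keep submitted jobs durably. */
    static class DurableQueue {
        private final List<String> submitted = new ArrayList<>();

        void submit(String jobName, String spec) { submitted.add(jobName); }
        List<String> submitted() { return submitted; }
    }

    /** Hypothetical manager: on every (re)start it submits whatever the catalog holds. */
    static class Manager {
        private final JobCatalog catalog;
        private final DurableQueue helix;
        private final boolean taskDriverMode;

        Manager(JobCatalog catalog, DurableQueue helix, boolean taskDriverMode) {
            this.catalog = catalog;
            this.helix = helix;
            this.taskDriverMode = taskDriverMode;
        }

        /** Returns the number of jobs submitted during this start. */
        int start() {
            int count = 0;
            for (Map.Entry<String, String> e : catalog.snapshot().entrySet()) {
                helix.submit(e.getKey(), e.getValue());
                if (taskDriverMode) {
                    // The durable queue now owns the job, so the spec can be dropped.
                    catalog.remove(e.getKey());
                }
                count++;
            }
            return count;
        }
    }

    public static void main(String[] args) {
        JobCatalog catalog = new JobCatalog();
        catalog.put("jobA", "specA");
        catalog.put("jobB", "specB");
        DurableQueue helix = new DurableQueue();

        int first = new Manager(catalog, helix, true).start();
        // Simulate a manager failure and relaunch against the same catalog.
        int second = new Manager(catalog, helix, true).start();

        System.out.println("first start submitted " + first);
        System.out.println("restart submitted " + second);
    }
}
```

With `taskDriverMode` set to `false`, the sketch keeps the specs in the catalog, so a relaunched manager submits every job again, which is the behavior the issue describes as the problem.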