Hanghang Liu created GOBBLIN-1692:
-------------------------------------

             Summary: Make GobblinHelixJobScheduler stop Helix workflow 
synchronously
                 Key: GOBBLIN-1692
                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1692
             Project: Apache Gobblin
          Issue Type: Improvement
          Components: gobblin-cluster
            Reporter: Hanghang Liu
            Assignee: Hung Tran


When a new job config is posted and handleUpdateJobConfigArrival is invoked, 
GobblinHelixJobScheduler first stops and deletes the old job, then tries to 
spin up the updated Helix workflow.
The job scheduler tries to perform the stop synchronously with a default 
timeout of 10 seconds. However, the stop consistently takes longer than this 
timeout on the Helix side, so the job state is not correctly updated to 
stopped. As a result, when the GobblinHelixJobLauncher is constructed, the 
previous job is still in the wrong state because jobRunningMap has not been 
updated yet, and the new job is not launched. We therefore always see this 
log: {{Job {} will not be executed because other jobs are still running}}. 

We can make the job deletion asynchronous and let the waitForJobCompletion 
method ensure that the job status is eventually updated correctly.
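
The proposed ordering can be illustrated with a minimal, self-contained 
sketch. It is not Gobblin or Helix code: the class StopThenRelaunchSketch, 
the methods requestStopAsync and tryLaunch, the simulated 200 ms stop, and the 
simplified boolean jobRunningMap are all hypothetical stand-ins, and only the 
names waitForJobCompletion and jobRunningMap are borrowed from the description 
above. The stop is requested without blocking, and the relaunch is attempted 
only after the wait observes that the running flag has been cleared.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch; none of these members are real Gobblin or Helix APIs. */
class StopThenRelaunchSketch {
  // Simplified stand-in for jobRunningMap: job name -> "still running" flag.
  private final Map<String, Boolean> jobRunningMap = new ConcurrentHashMap<>();

  void markRunning(String jobName) {
    jobRunningMap.put(jobName, true);
  }

  /** Requests a stop without blocking; the flag is cleared only when the stop finishes. */
  CompletableFuture<Void> requestStopAsync(String jobName, long stopDurationMillis) {
    return CompletableFuture.runAsync(() -> {
      try {
        Thread.sleep(stopDurationMillis); // simulates a stop that takes a while
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return; // stop did not finish, so the flag stays set
      }
      jobRunningMap.put(jobName, false);
    });
  }

  /** Polls until the job is no longer marked as running, or the timeout elapses. */
  void waitForJobCompletion(String jobName, long timeoutMillis)
      throws InterruptedException, TimeoutException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    while (jobRunningMap.getOrDefault(jobName, false)) {
      if (System.nanoTime() - deadline >= 0) {
        throw new TimeoutException("Job " + jobName + " did not stop in time");
      }
      Thread.sleep(10);
    }
  }

  /** Launch guard modelled on the log line quoted in the description. */
  boolean tryLaunch(String jobName) {
    if (jobRunningMap.getOrDefault(jobName, false)) {
      System.out.println(
          "Job " + jobName + " will not be executed because other jobs are still running");
      return false;
    }
    jobRunningMap.put(jobName, true);
    return true;
  }

  public static void main(String[] args) throws Exception {
    StopThenRelaunchSketch scheduler = new StopThenRelaunchSketch();
    scheduler.markRunning("job1");
    scheduler.requestStopAsync("job1", 200);

    // Launching right away is refused because the stop has not finished yet.
    boolean launchedTooEarly = scheduler.tryLaunch("job1");

    // Waiting for the state update first lets the relaunch go through.
    scheduler.waitForJobCompletion("job1", 5000);
    boolean launchedAfterWait = scheduler.tryLaunch("job1");

    System.out.println(launchedTooEarly + " " + launchedAfterWait);
  }
}
```

In this sketch the first launch attempt reproduces the symptom from the 
description, and the second succeeds only because the wait runs between the 
stop request and the relaunch. How the real scheduler would bound that wait is 
left open here.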



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
