Hanghang Liu created GOBBLIN-1692:
-------------------------------------
Summary: Make GobblinHelixJobScheduler stop Helix workflow
synchronously
Key: GOBBLIN-1692
URL: https://issues.apache.org/jira/browse/GOBBLIN-1692
Project: Apache Gobblin
Issue Type: Improvement
Components: gobblin-cluster
Reporter: Hanghang Liu
Assignee: Hung Tran
When a new job config is posted, handleUpdateJobConfigArrival is invoked:
GobblinHelixJobScheduler first stops and deletes the old job, and then tries to
spin up the updated Helix workflow.
The job scheduler tries to perform the stop synchronously with a default timeout
of 10 seconds. However, the stop consistently takes longer in Helix than the
timeout allows, so the job state is not correctly updated to stopped.
Thus, when the GobblinHelixJobLauncher is constructed, the previous job is still
in the wrong state because jobRunningMap has not been updated yet, so the new
job is never launched. As a result, we always see this log: {{Job {} will not be
executed because other jobs are still running}}.
We can make the job delete asynchronous and let the waitForJobCompletion method
ensure that the job status is eventually updated correctly.
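
A minimal sketch of the proposed flow, for illustration only. It does not use the
Gobblin or Helix APIs: the class AsyncStopSketch, the methods stopJobAsync,
waitUntilStopped and relaunch, and the map standing in for jobRunningMap are all
hypothetical names. The stop is issued without blocking, and a separate polling
wait decides whether the updated job may be launched.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class AsyncStopSketch {
    // Hypothetical stand-in for jobRunningMap: job name -> "still running" flag.
    static final Map<String, Boolean> jobRunningMap = new ConcurrentHashMap<>();

    // Hypothetical asynchronous stop: returns immediately, and the running flag
    // is cleared only after the simulated stop delay has elapsed.
    static CompletableFuture<Void> stopJobAsync(String jobName, long stopDelayMillis) {
        return CompletableFuture.runAsync(() -> {
            try {
                Thread.sleep(stopDelayMillis); // simulates a slow stop
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return; // flag stays set if the stop was interrupted
            }
            jobRunningMap.put(jobName, false);
        });
    }

    // Hypothetical polling wait: blocks until the job is no longer marked as
    // running, or throws TimeoutException once the deadline has passed.
    static void waitUntilStopped(String jobName, long timeoutMillis, long pollMillis)
            throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (jobRunningMap.getOrDefault(jobName, false)) {
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException("Job " + jobName + " is still running");
            }
            Thread.sleep(pollMillis);
        }
    }

    // Marks the old job as running, stops it asynchronously, waits for the state
    // to settle, and "launches" the updated job only if the old one stopped.
    static boolean relaunch(String jobName, long stopDelayMillis, long timeoutMillis) {
        jobRunningMap.put(jobName, true);       // old job is running
        stopJobAsync(jobName, stopDelayMillis); // non-blocking stop request
        try {
            waitUntilStopped(jobName, timeoutMillis, 10);
        } catch (TimeoutException e) {
            return false;                       // old job never reached stopped
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        jobRunningMap.put(jobName, true);       // updated job is now running
        return true;
    }
}
```

In this sketch, relaunch("job-a", 50, 2000) succeeds because the wait outlasts the
stop, while relaunch("job-b", 500, 50) returns false, which mirrors the case where
the stop takes longer than the timeout.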
--
This message was sent by Atlassian Jira
(v8.20.10#820010)