[ 
https://issues.apache.org/jira/browse/GOBBLIN-1692?focusedWorklogId=806848&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-806848
 ]

ASF GitHub Bot logged work on GOBBLIN-1692:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 07/Sep/22 22:46
            Start Date: 07/Sep/22 22:46
    Worklog Time Spent: 10m 
      Work Description: hanghangliu commented on PR #3546:
URL: https://github.com/apache/gobblin/pull/3546#issuecomment-1239983292

   HELIX_JOB_WAIT_COMPLETION_TIMEOUT_SECONDS is the config to use for tuning 
the job wait completion timeout.




Issue Time Tracking
-------------------

            Worklog Id:     (was: 806848)
    Remaining Estimate: 0h
            Time Spent: 10m

> Make GobblinHelixJobScheduler stop Helix workflow asynchronously
> ----------------------------------------------------------------
>
>                 Key: GOBBLIN-1692
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1692
>             Project: Apache Gobblin
>          Issue Type: Improvement
>          Components: gobblin-cluster
>            Reporter: Hanghang Liu
>            Assignee: Hung Tran
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> When a new job config is posted and handleUpdateJobConfigArrival is invoked, 
> GobblinHelixJobScheduler first stops and deletes the old job, then tries to 
> spin up the updated Helix workflow.
> The job scheduler performs the stop synchronously, with a default timeout of 
> 10 seconds. However, the stop constantly runs longer than this timeout in 
> Helix, so the job state is not correctly updated to stopped. As a result, 
> when the GobblinHelixJobLauncher is constructed, the previous job is still in 
> the wrong state because jobRunningMap has not been updated yet, and the new 
> job is not launched. So we always see this log: {{{}Job {} will 
> not be executed because other jobs are still running{}}}. 
> We can make the job deletion asynchronous and let the waitForJobCompletion 
> method ensure that the job status is eventually updated correctly.
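
The proposal above can be illustrated with a minimal sketch. This is not the Gobblin or Helix implementation: the class AsyncStopSketch and its methods markRunning, isRunning, stopAsync, and waitForStop are hypothetical names invented for illustration, and the slow Helix stop is simulated with a sleep. It only shows the general pattern of starting the stop in the background and then waiting for completion separately, so that a short wait timing out does not prevent the running-state map from being updated later.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical stand-in for a scheduler; none of these names are Gobblin or Helix APIs. */
public class AsyncStopSketch {
  // Tracks which jobs are still considered running (similar in spirit to jobRunningMap).
  private final Map<String, Boolean> runningJobs = new ConcurrentHashMap<>();

  public void markRunning(String jobName) {
    runningJobs.put(jobName, true);
  }

  public boolean isRunning(String jobName) {
    return runningJobs.getOrDefault(jobName, false);
  }

  /** Starts the stop in the background and returns immediately. */
  public CompletableFuture<Void> stopAsync(String jobName, long stopDurationMillis) {
    return CompletableFuture.runAsync(() -> {
      try {
        Thread.sleep(stopDurationMillis); // simulated slow workflow stop
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return; // the job stays marked as running if the stop was interrupted
      }
      runningJobs.put(jobName, false); // state is updated only once the stop has finished
    });
  }

  /** Waits up to timeoutMillis for the stop to finish; returns true if it completed in time. */
  public boolean waitForStop(CompletableFuture<Void> stop, long timeoutMillis) {
    try {
      stop.get(timeoutMillis, TimeUnit.MILLISECONDS);
      return true;
    } catch (TimeoutException e) {
      return false; // the stop keeps running in the background and can be awaited again
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } catch (ExecutionException e) {
      return false;
    }
  }
}
```

In this sketch a caller can invoke waitForStop again with a longer timeout; the background stop is not cancelled when an earlier wait times out, so the job is eventually marked as stopped.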



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
