[
https://issues.apache.org/jira/browse/GOBBLIN-2011?focusedWorklogId=907872&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-907872
]
ASF GitHub Bot logged work on GOBBLIN-2011:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 02/Mar/24 00:15
Start Date: 02/Mar/24 00:15
Worklog Time Spent: 10m
Work Description: arjun4084346 commented on PR #3888:
URL: https://github.com/apache/gobblin/pull/3888#issuecomment-1974119694
I think we should not set the status to "Failed" when the last execution is
still running. We should instead emit a new event, "SKIPPED". With this, any
further execution should be able to correctly decide whether a new job should
run or not.
This will make the code more maintainable.
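One possible reading of this suggestion, as a minimal sketch rather than the actual Gobblin code: if a blocked execution is recorded as SKIPPED instead of FAILED, a later check can ignore it and look at the most recent execution that actually ran. The class name, the `Status` enum, and `mayStart` are hypothetical and exist only for illustration.

```java
import java.util.List;

/** Hypothetical sketch; these names do not come from the Gobblin code base. */
public class SkippedStatusSketch {

    /** Simplified status model, assumed for illustration only. */
    enum Status { RUNNING, COMPLETE, FAILED, SKIPPED }

    /**
     * Decides whether a new execution may start.
     *
     * @param newestFirst statuses of earlier executions, newest first
     */
    static boolean mayStart(List<Status> newestFirst) {
        for (Status s : newestFirst) {
            if (s == Status.SKIPPED) {
                // A skipped execution never ran, so it says nothing about
                // whether a flow is currently active. Keep looking back.
                continue;
            }
            // The first execution that actually ran decides the outcome.
            return s != Status.RUNNING;
        }
        // Nothing has actually run yet.
        return true;
    }
}
```

With this model, a history of SKIPPED followed by RUNNING still blocks a new execution, whereas FAILED followed by RUNNING would let it through, which is the race described in the issue.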
Issue Time Tracking
-------------------
Worklog Id: (was: 907872)
Time Spent: 40m (was: 0.5h)
> Fix bug where concurrent flows can be kicked off depending on a jobstatus
> race condition
> ----------------------------------------------------------------------------------------
>
> Key: GOBBLIN-2011
> URL: https://issues.apache.org/jira/browse/GOBBLIN-2011
> Project: Apache Gobblin
> Issue Type: Bug
> Reporter: William Lo
> Priority: Major
> Time Spent: 40m
> Remaining Estimate: 0h
>
> There's a bug that causes GaaS multileader to kick off unintended concurrent
> flows. It occurs in the following order:
> 1. Host A checks the latest flow execution status to ensure the prior flow is
> not running, and sees that the prior execution is still running.
> 2. Host A fails the pending flow execution because it cannot run a concurrent
> flow. This emits a FAILED event to GaaS, which is ingested by the
> JobStatusMonitor.
> 3. Host B checks the latest flow execution status and sees the current flow
> execution ID, which is FAILED (considered a finished flow).
> 4. Host B kicks off the pending flow execution when it shouldn't.
> To resolve this, we need to look at the past 2 flow executions and follow
> this behavior:
> 1. If there is no prior execution, kick off the pending flow.
> 2. If the prior execution is IN PROGRESS, we want to indicate that there is a
> concurrent flow and block the pending execution.
> 3. If the prior execution is FINISHED, then we want to kick off the pending
> execution (rely on the DagManager for deduplication of flows because we do
> not know if the host managing this pending flow is running behind the other
> hosts).
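
The three rules above can be sketched as follows. This is an illustration under stated assumptions, not the fix merged in the PR: `ConcurrentFlowCheck`, `Execution`, `Status`, `Decision`, and `decide` are hypothetical names, and the status model is simplified to "RUNNING" versus everything else.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch; these names do not come from the Gobblin code base. */
public class ConcurrentFlowCheck {

    /** Simplified status model, assumed for illustration only. */
    enum Status { RUNNING, COMPLETE, FAILED, CANCELLED }

    enum Decision { LAUNCH, BLOCK }

    /** One flow execution: its ID and last known status. */
    static final class Execution {
        final long executionId;
        final Status status;

        Execution(long executionId, Status status) {
            this.executionId = executionId;
            this.status = status;
        }
    }

    /**
     * Applies the three rules to the most recent executions.
     *
     * @param recent    up to two most recent executions, newest first
     * @param pendingId ID of the execution being decided on
     */
    static Decision decide(List<Execution> recent, long pendingId) {
        // Skip the pending execution itself: another host may already have
        // marked it FAILED, which must not count as a finished prior flow.
        Optional<Execution> prior = recent.stream()
            .filter(e -> e.executionId != pendingId)
            .findFirst();

        if (!prior.isPresent()) {
            return Decision.LAUNCH;               // rule 1: no prior execution
        }
        if (prior.get().status == Status.RUNNING) {
            return Decision.BLOCK;                // rule 2: prior still in progress
        }
        return Decision.LAUNCH;                   // rule 3: prior finished
    }
}
```

In the race from the description, Host B would see the pending execution marked FAILED followed by the prior execution still RUNNING. Because the pending execution is skipped, the prior RUNNING execution decides the outcome and the flow is blocked.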