[
https://issues.apache.org/jira/browse/GOBBLIN-1634?focusedWorklogId=763191&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-763191
]
ASF GitHub Bot logged work on GOBBLIN-1634:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 27/Apr/22 23:01
Start Date: 27/Apr/22 23:01
Worklog Time Spent: 10m
Work Description: Will-Lo commented on PR #3495:
URL: https://github.com/apache/gobblin/pull/3495#issuecomment-1111559959
>
> nice work!
>
> this seemed an important element (from your PR summary):
>
> ```
> Additionally, we also do not want to retry if a flow is skipped due to
> concurrent jobs running at the same time
> ```
>
> yet I'm having trouble finding anything related to this in the impl. (a
> comment saying the same at the least...). does it just come down to some
> `CANCELED` ones not bearing a `isFlowSlaKilled.equals(true)`?
In the scenario I'm describing, the timing event emitted is SKIPPED, and it is
emitted by the Orchestrator. Since the DAG isn't created, I believe a retry
would override the check, which is done in the DagManager (which comes
after orchestration). Also, it's unlikely that the concurrent flow will have
finished by the time the retry is processed anyway, since we do not wait
before retrying.
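
For illustration only, here is a minimal sketch of the retry rules discussed in this thread: failures and SLA kills are retried when retries are configured, while user cancellations and flows skipped for concurrency are not. Every name in it (`RetryDecisionSketch`, `Outcome`, `shouldRetry`, `attemptsMade`, `maxAttempts`) is hypothetical and is not taken from the Gobblin code base or from this PR.

```java
// Hypothetical sketch; none of these names are Gobblin's actual API.
public class RetryDecisionSketch {

    /** Simplified terminal outcomes, modelled on the cases named in this thread. */
    enum Outcome { FAILED, SLA_KILLED, USER_CANCELLED, SKIPPED }

    /**
     * Decides whether a flow should be retried.
     *
     * @param outcome      how the previous attempt ended
     * @param retryEnabled whether retries are configured for the flow (assumed flag)
     * @param attemptsMade number of attempts already made (assumed counter)
     * @param maxAttempts  maximum number of attempts allowed (assumed limit)
     */
    static boolean shouldRetry(Outcome outcome, boolean retryEnabled,
                               int attemptsMade, int maxAttempts) {
        // No retry when retries are off or the attempt budget is used up.
        if (!retryEnabled || attemptsMade >= maxAttempts) {
            return false;
        }
        switch (outcome) {
            case FAILED:
            case SLA_KILLED:
                // Ordinary failures and SLA kills are treated as retryable.
                return true;
            case USER_CANCELLED:
            case SKIPPED:
            default:
                // A user cancellation is intentional, and a flow skipped for
                // concurrency would likely be skipped again without a wait.
                return false;
        }
    }

    public static void main(String[] args) {
        for (Outcome outcome : Outcome.values()) {
            System.out.println(outcome + " -> " + shouldRetry(outcome, true, 1, 3));
        }
    }
}
```

Keeping the decision in one pure function makes the non-retryable cases explicit, which is the kind of comment-level documentation the review asked about.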
Issue Time Tracking
-------------------
Worklog Id: (was: 763191)
Time Spent: 0.5h (was: 20m)
> GaaS Flow SLA Kills should be retryable if configured
> -----------------------------------------------------
>
> Key: GOBBLIN-1634
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1634
> Project: Apache Gobblin
> Issue Type: Task
> Reporter: William Lo
> Priority: Major
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> On Gobblin as a Service (GaaS), flows can fail SLAs if they do not receive a Kafka
> event within some designated amount of time.
> Since GaaS supports retries on failures, these failures due to SLAs should
> also be retryable.
> However, if the flow is cancelled by a user-specified event through the API,
> we do not want to retry.
> Additionally, we also do not want to retry if a flow is skipped due to
> concurrent jobs running at the same time, as it is unlikely, without a more
> sophisticated waiting algorithm, that the running job will have finished by the
> time the job is retried, so the retry would waste resources.