[jira] [Work logged] (GOBBLIN-1634) GaaS Flow SLA Kills should be retryable if configured

ASF GitHub Bot (Jira) Wed, 27 Apr 2022 16:02:09 -0700


     [ 
https://issues.apache.org/jira/browse/GOBBLIN-1634?focusedWorklogId=763191&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-763191
 ]


ASF GitHub Bot logged work on GOBBLIN-1634:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 27/Apr/22 23:01
            Start Date: 27/Apr/22 23:01
    Worklog Time Spent: 10m 
      Work Description: Will-Lo commented on PR #3495:
URL: https://github.com/apache/gobblin/pull/3495#issuecomment-1111559959

   > 
   
   > nice work!
   > 
   > this seemed an important element (from your PR summary):
   > 
   > ```
   > Additionally, we also do not want to retry if a flow is skipped due to
   > concurrent jobs running at the same time
   > ```
   > 
   > yet I'm having trouble finding anything related to this in the impl. (a
   > comment saying the same at the least...). does it just come down to some
   > `CANCELED` ones not bearing a `isFlowSlaKilled.equals(true)`?
   
   In the scenario I'm describing, the timing event emitted is SKIPPED, and it 
is emitted from the Orchestrator. Since the DAG is never created, I believe a 
retry would override the check, which is done in the DagManager (which comes 
after orchestration). It is also unlikely that the concurrent flow would have 
finished by the time the retry is processed anyway, since we do not wait 
before retries.
   
   




Issue Time Tracking
-------------------

    Worklog Id:     (was: 763191)
    Time Spent: 0.5h  (was: 20m)

> GaaS Flow SLA Kills should be retryable if configured
> -----------------------------------------------------
>
>                 Key: GOBBLIN-1634
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1634
>             Project: Apache Gobblin
>          Issue Type: Task
>            Reporter: William Lo
>            Priority: Major
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> On Gobblin as a Service, flows can fail SLAs if they do not receive a Kafka 
> event within some designated amount of time.
> Since GaaS supports retries on failures, these failures due to SLAs should 
> also be retryable.
> However, if the flow is cancelled by a user-specified event through the API, 
> we do not want to retry.
> Additionally, we do not want to retry if a flow is skipped due to 
> concurrent jobs running at the same time: without a more sophisticated 
> waiting algorithm, it is unlikely that the concurrent job will have finished 
> by the time the retry is attempted, so the retry would waste resources.
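
The retry rules described above can be sketched roughly as follows. This is only an illustration of the decision logic, not the code from the PR: the class `RetryPolicySketch`, the `Status` enum, the `retryOnSlaKill` flag, and the attempt counters are all hypothetical names introduced here. Only the `CANCELED` and `SKIPPED` statuses and the `isFlowSlaKilled` marker are mentioned in the thread.

```java
// Hypothetical sketch of the retry decision; not Gobblin's actual API.
class RetryPolicySketch {

    // Assumed terminal statuses; the thread only names CANCELED and SKIPPED.
    enum Status { FAILED, CANCELED, SKIPPED, COMPLETE }

    /**
     * Decides whether a finished flow execution should be retried.
     *
     * @param status          terminal status of the execution
     * @param isFlowSlaKilled true if the cancellation came from an SLA kill
     * @param retryOnSlaKill  assumed config flag enabling retries of SLA kills
     * @param attempt         number of attempts made so far (1-based)
     * @param maxAttempts     assumed configured upper bound on attempts
     */
    static boolean shouldRetry(Status status, boolean isFlowSlaKilled,
                               boolean retryOnSlaKill, int attempt, int maxAttempts) {
        if (attempt >= maxAttempts) {
            return false; // retry budget exhausted
        }
        switch (status) {
            case FAILED:
                return true; // ordinary failures are already retryable
            case CANCELED:
                // Retry only SLA kills, and only if configured;
                // a user cancellation through the API is never retried.
                return isFlowSlaKilled && retryOnSlaKill;
            case SKIPPED:
                // Skipped because a concurrent execution is running;
                // retrying immediately would most likely be skipped again.
                return false;
            default:
                return false; // successful runs need no retry
        }
    }

    public static void main(String[] args) {
        System.out.println("SLA kill:    " + shouldRetry(Status.CANCELED, true, true, 1, 3));
        System.out.println("User cancel: " + shouldRetry(Status.CANCELED, false, true, 1, 3));
        System.out.println("Skipped:     " + shouldRetry(Status.SKIPPED, false, true, 1, 3));
    }
}
```

Keeping the decision in one pure function like this makes it easy to see that a `CANCELED` status alone is not enough to trigger a retry; the SLA-kill marker has to be present as well.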



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
