[
https://issues.apache.org/jira/browse/GOBBLIN-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vaibhav Singhal updated GOBBLIN-2181:
-------------------------------------
Description:
- Flow initialization or processing frequently fails, and as a result the flow
cannot be concluded properly
- Azkaban client exceptions and SQLIntegrityViolation exceptions are examples
that have caused such failures recently
- Currently most of these failures are treated as transient exceptions by
default and are retried indefinitely
- As a side effect, the affected flows never conclude and future flow
submissions fail, which has caused incidents recently
- More details on the analysis and the options explored are available here -
https://docs.google.com/document/d/1PeuuslIVSX6gQrX1J5d0HW0HbNuNhULCHQVjPxpDghs/edit?tab=t.0
- As a first step, we want to treat all exceptions as non-transient, not retry
them, and conclude the flow by removing its flowspec and dag action
- This issue tracks the changes to conclude the flow on non-transient
exceptions and to mark it as failed, so that the flow status is reflected
correctly
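
The first-step policy described above could be sketched roughly as follows. This is only an illustration, not the actual Gobblin change: the class NonTransientHandlingSketch, the FlowState holder, the FlowStep interface, and the "FAILED" status string are all hypothetical names, and the flowspec and dag action stores are replaced by in-memory sets.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class NonTransientHandlingSketch {

    /** Hypothetical in-memory stand-in for the flowspec store, dag action store, and flow status. */
    static class FlowState {
        final Set<String> flowSpecs = new HashSet<>();
        final Set<String> dagActions = new HashSet<>();
        final Map<String, String> status = new HashMap<>();
        int attempts = 0;
    }

    /** Hypothetical unit of work that may throw during flow initialization or processing. */
    interface FlowStep {
        void run() throws Exception;
    }

    /** Runs the step once; any exception is treated as non-transient and concludes the flow. */
    static void process(String flowId, FlowStep step, FlowState state) {
        state.attempts++;
        try {
            step.run();
        } catch (Exception e) {
            // Assumed policy: no retry. Conclude the flow by removing its
            // flowspec and dag action, then record the failure.
            state.flowSpecs.remove(flowId);
            state.dagActions.remove(flowId);
            state.status.put(flowId, "FAILED");
        }
    }

    public static void main(String[] args) {
        FlowState state = new FlowState();
        state.flowSpecs.add("flow-1");
        state.dagActions.add("flow-1");

        // Simulate a failure during flow processing.
        process("flow-1", () -> { throw new IllegalStateException("simulated failure"); }, state);

        System.out.println("status=" + state.status.get("flow-1")
                + " attempts=" + state.attempts
                + " flowSpecPresent=" + state.flowSpecs.contains("flow-1"));
    }
}
```

The point of the sketch is that the step runs exactly once, and the cleanup and failure marking happen in the same catch block, so a failed flow cannot linger and block later submissions.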
> Non transient exception handling by flowspec removal
> ----------------------------------------------------
>
> Key: GOBBLIN-2181
> URL: https://issues.apache.org/jira/browse/GOBBLIN-2181
> Project: Apache Gobblin
> Issue Type: Bug
> Reporter: Vaibhav Singhal
> Priority: Major