[
https://issues.apache.org/jira/browse/OOZIE-548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dénes Bodó updated OOZIE-548:
-----------------------------
Summary: OOZIE-131: Support WF action level retry (was: OOZIE-131: Support
WF action level rery)
> OOZIE-131: Support WF action level retry
> ----------------------------------------
>
> Key: OOZIE-548
> URL: https://issues.apache.org/jira/browse/OOZIE-548
> Project: Oozie
> Issue Type: New Feature
> Reporter: Mohammad Islam
> Assignee: Roman Shaposhnik
> Priority: Major
>
> While Hadoop task-level retry and Oozie-level retry already exist for
> transient errors, it is desirable to also allow WF action-level retry that
> the user can configure.
> In this proposed task, the following sub-tasks need to be considered:
> 1. Enable user to specify the retry count and retry interval (time between
> two successive tries).
> 2. Retry interval will be in minutes and the default value is 10 minutes. The
> default value should be system level configuration.
> 3. The default retry count is 0 (no retry), to stay backward compatible.
> 4. A new state called "RETRY" will be added to WF actions. An action will be
> in the RETRY state if the job failed and needs to be retried.
> 5. Three fields need to be added to the WF action table: retry_count,
> max_retry, and retry_interval.
> 6. A service such as the Recovery service will periodically run the
> following SQL: "select action_id from WF_ACTIONS where status = 'RETRY' and
> (last_modified_time + retry_interval) < current_time and max_retry >
> retry_count" and queue RETRY_COMMAND. The last filter of the SQL might not
> be required.
> 7. RETRY_COMMAND will update the status from RETRY to PREP and push an
> ActionStartXCommand.
> Open Questions:
> a) Who will remove the temporary directories/files (such as ACTION_DIR)
> created by Oozie? Should that happen when the job moves to the RETRY state,
> or could RETRY_COMMAND do it?
> b) Do we need to keep historical information such as why the previous retries
> failed? Historical information includes error code, error message, etc.
> c) Anything else?
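
The periodic check and the RETRY_COMMAND transition described in the quoted
proposal can be sketched roughly as below. This is only an illustration in
Python, not Oozie code: the WfAction class and the helper functions are
hypothetical names, the field names follow the proposed columns, and
incrementing retry_count inside the retry command is an assumption, since the
proposal does not say where the counter is updated.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Defaults taken from the proposal: 10-minute interval, 0 retries (no retry).
DEFAULT_RETRY_INTERVAL_MIN = 10
DEFAULT_MAX_RETRY = 0


@dataclass
class WfAction:
    """Hypothetical in-memory stand-in for a row of the WF action table."""
    action_id: str
    status: str
    last_modified_time: datetime
    retry_count: int = 0
    max_retry: int = DEFAULT_MAX_RETRY
    retry_interval: int = DEFAULT_RETRY_INTERVAL_MIN  # minutes


def is_due_for_retry(action: WfAction, now: datetime) -> bool:
    """Mirror the three filters of the proposed SQL query."""
    due_at = action.last_modified_time + timedelta(minutes=action.retry_interval)
    return (
        action.status == "RETRY"
        and due_at < now
        and action.max_retry > action.retry_count
    )


def retry_command(action: WfAction, now: datetime) -> str:
    """Move the action from RETRY to PREP and name the command to push."""
    action.status = "PREP"
    action.retry_count += 1  # assumption: the counter is bumped here
    action.last_modified_time = now
    return "ActionStartXCommand"


def recovery_pass(actions, now: datetime):
    """One periodic pass: select due actions and queue their restart."""
    return [
        (a.action_id, retry_command(a, now))
        for a in actions
        if is_due_for_retry(a, now)
    ]
```

With these defaults an action whose max_retry is left at 0 is never selected,
which matches the backward-compatibility goal stated in the proposal.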
--
This message was sent by Atlassian Jira
(v8.3.4#803005)