[ 
https://issues.apache.org/jira/browse/OOZIE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128269#comment-14128269
 ] 

Shwetha G S commented on OOZIE-1536:
------------------------------------

To summarise the issue and the discussion so far:
Take a case where a coord action and its corresponding workflow are killed. We 
have 2 options for re-running this instance: coord action re-run or workflow 
re-run. Typically in a prod environment, more than one person monitors the 
pipeline, and we can't ensure that everyone consistently uses the same option. 
If the coord action is re-run, Oozie launches a new workflow, and now there is 
1 coord action and 2 workflows for the same nominal time. If someone then 
re-runs both workflows, there will be 2 jobs running in parallel for the same 
nominal time, generating the same data. This results in inconsistent data, and 
it's a nightmare to figure out the issue and fix it.
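
To make the scenario concrete, here is a minimal toy model in Python. It is not 
Oozie code: the Workflow and CoordAction classes, the rerun helpers and the 
nominal time are all made up for illustration, and it only assumes the 
behaviour described above (coord action re-run always launches a new workflow, 
workflow re-run ignores sibling workflows).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Workflow:
    wf_id: str
    nominal_time: str
    status: str = "KILLED"


@dataclass
class CoordAction:
    nominal_time: str
    workflows: List[Workflow] = field(default_factory=list)

    def rerun(self) -> Workflow:
        # Assumed behaviour: coord action re-run always launches a new workflow.
        wf = Workflow(f"wf-{len(self.workflows) + 1}", self.nominal_time, "RUNNING")
        self.workflows.append(wf)
        return wf


def rerun_workflow(wf: Workflow) -> None:
    # Assumed behaviour: workflow re-run does not check the coord action's
    # other workflows before starting.
    wf.status = "RUNNING"


nominal = "2014-09-10T00:00Z"  # made-up nominal time
action = CoordAction(nominal, [Workflow("wf-1", nominal)])

action.rerun()                       # operator A uses coord action re-run
rerun_workflow(action.workflows[0])  # operator B re-runs the old workflow

# Both workflows are now running for the same nominal time.
running = [w.wf_id for w in action.workflows if w.status == "RUNNING"]
print(running)
```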

Oozie should make sure that there is a single instance of the workflow for a 
coord for a given nominal time. We can achieve this by re-running the old 
workflow with all new properties, even in the case of coord action re-run. Some 
of the issues raised with this approach are:
1. Coord action re-run with the refresh option should re-validate the data 
sets: This will pick the new definition from COORD_JOBS and re-materialise the 
instance. Instead of launching a new workflow, it can re-run the existing 
workflow, overriding its properties with the new ones.
2. If the coord is updated, coord action re-run with refresh should pick the 
new definition: Same as 1, and it will work.
3. Case where the workflow path is updated for the coord: Same as 1. 
ReRunXCommand (workflow re-run) deletes all entries from WF_ACTIONS (since skip 
nodes will not be set for coord action re-run) and runs the workflow like a 
fresh workflow.
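
A rough sketch of the proposed re-run logic, as a toy Python model rather than 
actual Oozie code. Every name here (Workflow, CoordAction, rerun_coord_action, 
the wfPath key) is hypothetical. The actions list stands in for the WF_ACTIONS 
rows, and refusing to re-run a workflow that is still running is my assumption 
based on the issue description.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Workflow:
    wf_id: str
    status: str
    conf: Dict[str, str] = field(default_factory=dict)
    actions: List[str] = field(default_factory=list)  # stands in for WF_ACTIONS


@dataclass
class CoordAction:
    nominal_time: str
    workflow: Optional[Workflow] = None  # at most one workflow per nominal time


def rerun_coord_action(action: CoordAction, new_conf: Dict[str, str]) -> Workflow:
    wf = action.workflow
    if wf is None:
        # First run for this nominal time: launch a workflow.
        wf = Workflow(f"wf-{action.nominal_time}", "RUNNING", dict(new_conf))
        action.workflow = wf
        return wf
    if wf.status == "RUNNING":
        # Assumption: never start a second run while the first is active.
        raise RuntimeError(f"{wf.wf_id} is still running")
    # Re-run the existing workflow instead of launching a new one.
    wf.conf.update(new_conf)  # new properties override the old ones
    wf.actions.clear()        # no skip nodes, so start like a fresh workflow
    wf.status = "RUNNING"
    return wf


old = Workflow("wf-1", "KILLED", {"wfPath": "/apps/old"}, ["prepare", "load"])
action = CoordAction("2014-09-10T00:00Z", old)
rerun = rerun_coord_action(action, {"wfPath": "/apps/new"})
```

With this shape, an updated workflow path (case 3) is just another overridden 
property, and the coord action keeps pointing at the same workflow id.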

This will solve a lot of issues:
1. Data inconsistency caused by parallel workflows for the same instance
2. Concurrency handling: Workflow re-run doesn't honour concurrency. Coord 
action re-run handles concurrency, but launches a new workflow, which causes 
issue 1
3. Fewer workflows in the DB, since coord action re-run reuses the existing 
workflow

[~rohini], [~puru] can you check if you see any issues with this?

> Coordinator action reruns start a new workflow
> ----------------------------------------------
>
>                 Key: OOZIE-1536
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1536
>             Project: Oozie
>          Issue Type: Improvement
>            Reporter: Srikanth Sundarrajan
>
> Coordinator action reruns start a new workflow and if existing workflow for 
> the action is in running state, the same is not checked. Coord rerun can 
> possibly do a workflow re-run to prevent this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
