[ https://issues.apache.org/jira/browse/OOZIE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128269#comment-14128269 ]
Shwetha G S commented on OOZIE-1536:
------------------------------------

Just outlining the issue and the whole discussion:

Take a case where a coord action and its corresponding workflow are killed. There are two options for re-running this instance: a coord action re-run or a workflow re-run. Typically in a prod environment, more than one person monitors the pipeline, and we can't make sure that they always use the same option. If the coord action is re-run, Oozie launches a new workflow, so now there is 1 coord action and 2 workflows for the same nominal time. If someone then re-runs both workflows, there will be 2 jobs running in parallel for the same nominal time, both generating the same data. This results in inconsistent data, and it's a nightmare to figure out the issue and fix it.

Oozie should make sure that there is a single workflow instance per coord action for a given nominal time. We can achieve this by re-running the old workflow with all the new properties, even in the case of a coord action re-run.

Some of the issues raised with this approach are:
1. Coord action re-run with the refresh option should re-validate the data sets: this will pick up the new definition from COORD_JOBS and re-materialise the instance. Instead of launching a new workflow, it can re-run the existing workflow, overriding it with the new properties.
2. If the coord is updated, a coord action re-run with refresh should pick up the new definition: same as 1, and will work.
3. Case where the workflow path is updated for the coord: same as 1. ReRunXCommand (workflow re-run) deletes all entries from WF_ACTIONS (since skip nodes will not be set for a coord action re-run) and runs the workflow like a fresh workflow.

This will solve a lot of issues:
1. Data inconsistency caused by parallel workflows for the same instance.
2. Concurrency handling: workflow re-run doesn't honour concurrency. Coord action re-run handles concurrency, but it launches a new workflow, which causes the problem described above.
3. Fewer workflows in the DB, because a coord action re-run re-runs the existing workflow.

[~rohini], [~puru] can you check if you see any issues with this?

> Coordinator action reruns start a new workflow
> ----------------------------------------------
>
>                 Key: OOZIE-1536
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1536
>             Project: Oozie
>          Issue Type: Improvement
>            Reporter: Srikanth Sundarrajan
>
> Coordinator action reruns start a new workflow and if existing workflow for
> the action is in running state, the same is not checked. Coord rerun can
> possibly do a workflow re-run to prevent this.
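The proposal can be illustrated with a small Python toy model. This is not Oozie code. `Workflow`, `CoordAction`, `ToyCoordinator` and `scenario` are hypothetical names, and the nominal time and properties are made up for illustration. In Oozie itself this state lives in tables such as COORD_JOBS and WF_ACTIONS.

The model replays the scenario from the comment:

- A killed coord action is re-run.
- Someone then re-runs every workflow they find for that nominal time.

With the current behaviour, this ends with two running workflows. With the proposed behaviour, the coord action re-run reuses the existing workflow, overrides its properties and clears its recorded actions, as if no skip nodes were set.

```python
from dataclasses import dataclass, field
from itertools import count

# Hypothetical toy model of one coord action; none of these names are Oozie classes.


@dataclass
class Workflow:
    wf_id: str
    conf: dict
    actions: list = field(default_factory=list)  # stands in for WF_ACTIONS rows
    status: str = "RUNNING"


@dataclass
class CoordAction:
    nominal_time: str
    workflows: list = field(default_factory=list)  # every workflow launched for it


class ToyCoordinator:
    def __init__(self, reuse_workflow_on_rerun: bool):
        self.reuse = reuse_workflow_on_rerun
        self._ids = count(1)

    def launch(self, action: CoordAction, conf: dict) -> Workflow:
        """Start a brand-new workflow for the action's nominal time."""
        wf = Workflow(f"wf-{next(self._ids)}", dict(conf))
        action.workflows.append(wf)
        return wf

    def rerun_coord_action(self, action: CoordAction, new_conf: dict) -> Workflow:
        if self.reuse and action.workflows:
            # Proposed: restart the existing workflow with the new properties.
            # No skip nodes are set, so every recorded action is cleared.
            wf = action.workflows[-1]
            wf.conf = dict(new_conf)
            wf.actions.clear()
            wf.status = "RUNNING"
            return wf
        # Behaviour described in the issue: a second workflow, same nominal time.
        return self.launch(action, new_conf)


def scenario(reuse: bool) -> tuple:
    """Kill, coord-rerun, then have a second operator re-run every workflow."""
    coord = ToyCoordinator(reuse)
    action = CoordAction("2014-09-10T00:00Z")  # illustrative nominal time
    first = coord.launch(action, {"input": "v1"})
    first.actions.append("mr-node")  # some progress before the kill
    first.status = "KILLED"
    coord.rerun_coord_action(action, {"input": "v2"})
    for wf in action.workflows:
        wf.status = "RUNNING"  # stands in for a workflow re-run
    running = sum(wf.status == "RUNNING" for wf in action.workflows)
    return len(action.workflows), running


print(scenario(reuse=False))  # (2, 2): two live workflows for one nominal time
print(scenario(reuse=True))   # (1, 1): the single workflow is reused
```

The point of the sketch is that the coord action, not the operator, enforces "one workflow per nominal time". How Oozie would actually persist this and handle locking is outside the scope of the model.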