[ https://issues.apache.org/jira/browse/OOZIE-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Purshotam Shah updated OOZIE-1976:
----------------------------------
    Attachment: OOZIE-1976-V3.patch

> Specifying coordinator input datasets in more logical ways
> ----------------------------------------------------------
>
>                 Key: OOZIE-1976
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1976
>             Project: Oozie
>          Issue Type: New Feature
>          Components: coordinator
>    Affects Versions: trunk
>            Reporter: Mona Chitnis
>            Assignee: Purshotam Shah
>             Fix For: trunk
>
>         Attachments: Input-check.docx, OOZIE-1976-V3.patch,
> OOZIE-1976-WIP.patch, OOZIE-1976-rough-design-2.pdf,
> OOZIE-1976-rough-design.pdf
>
>
> All dataset instances specified as input to a coordinator currently use
> AND logic, i.e. ALL of them must be available for the workflow to start.
> We should enhance this to support more logical ways of specifying
> availability criteria, e.g.
> * OR between instances
> * minimum N out of K instances
> * delta datasets (process data incrementally)
> Use cases for this:
> * Different datasets are BCP, and the workflow can run with either,
> whichever arrives earlier.
> * Data is not guaranteed, and while $coord:latest allows skipping to
> available ones, the workflow will never trigger unless the specified
> number of instances are found.
> * The workflow is like a ‘refining’ algorithm that should run once the
> minimum required datasets are ready, and should process only the delta
> for efficiency.
> This JIRA is to discuss the design and then review the implementation
> for some or all of the above features.
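
To make the difference between the criteria in the description concrete, here is a minimal sketch of the current AND check next to the proposed OR and "minimum N out of K" checks. It is only an illustration: the function names and the list-of-booleans representation are assumptions made for this example, not Oozie APIs and not the approach taken in the attached patch. Delta datasets are not modelled.

```python
from typing import Sequence


def all_available(available: Sequence[bool]) -> bool:
    """AND logic (current behaviour per the issue): every instance must be present."""
    return all(available)


def any_available(available: Sequence[bool]) -> bool:
    """OR logic (proposed): one available instance is enough."""
    return any(available)


def at_least_n_available(available: Sequence[bool], n: int) -> bool:
    """Minimum N out of K (proposed): at least n of the K instances are present."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return sum(1 for flag in available if flag) >= n


# Hypothetical availability of K = 3 input instances.
instances = [True, False, True]

runs_with_and = all_available(instances)               # one instance is missing
runs_with_or = any_available(instances)                # at least one is present
runs_with_min_2 = at_least_n_available(instances, 2)   # two of three are present
runs_with_min_3 = at_least_n_available(instances, 3)   # only two are present
```

With one of three instances missing, the AND check blocks the workflow, while the OR check and a minimum of 2 would both let it start.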