[
https://issues.apache.org/jira/browse/FALCON-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195086#comment-15195086
]
Pallavi Rao commented on FALCON-1852:
-------------------------------------
To ensure that partial data is not consumed, we can enhance OozieELExtensions
to check for the availabilityFlag under each instance dir and build the list
of paths from only those dirs that contain it.
The one remaining challenge is the case where there are no instances at all
for a given window. The property would then evaluate to an empty string and
the workflow would fail. If I could set the value to an "empty dir" in that
case, the problem would be solved.
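A minimal sketch of the filtering idea above, not the actual OozieELExtensions
code. The class name, the method name, the `exists` predicate (standing in for
a file system existence check) and the `emptyDir` fallback are all
hypothetical. In particular, the fallback assumes that a pre-created empty
directory is available, which is exactly the open question raised above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper, for illustration only; not part of Falcon or Oozie.
final class AvailableInstanceFilter {

    private AvailableInstanceFilter() {
    }

    /**
     * Keeps only the instance dirs that contain the availability flag and
     * joins them with commas. Returns emptyDir when no dir qualifies, so the
     * resolved property is never an empty string.
     *
     * @param instanceDirs     resolved instance dirs for the data window
     * @param availabilityFlag file name that marks an instance as complete
     * @param exists           assumed existence check for a full path
     * @param emptyDir         assumed path of a pre-created empty directory
     */
    static String resolve(List<String> instanceDirs, String availabilityFlag,
                          Predicate<String> exists, String emptyDir) {
        List<String> available = new ArrayList<>();
        for (String dir : instanceDirs) {
            // Build the flag path without doubling the separator.
            String flagPath = dir.endsWith("/")
                    ? dir + availabilityFlag
                    : dir + "/" + availabilityFlag;
            if (exists.test(flagPath)) {
                available.add(dir);
            }
        }
        return available.isEmpty() ? emptyDir : String.join(",", available);
    }
}
```

Passing the existence check in as a predicate keeps the sketch independent of
any particular file system API. A real implementation would presumably back it
with a lookup against the feed's storage.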
> Optional Input for a process not truly optional
> -----------------------------------------------
>
> Key: FALCON-1852
> URL: https://issues.apache.org/jira/browse/FALCON-1852
> Project: Falcon
> Issue Type: Bug
> Reporter: Pallavi Rao
> Assignee: Pallavi Rao
>
> Currently, when a feed input is marked as optional, we do not add it to the
> coordinator definition's datasets. This means we do not wait for all
> instances (for a given data window) to arrive. Instead, we just resolve the
> paths for the data window and pass them as a parameter.
> For example:
> {noformat}
> <inputs>
>     <!-- In the workflow, the input paths will be available in a variable 'inpaths' -->
>     <input name="inpaths" feed="in" start="now(0,-5)" end="now(0,-1)"/>
>     <input name="in2paths" feed="in2" start="now(0,-5)" end="now(0,-1)" optional="true"/>
> </inputs>
> {noformat}
> For a process instance 2013-01-01T00:00Z, the optional input, in2paths, will
> be resolved as below:
> {noformat}
> <property>
>     <name>in2paths</name>
>     <value>hdfs://localhost:9000/data/in2/2013/11/15/00/04,hdfs://localhost:9000/data/in2/2013/11/15/00/03,hdfs://localhost:9000/data/in2/2013/11/15/00/02,hdfs://localhost:9000/data/in2/2013/11/15/00/01,hdfs://localhost:9000/data/in2/2013/11/15/00/00</value>
> </property>
> {noformat}
> If one of the instances of in2paths (for example,
> hdfs://localhost:9000/data/in2/2013/11/15/00/04) is missing, the workflow
> will fail anyway.
> Hence the input in2paths is not truly optional. The only difference is that
> triggering of the process instance is not gated on it.