[
https://issues.apache.org/jira/browse/FALCON-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15191118#comment-15191118
]
sandeep samudrala commented on FALCON-1852:
-------------------------------------------
This is very much the right thing to do, as there have been multiple occasions
where users have reported job failures caused by optional inputs not being
evaluated (data not being present).
The only issue I see is the case where the data for a path is not completely
available (for feeds with an availability flag, the mere existence of the
directory does not mean the data is complete), in which case half-baked data
might be consumed by the process. That case can be written off by saying that
optional means consuming whatever data is available.
I am all thumbs up for the above approach too.
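To make the half-baked data case concrete, here is a rough sketch in Python (not Falcon code) of how a consumer could filter the resolved comma-separated path list. The function name usable_paths, the exists predicate, and the "_SUCCESS" marker name are all assumptions made for illustration; a real check would go through the Hadoop FileSystem API and use the feed's configured availability flag.

```python
from typing import Callable, List, Optional


def usable_paths(
    resolved: str,
    exists: Callable[[str], bool],
    availability_flag: Optional[str] = None,
) -> List[str]:
    """Return only the instance paths that look safe to consume.

    resolved: comma-separated paths, as passed in a property like 'in2paths'.
    exists: hypothetical predicate that checks a path on the file system.
    availability_flag: optional marker file name (assumed, e.g. '_SUCCESS');
        when set, a directory counts only if the marker exists inside it.
    """
    kept = []
    for path in (p.strip() for p in resolved.split(",")):
        if not path:
            continue
        if not exists(path):
            continue  # the instance never arrived
        if availability_flag and not exists(f"{path.rstrip('/')}/{availability_flag}"):
            continue  # directory exists, but the data may be incomplete
        kept.append(path)
    return kept


if __name__ == "__main__":
    base = "hdfs://localhost:9000/data/in2/2013/11/15/00"
    # Stand-in for the file system: 03 is complete, 02 has no marker, 04 is missing.
    present = {f"{base}/03", f"{base}/03/_SUCCESS", f"{base}/02"}
    resolved = f"{base}/04,{base}/03,{base}/02"
    print(usable_paths(resolved, present.__contains__, "_SUCCESS"))
```

With the marker check enabled, only the 03 instance is kept. Without it (availability_flag left as None), both 03 and 02 are kept, which is exactly the "whatever data is available" reading of optional.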
> Optional Input for a process not truly optional
> -----------------------------------------------
>
> Key: FALCON-1852
> URL: https://issues.apache.org/jira/browse/FALCON-1852
> Project: Falcon
> Issue Type: Bug
> Reporter: Pallavi Rao
> Assignee: Pallavi Rao
>
> Currently, when a feed input is marked as optional, we do not add it to the
> coordinator definition's datasets. This means we do not wait for all
> instances (for a given data window) to arrive. Instead, we just resolve the
> paths for a data window and pass them as a parameter.
> For example:
> {noformat}
> <inputs>
> <!-- In the workflow, the input paths will be available in a variable
> 'inpaths' -->
> <input name="inpaths" feed="in" start="now(0,-5)" end="now(0,-1)"/>
> <input name="in2paths" feed="in2" start="now(0,-5)" end="now(0,-1)"
> optional="true"/>
> </inputs>
> {noformat}
> For a process instance 2013-01-01T00:00Z, the optional input, in2paths, will
> be resolved as below:
> {noformat}
> <property>
> <name>in2paths</name>
> <value>hdfs://localhost:9000/data/in2/2013/11/15/00/04,hdfs://localhost:9000/data/in2/2013/11/15/00/03,hdfs://localhost:9000/data/in2/2013/11/15/00/02,hdfs://localhost:9000/data/in2/2013/11/15/00/01,hdfs://localhost:9000/data/in2/2013/11/15/00/00</value>
> </property>
> {noformat}
> If one of the instances of in2paths (for example,
> hdfs://localhost:9000/data/in2/2013/11/15/00/04) is missing, the workflow
> will fail anyway.
> Hence, the input in2paths is not truly optional; the only difference is that
> the triggering of the instance is not gated on it.