[ 
https://issues.apache.org/jira/browse/FALCON-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15191118#comment-15191118
 ] 

sandeep samudrala commented on FALCON-1852:
-------------------------------------------

It is very much right to do this, as there have been multiple occasions where 
users have reported job failures because optional inputs could not be 
evaluated (the data was not present).

The only issue I see is the case where the data for a path is not completely 
available (for feeds with an availability flag, the mere existence of the 
directory does not mean the data is complete), in which case half-baked data 
might be consumed by the process. That case can be written off by saying that 
optional means consuming whatever data is available. 
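
To make the concern concrete, here is a minimal sketch of a completeness 
check. It assumes the availability flag is a file named _SUCCESS and uses the 
local filesystem as a stand-in for HDFS; the function name is hypothetical and 
this is not Falcon code.

```python
from pathlib import Path
from typing import Iterable, List


def complete_instances(paths: Iterable[str], flag: str = "_SUCCESS") -> List[str]:
    """Keep only directories that contain the availability flag file.

    A directory that exists but lacks the flag is treated as incomplete,
    so it is dropped along with directories that do not exist at all.
    The default flag name "_SUCCESS" is an assumption for illustration.
    """
    return [p for p in paths if (Path(p) / flag).is_file()]
```

With a check like this, a directory that is still being written would be 
skipped instead of being handed to the process as if it were complete.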

I am all thumbs up for the above approach too.

> Optional Input for a process not truly optional
> -----------------------------------------------
>
>                 Key: FALCON-1852
>                 URL: https://issues.apache.org/jira/browse/FALCON-1852
>             Project: Falcon
>          Issue Type: Bug
>            Reporter: Pallavi Rao
>            Assignee: Pallavi Rao
>
> Currently, when a feed input is marked as optional, we do not add it to the 
> coordinator definition's datasets. This means we do not wait for all 
> instances (for a given data window) to arrive. Instead, we just resolve the 
> paths for a data window and pass it as a parameter.
> For example:
> {noformat}
> <inputs>
>         <!-- In the workflow, the input paths will be available in a variable 'inpaths' -->
>         <input name="inpaths" feed="in" start="now(0,-5)" end="now(0,-1)"/>
>         <input name="in2paths" feed="in2" start="now(0,-5)" end="now(0,-1)" optional="true"/>
>     </inputs>
> {noformat}
> For a process instance 2013-01-01T00:00Z, the optional input, in2paths, will 
> be resolved as below:
> {noformat}
>  <property>
>     <name>in2paths</name>
>     <value>hdfs://localhost:9000/data/in2/2013/11/15/00/04,hdfs://localhost:9000/data/in2/2013/11/15/00/03,hdfs://localhost:9000/data/in2/2013/11/15/00/02,hdfs://localhost:9000/data/in2/2013/11/15/00/01,hdfs://localhost:9000/data/in2/2013/11/15/00/00</value>
>   </property>
> {noformat}
> If one of the instances of in2paths (for example, 
> hdfs://localhost:9000/data/in2/2013/11/15/00/04) is missing, the workflow 
> will fail anyway.
> Hence the input in2paths is not truly optional. The only difference is that 
> the triggering of the instance is not gated on it.
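
For illustration only, the sketch below shows one way the resolved 
comma-separated value could be reduced to the instances that are actually 
present, so that a missing optional instance does not fail the workflow. The 
function name and the `exists` callback are hypothetical; a real version would 
query HDFS, and this is not what Falcon currently does.

```python
from typing import Callable


def existing_paths(value: str, exists: Callable[[str], bool]) -> str:
    """Drop absent instances from a comma-separated list of paths.

    `exists` is a caller-supplied predicate (an assumption of this sketch);
    it stands in for a filesystem lookup such as an HDFS existence check.
    """
    kept = [p for p in value.split(",") if p and exists(p)]
    return ",".join(kept)


# Pretend that only instance 00/04 is missing.
resolved = (
    "hdfs://localhost:9000/data/in2/2013/11/15/00/04,"
    "hdfs://localhost:9000/data/in2/2013/11/15/00/03"
)
present = {"hdfs://localhost:9000/data/in2/2013/11/15/00/03"}
print(existing_paths(resolved, present.__contains__))
```

If every instance is missing the result is an empty string, so the workflow 
would still need to handle an empty input list.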



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
