[ https://issues.apache.org/jira/browse/FALCON-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195086#comment-15195086 ]
Pallavi Rao commented on FALCON-1852:
-------------------------------------

To ensure partial data is not consumed, we can enhance OozieELExtensions to check for the existence of the availabilityFlag under each instance dir and build a list of paths from only those dirs that contain the availabilityFlag. The one challenge is the case where there are no instances at all for a given window. In that case, the property evaluates to an empty string and the workflow fails. If only I could set this value to an "empty dir", the problem would be solved.

> Optional Input for a process not truly optional
> -----------------------------------------------
>
>                 Key: FALCON-1852
>                 URL: https://issues.apache.org/jira/browse/FALCON-1852
>             Project: Falcon
>          Issue Type: Bug
>            Reporter: Pallavi Rao
>            Assignee: Pallavi Rao
>
> Currently, when a feed input is marked as optional, we do not add it to the
> coordinator definition's datasets. This means we do not wait for all
> instances (for a given data window) to arrive. Instead, we just resolve the
> paths for a data window and pass them as a parameter.
> For example:
> {noformat}
> <inputs>
>     <!-- In the workflow, the input paths will be available in a variable 'inpaths' -->
>     <input name="inpaths" feed="in" start="now(0,-5)" end="now(0,-1)"/>
>     <input name="in2paths" feed="in2" start="now(0,-5)" end="now(0,-1)" optional="true"/>
> </inputs>
> {noformat}
> For a process instance 2013-01-01T00:00Z, the optional input, in2paths, will
> be resolved as below:
> {noformat}
> <property>
>     <name>in2paths</name>
>     <value>hdfs://localhost:9000/data/in2/2013/11/15/00/04,hdfs://localhost:9000/data/in2/2013/11/15/00/03,hdfs://localhost:9000/data/in2/2013/11/15/00/02,hdfs://localhost:9000/data/in2/2013/11/15/00/01,hdfs://localhost:9000/data/in2/2013/11/15/00/00</value>
> </property>
> {noformat}
> If one of the instances of in2paths (for example,
> hdfs://localhost:9000/data/in2/2013/11/15/00/04) is missing, the workflow
> will fail anyway.
> Hence the input in2paths is not truly optional; the only difference is that
> triggering of the instance is not gated on it.
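
The filtering proposed in the comment can be sketched roughly as follows. This is an illustration only, not the actual OozieELExtensions code: the class and method names are hypothetical, java.nio.file stands in for the Hadoop FileSystem API so the sketch runs locally, the flag name _SUCCESS is an assumed value for availabilityFlag, and the fallback to a pre-created empty dir is the idea floated in the comment rather than a confirmed fix.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; not part of Falcon or Oozie.
class OptionalInputPaths {

    /**
     * Returns a comma-separated list of the instance dirs that contain the
     * availability flag, preserving input order. If no dir qualifies, returns
     * emptyDir instead of an empty string.
     */
    static String resolve(List<String> instanceDirs, String availabilityFlag, String emptyDir) {
        List<String> available = new ArrayList<>();
        for (String dir : instanceDirs) {
            // Keep an instance only if its flag file exists, so that missing
            // or partially written instances are skipped.
            if (Files.exists(Paths.get(dir, availabilityFlag))) {
                available.add(dir);
            }
        }
        return available.isEmpty() ? emptyDir : String.join(",", available);
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("in2");
        Path complete = Files.createDirectories(root.resolve("00"));
        Path partial = Files.createDirectories(root.resolve("01"));
        Path emptyDir = Files.createDirectories(root.resolve("empty"));
        Files.createFile(complete.resolve("_SUCCESS"));

        // "02" was never created, "01" has no flag: only "00" should survive.
        List<String> window = Arrays.asList(
                complete.toString(), partial.toString(), root.resolve("02").toString());
        System.out.println(resolve(window, "_SUCCESS", emptyDir.toString()));

        // No instance is available: fall back to the empty dir.
        System.out.println(resolve(Arrays.asList(partial.toString()), "_SUCCESS", emptyDir.toString()));
    }
}
```

Whether a workflow action accepts an empty dir as input depends on the action type, so the fallback would still need to be verified per workflow engine.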