[ https://issues.apache.org/jira/browse/FALCON-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15129922#comment-15129922 ]
Pragya Mittal commented on FALCON-1807:
---------------------------------------
Attaching all the definitions required to reproduce the issue:
Process :
{noformat}
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<process name="ProcessLateRerunTest-agregator-coord16-bb54f97c" xmlns="uri:falcon:process:0.1">
    <clusters>
        <cluster name="ProcessLateRerunTest-corp-bf31e225">
            <validity start="2016-02-03T06:48Z" end="2016-02-03T07:18Z"/>
        </cluster>
    </clusters>
    <parallel>2</parallel>
    <order>FIFO</order>
    <frequency>minutes(5)</frequency>
    <timezone>UTC</timezone>
    <inputs>
        <input name="inputData" feed="ProcessLateRerunTest-raaw-logs16-3d7c1e49" start="now(0,-1)" end="now(0,0)"/>
    </inputs>
    <outputs>
        <output name="outputData" feed="ProcessLateRerunTest-agregated-logs16-975a0d4c" instance="now(0,0)"/>
    </outputs>
    <properties>
        <property name="queueName" value="default"/>
    </properties>
    <workflow path="/tmp/falcon-regression/ProcessLateRerunTest/aggregator"/>
    <retry policy="periodic" delay="minutes(10)" attempts="3"/>
    <late-process policy="periodic" delay="minutes(4)">
        <late-input input="inputData" workflow-path="/tmp/falcon-regression/ProcessLateRerunTest/aggregator"/>
    </late-process>
    <ACL owner="pragya" group="dataqa" permission="*"/>
</process>
{noformat}
Feed1 :
{noformat}
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<feed name="ProcessLateRerunTest-agregated-logs16-975a0d4c" description="clicks log" xmlns="uri:falcon:feed:0.1">
    <frequency>minutes(5)</frequency>
    <timezone>UTC</timezone>
    <late-arrival cut-off="hours(6)"/>
    <clusters>
        <cluster name="ProcessLateRerunTest-corp-bf31e225" type="source">
            <validity start="2009-01-01T01:00Z" end="2099-12-31T23:59Z"/>
            <retention limit="months(6)" action="delete"/>
        </cluster>
    </clusters>
    <locations>
        <location type="data" path="/tmp/falcon-regression/ProcessLateRerunTest/output-data/${YEAR}/${MONTH}/${DAY}/${HOUR}/${MINUTE}"/>
        <location type="stats" path="/projects/falcon/clicksStats"/>
        <location type="meta" path="/projects/falcon/clicksMetaData"/>
    </locations>
    <ACL owner="pragya" group="dataqa" permission="*"/>
    <schema location="/schema/clicks" provider="protobuf"/>
    <properties>
        <property name="field5" value="value1"/>
        <property name="field6" value="value2"/>
    </properties>
</feed>
{noformat}
Feed2 :
{noformat}
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<feed name="ProcessLateRerunTest-raaw-logs16-3d7c1e49" description="clicks log" xmlns="uri:falcon:feed:0.1">
    <frequency>minutes(1)</frequency>
    <timezone>UTC</timezone>
    <late-arrival cut-off="hours(6)"/>
    <clusters>
        <cluster name="ProcessLateRerunTest-corp-bf31e225" type="source">
            <validity start="2009-01-01T00:00Z" end="2099-12-31T23:59Z"/>
            <retention limit="months(6)" action="delete"/>
        </cluster>
    </clusters>
    <locations>
        <location type="data" path="/tmp/falcon-regression/ProcessLateRerunTest/input/${YEAR}/${MONTH}/${DAY}/${HOUR}/${MINUTE}"/>
        <location type="stats" path="/projects/falcon/clicksStats"/>
        <location type="meta" path="/projects/falcon/clicksMetaData"/>
    </locations>
    <ACL owner="pragya" group="dataqa" permission="*"/>
    <schema location="/schema/clicks" provider="protobuf"/>
    <properties>
        <property name="field3" value="value1"/>
        <property name="field4" value="value2"/>
    </properties>
</feed>
{noformat}
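For anyone reproducing this, here is a rough sketch of which process instances the definitions above should produce, and which input directories each instance reads (these are the directories where late data would have to land). It is not Falcon code. It assumes that the validity end is exclusive, that now(0,-1) and now(0,0) are offsets in (hours, minutes) from the instance's nominal time, and that both ends of the input range are inclusive. The function names are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the process and Feed2 definitions above.
VALIDITY_START = datetime(2016, 2, 3, 6, 48, tzinfo=timezone.utc)
VALIDITY_END = datetime(2016, 2, 3, 7, 18, tzinfo=timezone.utc)
PROCESS_FREQUENCY = timedelta(minutes=5)
FEED_FREQUENCY = timedelta(minutes=1)
INPUT_PATH = ("/tmp/falcon-regression/ProcessLateRerunTest/input/"
              "${YEAR}/${MONTH}/${DAY}/${HOUR}/${MINUTE}")


def process_instances(start, end, frequency):
    """Nominal instance times in [start, end). Assumption: end is exclusive."""
    times = []
    current = start
    while current < end:
        times.append(current)
        current += frequency
    return times


def input_paths(nominal_time):
    """Input feed paths for start="now(0,-1)" through end="now(0,0)".

    Assumption: both ends of the range are inclusive.
    """
    paths = []
    current = nominal_time - timedelta(minutes=1)  # now(0,-1)
    while current <= nominal_time:                 # now(0,0)
        path = (INPUT_PATH
                .replace("${YEAR}", current.strftime("%Y"))
                .replace("${MONTH}", current.strftime("%m"))
                .replace("${DAY}", current.strftime("%d"))
                .replace("${HOUR}", current.strftime("%H"))
                .replace("${MINUTE}", current.strftime("%M")))
        paths.append(path)
        current += FEED_FREQUENCY
    return paths


if __name__ == "__main__":
    for instance in process_instances(VALIDITY_START, VALIDITY_END,
                                      PROCESS_FREQUENCY):
        print(instance.strftime("%Y-%m-%dT%H:%MZ"))
        for path in input_paths(instance):
            print("   ", path)
```

Under these assumptions each process instance depends on two one-minute instances of the input feed, so a late rerun for an instance would be expected once data shows up in either of its two directories.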
Workflow :
{noformat}
<workflow-app xmlns="uri:oozie:workflow:0.2" name="aggregator-wf">
    <start to="aggregator"/>
    <action name="aggregator">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${outputData}"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
                <property>
                    <name>mapred.mapper.class</name>
                    <value>org.apache.hadoop.mapred.lib.IdentityMapper</value>
                </property>
                <property>
                    <name>mapred.reducer.class</name>
                    <value>org.apache.hadoop.mapred.lib.IdentityReducer</value>
                </property>
                <property>
                    <name>mapred.map.tasks</name>
                    <value>1</value>
                </property>
                <property>
                    <name>mapred.input.dir</name>
                    <value>${inputData}</value>
                </property>
                <property>
                    <name>mapred.output.dir</name>
                    <value>${outputData}</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Map/Reduce failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
{noformat}
> Late Rerun is not working in distributed mode
> ---------------------------------------------
>
> Key: FALCON-1807
> URL: https://issues.apache.org/jira/browse/FALCON-1807
> Project: Falcon
> Issue Type: Bug
> Components: rerun
> Affects Versions: 0.9
> Reporter: Pragya Mittal
> Assignee: sandeep samudrala
> Priority: Blocker
>
> Ideally, late rerun runs the instance if and when the data becomes available
> in the late rerun zone. This is not happening currently.