Hi Mike,
Suppose your feed defines its late-arrival policy as:
<late-arrival cut-off="days(3)"/>
Suppose your process specifies its input as:
<inputs>
<input end="now(0,0)" start="now(0,-1)" feed="raaw-logs16"
name="inputData"/>
</inputs>
You can then define the late process as:
<late-process policy="periodic" delay="hours(6)">
<late-input input="inputData"
workflow-path="hdfs://inputData/late/workflow" />
</late-process>
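To show where these pieces sit, here is a rough sketch of a process entity that combines them. The entity name, cluster name, validity dates, frequency and the regular workflow path are placeholders I made up for illustration, so adjust them to your setup:

```xml
<!-- Sketch only: "sample-process", "primary-cluster", the validity dates,
     the frequency and the regular workflow path are hypothetical. -->
<process name="sample-process" xmlns="uri:falcon:process:0.1">
  <clusters>
    <cluster name="primary-cluster">
      <validity start="2016-02-01T00:00Z" end="2017-02-01T00:00Z"/>
    </cluster>
  </clusters>
  <parallel>1</parallel>
  <order>FIFO</order>
  <frequency>days(1)</frequency>
  <!-- Input exactly as in the snippet above -->
  <inputs>
    <input end="now(0,0)" start="now(0,-1)" feed="raaw-logs16"
           name="inputData"/>
  </inputs>
  <!-- Workflow used for on-time data (hypothetical path) -->
  <workflow engine="oozie" path="hdfs://inputData/regular/workflow"/>
  <!-- Workflow used when late data shows up for "inputData" -->
  <late-process policy="periodic" delay="hours(6)">
    <late-input input="inputData"
                workflow-path="hdfs://inputData/late/workflow"/>
  </late-process>
</process>
```

The late-arrival cut-off itself is not part of the process; it stays in the feed entity.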
To answer your questions,
- Make the shell script part of a shell action in the Oozie workflow.
- If you want to handle late data differently from regular data, you
should have two different workflows. Otherwise, you can reuse the same
workflow you use to handle regular data.
- In the above example, Falcon checks periodically (policy), every six
hours (delay), for late data to arrive, until the 3-day late-arrival
cut-off defined in the feed is reached.
The date-patterns used for input data will not change between on-time data
and late-arrival data.
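As a rough sketch of the first point, a workflow with a shell action could look like the one below. The workflow name, node names and the script name process.sh are hypothetical. It also assumes that Falcon makes the resolved input paths available to the workflow as a property named after the input (inputData here), along with jobTracker and nameNode, so please verify that against your Falcon version:

```xml
<!-- Sketch only: "late-data-wf", "shell-node" and "process.sh" are
     hypothetical names. ${inputData} is assumed to be supplied by Falcon. -->
<workflow-app xmlns="uri:oozie:workflow:0.4" name="late-data-wf">
  <start to="shell-node"/>
  <action name="shell-node">
    <shell xmlns="uri:oozie:shell-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <!-- The script receives the input location(s) as its first argument -->
      <exec>process.sh</exec>
      <argument>${inputData}</argument>
      <!-- Ship the script from the workflow directory to the task -->
      <file>${wf:appPath()}/process.sh#process.sh</file>
    </shell>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Shell action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

If late data needs no special handling, the workflow-path in late-input can simply point to the directory of your regular workflow.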
Thank you
Balu Vellanki
On 2/11/16, 4:07 AM, "Mikhail Ilin" <[email protected]> wrote:
>Hello!
>
>
>I have a shell script that performs certain actions on files in HDFS.
>Script is run through Oozie workflow which I want to schedule in Falcon.
>
>Files, as usual, are located in partitions (/some_root_dir/2016/02/03
>etc).
>
>Every day new directory appears and new data arrives.
>
>
>The problem is, sometimes data may be late for a few days and I want
>Falcon to recognize that and, upon late arrival, run Oozie/Shell action
>on that data as well - not only on today's portion.
>
>But that part is insufficiently documented at the moment:
>
>
>https://falcon.apache.org/FalconDocumentation.html#Handling_late_input_data
>
>https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_data_governance/content/ch_falcon_late_data_handling.html
>
>
>I don't understand, what should be in the late workflow?
>
>How and at which moment does Falcon decide on which directories to run
>that late workflow?
>
>How are the dates (locations) of those directories passed to the late
>workflow??
>
>
>Best regards,
>
>Mike