Hello!

I have a shell script that performs certain actions on files in HDFS. The script 
is run through an Oozie workflow, which I want to schedule in Falcon.

The files are laid out in date partitions (/some_root_dir/2016/02/03, etc.).

Every day a new directory appears and new data arrives.


The problem is that data sometimes arrives a few days late. I want Falcon to 
recognize that and, upon late arrival, run the Oozie shell action on that data 
as well, not only on today's partition.

But that part is insufficiently documented at the moment:


https://falcon.apache.org/FalconDocumentation.html#Handling_late_input_data

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_data_governance/content/ch_falcon_late_data_handling.html


What should be in the late workflow?

How and when does Falcon decide which directories to run that late workflow 
on?

How are the dates (locations) of those directories passed to the late workflow?


Best regards,

Mike
