Mass Dosage created FALCON-1686:
-----------------------------------

             Summary: Support for reprocessing
                 Key: FALCON-1686
                 URL: https://issues.apache.org/jira/browse/FALCON-1686
             Project: Falcon
          Issue Type: Improvement
    Affects Versions: 0.7
            Reporter: Mass Dosage


We have a number of ETL jobs that we schedule to run on a regular basis with 
Falcon. This works fine. However, we often need to rerun the exact same jobs 
over past date ranges in order to reprocess data after a code change. There 
doesn't seem to be an easy way to do this in Falcon at the moment. Ideally 
we'd have a controlled way of saying "run this process for dates between X 
and Y". There should also be a way to control whether the reprocessed data 
triggers downstream processes. In some cases you want downstream jobs to run 
on the new data as well, in other cases you don't. 

With Oozie, to reprocess data from any point in the past, one can update the 
start and end dates (in the job.properties file) and submit a new coordinator 
to run alongside the existing one. As coordinator ids are unique, the two do 
not clash. In Falcon, processes are identified by their readable name, so to 
do the same one would need to change the name in the process file itself. 
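
As a rough illustration of the Oozie approach, the sketch below rewrites the 
date properties in a job.properties file before a second coordinator is 
submitted. It assumes the coordinator reads its range from properties named 
"start" and "end"; those key names are an assumption, since they depend on 
what the coordinator definition references. 

```python
def override_dates(properties_text, start, end):
    """Return job.properties text with the date range replaced.

    The key names "start" and "end" are assumptions; use whichever
    properties your coordinator definition actually references.
    """
    overrides = {"start": start, "end": end}
    seen = set()
    out = []
    for line in properties_text.splitlines():
        stripped = line.strip()
        # Only touch real key=value lines, not comments or blanks.
        if stripped and not stripped.startswith("#") and "=" in stripped:
            key = stripped.split("=", 1)[0].strip()
            if key in overrides:
                out.append(f"{key}={overrides[key]}")
                seen.add(key)
                continue
        out.append(line)
    # Append any date key that was missing from the original file.
    for key, value in overrides.items():
        if key not in seen:
            out.append(f"{key}={value}")
    return "\n".join(out) + "\n"
```

The rewritten file can then be submitted as a separate coordinator, for 
example with "oozie job -config <file> -run". 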

We are currently working around this issue by making a copy of the original 
Falcon process, giving it a different name and changing the dates. This isn't 
ideal and leads to a lot of XML duplication. 
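
To make the workaround concrete, here is a minimal sketch of the copy step 
done by script instead of by hand. It assumes the process definition uses the 
namespace "uri:falcon:process:0.1" and keeps its date range in "validity" 
elements with "start" and "end" attributes; check both against your own 
process files. The function name and the example process name are made up 
for illustration. 

```python
import xml.etree.ElementTree as ET

# Assumed namespace of the process definition; verify against your files.
FALCON_NS = "uri:falcon:process:0.1"


def clone_process(xml_text, new_name, start, end):
    """Return a copy of a process definition with a new name and date range."""
    # Keep the default namespace unprefixed when serialising.
    ET.register_namespace("", FALCON_NS)
    root = ET.fromstring(xml_text)
    # The readable name identifies the process, so the copy needs its own.
    root.set("name", new_name)
    # Apply the reprocessing range to every validity element found.
    for validity in root.iter(f"{{{FALCON_NS}}}validity"):
        validity.set("start", start)
        validity.set("end", end)
    return ET.tostring(root, encoding="unicode")
```

Generating the copy on demand at least avoids keeping duplicated XML under 
version control, although it does not address the downstream triggering 
question. 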



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
