Abhishek Agarwal created PIG-4680:
-------------------------------------

             Summary: Enable pig job graphs to resume from last successful state
                 Key: PIG-4680
                 URL: https://issues.apache.org/jira/browse/PIG-4680
             Project: Pig
          Issue Type: Improvement
          Components: impl
            Reporter: Abhishek Agarwal


Pig scripts can have multiple ETL jobs in the DAG which may take hours to 
finish. In case of transient errors, the job fails. When the job is rerun, all 
the nodes in Job graph will rerun. Some of these nodes may have already run 
successfully. Redundant runs lead to wastage of cluster capacity and pipeline 
delays. 

In case of failure, we can persist the graph state. In next run, only the 
failed nodes and their successors will rerun. This is of course subject to 
preconditions such as 
 - Pig script has not changed
 - Input locations have not changed
 - Output data from previous run is intact
 - Configuration has not changed



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to