[ https://issues.apache.org/jira/browse/PIG-4680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14802876#comment-14802876 ]
Srikanth Sundarrajan commented on PIG-4680:
-------------------------------------------

This can be quite handy, particularly when Pig scripts are launched via Oozie and the launcher fails and its attempt is retried.

> Enable pig job graphs to resume from last successful state
> ----------------------------------------------------------
>
>                 Key: PIG-4680
>                 URL: https://issues.apache.org/jira/browse/PIG-4680
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>            Reporter: Abhishek Agarwal
>            Assignee: Abhishek Agarwal
>
> Pig scripts can have multiple ETL jobs in the DAG, which may take hours to
> finish. In case of transient errors, the job fails. When the job is rerun,
> all the nodes in the job graph are rerun, even though some of them may have
> already run successfully. These redundant runs waste cluster capacity and
> delay pipelines.
> In case of failure, we can persist the graph state. In the next run, only the
> failed nodes and their successors will rerun. This is, of course, subject to
> preconditions such as:
> - the Pig script has not changed
> - input locations have not changed
> - output data from the previous run is intact
> - configuration has not changed
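The resume behaviour proposed in the issue can be sketched roughly as follows. This is a hypothetical illustration in Python, not Pig's implementation or API: `GraphState`, `fingerprint` and `nodes_to_rerun` are invented names, and the persisted state is assumed to record which nodes succeeded and which failed. The preconditions listed above are checked first (script, input locations and configuration via a hash; previous outputs via a caller-supplied check), and any violation falls back to a full rerun. Otherwise every node that did not complete successfully (which includes the failed nodes, and, as an extra assumption, any node that never started) is rerun together with all of its transitive successors in the DAG.

```python
import hashlib
import json
from collections import deque
from dataclasses import dataclass, field


@dataclass
class GraphState:
    """Hypothetical persisted state of one run of a Pig job graph."""
    fingerprint: str                              # hash of script + inputs + config
    succeeded: set = field(default_factory=set)   # nodes that finished successfully
    failed: set = field(default_factory=set)      # nodes that failed


def fingerprint(script_text, input_locations, config):
    """Hash three preconditions: script text, input locations, configuration."""
    payload = json.dumps(
        {"script": script_text,
         "inputs": sorted(input_locations),
         "config": sorted(config.items())},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def nodes_to_rerun(successors, all_nodes, state, current_fp, output_intact):
    """Return the set of job-graph nodes that must run on this attempt.

    successors:    dict mapping each node to its downstream nodes
    state:         GraphState from the previous run, or None if none was saved
    current_fp:    fingerprint() of the current script, inputs and config
    output_intact: callable(node) -> bool checking a node's previous output
    """
    # Any violated precondition invalidates the saved state: rerun everything.
    if state is None or state.fingerprint != current_fp:
        return set(all_nodes)
    if not all(output_intact(n) for n in state.succeeded):
        return set(all_nodes)

    # Seed with every node that did not succeed (failed or never started).
    start = set(all_nodes) - state.succeeded
    rerun = set(start)
    queue = deque(start)
    # Breadth-first walk adds all transitive successors of the seed nodes.
    while queue:
        node = queue.popleft()
        for nxt in successors.get(node, ()):
            if nxt not in rerun:
                rerun.add(nxt)
                queue.append(nxt)
    return rerun


if __name__ == "__main__":
    dag = {"load": ["clean"], "clean": ["agg", "join"],
           "agg": ["store"], "join": ["store"], "store": []}
    fp = fingerprint("raw = LOAD '/data/in';", ["/data/in"], {"queue": "etl"})
    prev = GraphState(fp, succeeded={"load", "clean", "agg"}, failed={"join"})
    print(sorted(nodes_to_rerun(dag, list(dag), prev, fp, lambda n: True)))
    # prints ['join', 'store']
```

In the example run, `load`, `clean` and `agg` are skipped because they succeeded and nothing they depend on changed, while the failed `join` and its successor `store` run again. Treating a missing output as a reason for a full rerun (rather than rerunning just that node) follows the issue's framing of intact output as a precondition; a finer-grained policy would be a separate design choice.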