[jira] [Commented] (PIG-4680) Enable pig job graphs to resume from last successful state

Srikanth Sundarrajan (JIRA) Thu, 17 Sep 2015 06:03:46 -0700

    [ https://issues.apache.org/jira/browse/PIG-4680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14802876#comment-14802876 ]


Srikanth Sundarrajan commented on PIG-4680:
-------------------------------------------

This can be quite handy, particularly when a Pig script is launched via Oozie
and the launcher fails and the attempt is retried.

> Enable pig job graphs to resume from last successful state
> ----------------------------------------------------------
>
>                 Key: PIG-4680
>                 URL: https://issues.apache.org/jira/browse/PIG-4680
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>            Reporter: Abhishek Agarwal
>            Assignee: Abhishek Agarwal
>
> Pig scripts can have multiple ETL jobs in the DAG, which may take hours to
> finish. In case of transient errors, the job fails. When the job is rerun,
> all the nodes in the job graph rerun, even though some of them may have
> already run successfully. These redundant runs waste cluster capacity and
> delay pipelines.
> In case of failure, we can persist the graph state. In the next run, only the
> failed nodes and their successors will rerun. This is, of course, subject to
> preconditions such as:
>  - Pig script has not changed
>  - Input locations have not changed
>  - Output data from previous run is intact
>  - Configuration has not changed
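The resume rule described above (rerun only the nodes that did not succeed plus everything downstream of them, and only if the preconditions still hold) can be sketched as follows. This is a minimal illustration, not Pig's implementation: every function name, the shape of the saved state (a fingerprint plus a list of succeeded node names), and the fingerprint scheme (a SHA-256 hash over the script text, input paths, and configuration) are hypothetical. The "output data is intact" check is left to a caller-supplied predicate, since how to verify outputs is not specified in the issue.

```python
import hashlib
import json
from collections import deque


def all_nodes(edges):
    """Every node in the DAG, whether it appears as a source or only as a successor."""
    nodes = set(edges)
    for successors in edges.values():
        nodes.update(successors)
    return nodes


def fingerprint(script_text, input_paths, config):
    """Hypothetical hash over the things that must not change between runs."""
    payload = json.dumps(
        {"script": script_text, "inputs": sorted(input_paths), "config": config},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def nodes_to_rerun(edges, succeeded):
    """Nodes that did not succeed (failed or never ran), plus all their transitive successors."""
    pending = all_nodes(edges) - set(succeeded)
    rerun = set(pending)
    queue = deque(pending)
    while queue:  # breadth-first walk down the DAG
        node = queue.popleft()
        for nxt in edges.get(node, ()):
            if nxt not in rerun:
                rerun.add(nxt)
                queue.append(nxt)
    return rerun


def plan_resume(edges, saved_state, current_fingerprint, output_intact):
    """Return the set of nodes to run; fall back to a full rerun if a precondition fails."""
    if saved_state is None or saved_state["fingerprint"] != current_fingerprint:
        # Script, inputs, or configuration changed: the saved state cannot be trusted.
        return all_nodes(edges)
    # A previously successful node only counts as done if its output is still intact.
    succeeded = {n for n in saved_state["succeeded"] if output_intact(n)}
    return nodes_to_rerun(edges, succeeded)


# Usage: "join" failed last time, so "store" never ran.
edges = {"load": ["clean"], "clean": ["join", "agg"], "join": ["store"], "agg": []}
fp = fingerprint("A = LOAD 'in';", ["/data/in"], {"queue": "etl"})
state = {"fingerprint": fp, "succeeded": ["load", "clean", "agg"]}
print(sorted(plan_resume(edges, state, fp, lambda n: True)))  # ['join', 'store']
```

Treating "not recorded as succeeded" as the rerun criterion covers both the failed node and nodes that were skipped because an upstream job failed. If the output of an already successful node such as "clean" has been deleted, that node and everything below it ("agg", "join", "store") are scheduled again.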



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
