[
https://issues.apache.org/jira/browse/PIG-4680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14952799#comment-14952799
]
Abhishek Agarwal commented on PIG-4680:
---------------------------------------
[~rohini] I am trying to upload the patch, which I generated using
"git diff --cached" (the --cached option because the changes are staged).
However, I am getting this error while uploading the diff:
Line 2: No valid separator after the filename was found in the diff header.
I see that a similar sort of patch is accepted by the Oozie reviewboard.
> Enable pig job graphs to resume from last successful state
> ----------------------------------------------------------
>
> Key: PIG-4680
> URL: https://issues.apache.org/jira/browse/PIG-4680
> Project: Pig
> Issue Type: Improvement
> Components: impl
> Reporter: Abhishek Agarwal
> Assignee: Abhishek Agarwal
> Attachments: PIG-4680.patch
>
>
> Pig scripts can have multiple ETL jobs in the DAG, which may take hours to
> finish. In case of transient errors, the job fails. When the job is rerun,
> all the nodes in the job graph rerun, even though some of them may have
> already run successfully. Redundant runs waste cluster capacity and cause
> pipeline delays.
> In case of failure, we can persist the graph state. In the next run, only the
> failed nodes and their successors will rerun. This is of course subject to
> preconditions such as:
> - Pig script has not changed
> - Input locations have not changed
> - Output data from previous run is intact
> - Configuration has not changed
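
To make the proposal above concrete, here is a minimal sketch in Python of how the set of nodes to rerun could be chosen. It is not taken from the attached patch or from Pig itself. The names fingerprint, nodes_to_rerun and outputs_intact are hypothetical, and hashing the script, inputs and configuration is only one assumed way of checking the preconditions.

```python
import hashlib
import json
from collections import deque


def fingerprint(script_text, input_paths, config):
    """Hash everything that must be unchanged for a resume to be valid.

    Hypothetical helper: covers the script, input locations and configuration.
    """
    payload = json.dumps(
        {"script": script_text, "inputs": sorted(input_paths), "config": config},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def nodes_to_rerun(successors, succeeded, saved_fp, current_fp, outputs_intact):
    """Return the set of job-graph nodes that must run again.

    successors:     dict mapping a node to the list of its downstream nodes
    succeeded:      set of nodes that finished successfully in the previous run
    saved_fp:       fingerprint persisted with the previous run's graph state
    current_fp:     fingerprint computed for the current run
    outputs_intact: callable(node) -> bool, a hypothetical check that the
                    node's output from the previous run still exists
    """
    all_nodes = set(successors)
    for children in successors.values():
        all_nodes.update(children)

    # A changed script, input location or configuration invalidates the
    # saved state, so everything reruns.
    if saved_fp != current_fp:
        return all_nodes

    # Start from nodes that did not succeed or whose old output is gone.
    queue = deque(
        n for n in all_nodes if n not in succeeded or not outputs_intact(n)
    )
    rerun = set(queue)

    # Breadth-first walk: every successor of a rerun node reruns too.
    while queue:
        node = queue.popleft()
        for child in successors.get(node, []):
            if child not in rerun:
                rerun.add(child)
                queue.append(child)
    return rerun


# Example: "load" and "other" succeeded last time, "filter" failed.
graph = {"load": ["filter"], "filter": ["join"], "other": ["join"], "join": ["store"]}
fp = fingerprint("A = LOAD 'in'; STORE A INTO 'out';", ["/data/in"], {"reducers": "10"})
print(sorted(nodes_to_rerun(graph, {"load", "other"}, fp, fp, lambda n: True)))
# prints ['filter', 'join', 'store']
```

In this sketch any precondition mismatch falls back to a full rerun, which is the conservative choice; a finer-grained check per node would be possible but is not described in the issue.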