-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39226/
-----------------------------------------------------------
(Updated Oct. 12, 2015, 11:30 a.m.)


Review request for pig and Rohini Palaniswamy.


Repository: pig-git


Description (updated)
-------

A Pig script can have multiple ETL jobs in its DAG, and these may take hours to finish. If a transient error occurs, the job fails. When it is rerun, every node in the job graph runs again, including nodes that had already completed successfully. These redundant runs waste cluster capacity and delay pipelines.

In case of failure, we persist the graph state so that, on the next run, only the failed nodes and their successors are rerun. This is, of course, subject to preconditions such as:

> Pig script has not changed
> Input locations have not changed
> Output data from the previous run is intact
> Configuration has not changed


Diffs
-----

  src/org/apache/pig/PigConfiguration.java 03b36a5
  src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceLauncher.java 595e68c
  src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/plans/MRIntermediateDataVisitor.java 4b62112
  src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/plans/MRJobRecovery.java PRE-CREATION
  src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/plans/MRJobState.java PRE-CREATION
  src/org/apache/pig/impl/io/FileLocalizer.java f0f9b43
  src/org/apache/pig/tools/grunt/GruntParser.java 439d087
  src/org/apache/pig/tools/pigstats/ScriptState.java 03a12b1

Diff: https://reviews.apache.org/r/39226/diff/


Testing
-------


Thanks,

Abhishek Agarwal
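The rerun rule in the description (only the failed nodes and their successors rerun, provided the preconditions still hold) can be sketched as a graph walk. This is a minimal illustration under stated assumptions, not the code in MRJobRecovery or MRJobState: `RecoverySketch`, `SavedState`, `fingerprint` and `jobsToRerun` are hypothetical names; the job graph is assumed to be a map from job id to successor ids; the "script, inputs and configuration unchanged" preconditions are assumed to be checked with a SHA-256 digest; and any failed precondition is assumed to mean a full rerun.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.Predicate;

/** Hypothetical sketch; not the MRJobRecovery / MRJobState classes from this patch. */
final class RecoverySketch {

    /** Hypothetical shape of the state persisted when a run fails. */
    static final class SavedState {
        final byte[] fingerprint;       // digest of script, inputs and configuration
        final Set<String> failedJobs;   // ids of the jobs that failed

        SavedState(byte[] fingerprint, Set<String> failedJobs) {
            this.fingerprint = fingerprint.clone();
            this.failedJobs = Set.copyOf(failedJobs);
        }
    }

    /** Assumed precondition check: one digest over script text, input locations and config. */
    static byte[] fingerprint(String script, List<String> inputs, Map<String, String> conf) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(script.getBytes(StandardCharsets.UTF_8));
            for (String input : inputs) {
                md.update((byte) 0); // separator between fields
                md.update(input.getBytes(StandardCharsets.UTF_8));
            }
            // TreeMap sorts keys, so the same configuration always hashes the same way.
            for (Map.Entry<String, String> e : new TreeMap<>(conf).entrySet()) {
                md.update((byte) 0);
                md.update((e.getKey() + "=" + e.getValue()).getBytes(StandardCharsets.UTF_8));
            }
            return md.digest();
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException("SHA-256 unavailable", ex);
        }
    }

    /**
     * Returns the failed jobs plus everything downstream of them,
     * or every job if any precondition no longer holds.
     */
    static Set<String> jobsToRerun(Map<String, List<String>> successors, SavedState saved,
                                   byte[] currentFingerprint, Predicate<String> outputIntact) {
        Set<String> allJobs = new TreeSet<>(successors.keySet());
        successors.values().forEach(allJobs::addAll);

        // Script, inputs or configuration changed: fall back to a full rerun.
        if (!MessageDigest.isEqual(saved.fingerprint, currentFingerprint)) {
            return allJobs;
        }

        // Breadth-first walk from the failed jobs collects all of their successors.
        Set<String> rerun = new TreeSet<>(saved.failedJobs);
        Deque<String> queue = new ArrayDeque<>(saved.failedJobs);
        while (!queue.isEmpty()) {
            for (String next : successors.getOrDefault(queue.poll(), List.of())) {
                if (rerun.add(next)) {
                    queue.add(next);
                }
            }
        }

        // Jobs being skipped must still have their output from the previous run.
        for (String job : allJobs) {
            if (!rerun.contains(job) && !outputIntact.test(job)) {
                return allJobs;
            }
        }
        return rerun;
    }
}
```

For a graph A → {B, C}, B → D where only B failed and nothing has changed, `jobsToRerun` returns {B, D}; A and C are skipped and their earlier output is reused. If C's output has since been deleted, or the configuration digest differs, the sketch falls back to rerunning all four jobs.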