> On Oct. 21, 2015, 11:19 a.m., Rohini Palaniswamy wrote:
> > src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceLauncher.java, line 491
> > <https://reviews.apache.org/r/39226/diff/1/?file=1095351#file1095351line491>
> >
> >     Even if you skip deleting intermediate files here, it will delete in finally block of Main.java
When job checkpointing is enabled, temporary output is written inside the staging container; the necessary transformation happens in the mr.transform() call. The finally block of Main only deletes the temporary container.


> On Oct. 21, 2015, 11:19 a.m., Rohini Palaniswamy wrote:
> > src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceLauncher.java, line 507
> > <https://reviews.apache.org/r/39226/diff/1/?file=1095351#file1095351line507>
> >
> >     Just storing the current plan? how about what part of it has succeeded?

Success or failure is inferred from the commit file in the output path of an intermediate job: if the SUCCESS file is present in the output path, the job is assumed to have succeeded.


> On Oct. 21, 2015, 11:19 a.m., Rohini Palaniswamy wrote:
> > src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceLauncher.java, line 532
> > <https://reviews.apache.org/r/39226/diff/1/?file=1095351#file1095351line532>
> >
> >     Creating files that user is not expecting in output directories will be a problem.

That is understandable. Another approach could be to store the completion state of the job along with its output path. We can then opt not to rerun the job if the last state was successful and the directory is still present. Should we also record the timestamp of the directory?


> On Oct. 21, 2015, 11:19 a.m., Rohini Palaniswamy wrote:
> > src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/plans/MRJobRecovery.java, line 103
> > <https://reviews.apache.org/r/39226/diff/1/?file=1095353#file1095353line103>
> >
> >     This might have to do more checks to skip some settings that usually change between runs but do not affect recovering. For eg: Running through Oozie, for a rerun you will get a different launcher job id in the config.

I probably missed this configuration. Skipping custom hard-coded settings is not feasible.
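As a rough illustration of the configuration comparison under discussion, the sketch below treats two job configurations as equivalent for recovery after ignoring a skip-list of keys that legitimately change between runs (such as an Oozie launcher job id). The class name, method names, and key names are illustrative assumptions, not Pig's actual API:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch: decide whether two job configurations are
// "equivalent for recovery" by comparing every entry except a
// skip-list of keys known to differ between runs without affecting
// recovery (e.g. an Oozie launcher job id).
public class ConfigComparer {

    // Illustrative skip-list; real key names would have to be curated.
    private static final Set<String> VOLATILE_KEYS = new HashSet<>(
            Arrays.asList("oozie.launcher.job.id", "mapreduce.job.submit.time"));

    public static boolean equivalentForRecovery(Map<String, String> oldConf,
                                                Map<String, String> newConf) {
        return filter(oldConf).equals(filter(newConf));
    }

    // Copy the configuration and drop the volatile keys before comparing.
    private static Map<String, String> filter(Map<String, String> conf) {
        Map<String, String> out = new TreeMap<>(conf);
        out.keySet().removeAll(VOLATILE_KEYS);
        return out;
    }
}
```

The weakness noted above remains: any user-specific volatile setting not on the skip-list would spuriously invalidate recovery, which is what motivates the explicit rerun option instead.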
The user could be given an option to skip some configurations, but that would make things more complex for the user. I am now inclined toward an approach similar to Oozie's: for a rerun, the user explicitly specifies a rerun option, and Pig simply uses the new configuration and recovers the job. At least then the behavior is easier to explain.


> On Oct. 21, 2015, 11:19 a.m., Rohini Palaniswamy wrote:
> > src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/plans/MRJobRecovery.java, line 119
> > <https://reviews.apache.org/r/39226/diff/1/?file=1095353#file1095353line119>
> >
> >     This might not be as simple as removing the operators. You might also have to traverse the plan and remove any predecessors and other corner cases.

Since we are walking in dependency order, the predecessors should already have been removed. If they have not, the current node will not be recovered.


- Abhishek


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39226/#review103384
-----------------------------------------------------------


On Oct. 12, 2015, 11:30 a.m., Abhishek Agarwal wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/39226/
> -----------------------------------------------------------
> 
> (Updated Oct. 12, 2015, 11:30 a.m.)
> 
> 
> Review request for pig and Rohini Palaniswamy.
> 
> 
> Repository: pig-git
> 
> 
> Description
> -------
> 
> Pig scripts can have multiple ETL jobs in the DAG which may take hours to
> finish. In case of transient errors, the job fails. When the job is rerun,
> all the nodes in the job graph will rerun, even though some of these nodes
> may have already run successfully. Redundant runs waste cluster capacity
> and delay pipelines.
> 
> In case of failure, we can persist the graph state. In the next run, only
> the failed nodes and their successors will rerun.
> This is of course subject to preconditions such as:
> 
> Pig script has not changed
> Input locations have not changed
> Output data from previous run is intact
> Configuration has not changed
> 
> 
> Diffs
> -----
> 
>   src/org/apache/pig/PigConfiguration.java 03b36a5 
>   src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceLauncher.java 595e68c 
>   src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/plans/MRIntermediateDataVisitor.java 4b62112 
>   src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/plans/MRJobRecovery.java PRE-CREATION 
>   src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/plans/MRJobState.java PRE-CREATION 
>   src/org/apache/pig/impl/io/FileLocalizer.java f0f9b43 
>   src/org/apache/pig/tools/grunt/GruntParser.java 439d087 
>   src/org/apache/pig/tools/pigstats/ScriptState.java 03a12b1 
> 
> Diff: https://reviews.apache.org/r/39226/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Abhishek Agarwal
> 
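To make the recovery walk discussed in this thread concrete, here is a minimal, self-contained sketch of the pruning rule: walking the job DAG in dependency order, a job is skipped on rerun only if its output carries a SUCCESS marker and every one of its predecessors was itself skipped; otherwise it, and by induction all of its successors, will rerun. Class and method names are illustrative, not Pig's actual MRJobRecovery API; the local `_SUCCESS` check stands in for the HDFS lookup (Hadoop's FileOutputCommitter writes an empty `_SUCCESS` file into the output directory of a successful job).

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the dependency-order pruning described above.
public class PlanPruner {

    /**
     * @param plan jobs in dependency order (iteration order of the map),
     *             each mapped to the list of its predecessor jobs
     * @param succeeded jobs whose output directory contains a SUCCESS marker
     * @return the jobs that can be skipped on rerun
     */
    public static Set<String> skippable(Map<String, List<String>> plan,
                                        Set<String> succeeded) {
        Set<String> skip = new HashSet<>();
        for (Map.Entry<String, List<String>> e : plan.entrySet()) {
            String job = e.getKey();
            // Skip only if this job succeeded AND all predecessors are
            // themselves skipped; since we walk in dependency order, a
            // predecessor not yet in the skip set means it must rerun,
            // and therefore so must this job.
            if (succeeded.contains(job) && skip.containsAll(e.getValue())) {
                skip.add(job);
            }
        }
        return skip;
    }

    // Local stand-in for the HDFS check that would populate `succeeded`.
    public static boolean hasSuccessMarker(Path outputDir) {
        return Files.exists(outputDir.resolve("_SUCCESS"));
    }
}
```

Note that a job whose own output is intact still reruns if any upstream job reruns, which matches the reply about predecessors in the dependency-order walk.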