-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16928/#review32286
-----------------------------------------------------------

This is great work. Thank you so much! I have two comments:

1) It doesn't seem to work for a map-only job. For example, I tried to run a load and dump in grunt as follows:

x = load '/user/cheolsoop/foo';
dump x;

This job doesn't get converted to local mode because the number of reducers is 21, which doesn't make sense. See the log output below:

2014-01-20 10:05:30,578 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Size of input: 8 bytes.
2014-01-20 10:05:30,578 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - No of reducers: 21
2014-01-20 10:05:30,578 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process

2) The changes in PigStats and PigStatsUtil might break backward compatibility. Perhaps we could avoid them if they're not necessary. Thoughts?


trunk/src/org/apache/pig/backend/hadoop/executionengine/HExecutionEngine.java
<https://reviews.apache.org/r/16928/#comment61021>

    Do you mind replacing these with static variables too?


trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java
<https://reviews.apache.org/r/16928/#comment61022>

    I think pseudo-distributed mode means single-node and multi-process. But you mean local mode (multi-threaded) here, don't you?


trunk/src/org/apache/pig/tools/pigstats/PigStats.java
<https://reviews.apache.org/r/16928/#comment61027>

    I like removing this from PigStats. But I am a bit worried that this might break backward compatibility with downstream applications, since it is public.


trunk/src/org/apache/pig/tools/pigstats/mapreduce/MRPigStatsUtil.java
<https://reviews.apache.org/r/16928/#comment61023>

    Update the comment to reflect the change.


trunk/src/org/apache/pig/tools/pigstats/mapreduce/MRPigStatsUtil.java
<https://reviews.apache.org/r/16928/#comment61024>

    Update the comment to reflect the change.


- Cheolsoo Park


On Jan. 16, 2014, 10:04 p.m., Aniket Mokashi wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16928/
> -----------------------------------------------------------
> 
> (Updated Jan. 16, 2014, 10:04 p.m.)
> 
> 
> Review request for pig, Cheolsoo Park, Daniel Dai, Dmitriy Ryaboy, and Julien Le Dem.
> 
> 
> Bugs: PIG-3463
>     https://issues.apache.org/jira/browse/PIG-3463
> 
> 
> Repository: pig
> 
> 
> Description
> -------
> 
> If pig.auto.local.enabled is set, JCC will modify Configuration of all the
> jobs with one reducer and input size less than pig.auto.local.input.maxbytes,
> so that they are forced to run in local mode. Output of local run is also
> written to hdfs.
> 
> 
> Diffs
> -----
> 
>   trunk/src/org/apache/pig/ExecTypeProvider.java 1558572
>   trunk/src/org/apache/pig/PigConfiguration.java 1558572
>   trunk/src/org/apache/pig/backend/hadoop/datastorage/ConfigurationUtil.java 1558572
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/HExecutionEngine.java 1558572
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java 1558572
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceLauncher.java 1558572
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceOper.java 1558572
>   trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigInputFormat.java 1558572
>   trunk/src/org/apache/pig/impl/PigImplConstants.java 1558572
>   trunk/src/org/apache/pig/tools/pigstats/EmbeddedPigStats.java 1558572
>   trunk/src/org/apache/pig/tools/pigstats/PigStats.java 1558572
>   trunk/src/org/apache/pig/tools/pigstats/mapreduce/MRPigStatsUtil.java 1558572
>   trunk/src/org/apache/pig/tools/pigstats/mapreduce/SimplePigStats.java 1558572
>   trunk/test/org/apache/pig/test/TestAutoLocalMode.java PRE-CREATION
> 
> Diff: https://reviews.apache.org/r/16928/diff/
> 
> 
> Testing
> -------
> 
> Tried few scenarios with the patch-
> Load small data, group all, count - works in local mode.
> Load small data, another small data and replicated join - works in local mode.
> Load small data and order by key - all 3 jobs work in local mode and .
> Load small data and large data for replicated join - first job runs in local
> mode, second runs in MR mode.
> Load large data and order by key - works in first stages in local mode and
> last stage in MR mode.
> 
> 
> Thanks,
> 
> Aniket Mokashi
> 
>
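
For reference, the eligibility rule in the description, together with the map-only case raised in comment 1, could be sketched roughly as below. This is not the code in the patch. The class name, the method name, the use of java.util.Properties in place of Hadoop's Configuration, and the fallback threshold are all assumptions made for illustration; only the two property names come from the description. Treating a job with zero reducers as eligible is one reading of comment 1, not something the patch is known to do.

```java
import java.util.Properties;

// Hypothetical sketch; not the JobControlCompiler logic from the patch.
class AutoLocalSketch {
    // Property names taken from the review description.
    static final String ENABLED_KEY = "pig.auto.local.enabled";
    static final String MAX_BYTES_KEY = "pig.auto.local.input.maxbytes";

    // Assumed fallback threshold, used only by this sketch.
    static final long ASSUMED_DEFAULT_MAX_BYTES = 100L * 1000 * 1000;

    /**
     * Returns true if a job could be forced to run in-process:
     * the feature is enabled, the input is smaller than the threshold,
     * and the job has at most one reducer (zero means map-only).
     */
    static boolean canRunInProcess(Properties conf, long inputBytes, int reducers) {
        boolean enabled =
            Boolean.parseBoolean(conf.getProperty(ENABLED_KEY, "false"));
        long maxBytes = Long.parseLong(
            conf.getProperty(MAX_BYTES_KEY, Long.toString(ASSUMED_DEFAULT_MAX_BYTES)));
        return enabled && inputBytes < maxBytes && reducers <= 1;
    }
}
```

With the feature enabled, the 8-byte input from the log above would pass the size check, so the reported reducer count of 21 is the only thing blocking the conversion; a map-only job would be expected to arrive here with a count of 0.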
