-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16507/#review31213
-----------------------------------------------------------
Ship it!

Looks good to me. I will commit it after running the unit tests and e2e tests. I found a minor bug, noted below; I'll fix it when I commit.


/trunk/src/org/apache/pig/backend/hadoop/executionengine/HExecutionEngine.java
<https://reviews.apache.org/r/16507/#comment59615>

    I think a "return" is missing here: explain still outputs the MR plan even when the plan is fetchable.


- Cheolsoo Park


On Jan. 3, 2014, 10:57 p.m., Lorand Bendig wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16507/
> -----------------------------------------------------------
> 
> (Updated Jan. 3, 2014, 10:57 p.m.)
> 
> 
> Review request for pig.
> 
> 
> Bugs: PIG-3642
>     https://issues.apache.org/jira/browse/PIG-3642
> 
> 
> Repository: pig
> 
> 
> Description
> -------
> 
> With this patch I'd like to add the possibility to read data directly from
> HDFS instead of launching MR jobs for simple (map-only) tasks. Hive
> already has this feature (fetch). This patch shares some similarities with
> the local mode of Pig 0.6. Here, fetching kicks in when all of the following
> hold for a script:
> 
>  - it contains only LIMIT, FILTER, UNION (if no split is generated), STREAM,
>    (nested) FOREACH with expression operators, custom UDFs, etc.
>  - no scalar aliases
>  - no SampleLoader
>  - single leaf job
>  - DUMP (no STORE)
> 
> The feature is enabled by default and can be toggled with:
> 
>    -N or -no_fetch
>    set opt.fetch true/false;
> 
> There's no STORE support because I wanted to make it explicit that this
> "optimization" is for launching small/simple scripts during development,
> rather than for querying and filtering a large number of rows on the client
> machine. However, a threshold could be set on the (estimated) input size
> to decide whether to prefer fetch over MR jobs, similar to what Hive's
> 'hive.fetch.task.conversion.threshold' does (through Pig's
> LoadMetadata#getStatistics?).
> 
> 
> Diffs
> -----
> 
>   /trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/CSVExcelStorage.java 1555255 
>   /trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/FixedWidthLoader.java 1555255 
>   /trunk/src/org/apache/pig/Main.java 1555255 
>   /trunk/src/org/apache/pig/PigConfiguration.java 1555255 
>   /trunk/src/org/apache/pig/PigServer.java 1555255 
>   /trunk/src/org/apache/pig/backend/hadoop/executionengine/HExecutionEngine.java 1555255 
>   /trunk/src/org/apache/pig/backend/hadoop/executionengine/fetch/FetchLauncher.java PRE-CREATION 
>   /trunk/src/org/apache/pig/backend/hadoop/executionengine/fetch/FetchOptimizer.java PRE-CREATION 
>   /trunk/src/org/apache/pig/backend/hadoop/executionengine/fetch/FetchPOStoreImpl.java PRE-CREATION 
>   /trunk/src/org/apache/pig/backend/hadoop/executionengine/fetch/FetchProgressableReporter.java PRE-CREATION 
>   /trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java 1555255 
>   /trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java 1555255 
>   /trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POStream.java 1555255 
>   /trunk/src/org/apache/pig/backend/hadoop/executionengine/util/MapRedUtil.java 1555255 
>   /trunk/src/org/apache/pig/impl/io/FileLocalizer.java 1555255 
>   /trunk/src/org/apache/pig/impl/util/PropertiesUtil.java 1555255 
>   /trunk/src/org/apache/pig/impl/util/Utils.java 1555255 
>   /trunk/src/org/apache/pig/newplan/logical/expression/ExpToPhyTranslationVisitor.java 1555255 
>   /trunk/src/org/apache/pig/tools/pigstats/SimpleFetchPigStats.java PRE-CREATION 
>   /trunk/test/org/apache/pig/test/TestAssert.java 1555255 
>   /trunk/test/org/apache/pig/test/TestEvalPipeline2.java 1555255 
>   /trunk/test/org/apache/pig/test/TestFetch.java PRE-CREATION 
>   /trunk/test/org/apache/pig/test/TestPigRunner.java 1555255 
> 
> Diff: https://reviews.apache.org/r/16507/diff/
> 
> 
> Testing
> -------
> 
> - new testcase added: TestFetch
> - the patch was checked against test-commit and test-core
> - because opt.fetch is enabled by default, the testcases used fetch instead
>   of MR jobs wherever possible
> 
> 
> Thanks,
> 
> Lorand Bendig
> 
>
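To make the fetch eligibility rules from the description and the missing "return" from the review concrete, here is a minimal, self-contained Java sketch. It is not the patch's FetchOptimizer or HExecutionEngine code: the FetchSketch class, the Op enum, the Plan summary and the explain method are all hypothetical, treating LOAD as implicitly allowed is an assumption, and expression-level details such as UDFs inside FOREACH are not modelled. It only encodes the listed conditions as boolean checks and shows where an early return keeps explain from also printing the MR plan.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical, simplified model of the fetch rules described in PIG-3642.
 * None of these types exist in Pig; they only illustrate the logic.
 */
public class FetchSketch {

    /** Coarse operator kinds. LOAD is assumed to be implicitly allowed. */
    enum Op { LOAD, LIMIT, FILTER, UNION, STREAM, FOREACH, DUMP, STORE, GROUP, JOIN, ORDER }

    /** Operators that may appear in a fetchable script, per the description. */
    static final Set<Op> ALLOWED = EnumSet.of(
            Op.LOAD, Op.LIMIT, Op.FILTER, Op.UNION, Op.STREAM, Op.FOREACH, Op.DUMP);

    /** Hypothetical summary of a compiled plan. */
    static final class Plan {
        final List<Op> ops;
        final boolean unionCreatesSplit;
        final boolean hasScalarAlias;
        final boolean usesSampleLoader;
        final int leafJobs;

        Plan(List<Op> ops, boolean unionCreatesSplit, boolean hasScalarAlias,
             boolean usesSampleLoader, int leafJobs) {
            this.ops = ops;
            this.unionCreatesSplit = unionCreatesSplit;
            this.hasScalarAlias = hasScalarAlias;
            this.usesSampleLoader = usesSampleLoader;
            this.leafJobs = leafJobs;
        }
    }

    /** fetchEnabled stands in for "set opt.fetch true/false;" and -N / -no_fetch. */
    static boolean isFetchable(Plan p, boolean fetchEnabled) {
        if (!fetchEnabled) return false;
        if (!ALLOWED.containsAll(p.ops)) return false;           // e.g. GROUP, JOIN, STORE
        if (p.ops.contains(Op.UNION) && p.unionCreatesSplit) return false;
        if (p.hasScalarAlias || p.usesSampleLoader) return false;
        if (p.leafJobs != 1) return false;                       // single leaf job only
        return p.ops.contains(Op.DUMP);                          // output must be a DUMP
    }

    /** Shows where the early return flagged in the review belongs. */
    static String explain(Plan p, boolean fetchEnabled) {
        StringBuilder out = new StringBuilder();
        if (isFetchable(p, fetchEnabled)) {
            out.append("#Fetch plan\n");
            // Without this return, control falls through and the MR plan is appended too.
            return out.toString();
        }
        out.append("#MR plan\n");
        return out.toString();
    }

    public static void main(String[] args) {
        Plan simple = new Plan(Arrays.asList(Op.LOAD, Op.FILTER, Op.LIMIT, Op.DUMP),
                false, false, false, 1);
        Plan grouped = new Plan(Arrays.asList(Op.LOAD, Op.GROUP, Op.DUMP),
                false, false, false, 1);
        System.out.print(explain(simple, true));   // #Fetch plan
        System.out.print(explain(grouped, true));  // #MR plan
        System.out.print(explain(simple, false));  // #MR plan
    }
}
```

Running main prints "#Fetch plan" once and then "#MR plan" twice: the GROUP plan fails the operator check, and the simple plan is rejected once fetch is switched off. Keeping each rule as its own early-return check makes it easy to see which single condition rejected a plan.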