> On Dec. 30, 2013, 9:50 p.m., Cheolsoo Park wrote: > > This a great work. Thank you very much! > > > > I have few minor comments below mostly about tests.
Cheolsoo, thanks for taking your time to review it! I fixed/commented the issues. > On Dec. 30, 2013, 9:50 p.m., Cheolsoo Park wrote: > > /trunk/src/org/apache/pig/backend/hadoop/executionengine/fetch/FetchOptimizer.java, > > line 74 > > <https://reviews.apache.org/r/16507/diff/1/?file=404117#file404117line74> > > > > Can you move this to PigConfiguration? PigConfiguration seems to be a better place to put OPT_FETCH > On Dec. 30, 2013, 9:50 p.m., Cheolsoo Park wrote: > > /trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POStream.java, > > line 23 > > <https://reviews.apache.org/r/16507/diff/1/?file=404122#file404122line23> > > > > Do you mind removing unused imports? > > - import java.util.LinkedList; > > - import org.apache.pig.impl.util.IdentityHashSet; > > - import org.apache.pig.pen.util.LineageTracer; Sure. Intially didn't want to remove these leftovers from PIG-1712 > On Dec. 30, 2013, 9:50 p.m., Cheolsoo Park wrote: > > /trunk/test/org/apache/pig/test/TestAssert.java, lines 91-94 > > <https://reviews.apache.org/r/16507/diff/1/?file=404127#file404127line91> > > > > Does this else block ever get executed given the we're running the test > > with opt.fetch on? > > > > I think you can do either- > > > > 1) explicitly set opt.fetch to true or false in setup(), > > > > or > > > > 2) change the test to run the query twice with opt.fetch on and off to > > ensure we're not breaking anything when opt.fetch is off. Not really. I chose the second option > On Dec. 30, 2013, 9:50 p.m., Cheolsoo Park wrote: > > /trunk/test/org/apache/pig/test/TestPigRunner.java, line 174 > > <https://reviews.apache.org/r/16507/diff/1/?file=404130#file404130line174> > > > > Why is this changed? I think the default value for opt.multiquery is > > true. I accidentally changed it - Lorand ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16507/#review30936 ----------------------------------------------------------- On Dec. 29, 2013, 11:19 p.m., Lorand Bendig wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/16507/ > ----------------------------------------------------------- > > (Updated Dec. 29, 2013, 11:19 p.m.) > > > Review request for pig. > > > Bugs: PIG-3642 > https://issues.apache.org/jira/browse/PIG-3642 > > > Repository: pig > > > Description > ------- > > With this patch I'd like to add the possibility to directly read data from > HDFS instead of launching MR jobs in case of simple (map-only) tasks. Hive > already has this feature (fetch). This patch shares some similarities with > the local mode of Pig 0.6. Here, fetching kicks off when the following holds > for a script: > > it contains only LIMIT, FILTER, UNION (if no split is generated), STREAM, > (nested) FOREACH with expression operators, custom UDFs..etc > no scalar aliases > no SampleLoader > single leaf job > DUMP (no STORE) > > The feature is enabled by default and can be toggled with: > > -N or -no_fetch > set opt.fetch true/false; > > There's no STORE support because I wanted to make it explicit that this > "optimization" is for launching small/simple scripts during development, > rather than querying and filtering large number of rows on the client > machine. However, a threshold could be given on the input size (an > estimation) to determine whether to prefer fetch over MR jobs, similar to > what Hive's 'hive.fetch.task.conversion.threshold' does. (through Pig's > LoadMetadata#getStatistic ?) > > > Diffs > ----- > > > /trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/CSVExcelStorage.java > 1553596 > > /trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/FixedWidthLoader.java > 1553596 > /trunk/src/org/apache/pig/Main.java 1553596 > /trunk/src/org/apache/pig/PigServer.java 1553596 > > /trunk/src/org/apache/pig/backend/hadoop/executionengine/HExecutionEngine.java > 1553596 > > /trunk/src/org/apache/pig/backend/hadoop/executionengine/fetch/FetchLauncher.java > PRE-CREATION > > /trunk/src/org/apache/pig/backend/hadoop/executionengine/fetch/FetchOptimizer.java > PRE-CREATION > > /trunk/src/org/apache/pig/backend/hadoop/executionengine/fetch/FetchPOStoreImpl.java > PRE-CREATION > > /trunk/src/org/apache/pig/backend/hadoop/executionengine/fetch/FetchProgressableReporter.java > PRE-CREATION > > /trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java > 1553596 > > /trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java > 1553596 > > /trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POStream.java > 1553596 > > /trunk/src/org/apache/pig/backend/hadoop/executionengine/util/MapRedUtil.java > 1553596 > /trunk/src/org/apache/pig/impl/util/PropertiesUtil.java 1553596 > > /trunk/src/org/apache/pig/newplan/logical/expression/ExpToPhyTranslationVisitor.java > 1553596 > /trunk/src/org/apache/pig/tools/pigstats/SimpleFetchPigStats.java > PRE-CREATION > /trunk/test/org/apache/pig/test/TestAssert.java 1553596 > /trunk/test/org/apache/pig/test/TestEvalPipeline2.java 1553596 > /trunk/test/org/apache/pig/test/TestFetch.java PRE-CREATION > /trunk/test/org/apache/pig/test/TestPigRunner.java 1553596 > > Diff: https://reviews.apache.org/r/16507/diff/ > > > Testing > ------- > > - new testcase added: TestFetch > - the patch was checked against test-commit and test-core > - Because opt.fetch is set by default, the testcases were using fetch instead > of MR jobs wherever it was possible > > > Thanks, > > Lorand Bendig > >