-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16507/#review30936
-----------------------------------------------------------


This a great work. Thank you very much!

I have few minor comments below mostly about tests.


/trunk/src/org/apache/pig/backend/hadoop/executionengine/fetch/FetchLauncher.java
<https://reviews.apache.org/r/16507/#comment59191>

    Replace UDFContext.getUdFContext() with udfContext.



/trunk/src/org/apache/pig/backend/hadoop/executionengine/fetch/FetchOptimizer.java
<https://reviews.apache.org/r/16507/#comment59252>

    Do you mind removing trailing white spaces in the patch?



/trunk/src/org/apache/pig/backend/hadoop/executionengine/fetch/FetchOptimizer.java
<https://reviews.apache.org/r/16507/#comment59192>

    Can you move this to PigConfiguration?



/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POStream.java
<https://reviews.apache.org/r/16507/#comment59194>

    Do you mind removing unused imports?
    - import java.util.LinkedList;
    - import org.apache.pig.impl.util.IdentityHashSet;
    - import org.apache.pig.pen.util.LineageTracer;



/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POStream.java
<https://reviews.apache.org/r/16507/#comment59276>

    Do you mind adding a comment explaining about isFetchable in POStream?



/trunk/test/org/apache/pig/test/TestAssert.java
<https://reviews.apache.org/r/16507/#comment59195>

    Does this else block ever get executed given the we're running the test 
with opt.fetch on?
    
    I think you can do either-
    
    1) explicitly set opt.fetch to true or false in setup(),
    
    or
    
    2) change the test to run the query twice with opt.fetch on and off to 
ensure we're not breaking anything when opt.fetch is off.



/trunk/test/org/apache/pig/test/TestEvalPipeline2.java
<https://reviews.apache.org/r/16507/#comment59197>

    Can you change the test to not take into account the order of records 
instead of handling fetchEnabled separately? Since union doesn't guarantee any 
ordering, the test shouldn't assume it in the first place.
    



/trunk/test/org/apache/pig/test/TestEvalPipeline2.java
<https://reviews.apache.org/r/16507/#comment59248>

    Same as above. Given that opt.fetch is on in the test, the bincond will be 
always true.
    
    I think you can do either-
    
    1) explicitly set opt.fetch to true or false in setup(),
    
    or
    
    2) change the test to run the query twice with opt.fetch on and off.



/trunk/test/org/apache/pig/test/TestPigRunner.java
<https://reviews.apache.org/r/16507/#comment59196>

    Why is this changed? I think the default value for opt.multiquery is true.



/trunk/test/org/apache/pig/test/TestPigRunner.java
<https://reviews.apache.org/r/16507/#comment59250>

    Same here. The default value for opt.multiquery is true.


- Cheolsoo Park


On Dec. 29, 2013, 11:19 p.m., Lorand Bendig wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16507/
> -----------------------------------------------------------
> 
> (Updated Dec. 29, 2013, 11:19 p.m.)
> 
> 
> Review request for pig.
> 
> 
> Bugs: PIG-3642
>     https://issues.apache.org/jira/browse/PIG-3642
> 
> 
> Repository: pig
> 
> 
> Description
> -------
> 
> With this patch I'd like to add the possibility to directly read data from 
> HDFS instead of launching MR jobs in case of simple (map-only) tasks. Hive 
> already has this feature (fetch). This patch shares some similarities with 
> the local mode of Pig 0.6. Here, fetching kicks off when the following holds 
> for a script:
> 
>     it contains only LIMIT, FILTER, UNION (if no split is generated), STREAM, 
> (nested) FOREACH with expression operators, custom UDFs..etc
>     no scalar aliases
>     no SampleLoader
>     single leaf job
>     DUMP (no STORE)
> 
> The feature is enabled by default and can be toggled with:
> 
>     -N or -no_fetch
>     set opt.fetch true/false;
> 
> There's no STORE support because I wanted to make it explicit that this 
> "optimization" is for launching small/simple scripts during development, 
> rather than querying and filtering large number of rows on the client 
> machine. However, a threshold could be given on the input size (an 
> estimation) to determine whether to prefer fetch over MR jobs, similar to 
> what Hive's 'hive.fetch.task.conversion.threshold' does. (through Pig's 
> LoadMetadata#getStatistic ?)
> 
> 
> Diffs
> -----
> 
>   
> /trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/CSVExcelStorage.java
>  1553596 
>   
> /trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/FixedWidthLoader.java
>  1553596 
>   /trunk/src/org/apache/pig/Main.java 1553596 
>   /trunk/src/org/apache/pig/PigServer.java 1553596 
>   
> /trunk/src/org/apache/pig/backend/hadoop/executionengine/HExecutionEngine.java
>  1553596 
>   
> /trunk/src/org/apache/pig/backend/hadoop/executionengine/fetch/FetchLauncher.java
>  PRE-CREATION 
>   
> /trunk/src/org/apache/pig/backend/hadoop/executionengine/fetch/FetchOptimizer.java
>  PRE-CREATION 
>   
> /trunk/src/org/apache/pig/backend/hadoop/executionengine/fetch/FetchPOStoreImpl.java
>  PRE-CREATION 
>   
> /trunk/src/org/apache/pig/backend/hadoop/executionengine/fetch/FetchProgressableReporter.java
>  PRE-CREATION 
>   
> /trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java
>  1553596 
>   
> /trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java
>  1553596 
>   
> /trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POStream.java
>  1553596 
>   
> /trunk/src/org/apache/pig/backend/hadoop/executionengine/util/MapRedUtil.java 
> 1553596 
>   /trunk/src/org/apache/pig/impl/util/PropertiesUtil.java 1553596 
>   
> /trunk/src/org/apache/pig/newplan/logical/expression/ExpToPhyTranslationVisitor.java
>  1553596 
>   /trunk/src/org/apache/pig/tools/pigstats/SimpleFetchPigStats.java 
> PRE-CREATION 
>   /trunk/test/org/apache/pig/test/TestAssert.java 1553596 
>   /trunk/test/org/apache/pig/test/TestEvalPipeline2.java 1553596 
>   /trunk/test/org/apache/pig/test/TestFetch.java PRE-CREATION 
>   /trunk/test/org/apache/pig/test/TestPigRunner.java 1553596 
> 
> Diff: https://reviews.apache.org/r/16507/diff/
> 
> 
> Testing
> -------
> 
> - new testcase added:  TestFetch
> - the patch was checked against test-commit and test-core
> - Because opt.fetch is set by default, the testcases were using fetch instead 
> of MR jobs wherever it was possible
> 
> 
> Thanks,
> 
> Lorand Bendig
> 
>

Reply via email to