[ https://issues.apache.org/jira/browse/PIG-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093330#comment-13093330 ]
jirapos...@reviews.apache.org commented on PIG-2237:
----------------------------------------------------

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1664/#review1684
-----------------------------------------------------------

trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/LimitAdjuster.java
<https://reviews.apache.org/r/1664/#comment3838>

    Please fix the indentation for the contents of this if block, and add {}s.

trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/LimitAdjuster.java
<https://reviews.apache.org/r/1664/#comment3839>

    You keep having to cast it. Just add
    POStore storeOp = (POStore) mpLeaf;
    at the beginning of the block; it'll clean up the code.

trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRUtil.java
<https://reviews.apache.org/r/1664/#comment3841>

    Please add documentation for Pig developers indicating when and how to use the methods in this helper class.

trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRUtil.java
<https://reviews.apache.org/r/1664/#comment3840>

    Will using this mess up projection push-down?

- Dmitriy

On 2011-08-29 23:34:23, Daniel Dai wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/1664/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2011-08-29 23:34:23)
bq.
bq. Review request for pig and Thejas Nair.
bq.
bq. Summary
bq. -------
bq.
bq. See PIG-2237
bq.
bq. This addresses bug PIG-2237.
bq. https://issues.apache.org/jira/browse/PIG-2237
bq.
bq. Diffs
bq. -----
bq.
bq. trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/LimitAdjuster.java PRE-CREATION
bq. trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler.java 1162260
bq. trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRUtil.java PRE-CREATION
bq. trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceLauncher.java 1162260
bq. trunk/test/org/apache/pig/test/TestEvalPipeline2.java 1162260
bq. trunk/test/org/apache/pig/test/TestMRCompiler.java 1162260
bq.
bq. Diff: https://reviews.apache.org/r/1664/diff
bq.
bq. Testing
bq. -------
bq.
bq. Test-patch:
bq. [exec] +1 overall.
bq. [exec]
bq. [exec] +1 @author. The patch does not contain any @author tags.
bq. [exec]
bq. [exec] +1 tests included. The patch appears to include 3 new or modified tests.
bq. [exec]
bq. [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
bq. [exec]
bq. [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
bq. [exec]
bq. [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
bq. [exec]
bq. [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
bq.
bq. Unit test:
bq. all pass.
bq.
bq. Thanks,
bq.
bq. Daniel
bq.

> LIMIT generates wrong number of records if pig determines no of reducers as more than 1
> ---------------------------------------------------------------------------------------
>
>                 Key: PIG-2237
>                 URL: https://issues.apache.org/jira/browse/PIG-2237
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.0, 0.9.0
>            Reporter: Anitha Raju
>            Assignee: Daniel Dai
>             Fix For: 0.9.1, 0.10
>
>         Attachments: PIG-2237-1.patch, PIG-2237-2.patch, PIG-2237-3.patch
>
>
> Hi,
> For a script
> ========
> A = load 'test.txt' using PigStorage() as (a:int,b:int);
> B = order A by a ;
> C = limit B 2;
> store C into 'op1' using PigStorage();
> ========
> LIMIT and ORDER BY are done in the same MR job if no explicit PARALLEL clause is specified.
> In this case, the number of reducers is determined by Pig, and sometimes it is calculated to be greater than 1.
> Since the limit happens on the reduce side, each reduce task applies the limit separately, generating n*2 records, where n is the number of reduce tasks calculated by Pig.
> If the number of reduce tasks is specified explicitly with the PARALLEL keyword on ORDER BY,
> ==========
> B = order A by a PARALLEL 4;
> ==========
> another MR job is created with 1 reduce task, where the limit is done.
> In short, the issue occurs when the number of reducers calculated by Pig is greater than 1 and a limit is involved in the MR job.
> The issue can be replicated by specifying
> ==========
> -Dpig.exec.reducers.bytes.per.reducer
> ==========
> The issue is seen in versions 0.8 and 0.9. It works correctly in 0.7.
> Regards,
> Anitha

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
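
The record-count arithmetic in the report above can be illustrated with a small stand-alone sketch. This is not Pig code and not the patch under review: the class `LimitSketch` and both methods are hypothetical names invented for illustration, and the reducers are modeled as plain in-memory lists that are assumed to be already sorted and range-partitioned, as ORDER BY would leave them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Pig.
public class LimitSketch {

    // Models the buggy plan: every reducer applies LIMIT to its own partition,
    // so up to (number of reducers * limit) records reach the output.
    public static List<Integer> limitPerReducer(List<List<Integer>> partitions, int limit) {
        List<Integer> out = new ArrayList<>();
        for (List<Integer> partition : partitions) {
            out.addAll(partition.subList(0, Math.min(limit, partition.size())));
        }
        return out;
    }

    // Models the behavior the report describes for the PARALLEL case: a further
    // pass with a single reducer applies LIMIT once more over the partial results.
    public static List<Integer> limitWithFinalPass(List<List<Integer>> partitions, int limit) {
        List<Integer> partial = limitPerReducer(partitions, limit);
        return new ArrayList<>(partial.subList(0, Math.min(limit, partial.size())));
    }

    public static void main(String[] args) {
        // Four reducers (n = 4), each holding a sorted slice of the data.
        List<List<Integer>> partitions = Arrays.asList(
                Arrays.asList(1, 2, 3),
                Arrays.asList(4, 5, 6),
                Arrays.asList(7, 8),
                Arrays.asList(9, 10, 11));

        System.out.println(limitPerReducer(partitions, 2));    // prints [1, 2, 4, 5, 7, 8, 9, 10]
        System.out.println(limitWithFinalPass(partitions, 2)); // prints [1, 2]
    }
}
```

With n = 4 reducers and `limit B 2`, the per-reducer version emits 4*2 = 8 records, while the version with the extra single-reducer pass emits the expected 2.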