[ https://issues.apache.org/jira/browse/PIG-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13255408#comment-13255408 ]
Dmitriy V. Ryaboy commented on PIG-2652:
----------------------------------------

Agreed with unlinking from 0.10, this is clearly becoming a major patch rather than a minor one. 0.10.1, maybe.. crossing fingers. We should document this for 0.10, at least.

Interesting about LimitAdjuster. Separate jira or do you think we can kill both birds with one stone? I really think avoiding having to know # of reducers in any optimizers will serve us better in the long term. Can LimitAdjuster be done without this knowledge?

Re: size estimation for skewed join, yes, I mean the "big" table -- except it's not the big table, it's the one with data skew. The other table might be the same size, or even bigger!

> Skew join and order by don't trigger reducer estimation
> -------------------------------------------------------
>
>                 Key: PIG-2652
>                 URL: https://issues.apache.org/jira/browse/PIG-2652
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Bill Graham
>            Assignee: Bill Graham
>             Fix For: 0.10.0, 0.9.3, 0.11
>
>         Attachments: PIG-2652_1.patch, PIG-2652_2.patch, PIG-2652_3.patch,
> PIG-2652_3_10.patch
>
>
> If neither PARALLEL, default parallel or {{mapred.reduce.tasks}} are set, the
> number of reducers is not estimated based on input size for skew joins or
> order by. Instead, these jobs get only 1 reducer.
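
For readers following the issue description, here is a minimal Python sketch of the fallback that the issue says is skipped for skew joins and order by: when no explicit parallelism is set, derive the reducer count from input size. It is not the patch itself. The function names are hypothetical, the constants are assumptions modeled on Pig's `pig.exec.reducers.bytes.per.reducer` and `pig.exec.reducers.max` properties, and the precedence order among the explicit settings is likewise an assumption.

```python
import math

# Assumed defaults, modeled on pig.exec.reducers.bytes.per.reducer
# and pig.exec.reducers.max; adjust to match your cluster settings.
BYTES_PER_REDUCER = 1_000_000_000
MAX_REDUCERS = 999


def estimate_reducers(input_bytes,
                      bytes_per_reducer=BYTES_PER_REDUCER,
                      max_reducers=MAX_REDUCERS):
    """Estimate a reducer count from input size, clamped to [1, max_reducers]."""
    estimate = math.ceil(input_bytes / bytes_per_reducer)
    return max(1, min(max_reducers, estimate))


def choose_parallelism(input_bytes, parallel=None, default_parallel=None,
                       mapred_reduce_tasks=None):
    """Hypothetical helper: an explicit setting wins, otherwise estimate.

    The order PARALLEL, default parallel, mapred.reduce.tasks is an
    assumption made for this illustration.
    """
    for explicit in (parallel, default_parallel, mapred_reduce_tasks):
        if explicit is not None and explicit > 0:
            return explicit
    # Nothing was set explicitly: fall back to the input-size estimate
    # instead of silently using a single reducer.
    return estimate_reducers(input_bytes)
```

Under these assumptions, a 5 GB input with nothing set would get 5 reducers rather than the single reducer described in the bug. For a skewed join, the input size passed in would be that of the table with the data skew, per the comment above.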