[ https://issues.apache.org/jira/browse/PIG-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13259324#comment-13259324 ]
Dmitriy V. Ryaboy commented on PIG-2652:
----------------------------------------

Ok, not that simple. The adjuster messes with the inputs / outputs of the limiting job in pretty complex ways, so one would have to unroll all of that before running the pre-limit job.

Also, apparently one can't simply run the adjuster inside the JobControlCompiler -- it does correctly add a new job, but that job fails with
{code}
12/04/22 17:27:31 INFO mapred.TaskInProgress: Error from attempt_20120422172532429_0004_m_000000_0: org.apache.pig.backend.executionengine.ExecException: ERROR 2044: The type null cannot be collected as a Key type
{code}

Moreover, the extra job is not accounted for in stats, progress, etc. Looking at how we can do this better.

> Skew join and order by don't trigger reducer estimation
> -------------------------------------------------------
>
>                 Key: PIG-2652
>                 URL: https://issues.apache.org/jira/browse/PIG-2652
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Bill Graham
>            Assignee: Bill Graham
>             Fix For: 0.10.0, 0.9.3, 0.11
>
>         Attachments: PIG-2652_1.patch, PIG-2652_2.patch, PIG-2652_3.patch, PIG-2652_3_10.patch, PIG-2652_4.patch, PIG-2652_5.patch
>
>
> If neither PARALLEL, default parallel or {{mapred.reduce.tasks}} are set, the
> number of reducers is not estimated based on input size for skew joins or
> order by. Instead, these jobs get only 1 reducer.
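
For readers unfamiliar with what "estimated based on input size" means in the issue description, here is a rough sketch of that kind of estimate. It is not the code from any of the attached patches. The class name, method name and parameters are hypothetical. I believe the two tunables correspond to Pig's {{pig.exec.reducers.bytes.per.reducer}} and {{pig.exec.reducers.max}} settings, but treat that mapping and the values used as assumptions.

```java
// Hypothetical sketch of an input-size-based reducer estimate.
// Names are illustrative only; this is not Pig's actual API.
public final class ReducerEstimateSketch {
    private ReducerEstimateSketch() {}

    /**
     * Returns ceil(totalInputBytes / bytesPerReducer), clamped to [1, maxReducers].
     *
     * @param totalInputBytes total size of the job's inputs, in bytes
     * @param bytesPerReducer target amount of input per reducer (assumed tunable)
     * @param maxReducers     upper bound on the estimate (assumed tunable)
     */
    static int estimateReducers(long totalInputBytes, long bytesPerReducer, int maxReducers) {
        if (bytesPerReducer <= 0 || maxReducers <= 0) {
            throw new IllegalArgumentException("bytesPerReducer and maxReducers must be positive");
        }
        // Treat an unknown (negative) input size as empty input.
        long input = Math.max(0L, totalInputBytes);
        // Ceiling division written to avoid overflow for very large inputs.
        long needed = input / bytesPerReducer + (input % bytesPerReducer == 0 ? 0 : 1);
        // Always run at least one reducer, and never exceed the cap.
        needed = Math.max(1L, needed);
        return (int) Math.min(needed, (long) maxReducers);
    }
}
```

With 1 GB per reducer and a cap of 999, a 2.5 GB input would get 3 reducers under this sketch. The bug described above is that skew joins and order by skip this kind of estimate and get 1 reducer regardless of input size.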