[ https://issues.apache.org/jira/browse/PIG-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13254397#comment-13254397 ]
Bill Graham commented on PIG-2652:
----------------------------------

FYI, here's my script that reproduces with an initial Map-only job:

{noformat}
L = LOAD 'data1.txt' AS (owner:chararray,pet:chararray,age:int,phone:chararray);
R = LOAD 'data2.txt' AS (owner:chararray,pet:chararray,age:int,phone:chararray);
L2 = FILTER L BY ((int)age > 0);
UNIONED = UNION L, L2;
JOINED = JOIN UNIONED BY owner, R BY owner USING 'skewed';
STORE JOINED INTO 'tmp/skew_join_union';
{noformat}

> Skew join and order by don't trigger reducer estimation
> -------------------------------------------------------
>
>                 Key: PIG-2652
>                 URL: https://issues.apache.org/jira/browse/PIG-2652
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Bill Graham
>            Assignee: Bill Graham
>             Fix For: 0.10.0, 0.9.3, 0.11
>
>         Attachments: PIG-2652_1.patch
>
>
> If neither PARALLEL, default parallel or {{mapred.reduce.tasks}} are set, the
> number of reducers is not estimated based on input size for skew joins or
> order by. Instead, these jobs get only 1 reducer.
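For readers unfamiliar with what "estimated based on input size" means here, the following is a rough Python sketch of the arithmetic, not Pig's actual code. It assumes the estimate is driven by the documented Pig properties pig.exec.reducers.bytes.per.reducer (assumed default of about 1 GB) and pig.exec.reducers.max (assumed default of 999); the function name estimate_reducers is made up for illustration.

```python
import math

# Assumed defaults, mirroring the documented Pig properties
# pig.exec.reducers.bytes.per.reducer and pig.exec.reducers.max.
BYTES_PER_REDUCER = 1000 * 1000 * 1000
MAX_REDUCERS = 999


def estimate_reducers(total_input_bytes,
                      bytes_per_reducer=BYTES_PER_REDUCER,
                      max_reducers=MAX_REDUCERS):
    """Hypothetical helper: estimate a reducer count from total input size.

    Divides the input size by the per-reducer byte budget, rounds up,
    then clamps the result to the range [1, max_reducers].
    """
    if total_input_bytes < 0:
        raise ValueError("total_input_bytes must be non-negative")
    if bytes_per_reducer <= 0 or max_reducers <= 0:
        raise ValueError("bytes_per_reducer and max_reducers must be positive")
    estimate = math.ceil(total_input_bytes / bytes_per_reducer)
    return max(1, min(max_reducers, estimate))
```

Under these assumptions, a 2.5 GB input would be given 3 reducers. The bug reported in this issue is that skew joins and order by skip this kind of estimate and run with a single reducer regardless of input size.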