[ https://issues.apache.org/jira/browse/PIG-951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ashutosh Chauhan updated PIG-951:
---------------------------------

    Attachment: pig-951.patch

One-line patch that fixes this. Also added a test case to catch regressions on this.

> Reset parallelism to 1 for indexing job in MergeJoin
> ----------------------------------------------------
>
>                 Key: PIG-951
>                 URL: https://issues.apache.org/jira/browse/PIG-951
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>            Reporter: Ashutosh Chauhan
>            Assignee: Ashutosh Chauhan
>         Attachments: pig-951.patch
>
>
> After sampling one tuple from every block, one reducer is used to sort the
> index entries in the reduce phase to produce the sorted index used by the
> actual join job. Thus, the parallelism of the index job should be explicitly
> set to 1. Currently, it is not.
> For now this is a non-issue, since we don't allow any blocking operators
> in the pipeline before merge-join. However, once we do allow blocking
> operators, the parallelism of the indexing job will be that of the preceding
> blocking operator. Even then, the job will complete successfully, because all
> tuples will go to a single reducer, since we are grouping on only one key,
> "all". However, it will waste cluster resources by starting extra reducers
> that receive no data and thus do nothing.
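To see why the extra reducers sit idle, here is a toy simulation, not Pig's actual code. It assumes the index entries are routed with the same formula as Hadoop's default `HashPartitioner`, `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. The class and method names and the inherited parallelism of 20 are hypothetical, chosen only for illustration. Because every entry carries the constant key `"all"`, every entry hashes to the same partition. Only one reducer ever receives data, whatever parallelism the job inherits.

```java
// Hypothetical illustration only: simulates routing index entries that are all
// grouped on the constant key "all", assuming HashPartitioner-style partitioning.
public class IndexJobParallelismSketch {

    // Same arithmetic as Hadoop's default HashPartitioner.getPartition().
    static int partition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Returns how many reducers receive at least one index entry.
    static int nonEmptyReducers(int numEntries, int numReduceTasks) {
        int[] load = new int[numReduceTasks];
        for (int i = 0; i < numEntries; i++) {
            // Every sampled entry uses the same key, so it lands in the same partition.
            load[partition("all", numReduceTasks)]++;
        }
        int nonEmpty = 0;
        for (int entries : load) {
            if (entries > 0) {
                nonEmpty++;
            }
        }
        return nonEmpty;
    }

    // Reducers that are started but never receive any data.
    static int idleReducers(int numEntries, int numReduceTasks) {
        return numReduceTasks - nonEmptyReducers(numEntries, numReduceTasks);
    }

    public static void main(String[] args) {
        int inheritedParallelism = 20; // hypothetical parallelism of a preceding blocking operator
        System.out.println("idle with inherited parallelism: "
                + idleReducers(1000, inheritedParallelism));
        System.out.println("idle with parallelism 1: " + idleReducers(1000, 1));
    }
}
```

With parallelism inherited as 20, the sketch reports 19 idle reducers. Pinning the index job to 1 leaves none idle, and the output is still a single sorted index.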