[
https://issues.apache.org/jira/browse/HIVE-8486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174223#comment-14174223
]
Chao commented on HIVE-8486:
----------------------------
I just talked with [~szehon] offline, and he pointed me to
{{ReduceSinkOperator::computeHashCode}}. This function returns:
{noformat}
return bucketNumber < 0 ? keyHashCode : keyHashCode * 31 + bucketNumber;
{noformat}
So, if {{bucketNumber}} is 0, the result is {{keyHashCode * 31}}, which is a
multiple of 31 (barring integer overflow). If the number of partitions is also
31, every such key goes to the same partition. I think this explains why the
issue happens only when we set {{mapreduce.job.reduces}} to 31. I also
verified it locally.
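For anyone who wants to reproduce the arithmetic outside Hive, here is a minimal sketch. It copies only the return expression quoted above. The class name {{BucketHashDemo}}, the helper names, and the partitioning rule (non-negative hash modulo the partition count) are assumptions for illustration, not the actual Hive or Spark code. The key hash codes are kept small so that the multiplication does not overflow.

```java
import java.util.HashSet;
import java.util.Set;

public class BucketHashDemo {
    // Same expression as the return statement quoted from
    // ReduceSinkOperator::computeHashCode.
    public static int computeHashCode(int keyHashCode, int bucketNumber) {
        return bucketNumber < 0 ? keyHashCode : keyHashCode * 31 + bucketNumber;
    }

    // ASSUMPTION: the partitioner takes the non-negative hash modulo the
    // number of partitions. This is a stand-in, not the real partitioner.
    public static int partitionFor(int hashCode, int numPartitions) {
        return (hashCode & Integer.MAX_VALUE) % numPartitions;
    }

    // Collects the distinct partitions hit by key hash codes 0..999.
    // The range is small on purpose so keyHashCode * 31 cannot overflow.
    public static Set<Integer> partitionsUsed(int bucketNumber, int numPartitions) {
        Set<Integer> used = new HashSet<>();
        for (int keyHashCode = 0; keyHashCode < 1000; keyHashCode++) {
            int hash = computeHashCode(keyHashCode, bucketNumber);
            used.add(partitionFor(hash, numPartitions));
        }
        return used;
    }

    public static void main(String[] args) {
        // bucketNumber = 0 with 31 partitions: only partition 0 is used.
        System.out.println(partitionsUsed(0, 31));
        // bucketNumber = 0 with 32 partitions: keys spread over all 32.
        System.out.println(partitionsUsed(0, 32).size());
        // bucketNumber < 0 with 31 partitions: the raw key hash is used.
        System.out.println(partitionsUsed(-1, 31).size());
    }
}
```

Under these assumptions, only the combination of {{bucketNumber}} 0 and 31 partitions collapses to a single partition, which matches what we see with {{mapreduce.job.reduces}} set to 31.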
> TPC-DS Query 96 parallelism is not set correctly
> ------------------------------------------------
>
> Key: HIVE-8486
> URL: https://issues.apache.org/jira/browse/HIVE-8486
> Project: Hive
> Issue Type: Sub-task
> Components: Spark
> Reporter: Brock Noland
> Assignee: Chao
>
> When we run the query on a 20B we only have a parallelism factor of 1.