[
https://issues.apache.org/jira/browse/HIVE-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14204357#comment-14204357
]
Xuefu Zhang commented on HIVE-8542:
-----------------------------------
{quote}
One thing I'm not quite sure is that if we still need SHUFFLE_SORT. Ideally it
should only be used for total order, but we can also achieve that with MR
shuffle and setting #reducer to 1. I think hive forces #reducer to 1 for order
by query, right?
{quote}
Yes, MR does total ordering by setting the number of reducers to 1. And yes, in
Spark we can achieve the same thing with an MR-style shuffle plus 1 reducer.
Historically, Spark didn't do a good job on SHUFFLE_SORT, but I heard they have
made improvements lately. On the other hand, having only 1 reducer isn't good
either, since a single task then has to sort all the data.
I think for now we keep both, but later we can do some benchmarking to see
which performs better.
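To make the two options concrete, here is a minimal sketch in plain Python. It is not Hive or Spark code, and the function names are made up for illustration. It assumes a sort-based shuffle reaches total order by range-partitioning keys on sampled split points and sorting each partition, while the MR-style approach sends everything to one reducer. Both give the same ordered output; the difference is how much sorting work can run in parallel.

```python
import bisect
import random


def single_reducer_sort(rows):
    """MR-style total order: every row goes to one reducer, which sorts them all."""
    return sorted(rows)


def range_partitioned_sort(rows, num_reducers, sample_size=100, seed=0):
    """Sort-shuffle-style total order (illustrative only).

    Rows are range-partitioned on split points taken from a sample, each
    partition is sorted independently, and the partitions are concatenated
    in partition order.
    """
    if not rows or num_reducers <= 1:
        return sorted(rows)

    # Sample the keys to estimate split points (hypothetical sampling scheme).
    rng = random.Random(seed)
    sample = sorted(rng.sample(rows, min(sample_size, len(rows))))
    step = len(sample) / num_reducers
    splits = [sample[int(step * i)] for i in range(1, num_reducers)]

    # Partition i receives the keys k with splits[i-1] <= k < splits[i].
    partitions = [[] for _ in range(num_reducers)]
    for row in rows:
        partitions[bisect.bisect_right(splits, row)].append(row)

    # Each partition could be sorted by a separate reducer in parallel.
    ordered = []
    for part in partitions:
        ordered.extend(sorted(part))
    return ordered
```

Because the partitions cover disjoint, increasing key ranges, concatenating the sorted partitions yields a total order without funnelling all rows through one reducer. Which approach is faster in practice is exactly what the benchmarking would need to show.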
> Enable groupby_map_ppr.q and groupby_map_ppr_multi_distinct.q [Spark Branch]
> ----------------------------------------------------------------------------
>
> Key: HIVE-8542
> URL: https://issues.apache.org/jira/browse/HIVE-8542
> Project: Hive
> Issue Type: Bug
> Components: Spark
> Reporter: Chao
> Assignee: Rui Li
> Attachments: HIVE-8542.1-spark.patch, HIVE-8542.2-spark.patch,
> HIVE-8542.3-spark.patch, HIVE-8542.4-spark.patch
>
>
> Currently, in the Spark branch, results for these two test files are very
> different from MR's. We need to find out the cause of this and identify
> any potential bug in our current implementation.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)