[ 
https://issues.apache.org/jira/browse/HIVE-7659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096774#comment-14096774
 ] 

Rui Li commented on HIVE-7659:
------------------------------

After some research, I found that the unnecessary sort is mainly introduced when we 
generate the GBY operator. This patch ignores the sort order in the RS if the partition 
keys, sorting keys and grouping keys are the same. Otherwise, e.g. in the case of 
DISTINCT or data skew, we apply the sort shuffle according to the sort order so 
that the query still produces correct results.
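
To illustrate the rule described above, here is a minimal sketch of the decision. It is not the code from the patch: the function name, its parameters, and the choice to compare the key lists element by element are all assumptions made for illustration.

```python
def needs_sort_shuffle(partition_keys, sort_keys, grouping_keys, sort_order):
    """Hypothetical helper: decide whether the shuffle must honor the RS sort order.

    Assumption: "the same" means the three key lists are equal element by
    element. The real patch may compare them differently.
    """
    if not sort_order:
        # No sort order set on the RS, so nothing asks for a sort.
        return False
    same_keys = list(partition_keys) == list(sort_keys) == list(grouping_keys)
    # When all three key lists match, the sort order is only a by-product of
    # generating the GBY operator and can be ignored.
    return not same_keys


# select key from table group by key
print(needs_sort_shuffle(["key"], ["key"], ["key"], "+"))

# Assumed shape of a DISTINCT query, where the distinct column is part of the
# sorting keys but not of the partition or grouping keys.
print(needs_sort_shuffle(["key"], ["key", "value"], ["key"], "++"))
```

Under these assumptions the simple group by case skips the sort shuffle, while the DISTINCT case keeps it.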

> Unnecessary sort in query plan
> ------------------------------
>
>                 Key: HIVE-7659
>                 URL: https://issues.apache.org/jira/browse/HIVE-7659
>             Project: Hive
>          Issue Type: Improvement
>          Components: Spark
>            Reporter: Rui Li
>            Assignee: Rui Li
>         Attachments: HIVE-7659-spark.patch
>
>
> For Hive on Spark.
> Currently we rely on the sort order in the RS to decide whether we need a 
> sortByKey transformation. However, a simple group by query will also have the 
> sort order set to '+'.
> Consider the query: select key from table group by key. The RS in the map 
> work will have the sort order set to '+', thus requiring a sortByKey shuffle.
> To avoid the unnecessary sort, we should either use another way to decide whether 
> there has to be a sort shuffle, or set the sort order only when 
> a sort is really needed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
