[
https://issues.apache.org/jira/browse/HIVE-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rui Li updated HIVE-7527:
-------------------------
Attachment: HIVE-7527-spark.patch
> Support order by and sort by on Spark
> -------------------------------------
>
> Key: HIVE-7527
> URL: https://issues.apache.org/jira/browse/HIVE-7527
> Project: Hive
> Issue Type: Sub-task
> Components: Spark
> Reporter: Xuefu Zhang
> Assignee: Rui Li
> Attachments: HIVE-7527-spark.patch
>
>
> Currently Hive depends completely on MapReduce's sorting as part of shuffling
> to achieve order by (global sort, one reducer) and sort by (local sort).
> Spark has a sortBy transformation in several variations that can be used to
> support Hive's order by and sort by. However, we still need to evaluate
> whether Spark's sortBy can achieve the same functionality inherited from
> MapReduce's shuffle sort.
> For now, Hive on Spark should be able to run a simple sort by or order by
> if the current partitionBy is changed to sortBy. This is a way to verify the
> theory. A complete solution will not be available until we have a complete
> SparkPlanGenerator.
> There is also the question of how to determine that an order by or sort by
> is present just by looking at the operator tree from which the Spark task is
> created. This is the responsibility of SparkPlanGenerator, but we need to
> have an idea.
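
For reference, the difference between the two sort semantics described above
can be sketched in plain Python, without Spark or Hive. This is only an
illustration under stated assumptions: the helper names (hash_partition,
sort_by, order_by) and the sample rows are hypothetical, partitions are
modelled as in-memory lists, and nothing here is taken from the attached patch.

```python
from typing import List, Tuple

Row = Tuple[int, str]  # (sort key, value); a hypothetical row shape


def hash_partition(rows: List[Row], num_partitions: int) -> List[List[Row]]:
    """Distribute rows to reducers by key, as a plain shuffle would."""
    parts: List[List[Row]] = [[] for _ in range(num_partitions)]
    for key, value in rows:
        parts[key % num_partitions].append((key, value))
    return parts


def sort_by(partitions: List[List[Row]]) -> List[List[Row]]:
    """sort by: each reducer sorts only its own partition (local sort)."""
    return [sorted(p, key=lambda r: r[0]) for p in partitions]


def order_by(partitions: List[List[Row]]) -> List[List[Row]]:
    """order by: all rows go to a single reducer, giving one global order."""
    merged = [row for p in partitions for row in p]
    return [sorted(merged, key=lambda r: r[0])]


rows = [(5, "e"), (2, "b"), (4, "d"), (1, "a"), (3, "c")]
parts = hash_partition(rows, 2)

local = sort_by(parts)    # two partitions, each sorted on its own
total = order_by(parts)   # one partition, sorted across all rows

print(local)
print(total)
```

Reading the sort by result partition after partition does not give a sorted
sequence, while the order by result does. A total order can also be produced
with more than one reducer by range-partitioning the keys before the local
sort; whether that fits Hive's order by on Spark is part of what this issue
sets out to evaluate.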
--
This message was sent by Atlassian JIRA
(v6.2#6252)