[jira] [Commented] (HIVE-13293) Query occurs performance degradation after enabling parallel order by for Hive on Spark

Xuefu Zhang (JIRA) Wed, 11 May 2016 08:49:36 -0700

    [ https://issues.apache.org/jira/browse/HIVE-13293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15280312#comment-15280312 ]


Xuefu Zhang commented on HIVE-13293:
------------------------------------

[~lirui], thanks for working on this. The patch looks good, but one thing I'm 
not sure about is the persistence level. Order by is almost always at the end 
of the stages, so does it make sense to use a mix of memory and disk?

As a side question, somewhat out of scope: do we need to explicitly call 
rdd.unpersist() on the cached RDDs once a query completes? Right now, RDDs are 
never reused across queries.

> Query occurs performance degradation after enabling parallel order by for 
> Hive on Spark
> ---------------------------------------------------------------------------------------
>
>                 Key: HIVE-13293
>                 URL: https://issues.apache.org/jira/browse/HIVE-13293
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>    Affects Versions: 2.0.0
>            Reporter: Lifeng Wang
>            Assignee: Rui Li
>         Attachments: HIVE-13293.1.patch, HIVE-13293.1.patch
>
>
> I use TPCx-BB to run some performance tests on the Hive on Spark engine and 
> found that query 10 suffers a performance degradation when parallel order by 
> is enabled. It seems that sampling takes a lot of time before the real query 
> runs.
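
The sampling cost described in the issue can be illustrated with a minimal Python sketch, under stated assumptions: it models parallel order by as a sample-based range sort (sample keys, pick split points, route each row to its range, sort each range locally) and assumes that, without caching, the sampling pass re-evaluates the upstream stage just as the sort pass does. CountingSource and parallel_order_by are hypothetical names for illustration only, not Hive or Spark APIs, and the sampling here is far simpler than Spark's RangePartitioner.

```python
import bisect
import random


class CountingSource:
    """Hypothetical upstream stage that counts how often its rows are computed."""

    def __init__(self, rows):
        self._rows = list(rows)
        self.evaluations = 0

    def compute(self):
        # Each call stands in for re-running the whole upstream stage.
        self.evaluations += 1
        return list(self._rows)


def parallel_order_by(source, num_partitions, sample_size, cache, seed=0):
    """Sort rows by sampling range boundaries, then sorting each range (simplified model)."""
    cached = source.compute() if cache else None

    def rows():
        # Without a cache, every pass re-evaluates the upstream stage.
        return cached if cached is not None else source.compute()

    # Pass 1: sample keys and pick num_partitions - 1 split points.
    data = rows()
    rng = random.Random(seed)
    sample = sorted(rng.sample(data, min(sample_size, len(data))))
    step = len(sample) / num_partitions
    bounds = [sample[int(step * i)] for i in range(1, num_partitions)] if sample else []

    # Pass 2: route every row to the range containing it, then sort each range.
    partitions = [[] for _ in range(num_partitions)]
    for row in rows():
        partitions[bisect.bisect_left(bounds, row)].append(row)

    # Ranges are ordered, so concatenating the locally sorted ranges is a total order.
    return [row for part in partitions for row in sorted(part)]
```

With cache=False the upstream stage is computed twice (once for sampling, once for sorting); with cache=True it is computed once, at the price of holding a copy of the rows, which is where the choice of persistence level comes in.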



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
